xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/11] vpci: add support for SR-IOV capability
@ 2018-07-17  9:48 Roger Pau Monne
  2018-07-17  9:48 ` [PATCH v2 01/11] vpci: move lock Roger Pau Monne
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne

Hello,

The following series enables the usage of the SR-IOV capability by a PVH
Dom0. This allows Dom0 to enable virtual functions and access them as it
would do on bare metal.

No changes are needed in the Dom0 kernel in order to manage the PCIe
SR-IOV capability.

The first 10 patches are preparatory changes in order to support SR-IOV.
Patch 11 actually adds support for the capability.

The series can also be found at:

git://xenbits.xen.org/people/royger/xen.git sriov.v2

The series has been tested with a Linux PVH Dom0 and an Intel I350 nic.

Thanks, Roger.

Roger Pau Monne (11):
  vpci: move lock
  vpci/msix: add lock to protect the list of MSIX regions
  vpci: add tear down functions
  vpci/msix: add teardown cleanup
  vpci/msi: add teardown cleanup
  vpci/header: add teardown cleanup
  rangeset: introduce rangeset_merge
  vpci/header: allow multiple map operations
  pci: add vpci hooks for device addition/removal
  vpci: add a wait operation to the vpci vcpu pending actions
  vpci/sriov: add support for SR-IOV capability

 tools/tests/vpci/emul.h          |   5 +-
 tools/tests/vpci/main.c          |   4 +-
 xen/arch/arm/xen.lds.S           |   9 +-
 xen/arch/x86/hvm/hvm.c           |   1 +
 xen/arch/x86/hvm/vmsi.c          |   8 +-
 xen/arch/x86/xen.lds.S           |   9 +-
 xen/common/rangeset.c            |  12 ++
 xen/drivers/passthrough/pci.c    |   9 +
 xen/drivers/vpci/Makefile        |   2 +-
 xen/drivers/vpci/header.c        | 135 +++++++++++--
 xen/drivers/vpci/msi.c           |  61 ++++--
 xen/drivers/vpci/msix.c          |  78 ++++++--
 xen/drivers/vpci/sriov.c         | 317 +++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c          |  80 +++++---
 xen/include/asm-x86/hvm/domain.h |   1 +
 xen/include/xen/pci.h            |   1 +
 xen/include/xen/rangeset.h       |   3 +
 xen/include/xen/vpci.h           |  54 ++++--
 18 files changed, 676 insertions(+), 113 deletions(-)
 create mode 100644 xen/drivers/vpci/sriov.c

-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 01/11] vpci: move lock
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-09-06  9:57   ` Wei Liu
  2018-09-26 10:37   ` Jan Beulich
  2018-07-17  9:48 ` [PATCH v2 02/11] vpci/msix: add lock to protect the list of MSIX regions Roger Pau Monne
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

To the outside of the vpci struct. This way the lock can be used to
check whether vpci is present, and removal can be performed while
holding the lock, in order to make sure there are no accesses to the
contents of the vpci struct. Previously removal could race with
vpci_read for example, since the lock was dropped prior to freeing
pdev->vpci.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
 - Assert that vpci_lock is locked in vpci_remove_device_locked.
 - Remove double newline.
 - Shrink critical section in vpci_{read/write}.
---
 tools/tests/vpci/emul.h       |  5 ++--
 tools/tests/vpci/main.c       |  4 +--
 xen/arch/x86/hvm/vmsi.c       |  8 +++---
 xen/drivers/passthrough/pci.c |  1 +
 xen/drivers/vpci/header.c     | 19 ++++++++----
 xen/drivers/vpci/msi.c        | 11 +++++--
 xen/drivers/vpci/msix.c       |  8 +++---
 xen/drivers/vpci/vpci.c       | 54 +++++++++++++++++++++++------------
 xen/include/xen/pci.h         |  1 +
 xen/include/xen/vpci.h        |  3 +-
 10 files changed, 72 insertions(+), 42 deletions(-)

diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
index 5d47544bf7..d344ef71c9 100644
--- a/tools/tests/vpci/emul.h
+++ b/tools/tests/vpci/emul.h
@@ -44,6 +44,7 @@ struct domain {
 };
 
 struct pci_dev {
+    bool vpci_lock;
     struct vpci *vpci;
 };
 
@@ -53,10 +54,8 @@ struct vcpu
 };
 
 extern const struct vcpu *current;
-extern const struct pci_dev test_pdev;
+extern struct pci_dev test_pdev;
 
-typedef bool spinlock_t;
-#define spin_lock_init(l) (*(l) = false)
 #define spin_lock(l) (*(l) = true)
 #define spin_unlock(l) (*(l) = false)
 
diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index b9a0a6006b..26c95b08b6 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -23,7 +23,8 @@ static struct vpci vpci;
 
 const static struct domain d;
 
-const struct pci_dev test_pdev = {
+struct pci_dev test_pdev = {
+    .vpci_lock = false,
     .vpci = &vpci,
 };
 
@@ -158,7 +159,6 @@ main(int argc, char **argv)
     int rc;
 
     INIT_LIST_HEAD(&vpci.handlers);
-    spin_lock_init(&vpci.lock);
 
     VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
     VPCI_READ_CHECK(0, 4, r0);
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 3001d5c488..94550cb8c4 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -893,14 +893,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
         {
             struct pci_dev *pdev = msix->pdev;
 
-            spin_unlock(&msix->pdev->vpci->lock);
+            spin_unlock(&msix->pdev->vpci_lock);
             process_pending_softirqs();
             /* NB: we assume that pdev cannot go away for an alive domain. */
-            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
+            if ( !spin_trylock(&pdev->vpci_lock) )
                 return -EBUSY;
-            if ( pdev->vpci->msix != msix )
+            if ( !pdev->vpci || pdev->vpci->msix != msix )
             {
-                spin_unlock(&pdev->vpci->lock);
+                spin_unlock(&pdev->vpci_lock);
                 return -EAGAIN;
             }
         }
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index c4890a4295..a5d59b83b7 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -315,6 +315,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
     *((u8*) &pdev->devfn) = devfn;
     pdev->domain = NULL;
     INIT_LIST_HEAD(&pdev->msi_list);
+    spin_lock_init(&pdev->vpci_lock);
 
     if ( pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
                              PCI_CAP_ID_MSIX) )
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 0ec4c082a6..9d5607d5f8 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -131,11 +131,12 @@ bool vpci_process_pending(struct vcpu *v)
         if ( rc == -ERESTART )
             return true;
 
-        spin_lock(&v->vpci.pdev->vpci->lock);
-        /* Disable memory decoding unconditionally on failure. */
-        modify_decoding(v->vpci.pdev, !rc && v->vpci.map,
-                        !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci->lock);
+        spin_lock(&v->vpci.pdev->vpci_lock);
+        if ( v->vpci.pdev->vpci )
+            /* Disable memory decoding unconditionally on failure. */
+            modify_decoding(v->vpci.pdev, !rc && v->vpci.map,
+                            !rc && v->vpci.rom_only);
+        spin_unlock(&v->vpci.pdev->vpci_lock);
 
         rangeset_destroy(v->vpci.mem);
         v->vpci.mem = NULL;
@@ -267,6 +268,12 @@ static int modify_bars(const struct pci_dev *pdev, bool map, bool rom_only)
                 continue;
         }
 
+        spin_lock(&tmp->vpci_lock);
+        if ( !tmp->vpci )
+        {
+            spin_unlock(&tmp->vpci_lock);
+            continue;
+        }
         for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
         {
             const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
@@ -285,12 +292,14 @@ static int modify_bars(const struct pci_dev *pdev, bool map, bool rom_only)
             rc = rangeset_remove_range(mem, start, end);
             if ( rc )
             {
+                spin_unlock(&tmp->vpci_lock);
                 printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                        start, end, rc);
                 rangeset_destroy(mem);
                 return rc;
             }
         }
+        spin_unlock(&tmp->vpci_lock);
     }
 
     ASSERT(dev);
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 8f15ad7bf2..108e871d1c 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -275,7 +275,7 @@ void vpci_dump_msi(void)
     rcu_read_lock(&domlist_read_lock);
     for_each_domain ( d )
     {
-        const struct pci_dev *pdev;
+        struct pci_dev *pdev;
 
         if ( !has_vpci(d) )
             continue;
@@ -287,8 +287,13 @@ void vpci_dump_msi(void)
             const struct vpci_msi *msi;
             const struct vpci_msix *msix;
 
-            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
+            if ( !spin_trylock(&pdev->vpci_lock) )
                 continue;
+            if ( !pdev->vpci )
+            {
+                spin_unlock(&pdev->vpci_lock);
+                continue;
+            }
 
             msi = pdev->vpci->msi;
             if ( msi && msi->enabled )
@@ -330,7 +335,7 @@ void vpci_dump_msi(void)
                 }
             }
 
-            spin_unlock(&pdev->vpci->lock);
+            spin_unlock(&pdev->vpci_lock);
             process_pending_softirqs();
         }
     }
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index bcf63256f6..e28096329d 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -236,7 +236,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
         return X86EMUL_OKAY;
     }
 
-    spin_lock(&msix->pdev->vpci->lock);
+    spin_lock(&msix->pdev->vpci_lock);
     entry = get_entry(msix, addr);
     offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
 
@@ -265,7 +265,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
         ASSERT_UNREACHABLE();
         break;
     }
-    spin_unlock(&msix->pdev->vpci->lock);
+    spin_unlock(&msix->pdev->vpci_lock);
 
     return X86EMUL_OKAY;
 }
@@ -308,7 +308,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
         return X86EMUL_OKAY;
     }
 
-    spin_lock(&msix->pdev->vpci->lock);
+    spin_lock(&msix->pdev->vpci_lock);
     entry = get_entry(msix, addr);
     offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
 
@@ -384,7 +384,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
         ASSERT_UNREACHABLE();
         break;
     }
-    spin_unlock(&msix->pdev->vpci->lock);
+    spin_unlock(&msix->pdev->vpci_lock);
 
     return X86EMUL_OKAY;
 }
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 82607bdb9a..0e6ff09c9a 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -35,9 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
 extern vpci_register_init_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
-void vpci_remove_device(struct pci_dev *pdev)
+static void vpci_remove_device_locked(struct pci_dev *pdev)
 {
-    spin_lock(&pdev->vpci->lock);
+    ASSERT(spin_is_locked(&pdev->vpci_lock));
+
     while ( !list_empty(&pdev->vpci->handlers) )
     {
         struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
@@ -47,13 +48,20 @@ void vpci_remove_device(struct pci_dev *pdev)
         list_del(&r->node);
         xfree(r);
     }
-    spin_unlock(&pdev->vpci->lock);
     xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
     xfree(pdev->vpci);
     pdev->vpci = NULL;
 }
 
+void vpci_remove_device(struct pci_dev *pdev)
+{
+    spin_lock(&pdev->vpci_lock);
+    if ( pdev->vpci )
+        vpci_remove_device_locked(pdev);
+    spin_unlock(&pdev->vpci_lock);
+}
+
 int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
 {
     unsigned int i;
@@ -62,12 +70,15 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
     if ( !has_vpci(pdev->domain) )
         return 0;
 
+    spin_lock(&pdev->vpci_lock);
     pdev->vpci = xzalloc(struct vpci);
     if ( !pdev->vpci )
+    {
+        spin_unlock(&pdev->vpci_lock);
         return -ENOMEM;
+    }
 
     INIT_LIST_HEAD(&pdev->vpci->handlers);
-    spin_lock_init(&pdev->vpci->lock);
 
     for ( i = 0; i < NUM_VPCI_INIT; i++ )
     {
@@ -77,7 +88,8 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
     }
 
     if ( rc )
-        vpci_remove_device(pdev);
+        vpci_remove_device_locked(pdev);
+    spin_unlock(&pdev->vpci_lock);
 
     return rc;
 }
@@ -148,8 +160,6 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
     r->offset = offset;
     r->private = data;
 
-    spin_lock(&vpci->lock);
-
     /* The list of handlers must be kept sorted at all times. */
     list_for_each ( prev, &vpci->handlers )
     {
@@ -161,14 +171,12 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
             break;
         if ( cmp == 0 )
         {
-            spin_unlock(&vpci->lock);
             xfree(r);
             return -EEXIST;
         }
     }
 
     list_add_tail(&r->node, prev);
-    spin_unlock(&vpci->lock);
 
     return 0;
 }
@@ -179,7 +187,6 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset,
     const struct vpci_register r = { .offset = offset, .size = size };
     struct vpci_register *rm;
 
-    spin_lock(&vpci->lock);
     list_for_each_entry ( rm, &vpci->handlers, node )
     {
         int cmp = vpci_register_cmp(&r, rm);
@@ -191,14 +198,12 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset,
         if ( !cmp && rm->offset == offset && rm->size == size )
         {
             list_del(&rm->node);
-            spin_unlock(&vpci->lock);
             xfree(rm);
             return 0;
         }
         if ( cmp <= 0 )
             break;
     }
-    spin_unlock(&vpci->lock);
 
     return -ENOENT;
 }
@@ -315,7 +320,7 @@ static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
 uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
 {
     const struct domain *d = current->domain;
-    const struct pci_dev *pdev;
+    struct pci_dev *pdev;
     const struct vpci_register *r;
     unsigned int data_offset = 0;
     uint32_t data = ~(uint32_t)0;
@@ -331,7 +336,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
     if ( !pdev )
         return vpci_read_hw(sbdf, reg, size);
 
-    spin_lock(&pdev->vpci->lock);
+    spin_lock(&pdev->vpci_lock);
+    if ( !pdev->vpci )
+    {
+        spin_unlock(&pdev->vpci_lock);
+        return vpci_read_hw(sbdf, reg, size);
+    }
 
     /* Read from the hardware or the emulated register handlers. */
     list_for_each_entry ( r, &pdev->vpci->handlers, node )
@@ -374,6 +384,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
             break;
         ASSERT(data_offset < size);
     }
+    spin_unlock(&pdev->vpci_lock);
 
     if ( data_offset < size )
     {
@@ -383,7 +394,6 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
 
         data = merge_result(data, tmp_data, size - data_offset, data_offset);
     }
-    spin_unlock(&pdev->vpci->lock);
 
     return data & (0xffffffff >> (32 - 8 * size));
 }
@@ -418,7 +428,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
                 uint32_t data)
 {
     const struct domain *d = current->domain;
-    const struct pci_dev *pdev;
+    struct pci_dev *pdev;
     const struct vpci_register *r;
     unsigned int data_offset = 0;
 
@@ -439,7 +449,14 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
         return;
     }
 
-    spin_lock(&pdev->vpci->lock);
+    spin_lock(&pdev->vpci_lock);
+    if ( !pdev->vpci )
+    {
+        spin_unlock(&pdev->vpci_lock);
+        vpci_write_hw(sbdf, reg, size, data);
+        return;
+    }
+
 
     /* Write the value to the hardware or emulated registers. */
     list_for_each_entry ( r, &pdev->vpci->handlers, node )
@@ -474,13 +491,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
             break;
         ASSERT(data_offset < size);
     }
+    spin_unlock(&pdev->vpci_lock);
 
     if ( data_offset < size )
         /* Tailing gap, write the remaining. */
         vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
                       data >> (data_offset * 8));
-
-    spin_unlock(&pdev->vpci->lock);
 }
 
 /*
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 4cfa774615..e554c14b2f 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -114,6 +114,7 @@ struct pci_dev {
     u64 vf_rlen[6];
 
     /* Data for vPCI. */
+    spinlock_t vpci_lock;
     struct vpci *vpci;
 };
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index af2b8580ee..98556d31ed 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -29,7 +29,7 @@ int __must_check vpci_add_handlers(struct pci_dev *dev);
 /* Remove all handlers and free vpci related structures. */
 void vpci_remove_device(struct pci_dev *pdev);
 
-/* Add/remove a register handler. */
+/* Add/remove a register handler. Must be called holding the vpci_lock. */
 int __must_check vpci_add_register(struct vpci *vpci,
                                    vpci_read_t *read_handler,
                                    vpci_write_t *write_handler,
@@ -58,7 +58,6 @@ bool __must_check vpci_process_pending(struct vcpu *v);
 struct vpci {
     /* List of vPCI handlers for a device. */
     struct list_head handlers;
-    spinlock_t lock;
 
 #ifdef __XEN__
     /* Hide the rest of the vpci struct from the user-space test harness. */
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 02/11] vpci/msix: add lock to protect the list of MSIX regions
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
  2018-07-17  9:48 ` [PATCH v2 01/11] vpci: move lock Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-09-26 10:42   ` Jan Beulich
  2018-07-17  9:48 ` [PATCH v2 03/11] vpci: add tear down functions Roger Pau Monne
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

This is required in order to allow run-time removal of MSI-X regions.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/hvm/hvm.c           |  1 +
 xen/drivers/vpci/msix.c          | 17 +++++++++++------
 xen/include/asm-x86/hvm/domain.h |  1 +
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c7eb943ed3..955330019d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -587,6 +587,7 @@ int hvm_domain_initialise(struct domain *d)
     spin_lock_init(&d->arch.hvm_domain.uc_lock);
     spin_lock_init(&d->arch.hvm_domain.write_map.lock);
     rwlock_init(&d->arch.hvm_domain.mmcfg_lock);
+    rwlock_init(&d->arch.hvm_domain.msix_lock);
     INIT_LIST_HEAD(&d->arch.hvm_domain.write_map.list);
     INIT_LIST_HEAD(&d->arch.hvm_domain.g2m_ioport_list);
     INIT_LIST_HEAD(&d->arch.hvm_domain.mmcfg_regions);
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index e28096329d..41138e4c73 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -148,10 +148,11 @@ static void control_write(const struct pci_dev *pdev, unsigned int reg,
         pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, val);
 }
 
-static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
+static struct vpci_msix *msix_find(struct domain *d, unsigned long addr)
 {
     struct vpci_msix *msix;
 
+    read_lock(&d->arch.hvm_domain.msix_lock);
     list_for_each_entry ( msix, &d->arch.hvm_domain.msix_tables, next )
     {
         const struct vpci_bar *bars = msix->pdev->vpci->header.bars;
@@ -160,8 +161,12 @@ static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
         for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
             if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
                  VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
+            {
+                read_unlock(&d->arch.hvm_domain.msix_lock);
                 return msix;
+            }
     }
+    read_unlock(&d->arch.hvm_domain.msix_lock);
 
     return NULL;
 }
@@ -196,8 +201,7 @@ static struct vpci_msix_entry *get_entry(struct vpci_msix *msix,
 static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
                      unsigned long *data)
 {
-    const struct domain *d = v->domain;
-    struct vpci_msix *msix = msix_find(d, addr);
+    struct vpci_msix *msix = msix_find(v->domain, addr);
     const struct vpci_msix_entry *entry;
     unsigned int offset;
 
@@ -273,8 +277,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
 static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
                       unsigned long data)
 {
-    const struct domain *d = v->domain;
-    struct vpci_msix *msix = msix_find(d, addr);
+    struct vpci_msix *msix = msix_find(v->domain, addr);
     struct vpci_msix_entry *entry;
     unsigned int offset;
 
@@ -287,7 +290,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
     if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
     {
         /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
-        if ( is_hardware_domain(d) )
+        if ( is_hardware_domain(v->domain) )
         {
             switch ( len )
             {
@@ -438,10 +441,12 @@ static int init_msix(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
+    write_lock(&d->arch.hvm_domain.msix_lock);
     if ( list_empty(&d->arch.hvm_domain.msix_tables) )
         register_mmio_handler(d, &vpci_msix_table_ops);
 
     list_add(&pdev->vpci->msix->next, &d->arch.hvm_domain.msix_tables);
+    write_unlock(&d->arch.hvm_domain.msix_lock);
 
     return 0;
 }
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 588595059d..181f6a2704 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -187,6 +187,7 @@ struct hvm_domain {
 
     /* List of MSI-X tables. */
     struct list_head msix_tables;
+    rwlock_t msix_lock;
 
     /* List of permanently write-mapped pages. */
     struct {
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 03/11] vpci: add tear down functions
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
  2018-07-17  9:48 ` [PATCH v2 01/11] vpci: move lock Roger Pau Monne
  2018-07-17  9:48 ` [PATCH v2 02/11] vpci/msix: add lock to protect the list of MSIX regions Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-09-06  9:57   ` Wei Liu
  2018-09-26 10:50   ` Jan Beulich
  2018-07-17  9:48 ` [PATCH v2 04/11] vpci/msix: add teardown cleanup Roger Pau Monne
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

Tear down functions are not mandatory. Note that this patch just
implements the framework, but doesn't implement any tear down function
yet.

The list of vpci init and teardown functions is also moved outside of
the init section. This is in preparation for SR-IOV support, which
requires the ability to add and remove pci devices (and vpci handlers)
after boot.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
 - Remove a redundant vpci != NULL check.
 - Expand commit message to mention moving the vpci handlers out of
   the init section.
---
 xen/arch/arm/xen.lds.S    |  9 +--------
 xen/arch/x86/xen.lds.S    |  9 +--------
 xen/drivers/vpci/header.c |  2 +-
 xen/drivers/vpci/msi.c    |  2 +-
 xen/drivers/vpci/msix.c   |  2 +-
 xen/drivers/vpci/vpci.c   | 18 +++++++++++++++---
 xen/include/xen/vpci.h    | 15 +++++++++++----
 7 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 245a0e0e85..1daaa680d6 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -66,7 +66,7 @@ SECTIONS
        *(.data.param)
        __param_end = .;
 
-#if defined(CONFIG_HAS_VPCI) && defined(CONFIG_LATE_HWDOM)
+#ifdef CONFIG_HAS_VPCI
        . = ALIGN(POINTER_ALIGN);
        __start_vpci_array = .;
        *(SORT(.data.vpci.*))
@@ -178,13 +178,6 @@ SECTIONS
        *(.init_array)
        *(SORT(.init_array.*))
        __ctors_end = .;
-
-#if defined(CONFIG_HAS_VPCI) && !defined(CONFIG_LATE_HWDOM)
-       . = ALIGN(POINTER_ALIGN);
-       __start_vpci_array = .;
-       *(SORT(.data.vpci.*))
-       __end_vpci_array = .;
-#endif
   } :text
   __init_end_efi = .;
   . = ALIGN(STACK_SIZE);
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 326e885402..4effe5462c 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -136,7 +136,7 @@ SECTIONS
        *(.data.param)
        __param_end = .;
 
-#if defined(CONFIG_HAS_VPCI) && defined(CONFIG_LATE_HWDOM)
+#ifdef CONFIG_HAS_VPCI
        . = ALIGN(POINTER_ALIGN);
        __start_vpci_array = .;
        *(SORT(.data.vpci.*))
@@ -242,13 +242,6 @@ SECTIONS
        *(.init_array)
        *(SORT(.init_array.*))
        __ctors_end = .;
-
-#if defined(CONFIG_HAS_VPCI) && !defined(CONFIG_LATE_HWDOM)
-       . = ALIGN(POINTER_ALIGN);
-       __start_vpci_array = .;
-       *(SORT(.data.vpci.*))
-       __end_vpci_array = .;
-#endif
   } :text
 
   . = ALIGN(SECTION_ALIGN);
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 9d5607d5f8..4363270a55 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -560,7 +560,7 @@ static int init_bars(struct pci_dev *pdev)
 
     return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, true, false) : 0;
 }
-REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);
+REGISTER_VPCI_INIT(init_bars, NULL, VPCI_PRIORITY_MIDDLE);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 108e871d1c..5bb505c864 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -266,7 +266,7 @@ static int init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
+REGISTER_VPCI_INIT(init_msi, NULL, VPCI_PRIORITY_LOW);
 
 void vpci_dump_msi(void)
 {
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 41138e4c73..befc4192d9 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -450,7 +450,7 @@ static int init_msix(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_msix, VPCI_PRIORITY_HIGH);
+REGISTER_VPCI_INIT(init_msix, NULL, VPCI_PRIORITY_HIGH);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 0e6ff09c9a..41fd089904 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -31,14 +31,26 @@ struct vpci_register {
 };
 
 #ifdef __XEN__
-extern vpci_register_init_t *const __start_vpci_array[];
-extern vpci_register_init_t *const __end_vpci_array[];
+extern const struct vpci_handler __start_vpci_array[];
+extern const struct vpci_handler __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
 static void vpci_remove_device_locked(struct pci_dev *pdev)
 {
+    unsigned int i;
+
     ASSERT(spin_is_locked(&pdev->vpci_lock));
 
+    for ( i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        vpci_teardown_t *teardown =
+            __start_vpci_array[NUM_VPCI_INIT - i - 1].teardown;
+
+        if ( !teardown )
+            continue;
+        teardown(pdev);
+    }
+
     while ( !list_empty(&pdev->vpci->handlers) )
     {
         struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
@@ -82,7 +94,7 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
 
     for ( i = 0; i < NUM_VPCI_INIT; i++ )
     {
-        rc = __start_vpci_array[i](pdev);
+        rc = __start_vpci_array[i].init(pdev);
         if ( rc )
             break;
     }
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 98556d31ed..208667227e 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -13,15 +13,22 @@ typedef uint32_t vpci_read_t(const struct pci_dev *pdev, unsigned int reg,
 typedef void vpci_write_t(const struct pci_dev *pdev, unsigned int reg,
                           uint32_t val, void *data);
 
-typedef int vpci_register_init_t(struct pci_dev *dev);
+typedef int vpci_init_t(struct pci_dev *dev);
+typedef void vpci_teardown_t(struct pci_dev *dev);
+
+struct vpci_handler {
+    vpci_init_t *init;
+    vpci_teardown_t *teardown;
+};
 
 #define VPCI_PRIORITY_HIGH      "1"
 #define VPCI_PRIORITY_MIDDLE    "5"
 #define VPCI_PRIORITY_LOW       "9"
 
-#define REGISTER_VPCI_INIT(x, p)                \
-  static vpci_register_init_t *const x##_entry  \
-               __used_section(".data.vpci." p) = x
+#define REGISTER_VPCI_INIT(i, t, p)                                     \
+  const static struct vpci_handler i ## t ## _entry                     \
+               __used_section(".data.vpci." p) = { .init = (i),         \
+                                                   .teardown = (t), }
 
 /* Add vPCI handlers to device. */
 int __must_check vpci_add_handlers(struct pci_dev *dev);
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 04/11] vpci/msix: add teardown cleanup
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
                   ` (2 preceding siblings ...)
  2018-07-17  9:48 ` [PATCH v2 03/11] vpci: add tear down functions Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-09-06 10:14   ` Wei Liu
  2018-09-26 15:32   ` Jan Beulich
  2018-07-17  9:48 ` [PATCH v2 05/11] vpci/msi: " Roger Pau Monne
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

So that interrupts are properly freed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
 - Shuffle order of the teardown function.
 - Free msix in the teardown function.
---
 xen/drivers/vpci/msix.c | 55 ++++++++++++++++++++++++++++++++++++-----
 xen/drivers/vpci/vpci.c |  1 -
 2 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index befc4192d9..4fd0b643df 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -436,11 +436,6 @@ static int init_msix(struct pci_dev *pdev)
         vpci_msix_arch_init_entry(&pdev->vpci->msix->entries[i]);
     }
 
-    rc = vpci_add_register(pdev->vpci, control_read, control_write,
-                           msix_control_reg(msix_offset), 2, pdev->vpci->msix);
-    if ( rc )
-        return rc;
-
     write_lock(&d->arch.hvm_domain.msix_lock);
     if ( list_empty(&d->arch.hvm_domain.msix_tables) )
         register_mmio_handler(d, &vpci_msix_table_ops);
@@ -448,9 +443,57 @@ static int init_msix(struct pci_dev *pdev)
     list_add(&pdev->vpci->msix->next, &d->arch.hvm_domain.msix_tables);
     write_unlock(&d->arch.hvm_domain.msix_lock);
 
+    rc = vpci_add_register(pdev->vpci, control_read, control_write,
+                           msix_control_reg(msix_offset), 2, pdev->vpci->msix);
+    if ( rc )
+        /* The teardown function will free the msix struct. */
+        return rc;
+
     return 0;
 }
-REGISTER_VPCI_INIT(init_msix, NULL, VPCI_PRIORITY_HIGH);
+
+static void teardown_msix(struct pci_dev *pdev)
+{
+    struct vpci_msix *msix = pdev->vpci->msix;
+    unsigned int i, pos;
+    uint16_t control;
+
+    if ( !msix )
+        return;
+
+    write_lock(&pdev->domain->arch.hvm_domain.msix_lock);
+    list_del(&pdev->vpci->msix->next);
+    write_unlock(&pdev->domain->arch.hvm_domain.msix_lock);
+
+    if ( !msix->enabled )
+        goto out;
+
+    /* Disable MSIX. */
+    pos = pci_find_cap_offset(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                              PCI_FUNC(pdev->devfn), PCI_CAP_ID_MSIX);
+    ASSERT(pos);
+    control = pci_conf_read16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                              PCI_FUNC(pdev->devfn), msix_control_reg(pos));
+    pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), msix_control_reg(pos),
+                     (control & ~PCI_MSIX_FLAGS_ENABLE));
+
+    for ( i = 0; i < msix->max_entries; i++ )
+    {
+        int rc = vpci_msix_arch_disable_entry(&msix->entries[i], pdev);
+
+        if ( rc && rc != -ENOENT )
+            gprintk(XENLOG_WARNING,
+                    "%04x:%02x:%02x.%u: unable to disable MSIX entry %u: %d\n",
+                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                    PCI_FUNC(pdev->devfn), i, rc);
+    }
+
+out:
+    xfree(msix);
+    pdev->vpci->msix = NULL;
+}
+REGISTER_VPCI_INIT(init_msix, teardown_msix, VPCI_PRIORITY_HIGH);
 
 /*
  * Local variables:
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 41fd089904..67548c4f43 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -60,7 +60,6 @@ static void vpci_remove_device_locked(struct pci_dev *pdev)
         list_del(&r->node);
         xfree(r);
     }
-    xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
     xfree(pdev->vpci);
     pdev->vpci = NULL;
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 05/11] vpci/msi: add teardown cleanup
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
                   ` (3 preceding siblings ...)
  2018-07-17  9:48 ` [PATCH v2 04/11] vpci/msix: add teardown cleanup Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-09-06 10:14   ` Wei Liu
  2018-09-26 15:34   ` Jan Beulich
  2018-07-17  9:48 ` [PATCH v2 06/11] vpci/header: " Roger Pau Monne
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

So that interrupts are properly freed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
 - Free the msi struct in the teardown function.
---
 xen/drivers/vpci/msi.c  | 50 ++++++++++++++++++++++++++++++++---------
 xen/drivers/vpci/vpci.c |  1 -
 2 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 5bb505c864..fcbed772a6 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -200,16 +200,6 @@ static int init_msi(struct pci_dev *pdev)
     if ( !pdev->vpci->msi )
         return -ENOMEM;
 
-    ret = vpci_add_register(pdev->vpci, control_read, control_write,
-                            msi_control_reg(pos), 2, pdev->vpci->msi);
-    if ( ret )
-        /*
-         * NB: there's no need to free the msi struct or remove the register
-         * handlers form the config space, the caller will take care of the
-         * cleanup.
-         */
-        return ret;
-
     /* Get the maximum number of vectors the device supports. */
     control = pci_conf_read16(pdev->seg, pdev->bus, slot, func,
                               msi_control_reg(pos));
@@ -230,6 +220,16 @@ static int init_msi(struct pci_dev *pdev)
     pdev->vpci->msi->address64 = is_64bit_address(control);
     pdev->vpci->msi->masking = is_mask_bit_support(control);
 
+    ret = vpci_add_register(pdev->vpci, control_read, control_write,
+                            msi_control_reg(pos), 2, pdev->vpci->msi);
+    if ( ret )
+        /*
+         * NB: there's no need to free the msi struct or remove the register
+         * handlers form the config space, the teardown function will take care
+         * of the cleanup.
+         */
+        return ret;
+
     ret = vpci_add_register(pdev->vpci, address_read, address_write,
                             msi_lower_address_reg(pos), 4, pdev->vpci->msi);
     if ( ret )
@@ -266,7 +266,35 @@ static int init_msi(struct pci_dev *pdev)
 
     return 0;
 }
-REGISTER_VPCI_INIT(init_msi, NULL, VPCI_PRIORITY_LOW);
+
+static void teardown_msi(struct pci_dev *pdev)
+{
+    unsigned int pos = pci_find_cap_offset(pdev->seg, pdev->bus,
+                                           PCI_SLOT(pdev->devfn),
+                                           PCI_FUNC(pdev->devfn),
+                                           PCI_CAP_ID_MSI);
+    struct vpci_msi *msi = pdev->vpci->msi;
+    uint16_t control;
+
+    if ( !msi )
+        return;
+
+    if ( !msi->enabled )
+        goto out;
+
+    control = pci_conf_read16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                              PCI_FUNC(pdev->devfn), msi_control_reg(pos));
+    pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), msi_control_reg(pos),
+                     (control & ~PCI_MSI_FLAGS_ENABLE));
+
+    vpci_msi_arch_disable(msi, pdev);
+
+out:
+    xfree(msi);
+    pdev->vpci->msi = NULL;
+}
+REGISTER_VPCI_INIT(init_msi, teardown_msi, VPCI_PRIORITY_LOW);
 
 void vpci_dump_msi(void)
 {
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 67548c4f43..0569459699 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -60,7 +60,6 @@ static void vpci_remove_device_locked(struct pci_dev *pdev)
         list_del(&r->node);
         xfree(r);
     }
-    xfree(pdev->vpci->msi);
     xfree(pdev->vpci);
     pdev->vpci = NULL;
 }
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 06/11] vpci/header: add teardown cleanup
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
                   ` (4 preceding siblings ...)
  2018-07-17  9:48 ` [PATCH v2 05/11] vpci/msi: " Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-09-28 15:29   ` Jan Beulich
  2018-07-17  9:48 ` [PATCH v2 07/11] rangeset: introduce rangeset_merge Roger Pau Monne
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

In order to unmap the BARs

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
 - Add comment regarding the fact that memory decoding is not
   disabled when removing a device.
---
 xen/drivers/vpci/header.c | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 4363270a55..17a9dbb0bf 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -131,12 +131,15 @@ bool vpci_process_pending(struct vcpu *v)
         if ( rc == -ERESTART )
             return true;
 
-        spin_lock(&v->vpci.pdev->vpci_lock);
-        if ( v->vpci.pdev->vpci )
-            /* Disable memory decoding unconditionally on failure. */
-            modify_decoding(v->vpci.pdev, !rc && v->vpci.map,
-                            !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci_lock);
+        if ( v->vpci.pdev )
+        {
+            spin_lock(&v->vpci.pdev->vpci_lock);
+            if ( v->vpci.pdev->vpci )
+                /* Disable memory decoding unconditionally on failure. */
+                modify_decoding(v->vpci.pdev, !rc && v->vpci.map,
+                                !rc && v->vpci.rom_only);
+            spin_unlock(&v->vpci.pdev->vpci_lock);
+        }
 
         rangeset_destroy(v->vpci.mem);
         v->vpci.mem = NULL;
@@ -560,7 +563,25 @@ static int init_bars(struct pci_dev *pdev)
 
     return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, true, false) : 0;
 }
-REGISTER_VPCI_INIT(init_bars, NULL, VPCI_PRIORITY_MIDDLE);
+
+static void teardown_bars(struct pci_dev *pdev)
+{
+    uint16_t cmd = pci_conf_read16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                                   PCI_FUNC(pdev->devfn), PCI_COMMAND);
+
+    if ( cmd & PCI_COMMAND_MEMORY )
+    {
+        /* Unmap all BARs from guest p2m. */
+        modify_bars(pdev, false, false);
+        /*
+         * Since this operation is deferred at the point when it finishes the
+         * device might have been removed, so don't attempt to disable memory
+         * decoding afterwards.
+         */
+        current->vpci.pdev = NULL;
+    }
+}
+REGISTER_VPCI_INIT(init_bars, teardown_bars, VPCI_PRIORITY_MIDDLE);
 
 /*
  * Local variables:
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 07/11] rangeset: introduce rangeset_merge
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
                   ` (5 preceding siblings ...)
  2018-07-17  9:48 ` [PATCH v2 06/11] vpci/header: " Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-07-17  9:48 ` [PATCH v2 08/11] vpci/header: allow multiple map operations Roger Pau Monne
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

This new helper will merge two rangesets.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/rangeset.c      | 12 ++++++++++++
 xen/include/xen/rangeset.h |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index bb68ce62e4..195347669e 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -378,6 +378,18 @@ int rangeset_consume_ranges(struct rangeset *r,
     return rc;
 }
 
+static int merge(unsigned long s, unsigned long e, void *data)
+{
+    struct rangeset *r = data;
+
+    return rangeset_add_range(r, s, e);
+}
+
+int rangeset_merge(struct rangeset *r1, struct rangeset *r2)
+{
+    return rangeset_report_ranges(r2, 0, ~0ul, merge, r1);
+}
+
 int rangeset_add_singleton(
     struct rangeset *r, unsigned long s)
 {
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 583b72bb0c..0c05c2fd4e 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -80,6 +80,9 @@ int rangeset_consume_ranges(struct rangeset *r,
                                       void *, unsigned long *c),
                             void *ctxt);
 
+/* Merge rangeset r2 into rangeset r1. */
+int __must_check rangeset_merge(struct rangeset *r1, struct rangeset *r2);
+
 /* Add/remove/query a single number. */
 int __must_check rangeset_add_singleton(
     struct rangeset *r, unsigned long s);
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 08/11] vpci/header: allow multiple map operations
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
                   ` (6 preceding siblings ...)
  2018-07-17  9:48 ` [PATCH v2 07/11] rangeset: introduce rangeset_merge Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-09-28 15:38   ` Jan Beulich
  2018-07-17  9:48 ` [PATCH v2 09/11] pci: add vpci hooks for device addition/removal Roger Pau Monne
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

To be queued in vpci_vcpu. This will be required for SR-IOV support,
which uses a single control register bit to toggle memory decoding for
all the virtual functions.

No functional change expected.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/drivers/vpci/header.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 17a9dbb0bf..2e78f8d4a6 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -184,7 +184,19 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
      * started for the same device if the domain is not well-behaved.
      */
     curr->vpci.pdev = pdev;
-    curr->vpci.mem = mem;
+    if ( !curr->vpci.mem )
+        curr->vpci.mem = mem;
+    else
+    {
+        int rc = rangeset_merge(curr->vpci.mem, mem);
+
+        if ( rc )
+            gprintk(XENLOG_WARNING,
+                    "%04x:%02x:%02x.%u: unable to %smap memory region: %d\n",
+                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                    PCI_FUNC(pdev->devfn), map ? "" : "un", rc);
+        rangeset_destroy(mem);
+    }
     curr->vpci.map = map;
     curr->vpci.rom_only = rom_only;
 }
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 09/11] pci: add vpci hooks for device addition/removal
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
                   ` (7 preceding siblings ...)
  2018-07-17  9:48 ` [PATCH v2 08/11] vpci/header: allow multiple map operations Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-09-28 15:48   ` Jan Beulich
  2018-07-17  9:48 ` [PATCH v2 10/11] vpci: add a wait operation to the vpci vcpu pending actions Roger Pau Monne
  2018-07-17  9:48 ` [PATCH v2 11/11] vpci/sriov: add support for SR-IOV capability Roger Pau Monne
  10 siblings, 1 reply; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

So that pci_{add/remove}_device work correctly with vpci. Note that
this requires moving vpci_add_handlers out of the init section.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/drivers/passthrough/pci.c | 8 ++++++++
 xen/drivers/vpci/vpci.c       | 2 +-
 xen/include/xen/vpci.h        | 2 ++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index a5d59b83b7..a712db0294 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -768,6 +768,13 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
             goto out;
         }
 
+        ret = vpci_add_handlers(pdev);
+        if ( ret )
+        {
+            pdev->domain = NULL;
+            goto out;
+        }
+
         list_add(&pdev->domain_list, &hardware_domain->arch.pdev_list);
     }
     else
@@ -812,6 +819,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
         if ( pdev->bus == bus && pdev->devfn == devfn )
         {
+            vpci_remove_device(pdev);
             ret = iommu_remove_device(pdev);
             if ( pdev->domain )
                 list_del(&pdev->domain_list);
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 0569459699..732d3525ae 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -72,7 +72,7 @@ void vpci_remove_device(struct pci_dev *pdev)
     spin_unlock(&pdev->vpci_lock);
 }
 
-int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
+int vpci_add_handlers(struct pci_dev *pdev)
 {
     unsigned int i;
     int rc = 0;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 208667227e..e629224088 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -224,6 +224,8 @@ static inline int vpci_add_handlers(struct pci_dev *pdev)
     return 0;
 }
 
+static inline void vpci_remove_device(struct pci_dev *pdev) { }
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 10/11] vpci: add a wait operation to the vpci vcpu pending actions
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
                   ` (8 preceding siblings ...)
  2018-07-17  9:48 ` [PATCH v2 09/11] pci: add vpci hooks for device addition/removal Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-07-17  9:48 ` [PATCH v2 11/11] vpci/sriov: add support for SR-IOV capability Roger Pau Monne
  10 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

This allows waiting a specified number of cycles on the vcpu. Once the
wait has finished a callback is executed.

Note that this is still not used, but introduced here in order to
simplify the complexity of the patches that actually make use of it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
 - New in this version.
---
 xen/drivers/vpci/header.c | 73 +++++++++++++++++++++++++++++----------
 xen/include/xen/vpci.h    | 25 +++++++++++---
 2 files changed, 74 insertions(+), 24 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 2e78f8d4a6..e9c7b6aa72 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -120,29 +120,31 @@ static void modify_decoding(const struct pci_dev *pdev, bool map, bool rom_only)
 
 bool vpci_process_pending(struct vcpu *v)
 {
-    if ( v->vpci.mem )
+    switch ( v->vpci.task )
+    {
+    case MODIFY_MEMORY:
     {
         struct map_data data = {
             .d = v->domain,
-            .map = v->vpci.map,
+            .map = v->vpci.memory.map,
         };
-        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
+        int rc = rangeset_consume_ranges(v->vpci.memory.mem, map_range, &data);
 
         if ( rc == -ERESTART )
             return true;
 
-        if ( v->vpci.pdev )
+        if ( v->vpci.memory.pdev )
         {
-            spin_lock(&v->vpci.pdev->vpci_lock);
-            if ( v->vpci.pdev->vpci )
+            spin_lock(&v->vpci.memory.pdev->vpci_lock);
+            if ( v->vpci.memory.pdev->vpci )
                 /* Disable memory decoding unconditionally on failure. */
-                modify_decoding(v->vpci.pdev, !rc && v->vpci.map,
-                                !rc && v->vpci.rom_only);
-            spin_unlock(&v->vpci.pdev->vpci_lock);
+                modify_decoding(v->vpci.memory.pdev, !rc && v->vpci.memory.map,
+                                !rc && v->vpci.memory.rom_only);
+            spin_unlock(&v->vpci.memory.pdev->vpci_lock);
         }
 
-        rangeset_destroy(v->vpci.mem);
-        v->vpci.mem = NULL;
+        rangeset_destroy(v->vpci.memory.mem);
+        v->vpci.task = NONE;
         if ( rc )
             /*
              * FIXME: in case of failure remove the device from the domain.
@@ -151,7 +153,20 @@ bool vpci_process_pending(struct vcpu *v)
              * killed in order to avoid leaking stale p2m mappings on
              * failure.
              */
-            vpci_remove_device(v->vpci.pdev);
+            vpci_remove_device(v->vpci.memory.pdev);
+        break;
+    }
+
+    case WAIT:
+        if ( get_cycles() < v->vpci.wait.end )
+            return true;
+
+        v->vpci.task = NONE;
+        v->vpci.wait.callback(v->vpci.wait.data);
+        break;
+
+    case NONE:
+        return false;
     }
 
     return false;
@@ -183,13 +198,35 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
      * is mapped. This can lead to parallel mapping operations being
      * started for the same device if the domain is not well-behaved.
      */
-    curr->vpci.pdev = pdev;
-    if ( !curr->vpci.mem )
-        curr->vpci.mem = mem;
+    if ( !pdev->info.is_virtfn )
+        curr->vpci.memory.pdev = pdev;
+    else
+    {
+        unsigned int i;
+
+        curr->vpci.memory.pdev = NULL;
+        /*
+         * Set the BARs as enabled now, for VF the memory decoding is not
+         * controlled by the VF command register.
+         */
+        for ( i = 0; i < ARRAY_SIZE(pdev->vpci->header.bars); i++ )
+            if ( MAPPABLE_BAR(&pdev->vpci->header.bars[i]) )
+                pdev->vpci->header.bars[i].enabled = map;
+    }
+    if ( curr->vpci.task == NONE )
+    {
+        curr->vpci.memory.mem = mem;
+        curr->vpci.memory.map = map;
+        curr->vpci.memory.rom_only = rom_only;
+        curr->vpci.task = MODIFY_MEMORY;
+    }
     else
     {
-        int rc = rangeset_merge(curr->vpci.mem, mem);
+        int rc = rangeset_merge(curr->vpci.memory.mem, mem);
 
+        ASSERT(curr->vpci.task == MODIFY_MEMORY);
+        ASSERT(curr->vpci.memory.map == map);
+        ASSERT(curr->vpci.memory.rom_only == rom_only);
         if ( rc )
             gprintk(XENLOG_WARNING,
                     "%04x:%02x:%02x.%u: unable to %smap memory region: %d\n",
@@ -197,8 +234,6 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
                     PCI_FUNC(pdev->devfn), map ? "" : "un", rc);
         rangeset_destroy(mem);
     }
-    curr->vpci.map = map;
-    curr->vpci.rom_only = rom_only;
 }
 
 static int modify_bars(const struct pci_dev *pdev, bool map, bool rom_only)
@@ -590,7 +625,7 @@ static void teardown_bars(struct pci_dev *pdev)
          * device might have been removed, so don't attempt to disable memory
          * decoding afterwards.
          */
-        current->vpci.pdev = NULL;
+        current->vpci.memory.pdev = NULL;
     }
 }
 REGISTER_VPCI_INIT(init_bars, teardown_bars, VPCI_PRIORITY_MIDDLE);
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e629224088..5479fe57d2 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -148,11 +148,26 @@ struct vpci {
 };
 
 struct vpci_vcpu {
-    /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
-    struct rangeset *mem;
-    struct pci_dev *pdev;
-    bool map      : 1;
-    bool rom_only : 1;
+    enum {
+        NONE,
+        MODIFY_MEMORY,
+        WAIT,
+    } task;
+    union {
+        struct {
+            /* Store state while {un}mapping of PCI BARs. */
+            struct rangeset *mem;
+            struct pci_dev *pdev;
+            bool map      : 1;
+            bool rom_only : 1;
+        } memory;
+        struct {
+           /* Store wait state. */
+           cycles_t end;
+           void (*callback)(void *);
+           void *data;
+        } wait;
+    };
 };
 
 #ifdef __XEN__
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 11/11] vpci/sriov: add support for SR-IOV capability
  2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
                   ` (9 preceding siblings ...)
  2018-07-17  9:48 ` [PATCH v2 10/11] vpci: add a wait operation to the vpci vcpu pending actions Roger Pau Monne
@ 2018-07-17  9:48 ` Roger Pau Monne
  2018-09-06 10:37   ` Wei Liu
  2018-10-04 14:50   ` Jan Beulich
  10 siblings, 2 replies; 27+ messages in thread
From: Roger Pau Monne @ 2018-07-17  9:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

So that a PCI device that supports SR-IOV (PF) can enable the capability
and use the virtual functions.

This code is expected to only be used by privileged domains,
unprivileged domains should not get access to the SR-IOV capability.

The current code detects enabling of the virtual functions feature and
automatically adds the VFs to the domain. It also detects enabling of
memory space and maps the VFs BARs into the domain p2m. Disabling of
the VF enable bit removes the devices and the BAR memory map from the
domain p2m.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
 - Free sriov on teardown.
 - Use a recursive lock in order to prevent deadlocking when mapping a
   VF BAR and try to acquire the PF lock.
 - Fix comment typo.
 - Do not check pci_size_mem_bar return value against <= 0.
---
Tested with a Linux PVH Dom0 with Intel I350 nic.
---
 xen/drivers/vpci/Makefile |   2 +-
 xen/drivers/vpci/header.c |  36 +++--
 xen/drivers/vpci/sriov.c  | 317 ++++++++++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c   |  10 +-
 xen/include/xen/vpci.h    |   9 +-
 5 files changed, 359 insertions(+), 15 deletions(-)
 create mode 100644 xen/drivers/vpci/sriov.c

diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index 55d1bdfda0..6274f60e74 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1 +1 @@
-obj-y += vpci.o header.o msi.o msix.o
+obj-y += vpci.o header.o msi.o msix.o sriov.o
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index e9c7b6aa72..092a888751 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -236,7 +236,7 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
     }
 }
 
-static int modify_bars(const struct pci_dev *pdev, bool map, bool rom_only)
+int vpci_modify_bars(const struct pci_dev *pdev, bool map, bool rom_only)
 {
     struct vpci_header *header = &pdev->vpci->header;
     struct rangeset *mem = rangeset_new(NULL, NULL, 0);
@@ -318,10 +318,16 @@ static int modify_bars(const struct pci_dev *pdev, bool map, bool rom_only)
                 continue;
         }
 
-        spin_lock(&tmp->vpci_lock);
+        /*
+         * When mapping the BARs of a VF the parent PF is already locked and
+         * trying to lock it will result in a deadlock, so use a recursive
+         * lock. This is because vpci_modify_bars is called from the parent PF
+         * control_write register handler.
+         */
+        spin_lock_recursive(&tmp->vpci_lock);
         if ( !tmp->vpci )
         {
-            spin_unlock(&tmp->vpci_lock);
+            spin_unlock_recursive(&tmp->vpci_lock);
             continue;
         }
         for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
@@ -342,14 +348,14 @@ static int modify_bars(const struct pci_dev *pdev, bool map, bool rom_only)
             rc = rangeset_remove_range(mem, start, end);
             if ( rc )
             {
-                spin_unlock(&tmp->vpci_lock);
+                spin_unlock_recursive(&tmp->vpci_lock);
                 printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                        start, end, rc);
                 rangeset_destroy(mem);
                 return rc;
             }
         }
-        spin_unlock(&tmp->vpci_lock);
+        spin_unlock_recursive(&tmp->vpci_lock);
     }
 
     ASSERT(dev);
@@ -391,7 +397,7 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
          * memory decoding bit has not been changed, so leave everything as-is,
          * hoping the guest will realize and try again.
          */
-        modify_bars(pdev, cmd & PCI_COMMAND_MEMORY, false);
+        vpci_modify_bars(pdev, cmd & PCI_COMMAND_MEMORY, false);
     else
         pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, cmd);
 }
@@ -472,13 +478,13 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
         header->rom_enabled = new_enabled;
         pci_conf_write32(pdev->seg, pdev->bus, slot, func, reg, val);
     }
-    else if ( modify_bars(pdev, new_enabled, true) )
+    else if ( vpci_modify_bars(pdev, new_enabled, true) )
         /*
          * No memory has been added or removed from the p2m (because the actual
          * p2m changes are deferred in defer_map) and the ROM enable bit has
          * not been changed, so leave everything as-is, hoping the guest will
          * realize and try again. It's important to not update rom->addr in the
-         * unmap case if modify_bars has failed, or future attempts would
+         * unmap case if vpci_modify_bars has failed, or future attempts would
          * attempt to unmap the wrong address.
          */
         return;
@@ -500,6 +506,16 @@ static int init_bars(struct pci_dev *pdev)
     };
     int rc;
 
+    if ( pdev->info.is_virtfn )
+        /*
+         * No need to set traps for the command register or the BAR registers
+         * because those are not used by VFs. Memory decoding and position of
+         * the VF BARs is controlled from the PF.
+         *
+         * TODO: add DomU support for VFs.
+         */
+        return 0;
+
     switch ( pci_conf_read8(pdev->seg, pdev->bus, slot, func, PCI_HEADER_TYPE)
              & 0x7f )
     {
@@ -608,7 +624,7 @@ static int init_bars(struct pci_dev *pdev)
             rom->type = VPCI_BAR_EMPTY;
     }
 
-    return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, true, false) : 0;
+    return (cmd & PCI_COMMAND_MEMORY) ? vpci_modify_bars(pdev, true, false) : 0;
 }
 
 static void teardown_bars(struct pci_dev *pdev)
@@ -619,7 +635,7 @@ static void teardown_bars(struct pci_dev *pdev)
     if ( cmd & PCI_COMMAND_MEMORY )
     {
         /* Unmap all BARs from guest p2m. */
-        modify_bars(pdev, false, false);
+        vpci_modify_bars(pdev, false, false);
         /*
          * Since this operation is deferred at the point when it finishes the
          * device might have been removed, so don't attempt to disable memory
diff --git a/xen/drivers/vpci/sriov.c b/xen/drivers/vpci/sriov.c
new file mode 100644
index 0000000000..64b7ae8203
--- /dev/null
+++ b/xen/drivers/vpci/sriov.c
@@ -0,0 +1,317 @@
+/*
+ * Handlers for accesses to the SR-IOV capability structure.
+ *
+ * Copyright (C) 2018 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/delay.h>
+#include <xen/sched.h>
+#include <xen/vpci.h>
+
+#define SRIOV_SIZE(num) offsetof(struct vpci_sriov, vf[num])
+
+static void modify_memory_mapping(const struct pci_dev *pdev, unsigned int pos,
+                                  bool enable)
+{
+    const struct vpci_sriov *sriov = pdev->vpci->sriov;
+    unsigned int i;
+    int rc;
+
+    if ( enable )
+    {
+        struct pci_dev *pf_dev;
+
+        pcidevs_lock();
+        /*
+         * NB: a non-const pci_dev of the PF is needed in order to update
+         * vf_rlen.
+         */
+        pf_dev = pci_get_pdev(pdev->seg, pdev->bus, pdev->devfn);
+        pcidevs_unlock();
+        ASSERT(pf_dev);
+
+        /* Set the BARs addresses and size. */
+        for ( i = 0; i < PCI_SRIOV_NUM_BARS; i += rc )
+        {
+            unsigned int j, idx = pos + PCI_SRIOV_BAR + i * 4;
+            const pci_sbdf_t sbdf = {
+                .sbdf = PCI_SBDF3(pdev->seg, pdev->bus, pdev->devfn),
+            };
+            uint32_t bar = pci_conf_read32(pdev->seg, pdev->bus,
+                                           PCI_SLOT(pdev->devfn),
+                                           PCI_FUNC(pdev->devfn), idx);
+            uint64_t addr, size;
+
+            rc = pci_size_mem_bar(sbdf, idx, &addr, &size,
+                                  PCI_BAR_VF |
+                                  ((i == PCI_SRIOV_NUM_BARS - 1) ?
+                                   PCI_BAR_LAST : 0));
+
+            /*
+             * Update vf_rlen on the PF. According to the spec the size of
+             * the BARs can change if the system page size register is
+             * modified, so always update rlen when enabling VFs.
+             */
+            pf_dev->vf_rlen[i] = size;
+
+            for ( j = 0; j < sriov->num_vfs; j++ )
+            {
+                struct vpci_header *header;
+
+                if ( !sriov->vf[j] )
+                    /* Can happen if pci_add_device fails. */
+                    continue;
+
+                spin_lock(&sriov->vf[j]->vpci_lock);
+                header = &sriov->vf[j]->vpci->header;
+
+                if ( !size )
+                {
+                    header->bars[i].type = VPCI_BAR_EMPTY;
+                    spin_unlock(&sriov->vf[j]->vpci_lock);
+                    continue;
+                }
+
+                header->bars[i].addr = addr + size * j;
+                header->bars[i].size = size;
+                header->bars[i].prefetchable =
+                    bar & PCI_BASE_ADDRESS_MEM_PREFETCH;
+
+                switch ( rc )
+                {
+                case 1:
+                    header->bars[i].type = VPCI_BAR_MEM32;
+                    break;
+
+                case 2:
+                    header->bars[i].type = VPCI_BAR_MEM64_LO;
+                    header->bars[i + 1].type = VPCI_BAR_MEM64_HI;
+                    break;
+
+                default:
+                    ASSERT_UNREACHABLE();
+                    spin_unlock(&sriov->vf[j]->vpci_lock);
+                    domain_crash(pdev->domain);
+                    return;
+                }
+                spin_unlock(&sriov->vf[j]->vpci_lock);
+            }
+        }
+    }
+
+    /* Add/remove mappings for the VFs BARs into the p2m. */
+    for ( i = 0; i < sriov->num_vfs; i++ )
+    {
+        struct pci_dev *vf_pdev = sriov->vf[i];
+
+        spin_lock(&vf_pdev->vpci_lock);
+        rc = vpci_modify_bars(vf_pdev, enable, false);
+        spin_unlock(&vf_pdev->vpci_lock);
+        if ( rc )
+            gprintk(XENLOG_ERR,
+                    "failed to %smap BARs of VF %04x:%02x:%02x.%u: %d\n",
+                    enable ? "" : "un", vf_pdev->seg, vf_pdev->bus,
+                    PCI_SLOT(vf_pdev->devfn), PCI_FUNC(vf_pdev->devfn), rc);
+    }
+}
+
+static void enable_tail(const struct pci_dev *pdev, struct vpci_sriov *sriov,
+                        unsigned int pos, bool new_enabled,
+                        bool new_mem_enabled)
+{
+    uint16_t offset = pci_conf_read16(pdev->seg, pdev->bus,
+                                      PCI_SLOT(pdev->devfn),
+                                      PCI_FUNC(pdev->devfn),
+                                      pos + PCI_SRIOV_VF_OFFSET);
+    uint16_t stride = pci_conf_read16(pdev->seg, pdev->bus,
+                                      PCI_SLOT(pdev->devfn),
+                                      PCI_FUNC(pdev->devfn),
+                                      pos + PCI_SRIOV_VF_STRIDE);
+    unsigned int i;
+
+    for ( i = 0; i < sriov->num_vfs; i++ )
+    {
+        const pci_sbdf_t bdf = {
+            .bdf = PCI_BDF2(pdev->bus, pdev->devfn) + offset + stride * i,
+        };
+        int rc;
+
+        if ( new_enabled )
+        {
+            const struct pci_dev_info info = {
+                .is_virtfn = true,
+                .physfn.bus = pdev->bus,
+                .physfn.devfn = pdev->devfn,
+            };
+
+            rc = pci_add_device(pdev->seg, bdf.bus, bdf.extfunc, &info,
+                                pdev->node);
+        }
+        else
+            rc = pci_remove_device(pdev->seg, bdf.bus, bdf.extfunc);
+        if ( rc )
+            gprintk(XENLOG_ERR, "failed to %s VF %04x:%02x:%02x.%u: %d\n",
+                    new_enabled ? "add" : "remove", pdev->seg, bdf.bus,
+                    bdf.dev, bdf.func, rc);
+
+        pcidevs_lock();
+        sriov->vf[i] = pci_get_pdev(pdev->seg, bdf.bus, bdf.extfunc);
+        pcidevs_unlock();
+    }
+
+    if ( new_mem_enabled )
+        modify_memory_mapping(pdev, pos, true);
+}
+
+struct callback_data {
+    const struct pci_dev *pdev;
+    struct vpci_sriov *sriov;
+    unsigned int pos;
+    uint32_t value;
+    bool new_enabled;
+    bool new_mem_enabled;
+};
+
+static void enable_callback(void *data)
+{
+    struct callback_data *cb = data;
+    const struct pci_dev *pdev = cb->pdev;
+
+    enable_tail(pdev, cb->sriov, cb->pos, cb->new_enabled,
+                cb->new_mem_enabled);
+    pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), cb->pos + PCI_SRIOV_CTRL,
+                     cb->value);
+    xfree(cb);
+}
+
+static void control_write(const struct pci_dev *pdev, unsigned int reg,
+                          uint32_t val, void *data)
+{
+    struct vpci_sriov *sriov = data;
+    unsigned int pos = reg - PCI_SRIOV_CTRL;
+    uint16_t control = pci_conf_read16(pdev->seg, pdev->bus,
+                                       PCI_SLOT(pdev->devfn),
+                                       PCI_FUNC(pdev->devfn),
+                                       pos + PCI_SRIOV_CTRL);
+    bool enabled = control & PCI_SRIOV_CTRL_VFE;
+    bool mem_enabled = control & PCI_SRIOV_CTRL_MSE;
+    bool new_enabled = val & PCI_SRIOV_CTRL_VFE;
+    bool new_mem_enabled = val & PCI_SRIOV_CTRL_MSE;
+
+    if ( new_enabled != enabled )
+    {
+        if ( new_enabled )
+        {
+            struct callback_data *cb = xmalloc(struct callback_data);
+            struct vcpu *curr = current;
+
+            if ( !cb )
+            {
+                gprintk(XENLOG_WARNING, "%04x:%02x:%02x.%u: "
+                        "unable to allocate memory for SR-IOV enable\n",
+                        pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                        PCI_FUNC(pdev->devfn));
+                return;
+            }
+
+            /*
+             * Only update the number of active VFs when enabling, when
+             * disabling use the cached value in order to always remove the
+             * same number of VFs that were active.
+             */
+            sriov->num_vfs = pci_conf_read16(pdev->seg, pdev->bus,
+                                             PCI_SLOT(pdev->devfn),
+                                             PCI_FUNC(pdev->devfn),
+                                             pos + PCI_SRIOV_NUM_VF);
+
+            /*
+             * NB: VFE needs to be enabled before calling pci_add_device so Xen
+             * can access the config space of VFs.
+             */
+            pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                             PCI_FUNC(pdev->devfn), reg,
+                             control | PCI_SRIOV_CTRL_VFE);
+
+            /*
+             * The spec states that the software must wait at least 100ms
+             * before attempting to access VF registers when enabling virtual
+             * functions on the PF.
+             */
+            cb->pdev = pdev;
+            cb->sriov = sriov;
+            cb->pos = pos;
+            cb->value = val;
+            cb->new_enabled = new_enabled;
+            cb->new_mem_enabled = new_mem_enabled;
+            curr->vpci.task = WAIT;
+            curr->vpci.wait.callback = enable_callback;
+            curr->vpci.wait.data = cb;
+            curr->vpci.wait.end = get_cycles() + 100 * cpu_khz;
+            return;
+        }
+
+        enable_tail(pdev, sriov, pos, new_enabled, new_mem_enabled);
+    }
+    else if ( new_mem_enabled != mem_enabled && new_enabled )
+        modify_memory_mapping(pdev, pos, new_mem_enabled);
+
+    pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), reg, val);
+}
+
+static int init_sriov(struct pci_dev *pdev)
+{
+    unsigned int pos = pci_find_ext_capability(pdev->seg, pdev->bus,
+                                               pdev->devfn,
+                                               PCI_EXT_CAP_ID_SRIOV);
+    uint16_t total_vfs;
+
+    if ( !pos )
+        return 0;
+
+    total_vfs = pci_conf_read16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                                PCI_FUNC(pdev->devfn),
+                                pos + PCI_SRIOV_TOTAL_VF);
+
+    pdev->vpci->sriov = xzalloc_bytes(SRIOV_SIZE(total_vfs));
+    if ( !pdev->vpci->sriov )
+        return -ENOMEM;
+
+    return vpci_add_register(pdev->vpci, vpci_hw_read16, control_write,
+                             pos + PCI_SRIOV_CTRL, 2, pdev->vpci->sriov);
+}
+
+static void teardown_sriov(struct pci_dev *pdev)
+{
+    if ( pdev->vpci->sriov )
+    {
+        /* TODO: removing PFs is not currently supported. */
+        ASSERT_UNREACHABLE();
+        xfree(pdev->vpci->sriov);
+        domain_crash(pdev->domain);
+    }
+}
+REGISTER_VPCI_INIT(init_sriov, teardown_sriov, VPCI_PRIORITY_LOW);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 732d3525ae..8734076d39 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -459,10 +459,14 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
         return;
     }
 
-    spin_lock(&pdev->vpci_lock);
+    /*
+     * NB: use the recursive variant here so that mapping an unmapping of the
+     * VF vars works correctly and can recursively take the PF lock.
+     */
+    spin_lock_recursive(&pdev->vpci_lock);
     if ( !pdev->vpci )
     {
-        spin_unlock(&pdev->vpci_lock);
+        spin_unlock_recursive(&pdev->vpci_lock);
         vpci_write_hw(sbdf, reg, size, data);
         return;
     }
@@ -501,7 +505,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
             break;
         ASSERT(data_offset < size);
     }
-    spin_unlock(&pdev->vpci_lock);
+    spin_unlock_recursive(&pdev->vpci_lock);
 
     if ( data_offset < size )
         /* Tailing gap, write the remaining. */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 5479fe57d2..6fe03e9944 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -94,7 +94,6 @@ struct vpci {
          * is mapped into guest p2m) if there's a ROM BAR on the device.
          */
         bool rom_enabled      : 1;
-        /* FIXME: currently there's no support for SR-IOV. */
     } header;
 
     /* MSI data. */
@@ -144,6 +143,11 @@ struct vpci {
             struct vpci_arch_msix_entry arch;
         } entries[];
     } *msix;
+
+    struct vpci_sriov {
+        uint16_t num_vfs;
+        struct pci_dev *vf[];
+    } *sriov;
 #endif
 };
 
@@ -229,6 +233,9 @@ static inline unsigned int vmsix_entry_nr(const struct vpci_msix *msix,
 {
     return entry - msix->entries;
 }
+
+/* Map/unmap the BARs of a vPCI device. */
+int vpci_modify_bars(const struct pci_dev *pdev, bool map, bool rom_only);
 #endif /* __XEN__ */
 
 #else /* !CONFIG_HAS_VPCI */
-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/11] vpci: move lock
  2018-07-17  9:48 ` [PATCH v2 01/11] vpci: move lock Roger Pau Monne
@ 2018-09-06  9:57   ` Wei Liu
  2018-09-26 10:37   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Wei Liu @ 2018-09-06  9:57 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich, xen-devel

On Tue, Jul 17, 2018 at 11:48:20AM +0200, Roger Pau Monne wrote:
> To the outside of the vpci struct. This way the lock can be used to
> check whether vpci is present, and removal can be performed while
> holding the lock, in order to make sure there are no accesses to the
> contents of the vpci struct. Previously removal could race with
> vpci_read for example, since the lock was dropped prior to freeing
> pdev->vpci.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/11] vpci: add tear down functions
  2018-07-17  9:48 ` [PATCH v2 03/11] vpci: add tear down functions Roger Pau Monne
@ 2018-09-06  9:57   ` Wei Liu
  2018-09-26 10:50   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Wei Liu @ 2018-09-06  9:57 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich, xen-devel

On Tue, Jul 17, 2018 at 11:48:22AM +0200, Roger Pau Monne wrote:
> Tear down functions are not mandatory. Note that this patch just
> implements the framework, but doesn't implement any tear down function
> yet.
> 
> The list of vpci init and teardown functions is also moved outside of
> the init section. This is in preparation for SR-IOV support, which
> requires the ability to add and remove pci devices (and vpci handlers)
> after boot.
> 
> No functional change intended.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 04/11] vpci/msix: add teardown cleanup
  2018-07-17  9:48 ` [PATCH v2 04/11] vpci/msix: add teardown cleanup Roger Pau Monne
@ 2018-09-06 10:14   ` Wei Liu
  2018-09-26 15:32   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Wei Liu @ 2018-09-06 10:14 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich, xen-devel

On Tue, Jul 17, 2018 at 11:48:23AM +0200, Roger Pau Monne wrote:
> So that interrupts are properly freed.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 05/11] vpci/msi: add teardown cleanup
  2018-07-17  9:48 ` [PATCH v2 05/11] vpci/msi: " Roger Pau Monne
@ 2018-09-06 10:14   ` Wei Liu
  2018-09-26 15:34   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Wei Liu @ 2018-09-06 10:14 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich, xen-devel

On Tue, Jul 17, 2018 at 11:48:24AM +0200, Roger Pau Monne wrote:
> So that interrupts are properly freed.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 11/11] vpci/sriov: add support for SR-IOV capability
  2018-07-17  9:48 ` [PATCH v2 11/11] vpci/sriov: add support for SR-IOV capability Roger Pau Monne
@ 2018-09-06 10:37   ` Wei Liu
  2018-09-07  9:16     ` Roger Pau Monné
  2018-10-04 14:50   ` Jan Beulich
  1 sibling, 1 reply; 27+ messages in thread
From: Wei Liu @ 2018-09-06 10:37 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich, xen-devel

On Tue, Jul 17, 2018 at 11:48:30AM +0200, Roger Pau Monne wrote:
> +            curr->vpci.wait.callback = enable_callback;
> +            curr->vpci.wait.data = cb;
> +            curr->vpci.wait.end = get_cycles() + 100 * cpu_khz;

Does this guarantee to give you 100ms? What if the TSC is not constant
(invariant? forgot which name is which) ?

I think you might want to use NOW() and friends in time.c

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 11/11] vpci/sriov: add support for SR-IOV capability
  2018-09-06 10:37   ` Wei Liu
@ 2018-09-07  9:16     ` Roger Pau Monné
  0 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2018-09-07  9:16 UTC (permalink / raw)
  To: Wei Liu
  Cc: Stefano Stabellini, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Jan Beulich, xen-devel

On Thu, Sep 06, 2018 at 11:37:28AM +0100, Wei Liu wrote:
> On Tue, Jul 17, 2018 at 11:48:30AM +0200, Roger Pau Monne wrote:
> > +            curr->vpci.wait.callback = enable_callback;
> > +            curr->vpci.wait.data = cb;
> > +            curr->vpci.wait.end = get_cycles() + 100 * cpu_khz;
> 
> Does this guarantee to give you 100ms? What if the TSC is not constant
> (invariant? forgot which name is which) ?
> 
> I think you might want to use NOW() and friends in time.c

Right, I originally copied this from __udelay, but using NOW is indeed
better.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 01/11] vpci: move lock
  2018-07-17  9:48 ` [PATCH v2 01/11] vpci: move lock Roger Pau Monne
  2018-09-06  9:57   ` Wei Liu
@ 2018-09-26 10:37   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2018-09-26 10:37 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 17.07.18 at 11:48, <roger.pau@citrix.com> wrote:
> @@ -62,12 +70,15 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>      if ( !has_vpci(pdev->domain) )
>          return 0;
>  
> +    spin_lock(&pdev->vpci_lock);
>      pdev->vpci = xzalloc(struct vpci);

Patterns like this should be avoided where possible: I'd prefer if you
called xzalloc() before acquiring the lock, storing the result in a local
variable.

Also the locked region becomes pretty big this way.

> @@ -148,8 +160,6 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>      r->offset = offset;
>      r->private = data;
>  
> -    spin_lock(&vpci->lock);
> -
>      /* The list of handlers must be kept sorted at all times. */
>      list_for_each ( prev, &vpci->handlers )
>      {
> @@ -161,14 +171,12 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>              break;
>          if ( cmp == 0 )
>          {
> -            spin_unlock(&vpci->lock);
>              xfree(r);
>              return -EEXIST;
>          }
>      }
>  
>      list_add_tail(&r->node, prev);
> -    spin_unlock(&vpci->lock);
>  
>      return 0;
>  }

With the above in mind, this isn't very nice, because it puts an xmalloc()
invocation inside a locked region.

Can't you solve the race you mention in the description by simply latching
pdev->vpci into a local variable in vpci_remove_device(), and storing
NULL into the field with the lock still held? That way all the xfree()s
there could apparently move out of the locked region (if need be by
deferring all of this to an RCU callback).

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 02/11] vpci/msix: add lock to protect the list of MSIX regions
  2018-07-17  9:48 ` [PATCH v2 02/11] vpci/msix: add lock to protect the list of MSIX regions Roger Pau Monne
@ 2018-09-26 10:42   ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2018-09-26 10:42 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 17.07.18 at 11:48, <roger.pau@citrix.com> wrote:
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -148,10 +148,11 @@ static void control_write(const struct pci_dev *pdev, unsigned int reg,
>          pci_conf_write16(pdev->seg, pdev->bus, slot, func, reg, val);
>  }
>  
> -static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
> +static struct vpci_msix *msix_find(struct domain *d, unsigned long addr)
>  {
>      struct vpci_msix *msix;
>  
> +    read_lock(&d->arch.hvm_domain.msix_lock);
>      list_for_each_entry ( msix, &d->arch.hvm_domain.msix_tables, next )
>      {
>          const struct vpci_bar *bars = msix->pdev->vpci->header.bars;
> @@ -160,8 +161,12 @@ static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
>          for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>              if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
>                   VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
> +            {
> +                read_unlock(&d->arch.hvm_domain.msix_lock);
>                  return msix;
> +            }
>      }
> +    read_unlock(&d->arch.hvm_domain.msix_lock);
>  
>      return NULL;
>  }

Don't you rather need the caller to acquire the lock, so that the return
value is guaranteed non-stale by the time the caller looks at it?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/11] vpci: add tear down functions
  2018-07-17  9:48 ` [PATCH v2 03/11] vpci: add tear down functions Roger Pau Monne
  2018-09-06  9:57   ` Wei Liu
@ 2018-09-26 10:50   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2018-09-26 10:50 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 17.07.18 at 11:48, <roger.pau@citrix.com> wrote:
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -31,14 +31,26 @@ struct vpci_register {
>  };
>  
>  #ifdef __XEN__
> -extern vpci_register_init_t *const __start_vpci_array[];
> -extern vpci_register_init_t *const __end_vpci_array[];
> +extern const struct vpci_handler __start_vpci_array[];
> +extern const struct vpci_handler __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
>  static void vpci_remove_device_locked(struct pci_dev *pdev)
>  {
> +    unsigned int i;
> +
>      ASSERT(spin_is_locked(&pdev->vpci_lock));
>  
> +    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        vpci_teardown_t *teardown =
> +            __start_vpci_array[NUM_VPCI_INIT - i - 1].teardown;

Perhaps slightly easier to read if you made the loop count downwards?

>  #define VPCI_PRIORITY_HIGH      "1"
>  #define VPCI_PRIORITY_MIDDLE    "5"
>  #define VPCI_PRIORITY_LOW       "9"
>  
> -#define REGISTER_VPCI_INIT(x, p)                \
> -  static vpci_register_init_t *const x##_entry  \
> -               __used_section(".data.vpci." p) = x
> +#define REGISTER_VPCI_INIT(i, t, p)                                     \
> +  const static struct vpci_handler i ## t ## _entry                     \

Please flip static and const. Iirc we had a case in the past where some
tool (perhaps not a compiler but an analysis tool) choked about this
uncommon ordering.

With these taken care of
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 04/11] vpci/msix: add teardown cleanup
  2018-07-17  9:48 ` [PATCH v2 04/11] vpci/msix: add teardown cleanup Roger Pau Monne
  2018-09-06 10:14   ` Wei Liu
@ 2018-09-26 15:32   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2018-09-26 15:32 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 17.07.18 at 11:48, <roger.pau@citrix.com> wrote:
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -436,11 +436,6 @@ static int init_msix(struct pci_dev *pdev)
>          vpci_msix_arch_init_entry(&pdev->vpci->msix->entries[i]);
>      }
>  
> -    rc = vpci_add_register(pdev->vpci, control_read, control_write,
> -                           msix_control_reg(msix_offset), 2, pdev->vpci->msix);
> -    if ( rc )
> -        return rc;
> -
>      write_lock(&d->arch.hvm_domain.msix_lock);
>      if ( list_empty(&d->arch.hvm_domain.msix_tables) )
>          register_mmio_handler(d, &vpci_msix_table_ops);
> @@ -448,9 +443,57 @@ static int init_msix(struct pci_dev *pdev)
>      list_add(&pdev->vpci->msix->next, &d->arch.hvm_domain.msix_tables);
>      write_unlock(&d->arch.hvm_domain.msix_lock);
>  
> +    rc = vpci_add_register(pdev->vpci, control_read, control_write,
> +                           msix_control_reg(msix_offset), 2, pdev->vpci->msix);

Without the description saying why, I can't figure or guess why
this wants/needs moving.

> +    if ( rc )
> +        /* The teardown function will free the msix struct. */
> +        return rc;
> +
>      return 0;

This can be a single return statement now, without even a need
for going through rc.

> +static void teardown_msix(struct pci_dev *pdev)
> +{
> +    struct vpci_msix *msix = pdev->vpci->msix;
> +    unsigned int i, pos;
> +    uint16_t control;
> +
> +    if ( !msix )
> +        return;
> +
> +    write_lock(&pdev->domain->arch.hvm_domain.msix_lock);
> +    list_del(&pdev->vpci->msix->next);
> +    write_unlock(&pdev->domain->arch.hvm_domain.msix_lock);
> +
> +    if ( !msix->enabled )
> +        goto out;
> +
> +    /* Disable MSIX. */
> +    pos = pci_find_cap_offset(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                              PCI_FUNC(pdev->devfn), PCI_CAP_ID_MSIX);
> +    ASSERT(pos);
> +    control = pci_conf_read16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                              PCI_FUNC(pdev->devfn), msix_control_reg(pos));
> +    pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                     PCI_FUNC(pdev->devfn), msix_control_reg(pos),
> +                     (control & ~PCI_MSIX_FLAGS_ENABLE));

To avoid subsequent surprises, perhaps better also clear
PCI_MSIX_FLAGS_MASKALL?

> +    for ( i = 0; i < msix->max_entries; i++ )
> +    {
> +        int rc = vpci_msix_arch_disable_entry(&msix->entries[i], pdev);
> +
> +        if ( rc && rc != -ENOENT )
> +            gprintk(XENLOG_WARNING,
> +                    "%04x:%02x:%02x.%u: unable to disable MSIX entry %u: %d\n",
> +                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                    PCI_FUNC(pdev->devfn), i, rc);
> +    }
> +
> +out:

Labels indented by at least one blank please.

> +    xfree(msix);
> +    pdev->vpci->msix = NULL;

Perhaps better to zap the field before freeing the structure?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 05/11] vpci/msi: add teardown cleanup
  2018-07-17  9:48 ` [PATCH v2 05/11] vpci/msi: " Roger Pau Monne
  2018-09-06 10:14   ` Wei Liu
@ 2018-09-26 15:34   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2018-09-26 15:34 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 17.07.18 at 11:48, <roger.pau@citrix.com> wrote:
> So that interrupts are properly freed.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Same remarks here as for patch 4.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 06/11] vpci/header: add teardown cleanup
  2018-07-17  9:48 ` [PATCH v2 06/11] vpci/header: " Roger Pau Monne
@ 2018-09-28 15:29   ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2018-09-28 15:29 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 17.07.18 at 11:48, <roger.pau@citrix.com> wrote:
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -131,12 +131,15 @@ bool vpci_process_pending(struct vcpu *v)
>          if ( rc == -ERESTART )
>              return true;
>  
> -        spin_lock(&v->vpci.pdev->vpci_lock);
> -        if ( v->vpci.pdev->vpci )
> -            /* Disable memory decoding unconditionally on failure. */
> -            modify_decoding(v->vpci.pdev, !rc && v->vpci.map,
> -                            !rc && v->vpci.rom_only);
> -        spin_unlock(&v->vpci.pdev->vpci_lock);
> +        if ( v->vpci.pdev )
> +        {
> +            spin_lock(&v->vpci.pdev->vpci_lock);
> +            if ( v->vpci.pdev->vpci )
> +                /* Disable memory decoding unconditionally on failure. */
> +                modify_decoding(v->vpci.pdev, !rc && v->vpci.map,
> +                                !rc && v->vpci.rom_only);
> +            spin_unlock(&v->vpci.pdev->vpci_lock);
> +        }
>  
>          rangeset_destroy(v->vpci.mem);
>          v->vpci.mem = NULL;

A few lines down from here there is

            vpci_remove_device(v->vpci.pdev);

which I think has the same issue.

> @@ -560,7 +563,25 @@ static int init_bars(struct pci_dev *pdev)
>  
>      return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, true, false) : 0;
>  }
> -REGISTER_VPCI_INIT(init_bars, NULL, VPCI_PRIORITY_MIDDLE);
> +
> +static void teardown_bars(struct pci_dev *pdev)
> +{
> +    uint16_t cmd = pci_conf_read16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                                   PCI_FUNC(pdev->devfn), PCI_COMMAND);
> +
> +    if ( cmd & PCI_COMMAND_MEMORY )
> +    {
> +        /* Unmap all BARs from guest p2m. */
> +        modify_bars(pdev, false, false);
> +        /*
> +         * Since this operation is deferred at the point when it finishes the
> +         * device might have been removed, so don't attempt to disable memory
> +         * decoding afterwards.
> +         */
> +        current->vpci.pdev = NULL;

Did you not mean to prefix the comment with FIXME: ? Ultimately
device removal should be delayed until all cleanup has been done,
imo.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 08/11] vpci/header: allow multiple map operations
  2018-07-17  9:48 ` [PATCH v2 08/11] vpci/header: allow multiple map operations Roger Pau Monne
@ 2018-09-28 15:38   ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2018-09-28 15:38 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 17.07.18 at 11:48, <roger.pau@citrix.com> wrote:
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -184,7 +184,19 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>       * started for the same device if the domain is not well-behaved.
>       */
>      curr->vpci.pdev = pdev;
> -    curr->vpci.mem = mem;
> +    if ( !curr->vpci.mem )
> +        curr->vpci.mem = mem;
> +    else
> +    {
> +        int rc = rangeset_merge(curr->vpci.mem, mem);
> +
> +        if ( rc )
> +            gprintk(XENLOG_WARNING,
> +                    "%04x:%02x:%02x.%u: unable to %smap memory region: %d\n",
> +                    pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                    PCI_FUNC(pdev->devfn), map ? "" : "un", rc);
> +        rangeset_destroy(mem);
> +    }
>      curr->vpci.map = map;
>      curr->vpci.rom_only = rom_only;

Is it certain that all other arguments match (pdev, map, rom_only)?
If so, please add ASSERT()s to that effect, and perhaps also half a
sentence to the description as to why that is guaranteed.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 09/11] pci: add vpci hooks for device addition/removal
  2018-07-17  9:48 ` [PATCH v2 09/11] pci: add vpci hooks for device addition/removal Roger Pau Monne
@ 2018-09-28 15:48   ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2018-09-28 15:48 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 17.07.18 at 11:48, <roger.pau@citrix.com> wrote:
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -768,6 +768,13 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>              goto out;
>          }
>  
> +        ret = vpci_add_handlers(pdev);
> +        if ( ret )
> +        {
> +            pdev->domain = NULL;
> +            goto out;
> +        }
> +
>          list_add(&pdev->domain_list, &hardware_domain->arch.pdev_list);
>      }
>      else
> @@ -812,6 +819,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>          if ( pdev->bus == bus && pdev->devfn == devfn )
>          {
> +            vpci_remove_device(pdev);
>              ret = iommu_remove_device(pdev);
>              if ( pdev->domain )
>                  list_del(&pdev->domain_list);

These should have been here even without SR-IOV in mind, as we don't
guarantee to find all devices during our boot time scan. Does this have
any dependencies on the earlier patches in the series? If not, I think it
should go in as soon as it's ready, independent of the rest of this series.

However, error handling in the "add" case looks wrong:
iommu_add_device() either needs undoing, or your call should be done
earlier (and its effects undone in case iommu_add_device() fails).

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 11/11] vpci/sriov: add support for SR-IOV capability
  2018-07-17  9:48 ` [PATCH v2 11/11] vpci/sriov: add support for SR-IOV capability Roger Pau Monne
  2018-09-06 10:37   ` Wei Liu
@ 2018-10-04 14:50   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2018-10-04 14:50 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, xen-devel

>>> On 17.07.18 at 11:48, <roger.pau@citrix.com> wrote:
> The current code detects enabling of the virtual functions feature and
> automatically adds the VFs to the domain. It also detects enabling of
> memory space and maps the VFs BARs into the domain p2m. Disabling of
> the VF enable bit removes the devices and the BAR memory map from the
> domain p2m.

"The domain" is ambiguous here: DYM the domain owning the PF, or
the (perhaps various) one(s) owning the VFs? From the code I guess
the latter, but that doesn't really look correct.

> @@ -500,6 +506,16 @@ static int init_bars(struct pci_dev *pdev)
>      };
>      int rc;
>  
> +    if ( pdev->info.is_virtfn )
> +        /*
> +         * No need to set traps for the command register or the BAR registers
> +         * because those are not used by VFs. Memory decoding and position of
> +         * the VF BARs is controlled from the PF.

But iirc the BARs can be read and written (for sizing purposes at least),
so I think you still need to handle accesses.

> +         * TODO: add DomU support for VFs.

It's not clear to me whether this TODO is meant to cover the above.

> +static void modify_memory_mapping(const struct pci_dev *pdev, unsigned int pos,
> +                                  bool enable)
> +{
> +    const struct vpci_sriov *sriov = pdev->vpci->sriov;
> +    unsigned int i;
> +    int rc;
> +
> +    if ( enable )
> +    {
> +        struct pci_dev *pf_dev;
> +
> +        pcidevs_lock();
> +        /*
> +         * NB: a non-const pci_dev of the PF is needed in order to update
> +         * vf_rlen.
> +         */
> +        pf_dev = pci_get_pdev(pdev->seg, pdev->bus, pdev->devfn);
> +        pcidevs_unlock();
> +        ASSERT(pf_dev);

        ASSERT(pf_dev == pdev);

?

> +static void control_write(const struct pci_dev *pdev, unsigned int reg,
> +                          uint32_t val, void *data)
> +{
> +    struct vpci_sriov *sriov = data;
> +    unsigned int pos = reg - PCI_SRIOV_CTRL;
> +    uint16_t control = pci_conf_read16(pdev->seg, pdev->bus,
> +                                       PCI_SLOT(pdev->devfn),
> +                                       PCI_FUNC(pdev->devfn),
> +                                       pos + PCI_SRIOV_CTRL);
> +    bool enabled = control & PCI_SRIOV_CTRL_VFE;
> +    bool mem_enabled = control & PCI_SRIOV_CTRL_MSE;
> +    bool new_enabled = val & PCI_SRIOV_CTRL_VFE;
> +    bool new_mem_enabled = val & PCI_SRIOV_CTRL_MSE;
> +
> +    if ( new_enabled != enabled )
> +    {
> +        if ( new_enabled )
> +        {
> +            struct callback_data *cb = xmalloc(struct callback_data);
> +            struct vcpu *curr = current;
> +
> +            if ( !cb )
> +            {
> +                gprintk(XENLOG_WARNING, "%04x:%02x:%02x.%u: "
> +                        "unable to allocate memory for SR-IOV enable\n",
> +                        pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                        PCI_FUNC(pdev->devfn));
> +                return;
> +            }
> +
> +            /*
> +             * Only update the number of active VFs when enabling, when
> +             * disabling use the cached value in order to always remove the
> +             * same number of VFs that were active.
> +             */
> +            sriov->num_vfs = pci_conf_read16(pdev->seg, pdev->bus,
> +                                             PCI_SLOT(pdev->devfn),
> +                                             PCI_FUNC(pdev->devfn),
> +                                             pos + PCI_SRIOV_NUM_VF);
> +
> +            /*
> +             * NB: VFE needs to be enabled before calling pci_add_device so Xen
> +             * can access the config space of VFs.
> +             */
> +            pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                             PCI_FUNC(pdev->devfn), reg,
> +                             control | PCI_SRIOV_CTRL_VFE);
> +
> +            /*
> +             * The spec states that the software must wait at least 100ms
> +             * before attempting to access VF registers when enabling virtual
> +             * functions on the PF.
> +             */
> +            cb->pdev = pdev;
> +            cb->sriov = sriov;
> +            cb->pos = pos;
> +            cb->value = val;
> +            cb->new_enabled = new_enabled;
> +            cb->new_mem_enabled = new_mem_enabled;
> +            curr->vpci.task = WAIT;
> +            curr->vpci.wait.callback = enable_callback;
> +            curr->vpci.wait.data = cb;
> +            curr->vpci.wait.end = get_cycles() + 100 * cpu_khz;
> +            return;

Not sure whether to better ask the question here or for the previous
patch: Isn't this WAIT mechanism leading to the guest actually spinning?
We certainly don't want this for a 100ms period.

> +static void teardown_sriov(struct pci_dev *pdev)
> +{
> +    if ( pdev->vpci->sriov )
> +    {
> +        /* TODO: removing PFs is not currently supported. */
> +        ASSERT_UNREACHABLE();

Irrespective of the TODO I think it would be nice if this didn't lead
to a host crash.

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -459,10 +459,14 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>          return;
>      }
>  
> -    spin_lock(&pdev->vpci_lock);
> +    /*
> +     * NB: use the recursive variant here so that mapping an unmapping of the
> +     * VF vars works correctly and can recursively take the PF lock.
> +     */
> +    spin_lock_recursive(&pdev->vpci_lock);
>      if ( !pdev->vpci )
>      {
> -        spin_unlock(&pdev->vpci_lock);
> +        spin_unlock_recursive(&pdev->vpci_lock);

Wouldn't it make sense to split out the locking changes, or do the
conversion right in patch 1 (with a suitable explanation)?

> @@ -144,6 +143,11 @@ struct vpci {
>              struct vpci_arch_msix_entry arch;
>          } entries[];
>      } *msix;
> +
> +    struct vpci_sriov {
> +        uint16_t num_vfs;

Any particular reason for this to be uint16_t?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-10-04 14:50 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-17  9:48 [PATCH v2 00/11] vpci: add support for SR-IOV capability Roger Pau Monne
2018-07-17  9:48 ` [PATCH v2 01/11] vpci: move lock Roger Pau Monne
2018-09-06  9:57   ` Wei Liu
2018-09-26 10:37   ` Jan Beulich
2018-07-17  9:48 ` [PATCH v2 02/11] vpci/msix: add lock to protect the list of MSIX regions Roger Pau Monne
2018-09-26 10:42   ` Jan Beulich
2018-07-17  9:48 ` [PATCH v2 03/11] vpci: add tear down functions Roger Pau Monne
2018-09-06  9:57   ` Wei Liu
2018-09-26 10:50   ` Jan Beulich
2018-07-17  9:48 ` [PATCH v2 04/11] vpci/msix: add teardown cleanup Roger Pau Monne
2018-09-06 10:14   ` Wei Liu
2018-09-26 15:32   ` Jan Beulich
2018-07-17  9:48 ` [PATCH v2 05/11] vpci/msi: " Roger Pau Monne
2018-09-06 10:14   ` Wei Liu
2018-09-26 15:34   ` Jan Beulich
2018-07-17  9:48 ` [PATCH v2 06/11] vpci/header: " Roger Pau Monne
2018-09-28 15:29   ` Jan Beulich
2018-07-17  9:48 ` [PATCH v2 07/11] rangeset: introduce rangeset_merge Roger Pau Monne
2018-07-17  9:48 ` [PATCH v2 08/11] vpci/header: allow multiple map operations Roger Pau Monne
2018-09-28 15:38   ` Jan Beulich
2018-07-17  9:48 ` [PATCH v2 09/11] pci: add vpci hooks for device addition/removal Roger Pau Monne
2018-09-28 15:48   ` Jan Beulich
2018-07-17  9:48 ` [PATCH v2 10/11] vpci: add a wait operation to the vpci vcpu pending actions Roger Pau Monne
2018-07-17  9:48 ` [PATCH v2 11/11] vpci/sriov: add support for SR-IOV capability Roger Pau Monne
2018-09-06 10:37   ` Wei Liu
2018-09-07  9:16     ` Roger Pau Monné
2018-10-04 14:50   ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).