All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleksandr Tyshchenko <olekstysh@gmail.com>
To: xen-devel@lists.xenproject.org
Cc: "Oleksandr Andrushchenko" <oleksandr_andrushchenko@epam.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: [PATCH V7 05/11] vpci/header: handle p2m range sets per BAR
Date: Tue, 19 Jul 2022 20:42:47 +0300	[thread overview]
Message-ID: <20220719174253.541965-6-olekstysh@gmail.com> (raw)
In-Reply-To: <20220719174253.541965-1-olekstysh@gmail.com>

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Instead of handling a single range set, that contains all the memory
regions of all the BARs and ROM, have them per BAR.
As the range sets are now created when a PCI device is added and destroyed
when it is removed so make them named and accounted.

Note that rangesets were chosen here despite there being only up to
3 separate ranges in each set (typically just 1). But rangeset per BAR
was chosen for the ease of implementation and existing code re-usability.

This is in preparation of making non-identity mappings in p2m for the MMIOs.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v6:
- update according to the new locking scheme
- remove odd fail label in modify_bars
- OT: rebased
Since v5:
- fix comments
- move rangeset allocation to init_bars and only allocate
  for MAPPABLE BARs
- check for overlap with the already setup BAR ranges
Since v4:
- use named range sets for BARs (Jan)
- changes required by the new locking scheme
- updated commit message (Jan)
Since v3:
- re-work vpci_cancel_pending accordingly to the per-BAR handling
- s/num_mem_ranges/map_pending and s/uint8_t/bool
- ASSERT(bar->mem) in modify_bars
- create and destroy the rangesets on add/remove
---
 xen/drivers/vpci/header.c | 241 +++++++++++++++++++++++++++-----------
 xen/drivers/vpci/vpci.c   |   5 +
 xen/include/xen/vpci.h    |   3 +-
 3 files changed, 182 insertions(+), 67 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 9fbbdc3500..f14ff11882 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -131,64 +131,106 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 
 bool vpci_process_pending(struct vcpu *v)
 {
-    if ( v->vpci.mem )
+    struct pci_dev *pdev = v->vpci.pdev;
+
+    if ( !pdev )
+        return false;
+
+    pcidevs_read_lock();
+
+    if ( v->vpci.map_pending )
     {
         struct map_data data = {
             .d = v->domain,
             .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
         };
-        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
-
-        if ( rc == -ERESTART )
-            return true;
-
-        pcidevs_read_lock();
-        spin_lock(&v->vpci.pdev->vpci->lock);
-        /* Disable memory decoding unconditionally on failure. */
-        modify_decoding(v->vpci.pdev,
-                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-                        !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci->lock);
-        pcidevs_read_unlock();
-
-        rangeset_destroy(v->vpci.mem);
-        v->vpci.mem = NULL;
-        if ( rc )
+        struct vpci_header *header = &pdev->vpci->header;
+        unsigned int i;
+
+        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
         {
-            /*
-             * FIXME: in case of failure remove the device from the domain.
-             * Note that there might still be leftover mappings. While this is
-             * safe for Dom0, for DomUs the domain will likely need to be
-             * killed in order to avoid leaking stale p2m mappings on
-             * failure.
-             */
-            pcidevs_write_lock();
-            vpci_remove_device(v->vpci.pdev);
-            pcidevs_write_unlock();
+            struct vpci_bar *bar = &header->bars[i];
+            int rc;
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
+
+            if ( rc == -ERESTART )
+            {
+                pcidevs_read_unlock();
+                return true;
+            }
+
+            spin_lock(&pdev->vpci->lock);
+            /* Disable memory decoding unconditionally on failure. */
+            modify_decoding(pdev, rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY :
+                                       v->vpci.cmd, !rc && v->vpci.rom_only);
+            spin_unlock(&pdev->vpci->lock);
+
+            if ( rc )
+            {
+                /*
+                 * FIXME: in case of failure remove the device from the domain.
+                 * Note that there might still be leftover mappings. While this
+                 * is safe for Dom0, for DomUs the domain needs to be killed in
+                 * order to avoid leaking stale p2m mappings on failure.
+                 */
+                v->vpci.map_pending = false;
+                pcidevs_read_unlock();
+
+                if ( is_hardware_domain(v->domain) )
+                {
+                    pcidevs_write_lock();
+                    vpci_remove_device(v->vpci.pdev);
+                    pcidevs_write_unlock();
+                }
+                else
+                    domain_crash(v->domain);
+
+                return false;
+            }
         }
+
+        v->vpci.map_pending = false;
     }
 
+    pcidevs_read_unlock();
+
     return false;
 }
 
 static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
-                            struct rangeset *mem, uint16_t cmd)
+                            uint16_t cmd)
 {
     struct map_data data = { .d = d, .map = true };
-    int rc;
+    struct vpci_header *header = &pdev->vpci->header;
+    int rc = 0;
+    unsigned int i;
 
-    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
+    ASSERT(pcidevs_write_locked());
+
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
-        /*
-         * It's safe to drop and re-acquire the lock in this context
-         * without risking pdev disappearing because devices cannot be
-         * removed until the initial domain has been started.
-         */
-        pcidevs_write_unlock();
-        process_pending_softirqs();
-        pcidevs_write_lock();
+        struct vpci_bar *bar = &header->bars[i];
+
+        if ( rangeset_is_empty(bar->mem) )
+            continue;
+
+        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
+                                              &data)) == -ERESTART )
+        {
+            /*
+             * It's safe to drop and reacquire the lock in this context
+             * without risking pdev disappearing because devices cannot be
+             * removed until the initial domain has been started.
+             */
+            pcidevs_write_unlock();
+            process_pending_softirqs();
+            pcidevs_write_lock();
+        }
     }
-    rangeset_destroy(mem);
     if ( !rc )
         modify_decoding(pdev, cmd, false);
 
@@ -196,7 +238,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
 }
 
 static void defer_map(struct domain *d, struct pci_dev *pdev,
-                      struct rangeset *mem, uint16_t cmd, bool rom_only)
+                      uint16_t cmd, bool rom_only)
 {
     struct vcpu *curr = current;
 
@@ -207,7 +249,7 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
      * started for the same device if the domain is not well-behaved.
      */
     curr->vpci.pdev = pdev;
-    curr->vpci.mem = mem;
+    curr->vpci.map_pending = true;
     curr->vpci.cmd = cmd;
     curr->vpci.rom_only = rom_only;
     /*
@@ -221,43 +263,61 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
 static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
 {
     struct vpci_header *header = &pdev->vpci->header;
-    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
     struct pci_dev *tmp, *dev = NULL;
     const struct vpci_msix *msix = pdev->vpci->msix;
-    unsigned int i;
+    unsigned int i, j;
     int rc;
-
-    if ( !mem )
-        return -ENOMEM;
+    bool map_pending;
 
     /*
-     * Create a rangeset that represents the current device BARs memory region
-     * and compare it against all the currently active BAR memory regions. If
-     * an overlap is found, subtract it from the region to be mapped/unmapped.
+     * Create a rangeset per BAR that represents the current device memory
+     * region and compare it against all the currently active BAR memory
+     * regions. If an overlap is found, subtract it from the region to be
+     * mapped/unmapped.
      *
-     * First fill the rangeset with all the BARs of this device or with the ROM
+     * First fill the rangesets with the BARs of this device or with the ROM
      * BAR only, depending on whether the guest is toggling the memory decode
      * bit of the command register, or the enable bit of the ROM BAR register.
      */
     for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
-        const struct vpci_bar *bar = &header->bars[i];
+        struct vpci_bar *bar = &header->bars[i];
         unsigned long start = PFN_DOWN(bar->addr);
         unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
+        if ( !bar->mem )
+            continue;
+
         if ( !MAPPABLE_BAR(bar) ||
              (rom_only ? bar->type != VPCI_BAR_ROM
                        : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
             continue;
 
-        rc = rangeset_add_range(mem, start, end);
+        rc = rangeset_add_range(bar->mem, start, end);
         if ( rc )
         {
             printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
                    start, end, rc);
-            rangeset_destroy(mem);
             return rc;
         }
+
+        /* Check for overlap with the already setup BAR ranges. */
+        for ( j = 0; j < i; j++ )
+        {
+            struct vpci_bar *bar = &header->bars[j];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, start, end);
+            if ( rc )
+            {
+                printk(XENLOG_G_WARNING
+                       "Failed to remove overlapping range [%lx, %lx]: %d\n",
+                       start, end, rc);
+                return rc;
+            }
+        }
     }
 
     /* Remove any MSIX regions if present. */
@@ -267,14 +327,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
         unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
                                      vmsix_table_size(pdev->vpci, i) - 1);
 
-        rc = rangeset_remove_range(mem, start, end);
-        if ( rc )
+        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
         {
-            printk(XENLOG_G_WARNING
-                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
-                   start, end, rc);
-            rangeset_destroy(mem);
-            return rc;
+            const struct vpci_bar *bar = &header->bars[j];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, start, end);
+            if ( rc )
+            {
+                printk(XENLOG_G_WARNING
+                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
+                       start, end, rc);
+                return rc;
+            }
         }
     }
 
@@ -306,7 +373,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
             unsigned long start = PFN_DOWN(bar->addr);
             unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
-            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
+            if ( !bar->enabled ||
+                 !rangeset_overlaps_range(bar->mem, start, end) ||
                  /*
                   * If only the ROM enable bit is toggled check against other
                   * BARs in the same device for overlaps, but not against the
@@ -315,12 +383,11 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
                  (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
                 continue;
 
-            rc = rangeset_remove_range(mem, start, end);
+            rc = rangeset_remove_range(bar->mem, start, end);
             if ( rc )
             {
                 printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                        start, end, rc);
-                rangeset_destroy(mem);
                 return rc;
             }
         }
@@ -339,10 +406,23 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
          * will always be to establish mappings and process all the BARs.
          */
         ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
-        return apply_map(pdev->domain, pdev, mem, cmd);
+        return apply_map(pdev->domain, pdev, cmd);
     }
 
-    defer_map(dev->domain, dev, mem, cmd, rom_only);
+    /* Find out how many memory ranges has left after MSI and overlaps. */
+    map_pending = false;
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        if ( !rangeset_is_empty(header->bars[i].mem) )
+        {
+            map_pending = true;
+            break;
+        }
+
+    /* If there's no mapping work write the command register now. */
+    if ( !map_pending )
+        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+    else
+        defer_map(dev->domain, dev, cmd, rom_only);
 
     return 0;
 }
@@ -525,6 +605,19 @@ static void cf_check rom_write(
         rom->addr = val & PCI_ROM_ADDRESS_MASK;
 }
 
+static int bar_add_rangeset(struct pci_dev *pdev, struct vpci_bar *bar, int i)
+{
+    char str[32];
+
+    snprintf(str, sizeof(str), "%pp:BAR%d", &pdev->sbdf, i);
+
+    bar->mem = rangeset_new(pdev->domain, str, RANGESETF_no_print);
+    if ( !bar->mem )
+        return -ENOMEM;
+
+    return 0;
+}
+
 static int cf_check init_bars(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -611,6 +704,13 @@ static int cf_check init_bars(struct pci_dev *pdev)
         else
             bars[i].type = VPCI_BAR_MEM32;
 
+        rc = bar_add_rangeset(pdev, &bars[i], i);
+        if ( rc )
+        {
+            bars[i].type = VPCI_BAR_EMPTY;
+            goto fail;
+        }
+
         rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
                               (i == num_bars - 1) ? PCI_BAR_LAST : 0);
         if ( rc < 0 )
@@ -661,6 +761,15 @@ static int cf_check init_bars(struct pci_dev *pdev)
                                    rom_reg, 4, rom);
             if ( rc )
                 rom->type = VPCI_BAR_EMPTY;
+            else
+            {
+                rc = bar_add_rangeset(pdev, rom, i);
+                if ( rc )
+                {
+                    rom->type = VPCI_BAR_EMPTY;
+                    goto fail;
+                }
+            }
         }
     }
     else
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index d187901422..f683346285 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -38,6 +38,8 @@ extern vpci_register_init_t *const __end_vpci_array[];
 
 void vpci_remove_device(struct pci_dev *pdev)
 {
+    unsigned int i;
+
     ASSERT(pcidevs_write_locked());
 
     if ( !has_vpci(pdev->domain) || !pdev->vpci )
@@ -54,6 +56,9 @@ void vpci_remove_device(struct pci_dev *pdev)
         xfree(r);
     }
     spin_unlock(&pdev->vpci->lock);
+
+    for ( i = 0; i < ARRAY_SIZE(pdev->vpci->header.bars); i++ )
+        rangeset_destroy(pdev->vpci->header.bars[i].mem);
     if ( pdev->vpci->msix && pdev->vpci->msix->pba )
         iounmap(pdev->vpci->msix->pba);
     xfree(pdev->vpci->msix);
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 6e1d3b93cd..6332659c4a 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -72,6 +72,7 @@ struct vpci {
             /* Guest view of the BAR: address and lower bits. */
             uint64_t guest_reg;
             uint64_t size;
+            struct rangeset *mem;
             enum {
                 VPCI_BAR_EMPTY,
                 VPCI_BAR_IO,
@@ -146,9 +147,9 @@ struct vpci {
 
 struct vpci_vcpu {
     /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
-    struct rangeset *mem;
     struct pci_dev *pdev;
     uint16_t cmd;
+    bool map_pending : 1;
     bool rom_only : 1;
 };
 
-- 
2.25.1



  parent reply	other threads:[~2022-07-19 17:43 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
2022-07-19 17:42 ` [PATCH V7 01/11] xen/pci: arm: add stub for is_memory_hole Oleksandr Tyshchenko
2022-07-29 16:28   ` Oleksandr
2022-08-03  9:29     ` Rahul Singh
2022-08-03 14:18       ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 02/11] vpci: add hooks for PCI device assign/de-assign Oleksandr Tyshchenko
2022-07-27 10:03   ` Jan Beulich
2022-07-27 14:01     ` Oleksandr
2022-07-27 14:35       ` Jan Beulich
2022-07-27 16:49         ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 03/11] vpci/header: implement guest BAR register handlers Oleksandr Tyshchenko
2022-07-27 10:15   ` Jan Beulich
2022-07-27 16:17     ` Oleksandr
2022-07-28  7:01       ` Jan Beulich
2022-07-28 14:56         ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 04/11] rangeset: add RANGESETF_no_print flag Oleksandr Tyshchenko
2022-07-26 14:48   ` Rahul Singh
2022-07-19 17:42 ` Oleksandr Tyshchenko [this message]
2022-07-19 17:42 ` [PATCH V7 06/11] vpci/header: program p2m with guest BAR view Oleksandr Tyshchenko
2022-07-27 10:19   ` Jan Beulich
2022-07-27 17:06     ` Oleksandr
2022-07-28  7:04       ` Jan Beulich
2022-07-19 17:42 ` [PATCH V7 07/11] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Tyshchenko
2022-07-26 15:30   ` Jan Beulich
2022-07-27 17:30     ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 08/11] vpci/header: reset the command register when adding devices Oleksandr Tyshchenko
2022-07-26 15:09   ` Rahul Singh
2022-07-26 15:23   ` Jan Beulich
2022-07-27  8:58     ` Oleksandr
2022-07-27  9:46       ` Jan Beulich
2022-07-27 16:53         ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 09/11] vpci: add initial support for virtual PCI bus topology Oleksandr Tyshchenko
2022-07-27 10:32   ` Jan Beulich
2022-07-28 14:16     ` Oleksandr
2022-07-28 14:26       ` Jan Beulich
2022-07-28 14:41         ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests Oleksandr Tyshchenko
2022-07-26 15:16   ` Jan Beulich
2022-07-27 17:54     ` Oleksandr
2022-07-27 19:39       ` Oleksandr
2022-07-28  7:15         ` Jan Beulich
2022-07-28 16:35           ` Oleksandr
2022-07-29  6:06             ` Jan Beulich
2022-07-29 16:26               ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 11/11] xen/arm: account IO handlers for emulated PCI MSI-X Oleksandr Tyshchenko
2022-07-26 14:50   ` Rahul Singh
2022-07-26 13:47 ` [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Rahul Singh
2022-07-26 15:18   ` Oleksandr Tyshchenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220719174253.541965-6-olekstysh@gmail.com \
    --to=olekstysh@gmail.com \
    --cc=oleksandr_andrushchenko@epam.com \
    --cc=roger.pau@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.