All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling
@ 2022-10-20  9:46 Roger Pau Monne
  2022-10-20  9:46 ` [PATCH for-4.17 1/6] test/vpci: add dummy cfcheck define Roger Pau Monne
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Roger Pau Monne @ 2022-10-20  9:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Wei Liu, Anthony PERARD, Jan Beulich, Paul Durrant

Hello,

First two patches fix some build isses that showed up on the vpci test
harness, following patches attempt to fix the regressions caused by
75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') and
the last patch relaxes the check done when attempting to write to BARs
with memory decoding enabled.

I consider all of them bug fixes, albeit the last patch is not fixing a
regression (since vPCI code has always behaved this way).

Thanks, Roger.

Roger Pau Monne (6):
  test/vpci: add dummy cfcheck define
  test/vpci: fix vPCI test harness to provide pci_get_pdev()
  vpci: don't assume that vpci per-device data exists unconditionally
  vpci: introduce a local vpci_bar variable to modify_decoding()
  pci: do not disable memory decoding for devices
  vpci: refuse BAR writes only if the BAR is mapped

 tools/tests/vpci/emul.h       |  3 +-
 xen/drivers/passthrough/pci.c | 69 -----------------------------------
 xen/drivers/vpci/header.c     | 38 ++++++++++++++-----
 xen/drivers/vpci/vpci.c       |  6 +--
 4 files changed, 34 insertions(+), 82 deletions(-)

-- 
2.37.3



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH for-4.17 1/6] test/vpci: add dummy cfcheck define
  2022-10-20  9:46 [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Roger Pau Monne
@ 2022-10-20  9:46 ` Roger Pau Monne
  2022-10-20  9:57   ` Andrew Cooper
  2022-10-20 13:20   ` Anthony PERARD
  2022-10-20  9:46 ` [PATCH for-4.17 2/6] test/vpci: fix vPCI test harness to provide pci_get_pdev() Roger Pau Monne
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 24+ messages in thread
From: Roger Pau Monne @ 2022-10-20  9:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Wei Liu, Anthony PERARD

Some vpci functions got the cfcheck attribute added, but that's not
defined in the user-space test harness, so add a dummy define in order
for the harness to build.

Fixes: 4ed7d5525f ('xen/vpci: CFI hardening')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 tools/tests/vpci/emul.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
index 2e1d3057c9..386b15eb86 100644
--- a/tools/tests/vpci/emul.h
+++ b/tools/tests/vpci/emul.h
@@ -37,6 +37,7 @@
 #define prefetch(x) __builtin_prefetch(x)
 #define ASSERT(x) assert(x)
 #define __must_check __attribute__((__warn_unused_result__))
+#define cf_check
 
 #include "list.h"
 
-- 
2.37.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-4.17 2/6] test/vpci: fix vPCI test harness to provide pci_get_pdev()
  2022-10-20  9:46 [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Roger Pau Monne
  2022-10-20  9:46 ` [PATCH for-4.17 1/6] test/vpci: add dummy cfcheck define Roger Pau Monne
@ 2022-10-20  9:46 ` Roger Pau Monne
  2022-10-20 13:21   ` Anthony PERARD
  2022-10-20  9:46 ` [PATCH for-4.17 3/6] vpci: don't assume that vpci per-device data exists unconditionally Roger Pau Monne
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2022-10-20  9:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Wei Liu, Anthony PERARD

Instead of pci_get_pdev_by_domain(), which is no longer present in the
hypervisor.

While there add parentheses around the define value.

Fixes: a37f9ea7a6 ('PCI: fold pci_get_pdev{,_by_domain}()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 tools/tests/vpci/emul.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
index 386b15eb86..f03e3a56d1 100644
--- a/tools/tests/vpci/emul.h
+++ b/tools/tests/vpci/emul.h
@@ -92,7 +92,7 @@ typedef union {
 #define xmalloc(type) ((type *)malloc(sizeof(type)))
 #define xfree(p) free(p)
 
-#define pci_get_pdev_by_domain(...) &test_pdev
+#define pci_get_pdev(...) (&test_pdev)
 #define pci_get_ro_map(...) NULL
 
 #define test_bit(...) false
-- 
2.37.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-4.17 3/6] vpci: don't assume that vpci per-device data exists unconditionally
  2022-10-20  9:46 [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Roger Pau Monne
  2022-10-20  9:46 ` [PATCH for-4.17 1/6] test/vpci: add dummy cfcheck define Roger Pau Monne
  2022-10-20  9:46 ` [PATCH for-4.17 2/6] test/vpci: fix vPCI test harness to provide pci_get_pdev() Roger Pau Monne
@ 2022-10-20  9:46 ` Roger Pau Monne
  2022-10-24 11:04   ` Jan Beulich
  2022-10-20  9:46 ` [PATCH for-4.17 4/6] vpci: introduce a local vpci_bar variable to modify_decoding() Roger Pau Monne
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2022-10-20  9:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne

It's possible for a device to be assigned to a domain but have no
vpci structure if vpci_process_pending() failed and called
vpci_remove_device() as a result.  The unconditional accesses done by
vpci_{read,write}() and vpci_remove_device() to pdev->vpci would
then trigger a NULL pointer dereference.

Add checks for pdev->vpci presence in the affected functions.

Fixes: 9c244fdef7 ('vpci: add header handlers')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/vpci/vpci.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 3467c0de86..647f7af679 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -37,7 +37,7 @@ extern vpci_register_init_t *const __end_vpci_array[];
 
 void vpci_remove_device(struct pci_dev *pdev)
 {
-    if ( !has_vpci(pdev->domain) )
+    if ( !has_vpci(pdev->domain) || !pdev->vpci )
         return;
 
     spin_lock(&pdev->vpci->lock);
@@ -326,7 +326,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
 
     /* Find the PCI dev matching the address. */
     pdev = pci_get_pdev(d, sbdf);
-    if ( !pdev )
+    if ( !pdev || !pdev->vpci )
         return vpci_read_hw(sbdf, reg, size);
 
     spin_lock(&pdev->vpci->lock);
@@ -436,7 +436,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
      * Passthrough everything that's not trapped.
      */
     pdev = pci_get_pdev(d, sbdf);
-    if ( !pdev )
+    if ( !pdev || !pdev->vpci )
     {
         vpci_write_hw(sbdf, reg, size, data);
         return;
-- 
2.37.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-4.17 4/6] vpci: introduce a local vpci_bar variable to modify_decoding()
  2022-10-20  9:46 [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Roger Pau Monne
                   ` (2 preceding siblings ...)
  2022-10-20  9:46 ` [PATCH for-4.17 3/6] vpci: don't assume that vpci per-device data exists unconditionally Roger Pau Monne
@ 2022-10-20  9:46 ` Roger Pau Monne
  2022-10-24 11:05   ` Jan Beulich
  2022-10-20  9:46 ` [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices Roger Pau Monne
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2022-10-20  9:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne

This is done to shorten line length in the function in preparation for
adding further usages of the vpci_bar data structure.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/vpci/header.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index a1c928a0d2..eb9219a49a 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -103,24 +103,26 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 
     for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
-        if ( !MAPPABLE_BAR(&header->bars[i]) )
+        struct vpci_bar *bar = &header->bars[i];
+
+        if ( !MAPPABLE_BAR(bar) )
             continue;
 
-        if ( rom_only && header->bars[i].type == VPCI_BAR_ROM )
+        if ( rom_only && bar->type == VPCI_BAR_ROM )
         {
             unsigned int rom_pos = (i == PCI_HEADER_NORMAL_NR_BARS)
                                    ? PCI_ROM_ADDRESS : PCI_ROM_ADDRESS1;
-            uint32_t val = header->bars[i].addr |
+            uint32_t val = bar->addr |
                            (map ? PCI_ROM_ADDRESS_ENABLE : 0);
 
-            header->bars[i].enabled = header->rom_enabled = map;
+            bar->enabled = header->rom_enabled = map;
             pci_conf_write32(pdev->sbdf, rom_pos, val);
             return;
         }
 
         if ( !rom_only &&
-             (header->bars[i].type != VPCI_BAR_ROM || header->rom_enabled) )
-            header->bars[i].enabled = map;
+             (bar->type != VPCI_BAR_ROM || header->rom_enabled) )
+            bar->enabled = map;
     }
 
     if ( !rom_only )
-- 
2.37.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices
  2022-10-20  9:46 [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Roger Pau Monne
                   ` (3 preceding siblings ...)
  2022-10-20  9:46 ` [PATCH for-4.17 4/6] vpci: introduce a local vpci_bar variable to modify_decoding() Roger Pau Monne
@ 2022-10-20  9:46 ` Roger Pau Monne
  2022-10-24 11:19   ` Jan Beulich
  2022-10-20  9:46 ` [PATCH for-4.17 6/6] vpci: refuse BAR writes only if the BAR is mapped Roger Pau Monne
  2022-10-20 10:12 ` [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Henry Wang
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2022-10-20  9:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Paul Durrant

Commit 75cc460a1b added checks to ensure the position of the BARs from
PCI devices don't overlap with regions defined on the memory map.
When there's a collision memory decoding is left disabled for the
device, assuming that dom0 will reposition the BAR if necessary and
enable memory decoding.

While this would be the case for devices being used by dom0, devices
being used by the firmware itself that have no driver would usually be
left with memory decoding disabled by dom0 if that's the state dom0
found them in, and thus firmware trying to make use of them will not
function correctly.

The initial intent of 75cc460a1b was to prevent vPCI from creating
MMIO mappings on the dom0 p2m over regions that would otherwise
already have mappings established.  It's my view now that we likely
went too far with 75cc460a1b, and Xen disabling memory decoding of
devices (as buggy as they might be) is harmful, and reduces the set of
hardware on which Xen works.

This commits reverts most of 75cc460a1b, and instead adds checks to
vPCI in order to prevent misplaced BARs from being added to the
hardware domain p2m.  Signaling on whether BARs are mapped is tracked
in the vpci structure, so that misplaced BARs are not mapped, and thus
Xen won't attempt to unmap them when memory decoding is disabled.

This restores the behavior of Xen for PV dom0 to the state it was
previous to 75cc460a1b, while also introducing a more contained fix
for the vPCI BAR mapping issues.

Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
AT Citrix we have a system with a device with the following BARs:

BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
BAR [0, 0x1fff] -> not positioned, outside host bridge window

And memory decoding enabled by the firmware.  With the current code
(or any of the previous fix proposals), Xen would still disable memory
decoding for the device, and the system will freeze when attempting to
set EFI vars.

I'm afraid the best solution to avoid regressions caused by 75cc460a1b
is the proposal here.
---
 xen/drivers/passthrough/pci.c | 69 -----------------------------------
 xen/drivers/vpci/header.c     | 22 ++++++++++-
 2 files changed, 20 insertions(+), 71 deletions(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 149f68bb6e..b42acb8d7c 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -233,9 +233,6 @@ static void check_pdev(const struct pci_dev *pdev)
      PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | \
      PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)
     u16 val;
-    unsigned int nbars = 0, rom_pos = 0, i;
-    static const char warn[] = XENLOG_WARNING
-        "%pp disabled: %sBAR [%#lx, %#lx] overlaps with memory map\n";
 
     if ( command_mask )
     {
@@ -254,8 +251,6 @@ static void check_pdev(const struct pci_dev *pdev)
     switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
     {
     case PCI_HEADER_TYPE_BRIDGE:
-        nbars = PCI_HEADER_BRIDGE_NR_BARS;
-        rom_pos = PCI_ROM_ADDRESS1;
         if ( !bridge_ctl_mask )
             break;
         val = pci_conf_read16(pdev->sbdf, PCI_BRIDGE_CONTROL);
@@ -272,75 +267,11 @@ static void check_pdev(const struct pci_dev *pdev)
         }
         break;
 
-    case PCI_HEADER_TYPE_NORMAL:
-        nbars = PCI_HEADER_NORMAL_NR_BARS;
-        rom_pos = PCI_ROM_ADDRESS;
-        break;
-
     case PCI_HEADER_TYPE_CARDBUS:
         /* TODO */
         break;
     }
 #undef PCI_STATUS_CHECK
-
-    /* Check if BARs overlap with other memory regions. */
-    val = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
-    if ( !(val & PCI_COMMAND_MEMORY) || pdev->ignore_bars )
-        return;
-
-    pci_conf_write16(pdev->sbdf, PCI_COMMAND, val & ~PCI_COMMAND_MEMORY);
-    for ( i = 0; i < nbars; )
-    {
-        uint64_t addr, size;
-        unsigned int reg = PCI_BASE_ADDRESS_0 + i * 4;
-        int rc = 1;
-
-        if ( (pci_conf_read32(pdev->sbdf, reg) & PCI_BASE_ADDRESS_SPACE) !=
-             PCI_BASE_ADDRESS_SPACE_MEMORY )
-            goto next;
-
-        rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
-                              (i == nbars - 1) ? PCI_BAR_LAST : 0);
-        if ( rc < 0 )
-            /* Unable to size, better leave memory decoding disabled. */
-            return;
-        if ( size && !pci_check_bar(pdev, maddr_to_mfn(addr),
-                                    maddr_to_mfn(addr + size - 1)) )
-        {
-            /*
-             * Return without enabling memory decoding if BAR position is not
-             * in IO suitable memory. Let the hardware domain re-position the
-             * BAR.
-             */
-            printk(warn,
-                   &pdev->sbdf, "", PFN_DOWN(addr), PFN_DOWN(addr + size - 1));
-            return;
-        }
-
- next:
-        ASSERT(rc > 0);
-        i += rc;
-    }
-
-    if ( rom_pos &&
-         (pci_conf_read32(pdev->sbdf, rom_pos) & PCI_ROM_ADDRESS_ENABLE) )
-    {
-        uint64_t addr, size;
-        int rc = pci_size_mem_bar(pdev->sbdf, rom_pos, &addr, &size,
-                                  PCI_BAR_ROM);
-
-        if ( rc < 0 )
-            return;
-        if ( size && !pci_check_bar(pdev, maddr_to_mfn(addr),
-                                    maddr_to_mfn(addr + size - 1)) )
-        {
-            printk(warn, &pdev->sbdf, "ROM ", PFN_DOWN(addr),
-                   PFN_DOWN(addr + size - 1));
-            return;
-        }
-    }
-
-    pci_conf_write16(pdev->sbdf, PCI_COMMAND, val);
 }
 
 static void apply_quirks(struct pci_dev *pdev)
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index eb9219a49a..4d7f8f4a30 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -121,7 +121,9 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
         }
 
         if ( !rom_only &&
-             (bar->type != VPCI_BAR_ROM || header->rom_enabled) )
+             (bar->type != VPCI_BAR_ROM || header->rom_enabled) &&
+             pci_check_bar(pdev, _mfn(PFN_DOWN(bar->addr)),
+                           _mfn(PFN_DOWN(bar->addr + bar->size - 1))) )
             bar->enabled = map;
     }
 
@@ -234,9 +236,25 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
 
         if ( !MAPPABLE_BAR(bar) ||
              (rom_only ? bar->type != VPCI_BAR_ROM
-                       : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
+                       : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) ||
+             /* Skip BARs already in the requested state. */
+             bar->enabled == !!(cmd & PCI_COMMAND_MEMORY) )
             continue;
 
+        /*
+         * Only do BAR position checks for the hardware domain, for guests it's
+         * assumed that the hardware domain has changed the position of any
+         * problematic BARs.
+         */
+        if ( is_hardware_domain(pdev->domain) &&
+             !pci_check_bar(pdev, _mfn(start), _mfn(end)) )
+        {
+            printk(XENLOG_G_WARNING
+                   "%pp: not mapping BAR [%lx, %lx] invalid position\n",
+                   &pdev->sbdf, start, end);
+            continue;
+        }
+
         rc = rangeset_add_range(mem, start, end);
         if ( rc )
         {
-- 
2.37.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH for-4.17 6/6] vpci: refuse BAR writes only if the BAR is mapped
  2022-10-20  9:46 [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Roger Pau Monne
                   ` (4 preceding siblings ...)
  2022-10-20  9:46 ` [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices Roger Pau Monne
@ 2022-10-20  9:46 ` Roger Pau Monne
  2022-10-24 11:51   ` Jan Beulich
  2022-10-20 10:12 ` [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Henry Wang
  6 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2022-10-20  9:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne

Writes to the BARs are ignored if memory decoding is enabled for the
device, and the same happen with ROM BARs if the write is an attempt
to change the position of the BAR without disabling it first.

The reason of ignoring such writes is a limitation in Xen, as it would
need to unmap the BAR, change the address, and remap the BAR at the
new position, which the current logic doesn't support.

Some devices however seem to have the memory decoding bit hardcoded to
enabled, and attempts to disable it don't get reflected on the
command register.

This causes issues for well behaved guests that disable memory
decoding and then try to size the BARs, as vPCI will think memory
decoding is still enabled and ignore the write.

Since vPCI doesn't explicitly care about whether the memory decoding
bit is disabled as long as the BAR is not mapped in the guest p2m use
the information in the vpci_bar to check whether the BAR is mapped,
and refuse writes only based on that information.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/vpci/header.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 4d7f8f4a30..4b39737b76 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -388,7 +388,7 @@ static void cf_check bar_write(
     else
         val &= PCI_BASE_ADDRESS_MEM_MASK;
 
-    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
+    if ( bar->enabled )
     {
         /* If the value written is the current one avoid printing a warning. */
         if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
@@ -425,7 +425,7 @@ static void cf_check rom_write(
     uint16_t cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
     bool new_enabled = val & PCI_ROM_ADDRESS_ENABLE;
 
-    if ( (cmd & PCI_COMMAND_MEMORY) && header->rom_enabled && new_enabled )
+    if ( rom->enabled && new_enabled )
     {
         gprintk(XENLOG_WARNING,
                 "%pp: ignored ROM BAR write with memory decoding enabled\n",
-- 
2.37.3



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 1/6] test/vpci: add dummy cfcheck define
  2022-10-20  9:46 ` [PATCH for-4.17 1/6] test/vpci: add dummy cfcheck define Roger Pau Monne
@ 2022-10-20  9:57   ` Andrew Cooper
  2022-10-20 13:20   ` Anthony PERARD
  1 sibling, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2022-10-20  9:57 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Wei Liu, Anthony Perard

On 20/10/2022 10:46, Roger Pau Monne wrote:
> Some vpci functions got the cfcheck attribute added, but that's not
> defined in the user-space test harness, so add a dummy define in order
> for the harness to build.
>
> Fixes: 4ed7d5525f ('xen/vpci: CFI hardening')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

However, I think there wants to be another patch in this series wiring
up the unit test by default, so we don't keep breaking this...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling
  2022-10-20  9:46 [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Roger Pau Monne
                   ` (5 preceding siblings ...)
  2022-10-20  9:46 ` [PATCH for-4.17 6/6] vpci: refuse BAR writes only if the BAR is mapped Roger Pau Monne
@ 2022-10-20 10:12 ` Henry Wang
  6 siblings, 0 replies; 24+ messages in thread
From: Henry Wang @ 2022-10-20 10:12 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: Wei Liu, Anthony PERARD, Jan Beulich, Paul Durrant

Hi Roger, 

> -----Original Message-----
> Subject: [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding
> handling
> 
> Hello,
> 
> First two patches fix some build isses that showed up on the vpci test
> harness, following patches attempt to fix the regressions caused by
> 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') and
> the last patch relaxes the check done when attempting to write to BARs
> with memory decoding enabled.
> 
> I consider all of them bug fixes, albeit the last patch is not fixing a
> regression (since vPCI code has always behaved this way).

As I don't really want to spam the list, I will provide my release-ack in
the cover letter.

This series is a bugfix series, so:

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry


> 
> Thanks, Roger.
> 
> Roger Pau Monne (6):
>   test/vpci: add dummy cfcheck define
>   test/vpci: fix vPCI test harness to provide pci_get_pdev()
>   vpci: don't assume that vpci per-device data exists unconditionally
>   vpci: introduce a local vpci_bar variable to modify_decoding()
>   pci: do not disable memory decoding for devices
>   vpci: refuse BAR writes only if the BAR is mapped
> 
>  tools/tests/vpci/emul.h       |  3 +-
>  xen/drivers/passthrough/pci.c | 69 -----------------------------------
>  xen/drivers/vpci/header.c     | 38 ++++++++++++++-----
>  xen/drivers/vpci/vpci.c       |  6 +--
>  4 files changed, 34 insertions(+), 82 deletions(-)
> 
> --
> 2.37.3
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 1/6] test/vpci: add dummy cfcheck define
  2022-10-20  9:46 ` [PATCH for-4.17 1/6] test/vpci: add dummy cfcheck define Roger Pau Monne
  2022-10-20  9:57   ` Andrew Cooper
@ 2022-10-20 13:20   ` Anthony PERARD
  1 sibling, 0 replies; 24+ messages in thread
From: Anthony PERARD @ 2022-10-20 13:20 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu

On Thu, Oct 20, 2022 at 11:46:44AM +0200, Roger Pau Monne wrote:
> Some vpci functions got the cfcheck attribute added, but that's not
> defined in the user-space test harness, so add a dummy define in order
> for the harness to build.
> 
> Fixes: 4ed7d5525f ('xen/vpci: CFI hardening')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 2/6] test/vpci: fix vPCI test harness to provide pci_get_pdev()
  2022-10-20  9:46 ` [PATCH for-4.17 2/6] test/vpci: fix vPCI test harness to provide pci_get_pdev() Roger Pau Monne
@ 2022-10-20 13:21   ` Anthony PERARD
  0 siblings, 0 replies; 24+ messages in thread
From: Anthony PERARD @ 2022-10-20 13:21 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu

On Thu, Oct 20, 2022 at 11:46:45AM +0200, Roger Pau Monne wrote:
> Instead of pci_get_pdev_by_domain(), which is no longer present in the
> hypervisor.
> 
> While there add parentheses around the define value.
> 
> Fixes: a37f9ea7a6 ('PCI: fold pci_get_pdev{,_by_domain}()')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 3/6] vpci: don't assume that vpci per-device data exists unconditionally
  2022-10-20  9:46 ` [PATCH for-4.17 3/6] vpci: don't assume that vpci per-device data exists unconditionally Roger Pau Monne
@ 2022-10-24 11:04   ` Jan Beulich
  2022-10-24 16:01     ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-10-24 11:04 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel

On 20.10.2022 11:46, Roger Pau Monne wrote:
> It's possible for a device to be assigned to a domain but have no
> vpci structure if vpci_process_pending() failed and called
> vpci_remove_device() as a result.  The unconditional accesses done by
> vpci_{read,write}() and vpci_remove_device() to pdev->vpci would
> then trigger a NULL pointer dereference.
> 
> Add checks for pdev->vpci presence in the affected functions.
> 
> Fixes: 9c244fdef7 ('vpci: add header handlers')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

I wonder though whether these changes are enough. Is
vpci_process_pending() immune to a pdev losing its ->vpci?

Furthermore msix_find() iterates over d->arch.hvm.msix_tables, which
looks to only ever be added to. Doesn't this list need pruning by
vpci_remove_device()? I've noticed this only because of looking at
derefs of ->vpci in msix.c - I don't think I can easily see that all
of those derefs are once again immune to a pdev losing its ->vpci.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 4/6] vpci: introduce a local vpci_bar variable to modify_decoding()
  2022-10-20  9:46 ` [PATCH for-4.17 4/6] vpci: introduce a local vpci_bar variable to modify_decoding() Roger Pau Monne
@ 2022-10-24 11:05   ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2022-10-24 11:05 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel

On 20.10.2022 11:46, Roger Pau Monne wrote:
> This is done to shorten line length in the function in preparation for
> adding further usages of the vpci_bar data structure.
> 
> No functional change.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices
  2022-10-20  9:46 ` [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices Roger Pau Monne
@ 2022-10-24 11:19   ` Jan Beulich
  2022-10-24 12:45     ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-10-24 11:19 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Paul Durrant

On 20.10.2022 11:46, Roger Pau Monne wrote:
> Commit 75cc460a1b added checks to ensure the position of the BARs from
> PCI devices don't overlap with regions defined on the memory map.
> When there's a collision memory decoding is left disabled for the
> device, assuming that dom0 will reposition the BAR if necessary and
> enable memory decoding.
> 
> While this would be the case for devices being used by dom0, devices
> being used by the firmware itself that have no driver would usually be
> left with memory decoding disabled by dom0 if that's the state dom0
> found them in, and thus firmware trying to make use of them will not
> function correctly.
> 
> The initial intent of 75cc460a1b was to prevent vPCI from creating
> MMIO mappings on the dom0 p2m over regions that would otherwise
> already have mappings established.  It's my view now that we likely
> went too far with 75cc460a1b, and Xen disabling memory decoding of
> devices (as buggy as they might be) is harmful, and reduces the set of
> hardware on which Xen works.
> 
> This commits reverts most of 75cc460a1b, and instead adds checks to
> vPCI in order to prevent misplaced BARs from being added to the
> hardware domain p2m.

Which makes me wonder: How do things work then? Dom0 then still can't
access the BAR address range, can it? Plus with this adjustment, is
...

>  Signaling on whether BARs are mapped is tracked
> in the vpci structure, so that misplaced BARs are not mapped, and thus
> Xen won't attempt to unmap them when memory decoding is disabled.
> 
> This restores the behavior of Xen for PV dom0 to the state it was
> previous to 75cc460a1b, while also introducing a more contained fix
> for the vPCI BAR mapping issues.

... this (in particular "restores the behavior") a valid description
of this change?

> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> AT Citrix we have a system with a device with the following BARs:
> 
> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
> BAR [0, 0x1fff] -> not positioned, outside host bridge window
> 
> And memory decoding enabled by the firmware.  With the current code
> (or any of the previous fix proposals), Xen would still disable memory
> decoding for the device, and the system will freeze when attempting to
> set EFI vars.

Isn't the latter (BAR at address 0) yet another problem? I have to admit
that I'm uncertain in how far it is a good idea to try to make Xen look
to work on such a system ...

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 6/6] vpci: refuse BAR writes only if the BAR is mapped
  2022-10-20  9:46 ` [PATCH for-4.17 6/6] vpci: refuse BAR writes only if the BAR is mapped Roger Pau Monne
@ 2022-10-24 11:51   ` Jan Beulich
  2022-10-24 15:04     ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-10-24 11:51 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel

On 20.10.2022 11:46, Roger Pau Monne wrote:
> Writes to the BARs are ignored if memory decoding is enabled for the
> device, and the same happen with ROM BARs if the write is an attempt
> to change the position of the BAR without disabling it first.
> 
> The reason of ignoring such writes is a limitation in Xen, as it would
> need to unmap the BAR, change the address, and remap the BAR at the
> new position, which the current logic doesn't support.
> 
> Some devices however seem to have the memory decoding bit hardcoded to
> enabled, and attempts to disable it don't get reflected on the
> command register.

This isn't compliant with the spec, is it? It looks to contradict both
"When a 0 is written to this register, the device is logically
disconnected from the PCI bus for all accesses except configuration
accesses" and "Devices typically power up with all 0's in this
register, but Section 6.6 explains some exceptions" (quoting from the
old 3.0 spec, which I have readily to hand). The referenced section
then says "Such devices are required to support the Command register
disabling function described in Section 6.2.2".

How does any arbitrary OS go about sizing the BARs of such a device?

> This causes issues for well behaved guests that disable memory
> decoding and then try to size the BARs, as vPCI will think memory
> decoding is still enabled and ignore the write.
> 
> Since vPCI doesn't explicitly care about whether the memory decoding
> bit is disabled as long as the BAR is not mapped in the guest p2m use
> the information in the vpci_bar to check whether the BAR is mapped,
> and refuse writes only based on that information.

From purely a vPCI pov this looks to be a plausible solution (or
should I better say workaround). I guess the two pieces of code that
you alter would benefit from a comment as to it being intentional to
_not_ check the command register (anymore).

> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -388,7 +388,7 @@ static void cf_check bar_write(
>      else
>          val &= PCI_BASE_ADDRESS_MEM_MASK;
>  
> -    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> +    if ( bar->enabled )
>      {
>          /* If the value written is the current one avoid printing a warning. */
>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> @@ -425,7 +425,7 @@ static void cf_check rom_write(
>      uint16_t cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>      bool new_enabled = val & PCI_ROM_ADDRESS_ENABLE;
>  
> -    if ( (cmd & PCI_COMMAND_MEMORY) && header->rom_enabled && new_enabled )
> +    if ( rom->enabled && new_enabled )
>      {
>          gprintk(XENLOG_WARNING,
>                  "%pp: ignored ROM BAR write with memory decoding enabled\n",

The log message wording then wants adjustment, I guess?

What about

    if ( !(cmd & PCI_COMMAND_MEMORY) || header->rom_enabled == new_enabled )

a few lines down from here? Besides still using the command register
value here not looking very consistent, wouldn't header->rom_enabled
here an in the intermediate if() also better be converted to
rom->enabled for consistency?

Then again - is you also dropping the check of header->rom_enabled
actually correct? While both are written to the same value by
modify_decoding(), both rom_write() and init_bars() can bring the
two booleans out of sync afaics.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices
  2022-10-24 11:19   ` Jan Beulich
@ 2022-10-24 12:45     ` Roger Pau Monné
  2022-10-24 13:59       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2022-10-24 12:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Paul Durrant

On Mon, Oct 24, 2022 at 01:19:22PM +0200, Jan Beulich wrote:
> On 20.10.2022 11:46, Roger Pau Monne wrote:
> > Commit 75cc460a1b added checks to ensure the position of the BARs from
> > PCI devices don't overlap with regions defined on the memory map.
> > When there's a collision memory decoding is left disabled for the
> > device, assuming that dom0 will reposition the BAR if necessary and
> > enable memory decoding.
> > 
> > While this would be the case for devices being used by dom0, devices
> > being used by the firmware itself that have no driver would usually be
> > left with memory decoding disabled by dom0 if that's the state dom0
> > found them in, and thus firmware trying to make use of them will not
> > function correctly.
> > 
> > The initial intent of 75cc460a1b was to prevent vPCI from creating
> > MMIO mappings on the dom0 p2m over regions that would otherwise
> > already have mappings established.  It's my view now that we likely
> > went too far with 75cc460a1b, and Xen disabling memory decoding of
> > devices (as buggy as they might be) is harmful, and reduces the set of
> > hardware on which Xen works.
> > 
> > This commits reverts most of 75cc460a1b, and instead adds checks to
> > vPCI in order to prevent misplaced BARs from being added to the
> > hardware domain p2m.
> 
> Which makes me wonder: How do things work then? Dom0 then still can't
> access the BAR address range, can it?

It does allow access on some situations where the previous arrangement
didn't work because it wholesale disabled memory decoding for the
device.

So if it's only one BAR that's misplaced the rest will still get added
to the dom0 p2m and be accessible, because memory decoding won't be
turned off for the device.

> Plus with this adjustment, is
> ...
> 
> >  Signaling on whether BARs are mapped is tracked
> > in the vpci structure, so that misplaced BARs are not mapped, and thus
> > Xen won't attempt to unmap them when memory decoding is disabled.
> > 
> > This restores the behavior of Xen for PV dom0 to the state it was
> > previous to 75cc460a1b, while also introducing a more contained fix
> > for the vPCI BAR mapping issues.
> 
> ... this (in particular "restores the behavior") a valid description
> of this change?

Yes, it restores the previous behavior for PV dom0, as memory decoding
is no longer turned off for any devices regardless of where the BARs
are positioned.

> > Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > AT Citrix we have a system with a device with the following BARs:
> > 
> > BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
> > BAR [0, 0x1fff] -> not positioned, outside host bridge window
> > 
> > And memory decoding enabled by the firmware.  With the current code
> > (or any of the previous fix proposals), Xen would still disable memory
> > decoding for the device, and the system will freeze when attempting to
> > set EFI vars.
> 
> Isn't the latter (BAR at address 0) yet another problem?

It's a BAR that hasn't been positioned by the firmware AFAICT.  Which
is a bug in the firmware but shouldn't prevent Xen from booting.

In the above system address 0 is outside of the PCI host bridge
window, so even if we mapped the BAR and memory decoding for the
device was enabled accessing such BAR wouldn't work.

> I have to admit
> that I'm uncertain in how far it is a good idea to try to make Xen look
> to work on such a system ...

PV dom0 works on a system like the above prior to c/s 75cc460a1b, so I
would consider 75cc460a1b to be a regression for PV dom0 setups.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices
  2022-10-24 12:45     ` Roger Pau Monné
@ 2022-10-24 13:59       ` Jan Beulich
  2022-10-24 15:45         ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-10-24 13:59 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Paul Durrant

On 24.10.2022 14:45, Roger Pau Monné wrote:
> On Mon, Oct 24, 2022 at 01:19:22PM +0200, Jan Beulich wrote:
>> On 20.10.2022 11:46, Roger Pau Monne wrote:
>>> Commit 75cc460a1b added checks to ensure the position of the BARs from
>>> PCI devices don't overlap with regions defined on the memory map.
>>> When there's a collision memory decoding is left disabled for the
>>> device, assuming that dom0 will reposition the BAR if necessary and
>>> enable memory decoding.
>>>
>>> While this would be the case for devices being used by dom0, devices
>>> being used by the firmware itself that have no driver would usually be
>>> left with memory decoding disabled by dom0 if that's the state dom0
>>> found them in, and thus firmware trying to make use of them will not
>>> function correctly.
>>>
>>> The initial intent of 75cc460a1b was to prevent vPCI from creating
>>> MMIO mappings on the dom0 p2m over regions that would otherwise
>>> already have mappings established.  It's my view now that we likely
>>> went too far with 75cc460a1b, and Xen disabling memory decoding of
>>> devices (as buggy as they might be) is harmful, and reduces the set of
>>> hardware on which Xen works.
>>>
>>> This commits reverts most of 75cc460a1b, and instead adds checks to
>>> vPCI in order to prevent misplaced BARs from being added to the
>>> hardware domain p2m.
>>
>> Which makes me wonder: How do things work then? Dom0 then still can't
>> access the BAR address range, can it?
> 
> It does allow access on some situations where the previous arrangement
> didn't work because it wholesale disabled memory decoding for the
> device.
> 
> So if it's only one BAR that's misplaced the rest will still get added
> to the dom0 p2m and be accessible, because memory decoding won't be
> turned off for the device.

Right - without a per-BAR disable there can only be all or nothing. In
the end if things work with this adjustment, the problem BAR cannot
really be in use aiui. I wonder what you would propose we do if on
another system such a BAR is actually in use.

>> Plus with this adjustment, is
>> ...
>>
>>>  Signaling on whether BARs are mapped is tracked
>>> in the vpci structure, so that misplaced BARs are not mapped, and thus
>>> Xen won't attempt to unmap them when memory decoding is disabled.
>>>
>>> This restores the behavior of Xen for PV dom0 to the state it was
>>> previous to 75cc460a1b, while also introducing a more contained fix
>>> for the vPCI BAR mapping issues.
>>
>> ... this (in particular "restores the behavior") a valid description
>> of this change?
> 
> Yes, it restores the previous behavior for PV dom0, as memory decoding
> is no longer turned off for any devices regardless of where the BARs
> are positioned.

It restores one aspect of behavior but then puts in place another
restriction.

>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> AT Citrix we have a system with a device with the following BARs:
>>>
>>> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
>>> BAR [0, 0x1fff] -> not positioned, outside host bridge window
>>>
>>> And memory decoding enabled by the firmware.  With the current code
>>> (or any of the previous fix proposals), Xen would still disable memory
>>> decoding for the device, and the system will freeze when attempting to
>>> set EFI vars.
>>
>> Isn't the latter (BAR at address 0) yet another problem?
> 
> It's a BAR that hasn't been positioned by the firmware AFAICT.  Which
> is a bug in the firmware but shouldn't prevent Xen from booting.
> 
> In the above system address 0 is outside of the PCI host bridge
> window, so even if we mapped the BAR and memory decoding for the
> device was enabled accessing such BAR wouldn't work.

It's mere luck I would say that in this case the BAR is outside the
bridge's window. What if this was a device integrated in the root
complex?

>> I have to admit
>> that I'm uncertain in how far it is a good idea to try to make Xen look
>> to work on such a system ...
> 
> PV dom0 works on a system like the above prior to c/s 75cc460a1b, so I
> would consider 75cc460a1b to be a regression for PV dom0 setups.

Agreed, in a way it is a regression. In another way it is deliberate
behavior to not accept bogus configurations. The difficulty is to
find a reasonable balance between allowing Xen to work in such cases
and guarding Xen from suffering follow-on issues resulting from such
misconfiguration. After all if this system later was impacted by the
bad BAR(s), connecting the misbehavior to the root cause might end
up quite a bit more difficult.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 6/6] vpci: refuse BAR writes only if the BAR is mapped
  2022-10-24 11:51   ` Jan Beulich
@ 2022-10-24 15:04     ` Roger Pau Monné
  2022-10-24 16:03       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2022-10-24 15:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Mon, Oct 24, 2022 at 01:51:03PM +0200, Jan Beulich wrote:
> On 20.10.2022 11:46, Roger Pau Monne wrote:
> > Writes to the BARs are ignored if memory decoding is enabled for the
> > device, and the same happen with ROM BARs if the write is an attempt
> > to change the position of the BAR without disabling it first.
> > 
> > The reason of ignoring such writes is a limitation in Xen, as it would
> > need to unmap the BAR, change the address, and remap the BAR at the
> > new position, which the current logic doesn't support.
> > 
> > Some devices however seem to have the memory decoding bit hardcoded to
> > enabled, and attempts to disable it don't get reflected on the
> > command register.
> 
> This isn't compliant with the spec, is it? It looks to contradict both
> "When a 0 is written to this register, the device is logically
> disconnected from the PCI bus for all accesses except configuration
> accesses" and "Devices typically power up with all 0's in this
> register, but Section 6.6 explains some exceptions" (quoting from the
> old 3.0 spec, which I have readily to hand). The referenced section
> then says "Such devices are required to support the Command register
> disabling function described in Section 6.2.2".

It's unclear to me whether setting the bit to 0 is plain ignored, or
just fails to be reflected on the command register.

> How does any arbitrary OS go about sizing the BARs of such a device?

AFAICT from Linux behavior, an OS would just set to 0 the memory
decoding command register bit and write the value, but there's no
check afterwards that the returned value from a read of the register
still has memory decoding disabled.   Xen does exactly the same:
attempt to toggle the bit but don't check the value written.

> > This causes issues for well behaved guests that disable memory
> > decoding and then try to size the BARs, as vPCI will think memory
> > decoding is still enabled and ignore the write.
> > 
> > Since vPCI doesn't explicitly care about whether the memory decoding
> > bit is disabled as long as the BAR is not mapped in the guest p2m use
> > the information in the vpci_bar to check whether the BAR is mapped,
> > and refuse writes only based on that information.
> 
> From purely a vPCI pov this looks to be a plausible solution (or
> should I better say workaround). I guess the two pieces of code that
> you alter would benefit from a comment as to it being intentional to
> _not_ check the command register (anymore).

Ack.

> > --- a/xen/drivers/vpci/header.c
> > +++ b/xen/drivers/vpci/header.c
> > @@ -388,7 +388,7 @@ static void cf_check bar_write(
> >      else
> >          val &= PCI_BASE_ADDRESS_MEM_MASK;
> >  
> > -    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
> > +    if ( bar->enabled )
> >      {
> >          /* If the value written is the current one avoid printing a warning. */
> >          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
> > @@ -425,7 +425,7 @@ static void cf_check rom_write(
> >      uint16_t cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> >      bool new_enabled = val & PCI_ROM_ADDRESS_ENABLE;
> >  
> > -    if ( (cmd & PCI_COMMAND_MEMORY) && header->rom_enabled && new_enabled )
> > +    if ( rom->enabled && new_enabled )
> >      {
> >          gprintk(XENLOG_WARNING,
> >                  "%pp: ignored ROM BAR write with memory decoding enabled\n",
> 
> The log message wording then wants adjustment, I guess?

Hm, I think the message is fine for the purposes here, vPCI is indeed
ignoring a write with memory decoding enabled, or else rom->enabled
would be false.

Or are you arguing that the message is already wrong in the current
context and should instead be:

"ignored ROM BAR write with memory decoding and ROM enabled"

> What about
> 
>     if ( !(cmd & PCI_COMMAND_MEMORY) || header->rom_enabled == new_enabled )
> 
> a few lines down from here? Besides still using the command register
> value here not looking very consistent, wouldn't header->rom_enabled
> here an in the intermediate if() also better be converted to
> rom->enabled for consistency?
> 
> Then again - is you also dropping the check of header->rom_enabled
> actually correct?

rom->enabled should only be set when both the ROM is enabled and
memory decoding is also enabled for the device.

> While both are written to the same value by
> modify_decoding(), both rom_write() and init_bars() can bring the
> two booleans out of sync afaics.

Right, bar->enabled is not a clone of header->rom_enabled, as the
later only caches the ROM BAR enable bit, while the former caches both
header->rom_enabled and whether memory decoding is also enabled (and
the BAR is mapped).

The usage of command register value in the checks below is indeed
dubious, as the purpose of the change is tono trust the value of
the memory decoding bit in the command register.

I think the only way is to cache the Xen intended value of the memory
decoding command register bit for it's usage in rom_write().

Thanks, Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices
  2022-10-24 13:59       ` Jan Beulich
@ 2022-10-24 15:45         ` Roger Pau Monné
  2022-10-24 15:56           ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2022-10-24 15:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Paul Durrant

On Mon, Oct 24, 2022 at 03:59:18PM +0200, Jan Beulich wrote:
> On 24.10.2022 14:45, Roger Pau Monné wrote:
> > On Mon, Oct 24, 2022 at 01:19:22PM +0200, Jan Beulich wrote:
> >> On 20.10.2022 11:46, Roger Pau Monne wrote:
> >>> Commit 75cc460a1b added checks to ensure the position of the BARs from
> >>> PCI devices don't overlap with regions defined on the memory map.
> >>> When there's a collision memory decoding is left disabled for the
> >>> device, assuming that dom0 will reposition the BAR if necessary and
> >>> enable memory decoding.
> >>>
> >>> While this would be the case for devices being used by dom0, devices
> >>> being used by the firmware itself that have no driver would usually be
> >>> left with memory decoding disabled by dom0 if that's the state dom0
> >>> found them in, and thus firmware trying to make use of them will not
> >>> function correctly.
> >>>
> >>> The initial intent of 75cc460a1b was to prevent vPCI from creating
> >>> MMIO mappings on the dom0 p2m over regions that would otherwise
> >>> already have mappings established.  It's my view now that we likely
> >>> went too far with 75cc460a1b, and Xen disabling memory decoding of
> >>> devices (as buggy as they might be) is harmful, and reduces the set of
> >>> hardware on which Xen works.
> >>>
> >>> This commits reverts most of 75cc460a1b, and instead adds checks to
> >>> vPCI in order to prevent misplaced BARs from being added to the
> >>> hardware domain p2m.
> >>
> >> Which makes me wonder: How do things work then? Dom0 then still can't
> >> access the BAR address range, can it?
> > 
> > It does allow access on some situations where the previous arrangement
> > didn't work because it wholesale disabled memory decoding for the
> > device.
> > 
> > So if it's only one BAR that's misplaced the rest will still get added
> > to the dom0 p2m and be accessible, because memory decoding won't be
> > turned off for the device.
> 
> Right - without a per-BAR disable there can only be all or nothing. In
> the end if things work with this adjustment, the problem BAR cannot
> really be in use aiui. I wonder what you would propose we do if on
> another system such a BAR is actually in use.

dom0 would have to change the position of the BAR to a suitable place
and then use it.  Linux dom0 does already reposition bogus BARs of
devices.

> >> Plus with this adjustment, is
> >> ...
> >>
> >>>  Signaling on whether BARs are mapped is tracked
> >>> in the vpci structure, so that misplaced BARs are not mapped, and thus
> >>> Xen won't attempt to unmap them when memory decoding is disabled.
> >>>
> >>> This restores the behavior of Xen for PV dom0 to the state it was
> >>> previous to 75cc460a1b, while also introducing a more contained fix
> >>> for the vPCI BAR mapping issues.
> >>
> >> ... this (in particular "restores the behavior") a valid description
> >> of this change?
> > 
> > Yes, it restores the previous behavior for PV dom0, as memory decoding
> > is no longer turned off for any devices regardless of where the BARs
> > are positioned.
> 
> It restores one aspect of behavior but then puts in place another
> restriction.

I assume the other restriction is about moving the check to vPCI code
rather than disabling memory decoding?

It restores the behavior for PV dom0, and keeps a more 'contained' fix
for PVH dom0.

> 
> >>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>> ---
> >>> AT Citrix we have a system with a device with the following BARs:
> >>>
> >>> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
> >>> BAR [0, 0x1fff] -> not positioned, outside host bridge window
> >>>
> >>> And memory decoding enabled by the firmware.  With the current code
> >>> (or any of the previous fix proposals), Xen would still disable memory
> >>> decoding for the device, and the system will freeze when attempting to
> >>> set EFI vars.
> >>
> >> Isn't the latter (BAR at address 0) yet another problem?
> > 
> > It's a BAR that hasn't been positioned by the firmware AFAICT.  Which
> > is a bug in the firmware but shouldn't prevent Xen from booting.
> > 
> > In the above system address 0 is outside of the PCI host bridge
> > window, so even if we mapped the BAR and memory decoding for the
> > device was enabled accessing such BAR wouldn't work.
> 
> It's mere luck I would say that in this case the BAR is outside the
> bridge's window. What if this was a device integrated in the root
> complex?

I would expect dom0 to reposition the BAR, but doesn't a root complex
also have a set of windows in decodes accesses from (as listed in ACPI
_CRS method for the device), and hence still need BARs to be
positioned at certain ranges in order to be accessible?

> >> I have to admit
> >> that I'm uncertain in how far it is a good idea to try to make Xen look
> >> to work on such a system ...
> > 
> > PV dom0 works on a system like the above prior to c/s 75cc460a1b, so I
> > would consider 75cc460a1b to be a regression for PV dom0 setups.
> 
> Agreed, in a way it is a regression. In another way it is deliberate
> behavior to not accept bogus configurations. The difficulty is to
> find a reasonable balance between allowing Xen to work in such cases
> and guarding Xen from suffering follow-on issues resulting from such
> misconfiguration. After all if this system later was impacted by the
> bad BAR(s), connecting the misbehavior to the root cause might end
> up quite a bit more difficult.

IMO we should strive to boot (almost?) everywhere Linux (or your
chosen dom0 OS) also boots, since that's what users expect.

I would assume if the system was impacted by the bad BARs, it would
also affect the dom0 OS when booting natively on such platform.

What we do right now with memory decoding already leads to a very
difficult to diagnose issue, as on the above example calling an UEFI
runtime method completely freezes the box (no debug keys, no watchdog
worked).

So I think leaving the system PCI devices as-is and letting dom0 deal
with the conflicts is likely a better option than playing with the
memory decoding bits.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices
  2022-10-24 15:45         ` Roger Pau Monné
@ 2022-10-24 15:56           ` Jan Beulich
  2022-10-24 16:24             ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2022-10-24 15:56 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Paul Durrant

On 24.10.2022 17:45, Roger Pau Monné wrote:
> On Mon, Oct 24, 2022 at 03:59:18PM +0200, Jan Beulich wrote:
>> On 24.10.2022 14:45, Roger Pau Monné wrote:
>>> On Mon, Oct 24, 2022 at 01:19:22PM +0200, Jan Beulich wrote:
>>>> On 20.10.2022 11:46, Roger Pau Monne wrote:
>>>>> Commit 75cc460a1b added checks to ensure the position of the BARs from
>>>>> PCI devices don't overlap with regions defined on the memory map.
>>>>> When there's a collision memory decoding is left disabled for the
>>>>> device, assuming that dom0 will reposition the BAR if necessary and
>>>>> enable memory decoding.
>>>>>
>>>>> While this would be the case for devices being used by dom0, devices
>>>>> being used by the firmware itself that have no driver would usually be
>>>>> left with memory decoding disabled by dom0 if that's the state dom0
>>>>> found them in, and thus firmware trying to make use of them will not
>>>>> function correctly.
>>>>>
>>>>> The initial intent of 75cc460a1b was to prevent vPCI from creating
>>>>> MMIO mappings on the dom0 p2m over regions that would otherwise
>>>>> already have mappings established.  It's my view now that we likely
>>>>> went too far with 75cc460a1b, and Xen disabling memory decoding of
>>>>> devices (as buggy as they might be) is harmful, and reduces the set of
>>>>> hardware on which Xen works.
>>>>>
>>>>> This commits reverts most of 75cc460a1b, and instead adds checks to
>>>>> vPCI in order to prevent misplaced BARs from being added to the
>>>>> hardware domain p2m.
>>>>
>>>> Which makes me wonder: How do things work then? Dom0 then still can't
>>>> access the BAR address range, can it?
>>>
>>> It does allow access on some situations where the previous arrangement
>>> didn't work because it wholesale disabled memory decoding for the
>>> device.
>>>
>>> So if it's only one BAR that's misplaced the rest will still get added
>>> to the dom0 p2m and be accessible, because memory decoding won't be
>>> turned off for the device.
>>
>> Right - without a per-BAR disable there can only be all or nothing. In
>> the end if things work with this adjustment, the problem BAR cannot
>> really be in use aiui. I wonder what you would propose we do if on
>> another system such a BAR is actually in use.
> 
> dom0 would have to change the position of the BAR to a suitable place
> and then use it.  Linux dom0 does already reposition bogus BARs of
> devices.

Yet that still can't realistically work if the firmware expects the
BAR at the address recorded in the EFI memory map entry.

>>>> Plus with this adjustment, is
>>>> ...
>>>>
>>>>>  Signaling on whether BARs are mapped is tracked
>>>>> in the vpci structure, so that misplaced BARs are not mapped, and thus
>>>>> Xen won't attempt to unmap them when memory decoding is disabled.
>>>>>
>>>>> This restores the behavior of Xen for PV dom0 to the state it was
>>>>> previous to 75cc460a1b, while also introducing a more contained fix
>>>>> for the vPCI BAR mapping issues.
>>>>
>>>> ... this (in particular "restores the behavior") a valid description
>>>> of this change?
>>>
>>> Yes, it restores the previous behavior for PV dom0, as memory decoding
>>> is no longer turned off for any devices regardless of where the BARs
>>> are positioned.
>>
>> It restores one aspect of behavior but then puts in place another
>> restriction.
> 
> I assume the other restriction is about moving the check to vPCI code
> rather than disabling memory decoding?
> 
> It restores the behavior for PV dom0, and keeps a more 'contained' fix
> for PVH dom0.

Ouch, I'm sorry: I didn't pay attention to the "restore" applying to PV
(the similarity of the acronyms made me read "PVH" despite it being "PV").

>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>> ---
>>>>> AT Citrix we have a system with a device with the following BARs:
>>>>>
>>>>> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
>>>>> BAR [0, 0x1fff] -> not positioned, outside host bridge window
>>>>>
>>>>> And memory decoding enabled by the firmware.  With the current code
>>>>> (or any of the previous fix proposals), Xen would still disable memory
>>>>> decoding for the device, and the system will freeze when attempting to
>>>>> set EFI vars.
>>>>
>>>> Isn't the latter (BAR at address 0) yet another problem?
>>>
>>> It's a BAR that hasn't been positioned by the firmware AFAICT.  Which
>>> is a bug in the firmware but shouldn't prevent Xen from booting.
>>>
>>> In the above system address 0 is outside of the PCI host bridge
>>> window, so even if we mapped the BAR and memory decoding for the
>>> device was enabled accessing such BAR wouldn't work.
>>
>> It's mere luck I would say that in this case the BAR is outside the
>> bridge's window. What if this was a device integrated in the root
>> complex?
> 
> I would expect dom0 to reposition the BAR, but doesn't a root complex
> also have a set of windows in decodes accesses from (as listed in ACPI
> _CRS method for the device), and hence still need BARs to be
> positioned at certain ranges in order to be accessible?

Possibly; I guess I haven't learned enough of how this works at the
root complex. Yet still an unassigned BAR might end up overlapping a
valid window.

>>>> I have to admit
>>>> that I'm uncertain in how far it is a good idea to try to make Xen look
>>>> to work on such a system ...
>>>
>>> PV dom0 works on a system like the above prior to c/s 75cc460a1b, so I
>>> would consider 75cc460a1b to be a regression for PV dom0 setups.
>>
>> Agreed, in a way it is a regression. In another way it is deliberate
>> behavior to not accept bogus configurations. The difficulty is to
>> find a reasonable balance between allowing Xen to work in such cases
>> and guarding Xen from suffering follow-on issues resulting from such
>> misconfiguration. After all if this system later was impacted by the
>> bad BAR(s), connecting the misbehavior to the root cause might end
>> up quite a bit more difficult.
> 
> IMO we should strive to boot (almost?) everywhere Linux (or your
> chosen dom0 OS) also boots, since that's what users expect.
> 
> I would assume if the system was impacted by the bad BARs, it would
> also affect the dom0 OS when booting natively on such platform.
> 
> What we do right now with memory decoding already leads to a very
> difficult to diagnose issue, as on the above example calling an UEFI
> runtime method completely freezes the box (no debug keys, no watchdog
> worked).
> 
> So I think leaving the system PCI devices as-is and letting dom0 deal
> with the conflicts is likely a better option than playing with the
> memory decoding bits.

Maybe. None of the workarounds really feel very good.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 3/6] vpci: don't assume that vpci per-device data exists unconditionally
  2022-10-24 11:04   ` Jan Beulich
@ 2022-10-24 16:01     ` Roger Pau Monné
  2022-10-24 16:06       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2022-10-24 16:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Mon, Oct 24, 2022 at 01:04:01PM +0200, Jan Beulich wrote:
> On 20.10.2022 11:46, Roger Pau Monne wrote:
> > It's possible for a device to be assigned to a domain but have no
> > vpci structure if vpci_process_pending() failed and called
> > vpci_remove_device() as a result.  The unconditional accesses done by
> > vpci_{read,write}() and vpci_remove_device() to pdev->vpci would
> > then trigger a NULL pointer dereference.
> > 
> > Add checks for pdev->vpci presence in the affected functions.
> > 
> > Fixes: 9c244fdef7 ('vpci: add header handlers')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> I wonder though whether these changes are enough. Is
> vpci_process_pending() immune to a pdev losing its ->vpci?

I think this is safe so far because the only place where
vpci_remove_device() gets called that doesn't also deassign the device
from the domain is vpci_process_pending(), and in that error path it
also clears any pending work.  Since the device no longer has ->vpci
handlers  no further calls to vpci_process_pending() can happen.

> Furthermore msix_find() iterates over d->arch.hvm.msix_tables, which
> looks to only ever be added to. Doesn't this list need pruning by
> vpci_remove_device()? I've noticed this only because of looking at
> derefs of ->vpci in msix.c - I don't think I can easily see that all
> of those derefs are once again immune to a pdev losing its ->vpci.

I think you are correct, we are missing a
list_del(&pdev->vpci->msix->next) in vpci_remove_device().  We will
however have locking issues with this, as msix_find() doesn't take any
locks, neither do it's callers.  I guess this will be fixed as part of
the lager add vPCI locking series.  Will add another patch to the
series with the MSIX table removal.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 6/6] vpci: refuse BAR writes only if the BAR is mapped
  2022-10-24 15:04     ` Roger Pau Monné
@ 2022-10-24 16:03       ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2022-10-24 16:03 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel

On 24.10.2022 17:04, Roger Pau Monné wrote:
> On Mon, Oct 24, 2022 at 01:51:03PM +0200, Jan Beulich wrote:
>> On 20.10.2022 11:46, Roger Pau Monne wrote:
>>> Writes to the BARs are ignored if memory decoding is enabled for the
>>> device, and the same happen with ROM BARs if the write is an attempt
>>> to change the position of the BAR without disabling it first.
>>>
>>> The reason of ignoring such writes is a limitation in Xen, as it would
>>> need to unmap the BAR, change the address, and remap the BAR at the
>>> new position, which the current logic doesn't support.
>>>
>>> Some devices however seem to have the memory decoding bit hardcoded to
>>> enabled, and attempts to disable it don't get reflected on the
>>> command register.
>>
>> This isn't compliant with the spec, is it? It looks to contradict both
>> "When a 0 is written to this register, the device is logically
>> disconnected from the PCI bus for all accesses except configuration
>> accesses" and "Devices typically power up with all 0's in this
>> register, but Section 6.6 explains some exceptions" (quoting from the
>> old 3.0 spec, which I have readily to hand). The referenced section
>> then says "Such devices are required to support the Command register
>> disabling function described in Section 6.2.2".
> 
> It's unclear to me whether setting the bit to 0 is plain ignored, or
> just fails to be reflected on the command register.
> 
>> How does any arbitrary OS go about sizing the BARs of such a device?
> 
> AFAICT from Linux behavior, an OS would just set to 0 the memory
> decoding command register bit and write the value, but there's no
> check afterwards that the returned value from a read of the register
> still has memory decoding disabled.   Xen does exactly the same:
> attempt to toggle the bit but don't check the value written.

Sure - both assume that the bit is writable in the first place. Yet
altering the BARs when the write had no effect is not really correct.

>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -388,7 +388,7 @@ static void cf_check bar_write(
>>>      else
>>>          val &= PCI_BASE_ADDRESS_MEM_MASK;
>>>  
>>> -    if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
>>> +    if ( bar->enabled )
>>>      {
>>>          /* If the value written is the current one avoid printing a warning. */
>>>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>>> @@ -425,7 +425,7 @@ static void cf_check rom_write(
>>>      uint16_t cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>>>      bool new_enabled = val & PCI_ROM_ADDRESS_ENABLE;
>>>  
>>> -    if ( (cmd & PCI_COMMAND_MEMORY) && header->rom_enabled && new_enabled )
>>> +    if ( rom->enabled && new_enabled )
>>>      {
>>>          gprintk(XENLOG_WARNING,
>>>                  "%pp: ignored ROM BAR write with memory decoding enabled\n",
>>
>> The log message wording then wants adjustment, I guess?
> 
> Hm, I think the message is fine for the purposes here, vPCI is indeed
> ignoring a write with memory decoding enabled, or else rom->enabled
> would be false.
> 
> Or are you arguing that the message is already wrong in the current
> context and should instead be:
> 
> "ignored ROM BAR write with memory decoding and ROM enabled"

No, my point rather was that we're no longer checking for memory decoding
to be disabled. Aiui we still require that the guest has cleared its view
of the bit, so I guess the messages could still be viewed as applicable.
Personally I'd prefer disambiguation.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 3/6] vpci: don't assume that vpci per-device data exists unconditionally
  2022-10-24 16:01     ` Roger Pau Monné
@ 2022-10-24 16:06       ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2022-10-24 16:06 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel

On 24.10.2022 18:01, Roger Pau Monné wrote:
> On Mon, Oct 24, 2022 at 01:04:01PM +0200, Jan Beulich wrote:
>> Furthermore msix_find() iterates over d->arch.hvm.msix_tables, which
>> looks to only ever be added to. Doesn't this list need pruning by
>> vpci_remove_device()? I've noticed this only because of looking at
>> derefs of ->vpci in msix.c - I don't think I can easily see that all
>> of those derefs are once again immune to a pdev losing its ->vpci.
> 
> I think you are correct, we are missing a
> list_del(&pdev->vpci->msix->next) in vpci_remove_device().  We will
> however have locking issues with this, as msix_find() doesn't take any
> locks, neither do it's callers.  I guess this will be fixed as part of
> the lager add vPCI locking series.  Will add another patch to the
> series with the MSIX table removal.

But the locking issue the isn't new then: List insertion can also disturb
msix_find(), can't it?

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices
  2022-10-24 15:56           ` Jan Beulich
@ 2022-10-24 16:24             ` Roger Pau Monné
  0 siblings, 0 replies; 24+ messages in thread
From: Roger Pau Monné @ 2022-10-24 16:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Paul Durrant

On Mon, Oct 24, 2022 at 05:56:25PM +0200, Jan Beulich wrote:
> On 24.10.2022 17:45, Roger Pau Monné wrote:
> > On Mon, Oct 24, 2022 at 03:59:18PM +0200, Jan Beulich wrote:
> >> On 24.10.2022 14:45, Roger Pau Monné wrote:
> >>> On Mon, Oct 24, 2022 at 01:19:22PM +0200, Jan Beulich wrote:
> >>>> On 20.10.2022 11:46, Roger Pau Monne wrote:
> >>>>> Commit 75cc460a1b added checks to ensure the position of the BARs from
> >>>>> PCI devices don't overlap with regions defined on the memory map.
> >>>>> When there's a collision memory decoding is left disabled for the
> >>>>> device, assuming that dom0 will reposition the BAR if necessary and
> >>>>> enable memory decoding.
> >>>>>
> >>>>> While this would be the case for devices being used by dom0, devices
> >>>>> being used by the firmware itself that have no driver would usually be
> >>>>> left with memory decoding disabled by dom0 if that's the state dom0
> >>>>> found them in, and thus firmware trying to make use of them will not
> >>>>> function correctly.
> >>>>>
> >>>>> The initial intent of 75cc460a1b was to prevent vPCI from creating
> >>>>> MMIO mappings on the dom0 p2m over regions that would otherwise
> >>>>> already have mappings established.  It's my view now that we likely
> >>>>> went too far with 75cc460a1b, and Xen disabling memory decoding of
> >>>>> devices (as buggy as they might be) is harmful, and reduces the set of
> >>>>> hardware on which Xen works.
> >>>>>
> >>>>> This commits reverts most of 75cc460a1b, and instead adds checks to
> >>>>> vPCI in order to prevent misplaced BARs from being added to the
> >>>>> hardware domain p2m.
> >>>>
> >>>> Which makes me wonder: How do things work then? Dom0 then still can't
> >>>> access the BAR address range, can it?
> >>>
> >>> It does allow access on some situations where the previous arrangement
> >>> didn't work because it wholesale disabled memory decoding for the
> >>> device.
> >>>
> >>> So if it's only one BAR that's misplaced the rest will still get added
> >>> to the dom0 p2m and be accessible, because memory decoding won't be
> >>> turned off for the device.
> >>
> >> Right - without a per-BAR disable there can only be all or nothing. In
> >> the end if things work with this adjustment, the problem BAR cannot
> >> really be in use aiui. I wonder what you would propose we do if on
> >> another system such a BAR is actually in use.
> > 
> > dom0 would have to change the position of the BAR to a suitable place
> > and then use it.  Linux dom0 does already reposition bogus BARs of
> > devices.
> 
> Yet that still can't realistically work if the firmware expects the
> BAR at the address recorded in the EFI memory map entry.

I was thinking about the BAR at address 0, rather than the BAR in the
EfiMemoryMappedIO region.

dom0 OS would need to avoid moving it, but that would also apply when
running natively on the platform.  The behavior when running as a dom0
won't change vs the native behavior, which is what we should aim for.

> >>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> >>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>>>> ---
> >>>>> AT Citrix we have a system with a device with the following BARs:
> >>>>>
> >>>>> BAR [0xfe010, 0xfe010] -> in a EfiMemoryMappedIO region
> >>>>> BAR [0, 0x1fff] -> not positioned, outside host bridge window
> >>>>>
> >>>>> And memory decoding enabled by the firmware.  With the current code
> >>>>> (or any of the previous fix proposals), Xen would still disable memory
> >>>>> decoding for the device, and the system will freeze when attempting to
> >>>>> set EFI vars.
> >>>>
> >>>> Isn't the latter (BAR at address 0) yet another problem?
> >>>
> >>> It's a BAR that hasn't been positioned by the firmware AFAICT.  Which
> >>> is a bug in the firmware but shouldn't prevent Xen from booting.
> >>>
> >>> In the above system address 0 is outside of the PCI host bridge
> >>> window, so even if we mapped the BAR and memory decoding for the
> >>> device was enabled accessing such BAR wouldn't work.
> >>
> >> It's mere luck I would say that in this case the BAR is outside the
> >> bridge's window. What if this was a device integrated in the root
> >> complex?
> > 
> > I would expect dom0 to reposition the BAR, but doesn't a root complex
> > also have a set of windows in decodes accesses from (as listed in ACPI
> > _CRS method for the device), and hence still need BARs to be
> > positioned at certain ranges in order to be accessible?
> 
> Possibly; I guess I haven't learned enough of how this works at the
> root complex. Yet still an unassigned BAR might end up overlapping a
> valid window.

Right, but if the BAR overlaps a valid window it could be seen as
correctly positioned, and in any case that would be for dom0 to deal
with.

What we care about is BARs no overlapping regions on the memory map,
so that we can setup a valid p2m for dom0.

> >>>> I have to admit
> >>>> that I'm uncertain in how far it is a good idea to try to make Xen look
> >>>> to work on such a system ...
> >>>
> >>> PV dom0 works on a system like the above prior to c/s 75cc460a1b, so I
> >>> would consider 75cc460a1b to be a regression for PV dom0 setups.
> >>
> >> Agreed, in a way it is a regression. In another way it is deliberate
> >> behavior to not accept bogus configurations. The difficulty is to
> >> find a reasonable balance between allowing Xen to work in such cases
> >> and guarding Xen from suffering follow-on issues resulting from such
> >> misconfiguration. After all if this system later was impacted by the
> >> bad BAR(s), connecting the misbehavior to the root cause might end
> >> up quite a bit more difficult.
> > 
> > IMO we should strive to boot (almost?) everywhere Linux (or your
> > chosen dom0 OS) also boots, since that's what users expect.
> > 
> > I would assume if the system was impacted by the bad BARs, it would
> > also affect the dom0 OS when booting natively on such platform.
> > 
> > What we do right now with memory decoding already leads to a very
> > difficult to diagnose issue, as on the above example calling an UEFI
> > runtime method completely freezes the box (no debug keys, no watchdog
> > worked).
> > 
> > So I think leaving the system PCI devices as-is and letting dom0 deal
> > with the conflicts is likely a better option than playing with the
> > memory decoding bits.
> 
> Maybe. None of the workarounds really feel very good.

Hence this last suggestion which limits the workarounds to PVH dom0
only, thus limiting the interaction of Xen with PCI devices as much as
possible.  I think it's an appropriate compromise between being able
to boot as PVH dom0 and not playing with the PCI device memory
decoding bits.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-10-24 16:24 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-20  9:46 [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Roger Pau Monne
2022-10-20  9:46 ` [PATCH for-4.17 1/6] test/vpci: add dummy cfcheck define Roger Pau Monne
2022-10-20  9:57   ` Andrew Cooper
2022-10-20 13:20   ` Anthony PERARD
2022-10-20  9:46 ` [PATCH for-4.17 2/6] test/vpci: fix vPCI test harness to provide pci_get_pdev() Roger Pau Monne
2022-10-20 13:21   ` Anthony PERARD
2022-10-20  9:46 ` [PATCH for-4.17 3/6] vpci: don't assume that vpci per-device data exists unconditionally Roger Pau Monne
2022-10-24 11:04   ` Jan Beulich
2022-10-24 16:01     ` Roger Pau Monné
2022-10-24 16:06       ` Jan Beulich
2022-10-20  9:46 ` [PATCH for-4.17 4/6] vpci: introduce a local vpci_bar variable to modify_decoding() Roger Pau Monne
2022-10-24 11:05   ` Jan Beulich
2022-10-20  9:46 ` [PATCH for-4.17 5/6] pci: do not disable memory decoding for devices Roger Pau Monne
2022-10-24 11:19   ` Jan Beulich
2022-10-24 12:45     ` Roger Pau Monné
2022-10-24 13:59       ` Jan Beulich
2022-10-24 15:45         ` Roger Pau Monné
2022-10-24 15:56           ` Jan Beulich
2022-10-24 16:24             ` Roger Pau Monné
2022-10-20  9:46 ` [PATCH for-4.17 6/6] vpci: refuse BAR writes only if the BAR is mapped Roger Pau Monne
2022-10-24 11:51   ` Jan Beulich
2022-10-24 15:04     ` Roger Pau Monné
2022-10-24 16:03       ` Jan Beulich
2022-10-20 10:12 ` [PATCH for-4.17 0/6] (v)pci: fixes related to memory decoding handling Henry Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.