All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/2] vPCI capabilities filtering
@ 2023-09-13 14:35 Stewart Hildebrand
  2023-09-13 14:35 ` [PATCH v7 1/2] xen/vpci: header: status register handler Stewart Hildebrand
  2023-09-13 14:35 ` [PATCH v7 2/2] xen/vpci: header: filter PCI capabilities Stewart Hildebrand
  0 siblings, 2 replies; 21+ messages in thread
From: Stewart Hildebrand @ 2023-09-13 14:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Stewart Hildebrand, Roger Pau Monné,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Wei Liu

This small series enables vPCI to filter which PCI capabilities we expose to a
domU. This series adds vPCI register handlers within
xen/drivers/vpci/header.c:init_bars(), along with some supporting functions.

Note there are minor rebase conflicts with the in-progress vPCI series [1].
These conflicts fall into the category of functions and code being added
adjacent to one another, so are easily resolved. I did not identify any
dependency on the vPCI locking work, and the two series deal with different
aspects of emulating the PCI header.

Future work may involve adding handlers for more registers in the vPCI header,
such as VID/DID, etc. Future work may also involve exposing additional
capabilities to the guest for broader device/driver support.

v6->v7:
* address feedback in ("xen/vpci: header: status register handler")
* drop ("xen/pci: convert pci_find_*cap* to pci_sbdf_t") and
  ("x86/msi: rearrange read_pci_mem_bar slightly") as they were committed

v5->v6:
* drop ("xen/pci: update PCI_STATUS_* constants") as it has been committed

v4->v5:
* drop ("x86/msi: remove some unused-but-set-variables") as it has been
  committed
* add ("xen/pci: update PCI_STATUS_* constants")
* squash ro_mask patch

v3->v4:
* drop "xen/pci: address a violation of MISRA C:2012 Rule 8.3" as it has been
  committed
* re-order status register handler and capabilities filtering patches
* split an unrelated change from ("xen/pci: convert pci_find_*cap* to pci_sbdf_t")
  into its own patch
* add new patch ("x86/msi: rearrange read_pci_mem_bar slightly") based on
  feedback
* add new RFC patch ("xen/vpci: support ro mask")

v2->v3:
* drop RFC "xen/vpci: header: avoid cast for value passed to vpci_read_val"
* minor misra C violation fixup in preparatory patch
* switch to pci_sbdf_t in preparatory patch
* introduce status handler

v1->v2:
* squash helper functions into the patch where they are used to avoid transient
  dead code situation
* add new RFC patch, possibly throwaway, to get an idea of what it would look
  like to get rid of the (void *)(uintptr_t) cast by introducing a new memory
  allocation

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-08/msg02361.html

Stewart Hildebrand (2):
  xen/vpci: header: status register handler
  xen/vpci: header: filter PCI capabilities

 xen/drivers/pci/pci.c      | 26 +++++++----
 xen/drivers/vpci/header.c  | 92 ++++++++++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c    | 67 ++++++++++++++++++++++-----
 xen/include/xen/pci.h      |  3 ++
 xen/include/xen/pci_regs.h |  9 ++++
 xen/include/xen/vpci.h     | 13 ++++++
 6 files changed, 189 insertions(+), 21 deletions(-)


base-commit: 6aa25c32180ab59081c73bae4c568367d9133a1f
-- 
2.42.0



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-09-13 14:35 [PATCH v7 0/2] vPCI capabilities filtering Stewart Hildebrand
@ 2023-09-13 14:35 ` Stewart Hildebrand
  2023-09-14 11:12   ` Jan Beulich
                     ` (3 more replies)
  2023-09-13 14:35 ` [PATCH v7 2/2] xen/vpci: header: filter PCI capabilities Stewart Hildebrand
  1 sibling, 4 replies; 21+ messages in thread
From: Stewart Hildebrand @ 2023-09-13 14:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Stewart Hildebrand, Roger Pau Monné,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Wei Liu

Introduce a handler for the PCI status register, with ability to mask the
capabilities bit. The status register contains RsvdZ bits, read-only bits, and
write-1-to-clear bits, so introduce bitmasks to handle these in vPCI. If a bit
in the bitmask is set, then the special meaning applies:

  rsvdz_mask: read as zero, guest write ignore (write zero to hardware)
  ro_mask: read normal, guest write ignore (preserve on write to hardware)
  rw1c_mask: read normal, write 1 to clear

The RsvdZ naming was borrowed from the PCI Express Base 4.0 specification.

Xen preserves the value of read-only bits on write to hardware, discarding the
guests write value.

The mask_cap_list flag will be set in a follow-on patch.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
v6->v7:
* re-work args passed to vpci_add_register_mask() (called in init_bars())
* also check for overlap of (rsvdz_mask & ro_mask) in add_register()
* slightly adjust masking operation in vpci_write_helper()

v5->v6:
* remove duplicate PCI_STATUS_CAP_LIST in constant definition
* style fixup in constant definitions
* s/res_mask/rsvdz_mask/
* combine a new masking operation into single line
* preserve r/o bits on write
* get rid of status_read. Instead, use rsvdz_mask for conditionally masking
  PCI_STATUS_CAP_LIST bit
* add comment about PCI_STATUS_CAP_LIST and rsvdp behavior
* add sanity checks in add_register
* move mask_cap_list from struct vpci_header to local variable

v4->v5:
* add support for res_mask
* add support for ro_mask (squash ro_mask patch)
* add constants for reserved, read-only, and rw1c masks

v3->v4:
* move mask_cap_list setting to the capabilities patch
* single pci_conf_read16 in status_read
* align mask_cap_list bitfield in struct vpci_header
* change to rw1c bit mask instead of treating whole register as rw1c
* drop subsystem prefix on renamed add_register function

v2->v3:
* new patch
---
 xen/drivers/vpci/header.c  | 16 +++++++++++
 xen/drivers/vpci/vpci.c    | 55 +++++++++++++++++++++++++++++---------
 xen/include/xen/pci_regs.h |  9 +++++++
 xen/include/xen/vpci.h     |  8 ++++++
 4 files changed, 76 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 767c1ba718d7..af267b75ac31 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -521,6 +521,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
     struct vpci_header *header = &pdev->vpci->header;
     struct vpci_bar *bars = header->bars;
     int rc;
+    bool mask_cap_list = false;
 
     switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
     {
@@ -544,6 +545,21 @@ static int cf_check init_bars(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
+    /*
+     * Utilize rsvdz_mask to hide PCI_STATUS_CAP_LIST from the guest for now. If
+     * support for rsvdp (reserved & preserved) is added in the future, the
+     * rsvdp mask should be used instead.
+     */
+    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
+                                PCI_STATUS, 2, NULL,
+                                PCI_STATUS_RSVDZ_MASK |
+                                    (mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
+                                PCI_STATUS_RO_MASK &
+                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
+                                PCI_STATUS_RW1C_MASK);
+    if ( rc )
+        return rc;
+
     if ( pdev->ignore_bars )
         return 0;
 
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 3bec9a4153da..b4cde7db1b3f 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -29,6 +29,9 @@ struct vpci_register {
     unsigned int offset;
     void *private;
     struct list_head node;
+    uint32_t rsvdz_mask;
+    uint32_t ro_mask;
+    uint32_t rw1c_mask;
 };
 
 #ifdef __XEN__
@@ -145,9 +148,16 @@ uint32_t cf_check vpci_hw_read32(
     return pci_conf_read32(pdev->sbdf, reg);
 }
 
-int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
-                      vpci_write_t *write_handler, unsigned int offset,
-                      unsigned int size, void *data)
+void cf_check vpci_hw_write16(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
+{
+    pci_conf_write16(pdev->sbdf, reg, val);
+}
+
+static int add_register(struct vpci *vpci, vpci_read_t *read_handler,
+                        vpci_write_t *write_handler, unsigned int offset,
+                        unsigned int size, void *data, uint32_t rsvdz_mask,
+                        uint32_t ro_mask, uint32_t rw1c_mask)
 {
     struct list_head *prev;
     struct vpci_register *r;
@@ -155,7 +165,8 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
     /* Some sanity checks. */
     if ( (size != 1 && size != 2 && size != 4) ||
          offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) ||
-         (!read_handler && !write_handler) )
+         (!read_handler && !write_handler) || (rsvdz_mask & ro_mask) ||
+         (rsvdz_mask & rw1c_mask) || (ro_mask & rw1c_mask) )
         return -EINVAL;
 
     r = xmalloc(struct vpci_register);
@@ -167,6 +178,9 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
     r->size = size;
     r->offset = offset;
     r->private = data;
+    r->rsvdz_mask = rsvdz_mask & (0xffffffffU >> (32 - 8 * size));
+    r->ro_mask = ro_mask & (0xffffffffU >> (32 - 8 * size));
+    r->rw1c_mask = rw1c_mask & (0xffffffffU >> (32 - 8 * size));
 
     spin_lock(&vpci->lock);
 
@@ -193,6 +207,23 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
     return 0;
 }
 
+int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
+                      vpci_write_t *write_handler, unsigned int offset,
+                      unsigned int size, void *data)
+{
+    return add_register(vpci, read_handler, write_handler, offset, size, data,
+                        0, 0, 0);
+}
+
+int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
+                           vpci_write_t *write_handler, unsigned int offset,
+                           unsigned int size, void *data, uint32_t rsvdz_mask,
+                           uint32_t ro_mask, uint32_t rw1c_mask)
+{
+    return add_register(vpci, read_handler, write_handler, offset, size, data,
+                        rsvdz_mask, ro_mask, rw1c_mask);
+}
+
 int vpci_remove_register(struct vpci *vpci, unsigned int offset,
                          unsigned int size)
 {
@@ -376,6 +407,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
         }
 
         val = r->read(pdev, r->offset, r->private);
+        val &= ~r->rsvdz_mask;
 
         /* Check if the read is in the middle of a register. */
         if ( r->offset < emu.offset )
@@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
 
 /*
  * Perform a maybe partial write to a register.
- *
- * Note that this will only work for simple registers, if Xen needs to
- * trap accesses to rw1c registers (like the status PCI header register)
- * the logic in vpci_write will have to be expanded in order to correctly
- * deal with them.
  */
 static void vpci_write_helper(const struct pci_dev *pdev,
                               const struct vpci_register *r, unsigned int size,
                               unsigned int offset, uint32_t data)
 {
+    uint32_t val = 0;
+
     ASSERT(size <= r->size);
 
-    if ( size != r->size )
+    if ( (size != r->size) || r->ro_mask )
     {
-        uint32_t val;
-
         val = r->read(pdev, r->offset, r->private);
+        val &= ~r->rw1c_mask;
         data = merge_result(val, data, size, offset);
     }
 
+    data &= ~(r->rsvdz_mask | r->ro_mask);
+    data |= val & r->ro_mask;
+
     r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
              r->private);
 }
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index 84b18736a85d..b72131729db6 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -66,6 +66,15 @@
 #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
 #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
 #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
+#define  PCI_STATUS_RSVDZ_MASK		0x0006
+
+#define  PCI_STATUS_RO_MASK (PCI_STATUS_IMM_READY | PCI_STATUS_INTERRUPT | \
+    PCI_STATUS_CAP_LIST | PCI_STATUS_66MHZ | PCI_STATUS_UDF | \
+    PCI_STATUS_FAST_BACK | PCI_STATUS_DEVSEL_MASK)
+#define  PCI_STATUS_RW1C_MASK (PCI_STATUS_PARITY | \
+    PCI_STATUS_SIG_TARGET_ABORT | PCI_STATUS_REC_TARGET_ABORT | \
+    PCI_STATUS_REC_MASTER_ABORT | PCI_STATUS_SIG_SYSTEM_ERROR | \
+    PCI_STATUS_DETECTED_PARITY)
 
 #define PCI_CLASS_REVISION	0x08	/* High 24 bits are class, low 8 revision */
 #define PCI_REVISION_ID		0x08	/* Revision ID */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 0b8a2a3c745b..7a5cca29e54c 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -37,6 +37,12 @@ int __must_check vpci_add_register(struct vpci *vpci,
                                    vpci_write_t *write_handler,
                                    unsigned int offset, unsigned int size,
                                    void *data);
+int __must_check vpci_add_register_mask(struct vpci *vpci,
+                                        vpci_read_t *read_handler,
+                                        vpci_write_t *write_handler,
+                                        unsigned int offset, unsigned int size,
+                                        void *data, uint32_t rsvdz_mask,
+                                        uint32_t ro_mask, uint32_t rw1c_mask);
 int __must_check vpci_remove_register(struct vpci *vpci, unsigned int offset,
                                       unsigned int size);
 
@@ -50,6 +56,8 @@ uint32_t cf_check vpci_hw_read16(
     const struct pci_dev *pdev, unsigned int reg, void *data);
 uint32_t cf_check vpci_hw_read32(
     const struct pci_dev *pdev, unsigned int reg, void *data);
+void cf_check vpci_hw_write16(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data);
 
 /*
  * Check for pending vPCI operations on this vcpu. Returns true if the vcpu
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v7 2/2] xen/vpci: header: filter PCI capabilities
  2023-09-13 14:35 [PATCH v7 0/2] vPCI capabilities filtering Stewart Hildebrand
  2023-09-13 14:35 ` [PATCH v7 1/2] xen/vpci: header: status register handler Stewart Hildebrand
@ 2023-09-13 14:35 ` Stewart Hildebrand
  2023-11-17 11:44   ` Roger Pau Monné
  1 sibling, 1 reply; 21+ messages in thread
From: Stewart Hildebrand @ 2023-09-13 14:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Stewart Hildebrand, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné

Currently, Xen vPCI only supports virtualizing the MSI and MSI-X capabilities.
Hide all other PCI capabilities (including extended capabilities) from domUs for
now, even though there may be certain devices/drivers that depend on being able
to discover certain capabilities.

We parse the physical PCI capabilities linked list and add vPCI register
handlers for the next elements, inserting our own next value, thus presenting a
modified linked list to the domU.

Introduce helper functions vpci_hw_read8 and vpci_read_val. The vpci_read_val
helper function returns a fixed value, which may be used for RAZ registers, or
registers whose value doesn't change.

Introduce pci_find_next_cap_ttl() helper while adapting the logic from
pci_find_next_cap() to suit our needs, and implement the existing
pci_find_next_cap() in terms of the new helper.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v6->v7:
* no change

v5->v6:
* add register handlers before status register handler in init_bars()
* s/header->mask_cap_list/mask_cap_list/

v4->v5:
* use more appropriate types, continued
* get rid of unnecessary hook function
* add Jan's R-b

v3->v4:
* move mask_cap_list setting to this patch
* leave pci_find_next_cap signature alone
* use more appropriate types

v2->v3:
* get rid of > 0 in loop condition
* implement pci_find_next_cap in terms of new pci_find_next_cap_ttl function so
  that hypothetical future callers wouldn't be required to pass &ttl.
* change NULL to (void *)0 for RAZ value passed to vpci_read_val
* change type of ttl to unsigned int
* remember to mask off the low 2 bits of next in the initial loop iteration
* change return type of pci_find_next_cap and pci_find_next_cap_ttl
* avoid wrapping the PCI_STATUS_CAP_LIST condition by using ! instead of == 0

v1->v2:
* change type of ttl to int
* use switch statement instead of if/else
* adapt existing pci_find_next_cap helper instead of rolling our own
* pass ttl as in/out
* "pass through" the lower 2 bits of the next pointer
* squash helper functions into this patch to avoid transient dead code situation
* extended capabilities RAZ/WI
---
 xen/drivers/pci/pci.c     | 26 +++++++++-----
 xen/drivers/vpci/header.c | 76 +++++++++++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c   | 12 +++++++
 xen/include/xen/pci.h     |  3 ++
 xen/include/xen/vpci.h    |  5 +++
 5 files changed, 113 insertions(+), 9 deletions(-)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index 3569ccb24e9e..8799d60c2156 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -39,31 +39,39 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap)
     return 0;
 }
 
-unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
-                               unsigned int cap)
+unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
+                                   bool (*is_match)(unsigned int),
+                                   unsigned int cap, unsigned int *ttl)
 {
-    u8 id;
-    int ttl = 48;
+    unsigned int id;
 
-    while ( ttl-- )
+    while ( (*ttl)-- )
     {
         pos = pci_conf_read8(sbdf, pos);
         if ( pos < 0x40 )
             break;
 
-        pos &= ~3;
-        id = pci_conf_read8(sbdf, pos + PCI_CAP_LIST_ID);
+        id = pci_conf_read8(sbdf, (pos & ~3) + PCI_CAP_LIST_ID);
 
         if ( id == 0xff )
             break;
-        if ( id == cap )
+        if ( (is_match && is_match(id)) || (!is_match && id == cap) )
             return pos;
 
-        pos += PCI_CAP_LIST_NEXT;
+        pos = (pos & ~3) + PCI_CAP_LIST_NEXT;
     }
+
     return 0;
 }
 
+unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
+                               unsigned int cap)
+{
+    unsigned int ttl = 48;
+
+    return pci_find_next_cap_ttl(sbdf, pos, NULL, cap, &ttl) & ~3;
+}
+
 /**
  * pci_find_ext_capability - Find an extended capability
  * @sbdf: PCI device to query
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index af267b75ac31..1e7dfe668ccf 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -513,6 +513,18 @@ static void cf_check rom_write(
         rom->addr = val & PCI_ROM_ADDRESS_MASK;
 }
 
+static bool cf_check vpci_cap_supported(unsigned int id)
+{
+    switch ( id )
+    {
+    case PCI_CAP_ID_MSI:
+    case PCI_CAP_ID_MSIX:
+        return true;
+    default:
+        return false;
+    }
+}
+
 static int cf_check init_bars(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -545,6 +557,70 @@ static int cf_check init_bars(struct pci_dev *pdev)
     if ( rc )
         return rc;
 
+    if ( !is_hardware_domain(pdev->domain) )
+    {
+        if ( !(pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST) )
+        {
+            /* RAZ/WI */
+            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                   PCI_CAPABILITY_LIST, 1, (void *)0);
+            if ( rc )
+                return rc;
+        }
+        else
+        {
+            /* Only expose capabilities to the guest that vPCI can handle. */
+            unsigned int next, ttl = 48;
+
+            next = pci_find_next_cap_ttl(pdev->sbdf, PCI_CAPABILITY_LIST,
+                                         vpci_cap_supported, 0, &ttl);
+
+            rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                   PCI_CAPABILITY_LIST, 1,
+                                   (void *)(uintptr_t)next);
+            if ( rc )
+                return rc;
+
+            next &= ~3;
+
+            if ( !next )
+                /*
+                 * If we don't have any supported capabilities to expose to the
+                 * guest, mask the PCI_STATUS_CAP_LIST bit in the status
+                 * register.
+                 */
+                mask_cap_list = true;
+
+            while ( next && ttl )
+            {
+                unsigned int pos = next;
+
+                next = pci_find_next_cap_ttl(pdev->sbdf,
+                                             pos + PCI_CAP_LIST_NEXT,
+                                             vpci_cap_supported, 0, &ttl);
+
+                rc = vpci_add_register(pdev->vpci, vpci_hw_read8, NULL,
+                                       pos + PCI_CAP_LIST_ID, 1, NULL);
+                if ( rc )
+                    return rc;
+
+                rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL,
+                                       pos + PCI_CAP_LIST_NEXT, 1,
+                                       (void *)(uintptr_t)next);
+                if ( rc )
+                    return rc;
+
+                next &= ~3;
+            }
+        }
+
+        /* Extended capabilities RAZ/WI */
+        rc = vpci_add_register(pdev->vpci, vpci_read_val, NULL, 0x100, 4,
+                               (void *)0);
+        if ( rc )
+            return rc;
+    }
+
     /*
      * Utilize rsvdz_mask to hide PCI_STATUS_CAP_LIST from the guest for now. If
      * support for rsvdp (reserved & preserved) is added in the future, the
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index b4cde7db1b3f..e3b866aaf7a4 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -136,6 +136,18 @@ static void cf_check vpci_ignored_write(
 {
 }
 
+uint32_t cf_check vpci_read_val(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    return (uintptr_t)data;
+}
+
+uint32_t cf_check vpci_hw_read8(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    return pci_conf_read8(pdev->sbdf, reg);
+}
+
 uint32_t cf_check vpci_hw_read16(
     const struct pci_dev *pdev, unsigned int reg, void *data)
 {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 251b8761a8e9..db927eeaeef2 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -194,6 +194,9 @@ int pci_mmcfg_read(unsigned int seg, unsigned int bus,
 int pci_mmcfg_write(unsigned int seg, unsigned int bus,
                     unsigned int devfn, int reg, int len, u32 value);
 unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap);
+unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
+                                   bool (*is_match)(unsigned int),
+                                   unsigned int cap, unsigned int *ttl);
 unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
                                unsigned int cap);
 unsigned int pci_find_ext_capability(pci_sbdf_t sbdf, unsigned int cap);
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 7a5cca29e54c..b79efc49bad6 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -51,7 +51,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size);
 void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
                 uint32_t data);
 
+uint32_t cf_check vpci_read_val(
+    const struct pci_dev *pdev, unsigned int reg, void *data);
+
 /* Passthrough handlers. */
+uint32_t cf_check vpci_hw_read8(
+    const struct pci_dev *pdev, unsigned int reg, void *data);
 uint32_t cf_check vpci_hw_read16(
     const struct pci_dev *pdev, unsigned int reg, void *data);
 uint32_t cf_check vpci_hw_read32(
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-09-13 14:35 ` [PATCH v7 1/2] xen/vpci: header: status register handler Stewart Hildebrand
@ 2023-09-14 11:12   ` Jan Beulich
  2023-11-17 12:40   ` Roger Pau Monné
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2023-09-14 11:12 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Roger Pau Monné,
	Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, xen-devel

On 13.09.2023 16:35, Stewart Hildebrand wrote:
> Introduce a handler for the PCI status register, with ability to mask the
> capabilities bit. The status register contains RsvdZ bits, read-only bits, and
> write-1-to-clear bits, so introduce bitmasks to handle these in vPCI. If a bit
> in the bitmask is set, then the special meaning applies:
> 
>   rsvdz_mask: read as zero, guest write ignore (write zero to hardware)
>   ro_mask: read normal, guest write ignore (preserve on write to hardware)
>   rw1c_mask: read normal, write 1 to clear
> 
> The RsvdZ naming was borrowed from the PCI Express Base 4.0 specification.
> 
> Xen preserves the value of read-only bits on write to hardware, discarding the
> guests write value.
> 
> The mask_cap_list flag will be set in a follow-on patch.
> 
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 2/2] xen/vpci: header: filter PCI capabilities
  2023-09-13 14:35 ` [PATCH v7 2/2] xen/vpci: header: filter PCI capabilities Stewart Hildebrand
@ 2023-11-17 11:44   ` Roger Pau Monné
  2023-11-28 16:04     ` Stewart Hildebrand
  0 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monné @ 2023-11-17 11:44 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On Wed, Sep 13, 2023 at 10:35:47AM -0400, Stewart Hildebrand wrote:
> Currently, Xen vPCI only supports virtualizing the MSI and MSI-X capabilities.
> Hide all other PCI capabilities (including extended capabilities) from domUs for
> now, even though there may be certain devices/drivers that depend on being able
> to discover certain capabilities.
> 
> We parse the physical PCI capabilities linked list and add vPCI register
> handlers for the next elements, inserting our own next value, thus presenting a
> modified linked list to the domU.
> 
> Introduce helper functions vpci_hw_read8 and vpci_read_val. The vpci_read_val
> helper function returns a fixed value, which may be used for RAZ registers, or
> registers whose value doesn't change.
> 
> Introduce pci_find_next_cap_ttl() helper while adapting the logic from
> pci_find_next_cap() to suit our needs, and implement the existing
> pci_find_next_cap() in terms of the new helper.
> 
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> v6->v7:
> * no change
> 
> v5->v6:
> * add register handlers before status register handler in init_bars()
> * s/header->mask_cap_list/mask_cap_list/
> 
> v4->v5:
> * use more appropriate types, continued
> * get rid of unnecessary hook function
> * add Jan's R-b
> 
> v3->v4:
> * move mask_cap_list setting to this patch
> * leave pci_find_next_cap signature alone
> * use more appropriate types
> 
> v2->v3:
> * get rid of > 0 in loop condition
> * implement pci_find_next_cap in terms of new pci_find_next_cap_ttl function so
>   that hypothetical future callers wouldn't be required to pass &ttl.
> * change NULL to (void *)0 for RAZ value passed to vpci_read_val
> * change type of ttl to unsigned int
> * remember to mask off the low 2 bits of next in the initial loop iteration
> * change return type of pci_find_next_cap and pci_find_next_cap_ttl
> * avoid wrapping the PCI_STATUS_CAP_LIST condition by using ! instead of == 0
> 
> v1->v2:
> * change type of ttl to int
> * use switch statement instead of if/else
> * adapt existing pci_find_next_cap helper instead of rolling our own
> * pass ttl as in/out
> * "pass through" the lower 2 bits of the next pointer
> * squash helper functions into this patch to avoid transient dead code situation
> * extended capabilities RAZ/WI
> ---
>  xen/drivers/pci/pci.c     | 26 +++++++++-----
>  xen/drivers/vpci/header.c | 76 +++++++++++++++++++++++++++++++++++++++
>  xen/drivers/vpci/vpci.c   | 12 +++++++
>  xen/include/xen/pci.h     |  3 ++
>  xen/include/xen/vpci.h    |  5 +++
>  5 files changed, 113 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
> index 3569ccb24e9e..8799d60c2156 100644
> --- a/xen/drivers/pci/pci.c
> +++ b/xen/drivers/pci/pci.c
> @@ -39,31 +39,39 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap)
>      return 0;
>  }
>  
> -unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
> -                               unsigned int cap)
> +unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
> +                                   bool (*is_match)(unsigned int),
> +                                   unsigned int cap, unsigned int *ttl)

Maybe this has been discussed in previous patch versions, but why
pass a match function instead of expanding the cap parameter to
be an array of capabilities to search for?

I find it kind of weird to be able to pass both a specific capability
to match against and also a match function.

What the expected behavior if the caller provides both a match
function and a cap value?

>  {
> -    u8 id;
> -    int ttl = 48;
> +    unsigned int id;
>  
> -    while ( ttl-- )
> +    while ( (*ttl)-- )
>      {
>          pos = pci_conf_read8(sbdf, pos);
>          if ( pos < 0x40 )
>              break;
>  
> -        pos &= ~3;
> -        id = pci_conf_read8(sbdf, pos + PCI_CAP_LIST_ID);
> +        id = pci_conf_read8(sbdf, (pos & ~3) + PCI_CAP_LIST_ID);
>  
>          if ( id == 0xff )
>              break;
> -        if ( id == cap )
> +        if ( (is_match && is_match(id)) || (!is_match && id == cap) )
>              return pos;
>  
> -        pos += PCI_CAP_LIST_NEXT;
> +        pos = (pos & ~3) + PCI_CAP_LIST_NEXT;
>      }
> +
>      return 0;
>  }
>  
> +unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
> +                               unsigned int cap)
> +{
> +    unsigned int ttl = 48;
> +
> +    return pci_find_next_cap_ttl(sbdf, pos, NULL, cap, &ttl) & ~3;
> +}
> +
>  /**
>   * pci_find_ext_capability - Find an extended capability
>   * @sbdf: PCI device to query
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index af267b75ac31..1e7dfe668ccf 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -513,6 +513,18 @@ static void cf_check rom_write(
>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
>  }
>  
> +static bool cf_check vpci_cap_supported(unsigned int id)
> +{
> +    switch ( id )
> +    {
> +    case PCI_CAP_ID_MSI:
> +    case PCI_CAP_ID_MSIX:
> +        return true;
> +    default:
> +        return false;
> +    }
> +}
> +
>  static int cf_check init_bars(struct pci_dev *pdev)
>  {
>      uint16_t cmd;
> @@ -545,6 +557,70 @@ static int cf_check init_bars(struct pci_dev *pdev)

We might have to rename this to init_header() now :).

>      if ( rc )
>          return rc;
>  
> +    if ( !is_hardware_domain(pdev->domain) )
> +    {
> +        if ( !(pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST) )
> +        {
> +            /* RAZ/WI */

That RAZ/WI acronym seems very Arm specific (TBH I had to search for
it).

FWIW, it's my understanding that if the status register doesn't report
the capability list support, the register is unimplemented, and hence
would be fine to return ~0 from reads of PCI_CAPABILITY_LIST?

IOW: I'm not sure we need to add this handler for PCI_CAPABILITY_LIST
if it's not supported.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-09-13 14:35 ` [PATCH v7 1/2] xen/vpci: header: status register handler Stewart Hildebrand
  2023-09-14 11:12   ` Jan Beulich
@ 2023-11-17 12:40   ` Roger Pau Monné
  2023-11-17 13:23     ` Jan Beulich
                       ` (2 more replies)
  2023-11-17 13:33   ` Roger Pau Monné
  2023-11-21 14:45   ` Roger Pau Monné
  3 siblings, 3 replies; 21+ messages in thread
From: Roger Pau Monné @ 2023-11-17 12:40 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> Introduce a handler for the PCI status register, with ability to mask the
> capabilities bit. The status register contains RsvdZ bits, read-only bits, and
> write-1-to-clear bits, so introduce bitmasks to handle these in vPCI. If a bit
> in the bitmask is set, then the special meaning applies:
> 
>   rsvdz_mask: read as zero, guest write ignore (write zero to hardware)
>   ro_mask: read normal, guest write ignore (preserve on write to hardware)
>   rw1c_mask: read normal, write 1 to clear
> 
> The RsvdZ naming was borrowed from the PCI Express Base 4.0 specification.
> 
> Xen preserves the value of read-only bits on write to hardware, discarding the
> guests write value.
> 
> The mask_cap_list flag will be set in a follow-on patch.
> 
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> ---
> v6->v7:
> * re-work args passed to vpci_add_register_mask() (called in init_bars())
> * also check for overlap of (rsvdz_mask & ro_mask) in add_register()
> * slightly adjust masking operation in vpci_write_helper()
> 
> v5->v6:
> * remove duplicate PCI_STATUS_CAP_LIST in constant definition
> * style fixup in constant definitions
> * s/res_mask/rsvdz_mask/
> * combine a new masking operation into single line
> * preserve r/o bits on write
> * get rid of status_read. Instead, use rsvdz_mask for conditionally masking
>   PCI_STATUS_CAP_LIST bit
> * add comment about PCI_STATUS_CAP_LIST and rsvdp behavior
> * add sanity checks in add_register
> * move mask_cap_list from struct vpci_header to local variable
> 
> v4->v5:
> * add support for res_mask
> * add support for ro_mask (squash ro_mask patch)
> * add constants for reserved, read-only, and rw1c masks
> 
> v3->v4:
> * move mask_cap_list setting to the capabilities patch
> * single pci_conf_read16 in status_read
> * align mask_cap_list bitfield in struct vpci_header
> * change to rw1c bit mask instead of treating whole register as rw1c
> * drop subsystem prefix on renamed add_register function
> 
> v2->v3:
> * new patch
> ---
>  xen/drivers/vpci/header.c  | 16 +++++++++++
>  xen/drivers/vpci/vpci.c    | 55 +++++++++++++++++++++++++++++---------
>  xen/include/xen/pci_regs.h |  9 +++++++
>  xen/include/xen/vpci.h     |  8 ++++++
>  4 files changed, 76 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 767c1ba718d7..af267b75ac31 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -521,6 +521,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      struct vpci_header *header = &pdev->vpci->header;
>      struct vpci_bar *bars = header->bars;
>      int rc;
> +    bool mask_cap_list = false;
>  
>      switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>      {
> @@ -544,6 +545,21 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      if ( rc )
>          return rc;
>  
> +    /*
> +     * Utilize rsvdz_mask to hide PCI_STATUS_CAP_LIST from the guest for now. If
> +     * support for rsvdp (reserved & preserved) is added in the future, the
> +     * rsvdp mask should be used instead.
> +     */
> +    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
> +                                PCI_STATUS, 2, NULL,
> +                                PCI_STATUS_RSVDZ_MASK |
> +                                    (mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
> +                                PCI_STATUS_RO_MASK &
> +                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
> +                                PCI_STATUS_RW1C_MASK);
> +    if ( rc )
> +        return rc;
> +
>      if ( pdev->ignore_bars )
>          return 0;
>  
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 3bec9a4153da..b4cde7db1b3f 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -29,6 +29,9 @@ struct vpci_register {
>      unsigned int offset;
>      void *private;
>      struct list_head node;
> +    uint32_t rsvdz_mask;
> +    uint32_t ro_mask;
> +    uint32_t rw1c_mask;

I understand that we need the rw1c_mask in order to properly merge
values when doing partial writes, but the other fields I'm not sure we
do need them.  RO bits don't care about what's written to them, and
RsvdZ are always read as 0 and written as 0, so the mask shouldn't
affect the merging.

>  };
>  
>  #ifdef __XEN__
> @@ -145,9 +148,16 @@ uint32_t cf_check vpci_hw_read32(
>      return pci_conf_read32(pdev->sbdf, reg);
>  }
>  
> -int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
> -                      vpci_write_t *write_handler, unsigned int offset,
> -                      unsigned int size, void *data)
> +void cf_check vpci_hw_write16(
> +    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
> +{
> +    pci_conf_write16(pdev->sbdf, reg, val);
> +}
> +
> +static int add_register(struct vpci *vpci, vpci_read_t *read_handler,
> +                        vpci_write_t *write_handler, unsigned int offset,
> +                        unsigned int size, void *data, uint32_t rsvdz_mask,
> +                        uint32_t ro_mask, uint32_t rw1c_mask)
>  {
>      struct list_head *prev;
>      struct vpci_register *r;
> @@ -155,7 +165,8 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>      /* Some sanity checks. */
>      if ( (size != 1 && size != 2 && size != 4) ||
>           offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) ||
> -         (!read_handler && !write_handler) )
> +         (!read_handler && !write_handler) || (rsvdz_mask & ro_mask) ||
> +         (rsvdz_mask & rw1c_mask) || (ro_mask & rw1c_mask) )
>          return -EINVAL;
>  
>      r = xmalloc(struct vpci_register);
> @@ -167,6 +178,9 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>      r->size = size;
>      r->offset = offset;
>      r->private = data;
> +    r->rsvdz_mask = rsvdz_mask & (0xffffffffU >> (32 - 8 * size));
> +    r->ro_mask = ro_mask & (0xffffffffU >> (32 - 8 * size));
> +    r->rw1c_mask = rw1c_mask & (0xffffffffU >> (32 - 8 * size));
>  
>      spin_lock(&vpci->lock);
>  
> @@ -193,6 +207,23 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>      return 0;
>  }
>  
> +int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
> +                      vpci_write_t *write_handler, unsigned int offset,
> +                      unsigned int size, void *data)
> +{
> +    return add_register(vpci, read_handler, write_handler, offset, size, data,
> +                        0, 0, 0);
> +}
> +
> +int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
> +                           vpci_write_t *write_handler, unsigned int offset,
> +                           unsigned int size, void *data, uint32_t rsvdz_mask,
> +                           uint32_t ro_mask, uint32_t rw1c_mask)
> +{
> +    return add_register(vpci, read_handler, write_handler, offset, size, data,
> +                        rsvdz_mask, ro_mask, rw1c_mask);
> +}
> +
>  int vpci_remove_register(struct vpci *vpci, unsigned int offset,
>                           unsigned int size)
>  {
> @@ -376,6 +407,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>          }
>  
>          val = r->read(pdev, r->offset, r->private);
> +        val &= ~r->rsvdz_mask;
>  
>          /* Check if the read is in the middle of a register. */
>          if ( r->offset < emu.offset )
> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>  
>  /*
>   * Perform a maybe partial write to a register.
> - *
> - * Note that this will only work for simple registers, if Xen needs to
> - * trap accesses to rw1c registers (like the status PCI header register)
> - * the logic in vpci_write will have to be expanded in order to correctly
> - * deal with them.
>   */
>  static void vpci_write_helper(const struct pci_dev *pdev,
>                                const struct vpci_register *r, unsigned int size,
>                                unsigned int offset, uint32_t data)
>  {
> +    uint32_t val = 0;
> +
>      ASSERT(size <= r->size);
>  
> -    if ( size != r->size )
> +    if ( (size != r->size) || r->ro_mask )

Hm, I'm not sure this specific handling for read-only bits is
required, software is allowed to write either 0 or 1 to read-only
bits, but the write will be ignored.

>      {
> -        uint32_t val;
> -
>          val = r->read(pdev, r->offset, r->private);
> +        val &= ~r->rw1c_mask;
>          data = merge_result(val, data, size, offset);
>      }
>  
> +    data &= ~(r->rsvdz_mask | r->ro_mask);
> +    data |= val & r->ro_mask;

You cannot apply the register masks directly into the final value, you
need to offset and mask them as necessary, likewise for val, see
what's done in merge_result().

Regardless of the offset issue, I think the usage of val with the
ro_mask is bogus here, see above.

> +
>      r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
>               r->private);
>  }
> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
> index 84b18736a85d..b72131729db6 100644
> --- a/xen/include/xen/pci_regs.h
> +++ b/xen/include/xen/pci_regs.h
> @@ -66,6 +66,15 @@
>  #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
>  #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
>  #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
> +#define  PCI_STATUS_RSVDZ_MASK		0x0006

In my copy of the PCIe 6 spec bit 6 is also RsvdZ, so the mask should
be 0x46.

But I'm unsure we really need this mask.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-17 12:40   ` Roger Pau Monné
@ 2023-11-17 13:23     ` Jan Beulich
  2023-11-17 13:45       ` Roger Pau Monné
  2023-11-17 16:45     ` Roger Pau Monné
  2023-11-22 20:16     ` Stewart Hildebrand
  2 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2023-11-17 13:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Julien Grall,
	Stefano Stabellini, Wei Liu, Stewart Hildebrand

On 17.11.2023 13:40, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -29,6 +29,9 @@ struct vpci_register {
>>      unsigned int offset;
>>      void *private;
>>      struct list_head node;
>> +    uint32_t rsvdz_mask;
>> +    uint32_t ro_mask;
>> +    uint32_t rw1c_mask;
> 
> I understand that we need the rw1c_mask in order to properly merge
> values when doing partial writes, but the other fields I'm not sure we
> do need them.  RO bits don't care about what's written to them, and
> RsvdZ are always read as 0 and written as 0, so the mask shouldn't
> affect the merging.

What some version of the spec says is r/o or reserved may be different
in another. Also iirc devices may (wrongly?) implement r/o bits as r/w.
When presenting a virtual view of devices to guests, in this regard I
think we want (or even need) to enforce our view of the world.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-09-13 14:35 ` [PATCH v7 1/2] xen/vpci: header: status register handler Stewart Hildebrand
  2023-09-14 11:12   ` Jan Beulich
  2023-11-17 12:40   ` Roger Pau Monné
@ 2023-11-17 13:33   ` Roger Pau Monné
  2023-11-22 22:04     ` Stewart Hildebrand
  2023-11-21 14:45   ` Roger Pau Monné
  3 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monné @ 2023-11-17 13:33 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> +int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
> +                           vpci_write_t *write_handler, unsigned int offset,
> +                           unsigned int size, void *data, uint32_t rsvdz_mask,
> +                           uint32_t ro_mask, uint32_t rw1c_mask)

Forgot to mention, it would be good if you can add some tests in
tools/tests/vpci that ensure the masks work as expected.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-17 13:23     ` Jan Beulich
@ 2023-11-17 13:45       ` Roger Pau Monné
  0 siblings, 0 replies; 21+ messages in thread
From: Roger Pau Monné @ 2023-11-17 13:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, George Dunlap, Julien Grall,
	Stefano Stabellini, Wei Liu, Stewart Hildebrand

On Fri, Nov 17, 2023 at 02:23:42PM +0100, Jan Beulich wrote:
> On 17.11.2023 13:40, Roger Pau Monné wrote:
> > On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> >> --- a/xen/drivers/vpci/vpci.c
> >> +++ b/xen/drivers/vpci/vpci.c
> >> @@ -29,6 +29,9 @@ struct vpci_register {
> >>      unsigned int offset;
> >>      void *private;
> >>      struct list_head node;
> >> +    uint32_t rsvdz_mask;
> >> +    uint32_t ro_mask;
> >> +    uint32_t rw1c_mask;
> > 
> > I understand that we need the rw1c_mask in order to properly merge
> > values when doing partial writes, but the other fields I'm not sure we
> > do need them.  RO bits don't care about what's written to them, and
> > RsvdZ are always read as 0 and written as 0, so the mask shouldn't
> > affect the merging.
> 
> What some version of the spec says is r/o or reserved may be different
> in another. Also iirc devices may (wrongly?) implement r/o bits as r/w.
> When presenting a virtual view of devices to guests, in this regard I
> think we want (or even need) to enforce our view of the world.

That needs to be part of the commit message then.

Ideally we would also want to do a swept of all registers we currently
implement, in order to check for ro or rsvdz bits and properly enforce
them.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-17 12:40   ` Roger Pau Monné
  2023-11-17 13:23     ` Jan Beulich
@ 2023-11-17 16:45     ` Roger Pau Monné
  2023-11-22 20:16     ` Stewart Hildebrand
  2 siblings, 0 replies; 21+ messages in thread
From: Roger Pau Monné @ 2023-11-17 16:45 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On Fri, Nov 17, 2023 at 01:40:37PM +0100, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> >      {
> > -        uint32_t val;
> > -
> >          val = r->read(pdev, r->offset, r->private);
> > +        val &= ~r->rw1c_mask;
> >          data = merge_result(val, data, size, offset);
> >      }
> >  
> > +    data &= ~(r->rsvdz_mask | r->ro_mask);
> > +    data |= val & r->ro_mask;
> 
> You cannot apply the register masks directly into the final value, you
> need to offset and mask them as necessary, likewise for val, see
> what's done in merge_result().

Never mind, I was wrong, there's no need to offset anything here.

Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-09-13 14:35 ` [PATCH v7 1/2] xen/vpci: header: status register handler Stewart Hildebrand
                     ` (2 preceding siblings ...)
  2023-11-17 13:33   ` Roger Pau Monné
@ 2023-11-21 14:45   ` Roger Pau Monné
  2023-11-21 15:03     ` Stewart Hildebrand
  3 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monné @ 2023-11-21 14:45 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>  
>  /*
>   * Perform a maybe partial write to a register.
> - *
> - * Note that this will only work for simple registers, if Xen needs to
> - * trap accesses to rw1c registers (like the status PCI header register)
> - * the logic in vpci_write will have to be expanded in order to correctly
> - * deal with them.
>   */
>  static void vpci_write_helper(const struct pci_dev *pdev,
>                                const struct vpci_register *r, unsigned int size,
>                                unsigned int offset, uint32_t data)
>  {
> +    uint32_t val = 0;
> +
>      ASSERT(size <= r->size);
>  
> -    if ( size != r->size )
> +    if ( (size != r->size) || r->ro_mask )
>      {
> -        uint32_t val;
> -
>          val = r->read(pdev, r->offset, r->private);
> +        val &= ~r->rw1c_mask;
>          data = merge_result(val, data, size, offset);
>      }
>  
> +    data &= ~(r->rsvdz_mask | r->ro_mask);
> +    data |= val & r->ro_mask;

I've been thinking about this, and the way the ro_mask is implemented
(and the way we want to handle ro bits) is the same behavior as RsvdP.
I would suggest to rename the ro_mask to rsvdp_mask and note
that for resilience reasons we will handle RO bits as RsvdP.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-21 14:45   ` Roger Pau Monné
@ 2023-11-21 15:03     ` Stewart Hildebrand
  2023-11-21 15:18       ` Roger Pau Monné
  0 siblings, 1 reply; 21+ messages in thread
From: Stewart Hildebrand @ 2023-11-21 15:03 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On 11/21/23 09:45, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>  
>>  /*
>>   * Perform a maybe partial write to a register.
>> - *
>> - * Note that this will only work for simple registers, if Xen needs to
>> - * trap accesses to rw1c registers (like the status PCI header register)
>> - * the logic in vpci_write will have to be expanded in order to correctly
>> - * deal with them.
>>   */
>>  static void vpci_write_helper(const struct pci_dev *pdev,
>>                                const struct vpci_register *r, unsigned int size,
>>                                unsigned int offset, uint32_t data)
>>  {
>> +    uint32_t val = 0;
>> +
>>      ASSERT(size <= r->size);
>>  
>> -    if ( size != r->size )
>> +    if ( (size != r->size) || r->ro_mask )
>>      {
>> -        uint32_t val;
>> -
>>          val = r->read(pdev, r->offset, r->private);
>> +        val &= ~r->rw1c_mask;
>>          data = merge_result(val, data, size, offset);
>>      }
>>  
>> +    data &= ~(r->rsvdz_mask | r->ro_mask);
>> +    data |= val & r->ro_mask;
> 
> I've been thinking about this, and the way the ro_mask is implemented
> (and the way we want to handle ro bits) is the same behavior as RsvdP.
> I would suggest to rename the ro_mask to rsvdp_mask and note
> that for resilience reasons we will handle RO bits as RsvdP.

But the reads behave differently. RO should return the value, RsvdP should return 0 when read (according to the PCIe Base Spec 4.0).


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-21 15:03     ` Stewart Hildebrand
@ 2023-11-21 15:18       ` Roger Pau Monné
  2023-11-21 16:27         ` Stewart Hildebrand
  0 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monné @ 2023-11-21 15:18 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On Tue, Nov 21, 2023 at 10:03:01AM -0500, Stewart Hildebrand wrote:
> On 11/21/23 09:45, Roger Pau Monné wrote:
> > On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> >> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
> >>  
> >>  /*
> >>   * Perform a maybe partial write to a register.
> >> - *
> >> - * Note that this will only work for simple registers, if Xen needs to
> >> - * trap accesses to rw1c registers (like the status PCI header register)
> >> - * the logic in vpci_write will have to be expanded in order to correctly
> >> - * deal with them.
> >>   */
> >>  static void vpci_write_helper(const struct pci_dev *pdev,
> >>                                const struct vpci_register *r, unsigned int size,
> >>                                unsigned int offset, uint32_t data)
> >>  {
> >> +    uint32_t val = 0;
> >> +
> >>      ASSERT(size <= r->size);
> >>  
> >> -    if ( size != r->size )
> >> +    if ( (size != r->size) || r->ro_mask )
> >>      {
> >> -        uint32_t val;
> >> -
> >>          val = r->read(pdev, r->offset, r->private);
> >> +        val &= ~r->rw1c_mask;
> >>          data = merge_result(val, data, size, offset);
> >>      }
> >>  
> >> +    data &= ~(r->rsvdz_mask | r->ro_mask);
> >> +    data |= val & r->ro_mask;
> > 
> > I've been thinking about this, and the way the ro_mask is implemented
> > (and the way we want to handle ro bits) is the same behavior as RsvdP.
> > I would suggest to rename the ro_mask to rsvdp_mask and note
> > that for resilience reasons we will handle RO bits as RsvdP.
> 
> But the reads behave differently. RO should return the value, RsvdP should return 0 when read (according to the PCIe Base Spec 4.0).

Hm, right, sorry for the wrong suggestion.  We should force bits to 0
for guests reads, but make sure that for writes the value on the
hardware is preserved.

So we need the separate mask for RsvdP, which will be handled like
ro_mask in vpci_write_helper() and like rsvdz_mask in vpci_read().

Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-21 15:18       ` Roger Pau Monné
@ 2023-11-21 16:27         ` Stewart Hildebrand
  2023-11-21 16:46           ` Stewart Hildebrand
  0 siblings, 1 reply; 21+ messages in thread
From: Stewart Hildebrand @ 2023-11-21 16:27 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On 11/21/23 10:18, Roger Pau Monné wrote:
> On Tue, Nov 21, 2023 at 10:03:01AM -0500, Stewart Hildebrand wrote:
>> On 11/21/23 09:45, Roger Pau Monné wrote:
>>> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>>>> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>>  
>>>>  /*
>>>>   * Perform a maybe partial write to a register.
>>>> - *
>>>> - * Note that this will only work for simple registers, if Xen needs to
>>>> - * trap accesses to rw1c registers (like the status PCI header register)
>>>> - * the logic in vpci_write will have to be expanded in order to correctly
>>>> - * deal with them.
>>>>   */
>>>>  static void vpci_write_helper(const struct pci_dev *pdev,
>>>>                                const struct vpci_register *r, unsigned int size,
>>>>                                unsigned int offset, uint32_t data)
>>>>  {
>>>> +    uint32_t val = 0;
>>>> +
>>>>      ASSERT(size <= r->size);
>>>>  
>>>> -    if ( size != r->size )
>>>> +    if ( (size != r->size) || r->ro_mask )
>>>>      {
>>>> -        uint32_t val;
>>>> -
>>>>          val = r->read(pdev, r->offset, r->private);
>>>> +        val &= ~r->rw1c_mask;
>>>>          data = merge_result(val, data, size, offset);
>>>>      }
>>>>  
>>>> +    data &= ~(r->rsvdz_mask | r->ro_mask);
>>>> +    data |= val & r->ro_mask;
>>>
>>> I've been thinking about this, and the way the ro_mask is implemented
>>> (and the way we want to handle ro bits) is the same behavior as RsvdP.
>>> I would suggest to rename the ro_mask to rsvdp_mask and note
>>> that for resilience reasons we will handle RO bits as RsvdP.
>>
>> But the reads behave differently. RO should return the value, RsvdP should return 0 when read (according to the PCIe Base Spec 4.0).
> 
> Hm, right, sorry for the wrong suggestion.  We should force bits to 0
> for guests reads, but make sure that for writes the value on the
> hardware is preserved.
> 
> So we need the separate mask for RsvdP, which will be handled like
> ro_mask in vpci_write_helper() and like rsvdz_mask in vpci_read().

Agreed. The only reason I didn't add RsvdP support in this patch was that it wasn't needed for the status register... Since RsvdP will be needed for the command register, I think I may as well implement it as part of this series, perhaps in a separate patch following this one. Stay tuned for v8.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-21 16:27         ` Stewart Hildebrand
@ 2023-11-21 16:46           ` Stewart Hildebrand
  0 siblings, 0 replies; 21+ messages in thread
From: Stewart Hildebrand @ 2023-11-21 16:46 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On 11/21/23 11:27, Stewart Hildebrand wrote:
> On 11/21/23 10:18, Roger Pau Monné wrote:
>> On Tue, Nov 21, 2023 at 10:03:01AM -0500, Stewart Hildebrand wrote:
>>> On 11/21/23 09:45, Roger Pau Monné wrote:
>>>> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>>>>> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>>>  
>>>>>  /*
>>>>>   * Perform a maybe partial write to a register.
>>>>> - *
>>>>> - * Note that this will only work for simple registers, if Xen needs to
>>>>> - * trap accesses to rw1c registers (like the status PCI header register)
>>>>> - * the logic in vpci_write will have to be expanded in order to correctly
>>>>> - * deal with them.
>>>>>   */
>>>>>  static void vpci_write_helper(const struct pci_dev *pdev,
>>>>>                                const struct vpci_register *r, unsigned int size,
>>>>>                                unsigned int offset, uint32_t data)
>>>>>  {
>>>>> +    uint32_t val = 0;
>>>>> +
>>>>>      ASSERT(size <= r->size);
>>>>>  
>>>>> -    if ( size != r->size )
>>>>> +    if ( (size != r->size) || r->ro_mask )
>>>>>      {
>>>>> -        uint32_t val;
>>>>> -
>>>>>          val = r->read(pdev, r->offset, r->private);
>>>>> +        val &= ~r->rw1c_mask;
>>>>>          data = merge_result(val, data, size, offset);
>>>>>      }
>>>>>  
>>>>> +    data &= ~(r->rsvdz_mask | r->ro_mask);
>>>>> +    data |= val & r->ro_mask;
>>>>
>>>> I've been thinking about this, and the way the ro_mask is implemented
>>>> (and the way we want to handle ro bits) is the same behavior as RsvdP.
>>>> I would suggest to rename the ro_mask to rsvdp_mask and note
>>>> that for resilience reasons we will handle RO bits as RsvdP.
>>>
>>> But the reads behave differently. RO should return the value, RsvdP should return 0 when read (according to the PCIe Base Spec 4.0).
>>
>> Hm, right, sorry for the wrong suggestion.  We should force bits to 0
>> for guests reads, but make sure that for writes the value on the
>> hardware is preserved.
>>
>> So we need the separate mask for RsvdP, which will be handled like
>> ro_mask in vpci_write_helper() and like rsvdz_mask in vpci_read().
> 
> Agreed. The only reason I didn't add RsvdP support in this patch was that it wasn't needed for the status register... Since RsvdP will be needed for the command register, I think I may as well implement it as part of this series, perhaps in a separate patch following this one. Stay tuned for v8.

I just remembered that RsvdP would actually be useful for the PCI_STATUS_CAP_LIST bit in the status register. So I'll implement it directly in this patch afterall, not a separate one.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-17 12:40   ` Roger Pau Monné
  2023-11-17 13:23     ` Jan Beulich
  2023-11-17 16:45     ` Roger Pau Monné
@ 2023-11-22 20:16     ` Stewart Hildebrand
  2023-11-23  8:14       ` Roger Pau Monné
  2 siblings, 1 reply; 21+ messages in thread
From: Stewart Hildebrand @ 2023-11-22 20:16 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On 11/17/23 07:40, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>> Introduce a handler for the PCI status register, with ability to mask the
>> capabilities bit. The status register contains RsvdZ bits, read-only bits, and
>> write-1-to-clear bits, so introduce bitmasks to handle these in vPCI. If a bit
>> in the bitmask is set, then the special meaning applies:
>>
>>   rsvdz_mask: read as zero, guest write ignore (write zero to hardware)
>>   ro_mask: read normal, guest write ignore (preserve on write to hardware)
>>   rw1c_mask: read normal, write 1 to clear
>>
>> The RsvdZ naming was borrowed from the PCI Express Base 4.0 specification.
>>
>> Xen preserves the value of read-only bits on write to hardware, discarding the
>> guests write value.
>>
>> The mask_cap_list flag will be set in a follow-on patch.
>>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> ---
>> v6->v7:
>> * re-work args passed to vpci_add_register_mask() (called in init_bars())
>> * also check for overlap of (rsvdz_mask & ro_mask) in add_register()
>> * slightly adjust masking operation in vpci_write_helper()
>>
>> v5->v6:
>> * remove duplicate PCI_STATUS_CAP_LIST in constant definition
>> * style fixup in constant definitions
>> * s/res_mask/rsvdz_mask/
>> * combine a new masking operation into single line
>> * preserve r/o bits on write
>> * get rid of status_read. Instead, use rsvdz_mask for conditionally masking
>>   PCI_STATUS_CAP_LIST bit
>> * add comment about PCI_STATUS_CAP_LIST and rsvdp behavior
>> * add sanity checks in add_register
>> * move mask_cap_list from struct vpci_header to local variable
>>
>> v4->v5:
>> * add support for res_mask
>> * add support for ro_mask (squash ro_mask patch)
>> * add constants for reserved, read-only, and rw1c masks
>>
>> v3->v4:
>> * move mask_cap_list setting to the capabilities patch
>> * single pci_conf_read16 in status_read
>> * align mask_cap_list bitfield in struct vpci_header
>> * change to rw1c bit mask instead of treating whole register as rw1c
>> * drop subsystem prefix on renamed add_register function
>>
>> v2->v3:
>> * new patch
>> ---
>>  xen/drivers/vpci/header.c  | 16 +++++++++++
>>  xen/drivers/vpci/vpci.c    | 55 +++++++++++++++++++++++++++++---------
>>  xen/include/xen/pci_regs.h |  9 +++++++
>>  xen/include/xen/vpci.h     |  8 ++++++
>>  4 files changed, 76 insertions(+), 12 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 767c1ba718d7..af267b75ac31 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -521,6 +521,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>      struct vpci_header *header = &pdev->vpci->header;
>>      struct vpci_bar *bars = header->bars;
>>      int rc;
>> +    bool mask_cap_list = false;
>>  
>>      switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>>      {
>> @@ -544,6 +545,21 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>      if ( rc )
>>          return rc;
>>  
>> +    /*
>> +     * Utilize rsvdz_mask to hide PCI_STATUS_CAP_LIST from the guest for now. If
>> +     * support for rsvdp (reserved & preserved) is added in the future, the
>> +     * rsvdp mask should be used instead.
>> +     */
>> +    rc = vpci_add_register_mask(pdev->vpci, vpci_hw_read16, vpci_hw_write16,
>> +                                PCI_STATUS, 2, NULL,
>> +                                PCI_STATUS_RSVDZ_MASK |
>> +                                    (mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
>> +                                PCI_STATUS_RO_MASK &
>> +                                    ~(mask_cap_list ? PCI_STATUS_CAP_LIST : 0),
>> +                                PCI_STATUS_RW1C_MASK);
>> +    if ( rc )
>> +        return rc;
>> +
>>      if ( pdev->ignore_bars )
>>          return 0;
>>  
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 3bec9a4153da..b4cde7db1b3f 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -29,6 +29,9 @@ struct vpci_register {
>>      unsigned int offset;
>>      void *private;
>>      struct list_head node;
>> +    uint32_t rsvdz_mask;
>> +    uint32_t ro_mask;
>> +    uint32_t rw1c_mask;
> 
> I understand that we need the rw1c_mask in order to properly merge
> values when doing partial writes, but the other fields I'm not sure we
> do need them.  RO bits don't care about what's written to them, and
> RsvdZ are always read as 0 and written as 0, so the mask shouldn't
> affect the merging.

See discussions at [1] and [2]. I'll keep ro_mask/rsvdz_mask, add rsvdp_mask, and expand the commit message. For another data point, qemu (hw/xen/xen_pt_config_init.c:xen_pt_emu_reg_header0) also has a similar way of masking bits.

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-11/msg01398.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2023-11/msg01693.html

> 
>>  };
>>  
>>  #ifdef __XEN__
>> @@ -145,9 +148,16 @@ uint32_t cf_check vpci_hw_read32(
>>      return pci_conf_read32(pdev->sbdf, reg);
>>  }
>>  
>> -int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>> -                      vpci_write_t *write_handler, unsigned int offset,
>> -                      unsigned int size, void *data)
>> +void cf_check vpci_hw_write16(
>> +    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
>> +{
>> +    pci_conf_write16(pdev->sbdf, reg, val);
>> +}
>> +
>> +static int add_register(struct vpci *vpci, vpci_read_t *read_handler,
>> +                        vpci_write_t *write_handler, unsigned int offset,
>> +                        unsigned int size, void *data, uint32_t rsvdz_mask,
>> +                        uint32_t ro_mask, uint32_t rw1c_mask)
>>  {
>>      struct list_head *prev;
>>      struct vpci_register *r;
>> @@ -155,7 +165,8 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>      /* Some sanity checks. */
>>      if ( (size != 1 && size != 2 && size != 4) ||
>>           offset >= PCI_CFG_SPACE_EXP_SIZE || (offset & (size - 1)) ||
>> -         (!read_handler && !write_handler) )
>> +         (!read_handler && !write_handler) || (rsvdz_mask & ro_mask) ||
>> +         (rsvdz_mask & rw1c_mask) || (ro_mask & rw1c_mask) )
>>          return -EINVAL;
>>  
>>      r = xmalloc(struct vpci_register);
>> @@ -167,6 +178,9 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>      r->size = size;
>>      r->offset = offset;
>>      r->private = data;
>> +    r->rsvdz_mask = rsvdz_mask & (0xffffffffU >> (32 - 8 * size));
>> +    r->ro_mask = ro_mask & (0xffffffffU >> (32 - 8 * size));
>> +    r->rw1c_mask = rw1c_mask & (0xffffffffU >> (32 - 8 * size));
>>  
>>      spin_lock(&vpci->lock);
>>  
>> @@ -193,6 +207,23 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>      return 0;
>>  }
>>  
>> +int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>> +                      vpci_write_t *write_handler, unsigned int offset,
>> +                      unsigned int size, void *data)
>> +{
>> +    return add_register(vpci, read_handler, write_handler, offset, size, data,
>> +                        0, 0, 0);
>> +}
>> +
>> +int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
>> +                           vpci_write_t *write_handler, unsigned int offset,
>> +                           unsigned int size, void *data, uint32_t rsvdz_mask,
>> +                           uint32_t ro_mask, uint32_t rw1c_mask)
>> +{
>> +    return add_register(vpci, read_handler, write_handler, offset, size, data,
>> +                        rsvdz_mask, ro_mask, rw1c_mask);
>> +}
>> +
>>  int vpci_remove_register(struct vpci *vpci, unsigned int offset,
>>                           unsigned int size)
>>  {
>> @@ -376,6 +407,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>          }
>>  
>>          val = r->read(pdev, r->offset, r->private);
>> +        val &= ~r->rsvdz_mask;
>>  
>>          /* Check if the read is in the middle of a register. */
>>          if ( r->offset < emu.offset )
>> @@ -407,26 +439,25 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>  
>>  /*
>>   * Perform a maybe partial write to a register.
>> - *
>> - * Note that this will only work for simple registers, if Xen needs to
>> - * trap accesses to rw1c registers (like the status PCI header register)
>> - * the logic in vpci_write will have to be expanded in order to correctly
>> - * deal with them.
>>   */
>>  static void vpci_write_helper(const struct pci_dev *pdev,
>>                                const struct vpci_register *r, unsigned int size,
>>                                unsigned int offset, uint32_t data)
>>  {
>> +    uint32_t val = 0;
>> +
>>      ASSERT(size <= r->size);
>>  
>> -    if ( size != r->size )
>> +    if ( (size != r->size) || r->ro_mask )
> 
> Hm, I'm not sure this specific handling for read-only bits is
> required, software is allowed to write either 0 or 1 to read-only
> bits, but the write will be ignored.

See [1].

> 
>>      {
>> -        uint32_t val;
>> -
>>          val = r->read(pdev, r->offset, r->private);
>> +        val &= ~r->rw1c_mask;
>>          data = merge_result(val, data, size, offset);
>>      }
>>  
>> +    data &= ~(r->rsvdz_mask | r->ro_mask);
>> +    data |= val & r->ro_mask;
> 
> You cannot apply the register masks directly into the final value, you
> need to offset and mask them as necessary, likewise for val, see
> what's done in merge_result().

See [3].

[3] https://lists.xenproject.org/archives/html/xen-devel/2023-11/msg01412.html

> 
> Regardless of the offset issue, I think the usage of val with the
> ro_mask is bogus here, see above.

See [1].

> 
>> +
>>      r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
>>               r->private);
>>  }
>> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
>> index 84b18736a85d..b72131729db6 100644
>> --- a/xen/include/xen/pci_regs.h
>> +++ b/xen/include/xen/pci_regs.h
>> @@ -66,6 +66,15 @@
>>  #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
>>  #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
>>  #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
>> +#define  PCI_STATUS_RSVDZ_MASK		0x0006
> 
> In my copy of the PCIe 6 spec bit 6 is also RsvdZ, so the mask should
> be 0x46.

Right, mine too. It's probably safer to follow the newer version of the spec, so I'll make the change. For completeness / archaeology purposes, I just want to document some relevant data points here.

In PCIe 4 spec, it says this about bit 6:
"These bits were used in previous versions of the programming model. Careful consideration should be given to any attempt to repurpose them."

Going further back, PCI (old school PCI, not Express) spec 3.0 says this about bit 6:
"This bit is reserved. *"
"* In Revision 2.1 of this specification, this bit was used to indicate whether or not a device supported User Definable Features."

Just above in our pci_regs.h (and equally in Linux include/uapi/linux/pci_regs.h) we have this definition for bit 6:
#define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */

Qemu hw/xen/xen_pt_config_init.c treats bit 6 as RO:
        .ro_mask    = 0x06F8,

> 
> But I'm unsure we really need this mask.
> 
> Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-17 13:33   ` Roger Pau Monné
@ 2023-11-22 22:04     ` Stewart Hildebrand
  0 siblings, 0 replies; 21+ messages in thread
From: Stewart Hildebrand @ 2023-11-22 22:04 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On 11/17/23 08:33, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>> +int vpci_add_register_mask(struct vpci *vpci, vpci_read_t *read_handler,
>> +                           vpci_write_t *write_handler, unsigned int offset,
>> +                           unsigned int size, void *data, uint32_t rsvdz_mask,
>> +                           uint32_t ro_mask, uint32_t rw1c_mask)
> 
> Forgot to mention, it would be good if you can add some tests in
> tools/tests/vpci that ensure the masks work as expected.

Will do

> 
> Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-22 20:16     ` Stewart Hildebrand
@ 2023-11-23  8:14       ` Roger Pau Monné
  2023-11-23 12:57         ` Stewart Hildebrand
  0 siblings, 1 reply; 21+ messages in thread
From: Roger Pau Monné @ 2023-11-23  8:14 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On Wed, Nov 22, 2023 at 03:16:29PM -0500, Stewart Hildebrand wrote:
> On 11/17/23 07:40, Roger Pau Monné wrote:
> > On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> >>      r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
> >>               r->private);
> >>  }
> >> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
> >> index 84b18736a85d..b72131729db6 100644
> >> --- a/xen/include/xen/pci_regs.h
> >> +++ b/xen/include/xen/pci_regs.h
> >> @@ -66,6 +66,15 @@
> >>  #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
> >>  #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
> >>  #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
> >> +#define  PCI_STATUS_RSVDZ_MASK		0x0006
> > 
> > In my copy of the PCIe 6 spec bit 6 is also RsvdZ, so the mask should
> > be 0x46.
> 
> Right, mine too. It's probably safer to follow the newer version of the spec, so I'll make the change. For completeness / archaeology purposes, I just want to document some relevant data points here.
> 
> In PCIe 4 spec, it says this about bit 6:
> "These bits were used in previous versions of the programming model. Careful consideration should be given to any attempt to repurpose them."
> 
> Going further back, PCI (old school PCI, not Express) spec 3.0 says this about bit 6:
> "This bit is reserved. *"
> "* In Revision 2.1 of this specification, this bit was used to indicate whether or not a device supported User Definable Features."
> 
> Just above in our pci_regs.h (and equally in Linux include/uapi/linux/pci_regs.h) we have this definition for bit 6:
> #define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */
> 
> Qemu hw/xen/xen_pt_config_init.c treats bit 6 as RO:
>         .ro_mask    = 0x06F8,

Right, given the implementation of ro_mask that would likely be fine.
Reading unconditionally as 0 while preserving the value on writes
seems the safest option.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-23  8:14       ` Roger Pau Monné
@ 2023-11-23 12:57         ` Stewart Hildebrand
  2023-11-23 14:21           ` Roger Pau Monné
  0 siblings, 1 reply; 21+ messages in thread
From: Stewart Hildebrand @ 2023-11-23 12:57 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On 11/23/23 03:14, Roger Pau Monné wrote:
> On Wed, Nov 22, 2023 at 03:16:29PM -0500, Stewart Hildebrand wrote:
>> On 11/17/23 07:40, Roger Pau Monné wrote:
>>> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
>>>>      r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
>>>>               r->private);
>>>>  }
>>>> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
>>>> index 84b18736a85d..b72131729db6 100644
>>>> --- a/xen/include/xen/pci_regs.h
>>>> +++ b/xen/include/xen/pci_regs.h
>>>> @@ -66,6 +66,15 @@
>>>>  #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
>>>>  #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
>>>>  #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
>>>> +#define  PCI_STATUS_RSVDZ_MASK		0x0006
>>>
>>> In my copy of the PCIe 6 spec bit 6 is also RsvdZ, so the mask should
>>> be 0x46.
>>
>> Right, mine too. It's probably safer to follow the newer version of the spec, so I'll make the change. For completeness / archaeology purposes, I just want to document some relevant data points here.
>>
>> In PCIe 4 spec, it says this about bit 6:
>> "These bits were used in previous versions of the programming model. Careful consideration should be given to any attempt to repurpose them."
>>
>> Going further back, PCI (old school PCI, not Express) spec 3.0 says this about bit 6:
>> "This bit is reserved. *"
>> "* In Revision 2.1 of this specification, this bit was used to indicate whether or not a device supported User Definable Features."
>>
>> Just above in our pci_regs.h (and equally in Linux include/uapi/linux/pci_regs.h) we have this definition for bit 6:
>> #define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */
>>
>> Qemu hw/xen/xen_pt_config_init.c treats bit 6 as RO:
>>         .ro_mask    = 0x06F8,
> 
> Right, given the implementation of ro_mask that would likely be fine.
> Reading unconditionally as 0 while preserving the value on writes
> seems the safest option.

That would mean treating bit 6 as RsvdP, even though the PCIe 6 spec says RsvdZ. I just want to confirm this is indeed the intent since we both said RsvdZ just a moment ago? If so, I would add a comment since it's deviating from spec. I would personally still vote in favor of following PCIe 6 spec (RsvdZ).


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 1/2] xen/vpci: header: status register handler
  2023-11-23 12:57         ` Stewart Hildebrand
@ 2023-11-23 14:21           ` Roger Pau Monné
  0 siblings, 0 replies; 21+ messages in thread
From: Roger Pau Monné @ 2023-11-23 14:21 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On Thu, Nov 23, 2023 at 07:57:21AM -0500, Stewart Hildebrand wrote:
> On 11/23/23 03:14, Roger Pau Monné wrote:
> > On Wed, Nov 22, 2023 at 03:16:29PM -0500, Stewart Hildebrand wrote:
> >> On 11/17/23 07:40, Roger Pau Monné wrote:
> >>> On Wed, Sep 13, 2023 at 10:35:46AM -0400, Stewart Hildebrand wrote:
> >>>>      r->write(pdev, r->offset, data & (0xffffffffU >> (32 - 8 * r->size)),
> >>>>               r->private);
> >>>>  }
> >>>> diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
> >>>> index 84b18736a85d..b72131729db6 100644
> >>>> --- a/xen/include/xen/pci_regs.h
> >>>> +++ b/xen/include/xen/pci_regs.h
> >>>> @@ -66,6 +66,15 @@
> >>>>  #define  PCI_STATUS_REC_MASTER_ABORT	0x2000 /* Set on master abort */
> >>>>  #define  PCI_STATUS_SIG_SYSTEM_ERROR	0x4000 /* Set when we drive SERR */
> >>>>  #define  PCI_STATUS_DETECTED_PARITY	0x8000 /* Set on parity error */
> >>>> +#define  PCI_STATUS_RSVDZ_MASK		0x0006
> >>>
> >>> In my copy of the PCIe 6 spec bit 6 is also RsvdZ, so the mask should
> >>> be 0x46.
> >>
> >> Right, mine too. It's probably safer to follow the newer version of the spec, so I'll make the change. For completeness / archaeology purposes, I just want to document some relevant data points here.
> >>
> >> In PCIe 4 spec, it says this about bit 6:
> >> "These bits were used in previous versions of the programming model. Careful consideration should be given to any attempt to repurpose them."
> >>
> >> Going further back, PCI (old school PCI, not Express) spec 3.0 says this about bit 6:
> >> "This bit is reserved. *"
> >> "* In Revision 2.1 of this specification, this bit was used to indicate whether or not a device supported User Definable Features."
> >>
> >> Just above in our pci_regs.h (and equally in Linux include/uapi/linux/pci_regs.h) we have this definition for bit 6:
> >> #define  PCI_STATUS_UDF         0x40    /* Support User Definable Features [obsolete] */
> >>
> >> Qemu hw/xen/xen_pt_config_init.c treats bit 6 as RO:
> >>         .ro_mask    = 0x06F8,
> > 
> > Right, given the implementation of ro_mask that would likely be fine.
> > Reading unconditionally as 0 while preserving the value on writes
> > seems the safest option.
> 
> That would mean treating bit 6 as RsvdP, even though the PCIe 6 spec
> says RsvdZ. I just want to confirm this is indeed the intent since
> we both said RsvdZ just a moment ago? If so, I would add a comment
> since it's deviating from spec. I would personally still vote in
> favor of following PCIe 6 spec (RsvdZ).

Hm, yes, lets go with the spec and use RsdvZ.  I very much doubt we
passthrough any device that actually uses the UDF bit.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 2/2] xen/vpci: header: filter PCI capabilities
  2023-11-17 11:44   ` Roger Pau Monné
@ 2023-11-28 16:04     ` Stewart Hildebrand
  0 siblings, 0 replies; 21+ messages in thread
From: Stewart Hildebrand @ 2023-11-28 16:04 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

On 11/17/23 06:44, Roger Pau Monné wrote:
> On Wed, Sep 13, 2023 at 10:35:47AM -0400, Stewart Hildebrand wrote:
>> Currently, Xen vPCI only supports virtualizing the MSI and MSI-X capabilities.
>> Hide all other PCI capabilities (including extended capabilities) from domUs for
>> now, even though there may be certain devices/drivers that depend on being able
>> to discover certain capabilities.
>>
>> We parse the physical PCI capabilities linked list and add vPCI register
>> handlers for the next elements, inserting our own next value, thus presenting a
>> modified linked list to the domU.
>>
>> Introduce helper functions vpci_hw_read8 and vpci_read_val. The vpci_read_val
>> helper function returns a fixed value, which may be used for RAZ registers, or
>> registers whose value doesn't change.
>>
>> Introduce pci_find_next_cap_ttl() helper while adapting the logic from
>> pci_find_next_cap() to suit our needs, and implement the existing
>> pci_find_next_cap() in terms of the new helper.
>>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v6->v7:
>> * no change
>>
>> v5->v6:
>> * add register handlers before status register handler in init_bars()
>> * s/header->mask_cap_list/mask_cap_list/
>>
>> v4->v5:
>> * use more appropriate types, continued
>> * get rid of unnecessary hook function
>> * add Jan's R-b
>>
>> v3->v4:
>> * move mask_cap_list setting to this patch
>> * leave pci_find_next_cap signature alone
>> * use more appropriate types
>>
>> v2->v3:
>> * get rid of > 0 in loop condition
>> * implement pci_find_next_cap in terms of new pci_find_next_cap_ttl function so
>>   that hypothetical future callers wouldn't be required to pass &ttl.
>> * change NULL to (void *)0 for RAZ value passed to vpci_read_val
>> * change type of ttl to unsigned int
>> * remember to mask off the low 2 bits of next in the initial loop iteration
>> * change return type of pci_find_next_cap and pci_find_next_cap_ttl
>> * avoid wrapping the PCI_STATUS_CAP_LIST condition by using ! instead of == 0
>>
>> v1->v2:
>> * change type of ttl to int
>> * use switch statement instead of if/else
>> * adapt existing pci_find_next_cap helper instead of rolling our own
>> * pass ttl as in/out
>> * "pass through" the lower 2 bits of the next pointer
>> * squash helper functions into this patch to avoid transient dead code situation
>> * extended capabilities RAZ/WI
>> ---
>>  xen/drivers/pci/pci.c     | 26 +++++++++-----
>>  xen/drivers/vpci/header.c | 76 +++++++++++++++++++++++++++++++++++++++
>>  xen/drivers/vpci/vpci.c   | 12 +++++++
>>  xen/include/xen/pci.h     |  3 ++
>>  xen/include/xen/vpci.h    |  5 +++
>>  5 files changed, 113 insertions(+), 9 deletions(-)
>>
>> diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
>> index 3569ccb24e9e..8799d60c2156 100644
>> --- a/xen/drivers/pci/pci.c
>> +++ b/xen/drivers/pci/pci.c
>> @@ -39,31 +39,39 @@ unsigned int pci_find_cap_offset(pci_sbdf_t sbdf, unsigned int cap)
>>      return 0;
>>  }
>>  
>> -unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
>> -                               unsigned int cap)
>> +unsigned int pci_find_next_cap_ttl(pci_sbdf_t sbdf, unsigned int pos,
>> +                                   bool (*is_match)(unsigned int),
>> +                                   unsigned int cap, unsigned int *ttl)
> 
> Maybe this has been discussed in previous patch versions, but why
> pass a match function instead of expanding the cap parameter to
> be an array of capabilities to search for?

Hm. I don't think we did discuss it previously. It's simple enough to change to an array, so I'll do this for v8.

> 
> I find it kind of weird to be able to pass both a specific capability
> to match against and also a match function.

Having two ways to specify it was a way to retain compatibility with the existing function pci_find_next_cap() without having to introduce another match function. But I'll change it to an array now.

> 
> What the expected behavior if the caller provides both a match
> function and a cap value?
> 
>>  {
>> -    u8 id;
>> -    int ttl = 48;
>> +    unsigned int id;
>>  
>> -    while ( ttl-- )
>> +    while ( (*ttl)-- )
>>      {
>>          pos = pci_conf_read8(sbdf, pos);
>>          if ( pos < 0x40 )
>>              break;
>>  
>> -        pos &= ~3;
>> -        id = pci_conf_read8(sbdf, pos + PCI_CAP_LIST_ID);
>> +        id = pci_conf_read8(sbdf, (pos & ~3) + PCI_CAP_LIST_ID);
>>  
>>          if ( id == 0xff )
>>              break;
>> -        if ( id == cap )
>> +        if ( (is_match && is_match(id)) || (!is_match && id == cap) )
>>              return pos;
>>  
>> -        pos += PCI_CAP_LIST_NEXT;
>> +        pos = (pos & ~3) + PCI_CAP_LIST_NEXT;
>>      }
>> +
>>      return 0;
>>  }
>>  
>> +unsigned int pci_find_next_cap(pci_sbdf_t sbdf, unsigned int pos,
>> +                               unsigned int cap)
>> +{
>> +    unsigned int ttl = 48;
>> +
>> +    return pci_find_next_cap_ttl(sbdf, pos, NULL, cap, &ttl) & ~3;
>> +}
>> +
>>  /**
>>   * pci_find_ext_capability - Find an extended capability
>>   * @sbdf: PCI device to query
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index af267b75ac31..1e7dfe668ccf 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -513,6 +513,18 @@ static void cf_check rom_write(
>>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
>>  }
>>  
>> +static bool cf_check vpci_cap_supported(unsigned int id)
>> +{
>> +    switch ( id )
>> +    {
>> +    case PCI_CAP_ID_MSI:
>> +    case PCI_CAP_ID_MSIX:
>> +        return true;
>> +    default:
>> +        return false;
>> +    }
>> +}
>> +
>>  static int cf_check init_bars(struct pci_dev *pdev)
>>  {
>>      uint16_t cmd;
>> @@ -545,6 +557,70 @@ static int cf_check init_bars(struct pci_dev *pdev)
> 
> We might have to rename this to init_header() now :).
> 
>>      if ( rc )
>>          return rc;
>>  
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +    {
>> +        if ( !(pci_conf_read16(pdev->sbdf, PCI_STATUS) & PCI_STATUS_CAP_LIST) )
>> +        {
>> +            /* RAZ/WI */
> 
> That RAZ/WI acronym seems very Arm specific (TBH I had to search for
> it).
> 
> FWIW, it's my understanding that if the status register doesn't report
> the capability list support, the register is unimplemented, and hence
> would be fine to return ~0 from reads of PCI_CAPABILITY_LIST?
> 
> IOW: I'm not sure we need to add this handler for PCI_CAPABILITY_LIST
> if it's not supported.

Agreed, if the hardware itself doesn't have the PCI_STATUS_CAP_LIST bit set, there little to no point in emulating the PCI_CAPABILITY_LIST register. I'll fix this up for v8.

> 
> Thanks, Roger.


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-11-28 16:05 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-13 14:35 [PATCH v7 0/2] vPCI capabilities filtering Stewart Hildebrand
2023-09-13 14:35 ` [PATCH v7 1/2] xen/vpci: header: status register handler Stewart Hildebrand
2023-09-14 11:12   ` Jan Beulich
2023-11-17 12:40   ` Roger Pau Monné
2023-11-17 13:23     ` Jan Beulich
2023-11-17 13:45       ` Roger Pau Monné
2023-11-17 16:45     ` Roger Pau Monné
2023-11-22 20:16     ` Stewart Hildebrand
2023-11-23  8:14       ` Roger Pau Monné
2023-11-23 12:57         ` Stewart Hildebrand
2023-11-23 14:21           ` Roger Pau Monné
2023-11-17 13:33   ` Roger Pau Monné
2023-11-22 22:04     ` Stewart Hildebrand
2023-11-21 14:45   ` Roger Pau Monné
2023-11-21 15:03     ` Stewart Hildebrand
2023-11-21 15:18       ` Roger Pau Monné
2023-11-21 16:27         ` Stewart Hildebrand
2023-11-21 16:46           ` Stewart Hildebrand
2023-09-13 14:35 ` [PATCH v7 2/2] xen/vpci: header: filter PCI capabilities Stewart Hildebrand
2023-11-17 11:44   ` Roger Pau Monné
2023-11-28 16:04     ` Stewart Hildebrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.