* [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
@ 2023-03-25  2:49 Marek Marczykowski-Górecki
  2023-03-25  2:49 ` [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-03-25  2:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Marek Marczykowski-Górecki, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu

QEMU needs to know whether a write clearing the mask bit of a vector really
changes anything, or whether the bit was already clear before. Currently Xen
forwards to the device model only writes that clear the bit, not those that
set it, so QEMU cannot tell the difference. Because of that, QEMU works
around the issue by checking the state via /dev/mem, which isn't the proper
approach.

Give QEMU all the necessary information by forwarding all vector control
writes, including those that mask a vector. This means also forwarding
writes that don't change the value, but as tested on both Linux (6.1.12) and
Windows (10 Pro), neither does excessive writes of unchanged values
(Windows seems to clear the mask bit twice in some cases, but not more).

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v2:
 - passthrough quad writes to emulator too (Jan)
 - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
   #define for this magic value

This behavior change needs to be surfaced to the device model somehow,
so it knows whether it can rely on it. I'm open to suggestions.
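
As an illustration only (not part of the patch), this is roughly what a
device model could do once every vector control write is forwarded; the
structure and helper below are hypothetical, only
PCI_MSIX_ENTRY_CTRL_MASKBIT matches the standard layout (bit 0 of the
Vector Control dword at offset 12 of each 16-byte entry):

#include <stdbool.h>
#include <stdint.h>

#define PCI_MSIX_ENTRY_CTRL_MASKBIT 0x1

/* Hypothetical per-vector state kept by the device model. */
struct vec_state {
    uint32_t ctrl;   /* last Vector Control value observed */
};

/* Called for every forwarded write to a Vector Control dword. */
static void ctrl_write(struct vec_state *vs, uint32_t val)
{
    bool was_masked = vs->ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT;
    bool is_masked  = val & PCI_MSIX_ENTRY_CTRL_MASKBIT;

    vs->ctrl = val;

    if ( was_masked && !is_masked )
        ; /* genuine unmask: (re)program and unmask the host-side entry */
    else if ( !was_masked && is_masked )
        ; /* genuine mask */
    /* otherwise: value unchanged, nothing to do */
}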
---
 xen/arch/x86/hvm/vmsi.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 3cd4923060c8..9c82bf9b4ec2 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -272,6 +272,15 @@ out:
     return r;
 }
 
+/*
+ * This function returns X86EMUL_UNHANDLEABLE even if write is properly
+ * handled, to propagate it to the device model (so it can keep its internal
+ * state in sync).
+ * len==0 means really len==4, but as a write completion that will return
+ * X86EMUL_OKAY on successful processing. Use WRITE_LEN4_COMPLETION to make it
+ * less confusing.
+ */
+#define WRITE_LEN4_COMPLETION 0
 static int msixtbl_write(struct vcpu *v, unsigned long address,
                          unsigned int len, unsigned long val)
 {
@@ -283,9 +292,6 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
     unsigned long flags;
     struct irq_desc *desc;
 
-    if ( (len != 4 && len != 8) || (address & (len - 1)) )
-        return r;
-
     rcu_read_lock(&msixtbl_rcu_lock);
 
     entry = msixtbl_find_entry(v, address);
@@ -345,7 +351,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long address,
 
 unlock:
     spin_unlock_irqrestore(&desc->lock, flags);
-    if ( len == 4 )
+    if ( len == WRITE_LEN4_COMPLETION )
         r = X86EMUL_OKAY;
 
 out:
@@ -357,6 +363,9 @@ static int cf_check _msixtbl_write(
     const struct hvm_io_handler *handler, uint64_t address, uint32_t len,
     uint64_t val)
 {
+    if ( (len != 4 && len != 8) || (address & (len - 1)) )
+        return X86EMUL_UNHANDLEABLE;
+
     return msixtbl_write(current, address, len, val);
 }
 
@@ -635,7 +644,7 @@ void msix_write_completion(struct vcpu *v)
         return;
 
     v->arch.hvm.hvm_io.msix_unmask_address = 0;
-    if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
+    if ( msixtbl_write(v, ctrl_address, WRITE_LEN4_COMPLETION, 0) != X86EMUL_OKAY )
         gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");
 }
 
-- 
2.39.2




* [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-03-25  2:49 [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
@ 2023-03-25  2:49 ` Marek Marczykowski-Górecki
  2023-03-27 10:47   ` Andrew Cooper
  2023-03-28 11:28   ` Roger Pau Monné
  2023-03-25  2:49 ` [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot Marek Marczykowski-Górecki
  2023-03-27 10:12 ` [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Roger Pau Monné
  2 siblings, 2 replies; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-03-25  2:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Marek Marczykowski-Górecki, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu

Some devices (notably the Intel Wifi 6 AX210 card) keep auxiliary registers
on the same page as the MSI-X table. The device model (especially one in a
stubdomain) cannot really handle those, as direct writes to that page are
refused (the page is on the mmio_ro_ranges list). Instead, add an internal
ioreq server that handles those writes.

Doing this requires correlating the write location with the guest's view
of the MSI-X table address. Since QEMU doesn't map the MSI-X table into the
guest, this relies on msixtbl_entry->gtable, which is HVM-only. A similar
feature for PV would need to be done separately.

This can also be used to read the Pending Bit Array, if it lives on the same
page, so QEMU doesn't need /dev/mem access at all (especially helpful
with lockdown enabled in dom0). If the PBA lives on another page, QEMU will
map it into the guest directly.
If the PBA lives on the same page, forbid writes and log a message.
Technically, writes outside of the PBA could be allowed, but at the moment
the precise location of the PBA isn't saved.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v2:
 - adjust commit message
 - pass struct domain to msixtbl_page_handler_get_hwaddr()
 - reduce local variables used only once
 - log a warning if a write is forbidden because the MSI-X table and PBA
   live on the same page
 - do not passthrough unaligned accesses
 - handle accesses both before and after MSI-X table
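
As an aside, a standalone sketch (made-up addresses) of the adjacency
classification implemented below: a 16-entry table (256 bytes) starting at
offset 0x100 of its page, with one access below the table and one past its
end on the same page:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT  12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

int main(void)
{
    uint64_t gtable = 0xf2000100ULL;        /* guest MSI-X table address */
    uint64_t table_len = 16 * 16;           /* 16 entries, 16 bytes each */
    uint64_t before = 0xf20000c0ULL;        /* same page, below the table */
    uint64_t after  = 0xf2000240ULL;        /* same page, past the table */

    /* ADJ_IDX_FIRST case: same page as the table start, but below it */
    printf("first: %d\n",
           PFN_DOWN(before) == PFN_DOWN(gtable) && before < gtable);

    /* ADJ_IDX_LAST case: same page as the table end, but at/after it */
    printf("last: %d\n",
           PFN_DOWN(after) == PFN_DOWN(gtable + table_len) &&
           after >= gtable + table_len);

    return 0;
}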
---
 xen/arch/x86/hvm/vmsi.c        | 154 +++++++++++++++++++++++++++++++++
 xen/arch/x86/include/asm/msi.h |   5 ++
 xen/arch/x86/msi.c             |  38 ++++++++
 3 files changed, 197 insertions(+)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 9c82bf9b4ec2..9293009a4075 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = {
     .write = _msixtbl_write,
 };
 
+static void __iomem *msixtbl_page_handler_get_hwaddr(
+    const struct domain *d,
+    uint64_t address,
+    bool write)
+{
+    const struct pci_dev *pdev = NULL;
+    const struct msixtbl_entry *entry;
+    int adj_type;
+
+    rcu_read_lock(&msixtbl_rcu_lock);
+    /*
+     * Check if it's on the same page as the end of the MSI-X table, but
+     * outside of the table itself.
+     */
+    list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
+    {
+        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) &&
+             address < entry->gtable )
+        {
+            adj_type = ADJ_IDX_FIRST;
+            pdev = entry->pdev;
+            break;
+        }
+        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) &&
+             address >= entry->gtable + entry->table_len )
+        {
+            adj_type = ADJ_IDX_LAST;
+            pdev = entry->pdev;
+            break;
+        }
+    }
+    rcu_read_unlock(&msixtbl_rcu_lock);
+
+    if ( !pdev )
+        return NULL;
+
+    ASSERT(pdev->msix);
+
+    if ( !pdev->msix->adj_access_table_idx[adj_type] )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Page for adjacent MSI-X table access not initialized for %pp\n",
+                 pdev);
+
+        return NULL;
+    }
+
+    /* If PBA lives on the same page too, forbid writes. */
+    if ( write && (
+        (adj_type == ADJ_IDX_LAST &&
+         pdev->msix->table.last == pdev->msix->pba.first) ||
+        (adj_type == ADJ_IDX_FIRST &&
+         pdev->msix->table.first == pdev->msix->pba.last)) )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "MSI-X table and PBA of %pp live on the same page, "
+                 "writing to other registers there is not implemented\n",
+                 pdev);
+        return NULL;
+    }
+
+    return fix_to_virt(pdev->msix->adj_access_table_idx[adj_type]) +
+        (address & (PAGE_SIZE - 1));
+
+}
+
+static bool cf_check msixtbl_page_accept(
+        const struct hvm_io_handler *handler, const ioreq_t *r)
+{
+    ASSERT(r->type == IOREQ_TYPE_COPY);
+
+    return msixtbl_page_handler_get_hwaddr(
+            current->domain, r->addr, r->dir == IOREQ_WRITE);
+}
+
+static int cf_check msixtbl_page_read(
+        const struct hvm_io_handler *handler,
+        uint64_t address, uint32_t len, uint64_t *pval)
+{
+    void __iomem *hwaddr;
+
+    if ( address & (len - 1) )
+        return X86EMUL_UNHANDLEABLE;
+
+    hwaddr = msixtbl_page_handler_get_hwaddr(
+            current->domain, address, false);
+
+    if ( !hwaddr )
+        return X86EMUL_UNHANDLEABLE;
+
+    switch ( len )
+    {
+    case 1:
+        *pval = readb(hwaddr);
+        break;
+    case 2:
+        *pval = readw(hwaddr);
+        break;
+    case 4:
+        *pval = readl(hwaddr);
+        break;
+    case 8:
+        *pval = readq(hwaddr);
+        break;
+    default:
+        return X86EMUL_UNHANDLEABLE;
+    }
+    return X86EMUL_OKAY;
+}
+
+static int cf_check msixtbl_page_write(
+        const struct hvm_io_handler *handler,
+        uint64_t address, uint32_t len, uint64_t val)
+{
+    void __iomem *hwaddr = msixtbl_page_handler_get_hwaddr(
+            current->domain, address, true);
+
+    if ( !hwaddr )
+        return X86EMUL_UNHANDLEABLE;
+
+    switch ( len )
+    {
+    case 1:
+        writeb(val, hwaddr);
+        break;
+    case 2:
+        writew(val, hwaddr);
+        break;
+    case 4:
+        writel(val, hwaddr);
+        break;
+    case 8:
+        writeq(val, hwaddr);
+        break;
+    default:
+        return X86EMUL_UNHANDLEABLE;
+    }
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops msixtbl_mmio_page_ops = {
+    .accept = msixtbl_page_accept,
+    .read = msixtbl_page_read,
+    .write = msixtbl_page_write,
+};
+
 static void add_msixtbl_entry(struct domain *d,
                               struct pci_dev *pdev,
                               uint64_t gtable,
@@ -593,6 +739,14 @@ void msixtbl_init(struct domain *d)
         handler->type = IOREQ_TYPE_COPY;
         handler->ops = &msixtbl_mmio_ops;
     }
+
+    /* passthrough access to other registers on the same page */
+    handler = hvm_next_io_handler(d);
+    if ( handler )
+    {
+        handler->type = IOREQ_TYPE_COPY;
+        handler->ops = &msixtbl_mmio_page_ops;
+    }
 }
 
 void msixtbl_pt_cleanup(struct domain *d)
diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index a53ade95c9ad..d13cf1c1f873 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -207,6 +207,10 @@ struct msg_address {
                                        PCI_MSIX_ENTRY_SIZE + \
                                        (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
 
+/* indexes in adj_access_table_idx[] below */
+#define ADJ_IDX_FIRST 0
+#define ADJ_IDX_LAST  1
+
 struct arch_msix {
     unsigned int nr_entries, used_entries;
     struct {
@@ -214,6 +218,7 @@ struct arch_msix {
     } table, pba;
     int table_refcnt[MAX_MSIX_TABLE_PAGES];
     int table_idx[MAX_MSIX_TABLE_PAGES];
+    int adj_access_table_idx[2];
     spinlock_t table_lock;
     bool host_maskall, guest_maskall;
     domid_t warned;
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index d0bf63df1def..680853f84685 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
                 domain_crash(d);
             /* XXX How to deal with existing mappings? */
         }
+
+        /*
+         * If the MSI-X table doesn't start at the page boundary, map the first page for
+         * passthrough accesses.
+         */
+        if ( table_paddr & (PAGE_SIZE - 1) )
+        {
+            int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
+
+            if ( idx >= 0 )
+                msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx;
+            else
+                gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: %d\n", idx);
+        }
+        /*
+         * If the MSI-X table doesn't span full page(s), map the last page for
+         * passthrough accesses.
+         */
+        if ( (table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) )
+        {
+            uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE;
+            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
+
+            if ( idx >= 0 )
+                msix->adj_access_table_idx[ADJ_IDX_LAST] = idx;
+            else
+                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx);
+        }
     }
     WARN_ON(msix->table.first != (table_paddr >> PAGE_SHIFT));
     ++msix->used_entries;
@@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
             WARN();
         msix->table.first = 0;
         msix->table.last = 0;
+        if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] )
+        {
+            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]);
+            msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0;
+        }
+        if ( msix->adj_access_table_idx[ADJ_IDX_LAST] )
+        {
+            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]);
+            msix->adj_access_table_idx[ADJ_IDX_LAST] = 0;
+        }
 
         if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
                                    msix->pba.last) )
-- 
2.39.2




* [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-25  2:49 [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
  2023-03-25  2:49 ` [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
@ 2023-03-25  2:49 ` Marek Marczykowski-Górecki
  2023-03-28 11:37   ` Roger Pau Monné
  2023-03-28 12:54   ` Jan Beulich
  2023-03-27 10:12 ` [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Roger Pau Monné
  2 siblings, 2 replies; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-03-25  2:49 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Marek Marczykowski-Górecki, Jan Beulich,
	Paul Durrant, Roger Pau Monné

Some firmware/devices are found to not reset MSI-X properly, leaving
MASKALL set. Xen relies on the initial state having both ENABLE and MASKALL
clear. In particular, pci_reset_msix_state() assumes that if MASKALL is set,
it was Xen that set it, due to msix->host_maskall or msix->guest_maskall.
Clearing just MASKALL might be unsafe if ENABLE is set, so clear both bits.

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
v2:
 - new patch
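
For reference (assuming the standard PCI definitions, matching Xen's
pci_regs.h), the two Message Control bits being cleared are:

/* MSI-X Message Control register bits (PCI spec) */
#define PCI_MSIX_FLAGS_MASKALL  0x4000  /* bit 14: mask all vectors */
#define PCI_MSIX_FLAGS_ENABLE   0x8000  /* bit 15: MSI-X enable */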
---
 xen/drivers/passthrough/msi.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/drivers/passthrough/msi.c b/xen/drivers/passthrough/msi.c
index ce1a450f6f4a..60adad47e379 100644
--- a/xen/drivers/passthrough/msi.c
+++ b/xen/drivers/passthrough/msi.c
@@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
         ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
         msix->nr_entries = msix_table_size(ctrl);
 
+        /*
+         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
+         * initial state.
+         */
+        ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
+        pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
+
         pdev->msix = msix;
     }
 
-- 
2.39.2




* Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-03-25  2:49 [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
  2023-03-25  2:49 ` [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
  2023-03-25  2:49 ` [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot Marek Marczykowski-Górecki
@ 2023-03-27 10:12 ` Roger Pau Monné
  2023-03-27 10:26   ` Marek Marczykowski-Górecki
  2023-03-27 15:32   ` Jan Beulich
  2 siblings, 2 replies; 37+ messages in thread
From: Roger Pau Monné @ 2023-03-27 10:12 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Andrew Cooper, Wei Liu

On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
> QEMU needs to know whether clearing maskbit of a vector is really
> clearing, or was already cleared before. Currently Xen sends only
> clearing that bit to the device model, but not setting it, so QEMU
> cannot detect it. Because of that, QEMU is working this around by
> checking via /dev/mem, but that isn't the proper approach.
> 
> Give all necessary information to QEMU by passing all ctrl writes,
> including masking a vector. This does include forwarding also writes
> that did not change the value, but as tested on both Linux (6.1.12) and
> Windows (10 pro), they don't do excessive writes of unchanged values
> (Windows seems to clear maskbit in some cases twice, but not more).

Since we pass all the accesses through to the device model, is the
handling in Xen still required?  It might be worth also exposing any
interfaces the device model needs, so that all the functionality done by
the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
forwarding the accesses anyway.

> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> v2:
>  - passthrough quad writes to emulator too (Jan)
>  - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
>    #define for this magic value
> 
> This behavior change needs to be surfaced to the device model somehow,
> so it knows whether it can rely on it. I'm open for suggestions.

Maybe exposed in XEN_DMOP_get_ioreq_server_info?

But I wonder whether it shouldn't be the other way around: the device
model tells Xen it doesn't need to handle any MSI-X accesses because
QEMU will take care of it, likely using a new flag in
XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
part of the gflags, but then we would need to assert that the flag is
passed for all MSI-X interrupts bound from that device to the same
domain.

Thanks, Roger.



* Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-03-27 10:12 ` [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Roger Pau Monné
@ 2023-03-27 10:26   ` Marek Marczykowski-Górecki
  2023-03-27 10:51     ` Roger Pau Monné
  2023-03-27 15:32   ` Jan Beulich
  1 sibling, 1 reply; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-03-27 10:26 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Andrew Cooper, Wei Liu

On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote:
> On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
> > QEMU needs to know whether clearing maskbit of a vector is really
> > clearing, or was already cleared before. Currently Xen sends only
> > clearing that bit to the device model, but not setting it, so QEMU
> > cannot detect it. Because of that, QEMU is working this around by
> > checking via /dev/mem, but that isn't the proper approach.
> > 
> > Give all necessary information to QEMU by passing all ctrl writes,
> > including masking a vector. This does include forwarding also writes
> > that did not change the value, but as tested on both Linux (6.1.12) and
> > Windows (10 pro), they don't do excessive writes of unchanged values
> > (Windows seems to clear maskbit in some cases twice, but not more).
> 
> Since we passthrough all the accesses to the device model, is the
> handling in Xen still required?  It might be worth to also expose any
> interfaces needed to the device model so all the functionality done by
> the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
> passing the accesses anyway.

This was discussed on v1 already. Such a QEMU would need to be able to do
the actual write. If it's running in a stubdomain, it would hit the exact
same issue again (the page is mapped R/O to it). In fact, that might be an
issue for dom0 too (I haven't checked).
I guess that could then use the subpage R/O feature I just posted, but it
would still mean intercepting the write twice (not really a performance
issue here, but rather convoluted handling overall).

> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > ---
> > v2:
> >  - passthrough quad writes to emulator too (Jan)
> >  - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
> >    #define for this magic value
> > 
> > This behavior change needs to be surfaced to the device model somehow,
> > so it knows whether it can rely on it. I'm open for suggestions.
> 
> Maybe exposed in XEN_DMOP_get_ioreq_server_info?
> 
> But I wonder whether it shouldn't be the other way arround, the device
> model tells Xen it doesn't need to handle any MSI-X accesses because
> QEMU will take care of it, likely using a new flag in
> XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
> part of the gflags, but then we would need to assert that the flag is
> passed for all MSI-X interrupts bound from that device to the same
> domain.

Is it a safe thing to do? I mean, doesn't Xen need to guard access to the
MSI-X configuration to ensure its safety, especially if no interrupt
remapping is available? It probably doesn't matter for the qemu-in-dom0
case, but it might matter both with a deprivileged qemu and with a stubdom.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



* Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-03-25  2:49 ` [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
@ 2023-03-27 10:47   ` Andrew Cooper
  2023-03-28 11:28   ` Roger Pau Monné
  1 sibling, 0 replies; 37+ messages in thread
From: Andrew Cooper @ 2023-03-27 10:47 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, xen-devel
  Cc: Jason Andryuk, Jan Beulich, Roger Pau Monné, Wei Liu

On 25/03/2023 2:49 am, Marek Marczykowski-Górecki wrote:
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 9c82bf9b4ec2..9293009a4075 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = {
>      .write = _msixtbl_write,
>  };
>  
> +static void __iomem *msixtbl_page_handler_get_hwaddr(
> +    const struct domain *d,
> +    uint64_t address,
> +    bool write)
> +{
> +    const struct pci_dev *pdev = NULL;
> +    const struct msixtbl_entry *entry;
> +    int adj_type;
> +
> +    rcu_read_lock(&msixtbl_rcu_lock);
> +    /*
> +     * Check if it's on the same page as the end of the MSI-X table, but
> +     * outside of the table itself.
> +     */
> +    list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> +    {
> +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) &&
> +             address < entry->gtable )
> +        {
> +            adj_type = ADJ_IDX_FIRST;
> +            pdev = entry->pdev;
> +            break;
> +        }
> +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) &&
> +             address >= entry->gtable + entry->table_len )
> +        {
> +            adj_type = ADJ_IDX_LAST;
> +            pdev = entry->pdev;
> +            break;
> +        }
> +    }
> +    rcu_read_unlock(&msixtbl_rcu_lock);
> +
> +    if ( !pdev )
> +        return NULL;
> +
> +    ASSERT(pdev->msix);
> +
> +    if ( !pdev->msix->adj_access_table_idx[adj_type] )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                 "Page for adjacent MSI-X table access not initialized for %pp\n",
> +                 pdev);

One minor observation: this should be &pdev->sbdf.

Otherwise things will go wrong if sbdf ever moves from being the first
element in a pdev.
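
Concretely, the suggestion amounts to (a sketch of the suggested change,
reusing the message from the patch):

    gdprintk(XENLOG_WARNING,
             "Page for adjacent MSI-X table access not initialized for %pp\n",
             &pdev->sbdf);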

~Andrew



* Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-03-27 10:26   ` Marek Marczykowski-Górecki
@ 2023-03-27 10:51     ` Roger Pau Monné
  2023-03-27 11:34       ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2023-03-27 10:51 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Andrew Cooper, Wei Liu

On Mon, Mar 27, 2023 at 12:26:05PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote:
> > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
> > > QEMU needs to know whether clearing maskbit of a vector is really
> > > clearing, or was already cleared before. Currently Xen sends only
> > > clearing that bit to the device model, but not setting it, so QEMU
> > > cannot detect it. Because of that, QEMU is working this around by
> > > checking via /dev/mem, but that isn't the proper approach.
> > > 
> > > Give all necessary information to QEMU by passing all ctrl writes,
> > > including masking a vector. This does include forwarding also writes
> > > that did not change the value, but as tested on both Linux (6.1.12) and
> > > Windows (10 pro), they don't do excessive writes of unchanged values
> > > (Windows seems to clear maskbit in some cases twice, but not more).
> > 
> > Since we passthrough all the accesses to the device model, is the
> > handling in Xen still required?  It might be worth to also expose any
> > interfaces needed to the device model so all the functionality done by
> > the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
> > passing the accesses anyway.
> 
> This was discussed on v1 already. Such QEMU would need to be able to do
> the actual write. If it's running in stubdomain, it would hit the exact
> issue again (page mapped R/O to it). In fact, that might be an issue for
> dom0 too (I haven't checked).

Oh, sorry, I likely missed that discussion, as I don't recall it.

Maybe we need a hypercall for QEMU to notify Xen of the masking/unmasking?
Any change to the other fields is already handled by QEMU.

> I guess that could use my subpage RO feature I just posted then, but it
> would still mean intercepting the write twice (not a performance issue
> really here, but rather convoluted handling in total).

Yes, that does seem way too convoluted.

> > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > ---
> > > v2:
> > >  - passthrough quad writes to emulator too (Jan)
> > >  - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
> > >    #define for this magic value
> > > 
> > > This behavior change needs to be surfaced to the device model somehow,
> > > so it knows whether it can rely on it. I'm open for suggestions.
> > 
> > Maybe exposed in XEN_DMOP_get_ioreq_server_info?
> > 
> > But I wonder whether it shouldn't be the other way arround, the device
> > model tells Xen it doesn't need to handle any MSI-X accesses because
> > QEMU will take care of it, likely using a new flag in
> > XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
> > part of the gflags, but then we would need to assert that the flag is
> > passed for all MSI-X interrupts bound from that device to the same
> > domain.
> 
> Is is safe thing to do? I mean, doesn't Xen need to guard access to
> MSI-X configuration to assure its safety, especially if no interrupt
> remapping is there? It probably doesn't matter for qemu in dom0 case,
> but both with deprivileged qemu and stubdom, it might matter.

Right - QEMU shouldn't write directly to the MSI-X table using
/dev/mem, but instead use a hypercall to notify Xen of the
{un,}masking of the MSI-X table entry.  I think that would allow us to
safely get rid of the extra logic in Xen to deal with MSI-X table
accesses.

Thanks, Roger.



* Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-03-27 10:51     ` Roger Pau Monné
@ 2023-03-27 11:34       ` Marek Marczykowski-Górecki
  2023-03-27 13:29         ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-03-27 11:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Andrew Cooper, Wei Liu

On Mon, Mar 27, 2023 at 12:51:09PM +0200, Roger Pau Monné wrote:
> On Mon, Mar 27, 2023 at 12:26:05PM +0200, Marek Marczykowski-Górecki wrote:
> > On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote:
> > > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
> > > > QEMU needs to know whether clearing maskbit of a vector is really
> > > > clearing, or was already cleared before. Currently Xen sends only
> > > > clearing that bit to the device model, but not setting it, so QEMU
> > > > cannot detect it. Because of that, QEMU is working this around by
> > > > checking via /dev/mem, but that isn't the proper approach.
> > > > 
> > > > Give all necessary information to QEMU by passing all ctrl writes,
> > > > including masking a vector. This does include forwarding also writes
> > > > that did not change the value, but as tested on both Linux (6.1.12) and
> > > > Windows (10 pro), they don't do excessive writes of unchanged values
> > > > (Windows seems to clear maskbit in some cases twice, but not more).
> > > 
> > > Since we passthrough all the accesses to the device model, is the
> > > handling in Xen still required?  It might be worth to also expose any
> > > interfaces needed to the device model so all the functionality done by
> > > the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
> > > passing the accesses anyway.
> > 
> > This was discussed on v1 already. Such QEMU would need to be able to do
> > the actual write. If it's running in stubdomain, it would hit the exact
> > issue again (page mapped R/O to it). In fact, that might be an issue for
> > dom0 too (I haven't checked).
> 
> Oh, sorry, likely missed that discussion, as I don't recall this.
> 
> Maybe we need an hypercall for QEMU to notify the masking/unmasking to
> Xen?  As any change on the other fields is already handled by QEMU.
> 
> > I guess that could use my subpage RO feature I just posted then, but it
> > would still mean intercepting the write twice (not a performance issue
> > really here, but rather convoluted handling in total).
> 
> Yes, that does seem way too convoluted.
> 
> > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > > ---
> > > > v2:
> > > >  - passthrough quad writes to emulator too (Jan)
> > > >  - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
> > > >    #define for this magic value
> > > > 
> > > > This behavior change needs to be surfaced to the device model somehow,
> > > > so it knows whether it can rely on it. I'm open for suggestions.
> > > 
> > > Maybe exposed in XEN_DMOP_get_ioreq_server_info?

Make the flags field an IN/OUT parameter (and not reuse the same bits)? Or
introduce a new field?

> > > 
> > > But I wonder whether it shouldn't be the other way arround, the device
> > > model tells Xen it doesn't need to handle any MSI-X accesses because
> > > QEMU will take care of it, likely using a new flag in
> > > XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
> > > part of the gflags, but then we would need to assert that the flag is
> > > passed for all MSI-X interrupts bound from that device to the same
> > > domain.
> > 
> > Is is safe thing to do? I mean, doesn't Xen need to guard access to
> > MSI-X configuration to assure its safety, especially if no interrupt
> > remapping is there? It probably doesn't matter for qemu in dom0 case,
> > but both with deprivileged qemu and stubdom, it might matter.
> 
> Right - QEMU shouldn't write directly to the MSI-X table using
> /dev/mem, but instead use an hypercall to notify Xen of the
> {un,}masking of the MSI-X table entry.  I think that would allow us to
> safely get rid of the extra logic in Xen to deal with MSI-X table
> accesses.

But the purpose of this series is to give the guest (or QEMU) more write
access to the MSI-X page, not less. If it weren't for this Intel AX
wifi, we could indeed translate everything to hypercalls in QEMU and not
worry about special handlers in Xen at all. But unfortunately, we do
need to handle writes to the same page outside of the MSI-X structures,
and QEMU can't be trusted to filter them properly (and otherwise be
given full write access to the page).

So, while I agree translating {un,}masking of individual vectors to
hypercalls could simplify MSI-X handling in general, I don't think it
helps in this particular case. That said, such a simplification would
involve:
1. Exposing the capability in Xen to QEMU
(XEN_DMOP_get_ioreq_server_info sounds reasonable).
2. QEMU notifying Xen it will handle masking too, somehow.
3. QEMU using xc_domain_update_msi_irq and XEN_DOMCTL_VMSI_X86_UNMASKED
to update Xen about the mask state too (a rough sketch follows below).
4. Xen no longer interpreting writes to the mask bit, but still intercepting
them to pass through those outside of the MSI-X structures (the other patch
in the series). But the handler would still need to stay, to keep
working with older QEMU versions.
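
A rough sketch of step 3, assuming the xc_domain_update_msi_irq() signature
currently exported by libxc and the existing XEN_DOMCTL_VMSI_X86_UNMASKED
gflags bit; the mask-handling capability negotiation from steps 1-2 is left
out, as no such interface exists yet:

#include <stdbool.h>
#include <stdint.h>
#include <xenctrl.h>

/* Device-model side: report the new mask state of one vector to Xen. */
static int report_mask_state(xc_interface *xch, uint32_t domid,
                             uint32_t gvec, uint32_t pirq,
                             uint32_t gflags, uint64_t gtable, bool masked)
{
    if ( masked )
        gflags &= ~XEN_DOMCTL_VMSI_X86_UNMASKED;
    else
        gflags |= XEN_DOMCTL_VMSI_X86_UNMASKED;

    return xc_domain_update_msi_irq(xch, domid, gvec, pirq, gflags, gtable);
}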

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



* Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-03-27 11:34       ` Marek Marczykowski-Górecki
@ 2023-03-27 13:29         ` Roger Pau Monné
  2023-03-27 14:20           ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2023-03-27 13:29 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Andrew Cooper, Wei Liu

On Mon, Mar 27, 2023 at 01:34:23PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 27, 2023 at 12:51:09PM +0200, Roger Pau Monné wrote:
> > On Mon, Mar 27, 2023 at 12:26:05PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote:
> > > > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > QEMU needs to know whether clearing maskbit of a vector is really
> > > > > clearing, or was already cleared before. Currently Xen sends only
> > > > > clearing that bit to the device model, but not setting it, so QEMU
> > > > > cannot detect it. Because of that, QEMU is working this around by
> > > > > checking via /dev/mem, but that isn't the proper approach.
> > > > > 
> > > > > Give all necessary information to QEMU by passing all ctrl writes,
> > > > > including masking a vector. This does include forwarding also writes
> > > > > that did not change the value, but as tested on both Linux (6.1.12) and
> > > > > Windows (10 pro), they don't do excessive writes of unchanged values
> > > > > (Windows seems to clear maskbit in some cases twice, but not more).
> > > > 
> > > > Since we passthrough all the accesses to the device model, is the
> > > > handling in Xen still required?  It might be worth to also expose any
> > > > interfaces needed to the device model so all the functionality done by
> > > > the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
> > > > passing the accesses anyway.
> > > 
> > > This was discussed on v1 already. Such QEMU would need to be able to do
> > > the actual write. If it's running in stubdomain, it would hit the exact
> > > issue again (page mapped R/O to it). In fact, that might be an issue for
> > > dom0 too (I haven't checked).
> > 
> > Oh, sorry, likely missed that discussion, as I don't recall this.
> > 
> > Maybe we need an hypercall for QEMU to notify the masking/unmasking to
> > Xen?  As any change on the other fields is already handled by QEMU.
> > 
> > > I guess that could use my subpage RO feature I just posted then, but it
> > > would still mean intercepting the write twice (not a performance issue
> > > really here, but rather convoluted handling in total).
> > 
> > Yes, that does seem way too convoluted.
> > 
> > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > > > ---
> > > > > v2:
> > > > >  - passthrough quad writes to emulator too (Jan)
> > > > >  - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
> > > > >    #define for this magic value
> > > > > 
> > > > > This behavior change needs to be surfaced to the device model somehow,
> > > > > so it knows whether it can rely on it. I'm open for suggestions.
> > > > 
> > > > Maybe exposed in XEN_DMOP_get_ioreq_server_info?
> 
> Make flags IN/OUT parameter (and not reuse the same bits)? Or introduce
> new field?

I think it would be fine to make it IN/OUT, but see below.

> > > > 
> > > > But I wonder whether it shouldn't be the other way arround, the device
> > > > model tells Xen it doesn't need to handle any MSI-X accesses because
> > > > QEMU will take care of it, likely using a new flag in
> > > > XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
> > > > part of the gflags, but then we would need to assert that the flag is
> > > > passed for all MSI-X interrupts bound from that device to the same
> > > > domain.
> > > 
> > > Is is safe thing to do? I mean, doesn't Xen need to guard access to
> > > MSI-X configuration to assure its safety, especially if no interrupt
> > > remapping is there? It probably doesn't matter for qemu in dom0 case,
> > > but both with deprivileged qemu and stubdom, it might matter.
> > 
> > Right - QEMU shouldn't write directly to the MSI-X table using
> > /dev/mem, but instead use an hypercall to notify Xen of the
> > {un,}masking of the MSI-X table entry.  I think that would allow us to
> > safely get rid of the extra logic in Xen to deal with MSI-X table
> > accesses.
> 
> But the purpose of this series is to give guest (or QEMU) more write
> access to the MSI-X page, not less.

Right, but there are two independent issues here: one is the
propagation of the MSIX entry mask state to the device model, the
other is allowing guest accesses to MMIO regions adjacent to the MSIX
table.

> If it wouldn't be this Intel AX
> wifi, indeed we could translate everything to hypercalls in QEMU and not
> worry about special handlers in Xen at all. But unfortunately, we do
> need to handle writes to the same page outside of the MSI-X structures
> and QEMU can't be trusted with properly filtering them (and otherwise
> given full write access to the page).

Indeed, but IMO it would be helpful if we could avoid this split
handling of MSIX entries, where Xen handles entry mask/unmask, and
QEMU handles entry setup.  It makes the handling logic very
complicated, and more likely to be buggy (as you have probably
discovered).

Having QEMU always handle accesses to the MSI-X table would make
things simpler, and we could get rid of a huge amount of logic and
entry tracking in msixtbl_mmio_ops.

Then, we would only need to detect whether an access falls into the same
page as the MSI-X (or PBA) tables, but outside of those, and forward
it to the underlying hardware; that's a fairly simple piece of
logic, and completely detached from all the MSI-X entry tracking that
Xen currently does.

> So, while I agree translating {un,}masking individual vectors to
> hypercalls could simplify MSI-X handling in general, I don't think it
> helps in this particular case. That said, such simplification would
> involve:
> 1. Exposing the capability in Xen to the qemu
> (XEN_DMOP_get_ioreq_server_info sounds reasonable).
> 2. QEMU notifying Xen it will handle masking too, somehow.

I think it's possible we could get away with adding a new flag bit to
xen_domctl_bind_pt_irq, like: XEN_DOMCTL_VMSI_X86_MASK_HANDLING that
would tell Xen that QEMU will handle the mask bit for this entry.

QEMU using this flag should be prepared to handle the mask bit, but if
Xen doesn't know the flag it will keep processing the mask bit.

> 3. QEMU using xc_domain_update_msi_irq and XEN_DOMCTL_VMSI_X86_UNMASKED
> to update Xen about the mask state too.
> 4. Xen no longer interpreting writes to mask bit, but still intercepting
> them to passthorugh those outside of MSI-X structures (the other patch
> in the series). But the handler would still need to stay, to keep
> working with older QEMU versions.

Xen would need to intercept writes to the page(s) containing the MSI-X
table in any case, but the logic is much simpler if it just needs to
decide whether an access falls inside the table region, and thus
needs to be forwarded to the device model, or falls outside of it and
needs to be propagated to the real address.

While true that we won't be able to remove the code that partially
handles MSIX entries for guests in Xen, it would be unused for newer
versions of QEMU, hopefully making the handling far more consistent as
the logic will be entirely in QEMU rather than split between Xen and
QEMU.

Thanks, Roger.



* Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-03-27 13:29         ` Roger Pau Monné
@ 2023-03-27 14:20           ` Marek Marczykowski-Górecki
  2023-03-27 14:37             ` Roger Pau Monné
  0 siblings, 1 reply; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-03-27 14:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Andrew Cooper, Wei Liu

On Mon, Mar 27, 2023 at 03:29:55PM +0200, Roger Pau Monné wrote:
> On Mon, Mar 27, 2023 at 01:34:23PM +0200, Marek Marczykowski-Górecki wrote:
> > On Mon, Mar 27, 2023 at 12:51:09PM +0200, Roger Pau Monné wrote:
> > > On Mon, Mar 27, 2023 at 12:26:05PM +0200, Marek Marczykowski-Górecki wrote:
> > > > On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote:
> > > > > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > QEMU needs to know whether clearing maskbit of a vector is really
> > > > > > clearing, or was already cleared before. Currently Xen sends only
> > > > > > clearing that bit to the device model, but not setting it, so QEMU
> > > > > > cannot detect it. Because of that, QEMU is working this around by
> > > > > > checking via /dev/mem, but that isn't the proper approach.
> > > > > > 
> > > > > > Give all necessary information to QEMU by passing all ctrl writes,
> > > > > > including masking a vector. This does include forwarding also writes
> > > > > > that did not change the value, but as tested on both Linux (6.1.12) and
> > > > > > Windows (10 pro), they don't do excessive writes of unchanged values
> > > > > > (Windows seems to clear maskbit in some cases twice, but not more).
> > > > > 
> > > > > Since we passthrough all the accesses to the device model, is the
> > > > > handling in Xen still required?  It might be worth to also expose any
> > > > > interfaces needed to the device model so all the functionality done by
> > > > > the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
> > > > > passing the accesses anyway.
> > > > 
> > > > This was discussed on v1 already. Such QEMU would need to be able to do
> > > > the actual write. If it's running in stubdomain, it would hit the exact
> > > > issue again (page mapped R/O to it). In fact, that might be an issue for
> > > > dom0 too (I haven't checked).
> > > 
> > > Oh, sorry, likely missed that discussion, as I don't recall this.
> > > 
> > > Maybe we need an hypercall for QEMU to notify the masking/unmasking to
> > > Xen?  As any change on the other fields is already handled by QEMU.
> > > 
> > > > I guess that could use my subpage RO feature I just posted then, but it
> > > > would still mean intercepting the write twice (not a performance issue
> > > > really here, but rather convoluted handling in total).
> > > 
> > > Yes, that does seem way too convoluted.
> > > 
> > > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > > > > ---
> > > > > > v2:
> > > > > >  - passthrough quad writes to emulator too (Jan)
> > > > > >  - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
> > > > > >    #define for this magic value
> > > > > > 
> > > > > > This behavior change needs to be surfaced to the device model somehow,
> > > > > > so it knows whether it can rely on it. I'm open for suggestions.
> > > > > 
> > > > > Maybe exposed in XEN_DMOP_get_ioreq_server_info?
> > 
> > Make flags IN/OUT parameter (and not reuse the same bits)? Or introduce
> > new field?
> 
> I think it would be fine to make it IN/OUT, but see below.
> 
> > > > > 
> > > > > But I wonder whether it shouldn't be the other way arround, the device
> > > > > model tells Xen it doesn't need to handle any MSI-X accesses because
> > > > > QEMU will take care of it, likely using a new flag in
> > > > > XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
> > > > > part of the gflags, but then we would need to assert that the flag is
> > > > > passed for all MSI-X interrupts bound from that device to the same
> > > > > domain.
> > > > 
> > > > Is is safe thing to do? I mean, doesn't Xen need to guard access to
> > > > MSI-X configuration to assure its safety, especially if no interrupt
> > > > remapping is there? It probably doesn't matter for qemu in dom0 case,
> > > > but both with deprivileged qemu and stubdom, it might matter.
> > > 
> > > Right - QEMU shouldn't write directly to the MSI-X table using
> > > /dev/mem, but instead use an hypercall to notify Xen of the
> > > {un,}masking of the MSI-X table entry.  I think that would allow us to
> > > safely get rid of the extra logic in Xen to deal with MSI-X table
> > > accesses.
> > 
> > But the purpose of this series is to give guest (or QEMU) more write
> > access to the MSI-X page, not less.
> 
> Right, but there are two independent issues here: one is the
> propagation of the MSIX entry mask state to the device model, the
> other is allowing guest accesses to MMIO regions adjacent to the MSIX
> table.
> 
> > If it wouldn't be this Intel AX
> > wifi, indeed we could translate everything to hypercalls in QEMU and not
> > worry about special handlers in Xen at all. But unfortunately, we do
> > need to handle writes to the same page outside of the MSI-X structures
> > and QEMU can't be trusted with properly filtering them (and otherwise
> > given full write access to the page).
> 
> Indeed, but IMO it would be helpful if we could avoid this split
> handling of MSIX entries, where Xen handles entry mask/unmask, and
> QEMU handles entry setup.  It makes the handling logic very
> complicated, and more likely to be buggy (as you have probably
> discovered).
> 
> Having QEMU always handle accesses to the MSI-X table would make
> things simpler, and we could get rid of a huge amount of logic and
> entry tracking in msixtbl_mmio_ops.
> 
> Then, we would only need to detect where an access falls into the same
> page as the MSI-X (or PBA() tables, but outside of those, and forward
> it to the underlying hardware, but that's a fairly simple piece of
> logic, and completely detached from all the MSI-X entry tracking that
> Xen currently does.
> 
> > So, while I agree translating {un,}masking individual vectors to
> > hypercalls could simplify MSI-X handling in general, I don't think it
> > helps in this particular case. That said, such simplification would
> > involve:
> > 1. Exposing the capability in Xen to the qemu
> > (XEN_DMOP_get_ioreq_server_info sounds reasonable).
> > 2. QEMU notifying Xen it will handle masking too, somehow.
> 
> I think it's possible we could get away with adding a new flag bit to
> xen_domctl_bind_pt_irq, like: XEN_DOMCTL_VMSI_X86_MASK_HANDLING that
> would tell Xen that QEMU will handle the mask bit for this entry.

Technically, for Xen to not care about those writes, it would need to
observe this flag on all vectors, including those not mapped yet. In
practice though, I think it might be okay to say the device model should set
the XEN_DOMCTL_VMSI_X86_MASK_HANDLING flag consistently (either on none of
them, or all of them), and Xen can rely on it (if one vector has
XEN_DOMCTL_VMSI_X86_MASK_HANDLING, then assume all of them will have
it).

> QEMU using this flag should be prepared to handle the mask bit, but if
> Xen doesn't know the flag it will keep processing the mask bit.
> 
> > 3. QEMU using xc_domain_update_msi_irq and XEN_DOMCTL_VMSI_X86_UNMASKED
> > to update Xen about the mask state too.
> > 4. Xen no longer interpreting writes to mask bit, but still intercepting
> > them to passthorugh those outside of MSI-X structures (the other patch
> > in the series). But the handler would still need to stay, to keep
> > working with older QEMU versions.
> 
> Xen would need to intercept writes to the page(s) containing the MSI-X
> table in any case, but the logic is much simpler if it just needs to
> decide whether the accesses fall inside of the table region, and thus
> needs to be forwarded to the device model, or fails outside of it and
> needs to be propagated to the real address.
> 
> While true that we won't be able to remove the code that partially
> handles MSIX entries for guests in Xen, it would be unused for newer
> versions of QEMU, hopefully making the handling far more consistent as
> the logic will be entirely in QEMU rather than split between Xen and
> QEMU.

In fact, it was easier for me to register a separate ioreq server for
writes outside of the MSI-X table. But I'm afraid the current one would
need to stay registered (just not accepting writes).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



* Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-03-27 14:20           ` Marek Marczykowski-Górecki
@ 2023-03-27 14:37             ` Roger Pau Monné
  0 siblings, 0 replies; 37+ messages in thread
From: Roger Pau Monné @ 2023-03-27 14:37 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Andrew Cooper, Wei Liu

On Mon, Mar 27, 2023 at 04:20:45PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 27, 2023 at 03:29:55PM +0200, Roger Pau Monné wrote:
> > On Mon, Mar 27, 2023 at 01:34:23PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Mon, Mar 27, 2023 at 12:51:09PM +0200, Roger Pau Monné wrote:
> > > > On Mon, Mar 27, 2023 at 12:26:05PM +0200, Marek Marczykowski-Górecki wrote:
> > > > > On Mon, Mar 27, 2023 at 12:12:29PM +0200, Roger Pau Monné wrote:
> > > > > > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > > QEMU needs to know whether clearing maskbit of a vector is really
> > > > > > > clearing, or was already cleared before. Currently Xen sends only
> > > > > > > clearing that bit to the device model, but not setting it, so QEMU
> > > > > > > cannot detect it. Because of that, QEMU is working this around by
> > > > > > > checking via /dev/mem, but that isn't the proper approach.
> > > > > > > 
> > > > > > > Give all necessary information to QEMU by passing all ctrl writes,
> > > > > > > including masking a vector. This does include forwarding also writes
> > > > > > > that did not change the value, but as tested on both Linux (6.1.12) and
> > > > > > > Windows (10 pro), they don't do excessive writes of unchanged values
> > > > > > > (Windows seems to clear maskbit in some cases twice, but not more).
> > > > > > 
> > > > > > Since we passthrough all the accesses to the device model, is the
> > > > > > handling in Xen still required?  It might be worth to also expose any
> > > > > > interfaces needed to the device model so all the functionality done by
> > > > > > the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
> > > > > > passing the accesses anyway.
> > > > > 
> > > > > This was discussed on v1 already. Such QEMU would need to be able to do
> > > > > the actual write. If it's running in stubdomain, it would hit the exact
> > > > > issue again (page mapped R/O to it). In fact, that might be an issue for
> > > > > dom0 too (I haven't checked).
> > > > 
> > > > Oh, sorry, likely missed that discussion, as I don't recall this.
> > > > 
> > > > Maybe we need an hypercall for QEMU to notify the masking/unmasking to
> > > > Xen?  As any change on the other fields is already handled by QEMU.
> > > > 
> > > > > I guess that could use my subpage RO feature I just posted then, but it
> > > > > would still mean intercepting the write twice (not a performance issue
> > > > > really here, but rather convoluted handling in total).
> > > > 
> > > > Yes, that does seem way too convoluted.
> > > > 
> > > > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > > > > > ---
> > > > > > > v2:
> > > > > > >  - passthrough quad writes to emulator too (Jan)
> > > > > > >  - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
> > > > > > >    #define for this magic value
> > > > > > > 
> > > > > > > This behavior change needs to be surfaced to the device model somehow,
> > > > > > > so it knows whether it can rely on it. I'm open for suggestions.
> > > > > > 
> > > > > > Maybe exposed in XEN_DMOP_get_ioreq_server_info?
> > > 
> > > Make flags IN/OUT parameter (and not reuse the same bits)? Or introduce
> > > new field?
> > 
> > I think it would be fine to make it IN/OUT, but see below.
> > 
> > > > > > 
> > > > > > But I wonder whether it shouldn't be the other way arround, the device
> > > > > > model tells Xen it doesn't need to handle any MSI-X accesses because
> > > > > > QEMU will take care of it, likely using a new flag in
> > > > > > XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
> > > > > > part of the gflags, but then we would need to assert that the flag is
> > > > > > passed for all MSI-X interrupts bound from that device to the same
> > > > > > domain.
> > > > > 
> > > > > Is is safe thing to do? I mean, doesn't Xen need to guard access to
> > > > > MSI-X configuration to assure its safety, especially if no interrupt
> > > > > remapping is there? It probably doesn't matter for qemu in dom0 case,
> > > > > but both with deprivileged qemu and stubdom, it might matter.
> > > > 
> > > > Right - QEMU shouldn't write directly to the MSI-X table using
> > > > /dev/mem, but instead use an hypercall to notify Xen of the
> > > > {un,}masking of the MSI-X table entry.  I think that would allow us to
> > > > safely get rid of the extra logic in Xen to deal with MSI-X table
> > > > accesses.
> > > 
> > > But the purpose of this series is to give guest (or QEMU) more write
> > > access to the MSI-X page, not less.
> > 
> > Right, but there are two independent issues here: one is the
> > propagation of the MSIX entry mask state to the device model, the
> > other is allowing guest accesses to MMIO regions adjacent to the MSIX
> > table.
> > 
> > > If it wouldn't be this Intel AX
> > > wifi, indeed we could translate everything to hypercalls in QEMU and not
> > > worry about special handlers in Xen at all. But unfortunately, we do
> > > need to handle writes to the same page outside of the MSI-X structures
> > > and QEMU can't be trusted with properly filtering them (and otherwise
> > > given full write access to the page).
> > 
> > Indeed, but IMO it would be helpful if we could avoid this split
> > handling of MSIX entries, where Xen handles entry mask/unmask, and
> > QEMU handles entry setup.  It makes the handling logic very
> > complicated, and more likely to be buggy (as you have probably
> > discovered).
> > 
> > Having QEMU always handle accesses to the MSI-X table would make
> > things simpler, and we could get rid of a huge amount of logic and
> > entry tracking in msixtbl_mmio_ops.
> > 
> > Then, we would only need to detect where an access falls into the same
> > page as the MSI-X (or PBA) tables, but outside of those, and forward
> > it to the underlying hardware, but that's a fairly simple piece of
> > logic, and completely detached from all the MSI-X entry tracking that
> > Xen currently does.
> > 
> > > So, while I agree translating {un,}masking individual vectors to
> > > hypercalls could simplify MSI-X handling in general, I don't think it
> > > helps in this particular case. That said, such simplification would
> > > involve:
> > > 1. Exposing the capability in Xen to the qemu
> > > (XEN_DMOP_get_ioreq_server_info sounds reasonable).
> > > 2. QEMU notifying Xen it will handle masking too, somehow.
> > 
> > I think it's possible we could get away with adding a new flag bit to
> > xen_domctl_bind_pt_irq, like: XEN_DOMCTL_VMSI_X86_MASK_HANDLING that
> > would tell Xen that QEMU will handle the mask bit for this entry.
> 
> Technically, for Xen to not care about those writes, it would need to
> observe this flag on all vectors, including those not mapped yet. In
> practice though, I think it might be okay to say device model should set
> XEN_DOMCTL_VMSI_X86_MASK_HANDLING flag consistently (either on none of
> them, or all of them), and Xen can rely on it (if one vector has
> XEN_DOMCTL_VMSI_X86_MASK_HANDLING, then assume all of them will have
> it).

I agree.  I would just return -EINVAL if the flag is not consistent
across vectors on the same device.
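
For illustration, the check in the bind path could be as simple as the
fragment below (sketch only: the flag name is the one suggested above,
and the two msix fields are made up, not existing struct arch_msix
members):

    /* Sketch: reject a bind whose mask-handling flag disagrees with what
     * was recorded for earlier vectors of the same device. */
    bool dm_masks = gflags & XEN_DOMCTL_VMSI_X86_MASK_HANDLING;

    if ( pdev->msix->bound_vectors && pdev->msix->dm_handles_mask != dm_masks )
        return -EINVAL;

    pdev->msix->dm_handles_mask = dm_masks;
    pdev->msix->bound_vectors++;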

> > QEMU using this flag should be prepared to handle the mask bit, but if
> > Xen doesn't know the flag it will keep processing the mask bit.
> > 
> > > 3. QEMU using xc_domain_update_msi_irq and XEN_DOMCTL_VMSI_X86_UNMASKED
> > > to update Xen about the mask state too.
> > > 4. Xen no longer interpreting writes to mask bit, but still intercepting
> > > them to passthrough those outside of MSI-X structures (the other patch
> > > in the series). But the handler would still need to stay, to keep
> > > working with older QEMU versions.
> > 
> > Xen would need to intercept writes to the page(s) containing the MSI-X
> > table in any case, but the logic is much simpler if it just needs to
> > decide whether an access falls inside of the table region, and thus
> > needs to be forwarded to the device model, or falls outside of it and
> > needs to be propagated to the real address.
> > 
> > While true that we won't be able to remove the code that partially
> > handles MSIX entries for guests in Xen, it would be unused for newer
> > versions of QEMU, hopefully making the handling far more consistent as
> > the logic will be entirely in QEMU rather than split between Xen and
> > QEMU.
> 
> In fact, it was easier for me to register a separate ioreq server for
> writes outside of the MSI-X table. But I'm afraid the current one would
> need to stay registered (just not accepting writes).

(I assume in the paragraph above you should use hvm_io_handler rather
than ioreq server, as ioreqs are only for emulation running outside of
Xen)

The handler is currently registered when a device with MSIX is assigned
to an HVM domain.  I think it's fine to leave as-is as long as the
handler doesn't accept any accesses if QEMU does handle the mask bit.
We can always see later about not registering it.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-03-27 10:12 ` [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Roger Pau Monné
  2023-03-27 10:26   ` Marek Marczykowski-Górecki
@ 2023-03-27 15:32   ` Jan Beulich
  2023-03-27 15:42     ` Roger Pau Monné
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2023-03-27 15:32 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Jason Andryuk, Andrew Cooper, Wei Liu,
	Marek Marczykowski-Górecki

On 27.03.2023 12:12, Roger Pau Monné wrote:
> On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
>> QEMU needs to know whether clearing maskbit of a vector is really
>> clearing, or was already cleared before. Currently Xen sends only
>> clearing that bit to the device model, but not setting it, so QEMU
>> cannot detect it. Because of that, QEMU is working this around by
>> checking via /dev/mem, but that isn't the proper approach.
>>
>> Give all necessary information to QEMU by passing all ctrl writes,
>> including masking a vector. This does include forwarding also writes
>> that did not change the value, but as tested on both Linux (6.1.12) and
>> Windows (10 pro), they don't do excessive writes of unchanged values
>> (Windows seems to clear maskbit in some cases twice, but not more).
>
> Since we passthrough all the accesses to the device model, is the
> handling in Xen still required?

"All accesses" isn't really correct; aiui even after this change it's only
"all writes". We still "accelerate" reading from the first 3 table entries
(whether or not that's [still] useful in this shape is a separate question).
Plus we need to invoke guest_mask_msi_irq() as necessary, and I don't think
we should make ourselves dependent upon qemu communicating the necessary
info back to us, when that's necessary for the correctness of Xen's internal
interrupt handling. (That's further leaving aside the performance aspect of
handing off to qemu just for it to pass data back to us.)

Jan

>  It might be worth to also expose any
> interfaces needed to the device model so all the functionality done by
> the msixtbl_mmio_ops hooks could be done by QEMU, since we end up
> passing the accesses anyway.
>
>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> ---
>> v2:
>>  - passthrough quad writes to emulator too (Jan)
>>  - (ab)use len==0 for write len=4 completion (Jan), but add descriptive
>>    #define for this magic value
>>
>> This behavior change needs to be surfaced to the device model somehow,
>> so it knows whether it can rely on it. I'm open for suggestions.
>
> Maybe exposed in XEN_DMOP_get_ioreq_server_info?
>
> But I wonder whether it shouldn't be the other way arround, the device
> model tells Xen it doesn't need to handle any MSI-X accesses because
> QEMU will take care of it, likely using a new flag in
> XEN_DMOP_create_ioreq_server or maybe in XEN_DOMCTL_bind_pt_irq as
> part of the gflags, but then we would need to assert that the flag is
> passed for all MSI-X interrupts bound from that device to the same
> domain.
>
> Thanks, Roger.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model
  2023-03-27 15:32   ` Jan Beulich
@ 2023-03-27 15:42     ` Roger Pau Monné
  0 siblings, 0 replies; 37+ messages in thread
From: Roger Pau Monné @ 2023-03-27 15:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Jason Andryuk, Andrew Cooper, Wei Liu,
	Marek Marczykowski-Górecki

On Mon, Mar 27, 2023 at 05:32:01PM +0200, Jan Beulich wrote:
> On 27.03.2023 12:12, Roger Pau Monné wrote:
> > On Sat, Mar 25, 2023 at 03:49:22AM +0100, Marek Marczykowski-Górecki wrote:
> >> QEMU needs to know whether clearing maskbit of a vector is really
> >> clearing, or was already cleared before. Currently Xen sends only
> >> clearing that bit to the device model, but not setting it, so QEMU
> >> cannot detect it. Because of that, QEMU is working this around by
> >> checking via /dev/mem, but that isn't the proper approach.
> >>
> >> Give all necessary information to QEMU by passing all ctrl writes,
> >> including masking a vector. This does include forwarding also writes
> >> that did not change the value, but as tested on both Linux (6.1.12) and
> >> Windows (10 pro), they don't do excessive writes of unchanged values
> >> (Windows seems to clear maskbit in some cases twice, but not more).
> >
> > Since we passthrough all the accesses to the device model, is the
> > handling in Xen still required?
> 
> "All accesses" isn't really correct; aiui even after this change it's only
> "all writes". We still "accelerate" reading from the first 3 table entries
> (whether or not that's [still] useful in this shape is a separate question).
> Plus we need to invoke guest_mask_msi_irq() as necessary, and I don't think
> we should make ourselves dependent upon qemu communicating the necessary
> info back to us, when that's necessary for the correctness of Xen's internal
> interrupt handling. (That's further leaving aside the performance aspect of
> handing off to qemu just for it to pass data back to us.)

The call to guest_mask_msi_irq() is a result of a guest action, so I'm
failing to see how that's necessary for the correctness of Xen's
internal interrupt handling.  In my proposed model QEMU won't be
allowed direct access to the physical MSIX entry mask bits, and would
instead use an hypercall to mask/unmask MSIX entries (and thus
guest_mask_msi_irq() would be called as appropriate).
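
As a rough sketch of the QEMU side (not a tested change; xen_xc,
xen_domid, gvec, pirq, table_addr and entry_nr stand for whatever context
the caller already has, and build_msix_gflags() is a made-up helper that
would set or clear XEN_DOMCTL_VMSI_X86_UNMASKED according to the new mask
state):

    /* On a vector ctrl write, report the new mask state to Xen through the
     * existing binding call instead of writing the physical maskbit via
     * /dev/mem. */
    uint32_t gflags = build_msix_gflags(addr, data, masked);
    int rc = xc_domain_update_msi_irq(xen_xc, xen_domid, gvec, pirq, gflags,
                                      table_addr);

    if (rc)
        fprintf(stderr, "failed to update mask state of MSI-X entry %d: %d\n",
                entry_nr, rc);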

Thanks, Roger.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-03-25  2:49 ` [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
  2023-03-27 10:47   ` Andrew Cooper
@ 2023-03-28 11:28   ` Roger Pau Monné
  2023-03-28 12:05     ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2023-03-28 11:28 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Andrew Cooper, Wei Liu

On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> on the same page as MSI-X table. Device model (especially one in
> stubdomain) cannot really handle those, as direct writes to that page is
> refused (page is on the mmio_ro_ranges list). Instead, add internal ioreq
> server that handle those writes.
> 
> Doing this, requires correlating write location with guest view
> of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> for PV would need to be done separately.
> 
> This can be also used to read Pending Bit Array, if it lives on the same
> page, making QEMU not needing /dev/mem access at all (especially helpful
> with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> map it to the guest directly.
> If PBA lives on the same page, forbid writes and log a message.
> Technically, writes outside of PBA could be allowed, but at this moment
> the precise location of PBA isn't saved.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> v2:
>  - adjust commit message
>  - pass struct domain to msixtbl_page_handler_get_hwaddr()
>  - reduce local variables used only once
>  - log a warning if write is forbidden if MSI-X and PBA lives on the same
>    page
>  - do not passthrough unaligned accesses
>  - handle accesses both before and after MSI-X table
> ---
>  xen/arch/x86/hvm/vmsi.c        | 154 +++++++++++++++++++++++++++++++++
>  xen/arch/x86/include/asm/msi.h |   5 ++
>  xen/arch/x86/msi.c             |  38 ++++++++
>  3 files changed, 197 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 9c82bf9b4ec2..9293009a4075 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = {
>      .write = _msixtbl_write,
>  };
>  
> +static void __iomem *msixtbl_page_handler_get_hwaddr(
> +    const struct domain *d,
> +    uint64_t address,
> +    bool write)
> +{
> +    const struct pci_dev *pdev = NULL;
> +    const struct msixtbl_entry *entry;
> +    int adj_type;

unsigned AFAICT.

> +
> +    rcu_read_lock(&msixtbl_rcu_lock);
> +    /*
> +     * Check if it's on the same page as the end of the MSI-X table, but
> +     * outside of the table itself.
> +     */
> +    list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> +    {
> +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) &&
> +             address < entry->gtable )
> +        {
> +            adj_type = ADJ_IDX_FIRST;
> +            pdev = entry->pdev;
> +            break;
> +        }
> +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) &&

Should be entry->gtable + entry->table_len - 1, or else you are
off-by-one.

> +             address >= entry->gtable + entry->table_len )
> +        {
> +            adj_type = ADJ_IDX_LAST;
> +            pdev = entry->pdev;
> +            break;
> +        }
> +    }
> +    rcu_read_unlock(&msixtbl_rcu_lock);
> +
> +    if ( !pdev )
> +        return NULL;
> +
> +    ASSERT(pdev->msix);
> +
> +    if ( !pdev->msix->adj_access_table_idx[adj_type] )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                 "Page for adjacent MSI-X table access not initialized for %pp\n",
> +                 pdev);
> +
> +        return NULL;
> +    }
> +
> +    /* If PBA lives on the same page too, forbid writes. */
> +    if ( write && (
> +        (adj_type == ADJ_IDX_LAST &&
> +         pdev->msix->table.last == pdev->msix->pba.first) ||
> +        (adj_type == ADJ_IDX_FIRST &&
> +         pdev->msix->table.first == pdev->msix->pba.last)) )
> +    {
> +        gdprintk(XENLOG_WARNING,
> +                 "MSI-X table and PBA of %pp live on the same page, "
> +                 "writing to other registers there is not implemented\n",
> +                 pdev);

Don't you actually need to check that the passed address falls into the
PBA array?  PBA can be sharing the same page as the MSI-X table, but
that doesn't imply there aren't holes also not used by either the PBA
or the MSI-X table in such page.

> +        return NULL;
> +    }
> +
> +    return fix_to_virt(pdev->msix->adj_access_table_idx[adj_type]) +
> +        (address & (PAGE_SIZE - 1));

You can use PAGE_OFFSET().

> +
> +}
> +
> +static bool cf_check msixtbl_page_accept(
> +        const struct hvm_io_handler *handler, const ioreq_t *r)
> +{
> +    ASSERT(r->type == IOREQ_TYPE_COPY);
> +
> +    return msixtbl_page_handler_get_hwaddr(
> +            current->domain, r->addr, r->dir == IOREQ_WRITE);

I think you want to accept it also if it's a write to the PBA, and
just drop it.  You should always pass write=false and then drop it in
msixtbl_page_write() if it falls in the PBA region (but still return
X86EMUL_OKAY).

> +}
> +
> +static int cf_check msixtbl_page_read(
> +        const struct hvm_io_handler *handler,
> +        uint64_t address, uint32_t len, uint64_t *pval)

Why use hvm_io_ops and not hvm_mmio_ops?  You only care about MMIO
accesses here.

> +{
> +    void __iomem *hwaddr;

const

I would also initialize *pval = ~0ul for safety.

> +
> +    if ( address & (len - 1) )
> +        return X86EMUL_UNHANDLEABLE;

You can use IS_ALIGNED for clarity.  In my fix for this for vPCI Jan
asked to split unaligned accesses into 1byte ones, but I think for
guests it's better to just drop them unless there's a specific case
that we need to handle.

Also you should return X86EMUL_OKAY here, as the address is handled
here, but the current way to handle it is to drop the access.

Printing a debug message can be helpful in case unaligned accesses
need to be implemented in order to support some device.
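
Something along these lines (sketch only; the message text is just an
example):

    if ( !IS_ALIGNED(address, len) )
    {
        gdprintk(XENLOG_DEBUG,
                 "dropping unaligned adjacent MSI-X access %#"PRIx64"/%u\n",
                 address, len);
        return X86EMUL_OKAY;   /* handled: the access is simply dropped */
    }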

> +
> +    hwaddr = msixtbl_page_handler_get_hwaddr(
> +            current->domain, address, false);
> +
> +    if ( !hwaddr )
> +        return X86EMUL_UNHANDLEABLE;

Maybe X86EMUL_RETRY, since the page was there in the accept handler.

> +
> +    switch ( len )
> +    {
> +    case 1:
> +        *pval = readb(hwaddr);
> +        break;
> +    case 2:
> +        *pval = readw(hwaddr);
> +        break;
> +    case 4:
> +        *pval = readl(hwaddr);
> +        break;
> +    case 8:
> +        *pval = readq(hwaddr);
> +        break;

Nit: we usually add a newline after every break;

> +    default:
> +        return X86EMUL_UNHANDLEABLE;

I would rather use ASSERT_UNREACHABLE() here and fallthrough to the
return below.  Seeing such size is likely an indication of something
else gone very wrong, better to just terminate the access at once.
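
That is, roughly (sketch of the suggested default case, relying on *pval
having been pre-set to all ones as mentioned above):

    default:
        ASSERT_UNREACHABLE();
        break;   /* the access then completes via the X86EMUL_OKAY return below */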

> +    }
> +    return X86EMUL_OKAY;
> +}
> +
> +static int cf_check msixtbl_page_write(
> +        const struct hvm_io_handler *handler,
> +        uint64_t address, uint32_t len, uint64_t val)
> +{
> +    void __iomem *hwaddr = msixtbl_page_handler_get_hwaddr(
> +            current->domain, address, true);
> +

You don't seem to check whether the access is aligned here?

Otherwise I have mostly the same comments as in msixtbl_page_read().

> +    if ( !hwaddr )
> +        return X86EMUL_UNHANDLEABLE;
> +
> +    switch ( len ) {
> +        case 1:
> +            writeb(val, hwaddr);
> +            break;
> +        case 2:
> +            writew(val, hwaddr);
> +            break;
> +        case 4:
> +            writel(val, hwaddr);
> +            break;
> +        case 8:
> +            writeq(val, hwaddr);
> +            break;
> +        default:
> +            return X86EMUL_UNHANDLEABLE;
> +    }
> +    return X86EMUL_OKAY;
> +
> +}
> +
> +static const struct hvm_io_ops msixtbl_mmio_page_ops = {
> +    .accept = msixtbl_page_accept,
> +    .read = msixtbl_page_read,
> +    .write = msixtbl_page_write,
> +};
> +
>  static void add_msixtbl_entry(struct domain *d,
>                                struct pci_dev *pdev,
>                                uint64_t gtable,
> @@ -593,6 +739,14 @@ void msixtbl_init(struct domain *d)
>          handler->type = IOREQ_TYPE_COPY;
>          handler->ops = &msixtbl_mmio_ops;
>      }
> +
> +    /* passthrough access to other registers on the same page */
> +    handler = hvm_next_io_handler(d);
> +    if ( handler )
> +    {
> +        handler->type = IOREQ_TYPE_COPY;
> +        handler->ops = &msixtbl_mmio_page_ops;
> +    }
>  }
>  
>  void msixtbl_pt_cleanup(struct domain *d)
> diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
> index a53ade95c9ad..d13cf1c1f873 100644
> --- a/xen/arch/x86/include/asm/msi.h
> +++ b/xen/arch/x86/include/asm/msi.h
> @@ -207,6 +207,10 @@ struct msg_address {
>                                         PCI_MSIX_ENTRY_SIZE + \
>                                         (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
>  
> +/* indexes in adj_access_table_idx[] below */
> +#define ADJ_IDX_FIRST 0
> +#define ADJ_IDX_LAST  1
> +
>  struct arch_msix {
>      unsigned int nr_entries, used_entries;
>      struct {
> @@ -214,6 +218,7 @@ struct arch_msix {
>      } table, pba;
>      int table_refcnt[MAX_MSIX_TABLE_PAGES];
>      int table_idx[MAX_MSIX_TABLE_PAGES];
> +    int adj_access_table_idx[2];
>      spinlock_t table_lock;
>      bool host_maskall, guest_maskall;
>      domid_t warned;
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index d0bf63df1def..680853f84685 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
>                  domain_crash(d);
>              /* XXX How to deal with existing mappings? */
>          }
> +
> +        /*
> +         * If the MSI-X table doesn't start at the page boundary, map the first page for
> +         * passthrough accesses.
> +         */

I think you should initialize
msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1?

> +        if ( table_paddr & (PAGE_SIZE - 1) )
> +        {
> +            int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
> +
> +            if ( idx >= 0 )
> +                msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx;
> +            else
> +                gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: %d\n", idx);
> +        }
> +        /*
> +         * If the MSI-X table doesn't span full page(s), map the last page for
> +         * passthrough accesses.
> +         */
> +        if ( (table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) )
> +        {
> +            uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE;
> +            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
> +
> +            if ( idx >= 0 )
> +                msix->adj_access_table_idx[ADJ_IDX_LAST] = idx;
> +            else
> +                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx);
> +        }
>      }
>      WARN_ON(msix->table.first != (table_paddr >> PAGE_SHIFT));
>      ++msix->used_entries;
> @@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
>              WARN();
>          msix->table.first = 0;
>          msix->table.last = 0;
> +        if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] )
> +        {
> +            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]);
> +            msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0;
> +        }
> +        if ( msix->adj_access_table_idx[ADJ_IDX_LAST] )
> +        {
> +            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]);
> +            msix->adj_access_table_idx[ADJ_IDX_LAST] = 0;

Isn't 0 a valid idx?  You should check for >= 0 and also set
to -1 once the fixmap entry has been put.
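
I.e. something along these lines (sketch):

    /* Use -1 as the "not mapped" marker, both at init and at cleanup. */
    msix->adj_access_table_idx[ADJ_IDX_FIRST] = -1;
    ...
    if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] >= 0 )
    {
        msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]);
        msix->adj_access_table_idx[ADJ_IDX_FIRST] = -1;
    }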

Thanks, Roger.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-25  2:49 ` [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot Marek Marczykowski-Górecki
@ 2023-03-28 11:37   ` Roger Pau Monné
  2023-03-28 12:07     ` Marek Marczykowski-Górecki
  2023-03-28 12:54   ` Jan Beulich
  1 sibling, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2023-03-28 11:37 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Paul Durrant

On Sat, Mar 25, 2023 at 03:49:24AM +0100, Marek Marczykowski-Górecki wrote:
> Some firmware/devices are found to not reset MSI-X properly, leaving
> MASKALL set. Xen relies on initial state being both disabled.

The 'both' in this sentence seems out of context, as you just mention
MASKALL in the previous sentence.

> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> 
> Reported-by: Jason Andryuk <jandryuk@gmail.com>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> v2:
>  - new patch
> ---
>  xen/drivers/passthrough/msi.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/msi.c b/xen/drivers/passthrough/msi.c
> index ce1a450f6f4a..60adad47e379 100644
> --- a/xen/drivers/passthrough/msi.c
> +++ b/xen/drivers/passthrough/msi.c
> @@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
>          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
>          msix->nr_entries = msix_table_size(ctrl);
>  
> +        /*
> +         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
> +         * initial state.
> +         */
> +        ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> +        pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);

Should you make this conditional to them being set in the first place:

if ( ctrl & (PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL) )
{
    ...

We should avoid the extra access if possible (specially if it's to
write the same value that has been read).

I would even move this before the msix_table_size() call, so the
handling of ctrl is closer together.

Would it also be worth to print a debug message that the initial
device state seems inconsistent?
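
A fuller sketch of the suggestion (the message wording is only an
example):

    if ( ctrl & (PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL) )
    {
        printk(XENLOG_WARNING
               "%pp: unexpected initial MSI-X state (control %#x), clearing\n",
               &pdev->sbdf, ctrl);
        ctrl &= ~(PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
        pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
    }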

Thanks, Roger.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-03-28 11:28   ` Roger Pau Monné
@ 2023-03-28 12:05     ` Marek Marczykowski-Górecki
  2023-03-28 12:34       ` Jan Beulich
  2023-03-28 12:52       ` Roger Pau Monné
  0 siblings, 2 replies; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-03-28 12:05 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Andrew Cooper, Wei Liu

[-- Attachment #1: Type: text/plain, Size: 13633 bytes --]

On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> > Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> > on the same page as MSI-X table. Device model (especially one in
> > stubdomain) cannot really handle those, as direct writes to that page is
> > refused (page is on the mmio_ro_ranges list). Instead, add internal ioreq
> > server that handle those writes.
> > 
> > Doing this, requires correlating write location with guest view
> > of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> > it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> > for PV would need to be done separately.
> > 
> > This can be also used to read Pending Bit Array, if it lives on the same
> > page, making QEMU not needing /dev/mem access at all (especially helpful
> > with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> > map it to the guest directly.
> > If PBA lives on the same page, forbid writes and log a message.
> > Technically, writes outside of PBA could be allowed, but at this moment
> > the precise location of PBA isn't saved.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > ---
> > v2:
> >  - adjust commit message
> >  - pass struct domain to msixtbl_page_handler_get_hwaddr()
> >  - reduce local variables used only once
> >  - log a warning if write is forbidden if MSI-X and PBA lives on the same
> >    page
> >  - do not passthrough unaligned accesses
> >  - handle accesses both before and after MSI-X table
> > ---
> >  xen/arch/x86/hvm/vmsi.c        | 154 +++++++++++++++++++++++++++++++++
> >  xen/arch/x86/include/asm/msi.h |   5 ++
> >  xen/arch/x86/msi.c             |  38 ++++++++
> >  3 files changed, 197 insertions(+)
> > 
> > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> > index 9c82bf9b4ec2..9293009a4075 100644
> > --- a/xen/arch/x86/hvm/vmsi.c
> > +++ b/xen/arch/x86/hvm/vmsi.c
> > @@ -438,6 +438,152 @@ static const struct hvm_io_ops msixtbl_mmio_ops = {
> >      .write = _msixtbl_write,
> >  };
> >  
> > +static void __iomem *msixtbl_page_handler_get_hwaddr(
> > +    const struct domain *d,
> > +    uint64_t address,
> > +    bool write)
> > +{
> > +    const struct pci_dev *pdev = NULL;
> > +    const struct msixtbl_entry *entry;
> > +    int adj_type;
> 
> unsigned AFAICT.

Ok.

> > +
> > +    rcu_read_lock(&msixtbl_rcu_lock);
> > +    /*
> > +     * Check if it's on the same page as the end of the MSI-X table, but
> > +     * outside of the table itself.
> > +     */
> > +    list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> > +    {
> > +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable) &&
> > +             address < entry->gtable )
> > +        {
> > +            adj_type = ADJ_IDX_FIRST;
> > +            pdev = entry->pdev;
> > +            break;
> > +        }
> > +        if ( PFN_DOWN(address) == PFN_DOWN(entry->gtable + entry->table_len) &&
> 
> Should be entry->gtable + entry->table_len - 1, or else you are
> off-by-one.

Ok.

> > +             address >= entry->gtable + entry->table_len )
> > +        {
> > +            adj_type = ADJ_IDX_LAST;
> > +            pdev = entry->pdev;
> > +            break;
> > +        }
> > +    }
> > +    rcu_read_unlock(&msixtbl_rcu_lock);
> > +
> > +    if ( !pdev )
> > +        return NULL;
> > +
> > +    ASSERT(pdev->msix);
> > +
> > +    if ( !pdev->msix->adj_access_table_idx[adj_type] )
> > +    {
> > +        gdprintk(XENLOG_WARNING,
> > +                 "Page for adjacent MSI-X table access not initialized for %pp\n",
> > +                 pdev);
> > +
> > +        return NULL;
> > +    }
> > +
> > +    /* If PBA lives on the same page too, forbid writes. */
> > +    if ( write && (
> > +        (adj_type == ADJ_IDX_LAST &&
> > +         pdev->msix->table.last == pdev->msix->pba.first) ||
> > +        (adj_type == ADJ_IDX_FIRST &&
> > +         pdev->msix->table.first == pdev->msix->pba.last)) )
> > +    {
> > +        gdprintk(XENLOG_WARNING,
> > +                 "MSI-X table and PBA of %pp live on the same page, "
> > +                 "writing to other registers there is not implemented\n",
> > +                 pdev);
> 
> Don't you actually need to check that the passed address falls into the
> PBA array?  PBA can be sharing the same page as the MSI-X table, but
> that doesn't imply there aren't holes also not used by either the PBA
> or the MSI-X table in such page.

I don't know the exact location of the PBA, so I'm rejecting writes just in
case they do hit the PBA (see commit message).

> > +        return NULL;
> > +    }
> > +
> > +    return fix_to_virt(pdev->msix->adj_access_table_idx[adj_type]) +
> > +        (address & (PAGE_SIZE - 1));
> 
> You can use PAGE_OFFSET().

Ok.

> > +
> > +}
> > +
> > +static bool cf_check msixtbl_page_accept(
> > +        const struct hvm_io_handler *handler, const ioreq_t *r)
> > +{
> > +    ASSERT(r->type == IOREQ_TYPE_COPY);
> > +
> > +    return msixtbl_page_handler_get_hwaddr(
> > +            current->domain, r->addr, r->dir == IOREQ_WRITE);
> 
> I think you want to accept it also if it's a write to the PBA, and
> just drop it.  You should always pass write=false and then drop it in
> msixtbl_page_write() if it falls in the PBA region (but still return
> X86EMUL_OKAY).

I don't want to interfere with msixtbl_mmio_page_ops, this handler is
only about accesses not hitting actual MSI-X structures.

> > +}
> > +
> > +static int cf_check msixtbl_page_read(
> > +        const struct hvm_io_handler *handler,
> > +        uint64_t address, uint32_t len, uint64_t *pval)
> 
> Why use hvm_io_ops and not hvm_mmio_ops?  You only care about MMIO
> accesses here.

I followed how msixtbl_mmio_ops are registered. Should that also be
changed to hvm_mmio_ops?

> 
> > +{
> > +    void __iomem *hwaddr;
> 
> const
> 
> I would also initialize *pval = ~0ul for safety.

When returning X86EMUL_OKAY, then I agree.

> > +
> > +    if ( address & (len - 1) )
> > +        return X86EMUL_UNHANDLEABLE;
> 
> You can use IS_ALIGNED for clarity.  In my fix for this for vPCI Jan
> asked to split unaligned accesses into 1byte ones, but I think for
> guests it's better to just drop them unless there's a specific case
> that we need to handle.
> 
> Also you should return X86EMUL_OKAY here, as the address is handled
> here, but the current way to handle it is to drop the access.
> 
> Printing a debug message can be helpful in case unaligned accesses
> need to be implemented in order to support some device.
> 
> > +
> > +    hwaddr = msixtbl_page_handler_get_hwaddr(
> > +            current->domain, address, false);
> > +
> > +    if ( !hwaddr )
> > +        return X86EMUL_UNHANDLEABLE;
> 
> Maybe X86EMUL_RETRY, since the page was there in the accept handler.

How does X86EMUL_RETRY work? Is it trying to find the handler again?

> > +
> > +    switch ( len )
> > +    {
> > +    case 1:
> > +        *pval = readb(hwaddr);
> > +        break;
> > +    case 2:
> > +        *pval = readw(hwaddr);
> > +        break;
> > +    case 4:
> > +        *pval = readl(hwaddr);
> > +        break;
> > +    case 8:
> > +        *pval = readq(hwaddr);
> > +        break;
> 
> Nit: we usually add a newline after every break;

Ok.

> > +    default:
> > +        return X86EMUL_UNHANDLEABLE;
> 
> I would rather use ASSERT_UNREACHABLE() here and fallthrough to the
> return below.  Seeing such size is likely an indication of something
> else gone very wrong, better to just terminate the access at once.
> 
> > +    }
> > +    return X86EMUL_OKAY;
> > +}
> > +
> > +static int cf_check msixtbl_page_write(
> > +        const struct hvm_io_handler *handler,
> > +        uint64_t address, uint32_t len, uint64_t val)
> > +{
> > +    void __iomem *hwaddr = msixtbl_page_handler_get_hwaddr(
> > +            current->domain, address, true);
> > +
> 
> You don't seem to check whether the access is aligned here?
> 
> Otherwise I have mostly the same comments as in msixtbl_page_read().

Ok.

> > +    if ( !hwaddr )
> > +        return X86EMUL_UNHANDLEABLE;
> > +
> > +    switch ( len ) {
> > +        case 1:
> > +            writeb(val, hwaddr);
> > +            break;
> > +        case 2:
> > +            writew(val, hwaddr);
> > +            break;
> > +        case 4:
> > +            writel(val, hwaddr);
> > +            break;
> > +        case 8:
> > +            writeq(val, hwaddr);
> > +            break;
> > +        default:
> > +            return X86EMUL_UNHANDLEABLE;
> > +    }
> > +    return X86EMUL_OKAY;
> > +
> > +}
> > +
> > +static const struct hvm_io_ops msixtbl_mmio_page_ops = {
> > +    .accept = msixtbl_page_accept,
> > +    .read = msixtbl_page_read,
> > +    .write = msixtbl_page_write,
> > +};
> > +
> >  static void add_msixtbl_entry(struct domain *d,
> >                                struct pci_dev *pdev,
> >                                uint64_t gtable,
> > @@ -593,6 +739,14 @@ void msixtbl_init(struct domain *d)
> >          handler->type = IOREQ_TYPE_COPY;
> >          handler->ops = &msixtbl_mmio_ops;
> >      }
> > +
> > +    /* passthrough access to other registers on the same page */
> > +    handler = hvm_next_io_handler(d);
> > +    if ( handler )
> > +    {
> > +        handler->type = IOREQ_TYPE_COPY;
> > +        handler->ops = &msixtbl_mmio_page_ops;
> > +    }
> >  }
> >  
> >  void msixtbl_pt_cleanup(struct domain *d)
> > diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
> > index a53ade95c9ad..d13cf1c1f873 100644
> > --- a/xen/arch/x86/include/asm/msi.h
> > +++ b/xen/arch/x86/include/asm/msi.h
> > @@ -207,6 +207,10 @@ struct msg_address {
> >                                         PCI_MSIX_ENTRY_SIZE + \
> >                                         (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
> >  
> > +/* indexes in adj_access_table_idx[] below */
> > +#define ADJ_IDX_FIRST 0
> > +#define ADJ_IDX_LAST  1
> > +
> >  struct arch_msix {
> >      unsigned int nr_entries, used_entries;
> >      struct {
> > @@ -214,6 +218,7 @@ struct arch_msix {
> >      } table, pba;
> >      int table_refcnt[MAX_MSIX_TABLE_PAGES];
> >      int table_idx[MAX_MSIX_TABLE_PAGES];
> > +    int adj_access_table_idx[2];
> >      spinlock_t table_lock;
> >      bool host_maskall, guest_maskall;
> >      domid_t warned;
> > diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> > index d0bf63df1def..680853f84685 100644
> > --- a/xen/arch/x86/msi.c
> > +++ b/xen/arch/x86/msi.c
> > @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
> >                  domain_crash(d);
> >              /* XXX How to deal with existing mappings? */
> >          }
> > +
> > +        /*
> > +         * If the MSI-X table doesn't start at the page boundary, map the first page for
> > +         * passthrough accesses.
> > +         */
> 
> I think you should initialize
> msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1?
> 
> > +        if ( table_paddr & (PAGE_SIZE - 1) )
> > +        {
> > +            int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
> > +
> > +            if ( idx >= 0 )
> > +                msix->adj_access_table_idx[ADJ_IDX_FIRST] = idx;
> > +            else
> > +                gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: %d\n", idx);
> > +        }
> > +        /*
> > +         * If the MSI-X table doesn't span full page(s), map the last page for
> > +         * passthrough accesses.
> > +         */
> > +        if ( (table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE) & (PAGE_SIZE - 1) )
> > +        {
> > +            uint64_t entry_paddr = table_paddr + msix->nr_entries * PCI_MSIX_ENTRY_SIZE;
> > +            int idx = msix_get_fixmap(msix, table_paddr, entry_paddr);
> > +
> > +            if ( idx >= 0 )
> > +                msix->adj_access_table_idx[ADJ_IDX_LAST] = idx;
> > +            else
> > +                gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: %d\n", idx);
> > +        }
> >      }
> >      WARN_ON(msix->table.first != (table_paddr >> PAGE_SHIFT));
> >      ++msix->used_entries;
> > @@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
> >              WARN();
> >          msix->table.first = 0;
> >          msix->table.last = 0;
> > +        if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] )
> > +        {
> > +            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]);
> > +            msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0;
> > +        }
> > +        if ( msix->adj_access_table_idx[ADJ_IDX_LAST] )
> > +        {
> > +            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]);
> > +            msix->adj_access_table_idx[ADJ_IDX_LAST] = 0;
> 
> Isn't 0 a valid idx?  You should check for >= 0 and also set
> to -1 once the fixmap entry has been put.

I rely here on msix using a specific range of fixmap indexes
(FIX_MSIX_TO_RESERV_BASE - FIX_MSIX_TO_RESERV_END), which doesn't start
at 0. For this reason, I also keep unused entries at 0 (no explicit
initialization needed).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 11:37   ` Roger Pau Monné
@ 2023-03-28 12:07     ` Marek Marczykowski-Górecki
  2023-03-28 12:38       ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-03-28 12:07 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jason Andryuk, Jan Beulich, Paul Durrant

[-- Attachment #1: Type: text/plain, Size: 2243 bytes --]

On Tue, Mar 28, 2023 at 01:37:14PM +0200, Roger Pau Monné wrote:
> On Sat, Mar 25, 2023 at 03:49:24AM +0100, Marek Marczykowski-Górecki wrote:
> > Some firmware/devices are found to not reset MSI-X properly, leaving
> > MASKALL set. Xen relies on initial state being both disabled.
> 
> The 'both' in this sentence seems out of context, as you just mention
> MASKALL in the previous sentence.
> 
> > Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > just MASKALL might be unsafe if ENABLE is set, so clear them both.
> > 
> > Reported-by: Jason Andryuk <jandryuk@gmail.com>
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > ---
> > v2:
> >  - new patch
> > ---
> >  xen/drivers/passthrough/msi.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/xen/drivers/passthrough/msi.c b/xen/drivers/passthrough/msi.c
> > index ce1a450f6f4a..60adad47e379 100644
> > --- a/xen/drivers/passthrough/msi.c
> > +++ b/xen/drivers/passthrough/msi.c
> > @@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
> >          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> >          msix->nr_entries = msix_table_size(ctrl);
> >  
> > +        /*
> > +         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
> > +         * initial state.
> > +         */
> > +        ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
> > +        pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
> 
> Should you make this conditional to them being set in the first place:
> 
> if ( ctrl & (PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL) )
> {
>     ...
> 
> We should avoid the extra access if possible (specially if it's to
> write the same value that has been read).
> 
> I would even move this before the msix_table_size() call, so the
> handling of ctrl is closer together.
> 
> Would it also be worth to print a debug message that the initial
> device state seems inconsistent?

That may make sense. XENLOG_WARNING? XENLOG_DEBUG?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-03-28 12:05     ` Marek Marczykowski-Górecki
@ 2023-03-28 12:34       ` Jan Beulich
  2023-03-28 12:52         ` Marek Marczykowski-Górecki
  2023-03-28 12:52       ` Roger Pau Monné
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2023-03-28 12:34 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Andrew Cooper, Wei Liu, Roger Pau Monné

On 28.03.2023 14:05, Marek Marczykowski-Górecki wrote:
> On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
>> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
>>> +static bool cf_check msixtbl_page_accept(
>>> +        const struct hvm_io_handler *handler, const ioreq_t *r)
>>> +{
>>> +    ASSERT(r->type == IOREQ_TYPE_COPY);
>>> +
>>> +    return msixtbl_page_handler_get_hwaddr(
>>> +            current->domain, r->addr, r->dir == IOREQ_WRITE);
>>
>> I think you want to accept it also if it's a write to the PBA, and
>> just drop it.  You should always pass write=false and then drop it in
>> msixtbl_page_write() if it falls in the PBA region (but still return
>> X86EMUL_OKAY).
> 
> I don't want to interfere with msixtbl_mmio_page_ops, this handler is
> only about accesses not hitting actual MSI-X structures.

In his functionally similar vPCI change I did ask Roger to handle the
"extra" space right from the same handlers. Maybe that's going to be
best here, too.
>>> +    hwaddr = msixtbl_page_handler_get_hwaddr(
>>> +            current->domain, address, false);
>>> +
>>> +    if ( !hwaddr )
>>> +        return X86EMUL_UNHANDLEABLE;
>>
>> Maybe X86EMUL_RETRY, since the page was there in the accept handler.
> 
> How does X86EMUL_RETRY work? Is it trying to find the handler again?

It exits back to guest context, to restart the insn in question. Then
the full emulation path may be re-triggered. If the region was removed
from the guest, a handler then won't be found, and the access handed
to the device model.

>>> --- a/xen/arch/x86/msi.c
>>> +++ b/xen/arch/x86/msi.c
>>> @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
>>>                  domain_crash(d);
>>>              /* XXX How to deal with existing mappings? */
>>>          }
>>> +
>>> +        /*
>>> +         * If the MSI-X table doesn't start at the page boundary, map the first page for
>>> +         * passthrough accesses.
>>> +         */
>>
>> I think you should initialize
>> msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1?

Or better not use a signed type there and set to UINT_MAX here.

Jan


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 12:07     ` Marek Marczykowski-Górecki
@ 2023-03-28 12:38       ` Jan Beulich
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Beulich @ 2023-03-28 12:38 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Paul Durrant, Roger Pau Monné

On 28.03.2023 14:07, Marek Marczykowski-Górecki wrote:
> On Tue, Mar 28, 2023 at 01:37:14PM +0200, Roger Pau Monné wrote:
>> On Sat, Mar 25, 2023 at 03:49:24AM +0100, Marek Marczykowski-Górecki wrote:
>>> --- a/xen/drivers/passthrough/msi.c
>>> +++ b/xen/drivers/passthrough/msi.c
>>> @@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
>>>          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
>>>          msix->nr_entries = msix_table_size(ctrl);
>>>  
>>> +        /*
>>> +         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
>>> +         * initial state.
>>> +         */
>>> +        ctrl &= ~(PCI_MSIX_FLAGS_ENABLE|PCI_MSIX_FLAGS_MASKALL);
>>> +        pci_conf_write16(pdev->sbdf, msix_control_reg(pos), ctrl);
>>
>> Should you make this conditional to them being set in the first place:
>>
>> if ( ctrl & (PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL) )
>> {
>>     ...
>>
>> We should avoid the extra access if possible (specially if it's to
>> write the same value that has been read).
>>
>> I would even move this before the msix_table_size() call, so the
>> handling of ctrl is closer together.
>>
>> Would it also be worth to print a debug message that the initial
>> device state seems inconsistent?
> 
> That may make sense. XENLOG_WARNING? XENLOG_DEBUG?

Warning I would say, as this isn't liable to repeat frequently for the
same device. And it's helpful to see even in release build runs using
default log levels.

Jan


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-03-28 12:34       ` Jan Beulich
@ 2023-03-28 12:52         ` Marek Marczykowski-Górecki
  2023-03-28 13:03           ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-03-28 12:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Jason Andryuk, Andrew Cooper, Wei Liu, Roger Pau Monné

[-- Attachment #1: Type: text/plain, Size: 2500 bytes --]

On Tue, Mar 28, 2023 at 02:34:23PM +0200, Jan Beulich wrote:
> On 28.03.2023 14:05, Marek Marczykowski-Górecki wrote:
> > On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
> >> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> >>> +static bool cf_check msixtbl_page_accept(
> >>> +        const struct hvm_io_handler *handler, const ioreq_t *r)
> >>> +{
> >>> +    ASSERT(r->type == IOREQ_TYPE_COPY);
> >>> +
> >>> +    return msixtbl_page_handler_get_hwaddr(
> >>> +            current->domain, r->addr, r->dir == IOREQ_WRITE);
> >>
> >> I think you want to accept it also if it's a write to the PBA, and
> >> just drop it.  You should always pass write=false and then drop it in
> >> msixtbl_page_write() if it falls in the PBA region (but still return
> >> X86EMUL_OKAY).
> > 
> > I don't want to interfere with msixtbl_mmio_page_ops, this handler is
> > only about accesses not hitting actual MSI-X structures.
> 
> In his functionally similar vPCI change I did ask Roger to handle the
> "extra" space right from the same handlers. Maybe that's going to be
> best here, too.

I have considered this option, but msixtbl_range() is already quite
complex, adding yet another case there won't make it easier to follow.
I mean, technically I can probably merge those two handlers together,
but I don't think it will result in nicer code. Especially since the
general direction is to abandon split of MSI-X table access handling
between Xen and QEMU and go with just QEMU doing it, hopefully at some
point not needing msixtbl_mmio_ops anymore (but still needing the one
for adjacent accesses).

> >>> --- a/xen/arch/x86/msi.c
> >>> +++ b/xen/arch/x86/msi.c
> >>> @@ -961,6 +961,34 @@ static int msix_capability_init(struct pci_dev *dev,
> >>>                  domain_crash(d);
> >>>              /* XXX How to deal with existing mappings? */
> >>>          }
> >>> +
> >>> +        /*
> >>> +         * If the MSI-X table doesn't start at the page boundary, map the first page for
> >>> +         * passthrough accesses.
> >>> +         */
> >>
> >> I think you should initialize
> >> msix->adj_access_table_idx[ADJ_IDX_{FIRST,LAST}] to -1?
> 
> Or better not use a signed type there and set to UINT_MAX here.

If not using 0 as the unused-entry marker (see the other comment I made in
response to Roger), then that's probably the way to go.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-03-28 12:05     ` Marek Marczykowski-Górecki
  2023-03-28 12:34       ` Jan Beulich
@ 2023-03-28 12:52       ` Roger Pau Monné
  1 sibling, 0 replies; 37+ messages in thread
From: Roger Pau Monné @ 2023-03-28 12:52 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Jan Beulich, Andrew Cooper, Wei Liu

On Tue, Mar 28, 2023 at 02:05:14PM +0200, Marek Marczykowski-Górecki wrote:
> On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
> > On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> > > +             address >= entry->gtable + entry->table_len )
> > > +        {
> > > +            adj_type = ADJ_IDX_LAST;
> > > +            pdev = entry->pdev;
> > > +            break;
> > > +        }
> > > +    }
> > > +    rcu_read_unlock(&msixtbl_rcu_lock);
> > > +
> > > +    if ( !pdev )
> > > +        return NULL;
> > > +
> > > +    ASSERT(pdev->msix);
> > > +
> > > +    if ( !pdev->msix->adj_access_table_idx[adj_type] )
> > > +    {
> > > +        gdprintk(XENLOG_WARNING,
> > > +                 "Page for adjacent MSI-X table access not initialized for %pp\n",
> > > +                 pdev);
> > > +
> > > +        return NULL;
> > > +    }
> > > +
> > > +    /* If PBA lives on the same page too, forbid writes. */
> > > +    if ( write && (
> > > +        (adj_type == ADJ_IDX_LAST &&
> > > +         pdev->msix->table.last == pdev->msix->pba.first) ||
> > > +        (adj_type == ADJ_IDX_FIRST &&
> > > +         pdev->msix->table.first == pdev->msix->pba.last)) )
> > > +    {
> > > +        gdprintk(XENLOG_WARNING,
> > > +                 "MSI-X table and PBA of %pp live on the same page, "
> > > +                 "writing to other registers there is not implemented\n",
> > > +                 pdev);
> > 
> > Don't you actually need to check that the passed address falls into the
> > PBA array?  PBA can be sharing the same page as the MSI-X table, but
> > that doesn't imply there aren't holes also not used by either the PBA
> > or the MSI-X table in such page.
> 
> I don't know the exact location of the PBA, so I'm rejecting writes just in
> case they do hit the PBA (see commit message).

Hm, maybe we should cache the address and size of the PBA in
msix_capability_init().
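
Roughly (sketch only: neither field exists in struct arch_msix today, and
pba_paddr stands for the address computed from the capability's PBA
BIR/offset; the size assumes the usual one-bit-per-vector layout rounded
up to 64-bit words):

    msix->pba_paddr = pba_paddr;
    msix->pba_size  = ROUNDUP(msix->nr_entries, 64) / 8;   /* in bytes */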

> > > +
> > > +}
> > > +
> > > +static bool cf_check msixtbl_page_accept(
> > > +        const struct hvm_io_handler *handler, const ioreq_t *r)
> > > +{
> > > +    ASSERT(r->type == IOREQ_TYPE_COPY);
> > > +
> > > +    return msixtbl_page_handler_get_hwaddr(
> > > +            current->domain, r->addr, r->dir == IOREQ_WRITE);
> > 
> > I think you want to accept it also if it's a write to the PBA, and
> > just drop it.  You should always pass write=false and then drop it in
> > msixtbl_page_write() if it falls in the PBA region (but still return
> > X86EMUL_OKAY).
> 
> I don't want to interfere with msixtbl_mmio_page_ops, this handler is
> only about accesses not hitting actual MSI-X structures.

But msixtbl_mmio_page_ops doesn't handle PBA array accesses at all, so
you won't interfere by accepting PBA writes here and dropping them in
msixtbl_page_write()?

Maybe there's some piece I'm missing.

> > > +}
> > > +
> > > +static int cf_check msixtbl_page_read(
> > > +        const struct hvm_io_handler *handler,
> > > +        uint64_t address, uint32_t len, uint64_t *pval)
> > 
> > Why use hvm_io_ops and not hvm_mmio_ops?  You only care about MMIO
> > accesses here.
> 
> I followed how msixtbl_mmio_ops are registered. Should that also be
> changed to hvm_mmio_ops?

Maybe, but let's leave that for another series I think.  Using
hvm_mmio_ops and register_mmio_handler() together with the slightly
simplified handlers will allow you to drop some of the code.

> > > +
> > > +    if ( address & (len - 1) )
> > > +        return X86EMUL_UNHANDLEABLE;
> > 
> > You can use IS_ALIGNED for clarity.  In my fix for this for vPCI Jan
> > asked to split unaligned accesses into 1byte ones, but I think for
> > guests it's better to just drop them unless there's a specific case
> > that we need to handle.
> > 
> > Also you should return X86EMUL_OKAY here, as the address is handled
> > here, but the current way to handle it is to drop the access.
> > 
> > Printing a debug message can be helpful in case unaligned accesses
> > need to be implemented in order to support some device.
> > 
> > > +
> > > +    hwaddr = msixtbl_page_handler_get_hwaddr(
> > > +            current->domain, address, false);
> > > +
> > > +    if ( !hwaddr )
> > > +        return X86EMUL_UNHANDLEABLE;
> > 
> > Maybe X86EMUL_RETRY, since the page was there in the accept handler.
> 
> How does X86EMUL_RETRY work? Is it trying to find the handler again?

Hm, I'm not sure it works as I thought it does, so maybe not a good
suggestion after all.

> > > @@ -1090,6 +1118,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
> > >              WARN();
> > >          msix->table.first = 0;
> > >          msix->table.last = 0;
> > > +        if ( msix->adj_access_table_idx[ADJ_IDX_FIRST] )
> > > +        {
> > > +            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_FIRST]);
> > > +            msix->adj_access_table_idx[ADJ_IDX_FIRST] = 0;
> > > +        }
> > > +        if ( msix->adj_access_table_idx[ADJ_IDX_LAST] )
> > > +        {
> > > +            msix_put_fixmap(msix, msix->adj_access_table_idx[ADJ_IDX_LAST]);
> > > +            msix->adj_access_table_idx[ADJ_IDX_LAST] = 0;
> > 
> > Isn't 0 a valid idx?  You should check for >= 0 and also set
> > to -1 once the fixmap entry has been put.
> 
> I rely here on msix using a specific range of fixmap indexes
> (FIX_MSIX_TO_RESERV_BASE - FIX_MSIX_TO_RESERV_END), which doesn't start
> at 0. For this reason, I also keep unused entries at 0 (no explicit
> initialization needed).

Hm, OK, then you should also check that the return of
msix_get_fixmap() is strictly > 0, as right now it's using >= (and
thus would think 0 is a valid fixmap entry).

Thanks, Roger.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-25  2:49 ` [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot Marek Marczykowski-Górecki
  2023-03-28 11:37   ` Roger Pau Monné
@ 2023-03-28 12:54   ` Jan Beulich
  2023-03-28 13:04     ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2023-03-28 12:54 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Jason Andryuk, Paul Durrant, Roger Pau Monné, xen-devel

On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> Some firmware/devices are found to not reset MSI-X properly, leaving
> MASKALL set. Xen relies on initial state being both disabled.
> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> just MASKALL might be unsafe if ENABLE is set, so clear them both.

But pci_reset_msix_state() comes into play only when assigning a device
to a DomU. If the tool stack doing a reset doesn't properly clear the
bit, how would it be cleared the next time round (i.e. after the guest
stopped and then possibly was started again)? It feels like the issue
wants dealing with elsewhere, possibly in the tool stack.

> --- a/xen/drivers/passthrough/msi.c
> +++ b/xen/drivers/passthrough/msi.c
> @@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
>          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
>          msix->nr_entries = msix_table_size(ctrl);
>  
> +        /*
> +         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
> +         * initial state.
> +         */

Please can comments be accurate at least at the time of writing?
pci_reset_msix_state() doesn't care about ENABLE at all.

Jan


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-03-28 12:52         ` Marek Marczykowski-Górecki
@ 2023-03-28 13:03           ` Jan Beulich
  2023-03-28 13:22             ` Roger Pau Monné
  2023-04-03  4:21             ` Marek Marczykowski-Górecki
  0 siblings, 2 replies; 37+ messages in thread
From: Jan Beulich @ 2023-03-28 13:03 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Andrew Cooper, Wei Liu, Roger Pau Monné

On 28.03.2023 14:52, Marek Marczykowski-Górecki wrote:
> On Tue, Mar 28, 2023 at 02:34:23PM +0200, Jan Beulich wrote:
>> On 28.03.2023 14:05, Marek Marczykowski-Górecki wrote:
>>> On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
>>>> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
>>>>> +static bool cf_check msixtbl_page_accept(
>>>>> +        const struct hvm_io_handler *handler, const ioreq_t *r)
>>>>> +{
>>>>> +    ASSERT(r->type == IOREQ_TYPE_COPY);
>>>>> +
>>>>> +    return msixtbl_page_handler_get_hwaddr(
>>>>> +            current->domain, r->addr, r->dir == IOREQ_WRITE);
>>>>
>>>> I think you want to accept it also if it's a write to the PBA, and
>>>> just drop it.  You should always pass write=false and then drop it in
>>>> msixtbl_page_write() if it falls in the PBA region (but still return
>>>> X86EMUL_OKAY).
>>>
>>> I don't want to interfere with msixtbl_mmio_page_ops, this handler is
>>> only about accesses not hitting actual MSI-X structures.
>>
>> In his functionally similar vPCI change I did ask Roger to handle the
>> "extra" space right from the same handlers. Maybe that's going to be
>> best here, too.
> 
> I have considered this option, but msixtbl_range() is already quite
> complex, adding yet another case there won't make it easier to follow.

Do you care about the case of msixtbl_addr_to_desc() returning NULL at
all for the purpose you have? Like in Roger's patch I'd assume
msixtbl_find_entry() needs extending what ranges it accepts; if need
be another parameter may be added to cover cases where the extended
coverage isn't wanted.
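
A sketch of the kind of extension I mean (field names from memory; the
exact bounds handling would need care, so treat this as illustrative
only):

    static struct msixtbl_entry *msixtbl_find_entry(
        struct vcpu *v, unsigned long addr, bool adjacent)
    {
        struct msixtbl_entry *entry;
        struct domain *d = v->domain;

        list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
        {
            unsigned long start = entry->gtable;
            unsigned long end = entry->gtable + entry->table_len;

            if ( adjacent )
            {
                /* Widen the match to the whole page(s) holding the table. */
                start &= PAGE_MASK;
                end = PAGE_ALIGN(end);
            }
            if ( addr >= start && addr < end )
                return entry;
        }

        return NULL;
    }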

> I mean, technically I can probably merge those two handlers together,
> but I don't think it will result in nicer code. Especially since the
> general direction is to abandon split of MSI-X table access handling
> between Xen and QEMU and go with just QEMU doing it, hopefully at some
> point not needing msixtbl_mmio_ops anymore (but still needing the one
> for adjacent accesses).

Hmm, at this point I'm not convinced of this plan. Instead I was hoping
that once vPCI properly supports PVH DomU-s, we may also be able to make
use of it for HVM, delegating less to qemu rather than more.

Jan


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 12:54   ` Jan Beulich
@ 2023-03-28 13:04     ` Marek Marczykowski-Górecki
  2023-03-28 13:17       ` Jason Andryuk
  2023-03-28 13:23       ` Jan Beulich
  0 siblings, 2 replies; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-03-28 13:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jason Andryuk, Paul Durrant, Roger Pau Monné, xen-devel

On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> > Some firmware/devices are found to not reset MSI-X properly, leaving
> > MASKALL set. Xen relies on initial state being both disabled.
> > Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > just MASKALL might be unsafe if ENABLE is set, so clear them both.
> 
> But pci_reset_msix_state() comes into play only when assigning a device
> to a DomU. If the tool stack doing a reset doesn't properly clear the
> bit, how would it be cleared the next time round (i.e. after the guest
> stopped and then possibly was started again)? It feels like the issue
> wants dealing with elsewhere, possibly in the tool stack.

I may be misremembering some details, but AFAIR Xen intercepts the
toolstack's (or more generally, dom0's) attempts to clean this up, and
once the device enters an inconsistent state (or rather, starts the day
in one), there is no way to clean it up.

I have considered changing pci_reset_msix_state() to not choke on
MASKALL being set, but I'm a bit afraid of doing it, as there seem to be
a lot of assumptions all over the place and I may miss some.

> 
> > --- a/xen/drivers/passthrough/msi.c
> > +++ b/xen/drivers/passthrough/msi.c
> > @@ -48,6 +48,13 @@ int pdev_msi_init(struct pci_dev *pdev)
> >          ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
> >          msix->nr_entries = msix_table_size(ctrl);
> >  
> > +        /*
> > +         * Clear both ENABLE and MASKALL, pci_reset_msix_state() relies on this
> > +         * initial state.
> > +         */
> 
> Please can comments be accurate at least at the time of writing?
> pci_reset_msix_state() doesn't care about ENABLE at all.
> 
> Jan

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 13:04     ` Marek Marczykowski-Górecki
@ 2023-03-28 13:17       ` Jason Andryuk
  2023-03-28 17:38         ` Jason Andryuk
  2023-03-28 13:23       ` Jan Beulich
  1 sibling, 1 reply; 37+ messages in thread
From: Jason Andryuk @ 2023-03-28 13:17 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Jan Beulich, Paul Durrant, Roger Pau Monné, xen-devel

On Tue, Mar 28, 2023 at 9:04 AM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> > On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> > > Some firmware/devices are found to not reset MSI-X properly, leaving
> > > MASKALL set. Xen relies on initial state being both disabled.
> > > Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > > setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > > just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >
> > But pci_reset_msix_state() comes into play only when assigning a device
> > to a DomU. If the tool stack doing a reset doesn't properly clear the
> > bit, how would it be cleared the next time round (i.e. after the guest
> > stopped and then possibly was started again)? It feels like the issue
> > wants dealing with elsewhere, possibly in the tool stack.
>
> I may be misremembering some details, but AFAIR Xen intercepts
> toolstack's (or more generally: accesses from dom0) attempt to clean
> this up and once it enters an inconsistent state (or rather: starts with
> such at the start of the day), there was no way to clean it up.
>
> I have considered changing pci_reset_msix_state() to not choke on
> MASKALL being set, but I'm a bit afraid of doing it, as there it seems
> there is a lot of assumptions all over the place and I may miss some.

Hi, Marek and Jan,

Marek, thank you for working on MSI-X support.

As Jan says, the clearing here works during system boot.  However, I
have found that Xen itself is setting MASKALL in __pci_disable_msix()
when shutting down a domU.  When that is called, memory_decoded(dev)
returns false, and Xen prints "cannot disable IRQ 137: masking MSI-X
on 0000:00:14.3".  That makes the device unavailable for subsequent
domU assignment.  I haven't investigated where and why memory decoding
gets disabled for the device.
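
The code path I'm referring to looks roughly like this (paraphrased from
memory of the 4.16-ish sources, so details may be off):

    /* in __pci_disable_msix() */
    if ( likely(memory_decoded(dev)) )
        writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
    else if ( !(control & PCI_MSIX_FLAGS_MASKALL) )
    {
        printk(XENLOG_WARNING
               "cannot disable IRQ %d: masking MSI-X on %pp\n",
               entry->irq, &dev->sbdf);
        maskall = true;   /* MASKALL gets written back and then sticks */
    }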

Testing was with this v2 patchset integrated into OpenXT w/ Xen 4.16.
We have some device reset changes, so I'll have to look at them again.
Hmmm, they move the libxl device resetting from pci_remove_detached()
to libxl__destroy_domid() to ensure all devices are de-assigned after
the domain is destroyed.  A kernel patch implements a "more thorough
reset" which could do a slot- or bus-level reset, and the desire is to
have all devices deassigned before that.  Maybe shifting the reset later
is throwing off Xen's expectations?

As Marek says, the MASKALL handling seems tricky.  Xen seems to use it
to ensure there are no interrupts, so clearing it makes one worry
about breaking something else.

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-03-28 13:03           ` Jan Beulich
@ 2023-03-28 13:22             ` Roger Pau Monné
  2023-04-03  4:21             ` Marek Marczykowski-Górecki
  1 sibling, 0 replies; 37+ messages in thread
From: Roger Pau Monné @ 2023-03-28 13:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Marek Marczykowski-Górecki, xen-devel, Jason Andryuk,
	Andrew Cooper, Wei Liu

On Tue, Mar 28, 2023 at 03:03:17PM +0200, Jan Beulich wrote:
> On 28.03.2023 14:52, Marek Marczykowski-Górecki wrote:
> > On Tue, Mar 28, 2023 at 02:34:23PM +0200, Jan Beulich wrote:
> >> On 28.03.2023 14:05, Marek Marczykowski-Górecki wrote:
> >>> On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
> >>>> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> >>>>> +static bool cf_check msixtbl_page_accept(
> >>>>> +        const struct hvm_io_handler *handler, const ioreq_t *r)
> >>>>> +{
> >>>>> +    ASSERT(r->type == IOREQ_TYPE_COPY);
> >>>>> +
> >>>>> +    return msixtbl_page_handler_get_hwaddr(
> >>>>> +            current->domain, r->addr, r->dir == IOREQ_WRITE);
> >>>>
> >>>> I think you want to accept it also if it's a write to the PBA, and
> >>>> just drop it.  You should always pass write=false and then drop it in
> >>>> msixtbl_page_write() if it falls in the PBA region (but still return
> >>>> X86EMUL_OKAY).
> >>>
> >>> I don't want to interfere with msixtbl_mmio_page_ops, this handler is
> >>> only about accesses not hitting actual MSI-X structures.
> >>
> >> In his functionally similar vPCI change I did ask Roger to handle the
> >> "extra" space right from the same handlers. Maybe that's going to be
> >> best here, too.
> > 
> > I have considered this option, but msixtbl_range() is already quite
> > complex, adding yet another case there won't make it easier to follow.
> 
> Do you care about the case of msixtbl_addr_to_desc() returning NULL at
> all for the purpose you have? Like in Roger's patch I'd assume
> msixtbl_find_entry() needs extending what ranges it accepts; if need
> be another parameter may be added to cover cases where the extended
> coverage isn't wanted.
> 
> > I mean, technically I can probably merge those two handlers together,
> > but I don't think it will result in nicer code. Especially since the
> > general direction is to abandon split of MSI-X table access handling
> > between Xen and QEMU and go with just QEMU doing it, hopefully at some
> > point not needing msixtbl_mmio_ops anymore (but still needing the one
> > for adjacent accesses).
> 
> Hmm, at this point I'm not convinced of this plan. Instead I was hoping
> that once vPCI properly supports PVH DomU-s, we may also be able to make
> use of it for HVM, delegating less to qemu rather than more.

Right, but vPCI doesn't use msixtbl_mmio_ops at all.

I have to admit I don't like the split model we currently use for MSIX
handling with QEMU.  I find it difficult to understand, and prone to
errors.  IMO it would be much better if all the handling was either
done in QEMU (except for the quirk we need here to handle adjacent
accesses) or in Xen by using vPCI.

Hence I proposed to Marek that given writes to MSIX entries need to be
propagated to QEMU in order for it to also track the mask bit, we
could also move the handling of the mask bit to QEMU entirely,
dropping the usage of msixtbl_mmio_ops.

At some point I expect we should be able to choose between doing the
handling in QEMU or vPCI in Xen.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 13:04     ` Marek Marczykowski-Górecki
  2023-03-28 13:17       ` Jason Andryuk
@ 2023-03-28 13:23       ` Jan Beulich
  2023-03-28 13:27         ` Roger Pau Monné
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2023-03-28 13:23 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Jason Andryuk, Paul Durrant, Roger Pau Monné, xen-devel

On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
>>> Some firmware/devices are found to not reset MSI-X properly, leaving
>>> MASKALL set. Xen relies on initial state being both disabled.
>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
>>
>> But pci_reset_msix_state() comes into play only when assigning a device
>> to a DomU. If the tool stack doing a reset doesn't properly clear the
>> bit, how would it be cleared the next time round (i.e. after the guest
>> stopped and then possibly was started again)? It feels like the issue
>> wants dealing with elsewhere, possibly in the tool stack.
> 
> I may be misremembering some details, but AFAIR Xen intercepts
> toolstack's (or more generally: accesses from dom0) attempt to clean
> this up and once it enters an inconsistent state (or rather: starts with
> such at the start of the day), there was no way to clean it up.

Iirc Roger and you already discussed that there needs to be an
indication of device reset having happened, so that Xen can resync
from this "behind its back" operation. That would look to be the
point/place where such inconsistencies should be eliminated.

> I have considered changing pci_reset_msix_state() to not choke on
> MASKALL being set, but I'm a bit afraid of doing it, as there it seems
> there is a lot of assumptions all over the place and I may miss some.

Well, the comment there alone is enough justification to leave that
check as is, I would say. To drop it there, something equivalent
would likely want adding in its stead.

But you haven't really answered my earlier question: If reset leaves
MASKALL set, how would the same issue be taken care of the 2nd time
round? (If, otoh, firmware sets the bit for some reason, but reset
clears it along with ENABLE, I could see how a one-time action might
suffice. But the first sentence in the description doesn't make clear
which situation it is that we have to work around.)

Jan


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 13:23       ` Jan Beulich
@ 2023-03-28 13:27         ` Roger Pau Monné
  2023-03-28 13:32           ` Jason Andryuk
  0 siblings, 1 reply; 37+ messages in thread
From: Roger Pau Monné @ 2023-03-28 13:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Marek Marczykowski-Górecki, Jason Andryuk, Paul Durrant, xen-devel

On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> > On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>> MASKALL set. Xen relies on initial state being both disabled.
> >>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> >>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> >>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >>
> >> But pci_reset_msix_state() comes into play only when assigning a device
> >> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >> bit, how would it be cleared the next time round (i.e. after the guest
> >> stopped and then possibly was started again)? It feels like the issue
> >> wants dealing with elsewhere, possibly in the tool stack.
> > 
> > I may be misremembering some details, but AFAIR Xen intercepts
> > toolstack's (or more generally: accesses from dom0) attempt to clean
> > this up and once it enters an inconsistent state (or rather: starts with
> > such at the start of the day), there was no way to clean it up.
> 
> Iirc Roger and you already discussed that there needs to be an
> indication of device reset having happened, so that Xen can resync
> from this "behind its back" operation. That would look to be the
> point/place where such inconsistencies should be eliminated.

I think that was a different conversation with Huang Rui related to
the AMD GPU work, see:

https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/

I understood the problem Marek was trying to solve was that some
devices were initialized with the MASKALL bit set (likely by the
firmware?) and that prevented Xen from using them.  But now, seeing the
further replies on this patch, I'm unsure whether that's the case.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 13:27         ` Roger Pau Monné
@ 2023-03-28 13:32           ` Jason Andryuk
  2023-03-28 13:35             ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Andryuk @ 2023-03-28 13:32 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, Marek Marczykowski-Górecki, Paul Durrant, xen-devel

On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> > On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> > > On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> > >> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> > >>> Some firmware/devices are found to not reset MSI-X properly, leaving
> > >>> MASKALL set. Xen relies on initial state being both disabled.
> > >>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > >>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > >>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> > >>
> > >> But pci_reset_msix_state() comes into play only when assigning a device
> > >> to a DomU. If the tool stack doing a reset doesn't properly clear the
> > >> bit, how would it be cleared the next time round (i.e. after the guest
> > >> stopped and then possibly was started again)? It feels like the issue
> > >> wants dealing with elsewhere, possibly in the tool stack.
> > >
> > > I may be misremembering some details, but AFAIR Xen intercepts
> > > toolstack's (or more generally: accesses from dom0) attempt to clean
> > > this up and once it enters an inconsistent state (or rather: starts with
> > > such at the start of the day), there was no way to clean it up.
> >
> > Iirc Roger and you already discussed that there needs to be an
> > indication of device reset having happened, so that Xen can resync
> > from this "behind its back" operation. That would look to be the
> > point/place where such inconsistencies should be eliminated.
>
> I think that was a different conversation with Huang Rui related to
> the AMD GPU work, see:
>
> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
>
> I understood the problem Marek was trying to solve was that some
> devices where initialized with the MASKALL bit set (likely by the
> firmware?) and that prevented Xen from using them.  But now seeing the
> further replies on this patch I'm unsure whether that's the case.

In my case, Xen's setting of MASKALL persists through a warm reboot,
so Xen sees it set when booting.  On a cold boot, MASKALL is not set.

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 13:32           ` Jason Andryuk
@ 2023-03-28 13:35             ` Jan Beulich
  2023-03-28 13:43               ` Jason Andryuk
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2023-03-28 13:35 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Marek Marczykowski-Górecki, Paul Durrant, xen-devel,
	Roger Pau Monné

On 28.03.2023 15:32, Jason Andryuk wrote:
> On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
>>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
>>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
>>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
>>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
>>>>>> MASKALL set. Xen relies on initial state being both disabled.
>>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
>>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
>>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
>>>>>
>>>>> But pci_reset_msix_state() comes into play only when assigning a device
>>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
>>>>> bit, how would it be cleared the next time round (i.e. after the guest
>>>>> stopped and then possibly was started again)? It feels like the issue
>>>>> wants dealing with elsewhere, possibly in the tool stack.
>>>>
>>>> I may be misremembering some details, but AFAIR Xen intercepts
>>>> toolstack's (or more generally: accesses from dom0) attempt to clean
>>>> this up and once it enters an inconsistent state (or rather: starts with
>>>> such at the start of the day), there was no way to clean it up.
>>>
>>> Iirc Roger and you already discussed that there needs to be an
>>> indication of device reset having happened, so that Xen can resync
>>> from this "behind its back" operation. That would look to be the
>>> point/place where such inconsistencies should be eliminated.
>>
>> I think that was a different conversation with Huang Rui related to
>> the AMD GPU work, see:
>>
>> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
>>
>> I understood the problem Marek was trying to solve was that some
>> devices where initialized with the MASKALL bit set (likely by the
>> firmware?) and that prevented Xen from using them.  But now seeing the
>> further replies on this patch I'm unsure whether that's the case.
> 
> In my case, Xen's setting of MASKALL persists through a warm reboot,

And does this get in the way of Dom0 using the device? (Before a DomU
gets to use it, things should be properly reset anyway.)

Jan

> so Xen sees it set when booting.  On a cold boot, MASKALL is not set.
> 
> Regards,
> Jason



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 13:35             ` Jan Beulich
@ 2023-03-28 13:43               ` Jason Andryuk
  2023-03-28 13:54                 ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Andryuk @ 2023-03-28 13:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Marek Marczykowski-Górecki, Paul Durrant, xen-devel,
	Roger Pau Monné

On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.03.2023 15:32, Jason Andryuk wrote:
> > On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> >>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> >>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>>>>> MASKALL set. Xen relies on initial state being both disabled.
> >>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> >>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> >>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >>>>>
> >>>>> But pci_reset_msix_state() comes into play only when assigning a device
> >>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >>>>> bit, how would it be cleared the next time round (i.e. after the guest
> >>>>> stopped and then possibly was started again)? It feels like the issue
> >>>>> wants dealing with elsewhere, possibly in the tool stack.
> >>>>
> >>>> I may be misremembering some details, but AFAIR Xen intercepts
> >>>> toolstack's (or more generally: accesses from dom0) attempt to clean
> >>>> this up and once it enters an inconsistent state (or rather: starts with
> >>>> such at the start of the day), there was no way to clean it up.
> >>>
> >>> Iirc Roger and you already discussed that there needs to be an
> >>> indication of device reset having happened, so that Xen can resync
> >>> from this "behind its back" operation. That would look to be the
> >>> point/place where such inconsistencies should be eliminated.
> >>
> >> I think that was a different conversation with Huang Rui related to
> >> the AMD GPU work, see:
> >>
> >> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
> >>
> >> I understood the problem Marek was trying to solve was that some
> >> devices where initialized with the MASKALL bit set (likely by the
> >> firmware?) and that prevented Xen from using them.  But now seeing the
> >> further replies on this patch I'm unsure whether that's the case.
> >
> > In my case, Xen's setting of MASKALL persists through a warm reboot,
>
> And does this get in the way of Dom0 using the device? (Before a DomU
> gets to use it, things should be properly reset anyway.)

Dom0 doesn't have drivers for the device, so I am not sure.  I don't
seem to have the logs around, but I believe when MASKALL is set, the
initial quarantine of the device fails.  Yes, some notes I have
mention:

It's getting -EBUSY from pdev_msix_assign() which means
pci_reset_msix_state() is failing:
    if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
         PCI_MSIX_FLAGS_MASKALL )
        return -EBUSY;

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 13:43               ` Jason Andryuk
@ 2023-03-28 13:54                 ` Jan Beulich
  2023-03-28 15:08                   ` Jason Andryuk
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2023-03-28 13:54 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Marek Marczykowski-Górecki, Paul Durrant, xen-devel,
	Roger Pau Monné

On 28.03.2023 15:43, Jason Andryuk wrote:
> On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 28.03.2023 15:32, Jason Andryuk wrote:
>>> On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
>>>>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
>>>>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
>>>>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
>>>>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
>>>>>>>> MASKALL set. Xen relies on initial state being both disabled.
>>>>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
>>>>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
>>>>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
>>>>>>>
>>>>>>> But pci_reset_msix_state() comes into play only when assigning a device
>>>>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
>>>>>>> bit, how would it be cleared the next time round (i.e. after the guest
>>>>>>> stopped and then possibly was started again)? It feels like the issue
>>>>>>> wants dealing with elsewhere, possibly in the tool stack.
>>>>>>
>>>>>> I may be misremembering some details, but AFAIR Xen intercepts
>>>>>> toolstack's (or more generally: accesses from dom0) attempt to clean
>>>>>> this up and once it enters an inconsistent state (or rather: starts with
>>>>>> such at the start of the day), there was no way to clean it up.
>>>>>
>>>>> Iirc Roger and you already discussed that there needs to be an
>>>>> indication of device reset having happened, so that Xen can resync
>>>>> from this "behind its back" operation. That would look to be the
>>>>> point/place where such inconsistencies should be eliminated.
>>>>
>>>> I think that was a different conversation with Huang Rui related to
>>>> the AMD GPU work, see:
>>>>
>>>> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
>>>>
>>>> I understood the problem Marek was trying to solve was that some
>>>> devices where initialized with the MASKALL bit set (likely by the
>>>> firmware?) and that prevented Xen from using them.  But now seeing the
>>>> further replies on this patch I'm unsure whether that's the case.
>>>
>>> In my case, Xen's setting of MASKALL persists through a warm reboot,
>>
>> And does this get in the way of Dom0 using the device? (Before a DomU
>> gets to use it, things should be properly reset anyway.)
> 
> Dom0 doesn't have drivers for the device, so I am not sure.  I don't
> seem to have the logs around, but I believe when MASKALL is set, the
> initial quarantine of the device fails.  Yes, some notes I have
> mention:
> 
> It's getting -EBUSY from pdev_msix_assign() which means
> pci_reset_msix_state() is failing:
>     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
>          PCI_MSIX_FLAGS_MASKALL )
>         return -EBUSY;

Arguably this check may want skipping when moving to quarantine. I'd
still be curious to know whether the device works in Dom0, and
confirmation of device reset's effect on the bit would also be helpful.

Jan


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 13:54                 ` Jan Beulich
@ 2023-03-28 15:08                   ` Jason Andryuk
  2023-03-28 15:39                     ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Andryuk @ 2023-03-28 15:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Marek Marczykowski-Górecki, Paul Durrant, xen-devel,
	Roger Pau Monné

On Tue, Mar 28, 2023 at 9:54 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 28.03.2023 15:43, Jason Andryuk wrote:
> > On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 28.03.2023 15:32, Jason Andryuk wrote:
> >>> On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> >>>>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> >>>>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >>>>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>>>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>>>>>>> MASKALL set. Xen relies on initial state being both disabled.
> >>>>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> >>>>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> >>>>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >>>>>>>
> >>>>>>> But pci_reset_msix_state() comes into play only when assigning a device
> >>>>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >>>>>>> bit, how would it be cleared the next time round (i.e. after the guest
> >>>>>>> stopped and then possibly was started again)? It feels like the issue
> >>>>>>> wants dealing with elsewhere, possibly in the tool stack.
> >>>>>>
> >>>>>> I may be misremembering some details, but AFAIR Xen intercepts
> >>>>>> toolstack's (or more generally: accesses from dom0) attempt to clean
> >>>>>> this up and once it enters an inconsistent state (or rather: starts with
> >>>>>> such at the start of the day), there was no way to clean it up.
> >>>>>
> >>>>> Iirc Roger and you already discussed that there needs to be an
> >>>>> indication of device reset having happened, so that Xen can resync
> >>>>> from this "behind its back" operation. That would look to be the
> >>>>> point/place where such inconsistencies should be eliminated.
> >>>>
> >>>> I think that was a different conversation with Huang Rui related to
> >>>> the AMD GPU work, see:
> >>>>
> >>>> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
> >>>>
> >>>> I understood the problem Marek was trying to solve was that some
> >>>> devices where initialized with the MASKALL bit set (likely by the
> >>>> firmware?) and that prevented Xen from using them.  But now seeing the
> >>>> further replies on this patch I'm unsure whether that's the case.
> >>>
> >>> In my case, Xen's setting of MASKALL persists through a warm reboot,
> >>
> >> And does this get in the way of Dom0 using the device? (Before a DomU
> >> gets to use it, things should be properly reset anyway.)
> >
> > Dom0 doesn't have drivers for the device, so I am not sure.  I don't
> > seem to have the logs around, but I believe when MASKALL is set, the
> > initial quarantine of the device fails.  Yes, some notes I have
> > mention:
> >
> > It's getting -EBUSY from pdev_msix_assign() which means
> > pci_reset_msix_state() is failing:
> >     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
> >          PCI_MSIX_FLAGS_MASKALL )
> >         return -EBUSY;
>
> Arguably this check may want skipping when moving to quarantine. I'd
> still be curious to know whether the device works in Dom0, and
> confirmation of device reset's effect on the bit would also be helpful.

echo 1 > /sys/.../reset does not clear MASKALL.

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 15:08                   ` Jason Andryuk
@ 2023-03-28 15:39                     ` Jan Beulich
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Beulich @ 2023-03-28 15:39 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Marek Marczykowski-Górecki, Paul Durrant, xen-devel,
	Roger Pau Monné

On 28.03.2023 17:08, Jason Andryuk wrote:
> On Tue, Mar 28, 2023 at 9:54 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 28.03.2023 15:43, Jason Andryuk wrote:
>>> On Tue, Mar 28, 2023 at 9:35 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 28.03.2023 15:32, Jason Andryuk wrote:
>>>>> On Tue, Mar 28, 2023 at 9:28 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>>> On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
>>>>>>> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
>>>>>>>> On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
>>>>>>>>> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
>>>>>>>>>> Some firmware/devices are found to not reset MSI-X properly, leaving
>>>>>>>>>> MASKALL set. Xen relies on initial state being both disabled.
>>>>>>>>>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
>>>>>>>>>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
>>>>>>>>>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
>>>>>>>>>
>>>>>>>>> But pci_reset_msix_state() comes into play only when assigning a device
>>>>>>>>> to a DomU. If the tool stack doing a reset doesn't properly clear the
>>>>>>>>> bit, how would it be cleared the next time round (i.e. after the guest
>>>>>>>>> stopped and then possibly was started again)? It feels like the issue
>>>>>>>>> wants dealing with elsewhere, possibly in the tool stack.
>>>>>>>>
>>>>>>>> I may be misremembering some details, but AFAIR Xen intercepts
>>>>>>>> toolstack's (or more generally: accesses from dom0) attempt to clean
>>>>>>>> this up and once it enters an inconsistent state (or rather: starts with
>>>>>>>> such at the start of the day), there was no way to clean it up.
>>>>>>>
>>>>>>> Iirc Roger and you already discussed that there needs to be an
>>>>>>> indication of device reset having happened, so that Xen can resync
>>>>>>> from this "behind its back" operation. That would look to be the
>>>>>>> point/place where such inconsistencies should be eliminated.
>>>>>>
>>>>>> I think that was a different conversation with Huang Rui related to
>>>>>> the AMD GPU work, see:
>>>>>>
>>>>>> https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/
>>>>>>
>>>>>> I understood the problem Marek was trying to solve was that some
>>>>>> devices where initialized with the MASKALL bit set (likely by the
>>>>>> firmware?) and that prevented Xen from using them.  But now seeing the
>>>>>> further replies on this patch I'm unsure whether that's the case.
>>>>>
>>>>> In my case, Xen's setting of MASKALL persists through a warm reboot,
>>>>
>>>> And does this get in the way of Dom0 using the device? (Before a DomU
>>>> gets to use it, things should be properly reset anyway.)
>>>
>>> Dom0 doesn't have drivers for the device, so I am not sure.  I don't
>>> seem to have the logs around, but I believe when MASKALL is set, the
>>> initial quarantine of the device fails.  Yes, some notes I have
>>> mention:
>>>
>>> It's getting -EBUSY from pdev_msix_assign() which means
>>> pci_reset_msix_state() is failing:
>>>     if ( pci_conf_read16(pdev->sbdf, msix_control_reg(pos)) &
>>>          PCI_MSIX_FLAGS_MASKALL )
>>>         return -EBUSY;
>>
>> Arguably this check may want skipping when moving to quarantine. I'd
>> still be curious to know whether the device works in Dom0, and
>> confirmation of device reset's effect on the bit would also be helpful.
> 
> echo 1 > /sys/.../reset does not clear MASKALL.

Well, I think that - as proposed elsewhere - we need the kernel to tell
us about resets, and then we could clear the bit from there. (I didn't
check whether the spec permits the bit to remain set, or whether this
is a device erratum.)

Jan


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot
  2023-03-28 13:17       ` Jason Andryuk
@ 2023-03-28 17:38         ` Jason Andryuk
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Andryuk @ 2023-03-28 17:38 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Jan Beulich, Paul Durrant, Roger Pau Monné, xen-devel

On Tue, Mar 28, 2023 at 9:17 AM Jason Andryuk <jandryuk@gmail.com> wrote:
>
> On Tue, Mar 28, 2023 at 9:04 AM Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com> wrote:
> >
> > On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> > > On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> > > > Some firmware/devices are found to not reset MSI-X properly, leaving
> > > > MASKALL set. Xen relies on initial state being both disabled.
> > > > Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> > > > setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> > > > just MASKALL might be unsafe if ENABLE is set, so clear them both.
> > >
> > > But pci_reset_msix_state() comes into play only when assigning a device
> > > to a DomU. If the tool stack doing a reset doesn't properly clear the
> > > bit, how would it be cleared the next time round (i.e. after the guest
> > > stopped and then possibly was started again)? It feels like the issue
> > > wants dealing with elsewhere, possibly in the tool stack.
> >
> > I may be misremembering some details, but AFAIR Xen intercepts
> > toolstack's (or more generally: accesses from dom0) attempt to clean
> > this up and once it enters an inconsistent state (or rather: starts with
> > such at the start of the day), there was no way to clean it up.
> >
> > I have considered changing pci_reset_msix_state() to not choke on
> > MASKALL being set, but I'm a bit afraid of doing it, as there it seems
> > there is a lot of assumptions all over the place and I may miss some.
>
> Hi, Marek and Jan,
>
> Marek, thank you for working on MSI-X support.
>
> As Jan says, the clearing here works during system boot.  However, I
> have found that Xen itself is setting MASKALL in __pci_disable_msix()
> when shutting down a domU.  When that is called, memory_decoded(dev)
> returns false, and Xen prints "cannot disable IRQ 137: masking MSI-X
> on 0000:00:14.3".  That makes the device unavailable for subsequent
> domU assignment.  I haven't investigated where and why memory decoding
> gets disabled for the device.
>
> Testing was with this v2 patchset integrated into OpenXT w/ Xen 4.16.
> We have some device reset changes, so I'll have to look at them again.
> Hmmm, they move the libxl device reseting from pci_remove_detached()
> to libxl__destroy_domid() to ensure all devices are de-assign after
> the domain is destroyed.  A kernel patch implements a "more thorough
> reset" which could do a slot or bus level reset, and the desire is to
> have all devices deassigned before that.  Maybe the shift later is
> throwing off Xen's expectations?

I dropped the OpenXT libxl patch, and Xen is not setting MASKALL.

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-03-28 13:03           ` Jan Beulich
  2023-03-28 13:22             ` Roger Pau Monné
@ 2023-04-03  4:21             ` Marek Marczykowski-Górecki
  2023-04-03 11:09               ` Jan Beulich
  1 sibling, 1 reply; 37+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-04-03  4:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Jason Andryuk, Andrew Cooper, Wei Liu, Roger Pau Monné

On Tue, Mar 28, 2023 at 03:03:17PM +0200, Jan Beulich wrote:
> On 28.03.2023 14:52, Marek Marczykowski-Górecki wrote:
> > On Tue, Mar 28, 2023 at 02:34:23PM +0200, Jan Beulich wrote:
> >> On 28.03.2023 14:05, Marek Marczykowski-Górecki wrote:
> >>> On Tue, Mar 28, 2023 at 01:28:44PM +0200, Roger Pau Monné wrote:
> >>>> On Sat, Mar 25, 2023 at 03:49:23AM +0100, Marek Marczykowski-Górecki wrote:
> >>>>> +static bool cf_check msixtbl_page_accept(
> >>>>> +        const struct hvm_io_handler *handler, const ioreq_t *r)
> >>>>> +{
> >>>>> +    ASSERT(r->type == IOREQ_TYPE_COPY);
> >>>>> +
> >>>>> +    return msixtbl_page_handler_get_hwaddr(
> >>>>> +            current->domain, r->addr, r->dir == IOREQ_WRITE);
> >>>>
> >>>> I think you want to accept it also if it's a write to the PBA, and
> >>>> just drop it.  You should always pass write=false and then drop it in
> >>>> msixtbl_page_write() if it falls in the PBA region (but still return
> >>>> X86EMUL_OKAY).
> >>>
> >>> I don't want to interfere with msixtbl_mmio_page_ops, this handler is
> >>> only about accesses not hitting actual MSI-X structures.
> >>
> >> In his functionally similar vPCI change I did ask Roger to handle the
> >> "extra" space right from the same handlers. Maybe that's going to be
> >> best here, too.
> > 
> > I have considered this option, but msixtbl_range() is already quite
> > complex, adding yet another case there won't make it easier to follow.
> 
> Do you care about the case of msixtbl_addr_to_desc() returning NULL at
> all for the purpose you have? 

IIUC I care specifically about this case.

> Like in Roger's patch I'd assume
> msixtbl_find_entry() needs extending what ranges it accepts; if need
> be another parameter may be added to cover cases where the extended
> coverage isn't wanted.
>
> > I mean, technically I can probably merge those two handlers together,
> > but I don't think it will result in nicer code. Especially since the
> > general direction is to abandon split of MSI-X table access handling
> > between Xen and QEMU and go with just QEMU doing it, hopefully at some
> > point not needing msixtbl_mmio_ops anymore (but still needing the one
> > for adjacent accesses).
> 
> Hmm, at this point I'm not convinced of this plan. Instead I was hoping
> that once vPCI properly supports PVH DomU-s, we may also be able to make
> use of it for HVM, delegating less to qemu rather than more.

In that case, this code won't be needed anymore, which will also make
this handler unnecessary.

Anyway, I tried to merge this handling into the existing handlers and
the resulting patch is slightly bigger, so it doesn't seem to avoid any
duplication. The only benefit I can think of is avoiding iterating over
msixtbl_list twice (once for each accept callback) on each access. Is
it worth making the handlers a bit more complicated?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table
  2023-04-03  4:21             ` Marek Marczykowski-Górecki
@ 2023-04-03 11:09               ` Jan Beulich
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Beulich @ 2023-04-03 11:09 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, Jason Andryuk, Andrew Cooper, Wei Liu, Roger Pau Monné

On 03.04.2023 06:21, Marek Marczykowski-Górecki wrote:
> On Tue, Mar 28, 2023 at 03:03:17PM +0200, Jan Beulich wrote:
>> On 28.03.2023 14:52, Marek Marczykowski-Górecki wrote:
>>> I mean, technically I can probably merge those two handlers together,
>>> but I don't think it will result in nicer code. Especially since the
>>> general direction is to abandon split of MSI-X table access handling
>>> between Xen and QEMU and go with just QEMU doing it, hopefully at some
>>> point not needing msixtbl_mmio_ops anymore (but still needing the one
>>> for adjacent accesses).
>>
>> Hmm, at this point I'm not convinced of this plan. Instead I was hoping
>> that once vPCI properly supports PVH DomU-s, we may also be able to make
>> use of it for HVM, delegating less to qemu rather than more.
> 
> In that case, this code won't be needed anymore, which will also make
> this handler unnecessary.
> 
> Anyway, I tried to merge this handling into existing handlers and the
> resulting patch is slightly bigger, so it doesn't seem to avoid any
> duplication. The only benefit I can think of is avoiding iterating
> msixtbl_list twice (for respective accept callbacks) on each access. Is
> it worth a bit more complicated handlers?

Well, limiting duplication (if possible) is only one aspect. Again
referring to Roger's functionally similar vPCI work, an important (from
my pov) aspect is to avoid consuming another handler slot (via a new
call to hvm_next_io_handler()).
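
For context, each such handler is registered along these lines (a sketch
modelled on the existing msixtbl_init(), from memory), and
hvm_next_io_handler() hands out slots from a small fixed-size per-domain
array:

    void msixtbl_init(struct domain *d)
    {
        struct hvm_io_handler *handler;

        ...
        handler = hvm_next_io_handler(d);   /* consumes one of the fixed slots */
        if ( handler )
        {
            handler->type = IOREQ_TYPE_COPY;
            handler->ops = &msixtbl_mmio_ops;
        }
    }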

Jan


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2023-04-03 11:09 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-25  2:49 [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Marek Marczykowski-Górecki
2023-03-25  2:49 ` [PATCH v2 2/3] x86/hvm: Allow writes to registers on the same page as MSI-X table Marek Marczykowski-Górecki
2023-03-27 10:47   ` Andrew Cooper
2023-03-28 11:28   ` Roger Pau Monné
2023-03-28 12:05     ` Marek Marczykowski-Górecki
2023-03-28 12:34       ` Jan Beulich
2023-03-28 12:52         ` Marek Marczykowski-Górecki
2023-03-28 13:03           ` Jan Beulich
2023-03-28 13:22             ` Roger Pau Monné
2023-04-03  4:21             ` Marek Marczykowski-Górecki
2023-04-03 11:09               ` Jan Beulich
2023-03-28 12:52       ` Roger Pau Monné
2023-03-25  2:49 ` [PATCH v2 3/3] x86/msi: clear initial MSI-X state on boot Marek Marczykowski-Górecki
2023-03-28 11:37   ` Roger Pau Monné
2023-03-28 12:07     ` Marek Marczykowski-Górecki
2023-03-28 12:38       ` Jan Beulich
2023-03-28 12:54   ` Jan Beulich
2023-03-28 13:04     ` Marek Marczykowski-Górecki
2023-03-28 13:17       ` Jason Andryuk
2023-03-28 17:38         ` Jason Andryuk
2023-03-28 13:23       ` Jan Beulich
2023-03-28 13:27         ` Roger Pau Monné
2023-03-28 13:32           ` Jason Andryuk
2023-03-28 13:35             ` Jan Beulich
2023-03-28 13:43               ` Jason Andryuk
2023-03-28 13:54                 ` Jan Beulich
2023-03-28 15:08                   ` Jason Andryuk
2023-03-28 15:39                     ` Jan Beulich
2023-03-27 10:12 ` [PATCH v2 1/3] x86/msi: passthrough all MSI-X vector ctrl writes to device model Roger Pau Monné
2023-03-27 10:26   ` Marek Marczykowski-Górecki
2023-03-27 10:51     ` Roger Pau Monné
2023-03-27 11:34       ` Marek Marczykowski-Górecki
2023-03-27 13:29         ` Roger Pau Monné
2023-03-27 14:20           ` Marek Marczykowski-Górecki
2023-03-27 14:37             ` Roger Pau Monné
2023-03-27 15:32   ` Jan Beulich
2023-03-27 15:42     ` Roger Pau Monné
