All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot
@ 2018-12-20 15:29 Chao Gao
  2018-12-20 15:29 ` [PATCH v4 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Chao Gao @ 2018-12-20 15:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monné, Jan Beulich, Chao Gao

I find some pass-thru devices don't work any more across guest
reboot. Assigning it to another domain also meets the same issue. And
the only way to make it work again is un-binding and binding it to
pciback. Someone reported this issue one year ago [1].

If the device's driver doesn't disable MSI-X during shutdown or qemu is
killed/crashed before the domain shutdown, this domain's pirq won't be
unmapped. Then xen takes over this work, unmapping all pirq-s, when
destroying guest. But as pciback has already disabled meory decoding before
xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
this process is:

->arch_domain_destroy
    ->free_domain_pirqs
        ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
            ->pirq_guest_force_unbind
                ->__pirq_guest_unbind
                    ->mask_msi_irq(=desc->handler->disable())
                        ->the warning in msi_set_mask_bit()

The host_maskall bit will prevent guests from clearing the maskall bit
even the device is assigned to another guest later. Then guests cannot
receive MSIs from this device.

To fix this issue, a pirq is unmapped before memory decoding is disabled by
pciback. Specifically, when a device is detached from a guest, all established
mappings between pirq and msi are destroying before changing the ownership.

[1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Changes in v4:
 - split out change to 'msix->warned' field
 - handle multiple msi cases
 - use list_first_entry_or_null to traverse 'pdev->msi_list'
---
 xen/drivers/passthrough/io.c  | 57 +++++++++++++++++++++++++++++--------------
 xen/drivers/passthrough/pci.c | 51 ++++++++++++++++++++++++++++++++++++++
 xen/include/xen/iommu.h       |  1 +
 3 files changed, 91 insertions(+), 18 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index a6eb8a4..56ee1ef 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -619,6 +619,42 @@ int pt_irq_create_bind(
     return 0;
 }
 
+static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
+         list_empty(&pirq_dpci->digl_list) )
+    {
+        pirq_guest_unbind(d, pirq);
+        msixtbl_pt_unregister(d, pirq);
+        if ( pt_irq_need_timer(pirq_dpci->flags) )
+            kill_timer(&pirq_dpci->timer);
+        pirq_dpci->flags = 0;
+        /*
+         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
+         * call to pt_pirq_softirq_reset.
+         */
+        pt_pirq_softirq_reset(pirq_dpci);
+
+        pirq_cleanup_check(pirq, d);
+    }
+}
+
+void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( pirq_dpci && pirq_dpci->gmsi.posted )
+        pi_update_irte(NULL, pirq, 0);
+
+    pt_irq_destroy_bind_common(d, pirq);
+}
+
 int pt_irq_destroy_bind(
     struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
 {
@@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
         }
         else
             what = "bogus";
-    }
-    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
-        pi_update_irte(NULL, pirq, 0);
-
-    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
-         list_empty(&pirq_dpci->digl_list) )
-    {
-        pirq_guest_unbind(d, pirq);
-        msixtbl_pt_unregister(d, pirq);
-        if ( pt_irq_need_timer(pirq_dpci->flags) )
-            kill_timer(&pirq_dpci->timer);
-        pirq_dpci->flags = 0;
-        /*
-         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
-         * call to pt_pirq_softirq_reset.
-         */
-        pt_pirq_softirq_reset(pirq_dpci);
 
-        pirq_cleanup_check(pirq, d);
+        pt_irq_destroy_bind_common(d, pirq);
     }
+    else
+        pt_irq_destroy_bind_msi(d, pirq);
 
     spin_unlock(&d->event_lock);
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 1277ce2..ccae0ec 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1514,6 +1514,55 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     return rc;
 }
 
+/*
+ * Unmap established mappings between domain's pirq and device's MSI.
+ * These mappings were set up by qemu/guest and are expected to be
+ * destroyed when changing the device's ownership.
+ */
+static void pci_unmap_msi(struct pci_dev *pdev)
+{
+    struct msi_desc *entry;
+    struct domain *d = pdev->domain;
+
+    ASSERT(pcidevs_locked());
+
+    if ( !d )
+        return;
+
+    spin_lock(&d->event_lock);
+    while ( (entry = list_first_entry_or_null(&pdev->msi_list,
+                                              struct msi_desc, list)) != NULL )
+    {
+        struct pirq *info;
+        int pirq = 0;
+        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
+                          ? entry->msi.nvec : 1;
+
+        while ( nr -- )
+        {
+            struct hvm_pirq_dpci *pirq_dpci;
+
+            pirq = domain_irq_to_pirq(d, entry[nr].irq);
+            WARN_ON(pirq < 0);
+            if ( pirq <= 0 )
+                continue;
+
+            info = pirq_info(d, pirq);
+            if ( !info )
+                continue;
+
+            pirq_dpci = pirq_dpci(info);
+            if ( pirq_dpci &&
+                 (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
+                 (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
+                pt_irq_destroy_bind_msi(d, info);
+        }
+        if ( pirq > 0 )
+            unmap_domain_pirq(d, pirq);
+    }
+    spin_unlock(&d->event_lock);
+}
+
 /* caller should hold the pcidevs_lock */
 int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 {
@@ -1529,6 +1578,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
     if ( !pdev )
         return -ENODEV;
 
+    pci_unmap_msi(pdev);
+
     while ( pdev->phantom_stride )
     {
         devfn += pdev->phantom_stride;
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 3d78126..8aecf43 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -108,6 +108,7 @@ struct pirq;
 int hvm_do_IRQ_dpci(struct domain *, struct pirq *);
 int pt_irq_create_bind(struct domain *, const struct xen_domctl_bind_pt_irq *);
 int pt_irq_destroy_bind(struct domain *, const struct xen_domctl_bind_pt_irq *);
+void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq);
 
 void hvm_dpci_isairq_eoi(struct domain *d, unsigned int isairq);
 struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 2/3] libxl: don't reset device when it is accessible by the guest
  2018-12-20 15:29 [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
@ 2018-12-20 15:29 ` Chao Gao
  2019-01-02 11:57   ` Wei Liu
  2018-12-20 15:29 ` [PATCH v4 3/3] xen/pt: initialize 'warned' field of arch_msix properly Chao Gao
  2018-12-21 10:13 ` [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot Jan Beulich
  2 siblings, 1 reply; 12+ messages in thread
From: Chao Gao @ 2018-12-20 15:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich, Chao Gao,
	Roger Pau Monné

When I destroyed a guest with 'xl destroy', I found the warning
in msi_set_mask_bit() in Xen was triggered. After adding "WARN_ON(1)"
to that place, I got the call trace below:

(XEN) Xen call trace:
(XEN)    [<ffff82d080281a6a>] msi.c#msi_set_mask_bit+0x1da/0x29b
(XEN)    [<ffff82d080282e78>] guest_mask_msi_irq+0x1c/0x1e
(XEN)    [<ffff82d08030ceb9>] vmsi.c#msixtbl_write+0x173/0x1d4
(XEN)    [<ffff82d08030cf30>] vmsi.c#_msixtbl_write+0x16/0x18
(XEN)    [<ffff82d0802ffac4>] hvm_process_io_intercept+0x216/0x270
(XEN)    [<ffff82d0802ffb45>] hvm_io_intercept+0x27/0x4c
(XEN)    [<ffff82d0802f0e86>] emulate.c#hvmemul_do_io+0x273/0x454
(XEN)    [<ffff82d0802f10a4>] emulate.c#hvmemul_do_io_buffer+0x3d/0x70
(XEN)    [<ffff82d0802f2343>] emulate.c#hvmemul_linear_mmio_access+0x35e/0x436
(XEN)    [<ffff82d0802f2640>] emulate.c#linear_write+0xdd/0x13b
(XEN)    [<ffff82d0802f3f25>] emulate.c#hvmemul_write+0xbd/0xf1
(XEN)    [<ffff82d0802d51df>] x86_emulate+0x2249d/0x23c5c
(XEN)    [<ffff82d0802d861f>] x86_emulate_wrapper+0x2b/0x5f
(XEN)    [<ffff82d0802f28aa>] emulate.c#_hvm_emulate_one+0x54/0x1b2
(XEN)    [<ffff82d0802f2a18>] hvm_emulate_one+0x10/0x12
(XEN)    [<ffff82d080300227>] hvm_emulate_one_insn+0x42/0x14a
(XEN)    [<ffff82d08030037e>] handle_mmio_with_translation+0x4f/0x51
(XEN)    [<ffff82d0802f803b>] hvm_hap_nested_page_fault+0x16c/0x6d8
(XEN)    [<ffff82d08032446a>] vmx_vmexit_handler+0x19b0/0x1f2e
(XEN)    [<ffff82d08032995a>] vmx_asm_vmexit_handler+0xfa/0x270

It seems to me that guest is trying to mask a msi while the memory decoding
of the device is disabled. Performing a device reset without proper method
to avoid guest's MSI-X operation would lead to this issue.

The fix is basic - detach pci device before resetting the device.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
 tools/libxl/libxl_pci.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 87afa03..855fb71 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1459,17 +1459,17 @@ skip1:
         fclose(f);
     }
 out:
-    /* don't do multiple resets while some functions are still passed through */
-    if ( (pcidev->vdevfn & 0x7) == 0 ) {
-        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
-    }
-
     if (!isstubdom) {
         rc = xc_deassign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
         if (rc < 0 && (hvm || errno != ENOSYS))
             LOGED(ERROR, domainid, "xc_deassign_device failed");
     }
 
+    /* don't do multiple resets while some functions are still passed through */
+    if ( (pcidev->vdevfn & 0x7) == 0 ) {
+        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+    }
+
     stubdomid = libxl_get_stubdom_id(ctx, domid);
     if (stubdomid != 0) {
         libxl_device_pci pcidev_s = *pcidev;
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 3/3] xen/pt: initialize 'warned' field of arch_msix properly
  2018-12-20 15:29 [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
  2018-12-20 15:29 ` [PATCH v4 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
@ 2018-12-20 15:29 ` Chao Gao
  2018-12-21 10:17   ` Jan Beulich
  2018-12-21 10:13 ` [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot Jan Beulich
  2 siblings, 1 reply; 12+ messages in thread
From: Chao Gao @ 2018-12-20 15:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Jan Beulich, Chao Gao

Also clean up current code by moving initialization of arch specific
fields out of common code.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Changes in v4:
 - newly added
---
 xen/drivers/passthrough/pci.c | 2 +-
 xen/include/asm-x86/msi.h     | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index ccae0ec..dad8be6 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -367,7 +367,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
             xfree(pdev);
             return NULL;
         }
-        spin_lock_init(&msix->table_lock);
+        init_arch_msix(msix);
         pdev->msix = msix;
     }
 
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 10387dc..c2293ab 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -252,5 +252,10 @@ void guest_mask_msi_irq(struct irq_desc *, bool mask);
 void ack_nonmaskable_msi_irq(struct irq_desc *);
 void end_nonmaskable_msi_irq(struct irq_desc *, u8 vector);
 void set_msi_affinity(struct irq_desc *, const cpumask_t *);
+static inline void init_arch_msix(struct arch_msix *msix)
+{
+    spin_lock_init(&msix->table_lock);
+    msix->warned = DOMID_INVALID;
+}
 
 #endif /* __ASM_MSI_H */
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-20 15:29 [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
  2018-12-20 15:29 ` [PATCH v4 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
  2018-12-20 15:29 ` [PATCH v4 3/3] xen/pt: initialize 'warned' field of arch_msix properly Chao Gao
@ 2018-12-21 10:13 ` Jan Beulich
  2018-12-21 10:26   ` Roger Pau Monné
  2018-12-21 15:21   ` Chao Gao
  2 siblings, 2 replies; 12+ messages in thread
From: Jan Beulich @ 2018-12-21 10:13 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Roger Pau Monne

>>> On 20.12.18 at 16:29, <chao.gao@intel.com> wrote:
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1514,6 +1514,55 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>      return rc;
>  }
>  
> +/*
> + * Unmap established mappings between domain's pirq and device's MSI.
> + * These mappings were set up by qemu/guest and are expected to be
> + * destroyed when changing the device's ownership.
> + */
> +static void pci_unmap_msi(struct pci_dev *pdev)
> +{
> +    struct msi_desc *entry;
> +    struct domain *d = pdev->domain;
> +
> +    ASSERT(pcidevs_locked());
> +
> +    if ( !d )
> +        return;

Why? deassign_device() (the only caller) ought to guarantee this,
due to its use of pci_get_pdev_by_domain(). I think this simply
wants to be another ASSERT(), if anything at all.

> +    spin_lock(&d->event_lock);
> +    while ( (entry = list_first_entry_or_null(&pdev->msi_list,
> +                                              struct msi_desc, list)) != NULL )
> +    {
> +        struct pirq *info;
> +        int pirq = 0;
> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
> +                          ? entry->msi.nvec : 1;
> +
> +        while ( nr -- )

Stray blank.

> +        {
> +            struct hvm_pirq_dpci *pirq_dpci;
> +
> +            pirq = domain_irq_to_pirq(d, entry[nr].irq);
> +            WARN_ON(pirq < 0);
> +            if ( pirq <= 0 )
> +                continue;
> +
> +            info = pirq_info(d, pirq);
> +            if ( !info )
> +                continue;
> +
> +            pirq_dpci = pirq_dpci(info);
> +            if ( pirq_dpci &&
> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
> +                pt_irq_destroy_bind_msi(d, info);
> +        }
> +        if ( pirq > 0 )
> +            unmap_domain_pirq(d, pirq);

Can you guarantee that this function won't fail? Because if it
does, I think you might end up in an infinite loop, because the
entry doesn't always get removed from the list in error cases.
Maybe unmap_domain_pirq() needs a "force" mode added,
perhaps indirectly by way of passing "entry" into it (all other
callers would pass NULL).

But then again I'm still not fully convinced that a hypervisor
change is the right course of action here in the first place. It
would be better if the hypervisor had to just verify that all
IRQ mappings are gone, or else fail the de-assignment of the
device.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/3] xen/pt: initialize 'warned' field of arch_msix properly
  2018-12-20 15:29 ` [PATCH v4 3/3] xen/pt: initialize 'warned' field of arch_msix properly Chao Gao
@ 2018-12-21 10:17   ` Jan Beulich
  2018-12-21 10:37     ` Roger Pau Monné
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2018-12-21 10:17 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne

>>> On 20.12.18 at 16:29, <chao.gao@intel.com> wrote:
> --- a/xen/include/asm-x86/msi.h
> +++ b/xen/include/asm-x86/msi.h
> @@ -252,5 +252,10 @@ void guest_mask_msi_irq(struct irq_desc *, bool mask);
>  void ack_nonmaskable_msi_irq(struct irq_desc *);
>  void end_nonmaskable_msi_irq(struct irq_desc *, u8 vector);
>  void set_msi_affinity(struct irq_desc *, const cpumask_t *);
> +static inline void init_arch_msix(struct arch_msix *msix)
> +{
> +    spin_lock_init(&msix->table_lock);
> +    msix->warned = DOMID_INVALID;
> +}

I think this would better sit next to the structure definition,
i.e. a few lines up. In any event a separating blank line
needs adding. With that
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Additionally perhaps arch_init_msix() would better fit our
general naming of arch-specific functions.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-21 10:13 ` [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot Jan Beulich
@ 2018-12-21 10:26   ` Roger Pau Monné
  2018-12-21 10:32     ` Jan Beulich
  2018-12-21 15:21   ` Chao Gao
  1 sibling, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2018-12-21 10:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Chao Gao

On Fri, Dec 21, 2018 at 03:13:50AM -0700, Jan Beulich wrote:
> But then again I'm still not fully convinced that a hypervisor
> change is the right course of action here in the first place. It
> would be better if the hypervisor had to just verify that all
> IRQ mappings are gone, or else fail the de-assignment of the
> device.

The only component (except the hypervisor) that knows about such
assignments is QEMU, and in the case of a QEMU crash the host would be
left with a device that cannot be de-assigned, because the information
about the PIRQ bindings in lost due to the QEMU crash.

IMO Xen needs to be capable of cleaning any bindings and mappings done
by the toolstack or the device model in order to be able to correctly
recover from a device model or toolstack crash.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-21 10:26   ` Roger Pau Monné
@ 2018-12-21 10:32     ` Jan Beulich
  2018-12-21 11:32       ` Roger Pau Monné
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2018-12-21 10:32 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Chao Gao

>>> On 21.12.18 at 11:26, <roger.pau@citrix.com> wrote:
> On Fri, Dec 21, 2018 at 03:13:50AM -0700, Jan Beulich wrote:
>> But then again I'm still not fully convinced that a hypervisor
>> change is the right course of action here in the first place. It
>> would be better if the hypervisor had to just verify that all
>> IRQ mappings are gone, or else fail the de-assignment of the
>> device.
> 
> The only component (except the hypervisor) that knows about such
> assignments is QEMU, and in the case of a QEMU crash the host would be
> left with a device that cannot be de-assigned, because the information
> about the PIRQ bindings in lost due to the QEMU crash.
> 
> IMO Xen needs to be capable of cleaning any bindings and mappings done
> by the toolstack or the device model in order to be able to correctly
> recover from a device model or toolstack crash.

But possibly with tool stack assistance: Rather than doing it (in a
potentially fragile way, as per my other comments) as an integral
part of deassign-device, it could be a separate domctl to be
issued first. Or otherwise failure here ought to lead to failure of
deassign-device, rather than (e.g.) an infinite loop.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/3] xen/pt: initialize 'warned' field of arch_msix properly
  2018-12-21 10:17   ` Jan Beulich
@ 2018-12-21 10:37     ` Roger Pau Monné
  0 siblings, 0 replies; 12+ messages in thread
From: Roger Pau Monné @ 2018-12-21 10:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, Chao Gao

On Fri, Dec 21, 2018 at 03:17:30AM -0700, Jan Beulich wrote:
> >>> On 20.12.18 at 16:29, <chao.gao@intel.com> wrote:
> > --- a/xen/include/asm-x86/msi.h
> > +++ b/xen/include/asm-x86/msi.h
> > @@ -252,5 +252,10 @@ void guest_mask_msi_irq(struct irq_desc *, bool mask);
> >  void ack_nonmaskable_msi_irq(struct irq_desc *);
> >  void end_nonmaskable_msi_irq(struct irq_desc *, u8 vector);
> >  void set_msi_affinity(struct irq_desc *, const cpumask_t *);
> > +static inline void init_arch_msix(struct arch_msix *msix)
> > +{
> > +    spin_lock_init(&msix->table_lock);
> > +    msix->warned = DOMID_INVALID;
> > +}
> 
> I think this would better sit next to the structure definition,
> i.e. a few lines up. In any event a separating blank line
> needs adding. With that
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Additionally perhaps arch_init_msix() would better fit our
> general naming of arch-specific functions.

With Jan's comments addresses you can also add my

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-21 10:32     ` Jan Beulich
@ 2018-12-21 11:32       ` Roger Pau Monné
  0 siblings, 0 replies; 12+ messages in thread
From: Roger Pau Monné @ 2018-12-21 11:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Chao Gao

On Fri, Dec 21, 2018 at 03:32:47AM -0700, Jan Beulich wrote:
> >>> On 21.12.18 at 11:26, <roger.pau@citrix.com> wrote:
> > On Fri, Dec 21, 2018 at 03:13:50AM -0700, Jan Beulich wrote:
> >> But then again I'm still not fully convinced that a hypervisor
> >> change is the right course of action here in the first place. It
> >> would be better if the hypervisor had to just verify that all
> >> IRQ mappings are gone, or else fail the de-assignment of the
> >> device.
> > 
> > The only component (except the hypervisor) that knows about such
> > assignments is QEMU, and in the case of a QEMU crash the host would be
> > left with a device that cannot be de-assigned, because the information
> > about the PIRQ bindings in lost due to the QEMU crash.
> > 
> > IMO Xen needs to be capable of cleaning any bindings and mappings done
> > by the toolstack or the device model in order to be able to correctly
> > recover from a device model or toolstack crash.
> 
> But possibly with tool stack assistance: Rather than doing it (in a
> potentially fragile way, as per my other comments) as an integral
> part of deassign-device, it could be a separate domctl to be
> issued first.

I don't have a strong opinion whether a new hypercall would be better
than just hooking this logic into the current deassign hypercall, as
long as it's robust.

> Or otherwise failure here ought to lead to failure of
> deassign-device, rather than (e.g.) an infinite loop.

Yes, I fully agree it needs to be robust, ideally the
unbinding/unmapping should always succeed, or at least don't get stuck
into an infinite loop.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-21 10:13 ` [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot Jan Beulich
  2018-12-21 10:26   ` Roger Pau Monné
@ 2018-12-21 15:21   ` Chao Gao
  2019-01-04  9:15     ` Jan Beulich
  1 sibling, 1 reply; 12+ messages in thread
From: Chao Gao @ 2018-12-21 15:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Roger Pau Monne

On Fri, Dec 21, 2018 at 03:13:50AM -0700, Jan Beulich wrote:
>>>> On 20.12.18 at 16:29, <chao.gao@intel.com> wrote:
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -1514,6 +1514,55 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>      return rc;
>>  }
>>  
>> +/*
>> + * Unmap established mappings between domain's pirq and device's MSI.
>> + * These mappings were set up by qemu/guest and are expected to be
>> + * destroyed when changing the device's ownership.
>> + */
>> +static void pci_unmap_msi(struct pci_dev *pdev)
>> +{
>> +    struct msi_desc *entry;
>> +    struct domain *d = pdev->domain;
>> +
>> +    ASSERT(pcidevs_locked());
>> +
>> +    if ( !d )
>> +        return;
>
>Why? deassign_device() (the only caller) ought to guarantee this,
>due to its use of pci_get_pdev_by_domain(). I think this simply
>wants to be another ASSERT(), if anything at all.
>
>> +    spin_lock(&d->event_lock);
>> +    while ( (entry = list_first_entry_or_null(&pdev->msi_list,
>> +                                              struct msi_desc, list)) != NULL )
>> +    {
>> +        struct pirq *info;
>> +        int pirq = 0;
>> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
>> +                          ? entry->msi.nvec : 1;
>> +
>> +        while ( nr -- )
>
>Stray blank.
>
>> +        {
>> +            struct hvm_pirq_dpci *pirq_dpci;
>> +
>> +            pirq = domain_irq_to_pirq(d, entry[nr].irq);
>> +            WARN_ON(pirq < 0);
>> +            if ( pirq <= 0 )
>> +                continue;
>> +
>> +            info = pirq_info(d, pirq);
>> +            if ( !info )
>> +                continue;
>> +
>> +            pirq_dpci = pirq_dpci(info);
>> +            if ( pirq_dpci &&
>> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
>> +                pt_irq_destroy_bind_msi(d, info);
>> +        }
>> +        if ( pirq > 0 )
>> +            unmap_domain_pirq(d, pirq);
>
>Can you guarantee that this function won't fail? Because if it
>does, I think you might end up in an infinite loop, because the

Considering current code doesn't deal with remaining pirq, if we
failed to unmap some pirq here (remove all entries from the msi_list
here), it wouldn't be a big issue. Hence the real issue here is a
potential infinite loop. Then we can just use
list_for_each_entry_safe(...) to traverse msi_list to avoid infinite
loop.

>entry doesn't always get removed from the list in error cases.
>Maybe unmap_domain_pirq() needs a "force" mode added,
>perhaps indirectly by way of passing "entry" into it (all other
>callers would pass NULL).

Yes, it is viable. However, for this call site, unmap_domain_pirq()
would fail to remove an entry only if xsm_unmap_domain_irq() in
unmap_domain_pirq() failed. Can we expect that xsm_unmap_domain_irq()
would always succeed there? If the answer is yes, what needed is
another assertion rather than the "force" mode. Maybe we can
forcibly remove all entries still on the list after the loop.
The benefit is we needn't change unmap_domain_pirq() and its
existing call sites.

>
>But then again I'm still not fully convinced that a hypervisor
>change is the right course of action here in the first place. It
>would be better if the hypervisor had to just verify that all
>IRQ mappings are gone, or else fail the de-assignment of the
>device.

Then in another place, we need the "force" mode. I don't think
it would bring great benefit if there is no other case (except device
hot-remove and guest shutdown) where we want to unmap all pirqs
related to a device.  

Thanks
Chao


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 2/3] libxl: don't reset device when it is accessible by the guest
  2018-12-20 15:29 ` [PATCH v4 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
@ 2019-01-02 11:57   ` Wei Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Wei Liu @ 2019-01-02 11:57 UTC (permalink / raw)
  To: Chao Gao
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich, xen-devel,
	Roger Pau Monné

On Thu, Dec 20, 2018 at 11:29:35PM +0800, Chao Gao wrote:
> When I destroyed a guest with 'xl destroy', I found the warning
> in msi_set_mask_bit() in Xen was triggered. After adding "WARN_ON(1)"
> to that place, I got the call trace below:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080281a6a>] msi.c#msi_set_mask_bit+0x1da/0x29b
> (XEN)    [<ffff82d080282e78>] guest_mask_msi_irq+0x1c/0x1e
> (XEN)    [<ffff82d08030ceb9>] vmsi.c#msixtbl_write+0x173/0x1d4
> (XEN)    [<ffff82d08030cf30>] vmsi.c#_msixtbl_write+0x16/0x18
> (XEN)    [<ffff82d0802ffac4>] hvm_process_io_intercept+0x216/0x270
> (XEN)    [<ffff82d0802ffb45>] hvm_io_intercept+0x27/0x4c
> (XEN)    [<ffff82d0802f0e86>] emulate.c#hvmemul_do_io+0x273/0x454
> (XEN)    [<ffff82d0802f10a4>] emulate.c#hvmemul_do_io_buffer+0x3d/0x70
> (XEN)    [<ffff82d0802f2343>] emulate.c#hvmemul_linear_mmio_access+0x35e/0x436
> (XEN)    [<ffff82d0802f2640>] emulate.c#linear_write+0xdd/0x13b
> (XEN)    [<ffff82d0802f3f25>] emulate.c#hvmemul_write+0xbd/0xf1
> (XEN)    [<ffff82d0802d51df>] x86_emulate+0x2249d/0x23c5c
> (XEN)    [<ffff82d0802d861f>] x86_emulate_wrapper+0x2b/0x5f
> (XEN)    [<ffff82d0802f28aa>] emulate.c#_hvm_emulate_one+0x54/0x1b2
> (XEN)    [<ffff82d0802f2a18>] hvm_emulate_one+0x10/0x12
> (XEN)    [<ffff82d080300227>] hvm_emulate_one_insn+0x42/0x14a
> (XEN)    [<ffff82d08030037e>] handle_mmio_with_translation+0x4f/0x51
> (XEN)    [<ffff82d0802f803b>] hvm_hap_nested_page_fault+0x16c/0x6d8
> (XEN)    [<ffff82d08032446a>] vmx_vmexit_handler+0x19b0/0x1f2e
> (XEN)    [<ffff82d08032995a>] vmx_asm_vmexit_handler+0xfa/0x270
> 
> It seems to me that guest is trying to mask a msi while the memory decoding
> of the device is disabled. Performing a device reset without proper method
> to avoid guest's MSI-X operation would lead to this issue.
> 
> The fix is basic - detach pci device before resetting the device.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

Since this one is in the middle of a series, I will leave committing it
to hypervisor comitters. In the meantime I will discard the previous
version in my queue.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-21 15:21   ` Chao Gao
@ 2019-01-04  9:15     ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2019-01-04  9:15 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Roger Pau Monne

>>> On 21.12.18 at 16:21, <chao.gao@intel.com> wrote:
> On Fri, Dec 21, 2018 at 03:13:50AM -0700, Jan Beulich wrote:
>>>>> On 20.12.18 at 16:29, <chao.gao@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/pci.c
>>> +++ b/xen/drivers/passthrough/pci.c
>>> @@ -1514,6 +1514,55 @@ static int assign_device(struct domain *d, u16 seg, u8 
> bus, u8 devfn, u32 flag)
>>>      return rc;
>>>  }
>>>  
>>> +/*
>>> + * Unmap established mappings between domain's pirq and device's MSI.
>>> + * These mappings were set up by qemu/guest and are expected to be
>>> + * destroyed when changing the device's ownership.
>>> + */
>>> +static void pci_unmap_msi(struct pci_dev *pdev)
>>> +{
>>> +    struct msi_desc *entry;
>>> +    struct domain *d = pdev->domain;
>>> +
>>> +    ASSERT(pcidevs_locked());
>>> +
>>> +    if ( !d )
>>> +        return;
>>
>>Why? deassign_device() (the only caller) ought to guarantee this,
>>due to its use of pci_get_pdev_by_domain(). I think this simply
>>wants to be another ASSERT(), if anything at all.
>>
>>> +    spin_lock(&d->event_lock);
>>> +    while ( (entry = list_first_entry_or_null(&pdev->msi_list,
>>> +                                              struct msi_desc, list)) != 
> NULL )
>>> +    {
>>> +        struct pirq *info;
>>> +        int pirq = 0;
>>> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
>>> +                          ? entry->msi.nvec : 1;
>>> +
>>> +        while ( nr -- )
>>
>>Stray blank.
>>
>>> +        {
>>> +            struct hvm_pirq_dpci *pirq_dpci;
>>> +
>>> +            pirq = domain_irq_to_pirq(d, entry[nr].irq);
>>> +            WARN_ON(pirq < 0);
>>> +            if ( pirq <= 0 )
>>> +                continue;
>>> +
>>> +            info = pirq_info(d, pirq);
>>> +            if ( !info )
>>> +                continue;
>>> +
>>> +            pirq_dpci = pirq_dpci(info);
>>> +            if ( pirq_dpci &&
>>> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>>> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
>>> +                pt_irq_destroy_bind_msi(d, info);
>>> +        }
>>> +        if ( pirq > 0 )
>>> +            unmap_domain_pirq(d, pirq);
>>
>>Can you guarantee that this function won't fail? Because if it
>>does, I think you might end up in an infinite loop, because the
> 
> Considering current code doesn't deal with remaining pirq, if we
> failed to unmap some pirq here (remove all entries from the msi_list
> here), it wouldn't be a big issue. Hence the real issue here is a
> potential infinite loop. Then we can just use
> list_for_each_entry_safe(...) to traverse msi_list to avoid infinite
> loop.
> 
>>entry doesn't always get removed from the list in error cases.
>>Maybe unmap_domain_pirq() needs a "force" mode added,
>>perhaps indirectly by way of passing "entry" into it (all other
>>callers would pass NULL).
> 
> Yes, it is viable. However, for this call site, unmap_domain_pirq()
> would fail to remove an entry only if xsm_unmap_domain_irq() in
> unmap_domain_pirq() failed. Can we expect that xsm_unmap_domain_irq()
> would always succeed there?

It would probably be a buggy policy, but we shouldn't crash/hang
the hypervisor in such a case.

> If the answer is yes, what needed is
> another assertion rather than the "force" mode. Maybe we can
> forcibly remove all entries still on the list after the loop.

That's indeed a possible option.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-01-04  9:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-20 15:29 [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
2018-12-20 15:29 ` [PATCH v4 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
2019-01-02 11:57   ` Wei Liu
2018-12-20 15:29 ` [PATCH v4 3/3] xen/pt: initialize 'warned' field of arch_msix properly Chao Gao
2018-12-21 10:17   ` Jan Beulich
2018-12-21 10:37     ` Roger Pau Monné
2018-12-21 10:13 ` [PATCH v4 1/3] xen/pt: fix some pass-thru devices don't work across reboot Jan Beulich
2018-12-21 10:26   ` Roger Pau Monné
2018-12-21 10:32     ` Jan Beulich
2018-12-21 11:32       ` Roger Pau Monné
2018-12-21 15:21   ` Chao Gao
2019-01-04  9:15     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.