All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
@ 2019-01-16  8:17 Chao Gao
  2019-01-16  8:17 ` [PATCH v5 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Chao Gao @ 2019-01-16  8:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monné, Jan Beulich, Chao Gao

I find some pass-thru devices don't work any more across guest
reboot. Assigning it to another domain also meets the same issue. And
the only way to make it work again is un-binding and binding it to
pciback. Someone reported this issue one year ago [1].

If the device's driver doesn't disable MSI-X during shutdown or qemu is
killed/crashed before the domain shutdown, this domain's pirq won't be
unmapped. Then xen takes over this work, unmapping all pirq-s, when
destroying guest. But as pciback has already disabled meory decoding before
xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
this process is:

->arch_domain_destroy
    ->free_domain_pirqs
        ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
            ->pirq_guest_force_unbind
                ->__pirq_guest_unbind
                    ->mask_msi_irq(=desc->handler->disable())
                        ->the warning in msi_set_mask_bit()

The host_maskall bit will prevent guests from clearing the maskall bit
even the device is assigned to another guest later. Then guests cannot
receive MSIs from this device.

To fix this issue, a pirq is unmapped before memory decoding is disabled by
pciback. Specifically, when a device is detached from a guest, all established
mappings between pirq and msi are destroying before changing the ownership.

[1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Changes in v5:
 - fix the potential infinite loop
 - assert that unmap_domain_pirq() won't fail
 - assert msi_list is empty after the loop in pci_unmap_msi
 - provide a stub for pt_irq_destroy_bind_msi() if !CONFIG_HVM to fix a
   compilation error when building PVShim

Changes in v4:
 - split out change to 'msix->warned' field
 - handle multiple msi cases
 - use list_first_entry_or_null to traverse 'pdev->msi_list'
---
 xen/drivers/passthrough/io.c  | 57 ++++++++++++++++++++++++++------------
 xen/drivers/passthrough/pci.c | 64 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/iommu.h       |  4 +++
 3 files changed, 107 insertions(+), 18 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index a6eb8a4..56ee1ef 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -619,6 +619,42 @@ int pt_irq_create_bind(
     return 0;
 }
 
+static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
+         list_empty(&pirq_dpci->digl_list) )
+    {
+        pirq_guest_unbind(d, pirq);
+        msixtbl_pt_unregister(d, pirq);
+        if ( pt_irq_need_timer(pirq_dpci->flags) )
+            kill_timer(&pirq_dpci->timer);
+        pirq_dpci->flags = 0;
+        /*
+         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
+         * call to pt_pirq_softirq_reset.
+         */
+        pt_pirq_softirq_reset(pirq_dpci);
+
+        pirq_cleanup_check(pirq, d);
+    }
+}
+
+void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( pirq_dpci && pirq_dpci->gmsi.posted )
+        pi_update_irte(NULL, pirq, 0);
+
+    pt_irq_destroy_bind_common(d, pirq);
+}
+
 int pt_irq_destroy_bind(
     struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
 {
@@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
         }
         else
             what = "bogus";
-    }
-    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
-        pi_update_irte(NULL, pirq, 0);
-
-    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
-         list_empty(&pirq_dpci->digl_list) )
-    {
-        pirq_guest_unbind(d, pirq);
-        msixtbl_pt_unregister(d, pirq);
-        if ( pt_irq_need_timer(pirq_dpci->flags) )
-            kill_timer(&pirq_dpci->timer);
-        pirq_dpci->flags = 0;
-        /*
-         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
-         * call to pt_pirq_softirq_reset.
-         */
-        pt_pirq_softirq_reset(pirq_dpci);
 
-        pirq_cleanup_check(pirq, d);
+        pt_irq_destroy_bind_common(d, pirq);
     }
+    else
+        pt_irq_destroy_bind_msi(d, pirq);
 
     spin_unlock(&d->event_lock);
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 93c20b9..4f2be02 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1514,6 +1514,68 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     return rc;
 }
 
+/*
+ * Unmap established mappings between domain's pirq and device's MSI.
+ * These mappings were set up by qemu/guest and are expected to be
+ * destroyed when changing the device's ownership.
+ */
+static void pci_unmap_msi(struct pci_dev *pdev)
+{
+    struct msi_desc *entry, *tmp;
+    struct domain *d = pdev->domain;
+
+    ASSERT(pcidevs_locked());
+    ASSERT(d);
+
+    spin_lock(&d->event_lock);
+    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
+    {
+        struct pirq *info;
+        int ret, pirq = 0;
+        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
+                          ? entry->msi.nvec : 1;
+
+        while ( nr-- )
+        {
+            struct hvm_pirq_dpci *pirq_dpci;
+
+            pirq = domain_irq_to_pirq(d, entry[nr].irq);
+            WARN_ON(pirq < 0);
+            if ( pirq <= 0 )
+                continue;
+
+            info = pirq_info(d, pirq);
+            if ( !info )
+                continue;
+
+            pirq_dpci = pirq_dpci(info);
+            if ( pirq_dpci &&
+                 (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
+                 (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
+                pt_irq_destroy_bind_msi(d, info);
+        }
+
+        if ( pirq > 0 )
+        {
+            /*
+             * The pirq is derived from an entry in msi_list rather than an
+             * arbitrary value passed down. There should be a irq (msi) mapped
+             * to this pirq. In this case, unmapping this pirq should succeed.
+             * Otherwise, something goes wrong.
+             */
+            ret = unmap_domain_pirq(d, pirq);
+            ASSERT(!ret);
+        }
+    }
+    /*
+     * All pirq-s should have been unmapped and corresponding msi_desc
+     * entries should have been removed in the above loop.
+     */
+    ASSERT(list_empty(&pdev->msi_list));
+
+    spin_unlock(&d->event_lock);
+}
+
 /* caller should hold the pcidevs_lock */
 int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 {
@@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
     if ( !pdev )
         return -ENODEV;
 
+    pci_unmap_msi(pdev);
+
     while ( pdev->phantom_stride )
     {
         devfn += pdev->phantom_stride;
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index cdc8021..76b186a 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -151,8 +151,12 @@ struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
 void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci);
 #ifdef CONFIG_HVM
 bool pt_irq_need_timer(uint32_t flags);
+void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq);
 #else
 static inline bool pt_irq_need_timer(unsigned int flags) { return false; }
+static inline void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
+{
+}
 #endif
 
 struct msi_desc;
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 2/3] libxl: don't reset device when it is accessible by the guest
  2019-01-16  8:17 [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
@ 2019-01-16  8:17 ` Chao Gao
  2019-01-16  8:17 ` [PATCH v5 3/3] xen/pt: initialize 'warned' field of arch_msix properly Chao Gao
  2019-01-16 10:38 ` [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot Roger Pau Monné
  2 siblings, 0 replies; 16+ messages in thread
From: Chao Gao @ 2019-01-16  8:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Roger Pau Monné, Wei Liu, Chao Gao

When I destroyed a guest with 'xl destroy', I found the warning
in msi_set_mask_bit() in Xen was triggered. After adding "WARN_ON(1)"
to that place, I got the call trace below:

(XEN) Xen call trace:
(XEN)    [<ffff82d080281a6a>] msi.c#msi_set_mask_bit+0x1da/0x29b
(XEN)    [<ffff82d080282e78>] guest_mask_msi_irq+0x1c/0x1e
(XEN)    [<ffff82d08030ceb9>] vmsi.c#msixtbl_write+0x173/0x1d4
(XEN)    [<ffff82d08030cf30>] vmsi.c#_msixtbl_write+0x16/0x18
(XEN)    [<ffff82d0802ffac4>] hvm_process_io_intercept+0x216/0x270
(XEN)    [<ffff82d0802ffb45>] hvm_io_intercept+0x27/0x4c
(XEN)    [<ffff82d0802f0e86>] emulate.c#hvmemul_do_io+0x273/0x454
(XEN)    [<ffff82d0802f10a4>] emulate.c#hvmemul_do_io_buffer+0x3d/0x70
(XEN)    [<ffff82d0802f2343>] emulate.c#hvmemul_linear_mmio_access+0x35e/0x436
(XEN)    [<ffff82d0802f2640>] emulate.c#linear_write+0xdd/0x13b
(XEN)    [<ffff82d0802f3f25>] emulate.c#hvmemul_write+0xbd/0xf1
(XEN)    [<ffff82d0802d51df>] x86_emulate+0x2249d/0x23c5c
(XEN)    [<ffff82d0802d861f>] x86_emulate_wrapper+0x2b/0x5f
(XEN)    [<ffff82d0802f28aa>] emulate.c#_hvm_emulate_one+0x54/0x1b2
(XEN)    [<ffff82d0802f2a18>] hvm_emulate_one+0x10/0x12
(XEN)    [<ffff82d080300227>] hvm_emulate_one_insn+0x42/0x14a
(XEN)    [<ffff82d08030037e>] handle_mmio_with_translation+0x4f/0x51
(XEN)    [<ffff82d0802f803b>] hvm_hap_nested_page_fault+0x16c/0x6d8
(XEN)    [<ffff82d08032446a>] vmx_vmexit_handler+0x19b0/0x1f2e
(XEN)    [<ffff82d08032995a>] vmx_asm_vmexit_handler+0xfa/0x270

It seems to me that guest is trying to mask a msi while the memory decoding
of the device is disabled. Performing a device reset without proper method
to avoid guest's MSI-X operation would lead to this issue.

The fix is basic - detach pci device before resetting the device.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_pci.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 87afa03..855fb71 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1459,17 +1459,17 @@ skip1:
         fclose(f);
     }
 out:
-    /* don't do multiple resets while some functions are still passed through */
-    if ( (pcidev->vdevfn & 0x7) == 0 ) {
-        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
-    }
-
     if (!isstubdom) {
         rc = xc_deassign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
         if (rc < 0 && (hvm || errno != ENOSYS))
             LOGED(ERROR, domainid, "xc_deassign_device failed");
     }
 
+    /* don't do multiple resets while some functions are still passed through */
+    if ( (pcidev->vdevfn & 0x7) == 0 ) {
+        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+    }
+
     stubdomid = libxl_get_stubdom_id(ctx, domid);
     if (stubdomid != 0) {
         libxl_device_pci pcidev_s = *pcidev;
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 3/3] xen/pt: initialize 'warned' field of arch_msix properly
  2019-01-16  8:17 [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
  2019-01-16  8:17 ` [PATCH v5 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
@ 2019-01-16  8:17 ` Chao Gao
  2019-01-16 10:38 ` [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot Roger Pau Monné
  2 siblings, 0 replies; 16+ messages in thread
From: Chao Gao @ 2019-01-16  8:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Jan Beulich, Chao Gao

Also clean up current code by moving initialization of arch specific
fields out of common code.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes in v5:
 - rename init_arch_msix to arch_init_msix
 - place arch_init_msix right after the definition of arch_msix

Changes in v4:
 - newly added
---
 xen/drivers/passthrough/pci.c | 2 +-
 xen/include/asm-x86/msi.h     | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 4f2be02..95fc06b 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -367,7 +367,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
             xfree(pdev);
             return NULL;
         }
-        spin_lock_init(&msix->table_lock);
+        arch_init_msix(msix);
         pdev->msix = msix;
     }
 
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 10387dc..7b13c07 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -242,6 +242,12 @@ struct arch_msix {
     domid_t warned;
 };
 
+static inline void arch_init_msix(struct arch_msix *msix)
+{
+    spin_lock_init(&msix->table_lock);
+    msix->warned = DOMID_INVALID;
+}
+
 void early_msi_init(void);
 void msi_compose_msg(unsigned vector, const cpumask_t *mask,
                      struct msi_msg *msg);
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-16  8:17 [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
  2019-01-16  8:17 ` [PATCH v5 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
  2019-01-16  8:17 ` [PATCH v5 3/3] xen/pt: initialize 'warned' field of arch_msix properly Chao Gao
@ 2019-01-16 10:38 ` Roger Pau Monné
  2019-01-16 11:59   ` Chao Gao
  2019-01-22  5:50   ` Chao Gao
  2 siblings, 2 replies; 16+ messages in thread
From: Roger Pau Monné @ 2019-01-16 10:38 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Jan Beulich

On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
> I find some pass-thru devices don't work any more across guest
> reboot. Assigning it to another domain also meets the same issue. And
> the only way to make it work again is un-binding and binding it to
> pciback. Someone reported this issue one year ago [1].
> 
> If the device's driver doesn't disable MSI-X during shutdown or qemu is
> killed/crashed before the domain shutdown, this domain's pirq won't be
> unmapped. Then xen takes over this work, unmapping all pirq-s, when
> destroying guest. But as pciback has already disabled meory decoding before
> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
> this process is:
> 
> ->arch_domain_destroy
>     ->free_domain_pirqs
>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>             ->pirq_guest_force_unbind
>                 ->__pirq_guest_unbind
>                     ->mask_msi_irq(=desc->handler->disable())
>                         ->the warning in msi_set_mask_bit()
> 
> The host_maskall bit will prevent guests from clearing the maskall bit
> even the device is assigned to another guest later. Then guests cannot
> receive MSIs from this device.
> 
> To fix this issue, a pirq is unmapped before memory decoding is disabled by
> pciback. Specifically, when a device is detached from a guest, all established
> mappings between pirq and msi are destroying before changing the ownership.
> 
> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html

Thanks, I think the approach is fine, just a couple of comments.

> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> Changes in v5:
>  - fix the potential infinite loop
>  - assert that unmap_domain_pirq() won't fail
>  - assert msi_list is empty after the loop in pci_unmap_msi
>  - provide a stub for pt_irq_destroy_bind_msi() if !CONFIG_HVM to fix a
>    compilation error when building PVShim
> 
> Changes in v4:
>  - split out change to 'msix->warned' field
>  - handle multiple msi cases
>  - use list_first_entry_or_null to traverse 'pdev->msi_list'
> ---
>  xen/drivers/passthrough/io.c  | 57 ++++++++++++++++++++++++++------------
>  xen/drivers/passthrough/pci.c | 64 +++++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/iommu.h       |  4 +++
>  3 files changed, 107 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index a6eb8a4..56ee1ef 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
>      return 0;
>  }
>  
> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
> +{
> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
> +
> +    ASSERT(spin_is_locked(&d->event_lock));
> +
> +    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
> +         list_empty(&pirq_dpci->digl_list) )
> +    {
> +        pirq_guest_unbind(d, pirq);
> +        msixtbl_pt_unregister(d, pirq);
> +        if ( pt_irq_need_timer(pirq_dpci->flags) )
> +            kill_timer(&pirq_dpci->timer);
> +        pirq_dpci->flags = 0;
> +        /*
> +         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> +         * call to pt_pirq_softirq_reset.
> +         */
> +        pt_pirq_softirq_reset(pirq_dpci);
> +
> +        pirq_cleanup_check(pirq, d);
> +    }
> +}
> +
> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
> +{
> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);

const

> +
> +    ASSERT(spin_is_locked(&d->event_lock));
> +
> +    if ( pirq_dpci && pirq_dpci->gmsi.posted )
> +        pi_update_irte(NULL, pirq, 0);
> +
> +    pt_irq_destroy_bind_common(d, pirq);
> +}
> +
>  int pt_irq_destroy_bind(
>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>  {
> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
>          }
>          else
>              what = "bogus";
> -    }
> -    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
> -        pi_update_irte(NULL, pirq, 0);
> -
> -    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
> -         list_empty(&pirq_dpci->digl_list) )
> -    {
> -        pirq_guest_unbind(d, pirq);
> -        msixtbl_pt_unregister(d, pirq);
> -        if ( pt_irq_need_timer(pirq_dpci->flags) )
> -            kill_timer(&pirq_dpci->timer);
> -        pirq_dpci->flags = 0;
> -        /*
> -         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> -         * call to pt_pirq_softirq_reset.
> -         */
> -        pt_pirq_softirq_reset(pirq_dpci);
>  
> -        pirq_cleanup_check(pirq, d);
> +        pt_irq_destroy_bind_common(d, pirq);
>      }
> +    else
> +        pt_irq_destroy_bind_msi(d, pirq);
>  
>      spin_unlock(&d->event_lock);
>  
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 93c20b9..4f2be02 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1514,6 +1514,68 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>      return rc;
>  }
>  
> +/*
> + * Unmap established mappings between domain's pirq and device's MSI.
> + * These mappings were set up by qemu/guest and are expected to be
> + * destroyed when changing the device's ownership.
> + */
> +static void pci_unmap_msi(struct pci_dev *pdev)
> +{
> +    struct msi_desc *entry, *tmp;
> +    struct domain *d = pdev->domain;
> +
> +    ASSERT(pcidevs_locked());
> +    ASSERT(d);
> +
> +    spin_lock(&d->event_lock);
> +    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
> +    {
> +        struct pirq *info;
> +        int ret, pirq = 0;
> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
> +                          ? entry->msi.nvec : 1;

I think you should mask the entry, like it's done in
pt_irq_destroy_bind, see the call to guest_mask_msi_irq. That gives a
consistent state between bind and unbind.

> +
> +        while ( nr-- )
> +        {
> +            struct hvm_pirq_dpci *pirq_dpci;

Nit: you could reduce the scope of info by declaring it here AFAICT.

> +
> +            pirq = domain_irq_to_pirq(d, entry[nr].irq);
> +            WARN_ON(pirq < 0);
> +            if ( pirq <= 0 )
> +                continue;
> +
> +            info = pirq_info(d, pirq);
> +            if ( !info )
> +                continue;
> +
> +            pirq_dpci = pirq_dpci(info);
> +            if ( pirq_dpci &&
> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
> +                pt_irq_destroy_bind_msi(d, info);
> +        }
> +
> +        if ( pirq > 0 )
> +        {
> +            /*
> +             * The pirq is derived from an entry in msi_list rather than an
> +             * arbitrary value passed down. There should be a irq (msi) mapped
> +             * to this pirq. In this case, unmapping this pirq should succeed.
> +             * Otherwise, something goes wrong.
> +             */
> +            ret = unmap_domain_pirq(d, pirq);
> +            ASSERT(!ret);

unmap_domain_pirq can fail, why not make pci_unmap_msi return an int
and propagate the error to the caller?

deassign_device returning an error should also be fine.

> +        }
> +    }
> +    /*
> +     * All pirq-s should have been unmapped and corresponding msi_desc
> +     * entries should have been removed in the above loop.
> +     */
> +    ASSERT(list_empty(&pdev->msi_list));
> +
> +    spin_unlock(&d->event_lock);
> +}
> +
>  /* caller should hold the pcidevs_lock */
>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>  {
> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>      if ( !pdev )
>          return -ENODEV;
>  
> +    pci_unmap_msi(pdev);

Just want to make sure, since deassign_device will be called for both
PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
device is assigned to a PV guest, but would like your confirmation.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-16 10:38 ` [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot Roger Pau Monné
@ 2019-01-16 11:59   ` Chao Gao
  2019-01-16 12:34     ` Roger Pau Monné
  2019-01-22  5:50   ` Chao Gao
  1 sibling, 1 reply; 16+ messages in thread
From: Chao Gao @ 2019-01-16 11:59 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich

On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>> I find some pass-thru devices don't work any more across guest
>> reboot. Assigning it to another domain also meets the same issue. And
>> the only way to make it work again is un-binding and binding it to
>> pciback. Someone reported this issue one year ago [1].
>> 
>> If the device's driver doesn't disable MSI-X during shutdown or qemu is
>> killed/crashed before the domain shutdown, this domain's pirq won't be
>> unmapped. Then xen takes over this work, unmapping all pirq-s, when
>> destroying guest. But as pciback has already disabled meory decoding before
>> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
>> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
>> this process is:
>> 
>> ->arch_domain_destroy
>>     ->free_domain_pirqs
>>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>>             ->pirq_guest_force_unbind
>>                 ->__pirq_guest_unbind
>>                     ->mask_msi_irq(=desc->handler->disable())
>>                         ->the warning in msi_set_mask_bit()
>> 
>> The host_maskall bit will prevent guests from clearing the maskall bit
>> even the device is assigned to another guest later. Then guests cannot
>> receive MSIs from this device.
>> 
>> To fix this issue, a pirq is unmapped before memory decoding is disabled by
>> pciback. Specifically, when a device is detached from a guest, all established
>> mappings between pirq and msi are destroying before changing the ownership.
>> 
>> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
>
>Thanks, I think the approach is fine, just a couple of comments.
>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> Changes in v5:
>>  - fix the potential infinite loop
>>  - assert that unmap_domain_pirq() won't fail
>>  - assert msi_list is empty after the loop in pci_unmap_msi
>>  - provide a stub for pt_irq_destroy_bind_msi() if !CONFIG_HVM to fix a
>>    compilation error when building PVShim
>> 
>> Changes in v4:
>>  - split out change to 'msix->warned' field
>>  - handle multiple msi cases
>>  - use list_first_entry_or_null to traverse 'pdev->msi_list'
>> ---
>>  xen/drivers/passthrough/io.c  | 57 ++++++++++++++++++++++++++------------
>>  xen/drivers/passthrough/pci.c | 64 +++++++++++++++++++++++++++++++++++++++++++
>>  xen/include/xen/iommu.h       |  4 +++
>>  3 files changed, 107 insertions(+), 18 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
>> index a6eb8a4..56ee1ef 100644
>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
>>      return 0;
>>  }
>>  
>> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
>> +{
>> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>> +
>> +    ASSERT(spin_is_locked(&d->event_lock));
>> +
>> +    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> +         list_empty(&pirq_dpci->digl_list) )
>> +    {
>> +        pirq_guest_unbind(d, pirq);
>> +        msixtbl_pt_unregister(d, pirq);
>> +        if ( pt_irq_need_timer(pirq_dpci->flags) )
>> +            kill_timer(&pirq_dpci->timer);
>> +        pirq_dpci->flags = 0;
>> +        /*
>> +         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> +         * call to pt_pirq_softirq_reset.
>> +         */
>> +        pt_pirq_softirq_reset(pirq_dpci);
>> +
>> +        pirq_cleanup_check(pirq, d);
>> +    }
>> +}
>> +
>> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
>> +{
>> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>
>const
>
>> +
>> +    ASSERT(spin_is_locked(&d->event_lock));
>> +
>> +    if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> +        pi_update_irte(NULL, pirq, 0);
>> +
>> +    pt_irq_destroy_bind_common(d, pirq);
>> +}
>> +
>>  int pt_irq_destroy_bind(
>>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>>  {
>> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
>>          }
>>          else
>>              what = "bogus";
>> -    }
>> -    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> -        pi_update_irte(NULL, pirq, 0);
>> -
>> -    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> -         list_empty(&pirq_dpci->digl_list) )
>> -    {
>> -        pirq_guest_unbind(d, pirq);
>> -        msixtbl_pt_unregister(d, pirq);
>> -        if ( pt_irq_need_timer(pirq_dpci->flags) )
>> -            kill_timer(&pirq_dpci->timer);
>> -        pirq_dpci->flags = 0;
>> -        /*
>> -         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> -         * call to pt_pirq_softirq_reset.
>> -         */
>> -        pt_pirq_softirq_reset(pirq_dpci);
>>  
>> -        pirq_cleanup_check(pirq, d);
>> +        pt_irq_destroy_bind_common(d, pirq);
>>      }
>> +    else
>> +        pt_irq_destroy_bind_msi(d, pirq);
>>  
>>      spin_unlock(&d->event_lock);
>>  
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 93c20b9..4f2be02 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -1514,6 +1514,68 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>      return rc;
>>  }
>>  
>> +/*
>> + * Unmap established mappings between domain's pirq and device's MSI.
>> + * These mappings were set up by qemu/guest and are expected to be
>> + * destroyed when changing the device's ownership.
>> + */
>> +static void pci_unmap_msi(struct pci_dev *pdev)
>> +{
>> +    struct msi_desc *entry, *tmp;
>> +    struct domain *d = pdev->domain;
>> +
>> +    ASSERT(pcidevs_locked());
>> +    ASSERT(d);
>> +
>> +    spin_lock(&d->event_lock);
>> +    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
>> +    {
>> +        struct pirq *info;
>> +        int ret, pirq = 0;
>> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
>> +                          ? entry->msi.nvec : 1;
>
>I think you should mask the entry, like it's done in
>pt_irq_destroy_bind, see the call to guest_mask_msi_irq. That gives a
>consistent state between bind and unbind.

I don't think it is necessary considering that we are to unmap pirq.
The reason of keeping state consistent is that we might try to bind
the same pirq to another guest interrupt.

>
>> +
>> +        while ( nr-- )
>> +        {
>> +            struct hvm_pirq_dpci *pirq_dpci;
>
>Nit: you could reduce the scope of info by declaring it here AFAICT.
>
>> +
>> +            pirq = domain_irq_to_pirq(d, entry[nr].irq);
>> +            WARN_ON(pirq < 0);
>> +            if ( pirq <= 0 )
>> +                continue;
>> +
>> +            info = pirq_info(d, pirq);
>> +            if ( !info )
>> +                continue;
>> +
>> +            pirq_dpci = pirq_dpci(info);
>> +            if ( pirq_dpci &&
>> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
>> +                pt_irq_destroy_bind_msi(d, info);
>> +        }
>> +
>> +        if ( pirq > 0 )
>> +        {
>> +            /*
>> +             * The pirq is derived from an entry in msi_list rather than an
>> +             * arbitrary value passed down. There should be a irq (msi) mapped
>> +             * to this pirq. In this case, unmapping this pirq should succeed.
>> +             * Otherwise, something goes wrong.
>> +             */
>> +            ret = unmap_domain_pirq(d, pirq);
>> +            ASSERT(!ret);
>
>unmap_domain_pirq can fail, why not make pci_unmap_msi return an int
>and propagate the error to the caller?
>
>deassign_device returning an error should also be fine.

Good suggestion. :)

>
>> +        }
>> +    }
>> +    /*
>> +     * All pirq-s should have been unmapped and corresponding msi_desc
>> +     * entries should have been removed in the above loop.
>> +     */
>> +    ASSERT(list_empty(&pdev->msi_list));
>> +
>> +    spin_unlock(&d->event_lock);
>> +}
>> +
>>  /* caller should hold the pcidevs_lock */
>>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>>  {
>> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>>      if ( !pdev )
>>          return -ENODEV;
>>  
>> +    pci_unmap_msi(pdev);
>
>Just want to make sure, since deassign_device will be called for both
>PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>device is assigned to a PV guest, but would like your confirmation.

TBH, I don't know how device pass-thru is implemented for PV guest.
If PV guest also uses the same structures and APIs to manage the mapping
between msi, pirq and guest interrupt, I think pci_unmap_msi() should also
work for PV guest case.

Thanks.
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-16 11:59   ` Chao Gao
@ 2019-01-16 12:34     ` Roger Pau Monné
  2019-01-16 14:52       ` Chao Gao
  0 siblings, 1 reply; 16+ messages in thread
From: Roger Pau Monné @ 2019-01-16 12:34 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Jan Beulich

On Wed, Jan 16, 2019 at 07:59:44PM +0800, Chao Gao wrote:
> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
> >On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> >> index 93c20b9..4f2be02 100644
> >> --- a/xen/drivers/passthrough/pci.c
> >> +++ b/xen/drivers/passthrough/pci.c
> >> @@ -1514,6 +1514,68 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
> >>      return rc;
> >>  }
> >>  
> >> +/*
> >> + * Unmap established mappings between domain's pirq and device's MSI.
> >> + * These mappings were set up by qemu/guest and are expected to be
> >> + * destroyed when changing the device's ownership.
> >> + */
> >> +static void pci_unmap_msi(struct pci_dev *pdev)
> >> +{
> >> +    struct msi_desc *entry, *tmp;
> >> +    struct domain *d = pdev->domain;
> >> +
> >> +    ASSERT(pcidevs_locked());
> >> +    ASSERT(d);
> >> +
> >> +    spin_lock(&d->event_lock);
> >> +    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
> >> +    {
> >> +        struct pirq *info;
> >> +        int ret, pirq = 0;
> >> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
> >> +                          ? entry->msi.nvec : 1;
> >
> >I think you should mask the entry, like it's done in
> >pt_irq_destroy_bind, see the call to guest_mask_msi_irq. That gives a
> >consistent state between bind and unbind.
> 
> I don't think it is necessary considering that we are to unmap pirq.
> The reason of keeping state consistent is that we might try to bind
> the same pirq to another guest interrupt.

Even taking into account that the pirq will be unmapped afterwards I'm
not sure the state is going to be the same. unmap_domain_pirq doesn't
seem to mask the MSI entries, and so I wonder whether we could run
into issues (state not being the expected) when later re-assigning the
device to another guest.

Maybe I'm missing something, but I would like to make sure the device
state stays consistent between assignations, at the end of day the
problem this patch aims to solve is a state inconsistency between
device assignations.

> >> +        }
> >> +    }
> >> +    /*
> >> +     * All pirq-s should have been unmapped and corresponding msi_desc
> >> +     * entries should have been removed in the above loop.
> >> +     */
> >> +    ASSERT(list_empty(&pdev->msi_list));
> >> +
> >> +    spin_unlock(&d->event_lock);
> >> +}
> >> +
> >>  /* caller should hold the pcidevs_lock */
> >>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
> >>  {
> >> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
> >>      if ( !pdev )
> >>          return -ENODEV;
> >>  
> >> +    pci_unmap_msi(pdev);
> >
> >Just want to make sure, since deassign_device will be called for both
> >PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
> >device is assigned to a PV guest, but would like your confirmation.
> 
> TBH, I don't know how device pass-thru is implemented for PV guest.
> If PV guest also uses the same structures and APIs to manage the mapping
> between msi, pirq and guest interrupt, I think pci_unmap_msi() should also
> work for PV guest case.

No, PV guest uses a completely different mechanism. I think
pci_unmap_msi is safe to be used against PV guests, but it would be
nice to have some confirmation. The more that there are no
pci-passthorugh tests in osstest, so such error would go unnoticed.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-16 12:34     ` Roger Pau Monné
@ 2019-01-16 14:52       ` Chao Gao
  2019-01-16 15:29         ` Roger Pau Monné
  0 siblings, 1 reply; 16+ messages in thread
From: Chao Gao @ 2019-01-16 14:52 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich

On Wed, Jan 16, 2019 at 01:34:28PM +0100, Roger Pau Monné wrote:
>On Wed, Jan 16, 2019 at 07:59:44PM +0800, Chao Gao wrote:
>> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>> >On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> >> index 93c20b9..4f2be02 100644
>> >> --- a/xen/drivers/passthrough/pci.c
>> >> +++ b/xen/drivers/passthrough/pci.c
>> >> @@ -1514,6 +1514,68 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>> >>      return rc;
>> >>  }
>> >>  
>> >> +/*
>> >> + * Unmap established mappings between domain's pirq and device's MSI.
>> >> + * These mappings were set up by qemu/guest and are expected to be
>> >> + * destroyed when changing the device's ownership.
>> >> + */
>> >> +static void pci_unmap_msi(struct pci_dev *pdev)
>> >> +{
>> >> +    struct msi_desc *entry, *tmp;
>> >> +    struct domain *d = pdev->domain;
>> >> +
>> >> +    ASSERT(pcidevs_locked());
>> >> +    ASSERT(d);
>> >> +
>> >> +    spin_lock(&d->event_lock);
>> >> +    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
>> >> +    {
>> >> +        struct pirq *info;
>> >> +        int ret, pirq = 0;
>> >> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
>> >> +                          ? entry->msi.nvec : 1;
>> >
>> >I think you should mask the entry, like it's done in
>> >pt_irq_destroy_bind, see the call to guest_mask_msi_irq. That gives a
>> >consistent state between bind and unbind.
>> 
>> I don't think it is necessary considering that we are to unmap pirq.
>> The reason of keeping state consistent is that we might try to bind
>> the same pirq to another guest interrupt.
>
>Even taking into account that the pirq will be unmapped afterwards I'm
>not sure the state is going to be the same. unmap_domain_pirq doesn't
>seem to mask the MSI entries, and so I wonder whether we could run
>into issues (state not being the expected) when later re-assigning the
>device to another guest.

A valid call trace (in this patch's description) is:

->unmap_domain_pirq (if pirq isn't unmapped by qemu)
    ->pirq_guest_force_unbind
	->__pirq_guest_unbind
	    ->mask_msi_irq(=desc->handler->disable())
		->the warning in msi_set_mask_bit()

>
>Maybe I'm missing something, but I would like to make sure the device
>state stays consistent between assignations, at the end of day the
>problem this patch aims to solve is a state inconsistency between
>device assignations.
>
>> >> +        }
>> >> +    }
>> >> +    /*
>> >> +     * All pirq-s should have been unmapped and corresponding msi_desc
>> >> +     * entries should have been removed in the above loop.
>> >> +     */
>> >> +    ASSERT(list_empty(&pdev->msi_list));
>> >> +
>> >> +    spin_unlock(&d->event_lock);
>> >> +}
>> >> +
>> >>  /* caller should hold the pcidevs_lock */
>> >>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>> >>  {
>> >> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>> >>      if ( !pdev )
>> >>          return -ENODEV;
>> >>  
>> >> +    pci_unmap_msi(pdev);
>> >
>> >Just want to make sure, since deassign_device will be called for both
>> >PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>> >device is assigned to a PV guest, but would like your confirmation.
>> 
>> TBH, I don't know how device pass-thru is implemented for PV guest.
>> If PV guest also uses the same structures and APIs to manage the mapping
>> between msi, pirq and guest interrupt, I think pci_unmap_msi() should also
>> work for PV guest case.
>
>No, PV guest uses a completely different mechanism. I think
>pci_unmap_msi is safe to be used against PV guests, but it would be
>nice to have some confirmation. The more that there are no
>pci-passthorugh tests in osstest, so such error would go unnoticed.

I will do some tests for PV guest.

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-16 14:52       ` Chao Gao
@ 2019-01-16 15:29         ` Roger Pau Monné
  0 siblings, 0 replies; 16+ messages in thread
From: Roger Pau Monné @ 2019-01-16 15:29 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Jan Beulich

On Wed, Jan 16, 2019 at 10:52:48PM +0800, Chao Gao wrote:
> On Wed, Jan 16, 2019 at 01:34:28PM +0100, Roger Pau Monné wrote:
> >On Wed, Jan 16, 2019 at 07:59:44PM +0800, Chao Gao wrote:
> >> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
> >> >On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
> >> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> >> >> index 93c20b9..4f2be02 100644
> >> >> --- a/xen/drivers/passthrough/pci.c
> >> >> +++ b/xen/drivers/passthrough/pci.c
> >> >> @@ -1514,6 +1514,68 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
> >> >>      return rc;
> >> >>  }
> >> >>  
> >> >> +/*
> >> >> + * Unmap established mappings between domain's pirq and device's MSI.
> >> >> + * These mappings were set up by qemu/guest and are expected to be
> >> >> + * destroyed when changing the device's ownership.
> >> >> + */
> >> >> +static void pci_unmap_msi(struct pci_dev *pdev)
> >> >> +{
> >> >> +    struct msi_desc *entry, *tmp;
> >> >> +    struct domain *d = pdev->domain;
> >> >> +
> >> >> +    ASSERT(pcidevs_locked());
> >> >> +    ASSERT(d);
> >> >> +
> >> >> +    spin_lock(&d->event_lock);
> >> >> +    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
> >> >> +    {
> >> >> +        struct pirq *info;
> >> >> +        int ret, pirq = 0;
> >> >> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
> >> >> +                          ? entry->msi.nvec : 1;
> >> >
> >> >I think you should mask the entry, like it's done in
> >> >pt_irq_destroy_bind, see the call to guest_mask_msi_irq. That gives a
> >> >consistent state between bind and unbind.
> >> 
> >> I don't think it is necessary considering that we are to unmap pirq.
> >> The reason of keeping state consistent is that we might try to bind
> >> the same pirq to another guest interrupt.
> >
> >Even taking into account that the pirq will be unmapped afterwards I'm
> >not sure the state is going to be the same. unmap_domain_pirq doesn't
> >seem to mask the MSI entries, and so I wonder whether we could run
> >into issues (state not being the expected) when later re-assigning the
> >device to another guest.
> 
> A valid call trace (in this patch's description) is:
> 
> ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>     ->pirq_guest_force_unbind
> 	->__pirq_guest_unbind
> 	    ->mask_msi_irq(=desc->handler->disable())
> 		->the warning in msi_set_mask_bit()

Ack, sorry for not realizing this.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-16 10:38 ` [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot Roger Pau Monné
  2019-01-16 11:59   ` Chao Gao
@ 2019-01-22  5:50   ` Chao Gao
  2019-01-22  8:24     ` Jan Beulich
  2019-01-22  9:18     ` Roger Pau Monné
  1 sibling, 2 replies; 16+ messages in thread
From: Chao Gao @ 2019-01-22  5:50 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich

On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>> I find some pass-thru devices don't work any more across guest
>> reboot. Assigning it to another domain also meets the same issue. And
>> the only way to make it work again is un-binding and binding it to
>> pciback. Someone reported this issue one year ago [1].
>> 
>> If the device's driver doesn't disable MSI-X during shutdown or qemu is
>> killed/crashed before the domain shutdown, this domain's pirq won't be
>> unmapped. Then xen takes over this work, unmapping all pirq-s, when
>> destroying guest. But as pciback has already disabled meory decoding before
>> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
>> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
>> this process is:
>> 
>> ->arch_domain_destroy
>>     ->free_domain_pirqs
>>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>>             ->pirq_guest_force_unbind
>>                 ->__pirq_guest_unbind
>>                     ->mask_msi_irq(=desc->handler->disable())
>>                         ->the warning in msi_set_mask_bit()
>> 
>> The host_maskall bit will prevent guests from clearing the maskall bit
>> even the device is assigned to another guest later. Then guests cannot
>> receive MSIs from this device.
>> 
>> To fix this issue, a pirq is unmapped before memory decoding is disabled by
>> pciback. Specifically, when a device is detached from a guest, all established
>> mappings between pirq and msi are destroying before changing the ownership.
>> 
>> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
>
>Thanks, I think the approach is fine, just a couple of comments.
>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> Changes in v5:
>>  - fix the potential infinite loop
>>  - assert that unmap_domain_pirq() won't fail
>>  - assert msi_list is empty after the loop in pci_unmap_msi
>>  - provide a stub for pt_irq_destroy_bind_msi() if !CONFIG_HVM to fix a
>>    compilation error when building PVShim
>> 
>> Changes in v4:
>>  - split out change to 'msix->warned' field
>>  - handle multiple msi cases
>>  - use list_first_entry_or_null to traverse 'pdev->msi_list'
>> ---
>>  xen/drivers/passthrough/io.c  | 57 ++++++++++++++++++++++++++------------
>>  xen/drivers/passthrough/pci.c | 64 +++++++++++++++++++++++++++++++++++++++++++
>>  xen/include/xen/iommu.h       |  4 +++
>>  3 files changed, 107 insertions(+), 18 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
>> index a6eb8a4..56ee1ef 100644
>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
>>      return 0;
>>  }
>>  
>> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
>> +{
>> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>> +
>> +    ASSERT(spin_is_locked(&d->event_lock));
>> +
>> +    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> +         list_empty(&pirq_dpci->digl_list) )
>> +    {
>> +        pirq_guest_unbind(d, pirq);
>> +        msixtbl_pt_unregister(d, pirq);
>> +        if ( pt_irq_need_timer(pirq_dpci->flags) )
>> +            kill_timer(&pirq_dpci->timer);
>> +        pirq_dpci->flags = 0;
>> +        /*
>> +         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> +         * call to pt_pirq_softirq_reset.
>> +         */
>> +        pt_pirq_softirq_reset(pirq_dpci);
>> +
>> +        pirq_cleanup_check(pirq, d);
>> +    }
>> +}
>> +
>> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
>> +{
>> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>
>const
>
>> +
>> +    ASSERT(spin_is_locked(&d->event_lock));
>> +
>> +    if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> +        pi_update_irte(NULL, pirq, 0);
>> +
>> +    pt_irq_destroy_bind_common(d, pirq);
>> +}
>> +
>>  int pt_irq_destroy_bind(
>>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>>  {
>> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
>>          }
>>          else
>>              what = "bogus";
>> -    }
>> -    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> -        pi_update_irte(NULL, pirq, 0);
>> -
>> -    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> -         list_empty(&pirq_dpci->digl_list) )
>> -    {
>> -        pirq_guest_unbind(d, pirq);
>> -        msixtbl_pt_unregister(d, pirq);
>> -        if ( pt_irq_need_timer(pirq_dpci->flags) )
>> -            kill_timer(&pirq_dpci->timer);
>> -        pirq_dpci->flags = 0;
>> -        /*
>> -         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> -         * call to pt_pirq_softirq_reset.
>> -         */
>> -        pt_pirq_softirq_reset(pirq_dpci);
>>  
>> -        pirq_cleanup_check(pirq, d);
>> +        pt_irq_destroy_bind_common(d, pirq);
>>      }
>> +    else
>> +        pt_irq_destroy_bind_msi(d, pirq);
>>  
>>      spin_unlock(&d->event_lock);
>>  
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 93c20b9..4f2be02 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -1514,6 +1514,68 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>      return rc;
>>  }
>>  
>> +/*
>> + * Unmap established mappings between domain's pirq and device's MSI.
>> + * These mappings were set up by qemu/guest and are expected to be
>> + * destroyed when changing the device's ownership.
>> + */
>> +static void pci_unmap_msi(struct pci_dev *pdev)
>> +{
>> +    struct msi_desc *entry, *tmp;
>> +    struct domain *d = pdev->domain;
>> +
>> +    ASSERT(pcidevs_locked());
>> +    ASSERT(d);
>> +
>> +    spin_lock(&d->event_lock);
>> +    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
>> +    {
>> +        struct pirq *info;
>> +        int ret, pirq = 0;
>> +        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
>> +                          ? entry->msi.nvec : 1;
>
>I think you should mask the entry, like it's done in
>pt_irq_destroy_bind, see the call to guest_mask_msi_irq. That gives a
>consistent state between bind and unbind.
>
>> +
>> +        while ( nr-- )
>> +        {
>> +            struct hvm_pirq_dpci *pirq_dpci;
>
>Nit: you could reduce the scope of info by declaring it here AFAICT.
>
>> +
>> +            pirq = domain_irq_to_pirq(d, entry[nr].irq);
>> +            WARN_ON(pirq < 0);
>> +            if ( pirq <= 0 )
>> +                continue;
>> +
>> +            info = pirq_info(d, pirq);
>> +            if ( !info )
>> +                continue;
>> +
>> +            pirq_dpci = pirq_dpci(info);
>> +            if ( pirq_dpci &&
>> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>> +                 (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
>> +                pt_irq_destroy_bind_msi(d, info);
>> +        }
>> +
>> +        if ( pirq > 0 )
>> +        {
>> +            /*
>> +             * The pirq is derived from an entry in msi_list rather than an
>> +             * arbitrary value passed down. There should be a irq (msi) mapped
>> +             * to this pirq. In this case, unmapping this pirq should succeed.
>> +             * Otherwise, something goes wrong.
>> +             */
>> +            ret = unmap_domain_pirq(d, pirq);
>> +            ASSERT(!ret);
>
>unmap_domain_pirq can fail, why not make pci_unmap_msi return an int
>and propagate the error to the caller?
>
>deassign_device returning an error should also be fine.
>
>> +        }
>> +    }
>> +    /*
>> +     * All pirq-s should have been unmapped and corresponding msi_desc
>> +     * entries should have been removed in the above loop.
>> +     */
>> +    ASSERT(list_empty(&pdev->msi_list));
>> +
>> +    spin_unlock(&d->event_lock);
>> +}
>> +
>>  /* caller should hold the pcidevs_lock */
>>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>>  {
>> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>>      if ( !pdev )
>>          return -ENODEV;
>>  
>> +    pci_unmap_msi(pdev);
>
>Just want to make sure, since deassign_device will be called for both
>PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>device is assigned to a PV guest, but would like your confirmation.

Tested with a PV guest loaded by Pygrub. PV guest doesn't suffer the
msi-x issue I want to fix.

With these three patches applied, I got some error messages from Xen
and Dom0 as follow:

(XEN) irq.c:2176: dom3: forcing unbind of pirq 332
(XEN) irq.c:2176: dom3: forcing unbind of pirq 331
(XEN) irq.c:2176: dom3: forcing unbind of pirq 328
(XEN) irq.c:2148: dom3: pirq 359 not mapped
[ 2887.067685] xen:events: unmap irq failed -22
(XEN) irq.c:2148: dom3: pirq 358 not mapped
[ 2887.075917] xen:events: unmap irq failed -22
(XEN) irq.c:2148: dom3: pirq 357 not mapped

It seems, the cause of such error is that pirq-s are unmapped and forcibly
unbound on deassignment; subsequent unmapping pirq issued by dom0 fail.
From some aspects, this error is expected. Because with this patch,
pirq-s are expected to be mapped by qemu or dom0 kernel (for pv case) before
deassignment and mapping/binding pirq after deassignment should fail.

So what's your opinion on handling such error? We should figure out another
method to fix msi-x issue to avoid such error or suppress these errors in
qemu and linux kernel?

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-22  5:50   ` Chao Gao
@ 2019-01-22  8:24     ` Jan Beulich
  2019-01-22 16:08       ` Chao Gao
  2019-01-22  9:18     ` Roger Pau Monné
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2019-01-22  8:24 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Roger Pau Monne

>>> On 22.01.19 at 06:50, <chao.gao@intel.com> wrote:
> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>>On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>>> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>>>      if ( !pdev )
>>>          return -ENODEV;
>>>  
>>> +    pci_unmap_msi(pdev);
>>
>>Just want to make sure, since deassign_device will be called for both
>>PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>>device is assigned to a PV guest, but would like your confirmation.
> 
> Tested with a PV guest loaded by Pygrub. PV guest doesn't suffer the
> msi-x issue I want to fix.
> 
> With these three patches applied, I got some error messages from Xen
> and Dom0 as follow:
> 
> (XEN) irq.c:2176: dom3: forcing unbind of pirq 332
> (XEN) irq.c:2176: dom3: forcing unbind of pirq 331
> (XEN) irq.c:2176: dom3: forcing unbind of pirq 328
> (XEN) irq.c:2148: dom3: pirq 359 not mapped
> [ 2887.067685] xen:events: unmap irq failed -22
> (XEN) irq.c:2148: dom3: pirq 358 not mapped
> [ 2887.075917] xen:events: unmap irq failed -22
> (XEN) irq.c:2148: dom3: pirq 357 not mapped
> 
> It seems, the cause of such error is that pirq-s are unmapped and forcibly
> unbound on deassignment; subsequent unmapping pirq issued by dom0 fail.
> From some aspects, this error is expected. Because with this patch,
> pirq-s are expected to be mapped by qemu or dom0 kernel (for pv case) before
> deassignment and mapping/binding pirq after deassignment should fail.
> 
> So what's your opinion on handling such error? We should figure out another
> method to fix msi-x issue to avoid such error or suppress these errors in
> qemu and linux kernel?

The "forcing unbind" ones are probably fine to leave alone, but
the errors would better be avoided in Xen (i.e. without a need
to also change qemu and/or Linux). Since you don't really say
when / why these errors now surface, it's hard to suggest what
might be best to do.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-22  5:50   ` Chao Gao
  2019-01-22  8:24     ` Jan Beulich
@ 2019-01-22  9:18     ` Roger Pau Monné
  2019-01-22 16:11       ` Chao Gao
  1 sibling, 1 reply; 16+ messages in thread
From: Roger Pau Monné @ 2019-01-22  9:18 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Jan Beulich

On Tue, Jan 22, 2019 at 01:50:20PM +0800, Chao Gao wrote:
> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
> >On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
> >> +        }
> >> +    }
> >> +    /*
> >> +     * All pirq-s should have been unmapped and corresponding msi_desc
> >> +     * entries should have been removed in the above loop.
> >> +     */
> >> +    ASSERT(list_empty(&pdev->msi_list));
> >> +
> >> +    spin_unlock(&d->event_lock);
> >> +}
> >> +
> >>  /* caller should hold the pcidevs_lock */
> >>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
> >>  {
> >> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
> >>      if ( !pdev )
> >>          return -ENODEV;
> >>  
> >> +    pci_unmap_msi(pdev);
> >
> >Just want to make sure, since deassign_device will be called for both
> >PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
> >device is assigned to a PV guest, but would like your confirmation.
> 
> Tested with a PV guest loaded by Pygrub. PV guest doesn't suffer the
> msi-x issue I want to fix.
> 
> With these three patches applied, I got some error messages from Xen
> and Dom0 as follow:
> 
> (XEN) irq.c:2176: dom3: forcing unbind of pirq 332
> (XEN) irq.c:2176: dom3: forcing unbind of pirq 331
> (XEN) irq.c:2176: dom3: forcing unbind of pirq 328
> (XEN) irq.c:2148: dom3: pirq 359 not mapped
> [ 2887.067685] xen:events: unmap irq failed -22
> (XEN) irq.c:2148: dom3: pirq 358 not mapped
> [ 2887.075917] xen:events: unmap irq failed -22
> (XEN) irq.c:2148: dom3: pirq 357 not mapped
> 
> It seems, the cause of such error is that pirq-s are unmapped and forcibly
> unbound on deassignment; subsequent unmapping pirq issued by dom0 fail.
> From some aspects, this error is expected. Because with this patch,
> pirq-s are expected to be mapped by qemu or dom0 kernel (for pv case) before
> deassignment and mapping/binding pirq after deassignment should fail.

This is quite entangled because it involves Xen, libxl and pciback.

AFAICT libxl will already try to unmap the pirqs before deassigning
the device if the domain is PV, see do_pci_remove in libxl_pci.c and
the calls it makes to xc_physdev_unmap_pirq.

Which makes me wonder, have you tested if you see those messages about
pirq unmap failure without this patch applied?

> So what's your opinion on handling such error? We should figure out another
> method to fix msi-x issue to avoid such error or suppress these errors in
> qemu and linux kernel?

Regardless of the reply to the question above, I think
unmap_domain_pirq should return ESRCH if the pirq cannot be found,
like the patch below. That would turn the Linux kernel messages into
less scary info messages, like:

"domain %d does not have %d anymore"

Which seems more accurate.

Thanks, Roger.

---8<---
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 23b4f423e6..7e9c974ba1 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2144,9 +2144,9 @@ int unmap_domain_pirq(struct domain *d, int pirq)
     info = pirq_info(d, pirq);
     if ( !info || (irq = info->arch.irq) <= 0 )
     {
-        dprintk(XENLOG_G_ERR, "dom%d: pirq %d not mapped\n",
+        dprintk(XENLOG_G_INFO, "dom%d: pirq %d not mapped\n",
                 d->domain_id, pirq);
-        ret = -EINVAL;
+        ret = -ESRCH;
         goto done;
     }
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-22  8:24     ` Jan Beulich
@ 2019-01-22 16:08       ` Chao Gao
  2019-01-22 16:28         ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Chao Gao @ 2019-01-22 16:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Roger Pau Monne

On Tue, Jan 22, 2019 at 01:24:48AM -0700, Jan Beulich wrote:
>>>> On 22.01.19 at 06:50, <chao.gao@intel.com> wrote:
>> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>>>On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>>>> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>>>>      if ( !pdev )
>>>>          return -ENODEV;
>>>>  
>>>> +    pci_unmap_msi(pdev);
>>>
>>>Just want to make sure, since deassign_device will be called for both
>>>PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>>>device is assigned to a PV guest, but would like your confirmation.
>> 
>> Tested with a PV guest loaded by Pygrub. PV guest doesn't suffer the
>> msi-x issue I want to fix.
>> 
>> With these three patches applied, I got some error messages from Xen
>> and Dom0 as follow:
>> 
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 332
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 331
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 328
>> (XEN) irq.c:2148: dom3: pirq 359 not mapped
>> [ 2887.067685] xen:events: unmap irq failed -22
>> (XEN) irq.c:2148: dom3: pirq 358 not mapped
>> [ 2887.075917] xen:events: unmap irq failed -22
>> (XEN) irq.c:2148: dom3: pirq 357 not mapped
>> 
>> It seems, the cause of such error is that pirq-s are unmapped and forcibly
>> unbound on deassignment; subsequent unmapping pirq issued by dom0 fail.
>> From some aspects, this error is expected. Because with this patch,
>> pirq-s are expected to be mapped by qemu or dom0 kernel (for pv case) before
>> deassignment and mapping/binding pirq after deassignment should fail.
>> 
>> So what's your opinion on handling such error? We should figure out another
>> method to fix msi-x issue to avoid such error or suppress these errors in
>> qemu and linux kernel?
>
>The "forcing unbind" ones are probably fine to leave alone, but
>the errors would better be avoided in Xen (i.e. without a need
>to also change qemu and/or Linux). Since you don't really say
>when / why these errors now surface, it's hard to suggest what
>might be best to do.

With these patches applied, these errors surface in three cases:
1. destroy the PV guest with assigned devices by "xl destroy"
2. hot-unplug a assigned device from the PV guest
3. shut down the PV guest by executing "init 0" in guest (only for some
devices whose driver doesn't clean up MSI-x when shutdown)

The reason is:
when detaching a device from a domain, Toolstack always calls
xc_deassign_device() prior to libxl__device_pci_remove_xenstore().
The latter notifies xen_pciback to clean up the pci devices. I guess
unbinding and unmapping pirq are steps of the cleanup (just like
qemu's role in device deassignment for HVM guest). But in this patch,
pirqs are forcibly unmapped when calling xc_deassign_device(). Thus when
xen_pciback tries to unmap pirqs as usual, xen reports this pirq isn't
mapped and propagates this error to xen_pciback.

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-22  9:18     ` Roger Pau Monné
@ 2019-01-22 16:11       ` Chao Gao
  0 siblings, 0 replies; 16+ messages in thread
From: Chao Gao @ 2019-01-22 16:11 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich

On Tue, Jan 22, 2019 at 10:18:55AM +0100, Roger Pau Monné wrote:
>On Tue, Jan 22, 2019 at 01:50:20PM +0800, Chao Gao wrote:
>> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>> >On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>> >> +        }
>> >> +    }
>> >> +    /*
>> >> +     * All pirq-s should have been unmapped and corresponding msi_desc
>> >> +     * entries should have been removed in the above loop.
>> >> +     */
>> >> +    ASSERT(list_empty(&pdev->msi_list));
>> >> +
>> >> +    spin_unlock(&d->event_lock);
>> >> +}
>> >> +
>> >>  /* caller should hold the pcidevs_lock */
>> >>  int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>> >>  {
>> >> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>> >>      if ( !pdev )
>> >>          return -ENODEV;
>> >>  
>> >> +    pci_unmap_msi(pdev);
>> >
>> >Just want to make sure, since deassign_device will be called for both
>> >PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>> >device is assigned to a PV guest, but would like your confirmation.
>> 
>> Tested with a PV guest loaded by Pygrub. PV guest doesn't suffer the
>> msi-x issue I want to fix.
>> 
>> With these three patches applied, I got some error messages from Xen
>> and Dom0 as follow:
>> 
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 332
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 331
>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 328
>> (XEN) irq.c:2148: dom3: pirq 359 not mapped
>> [ 2887.067685] xen:events: unmap irq failed -22
>> (XEN) irq.c:2148: dom3: pirq 358 not mapped
>> [ 2887.075917] xen:events: unmap irq failed -22
>> (XEN) irq.c:2148: dom3: pirq 357 not mapped
>> 
>> It seems, the cause of such error is that pirq-s are unmapped and forcibly
>> unbound on deassignment; subsequent unmapping pirq issued by dom0 fail.
>> From some aspects, this error is expected. Because with this patch,
>> pirq-s are expected to be mapped by qemu or dom0 kernel (for pv case) before
>> deassignment and mapping/binding pirq after deassignment should fail.
>
>This is quite entangled because it involves Xen, libxl and pciback.
>
>AFAICT libxl will already try to unmap the pirqs before deassigning
>the device if the domain is PV, see do_pci_remove in libxl_pci.c and
>the calls it makes to xc_physdev_unmap_pirq.

It seems it only unmaps the pirq bound to INTx.

>
>Which makes me wonder, have you tested if you see those messages about
>pirq unmap failure without this patch applied?

No such error without my patch.

>
>> So what's your opinion on handling such error? We should figure out another
>> method to fix msi-x issue to avoid such error or suppress these errors in
>> qemu and linux kernel?
>
>Regardless of the reply to the question above, I think
>unmap_domain_pirq should return ESRCH if the pirq cannot be found,
>like the patch below. That would turn the Linux kernel messages into
>less scary info messages, like:
>
>"domain %d does not have %d anymore"
>
>Which seems more accurate.

I agree with you.

Thanks
Chao


>Thanks, Roger.
>
>---8<---
>diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>index 23b4f423e6..7e9c974ba1 100644
>--- a/xen/arch/x86/irq.c
>+++ b/xen/arch/x86/irq.c
>@@ -2144,9 +2144,9 @@ int unmap_domain_pirq(struct domain *d, int pirq)
>     info = pirq_info(d, pirq);
>     if ( !info || (irq = info->arch.irq) <= 0 )
>     {
>-        dprintk(XENLOG_G_ERR, "dom%d: pirq %d not mapped\n",
>+        dprintk(XENLOG_G_INFO, "dom%d: pirq %d not mapped\n",
>                 d->domain_id, pirq);
>-        ret = -EINVAL;
>+        ret = -ESRCH;
>         goto done;
>     }
> 
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-22 16:08       ` Chao Gao
@ 2019-01-22 16:28         ` Jan Beulich
  2019-01-25 11:10           ` Chao Gao
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2019-01-22 16:28 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Roger Pau Monne

>>> On 22.01.19 at 17:08, <chao.gao@intel.com> wrote:
> On Tue, Jan 22, 2019 at 01:24:48AM -0700, Jan Beulich wrote:
>>>>> On 22.01.19 at 06:50, <chao.gao@intel.com> wrote:
>>> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>>>>On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>>>>> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, 
> u8 devfn)
>>>>>      if ( !pdev )
>>>>>          return -ENODEV;
>>>>>  
>>>>> +    pci_unmap_msi(pdev);
>>>>
>>>>Just want to make sure, since deassign_device will be called for both
>>>>PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>>>>device is assigned to a PV guest, but would like your confirmation.
>>> 
>>> Tested with a PV guest loaded by Pygrub. PV guest doesn't suffer the
>>> msi-x issue I want to fix.
>>> 
>>> With these three patches applied, I got some error messages from Xen
>>> and Dom0 as follow:
>>> 
>>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 332
>>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 331
>>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 328
>>> (XEN) irq.c:2148: dom3: pirq 359 not mapped
>>> [ 2887.067685] xen:events: unmap irq failed -22
>>> (XEN) irq.c:2148: dom3: pirq 358 not mapped
>>> [ 2887.075917] xen:events: unmap irq failed -22
>>> (XEN) irq.c:2148: dom3: pirq 357 not mapped
>>> 
>>> It seems, the cause of such error is that pirq-s are unmapped and forcibly
>>> unbound on deassignment; subsequent unmapping pirq issued by dom0 fail.
>>> From some aspects, this error is expected. Because with this patch,
>>> pirq-s are expected to be mapped by qemu or dom0 kernel (for pv case) before
>>> deassignment and mapping/binding pirq after deassignment should fail.
>>> 
>>> So what's your opinion on handling such error? We should figure out another
>>> method to fix msi-x issue to avoid such error or suppress these errors in
>>> qemu and linux kernel?
>>
>>The "forcing unbind" ones are probably fine to leave alone, but
>>the errors would better be avoided in Xen (i.e. without a need
>>to also change qemu and/or Linux). Since you don't really say
>>when / why these errors now surface, it's hard to suggest what
>>might be best to do.
> 
> With these patches applied, these errors surface in three cases:
> 1. destroy the PV guest with assigned devices by "xl destroy"
> 2. hot-unplug a assigned device from the PV guest
> 3. shut down the PV guest by executing "init 0" in guest (only for some
> devices whose driver doesn't clean up MSI-x when shutdown)
> 
> The reason is:
> when detaching a device from a domain, Toolstack always calls
> xc_deassign_device() prior to libxl__device_pci_remove_xenstore().
> The latter notifies xen_pciback to clean up the pci devices. I guess
> unbinding and unmapping pirq are steps of the cleanup (just like
> qemu's role in device deassignment for HVM guest). But in this patch,
> pirqs are forcibly unmapped when calling xc_deassign_device(). Thus when
> xen_pciback tries to unmap pirqs as usual, xen reports this pirq isn't
> mapped and propagates this error to xen_pciback.

Why are you talking about pciback here? I don't think it plays any
role in IRQ unmapping. If what the tool stack does if going to be
fully taen care of by the hypervisor, I think that tool stack code
could the be deleted (unless compatibility requirements don't
allow doing so, in which case it could be conditionally bypassed).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-22 16:28         ` Jan Beulich
@ 2019-01-25 11:10           ` Chao Gao
  2019-01-25 15:57             ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Chao Gao @ 2019-01-25 11:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Roger Pau Monne

On Tue, Jan 22, 2019 at 09:28:16AM -0700, Jan Beulich wrote:
>>>> On 22.01.19 at 17:08, <chao.gao@intel.com> wrote:
>> On Tue, Jan 22, 2019 at 01:24:48AM -0700, Jan Beulich wrote:
>>>>>> On 22.01.19 at 06:50, <chao.gao@intel.com> wrote:
>>>> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>>>>>On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>>>>>> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, 
>> u8 devfn)
>>>>>>      if ( !pdev )
>>>>>>          return -ENODEV;
>>>>>>  
>>>>>> +    pci_unmap_msi(pdev);
>>>>>
>>>>>Just want to make sure, since deassign_device will be called for both
>>>>>PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>>>>>device is assigned to a PV guest, but would like your confirmation.
>>>> 
>>>> Tested with a PV guest loaded by Pygrub. PV guest doesn't suffer the
>>>> msi-x issue I want to fix.
>>>> 
>>>> With these three patches applied, I got some error messages from Xen
>>>> and Dom0 as follow:
>>>> 
>>>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 332
>>>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 331
>>>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 328
>>>> (XEN) irq.c:2148: dom3: pirq 359 not mapped
>>>> [ 2887.067685] xen:events: unmap irq failed -22
>>>> (XEN) irq.c:2148: dom3: pirq 358 not mapped
>>>> [ 2887.075917] xen:events: unmap irq failed -22
>>>> (XEN) irq.c:2148: dom3: pirq 357 not mapped
>>>> 
>>>> It seems, the cause of such error is that pirq-s are unmapped and forcibly
>>>> unbound on deassignment; subsequent unmapping pirq issued by dom0 fail.
>>>> From some aspects, this error is expected. Because with this patch,
>>>> pirq-s are expected to be mapped by qemu or dom0 kernel (for pv case) before
>>>> deassignment and mapping/binding pirq after deassignment should fail.
>>>> 
>>>> So what's your opinion on handling such error? We should figure out another
>>>> method to fix msi-x issue to avoid such error or suppress these errors in
>>>> qemu and linux kernel?
>>>
>>>The "forcing unbind" ones are probably fine to leave alone, but
>>>the errors would better be avoided in Xen (i.e. without a need
>>>to also change qemu and/or Linux). Since you don't really say
>>>when / why these errors now surface, it's hard to suggest what
>>>might be best to do.
>> 
>> With these patches applied, these errors surface in three cases:
>> 1. destroy the PV guest with assigned devices by "xl destroy"
>> 2. hot-unplug a assigned device from the PV guest
>> 3. shut down the PV guest by executing "init 0" in guest (only for some
>> devices whose driver doesn't clean up MSI-x when shutdown)
>> 
>> The reason is:
>> when detaching a device from a domain, Toolstack always calls
>> xc_deassign_device() prior to libxl__device_pci_remove_xenstore().
>> The latter notifies xen_pciback to clean up the pci devices. I guess
>> unbinding and unmapping pirq are steps of the cleanup (just like
>> qemu's role in device deassignment for HVM guest). But in this patch,
>> pirqs are forcibly unmapped when calling xc_deassign_device(). Thus when
>> xen_pciback tries to unmap pirqs as usual, xen reports this pirq isn't
>> mapped and propagates this error to xen_pciback.
>
>Why are you talking about pciback here? I don't think it plays any
>role in IRQ unmapping.

Hi Jan,

In my understanding, for PV guest, it is the pciback that unmaps pirqs
of a device.

[xl] libxl__device_pci_remove_xenstore() ->
[Linux] pcistub_put_pci_dev() -> xen_pcibk_reset_device() ->
pci_disable_msix() -> free_msi_irqs -> ...  -> xen_teardown_msi_irq() ->
xen_destroy_irq() -> PHYSDEVOP_unmap_pirq.  

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-25 11:10           ` Chao Gao
@ 2019-01-25 15:57             ` Jan Beulich
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2019-01-25 15:57 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Roger Pau Monne

>>> On 25.01.19 at 12:10, <chao.gao@intel.com> wrote:
> On Tue, Jan 22, 2019 at 09:28:16AM -0700, Jan Beulich wrote:
>>>>> On 22.01.19 at 17:08, <chao.gao@intel.com> wrote:
>>> On Tue, Jan 22, 2019 at 01:24:48AM -0700, Jan Beulich wrote:
>>>>>>> On 22.01.19 at 06:50, <chao.gao@intel.com> wrote:
>>>>> On Wed, Jan 16, 2019 at 11:38:23AM +0100, Roger Pau Monné wrote:
>>>>>>On Wed, Jan 16, 2019 at 04:17:30PM +0800, Chao Gao wrote:
>>>>>>> @@ -1529,6 +1591,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, 
>>> u8 devfn)
>>>>>>>      if ( !pdev )
>>>>>>>          return -ENODEV;
>>>>>>>  
>>>>>>> +    pci_unmap_msi(pdev);
>>>>>>
>>>>>>Just want to make sure, since deassign_device will be called for both
>>>>>>PV and HVM domains. AFAICT pci_unmap_msi is safe to call when the
>>>>>>device is assigned to a PV guest, but would like your confirmation.
>>>>> 
>>>>> Tested with a PV guest loaded by Pygrub. PV guest doesn't suffer the
>>>>> msi-x issue I want to fix.
>>>>> 
>>>>> With these three patches applied, I got some error messages from Xen
>>>>> and Dom0 as follow:
>>>>> 
>>>>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 332
>>>>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 331
>>>>> (XEN) irq.c:2176: dom3: forcing unbind of pirq 328
>>>>> (XEN) irq.c:2148: dom3: pirq 359 not mapped
>>>>> [ 2887.067685] xen:events: unmap irq failed -22
>>>>> (XEN) irq.c:2148: dom3: pirq 358 not mapped
>>>>> [ 2887.075917] xen:events: unmap irq failed -22
>>>>> (XEN) irq.c:2148: dom3: pirq 357 not mapped
>>>>> 
>>>>> It seems, the cause of such error is that pirq-s are unmapped and forcibly
>>>>> unbound on deassignment; subsequent unmapping pirq issued by dom0 fail.
>>>>> From some aspects, this error is expected. Because with this patch,
>>>>> pirq-s are expected to be mapped by qemu or dom0 kernel (for pv case) before
>>>>> deassignment and mapping/binding pirq after deassignment should fail.
>>>>> 
>>>>> So what's your opinion on handling such error? We should figure out another
>>>>> method to fix msi-x issue to avoid such error or suppress these errors in
>>>>> qemu and linux kernel?
>>>>
>>>>The "forcing unbind" ones are probably fine to leave alone, but
>>>>the errors would better be avoided in Xen (i.e. without a need
>>>>to also change qemu and/or Linux). Since you don't really say
>>>>when / why these errors now surface, it's hard to suggest what
>>>>might be best to do.
>>> 
>>> With these patches applied, these errors surface in three cases:
>>> 1. destroy the PV guest with assigned devices by "xl destroy"
>>> 2. hot-unplug a assigned device from the PV guest
>>> 3. shut down the PV guest by executing "init 0" in guest (only for some
>>> devices whose driver doesn't clean up MSI-x when shutdown)
>>> 
>>> The reason is:
>>> when detaching a device from a domain, Toolstack always calls
>>> xc_deassign_device() prior to libxl__device_pci_remove_xenstore().
>>> The latter notifies xen_pciback to clean up the pci devices. I guess
>>> unbinding and unmapping pirq are steps of the cleanup (just like
>>> qemu's role in device deassignment for HVM guest). But in this patch,
>>> pirqs are forcibly unmapped when calling xc_deassign_device(). Thus when
>>> xen_pciback tries to unmap pirqs as usual, xen reports this pirq isn't
>>> mapped and propagates this error to xen_pciback.
>>
>>Why are you talking about pciback here? I don't think it plays any
>>role in IRQ unmapping.
> 
> Hi Jan,
> 
> In my understanding, for PV guest, it is the pciback that unmaps pirqs
> of a device.
> 
> [xl] libxl__device_pci_remove_xenstore() ->
> [Linux] pcistub_put_pci_dev() -> xen_pcibk_reset_device() ->
> pci_disable_msix() -> free_msi_irqs -> ...  -> xen_teardown_msi_irq() ->
> xen_destroy_irq() -> PHYSDEVOP_unmap_pirq.  

Oh, I see - you're right. Way too convoluted for my taste.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-01-25 15:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-16  8:17 [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
2019-01-16  8:17 ` [PATCH v5 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
2019-01-16  8:17 ` [PATCH v5 3/3] xen/pt: initialize 'warned' field of arch_msix properly Chao Gao
2019-01-16 10:38 ` [PATCH v5 1/3] xen/pt: fix some pass-thru devices don't work across reboot Roger Pau Monné
2019-01-16 11:59   ` Chao Gao
2019-01-16 12:34     ` Roger Pau Monné
2019-01-16 14:52       ` Chao Gao
2019-01-16 15:29         ` Roger Pau Monné
2019-01-22  5:50   ` Chao Gao
2019-01-22  8:24     ` Jan Beulich
2019-01-22 16:08       ` Chao Gao
2019-01-22 16:28         ` Jan Beulich
2019-01-25 11:10           ` Chao Gao
2019-01-25 15:57             ` Jan Beulich
2019-01-22  9:18     ` Roger Pau Monné
2019-01-22 16:11       ` Chao Gao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.