xen-devel.lists.xenproject.org archive mirror
* [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot
@ 2018-12-18 14:43 Chao Gao
  2018-12-18 14:43 ` [PATCH v3 2/2] libxl: don't reset device when it is accessible by the guest Chao Gao
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Chao Gao @ 2018-12-18 14:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monné, Jan Beulich, Chao Gao

I find that some pass-thru devices no longer work after a guest
reboot. Assigning such a device to another domain hits the same issue,
and the only way to make it work again is to unbind it from pciback and
bind it again. Someone reported this issue one year ago [1].

If the device's driver doesn't disable MSI-X during shutdown, or qemu is
killed/crashes before the domain shuts down, this domain's pirqs won't be
unmapped. Xen then takes over this work, unmapping all pirqs when
destroying the guest. But as pciback has already disabled memory decoding
before Xen unmaps the pirqs, Xen has to set the host_maskall flag and the
maskall bit to mask an MSI rather than set the mask bit in the MSI-X
table. The call trace of this process is:

->arch_domain_destroy
    ->free_domain_pirqs
        ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
            ->pirq_guest_force_unbind
                ->__pirq_guest_unbind
                    ->mask_msi_irq(=desc->handler->disable())
                        ->the warning in msi_set_mask_bit()

The host_maskall bit will prevent guests from clearing the maskall bit,
even if the device is later assigned to another guest. Guests then cannot
receive MSIs from this device.
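
The effect of the host-owned flag can be illustrated with a small,
self-contained C model. This is only a sketch: struct msix_model and the
helper names are hypothetical and are not Xen's actual code. It assumes the
maskall bit seen by the hardware is set whenever either the host-owned or
the guest-owned flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a device's MSI-X maskall state. */
struct msix_model {
    bool host_maskall;   /* set by the hypervisor; guests cannot clear it */
    bool guest_maskall;  /* last value the guest wrote to the maskall bit */
};

/* Assumed rule: the effective maskall bit is set if either side asks for it. */
static bool effective_maskall(const struct msix_model *m)
{
    return m->host_maskall || m->guest_maskall;
}

/* Forced unbind while memory decoding is off: fall back to maskall. */
static void host_force_mask(struct msix_model *m)
{
    m->host_maskall = true;
}

/* A guest write only changes the guest-owned flag. */
static void guest_write_maskall(struct msix_model *m, bool val)
{
    m->guest_maskall = val;
}
```

In this model, once host_force_mask() has run, no sequence of
guest_write_maskall() calls makes effective_maskall() return false, which
matches the behaviour described above.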

To fix this issue, pirqs are unmapped before memory decoding is disabled by
pciback. Specifically, when a device is detached from a guest, all established
mappings between pirqs and MSIs are destroyed before the ownership changes.

[1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
With this patch applied, qemu reports the errors below:
    [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 302, gvec: 0xd5)
    [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 301, gvec: 0xe5)
    [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 359, gvec: 0x41)
    [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 358, gvec: 0x51)

Despite the errors, guest shutdown and device hotplug finish smoothly.
It seems to me that qemu tries to unbind an MSI which has already been
unbound by the code added by this patch. I am not sure whether it is
acceptable to leave these errors there.
---
 xen/drivers/passthrough/io.c  | 57 +++++++++++++++++++++++++++++--------------
 xen/drivers/passthrough/pci.c | 49 +++++++++++++++++++++++++++++++++++++
 xen/include/xen/iommu.h       |  1 +
 3 files changed, 89 insertions(+), 18 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index a6eb8a4..56ee1ef 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -619,6 +619,42 @@ int pt_irq_create_bind(
     return 0;
 }
 
+static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
+         list_empty(&pirq_dpci->digl_list) )
+    {
+        pirq_guest_unbind(d, pirq);
+        msixtbl_pt_unregister(d, pirq);
+        if ( pt_irq_need_timer(pirq_dpci->flags) )
+            kill_timer(&pirq_dpci->timer);
+        pirq_dpci->flags = 0;
+        /*
+         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
+         * call to pt_pirq_softirq_reset.
+         */
+        pt_pirq_softirq_reset(pirq_dpci);
+
+        pirq_cleanup_check(pirq, d);
+    }
+}
+
+void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( pirq_dpci && pirq_dpci->gmsi.posted )
+        pi_update_irte(NULL, pirq, 0);
+
+    pt_irq_destroy_bind_common(d, pirq);
+}
+
 int pt_irq_destroy_bind(
     struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
 {
@@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
         }
         else
             what = "bogus";
-    }
-    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
-        pi_update_irte(NULL, pirq, 0);
-
-    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
-         list_empty(&pirq_dpci->digl_list) )
-    {
-        pirq_guest_unbind(d, pirq);
-        msixtbl_pt_unregister(d, pirq);
-        if ( pt_irq_need_timer(pirq_dpci->flags) )
-            kill_timer(&pirq_dpci->timer);
-        pirq_dpci->flags = 0;
-        /*
-         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
-         * call to pt_pirq_softirq_reset.
-         */
-        pt_pirq_softirq_reset(pirq_dpci);
 
-        pirq_cleanup_check(pirq, d);
+        pt_irq_destroy_bind_common(d, pirq);
     }
+    else
+        pt_irq_destroy_bind_msi(d, pirq);
 
     spin_unlock(&d->event_lock);
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 1277ce2..88a8007 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
             return NULL;
         }
         spin_lock_init(&msix->table_lock);
+        msix->warned = DOMID_INVALID;
         pdev->msix = msix;
     }
 
@@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     return rc;
 }
 
+/*
+ * Unmap established mappings between domain's pirq and device's MSI.
+ * These mappings were set up by qemu/guest and are expected to be
+ * destroyed when changing the device's ownership.
+ */
+static void pci_unmap_msi(struct pci_dev *pdev)
+{
+    struct msi_desc *entry, *tmp;
+
+    ASSERT(pcidevs_locked());
+
+    if ( !pdev->domain )
+        return;
+
+    spin_lock(&pdev->domain->event_lock);
+    list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )
+    {
+        struct pirq *info;
+        struct hvm_pirq_dpci *pirq_dpci;
+        int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), pirq_orig;
+
+        pirq_orig = pirq;
+
+        if ( !pirq )
+            continue;
+
+        /* For forcibly unmapped pirq, lookup radix tree with absolute value */
+        if ( pirq < 0)
+            pirq = -pirq;
+
+        info = pirq_info(pdev->domain, pirq);
+        if ( !info )
+            continue;
+        pirq_dpci = pirq_dpci(info);
+
+        if ( pirq_dpci &&
+             (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
+             (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
+            pt_irq_destroy_bind_msi(pdev->domain, info);
+
+        if ( pirq_orig > 0 )
+            unmap_domain_pirq(pdev->domain, pirq_orig);
+    }
+    spin_unlock(&pdev->domain->event_lock);
+}
+
 /* caller should hold the pcidevs_lock */
 int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 {
@@ -1529,6 +1576,8 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
     if ( !pdev )
         return -ENODEV;
 
+    pci_unmap_msi(pdev);
+
     while ( pdev->phantom_stride )
     {
         devfn += pdev->phantom_stride;
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 3d78126..8aecf43 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -108,6 +108,7 @@ struct pirq;
 int hvm_do_IRQ_dpci(struct domain *, struct pirq *);
 int pt_irq_create_bind(struct domain *, const struct xen_domctl_bind_pt_irq *);
 int pt_irq_destroy_bind(struct domain *, const struct xen_domctl_bind_pt_irq *);
+void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq);
 
 void hvm_dpci_isairq_eoi(struct domain *d, unsigned int isairq);
 struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [PATCH v3 2/2] libxl: don't reset device when it is accessible by the guest
  2018-12-18 14:43 [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
@ 2018-12-18 14:43 ` Chao Gao
  2018-12-19  9:00   ` Roger Pau Monné
  2019-01-02 11:49   ` Wei Liu
  2018-12-18 15:53 ` [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot Jan Beulich
  2018-12-19  8:57 ` Roger Pau Monné
  2 siblings, 2 replies; 11+ messages in thread
From: Chao Gao @ 2018-12-18 14:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich, Chao Gao,
	Roger Pau Monné

When I destroyed a guest with 'xl destroy', I found the warning
in msi_set_mask_bit() in Xen was triggered. After adding "WARN_ON(1)"
to that place, I got the call trace below:

(XEN) Xen call trace:
(XEN)    [<ffff82d080281a6a>] msi.c#msi_set_mask_bit+0x1da/0x29b
(XEN)    [<ffff82d080282e78>] guest_mask_msi_irq+0x1c/0x1e
(XEN)    [<ffff82d08030ceb9>] vmsi.c#msixtbl_write+0x173/0x1d4
(XEN)    [<ffff82d08030cf30>] vmsi.c#_msixtbl_write+0x16/0x18
(XEN)    [<ffff82d0802ffac4>] hvm_process_io_intercept+0x216/0x270
(XEN)    [<ffff82d0802ffb45>] hvm_io_intercept+0x27/0x4c
(XEN)    [<ffff82d0802f0e86>] emulate.c#hvmemul_do_io+0x273/0x454
(XEN)    [<ffff82d0802f10a4>] emulate.c#hvmemul_do_io_buffer+0x3d/0x70
(XEN)    [<ffff82d0802f2343>] emulate.c#hvmemul_linear_mmio_access+0x35e/0x436
(XEN)    [<ffff82d0802f2640>] emulate.c#linear_write+0xdd/0x13b
(XEN)    [<ffff82d0802f3f25>] emulate.c#hvmemul_write+0xbd/0xf1
(XEN)    [<ffff82d0802d51df>] x86_emulate+0x2249d/0x23c5c
(XEN)    [<ffff82d0802d861f>] x86_emulate_wrapper+0x2b/0x5f
(XEN)    [<ffff82d0802f28aa>] emulate.c#_hvm_emulate_one+0x54/0x1b2
(XEN)    [<ffff82d0802f2a18>] hvm_emulate_one+0x10/0x12
(XEN)    [<ffff82d080300227>] hvm_emulate_one_insn+0x42/0x14a
(XEN)    [<ffff82d08030037e>] handle_mmio_with_translation+0x4f/0x51
(XEN)    [<ffff82d0802f803b>] hvm_hap_nested_page_fault+0x16c/0x6d8
(XEN)    [<ffff82d08032446a>] vmx_vmexit_handler+0x19b0/0x1f2e
(XEN)    [<ffff82d08032995a>] vmx_asm_vmexit_handler+0xfa/0x270

It seems to me that the guest is trying to mask an MSI while the memory
decoding of the device is disabled. Resetting a device without first
preventing the guest from performing MSI-X operations leads to this issue.

The fix is simple: detach the PCI device before resetting it.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 tools/libxl/libxl_pci.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 87afa03..855fb71 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1459,17 +1459,17 @@ skip1:
         fclose(f);
     }
 out:
-    /* don't do multiple resets while some functions are still passed through */
-    if ( (pcidev->vdevfn & 0x7) == 0 ) {
-        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
-    }
-
     if (!isstubdom) {
         rc = xc_deassign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
         if (rc < 0 && (hvm || errno != ENOSYS))
             LOGED(ERROR, domainid, "xc_deassign_device failed");
     }
 
+    /* don't do multiple resets while some functions are still passed through */
+    if ( (pcidev->vdevfn & 0x7) == 0 ) {
+        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+    }
+
     stubdomid = libxl_get_stubdom_id(ctx, domid);
     if (stubdomid != 0) {
         libxl_device_pci pcidev_s = *pcidev;
-- 
1.8.3.1



* Re: [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-18 14:43 [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
  2018-12-18 14:43 ` [PATCH v3 2/2] libxl: don't reset device when it is accessible by the guest Chao Gao
@ 2018-12-18 15:53 ` Jan Beulich
  2018-12-19  4:56   ` Chao Gao
  2018-12-19  8:57 ` Roger Pau Monné
  2 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2018-12-18 15:53 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Roger Pau Monne

>>> On 18.12.18 at 15:43, <chao.gao@intel.com> wrote:
> I find that some pass-thru devices no longer work after a guest
> reboot. Assigning such a device to another domain hits the same issue,
> and the only way to make it work again is to unbind it from pciback and
> bind it again. Someone reported this issue one year ago [1].
> 
> If the device's driver doesn't disable MSI-X during shutdown, or qemu is
> killed/crashes before the domain shuts down, this domain's pirqs won't be
> unmapped. Xen then takes over this work, unmapping all pirqs when
> destroying the guest. But as pciback has already disabled memory decoding
> before Xen unmaps the pirqs, Xen has to set the host_maskall flag and the
> maskall bit to mask an MSI rather than set the mask bit in the MSI-X
> table. The call trace of this process is:
> 
> ->arch_domain_destroy
>     ->free_domain_pirqs
>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>             ->pirq_guest_force_unbind
>                 ->__pirq_guest_unbind
>                     ->mask_msi_irq(=desc->handler->disable())
>                         ->the warning in msi_set_mask_bit()
> 
> The host_maskall bit will prevent guests from clearing the maskall bit,
> even if the device is later assigned to another guest. Guests then cannot
> receive MSIs from this device.
> 
> To fix this issue, pirqs are unmapped before memory decoding is disabled by
> pciback. Specifically, when a device is detached from a guest, all established
> mappings between pirqs and MSIs are destroyed before the ownership changes.
> 
> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html 
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> With this patch applied, qemu reports the errors below:
>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 302, gvec: 0xd5)
>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 301, gvec: 0xe5)
>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 359, gvec: 0x41)
>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 358, gvec: 0x51)
> 
> Despite the errors, guest shutdown and device hotplug finish smoothly.
> It seems to me that qemu tries to unbind an MSI which has already been
> unbound by the code added by this patch. I am not sure whether it is
> acceptable to leave these errors there.

Well, the errors mean that qemu is playing with a device that's no
longer owned by the guest controlled by this qemu instance. At
least with a de-privileged qemu (no idea whether this actually works
with pass-through) that's still a mistake, and hence would need
fixing. Whichever entity it is that invokes the de-assign of the
device, other involved parties should be informed so that they can
keep their hands off the device from that point onwards.

The hypervisor change itself looks mostly fine, just a few minor
comments.

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>              return NULL;
>          }
>          spin_lock_init(&msix->table_lock);
> +        msix->warned = DOMID_INVALID;

This is an arch-specific field right now; in fact the entire structure
is arch-specific. Playing with any of its fields in common code is
undesirable, but I guess the use of ->table_lock can be taken as
an excuse until this code wants to eventually be used by Arm.
(The structure requiring a lock is sufficiently generic, whereas
the "warned" field may not be universally needed.)

> @@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>      return rc;
>  }
>  
> +/*
> + * Unmap established mappings between domain's pirq and device's MSI.
> + * These mappings were set up by qemu/guest and are expected to be
> + * destroyed when changing the device's ownership.
> + */
> +static void pci_unmap_msi(struct pci_dev *pdev)
> +{
> +    struct msi_desc *entry, *tmp;
> +
> +    ASSERT(pcidevs_locked());
> +
> +    if ( !pdev->domain )

There are quite a few uses of pdev->domain - please consider
using a local variable.

> +        return;
> +
> +    spin_lock(&pdev->domain->event_lock);
> +    list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )
> +    {
> +        struct pirq *info;
> +        struct hvm_pirq_dpci *pirq_dpci;
> +        int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), pirq_orig;
> +
> +        pirq_orig = pirq;
> +
> +        if ( !pirq )
> +            continue;
> +
> +        /* For forcibly unmapped pirq, lookup radix tree with absolute value */
> +        if ( pirq < 0)
> +            pirq = -pirq;
> +
> +        info = pirq_info(pdev->domain, pirq);

Why not simply

        info = pirq_info(pdev->domain, ABS(pirq));

without any pirq_orig?
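
A minimal sketch of what this suggestion amounts to, assuming the signed
pirq value is kept as-is and only the lookup uses its absolute value. The
ABS() definition below is a local stand-in so the example compiles on its
own, and both helper names are hypothetical:

```c
#include <assert.h>

/* Local stand-in for the ABS() macro so the sketch is self-contained. */
#define ABS(x) ((x) < 0 ? -(x) : (x))

/* The lookup index is always the absolute value of the pirq. */
static int pirq_lookup_index(int pirq)
{
    return ABS(pirq);
}

/*
 * Only a positive (not yet forcibly unmapped) pirq still needs an explicit
 * unmap; the sign is read from the unmodified value, so no second variable
 * is required.
 */
static int pirq_needs_unmap(int pirq)
{
    return pirq > 0;
}
```

With this shape the loop body can test pirq > 0 directly before calling
unmap_domain_pirq(), since pirq itself is never overwritten.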

Jan



* Re: [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-18 15:53 ` [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot Jan Beulich
@ 2018-12-19  4:56   ` Chao Gao
  0 siblings, 0 replies; 11+ messages in thread
From: Chao Gao @ 2018-12-19  4:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Roger Pau Monne

On Tue, Dec 18, 2018 at 08:53:46AM -0700, Jan Beulich wrote:
>>>> On 18.12.18 at 15:43, <chao.gao@intel.com> wrote:
>> I find that some pass-thru devices no longer work after a guest
>> reboot. Assigning such a device to another domain hits the same issue,
>> and the only way to make it work again is to unbind it from pciback and
>> bind it again. Someone reported this issue one year ago [1].
>> 
>> If the device's driver doesn't disable MSI-X during shutdown, or qemu is
>> killed/crashes before the domain shuts down, this domain's pirqs won't be
>> unmapped. Xen then takes over this work, unmapping all pirqs when
>> destroying the guest. But as pciback has already disabled memory decoding
>> before Xen unmaps the pirqs, Xen has to set the host_maskall flag and the
>> maskall bit to mask an MSI rather than set the mask bit in the MSI-X
>> table. The call trace of this process is:
>> 
>> ->arch_domain_destroy
>>     ->free_domain_pirqs
>>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>>             ->pirq_guest_force_unbind
>>                 ->__pirq_guest_unbind
>>                     ->mask_msi_irq(=desc->handler->disable())
>>                         ->the warning in msi_set_mask_bit()
>> 
>> The host_maskall bit will prevent guests from clearing the maskall bit,
>> even if the device is later assigned to another guest. Guests then cannot
>> receive MSIs from this device.
>> 
>> To fix this issue, pirqs are unmapped before memory decoding is disabled by
>> pciback. Specifically, when a device is detached from a guest, all established
>> mappings between pirqs and MSIs are destroyed before the ownership changes.
>> 
>> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html 
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> With this patch applied, qemu reports the errors below:
>>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 302, gvec: 0xd5)
>>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 301, gvec: 0xe5)
>>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 359, gvec: 0x41)
>>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 358, gvec: 0x51)
>> 
>> Despite the errors, guest shutdown and device hotplug finish smoothly.
>> It seems to me that qemu tries to unbind an MSI which has already been
>> unbound by the code added by this patch. I am not sure whether it is
>> acceptable to leave these errors there.
>
>Well, the errors mean that qemu is playing with a device that's no
>longer owned by the guest controlled by this qemu instance. At
>least with a de-privileged qemu (no idea whether this actually works
>with pass-through) that's still a mistake, and hence would need
>fixing. Whichever entity it is that invokes the de-assign of the
>device, other involved parties should be informed so that they can
>keep their hands off the device from that point onwards.
>
>The hypervisor change itself looks mostly fine, just a few minor
>comments.
>
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>>              return NULL;
>>          }
>>          spin_lock_init(&msix->table_lock);
>> +        msix->warned = DOMID_INVALID;
>
>This is an arch-specific field right now; in fact the entire structure
>is arch-specific. Playing with any of its fields in common code is
>undesirable, but I guess the use of ->table_lock can be taken as
>an excuse until this code wants to eventually be used by Arm.
>(The structure requiring a lock is sufficiently generic, whereas
>the "warned" field may not be universally needed.)

I will clean up this place.

>
>> @@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>      return rc;
>>  }
>>  
>> +/*
>> + * Unmap established mappings between domain's pirq and device's MSI.
>> + * These mappings were set up by qemu/guest and are expected to be
>> + * destroyed when changing the device's ownership.
>> + */
>> +static void pci_unmap_msi(struct pci_dev *pdev)
>> +{
>> +    struct msi_desc *entry, *tmp;
>> +
>> +    ASSERT(pcidevs_locked());
>> +
>> +    if ( !pdev->domain )
>
>There are quite a few uses of pdev->domain - please consider
>using a local variable.
>
>> +        return;
>> +
>> +    spin_lock(&pdev->domain->event_lock);
>> +    list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )
>> +    {
>> +        struct pirq *info;
>> +        struct hvm_pirq_dpci *pirq_dpci;
>> +        int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), pirq_orig;
>> +
>> +        pirq_orig = pirq;
>> +
>> +        if ( !pirq )
>> +            continue;
>> +
>> +        /* For forcibly unmapped pirq, lookup radix tree with absolute value */
>> +        if ( pirq < 0)
>> +            pirq = -pirq;
>> +
>> +        info = pirq_info(pdev->domain, pirq);
>
>Why not simply
>
>        info = pirq_info(pdev->domain, ABS(pirq));
>
>without any pirq_orig?

Will do.

Thanks
Chao


* Re: [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-18 14:43 [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
  2018-12-18 14:43 ` [PATCH v3 2/2] libxl: don't reset device when it is accessible by the guest Chao Gao
  2018-12-18 15:53 ` [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot Jan Beulich
@ 2018-12-19  8:57 ` Roger Pau Monné
  2018-12-20  2:46   ` Chao Gao
  2 siblings, 1 reply; 11+ messages in thread
From: Roger Pau Monné @ 2018-12-19  8:57 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Jan Beulich

On Tue, Dec 18, 2018 at 10:43:37PM +0800, Chao Gao wrote:
> I find that some pass-thru devices no longer work after a guest
> reboot. Assigning such a device to another domain hits the same issue,
> and the only way to make it work again is to unbind it from pciback and
> bind it again. Someone reported this issue one year ago [1].
> 
> If the device's driver doesn't disable MSI-X during shutdown, or qemu is
> killed/crashes before the domain shuts down, this domain's pirqs won't be
> unmapped. Xen then takes over this work, unmapping all pirqs when
> destroying the guest. But as pciback has already disabled memory decoding
> before Xen unmaps the pirqs, Xen has to set the host_maskall flag and the
> maskall bit to mask an MSI rather than set the mask bit in the MSI-X
> table. The call trace of this process is:
> 
> ->arch_domain_destroy
>     ->free_domain_pirqs
>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>             ->pirq_guest_force_unbind
>                 ->__pirq_guest_unbind
>                     ->mask_msi_irq(=desc->handler->disable())
>                         ->the warning in msi_set_mask_bit()
> 
> The host_maskall bit will prevent guests from clearing the maskall bit,
> even if the device is later assigned to another guest. Guests then cannot
> receive MSIs from this device.
> 
> To fix this issue, pirqs are unmapped before memory decoding is disabled by
> pciback. Specifically, when a device is detached from a guest, all established
> mappings between pirqs and MSIs are destroyed before the ownership changes.
> 
> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> With this patch applied, qemu reports the errors below:
>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 302, gvec: 0xd5)
>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 301, gvec: 0xe5)
>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 359, gvec: 0x41)
>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 358, gvec: 0x51)
> 
> Despite the errors, guest shutdown and device hotplug finish smoothly.
> It seems to me that qemu tries to unbind an MSI which has already been
> unbound by the code added by this patch. I am not sure whether it is
> acceptable to leave these errors there.

So QEMU would try to unmap IRQs after unbinding the device? I think
QEMU should be fixed to first unmap the IRQs and then unbind the
device.

As long as this doesn't affect QEMU functionality I guess the Xen side
can be committed, but ideally a QEMU patch to avoid those error
messages should be committed at the same time.

> ---
>  xen/drivers/passthrough/io.c  | 57 +++++++++++++++++++++++++++++--------------
>  xen/drivers/passthrough/pci.c | 49 +++++++++++++++++++++++++++++++++++++
>  xen/include/xen/iommu.h       |  1 +
>  3 files changed, 89 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index a6eb8a4..56ee1ef 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
>      return 0;
>  }
>  
> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
> +{
> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
> +
> +    ASSERT(spin_is_locked(&d->event_lock));
> +
> +    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
> +         list_empty(&pirq_dpci->digl_list) )
> +    {
> +        pirq_guest_unbind(d, pirq);
> +        msixtbl_pt_unregister(d, pirq);
> +        if ( pt_irq_need_timer(pirq_dpci->flags) )
> +            kill_timer(&pirq_dpci->timer);
> +        pirq_dpci->flags = 0;
> +        /*
> +         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> +         * call to pt_pirq_softirq_reset.
> +         */
> +        pt_pirq_softirq_reset(pirq_dpci);
> +
> +        pirq_cleanup_check(pirq, d);
> +    }
> +}
> +
> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
> +{
> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
> +
> +    ASSERT(spin_is_locked(&d->event_lock));
> +
> +    if ( pirq_dpci && pirq_dpci->gmsi.posted )
> +        pi_update_irte(NULL, pirq, 0);
> +
> +    pt_irq_destroy_bind_common(d, pirq);
> +}
> +
>  int pt_irq_destroy_bind(
>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>  {
> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
>          }
>          else
>              what = "bogus";
> -    }
> -    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
> -        pi_update_irte(NULL, pirq, 0);
> -
> -    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
> -         list_empty(&pirq_dpci->digl_list) )
> -    {
> -        pirq_guest_unbind(d, pirq);
> -        msixtbl_pt_unregister(d, pirq);
> -        if ( pt_irq_need_timer(pirq_dpci->flags) )
> -            kill_timer(&pirq_dpci->timer);
> -        pirq_dpci->flags = 0;
> -        /*
> -         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> -         * call to pt_pirq_softirq_reset.
> -         */
> -        pt_pirq_softirq_reset(pirq_dpci);
>  
> -        pirq_cleanup_check(pirq, d);
> +        pt_irq_destroy_bind_common(d, pirq);
>      }
> +    else
> +        pt_irq_destroy_bind_msi(d, pirq);
>  
>      spin_unlock(&d->event_lock);
>  
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 1277ce2..88a8007 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>              return NULL;
>          }
>          spin_lock_init(&msix->table_lock);
> +        msix->warned = DOMID_INVALID;
>          pdev->msix = msix;
>      }
>  
> @@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>      return rc;
>  }
>  
> +/*
> + * Unmap established mappings between domain's pirq and device's MSI.
> + * These mappings were set up by qemu/guest and are expected to be
> + * destroyed when changing the device's ownership.
> + */
> +static void pci_unmap_msi(struct pci_dev *pdev)
> +{
> +    struct msi_desc *entry, *tmp;
> +
> +    ASSERT(pcidevs_locked());
> +
> +    if ( !pdev->domain )
> +        return;
> +
> +    spin_lock(&pdev->domain->event_lock);
> +    list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )

Do you really need the _safe version here? Couldn't you even use:

while ( (entry = list_first_entry_or_null(...)) != NULL )
...
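
The loop shape suggested here can be sketched in isolation as below. This
is a self-contained illustration with a hypothetical minimal list type, not
Xen's list API. It relies on one assumption: every iteration removes the
first entry from the list, otherwise the loop never terminates (so any path
that skips an entry without unlinking it would need separate handling).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal singly linked list, only for illustration. */
struct node {
    struct node *next;
    int irq;
};

struct list {
    struct node *first;
};

/* Counterpart of a "first entry or NULL" helper for this toy list. */
static struct node *first_or_null(const struct list *l)
{
    return l->first;
}

/* Unlink the first entry; the caller guarantees the list is not empty. */
static void unlink_first(struct list *l)
{
    l->first = l->first->next;
}

/* Drain the list by repeatedly taking its first entry. */
static int drain(struct list *l)
{
    struct node *entry;
    int handled = 0;

    while ( (entry = first_or_null(l)) != NULL )
    {
        /* Per-entry work would go here; it must unlink the entry. */
        unlink_first(l);
        handled++;
    }

    return handled;
}
```

Because the loop re-reads the head each time, it does not need a saved
"next" pointer the way a _safe iterator does.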

> +    {
> +        struct pirq *info;
> +        struct hvm_pirq_dpci *pirq_dpci;
> +        int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), pirq_orig;
> +
> +        pirq_orig = pirq;
> +
> +        if ( !pirq )
> +            continue;
> +
> +        /* For forcibly unmapped pirq, lookup radix tree with absolute value */
> +        if ( pirq < 0)
> +            pirq = -pirq;

I'm not sure I follow, the pirq hasn't been unmapped at this point
yet?

> +
> +        info = pirq_info(pdev->domain, pirq);
> +        if ( !info )
> +            continue;
> +        pirq_dpci = pirq_dpci(info);
> +
> +        if ( pirq_dpci &&
> +             (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
> +             (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
> +            pt_irq_destroy_bind_msi(pdev->domain, info);

I think this is missing unbinding for group MSI interrupts, you should
check the type and if it's MSI (not MSIX) iterate over the number of
vectors in msi.nvec in order to unbind them?
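
As a rough sketch of that idea, under loudly stated assumptions: the model
below is hypothetical and does not use Xen's msi_desc layout. It assumes a
multi-vector MSI entry records its vector count in nvec and that the
associated pirqs are consecutive starting at first_pirq; both points would
need to be checked against the real code.

```c
#include <assert.h>

/* Hypothetical model of one MSI/MSI-X entry, for illustration only. */
struct msi_entry_model {
    int is_msix;          /* non-zero for MSI-X, zero for plain MSI */
    unsigned int nvec;    /* number of vectors for multi-vector MSI */
    int first_pirq;       /* assumed: pirqs are consecutive from here */
};

/*
 * Fill out[] with every pirq that would need unbinding for this entry and
 * return how many there are: one for MSI-X, nvec for multi-vector MSI.
 */
static unsigned int pirqs_to_unbind(const struct msi_entry_model *e,
                                    int *out, unsigned int max)
{
    unsigned int i, n = (e->is_msix || e->nvec == 0) ? 1 : e->nvec;

    assert(n <= max);
    for ( i = 0; i < n; i++ )
        out[i] = e->first_pirq + (int)i;

    return n;
}
```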

Thanks, Roger.


* Re: [PATCH v3 2/2] libxl: don't reset device when it is accessible by the guest
  2018-12-18 14:43 ` [PATCH v3 2/2] libxl: don't reset device when it is accessible by the guest Chao Gao
@ 2018-12-19  9:00   ` Roger Pau Monné
  2018-12-20  2:47     ` Chao Gao
  2019-01-02 11:49   ` Wei Liu
  1 sibling, 1 reply; 11+ messages in thread
From: Roger Pau Monné @ 2018-12-19  9:00 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich, Andrew Cooper

On Tue, Dec 18, 2018 at 10:43:38PM +0800, Chao Gao wrote:
> When I destroyed a guest with 'xl destroy', I found the warning
> in msi_set_mask_bit() in Xen was triggered. After adding "WARN_ON(1)"
> to that place, I got the call trace below:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080281a6a>] msi.c#msi_set_mask_bit+0x1da/0x29b
> (XEN)    [<ffff82d080282e78>] guest_mask_msi_irq+0x1c/0x1e
> (XEN)    [<ffff82d08030ceb9>] vmsi.c#msixtbl_write+0x173/0x1d4
> (XEN)    [<ffff82d08030cf30>] vmsi.c#_msixtbl_write+0x16/0x18
> (XEN)    [<ffff82d0802ffac4>] hvm_process_io_intercept+0x216/0x270
> (XEN)    [<ffff82d0802ffb45>] hvm_io_intercept+0x27/0x4c
> (XEN)    [<ffff82d0802f0e86>] emulate.c#hvmemul_do_io+0x273/0x454
> (XEN)    [<ffff82d0802f10a4>] emulate.c#hvmemul_do_io_buffer+0x3d/0x70
> (XEN)    [<ffff82d0802f2343>] emulate.c#hvmemul_linear_mmio_access+0x35e/0x436
> (XEN)    [<ffff82d0802f2640>] emulate.c#linear_write+0xdd/0x13b
> (XEN)    [<ffff82d0802f3f25>] emulate.c#hvmemul_write+0xbd/0xf1
> (XEN)    [<ffff82d0802d51df>] x86_emulate+0x2249d/0x23c5c
> (XEN)    [<ffff82d0802d861f>] x86_emulate_wrapper+0x2b/0x5f
> (XEN)    [<ffff82d0802f28aa>] emulate.c#_hvm_emulate_one+0x54/0x1b2
> (XEN)    [<ffff82d0802f2a18>] hvm_emulate_one+0x10/0x12
> (XEN)    [<ffff82d080300227>] hvm_emulate_one_insn+0x42/0x14a
> (XEN)    [<ffff82d08030037e>] handle_mmio_with_translation+0x4f/0x51
> (XEN)    [<ffff82d0802f803b>] hvm_hap_nested_page_fault+0x16c/0x6d8
> (XEN)    [<ffff82d08032446a>] vmx_vmexit_handler+0x19b0/0x1f2e
> (XEN)    [<ffff82d08032995a>] vmx_asm_vmexit_handler+0xfa/0x270
> 
> It seems to me that guest is trying to mask a msi while the memory decoding
> of the device is disabled. Performing a device reset without proper method
> to avoid guest's MSI-X operation would lead to this issue.
> 
> The fix is basic - detach pci device before resetting the device.

Seems quite obvious. Do you have any idea why the device was first
reset and then deassigned?

> Signed-off-by: Chao Gao <chao.gao@intel.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-19  8:57 ` Roger Pau Monné
@ 2018-12-20  2:46   ` Chao Gao
  2018-12-20  9:29     ` Roger Pau Monné
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Gao @ 2018-12-20  2:46 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich

On Wed, Dec 19, 2018 at 09:57:51AM +0100, Roger Pau Monné wrote:
>On Tue, Dec 18, 2018 at 10:43:37PM +0800, Chao Gao wrote:
>> I find some pass-thru devices don't work any more across guest
>> reboot. Assigning it to another domain also meets the same issue. And
>> the only way to make it work again is un-binding and binding it to
>> pciback. Someone reported this issue one year ago [1].
>> 
>> If the device's driver doesn't disable MSI-X during shutdown or qemu is
>> killed/crashed before the domain shutdown, this domain's pirq won't be
>> unmapped. Then xen takes over this work, unmapping all pirq-s, when
>> destroying guest. But as pciback has already disabled meory decoding before
>> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
>> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
>> this process is:
>> 
>> ->arch_domain_destroy
>>     ->free_domain_pirqs
>>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>>             ->pirq_guest_force_unbind
>>                 ->__pirq_guest_unbind
>>                     ->mask_msi_irq(=desc->handler->disable())
>>                         ->the warning in msi_set_mask_bit()
>> 
>> The host_maskall bit will prevent guests from clearing the maskall bit
>> even the device is assigned to another guest later. Then guests cannot
>> receive MSIs from this device.
>> 
>> To fix this issue, a pirq is unmapped before memory decoding is disabled by
>> pciback. Specifically, when a device is detached from a guest, all established
>> mappings between pirq and msi are destroying before changing the ownership.
>> 
>> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> Applied this patch, qemu would report the error below:
>>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 302, gvec: 0xd5)
>>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 301, gvec: 0xe5)
>>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 359, gvec: 0x41)
>>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 358, gvec: 0x51)
>> 
>> Despite of the error, guest shutdown or device hotplug finishs smoothly.
>> It seems to me that qemu tries to unbind a msi which is already unbound by
>> the code added by this patch. I am not sure whether it is acceptable to
>> leave this error there.
>
>So QEMU would try to unmap IRQs after unbinding the device? I think

It seems so, yes. I don't know the reason right now. Maybe because it
is an asynchronous process?

>QEMU should be fixed to first unmap the IRQs and then unbind the
>device.

Yes. Agree.

>
>As long as this doesn't affect QEMU functionality I guess the Xen side
>can be committed, but ideally a QEMU patch to avoid those error
>messages should be committed at the same time.
>
>> ---
>>  xen/drivers/passthrough/io.c  | 57 +++++++++++++++++++++++++++++--------------
>>  xen/drivers/passthrough/pci.c | 49 +++++++++++++++++++++++++++++++++++++
>>  xen/include/xen/iommu.h       |  1 +
>>  3 files changed, 89 insertions(+), 18 deletions(-)
>> 
>> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
>> index a6eb8a4..56ee1ef 100644
>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
>>      return 0;
>>  }
>>  
>> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
>> +{
>> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>> +
>> +    ASSERT(spin_is_locked(&d->event_lock));
>> +
>> +    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> +         list_empty(&pirq_dpci->digl_list) )
>> +    {
>> +        pirq_guest_unbind(d, pirq);
>> +        msixtbl_pt_unregister(d, pirq);
>> +        if ( pt_irq_need_timer(pirq_dpci->flags) )
>> +            kill_timer(&pirq_dpci->timer);
>> +        pirq_dpci->flags = 0;
>> +        /*
>> +         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> +         * call to pt_pirq_softirq_reset.
>> +         */
>> +        pt_pirq_softirq_reset(pirq_dpci);
>> +
>> +        pirq_cleanup_check(pirq, d);
>> +    }
>> +}
>> +
>> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
>> +{
>> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>> +
>> +    ASSERT(spin_is_locked(&d->event_lock));
>> +
>> +    if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> +        pi_update_irte(NULL, pirq, 0);
>> +
>> +    pt_irq_destroy_bind_common(d, pirq);
>> +}
>> +
>>  int pt_irq_destroy_bind(
>>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>>  {
>> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
>>          }
>>          else
>>              what = "bogus";
>> -    }
>> -    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> -        pi_update_irte(NULL, pirq, 0);
>> -
>> -    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> -         list_empty(&pirq_dpci->digl_list) )
>> -    {
>> -        pirq_guest_unbind(d, pirq);
>> -        msixtbl_pt_unregister(d, pirq);
>> -        if ( pt_irq_need_timer(pirq_dpci->flags) )
>> -            kill_timer(&pirq_dpci->timer);
>> -        pirq_dpci->flags = 0;
>> -        /*
>> -         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> -         * call to pt_pirq_softirq_reset.
>> -         */
>> -        pt_pirq_softirq_reset(pirq_dpci);
>>  
>> -        pirq_cleanup_check(pirq, d);
>> +        pt_irq_destroy_bind_common(d, pirq);
>>      }
>> +    else
>> +        pt_irq_destroy_bind_msi(d, pirq);
>>  
>>      spin_unlock(&d->event_lock);
>>  
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 1277ce2..88a8007 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>>              return NULL;
>>          }
>>          spin_lock_init(&msix->table_lock);
>> +        msix->warned = DOMID_INVALID;
>>          pdev->msix = msix;
>>      }
>>  
>> @@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>      return rc;
>>  }
>>  
>> +/*
>> + * Unmap established mappings between domain's pirq and device's MSI.
>> + * These mappings were set up by qemu/guest and are expected to be
>> + * destroyed when changing the device's ownership.
>> + */
>> +static void pci_unmap_msi(struct pci_dev *pdev)
>> +{
>> +    struct msi_desc *entry, *tmp;
>> +
>> +    ASSERT(pcidevs_locked());
>> +
>> +    if ( !pdev->domain )
>> +        return;
>> +
>> +    spin_lock(&pdev->domain->event_lock);
>> +    list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )
>
>Do you really need the _safe version here? Couldn't you even use:

I don't need the _safe version.

>
>while ( (entry = list_first_entry_or_null(...)) != NULL )
>...

I think it is the same as list_for_each_entry(). Is there a reason you think
this one would be better?

>
>> +    {
>> +        struct pirq *info;
>> +        struct hvm_pirq_dpci *pirq_dpci;
>> +        int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), pirq_orig;
>> +
>> +        pirq_orig = pirq;
>> +
>> +        if ( !pirq )
>> +            continue;
>> +
>> +        /* For forcibly unmapped pirq, lookup radix tree with absolute value */
>> +        if ( pirq < 0)
>> +            pirq = -pirq;
>
>I'm not sure I follow, the pirq hasn't been unmapped at this point
>yet?

Qemu (i.e. a compromised qemu) has the ability to do this, right? We can't
assert that the pirq hasn't been unmapped here.

>
>> +
>> +        info = pirq_info(pdev->domain, pirq);
>> +        if ( !info )
>> +            continue;
>> +        pirq_dpci = pirq_dpci(info);
>> +
>> +        if ( pirq_dpci &&
>> +             (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>> +             (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
>> +            pt_irq_destroy_bind_msi(pdev->domain, info);
>
>I think this is missing unbinding for group MSI interrupts, you should
>check the type and if it's MSI (not MSIX) iterate over the number of
>vectors in msi.nvec in order to unbind them?

Good catch.

Thanks
Chao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] libxl: don't reset device when it is accessible by the guest
  2018-12-19  9:00   ` Roger Pau Monné
@ 2018-12-20  2:47     ` Chao Gao
  0 siblings, 0 replies; 11+ messages in thread
From: Chao Gao @ 2018-12-20  2:47 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Wei Liu, Ian Jackson, Jan Beulich, Andrew Cooper

On Wed, Dec 19, 2018 at 10:00:49AM +0100, Roger Pau Monné wrote:
>On Tue, Dec 18, 2018 at 10:43:38PM +0800, Chao Gao wrote:
>> When I destroyed a guest with 'xl destroy', I found the warning
>> in msi_set_mask_bit() in Xen was triggered. After adding "WARN_ON(1)"
>> to that place, I got the call trace below:
>> 
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d080281a6a>] msi.c#msi_set_mask_bit+0x1da/0x29b
>> (XEN)    [<ffff82d080282e78>] guest_mask_msi_irq+0x1c/0x1e
>> (XEN)    [<ffff82d08030ceb9>] vmsi.c#msixtbl_write+0x173/0x1d4
>> (XEN)    [<ffff82d08030cf30>] vmsi.c#_msixtbl_write+0x16/0x18
>> (XEN)    [<ffff82d0802ffac4>] hvm_process_io_intercept+0x216/0x270
>> (XEN)    [<ffff82d0802ffb45>] hvm_io_intercept+0x27/0x4c
>> (XEN)    [<ffff82d0802f0e86>] emulate.c#hvmemul_do_io+0x273/0x454
>> (XEN)    [<ffff82d0802f10a4>] emulate.c#hvmemul_do_io_buffer+0x3d/0x70
>> (XEN)    [<ffff82d0802f2343>] emulate.c#hvmemul_linear_mmio_access+0x35e/0x436
>> (XEN)    [<ffff82d0802f2640>] emulate.c#linear_write+0xdd/0x13b
>> (XEN)    [<ffff82d0802f3f25>] emulate.c#hvmemul_write+0xbd/0xf1
>> (XEN)    [<ffff82d0802d51df>] x86_emulate+0x2249d/0x23c5c
>> (XEN)    [<ffff82d0802d861f>] x86_emulate_wrapper+0x2b/0x5f
>> (XEN)    [<ffff82d0802f28aa>] emulate.c#_hvm_emulate_one+0x54/0x1b2
>> (XEN)    [<ffff82d0802f2a18>] hvm_emulate_one+0x10/0x12
>> (XEN)    [<ffff82d080300227>] hvm_emulate_one_insn+0x42/0x14a
>> (XEN)    [<ffff82d08030037e>] handle_mmio_with_translation+0x4f/0x51
>> (XEN)    [<ffff82d0802f803b>] hvm_hap_nested_page_fault+0x16c/0x6d8
>> (XEN)    [<ffff82d08032446a>] vmx_vmexit_handler+0x19b0/0x1f2e
>> (XEN)    [<ffff82d08032995a>] vmx_asm_vmexit_handler+0xfa/0x270
>> 
>> It seems to me that guest is trying to mask a msi while the memory decoding
>> of the device is disabled. Performing a device reset without proper method
>> to avoid guest's MSI-X operation would lead to this issue.
>> 
>> The fix is basic - detach pci device before resetting the device.
>
>Seems quite obvious. Do you have any idea why the device was first
>reset and then deassigned?

TBH, I have no idea.

>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>
>Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks
Chao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-20  2:46   ` Chao Gao
@ 2018-12-20  9:29     ` Roger Pau Monné
  2018-12-20 14:20       ` Chao Gao
  0 siblings, 1 reply; 11+ messages in thread
From: Roger Pau Monné @ 2018-12-20  9:29 UTC (permalink / raw)
  To: Chao Gao; +Cc: xen-devel, Jan Beulich

On Thu, Dec 20, 2018 at 10:46:29AM +0800, Chao Gao wrote:
> On Wed, Dec 19, 2018 at 09:57:51AM +0100, Roger Pau Monné wrote:
> >On Tue, Dec 18, 2018 at 10:43:37PM +0800, Chao Gao wrote:
> >> I find some pass-thru devices don't work any more across guest
> >> reboot. Assigning it to another domain also meets the same issue. And
> >> the only way to make it work again is un-binding and binding it to
> >> pciback. Someone reported this issue one year ago [1].
> >> 
> >> If the device's driver doesn't disable MSI-X during shutdown or qemu is
> >> killed/crashed before the domain shutdown, this domain's pirq won't be
> >> unmapped. Then xen takes over this work, unmapping all pirq-s, when
> >> destroying guest. But as pciback has already disabled meory decoding before
> >> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
> >> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
> >> this process is:
> >> 
> >> ->arch_domain_destroy
> >>     ->free_domain_pirqs
> >>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
> >>             ->pirq_guest_force_unbind
> >>                 ->__pirq_guest_unbind
> >>                     ->mask_msi_irq(=desc->handler->disable())
> >>                         ->the warning in msi_set_mask_bit()
> >> 
> >> The host_maskall bit will prevent guests from clearing the maskall bit
> >> even the device is assigned to another guest later. Then guests cannot
> >> receive MSIs from this device.
> >> 
> >> To fix this issue, a pirq is unmapped before memory decoding is disabled by
> >> pciback. Specifically, when a device is detached from a guest, all established
> >> mappings between pirq and msi are destroying before changing the ownership.
> >> 
> >> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
> >> 
> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
> >> ---
> >> Applied this patch, qemu would report the error below:
> >>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 302, gvec: 0xd5)
> >>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 301, gvec: 0xe5)
> >>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 359, gvec: 0x41)
> >>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 358, gvec: 0x51)
> >> 
> >> Despite of the error, guest shutdown or device hotplug finishs smoothly.
> >> It seems to me that qemu tries to unbind a msi which is already unbound by
> >> the code added by this patch. I am not sure whether it is acceptable to
> >> leave this error there.
> >
> >So QEMU would try to unmap IRQs after unbinding the device? I think
> 
> It seems to me yes. I don't know the reason right now. maybe because it
> is an asynchronous process?
> 
> >QEMU should be fixed to first unmap the IRQs and then unbind the
> >device.
> 
> Yes. Agree.
> 
> >
> >As long as this doesn't affect QEMU functionality I guess the Xen side
> >can be committed, but ideally a QEMU patch to avoid those error
> >messages should be committed at the same time.
> >
> >> ---
> >>  xen/drivers/passthrough/io.c  | 57 +++++++++++++++++++++++++++++--------------
> >>  xen/drivers/passthrough/pci.c | 49 +++++++++++++++++++++++++++++++++++++
> >>  xen/include/xen/iommu.h       |  1 +
> >>  3 files changed, 89 insertions(+), 18 deletions(-)
> >> 
> >> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> >> index a6eb8a4..56ee1ef 100644
> >> --- a/xen/drivers/passthrough/io.c
> >> +++ b/xen/drivers/passthrough/io.c
> >> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
> >>      return 0;
> >>  }
> >>  
> >> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
> >> +{
> >> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
> >> +
> >> +    ASSERT(spin_is_locked(&d->event_lock));
> >> +
> >> +    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
> >> +         list_empty(&pirq_dpci->digl_list) )
> >> +    {
> >> +        pirq_guest_unbind(d, pirq);
> >> +        msixtbl_pt_unregister(d, pirq);
> >> +        if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> +            kill_timer(&pirq_dpci->timer);
> >> +        pirq_dpci->flags = 0;
> >> +        /*
> >> +         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> >> +         * call to pt_pirq_softirq_reset.
> >> +         */
> >> +        pt_pirq_softirq_reset(pirq_dpci);
> >> +
> >> +        pirq_cleanup_check(pirq, d);
> >> +    }
> >> +}
> >> +
> >> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
> >> +{
> >> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
> >> +
> >> +    ASSERT(spin_is_locked(&d->event_lock));
> >> +
> >> +    if ( pirq_dpci && pirq_dpci->gmsi.posted )
> >> +        pi_update_irte(NULL, pirq, 0);
> >> +
> >> +    pt_irq_destroy_bind_common(d, pirq);
> >> +}
> >> +
> >>  int pt_irq_destroy_bind(
> >>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
> >>  {
> >> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
> >>          }
> >>          else
> >>              what = "bogus";
> >> -    }
> >> -    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
> >> -        pi_update_irte(NULL, pirq, 0);
> >> -
> >> -    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
> >> -         list_empty(&pirq_dpci->digl_list) )
> >> -    {
> >> -        pirq_guest_unbind(d, pirq);
> >> -        msixtbl_pt_unregister(d, pirq);
> >> -        if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> -            kill_timer(&pirq_dpci->timer);
> >> -        pirq_dpci->flags = 0;
> >> -        /*
> >> -         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
> >> -         * call to pt_pirq_softirq_reset.
> >> -         */
> >> -        pt_pirq_softirq_reset(pirq_dpci);
> >>  
> >> -        pirq_cleanup_check(pirq, d);
> >> +        pt_irq_destroy_bind_common(d, pirq);
> >>      }
> >> +    else
> >> +        pt_irq_destroy_bind_msi(d, pirq);
> >>  
> >>      spin_unlock(&d->event_lock);
> >>  
> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> >> index 1277ce2..88a8007 100644
> >> --- a/xen/drivers/passthrough/pci.c
> >> +++ b/xen/drivers/passthrough/pci.c
> >> @@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
> >>              return NULL;
> >>          }
> >>          spin_lock_init(&msix->table_lock);
> >> +        msix->warned = DOMID_INVALID;
> >>          pdev->msix = msix;
> >>      }
> >>  
> >> @@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
> >>      return rc;
> >>  }
> >>  
> >> +/*
> >> + * Unmap established mappings between domain's pirq and device's MSI.
> >> + * These mappings were set up by qemu/guest and are expected to be
> >> + * destroyed when changing the device's ownership.
> >> + */
> >> +static void pci_unmap_msi(struct pci_dev *pdev)
> >> +{
> >> +    struct msi_desc *entry, *tmp;
> >> +
> >> +    ASSERT(pcidevs_locked());
> >> +
> >> +    if ( !pdev->domain )
> >> +        return;
> >> +
> >> +    spin_lock(&pdev->domain->event_lock);
> >> +    list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )
> >
> >Do you really need the _safe version here? Couldn't you even use:
> 
> Don't need the _safe version.
> 
> >
> >while ( (entry = list_first_entry_or_null(...)) != NULL )
> >...
> 
> I think it is the same with list_for_each_entry(). Any reason makes you think
> this one would be better?

Doesn't 'entry' get freed when you call unmap_domain_pirq? In which
case using the list pointer from that struct would be a
use-after-free.

Using list_first_entry_or_null you don't need the previous entry in
order to get the next one, since you always pick the first one until
the list is empty.
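
To illustrate the point, here is a minimal, self-contained sketch of that
drain pattern. It is only a toy model: struct toy_msi_desc, toy_unmap() and
the small list helpers are hypothetical stand-ins written for this example,
not Xen's struct msi_desc, unmap_domain_pirq() or its list.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal intrusive doubly linked list, modelled on the Linux/Xen list API. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Hypothetical stand-in for struct msi_desc: only what the sketch needs. */
struct toy_msi_desc {
    struct list_head list;
    int irq;
};

/* Same idea as list_first_entry_or_null(): first element, or NULL if empty. */
static struct toy_msi_desc *first_entry_or_null(struct list_head *h)
{
    if (h->next == h)
        return NULL;
    return (struct toy_msi_desc *)((char *)h->next -
                                   offsetof(struct toy_msi_desc, list));
}

/* Hypothetical stand-in for the unmap path: unlinks and frees the entry. */
static void toy_unmap(struct toy_msi_desc *entry)
{
    list_del(&entry->list);
    free(entry);
}

/* Drain the list; returns how many entries were unmapped. */
static int drain_msi_list(struct list_head *h)
{
    struct toy_msi_desc *entry;
    int n = 0;

    /* No pointer into a freed entry is ever read: we restart from the head. */
    while ((entry = first_entry_or_null(h)) != NULL) {
        toy_unmap(entry);   /* entry must not be touched after this call */
        n++;
    }
    assert(h->next == h && h->prev == h);
    return n;
}

/* Build a list of `count` entries; returns 0 on success, -1 on failure. */
static int fill_msi_list(struct list_head *h, int count)
{
    for (int i = 0; i < count; i++) {
        struct toy_msi_desc *e = malloc(sizeof(*e));
        if (e == NULL)
            return -1;
        e->irq = i;
        list_add_tail(&e->list, h);
    }
    return 0;
}
```

Note that such a loop only terminates if every iteration really removes the
first entry. A body that skips an entry with "continue" and leaves it on the
list would spin forever, so the skip paths would need separate handling.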

> >
> >> +    {
> >> +        struct pirq *info;
> >> +        struct hvm_pirq_dpci *pirq_dpci;
> >> +        int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), pirq_orig;
> >> +
> >> +        pirq_orig = pirq;
> >> +
> >> +        if ( !pirq )
> >> +            continue;
> >> +
> >> +        /* For forcibly unmapped pirq, lookup radix tree with absolute value */
> >> +        if ( pirq < 0)
> >> +            pirq = -pirq;
> >
> >I'm not sure I follow, the pirq hasn't been unmapped at this point
> >yet?
> 
> Qemu (i.e. compromised qemu) has the ability to do this. Right? we can't
> assert the pirq hasn't been unmapped here.

If the PIRQ is unmapped then the 'entry' would also be gone (freed)
AFAICT (see unmap_domain_pirq which calls msi_free_irq)?

I think that any entry in pdev->msi_list will always have entry->irq
 >= 0, but maybe I'm missing something. AFAICT map_domain_pirq will not
add an entry with irq < 0.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot
  2018-12-20  9:29     ` Roger Pau Monné
@ 2018-12-20 14:20       ` Chao Gao
  0 siblings, 0 replies; 11+ messages in thread
From: Chao Gao @ 2018-12-20 14:20 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich

On Thu, Dec 20, 2018 at 10:29:14AM +0100, Roger Pau Monné wrote:
>On Thu, Dec 20, 2018 at 10:46:29AM +0800, Chao Gao wrote:
>> On Wed, Dec 19, 2018 at 09:57:51AM +0100, Roger Pau Monné wrote:
>> >On Tue, Dec 18, 2018 at 10:43:37PM +0800, Chao Gao wrote:
>> >> I find some pass-thru devices don't work any more across guest
>> >> reboot. Assigning it to another domain also meets the same issue. And
>> >> the only way to make it work again is un-binding and binding it to
>> >> pciback. Someone reported this issue one year ago [1].
>> >> 
>> >> If the device's driver doesn't disable MSI-X during shutdown or qemu is
>> >> killed/crashed before the domain shutdown, this domain's pirq won't be
>> >> unmapped. Then xen takes over this work, unmapping all pirq-s, when
>> >> destroying guest. But as pciback has already disabled meory decoding before
>> >> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
>> >> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
>> >> this process is:
>> >> 
>> >> ->arch_domain_destroy
>> >>     ->free_domain_pirqs
>> >>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>> >>             ->pirq_guest_force_unbind
>> >>                 ->__pirq_guest_unbind
>> >>                     ->mask_msi_irq(=desc->handler->disable())
>> >>                         ->the warning in msi_set_mask_bit()
>> >> 
>> >> The host_maskall bit will prevent guests from clearing the maskall bit
>> >> even the device is assigned to another guest later. Then guests cannot
>> >> receive MSIs from this device.
>> >> 
>> >> To fix this issue, a pirq is unmapped before memory decoding is disabled by
>> >> pciback. Specifically, when a device is detached from a guest, all established
>> >> mappings between pirq and msi are destroying before changing the ownership.
>> >> 
>> >> [1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html
>> >> 
>> >> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> >> ---
>> >> Applied this patch, qemu would report the error below:
>> >>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 302, gvec: 0xd5)
>> >>     [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 301, gvec: 0xe5)
>> >>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 359, gvec: 0x41)
>> >>     [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: 1, pirq: 358, gvec: 0x51)
>> >> 
>> >> Despite of the error, guest shutdown or device hotplug finishs smoothly.
>> >> It seems to me that qemu tries to unbind a msi which is already unbound by
>> >> the code added by this patch. I am not sure whether it is acceptable to
>> >> leave this error there.
>> >
>> >So QEMU would try to unmap IRQs after unbinding the device? I think
>> 
>> It seems to me yes. I don't know the reason right now. maybe because it
>> is an asynchronous process?
>> 
>> >QEMU should be fixed to first unmap the IRQs and then unbind the
>> >device.
>> 
>> Yes. Agree.
>> 
>> >
>> >As long as this doesn't affect QEMU functionality I guess the Xen side
>> >can be committed, but ideally a QEMU patch to avoid those error
>> >messages should be committed at the same time.
>> >
>> >> ---
>> >>  xen/drivers/passthrough/io.c  | 57 +++++++++++++++++++++++++++++--------------
>> >>  xen/drivers/passthrough/pci.c | 49 +++++++++++++++++++++++++++++++++++++
>> >>  xen/include/xen/iommu.h       |  1 +
>> >>  3 files changed, 89 insertions(+), 18 deletions(-)
>> >> 
>> >> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
>> >> index a6eb8a4..56ee1ef 100644
>> >> --- a/xen/drivers/passthrough/io.c
>> >> +++ b/xen/drivers/passthrough/io.c
>> >> @@ -619,6 +619,42 @@ int pt_irq_create_bind(
>> >>      return 0;
>> >>  }
>> >>  
>> >> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
>> >> +{
>> >> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>> >> +
>> >> +    ASSERT(spin_is_locked(&d->event_lock));
>> >> +
>> >> +    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> >> +         list_empty(&pirq_dpci->digl_list) )
>> >> +    {
>> >> +        pirq_guest_unbind(d, pirq);
>> >> +        msixtbl_pt_unregister(d, pirq);
>> >> +        if ( pt_irq_need_timer(pirq_dpci->flags) )
>> >> +            kill_timer(&pirq_dpci->timer);
>> >> +        pirq_dpci->flags = 0;
>> >> +        /*
>> >> +         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> >> +         * call to pt_pirq_softirq_reset.
>> >> +         */
>> >> +        pt_pirq_softirq_reset(pirq_dpci);
>> >> +
>> >> +        pirq_cleanup_check(pirq, d);
>> >> +    }
>> >> +}
>> >> +
>> >> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
>> >> +{
>> >> +    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
>> >> +
>> >> +    ASSERT(spin_is_locked(&d->event_lock));
>> >> +
>> >> +    if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> >> +        pi_update_irte(NULL, pirq, 0);
>> >> +
>> >> +    pt_irq_destroy_bind_common(d, pirq);
>> >> +}
>> >> +
>> >>  int pt_irq_destroy_bind(
>> >>      struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
>> >>  {
>> >> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind(
>> >>          }
>> >>          else
>> >>              what = "bogus";
>> >> -    }
>> >> -    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
>> >> -        pi_update_irte(NULL, pirq, 0);
>> >> -
>> >> -    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>> >> -         list_empty(&pirq_dpci->digl_list) )
>> >> -    {
>> >> -        pirq_guest_unbind(d, pirq);
>> >> -        msixtbl_pt_unregister(d, pirq);
>> >> -        if ( pt_irq_need_timer(pirq_dpci->flags) )
>> >> -            kill_timer(&pirq_dpci->timer);
>> >> -        pirq_dpci->flags = 0;
>> >> -        /*
>> >> -         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
>> >> -         * call to pt_pirq_softirq_reset.
>> >> -         */
>> >> -        pt_pirq_softirq_reset(pirq_dpci);
>> >>  
>> >> -        pirq_cleanup_check(pirq, d);
>> >> +        pt_irq_destroy_bind_common(d, pirq);
>> >>      }
>> >> +    else
>> >> +        pt_irq_destroy_bind_msi(d, pirq);
>> >>  
>> >>      spin_unlock(&d->event_lock);
>> >>  
>> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> >> index 1277ce2..88a8007 100644
>> >> --- a/xen/drivers/passthrough/pci.c
>> >> +++ b/xen/drivers/passthrough/pci.c
>> >> @@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>> >>              return NULL;
>> >>          }
>> >>          spin_lock_init(&msix->table_lock);
>> >> +        msix->warned = DOMID_INVALID;
>> >>          pdev->msix = msix;
>> >>      }
>> >>  
>> >> @@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>> >>      return rc;
>> >>  }
>> >>  
>> >> +/*
>> >> + * Unmap established mappings between domain's pirq and device's MSI.
>> >> + * These mappings were set up by qemu/guest and are expected to be
>> >> + * destroyed when changing the device's ownership.
>> >> + */
>> >> +static void pci_unmap_msi(struct pci_dev *pdev)
>> >> +{
>> >> +    struct msi_desc *entry, *tmp;
>> >> +
>> >> +    ASSERT(pcidevs_locked());
>> >> +
>> >> +    if ( !pdev->domain )
>> >> +        return;
>> >> +
>> >> +    spin_lock(&pdev->domain->event_lock);
>> >> +    list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )
>> >
>> >Do you really need the _safe version here? Couldn't you even use:
>> 
>> Don't need the _safe version.
>> 
>> >
>> >while ( (entry = list_first_entry_or_null(...)) != NULL )
>> >...
>> 
>> I think it is the same with list_for_each_entry(). Any reason makes you think
>> this one would be better?
>
>Doesn't 'entry' get freed when you call unmap_domain_pirq? In which
>case using the list pointer from that struct would be a
>use-after-free.
>
>Using list_first_entry_or_null you don't need the previous entry in
>order to get the next one, since you always pick the first one until
>the list is empty.

Agreed. I should use the _safe version here, and I will take your advice.
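
The drain pattern Roger describes can be sketched outside Xen as below. This is only an illustration under stated assumptions: the list helpers are a minimal stand-in for Xen's list.h, and struct demo_entry, demo_unmap(), demo_drain() and demo_run() are hypothetical names, with demo_unmap() playing the role of unmap_domain_pirq() in that it unlinks and frees the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for Xen's struct list_head (illustrative only). */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Hypothetical stand-in for struct msi_desc. */
struct demo_entry {
    struct list_head list;
    int irq;
};

/* Return the first entry, or NULL when the list is empty. */
static struct demo_entry *first_entry_or_null(struct list_head *h)
{
    if ( h->next == h )
        return NULL;
    return (struct demo_entry *)((char *)h->next -
                                 offsetof(struct demo_entry, list));
}

/* Stand-in for unmap_domain_pirq(): unlinks the entry and frees it. */
static void demo_unmap(struct demo_entry *e)
{
    list_del(&e->list);
    free(e);
}

/*
 * Drain the list by always re-reading the head. No pointer into a
 * freed entry is ever dereferenced, unlike list_for_each_entry(),
 * which would read entry->list.next after the entry is gone.
 */
static int demo_drain(struct list_head *h)
{
    struct demo_entry *e;
    int freed = 0;

    while ( (e = first_entry_or_null(h)) != NULL )
    {
        demo_unmap(e);
        freed++;
    }
    assert(h->next == h); /* the list must be empty now */
    return freed;
}

/* Build a list of 'count' entries, drain it, return the number freed. */
static int demo_run(int count)
{
    struct list_head head;
    int i;

    list_init(&head);
    for ( i = 0; i < count; i++ )
    {
        struct demo_entry *e = malloc(sizeof(*e));

        if ( !e )
        {
            demo_drain(&head);
            return -1;
        }
        e->irq = i;
        list_add_tail(&e->list, &head);
    }
    return demo_drain(&head);
}
```

Calling demo_run(3) builds three entries and frees all of them. The loop terminates only because each iteration removes the entry it picked, so the same shape would spin forever if the callee could fail without unlinking.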

>
>> >
>> >> +    {
>> >> +        struct pirq *info;
>> >> +        struct hvm_pirq_dpci *pirq_dpci;
>> >> +        int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), pirq_orig;
>> >> +
>> >> +        pirq_orig = pirq;
>> >> +
>> >> +        if ( !pirq )
>> >> +            continue;
>> >> +
>> >> +        /* For forcibly unmapped pirq, lookup radix tree with absolute value */
>> >> +        if ( pirq < 0)
>> >> +            pirq = -pirq;
>> >
>> >I'm not sure I follow, the pirq hasn't been unmapped at this point
>> >yet?
>> 
>> Qemu (e.g. a compromised qemu) has the ability to do this, right? We can't
>> assert the pirq hasn't been unmapped here.
>
>If the PIRQ is unmapped then the 'entry' would also be gone (freed)
>AFAICT (see unmap_domain_pirq which calls msi_free_irq)?
>
>I think that any entry in pdev->msi_list will always have entry->irq
> >= 0, but maybe I'm missing something. AFAICT map_domain_pirq will not
>add an entry with irq < 0.

Yes, you are right. I will add a "WARN_ON" when pirq < 0.
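
A rough sketch of what that check could look like, outside Xen. Everything here is an assumption for illustration: demo_warn_on() is a hypothetical stand-in for WARN_ON() that counts instead of dumping a stack trace, and skipping the entry after the warning is one possible policy, not something settled in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings raised so far (stand-in for a WARN_ON() trace). */
static unsigned int demo_warnings;

/* Hypothetical WARN_ON() replacement: logs, counts, returns the condition. */
static bool demo_warn_on(bool cond, const char *what)
{
    if ( cond )
    {
        demo_warnings++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

enum demo_action { DEMO_SKIP, DEMO_UNMAP };

/*
 * Decide what to do with the pirq looked up for an entry on msi_list.
 * 0 means "no pirq bound", so there is nothing to unmap. A negative
 * value should not occur for an entry that is still on the list, so
 * it is reported and (by assumption) skipped.
 */
static enum demo_action demo_check_pirq(int pirq)
{
    if ( pirq == 0 )
        return DEMO_SKIP;

    if ( demo_warn_on(pirq < 0, "unexpected negative pirq") )
        return DEMO_SKIP;

    return DEMO_UNMAP;
}
```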

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* Re: [PATCH v3 2/2] libxl: don't reset device when it is accessible by the guest
  2018-12-18 14:43 ` [PATCH v3 2/2] libxl: don't reset device when it is accessible by the guest Chao Gao
  2018-12-19  9:00   ` Roger Pau Monné
@ 2019-01-02 11:49   ` Wei Liu
  1 sibling, 0 replies; 11+ messages in thread
From: Wei Liu @ 2019-01-02 11:49 UTC (permalink / raw)
  To: Chao Gao
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, Jan Beulich, xen-devel,
	Roger Pau Monné

Where is 1/2 in this series?

On Tue, Dec 18, 2018 at 10:43:38PM +0800, Chao Gao wrote:
> When I destroyed a guest with 'xl destroy', I found the warning
> in msi_set_mask_bit() in Xen was triggered. After adding "WARN_ON(1)"
> to that place, I got the call trace below:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080281a6a>] msi.c#msi_set_mask_bit+0x1da/0x29b
> (XEN)    [<ffff82d080282e78>] guest_mask_msi_irq+0x1c/0x1e
> (XEN)    [<ffff82d08030ceb9>] vmsi.c#msixtbl_write+0x173/0x1d4
> (XEN)    [<ffff82d08030cf30>] vmsi.c#_msixtbl_write+0x16/0x18
> (XEN)    [<ffff82d0802ffac4>] hvm_process_io_intercept+0x216/0x270
> (XEN)    [<ffff82d0802ffb45>] hvm_io_intercept+0x27/0x4c
> (XEN)    [<ffff82d0802f0e86>] emulate.c#hvmemul_do_io+0x273/0x454
> (XEN)    [<ffff82d0802f10a4>] emulate.c#hvmemul_do_io_buffer+0x3d/0x70
> (XEN)    [<ffff82d0802f2343>] emulate.c#hvmemul_linear_mmio_access+0x35e/0x436
> (XEN)    [<ffff82d0802f2640>] emulate.c#linear_write+0xdd/0x13b
> (XEN)    [<ffff82d0802f3f25>] emulate.c#hvmemul_write+0xbd/0xf1
> (XEN)    [<ffff82d0802d51df>] x86_emulate+0x2249d/0x23c5c
> (XEN)    [<ffff82d0802d861f>] x86_emulate_wrapper+0x2b/0x5f
> (XEN)    [<ffff82d0802f28aa>] emulate.c#_hvm_emulate_one+0x54/0x1b2
> (XEN)    [<ffff82d0802f2a18>] hvm_emulate_one+0x10/0x12
> (XEN)    [<ffff82d080300227>] hvm_emulate_one_insn+0x42/0x14a
> (XEN)    [<ffff82d08030037e>] handle_mmio_with_translation+0x4f/0x51
> (XEN)    [<ffff82d0802f803b>] hvm_hap_nested_page_fault+0x16c/0x6d8
> (XEN)    [<ffff82d08032446a>] vmx_vmexit_handler+0x19b0/0x1f2e
> (XEN)    [<ffff82d08032995a>] vmx_asm_vmexit_handler+0xfa/0x270
> 
> It seems to me that the guest is trying to mask an MSI while the memory decoding
> of the device is disabled. Performing a device reset without a proper method
> to prevent the guest's MSI-X operations would lead to this issue.
> 
> The fix is simple: detach the PCI device before resetting it.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

> ---
>  tools/libxl/libxl_pci.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 87afa03..855fb71 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -1459,17 +1459,17 @@ skip1:
>          fclose(f);
>      }
>  out:
> -    /* don't do multiple resets while some functions are still passed through */
> -    if ( (pcidev->vdevfn & 0x7) == 0 ) {
> -        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
> -    }
> -
>      if (!isstubdom) {
>          rc = xc_deassign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
>          if (rc < 0 && (hvm || errno != ENOSYS))
>              LOGED(ERROR, domainid, "xc_deassign_device failed");
>      }
>  
> +    /* don't do multiple resets while some functions are still passed through */
> +    if ( (pcidev->vdevfn & 0x7) == 0 ) {
> +        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
> +    }
> +
>      stubdomid = libxl_get_stubdom_id(ctx, domid);
>      if (stubdomid != 0) {
>          libxl_device_pci pcidev_s = *pcidev;
> -- 
> 1.8.3.1
> 
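
The reordering in the hunk above can be modelled as follows. This is a sketch under assumptions, not libxl code: demo_remove() and struct demo_log are hypothetical, and the two recorded steps stand in for xc_deassign_device() and libxl__device_pci_reset(). Only the (vdevfn & 0x7) == 0 test and the order of the two steps are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_step { DEMO_DEASSIGN, DEMO_RESET };

/* Records the order in which the two steps were carried out. */
struct demo_log {
    enum demo_step steps[2];
    size_t count;
};

static void demo_record(struct demo_log *log, enum demo_step step)
{
    if ( log->count < 2 )
        log->steps[log->count++] = step;
}

/* The low three bits of vdevfn hold the function number. */
static bool demo_is_function0(unsigned int vdevfn)
{
    return (vdevfn & 0x7) == 0;
}

/*
 * Patched order: take the device away from the guest first, then
 * reset it, so the guest cannot touch the MSI-X table while memory
 * decoding is off. The reset is done once, for function 0 only.
 */
static void demo_remove(struct demo_log *log, unsigned int vdevfn,
                        bool isstubdom)
{
    if ( !isstubdom )
        demo_record(log, DEMO_DEASSIGN); /* stands in for xc_deassign_device() */

    if ( demo_is_function0(vdevfn) )
        demo_record(log, DEMO_RESET);    /* stands in for libxl__device_pci_reset() */
}
```

With vdevfn 0x20 (function 0) and no stubdomain, the log reads deassign then reset. With vdevfn 0x21 only the deassign step is recorded.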

end of thread, other threads:[~2019-01-02 11:49 UTC | newest]

Thread overview: 11+ messages
2018-12-18 14:43 [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
2018-12-18 14:43 ` [PATCH v3 2/2] libxl: don't reset device when it is accessible by the guest Chao Gao
2018-12-19  9:00   ` Roger Pau Monné
2018-12-20  2:47     ` Chao Gao
2019-01-02 11:49   ` Wei Liu
2018-12-18 15:53 ` [PATCH v3 1/2] xen/pt: fix some pass-thru devices don't work across reboot Jan Beulich
2018-12-19  4:56   ` Chao Gao
2018-12-19  8:57 ` Roger Pau Monné
2018-12-20  2:46   ` Chao Gao
2018-12-20  9:29     ` Roger Pau Monné
2018-12-20 14:20       ` Chao Gao
