* [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot
@ 2019-01-25  8:26 Chao Gao
  2019-01-25  8:27 ` [PATCH v6 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Chao Gao @ 2019-01-25  8:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan, Ian Jackson,
	Julien Grall, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Chao Gao

I find that some pass-thru devices no longer work across guest
reboot. Assigning such a device to another domain hits the same issue, and
the only way to make it work again is to unbind it from and rebind it to
pciback. Someone reported this issue a year ago [1].

If the device's driver doesn't disable MSI-X during shutdown, or qemu is
killed/crashed before the domain shuts down, this domain's pirqs won't be
unmapped. Xen then takes over this work, unmapping all pirqs, when
destroying the guest. But as pciback has already disabled memory decoding
before Xen unmaps the pirqs, Xen has to set the host_maskall flag and the
maskall bit to mask an MSI rather than setting the mask bit in the MSI-X
table. The call trace of this process is:

->arch_domain_destroy
    ->free_domain_pirqs
        ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
            ->pirq_guest_force_unbind
                ->__pirq_guest_unbind
                    ->mask_msi_irq(=desc->handler->disable())
                        ->the warning in msi_set_mask_bit()

The host_maskall flag will prevent guests from clearing the maskall bit
even after the device is assigned to another guest later. That guest then
cannot receive MSIs from this device.

To fix this issue, unmap the pirqs before memory decoding is disabled by
pciback. Specifically, when a device is detached from a guest, all
established mappings between pirqs and MSIs are destroyed before changing
the ownership.

With this behavior, qemu and pciback are not aware of the forcible unbinding
and unmapping done by Xen. As a result, the pirq state maintained by Xen and
by pciback/qemu becomes inconsistent. Particularly in the hot-plug/hot-unplug
case, where guests stay alive, such inconsistency may cause other issues. To
resolve this inconsistency and keep compatibility with current qemu and
pciback, two flags, force_unmapped and force_unbound, denote that a pirq has
been forcibly unmapped or unbound. The flags are set when Xen unbinds or
unmaps the pirq behind qemu's and pciback's backs, and subsequent unbinding
or unmapping requests from qemu/pciback clear these flags and free the pirq.

[1]: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Changes in v6:
 - introduce flags to denote that a pirq has been forcibly unmapped/unbound.
   It helps to keep compatibility with current qemu/pciback.

Changes in v5:
 - fix the potential infinite loop
 - assert that unmap_domain_pirq() won't fail
 - assert msi_list is empty after the loop in pci_unmap_msi
 - provide a stub for pt_irq_destroy_bind_msi() if !CONFIG_HVM to fix a
   compilation error when building PVShim

Changes in v4:
 - split out change to 'msix->warned' field
 - handle multiple msi cases
 - use list_first_entry_or_null to traverse 'pdev->msi_list'
---
 xen/arch/x86/domctl.c         |  6 +++-
 xen/arch/x86/irq.c            | 54 ++++++++++++++++++++++-------
 xen/drivers/passthrough/io.c  | 81 +++++++++++++++++++++++++++++--------------
 xen/drivers/passthrough/pci.c | 61 ++++++++++++++++++++++++++++++++
 xen/include/asm-x86/irq.h     |  1 +
 xen/include/xen/iommu.h       |  4 +++
 xen/include/xen/irq.h         |  9 ++++-
 7 files changed, 176 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9bf2d08..fb7dadc 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -732,7 +732,11 @@ long arch_do_domctl(
             break;
 
         ret = -EPERM;
-        if ( irq <= 0 || !irq_access_permitted(currd, irq) )
+        /*
+         * irq < 0 denotes the corresponding pirq has been forcibly unbound.
+         * For this case, bypass permission check to reap the pirq.
+         */
+        if ( !irq || ((irq > 0) && !irq_access_permitted(currd, irq)) )
             break;
 
         ret = xsm_unbind_pt_irq(XSM_HOOK, d, bind);
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 23b4f42..fa533e1 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1345,10 +1345,8 @@ void (pirq_cleanup_check)(struct pirq *pirq, struct domain *d)
     /*
      * Check whether all fields have their default values, and delete
      * the entry from the tree if so.
-     *
-     * NB: Common parts were already checked.
      */
-    if ( pirq->arch.irq )
+    if ( pirq->force_unmapped || pirq->force_unbound || pirq->arch.irq )
         return;
 
     if ( is_hvm_domain(d) )
@@ -1582,6 +1580,13 @@ int pirq_guest_bind(struct vcpu *v, struct pirq *pirq, int will_share)
     WARN_ON(!spin_is_locked(&v->domain->event_lock));
     BUG_ON(!local_irq_is_enabled());
 
+    if ( pirq->force_unmapped || pirq->force_unbound )
+    {
+        dprintk(XENLOG_G_ERR, "dom%d: forcibly unmapped/unbound pirq %d can't be bound\n",
+                v->domain->domain_id, pirq->pirq);
+        return -EINVAL;
+    }
+
  retry:
     desc = pirq_spin_lock_irq_desc(pirq, NULL);
     if ( desc == NULL )
@@ -1793,6 +1798,7 @@ void pirq_guest_unbind(struct domain *d, struct pirq *pirq)
         desc = irq_to_desc(irq);
         spin_lock_irq(&desc->lock);
         clear_domain_irq_pirq(d, irq, pirq);
+        pirq->force_unbound = false;
     }
     else
     {
@@ -1811,12 +1817,11 @@ void pirq_guest_unbind(struct domain *d, struct pirq *pirq)
         cleanup_domain_irq_pirq(d, irq, pirq);
 }
 
-static bool pirq_guest_force_unbind(struct domain *d, struct pirq *pirq)
+static void pirq_guest_force_unbind(struct domain *d, struct pirq *pirq)
 {
     struct irq_desc *desc;
     irq_guest_action_t *action, *oldaction = NULL;
     unsigned int i;
-    bool bound = false;
 
     WARN_ON(!spin_is_locked(&d->event_lock));
 
@@ -1840,7 +1845,7 @@ static bool pirq_guest_force_unbind(struct domain *d, struct pirq *pirq)
     if ( i == action->nr_guests )
         goto out;
 
-    bound = true;
+    pirq->force_unbound = true;
     oldaction = __pirq_guest_unbind(d, pirq, desc);
 
  out:
@@ -1852,15 +1857,14 @@ static bool pirq_guest_force_unbind(struct domain *d, struct pirq *pirq)
         free_cpumask_var(oldaction->cpu_eoi_map);
         xfree(oldaction);
     }
-
-    return bound;
 }
 
 static inline bool is_free_pirq(const struct domain *d,
                                 const struct pirq *pirq)
 {
-    return !pirq || (!pirq->arch.irq && (!is_hvm_domain(d) ||
-        pirq->arch.hvm.emuirq == IRQ_UNBOUND));
+    return !pirq || (!pirq->force_unmapped && !pirq->force_unbound &&
+           !pirq->arch.irq && (!is_hvm_domain(d) ||
+           pirq->arch.hvm.emuirq == IRQ_UNBOUND));
 }
 
 int get_free_pirq(struct domain *d, int type)
@@ -1932,6 +1936,14 @@ int map_domain_pirq(
         return -EINVAL;
     }
 
+    info = pirq_info(d, pirq);
+    if ( info && (info->force_unmapped || info->force_unbound) )
+    {
+        dprintk(XENLOG_G_ERR, "dom%d: forcibly unmapped/unbound pirq %d can't be mapped\n",
+                d->domain_id, pirq);
+        return -EINVAL;
+    }
+
     old_irq = domain_pirq_to_irq(d, pirq);
     old_pirq = domain_irq_to_pirq(d, irq);
 
@@ -2125,7 +2137,7 @@ done:
 }
 
 /* The pirq should have been unbound before this call. */
-int unmap_domain_pirq(struct domain *d, int pirq)
+int unmap_domain_pirq_force(struct domain *d, int pirq, bool force_unmap)
 {
     unsigned long flags;
     struct irq_desc *desc;
@@ -2144,6 +2156,14 @@ int unmap_domain_pirq(struct domain *d, int pirq)
     info = pirq_info(d, pirq);
     if ( !info || (irq = info->arch.irq) <= 0 )
     {
+        if ( info && info->force_unmapped )
+        {
+            info->force_unmapped = false;
+            pirq_cleanup_check(info, d);
+            ret = 0;
+            goto done;
+        }
+
         dprintk(XENLOG_G_ERR, "dom%d: pirq %d not mapped\n",
                 d->domain_id, pirq);
         ret = -EINVAL;
@@ -2170,7 +2190,8 @@ int unmap_domain_pirq(struct domain *d, int pirq)
     if ( ret )
         goto done;
 
-    forced_unbind = pirq_guest_force_unbind(d, info);
+    pirq_guest_force_unbind(d, info);
+    forced_unbind = info->force_unbound;
     if ( forced_unbind )
         dprintk(XENLOG_G_WARNING, "dom%d: forcing unbind of pirq %d\n",
                 d->domain_id, pirq);
@@ -2184,6 +2205,9 @@ int unmap_domain_pirq(struct domain *d, int pirq)
     {
         BUG_ON(irq != domain_pirq_to_irq(d, pirq + i));
 
+        if ( force_unmap )
+            info->force_unmapped = true;
+
         if ( !forced_unbind )
             clear_domain_irq_pirq(d, irq, info);
         else
@@ -2261,6 +2285,12 @@ int unmap_domain_pirq(struct domain *d, int pirq)
     return ret;
 }
 
+/* The pirq should have been unbound before this call. */
+int unmap_domain_pirq(struct domain *d, int pirq)
+{
+    return unmap_domain_pirq_force(d, pirq, false);
+}
+
 void free_domain_pirqs(struct domain *d)
 {
     int i;
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index a6eb8a4..2ec28a9 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -619,6 +619,42 @@ int pt_irq_create_bind(
     return 0;
 }
 
+static void pt_irq_destroy_bind_common(struct domain *d, struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
+         list_empty(&pirq_dpci->digl_list) )
+    {
+        pirq_guest_unbind(d, pirq);
+        msixtbl_pt_unregister(d, pirq);
+        if ( pt_irq_need_timer(pirq_dpci->flags) )
+            kill_timer(&pirq_dpci->timer);
+        pirq_dpci->flags = 0;
+        /*
+         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
+         * call to pt_pirq_softirq_reset.
+         */
+        pt_pirq_softirq_reset(pirq_dpci);
+
+        pirq_cleanup_check(pirq, d);
+    }
+}
+
+void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
+{
+    const struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq);
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    if ( pirq_dpci && pirq_dpci->gmsi.posted )
+        pi_update_irte(NULL, pirq, 0);
+
+    pt_irq_destroy_bind_common(d, pirq);
+}
+
 int pt_irq_destroy_bind(
     struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
 {
@@ -650,14 +686,15 @@ int pt_irq_destroy_bind(
         struct irq_desc *desc = domain_spin_lock_irq_desc(d, machine_gsi,
                                                           &flags);
 
-        if ( !desc )
-            return -EINVAL;
-        /*
-         * Leave the MSI masked, so that the state when calling
-         * pt_irq_create_bind is consistent across bind/unbinds.
-         */
-        guest_mask_msi_irq(desc, true);
-        spin_unlock_irqrestore(&desc->lock, flags);
+        if ( desc )
+        {
+            /*
+             * Leave the MSI masked, so that the state when calling
+             * pt_irq_create_bind is consistent across bind/unbinds.
+             */
+            guest_mask_msi_irq(desc, true);
+            spin_unlock_irqrestore(&desc->lock, flags);
+        }
         break;
     }
 
@@ -676,6 +713,13 @@ int pt_irq_destroy_bind(
     }
 
     pirq = pirq_info(d, machine_gsi);
+    if ( pirq->force_unbound )
+    {
+        pirq_guest_unbind(d, pirq);
+        pirq_cleanup_check(pirq, d);
+        spin_unlock(&d->event_lock);
+        return 0;
+    }
     pirq_dpci = pirq_dpci(pirq);
 
     if ( hvm_irq_dpci && pt_irq_bind->irq_type != PT_IRQ_TYPE_MSI )
@@ -727,26 +771,11 @@ int pt_irq_destroy_bind(
         }
         else
             what = "bogus";
-    }
-    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
-        pi_update_irte(NULL, pirq, 0);
 
-    if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
-         list_empty(&pirq_dpci->digl_list) )
-    {
-        pirq_guest_unbind(d, pirq);
-        msixtbl_pt_unregister(d, pirq);
-        if ( pt_irq_need_timer(pirq_dpci->flags) )
-            kill_timer(&pirq_dpci->timer);
-        pirq_dpci->flags = 0;
-        /*
-         * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the
-         * call to pt_pirq_softirq_reset.
-         */
-        pt_pirq_softirq_reset(pirq_dpci);
-
-        pirq_cleanup_check(pirq, d);
+        pt_irq_destroy_bind_common(d, pirq);
     }
+    else
+        pt_irq_destroy_bind_msi(d, pirq);
 
     spin_unlock(&d->event_lock);
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 93c20b9..a347806 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1514,6 +1514,63 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     return rc;
 }
 
+/*
+ * Unmap established mappings between domain's pirq and device's MSI.
+ * These mappings were set up by qemu/guest and are expected to be
+ * destroyed when changing the device's ownership.
+ */
+static int pci_unmap_msi(struct pci_dev *pdev)
+{
+    struct msi_desc *entry, *tmp;
+    struct domain *d = pdev->domain;
+    int ret = 0;
+
+    ASSERT(pcidevs_locked());
+    ASSERT(d);
+
+    spin_lock(&d->event_lock);
+    list_for_each_entry_safe(entry, tmp, &pdev->msi_list, list)
+    {
+        int ret, pirq = 0;
+        unsigned int nr = entry->msi_attrib.type != PCI_CAP_ID_MSIX
+                          ? entry->msi.nvec : 1;
+
+        while ( nr-- )
+        {
+            struct pirq *info;
+            struct hvm_pirq_dpci *pirq_dpci;
+
+            pirq = domain_irq_to_pirq(d, entry[nr].irq);
+            WARN_ON(pirq < 0);
+            if ( pirq <= 0 )
+                continue;
+
+            info = pirq_info(d, pirq);
+            if ( !info )
+                continue;
+
+            pirq_dpci = pirq_dpci(info);
+            if ( pirq_dpci &&
+                 (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
+                 (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) )
+            {
+                info->force_unbound = true;
+                pt_irq_destroy_bind_msi(d, info);
+            }
+        }
+
+        if ( pirq > 0 )
+        {
+            ret = unmap_domain_pirq_force(d, pirq, true);
+            if ( ret )
+                break;
+        }
+    }
+    spin_unlock(&d->event_lock);
+
+    return ret;
+}
+
 /* caller should hold the pcidevs_lock */
 int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 {
@@ -1529,6 +1586,10 @@ int deassign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
     if ( !pdev )
         return -ENODEV;
 
+    ret = pci_unmap_msi(pdev);
+    if ( ret )
+        return ret;
+
     while ( pdev->phantom_stride )
     {
         devfn += pdev->phantom_stride;
diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
index 4b39997..b74c976 100644
--- a/xen/include/asm-x86/irq.h
+++ b/xen/include/asm-x86/irq.h
@@ -140,6 +140,7 @@ int pirq_shared(struct domain *d , int irq);
 int map_domain_pirq(struct domain *d, int pirq, int irq, int type,
                            void *data);
 int unmap_domain_pirq(struct domain *d, int pirq);
+int unmap_domain_pirq_force(struct domain *d, int pirq, bool force);
 int get_free_pirq(struct domain *d, int type);
 int get_free_pirqs(struct domain *, unsigned int nr);
 void free_domain_pirqs(struct domain *d);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index cdc8021..76b186a 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -151,8 +151,12 @@ struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
 void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci);
 #ifdef CONFIG_HVM
 bool pt_irq_need_timer(uint32_t flags);
+void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq);
 #else
 static inline bool pt_irq_need_timer(unsigned int flags) { return false; }
+static inline void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq)
+{
+}
 #endif
 
 struct msi_desc;
diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
index 586b783..1892875 100644
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -128,7 +128,14 @@ struct vcpu;
 struct pirq {
     int pirq;
     u16 evtchn;
-    bool_t masked;
+    bool masked;
+    /*
+     * Set when forcibly unmapped or unbound by Xen. Subsequent unmapping
+     * and unbinding request from qemu or pciback would clear these flags
+     * to reap this pirq.
+     */
+    bool force_unmapped;
+    bool force_unbound;
     struct rcu_head rcu_head;
     struct arch_pirq arch;
 };
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [PATCH v6 2/3] libxl: don't reset device when it is accessible by the guest
  2019-01-25  8:26 [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
@ 2019-01-25  8:27 ` Chao Gao
  2019-01-25  8:27 ` [PATCH v6 3/3] xen/pt: initialize 'warned' field of arch_msix properly Chao Gao
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Chao Gao @ 2019-01-25  8:27 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Ian Jackson, Wei Liu, Chao Gao

When I destroyed a guest with 'xl destroy', I found that the warning
in msi_set_mask_bit() in Xen was triggered. After adding "WARN_ON(1)"
at that place, I got the call trace below:

(XEN) Xen call trace:
(XEN)    [<ffff82d080281a6a>] msi.c#msi_set_mask_bit+0x1da/0x29b
(XEN)    [<ffff82d080282e78>] guest_mask_msi_irq+0x1c/0x1e
(XEN)    [<ffff82d08030ceb9>] vmsi.c#msixtbl_write+0x173/0x1d4
(XEN)    [<ffff82d08030cf30>] vmsi.c#_msixtbl_write+0x16/0x18
(XEN)    [<ffff82d0802ffac4>] hvm_process_io_intercept+0x216/0x270
(XEN)    [<ffff82d0802ffb45>] hvm_io_intercept+0x27/0x4c
(XEN)    [<ffff82d0802f0e86>] emulate.c#hvmemul_do_io+0x273/0x454
(XEN)    [<ffff82d0802f10a4>] emulate.c#hvmemul_do_io_buffer+0x3d/0x70
(XEN)    [<ffff82d0802f2343>] emulate.c#hvmemul_linear_mmio_access+0x35e/0x436
(XEN)    [<ffff82d0802f2640>] emulate.c#linear_write+0xdd/0x13b
(XEN)    [<ffff82d0802f3f25>] emulate.c#hvmemul_write+0xbd/0xf1
(XEN)    [<ffff82d0802d51df>] x86_emulate+0x2249d/0x23c5c
(XEN)    [<ffff82d0802d861f>] x86_emulate_wrapper+0x2b/0x5f
(XEN)    [<ffff82d0802f28aa>] emulate.c#_hvm_emulate_one+0x54/0x1b2
(XEN)    [<ffff82d0802f2a18>] hvm_emulate_one+0x10/0x12
(XEN)    [<ffff82d080300227>] hvm_emulate_one_insn+0x42/0x14a
(XEN)    [<ffff82d08030037e>] handle_mmio_with_translation+0x4f/0x51
(XEN)    [<ffff82d0802f803b>] hvm_hap_nested_page_fault+0x16c/0x6d8
(XEN)    [<ffff82d08032446a>] vmx_vmexit_handler+0x19b0/0x1f2e
(XEN)    [<ffff82d08032995a>] vmx_asm_vmexit_handler+0xfa/0x270

It seems that the guest is trying to mask an MSI while memory decoding
of the device is disabled. Performing a device reset without a proper
method to suppress the guest's MSI-X operations leads to this issue.

The fix is simple: detach the PCI device before resetting it.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_pci.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 87afa03..855fb71 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1459,17 +1459,17 @@ skip1:
         fclose(f);
     }
 out:
-    /* don't do multiple resets while some functions are still passed through */
-    if ( (pcidev->vdevfn & 0x7) == 0 ) {
-        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
-    }
-
     if (!isstubdom) {
         rc = xc_deassign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
         if (rc < 0 && (hvm || errno != ENOSYS))
             LOGED(ERROR, domainid, "xc_deassign_device failed");
     }
 
+    /* don't do multiple resets while some functions are still passed through */
+    if ( (pcidev->vdevfn & 0x7) == 0 ) {
+        libxl__device_pci_reset(gc, pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+    }
+
     stubdomid = libxl_get_stubdom_id(ctx, domid);
     if (stubdomid != 0) {
         libxl_device_pci pcidev_s = *pcidev;
-- 
1.8.3.1




* [PATCH v6 3/3] xen/pt: initialize 'warned' field of arch_msix properly
  2019-01-25  8:26 [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
  2019-01-25  8:27 ` [PATCH v6 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
@ 2019-01-25  8:27 ` Chao Gao
  2019-01-25  9:36 ` [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot Roger Pau Monné
  2019-01-25 16:13 ` Jan Beulich
  3 siblings, 0 replies; 9+ messages in thread
From: Chao Gao @ 2019-01-25  8:27 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Wei Liu, Andrew Cooper, Jan Beulich, Chao Gao,
	Roger Pau Monné

Also clean up the current code by moving the initialization of
arch-specific fields out of common code.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes in v5:
 - rename init_arch_msix to arch_init_msix
 - place arch_init_msix right after the definition of arch_msix

Changes in v4:
 - newly added
---
 xen/drivers/passthrough/pci.c | 2 +-
 xen/include/asm-x86/msi.h     | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index a347806..a56929a 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -367,7 +367,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
             xfree(pdev);
             return NULL;
         }
-        spin_lock_init(&msix->table_lock);
+        arch_init_msix(msix);
         pdev->msix = msix;
     }
 
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 10387dc..7b13c07 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -242,6 +242,12 @@ struct arch_msix {
     domid_t warned;
 };
 
+static inline void arch_init_msix(struct arch_msix *msix)
+{
+    spin_lock_init(&msix->table_lock);
+    msix->warned = DOMID_INVALID;
+}
+
 void early_msi_init(void);
 void msi_compose_msg(unsigned vector, const cpumask_t *mask,
                      struct msi_msg *msg);
-- 
1.8.3.1




* Re: [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-25  8:26 [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
  2019-01-25  8:27 ` [PATCH v6 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
  2019-01-25  8:27 ` [PATCH v6 3/3] xen/pt: initialize 'warned' field of arch_msix properly Chao Gao
@ 2019-01-25  9:36 ` Roger Pau Monné
  2019-01-25 11:36   ` Chao Gao
  2019-01-25 16:13 ` Jan Beulich
  3 siblings, 1 reply; 9+ messages in thread
From: Roger Pau Monné @ 2019-01-25  9:36 UTC (permalink / raw)
  To: Chao Gao
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, Andrew Cooper, xen-devel

Thanks for the patch!

On Fri, Jan 25, 2019 at 04:26:59PM +0800, Chao Gao wrote:
> I find some pass-thru devices don't work any more across guest
> reboot. Assigning it to another domain also meets the same issue. And
> the only way to make it work again is un-binding and binding it to
> pciback. Someone reported this issue one year ago [1].
> 
> If the device's driver doesn't disable MSI-X during shutdown or qemu is
> killed/crashed before the domain shutdown, this domain's pirq won't be
> unmapped. Then xen takes over this work, unmapping all pirq-s, when
> destroying guest. But as pciback has already disabled meory decoding before
> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
                                 ^ set
> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
                            ^ setting
> this process is:
> 
> ->arch_domain_destroy
>     ->free_domain_pirqs
>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>             ->pirq_guest_force_unbind
>                 ->__pirq_guest_unbind
>                     ->mask_msi_irq(=desc->handler->disable())
>                         ->the warning in msi_set_mask_bit()
> 
> The host_maskall bit will prevent guests from clearing the maskall bit
> even the device is assigned to another guest later. Then guests cannot
> receive MSIs from this device.
> 
> To fix this issue, a pirq is unmapped before memory decoding is disabled by
> pciback. Specifically, when a device is detached from a guest, all established
> mappings between pirq and msi are destroying before changing the ownership.
                                    ^ destroyed
> 
> With this behavior, qemu and pciback are not aware of the forcibly unbindng
                                                                     ^ unbinding
> and unmapping done by Xen. As a result, the state of pirq maintained by Xen and
> pciback/qemu becomes inconsistent. Particularly for hot-plug/hot-unplug case,
> guests stay alive; such inconsistency may cause other issues. To resolve
> this inconsistency and keep compatibility with current qemu and pciback,
> two flags, force_unmapped and force_unbound are used to denote that a pirq is
> forcibly unmapped or unbound. The flags are set when Xen unbinds or unmaps the
> pirq behind qemu and pciback. And subsequent unbinding or unmapping requests
> from qemu/pciback can clear these flags and free the pirq.

What happens then if qemu/pciback doesn't unbind and/or unmap the
pirqs? Would they be left in a weird state that prevents further
mapping or binding?

I think this is getting quite convoluted, and would like to make sure
it is necessary. The last version triggered some error messages in Linux
due to the unbind/unmap being performed by the hypervisor, but those
were harmless?

I've also suggested returning ESRCH in unmap_domain_pirq when the pirq
is no longer mapped, which would make Linux print a less scary message.

Also, Jan had some questions about where the unbind/unmap requests are
actually coming from; he suggested fixing that in the toolstack if
possible [0]. Have you checked whether this is possible?

Thanks, Roger.

[0] https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg01753.html



* Re: [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-25  9:36 ` [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot Roger Pau Monné
@ 2019-01-25 11:36   ` Chao Gao
  2019-01-25 14:00     ` Roger Pau Monné
  0 siblings, 1 reply; 9+ messages in thread
From: Chao Gao @ 2019-01-25 11:36 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, Andrew Cooper, xen-devel

On Fri, Jan 25, 2019 at 10:36:13AM +0100, Roger Pau Monné wrote:
>Thanks for the patch!
>
>On Fri, Jan 25, 2019 at 04:26:59PM +0800, Chao Gao wrote:
>> I find some pass-thru devices don't work any more across guest
>> reboot. Assigning it to another domain also meets the same issue. And
>> the only way to make it work again is un-binding and binding it to
>> pciback. Someone reported this issue one year ago [1].
>> 
>> If the device's driver doesn't disable MSI-X during shutdown or qemu is
>> killed/crashed before the domain shutdown, this domain's pirq won't be
>> unmapped. Then xen takes over this work, unmapping all pirq-s, when
>> destroying guest. But as pciback has already disabled meory decoding before
>> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
>                                 ^ set
>> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
>                            ^ setting
>> this process is:
>> 
>> ->arch_domain_destroy
>>     ->free_domain_pirqs
>>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
>>             ->pirq_guest_force_unbind
>>                 ->__pirq_guest_unbind
>>                     ->mask_msi_irq(=desc->handler->disable())
>>                         ->the warning in msi_set_mask_bit()
>> 
>> The host_maskall bit will prevent guests from clearing the maskall bit
>> even the device is assigned to another guest later. Then guests cannot
>> receive MSIs from this device.
>> 
>> To fix this issue, a pirq is unmapped before memory decoding is disabled by
>> pciback. Specifically, when a device is detached from a guest, all established
>> mappings between pirq and msi are destroying before changing the ownership.
>                                    ^ destroyed
>> 
>> With this behavior, qemu and pciback are not aware of the forcibly unbindng
>                                                                     ^ unbinding
>> and unmapping done by Xen. As a result, the state of pirq maintained by Xen and
>> pciback/qemu becomes inconsistent. Particularly for hot-plug/hot-unplug case,
>> guests stay alive; such inconsistency may cause other issues. To resolve
>> this inconsistency and keep compatibility with current qemu and pciback,
>> two flags, force_unmapped and force_unbound are used to denote that a pirq is
>> forcibly unmapped or unbound. The flags are set when Xen unbinds or unmaps the
>> pirq behind qemu and pciback. And subsequent unbinding or unmapping requests
>> from qemu/pciback can clear these flags and free the pirq.
>
>What happens then if qemu/pciback doesn't unbind and/or unmap the
>pirqs, they would be left in a weird state that would prevent further
>mapping or binding?

Yes.

>
>I think this is getting quite convoluted, and would like to make sure
>this is necessary. Last version triggered some error messages in Linux
>due to the unbind/unmap being performed by the hypervisor, but those
>where harmless?

Yes. I didn't see anything harmful.

>
>I've also suggested to return ESRCH in unmap_domain_pirq when the pirq
>is no longer mapped which would make Linux print a less scary message.

But, with this version, Qemu/pciback won't complain about anything.

The idea is inspired by the way we handle "force_unbind" in the current
unmap_domain_pirq(). Personally, I think this version is slightly better,
but because it is more complicated and involves more changes, I am less
confident in it.

Thanks
Chao

>
>Also Jan had some questions about where are the unbind/unmap requests
>actually coming from, he suggested to fix that in the toolstack if
>possible [0], have you checked whether this is possible?
>
>Thanks, Roger.
>
>[0] https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg01753.html



* Re: [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-25 11:36   ` Chao Gao
@ 2019-01-25 14:00     ` Roger Pau Monné
  0 siblings, 0 replies; 9+ messages in thread
From: Roger Pau Monné @ 2019-01-25 14:00 UTC (permalink / raw)
  To: Chao Gao
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, Andrew Cooper, xen-devel

On Fri, Jan 25, 2019 at 07:36:36PM +0800, Chao Gao wrote:
> On Fri, Jan 25, 2019 at 10:36:13AM +0100, Roger Pau Monné wrote:
> >Thanks for the patch!
> >
> >On Fri, Jan 25, 2019 at 04:26:59PM +0800, Chao Gao wrote:
> >> I find some pass-thru devices don't work any more across guest
> >> reboot. Assigning it to another domain also meets the same issue. And
> >> the only way to make it work again is un-binding and binding it to
> >> pciback. Someone reported this issue one year ago [1].
> >> 
> >> If the device's driver doesn't disable MSI-X during shutdown or qemu is
> >> killed/crashed before the domain shutdown, this domain's pirq won't be
> >> unmapped. Then xen takes over this work, unmapping all pirq-s, when
> >> destroying guest. But as pciback has already disabled memory decoding before
> >> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit
> >                                 ^ set
> >> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of
> >                            ^ setting
> >> this process is:
> >> 
> >> ->arch_domain_destroy
> >>     ->free_domain_pirqs
> >>         ->unmap_domain_pirq (if pirq isn't unmapped by qemu)
> >>             ->pirq_guest_force_unbind
> >>                 ->__pirq_guest_unbind
> >>                     ->mask_msi_irq(=desc->handler->disable())
> >>                         ->the warning in msi_set_mask_bit()
> >> 
> >> The host_maskall bit will prevent guests from clearing the maskall bit
> >> even the device is assigned to another guest later. Then guests cannot
> >> receive MSIs from this device.
> >> 
> >> To fix this issue, a pirq is unmapped before memory decoding is disabled by
> >> pciback. Specifically, when a device is detached from a guest, all established
> >> mappings between pirq and msi are destroying before changing the ownership.
> >                                    ^ destroyed
> >> 
> >> With this behavior, qemu and pciback are not aware of the forcibly unbindng
> >                                                                     ^ unbinding
> >> and unmapping done by Xen. As a result, the state of pirq maintained by Xen and
> >> pciback/qemu becomes inconsistent. Particularly for hot-plug/hot-unplug case,
> >> guests stay alive; such inconsistency may cause other issues. To resolve
> >> this inconsistency and keep compatibility with current qemu and pciback,
> >> two flags, force_unmapped and force_unbound are used to denote that a pirq is
> >> forcibly unmapped or unbound. The flags are set when Xen unbinds or unmaps the
> >> pirq behind qemu and pciback. And subsequent unbinding or unmapping requests
> >> from qemu/pciback can clear these flags and free the pirq.
> >
> >What happens then if qemu/pciback doesn't unbind and/or unmap the
> >pirqs, they would be left in a weird state that would prevent further
> >mapping or binding?
> 
> Yes.

Then I think I would prefer the previous version, with the return value
of the unmap_domain_pirq check "if ( !info || (irq = info->arch.irq) <= 0
)" adjusted to ESRCH. It's less convoluted, and the Linux message in
that case is just informative.
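
(The shape of that adjustment, as a simplified stand-alone sketch with
stub types; not the real Xen code:)

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stub types standing in for Xen's pirq bookkeeping. */
struct arch_lookup { int irq; };
struct pirq_lookup { struct arch_lookup arch; };

/*
 * Sketch of the check quoted above, returning -ESRCH when the pirq is
 * no longer mapped, so the caller (and ultimately Linux) can treat it
 * as "already gone" rather than as a hard failure.
 */
static int unmap_check(struct pirq_lookup *info)
{
    int irq;

    if ( !info || (irq = info->arch.irq) <= 0 )
        return -ESRCH;     /* pirq already unmapped */
    return irq;            /* stand-in for proceeding with the unmap */
}
```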

> >
> >I think this is getting quite convoluted, and would like to make sure
> >this is necessary. Last version triggered some error messages in Linux
> >due to the unbind/unmap being performed by the hypervisor, but those
> >where harmless?
> 
> Yes. I didn't see anything harmful.
> 
> >
> >I've also suggested to return ESRCH in unmap_domain_pirq when the pirq
> >is no longer mapped which would make Linux print a less scary message.
> 
> But, with this version, Qemu/pciback won't complain about anything.
> 
> The idea is inspired by the way we handle "force_unbind" in current
> unmap_domain_pirq. Personally, I think this version is slightly better.
> But because it is more complicated and involves more changes, I am less
> confident in this version.

Let's wait for Jan's opinion, but as said above I prefer the previous
version.

Thanks, Roger.



* Re: [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-25  8:26 [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
                   ` (2 preceding siblings ...)
  2019-01-25  9:36 ` [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot Roger Pau Monné
@ 2019-01-25 16:13 ` Jan Beulich
  2019-01-28 12:03   ` Chao Gao
  3 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2019-01-25 16:13 UTC (permalink / raw)
  To: Chao Gao
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, xen-devel, Roger Pau Monne

>>> On 25.01.19 at 09:26, <chao.gao@intel.com> wrote:
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -732,7 +732,11 @@ long arch_do_domctl(
>              break;
>  
>          ret = -EPERM;
> -        if ( irq <= 0 || !irq_access_permitted(currd, irq) )
> +        /*
> +         * irq < 0 denotes the corresponding pirq has been forcibly unbound.
> +         * For this case, bypass permission check to reap the pirq.
> +         */
> +        if ( !irq || ((irq > 0) && !irq_access_permitted(currd, irq)) )
>              break;

So why would it be correct to continue into pt_irq_destroy_bind()
with irq < 0? And with an actual XSM policy I'm not sure you'd
even make it past xsm_unbind_pt_irq(). If the IRQ was forcibly
unbound before, there shouldn't be anything left to clean up?

On the whole I think all the extra additions in v6 only serve to
mask the tool stack not needing to do anymore some of what it
does, as suggested in a reply to an earlier version. So I guess I
agree with Roger that v5 came closer, but may need to be
amended by some tool stack adjustment(s).

Jan





* Re: [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-25 16:13 ` Jan Beulich
@ 2019-01-28 12:03   ` Chao Gao
  2019-01-29 10:14     ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Chao Gao @ 2019-01-28 12:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, xen-devel, Roger Pau Monne

On Fri, Jan 25, 2019 at 09:13:49AM -0700, Jan Beulich wrote:
>>>> On 25.01.19 at 09:26, <chao.gao@intel.com> wrote:
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -732,7 +732,11 @@ long arch_do_domctl(
>>              break;
>>  
>>          ret = -EPERM;
>> -        if ( irq <= 0 || !irq_access_permitted(currd, irq) )
>> +        /*
>> +         * irq < 0 denotes the corresponding pirq has been forcibly unbound.
>> +         * For this case, bypass permission check to reap the pirq.
>> +         */
>> +        if ( !irq || ((irq > 0) && !irq_access_permitted(currd, irq)) )
>>              break;
>
>So why would it be correct to continue into pt_irq_destroy_bind()
>with irq < 0?  And with an actual XSM policy I'm not sure you'd
>even make it past xsm_unbind_pt_irq(). If the IRQ was forcibly
>unbound before, there shouldn't be anything left to clean up?

But some hints are left to denote that a pirq was forcibly unbound.
See the code snippets below:

''' in unmap_domain_pirq()
        ...
        if ( !forced_unbind )
            clear_domain_irq_pirq(d, irq, info);
        else
        {
            info->arch.irq = -irq;
            radix_tree_replace_slot(
                radix_tree_lookup_slot(&d->arch.irq_pirq, irq),
                radix_tree_int_to_ptr(-pirq));
        }
        ...
'''

and 

''' in pirq_guest_unbind()
    ...
    if ( desc == NULL )
    {
        irq = -pirq->arch.irq;
        BUG_ON(irq <= 0);
        desc = irq_to_desc(irq);
        spin_lock_irq(&desc->lock);
        clear_domain_irq_pirq(d, irq, pirq);
    }
    ...
'''
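
(The negation trick shown in those two snippets, storing -irq to mark a
forced unbind and negating again on the later cleanup, can be
demonstrated in isolation. Illustrative sketch only:)

```c
#include <assert.h>

/*
 * Isolated demo of the encoding used above: a positive value is a
 * live irq; its negation marks "forcibly unbound, awaiting reaping".
 */
static int mark_force_unbound(int irq)
{
    return -irq;                /* mirrors info->arch.irq = -irq */
}

static int reap_force_unbound(int stored)
{
    int irq = -stored;          /* mirrors irq = -pirq->arch.irq */
    assert(irq > 0);            /* mirrors BUG_ON(irq <= 0) */
    return irq;
}
```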

>
>On the whole I think all the extra additions in v6 only serve to
>mask the tool stack not needing to do anymore some of what it
>does, as suggested in a reply to an earlier version. So I guess I
>agree with Roger that v5 came closer, but may need to be
>amended by some tool stack adjustment(s).

Yes. What v6 tries to mask here is the irq unbinding and irq unmapping
invoked by qemu and pciback. We would need to fix them in pciback and
qemu rather than in the tool stack. If we want to leave those error
messages alone, we can just take Roger's suggestion. Otherwise, we
should try to conditionally bypass irq unbinding and irq unmapping in
pciback and qemu and justify it.

Thanks
Chao



* Re: [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot
  2019-01-28 12:03   ` Chao Gao
@ 2019-01-29 10:14     ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2019-01-29 10:14 UTC (permalink / raw)
  To: chao.gao
  Cc: Juergen Gross, sstabellini, wei.liu2, konrad.wilk, George.Dunlap,
	andrew.cooper3, Ian.Jackson, tim, julien.grall, xen-devel,
	roger.pau

>>> Chao Gao <chao.gao@intel.com> 01/28/19 12:59 PM >>>
>On Fri, Jan 25, 2019 at 09:13:49AM -0700, Jan Beulich wrote:
>>>>> On 25.01.19 at 09:26, <chao.gao@intel.com> wrote:
>>> --- a/xen/arch/x86/domctl.c
>>> +++ b/xen/arch/x86/domctl.c
>>> @@ -732,7 +732,11 @@ long arch_do_domctl(
>>>              break;
>>>  
>>>          ret = -EPERM;
>>> -        if ( irq <= 0 || !irq_access_permitted(currd, irq) )
>>> +        /*
>>> +         * irq < 0 denotes the corresponding pirq has been forcibly unbound.
>>> +         * For this case, bypass permission check to reap the pirq.
>>> +         */
>>> +        if ( !irq || ((irq > 0) && !irq_access_permitted(currd, irq)) )
>>>              break;
>>
>>So why would it be correct to continue into pt_irq_destroy_bind()
>>with irq < 0?  And with an actual XSM policy I'm not sure you'd
>>even make it past xsm_unbind_pt_irq(). If the IRQ was forcibly
>>unbound before, there shouldn't be anything left to clean up?
>
>But some hints are left to denote that a pirq was forcibly unbound.
>See the code snippets below:
>
>''' in unmap_domain_pirq()
>        ...
>        if ( !forced_unbind )
>            clear_domain_irq_pirq(d, irq, info);
>        else
>        {
>            info->arch.irq = -irq;
>            radix_tree_replace_slot(
>                radix_tree_lookup_slot(&d->arch.irq_pirq, irq),
>                radix_tree_int_to_ptr(-pirq));
>        }
>        ...
>'''
>
>and
>
>''' in pirq_guest_unbind()
>    ...
>    if ( desc == NULL )
>    {
>        irq = -pirq->arch.irq;
>        BUG_ON(irq <= 0);
>        desc = irq_to_desc(irq);
>        spin_lock_irq(&desc->lock);
>        clear_domain_irq_pirq(d, irq, pirq);
>    }
>    ...
>'''

All understood, but all this new special casing is what we'd like to avoid if
at all possible. Hence the desire to go back to what you had before,
dealing with the error messages that you've observed in another way.


>>On the whole I think all the extra additions in v6 only serve to
>>mask the tool stack not needing to do anymore some of what it
>>does, as suggested in a reply to an earlier version. So I guess I
>>agree with Roger that v5 came closer, but may need to be
>>amended by some tool stack adjustment(s).
>
>Yes. What v6 tries to mask here is irq unbinding and irq unmapping invoked
>by qemu and pciback. We need to fix them in pciback and qemu rather than
>tool stack. If we want to leave those error messages alone, we can just
>take Roger's suggestion. Otherwise, we should try to conditional bypass
>irq unbinding and irq unmapping in pciback and qemu and justify it.

FAOD, as said before, pciback and even more so qemu should be left alone
if at all possible. In particular for qemu, if anything I think the libxc
functions used by qemu could be short-circuited, but qemu itself had
better not require any change. For pciback it depends on whether things
can be handled transparently in Xen. Accepting error messages to be
issued is imo only a last-resort option.
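
(A rough sketch of what such a libxc-level short circuit might look
like. The function name and the -ESRCH convention are assumptions for
this illustration, not an actual libxc API:)

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical wrapper illustrating the point above: swallow "pirq
 * already gone" at the libxc layer so qemu itself needs no change.
 */
static int xc_unmap_pirq_tolerant(int xc_rc)
{
    if (xc_rc == -ESRCH)   /* Xen says the pirq was already unmapped */
        return 0;          /* report success to the qemu caller */
    return xc_rc;          /* propagate any real error */
}
```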


Jan






Thread overview: 9+ messages
2019-01-25  8:26 [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot Chao Gao
2019-01-25  8:27 ` [PATCH v6 2/3] libxl: don't reset device when it is accessible by the guest Chao Gao
2019-01-25  8:27 ` [PATCH v6 3/3] xen/pt: initialize 'warned' field of arch_msix properly Chao Gao
2019-01-25  9:36 ` [PATCH v6 1/3] xen/pt: fix some pass-thru devices don't work across reboot Roger Pau Monné
2019-01-25 11:36   ` Chao Gao
2019-01-25 14:00     ` Roger Pau Monné
2019-01-25 16:13 ` Jan Beulich
2019-01-28 12:03   ` Chao Gao
2019-01-29 10:14     ` Jan Beulich
