* [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list
@ 2017-03-29  5:11 Chao Gao
  2017-03-29  5:11 ` [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Chao Gao @ 2017-03-29  5:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

The current VT-d PI related code may operate incorrectly in the
following scenarios:
1. When VT-d PI is enabled, a pirq that is delivered through VT-d PI needn't
be migrated during vCPU migration. Patch [1/6] solves this by introducing a new
flag to indicate that the pt-irq is delivered through VT-d PI.

2. msi_msg_to_remap_entry() is buggy when the live IRTE is in posted format.
It wrongly inherits the 'im' field from the live IRTE but updates all the
other fields to remapping format. Patch [2/6] handles this.

3. Patch [3/6] is a cleanup patch.

4. When a pCPU is unplugged, there might still be vCPUs on its blocking
list. Since the pCPU is offline, those vCPUs might never be woken
up again. Patch [4/6] addresses this.

5. The IRTE is updated through structure assignment, which is unsafe in some
cases. To resolve this, Patch [5/6] provides two variants, atomic and
non-atomic, to update the IRTE, and a BUG() is raised when the caller's
atomicity requirement can't be met.

6. We didn't change the IRTE to remapping format when a pt-irq was configured
to have multiple destination vCPUs. Patch [6/6] resolves this problem.


Chao Gao (4):
  passthrough: don't migrate pirq when it is delivered through VT-d PI
  VT-d: Introduce new fields in msi_desc to track binding with guest
    interrupt
  VT-d: introduce update_irte to update irte safely
  passthrough/io: Fall back to remapping interrupt when we can't use
    VT-d PI

Feng Wu (2):
  VT-d: Some cleanups
  VMX: Fixup PI descriptor when cpu is offline

 xen/arch/x86/hvm/hvm.c                 |   3 +
 xen/arch/x86/hvm/vmx/vmcs.c            |   1 +
 xen/arch/x86/hvm/vmx/vmx.c             |  70 ++++++++++
 xen/arch/x86/msi.c                     |   2 +
 xen/drivers/passthrough/io.c           |  71 ++--------
 xen/drivers/passthrough/vtd/intremap.c | 236 +++++++++++++++------------------
 xen/include/asm-x86/hvm/vmx/vmx.h      |   1 +
 xen/include/asm-x86/msi.h              |   3 +
 xen/include/xen/hvm/irq.h              |   1 +
 9 files changed, 204 insertions(+), 184 deletions(-)

-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-03-29  5:11 [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
@ 2017-03-29  5:11 ` Chao Gao
  2017-03-31  5:28   ` Tian, Kevin
  2017-03-31  9:31   ` Jan Beulich
  2017-03-29  5:11 ` [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Chao Gao @ 2017-03-29  5:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Chao Gao

When a vCPU migrates to another pCPU, pt irqs bound to this vCPU also need
migration. However, when VT-d PI is enabled, the interrupt vector is recorded
in a main-memory-resident data structure and a notification, whose destination
is decided by NDST, is generated. NDST is properly adjusted during vCPU
migration, so a pirq directly injected to the guest needn't be migrated.

This patch adds an indicator, @posted, to show whether the pt irq is delivered
through VT-d PI.
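As a stand-alone sketch of the condition hvm_migrate_pirq() checks after this
patch (flag values and the structure below are illustrative, not Xen's real
definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the names come from the patch. */
#define HVM_IRQ_DPCI_MACH_MSI  (1u << 0)
#define HVM_IRQ_DPCI_GUEST_MSI (1u << 1)

struct gmsi_info {
    int dest_vcpu_id;
    bool posted;        /* delivered directly to the guest via VT-d PI? */
};

/*
 * A pirq needs software migration only when it is a guest MSI bound to
 * the migrating vCPU and is NOT posted: for posted interrupts the
 * hardware follows NDST in the PI descriptor, which vCPU migration
 * already updates.
 */
static bool pirq_needs_migration(uint32_t flags,
                                 const struct gmsi_info *gmsi, int vcpu_id)
{
    return (flags & HVM_IRQ_DPCI_MACH_MSI) &&
           (flags & HVM_IRQ_DPCI_GUEST_MSI) &&
           !gmsi->posted &&
           gmsi->dest_vcpu_id == vcpu_id;
}
```

The 'posted' indicator thus short-circuits the migration path rather than
changing how the interrupt is delivered.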

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v11:
- rename the indicator to 'posted'
- move the setting of the 'posted' field outside the region protected by the event lock.

v10:
- Newly added.

 xen/arch/x86/hvm/hvm.c       |  3 +++
 xen/drivers/passthrough/io.c | 62 +++++++++-----------------------------------
 xen/include/xen/hvm/irq.h    |  1 +
 3 files changed, 16 insertions(+), 50 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0282986..2d8de16 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     struct vcpu *v = arg;
 
     if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
+         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
+         /* Needn't migrate pirq if this pirq is delivered to guest directly. */
+         (!pirq_dpci->gmsi.posted) &&
          (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
     {
         struct irq_desc *desc =
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 080183e..d53976c 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -259,52 +259,6 @@ static struct vcpu *vector_hashing_dest(const struct domain *d,
     return dest;
 }
 
-/*
- * The purpose of this routine is to find the right destination vCPU for
- * an interrupt which will be delivered by VT-d posted-interrupt. There
- * are several cases as below:
- *
- * - For lowest-priority interrupts, use vector-hashing mechanism to find
- *   the destination.
- * - Otherwise, for single destination interrupt, it is straightforward to
- *   find the destination vCPU and return true.
- * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
- *   so return NULL.
- */
-static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t dest_id,
-                                      bool_t dest_mode, uint8_t delivery_mode,
-                                      uint8_t gvec)
-{
-    unsigned int dest_vcpus = 0;
-    struct vcpu *v, *dest = NULL;
-
-    switch ( delivery_mode )
-    {
-    case dest_LowestPrio:
-        return vector_hashing_dest(d, dest_id, dest_mode, gvec);
-    case dest_Fixed:
-        for_each_vcpu ( d, v )
-        {
-            if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
-                                    dest_id, dest_mode) )
-                continue;
-
-            dest_vcpus++;
-            dest = v;
-        }
-
-        /* For fixed mode, we only handle single-destination interrupts. */
-        if ( dest_vcpus == 1 )
-            return dest;
-
-        break;
-    default:
-        break;
-    }
-
-    return NULL;
-}
-
 int pt_irq_create_bind(
     struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
 {
@@ -365,6 +319,7 @@ int pt_irq_create_bind(
     {
         uint8_t dest, dest_mode, delivery_mode;
         int dest_vcpu_id;
+        const struct vcpu *vcpu;
 
         if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         {
@@ -442,17 +397,24 @@ int pt_irq_create_bind(
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
         spin_unlock(&d->event_lock);
+
+        pirq_dpci->gmsi.posted = false;
+        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
+        if ( iommu_intpost && (delivery_mode == dest_LowestPrio) )
+        {
+            vcpu = vector_hashing_dest(d, dest, dest_mode,
+                                       pirq_dpci->gmsi.gvec);
+            if ( vcpu )
+                pirq_dpci->gmsi.posted = true;
+        }
         if ( dest_vcpu_id >= 0 )
             hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
 
         /* Use interrupt posting if it is supported. */
         if ( iommu_intpost )
         {
-            const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest, dest_mode,
-                                          delivery_mode, pirq_dpci->gmsi.gvec);
-
             if ( vcpu )
-                pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
+                pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
             else
                 dprintk(XENLOG_G_INFO,
                         "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index d3f8623..566854a 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -63,6 +63,7 @@ struct hvm_gmsi_info {
     uint32_t gvec;
     uint32_t gflags;
     int dest_vcpu_id; /* -1 :multi-dest, non-negative: dest_vcpu_id */
+    bool posted; /* directly deliver to guest via VT-d PI? */
 };
 
 struct hvm_girq_dpci_mapping {
-- 
1.8.3.1



* [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-29  5:11 [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
  2017-03-29  5:11 ` [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
@ 2017-03-29  5:11 ` Chao Gao
  2017-03-31  5:46   ` Tian, Kevin
  2017-03-31  9:48   ` Jan Beulich
  2017-03-29  5:11 ` [PATCH v11 3/6] VT-d: Some cleanups Chao Gao
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Chao Gao @ 2017-03-29  5:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

msi_msg_to_remap_entry() is buggy when the live IRTE is in posted format. It
wrongly inherits the 'im' field, meaning the IRTE stays in posted format, but
updates all the other fields to remapping format.

Two situations lead to the above issue. One is that some callers really want
to change the IRTE to remapped format. The other is that some callers only
want to update the MSI message (e.g. to set MSI affinity) and are not aware
that the MSI is bound to a guest interrupt. We should suppress the update in
the second situation. To distinguish them we could, straightforwardly, let the
caller specify which IRTE format to update to. But that isn't feasible, since
making all callers aware of the binding with a guest interrupt would cause a
far more complicated change (including the interfaces exposed to IOAPIC and
MSI). Also, some calls happen in interrupt context, where we can't acquire
d->event_lock to read struct hvm_pirq_dpci.

This patch introduces two new fields in msi_desc to track the binding with a
guest interrupt, such that msi_msg_to_remap_entry() can get the binding and
update the IRTE accordingly. After this change, pi_update_irte() can use
msi_msg_to_remap_entry() to update the IRTE to posted format.
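A minimal sketch of the idea, under illustrative names (only the pi_desc and
gvec fields come from the patch): the IRTE format is now derived from the
binding recorded in msi_desc rather than from the live IRTE's 'im' bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pi_desc;                     /* opaque posted-interrupt descriptor */

/* The two fields this patch adds to struct msi_desc, sketched here as a
 * stand-alone structure. */
struct msi_binding {
    const struct pi_desc *pi_desc;  /* NULL => not bound to a guest interrupt */
    uint8_t gvec;                   /* guest vector, valid when pi_desc != NULL */
};

enum irte_format { IRTE_REMAPPED, IRTE_POSTED };

/*
 * msi_msg_to_remap_entry() no longer inherits 'im' from the live IRTE: it
 * picks the format purely from the binding, so a plain affinity update on
 * an unbound MSI always produces a remapped-format entry.
 */
static enum irte_format irte_format_for(const struct msi_binding *b)
{
    return b->pi_desc ? IRTE_POSTED : IRTE_REMAPPED;
}
```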

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v11:
- Commit message changes.
- Add code to clear the two new fields introduced in this patch.

v10:
- Newly added.

 xen/arch/x86/msi.c                     |   1 +
 xen/drivers/passthrough/io.c           |   2 +
 xen/drivers/passthrough/vtd/intremap.c | 150 +++++++--------------------------
 xen/include/asm-x86/msi.h              |   2 +
 4 files changed, 36 insertions(+), 119 deletions(-)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index a868007..3374cd4 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -578,6 +578,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int nr)
         entry[nr].dev = NULL;
         entry[nr].irq = -1;
         entry[nr].remap_index = -1;
+        entry[nr].pi_desc = NULL;
     }
 
     return entry;
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index d53976c..5dbfe53 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -618,6 +618,8 @@ int pt_irq_destroy_bind(
         else
             what = "bogus";
     }
+    else if ( iommu_intpost && pirq && pirq_dpci->gmsi.posted )
+        pi_update_irte(NULL, pirq, 0);
 
     if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
          list_empty(&pirq_dpci->digl_list) )
diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index bfd468b..9bbf0f6 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -551,12 +551,12 @@ static int msi_msg_to_remap_entry(
     struct iommu *iommu, struct pci_dev *pdev,
     struct msi_desc *msi_desc, struct msi_msg *msg)
 {
-    struct iremap_entry *iremap_entry = NULL, *iremap_entries;
-    struct iremap_entry new_ire;
+    struct iremap_entry *iremap_entry = NULL, *iremap_entries, new_ire = { };
     struct msi_msg_remap_entry *remap_rte;
     unsigned int index, i, nr = 1;
     unsigned long flags;
     struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
+    const struct pi_desc *pi_desc = msi_desc->pi_desc;
 
     if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
         nr = msi_desc->msi.nvec;
@@ -595,33 +595,35 @@ static int msi_msg_to_remap_entry(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
-
-    /* Set interrupt remapping table entry */
-    new_ire.remap.fpd = 0;
-    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
-    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
-    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
-    /* Hardware require RH = 1 for LPR delivery mode */
-    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
-    new_ire.remap.avail = 0;
-    new_ire.remap.res_1 = 0;
-    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
-                            MSI_DATA_VECTOR_MASK;
-    new_ire.remap.res_2 = 0;
-    if ( x2apic_enabled )
-        new_ire.remap.dst = msg->dest32;
+    if ( !pi_desc )
+    {
+        new_ire.remap.dm = msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT;
+        new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
+        new_ire.remap.dlm = msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT;
+        /* Hardware requires RH = 1 for lowest priority delivery mode */
+        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+        new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
+                                MSI_DATA_VECTOR_MASK;
+        if ( x2apic_enabled )
+            new_ire.remap.dst = msg->dest32;
+        else
+            new_ire.remap.dst =
+                MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK) << 8;
+        new_ire.remap.p = 1;
+    }
     else
-        new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
-                             & 0xff) << 8;
+    {
+        new_ire.post.im = 1;
+        new_ire.post.vector = msi_desc->gvec;
+        new_ire.post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
+        new_ire.post.pda_h = virt_to_maddr(pi_desc) >> 32;
+        new_ire.post.p = 1;
+    }
 
     if ( pdev )
         set_msi_source_id(pdev, &new_ire);
     else
         set_hpet_source_id(msi_desc->hpet_id, &new_ire);
-    new_ire.remap.res_3 = 0;
-    new_ire.remap.res_4 = 0;
-    new_ire.remap.p = 1;    /* finally, set present bit */
 
     /* now construct new MSI/MSI-X rte entry */
     remap_rte = (struct msi_msg_remap_entry *)msg;
@@ -902,42 +904,6 @@ void iommu_disable_x2apic_IR(void)
         disable_qinval(drhd->iommu);
 }
 
-static void setup_posted_irte(
-    struct iremap_entry *new_ire, const struct iremap_entry *old_ire,
-    const struct pi_desc *pi_desc, const uint8_t gvec)
-{
-    memset(new_ire, 0, sizeof(*new_ire));
-
-    /*
-     * 'im' filed decides whether the irte is in posted format (with value 1)
-     * or remapped format (with value 0), if the old irte is in remapped format,
-     * we copy things from remapped part in 'struct iremap_entry', otherwise,
-     * we copy from posted part.
-     */
-    if ( !old_ire->remap.im )
-    {
-        new_ire->post.p = old_ire->remap.p;
-        new_ire->post.fpd = old_ire->remap.fpd;
-        new_ire->post.sid = old_ire->remap.sid;
-        new_ire->post.sq = old_ire->remap.sq;
-        new_ire->post.svt = old_ire->remap.svt;
-    }
-    else
-    {
-        new_ire->post.p = old_ire->post.p;
-        new_ire->post.fpd = old_ire->post.fpd;
-        new_ire->post.sid = old_ire->post.sid;
-        new_ire->post.sq = old_ire->post.sq;
-        new_ire->post.svt = old_ire->post.svt;
-        new_ire->post.urg = old_ire->post.urg;
-    }
-
-    new_ire->post.im = 1;
-    new_ire->post.vector = gvec;
-    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
-    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
-}
-
 /*
  * This function is used to update the IRTE for posted-interrupt
  * when guest changes MSI/MSI-X information.
@@ -946,17 +912,9 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
     const uint8_t gvec)
 {
     struct irq_desc *desc;
-    const struct msi_desc *msi_desc;
-    int remap_index;
-    int rc = 0;
-    const struct pci_dev *pci_dev;
-    const struct acpi_drhd_unit *drhd;
-    struct iommu *iommu;
-    struct ir_ctrl *ir_ctrl;
-    struct iremap_entry *iremap_entries = NULL, *p = NULL;
-    struct iremap_entry new_ire, old_ire;
-    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
-    __uint128_t ret;
+    struct msi_desc *msi_desc;
+    const struct pi_desc *pi_desc = v ? &v->arch.hvm_vmx.pi_desc : NULL;
+    int rc;
 
     desc = pirq_spin_lock_irq_desc(pirq, NULL);
     if ( !desc )
@@ -968,59 +926,13 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
         rc = -ENODEV;
         goto unlock_out;
     }
-
-    pci_dev = msi_desc->dev;
-    if ( !pci_dev )
-    {
-        rc = -ENODEV;
-        goto unlock_out;
-    }
-
-    remap_index = msi_desc->remap_index;
+    msi_desc->pi_desc = pi_desc;
+    msi_desc->gvec = gvec;
 
     spin_unlock_irq(&desc->lock);
 
     ASSERT(pcidevs_locked());
-
-    /*
-     * FIXME: For performance reasons we should store the 'iommu' pointer in
-     * 'struct msi_desc' in some other place, so we don't need to waste
-     * time searching it here.
-     */
-    drhd = acpi_find_matched_drhd_unit(pci_dev);
-    if ( !drhd )
-        return -ENODEV;
-
-    iommu = drhd->iommu;
-    ir_ctrl = iommu_ir_ctrl(iommu);
-    if ( !ir_ctrl )
-        return -ENODEV;
-
-    spin_lock_irq(&ir_ctrl->iremap_lock);
-
-    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
-
-    old_ire = *p;
-
-    /* Setup/Update interrupt remapping table entry. */
-    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
-    ret = cmpxchg16b(p, &old_ire, &new_ire);
-
-    /*
-     * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
-     * and the hardware cannot update the IRTE behind us, so the return value
-     * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
-     */
-    ASSERT(ret == old_ire.val);
-
-    iommu_flush_cache_entry(p, sizeof(*p));
-    iommu_flush_iec_index(iommu, 0, remap_index);
-
-    unmap_vtd_domain_page(iremap_entries);
-
-    spin_unlock_irq(&ir_ctrl->iremap_lock);
-
-    return 0;
+    return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
 
  unlock_out:
     spin_unlock_irq(&desc->lock);
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 9c02945..fc9ab04 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -118,6 +118,8 @@ struct msi_desc {
 	struct msi_msg msg;		/* Last set MSI message */
 
 	int remap_index;		/* index in interrupt remapping table */
+	const struct pi_desc *pi_desc;	/* pointer to posted descriptor */
+	uint8_t gvec;			/* guest vector. valid when pi_desc isn't NULL */
 };
 
 /*
-- 
1.8.3.1



* [PATCH v11 3/6] VT-d: Some cleanups
  2017-03-29  5:11 [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
  2017-03-29  5:11 ` [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
  2017-03-29  5:11 ` [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
@ 2017-03-29  5:11 ` Chao Gao
  2017-03-29  5:11 ` [PATCH v11 4/6] VMX: Fixup PI descriptor when cpu is offline Chao Gao
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Chao Gao @ 2017-03-29  5:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

From: Feng Wu <feng.wu@intel.com>

Use type-safe structure assignment instead of memcpy().
Use sizeof(*iremap_entry).

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v10:
- Added several lines for patch [1/6] has been reworked.

v9:
- Delete several lines for patch [5/8] has been reworked.

v7: 
- Remove a useless cleanup

v6: 
- More description about the patch

 xen/drivers/passthrough/vtd/intremap.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 9bbf0f6..b992f23 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -183,8 +183,8 @@ static void free_remap_entry(struct iommu *iommu, int index)
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memset(iremap_entry, 0, sizeof(struct iremap_entry));
-    iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+    memset(iremap_entry, 0, sizeof(*iremap_entry));
+    iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
     unmap_vtd_domain_page(iremap_entries);
@@ -310,7 +310,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
+    new_ire = *iremap_entry;
 
     if ( rte_upper )
     {
@@ -353,8 +353,8 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
         remap_rte->format = 1;    /* indicate remap format */
     }
 
-    memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
-    iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+    *iremap_entry = new_ire;
+    iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
     unmap_vtd_domain_page(iremap_entries);
@@ -639,8 +639,8 @@ static int msi_msg_to_remap_entry(
     remap_rte->address_hi = 0;
     remap_rte->data = index - i;
 
-    memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
-    iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+    *iremap_entry = new_ire;
+    iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
     unmap_vtd_domain_page(iremap_entries);
-- 
1.8.3.1



* [PATCH v11 4/6] VMX: Fixup PI descriptor when cpu is offline
  2017-03-29  5:11 [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (2 preceding siblings ...)
  2017-03-29  5:11 ` [PATCH v11 3/6] VT-d: Some cleanups Chao Gao
@ 2017-03-29  5:11 ` Chao Gao
  2017-03-29  5:11 ` [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely Chao Gao
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Chao Gao @ 2017-03-29  5:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

From: Feng Wu <feng.wu@intel.com>

When a CPU goes offline, there may still be vCPUs on its blocking list;
we need to move them to another online CPU. This patch handles that.
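For illustration, a minimal list_head-style implementation (a sketch, not
Xen's actual list code) of the list_move() primitive the patch relies on when
draining the dead CPU's blocking list:

```c
#include <assert.h>
#include <stddef.h>

/* Doubly-linked circular list in the style of Xen's struct list_head. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

/* list_move() as used by the patch: unlink from the old CPU's list, then
 * append to the new CPU's list. */
static void list_move_tail(struct list_head *e, struct list_head *dst)
{
    list_del(e);
    list_add_tail(e, dst);
}
```

In vmx_pi_desc_fixup() this move is done under both per-CPU locks, with
notification suppressed (SN set) so no wakeup is lost mid-move.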

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c       |  1 +
 xen/arch/x86/hvm/vmx/vmx.c        | 70 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
 3 files changed, 72 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 934674c..99c77b9 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -591,6 +591,7 @@ void vmx_cpu_dead(unsigned int cpu)
     vmx_free_vmcs(per_cpu(vmxon_region, cpu));
     per_cpu(vmxon_region, cpu) = 0;
     nvmx_cpu_dead(cpu);
+    vmx_pi_desc_fixup(cpu);
 }
 
 int vmx_cpu_up(void)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d201956..25f9ec9 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -199,6 +199,76 @@ static void vmx_pi_do_resume(struct vcpu *v)
     vmx_pi_unblock_vcpu(v);
 }
 
+void vmx_pi_desc_fixup(unsigned int cpu)
+{
+    unsigned int new_cpu, dest;
+    unsigned long flags;
+    struct arch_vmx_struct *vmx, *tmp;
+    spinlock_t *new_lock, *old_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
+    struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
+
+    if ( !iommu_intpost )
+        return;
+
+    /*
+     * We are in the context of CPU_DEAD or CPU_UP_CANCELED notification,
+     * and it is impossible for a second CPU to go down in parallel. So we
+     * can safely acquire the old cpu's lock and then acquire the new_cpu's
+     * lock after that.
+     */
+    spin_lock_irqsave(old_lock, flags);
+
+    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
+    {
+        /*
+         * Suppress notification or we may miss an interrupt when the
+         * target cpu is dying.
+         */
+        pi_set_sn(&vmx->pi_desc);
+
+        /*
+         * Check whether a notification is pending before doing the
+         * movement; if that is the case we need to wake it up directly
+         * rather than moving it to the new cpu's list.
+         */
+        if ( pi_test_on(&vmx->pi_desc) )
+        {
+            list_del(&vmx->pi_blocking.list);
+            vmx->pi_blocking.lock = NULL;
+            vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
+        }
+        else
+        {
+            /*
+             * We need to find an online cpu as the NDST of the PI descriptor, it
+             * doesn't matter whether it is within the cpupool of the domain or
+             * not. As long as it is online, the vCPU will be woken up once the
+             * notification event arrives.
+             */
+            new_cpu = cpumask_any(&cpu_online_map);
+            new_lock = &per_cpu(vmx_pi_blocking, new_cpu).lock;
+
+            spin_lock(new_lock);
+
+            ASSERT(vmx->pi_blocking.lock == old_lock);
+
+            dest = cpu_physical_id(new_cpu);
+            write_atomic(&vmx->pi_desc.ndst,
+                         x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
+
+            list_move(&vmx->pi_blocking.list,
+                      &per_cpu(vmx_pi_blocking, new_cpu).list);
+            vmx->pi_blocking.lock = new_lock;
+
+            spin_unlock(new_lock);
+        }
+
+        pi_clear_sn(&vmx->pi_desc);
+    }
+
+    spin_unlock_irqrestore(old_lock, flags);
+}
+
 /*
  * To handle posted interrupts correctly, we need to set the following
  * state:
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 2b781ab..5ead57c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -597,6 +597,7 @@ void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
 void vmx_pi_per_cpu_init(unsigned int cpu);
+void vmx_pi_desc_fixup(unsigned int cpu);
 
 void vmx_pi_hooks_assign(struct domain *d);
 void vmx_pi_hooks_deassign(struct domain *d);
-- 
1.8.3.1



* [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely
  2017-03-29  5:11 [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (3 preceding siblings ...)
  2017-03-29  5:11 ` [PATCH v11 4/6] VMX: Fixup PI descriptor when cpu is offline Chao Gao
@ 2017-03-29  5:11 ` Chao Gao
  2017-03-31  6:03   ` Tian, Kevin
  2017-03-31 10:01   ` Jan Beulich
  2017-03-29  5:11 ` [PATCH v11 6/6] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI Chao Gao
  2017-03-31  5:13 ` [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Tian, Kevin
  6 siblings, 2 replies; 27+ messages in thread
From: Chao Gao @ 2017-03-29  5:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

We used structure assignment to update the IRTE, which is non-atomic when the
whole IRTE is updated. That is unsafe if an interrupt arrives during the
update; furthermore, no bug or warning would be reported when this happened.

This patch introduces two variants, atomic and non-atomic, to update the
IRTE. Both variants update the IRTE when possible. If the caller requests an
atomic update but we can't provide one, we raise a BUG().
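A minimal sketch of the fallback path for CPUs without cmpxchg16b, under
illustrative names (the real update_irte() BUG()s rather than returning
failure): a 128-bit IRTE is modelled as two 64-bit halves, and an update that
touches only one half can still be done with a single 64-bit store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 128-bit IRTE modelled as two 64-bit halves. */
struct irte128 {
    uint64_t lo, hi;
};

static bool irte_update(struct irte128 *live, const struct irte128 *new_ire,
                        bool atomic)
{
    if ( live->lo == new_ire->lo )       /* only the high half changes */
        live->hi = new_ire->hi;
    else if ( live->hi == new_ire->hi )  /* only the low half changes */
        live->lo = new_ire->lo;
    else if ( !atomic )
    {
        /* Two-step update: acceptable for init/clear, not for a live entry. */
        live->lo = new_ire->lo;
        live->hi = new_ire->hi;
    }
    else
        return false;   /* both halves differ; would need cmpxchg16b */

    return true;
}
```

When cmpxchg16b is available, the patch instead swaps the whole 128-bit entry
atomically and ASSERTs that nothing raced with the update.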

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v11:
- Add two variant function to update IRTE. Call the non-atomic one for init
and clear operations. Call the atomic one for other cases.
- Add a new field to indicate the remap_entry associated with msi_desc is
initialized or not.

v10:
- rename copy_irte_to_irt to update_irte
- remove copy_from_to_irt
- change commmit message and add some comments to illustrate on which
condition update_irte() is safe.

 xen/arch/x86/msi.c                     |  1 +
 xen/drivers/passthrough/vtd/intremap.c | 78 ++++++++++++++++++++++++++++++++--
 xen/include/asm-x86/msi.h              |  1 +
 3 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 3374cd4..7ed1243 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -578,6 +578,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int nr)
         entry[nr].dev = NULL;
         entry[nr].irq = -1;
         entry[nr].remap_index = -1;
+        entry[nr].remap_entry_initialized = false;
         entry[nr].pi_desc = NULL;
     }
 
diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index b992f23..b7f3cf1 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -169,10 +169,64 @@ bool_t __init iommu_supports_eim(void)
     return 1;
 }
 
+static void update_irte(struct iremap_entry *entry,
+                        const struct iremap_entry *new_ire,
+                        bool atomic)
+{
+    if ( cpu_has_cx16 )
+    {
+        __uint128_t ret;
+        struct iremap_entry old_ire;
+
+        old_ire = *entry;
+        ret = cmpxchg16b(entry, &old_ire, new_ire);
+
+        /*
+         * In the above, we use cmpxchg16 to atomically update the 128-bit
+         * IRTE, and the hardware cannot update the IRTE behind us, so
+         * the return value of cmpxchg16 should be the same as old_ire.
+         * This ASSERT validates it.
+         */
+        ASSERT(ret == old_ire.val);
+    }
+    else
+    {
+        /*
+         * The following code updates the IRTE atomically if possible.
+         * If the caller requests an atomic update but we can't satisfy it,
+         * BUG() is raised.
+         */
+        if ( entry->lo == new_ire->lo )
+            entry->hi = new_ire->hi;
+        else if ( entry->hi == new_ire->hi )
+            entry->lo = new_ire->lo;
+        else if ( !atomic )
+        {
+            entry->lo = new_ire->lo;
+            entry->hi = new_ire->hi;
+        }
+        else
+            BUG();
+    }
+}
+
+static inline void update_irte_non_atomic(struct iremap_entry *entry,
+                                          const struct iremap_entry *new_ire)
+{
+    update_irte(entry, new_ire, false);
+}
+
+static inline void update_irte_atomic(struct iremap_entry *entry,
+                                      const struct iremap_entry *new_ire)
+{
+    update_irte(entry, new_ire, true);
+}
+
+
 /* Mark specified intr remap entry as free */
 static void free_remap_entry(struct iommu *iommu, int index)
 {
-    struct iremap_entry *iremap_entry = NULL, *iremap_entries;
+    struct iremap_entry *iremap_entry = NULL, *iremap_entries, new_ire = { };
     struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
 
     if ( index < 0 || index > IREMAP_ENTRY_NR - 1 )
@@ -183,7 +237,7 @@ static void free_remap_entry(struct iommu *iommu, int index)
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memset(iremap_entry, 0, sizeof(*iremap_entry));
+    update_irte_non_atomic(iremap_entry, &new_ire);
     iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
@@ -286,6 +340,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
     int index;
     unsigned long flags;
     struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
+    bool init = false;
 
     remap_rte = (struct IO_APIC_route_remap_entry *) old_rte;
     spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
@@ -296,6 +351,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
         index = alloc_remap_entry(iommu, 1);
         if ( index < IREMAP_ENTRY_NR )
             apic_pin_2_ir_idx[apic][ioapic_pin] = index;
+        init = true;
     }
 
     if ( index > IREMAP_ENTRY_NR - 1 )
@@ -353,7 +409,11 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
         remap_rte->format = 1;    /* indicate remap format */
     }
 
-    *iremap_entry = new_ire;
+    if ( init )
+        update_irte_non_atomic(iremap_entry, &new_ire);
+    else
+        update_irte_atomic(iremap_entry, &new_ire);
+
     iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
@@ -567,7 +627,10 @@ static int msi_msg_to_remap_entry(
     {
         /* Free specified unused IRTEs */
         for ( i = 0; i < nr; ++i )
+        {
             free_remap_entry(iommu, msi_desc->remap_index + i);
+            msi_desc[i].remap_entry_initialized = false;
+        }
         spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
         return 0;
     }
@@ -639,7 +702,14 @@ static int msi_msg_to_remap_entry(
     remap_rte->address_hi = 0;
     remap_rte->data = index - i;
 
-    *iremap_entry = new_ire;
+    if ( msi_desc->remap_entry_initialized )
+        update_irte_atomic(iremap_entry, &new_ire);
+    else
+    {
+        update_irte_non_atomic(iremap_entry, &new_ire);
+        msi_desc->remap_entry_initialized = true;
+    }
+
     iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index fc9ab04..a0bd3af 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -118,6 +118,7 @@ struct msi_desc {
 	struct msi_msg msg;		/* Last set MSI message */
 
 	int remap_index;		/* index in interrupt remapping table */
+	bool remap_entry_initialized;
 	const struct pi_desc *pi_desc;	/* pointer to posted descriptor */
 	uint8_t gvec;			/* guest vector. valid when pi_desc isn't NULL */
 };
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v11 6/6] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI
  2017-03-29  5:11 [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (4 preceding siblings ...)
  2017-03-29  5:11 ` [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely Chao Gao
@ 2017-03-29  5:11 ` Chao Gao
  2017-03-31  5:13 ` [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Tian, Kevin
  6 siblings, 0 replies; 27+ messages in thread
From: Chao Gao @ 2017-03-29  5:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Kevin Tian, Jan Beulich, Chao Gao

The current logic of using VT-d PI is: when the guest configures the pirq's
destination to a single vCPU, the corresponding IRTE is updated to posted
format. If the destination of the pirq later becomes multiple vCPUs, we stay
in posted format. Obviously, we should fall back to remapping interrupt when
the guest wrongly configures the destination of the pirq or makes it have
multiple destination vCPUs.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v11:
- move the code (one line) that allows the parameter 'vcpu' of pi_update_irte()
to be NULL to Patch [2/6].

v10:
- Newly added
 xen/drivers/passthrough/io.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 5dbfe53..93de0c2 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -412,14 +412,7 @@ int pt_irq_create_bind(
 
         /* Use interrupt posting if it is supported. */
         if ( iommu_intpost )
-        {
-            if ( vcpu )
-                pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
-            else
-                dprintk(XENLOG_G_INFO,
-                        "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
-                        vcpu, pirq_dpci->gmsi.gvec);
-        }
+            pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
 
         break;
     }
-- 
1.8.3.1




* Re: [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-31  5:46   ` Tian, Kevin
@ 2017-03-30 23:01     ` Chao Gao
  2017-03-31  8:11       ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Chao Gao @ 2017-03-30 23:01 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Feng Wu, Nakajima, Jun, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jan Beulich

On Fri, Mar 31, 2017 at 01:46:33PM +0800, Tian, Kevin wrote:
>> From: Gao, Chao
>> Sent: Wednesday, March 29, 2017 1:12 PM
>> 
>>      return entry;
>> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
>> index d53976c..5dbfe53 100644
>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -618,6 +618,8 @@ int pt_irq_destroy_bind(
>>          else
>>              what = "bogus";
>>      }
>> +    else if ( iommu_intpost && pirq && pirq_dpci->gmsi.posted )
>
>No need to check iommu_intpost. Just keep later conditions.

OK. Is that because, if 'posted' is set, iommu_intpost must be true?

>
>btw isn't the condition be " (pirq_dpci && pirq_dpci->gmsi.posted)"?

I don't think so. pirq, not pirq_dpci, is consumed by pi_update_irte().
Furthermore, if pirq_dpci is NULL, pirq is also NULL, but the converse is
not true.

>
>> +        pi_update_irte(NULL, pirq, 0);
>> 
>>      if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>>           list_empty(&pirq_dpci->digl_list) )
>>  /*
>>   * This function is used to update the IRTE for posted-interrupt
>>   * when guest changes MSI/MSI-X information.
>> @@ -946,17 +912,9 @@ int pi_update_irte(const struct vcpu *v, const struct
>> pirq *pirq,
>>      const uint8_t gvec)
>>  {
>>      struct irq_desc *desc;
>> -    const struct msi_desc *msi_desc;
>> -    int remap_index;
>> -    int rc = 0;
>> -    const struct pci_dev *pci_dev;
>> -    const struct acpi_drhd_unit *drhd;
>> -    struct iommu *iommu;
>> -    struct ir_ctrl *ir_ctrl;
>> -    struct iremap_entry *iremap_entries = NULL, *p = NULL;
>> -    struct iremap_entry new_ire, old_ire;
>> -    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> -    __uint128_t ret;
>> +    struct msi_desc *msi_desc;
>> +    const struct pi_desc *pi_desc = v ? &v->arch.hvm_vmx.pi_desc : NULL;
>
>will there be a case that pi_update_desc is invoked with v==NULL?

See the line below your last comment above: it clears the 'pi_desc' and 'gvec'
fields in msi_desc when the guest destroys the binding with the pt-irq.

>
>> +    int rc;
>> 
>>      desc = pirq_spin_lock_irq_desc(pirq, NULL);
>>      if ( !desc )
>> @@ -968,59 +926,13 @@ int pi_update_irte(const struct vcpu *v, const
>> struct pirq *pirq,
>>          rc = -ENODEV;
>>          goto unlock_out;
>>      }
>> -
>> -    pci_dev = msi_desc->dev;
>> -    if ( !pci_dev )
>> -    {
>> -        rc = -ENODEV;
>> -        goto unlock_out;
>> -    }
>> -
>> -    remap_index = msi_desc->remap_index;
>> +    msi_desc->pi_desc = pi_desc;
>> +    msi_desc->gvec = gvec;
>
>Is it possible to see pi_desc already assigned? the name of pi_update_irte
>looks it may be invoked multiple times. If we assume a new update should
>happen only when previous one is cleaned up, then some check code
>should be added here to prove that assumption.

We don't make this assumption. I just want to track the MSI's binding with the
guest. If the guest updates it, we update it here; if the guest clears it, we
clear it accordingly.

Thanks
Chao



* Re: [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-03-31  5:28   ` Tian, Kevin
@ 2017-03-30 23:10     ` Chao Gao
  2017-03-31 10:27       ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Chao Gao @ 2017-03-30 23:10 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Andrew Cooper, Jan Beulich, xen-devel

On Fri, Mar 31, 2017 at 01:28:02PM +0800, Tian, Kevin wrote:
>> From: Chao Gao
>> Sent: Wednesday, March 29, 2017 1:12 PM
>> 
>> When a vCPU is migrated to another pCPU, pt irqs bound to this vCPU also
>> need migration. When VT-d PI is enabled, the interrupt vector is recorded
>> in a main-memory-resident data structure and a notification whose
>> destination is decided by NDST is generated. NDST is properly adjusted
>> during vCPU migration, so a pirq directly injected to the guest needn't be migrated.
>> 
>> This patch adds an indicator, @posted, to show whether the pt-irq is delivered
>> through VT-d PI.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> 
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index
>> 0282986..2d8de16 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct
>> hvm_pirq_dpci *pirq_dpci,
>>      struct vcpu *v = arg;
>> 
>>      if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>> +         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
>> +         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
>> +         (!pirq_dpci->gmsi.posted) &&
>>           (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
>
>Simply looking at the above change, it does more than you intend to change.
>Previously even entries w/o the GUEST_MSI flag would fall into that path, but
>now you limit it to GUEST_MSI with irq remapping (i.e. you changed the
>behavior both for the posted case and for the w/o-GUEST_MSI case). I haven't
>checked whether MACH_MSI is always set with GUEST_MSI, but my gut feeling is
>this isn't correct.

Yes, it's a problem. I think the original code may be wrong, since it reads
gmsi.dest_vcpu_id without checking the 'flags' of pirq_dpci. Do we need to
migrate the pirq when its type is GUEST_PCI?

Thanks
Chao



* Re: [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-31  8:11       ` Jan Beulich
@ 2017-03-31  1:13         ` Chao Gao
  0 siblings, 0 replies; 27+ messages in thread
From: Chao Gao @ 2017-03-31  1:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Feng Wu, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jun Nakajima

On Fri, Mar 31, 2017 at 02:11:08AM -0600, Jan Beulich wrote:
>>>> On 31.03.17 at 01:01, <chao.gao@intel.com> wrote:
>> On Fri, Mar 31, 2017 at 01:46:33PM +0800, Tian, Kevin wrote:
>>>> From: Gao, Chao
>>>> Sent: Wednesday, March 29, 2017 1:12 PM
>>>> --- a/xen/drivers/passthrough/io.c
>>>> +++ b/xen/drivers/passthrough/io.c
>>>> @@ -618,6 +618,8 @@ int pt_irq_destroy_bind(
>>>>          else
>>>>              what = "bogus";
>>>>      }
>>>> +    else if ( iommu_intpost && pirq && pirq_dpci->gmsi.posted )
>>>
>>>No need to check iommu_intpost. Just keep later conditions.
>> 
>> ok. Is it for if posted is set, the iommu_intpost should be true?
>> 
>>>
>>>btw isn't the condition be " (pirq_dpci && pirq_dpci->gmsi.posted)"?
>> 
>> I don't think so. pirq, not pirq_dpci, is consumed by pi_update_irte.
>> Furthermore if pirq_dpci is null, pirq is also null. But it is not true
>> in turn.
>
>With
>
>    pirq_dpci = pirq_dpci(pirq);
>
>it _is_ the other way around, which can then also be expressed
>as "if pirq_dpci is non-NULL, pirq is non-NULL too", which is the
>precondition you need to consume (dereference) pirq. There is
>a reason other code around here also only ever checks pirq_dpci
>(even when using pirq).

Got it. Kevin and you are right.

Thanks
Chao



* Re: [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-03-31  9:31   ` Jan Beulich
@ 2017-03-31  2:42     ` Chao Gao
  2017-03-31 10:06       ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Chao Gao @ 2017-03-31  2:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Fri, Mar 31, 2017 at 03:31:47AM -0600, Jan Beulich wrote:
>>>> On 29.03.17 at 07:11, <chao.gao@intel.com> wrote:
>> @@ -442,17 +397,24 @@ int pt_irq_create_bind(
>>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>>          spin_unlock(&d->event_lock);
>> +
>> +        pirq_dpci->gmsi.posted = false;
>> +        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
>> +        if ( iommu_intpost && (delivery_mode == dest_LowestPrio) )
>
>Why again would dest_Fixed not allow posted delivery? This needs

There is no such restriction. For the dest_Fixed case, hvm_girq_dest_2_vcpu_id()
gives the same output as pi_find_dest_vcpu(), so we don't call the latter again
and just use the return value of hvm_girq_dest_2_vcpu_id().

>recording in a comment, if there really is such a restriction. Or did
>you really mean to place ...
>
>> +        {
>> +            vcpu = vector_hashing_dest(d, dest, dest_mode,
>> +                                       pirq_dpci->gmsi.gvec);
>> +            if ( vcpu )
>> +                pirq_dpci->gmsi.posted = true;
>
>... this ...
>
>> +        }
>
>... after this brace (which then wouldn't be needed anymore)? If
>so, is there any point calling vector_hashing_dest() when vcpu is
>already non-NULL prior to the if()?
>
>This then also raises the question whether the call to
>hvm_girq_dest_2_vcpu_id() is actually legitimate for lowest
>priority delivery mode.

For lowest-priority delivery mode, if VT-d PI is enabled, the result (the
destination vCPU) is overridden by vector_hashing_dest() to keep the existing
behavior. I think the only point to keep in mind is that, for cases other than
lowest-priority delivery mode, pi_find_dest_vcpu() and
hvm_girq_dest_2_vcpu_id() give the same destination vCPU.

Thanks
Chao

>
>Jan
>



* Re: [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-03-31 10:06       ` Jan Beulich
@ 2017-03-31  3:27         ` Chao Gao
  2017-03-31 10:38           ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Chao Gao @ 2017-03-31  3:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Fri, Mar 31, 2017 at 04:06:27AM -0600, Jan Beulich wrote:
>>>> On 31.03.17 at 04:42, <chao.gao@intel.com> wrote:
>> On Fri, Mar 31, 2017 at 03:31:47AM -0600, Jan Beulich wrote:
>>>>>> On 29.03.17 at 07:11, <chao.gao@intel.com> wrote:
>>>> @@ -442,17 +397,24 @@ int pt_irq_create_bind(
>>>>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>>>>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>>>>          spin_unlock(&d->event_lock);
>>>> +
>>>> +        pirq_dpci->gmsi.posted = false;
>>>> +        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
>>>> +        if ( iommu_intpost && (delivery_mode == dest_LowestPrio) )
>>>
>>>Why again would dest_Fixed not allow posted delivery? This needs
>> 
>> There is no such restriction. For the dest_Fixed case, hvm_girq_dest_2_vcpu_id()
>> gives the same output as pi_find_dest_vcpu(), so we don't call the latter again
>> and just use the return value of hvm_girq_dest_2_vcpu_id().
>
>But as pointed out you don't set the new posted field in that case.
>

Indeed.

How about this:
        pirq_dpci->gmsi.posted = false;
        if ( iommu_intpost )
        {
            vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
            if ( delivery_mode == dest_LowestPrio )
                vcpu = vector_hashing_dest(d, dest, dest_mode,
                                           pirq_dpci->gmsi.gvec);
            if ( vcpu )
                pirq_dpci->gmsi.posted = true;
        }

>>>recording in a comment, if there really is such a restriction. Or did
>>>you really mean to place ...
>>>
>>>> +        {
>>>> +            vcpu = vector_hashing_dest(d, dest, dest_mode,
>>>> +                                       pirq_dpci->gmsi.gvec);
>>>> +            if ( vcpu )
>>>> +                pirq_dpci->gmsi.posted = true;
>>>
>>>... this ...
>>>
>>>> +        }
>>>
>>>... after this brace (which then wouldn't be needed anymore)? If
>>>so, is there any point calling vector_hashing_dest() when vcpu is
>>>already non-NULL prior to the if()?
>>>
>>>This then also raises the question whether the call to
>>>hvm_girq_dest_2_vcpu_id() is actually legitimate for lowest
>>>priority delivery mode.
>> 
>> For lowest priority delivery mode, if VT-d PI is enabled, the result (the
>> destination vcpu) is overrided by vector_hashing_dest() to keep the 
>> existing behavior. I think the only point we should keep in mind is
>> for cases other than lowest priority delivery mode, pi_find_dest_vcpu()
>> and hvm_girq_dest_2_vcpu_id() give the same destination vcpu.
>
>Well, the override is done for the iommu_intpost case. The remark
>on hvm_girq_dest_2_vcpu_id(), however, was made in general.

OK. You meant the method used in hvm_girq_dest_2_vcpu_id() may not match the
method used by real hardware. I will check it.

Thanks
Chao

>
>Jan
>



* Re: [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list
  2017-03-29  5:11 [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (5 preceding siblings ...)
  2017-03-29  5:11 ` [PATCH v11 6/6] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI Chao Gao
@ 2017-03-31  5:13 ` Tian, Kevin
  6 siblings, 0 replies; 27+ messages in thread
From: Tian, Kevin @ 2017-03-31  5:13 UTC (permalink / raw)
  To: Gao, Chao, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Dario Faggioli, Jan Beulich

> From: Gao, Chao
> Sent: Wednesday, March 29, 2017 1:12 PM
> 
> The current VT-d PI related code may operate incorrectly in the following
> scenarios:
> 1. When VT-d PI is enabled, neen't migrate pirq which is using VT-d PI during

neen't -> needn't

> vCPU migration. Patch [1/6] solves this by introducing a new flag to indicate
> that the pt-irq is delivered through VT-d PI.
> 
> 2. msi_msg_to_remap_entry() is buggy when the live IRTE is in posted format.
> It wrongly inherits the 'im' field from the live IRTE but updates all the other
> fileds to remapping format. Patch [2/6] handles this.
> 
> 3. [3/6] is a cleanup patch
> 
> 4. When a pCPU is unplugged, and there might be vCPUs on its list. Since the
> pCPU is offline, those vCPUs might not be woken up again. [4/6] addresses it.
> 
> 5. IRTE is updated through structure assigment which is unsafe in some cases.
> To resolve this, Patch [5/6] provides two variants, atomic and non-atomic, to
> update IRTE. And a bug is raised when we can't meet the caller's atomic
> requirement.
> 
> 6. We didn't change the IRTE to remapping format when pt-irq is
> configurated to have multi-destination vCPUs. Patch [6/6] resolves this
> problem.
> 
> 
> Chao Gao (4):
>   passthrough: don't migrate pirq when it is delivered through VT-d PI
>   VT-d: Introduce new fields in msi_desc to track binding with guest
>     interrupt
>   VT-d: introduce update_irte to update irte safely
>   passthrough/io: Fall back to remapping interrupt when we can't use
>     VT-d PI
> 
> Feng Wu (2):
>   VT-d: Some cleanups
>   VMX: Fixup PI descriptor when cpu is offline
> 
>  xen/arch/x86/hvm/hvm.c                 |   3 +
>  xen/arch/x86/hvm/vmx/vmcs.c            |   1 +
>  xen/arch/x86/hvm/vmx/vmx.c             |  70 ++++++++++
>  xen/arch/x86/msi.c                     |   2 +
>  xen/drivers/passthrough/io.c           |  71 ++--------
>  xen/drivers/passthrough/vtd/intremap.c | 236 +++++++++++++++--------------
> ----
>  xen/include/asm-x86/hvm/vmx/vmx.h      |   1 +
>  xen/include/asm-x86/msi.h              |   3 +
>  xen/include/xen/hvm/irq.h              |   1 +
>  9 files changed, 204 insertions(+), 184 deletions(-)
> 
> --
> 1.8.3.1




* Re: [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-03-29  5:11 ` [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
@ 2017-03-31  5:28   ` Tian, Kevin
  2017-03-30 23:10     ` Chao Gao
  2017-03-31  9:31   ` Jan Beulich
  1 sibling, 1 reply; 27+ messages in thread
From: Tian, Kevin @ 2017-03-31  5:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Gao, Chao

> From: Chao Gao
> Sent: Wednesday, March 29, 2017 1:12 PM
> 
> When a vCPU is migrated to another pCPU, pt irqs bound to this vCPU also
> need migration. When VT-d PI is enabled, the interrupt vector is recorded
> in a main-memory-resident data structure and a notification whose
> destination is decided by NDST is generated. NDST is properly adjusted
> during vCPU migration, so a pirq directly injected to the guest needn't be migrated.
> 
> This patch adds an indicator, @posted, to show whether the pt-irq is delivered
> through VT-d PI.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v11:
> - rename the indicator to 'posted'
> - move setting 'posted' field to event lock un-locked region.
> 
> v10:
> - Newly added.
> 
>  xen/arch/x86/hvm/hvm.c       |  3 +++
>  xen/drivers/passthrough/io.c | 62 +++++++++-----------------------------------
>  xen/include/xen/hvm/irq.h    |  1 +
>  3 files changed, 16 insertions(+), 50 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index
> 0282986..2d8de16 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct
> hvm_pirq_dpci *pirq_dpci,
>      struct vcpu *v = arg;
> 
>      if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
> +         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
> +         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
> +         (!pirq_dpci->gmsi.posted) &&
>           (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )

Simply looking at the above change, it does more than you intend to change.
Previously even entries w/o the GUEST_MSI flag would fall into that path, but
now you limit it to GUEST_MSI with irq remapping (i.e. you changed the
behavior both for the posted case and for the w/o-GUEST_MSI case). I haven't
checked whether MACH_MSI is always set with GUEST_MSI, but my gut feeling is
this isn't correct.

>      {
>          struct irq_desc *desc =
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index 080183e..d53976c 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -259,52 +259,6 @@ static struct vcpu *vector_hashing_dest(const struct
> domain *d,
>      return dest;
>  }
> 
> -/*
> - * The purpose of this routine is to find the right destination vCPU for
> - * an interrupt which will be delivered by VT-d posted-interrupt. There
> - * are several cases as below:
> - *
> - * - For lowest-priority interrupts, use vector-hashing mechanism to find
> - *   the destination.
> - * - Otherwise, for single destination interrupt, it is straightforward to
> - *   find the destination vCPU and return true.
> - * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> - *   so return NULL.
> - */
> -static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t
> dest_id,
> -                                      bool_t dest_mode, uint8_t delivery_mode,
> -                                      uint8_t gvec)
> -{
> -    unsigned int dest_vcpus = 0;
> -    struct vcpu *v, *dest = NULL;
> -
> -    switch ( delivery_mode )
> -    {
> -    case dest_LowestPrio:
> -        return vector_hashing_dest(d, dest_id, dest_mode, gvec);
> -    case dest_Fixed:
> -        for_each_vcpu ( d, v )
> -        {
> -            if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
> -                                    dest_id, dest_mode) )
> -                continue;
> -
> -            dest_vcpus++;
> -            dest = v;
> -        }
> -
> -        /* For fixed mode, we only handle single-destination interrupts. */
> -        if ( dest_vcpus == 1 )
> -            return dest;
> -
> -        break;
> -    default:
> -        break;
> -    }
> -
> -    return NULL;
> -}
> -
>  int pt_irq_create_bind(
>      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)  { @@ -365,6
> +319,7 @@ int pt_irq_create_bind(
>      {
>          uint8_t dest, dest_mode, delivery_mode;
>          int dest_vcpu_id;
> +        const struct vcpu *vcpu;
> 
>          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
>          {
> @@ -442,17 +397,24 @@ int pt_irq_create_bind(
>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>          spin_unlock(&d->event_lock);
> +
> +        pirq_dpci->gmsi.posted = false;
> +        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
> +        if ( iommu_intpost && (delivery_mode == dest_LowestPrio) )
> +        {
> +            vcpu = vector_hashing_dest(d, dest, dest_mode,
> +                                       pirq_dpci->gmsi.gvec);
> +            if ( vcpu )
> +                pirq_dpci->gmsi.posted = true;
> +        }
>          if ( dest_vcpu_id >= 0 )
>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> 
>          /* Use interrupt posting if it is supported. */
>          if ( iommu_intpost )
>          {
> -            const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest, dest_mode,
> -                                          delivery_mode, pirq_dpci->gmsi.gvec);
> -
>              if ( vcpu )
> -                pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
> +                pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
>              else
>                  dprintk(XENLOG_G_INFO,
>                          "%pv: deliver interrupt in remapping mode,gvec:%02x\n", diff -
> -git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h index
> d3f8623..566854a 100644
> --- a/xen/include/xen/hvm/irq.h
> +++ b/xen/include/xen/hvm/irq.h
> @@ -63,6 +63,7 @@ struct hvm_gmsi_info {
>      uint32_t gvec;
>      uint32_t gflags;
>      int dest_vcpu_id; /* -1 :multi-dest, non-negative: dest_vcpu_id */
> +    bool posted; /* directly deliver to guest via VT-d PI? */
>  };
> 
>  struct hvm_girq_dpci_mapping {
> --
> 1.8.3.1
> 
> 


* Re: [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-29  5:11 ` [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
@ 2017-03-31  5:46   ` Tian, Kevin
  2017-03-30 23:01     ` Chao Gao
  2017-03-31  9:48   ` Jan Beulich
  1 sibling, 1 reply; 27+ messages in thread
From: Tian, Kevin @ 2017-03-31  5:46 UTC (permalink / raw)
  To: Gao, Chao, xen-devel
  Cc: Feng Wu, Nakajima, Jun, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich

> From: Gao, Chao
> Sent: Wednesday, March 29, 2017 1:12 PM
> 
> msi_msg_to_remap_entry() is buggy when the live IRTE is in posted format. It
> wrongly inherits the 'im' field meaning the IRTE is in posted format but
> updates all the other fields to remapping format.
> 
> There are two situations that lead to the above issue. One is that some
> callers really want to change the IRTE to remapped format. The other is that
> some callers only want to update the MSI message (e.g. to set MSI affinity),
> since they aren't aware that this MSI is bound to a guest interrupt; we
> should suppress the update in that second situation. To distinguish them, we
> could straightforwardly let the caller specify which IRTE format to update
> to, but making all callers aware of the binding with the guest interrupt
> isn't feasible: it would cause a far more complicated change (including to
> the interfaces exposed to IOAPIC and MSI). Also, some calls happen in
> interrupt context, where we can't acquire d->event_lock to read struct
> hvm_pirq_dpci.
> 
> This patch introduces two new fields in msi_desc to track the binding with a
> guest interrupt, such that msi_msg_to_remap_entry() can get the binding and
> update the IRTE accordingly. After that change, pi_update_irte() can utilize
> msi_msg_to_remap_entry() to update the IRTE to posted format.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v11:
> - Commit message changes.
> - Add code to clear the two new fields introduced in this patch.
> 
> v10:
> - Newly added.
> 
>  xen/arch/x86/msi.c                     |   1 +
>  xen/drivers/passthrough/io.c           |   2 +
>  xen/drivers/passthrough/vtd/intremap.c | 150 +++++++--------------------------
>  xen/include/asm-x86/msi.h              |   2 +
>  4 files changed, 36 insertions(+), 119 deletions(-)
> 
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index a868007..3374cd4 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -578,6 +578,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int nr)
>          entry[nr].dev = NULL;
>          entry[nr].irq = -1;
>          entry[nr].remap_index = -1;
> +        entry[nr].pi_desc = NULL;
>      }
> 
>      return entry;
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index d53976c..5dbfe53 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -618,6 +618,8 @@ int pt_irq_destroy_bind(
>          else
>              what = "bogus";
>      }
> +    else if ( iommu_intpost && pirq && pirq_dpci->gmsi.posted )

No need to check iommu_intpost. Just keep the later conditions.

btw shouldn't the condition be "(pirq_dpci && pirq_dpci->gmsi.posted)"?

> +        pi_update_irte(NULL, pirq, 0);
> 
>      if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
>           list_empty(&pirq_dpci->digl_list) )
> diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
> index bfd468b..9bbf0f6 100644
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -551,12 +551,12 @@ static int msi_msg_to_remap_entry(
>      struct iommu *iommu, struct pci_dev *pdev,
>      struct msi_desc *msi_desc, struct msi_msg *msg)
>  {
> -    struct iremap_entry *iremap_entry = NULL, *iremap_entries;
> -    struct iremap_entry new_ire;
> +    struct iremap_entry *iremap_entry = NULL, *iremap_entries, new_ire = { };
>      struct msi_msg_remap_entry *remap_rte;
>      unsigned int index, i, nr = 1;
>      unsigned long flags;
>      struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
> +    const struct pi_desc *pi_desc = msi_desc->pi_desc;
> 
>      if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
>          nr = msi_desc->msi.nvec;
> @@ -595,33 +595,35 @@ static int msi_msg_to_remap_entry(
>      GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
>                       iremap_entries, iremap_entry);
> 
> -    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
> -
> -    /* Set interrupt remapping table entry */
> -    new_ire.remap.fpd = 0;
> -    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
> -    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
> -    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
> -    /* Hardware require RH = 1 for LPR delivery mode */
> -    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
> -    new_ire.remap.avail = 0;
> -    new_ire.remap.res_1 = 0;
> -    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
> -                            MSI_DATA_VECTOR_MASK;
> -    new_ire.remap.res_2 = 0;
> -    if ( x2apic_enabled )
> -        new_ire.remap.dst = msg->dest32;
> +    if ( !pi_desc )
> +    {
> +        new_ire.remap.dm = msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT;
> +        new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
> +        new_ire.remap.dlm = msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT;
> +        /* Hardware requires RH = 1 for lowest priority delivery mode */
> +        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
> +        new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
> +                                MSI_DATA_VECTOR_MASK;
> +        if ( x2apic_enabled )
> +            new_ire.remap.dst = msg->dest32;
> +        else
> +            new_ire.remap.dst =
> +                MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK) << 8;
> +        new_ire.remap.p = 1;
> +    }
>      else
> -        new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
> -                             & 0xff) << 8;
> +    {
> +        new_ire.post.im = 1;
> +        new_ire.post.vector = msi_desc->gvec;
> +        new_ire.post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
> +        new_ire.post.pda_h = virt_to_maddr(pi_desc) >> 32;
> +        new_ire.post.p = 1;
> +    }
> 
>      if ( pdev )
>          set_msi_source_id(pdev, &new_ire);
>      else
>          set_hpet_source_id(msi_desc->hpet_id, &new_ire);
> -    new_ire.remap.res_3 = 0;
> -    new_ire.remap.res_4 = 0;
> -    new_ire.remap.p = 1;    /* finally, set present bit */
> 
>      /* now construct new MSI/MSI-X rte entry */
>      remap_rte = (struct msi_msg_remap_entry *)msg;
> @@ -902,42 +904,6 @@ void iommu_disable_x2apic_IR(void)
>          disable_qinval(drhd->iommu);
>  }
> 
> -static void setup_posted_irte(
> -    struct iremap_entry *new_ire, const struct iremap_entry *old_ire,
> -    const struct pi_desc *pi_desc, const uint8_t gvec)
> -{
> -    memset(new_ire, 0, sizeof(*new_ire));
> -
> -    /*
> -     * 'im' filed decides whether the irte is in posted format (with value 1)
> -     * or remapped format (with value 0), if the old irte is in remapped format,
> -     * we copy things from remapped part in 'struct iremap_entry', otherwise,
> -     * we copy from posted part.
> -     */
> -    if ( !old_ire->remap.im )
> -    {
> -        new_ire->post.p = old_ire->remap.p;
> -        new_ire->post.fpd = old_ire->remap.fpd;
> -        new_ire->post.sid = old_ire->remap.sid;
> -        new_ire->post.sq = old_ire->remap.sq;
> -        new_ire->post.svt = old_ire->remap.svt;
> -    }
> -    else
> -    {
> -        new_ire->post.p = old_ire->post.p;
> -        new_ire->post.fpd = old_ire->post.fpd;
> -        new_ire->post.sid = old_ire->post.sid;
> -        new_ire->post.sq = old_ire->post.sq;
> -        new_ire->post.svt = old_ire->post.svt;
> -        new_ire->post.urg = old_ire->post.urg;
> -    }
> -
> -    new_ire->post.im = 1;
> -    new_ire->post.vector = gvec;
> -    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
> -    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
> -}
> -
>  /*
>   * This function is used to update the IRTE for posted-interrupt
>   * when guest changes MSI/MSI-X information.
> @@ -946,17 +912,9 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
>      const uint8_t gvec)
>  {
>      struct irq_desc *desc;
> -    const struct msi_desc *msi_desc;
> -    int remap_index;
> -    int rc = 0;
> -    const struct pci_dev *pci_dev;
> -    const struct acpi_drhd_unit *drhd;
> -    struct iommu *iommu;
> -    struct ir_ctrl *ir_ctrl;
> -    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> -    struct iremap_entry new_ire, old_ire;
> -    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> -    __uint128_t ret;
> +    struct msi_desc *msi_desc;
> +    const struct pi_desc *pi_desc = v ? &v->arch.hvm_vmx.pi_desc : NULL;

Will there be a case where pi_update_irte() is invoked with v == NULL?

> +    int rc;
> 
>      desc = pirq_spin_lock_irq_desc(pirq, NULL);
>      if ( !desc )
> @@ -968,59 +926,13 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
>          rc = -ENODEV;
>          goto unlock_out;
>      }
> -
> -    pci_dev = msi_desc->dev;
> -    if ( !pci_dev )
> -    {
> -        rc = -ENODEV;
> -        goto unlock_out;
> -    }
> -
> -    remap_index = msi_desc->remap_index;
> +    msi_desc->pi_desc = pi_desc;
> +    msi_desc->gvec = gvec;

Is it possible to see pi_desc already assigned here? The name of
pi_update_irte suggests it may be invoked multiple times. If we assume a new
update should happen only after the previous one has been cleaned up, then
some check code should be added here to prove that assumption.

> 
>      spin_unlock_irq(&desc->lock);
> 
>      ASSERT(pcidevs_locked());
> -
> -    /*
> -     * FIXME: For performance reasons we should store the 'iommu' pointer in
> -     * 'struct msi_desc' in some other place, so we don't need to waste
> -     * time searching it here.
> -     */
> -    drhd = acpi_find_matched_drhd_unit(pci_dev);
> -    if ( !drhd )
> -        return -ENODEV;
> -
> -    iommu = drhd->iommu;
> -    ir_ctrl = iommu_ir_ctrl(iommu);
> -    if ( !ir_ctrl )
> -        return -ENODEV;
> -
> -    spin_lock_irq(&ir_ctrl->iremap_lock);
> -
> -    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index,
> -                     iremap_entries, p);
> -
> -    old_ire = *p;
> -
> -    /* Setup/Update interrupt remapping table entry. */
> -    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
> -    ret = cmpxchg16b(p, &old_ire, &new_ire);
> -
> -    /*
> -     * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
> -     * and the hardware cannot update the IRTE behind us, so the return value
> -     * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
> -     */
> -    ASSERT(ret == old_ire.val);
> -
> -    iommu_flush_cache_entry(p, sizeof(*p));
> -    iommu_flush_iec_index(iommu, 0, remap_index);
> -
> -    unmap_vtd_domain_page(iremap_entries);
> -
> -    spin_unlock_irq(&ir_ctrl->iremap_lock);
> -
> -    return 0;
> +    return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
> 
>   unlock_out:
>      spin_unlock_irq(&desc->lock);
> diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
> index 9c02945..fc9ab04 100644
> --- a/xen/include/asm-x86/msi.h
> +++ b/xen/include/asm-x86/msi.h
> @@ -118,6 +118,8 @@ struct msi_desc {
>  	struct msi_msg msg;		/* Last set MSI message */
> 
>  	int remap_index;		/* index in interrupt remapping table */
> +	const struct pi_desc *pi_desc;	/* pointer to posted descriptor */
> +	uint8_t gvec;			/* guest vector. valid when pi_desc isn't NULL */
>  };
> 
>  /*
> --
> 1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely
  2017-03-29  5:11 ` [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely Chao Gao
@ 2017-03-31  6:03   ` Tian, Kevin
  2017-03-31 10:01   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Tian, Kevin @ 2017-03-31  6:03 UTC (permalink / raw)
  To: Gao, Chao, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Dario Faggioli, Jan Beulich

> From: Gao, Chao
> Sent: Wednesday, March 29, 2017 1:12 PM
> 
> We used structure assignment to update the IRTE, which is non-atomic when the
> whole IRTE is to be updated. It is unsafe if an interrupt happens during the
> update; furthermore, no bug or warning would be reported when this happens.
> 
> This patch introduces two variants, atomic and non-atomic, to update the
> IRTE. Both variants will update the IRTE if possible. If the caller requests
> an atomic update but we can't meet it, we raise a bug.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v11:
> - Add two variant function to update IRTE. Call the non-atomic one for init
> and clear operations. Call the atomic one for other cases.
> - Add a new field to indicate the remap_entry associated with msi_desc is
> initialized or not.
> 
> v10:
> - rename copy_irte_to_irt to update_irte
> - remove copy_from_to_irt
> - change commit message and add some comments to illustrate on which
> condition update_irte() is safe.
> 
>  xen/arch/x86/msi.c                     |  1 +
>  xen/drivers/passthrough/vtd/intremap.c | 78 ++++++++++++++++++++++++++++++++--
>  xen/include/asm-x86/msi.h              |  1 +
>  3 files changed, 76 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index 3374cd4..7ed1243 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -578,6 +578,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int nr)
>          entry[nr].dev = NULL;
>          entry[nr].irq = -1;
>          entry[nr].remap_index = -1;
> +        entry[nr].remap_entry_initialized = false;
>          entry[nr].pi_desc = NULL;
>      }
> 
> diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
> index b992f23..b7f3cf1 100644
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -169,10 +169,64 @@ bool_t __init iommu_supports_eim(void)
>      return 1;
>  }
> 
> +static void update_irte(struct iremap_entry *entry,
> +                        const struct iremap_entry *new_ire,
> +                        bool atomic)
> +{
> +    if ( cpu_has_cx16 )
> +    {
> +        __uint128_t ret;
> +        struct iremap_entry old_ire;
> +
> +        old_ire = *entry;
> +        ret = cmpxchg16b(entry, &old_ire, new_ire);
> +
> +        /*
> +         * In the above, we use cmpxchg16 to atomically update the 128-bit
> +         * IRTE, and the hardware cannot update the IRTE behind us, so
> +         * the return value of cmpxchg16 should be the same as old_ire.
> +         * This ASSERT validate it.
> +         */
> +        ASSERT(ret == old_ire.val);
> +    }
> +    else
> +    {
> +        /*
> +         * The following code will update irte atomically if possible.
> +         * If the caller requests a atomic update but we can't meet it,

a -> an

> +         * a bug will be raised.
> +         */
> +        if ( entry->lo == new_ire->lo )
> +            entry->hi = new_ire->hi;
> +        else if ( entry->hi == new_ire->hi )
> +            entry->lo = new_ire->lo;
> +        else if ( !atomic )
> +        {
> +            entry->lo = new_ire->lo;
> +            entry->hi = new_ire->hi;
> +        }
> +        else
> +            BUG();

I suppose you need the same ASSERT as for the cmpxchg16b path here in the
atomic case.

> +    }
> +}
> +
> +static inline void update_irte_non_atomic(struct iremap_entry *entry,
> +                                          const struct iremap_entry *new_ire)
> +{
> +    update_irte(entry, new_ire, false);
> +}
> +
> +static inline void update_irte_atomic(struct iremap_entry *entry,
> +                                      const struct iremap_entry *new_ire)
> +{
> +    update_irte(entry, new_ire, true);
> +}
> +
> +
>  /* Mark specified intr remap entry as free */
>  static void free_remap_entry(struct iommu *iommu, int index)
>  {
> -    struct iremap_entry *iremap_entry = NULL, *iremap_entries;
> +    struct iremap_entry *iremap_entry = NULL, *iremap_entries, new_ire = { };
>      struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
> 
>      if ( index < 0 || index > IREMAP_ENTRY_NR - 1 )
> @@ -183,7 +237,7 @@ static void free_remap_entry(struct iommu *iommu, int index)
>      GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
>                       iremap_entries, iremap_entry);
> 
> -    memset(iremap_entry, 0, sizeof(*iremap_entry));
> +    update_irte_non_atomic(iremap_entry, &new_ire);
>      iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
>      iommu_flush_iec_index(iommu, 0, index);
> 
> @@ -286,6 +340,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
>      int index;
>      unsigned long flags;
>      struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
> +    bool init = false;
> 
>      remap_rte = (struct IO_APIC_route_remap_entry *) old_rte;
>      spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
> @@ -296,6 +351,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
>          index = alloc_remap_entry(iommu, 1);
>          if ( index < IREMAP_ENTRY_NR )
>              apic_pin_2_ir_idx[apic][ioapic_pin] = index;
> +        init = true;
>      }
> 
>      if ( index > IREMAP_ENTRY_NR - 1 )
> @@ -353,7 +409,11 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
>          remap_rte->format = 1;    /* indicate remap format */
>      }
> 
> -    *iremap_entry = new_ire;
> +    if ( init )
> +        update_irte_non_atomic(iremap_entry, &new_ire);
> +    else
> +        update_irte_atomic(iremap_entry, &new_ire);
> +
>      iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
>      iommu_flush_iec_index(iommu, 0, index);
> 
> @@ -567,7 +627,10 @@ static int msi_msg_to_remap_entry(
>      {
>          /* Free specified unused IRTEs */
>          for ( i = 0; i < nr; ++i )
> +        {
>              free_remap_entry(iommu, msi_desc->remap_index + i);
> +            msi_desc[i].remap_entry_initialized = false;
> +        }
>          spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
>          return 0;
>      }
> @@ -639,7 +702,14 @@ static int msi_msg_to_remap_entry(
>      remap_rte->address_hi = 0;
>      remap_rte->data = index - i;
> 
> -    *iremap_entry = new_ire;
> +    if ( msi_desc->remap_entry_initialized )
> +        update_irte_atomic(iremap_entry, &new_ire);
> +    else
> +    {
> +        update_irte_non_atomic(iremap_entry, &new_ire);
> +        msi_desc->remap_entry_initialized = true;
> +    }
> +
>      iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
>      iommu_flush_iec_index(iommu, 0, index);
> 
> diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
> index fc9ab04..a0bd3af 100644
> --- a/xen/include/asm-x86/msi.h
> +++ b/xen/include/asm-x86/msi.h
> @@ -118,6 +118,7 @@ struct msi_desc {
>  	struct msi_msg msg;		/* Last set MSI message */
> 
>  	int remap_index;		/* index in interrupt remapping table */
> +	bool remap_entry_initialized;
>  	const struct pi_desc *pi_desc;	/* pointer to posted descriptor */
> 	uint8_t gvec;			/* guest vector. valid when pi_desc isn't NULL */
>  };
> --
> 1.8.3.1



* Re: [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-30 23:01     ` Chao Gao
@ 2017-03-31  8:11       ` Jan Beulich
  2017-03-31  1:13         ` Chao Gao
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2017-03-31  8:11 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, Feng Wu, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jun Nakajima

>>> On 31.03.17 at 01:01, <chao.gao@intel.com> wrote:
> On Fri, Mar 31, 2017 at 01:46:33PM +0800, Tian, Kevin wrote:
>>> From: Gao, Chao
>>> Sent: Wednesday, March 29, 2017 1:12 PM
>>> --- a/xen/drivers/passthrough/io.c
>>> +++ b/xen/drivers/passthrough/io.c
>>> @@ -618,6 +618,8 @@ int pt_irq_destroy_bind(
>>>          else
>>>              what = "bogus";
>>>      }
>>> +    else if ( iommu_intpost && pirq && pirq_dpci->gmsi.posted )
>>
>>No need to check iommu_intpost. Just keep later conditions.
> 
> OK. Is that because if 'posted' is set, iommu_intpost must be true?
> 
>>
>>btw isn't the condition be " (pirq_dpci && pirq_dpci->gmsi.posted)"?
> 
> I don't think so. pirq, not pirq_dpci, is consumed by pi_update_irte().
> Furthermore, if pirq_dpci is NULL, pirq is also NULL, but the converse
> is not true.

With

    pirq_dpci = pirq_dpci(pirq);

it _is_ the other way around, which can then also be expressed
as "if pirq_dpci is non-NULL, pirq is non-NULL too", which is the
precondition you need to consume (dereference) pirq. There is
a reason other code around here also only ever checks pirq_dpci
(even when using pirq).

Jan



* Re: [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-03-29  5:11 ` [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
  2017-03-31  5:28   ` Tian, Kevin
@ 2017-03-31  9:31   ` Jan Beulich
  2017-03-31  2:42     ` Chao Gao
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2017-03-31  9:31 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, xen-devel

>>> On 29.03.17 at 07:11, <chao.gao@intel.com> wrote:
> @@ -442,17 +397,24 @@ int pt_irq_create_bind(
>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>          spin_unlock(&d->event_lock);
> +
> +        pirq_dpci->gmsi.posted = false;
> +        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
> +        if ( iommu_intpost && (delivery_mode == dest_LowestPrio) )

Why again would dest_Fixed not allow posted delivery? This needs
recording in a comment, if there really is such a restriction. Or did
you really mean to place ...

> +        {
> +            vcpu = vector_hashing_dest(d, dest, dest_mode,
> +                                       pirq_dpci->gmsi.gvec);
> +            if ( vcpu )
> +                pirq_dpci->gmsi.posted = true;

... this ...

> +        }

... after this brace (which then wouldn't be needed anymore)? If
so, is there any point calling vector_hashing_dest() when vcpu is
already non-NULL prior to the if()?

This then also raises the question whether the call to
hvm_girq_dest_2_vcpu_id() is actually legitimate for lowest
priority delivery mode.

Jan



* Re: [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-29  5:11 ` [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
  2017-03-31  5:46   ` Tian, Kevin
@ 2017-03-31  9:48   ` Jan Beulich
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2017-03-31  9:48 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, Feng Wu, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jun Nakajima

>>> On 29.03.17 at 07:11, <chao.gao@intel.com> wrote:
> v11:
> - Commit message changes.
> - Add code to clear the two new fields introduced in this patch.

The latter appears to contradict ...

> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -578,6 +578,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int nr)
>          entry[nr].dev = NULL;
>          entry[nr].irq = -1;
>          entry[nr].remap_index = -1;
> +        entry[nr].pi_desc = NULL;
>      }

... only pi_desc being cleared here. I think the code is fine here (as
you use gvec only when pi_desc is non-NULL), but please try to
avoid giving imprecise information in the future.

> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -618,6 +618,8 @@ int pt_irq_destroy_bind(
>          else
>              what = "bogus";
>      }
> +    else if ( iommu_intpost && pirq && pirq_dpci->gmsi.posted )
> +        pi_update_irte(NULL, pirq, 0);

The more I look at this passing of a NULL vCPU pointer, the less
I like it: If the pointer was used for anything other than
retrieving pi_desc, this would be actively dangerous. I think the
function would better be passed a const struct pi_desc * instead.

> --- a/xen/include/asm-x86/msi.h
> +++ b/xen/include/asm-x86/msi.h
> @@ -118,6 +118,8 @@ struct msi_desc {
>  	struct msi_msg msg;		/* Last set MSI message */
>  
>  	int remap_index;		/* index in interrupt remapping table */
> +	const struct pi_desc *pi_desc;	/* pointer to posted descriptor */
> +	uint8_t gvec;			/* guest vector. valid when pi_desc isn't NULL */
>  };

Please avoid adding further holes in this structure, by moving
gvec right after msi_attrib (I'll also queue a patch to move
remap_index up ahead of msg).

Jan



* Re: [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely
  2017-03-29  5:11 ` [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely Chao Gao
  2017-03-31  6:03   ` Tian, Kevin
@ 2017-03-31 10:01   ` Jan Beulich
  2017-04-04 19:12     ` Chao Gao
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2017-03-31 10:01 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

>>> On 29.03.17 at 07:11, <chao.gao@intel.com> wrote:
> We used structure assignment to update the IRTE, which is non-atomic when the
> whole IRTE is to be updated. It is unsafe if an interrupt happens during the
> update; furthermore, no bug or warning would be reported when this happens.
> 
> This patch introduces two variants, atomic and non-atomic, to update the
> IRTE. Both variants will update the IRTE if possible. If the caller requests
> an atomic update but we can't meet it, we raise a bug.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v11:
> - Add two variant function to update IRTE. Call the non-atomic one for init
> and clear operations. Call the atomic one for other cases.
> - Add a new field to indicate the remap_entry associated with msi_desc is
> initialized or not.
> 
> v10:
> - rename copy_irte_to_irt to update_irte
> - remove copy_from_to_irt
> - change commit message and add some comments to illustrate on which
> condition update_irte() is safe.
> 
>  xen/arch/x86/msi.c                     |  1 +
>  xen/drivers/passthrough/vtd/intremap.c | 78 ++++++++++++++++++++++++++++++++--
>  xen/include/asm-x86/msi.h              |  1 +
>  3 files changed, 76 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index 3374cd4..7ed1243 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -578,6 +578,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int nr)
>          entry[nr].dev = NULL;
>          entry[nr].irq = -1;
>          entry[nr].remap_index = -1;
> +        entry[nr].remap_entry_initialized = false;
>          entry[nr].pi_desc = NULL;
>      }
>  
> diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
> index b992f23..b7f3cf1 100644
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -169,10 +169,64 @@ bool_t __init iommu_supports_eim(void)
>      return 1;
>  }
>  
> +static void update_irte(struct iremap_entry *entry,
> +                        const struct iremap_entry *new_ire,
> +                        bool atomic)
> +{
> +    if ( cpu_has_cx16 )
> +    {
> +        __uint128_t ret;
> +        struct iremap_entry old_ire;
> +
> +        old_ire = *entry;
> +        ret = cmpxchg16b(entry, &old_ire, new_ire);
> +
> +        /*
> +         * In the above, we use cmpxchg16 to atomically update the 128-bit
> +         * IRTE, and the hardware cannot update the IRTE behind us, so
> +         * the return value of cmpxchg16 should be the same as old_ire.
> +         * This ASSERT validate it.
> +         */
> +        ASSERT(ret == old_ire.val);
> +    }
> +    else
> +    {
> +        /*
> +         * The following code will update irte atomically if possible.

There's nothing atomic below - between the compares and stores
the value in the table could change. Please don't make false
promises in comments.

> +         * If the caller requests a atomic update but we can't meet it, 
> +         * a bug will be raised.
> +         */
> +        if ( entry->lo == new_ire->lo )
> +            entry->hi = new_ire->hi;
> +        else if ( entry->hi == new_ire->hi )
> +            entry->lo = new_ire->lo;

Best effort would still call for use of write_atomic() instead of both
of the assignments above.

> @@ -353,7 +409,11 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
>          remap_rte->format = 1;    /* indicate remap format */
>      }
>  
> -    *iremap_entry = new_ire;
> +    if ( init )
> +        update_irte_non_atomic(iremap_entry, &new_ire);
> +    else
> +        update_irte_atomic(iremap_entry, &new_ire);

Seems like you'd better call update_irte() here directly, instead of
having this if/else. Which puts under question the usefulness of the
two wrappers.

> @@ -639,7 +702,14 @@ static int msi_msg_to_remap_entry(
>      remap_rte->address_hi = 0;
>      remap_rte->data = index - i;
>  
> -    *iremap_entry = new_ire;
> +    if ( msi_desc->remap_entry_initialized )
> +        update_irte_atomic(iremap_entry, &new_ire);
> +    else
> +    {
> +        update_irte_non_atomic(iremap_entry, &new_ire);
> +        msi_desc->remap_entry_initialized = true;
> +    }

Same here.

I also wonder whether you really need the new flag: The function
knows whether it's initializing the IRTE for the first time, as it
allocates a table index just like ioapic_rte_to_remap_entry() does
(where you get away without a new flag).

Jan


* Re: [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-03-31  2:42     ` Chao Gao
@ 2017-03-31 10:06       ` Jan Beulich
  2017-03-31  3:27         ` Chao Gao
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2017-03-31 10:06 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, xen-devel

>>> On 31.03.17 at 04:42, <chao.gao@intel.com> wrote:
> On Fri, Mar 31, 2017 at 03:31:47AM -0600, Jan Beulich wrote:
>>>>> On 29.03.17 at 07:11, <chao.gao@intel.com> wrote:
>>> @@ -442,17 +397,24 @@ int pt_irq_create_bind(
>>>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>>>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>>>          spin_unlock(&d->event_lock);
>>> +
>>> +        pirq_dpci->gmsi.posted = false;
>>> +        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
>>> +        if ( iommu_intpost && (delivery_mode == dest_LowestPrio) )
>>
>>Why again would dest_Fixed not allow posted delivery? This needs
> 
> There is no such restriction. In the dest_Fixed case, hvm_girq_dest_2_vcpu_id()
> gives the same output as pi_find_dest_vcpu(). Thus we don't call the latter
> again, and just use the return value of hvm_girq_dest_2_vcpu_id().

But as pointed out you don't set the new posted field in that case.

>>recording in a comment, if there really is such a restriction. Or did
>>you really mean to place ...
>>
>>> +        {
>>> +            vcpu = vector_hashing_dest(d, dest, dest_mode,
>>> +                                       pirq_dpci->gmsi.gvec);
>>> +            if ( vcpu )
>>> +                pirq_dpci->gmsi.posted = true;
>>
>>... this ...
>>
>>> +        }
>>
>>... after this brace (which then wouldn't be needed anymore)? If
>>so, is there any point calling vector_hashing_dest() when vcpu is
>>already non-NULL prior to the if()?
>>
>>This then also raises the question whether the call to
>>hvm_girq_dest_2_vcpu_id() is actually legitimate for lowest
>>priority delivery mode.
> 
> For lowest-priority delivery mode, if VT-d PI is enabled, the result (the
> destination vcpu) is overridden by vector_hashing_dest() to keep the
> existing behavior. I think the only point we should keep in mind is that,
> for cases other than lowest-priority delivery mode, pi_find_dest_vcpu()
> and hvm_girq_dest_2_vcpu_id() give the same destination vcpu.

Well, the override is done for the iommu_intpost case. The remark
on hvm_girq_dest_2_vcpu_id(), however, was made in general.

Jan



* Re: [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-03-30 23:10     ` Chao Gao
@ 2017-03-31 10:27       ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2017-03-31 10:27 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, Kevin Tian, xen-devel

>>> On 31.03.17 at 01:10, <chao.gao@intel.com> wrote:
> On Fri, Mar 31, 2017 at 01:28:02PM +0800, Tian, Kevin wrote:
>>> From: Chao Gao
>>> Sent: Wednesday, March 29, 2017 1:12 PM
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct
>>> hvm_pirq_dpci *pirq_dpci,
>>>      struct vcpu *v = arg;
>>> 
>>>      if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>>> +         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
>>> +         /* No need to migrate the pirq if it is delivered to the guest directly. */
>>> +         (!pirq_dpci->gmsi.posted) &&
>>>           (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
>>
>>Simply looking at the above change, it does more than what you intend to change.
>>Previously even the case w/o the GUEST_MSI flag would fall into that path, but now
>>you limit it to only GUEST_MSI with irq remapping (i.e. you changed the
>>behavior both for the posted case and for the w/o-GUEST_MSI case). I haven't looked at
>>whether MACH_MSI is always set together with GUEST_MSI, but my gut feeling
>>is this isn't correct.
> 
> Yes, it's a problem. I think the original code may be wrong, for it reads
> gmsi.dest_vcpu_id without checking the 'flags' of pirq_dpci. Do we need to
> migrate the pirq when its type is GUEST_PCI?

It probably would be nice, but we don't know where to migrate it to:
The names in "pirq_dpci->gmsi.dest_vcpu_id" should make this pretty
clear, I would think.

Jan



* Re: [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-03-31  3:27         ` Chao Gao
@ 2017-03-31 10:38           ` Jan Beulich
  2017-04-05  0:20             ` Chao Gao
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2017-03-31 10:38 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, xen-devel

>>> On 31.03.17 at 05:27, <chao.gao@intel.com> wrote:
> On Fri, Mar 31, 2017 at 04:06:27AM -0600, Jan Beulich wrote:
>>>>> On 31.03.17 at 04:42, <chao.gao@intel.com> wrote:
>>> On Fri, Mar 31, 2017 at 03:31:47AM -0600, Jan Beulich wrote:
>>>>>>> On 29.03.17 at 07:11, <chao.gao@intel.com> wrote:
>>>>> @@ -442,17 +397,24 @@ int pt_irq_create_bind(
>>>>>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>>>>>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>>>>>          spin_unlock(&d->event_lock);
>>>>> +
>>>>> +        pirq_dpci->gmsi.posted = false;
>>>>> +        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
>>>>> +        if ( iommu_intpost && (delivery_mode == dest_LowestPrio) )
>>>>
>>>>Why again would dest_Fixed not allow posted delivery? This needs
>>> 
>>> There is no such restriction. In the dest_Fixed case, hvm_girq_dest_2_vcpu_id() gives
>>> the same output as pi_find_dest_vcpu(). Thus we don't call the latter again; we just
>>> use the return value of hvm_girq_dest_2_vcpu_id().
>>
>>But as pointed out you don't set the new posted field in that case.
>>
> 
> Indeed.
> 
> How about this:
>         pirq_dpci->gmsi.posted = false;
>         if ( iommu_intpost )
>         {
>             vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
>             if ( delivery_mode == dest_LowestPrio )
>                 vcpu = vector_hashing_dest(d, dest, dest_mode,
>                                            pirq_dpci->gmsi.gvec);
>             if ( vcpu )
>                 pirq_dpci->gmsi.posted = true;
>         }

I think the setting of vcpu needs to stay outside the if(), but other
than that the above looks reasonable at a first glance.
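[For illustration only: a compilable sketch of the flow being converged on here —
the vcpu lookup kept outside the if(), the hashing override applied only for
lowest-priority delivery under posted interrupts, and 'posted' set whenever a
destination is found. The struct layouts and the vector_hashing_dest() stub are
simplified stand-ins, not the actual Xen definitions.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the Xen types discussed in this thread. */
struct vcpu { int id; };
struct gmsi { bool posted; unsigned char gvec; };

/* Hypothetical stub: hash the guest vector to pick one candidate vcpu. */
static struct vcpu *vector_hashing_dest(struct vcpu *cands, size_t n,
                                        unsigned char gvec)
{
    return n ? &cands[gvec % n] : NULL;
}

/*
 * Sketch of the binding logic: the vcpu lookup stays outside the
 * iommu_intpost conditional, and gmsi->posted is set for both the
 * dest_Fixed and the lowest-priority cases when posting is possible.
 */
static struct vcpu *bind_dest(struct vcpu *vcpus, size_t nr_vcpus,
                              int dest_vcpu_id, bool iommu_intpost,
                              bool lowest_prio, struct gmsi *gmsi)
{
    struct vcpu *vcpu = (dest_vcpu_id >= 0) ? &vcpus[dest_vcpu_id] : NULL;

    gmsi->posted = false;
    if ( iommu_intpost )
    {
        if ( lowest_prio )
            vcpu = vector_hashing_dest(vcpus, nr_vcpus, gmsi->gvec);
        if ( vcpu )
            gmsi->posted = true;   /* set for dest_Fixed as well */
    }

    return vcpu;
}
```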

>>>>This then also raises the question whether the call to
>>>>hvm_girq_dest_2_vcpu_id() is actually legitimate for lowest
>>>>priority delivery mode.
>>> 
>>> For lowest priority delivery mode, if VT-d PI is enabled, the result (the
>>> destination vcpu) is overridden by vector_hashing_dest() to keep the
>>> existing behavior. I think the only point we should keep in mind is that,
>>> for cases other than lowest priority delivery mode, pi_find_dest_vcpu()
>>> and hvm_girq_dest_2_vcpu_id() give the same destination vcpu.
>>
>>Well, the override is done for the iommu_intpost case. The remark
>>on hvm_girq_dest_2_vcpu_id(), however, was made in general.
> 
> Ok. You meant that the method used in hvm_girq_dest_2_vcpu_id() may not match
> the method used by real hardware. I will check it.

I mean that the method used there is pretty clearly "fixed" mode
handling. Thanks for checking.

Jan



* Re: [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely
  2017-03-31 10:01   ` Jan Beulich
@ 2017-04-04 19:12     ` Chao Gao
  2017-04-05  7:40       ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Chao Gao @ 2017-04-04 19:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

On Fri, Mar 31, 2017 at 04:01:31AM -0600, Jan Beulich wrote:
>>>> On 29.03.17 at 07:11, <chao.gao@intel.com> wrote:
>> +static void update_irte(struct iremap_entry *entry,
>> +                        const struct iremap_entry *new_ire,
>> +                        bool atomic)
>> +{
>> +    if ( cpu_has_cx16 )
>> +    {
>> +        __uint128_t ret;
>> +        struct iremap_entry old_ire;
>> +
>> +        old_ire = *entry;
>> +        ret = cmpxchg16b(entry, &old_ire, new_ire);
>> +
>> +        /*
>> +         * In the above, we use cmpxchg16b to atomically update the 128-bit
>> +         * IRTE, and the hardware cannot update the IRTE behind us, so
>> +         * the return value of cmpxchg16b should be the same as old_ire.
>> +         * This ASSERT validates it.
>> +         */
>> +        ASSERT(ret == old_ire.val);
>> +    }
>> +    else
>> +    {
>> +        /*
>> +         * The following code will update irte atomically if possible.
>
>There's nothing atomic below - between the compares and stores
>the value in the table could change. Please don't make false
>promises in comments.

Ok, I agree. Then do you think the parameter 'atomic' of this function is properly named?

I think 'atomic' means the caller wants this update to be presented to the VT-d
hardware as an atomic update. That is to say, no intermediate, invalid IRTE can
be seen by the hardware.

>
>> +         * If the caller requests an atomic update but we can't meet it,
>> +         * a bug will be raised.
>> +         */
>> +        if ( entry->lo == new_ire->lo )
>> +            entry->hi = new_ire->hi;
>> +        else if ( entry->hi == new_ire->hi )
>> +            entry->lo = new_ire->lo;
>
>Best effort would still call for use of write_atomic() instead of both
>of the assignments above.

Will fix. Your concern is that the compiler would wrongly optimize the assignments?

>
>> @@ -353,7 +409,11 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
>>          remap_rte->format = 1;    /* indicate remap format */
>>      }
>>  
>> -    *iremap_entry = new_ire;
>> +    if ( init )
>> +        update_irte_non_atomic(iremap_entry, &new_ire);
>> +    else
>> +        update_irte_atomic(iremap_entry, &new_ire);
>
>Seems like you'd better call update_irte() here directly, instead of
>having this if/else. Which puts under question the usefulness of the
>two wrappers.

agree. Will remove the two wrappers.

>
>> @@ -639,7 +702,14 @@ static int msi_msg_to_remap_entry(
>>      remap_rte->address_hi = 0;
>>      remap_rte->data = index - i;
>>  
>> -    *iremap_entry = new_ire;
>> +    if ( msi_desc->remap_entry_initialized )
>> +        update_irte_atomic(iremap_entry, &new_ire);
>> +    else
>> +    {
>> +        update_irte_non_atomic(iremap_entry, &new_ire);
>> +        msi_desc->remap_entry_initialized = true;
>> +    }
>
>Same here.
>
>I also wonder whether you really need the new flag: The function
>knows whether it's initializing the IRTE for the first time, as it
>allocates a table index just like ioapic_rte_to_remap_entry() does
>(where you get away without a new flag).

For the multi-vector MSI case, I don't have a clean solution to get rid of this flag.
The problem here is that it allocates several table indexes for multi-vector MSI
and only initializes the first one. For the others, it isn't aware whether the IRTE
is initialized or not.

Thanks,
Chao

>
>Jan


* Re: [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-03-31 10:38           ` Jan Beulich
@ 2017-04-05  0:20             ` Chao Gao
  2017-04-05  8:03               ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Chao Gao @ 2017-04-05  0:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Fri, Mar 31, 2017 at 04:38:02AM -0600, Jan Beulich wrote:
>>>> On 31.03.17 at 05:27, <chao.gao@intel.com> wrote:
>>>>>This then also raises the question whether the call to
>>>>>hvm_girq_dest_2_vcpu_id() is actually legitimate for lowest
>>>>>priority delivery mode.
>>>> 
>>>> For lowest priority delivery mode, if VT-d PI is enabled, the result (the
>>>> destination vcpu) is overridden by vector_hashing_dest() to keep the
>>>> existing behavior. I think the only point we should keep in mind is that,
>>>> for cases other than lowest priority delivery mode, pi_find_dest_vcpu()
>>>> and hvm_girq_dest_2_vcpu_id() give the same destination vcpu.
>>>
>>>Well, the override is done for the iommu_intpost case. The remark
>>>on hvm_girq_dest_2_vcpu_id(), however, was made in general.
>> 
>> Ok. You meant the method using in hvm_girq_dest_2_vcpu_id() may not match 
>> the method
>> used by real hardware. I will check it.
>
>I mean that the method used there is pretty clearly "fixed" mode
>handling. Thanks for checking.

I think the method is ok here. It is an optimization to reduce the IPIs between
CPUs. Only cases where the destination is a single vcpu can use this
optimization. For the lowest priority delivery case, we can regard the vcpus
that match the 'dest_mode' and 'dest' here as possible destination vcpus.
To emulate hardware accurately, I think we can still use this optimization
when an MSI has multiple possible destination vcpus.

Thanks,
Chao


* Re: [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely
  2017-04-04 19:12     ` Chao Gao
@ 2017-04-05  7:40       ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2017-04-05  7:40 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

>>> On 04.04.17 at 21:12, <chao.gao@intel.com> wrote:
> On Fri, Mar 31, 2017 at 04:01:31AM -0600, Jan Beulich wrote:
>>>>> On 29.03.17 at 07:11, <chao.gao@intel.com> wrote:
>>> +static void update_irte(struct iremap_entry *entry,
>>> +                        const struct iremap_entry *new_ire,
>>> +                        bool atomic)
>>> +{
>>> +    if ( cpu_has_cx16 )
>>> +    {
>>> +        __uint128_t ret;
>>> +        struct iremap_entry old_ire;
>>> +
>>> +        old_ire = *entry;
>>> +        ret = cmpxchg16b(entry, &old_ire, new_ire);
>>> +
>>> +        /*
>>> +         * In the above, we use cmpxchg16b to atomically update the 128-bit
>>> +         * IRTE, and the hardware cannot update the IRTE behind us, so
>>> +         * the return value of cmpxchg16b should be the same as old_ire.
>>> +         * This ASSERT validates it.
>>> +         */
>>> +        ASSERT(ret == old_ire.val);
>>> +    }
>>> +    else
>>> +    {
>>> +        /*
>>> +         * The following code will update irte atomically if possible.
>>
>>There's nothing atomic below - between the compares and stores
>>the value in the table could change. Please don't make false
>>promises in comments.
> 
> Ok, I agree. Then do you think the parameter 'atomic' of this function is
> properly named?

That depends: As long as no changes behind our backs are possible,
I think it could be left with this name.

> I think 'atomic' means the caller wants this update to be presented to the
> VT-d hardware as an atomic update. That is to say, no intermediate, invalid
> IRTE can be seen by the hardware.

Right, that's what the comment should say imo.

>>> +         * If the caller requests an atomic update but we can't meet it,
>>> +         * a bug will be raised.
>>> +         */
>>> +        if ( entry->lo == new_ire->lo )
>>> +            entry->hi = new_ire->hi;
>>> +        else if ( entry->hi == new_ire->hi )
>>> +            entry->lo = new_ire->lo;
>>
>>Best effort would still call for use of write_atomic() instead of both
>>of the assignments above.
> 
> Will fix. Your concern is that the compiler would wrongly optimize the assignments?

With s/optimize/translate/, yes. If you care about atomicity, you
should guarantee as much of it as is possible. As per what you've
said above, if you weren't using write_atomic() here, you'd have
to prove that byte-wise writing in any order could not lead to a
transiently inconsistent IRTE.
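[For illustration only: a self-contained sketch of the non-cx16 fallback path
under discussion. write_atomic() is modelled here as a plain 64-bit store; the
real Xen helper additionally keeps the compiler from tearing or reordering the
access. The 'return false' stands in for the BUG() the patch raises when both
halves differ and the caller demanded an update that appears atomic to the
hardware.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two 64-bit halves of a 128-bit interrupt remapping table entry. */
struct iremap_entry { uint64_t lo, hi; };

/* Stand-in for Xen's write_atomic(): a single, untorn 64-bit store. */
static void write_atomic_u64(volatile uint64_t *p, uint64_t v)
{
    *p = v;
}

/*
 * Fallback IRTE update when cmpxchg16b is unavailable: at most one half
 * may change if the update must appear atomic to the hardware; otherwise
 * both halves are written (possibly exposing a transient mix).
 */
static bool update_irte_fallback(struct iremap_entry *entry,
                                 const struct iremap_entry *new_ire,
                                 bool atomic)
{
    if ( entry->lo == new_ire->lo )
        write_atomic_u64(&entry->hi, new_ire->hi);
    else if ( entry->hi == new_ire->hi )
        write_atomic_u64(&entry->lo, new_ire->lo);
    else if ( !atomic )
    {
        write_atomic_u64(&entry->hi, new_ire->hi);
        write_atomic_u64(&entry->lo, new_ire->lo);
    }
    else
        return false;   /* both halves differ: atomic update impossible */

    return true;
}
```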

>>> @@ -639,7 +702,14 @@ static int msi_msg_to_remap_entry(
>>>      remap_rte->address_hi = 0;
>>>      remap_rte->data = index - i;
>>>  
>>> -    *iremap_entry = new_ire;
>>> +    if ( msi_desc->remap_entry_initialized )
>>> +        update_irte_atomic(iremap_entry, &new_ire);
>>> +    else
>>> +    {
>>> +        update_irte_non_atomic(iremap_entry, &new_ire);
>>> +        msi_desc->remap_entry_initialized = true;
>>> +    }
>>
>>Same here.
>>
>>I also wonder whether you really need the new flag: The function
>>knows whether it's initializing the IRTE for the first time, as it
>>allocates a table index just like ioapic_rte_to_remap_entry() does
>>(where you get away without a new flag).
> 
> For the multi-vector MSI case, I don't have a clean solution to get rid of this flag.
> The problem here is that it allocates several table indexes for multi-vector MSI
> and only initializes the first one. For the others, it isn't aware whether the IRTE
> is initialized or not.

Oh, indeed. I did overlook this aspect. May I then suggest to
shorten the field name to e.g. irte_initialized?
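[For illustration only: a sketch of the multi-vector MSI situation described
above. A contiguous block of table indexes is allocated at once, yet none of
the IRTEs is written at that point, so only a per-descriptor flag (named
irte_initialized per the suggestion above) can tell whether a given IRTE has
been initialized. The structures are simplified stand-ins, not the real
msi_desc layout.]

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the fields under discussion. */
struct msi_desc {
    int remap_index;        /* index into the interrupt remapping table */
    bool irte_initialized;  /* has this descriptor's IRTE been written? */
};

/*
 * Allocating a block of consecutive indexes initializes no IRTE; every
 * descriptor starts out uninitialized, not just the non-first ones.
 */
static void alloc_msi_block(struct msi_desc *descs, int nr, int base_index)
{
    for ( int i = 0; i < nr; i++ )
    {
        descs[i].remap_index = base_index + i;
        descs[i].irte_initialized = false;
    }
}

/*
 * Decide atomic vs. non-atomic update from the per-descriptor flag:
 * the first write to a live slot may be non-atomic, later ones not.
 */
static bool irte_update_is_atomic(struct msi_desc *desc)
{
    bool atomic = desc->irte_initialized;

    desc->irte_initialized = true;
    return atomic;
}
```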

Jan



* Re: [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-04-05  0:20             ` Chao Gao
@ 2017-04-05  8:03               ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2017-04-05  8:03 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, xen-devel

>>> On 05.04.17 at 02:20, <chao.gao@intel.com> wrote:
> On Fri, Mar 31, 2017 at 04:38:02AM -0600, Jan Beulich wrote:
>>>>> On 31.03.17 at 05:27, <chao.gao@intel.com> wrote:
>>>>>>This then also raises the question whether the call to
>>>>>>hvm_girq_dest_2_vcpu_id() is actually legitimate for lowest
>>>>>>priority delivery mode.
>>>>> 
>>>>> For lowest priority delivery mode, if VT-d PI is enabled, the result (the
>>>>> destination vcpu) is overridden by vector_hashing_dest() to keep the
>>>>> existing behavior. I think the only point we should keep in mind is that,
>>>>> for cases other than lowest priority delivery mode, pi_find_dest_vcpu()
>>>>> and hvm_girq_dest_2_vcpu_id() give the same destination vcpu.
>>>>
>>>>Well, the override is done for the iommu_intpost case. The remark
>>>>on hvm_girq_dest_2_vcpu_id(), however, was made in general.
>>> 
>>> Ok. You meant that the method used in hvm_girq_dest_2_vcpu_id() may not match
>>> the method used by real hardware. I will check it.
>>
>>I mean that the method used there is pretty clearly "fixed" mode
>>handling. Thanks for checking.
> 
> I think the method is ok here. It is an optimization to reduce the IPIs between
> CPUs. Only cases where the destination is a single vcpu can use this
> optimization. For the lowest priority delivery case, we can regard the vcpus
> that match the 'dest_mode' and 'dest' here as possible destination vcpus.
> To emulate hardware accurately, I think we can still use this optimization
> when an MSI has multiple possible destination vcpus.

Hmm, looking again I think I was confused when I did look last time.
The (non-obvious) use of APIC_DEST_NOSHORT in the call to
vlapic_match_dest() looks correct for Fixed and LowestPrio modes
(and is in fact independent of delivery mode selection), so there
indeed shouldn't be an issue here. I'm sorry for the extra trouble.

Jan



end of thread, other threads:[~2017-04-05  8:03 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-29  5:11 [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
2017-03-29  5:11 ` [PATCH v11 1/6] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
2017-03-31  5:28   ` Tian, Kevin
2017-03-30 23:10     ` Chao Gao
2017-03-31 10:27       ` Jan Beulich
2017-03-31  9:31   ` Jan Beulich
2017-03-31  2:42     ` Chao Gao
2017-03-31 10:06       ` Jan Beulich
2017-03-31  3:27         ` Chao Gao
2017-03-31 10:38           ` Jan Beulich
2017-04-05  0:20             ` Chao Gao
2017-04-05  8:03               ` Jan Beulich
2017-03-29  5:11 ` [PATCH v11 2/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
2017-03-31  5:46   ` Tian, Kevin
2017-03-30 23:01     ` Chao Gao
2017-03-31  8:11       ` Jan Beulich
2017-03-31  1:13         ` Chao Gao
2017-03-31  9:48   ` Jan Beulich
2017-03-29  5:11 ` [PATCH v11 3/6] VT-d: Some cleanups Chao Gao
2017-03-29  5:11 ` [PATCH v11 4/6] VMX: Fixup PI descriptor when cpu is offline Chao Gao
2017-03-29  5:11 ` [PATCH v11 5/6] VT-d: introduce update_irte to update irte safely Chao Gao
2017-03-31  6:03   ` Tian, Kevin
2017-03-31 10:01   ` Jan Beulich
2017-04-04 19:12     ` Chao Gao
2017-04-05  7:40       ` Jan Beulich
2017-03-29  5:11 ` [PATCH v11 6/6] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI Chao Gao
2017-03-31  5:13 ` [PATCH v11 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Tian, Kevin
