All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v12 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list
@ 2017-04-06  0:30 Chao Gao
  2017-04-06  0:30 ` [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Chao Gao @ 2017-04-06  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Paul Durrant, Jan Beulich, Chao Gao

The current VT-d PI related code may operate incorrectly in the 
following scenarios: 
1. When VT-d PI is enabled, needn't migrate pirq which is using VT-d PI during
vCPU migration. Patch [1/7] solves this by introducing a new flag to indicate
that the pt-irq is delivered through VT-d PI.

2. Patch [2/7] is to make io.h self-contained. It avoids compile errors 
after including hvm/vmx/vmcs.h in asm-x86/msi.h and asm-x86/iommu.h. It is a
preparation for Patch [3/7].

2. msi_msg_to_remap_entry() is buggy when the live IRTE is in posted format.
It wrongly inherits the 'im' field from the live IRTE but updates all the
other fileds to remapping format. Patch [3/7] handles this.

3. [4/7] is a cleanup patch 

4. When a pCPU is unplugged, and there might be vCPUs on its 
list. Since the pCPU is offline, those vCPUs might not be woken 
up again. [5/7] addresses it. 

5. IRTE is updated through structure assigment which is unsafe in some cases.
To resolve this, Patch [6/7] provides two variants, atomic and non-atomic, to
update IRTE. And a bug is raised when we can't meet the caller's atomic
requirement.

6. We didn't change the IRTE to remapping format when pt-irq is configurated
to have multi-destination vCPUs. Patch [7/7] resolves this problem.

Chao Gao (5):
  passthrough: don't migrate pirq when it is delivered through VT-d PI
  x86/hvm: make io.h self-contained
  VT-d: Introduce new fields in msi_desc to track binding with guest
    interrupt
  VT-d: introduce update_irte to update irte safely
  passthrough/io: Fall back to remapping interrupt when we can't use
    VT-d PI

Feng Wu (2):
  VT-d: Some cleanups
  VMX: Fixup PI descriptor when cpu is offline

 xen/arch/x86/hvm/hvm.c                 |   3 +
 xen/arch/x86/hvm/vmx/vmcs.c            |   1 +
 xen/arch/x86/hvm/vmx/vmx.c             |  70 +++++++++++
 xen/arch/x86/msi.c                     |   2 +
 xen/drivers/passthrough/io.c           |  73 +++--------
 xen/drivers/passthrough/vtd/intremap.c | 220 ++++++++++++++-------------------
 xen/include/asm-x86/hvm/io.h           |   1 +
 xen/include/asm-x86/hvm/vmx/vmx.h      |   1 +
 xen/include/asm-x86/iommu.h            |   4 +-
 xen/include/asm-x86/msi.h              |   5 +
 xen/include/xen/hvm/irq.h              |   1 +
 11 files changed, 195 insertions(+), 186 deletions(-)

-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-04-06  0:30 [PATCH v12 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
@ 2017-04-06  0:30 ` Chao Gao
  2017-04-07 10:38   ` Jan Beulich
  2017-04-06  0:30 ` [PATCH v12 2/7] x86/hvm: make io.h self-contained Chao Gao
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Chao Gao @ 2017-04-06  0:30 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Chao Gao

When a vCPU was migrated to another pCPU, pt irqs binded to this vCPU might
also need migration as a optimization to reduce IPI between pCPUs. When VT-d
PI is enabled, interrupt vector will be recorded to a main memory resident
data-structure and a notification whose destination is decided by NDST is
generated. NDST is properly adjusted during vCPU migration so pirq directly
injected to guest needn't be migrated.

This patch adds a indicator, @posted, to show whether the pt irq is delivered
through VT-d PI. Also this patch fixes a bug that hvm_migrate_pirq() accesses
pirq_dpci->gmsi.dest_vcpu_id without checking the pirq_dpci's type.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v12:
- fix a logic error in fixed delivery case.

v11:
- rename the indicator to 'posted'
- move setting 'posted' field to event lock un-locked region.

v10:
- Newly added.

 xen/arch/x86/hvm/hvm.c       |  3 +++
 xen/drivers/passthrough/io.c | 63 +++++++++-----------------------------------
 xen/include/xen/hvm/irq.h    |  1 +
 3 files changed, 17 insertions(+), 50 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index eba6e9d..d4c8967 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     struct vcpu *v = arg;
 
     if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
+         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
+         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
+         (!pirq_dpci->gmsi.posted) &&
          (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
     {
         struct irq_desc *desc =
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 83e0961..4d19413 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -259,52 +259,6 @@ static struct vcpu *vector_hashing_dest(const struct domain *d,
     return dest;
 }
 
-/*
- * The purpose of this routine is to find the right destination vCPU for
- * an interrupt which will be delivered by VT-d posted-interrupt. There
- * are several cases as below:
- *
- * - For lowest-priority interrupts, use vector-hashing mechanism to find
- *   the destination.
- * - Otherwise, for single destination interrupt, it is straightforward to
- *   find the destination vCPU and return true.
- * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
- *   so return NULL.
- */
-static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t dest_id,
-                                      bool_t dest_mode, uint8_t delivery_mode,
-                                      uint8_t gvec)
-{
-    unsigned int dest_vcpus = 0;
-    struct vcpu *v, *dest = NULL;
-
-    switch ( delivery_mode )
-    {
-    case dest_LowestPrio:
-        return vector_hashing_dest(d, dest_id, dest_mode, gvec);
-    case dest_Fixed:
-        for_each_vcpu ( d, v )
-        {
-            if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
-                                    dest_id, dest_mode) )
-                continue;
-
-            dest_vcpus++;
-            dest = v;
-        }
-
-        /* For fixed mode, we only handle single-destination interrupts. */
-        if ( dest_vcpus == 1 )
-            return dest;
-
-        break;
-    default:
-        break;
-    }
-
-    return NULL;
-}
-
 int pt_irq_create_bind(
     struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
 {
@@ -365,6 +319,7 @@ int pt_irq_create_bind(
     {
         uint8_t dest, dest_mode, delivery_mode;
         int dest_vcpu_id;
+        const struct vcpu *vcpu;
 
         if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         {
@@ -442,17 +397,25 @@ int pt_irq_create_bind(
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
         spin_unlock(&d->event_lock);
+
+        pirq_dpci->gmsi.posted = false;
+        vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
+        if ( iommu_intpost )
+        {
+            if ( delivery_mode == dest_LowestPrio )
+                vcpu = vector_hashing_dest(d, dest, dest_mode,
+                                           pirq_dpci->gmsi.gvec);
+            if ( vcpu )
+                pirq_dpci->gmsi.posted = true;
+        }
         if ( dest_vcpu_id >= 0 )
             hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
 
         /* Use interrupt posting if it is supported. */
         if ( iommu_intpost )
         {
-            const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest, dest_mode,
-                                          delivery_mode, pirq_dpci->gmsi.gvec);
-
             if ( vcpu )
-                pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
+                pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
             else
                 dprintk(XENLOG_G_INFO,
                         "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index f041252..671a6f2 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -63,6 +63,7 @@ struct hvm_gmsi_info {
     uint32_t gvec;
     uint32_t gflags;
     int dest_vcpu_id; /* -1 :multi-dest, non-negative: dest_vcpu_id */
+    bool posted; /* directly deliver to guest via VT-d PI? */
 };
 
 struct hvm_girq_dpci_mapping {
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v12 2/7] x86/hvm: make io.h self-contained
  2017-04-06  0:30 [PATCH v12 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
  2017-04-06  0:30 ` [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
@ 2017-04-06  0:30 ` Chao Gao
  2017-04-06 12:10   ` Paul Durrant
  2017-04-06  0:30 ` [PATCH v12 3/7] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Chao Gao @ 2017-04-06  0:30 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Jan Beulich, Chao Gao

io.h uses structure npfec without including the file xen/mm.h where the
structure is defined.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/include/asm-x86/hvm/io.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index d6801c1..2349f58 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -19,6 +19,7 @@
 #ifndef __ASM_X86_HVM_IO_H__
 #define __ASM_X86_HVM_IO_H__
 
+#include <xen/mm.h>
 #include <asm/hvm/vpic.h>
 #include <asm/hvm/vioapic.h>
 #include <public/hvm/ioreq.h>
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v12 3/7] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-04-06  0:30 [PATCH v12 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
  2017-04-06  0:30 ` [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
  2017-04-06  0:30 ` [PATCH v12 2/7] x86/hvm: make io.h self-contained Chao Gao
@ 2017-04-06  0:30 ` Chao Gao
  2017-04-07  7:43   ` Tian, Kevin
  2017-04-07 12:17   ` Jan Beulich
  2017-04-06  0:30 ` [PATCH v12 4/7] VT-d: Some cleanups Chao Gao
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Chao Gao @ 2017-04-06  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

msi_msg_to_remap_entry() is buggy when the live IRTE is in posted format. It
wrongly inherits the 'im' field meaning the IRTE is in posted format but
updates all the other fields to remapping format.

There are also two situations that lead to the above issue. One is some callers
really want to change the IRTE to remapped format. The other is some callers
only want to update msi message (e.g. set msi affinity) for they don't aware
that this msi is binded with a guest interrupt. We should suppress update
in the second situation. To distinguish them, straightforwardly, we can let
caller specify which format of IRTE they want update to. It isn't feasible for
making all callers be aware of the binding with guest interrupt will cause a
far more complicated change (including the interfaces exposed to IOAPIC and
MSI). Also some callings happen in interrupt context where we can't acquire
d->event_lock to read struct hvm_pirq_dpci.

This patch introduces two new fields in msi_desc to track binding with a guest
interrupt such that msi_msg_to_remap_entry() can get the binding and update
IRTE accordingly. After that change, pi_update_irte() can utilize
msi_msg_to_remap_entry() to update IRTE to posted format.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v12:
- pass const struct pi_desc * to pi_update_irte() other than struct vcpu
- rearrage new fields in msi_desc to avoid more holes in this structure
- change the condition we call pi_update_irte in pt_irq_destroy_bind() 

v11:
- Commit message changes.
- Add code to clear the two new fields introduced in this patch.

v10:
- Newly added.

 xen/arch/x86/msi.c                     |   1 +
 xen/drivers/passthrough/io.c           |   5 +-
 xen/drivers/passthrough/vtd/intremap.c | 151 +++++++--------------------------
 xen/include/asm-x86/iommu.h            |   4 +-
 xen/include/asm-x86/msi.h              |   4 +
 5 files changed, 43 insertions(+), 122 deletions(-)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index a868007..3374cd4 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -578,6 +578,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int nr)
         entry[nr].dev = NULL;
         entry[nr].irq = -1;
         entry[nr].remap_index = -1;
+        entry[nr].pi_desc = NULL;
     }
 
     return entry;
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 4d19413..3e0a10e 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -415,7 +415,8 @@ int pt_irq_create_bind(
         if ( iommu_intpost )
         {
             if ( vcpu )
-                pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
+                pi_update_irte(&vcpu->arch.hvm_vmx.pi_desc, info,
+                               pirq_dpci->gmsi.gvec);
             else
                 dprintk(XENLOG_G_INFO,
                         "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
@@ -619,6 +620,8 @@ int pt_irq_destroy_bind(
         else
             what = "bogus";
     }
+    else if ( pirq_dpci && pirq_dpci->gmsi.posted )
+        pi_update_irte(NULL, pirq, 0);
 
     if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) &&
          list_empty(&pirq_dpci->digl_list) )
diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index bfd468b..6314cbf 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -551,12 +551,12 @@ static int msi_msg_to_remap_entry(
     struct iommu *iommu, struct pci_dev *pdev,
     struct msi_desc *msi_desc, struct msi_msg *msg)
 {
-    struct iremap_entry *iremap_entry = NULL, *iremap_entries;
-    struct iremap_entry new_ire;
+    struct iremap_entry *iremap_entry = NULL, *iremap_entries, new_ire = { };
     struct msi_msg_remap_entry *remap_rte;
     unsigned int index, i, nr = 1;
     unsigned long flags;
     struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
+    const struct pi_desc *pi_desc = msi_desc->pi_desc;
 
     if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
         nr = msi_desc->msi.nvec;
@@ -595,33 +595,35 @@ static int msi_msg_to_remap_entry(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
-
-    /* Set interrupt remapping table entry */
-    new_ire.remap.fpd = 0;
-    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
-    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
-    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
-    /* Hardware require RH = 1 for LPR delivery mode */
-    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
-    new_ire.remap.avail = 0;
-    new_ire.remap.res_1 = 0;
-    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
-                            MSI_DATA_VECTOR_MASK;
-    new_ire.remap.res_2 = 0;
-    if ( x2apic_enabled )
-        new_ire.remap.dst = msg->dest32;
+    if ( !pi_desc )
+    {
+        new_ire.remap.dm = msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT;
+        new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
+        new_ire.remap.dlm = msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT;
+        /* Hardware requires RH = 1 for lowest priority delivery mode */
+        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+        new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
+                                MSI_DATA_VECTOR_MASK;
+        if ( x2apic_enabled )
+            new_ire.remap.dst = msg->dest32;
+        else
+            new_ire.remap.dst =
+                MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK) << 8;
+        new_ire.remap.p = 1;
+    }
     else
-        new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
-                             & 0xff) << 8;
+    {
+        new_ire.post.im = 1;
+        new_ire.post.vector = msi_desc->gvec;
+        new_ire.post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
+        new_ire.post.pda_h = virt_to_maddr(pi_desc) >> 32;
+        new_ire.post.p = 1;
+    }
 
     if ( pdev )
         set_msi_source_id(pdev, &new_ire);
     else
         set_hpet_source_id(msi_desc->hpet_id, &new_ire);
-    new_ire.remap.res_3 = 0;
-    new_ire.remap.res_4 = 0;
-    new_ire.remap.p = 1;    /* finally, set present bit */
 
     /* now construct new MSI/MSI-X rte entry */
     remap_rte = (struct msi_msg_remap_entry *)msg;
@@ -902,61 +904,16 @@ void iommu_disable_x2apic_IR(void)
         disable_qinval(drhd->iommu);
 }
 
-static void setup_posted_irte(
-    struct iremap_entry *new_ire, const struct iremap_entry *old_ire,
-    const struct pi_desc *pi_desc, const uint8_t gvec)
-{
-    memset(new_ire, 0, sizeof(*new_ire));
-
-    /*
-     * 'im' filed decides whether the irte is in posted format (with value 1)
-     * or remapped format (with value 0), if the old irte is in remapped format,
-     * we copy things from remapped part in 'struct iremap_entry', otherwise,
-     * we copy from posted part.
-     */
-    if ( !old_ire->remap.im )
-    {
-        new_ire->post.p = old_ire->remap.p;
-        new_ire->post.fpd = old_ire->remap.fpd;
-        new_ire->post.sid = old_ire->remap.sid;
-        new_ire->post.sq = old_ire->remap.sq;
-        new_ire->post.svt = old_ire->remap.svt;
-    }
-    else
-    {
-        new_ire->post.p = old_ire->post.p;
-        new_ire->post.fpd = old_ire->post.fpd;
-        new_ire->post.sid = old_ire->post.sid;
-        new_ire->post.sq = old_ire->post.sq;
-        new_ire->post.svt = old_ire->post.svt;
-        new_ire->post.urg = old_ire->post.urg;
-    }
-
-    new_ire->post.im = 1;
-    new_ire->post.vector = gvec;
-    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
-    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
-}
-
 /*
  * This function is used to update the IRTE for posted-interrupt
  * when guest changes MSI/MSI-X information.
  */
-int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
+int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq,
     const uint8_t gvec)
 {
     struct irq_desc *desc;
-    const struct msi_desc *msi_desc;
-    int remap_index;
-    int rc = 0;
-    const struct pci_dev *pci_dev;
-    const struct acpi_drhd_unit *drhd;
-    struct iommu *iommu;
-    struct ir_ctrl *ir_ctrl;
-    struct iremap_entry *iremap_entries = NULL, *p = NULL;
-    struct iremap_entry new_ire, old_ire;
-    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
-    __uint128_t ret;
+    struct msi_desc *msi_desc;
+    int rc;
 
     desc = pirq_spin_lock_irq_desc(pirq, NULL);
     if ( !desc )
@@ -968,59 +925,13 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
         rc = -ENODEV;
         goto unlock_out;
     }
-
-    pci_dev = msi_desc->dev;
-    if ( !pci_dev )
-    {
-        rc = -ENODEV;
-        goto unlock_out;
-    }
-
-    remap_index = msi_desc->remap_index;
+    msi_desc->pi_desc = pi_desc;
+    msi_desc->gvec = gvec;
 
     spin_unlock_irq(&desc->lock);
 
     ASSERT(pcidevs_locked());
-
-    /*
-     * FIXME: For performance reasons we should store the 'iommu' pointer in
-     * 'struct msi_desc' in some other place, so we don't need to waste
-     * time searching it here.
-     */
-    drhd = acpi_find_matched_drhd_unit(pci_dev);
-    if ( !drhd )
-        return -ENODEV;
-
-    iommu = drhd->iommu;
-    ir_ctrl = iommu_ir_ctrl(iommu);
-    if ( !ir_ctrl )
-        return -ENODEV;
-
-    spin_lock_irq(&ir_ctrl->iremap_lock);
-
-    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
-
-    old_ire = *p;
-
-    /* Setup/Update interrupt remapping table entry. */
-    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
-    ret = cmpxchg16b(p, &old_ire, &new_ire);
-
-    /*
-     * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
-     * and the hardware cannot update the IRTE behind us, so the return value
-     * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
-     */
-    ASSERT(ret == old_ire.val);
-
-    iommu_flush_cache_entry(p, sizeof(*p));
-    iommu_flush_iec_index(iommu, 0, remap_index);
-
-    unmap_vtd_domain_page(iremap_entries);
-
-    spin_unlock_irq(&ir_ctrl->iremap_lock);
-
-    return 0;
+    return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
 
  unlock_out:
     spin_unlock_irq(&desc->lock);
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index be95106..fb86f38 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -18,6 +18,7 @@
 #include <xen/list.h>
 #include <xen/spinlock.h>
 #include <asm/processor.h>
+#include <asm/hvm/vmx/vmcs.h>
 
 #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
 #define MAX_IOMMUS 32
@@ -92,7 +93,8 @@ bool_t iommu_supports_eim(void);
 int iommu_enable_x2apic_IR(void);
 void iommu_disable_x2apic_IR(void);
 
-int pi_update_irte(const struct vcpu *v, const struct pirq *pirq, const uint8_t gvec);
+int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq,
+                   const uint8_t gvec);
 
 #endif /* !__ARCH_X86_IOMMU_H__ */
 /*
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 9c02945..bf243f8 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -4,6 +4,7 @@
 #include <xen/cpumask.h>
 #include <xen/pci.h>
 #include <asm/byteorder.h>
+#include <asm/hvm/vmx/vmcs.h>
 
 /*
  * Constants for Intel APIC based MSI messages.
@@ -102,6 +103,9 @@ struct msi_desc {
 		__u16	entry_nr;	/* specific enabled entry 	  */
 	} msi_attrib;
 
+	uint8_t gvec;			/* guest vector. valid when pi_desc isn't NULL */
+	const struct pi_desc *pi_desc;	/* pointer to posted descriptor */
+
 	struct list_head list;
 
 	union {
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v12 4/7] VT-d: Some cleanups
  2017-04-06  0:30 [PATCH v12 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (2 preceding siblings ...)
  2017-04-06  0:30 ` [PATCH v12 3/7] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
@ 2017-04-06  0:30 ` Chao Gao
  2017-04-06  0:30 ` [PATCH v12 5/7] VMX: Fixup PI descriptor when cpu is offline Chao Gao
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Chao Gao @ 2017-04-06  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

From: Feng Wu <feng.wu@intel.com>

Use type-safe structure assignment instead of memcpy()
Use sizeof(*iremap_entry).

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v10:
- Added several lines for patch [1/6] has been reworked.

v9:
- Delete several lines for patch [5/8] has been reworked.

v7: 
- Remove a useless cleanup

v6: 
- More descripion about the patch

 xen/drivers/passthrough/vtd/intremap.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 6314cbf..a18717b 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -183,8 +183,8 @@ static void free_remap_entry(struct iommu *iommu, int index)
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memset(iremap_entry, 0, sizeof(struct iremap_entry));
-    iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+    memset(iremap_entry, 0, sizeof(*iremap_entry));
+    iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
     unmap_vtd_domain_page(iremap_entries);
@@ -310,7 +310,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
+    new_ire = *iremap_entry;
 
     if ( rte_upper )
     {
@@ -353,8 +353,8 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
         remap_rte->format = 1;    /* indicate remap format */
     }
 
-    memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
-    iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+    *iremap_entry = new_ire;
+    iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
     unmap_vtd_domain_page(iremap_entries);
@@ -639,8 +639,8 @@ static int msi_msg_to_remap_entry(
     remap_rte->address_hi = 0;
     remap_rte->data = index - i;
 
-    memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
-    iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+    *iremap_entry = new_ire;
+    iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
     unmap_vtd_domain_page(iremap_entries);
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v12 5/7] VMX: Fixup PI descriptor when cpu is offline
  2017-04-06  0:30 [PATCH v12 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (3 preceding siblings ...)
  2017-04-06  0:30 ` [PATCH v12 4/7] VT-d: Some cleanups Chao Gao
@ 2017-04-06  0:30 ` Chao Gao
  2017-04-06  0:30 ` [PATCH v12 6/7] VT-d: introduce update_irte to update irte safely Chao Gao
  2017-04-06  0:30 ` [PATCH v12 7/7] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI Chao Gao
  6 siblings, 0 replies; 22+ messages in thread
From: Chao Gao @ 2017-04-06  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

From: Feng Wu <feng.wu@intel.com>

When cpu is offline, we need to move all the vcpus in its blocking
list to another online cpu, this patch handles it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c       |  1 +
 xen/arch/x86/hvm/vmx/vmx.c        | 70 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
 3 files changed, 72 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 934674c..99c77b9 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -591,6 +591,7 @@ void vmx_cpu_dead(unsigned int cpu)
     vmx_free_vmcs(per_cpu(vmxon_region, cpu));
     per_cpu(vmxon_region, cpu) = 0;
     nvmx_cpu_dead(cpu);
+    vmx_pi_desc_fixup(cpu);
 }
 
 int vmx_cpu_up(void)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d201956..25f9ec9 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -199,6 +199,76 @@ static void vmx_pi_do_resume(struct vcpu *v)
     vmx_pi_unblock_vcpu(v);
 }
 
+void vmx_pi_desc_fixup(unsigned int cpu)
+{
+    unsigned int new_cpu, dest;
+    unsigned long flags;
+    struct arch_vmx_struct *vmx, *tmp;
+    spinlock_t *new_lock, *old_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
+    struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
+
+    if ( !iommu_intpost )
+        return;
+
+    /*
+     * We are in the context of CPU_DEAD or CPU_UP_CANCELED notification,
+     * and it is impossible for a second CPU go down in parallel. So we
+     * can safely acquire the old cpu's lock and then acquire the new_cpu's
+     * lock after that.
+     */
+    spin_lock_irqsave(old_lock, flags);
+
+    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
+    {
+        /*
+         * Suppress notification or we may miss an interrupt when the
+         * target cpu is dying.
+         */
+        pi_set_sn(&vmx->pi_desc);
+
+        /*
+         * Check whether a notification is pending before doing the
+         * movement, if that is the case we need to wake up it directly
+         * other than moving it to the new cpu's list.
+         */
+        if ( pi_test_on(&vmx->pi_desc) )
+        {
+            list_del(&vmx->pi_blocking.list);
+            vmx->pi_blocking.lock = NULL;
+            vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
+        }
+        else
+        {
+            /*
+             * We need to find an online cpu as the NDST of the PI descriptor, it
+             * doesn't matter whether it is within the cpupool of the domain or
+             * not. As long as it is online, the vCPU will be woken up once the
+             * notification event arrives.
+             */
+            new_cpu = cpumask_any(&cpu_online_map);
+            new_lock = &per_cpu(vmx_pi_blocking, new_cpu).lock;
+
+            spin_lock(new_lock);
+
+            ASSERT(vmx->pi_blocking.lock == old_lock);
+
+            dest = cpu_physical_id(new_cpu);
+            write_atomic(&vmx->pi_desc.ndst,
+                         x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
+
+            list_move(&vmx->pi_blocking.list,
+                      &per_cpu(vmx_pi_blocking, new_cpu).list);
+            vmx->pi_blocking.lock = new_lock;
+
+            spin_unlock(new_lock);
+        }
+
+        pi_clear_sn(&vmx->pi_desc);
+    }
+
+    spin_unlock_irqrestore(old_lock, flags);
+}
+
 /*
  * To handle posted interrupts correctly, we need to set the following
  * state:
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 2b781ab..5ead57c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -597,6 +597,7 @@ void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
 void vmx_pi_per_cpu_init(unsigned int cpu);
+void vmx_pi_desc_fixup(unsigned int cpu);
 
 void vmx_pi_hooks_assign(struct domain *d);
 void vmx_pi_hooks_deassign(struct domain *d);
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v12 6/7] VT-d: introduce update_irte to update irte safely
  2017-04-06  0:30 [PATCH v12 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (4 preceding siblings ...)
  2017-04-06  0:30 ` [PATCH v12 5/7] VMX: Fixup PI descriptor when cpu is offline Chao Gao
@ 2017-04-06  0:30 ` Chao Gao
  2017-04-07  7:50   ` Tian, Kevin
  2017-04-07 12:27   ` Jan Beulich
  2017-04-06  0:30 ` [PATCH v12 7/7] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI Chao Gao
  6 siblings, 2 replies; 22+ messages in thread
From: Chao Gao @ 2017-04-06  0:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

We used structure assignment to update irte which was non-atomic when the
whole IRTE was to be updated. It is unsafe when a interrupt happened during
update. Furthermore, no bug or warning would be reported when this happened.

This patch introduces two variants, atomic and non-atomic, to update irte.
For initialization and release case, the non-atomic variant will be used. for
other cases (such as reprogramming to set irq affinity), the atomic variant
will be used. If the caller requests a atomic update but we can't meet it, we
raise a bug.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v12:
- use write_atomic in update_irte
- add description to explain why no one won't change the IRTE entry which is
using in update_irte(). Also add assertion to verify it.
- rename the new flag in msi_desc to irte_initialized 
- remove two helper function update_irte_{atomic/nonatomic}

v11:
- Add two variant function to update IRTE. Call the non-atomic one for init
and clear operations. Call the atomic one for other cases.
- Add a new field to indicate the remap_entry associated with msi_desc is
initialized or not.

v10:
- rename copy_irte_to_irt to update_irte
- remove copy_from_to_irt
- change commmit message and add some comments to illustrate on which
condition update_irte() is safe.

 xen/arch/x86/msi.c                     |  1 +
 xen/drivers/passthrough/vtd/intremap.c | 61 +++++++++++++++++++++++++++++++---
 xen/include/asm-x86/msi.h              |  1 +
 3 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 3374cd4..d98f400 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -579,6 +579,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int nr)
         entry[nr].irq = -1;
         entry[nr].remap_index = -1;
         entry[nr].pi_desc = NULL;
+        entry[nr].irte_initialized = false;
     }
 
     return entry;
diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index a18717b..a7bdbe4 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -169,10 +169,55 @@ bool_t __init iommu_supports_eim(void)
     return 1;
 }
 
+/*
+ * Assume iremap_lock has been acquired. It is to make sure software will not
+ * change the same IRTE behind us. With this assumption, if only high qword or
+ * low qword in IRTE is to be updated, this function's atomic variant can
+ * present an atomic update to VT-d hardware even when cmpxchg16b
+ * instruction is not supported.
+ */
+static void update_irte(struct iommu *iommu, struct iremap_entry *entry,
+                        const struct iremap_entry *new_ire, bool atomic)
+{
+    ASSERT(spin_is_locked(&iommu_ir_ctrl(iommu)->iremap_lock));
+
+    if ( cpu_has_cx16 )
+    {
+        __uint128_t ret;
+        struct iremap_entry old_ire;
+
+        old_ire = *entry;
+        ret = cmpxchg16b(entry, &old_ire, new_ire);
+
+        /*
+         * In the above, we use cmpxchg16 to atomically update the 128-bit
+         * IRTE, and the hardware cannot update the IRTE behind us, so
+         * the return value of cmpxchg16 should be the same as old_ire.
+         * This ASSERT validate it.
+         */
+        ASSERT(ret == old_ire.val);
+    }
+    else
+    {
+        /*
+         * If the caller requests a atomic update but we can't meet it, 
+         * a bug will be raised.
+         */
+        if ( entry->lo == new_ire->lo )
+            write_atomic(&entry->hi, new_ire->hi);
+        else if ( entry->hi == new_ire->hi )
+            write_atomic(&entry->lo, new_ire->lo);
+        else if ( !atomic )
+            *entry = *new_ire;
+        else
+            BUG();
+    }
+}
+
 /* Mark specified intr remap entry as free */
 static void free_remap_entry(struct iommu *iommu, int index)
 {
-    struct iremap_entry *iremap_entry = NULL, *iremap_entries;
+    struct iremap_entry *iremap_entry = NULL, *iremap_entries, new_ire = { };
     struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
 
     if ( index < 0 || index > IREMAP_ENTRY_NR - 1 )
@@ -183,7 +228,7 @@ static void free_remap_entry(struct iommu *iommu, int index)
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memset(iremap_entry, 0, sizeof(*iremap_entry));
+    update_irte(iommu, iremap_entry, &new_ire, false);
     iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
@@ -286,6 +331,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
     int index;
     unsigned long flags;
     struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
+    bool init = false;
 
     remap_rte = (struct IO_APIC_route_remap_entry *) old_rte;
     spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
@@ -296,6 +342,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
         index = alloc_remap_entry(iommu, 1);
         if ( index < IREMAP_ENTRY_NR )
             apic_pin_2_ir_idx[apic][ioapic_pin] = index;
+        init = true;
     }
 
     if ( index > IREMAP_ENTRY_NR - 1 )
@@ -353,7 +400,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
         remap_rte->format = 1;    /* indicate remap format */
     }
 
-    *iremap_entry = new_ire;
+    update_irte(iommu, iremap_entry, &new_ire, !init);
     iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
@@ -567,7 +614,10 @@ static int msi_msg_to_remap_entry(
     {
         /* Free specified unused IRTEs */
         for ( i = 0; i < nr; ++i )
+        {
             free_remap_entry(iommu, msi_desc->remap_index + i);
+            msi_desc[i].irte_initialized = false;
+        }
         spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
         return 0;
     }
@@ -639,7 +689,10 @@ static int msi_msg_to_remap_entry(
     remap_rte->address_hi = 0;
     remap_rte->data = index - i;
 
-    *iremap_entry = new_ire;
+    update_irte(iommu, iremap_entry, &new_ire, msi_desc->irte_initialized);
+    if ( !msi_desc->irte_initialized )
+        msi_desc->irte_initialized = true;
+
     iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index bf243f8..a5de6a1 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -103,6 +103,7 @@ struct msi_desc {
 		__u16	entry_nr;	/* specific enabled entry 	  */
 	} msi_attrib;
 
+	bool irte_initialized;
 	uint8_t gvec;			/* guest vector. valid when pi_desc isn't NULL */
 	const struct pi_desc *pi_desc;	/* pointer to posted descriptor */
 
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v12 7/7] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI
  2017-04-06  0:30 [PATCH v12 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (5 preceding siblings ...)
  2017-04-06  0:30 ` [PATCH v12 6/7] VT-d: introduce update_irte to update irte safely Chao Gao
@ 2017-04-06  0:30 ` Chao Gao
  2017-04-07 12:54   ` Jan Beulich
  6 siblings, 1 reply; 22+ messages in thread
From: Chao Gao @ 2017-04-06  0:30 UTC (permalink / raw)
  To: xen-devel; +Cc: Kevin Tian, Jan Beulich, Chao Gao

The current logic of using VT-d pi is when guest configurates the pirq's
destination vcpu to a single vcpu, the according IRTE is updated to
posted format. If the destination of the pirq is multiple vcpus, we will
stay in posted format. Obviously, we should fall back to remapping interrupt
when guest wrongly configurate destination of pirq or makes it have
multi-destination vcpus.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v11:
- move the code (one line) that allow the parameter 'vcpu' of pi_update_irte()
can be NULL to Patch [2/6].

v10:
- Newly added

 xen/drivers/passthrough/io.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 3e0a10e..3503fad 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -413,15 +413,8 @@ int pt_irq_create_bind(
 
         /* Use interrupt posting if it is supported. */
         if ( iommu_intpost )
-        {
-            if ( vcpu )
-                pi_update_irte(&vcpu->arch.hvm_vmx.pi_desc, info,
-                               pirq_dpci->gmsi.gvec);
-            else
-                dprintk(XENLOG_G_INFO,
-                        "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
-                        vcpu, pirq_dpci->gmsi.gvec);
-        }
+            pi_update_irte(&vcpu->arch.hvm_vmx.pi_desc, info,
+                           pirq_dpci->gmsi.gvec);
 
         break;
     }
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 2/7] x86/hvm: make io.h self-contained
  2017-04-06  0:30 ` [PATCH v12 2/7] x86/hvm: make io.h self-contained Chao Gao
@ 2017-04-06 12:10   ` Paul Durrant
  0 siblings, 0 replies; 22+ messages in thread
From: Paul Durrant @ 2017-04-06 12:10 UTC (permalink / raw)
  To: 'Chao Gao', xen-devel; +Cc: Andrew Cooper, Jan Beulich

> -----Original Message-----
> From: Chao Gao [mailto:chao.gao@intel.com]
> Sent: 05 April 2017 20:30
> To: xen-devel@lists.xen.org
> Cc: Chao Gao <chao.gao@intel.com>; Paul Durrant
> <Paul.Durrant@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper <Andrew.Cooper3@citrix.com>
> Subject: [PATCH v12 2/7] x86/hvm: make io.h self-contained
> 
> io.h uses structure npfec without including the file xen/mm.h where the
> structure is defined.
> 

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>  xen/include/asm-x86/hvm/io.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> index d6801c1..2349f58 100644
> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -19,6 +19,7 @@
>  #ifndef __ASM_X86_HVM_IO_H__
>  #define __ASM_X86_HVM_IO_H__
> 
> +#include <xen/mm.h>
>  #include <asm/hvm/vpic.h>
>  #include <asm/hvm/vioapic.h>
>  #include <public/hvm/ioreq.h>
> --
> 1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-04-07 10:38   ` Jan Beulich
@ 2017-04-07  4:07     ` Chao Gao
  2017-04-07 11:50       ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Chao Gao @ 2017-04-07  4:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Kevin Tian, xen-devel

Cc: kevin

On Fri, Apr 07, 2017 at 04:38:00AM -0600, Jan Beulich wrote:
>>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
>>      struct vcpu *v = arg;
>>  
>>      if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>> +         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
>> +         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
>> +         (!pirq_dpci->gmsi.posted) &&
>>           (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
>>      {
>
>I don't think I've seen you address Kevin's comment on this for v11,
>and like Kevin I can't immediately see why the above addition would
>be correct. Do you perhaps mean
>
>    if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
>         (!pirq_dpci->gmsi.posted ||
>          <whatever is appropriate here, if anything>) &&
>         (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )

Sorry to Kevin. And thanks to point it out. 
But I thought we had discussed this in https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg04383.html. I did think you agreed with me.
gmsi is invalid when pirq_dpci is not GUEST_MSI, is there something I have
ignored?

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 7/7] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI
  2017-04-07 12:54   ` Jan Beulich
@ 2017-04-07  6:41     ` Chao Gao
  0 siblings, 0 replies; 22+ messages in thread
From: Chao Gao @ 2017-04-07  6:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, xen-devel

On Fri, Apr 07, 2017 at 06:54:59AM -0600, Jan Beulich wrote:
>>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
>> The current logic of using VT-d pi is when guest configurates the pirq's
>> destination vcpu to a single vcpu, the according IRTE is updated to
>> posted format. If the destination of the pirq is multiple vcpus, we will
>> stay in posted format. Obviously, we should fall back to remapping interrupt
>> when guest wrongly configurate destination of pirq or makes it have
>> multi-destination vcpus.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> ---
>> v11:
>> - move the code (one line) that allow the parameter 'vcpu' of pi_update_irte()
>> can be NULL to Patch [2/6].
>
>There was a change in v12 too, and I'm afraid it's buggy:

Indeed.

>
>> --- a/xen/drivers/passthrough/io.c
>> +++ b/xen/drivers/passthrough/io.c
>> @@ -413,15 +413,8 @@ int pt_irq_create_bind(
>>  
>>          /* Use interrupt posting if it is supported. */
>>          if ( iommu_intpost )
>> -        {
>> -            if ( vcpu )
>> -                pi_update_irte(&vcpu->arch.hvm_vmx.pi_desc, info,
>> -                               pirq_dpci->gmsi.gvec);
>> -            else
>> -                dprintk(XENLOG_G_INFO,
>> -                        "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
>> -                        vcpu, pirq_dpci->gmsi.gvec);
>> -        }
>> +            pi_update_irte(&vcpu->arch.hvm_vmx.pi_desc, info,
>> +                           pirq_dpci->gmsi.gvec);
>
>pi_update_irte() tolerates a NULL first argument, but if vcpu is NULL
>you don't pass NULL here. I'll fix this up while committing.
>

Thanks for your kind help.
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-04-07 11:50       ` Jan Beulich
@ 2017-04-07  7:23         ` Chao Gao
  2017-04-07 14:48           ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Chao Gao @ 2017-04-07  7:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Kevin Tian, xen-devel

On Fri, Apr 07, 2017 at 05:50:36AM -0600, Jan Beulich wrote:
>>>> On 07.04.17 at 06:07, <chao.gao@intel.com> wrote:
>> Cc: kevin
>> 
>> On Fri, Apr 07, 2017 at 04:38:00AM -0600, Jan Beulich wrote:
>>>>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>> @@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct 
>> hvm_pirq_dpci *pirq_dpci,
>>>>      struct vcpu *v = arg;
>>>>  
>>>>      if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>>>> +         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
>>>> +         /* Needn't migrate pirq if this pirq is delivered to guest 
>> directly.*/
>>>> +         (!pirq_dpci->gmsi.posted) &&
>>>>           (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
>>>>      {
>>>
>>>I don't think I've seen you address Kevin's comment on this for v11,
>>>and like Kevin I can't immediately see why the above addition would
>>>be correct. Do you perhaps mean
>>>
>>>    if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>>>         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
>>>         (!pirq_dpci->gmsi.posted ||
>>>          <whatever is appropriate here, if anything>) &&
>>>         (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
>> 
>> Sorry to Kevin. And thanks to point it out. 
>> But I thought we had discussed this in 
>> https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg04383.html. I 
>> did think you agreed with me.
>> gmsi is invalid when pirq_dpci is not GUEST_MSI, is there something I have
>> ignored?
>
>You've been talking about GUEST_PCI there, which I did (and do)
>agree we can't handle here. So for the purposes of your series,
>simply adding the gmsi.posted check would be the right thing imo.
>I don't think I see anything wrong with the ->gmsi accesses here:
>The GUEST_PCI code simply doesn't set them, so dest_vcpu_id
>will still be -1 (from pt_pirq_init()). So I don't see any bug being
>fixed here with the extra other check you add. If you agree, I
>can take that line and the commit message sentence out while
>committing.

Ok. I admit I said it's bug is wrong. feel free to do what you want.

Thanks
Chao

>
>Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 3/7] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-04-06  0:30 ` [PATCH v12 3/7] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
@ 2017-04-07  7:43   ` Tian, Kevin
  2017-04-07 12:17   ` Jan Beulich
  1 sibling, 0 replies; 22+ messages in thread
From: Tian, Kevin @ 2017-04-07  7:43 UTC (permalink / raw)
  To: Gao, Chao, xen-devel
  Cc: Feng Wu, Nakajima, Jun, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich

> From: Gao, Chao
> Sent: Thursday, April 6, 2017 8:30 AM
> 
> msi_msg_to_remap_entry() is buggy when the live IRTE is in posted format. It
> wrongly inherits the 'im' field meaning the IRTE is in posted format but
> updates all the other fields to remapping format.
> 
> There are also two situations that lead to the above issue. One is some
> callers
> really want to change the IRTE to remapped format. The other is some callers
> only want to update msi message (e.g. set msi affinity) for they don't aware
> that this msi is binded with a guest interrupt. We should suppress update
> in the second situation. To distinguish them, straightforwardly, we can let
> caller specify which format of IRTE they want update to. It isn't feasible for
> making all callers be aware of the binding with guest interrupt will cause a
> far more complicated change (including the interfaces exposed to IOAPIC and
> MSI). Also some callings happen in interrupt context where we can't acquire
> d->event_lock to read struct hvm_pirq_dpci.
> 
> This patch introduces two new fields in msi_desc to track binding with a
> guest
> interrupt such that msi_msg_to_remap_entry() can get the binding and
> update
> IRTE accordingly. After that change, pi_update_irte() can utilize
> msi_msg_to_remap_entry() to update IRTE to posted format.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 6/7] VT-d: introduce update_irte to update irte safely
  2017-04-06  0:30 ` [PATCH v12 6/7] VT-d: introduce update_irte to update irte safely Chao Gao
@ 2017-04-07  7:50   ` Tian, Kevin
  2017-04-07 12:27   ` Jan Beulich
  1 sibling, 0 replies; 22+ messages in thread
From: Tian, Kevin @ 2017-04-07  7:50 UTC (permalink / raw)
  To: Gao, Chao, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Dario Faggioli, Jan Beulich

> From: Gao, Chao
> Sent: Thursday, April 6, 2017 8:30 AM
> 
> We used structure assignment to update irte which was non-atomic when
> the
> whole IRTE was to be updated. It is unsafe when a interrupt happened
> during
> update. Furthermore, no bug or warning would be reported when this
> happened.
> 
> This patch introduces two variants, atomic and non-atomic, to update irte.
> For initialization and release case, the non-atomic variant will be used. for
> other cases (such as reprogramming to set irq affinity), the atomic variant
> will be used. If the caller requests a atomic update but we can't meet it, we

'a'->'an'. I thought whether above should be updated since two helpers
are removed now. But possibly still OK based on purpose of 'atomic'
parameter in update_irte. :-)

> raise a bug.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v12:
> - use write_atomic in update_irte
> - add description to explain why no one won't change the IRTE entry which is
> using in update_irte(). Also add assertion to verify it.
> - rename the new flag in msi_desc to irte_initialized
> - remove two helper function update_irte_{atomic/nonatomic}
> 
> v11:
> - Add two variant function to update IRTE. Call the non-atomic one for init
> and clear operations. Call the atomic one for other cases.
> - Add a new field to indicate the remap_entry associated with msi_desc is
> initialized or not.
> 
> v10:
> - rename copy_irte_to_irt to update_irte
> - remove copy_from_to_irt
> - change commmit message and add some comments to illustrate on which
> condition update_irte() is safe.
> 
>  xen/arch/x86/msi.c                     |  1 +
>  xen/drivers/passthrough/vtd/intremap.c | 61
> +++++++++++++++++++++++++++++++---
>  xen/include/asm-x86/msi.h              |  1 +
>  3 files changed, 59 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> index 3374cd4..d98f400 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -579,6 +579,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int
> nr)
>          entry[nr].irq = -1;
>          entry[nr].remap_index = -1;
>          entry[nr].pi_desc = NULL;
> +        entry[nr].irte_initialized = false;
>      }
> 
>      return entry;
> diff --git a/xen/drivers/passthrough/vtd/intremap.c
> b/xen/drivers/passthrough/vtd/intremap.c
> index a18717b..a7bdbe4 100644
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -169,10 +169,55 @@ bool_t __init iommu_supports_eim(void)
>      return 1;
>  }
> 
> +/*
> + * Assume iremap_lock has been acquired. It is to make sure software will
> not
> + * change the same IRTE behind us. With this assumption, if only high
> qword or
> + * low qword in IRTE is to be updated, this function's atomic variant can
> + * present an atomic update to VT-d hardware even when cmpxchg16b
> + * instruction is not supported.
> + */
> +static void update_irte(struct iommu *iommu, struct iremap_entry *entry,
> +                        const struct iremap_entry *new_ire, bool atomic)
> +{
> +    ASSERT(spin_is_locked(&iommu_ir_ctrl(iommu)->iremap_lock));
> +
> +    if ( cpu_has_cx16 )
> +    {
> +        __uint128_t ret;
> +        struct iremap_entry old_ire;
> +
> +        old_ire = *entry;
> +        ret = cmpxchg16b(entry, &old_ire, new_ire);
> +
> +        /*
> +         * In the above, we use cmpxchg16 to atomically update the 128-bit
> +         * IRTE, and the hardware cannot update the IRTE behind us, so
> +         * the return value of cmpxchg16 should be the same as old_ire.
> +         * This ASSERT validate it.
> +         */
> +        ASSERT(ret == old_ire.val);
> +    }
> +    else
> +    {
> +        /*
> +         * If the caller requests a atomic update but we can't meet it,

'a'->'an'

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 6/7] VT-d: introduce update_irte to update irte safely
  2017-04-07 12:27   ` Jan Beulich
@ 2017-04-07  7:51     ` Chao Gao
  2017-04-07 14:59       ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Chao Gao @ 2017-04-07  7:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

On Fri, Apr 07, 2017 at 06:27:39AM -0600, Jan Beulich wrote:
>>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
>> --- a/xen/drivers/passthrough/vtd/intremap.c
>> +++ b/xen/drivers/passthrough/vtd/intremap.c
>> @@ -169,10 +169,55 @@ bool_t __init iommu_supports_eim(void)
>>      return 1;
>>  }
>>  
>> +/*
>> + * Assume iremap_lock has been acquired. It is to make sure software will not
>> + * change the same IRTE behind us. With this assumption, if only high qword or
>> + * low qword in IRTE is to be updated, this function's atomic variant can
>> + * present an atomic update to VT-d hardware even when cmpxchg16b
>> + * instruction is not supported.
>> + */
>> +static void update_irte(struct iommu *iommu, struct iremap_entry *entry,
>> +                        const struct iremap_entry *new_ire, bool atomic)
>> +{
>> +    ASSERT(spin_is_locked(&iommu_ir_ctrl(iommu)->iremap_lock));
>> +
>> +    if ( cpu_has_cx16 )
>> +    {
>> +        __uint128_t ret;
>> +        struct iremap_entry old_ire;
>> +
>> +        old_ire = *entry;
>> +        ret = cmpxchg16b(entry, &old_ire, new_ire);
>> +
>> +        /*
>> +         * In the above, we use cmpxchg16 to atomically update the 128-bit
>> +         * IRTE, and the hardware cannot update the IRTE behind us, so
>> +         * the return value of cmpxchg16 should be the same as old_ire.
>> +         * This ASSERT validate it.
>> +         */
>> +        ASSERT(ret == old_ire.val);
>> +    }
>> +    else
>> +    {
>> +        /*
>> +         * If the caller requests a atomic update but we can't meet it, 
>> +         * a bug will be raised.
>> +         */
>> +        if ( entry->lo == new_ire->lo )
>> +            write_atomic(&entry->hi, new_ire->hi);
>> +        else if ( entry->hi == new_ire->hi )
>> +            write_atomic(&entry->lo, new_ire->lo);
>> +        else if ( !atomic )
>> +            *entry = *new_ire;
>> +        else
>> +            BUG();
>> +    }
>
>Sadly the comment still uses the word atomic, and there's still no
>mention of whether (and if so, how) the hardware may update an
>IRTE behind your back. But since Kevin gave his R-b, I guess I'll
>have to give up on this.

To make it clearer, the comment you mentioned is the comment in the else()
branch or the comment before this function (or both)?  I will fix it
in a new patch.

>
>> @@ -639,7 +689,10 @@ static int msi_msg_to_remap_entry(
>>      remap_rte->address_hi = 0;
>>      remap_rte->data = index - i;
>>  
>> -    *iremap_entry = new_ire;
>> +    update_irte(iommu, iremap_entry, &new_ire, msi_desc->irte_initialized);
>> +    if ( !msi_desc->irte_initialized )
>> +        msi_desc->irte_initialized = true;
>
>I don't see the point of the conditional, and I guess I'll take the
>liberty to remove it, should I end up committing this patch.

Yes, please.

Thanks
Chao

>
>x86 parts
>Acked-by: Jan Beulich <jbeulich@suse.com>
>
>Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-04-06  0:30 ` [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
@ 2017-04-07 10:38   ` Jan Beulich
  2017-04-07  4:07     ` Chao Gao
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2017-04-07 10:38 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, xen-devel

>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
>      struct vcpu *v = arg;
>  
>      if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
> +         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
> +         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
> +         (!pirq_dpci->gmsi.posted) &&
>           (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
>      {

I don't think I've seen you address Kevin's comment on this for v11,
and like Kevin I can't immediately see why the above addition would
be correct. Do you perhaps mean

    if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
         (!pirq_dpci->gmsi.posted ||
          <whatever is appropriate here, if anything>) &&
         (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-04-07  4:07     ` Chao Gao
@ 2017-04-07 11:50       ` Jan Beulich
  2017-04-07  7:23         ` Chao Gao
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2017-04-07 11:50 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, Kevin Tian, xen-devel

>>> On 07.04.17 at 06:07, <chao.gao@intel.com> wrote:
> Cc: kevin
> 
> On Fri, Apr 07, 2017 at 04:38:00AM -0600, Jan Beulich wrote:
>>>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct 
> hvm_pirq_dpci *pirq_dpci,
>>>      struct vcpu *v = arg;
>>>  
>>>      if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>>> +         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
>>> +         /* Needn't migrate pirq if this pirq is delivered to guest 
> directly.*/
>>> +         (!pirq_dpci->gmsi.posted) &&
>>>           (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
>>>      {
>>
>>I don't think I've seen you address Kevin's comment on this for v11,
>>and like Kevin I can't immediately see why the above addition would
>>be correct. Do you perhaps mean
>>
>>    if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>>         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
>>         (!pirq_dpci->gmsi.posted ||
>>          <whatever is appropriate here, if anything>) &&
>>         (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
> 
> Sorry to Kevin. And thanks to point it out. 
> But I thought we had discussed this in 
> https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg04383.html. I 
> did think you agreed with me.
> gmsi is invalid when pirq_dpci is not GUEST_MSI, is there something I have
> ignored?

You've been talking about GUEST_PCI there, which I did (and do)
agree we can't handle here. So for the purposes of your series,
simply adding the gmsi.posted check would be the right thing imo.
I don't think I see anything wrong with the ->gmsi accesses here:
The GUEST_PCI code simply doesn't set them, so dest_vcpu_id
will still be -1 (from pt_pirq_init()). So I don't see any bug being
fixed here with the extra other check you add. If you agree, I
can take that line and the commit message sentence out while
committing.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 3/7] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-04-06  0:30 ` [PATCH v12 3/7] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
  2017-04-07  7:43   ` Tian, Kevin
@ 2017-04-07 12:17   ` Jan Beulich
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2017-04-07 12:17 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, Feng Wu, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jun Nakajima

>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
> msi_msg_to_remap_entry() is buggy when the live IRTE is in posted format. It
> wrongly inherits the 'im' field meaning the IRTE is in posted format but
> updates all the other fields to remapping format.
> 
> There are also two situations that lead to the above issue. One is some callers
> really want to change the IRTE to remapped format. The other is some callers
> only want to update msi message (e.g. set msi affinity) for they don't aware
> that this msi is binded with a guest interrupt. We should suppress update
> in the second situation. To distinguish them, straightforwardly, we can let
> caller specify which format of IRTE they want update to. It isn't feasible for
> making all callers be aware of the binding with guest interrupt will cause a
> far more complicated change (including the interfaces exposed to IOAPIC and
> MSI). Also some callings happen in interrupt context where we can't acquire
> d->event_lock to read struct hvm_pirq_dpci.
> 
> This patch introduces two new fields in msi_desc to track binding with a guest
> interrupt such that msi_msg_to_remap_entry() can get the binding and update
> IRTE accordingly. After that change, pi_update_irte() can utilize
> msi_msg_to_remap_entry() to update IRTE to posted format.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 6/7] VT-d: introduce update_irte to update irte safely
  2017-04-06  0:30 ` [PATCH v12 6/7] VT-d: introduce update_irte to update irte safely Chao Gao
  2017-04-07  7:50   ` Tian, Kevin
@ 2017-04-07 12:27   ` Jan Beulich
  2017-04-07  7:51     ` Chao Gao
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2017-04-07 12:27 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -169,10 +169,55 @@ bool_t __init iommu_supports_eim(void)
>      return 1;
>  }
>  
> +/*
> + * Assume iremap_lock has been acquired. It is to make sure software will not
> + * change the same IRTE behind us. With this assumption, if only high qword or
> + * low qword in IRTE is to be updated, this function's atomic variant can
> + * present an atomic update to VT-d hardware even when cmpxchg16b
> + * instruction is not supported.
> + */
> +static void update_irte(struct iommu *iommu, struct iremap_entry *entry,
> +                        const struct iremap_entry *new_ire, bool atomic)
> +{
> +    ASSERT(spin_is_locked(&iommu_ir_ctrl(iommu)->iremap_lock));
> +
> +    if ( cpu_has_cx16 )
> +    {
> +        __uint128_t ret;
> +        struct iremap_entry old_ire;
> +
> +        old_ire = *entry;
> +        ret = cmpxchg16b(entry, &old_ire, new_ire);
> +
> +        /*
> +         * In the above, we use cmpxchg16 to atomically update the 128-bit
> +         * IRTE, and the hardware cannot update the IRTE behind us, so
> +         * the return value of cmpxchg16 should be the same as old_ire.
> +         * This ASSERT validate it.
> +         */
> +        ASSERT(ret == old_ire.val);
> +    }
> +    else
> +    {
> +        /*
> +         * If the caller requests a atomic update but we can't meet it, 
> +         * a bug will be raised.
> +         */
> +        if ( entry->lo == new_ire->lo )
> +            write_atomic(&entry->hi, new_ire->hi);
> +        else if ( entry->hi == new_ire->hi )
> +            write_atomic(&entry->lo, new_ire->lo);
> +        else if ( !atomic )
> +            *entry = *new_ire;
> +        else
> +            BUG();
> +    }

Sadly the comment still uses the word atomic, and there's still no
mention of whether (and if so, how) the hardware may update an
IRTE behind your back. But since Kevin gave his R-b, I guess I'll
have to give up on this.

> @@ -639,7 +689,10 @@ static int msi_msg_to_remap_entry(
>      remap_rte->address_hi = 0;
>      remap_rte->data = index - i;
>  
> -    *iremap_entry = new_ire;
> +    update_irte(iommu, iremap_entry, &new_ire, msi_desc->irte_initialized);
> +    if ( !msi_desc->irte_initialized )
> +        msi_desc->irte_initialized = true;

I don't see the point of the conditional, and I guess I'll take the
liberty to remove it, should I end up committing this patch.

x86 parts
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 7/7] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI
  2017-04-06  0:30 ` [PATCH v12 7/7] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI Chao Gao
@ 2017-04-07 12:54   ` Jan Beulich
  2017-04-07  6:41     ` Chao Gao
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2017-04-07 12:54 UTC (permalink / raw)
  To: Chao Gao; +Cc: Kevin Tian, xen-devel

>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
> The current logic of using VT-d pi is when guest configurates the pirq's
> destination vcpu to a single vcpu, the according IRTE is updated to
> posted format. If the destination of the pirq is multiple vcpus, we will
> stay in posted format. Obviously, we should fall back to remapping interrupt
> when guest wrongly configurate destination of pirq or makes it have
> multi-destination vcpus.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
> v11:
> - move the code (one line) that allow the parameter 'vcpu' of pi_update_irte()
> can be NULL to Patch [2/6].

There was a change in v12 too, and I'm afraid it's buggy:

> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -413,15 +413,8 @@ int pt_irq_create_bind(
>  
>          /* Use interrupt posting if it is supported. */
>          if ( iommu_intpost )
> -        {
> -            if ( vcpu )
> -                pi_update_irte(&vcpu->arch.hvm_vmx.pi_desc, info,
> -                               pirq_dpci->gmsi.gvec);
> -            else
> -                dprintk(XENLOG_G_INFO,
> -                        "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
> -                        vcpu, pirq_dpci->gmsi.gvec);
> -        }
> +            pi_update_irte(&vcpu->arch.hvm_vmx.pi_desc, info,
> +                           pirq_dpci->gmsi.gvec);

pi_update_irte() tolerates a NULL first argument, but if vcpu is NULL
you don't pass NULL here. I'll fix this up while committing.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI
  2017-04-07  7:23         ` Chao Gao
@ 2017-04-07 14:48           ` Jan Beulich
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2017-04-07 14:48 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, Kevin Tian, xen-devel

>>> On 07.04.17 at 09:23, <chao.gao@intel.com> wrote:
> On Fri, Apr 07, 2017 at 05:50:36AM -0600, Jan Beulich wrote:
>>>>> On 07.04.17 at 06:07, <chao.gao@intel.com> wrote:
>>> Cc: kevin
>>> 
>>> On Fri, Apr 07, 2017 at 04:38:00AM -0600, Jan Beulich wrote:
>>>>>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>> @@ -438,6 +438,9 @@ static int hvm_migrate_pirq(struct domain *d, struct 
>>> hvm_pirq_dpci *pirq_dpci,
>>>>>      struct vcpu *v = arg;
>>>>>  
>>>>>      if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>>>>> +         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
>>>>> +         /* Needn't migrate pirq if this pirq is delivered to guest 
>>> directly.*/
>>>>> +         (!pirq_dpci->gmsi.posted) &&
>>>>>           (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
>>>>>      {
>>>>
>>>>I don't think I've seen you address Kevin's comment on this for v11,
>>>>and like Kevin I can't immediately see why the above addition would
>>>>be correct. Do you perhaps mean
>>>>
>>>>    if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
>>>>         /* Needn't migrate pirq if this pirq is delivered to guest 
> directly.*/
>>>>         (!pirq_dpci->gmsi.posted ||
>>>>          <whatever is appropriate here, if anything>) &&
>>>>         (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
>>> 
>>> Sorry to Kevin. And thanks to point it out. 
>>> But I thought we had discussed this in 
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg04383.html. I 
>>> did think you agreed with me.
>>> gmsi is invalid when pirq_dpci is not GUEST_MSI, is there something I have
>>> ignored?
>>
>>You've been talking about GUEST_PCI there, which I did (and do)
>>agree we can't handle here. So for the purposes of your series,
>>simply adding the gmsi.posted check would be the right thing imo.
>>I don't think I see anything wrong with the ->gmsi accesses here:
>>The GUEST_PCI code simply doesn't set them, so dest_vcpu_id
>>will still be -1 (from pt_pirq_init()). So I don't see any bug being
>>fixed here with the extra other check you add. If you agree, I
>>can take that line and the commit message sentence out while
>>committing.
> 
> Ok. I admit I said it's bug is wrong. feel free to do what you want.

Well, looks like I forgot to adjust the commit message.
-ETOOMUCHSTUFFGOINGONTODAY.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v12 6/7] VT-d: introduce update_irte to update irte safely
  2017-04-07  7:51     ` Chao Gao
@ 2017-04-07 14:59       ` Jan Beulich
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2017-04-07 14:59 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

>>> On 07.04.17 at 09:51, <chao.gao@intel.com> wrote:
> On Fri, Apr 07, 2017 at 06:27:39AM -0600, Jan Beulich wrote:
>>>>> On 06.04.17 at 02:30, <chao.gao@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/intremap.c
>>> +++ b/xen/drivers/passthrough/vtd/intremap.c
>>> @@ -169,10 +169,55 @@ bool_t __init iommu_supports_eim(void)
>>>      return 1;
>>>  }
>>>  
>>> +/*
>>> + * Assume iremap_lock has been acquired. It is to make sure software will not
>>> + * change the same IRTE behind us. With this assumption, if only high qword or
>>> + * low qword in IRTE is to be updated, this function's atomic variant can
>>> + * present an atomic update to VT-d hardware even when cmpxchg16b
>>> + * instruction is not supported.
>>> + */
>>> +static void update_irte(struct iommu *iommu, struct iremap_entry *entry,
>>> +                        const struct iremap_entry *new_ire, bool atomic)
>>> +{
>>> +    ASSERT(spin_is_locked(&iommu_ir_ctrl(iommu)->iremap_lock));
>>> +
>>> +    if ( cpu_has_cx16 )
>>> +    {
>>> +        __uint128_t ret;
>>> +        struct iremap_entry old_ire;
>>> +
>>> +        old_ire = *entry;
>>> +        ret = cmpxchg16b(entry, &old_ire, new_ire);
>>> +
>>> +        /*
>>> +         * In the above, we use cmpxchg16 to atomically update the 128-bit
>>> +         * IRTE, and the hardware cannot update the IRTE behind us, so
>>> +         * the return value of cmpxchg16 should be the same as old_ire.
>>> +         * This ASSERT validate it.
>>> +         */
>>> +        ASSERT(ret == old_ire.val);
>>> +    }
>>> +    else
>>> +    {
>>> +        /*
>>> +         * If the caller requests a atomic update but we can't meet it, 
>>> +         * a bug will be raised.
>>> +         */
>>> +        if ( entry->lo == new_ire->lo )
>>> +            write_atomic(&entry->hi, new_ire->hi);
>>> +        else if ( entry->hi == new_ire->hi )
>>> +            write_atomic(&entry->lo, new_ire->lo);
>>> +        else if ( !atomic )
>>> +            *entry = *new_ire;
>>> +        else
>>> +            BUG();
>>> +    }
>>
>>Sadly the comment still uses the word atomic, and there's still no
>>mention of whether (and if so, how) the hardware may update an
>>IRTE behind your back. But since Kevin gave his R-b, I guess I'll
>>have to give up on this.
> 
> To make it clearer, the comment you mentioned is the comment in the else()
> branch or the comment before this function (or both)?  I will fix it
> in a new patch.

Specifically the one in the else branch.

>>> @@ -639,7 +689,10 @@ static int msi_msg_to_remap_entry(
>>>      remap_rte->address_hi = 0;
>>>      remap_rte->data = index - i;
>>>  
>>> -    *iremap_entry = new_ire;
>>> +    update_irte(iommu, iremap_entry, &new_ire, msi_desc->irte_initialized);
>>> +    if ( !msi_desc->irte_initialized )
>>> +        msi_desc->irte_initialized = true;
>>
>>I don't see the point of the conditional, and I guess I'll take the
>>liberty to remove it, should I end up committing this patch.
> 
> Yes, please.

And that's another thing I've already noticed I forgot.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-04-07 14:59 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-06  0:30 [PATCH v12 0/7] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
2017-04-06  0:30 ` [PATCH v12 1/7] passthrough: don't migrate pirq when it is delivered through VT-d PI Chao Gao
2017-04-07 10:38   ` Jan Beulich
2017-04-07  4:07     ` Chao Gao
2017-04-07 11:50       ` Jan Beulich
2017-04-07  7:23         ` Chao Gao
2017-04-07 14:48           ` Jan Beulich
2017-04-06  0:30 ` [PATCH v12 2/7] x86/hvm: make io.h self-contained Chao Gao
2017-04-06 12:10   ` Paul Durrant
2017-04-06  0:30 ` [PATCH v12 3/7] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
2017-04-07  7:43   ` Tian, Kevin
2017-04-07 12:17   ` Jan Beulich
2017-04-06  0:30 ` [PATCH v12 4/7] VT-d: Some cleanups Chao Gao
2017-04-06  0:30 ` [PATCH v12 5/7] VMX: Fixup PI descriptor when cpu is offline Chao Gao
2017-04-06  0:30 ` [PATCH v12 6/7] VT-d: introduce update_irte to update irte safely Chao Gao
2017-04-07  7:50   ` Tian, Kevin
2017-04-07 12:27   ` Jan Beulich
2017-04-07  7:51     ` Chao Gao
2017-04-07 14:59       ` Jan Beulich
2017-04-06  0:30 ` [PATCH v12 7/7] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI Chao Gao
2017-04-07 12:54   ` Jan Beulich
2017-04-07  6:41     ` Chao Gao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.