* [PATCH v10 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list
@ 2017-03-15  5:11 Chao Gao
  2017-03-15  5:11 ` [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
                   ` (5 more replies)
  0 siblings, 6 replies; 30+ messages in thread
From: Chao Gao @ 2017-03-15  5:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

The current VT-d PI related code may operate incorrectly in the
following scenarios:
1. When an IRTE is in posted format, we do not need to set the irq
affinity for it, since the destination of such interrupts is a
vCPU and the vCPU affinity is set during vCPU scheduling. Patch
[1/6] handles this. To make the logic clearer, it also
extracts the common part of pi_update_irte() and
msi_msg_to_remap_entry(), the two functions which update the IRTE.

2. Patch [2/6] is a cleanup patch.

3. When a pCPU is unplugged, there might be vCPUs on its blocking
list. Since the pCPU is offline, those vCPUs might never be woken
up again. Patch [3/6] addresses this.

4. The IRTE is updated through structure assignment or memcpy(), which
is not atomic and therefore unsafe. To resolve this, patch [4/6] uses
cmpxchg16b() if supported, or two 64-bit writes otherwise, to update
the IRTE (see the sketch after this list).

5. When VT-d PI is enabled, a pirq which is delivered through VT-d PI need
not be migrated during vCPU migration. Patch [5/6] handles this by introducing
a new flag to indicate that the pt-irq is delivered through VT-d PI.

6. The IRTE was not changed back to remapped format when a pt-irq was
configured to have multiple destination vCPUs. Patch [6/6] resolves this problem.
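
For reference, a condensed sketch of the IRTE update strategy used by
patch [4/6] (names follow that patch; struct iremap_entry is the 128-bit
VT-d IRTE, with 'lo' and 'hi' as its two 64-bit halves):

    static void update_irte(struct iremap_entry *entry,
                            const struct iremap_entry *new_ire)
    {
        if ( cpu_has_cx16 )
        {
            /* Atomically replace the whole 128-bit IRTE. */
            struct iremap_entry old_ire = *entry;
            __uint128_t ret = cmpxchg16b(entry, &old_ire, new_ire);

            /* Hardware cannot update the IRTE behind us. */
            ASSERT(ret == old_ire.val);
        }
        else
        {
            /*
             * Two plain 64-bit writes; safe only if just one half
             * actually changes, or the IRTE is known not to be in use.
             */
            entry->lo = new_ire->lo;
            entry->hi = new_ire->hi;
        }
    }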

Chao Gao (4):
  VT-d: Introduce new fields in msi_desc to track binding with guest
    interrupt
  VT-d: introduce update_irte to update irte safely
  passthrough/io: don't migrate pirq when it is delivered through VT-d PI
  passthrough/io: Fall back to remapping interrupt when we can't use
    VT-d PI

Feng Wu (2):
  VT-d: Some cleanups
  VMX: Fixup PI descriptor when cpu is offline

 xen/arch/x86/hvm/hvm.c                 |   3 +
 xen/arch/x86/hvm/vmx/vmcs.c            |   1 +
 xen/arch/x86/hvm/vmx/vmx.c             |  70 ++++++++++++
 xen/arch/x86/msi.c                     |   1 +
 xen/drivers/passthrough/io.c           |  22 ++--
 xen/drivers/passthrough/vtd/intremap.c | 196 ++++++++++++---------------------
 xen/include/asm-x86/hvm/vmx/vmx.h      |   1 +
 xen/include/asm-x86/msi.h              |   2 +
 xen/include/xen/hvm/irq.h              |   1 +
 9 files changed, 161 insertions(+), 136 deletions(-)

-- 
1.8.3.1



* [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-15  5:11 [PATCH v10 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
@ 2017-03-15  5:11 ` Chao Gao
  2017-03-15 16:41   ` Jan Beulich
  2017-03-22  5:59   ` Tian, Kevin
  2017-03-15  5:11 ` [PATCH v10 2/6] VT-d: Some cleanups Chao Gao
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 30+ messages in thread
From: Chao Gao @ 2017-03-15  5:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

Currently, msi_msg_to_remap_entry() is buggy when the live IRTE is in posted
format. The straightforward fix would be to let callers specify which IRTE
format they want to update to, but the current callers are not aware of the
binding with a guest interrupt, and making all of them aware of it would
require a far more intrusive change. Also, some calls happen in interrupt
context, where d->event_lock cannot be acquired to read struct hvm_pirq_dpci.

This patch instead introduces two new fields in msi_desc to track the binding
with a guest interrupt, so that msi_msg_to_remap_entry() can retrieve the
binding and update the IRTE accordingly. With that change, pi_update_irte()
can use msi_msg_to_remap_entry() to update the IRTE to posted format.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v10:
- Newly added.

 xen/arch/x86/msi.c                     |   1 +
 xen/drivers/passthrough/vtd/intremap.c | 148 +++++++--------------------------
 xen/include/asm-x86/msi.h              |   2 +
 3 files changed, 34 insertions(+), 117 deletions(-)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index a868007..3374cd4 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -578,6 +578,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int nr)
         entry[nr].dev = NULL;
         entry[nr].irq = -1;
         entry[nr].remap_index = -1;
+        entry[nr].pi_desc = NULL;
     }
 
     return entry;
diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index bfd468b..6202ece 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -552,11 +552,12 @@ static int msi_msg_to_remap_entry(
     struct msi_desc *msi_desc, struct msi_msg *msg)
 {
     struct iremap_entry *iremap_entry = NULL, *iremap_entries;
-    struct iremap_entry new_ire;
+    struct iremap_entry new_ire = {{0}};
     struct msi_msg_remap_entry *remap_rte;
     unsigned int index, i, nr = 1;
     unsigned long flags;
     struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
+    const struct pi_desc *pi_desc = msi_desc->pi_desc;
 
     if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
         nr = msi_desc->msi.nvec;
@@ -595,33 +596,35 @@ static int msi_msg_to_remap_entry(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
-
-    /* Set interrupt remapping table entry */
-    new_ire.remap.fpd = 0;
-    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
-    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
-    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
-    /* Hardware require RH = 1 for LPR delivery mode */
-    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
-    new_ire.remap.avail = 0;
-    new_ire.remap.res_1 = 0;
-    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
-                            MSI_DATA_VECTOR_MASK;
-    new_ire.remap.res_2 = 0;
-    if ( x2apic_enabled )
-        new_ire.remap.dst = msg->dest32;
+    if ( !pi_desc )
+    {
+        new_ire.remap.dm = msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT;
+        new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
+        new_ire.remap.dlm = msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT;
+        /* Hardware require RH = 1 for LPR delivery mode */
+        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+        new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
+                                MSI_DATA_VECTOR_MASK;
+        if ( x2apic_enabled )
+            new_ire.remap.dst = msg->dest32;
+        else
+            new_ire.remap.dst =
+                MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK) << 8;
+        new_ire.remap.p = 1;
+    }
     else
-        new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
-                             & 0xff) << 8;
+    {
+        new_ire.post.im = 1;
+        new_ire.post.vector = msi_desc->gvec;
+        new_ire.post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
+        new_ire.post.pda_h = virt_to_maddr(pi_desc) >> 32;
+        new_ire.post.p = 1;
+    }
 
     if ( pdev )
         set_msi_source_id(pdev, &new_ire);
     else
         set_hpet_source_id(msi_desc->hpet_id, &new_ire);
-    new_ire.remap.res_3 = 0;
-    new_ire.remap.res_4 = 0;
-    new_ire.remap.p = 1;    /* finally, set present bit */
 
     /* now construct new MSI/MSI-X rte entry */
     remap_rte = (struct msi_msg_remap_entry *)msg;
@@ -902,42 +905,6 @@ void iommu_disable_x2apic_IR(void)
         disable_qinval(drhd->iommu);
 }
 
-static void setup_posted_irte(
-    struct iremap_entry *new_ire, const struct iremap_entry *old_ire,
-    const struct pi_desc *pi_desc, const uint8_t gvec)
-{
-    memset(new_ire, 0, sizeof(*new_ire));
-
-    /*
-     * 'im' filed decides whether the irte is in posted format (with value 1)
-     * or remapped format (with value 0), if the old irte is in remapped format,
-     * we copy things from remapped part in 'struct iremap_entry', otherwise,
-     * we copy from posted part.
-     */
-    if ( !old_ire->remap.im )
-    {
-        new_ire->post.p = old_ire->remap.p;
-        new_ire->post.fpd = old_ire->remap.fpd;
-        new_ire->post.sid = old_ire->remap.sid;
-        new_ire->post.sq = old_ire->remap.sq;
-        new_ire->post.svt = old_ire->remap.svt;
-    }
-    else
-    {
-        new_ire->post.p = old_ire->post.p;
-        new_ire->post.fpd = old_ire->post.fpd;
-        new_ire->post.sid = old_ire->post.sid;
-        new_ire->post.sq = old_ire->post.sq;
-        new_ire->post.svt = old_ire->post.svt;
-        new_ire->post.urg = old_ire->post.urg;
-    }
-
-    new_ire->post.im = 1;
-    new_ire->post.vector = gvec;
-    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
-    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
-}
-
 /*
  * This function is used to update the IRTE for posted-interrupt
  * when guest changes MSI/MSI-X information.
@@ -946,17 +913,9 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
     const uint8_t gvec)
 {
     struct irq_desc *desc;
-    const struct msi_desc *msi_desc;
-    int remap_index;
-    int rc = 0;
-    const struct pci_dev *pci_dev;
-    const struct acpi_drhd_unit *drhd;
-    struct iommu *iommu;
-    struct ir_ctrl *ir_ctrl;
-    struct iremap_entry *iremap_entries = NULL, *p = NULL;
-    struct iremap_entry new_ire, old_ire;
+    struct msi_desc *msi_desc;
     const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
-    __uint128_t ret;
+    int rc;
 
     desc = pirq_spin_lock_irq_desc(pirq, NULL);
     if ( !desc )
@@ -968,59 +927,14 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
         rc = -ENODEV;
         goto unlock_out;
     }
-
-    pci_dev = msi_desc->dev;
-    if ( !pci_dev )
-    {
-        rc = -ENODEV;
-        goto unlock_out;
-    }
-
-    remap_index = msi_desc->remap_index;
+    msi_desc->pi_desc = pi_desc;
+    msi_desc->gvec = gvec;
 
     spin_unlock_irq(&desc->lock);
 
     ASSERT(pcidevs_locked());
-
-    /*
-     * FIXME: For performance reasons we should store the 'iommu' pointer in
-     * 'struct msi_desc' in some other place, so we don't need to waste
-     * time searching it here.
-     */
-    drhd = acpi_find_matched_drhd_unit(pci_dev);
-    if ( !drhd )
-        return -ENODEV;
-
-    iommu = drhd->iommu;
-    ir_ctrl = iommu_ir_ctrl(iommu);
-    if ( !ir_ctrl )
-        return -ENODEV;
-
-    spin_lock_irq(&ir_ctrl->iremap_lock);
-
-    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
-
-    old_ire = *p;
-
-    /* Setup/Update interrupt remapping table entry. */
-    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
-    ret = cmpxchg16b(p, &old_ire, &new_ire);
-
-    /*
-     * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
-     * and the hardware cannot update the IRTE behind us, so the return value
-     * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
-     */
-    ASSERT(ret == old_ire.val);
-
-    iommu_flush_cache_entry(p, sizeof(*p));
-    iommu_flush_iec_index(iommu, 0, remap_index);
-
-    unmap_vtd_domain_page(iremap_entries);
-
-    spin_unlock_irq(&ir_ctrl->iremap_lock);
-
-    return 0;
+    rc = iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
+    return rc;
 
  unlock_out:
     spin_unlock_irq(&desc->lock);
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 9c02945..3286692 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -118,6 +118,8 @@ struct msi_desc {
 	struct msi_msg msg;		/* Last set MSI message */
 
 	int remap_index;		/* index in interrupt remapping table */
+	const void *pi_desc;		/* PDA, indicates msi is delivered via VT-d PI */
+	uint8_t gvec;			/* guest vector. valid when pi_desc isn't NULL */
 };
 
 /*
-- 
1.8.3.1



* [PATCH v10 2/6] VT-d: Some cleanups
  2017-03-15  5:11 [PATCH v10 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
  2017-03-15  5:11 ` [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
@ 2017-03-15  5:11 ` Chao Gao
  2017-03-15  5:11 ` [PATCH v10 3/6] VMX: Fixup PI descriptor when cpu is offline Chao Gao
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Chao Gao @ 2017-03-15  5:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

From: Feng Wu <feng.wu@intel.com>

Use type-safe structure assignment instead of memcpy().
Use sizeof(*iremap_entry).

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v10:
- Added several lines since patch [1/6] has been reworked.

v9:
- Deleted several lines since patch [5/8] has been reworked.

v7:
- Removed a useless cleanup

v6:
- More description of the patch

 xen/drivers/passthrough/vtd/intremap.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 6202ece..7d4c7e1 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -183,8 +183,8 @@ static void free_remap_entry(struct iommu *iommu, int index)
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memset(iremap_entry, 0, sizeof(struct iremap_entry));
-    iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+    memset(iremap_entry, 0, sizeof(*iremap_entry));
+    iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
     unmap_vtd_domain_page(iremap_entries);
@@ -310,7 +310,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
+    new_ire = *iremap_entry;
 
     if ( rte_upper )
     {
@@ -353,8 +353,8 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
         remap_rte->format = 1;    /* indicate remap format */
     }
 
-    memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
-    iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+    *iremap_entry = new_ire;
+    iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
     unmap_vtd_domain_page(iremap_entries);
@@ -640,8 +640,8 @@ static int msi_msg_to_remap_entry(
     remap_rte->address_hi = 0;
     remap_rte->data = index - i;
 
-    memcpy(iremap_entry, &new_ire, sizeof(struct iremap_entry));
-    iommu_flush_cache_entry(iremap_entry, sizeof(struct iremap_entry));
+    *iremap_entry = new_ire;
+    iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
     unmap_vtd_domain_page(iremap_entries);
-- 
1.8.3.1



* [PATCH v10 3/6] VMX: Fixup PI descriptor when cpu is offline
  2017-03-15  5:11 [PATCH v10 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
  2017-03-15  5:11 ` [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
  2017-03-15  5:11 ` [PATCH v10 2/6] VT-d: Some cleanups Chao Gao
@ 2017-03-15  5:11 ` Chao Gao
  2017-03-15  5:11 ` [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely Chao Gao
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Chao Gao @ 2017-03-15  5:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

From: Feng Wu <feng.wu@intel.com>

When a CPU goes offline, all the vCPUs on its blocking list need to be
moved to another online CPU. This patch handles that.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v7: 
- Pass unsigned int to vmx_pi_desc_fixup()

v6: 
- Carefully suppress 'SN' to avoid missing a notification event while
moving the vCPU to the new list

v5: 
- Add some comments to explain why it doesn't cause a deadlock in the
ABBA deadlock scenario.

v4: 
- Remove the pointless check since we are in machine stop
context and no other cpus go down in parallel.

 xen/arch/x86/hvm/vmx/vmcs.c       |  1 +
 xen/arch/x86/hvm/vmx/vmx.c        | 70 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
 3 files changed, 72 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 0c1b711..b7f6a5e 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -591,6 +591,7 @@ void vmx_cpu_dead(unsigned int cpu)
     vmx_free_vmcs(per_cpu(vmxon_region, cpu));
     per_cpu(vmxon_region, cpu) = 0;
     nvmx_cpu_dead(cpu);
+    vmx_pi_desc_fixup(cpu);
 }
 
 int vmx_cpu_up(void)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 894d7d4..dee0463 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -199,6 +199,76 @@ static void vmx_pi_do_resume(struct vcpu *v)
     vmx_pi_unblock_vcpu(v);
 }
 
+void vmx_pi_desc_fixup(unsigned int cpu)
+{
+    unsigned int new_cpu, dest;
+    unsigned long flags;
+    struct arch_vmx_struct *vmx, *tmp;
+    spinlock_t *new_lock, *old_lock = &per_cpu(vmx_pi_blocking, cpu).lock;
+    struct list_head *blocked_vcpus = &per_cpu(vmx_pi_blocking, cpu).list;
+
+    if ( !iommu_intpost )
+        return;
+
+    /*
+     * We are in the context of CPU_DEAD or CPU_UP_CANCELED notification,
+     * and it is impossible for a second CPU go down in parallel. So we
+     * can safely acquire the old cpu's lock and then acquire the new_cpu's
+     * lock after that.
+     */
+    spin_lock_irqsave(old_lock, flags);
+
+    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocking.list)
+    {
+        /*
+         * Suppress notification or we may miss an interrupt when the
+         * target cpu is dying.
+         */
+        pi_set_sn(&vmx->pi_desc);
+
+        /*
+         * Check whether a notification is pending before doing the
+         * movement, if that is the case we need to wake up it directly
+         * other than moving it to the new cpu's list.
+         */
+        if ( pi_test_on(&vmx->pi_desc) )
+        {
+            list_del(&vmx->pi_blocking.list);
+            vmx->pi_blocking.lock = NULL;
+            vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
+        }
+        else
+        {
+            /*
+             * We need to find an online cpu as the NDST of the PI descriptor, it
+             * doesn't matter whether it is within the cpupool of the domain or
+             * not. As long as it is online, the vCPU will be woken up once the
+             * notification event arrives.
+             */
+            new_cpu = cpumask_any(&cpu_online_map);
+            new_lock = &per_cpu(vmx_pi_blocking, new_cpu).lock;
+
+            spin_lock(new_lock);
+
+            ASSERT(vmx->pi_blocking.lock == old_lock);
+
+            dest = cpu_physical_id(new_cpu);
+            write_atomic(&vmx->pi_desc.ndst,
+                         x2apic_enabled ? dest : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
+
+            list_move(&vmx->pi_blocking.list,
+                      &per_cpu(vmx_pi_blocking, new_cpu).list);
+            vmx->pi_blocking.lock = new_lock;
+
+            spin_unlock(new_lock);
+        }
+
+        pi_clear_sn(&vmx->pi_desc);
+    }
+
+    spin_unlock_irqrestore(old_lock, flags);
+}
+
 /*
  * To handle posted interrupts correctly, we need to set the following
  * state:
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 2b781ab..5ead57c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -597,6 +597,7 @@ void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
 void vmx_pi_per_cpu_init(unsigned int cpu);
+void vmx_pi_desc_fixup(unsigned int cpu);
 
 void vmx_pi_hooks_assign(struct domain *d);
 void vmx_pi_hooks_deassign(struct domain *d);
-- 
1.8.3.1



* [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely
  2017-03-15  5:11 [PATCH v10 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (2 preceding siblings ...)
  2017-03-15  5:11 ` [PATCH v10 3/6] VMX: Fixup PI descriptor when cpu is offline Chao Gao
@ 2017-03-15  5:11 ` Chao Gao
  2017-03-15 16:48   ` Jan Beulich
  2017-03-15  5:11 ` [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI Chao Gao
  2017-03-15  5:11 ` [PATCH v10 6/6] passthrough/io: Fall back to remapping interrupt when we can't use " Chao Gao
  5 siblings, 1 reply; 30+ messages in thread
From: Chao Gao @ 2017-03-15  5:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Chao Gao

We used structure assignment to update the IRTE, which is not safe when an
interrupt arrives during the update. It is better to update the IRTE
atomically through cmpxchg16b(). When cmpxchg16b is not supported, two 64-bit
write operations can still update the IRTE safely, provided only the high
qword or the low qword is intended to be updated.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v10:
- rename copy_irte_to_irt to update_irte
- remove copy_from_to_irt
- change commit message and add some comments to illustrate under which
condition update_irte() is safe.

 xen/drivers/passthrough/vtd/intremap.c | 36 ++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 7d4c7e1..342b45f 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -169,6 +169,38 @@ bool_t __init iommu_supports_eim(void)
     return 1;
 }
 
+static void update_irte(struct iremap_entry *entry,
+                        const struct iremap_entry *new_ire)
+{
+    if ( cpu_has_cx16 )
+    {
+        __uint128_t ret;
+        struct iremap_entry old_ire;
+
+        old_ire = *entry;
+        ret = cmpxchg16b(entry, &old_ire, new_ire);
+
+        /*
+         * In the above, we use cmpxchg16 to atomically update the 128-bit
+         * IRTE, and the hardware cannot update the IRTE behind us, so
+         * the return value of cmpxchg16 should be the same as old_ire.
+         * This ASSERT validate it.
+         */
+        ASSERT(ret == old_ire.val);
+    }
+    else
+    {
+        /*
+         * The following method to update IRTE is safe on condition that
+         * only the high qword or the low qword is to be updated.
+         * If entire IRTE is to be updated, callers should make sure the
+         * IRTE is not in use.
+         */
+        entry->lo = new_ire->lo;
+        entry->hi = new_ire->hi;
+    }
+}
+
 /* Mark specified intr remap entry as free */
 static void free_remap_entry(struct iommu *iommu, int index)
 {
@@ -353,7 +385,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
         remap_rte->format = 1;    /* indicate remap format */
     }
 
-    *iremap_entry = new_ire;
+    update_irte(iremap_entry, &new_ire);
     iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
@@ -640,7 +672,7 @@ static int msi_msg_to_remap_entry(
     remap_rte->address_hi = 0;
     remap_rte->data = index - i;
 
-    *iremap_entry = new_ire;
+    update_irte(iremap_entry, &new_ire);
     iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry));
     iommu_flush_iec_index(iommu, 0, index);
 
-- 
1.8.3.1



* [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI
  2017-03-15  5:11 [PATCH v10 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (3 preceding siblings ...)
  2017-03-15  5:11 ` [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely Chao Gao
@ 2017-03-15  5:11 ` Chao Gao
  2017-03-17 10:43   ` Jan Beulich
  2017-03-15  5:11 ` [PATCH v10 6/6] passthrough/io: Fall back to remapping interrupt when we can't use " Chao Gao
  5 siblings, 1 reply; 30+ messages in thread
From: Chao Gao @ 2017-03-15  5:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Chao Gao

When a vCPU migrates to another pCPU, the pt irqs bound to that vCPU normally
also need migration. However, when VT-d PI is enabled, the interrupt vector is
recorded in a memory-resident data structure and a notification, whose
destination is determined by NDST, is generated. NDST is properly adjusted
during vCPU migration, so a pirq that is directly injected to the guest need
not be migrated.

This patch adds an indicator, @via_pi, to show whether the pt irq is delivered
through VT-d PI.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v10:
- Newly added.

 xen/arch/x86/hvm/hvm.c       |  3 +++
 xen/drivers/passthrough/io.c | 15 +++++++++++----
 xen/include/xen/hvm/irq.h    |  1 +
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ccfae4f..abbdab9 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -445,6 +445,9 @@ static int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     struct vcpu *v = arg;
 
     if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
+         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
+         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
+         (pirq_dpci->gmsi.via_pi != 1) &&
          (pirq_dpci->gmsi.dest_vcpu_id == v->vcpu_id) )
     {
         struct irq_desc *desc =
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 080183e..37d9af6 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -365,6 +365,7 @@ int pt_irq_create_bind(
     {
         uint8_t dest, dest_mode, delivery_mode;
         int dest_vcpu_id;
+        const struct vcpu *vcpu = NULL;
 
         if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
         {
@@ -441,6 +442,15 @@ int pt_irq_create_bind(
 
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
+
+        pirq_dpci->gmsi.via_pi = 0;
+        if ( iommu_intpost )
+        {
+            vcpu = pi_find_dest_vcpu(d, dest, dest_mode, delivery_mode,
+                                     pirq_dpci->gmsi.gvec);
+            if ( vcpu )
+                pirq_dpci->gmsi.via_pi = 1;
+        }
         spin_unlock(&d->event_lock);
         if ( dest_vcpu_id >= 0 )
             hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
@@ -448,11 +458,8 @@ int pt_irq_create_bind(
         /* Use interrupt posting if it is supported. */
         if ( iommu_intpost )
         {
-            const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest, dest_mode,
-                                          delivery_mode, pirq_dpci->gmsi.gvec);
-
             if ( vcpu )
-                pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
+                pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
             else
                 dprintk(XENLOG_G_INFO,
                         "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index d3f8623..ba74a31 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -63,6 +63,7 @@ struct hvm_gmsi_info {
     uint32_t gvec;
     uint32_t gflags;
     int dest_vcpu_id; /* -1 :multi-dest, non-negative: dest_vcpu_id */
+    bool via_pi; /* directly deliver to guest via VT-d PI? */
 };
 
 struct hvm_girq_dpci_mapping {
-- 
1.8.3.1



* [PATCH v10 6/6] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI
  2017-03-15  5:11 [PATCH v10 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
                   ` (4 preceding siblings ...)
  2017-03-15  5:11 ` [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI Chao Gao
@ 2017-03-15  5:11 ` Chao Gao
  2017-03-17 10:48   ` Jan Beulich
  2017-03-22  6:34   ` Tian, Kevin
  5 siblings, 2 replies; 30+ messages in thread
From: Chao Gao @ 2017-03-15  5:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Kevin Tian, Jan Beulich, Chao Gao

The current logic for using VT-d PI is: when the guest configures the pirq's
destination to a single vCPU, the corresponding IRTE is updated to posted
format; if the destination of the pirq is multiple vCPUs, we just use
interrupt remapping. Obviously, we should also fall back to a remapped
interrupt when the guest wrongly configures the destination of the pirq or
makes it have multiple destination vCPUs.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v10:
- Newly added

 xen/drivers/passthrough/io.c           | 9 +--------
 xen/drivers/passthrough/vtd/intremap.c | 2 +-
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 37d9af6..be06b10 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -457,14 +457,7 @@ int pt_irq_create_bind(
 
         /* Use interrupt posting if it is supported. */
         if ( iommu_intpost )
-        {
-            if ( vcpu )
-                pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
-            else
-                dprintk(XENLOG_G_INFO,
-                        "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
-                        vcpu, pirq_dpci->gmsi.gvec);
-        }
+            pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
 
         break;
     }
diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 342b45f..331c7d5 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -946,7 +946,7 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
 {
     struct irq_desc *desc;
     struct msi_desc *msi_desc;
-    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    const struct pi_desc *pi_desc = v ? &v->arch.hvm_vmx.pi_desc : NULL;
     int rc;
 
     desc = pirq_spin_lock_irq_desc(pirq, NULL);
-- 
1.8.3.1



* Re: [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-15  5:11 ` [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
@ 2017-03-15 16:41   ` Jan Beulich
  2017-03-15 21:21     ` Chao Gao
  2017-03-22  5:59   ` Tian, Kevin
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-03-15 16:41 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, Feng Wu, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jun Nakajima

>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -552,11 +552,12 @@ static int msi_msg_to_remap_entry(
>      struct msi_desc *msi_desc, struct msi_msg *msg)
>  {
>      struct iremap_entry *iremap_entry = NULL, *iremap_entries;
> -    struct iremap_entry new_ire;
> +    struct iremap_entry new_ire = {{0}};

Any reason this isn't simple "{ }"?

> @@ -595,33 +596,35 @@ static int msi_msg_to_remap_entry(
>      GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
>                       iremap_entries, iremap_entry);
>  
> -    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
> -
> -    /* Set interrupt remapping table entry */
> -    new_ire.remap.fpd = 0;
> -    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
> -    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
> -    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
> -    /* Hardware require RH = 1 for LPR delivery mode */
> -    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
> -    new_ire.remap.avail = 0;
> -    new_ire.remap.res_1 = 0;
> -    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
> -                            MSI_DATA_VECTOR_MASK;
> -    new_ire.remap.res_2 = 0;
> -    if ( x2apic_enabled )
> -        new_ire.remap.dst = msg->dest32;
> +    if ( !pi_desc )
> +    {
> +        new_ire.remap.dm = msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT;
> +        new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
> +        new_ire.remap.dlm = msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT;
> +        /* Hardware require RH = 1 for LPR delivery mode */

As you're touching this anyway, please make it "requires" and
"lowest priority" respectively.

> @@ -968,59 +927,14 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
>          rc = -ENODEV;
>          goto unlock_out;
>      }
> -
> -    pci_dev = msi_desc->dev;
> -    if ( !pci_dev )
> -    {
> -        rc = -ENODEV;
> -        goto unlock_out;
> -    }
> -
> -    remap_index = msi_desc->remap_index;
> +    msi_desc->pi_desc = pi_desc;
> +    msi_desc->gvec = gvec;

Am I overlooking something - I can't seem to find any place where these
two fields (or at least the former) get cleared again? This may be correct,
but if it is, the reason wants recording in the commit message.

> --- a/xen/include/asm-x86/msi.h
> +++ b/xen/include/asm-x86/msi.h
> @@ -118,6 +118,8 @@ struct msi_desc {
>  	struct msi_msg msg;		/* Last set MSI message */
>  
>  	int remap_index;		/* index in interrupt remapping table */
> +	const void *pi_desc;		/* PDA, indicates msi is delivered via VT-d PI */

Why "void"? Please let's play type safe wherever we can.

Jan



* Re: [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely
  2017-03-15  5:11 ` [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely Chao Gao
@ 2017-03-15 16:48   ` Jan Beulich
  2017-03-15 22:39     ` Chao Gao
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-03-15 16:48 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
> +static void update_irte(struct iremap_entry *entry,
> +                        const struct iremap_entry *new_ire)
> +{
> +    if ( cpu_has_cx16 )
> +    {
> +        __uint128_t ret;
> +        struct iremap_entry old_ire;
> +
> +        old_ire = *entry;
> +        ret = cmpxchg16b(entry, &old_ire, new_ire);
> +
> +        /*
> +         * In the above, we use cmpxchg16 to atomically update the 128-bit
> +         * IRTE, and the hardware cannot update the IRTE behind us, so
> +         * the return value of cmpxchg16 should be the same as old_ire.
> +         * This ASSERT validate it.
> +         */
> +        ASSERT(ret == old_ire.val);
> +    }
> +    else
> +    {
> +        /*
> +         * The following method to update IRTE is safe on condition that
> +         * only the high qword or the low qword is to be updated.
> +         * If entire IRTE is to be updated, callers should make sure the
> +         * IRTE is not in use.
> +         */
> +        entry->lo = new_ire->lo;
> +        entry->hi = new_ire->hi;

How is this any better than structure assignment? Furthermore
the comment here partially contradicts the commit message. I
guess callers need to be given a way (another function parameter?)
to signal the function whether the unsafe variant is okay to use.
You should then add a suitable BUG_ON() in the else path here.

Jan



* Re: [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-15 16:41   ` Jan Beulich
@ 2017-03-15 21:21     ` Chao Gao
  2017-03-16 10:24       ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Chao Gao @ 2017-03-15 21:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Feng Wu, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jun Nakajima

On Wed, Mar 15, 2017 at 10:41:13AM -0600, Jan Beulich wrote:
>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>> --- a/xen/drivers/passthrough/vtd/intremap.c
>> +++ b/xen/drivers/passthrough/vtd/intremap.c
>> @@ -552,11 +552,12 @@ static int msi_msg_to_remap_entry(
>>      struct msi_desc *msi_desc, struct msi_msg *msg)
>>  {
>>      struct iremap_entry *iremap_entry = NULL, *iremap_entries;
>> -    struct iremap_entry new_ire;
>> +    struct iremap_entry new_ire = {{0}};
>
>Any reason this isn't simple "{ }"?
>

I also think '{}' can work. But here is the compiler output:
intremap.c: In function ‘msi_msg_to_remap_entry’:
intremap.c:587:12: error: missing braces around initializer [-Werror=missing-braces]
     struct iremap_entry new_ire = {0};
            ^
intremap.c:587:12: error: (near initialization for ‘new_ire.<anonymous>’) [-Werror=missing-braces]

>> @@ -595,33 +596,35 @@ static int msi_msg_to_remap_entry(
>>      GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
>>                       iremap_entries, iremap_entry);
>>  
>> -    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
>> -
>> -    /* Set interrupt remapping table entry */
>> -    new_ire.remap.fpd = 0;
>> -    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
>> -    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
>> -    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
>> -    /* Hardware require RH = 1 for LPR delivery mode */
>> -    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
>> -    new_ire.remap.avail = 0;
>> -    new_ire.remap.res_1 = 0;
>> -    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
>> -                            MSI_DATA_VECTOR_MASK;
>> -    new_ire.remap.res_2 = 0;
>> -    if ( x2apic_enabled )
>> -        new_ire.remap.dst = msg->dest32;
>> +    if ( !pi_desc )
>> +    {
>> +        new_ire.remap.dm = msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT;
>> +        new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
>> +        new_ire.remap.dlm = msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT;
>> +        /* Hardware require RH = 1 for LPR delivery mode */
>
>As you're touching this anyway, please make it "requires" and
>"lowest priority" respectively.

OK.

>
>> @@ -968,59 +927,14 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
>>          rc = -ENODEV;
>>          goto unlock_out;
>>      }
>> -
>> -    pci_dev = msi_desc->dev;
>> -    if ( !pci_dev )
>> -    {
>> -        rc = -ENODEV;
>> -        goto unlock_out;
>> -    }
>> -
>> -    remap_index = msi_desc->remap_index;
>> +    msi_desc->pi_desc = pi_desc;
>> +    msi_desc->gvec = gvec;
>
>Am I overlooking something - I can't seem to find any place where these
>two fields (or at least the former) get cleared again? This may be correct,
>but if it is the reason wants recording in the commit message.

IIUC, the current logic is to free the whole msi_desc on device deassignment.
But it is better to clear these two fields explicitly. I will call pi_update_irte()
with @vcpu=NULL on device deassignment, just like Patch [6/6] does.
Do you think the newly added code should be squashed into this one?
Shall I also squash Patch [6/6] into this one? I think it fixes another flaw.

>
>> --- a/xen/include/asm-x86/msi.h
>> +++ b/xen/include/asm-x86/msi.h
>> @@ -118,6 +118,8 @@ struct msi_desc {
>>  	struct msi_msg msg;		/* Last set MSI message */
>>  
>>  	int remap_index;		/* index in interrupt remapping table */
>> +	const void *pi_desc;		/* PDA, indicates msi is delivered via VT-d PI */
>
>Why "void"? Please let's play type safe wherever we can.

Ok.

Thanks,
Chao
>
>Jan
>


* Re: [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely
  2017-03-15 16:48   ` Jan Beulich
@ 2017-03-15 22:39     ` Chao Gao
  2017-03-16 10:29       ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Chao Gao @ 2017-03-15 22:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote:
>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>> +        /*
>> +         * The following method to update IRTE is safe on condition that
>> +         * only the high qword or the low qword is to be updated.
>> +         * If entire IRTE is to be updated, callers should make sure the
>> +         * IRTE is not in use.
>> +         */
>> +        entry->lo = new_ire->lo;
>> +        entry->hi = new_ire->hi;
>
>How is this any better than structure assignment? Furthermore

Indeed, not better. When using structure assignment, the assembly code is
48 8b 06                mov    (%rsi),%rax                    
48 8b 56 08             mov    0x8(%rsi),%rdx                 
48 89 07                mov    %rax,(%rdi)                    
48 89 57 08             mov    %rdx,0x8(%rdi)
Using the code above, the assembly code is
48 8b 06                mov    (%rsi),%rax                  
48 89 07                mov    %rax,(%rdi)                    
48 8b 46 08             mov    0x8(%rsi),%rax                 
48 89 47 08             mov    %rax,0x8(%rdi)

I thought structure assignment might utilize memcpy() for a structure of
this size, so I made this change. I will change it back. That said, this
patch is trying to make the update safer when cmpxchg16b() is
supported.

>the comment here partially contradicts the commit message. I

Yes.

>guess callers need to be given a way (another function parameter?)
>to signal the function whether the unsafe variant is okay to use.

This means we would need to add the new parameter to the iommu ops, since only
the IOAPIC/MSI code knows whether the entry it wants to change is masked. Is there
any other reasonable and correct solution? How about...

>You should then add a suitable BUG_ON() in the else path here.

just add a BUG_ON() like this:
BUG_ON( (entry->hi != new_ire->hi) && (entry->lo != new_ire->lo) );
Adding this BUG_ON() means update_irte() can't be used for initializing
or clearing an IRTE, which are not bugs.

Thanks
Chao
>
>Jan
>


* Re: [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-15 21:21     ` Chao Gao
@ 2017-03-16 10:24       ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2017-03-16 10:24 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, Feng Wu, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jun Nakajima

>>> On 15.03.17 at 22:21, <chao.gao@intel.com> wrote:
> On Wed, Mar 15, 2017 at 10:41:13AM -0600, Jan Beulich wrote:
>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/intremap.c
>>> +++ b/xen/drivers/passthrough/vtd/intremap.c
>>> @@ -552,11 +552,12 @@ static int msi_msg_to_remap_entry(
>>>      struct msi_desc *msi_desc, struct msi_msg *msg)
>>>  {
>>>      struct iremap_entry *iremap_entry = NULL, *iremap_entries;
>>> -    struct iremap_entry new_ire;
>>> +    struct iremap_entry new_ire = {{0}};
>>
>>Any reason this isn't simple "{ }"?
>>
> 
> I also think '{}' can work. But here is the compiler output:
> intremap.c: In function ‘msi_msg_to_remap_entry’:
> intremap.c:587:12: error: missing braces around initializer 
> [-Werror=missing-braces]
>      struct iremap_entry new_ire = {0};
>             ^
> intremap.c:587:12: error: (near initialization for ‘new_ire.<anonymous>’) 
> [-Werror=missing-braces]

Sure - you've left the 0 there, other than suggested.

>>> @@ -968,59 +927,14 @@ int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
>>>          rc = -ENODEV;
>>>          goto unlock_out;
>>>      }
>>> -
>>> -    pci_dev = msi_desc->dev;
>>> -    if ( !pci_dev )
>>> -    {
>>> -        rc = -ENODEV;
>>> -        goto unlock_out;
>>> -    }
>>> -
>>> -    remap_index = msi_desc->remap_index;
>>> +    msi_desc->pi_desc = pi_desc;
>>> +    msi_desc->gvec = gvec;
>>
>>Am I overlooking something - I can't seem to find any place where these
>>two fields (or at least the former) get cleared again? This may be correct,
>>but if it is the reason wants recording in the commit message.
> 
> IIUC, the current logic is free the whole msi_desc when device deassignment.
> But it is better to clear this two fields explicitly. I will call pi_update_irte()
> with @vcpu=NULL, just like Patch [6/6] when device deassignment.
> Do you think the new added code should be squashed into this one?

Not sure which code specifically you think about.

> Shall I also squash Patch [6/6] to this one? I think it is to fix another 
> flaw.

I haven't got around to look at patch 6 yet.

Jan


* Re: [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely
  2017-03-15 22:39     ` Chao Gao
@ 2017-03-16 10:29       ` Jan Beulich
  2017-03-17  1:52         ` Chao Gao
                           ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Jan Beulich @ 2017-03-16 10:29 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

>>> On 15.03.17 at 23:39, <chao.gao@intel.com> wrote:
> On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote:
>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>>> +        /*
>>> +         * The following method to update IRTE is safe on condition that
>>> +         * only the high qword or the low qword is to be updated.
>>> +         * If entire IRTE is to be updated, callers should make sure the
>>> +         * IRTE is not in use.
>>> +         */
>>> +        entry->lo = new_ire->lo;
>>> +        entry->hi = new_ire->hi;
>>
>>How is this any better than structure assignment? Furthermore
> 
> Indeed, not better. when using structure assignment, the assembly code is
> 48 8b 06                mov    (%rsi),%rax                    
> 48 8b 56 08             mov    0x8(%rsi),%rdx                 
> 48 89 07                mov    %rax,(%rdi)                    
> 48 89 57 08             mov    %rdx,0x8(%rdi)
> Using the code above, the assembly code is
> 48 8b 06                mov    (%rsi),%rax                  
> 48 89 07                mov    %rax,(%rdi)                    
> 48 8b 46 08             mov    0x8(%rsi),%rax                 
> 48 89 47 08             mov    %rax,0x8(%rdi)
> 
> I thought structure assignment maybe ultilize memcpy considering structure
> of a big size, so I made this change. I will change this back. Although
> that, this patch is trying to make the change safer when cmpxchg16() is
> supported. 

Perhaps you've really meant to use write_atomic()?

>>the comment here partially contradicts the commit message. I
> 
> Yes.
> 
>>guess callers need to be given a way (another function parameter?)
>>to signal the function whether the unsafe variant is okay to use.
> 
> This means we need to add the new parameter to iommu ops for only
> IOAPIC/MSI know the entry they want to change is masked. Is there
> any another reasonable and correct solution?

Well, users you convert in this patch must be okay to use the
non-atomic variant. The PI user(s) know(s) that cmpxchg16b is
available, so could always request the safe variant. No need for
a new parameter higher up in the call trees afaics.

> How about...
> 
>>You should then add a suitable BUG_ON() in the else path here.
> 
> just add a BUG_ON() like this
> BUG_ON( (entry->hi != new_ire->hi) && (entry->lo != new_ire->lo) );
> Adding this BUG_ON() means update_irte() can't be used for initializing
> or clearing IRTE which are not bugs.

Yes, that's an option too, albeit then I'd suggest (pseudo code)

    if ( high_up_to_date )
        update_low;
    else if ( low_up_to_date )
        update_high;
    else
       BUG();

But you'll want to have the okay from Kevin as the maintainer for
something like this.
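
(For illustration, a minimal C rendering of the pseudo code above, reusing
the update_irte() names from patch 4/6; the exact shape is of course up to
the final patch:

    else    /* the !cpu_has_cx16 path of update_irte() */
    {
        if ( entry->hi == new_ire->hi )
            entry->lo = new_ire->lo;        /* only the low half changes */
        else if ( entry->lo == new_ire->lo )
            entry->hi = new_ire->hi;        /* only the high half changes */
        else
            BUG();    /* a full update must go through cmpxchg16b() */
    }
)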

Jan



* Re: [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely
  2017-03-16 10:29       ` Jan Beulich
@ 2017-03-17  1:52         ` Chao Gao
  2017-03-17  9:08           ` Jan Beulich
  2017-03-22  6:26         ` Tian, Kevin
  2017-03-24  8:44         ` Tian, Kevin
  2 siblings, 1 reply; 30+ messages in thread
From: Chao Gao @ 2017-03-17  1:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

On Thu, Mar 16, 2017 at 04:29:29AM -0600, Jan Beulich wrote:
>>>> On 15.03.17 at 23:39, <chao.gao@intel.com> wrote:
>> On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote:
>>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>>>> +        /*
>>>> +         * The following method to update IRTE is safe on condition that
>>>> +         * only the high qword or the low qword is to be updated.
>>>> +         * If entire IRTE is to be updated, callers should make sure the
>>>> +         * IRTE is not in use.
>>>> +         */
>>>> +        entry->lo = new_ire->lo;
>>>> +        entry->hi = new_ire->hi;
>>>
>>>How is this any better than structure assignment? Furthermore
>> 
>> Indeed, not better. when using structure assignment, the assembly code is
>> 48 8b 06                mov    (%rsi),%rax                    
>> 48 8b 56 08             mov    0x8(%rsi),%rdx                 
>> 48 89 07                mov    %rax,(%rdi)                    
>> 48 89 57 08             mov    %rdx,0x8(%rdi)
>> Using the code above, the assembly code is
>> 48 8b 06                mov    (%rsi),%rax                  
>> 48 89 07                mov    %rax,(%rdi)                    
>> 48 8b 46 08             mov    0x8(%rsi),%rax                 
>> 48 89 47 08             mov    %rax,0x8(%rdi)
>> 
>> I thought structure assignment maybe ultilize memcpy considering structure
>> of a big size, so I made this change. I will change this back. Although
>> that, this patch is trying to make the change safer when cmpxchg16() is
>> supported. 
>
>Perhaps you've really meant to use write_atomic()?

I don't understand what you mean. But I think write_atomic() may not be related
to the problem of how to update a 16-byte memory location atomically if
cmpxchg16b() is not supported.

>
>>>the comment here partially contradicts the commit message. I
>> 
>> Yes.
>> 
>>>guess callers need to be given a way (another function parameter?)
>>>to signal the function whether the unsafe variant is okay to use.
>> 
>> This means we need to add the new parameter to iommu ops for only
>> IOAPIC/MSI know the entry they want to change is masked. Is there
>> any another reasonable and correct solution?
>
>Well, users you convert in this patch must be okay to use the
>non-atomic variant. The PI user(s) know(s) that cmpxchg16b is
>available, so could always request the safe variant. No need for
>a new parameter higher up in the call trees afaics.
>
>> How about...
>> 
>>>You should then add a suitable BUG_ON() in the else path here.
>> 
>> just add a BUG_ON() like this
>> BUG_ON( (entry->hi != new_ire->hi) && (entry->lo != new_ire->lo) );
>> Adding this BUG_ON() means update_irte() can't be used for initializing
>> or clearing IRTE which are not bugs.
>
>Yes, that's an option too, albeit then I'd suggest (pseudo code)
>
>    if ( high_up_to_date )
>        update_low;
>    else if ( low_up_to_date )
>        update_high;
>    else
>       BUG();
>
>But you'll want to have the okay from Kevin as the maintainer for
>something like this.

OK. I will wait for Kevin's comments.

Thanks,
Chao
>
>Jan
>


* Re: [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely
  2017-03-17  1:52         ` Chao Gao
@ 2017-03-17  9:08           ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2017-03-17  9:08 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Dario Faggioli,
	xen-devel, Jun Nakajima

>>> On 17.03.17 at 02:52, <chao.gao@intel.com> wrote:
> On Thu, Mar 16, 2017 at 04:29:29AM -0600, Jan Beulich wrote:
>>>>> On 15.03.17 at 23:39, <chao.gao@intel.com> wrote:
>>> On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote:
>>>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>>>>> +        /*
>>>>> +         * The following method to update IRTE is safe on condition that
>>>>> +         * only the high qword or the low qword is to be updated.
>>>>> +         * If entire IRTE is to be updated, callers should make sure the
>>>>> +         * IRTE is not in use.
>>>>> +         */
>>>>> +        entry->lo = new_ire->lo;
>>>>> +        entry->hi = new_ire->hi;
>>>>
>>>>How is this any better than structure assignment? Furthermore
>>> 
>>> Indeed, not better. when using structure assignment, the assembly code is
>>> 48 8b 06                mov    (%rsi),%rax                    
>>> 48 8b 56 08             mov    0x8(%rsi),%rdx                 
>>> 48 89 07                mov    %rax,(%rdi)                    
>>> 48 89 57 08             mov    %rdx,0x8(%rdi)
>>> Using the code above, the assembly code is
>>> 48 8b 06                mov    (%rsi),%rax                  
>>> 48 89 07                mov    %rax,(%rdi)                    
>>> 48 8b 46 08             mov    0x8(%rsi),%rax                 
>>> 48 89 47 08             mov    %rax,0x8(%rdi)
>>> 
>>> I thought structure assignment maybe ultilize memcpy considering structure
>>> of a big size, so I made this change. I will change this back. Although
>>> that, this patch is trying to make the change safer when cmpxchg16() is
>>> supported. 
>>
>>Perhaps you've really meant to use write_atomic()?
> 
> I don't understand what you mean. But I think write_atomic may not be related
> to the problem of how to update 16 bytes of memory atomically if cmpxchg16()
> is not supported.

Of course not, but you were concerned about memcpy() being
called by the compiler. I.e. I did assume that when doing the
2x8-byte update you would want to have them carried out as
two 8-byte writes, instead of possibly being broken up further
by the compiler.

Jan
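
To illustrate the point, a minimal sketch that carries out the two halves as
single 8-byte stores the compiler cannot split further; the GCC __atomic
builtins are used here as a stand-in for Xen's write_atomic(), whose exact
form may differ:

    #include <stdint.h>

    struct iremap_entry {
        uint64_t lo;
        uint64_t hi;
    };

    void update_entry_halves(struct iremap_entry *entry,
                             const struct iremap_entry *new_ire)
    {
        /* Each qword is written with exactly one 8-byte store. */
        __atomic_store_n(&entry->lo, new_ire->lo, __ATOMIC_RELAXED);
        __atomic_store_n(&entry->hi, new_ire->hi, __ATOMIC_RELAXED);
    }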



* Re: [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI
  2017-03-15  5:11 ` [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI Chao Gao
@ 2017-03-17 10:43   ` Jan Beulich
  2017-03-20  1:59     ` Chao Gao
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-03-17 10:43 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, xen-devel

>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -445,6 +445,9 @@ static int hvm_migrate_pirq(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
>      struct vcpu *v = arg;
>  
>      if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
> +         (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
> +         /* Needn't migrate pirq if this pirq is delivered to guest directly.*/
> +         (pirq_dpci->gmsi.via_pi != 1) &&

Considering the field is bool, !pirq_dpci->gmsi.via_pi please.

> @@ -441,6 +442,15 @@ int pt_irq_create_bind(
>  
>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> +
> +        pirq_dpci->gmsi.via_pi = 0;

false

> +        if ( iommu_intpost )
> +        {
> +            vcpu = pi_find_dest_vcpu(d, dest, dest_mode, delivery_mode,
> +                                     pirq_dpci->gmsi.gvec);

This is now outside of the event_lock-ed region - is this safe?

Furthermore, looking at this, why do we need to call this function
at all in other than the LowestPrio case? It looks as if we could
easily use what hvm_girq_dest_2_vcpu_id() provides in the Fixed
case.

> +            if ( vcpu )
> +                pirq_dpci->gmsi.via_pi = 1;

true

> +        }
>          spin_unlock(&d->event_lock);
>          if ( dest_vcpu_id >= 0 )
>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);

(continuing from above) This could then use vcpu too.

> --- a/xen/include/xen/hvm/irq.h
> +++ b/xen/include/xen/hvm/irq.h
> @@ -63,6 +63,7 @@ struct hvm_gmsi_info {
>      uint32_t gvec;
>      uint32_t gflags;
>      int dest_vcpu_id; /* -1 :multi-dest, non-negative: dest_vcpu_id */
> +    bool via_pi; /* directly deliver to guest via VT-d PI? */

Wouldn't the field name perhaps better be (or include) "posted"?

Jan
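
For illustration, the check with the above suggestions applied could look
like the sketch below; the flag bits and structure layout are made up, only
the shape of the condition matters:

    #include <stdbool.h>

    #define HVM_IRQ_DPCI_MACH_MSI   (1u << 0)   /* made-up bit positions */
    #define HVM_IRQ_DPCI_GUEST_MSI  (1u << 1)

    struct hvm_gmsi_info { bool posted; };
    struct hvm_pirq_dpci {
        unsigned int flags;
        struct hvm_gmsi_info gmsi;
    };

    bool want_migrate(const struct hvm_pirq_dpci *pirq_dpci)
    {
        /* Skip pirqs delivered to the guest directly via VT-d PI. */
        return (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
               (pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI) &&
               !pirq_dpci->gmsi.posted;
    }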



* Re: [PATCH v10 6/6] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI
  2017-03-15  5:11 ` [PATCH v10 6/6] passthrough/io: Fall back to remapping interrupt when we can't use " Chao Gao
@ 2017-03-17 10:48   ` Jan Beulich
  2017-03-22  6:34   ` Tian, Kevin
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2017-03-17 10:48 UTC (permalink / raw)
  To: Chao Gao; +Cc: Kevin Tian, xen-devel

>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
> The current logic of using VT-d pi is when guest configurates the pirq's
> destination vcpu to a single vcpu, the according IRTE is updated to
> posted format. If the destination of the pirq is multiple vcpus, we just use
> interrupt remapping. Obviously, we should fall back to remapping interrupt
> when guest wrongly configurate destination of pirq or makes it have
> multi-destination vcpus.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>




* Re: [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI
  2017-03-17 10:43   ` Jan Beulich
@ 2017-03-20  1:59     ` Chao Gao
  2017-03-20  9:18       ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Chao Gao @ 2017-03-20  1:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Fri, Mar 17, 2017 at 04:43:08AM -0600, Jan Beulich wrote:
>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>> @@ -441,6 +442,15 @@ int pt_irq_create_bind(
>>  
>>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>> +
>> +        pirq_dpci->gmsi.via_pi = 0;
>
>false
>
>> +        if ( iommu_intpost )
>> +        {
>> +            vcpu = pi_find_dest_vcpu(d, dest, dest_mode, delivery_mode,
>> +                                     pirq_dpci->gmsi.gvec);
>
>This is now outside of the event_lock-ed region - is this safe?

Do you mean it is __inside__ the event_lock-ed region? I think it is safe,
since the functions called by pi_find_dest_vcpu() are almost the same as
hvm_girq_dest_2_vcpu_id().

>
>Furthermore, looking at this, why do we need to call this function
>at all in other than the LowestPrio case? It looks as if we could
>easily use what hvm_girq_dest_2_vcpu_id() provides in the Fixed
>case.

Agreed. We can delete pi_find_dest_vcpu(), and for the LowestPrio case
when VT-d PI is enabled, call vector_hashing_dest() directly.

>
>> +            if ( vcpu )
>> +                pirq_dpci->gmsi.via_pi = 1;
>
>true
>
>> +        }
>>          spin_unlock(&d->event_lock);
>>          if ( dest_vcpu_id >= 0 )
>>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
>
>(continuing from above) This could then use vcpu too.

I don't understand. In this patch, vcpu is always null when VT-d PI is not
enabled. Do you mean something like below: 

if ( dest_vcpu_id >= 0 )
    vcpu = d->vcpu[dest_vcpu_id];
if ( iommu_intpost && (!vcpu) && (delivery_mode == dest_LowestPrio) )
{
    vcpu = vector_hashing_dest(d, dest, dest_mode,pirq_dpci->gmsi.gvec);
...
}
spin_unlock(&d->event_lock);
if ( vcpu )
    hvm_migrate_pirqs(vcpu);

Thanks,
Chao

>
>> --- a/xen/include/xen/hvm/irq.h
>> +++ b/xen/include/xen/hvm/irq.h
>> @@ -63,6 +63,7 @@ struct hvm_gmsi_info {
>>      uint32_t gvec;
>>      uint32_t gflags;
>>      int dest_vcpu_id; /* -1 :multi-dest, non-negative: dest_vcpu_id */
>> +    bool via_pi; /* directly deliver to guest via VT-d PI? */
>
>Wouldn't the field name perhaps better be (or include) "posted"?
>
>Jan
>


* Re: [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI
  2017-03-20  9:18       ` Jan Beulich
@ 2017-03-20  2:38         ` Chao Gao
  2017-03-20 10:26           ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Chao Gao @ 2017-03-20  2:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Mon, Mar 20, 2017 at 03:18:18AM -0600, Jan Beulich wrote:
>>>> On 20.03.17 at 02:59, <chao.gao@intel.com> wrote:
>> On Fri, Mar 17, 2017 at 04:43:08AM -0600, Jan Beulich wrote:
>>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>>>> +        if ( iommu_intpost )
>>>> +        {
>>>> +            vcpu = pi_find_dest_vcpu(d, dest, dest_mode, delivery_mode,
>>>> +                                     pirq_dpci->gmsi.gvec);
>>>
>>>This is now outside of the event_lock-ed region - is this safe?
>> 
>> do you mean it is __inside__ the event_lock-ed region?
>
>Oops, indeed.
>
>> I think it is safe
>> for the functions called by pi_find_dest_vcpu() are almost same with
>> hvm_girq_dest_2_vcpu_id.
>
>The question then needs to be put differently: Is this needed?
>You shouldn't move into a locked region what doesn't need to
>be there.

Yes, because we rely on the result to set @via_pi, which needs to be
protected by the lock.

>
>>>> +        }
>>>>          spin_unlock(&d->event_lock);
>>>>          if ( dest_vcpu_id >= 0 )
>>>>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
>>>
>>>(continuing from above) This could then use vcpu too.
>> 
>> I don't understand. In this patch, vcpu is always null when VT-d PI is not
>> enabled. Do you mean something like below: 
>> 
>> if ( dest_vcpu_id >= 0 )
>>     vcpu = d->vcpu[dest_vcpu_id];
>> if ( iommu_intpost && (!vcpu) && (delivery_mode == dest_LowestPrio) )
>> {
>>     vcpu = vector_hashing_dest(d, dest, dest_mode,pirq_dpci->gmsi.gvec);
>> ...
>> }
>> spin_unlock(&d->event_lock);
>> if ( vcpu )
>>     hvm_migrate_pirqs(vcpu);
>
>Yes, along these lines, albeit I think the first if() is more complicated
>than it needs to be.

We can make it simple like this:

const struct vcpu *vcpu;
...

vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;

Thanks
Chao

>
>Jan
>


* Re: [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI
  2017-03-20 10:26           ` Jan Beulich
@ 2017-03-20  5:22             ` Chao Gao
  2017-03-20 12:50               ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Chao Gao @ 2017-03-20  5:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Mon, Mar 20, 2017 at 04:26:10AM -0600, Jan Beulich wrote:
>>>> On 20.03.17 at 03:38, <chao.gao@intel.com> wrote:
>> On Mon, Mar 20, 2017 at 03:18:18AM -0600, Jan Beulich wrote:
>>>>>> On 20.03.17 at 02:59, <chao.gao@intel.com> wrote:
>>>> On Fri, Mar 17, 2017 at 04:43:08AM -0600, Jan Beulich wrote:
>>>>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>>>>>> +        if ( iommu_intpost )
>>>>>> +        {
>>>>>> +            vcpu = pi_find_dest_vcpu(d, dest, dest_mode, delivery_mode,
>>>>>> +                                     pirq_dpci->gmsi.gvec);
>>>>>
>>>>>This is now outside of the event_lock-ed region - is this safe?
>>>> 
>>>> do you mean it is __inside__ the event_lock-ed region?
>>>
>>>Oops, indeed.
>>>
>>>> I think it is safe
>>>> for the functions called by pi_find_dest_vcpu() are almost same with
>>>> hvm_girq_dest_2_vcpu_id.
>>>
>>>The question then needs to be put differently: Is this needed?
>>>You shouldn't move into a locked region what doesn't need to
>>>be there.
>> 
>> Yes. For we rely on the result to set @via_pi which needs to be 
>> protected by the lock.
>
>I don't follow: You set via_pi for hvm_migrate_pirqs() to consume,
>yet the call to that function sits ...
>
>>>>>> +        }
>>>>>>          spin_unlock(&d->event_lock);
>>>>>>          if ( dest_vcpu_id >= 0 )
>>>>>>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
>
>... right after the lock release.

@via_pi is also consumed during vCPU migration.
I just think the event_lock protects R/W operations on struct hvm_pirq_dpci.
To prevent a potential race (we can't use VT-d PI for the 1st binding but can for
the 2nd binding, yet somehow the first update to via_pi overrides the second one),
and to avoid complicating the set of fields the event_lock protects,
I'm inclined to keep this in the event_lock-ed region as long as it doesn't introduce other issues.

>
>>>>>(continuing from above) This could then use vcpu too.
>>>> 
>>>> I don't understand. In this patch, vcpu is always null when VT-d PI is not
>>>> enabled. Do you mean something like below: 
>>>> 
>>>> if ( dest_vcpu_id >= 0 )
>>>>     vcpu = d->vcpu[dest_vcpu_id];
>>>> if ( iommu_intpost && (!vcpu) && (delivery_mode == dest_LowestPrio) )
>>>> {
>>>>     vcpu = vector_hashing_dest(d, dest, dest_mode,pirq_dpci->gmsi.gvec);
>>>> ...
>>>> }
>>>> spin_unlock(&d->event_lock);
>>>> if ( vcpu )
>>>>     hvm_migrate_pirqs(vcpu);
>>>
>>>Yes, along these lines, albeit I think the first if() is more complicated
>>>than it needs to be.
>> 
>> We can make it simple like this:
>> 
>> const struct *vcpu vcpu;
>> ...
>> 
>> vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;
>
>Ouch - there were three if()s, and I missed the first one, i.e. I
>really meant the middle of them.

Yes. The code may be wrong rather than just overly complicated. The code above
changes the way we choose the destination vcpu when VT-d PI is enabled: even
when we can get a single destination vcpu for LowestPrio, we should use
vector_hashing_dest() to calculate the destination vcpu, as in the original logic.

I think the condition below is better:
if ( iommu_intpost && (delivery_mode == dest_LowestPrio) )

>
>Jan
>
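
Putting this sub-thread together, the tail of the binding path could take
roughly the shape sketched below. The types and helpers are simplified
placeholders; the real pt_irq_create_bind() differs and takes/drops
d->event_lock around the part indicated in the comments:

    #include <stdbool.h>
    #include <stddef.h>

    struct vcpu { int vcpu_id; };
    struct domain { struct vcpu *vcpu[8]; };

    enum { dest_Fixed, dest_LowestPrio };
    static bool iommu_intpost = true;                    /* placeholder */

    /* Stubs standing in for the real helpers discussed above. */
    static struct vcpu *vector_hashing_dest(struct domain *d, int dest,
                                            bool dest_mode, int gvec)
    { (void)dest; (void)dest_mode; (void)gvec; return d->vcpu[0]; }

    static void hvm_migrate_pirqs(struct vcpu *v) { (void)v; }

    void bind_tail(struct domain *d, int dest_vcpu_id, int dest,
                   bool dest_mode, int delivery_mode, int gvec,
                   bool *posted)
    {
        struct vcpu *vcpu =
            (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;

        *posted = false;
        if ( iommu_intpost && delivery_mode == dest_LowestPrio )
        {
            /* Keep using vector hashing for lowest-priority delivery. */
            vcpu = vector_hashing_dest(d, dest, dest_mode, gvec);
            if ( vcpu )
                *posted = true;
        }
        /* d->event_lock would be dropped here in the real code. */
        if ( vcpu )
            hvm_migrate_pirqs(vcpu);
    }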


* Re: [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI
  2017-03-20 12:50               ` Jan Beulich
@ 2017-03-20  6:11                 ` Chao Gao
  0 siblings, 0 replies; 30+ messages in thread
From: Chao Gao @ 2017-03-20  6:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Mon, Mar 20, 2017 at 06:50:37AM -0600, Jan Beulich wrote:
>>>> On 20.03.17 at 06:22, <chao.gao@intel.com> wrote:
>> On Mon, Mar 20, 2017 at 04:26:10AM -0600, Jan Beulich wrote:
>>>>>> On 20.03.17 at 03:38, <chao.gao@intel.com> wrote:
>>>> On Mon, Mar 20, 2017 at 03:18:18AM -0600, Jan Beulich wrote:
>>>>>>>>          spin_unlock(&d->event_lock);
>>>>>>>>          if ( dest_vcpu_id >= 0 )
>>>>>>>>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
>>>
>>>... right after the lock release.
>> 
>> @via_pi is also consumed during vCPU migration.
>
>But the event lock isn't being held around the checking of the
>field, so putting the setting of the field under lock is of no use.
>
>> I just think the event_lock protects R/W operations on struct hvm_pirq_dpci.
>> To prohibit potential race(we can't use VT-d PI in 1st binding, but we can use
>> in the 2nd binding. But somehow the first update to via_pi overrides the second one),
>> and don't complicate the fields event_lock protects,
>> I'm inclined to put it in event_lock-ed region as long as it doesn't 
>> introduce other issues.
>
>I certainly don't object to properly synchronizing things here,
>but then both producing and consuming side need to hold
>respective locks. Otherwise the best you can hope for is to
>reduce timing windows; you won't eliminate them though.

You are right. I am convinced.
Thanks
Chao

>
>Jan
>


* Re: [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI
  2017-03-20  1:59     ` Chao Gao
@ 2017-03-20  9:18       ` Jan Beulich
  2017-03-20  2:38         ` Chao Gao
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-03-20  9:18 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, xen-devel

>>> On 20.03.17 at 02:59, <chao.gao@intel.com> wrote:
> On Fri, Mar 17, 2017 at 04:43:08AM -0600, Jan Beulich wrote:
>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>>> +        if ( iommu_intpost )
>>> +        {
>>> +            vcpu = pi_find_dest_vcpu(d, dest, dest_mode, delivery_mode,
>>> +                                     pirq_dpci->gmsi.gvec);
>>
>>This is now outside of the event_lock-ed region - is this safe?
> 
> do you mean it is __inside__ the event_lock-ed region?

Oops, indeed.

> I think it is safe
> for the functions called by pi_find_dest_vcpu() are almost same with
> hvm_girq_dest_2_vcpu_id.

The question then needs to be put differently: Is this needed?
You shouldn't move into a locked region what doesn't need to
be there.

>>> +        }
>>>          spin_unlock(&d->event_lock);
>>>          if ( dest_vcpu_id >= 0 )
>>>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
>>
>>(continuing from above) This could then use vcpu too.
> 
> I don't understand. In this patch, vcpu is always null when VT-d PI is not
> enabled. Do you mean something like below: 
> 
> if ( dest_vcpu_id >= 0 )
>     vcpu = d->vcpu[dest_vcpu_id];
> if ( iommu_intpost && (!vcpu) && (delivery_mode == dest_LowestPrio) )
> {
>     vcpu = vector_hashing_dest(d, dest, dest_mode,pirq_dpci->gmsi.gvec);
> ...
> }
> spin_unlock(&d->event_lock);
> if ( vcpu )
>     hvm_migrate_pirqs(vcpu);

Yes, along these lines, albeit I think the first if() is more complicated
than it needs to be.

Jan



* Re: [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI
  2017-03-20  2:38         ` Chao Gao
@ 2017-03-20 10:26           ` Jan Beulich
  2017-03-20  5:22             ` Chao Gao
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-03-20 10:26 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, xen-devel

>>> On 20.03.17 at 03:38, <chao.gao@intel.com> wrote:
> On Mon, Mar 20, 2017 at 03:18:18AM -0600, Jan Beulich wrote:
>>>>> On 20.03.17 at 02:59, <chao.gao@intel.com> wrote:
>>> On Fri, Mar 17, 2017 at 04:43:08AM -0600, Jan Beulich wrote:
>>>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>>>>> +        if ( iommu_intpost )
>>>>> +        {
>>>>> +            vcpu = pi_find_dest_vcpu(d, dest, dest_mode, delivery_mode,
>>>>> +                                     pirq_dpci->gmsi.gvec);
>>>>
>>>>This is now outside of the event_lock-ed region - is this safe?
>>> 
>>> do you mean it is __inside__ the event_lock-ed region?
>>
>>Oops, indeed.
>>
>>> I think it is safe
>>> for the functions called by pi_find_dest_vcpu() are almost same with
>>> hvm_girq_dest_2_vcpu_id.
>>
>>The question then needs to be put differently: Is this needed?
>>You shouldn't move into a locked region what doesn't need to
>>be there.
> 
> Yes. For we rely on the result to set @via_pi which needs to be 
> protected by the lock.

I don't follow: You set via_pi for hvm_migrate_pirqs() to consume,
yet the call to that function sits ...

>>>>> +        }
>>>>>          spin_unlock(&d->event_lock);
>>>>>          if ( dest_vcpu_id >= 0 )
>>>>>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);

... right after the lock release.

>>>>(continuing from above) This could then use vcpu too.
>>> 
>>> I don't understand. In this patch, vcpu is always null when VT-d PI is not
>>> enabled. Do you mean something like below: 
>>> 
>>> if ( dest_vcpu_id >= 0 )
>>>     vcpu = d->vcpu[dest_vcpu_id];
>>> if ( iommu_intpost && (!vcpu) && (delivery_mode == dest_LowestPrio) )
>>> {
>>>     vcpu = vector_hashing_dest(d, dest, dest_mode,pirq_dpci->gmsi.gvec);
>>> ...
>>> }
>>> spin_unlock(&d->event_lock);
>>> if ( vcpu )
>>>     hvm_migrate_pirqs(vcpu);
>>
>>Yes, along these lines, albeit I think the first if() is more complicated
>>than it needs to be.
> 
> We can make it simple like this:
> 
> const struct *vcpu vcpu;
> ...
> 
> vcpu = (dest_vcpu_id >= 0) ? d->vcpu[dest_vcpu_id] : NULL;

Ouch - there were three if()s, and I missed the first one, i.e. I
really meant the middle of them.

Jan



* Re: [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI
  2017-03-20  5:22             ` Chao Gao
@ 2017-03-20 12:50               ` Jan Beulich
  2017-03-20  6:11                 ` Chao Gao
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-03-20 12:50 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, xen-devel

>>> On 20.03.17 at 06:22, <chao.gao@intel.com> wrote:
> On Mon, Mar 20, 2017 at 04:26:10AM -0600, Jan Beulich wrote:
>>>>> On 20.03.17 at 03:38, <chao.gao@intel.com> wrote:
>>> On Mon, Mar 20, 2017 at 03:18:18AM -0600, Jan Beulich wrote:
>>>>>>> On 20.03.17 at 02:59, <chao.gao@intel.com> wrote:
>>>>> On Fri, Mar 17, 2017 at 04:43:08AM -0600, Jan Beulich wrote:
>>>>>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
>>>>>>> +        if ( iommu_intpost )
>>>>>>> +        {
>>>>>>> +            vcpu = pi_find_dest_vcpu(d, dest, dest_mode, delivery_mode,
>>>>>>> +                                     pirq_dpci->gmsi.gvec);
>>>>>>
>>>>>>This is now outside of the event_lock-ed region - is this safe?
>>>>> 
>>>>> do you mean it is __inside__ the event_lock-ed region?
>>>>
>>>>Oops, indeed.
>>>>
>>>>> I think it is safe
>>>>> for the functions called by pi_find_dest_vcpu() are almost same with
>>>>> hvm_girq_dest_2_vcpu_id.
>>>>
>>>>The question then needs to be put differently: Is this needed?
>>>>You shouldn't move into a locked region what doesn't need to
>>>>be there.
>>> 
>>> Yes. For we rely on the result to set @via_pi which needs to be 
>>> protected by the lock.
>>
>>I don't follow: You set via_pi for hvm_migrate_pirqs() to consume,
>>yet the call to that function sits ...
>>
>>>>>>> +        }
>>>>>>>          spin_unlock(&d->event_lock);
>>>>>>>          if ( dest_vcpu_id >= 0 )
>>>>>>>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
>>
>>... right after the lock release.
> 
> @via_pi is also consumed during vCPU migration.

But the event lock isn't being held around the checking of the
field, so putting the setting of the field under lock is of no use.

> I just think the event_lock protects R/W operations on struct hvm_pirq_dpci.
> To prohibit potential race(we can't use VT-d PI in 1st binding, but we can use
> in the 2nd binding. But somehow the first update to via_pi overrides the second one),
> and don't complicate the fields event_lock protects,
> I'm inclined to put it in event_lock-ed region as long as it doesn't 
> introduce other issues.

I certainly don't object to properly synchronizing things here,
but then both producing and consuming side need to hold
respective locks. Otherwise the best you can hope for is to
reduce timing windows; you won't eliminate them though.

Jan
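
As a generic illustration of that last point, with a pthread mutex standing
in for Xen's event_lock and made-up function names: both the producer and
the consumer of the flag must take the lock for the synchronization to be
meaningful.

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool posted;                  /* stand-in for the gmsi flag */

    void producer_set(bool value)        /* e.g. the binding path */
    {
        pthread_mutex_lock(&event_lock);
        posted = value;
        pthread_mutex_unlock(&event_lock);
    }

    bool consumer_check(void)            /* e.g. the migration path */
    {
        bool value;

        pthread_mutex_lock(&event_lock); /* the consumer must hold it too */
        value = posted;
        pthread_mutex_unlock(&event_lock);
        return value;
    }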



* Re: [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-22  5:59   ` Tian, Kevin
@ 2017-03-22  0:18     ` Chao Gao
  2017-03-22  8:32       ` Tian, Kevin
  0 siblings, 1 reply; 30+ messages in thread
From: Chao Gao @ 2017-03-22  0:18 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Feng Wu, Nakajima, Jun, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jan Beulich

On Wed, Mar 22, 2017 at 01:59:19PM +0800, Tian, Kevin wrote:
>> From: Gao, Chao
>> Sent: Wednesday, March 15, 2017 1:11 PM
>> 
>> Currently, msi_msg_to_remap_entry is buggy when the live IRTE is in posted
>> format. Straightforwardly, we can let caller specify which format of IRTE they
>> want update to. But the problem is current callers are not aware of the
>> binding with guest interrupt. Making all callers be aware of the binding with
>> guest interrupt will cause a far more complicated change. Also some callings
>> happen in interrupt context where they can't acquire d->event_lock to read
>> struct hvm_pirq_dpci.
>
>Above text is unclear to me. Are you trying to explain why current 
>code is buggy (which I don't get the point) or simply for mitigation
>options which were once considered but dropped for some reasons?
>

The latter. I will split the text into two sections and add more description of
why the current code is buggy.

>> 
>> diff --git a/xen/drivers/passthrough/vtd/intremap.c
>> b/xen/drivers/passthrough/vtd/intremap.c
>> index bfd468b..6202ece 100644
>> --- a/xen/drivers/passthrough/vtd/intremap.c
>> +++ b/xen/drivers/passthrough/vtd/intremap.c
>> @@ -552,11 +552,12 @@ static int msi_msg_to_remap_entry(
>>      struct msi_desc *msi_desc, struct msi_msg *msg)  {
>>      struct iremap_entry *iremap_entry = NULL, *iremap_entries;
>> -    struct iremap_entry new_ire;
>> +    struct iremap_entry new_ire = {{0}};
>>      struct msi_msg_remap_entry *remap_rte;
>>      unsigned int index, i, nr = 1;
>>      unsigned long flags;
>>      struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
>> +    const struct pi_desc *pi_desc = msi_desc->pi_desc;
>> 
>>      if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
>>          nr = msi_desc->msi.nvec;
>> @@ -595,33 +596,35 @@ static int msi_msg_to_remap_entry(
>>      GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
>>                       iremap_entries, iremap_entry);
>> 
>> -    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
>> -
>> -    /* Set interrupt remapping table entry */
>> -    new_ire.remap.fpd = 0;
>> -    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT)
>> & 0x1;
>> -    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
>> -    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT)
>> & 0x1;
>> -    /* Hardware require RH = 1 for LPR delivery mode */
>> -    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
>> -    new_ire.remap.avail = 0;
>> -    new_ire.remap.res_1 = 0;
>> -    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
>> -                            MSI_DATA_VECTOR_MASK;
>> -    new_ire.remap.res_2 = 0;
>> -    if ( x2apic_enabled )
>> -        new_ire.remap.dst = msg->dest32;
>> +    if ( !pi_desc )
>> +    {
>> +        new_ire.remap.dm = msg->address_lo >>
>> MSI_ADDR_DESTMODE_SHIFT;
>> +        new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
>> +        new_ire.remap.dlm = msg->data >>
>> MSI_DATA_DELIVERY_MODE_SHIFT;
>> +        /* Hardware require RH = 1 for LPR delivery mode */
>> +        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
>> +        new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
>> +                                MSI_DATA_VECTOR_MASK;
>> +        if ( x2apic_enabled )
>> +            new_ire.remap.dst = msg->dest32;
>> +        else
>> +            new_ire.remap.dst =
>> +                MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK) << 8;
>> +        new_ire.remap.p = 1;
>
>Old code also touches fpd, res_1/2/3/4, which are abandoned
>above. Can you elaborate?
>

We have initialized new_ire to zero, so I removed all the lines that assign 0 to fields.
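
A tiny standalone example of that point, with arbitrary field names: a zero
initializer already sets every member to 0, making explicit 0 assignments
redundant.

    #include <assert.h>

    struct remap_fields {
        unsigned int fpd:1, dm:1, res_1:6, vector:8;
    };

    int main(void)
    {
        struct remap_fields new_ire = {0};   /* all fields start out as 0 */

        assert(new_ire.fpd == 0 && new_ire.res_1 == 0 && new_ire.vector == 0);
        return 0;
    }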

>> diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h index
>> 9c02945..3286692 100644
>> --- a/xen/include/asm-x86/msi.h
>> +++ b/xen/include/asm-x86/msi.h
>> @@ -118,6 +118,8 @@ struct msi_desc {
>>  	struct msi_msg msg;		/* Last set MSI message */
>> 
>>  	int remap_index;		/* index in interrupt remapping table
>> */
>> +	const void *pi_desc;		/* PDA, indicates msi is delivered via
>> VT-d PI */
>
>what's PDA?

Posted Descriptor Address, which is recorded in the posted-format IRTE.

How about just using "Indicates msi is delivered via VT-d PI", because of the
line width limitation?

>
>> +	uint8_t gvec;			/* guest vector. valid when pi_desc
>> isn't NULL */
>>  };
>> 
>>  /*
>> --
>> 1.8.3.1
>
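
As a side note to the PDA question above, a small standalone illustration of
how a descriptor address splits into the pda_l/pda_h fields of a
posted-format IRTE; PDA_LOW_BIT and the address used here are assumed values
for illustration only:

    #include <stdint.h>
    #include <stdio.h>

    #define PDA_LOW_BIT 26   /* assumed width of the pda_l field */

    int main(void)
    {
        uint64_t pi_desc_maddr = 0x12345678c0ULL;   /* made-up, 64-byte aligned */
        uint64_t pda_l = (pi_desc_maddr >> (32 - PDA_LOW_BIT)) &
                         ((1ULL << PDA_LOW_BIT) - 1);
        uint64_t pda_h = pi_desc_maddr >> 32;

        printf("pda_l=%#llx pda_h=%#llx\n",
               (unsigned long long)pda_l, (unsigned long long)pda_h);
        return 0;
    }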


* Re: [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-15  5:11 ` [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
  2017-03-15 16:41   ` Jan Beulich
@ 2017-03-22  5:59   ` Tian, Kevin
  2017-03-22  0:18     ` Chao Gao
  1 sibling, 1 reply; 30+ messages in thread
From: Tian, Kevin @ 2017-03-22  5:59 UTC (permalink / raw)
  To: Gao, Chao, xen-devel
  Cc: Feng Wu, Nakajima, Jun, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich

> From: Gao, Chao
> Sent: Wednesday, March 15, 2017 1:11 PM
> 
> Currently, msi_msg_to_remap_entry is buggy when the live IRTE is in posted
> format. Straightforwardly, we can let caller specify which format of IRTE they
> want update to. But the problem is current callers are not aware of the
> binding with guest interrupt. Making all callers be aware of the binding with
> guest interrupt will cause a far more complicated change. Also some callings
> happen in interrupt context where they can't acquire d->event_lock to read
> struct hvm_pirq_dpci.

The above text is unclear to me. Are you trying to explain why the current
code is buggy (in which case I don't get the point), or simply describing
mitigation options which were once considered but dropped for some reason?

> 
> This patch introduces two new fields in msi_desc to track binding with a
> guest interrupt such that msi_msg_to_remap_entry() can get the binding and
> update IRTE accordingly. After that change, pi_update_irte() can utilize
> msi_msg_to_remap_entry() to update IRTE to posted format.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v10:
> - Newly added.
> 
>  xen/arch/x86/msi.c                     |   1 +
>  xen/drivers/passthrough/vtd/intremap.c | 148 +++++++--------------------------
>  xen/include/asm-x86/msi.h              |   2 +
>  3 files changed, 34 insertions(+), 117 deletions(-)
> 
> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c index
> a868007..3374cd4 100644
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -578,6 +578,7 @@ static struct msi_desc *alloc_msi_entry(unsigned int
> nr)
>          entry[nr].dev = NULL;
>          entry[nr].irq = -1;
>          entry[nr].remap_index = -1;
> +        entry[nr].pi_desc = NULL;
>      }
> 
>      return entry;
> diff --git a/xen/drivers/passthrough/vtd/intremap.c
> b/xen/drivers/passthrough/vtd/intremap.c
> index bfd468b..6202ece 100644
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -552,11 +552,12 @@ static int msi_msg_to_remap_entry(
>      struct msi_desc *msi_desc, struct msi_msg *msg)  {
>      struct iremap_entry *iremap_entry = NULL, *iremap_entries;
> -    struct iremap_entry new_ire;
> +    struct iremap_entry new_ire = {{0}};
>      struct msi_msg_remap_entry *remap_rte;
>      unsigned int index, i, nr = 1;
>      unsigned long flags;
>      struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
> +    const struct pi_desc *pi_desc = msi_desc->pi_desc;
> 
>      if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
>          nr = msi_desc->msi.nvec;
> @@ -595,33 +596,35 @@ static int msi_msg_to_remap_entry(
>      GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
>                       iremap_entries, iremap_entry);
> 
> -    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
> -
> -    /* Set interrupt remapping table entry */
> -    new_ire.remap.fpd = 0;
> -    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT)
> & 0x1;
> -    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
> -    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT)
> & 0x1;
> -    /* Hardware require RH = 1 for LPR delivery mode */
> -    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
> -    new_ire.remap.avail = 0;
> -    new_ire.remap.res_1 = 0;
> -    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
> -                            MSI_DATA_VECTOR_MASK;
> -    new_ire.remap.res_2 = 0;
> -    if ( x2apic_enabled )
> -        new_ire.remap.dst = msg->dest32;
> +    if ( !pi_desc )
> +    {
> +        new_ire.remap.dm = msg->address_lo >>
> MSI_ADDR_DESTMODE_SHIFT;
> +        new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
> +        new_ire.remap.dlm = msg->data >>
> MSI_DATA_DELIVERY_MODE_SHIFT;
> +        /* Hardware require RH = 1 for LPR delivery mode */
> +        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
> +        new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
> +                                MSI_DATA_VECTOR_MASK;
> +        if ( x2apic_enabled )
> +            new_ire.remap.dst = msg->dest32;
> +        else
> +            new_ire.remap.dst =
> +                MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK) << 8;
> +        new_ire.remap.p = 1;

Old code also touches fpd, res_1/2/3/4, which are abandoned
above. Can you elaborate?

> +    }
>      else
> -        new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
> -                             & 0xff) << 8;
> +    {
> +        new_ire.post.im = 1;
> +        new_ire.post.vector = msi_desc->gvec;
> +        new_ire.post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
> +        new_ire.post.pda_h = virt_to_maddr(pi_desc) >> 32;
> +        new_ire.post.p = 1;
> +    }
> 
>      if ( pdev )
>          set_msi_source_id(pdev, &new_ire);
>      else
>          set_hpet_source_id(msi_desc->hpet_id, &new_ire);
> -    new_ire.remap.res_3 = 0;
> -    new_ire.remap.res_4 = 0;
> -    new_ire.remap.p = 1;    /* finally, set present bit */
> 
>      /* now construct new MSI/MSI-X rte entry */
>      remap_rte = (struct msi_msg_remap_entry *)msg; @@ -902,42 +905,6
> @@ void iommu_disable_x2apic_IR(void)
>          disable_qinval(drhd->iommu);
>  }
> 
> -static void setup_posted_irte(
> -    struct iremap_entry *new_ire, const struct iremap_entry *old_ire,
> -    const struct pi_desc *pi_desc, const uint8_t gvec)
> -{
> -    memset(new_ire, 0, sizeof(*new_ire));
> -
> -    /*
> -     * 'im' filed decides whether the irte is in posted format (with value 1)
> -     * or remapped format (with value 0), if the old irte is in remapped format,
> -     * we copy things from remapped part in 'struct iremap_entry', otherwise,
> -     * we copy from posted part.
> -     */
> -    if ( !old_ire->remap.im )
> -    {
> -        new_ire->post.p = old_ire->remap.p;
> -        new_ire->post.fpd = old_ire->remap.fpd;
> -        new_ire->post.sid = old_ire->remap.sid;
> -        new_ire->post.sq = old_ire->remap.sq;
> -        new_ire->post.svt = old_ire->remap.svt;
> -    }
> -    else
> -    {
> -        new_ire->post.p = old_ire->post.p;
> -        new_ire->post.fpd = old_ire->post.fpd;
> -        new_ire->post.sid = old_ire->post.sid;
> -        new_ire->post.sq = old_ire->post.sq;
> -        new_ire->post.svt = old_ire->post.svt;
> -        new_ire->post.urg = old_ire->post.urg;
> -    }
> -
> -    new_ire->post.im = 1;
> -    new_ire->post.vector = gvec;
> -    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
> -    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
> -}
> -
>  /*
>   * This function is used to update the IRTE for posted-interrupt
>   * when guest changes MSI/MSI-X information.
> @@ -946,17 +913,9 @@ int pi_update_irte(const struct vcpu *v, const struct
> pirq *pirq,
>      const uint8_t gvec)
>  {
>      struct irq_desc *desc;
> -    const struct msi_desc *msi_desc;
> -    int remap_index;
> -    int rc = 0;
> -    const struct pci_dev *pci_dev;
> -    const struct acpi_drhd_unit *drhd;
> -    struct iommu *iommu;
> -    struct ir_ctrl *ir_ctrl;
> -    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> -    struct iremap_entry new_ire, old_ire;
> +    struct msi_desc *msi_desc;
>      const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> -    __uint128_t ret;
> +    int rc;
> 
>      desc = pirq_spin_lock_irq_desc(pirq, NULL);
>      if ( !desc )
> @@ -968,59 +927,14 @@ int pi_update_irte(const struct vcpu *v, const
> struct pirq *pirq,
>          rc = -ENODEV;
>          goto unlock_out;
>      }
> -
> -    pci_dev = msi_desc->dev;
> -    if ( !pci_dev )
> -    {
> -        rc = -ENODEV;
> -        goto unlock_out;
> -    }
> -
> -    remap_index = msi_desc->remap_index;
> +    msi_desc->pi_desc = pi_desc;
> +    msi_desc->gvec = gvec;
> 
>      spin_unlock_irq(&desc->lock);
> 
>      ASSERT(pcidevs_locked());
> -
> -    /*
> -     * FIXME: For performance reasons we should store the 'iommu' pointer
> in
> -     * 'struct msi_desc' in some other place, so we don't need to waste
> -     * time searching it here.
> -     */
> -    drhd = acpi_find_matched_drhd_unit(pci_dev);
> -    if ( !drhd )
> -        return -ENODEV;
> -
> -    iommu = drhd->iommu;
> -    ir_ctrl = iommu_ir_ctrl(iommu);
> -    if ( !ir_ctrl )
> -        return -ENODEV;
> -
> -    spin_lock_irq(&ir_ctrl->iremap_lock);
> -
> -    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index,
> iremap_entries, p);
> -
> -    old_ire = *p;
> -
> -    /* Setup/Update interrupt remapping table entry. */
> -    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
> -    ret = cmpxchg16b(p, &old_ire, &new_ire);
> -
> -    /*
> -     * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
> -     * and the hardware cannot update the IRTE behind us, so the return
> value
> -     * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
> -     */
> -    ASSERT(ret == old_ire.val);
> -
> -    iommu_flush_cache_entry(p, sizeof(*p));
> -    iommu_flush_iec_index(iommu, 0, remap_index);
> -
> -    unmap_vtd_domain_page(iremap_entries);
> -
> -    spin_unlock_irq(&ir_ctrl->iremap_lock);
> -
> -    return 0;
> +    rc = iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
> +    return rc;

just "return iommu_update_ire..."

> 
>   unlock_out:
>      spin_unlock_irq(&desc->lock);
> diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h index
> 9c02945..3286692 100644
> --- a/xen/include/asm-x86/msi.h
> +++ b/xen/include/asm-x86/msi.h
> @@ -118,6 +118,8 @@ struct msi_desc {
>  	struct msi_msg msg;		/* Last set MSI message */
> 
>  	int remap_index;		/* index in interrupt remapping table
> */
> +	const void *pi_desc;		/* PDA, indicates msi is delivered via
> VT-d PI */

what's PDA?

> +	uint8_t gvec;			/* guest vector. valid when pi_desc
> isn't NULL */
>  };
> 
>  /*
> --
> 1.8.3.1



* Re: [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely
  2017-03-16 10:29       ` Jan Beulich
  2017-03-17  1:52         ` Chao Gao
@ 2017-03-22  6:26         ` Tian, Kevin
  2017-03-24  8:44         ` Tian, Kevin
  2 siblings, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2017-03-22  6:26 UTC (permalink / raw)
  To: Jan Beulich, Gao, Chao
  Cc: George Dunlap, Andrew Cooper, Dario Faggioli, xen-devel, Nakajima, Jun

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, March 16, 2017 6:29 PM
> 
> >>> On 15.03.17 at 23:39, <chao.gao@intel.com> wrote:
> > On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote:
> >>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
> >>> +        /*
> >>> +         * The following method to update IRTE is safe on condition that
> >>> +         * only the high qword or the low qword is to be updated.
> >>> +         * If entire IRTE is to be updated, callers should make sure the
> >>> +         * IRTE is not in use.
> >>> +         */
> >>> +        entry->lo = new_ire->lo;
> >>> +        entry->hi = new_ire->hi;
> >>
> >>How is this any better than structure assignment? Furthermore
> >
> > Indeed, not better. when using structure assignment, the assembly code is
> > 48 8b 06                mov    (%rsi),%rax
> > 48 8b 56 08             mov    0x8(%rsi),%rdx
> > 48 89 07                mov    %rax,(%rdi)
> > 48 89 57 08             mov    %rdx,0x8(%rdi)
> > Using the code above, the assembly code is
> > 48 8b 06                mov    (%rsi),%rax
> > 48 89 07                mov    %rax,(%rdi)
> > 48 8b 46 08             mov    0x8(%rsi),%rax
> > 48 89 47 08             mov    %rax,0x8(%rdi)
> >
> > I thought structure assignment might utilize memcpy given the
> > structure's size, so I made this change. I will change it back.
> > That said, this patch is trying to make the change safer
> > when cmpxchg16() is supported.
> 
> Perhaps you've really meant to use write_atomic()?
> 
> >>the comment here partially contradicts the commit message. I
> >
> > Yes.
> >
> >>guess callers need to be given a way (another function parameter?) to
> >>signal the function whether the unsafe variant is okay to use.
> >
> > This means we need to add the new parameter to iommu ops, because only
> > IOAPIC/MSI know the entry they want to change is masked. Is there any
> > other reasonable and correct solution?
> 
> Well, users you convert in this patch must be okay to use the non-atomic
> variant. The PI user(s) know(s) that cmpxchg16b is available, so could always
> request the safe variant. No need for a new parameter higher up in the call
> trees afaics.
> 
> > How about...
> >
> >>You should then add a suitable BUG_ON() in the else path here.
> >
> > just add a BUG_ON() like this
> > BUG_ON( (entry->hi != new_ire->hi) && (entry->lo != new_ire->lo) );
> > Adding this BUG_ON() means update_irte() can't be used for
> > initializing or clearing IRTE which are not bugs.
> 
> Yes, that's an option too, albeit then I'd suggest (pseudo code)
> 
>     if ( high_up_to_date )
>         update_low;
>     else if ( low_up_to_date )
>         update_high;
>     else
>        BUG();
> 
> But you'll want to have the okay from Kevin as the maintainer for something
> like this.
> 
> Jan


We'll have an internal discussion about the exact atomicity requirements of
the update here and then get back with a proposal.

Thanks
Kevin


* Re: [PATCH v10 6/6] passthrough/io: Fall back to remapping interrupt when we can't use VT-d PI
  2017-03-15  5:11 ` [PATCH v10 6/6] passthrough/io: Fall back to remapping interrupt when we can't use " Chao Gao
  2017-03-17 10:48   ` Jan Beulich
@ 2017-03-22  6:34   ` Tian, Kevin
  1 sibling, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2017-03-22  6:34 UTC (permalink / raw)
  To: Gao, Chao, xen-devel; +Cc: Jan Beulich

> From: Gao, Chao
> Sent: Wednesday, March 15, 2017 1:11 PM
> 
> The current logic of using VT-d pi is when guest configurates the pirq's
> destination vcpu to a single vcpu, the according IRTE is updated to posted
> format. If the destination of the pirq is multiple vcpus, we just use interrupt
> remapping. 

From the patch I think you actually mean the reverse - the current
logic will stay in posted format.

> Obviously, we should fall back to remapping interrupt when
> guest wrongly configurate destination of pirq or makes it have multi-
> destination vcpus.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>

with above comment fixed:

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt
  2017-03-22  0:18     ` Chao Gao
@ 2017-03-22  8:32       ` Tian, Kevin
  0 siblings, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2017-03-22  8:32 UTC (permalink / raw)
  To: Gao, Chao
  Cc: Feng Wu, Nakajima, Jun, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jan Beulich

> From: Gao, Chao
> Sent: Wednesday, March 22, 2017 8:19 AM
> 
> On Wed, Mar 22, 2017 at 01:59:19PM +0800, Tian, Kevin wrote:
> >> From: Gao, Chao
> >> Sent: Wednesday, March 15, 2017 1:11 PM
> >>
> >> Currently, msi_msg_to_remap_entry is buggy when the live IRTE is in
> >> posted format. Straightforwardly, we can let caller specify which
> >> format of IRTE they want update to. But the problem is current
> >> callers are not aware of the binding with guest interrupt. Making all
> >> callers be aware of the binding with guest interrupt will cause a far
> >> more complicated change. Also some callings happen in interrupt
> >> context where they can't acquire d->event_lock to read struct
> hvm_pirq_dpci.
> >
> >Above text is unclear to me. Are you trying to explain why current code
> >is buggy (which I don't get the point) or simply for mitigation options
> >which were once considered but dropped for some reasons?
> >
> 
> the latter. I will divide them into two sections and add more description
> about why current code is buggy.
> 
> >>
> >> diff --git a/xen/drivers/passthrough/vtd/intremap.c
> >> b/xen/drivers/passthrough/vtd/intremap.c
> >> index bfd468b..6202ece 100644
> >> --- a/xen/drivers/passthrough/vtd/intremap.c
> >> +++ b/xen/drivers/passthrough/vtd/intremap.c
> >> @@ -552,11 +552,12 @@ static int msi_msg_to_remap_entry(
> >>      struct msi_desc *msi_desc, struct msi_msg *msg)  {
> >>      struct iremap_entry *iremap_entry = NULL, *iremap_entries;
> >> -    struct iremap_entry new_ire;
> >> +    struct iremap_entry new_ire = {{0}};
> >>      struct msi_msg_remap_entry *remap_rte;
> >>      unsigned int index, i, nr = 1;
> >>      unsigned long flags;
> >>      struct ir_ctrl *ir_ctrl = iommu_ir_ctrl(iommu);
> >> +    const struct pi_desc *pi_desc = msi_desc->pi_desc;
> >>
> >>      if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
> >>          nr = msi_desc->msi.nvec;
> >> @@ -595,33 +596,35 @@ static int msi_msg_to_remap_entry(
> >>      GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
> >>                       iremap_entries, iremap_entry);
> >>
> >> -    memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
> >> -
> >> -    /* Set interrupt remapping table entry */
> >> -    new_ire.remap.fpd = 0;
> >> -    new_ire.remap.dm = (msg->address_lo >>
> MSI_ADDR_DESTMODE_SHIFT)
> >> & 0x1;
> >> -    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
> >> -    new_ire.remap.dlm = (msg->data >>
> MSI_DATA_DELIVERY_MODE_SHIFT)
> >> & 0x1;
> >> -    /* Hardware require RH = 1 for LPR delivery mode */
> >> -    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
> >> -    new_ire.remap.avail = 0;
> >> -    new_ire.remap.res_1 = 0;
> >> -    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
> >> -                            MSI_DATA_VECTOR_MASK;
> >> -    new_ire.remap.res_2 = 0;
> >> -    if ( x2apic_enabled )
> >> -        new_ire.remap.dst = msg->dest32;
> >> +    if ( !pi_desc )
> >> +    {
> >> +        new_ire.remap.dm = msg->address_lo >>
> >> MSI_ADDR_DESTMODE_SHIFT;
> >> +        new_ire.remap.tm = msg->data >> MSI_DATA_TRIGGER_SHIFT;
> >> +        new_ire.remap.dlm = msg->data >>
> >> MSI_DATA_DELIVERY_MODE_SHIFT;
> >> +        /* Hardware require RH = 1 for LPR delivery mode */
> >> +        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
> >> +        new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT)
> &
> >> +                                MSI_DATA_VECTOR_MASK;
> >> +        if ( x2apic_enabled )
> >> +            new_ire.remap.dst = msg->dest32;
> >> +        else
> >> +            new_ire.remap.dst =
> >> +                MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK) << 8;
> >> +        new_ire.remap.p = 1;
> >
> >Old code also touches fpd, res_1/2/3/4, which are abandoned above. Can
> >you elaborate?
> >
> 
> We have initialized new_ire to zero so I remove all the lines that assign 0 to
> fields.

I overlooked this part

> 
> >> diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
> >> index
> >> 9c02945..3286692 100644
> >> --- a/xen/include/asm-x86/msi.h
> >> +++ b/xen/include/asm-x86/msi.h
> >> @@ -118,6 +118,8 @@ struct msi_desc {
> >>  	struct msi_msg msg;		/* Last set MSI message */
> >>
> >>  	int remap_index;		/* index in interrupt remapping table
> >> */
> >> +	const void *pi_desc;		/* PDA, indicates msi is delivered via
> >> VT-d PI */
> >
> >what's PDA?
> 
> Posted Descriptor Address which is recorded in posted format IRTE.
> 
> How about just use "Indicates msi is delivered via VT-d PI" because of the line
> width limitation?
> 

Just "Pointer to posted descriptor".


* Re: [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely
  2017-03-16 10:29       ` Jan Beulich
  2017-03-17  1:52         ` Chao Gao
  2017-03-22  6:26         ` Tian, Kevin
@ 2017-03-24  8:44         ` Tian, Kevin
  2 siblings, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2017-03-24  8:44 UTC (permalink / raw)
  To: Jan Beulich, Gao, Chao
  Cc: George Dunlap, Andrew Cooper, Dario Faggioli, xen-devel, Nakajima, Jun

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, March 16, 2017 6:29 PM
> 
> >>> On 15.03.17 at 23:39, <chao.gao@intel.com> wrote:
> > On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote:
> >>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote:
> >>> +        /*
> >>> +         * The following method to update IRTE is safe on condition that
> >>> +         * only the high qword or the low qword is to be updated.
> >>> +         * If entire IRTE is to be updated, callers should make sure the
> >>> +         * IRTE is not in use.
> >>> +         */
> >>> +        entry->lo = new_ire->lo;
> >>> +        entry->hi = new_ire->hi;
> >>
> >>How is this any better than structure assignment? Furthermore
> >
> > Indeed, not better. when using structure assignment, the assembly code is
> > 48 8b 06                mov    (%rsi),%rax
> > 48 8b 56 08             mov    0x8(%rsi),%rdx
> > 48 89 07                mov    %rax,(%rdi)
> > 48 89 57 08             mov    %rdx,0x8(%rdi)
> > Using the code above, the assembly code is
> > 48 8b 06                mov    (%rsi),%rax
> > 48 89 07                mov    %rax,(%rdi)
> > 48 8b 46 08             mov    0x8(%rsi),%rax
> > 48 89 47 08             mov    %rax,0x8(%rdi)
> >
> > I thought structure assignment might utilize memcpy given the
> > structure's size, so I made this change. I will change it back.
> > That said, this patch is trying to make the change safer
> > when cmpxchg16() is supported.
> 
> Perhaps you've really meant to use write_atomic()?
> 
> >>the comment here partially contradicts the commit message. I
> >
> > Yes.
> >
> >>guess callers need to be given a way (another function parameter?) to
> >>signal the function whether the unsafe variant is okay to use.
> >
> > This means we need to add the new parameter to iommu ops, because only
> > IOAPIC/MSI know the entry they want to change is masked. Is there any
> > other reasonable and correct solution?
> 
> Well, users you convert in this patch must be okay to use the non-atomic
> variant. The PI user(s) know(s) that cmpxchg16b is available, so could always
> request the safe variant. No need for a new parameter higher up in the call
> trees afaics.
> 
> > How about...
> >
> >>You should then add a suitable BUG_ON() in the else path here.
> >
> > just add a BUG_ON() like this
> > BUG_ON( (entry->hi != new_ire->hi) && (entry->lo != new_ire->lo) );
> > Adding this BUG_ON() means update_irte() can't be used for
> > initializing or clearing IRTE which are not bugs.
> 
> Yes, that's an option too, albeit then I'd suggest (pseudo code)
> 
>     if ( high_up_to_date )
>         update_low;
>     else if ( low_up_to_date )
>         update_high;
>     else
>        BUG();
> 
> But you'll want to have the okay from Kevin as the maintainer for something
> like this.
> 

Just had a discussion with Chao. Here is a summary:

- PI implies availability of cmpxchg16b, so the PI path can always request
atomicity

- The problem is really about remap mode. cmpxchg16b might not be
available on such platforms:

	* for init/clear operations, atomicity is not a requirement. We
can have the high/low parts updated separately;

	* for reprogram operations (e.g. irq balancing), atomicity
is a requirement. But in reality only the lower 64 bits should be touched.

Then:

- for remapping IRTE
	* do update_irte(high) and update_irte(low) separately
when IRTE is newly allocated
	* Then only do update_irte(low) in case an IRTE is found

- for posting IRTE
	* always do update_irte(high&low)

Then we can use the pseudo code you suggested.

Thanks
Kevin
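
A hypothetical standalone sketch of what that plan could look like,
combining the earlier pseudo code with an explicitly requested atomic path.
GCC's 16-byte __atomic builtins (which need a cmpxchg16b-capable target,
e.g. -mcx16) stand in for Xen's cmpxchg16b(), and assert() for BUG():

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    union iremap_entry {
        __uint128_t val;
        struct { uint64_t lo, hi; };
    };

    void update_irte(union iremap_entry *entry,
                     const union iremap_entry *new_ire,
                     bool atomic)        /* posted-IRTE callers pass true */
    {
        if ( atomic )
        {
            /* Whole-entry update via a 16-byte compare-and-exchange loop. */
            __uint128_t old = entry->val;

            while ( !__atomic_compare_exchange_n(&entry->val, &old,
                                                 new_ire->val, false,
                                                 __ATOMIC_SEQ_CST,
                                                 __ATOMIC_SEQ_CST) )
                ;
        }
        else if ( entry->hi == new_ire->hi )
            entry->lo = new_ire->lo;     /* remap: only the low qword changes */
        else if ( entry->lo == new_ire->lo )
            entry->hi = new_ire->hi;     /* remap: only the high qword changes */
        else
            assert(0);                   /* both qwords differ without atomicity */
    }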


end of thread

Thread overview: 30+ messages
2017-03-15  5:11 [PATCH v10 0/6] VMX: Properly handle pi descriptor and per-cpu blocking list Chao Gao
2017-03-15  5:11 ` [PATCH v10 1/6] VT-d: Introduce new fields in msi_desc to track binding with guest interrupt Chao Gao
2017-03-15 16:41   ` Jan Beulich
2017-03-15 21:21     ` Chao Gao
2017-03-16 10:24       ` Jan Beulich
2017-03-22  5:59   ` Tian, Kevin
2017-03-22  0:18     ` Chao Gao
2017-03-22  8:32       ` Tian, Kevin
2017-03-15  5:11 ` [PATCH v10 2/6] VT-d: Some cleanups Chao Gao
2017-03-15  5:11 ` [PATCH v10 3/6] VMX: Fixup PI descriptor when cpu is offline Chao Gao
2017-03-15  5:11 ` [PATCH v10 4/6] VT-d: introduce update_irte to update irte safely Chao Gao
2017-03-15 16:48   ` Jan Beulich
2017-03-15 22:39     ` Chao Gao
2017-03-16 10:29       ` Jan Beulich
2017-03-17  1:52         ` Chao Gao
2017-03-17  9:08           ` Jan Beulich
2017-03-22  6:26         ` Tian, Kevin
2017-03-24  8:44         ` Tian, Kevin
2017-03-15  5:11 ` [PATCH v10 5/6] passthrough/io: don't migrate pirq when it is delivered through VT-d PI Chao Gao
2017-03-17 10:43   ` Jan Beulich
2017-03-20  1:59     ` Chao Gao
2017-03-20  9:18       ` Jan Beulich
2017-03-20  2:38         ` Chao Gao
2017-03-20 10:26           ` Jan Beulich
2017-03-20  5:22             ` Chao Gao
2017-03-20 12:50               ` Jan Beulich
2017-03-20  6:11                 ` Chao Gao
2017-03-15  5:11 ` [PATCH v10 6/6] passthrough/io: Fall back to remapping interrupt when we can't use " Chao Gao
2017-03-17 10:48   ` Jan Beulich
2017-03-22  6:34   ` Tian, Kevin
