* [RFC v1 00/15] Add VT-d Posted-Interrupts support
@ 2015-03-25 12:31 Feng Wu
  2015-03-25 12:31 ` [RFC v1 01/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
                   ` (16 more replies)
  0 siblings, 17 replies; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

You can find the VT-d Posted-Interrupts spec at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

This patch set follows this design:
http://article.gmane.org/gmane.comp.emulators.xen.devel/236476

Feng Wu (15):
  iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  vt-d: VT-d Posted-Interrupts feature detection
  vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  vmx: Add some helper functions for Posted-Interrupts
  vmx: Initialize VT-d Posted-Interrupts Descriptor
  vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  vt-d: Add API to update IRTE when VT-d PI is used
  Update IRTE according to guest interrupt config changes
  Add a new per-vCPU tasklet to wakeup the blocked vCPU
  vmx: Define two per-cpu variants
  vmx: Add a global wake-up vector for VT-d Posted-Interrupts
  vmx: Properly handle notification event when vCPU is running
  Update Posted-Interrupts Descriptor during vCPU scheduling
  Suppress posting interrupts when 'SN' is set
  Add a command line parameter for VT-d posted-interrupts

 xen/arch/x86/hvm/vmx/vmcs.c            |   6 ++
 xen/arch/x86/hvm/vmx/vmx.c             | 185 ++++++++++++++++++++++++++++++++-
 xen/common/domain.c                    |  11 ++
 xen/common/schedule.c                  |   3 +
 xen/drivers/passthrough/io.c           |  77 +++++++++++++-
 xen/drivers/passthrough/iommu.c        |  17 ++-
 xen/drivers/passthrough/vtd/intremap.c |  83 +++++++++++++++
 xen/drivers/passthrough/vtd/iommu.c    |  15 ++-
 xen/drivers/passthrough/vtd/iommu.h    |  23 ++++
 xen/include/asm-x86/hvm/hvm.h          |   1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h     |  16 ++-
 xen/include/asm-x86/hvm/vmx/vmx.h      |  49 ++++++++-
 xen/include/asm-x86/iommu.h            |   2 +
 xen/include/xen/iommu.h                |   2 +-
 xen/include/xen/sched.h                |   5 +
 15 files changed, 485 insertions(+), 10 deletions(-)

-- 
2.1.0


* [RFC v1 01/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 17:39   ` Andrew Cooper
  2015-03-25 12:31 ` [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

This patch adds the variable 'iommu_intpost' to the generic IOMMU code
to control whether VT-d posted interrupts are enabled.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
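Note for readers: the rule this patch enforces is that posting can never
be on while remapping is off. Below is a minimal standalone sketch of that
rule, not the Xen code itself. struct iommu_opts and apply_iommu_token()
are hypothetical names, and the "intpost" branch is an assumption (this
patch only documents "no-intpost" in the comment and does not parse it).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the hypervisor's global flags. */
struct iommu_opts {
    bool intremap;
    bool intpost;
};

/*
 * Apply one boolean sub-option such as "intremap" or "no-intpost".
 * Returns false if the token is not recognised.
 */
static inline bool apply_iommu_token(struct iommu_opts *o, const char *tok)
{
    bool val = true;

    assert(o != NULL && tok != NULL);

    /* A "no-" prefix inverts the option. */
    if (strncmp(tok, "no-", 3) == 0) {
        val = false;
        tok += 3;
    }

    if (strcmp(tok, "intremap") == 0) {
        o->intremap = val;
        /* Posting is built on remapping, so it cannot outlive it. */
        if (!o->intremap)
            o->intpost = false;
    } else if (strcmp(tok, "intpost") == 0) {
        /* Assumption: posting can only be requested while remapping is on. */
        o->intpost = val && o->intremap;
    } else {
        return false;
    }

    return true;
}
```

With both flags initially set, applying "no-intremap" clears both, and a
later "intpost" leaves posting off because remapping is still disabled.
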
 xen/drivers/passthrough/iommu.c | 11 ++++++++++-
 xen/include/xen/iommu.h         |  2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 92ea26f..302e3e4 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -39,6 +39,7 @@ static void iommu_dump_p2m_table(unsigned char key);
  *   no-snoop                   Disable VT-d Snoop Control
  *   no-qinval                  Disable VT-d Queued Invalidation
  *   no-intremap                Disable VT-d Interrupt Remapping
+ *   no-intpost                 Disable VT-d Interrupt posting
  */
 custom_param("iommu", parse_iommu_param);
 bool_t __initdata iommu_enable = 1;
@@ -51,6 +52,7 @@ bool_t __read_mostly iommu_passthrough;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
+bool_t __read_mostly iommu_intpost = 0;
 bool_t __read_mostly iommu_hap_pt_share = 1;
 bool_t __read_mostly iommu_debug;
 bool_t __read_mostly amd_iommu_perdev_intremap = 1;
@@ -94,7 +96,11 @@ static void __init parse_iommu_param(char *s)
         else if ( !strcmp(s, "qinval") )
             iommu_qinval = val;
         else if ( !strcmp(s, "intremap") )
+        {
             iommu_intremap = val;
+            if ( iommu_intremap == 0 )
+                iommu_intpost = 0;
+        }
         else if ( !strcmp(s, "debug") )
         {
             iommu_debug = val;
@@ -272,7 +278,10 @@ int __init iommu_setup(void)
         iommu_enabled = (rc == 0);
     }
     if ( !iommu_enabled )
+    {
         iommu_intremap = 0;
+        iommu_intpost = 0;
+    }
 
     if ( (force_iommu && !iommu_enabled) ||
          (force_intremap && !iommu_intremap) )
@@ -341,7 +350,7 @@ void iommu_crash_shutdown(void)
     const struct iommu_ops *ops = iommu_get_ops();
     if ( iommu_enabled )
         ops->crash_shutdown();
-    iommu_enabled = iommu_intremap = 0;
+    iommu_enabled = iommu_intremap = iommu_intpost = 0;
 }
 
 bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index bf4aff0..91063bb 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -31,7 +31,7 @@
 extern bool_t iommu_enable, iommu_enabled;
 extern bool_t force_iommu, iommu_verbose;
 extern bool_t iommu_workaround_bios_bug, iommu_passthrough;
-extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
+extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
 extern bool_t iommu_hap_pt_share;
 extern bool_t iommu_debug;
 extern bool_t amd_iommu_perdev_intremap;
-- 
2.1.0


* [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature detection
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
  2015-03-25 12:31 ` [RFC v1 01/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 18:12   ` Andrew Cooper
  2015-03-25 12:31 ` [RFC v1 03/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

This patch adds feature detection logic for VT-d posted interrupts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
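Note for readers: the detection logic boils down to "keep posting enabled
only if remapping survived and every VT-d engine advertises the capability
bit". The sketch below shows that reduction in isolation. The bit position
(59) is taken from the cap_intr_post() macro in this patch; the function
names and the array-of-capabilities interface are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 59 of the Capability Register, as in the cap_intr_post() macro. */
static inline bool cap_has_intr_post(uint64_t cap)
{
    return ((cap >> 59) & 1) != 0;
}

/*
 * Posting stays enabled only if it was requested, remapping is still on,
 * and every engine's capability value advertises it.
 */
static inline bool intpost_usable(bool requested, bool intremap,
                                  const uint64_t *caps, size_t n)
{
    assert(caps != NULL || n == 0);

    if (!requested || !intremap)
        return false;

    for (size_t i = 0; i < n; i++)
        if (!cap_has_intr_post(caps[i]))
            return false;

    return true;
}
```

A single engine without the bit is enough to disable the feature
system-wide, which matches the "supported by all VT-d engines" comment in
intel_vtd_setup().
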
 xen/drivers/passthrough/vtd/iommu.c | 15 +++++++++++++--
 xen/drivers/passthrough/vtd/iommu.h |  1 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 891b9e3..86798a3 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2030,6 +2030,7 @@ static int init_vtd_hw(void)
             if ( ioapic_to_iommu(IO_APIC_ID(apic)) == NULL )
             {
                 iommu_intremap = 0;
+                iommu_intpost = 0;
                 dprintk(XENLOG_ERR VTDPREFIX,
                     "ioapic_to_iommu: ioapic %#x (id: %#x) is NULL! "
                     "Will not try to enable Interrupt Remapping.\n",
@@ -2046,6 +2047,7 @@ static int init_vtd_hw(void)
             if ( enable_intremap(iommu, 0) != 0 )
             {
                 iommu_intremap = 0;
+                iommu_intpost = 0;
                 dprintk(XENLOG_WARNING VTDPREFIX,
                         "Interrupt Remapping not enabled\n");
 
@@ -2119,8 +2121,8 @@ int __init intel_vtd_setup(void)
     }
 
     /* We enable the following features only if they are supported by all VT-d
-     * engines: Snoop Control, DMA passthrough, Queued Invalidation and
-     * Interrupt Remapping.
+     * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
+     * Remapping and Posted Interrupts.
      */
     for_each_drhd_unit ( drhd )
     {
@@ -2146,7 +2148,13 @@ int __init intel_vtd_setup(void)
             iommu_qinval = 0;
 
         if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
+        {
             iommu_intremap = 0;
+            iommu_intpost = 0;
+        }
+
+        if ( iommu_intpost && !cap_intr_post(iommu->cap) )
+            iommu_intpost = 0;
 
         if ( !vtd_ept_page_compatible(iommu) )
             iommu_hap_pt_share = 0;
@@ -2164,6 +2172,7 @@ int __init intel_vtd_setup(void)
     if ( !iommu_qinval && iommu_intremap )
     {
         iommu_intremap = 0;
+        iommu_intpost = 0;
         dprintk(XENLOG_WARNING VTDPREFIX, "Interrupt Remapping disabled "
             "since Queued Invalidation isn't supported or enabled.\n");
     }
@@ -2173,6 +2182,7 @@ int __init intel_vtd_setup(void)
     P(iommu_passthrough, "Dom0 DMA Passthrough");
     P(iommu_qinval, "Queued Invalidation");
     P(iommu_intremap, "Interrupt Remapping");
+    P(iommu_intpost, "Posted Interrupt");
     P(iommu_hap_pt_share, "Shared EPT tables");
 #undef P
 
@@ -2192,6 +2202,7 @@ int __init intel_vtd_setup(void)
     iommu_passthrough = 0;
     iommu_qinval = 0;
     iommu_intremap = 0;
+    iommu_intpost = 0;
     return ret;
 }
 
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index d6e6520..42047e0 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -69,6 +69,7 @@
 /*
  * Decoding Capability Register
  */
+#define cap_intr_post(c)       (((c) >> 59) & 1)
 #define cap_read_drain(c)      (((c) >> 55) & 1)
 #define cap_write_drain(c)     (((c) >> 54) & 1)
 #define cap_max_amask_val(c)   (((c) >> 48) & 0x3f)
-- 
2.1.0


* [RFC v1 03/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
  2015-03-25 12:31 ` [RFC v1 01/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
  2015-03-25 12:31 ` [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 18:37   ` Andrew Cooper
  2015-03-25 12:31 ` [RFC v1 04/15] vmx: Add some helper functions for Posted-Interrupts Feng Wu
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

Extend struct pi_desc according to the VT-d Posted-Interrupts spec.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
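Note for readers: C bitfield layout is implementation-defined, so it can
help to see the same control-word layout written with explicit shifts.
The sketch below mirrors the field order of the bitfield in this patch
(on, sn, 13 reserved bits, ndm, nv, 8 reserved bits, ndst). The struct and
helper names are hypothetical, and the aligned attribute assumes GCC or
Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in the 64-bit control word, following the patch's bitfield. */
#define PI_ON_BIT      0   /* outstanding notification */
#define PI_SN_BIT      1   /* suppress notification */
#define PI_NDM_BIT     15  /* notification destination mode */
#define PI_NV_SHIFT    16  /* notification vector, 8 bits */
#define PI_NDST_SHIFT  32  /* notification destination, 32 bits */

struct pi_desc_sketch {
    uint64_t pir[4];       /* 256 posted-interrupt request bits */
    uint64_t control;
    uint32_t rsvd[6];
} __attribute__((aligned(64)));

/* 32 bytes of PIR + 8 bytes of control + 24 reserved bytes. */
static_assert(sizeof(struct pi_desc_sketch) == 64,
              "descriptor must be exactly 64 bytes");

/* Build a control word with ON, SN and NDM clear. */
static inline uint64_t pi_make_control(uint8_t nv, uint32_t ndst)
{
    return ((uint64_t)nv << PI_NV_SHIFT) | ((uint64_t)ndst << PI_NDST_SHIFT);
}

static inline uint8_t pi_control_nv(uint64_t c)
{
    return (uint8_t)((c >> PI_NV_SHIFT) & 0xff);
}

static inline uint32_t pi_control_ndst(uint64_t c)
{
    return (uint32_t)(c >> PI_NDST_SHIFT);
}
```

The size check explains why rsvd shrinks from seven to six 32-bit words in
the patch: control grows from 32 to 64 bits while the descriptor stays at
64 bytes.
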
 xen/include/asm-x86/hvm/vmx/vmcs.h | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6fce6aa..9631461 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -76,8 +76,20 @@ struct vmx_domain {
 
 struct pi_desc {
     DECLARE_BITMAP(pir, NR_VECTORS);
-    u32 control;
-    u32 rsvd[7];
+    union {
+        struct
+        {
+        u64 on     : 1,
+            sn     : 1,
+            rsvd_1 : 13,
+            ndm    : 1,
+            nv     : 8,
+            rsvd_2 : 8,
+            ndst   : 32;
+        };
+        u64 control;
+    };
+    u32 rsvd[6];
 } __attribute__ ((aligned (64)));
 
 #define ept_get_wl(ept)   ((ept)->ept_wl)
-- 
2.1.0


* [RFC v1 04/15] vmx: Add some helper functions for Posted-Interrupts
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (2 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 03/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 18:44   ` Andrew Cooper
  2015-03-25 12:31 ` [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
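Note for readers: the helpers use atomic bit operations because the
descriptor can be updated by hardware while software touches it. The
sketch below shows the same test/set/clear pattern with C11 atomics
instead of Xen's set_bit()/test_bit() family. It is an illustration under
that substitution, and the ctl_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define POSTED_INTR_ON  0
#define POSTED_INTR_SN  1

/* Atomically set a bit and report whether it was already set. */
static inline bool ctl_test_and_set(_Atomic uint64_t *ctl, unsigned int bit)
{
    assert(bit < 64);
    uint64_t mask = UINT64_C(1) << bit;
    return (atomic_fetch_or(ctl, mask) & mask) != 0;
}

/* Atomically clear a bit and report whether it was set before. */
static inline bool ctl_test_and_clear(_Atomic uint64_t *ctl, unsigned int bit)
{
    assert(bit < 64);
    uint64_t mask = UINT64_C(1) << bit;
    return (atomic_fetch_and(ctl, ~mask) & mask) != 0;
}

/* Read a bit without modifying the word. */
static inline bool ctl_test(_Atomic uint64_t *ctl, unsigned int bit)
{
    assert(bit < 64);
    return (atomic_load(ctl) & (UINT64_C(1) << bit)) != 0;
}
```

Setting SN this way leaves ON untouched, which is the property the
scheduling patches later in the series depend on.
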
 xen/include/asm-x86/hvm/vmx/vmx.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 91c5e18..ecc5e17 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -100,6 +100,7 @@ void vmx_update_cpu_exec_control(struct vcpu *v);
 void vmx_update_secondary_exec_control(struct vcpu *v);
 
 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
 static inline int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 {
     return test_and_set_bit(vector, pi_desc->pir);
@@ -120,6 +121,26 @@ static inline int pi_test_and_clear_on(struct pi_desc *pi_desc)
     return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
 }
 
+static inline int pi_test_on(struct pi_desc *pi_desc)
+{
+    return test_bit(POSTED_INTR_ON, &pi_desc->control);
+}
+
+static inline void pi_set_sn(struct pi_desc *pi_desc)
+{
+    set_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
+static inline int pi_test_sn(struct pi_desc *pi_desc)
+{
+    return test_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
+static inline void pi_clear_sn(struct pi_desc *pi_desc)
+{
+    clear_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
 static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 {
     return xchg(&pi_desc->pir[group], 0);
-- 
2.1.0


* [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (3 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 04/15] vmx: Add some helper functions for Posted-Interrupts Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 18:53   ` Andrew Cooper
  2015-03-26 19:29   ` Konrad Rzeszutek Wilk
  2015-03-25 12:31 ` [RFC v1 06/15] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
                   ` (11 subsequent siblings)
  16 siblings, 2 replies; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

This patch initializes the VT-d Posted-interrupt Descriptor.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
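Note for readers: the only non-obvious part of pi_desc_init() is how the
notification destination is encoded. In x2APIC mode the 32-bit APIC ID is
stored as is; in xAPIC mode the 8-bit APIC ID is placed in bits 15:8. A
small sketch of that encoding, with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode the NDST field from a physical APIC ID, as pi_desc_init() does. */
static inline uint32_t pi_encode_ndst(uint32_t apic_id, bool x2apic)
{
    if (x2apic)
        return apic_id;

    /* xAPIC IDs are 8 bits wide and live in bits 15:8 of NDST. */
    assert(apic_id <= 0xff);
    return (apic_id << 8) & 0xFF00;
}
```

For APIC ID 3 this yields 0x300 in xAPIC mode and 3 in x2APIC mode.
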
 xen/arch/x86/hvm/vmx/vmcs.c       |  3 +++
 xen/include/asm-x86/hvm/vmx/vmx.h | 21 ++++++++++++++++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index d614638..942f4b7 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1004,6 +1004,9 @@ static int construct_vmcs(struct vcpu *v)
 
     if ( cpu_has_vmx_posted_intr_processing )
     {
+        if ( iommu_intpost == 1 )
+            pi_desc_init(v);
+
         __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
     }
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index ecc5e17..3cd75eb 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -28,6 +28,9 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/trace.h>
 #include <asm/hvm/vmx/vmcs.h>
+#include <asm/apic.h>
+
+extern uint8_t posted_intr_vector;
 
 typedef union {
     struct {
@@ -146,6 +149,23 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
     return xchg(&pi_desc->pir[group], 0);
 }
 
+static inline void pi_desc_init(struct vcpu *v)
+{
+    uint32_t dest;
+
+    pi_clear_sn(&v->arch.hvm_vmx.pi_desc);
+    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
+
+    /* Physical mode for Notification Event */
+    v->arch.hvm_vmx.pi_desc.ndm = 0;
+    dest = cpu_physical_id(v->processor);
+
+    if ( x2apic_enabled )
+        v->arch.hvm_vmx.pi_desc.ndst = dest;
+    else
+        v->arch.hvm_vmx.pi_desc.ndst = (dest << 8) & 0xFF00;
+}
+
 /*
  * Exit Reasons
  */
@@ -265,7 +285,6 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 #define MODRM_EAX_ECX   ".byte 0xc1\n" /* EAX, ECX */
 
 extern u64 vmx_ept_vpid_cap;
-extern uint8_t posted_intr_vector;
 
 #define cpu_has_vmx_ept_exec_only_supported        \
     (vmx_ept_vpid_cap & VMX_EPT_EXEC_ONLY_SUPPORTED)
-- 
2.1.0


* [RFC v1 06/15] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (4 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 19:00   ` Andrew Cooper
  2015-03-25 12:31 ` [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

Extend struct iremap_entry according to the VT-d Posted-Interrupts spec.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 42047e0..cd61e12 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -303,6 +303,18 @@ struct iremap_entry {
             res_2   : 8,
             dst     : 32;
     }lo;
+    struct {
+        u64 p       : 1,
+            fpd     : 1,
+            res_1   : 6,
+            avail   : 4,
+            res_2   : 2,
+            urg     : 1,
+            im      : 1,
+            vector  : 8,
+            res_3   : 14,
+            pda_l   : 26;
+    }lo_intpost;
   };
   union {
     u64 hi_val;
@@ -312,6 +324,13 @@ struct iremap_entry {
             svt     : 2,
             res_1   : 44;
     }hi;
+    struct {
+        u64 sid     : 16,
+            sq      : 2,
+            svt     : 2,
+            res_1   : 12,
+            pda_h   : 32;
+    }hi_intpost;
   };
 };
 
-- 
2.1.0


* [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (5 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 06/15] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 19:17   ` Andrew Cooper
                     ` (2 more replies)
  2015-03-25 12:31 ` [RFC v1 08/15] Update IRTE according to guest interrupt config changes Feng Wu
                   ` (9 subsequent siblings)
  16 siblings, 3 replies; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

This patch adds an API which is used to update the IRTE
for posted-interrupt when the guest changes MSI/MSI-X information.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
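Note for readers: the posted-format IRTE stores the machine address of the
Posted-Interrupts Descriptor in two pieces, pda_l (address bits 31:6) and
pda_h (address bits 63:32). The low six bits are dropped because the
descriptor is 64-byte aligned. The sketch below shows the split and its
inverse; PDA_LOW_BIT matches the patch, while struct pda_parts and the
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PDA_LOW_BIT   26   /* width of pda_l */

struct pda_parts {
    uint32_t lo;           /* address bits 31:6 */
    uint32_t hi;           /* address bits 63:32 */
};

/* Split a 64-byte-aligned machine address into the two IRTE fields. */
static inline struct pda_parts pda_split(uint64_t maddr)
{
    struct pda_parts p;

    assert((maddr & 0x3f) == 0);   /* descriptor must be 64-byte aligned */

    p.lo = (uint32_t)((maddr >> (32 - PDA_LOW_BIT)) &
                      ((UINT64_C(1) << PDA_LOW_BIT) - 1));
    p.hi = (uint32_t)(maddr >> 32);
    return p;
}

/* Rebuild the address from the two fields. */
static inline uint64_t pda_join(struct pda_parts p)
{
    return ((uint64_t)p.hi << 32) | ((uint64_t)p.lo << (32 - PDA_LOW_BIT));
}
```

Splitting and rejoining any 64-byte-aligned address returns the original
value, which is a quick way to sanity-check the shift and mask expressions
used in pi_update_irte().
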
 xen/drivers/passthrough/vtd/intremap.c | 83 ++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/vtd/iommu.h    |  3 ++
 xen/include/asm-x86/iommu.h            |  2 +
 3 files changed, 88 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 0333686..f44e74d 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -898,3 +898,86 @@ void iommu_disable_x2apic_IR(void)
     for_each_drhd_unit ( drhd )
         disable_qinval(drhd->iommu);
 }
+
+/*
+ * This function is used to update the IRTE for posted-interrupt
+ * when the guest changes MSI/MSI-X information.
+ */
+int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec)
+{
+    struct irq_desc *desc;
+    struct msi_desc *msi_desc;
+    int remap_index, rc = -1;
+    struct pci_dev *pci_dev;
+    struct acpi_drhd_unit *drhd;
+    struct iommu *iommu;
+    struct ir_ctrl *ir_ctrl;
+    struct iremap_entry *iremap_entries = NULL, *p = NULL;
+    struct iremap_entry new_ire;
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    unsigned long flags;
+
+    desc = pirq_spin_lock_irq_desc(pirq, NULL);
+    if ( !desc )
+        return -1;
+
+    msi_desc = desc->msi_desc;
+    if ( !msi_desc )
+        goto unlock_out;
+
+    remap_index = msi_desc->remap_index;
+    pci_dev = msi_desc->dev;
+    if ( !pci_dev )
+        goto unlock_out;
+
+    drhd = acpi_find_matched_drhd_unit(pci_dev);
+    if (!drhd)
+    {
+        dprintk(XENLOG_INFO VTDPREFIX, "failed to get drhd!\n");
+        goto unlock_out;
+    }
+
+    iommu = drhd->iommu;
+    ir_ctrl = iommu_ir_ctrl(iommu);
+    if ( !ir_ctrl )
+    {
+        dprintk(XENLOG_INFO VTDPREFIX, "failed to get ir_ctrl!\n");
+        goto unlock_out;
+    }
+
+    spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
+
+    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
+
+    memcpy(&new_ire, p, sizeof(struct iremap_entry));
+
+    /* Setup/Update interrupt remapping table entry */
+    new_ire.lo_intpost.urg = 0;
+    new_ire.lo_intpost.vector = gvec;
+    new_ire.lo_intpost.pda_l = (((u64)virt_to_maddr(pi_desc)) >>
+                                (32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
+    new_ire.hi_intpost.pda_h = (((u64)virt_to_maddr(pi_desc)) >>  32) &
+                                ~(-1UL << PDA_HIGH_BIT);
+
+    new_ire.lo_intpost.res_1 = 0;
+    new_ire.lo_intpost.res_2 = 0;
+    new_ire.lo_intpost.res_3 = 0;
+    new_ire.hi_intpost.res_1 = 0;
+
+    new_ire.lo_intpost.im = 1;
+
+    memcpy(p, &new_ire, sizeof(struct iremap_entry));
+    iommu_flush_cache_entry(p, sizeof(struct iremap_entry));
+    iommu_flush_iec_index(iommu, 0, remap_index);
+
+    if ( iremap_entries )
+        unmap_vtd_domain_page(iremap_entries);
+
+    spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
+
+    rc = 0;
+ unlock_out:
+    spin_unlock_irq(&desc->lock);
+
+    return rc;
+}
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index cd61e12..ffa72c8 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -334,6 +334,9 @@ struct iremap_entry {
   };
 };
 
+#define PDA_LOW_BIT    26
+#define PDA_HIGH_BIT   32
+
 /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
 #define IREMAP_PAGE_ORDER  8
 
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index e7a65da..d233621 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -32,6 +32,8 @@ int iommu_supports_eim(void);
 int iommu_enable_x2apic_IR(void);
 void iommu_disable_x2apic_IR(void);
 
+int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec);
+
 #endif /* !__ARCH_X86_IOMMU_H__ */
 /*
  * Local variables:
-- 
2.1.0


* [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (6 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 19:46   ` Konrad Rzeszutek Wilk
                     ` (2 more replies)
  2015-03-25 12:31 ` [RFC v1 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU Feng Wu
                   ` (8 subsequent siblings)
  16 siblings, 3 replies; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

When the guest changes its interrupt configuration (e.g. the vector)
for direct-assigned devices, we need to update the associated IRTE
with the new guest vector, so that external interrupts from the assigned
devices can be injected into guests without a VM-Exit.

For lowest-priority interrupts, we use a vector-hashing mechanism to find
the destination vCPU. This follows the hardware behavior, since modern
Intel CPUs use vector hashing to handle lowest-priority interrupts.

Multicast/broadcast interrupts cannot be handled via interrupt posting,
so they continue to use interrupt remapping.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
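Note for readers: the destination selection can be summarised as follows.
Collect the vCPUs that match the destination; for lowest-priority delivery
pick entry (vector mod count); for other modes accept only a single match.
The sketch below works on plain vCPU IDs rather than struct vcpu pointers,
and pi_pick_dest() is a hypothetical name, so treat it as an illustration
of the selection rule only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * matches[] holds the IDs of the vCPUs that matched the destination,
 * packed from index 0. Returns true and stores the chosen ID in *out
 * if the interrupt can be posted, false otherwise.
 */
static inline bool pi_pick_dest(const unsigned int *matches, size_t n,
                                bool lowest_prio, uint8_t gvec,
                                unsigned int *out)
{
    assert(out != NULL && (matches != NULL || n == 0));

    if (n == 0)
        return false;                  /* nothing matched */

    if (lowest_prio) {
        *out = matches[gvec % n];      /* vector hashing */
        return true;
    }

    if (n == 1) {
        *out = matches[0];             /* single fixed destination */
        return true;
    }

    return false;                      /* multicast/broadcast: not postable */
}
```

The candidates must occupy slots 0 to n-1 for the modulo to be valid,
which is why the order of "store" and "increment" matters when filling
the array.
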
 xen/drivers/passthrough/io.c | 77 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 76 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index ae050df..1d9a132 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -26,6 +26,7 @@
 #include <asm/hvm/iommu.h>
 #include <asm/hvm/support.h>
 #include <xen/hvm/irq.h>
+#include <asm/io_apic.h>
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
@@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
     xfree(dpci);
 }
 
+/*
+ * Here we handle the following cases:
+ * - For lowest-priority interrupts, we find the destination vCPU from the
+ *   guest vector using the vector-hashing mechanism and return true. This
+ *   follows the hardware behavior, since modern Intel CPUs use vector hashing
+ *   to handle lowest-priority interrupts.
+ * - Otherwise, for single-destination interrupts, it is straightforward to
+ *   find the destination vCPU and return true.
+ * - Multicast/broadcast interrupts cannot be handled via interrupt posting,
+ *   so return false.
+ */
+static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
+                                uint8_t dest_mode, uint8_t deliver_mode,
+                                uint32_t gvec, struct vcpu **dest_vcpu)
+{
+    struct vcpu *v, **dest_vcpu_array;
+    unsigned int dest_vcpu_num = 0;
+    int ret;
+
+    if ( deliver_mode == dest_LowestPrio )
+        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);
+
+    for_each_vcpu ( d, v )
+    {
+        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
+                                dest_id, dest_mode) )
+            continue;
+
+        if ( deliver_mode == dest_LowestPrio )
+            dest_vcpu_array[dest_vcpu_num] = v;
+        else
+            *dest_vcpu = v;
+
+        dest_vcpu_num++;
+    }
+
+    if ( deliver_mode == dest_LowestPrio )
+    {
+        if (  dest_vcpu_num != 0 )
+        {
+            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
+            ret = 1;
+        }
+        else
+            ret = 0;
+
+        xfree(dest_vcpu_array);
+        return ret;
+    }
+    else if (  dest_vcpu_num == 1 )
+        return 1;
+    else
+        return 0;
+}
+
 int pt_irq_create_bind(
     struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
 {
@@ -257,7 +313,7 @@ int pt_irq_create_bind(
     {
     case PT_IRQ_TYPE_MSI:
     {
-        uint8_t dest, dest_mode;
+        uint8_t dest, dest_mode, deliver_mode;
         int dest_vcpu_id;
 
         if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
@@ -330,11 +386,30 @@ int pt_irq_create_bind(
         /* Calculate dest_vcpu_id for MSI-type pirq migration. */
         dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
         dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
+        deliver_mode = (pirq_dpci->gmsi.gflags >> GFLAGS_SHIFT_DELIV_MODE) &
+                        VMSI_DELIV_MASK;
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
         spin_unlock(&d->event_lock);
         if ( dest_vcpu_id >= 0 )
             hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
+
+        /* Use interrupt posting if it is supported */
+        if ( iommu_intpost )
+        {
+            struct vcpu *vcpu = NULL;
+
+            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
+                                    pirq_dpci->gmsi.gvec, &vcpu) )
+                break;
+
+            if ( pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec ) != 0 )
+            {
+                dprintk(XENLOG_G_INFO, "failed to update PI IRTE\n");
+                return -EBUSY;
+            }
+        }
+
         break;
     }
 
-- 
2.1.0


* [RFC v1 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (7 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 08/15] Update IRTE according to guest interrupt config changes Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-04-02  5:53   ` Tian, Kevin
  2015-03-25 12:31 ` [RFC v1 10/15] vmx: Define two per-cpu variants Feng Wu
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

This patch adds a new per-vCPU tasklet to wake up the blocked
vCPU. It can be used in cases where vcpu_unblock() cannot be called
directly.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/common/domain.c     | 11 +++++++++++
 xen/include/xen/sched.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index aa78fd7..fe89658 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -109,6 +109,13 @@ static void vcpu_check_shutdown(struct vcpu *v)
     spin_unlock(&d->shutdown_lock);
 }
 
+static void vcpu_wakeup_tasklet_handler(unsigned long arg)
+{
+    struct vcpu *v = (void *)arg;
+
+    vcpu_unblock(v);
+}
+
 struct vcpu *alloc_vcpu(
     struct domain *d, unsigned int vcpu_id, unsigned int cpu_id)
 {
@@ -126,6 +133,9 @@ struct vcpu *alloc_vcpu(
 
     tasklet_init(&v->continue_hypercall_tasklet, NULL, 0);
 
+    tasklet_init(&v->vcpu_wakeup_tasklet, vcpu_wakeup_tasklet_handler,
+                 (unsigned long)v);
+
     if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
          !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
          !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
@@ -784,6 +794,7 @@ static void complete_domain_destroy(struct rcu_head *head)
         if ( (v = d->vcpu[i]) == NULL )
             continue;
         tasklet_kill(&v->continue_hypercall_tasklet);
+        tasklet_kill(&v->vcpu_wakeup_tasklet);
         vcpu_destroy(v);
         sched_destroy_vcpu(v);
         destroy_waitqueue_vcpu(v);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ccd7ed8..c874dd4 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -239,6 +239,9 @@ struct vcpu
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet   continue_hypercall_tasklet;
 
+    /* Tasklet for wakeup_blocked_vcpu(). */
+    struct tasklet   vcpu_wakeup_tasklet;
+
     /* Multicall information. */
     struct mc_state  mc_state;
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [RFC v1 10/15] vmx: Define two per-cpu variants
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (8 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 19:59   ` Andrew Cooper
  2015-04-02  5:54   ` Tian, Kevin
  2015-03-25 12:31 ` [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts Feng Wu
                   ` (6 subsequent siblings)
  16 siblings, 2 replies; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

This patch defines two per-cpu variables:

blocked_vcpu_on_cpu:
A list storing the vCPUs which were blocked on this pCPU.

blocked_vcpu_on_cpu_lock:
The spinlock to protect blocked_vcpu_on_cpu.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c       | 3 +++
 xen/arch/x86/hvm/vmx/vmx.c        | 7 +++++++
 xen/include/asm-x86/hvm/vmx/vmx.h | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 942f4b7..1345e69 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -585,6 +585,9 @@ int vmx_cpu_up(void)
     if ( cpu_has_vmx_vpid )
         vpid_sync_all();
 
+    INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
+    spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e1c55ce..ff5544d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -81,6 +81,13 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
 
+/*
+ * We maintain a per-CPU linked list of vCPUs, so that the PI wakeup handler
+ * can find which vCPU should be woken up.
+ */
+DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
 uint8_t __read_mostly posted_intr_vector;
 
 static int vmx_domain_initialise(struct domain *d)
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 3cd75eb..e643c3c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -30,6 +30,9 @@
 #include <asm/hvm/vmx/vmcs.h>
 #include <asm/apic.h>
 
+DECLARE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+DECLARE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
 extern uint8_t posted_intr_vector;
 
 typedef union {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (9 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 10/15] vmx: Define two per-cpu variants Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 20:07   ` Andrew Cooper
  2015-04-02  6:00   ` Tian, Kevin
  2015-03-25 12:31 ` [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running Feng Wu
                   ` (5 subsequent siblings)
  16 siblings, 2 replies; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

This patch adds a global vector which is used to wake up
the blocked vCPU when an interrupt is being posted to it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c        | 33 +++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/hvm.h     |  1 +
 xen/include/asm-x86/hvm/vmx/vmx.h |  3 +++
 xen/include/xen/sched.h           |  2 ++
 4 files changed, 39 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ff5544d..b2b4c26 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -89,6 +89,7 @@ DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
 DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
 
 uint8_t __read_mostly posted_intr_vector;
+uint8_t __read_mostly pi_wakeup_vector;
 
 static int vmx_domain_initialise(struct domain *d)
 {
@@ -131,6 +132,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
     if ( v->vcpu_id == 0 )
         v->arch.user_regs.eax = 1;
 
+    INIT_LIST_HEAD(&v->blocked_vcpu_list);
+
     return 0;
 }
 
@@ -1834,11 +1837,19 @@ const struct hvm_function_table * __init start_vmx(void)
     }
 
     if ( cpu_has_vmx_posted_intr_processing )
+    {
         alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+
+        if ( iommu_intpost )
+            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
+        else
+            vmx_function_table.pi_desc_update = NULL;
+    }
     else
     {
         vmx_function_table.deliver_posted_intr = NULL;
         vmx_function_table.sync_pir_to_irr = NULL;
+        vmx_function_table.pi_desc_update = NULL;
     }
 
     if ( cpu_has_vmx_ept
@@ -3255,6 +3266,28 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
 }
 
 /*
+ * Handle VT-d posted-interrupt when VCPU is blocked.
+ */
+void pi_wakeup_interrupt(struct cpu_user_regs *regs)
+{
+    struct vcpu *v;
+    int cpu = smp_processor_id();
+
+    spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+    list_for_each_entry(v, &per_cpu(blocked_vcpu_on_cpu, cpu),
+                    blocked_vcpu_list) {
+        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+        if ( pi_test_on(pi_desc) == 1 )
+            tasklet_schedule(&v->vcpu_wakeup_tasklet);
+    }
+    spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 0dc909b..a11a256 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -195,6 +195,7 @@ struct hvm_function_table {
     void (*deliver_posted_intr)(struct vcpu *v, u8 vector);
     void (*sync_pir_to_irr)(struct vcpu *v);
     void (*handle_eoi)(u8 vector);
+    void (*pi_desc_update)(struct vcpu *v, int new_state);
 
     /*Walk nested p2m  */
     int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa,
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index e643c3c..f4296ab 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -34,6 +34,7 @@ DECLARE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
 DECLARE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
 
 extern uint8_t posted_intr_vector;
+extern uint8_t pi_wakeup_vector;
 
 typedef union {
     struct {
@@ -574,6 +575,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
+void pi_wakeup_interrupt(struct cpu_user_regs *regs);
+
 /* EPT violation qualifications definitions */
 #define _EPT_READ_VIOLATION         0
 #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c874dd4..91f0912 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -148,6 +148,8 @@ struct vcpu
 
     struct vcpu     *next_in_list;
 
+    struct list_head blocked_vcpu_list;
+
     s_time_t         periodic_period;
     s_time_t         periodic_last_event;
     struct timer     periodic_timer;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (10 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-25 14:14   ` Zhang, Yang Z
  2015-03-26 19:57   ` Konrad Rzeszutek Wilk
  2015-03-25 12:31 ` [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling Feng Wu
                   ` (4 subsequent siblings)
  16 siblings, 2 replies; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

When a vCPU is running in Root mode and a notification event
has been injected to it, we need to set VCPU_KICK_SOFTIRQ for
the current cpu, so that the pending interrupt in PIRR will be
synced to vIRR before VM-Exit in time.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c        | 24 +++++++++++++++++++++++-
 xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b2b4c26..b30392c 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1838,7 +1838,7 @@ const struct hvm_function_table * __init start_vmx(void)
 
     if ( cpu_has_vmx_posted_intr_processing )
     {
-        alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+        alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
 
         if ( iommu_intpost )
             alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
@@ -3288,6 +3288,28 @@ void pi_wakeup_interrupt(struct cpu_user_regs *regs)
 }
 
 /*
+ * Handle VT-d posted-interrupt when VCPU is running.
+ */
+
+void pi_notification_interrupt(struct cpu_user_regs *regs)
+{
+    /*
+     * We get here because a vCPU is running in Root mode
+     * and a notification event has been injected to it.
+     *
+     * We need to set VCPU_KICK_SOFTIRQ for the current
+     * cpu, just like __vmx_deliver_posted_interrupt().
+     *
+     * So the pending interrupt in PIRR will be synced to
+     * vIRR before VM-Exit in time.
+     */
+    set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(smp_processor_id()));
+
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index f4296ab..e53275b 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -576,6 +576,7 @@ void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
 void pi_wakeup_interrupt(struct cpu_user_regs *regs);
+void pi_notification_interrupt(struct cpu_user_regs *regs);
 
 /* EPT violation qualifications definitions */
 #define _EPT_READ_VIOLATION         0
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (11 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 20:16   ` Andrew Cooper
  2015-04-02  6:24   ` Tian, Kevin
  2015-03-25 12:31 ` [RFC v1 14/15] Suppress posting interrupts when 'SN' is set Feng Wu
                   ` (3 subsequent siblings)
  16 siblings, 2 replies; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

The basic idea here is:
1. When vCPU's state is RUNSTATE_running,
        - Set 'NV' to 'Notification Vector'.
        - Clear 'SN' to accept PI.
        - Set 'NDST' to the right pCPU.
2. When vCPU's state is RUNSTATE_blocked,
        - Set 'NV' to 'Wake-up Vector', so we can wake up the
          related vCPU when a posted-interrupt happens for it.
        - Clear 'SN' to accept PI.
3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
        - Set 'SN' to suppress non-urgent interrupts.
          (Currently, we only support non-urgent interrupts)
        - Set 'NV' back to 'Notification Vector' if needed.
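
The three cases above can be sketched as one compare-and-swap loop over
the descriptor's 64-bit control word. This is only an illustration, not
the code in this patch: it uses C11 atomics instead of Xen's cmpxchg(),
and the bit positions, the RS_* names, struct pi_vectors and pi_update()
are all assumptions made for the example. The caller is assumed to pass
NDST already encoded for the APIC mode in use. The early return for the
blocked case mirrors the pi_test_on() check in the diff below.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 64-bit control word (illustration only). */
#define PI_ON          (1ULL << 0)   /* Outstanding Notification */
#define PI_SN          (1ULL << 1)   /* Suppress Notification */
#define PI_NV_SHIFT    16
#define PI_NV_MASK     (0xffULL << PI_NV_SHIFT)
#define PI_NDST_SHIFT  32
#define PI_NDST_MASK   (0xffffffffULL << PI_NDST_SHIFT)

/* Hypothetical stand-ins for the RUNSTATE_* values. */
enum runstate { RS_RUNNING, RS_RUNNABLE, RS_BLOCKED, RS_OFFLINE };

/* Hypothetical holder for the two vectors named in the list above. */
struct pi_vectors { uint8_t notification, wakeup; };

/*
 * Apply the per-state rules atomically. Returns false when the vCPU
 * should not block because ON is already set; the control word is left
 * unchanged in that case.
 */
static bool pi_update(_Atomic uint64_t *control, enum runstate new_state,
                      uint32_t ndst, const struct pi_vectors *vec)
{
    uint64_t old = atomic_load(control), new;

    do {
        new = old;

        switch ( new_state )
        {
        case RS_RUNNING:
            /* Case 1: NV = notification vector, SN clear, NDST = pCPU. */
            new &= ~(PI_SN | PI_NV_MASK | PI_NDST_MASK);
            new |= (uint64_t)vec->notification << PI_NV_SHIFT;
            new |= (uint64_t)ndst << PI_NDST_SHIFT;
            break;

        case RS_BLOCKED:
            /* An interrupt is already posted, so do not block. */
            if ( old & PI_ON )
                return false;
            /* Case 2: NV = wake-up vector, SN clear. */
            new &= ~(PI_SN | PI_NV_MASK);
            new |= (uint64_t)vec->wakeup << PI_NV_SHIFT;
            break;

        case RS_RUNNABLE:
        case RS_OFFLINE:
            /* Case 3: SN set, NV back to the notification vector. */
            new |= PI_SN;
            new &= ~PI_NV_MASK;
            new |= (uint64_t)vec->notification << PI_NV_SHIFT;
            break;
        }
        /* On failure 'old' is reloaded and the rules are re-applied. */
    } while ( !atomic_compare_exchange_weak(control, &old, new) );

    return true;
}
```

Doing the whole read-modify-write in one loop means the hardware never
observes a half-updated combination of 'NV', 'SN' and 'NDST'.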

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 108 +++++++++++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c      |   3 ++
 2 files changed, 111 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b30392c..6323bd6 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1710,6 +1710,113 @@ static void vmx_handle_eoi(u8 vector)
     __vmwrite(GUEST_INTR_STATUS, status);
 }
 
+static void vmx_pi_desc_update(struct vcpu *v, int new_state)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    struct pi_desc old, new;
+    int old_state = v->runstate.state;
+    unsigned long flags;
+
+    if ( !iommu_intpost )
+        return;
+
+    switch ( new_state )
+    {
+    case RUNSTATE_runnable:
+    case RUNSTATE_offline:
+        /*
+         * We don't need to send notification event to a non-running
+         * vcpu, the interrupt information will be delivered to it before
+         * VM-ENTRY when the vcpu is scheduled to run next time.
+         */
+        pi_set_sn(pi_desc);
+
+        /*
+         * If the state is changing from RUNSTATE_blocked,
+         * we should set the 'NV' field back to posted_intr_vector,
+         * so the Posted-Interrupts can be delivered to the vCPU
+         * by VT-d HW after it is scheduled to run.
+         */
+        if ( old_state == RUNSTATE_blocked )
+        {
+            do
+            {
+                old.control = new.control = pi_desc->control;
+                new.nv = posted_intr_vector;
+            }
+            while ( cmpxchg(&pi_desc->control, old.control, new.control)
+                    != old.control );
+
+           /*
+            * Delete the vCPU from the related wakeup queue
+            * if we are resuming from blocked state
+            */
+           spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+                             v->processor), flags);
+           list_del(&v->blocked_vcpu_list);
+           spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+                                  v->processor), flags);
+        }
+        break;
+
+    case RUNSTATE_blocked:
+        /*
+         * The vCPU is blocked on the wait queue.
+         * Store the blocked vCPU on the list of the
+         * vcpu->wakeup_cpu, which is the destination
+         * of the wake-up notification event.
+         */
+        spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+                          v->processor), flags);
+        list_add_tail(&v->blocked_vcpu_list,
+                      &per_cpu(blocked_vcpu_on_cpu, v->processor));
+        spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+                               v->processor), flags);
+
+        do
+        {
+            old.control = new.control = pi_desc->control;
+
+            /*
+             * We should not block the vCPU if
+             * an interrupt is posted for it.
+             */
+
+            if ( pi_test_on(&old) == 1 )
+            {
+                tasklet_schedule(&v->vcpu_wakeup_tasklet);
+                return;
+            }
+
+            pi_clear_sn(&new);
+            new.nv = pi_wakeup_vector;
+        }
+        while ( cmpxchg(&pi_desc->control, old.control, new.control)
+                != old.control );
+        break;
+
+    case RUNSTATE_running:
+        ASSERT( pi_test_sn(pi_desc) == 1 );
+
+        do
+        {
+            old.control = new.control = pi_desc->control;
+            if ( x2apic_enabled )
+                new.ndst = cpu_physical_id(v->processor);
+            else
+                new.ndst = (cpu_physical_id(v->processor) << 8) & 0xFF00;
+
+            pi_clear_sn(&new);
+        }
+        while ( cmpxchg(&pi_desc->control, old.control, new.control)
+                != old.control );
+        break;
+
+    default:
+        break;
+    }
+}
+
 void vmx_hypervisor_cpuid_leaf(uint32_t sub_idx,
                                uint32_t *eax, uint32_t *ebx,
                                uint32_t *ecx, uint32_t *edx)
@@ -1795,6 +1902,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .process_isr          = vmx_process_isr,
     .deliver_posted_intr  = vmx_deliver_posted_intr,
     .sync_pir_to_irr      = vmx_sync_pir_to_irr,
+    .pi_desc_update       = vmx_pi_desc_update,
     .handle_eoi           = vmx_handle_eoi,
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ef79847..acf3186 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -157,6 +157,9 @@ static inline void vcpu_runstate_change(
         v->runstate.state_entry_time = new_entry_time;
     }
 
+    if ( is_hvm_vcpu(v) && hvm_funcs.pi_desc_update )
+        hvm_funcs.pi_desc_update(v, new_state);
+
     v->runstate.state = new_state;
 }
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [RFC v1 14/15] Suppress posting interrupts when 'SN' is set
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (12 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 20:34   ` Andrew Cooper
  2015-03-25 12:31 ` [RFC v1 15/15] Add a command line parameter for VT-d posted-interrupts Feng Wu
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

Currently, we don't support urgent interrupts: all interrupts
are treated as non-urgent, so we must not send a
posted-interrupt notification when 'SN' is set.
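
As a rough sketch of the rule above: a notification is only worth
sending when 'ON' was previously clear and 'SN' is clear. The example
below is not the code in this patch. It uses C11 atomics, samples both
bits with a single fetch-or (the patch reads 'SN' and then does a
separate test-and-set of 'ON'), ignores the eoi_exitmap_changed path,
and the bit positions and the name pi_should_notify() are assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define PI_ON (1u << 0)   /* Outstanding Notification */
#define PI_SN (1u << 1)   /* Suppress Notification */

/*
 * Mark a notification as outstanding and report whether the caller
 * should actually send the notification event.
 */
static bool pi_should_notify(_Atomic uint32_t *flags)
{
    /* Set ON and capture the previous ON/SN values in one step. */
    uint32_t old = atomic_fetch_or(flags, PI_ON);

    /* Skip if a notification is already pending or is suppressed. */
    return !(old & PI_ON) && !(old & PI_SN);
}
```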

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 6323bd6..40c7b0e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1663,9 +1663,20 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
 
 static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
 {
+    int r, sn;
+
     if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
         return;
 
+    /*
+     * Urgent interrupts are not supported yet: every interrupt is
+     * treated as non-urgent, so we must not send a posted-interrupt
+     * notification while 'SN' is set.
+     */
+
+    sn = pi_test_sn(&v->arch.hvm_vmx.pi_desc);
+    r = pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc);
+
     if ( unlikely(v->arch.hvm_vmx.eoi_exitmap_changed) )
     {
         /*
@@ -1675,7 +1686,7 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
          */
         pi_set_on(&v->arch.hvm_vmx.pi_desc);
     }
-    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
+    else if ( !r && !sn )
     {
         __vmx_deliver_posted_interrupt(v);
         return;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [RFC v1 15/15] Add a command line parameter for VT-d posted-interrupts
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (13 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 14/15] Suppress posting interrupts when 'SN' is set Feng Wu
@ 2015-03-25 12:31 ` Feng Wu
  2015-03-26 18:50 ` [RFC v1 00/15] Add VT-d Posted-Interrupts support Konrad Rzeszutek Wilk
  2015-04-01 13:21 ` Wu, Feng
  16 siblings, 0 replies; 101+ messages in thread
From: Feng Wu @ 2015-03-25 12:31 UTC (permalink / raw)
  To: xen-devel; +Cc: yang.z.zhang, Feng Wu, kevin.tian, keir, JBeulich

Enable VT-d Posted-Interrupts and add a command line
parameter for it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 xen/drivers/passthrough/iommu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 302e3e4..1bda7e9 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -52,7 +52,7 @@ bool_t __read_mostly iommu_passthrough;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
-bool_t __read_mostly iommu_intpost = 0;
+bool_t __read_mostly iommu_intpost = 1;
 bool_t __read_mostly iommu_hap_pt_share = 1;
 bool_t __read_mostly iommu_debug;
 bool_t __read_mostly amd_iommu_perdev_intremap = 1;
@@ -101,6 +101,12 @@ static void __init parse_iommu_param(char *s)
             if ( iommu_intremap == 0 )
                 iommu_intpost = 0;
         }
+        else if ( !strcmp(s, "intpost") )
+        {
+            iommu_intpost = val;
+            if ( iommu_intremap == 0 )
+                iommu_intpost = 0;
+        }
         else if ( !strcmp(s, "debug") )
         {
             iommu_debug = val;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 101+ messages in thread

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-03-25 12:31 ` [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running Feng Wu
@ 2015-03-25 14:14   ` Zhang, Yang Z
  2015-03-27  4:40     ` Wu, Feng
  2015-03-26 19:57   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 101+ messages in thread
From: Zhang, Yang Z @ 2015-03-25 14:14 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Tian, Kevin, keir, JBeulich

Wu, Feng wrote on 2015-03-25:
> When a vCPU is running in Root mode and a notification event has been
> injected to it. we need to set VCPU_KICK_SOFTIRQ for the current cpu,
> so the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time.

Shouldn't the pending interrupt be synced unconditionally before next vmentry? What happens if we didn't set the softirq?

> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c        | 24 +++++++++++++++++++++++-
>  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
>  2 files changed, 24 insertions(+), 1 deletion(-)
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index b2b4c26..b30392c 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1838,7 +1838,7 @@ const struct hvm_function_table * __init start_vmx(void)
> 
>      if ( cpu_has_vmx_posted_intr_processing )
>      {
> -        alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
> +        alloc_direct_apic_vector(&posted_intr_vector,
> +                                 pi_notification_interrupt);
> 
>          if ( iommu_intpost )
>              alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> @@ -3288,6 +3288,28 @@ void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>  }
> 
>  /*
> + * Handle VT-d posted-interrupt when VCPU is running.
> + */
> +
> +void pi_notification_interrupt(struct cpu_user_regs *regs)
> +{
> +    /*
> +     * We get here because a vCPU is running in Root mode
> +     * and a notification event has been injected to it.
> +     *
> +     * we need to set VCPU_KICK_SOFTIRQ for the current
> +     * cpu, just like __vmx_deliver_posted_interrupt().
> +     *
> +     * So the pending interrupt in PIRR will be synced to
> +     * vIRR before VM-Exit in time.
> +     */
> +    set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(smp_processor_id()));
> +
> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +}
> +
> +/*
>   * Local variables:
>   * mode: C
>   * c-file-style: "BSD"
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index f4296ab..e53275b 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -576,6 +576,7 @@ void free_p2m_hap_data(struct p2m_domain *p2m);
>  void p2m_init_hap_data(struct p2m_domain *p2m);
> 
>  void pi_wakeup_interrupt(struct cpu_user_regs *regs);
> +void pi_notification_interrupt(struct cpu_user_regs *regs);
> 
>  /* EPT violation qualifications definitions */
>  #define _EPT_READ_VIOLATION         0


Best regards,
Yang

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 01/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  2015-03-25 12:31 ` [RFC v1 01/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
@ 2015-03-26 17:39   ` Andrew Cooper
  2015-03-27  4:46     ` Wu, Feng
  2015-03-27  9:52     ` Jan Beulich
  0 siblings, 2 replies; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 17:39 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
>
> This patch adds variable 'iommu_intpost' to control whether enable VT-d
> posted-interrupt or not in the generic IOMMU code.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/drivers/passthrough/iommu.c | 11 ++++++++++-
>   xen/include/xen/iommu.h         |  2 +-
>   2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 92ea26f..302e3e4 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -39,6 +39,7 @@ static void iommu_dump_p2m_table(unsigned char key);
>    *   no-snoop                   Disable VT-d Snoop Control
>    *   no-qinval                  Disable VT-d Queued Invalidation
>    *   no-intremap                Disable VT-d Interrupt Remapping
> + *   no-intpost                 Disable VT-d Interrupt posting
>    */
>   custom_param("iommu", parse_iommu_param);
>   bool_t __initdata iommu_enable = 1;
> @@ -51,6 +52,7 @@ bool_t __read_mostly iommu_passthrough;
>   bool_t __read_mostly iommu_snoop = 1;
>   bool_t __read_mostly iommu_qinval = 1;
>   bool_t __read_mostly iommu_intremap = 1;
> +bool_t __read_mostly iommu_intpost = 0;
>   bool_t __read_mostly iommu_hap_pt_share = 1;
>   bool_t __read_mostly iommu_debug;
>   bool_t __read_mostly amd_iommu_perdev_intremap = 1;
> @@ -94,7 +96,11 @@ static void __init parse_iommu_param(char *s)
>           else if ( !strcmp(s, "qinval") )
>               iommu_qinval = val;
>           else if ( !strcmp(s, "intremap") )
> +        {
>               iommu_intremap = val;
> +            if ( iommu_intremap == 0 )
> +                iommu_intpost = 0;
> +        }
>           else if ( !strcmp(s, "debug") )
>           {
>               iommu_debug = val;

At no point here do you add an strcmp(s, "intpost"), which means that 
you do not alter the allowable command line syntax.

intpost must be able to be controlled independently of intremap, so I 
suggest

else if ( !strcmp(s, "intpost") )
     iommu_intpost = val;

and after the while loop,

if ( !iommu_intremap )
     iommu_intpost = 0;

To ensure that intpost is never 1 if intremap is 0.
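
A self-contained sketch of that shape, for illustration only: the
parser below is a simplified, hypothetical stand-in and not Xen's
parse_iommu_param(). The names parse_iommu_opts(), intremap and intpost
are made up for the example, and it only understands a comma-separated
list with an optional "no-" prefix. The point it shows is that the
dependency check runs once after the loop, so the result does not
depend on the order in which the options were given.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-ins for the iommu_intremap / iommu_intpost flags. */
static bool intremap = true;
static bool intpost = false;

/* Hypothetical, simplified option parser (not Xen's implementation). */
static void parse_iommu_opts(const char *opts)
{
    char buf[128];
    char *s, *next;

    /* Start from the defaults on every call. */
    intremap = true;
    intpost = false;

    /* Work on a private, NUL-terminated copy that we can split up. */
    strncpy(buf, opts, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for ( s = buf; s != NULL; s = next )
    {
        bool val = true;

        /* Cut the current option off at the next comma, if any. */
        next = strchr(s, ',');
        if ( next != NULL )
            *next++ = '\0';

        /* A "no-" prefix inverts the option. */
        if ( !strncmp(s, "no-", 3) )
        {
            val = false;
            s += 3;
        }

        if ( !strcmp(s, "intremap") )
            intremap = val;
        else if ( !strcmp(s, "intpost") )
            intpost = val;
    }

    /* After the loop: intpost can never be 1 if intremap is 0. */
    if ( !intremap )
        intpost = false;
}
```

With this structure, parse_iommu_opts("intpost,no-intremap") and
parse_iommu_opts("no-intremap,intpost") both leave intpost at 0, while
parse_iommu_opts("intpost") alone leaves it at 1.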

Also, you must adjust the documentation in 
docs/misc/xen-command-line.markdown

~Andrew

> @@ -272,7 +278,10 @@ int __init iommu_setup(void)
>           iommu_enabled = (rc == 0);
>       }
>       if ( !iommu_enabled )
> +    {
>           iommu_intremap = 0;
> +        iommu_intpost = 0;
> +    }
>   
>       if ( (force_iommu && !iommu_enabled) ||
>            (force_intremap && !iommu_intremap) )
> @@ -341,7 +350,7 @@ void iommu_crash_shutdown(void)
>       const struct iommu_ops *ops = iommu_get_ops();
>       if ( iommu_enabled )
>           ops->crash_shutdown();
> -    iommu_enabled = iommu_intremap = 0;
> +    iommu_enabled = iommu_intremap = iommu_intpost = 0;
>   }
>   
>   bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index bf4aff0..91063bb 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -31,7 +31,7 @@
>   extern bool_t iommu_enable, iommu_enabled;
>   extern bool_t force_iommu, iommu_verbose;
>   extern bool_t iommu_workaround_bios_bug, iommu_passthrough;
> -extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
> +extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
>   extern bool_t iommu_hap_pt_share;
>   extern bool_t iommu_debug;
>   extern bool_t amd_iommu_perdev_intremap;

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature detection
  2015-03-25 12:31 ` [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
@ 2015-03-26 18:12   ` Andrew Cooper
  2015-03-27  1:21     ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 18:12 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
>
> This patch adds feature detection logic for VT-d posted-interrupt.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/drivers/passthrough/vtd/iommu.c | 15 +++++++++++++--
>   xen/drivers/passthrough/vtd/iommu.h |  1 +
>   2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index 891b9e3..86798a3 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2030,6 +2030,7 @@ static int init_vtd_hw(void)
>               if ( ioapic_to_iommu(IO_APIC_ID(apic)) == NULL )
>               {
>                   iommu_intremap = 0;
> +                iommu_intpost = 0;
>                   dprintk(XENLOG_ERR VTDPREFIX,
>                       "ioapic_to_iommu: ioapic %#x (id: %#x) is NULL! "
>                       "Will not try to enable Interrupt Remapping.\n",
> @@ -2046,6 +2047,7 @@ static int init_vtd_hw(void)
>               if ( enable_intremap(iommu, 0) != 0 )
>               {
>                   iommu_intremap = 0;
> +                iommu_intpost = 0;
>                   dprintk(XENLOG_WARNING VTDPREFIX,
>                           "Interrupt Remapping not enabled\n");
>   
> @@ -2119,8 +2121,8 @@ int __init intel_vtd_setup(void)
>       }
>   
>       /* We enable the following features only if they are supported by all VT-d
> -     * engines: Snoop Control, DMA passthrough, Queued Invalidation and
> -     * Interrupt Remapping.
> +     * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
> +     * Remapping, and Posted Interrupt
>        */
>       for_each_drhd_unit ( drhd )
>       {
> @@ -2146,7 +2148,13 @@ int __init intel_vtd_setup(void)
>               iommu_qinval = 0;
>   
>           if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
> +        {
>               iommu_intremap = 0;
> +            iommu_intpost = 0;
> +        }
> +
> +        if ( iommu_intpost && !cap_intr_post(iommu->cap))

Missing space inside the outer bracket.

I am wondering whether it might be easier, instead of having 
"iommu_intremap = 0; iommu_intpost = 0" all over the place, to instead 
insist that one must check "iommu_intremap && iommu_intpost".

Out of interest, which platforms have intpost capabilities?

~Andrew

> +            iommu_intpost = 0;
>   
>           if ( !vtd_ept_page_compatible(iommu) )
>               iommu_hap_pt_share = 0;
> @@ -2164,6 +2172,7 @@ int __init intel_vtd_setup(void)
>       if ( !iommu_qinval && iommu_intremap )
>       {
>           iommu_intremap = 0;
> +        iommu_intpost = 0;
>           dprintk(XENLOG_WARNING VTDPREFIX, "Interrupt Remapping disabled "
>               "since Queued Invalidation isn't supported or enabled.\n");
>       }
> @@ -2173,6 +2182,7 @@ int __init intel_vtd_setup(void)
>       P(iommu_passthrough, "Dom0 DMA Passthrough");
>       P(iommu_qinval, "Queued Invalidation");
>       P(iommu_intremap, "Interrupt Remapping");
> +    P(iommu_intpost, "Posted Interrupt");
>       P(iommu_hap_pt_share, "Shared EPT tables");
>   #undef P
>   
> @@ -2192,6 +2202,7 @@ int __init intel_vtd_setup(void)
>       iommu_passthrough = 0;
>       iommu_qinval = 0;
>       iommu_intremap = 0;
> +    iommu_intpost = 0;
>       return ret;
>   }
>   
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index d6e6520..42047e0 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -69,6 +69,7 @@
>   /*
>    * Decoding Capability Register
>    */
> +#define cap_intr_post(c)       (((c) >> 59) & 1)
>   #define cap_read_drain(c)      (((c) >> 55) & 1)
>   #define cap_write_drain(c)     (((c) >> 54) & 1)
>   #define cap_max_amask_val(c)   (((c) >> 48) & 0x3f)
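
The alternative raised earlier in this reply, testing both flags at the point of use instead of clearing iommu_intpost at every site that clears iommu_intremap, might look roughly like the sketch below. The flag names come from the patch; the helper name intpost_usable() and the standalone bool definitions are assumptions made so the fragment compiles on its own.

```c
#include <stdbool.h>

/* Stand-ins for the Xen globals; in the tree these are bool_t flags. */
static bool iommu_intremap = true;
static bool iommu_intpost = true;

/*
 * Hypothetical helper: posted interrupts are usable only while
 * interrupt remapping is also enabled, so callers test both flags
 * here instead of every failure path clearing iommu_intpost.
 */
static inline bool intpost_usable(void)
{
    return iommu_intremap && iommu_intpost;
}
```

The trade-off is that every consumer has to go through the helper rather than reading iommu_intpost directly.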


* Re: [RFC v1 03/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  2015-03-25 12:31 ` [RFC v1 03/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
@ 2015-03-26 18:37   ` Andrew Cooper
  2015-03-27  1:32     ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 18:37 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> Extend struct pi_desc according to VT-d Posted-Interrupts Spec.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/include/asm-x86/hvm/vmx/vmcs.h | 16 ++++++++++++++--
>   1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 6fce6aa..9631461 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -76,8 +76,20 @@ struct vmx_domain {
>   
>   struct pi_desc {
>       DECLARE_BITMAP(pir, NR_VECTORS);
> -    u32 control;
> -    u32 rsvd[7];
> +    union {
> +        struct
> +        {
> +        u64 on     : 1,

Could you put a comment on each line giving the non-abbreviated name for 
the fields, similar to ept_entry_t?

> +            sn     : 1,
> +            rsvd_1 : 13,
> +            ndm    : 1,

Which revision of the spec is this from?  The latest spec linked from 
your covering letter still doesn't identify this 'ndm' field.

~Andrew

> +            nv     : 8,
> +            rsvd_2 : 8,
> +            ndst   : 32;
> +        };
> +        u64 control;
> +    };
> +    u32 rsvd[6];
>   } __attribute__ ((aligned (64)));
>   
>   #define ept_get_wl(ept)   ((ept)->ept_wl)
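
As an illustration of the per-field comments requested above, a commented version of the layout could look like the following sketch. The expansions of ON, SN, NV and NDST follow the usual Intel naming; 'ndm' is kept exactly as it appears in the patch, since its origin is the open question in this review. The struct name pi_desc_sketch and the use of stdint types are assumptions so that the fragment builds outside the Xen tree, and the bit positions assume GCC or Clang on x86-64.

```c
#include <stdint.h>

#define NR_VECTORS 256

/*
 * Sketch of the extended descriptor with each field spelled out.
 * Bit-field layout is implementation-defined; this assumes GCC or
 * Clang on x86-64, where fields are allocated from bit 0 upwards.
 */
struct pi_desc_sketch {
    uint64_t pir[NR_VECTORS / 64];  /* Posted Interrupt Requests, one bit per vector */
    union {
        struct {
            uint64_t on     : 1,    /* bit 0: Outstanding Notification */
                     sn     : 1,    /* bit 1: Suppress Notification */
                     rsvd_1 : 13,   /* bits 2-14: reserved */
                     ndm    : 1,    /* bit 15: name as used in the patch */
                     nv     : 8,    /* bits 16-23: Notification Vector */
                     rsvd_2 : 8,    /* bits 24-31: reserved */
                     ndst   : 32;   /* bits 32-63: Notification Destination */
        };
        uint64_t control;           /* the same 64 bits as one word */
    };
    uint32_t rsvd[6];               /* pads the descriptor to 64 bytes */
} __attribute__((aligned(64)));

_Static_assert(sizeof(struct pi_desc_sketch) == 64,
               "descriptor must occupy exactly 64 bytes");
```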


* Re: [RFC v1 04/15] vmx: Add some helper functions for Posted-Interrupts
  2015-03-25 12:31 ` [RFC v1 04/15] vmx: Add some helper functions for Posted-Interrupts Feng Wu
@ 2015-03-26 18:44   ` Andrew Cooper
  0 siblings, 0 replies; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 18:44 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> This patch adds some helper functions to manipulate the
> Posted-Interrupts Descriptor.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/include/asm-x86/hvm/vmx/vmx.h | 21 +++++++++++++++++++++
>   1 file changed, 21 insertions(+)
>
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 91c5e18..ecc5e17 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -100,6 +100,7 @@ void vmx_update_cpu_exec_control(struct vcpu *v);
>   void vmx_update_secondary_exec_control(struct vcpu *v);
>   
>   #define POSTED_INTR_ON  0
> +#define POSTED_INTR_SN  1
>   static inline int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
>   {
>       return test_and_set_bit(vector, pi_desc->pir);
> @@ -120,6 +121,26 @@ static inline int pi_test_and_clear_on(struct pi_desc *pi_desc)
>       return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
>   }
>   
> +static inline int pi_test_on(struct pi_desc *pi_desc)

static inline bool_t pi_test_on(const struct pi_desc *pi_desc) please.  
Similar for test_sn below.

Is it important that these operations are properly atomic?  A cursory 
glance at the rest of your series suggests not.  If not, please use the 
non-locked variants of test/clear/set_bit.

Otherwise, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> +{
> +    return test_bit(POSTED_INTR_ON, &pi_desc->control);
> +}
> +
> +static inline void pi_set_sn(struct pi_desc *pi_desc)
> +{
> +    set_bit(POSTED_INTR_SN, &pi_desc->control);
> +}
> +
> +static inline int pi_test_sn(struct pi_desc *pi_desc)
> +{
> +    return test_bit(POSTED_INTR_SN, &pi_desc->control);
> +}
> +
> +static inline void pi_clear_sn(struct pi_desc *pi_desc)
> +{
> +    clear_bit(POSTED_INTR_SN, &pi_desc->control);
> +}
> +
>   static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
>   {
>       return xchg(&pi_desc->pir[group], 0);
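
The signature change requested above for the read-only helpers can be sketched as follows. This is a standalone approximation: the cut-down struct pi_ctl_sketch and the use of bool in place of Xen's bool_t are assumptions, and a plain shift stands in for test_bit(). It does not address whether the set/clear helpers need locked operations, which is the question left open in this reply.

```c
#include <stdbool.h>
#include <stdint.h>

#define POSTED_INTR_ON  0
#define POSTED_INTR_SN  1

/* Cut-down stand-in for struct pi_desc; only the control word is modelled. */
struct pi_ctl_sketch {
    uint64_t control;
};

/* Read-only test helpers: const-qualified pointer, boolean result. */
static inline bool pi_test_on(const struct pi_ctl_sketch *pi_desc)
{
    return (pi_desc->control >> POSTED_INTR_ON) & 1;
}

static inline bool pi_test_sn(const struct pi_ctl_sketch *pi_desc)
{
    return (pi_desc->control >> POSTED_INTR_SN) & 1;
}
```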


* Re: [RFC v1 00/15] Add VT-d Posted-Interrupts support
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (14 preceding siblings ...)
  2015-03-25 12:31 ` [RFC v1 15/15] Add a command line parameter for VT-d posted-interrupts Feng Wu
@ 2015-03-26 18:50 ` Konrad Rzeszutek Wilk
  2015-03-27  1:06   ` Wu, Feng
  2015-04-01 13:21 ` Wu, Feng
  16 siblings, 1 reply; 101+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-26 18:50 UTC (permalink / raw)
  To: Feng Wu; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich, xen-devel

On Wed, Mar 25, 2015 at 08:31:42PM +0800, Feng Wu wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> You can find the VT-d Posted-Interrtups Spec. in the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> This patch set follow the following design:
> http://article.gmane.org/gmane.comp.emulators.xen.devel/236476

Would it be possible to put the design in xen/docs/ directory?

Thanks!

> 
> Feng Wu (15):
>   iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
>   vt-d: VT-d Posted-Interrupts feature detection
>   vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>   vmx: Add some helper functions for Posted-Interrupts
>   vmx: Initialize VT-d Posted-Interrupts Descriptor
>   vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
>   vt-d: Add API to update IRTE when VT-d PI is used
>   Update IRTE according to guest interrupt config changes
>   Add a new per-vCPU tasklet to wakeup the blocked vCPU
>   vmx: Define two per-cpu variants
>   vmx: Add a global wake-up vector for VT-d Posted-Interrupts
>   vmx: Properly handle notification event when vCPU is running
>   Update Posted-Interrupts Descriptor during vCPU scheduling
>   Suppress posting interrupts when 'SN' is set
>   Add a command line parameter for VT-d posted-interrupts
> 
>  xen/arch/x86/hvm/vmx/vmcs.c            |   6 ++
>  xen/arch/x86/hvm/vmx/vmx.c             | 185 ++++++++++++++++++++++++++++++++-
>  xen/common/domain.c                    |  11 ++
>  xen/common/schedule.c                  |   3 +
>  xen/drivers/passthrough/io.c           |  77 +++++++++++++-
>  xen/drivers/passthrough/iommu.c        |  17 ++-
>  xen/drivers/passthrough/vtd/intremap.c |  83 +++++++++++++++
>  xen/drivers/passthrough/vtd/iommu.c    |  15 ++-
>  xen/drivers/passthrough/vtd/iommu.h    |  23 ++++
>  xen/include/asm-x86/hvm/hvm.h          |   1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h     |  16 ++-
>  xen/include/asm-x86/hvm/vmx/vmx.h      |  49 ++++++++-
>  xen/include/asm-x86/iommu.h            |   2 +
>  xen/include/xen/iommu.h                |   2 +-
>  xen/include/xen/sched.h                |   5 +
>  15 files changed, 485 insertions(+), 10 deletions(-)
> 
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-03-25 12:31 ` [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
@ 2015-03-26 18:53   ` Andrew Cooper
  2015-03-27  1:45     ` Wu, Feng
  2015-03-26 19:29   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 18:53 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> This patch initializes the VT-d Posted-interrupt Descriptor.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/arch/x86/hvm/vmx/vmcs.c       |  3 +++
>   xen/include/asm-x86/hvm/vmx/vmx.h | 21 ++++++++++++++++++++-
>   2 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index d614638..942f4b7 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1004,6 +1004,9 @@ static int construct_vmcs(struct vcpu *v)
>   
>       if ( cpu_has_vmx_posted_intr_processing )
>       {
> +        if ( iommu_intpost == 1 )

Being a boolean, please shorten this to "if ( iommu_intpost )"

> +            pi_desc_init(v);
> +
>           __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
>           __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
>       }
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index ecc5e17..3cd75eb 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -28,6 +28,9 @@
>   #include <asm/hvm/support.h>
>   #include <asm/hvm/trace.h>
>   #include <asm/hvm/vmx/vmcs.h>
> +#include <asm/apic.h>
> +
> +extern uint8_t posted_intr_vector;
>   
>   typedef union {
>       struct {
> @@ -146,6 +149,23 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
>       return xchg(&pi_desc->pir[group], 0);
>   }
>   
> +static inline void pi_desc_init(struct vcpu *v)
> +{
> +    uint32_t dest;
> +
> +    pi_clear_sn(&v->arch.hvm_vmx.pi_desc);

This is called during construct_vmcs().  You can safely rely on the fact 
that all memory is already zeroed, so you need neither this clear_bit nor 
the setting of ndm further down.

> +    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
> +
> +    /* Physical mode for Notificaiton Event */
> +    v->arch.hvm_vmx.pi_desc.ndm = 0;
> +    dest = cpu_physical_id(v->processor);
> +
> +    if ( x2apic_enabled )
> +        v->arch.hvm_vmx.pi_desc.ndst = dest;
> +    else
> +        v->arch.hvm_vmx.pi_desc.ndst = (dest << 8) & 0xFF00;

What takes care of ensuring that this ndst field gets updated whenever 
v->processor changes?

~Andrew

> +}
> +
>   /*
>    * Exit Reasons
>    */
> @@ -265,7 +285,6 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
>   #define MODRM_EAX_ECX   ".byte 0xc1\n" /* EAX, ECX */
>   
>   extern u64 vmx_ept_vpid_cap;
> -extern uint8_t posted_intr_vector;
>   
>   #define cpu_has_vmx_ept_exec_only_supported        \
>       (vmx_ept_vpid_cap & VMX_EPT_EXEC_ONLY_SUPPORTED)


* Re: [RFC v1 06/15] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  2015-03-25 12:31 ` [RFC v1 06/15] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
@ 2015-03-26 19:00   ` Andrew Cooper
  2015-03-27  1:53     ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 19:00 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> Extend struct iremap_entry according to VT-d Posted-Interrupts Spec.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/drivers/passthrough/vtd/iommu.h | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)
>
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index 42047e0..cd61e12 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -303,6 +303,18 @@ struct iremap_entry {
>               res_2   : 8,
>               dst     : 32;
>       }lo;
> +    struct {
> +        u64 p       : 1,
> +            fpd     : 1,
> +            res_1   : 6,
> +            avail   : 4,
> +            res_2   : 2,
> +            urg     : 1,
> +            im      : 1,
> +            vector  : 8,
> +            res_3   : 14,
> +            pda_l   : 26;
> +    }lo_intpost;
>     };
>     union {
>       u64 hi_val;
> @@ -312,6 +324,13 @@ struct iremap_entry {
>               svt     : 2,
>               res_1   : 44;
>       }hi;
> +    struct {
> +        u64 sid     : 16,
> +            sq      : 2,
> +            svt     : 2,
> +            res_1   : 12,
> +            pda_h   : 32;
> +    }hi_intpost;

I would prefer if this union was reformatted as I suggested in the 
thread from your design doc, but I won't insist on it as a blocker to entry.

Please however name each of the fields with a comment.

~Andrew

>     };
>   };
>   


* Re: [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used
  2015-03-25 12:31 ` [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
@ 2015-03-26 19:17   ` Andrew Cooper
  2015-03-27  2:13     ` Wu, Feng
  2015-03-27  4:52     ` Wu, Feng
  2015-03-26 19:36   ` Konrad Rzeszutek Wilk
  2015-04-02  5:34   ` Tian, Kevin
  2 siblings, 2 replies; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 19:17 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> This patch adds an API which is used to update the IRTE
> for posted-interrupt when guest changes MSI/MSI-X information.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/drivers/passthrough/vtd/intremap.c | 83 ++++++++++++++++++++++++++++++++++
>   xen/drivers/passthrough/vtd/iommu.h    |  3 ++
>   xen/include/asm-x86/iommu.h            |  2 +
>   3 files changed, 88 insertions(+)
>
> diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
> index 0333686..f44e74d 100644
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -898,3 +898,86 @@ void iommu_disable_x2apic_IR(void)
>       for_each_drhd_unit ( drhd )
>           disable_qinval(drhd->iommu);
>   }
> +
> +/*
> + * This function is used to update the IRTE for posted-interrupt
> + * when guest changes MSI/MSI-X information
> + */
> +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec )

Stray space after gvec.

I presume gvec means "guest vector", in which case it should be a uint8_t.

> +{
> +    struct irq_desc *desc;
> +    struct msi_desc *msi_desc;
> +    int remap_index, rc = -1;
> +    struct pci_dev *pci_dev;
> +    struct acpi_drhd_unit *drhd;
> +    struct iommu *iommu;
> +    struct ir_ctrl *ir_ctrl;
> +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> +    struct iremap_entry new_ire;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    unsigned long flags;
> +
> +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> +    if ( !desc )
> +        return -1;
> +
> +    msi_desc = desc->msi_desc;
> +    if ( !msi_desc )
> +        goto unlock_out;
> +
> +    remap_index = msi_desc->remap_index;
> +    pci_dev = msi_desc->dev;
> +    if ( !pci_dev )
> +        goto unlock_out;
> +
> +    drhd = acpi_find_matched_drhd_unit(pci_dev);

This does an O(n^2) walk over the drhd_units list and the unit 
device_cnt.  As the layout will be completely static, might it be better 
to stash drhd information in struct pci_dev during boot?

> +    if (!drhd)

Spaces inside brackets.

> +    {
> +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get drhd!\n");

This is a useless error message, providing no useful information to 
identify the issue.

At the very least it should identify the domain and vcpu (%pv of v), the 
guest vector and pci device.  Also, drop the exclamation mark - it is 
not useful in the log.
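
As a rough illustration of the kind of message being asked for, the sketch below formats the domain, vCPU, guest vector and PCI device into one line. It is standalone C using snprintf(); in Xen the string would go through dprintk(), and the vCPU could be printed with %pv as mentioned above. The function name format_no_drhd() and its parameter list are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical formatter showing the suggested content: domain, vCPU,
 * PCI device as segment:bus:device.function, and the guest vector.
 */
static int format_no_drhd(char *buf, size_t len, unsigned int domid,
                          unsigned int vcpuid, uint8_t gvec,
                          uint16_t seg, uint8_t bus, uint8_t devfn)
{
    return snprintf(buf, len,
                    "d%uv%u: no DRHD unit for %04x:%02x:%02x.%u (gvec %#x)\n",
                    domid, vcpuid, (unsigned int)seg, (unsigned int)bus,
                    (unsigned int)((devfn >> 3) & 0x1f), /* device (slot) */
                    (unsigned int)(devfn & 0x7),         /* function */
                    (unsigned int)gvec);
}
```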

> +        goto unlock_out;
> +    }
> +
> +    iommu = drhd->iommu;
> +    ir_ctrl = iommu_ir_ctrl(iommu);
> +    if ( !ir_ctrl )
> +    {
> +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get ir_ctrl!\n");

Needs similarly extending with some information.

> +        goto unlock_out;
> +    }
> +
> +    spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
> +
> +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
> +
> +    memcpy(&new_ire, p, sizeof(struct iremap_entry));

sizeof(new_ire) please

> +
> +    /* Setup/Update interrupt remapping table entry */
> +    new_ire.lo_intpost.urg = 0;
> +    new_ire.lo_intpost.vector = gvec;
> +    new_ire.lo_intpost.pda_l = (((u64)virt_to_maddr(pi_desc)) >>
> +                                (32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
> +    new_ire.hi_intpost.pda_h = (((u64)virt_to_maddr(pi_desc)) >>  32) &
> +                                ~(-1UL << PDA_HIGH_BIT);

Is it possible to hide this bit manipulation inside a static inline 
function which takes an ire and pi_desc pointer rather than open coding it?
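
A minimal sketch of such a helper, assuming the field widths from the patch (pda_l is 26 bits holding address bits 31:6, pda_h is 32 bits holding bits 63:32). The names irte_set_pda(), irte_get_pda(), PDA_MASK() and the cut-down struct are hypothetical, and the helper takes the machine address directly rather than a pi_desc pointer so that it can be built outside the Xen tree.

```c
#include <assert.h>
#include <stdint.h>

#define PDA_LOW_BIT    26   /* width of pda_l, as in the patch */
#define PDA_HIGH_BIT   32   /* width of pda_h, as in the patch */
#define PDA_MASK(bits) (~(~UINT64_C(0) << (bits)))  /* hypothetical macro */

/* Cut-down stand-in for the two IRTE fields that hold the address. */
struct irte_pda_sketch {
    uint64_t pda_l : PDA_LOW_BIT;   /* descriptor address bits 31:6 */
    uint64_t pda_h : PDA_HIGH_BIT;  /* descriptor address bits 63:32 */
};

/* Hypothetical helper: split a 64-byte aligned machine address. */
static inline void irte_set_pda(struct irte_pda_sketch *ire, uint64_t maddr)
{
    assert((maddr & 0x3f) == 0);    /* struct pi_desc is 64-byte aligned */
    ire->pda_l = (maddr >> (32 - PDA_LOW_BIT)) & PDA_MASK(PDA_LOW_BIT);
    ire->pda_h = (maddr >> 32) & PDA_MASK(PDA_HIGH_BIT);
}

/* Inverse of the split, useful for checking it. */
static inline uint64_t irte_get_pda(const struct irte_pda_sketch *ire)
{
    return ((uint64_t)ire->pda_h << 32) |
           ((uint64_t)ire->pda_l << (32 - PDA_LOW_BIT));
}
```

Keeping the inverse next to the setter makes the 6-bit shift easy to check with a round trip.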

> +
> +    new_ire.lo_intpost.res_1 = 0;
> +    new_ire.lo_intpost.res_2 = 0;
> +    new_ire.lo_intpost.res_3 = 0;
> +    new_ire.hi_intpost.res_1 = 0;
> +
> +    new_ire.lo_intpost.im = 1;
> +
> +    memcpy(p, &new_ire, sizeof(struct iremap_entry));
> +    iommu_flush_cache_entry(p, sizeof(struct iremap_entry));

Same here regarding sizeof.

Furthermore, is the memcpy() safe to update the live descriptor?

If it is, why do you not update the descriptor in place using p-> rather 
than copying it to the stack and back?

~Andrew

> +    iommu_flush_iec_index(iommu, 0, remap_index);
> +
> +    if ( iremap_entries )
> +        unmap_vtd_domain_page(iremap_entries);
> +
> +    spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
> +
> +    rc = 0;
> + unlock_out:
> +    spin_unlock_irq(&desc->lock);
> +
> +    return rc;
> +}
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index cd61e12..ffa72c8 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -334,6 +334,9 @@ struct iremap_entry {
>     };
>   };
>   
> +#define PDA_LOW_BIT    26
> +#define PDA_HIGH_BIT   32
> +
>   /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
>   #define IREMAP_PAGE_ORDER  8
>   
> diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> index e7a65da..d233621 100644
> --- a/xen/include/asm-x86/iommu.h
> +++ b/xen/include/asm-x86/iommu.h
> @@ -32,6 +32,8 @@ int iommu_supports_eim(void);
>   int iommu_enable_x2apic_IR(void);
>   void iommu_disable_x2apic_IR(void);
>   
> +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec);
> +
>   #endif /* !__ARCH_X86_IOMMU_H__ */
>   /*
>    * Local variables:


* Re: [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-03-25 12:31 ` [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
  2015-03-26 18:53   ` Andrew Cooper
@ 2015-03-26 19:29   ` Konrad Rzeszutek Wilk
  2015-03-27  1:45     ` Wu, Feng
  2015-05-04  5:32     ` Wu, Feng
  1 sibling, 2 replies; 101+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-26 19:29 UTC (permalink / raw)
  To: Feng Wu; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich, xen-devel

On Wed, Mar 25, 2015 at 08:31:47PM +0800, Feng Wu wrote:
> This patch initializes the VT-d Posted-interrupt Descriptor.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c       |  3 +++
>  xen/include/asm-x86/hvm/vmx/vmx.h | 21 ++++++++++++++++++++-
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index d614638..942f4b7 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1004,6 +1004,9 @@ static int construct_vmcs(struct vcpu *v)
>  
>      if ( cpu_has_vmx_posted_intr_processing )
>      {
> +        if ( iommu_intpost == 1 )

if ( iommu_intpost )
	.. ?
> +            pi_desc_init(v);
> +
>          __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
>          __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
>      }
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index ecc5e17..3cd75eb 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -28,6 +28,9 @@
>  #include <asm/hvm/support.h>
>  #include <asm/hvm/trace.h>
>  #include <asm/hvm/vmx/vmcs.h>
> +#include <asm/apic.h>
> +
> +extern uint8_t posted_intr_vector;
>  
>  typedef union {
>      struct {
> @@ -146,6 +149,23 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
>      return xchg(&pi_desc->pir[group], 0);
>  }
>  
> +static inline void pi_desc_init(struct vcpu *v)
> +{
> +    uint32_t dest;
> +
> +    pi_clear_sn(&v->arch.hvm_vmx.pi_desc);
> +    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
> +
> +    /* Physical mode for Notificaiton Event */

s/Notificaiton/Notification/
> +    v->arch.hvm_vmx.pi_desc.ndm = 0;
> +    dest = cpu_physical_id(v->processor);
> +
> +    if ( x2apic_enabled )
> +        v->arch.hvm_vmx.pi_desc.ndst = dest;
> +    else
> +        v->arch.hvm_vmx.pi_desc.ndst = (dest << 8) & 0xFF00;

Surely there are some macros for that?
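
For reference, the two branches above reduce to the small helper sketched below. The name pi_ndst_encode() is hypothetical, and the sketch does not answer whether an existing Xen macro already covers the xAPIC case; it only gives the operation a name so that it is written once.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the two branches in pi_desc_init():
 * in x2APIC mode the 32-bit APIC ID is used as is; in xAPIC mode the
 * 8-bit APIC ID is placed in bits 15:8 of the destination field.
 */
static inline uint32_t pi_ndst_encode(uint32_t apic_id, bool x2apic)
{
    return x2apic ? apic_id : (apic_id << 8) & 0xFF00u;
}
```
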
> +}
> +
>  /*
>   * Exit Reasons
>   */
> @@ -265,7 +285,6 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
>  #define MODRM_EAX_ECX   ".byte 0xc1\n" /* EAX, ECX */
>  
>  extern u64 vmx_ept_vpid_cap;
> -extern uint8_t posted_intr_vector;
>  
>  #define cpu_has_vmx_ept_exec_only_supported        \
>      (vmx_ept_vpid_cap & VMX_EPT_EXEC_ONLY_SUPPORTED)


* Re: [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used
  2015-03-25 12:31 ` [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
  2015-03-26 19:17   ` Andrew Cooper
@ 2015-03-26 19:36   ` Konrad Rzeszutek Wilk
  2015-03-27  1:59     ` Wu, Feng
  2015-04-02  5:34   ` Tian, Kevin
  2 siblings, 1 reply; 101+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-26 19:36 UTC (permalink / raw)
  To: Feng Wu; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich, xen-devel

On Wed, Mar 25, 2015 at 08:31:49PM +0800, Feng Wu wrote:
> This patch adds an API which is used to update the IRTE
> for posted-interrupt when guest changes MSI/MSI-X information.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/drivers/passthrough/vtd/intremap.c | 83 ++++++++++++++++++++++++++++++++++
>  xen/drivers/passthrough/vtd/iommu.h    |  3 ++
>  xen/include/asm-x86/iommu.h            |  2 +
>  3 files changed, 88 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
> index 0333686..f44e74d 100644
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -898,3 +898,86 @@ void iommu_disable_x2apic_IR(void)
>      for_each_drhd_unit ( drhd )
>          disable_qinval(drhd->iommu);
>  }
> +
> +/*
> + * This function is used to update the IRTE for posted-interrupt
> + * when guest changes MSI/MSI-X information

Missing period at the end.

> + */
> +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec )

There is an extra space at the end?
> +{
> +    struct irq_desc *desc;
> +    struct msi_desc *msi_desc;
> +    int remap_index, rc = -1;
> +    struct pci_dev *pci_dev;
> +    struct acpi_drhd_unit *drhd;
> +    struct iommu *iommu;
> +    struct ir_ctrl *ir_ctrl;
> +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> +    struct iremap_entry new_ire;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    unsigned long flags;
> +
> +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> +    if ( !desc )
> +        return -1;
> +
> +    msi_desc = desc->msi_desc;
> +    if ( !msi_desc )
> +        goto unlock_out;
> +
> +    remap_index = msi_desc->remap_index;

Could you move this right below the 'goto' check? No point
in doing this if pci_dev is NULL.

> +    pci_dev = msi_desc->dev;
> +    if ( !pci_dev )
> +        goto unlock_out;
> +
> +    drhd = acpi_find_matched_drhd_unit(pci_dev);
> +    if (!drhd)

Missing spaces around !drhd.

> +    {
> +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get drhd!\n");

Perhaps a bit more data. Can you include the pci_dev BDF as well?

> +        goto unlock_out;
> +    }
> +
> +    iommu = drhd->iommu;
> +    ir_ctrl = iommu_ir_ctrl(iommu);
> +    if ( !ir_ctrl )
> +    {
> +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get ir_ctrl!\n");

.. for IOMMU (with some data that can help diagnose the issue when one
boots with 'iommu=verbose') please.

> +        goto unlock_out;
> +    }
> +
> +    spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
> +
> +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
> +
> +    memcpy(&new_ire, p, sizeof(struct iremap_entry));
> +
> +    /* Setup/Update interrupt remapping table entry */

Missing period at the end.
> +    new_ire.lo_intpost.urg = 0;
> +    new_ire.lo_intpost.vector = gvec;
> +    new_ire.lo_intpost.pda_l = (((u64)virt_to_maddr(pi_desc)) >>
> +                                (32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
> +    new_ire.hi_intpost.pda_h = (((u64)virt_to_maddr(pi_desc)) >>  32) &

You have an extra space after >>
> +                                ~(-1UL << PDA_HIGH_BIT);

You can make ~(-1UL << PDA_XX_BIT) a macro here.

> +
> +    new_ire.lo_intpost.res_1 = 0;
> +    new_ire.lo_intpost.res_2 = 0;
> +    new_ire.lo_intpost.res_3 = 0;
> +    new_ire.hi_intpost.res_1 = 0;
> +
> +    new_ire.lo_intpost.im = 1;
> +
> +    memcpy(p, &new_ire, sizeof(struct iremap_entry));
> +    iommu_flush_cache_entry(p, sizeof(struct iremap_entry));
> +    iommu_flush_iec_index(iommu, 0, remap_index);
> +
> +    if ( iremap_entries )
> +        unmap_vtd_domain_page(iremap_entries);
> +
> +    spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
> +
> +    rc = 0;
> + unlock_out:
> +    spin_unlock_irq(&desc->lock);
> +
> +    return rc;
> +}
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index cd61e12..ffa72c8 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -334,6 +334,9 @@ struct iremap_entry {
>    };
>  };
>  
> +#define PDA_LOW_BIT    26
> +#define PDA_HIGH_BIT   32
> +
>  /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
>  #define IREMAP_PAGE_ORDER  8
>  
> diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> index e7a65da..d233621 100644
> --- a/xen/include/asm-x86/iommu.h
> +++ b/xen/include/asm-x86/iommu.h
> @@ -32,6 +32,8 @@ int iommu_supports_eim(void);
>  int iommu_enable_x2apic_IR(void);
>  void iommu_disable_x2apic_IR(void);
>  
> +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec);
> +
>  #endif /* !__ARCH_X86_IOMMU_H__ */
>  /*
>   * Local variables:


* Re: [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-03-25 12:31 ` [RFC v1 08/15] Update IRTE according to guest interrupt config changes Feng Wu
@ 2015-03-26 19:46   ` Konrad Rzeszutek Wilk
  2015-03-27  5:45     ` Wu, Feng
  2015-03-26 19:59   ` Andrew Cooper
  2015-04-02  5:52   ` Tian, Kevin
  2 siblings, 1 reply; 101+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-26 19:46 UTC (permalink / raw)
  To: Feng Wu; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich, xen-devel

On Wed, Mar 25, 2015 at 08:31:50PM +0800, Feng Wu wrote:
> When guest changes its interrupt configuration (such as, vector, etc.)

s/such as,/such as/
> for direct-assigned devices, we need to update the associated IRTE
> with the new guest vector, so external interrupts from the assigned
> devices can be injected to guests without VM-Exit.
> 
> For lowest-priority interrupts, we use vector-hashing mechamisn to find
> the destination vCPU. This follows the hardware behavior, since modern
> Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> 
> For multicase/broadcast vCPU, we cannot handle it via interrupt posting,

multicase? Or multicast? or multicascade??
> still use interrupt remapping.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/drivers/passthrough/io.c | 77 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 76 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index ae050df..1d9a132 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -26,6 +26,7 @@
>  #include <asm/hvm/iommu.h>
>  #include <asm/hvm/support.h>
>  #include <xen/hvm/irq.h>
> +#include <asm/io_apic.h>
>  
>  static DEFINE_PER_CPU(struct list_head, dpci_list);
>  
> @@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
>      xfree(dpci);
>  }
>  
> +/*
> + * Here we handle the following cases:
> + * - For lowest-priority interrupts, we find the destination vCPU from the
> + *   guest vector using vector-hashing mechamisn and return true. This follows

s/mechamisn/mechanism/

> + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> + *   handle the lowest-priority interrupt.
> + * - Otherwise, for single destination interrupt, it is straightforward to
> + *   find the destination vCPU and return true.
> + * - For multicase/broadcast vCPU, we cannot handle it via interrupt posting,

s/multicase/??/
> + *   so return false.
> + */
> +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> +                                uint8_t dest_mode, uint8_t deliver_mode,
> +                                uint32_t gvec, struct vcpu **dest_vcpu)
> +{
> +    struct vcpu *v, **dest_vcpu_array;
> +    unsigned int dest_vcpu_num = 0;
> +    int ret;
> +
> +    if ( deliver_mode == dest_LowestPrio )
> +        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);
> +
Please check that dest_vcpu_array was allocated.

> +    for_each_vcpu ( d, v )
> +    {
> +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> +                                dest_id, dest_mode) )
> +            continue;
> +
> +        dest_vcpu_num++;
> +
> +        if ( deliver_mode == dest_LowestPrio )
> +            dest_vcpu_array[dest_vcpu_num] = v;
> +        else
> +            *dest_vcpu = v;

Should there be a break here?
> +    }
> +
> +    if ( deliver_mode == dest_LowestPrio )
> +    {
> +        if (  dest_vcpu_num != 0 )
> +        {
> +            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
> +            ret = 1;
> +        }
> +        else
> +            ret = 0;
> +
> +        xfree(dest_vcpu_array);
> +        return ret;
> +    }
> +    else if (  dest_vcpu_num == 1 )
> +        return 1;
> +    else
> +        return 0;
> +}
> +
>  int pt_irq_create_bind(
>      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
>  {
> @@ -256,7 +313,7 @@ int pt_irq_create_bind(
>      {
>      case PT_IRQ_TYPE_MSI:
>      {
> -        uint8_t dest, dest_mode;
> +        uint8_t dest, dest_mode, deliver_mode;
>          int dest_vcpu_id;
>  
>          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> @@ -330,11 +386,30 @@ int pt_irq_create_bind(
>          /* Calculate dest_vcpu_id for MSI-type pirq migration. */
>          dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
>          dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> +        deliver_mode = (pirq_dpci->gmsi.gflags >> GFLAGS_SHIFT_DELIV_MODE) &
> +                        VMSI_DELIV_MASK;
>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>          spin_unlock(&d->event_lock);
>          if ( dest_vcpu_id >= 0 )
>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> +
> +        /* Use interrupt posting if it is supported */
> +        if ( iommu_intpost )
> +        {
> +            struct vcpu *vcpu = NULL;
> +
> +            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
> +                                    pirq_dpci->gmsi.gvec, &vcpu) )
> +                break;
> +
> +            if ( pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec ) != 0 )

s/ != 0//


> +            {
> +                dprintk(XENLOG_G_INFO, "failed to update PI IRTE\n");

Perhaps with some data on which domain it is for? And what vector?

> +                return -EBUSY;

Hmm, under what conditions can this actually happen? What should the
recipient do?
> +            }
> +        }
> +
>          break;
>      }
>  
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-03-25 12:31 ` [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running Feng Wu
  2015-03-25 14:14   ` Zhang, Yang Z
@ 2015-03-26 19:57   ` Konrad Rzeszutek Wilk
  2015-03-27  3:06     ` Wu, Feng
  1 sibling, 1 reply; 101+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-26 19:57 UTC (permalink / raw)
  To: Feng Wu; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich, xen-devel

On Wed, Mar 25, 2015 at 08:31:54PM +0800, Feng Wu wrote:
> When a vCPU is running in Root mode and a notification event
> has been injected to it. we need to set VCPU_KICK_SOFTIRQ for
> the current cpu, so the pending interrupt in PIRR will be
> synced to vIRR before VM-Exit in time.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c        | 24 +++++++++++++++++++++++-
>  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
>  2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index b2b4c26..b30392c 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1838,7 +1838,7 @@ const struct hvm_function_table * __init start_vmx(void)
>  
>      if ( cpu_has_vmx_posted_intr_processing )
>      {
> -        alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
> +        alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
>  
>          if ( iommu_intpost )
>              alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> @@ -3288,6 +3288,28 @@ void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>  }
>  
>  /*
> + * Handle VT-d posted-interrupt when VCPU is running.
> + */
> +
> +void pi_notification_interrupt(struct cpu_user_regs *regs)
> +{
> +    /*
> +     * We get here because a vCPU is running in Root mode
> +     * and a notification event has been injected to it.
> +     *
> +     * we need to set VCPU_KICK_SOFTIRQ for the current
> +     * cpu, just like __vmx_deliver_posted_interrupt().
> +     *
> +     * So the pending interrupt in PIRR will be synced to
> +     * vIRR before VM-Exit in time.
> +     */
> +    set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(smp_processor_id()));

Could you use the 'raise_softirq' instead?
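
For illustration, a minimal sketch of the difference, assuming the helper
does nothing more than set the requested bit in the current CPU's pending
mask (that is my reading of it, not something this patch shows). All demo_*
names are hypothetical stand-ins so the fragment builds on its own:

```c
#include <stdatomic.h>

#define DEMO_NR_CPUS 2

/* Hypothetical softirq numbers; only the kick one matters here. */
enum { DEMO_TIMER_SOFTIRQ, DEMO_VCPU_KICK_SOFTIRQ };

/* One pending bitmask per CPU, standing in for softirq_pending(cpu). */
static atomic_uint demo_softirq_pending[DEMO_NR_CPUS];

/* Stands in for smp_processor_id(). */
static unsigned int demo_this_cpu;

/* Helper form: callers name the softirq, not the bitmask. */
static void demo_raise_softirq(unsigned int nr)
{
    atomic_fetch_or(&demo_softirq_pending[demo_this_cpu], 1u << nr);
}

/* The handler body then no longer open-codes the bit operation. */
static void demo_pi_notification(void)
{
    demo_raise_softirq(DEMO_VCPU_KICK_SOFTIRQ);
}
```

The effect on the pending mask is the same as the open-coded set_bit(); the
helper just keeps the per-CPU indexing in one place.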

> +
> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +}
> +
> +/*
>   * Local variables:
>   * mode: C
>   * c-file-style: "BSD"
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index f4296ab..e53275b 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -576,6 +576,7 @@ void free_p2m_hap_data(struct p2m_domain *p2m);
>  void p2m_init_hap_data(struct p2m_domain *p2m);
>  
>  void pi_wakeup_interrupt(struct cpu_user_regs *regs);
> +void pi_notification_interrupt(struct cpu_user_regs *regs);
>  
>  /* EPT violation qualifications definitions */
>  #define _EPT_READ_VIOLATION         0
> -- 
> 2.1.0
> 
> 

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-03-25 12:31 ` [RFC v1 08/15] Update IRTE according to guest interrupt config changes Feng Wu
  2015-03-26 19:46   ` Konrad Rzeszutek Wilk
@ 2015-03-26 19:59   ` Andrew Cooper
  2015-03-27  5:49     ` Wu, Feng
  2015-04-02  5:52   ` Tian, Kevin
  2 siblings, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 19:59 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> When guest changes its interrupt configuration (such as, vector, etc.)
> for direct-assigned devices, we need to update the associated IRTE
> with the new guest vector, so external interrupts from the assigned
> devices can be injected to guests without VM-Exit.
>
> For lowest-priority interrupts, we use vector-hashing mechamisn to find
> the destination vCPU. This follows the hardware behavior, since modern
> Intel CPUs use vector hashing to handle the lowest-priority interrupt.
>
> For multicase/broadcast vCPU, we cannot handle it via interrupt posting,
> still use interrupt remapping.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/drivers/passthrough/io.c | 77 +++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index ae050df..1d9a132 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -26,6 +26,7 @@
>   #include <asm/hvm/iommu.h>
>   #include <asm/hvm/support.h>
>   #include <xen/hvm/irq.h>
> +#include <asm/io_apic.h>
>   
>   static DEFINE_PER_CPU(struct list_head, dpci_list);
>   
> @@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
>       xfree(dpci);
>   }
>   
> +/*
> + * Here we handle the following cases:
> + * - For lowest-priority interrupts, we find the destination vCPU from the
> + *   guest vector using vector-hashing mechamisn and return true. This follows
> + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> + *   handle the lowest-priority interrupt.

What is the hashing algorithm, or can I have some hint as to where to 
find it in a manual?

> + * - Otherwise, for single destination interrupt, it is straightforward to
> + *   find the destination vCPU and return true.
> + * - For multicase/broadcast vCPU, we cannot handle it via interrupt posting,
> + *   so return false.
> + */
> +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> +                                uint8_t dest_mode, uint8_t deliver_mode,
> +                                uint32_t gvec, struct vcpu **dest_vcpu)
> +{
> +    struct vcpu *v, **dest_vcpu_array;
> +    unsigned int dest_vcpu_num = 0;
> +    int ret;
> +
> +    if ( deliver_mode == dest_LowestPrio )
> +        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);

This allocation can fail, but you really should see about avoiding it 
entirely, if possible.
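
One way the allocation might be avoided is sketched below: count the
matching vCPUs in a first pass, then walk again to the (gvec % count)-th
match. This is only a sketch under assumptions. struct demo_vcpu and its
matches_dest flag are hypothetical stand-ins for struct vcpu and the
vlapic_match_dest() check, and the set of matches is assumed not to change
between the two passes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vcpu; only what the sketch needs. */
struct demo_vcpu {
    unsigned int id;
    bool matches_dest;   /* stands in for vlapic_match_dest() succeeding */
};

/*
 * Pick the (gvec % matches)-th matching vCPU without a temporary array.
 * Pass 1 counts the matches, pass 2 walks to the chosen one.
 * Returns NULL when no vCPU matches.
 */
static struct demo_vcpu *demo_pick_lowest_prio(struct demo_vcpu *vcpus,
                                               size_t nr, uint8_t gvec)
{
    size_t matches = 0, target, seen = 0, i;

    for ( i = 0; i < nr; i++ )
        if ( vcpus[i].matches_dest )
            matches++;

    if ( matches == 0 )
        return NULL;

    target = gvec % matches;

    for ( i = 0; i < nr; i++ )
    {
        if ( !vcpus[i].matches_dest )
            continue;

        /* Matches are numbered from 0, so the result is always in range. */
        if ( seen++ == target )
            return &vcpus[i];
    }

    return NULL; /* Not reached while the match set is stable. */
}
```

The cost is a second walk over the vCPUs in exchange for having no failure
path from the allocator.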

> +
> +    for_each_vcpu ( d, v )
> +    {
> +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> +                                dest_id, dest_mode) )
> +            continue;
> +
> +        dest_vcpu_num++;
> +
> +        if ( deliver_mode == dest_LowestPrio )
> +            dest_vcpu_array[dest_vcpu_num] = v;
> +        else
> +            *dest_vcpu = v;
> +    }
> +
> +    if ( deliver_mode == dest_LowestPrio )
> +    {
> +        if (  dest_vcpu_num != 0 )
> +        {
> +            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
> +            ret = 1;
> +        }
> +        else
> +            ret = 0;
> +
> +        xfree(dest_vcpu_array);
> +        return ret;
> +    }
> +    else if (  dest_vcpu_num == 1 )
> +        return 1;
> +    else
> +        return 0;
> +}
> +
>   int pt_irq_create_bind(
>       struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
>   {
> @@ -257,7 +313,7 @@ int pt_irq_create_bind(
>       {
>       case PT_IRQ_TYPE_MSI:
>       {
> -        uint8_t dest, dest_mode;
> +        uint8_t dest, dest_mode, deliver_mode;
>           int dest_vcpu_id;
>   
>           if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> @@ -330,11 +386,30 @@ int pt_irq_create_bind(
>           /* Calculate dest_vcpu_id for MSI-type pirq migration. */
>           dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
>           dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> +        deliver_mode = (pirq_dpci->gmsi.gflags >> GFLAGS_SHIFT_DELIV_MODE) &
> +                        VMSI_DELIV_MASK;

s/deliver/delivery/

Also, you should be able to use MASK_EXTR() rather than manual shifts 
and masks.
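
To show the idiom I mean, here is a self-contained sketch. The macro is a
local copy of the usual mask-extract pattern, and the mask value (delivery
mode in bits 14:12) is an assumption made for the example, so check both
against the tree before relying on them:

```c
#include <stdint.h>

/*
 * Mask-extract idiom: isolate the field, then divide by the mask's lowest
 * set bit, which shifts the field down to bit 0.  Defined locally so the
 * sketch builds on its own.
 */
#define DEMO_MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))

/* Illustrative layout (an assumption): delivery mode in bits 14:12. */
#define DEMO_DELIV_MASK 0x7000u

static inline uint8_t demo_delivery_mode(uint32_t gflags)
{
    /* No separate shift constant is needed; the mask alone is enough. */
    return DEMO_MASK_EXTR(gflags, DEMO_DELIV_MASK);
}
```

With this form the shift amount cannot drift out of step with the mask,
because it is derived from the mask itself.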

>           dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>           pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>           spin_unlock(&d->event_lock);
>           if ( dest_vcpu_id >= 0 )
>               hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> +
> +        /* Use interrupt posting if it is supported */
> +        if ( iommu_intpost )
> +        {
> +            struct vcpu *vcpu = NULL;
> +
> +            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
> +                                    pirq_dpci->gmsi.gvec, &vcpu) )
> +                break;
> +
> +            if ( pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec ) != 0 )
> +            {
> +                dprintk(XENLOG_G_INFO, "failed to update PI IRTE\n");

Please put far more information in this error message.

> +                return -EBUSY;

Under what circumstances can this happen?  I don't think it is valid to 
fail the userspace bind hypercall in this case.

~Andrew

> +            }
> +        }
> +
>           break;
>       }
>   

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 10/15] vmx: Define two per-cpu variants
  2015-03-25 12:31 ` [RFC v1 10/15] vmx: Define two per-cpu variants Feng Wu
@ 2015-03-26 19:59   ` Andrew Cooper
  2015-04-02  5:54   ` Tian, Kevin
  1 sibling, 0 replies; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 19:59 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> This patch defines two per-cpu variants:

You do not mean variants.  You mean variables.

~Andrew

>
> blocked_vcpu_on_cpu:
> A list storing the vCPUs which were blocked on this pCPU.
>
> blocked_vcpu_on_cpu_lock:
> The spinlock to protect blocked_vcpu_on_cpu.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/arch/x86/hvm/vmx/vmcs.c       | 3 +++
>   xen/arch/x86/hvm/vmx/vmx.c        | 7 +++++++
>   xen/include/asm-x86/hvm/vmx/vmx.h | 3 +++
>   3 files changed, 13 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 942f4b7..1345e69 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -585,6 +585,9 @@ int vmx_cpu_up(void)
>       if ( cpu_has_vmx_vpid )
>           vpid_sync_all();
>   
> +    INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> +    spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +
>       return 0;
>   }
>   
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index e1c55ce..ff5544d 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -81,6 +81,13 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
>   static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
>   static void vmx_invlpg_intercept(unsigned long vaddr);
>   
> +/*
> + * We maintain a per-CPU linked list of vCPUs, so in the PI wakeup handler
> + * we can find which vCPU should be woken up.
> + */
> +DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> +DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> +
>   uint8_t __read_mostly posted_intr_vector;
>   
>   static int vmx_domain_initialise(struct domain *d)
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 3cd75eb..e643c3c 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -30,6 +30,9 @@
>   #include <asm/hvm/vmx/vmcs.h>
>   #include <asm/apic.h>
>   
> +DECLARE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> +DECLARE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> +
>   extern uint8_t posted_intr_vector;
>   
>   typedef union {

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
  2015-03-25 12:31 ` [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts Feng Wu
@ 2015-03-26 20:07   ` Andrew Cooper
  2015-04-02  6:00   ` Tian, Kevin
  1 sibling, 0 replies; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 20:07 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> This patch adds a global vector which is used to wake up
> the blocked vCPU when an interrupt is being posted to it.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
> ---
>   xen/arch/x86/hvm/vmx/vmx.c        | 33 +++++++++++++++++++++++++++++++++
>   xen/include/asm-x86/hvm/hvm.h     |  1 +
>   xen/include/asm-x86/hvm/vmx/vmx.h |  3 +++
>   xen/include/xen/sched.h           |  2 ++
>   4 files changed, 39 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index ff5544d..b2b4c26 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -89,6 +89,7 @@ DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
>   DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
>   
>   uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
>   
>   static int vmx_domain_initialise(struct domain *d)
>   {
> @@ -131,6 +132,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>       if ( v->vcpu_id == 0 )
>           v->arch.user_regs.eax = 1;
>   
> +    INIT_LIST_HEAD(&v->blocked_vcpu_list);
> +
>       return 0;
>   }
>   
> @@ -1834,11 +1837,19 @@ const struct hvm_function_table * __init start_vmx(void)
>       }
>   
>       if ( cpu_has_vmx_posted_intr_processing )
> +    {
>           alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
> +
> +        if ( iommu_intpost )
> +            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> +        else
> +            vmx_function_table.pi_desc_update = NULL;
> +    }
>       else
>       {
>           vmx_function_table.deliver_posted_intr = NULL;
>           vmx_function_table.sync_pir_to_irr = NULL;
> +        vmx_function_table.pi_desc_update = NULL;
>       }
>   
>       if ( cpu_has_vmx_ept
> @@ -3255,6 +3266,28 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>   }
>   
>   /*
> + * Handle VT-d posted-interrupt when VCPU is blocked.
> + */
> +void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct vcpu *v;
> +    int cpu = smp_processor_id();

unsigned int

~Andrew

> +
> +    spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +    list_for_each_entry(v, &per_cpu(blocked_vcpu_on_cpu, cpu),
> +                    blocked_vcpu_list) {
> +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +        if ( pi_test_on(pi_desc) == 1 )
> +            tasklet_schedule(&v->vcpu_wakeup_tasklet);
> +    }
> +    spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +
> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +}
> +
> +/*
>    * Local variables:
>    * mode: C
>    * c-file-style: "BSD"
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 0dc909b..a11a256 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -195,6 +195,7 @@ struct hvm_function_table {
>       void (*deliver_posted_intr)(struct vcpu *v, u8 vector);
>       void (*sync_pir_to_irr)(struct vcpu *v);
>       void (*handle_eoi)(u8 vector);
> +    void (*pi_desc_update)(struct vcpu *v, int new_state);
>   
>       /*Walk nested p2m  */
>       int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa,
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index e643c3c..f4296ab 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -34,6 +34,7 @@ DECLARE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
>   DECLARE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
>   
>   extern uint8_t posted_intr_vector;
> +extern uint8_t pi_wakeup_vector;
>   
>   typedef union {
>       struct {
> @@ -574,6 +575,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
>   void free_p2m_hap_data(struct p2m_domain *p2m);
>   void p2m_init_hap_data(struct p2m_domain *p2m);
>   
> +void pi_wakeup_interrupt(struct cpu_user_regs *regs);
> +
>   /* EPT violation qualifications definitions */
>   #define _EPT_READ_VIOLATION         0
>   #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index c874dd4..91f0912 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -148,6 +148,8 @@ struct vcpu
>   
>       struct vcpu     *next_in_list;
>   
> +    struct list_head blocked_vcpu_list;
> +
>       s_time_t         periodic_period;
>       s_time_t         periodic_last_event;
>       struct timer     periodic_timer;

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling
  2015-03-25 12:31 ` [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling Feng Wu
@ 2015-03-26 20:16   ` Andrew Cooper
  2015-03-27  2:59     ` Wu, Feng
  2015-04-02  6:24   ` Tian, Kevin
  1 sibling, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 20:16 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> The basic idea here is:
> 1. When vCPU's state is RUNSTATE_running,
>          - set 'NV' to 'Notification Vector'.
>          - Clear 'SN' to accept PI.
>          - set 'NDST' to the right pCPU.
> 2. When vCPU's state is RUNSTATE_blocked,
>          - set 'NV' to 'Wake-up Vector', so we can wake up the
>            related vCPU when posted-interrupt happens for it.
>          - Clear 'SN' to accept PI.
> 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
>          - Set 'SN' to suppress non-urgent interrupts.
>            (Currently, we only support non-urgent interrupts)
>          - Set 'NV' back to 'Notification Vector' if needed.
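
The three cases above can be read as a small state table. A rough sketch of
it follows, ignoring the atomic update that the real descriptor needs. The
struct is a simplified view of the fields named in the description, and the
vector numbers and demo_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_runstate { DEMO_RUNNING, DEMO_RUNNABLE, DEMO_BLOCKED, DEMO_OFFLINE };

/* Simplified view of the descriptor fields the description mentions. */
struct demo_pi_ctl {
    uint8_t  nv;    /* notification vector */
    bool     sn;    /* suppress notification */
    uint32_t ndst;  /* notification destination */
};

/* Hypothetical vector numbers, chosen only for the example. */
#define DEMO_NOTIFY_VECTOR 0xf2
#define DEMO_WAKEUP_VECTOR 0xf1

/* Return the descriptor fields to use after entering new_state. */
static struct demo_pi_ctl demo_pi_next(struct demo_pi_ctl cur,
                                       enum demo_runstate new_state,
                                       uint32_t dest)
{
    switch ( new_state )
    {
    case DEMO_RUNNING:
        /* Case 1: deliver directly to the pCPU the vCPU runs on. */
        cur.nv = DEMO_NOTIFY_VECTOR;
        cur.sn = false;
        cur.ndst = dest;
        break;

    case DEMO_BLOCKED:
        /* Case 2: still accept PI, but route it to the wake-up handler. */
        cur.nv = DEMO_WAKEUP_VECTOR;
        cur.sn = false;
        break;

    case DEMO_RUNNABLE:
    case DEMO_OFFLINE:
        /*
         * Case 3: suppress notifications.  The sketch restores NV
         * unconditionally; the description only does so when needed.
         */
        cur.sn = true;
        cur.nv = DEMO_NOTIFY_VECTOR;
        break;
    }

    return cur;
}
```
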
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/arch/x86/hvm/vmx/vmx.c | 108 +++++++++++++++++++++++++++++++++++++++++++++
>   xen/common/schedule.c      |   3 ++
>   2 files changed, 111 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index b30392c..6323bd6 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1710,6 +1710,113 @@ static void vmx_handle_eoi(u8 vector)
>       __vmwrite(GUEST_INTR_STATUS, status);
>   }
>   
> +static void vmx_pi_desc_update(struct vcpu *v, int new_state)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    struct pi_desc old, new;
> +    int old_state = v->runstate.state;
> +    unsigned long flags;
> +
> +    if ( !iommu_intpost )
> +        return;
> +
> +    switch ( new_state )
> +    {
> +    case RUNSTATE_runnable:
> +    case RUNSTATE_offline:
> +        /*
> +         * We don't need to send notification event to a non-running
> +         * vcpu, the interrupt information will be delivered to it before
> +         * VM-ENTRY when the vcpu is scheduled to run next time.
> +         */
> +        pi_set_sn(pi_desc);
> +
> +        /*
> +         * If the state is transferred from RUNSTATE_blocked,
> +         * we should set 'NV' field back to posted_intr_vector,
> +         * so the Posted-Interrupts can be delivered to the vCPU
> +         * by VT-d HW after it is scheduled to run.
> +         */
> +        if ( old_state == RUNSTATE_blocked )
> +        {
> +            do
> +            {
> +                old.control = new.control = pi_desc->control;
> +                new.nv = posted_intr_vector;
> +            }
> +            while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                    != old.control );
> +
> +           /*
> +            * Delete the vCPU from the related wakeup queue
> +            * if we are resuming from blocked state
> +            */
> +           spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                             v->processor), flags);
> +           list_del(&v->blocked_vcpu_list);

The scheduler is perfectly able to change v->processor behind your back, 
so your spinlocks are not protecting access to v->blocked_vcpu_list.
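
A sketch of one possible way to keep the lock and the list paired: record
which CPU's list the vCPU was queued on, and use that record (not
v->processor) when removing it. This is an illustration under assumptions,
not a complete fix. The demo_* types are hypothetical, a counter stands in
for the list, and the unlocked read of blocked_cpu would still need thought
if block and unblock can race.

```c
#include <stdatomic.h>

#define DEMO_NR_CPUS 2
#define DEMO_NO_CPU  (-1)

/* Toy per-CPU state: a test-and-set lock plus a count standing in for the list. */
struct demo_pcpu {
    atomic_flag lock;
    unsigned int nr_blocked;
};

struct demo_vcpu {
    int processor;    /* may be changed by the scheduler at any time */
    int blocked_cpu;  /* CPU whose list we are on, or DEMO_NO_CPU */
};

static struct demo_pcpu demo_pcpus[DEMO_NR_CPUS] = {
    { ATOMIC_FLAG_INIT, 0 },
    { ATOMIC_FLAG_INIT, 0 },
};

static void demo_lock(struct demo_pcpu *p)
{
    while ( atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire) )
        ; /* spin */
}

static void demo_unlock(struct demo_pcpu *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

static void demo_block(struct demo_vcpu *v)
{
    int cpu = v->processor;  /* read once, then use only the local copy */
    struct demo_pcpu *p = &demo_pcpus[cpu];

    demo_lock(p);
    p->nr_blocked++;
    v->blocked_cpu = cpu;    /* remember where we queued ourselves */
    demo_unlock(p);
}

static void demo_unblock(struct demo_vcpu *v)
{
    int cpu = v->blocked_cpu;
    struct demo_pcpu *p;

    if ( cpu == DEMO_NO_CPU )
        return;

    p = &demo_pcpus[cpu];    /* same CPU as in demo_block() */
    demo_lock(p);
    p->nr_blocked--;
    v->blocked_cpu = DEMO_NO_CPU;
    demo_unlock(p);
}
```

Even if v->processor changes between the two calls, the removal takes the
lock of the list the vCPU is actually on, and lock and unlock always name
the same per-CPU lock.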

~Andrew

> +           spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                                  v->processor), flags);
> +        }
> +        break;
> +
> +    case RUNSTATE_blocked:
> +        /*
> +         * The vCPU is blocked on the wait queue.
> +         * Store the blocked vCPU on the list of the
> +         * vcpu->wakeup_cpu, which is the destination
> +         * of the wake-up notification event.
> +         */
> +        spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                          v->processor), flags);
> +        list_add_tail(&v->blocked_vcpu_list,
> +                      &per_cpu(blocked_vcpu_on_cpu, v->processor));
> +        spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                               v->processor), flags);
> +
> +        do
> +        {
> +            old.control = new.control = pi_desc->control;
> +
> +            /*
> +             * We should not block the vCPU if
> +             * an interrupt is posted for it.
> +             */
> +
> +            if ( pi_test_on(&old) == 1 )
> +            {
> +                tasklet_schedule(&v->vcpu_wakeup_tasklet);
> +                return;
> +            }
> +
> +            pi_clear_sn(&new);
> +            new.nv = pi_wakeup_vector;
> +        }
> +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                != old.control );
> +        break;
> +
> +    case RUNSTATE_running:
> +        ASSERT( pi_test_sn(pi_desc) == 1 );
> +
> +        do
> +        {
> +            old.control = new.control = pi_desc->control;
> +            if ( x2apic_enabled )
> +                new.ndst = cpu_physical_id(v->processor);
> +            else
> +                new.ndst = (cpu_physical_id(v->processor) << 8) & 0xFF00;
> +
> +            pi_clear_sn(&new);
> +        }
> +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                != old.control );
> +        break;
> +
> +    default:
> +        break;
> +    }
> +}
> +
>   void vmx_hypervisor_cpuid_leaf(uint32_t sub_idx,
>                                  uint32_t *eax, uint32_t *ebx,
>                                  uint32_t *ecx, uint32_t *edx)
> @@ -1795,6 +1902,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
>       .process_isr          = vmx_process_isr,
>       .deliver_posted_intr  = vmx_deliver_posted_intr,
>       .sync_pir_to_irr      = vmx_sync_pir_to_irr,
> +    .pi_desc_update       = vmx_pi_desc_update,
>       .handle_eoi           = vmx_handle_eoi,
>       .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
>       .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index ef79847..acf3186 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -157,6 +157,9 @@ static inline void vcpu_runstate_change(
>           v->runstate.state_entry_time = new_entry_time;
>       }
>   
> +    if ( is_hvm_vcpu(v) && hvm_funcs.pi_desc_update )
> +        hvm_funcs.pi_desc_update(v, new_state);
> +
>       v->runstate.state = new_state;
>   }
>   

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 14/15] Suppress posting interrupts when 'SN' is set
  2015-03-25 12:31 ` [RFC v1 14/15] Suppress posting interrupts when 'SN' is set Feng Wu
@ 2015-03-26 20:34   ` Andrew Cooper
  2015-03-27  3:00     ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-03-26 20:34 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir, JBeulich

On 25/03/15 12:31, Feng Wu wrote:
> Currently, we don't support urgent interrupt, all interrupts
> are recognized as non-urgent interrupt, so we cannot send
> posted-interrupt when 'SN' is set.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   xen/arch/x86/hvm/vmx/vmx.c | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 6323bd6..40c7b0e 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1663,9 +1663,20 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>   
>   static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
>   {
> +    int r, sn;
> +
>       if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
>           return;
>   
> +    /*
> +     * Currently, we don't support urgent interrupt, all interrupts
> +     * are recognized as non-urgent interrupt, so we cannot send
> +     * posted-interrupt when 'SN' is set.
> +     */
> +
> +    sn = pi_test_sn(&v->arch.hvm_vmx.pi_desc);

Is there anywhere which sets sn at all? I can't spot anywhere.

~Andrew

> +    r = pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc);
> +
>       if ( unlikely(v->arch.hvm_vmx.eoi_exitmap_changed) )
>       {
>           /*
> @@ -1675,7 +1686,7 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
>            */
>           pi_set_on(&v->arch.hvm_vmx.pi_desc);
>       }
> -    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> +    else if ( !r && !sn )
>       {
>           __vmx_deliver_posted_interrupt(v);
>           return;

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 00/15] Add VT-d Posted-Interrupts support
  2015-03-26 18:50 ` [RFC v1 00/15] Add VT-d Posted-Interrupts support Konrad Rzeszutek Wilk
@ 2015-03-27  1:06   ` Wu, Feng
  2015-03-27 14:44     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  1:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Wu, Feng, xen-devel, JBeulich, Zhang, Yang Z, keir



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Friday, March 27, 2015 2:51 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Zhang, Yang Z; Tian, Kevin; keir@xen.org;
> JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 00/15] Add VT-d Posted-Interrupts support
> 
> On Wed, Mar 25, 2015 at 08:31:42PM +0800, Feng Wu wrote:
> > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > With VT-d Posted-Interrupts enabled, external interrupts from
> > direct-assigned devices can be delivered to guests without VMM
> > intervention when guest is running in non-root mode.
> >
> > You can find the VT-d Posted-Interrtups Spec. in the following URL:
> > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> >
> > This patch set follow the following design:
> > http://article.gmane.org/gmane.comp.emulators.xen.devel/236476
> 
> Would it be possible to put the design in xen/docs/ directory?

That's a good suggestion. I will do it in the next post.

Thanks,
Feng

> 
> Thanks!
> 
> >
> > Feng Wu (15):
> >   iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
> >   vt-d: VT-d Posted-Interrupts feature detection
> >   vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
> >   vmx: Add some helper functions for Posted-Interrupts
> >   vmx: Initialize VT-d Posted-Interrupts Descriptor
> >   vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
> >   vt-d: Add API to update IRTE when VT-d PI is used
> >   Update IRTE according to guest interrupt config changes
> >   Add a new per-vCPU tasklet to wakeup the blocked vCPU
> >   vmx: Define two per-cpu variants
> >   vmx: Add a global wake-up vector for VT-d Posted-Interrupts
> >   vmx: Properly handle notification event when vCPU is running
> >   Update Posted-Interrupts Descriptor during vCPU scheduling
> >   Suppress posting interrupts when 'SN' is set
> >   Add a command line parameter for VT-d posted-interrupts
> >
> >  xen/arch/x86/hvm/vmx/vmcs.c            |   6 ++
> >  xen/arch/x86/hvm/vmx/vmx.c             | 185 ++++++++++++++++++++++++++++++++-
> >  xen/common/domain.c                    |  11 ++
> >  xen/common/schedule.c                  |   3 +
> >  xen/drivers/passthrough/io.c           |  77 +++++++++++++-
> >  xen/drivers/passthrough/iommu.c        |  17 ++-
> >  xen/drivers/passthrough/vtd/intremap.c |  83 +++++++++++++++
> >  xen/drivers/passthrough/vtd/iommu.c    |  15 ++-
> >  xen/drivers/passthrough/vtd/iommu.h    |  23 ++++
> >  xen/include/asm-x86/hvm/hvm.h          |   1 +
> >  xen/include/asm-x86/hvm/vmx/vmcs.h     |  16 ++-
> >  xen/include/asm-x86/hvm/vmx/vmx.h      |  49 ++++++++-
> >  xen/include/asm-x86/iommu.h            |   2 +
> >  xen/include/xen/iommu.h                |   2 +-
> >  xen/include/xen/sched.h                |   5 +
> >  15 files changed, 485 insertions(+), 10 deletions(-)
> >
> > --
> > 2.1.0
> >
> >

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature detection
  2015-03-26 18:12   ` Andrew Cooper
@ 2015-03-27  1:21     ` Wu, Feng
  2015-03-27 10:06       ` Andrew Cooper
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  1:21 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 2:13 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature
> detection
> 
> On 25/03/15 12:31, Feng Wu wrote:
> > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > With VT-d Posted-Interrupts enabled, external interrupts from
> > direct-assigned devices can be delivered to guests without VMM
> > intervention when guest is running in non-root mode.
> >
> > This patch adds feature detection logic for VT-d posted-interrupt.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   xen/drivers/passthrough/vtd/iommu.c | 15 +++++++++++++--
> >   xen/drivers/passthrough/vtd/iommu.h |  1 +
> >   2 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> > index 891b9e3..86798a3 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -2030,6 +2030,7 @@ static int init_vtd_hw(void)
> >               if ( ioapic_to_iommu(IO_APIC_ID(apic)) == NULL )
> >               {
> >                   iommu_intremap = 0;
> > +                iommu_intpost = 0;
> >                   dprintk(XENLOG_ERR VTDPREFIX,
> >                       "ioapic_to_iommu: ioapic %#x (id: %#x) is NULL! "
> >                       "Will not try to enable Interrupt Remapping.\n",
> > @@ -2046,6 +2047,7 @@ static int init_vtd_hw(void)
> >               if ( enable_intremap(iommu, 0) != 0 )
> >               {
> >                   iommu_intremap = 0;
> > +                iommu_intpost = 0;
> >                   dprintk(XENLOG_WARNING VTDPREFIX,
> >                           "Interrupt Remapping not enabled\n");
> >
> > @@ -2119,8 +2121,8 @@ int __init intel_vtd_setup(void)
> >       }
> >
> >       /* We enable the following features only if they are supported by all VT-d
> > -     * engines: Snoop Control, DMA passthrough, Queued Invalidation and
> > -     * Interrupt Remapping.
> > +     * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
> > +     * Remapping, and Posted Interrupt
> >        */
> >       for_each_drhd_unit ( drhd )
> >       {
> > @@ -2146,7 +2148,13 @@ int __init intel_vtd_setup(void)
> >               iommu_qinval = 0;
> >
> >           if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
> > +        {
> >               iommu_intremap = 0;
> > +            iommu_intpost = 0;
> > +        }
> > +
> > +        if ( iommu_intpost && !cap_intr_post(iommu->cap))
> 
> Missing space inside the outer bracket.
> 
> I am wondering whether it might be easier, instead of having
> "iommu_intremap = 0; iommu_intpost = 0" all over the place, to instead
> insist that one must check "iommu_intremap && iommu_intpost".

In that case, the user must always check "iommu_intremap && iommu_intpost" together.
My idea in this patchset is that iommu_intpost == 1 guarantees iommu_intremap == 1,
so we only need to check iommu_intpost later.
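
A minimal sketch of that invariant, assuming the two flags are plain booleans.
The helper names (disable_intremap, disable_intpost, intpost_usable) are
hypothetical and not part of the patch; the point is only that every path
which clears iommu_intremap also clears iommu_intpost, so later code can test
iommu_intpost alone:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the global feature flags (assumed to be booleans here). */
static bool iommu_intremap = true;
static bool iommu_intpost = true;

/*
 * Hypothetical helper: posting depends on remapping, so turning
 * remapping off always turns posting off as well.
 */
static void disable_intremap(void)
{
    iommu_intremap = false;
    iommu_intpost = false;
}

/* Posting can be turned off on its own, e.g. when the capability bit is 0. */
static void disable_intpost(void)
{
    iommu_intpost = false;
}

/* Callers rely on this: iommu_intpost set implies iommu_intremap set. */
static bool intpost_usable(void)
{
    assert(!iommu_intpost || iommu_intremap);
    return iommu_intpost;
}
```

With this shape, the repeated "iommu_intremap = 0; iommu_intpost = 0;" pairs
collapse into a single call, and the implication is checked in one place.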

> 
> Out of interest, which platforms have intpost capabilities?

Another Broadwell platform, not launched yet.

Thanks,
Feng

> 
> ~Andrew
> 
> > +            iommu_intpost = 0;
> >
> >           if ( !vtd_ept_page_compatible(iommu) )
> >               iommu_hap_pt_share = 0;
> > @@ -2164,6 +2172,7 @@ int __init intel_vtd_setup(void)
> >       if ( !iommu_qinval && iommu_intremap )
> >       {
> >           iommu_intremap = 0;
> > +        iommu_intpost = 0;
> >           dprintk(XENLOG_WARNING VTDPREFIX, "Interrupt Remapping disabled "
> >               "since Queued Invalidation isn't supported or enabled.\n");
> >       }
> > @@ -2173,6 +2182,7 @@ int __init intel_vtd_setup(void)
> >       P(iommu_passthrough, "Dom0 DMA Passthrough");
> >       P(iommu_qinval, "Queued Invalidation");
> >       P(iommu_intremap, "Interrupt Remapping");
> > +    P(iommu_intpost, "Posted Interrupt");
> >       P(iommu_hap_pt_share, "Shared EPT tables");
> >   #undef P
> >
> > @@ -2192,6 +2202,7 @@ int __init intel_vtd_setup(void)
> >       iommu_passthrough = 0;
> >       iommu_qinval = 0;
> >       iommu_intremap = 0;
> > +    iommu_intpost = 0;
> >       return ret;
> >   }
> >
> > diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> > index d6e6520..42047e0 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.h
> > +++ b/xen/drivers/passthrough/vtd/iommu.h
> > @@ -69,6 +69,7 @@
> >   /*
> >    * Decoding Capability Register
> >    */
> > +#define cap_intr_post(c)       (((c) >> 59) & 1)
> >   #define cap_read_drain(c)      (((c) >> 55) & 1)
> >   #define cap_write_drain(c)     (((c) >> 54) & 1)
> >   #define cap_max_amask_val(c)   (((c) >> 48) & 0x3f)

* Re: [RFC v1 03/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  2015-03-26 18:37   ` Andrew Cooper
@ 2015-03-27  1:32     ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  1:32 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 2:37 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 03/15] vmx: Extend struct pi_desc to support
> VT-d Posted-Interrupts
> 
> On 25/03/15 12:31, Feng Wu wrote:
> > Extend struct pi_desc according to VT-d Posted-Interrupts Spec.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   xen/include/asm-x86/hvm/vmx/vmcs.h | 16 ++++++++++++++--
> >   1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> > index 6fce6aa..9631461 100644
> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> > @@ -76,8 +76,20 @@ struct vmx_domain {
> >
> >   struct pi_desc {
> >       DECLARE_BITMAP(pir, NR_VECTORS);
> > -    u32 control;
> > -    u32 rsvd[7];
> > +    union {
> > +        struct
> > +        {
> > +        u64 on     : 1,
> 
> Could you put a comment on each line giving the non-abbreviated name for
> the fields, similar to ept_entry_t.

Good suggestion!

> 
> > +            sn     : 1,
> > +            rsvd_1 : 13,
> > +            ndm    : 1,
> 
> Which revision of the spec is this from?  The latest spec linked from
> your covering letter still doesn't identify this 'ndm' field.

Oh, 'ndm' meant 'Notification Destination Mode' in an earlier version of the Spec.
However, in the latest Spec, the notification event is forced to be delivered in
physical mode, so this field is no longer needed. In fact, my patch also sets 'ndm'
to physical mode. Anyway, I will change it. Thanks for spotting this.
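
A possible shape for the descriptor once 'ndm' is dropped and each field
carries its full name, offered as a sketch rather than the final patch. The
struct name pi_desc_sketch is hypothetical, the 'ndm' bit is assumed to fold
back into the reserved range, and the bit positions rely on GCC/Clang on
x86-64 allocating bit-fields from the least significant bit:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the descriptor without 'ndm'. Bit-field order is
 * implementation-defined; this assumes GCC/Clang on x86-64.
 */
struct pi_desc_sketch {
    uint64_t pir[4];              /* Posted Interrupt Requests, one bit per vector */
    union {
        struct {
            uint64_t on     : 1,  /* Outstanding Notification */
                     sn     : 1,  /* Suppress Notification */
                     rsvd_1 : 14, /* Reserved (absorbs the former 'ndm' bit) */
                     nv     : 8,  /* Notification Vector */
                     rsvd_2 : 8,  /* Reserved */
                     ndst   : 32; /* Notification Destination */
        };
        uint64_t control;         /* Raw view of the fields above */
    };
    uint32_t rsvd[6];             /* Pads the descriptor to 64 bytes */
} __attribute__((aligned(64)));

_Static_assert(sizeof(struct pi_desc_sketch) == 64,
               "descriptor must occupy exactly 64 bytes");
```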

Thanks,
Feng

> 
> ~Andrew
> 
> > +            nv     : 8,
> > +            rsvd_2 : 8,
> > +            ndst   : 32;
> > +        };
> > +        u64 control;
> > +    };
> > +    u32 rsvd[6];
> >   } __attribute__ ((aligned (64)));
> >
> >   #define ept_get_wl(ept)   ((ept)->ept_wl)

* Re: [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-03-26 18:53   ` Andrew Cooper
@ 2015-03-27  1:45     ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  1:45 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 2:54 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts
> Descriptor
> 
> On 25/03/15 12:31, Feng Wu wrote:
> > This patch initializes the VT-d Posted-interrupt Descriptor.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   xen/arch/x86/hvm/vmx/vmcs.c       |  3 +++
> >   xen/include/asm-x86/hvm/vmx/vmx.h | 21 ++++++++++++++++++++-
> >   2 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index d614638..942f4b7 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -1004,6 +1004,9 @@ static int construct_vmcs(struct vcpu *v)
> >
> >       if ( cpu_has_vmx_posted_intr_processing )
> >       {
> > +        if ( iommu_intpost == 1 )
> 
> Being a boolean, please shorten this to "if ( iommu_intpost )"
> 
> > +            pi_desc_init(v);
> > +
> >           __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
> >           __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
> >       }
> > diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> > index ecc5e17..3cd75eb 100644
> > --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> > @@ -28,6 +28,9 @@
> >   #include <asm/hvm/support.h>
> >   #include <asm/hvm/trace.h>
> >   #include <asm/hvm/vmx/vmcs.h>
> > +#include <asm/apic.h>
> > +
> > +extern uint8_t posted_intr_vector;
> >
> >   typedef union {
> >       struct {
> > @@ -146,6 +149,23 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
> >       return xchg(&pi_desc->pir[group], 0);
> >   }
> >
> > +static inline void pi_desc_init(struct vcpu *v)
> > +{
> > +    uint32_t dest;
> > +
> > +    pi_clear_sn(&v->arch.hvm_vmx.pi_desc);
> 
> This is called during construct_vmcs().  You safely rely on the fact
> that all memory is safely zeroed, so you don't need this clear bit, nor
> the lower setting of ndm.

Okay.

> 
> > +    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
> > +
> > +    /* Physical mode for Notificaiton Event */
> > +    v->arch.hvm_vmx.pi_desc.ndm = 0;
> > +    dest = cpu_physical_id(v->processor);
> > +
> > +    if ( x2apic_enabled )
> > +        v->arch.hvm_vmx.pi_desc.ndst = dest;
> > +    else
> > +        v->arch.hvm_vmx.pi_desc.ndst = (dest << 8) & 0xFF00;
> 
> What takes care of ensuring that this ndst field gets updated whenever
> v->processor changes?
> 

This is done in patch [13/15]

Thanks,
Feng

> ~Andrew
> 
> > +}
> > +
> >   /*
> >    * Exit Reasons
> >    */
> > @@ -265,7 +285,6 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
> >   #define MODRM_EAX_ECX   ".byte 0xc1\n" /* EAX, ECX */
> >
> >   extern u64 vmx_ept_vpid_cap;
> > -extern uint8_t posted_intr_vector;
> >
> >   #define cpu_has_vmx_ept_exec_only_supported        \
> >       (vmx_ept_vpid_cap & VMX_EPT_EXEC_ONLY_SUPPORTED)

* Re: [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-03-26 19:29   ` Konrad Rzeszutek Wilk
@ 2015-03-27  1:45     ` Wu, Feng
  2015-05-04  5:32     ` Wu, Feng
  1 sibling, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  1:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Wu, Feng, xen-devel, JBeulich, Zhang, Yang Z, keir



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Friday, March 27, 2015 3:29 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Zhang, Yang Z; Tian, Kevin; keir@xen.org;
> JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts
> Descriptor
> 
> On Wed, Mar 25, 2015 at 08:31:47PM +0800, Feng Wu wrote:
> > This patch initializes the VT-d Posted-interrupt Descriptor.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vmcs.c       |  3 +++
> >  xen/include/asm-x86/hvm/vmx/vmx.h | 21 ++++++++++++++++++++-
> >  2 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index d614638..942f4b7 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -1004,6 +1004,9 @@ static int construct_vmcs(struct vcpu *v)
> >
> >      if ( cpu_has_vmx_posted_intr_processing )
> >      {
> > +        if ( iommu_intpost == 1 )
> 
> if ( iommu_intpost )
> 	.. ?
> > +            pi_desc_init(v);
> > +
> >          __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
> >          __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
> >      }
> > diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> > index ecc5e17..3cd75eb 100644
> > --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> > @@ -28,6 +28,9 @@
> >  #include <asm/hvm/support.h>
> >  #include <asm/hvm/trace.h>
> >  #include <asm/hvm/vmx/vmcs.h>
> > +#include <asm/apic.h>
> > +
> > +extern uint8_t posted_intr_vector;
> >
> >  typedef union {
> >      struct {
> > @@ -146,6 +149,23 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
> >      return xchg(&pi_desc->pir[group], 0);
> >  }
> >
> > +static inline void pi_desc_init(struct vcpu *v)
> > +{
> > +    uint32_t dest;
> > +
> > +    pi_clear_sn(&v->arch.hvm_vmx.pi_desc);
> > +    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
> > +
> > +    /* Physical mode for Notificaiton Event */
> 
> s/Notificaiton/Notification/
> > +    v->arch.hvm_vmx.pi_desc.ndm = 0;
> > +    dest = cpu_physical_id(v->processor);
> > +
> > +    if ( x2apic_enabled )
> > +        v->arch.hvm_vmx.pi_desc.ndst = dest;
> > +    else
> > +        v->arch.hvm_vmx.pi_desc.ndst = (dest << 8) & 0xFF00;
> 
> Surely there are some macros for that?

Okay, I will find them.
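
Until the right existing macros are identified, the encoding could be wrapped
along these lines. This is only a sketch: PI_XAPIC_NDST_SHIFT,
PI_XAPIC_NDST_MASK and pi_ndst_encode() are hypothetical names, and the
values simply mirror the open-coded "(dest << 8) & 0xFF00" in the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: in xAPIC mode the APIC ID is placed in bits 15:8. */
#define PI_XAPIC_NDST_SHIFT 8
#define PI_XAPIC_NDST_MASK  0xFF00u

/* Encode the notification destination for the descriptor's 'ndst' field. */
static inline uint32_t pi_ndst_encode(uint32_t apic_id, bool x2apic)
{
    if ( x2apic )
        return apic_id;         /* x2APIC: the full 32-bit ID is used as-is */

    /* xAPIC: shift the 8-bit ID into place and drop anything above it. */
    return (apic_id << PI_XAPIC_NDST_SHIFT) & PI_XAPIC_NDST_MASK;
}
```

The same helper could then be reused wherever 'ndst' has to be refreshed
after the vCPU moves to another pCPU.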

Thanks,
Feng

> > +}
> > +
> >  /*
> >   * Exit Reasons
> >   */
> > @@ -265,7 +285,6 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
> >  #define MODRM_EAX_ECX   ".byte 0xc1\n" /* EAX, ECX */
> >
> >  extern u64 vmx_ept_vpid_cap;
> > -extern uint8_t posted_intr_vector;
> >
> >  #define cpu_has_vmx_ept_exec_only_supported        \
> >      (vmx_ept_vpid_cap & VMX_EPT_EXEC_ONLY_SUPPORTED)
> > --
> > 2.1.0

* Re: [RFC v1 06/15] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  2015-03-26 19:00   ` Andrew Cooper
@ 2015-03-27  1:53     ` Wu, Feng
  2015-03-27  9:58       ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  1:53 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 3:01 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 06/15] vt-d: Extend struct iremap_entry to
> support VT-d Posted-Interrupts
> 
> On 25/03/15 12:31, Feng Wu wrote:
> > Extend struct iremap_entry according to VT-d Posted-Interrupts Spec.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   xen/drivers/passthrough/vtd/iommu.h | 19 +++++++++++++++++++
> >   1 file changed, 19 insertions(+)
> >
> > diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> > index 42047e0..cd61e12 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.h
> > +++ b/xen/drivers/passthrough/vtd/iommu.h
> > @@ -303,6 +303,18 @@ struct iremap_entry {
> >               res_2   : 8,
> >               dst     : 32;
> >       }lo;
> > +    struct {
> > +        u64 p       : 1,
> > +            fpd     : 1,
> > +            res_1   : 6,
> > +            avail   : 4,
> > +            res_2   : 2,
> > +            urg     : 1,
> > +            im      : 1,
> > +            vector  : 8,
> > +            res_3   : 14,
> > +            pda_l   : 26;
> > +    }lo_intpost;
> >     };
> >     union {
> >       u64 hi_val;
> > @@ -312,6 +324,13 @@ struct iremap_entry {
> >               svt     : 2,
> >               res_1   : 44;
> >       }hi;
> > +    struct {
> > +        u64 sid     : 16,
> > +            sq      : 2,
> > +            svt     : 2,
> > +            res_1   : 12,
> > +            pda_h   : 32;
> > +    }hi_intpost;
> 
> I would prefer if this union was reformatted as I suggested in the
> thread from your design doc, but I won't insist on it as a blocker to entry.

Thanks for the comments. I also considered your suggestion on the design
doc. Here is your proposal:

struct iremap_entry {
    union {
        struct { u64 lo, hi; };
        struct { <bitfields> } norm; (names subject to improvement)
        struct { <bitfields> } post;
    };
};

It seems that this way we would need to change some existing code to adapt to
this new structure. I am okay with either approach, but can we hear
from some others first? Is that okay with you?
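
For reference, the proposal could be fleshed out roughly as follows. This is
a sketch under assumptions, not the agreed layout: the struct name is
hypothetical, the 'post' fields are copied from this patch, the 'norm' fields
follow my reading of the remapped IRTE format, and the bit positions assume
GCC/Clang on x86-64:

```c
#include <stdint.h>

struct iremap_entry_sketch {
    union {
        struct { uint64_t lo, hi; };  /* Raw 128-bit view */
        struct {                      /* Remapped interrupt format */
            uint64_t p      : 1,  /* Present */
                     fpd    : 1,  /* Fault Processing Disable */
                     dm     : 1,  /* Destination Mode */
                     rh     : 1,  /* Redirection Hint */
                     tm     : 1,  /* Trigger Mode */
                     dlm    : 3,  /* Delivery Mode */
                     avail  : 4,  /* Available to software */
                     res_1  : 4,  /* Reserved */
                     vector : 8,  /* Interrupt Vector */
                     res_2  : 8,  /* Reserved */
                     dst    : 32; /* Destination ID */
            uint64_t sid    : 16, /* Source ID */
                     sq     : 2,  /* Source-ID Qualifier */
                     svt    : 2,  /* Source Validation Type */
                     res_3  : 44; /* Reserved */
        } norm;
        struct {                      /* Posted interrupt format */
            uint64_t p      : 1,  /* Present */
                     fpd    : 1,  /* Fault Processing Disable */
                     res_1  : 6,  /* Reserved */
                     avail  : 4,  /* Available to software */
                     res_2  : 2,  /* Reserved */
                     urg    : 1,  /* Urgent */
                     im     : 1,  /* IRTE Mode (1 = posted) */
                     vector : 8,  /* Virtual Vector */
                     res_3  : 14, /* Reserved */
                     pda_l  : 26; /* Posted Descriptor Address, low bits */
            uint64_t sid    : 16, /* Source ID */
                     sq     : 2,  /* Source-ID Qualifier */
                     svt    : 2,  /* Source Validation Type */
                     res_4  : 12, /* Reserved */
                     pda_h  : 32; /* Posted Descriptor Address, high bits */
        } post;
    };
};

_Static_assert(sizeof(struct iremap_entry_sketch) == 16,
               "an IRTE is 128 bits wide");
```

Existing users of lo_val/hi_val and the lo/hi sub-structs would have to be
renamed to match, which is the extra churn mentioned above.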

Thanks,
Feng

> 
> Please however name each of the fields with a comment.
> 
> ~Andrew
> 
> >     };
> >   };
> >

* Re: [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used
  2015-03-26 19:36   ` Konrad Rzeszutek Wilk
@ 2015-03-27  1:59     ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  1:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Wu, Feng, xen-devel, JBeulich, Zhang, Yang Z, keir



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Friday, March 27, 2015 3:36 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Zhang, Yang Z; Tian, Kevin; keir@xen.org;
> JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d
> PI is used
> 
> On Wed, Mar 25, 2015 at 08:31:49PM +0800, Feng Wu wrote:
> > This patch adds an API which is used to update the IRTE
> > for posted-interrupt when guest changes MSI/MSI-X information.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/drivers/passthrough/vtd/intremap.c | 83 ++++++++++++++++++++++++++++++++++
> >  xen/drivers/passthrough/vtd/iommu.h    |  3 ++
> >  xen/include/asm-x86/iommu.h            |  2 +
> >  3 files changed, 88 insertions(+)
> >
> > diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
> > index 0333686..f44e74d 100644
> > --- a/xen/drivers/passthrough/vtd/intremap.c
> > +++ b/xen/drivers/passthrough/vtd/intremap.c
> > @@ -898,3 +898,86 @@ void iommu_disable_x2apic_IR(void)
> >      for_each_drhd_unit ( drhd )
> >          disable_qinval(drhd->iommu);
> >  }
> > +
> > +/*
> > + * This function is used to update the IRTE for posted-interrupt
> > + * when guest changes MSI/MSI-X information
> 
> Missing period at the end.
> 
> > + */
> > +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec )
> 
> There is an extra space at the end?
> > +{
> > +    struct irq_desc *desc;
> > +    struct msi_desc *msi_desc;
> > +    int remap_index, rc = -1;
> > +    struct pci_dev *pci_dev;
> > +    struct acpi_drhd_unit *drhd;
> > +    struct iommu *iommu;
> > +    struct ir_ctrl *ir_ctrl;
> > +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> > +    struct iremap_entry new_ire;
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +    unsigned long flags;
> > +
> > +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> > +    if ( !desc )
> > +        return -1;
> > +
> > +    msi_desc = desc->msi_desc;
> > +    if ( !msi_desc )
> > +        goto unlock_out;
> > +
> > +    remap_index = msi_desc->remap_index;
> 
> Could you move this right below the 'goto' check? No point
> of doing this if pci_dev is NULL.
> 
> > +    pci_dev = msi_desc->dev;
> > +    if ( !pci_dev )
> > +        goto unlock_out;
> > +
> > +    drhd = acpi_find_matched_drhd_unit(pci_dev);
> > +    if (!drhd)
> 
> Missing spaces around !drhd.
> 
> > +    {
> > +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get drhd!\n");
> 
> Perhaps a bit more data. Can you include the pci_dev BDF as well?
> 
> > +        goto unlock_out;
> > +    }
> > +
> > +    iommu = drhd->iommu;
> > +    ir_ctrl = iommu_ir_ctrl(iommu);
> > +    if ( !ir_ctrl )
> > +    {
> > +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get ir_ctrl!\n");
> 
> .. for IOMMU (with some data that can help diagnose the issue when one
> boots with 'iommu=verbose') please.
> 
> > +        goto unlock_out;
> > +    }
> > +
> > +    spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
> > +
> > +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
> > +
> > +    memcpy(&new_ire, p, sizeof(struct iremap_entry));
> > +
> > +    /* Setup/Update interrupt remapping table entry */
> 
> Missing period at the end.
> > +    new_ire.lo_intpost.urg = 0;
> > +    new_ire.lo_intpost.vector = gvec;
> > +    new_ire.lo_intpost.pda_l = (((u64)virt_to_maddr(pi_desc)) >>
> > +                                (32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
> > +    new_ire.hi_intpost.pda_h = (((u64)virt_to_maddr(pi_desc)) >>  32) &
> 
> You have an extra space after >>
> > +                                ~(-1UL << PDA_HIGH_BIT);
> 
> You can make ~(-1UL << PDA_XX_BIT) and macro here..

Thanks for all the comments!
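
One way the mask macro and the address split could look, as a sketch only.
LOW_BITS_MASK(), pda_low() and pda_high() are hypothetical names; the shift
amounts and the PDA_LOW_BIT/PDA_HIGH_BIT values are taken from the patch, and
the descriptor address is assumed to be 64-byte aligned:

```c
#include <stdint.h>

#define PDA_LOW_BIT   26
#define PDA_HIGH_BIT  32

/* Hypothetical helper: mask covering the low n bits, valid for 0 < n < 64. */
#define LOW_BITS_MASK(n)  (~(~0ULL << (n)))

/* Bits 31:6 of the descriptor's machine address go into pda_l. */
static inline uint32_t pda_low(uint64_t maddr)
{
    return (uint32_t)((maddr >> (32 - PDA_LOW_BIT)) &
                      LOW_BITS_MASK(PDA_LOW_BIT));
}

/* Bits 63:32 of the descriptor's machine address go into pda_h. */
static inline uint32_t pda_high(uint64_t maddr)
{
    return (uint32_t)((maddr >> 32) & LOW_BITS_MASK(PDA_HIGH_BIT));
}
```

For a 64-byte aligned address, shifting pda_l left by 6 and pda_h left by 32
and OR-ing the two gives back the original value.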

Thanks,
Feng

> 
> > +
> > +    new_ire.lo_intpost.res_1 = 0;
> > +    new_ire.lo_intpost.res_2 = 0;
> > +    new_ire.lo_intpost.res_3 = 0;
> > +    new_ire.hi_intpost.res_1 = 0;
> > +
> > +    new_ire.lo_intpost.im = 1;
> > +
> > +    memcpy(p, &new_ire, sizeof(struct iremap_entry));
> > +    iommu_flush_cache_entry(p, sizeof(struct iremap_entry));
> > +    iommu_flush_iec_index(iommu, 0, remap_index);
> > +
> > +    if ( iremap_entries )
> > +        unmap_vtd_domain_page(iremap_entries);
> > +
> > +    spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
> > +
> > +    rc = 0;
> > + unlock_out:
> > +    spin_unlock_irq(&desc->lock);
> > +
> > +    return rc;
> > +}
> > diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> > index cd61e12..ffa72c8 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.h
> > +++ b/xen/drivers/passthrough/vtd/iommu.h
> > @@ -334,6 +334,9 @@ struct iremap_entry {
> >    };
> >  };
> >
> > +#define PDA_LOW_BIT    26
> > +#define PDA_HIGH_BIT   32
> > +
> >  /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
> >  #define IREMAP_PAGE_ORDER  8
> >
> > diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> > index e7a65da..d233621 100644
> > --- a/xen/include/asm-x86/iommu.h
> > +++ b/xen/include/asm-x86/iommu.h
> > @@ -32,6 +32,8 @@ int iommu_supports_eim(void);
> >  int iommu_enable_x2apic_IR(void);
> >  void iommu_disable_x2apic_IR(void);
> >
> > +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec);
> > +
> >  #endif /* !__ARCH_X86_IOMMU_H__ */
> >  /*
> >   * Local variables:
> > --
> > 2.1.0
> >

* Re: [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used
  2015-03-26 19:17   ` Andrew Cooper
@ 2015-03-27  2:13     ` Wu, Feng
  2015-03-27 10:02       ` Jan Beulich
  2015-03-27  4:52     ` Wu, Feng
  1 sibling, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  2:13 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 3:17 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d
> PI is used
> 
> On 25/03/15 12:31, Feng Wu wrote:
> > This patch adds an API which is used to update the IRTE
> > for posted-interrupt when guest changes MSI/MSI-X information.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   xen/drivers/passthrough/vtd/intremap.c | 83 ++++++++++++++++++++++++++++++++++
> >   xen/drivers/passthrough/vtd/iommu.h    |  3 ++
> >   xen/include/asm-x86/iommu.h            |  2 +
> >   3 files changed, 88 insertions(+)
> >
> > diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
> > index 0333686..f44e74d 100644
> > --- a/xen/drivers/passthrough/vtd/intremap.c
> > +++ b/xen/drivers/passthrough/vtd/intremap.c
> > @@ -898,3 +898,86 @@ void iommu_disable_x2apic_IR(void)
> >       for_each_drhd_unit ( drhd )
> >           disable_qinval(drhd->iommu);
> >   }
> > +
> > +/*
> > + * This function is used to update the IRTE for posted-interrupt
> > + * when guest changes MSI/MSI-X information
> > + */
> > +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec )
> 
> Stray space after gvec.
> 
> I presume gvec means "guest vector", in which case it should be a uint8_t.
> 
> > +{
> > +    struct irq_desc *desc;
> > +    struct msi_desc *msi_desc;
> > +    int remap_index, rc = -1;
> > +    struct pci_dev *pci_dev;
> > +    struct acpi_drhd_unit *drhd;
> > +    struct iommu *iommu;
> > +    struct ir_ctrl *ir_ctrl;
> > +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> > +    struct iremap_entry new_ire;
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +    unsigned long flags;
> > +
> > +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> > +    if ( !desc )
> > +        return -1;
> > +
> > +    msi_desc = desc->msi_desc;
> > +    if ( !msi_desc )
> > +        goto unlock_out;
> > +
> > +    remap_index = msi_desc->remap_index;
> > +    pci_dev = msi_desc->dev;
> > +    if ( !pci_dev )
> > +        goto unlock_out;
> > +
> > +    drhd = acpi_find_matched_drhd_unit(pci_dev);
> 
> This does an O(n^2) walk over the dhrd_units list and the unit
> device_cnt.  As the layout will be completely static, might it be better
> to stash drhd information in struct pci_dev during boot?
> 
> > +    if (!drhd)
> 
> Spaces inside brackets.
> 
> > +    {
> > +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get drhd!\n");
> 
> This is a useless error message, providing no useful information to
> identify the issue.
> 
> At the very least it should identify the domain and vcpu (%pv of v), the
> guest vector and pci device.  Also, drop the exclamation mark - it is
> not useful in the log.
> 
> > +        goto unlock_out;
> > +    }
> > +
> > +    iommu = drhd->iommu;
> > +    ir_ctrl = iommu_ir_ctrl(iommu);
> > +    if ( !ir_ctrl )
> > +    {
> > +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get ir_ctrl!\n");
> 
> Needs similarly extending with some information.
> 
> > +        goto unlock_out;
> > +    }
> > +
> > +    spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
> > +
> > +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
> > +
> > +    memcpy(&new_ire, p, sizeof(struct iremap_entry));
> 
> sizeof(new_ire) please
> 
> > +
> > +    /* Setup/Update interrupt remapping table entry */
> > +    new_ire.lo_intpost.urg = 0;
> > +    new_ire.lo_intpost.vector = gvec;
> > +    new_ire.lo_intpost.pda_l = (((u64)virt_to_maddr(pi_desc)) >>
> > +                                (32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
> > +    new_ire.hi_intpost.pda_h = (((u64)virt_to_maddr(pi_desc)) >>  32) &
> > +                                ~(-1UL << PDA_HIGH_BIT);
> 
> Is it possible to hide this bit manipulation inside a static inline
> function which takes an ire and pi_desc pointer rather than open coding it?
> 
> > +
> > +    new_ire.lo_intpost.res_1 = 0;
> > +    new_ire.lo_intpost.res_2 = 0;
> > +    new_ire.lo_intpost.res_3 = 0;
> > +    new_ire.hi_intpost.res_1 = 0;
> > +
> > +    new_ire.lo_intpost.im = 1;
> > +
> > +    memcpy(p, &new_ire, sizeof(struct iremap_entry));
> > +    iommu_flush_cache_entry(p, sizeof(struct iremap_entry));
> 
> Same here regarding sizeof.
> 
> Furthermore, is the memcpy() safe to update the live descriptor?
> 
> If it is, why do you not update the descriptor in place using p-> rather
> than copying it to the stack and back?

There are some examples in the current code of updating an IRTE this way, such as
ioapic_rte_to_remap_entry(), and I tried to follow that here.

Thanks,
Feng

> 
> ~Andrew
> 
> > +    iommu_flush_iec_index(iommu, 0, remap_index);
> > +
> > +    if ( iremap_entries )
> > +        unmap_vtd_domain_page(iremap_entries);
> > +
> > +    spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
> > +
> > +    rc = 0;
> > + unlock_out:
> > +    spin_unlock_irq(&desc->lock);
> > +
> > +    return rc;
> > +}
> > diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> > index cd61e12..ffa72c8 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.h
> > +++ b/xen/drivers/passthrough/vtd/iommu.h
> > @@ -334,6 +334,9 @@ struct iremap_entry {
> >     };
> >   };
> >
> > +#define PDA_LOW_BIT    26
> > +#define PDA_HIGH_BIT   32
> > +
> >   /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
> >   #define IREMAP_PAGE_ORDER  8
> >
> > diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> > index e7a65da..d233621 100644
> > --- a/xen/include/asm-x86/iommu.h
> > +++ b/xen/include/asm-x86/iommu.h
> > @@ -32,6 +32,8 @@ int iommu_supports_eim(void);
> >   int iommu_enable_x2apic_IR(void);
> >   void iommu_disable_x2apic_IR(void);
> >
> > +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec);
> > +
> >   #endif /* !__ARCH_X86_IOMMU_H__ */
> >   /*
> >    * Local variables:

* Re: [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling
  2015-03-26 20:16   ` Andrew Cooper
@ 2015-03-27  2:59     ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  2:59 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 4:16 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 13/15] Update Posted-Interrupts Descriptor
> during vCPU scheduling
> 
> On 25/03/15 12:31, Feng Wu wrote:
> > The basic idea here is:
> > 1. When vCPU's state is RUNSTATE_running,
> >          - set 'NV' to 'Notification Vector'.
> >          - Clear 'SN' to accept PI.
> >          - set 'NDST' to the right pCPU.
> > 2. When vCPU's state is RUNSTATE_blocked,
> >          - set 'NV' to 'Wake-up Vector', so we can wake up the
> >            related vCPU when posted-interrupt happens for it.
> >          - Clear 'SN' to accept PI.
> > 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> >          - Set 'SN' to suppress non-urgent interrupts.
> >            (Current, we only support non-urgent interrupts)
> >          - Set 'NV' back to 'Notification Vector' if needed.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   xen/arch/x86/hvm/vmx/vmx.c | 108 +++++++++++++++++++++++++++++++++++++++++++++
> >   xen/common/schedule.c      |   3 ++
> >   2 files changed, 111 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index b30392c..6323bd6 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1710,6 +1710,113 @@ static void vmx_handle_eoi(u8 vector)
> >       __vmwrite(GUEST_INTR_STATUS, status);
> >   }
> >
> > +static void vmx_pi_desc_update(struct vcpu *v, int new_state)
> > +{
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +    struct pi_desc old, new;
> > +    int old_state = v->runstate.state;
> > +    unsigned long flags;
> > +
> > +    if ( !iommu_intpost )
> > +        return;
> > +
> > +    switch ( new_state )
> > +    {
> > +    case RUNSTATE_runnable:
> > +    case RUNSTATE_offline:
> > +        /*
> > +         * We don't need to send notification event to a non-running
> > +         * vcpu, the interrupt information will be delivered to it before
> > +         * VM-ENTRY when the vcpu is scheduled to run next time.
> > +         */
> > +        pi_set_sn(pi_desc);
> > +
> > +        /*
> > +         * If the state is transferred from RUNSTATE_blocked,
> > +         * we should set 'NV' feild back to posted_intr_vector,
> > +         * so the Posted-Interrupts can be delivered to the vCPU
> > +         * by VT-d HW after it is scheduled to run.
> > +         */
> > +        if ( old_state == RUNSTATE_blocked )
> > +        {
> > +            do
> > +            {
> > +                old.control = new.control = pi_desc->control;
> > +                new.nv = posted_intr_vector;
> > +            }
> > +            while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                    != old.control );
> > +
> > +           /*
> > +            * Delete the vCPU from the related wakeup queue
> > +            * if we are resuming from blocked state
> > +            */
> > +           spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                             v->processor), flags);
> > +           list_del(&v->blocked_vcpu_list);
> 
> The scheduler is perfectly able to change v->processor behind your back,
> so your spinlocks are not protecting access to v->blocked_vcpu_list.
> 

Oh, do you mean the following?

1) We store the vCPU on the list of v->processor when the vCPU is blocked.
2) The scheduler changes v->processor.
3) Oops, when the vCPU is unblocked, v->processor no longer identifies the
   pCPU whose list (and lock) actually holds the vCPU.

I think I can use the same method as I am using for KVM. I can introduce a new
member v->blocked_cpu which stores the pCPU the vCPU blocked on, put the vCPU on
that pCPU's list, and the scheduler does not change this variable.
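
To make the idea concrete, here is a minimal standalone sketch, not the
actual patch. Everything in it is a stand-in: struct vcpu is reduced to the
fields used here, blocked_cpu is the hypothetical new member, the helper
names are made up, and the per-pCPU spinlocks are only indicated by comments.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

struct list_node {
    struct list_node *prev, *next;
};

/* Stand-in for Xen's struct vcpu; only the fields needed for the sketch. */
struct vcpu {
    int processor;                 /* may be changed by the scheduler */
    int blocked_cpu;               /* hypothetical: pCPU whose list holds us, -1 if none */
    struct list_node blocked_link; /* link in that pCPU's blocked list */
};

/* Stand-ins for per_cpu(blocked_vcpu_on_cpu, cpu), plus a counter for checking. */
static struct list_node blocked_list[NR_CPUS];
static int blocked_count[NR_CPUS];

static void blocked_lists_init(void)
{
    for ( int i = 0; i < NR_CPUS; i++ )
    {
        blocked_list[i].prev = blocked_list[i].next = &blocked_list[i];
        blocked_count[i] = 0;
    }
}

static void vcpu_block(struct vcpu *v)
{
    int cpu = v->processor;        /* sample v->processor exactly once */
    struct list_node *head;

    assert(v->blocked_cpu < 0);
    assert(cpu >= 0 && cpu < NR_CPUS);
    head = &blocked_list[cpu];

    /* Real code would take the lock belonging to 'cpu' here. */
    v->blocked_link.prev = head->prev;
    v->blocked_link.next = head;
    head->prev->next = &v->blocked_link;
    head->prev = &v->blocked_link;
    blocked_count[cpu]++;
    v->blocked_cpu = cpu;          /* remember which list (and lock) we used */
    /* ... and drop the lock belonging to 'cpu' here. */
}

static void vcpu_unblock(struct vcpu *v)
{
    int cpu = v->blocked_cpu;      /* NOT v->processor, which may have moved */

    if ( cpu < 0 )
        return;

    /* Real code would take the lock belonging to 'cpu' here. */
    v->blocked_link.prev->next = v->blocked_link.next;
    v->blocked_link.next->prev = v->blocked_link.prev;
    blocked_count[cpu]--;
    v->blocked_cpu = -1;
    /* ... and drop the lock belonging to 'cpu' here. */
}
```

With this, the lock taken on unblock is always the one that protected the
insertion, even if v->processor changed in between.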

Thanks,
Feng


> ~Andrew
> 
> > +           spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                                  v->processor), flags);
> > +        }
> > +        break;
> > +
> > +    case RUNSTATE_blocked:
> > +        /*
> > +         * The vCPU is blocked on the wait queue.
> > +         * Store the blocked vCPU on the list of the
> > +         * vcpu->wakeup_cpu, which is the destination
> > +         * of the wake-up notification event.
> > +         */
> > +        spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                          v->processor), flags);
> > +        list_add_tail(&v->blocked_vcpu_list,
> > +                      &per_cpu(blocked_vcpu_on_cpu, v->processor));
> > +        spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                               v->processor), flags);
> > +
> > +        do
> > +        {
> > +            old.control = new.control = pi_desc->control;
> > +
> > +            /*
> > +             * We should not block the vCPU if
> > +             * an interrupt is posted for it.
> > +             */
> > +
> > +            if ( pi_test_on(&old) == 1 )
> > +            {
> > +                tasklet_schedule(&v->vcpu_wakeup_tasklet);
> > +                return;
> > +            }
> > +
> > +            pi_clear_sn(&new);
> > +            new.nv = pi_wakeup_vector;
> > +        }
> > +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                != old.control );
> > +        break;
> > +
> > +    case RUNSTATE_running:
> > +        ASSERT( pi_test_sn(pi_desc) == 1 );
> > +
> > +        do
> > +        {
> > +            old.control = new.control = pi_desc->control;
> > +            if ( x2apic_enabled )
> > +                new.ndst = cpu_physical_id(v->processor);
> > +            else
> > +                new.ndst = (cpu_physical_id(v->processor) << 8) &
> 0xFF00;
> > +
> > +            pi_clear_sn(&new);
> > +        }
> > +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                != old.control );
> > +        break;
> > +
> > +    default:
> > +        break;
> > +    }
> > +}
> > +
> >   void vmx_hypervisor_cpuid_leaf(uint32_t sub_idx,
> >                                  uint32_t *eax, uint32_t *ebx,
> >                                  uint32_t *ecx, uint32_t *edx)
> > @@ -1795,6 +1902,7 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
> >       .process_isr          = vmx_process_isr,
> >       .deliver_posted_intr  = vmx_deliver_posted_intr,
> >       .sync_pir_to_irr      = vmx_sync_pir_to_irr,
> > +    .pi_desc_update       = vmx_pi_desc_update,
> >       .handle_eoi           = vmx_handle_eoi,
> >       .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
> >       .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index ef79847..acf3186 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -157,6 +157,9 @@ static inline void vcpu_runstate_change(
> >           v->runstate.state_entry_time = new_entry_time;
> >       }
> >
> > +    if ( is_hvm_vcpu(v) && hvm_funcs.pi_desc_update )
> > +        hvm_funcs.pi_desc_update(v, new_state);
> > +
> >       v->runstate.state = new_state;
> >   }
> >

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 14/15] Suppress posting interrupts when 'SN' is set
  2015-03-26 20:34   ` Andrew Cooper
@ 2015-03-27  3:00     ` Wu, Feng
  2015-03-27 12:06       ` Andrew Cooper
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  3:00 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 4:35 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 14/15] Suppress posting interrupts when 'SN' is
> set
> 
> On 25/03/15 12:31, Feng Wu wrote:
> > Currently, we don't support urgent interrupt, all interrupts
> > are recognized as non-urgent interrupt, so we cannot send
> > posted-interrupt when 'SN' is set.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   xen/arch/x86/hvm/vmx/vmx.c | 13 ++++++++++++-
> >   1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index 6323bd6..40c7b0e 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1663,9 +1663,20 @@ static void
> __vmx_deliver_posted_interrupt(struct vcpu *v)
> >
> >   static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
> >   {
> > +    int r, sn;
> > +
> >       if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
> >           return;
> >
> > +    /*
> > +     * Currently, we don't support urgent interrupt, all interrupts
> > +     * are recognized as non-urgent interrupt, so we cannot send
> > +     * posted-interrupt when 'SN' is set.
> > +     */
> > +
> > +    sn = pi_test_sn(&v->arch.hvm_vmx.pi_desc);
> 
> Is there anywhere which sets sn at all? I cant spot anywhere.
> 

SN is set in [13/15] when the vCPU goes to the runnable (or offline) state.

Thanks,
Feng

> ~Andrew
> 
> > +    r = pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc);
> > +
> >       if ( unlikely(v->arch.hvm_vmx.eoi_exitmap_changed) )
> >       {
> >           /*
> > @@ -1675,7 +1686,7 @@ static void vmx_deliver_posted_intr(struct vcpu *v,
> u8 vector)
> >            */
> >           pi_set_on(&v->arch.hvm_vmx.pi_desc);
> >       }
> > -    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> > +    else if ( !r && !sn )
> >       {
> >           __vmx_deliver_posted_interrupt(v);
> >           return;

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-03-26 19:57   ` Konrad Rzeszutek Wilk
@ 2015-03-27  3:06     ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  3:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Wu, Feng, xen-devel, JBeulich, Zhang, Yang Z, keir



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Friday, March 27, 2015 3:57 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Zhang, Yang Z; Tian, Kevin; keir@xen.org;
> JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification event
> when vCPU is running
> 
> On Wed, Mar 25, 2015 at 08:31:54PM +0800, Feng Wu wrote:
> > When a vCPU is running in Root mode and a notification event
> > has been injected to it. we need to set VCPU_KICK_SOFTIRQ for
> > the current cpu, so the pending interrupt in PIRR will be
> > synced to vIRR before VM-Exit in time.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vmx.c        | 24 +++++++++++++++++++++++-
> >  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
> >  2 files changed, 24 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index b2b4c26..b30392c 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1838,7 +1838,7 @@ const struct hvm_function_table * __init
> start_vmx(void)
> >
> >      if ( cpu_has_vmx_posted_intr_processing )
> >      {
> > -        alloc_direct_apic_vector(&posted_intr_vector,
> event_check_interrupt);
> > +        alloc_direct_apic_vector(&posted_intr_vector,
> pi_notification_interrupt);
> >
> >          if ( iommu_intpost )
> >              alloc_direct_apic_vector(&pi_wakeup_vector,
> pi_wakeup_interrupt);
> > @@ -3288,6 +3288,28 @@ void pi_wakeup_interrupt(struct cpu_user_regs
> *regs)
> >  }
> >
> >  /*
> > + * Handle VT-d posted-interrupt when VCPU is running.
> > + */
> > +
> > +void pi_notification_interrupt(struct cpu_user_regs *regs)
> > +{
> > +    /*
> > +     * We get here because a vCPU is running in Root mode
> > +     * and a notification event has been injected to it.
> > +     *
> > +     * we need to set VCPU_KICK_SOFTIRQ for the current
> > +     * cpu, just like __vmx_deliver_posted_interrupt().
> > +     *
> > +     * So the pending interrupt in PIRR will be synced to
> > +     * vIRR before VM-Exit in time.
> > +     */
> > +    set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(smp_processor_id()));
> 
> Could you use the 'raise_softirq' instead?

Sure!

Thanks,
Feng

> 
> > +
> > +    ack_APIC_irq();
> > +    this_cpu(irq_count)++;
> > +}
> > +
> > +/*
> >   * Local variables:
> >   * mode: C
> >   * c-file-style: "BSD"
> > diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
> b/xen/include/asm-x86/hvm/vmx/vmx.h
> > index f4296ab..e53275b 100644
> > --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> > @@ -576,6 +576,7 @@ void free_p2m_hap_data(struct p2m_domain *p2m);
> >  void p2m_init_hap_data(struct p2m_domain *p2m);
> >
> >  void pi_wakeup_interrupt(struct cpu_user_regs *regs);
> > +void pi_notification_interrupt(struct cpu_user_regs *regs);
> >
> >  /* EPT violation qualifications definitions */
> >  #define _EPT_READ_VIOLATION         0
> > --
> > 2.1.0
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-03-25 14:14   ` Zhang, Yang Z
@ 2015-03-27  4:40     ` Wu, Feng
  2015-03-27  4:44       ` Zhang, Yang Z
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  4:40 UTC (permalink / raw)
  To: Zhang, Yang Z, xen-devel; +Cc: Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Wednesday, March 25, 2015 10:15 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Tian, Kevin
> Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event when vCPU
> is running
> 
> Wu, Feng wrote on 2015-03-25:
> > When a vCPU is running in Root mode and a notification event has been
> > injected to it. we need to set VCPU_KICK_SOFTIRQ for the current cpu,
> > so the pending interrupt in PIRR will be synced to vIRR before VM-Exit in time.
> 
> Shouldn't the pending interrupt be synced unconditionally before next vmentry?
> What happens if we didn't set the softirq?

If we didn't set the softirq in the notification handler, an interrupt that arrives
just before VM-entry cannot be delivered to the guest during that VM-entry. Please see
the following code fragment from xen/arch/x86/hvm/vmx/entry.S (please note the comment):

.Lvmx_do_vmentry:

......
        /* If the VT-d engine issues a notification event here,
         * it cannot be delivered to the guest during this VM-entry
         * without raising the softirq in the notification handler. */

        cmp  %ecx,(%rdx,%rax,1)
        jnz  .Lvmx_process_softirqs

......

        je   .Lvmx_launch

......


.Lvmx_process_softirqs:
        sti
        call do_softirq
        jmp  .Lvmx_do_vmentry
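
The same race can be modelled in plain C. This is only a simplified sketch
of the control flow above, not Xen code: the three variables are stand-ins
for the per-CPU softirq mask, PIR and vIRR, the vector bit 0x10 is arbitrary,
and the assumption that the PIR-to-vIRR sync happens at the top of each pass
is part of the model.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int softirq_pending; /* stand-in for the per-CPU pending mask */
static unsigned int pir;             /* stand-in for the posted-interrupt requests */
static unsigned int virr;            /* stand-in for the guest-visible vIRR */

static void reset_state(void)
{
    softirq_pending = pir = virr = 0;
}

/* Sync step performed on each pass towards VM entry (model assumption). */
static void sync_pir_to_irr(void)
{
    virr |= pir;
    pir = 0;
}

/* Models the VT-d engine posting an interrupt and the notification handler. */
static void post_interrupt(unsigned int vec_bit, bool handler_raises_softirq)
{
    assert(vec_bit != 0);
    pir |= vec_bit;
    if ( handler_raises_softirq )
        softirq_pending |= 1u;
}

/*
 * One attempt to enter the guest. 'post_late' injects an interrupt after the
 * sync but before the pending-softirq check, i.e. in the window marked by the
 * comment in the fragment above. Returns the vIRR the guest is entered with.
 */
static unsigned int vmentry(bool post_late, bool handler_raises_softirq)
{
    for ( ; ; )
    {
        sync_pir_to_irr();

        if ( post_late )
        {
            post_interrupt(0x10, handler_raises_softirq);
            post_late = false;
        }

        if ( softirq_pending )       /* jnz .Lvmx_process_softirqs */
        {
            softirq_pending = 0;     /* call do_softirq */
            continue;                /* jmp .Lvmx_do_vmentry */
        }

        return virr;                 /* VM entry */
    }
}
```

In this model, vmentry(true, false) enters the guest with the late interrupt
still sitting in pir, while vmentry(true, true) takes one more pass through
the loop and picks it up before entry.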

Thanks,
Feng

> 
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vmx.c        | 24 +++++++++++++++++++++++-
> >  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
> >  2 files changed, 24 insertions(+), 1 deletion(-)
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index b2b4c26..b30392c 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++
> > b/xen/arch/x86/hvm/vmx/vmx.c @@ -1838,7 +1838,7 @@ const struct
> > hvm_function_table * __init start_vmx(void)
> >
> >      if ( cpu_has_vmx_posted_intr_processing )
> >      {
> > -        alloc_direct_apic_vector(&posted_intr_vector,
> > event_check_interrupt); +
> > alloc_direct_apic_vector(&posted_intr_vector, +
> > pi_notification_interrupt);
> >
> >          if ( iommu_intpost )
> >              alloc_direct_apic_vector(&pi_wakeup_vector,
> > pi_wakeup_interrupt); @@ -3288,6 +3288,28 @@ void
> > pi_wakeup_interrupt(struct cpu_user_regs *regs)  }
> >
> >  /*
> > + * Handle VT-d posted-interrupt when VCPU is running. + */ + +void
> > pi_notification_interrupt(struct cpu_user_regs *regs) { +    /* +     *
> > We get here because a vCPU is running in Root mode +     * and a
> > notification event has been injected to it. +     * +     * we need to
> > set VCPU_KICK_SOFTIRQ for the current +     * cpu, just like
> > __vmx_deliver_posted_interrupt(). +     * +     * So the pending
> > interrupt in PIRR will be synced to +     * vIRR before VM-Exit in time.
> > +     */ +    set_bit(VCPU_KICK_SOFTIRQ,
> > &softirq_pending(smp_processor_id())); + +    ack_APIC_irq(); +
> > this_cpu(irq_count)++; +} + +/*
> >   * Local variables:
> >   * mode: C
> >   * c-file-style: "BSD"
> > diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
> > b/xen/include/asm-x86/hvm/vmx/vmx.h index f4296ab..e53275b 100644 ---
> > a/xen/include/asm-x86/hvm/vmx/vmx.h +++
> > b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -576,6 +576,7 @@ void
> > free_p2m_hap_data(struct p2m_domain *p2m); void
> p2m_init_hap_data(struct
> > p2m_domain *p2m);
> >
> >  void pi_wakeup_interrupt(struct cpu_user_regs *regs);
> > +void pi_notification_interrupt(struct cpu_user_regs *regs);
> >
> >  /* EPT violation qualifications definitions */
> >  #define _EPT_READ_VIOLATION         0
> 
> 
> Best regards,
> Yang
> 

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-03-27  4:40     ` Wu, Feng
@ 2015-03-27  4:44       ` Zhang, Yang Z
  2015-03-27  4:57         ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Zhang, Yang Z @ 2015-03-27  4:44 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Tian, Kevin, keir, JBeulich

Wu, Feng wrote on 2015-03-27:
> 
> 
> Zhang, Yang Z wrote on 2015-03-25:
>> when vCPU is running
>> 
>> Wu, Feng wrote on 2015-03-25:
>>> When a vCPU is running in Root mode and a notification event has
>>> been injected to it. we need to set VCPU_KICK_SOFTIRQ for the
>>> current cpu, so the pending interrupt in PIRR will be synced to
>>> vIRR before
> VM-Exit in time.
>> 
>> Shouldn't the pending interrupt be synced unconditionally before next
>> vmentry? What happens if we didn't set the softirq?
> 
> If we didn't set the softirq in the notification handler, the
> interrupts happened exactly before VM-entry cannot be delivered to
> guest at this time. Please see the following code fragments from
> xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments)
> 
> .Lvmx_do_vmentry
> 
> ......
> 		/* If Vt-d engine issues a notification event here,
>          * it cannot be delivered to guest during this VM-entry
>          * without raising the softirq in notification handler. */
>         cmp  %ecx,(%rdx,%rax,1)
>         jnz  .Lvmx_process_softirqs
> ......
> 
>         je   .Lvmx_launch
> ......
> 
> 
> .Lvmx_process_softirqs:
>         sti
>         call do_softirq
>         jmp  .Lvmx_do_vmentry

You are right! This helps me recall why we raise the softirq when delivering the PI.

> Thanks,
> Feng
> 
>> 
>>> 
>>> Signed-off-by: Feng Wu <feng.wu@intel.com>
>>> ---
>>>  xen/arch/x86/hvm/vmx/vmx.c        | 24 +++++++++++++++++++++++-
>>>  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
>>>  2 files changed, 24 insertions(+), 1 deletion(-) diff --git
>>> a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index
>>> b2b4c26..b30392c 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++
>>> b/xen/arch/x86/hvm/vmx/vmx.c @@ -1838,7 +1838,7 @@ const struct
>>> hvm_function_table * __init start_vmx(void)
>>> 
>>>      if ( cpu_has_vmx_posted_intr_processing )
>>>      {
>>> -        alloc_direct_apic_vector(&posted_intr_vector,
>>> event_check_interrupt); +
>>> alloc_direct_apic_vector(&posted_intr_vector, +
>>> pi_notification_interrupt);
>>> 
>>>          if ( iommu_intpost )
>>>              alloc_direct_apic_vector(&pi_wakeup_vector,
>>> pi_wakeup_interrupt); @@ -3288,6 +3288,28 @@ void
>>> pi_wakeup_interrupt(struct cpu_user_regs *regs)  }
>>> 
>>>  /*
>>> + * Handle VT-d posted-interrupt when VCPU is running. + */ +
>>> + +void
>>> pi_notification_interrupt(struct cpu_user_regs *regs) { +    /* +     *
>>> We get here because a vCPU is running in Root mode +     * and a
>>> notification event has been injected to it. +     * +     * we need to
>>> set VCPU_KICK_SOFTIRQ for the current +     * cpu, just like
>>> __vmx_deliver_posted_interrupt(). +     * +     * So the pending
>>> interrupt in PIRR will be synced to +     * vIRR before VM-Exit in time.
>>> +     */ +    set_bit(VCPU_KICK_SOFTIRQ,
>>> &softirq_pending(smp_processor_id())); + +    ack_APIC_irq(); +
>>> this_cpu(irq_count)++; +} + +/*
>>>   * Local variables:
>>>   * mode: C
>>>   * c-file-style: "BSD"
>>> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
>>> b/xen/include/asm-x86/hvm/vmx/vmx.h index f4296ab..e53275b 100644 ---
>>> a/xen/include/asm-x86/hvm/vmx/vmx.h +++
>>> b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -576,6 +576,7 @@ void
>>> free_p2m_hap_data(struct p2m_domain *p2m); void
>>> p2m_init_hap_data(struct p2m_domain *p2m);
>>> 
>>>  void pi_wakeup_interrupt(struct cpu_user_regs *regs);
>>> +void pi_notification_interrupt(struct cpu_user_regs *regs);
>>> 
>>>  /* EPT violation qualifications definitions */
>>>  #define _EPT_READ_VIOLATION         0
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 01/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  2015-03-26 17:39   ` Andrew Cooper
@ 2015-03-27  4:46     ` Wu, Feng
  2015-03-27  9:55       ` Andrew Cooper
  2015-03-27  9:52     ` Jan Beulich
  1 sibling, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  4:46 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 1:40 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 01/15] iommu: Add iommu_intpost to control
> VT-d Posted-Interrupts feature
> 
> On 25/03/15 12:31, Feng Wu wrote:
> > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > With VT-d Posted-Interrupts enabled, external interrupts from
> > direct-assigned devices can be delivered to guests without VMM
> > intervention when guest is running in non-root mode.
> >
> > This patch adds variable 'iommu_intpost' to control whether enable VT-d
> > posted-interrupt or not in the generic IOMMU code.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   xen/drivers/passthrough/iommu.c | 11 ++++++++++-
> >   xen/include/xen/iommu.h         |  2 +-
> >   2 files changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/drivers/passthrough/iommu.c
> b/xen/drivers/passthrough/iommu.c
> > index 92ea26f..302e3e4 100644
> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -39,6 +39,7 @@ static void iommu_dump_p2m_table(unsigned char
> key);
> >    *   no-snoop                   Disable VT-d Snoop Control
> >    *   no-qinval                  Disable VT-d Queued Invalidation
> >    *   no-intremap                Disable VT-d Interrupt Remapping
> > + *   no-intpost                 Disable VT-d Interrupt posting
> >    */
> >   custom_param("iommu", parse_iommu_param);
> >   bool_t __initdata iommu_enable = 1;
> > @@ -51,6 +52,7 @@ bool_t __read_mostly iommu_passthrough;
> >   bool_t __read_mostly iommu_snoop = 1;
> >   bool_t __read_mostly iommu_qinval = 1;
> >   bool_t __read_mostly iommu_intremap = 1;
> > +bool_t __read_mostly iommu_intpost = 0;
> >   bool_t __read_mostly iommu_hap_pt_share = 1;
> >   bool_t __read_mostly iommu_debug;
> >   bool_t __read_mostly amd_iommu_perdev_intremap = 1;
> > @@ -94,7 +96,11 @@ static void __init parse_iommu_param(char *s)
> >           else if ( !strcmp(s, "qinval") )
> >               iommu_qinval = val;
> >           else if ( !strcmp(s, "intremap") )
> > +        {
> >               iommu_intremap = val;
> > +            if ( iommu_intremap == 0 )
> > +                iommu_intpost = 0;
> > +        }
> >           else if ( !strcmp(s, "debug") )
> >           {
> >               iommu_debug = val;
> 
> At no point here do you add an strcmp(s, "intpost"), which means that
> you do not alter the allowable command line syntax.
> 
> intpost must be able to be controlled independently of intremap, so I
> suggest
> 
> else if ( !strcmp(s, "intpost") )
>      iommu_intpost = val;

In this patch, I only define the variable 'iommu_intpost' and implement some
logic related to it, while the command line option 'intpost' is added and checked
in patch [15/15]. So before patch [15/15] these patches don't take effect.
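
For reference, the ordering under discussion (parse 'intpost' on its own,
then force it off after the loop if intremap is off) can be sketched
standalone as below. This is not the code from [15/15]; the function name
and the fixed-size buffer are only for illustration, and only the two
options relevant here are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool iommu_intremap = true;
static bool iommu_intpost = false;

/* Parses a comma-separated list such as "intpost,no-intremap". */
static void parse_iommu_param_sketch(const char *arg)
{
    char buf[128];
    char *s, *ss;

    assert(arg != NULL);
    strncpy(buf, arg, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    s = buf;
    do {
        bool val = true;

        ss = strchr(s, ',');
        if ( ss )
            *ss = '\0';

        /* A "no-" prefix inverts the option. */
        if ( !strncmp(s, "no-", 3) )
        {
            val = false;
            s += 3;
        }

        if ( !strcmp(s, "intremap") )
            iommu_intremap = val;
        else if ( !strcmp(s, "intpost") )
            iommu_intpost = val;

        s = ss ? ss + 1 : NULL;
    } while ( s );

    /* After the loop: intpost can never be on without intremap. */
    if ( !iommu_intremap )
        iommu_intpost = false;
}
```

Doing the dependency check once after the loop makes the result independent
of the order in which the two options appear on the command line.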

Thanks,
Feng

> 
> and after the while loop,
> 
> if ( !iommu_intremap )
>      iommu_intpost = 0;
> 
> To ensure that intpost is never 1 if intremap is 0.
> 
> Also, you must adjust the documentation in
> docs/misc/xen-command-line.markdown
> 
> ~Andrew
> 
> > @@ -272,7 +278,10 @@ int __init iommu_setup(void)
> >           iommu_enabled = (rc == 0);
> >       }
> >       if ( !iommu_enabled )
> > +    {
> >           iommu_intremap = 0;
> > +        iommu_intpost = 0;
> > +    }
> >
> >       if ( (force_iommu && !iommu_enabled) ||
> >            (force_intremap && !iommu_intremap) )
> > @@ -341,7 +350,7 @@ void iommu_crash_shutdown(void)
> >       const struct iommu_ops *ops = iommu_get_ops();
> >       if ( iommu_enabled )
> >           ops->crash_shutdown();
> > -    iommu_enabled = iommu_intremap = 0;
> > +    iommu_enabled = iommu_intremap = iommu_intpost = 0;
> >   }
> >
> >   bool_t iommu_has_feature(struct domain *d, enum iommu_feature
> feature)
> > diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> > index bf4aff0..91063bb 100644
> > --- a/xen/include/xen/iommu.h
> > +++ b/xen/include/xen/iommu.h
> > @@ -31,7 +31,7 @@
> >   extern bool_t iommu_enable, iommu_enabled;
> >   extern bool_t force_iommu, iommu_verbose;
> >   extern bool_t iommu_workaround_bios_bug, iommu_passthrough;
> > -extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
> > +extern bool_t iommu_snoop, iommu_qinval, iommu_intremap,
> iommu_intpost;
> >   extern bool_t iommu_hap_pt_share;
> >   extern bool_t iommu_debug;
> >   extern bool_t amd_iommu_perdev_intremap;

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used
  2015-03-26 19:17   ` Andrew Cooper
  2015-03-27  2:13     ` Wu, Feng
@ 2015-03-27  4:52     ` Wu, Feng
  1 sibling, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  4:52 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 3:17 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d
> PI is used
> 
> On 25/03/15 12:31, Feng Wu wrote:
> > This patch adds an API which is used to update the IRTE
> > for posted-interrupt when guest changes MSI/MSI-X information.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   xen/drivers/passthrough/vtd/intremap.c | 83
> ++++++++++++++++++++++++++++++++++
> >   xen/drivers/passthrough/vtd/iommu.h    |  3 ++
> >   xen/include/asm-x86/iommu.h            |  2 +
> >   3 files changed, 88 insertions(+)
> >
> > diff --git a/xen/drivers/passthrough/vtd/intremap.c
> b/xen/drivers/passthrough/vtd/intremap.c
> > index 0333686..f44e74d 100644
> > --- a/xen/drivers/passthrough/vtd/intremap.c
> > +++ b/xen/drivers/passthrough/vtd/intremap.c
> > @@ -898,3 +898,86 @@ void iommu_disable_x2apic_IR(void)
> >       for_each_drhd_unit ( drhd )
> >           disable_qinval(drhd->iommu);
> >   }
> > +
> > +/*
> > + * This function is used to update the IRTE for posted-interrupt
> > + * when guest changes MSI/MSI-X information
> > + */
> > +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec )
> 
> Stray space after gvec.
> 
> I presume gvec means "guest vector", in which case it should be a uint8_t.
> 
> > +{
> > +    struct irq_desc *desc;
> > +    struct msi_desc *msi_desc;
> > +    int remap_index, rc = -1;
> > +    struct pci_dev *pci_dev;
> > +    struct acpi_drhd_unit *drhd;
> > +    struct iommu *iommu;
> > +    struct ir_ctrl *ir_ctrl;
> > +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> > +    struct iremap_entry new_ire;
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +    unsigned long flags;
> > +
> > +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> > +    if ( !desc )
> > +        return -1;
> > +
> > +    msi_desc = desc->msi_desc;
> > +    if ( !msi_desc )
> > +        goto unlock_out;
> > +
> > +    remap_index = msi_desc->remap_index;
> > +    pci_dev = msi_desc->dev;
> > +    if ( !pci_dev )
> > +        goto unlock_out;
> > +
> > +    drhd = acpi_find_matched_drhd_unit(pci_dev);
> 
> This does an O(n^2) walk over the dhrd_units list and the unit
> device_cnt.  As the layout will be completely static, might it be better
> to stash drhd information in struct pci_dev during boot?

This might be a good suggestion. Since acpi_find_matched_drhd_unit() is widely
used in the current code for this purpose, I think we can treat this as independent
work if it is really doable.
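
Just to illustrate what such a follow-up could look like, here is a rough
standalone sketch. None of this is existing Xen code: the structures are
reduced stand-ins, the 'drhd' member of struct pci_dev and all function names
are hypothetical, and the slow lookup only imitates the nested walk.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for a DRHD unit: the devices in its scope. */
struct drhd_unit {
    const unsigned int *devices;
    size_t device_cnt;
};

/* Reduced stand-in for struct pci_dev with a hypothetical cached pointer. */
struct pci_dev {
    unsigned int bdf;
    const struct drhd_unit *drhd;   /* hypothetical: filled in once at boot */
};

static unsigned long slow_walks;    /* counts nested walks, for illustration */

/* Imitates the nested walk over all units and each unit's device scope. */
static const struct drhd_unit *
find_matched_drhd_slow(const struct drhd_unit *units, size_t nr_units,
                       unsigned int bdf)
{
    slow_walks++;
    for ( size_t i = 0; i < nr_units; i++ )
        for ( size_t j = 0; j < units[i].device_cnt; j++ )
            if ( units[i].devices[j] == bdf )
                return &units[i];
    return NULL;
}

/* Would be called once per device during boot-time enumeration. */
static void pci_dev_cache_drhd(struct pci_dev *pdev,
                               const struct drhd_unit *units, size_t nr_units)
{
    assert(pdev != NULL);
    pdev->drhd = find_matched_drhd_slow(units, nr_units, pdev->bdf);
}

/* Fast path for callers such as an IRTE update: no walk at all. */
static const struct drhd_unit *pci_dev_drhd(const struct pci_dev *pdev)
{
    return pdev->drhd;
}
```

Whether the layout is really static enough for this (e.g. with device
hotplug) would need to be checked as part of that separate work.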

Thanks,
Feng

> 
> > +    if (!drhd)
> 
> Spaces inside brackets.
> 
> > +    {
> > +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get drhd!\n");
> 
> This is a useless error message, providing no useful information to
> identify the issue.
> 
> At the very least it should identify the domain and vcpu (%pv of v), the
> guest vector and pci device.  Also, drop the exclamation mark - it is
> not useful in the log.
> 
> > +        goto unlock_out;
> > +    }
> > +
> > +    iommu = drhd->iommu;
> > +    ir_ctrl = iommu_ir_ctrl(iommu);
> > +    if ( !ir_ctrl )
> > +    {
> > +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get ir_ctrl!\n");
> 
> Needs simiarly extending with some information.
> 
> > +        goto unlock_out;
> > +    }
> > +
> > +    spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
> > +
> > +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index,
> iremap_entries, p);
> > +
> > +    memcpy(&new_ire, p, sizeof(struct iremap_entry));
> 
> sizeof(new_ire) please
> 
> > +
> > +    /* Setup/Update interrupt remapping table entry */
> > +    new_ire.lo_intpost.urg = 0;
> > +    new_ire.lo_intpost.vector = gvec;
> > +    new_ire.lo_intpost.pda_l = (((u64)virt_to_maddr(pi_desc)) >>
> > +                                (32 - PDA_LOW_BIT)) & ~(-1UL <<
> PDA_LOW_BIT);
> > +    new_ire.hi_intpost.pda_h = (((u64)virt_to_maddr(pi_desc)) >>  32) &
> > +                                ~(-1UL << PDA_HIGH_BIT);
> 
> Is it possible to hide this bit manipulation in side a static inline
> function which takes an ire and pi_desc pointer rather than open coding it?
> 
> > +
> > +    new_ire.lo_intpost.res_1 = 0;
> > +    new_ire.lo_intpost.res_2 = 0;
> > +    new_ire.lo_intpost.res_3 = 0;
> > +    new_ire.hi_intpost.res_1 = 0;
> > +
> > +    new_ire.lo_intpost.im = 1;
> > +
> > +    memcpy(p, &new_ire, sizeof(struct iremap_entry));
> > +    iommu_flush_cache_entry(p, sizeof(struct iremap_entry));
> 
> Same here regarding sizeof.
> 
> Furthermore, is the memcpy() safe to update the live descriptor?
> 
> If it is, why do you not update the descriptor in place using p-> rather
> than copying it to the stack and back?
> 
> ~Andrew
> 
> > +    iommu_flush_iec_index(iommu, 0, remap_index);
> > +
> > +    if ( iremap_entries )
> > +        unmap_vtd_domain_page(iremap_entries);
> > +
> > +    spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
> > +
> > +    rc = 0;
> > + unlock_out:
> > +    spin_unlock_irq(&desc->lock);
> > +
> > +    return rc;
> > +}
> > diff --git a/xen/drivers/passthrough/vtd/iommu.h
> b/xen/drivers/passthrough/vtd/iommu.h
> > index cd61e12..ffa72c8 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.h
> > +++ b/xen/drivers/passthrough/vtd/iommu.h
> > @@ -334,6 +334,9 @@ struct iremap_entry {
> >     };
> >   };
> >
> > +#define PDA_LOW_BIT    26
> > +#define PDA_HIGH_BIT   32
> > +
> >   /* Max intr remapping table page order is 8, as max number of IRTEs is
> 64K */
> >   #define IREMAP_PAGE_ORDER  8
> >
> > diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> > index e7a65da..d233621 100644
> > --- a/xen/include/asm-x86/iommu.h
> > +++ b/xen/include/asm-x86/iommu.h
> > @@ -32,6 +32,8 @@ int iommu_supports_eim(void);
> >   int iommu_enable_x2apic_IR(void);
> >   void iommu_disable_x2apic_IR(void);
> >
> > +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec);
> > +
> >   #endif /* !__ARCH_X86_IOMMU_H__ */
> >   /*
> >    * Local variables:

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-03-27  4:44       ` Zhang, Yang Z
@ 2015-03-27  4:57         ` Wu, Feng
  2015-04-02  6:08           ` Tian, Kevin
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  4:57 UTC (permalink / raw)
  To: Zhang, Yang Z, xen-devel; +Cc: Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Friday, March 27, 2015 12:44 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Tian, Kevin
> Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event when vCPU
> is running
> 
> Wu, Feng wrote on 2015-03-27:
> >
> >
> > Zhang, Yang Z wrote on 2015-03-25:
> >> when vCPU is running
> >>
> >> Wu, Feng wrote on 2015-03-25:
> >>> When a vCPU is running in Root mode and a notification event has
> >>> been injected to it. we need to set VCPU_KICK_SOFTIRQ for the
> >>> current cpu, so the pending interrupt in PIRR will be synced to
> >>> vIRR before
> > VM-Exit in time.
> >>
> >> Shouldn't the pending interrupt be synced unconditionally before next
> >> vmentry? What happens if we didn't set the softirq?
> >
> > If we didn't set the softirq in the notification handler, the
> > interrupts happened exactly before VM-entry cannot be delivered to
> > guest at this time. Please see the following code fragments from
> > xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments)
> >
> > .Lvmx_do_vmentry
> >
> > ......
> > 		/* If Vt-d engine issues a notification event here,
> >          * it cannot be delivered to guest during this VM-entry
> >          * without raising the softirq in notification handler. */
> >         cmp  %ecx,(%rdx,%rax,1)
> >         jnz  .Lvmx_process_softirqs
> > ......
> >
> >         je   .Lvmx_launch
> > ......
> >
> >
> > .Lvmx_process_softirqs:
> >         sti
> >         call do_softirq
> >         jmp  .Lvmx_do_vmentry
> 
> You are right! This helps me to recall why raise the softirq when delivering the
> PI.

Yes, __vmx_deliver_posted_interrupt() is the software way to deliver PI, and it sets the
softirq for this purpose. However, when VT-d HW delivers PI, we have no control over
the HW itself, hence we need to set this softirq in the Notification Event handler.

Thanks,
Feng

> 
> > Thanks,
> > Feng
> >
> >>
> >>>
> >>> Signed-off-by: Feng Wu <feng.wu@intel.com>
> >>> ---
> >>>  xen/arch/x86/hvm/vmx/vmx.c        | 24
> +++++++++++++++++++++++-
> >>>  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
> >>>  2 files changed, 24 insertions(+), 1 deletion(-) diff --git
> >>> a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index
> >>> b2b4c26..b30392c 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++
> >>> b/xen/arch/x86/hvm/vmx/vmx.c @@ -1838,7 +1838,7 @@ const struct
> >>> hvm_function_table * __init start_vmx(void)
> >>>
> >>>      if ( cpu_has_vmx_posted_intr_processing )
> >>>      {
> >>> -        alloc_direct_apic_vector(&posted_intr_vector,
> >>> event_check_interrupt); +
> >>> alloc_direct_apic_vector(&posted_intr_vector, +
> >>> pi_notification_interrupt);
> >>>
> >>>          if ( iommu_intpost )
> >>>              alloc_direct_apic_vector(&pi_wakeup_vector,
> >>> pi_wakeup_interrupt); @@ -3288,6 +3288,28 @@ void
> >>> pi_wakeup_interrupt(struct cpu_user_regs *regs)  }
> >>>
> >>>  /*
> >>> + * Handle VT-d posted-interrupt when VCPU is running. + */ +
> >>> + +void
> >>> pi_notification_interrupt(struct cpu_user_regs *regs) { +    /* +     *
> >>> We get here because a vCPU is running in Root mode +     * and a
> >>> notification event has been injected to it. +     * +     * we need to
> >>> set VCPU_KICK_SOFTIRQ for the current +     * cpu, just like
> >>> __vmx_deliver_posted_interrupt(). +     * +     * So the pending
> >>> interrupt in PIRR will be synced to +     * vIRR before VM-Exit in time.
> >>> +     */ +    set_bit(VCPU_KICK_SOFTIRQ,
> >>> &softirq_pending(smp_processor_id())); + +    ack_APIC_irq(); +
> >>> this_cpu(irq_count)++; +} + +/*
> >>>   * Local variables:
> >>>   * mode: C
> >>>   * c-file-style: "BSD"
> >>> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
> >>> b/xen/include/asm-x86/hvm/vmx/vmx.h index f4296ab..e53275b 100644 ---
> >>> a/xen/include/asm-x86/hvm/vmx/vmx.h +++
> >>> b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -576,6 +576,7 @@ void
> >>> free_p2m_hap_data(struct p2m_domain *p2m); void
> >>> p2m_init_hap_data(struct p2m_domain *p2m);
> >>>
> >>>  void pi_wakeup_interrupt(struct cpu_user_regs *regs);
> >>> +void pi_notification_interrupt(struct cpu_user_regs *regs);
> >>>
> >>>  /* EPT violation qualifications definitions */
> >>>  #define _EPT_READ_VIOLATION         0
> >>
> >>
> >> Best regards,
> >> Yang
> >>
> 
> 
> Best regards,
> Yang
> 

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-03-26 19:46   ` Konrad Rzeszutek Wilk
@ 2015-03-27  5:45     ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  5:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Wu, Feng, xen-devel, JBeulich, Zhang, Yang Z, keir



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Friday, March 27, 2015 3:47 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Zhang, Yang Z; Tian, Kevin; keir@xen.org;
> JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 08/15] Update IRTE according to guest
> interrupt config changes
> 
> On Wed, Mar 25, 2015 at 08:31:50PM +0800, Feng Wu wrote:
> > When guest changes its interrupt configuration (such as, vector, etc.)
> 
> s/such as,/such as/
> > for direct-assigned devices, we need to update the associated IRTE
> > with the new guest vector, so external interrupts from the assigned
> > devices can be injected to guests without VM-Exit.
> >
> > For lowest-priority interrupts, we use vector-hashing mechamisn to find
> > the destination vCPU. This follows the hardware behavior, since modern
> > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> >
> > For multicase/broadcast vCPU, we cannot handle it via interrupt posting,
> 
> multicase? Or multicast? or multicascade??

multicast. Sorry for the typo.

> > still use interrupt remapping.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/drivers/passthrough/io.c | 77
> +++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 76 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > index ae050df..1d9a132 100644
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -26,6 +26,7 @@
> >  #include <asm/hvm/iommu.h>
> >  #include <asm/hvm/support.h>
> >  #include <xen/hvm/irq.h>
> > +#include <asm/io_apic.h>
> >
> >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> >
> > @@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
> >      xfree(dpci);
> >  }
> >
> > +/*
> > + * Here we handle the following cases:
> > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > + *   guest vector using vector-hashing mechamisn and return true. This
> follows
> 
> s/mechamism/mechanism/
> 
> > + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> > + *   handle the lowest-priority interrupt.
> > + * - Otherwise, for single destination interrupt, it is straightforward to
> > + *   find the destination vCPU and return true.
> > + * - For multicase/broadcast vCPU, we cannot handle it via interrupt posting,
> 
> s/multicase/??/
> > + *   so return false.
> > + */
> > +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> > +                                uint8_t dest_mode, uint8_t
> deliver_mode,
> > +                                uint32_t gvec, struct vcpu
> **dest_vcpu)
> > +{
> > +    struct vcpu *v, **dest_vcpu_array;
> > +    unsigned int dest_vcpu_num = 0;
> > +    int ret;
> > +
> > +    if ( deliver_mode == dest_LowestPrio )
> > +        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);
> > +
> Please check that dest_vcpu_array was allocated.

Oh, yes, forgot it.

> 
> > +    for_each_vcpu ( d, v )
> > +    {
> > +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> > +                                dest_id, dest_mode) )
> > +            continue;
> > +
> > +        dest_vcpu_num++;
> > +
> > +        if ( deliver_mode == dest_LowestPrio )
> > +            dest_vcpu_array[dest_vcpu_num] = v;
> > +        else
> > +            *dest_vcpu = v;
> 
> Should there be an break here?

We need to get 'dest_vcpu_num' after the whole loop so that we can check
whether it is a multicast/broadcast interrupt, so we cannot add a break
here.

Thanks,
Feng
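
A minimal sketch of the selection logic discussed above, assuming the
vector-hashing rule is simply "guest vector modulo number of matching vCPUs"
as in the quoted patch. struct vcpu_stub, collect_matches() and
pick_lowest_prio() are hypothetical names for illustration, not Xen
interfaces. The loop runs to completion so the full match count is known,
and candidates are stored from index 0 so that the modulo always lands on a
filled slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU; only an id is needed here. */
struct vcpu_stub { int id; };

/*
 * Record every candidate whose entry in match[] is non-zero.
 * There is deliberately no early break: the caller needs the total
 * count to tell a single destination from multicast/broadcast.
 */
static unsigned int collect_matches(struct vcpu_stub *all, const int *match,
                                    unsigned int n_all,
                                    struct vcpu_stub **matched)
{
    unsigned int n = 0, i;

    assert(all != NULL && match != NULL && matched != NULL);

    for ( i = 0; i < n_all; i++ )
        if ( match[i] )
            matched[n++] = &all[i];   /* store first, then advance */

    return n;
}

/*
 * Assumed hashing rule: index = guest vector % number of matches.
 * Returns NULL when nothing matched.
 */
static struct vcpu_stub *pick_lowest_prio(struct vcpu_stub **matched,
                                          unsigned int n_matched,
                                          unsigned int gvec)
{
    if ( n_matched == 0 )
        return NULL;

    return matched[gvec % n_matched];
}
```

For the non-lowest-priority case the same count decides the outcome: exactly
one match can be posted, anything else falls back to remapping.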

> > +    }
> > +
> > +    if ( deliver_mode == dest_LowestPrio )
> > +    {
> > +        if (  dest_vcpu_num != 0 )
> > +        {
> > +            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
> > +            ret = 1;
> > +        }
> > +        else
> > +            ret = 0;
> > +
> > +        xfree(dest_vcpu_array);
> > +        return ret;
> > +    }
> > +    else if (  dest_vcpu_num == 1 )
> > +        return 1;
> > +    else
> > +        return 0;
> > +}
> > +
> >  int pt_irq_create_bind(
> >      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
> >  {
> > @@ -256,7 +313,7 @@ int pt_irq_create_bind(
> >      {
> >      case PT_IRQ_TYPE_MSI:
> >      {
> > -        uint8_t dest, dest_mode;
> > +        uint8_t dest, dest_mode, deliver_mode;
> >          int dest_vcpu_id;
> >
> >          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> > @@ -330,11 +386,30 @@ int pt_irq_create_bind(
> >          /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> >          dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
> >          dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> > +        deliver_mode = (pirq_dpci->gmsi.gflags >>
> GFLAGS_SHIFT_DELIV_MODE) &
> > +                        VMSI_DELIV_MASK;
> >          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> >          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> >          spin_unlock(&d->event_lock);
> >          if ( dest_vcpu_id >= 0 )
> >              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> > +
> > +        /* Use interrupt posting if it is supported */
> > +        if ( iommu_intpost )
> > +        {
> > +            struct vcpu *vcpu = NULL;
> > +
> > +            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
> > +                                    pirq_dpci->gmsi.gvec, &vcpu) )
> > +                break;
> > +
> > +            if ( pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec ) != 0 )
> 
> s/ != 0//
> 
> 
> > +            {
> > +                dprintk(XENLOG_G_INFO, "failed to update PI IRTE\n");
> 
> Perhaps with some data on which domain it is for? And what vector?
> 
> > +                return -EBUSY;
> 
> Hmm.. Under what conditions can this actually happen? What should the
> recepient do?

pi_update_irte() fails in some cases, such as:
- Cannot find 'struct irq_desc' for the pirq.
- Cannot get 'struct msi_desc' of the 'struct irq_desc'.
- Cannot get 'struct pci_dev' of the 'struct msi_desc'.
- Cannot get drhd structure for the PCI device.
- Cannot get 'struct ir_ctrl' of the iommu engine.

These are all routine checks, so I think it seldom fails. But if it does,
we may need to think more about how to roll back to the previous state. Do you have
any ideas about this? Thanks!

Thanks,
Feng


> > +            }
> > +        }
> > +
> >          break;
> >      }
> >
> > --
> > 2.1.0
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-03-26 19:59   ` Andrew Cooper
@ 2015-03-27  5:49     ` Wu, Feng
  2015-03-27 11:31       ` Andrew Cooper
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-27  5:49 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 3:59 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 08/15] Update IRTE according to guest
> interrupt config changes
> 
> On 25/03/15 12:31, Feng Wu wrote:
> > When guest changes its interrupt configuration (such as, vector, etc.)
> > for direct-assigned devices, we need to update the associated IRTE
> > with the new guest vector, so external interrupts from the assigned
> > devices can be injected to guests without VM-Exit.
> >
> > For lowest-priority interrupts, we use vector-hashing mechamisn to find
> > the destination vCPU. This follows the hardware behavior, since modern
> > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> >
> > For multicase/broadcast vCPU, we cannot handle it via interrupt posting,
> > still use interrupt remapping.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   xen/drivers/passthrough/io.c | 77
> +++++++++++++++++++++++++++++++++++++++++++-
> >   1 file changed, 76 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > index ae050df..1d9a132 100644
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -26,6 +26,7 @@
> >   #include <asm/hvm/iommu.h>
> >   #include <asm/hvm/support.h>
> >   #include <xen/hvm/irq.h>
> > +#include <asm/io_apic.h>
> >
> >   static DEFINE_PER_CPU(struct list_head, dpci_list);
> >
> > @@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
> >       xfree(dpci);
> >   }
> >
> > +/*
> > + * Here we handle the following cases:
> > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > + *   guest vector using vector-hashing mechamisn and return true. This
> follows
> > + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> > + *   handle the lowest-priority interrupt.
> 
> What is the hashing algorithm, or can I have some hint as to where to
> find it in a manual?

I asked the hardware guys about this; there is no document describing how hardware
implements the hashing algorithm.

> 
> > + * - Otherwise, for single destination interrupt, it is straightforward to
> > + *   find the destination vCPU and return true.
> > + * - For multicase/broadcast vCPU, we cannot handle it via interrupt posting,
> > + *   so return false.
> > + */
> > +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> > +                                uint8_t dest_mode, uint8_t
> deliver_mode,
> > +                                uint32_t gvec, struct vcpu
> **dest_vcpu)
> > +{
> > +    struct vcpu *v, **dest_vcpu_array;
> > +    unsigned int dest_vcpu_num = 0;
> > +    int ret;
> > +
> > +    if ( deliver_mode == dest_LowestPrio )
> > +        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);
> 
> This allocation can fail, but you really should see about avoiding it
> entirely, if possible.
> 
> > +
> > +    for_each_vcpu ( d, v )
> > +    {
> > +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> > +                                dest_id, dest_mode) )
> > +            continue;
> > +
> > +        dest_vcpu_num++;
> > +
> > +        if ( deliver_mode == dest_LowestPrio )
> > +            dest_vcpu_array[dest_vcpu_num] = v;
> > +        else
> > +            *dest_vcpu = v;
> > +    }
> > +
> > +    if ( deliver_mode == dest_LowestPrio )
> > +    {
> > +        if (  dest_vcpu_num != 0 )
> > +        {
> > +            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
> > +            ret = 1;
> > +        }
> > +        else
> > +            ret = 0;
> > +
> > +        xfree(dest_vcpu_array);
> > +        return ret;
> > +    }
> > +    else if (  dest_vcpu_num == 1 )
> > +        return 1;
> > +    else
> > +        return 0;
> > +}
> > +
> >   int pt_irq_create_bind(
> >       struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
> >   {
> > @@ -257,7 +313,7 @@ int pt_irq_create_bind(
> >       {
> >       case PT_IRQ_TYPE_MSI:
> >       {
> > -        uint8_t dest, dest_mode;
> > +        uint8_t dest, dest_mode, deliver_mode;
> >           int dest_vcpu_id;
> >
> >           if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> > @@ -330,11 +386,30 @@ int pt_irq_create_bind(
> >           /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> >           dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
> >           dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> > +        deliver_mode = (pirq_dpci->gmsi.gflags >>
> GFLAGS_SHIFT_DELIV_MODE) &
> > +                        VMSI_DELIV_MASK;
> 
> s/deliver/delivery/
> 
> Also, you should be able to use MASK_EXTR() rather than manual shifts
> and masks.
> 
> >           dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> >           pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> >           spin_unlock(&d->event_lock);
> >           if ( dest_vcpu_id >= 0 )
> >               hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> > +
> > +        /* Use interrupt posting if it is supported */
> > +        if ( iommu_intpost )
> > +        {
> > +            struct vcpu *vcpu = NULL;
> > +
> > +            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
> > +                                    pirq_dpci->gmsi.gvec, &vcpu) )
> > +                break;
> > +
> > +            if ( pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec ) != 0 )
> > +            {
> > +                dprintk(XENLOG_G_INFO, "failed to update PI IRTE\n");
> 
> Please put far more information this error message.
> 
> > +                return -EBUSY;
> 
> Under what circumstances can this happen.  I don't think it is valid to
> fail the userspace bind hypercall in this case.

Please see the reply to Konrad. BTW, do you have any suggestion about this point?

Thanks,
Feng

> 
> ~Andrew
> 
> > +            }
> > +        }
> > +
> >           break;
> >       }
> >
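
The MASK_EXTR() suggestion quoted above can be sketched as follows. The
macro here is modeled on the Xen helper of that name (divide by the mask's
lowest set bit instead of shifting by a separate constant). EX_DELIV_MASK
and ex_delivery_mode() are hypothetical, and the bit position chosen for the
delivery mode is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract the field selected by mask m from value v.
 * (m & -m) isolates the lowest set bit of the mask, so dividing by it
 * shifts the masked field down to bit 0.
 */
#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))

/* Assumed layout for this sketch: delivery mode in bits 12-14. */
#define EX_DELIV_MASK 0x7000u

static inline uint8_t ex_delivery_mode(uint32_t gflags)
{
    return (uint8_t)MASK_EXTR(gflags, EX_DELIV_MASK);
}
```

With a single mask there is no way for a shift constant and a mask constant
to disagree about where the field lives.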

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 01/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  2015-03-26 17:39   ` Andrew Cooper
  2015-03-27  4:46     ` Wu, Feng
@ 2015-03-27  9:52     ` Jan Beulich
  1 sibling, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2015-03-27  9:52 UTC (permalink / raw)
  To: Andrew Cooper, Feng Wu, xen-devel; +Cc: yang.z.zhang, kevin.tian, keir

>>> On 26.03.15 at 18:39, <andrew.cooper3@citrix.com> wrote:
> On 25/03/15 12:31, Feng Wu wrote:
>> @@ -51,6 +52,7 @@ bool_t __read_mostly iommu_passthrough;
>>   bool_t __read_mostly iommu_snoop = 1;
>>   bool_t __read_mostly iommu_qinval = 1;
>>   bool_t __read_mostly iommu_intremap = 1;
>> +bool_t __read_mostly iommu_intpost = 0;
>>   bool_t __read_mostly iommu_hap_pt_share = 1;
>>   bool_t __read_mostly iommu_debug;
>>   bool_t __read_mostly amd_iommu_perdev_intremap = 1;
>> @@ -94,7 +96,11 @@ static void __init parse_iommu_param(char *s)
>>           else if ( !strcmp(s, "qinval") )
>>               iommu_qinval = val;
>>           else if ( !strcmp(s, "intremap") )
>> +        {
>>               iommu_intremap = val;
>> +            if ( iommu_intremap == 0 )
>> +                iommu_intpost = 0;
>> +        }
>>           else if ( !strcmp(s, "debug") )
>>           {
>>               iommu_debug = val;
> 
> At no point here do you add an strcmp(s, "intpost"), which means that 
> you do not alter the allowable command line syntax.
> 
> intpost must be able to be controlled independently of intremap, so I 
> suggest
> 
> else if ( !strcmp(s, "intpost") )
>      iommu_intpost = val;
> 
> and after the while loop,
> 
> if ( !iommu_intremap )
>      iommu_intpost = 0;
> 
> To ensure that intpost is never 1 if intremap is 0.

That shouldn't be after the while loop, but elsewhere such that
intremap getting turned off for other reasons or even being
switched to a default of zero would still result in consistent
settings.

Jan

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 01/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  2015-03-27  4:46     ` Wu, Feng
@ 2015-03-27  9:55       ` Andrew Cooper
  0 siblings, 0 replies; 101+ messages in thread
From: Andrew Cooper @ 2015-03-27  9:55 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, Tian, Kevin, keir, JBeulich

On 27/03/15 04:46, Wu, Feng wrote:
>>>     *   no-snoop                   Disable VT-d Snoop Control
>>>     *   no-qinval                  Disable VT-d Queued Invalidation
>>>     *   no-intremap                Disable VT-d Interrupt Remapping
>>> + *   no-intpost                 Disable VT-d Interrupt posting
>>>     */
>>>    custom_param("iommu", parse_iommu_param);
>>>    bool_t __initdata iommu_enable = 1;
>>> @@ -51,6 +52,7 @@ bool_t __read_mostly iommu_passthrough;
>>>    bool_t __read_mostly iommu_snoop = 1;
>>>    bool_t __read_mostly iommu_qinval = 1;
>>>    bool_t __read_mostly iommu_intremap = 1;
>>> +bool_t __read_mostly iommu_intpost = 0;
>>>    bool_t __read_mostly iommu_hap_pt_share = 1;
>>>    bool_t __read_mostly iommu_debug;
>>>    bool_t __read_mostly amd_iommu_perdev_intremap = 1;
>>> @@ -94,7 +96,11 @@ static void __init parse_iommu_param(char *s)
>>>            else if ( !strcmp(s, "qinval") )
>>>                iommu_qinval = val;
>>>            else if ( !strcmp(s, "intremap") )
>>> +        {
>>>                iommu_intremap = val;
>>> +            if ( iommu_intremap == 0 )
>>> +                iommu_intpost = 0;
>>> +        }
>>>            else if ( !strcmp(s, "debug") )
>>>            {
>>>                iommu_debug = val;
>> At no point here do you add an strcmp(s, "intpost"), which means that
>> you do not alter the allowable command line syntax.
>>
>> intpost must be able to be controlled independently of intremap, so I
>> suggest
>>
>> else if ( !strcmp(s, "intpost") )
>>       iommu_intpost = val;
> In this patch, I only define variable 'iommu_qinval' and implement some
> logics related to it, while command line 'intpost' is added and checked in
> patch [15/15]. So before patch [15/15] these patches don't take effect.

I noticed that when I got to the end of the series.  Doing it like that 
is ok, but you must make sure that you edit xen-command-line.markdown in 
the patch which changes the parsing.

~Andrew

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 06/15] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  2015-03-27  1:53     ` Wu, Feng
@ 2015-03-27  9:58       ` Jan Beulich
  2015-04-02  6:32         ` Tian, Kevin
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2015-03-27  9:58 UTC (permalink / raw)
  To: Andrew Cooper, Feng Wu; +Cc: Yang Z Zhang, Kevin Tian, keir, xen-devel

>>> On 27.03.15 at 02:53, <feng.wu@intel.com> wrote:
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Friday, March 27, 2015 3:01 AM
>> On 25/03/15 12:31, Feng Wu wrote:
>> > --- a/xen/drivers/passthrough/vtd/iommu.h
>> > +++ b/xen/drivers/passthrough/vtd/iommu.h
>> > @@ -303,6 +303,18 @@ struct iremap_entry {
>> >               res_2   : 8,
>> >               dst     : 32;
>> >       }lo;
>> > +    struct {
>> > +        u64 p       : 1,
>> > +            fpd     : 1,
>> > +            res_1   : 6,
>> > +            avail   : 4,
>> > +            res_2   : 2,
>> > +            urg     : 1,
>> > +            im      : 1,
>> > +            vector  : 8,
>> > +            res_3   : 14,
>> > +            pda_l   : 26;
>> > +    }lo_intpost;
>> >     };
>> >     union {
>> >       u64 hi_val;
>> > @@ -312,6 +324,13 @@ struct iremap_entry {
>> >               svt     : 2,
>> >               res_1   : 44;
>> >       }hi;
>> > +    struct {
>> > +        u64 sid     : 16,
>> > +            sq      : 2,
>> > +            svt     : 2,
>> > +            res_1   : 12,
>> > +            pda_h   : 32;
>> > +    }hi_intpost;
>> 
>> I would prefer if this union was reformatted as I suggested in the
>> thread from your design doc, but I won't insist on it as a blocker to entry.
> 
> Thanks for the comments. I also considered your sugguestion on the Design
> doc, here is your proposal:
> 
> struct iremap_entry {
>     union {
>         struct { u64 lo, hi; };
>         struct { <bitfields> } norm; (names subject to improvement)
>         struct { <bitfields> } post;
>     };
> };
> 
> Seems in that way, we need to change some existing code to adapt to
> this new structure. I am okay with both of them, but can we listen
> some voices form some others? Is it okay for you?

I think this would be a good move, but in the end it's the VT-d
maintainers who got to decide.

Jan
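
A sketch of what the proposed layout could look like, assuming a compiler
that supports C11 anonymous structs/unions and 64-bit bitfields packed from
the least significant bit (as GCC does on x86). The posted-format fields
follow the quoted patch; the remapped-format fields other than those visible
in the quoted hunks are written from memory of the existing structure and
should be treated as an assumption. The type name, the 'remap'/'post' names
and ex_set_posted() are hypothetical, as is the split of the descriptor
address into bits 31:6 and 63:32.

```c
#include <assert.h>
#include <stdint.h>

struct iremap_entry_sketch {
    union {
        /* Raw view: the two 64-bit halves of the entry. */
        struct { uint64_t lo, hi; };
        /* Remapped-format view. */
        struct {
            uint64_t p:1, fpd:1, dm:1, rh:1, tm:1, dlm:3,
                     avail:4, res_1:4, vector:8, res_2:8, dst:32;
            uint64_t sid:16, sq:2, svt:2, res_3:44;
        } remap;
        /* Posted-format view, field widths as in the quoted patch. */
        struct {
            uint64_t p:1, fpd:1, res_1:6, avail:4, res_2:2,
                     urg:1, im:1, vector:8, res_3:14, pda_l:26;
            uint64_t sid:16, sq:2, svt:2, res_4:12, pda_h:32;
        } post;
    };
};

/* All three views must overlay exactly 128 bits. */
_Static_assert(sizeof(struct iremap_entry_sketch) == 16,
               "IRTE must be 16 bytes");

/*
 * Fill a posted-format entry. pda is the address of the posted-interrupt
 * descriptor and is assumed to be 64-byte aligned.
 */
static inline void ex_set_posted(struct iremap_entry_sketch *e,
                                 uint64_t pda, uint8_t vector)
{
    e->lo = 0;                       /* clears all reserved bits too */
    e->hi = 0;
    e->post.p = 1;
    e->post.im = 1;
    e->post.vector = vector;
    e->post.pda_l = (pda >> 6) & ((1u << 26) - 1);
    e->post.pda_h = pda >> 32;
}
```

Callers that only need the raw value keep using lo/hi, while format-specific
code names the view it means, which is the main readability gain of this
shape.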

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used
  2015-03-27  2:13     ` Wu, Feng
@ 2015-03-27 10:02       ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2015-03-27 10:02 UTC (permalink / raw)
  To: Feng Wu; +Cc: Yang Z Zhang, Andrew Cooper, Kevin Tian, keir, xen-devel

>>> On 27.03.15 at 03:13, <feng.wu@intel.com> wrote:
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Friday, March 27, 2015 3:17 AM
>> On 25/03/15 12:31, Feng Wu wrote:
>> > +
>> > +    new_ire.lo_intpost.res_1 = 0;
>> > +    new_ire.lo_intpost.res_2 = 0;
>> > +    new_ire.lo_intpost.res_3 = 0;
>> > +    new_ire.hi_intpost.res_1 = 0;
>> > +
>> > +    new_ire.lo_intpost.im = 1;
>> > +
>> > +    memcpy(p, &new_ire, sizeof(struct iremap_entry));
>> > +    iommu_flush_cache_entry(p, sizeof(struct iremap_entry));
>> 
>> Same here regarding sizeof.
>> 
>> Furthermore, is the memcpy() safe to update the live descriptor?
>> 
>> If it is, why do you not update the descriptor in place using p-> rather
>> than copying it to the stack and back?
> 
> There are some example in the current code of updating IRTE, such as
> in ioapic_rte_to_remap_entry(), I try to follow it here.

Doing things the same way as existing code does should always come
with some extra thought on "is the original code
correct" and "does my use case have the same constraints as the
original one". Blindly copying code means blindly spreading mistakes.

Jan

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature detection
  2015-03-27  1:21     ` Wu, Feng
@ 2015-03-27 10:06       ` Andrew Cooper
  2015-03-27 13:41         ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-03-27 10:06 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, Tian, Kevin, keir, JBeulich

On 27/03/15 01:21, Wu, Feng wrote:
>>> +     * Remapping, and Posted Interrupt
>>>         */
>>>        for_each_drhd_unit ( drhd )
>>>        {
>>> @@ -2146,7 +2148,13 @@ int __init intel_vtd_setup(void)
>>>                iommu_qinval = 0;
>>>
>>>            if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
>>> +        {
>>>                iommu_intremap = 0;
>>> +            iommu_intpost = 0;
>>> +        }
>>> +
>>> +        if ( iommu_intpost && !cap_intr_post(iommu->cap))
>> Missing space inside the outer bracket.
>>
>> I am wondering whether it might be easier, instead of having
>> "iommu_intremap = 0; iommu_intpost = 0" all over the place, to instead
>> insist that one must check "iommu_intremap && iommu_intpost".
> It that case, user must check "iommu_intremap && iommu_intpost" together,
> my idea in this patchset is the when iommu_intpost == 1 guarantees "iommu_intremap == 1",
> so we only need to check iommu_intpost later.

If the configuration gets much more complicated, it might be worth 
introducing iommu_disable(foo)/iommu_enable(foo) functions which take 
care of ensuring that the interdependences are met.

For now, this is probably the simpler solution.

>> Out of interest, which platforms have intpost capabilities?
> Another Broadwell platform, not launched yet.

Broadwell EP/EX ?  I have an SDP to hand, but as it claims to be a 
"Genuine Intel(R) CPU 0000 @ 1.70GHz", it is not completely trivial to 
work out what platform it is exactly.

~Andrew
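
A rough sketch of the iommu_disable(foo)/iommu_enable(foo) idea, assuming
only the single dependency discussed in this thread (intpost requires
intremap). Every name below is hypothetical; nothing like this exists in the
series as posted, and the real flags are separate bool_t variables rather
than an array.

```c
#include <assert.h>
#include <stdbool.h>

enum ex_feature { EX_INTREMAP, EX_INTPOST, EX_NR_FEATURES };

/* ex_requires[f] is the feature f depends on, or -1 for none. */
static const int ex_requires[EX_NR_FEATURES] = {
    [EX_INTREMAP] = -1,
    [EX_INTPOST]  = EX_INTREMAP,
};

/* Defaults mirror the quoted patch: intremap on, intpost off. */
static bool ex_enabled[EX_NR_FEATURES] = {
    [EX_INTREMAP] = true,
    [EX_INTPOST]  = false,
};

/* Turn a feature off, along with everything that depends on it. */
static void ex_iommu_disable(enum ex_feature f)
{
    int g;

    assert(f < EX_NR_FEATURES);
    ex_enabled[f] = false;

    for ( g = 0; g < EX_NR_FEATURES; g++ )
        if ( ex_enabled[g] && ex_requires[g] == (int)f )
            ex_iommu_disable((enum ex_feature)g);
}

/* Turn a feature on; refuses if its prerequisite is off. */
static bool ex_iommu_enable(enum ex_feature f)
{
    int dep;

    assert(f < EX_NR_FEATURES);
    dep = ex_requires[f];

    if ( dep >= 0 && !ex_enabled[dep] )
        return false;

    ex_enabled[f] = true;
    return true;
}
```

Routing every change through such helpers would keep "intpost implies
intremap" true regardless of whether intremap is cleared by the command
line, by capability checks, or by a changed default.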

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-03-27  5:49     ` Wu, Feng
@ 2015-03-27 11:31       ` Andrew Cooper
  0 siblings, 0 replies; 101+ messages in thread
From: Andrew Cooper @ 2015-03-27 11:31 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, Tian, Kevin, keir, JBeulich

On 27/03/15 05:49, Wu, Feng wrote:
>>> +/*
>>> + * Here we handle the following cases:
>>> + * - For lowest-priority interrupts, we find the destination vCPU from the
>>> + *   guest vector using vector-hashing mechamisn and return true. This
>> follows
>>> + *   the hardware behavior, since modern Intel CPUs use vector hashing to
>>> + *   handle the lowest-priority interrupt.
>> What is the hashing algorithm, or can I have some hint as to where to
>> find it in a manual?
> I asked hardware guys about this, there is no document about how hardware
> implements the hashing algorithm.

In which case you must carefully document the hashing algorithm in the 
comment.  (And while you are at it, press for the hashing algorithm to 
find its way into an appropriate formal document.)

~Andrew

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 14/15] Suppress posting interrupts when 'SN' is set
  2015-03-27  3:00     ` Wu, Feng
@ 2015-03-27 12:06       ` Andrew Cooper
  2015-03-27 13:45         ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-03-27 12:06 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, Tian, Kevin, keir, JBeulich

On 27/03/15 03:00, Wu, Feng wrote:
>
>>>    static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
>>>    {
>>> +    int r, sn;
>>> +
>>>        if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
>>>            return;
>>>
>>> +    /*
>>> +     * Currently, we don't support urgent interrupt, all interrupts
>>> +     * are recognized as non-urgent interrupt, so we cannot send
>>> +     * posted-interrupt when 'SN' is set.
>>> +     */
>>> +
>>> +    sn = pi_test_sn(&v->arch.hvm_vmx.pi_desc);
>> Is there anywhere which sets sn at all? I cant spot anywhere.
>>
> SN is set in [13/15] while vCPU is going to runnable state.

Then please do not set SN in the first place if we don't support it.

~Andrew

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature detection
  2015-03-27 10:06       ` Andrew Cooper
@ 2015-03-27 13:41         ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-03-27 13:41 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 6:06 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature
> detection
> 
> On 27/03/15 01:21, Wu, Feng wrote:
> >>> +     * Remapping, and Posted Interrupt
> >>>         */
> >>>        for_each_drhd_unit ( drhd )
> >>>        {
> >>> @@ -2146,7 +2148,13 @@ int __init intel_vtd_setup(void)
> >>>                iommu_qinval = 0;
> >>>
> >>>            if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
> >>> +        {
> >>>                iommu_intremap = 0;
> >>> +            iommu_intpost = 0;
> >>> +        }
> >>> +
> >>> +        if ( iommu_intpost && !cap_intr_post(iommu->cap))
> >> Missing space inside the outer bracket.
> >>
> >> I am wondering whether it might be easier, instead of having
> >> "iommu_intremap = 0; iommu_intpost = 0" all over the place, to instead
> >> insist that one must check "iommu_intremap && iommu_intpost".
> > In that case, the user must check "iommu_intremap && iommu_intpost"
> > together. My idea in this patchset is that iommu_intpost == 1 guarantees
> > "iommu_intremap == 1", so we only need to check iommu_intpost later.
> 
> If the configuration gets much more complicated, it might be worth
> introducing iommu_disable(foo)/iommu_enable(foo) functions which take
> care of ensuring that the interdependences are met.
> 
> For now, this is probably the more simple solution.
> 
> >> Out of interest, which platforms have intpost capabilities?
> > Another Broadwell platform, not launched yet.
> 
> Broadwell EP/EX ?  I have an SDP to hand, but as it claims to be a
> "Genuine Intel(R) CPU 0000 @ 1.70GHz", it is not completely trivial to
> work out what platform it is exactly.

Broadwell EP should have this feature, but hardware with this feature is
only expected to be available later this year.

Thanks,
Feng
> 
> ~Andrew


* Re: [RFC v1 14/15] Suppress posting interrupts when 'SN' is set
  2015-03-27 12:06       ` Andrew Cooper
@ 2015-03-27 13:45         ` Wu, Feng
  2015-03-27 13:49           ` Andrew Cooper
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-27 13:45 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 8:06 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 14/15] Suppress posting interrupts when 'SN' is
> set
> 
> On 27/03/15 03:00, Wu, Feng wrote:
> >
> >>>    static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
> >>>    {
> >>> +    int r, sn;
> >>> +
> >>>        if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
> >>>            return;
> >>>
> >>> +    /*
> >>> +     * Currently, we don't support urgent interrupt, all interrupts
> >>> +     * are recognized as non-urgent interrupt, so we cannot send
> >>> +     * posted-interrupt when 'SN' is set.
> >>> +     */
> >>> +
> >>> +    sn = pi_test_sn(&v->arch.hvm_vmx.pi_desc);
> >> Is there anywhere which sets sn at all? I cant spot anywhere.
> >>
> > SN is set in [13/15] while vCPU is going to runnable state.
> 
> Then please do not set SN in the first place if we don't support it.


What do you mean here? Setting 'SN' suppresses non-urgent interrupts, and that is the only kind we support.

Thanks,
Feng

> 
> ~Andrew


* Re: [RFC v1 14/15] Suppress posting interrupts when 'SN' is set
  2015-03-27 13:45         ` Wu, Feng
@ 2015-03-27 13:49           ` Andrew Cooper
  2015-03-30  2:11             ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-03-27 13:49 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, Tian, Kevin, keir, JBeulich

On 27/03/15 13:45, Wu, Feng wrote:
>
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Friday, March 27, 2015 8:06 PM
>> To: Wu, Feng; xen-devel@lists.xen.org
>> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
>> Subject: Re: [Xen-devel] [RFC v1 14/15] Suppress posting interrupts when 'SN' is
>> set
>>
>> On 27/03/15 03:00, Wu, Feng wrote:
>>>>>     static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
>>>>>     {
>>>>> +    int r, sn;
>>>>> +
>>>>>         if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
>>>>>             return;
>>>>>
>>>>> +    /*
>>>>> +     * Currently, we don't support urgent interrupt, all interrupts
>>>>> +     * are recognized as non-urgent interrupt, so we cannot send
>>>>> +     * posted-interrupt when 'SN' is set.
>>>>> +     */
>>>>> +
>>>>> +    sn = pi_test_sn(&v->arch.hvm_vmx.pi_desc);
>>>> Is there anywhere which sets sn at all? I cant spot anywhere.
>>>>
>>> SN is set in [13/15] while vCPU is going to runnable state.
>> Then please do not set SN in the first place if we don't support it.
>
> What do you mean here. Setting 'SN' can suppress non-urgent interrupt. (we only support this)

Sorry, in which case patch 13 shouldn't clear SN then.

Either way - we should not be fixing up something in this patch which 
was introduced in the previous patch.

~Andrew


* Re: [RFC v1 00/15] Add VT-d Posted-Interrupts support
  2015-03-27  1:06   ` Wu, Feng
@ 2015-03-27 14:44     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 101+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-27 14:44 UTC (permalink / raw)
  To: Wu, Feng; +Cc: Zhang, Yang Z, Tian, Kevin, keir, JBeulich, xen-devel

On Fri, Mar 27, 2015 at 01:06:48AM +0000, Wu, Feng wrote:
> 
> 
> > -----Original Message-----
> > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > Sent: Friday, March 27, 2015 2:51 AM
> > To: Wu, Feng
> > Cc: xen-devel@lists.xen.org; Zhang, Yang Z; Tian, Kevin; keir@xen.org;
> > JBeulich@suse.com
> > Subject: Re: [Xen-devel] [RFC v1 00/15] Add VT-d Posted-Interrupts support
> > 
> > On Wed, Mar 25, 2015 at 08:31:42PM +0800, Feng Wu wrote:
> > > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > > With VT-d Posted-Interrupts enabled, external interrupts from
> > > direct-assigned devices can be delivered to guests without VMM
> > > intervention when guest is running in non-root mode.
> > >
> > > You can find the VT-d Posted-Interrupts Spec. at the following URL:
> > > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> > >
> > > This patch set follows this design:
> > > http://article.gmane.org/gmane.comp.emulators.xen.devel/236476
> > 
> > Would it be possible to put the design in xen/docs/ directory?
> 
> That's a good suggestion. I will do it in the next post.

For an RFC patchset, this is in very good shape, and having the design
document in mind made it very nice to follow. Thank you!


* Re: [RFC v1 14/15] Suppress posting interrupts when 'SN' is set
  2015-03-27 13:49           ` Andrew Cooper
@ 2015-03-30  2:11             ` Wu, Feng
  2015-03-30 10:11               ` Andrew Cooper
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-03-30  2:11 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, March 27, 2015 9:49 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 14/15] Suppress posting interrupts when 'SN' is
> set
> 
> On 27/03/15 13:45, Wu, Feng wrote:
> >
> >> -----Original Message-----
> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> Sent: Friday, March 27, 2015 8:06 PM
> >> To: Wu, Feng; xen-devel@lists.xen.org
> >> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
> >> Subject: Re: [Xen-devel] [RFC v1 14/15] Suppress posting interrupts when 'SN' is
> >> set
> >>
> >> On 27/03/15 03:00, Wu, Feng wrote:
> >>>>>     static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
> >>>>>     {
> >>>>> +    int r, sn;
> >>>>> +
> >>>>>         if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
> >>>>>             return;
> >>>>>
> >>>>> +    /*
> >>>>> +     * Currently, we don't support urgent interrupt, all interrupts
> >>>>> +     * are recognized as non-urgent interrupt, so we cannot send
> >>>>> +     * posted-interrupt when 'SN' is set.
> >>>>> +     */
> >>>>> +
> >>>>> +    sn = pi_test_sn(&v->arch.hvm_vmx.pi_desc);
> >>>> Is there anywhere which sets sn at all? I cant spot anywhere.
> >>>>
> >>> SN is set in [13/15] while vCPU is going to runnable state.
> >> Then please do not set SN in the first place if we don't support it.
> >
> > What do you mean here. Setting 'SN' can suppress non-urgent interrupt. (we only support this)
> 
> Sorry, in which case patch 13 shouldn't clear SN then.
> 
> Either way - we should not be fixing up something in this patch which
> was introduced in the previous patch.

I think there is some misunderstanding here.

- In patch 13, we need to set 'SN' to suppress interrupts while the vCPU
is in the runnable state, since we don't need to send a notification event then.
- Patch 14 is a different story. From the hardware's point of view, if 'SN' is set,
it doesn't send a notification event. vmx_deliver_posted_intr() is the software path
for delivering posted interrupts, so it needs to follow the hardware's behavior.
Hence we check 'SN' first and do not send a notification event if it is set.
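
The second point can be illustrated with a minimal, self-contained sketch.
This is a toy model and not the Xen code: struct toy_pi_desc and
toy_deliver_posted_intr() are hypothetical names, plain fields stand in for
the atomic bit operations a real descriptor needs, and the return value only
reports whether a notification event would be sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a posted-interrupt descriptor (hypothetical layout). */
struct toy_pi_desc {
    uint32_t pir[8]; /* 256 posted-interrupt request bits */
    bool on;         /* outstanding notification */
    bool sn;         /* suppress notification */
};

/*
 * Software delivery path that mirrors the hardware rule: always record the
 * vector in the PIR, but send no notification event while SN is set.
 * Returns true if a notification event would be sent.
 */
static bool toy_deliver_posted_intr(struct toy_pi_desc *pi, uint8_t vector)
{
    uint32_t mask = UINT32_C(1) << (vector % 32);
    uint32_t *word;

    assert(pi != NULL);
    word = &pi->pir[vector / 32];

    if (*word & mask)
        return false;   /* vector already pending, nothing more to do */
    *word |= mask;

    if (pi->sn)
        return false;   /* notifications suppressed: post only */

    if (pi->on)
        return false;   /* a notification is already outstanding */

    pi->on = true;
    return true;
}
```

With sn set, the vector is still recorded in the PIR but no notification goes
out, which is the behavior patch 14 is trying to match.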

Thanks,
Feng

> 
> ~Andrew


* Re: [RFC v1 14/15] Suppress posting interrupts when 'SN' is set
  2015-03-30  2:11             ` Wu, Feng
@ 2015-03-30 10:11               ` Andrew Cooper
  0 siblings, 0 replies; 101+ messages in thread
From: Andrew Cooper @ 2015-03-30 10:11 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, Tian, Kevin, keir, JBeulich

On 30/03/15 03:11, Wu, Feng wrote:
>
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Friday, March 27, 2015 9:49 PM
>> To: Wu, Feng; xen-devel@lists.xen.org
>> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
>> Subject: Re: [Xen-devel] [RFC v1 14/15] Suppress posting interrupts when 'SN' is
>> set
>>
>> On 27/03/15 13:45, Wu, Feng wrote:
>>>> -----Original Message-----
>>>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>>>> Sent: Friday, March 27, 2015 8:06 PM
>>>> To: Wu, Feng; xen-devel@lists.xen.org
>>>> Cc: Zhang, Yang Z; Tian, Kevin; keir@xen.org; JBeulich@suse.com
>>>> Subject: Re: [Xen-devel] [RFC v1 14/15] Suppress posting interrupts when 'SN' is
>>>> set
>>>>
>>>> On 27/03/15 03:00, Wu, Feng wrote:
>>>>>>>     static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
>>>>>>>     {
>>>>>>> +    int r, sn;
>>>>>>> +
>>>>>>>         if ( pi_test_and_set_pir(vector, &v->arch.hvm_vmx.pi_desc) )
>>>>>>>             return;
>>>>>>>
>>>>>>> +    /*
>>>>>>> +     * Currently, we don't support urgent interrupt, all interrupts
>>>>>>> +     * are recognized as non-urgent interrupt, so we cannot send
>>>>>>> +     * posted-interrupt when 'SN' is set.
>>>>>>> +     */
>>>>>>> +
>>>>>>> +    sn = pi_test_sn(&v->arch.hvm_vmx.pi_desc);
>>>>>> Is there anywhere which sets sn at all? I cant spot anywhere.
>>>>>>
>>>>> SN is set in [13/15] while vCPU is going to runnable state.
>>>> Then please do not set SN in the first place if we don't support it.
>>> What do you mean here. Setting 'SN' can suppress non-urgent interrupt. (we only support this)
>>
>> Sorry, in which case patch 13 shouldn't clear SN then.
>>
>> Either way - we should not be fixing up something in this patch which
>> was introduced in the previous patch.
> I think there are some misunderstanding here. 
>
> - In patch 13, we need to set 'SN', so as to suppress the interrupts when vCPU
> is in runnable state, since we don't need to send notification event then.
> - Here in patch 14, it is another story. From hardware p.o.v, if 'SN' is set, it doesn't
> send notification event. vmx_deliver_posted_intr() is the software way to delivery
> posted-interrupts, so we need to follow the HW's behavior. Hence we check 'SN'
> first, and not send notification event if it is set.

Ah I see.  Thanks for the clarification.

~Andrew


* Re: [RFC v1 00/15] Add VT-d Posted-Interrupts support
  2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
                   ` (15 preceding siblings ...)
  2015-03-26 18:50 ` [RFC v1 00/15] Add VT-d Posted-Interrupts support Konrad Rzeszutek Wilk
@ 2015-04-01 13:21 ` Wu, Feng
  2015-04-13 12:12   ` Jan Beulich
  16 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-04-01 13:21 UTC (permalink / raw)
  To: xen-devel; +Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, JBeulich

Hi Jan,

Any more comments about this series? Thanks a lot!

Thanks,
Feng

> -----Original Message-----
> From: Wu, Feng
> Sent: Wednesday, March 25, 2015 8:32 PM
> To: xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z; Tian, Kevin; Wu, Feng
> Subject: [RFC v1 00/15] Add VT-d Posted-Interrupts support
> 
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> You can find the VT-d Posted-Interrupts Spec. at the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> This patch set follows this design:
> http://article.gmane.org/gmane.comp.emulators.xen.devel/236476
> 
> Feng Wu (15):
>   iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
>   vt-d: VT-d Posted-Interrupts feature detection
>   vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>   vmx: Add some helper functions for Posted-Interrupts
>   vmx: Initialize VT-d Posted-Interrupts Descriptor
>   vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
>   vt-d: Add API to update IRTE when VT-d PI is used
>   Update IRTE according to guest interrupt config changes
>   Add a new per-vCPU tasklet to wakeup the blocked vCPU
>   vmx: Define two per-cpu variants
>   vmx: Add a global wake-up vector for VT-d Posted-Interrupts
>   vmx: Properly handle notification event when vCPU is running
>   Update Posted-Interrupts Descriptor during vCPU scheduling
>   Suppress posting interrupts when 'SN' is set
>   Add a command line parameter for VT-d posted-interrupts
> 
>  xen/arch/x86/hvm/vmx/vmcs.c            |   6 ++
>  xen/arch/x86/hvm/vmx/vmx.c             | 185 ++++++++++++++++++++++++++++++++-
>  xen/common/domain.c                    |  11 ++
>  xen/common/schedule.c                  |   3 +
>  xen/drivers/passthrough/io.c           |  77 +++++++++++++-
>  xen/drivers/passthrough/iommu.c        |  17 ++-
>  xen/drivers/passthrough/vtd/intremap.c |  83 +++++++++++++++
>  xen/drivers/passthrough/vtd/iommu.c    |  15 ++-
>  xen/drivers/passthrough/vtd/iommu.h    |  23 ++++
>  xen/include/asm-x86/hvm/hvm.h          |   1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h     |  16 ++-
>  xen/include/asm-x86/hvm/vmx/vmx.h      |  49 ++++++++-
>  xen/include/asm-x86/iommu.h            |   2 +
>  xen/include/xen/iommu.h                |   2 +-
>  xen/include/xen/sched.h                |   5 +
>  15 files changed, 485 insertions(+), 10 deletions(-)
> 
> --
> 2.1.0


* Re: [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used
  2015-03-25 12:31 ` [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
  2015-03-26 19:17   ` Andrew Cooper
  2015-03-26 19:36   ` Konrad Rzeszutek Wilk
@ 2015-04-02  5:34   ` Tian, Kevin
  2015-04-02  6:02     ` Wu, Feng
  2 siblings, 1 reply; 101+ messages in thread
From: Tian, Kevin @ 2015-04-02  5:34 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Wednesday, March 25, 2015 8:32 PM
> 
> This patch adds an API which is used to update the IRTE
> for posted-interrupt when guest changes MSI/MSI-X information.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/drivers/passthrough/vtd/intremap.c | 83 ++++++++++++++++++++++++++++++++++
>  xen/drivers/passthrough/vtd/iommu.h    |  3 ++
>  xen/include/asm-x86/iommu.h            |  2 +
>  3 files changed, 88 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
> index 0333686..f44e74d 100644
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -898,3 +898,86 @@ void iommu_disable_x2apic_IR(void)
>      for_each_drhd_unit ( drhd )
>          disable_qinval(drhd->iommu);
>  }
> +
> +/*
> + * This function is used to update the IRTE for posted-interrupt
> + * when guest changes MSI/MSI-X information
> + */
> +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec )
> +{
> +    struct irq_desc *desc;
> +    struct msi_desc *msi_desc;
> +    int remap_index, rc = -1;
> +    struct pci_dev *pci_dev;
> +    struct acpi_drhd_unit *drhd;
> +    struct iommu *iommu;
> +    struct ir_ctrl *ir_ctrl;
> +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> +    struct iremap_entry new_ire;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    unsigned long flags;
> +
> +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> +    if ( !desc )
> +        return -1;
> +
> +    msi_desc = desc->msi_desc;
> +    if ( !msi_desc )
> +        goto unlock_out;
> +
> +    remap_index = msi_desc->remap_index;
> +    pci_dev = msi_desc->dev;
> +    if ( !pci_dev )
> +        goto unlock_out;
> +
> +    drhd = acpi_find_matched_drhd_unit(pci_dev);
> +    if (!drhd)
> +    {
> +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get drhd!\n");
> +        goto unlock_out;
> +    }
> +
> +    iommu = drhd->iommu;
> +    ir_ctrl = iommu_ir_ctrl(iommu);
> +    if ( !ir_ctrl )
> +    {
> +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get ir_ctrl!\n");
> +        goto unlock_out;
> +    }
> +
> +    spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
> +
> +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
> +
> +    memcpy(&new_ire, p, sizeof(struct iremap_entry));
> +
> +    /* Setup/Update interrupt remapping table entry */
> +    new_ire.lo_intpost.urg = 0;
> +    new_ire.lo_intpost.vector = gvec;
> +    new_ire.lo_intpost.pda_l = (((u64)virt_to_maddr(pi_desc)) >>
> +                                (32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
> +    new_ire.hi_intpost.pda_h = (((u64)virt_to_maddr(pi_desc)) >>  32) &
> +                                ~(-1UL << PDA_HIGH_BIT);
> +
> +    new_ire.lo_intpost.res_1 = 0;
> +    new_ire.lo_intpost.res_2 = 0;
> +    new_ire.lo_intpost.res_3 = 0;
> +    new_ire.hi_intpost.res_1 = 0;
> +
> +    new_ire.lo_intpost.im = 1;
> +

Is the code above for creating a new entry or for updating an existing one?
I would expect the two cases to require different steps, but I don't see
any difference here. If you mean that every update does exactly what
creating a new entry requires, please add a comment saying so.
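
For readers following the pda_l/pda_h assignments in the hunk above, here is a
minimal sketch of how a 64-byte-aligned descriptor address would be split
across the two fields and joined back. It assumes PDA_LOW_BIT is 26 and
PDA_HIGH_BIT is 32; neither value is shown in this hunk, so treat both as
assumptions. toy_split_pda() and toy_join_pda() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PDA_LOW_BIT  26 /* assumed width of pda_l: address bits 31:6 */
#define TOY_PDA_HIGH_BIT 32 /* assumed width of pda_h: address bits 63:32 */

struct toy_pda {
    uint32_t low;  /* would go into lo_intpost.pda_l */
    uint32_t high; /* would go into hi_intpost.pda_h */
};

/* Split a 64-byte-aligned machine address into the two field values. */
static struct toy_pda toy_split_pda(uint64_t maddr)
{
    struct toy_pda pda;

    assert((maddr & 0x3f) == 0); /* low 6 bits are not stored */

    pda.low = (uint32_t)((maddr >> (32 - TOY_PDA_LOW_BIT)) &
                         ((UINT64_C(1) << TOY_PDA_LOW_BIT) - 1));
    pda.high = (uint32_t)((maddr >> 32) &
                          ((UINT64_C(1) << TOY_PDA_HIGH_BIT) - 1));
    return pda;
}

/* Rebuild the address from the two field values. */
static uint64_t toy_join_pda(struct toy_pda pda)
{
    return ((uint64_t)pda.high << 32) |
           ((uint64_t)pda.low << (32 - TOY_PDA_LOW_BIT));
}
```

Under these assumptions the split is lossless for any aligned address, so the
same computation would serve both the "create" and the "update" case.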

Thanks
Kevin


* Re: [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-03-25 12:31 ` [RFC v1 08/15] Update IRTE according to guest interrupt config changes Feng Wu
  2015-03-26 19:46   ` Konrad Rzeszutek Wilk
  2015-03-26 19:59   ` Andrew Cooper
@ 2015-04-02  5:52   ` Tian, Kevin
  2015-04-02  6:20     ` Wu, Feng
  2 siblings, 1 reply; 101+ messages in thread
From: Tian, Kevin @ 2015-04-02  5:52 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Wednesday, March 25, 2015 8:32 PM
> 
> When guest changes its interrupt configuration (such as, vector, etc.)
> for direct-assigned devices, we need to update the associated IRTE
> with the new guest vector, so external interrupts from the assigned
> devices can be injected to guests without VM-Exit.
> 
> For lowest-priority interrupts, we use a vector-hashing mechanism to find
> the destination vCPU. This follows the hardware behavior, since modern
> Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> 
> For multicast/broadcast interrupts, we cannot use interrupt posting, so
> we still use interrupt remapping.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/drivers/passthrough/io.c | 77 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 76 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index ae050df..1d9a132 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -26,6 +26,7 @@
>  #include <asm/hvm/iommu.h>
>  #include <asm/hvm/support.h>
>  #include <xen/hvm/irq.h>
> +#include <asm/io_apic.h>
> 
>  static DEFINE_PER_CPU(struct list_head, dpci_list);
> 
> @@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
>      xfree(dpci);
>  }
> 
> +/*
> + * Here we handle the following cases:
> + * - For lowest-priority interrupts, we find the destination vCPU from the
> + *   guest vector using vector-hashing mechanism and return true. This follows
> + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> + *   handle the lowest-priority interrupt.
> + * - Otherwise, for single destination interrupt, it is straightforward to
> + *   find the destination vCPU and return true.
> + * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> + *   so return false.
> + */
> +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> +                                uint8_t dest_mode, uint8_t deliver_mode,
> +                                uint32_t gvec, struct vcpu **dest_vcpu)
> +{
> +    struct vcpu *v, **dest_vcpu_array;
> +    unsigned int dest_vcpu_num = 0;
> +    int ret;
> +
> +    if ( deliver_mode == dest_LowestPrio )
> +        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> +                                dest_id, dest_mode) )
> +            continue;
> +
> +        dest_vcpu_num++;
> +
> +        if ( deliver_mode == dest_LowestPrio )
> +            dest_vcpu_array[dest_vcpu_num] = v;
> +        else
> +            *dest_vcpu = v;
> +    }
> +
> +    if ( deliver_mode == dest_LowestPrio )
> +    {
> +        if (  dest_vcpu_num != 0 )
> +        {
> +            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
> +            ret = 1;
> +        }
> +        else
> +            ret = 0;
> +
> +        xfree(dest_vcpu_array);
> +        return ret;
> +    }
> +    else if (  dest_vcpu_num == 1 )
> +        return 1;
> +    else
> +        return 0;
> +}
> +
>  int pt_irq_create_bind(
>      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
>  {
> @@ -257,7 +313,7 @@ int pt_irq_create_bind(
>      {
>      case PT_IRQ_TYPE_MSI:
>      {
> -        uint8_t dest, dest_mode;
> +        uint8_t dest, dest_mode, deliver_mode;
>          int dest_vcpu_id;
> 
>          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> @@ -330,11 +386,30 @@ int pt_irq_create_bind(
>          /* Calculate dest_vcpu_id for MSI-type pirq migration. */
>          dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
>          dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> +        deliver_mode = (pirq_dpci->gmsi.gflags >> GFLAGS_SHIFT_DELIV_MODE) &
> +                        VMSI_DELIV_MASK;
>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>          spin_unlock(&d->event_lock);
>          if ( dest_vcpu_id >= 0 )
>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> +
> +        /* Use interrupt posting if it is supported */
> +        if ( iommu_intpost )
> +        {
> +            struct vcpu *vcpu = NULL;
> +
> +            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
> +                                    pirq_dpci->gmsi.gvec, &vcpu) )
> +                break;
> +

Is it possible for the new pi_find_dest_vcpu() to return a different target from
the earlier hvm_girq_dest_2_vcpu_id()? If so, it will cause tricky issues, since
the pirqs were migrated earlier according to a different policy. We need to
consolidate the vCPU selection policies to keep them consistent.

Also, why does failure to find dest_vcpu lead to a break rather than an error?
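
Whether the two paths can disagree depends on how each one picks its target.
As a minimal sketch of the vector-hashing policy the patch describes for
lowest-priority delivery (and nothing more), the selection could look like
the following. toy_pick_lowest_prio(), TOY_MAX_VCPUS and the matches[] array
are hypothetical; the array stands in for the result of vlapic_match_dest()
on each vCPU, and the sketch makes no claim about what
hvm_girq_dest_2_vcpu_id() returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_VCPUS 8 /* hypothetical fixed domain size */

/*
 * Pick a destination by vector hashing: collect every vCPU id whose
 * matches[] entry is set, then index the candidate list with
 * gvec % count. Returns the chosen vCPU id, or -1 if nothing matched.
 */
static int toy_pick_lowest_prio(const bool matches[TOY_MAX_VCPUS],
                                uint32_t gvec)
{
    int candidates[TOY_MAX_VCPUS];
    unsigned int count = 0;

    assert(matches != NULL);

    for (unsigned int id = 0; id < TOY_MAX_VCPUS; id++)
        if (matches[id])
            candidates[count++] = (int)id; /* store first, then advance */

    if (count == 0)
        return -1;

    return candidates[gvec % count];
}
```

The sketch stores each candidate before advancing the count, so slot 0 is
always populated and gvec % count cannot select an empty slot. The quoted hunk
appears to increment dest_vcpu_num before storing, which is worth checking
against this.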

> +            if ( pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec ) != 0 )
> +            {
> +                dprintk(XENLOG_G_INFO, "failed to update PI IRTE\n");
> +                return -EBUSY;
> +            }
> +        }
> +
>          break;
>      }
> 
> --
> 2.1.0


* Re: [RFC v1 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
  2015-03-25 12:31 ` [RFC v1 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU Feng Wu
@ 2015-04-02  5:53   ` Tian, Kevin
  2015-04-02  7:20     ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Tian, Kevin @ 2015-04-02  5:53 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Wednesday, March 25, 2015 8:32 PM
> 
> This patch adds a new per-vCPU tasklet to wakeup the blocked
> vCPU. It can be used in the case vcpu_unblock cannot be called
> directly.

Could you elaborate on the scenarios in which vcpu_unblock() can't
be called directly?

> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/common/domain.c     | 11 +++++++++++
>  xen/include/xen/sched.h |  3 +++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index aa78fd7..fe89658 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -109,6 +109,13 @@ static void vcpu_check_shutdown(struct vcpu *v)
>      spin_unlock(&d->shutdown_lock);
>  }
> 
> +static void vcpu_wakeup_tasklet_handler(unsigned long arg)
> +{
> +    struct vcpu *v = (void *)arg;
> +
> +    vcpu_unblock(v);
> +}
> +
>  struct vcpu *alloc_vcpu(
>      struct domain *d, unsigned int vcpu_id, unsigned int cpu_id)
>  {
> @@ -126,6 +133,9 @@ struct vcpu *alloc_vcpu(
> 
>      tasklet_init(&v->continue_hypercall_tasklet, NULL, 0);
> 
> +    tasklet_init(&v->vcpu_wakeup_tasklet, vcpu_wakeup_tasklet_handler,
> +                 (unsigned long)v);
> +
>      if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
>           !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
>           !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
> @@ -784,6 +794,7 @@ static void complete_domain_destroy(struct rcu_head
> *head)
>          if ( (v = d->vcpu[i]) == NULL )
>              continue;
>          tasklet_kill(&v->continue_hypercall_tasklet);
> +        tasklet_kill(&v->vcpu_wakeup_tasklet);
>          vcpu_destroy(v);
>          sched_destroy_vcpu(v);
>          destroy_waitqueue_vcpu(v);
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index ccd7ed8..c874dd4 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -239,6 +239,9 @@ struct vcpu
>      /* Tasklet for continue_hypercall_on_cpu(). */
>      struct tasklet   continue_hypercall_tasklet;
> 
> +    /* Tasklet for wakeup_blocked_vcpu(). */
> +    struct tasklet   vcpu_wakeup_tasklet;
> +
>      /* Multicall information. */
>      struct mc_state  mc_state;
> 
> --
> 2.1.0


* Re: [RFC v1 10/15] vmx: Define two per-cpu variants
  2015-03-25 12:31 ` [RFC v1 10/15] vmx: Define two per-cpu variants Feng Wu
  2015-03-26 19:59   ` Andrew Cooper
@ 2015-04-02  5:54   ` Tian, Kevin
  2015-04-02  6:24     ` Wu, Feng
  1 sibling, 1 reply; 101+ messages in thread
From: Tian, Kevin @ 2015-04-02  5:54 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Wednesday, March 25, 2015 8:32 PM
> 
> This patch defines two per-cpu variants:
> 
> blocked_vcpu_on_cpu:
> A list storing the vCPUs which were blocked on this pCPU.
> 
> blocked_vcpu_on_cpu_lock:
> The spinlock to protect blocked_vcpu_on_cpu.

Since the two above are already per-CPU variables, the 'on_cpu' in
their names is redundant. How about just calling them
"blocked_vcpus" and "blocked_vcpus_lock"? :-)

> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c       | 3 +++
>  xen/arch/x86/hvm/vmx/vmx.c        | 7 +++++++
>  xen/include/asm-x86/hvm/vmx/vmx.h | 3 +++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 942f4b7..1345e69 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -585,6 +585,9 @@ int vmx_cpu_up(void)
>      if ( cpu_has_vmx_vpid )
>          vpid_sync_all();
> 
> +    INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> +    spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +
>      return 0;
>  }
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index e1c55ce..ff5544d 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -81,6 +81,13 @@ static int vmx_msr_read_intercept(unsigned int msr,
> uint64_t *msr_content);
>  static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
> 
> +/*
> + * We maintain a per-CPU linked-list of vCPUs, so in the PI wakeup handler we
> + * can find which vCPU should be woken up.
> + */
> +DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> +DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> 
>  static int vmx_domain_initialise(struct domain *d)
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
> b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 3cd75eb..e643c3c 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -30,6 +30,9 @@
>  #include <asm/hvm/vmx/vmcs.h>
>  #include <asm/apic.h>
> 
> +DECLARE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> +DECLARE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> +
>  extern uint8_t posted_intr_vector;
> 
>  typedef union {
> --
> 2.1.0


* Re: [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
  2015-03-25 12:31 ` [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts Feng Wu
  2015-03-26 20:07   ` Andrew Cooper
@ 2015-04-02  6:00   ` Tian, Kevin
  2015-04-02  7:18     ` Wu, Feng
  1 sibling, 1 reply; 101+ messages in thread
From: Tian, Kevin @ 2015-04-02  6:00 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Wednesday, March 25, 2015 8:32 PM
> 
> This patch adds a global vector which is used to wake up
> the blocked vCPU when an interrupt is being posted to it.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c        | 33
> +++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/hvm.h     |  1 +
>  xen/include/asm-x86/hvm/vmx/vmx.h |  3 +++
>  xen/include/xen/sched.h           |  2 ++
>  4 files changed, 39 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index ff5544d..b2b4c26 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -89,6 +89,7 @@ DEFINE_PER_CPU(struct list_head,
> blocked_vcpu_on_cpu);
>  DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> 
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
> 
>  static int vmx_domain_initialise(struct domain *d)
>  {
> @@ -131,6 +132,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>      if ( v->vcpu_id == 0 )
>          v->arch.user_regs.eax = 1;
> 
> +    INIT_LIST_HEAD(&v->blocked_vcpu_list);
> +
>      return 0;
>  }
> 
> @@ -1834,11 +1837,19 @@ const struct hvm_function_table * __init
> start_vmx(void)
>      }
> 
>      if ( cpu_has_vmx_posted_intr_processing )
> +    {
>          alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
> +
> +        if ( iommu_intpost )
> +            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> +        else
> +            vmx_function_table.pi_desc_update = NULL;
> +    }

Just a style issue: the conditional logic above doesn't look intuitive to me.
Usually we have:
	if ( iommu_intpost )
		vmx_function_table.pi_desc_update = func;
	else
		vmx_function_table.pi_desc_update = NULL;

I suppose you will register the callback in a later patch. If so, it's better
to move the NULL assignment there too. Putting it here doesn't match the
normal if...else implications. :-)

>      else
>      {
>          vmx_function_table.deliver_posted_intr = NULL;
>          vmx_function_table.sync_pir_to_irr = NULL;
> +        vmx_function_table.pi_desc_update = NULL;
>      }
> 
>      if ( cpu_has_vmx_ept
> @@ -3255,6 +3266,28 @@ void vmx_vmenter_helper(const struct
> cpu_user_regs *regs)
>  }
> 
>  /*
> + * Handle VT-d posted-interrupt when VCPU is blocked.
> + */
> +void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct vcpu *v;
> +    int cpu = smp_processor_id();
> +
> +    spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +    list_for_each_entry(v, &per_cpu(blocked_vcpu_on_cpu, cpu),
> +                    blocked_vcpu_list) {
> +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +        if ( pi_test_on(pi_desc) == 1 )
> +            tasklet_schedule(&v->vcpu_wakeup_tasklet);

Why can't we call vcpu_unblock() directly here?

> +    }
> +    spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +
> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +}
> +
> +/*
>   * Local variables:
>   * mode: C
>   * c-file-style: "BSD"
> diff --git a/xen/include/asm-x86/hvm/hvm.h
> b/xen/include/asm-x86/hvm/hvm.h
> index 0dc909b..a11a256 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -195,6 +195,7 @@ struct hvm_function_table {
>      void (*deliver_posted_intr)(struct vcpu *v, u8 vector);
>      void (*sync_pir_to_irr)(struct vcpu *v);
>      void (*handle_eoi)(u8 vector);
> +    void (*pi_desc_update)(struct vcpu *v, int new_state);
> 
>      /*Walk nested p2m  */
>      int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa,
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
> b/xen/include/asm-x86/hvm/vmx/vmx.h
> index e643c3c..f4296ab 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -34,6 +34,7 @@ DECLARE_PER_CPU(struct list_head,
> blocked_vcpu_on_cpu);
>  DECLARE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> 
>  extern uint8_t posted_intr_vector;
> +extern uint8_t pi_wakeup_vector;
> 
>  typedef union {
>      struct {
> @@ -574,6 +575,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
>  void free_p2m_hap_data(struct p2m_domain *p2m);
>  void p2m_init_hap_data(struct p2m_domain *p2m);
> 
> +void pi_wakeup_interrupt(struct cpu_user_regs *regs);
> +
>  /* EPT violation qualifications definitions */
>  #define _EPT_READ_VIOLATION         0
>  #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index c874dd4..91f0912 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -148,6 +148,8 @@ struct vcpu
> 
>      struct vcpu     *next_in_list;
> 
> +    struct list_head blocked_vcpu_list;
> +
>      s_time_t         periodic_period;
>      s_time_t         periodic_last_event;
>      struct timer     periodic_timer;
> --
> 2.1.0

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used
  2015-04-02  5:34   ` Tian, Kevin
@ 2015-04-02  6:02     ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-04-02  6:02 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wu, Feng, keir, JBeulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, April 02, 2015 1:34 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> Subject: RE: [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used
> 
> > From: Wu, Feng
> > Sent: Wednesday, March 25, 2015 8:32 PM
> >
> > This patch adds an API which is used to update the IRTE
> > for posted-interrupt when guest changes MSI/MSI-X information.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/drivers/passthrough/vtd/intremap.c | 83
> > ++++++++++++++++++++++++++++++++++
> >  xen/drivers/passthrough/vtd/iommu.h    |  3 ++
> >  xen/include/asm-x86/iommu.h            |  2 +
> >  3 files changed, 88 insertions(+)
> >
> > diff --git a/xen/drivers/passthrough/vtd/intremap.c
> > b/xen/drivers/passthrough/vtd/intremap.c
> > index 0333686..f44e74d 100644
> > --- a/xen/drivers/passthrough/vtd/intremap.c
> > +++ b/xen/drivers/passthrough/vtd/intremap.c
> > @@ -898,3 +898,86 @@ void iommu_disable_x2apic_IR(void)
> >      for_each_drhd_unit ( drhd )
> >          disable_qinval(drhd->iommu);
> >  }
> > +
> > +/*
> > + * This function is used to update the IRTE for posted-interrupt
> > + * when guest changes MSI/MSI-X information
> > + */
> > +int pi_update_irte(struct vcpu *v, struct pirq *pirq, uint32_t gvec )
> > +{
> > +    struct irq_desc *desc;
> > +    struct msi_desc *msi_desc;
> > +    int remap_index, rc = -1;
> > +    struct pci_dev *pci_dev;
> > +    struct acpi_drhd_unit *drhd;
> > +    struct iommu *iommu;
> > +    struct ir_ctrl *ir_ctrl;
> > +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> > +    struct iremap_entry new_ire;
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +    unsigned long flags;
> > +
> > +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> > +    if ( !desc )
> > +        return -1;
> > +
> > +    msi_desc = desc->msi_desc;
> > +    if ( !msi_desc )
> > +        goto unlock_out;
> > +
> > +    remap_index = msi_desc->remap_index;
> > +    pci_dev = msi_desc->dev;
> > +    if ( !pci_dev )
> > +        goto unlock_out;
> > +
> > +    drhd = acpi_find_matched_drhd_unit(pci_dev);
> > +    if (!drhd)
> > +    {
> > +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get drhd!\n");
> > +        goto unlock_out;
> > +    }
> > +
> > +    iommu = drhd->iommu;
> > +    ir_ctrl = iommu_ir_ctrl(iommu);
> > +    if ( !ir_ctrl )
> > +    {
> > +        dprintk(XENLOG_INFO VTDPREFIX, "failed to get ir_ctrl!\n");
> > +        goto unlock_out;
> > +    }
> > +
> > +    spin_lock_irqsave(&ir_ctrl->iremap_lock, flags);
> > +
> > +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index,
> > iremap_entries, p);
> > +
> > +    memcpy(&new_ire, p, sizeof(struct iremap_entry));
> > +
> > +    /* Setup/Update interrupt remapping table entry */
> > +    new_ire.lo_intpost.urg = 0;
> > +    new_ire.lo_intpost.vector = gvec;
> > +    new_ire.lo_intpost.pda_l = (((u64)virt_to_maddr(pi_desc)) >>
> > +                                (32 - PDA_LOW_BIT)) & ~(-1UL <<
> > PDA_LOW_BIT);
> > +    new_ire.hi_intpost.pda_h = (((u64)virt_to_maddr(pi_desc)) >>  32) &
> > +                                ~(-1UL << PDA_HIGH_BIT);
> > +
> > +    new_ire.lo_intpost.res_1 = 0;
> > +    new_ire.lo_intpost.res_2 = 0;
> > +    new_ire.lo_intpost.res_3 = 0;
> > +    new_ire.hi_intpost.res_1 = 0;
> > +
> > +    new_ire.lo_intpost.im = 1;
> > +
> 
> Is above code for creating a new entry or updating an existing entry?
> suppose two purposes would require different steps but here I didn't
> see a difference... or if you mean every update is equal to what's
> required for creating a new one, better add a comment for that.

It just updates an existing IRTE to the posted format here.
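
For illustration, the address handling in that update can be sketched as a
standalone snippet. It shows how the Posted-Interrupt Descriptor address is
split across the two IRTE halves. The PDA_LOW_BIT/PDA_HIGH_BIT values are
assumptions taken from the pda_l/pda_h bitfield widths in patch 06/15, and
the helper names are hypothetical, not Xen functions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed widths, matching the pda_l:26 and pda_h:32 bitfields. */
#define PDA_LOW_BIT   26
#define PDA_HIGH_BIT  32

/* Low part: descriptor address bits 31:6 (the descriptor is 64-byte aligned). */
static uint32_t pda_low(uint64_t maddr)
{
    return (uint32_t)((maddr >> (32 - PDA_LOW_BIT)) &
                      ~(~UINT64_C(0) << PDA_LOW_BIT));
}

/* High part: descriptor address bits 63:32. */
static uint32_t pda_high(uint64_t maddr)
{
    return (uint32_t)((maddr >> 32) & ~(~UINT64_C(0) << PDA_HIGH_BIT));
}

/* Rebuild the address from both halves, to check that nothing was lost. */
static uint64_t pda_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | ((uint64_t)lo << (32 - PDA_LOW_BIT));
}
```

For a 64-byte aligned address, pda_join(pda_low(a), pda_high(a)) gives back
the original address, which is a cheap sanity check for the shift and mask
expressions.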

Thanks,
Feng

> 
> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-03-27  4:57         ` Wu, Feng
@ 2015-04-02  6:08           ` Tian, Kevin
  2015-04-02  7:21             ` Wu, Feng
  2015-04-02 19:15             ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 101+ messages in thread
From: Tian, Kevin @ 2015-04-02  6:08 UTC (permalink / raw)
  To: Wu, Feng, Zhang, Yang Z, xen-devel; +Cc: keir, JBeulich

> From: Wu, Feng
> Sent: Friday, March 27, 2015 12:58 PM
> 
> 
> 
> > -----Original Message-----
> > From: Zhang, Yang Z
> > Sent: Friday, March 27, 2015 12:44 PM
> > To: Wu, Feng; xen-devel@lists.xen.org
> > Cc: JBeulich@suse.com; keir@xen.org; Tian, Kevin
> > Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event when
> vCPU
> > is running
> >
> > Wu, Feng wrote on 2015-03-27:
> > >
> > >
> > > Zhang, Yang Z wrote on 2015-03-25:
> > >> when vCPU is running
> > >>
> > >> Wu, Feng wrote on 2015-03-25:
> > >>> When a vCPU is running in Root mode and a notification event has
> > >>> been injected to it. we need to set VCPU_KICK_SOFTIRQ for the
> > >>> current cpu, so the pending interrupt in PIRR will be synced to
> > >>> vIRR before
> > > VM-Exit in time.
> > >>
> > >> Shouldn't the pending interrupt be synced unconditionally before next
> > >> vmentry? What happens if we didn't set the softirq?
> > >
> > > If we didn't set the softirq in the notification handler, the
> > > interrupts happened exactly before VM-entry cannot be delivered to
> > > guest at this time. Please see the following code fragments from
> > > xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments)
> > >
> > > .Lvmx_do_vmentry
> > >
> > > ......
> > > 		/* If Vt-d engine issues a notification event here,
> > >          * it cannot be delivered to guest during this VM-entry
> > >          * without raising the softirq in notification handler. */
> > >         cmp  %ecx,(%rdx,%rax,1)
> > >         jnz  .Lvmx_process_softirqs
> > > ......
> > >
> > >         je   .Lvmx_launch
> > > ......
> > >
> > >
> > > .Lvmx_process_softirqs:
> > >         sti
> > >         call do_softirq
> > >         jmp  .Lvmx_do_vmentry
> >
> > You are right! This helps me to recall why raise the softirq when delivering
> the
> > PI.
> 
> Yes, __vmx_deliver_posted_interrupt() is the software way to deliver PI, it sets
> the
> softirq for this purpose, however, when VT-d HW delivers PI, we have no
> control to
> the HW itself, hence we need to set this softirq in the Notification Event
> handler.
> 

Could you include this information in the comment so others can easily
understand this requirement? In the code you only mention that
VCPU_KICK_SOFTIRQ is required, but how it leads to the PIRR->VIRR sync is
not explained.
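
A toy model of the flow under discussion may help when writing that comment.
It is only a sketch: it assumes the PIR-to-vIRR sync runs on the entry path
before the softirq check, and every name in it (toy_vcpu, toy_vmentry, the
softirq_pending flag) is hypothetical rather than Xen code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy vCPU: hardware posts into pir, the guest only sees virr. */
struct toy_vcpu {
    uint32_t pir;
    uint32_t virr;
};

/* Stands in for VCPU_KICK_SOFTIRQ being pending on this pCPU. */
static bool softirq_pending;

/* Notification event handler: the PIR bit is already set by hardware. */
static void notification_handler(void)
{
    softirq_pending = true;
}

/* Sync step, assumed to run on the entry path before the softirq check. */
static void sync_pir_to_virr(struct toy_vcpu *v)
{
    v->virr |= v->pir;
    v->pir = 0;
}

/*
 * One trip through the modelled entry path. late_vector (0..31, or -1 for
 * none) is posted after the sync but before the softirq check, which is
 * the window discussed above. Returns the number of passes made.
 */
static int toy_vmentry(struct toy_vcpu *v, int late_vector, bool raise_softirq)
{
    int passes = 0;

    for ( ; ; )
    {
        passes++;
        sync_pir_to_virr(v);

        if ( late_vector >= 0 )
        {
            v->pir |= UINT32_C(1) << late_vector;
            if ( raise_softirq )
                notification_handler();
            late_vector = -1;
        }

        if ( !softirq_pending )
            break;               /* would go on to VMLAUNCH/VMRESUME */

        softirq_pending = false; /* models do_softirq(), then retry */
    }

    return passes;
}
```

With the softirq raised, the late vector forces a second pass and ends up in
virr before entry. Without it, the model enters the guest after one pass
with the vector still sitting in pir.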

Thanks
Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-04-02  5:52   ` Tian, Kevin
@ 2015-04-02  6:20     ` Wu, Feng
  2015-04-02  6:49       ` Tian, Kevin
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-04-02  6:20 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wu, Feng, keir, JBeulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, April 02, 2015 1:52 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> Subject: RE: [RFC v1 08/15] Update IRTE according to guest interrupt config
> changes
> 
> > From: Wu, Feng
> > Sent: Wednesday, March 25, 2015 8:32 PM
> >
> > When guest changes its interrupt configuration (such as, vector, etc.)
> > for direct-assigned devices, we need to update the associated IRTE
> > with the new guest vector, so external interrupts from the assigned
> > devices can be injected to guests without VM-Exit.
> >
> > For lowest-priority interrupts, we use vector-hashing mechanism to find
> > the destination vCPU. This follows the hardware behavior, since modern
> > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> >
> > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > still use interrupt remapping.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/drivers/passthrough/io.c | 77
> > +++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 76 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > index ae050df..1d9a132 100644
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -26,6 +26,7 @@
> >  #include <asm/hvm/iommu.h>
> >  #include <asm/hvm/support.h>
> >  #include <xen/hvm/irq.h>
> > +#include <asm/io_apic.h>
> >
> >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> >
> > @@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
> >      xfree(dpci);
> >  }
> >
> > +/*
> > + * Here we handle the following cases:
> > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > + *   guest vector using vector-hashing mechamisn and return true. This
> > follows
> > + *   the hardware behavior, since modern Intel CPUs use vector hashing to
> > + *   handle the lowest-priority interrupt.
> > + * - Otherwise, for single destination interrupt, it is straightforward to
> > + *   find the destination vCPU and return true.
> > + * - For multicase/broadcast vCPU, we cannot handle it via interrupt posting,
> > + *   so return false.
> > + */
> > +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> > +                                uint8_t dest_mode, uint8_t
> > deliver_mode,
> > +                                uint32_t gvec, struct vcpu
> > **dest_vcpu)
> > +{
> > +    struct vcpu *v, **dest_vcpu_array;
> > +    unsigned int dest_vcpu_num = 0;
> > +    int ret;
> > +
> > +    if ( deliver_mode == dest_LowestPrio )
> > +        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);
> > +
> > +    for_each_vcpu ( d, v )
> > +    {
> > +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> > +                                dest_id, dest_mode) )
> > +            continue;
> > +
> > +        dest_vcpu_num++;
> > +
> > +        if ( deliver_mode == dest_LowestPrio )
> > +            dest_vcpu_array[dest_vcpu_num] = v;
> > +        else
> > +            *dest_vcpu = v;
> > +    }
> > +
> > +    if ( deliver_mode == dest_LowestPrio )
> > +    {
> > +        if (  dest_vcpu_num != 0 )
> > +        {
> > +            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
> > +            ret = 1;
> > +        }
> > +        else
> > +            ret = 0;
> > +
> > +        xfree(dest_vcpu_array);
> > +        return ret;
> > +    }
> > +    else if (  dest_vcpu_num == 1 )
> > +        return 1;
> > +    else
> > +        return 0;
> > +}
> > +
> >  int pt_irq_create_bind(
> >      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
> >  {
> > @@ -257,7 +313,7 @@ int pt_irq_create_bind(
> >      {
> >      case PT_IRQ_TYPE_MSI:
> >      {
> > -        uint8_t dest, dest_mode;
> > +        uint8_t dest, dest_mode, deliver_mode;
> >          int dest_vcpu_id;
> >
> >          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> > @@ -330,11 +386,30 @@ int pt_irq_create_bind(
> >          /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> >          dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
> >          dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> > +        deliver_mode = (pirq_dpci->gmsi.gflags >>
> > GFLAGS_SHIFT_DELIV_MODE) &
> > +                        VMSI_DELIV_MASK;
> >          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> >          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> >          spin_unlock(&d->event_lock);
> >          if ( dest_vcpu_id >= 0 )
> >              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> > +
> > +        /* Use interrupt posting if it is supported */
> > +        if ( iommu_intpost )
> > +        {
> > +            struct vcpu *vcpu = NULL;
> > +
> > +            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
> > +                                    pirq_dpci->gmsi.gvec, &vcpu) )
> > +                break;
> > +
> 
> Is it possible this new pi_find_dest_vcpu will return a different target from
> earlier hvm_girq_des_2_vcpu_id? if yes it will cause tricky issues since
> earlier pirqs are migrated according to different policy. We need consolidate
> vcpu selection policies together to keep consistency.

In my understanding, what you describe above is the software way of delivering
interrupts to a vCPU. When posted interrupts are used, interrupts are delivered
by hardware according to the settings in the IRTE, hence those software paths
will not be touched for these interrupts. So do we need to care about how
software might migrate the interrupts here?

> 
> and why failure to find dest_vcpu doesn't lead to an error but a break?

We cannot post multicast/broadcast interrupts to a guest, and
pi_find_dest_vcpu() returns 0 when it encounters a multicast/broadcast
interrupt. In that case we still use the interrupt remapping mechanism for it.
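
The selection policy can be sketched in isolation as follows. This is a
standalone illustration, not the patch code: the enum, the function name and
the bool array of per-vCPU matches are hypothetical stand-ins for
dest_LowestPrio, pi_find_dest_vcpu() and the vlapic_match_dest() results. It
counts matches from zero, so gvec % n always lands on a matching vCPU, and it
needs no temporary array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum delivery { DELIVER_FIXED, DELIVER_LOWEST_PRIO };  /* hypothetical names */

/*
 * matches[i] says whether vCPU i matches the MSI destination.
 * Returns true and stores the chosen vCPU index when posting is possible,
 * false when the caller should stay with interrupt remapping.
 */
static bool pick_dest(const bool *matches, size_t nr_vcpus,
                      enum delivery mode, uint8_t gvec, size_t *dest)
{
    size_t i, nr_match = 0, first = 0, want, seen = 0;

    for ( i = 0; i < nr_vcpus; i++ )
    {
        if ( !matches[i] )
            continue;
        if ( nr_match == 0 )
            first = i;
        nr_match++;
    }

    if ( nr_match == 0 )
        return false;

    if ( mode != DELIVER_LOWEST_PRIO )
    {
        /* Single destination is postable; multicast/broadcast is not. */
        if ( nr_match != 1 )
            return false;
        *dest = first;
        return true;
    }

    /* Vector hashing: take the (gvec % nr_match)-th matching vCPU. */
    want = gvec % nr_match;
    for ( i = 0; i < nr_vcpus; i++ )
    {
        if ( !matches[i] )
            continue;
        if ( seen++ == want )
        {
            *dest = i;
            return true;
        }
    }

    return false; /* not reached: want < nr_match */
}
```

A false return maps to the "break" in pt_irq_create_bind(): the interrupt
simply stays in remapped format.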

Thanks,
Feng

> 
> > +            if ( pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec ) != 0 )
> > +            {
> > +                dprintk(XENLOG_G_INFO, "failed to update PI IRTE\n");
> > +                return -EBUSY;
> > +            }
> > +        }
> > +
> >          break;
> >      }
> >
> > --
> > 2.1.0

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 10/15] vmx: Define two per-cpu variants
  2015-04-02  5:54   ` Tian, Kevin
@ 2015-04-02  6:24     ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-04-02  6:24 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wu, Feng, keir, JBeulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, April 02, 2015 1:55 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> Subject: RE: [RFC v1 10/15] vmx: Define two per-cpu variants
> 
> > From: Wu, Feng
> > Sent: Wednesday, March 25, 2015 8:32 PM
> >
> > This patch defines two per-cpu variants:
> >
> > blocked_vcpu_on_cpu:
> > A list storing the vCPUs which were blocked on this pCPU.
> >
> > blocked_vcpu_on_cpu_lock:
> > The spinlock to protect blocked_vcpu_on_cpu.
> 
> since above two are already per-cpu variants, you don't need
> 'on_cpu' in the name to duplicate it. How about just call them
> "blocked_vcpus" and "blocked_vcpus_lock"? :-)

Sounds great!

Thanks,
Feng

> 
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vmcs.c       | 3 +++
> >  xen/arch/x86/hvm/vmx/vmx.c        | 7 +++++++
> >  xen/include/asm-x86/hvm/vmx/vmx.h | 3 +++
> >  3 files changed, 13 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index 942f4b7..1345e69 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -585,6 +585,9 @@ int vmx_cpu_up(void)
> >      if ( cpu_has_vmx_vpid )
> >          vpid_sync_all();
> >
> > +    INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> > +    spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +
> >      return 0;
> >  }
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index e1c55ce..ff5544d 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -81,6 +81,13 @@ static int vmx_msr_read_intercept(unsigned int msr,
> > uint64_t *msr_content);
> >  static int vmx_msr_write_intercept(unsigned int msr, uint64_t
> msr_content);
> >  static void vmx_invlpg_intercept(unsigned long vaddr);
> >
> > +/*
> > + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we
> > + * can find which vCPU should be waken up.
> > + */
> > +DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > +DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > +
> >  uint8_t __read_mostly posted_intr_vector;
> >
> >  static int vmx_domain_initialise(struct domain *d)
> > diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
> > b/xen/include/asm-x86/hvm/vmx/vmx.h
> > index 3cd75eb..e643c3c 100644
> > --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> > @@ -30,6 +30,9 @@
> >  #include <asm/hvm/vmx/vmcs.h>
> >  #include <asm/apic.h>
> >
> > +DECLARE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > +DECLARE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > +
> >  extern uint8_t posted_intr_vector;
> >
> >  typedef union {
> > --
> > 2.1.0

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling
  2015-03-25 12:31 ` [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling Feng Wu
  2015-03-26 20:16   ` Andrew Cooper
@ 2015-04-02  6:24   ` Tian, Kevin
  2015-04-02  8:39     ` Wu, Feng
  1 sibling, 1 reply; 101+ messages in thread
From: Tian, Kevin @ 2015-04-02  6:24 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Wednesday, March 25, 2015 8:32 PM
> 
> The basic idea here is:
> 1. When vCPU's state is RUNSTATE_running,
>         - set 'NV' to 'Notification Vector'.
>         - Clear 'SN' to accept PI.
>         - set 'NDST' to the right pCPU.
> 2. When vCPU's state is RUNSTATE_blocked,
>         - set 'NV' to 'Wake-up Vector', so we can wake up the
>           related vCPU when posted-interrupt happens for it.
>         - Clear 'SN' to accept PI.
> 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
>         - Set 'SN' to suppress non-urgent interrupts.
>           (Current, we only support non-urgent interrupts)
>         - Set 'NV' back to 'Notification Vector' if needed.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 108
> +++++++++++++++++++++++++++++++++++++++++++++
>  xen/common/schedule.c      |   3 ++
>  2 files changed, 111 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index b30392c..6323bd6 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1710,6 +1710,113 @@ static void vmx_handle_eoi(u8 vector)
>      __vmwrite(GUEST_INTR_STATUS, status);
>  }
> 
> +static void vmx_pi_desc_update(struct vcpu *v, int new_state)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    struct pi_desc old, new;
> +    int old_state = v->runstate.state;
> +    unsigned long flags;
> +
> +    if ( !iommu_intpost )
> +        return;
> +
> +    switch ( new_state )
> +    {
> +    case RUNSTATE_runnable:
> +    case RUNSTATE_offline:
> +        /*
> +         * We don't need to send notification event to a non-running
> +         * vcpu, the interrupt information will be delivered to it before
> +         * VM-ENTRY when the vcpu is scheduled to run next time.
> +         */
> +        pi_set_sn(pi_desc);
> +
> +        /*
> +         * If the state is transferred from RUNSTATE_blocked,
> +         * we should set 'NV' feild back to posted_intr_vector,
> +         * so the Posted-Interrupts can be delivered to the vCPU
> +         * by VT-d HW after it is scheduled to run.
> +         */
> +        if ( old_state == RUNSTATE_blocked )
> +        {
> +            do
> +            {
> +                old.control = new.control = pi_desc->control;
> +                new.nv = posted_intr_vector;
> +            }
> +            while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                    != old.control );
> +
> +           /*
> +            * Delete the vCPU from the related wakeup queue
> +            * if we are resuming from blocked state
> +            */
> +           spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                             v->processor), flags);
> +           list_del(&v->blocked_vcpu_list);
> +           spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                                  v->processor), flags);
> +        }
> +        break;
> +
> +    case RUNSTATE_blocked:
> +        /*
> +         * The vCPU is blocked on the wait queue.
> +         * Store the blocked vCPU on the list of the
> +         * vcpu->wakeup_cpu, which is the destination
> +         * of the wake-up notification event.
> +         */
> +        spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                          v->processor), flags);
> +        list_add_tail(&v->blocked_vcpu_list,
> +                      &per_cpu(blocked_vcpu_on_cpu, v->processor));
> +        spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                               v->processor), flags);
> +
> +        do
> +        {
> +            old.control = new.control = pi_desc->control;
> +
> +            /*
> +             * We should not block the vCPU if
> +             * an interrupt is posted for it.
> +             */
> +
> +            if ( pi_test_on(&old) == 1 )
> +            {
> +                tasklet_schedule(&v->vcpu_wakeup_tasklet);
> +                return;
> +            }

So you also need to remove the vCPU from the blocked list here, right?

And how do you handle ON being set after the above check? It looks like this
is better handled after the cmpxchg loop...
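
One way to reason about that race, as a standalone sketch with C11 atomics
instead of Xen's cmpxchg(): if the ON test and the NV/SN update are part of
the same compare-and-exchange, an ON that gets set in between makes the
exchange fail and the test is redone. The bit positions and the function
name below are assumptions for this sketch only:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 64-bit control word, for this sketch only. */
#define PI_ON        (UINT64_C(1) << 0)
#define PI_SN        (UINT64_C(1) << 1)
#define PI_NV_SHIFT  16
#define PI_NV_MASK   (UINT64_C(0xff) << PI_NV_SHIFT)

/*
 * Try to arm the descriptor for blocking: clear SN and switch NV to the
 * wakeup vector. Returns false, leaving the descriptor untouched, if ON
 * is already set, in which case the caller should not block (and has to
 * take the vCPU off the blocked list again).
 */
static bool pi_arm_for_block(_Atomic uint64_t *control, uint8_t wakeup_vec)
{
    uint64_t old = atomic_load(control), upd;

    do {
        if ( old & PI_ON )
            return false;
        upd = old & ~(PI_SN | PI_NV_MASK);
        upd |= (uint64_t)wakeup_vec << PI_NV_SHIFT;
        /* On failure, old is refreshed with the current value and we retry. */
    } while ( !atomic_compare_exchange_weak(control, &old, upd) );

    return true;
}
```

Once the exchange has succeeded, any later posting uses the wakeup vector,
so it is seen by the wakeup handler rather than lost.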

> +
> +            pi_clear_sn(&new);
> +            new.nv = pi_wakeup_vector;
> +        }
> +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                != old.control );
> +        break;
> +
> +    case RUNSTATE_running:
> +        ASSERT( pi_test_sn(pi_desc) == 1 );
> +
> +        do
> +        {
> +            old.control = new.control = pi_desc->control;
> +            if ( x2apic_enabled )
> +                new.ndst = cpu_physical_id(v->processor);
> +            else
> +                new.ndst = (cpu_physical_id(v->processor) << 8) &
> 0xFF00;
> +
> +            pi_clear_sn(&new);
> +        }
> +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                != old.control );
> +        break;
> +
> +    default:
> +        break;
> +    }
> +}
> +
>  void vmx_hypervisor_cpuid_leaf(uint32_t sub_idx,
>                                 uint32_t *eax, uint32_t *ebx,
>                                 uint32_t *ecx, uint32_t *edx)
> @@ -1795,6 +1902,7 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
>      .process_isr          = vmx_process_isr,
>      .deliver_posted_intr  = vmx_deliver_posted_intr,
>      .sync_pir_to_irr      = vmx_sync_pir_to_irr,
> +    .pi_desc_update       = vmx_pi_desc_update,
>      .handle_eoi           = vmx_handle_eoi,
>      .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
>      .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index ef79847..acf3186 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -157,6 +157,9 @@ static inline void vcpu_runstate_change(
>          v->runstate.state_entry_time = new_entry_time;
>      }
> 
> +    if ( is_hvm_vcpu(v) && hvm_funcs.pi_desc_update )
> +        hvm_funcs.pi_desc_update(v, new_state);
> +
>      v->runstate.state = new_state;
>  }
> 
> --
> 2.1.0

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 06/15] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  2015-03-27  9:58       ` Jan Beulich
@ 2015-04-02  6:32         ` Tian, Kevin
  0 siblings, 0 replies; 101+ messages in thread
From: Tian, Kevin @ 2015-04-02  6:32 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Wu, Feng; +Cc: Zhang, Yang Z, keir, xen-devel

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, March 27, 2015 5:58 PM
> 
> >>> On 27.03.15 at 02:53, <feng.wu@intel.com> wrote:
> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> Sent: Friday, March 27, 2015 3:01 AM
> >> On 25/03/15 12:31, Feng Wu wrote:
> >> > --- a/xen/drivers/passthrough/vtd/iommu.h
> >> > +++ b/xen/drivers/passthrough/vtd/iommu.h
> >> > @@ -303,6 +303,18 @@ struct iremap_entry {
> >> >               res_2   : 8,
> >> >               dst     : 32;
> >> >       }lo;
> >> > +    struct {
> >> > +        u64 p       : 1,
> >> > +            fpd     : 1,
> >> > +            res_1   : 6,
> >> > +            avail   : 4,
> >> > +            res_2   : 2,
> >> > +            urg     : 1,
> >> > +            im      : 1,
> >> > +            vector  : 8,
> >> > +            res_3   : 14,
> >> > +            pda_l   : 26;
> >> > +    }lo_intpost;
> >> >     };
> >> >     union {
> >> >       u64 hi_val;
> >> > @@ -312,6 +324,13 @@ struct iremap_entry {
> >> >               svt     : 2,
> >> >               res_1   : 44;
> >> >       }hi;
> >> > +    struct {
> >> > +        u64 sid     : 16,
> >> > +            sq      : 2,
> >> > +            svt     : 2,
> >> > +            res_1   : 12,
> >> > +            pda_h   : 32;
> >> > +    }hi_intpost;
> >>
> >> I would prefer if this union was reformatted as I suggested in the
> >> thread from your design doc, but I won't insist on it as a blocker to entry.
> >
> > Thanks for the comments. I also considered your suggestion on the Design
> > doc, here is your proposal:
> >
> > struct iremap_entry {
> >     union {
> >         struct { u64 lo, hi; };
> >         struct { <bitfields> } norm; (names subject to improvement)
> >         struct { <bitfields> } post;
> >     };
> > };
> >
> > Seems in that way, we need to change some existing code to adapt to
> > this new structure. I am okay with both of them, but can we listen
> > some voices from some others? Is it okay for you?
> 
> I think this would be a good move, but in the end it's the VT-d
> maintainers who got to decide.
> 

Yes, it's a good move.
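
For reference, the proposed layout could look roughly like the sketch below.
The "post" bitfields are copied from the hunks quoted above. The "norm"
bitfields are an assumption (only the tail of each is visible in the quote),
and the type name is hypothetical. It relies on C11 anonymous structs/unions
and on 64-bit bitfields as supported by GCC and Clang:

```c
#include <assert.h>
#include <stdint.h>

struct iremap_entry_sketch {
    union {
        /* Raw view, replacing lo_val/hi_val. */
        struct { uint64_t lo, hi; };

        /* Remapped format (field list assumed, see the note above). */
        struct {
            uint64_t p:1, fpd:1, dm:1, rh:1, tm:1, dlm:3, avail:4,
                     res_1:4, vector:8, res_2:8, dst:32;
            uint64_t sid:16, sq:2, svt:2, res_3:44;
        } norm;

        /* Posted format, as in the lo_intpost/hi_intpost hunks. */
        struct {
            uint64_t p:1, fpd:1, res_1:6, avail:4, res_2:2, urg:1, im:1,
                     vector:8, res_3:14, pda_l:26;
            uint64_t sid:16, sq:2, svt:2, res_4:12, pda_h:32;
        } post;
    };
};

/* All three views must overlay the same 128-bit entry. */
_Static_assert(sizeof(struct iremap_entry_sketch) == 16,
               "IRTE must stay 128 bits");
```

Existing users of lo_val/hi_val and lo.xxx/hi.xxx would have to be converted
to the new names, which is the code churn mentioned above.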

Thanks
Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-04-02  6:20     ` Wu, Feng
@ 2015-04-02  6:49       ` Tian, Kevin
  2015-04-02  8:02         ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Tian, Kevin @ 2015-04-02  6:49 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Thursday, April 02, 2015 2:21 PM
> 
> 
> 
> > -----Original Message-----
> > From: Tian, Kevin
> > Sent: Thursday, April 02, 2015 1:52 PM
> > To: Wu, Feng; xen-devel@lists.xen.org
> > Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> > Subject: RE: [RFC v1 08/15] Update IRTE according to guest interrupt config
> > changes
> >
> > > From: Wu, Feng
> > > Sent: Wednesday, March 25, 2015 8:32 PM
> > >
> > > When guest changes its interrupt configuration (such as, vector, etc.)
> > > for direct-assigned devices, we need to update the associated IRTE
> > > with the new guest vector, so external interrupts from the assigned
> > > devices can be injected to guests without VM-Exit.
> > >
> > > For lowest-priority interrupts, we use vector-hashing mechanism to find
> > > the destination vCPU. This follows the hardware behavior, since modern
> > > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> > >
> > > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > > still use interrupt remapping.
> > >
> > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > ---
> > >  xen/drivers/passthrough/io.c | 77
> > > +++++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 76 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > > index ae050df..1d9a132 100644
> > > --- a/xen/drivers/passthrough/io.c
> > > +++ b/xen/drivers/passthrough/io.c
> > > @@ -26,6 +26,7 @@
> > >  #include <asm/hvm/iommu.h>
> > >  #include <asm/hvm/support.h>
> > >  #include <xen/hvm/irq.h>
> > > +#include <asm/io_apic.h>
> > >
> > >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> > >
> > > @@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci
> *dpci)
> > >      xfree(dpci);
> > >  }
> > >
> > > +/*
> > > + * Here we handle the following cases:
> > > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > > + *   guest vector using vector-hashing mechamisn and return true. This
> > > follows
> > > + *   the hardware behavior, since modern Intel CPUs use vector hashing
> to
> > > + *   handle the lowest-priority interrupt.
> > > + * - Otherwise, for single destination interrupt, it is straightforward to
> > > + *   find the destination vCPU and return true.
> > > + * - For multicase/broadcast vCPU, we cannot handle it via interrupt
> posting,
> > > + *   so return false.
> > > + */
> > > +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> > > +                                uint8_t dest_mode, uint8_t
> > > deliver_mode,
> > > +                                uint32_t gvec, struct vcpu
> > > **dest_vcpu)
> > > +{
> > > +    struct vcpu *v, **dest_vcpu_array;
> > > +    unsigned int dest_vcpu_num = 0;
> > > +    int ret;
> > > +
> > > +    if ( deliver_mode == dest_LowestPrio )
> > > +        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);
> > > +
> > > +    for_each_vcpu ( d, v )
> > > +    {
> > > +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> > > +                                dest_id, dest_mode) )
> > > +            continue;
> > > +
> > > +        dest_vcpu_num++;
> > > +
> > > +        if ( deliver_mode == dest_LowestPrio )
> > > +            dest_vcpu_array[dest_vcpu_num] = v;
> > > +        else
> > > +            *dest_vcpu = v;
> > > +    }
> > > +
> > > +    if ( deliver_mode == dest_LowestPrio )
> > > +    {
> > > +        if (  dest_vcpu_num != 0 )
> > > +        {
> > > +            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
> > > +            ret = 1;
> > > +        }
> > > +        else
> > > +            ret = 0;
> > > +
> > > +        xfree(dest_vcpu_array);
> > > +        return ret;
> > > +    }
> > > +    else if (  dest_vcpu_num == 1 )
> > > +        return 1;
> > > +    else
> > > +        return 0;
> > > +}
> > > +
> > >  int pt_irq_create_bind(
> > >      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
> > >  {
> > > @@ -257,7 +313,7 @@ int pt_irq_create_bind(
> > >      {
> > >      case PT_IRQ_TYPE_MSI:
> > >      {
> > > -        uint8_t dest, dest_mode;
> > > +        uint8_t dest, dest_mode, deliver_mode;
> > >          int dest_vcpu_id;
> > >
> > >          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> > > @@ -330,11 +386,30 @@ int pt_irq_create_bind(
> > >          /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> > >          dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
> > >          dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> > > +        deliver_mode = (pirq_dpci->gmsi.gflags >>
> > > GFLAGS_SHIFT_DELIV_MODE) &
> > > +                        VMSI_DELIV_MASK;
> > >          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> > >          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> > >          spin_unlock(&d->event_lock);
> > >          if ( dest_vcpu_id >= 0 )
> > >              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> > > +
> > > +        /* Use interrupt posting if it is supported */
> > > +        if ( iommu_intpost )
> > > +        {
> > > +            struct vcpu *vcpu = NULL;
> > > +
> > > +            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
> > > +                                    pirq_dpci->gmsi.gvec,
> &vcpu) )
> > > +                break;
> > > +
> >
> > Is it possible this new pi_find_dest_vcpu will return a different target from
> > earlier hvm_girq_des_2_vcpu_id? if yes it will cause tricky issues since
> > earlier pirqs are migrated according to different policy. We need consolidate
> > vcpu selection policies together to keep consistency.
> 
> In my understanding, what you described above is the software way to deliver
> the interrupts to vCPU, when posted-interrupt is used, interrupts are delivered
> by hardware according to the settings in IRTE, hence those software path will
> not get touched for these interrupts. So do we need to care about how
> software
> might migrate the interrupts here?

Just curious why we can't use one policy for vCPU selection. If multicast
handling is the difference, you could pass intpost as a parameter and use
the same function.

> 
> >
> > and why failure to find dest_vcpu doesn't lead to an error but a break?
> 
> We cannot post multicast/broadcast interrupts to a guest, and
> pi_find_dest_vcpu() returns 0 when encountering a multicast/broadcast
> interrupt, in that case, we still use interrupt remapping mechanism for it.

Then you might handle interrupt posting first, and fall back to the software
path if the interrupt is multicast or intpost is not supported.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
  2015-04-02  6:00   ` Tian, Kevin
@ 2015-04-02  7:18     ` Wu, Feng
  2015-04-08  9:02       ` Tian, Kevin
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-04-02  7:18 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wu, Feng, keir, JBeulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, April 02, 2015 2:01 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> Subject: RE: [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d
> Posted-Interrupts
> 
> > From: Wu, Feng
> > Sent: Wednesday, March 25, 2015 8:32 PM
> >
> > This patch adds a global vector which is used to wake up
> > the blocked vCPU when an interrupt is being posted to it.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vmx.c        | 33
> > +++++++++++++++++++++++++++++++++
> >  xen/include/asm-x86/hvm/hvm.h     |  1 +
> >  xen/include/asm-x86/hvm/vmx/vmx.h |  3 +++
> >  xen/include/xen/sched.h           |  2 ++
> >  4 files changed, 39 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index ff5544d..b2b4c26 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -89,6 +89,7 @@ DEFINE_PER_CPU(struct list_head,
> > blocked_vcpu_on_cpu);
> >  DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> >
> >  uint8_t __read_mostly posted_intr_vector;
> > +uint8_t __read_mostly pi_wakeup_vector;
> >
> >  static int vmx_domain_initialise(struct domain *d)
> >  {
> > @@ -131,6 +132,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >      if ( v->vcpu_id == 0 )
> >          v->arch.user_regs.eax = 1;
> >
> > +    INIT_LIST_HEAD(&v->blocked_vcpu_list);
> > +
> >      return 0;
> >  }
> >
> > @@ -1834,11 +1837,19 @@ const struct hvm_function_table * __init
> > start_vmx(void)
> >      }
> >
> >      if ( cpu_has_vmx_posted_intr_processing )
> > +    {
> >          alloc_direct_apic_vector(&posted_intr_vector,
> > event_check_interrupt);
> > +
> > +        if ( iommu_intpost )
> > +            alloc_direct_apic_vector(&pi_wakeup_vector,
> > pi_wakeup_interrupt);
> > +        else
> > +            vmx_function_table.pi_desc_update = NULL;
> > +    }
> 
> just style issue. Above conditional logic looks not intuitive to me.
> usually we have:
> 	if ( iommu_intpost )
> 		vmx_function_table.pi_desc_update = func;
> 	else
> 		vmx_function_table.pi_desc_update = NULL;
> 
> suppose you will register callback in later patch. then better to
> move the NULL one there too. Putting it here doesn't meet the
> normal if...else implications. :-)

Your suggestion is good. Here is my thinking about this code fragment:

This is the place where the notification event handler is registered, so it is
better to register the wakeup event handler for VT-d PI here as well. Like other
members of vmx_function_table, such as deliver_posted_intr and sync_pir_to_irr,
pi_desc_update is statically initialized to 'vmx_pi_desc_update' in the
definition of vmx_function_table. Do you have any ideas for doing this more
gracefully?

Thanks,
Feng

> 
> >      else
> >      {
> >          vmx_function_table.deliver_posted_intr = NULL;
> >          vmx_function_table.sync_pir_to_irr = NULL;
> > +        vmx_function_table.pi_desc_update = NULL;
> >      }
> >
> >      if ( cpu_has_vmx_ept
> > @@ -3255,6 +3266,28 @@ void vmx_vmenter_helper(const struct
> > cpu_user_regs *regs)
> >  }
> >
> >  /*
> > + * Handle VT-d posted-interrupt when VCPU is blocked.
> > + */
> > +void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > +{
> > +    struct vcpu *v;
> > +    int cpu = smp_processor_id();
> > +
> > +    spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +    list_for_each_entry(v, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > +                    blocked_vcpu_list) {
> > +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +        if ( pi_test_on(pi_desc) == 1 )
> > +            tasklet_schedule(&v->vcpu_wakeup_tasklet);
> 
> why can't we directly call vcpu_unblock here?

Please see the following scenario if we call vcpu_unblock() directly here:

pi_wakeup_interrupt() (blocked_vcpu_on_cpu_lock is held) --> vcpu_unblock() -->
vcpu_wake() --> vcpu_runstate_change() --> vmx_pi_desc_update() (in this function we
may need to acquire blocked_vcpu_on_cpu_lock again, which would cause a deadlock.)
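
A minimal sketch of this ordering problem, assuming a toy non-recursive lock
in place of blocked_vcpu_on_cpu_lock. Every name below (toy_lock,
runstate_hook, wakeup_direct, wakeup_deferred) is hypothetical and not part of
the patch; the toy lock counts a re-acquisition instead of spinning forever so
the two orderings can be compared:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: counts a self-deadlock instead of hanging. */
struct toy_lock {
    bool held;
    unsigned int self_deadlocks;
};

/* Returns true if the lock was taken, false if it was already held. */
static bool toy_lock_acquire(struct toy_lock *l)
{
    if ( l->held )
    {
        l->self_deadlocks++;   /* a real spinlock would spin here forever */
        return false;
    }
    l->held = true;
    return true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Stand-in for the runstate-change hook, which takes the list lock itself. */
static void runstate_hook(struct toy_lock *l)
{
    if ( toy_lock_acquire(l) )
        toy_lock_release(l);
}

/* Direct wakeup: the hook runs while the handler still holds the lock. */
static void wakeup_direct(struct toy_lock *l)
{
    toy_lock_acquire(l);
    runstate_hook(l);          /* vcpu_unblock() -> ... -> hook */
    toy_lock_release(l);
}

/* Deferred wakeup: only record the work under the lock, run it afterwards. */
static void wakeup_deferred(struct toy_lock *l)
{
    bool pending;

    toy_lock_acquire(l);
    pending = true;            /* plays the role of tasklet_schedule() */
    toy_lock_release(l);

    if ( pending )
        runstate_hook(l);      /* plays the role of the tasklet body */
}
```

In this model wakeup_direct() hits the re-acquisition once, while
wakeup_deferred() never does, which is the reason for going through the
tasklet.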

Thanks,
Feng


> 
> > +    }
> > +    spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +
> > +    ack_APIC_irq();
> > +    this_cpu(irq_count)++;
> > +}
> > +
> > +/*
> >   * Local variables:
> >   * mode: C
> >   * c-file-style: "BSD"
> > diff --git a/xen/include/asm-x86/hvm/hvm.h
> > b/xen/include/asm-x86/hvm/hvm.h
> > index 0dc909b..a11a256 100644
> > --- a/xen/include/asm-x86/hvm/hvm.h
> > +++ b/xen/include/asm-x86/hvm/hvm.h
> > @@ -195,6 +195,7 @@ struct hvm_function_table {
> >      void (*deliver_posted_intr)(struct vcpu *v, u8 vector);
> >      void (*sync_pir_to_irr)(struct vcpu *v);
> >      void (*handle_eoi)(u8 vector);
> > +    void (*pi_desc_update)(struct vcpu *v, int new_state);
> >
> >      /*Walk nested p2m  */
> >      int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa,
> > diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
> > b/xen/include/asm-x86/hvm/vmx/vmx.h
> > index e643c3c..f4296ab 100644
> > --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> > @@ -34,6 +34,7 @@ DECLARE_PER_CPU(struct list_head,
> > blocked_vcpu_on_cpu);
> >  DECLARE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> >
> >  extern uint8_t posted_intr_vector;
> > +extern uint8_t pi_wakeup_vector;
> >
> >  typedef union {
> >      struct {
> > @@ -574,6 +575,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
> >  void free_p2m_hap_data(struct p2m_domain *p2m);
> >  void p2m_init_hap_data(struct p2m_domain *p2m);
> >
> > +void pi_wakeup_interrupt(struct cpu_user_regs *regs);
> > +
> >  /* EPT violation qualifications definitions */
> >  #define _EPT_READ_VIOLATION         0
> >  #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
> > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> > index c874dd4..91f0912 100644
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -148,6 +148,8 @@ struct vcpu
> >
> >      struct vcpu     *next_in_list;
> >
> > +    struct list_head blocked_vcpu_list;
> > +
> >      s_time_t         periodic_period;
> >      s_time_t         periodic_last_event;
> >      struct timer     periodic_timer;
> > --
> > 2.1.0

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
  2015-04-02  5:53   ` Tian, Kevin
@ 2015-04-02  7:20     ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-04-02  7:20 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wu, Feng, keir, JBeulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, April 02, 2015 1:53 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> Subject: RE: [RFC v1 09/15] Add a new per-vCPU tasklet to wakeup the blocked
> vCPU
> 
> > From: Wu, Feng
> > Sent: Wednesday, March 25, 2015 8:32 PM
> >
> > This patch adds a new per-vCPU tasklet to wakeup the blocked
> > vCPU. It can be used in the case vcpu_unblock cannot be called
> > directly.
> 
> could you elaborate under which scenario vcpu_unblock can't
> be called directly?

Please see the reply to "[RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts".
I will elaborate on it a bit more in the next post. Thanks!

Thanks,
Feng

> 
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/common/domain.c     | 11 +++++++++++
> >  xen/include/xen/sched.h |  3 +++
> >  2 files changed, 14 insertions(+)
> >
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index aa78fd7..fe89658 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -109,6 +109,13 @@ static void vcpu_check_shutdown(struct vcpu *v)
> >      spin_unlock(&d->shutdown_lock);
> >  }
> >
> > +static void vcpu_wakeup_tasklet_handler(unsigned long arg)
> > +{
> > +    struct vcpu *v = (void *)arg;
> > +
> > +    vcpu_unblock(v);
> > +}
> > +
> >  struct vcpu *alloc_vcpu(
> >      struct domain *d, unsigned int vcpu_id, unsigned int cpu_id)
> >  {
> > @@ -126,6 +133,9 @@ struct vcpu *alloc_vcpu(
> >
> >      tasklet_init(&v->continue_hypercall_tasklet, NULL, 0);
> >
> > +    tasklet_init(&v->vcpu_wakeup_tasklet, vcpu_wakeup_tasklet_handler,
> > +                 (unsigned long)v);
> > +
> >      if ( !zalloc_cpumask_var(&v->cpu_hard_affinity) ||
> >           !zalloc_cpumask_var(&v->cpu_hard_affinity_tmp) ||
> >           !zalloc_cpumask_var(&v->cpu_hard_affinity_saved) ||
> > @@ -784,6 +794,7 @@ static void complete_domain_destroy(struct
> rcu_head
> > *head)
> >          if ( (v = d->vcpu[i]) == NULL )
> >              continue;
> >          tasklet_kill(&v->continue_hypercall_tasklet);
> > +        tasklet_kill(&v->vcpu_wakeup_tasklet);
> >          vcpu_destroy(v);
> >          sched_destroy_vcpu(v);
> >          destroy_waitqueue_vcpu(v);
> > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> > index ccd7ed8..c874dd4 100644
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -239,6 +239,9 @@ struct vcpu
> >      /* Tasklet for continue_hypercall_on_cpu(). */
> >      struct tasklet   continue_hypercall_tasklet;
> >
> > +    /* Tasklet for wakeup_blocked_vcpu(). */
> > +    struct tasklet   vcpu_wakeup_tasklet;
> > +
> >      /* Multicall information. */
> >      struct mc_state  mc_state;
> >
> > --
> > 2.1.0

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-04-02  6:08           ` Tian, Kevin
@ 2015-04-02  7:21             ` Wu, Feng
  2015-04-02 19:15             ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-04-02  7:21 UTC (permalink / raw)
  To: Tian, Kevin, Zhang, Yang Z, xen-devel; +Cc: Wu, Feng, keir, JBeulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, April 02, 2015 2:08 PM
> To: Wu, Feng; Zhang, Yang Z; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org
> Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event when vCPU
> is running
> 
> > From: Wu, Feng
> > Sent: Friday, March 27, 2015 12:58 PM
> >
> >
> >
> > > -----Original Message-----
> > > From: Zhang, Yang Z
> > > Sent: Friday, March 27, 2015 12:44 PM
> > > To: Wu, Feng; xen-devel@lists.xen.org
> > > Cc: JBeulich@suse.com; keir@xen.org; Tian, Kevin
> > > Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event when
> > vCPU
> > > is running
> > >
> > > Wu, Feng wrote on 2015-03-27:
> > > >
> > > >
> > > > Zhang, Yang Z wrote on 2015-03-25:
> > > >> when vCPU is running
> > > >>
> > > >> Wu, Feng wrote on 2015-03-25:
> > > >>> When a vCPU is running in Root mode and a notification event has
> > > >>> been injected to it. we need to set VCPU_KICK_SOFTIRQ for the
> > > >>> current cpu, so the pending interrupt in PIRR will be synced to
> > > >>> vIRR before
> > > > VM-Exit in time.
> > > >>
> > > >> Shouldn't the pending interrupt be synced unconditionally before next
> > > >> vmentry? What happens if we didn't set the softirq?
> > > >
> > > > If we didn't set the softirq in the notification handler, the
> > > > interrupts happened exactly before VM-entry cannot be delivered to
> > > > guest at this time. Please see the following code fragments from
> > > > xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments)
> > > >
> > > > .Lvmx_do_vmentry
> > > >
> > > > ......
> > > > 		/* If Vt-d engine issues a notification event here,
> > > >          * it cannot be delivered to guest during this VM-entry
> > > >          * without raising the softirq in notification handler. */
> > > >         cmp  %ecx,(%rdx,%rax,1)
> > > >         jnz  .Lvmx_process_softirqs
> > > > ......
> > > >
> > > >         je   .Lvmx_launch
> > > > ......
> > > >
> > > >
> > > > .Lvmx_process_softirqs:
> > > >         sti
> > > >         call do_softirq
> > > >         jmp  .Lvmx_do_vmentry
> > >
> > > You are right! This helps me to recall why raise the softirq when delivering
> > the
> > > PI.
> >
> > Yes, __vmx_deliver_posted_interrupt() is the software way to deliver PI, it
> sets
> > the
> > softirq for this purpose, however, when VT-d HW delivers PI, we have no
> > control to
> > the HW itself, hence we need to set this softirq in the Notification Event
> > handler.
> >
> 
> could you include this information in the comment so others can easily
> understand this requirement? from code you only mentioned VCPU_KICK
> _SOFTIRQ is required, but how it leads to PIRR->VIRR sync is not explained.
> 

No problem!

Thanks,
Feng

> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-04-02  6:49       ` Tian, Kevin
@ 2015-04-02  8:02         ` Wu, Feng
  2015-04-03  8:29           ` Tian, Kevin
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-04-02  8:02 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wu, Feng, keir, JBeulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, April 02, 2015 2:50 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> Subject: RE: [RFC v1 08/15] Update IRTE according to guest interrupt config
> changes
> 
> > From: Wu, Feng
> > Sent: Thursday, April 02, 2015 2:21 PM
> >
> >
> >
> > > -----Original Message-----
> > > From: Tian, Kevin
> > > Sent: Thursday, April 02, 2015 1:52 PM
> > > To: Wu, Feng; xen-devel@lists.xen.org
> > > Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> > > Subject: RE: [RFC v1 08/15] Update IRTE according to guest interrupt config
> > > changes
> > >
> > > > From: Wu, Feng
> > > > Sent: Wednesday, March 25, 2015 8:32 PM
> > > >
> > > > When guest changes its interrupt configuration (such as, vector, etc.)
> > > > for direct-assigned devices, we need to update the associated IRTE
> > > > with the new guest vector, so external interrupts from the assigned
> > > > devices can be injected to guests without VM-Exit.
> > > >
> > > > For lowest-priority interrupts, we use vector-hashing mechamisn to find
> > > > the destination vCPU. This follows the hardware behavior, since modern
> > > > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> > > >
> > > > For multicase/broadcast vCPU, we cannot handle it via interrupt posting,
> > > > still use interrupt remapping.
> > > >
> > > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > > ---
> > > >  xen/drivers/passthrough/io.c | 77
> > > > +++++++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 76 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > > > index ae050df..1d9a132 100644
> > > > --- a/xen/drivers/passthrough/io.c
> > > > +++ b/xen/drivers/passthrough/io.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include <asm/hvm/iommu.h>
> > > >  #include <asm/hvm/support.h>
> > > >  #include <xen/hvm/irq.h>
> > > > +#include <asm/io_apic.h>
> > > >
> > > >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> > > >
> > > > @@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci
> > *dpci)
> > > >      xfree(dpci);
> > > >  }
> > > >
> > > > +/*
> > > > + * Here we handle the following cases:
> > > > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > > > + *   guest vector using vector-hashing mechamisn and return true. This
> > > > follows
> > > > + *   the hardware behavior, since modern Intel CPUs use vector
> hashing
> > to
> > > > + *   handle the lowest-priority interrupt.
> > > > + * - Otherwise, for single destination interrupt, it is straightforward to
> > > > + *   find the destination vCPU and return true.
> > > > + * - For multicase/broadcast vCPU, we cannot handle it via interrupt
> > posting,
> > > > + *   so return false.
> > > > + */
> > > > +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> > > > +                                uint8_t dest_mode, uint8_t
> > > > deliver_mode,
> > > > +                                uint32_t gvec, struct vcpu
> > > > **dest_vcpu)
> > > > +{
> > > > +    struct vcpu *v, **dest_vcpu_array;
> > > > +    unsigned int dest_vcpu_num = 0;
> > > > +    int ret;
> > > > +
> > > > +    if ( deliver_mode == dest_LowestPrio )
> > > > +        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);
> > > > +
> > > > +    for_each_vcpu ( d, v )
> > > > +    {
> > > > +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> > > > +                                dest_id, dest_mode) )
> > > > +            continue;
> > > > +
> > > > +        dest_vcpu_num++;
> > > > +
> > > > +        if ( deliver_mode == dest_LowestPrio )
> > > > +            dest_vcpu_array[dest_vcpu_num] = v;
> > > > +        else
> > > > +            *dest_vcpu = v;
> > > > +    }
> > > > +
> > > > +    if ( deliver_mode == dest_LowestPrio )
> > > > +    {
> > > > +        if (  dest_vcpu_num != 0 )
> > > > +        {
> > > > +            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
> > > > +            ret = 1;
> > > > +        }
> > > > +        else
> > > > +            ret = 0;
> > > > +
> > > > +        xfree(dest_vcpu_array);
> > > > +        return ret;
> > > > +    }
> > > > +    else if (  dest_vcpu_num == 1 )
> > > > +        return 1;
> > > > +    else
> > > > +        return 0;
> > > > +}
> > > > +
> > > >  int pt_irq_create_bind(
> > > >      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
> > > >  {
> > > > @@ -257,7 +313,7 @@ int pt_irq_create_bind(
> > > >      {
> > > >      case PT_IRQ_TYPE_MSI:
> > > >      {
> > > > -        uint8_t dest, dest_mode;
> > > > +        uint8_t dest, dest_mode, deliver_mode;
> > > >          int dest_vcpu_id;
> > > >
> > > >          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> > > > @@ -330,11 +386,30 @@ int pt_irq_create_bind(
> > > >          /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> > > >          dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
> > > >          dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> > > > +        deliver_mode = (pirq_dpci->gmsi.gflags >>
> > > > GFLAGS_SHIFT_DELIV_MODE) &
> > > > +                        VMSI_DELIV_MASK;
> > > >          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest,
> dest_mode);
> > > >          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> > > >          spin_unlock(&d->event_lock);
> > > >          if ( dest_vcpu_id >= 0 )
> > > >              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> > > > +
> > > > +        /* Use interrupt posting if it is supported */
> > > > +        if ( iommu_intpost )
> > > > +        {
> > > > +            struct vcpu *vcpu = NULL;
> > > > +
> > > > +            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
> > > > +                                    pirq_dpci->gmsi.gvec,
> > &vcpu) )
> > > > +                break;
> > > > +
> > >
> > > Is it possible this new pi_find_dest_vcpu will return a different target from
> > > earlier hvm_girq_des_2_vcpu_id? if yes it will cause tricky issues since
> > > earlier pirqs are migrated according to different policy. We need consolidate
> > > vcpu selection policies together to keep consistency.
> >
> > In my understanding, what you described above is the software way to
> deliver
> > the interrupts to vCPU, when posted-interrupt is used, interrupts are
> delivered
> > by hardware according to the settings in IRTE, hence those software path will
> > not get touched for these interrupts. So do we need to care about how
> > software
> > might migrate the interrupts here?
> 
> just curious why we can't use one policy for vcpu selection. if multicast
> handling is a difference, you may pass intpost as a parameter to use
> same function.
> 

Digging into hvm_girq_dest_2_vcpu_id(), I find that it was introduced by
commit 023e3bc7 and is just an optimization for interrupts with a single
destination. In most cases, the destination vCPU is determined by
vmsi_deliver().

> >
> > >
> > > and why failure to find dest_vcpu doesn't lead to an error but a break?
> >
> > We cannot post multicast/broadcast interrupts to a guest, and
> > pi_find_dest_vcpu() returns 0 when encountering a multicast/broadcast
> > interrupt, in that case, we still use interrupt remapping mechanism for it.
> 
> then you might handle postint first, and then if muticast or no intpost support
> then go to software style.

That is a good suggestion. I will think more about how to handle this better.
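
A minimal, self-contained sketch of the "try posting first, otherwise fall
back" ordering, using the selection rules from the patch comment (vector
hashing for lowest-priority, single match for other modes, no posting for
multicast/broadcast). The names toy_delivery and toy_pick_post_dest and the
boolean match array are assumptions for illustration only; in the patch the
matching is done via vlapic_match_dest():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_delivery { TOY_FIXED, TOY_LOWEST_PRIO };

/*
 * matches[i] says whether vCPU i matches the destination.
 * Returns the index of the vCPU to post to, or -1 if the interrupt
 * cannot be posted and the remapping path should be used instead.
 */
static int toy_pick_post_dest(const bool *matches, unsigned int nr_vcpus,
                              enum toy_delivery mode, uint8_t gvec)
{
    unsigned int i, nr_match = 0, seen = 0, target;
    int last = -1;

    assert(matches);

    /* First pass: count the matching vCPUs and remember the last one. */
    for ( i = 0; i < nr_vcpus; i++ )
        if ( matches[i] )
        {
            nr_match++;
            last = (int)i;
        }

    if ( nr_match == 0 )
        return -1;

    /* Non-lowest-priority: only a single destination can be posted. */
    if ( mode != TOY_LOWEST_PRIO )
        return nr_match == 1 ? last : -1;

    /* Lowest-priority: vector hashing over the matching vCPUs (0-based). */
    target = gvec % nr_match;
    for ( i = 0; i < nr_vcpus; i++ )
        if ( matches[i] && seen++ == target )
            return (int)i;

    return -1;
}
```

A caller would try this first and take the remapping path only when it
returns -1. Doing two passes over the vCPUs avoids allocating a temporary
array of candidates.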

Thanks,
Feng

> 
> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling
  2015-04-02  6:24   ` Tian, Kevin
@ 2015-04-02  8:39     ` Wu, Feng
  2015-04-08  8:53       ` Tian, Kevin
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-04-02  8:39 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wu, Feng, keir, JBeulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, April 02, 2015 2:25 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> Subject: RE: [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU
> scheduling
> 
> > From: Wu, Feng
> > Sent: Wednesday, March 25, 2015 8:32 PM
> >
> > The basic idea here is:
> > 1. When vCPU's state is RUNSTATE_running,
> >         - set 'NV' to 'Notification Vector'.
> >         - Clear 'SN' to accpet PI.
> >         - set 'NDST' to the right pCPU.
> > 2. When vCPU's state is RUNSTATE_blocked,
> >         - set 'NV' to 'Wake-up Vector', so we can wake up the
> >           related vCPU when posted-interrupt happens for it.
> >         - Clear 'SN' to accpet PI.
> > 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> >         - Set 'SN' to suppress non-urgent interrupts.
> >           (Current, we only support non-urgent interrupts)
> >         - Set 'NV' back to 'Notification Vector' if needed.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vmx.c | 108
> > +++++++++++++++++++++++++++++++++++++++++++++
> >  xen/common/schedule.c      |   3 ++
> >  2 files changed, 111 insertions(+)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index b30392c..6323bd6 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1710,6 +1710,113 @@ static void vmx_handle_eoi(u8 vector)
> >      __vmwrite(GUEST_INTR_STATUS, status);
> >  }
> >
> > +static void vmx_pi_desc_update(struct vcpu *v, int new_state)
> > +{
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +    struct pi_desc old, new;
> > +    int old_state = v->runstate.state;
> > +    unsigned long flags;
> > +
> > +    if ( !iommu_intpost )
> > +        return;
> > +
> > +    switch ( new_state )
> > +    {
> > +    case RUNSTATE_runnable:
> > +    case RUNSTATE_offline:
> > +        /*
> > +         * We don't need to send notification event to a non-running
> > +         * vcpu, the interrupt information will be delivered to it before
> > +         * VM-ENTRY when the vcpu is scheduled to run next time.
> > +         */
> > +        pi_set_sn(pi_desc);
> > +
> > +        /*
> > +         * If the state is transferred from RUNSTATE_blocked,
> > +         * we should set 'NV' feild back to posted_intr_vector,
> > +         * so the Posted-Interrupts can be delivered to the vCPU
> > +         * by VT-d HW after it is scheduled to run.
> > +         */
> > +        if ( old_state == RUNSTATE_blocked )
> > +        {
> > +            do
> > +            {
> > +                old.control = new.control = pi_desc->control;
> > +                new.nv = posted_intr_vector;
> > +            }
> > +            while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                    != old.control );
> > +
> > +           /*
> > +            * Delete the vCPU from the related wakeup queue
> > +            * if we are resuming from blocked state
> > +            */
> > +           spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                             v->processor), flags);
> > +           list_del(&v->blocked_vcpu_list);
> > +           spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                                  v->processor), flags);
> > +        }
> > +        break;
> > +
> > +    case RUNSTATE_blocked:
> > +        /*
> > +         * The vCPU is blocked on the wait queue.
> > +         * Store the blocked vCPU on the list of the
> > +         * vcpu->wakeup_cpu, which is the destination
> > +         * of the wake-up notification event.
> > +         */
> > +        spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                          v->processor), flags);
> > +        list_add_tail(&v->blocked_vcpu_list,
> > +                      &per_cpu(blocked_vcpu_on_cpu, v->processor));
> > +        spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                               v->processor), flags);
> > +
> > +        do
> > +        {
> > +            old.control = new.control = pi_desc->control;
> > +
> > +            /*
> > +             * We should not block the vCPU if
> > +             * an interrupt is posted for it.
> > +             */
> > +
> > +            if ( pi_test_on(&old) == 1 )
> > +            {
> > +                tasklet_schedule(&v->vcpu_wakeup_tasklet);
> > +                return;
> > +            }
> 
> so you also need to remove the vcpu from the blocked list, right?

Yes, the vCPU needs to be removed from the list here. I thought it could be
removed elsewhere in this patch, but after thinking about it more I don't
believe that works. I think we can fix this in one of two ways:
#1) Just add the removal logic here.
or
#2) In vcpu_runstate_change(), call 'hvm_funcs.pi_desc_update()' after
'v->runstate.state = new_state;' and pass old_state to it. The code path then
becomes:

tasklet_schedule(&v->vcpu_wakeup_tasklet) -> vcpu_unblock() -> vcpu_wake() ->
vcpu_runstate_change(v, RUNSTATE_runnable, NOW()) -> vmx_pi_desc_update()

So the vCPU will be removed in 'case RUNSTATE_runnable:' of vmx_pi_desc_update().
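
A rough sketch of option #2, assuming toy stand-ins for the vCPU structure
and the blocked list (toy_vcpu, toy_pi_desc_update and toy_runstate_change are
hypothetical names, and the enum only mirrors the RUNSTATE_* names for
readability). The point is that the hook runs after the state is updated and
receives the old state explicitly:

```c
#include <assert.h>
#include <stdbool.h>

/* Local toy definitions; they only mirror the RUNSTATE_* names. */
enum runstate {
    RUNSTATE_running,
    RUNSTATE_runnable,
    RUNSTATE_blocked,
    RUNSTATE_offline
};

struct toy_vcpu {
    enum runstate state;
    bool on_blocked_list;      /* stands in for the per-CPU blocked list */
};

/* Hook called after v->state was updated; old_state is passed in. */
static void toy_pi_desc_update(struct toy_vcpu *v, enum runstate old_state)
{
    switch ( v->state )
    {
    case RUNSTATE_blocked:
        v->on_blocked_list = true;
        break;

    case RUNSTATE_runnable:
    case RUNSTATE_offline:
        /* Leaving the blocked state: drop the vCPU from the list. */
        if ( old_state == RUNSTATE_blocked )
            v->on_blocked_list = false;
        break;

    default:
        break;
    }
}

static void toy_runstate_change(struct toy_vcpu *v, enum runstate new_state)
{
    enum runstate old_state;

    assert(v);

    old_state = v->state;
    v->state = new_state;                /* update the state first ... */
    toy_pi_desc_update(v, old_state);    /* ... then call the hook */
}
```

With this ordering, a wakeup that goes through the runnable state always
passes through the removal branch, so no separate removal is needed at the
early return.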

> 
> and how do you handle ON is set after above check? looks this is better
> handled behind cmpxchg loop...

- If 'ON' is set before 'if ( pi_test_on(&old) == 1 )', we return.
- If 'ON' is not set before that check but is set after it, then
  'cmpxchg(&pi_desc->control, old.control, new.control) != old.control' is true,
  so we go around the loop again, and this time 'if ( pi_test_on(&old) == 1 )' is true.

Thanks,
Feng

> 
> > +
> > +            pi_clear_sn(&new);
> > +            new.nv = pi_wakeup_vector;
> > +        }
> > +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                != old.control );
> > +        break;
> > +
> > +    case RUNSTATE_running:
> > +        ASSERT( pi_test_sn(pi_desc) == 1 );
> > +
> > +        do
> > +        {
> > +            old.control = new.control = pi_desc->control;
> > +            if ( x2apic_enabled )
> > +                new.ndst = cpu_physical_id(v->processor);
> > +            else
> > +                new.ndst = (cpu_physical_id(v->processor) << 8) & 0xFF00;
> > +
> > +            pi_clear_sn(&new);
> > +        }
> > +        while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                != old.control );
> > +        break;
> > +
> > +    default:
> > +        break;
> > +    }
> > +}
> > +
> >  void vmx_hypervisor_cpuid_leaf(uint32_t sub_idx,
> >                                 uint32_t *eax, uint32_t *ebx,
> >                                 uint32_t *ecx, uint32_t *edx)
> > @@ -1795,6 +1902,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
> >      .process_isr          = vmx_process_isr,
> >      .deliver_posted_intr  = vmx_deliver_posted_intr,
> >      .sync_pir_to_irr      = vmx_sync_pir_to_irr,
> > +    .pi_desc_update       = vmx_pi_desc_update,
> >      .handle_eoi           = vmx_handle_eoi,
> >      .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
> >      .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index ef79847..acf3186 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -157,6 +157,9 @@ static inline void vcpu_runstate_change(
> >          v->runstate.state_entry_time = new_entry_time;
> >      }
> >
> > +    if ( is_hvm_vcpu(v) && hvm_funcs.pi_desc_update )
> > +        hvm_funcs.pi_desc_update(v, new_state);
> > +
> >      v->runstate.state = new_state;
> >  }
> >
> > --
> > 2.1.0

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-04-02  6:08           ` Tian, Kevin
  2015-04-02  7:21             ` Wu, Feng
@ 2015-04-02 19:15             ` Konrad Rzeszutek Wilk
  2015-04-03  2:00               ` Wu, Feng
  1 sibling, 1 reply; 101+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-04-02 19:15 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Zhang, Yang Z, keir, Wu, Feng, JBeulich, xen-devel

On Thu, Apr 02, 2015 at 06:08:12AM +0000, Tian, Kevin wrote:
> > From: Wu, Feng
> > Sent: Friday, March 27, 2015 12:58 PM
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Zhang, Yang Z
> > > Sent: Friday, March 27, 2015 12:44 PM
> > > To: Wu, Feng; xen-devel@lists.xen.org
> > > Cc: JBeulich@suse.com; keir@xen.org; Tian, Kevin
> > > Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event when
> > vCPU
> > > is running
> > >
> > > Wu, Feng wrote on 2015-03-27:
> > > >
> > > >
> > > > Zhang, Yang Z wrote on 2015-03-25:
> > > >> when vCPU is running
> > > >>
> > > >> Wu, Feng wrote on 2015-03-25:
> > > >>> When a vCPU is running in Root mode and a notification event has
> > > >>> been injected to it. we need to set VCPU_KICK_SOFTIRQ for the
> > > >>> current cpu, so the pending interrupt in PIRR will be synced to
> > > >>> vIRR before

This would imply that we had VMEXIT-ed due to a pending interrupt? And we
end up calling 'do_IRQ'? If so, then the DPCI_SOFTIRQ ends up being set
and you still end up calling the softirq code?

> > > > VM-Exit in time.
> > > >>
> > > >> Shouldn't the pending interrupt be synced unconditionally before next
> > > >> vmentry? What happens if we didn't set the softirq?
> > > >
> > > > If we didn't set the softirq in the notification handler, the
> > > > interrupts happened exactly before VM-entry cannot be delivered to
> > > > guest at this time. Please see the following code fragments from
> > > > xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments)
> > > >
> > > > .Lvmx_do_vmentry
> > > >
> > > > ......
> > > > 		/* If Vt-d engine issues a notification event here,
> > > >          * it cannot be delivered to guest during this VM-entry
> > > >          * without raising the softirq in notification handler. */
> > > >         cmp  %ecx,(%rdx,%rax,1)
> > > >         jnz  .Lvmx_process_softirqs
> > > > ......
> > > >
> > > >         je   .Lvmx_launch
> > > > ......
> > > >
> > > >
> > > > .Lvmx_process_softirqs:
> > > >         sti
> > > >         call do_softirq
> > > >         jmp  .Lvmx_do_vmentry
> > >
> > > You are right! This helps me to recall why raise the softirq when delivering
> > > the PI.
> >
> > Yes, __vmx_deliver_posted_interrupt() is the software way to deliver PI, it sets
> > the softirq for this purpose, however, when VT-d HW delivers PI, we have no
> > control to the HW itself, hence we need to set this softirq in the Notification
> > Event handler.
> >
>
> could you include this information in the comment so others can easily
> understand this requirement? from code you only mentioned VCPU_KICK_SOFTIRQ
> is required, but how it leads to PIRR->VIRR sync is not explained.
> 
> Thanks
> Kevin
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-04-02 19:15             ` Konrad Rzeszutek Wilk
@ 2015-04-03  2:00               ` Wu, Feng
  2015-04-03 13:36                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-04-03  2:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Tian, Kevin
  Cc: Zhang, Yang Z, Wu, Feng, keir, JBeulich, xen-devel



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Friday, April 03, 2015 3:15 AM
> To: Tian, Kevin
> Cc: Wu, Feng; Zhang, Yang Z; xen-devel@lists.xen.org; keir@xen.org;
> JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification event
> when vCPU is running
> 
> On Thu, Apr 02, 2015 at 06:08:12AM +0000, Tian, Kevin wrote:
> > > From: Wu, Feng
> > > Sent: Friday, March 27, 2015 12:58 PM
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Zhang, Yang Z
> > > > Sent: Friday, March 27, 2015 12:44 PM
> > > > To: Wu, Feng; xen-devel@lists.xen.org
> > > > Cc: JBeulich@suse.com; keir@xen.org; Tian, Kevin
> > > > Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event when
> > > vCPU
> > > > is running
> > > >
> > > > Wu, Feng wrote on 2015-03-27:
> > > > >
> > > > >
> > > > > Zhang, Yang Z wrote on 2015-03-25:
> > > > >> when vCPU is running
> > > > >>
> > > > >> Wu, Feng wrote on 2015-03-25:
> > > > >>> When a vCPU is running in Root mode and a notification event has
> > > > >>> been injected to it. we need to set VCPU_KICK_SOFTIRQ for the
> > > > >>> current cpu, so the pending interrupt in PIRR will be synced to
> > > > >>> vIRR before
> 
> This would imply that we had VMEXIT-ed due to pending interrupt? And we
> end up calling 'do_IRQ'? If so then the DPCI_SOFTIRQ ends up being set
> and you still end up calling the softirq code?

No. 

Here is the scenario this patch description refers to:

The vCPU is running in root mode (for example because of a hypercall, or
for any other reason that results in a VM-Exit), and external interrupts
arrive before the vCPU is back in non-root mode. Note that the VM-Exit is
not caused by these external interrupts.
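
As a rough model of that window, here is a standalone sketch. It assumes that
the PIR -> vIRR sync runs at the start of the VM-entry path and that a pending
softirq makes the entry path restart; the toy_* names and the three flags are
hypothetical and only stand in for the real state.

```c
#include <stdbool.h>

struct toy_cpu {
    bool softirq_pending;   /* models the per-CPU softirq pending word */
    bool pir_pending;       /* models an interrupt posted in the PI descriptor */
    bool virr_pending;      /* models the interrupt being visible in vIRR */
};

/* Models the PIR -> vIRR sync done early on the VM-entry path (assumption). */
static void toy_intr_assist(struct toy_cpu *c)
{
    if ( c->pir_pending )
    {
        c->pir_pending = false;
        c->virr_pending = true;
    }
}

/*
 * Models the VM-entry loop. 'irq_in_window' posts one interrupt after the
 * sync but before the softirq check; 'raise_softirq' selects whether the
 * notification handler raises the softirq. Returns whether the interrupt
 * is in vIRR when the guest is finally entered.
 */
static bool toy_vmentry(struct toy_cpu *c, bool irq_in_window, bool raise_softirq)
{
    for ( ; ; )
    {
        toy_intr_assist(c);

        if ( irq_in_window )
        {
            irq_in_window = false;
            c->pir_pending = true;
            if ( raise_softirq )
                c->softirq_pending = true;
        }

        if ( !c->softirq_pending )
            return c->virr_pending;     /* enter the guest */

        c->softirq_pending = false;     /* do_softirq(), then retry the entry */
    }
}
```

In this model the interrupt only reaches vIRR on this VM-entry when the
handler raises the softirq; otherwise it stays in PIR until a later entry.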

Thanks,
Feng

> 
> > > > > VM-Exit in time.
> > > > >>
> > > > >> Shouldn't the pending interrupt be synced unconditionally before next
> > > > >> vmentry? What happens if we didn't set the softirq?
> > > > >
> > > > > If we didn't set the softirq in the notification handler, the
> > > > > interrupts happened exactly before VM-entry cannot be delivered to
> > > > > guest at this time. Please see the following code fragments from
> > > > > xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments)
> > > > >
> > > > > .Lvmx_do_vmentry
> > > > >
> > > > > ......
> > > > > 		/* If Vt-d engine issues a notification event here,
> > > > >          * it cannot be delivered to guest during this VM-entry
> > > > >          * without raising the softirq in notification handler. */
> > > > >         cmp  %ecx,(%rdx,%rax,1)
> > > > >         jnz  .Lvmx_process_softirqs
> > > > > ......
> > > > >
> > > > >         je   .Lvmx_launch
> > > > > ......
> > > > >
> > > > >
> > > > > .Lvmx_process_softirqs:
> > > > >         sti
> > > > >         call do_softirq
> > > > >         jmp  .Lvmx_do_vmentry
> > > >
> > > > You are right! This helps me to recall why raise the softirq when delivering
> > > > the PI.
> > >
> > > Yes, __vmx_deliver_posted_interrupt() is the software way to deliver PI, it sets
> > > the softirq for this purpose, however, when VT-d HW delivers PI, we have no
> > > control to the HW itself, hence we need to set this softirq in the Notification
> > > Event handler.
> > >
> >
> > could you include this information in the comment so others can easily
> > understand this requirement? from code you only mentioned VCPU_KICK_SOFTIRQ
> > is required, but how it leads to PIRR->VIRR sync is not explained.
> >
> > Thanks
> > Kevin
> >

* Re: [RFC v1 08/15] Update IRTE according to guest interrupt config changes
  2015-04-02  8:02         ` Wu, Feng
@ 2015-04-03  8:29           ` Tian, Kevin
  0 siblings, 0 replies; 101+ messages in thread
From: Tian, Kevin @ 2015-04-03  8:29 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Thursday, April 02, 2015 4:03 PM
> 
> 
> 
> > -----Original Message-----
> > From: Tian, Kevin
> > Sent: Thursday, April 02, 2015 2:50 PM
> > To: Wu, Feng; xen-devel@lists.xen.org
> > Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> > Subject: RE: [RFC v1 08/15] Update IRTE according to guest interrupt config
> > changes
> >
> > > From: Wu, Feng
> > > Sent: Thursday, April 02, 2015 2:21 PM
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Tian, Kevin
> > > > Sent: Thursday, April 02, 2015 1:52 PM
> > > > To: Wu, Feng; xen-devel@lists.xen.org
> > > > Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> > > > Subject: RE: [RFC v1 08/15] Update IRTE according to guest interrupt
> config
> > > > changes
> > > >
> > > > > From: Wu, Feng
> > > > > Sent: Wednesday, March 25, 2015 8:32 PM
> > > > >
> > > > > When guest changes its interrupt configuration (such as, vector, etc.)
> > > > > for direct-assigned devices, we need to update the associated IRTE
> > > > > with the new guest vector, so external interrupts from the assigned
> > > > > devices can be injected to guests without VM-Exit.
> > > > >
> > > > > > For lowest-priority interrupts, we use vector-hashing mechanism to find
> > > > > > the destination vCPU. This follows the hardware behavior, since modern
> > > > > > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> > > > > >
> > > > > > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > > > > > so we still use interrupt remapping.
> > > > >
> > > > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > > > ---
> > > > >  xen/drivers/passthrough/io.c | 77
> > > > > +++++++++++++++++++++++++++++++++++++++++++-
> > > > >  1 file changed, 76 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > > > > index ae050df..1d9a132 100644
> > > > > --- a/xen/drivers/passthrough/io.c
> > > > > +++ b/xen/drivers/passthrough/io.c
> > > > > @@ -26,6 +26,7 @@
> > > > >  #include <asm/hvm/iommu.h>
> > > > >  #include <asm/hvm/support.h>
> > > > >  #include <xen/hvm/irq.h>
> > > > > +#include <asm/io_apic.h>
> > > > >
> > > > >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> > > > >
> > > > > @@ -199,6 +200,61 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci
> > > *dpci)
> > > > >      xfree(dpci);
> > > > >  }
> > > > >
> > > > > +/*
> > > > > + * Here we handle the following cases:
> > > > > > + * - For lowest-priority interrupts, we find the destination vCPU from the
> > > > > > + *   guest vector using vector-hashing mechanism and return true. This
> > > > > > + *   follows the hardware behavior, since modern Intel CPUs use vector
> > > > > > + *   hashing to handle the lowest-priority interrupt.
> > > > > > + * - Otherwise, for single destination interrupt, it is straightforward to
> > > > > > + *   find the destination vCPU and return true.
> > > > > > + * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > > > > > + *   so return false.
> > > > > + */
> > > > > +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id,
> > > > > > +                                uint8_t dest_mode, uint8_t deliver_mode,
> > > > > > +                                uint32_t gvec, struct vcpu **dest_vcpu)
> > > > > +{
> > > > > +    struct vcpu *v, **dest_vcpu_array;
> > > > > +    unsigned int dest_vcpu_num = 0;
> > > > > +    int ret;
> > > > > +
> > > > > +    if ( deliver_mode == dest_LowestPrio )
> > > > > > +        dest_vcpu_array = xzalloc_array(struct vcpu *, d->max_vcpus);
> > > > > +
> > > > > +    for_each_vcpu ( d, v )
> > > > > +    {
> > > > > +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> > > > > +                                dest_id, dest_mode) )
> > > > > +            continue;
> > > > > +
> > > > > +        dest_vcpu_num++;
> > > > > +
> > > > > +        if ( deliver_mode == dest_LowestPrio )
> > > > > +            dest_vcpu_array[dest_vcpu_num] = v;
> > > > > +        else
> > > > > +            *dest_vcpu = v;
> > > > > +    }
> > > > > +
> > > > > +    if ( deliver_mode == dest_LowestPrio )
> > > > > +    {
> > > > > +        if (  dest_vcpu_num != 0 )
> > > > > +        {
> > > > > > +            *dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num];
> > > > > +            ret = 1;
> > > > > +        }
> > > > > +        else
> > > > > +            ret = 0;
> > > > > +
> > > > > +        xfree(dest_vcpu_array);
> > > > > +        return ret;
> > > > > +    }
> > > > > +    else if (  dest_vcpu_num == 1 )
> > > > > +        return 1;
> > > > > +    else
> > > > > +        return 0;
> > > > > +}
> > > > > +
> > > > >  int pt_irq_create_bind(
> > > > >      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
> > > > >  {
> > > > > @@ -257,7 +313,7 @@ int pt_irq_create_bind(
> > > > >      {
> > > > >      case PT_IRQ_TYPE_MSI:
> > > > >      {
> > > > > -        uint8_t dest, dest_mode;
> > > > > +        uint8_t dest, dest_mode, deliver_mode;
> > > > >          int dest_vcpu_id;
> > > > >
> > > > >          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> > > > > @@ -330,11 +386,30 @@ int pt_irq_create_bind(
> > > > >          /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> > > > >          dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
> > > > > >          dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> > > > > > +        deliver_mode = (pirq_dpci->gmsi.gflags >> GFLAGS_SHIFT_DELIV_MODE) &
> > > > > > +                        VMSI_DELIV_MASK;
> > > > > >          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> > > > >          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> > > > >          spin_unlock(&d->event_lock);
> > > > >          if ( dest_vcpu_id >= 0 )
> > > > >              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> > > > > +
> > > > > +        /* Use interrupt posting if it is supported */
> > > > > +        if ( iommu_intpost )
> > > > > +        {
> > > > > +            struct vcpu *vcpu = NULL;
> > > > > +
> > > > > > +            if ( !pi_find_dest_vcpu(d, dest, dest_mode, deliver_mode,
> > > > > > +                                    pirq_dpci->gmsi.gvec, &vcpu) )
> > > > > +                break;
> > > > > +
> > > >
> > > > > Is it possible this new pi_find_dest_vcpu will return a different target from
> > > > > earlier hvm_girq_dest_2_vcpu_id? if yes it will cause tricky issues since
> > > > > earlier pirqs are migrated according to different policy. We need to
> > > > > consolidate vcpu selection policies together to keep consistency.
> > >
> > > > In my understanding, what you described above is the software way to deliver
> > > > the interrupts to vCPU, when posted-interrupt is used, interrupts are delivered
> > > > by hardware according to the settings in IRTE, hence those software path will
> > > > not get touched for these interrupts. So do we need to care about how software
> > > > might migrate the interrupts here?
> >
> > just curious why we can't use one policy for vcpu selection. if multicast
> > handling is a difference, you may pass intpost as a parameter to use
> > same function.
> >
> 
> Digging into hvm_girq_dest_2_vcpu_id, I find that hvm_girq_dest_2_vcpu_id()
> is introduced by commit 023e3bc7, and it is just an optimization for interrupts
> with single destination. For most case, the destination of a vCPU is determined
> by vmsi_deliver().
> 

If they can't be combined at all, we can at least make the function names
clearer so they better reflect their distinct purposes. :-)

Thanks
Kevin

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-04-03  2:00               ` Wu, Feng
@ 2015-04-03 13:36                 ` Konrad Rzeszutek Wilk
  2015-04-07  0:35                   ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-04-03 13:36 UTC (permalink / raw)
  To: Wu, Feng; +Cc: Zhang, Yang Z, Tian, Kevin, keir, JBeulich, xen-devel

On Fri, Apr 03, 2015 at 02:00:24AM +0000, Wu, Feng wrote:
> 
> 
> > -----Original Message-----
> > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > Sent: Friday, April 03, 2015 3:15 AM
> > To: Tian, Kevin
> > Cc: Wu, Feng; Zhang, Yang Z; xen-devel@lists.xen.org; keir@xen.org;
> > JBeulich@suse.com
> > Subject: Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification event
> > when vCPU is running
> > 
> > On Thu, Apr 02, 2015 at 06:08:12AM +0000, Tian, Kevin wrote:
> > > > From: Wu, Feng
> > > > Sent: Friday, March 27, 2015 12:58 PM
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Zhang, Yang Z
> > > > > Sent: Friday, March 27, 2015 12:44 PM
> > > > > To: Wu, Feng; xen-devel@lists.xen.org
> > > > > Cc: JBeulich@suse.com; keir@xen.org; Tian, Kevin
> > > > > Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event when
> > > > vCPU
> > > > > is running
> > > > >
> > > > > Wu, Feng wrote on 2015-03-27:
> > > > > >
> > > > > >
> > > > > > Zhang, Yang Z wrote on 2015-03-25:
> > > > > >> when vCPU is running
> > > > > >>
> > > > > >> Wu, Feng wrote on 2015-03-25:
> > > > > >>> When a vCPU is running in Root mode and a notification event has
> > > > > >>> been injected to it. we need to set VCPU_KICK_SOFTIRQ for the
> > > > > >>> current cpu, so the pending interrupt in PIRR will be synced to
> > > > > >>> vIRR before
> > 
> > This would imply that we had VMEXIT-ed due to pending interrupt? And we
> > end up calling 'do_IRQ'? If so then the DPCI_SOFTIRQ ends up being set
> > and you still end up calling the softirq code?
> 
> No. 
> 
> Here is the scenario for the description of this patch:
> 
> When vCPU is running in root-mode (such as via hypercall, or any other
> reasons which can result in VM-Exit), and before vCPU is back to non-root,
> external interrupts happen. Notice that the VM-exit is not caused by this
> external interrupt.

Thank you for the explanation. You might want to add that to the commit
message, along with the explanation of the code flow below!

> 
> Thanks,
> Feng
> 
> > 
> > > > > > VM-Exit in time.
> > > > > >>
> > > > > >> Shouldn't the pending interrupt be synced unconditionally before next
> > > > > >> vmentry? What happens if we didn't set the softirq?
> > > > > >
> > > > > > If we didn't set the softirq in the notification handler, the
> > > > > > interrupts happened exactly before VM-entry cannot be delivered to
> > > > > > guest at this time. Please see the following code fragments from
> > > > > > xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments)
> > > > > >
> > > > > > .Lvmx_do_vmentry
> > > > > >
> > > > > > ......
> > > > > > 		/* If Vt-d engine issues a notification event here,
> > > > > >          * it cannot be delivered to guest during this VM-entry
> > > > > >          * without raising the softirq in notification handler. */
> > > > > >         cmp  %ecx,(%rdx,%rax,1)
> > > > > >         jnz  .Lvmx_process_softirqs
> > > > > > ......
> > > > > >
> > > > > >         je   .Lvmx_launch
> > > > > > ......
> > > > > >
> > > > > >
> > > > > > .Lvmx_process_softirqs:
> > > > > >         sti
> > > > > >         call do_softirq
> > > > > >         jmp  .Lvmx_do_vmentry
> > > > >
> > > > > You are right! This helps me to recall why raise the softirq when delivering
> > > > > the PI.
> > > >
> > > > Yes, __vmx_deliver_posted_interrupt() is the software way to deliver PI, it sets
> > > > the softirq for this purpose, however, when VT-d HW delivers PI, we have no
> > > > control to the HW itself, hence we need to set this softirq in the Notification
> > > > Event handler.
> > > >
> > >
> > > could you include this information in the comment so others can easily
> > > understand this requirement? from code you only mentioned VCPU_KICK_SOFTIRQ
> > > is required, but how it leads to PIRR->VIRR sync is not explained.
> > >
> > > Thanks
> > > Kevin
> > >

* Re: [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running
  2015-04-03 13:36                 ` Konrad Rzeszutek Wilk
@ 2015-04-07  0:35                   ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-04-07  0:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Wu, Feng, xen-devel, JBeulich, Zhang, Yang Z, keir



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Friday, April 03, 2015 9:37 PM
> To: Wu, Feng
> Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; keir@xen.org;
> JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification event
> when vCPU is running
> 
> On Fri, Apr 03, 2015 at 02:00:24AM +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > > Sent: Friday, April 03, 2015 3:15 AM
> > > To: Tian, Kevin
> > > Cc: Wu, Feng; Zhang, Yang Z; xen-devel@lists.xen.org; keir@xen.org;
> > > JBeulich@suse.com
> > > Subject: Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification
> event
> > > when vCPU is running
> > >
> > > On Thu, Apr 02, 2015 at 06:08:12AM +0000, Tian, Kevin wrote:
> > > > > From: Wu, Feng
> > > > > Sent: Friday, March 27, 2015 12:58 PM
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Zhang, Yang Z
> > > > > > Sent: Friday, March 27, 2015 12:44 PM
> > > > > > To: Wu, Feng; xen-devel@lists.xen.org
> > > > > > Cc: JBeulich@suse.com; keir@xen.org; Tian, Kevin
> > > > > > Subject: RE: [RFC v1 12/15] vmx: Properly handle notification event
> when
> > > > > vCPU
> > > > > > is running
> > > > > >
> > > > > > Wu, Feng wrote on 2015-03-27:
> > > > > > >
> > > > > > >
> > > > > > > Zhang, Yang Z wrote on 2015-03-25:
> > > > > > >> when vCPU is running
> > > > > > >>
> > > > > > >> Wu, Feng wrote on 2015-03-25:
> > > > > > >>> When a vCPU is running in Root mode and a notification event has
> > > > > > >>> been injected to it. we need to set VCPU_KICK_SOFTIRQ for the
> > > > > > >>> current cpu, so the pending interrupt in PIRR will be synced to
> > > > > > >>> vIRR before
> > >
> > > This would imply that we had VMEXIT-ed due to pending interrupt? And we
> > > end up calling 'do_IRQ'? If so then the DPCI_SOFTIRQ ends up being set
> > > and you still end up calling the softirq code?
> >
> > No.
> >
> > Here is the scenario for the description of this patch:
> >
> > When vCPU is running in root-mode (such as via hypercall, or any other
> > reasons which can result in VM-Exit), and before vCPU is back to non-root,
> > external interrupts happen. Notice that the VM-exit is not caused by this
> > external interrupt.
> 
> Thank you for the explanation. You might want to add that in the commit
> along with the explanation of the code flow below!

Good idea! Thank you!

Thanks,
Feng

> 
> >
> > Thanks,
> > Feng
> >
> > >
> > > > > > > VM-Exit in time.
> > > > > > >>
> > > > > > > >> Shouldn't the pending interrupt be synced unconditionally before next
> > > > > > > >> vmentry? What happens if we didn't set the softirq?
> > > > > > >
> > > > > > > If we didn't set the softirq in the notification handler, the
> > > > > > > interrupts happened exactly before VM-entry cannot be delivered to
> > > > > > > guest at this time. Please see the following code fragments from
> > > > > > > xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments)
> > > > > > >
> > > > > > > .Lvmx_do_vmentry
> > > > > > >
> > > > > > > ......
> > > > > > > 		/* If Vt-d engine issues a notification event here,
> > > > > > >          * it cannot be delivered to guest during this VM-entry
> > > > > > >          * without raising the softirq in notification handler. */
> > > > > > >         cmp  %ecx,(%rdx,%rax,1)
> > > > > > >         jnz  .Lvmx_process_softirqs
> > > > > > > ......
> > > > > > >
> > > > > > >         je   .Lvmx_launch
> > > > > > > ......
> > > > > > >
> > > > > > >
> > > > > > > .Lvmx_process_softirqs:
> > > > > > >         sti
> > > > > > >         call do_softirq
> > > > > > >         jmp  .Lvmx_do_vmentry
> > > > > >
> > > > > > You are right! This helps me to recall why raise the softirq when delivering
> > > > > > the PI.
> > > > >
> > > > > Yes, __vmx_deliver_posted_interrupt() is the software way to deliver PI, it sets
> > > > > the softirq for this purpose, however, when VT-d HW delivers PI, we have no
> > > > > control to the HW itself, hence we need to set this softirq in the Notification
> > > > > Event handler.
> > > > >
> > > >
> > > > could you include this information in the comment so others can easily
> > > > understand this requirement? from code you only mentioned VCPU_KICK_SOFTIRQ
> > > > is required, but how it leads to PIRR->VIRR sync is not explained.
> > > >
> > > > Thanks
> > > > Kevin
> > > >

* Re: [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling
  2015-04-02  8:39     ` Wu, Feng
@ 2015-04-08  8:53       ` Tian, Kevin
  2015-04-08 11:01         ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Tian, Kevin @ 2015-04-08  8:53 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Thursday, April 02, 2015 4:40 PM
> 
> > -----Original Message-----
> > From: Tian, Kevin
> > Sent: Thursday, April 02, 2015 2:25 PM
> > To: Wu, Feng; xen-devel@lists.xen.org
> > Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> > Subject: RE: [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU
> > scheduling
> >
> > > From: Wu, Feng
> > > Sent: Wednesday, March 25, 2015 8:32 PM
> > >
> > > The basic idea here is:
> > > 1. When vCPU's state is RUNSTATE_running,
> > >         - set 'NV' to 'Notification Vector'.
> > > >         - Clear 'SN' to accept PI.
> > > >         - set 'NDST' to the right pCPU.
> > > > 2. When vCPU's state is RUNSTATE_blocked,
> > > >         - set 'NV' to 'Wake-up Vector', so we can wake up the
> > > >           related vCPU when posted-interrupt happens for it.
> > > >         - Clear 'SN' to accept PI.
> > > > 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> > > >         - Set 'SN' to suppress non-urgent interrupts.
> > > >           (Currently, we only support non-urgent interrupts)
> > >         - Set 'NV' back to 'Notification Vector' if needed.
> > >
> > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > ---
> > >  xen/arch/x86/hvm/vmx/vmx.c | 108
> > > +++++++++++++++++++++++++++++++++++++++++++++
> > >  xen/common/schedule.c      |   3 ++
> > >  2 files changed, 111 insertions(+)
> > >
> > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > > index b30392c..6323bd6 100644
> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -1710,6 +1710,113 @@ static void vmx_handle_eoi(u8 vector)
> > >      __vmwrite(GUEST_INTR_STATUS, status);
> > >  }
> > >
> > > +static void vmx_pi_desc_update(struct vcpu *v, int new_state)
> > > +{
> > > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > > +    struct pi_desc old, new;
> > > +    int old_state = v->runstate.state;
> > > +    unsigned long flags;
> > > +
> > > +    if ( !iommu_intpost )
> > > +        return;
> > > +
> > > +    switch ( new_state )
> > > +    {
> > > +    case RUNSTATE_runnable:
> > > +    case RUNSTATE_offline:
> > > +        /*
> > > +         * We don't need to send notification event to a non-running
> > > +         * vcpu, the interrupt information will be delivered to it before
> > > +         * VM-ENTRY when the vcpu is scheduled to run next time.
> > > +         */
> > > +        pi_set_sn(pi_desc);
> > > +
> > > +        /*
> > > +         * If the state is transferred from RUNSTATE_blocked,
> > > > +         * we should set 'NV' field back to posted_intr_vector,
> > > +         * so the Posted-Interrupts can be delivered to the vCPU
> > > +         * by VT-d HW after it is scheduled to run.
> > > +         */
> > > +        if ( old_state == RUNSTATE_blocked )
> > > +        {
> > > +            do
> > > +            {
> > > +                old.control = new.control = pi_desc->control;
> > > +                new.nv = posted_intr_vector;
> > > +            }
> > > > +            while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > > > +                    != old.control );
> > > +
> > > +           /*
> > > +            * Delete the vCPU from the related wakeup queue
> > > +            * if we are resuming from blocked state
> > > +            */
> > > +           spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > +                             v->processor), flags);
> > > +           list_del(&v->blocked_vcpu_list);
> > > +
> spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > +                                  v->processor), flags);
> > > +        }
> > > +        break;
> > > +
> > > +    case RUNSTATE_blocked:
> > > +        /*
> > > +         * The vCPU is blocked on the wait queue.
> > > +         * Store the blocked vCPU on the list of the
> > > +         * vcpu->wakeup_cpu, which is the destination
> > > +         * of the wake-up notification event.
> > > +         */
> > > +        spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > +                          v->processor), flags);
> > > +        list_add_tail(&v->blocked_vcpu_list,
> > > > +                      &per_cpu(blocked_vcpu_on_cpu, v->processor));
> > > +        spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > +                               v->processor), flags);
> > > +
> > > +        do
> > > +        {
> > > +            old.control = new.control = pi_desc->control;
> > > +
> > > +            /*
> > > +             * We should not block the vCPU if
> > > +             * an interrupt is posted for it.
> > > +             */
> > > +
> > > +            if ( pi_test_on(&old) == 1 )
> > > +            {
> > > +                tasklet_schedule(&v->vcpu_wakeup_tasklet);
> > > +                return;
> > > +            }
> >
> > so you also need to remove the vcpu from the blocked list, right?
> 
> Yes, I need to remove the vcpu here. I thought it could be removed in
> another place in this patch, but after thinking about it more I don't
> believe it can. I think we can fix this issue in one of two ways:
> #1) Just add the remove logic here.
> or
> #2) In function vcpu_runstate_change(), call 'hvm_funcs.pi_desc_update()'
> after v->runstate.state = new_state; and pass the old_state to
> 'hvm_funcs.pi_desc_update()'.
> Then the code path is:
> 
> tasklet_schedule(&v->vcpu_wakeup_tasklet) -> vcpu_unblock() ->
> vcpu_wake() -> vcpu_runstate_change(v, RUNSTATE_runnable, NOW()) ->
> vmx_pi_desc_update()
> 
> So, the vCPU will be removed in 'case RUNSTATE_runnable:' of function
> vmx_pi_desc_update().

Either way is OK. It's difficult to judge now from the function names above
w/o re-reading the whole series. We can check it in your new version. :-)

> 
> >
> > and how do you handle ON is set after above check? looks this is better
> > handled behind cmpxchg loop...
> 
> - If 'ON' is set before 'if ( pi_test_on(&old) == 1 )', we return.
> - If 'ON' is not set before the check but is set after it,
> 'cmpxchg(&pi_desc->control, old.control, new.control) != old.control'
> evaluates to true, so we go around the loop again, and this time
> 'if ( pi_test_on(&old) == 1 )' is true.
> 

But do we need to raise multiple tasklets in the loop? Can they be combined
into one notification after the loop?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
  2015-04-02  7:18     ` Wu, Feng
@ 2015-04-08  9:02       ` Tian, Kevin
  2015-04-08 11:14         ` Wu, Feng
  0 siblings, 1 reply; 101+ messages in thread
From: Tian, Kevin @ 2015-04-08  9:02 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Thursday, April 02, 2015 3:18 PM
> 
> > -----Original Message-----
> > From: Tian, Kevin
> > Sent: Thursday, April 02, 2015 2:01 PM
> > To: Wu, Feng; xen-devel@lists.xen.org
> > Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> > Subject: RE: [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d
> > Posted-Interrupts
> >
> > > From: Wu, Feng
> > > Sent: Wednesday, March 25, 2015 8:32 PM
> > >
> > > This patch adds a global vector which is used to wake up
> > > the blocked vCPU when an interrupt is being posted to it.
> > >
> > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
> > > ---
> > >  xen/arch/x86/hvm/vmx/vmx.c        | 33
> > > +++++++++++++++++++++++++++++++++
> > >  xen/include/asm-x86/hvm/hvm.h     |  1 +
> > >  xen/include/asm-x86/hvm/vmx/vmx.h |  3 +++
> > >  xen/include/xen/sched.h           |  2 ++
> > >  4 files changed, 39 insertions(+)
> > >
> > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > > index ff5544d..b2b4c26 100644
> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -89,6 +89,7 @@ DEFINE_PER_CPU(struct list_head,
> > > blocked_vcpu_on_cpu);
> > >  DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > >
> > >  uint8_t __read_mostly posted_intr_vector;
> > > +uint8_t __read_mostly pi_wakeup_vector;
> > >
> > >  static int vmx_domain_initialise(struct domain *d)
> > >  {
> > > @@ -131,6 +132,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> > >      if ( v->vcpu_id == 0 )
> > >          v->arch.user_regs.eax = 1;
> > >
> > > +    INIT_LIST_HEAD(&v->blocked_vcpu_list);
> > > +
> > >      return 0;
> > >  }
> > >
> > > @@ -1834,11 +1837,19 @@ const struct hvm_function_table * __init
> > > start_vmx(void)
> > >      }
> > >
> > >      if ( cpu_has_vmx_posted_intr_processing )
> > > +    {
> > >          alloc_direct_apic_vector(&posted_intr_vector,
> > > event_check_interrupt);
> > > +
> > > +        if ( iommu_intpost )
> > > +            alloc_direct_apic_vector(&pi_wakeup_vector,
> > > pi_wakeup_interrupt);
> > > +        else
> > > +            vmx_function_table.pi_desc_update = NULL;
> > > +    }
> >
> > just style issue. Above conditional logic looks not intuitive to me.
> > usually we have:
> > 	if ( iommu_intpost )
> > 		vmx_function_table.pi_desc_update = func;
> > 	else
> > 		vmx_function_table.pi_desc_update = NULL;
> >
> > suppose you will register callback in later patch. then better to
> > move the NULL one there too. Putting it here doesn't meet the
> > normal if...else implications. :-)
> 
> Your suggestion is good. Here is my thinking about this code fragment:
> 
> This is the place where the notification event handler is registered, so it
> is better to register the wakeup event handler for VT-d PI here as well. Just
> like other members of vmx_function_table, such as deliver_posted_intr and
> sync_pir_to_irr, pi_desc_update is statically initialized to
> 'vmx_pi_desc_update' in the definition of vmx_function_table. So do you have
> any ideas on how to do this gracefully?
> 

I don't quite see the problem. If the point is to register the callback here,
then you can register it here w/ intpost and set it to NULL w/o intpost. Or if
there is some dependency so that you must do the registration later, then do it
later. I just don't understand why you only register a NULL for no-intpost here :-)
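
To illustrate the if/else shape meant here, a minimal stand-alone sketch.
Everything with a _sketch suffix, and register_pi_hook() itself, is a
hypothetical stand-in and not the real Xen definition; only the idea of
setting the hook and its NULL counterpart in one place is taken from the
discussion:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Xen structures; not the real definitions. */
struct vcpu;

struct hvm_function_table_sketch {
    void (*pi_desc_update)(struct vcpu *v, int new_state);
};

/* Placeholder handler; the real one would update the PI descriptor. */
static void vmx_pi_desc_update_sketch(struct vcpu *v, int new_state)
{
    (void)v;
    (void)new_state;
}

static struct hvm_function_table_sketch vmx_function_table_sketch;

/*
 * Decide the hook in a single if/else: the real handler w/ intpost,
 * NULL w/o intpost (or w/o CPU-side posted-interrupt support at all).
 */
static void register_pi_hook(bool has_posted_intr, bool iommu_intpost)
{
    if ( has_posted_intr && iommu_intpost )
        vmx_function_table_sketch.pi_desc_update = vmx_pi_desc_update_sketch;
    else
        vmx_function_table_sketch.pi_desc_update = NULL;
}
```

Whether this lives in start_vmx() or in a later patch is a separate question;
the sketch only shows both branches assigning the same member.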

> 
> >
> > >      else
> > >      {
> > >          vmx_function_table.deliver_posted_intr = NULL;
> > >          vmx_function_table.sync_pir_to_irr = NULL;
> > > +        vmx_function_table.pi_desc_update = NULL;
> > >      }
> > >
> > >      if ( cpu_has_vmx_ept
> > > @@ -3255,6 +3266,28 @@ void vmx_vmenter_helper(const struct
> > > cpu_user_regs *regs)
> > >  }
> > >
> > >  /*
> > > + * Handle VT-d posted-interrupt when VCPU is blocked.
> > > + */
> > > +void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > > +{
> > > +    struct vcpu *v;
> > > +    int cpu = smp_processor_id();
> > > +
> > > +    spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > +    list_for_each_entry(v, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > > +                    blocked_vcpu_list) {
> > > +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > > +
> > > +        if ( pi_test_on(pi_desc) == 1 )
> > > +            tasklet_schedule(&v->vcpu_wakeup_tasklet);
> >
> > why can't we directly call vcpu_unblock here?
> 
> Please see the following scenario if we use vcpu_unblock directly here:
> 
> pi_wakeup_interrupt() (blocked_vcpu_on_cpu_lock is held) -->
> vcpu_unblock() --> vcpu_wake() --> vcpu_runstate_change() -->
> vmx_pi_desc_update() (In this function we may need to acquire
> blocked_vcpu_on_cpu_lock, which will cause a deadlock.)
> 

yes, it makes sense.
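
For the record, the lock-ordering problem can be modelled in a few lines of
plain C. This is only a sketch under stated assumptions: the lock is a
non-recursive flag, all names (vcpu_sketch, list_lock, unblock,
wakeup_handler) are hypothetical, and the deferral is done with a local
array rather than with Xen tasklets as in the patch. It shows why the wakeup
has to happen after the list lock is dropped:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VCPUS 8

/* Hypothetical model of a blocked vCPU; not the real struct vcpu. */
struct vcpu_sketch {
    bool on;        /* models the 'ON' bit of the PI descriptor */
    bool blocked;
    bool on_list;   /* member of the per-CPU blocked list */
};

/* Models a non-recursive spinlock protecting the blocked list. */
static bool list_lock_held;

static void list_lock(void)   { list_lock_held = true; }
static void list_unlock(void) { list_lock_held = false; }

/*
 * Models the vcpu_unblock() path, which ends up taking the list lock to
 * unlink the vCPU. Returns false where a real non-recursive lock would
 * self-deadlock.
 */
static bool unblock(struct vcpu_sketch *v)
{
    if ( list_lock_held )
        return false;
    list_lock();
    v->on_list = false;
    list_unlock();
    v->blocked = false;
    return true;
}

/* Collect candidates under the lock, wake them only after dropping it. */
static size_t wakeup_handler(struct vcpu_sketch *vcpus, size_t n)
{
    struct vcpu_sketch *pending[MAX_VCPUS];
    size_t i, nr = 0, woken = 0;

    list_lock();
    for ( i = 0; i < n && nr < MAX_VCPUS; i++ )
        if ( vcpus[i].on_list && vcpus[i].on )
            pending[nr++] = &vcpus[i];
    list_unlock();

    for ( i = 0; i < nr; i++ )
        if ( unblock(pending[i]) )
            woken++;

    return woken;
}
```

Calling unblock() while list_lock_held is set models the direct
vcpu_unblock() call from the handler and fails, which is the deadlock case.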

Thanks
Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling
  2015-04-08  8:53       ` Tian, Kevin
@ 2015-04-08 11:01         ` Wu, Feng
  2015-04-09  2:37           ` Tian, Kevin
  0 siblings, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-04-08 11:01 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wu, Feng, keir, JBeulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Wednesday, April 08, 2015 4:54 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> Subject: RE: [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU
> scheduling
> 
> > From: Wu, Feng
> > Sent: Thursday, April 02, 2015 4:40 PM
> >
> > > -----Original Message-----
> > > From: Tian, Kevin
> > > Sent: Thursday, April 02, 2015 2:25 PM
> > > To: Wu, Feng; xen-devel@lists.xen.org
> > > Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> > > Subject: RE: [RFC v1 13/15] Update Posted-Interrupts Descriptor during
> vCPU
> > > scheduling
> > >
> > > > From: Wu, Feng
> > > > Sent: Wednesday, March 25, 2015 8:32 PM
> > > >
> > > > The basic idea here is:
> > > > 1. When vCPU's state is RUNSTATE_running,
> > > >         - set 'NV' to 'Notification Vector'.
> > > > >         - Clear 'SN' to accept PI.
> > > >         - set 'NDST' to the right pCPU.
> > > > 2. When vCPU's state is RUNSTATE_blocked,
> > > >         - set 'NV' to 'Wake-up Vector', so we can wake up the
> > > >           related vCPU when posted-interrupt happens for it.
> > > > >         - Clear 'SN' to accept PI.
> > > > 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> > > >         - Set 'SN' to suppress non-urgent interrupts.
> > > > >           (Currently, we only support non-urgent interrupts)
> > > >         - Set 'NV' back to 'Notification Vector' if needed.
> > > >
> > > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > > ---
> > > >  xen/arch/x86/hvm/vmx/vmx.c | 108
> > > > +++++++++++++++++++++++++++++++++++++++++++++
> > > >  xen/common/schedule.c      |   3 ++
> > > >  2 files changed, 111 insertions(+)
> > > >
> > > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > > > index b30392c..6323bd6 100644
> > > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > > @@ -1710,6 +1710,113 @@ static void vmx_handle_eoi(u8 vector)
> > > >      __vmwrite(GUEST_INTR_STATUS, status);
> > > >  }
> > > >
> > > > +static void vmx_pi_desc_update(struct vcpu *v, int new_state)
> > > > +{
> > > > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > > > +    struct pi_desc old, new;
> > > > +    int old_state = v->runstate.state;
> > > > +    unsigned long flags;
> > > > +
> > > > +    if ( !iommu_intpost )
> > > > +        return;
> > > > +
> > > > +    switch ( new_state )
> > > > +    {
> > > > +    case RUNSTATE_runnable:
> > > > +    case RUNSTATE_offline:
> > > > +        /*
> > > > +         * We don't need to send notification event to a non-running
> > > > +         * vcpu, the interrupt information will be delivered to it before
> > > > +         * VM-ENTRY when the vcpu is scheduled to run next time.
> > > > +         */
> > > > +        pi_set_sn(pi_desc);
> > > > +
> > > > +        /*
> > > > +         * If the state is transferred from RUNSTATE_blocked,
> > > > > +         * we should set 'NV' field back to posted_intr_vector,
> > > > +         * so the Posted-Interrupts can be delivered to the vCPU
> > > > +         * by VT-d HW after it is scheduled to run.
> > > > +         */
> > > > +        if ( old_state == RUNSTATE_blocked )
> > > > +        {
> > > > +            do
> > > > +            {
> > > > +                old.control = new.control = pi_desc->control;
> > > > +                new.nv = posted_intr_vector;
> > > > +            }
> > > > > +            while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > > > > +                    != old.control );
> > > > +
> > > > +           /*
> > > > +            * Delete the vCPU from the related wakeup queue
> > > > +            * if we are resuming from blocked state
> > > > +            */
> > > > +           spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > +                             v->processor), flags);
> > > > +           list_del(&v->blocked_vcpu_list);
> > > > +
> > spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > +                                  v->processor), flags);
> > > > +        }
> > > > +        break;
> > > > +
> > > > +    case RUNSTATE_blocked:
> > > > +        /*
> > > > +         * The vCPU is blocked on the wait queue.
> > > > +         * Store the blocked vCPU on the list of the
> > > > +         * vcpu->wakeup_cpu, which is the destination
> > > > +         * of the wake-up notification event.
> > > > +         */
> > > > +        spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > +                          v->processor), flags);
> > > > +        list_add_tail(&v->blocked_vcpu_list,
> > > > > +                      &per_cpu(blocked_vcpu_on_cpu, v->processor));
> > > > +        spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > +                               v->processor), flags);
> > > > +
> > > > +        do
> > > > +        {
> > > > +            old.control = new.control = pi_desc->control;
> > > > +
> > > > +            /*
> > > > +             * We should not block the vCPU if
> > > > +             * an interrupt is posted for it.
> > > > +             */
> > > > +
> > > > +            if ( pi_test_on(&old) == 1 )
> > > > +            {
> > > > +                tasklet_schedule(&v->vcpu_wakeup_tasklet);
> > > > +                return;
> > > > +            }
> > >
> > > so you also need to remove the vcpu from the blocked list, right?
> >
> > Yes, I need to remove the vcpu here. I thought it could be removed in
> > another place in this patch, but after thinking about it more I don't
> > believe it can. I think we can fix this issue in one of two ways:
> > #1) Just add the remove logic here.
> > or
> > #2) In function vcpu_runstate_change(), call 'hvm_funcs.pi_desc_update()'
> > after v->runstate.state = new_state; and pass the old_state to
> > 'hvm_funcs.pi_desc_update()'.
> > Then the code path is:
> >
> > tasklet_schedule(&v->vcpu_wakeup_tasklet) -> vcpu_unblock() ->
> > vcpu_wake() -> vcpu_runstate_change(v, RUNSTATE_runnable, NOW()) ->
> > vmx_pi_desc_update()
> >
> > So, the vCPU will be removed in 'case RUNSTATE_runnable:' of function
> > vmx_pi_desc_update().
> 
> either way is OK. It's difficult to judge it now based on above function names
> w/o re-reading the whole series. we can check it in your new version. :-)
> 
> >
> > >
> > > and how do you handle ON is set after above check? looks this is better
> > > handled behind cmpxchg loop...
> >
> > - If 'ON' is set before 'if ( pi_test_on(&old) == 1 )', we return.
> > - If 'ON' is not set before the check but is set after it,
> > 'cmpxchg(&pi_desc->control, old.control, new.control) != old.control'
> > evaluates to true, so we go around the loop again, and this time
> > 'if ( pi_test_on(&old) == 1 )' is true.
> >
> 
> but do we need to raise multiple tasklets in the loop? can they be combined
> into one notification after the loop?

I don't see why multiple tasklets would be raised here. The purpose of the check
is to make sure the vCPU is not blocked while interrupts are pending
for it. If that happens, we terminate the block operation. The 'ON' bit is cleared
and _all_ the pending interrupts in PIR are synced to vIRR before
VM-Entry. (Once 'ON' is set by VT-d HW, no further notification event is generated
when interrupts arrive; the HW only records them in the corresponding bits in PIR.)
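
To make the loop behaviour concrete, here is a stand-alone sketch using C11
atomics instead of Xen's cmpxchg(). The bit layout (ON at bit 0, NV at bits
16:23 of a 64-bit control word) and the names PI_ON, PI_NV_SHIFT, PI_NV_MASK
and pi_prepare_block() are assumptions made for illustration only:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the control word, for illustration only. */
#define PI_ON        ((uint64_t)1 << 0)
#define PI_NV_SHIFT  16
#define PI_NV_MASK   ((uint64_t)0xff << PI_NV_SHIFT)

/*
 * Try to switch 'NV' to the wakeup vector before blocking.
 * Returns false as soon as 'ON' is observed, in which case the caller
 * aborts the block and schedules the wakeup exactly once.
 */
static bool pi_prepare_block(_Atomic uint64_t *control, uint8_t wakeup_vector)
{
    uint64_t old = atomic_load(control);
    uint64_t new;

    do {
        /* 'ON' already set: an interrupt is pending, do not block. */
        if ( old & PI_ON )
            return false;

        new = (old & ~PI_NV_MASK) | ((uint64_t)wakeup_vector << PI_NV_SHIFT);

        /* On failure 'old' is reloaded, so 'ON' is re-checked next round. */
    } while ( !atomic_compare_exchange_weak(control, &old, new) );

    return true;
}
```

Because the function returns the first time it sees 'ON', the caller has a
single place to schedule the wakeup, however many times the loop retried.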

Thanks,
Feng

> 
> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
  2015-04-08  9:02       ` Tian, Kevin
@ 2015-04-08 11:14         ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-04-08 11:14 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wu, Feng, keir, JBeulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Wednesday, April 08, 2015 5:02 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> Subject: RE: [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d
> Posted-Interrupts
> 
> > From: Wu, Feng
> > Sent: Thursday, April 02, 2015 3:18 PM
> >
> > > -----Original Message-----
> > > From: Tian, Kevin
> > > Sent: Thursday, April 02, 2015 2:01 PM
> > > To: Wu, Feng; xen-devel@lists.xen.org
> > > Cc: JBeulich@suse.com; keir@xen.org; Zhang, Yang Z
> > > Subject: RE: [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d
> > > Posted-Interrupts
> > >
> > > > From: Wu, Feng
> > > > Sent: Wednesday, March 25, 2015 8:32 PM
> > > >
> > > > This patch adds a global vector which is used to wake up
> > > > the blocked vCPU when an interrupt is being posted to it.
> > > >
> > > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > > Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
> > > > ---
> > > >  xen/arch/x86/hvm/vmx/vmx.c        | 33
> > > > +++++++++++++++++++++++++++++++++
> > > >  xen/include/asm-x86/hvm/hvm.h     |  1 +
> > > >  xen/include/asm-x86/hvm/vmx/vmx.h |  3 +++
> > > >  xen/include/xen/sched.h           |  2 ++
> > > >  4 files changed, 39 insertions(+)
> > > >
> > > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > > > index ff5544d..b2b4c26 100644
> > > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > > @@ -89,6 +89,7 @@ DEFINE_PER_CPU(struct list_head,
> > > > blocked_vcpu_on_cpu);
> > > >  DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > > >
> > > >  uint8_t __read_mostly posted_intr_vector;
> > > > +uint8_t __read_mostly pi_wakeup_vector;
> > > >
> > > >  static int vmx_domain_initialise(struct domain *d)
> > > >  {
> > > > @@ -131,6 +132,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> > > >      if ( v->vcpu_id == 0 )
> > > >          v->arch.user_regs.eax = 1;
> > > >
> > > > +    INIT_LIST_HEAD(&v->blocked_vcpu_list);
> > > > +
> > > >      return 0;
> > > >  }
> > > >
> > > > @@ -1834,11 +1837,19 @@ const struct hvm_function_table * __init
> > > > start_vmx(void)
> > > >      }
> > > >
> > > >      if ( cpu_has_vmx_posted_intr_processing )
> > > > +    {
> > > >          alloc_direct_apic_vector(&posted_intr_vector,
> > > > event_check_interrupt);
> > > > +
> > > > +        if ( iommu_intpost )
> > > > +            alloc_direct_apic_vector(&pi_wakeup_vector,
> > > > pi_wakeup_interrupt);
> > > > +        else
> > > > +            vmx_function_table.pi_desc_update = NULL;
> > > > +    }
> > >
> > > just style issue. Above conditional logic looks not intuitive to me.
> > > usually we have:
> > > 	if ( iommu_intpost )
> > > 		vmx_function_table.pi_desc_update = func;
> > > 	else
> > > 		vmx_function_table.pi_desc_update = NULL;
> > >
> > > suppose you will register callback in later patch. then better to
> > > move the NULL one there too. Putting it here doesn't meet the
> > > normal if...else implications. :-)
> >
> > Your suggestion is good. Here is my thinking about this code fragment:
> >
> > This is the place where the notification event handler is registered, so it
> > is better to register the wakeup event handler for VT-d PI here as well. Just
> > like other members of vmx_function_table, such as deliver_posted_intr and
> > sync_pir_to_irr, pi_desc_update is statically initialized to
> > 'vmx_pi_desc_update' in the definition of vmx_function_table. So do you have
> > any ideas on how to do this gracefully?
> >
> 
> I didn't see the problem exactly. If the point is to register callback here,
> then you can register it here w/ intpost and NULL w/o intpost. or if there
> is some dependency so you must do registration later, then do it later.
> I just don't understand why you register a NULL for no-intpost only here :-)

Thanks for the comments, I totally agree with you. I should have thought about
this more beforehand. However, I just followed the existing code around it, such as
vmx_function_table.deliver_posted_intr = NULL and
vmx_function_table.sync_pir_to_irr = NULL.

Anyway, I will find a better way to handle it in the next post.

Thanks,
Feng

> 
> >
> > >
> > > >      else
> > > >      {
> > > >          vmx_function_table.deliver_posted_intr = NULL;
> > > >          vmx_function_table.sync_pir_to_irr = NULL;
> > > > +        vmx_function_table.pi_desc_update = NULL;
> > > >      }
> > > >
> > > >      if ( cpu_has_vmx_ept
> > > > @@ -3255,6 +3266,28 @@ void vmx_vmenter_helper(const struct
> > > > cpu_user_regs *regs)
> > > >  }
> > > >
> > > >  /*
> > > > + * Handle VT-d posted-interrupt when VCPU is blocked.
> > > > + */
> > > > +void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > > > +{
> > > > +    struct vcpu *v;
> > > > +    int cpu = smp_processor_id();
> > > > +
> > > > +    spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > +    list_for_each_entry(v, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > > > +                    blocked_vcpu_list) {
> > > > +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > > > +
> > > > +        if ( pi_test_on(pi_desc) == 1 )
> > > > +            tasklet_schedule(&v->vcpu_wakeup_tasklet);
> > >
> > > why can't we directly call vcpu_unblock here?
> >
> > Please see the following scenario if we use vcpu_unblock directly here:
> >
> > pi_wakeup_interrupt() (blocked_vcpu_on_cpu_lock is held) -->
> > vcpu_unblock() --> vcpu_wake() --> vcpu_runstate_change() -->
> > vmx_pi_desc_update() (In this function we may need to acquire
> > blocked_vcpu_on_cpu_lock, which will cause a deadlock.)
> >
> 
> yes, it makes sense.
> 
> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling
  2015-04-08 11:01         ` Wu, Feng
@ 2015-04-09  2:37           ` Tian, Kevin
  0 siblings, 0 replies; 101+ messages in thread
From: Tian, Kevin @ 2015-04-09  2:37 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z, keir, JBeulich

> From: Wu, Feng
> Sent: Wednesday, April 08, 2015 7:02 PM
> > > >
> > > > and how do you handle ON is set after above check? looks this is better
> > > > handled behind cmpxchg loop...
> > >
> > > - If 'ON' is set before 'if ( pi_test_on(&old) == 1 )', we return.
> > > - If 'ON' is not set before the check but is set after it,
> > > 'cmpxchg(&pi_desc->control, old.control, new.control) != old.control'
> > > evaluates to true, so we go around the loop again, and this time
> > > 'if ( pi_test_on(&old) == 1 )' is true.
> > >
> >
> > but do we need to raise multiple tasklets in the loop? can they be combined
> > into one notification after the loop?
> 
> I don't see why multiple tasklets would be raised here. The purpose of the
> check is to make sure the vCPU is not blocked while interrupts are pending
> for it. If that happens, we terminate the block operation. The 'ON' bit is
> cleared and _all_ the pending interrupts in PIR are synced to vIRR before
> VM-Entry. (Once 'ON' is set by VT-d HW, no further notification event is
> generated when interrupts arrive; the HW only records them in the
> corresponding bits in PIR.)
> 

yes you're right. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 00/15] Add VT-d Posted-Interrupts support
  2015-04-01 13:21 ` Wu, Feng
@ 2015-04-13 12:12   ` Jan Beulich
  2015-04-13 23:38     ` Wu, Feng
  2015-04-24 17:50     ` Wu, Feng
  0 siblings, 2 replies; 101+ messages in thread
From: Jan Beulich @ 2015-04-13 12:12 UTC (permalink / raw)
  To: Feng Wu; +Cc: Yang Z Zhang, Kevin Tian, keir, xen-devel

>>> On 01.04.15 at 15:21, <feng.wu@intel.com> wrote:
> Hi Jan,
> 
> Any more comments about this series? Thanks a lot!

I think it makes little sense to wait for my review before posting v2
with issues addressed which others have pointed out. I just
returned from vacation, and would have time to look at v1 next
week at the earliest (with the week after being occupied by travel).

Jan

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 00/15] Add VT-d Posted-Interrupts support
  2015-04-13 12:12   ` Jan Beulich
@ 2015-04-13 23:38     ` Wu, Feng
  2015-04-24 17:50     ` Wu, Feng
  1 sibling, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-04-13 23:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, April 13, 2015 8:13 PM
> To: Wu, Feng
> Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; keir@xen.org
> Subject: RE: [RFC v1 00/15] Add VT-d Posted-Interrupts support
> 
> >>> On 01.04.15 at 15:21, <feng.wu@intel.com> wrote:
> > Hi Jan,
> >
> > Any more comments about this series? Thanks a lot!
> 
> I think it makes little sense to wait for my review before posting v2
> with issues addressed which others have pointed out. I just
> returned from vacation, and would have time to look at v1 next
> week the earliest (with the week after being occupied by travel).

Thanks a lot, Jan! Any comments are appreciated!

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 00/15] Add VT-d Posted-Interrupts support
  2015-04-13 12:12   ` Jan Beulich
  2015-04-13 23:38     ` Wu, Feng
@ 2015-04-24 17:50     ` Wu, Feng
  2015-04-27 23:40       ` Jan Beulich
  1 sibling, 1 reply; 101+ messages in thread
From: Wu, Feng @ 2015-04-24 17:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, keir, xen-devel

Ping..

Thanks,
Feng

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, April 13, 2015 5:13 AM
> To: Wu, Feng
> Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; keir@xen.org
> Subject: RE: [RFC v1 00/15] Add VT-d Posted-Interrupts support
> 
> >>> On 01.04.15 at 15:21, <feng.wu@intel.com> wrote:
> > Hi Jan,
> >
> > Any more comments about this series? Thanks a lot!
> 
> I think it makes little sense to wait for my review before posting v2
> with issues addressed which others have pointed out. I just
> returned from vacation, and would have time to look at v1 next
> week the earliest (with the week after being occupied by travel).
> 
> Jan

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 00/15] Add VT-d Posted-Interrupts support
  2015-04-24 17:50     ` Wu, Feng
@ 2015-04-27 23:40       ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2015-04-27 23:40 UTC (permalink / raw)
  To: feng.wu; +Cc: yang.z.zhang, kevin.tian, keir, xen-devel

>>> "Wu, Feng" <feng.wu@intel.com> 04/24/15 7:50 PM >>>
>Ping..

I'm confused - I said "it makes little sense to wait", i.e. go ahead and post v2
without waiting for me.

Jan

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, April 13, 2015 5:13 AM
> To: Wu, Feng
> Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; keir@xen.org
> Subject: RE: [RFC v1 00/15] Add VT-d Posted-Interrupts support
> 
> >>> On 01.04.15 at 15:21, <feng.wu@intel.com> wrote:
> > Hi Jan,
> >
> > Any more comments about this series? Thanks a lot!
> 
> I think it makes little sense to wait for my review before posting v2
> with issues addressed which others have pointed out. I just
> returned from vacation, and would have time to look at v1 next
> week the earliest (with the week after being occupied by travel).
> 
> Jan

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-03-26 19:29   ` Konrad Rzeszutek Wilk
  2015-03-27  1:45     ` Wu, Feng
@ 2015-05-04  5:32     ` Wu, Feng
  2015-05-04  8:10       ` Jan Beulich
  2015-05-04  8:36       ` Andrew Cooper
  1 sibling, 2 replies; 101+ messages in thread
From: Wu, Feng @ 2015-05-04  5:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Wu, Feng, xen-devel, JBeulich, Zhang, Yang Z, keir



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Friday, March 27, 2015 3:29 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Zhang, Yang Z; Tian, Kevin; keir@xen.org;
> JBeulich@suse.com
> Subject: Re: [Xen-devel] [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts
> Descriptor
> 
> On Wed, Mar 25, 2015 at 08:31:47PM +0800, Feng Wu wrote:
> > This patch initializes the VT-d Posted-interrupt Descriptor.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vmcs.c       |  3 +++
> >  xen/include/asm-x86/hvm/vmx/vmx.h | 21 ++++++++++++++++++++-
> >  2 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index d614638..942f4b7 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -1004,6 +1004,9 @@ static int construct_vmcs(struct vcpu *v)
> >
> >      if ( cpu_has_vmx_posted_intr_processing )
> >      {
> > +        if ( iommu_intpost == 1 )
> 
> if ( iommu_intpost )
> 	.. ?
> > +            pi_desc_init(v);
> > +
> >          __vmwrite(PI_DESC_ADDR,
> virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
> >          __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR,
> posted_intr_vector);
> >      }
> > diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
> b/xen/include/asm-x86/hvm/vmx/vmx.h
> > index ecc5e17..3cd75eb 100644
> > --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> > @@ -28,6 +28,9 @@
> >  #include <asm/hvm/support.h>
> >  #include <asm/hvm/trace.h>
> >  #include <asm/hvm/vmx/vmcs.h>
> > +#include <asm/apic.h>
> > +
> > +extern uint8_t posted_intr_vector;
> >
> >  typedef union {
> >      struct {
> > @@ -146,6 +149,23 @@ static inline unsigned long pi_get_pir(struct pi_desc
> *pi_desc, int group)
> >      return xchg(&pi_desc->pir[group], 0);
> >  }
> >
> > +static inline void pi_desc_init(struct vcpu *v)
> > +{
> > +    uint32_t dest;
> > +
> > +    pi_clear_sn(&v->arch.hvm_vmx.pi_desc);
> > +    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
> > +
> > +    /* Physical mode for Notificaiton Event */
> 
> s/Notificaiton/Notification/
> > +    v->arch.hvm_vmx.pi_desc.ndm = 0;
> > +    dest = cpu_physical_id(v->processor);
> > +
> > +    if ( x2apic_enabled )
> > +        v->arch.hvm_vmx.pi_desc.ndst = dest;
> > +    else
> > +        v->arch.hvm_vmx.pi_desc.ndst = (dest << 8) & 0xFF00;
> 
> Surely there are some macros for that?

I found some macros defined in xen/include/asm-x86/apicdef.h, but since
the format here is not a common one, none of them can be used for it.

In the above case (the 'else' branch), 'dest' occupies bits 8:15 of the 'ndst' field.

Either we add a macro to that file (which is a little strange, since the format is not common),
or we keep the current solution. Any ideas?

Thanks,
Feng

> > +}
> > +
> >  /*
> >   * Exit Reasons
> >   */
> > @@ -265,7 +285,6 @@ static inline unsigned long pi_get_pir(struct pi_desc
> *pi_desc, int group)
> >  #define MODRM_EAX_ECX   ".byte 0xc1\n" /* EAX, ECX */
> >
> >  extern u64 vmx_ept_vpid_cap;
> > -extern uint8_t posted_intr_vector;
> >
> >  #define cpu_has_vmx_ept_exec_only_supported        \
> >      (vmx_ept_vpid_cap & VMX_EPT_EXEC_ONLY_SUPPORTED)
> > --
> > 2.1.0

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-05-04  5:32     ` Wu, Feng
@ 2015-05-04  8:10       ` Jan Beulich
  2015-05-04  8:36       ` Andrew Cooper
  1 sibling, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2015-05-04  8:10 UTC (permalink / raw)
  To: Feng Wu; +Cc: Yang Z Zhang, Kevin Tian, keir, xen-devel

>>> On 04.05.15 at 07:32, <feng.wu@intel.com> wrote:
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
>> Sent: Friday, March 27, 2015 3:29 AM
>> On Wed, Mar 25, 2015 at 08:31:47PM +0800, Feng Wu wrote:
>> > +    v->arch.hvm_vmx.pi_desc.ndm = 0;
>> > +    dest = cpu_physical_id(v->processor);
>> > +
>> > +    if ( x2apic_enabled )
>> > +        v->arch.hvm_vmx.pi_desc.ndst = dest;
>> > +    else
>> > +        v->arch.hvm_vmx.pi_desc.ndst = (dest << 8) & 0xFF00;
>> 
>> Surely there are some macros for that?
> 
> I find some macros defined in xen/include/asm-x86/apicdef.h, but since
> it is not a common format here, I cannot find one which can be used for it.
> 
> In the above case (the 'else' branch), 'dest' will occupy bit 8:15 of 'ndst' 
> field.
> 
> Either we add a macro in the file (it is a little strange, since the format 
> is not common),
> or we remain the current solution. Any ideas?

Adding a macro if really no suitable one exists is the minimum -
recently I went through and replaced quite a few of such hard-
coded numbers with something more understandable, and hence
you shouldn't introduce new instances of such.

Jan

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-05-04  5:32     ` Wu, Feng
  2015-05-04  8:10       ` Jan Beulich
@ 2015-05-04  8:36       ` Andrew Cooper
  2015-05-04  9:07         ` Wu, Feng
  1 sibling, 1 reply; 101+ messages in thread
From: Andrew Cooper @ 2015-05-04  8:36 UTC (permalink / raw)
  To: Wu, Feng, Konrad Rzeszutek Wilk
  Cc: Zhang, Yang Z, Tian, Kevin, keir, JBeulich, xen-devel

On 04/05/2015 06:32, Wu, Feng wrote:
>
>>> +    v->arch.hvm_vmx.pi_desc.ndm = 0;
>>> +    dest = cpu_physical_id(v->processor);
>>> +
>>> +    if ( x2apic_enabled )
>>> +        v->arch.hvm_vmx.pi_desc.ndst = dest;
>>> +    else
>>> +        v->arch.hvm_vmx.pi_desc.ndst = (dest << 8) & 0xFF00;
>> Surely there are some macros for that?
> I find some macros defined in xen/include/asm-x86/apicdef.h, but since
> it is not a common format here, I cannot find one which can be used for it.
>
> In the above case (the 'else' branch), 'dest' will occupy bit 8:15 of 'ndst' field.
>
> Either we add a macro in the file (it is a little strange, since the format is not common),
> or we remain the current solution. Any ideas?

MASK_INSR() is what you are looking for, but you will need to provide a
sensible name for 0xFF00.

~Andrew

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-05-04  8:36       ` Andrew Cooper
@ 2015-05-04  9:07         ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2015-05-04  9:07 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Wu, Feng, xen-devel, JBeulich, Zhang, Yang Z, keir



> -----Original Message-----
> From: Andrew Cooper [mailto:amc96@hermes.cam.ac.uk] On Behalf Of
> Andrew Cooper
> Sent: Monday, May 04, 2015 4:36 PM
> To: Wu, Feng; Konrad Rzeszutek Wilk
> Cc: Tian, Kevin; xen-devel@lists.xen.org; JBeulich@suse.com; Zhang, Yang Z;
> keir@xen.org
> Subject: Re: [Xen-devel] [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts
> Descriptor
> 
> On 04/05/2015 06:32, Wu, Feng wrote:
> >
> >>> +    v->arch.hvm_vmx.pi_desc.ndm = 0;
> >>> +    dest = cpu_physical_id(v->processor);
> >>> +
> >>> +    if ( x2apic_enabled )
> >>> +        v->arch.hvm_vmx.pi_desc.ndst = dest;
> >>> +    else
> >>> +        v->arch.hvm_vmx.pi_desc.ndst = (dest << 8) & 0xFF00;
> >> Surely there are some macros for that?
> > I find some macros defined in xen/include/asm-x86/apicdef.h, but since
> > it is not a common format here, I cannot find one which can be used for it.
> >
> > In the above case (the 'else' branch), 'dest' will occupy bit 8:15 of 'ndst' field.
> >
> > Either we add a macro in the file (it is a little strange, since the format is not
> common),
> > or we remain the current solution. Any ideas?
> 
> MASK_INSR() is what you are looking for, but you will need to provide a
> sensible name for 0xFF00.
> 
> ~Andrew

Yes, that is the right one for me. Thanks a lot, Andrew!

Thanks,
Feng
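
For reference, a minimal sketch of what Andrew's suggestion could look like.
The MASK_INSR() definition below is reproduced from memory of the Xen headers
and should be checked against the tree; the mask name PI_XAPIC_NDST_MASK and
the helper pi_ndst_value() are hypothetical names chosen only for this
illustration, not something taken from the patch.

```c
#include <stdint.h>

/*
 * Insert value v into the bit-field selected by mask m.
 * (m & -m) isolates the lowest set bit of m, so multiplying by it
 * shifts v up to the start of the field; the final '& m' truncates
 * anything that does not fit.
 * Assumption: this mirrors Xen's MASK_INSR(); verify against the tree.
 */
#define MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m))

/* Hypothetical name for the 0xFF00 literal: xAPIC destination in bits 15:8. */
#define PI_XAPIC_NDST_MASK 0xFF00u

/* Hypothetical helper computing the value to store in pi_desc.ndst. */
static inline uint32_t pi_ndst_value(uint32_t dest, int x2apic_enabled)
{
    if ( x2apic_enabled )
        return dest;                              /* full 32-bit APIC ID */

    return MASK_INSR(dest, PI_XAPIC_NDST_MASK);   /* same as (dest << 8) & 0xFF00 */
}
```

With a named mask, the shift amount no longer has to be spelled out next to
the mask, so the two cannot drift apart if the field layout is ever changed.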

^ permalink raw reply	[flat|nested] 101+ messages in thread

end of thread, other threads:[~2015-05-04  9:07 UTC | newest]

Thread overview: 101+ messages
2015-03-25 12:31 [RFC v1 00/15] Add VT-d Posted-Interrupts support Feng Wu
2015-03-25 12:31 ` [RFC v1 01/15] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
2015-03-26 17:39   ` Andrew Cooper
2015-03-27  4:46     ` Wu, Feng
2015-03-27  9:55       ` Andrew Cooper
2015-03-27  9:52     ` Jan Beulich
2015-03-25 12:31 ` [RFC v1 02/15] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
2015-03-26 18:12   ` Andrew Cooper
2015-03-27  1:21     ` Wu, Feng
2015-03-27 10:06       ` Andrew Cooper
2015-03-27 13:41         ` Wu, Feng
2015-03-25 12:31 ` [RFC v1 03/15] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
2015-03-26 18:37   ` Andrew Cooper
2015-03-27  1:32     ` Wu, Feng
2015-03-25 12:31 ` [RFC v1 04/15] vmx: Add some helper functions for Posted-Interrupts Feng Wu
2015-03-26 18:44   ` Andrew Cooper
2015-03-25 12:31 ` [RFC v1 05/15] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
2015-03-26 18:53   ` Andrew Cooper
2015-03-27  1:45     ` Wu, Feng
2015-03-26 19:29   ` Konrad Rzeszutek Wilk
2015-03-27  1:45     ` Wu, Feng
2015-05-04  5:32     ` Wu, Feng
2015-05-04  8:10       ` Jan Beulich
2015-05-04  8:36       ` Andrew Cooper
2015-05-04  9:07         ` Wu, Feng
2015-03-25 12:31 ` [RFC v1 06/15] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
2015-03-26 19:00   ` Andrew Cooper
2015-03-27  1:53     ` Wu, Feng
2015-03-27  9:58       ` Jan Beulich
2015-04-02  6:32         ` Tian, Kevin
2015-03-25 12:31 ` [RFC v1 07/15] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
2015-03-26 19:17   ` Andrew Cooper
2015-03-27  2:13     ` Wu, Feng
2015-03-27 10:02       ` Jan Beulich
2015-03-27  4:52     ` Wu, Feng
2015-03-26 19:36   ` Konrad Rzeszutek Wilk
2015-03-27  1:59     ` Wu, Feng
2015-04-02  5:34   ` Tian, Kevin
2015-04-02  6:02     ` Wu, Feng
2015-03-25 12:31 ` [RFC v1 08/15] Update IRTE according to guest interrupt config changes Feng Wu
2015-03-26 19:46   ` Konrad Rzeszutek Wilk
2015-03-27  5:45     ` Wu, Feng
2015-03-26 19:59   ` Andrew Cooper
2015-03-27  5:49     ` Wu, Feng
2015-03-27 11:31       ` Andrew Cooper
2015-04-02  5:52   ` Tian, Kevin
2015-04-02  6:20     ` Wu, Feng
2015-04-02  6:49       ` Tian, Kevin
2015-04-02  8:02         ` Wu, Feng
2015-04-03  8:29           ` Tian, Kevin
2015-03-25 12:31 ` [RFC v1 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU Feng Wu
2015-04-02  5:53   ` Tian, Kevin
2015-04-02  7:20     ` Wu, Feng
2015-03-25 12:31 ` [RFC v1 10/15] vmx: Define two per-cpu variants Feng Wu
2015-03-26 19:59   ` Andrew Cooper
2015-04-02  5:54   ` Tian, Kevin
2015-04-02  6:24     ` Wu, Feng
2015-03-25 12:31 ` [RFC v1 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts Feng Wu
2015-03-26 20:07   ` Andrew Cooper
2015-04-02  6:00   ` Tian, Kevin
2015-04-02  7:18     ` Wu, Feng
2015-04-08  9:02       ` Tian, Kevin
2015-04-08 11:14         ` Wu, Feng
2015-03-25 12:31 ` [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running Feng Wu
2015-03-25 14:14   ` Zhang, Yang Z
2015-03-27  4:40     ` Wu, Feng
2015-03-27  4:44       ` Zhang, Yang Z
2015-03-27  4:57         ` Wu, Feng
2015-04-02  6:08           ` Tian, Kevin
2015-04-02  7:21             ` Wu, Feng
2015-04-02 19:15             ` Konrad Rzeszutek Wilk
2015-04-03  2:00               ` Wu, Feng
2015-04-03 13:36                 ` Konrad Rzeszutek Wilk
2015-04-07  0:35                   ` Wu, Feng
2015-03-26 19:57   ` Konrad Rzeszutek Wilk
2015-03-27  3:06     ` Wu, Feng
2015-03-25 12:31 ` [RFC v1 13/15] Update Posted-Interrupts Descriptor during vCPU scheduling Feng Wu
2015-03-26 20:16   ` Andrew Cooper
2015-03-27  2:59     ` Wu, Feng
2015-04-02  6:24   ` Tian, Kevin
2015-04-02  8:39     ` Wu, Feng
2015-04-08  8:53       ` Tian, Kevin
2015-04-08 11:01         ` Wu, Feng
2015-04-09  2:37           ` Tian, Kevin
2015-03-25 12:31 ` [RFC v1 14/15] Suppress posting interrupts when 'SN' is set Feng Wu
2015-03-26 20:34   ` Andrew Cooper
2015-03-27  3:00     ` Wu, Feng
2015-03-27 12:06       ` Andrew Cooper
2015-03-27 13:45         ` Wu, Feng
2015-03-27 13:49           ` Andrew Cooper
2015-03-30  2:11             ` Wu, Feng
2015-03-30 10:11               ` Andrew Cooper
2015-03-25 12:31 ` [RFC v1 15/15] Add a command line parameter for VT-d posted-interrupts Feng Wu
2015-03-26 18:50 ` [RFC v1 00/15] Add VT-d Posted-Interrupts support Konrad Rzeszutek Wilk
2015-03-27  1:06   ` Wu, Feng
2015-03-27 14:44     ` Konrad Rzeszutek Wilk
2015-04-01 13:21 ` Wu, Feng
2015-04-13 12:12   ` Jan Beulich
2015-04-13 23:38     ` Wu, Feng
2015-04-24 17:50     ` Wu, Feng
2015-04-27 23:40       ` Jan Beulich
