All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v9 00/17] Add VT-d Posted-Interrupts support
@ 2015-11-03  8:43 Feng Wu
  2015-11-03  8:43 ` [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
                   ` (17 more replies)
  0 siblings, 18 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

You can find the VT-d Posted-Interrtups Spec. in the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

v9:
- [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design
    * Make the description more generic.

- [PATCH v9 02/17] Add cmpxchg16b support for x86-64
    * Make the *ptr operand an input and output.

- [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
    * Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare().
    * Add vmx_pi_state_change() and call it before VM Entry.

Feng Wu (17):
 r   VT-d Posted-intterrupt (PI) design
     Add cmpxchg16b support for x86-64
 ra  iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
 r   vt-d: VT-d Posted-Interrupts feature detection
 ra  vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
 r   vmx: Add some helper functions for Posted-Interrupts
 ra  vmx: Initialize VT-d Posted-Interrupts Descriptor
 r   vmx: Suppress posting interrupts when 'SN' is set
 r   VT-d: Remove pointless casts
  a  vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
 r   vt-d: Add API to update IRTE when VT-d PI is used
  a  x86: move some APIC related macros to apicdef.h
  a  Update IRTE according to guest interrupt config changes
  a  vmx: Properly handle notification event when vCPU is running
     vmx: VT-d posted-interrupt core logic handling
     VT-d: Dump the posted format IRTE
 ra  Add a command line parameter for VT-d posted-interrupts
 
 r = has been 'Reviewed-by'
 a = has been 'Acked-by'

 docs/misc/vtd-pi.txt                   | 329 +++++++++++++++++++++++++++++++++
 docs/misc/xen-command-line.markdown    |   9 +-
 xen/arch/x86/hvm/hvm.c                 |   6 +
 xen/arch/x86/hvm/vlapic.c              |   5 -
 xen/arch/x86/hvm/vmx/vmcs.c            |  24 +++
 xen/arch/x86/hvm/vmx/vmx.c             | 270 ++++++++++++++++++++++++++-
 xen/common/schedule.c                  |   7 +-
 xen/drivers/passthrough/io.c           | 123 +++++++++++-
 xen/drivers/passthrough/iommu.c        |  16 +-
 xen/drivers/passthrough/vtd/intremap.c | 212 ++++++++++++++++-----
 xen/drivers/passthrough/vtd/iommu.c    |  14 +-
 xen/drivers/passthrough/vtd/iommu.h    |  51 +++--
 xen/drivers/passthrough/vtd/utils.c    |  40 ++--
 xen/include/asm-arm/domain.h           |   2 +
 xen/include/asm-x86/apicdef.h          |   3 +
 xen/include/asm-x86/domain.h           |   2 +
 xen/include/asm-x86/hvm/hvm.h          |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h     |  24 ++-
 xen/include/asm-x86/hvm/vmx/vmx.h      |  27 +++
 xen/include/asm-x86/iommu.h            |   2 +
 xen/include/asm-x86/x86_64/system.h    |  33 ++++
 xen/include/xen/iommu.h                |   2 +-
 22 files changed, 1113 insertions(+), 90 deletions(-)
 create mode 100644 docs/misc/vtd-pi.txt

-- 
2.1.0

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-10 14:52   ` Dario Faggioli
  2015-11-24  6:34   ` Tian, Kevin
  2015-11-03  8:43 ` [PATCH v9 02/17] Add cmpxchg16b support for x86-64 Feng Wu
                   ` (16 subsequent siblings)
  17 siblings, 2 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Jan Beulich, Yang Zhang, Feng Wu

Add the design doc for VT-d PI.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Keir Fraser <keir@xen.org>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 docs/misc/vtd-pi.txt | 329 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 329 insertions(+)
 create mode 100644 docs/misc/vtd-pi.txt

diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
new file mode 100644
index 0000000..f9b4637
--- /dev/null
+++ b/docs/misc/vtd-pi.txt
@@ -0,0 +1,329 @@
+Authors: Feng Wu <feng.wu@intel.com>
+
+VT-d Posted-interrupt (PI) design for XEN
+
+Background
+==========
+With the development of virtualization, there are more and more device
+assignment requirements. However, today when a VM is running with
+assigned devices (such as, NIC), external interrupt handling for the assigned
+devices always needs VMM intervention.
+
+VT-d Posted-interrupt is a more enhanced method to handle interrupts
+in the virtualization environment. Interrupt posting is the process by
+which an interrupt request is recorded in a memory-resident
+posted-interrupt-descriptor structure by the root-complex, followed by
+an optional notification event issued to the CPU complex.
+
+With VT-d Posted-interrupt we can get the following advantages:
+- Direct delivery of external interrupts to running vCPUs without VMM
+intervention
+- Decrease the interrupt migration complexity. On vCPU migration, software
+can atomically co-migrate all interrupts targeting the migrating vCPU. For
+virtual machines with assigned devices, migrating a vCPU across pCPUs
+either incurs the overhead of forwarding interrupts in software (e.g. via VMM
+generated IPIs), or complexity to independently migrate each interrupt targeting
+the vCPU to the new pCPU. However, after enabling VT-d PI, the destination vCPU
+of an external interrupt from assigned devices is stored in the IRTE (i.e.
+Posted-interrupt Descriptor Address), when vCPU is migrated to another pCPU,
+we will set this new pCPU in the 'NDST' filed of Posted-interrupt descriptor, this
+make the interrupt migration automatic.
+
+Here is what Xen currently does for external interrupts from assigned devices:
+
+When a VM is running and an external interrupt from an assigned device occurs
+for it. VM-EXIT happens, then:
+
+vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci() -->
+raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
+
+softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
+
+dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() --> vmsi_deliver() -->
+vmsi_inj_irq() --> vlapic_set_irq()
+
+vlapic_set_irq() does the following things:
+1. If CPU-side posted-interrupt is supported, call vmx_deliver_posted_intr() to deliver
+the virtual interrupt via posted-interrupt infrastructure.
+2. Else if CPU-side posted-interrupt is not supported, set the related vIRR in vLAPIC
+page and call vcpu_kick() to kick the related vCPU. Before VM-Entry, vmx_intr_assist()
+will help to inject the interrupt to guests.
+
+However, after VT-d PI is supported, when a guest is running in non-root and an
+external interrupt from an assigned device occurs for it. No VM-Exit is needed,
+the guest can handle this totally in non-root mode, thus avoiding all the above
+code flow.
+
+Posted-interrupt Introduction
+========================
+There are two components to the Posted-interrupt architecture:
+Processor Support and Root-Complex Support
+
+- Processor Support
+Posted-interrupt processing is a feature by which a processor processes
+the virtual interrupts by recording them as pending on the virtual-APIC
+page.
+
+Posted-interrupt processing is enabled by setting the process posted
+interrupts VM-execution control. The processing is performed in response
+to the arrival of an interrupt with the posted-interrupt notification vector.
+In response to such an interrupt, the processor processes virtual interrupts
+recorded in a data structure called a posted-interrupt descriptor.
+
+More information about APICv and CPU-side Posted-interrupt, please refer
+to Chapter 29, and Section 29.6 in the Intel SDM:
+http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
+
+- Root-Complex Support
+Interrupt posting is the process by which an interrupt request (from IOAPIC
+or MSI/MSIx capable sources) is recorded in a memory-resident
+posted-interrupt-descriptor structure by the root-complex, followed by
+an optional notification event issued to the CPU complex. The interrupt
+request arriving at the root-complex carry the identity of the interrupt
+request source and a 'remapping-index'. The remapping-index is used to
+look-up an entry from the memory-resident interrupt-remap-table. Unlike
+interrupt-remapping, the interrupt-remap-table-entry for a posted-interrupt,
+specifies a virtual-vector and a pointer to the posted-interrupt descriptor.
+The virtual-vector specifies the vector of the interrupt to be recorded in
+the posted-interrupt descriptor. The posted-interrupt descriptor hosts storage
+for the virtual-vectors and contains the attributes of the notification event
+(interrupt) to be issued to the CPU complex to inform CPU/software about pending
+interrupts recorded in the posted-interrupt descriptor.
+
+More information about VT-d PI, please refer to
+http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
+
+Important Definitions
+==================
+There are some changes to IRTE and posted-interrupt descriptor after
+VT-d PI is introduced:
+IRTE: Interrupt Remapping Table Entry
+Posted-interrupt Descriptor Address: the address of the posted-interrupt descriptor
+Virtual Vector: the guest vector of the interrupt
+URG: indicates if the interrupt is urgent
+
+Posted-interrupt descriptor:
+The Posted Interrupt Descriptor hosts the following fields:
+Posted Interrupt Request (PIR): Provide storage for posting (recording) interrupts (one bit
+per vector, for up to 256 vectors).
+
+Outstanding Notification (ON): Indicate if there is a notification event outstanding (not
+processed by processor or software) for this Posted Interrupt Descriptor. When this field is 0,
+hardware modifies it from 0 to 1 when generating a notification event, and the entity receiving
+the notification event (processor or software) resets it as part of posted interrupt processing.
+
+Suppress Notification (SN): Indicate if a notification event is to be suppressed (not
+generated) for non-urgent interrupt requests (interrupts processed through an IRTE with
+URG=0).
+
+Notification Vector (NV): Specify the vector for notification event (interrupt).
+
+Notification Destination (NDST): Specify the physical APIC-ID of the destination logical
+processor for the notification event.
+
+Design Overview
+==============
+In this design, we will cover the following items:
+1. Add a variable to control whether enable VT-d posted-interrupt or not.
+2. VT-d PI feature detection.
+3. Extend posted-interrupt descriptor structure to cover VT-d PI specific items.
+4. Extend IRTE structure to support VT-d PI.
+5. Introduce a new global vector which is used for waking up the blocked vCPU.
+6. Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
+7. Update posted-interrupt descriptor during vCPU scheduling.
+8. How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
+9. New boot command line for Xen, which controls VT-d PI feature by user.
+10. Multicast/broadcast and lowest priority interrupts consideration.
+
+
+Implementation details
+===================
+- New variable to control VT-d PI
+
+Like variable 'iommu_intremap' for interrupt remapping, it is very straightforward
+to add a new one 'iommu_intpost' for posted-interrupt. 'iommu_intpost' is set
+only when interrupt remapping and VT-d posted-interrupt are both enabled.
+
+- VT-d PI feature detection.
+Bit 59 in VT-d Capability Register is used to report VT-d Posted-interrupt support.
+
+- Extend posted-interrupt descriptor structure to cover VT-d PI specific items.
+Here is the new structure for posted-interrupt descriptor:
+
+struct pi_desc {
+    DECLARE_BITMAP(pir, NR_VECTORS);
+    union {
+        struct
+        {
+        u16 on     : 1,  /* bit 256 - Outstanding Notification */
+            sn     : 1,  /* bit 257 - Suppress Notification */
+            rsvd_1 : 14; /* bit 271:258 - Reserved */
+        u8  nv;          /* bit 279:272 - Notification Vector */
+        u8  rsvd_2;      /* bit 287:280 - Reserved */
+        u32 ndst;        /* bit 319:288 - Notification Destination */
+        };
+        u64 control;
+    };
+    u32 rsvd[6];
+} __attribute__ ((aligned (64)));
+
+- Extend IRTE structure to support VT-d PI.
+
+Here is the new structure for IRTE:
+/* interrupt remap entry */
+struct iremap_entry {
+  union {
+    struct { u64 lo, hi; };
+    struct {
+        u16 p       : 1,
+            fpd     : 1,
+            dm      : 1,
+            rh      : 1,
+            tm      : 1,
+            dlm     : 3,
+            avail   : 4,
+            res_1   : 4;
+        u8  vector;
+        u8  res_2;
+        u32 dst;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_3   : 12;
+        u32 res_4   : 32;
+    } remap;
+    struct {
+        u16 p       : 1,
+            fpd     : 1,
+            res_1   : 6,
+            avail   : 4,
+            res_2   : 2,
+            urg     : 1,
+            im      : 1;
+        u8  vector;
+        u8  res_3;
+        u32 res_4   : 6,
+            pda_l   : 26;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_5   : 12;
+        u32 pda_h;
+    } post;
+  };
+};
+
+- Introduce a new global vector which is used to wake up the blocked vCPU.
+
+Currently, there is a global vector 'posted_intr_vector', which is used as the
+global notification vector for all vCPUs in the system. This vector is stored in
+VMCS and CPU considers it as a _special_ vector, uses it to notify the related
+pCPU when an interrupt is recorded in the posted-interrupt descriptor.
+
+This existing global vector is a _special_ vector to CPU, CPU handle it in a
+_special_ way compared to normal vectors, please refer to 29.6 in Intel SDM
+http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
+for more information about how CPU handles it.
+
+After having VT-d PI, VT-d engine can issue notification event when the
+assigned devices issue interrupts. We need add a new global vector to
+wakeup the blocked vCPU, please refer to later section in this design for
+how to use this new global vector.
+
+- Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
+After VT-d PI is introduced, the format of IRTE is changed as follows:
+	Descriptor Address: the address of the posted-interrupt descriptor
+	Virtual Vector: the guest vector of the interrupt
+	URG: indicates if the interrupt is urgent
+	Other fields continue to have the same meaning
+
+'Descriptor Address' tells the destination vCPU of this interrupt, since
+each vCPU has a dedicated posted-interrupt descriptor.
+
+'Virtual Vector' tells the guest vector of the interrupt.
+
+When guest changes the configuration of the interrupts, such as, the
+cpu affinity, or the vector, we need to update the associated IRTE accordingly.
+
+- Update posted-interrupt descriptor during vCPU scheduling
+
+The basic idea here is:
+1. When vCPU is running
+        - Set 'NV' to 'posted_intr_vector'.
+        - Clear 'SN' to accept posted-interrupts.
+        - Set 'NDST' to the pCPU on which the vCPU will be running.
+2. When vCPU is blocked
+        - Set 'NV' to ' pi_wakeup_vector ', so we can wake up the
+          related vCPU when posted-interrupt happens for it.
+          Please refer to the above section about the new global vector.
+        - Clear 'SN' to accept posted-interrupts
+3. When vCPU is preempted or sleeping
+        - Set 'SN' to suppress non-urgent interrupts
+          (Currently, we only support non-urgent interrupts)
+         When vCPU is preempted or sleep, it doesn't need to accept
+         posted-interrupt notification event since we don't change the behavior
+         of scheduler when the interrupt occurs, we still need wait for the next
+         scheduling of the vCPU. When external interrupts from assigned devices occur,
+         the interrupts are recorded in PIR, and will be synced to IRR before VM-Entry.
+        - Set 'NV' to 'posted_intr_vector'.
+
+- How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
+
+Here is the scenario for the usage of the new global vector:
+
+1. vCPU0 is running on pCPU0
+2. vCPU0 is blocked and vCPU1 is currently running on pCPU0
+3. An external interrupt from an assigned device occurs for vCPU0, if we
+still use 'posted_intr_vector' as the notification vector for vCPU0, the
+notification event for vCPU0 (the event will go to pCPU1) will be consumed
+by vCPU1 incorrectly (remember this is a special vector to CPU). The worst
+case is that vCPU0 will never be woken up again since the wakeup event
+for it is always consumed by other vCPUs incorrectly. So we need introduce
+another global vector, naming 'pi_wakeup_vector' to wake up the blocked vCPU.
+
+After using 'pi_wakeup_vector' for vCPU0, VT-d engine will issue notification
+event using this new vector. Since this new vector is not a SPECIAL one to CPU,
+it is just a normal vector. To CPU, it just receives an normal external interrupt,
+then we can get control in the handler of this new vector. In this case, hypervisor
+can do something in it, such as wakeup the blocked vCPU.
+
+Here are what we do for the blocked vCPU:
+1. Define a per-cpu list 'pi_blocked_vcpu', which stored the blocked
+vCPU on the pCPU.
+2. When the vCPU is going to block, insert the vCPU
+to the per-cpu list belonging to the pCPU it was running.
+3. When the vCPU is unblocked, remove the vCPU from the related pCPU list.
+
+In the handler of 'pi_wakeup_vector', we do:
+1. Get the physical CPU.
+2. Iterate the list 'pi_blocked_vcpu' of the current pCPU, if 'ON' is set,
+we unblock the associated vCPU.
+
+- New boot command line for Xen, which controls VT-d PI feature by user.
+
+Like 'intremap' for interrupt remapping, we add a new boot command line
+'intpost' for posted-interrupts.
+
+- Multicast/broadcast and lowest priority interrupts consideration.
+
+With VT-d PI, the destination vCPU information of an external interrupt
+from assigned devices is stored in IRTE, this makes the following
+consideration of the design:
+1. Multicast/broadcast interrupts cannot be posted.
+2. For lowest-priority interrupts, new Intel CPU/Chipset/root-complex
+(starting from Nehalem) ignore TPR value, and instead supported two other
+ways (configurable by BIOS) on how the handle lowest priority interrupts:
+	A) Round robin: In this method, the chipset simply delivers lowest priority
+interrupts in a round-robin manner across all the available logical CPUs. While
+this provides good load balancing, this was not the best thing to do always as
+interrupts from the same device (like NIC) will start running on all the CPUs
+thrashing caches and taking locks. This led to the next scheme.
+	B) Vector hashing: In this method, hardware would apply a hash function
+on the vector value in the interrupt request, and use that hash to pick a logical
+CPU to route the lowest priority interrupt. This way, a given vector always goes
+to the same logical CPU, avoiding the thrashing problem above.
+
+So, gist of above is that, lowest priority interrupts has never been delivered as
+"lowest priority" in physical hardware.
+
+Vector hashing is used in this design.
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 02/17] Add cmpxchg16b support for x86-64
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
  2015-11-03  8:43 ` [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-03  8:43 ` [PATCH v9 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Feng Wu, Jan Beulich, Andrew Cooper

This patch adds cmpxchg16b support for x86-64, so software
can perform 128-bit atomic write/read.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v9:
- Make the *ptr operand an input and output.

v8:
- Remove pointless cast when assigning 'new_low'
- properly parenthesize cmpxchg16b()

v7:
- Make the last two parameters of __cmpxchg16b() const
- Remove memory clobber
- Add run-time and build-build check in cmpxchg16b()
- Cast the last two parameter to void * when calling __cmpxchg16b()

v6:
- Fix a typo

v5:
- Change back the parameters of __cmpxchg16b() to __uint128_t *
- Remove pointless cast for 'ptr'
- Remove pointless parentheses
- Use A constraint for the output

v4:
- Use pointer as the parameter of __cmpxchg16b().
- Use gcc's __uint128_t built-in type
- Make the parameters of __cmpxchg16b() void *

v3:
- Newly added.

 xen/include/asm-x86/x86_64/system.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
index 662813a..7026c05 100644
--- a/xen/include/asm-x86/x86_64/system.h
+++ b/xen/include/asm-x86/x86_64/system.h
@@ -6,6 +6,39 @@
                                    (unsigned long)(n),sizeof(*(ptr))))
 
 /*
+ * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
+ * identical, store NEW in MEM.  Return the initial value in MEM.
+ * Success is indicated by comparing RETURN with OLD.
+ *
+ * This function can only be called when cpu_has_cx16 is true.
+ */
+
+static always_inline __uint128_t __cmpxchg16b(
+    volatile void *ptr, const __uint128_t *old, const __uint128_t *new)
+{
+    __uint128_t prev;
+    uint64_t new_high = *new >> 64;
+    uint64_t new_low = *new;
+
+    ASSERT(cpu_has_cx16);
+
+    asm volatile ( "lock; cmpxchg16b %1"
+                   : "=A" (prev), "+m" (*__xg(ptr))
+                   : "c" (new_high), "b" (new_low),
+                     "0" (*old) );
+
+    return prev;
+}
+
+#define cmpxchg16b(ptr, o, n) ({                           \
+    volatile void *_p = (ptr);                             \
+    ASSERT(!((unsigned long)_p & 0xf));                    \
+    BUILD_BUG_ON(sizeof(*(o)) != sizeof(__uint128_t));     \
+    BUILD_BUG_ON(sizeof(*(n)) != sizeof(__uint128_t));     \
+    __cmpxchg16b(_p, (void *)(o), (void *)(n));            \
+})
+
+/*
  * This function causes value _o to be changed to _n at location _p.
  * If this access causes a fault then we return 1, otherwise we return 0.
  * If no fault occurs then _o is updated to the value we saw at _p. If this
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
  2015-11-03  8:43 ` [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
  2015-11-03  8:43 ` [PATCH v9 02/17] Add cmpxchg16b support for x86-64 Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-03  8:43 ` [PATCH v9 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Kevin Tian, Feng Wu, Jan Beulich

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

This patch adds variable 'iommu_intpost' to control whether enable VT-d
posted-interrupt or not in the generic IOMMU code.

CC: Jan Beulich <jbeulich@suse.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v5:
- Remove the "if no intremap then no intpost" logic in parse_iommu_param(), which
  can be covered in iommu_setup()

v3:
- Remove pointless initializer for 'iommu_intpost'.
- Some adjustment for "if no intremap then no intpost" logic.
    * For parse_iommu_param(), move it to the end of the function,
      so we don't need to add the some logic when introduing the
      new kernel parameter 'intpost' in later patch.
    * Add this logic in iommu_setup() after iommu_hardware_setup()
      is called.

 xen/drivers/passthrough/iommu.c | 13 ++++++++++++-
 xen/include/xen/iommu.h         |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index fc7831e..36d5cc0 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -51,6 +51,14 @@ bool_t __read_mostly iommu_passthrough;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
+
+/*
+ * In the current implementation of VT-d posted interrupts, in some extreme
+ * cases, the per cpu list which saves the blocked vCPU will be very long,
+ * and this will affect the interrupt latency, so let this feature off by
+ * default until we find a good solution to resolve it.
+ */
+bool_t __read_mostly iommu_intpost;
 bool_t __read_mostly iommu_hap_pt_share = 1;
 bool_t __read_mostly iommu_debug;
 bool_t __read_mostly amd_iommu_perdev_intremap = 1;
@@ -307,6 +315,9 @@ int __init iommu_setup(void)
         panic("Couldn't enable %s and iommu=required/force",
               !iommu_enabled ? "IOMMU" : "Interrupt Remapping");
 
+    if ( !iommu_intremap )
+        iommu_intpost = 0;
+
     if ( !iommu_enabled )
     {
         iommu_snoop = 0;
@@ -374,7 +385,7 @@ void iommu_crash_shutdown(void)
     const struct iommu_ops *ops = iommu_get_ops();
     if ( iommu_enabled )
         ops->crash_shutdown();
-    iommu_enabled = iommu_intremap = 0;
+    iommu_enabled = iommu_intremap = iommu_intpost = 0;
 }
 
 int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 8f3a20e..1f5d04a 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -30,7 +30,7 @@
 extern bool_t iommu_enable, iommu_enabled;
 extern bool_t force_iommu, iommu_verbose;
 extern bool_t iommu_workaround_bios_bug, iommu_igfx, iommu_passthrough;
-extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
+extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
 extern bool_t iommu_hap_pt_share;
 extern bool_t iommu_debug;
 extern bool_t amd_iommu_perdev_intremap;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 04/17] vt-d: VT-d Posted-Interrupts feature detection
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (2 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-24  7:14   ` Tian, Kevin
  2015-11-03  8:43 ` [PATCH v9 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Feng Wu

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v7:
- Remove pointless "if non iommu_intremap then disable iommu_intpost" logic
- Don't need to check !iommu_intremap or !iommu_intpost when setting iommu_intpost to 0

v5:
- Remove blank line

v4:
- Correct a logic error when setting iommu_intpost to 0

v3:
- Remove the "if no intremap then no intpost" logic in
  intel_vtd_setup(), it is covered in the iommu_setup().
- Add "if no intremap then no intpost" logic in the end
  of init_vtd_hw() which is called by vtd_resume().

So the logic exists in the following three places:
- parse_iommu_param()
- iommu_setup()
- init_vtd_hw()

 xen/drivers/passthrough/vtd/iommu.c | 14 ++++++++++++--
 xen/drivers/passthrough/vtd/iommu.h |  1 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 1dffc40..8dee731 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2147,8 +2147,8 @@ int __init intel_vtd_setup(void)
     }
 
     /* We enable the following features only if they are supported by all VT-d
-     * engines: Snoop Control, DMA passthrough, Queued Invalidation and
-     * Interrupt Remapping.
+     * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
+     * Remapping, and Posted Interrupt
      */
     for_each_drhd_unit ( drhd )
     {
@@ -2176,6 +2176,14 @@ int __init intel_vtd_setup(void)
         if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
             iommu_intremap = 0;
 
+        /*
+         * We cannot use posted interrupt if X86_FEATURE_CX16 is
+         * not supported, since we count on this feature to
+         * atomically update 16-byte IRTE in posted format.
+         */
+        if ( !cap_intr_post(iommu->cap) || !cpu_has_cx16 )
+            iommu_intpost = 0;
+
         if ( !vtd_ept_page_compatible(iommu) )
             iommu_hap_pt_share = 0;
 
@@ -2201,6 +2209,7 @@ int __init intel_vtd_setup(void)
     P(iommu_passthrough, "Dom0 DMA Passthrough");
     P(iommu_qinval, "Queued Invalidation");
     P(iommu_intremap, "Interrupt Remapping");
+    P(iommu_intpost, "Posted Interrupt");
     P(iommu_hap_pt_share, "Shared EPT tables");
 #undef P
 
@@ -2220,6 +2229,7 @@ int __init intel_vtd_setup(void)
     iommu_passthrough = 0;
     iommu_qinval = 0;
     iommu_intremap = 0;
+    iommu_intpost = 0;
     return ret;
 }
 
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index ac71ed1..22abefe 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -61,6 +61,7 @@
 /*
  * Decoding Capability Register
  */
+#define cap_intr_post(c)       (((c) >> 59) & 1)
 #define cap_read_drain(c)      (((c) >> 55) & 1)
 #define cap_write_drain(c)     (((c) >> 54) & 1)
 #define cap_max_amask_val(c)   (((c) >> 48) & 0x3f)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (3 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-03  8:43 ` [PATCH v9 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

Extend struct pi_desc according to VT-d Posted-Interrupts Spec.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v8:
- Coding style

v7:
- Coding style.

v3:
- Use u32 instead of u64 for the bitfield in 'struct pi_desc'

 xen/include/asm-x86/hvm/vmx/vmcs.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index f1126d4..81c9e63 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -80,8 +80,18 @@ struct vmx_domain {
 
 struct pi_desc {
     DECLARE_BITMAP(pir, NR_VECTORS);
-    u32 control;
-    u32 rsvd[7];
+    union {
+        struct {
+            u16     on     : 1,  /* bit 256 - Outstanding Notification */
+                    sn     : 1,  /* bit 257 - Suppress Notification */
+                    rsvd_1 : 14; /* bit 271:258 - Reserved */
+            u8      nv;          /* bit 279:272 - Notification Vector */
+            u8      rsvd_2;      /* bit 287:280 - Reserved */
+            u32     ndst;        /* bit 319:288 - Notification Destination */
+        };
+        u64 control;
+    };
+    u32 rsvd[6];
 } __attribute__ ((aligned (64)));
 
 #define ept_get_wl(ept)   ((ept)->ept_wl)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 06/17] vmx: Add some helper functions for Posted-Interrupts
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (4 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-24  7:15   ` Tian, Kevin
  2015-11-03  8:43 ` [PATCH v9 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v7:
- Use bitfield in pi_test_on() and pi_test_sn()

v4:
- Newly added

 xen/include/asm-x86/hvm/vmx/vmx.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 3fbfa44..8d91110 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -101,6 +101,7 @@ void vmx_update_cpu_exec_control(struct vcpu *v);
 void vmx_update_secondary_exec_control(struct vcpu *v);
 
 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
 static inline int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 {
     return test_and_set_bit(vector, pi_desc->pir);
@@ -121,11 +122,31 @@ static inline int pi_test_and_clear_on(struct pi_desc *pi_desc)
     return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
 }
 
+static inline int pi_test_on(struct pi_desc *pi_desc)
+{
+    return pi_desc->on;
+}
+
 static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 {
     return xchg(&pi_desc->pir[group], 0);
 }
 
+static inline int pi_test_sn(struct pi_desc *pi_desc)
+{
+    return pi_desc->sn;
+}
+
+static inline void pi_set_sn(struct pi_desc *pi_desc)
+{
+    set_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
+static inline void pi_clear_sn(struct pi_desc *pi_desc)
+{
+    clear_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
 /*
  * Exit Reasons
  */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (5 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-03  8:43 ` [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

This patch initializes the VT-d Posted-interrupt Descriptor.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v7:
- Add comments to function 'pi_desc_init' to clarify why we
  update the posted-interrupt descriptor in non-atomical way
  in it.

v3:
- Move pi_desc_init() to xen/arch/x86/hvm/vmx/vmcs.c
- Remove the 'inline' flag of pi_desc_init()

 xen/arch/x86/hvm/vmx/vmcs.c       | 22 ++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmx.h |  2 ++
 2 files changed, 24 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index a0a97e7..5f67797 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -39,6 +39,7 @@
 #include <asm/flushtlb.h>
 #include <asm/shadow.h>
 #include <asm/tboot.h>
+#include <asm/apic.h>
 
 static bool_t __read_mostly opt_vpid_enabled = 1;
 boolean_param("vpid", opt_vpid_enabled);
@@ -951,6 +952,24 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
     virtual_vmcs_exit(vvmcs);
 }
 
+/*
+ * This function is only called in a vCPU's initialization phase,
+ * so we can update the posted-interrupt descriptor in non-atomic way.
+ */
+static void pi_desc_init(struct vcpu *v)
+{
+    uint32_t dest;
+
+    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
+
+    dest = cpu_physical_id(v->processor);
+
+    if ( x2apic_enabled )
+        v->arch.hvm_vmx.pi_desc.ndst = dest;
+    else
+        v->arch.hvm_vmx.pi_desc.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);
+}
+
 static int construct_vmcs(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -1089,6 +1108,9 @@ static int construct_vmcs(struct vcpu *v)
 
     if ( cpu_has_vmx_posted_intr_processing )
     {
+        if ( iommu_intpost )
+            pi_desc_init(v);
+
         __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
     }
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 8d91110..70b254f 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -88,6 +88,8 @@ typedef enum {
 #define EPT_EMT_WB              6
 #define EPT_EMT_RSV2            7
 
+#define PI_xAPIC_NDST_MASK      0xFF00
+
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
 void vmx_intr_assist(void);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is set
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (6 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-24  7:29   ` Tian, Kevin
  2015-11-03  8:43 ` [PATCH v9 09/17] VT-d: Remove pointless casts Feng Wu
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

Currently, we don't support urgent interrupt, all interrupts
are recognized as non-urgent interrupt, so we cannot send
posted-interrupt when 'SN' is set.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v8:
- Parenthesize '1 << POSTED_INTR_ON' and '1 << POSTED_INTR_SN'

v7:
- Coding style
- Read the current pi_desc.control as the intial value of prev.control

v6:
- Add some comments

v5:
- keep the vcpu_kick() at the end of vmx_deliver_posted_intr()
- Keep the 'return' after calling __vmx_deliver_posted_interrupt()

v4:
- Coding style.
- V3 removes a vcpu_kick() from the eoi_exitmap_changed path
  incorrectly, fix it.

v3:
- use cmpxchg to test SN/ON and set ON

 xen/arch/x86/hvm/vmx/vmx.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c32d863..741a271 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1701,8 +1701,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
          */
         pi_set_on(&v->arch.hvm_vmx.pi_desc);
     }
-    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
+    else
     {
+        struct pi_desc old, new, prev;
+
+        prev.control = v->arch.hvm_vmx.pi_desc.control;
+
+        do {
+            /*
+             * Currently, we don't support urgent interrupt, all
+             * interrupts are recognized as non-urgent interrupt,
+             * so we cannot send posted-interrupt when 'SN' is set.
+             * Besides that, if 'ON' is already set, we cannot set
+             * posted-interrupts as well.
+             */
+            if ( pi_test_sn(&prev) || pi_test_on(&prev) )
+            {
+                vcpu_kick(v);
+                return;
+            }
+
+            old.control = v->arch.hvm_vmx.pi_desc.control &
+                          ~((1 << POSTED_INTR_ON) | (1 << POSTED_INTR_SN));
+            new.control = v->arch.hvm_vmx.pi_desc.control |
+                          (1 << POSTED_INTR_ON);
+
+            prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
+                                   old.control, new.control);
+        } while ( prev.control != old.control );
+
         __vmx_deliver_posted_interrupt(v);
         return;
     }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 09/17] VT-d: Remove pointless casts
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (7 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-24  7:30   ` Tian, Kevin
  2015-11-03  8:43 ` [PATCH v9 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Feng Wu

Remove pointless casts.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v7:
- Remove an 'u32' casting omitted in v5

v5:
- Newly added.

 xen/drivers/passthrough/vtd/utils.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 44c4ef5..a75059f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -234,10 +234,9 @@ static void dump_iommu_info(unsigned char key)
                     continue;
                 printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
                     "   %x %x\n", i,
-                    (u32)p->hi.svt, (u32)p->hi.sq, (u32)p->hi.sid,
-                    (u32)p->lo.dst, (u32)p->lo.vector, (u32)p->lo.avail,
-                    (u32)p->lo.dlm, (u32)p->lo.tm, (u32)p->lo.rh,
-                    (u32)p->lo.dm, (u32)p->lo.fpd, (u32)p->lo.p);
+                    p->hi.svt, p->hi.sq, p->hi.sid, p->lo.dst, p->lo.vector,
+                    p->lo.avail, p->lo.dlm, p->lo.tm, p->lo.rh, p->lo.dm,
+                    p->lo.fpd, p->lo.p);
                 print_cnt++;
             }
             if ( iremap_entries )
@@ -281,11 +280,10 @@ static void dump_iommu_info(unsigned char key)
 
                 printk("   %02x:  %04x   %x    %x   %x   %x   %x    %x"
                     "    %x     %02x\n", i,
-                    (u32)remap->index_0_14 | ((u32)remap->index_15 << 15),
-                    (u32)remap->format, (u32)remap->mask, (u32)remap->trigger,
-                    (u32)remap->irr, (u32)remap->polarity,
-                    (u32)remap->delivery_status, (u32)remap->delivery_mode,
-                    (u32)remap->vector);
+                    remap->index_0_14 | (remap->index_15 << 15),
+                    remap->format, remap->mask, remap->trigger, remap->irr,
+                    remap->polarity, remap->delivery_status, remap->delivery_mode,
+                    remap->vector);
             }
         }
     }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (8 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 09/17] VT-d: Remove pointless casts Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-03  8:43 ` [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Feng Wu

Extend struct iremap_entry according to VT-d Posted-Interrupts Spec.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v8:
- Make use of the __uint128_t member in struct iremap_entry when needed

v7:
- Add a __uint128_t member to the union in struct iremap_entry

v4:
- res_4 is not a bitfiled, correct it.
- Expose 'im' to remapped irte as well.

v3:
- Use u32 instead of u64 to define the bitfields in 'struct iremap_entry'
- Limit using bitfield if possible

 xen/drivers/passthrough/vtd/intremap.c | 92 +++++++++++++++++-----------------
 xen/drivers/passthrough/vtd/iommu.h    | 44 ++++++++++------
 xen/drivers/passthrough/vtd/utils.c    |  8 +--
 3 files changed, 81 insertions(+), 63 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 987bbe9..8f135e1 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -122,9 +122,9 @@ static u16 hpetid_to_bdf(unsigned int hpet_id)
 static void set_ire_sid(struct iremap_entry *ire,
                         unsigned int svt, unsigned int sq, unsigned int sid)
 {
-    ire->hi.svt = svt;
-    ire->hi.sq = sq;
-    ire->hi.sid = sid;
+    ire->remap.svt = svt;
+    ire->remap.sq = sq;
+    ire->remap.sid = sid;
 }
 
 static void set_ioapic_source_id(int apic_id, struct iremap_entry *ire)
@@ -219,7 +219,7 @@ static unsigned int alloc_remap_entry(struct iommu *iommu, unsigned int nr)
         else
             p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
 
-        if ( p->lo_val || p->hi_val ) /* not a free entry */
+        if ( p->val ) /* not a free entry */
             found = 0;
         else if ( ++found == nr )
             break;
@@ -253,7 +253,7 @@ static int remap_entry_to_ioapic_rte(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    if ( iremap_entry->hi_val == 0 && iremap_entry->lo_val == 0 )
+    if ( iremap_entry->val == 0 )
     {
         dprintk(XENLOG_ERR VTDPREFIX,
                 "%s: index (%d) get an empty entry!\n",
@@ -263,13 +263,13 @@ static int remap_entry_to_ioapic_rte(
         return -EFAULT;
     }
 
-    old_rte->vector = iremap_entry->lo.vector;
-    old_rte->delivery_mode = iremap_entry->lo.dlm;
-    old_rte->dest_mode = iremap_entry->lo.dm;
-    old_rte->trigger = iremap_entry->lo.tm;
+    old_rte->vector = iremap_entry->remap.vector;
+    old_rte->delivery_mode = iremap_entry->remap.dlm;
+    old_rte->dest_mode = iremap_entry->remap.dm;
+    old_rte->trigger = iremap_entry->remap.tm;
     old_rte->__reserved_2 = 0;
     old_rte->dest.logical.__reserved_1 = 0;
-    old_rte->dest.logical.logical_dest = iremap_entry->lo.dst >> 8;
+    old_rte->dest.logical.logical_dest = iremap_entry->remap.dst >> 8;
 
     unmap_vtd_domain_page(iremap_entries);
     spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
@@ -317,27 +317,28 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
     if ( rte_upper )
     {
         if ( x2apic_enabled )
-            new_ire.lo.dst = value;
+            new_ire.remap.dst = value;
         else
-            new_ire.lo.dst = (value >> 24) << 8;
+            new_ire.remap.dst = (value >> 24) << 8;
     }
     else
     {
         *(((u32 *)&new_rte) + 0) = value;
-        new_ire.lo.fpd = 0;
-        new_ire.lo.dm = new_rte.dest_mode;
-        new_ire.lo.tm = new_rte.trigger;
-        new_ire.lo.dlm = new_rte.delivery_mode;
+        new_ire.remap.fpd = 0;
+        new_ire.remap.dm = new_rte.dest_mode;
+        new_ire.remap.tm = new_rte.trigger;
+        new_ire.remap.dlm = new_rte.delivery_mode;
         /* Hardware require RH = 1 for LPR delivery mode */
-        new_ire.lo.rh = (new_ire.lo.dlm == dest_LowestPrio);
-        new_ire.lo.avail = 0;
-        new_ire.lo.res_1 = 0;
-        new_ire.lo.vector = new_rte.vector;
-        new_ire.lo.res_2 = 0;
+        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+        new_ire.remap.avail = 0;
+        new_ire.remap.res_1 = 0;
+        new_ire.remap.vector = new_rte.vector;
+        new_ire.remap.res_2 = 0;
 
         set_ioapic_source_id(IO_APIC_ID(apic), &new_ire);
-        new_ire.hi.res_1 = 0;
-        new_ire.lo.p = 1;     /* finally, set present bit */
+        new_ire.remap.res_3 = 0;
+        new_ire.remap.res_4 = 0;
+        new_ire.remap.p = 1;     /* finally, set present bit */
 
         /* now construct new ioapic rte entry */
         remap_rte->vector = new_rte.vector;
@@ -510,7 +511,7 @@ static int remap_entry_to_msi_msg(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    if ( iremap_entry->hi_val == 0 && iremap_entry->lo_val == 0 )
+    if ( iremap_entry->val == 0 )
     {
         dprintk(XENLOG_ERR VTDPREFIX,
                 "%s: index (%d) get an empty entry!\n",
@@ -523,25 +524,25 @@ static int remap_entry_to_msi_msg(
     msg->address_hi = MSI_ADDR_BASE_HI;
     msg->address_lo =
         MSI_ADDR_BASE_LO |
-        ((iremap_entry->lo.dm == 0) ?
+        ((iremap_entry->remap.dm == 0) ?
             MSI_ADDR_DESTMODE_PHYS:
             MSI_ADDR_DESTMODE_LOGIC) |
-        ((iremap_entry->lo.dlm != dest_LowestPrio) ?
+        ((iremap_entry->remap.dlm != dest_LowestPrio) ?
             MSI_ADDR_REDIRECTION_CPU:
             MSI_ADDR_REDIRECTION_LOWPRI);
     if ( x2apic_enabled )
-        msg->dest32 = iremap_entry->lo.dst;
+        msg->dest32 = iremap_entry->remap.dst;
     else
-        msg->dest32 = (iremap_entry->lo.dst >> 8) & 0xff;
+        msg->dest32 = (iremap_entry->remap.dst >> 8) & 0xff;
     msg->address_lo |= MSI_ADDR_DEST_ID(msg->dest32);
 
     msg->data =
         MSI_DATA_TRIGGER_EDGE |
         MSI_DATA_LEVEL_ASSERT |
-        ((iremap_entry->lo.dlm != dest_LowestPrio) ?
+        ((iremap_entry->remap.dlm != dest_LowestPrio) ?
             MSI_DATA_DELIVERY_FIXED:
             MSI_DATA_DELIVERY_LOWPRI) |
-        iremap_entry->lo.vector;
+        iremap_entry->remap.vector;
 
     unmap_vtd_domain_page(iremap_entries);
     spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
@@ -600,29 +601,30 @@ static int msi_msg_to_remap_entry(
     memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
 
     /* Set interrupt remapping table entry */
-    new_ire.lo.fpd = 0;
-    new_ire.lo.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
-    new_ire.lo.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
-    new_ire.lo.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
+    new_ire.remap.fpd = 0;
+    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
+    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     /* Hardware require RH = 1 for LPR delivery mode */
-    new_ire.lo.rh = (new_ire.lo.dlm == dest_LowestPrio);
-    new_ire.lo.avail = 0;
-    new_ire.lo.res_1 = 0;
-    new_ire.lo.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
-                        MSI_DATA_VECTOR_MASK;
-    new_ire.lo.res_2 = 0;
+    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+    new_ire.remap.avail = 0;
+    new_ire.remap.res_1 = 0;
+    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
+                            MSI_DATA_VECTOR_MASK;
+    new_ire.remap.res_2 = 0;
     if ( x2apic_enabled )
-        new_ire.lo.dst = msg->dest32;
+        new_ire.remap.dst = msg->dest32;
     else
-        new_ire.lo.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
-                          & 0xff) << 8;
+        new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
+                             & 0xff) << 8;
 
     if ( pdev )
         set_msi_source_id(pdev, &new_ire);
     else
         set_hpet_source_id(msi_desc->hpet_id, &new_ire);
-    new_ire.hi.res_1 = 0;
-    new_ire.lo.p = 1;    /* finally, set present bit */
+    new_ire.remap.res_3 = 0;
+    new_ire.remap.res_4 = 0;
+    new_ire.remap.p = 1;    /* finally, set present bit */
 
     /* now construct new MSI/MSI-X rte entry */
     remap_rte = (struct msi_msg_remap_entry *)msg;
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 22abefe..b440b69 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -281,29 +281,45 @@ struct dma_pte {
 /* interrupt remap entry */
 struct iremap_entry {
   union {
-    u64 lo_val;
+    __uint128_t val;
+    struct { u64 lo, hi; };
     struct {
-        u64 p       : 1,
+        u16 p       : 1,
             fpd     : 1,
             dm      : 1,
             rh      : 1,
             tm      : 1,
             dlm     : 3,
             avail   : 4,
-            res_1   : 4,
-            vector  : 8,
-            res_2   : 8,
-            dst     : 32;
-    }lo;
-  };
-  union {
-    u64 hi_val;
+            res_1   : 3,
+            im      : 1;
+        u8  vector;
+        u8  res_2;
+        u32 dst;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_3   : 12;
+        u32 res_4;
+    } remap;
     struct {
-        u64 sid     : 16,
-            sq      : 2,
+        u16 p       : 1,
+            fpd     : 1,
+            res_1   : 6,
+            avail   : 4,
+            res_2   : 2,
+            urg     : 1,
+            im      : 1;
+        u8  vector;
+        u8  res_3;
+        u32 res_4   : 6,
+            pda_l   : 26;
+        u16 sid;
+        u16 sq      : 2,
             svt     : 2,
-            res_1   : 44;
-    }hi;
+            res_5   : 12;
+        u32 pda_h;
+    } post;
   };
 };
 
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index a75059f..6daa156 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -230,13 +230,13 @@ static void dump_iommu_info(unsigned char key)
                 else
                     p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
 
-                if ( !p->lo.p )
+                if ( !p->remap.p )
                     continue;
                 printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
                     "   %x %x\n", i,
-                    p->hi.svt, p->hi.sq, p->hi.sid, p->lo.dst, p->lo.vector,
-                    p->lo.avail, p->lo.dlm, p->lo.tm, p->lo.rh, p->lo.dm,
-                    p->lo.fpd, p->lo.p);
+                    p->remap.svt, p->remap.sq, p->remap.sid, p->remap.dst,
+                    p->remap.vector, p->remap.avail, p->remap.dlm, p->remap.tm,
+                    p->remap.rh, p->remap.dm, p->remap.fpd, p->remap.p);
                 print_cnt++;
             }
             if ( iremap_entries )
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (9 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-24  7:45   ` Tian, Kevin
  2015-11-03  8:43 ` [PATCH v9 12/17] x86: move some APIC related macros to apicdef.h Feng Wu
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Andrew Cooper, Jan Beulich, Yang Zhang, Feng Wu

This patch adds an API which is used to update the IRTE
for posted-interrupt when guest changes MSI/MSI-X information.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v8:
- Some minor adjustment

v7:
- Remove __uint128_t cast
- Remove Kevin's Ack due to a bug fix for v6
- Reword some comments
- Setup posted IRTE from zeroed structure

v6:
- In some error cases, the desc->lock will be unlocked twice, fix it.
- Coding style fix.
- Add some comments.

v5:
- Make some function parameters const
- Call "spin_unlock_irq(&desc->lock);" a little eariler
- Add "ASSERT(spin_is_locked(&pcidevs_lock))"
- -EBADSLT -> -ENODEV, EBADSLT is removed in the lasted Xen

v4:
- Don't inline setup_posted_irte()
- const struct pi_desc *pi_desc for setup_posted_irte()
- return -EINVAL when pirq_spin_lock_irq_desc() fails.
- Make some variables const
- Release irq desc lock earlier in pi_update_irte()
- Remove the pointless do-while() loop when doing cmpxchg16b()

v3:
- Remove "adding PDA_MASK()" when updating 'pda_l' and 'pda_h' for IRTE.
- Change the return type of pi_update_irte() to int.
- Remove some pointless printk message in pi_update_irte().
- Use structure assignment instead of memcpy() for irte copy.

 xen/drivers/passthrough/vtd/intremap.c | 120 +++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/vtd/iommu.h    |   6 ++
 xen/include/asm-x86/iommu.h            |   2 +
 3 files changed, 128 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 8f135e1..67e4f6d 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -899,3 +899,123 @@ void iommu_disable_x2apic_IR(void)
     for_each_drhd_unit ( drhd )
         disable_qinval(drhd->iommu);
 }
+
+static void setup_posted_irte(
+    struct iremap_entry *new_ire, const struct iremap_entry *old_ire,
+    const struct pi_desc *pi_desc, const uint8_t gvec)
+{
+    memset(new_ire, sizeof(*new_ire), 0);
+
+    if ( !old_ire->remap.im )
+    {
+        new_ire->post.p = old_ire->remap.p;
+        new_ire->post.fpd = old_ire->remap.fpd;
+        new_ire->post.sid = old_ire->remap.sid;
+        new_ire->post.sq = old_ire->remap.sq;
+        new_ire->post.svt = old_ire->remap.svt;
+    }
+    else
+    {
+        new_ire->post.p = old_ire->post.p;
+        new_ire->post.fpd = old_ire->post.fpd;
+        new_ire->post.sid = old_ire->post.sid;
+        new_ire->post.sq = old_ire->post.sq;
+        new_ire->post.svt = old_ire->post.svt;
+        new_ire->post.urg = old_ire->post.urg;
+    }
+
+    new_ire->post.im = 1;
+    new_ire->post.vector = gvec;
+    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
+    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
+}
+
+/*
+ * This function is used to update the IRTE for posted-interrupt
+ * when guest changes MSI/MSI-X information.
+ */
+int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
+    const uint8_t gvec)
+{
+    struct irq_desc *desc;
+    const struct msi_desc *msi_desc;
+    int remap_index;
+    int rc = 0;
+    const struct pci_dev *pci_dev;
+    const struct acpi_drhd_unit *drhd;
+    struct iommu *iommu;
+    struct ir_ctrl *ir_ctrl;
+    struct iremap_entry *iremap_entries = NULL, *p = NULL;
+    struct iremap_entry new_ire, old_ire;
+    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    __uint128_t ret;
+
+    desc = pirq_spin_lock_irq_desc(pirq, NULL);
+    if ( !desc )
+        return -EINVAL;
+
+    msi_desc = desc->msi_desc;
+    if ( !msi_desc )
+    {
+        rc = -ENODEV;
+        goto unlock_out;
+    }
+
+    pci_dev = msi_desc->dev;
+    if ( !pci_dev )
+    {
+        rc = -ENODEV;
+        goto unlock_out;
+    }
+
+    remap_index = msi_desc->remap_index;
+
+    spin_unlock_irq(&desc->lock);
+
+    ASSERT(spin_is_locked(&pcidevs_lock));
+
+    /*
+     * FIXME: For performance reasons we should store the 'iommu' pointer in
+     * 'struct msi_desc' in some other place, so we don't need to waste
+     * time searching it here.
+     */
+    drhd = acpi_find_matched_drhd_unit(pci_dev);
+    if ( !drhd )
+        return -ENODEV;
+
+    iommu = drhd->iommu;
+    ir_ctrl = iommu_ir_ctrl(iommu);
+    if ( !ir_ctrl )
+        return -ENODEV;
+
+    spin_lock_irq(&ir_ctrl->iremap_lock);
+
+    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
+
+    old_ire = *p;
+
+    /* Setup/Update interrupt remapping table entry. */
+    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
+    ret = cmpxchg16b(p, &old_ire, &new_ire);
+
+    /*
+     * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
+     * and the hardware cannot update the IRTE behind us, so the return value
+     * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
+     */
+    ASSERT(ret == old_ire.val);
+
+    iommu_flush_cache_entry(p, sizeof(*p));
+    iommu_flush_iec_index(iommu, 0, remap_index);
+
+    unmap_vtd_domain_page(iremap_entries);
+
+    spin_unlock_irq(&ir_ctrl->iremap_lock);
+
+    return 0;
+
+ unlock_out:
+    spin_unlock_irq(&desc->lock);
+
+    return rc;
+}
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index b440b69..c55ee08 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -323,6 +323,12 @@ struct iremap_entry {
   };
 };
 
+/*
+ * Posted-interrupt descriptor address is 64 bits with 64-byte aligned, only
+ * the upper 26 bits of lest significiant 32 bits is available.
+ */
+#define PDA_LOW_BIT    26
+
 /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
 #define IREMAP_PAGE_ORDER  8
 
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index 29203d7..92f0900 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -31,6 +31,8 @@ int iommu_supports_eim(void);
 int iommu_enable_x2apic_IR(void);
 void iommu_disable_x2apic_IR(void);
 
+int pi_update_irte(const struct vcpu *v, const struct pirq *pirq, const uint8_t gvec);
+
 #endif /* !__ARCH_X86_IOMMU_H__ */
 /*
  * Local variables:
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 12/17] x86: move some APIC related macros to apicdef.h
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (10 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-03  8:43 ` [PATCH v9 13/17] Update IRTE according to guest interrupt config changes Feng Wu
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Feng Wu, Jan Beulich, Andrew Cooper

Move some APIC related macros to apicdef.h, so they can be used
outside of vlapic.c.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v8:
- Minor changes

v7:
- Put the Macros to the right place inside the file.

v6:
- Newly introduced.

 xen/arch/x86/hvm/vlapic.c     | 5 -----
 xen/include/asm-x86/apicdef.h | 3 +++
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index b893b40..9b7c871 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -65,11 +65,6 @@ static const unsigned int vlapic_lvt_mask[VLAPIC_LVT_NUM] =
      LVT_MASK
 };
 
-/* Following could belong in apicdef.h */
-#define APIC_SHORT_MASK                  0xc0000
-#define APIC_DEST_NOSHORT                0x0
-#define APIC_DEST_MASK                   0x800
-
 #define vlapic_lvt_vector(vlapic, lvt_type)                     \
     (vlapic_get_reg(vlapic, lvt_type) & APIC_VECTOR_MASK)
 
diff --git a/xen/include/asm-x86/apicdef.h b/xen/include/asm-x86/apicdef.h
index 6069fce..8752287 100644
--- a/xen/include/asm-x86/apicdef.h
+++ b/xen/include/asm-x86/apicdef.h
@@ -54,9 +54,11 @@
 #define			APIC_ESR_RECVILL	0x00040
 #define			APIC_ESR_ILLREGA	0x00080
 #define		APIC_ICR	0x300
+#define			APIC_DEST_NOSHORT	0x00000
 #define			APIC_DEST_SELF		0x40000
 #define			APIC_DEST_ALLINC	0x80000
 #define			APIC_DEST_ALLBUT	0xC0000
+#define			APIC_SHORT_MASK		0xC0000
 #define			APIC_ICR_RR_MASK	0x30000
 #define			APIC_ICR_RR_INVALID	0x00000
 #define			APIC_ICR_RR_INPROG	0x10000
@@ -64,6 +66,7 @@
 #define			APIC_INT_LEVELTRIG	0x08000
 #define			APIC_INT_ASSERT		0x04000
 #define			APIC_ICR_BUSY		0x01000
+#define			APIC_DEST_MASK		0x00800
 #define			APIC_DEST_LOGICAL	0x00800
 #define			APIC_DEST_PHYSICAL	0x00000
 #define			APIC_DM_FIXED		0x00000
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 13/17] Update IRTE according to guest interrupt config changes
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (11 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 12/17] x86: move some APIC related macros to apicdef.h Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-03  8:43 ` [PATCH v9 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu, Jan Beulich

When guest changes its interrupt configuration (such as, vector, etc.)
for direct-assigned devices, we need to update the associated IRTE
with the new guest vector, so external interrupts from the assigned
devices can be injected to guests without VM-Exit.

For lowest-priority interrupts, we use vector-hashing mechamisn to find
the destination vCPU. This follows the hardware behavior, since modern
Intel CPUs use vector hashing to handle the lowest-priority interrupt.

For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
still use interrupt remapping.

CC: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v8:
- Remove local variable 'bitmap_array_size'
- Use switch to replace if-else

v7:
- Remove some pointless debug printk
- Fix a logic error when assigning 'delivery_mode'
- Adjust the definition of local variable 'idx'
- Add a dprintk if we cannot find the vCPU from 'pi_find_dest_vcpu'
- Add 'else if ( delivery_mode == dest_Fixed )' in 'pi_find_dest_vcpu'

v6:
- Use macro to replace plain numbers
- Correct the overflow error in a loop

v5:
- Make 'struct vcpu *vcpu' const

v4:
- Make some 'int' variables 'unsigned int' in pi_find_dest_vcpu()
- Make 'dest_id' uint32_t
- Rename 'size' to 'bitmap_array_size'
- find_next_bit() and find_first_bit() always return unsigned int,
  so no need to check whether the return value is less than 0.
- Message error level XENLOG_G_WARNING -> XENLOG_G_INFO
- Remove useless warning message
- Create a seperate function vector_hashing_dest() to find the
- destination of lowest-priority interrupts.
- Change some comments

v3:
- Use bitmap to store the all the possible destination vCPUs of an
  interrupt, then trying to find the right destination from the bitmap
- Typo and some small changes

 xen/drivers/passthrough/io.c | 123 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 122 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index bda9374..6b1ee6a 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -25,6 +25,7 @@
 #include <asm/hvm/iommu.h>
 #include <asm/hvm/support.h>
 #include <xen/hvm/irq.h>
+#include <asm/io_apic.h>
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
@@ -198,6 +199,108 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
     xfree(dpci);
 }
 
+/*
+ * This routine handles lowest-priority interrupts using vector-hashing
+ * mechanism. As an example, modern Intel CPUs use this method to handle
+ * lowest-priority interrupts.
+ *
+ * Here is the details about the vector-hashing mechanism:
+ * 1. For lowest-priority interrupts, store all the possible destination
+ *    vCPUs in an array.
+ * 2. Use "gvec % max number of destination vCPUs" to find the right
+ *    destination vCPU in the array for the lowest-priority interrupt.
+ */
+static struct vcpu *vector_hashing_dest(const struct domain *d,
+                                        uint32_t dest_id,
+                                        bool_t dest_mode,
+                                        uint8_t gvec)
+
+{
+    unsigned long *dest_vcpu_bitmap;
+    unsigned int dest_vcpus = 0;
+    struct vcpu *v, *dest = NULL;
+    unsigned int i;
+
+    dest_vcpu_bitmap = xzalloc_array(unsigned long,
+                                     BITS_TO_LONGS(d->max_vcpus));
+    if ( !dest_vcpu_bitmap )
+        return NULL;
+
+    for_each_vcpu ( d, v )
+    {
+        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
+                                dest_id, dest_mode) )
+            continue;
+
+        __set_bit(v->vcpu_id, dest_vcpu_bitmap);
+        dest_vcpus++;
+    }
+
+    if ( dest_vcpus != 0 )
+    {
+        unsigned int mod = gvec % dest_vcpus;
+        unsigned int idx = 0;
+
+        for ( i = 0; i <= mod; i++ )
+        {
+            idx = find_next_bit(dest_vcpu_bitmap, d->max_vcpus, idx) + 1;
+            BUG_ON(idx >= d->max_vcpus);
+        }
+
+        dest = d->vcpu[idx - 1];
+    }
+
+    xfree(dest_vcpu_bitmap);
+
+    return dest;
+}
+
+/*
+ * The purpose of this routine is to find the right destination vCPU for
+ * an interrupt which will be delivered by VT-d posted-interrupt. There
+ * are several cases as below:
+ *
+ * - For lowest-priority interrupts, use vector-hashing mechanism to find
+ *   the destination.
+ * - Otherwise, for single destination interrupt, it is straightforward to
+ *   find the destination vCPU and return true.
+ * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
+ *   so return NULL.
+ */
+static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t dest_id,
+                                      bool_t dest_mode, uint8_t delivery_mode,
+                                      uint8_t gvec)
+{
+    unsigned int dest_vcpus = 0;
+    struct vcpu *v, *dest = NULL;
+
+    switch ( delivery_mode )
+    {
+    case dest_LowestPrio:
+        return vector_hashing_dest(d, dest_id, dest_mode, gvec);
+    case dest_Fixed:
+        for_each_vcpu ( d, v )
+        {
+            if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
+                                    dest_id, dest_mode) )
+                continue;
+
+            dest_vcpus++;
+            dest = v;
+        }
+
+        /* For fixed mode, we only handle single-destination interrupts. */
+        if ( dest_vcpus == 1 )
+            return dest;
+
+        break;
+    default:
+        break;
+    }
+
+    return NULL;
+}
+
 int pt_irq_create_bind(
     struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
 {
@@ -256,7 +359,7 @@ int pt_irq_create_bind(
     {
     case PT_IRQ_TYPE_MSI:
     {
-        uint8_t dest, dest_mode;
+        uint8_t dest, dest_mode, delivery_mode;
         int dest_vcpu_id;
 
         if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
@@ -329,11 +432,29 @@ int pt_irq_create_bind(
         /* Calculate dest_vcpu_id for MSI-type pirq migration. */
         dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
         dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
+        delivery_mode = (pirq_dpci->gmsi.gflags & VMSI_DELIV_MASK) >>
+                         GFLAGS_SHIFT_DELIV_MODE;
+
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
         spin_unlock(&d->event_lock);
         if ( dest_vcpu_id >= 0 )
             hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
+
+        /* Use interrupt posting if it is supported. */
+        if ( iommu_intpost )
+        {
+            const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest, dest_mode,
+                                          delivery_mode, pirq_dpci->gmsi.gvec);
+
+            if ( vcpu )
+                pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
+            else
+                dprintk(XENLOG_G_INFO,
+                        "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
+                        vcpu, pirq_dpci->gmsi.gvec);
+        }
+
         break;
     }
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 14/17] vmx: Properly handle notification event when vCPU is running
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (12 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 13/17] Update IRTE according to guest interrupt config changes Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-03  8:43 ` [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

When a vCPU is running in Root mode and a notification event
has been injected to it. we need to set VCPU_KICK_SOFTIRQ for
the current cpu, so the pending interrupt in PIRR will be
synced to vIRR before VM-Exit in time.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v7:
- Retain 'cli' in the comments to make it more understandable.
- Register another notification event handler when VT-d PI is enabled

v6:
- Ack the interrupt in the beginning of pi_notification_interrupt()

v4:
- Coding style.

v3:
- Make pi_notification_interrupt() static

 xen/arch/x86/hvm/vmx/vmx.c | 54 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 741a271..e448b31 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1975,6 +1975,53 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
+/* Handle VT-d posted-interrupt when VCPU is running. */
+static void pi_notification_interrupt(struct cpu_user_regs *regs)
+{
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+
+    /*
+     * We get here when a vCPU is running in root-mode (such as via hypercall,
+     * or any other reasons which can result in VM-Exit), and before vCPU is
+     * back to non-root, external interrupts from an assigned device happen
+     * and a notification event is delivered to this logical CPU.
+     *
+     * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
+     * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will
+     * be synced to vIRR before VM-Exit in time.
+     *
+     * Please refer to the following code fragments from
+     * xen/arch/x86/hvm/vmx/entry.S:
+     *
+     *     .Lvmx_do_vmentry
+     *
+     *      ......
+     *
+     *      point 1
+     *
+     *      cli
+     *      cmp  %ecx,(%rdx,%rax,1)
+     *      jnz  .Lvmx_process_softirqs
+     *
+     *      ......
+     *
+     *      je   .Lvmx_launch
+     *
+     *      ......
+     *
+     *     .Lvmx_process_softirqs:
+     *      sti
+     *      call do_softirq
+     *      jmp  .Lvmx_do_vmentry
+     *
+     * If VT-d engine issues a notification event at point 1 above, it cannot
+     * be delivered to the guest during this VM-entry without raising the
+     * softirq in this notification handler.
+     */
+    raise_softirq(VCPU_KICK_SOFTIRQ);
+}
+
 const struct hvm_function_table * __init start_vmx(void)
 {
     set_in_cr4(X86_CR4_VMXE);
@@ -2012,7 +2059,12 @@ const struct hvm_function_table * __init start_vmx(void)
     }
 
     if ( cpu_has_vmx_posted_intr_processing )
-        alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+    {
+        if ( iommu_intpost )
+            alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
+        else
+            alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+    }
     else
     {
         vmx_function_table.deliver_posted_intr = NULL;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (13 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-03  9:07   ` Wu, Feng
  2015-11-10 18:14   ` Dario Faggioli
  2015-11-03  8:43 ` [PATCH v9 16/17] VT-d: Dump the posted format IRTE Feng Wu
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Feng Wu

This patch includes the following aspects:
- Handling logic when vCPU is blocked:
    * Add a global vector to wake up the blocked vCPU
      when an interrupt is being posted to it (This part
      was sugguested by Yang Zhang <yang.z.zhang@intel.com>).
    * Define two per-cpu variables:
          1. pi_blocked_vcpu:
            A list storing the vCPUs which were blocked
            on this pCPU.

          2. pi_blocked_vcpu_lock:
            The spinlock to protect pi_blocked_vcpu.

- Add the following hooks, this part was suggested
  by George Dunlap <george.dunlap@eu.citrix.com> and
  Dario Faggioli <dario.faggioli@citrix.com>.
    * arch_vcpu_block()
      Called alled before vcpu is blocking and update the PID
      (posted-interrupt descriptor).

    * vmx_pi_switch_from()
      Called before context switch, we update the PID when the
      vCPU is preempted or going to sleep.

    * vmx_pi_switch_to()
      Called after context switch, we update the PID when the vCPU
      is going to run.

- Before VM-entry, check the state of PI descriptor, make sure the
'NV' field is set to '&posted_intr_vector' when the guest is running
in non-root mode. Suggested by Jan Beulich <jbeulich@suse.com>.

When we handle the lazy context switch for the following two scenarios:
- Preempted by a tasklet, which uses in an idle context.
- the prev vcpu is in offline and no new available vcpus in run queue.
We don't change the 'SN' bit in posted-interrupt descriptor, this
may incur spurious PI notification events, but since PI notification
event is only sent when 'ON' is clear, and once the PI notificatoin
is sent, ON is set by hardware, hence no more notification events
before 'ON' is clear. Besides that, spurious PI notification events are
going to happen from time to time in Xen hypervisor, such as, when
guests trap to Xen and PI notification event happens, there is
nothing Xen actually needs to do about it, the interrupts will be
delivered to guest atht the next time we do a VMENTRY.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
Suggested-by: Dario Faggioli <dario.faggioli@citrix.com>
Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v9:
- Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare()
- Add vmx_pi_state_change() and call it before VM Entry

v8:
- Remove the lazy context switch handling for PI state transition
- Change PI state in vcpu_block() and do_poll() when the vCPU
  is going to be blocked

v7:
- Merge [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts
  and "[PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked"
  into this patch, so it is self-contained and more convenient
  for code review.
- Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
- Coding style
- Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
- Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
- Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
- Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
- Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
- Use 'spin_lock' and 'spin_unlock' when the interrupt has been
  already disabled.
- Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
- Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
- Call .pi_ctxt_switch_to() __context_switch() instead of directly
  calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
- Make .pi_block_cpu unsigned int
- Use list_del() instead of list_del_init()
- Coding style

One remaining item in v7:
Jan has concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi(),
need Dario or George's input about this.

v6:
- Add two static inline functions for pi context switch
- Fix typos

v5:
- Rename arch_vcpu_wake to arch_vcpu_wake_prepare
- Make arch_vcpu_wake_prepare() inline for ARM
- Merge the ARM dummy hook with together
- Changes to some code comments
- Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if
  PI is disabled or the vCPU is not in HVM
- Coding style

v4:
- Newly added

Changlog for "vmx: posted-interrupt handling when vCPU is blocked"
v6:
- Fix some typos
- Ack the interrupt right after the spin_unlock in pi_wakeup_interrupt()

v4:
- Use local variables in pi_wakeup_interrupt()
- Remove vcpu from the blocked list when pi_desc.on==1, this
- avoid kick vcpu multiple times.
- Remove tasklet

v3:
- This patch is generated by merging the following three patches in v2:
   [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
   [RFC v2 10/15] vmx: Define two per-cpu variables
   [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
- rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
- Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
- rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
- Make pi_wakeup_interrupt() static
- Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
- move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
- Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
- Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'

 xen/arch/x86/hvm/hvm.c             |   6 ++
 xen/arch/x86/hvm/vmx/vmcs.c        |   2 +
 xen/arch/x86/hvm/vmx/vmx.c         | 187 +++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c              |   7 +-
 xen/include/asm-arm/domain.h       |   2 +
 xen/include/asm-x86/domain.h       |   2 +
 xen/include/asm-x86/hvm/hvm.h      |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  10 ++
 xen/include/asm-x86/hvm/vmx/vmx.h  |   4 +
 9 files changed, 220 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c957610..015c35b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6817,6 +6817,12 @@ bool_t altp2m_vcpu_emulate_ve(struct vcpu *v)
     return 0;
 }
 
+void arch_vcpu_block(struct vcpu *v)
+{
+    if ( v->arch.vcpu_block )
+        v->arch.vcpu_block(v);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 5f67797..5abe960 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -661,6 +661,8 @@ int vmx_cpu_up(void)
     if ( cpu_has_vmx_vpid )
         vpid_sync_all();
 
+    vmx_pi_per_cpu_init(cpu);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e448b31..09c9c08 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -83,7 +83,132 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
 static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
 
+/*
+ * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
+ * can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
+static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
+
 uint8_t __read_mostly posted_intr_vector;
+uint8_t __read_mostly pi_wakeup_vector;
+
+void vmx_pi_per_cpu_init(unsigned int cpu)
+{
+    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
+    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
+}
+
+void vmx_vcpu_block(struct vcpu *v)
+{
+    unsigned long flags;
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !has_arch_pdevs(v->domain) )
+        return;
+
+    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
+
+    /*
+     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
+     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
+     * the per-CPU list, we also save it to posted-interrupt descriptor and
+     * make it as the destination of the wake-up notification event.
+     */
+    v->arch.hvm_vmx.pi_block_cpu = v->processor;
+
+    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
+                      v->arch.hvm_vmx.pi_block_cpu), flags);
+    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
+                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
+    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
+                           v->arch.hvm_vmx.pi_block_cpu), flags);
+
+    ASSERT(!pi_test_sn(pi_desc));
+
+    /*
+     * We don't need to set the 'NDST' field, since it should point to
+     * the same pCPU as v->processor.
+     */
+
+    write_atomic(&pi_desc->nv, pi_wakeup_vector);
+}
+
+static void vmx_pi_switch_from(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !has_arch_pdevs(v->domain) || !iommu_intpost ||
+         test_bit(_VPF_blocked, &v->pause_flags) )
+        return;
+
+    /*
+     * The vCPU has been preempted or went to sleep. We don't need to send
+     * notification event to a non-running vcpu, the interrupt information
+     * will be delivered to it before VM-ENTRY when the vcpu is scheduled
+     * to run next time.
+     */
+    pi_set_sn(pi_desc);
+}
+
+static void vmx_pi_switch_to(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !has_arch_pdevs(v->domain) || !iommu_intpost )
+        return;
+
+    if ( x2apic_enabled )
+        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
+    else
+        write_atomic(&pi_desc->ndst,
+                     MASK_INSR(cpu_physical_id(v->processor),
+                     PI_xAPIC_NDST_MASK));
+
+    pi_clear_sn(pi_desc);
+}
+
+static void vmx_pi_state_change(struct vcpu *v)
+{
+    unsigned long flags;
+    unsigned int pi_block_cpu;
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !has_arch_pdevs(v->domain) || !iommu_intpost )
+        return;
+
+    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
+
+    /*
+     * Set 'NV' field back to posted_intr_vector, so the
+     * Posted-Interrupts can be delivered to the vCPU when
+     * it is running in non-root mode.
+     */
+    if ( pi_desc->nv != posted_intr_vector )
+        write_atomic(&pi_desc->nv, posted_intr_vector);
+
+    /* the vCPU is not on any blocking list. */
+    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
+    if ( pi_block_cpu == NR_CPUS )
+        return;
+
+    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
+
+    /*
+     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
+     * removed from the blocking list while we are acquiring the lock.
+     */
+    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
+    {
+        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
+        return;
+    }
+
+    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
+    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
+}
+
 
 static int vmx_domain_initialise(struct domain *d)
 {
@@ -106,10 +231,18 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
     spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
 
+    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
+
+    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
+
     v->arch.schedule_tail    = vmx_do_resume;
     v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
     v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
 
+    if ( iommu_intpost && has_hvm_container_vcpu(v) )
+        v->arch.vcpu_block = vmx_vcpu_block;
+
     if ( (rc = vmx_create_vmcs(v)) != 0 )
     {
         dprintk(XENLOG_WARNING,
@@ -721,6 +854,7 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
     vmx_save_guest_msrs(v);
     vmx_restore_host_msrs();
     vmx_save_dr(v);
+    vmx_pi_switch_from(v);
 }
 
 static void vmx_ctxt_switch_to(struct vcpu *v)
@@ -745,6 +879,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
 
     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
+    vmx_pi_switch_to(v);
 }
 
 
@@ -1975,6 +2110,53 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
+/* Handle VT-d posted-interrupt when VCPU is blocked. */
+static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
+{
+    struct arch_vmx_struct *vmx, *tmp;
+    struct vcpu *v;
+    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
+    struct list_head *blocked_vcpus =
+                       &per_cpu(pi_blocked_vcpu, smp_processor_id());
+    LIST_HEAD(list);
+
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+
+    spin_lock(lock);
+
+    /*
+     * XXX: The length of the list depends on how many vCPU is current
+     * blocked on this specific pCPU. This may hurt the interrupt latency
+     * if the list grows to too many entries.
+     */
+    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
+    {
+        if ( pi_test_on(&vmx->pi_desc) )
+        {
+            list_del(&vmx->pi_blocked_vcpu_list);
+            vmx->pi_block_cpu = NR_CPUS;
+
+            /*
+             * We cannot call vcpu_unblock here, since it also needs
+             * 'pi_blocked_vcpu_lock', we store the vCPUs with ON
+             * set in another list and unblock them after we release
+             * 'pi_blocked_vcpu_lock'.
+             */
+            list_add_tail(&vmx->pi_vcpu_on_set_list, &list);
+        }
+    }
+
+    spin_unlock(lock);
+
+    list_for_each_entry_safe(vmx, tmp, &list, pi_vcpu_on_set_list)
+    {
+        v = container_of(vmx, struct vcpu, arch.hvm_vmx);
+        list_del(&vmx->pi_vcpu_on_set_list);
+        vcpu_unblock(v);
+    }
+}
+
 /* Handle VT-d posted-interrupt when VCPU is running. */
 static void pi_notification_interrupt(struct cpu_user_regs *regs)
 {
@@ -2061,7 +2243,10 @@ const struct hvm_function_table * __init start_vmx(void)
     if ( cpu_has_vmx_posted_intr_processing )
     {
         if ( iommu_intpost )
+        {
             alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
+            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
+        }
         else
             alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
     }
@@ -3515,6 +3700,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
     struct hvm_vcpu_asid *p_asid;
     bool_t need_flush;
 
+    vmx_pi_state_change(curr);
+
     if ( !cpu_has_vmx_vpid )
         goto out;
     if ( nestedhvm_vcpu_in_guestmode(curr) )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 3eefed7..6e5c2f9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -800,11 +800,11 @@ void vcpu_block(void)
 
     set_bit(_VPF_blocked, &v->pause_flags);
 
+    arch_vcpu_block(v);
+
     /* Check for events /after/ blocking: avoids wakeup waiting race. */
     if ( local_events_need_delivery() )
-    {
         clear_bit(_VPF_blocked, &v->pause_flags);
-    }
     else
     {
         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
@@ -837,6 +837,8 @@ static long do_poll(struct sched_poll *sched_poll)
     v->poll_evtchn = -1;
     set_bit(v->vcpu_id, d->poll_mask);
 
+    arch_vcpu_block(v);
+
 #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
     /* Check for events /after/ setting flags: avoids wakeup waiting race. */
     smp_mb();
@@ -854,6 +856,7 @@ static long do_poll(struct sched_poll *sched_poll)
 #endif
 
     rc = 0;
+
     if ( local_events_need_delivery() )
         goto out;
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 56aa208..dee5dd3 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -301,6 +301,8 @@ static inline register_t vcpuid_to_vaffinity(unsigned int vcpuid)
     return vaff;
 }
 
+static inline void arch_vcpu_block(struct vcpu *v) {}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 0fce09e..27ebcc0 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -481,6 +481,8 @@ struct arch_vcpu
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
 
+    void (*vcpu_block) (struct vcpu *);
+
     struct vpmu_struct vpmu;
 
     /* Virtual Machine Extensions */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 3cac64f..0a77998 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -545,6 +545,8 @@ static inline bool_t hvm_altp2m_supported(void)
     return hvm_funcs.altp2m_supported;
 }
 
+void arch_vcpu_block(struct vcpu *v);
+
 #ifndef NDEBUG
 /* Permit use of the Forced Emulation Prefix in HVM guests */
 extern bool_t opt_hvm_fep;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 81c9e63..70f4d0b 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -160,6 +160,16 @@ struct arch_vmx_struct {
     struct page_info     *vmwrite_bitmap;
 
     struct page_info     *pml_pg;
+
+    struct list_head     pi_blocked_vcpu_list;
+    struct list_head     pi_vcpu_on_set_list;
+
+    /*
+     * Before vCPU is blocked, it is added to the global per-cpu list
+     * of 'pi_block_cpu', then VT-d engine can send wakeup notification
+     * event to 'pi_block_cpu' and wakeup the related vCPU.
+     */
+    unsigned int         pi_block_cpu;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 70b254f..2eaea32 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -28,6 +28,8 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/vmx/vmcs.h>
 
+extern uint8_t pi_wakeup_vector;
+
 typedef union {
     struct {
         u64 r       :   1,  /* bit 0 - Read permission */
@@ -557,6 +559,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
+void vmx_pi_per_cpu_init(unsigned int cpu);
+
 /* EPT violation qualifications definitions */
 #define _EPT_READ_VIOLATION         0
 #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 16/17] VT-d: Dump the posted format IRTE
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (14 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-17 15:55   ` Jan Beulich
  2015-11-24  7:50   ` Tian, Kevin
  2015-11-03  8:43 ` [PATCH v9 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
  2015-11-10  7:33 ` [PATCH v9 00/17] Add VT-d Posted-Interrupts support Wu, Feng
  17 siblings, 2 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Feng Wu

Add the utility to dump the posted format IRTE.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Coding style

v7:
- Remove the two stage loop

v6:
- Fix a typo

v4:
- Newly added

 xen/drivers/passthrough/vtd/utils.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 6daa156..a1b3ebc 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -203,6 +203,9 @@ static void dump_iommu_info(unsigned char key)
             ecap_intr_remap(iommu->ecap) ? "" : "not ",
             (status & DMA_GSTS_IRES) ? " and enabled" : "" );
 
+        printk("  Interrupt Posting: %ssupported.\n",
+               cap_intr_post(iommu->cap) ? "" : "not ");
+
         if ( status & DMA_GSTS_IRES )
         {
             /* Dump interrupt remapping table. */
@@ -213,8 +216,9 @@ static void dump_iommu_info(unsigned char key)
 
             printk("  Interrupt remapping table (nr_entry=%#x. "
                 "Only dump P=1 entries here):\n", nr_entry);
-            printk("       SVT  SQ   SID      DST  V  AVL DLM TM RH DM "
-                   "FPD P\n");
+            printk("R means remapped format, P means posted format.\n");
+            printk("R:       SVT  SQ   SID  V  AVL FPD      DST DLM TM RH DM P\n");
+            printk("P:       SVT  SQ   SID  V  AVL FPD              PDA  URG P\n");
             for ( i = 0; i < nr_entry; i++ )
             {
                 struct iremap_entry *p;
@@ -232,11 +236,21 @@ static void dump_iommu_info(unsigned char key)
 
                 if ( !p->remap.p )
                     continue;
-                printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
-                    "   %x %x\n", i,
-                    p->remap.svt, p->remap.sq, p->remap.sid, p->remap.dst,
-                    p->remap.vector, p->remap.avail, p->remap.dlm, p->remap.tm,
-                    p->remap.rh, p->remap.dm, p->remap.fpd, p->remap.p);
+                if ( !p->remap.im )
+                    printk("R:  %04x:  %x   %x  %04x %02x    %x   %x %08x   %x  %x  %x  %x %x\n",
+                           i,
+                           p->remap.svt, p->remap.sq, p->remap.sid,
+                           p->remap.vector, p->remap.avail, p->remap.fpd,
+                           p->remap.dst, p->remap.dlm, p->remap.tm, p->remap.rh,
+                           p->remap.dm, p->remap.p);
+                else
+                    printk("P:  %04x:  %x   %x  %04x %02x    %x   %x %16lx    %x %x\n",
+                           i,
+                           p->post.svt, p->post.sq, p->post.sid, p->post.vector,
+                           p->post.avail, p->post.fpd,
+                           ((u64)p->post.pda_h << 32) | (p->post.pda_l << 6),
+                           p->post.urg, p->post.p);
+
                 print_cnt++;
             }
             if ( iremap_entries )
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 17/17] Add a command line parameter for VT-d posted-interrupts
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (15 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 16/17] VT-d: Dump the posted format IRTE Feng Wu
@ 2015-11-03  8:43 ` Feng Wu
  2015-11-10  7:33 ` [PATCH v9 00/17] Add VT-d Posted-Interrupts support Wu, Feng
  17 siblings, 0 replies; 45+ messages in thread
From: Feng Wu @ 2015-11-03  8:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu, Jan Beulich

Enable VT-d Posted-Interrupts and add a command line
parameter for it.

CC: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v6:
- Change the default value to 'false' in xen-command-line.markdown

 docs/misc/xen-command-line.markdown | 9 ++++++++-
 xen/drivers/passthrough/iommu.c     | 3 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index a2e427c..ecaf221 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -855,7 +855,7 @@ debug hypervisor only).
 > Default: `new` unless directed-EOI is supported
 
 ### iommu
-> `= List of [ <boolean> | force | required | intremap | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | igfx | verbose | debug ]`
+> `= List of [ <boolean> | force | required | intremap | intpost | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | igfx | verbose | debug ]`
 
 > Sub-options:
 
@@ -882,6 +882,13 @@ debug hypervisor only).
 >> Control the use of interrupt remapping (DMA remapping will always be enabled
 >> if IOMMU functionality is enabled).
 
+> `intpost`
+
+> Default: `false`
+
+>> Control the use of interrupt posting, which depends on the availability of
+>> interrupt remapping.
+
 > `qinval` (VT-d)
 
 > Default: `true`
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 36d5cc0..8d03076 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -38,6 +38,7 @@ static void iommu_dump_p2m_table(unsigned char key);
  *   no-snoop                   Disable VT-d Snoop Control
  *   no-qinval                  Disable VT-d Queued Invalidation
  *   no-intremap                Disable VT-d Interrupt Remapping
+ *   no-intpost                 Disable VT-d Interrupt posting
  */
 custom_param("iommu", parse_iommu_param);
 bool_t __initdata iommu_enable = 1;
@@ -105,6 +106,8 @@ static void __init parse_iommu_param(char *s)
             iommu_qinval = val;
         else if ( !strcmp(s, "intremap") )
             iommu_intremap = val;
+        else if ( !strcmp(s, "intpost") )
+            iommu_intpost = val;
         else if ( !strcmp(s, "debug") )
         {
             iommu_debug = val;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-11-03  8:43 ` [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
@ 2015-11-03  9:07   ` Wu, Feng
  2015-11-10 16:59     ` Dario Faggioli
  2015-11-10 18:14   ` Dario Faggioli
  1 sibling, 1 reply; 45+ messages in thread
From: Wu, Feng @ 2015-11-03  9:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Wu, Feng

BTW, In the previous discussion, we will do the PI state adjustment in vmx_do_resume, however, seems this is not a good place to do this, since this function gets called only if the scheduling occurs, no matter it switches to another vCPU or continue runs the current vCPU. If no scheduling happens during "VM->Xen->VM", vmx-do_resume() will not get called. So I put the PI state adjustment code in vmx_vmenter_helper(), if you have any other good suggestions, please let me know, thanks a lot!

Thanks,
Feng

> -----Original Message-----
> From: Wu, Feng
> Sent: Tuesday, November 3, 2015 4:43 PM
> To: xen-devel@lists.xen.org
> Cc: Wu, Feng <feng.wu@intel.com>; Keir Fraser <keir@xen.org>; Jan Beulich
> <jbeulich@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Tian,
> Kevin <kevin.tian@intel.com>; George Dunlap <george.dunlap@eu.citrix.com>;
> Dario Faggioli <dario.faggioli@citrix.com>
> Subject: [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
> 
> This patch includes the following aspects:
> - Handling logic when vCPU is blocked:
>     * Add a global vector to wake up the blocked vCPU
>       when an interrupt is being posted to it (This part
>       was sugguested by Yang Zhang <yang.z.zhang@intel.com>).
>     * Define two per-cpu variables:
>           1. pi_blocked_vcpu:
>             A list storing the vCPUs which were blocked
>             on this pCPU.
> 
>           2. pi_blocked_vcpu_lock:
>             The spinlock to protect pi_blocked_vcpu.
> 
> - Add the following hooks, this part was suggested
>   by George Dunlap <george.dunlap@eu.citrix.com> and
>   Dario Faggioli <dario.faggioli@citrix.com>.
>     * arch_vcpu_block()
>       Called alled before vcpu is blocking and update the PID
>       (posted-interrupt descriptor).
> 
>     * vmx_pi_switch_from()
>       Called before context switch, we update the PID when the
>       vCPU is preempted or going to sleep.
> 
>     * vmx_pi_switch_to()
>       Called after context switch, we update the PID when the vCPU
>       is going to run.
> 
> - Before VM-entry, check the state of PI descriptor, make sure the
> 'NV' field is set to '&posted_intr_vector' when the guest is running
> in non-root mode. Suggested by Jan Beulich <jbeulich@suse.com>.
> 
> When we handle the lazy context switch for the following two scenarios:
> - Preempted by a tasklet, which uses in an idle context.
> - the prev vcpu is in offline and no new available vcpus in run queue.
> We don't change the 'SN' bit in posted-interrupt descriptor, this
> may incur spurious PI notification events, but since PI notification
> event is only sent when 'ON' is clear, and once the PI notificatoin
> is sent, ON is set by hardware, hence no more notification events
> before 'ON' is clear. Besides that, spurious PI notification events are
> going to happen from time to time in Xen hypervisor, such as, when
> guests trap to Xen and PI notification event happens, there is
> nothing Xen actually needs to do about it, the interrupts will be
> delivered to guest atht the next time we do a VMENTRY.
> 
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
> Suggested-by: Dario Faggioli <dario.faggioli@citrix.com>
> Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v9:
> - Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare()
> - Add vmx_pi_state_change() and call it before VM Entry
> 
> v8:
> - Remove the lazy context switch handling for PI state transition
> - Change PI state in vcpu_block() and do_poll() when the vCPU
>   is going to be blocked
> 
> v7:
> - Merge [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
>   and "[PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked"
>   into this patch, so it is self-contained and more convenient
>   for code review.
> - Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
> - Coding style
> - Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
> - Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
> - Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
> - Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
> - Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
> - Use 'spin_lock' and 'spin_unlock' when the interrupt has been
>   already disabled.
> - Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
> - Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
> - Call .pi_ctxt_switch_to() __context_switch() instead of directly
>   calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
> - Make .pi_block_cpu unsigned int
> - Use list_del() instead of list_del_init()
> - Coding style
> 
> One remaining item in v7:
> Jan has concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi(),
> need Dario or George's input about this.
> 
> v6:
> - Add two static inline functions for pi context switch
> - Fix typos
> 
> v5:
> - Rename arch_vcpu_wake to arch_vcpu_wake_prepare
> - Make arch_vcpu_wake_prepare() inline for ARM
> - Merge the ARM dummy hook with together
> - Changes to some code comments
> - Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if
>   PI is disabled or the vCPU is not in HVM
> - Coding style
> 
> v4:
> - Newly added
> 
> Changlog for "vmx: posted-interrupt handling when vCPU is blocked"
> v6:
> - Fix some typos
> - Ack the interrupt right after the spin_unlock in pi_wakeup_interrupt()
> 
> v4:
> - Use local variables in pi_wakeup_interrupt()
> - Remove vcpu from the blocked list when pi_desc.on==1, this
> - avoid kick vcpu multiple times.
> - Remove tasklet
> 
> v3:
> - This patch is generated by merging the following three patches in v2:
>    [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
>    [RFC v2 10/15] vmx: Define two per-cpu variables
>    [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
> - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
> - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
> - Make pi_wakeup_interrupt() static
> - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
> 
>  xen/arch/x86/hvm/hvm.c             |   6 ++
>  xen/arch/x86/hvm/vmx/vmcs.c        |   2 +
>  xen/arch/x86/hvm/vmx/vmx.c         | 187
> +++++++++++++++++++++++++++++++++++++
>  xen/common/schedule.c              |   7 +-
>  xen/include/asm-arm/domain.h       |   2 +
>  xen/include/asm-x86/domain.h       |   2 +
>  xen/include/asm-x86/hvm/hvm.h      |   2 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  10 ++
>  xen/include/asm-x86/hvm/vmx/vmx.h  |   4 +
>  9 files changed, 220 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index c957610..015c35b 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -6817,6 +6817,12 @@ bool_t altp2m_vcpu_emulate_ve(struct vcpu *v)
>      return 0;
>  }
> 
> +void arch_vcpu_block(struct vcpu *v)
> +{
> +    if ( v->arch.vcpu_block )
> +        v->arch.vcpu_block(v);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 5f67797..5abe960 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -661,6 +661,8 @@ int vmx_cpu_up(void)
>      if ( cpu_has_vmx_vpid )
>          vpid_sync_all();
> 
> +    vmx_pi_per_cpu_init(cpu);
> +
>      return 0;
>  }
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index e448b31..09c9c08 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,132 @@ static int vmx_msr_write_intercept(unsigned int msr,
> uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
>  static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
> 
> +/*
> + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
> + * can find which vCPU should be woken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
> +
> +void vmx_pi_per_cpu_init(unsigned int cpu)
> +{
> +    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +}
> +
> +void vmx_vcpu_block(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> +    /*
> +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> +     * make it as the destination of the wake-up notification event.
> +     */
> +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +    ASSERT(!pi_test_sn(pi_desc));
> +
> +    /*
> +     * We don't need to set the 'NDST' field, since it should point to
> +     * the same pCPU as v->processor.
> +     */
> +
> +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> +}
> +
> +static void vmx_pi_switch_from(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) || !iommu_intpost ||
> +         test_bit(_VPF_blocked, &v->pause_flags) )
> +        return;
> +
> +    /*
> +     * The vCPU has been preempted or went to sleep. We don't need to send
> +     * notification event to a non-running vcpu, the interrupt information
> +     * will be delivered to it before VM-ENTRY when the vcpu is scheduled
> +     * to run next time.
> +     */
> +    pi_set_sn(pi_desc);
> +}
> +
> +static void vmx_pi_switch_to(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) || !iommu_intpost )
> +        return;
> +
> +    if ( x2apic_enabled )
> +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> +    else
> +        write_atomic(&pi_desc->ndst,
> +                     MASK_INSR(cpu_physical_id(v->processor),
> +                     PI_xAPIC_NDST_MASK));
> +
> +    pi_clear_sn(pi_desc);
> +}
> +
> +static void vmx_pi_state_change(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    unsigned int pi_block_cpu;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) || !iommu_intpost )
> +        return;
> +
> +    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> +
> +    /*
> +     * Set 'NV' field back to posted_intr_vector, so the
> +     * Posted-Interrupts can be delivered to the vCPU when
> +     * it is running in non-root mode.
> +     */
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */
> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +
> +    /*
> +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
> +     * removed from the blocking list while we are acquiring the lock.
> +     */
> +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> +    {
> +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> +        return;
> +    }
> +
> +    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> +}
> +
> 
>  static int vmx_domain_initialise(struct domain *d)
>  {
> @@ -106,10 +231,18 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> 
>      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
> 
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
> +
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +
>      v->arch.schedule_tail    = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> 
> +    if ( iommu_intpost && has_hvm_container_vcpu(v) )
> +        v->arch.vcpu_block = vmx_vcpu_block;
> +
>      if ( (rc = vmx_create_vmcs(v)) != 0 )
>      {
>          dprintk(XENLOG_WARNING,
> @@ -721,6 +854,7 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
>      vmx_save_guest_msrs(v);
>      vmx_restore_host_msrs();
>      vmx_save_dr(v);
> +    vmx_pi_switch_from(v);
>  }
> 
>  static void vmx_ctxt_switch_to(struct vcpu *v)
> @@ -745,6 +879,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
> 
>      vmx_restore_guest_msrs(v);
>      vmx_restore_dr(v);
> +    vmx_pi_switch_to(v);
>  }
> 
> 
> @@ -1975,6 +2110,53 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
>      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
>  };
> 
> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct arch_vmx_struct *vmx, *tmp;
> +    struct vcpu *v;
> +    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
> +    struct list_head *blocked_vcpus =
> +                       &per_cpu(pi_blocked_vcpu, smp_processor_id());
> +    LIST_HEAD(list);
> +
> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +
> +    spin_lock(lock);
> +
> +    /*
> +     * XXX: The length of the list depends on how many vCPU is current
> +     * blocked on this specific pCPU. This may hurt the interrupt latency
> +     * if the list grows to too many entries.
> +     */
> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> +    {
> +        if ( pi_test_on(&vmx->pi_desc) )
> +        {
> +            list_del(&vmx->pi_blocked_vcpu_list);
> +            vmx->pi_block_cpu = NR_CPUS;
> +
> +            /*
> +             * We cannot call vcpu_unblock here, since it also needs
> +             * 'pi_blocked_vcpu_lock', we store the vCPUs with ON
> +             * set in another list and unblock them after we release
> +             * 'pi_blocked_vcpu_lock'.
> +             */
> +            list_add_tail(&vmx->pi_vcpu_on_set_list, &list);
> +        }
> +    }
> +
> +    spin_unlock(lock);
> +
> +    list_for_each_entry_safe(vmx, tmp, &list, pi_vcpu_on_set_list)
> +    {
> +        v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +        list_del(&vmx->pi_vcpu_on_set_list);
> +        vcpu_unblock(v);
> +    }
> +}
> +
>  /* Handle VT-d posted-interrupt when VCPU is running. */
>  static void pi_notification_interrupt(struct cpu_user_regs *regs)
>  {
> @@ -2061,7 +2243,10 @@ const struct hvm_function_table * __init
> start_vmx(void)
>      if ( cpu_has_vmx_posted_intr_processing )
>      {
>          if ( iommu_intpost )
> +        {
>              alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
> +            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> +        }
>          else
>              alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
>      }
> @@ -3515,6 +3700,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs
> *regs)
>      struct hvm_vcpu_asid *p_asid;
>      bool_t need_flush;
> 
> +    vmx_pi_state_change(curr);
> +
>      if ( !cpu_has_vmx_vpid )
>          goto out;
>      if ( nestedhvm_vcpu_in_guestmode(curr) )
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 3eefed7..6e5c2f9 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -800,11 +800,11 @@ void vcpu_block(void)
> 
>      set_bit(_VPF_blocked, &v->pause_flags);
> 
> +    arch_vcpu_block(v);
> +
>      /* Check for events /after/ blocking: avoids wakeup waiting race. */
>      if ( local_events_need_delivery() )
> -    {
>          clear_bit(_VPF_blocked, &v->pause_flags);
> -    }
>      else
>      {
>          TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
> @@ -837,6 +837,8 @@ static long do_poll(struct sched_poll *sched_poll)
>      v->poll_evtchn = -1;
>      set_bit(v->vcpu_id, d->poll_mask);
> 
> +    arch_vcpu_block(v);
> +
>  #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
>      /* Check for events /after/ setting flags: avoids wakeup waiting race. */
>      smp_mb();
> @@ -854,6 +856,7 @@ static long do_poll(struct sched_poll *sched_poll)
>  #endif
> 
>      rc = 0;
> +
>      if ( local_events_need_delivery() )
>          goto out;
> 
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 56aa208..dee5dd3 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -301,6 +301,8 @@ static inline register_t vcpuid_to_vaffinity(unsigned int
> vcpuid)
>      return vaff;
>  }
> 
> +static inline void arch_vcpu_block(struct vcpu *v) {}
> +
>  #endif /* __ASM_DOMAIN_H__ */
> 
>  /*
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 0fce09e..27ebcc0 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -481,6 +481,8 @@ struct arch_vcpu
>      void (*ctxt_switch_from) (struct vcpu *);
>      void (*ctxt_switch_to) (struct vcpu *);
> 
> +    void (*vcpu_block) (struct vcpu *);
> +
>      struct vpmu_struct vpmu;
> 
>      /* Virtual Machine Extensions */
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 3cac64f..0a77998 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -545,6 +545,8 @@ static inline bool_t hvm_altp2m_supported(void)
>      return hvm_funcs.altp2m_supported;
>  }
> 
> +void arch_vcpu_block(struct vcpu *v);
> +
>  #ifndef NDEBUG
>  /* Permit use of the Forced Emulation Prefix in HVM guests */
>  extern bool_t opt_hvm_fep;
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-
> x86/hvm/vmx/vmcs.h
> index 81c9e63..70f4d0b 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -160,6 +160,16 @@ struct arch_vmx_struct {
>      struct page_info     *vmwrite_bitmap;
> 
>      struct page_info     *pml_pg;
> +
> +    struct list_head     pi_blocked_vcpu_list;
> +    struct list_head     pi_vcpu_on_set_list;
> +
> +    /*
> +     * Before vCPU is blocked, it is added to the global per-cpu list
> +     * of 'pi_block_cpu', then VT-d engine can send wakeup notification
> +     * event to 'pi_block_cpu' and wakeup the related vCPU.
> +     */
> +    unsigned int         pi_block_cpu;
>  };
> 
>  int vmx_create_vmcs(struct vcpu *v);
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-
> x86/hvm/vmx/vmx.h
> index 70b254f..2eaea32 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -28,6 +28,8 @@
>  #include <asm/hvm/trace.h>
>  #include <asm/hvm/vmx/vmcs.h>
> 
> +extern uint8_t pi_wakeup_vector;
> +
>  typedef union {
>      struct {
>          u64 r       :   1,  /* bit 0 - Read permission */
> @@ -557,6 +559,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
>  void free_p2m_hap_data(struct p2m_domain *p2m);
>  void p2m_init_hap_data(struct p2m_domain *p2m);
> 
> +void vmx_pi_per_cpu_init(unsigned int cpu);
> +
>  /* EPT violation qualifications definitions */
>  #define _EPT_READ_VIOLATION         0
>  #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
> --
> 2.1.0

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 00/17] Add VT-d Posted-Interrupts support
  2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (16 preceding siblings ...)
  2015-11-03  8:43 ` [PATCH v9 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
@ 2015-11-10  7:33 ` Wu, Feng
  2015-11-10 10:39   ` Jan Beulich
  2015-11-17 15:39   ` Jan Beulich
  17 siblings, 2 replies; 45+ messages in thread
From: Wu, Feng @ 2015-11-10  7:33 UTC (permalink / raw)
  To: xen-devel
  Cc: george.dunlap, dario.faggioli, Wu, Feng, Tian, Kevin,
	Jan Beulich (JBeulich@suse.com)

Thanks for your effort on this series and kindly ping..

Thanks,
Feng

> -----Original Message-----
> From: Wu, Feng
> Sent: Tuesday, November 3, 2015 4:43 PM
> To: xen-devel@lists.xen.org
> Cc: Wu, Feng <feng.wu@intel.com>
> Subject: [PATCH v9 00/17] Add VT-d Posted-Interrupts support
> 
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> You can find the VT-d Posted-Interrtups Spec. in the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-
> technology/vt-directed-io-spec.html
> 
> v9:
> - [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design
>     * Make the description more generic.
> 
> - [PATCH v9 02/17] Add cmpxchg16b support for x86-64
>     * Make the *ptr operand an input and output.
> 
> - [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
>     * Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare().
>     * Add vmx_pi_state_change() and call it before VM Entry.
> 
> Feng Wu (17):
>  r   VT-d Posted-intterrupt (PI) design
>      Add cmpxchg16b support for x86-64
>  ra  iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
>  r   vt-d: VT-d Posted-Interrupts feature detection
>  ra  vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>  r   vmx: Add some helper functions for Posted-Interrupts
>  ra  vmx: Initialize VT-d Posted-Interrupts Descriptor
>  r   vmx: Suppress posting interrupts when 'SN' is set
>  r   VT-d: Remove pointless casts
>   a  vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
>  r   vt-d: Add API to update IRTE when VT-d PI is used
>   a  x86: move some APIC related macros to apicdef.h
>   a  Update IRTE according to guest interrupt config changes
>   a  vmx: Properly handle notification event when vCPU is running
>      vmx: VT-d posted-interrupt core logic handling
>      VT-d: Dump the posted format IRTE
>  ra  Add a command line parameter for VT-d posted-interrupts
> 
>  r = has been 'Reviewed-by'
>  a = has been 'Acked-by'
> 
>  docs/misc/vtd-pi.txt                   | 329 +++++++++++++++++++++++++++++++++
>  docs/misc/xen-command-line.markdown    |   9 +-
>  xen/arch/x86/hvm/hvm.c                 |   6 +
>  xen/arch/x86/hvm/vlapic.c              |   5 -
>  xen/arch/x86/hvm/vmx/vmcs.c            |  24 +++
>  xen/arch/x86/hvm/vmx/vmx.c             | 270 ++++++++++++++++++++++++++-
>  xen/common/schedule.c                  |   7 +-
>  xen/drivers/passthrough/io.c           | 123 +++++++++++-
>  xen/drivers/passthrough/iommu.c        |  16 +-
>  xen/drivers/passthrough/vtd/intremap.c | 212 ++++++++++++++++-----
>  xen/drivers/passthrough/vtd/iommu.c    |  14 +-
>  xen/drivers/passthrough/vtd/iommu.h    |  51 +++--
>  xen/drivers/passthrough/vtd/utils.c    |  40 ++--
>  xen/include/asm-arm/domain.h           |   2 +
>  xen/include/asm-x86/apicdef.h          |   3 +
>  xen/include/asm-x86/domain.h           |   2 +
>  xen/include/asm-x86/hvm/hvm.h          |   2 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h     |  24 ++-
>  xen/include/asm-x86/hvm/vmx/vmx.h      |  27 +++
>  xen/include/asm-x86/iommu.h            |   2 +
>  xen/include/asm-x86/x86_64/system.h    |  33 ++++
>  xen/include/xen/iommu.h                |   2 +-
>  22 files changed, 1113 insertions(+), 90 deletions(-)
>  create mode 100644 docs/misc/vtd-pi.txt
> 
> --
> 2.1.0

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 00/17] Add VT-d Posted-Interrupts support
  2015-11-10  7:33 ` [PATCH v9 00/17] Add VT-d Posted-Interrupts support Wu, Feng
@ 2015-11-10 10:39   ` Jan Beulich
  2015-11-17 15:39   ` Jan Beulich
  1 sibling, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2015-11-10 10:39 UTC (permalink / raw)
  To: Feng Wu; +Cc: george.dunlap, dario.faggioli, Kevin Tian, xen-devel

>>> On 10.11.15 at 08:33, <feng.wu@intel.com> wrote:
> Thanks for your effort on this series and kindly ping..

Don't worry, it hasn't been forgotten. But getting to it will take time.

Jan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design
  2015-11-03  8:43 ` [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
@ 2015-11-10 14:52   ` Dario Faggioli
  2015-11-24  6:34     ` Tian, Kevin
  2015-11-24  6:34   ` Tian, Kevin
  1 sibling, 1 reply; 45+ messages in thread
From: Dario Faggioli @ 2015-11-10 14:52 UTC (permalink / raw)
  To: Feng Wu, xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Jan Beulich, Yang Zhang


[-- Attachment #1.1: Type: text/plain, Size: 4389 bytes --]

On Tue, 2015-11-03 at 16:43 +0800, Feng Wu wrote:

> diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
> new file mode 100644
> index 0000000..f9b4637
> --- /dev/null
> +++ b/docs/misc/vtd-pi.txt

> +With VT-d Posted-interrupt we can get the following advantages:
> +- Direct delivery of external interrupts to running vCPUs without
> VMM
> +intervention
> +- Decrease the interrupt migration complexity. On vCPU migration,
> software
> +can atomically co-migrate all interrupts targeting the migrating
> vCPU. For
> +virtual machines with assigned devices, migrating a vCPU across
> pCPUs
> +either incurs the overhead of forwarding interrupts in software
> (e.g. via VMM
> +generated IPIs), or complexity to independently migrate each
> interrupt targeting
> +the vCPU to the new pCPU. However, after enabling VT-d PI, the
> destination vCPU
> +of an external interrupt from assigned devices is stored in the IRTE
> (i.e.
> +Posted-interrupt Descriptor Address), when vCPU is migrated to
> another pCPU,
> +we will set this new pCPU in the 'NDST' filed of Posted-interrupt

                                           ^field

> +More information about APICv and CPU-side Posted-interrupt, please
> refer
> +to Chapter 29, and Section 29.6 in the Intel SDM:
> +http://www.intel.com/content/dam/www/public/us/en/documents/manuals/
> 64-ia-32-architectures-software-developer-manual-325462.pdf
> +
ISTR Andrew said something about chapter names and numbers, and about
what to link in these cases, during v8 review.

> +- Update posted-interrupt descriptor during vCPU scheduling
> +
> +The basic idea here is:
> +1. When vCPU is running
> +        - Set 'NV' to 'posted_intr_vector'.
> +        - Clear 'SN' to accept posted-interrupts.
> +        - Set 'NDST' to the pCPU on which the vCPU will be running.
> +2. When vCPU is blocked
> +        - Set 'NV' to ' pi_wakeup_vector ', so we can wake up the
> +          related vCPU when posted-interrupt happens for it.
> +          Please refer to the above section about the new global
> vector.
> +        - Clear 'SN' to accept posted-interrupts
> +3. When vCPU is preempted or sleeping
> +        - Set 'SN' to suppress non-urgent interrupts
> +          (Currently, we only support non-urgent interrupts)
> +         When vCPU is preempted or sleep, it doesn't need to accept
> +         posted-interrupt notification event since we don't change
> the behavior
> +         of scheduler when the interrupt occurs, we still need wait
> for the next
> +         scheduling of the vCPU. When external interrupts from
> assigned devices occur,
> +         the interrupts are recorded in PIR, and will be synced to
> IRR before VM-Entry.
> +        - Set 'NV' to 'posted_intr_vector'.
> +
> +- How to wakeup blocked vCPU when an interrupt is posted for it
> (wakeup notification handler).
> +
>
> [...]
>
> +Here are what we do for the blocked vCPU:
> +1. Define a per-cpu list 'pi_blocked_vcpu', which stored the blocked
> +vCPU on the pCPU.
> +2. When the vCPU is going to block, insert the vCPU
> +to the per-cpu list belonging to the pCPU it was running.
> +3. When the vCPU is unblocked, remove the vCPU from the related pCPU
> list.
> +
> +In the handler of 'pi_wakeup_vector', we do:
> +1. Get the physical CPU.
> +2. Iterate the list 'pi_blocked_vcpu' of the current pCPU, if 'ON'
> is set,
> +we unblock the associated vCPU.
> +
This is indeed more general, and the fact that it does no longer
mentions RUNSTATEs, makes it more adherent to the actual implementation
(and hence more useful), so thanks for doing this.

Personally, I'd add a quick comment about how this, despite being
related to blocking and unblocking, happens actually inside VMX
handlers, i.e., (quickly) what is the relationship within these sets of
events (blocking/unblocking VMENTER/EXIT) and why it is ok to do things
like they are done.

I'm not too fussed about this, though. So, if others don't think
something like this is necessary, I'm fine with what you have here.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-11-03  9:07   ` Wu, Feng
@ 2015-11-10 16:59     ` Dario Faggioli
  2015-11-11  1:43       ` Wu, Feng
  0 siblings, 1 reply; 45+ messages in thread
From: Dario Faggioli @ 2015-11-10 16:59 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: George Dunlap, Andrew Cooper, Tian, Kevin, Keir Fraser, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 2136 bytes --]

On Tue, 2015-11-03 at 09:07 +0000, Wu, Feng wrote:
> BTW, In the previous discussion, we will do the PI state adjustment
> in vmx_do_resume, however, seems this is not a good place to do this,
> since this function gets called only if the scheduling occurs, no
> matter it switches to another vCPU or continue runs the current vCPU.
> If no scheduling happens during "VM->Xen->VM", vmx-do_resume() will
> not get called. 
>
Mmm... When I first read this, it seemed to me to be a good thing, and
a reason for actually putting your logic in there (that would avoid
paying the price of going through it during every VMENTRY, which you
were yourself hesitant about)!

So, maybe I'm missing/misremembering something. Just to be sure, can
you tell me...

> So I put the PI state adjustment code in vmx_vmenter_helper(), if you
> have any other good suggestions, please let me know, thanks a lot!
> 
... what is the reason(s) why you need to do the update even if no
scheduling happened?

Looking at the code again, I think one reason may be to cope with when
vcpu_block() is called, but then local_events_need_delivery() returns
true, as shown here below:

void vcpu_block(void)
{
    struct vcpu *v = current;

    set_bit(_VPF_blocked, &v->pause_flags);

    arch_vcpu_block(v);                             // nv <--- pi_wakeup_vector

    /* Check for events /after/ blocking: avoids wakeup waiting race. */
    if ( local_events_need_delivery() )             // <=== TRUE
        clear_bit(_VPF_blocked, &v->pause_flags);
                                                    // we want nv == posted_intr_vector,
                                                       so we need vmx_pi_state_change()
                                                       to run, even if we're not scheduling
}

Is this the case?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-11-03  8:43 ` [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
  2015-11-03  9:07   ` Wu, Feng
@ 2015-11-10 18:14   ` Dario Faggioli
  1 sibling, 0 replies; 45+ messages in thread
From: Dario Faggioli @ 2015-11-10 18:14 UTC (permalink / raw)
  To: Feng Wu, xen-devel
  Cc: George Dunlap, Andrew Cooper, Kevin Tian, Keir Fraser, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 12274 bytes --]

On Tue, 2015-11-03 at 16:43 +0800, Feng Wu wrote:
> This patch includes the following aspects:
> - Handling logic when vCPU is blocked:
>     * Add a global vector to wake up the blocked vCPU
>       when an interrupt is being posted to it (This part
>       was sugguested by Yang Zhang <yang.z.zhang@intel.com>).
>     * Define two per-cpu variables:
>           1. pi_blocked_vcpu:
>             A list storing the vCPUs which were blocked
>             on this pCPU.
> 
>           2. pi_blocked_vcpu_lock:
>             The spinlock to protect pi_blocked_vcpu.
> 
> - Add the following hooks, this part was suggested
>   by George Dunlap <george.dunlap@eu.citrix.com> and
>   Dario Faggioli <dario.faggioli@citrix.com>.
>     * arch_vcpu_block()
>       Called alled before vcpu is blocking and update the PID
>       (posted-interrupt descriptor).
> 
>     * vmx_pi_switch_from()
>       Called before context switch, we update the PID when the
>       vCPU is preempted or going to sleep.
> 
>     * vmx_pi_switch_to()
>       Called after context switch, we update the PID when the vCPU
>       is going to run.
> 
> - Before VM-entry, check the state of PI descriptor, make sure the
> 'NV' field is set to '&posted_intr_vector' when the guest is running
> in non-root mode. Suggested by Jan Beulich <jbeulich@suse.com>.
> 
So, TBH, I never felt in love with this changelog. I'm saying this now
because I'm starting to loose track of things, and the changelog is not
helpful as it could be, in explaining to me what this patch (and, in
this case, this particular version of the patch) does.

In fact, above, you're listing the functions and the data structure you
are adding, and some of the places from where they're called/used.
That, however, can all be figured out from the code.

I think this would want to become a bit more high level, and try to
describe the _reason_why_ the various bits and pieces have been put in
the shape they are. For instance, although you added a paragraph about
the VMENTRY check, it is not clear to me, by just reading this
description, that it is meant at avoiding having to add much more
hooks, as well as, as another example, what you put in the other email
about it needing to be called on ever VMENTRY (the "we can't use
vmx_do_resume() thing).

I'm not saying you should rewrite another design document in this
changelog, and I know it's hard to pick the best structure and content,
but, really, this isn't ideal.

Also, explicitly crediting people who suggested the various items is
not helpful too. In 5 years someone (either of us or new), won't care
much who actually came up with the idea of having hooks. What he'll
care, is for the changelog to provide guidance throughout what the code
does, and why it's been done that way. There is no trace of a similar
crediting scheme (which, in principle, it would apply to any, o at
least to major, comments you get at each review) in our history, and it
makes the changelog heavier to read, so I'd cut it all off.

> ---
> v9:
> - Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare()
> - Add vmx_pi_state_change() and call it before VM Entry
> 
And this is why I said I'm losing track of what was your actual aim for
this round.

Since when the possibility of moving stuff to VMX helpers came out

Jan said, during v7 review (about the need to undo what
arch_vcpu_block() does in case local_evnts_need_delivery() is true):

1) "Couldn't this be taken care of by, if necessary, adjusting PI state
    in vmx_do_resume()?"

And George said:

2) "we could just clear SN on every vmexit, and set SN and NDST on 
    every vmentry, couldn't we?  Then we wouldn't need hooks on the 
    context switch path at all (which are only there to catch running
    -> runnable and runnable -> running) -- we could just have the 
    block/unblock hooks (to change NV and add / remove things from the 
    list)."

Then, during v8, review comments focused mostly on 1), because we felt
the need to try to get rid of the block-cancelling hook.

What you seem to have done in this patch, is something in the middle.
In fact, AFAIU:
 - the block-cancelling hook is no longer there,
 - the wakeup hooks are gone as well,
 - the context switch hooks are still there.

Now, I'm not saying that this is not satisfactory as a result, I'm
saying it's hard to keep track of what's going on and to figure out
what your aim was in this round (and hence check the aim against the
review comments, and the implementation against the aim).

If describing that does not fit in the changelog (e.g., because it
would be something about how the code is evolving, as it probably is),
then put it in the cover letter (this very patch is almost the only
thing being changed significantly in the series anyway), or in a few
paragraph, here, after the "---", above the v9 bulleted list of
changes.

> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index e448b31..09c9c08 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,132 @@ static int vmx_msr_write_intercept(unsigned int
> msr, uint64_t msr_content);

> +void vmx_vcpu_block(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> +    /*
> +     * The vCPU is blocking, we need to add it to one of the per
> pCPU lists.
> +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use
> it for
> +     * the per-CPU list, we also save it to posted-interrupt
> descriptor and
> +     * make it as the destination of the wake-up notification event.
> +     */
> +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                  &per_cpu(pi_blocked_vcpu, v
> ->arch.hvm_vmx.pi_block_cpu));
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +    ASSERT(!pi_test_sn(pi_desc));
> +
> +    /*
> +     * We don't need to set the 'NDST' field, since it should point
> to
> +     * the same pCPU as v->processor.
> +     */
> +
> +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
>
Glad to see the 'do { } while ( cpmxchg(xxx) );' not being necessary
any longer! :-)

> +}
> +
> +static void vmx_pi_switch_from(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) || !iommu_intpost ||
> +         test_bit(_VPF_blocked, &v->pause_flags) )
> +        return;
> +
The '!iommu_intpost' is probably the quickest condition to check. If it
is, it should be the first thing checked.

The same applies in a few other places.

> +static void vmx_pi_state_change(struct vcpu *v)
> +{
FWIW, I don't like the name. _state_change to what?

Looking at the code, it seems to me that what it does it to always sets
the PI descriptor to posted_intr_vector, and removes the vCPU from a
blocking list.

The reason why it does so is because we don't want the vCPU to run
while being in a blocked list, with posted_wakeup_vector in its PI
descriptor.

I'd go for a name that better reflect either the operations being done
or the aim.

> +    unsigned long flags;
> +    unsigned int pi_block_cpu;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) || !iommu_intpost )
> +        return;
> +
> +    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> +
> +    /*
> +     * Set 'NV' field back to posted_intr_vector, so the
> +     * Posted-Interrupts can be delivered to the vCPU when
> +     * it is running in non-root mode.
> +     */
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */
> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> +
> +    /*
> +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU
> was
> +     * removed from the blocking list while we are acquiring the
> lock.
> +     */
> +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> +    {
> +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> pi_block_cpu), flags);
> +        return;
> +    }
> +
> +    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> pi_block_cpu), flags);
> +}
> +

> @@ -106,10 +231,18 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  
>      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>  
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
> +
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +
>      v->arch.schedule_tail    = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>  
> +    if ( iommu_intpost && has_hvm_container_vcpu(v) )
> +        v->arch.vcpu_block = vmx_vcpu_block;
> +
Genuine question: is it really possible to be here, in
vmx_vcpu_initialise(), with has_hvm_container_vcpu()==false ?

> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct arch_vmx_struct *vmx, *tmp;
> +    struct vcpu *v;
> +    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock,
> smp_processor_id());
> +    struct list_head *blocked_vcpus =
> +                       &per_cpu(pi_blocked_vcpu,
> smp_processor_id());
> +    LIST_HEAD(list);
> +
> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +
> +    spin_lock(lock);
> +
> +    /*
> +     * XXX: The length of the list depends on how many vCPU is
> current
> +     * blocked on this specific pCPU. This may hurt the interrupt
> latency
> +     * if the list grows to too many entries.
> +     */
> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus,
> pi_blocked_vcpu_list)
> +    {
> +        if ( pi_test_on(&vmx->pi_desc) )
> +        {
> +            list_del(&vmx->pi_blocked_vcpu_list);
> +            vmx->pi_block_cpu = NR_CPUS;
> +
> +            /*
> +             * We cannot call vcpu_unblock here, since it also needs
> +             * 'pi_blocked_vcpu_lock', we store the vCPUs with ON
> +             * set in another list and unblock them after we release
> +             * 'pi_blocked_vcpu_lock'.
> +             */
>
Is this still true? If yes, is it evident from this patch? I'm asking
because I don't see that here.

I see that it was like that in v7, before of the wakeup hooks... Is
that still the case?

> +            list_add_tail(&vmx->pi_vcpu_on_set_list, &list);
> +        }
> +    }
> +
> +    spin_unlock(lock);
> +
> +    list_for_each_entry_safe(vmx, tmp, &list, pi_vcpu_on_set_list)
> +    {
> +        v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +        list_del(&vmx->pi_vcpu_on_set_list);
> +        vcpu_unblock(v);
> +    }
> +}
> +

> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 3eefed7..6e5c2f9 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c

> @@ -854,6 +856,7 @@ static long do_poll(struct sched_poll
> *sched_poll)
>  #endif
>  
>      rc = 0;
> +
>      if ( local_events_need_delivery() )
>          goto out;
> 
Just don't add this. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-11-10 16:59     ` Dario Faggioli
@ 2015-11-11  1:43       ` Wu, Feng
  2015-11-11 13:29         ` Wu, Feng
  0 siblings, 1 reply; 45+ messages in thread
From: Wu, Feng @ 2015-11-11  1:43 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Andrew Cooper, Jan Beulich,
	Keir Fraser



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Wednesday, November 11, 2015 12:59 AM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>; George
> Dunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Jan Beulich <jbeulich@suse.com>
> Subject: Re: [Xen-devel] [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic
> handling
> 
> On Tue, 2015-11-03 at 09:07 +0000, Wu, Feng wrote:
> > BTW, In the previous discussion, we will do the PI state adjustment
> > in vmx_do_resume, however, seems this is not a good place to do this,
> > since this function gets called only if the scheduling occurs, no
> > matter it switches to another vCPU or continue runs the current vCPU.
> > If no scheduling happens during "VM->Xen->VM", vmx-do_resume() will
> > not get called.
> >
> Mmm... When I first read this, it seemed to me to be a good thing, and
> a reason for actually putting your logic in there (that would avoid
> paying the price of going through it during every VMENTRY, which you
> were yourself hesitant about)!
> 
> So, maybe I'm missing/misremembering something. Just to be sure, can
> you tell me...
> 
> > So I put the PI state adjustment code in vmx_vmenter_helper(), if you
> > have any other good suggestions, please let me know, thanks a lot!
> >
> ... what is the reason(s) why you need to do the update even if no
> scheduling happened?
> 
> Looking at the code again, I think one reason may be to cope with when
> vcpu_block() is called, but then local_events_need_delivery() returns
> true, as shown here below:
> 
> void vcpu_block(void)
> {
>     struct vcpu *v = current;
> 
>     set_bit(_VPF_blocked, &v->pause_flags);
> 
>     arch_vcpu_block(v);                             // nv <--- pi_wakeup_vector
> 
>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
>     if ( local_events_need_delivery() )             // <=== TRUE
>         clear_bit(_VPF_blocked, &v->pause_flags);
>                                                     // we want nv == posted_intr_vector,
>                                                        so we need vmx_pi_state_change()
>                                                        to run, even if we're not scheduling
> }
> 
> Is this the case?

Yes, that is the case, after arch_vcpu_block() is called, we don't
cancel it even if there are events to be delivered, so we need to check
whether the NV is the right value, if it isn't, which means after
after arch_vcpu_block(),local_events_need_delivery() check returns
true, so we need to adjust the PI state before VM-Entry. As Jan mentioned
before, we will check the value before actually update it atomically, so
I think it is okay from performance point of view.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-11-11  1:43       ` Wu, Feng
@ 2015-11-11 13:29         ` Wu, Feng
  2015-11-11 14:46           ` Dario Faggioli
  0 siblings, 1 reply; 45+ messages in thread
From: Wu, Feng @ 2015-11-11 13:29 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Andrew Cooper, Jan Beulich,
	Keir Fraser



> -----Original Message-----
> From: Wu, Feng
> Sent: Wednesday, November 11, 2015 9:43 AM
> To: Dario Faggioli <dario.faggioli@citrix.com>; xen-devel@lists.xen.org
> Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>; George
> Dunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Jan Beulich <jbeulich@suse.com>; Wu, Feng
> <feng.wu@intel.com>
> Subject: RE: [Xen-devel] [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic
> handling
> 
> 
> 
> > -----Original Message-----
> > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > Sent: Wednesday, November 11, 2015 12:59 AM
> > To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> > Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>; George
> > Dunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> > <andrew.cooper3@citrix.com>; Jan Beulich <jbeulich@suse.com>
> > Subject: Re: [Xen-devel] [PATCH v9 15/17] vmx: VT-d posted-interrupt core
> logic
> > handling
> >
> > On Tue, 2015-11-03 at 09:07 +0000, Wu, Feng wrote:
> > > BTW, In the previous discussion, we will do the PI state adjustment
> > > in vmx_do_resume, however, seems this is not a good place to do this,
> > > since this function gets called only if the scheduling occurs, no
> > > matter it switches to another vCPU or continue runs the current vCPU.
> > > If no scheduling happens during "VM->Xen->VM", vmx-do_resume() will
> > > not get called.
> > >
> > Mmm... When I first read this, it seemed to me to be a good thing, and
> > a reason for actually putting your logic in there (that would avoid
> > paying the price of going through it during every VMENTRY, which you
> > were yourself hesitant about)!
> >
> > So, maybe I'm missing/misremembering something. Just to be sure, can
> > you tell me...
> >
> > > So I put the PI state adjustment code in vmx_vmenter_helper(), if you
> > > have any other good suggestions, please let me know, thanks a lot!
> > >
> > ... what is the reason(s) why you need to do the update even if no
> > scheduling happened?
> >
> > Looking at the code again, I think one reason may be to cope with when
> > vcpu_block() is called, but then local_events_need_delivery() returns
> > true, as shown here below:
> >
> > void vcpu_block(void)
> > {
> >     struct vcpu *v = current;
> >
> >     set_bit(_VPF_blocked, &v->pause_flags);
> >
> >     arch_vcpu_block(v);                             // nv <--- pi_wakeup_vector
> >
> >     /* Check for events /after/ blocking: avoids wakeup waiting race. */
> >     if ( local_events_need_delivery() )             // <=== TRUE
> >         clear_bit(_VPF_blocked, &v->pause_flags);
> >                                                     // we want nv == posted_intr_vector,
> >                                                        so we need vmx_pi_state_change()
> >                                                        to run, even if we're not scheduling
> > }
> >
> > Is this the case?
> 
> Yes, that is the case, after arch_vcpu_block() is called, we don't
> cancel it even if there are events to be delivered, so we need to check
> whether the NV is the right value, if it isn't, which means after
> after arch_vcpu_block(),local_events_need_delivery() check returns
> true, so we need to adjust the PI state before VM-Entry. As Jan mentioned
> before, we will check the value before actually update it atomically, so
> I think it is okay from performance point of view.

Besides the case you mentioned above, it also covers another case: now
we remove the arch hook in vcpu_wake(), so when vCPU is woken up,
the 'NV' filed is still 'wakeup_vector', hence we need to recover it before
VM Entry.

Thanks,
Feng

> 
> Thanks,
> Feng

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-11-11 13:29         ` Wu, Feng
@ 2015-11-11 14:46           ` Dario Faggioli
  0 siblings, 0 replies; 45+ messages in thread
From: Dario Faggioli @ 2015-11-11 14:46 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: George Dunlap, Andrew Cooper, Tian, Kevin, Keir Fraser, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 1761 bytes --]

On Wed, 2015-11-11 at 13:29 +0000, Wu, Feng wrote:
> 
> > -----Original Message-----
> > From: Wu, Feng

> > Yes, that is the case, after arch_vcpu_block() is called, we don't
> > cancel it even if there are events to be delivered, so we need to
> > check
> > whether the NV is the right value, if it isn't, which means after
> > after arch_vcpu_block(),local_events_need_delivery() check returns
> > true, so we need to adjust the PI state before VM-Entry. As Jan
> > mentioned
> > before, we will check the value before actually update it
> > atomically, so
> > I think it is okay from performance point of view.
> 
> Besides the case you mentioned above, it also covers another case:
> now
> we remove the arch hook in vcpu_wake(), so when vCPU is woken up,
> the 'NV' filed is still 'wakeup_vector', hence we need to recover it
> before
> VM Entry.
> 
Well, sure. That, however, could have been handled by having this code
called from vmx_do_resume(), I think.

In fact, waking up certainly calls for the descriptor to be updated,
but it also involves scheduling (well, at least starting executing
after a wakeup does). So schedule_tail(), and hence vmx_do_resume(),
would at some point be involved.

My question above was about confirming that the (only) reason why you
need that code in vmx_enter_helper() rather than in vmx_do_resume() was
the 'block cancellation issue', not about why you need to change the PI
control block at all. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 00/17] Add VT-d Posted-Interrupts support
  2015-11-10  7:33 ` [PATCH v9 00/17] Add VT-d Posted-Interrupts support Wu, Feng
  2015-11-10 10:39   ` Jan Beulich
@ 2015-11-17 15:39   ` Jan Beulich
  1 sibling, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2015-11-17 15:39 UTC (permalink / raw)
  To: Feng Wu; +Cc: george.dunlap, dario.faggioli, Kevin Tian, xen-devel

>>> On 10.11.15 at 08:33, <feng.wu@intel.com> wrote:
> Thanks for your effort on this series and kindly ping..

Well - now that I got to look at the state of the series, the primary
problem with some more of it to go in than the two patches I pushed
earlier today is that there are VT-d/VMX maintainer acks missing.
And your ping only included Kevin... (And to be honest, I'd expect a
proper review from them, not just acks.)

Jan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 16/17] VT-d: Dump the posted format IRTE
  2015-11-03  8:43 ` [PATCH v9 16/17] VT-d: Dump the posted format IRTE Feng Wu
@ 2015-11-17 15:55   ` Jan Beulich
  2015-11-24  7:50   ` Tian, Kevin
  1 sibling, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2015-11-17 15:55 UTC (permalink / raw)
  To: Feng Wu; +Cc: Yang Zhang, Kevin Tian, xen-devel

>>> On 03.11.15 at 09:43, <feng.wu@intel.com> wrote:
> Add the utility to dump the posted format IRTE.
> 
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design
  2015-11-10 14:52   ` Dario Faggioli
@ 2015-11-24  6:34     ` Tian, Kevin
  0 siblings, 0 replies; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24  6:34 UTC (permalink / raw)
  To: Dario Faggioli, Wu, Feng, xen-devel
  Cc: George Dunlap, Andrew Cooper, Zhang, Yang Z, Keir Fraser, Jan Beulich

> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Tuesday, November 10, 2015 10:52 PM
> >
> > +Here are what we do for the blocked vCPU:
> > +1. Define a per-cpu list 'pi_blocked_vcpu', which stored the blocked
> > +vCPU on the pCPU.
> > +2. When the vCPU is going to block, insert the vCPU
> > +to the per-cpu list belonging to the pCPU it was running.
> > +3. When the vCPU is unblocked, remove the vCPU from the related pCPU
> > list.
> > +
> > +In the handler of 'pi_wakeup_vector', we do:
> > +1. Get the physical CPU.
> > +2. Iterate the list 'pi_blocked_vcpu' of the current pCPU, if 'ON'
> > is set,
> > +we unblock the associated vCPU.
> > +
> This is indeed more general, and the fact that it does no longer
> mentions RUNSTATEs, makes it more adherent to the actual implementation
> (and hence more useful), so thanks for doing this.
> 
> Personally, I'd add a quick comment about how this, despite being
> related to blocking and unblocking, happens actually inside VMX
> handlers, i.e., (quickly) what is the relationship within these sets of
> events (blocking/unblocking VMENTER/EXIT) and why it is ok to do things
> like they are done.
> 
> I'm not too fussed about this, though. So, if others don't think
> something like this is necessary, I'm fine with what you have here.
> 

That type of information is welcomed.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design
  2015-11-03  8:43 ` [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
  2015-11-10 14:52   ` Dario Faggioli
@ 2015-11-24  6:34   ` Tian, Kevin
  2015-11-24  6:49     ` Wu, Feng
  1 sibling, 1 reply; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24  6:34 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: Zhang, Yang Z, Andrew Cooper, Keir Fraser, George Dunlap, Jan Beulich

> From: Wu, Feng
> Sent: Tuesday, November 03, 2015 4:43 PM
> 
> 
> diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
> new file mode 100644
> index 0000000..f9b4637
> --- /dev/null
> +++ b/docs/misc/vtd-pi.txt
> @@ -0,0 +1,329 @@
> +Authors: Feng Wu <feng.wu@intel.com>
> +
> +VT-d Posted-interrupt (PI) design for XEN
> +
> +Background
> +==========
> +With the development of virtualization, there are more and more device
> +assignment requirements. However, today when a VM is running with
> +assigned devices (such as, NIC), external interrupt handling for the assigned
> +devices always needs VMM intervention.
> +
> +VT-d Posted-interrupt is a more enhanced method to handle interrupts
> +in the virtualization environment. Interrupt posting is the process by
> +which an interrupt request is recorded in a memory-resident
> +posted-interrupt-descriptor structure by the root-complex, followed by
> +an optional notification event issued to the CPU complex.

Some clarification required here. Is "Interrupt Posting" only used to
represent the process of VT-d Posted-interrupt, or also for CPU-side
posting through IPI? From above context, and later explanation about
"Processor Support" and "Root-complex Support", looks the former
is true. Then how do we call CPU-side posting?

I think a one-sentence definition about those terms in the start would
be helpful.

> +
> +With VT-d Posted-interrupt we can get the following advantages:
> +- Direct delivery of external interrupts to running vCPUs without VMM
> +intervention
> +- Decrease the interrupt migration complexity. On vCPU migration, software
> +can atomically co-migrate all interrupts targeting the migrating vCPU. For
> +virtual machines with assigned devices, migrating a vCPU across pCPUs
> +either incurs the overhead of forwarding interrupts in software (e.g. via VMM
> +generated IPIs), or complexity to independently migrate each interrupt targeting
> +the vCPU to the new pCPU. However, after enabling VT-d PI, the destination vCPU
> +of an external interrupt from assigned devices is stored in the IRTE (i.e.
> +Posted-interrupt Descriptor Address), when vCPU is migrated to another pCPU,
> +we will set this new pCPU in the 'NDST' filed of Posted-interrupt descriptor, this
> +make the interrupt migration automatic.
> +
> +Here is what Xen currently does for external interrupts from assigned devices:
> +
> +When a VM is running and an external interrupt from an assigned device occurs
> +for it. VM-EXIT happens, then:
> +
> +vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci() -->
> +raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
> +
> +softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
> +
> +dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() --> vmsi_deliver() -->
> +vmsi_inj_irq() --> vlapic_set_irq()
> +
> +vlapic_set_irq() does the following things:
> +1. If CPU-side posted-interrupt is supported, call vmx_deliver_posted_intr() to deliver
> +the virtual interrupt via posted-interrupt infrastructure.
> +2. Else if CPU-side posted-interrupt is not supported, set the related vIRR in vLAPIC
> +page and call vcpu_kick() to kick the related vCPU. Before VM-Entry, vmx_intr_assist()
> +will help to inject the interrupt to guests.
> +
> +However, after VT-d PI is supported, when a guest is running in non-root and an
> +external interrupt from an assigned device occurs for it. No VM-Exit is needed,

". No VM-Exit ..." -> ", no VM-Exit..."

> +the guest can handle this totally in non-root mode, thus avoiding all the above
> +code flow.
> +
> +Posted-interrupt Introduction
> +========================
> +There are two components to the Posted-interrupt architecture:

"to the" -> "in the"


Thanks
Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design
  2015-11-24  6:34   ` Tian, Kevin
@ 2015-11-24  6:49     ` Wu, Feng
  0 siblings, 0 replies; 45+ messages in thread
From: Wu, Feng @ 2015-11-24  6:49 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Wu, Feng, George Dunlap, Andrew Cooper, Jan Beulich, Zhang,
	Yang Z, Keir Fraser



> -----Original Message-----
> From: Tian, Kevin
> Sent: Tuesday, November 24, 2015 2:35 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z <yang.z.zhang@intel.com>; Jan Beulich
> <jbeulich@suse.com>; Keir Fraser <keir@xen.org>; Andrew Cooper
> <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@eu.citrix.com>
> Subject: RE: [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design
> 
> > From: Wu, Feng
> > Sent: Tuesday, November 03, 2015 4:43 PM
> >
> >
> > diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
> > new file mode 100644
> > index 0000000..f9b4637
> > --- /dev/null
> > +++ b/docs/misc/vtd-pi.txt
> > @@ -0,0 +1,329 @@
> > +Authors: Feng Wu <feng.wu@intel.com>
> > +
> > +VT-d Posted-interrupt (PI) design for XEN
> > +
> > +Background
> > +==========
> > +With the development of virtualization, there are more and more device
> > +assignment requirements. However, today when a VM is running with
> > +assigned devices (such as, NIC), external interrupt handling for the
> assigned
> > +devices always needs VMM intervention.
> > +
> > +VT-d Posted-interrupt is a more enhanced method to handle interrupts
> > +in the virtualization environment. Interrupt posting is the process by
> > +which an interrupt request is recorded in a memory-resident
> > +posted-interrupt-descriptor structure by the root-complex, followed by
> > +an optional notification event issued to the CPU complex.
> 
> Some clarification required here. Is "Interrupt Posting" only used to
> represent the process of VT-d Posted-interrupt, or also for CPU-side
> posting through IPI? From above context, and later explanation about
> "Processor Support" and "Root-complex Support", looks the former
> is true. Then how do we call CPU-side posting?

Thanks for the comments. "Interrupt posting" in fact contains vt-d side
and cpu side. So I will reword this to make it clear.

Thanks,
Feng

> 
> I think a one-sentence definition about those terms in the start would
> be helpful.
> 
> > +
> > +With VT-d Posted-interrupt we can get the following advantages:
> > +- Direct delivery of external interrupts to running vCPUs without VMM
> > +intervention
> > +- Decrease the interrupt migration complexity. On vCPU migration,
> software
> > +can atomically co-migrate all interrupts targeting the migrating vCPU. For
> > +virtual machines with assigned devices, migrating a vCPU across pCPUs
> > +either incurs the overhead of forwarding interrupts in software (e.g. via
> VMM
> > +generated IPIs), or complexity to independently migrate each interrupt
> targeting
> > +the vCPU to the new pCPU. However, after enabling VT-d PI, the
> destination vCPU
> > +of an external interrupt from assigned devices is stored in the IRTE (i.e.
> > +Posted-interrupt Descriptor Address), when vCPU is migrated to another
> pCPU,
> > +we will set this new pCPU in the 'NDST' filed of Posted-interrupt
> descriptor, this
> > +make the interrupt migration automatic.
> > +
> > +Here is what Xen currently does for external interrupts from assigned
> devices:
> > +
> > +When a VM is running and an external interrupt from an assigned device
> occurs
> > +for it. VM-EXIT happens, then:
> > +
> > +vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci()
> -->
> > +raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
> > +
> > +softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
> > +
> > +dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() -->
> vmsi_deliver() -->
> > +vmsi_inj_irq() --> vlapic_set_irq()
> > +
> > +vlapic_set_irq() does the following things:
> > +1. If CPU-side posted-interrupt is supported, call
> vmx_deliver_posted_intr() to deliver
> > +the virtual interrupt via posted-interrupt infrastructure.
> > +2. Else if CPU-side posted-interrupt is not supported, set the related vIRR
> in vLAPIC
> > +page and call vcpu_kick() to kick the related vCPU. Before VM-Entry,
> vmx_intr_assist()
> > +will help to inject the interrupt to guests.
> > +
> > +However, after VT-d PI is supported, when a guest is running in non-root
> and an
> > +external interrupt from an assigned device occurs for it. No VM-Exit is
> needed,
> 
> ". No VM-Exit ..." -> ", no VM-Exit..."
> 
> > +the guest can handle this totally in non-root mode, thus avoiding all the
> above
> > +code flow.
> > +
> > +Posted-interrupt Introduction
> > +========================
> > +There are two components to the Posted-interrupt architecture:
> 
> "to the" -> "in the"
> 
> 
> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 04/17] vt-d: VT-d Posted-Interrupts feature detection
  2015-11-03  8:43 ` [PATCH v9 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
@ 2015-11-24  7:14   ` Tian, Kevin
  0 siblings, 0 replies; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24  7:14 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z

> From: Wu, Feng
> Sent: Tuesday, November 03, 2015 4:43 PM
> 
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 06/17] vmx: Add some helper functions for Posted-Interrupts
  2015-11-03  8:43 ` [PATCH v9 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
@ 2015-11-24  7:15   ` Tian, Kevin
  0 siblings, 0 replies; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24  7:15 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

> From: Wu, Feng
> Sent: Tuesday, November 03, 2015 4:43 PM
> 
> This patch adds some helper functions to manipulate the
> Posted-Interrupts Descriptor.
> 
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is set
  2015-11-03  8:43 ` [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
@ 2015-11-24  7:29   ` Tian, Kevin
  2015-11-24  7:46     ` Wu, Feng
  0 siblings, 1 reply; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24  7:29 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

> From: Wu, Feng
> Sent: Tuesday, November 03, 2015 4:43 PM
N' is set
> 
> Currently, we don't support urgent interrupt, all interrupts
> are recognized as non-urgent interrupt, so we cannot send
> posted-interrupt when 'SN' is set.
> 
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>, with one small comment:

> +        do {
> +            /*
> +             * Currently, we don't support urgent interrupt, all
> +             * interrupts are recognized as non-urgent interrupt,
> +             * so we cannot send posted-interrupt when 'SN' is set.
> +             * Besides that, if 'ON' is already set, we cannot set
> +             * posted-interrupts as well.
> +             */

Is above comment accurate. "cannot set" is too strong for 'ON'
already set, right? Ideally there's no correctness issue if you
still deliver another posted-interrupt even when ON is already set.
To me it's more like an optimization then it's cleaner to say 
"we can avoid"...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 09/17] VT-d: Remove pointless casts
  2015-11-03  8:43 ` [PATCH v9 09/17] VT-d: Remove pointless casts Feng Wu
@ 2015-11-24  7:30   ` Tian, Kevin
  0 siblings, 0 replies; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24  7:30 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z

> From: Wu, Feng
> Sent: Tuesday, November 03, 2015 4:43 PM
> 
> Remove pointless casts.
> 
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-11-03  8:43 ` [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
@ 2015-11-24  7:45   ` Tian, Kevin
  2015-11-24  7:54     ` Wu, Feng
  0 siblings, 1 reply; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24  7:45 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: Zhang, Yang Z, Andrew Cooper, Keir Fraser, Jan Beulich

> From: Wu, Feng
> Sent: Tuesday, November 03, 2015 4:43 PM
> 
> This patch adds an API which is used to update the IRTE
> for posted-interrupt when guest changes MSI/MSI-X information.
> 
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> v8:
> - Some minor adjustment
> 
> v7:
> - Remove __uint128_t cast
> - Remove Kevin's Ack due to a bug fix for v6
> - Reword some comments
> - Setup posted IRTE from zeroed structure
> 
> v6:
> - In some error cases, the desc->lock will be unlocked twice, fix it.
> - Coding style fix.
> - Add some comments.
> 
> v5:
> - Make some function parameters const
> - Call "spin_unlock_irq(&desc->lock);" a little eariler
> - Add "ASSERT(spin_is_locked(&pcidevs_lock))"
> - -EBADSLT -> -ENODEV, EBADSLT is removed in the lasted Xen
> 
> v4:
> - Don't inline setup_posted_irte()
> - const struct pi_desc *pi_desc for setup_posted_irte()
> - return -EINVAL when pirq_spin_lock_irq_desc() fails.
> - Make some variables const
> - Release irq desc lock earlier in pi_update_irte()
> - Remove the pointless do-while() loop when doing cmpxchg16b()
> 
> v3:
> - Remove "adding PDA_MASK()" when updating 'pda_l' and 'pda_h' for IRTE.
> - Change the return type of pi_update_irte() to int.
> - Remove some pointless printk message in pi_update_irte().
> - Use structure assignment instead of memcpy() for irte copy.
> 
>  xen/drivers/passthrough/vtd/intremap.c | 120
> +++++++++++++++++++++++++++++++++
>  xen/drivers/passthrough/vtd/iommu.h    |   6 ++
>  xen/include/asm-x86/iommu.h            |   2 +
>  3 files changed, 128 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/intremap.c
> b/xen/drivers/passthrough/vtd/intremap.c
> index 8f135e1..67e4f6d 100644
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -899,3 +899,123 @@ void iommu_disable_x2apic_IR(void)
>      for_each_drhd_unit ( drhd )
>          disable_qinval(drhd->iommu);
>  }
> +
> +static void setup_posted_irte(
> +    struct iremap_entry *new_ire, const struct iremap_entry *old_ire,
> +    const struct pi_desc *pi_desc, const uint8_t gvec)
> +{
> +    memset(new_ire, sizeof(*new_ire), 0);
> +
> +    if ( !old_ire->remap.im )
> +    {
> +        new_ire->post.p = old_ire->remap.p;
> +        new_ire->post.fpd = old_ire->remap.fpd;
> +        new_ire->post.sid = old_ire->remap.sid;
> +        new_ire->post.sq = old_ire->remap.sq;
> +        new_ire->post.svt = old_ire->remap.svt;
> +    }
> +    else
> +    {
> +        new_ire->post.p = old_ire->post.p;
> +        new_ire->post.fpd = old_ire->post.fpd;
> +        new_ire->post.sid = old_ire->post.sid;
> +        new_ire->post.sq = old_ire->post.sq;
> +        new_ire->post.svt = old_ire->post.svt;
> +        new_ire->post.urg = old_ire->post.urg;
> +    }

An comment here is preferred, otherwise it's not straightforward to
know what 'im' is and why branches are used for it.

> +
> +    new_ire->post.im = 1;
> +    new_ire->post.vector = gvec;
> +    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
> +    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
> +}
> +
> +/*
> + * This function is used to update the IRTE for posted-interrupt
> + * when guest changes MSI/MSI-X information.
> + */
> +int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
> +    const uint8_t gvec)
> +{
> +    struct irq_desc *desc;
> +    const struct msi_desc *msi_desc;
> +    int remap_index;
> +    int rc = 0;
> +    const struct pci_dev *pci_dev;
> +    const struct acpi_drhd_unit *drhd;
> +    struct iommu *iommu;
> +    struct ir_ctrl *ir_ctrl;
> +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> +    struct iremap_entry new_ire, old_ire;
> +    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    __uint128_t ret;
> +
> +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> +    if ( !desc )
> +        return -EINVAL;
> +
> +    msi_desc = desc->msi_desc;
> +    if ( !msi_desc )
> +    {
> +        rc = -ENODEV;
> +        goto unlock_out;
> +    }
> +
> +    pci_dev = msi_desc->dev;
> +    if ( !pci_dev )
> +    {
> +        rc = -ENODEV;
> +        goto unlock_out;
> +    }
> +
> +    remap_index = msi_desc->remap_index;
> +
> +    spin_unlock_irq(&desc->lock);
> +
> +    ASSERT(spin_is_locked(&pcidevs_lock));
> +
> +    /*
> +     * FIXME: For performance reasons we should store the 'iommu' pointer in
> +     * 'struct msi_desc' in some other place, so we don't need to waste
> +     * time searching it here.
> +     */
> +    drhd = acpi_find_matched_drhd_unit(pci_dev);
> +    if ( !drhd )
> +        return -ENODEV;
> +
> +    iommu = drhd->iommu;
> +    ir_ctrl = iommu_ir_ctrl(iommu);
> +    if ( !ir_ctrl )
> +        return -ENODEV;
> +
> +    spin_lock_irq(&ir_ctrl->iremap_lock);
> +
> +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
> +
> +    old_ire = *p;
> +
> +    /* Setup/Update interrupt remapping table entry. */
> +    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
> +    ret = cmpxchg16b(p, &old_ire, &new_ire);
> +
> +    /*
> +     * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
> +     * and the hardware cannot update the IRTE behind us, so the return value

hardware can DEFINITELY update IRTE behind us, right? e.g. after the IRTE entry
is fully up, when interrupt is posted, etc. Here you might mean hardware cannot
update the IRTE at this point?

> +     * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
> +     */
> +    ASSERT(ret == old_ire.val);
> +
> +    iommu_flush_cache_entry(p, sizeof(*p));
> +    iommu_flush_iec_index(iommu, 0, remap_index);
> +
> +    unmap_vtd_domain_page(iremap_entries);
> +
> +    spin_unlock_irq(&ir_ctrl->iremap_lock);
> +
> +    return 0;
> +
> + unlock_out:
> +    spin_unlock_irq(&desc->lock);
> +
> +    return rc;
> +}
> diff --git a/xen/drivers/passthrough/vtd/iommu.h
> b/xen/drivers/passthrough/vtd/iommu.h
> index b440b69..c55ee08 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -323,6 +323,12 @@ struct iremap_entry {
>    };
>  };
> 
> +/*
> + * Posted-interrupt descriptor address is 64 bits with 64-byte aligned, only
> + * the upper 26 bits of lest significiant 32 bits is available.
> + */
> +#define PDA_LOW_BIT    26
> +
>  /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
>  #define IREMAP_PAGE_ORDER  8
> 
> diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> index 29203d7..92f0900 100644
> --- a/xen/include/asm-x86/iommu.h
> +++ b/xen/include/asm-x86/iommu.h
> @@ -31,6 +31,8 @@ int iommu_supports_eim(void);
>  int iommu_enable_x2apic_IR(void);
>  void iommu_disable_x2apic_IR(void);
> 
> +int pi_update_irte(const struct vcpu *v, const struct pirq *pirq, const uint8_t gvec);
> +
>  #endif /* !__ARCH_X86_IOMMU_H__ */
>  /*
>   * Local variables:
> --
> 2.1.0

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is set
  2015-11-24  7:29   ` Tian, Kevin
@ 2015-11-24  7:46     ` Wu, Feng
  2015-11-24  7:52       ` Tian, Kevin
  0 siblings, 1 reply; 45+ messages in thread
From: Wu, Feng @ 2015-11-24  7:46 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Andrew Cooper, Wu, Feng, Keir Fraser, Jan Beulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Tuesday, November 24, 2015 3:30 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Keir Fraser <keir@xen.org>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>
> Subject: RE: [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is
> set
> 
> > From: Wu, Feng
> > Sent: Tuesday, November 03, 2015 4:43 PM
> N' is set
> >
> > Currently, we don't support urgent interrupt, all interrupts
> > are recognized as non-urgent interrupt, so we cannot send
> > posted-interrupt when 'SN' is set.
> >
> > CC: Kevin Tian <kevin.tian@intel.com>
> > CC: Keir Fraser <keir@xen.org>
> > CC: Jan Beulich <jbeulich@suse.com>
> > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Kevin Tian <kevin.tian@intel.com>, with one small comment:
> 
> > +        do {
> > +            /*
> > +             * Currently, we don't support urgent interrupt, all
> > +             * interrupts are recognized as non-urgent interrupt,
> > +             * so we cannot send posted-interrupt when 'SN' is set.
> > +             * Besides that, if 'ON' is already set, we cannot set
> > +             * posted-interrupts as well.
> > +             */
> 
> Is above comment accurate. "cannot set" is too strong for 'ON'
> already set, right? Ideally there's no correctness issue if you
> still deliver another posted-interrupt even when ON is already set.
> To me it's more like an optimization then it's cleaner to say
> "we can avoid"...

Here, we just emulate the hardware's behavior, in hardware p.o.v,
if 'ON' is set, the notification event will not be delivered, so here
I try to follow the hardware, is this reasonable for you?

Thanks,
Feng

> 
> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 16/17] VT-d: Dump the posted format IRTE
  2015-11-03  8:43 ` [PATCH v9 16/17] VT-d: Dump the posted format IRTE Feng Wu
  2015-11-17 15:55   ` Jan Beulich
@ 2015-11-24  7:50   ` Tian, Kevin
  1 sibling, 0 replies; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24  7:50 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Zhang, Yang Z

> From: Wu, Feng
> Sent: Tuesday, November 03, 2015 4:43 PM
> 
> Add the utility to dump the posted format IRTE.
> 
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is set
  2015-11-24  7:46     ` Wu, Feng
@ 2015-11-24  7:52       ` Tian, Kevin
  2015-11-24  7:55         ` Wu, Feng
  0 siblings, 1 reply; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24  7:52 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

> From: Wu, Feng
> Sent: Tuesday, November 24, 2015 3:47 PM
> 
> 
> > -----Original Message-----
> > From: Tian, Kevin
> > Sent: Tuesday, November 24, 2015 3:30 PM
> > To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> > Cc: Keir Fraser <keir@xen.org>; Jan Beulich <jbeulich@suse.com>; Andrew
> > Cooper <andrew.cooper3@citrix.com>
> > Subject: RE: [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is
> > set
> >
> > > From: Wu, Feng
> > > Sent: Tuesday, November 03, 2015 4:43 PM
> > N' is set
> > >
> > > Currently, we don't support urgent interrupt, all interrupts
> > > are recognized as non-urgent interrupt, so we cannot send
> > > posted-interrupt when 'SN' is set.
> > >
> > > CC: Kevin Tian <kevin.tian@intel.com>
> > > CC: Keir Fraser <keir@xen.org>
> > > CC: Jan Beulich <jbeulich@suse.com>
> > > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >
> > Acked-by: Kevin Tian <kevin.tian@intel.com>, with one small comment:
> >
> > > +        do {
> > > +            /*
> > > +             * Currently, we don't support urgent interrupt, all
> > > +             * interrupts are recognized as non-urgent interrupt,
> > > +             * so we cannot send posted-interrupt when 'SN' is set.
> > > +             * Besides that, if 'ON' is already set, we cannot set
> > > +             * posted-interrupts as well.
> > > +             */
> >
> > Is above comment accurate. "cannot set" is too strong for 'ON'
> > already set, right? Ideally there's no correctness issue if you
> > still deliver another posted-interrupt even when ON is already set.
> > To me it's more like an optimization then it's cleaner to say
> > "we can avoid"...
> 
> Here, we just emulate the hardware's behavior, in hardware p.o.v,
> if 'ON' is set, the notification event will not be delivered, so here
> I try to follow the hardware, is this reasonable for you?
> 

I understand it's hardware behavior, just thought whether the comment
is too strong. Could we rephrase it as "Besides that, if 'ON' is already
set, no need to set posted-interrupts as well, according to hardware
behavior"?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-11-24  7:45   ` Tian, Kevin
@ 2015-11-24  7:54     ` Wu, Feng
  2015-11-24  7:56       ` Tian, Kevin
  0 siblings, 1 reply; 45+ messages in thread
From: Wu, Feng @ 2015-11-24  7:54 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Zhang, Yang Z, Andrew Cooper, Wu, Feng, Keir Fraser, Jan Beulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Tuesday, November 24, 2015 3:45 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z <yang.z.zhang@intel.com>; Keir Fraser <keir@xen.org>; Jan
> Beulich <jbeulich@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>
> Subject: RE: [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is
> used
> 
> > +
> > +    iommu = drhd->iommu;
> > +    ir_ctrl = iommu_ir_ctrl(iommu);
> > +    if ( !ir_ctrl )
> > +        return -ENODEV;
> > +
> > +    spin_lock_irq(&ir_ctrl->iremap_lock);
> > +
> > +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index,
> iremap_entries, p);
> > +
> > +    old_ire = *p;
> > +
> > +    /* Setup/Update interrupt remapping table entry. */
> > +    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
> > +    ret = cmpxchg16b(p, &old_ire, &new_ire);
> > +
> > +    /*
> > +     * In the above, we use cmpxchg16 to atomically update the 128-bit
> IRTE,
> > +     * and the hardware cannot update the IRTE behind us, so the return
> value
> 
> hardware can DEFINITELY update IRTE behind us, right? e.g. after the IRTE
> entry
> is fully up, when interrupt is posted, etc. Here you might mean hardware
> cannot
> update the IRTE at this point?

Yes, you description above is more accurate. But why hardware needs to
update IRTE when interrupt is posted? I think it needs to update the
posted interrupt descriptor when posting an interrupt, not the IRTE,
right?

Thanks,
Feng

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is set
  2015-11-24  7:52       ` Tian, Kevin
@ 2015-11-24  7:55         ` Wu, Feng
  0 siblings, 0 replies; 45+ messages in thread
From: Wu, Feng @ 2015-11-24  7:55 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Andrew Cooper, Wu, Feng, Keir Fraser, Jan Beulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Tuesday, November 24, 2015 3:53 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Keir Fraser <keir@xen.org>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>
> Subject: RE: [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is
> set
> 
> > From: Wu, Feng
> > Sent: Tuesday, November 24, 2015 3:47 PM
> >
> >
> > > -----Original Message-----
> > > From: Tian, Kevin
> > > Sent: Tuesday, November 24, 2015 3:30 PM
> > > To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> > > Cc: Keir Fraser <keir@xen.org>; Jan Beulich <jbeulich@suse.com>;
> Andrew
> > > Cooper <andrew.cooper3@citrix.com>
> > > Subject: RE: [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN'
> is
> > > set
> > >
> > > > From: Wu, Feng
> > > > Sent: Tuesday, November 03, 2015 4:43 PM
> > > N' is set
> > > >
> > > > Currently, we don't support urgent interrupt, all interrupts
> > > > are recognized as non-urgent interrupt, so we cannot send
> > > > posted-interrupt when 'SN' is set.
> > > >
> > > > CC: Kevin Tian <kevin.tian@intel.com>
> > > > CC: Keir Fraser <keir@xen.org>
> > > > CC: Jan Beulich <jbeulich@suse.com>
> > > > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > >
> > > Acked-by: Kevin Tian <kevin.tian@intel.com>, with one small comment:
> > >
> > > > +        do {
> > > > +            /*
> > > > +             * Currently, we don't support urgent interrupt, all
> > > > +             * interrupts are recognized as non-urgent interrupt,
> > > > +             * so we cannot send posted-interrupt when 'SN' is set.
> > > > +             * Besides that, if 'ON' is already set, we cannot set
> > > > +             * posted-interrupts as well.
> > > > +             */
> > >
> > > Is above comment accurate. "cannot set" is too strong for 'ON'
> > > already set, right? Ideally there's no correctness issue if you
> > > still deliver another posted-interrupt even when ON is already set.
> > > To me it's more like an optimization then it's cleaner to say
> > > "we can avoid"...
> >
> > Here, we just emulate the hardware's behavior, in hardware p.o.v,
> > if 'ON' is set, the notification event will not be delivered, so here
> > I try to follow the hardware, is this reasonable for you?
> >
> 
> I understand it's hardware behavior, just thought whether the comment
> is too strong. Could we rephrase it as "Besides that, if 'ON' is already
> set, no need to set posted-interrupts as well, according to hardware
> behavior"?

Sure, that should be great! Thanks a lot:)

Thanks,
Feng

> 
> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-11-24  7:54     ` Wu, Feng
@ 2015-11-24  7:56       ` Tian, Kevin
  2015-11-24  8:52         ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24  7:56 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: Zhang, Yang Z, Andrew Cooper, Keir Fraser, Jan Beulich

> From: Wu, Feng
> Sent: Tuesday, November 24, 2015 3:54 PM
> 
> 
> > -----Original Message-----
> > From: Tian, Kevin
> > Sent: Tuesday, November 24, 2015 3:45 PM
> > To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> > Cc: Zhang, Yang Z <yang.z.zhang@intel.com>; Keir Fraser <keir@xen.org>; Jan
> > Beulich <jbeulich@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>
> > Subject: RE: [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is
> > used
> >
> > > +
> > > +    iommu = drhd->iommu;
> > > +    ir_ctrl = iommu_ir_ctrl(iommu);
> > > +    if ( !ir_ctrl )
> > > +        return -ENODEV;
> > > +
> > > +    spin_lock_irq(&ir_ctrl->iremap_lock);
> > > +
> > > +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index,
> > iremap_entries, p);
> > > +
> > > +    old_ire = *p;
> > > +
> > > +    /* Setup/Update interrupt remapping table entry. */
> > > +    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
> > > +    ret = cmpxchg16b(p, &old_ire, &new_ire);
> > > +
> > > +    /*
> > > +     * In the above, we use cmpxchg16 to atomically update the 128-bit
> > IRTE,
> > > +     * and the hardware cannot update the IRTE behind us, so the return
> > value
> >
> > hardware can DEFINITELY update IRTE behind us, right? e.g. after the IRTE
> > entry
> > is fully up, when interrupt is posted, etc. Here you might mean hardware
> > cannot
> > update the IRTE at this point?
> 
> Yes, you description above is more accurate. But why hardware needs to
> update IRTE when interrupt is posted? I think it needs to update the
> posted interrupt descriptor when posting an interrupt, not the IRTE,
> right?
> 

sorry mess IRTE and posted descriptor together. but using "behind"
is still not accurate to state your point here. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-11-24  7:56       ` Tian, Kevin
@ 2015-11-24  8:52         ` Jan Beulich
  2015-11-24 12:05           ` Tian, Kevin
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2015-11-24  8:52 UTC (permalink / raw)
  To: Feng Wu, Kevin Tian, xen-devel; +Cc: Yang Z Zhang, Andrew Cooper, Keir Fraser

>>> On 24.11.15 at 08:56, <kevin.tian@intel.com> wrote:
>>  From: Wu, Feng
>> Sent: Tuesday, November 24, 2015 3:54 PM
>> > From: Tian, Kevin
>> > Sent: Tuesday, November 24, 2015 3:45 PM
>> > > +    /* Setup/Update interrupt remapping table entry. */
>> > > +    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
>> > > +    ret = cmpxchg16b(p, &old_ire, &new_ire);
>> > > +
>> > > +    /*
>> > > +     * In the above, we use cmpxchg16 to atomically update the 128-bit
>> > IRTE,
>> > > +     * and the hardware cannot update the IRTE behind us, so the return
>> > value
>> >
>> > hardware can DEFINITELY update IRTE behind us, right? e.g. after the IRTE
>> > entry
>> > is fully up, when interrupt is posted, etc. Here you might mean hardware
>> > cannot
>> > update the IRTE at this point?
>> 
>> Yes, you description above is more accurate. But why hardware needs to
>> update IRTE when interrupt is posted? I think it needs to update the
>> posted interrupt descriptor when posting an interrupt, not the IRTE,
>> right?
> 
> sorry mess IRTE and posted descriptor together. but using "behind"
> is still not accurate to state your point here. :-)

Well, "behind us" and "behind our back" seem to mean mostly the
same to me.

Jan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-11-24  8:52         ` Jan Beulich
@ 2015-11-24 12:05           ` Tian, Kevin
  0 siblings, 0 replies; 45+ messages in thread
From: Tian, Kevin @ 2015-11-24 12:05 UTC (permalink / raw)
  To: Jan Beulich, Wu, Feng, xen-devel
  Cc: Zhang, Yang Z, Andrew Cooper, Keir Fraser

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, November 24, 2015 4:52 PM
> 
> >>> On 24.11.15 at 08:56, <kevin.tian@intel.com> wrote:
> >>  From: Wu, Feng
> >> Sent: Tuesday, November 24, 2015 3:54 PM
> >> > From: Tian, Kevin
> >> > Sent: Tuesday, November 24, 2015 3:45 PM
> >> > > +    /* Setup/Update interrupt remapping table entry. */
> >> > > +    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
> >> > > +    ret = cmpxchg16b(p, &old_ire, &new_ire);
> >> > > +
> >> > > +    /*
> >> > > +     * In the above, we use cmpxchg16 to atomically update the 128-bit
> >> > IRTE,
> >> > > +     * and the hardware cannot update the IRTE behind us, so the return
> >> > value
> >> >
> >> > hardware can DEFINITELY update IRTE behind us, right? e.g. after the IRTE
> >> > entry
> >> > is fully up, when interrupt is posted, etc. Here you might mean hardware
> >> > cannot
> >> > update the IRTE at this point?
> >>
> >> Yes, you description above is more accurate. But why hardware needs to
> >> update IRTE when interrupt is posted? I think it needs to update the
> >> posted interrupt descriptor when posting an interrupt, not the IRTE,
> >> right?
> >
> > sorry mess IRTE and posted descriptor together. but using "behind"
> > is still not accurate to state your point here. :-)
> 
> Well, "behind us" and "behind our back" seem to mean mostly the
> same to me.
> 

Thanks for confirming language part. Then I'm OK with it.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2015-11-24 12:05 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-03  8:43 [PATCH v9 00/17] Add VT-d Posted-Interrupts support Feng Wu
2015-11-03  8:43 ` [PATCH v9 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
2015-11-10 14:52   ` Dario Faggioli
2015-11-24  6:34     ` Tian, Kevin
2015-11-24  6:34   ` Tian, Kevin
2015-11-24  6:49     ` Wu, Feng
2015-11-03  8:43 ` [PATCH v9 02/17] Add cmpxchg16b support for x86-64 Feng Wu
2015-11-03  8:43 ` [PATCH v9 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
2015-11-03  8:43 ` [PATCH v9 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
2015-11-24  7:14   ` Tian, Kevin
2015-11-03  8:43 ` [PATCH v9 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
2015-11-03  8:43 ` [PATCH v9 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
2015-11-24  7:15   ` Tian, Kevin
2015-11-03  8:43 ` [PATCH v9 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
2015-11-03  8:43 ` [PATCH v9 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
2015-11-24  7:29   ` Tian, Kevin
2015-11-24  7:46     ` Wu, Feng
2015-11-24  7:52       ` Tian, Kevin
2015-11-24  7:55         ` Wu, Feng
2015-11-03  8:43 ` [PATCH v9 09/17] VT-d: Remove pointless casts Feng Wu
2015-11-24  7:30   ` Tian, Kevin
2015-11-03  8:43 ` [PATCH v9 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
2015-11-03  8:43 ` [PATCH v9 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
2015-11-24  7:45   ` Tian, Kevin
2015-11-24  7:54     ` Wu, Feng
2015-11-24  7:56       ` Tian, Kevin
2015-11-24  8:52         ` Jan Beulich
2015-11-24 12:05           ` Tian, Kevin
2015-11-03  8:43 ` [PATCH v9 12/17] x86: move some APIC related macros to apicdef.h Feng Wu
2015-11-03  8:43 ` [PATCH v9 13/17] Update IRTE according to guest interrupt config changes Feng Wu
2015-11-03  8:43 ` [PATCH v9 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
2015-11-03  8:43 ` [PATCH v9 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
2015-11-03  9:07   ` Wu, Feng
2015-11-10 16:59     ` Dario Faggioli
2015-11-11  1:43       ` Wu, Feng
2015-11-11 13:29         ` Wu, Feng
2015-11-11 14:46           ` Dario Faggioli
2015-11-10 18:14   ` Dario Faggioli
2015-11-03  8:43 ` [PATCH v9 16/17] VT-d: Dump the posted format IRTE Feng Wu
2015-11-17 15:55   ` Jan Beulich
2015-11-24  7:50   ` Tian, Kevin
2015-11-03  8:43 ` [PATCH v9 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
2015-11-10  7:33 ` [PATCH v9 00/17] Add VT-d Posted-Interrupts support Wu, Feng
2015-11-10 10:39   ` Jan Beulich
2015-11-17 15:39   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.