All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 00/17] Add VT-d Posted-Interrupts support
@ 2015-10-12  8:54 Feng Wu
  2015-10-12  8:54 ` [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
                   ` (18 more replies)
  0 siblings, 19 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: feng.wu

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

You can find the VT-d Posted-Interrtups Spec. in the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

Feng Wu (17):
  VT-d Posted-intterrupt (PI) design
  Add cmpxchg16b support for x86-64
  iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  vt-d: VT-d Posted-Interrupts feature detection
  vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  vmx: Add some helper functions for Posted-Interrupts
  vmx: Initialize VT-d Posted-Interrupts Descriptor
  vmx: Suppress posting interrupts when 'SN' is set
  VT-d: Remove pointless casts
  vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  vt-d: Add API to update IRTE when VT-d PI is used
  x86: move some APIC related macros to apicdef.h
  Update IRTE according to guest interrupt config changes
  vmx: Properly handle notification event when vCPU is running
  vmx: VT-d posted-interrupt core logic handling
  VT-d: Dump the posted format IRTE
  Add a command line parameter for VT-d posted-interrupts

 docs/misc/vtd-pi.txt                   | 332 +++++++++++++++++++++++++++++++
 docs/misc/xen-command-line.markdown    |   9 +-
 xen/arch/x86/domain.c                  |  12 ++
 xen/arch/x86/hvm/hvm.c                 |  18 ++
 xen/arch/x86/hvm/vlapic.c              |   5 -
 xen/arch/x86/hvm/vmx/vmcs.c            |  24 +++
 xen/arch/x86/hvm/vmx/vmx.c             | 348 ++++++++++++++++++++++++++++++++-
 xen/common/schedule.c                  |   9 +
 xen/drivers/passthrough/io.c           | 123 +++++++++++-
 xen/drivers/passthrough/iommu.c        |  16 +-
 xen/drivers/passthrough/vtd/intremap.c | 212 +++++++++++++++-----
 xen/drivers/passthrough/vtd/iommu.c    |  14 +-
 xen/drivers/passthrough/vtd/iommu.h    |  51 +++--
 xen/drivers/passthrough/vtd/utils.c    |  40 ++--
 xen/include/asm-arm/domain.h           |   4 +
 xen/include/asm-x86/apicdef.h          |   3 +
 xen/include/asm-x86/domain.h           |   4 +
 xen/include/asm-x86/hvm/hvm.h          |   4 +
 xen/include/asm-x86/hvm/vmx/vmcs.h     |  25 ++-
 xen/include/asm-x86/hvm/vmx/vmx.h      |  27 +++
 xen/include/asm-x86/iommu.h            |   2 +
 xen/include/asm-x86/x86_64/system.h    |  33 ++++
 xen/include/xen/iommu.h                |   2 +-
 23 files changed, 1229 insertions(+), 88 deletions(-)
 create mode 100644 docs/misc/vtd-pi.txt

-- 
2.1.0

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12 10:41   ` Andrew Cooper
  2015-10-29 15:56   ` Dario Faggioli
  2015-10-12  8:54 ` [PATCH v8 02/17] Add cmpxchg16b support for x86-64 Feng Wu
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Jan Beulich, Yang Zhang, feng.wu

Add the design doc for VT-d PI.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Keir Fraser <keir@xen.org>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 docs/misc/vtd-pi.txt | 332 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 332 insertions(+)
 create mode 100644 docs/misc/vtd-pi.txt

diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
new file mode 100644
index 0000000..af5409a
--- /dev/null
+++ b/docs/misc/vtd-pi.txt
@@ -0,0 +1,332 @@
+Authors: Feng Wu <feng.wu@intel.com>
+
+VT-d Posted-interrupt (PI) design for XEN
+
+Background
+==========
+With the development of virtualization, there are more and more device
+assignment requirements. However, today when a VM is running with
+assigned devices (such as, NIC), external interrupt handling for the assigned
+devices always needs VMM intervention.
+
+VT-d Posted-interrupt is a more enhanced method to handle interrupts
+in the virtualization environment. Interrupt posting is the process by
+which an interrupt request is recorded in a memory-resident
+posted-interrupt-descriptor structure by the root-complex, followed by
+an optional notification event issued to the CPU complex.
+
+With VT-d Posted-interrupt we can get the following advantages:
+- Direct delivery of external interrupts to running vCPUs without VMM
+intervention
+- Decrease the interrupt migration complexity. On vCPU migration, software
+can atomically co-migrate all interrupts targeting the migrating vCPU. For
+virtual machines with assigned devices, migrating a vCPU across pCPUs
+either incurs the overhead of forwarding interrupts in software (e.g. via VMM
+generated IPIs), or complexity to independently migrate each interrupt targeting
+the vCPU to the new pCPU. However, after enabling VT-d PI, the destination vCPU
+of an external interrupt from assigned devices is stored in the IRTE (i.e.
+Posted-interrupt Descriptor Address), when vCPU is migrated to another pCPU,
+we will set this new pCPU in the 'NDST' filed of Posted-interrupt descriptor, this
+make the interrupt migration automatic.
+
+Here is what Xen currently does for external interrupts from assigned devices:
+
+When a VM is running and an external interrupt from an assigned device occurs
+for it. VM-EXIT happens, then:
+
+vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci() -->
+raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
+
+softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
+
+dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() --> vmsi_deliver() -->
+vmsi_inj_irq() --> vlapic_set_irq()
+
+vlapic_set_irq() does the following things:
+1. If CPU-side posted-interrupt is supported, call vmx_deliver_posted_intr() to deliver
+the virtual interrupt via posted-interrupt infrastructure.
+2. Else if CPU-side posted-interrupt is not supported, set the related vIRR in vLAPIC
+page and call vcpu_kick() to kick the related vCPU. Before VM-Entry, vmx_intr_assist()
+will help to inject the interrupt to guests.
+
+However, after VT-d PI is supported, when a guest is running in non-root and an
+external interrupt from an assigned device occurs for it. No VM-Exit is needed,
+the guest can handle this totally in non-root mode, thus avoiding all the above
+code flow.
+
+Posted-interrupt Introduction
+========================
+There are two components to the Posted-interrupt architecture:
+Processor Support and Root-Complex Support
+
+- Processor Support
+Posted-interrupt processing is a feature by which a processor processes
+the virtual interrupts by recording them as pending on the virtual-APIC
+page.
+
+Posted-interrupt processing is enabled by setting the process posted
+interrupts VM-execution control. The processing is performed in response
+to the arrival of an interrupt with the posted-interrupt notification vector.
+In response to such an interrupt, the processor processes virtual interrupts
+recorded in a data structure called a posted-interrupt descriptor.
+
+More information about APICv and CPU-side Posted-interrupt, please refer
+to Chapter 29, and Section 29.6 in the Intel SDM:
+http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
+
+- Root-Complex Support
+Interrupt posting is the process by which an interrupt request (from IOAPIC
+or MSI/MSIx capable sources) is recorded in a memory-resident
+posted-interrupt-descriptor structure by the root-complex, followed by
+an optional notification event issued to the CPU complex. The interrupt
+request arriving at the root-complex carry the identity of the interrupt
+request source and a 'remapping-index'. The remapping-index is used to
+look-up an entry from the memory-resident interrupt-remap-table. Unlike
+interrupt-remapping, the interrupt-remap-table-entry for a posted-interrupt,
+specifies a virtual-vector and a pointer to the posted-interrupt descriptor.
+The virtual-vector specifies the vector of the interrupt to be recorded in
+the posted-interrupt descriptor. The posted-interrupt descriptor hosts storage
+for the virtual-vectors and contains the attributes of the notification event
+(interrupt) to be issued to the CPU complex to inform CPU/software about pending
+interrupts recorded in the posted-interrupt descriptor.
+
+More information about VT-d PI, please refer to
+http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
+
+Important Definitions
+==================
+There are some changes to IRTE and posted-interrupt descriptor after
+VT-d PI is introduced:
+IRTE: Interrupt Remapping Table Entry
+Posted-interrupt Descriptor Address: the address of the posted-interrupt descriptor
+Virtual Vector: the guest vector of the interrupt
+URG: indicates if the interrupt is urgent
+
+Posted-interrupt descriptor:
+The Posted Interrupt Descriptor hosts the following fields:
+Posted Interrupt Request (PIR): Provide storage for posting (recording) interrupts (one bit
+per vector, for up to 256 vectors).
+
+Outstanding Notification (ON): Indicate if there is a notification event outstanding (not
+processed by processor or software) for this Posted Interrupt Descriptor. When this field is 0,
+hardware modifies it from 0 to 1 when generating a notification event, and the entity receiving
+the notification event (processor or software) resets it as part of posted interrupt processing.
+
+Suppress Notification (SN): Indicate if a notification event is to be suppressed (not
+generated) for non-urgent interrupt requests (interrupts processed through an IRTE with
+URG=0).
+
+Notification Vector (NV): Specify the vector for notification event (interrupt).
+
+Notification Destination (NDST): Specify the physical APIC-ID of the destination logical
+processor for the notification event.
+
+Design Overview
+==============
+In this design, we will cover the following items:
+1. Add a variable to control whether enable VT-d posted-interrupt or not.
+2. VT-d PI feature detection.
+3. Extend posted-interrupt descriptor structure to cover VT-d PI specific items.
+4. Extend IRTE structure to support VT-d PI.
+5. Introduce a new global vector which is used for waking up the blocked vCPU.
+6. Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
+7. Update posted-interrupt descriptor during vCPU scheduling (when the state
+of the vCPU is transmitted among RUNSTATE_running / RUNSTATE_blocked/
+RUNSTATE_runnable / RUNSTATE_offline).
+8. How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
+9. New boot command line for Xen, which controls VT-d PI feature by user.
+10. Multicast/broadcast and lowest priority interrupts consideration.
+
+
+Implementation details
+===================
+- New variable to control VT-d PI
+
+Like variable 'iommu_intremap' for interrupt remapping, it is very straightforward
+to add a new one 'iommu_intpost' for posted-interrupt. 'iommu_intpost' is set
+only when interrupt remapping and VT-d posted-interrupt are both enabled.
+
+- VT-d PI feature detection.
+Bit 59 in VT-d Capability Register is used to report VT-d Posted-interrupt support.
+
+- Extend posted-interrupt descriptor structure to cover VT-d PI specific items.
+Here is the new structure for posted-interrupt descriptor:
+
+struct pi_desc {
+    DECLARE_BITMAP(pir, NR_VECTORS);
+    union {
+        struct
+        {
+        u16 on     : 1,  /* bit 256 - Outstanding Notification */
+            sn     : 1,  /* bit 257 - Suppress Notification */
+            rsvd_1 : 14; /* bit 271:258 - Reserved */
+        u8  nv;          /* bit 279:272 - Notification Vector */
+        u8  rsvd_2;      /* bit 287:280 - Reserved */
+        u32 ndst;        /* bit 319:288 - Notification Destination */
+        };
+        u64 control;
+    };
+    u32 rsvd[6];
+} __attribute__ ((aligned (64)));
+
+- Extend IRTE structure to support VT-d PI.
+
+Here is the new structure for IRTE:
+/* interrupt remap entry */
+struct iremap_entry {
+  union {
+    struct { u64 lo, hi; };
+    struct {
+        u16 p       : 1,
+            fpd     : 1,
+            dm      : 1,
+            rh      : 1,
+            tm      : 1,
+            dlm     : 3,
+            avail   : 4,
+            res_1   : 4;
+        u8  vector;
+        u8  res_2;
+        u32 dst;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_3   : 12;
+        u32 res_4   : 32;
+    } remap;
+    struct {
+        u16 p       : 1,
+            fpd     : 1,
+            res_1   : 6,
+            avail   : 4,
+            res_2   : 2,
+            urg     : 1,
+            im      : 1;
+        u8  vector;
+        u8  res_3;
+        u32 res_4   : 6,
+            pda_l   : 26;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_5   : 12;
+        u32 pda_h;
+    } post;
+  };
+};
+
+- Introduce a new global vector which is used to wake up the blocked vCPU.
+
+Currently, there is a global vector 'posted_intr_vector', which is used as the
+global notification vector for all vCPUs in the system. This vector is stored in
+VMCS and CPU considers it as a _special_ vector, uses it to notify the related
+pCPU when an interrupt is recorded in the posted-interrupt descriptor.
+
+This existing global vector is a _special_ vector to CPU, CPU handle it in a
+_special_ way compared to normal vectors, please refer to 29.6 in Intel SDM
+http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
+for more information about how CPU handles it.
+
+After having VT-d PI, VT-d engine can issue notification event when the
+assigned devices issue interrupts. We need add a new global vector to
+wakeup the blocked vCPU, please refer to later section in this design for
+how to use this new global vector.
+
+- Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
+After VT-d PI is introduced, the format of IRTE is changed as follows:
+	Descriptor Address: the address of the posted-interrupt descriptor
+	Virtual Vector: the guest vector of the interrupt
+	URG: indicates if the interrupt is urgent
+	Other fields continue to have the same meaning
+
+'Descriptor Address' tells the destination vCPU of this interrupt, since
+each vCPU has a dedicated posted-interrupt descriptor.
+
+'Virtual Vector' tells the guest vector of the interrupt.
+
+When guest changes the configuration of the interrupts, such as, the
+cpu affinity, or the vector, we need to update the associated IRTE accordingly.
+
+- Update posted-interrupt descriptor during vCPU scheduling
+
+The basic idea here is:
+1. When vCPU's state is RUNSTATE_running,
+        - Set 'NV' to 'posted_intr_vector'.
+        - Clear 'SN' to accept posted-interrupts.
+        - Set 'NDST' to the pCPU on which the vCPU will be running.
+2. When vCPU's state is RUNSTATE_blocked,
+        - Set 'NV' to ' pi_wakeup_vector ', so we can wake up the
+          related vCPU when posted-interrupt happens for it.
+          Please refer to the above section about the new global vector.
+        - Clear 'SN' to accept posted-interrupts
+3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
+        - Set 'SN' to suppress non-urgent interrupts
+          (Currently, we only support non-urgent interrupts)
+         When vCPU is in RUNSTATE_runnable or RUNSTATE_offline
+         It is not needed to accept posted-interrupt notification event
+         since we don't change the behavior of scheduler when the interrupt
+         occurs, we still need wait for the next scheduling of the vCPU.
+         When external interrupts from assigned devices occur, the interrupts
+         are recorded in PIR, and will be synced to IRR before VM-Entry.
+        - Set 'NV' to 'posted_intr_vector'.
+
+- How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
+
+Here is the scenario for the usage of the new global vector:
+
+1. vCPU0 is running on pCPU0
+2. vCPU0 is blocked and vCPU1 is currently running on pCPU0
+3. An external interrupt from an assigned device occurs for vCPU0, if we
+still use 'posted_intr_vector' as the notification vector for vCPU0, the
+notification event for vCPU0 (the event will go to pCPU1) will be consumed
+by vCPU1 incorrectly (remember this is a special vector to CPU). The worst
+case is that vCPU0 will never be woken up again since the wakeup event
+for it is always consumed by other vCPUs incorrectly. So we need introduce
+another global vector, naming 'pi_wakeup_vector' to wake up the blocked vCPU.
+
+After using 'pi_wakeup_vector' for vCPU0, VT-d engine will issue notification
+event using this new vector. Since this new vector is not a SPECIAL one to CPU,
+it is just a normal vector. To CPU, it just receives an normal external interrupt,
+then we can get control in the handler of this new vector. In this case, hypervisor
+can do something in it, such as wakeup the blocked vCPU.
+
+Here are what we do for the blocked vCPU:
+1. Define a per-cpu list 'pi_blocked_vcpu', which stored the blocked
+vCPU on the pCPU.
+2. When the vCPU's state is changed to RUNSTATE_blocked, insert the vCPU
+to the per-cpu list belonging to the pCPU it was running.
+3. When the vCPU is unblocked, remove the vCPU from the related pCPU list.
+
+In the handler of 'pi_wakeup_vector', we do:
+1. Get the physical CPU.
+2. Iterate the list 'pi_blocked_vcpu' of the current pCPU, if 'ON' is set,
+we unblock the associated vCPU.
+
+- New boot command line for Xen, which controls VT-d PI feature by user.
+
+Like 'intremap' for interrupt remapping, we add a new boot command line
+'intpost' for posted-interrupts.
+
+- Multicast/broadcast and lowest priority interrupts consideration.
+
+With VT-d PI, the destination vCPU information of an external interrupt
+from assigned devices is stored in IRTE, this makes the following
+consideration of the design:
+1. Multicast/broadcast interrupts cannot be posted.
+2. For lowest-priority interrupts, new Intel CPU/Chipset/root-complex
+(starting from Nehalem) ignore TPR value, and instead supported two other
+ways (configurable by BIOS) on how the handle lowest priority interrupts:
+	A) Round robin: In this method, the chipset simply delivers lowest priority
+interrupts in a round-robin manner across all the available logical CPUs. While
+this provides good load balancing, this was not the best thing to do always as
+interrupts from the same device (like NIC) will start running on all the CPUs
+thrashing caches and taking locks. This led to the next scheme.
+	B) Vector hashing: In this method, hardware would apply a hash function
+on the vector value in the interrupt request, and use that hash to pick a logical
+CPU to route the lowest priority interrupt. This way, a given vector always goes
+to the same logical CPU, avoiding the thrashing problem above.
+
+So, gist of above is that, lowest priority interrupts has never been delivered as
+"lowest priority" in physical hardware.
+
+Vector hashing is used in this design.
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 02/17] Add cmpxchg16b support for x86-64
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
  2015-10-12  8:54 ` [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-13 15:29   ` Jan Beulich
  2015-10-12  8:54 ` [PATCH v8 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, feng.wu, Jan Beulich, Andrew Cooper

This patch adds cmpxchg16b support for x86-64, so software
can perform 128-bit atomic write/read.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Remove pointless cast when assigning 'new_low'
- properly parenthesize cmpxchg16b()

v7:
- Make the last two parameters of __cmpxchg16b() const
- Remove memory clobber
- Add run-time and build-build check in cmpxchg16b()
- Cast the last two parameter to void * when calling __cmpxchg16b()

v6:
- Fix a typo

v5:
- Change back the parameters of __cmpxchg16b() to __uint128_t *
- Remove pointless cast for 'ptr'
- Remove pointless parentheses
- Use A constraint for the output

v4:
- Use pointer as the parameter of __cmpxchg16b().
- Use gcc's __uint128_t built-in type
- Make the parameters of __cmpxchg16b() void *

v3:
- Newly added.

 xen/include/asm-x86/x86_64/system.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
index 662813a..8f87d1e 100644
--- a/xen/include/asm-x86/x86_64/system.h
+++ b/xen/include/asm-x86/x86_64/system.h
@@ -6,6 +6,39 @@
                                    (unsigned long)(n),sizeof(*(ptr))))
 
 /*
+ * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
+ * identical, store NEW in MEM.  Return the initial value in MEM.
+ * Success is indicated by comparing RETURN with OLD.
+ *
+ * This function can only be called when cpu_has_cx16 is true.
+ */
+
+static always_inline __uint128_t __cmpxchg16b(
+    volatile void *ptr, const __uint128_t *old, const __uint128_t *new)
+{
+    __uint128_t prev;
+    uint64_t new_high = *new >> 64;
+    uint64_t new_low = *new;
+
+    ASSERT(cpu_has_cx16);
+
+    asm volatile ( "lock; cmpxchg16b %3"
+                   : "=A" (prev)
+                   : "c" (new_high), "b" (new_low),
+                     "m" (*__xg(ptr)), "0" (*old) );
+
+    return prev;
+}
+
+#define cmpxchg16b(ptr, o, n) ({                           \
+    volatile void *_p = (ptr);                             \
+    ASSERT(!((unsigned long)_p & 0xf));                    \
+    BUILD_BUG_ON(sizeof(*(o)) != sizeof(__uint128_t));     \
+    BUILD_BUG_ON(sizeof(*(n)) != sizeof(__uint128_t));     \
+    __cmpxchg16b(_p, (void *)(o), (void *)(n));            \
+})
+
+/*
  * This function causes value _o to be changed to _n at location _p.
  * If this access causes a fault then we return 1, otherwise we return 0.
  * If no fault occurs then _o is updated to the value we saw at _p. If this
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
  2015-10-12  8:54 ` [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
  2015-10-12  8:54 ` [PATCH v8 02/17] Add cmpxchg16b support for x86-64 Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:54 ` [PATCH v8 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Kevin Tian, feng.wu, Jan Beulich

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

This patch adds variable 'iommu_intpost' to control whether enable VT-d
posted-interrupt or not in the generic IOMMU code.

CC: Jan Beulich <jbeulich@suse.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v5:
- Remove the "if no intremap then no intpost" logic in parse_iommu_param(), which
  can be covered in iommu_setup()

v3:
- Remove pointless initializer for 'iommu_intpost'.
- Some adjustment for "if no intremap then no intpost" logic.
    * For parse_iommu_param(), move it to the end of the function,
      so we don't need to add the some logic when introduing the
      new kernel parameter 'intpost' in later patch.
    * Add this logic in iommu_setup() after iommu_hardware_setup()
      is called.

 xen/drivers/passthrough/iommu.c | 13 ++++++++++++-
 xen/include/xen/iommu.h         |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index fc7831e..36d5cc0 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -51,6 +51,14 @@ bool_t __read_mostly iommu_passthrough;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
+
+/*
+ * In the current implementation of VT-d posted interrupts, in some extreme
+ * cases, the per cpu list which saves the blocked vCPU will be very long,
+ * and this will affect the interrupt latency, so let this feature off by
+ * default until we find a good solution to resolve it.
+ */
+bool_t __read_mostly iommu_intpost;
 bool_t __read_mostly iommu_hap_pt_share = 1;
 bool_t __read_mostly iommu_debug;
 bool_t __read_mostly amd_iommu_perdev_intremap = 1;
@@ -307,6 +315,9 @@ int __init iommu_setup(void)
         panic("Couldn't enable %s and iommu=required/force",
               !iommu_enabled ? "IOMMU" : "Interrupt Remapping");
 
+    if ( !iommu_intremap )
+        iommu_intpost = 0;
+
     if ( !iommu_enabled )
     {
         iommu_snoop = 0;
@@ -374,7 +385,7 @@ void iommu_crash_shutdown(void)
     const struct iommu_ops *ops = iommu_get_ops();
     if ( iommu_enabled )
         ops->crash_shutdown();
-    iommu_enabled = iommu_intremap = 0;
+    iommu_enabled = iommu_intremap = iommu_intpost = 0;
 }
 
 int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 8f3a20e..1f5d04a 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -30,7 +30,7 @@
 extern bool_t iommu_enable, iommu_enabled;
 extern bool_t force_iommu, iommu_verbose;
 extern bool_t iommu_workaround_bios_bug, iommu_igfx, iommu_passthrough;
-extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
+extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
 extern bool_t iommu_hap_pt_share;
 extern bool_t iommu_debug;
 extern bool_t amd_iommu_perdev_intremap;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 04/17] vt-d: VT-d Posted-Interrupts feature detection
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (2 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:54 ` [PATCH v8 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, feng.wu

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v7:
- Remove pointless "if non iommu_intremap then disable iommu_intpost" logic
- Don't need to check !iommu_intremap or !iommu_intpost when setting iommu_intpost to 0

v5:
- Remove blank line

v4:
- Correct a logic error when setting iommu_intpost to 0

v3:
- Remove the "if no intremap then no intpost" logic in
  intel_vtd_setup(), it is covered in the iommu_setup().
- Add "if no intremap then no intpost" logic in the end
  of init_vtd_hw() which is called by vtd_resume().

So the logic exists in the following three places:
- parse_iommu_param()
- iommu_setup()
- init_vtd_hw()

 xen/drivers/passthrough/vtd/iommu.c | 14 ++++++++++++--
 xen/drivers/passthrough/vtd/iommu.h |  1 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 1dffc40..8dee731 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2147,8 +2147,8 @@ int __init intel_vtd_setup(void)
     }
 
     /* We enable the following features only if they are supported by all VT-d
-     * engines: Snoop Control, DMA passthrough, Queued Invalidation and
-     * Interrupt Remapping.
+     * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
+     * Remapping, and Posted Interrupt
      */
     for_each_drhd_unit ( drhd )
     {
@@ -2176,6 +2176,14 @@ int __init intel_vtd_setup(void)
         if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
             iommu_intremap = 0;
 
+        /*
+         * We cannot use posted interrupt if X86_FEATURE_CX16 is
+         * not supported, since we count on this feature to
+         * atomically update 16-byte IRTE in posted format.
+         */
+        if ( !cap_intr_post(iommu->cap) || !cpu_has_cx16 )
+            iommu_intpost = 0;
+
         if ( !vtd_ept_page_compatible(iommu) )
             iommu_hap_pt_share = 0;
 
@@ -2201,6 +2209,7 @@ int __init intel_vtd_setup(void)
     P(iommu_passthrough, "Dom0 DMA Passthrough");
     P(iommu_qinval, "Queued Invalidation");
     P(iommu_intremap, "Interrupt Remapping");
+    P(iommu_intpost, "Posted Interrupt");
     P(iommu_hap_pt_share, "Shared EPT tables");
 #undef P
 
@@ -2220,6 +2229,7 @@ int __init intel_vtd_setup(void)
     iommu_passthrough = 0;
     iommu_qinval = 0;
     iommu_intremap = 0;
+    iommu_intpost = 0;
     return ret;
 }
 
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index ac71ed1..22abefe 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -61,6 +61,7 @@
 /*
  * Decoding Capability Register
  */
+#define cap_intr_post(c)       (((c) >> 59) & 1)
 #define cap_read_drain(c)      (((c) >> 55) & 1)
 #define cap_write_drain(c)     (((c) >> 54) & 1)
 #define cap_max_amask_val(c)   (((c) >> 48) & 0x3f)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (3 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:54 ` [PATCH v8 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, feng.wu, Jan Beulich, Andrew Cooper

Extend struct pi_desc according to VT-d Posted-Interrupts Spec.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v8:
- Coding style

v7:
- Coding style.

v3:
- Use u32 instead of u64 for the bitfield in 'struct pi_desc'

 xen/include/asm-x86/hvm/vmx/vmcs.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index f1126d4..81c9e63 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -80,8 +80,18 @@ struct vmx_domain {
 
 struct pi_desc {
     DECLARE_BITMAP(pir, NR_VECTORS);
-    u32 control;
-    u32 rsvd[7];
+    union {
+        struct {
+            u16     on     : 1,  /* bit 256 - Outstanding Notification */
+                    sn     : 1,  /* bit 257 - Suppress Notification */
+                    rsvd_1 : 14; /* bit 271:258 - Reserved */
+            u8      nv;          /* bit 279:272 - Notification Vector */
+            u8      rsvd_2;      /* bit 287:280 - Reserved */
+            u32     ndst;        /* bit 319:288 - Notification Destination */
+        };
+        u64 control;
+    };
+    u32 rsvd[6];
 } __attribute__ ((aligned (64)));
 
 #define ept_get_wl(ept)   ((ept)->ept_wl)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 06/17] vmx: Add some helper functions for Posted-Interrupts
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (4 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:54 ` [PATCH v8 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, feng.wu, Jan Beulich, Andrew Cooper

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v7:
- Use bitfield in pi_test_on() and pi_test_sn()

v4:
- Newly added

 xen/include/asm-x86/hvm/vmx/vmx.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 3fbfa44..8d91110 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -101,6 +101,7 @@ void vmx_update_cpu_exec_control(struct vcpu *v);
 void vmx_update_secondary_exec_control(struct vcpu *v);
 
 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
 static inline int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 {
     return test_and_set_bit(vector, pi_desc->pir);
@@ -121,11 +122,31 @@ static inline int pi_test_and_clear_on(struct pi_desc *pi_desc)
     return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
 }
 
+static inline int pi_test_on(struct pi_desc *pi_desc)
+{
+    return pi_desc->on;
+}
+
 static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 {
     return xchg(&pi_desc->pir[group], 0);
 }
 
+static inline int pi_test_sn(struct pi_desc *pi_desc)
+{
+    return pi_desc->sn;
+}
+
+static inline void pi_set_sn(struct pi_desc *pi_desc)
+{
+    set_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
+static inline void pi_clear_sn(struct pi_desc *pi_desc)
+{
+    clear_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
 /*
  * Exit Reasons
  */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (5 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:54 ` [PATCH v8 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, feng.wu, Jan Beulich, Andrew Cooper

This patch initializes the VT-d Posted-interrupt Descriptor.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v7:
- Add comments to function 'pi_desc_init' to clarify why we
  update the posted-interrupt descriptor in non-atomical way
  in it.

v3:
- Move pi_desc_init() to xen/arch/x86/hvm/vmx/vmcs.c
- Remove the 'inline' flag of pi_desc_init()

 xen/arch/x86/hvm/vmx/vmcs.c       | 22 ++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmx.h |  2 ++
 2 files changed, 24 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index a0a97e7..5f67797 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -39,6 +39,7 @@
 #include <asm/flushtlb.h>
 #include <asm/shadow.h>
 #include <asm/tboot.h>
+#include <asm/apic.h>
 
 static bool_t __read_mostly opt_vpid_enabled = 1;
 boolean_param("vpid", opt_vpid_enabled);
@@ -951,6 +952,24 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
     virtual_vmcs_exit(vvmcs);
 }
 
+/*
+ * This function is only called in a vCPU's initialization phase,
+ * so we can update the posted-interrupt descriptor in non-atomic way.
+ */
+static void pi_desc_init(struct vcpu *v)
+{
+    uint32_t dest;
+
+    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
+
+    dest = cpu_physical_id(v->processor);
+
+    if ( x2apic_enabled )
+        v->arch.hvm_vmx.pi_desc.ndst = dest;
+    else
+        v->arch.hvm_vmx.pi_desc.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);
+}
+
 static int construct_vmcs(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -1089,6 +1108,9 @@ static int construct_vmcs(struct vcpu *v)
 
     if ( cpu_has_vmx_posted_intr_processing )
     {
+        if ( iommu_intpost )
+            pi_desc_init(v);
+
         __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
     }
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 8d91110..70b254f 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -88,6 +88,8 @@ typedef enum {
 #define EPT_EMT_WB              6
 #define EPT_EMT_RSV2            7
 
+#define PI_xAPIC_NDST_MASK      0xFF00
+
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
 void vmx_intr_assist(void);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 08/17] vmx: Suppress posting interrupts when 'SN' is set
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (6 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:54 ` [PATCH v8 09/17] VT-d: Remove pointless casts Feng Wu
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, feng.wu, Jan Beulich, Andrew Cooper

Currently, we don't support urgent interrupt, all interrupts
are recognized as non-urgent interrupt, so we cannot send
posted-interrupt when 'SN' is set.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v8:
- Parenthesize '1 << POSTED_INTR_ON' and '1 << POSTED_INTR_SN'

v7:
- Coding style
- Read the current pi_desc.control as the intial value of prev.control

v6:
- Add some comments

v5:
- keep the vcpu_kick() at the end of vmx_deliver_posted_intr()
- Keep the 'return' after calling __vmx_deliver_posted_interrupt()

v4:
- Coding style.
- V3 removes a vcpu_kick() from the eoi_exitmap_changed path
  incorrectly, fix it.

v3:
- use cmpxchg to test SN/ON and set ON

 xen/arch/x86/hvm/vmx/vmx.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c32d863..741a271 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1701,8 +1701,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
          */
         pi_set_on(&v->arch.hvm_vmx.pi_desc);
     }
-    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
+    else
     {
+        struct pi_desc old, new, prev;
+
+        prev.control = v->arch.hvm_vmx.pi_desc.control;
+
+        do {
+            /*
+             * Currently, we don't support urgent interrupt, all
+             * interrupts are recognized as non-urgent interrupt,
+             * so we cannot send posted-interrupt when 'SN' is set.
+             * Besides that, if 'ON' is already set, we cannot set
+             * posted-interrupts as well.
+             */
+            if ( pi_test_sn(&prev) || pi_test_on(&prev) )
+            {
+                vcpu_kick(v);
+                return;
+            }
+
+            old.control = v->arch.hvm_vmx.pi_desc.control &
+                          ~((1 << POSTED_INTR_ON) | (1 << POSTED_INTR_SN));
+            new.control = v->arch.hvm_vmx.pi_desc.control |
+                          (1 << POSTED_INTR_ON);
+
+            prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
+                                   old.control, new.control);
+        } while ( prev.control != old.control );
+
         __vmx_deliver_posted_interrupt(v);
         return;
     }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 09/17] VT-d: Remove pointless casts
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (7 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:54 ` [PATCH v8 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, feng.wu

Remove pointless casts.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v7:
- Remove an 'u32' casting omitted in v5

v5:
- Newly added.

 xen/drivers/passthrough/vtd/utils.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 44c4ef5..a75059f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -234,10 +234,9 @@ static void dump_iommu_info(unsigned char key)
                     continue;
                 printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
                     "   %x %x\n", i,
-                    (u32)p->hi.svt, (u32)p->hi.sq, (u32)p->hi.sid,
-                    (u32)p->lo.dst, (u32)p->lo.vector, (u32)p->lo.avail,
-                    (u32)p->lo.dlm, (u32)p->lo.tm, (u32)p->lo.rh,
-                    (u32)p->lo.dm, (u32)p->lo.fpd, (u32)p->lo.p);
+                    p->hi.svt, p->hi.sq, p->hi.sid, p->lo.dst, p->lo.vector,
+                    p->lo.avail, p->lo.dlm, p->lo.tm, p->lo.rh, p->lo.dm,
+                    p->lo.fpd, p->lo.p);
                 print_cnt++;
             }
             if ( iremap_entries )
@@ -281,11 +280,10 @@ static void dump_iommu_info(unsigned char key)
 
                 printk("   %02x:  %04x   %x    %x   %x   %x   %x    %x"
                     "    %x     %02x\n", i,
-                    (u32)remap->index_0_14 | ((u32)remap->index_15 << 15),
-                    (u32)remap->format, (u32)remap->mask, (u32)remap->trigger,
-                    (u32)remap->irr, (u32)remap->polarity,
-                    (u32)remap->delivery_status, (u32)remap->delivery_mode,
-                    (u32)remap->vector);
+                    remap->index_0_14 | (remap->index_15 << 15),
+                    remap->format, remap->mask, remap->trigger, remap->irr,
+                    remap->polarity, remap->delivery_status, remap->delivery_mode,
+                    remap->vector);
             }
         }
     }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (8 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 09/17] VT-d: Remove pointless casts Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:54 ` [PATCH v8 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, feng.wu

Extend struct iremap_entry according to VT-d Posted-Interrupts Spec.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v8:
- Make use of the __uint128_t member in struct iremap_entry when needed

v7:
- Add a __uint128_t member to the union in struct iremap_entry

v4:
- res_4 is not a bitfiled, correct it.
- Expose 'im' to remapped irte as well.

v3:
- Use u32 instead of u64 to define the bitfields in 'struct iremap_entry'
- Limit using bitfield if possible

 xen/drivers/passthrough/vtd/intremap.c | 92 +++++++++++++++++-----------------
 xen/drivers/passthrough/vtd/iommu.h    | 44 ++++++++++------
 xen/drivers/passthrough/vtd/utils.c    |  8 +--
 3 files changed, 81 insertions(+), 63 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 987bbe9..8f135e1 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -122,9 +122,9 @@ static u16 hpetid_to_bdf(unsigned int hpet_id)
 static void set_ire_sid(struct iremap_entry *ire,
                         unsigned int svt, unsigned int sq, unsigned int sid)
 {
-    ire->hi.svt = svt;
-    ire->hi.sq = sq;
-    ire->hi.sid = sid;
+    ire->remap.svt = svt;
+    ire->remap.sq = sq;
+    ire->remap.sid = sid;
 }
 
 static void set_ioapic_source_id(int apic_id, struct iremap_entry *ire)
@@ -219,7 +219,7 @@ static unsigned int alloc_remap_entry(struct iommu *iommu, unsigned int nr)
         else
             p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
 
-        if ( p->lo_val || p->hi_val ) /* not a free entry */
+        if ( p->val ) /* not a free entry */
             found = 0;
         else if ( ++found == nr )
             break;
@@ -253,7 +253,7 @@ static int remap_entry_to_ioapic_rte(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    if ( iremap_entry->hi_val == 0 && iremap_entry->lo_val == 0 )
+    if ( iremap_entry->val == 0 )
     {
         dprintk(XENLOG_ERR VTDPREFIX,
                 "%s: index (%d) get an empty entry!\n",
@@ -263,13 +263,13 @@ static int remap_entry_to_ioapic_rte(
         return -EFAULT;
     }
 
-    old_rte->vector = iremap_entry->lo.vector;
-    old_rte->delivery_mode = iremap_entry->lo.dlm;
-    old_rte->dest_mode = iremap_entry->lo.dm;
-    old_rte->trigger = iremap_entry->lo.tm;
+    old_rte->vector = iremap_entry->remap.vector;
+    old_rte->delivery_mode = iremap_entry->remap.dlm;
+    old_rte->dest_mode = iremap_entry->remap.dm;
+    old_rte->trigger = iremap_entry->remap.tm;
     old_rte->__reserved_2 = 0;
     old_rte->dest.logical.__reserved_1 = 0;
-    old_rte->dest.logical.logical_dest = iremap_entry->lo.dst >> 8;
+    old_rte->dest.logical.logical_dest = iremap_entry->remap.dst >> 8;
 
     unmap_vtd_domain_page(iremap_entries);
     spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
@@ -317,27 +317,28 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
     if ( rte_upper )
     {
         if ( x2apic_enabled )
-            new_ire.lo.dst = value;
+            new_ire.remap.dst = value;
         else
-            new_ire.lo.dst = (value >> 24) << 8;
+            new_ire.remap.dst = (value >> 24) << 8;
     }
     else
     {
         *(((u32 *)&new_rte) + 0) = value;
-        new_ire.lo.fpd = 0;
-        new_ire.lo.dm = new_rte.dest_mode;
-        new_ire.lo.tm = new_rte.trigger;
-        new_ire.lo.dlm = new_rte.delivery_mode;
+        new_ire.remap.fpd = 0;
+        new_ire.remap.dm = new_rte.dest_mode;
+        new_ire.remap.tm = new_rte.trigger;
+        new_ire.remap.dlm = new_rte.delivery_mode;
         /* Hardware require RH = 1 for LPR delivery mode */
-        new_ire.lo.rh = (new_ire.lo.dlm == dest_LowestPrio);
-        new_ire.lo.avail = 0;
-        new_ire.lo.res_1 = 0;
-        new_ire.lo.vector = new_rte.vector;
-        new_ire.lo.res_2 = 0;
+        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+        new_ire.remap.avail = 0;
+        new_ire.remap.res_1 = 0;
+        new_ire.remap.vector = new_rte.vector;
+        new_ire.remap.res_2 = 0;
 
         set_ioapic_source_id(IO_APIC_ID(apic), &new_ire);
-        new_ire.hi.res_1 = 0;
-        new_ire.lo.p = 1;     /* finally, set present bit */
+        new_ire.remap.res_3 = 0;
+        new_ire.remap.res_4 = 0;
+        new_ire.remap.p = 1;     /* finally, set present bit */
 
         /* now construct new ioapic rte entry */
         remap_rte->vector = new_rte.vector;
@@ -510,7 +511,7 @@ static int remap_entry_to_msi_msg(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    if ( iremap_entry->hi_val == 0 && iremap_entry->lo_val == 0 )
+    if ( iremap_entry->val == 0 )
     {
         dprintk(XENLOG_ERR VTDPREFIX,
                 "%s: index (%d) get an empty entry!\n",
@@ -523,25 +524,25 @@ static int remap_entry_to_msi_msg(
     msg->address_hi = MSI_ADDR_BASE_HI;
     msg->address_lo =
         MSI_ADDR_BASE_LO |
-        ((iremap_entry->lo.dm == 0) ?
+        ((iremap_entry->remap.dm == 0) ?
             MSI_ADDR_DESTMODE_PHYS:
             MSI_ADDR_DESTMODE_LOGIC) |
-        ((iremap_entry->lo.dlm != dest_LowestPrio) ?
+        ((iremap_entry->remap.dlm != dest_LowestPrio) ?
             MSI_ADDR_REDIRECTION_CPU:
             MSI_ADDR_REDIRECTION_LOWPRI);
     if ( x2apic_enabled )
-        msg->dest32 = iremap_entry->lo.dst;
+        msg->dest32 = iremap_entry->remap.dst;
     else
-        msg->dest32 = (iremap_entry->lo.dst >> 8) & 0xff;
+        msg->dest32 = (iremap_entry->remap.dst >> 8) & 0xff;
     msg->address_lo |= MSI_ADDR_DEST_ID(msg->dest32);
 
     msg->data =
         MSI_DATA_TRIGGER_EDGE |
         MSI_DATA_LEVEL_ASSERT |
-        ((iremap_entry->lo.dlm != dest_LowestPrio) ?
+        ((iremap_entry->remap.dlm != dest_LowestPrio) ?
             MSI_DATA_DELIVERY_FIXED:
             MSI_DATA_DELIVERY_LOWPRI) |
-        iremap_entry->lo.vector;
+        iremap_entry->remap.vector;
 
     unmap_vtd_domain_page(iremap_entries);
     spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
@@ -600,29 +601,30 @@ static int msi_msg_to_remap_entry(
     memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
 
     /* Set interrupt remapping table entry */
-    new_ire.lo.fpd = 0;
-    new_ire.lo.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
-    new_ire.lo.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
-    new_ire.lo.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
+    new_ire.remap.fpd = 0;
+    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
+    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     /* Hardware require RH = 1 for LPR delivery mode */
-    new_ire.lo.rh = (new_ire.lo.dlm == dest_LowestPrio);
-    new_ire.lo.avail = 0;
-    new_ire.lo.res_1 = 0;
-    new_ire.lo.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
-                        MSI_DATA_VECTOR_MASK;
-    new_ire.lo.res_2 = 0;
+    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+    new_ire.remap.avail = 0;
+    new_ire.remap.res_1 = 0;
+    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
+                            MSI_DATA_VECTOR_MASK;
+    new_ire.remap.res_2 = 0;
     if ( x2apic_enabled )
-        new_ire.lo.dst = msg->dest32;
+        new_ire.remap.dst = msg->dest32;
     else
-        new_ire.lo.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
-                          & 0xff) << 8;
+        new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
+                             & 0xff) << 8;
 
     if ( pdev )
         set_msi_source_id(pdev, &new_ire);
     else
         set_hpet_source_id(msi_desc->hpet_id, &new_ire);
-    new_ire.hi.res_1 = 0;
-    new_ire.lo.p = 1;    /* finally, set present bit */
+    new_ire.remap.res_3 = 0;
+    new_ire.remap.res_4 = 0;
+    new_ire.remap.p = 1;    /* finally, set present bit */
 
     /* now construct new MSI/MSI-X rte entry */
     remap_rte = (struct msi_msg_remap_entry *)msg;
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 22abefe..b440b69 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -281,29 +281,45 @@ struct dma_pte {
 /* interrupt remap entry */
 struct iremap_entry {
   union {
-    u64 lo_val;
+    __uint128_t val;
+    struct { u64 lo, hi; };
     struct {
-        u64 p       : 1,
+        u16 p       : 1,
             fpd     : 1,
             dm      : 1,
             rh      : 1,
             tm      : 1,
             dlm     : 3,
             avail   : 4,
-            res_1   : 4,
-            vector  : 8,
-            res_2   : 8,
-            dst     : 32;
-    }lo;
-  };
-  union {
-    u64 hi_val;
+            res_1   : 3,
+            im      : 1;
+        u8  vector;
+        u8  res_2;
+        u32 dst;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_3   : 12;
+        u32 res_4;
+    } remap;
     struct {
-        u64 sid     : 16,
-            sq      : 2,
+        u16 p       : 1,
+            fpd     : 1,
+            res_1   : 6,
+            avail   : 4,
+            res_2   : 2,
+            urg     : 1,
+            im      : 1;
+        u8  vector;
+        u8  res_3;
+        u32 res_4   : 6,
+            pda_l   : 26;
+        u16 sid;
+        u16 sq      : 2,
             svt     : 2,
-            res_1   : 44;
-    }hi;
+            res_5   : 12;
+        u32 pda_h;
+    } post;
   };
 };
 
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index a75059f..6daa156 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -230,13 +230,13 @@ static void dump_iommu_info(unsigned char key)
                 else
                     p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
 
-                if ( !p->lo.p )
+                if ( !p->remap.p )
                     continue;
                 printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
                     "   %x %x\n", i,
-                    p->hi.svt, p->hi.sq, p->hi.sid, p->lo.dst, p->lo.vector,
-                    p->lo.avail, p->lo.dlm, p->lo.tm, p->lo.rh, p->lo.dm,
-                    p->lo.fpd, p->lo.p);
+                    p->remap.svt, p->remap.sq, p->remap.sid, p->remap.dst,
+                    p->remap.vector, p->remap.avail, p->remap.dlm, p->remap.tm,
+                    p->remap.rh, p->remap.dm, p->remap.fpd, p->remap.p);
                 print_cnt++;
             }
             if ( iremap_entries )
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (9 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:54 ` [PATCH v8 12/17] x86: move some APIC related macros to apicdef.h Feng Wu
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Andrew Cooper, Jan Beulich, Yang Zhang, feng.wu

This patch adds an API which is used to update the IRTE
for posted-interrupt when guest changes MSI/MSI-X information.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v8:
- Some minor adjustment

v7:
- Remove __uint128_t cast
- Remove Kevin's Ack due to a bug fix for v6
- Reword some comments
- Setup posted IRTE from zeroed structure

v6:
- In some error cases, the desc->lock will be unlocked twice, fix it.
- Coding style fix.
- Add some comments.

v5:
- Make some function parameters const
- Call "spin_unlock_irq(&desc->lock);" a little eariler
- Add "ASSERT(spin_is_locked(&pcidevs_lock))"
- -EBADSLT -> -ENODEV, EBADSLT is removed in the lasted Xen

v4:
- Don't inline setup_posted_irte()
- const struct pi_desc *pi_desc for setup_posted_irte()
- return -EINVAL when pirq_spin_lock_irq_desc() fails.
- Make some variables const
- Release irq desc lock earlier in pi_update_irte()
- Remove the pointless do-while() loop when doing cmpxchg16b()

v3:
- Remove "adding PDA_MASK()" when updating 'pda_l' and 'pda_h' for IRTE.
- Change the return type of pi_update_irte() to int.
- Remove some pointless printk message in pi_update_irte().
- Use structure assignment instead of memcpy() for irte copy.

 xen/drivers/passthrough/vtd/intremap.c | 120 +++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/vtd/iommu.h    |   6 ++
 xen/include/asm-x86/iommu.h            |   2 +
 3 files changed, 128 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 8f135e1..67e4f6d 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -899,3 +899,123 @@ void iommu_disable_x2apic_IR(void)
     for_each_drhd_unit ( drhd )
         disable_qinval(drhd->iommu);
 }
+
+static void setup_posted_irte(
+    struct iremap_entry *new_ire, const struct iremap_entry *old_ire,
+    const struct pi_desc *pi_desc, const uint8_t gvec)
+{
+    memset(new_ire, sizeof(*new_ire), 0);
+
+    if ( !old_ire->remap.im )
+    {
+        new_ire->post.p = old_ire->remap.p;
+        new_ire->post.fpd = old_ire->remap.fpd;
+        new_ire->post.sid = old_ire->remap.sid;
+        new_ire->post.sq = old_ire->remap.sq;
+        new_ire->post.svt = old_ire->remap.svt;
+    }
+    else
+    {
+        new_ire->post.p = old_ire->post.p;
+        new_ire->post.fpd = old_ire->post.fpd;
+        new_ire->post.sid = old_ire->post.sid;
+        new_ire->post.sq = old_ire->post.sq;
+        new_ire->post.svt = old_ire->post.svt;
+        new_ire->post.urg = old_ire->post.urg;
+    }
+
+    new_ire->post.im = 1;
+    new_ire->post.vector = gvec;
+    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
+    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
+}
+
+/*
+ * This function is used to update the IRTE for posted-interrupt
+ * when guest changes MSI/MSI-X information.
+ */
+int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
+    const uint8_t gvec)
+{
+    struct irq_desc *desc;
+    const struct msi_desc *msi_desc;
+    int remap_index;
+    int rc = 0;
+    const struct pci_dev *pci_dev;
+    const struct acpi_drhd_unit *drhd;
+    struct iommu *iommu;
+    struct ir_ctrl *ir_ctrl;
+    struct iremap_entry *iremap_entries = NULL, *p = NULL;
+    struct iremap_entry new_ire, old_ire;
+    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    __uint128_t ret;
+
+    desc = pirq_spin_lock_irq_desc(pirq, NULL);
+    if ( !desc )
+        return -EINVAL;
+
+    msi_desc = desc->msi_desc;
+    if ( !msi_desc )
+    {
+        rc = -ENODEV;
+        goto unlock_out;
+    }
+
+    pci_dev = msi_desc->dev;
+    if ( !pci_dev )
+    {
+        rc = -ENODEV;
+        goto unlock_out;
+    }
+
+    remap_index = msi_desc->remap_index;
+
+    spin_unlock_irq(&desc->lock);
+
+    ASSERT(spin_is_locked(&pcidevs_lock));
+
+    /*
+     * FIXME: For performance reasons we should store the 'iommu' pointer in
+     * 'struct msi_desc' in some other place, so we don't need to waste
+     * time searching it here.
+     */
+    drhd = acpi_find_matched_drhd_unit(pci_dev);
+    if ( !drhd )
+        return -ENODEV;
+
+    iommu = drhd->iommu;
+    ir_ctrl = iommu_ir_ctrl(iommu);
+    if ( !ir_ctrl )
+        return -ENODEV;
+
+    spin_lock_irq(&ir_ctrl->iremap_lock);
+
+    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
+
+    old_ire = *p;
+
+    /* Setup/Update interrupt remapping table entry. */
+    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
+    ret = cmpxchg16b(p, &old_ire, &new_ire);
+
+    /*
+     * In the above, we use cmpxchg16 to atomically update the 128-bit IRTE,
+     * and the hardware cannot update the IRTE behind us, so the return value
+     * of cmpxchg16 should be the same as old_ire. This ASSERT validate it.
+     */
+    ASSERT(ret == old_ire.val);
+
+    iommu_flush_cache_entry(p, sizeof(*p));
+    iommu_flush_iec_index(iommu, 0, remap_index);
+
+    unmap_vtd_domain_page(iremap_entries);
+
+    spin_unlock_irq(&ir_ctrl->iremap_lock);
+
+    return 0;
+
+ unlock_out:
+    spin_unlock_irq(&desc->lock);
+
+    return rc;
+}
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index b440b69..c55ee08 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -323,6 +323,12 @@ struct iremap_entry {
   };
 };
 
+/*
+ * Posted-interrupt descriptor address is 64 bits with 64-byte aligned, only
+ * the upper 26 bits of lest significiant 32 bits is available.
+ */
+#define PDA_LOW_BIT    26
+
 /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
 #define IREMAP_PAGE_ORDER  8
 
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index 29203d7..92f0900 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -31,6 +31,8 @@ int iommu_supports_eim(void);
 int iommu_enable_x2apic_IR(void);
 void iommu_disable_x2apic_IR(void);
 
+int pi_update_irte(const struct vcpu *v, const struct pirq *pirq, const uint8_t gvec);
+
 #endif /* !__ARCH_X86_IOMMU_H__ */
 /*
  * Local variables:
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 12/17] x86: move some APIC related macros to apicdef.h
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (10 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:54 ` [PATCH v8 13/17] Update IRTE according to guest interrupt config changes Feng Wu
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, feng.wu, Jan Beulich, Andrew Cooper

Move some APIC related macros to apicdef.h, so they can be used
outside of vlapic.c.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v8:
- Minor changes

v7:
- Put the Macros to the right place inside the file.

v6:
- Newly introduced.

 xen/arch/x86/hvm/vlapic.c     | 5 -----
 xen/include/asm-x86/apicdef.h | 3 +++
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index b893b40..9b7c871 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -65,11 +65,6 @@ static const unsigned int vlapic_lvt_mask[VLAPIC_LVT_NUM] =
      LVT_MASK
 };
 
-/* Following could belong in apicdef.h */
-#define APIC_SHORT_MASK                  0xc0000
-#define APIC_DEST_NOSHORT                0x0
-#define APIC_DEST_MASK                   0x800
-
 #define vlapic_lvt_vector(vlapic, lvt_type)                     \
     (vlapic_get_reg(vlapic, lvt_type) & APIC_VECTOR_MASK)
 
diff --git a/xen/include/asm-x86/apicdef.h b/xen/include/asm-x86/apicdef.h
index 6069fce..8752287 100644
--- a/xen/include/asm-x86/apicdef.h
+++ b/xen/include/asm-x86/apicdef.h
@@ -54,9 +54,11 @@
 #define			APIC_ESR_RECVILL	0x00040
 #define			APIC_ESR_ILLREGA	0x00080
 #define		APIC_ICR	0x300
+#define			APIC_DEST_NOSHORT	0x00000
 #define			APIC_DEST_SELF		0x40000
 #define			APIC_DEST_ALLINC	0x80000
 #define			APIC_DEST_ALLBUT	0xC0000
+#define			APIC_SHORT_MASK		0xC0000
 #define			APIC_ICR_RR_MASK	0x30000
 #define			APIC_ICR_RR_INVALID	0x00000
 #define			APIC_ICR_RR_INPROG	0x10000
@@ -64,6 +66,7 @@
 #define			APIC_INT_LEVELTRIG	0x08000
 #define			APIC_INT_ASSERT		0x04000
 #define			APIC_ICR_BUSY		0x01000
+#define			APIC_DEST_MASK		0x00800
 #define			APIC_DEST_LOGICAL	0x00800
 #define			APIC_DEST_PHYSICAL	0x00000
 #define			APIC_DM_FIXED		0x00000
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 13/17] Update IRTE according to guest interrupt config changes
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (11 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 12/17] x86: move some APIC related macros to apicdef.h Feng Wu
@ 2015-10-12  8:54 ` Feng Wu
  2015-10-12  8:55 ` [PATCH v8 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:54 UTC (permalink / raw)
  To: xen-devel; +Cc: feng.wu, Jan Beulich

When guest changes its interrupt configuration (such as, vector, etc.)
for direct-assigned devices, we need to update the associated IRTE
with the new guest vector, so external interrupts from the assigned
devices can be injected to guests without VM-Exit.

For lowest-priority interrupts, we use vector-hashing mechamisn to find
the destination vCPU. This follows the hardware behavior, since modern
Intel CPUs use vector hashing to handle the lowest-priority interrupt.

For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
still use interrupt remapping.

CC: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v8:
- Remove local variable 'bitmap_array_size'
- Use switch to replace if-else

v7:
- Remove some pointless debug printk
- Fix a logic error when assigning 'delivery_mode'
- Adjust the definition of local variable 'idx'
- Add a dprintk if we cannot find the vCPU from 'pi_find_dest_vcpu'
- Add 'else if ( delivery_mode == dest_Fixed )' in 'pi_find_dest_vcpu'

v6:
- Use macro to replace plain numbers
- Correct the overflow error in a loop

v5:
- Make 'struct vcpu *vcpu' const

v4:
- Make some 'int' variables 'unsigned int' in pi_find_dest_vcpu()
- Make 'dest_id' uint32_t
- Rename 'size' to 'bitmap_array_size'
- find_next_bit() and find_first_bit() always return unsigned int,
  so no need to check whether the return value is less than 0.
- Message error level XENLOG_G_WARNING -> XENLOG_G_INFO
- Remove useless warning message
- Create a seperate function vector_hashing_dest() to find the
- destination of lowest-priority interrupts.
- Change some comments

v3:
- Use bitmap to store the all the possible destination vCPUs of an
  interrupt, then trying to find the right destination from the bitmap
- Typo and some small changes

 xen/drivers/passthrough/io.c | 123 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 122 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index bda9374..6b1ee6a 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -25,6 +25,7 @@
 #include <asm/hvm/iommu.h>
 #include <asm/hvm/support.h>
 #include <xen/hvm/irq.h>
+#include <asm/io_apic.h>
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
@@ -198,6 +199,108 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
     xfree(dpci);
 }
 
+/*
+ * This routine handles lowest-priority interrupts using vector-hashing
+ * mechanism. As an example, modern Intel CPUs use this method to handle
+ * lowest-priority interrupts.
+ *
+ * Here is the details about the vector-hashing mechanism:
+ * 1. For lowest-priority interrupts, store all the possible destination
+ *    vCPUs in an array.
+ * 2. Use "gvec % max number of destination vCPUs" to find the right
+ *    destination vCPU in the array for the lowest-priority interrupt.
+ */
+static struct vcpu *vector_hashing_dest(const struct domain *d,
+                                        uint32_t dest_id,
+                                        bool_t dest_mode,
+                                        uint8_t gvec)
+
+{
+    unsigned long *dest_vcpu_bitmap;
+    unsigned int dest_vcpus = 0;
+    struct vcpu *v, *dest = NULL;
+    unsigned int i;
+
+    dest_vcpu_bitmap = xzalloc_array(unsigned long,
+                                     BITS_TO_LONGS(d->max_vcpus));
+    if ( !dest_vcpu_bitmap )
+        return NULL;
+
+    for_each_vcpu ( d, v )
+    {
+        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
+                                dest_id, dest_mode) )
+            continue;
+
+        __set_bit(v->vcpu_id, dest_vcpu_bitmap);
+        dest_vcpus++;
+    }
+
+    if ( dest_vcpus != 0 )
+    {
+        unsigned int mod = gvec % dest_vcpus;
+        unsigned int idx = 0;
+
+        for ( i = 0; i <= mod; i++ )
+        {
+            idx = find_next_bit(dest_vcpu_bitmap, d->max_vcpus, idx) + 1;
+            BUG_ON(idx >= d->max_vcpus);
+        }
+
+        dest = d->vcpu[idx - 1];
+    }
+
+    xfree(dest_vcpu_bitmap);
+
+    return dest;
+}
+
+/*
+ * The purpose of this routine is to find the right destination vCPU for
+ * an interrupt which will be delivered by VT-d posted-interrupt. There
+ * are several cases as below:
+ *
+ * - For lowest-priority interrupts, use vector-hashing mechanism to find
+ *   the destination.
+ * - Otherwise, for single destination interrupt, it is straightforward to
+ *   find the destination vCPU and return true.
+ * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
+ *   so return NULL.
+ */
+static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t dest_id,
+                                      bool_t dest_mode, uint8_t delivery_mode,
+                                      uint8_t gvec)
+{
+    unsigned int dest_vcpus = 0;
+    struct vcpu *v, *dest = NULL;
+
+    switch ( delivery_mode )
+    {
+    case dest_LowestPrio:
+        return vector_hashing_dest(d, dest_id, dest_mode, gvec);
+    case dest_Fixed:
+        for_each_vcpu ( d, v )
+        {
+            if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
+                                    dest_id, dest_mode) )
+                continue;
+
+            dest_vcpus++;
+            dest = v;
+        }
+
+        /* For fixed mode, we only handle single-destination interrupts. */
+        if ( dest_vcpus == 1 )
+            return dest;
+
+        break;
+    default:
+        break;
+    }
+
+    return NULL;
+}
+
 int pt_irq_create_bind(
     struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
 {
@@ -256,7 +359,7 @@ int pt_irq_create_bind(
     {
     case PT_IRQ_TYPE_MSI:
     {
-        uint8_t dest, dest_mode;
+        uint8_t dest, dest_mode, delivery_mode;
         int dest_vcpu_id;
 
         if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
@@ -329,11 +432,29 @@ int pt_irq_create_bind(
         /* Calculate dest_vcpu_id for MSI-type pirq migration. */
         dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
         dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
+        delivery_mode = (pirq_dpci->gmsi.gflags & VMSI_DELIV_MASK) >>
+                         GFLAGS_SHIFT_DELIV_MODE;
+
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
         spin_unlock(&d->event_lock);
         if ( dest_vcpu_id >= 0 )
             hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
+
+        /* Use interrupt posting if it is supported. */
+        if ( iommu_intpost )
+        {
+            const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest, dest_mode,
+                                          delivery_mode, pirq_dpci->gmsi.gvec);
+
+            if ( vcpu )
+                pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
+            else
+                dprintk(XENLOG_G_INFO,
+                        "%pv: deliver interrupt in remapping mode,gvec:%02x\n",
+                        vcpu, pirq_dpci->gmsi.gvec);
+        }
+
         break;
     }
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 14/17] vmx: Properly handle notification event when vCPU is running
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (12 preceding siblings ...)
  2015-10-12  8:54 ` [PATCH v8 13/17] Update IRTE according to guest interrupt config changes Feng Wu
@ 2015-10-12  8:55 ` Feng Wu
  2015-10-12  8:55 ` [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, feng.wu, Jan Beulich, Andrew Cooper

When a vCPU is running in Root mode and a notification event
has been injected to it. we need to set VCPU_KICK_SOFTIRQ for
the current cpu, so the pending interrupt in PIRR will be
synced to vIRR before VM-Exit in time.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v7:
- Retain 'cli' in the comments to make it more understandable.
- Register another notification event handler when VT-d PI is enabled

v6:
- Ack the interrupt in the beginning of pi_notification_interrupt()

v4:
- Coding style.

v3:
- Make pi_notification_interrupt() static

 xen/arch/x86/hvm/vmx/vmx.c | 54 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 741a271..e448b31 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1975,6 +1975,53 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
+/* Handle VT-d posted-interrupt when VCPU is running. */
+static void pi_notification_interrupt(struct cpu_user_regs *regs)
+{
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+
+    /*
+     * We get here when a vCPU is running in root-mode (such as via hypercall,
+     * or any other reasons which can result in VM-Exit), and before vCPU is
+     * back to non-root, external interrupts from an assigned device happen
+     * and a notification event is delivered to this logical CPU.
+     *
+     * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
+     * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will
+     * be synced to vIRR before VM-Exit in time.
+     *
+     * Please refer to the following code fragments from
+     * xen/arch/x86/hvm/vmx/entry.S:
+     *
+     *     .Lvmx_do_vmentry
+     *
+     *      ......
+     *
+     *      point 1
+     *
+     *      cli
+     *      cmp  %ecx,(%rdx,%rax,1)
+     *      jnz  .Lvmx_process_softirqs
+     *
+     *      ......
+     *
+     *      je   .Lvmx_launch
+     *
+     *      ......
+     *
+     *     .Lvmx_process_softirqs:
+     *      sti
+     *      call do_softirq
+     *      jmp  .Lvmx_do_vmentry
+     *
+     * If VT-d engine issues a notification event at point 1 above, it cannot
+     * be delivered to the guest during this VM-entry without raising the
+     * softirq in this notification handler.
+     */
+    raise_softirq(VCPU_KICK_SOFTIRQ);
+}
+
 const struct hvm_function_table * __init start_vmx(void)
 {
     set_in_cr4(X86_CR4_VMXE);
@@ -2012,7 +2059,12 @@ const struct hvm_function_table * __init start_vmx(void)
     }
 
     if ( cpu_has_vmx_posted_intr_processing )
-        alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+    {
+        if ( iommu_intpost )
+            alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
+        else
+            alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+    }
     else
     {
         vmx_function_table.deliver_posted_intr = NULL;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (13 preceding siblings ...)
  2015-10-12  8:55 ` [PATCH v8 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
@ 2015-10-12  8:55 ` Feng Wu
  2015-10-26 14:39   ` Dario Faggioli
  2015-10-27 12:22   ` Dario Faggioli
  2015-10-12  8:55 ` [PATCH v8 16/17] VT-d: Dump the posted format IRTE Feng Wu
                   ` (3 subsequent siblings)
  18 siblings, 2 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:55 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, feng.wu

This patch includes the following aspects:
- Handling logic when vCPU is blocked:
    * Add a global vector to wake up the blocked vCPU
      when an interrupt is being posted to it (This part
      was sugguested by Yang Zhang <yang.z.zhang@intel.com>).
    * Define two per-cpu variables:
          1. pi_blocked_vcpu:
            A list storing the vCPUs which were blocked
            on this pCPU.

          2. pi_blocked_vcpu_lock:
            The spinlock to protect pi_blocked_vcpu.

- Add the following hooks, this part was suggested
  by George Dunlap <george.dunlap@eu.citrix.com> and
  Dario Faggioli <dario.faggioli@citrix.com>.
    * arch_vcpu_block()
      Called alled before vcpu is blocking and update the PID
      (posted-interrupt descriptor).

    * arch_vcpu_block_cancel()
      Called when interrupts come in during blocking.

    * vmx_pi_switch_from()
      Called before context switch, we update the PID when the
      vCPU is preempted or going to sleep.

    * vmx_pi_switch_to()
      Called after context switch, we update the PID when the vCPU
      is going to run.

    * arch_vcpu_wake_prepare()
      It will be called when waking up the vCPU, we update
      the posted interrupt descriptor when the vCPU is
      unblocked.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
Sugguested-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Remove the lazy context switch handling for PI state transition
- Change PI state in vcpu_block() and do_poll() when the vCPU
  is going to be blocked

v7:
- Merge [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts
  and "[PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked"
  into this patch, so it is self-contained and more convenient
  for code review.
- Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
- Coding style
- Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
- Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
- Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
- Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
- Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
- Use 'spin_lock' and 'spin_unlock' when the interrupt has been
  already disabled.
- Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
- Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
- Call .pi_ctxt_switch_to() __context_switch() instead of directly
  calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
- Make .pi_block_cpu unsigned int
- Use list_del() instead of list_del_init()
- Coding style

One remaining item in v7:
Jan has concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi(),
need Dario or George's input about this.

v6:
- Add two static inline functions for pi context switch
- Fix typos

v5:
- Rename arch_vcpu_wake to arch_vcpu_wake_prepare
- Make arch_vcpu_wake_prepare() inline for ARM
- Merge the ARM dummy hook with together
- Changes to some code comments
- Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if
  PI is disabled or the vCPU is not in HVM
- Coding style

v4:
- Newly added

Changlog for "vmx: posted-interrupt handling when vCPU is blocked"
v6:
- Fix some typos
- Ack the interrupt right after the spin_unlock in pi_wakeup_interrupt()

v4:
- Use local variables in pi_wakeup_interrupt()
- Remove vcpu from the blocked list when pi_desc.on==1, this
- avoid kick vcpu multiple times.
- Remove tasklet

v3:
- This patch is generated by merging the following three patches in v2:
   [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
   [RFC v2 10/15] vmx: Define two per-cpu variables
   [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
- rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
- Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
- rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
- Make pi_wakeup_interrupt() static
- Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
- move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
- Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
- Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'

 xen/arch/x86/domain.c              |  12 ++
 xen/arch/x86/hvm/hvm.c             |  18 +++
 xen/arch/x86/hvm/vmx/vmcs.c        |   2 +
 xen/arch/x86/hvm/vmx/vmx.c         | 265 +++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c              |   9 ++
 xen/include/asm-arm/domain.h       |   4 +
 xen/include/asm-x86/domain.h       |   4 +
 xen/include/asm-x86/hvm/hvm.h      |   4 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |  11 ++
 xen/include/asm-x86/hvm/vmx/vmx.h  |   4 +
 10 files changed, 333 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..1d3eb15 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1608,6 +1608,18 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     if ( (per_cpu(curr_vcpu, cpu) == next) ||
          (is_idle_domain(nextd) && cpu_online(cpu)) )
     {
+        /*
+         * When we handle the lazy context switch for the following
+         * two scenarios:
+         * - Preempted by a tasklet, which uses in an idle context
+         * - the prev vcpu is in offline and no new available vcpus in run queue
+         * We don't change the 'SN' bit in posted-interrupt descriptor, this
+         * may incur spurious PI notification events, but since PI notification
+         * event is only sent when 'ON' is clear, and once the PI notificatoin
+         * is sent, ON is set by hardware, so not so many spurious events and
+         * it is not a big deal.
+         */
+
         local_irq_enable();
     }
     else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c957610..e493e37 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6817,6 +6817,24 @@ bool_t altp2m_vcpu_emulate_ve(struct vcpu *v)
     return 0;
 }
 
+void arch_vcpu_block(struct vcpu *v)
+{
+    if ( v->arch.vcpu_block )
+        v->arch.vcpu_block(v);
+}
+
+void arch_vcpu_block_cancel(struct vcpu *v)
+{
+    if ( v->arch.vcpu_block_cancel )
+        v->arch.vcpu_block_cancel(v);
+}
+
+void arch_vcpu_wake_prepare(struct vcpu *v)
+{
+    if ( v->arch.vcpu_wake_prepare )
+        v->arch.vcpu_wake_prepare(v);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 5f67797..5abe960 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -661,6 +661,8 @@ int vmx_cpu_up(void)
     if ( cpu_has_vmx_vpid )
         vpid_sync_all();
 
+    vmx_pi_per_cpu_init(cpu);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e448b31..cad70b4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -83,7 +83,206 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
 static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
 
+/*
+ * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
+ * can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
+static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
+
 uint8_t __read_mostly posted_intr_vector;
+uint8_t __read_mostly pi_wakeup_vector;
+
+void vmx_pi_per_cpu_init(unsigned int cpu)
+{
+    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
+    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
+}
+
+void vmx_vcpu_block(struct vcpu *v)
+{
+    struct pi_desc old, new;
+    unsigned int dest;
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    unsigned long flags;
+
+    if ( !has_arch_pdevs(v->domain) )
+        return;
+
+    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, flags);
+
+    /*
+     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
+     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
+     * the per-CPU list, we also save it to posted-interrupt descriptor and
+     * make it as the destination of the wake-up notification event.
+     */
+    v->arch.hvm_vmx.pi_block_cpu = v->processor;
+
+    spin_lock(&per_cpu(pi_blocked_vcpu_lock, v->arch.hvm_vmx.pi_block_cpu));
+    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
+                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
+    spin_unlock(&per_cpu(pi_blocked_vcpu_lock,
+                v->arch.hvm_vmx.pi_block_cpu));
+
+    do {
+        old.control = new.control = pi_desc->control;
+
+        /*
+         * Change the 'NDST' field to v->arch.hvm_vmx.pi_block_cpu,
+         * so when external interrupts from assigned deivces happen,
+         * wakeup notifiction event will go to
+         * v->arch.hvm_vmx.pi_block_cpu, then in pi_wakeup_interrupt()
+         * we can find the vCPU in the right list to wake up.
+         */
+        dest = cpu_physical_id(v->arch.hvm_vmx.pi_block_cpu);
+
+        if ( x2apic_enabled )
+            new.ndst = dest;
+        else
+            new.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);
+
+        pi_clear_sn(&new);
+        new.nv = pi_wakeup_vector;
+    } while ( cmpxchg(&pi_desc->control, old.control, new.control) !=
+              old.control );
+
+    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, flags);
+}
+
+void vmx_vcpu_block_cancel(struct vcpu *v)
+{
+    unsigned long flags;
+
+    if ( !has_arch_pdevs(v->domain) )
+        return;
+
+    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, flags);
+
+    if ( !test_bit(_VPF_blocked, &v->pause_flags) )
+    {
+        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+        unsigned int pi_block_cpu;
+
+        /* the vCPU is not on any blocking list. */
+        pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
+        if ( pi_block_cpu == NR_CPUS )
+            goto out;
+
+        /*
+         * Set 'NV' field back to posted_intr_vector, so the
+         * Posted-Interrupts can be delivered to the vCPU by
+         * VT-d HW after it is scheduled to run.
+         */
+        write_atomic(&pi_desc->nv, posted_intr_vector);
+
+        spin_lock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
+
+        /*
+         * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
+         * removed from the blocking list while we are acquiring the lock.
+         */
+        if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
+        {
+            spin_unlock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
+            goto out;
+        }
+
+        list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+        v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
+        spin_unlock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
+    }
+
+out:
+    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, flags);
+}
+
+void vmx_vcpu_wake_prepare(struct vcpu *v)
+{
+    unsigned long flags;
+
+    if ( !has_arch_pdevs(v->domain) )
+        return;
+
+    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, flags);
+
+    if ( !test_bit(_VPF_blocked, &v->pause_flags) )
+    {
+        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+        unsigned int pi_block_cpu;
+
+        /* the vCPU is not on any blocking list. */
+        pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
+        if ( pi_block_cpu == NR_CPUS )
+            goto out;
+
+        /*
+         * We cannot set 'SN' here since we don't change 'SN' during lazy
+         * context switch, if we set 'SN' here, we may end up in the situation
+         * that the vCPU is running with 'SN' set.
+         */
+
+        /*
+         * Set 'NV' field back to posted_intr_vector, so the
+         * Posted-Interrupts can be delivered to the vCPU by
+         * VT-d HW after it is scheduled to run.
+         */
+        write_atomic(&pi_desc->nv, posted_intr_vector);
+
+        spin_lock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
+
+        /*
+         * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
+         * removed from the blocking list while we are acquiring the lock.
+         */
+        if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
+        {
+            spin_unlock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
+            goto out;
+        }
+
+        list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+        v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
+        spin_unlock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
+    }
+
+out:
+    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, flags);
+}
+
+static void vmx_pi_switch_from(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !has_arch_pdevs(v->domain) || !iommu_intpost ||
+         test_bit(_VPF_blocked, &v->pause_flags) )
+        return;
+
+    /*
+     * The vCPU has been preempted or went to sleep. We don't need to send
+     * notification event to a non-running vcpu, the interrupt information
+     * will be delivered to it before VM-ENTRY when the vcpu is scheduled
+     * to run next time.
+     */
+    pi_set_sn(pi_desc);
+}
+
+static void vmx_pi_switch_to(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !has_arch_pdevs(v->domain) || !iommu_intpost )
+        return;
+
+    if ( x2apic_enabled )
+        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
+    else
+        write_atomic(&pi_desc->ndst,
+                     MASK_INSR(cpu_physical_id(v->processor),
+                     PI_xAPIC_NDST_MASK));
+
+    pi_clear_sn(pi_desc);
+}
 
 static int vmx_domain_initialise(struct domain *d)
 {
@@ -106,10 +305,24 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
     spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
 
+    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
+
+    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
+
+    spin_lock_init(&v->arch.hvm_vmx.pi_lock);
+
     v->arch.schedule_tail    = vmx_do_resume;
     v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
     v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
 
+    if ( iommu_intpost && has_hvm_container_vcpu(v) )
+    {
+        v->arch.vcpu_block = vmx_vcpu_block;
+        v->arch.vcpu_block_cancel = vmx_vcpu_block_cancel;
+        v->arch.vcpu_wake_prepare = vmx_vcpu_wake_prepare;
+    }
+
     if ( (rc = vmx_create_vmcs(v)) != 0 )
     {
         dprintk(XENLOG_WARNING,
@@ -721,6 +934,7 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
     vmx_save_guest_msrs(v);
     vmx_restore_host_msrs();
     vmx_save_dr(v);
+    vmx_pi_switch_from(v);
 }
 
 static void vmx_ctxt_switch_to(struct vcpu *v)
@@ -745,6 +959,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
 
     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
+    vmx_pi_switch_to(v);
 }
 
 
@@ -1975,6 +2190,53 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
+/* Handle VT-d posted-interrupt when VCPU is blocked. */
+static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
+{
+    struct arch_vmx_struct *vmx, *tmp;
+    struct vcpu *v;
+    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
+    struct list_head *blocked_vcpus =
+                       &per_cpu(pi_blocked_vcpu, smp_processor_id());
+    LIST_HEAD(list);
+
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+
+    spin_lock(lock);
+
+    /*
+     * XXX: The length of the list depends on how many vCPU is current
+     * blocked on this specific pCPU. This may hurt the interrupt latency
+     * if the list grows to too many entries.
+     */
+    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
+    {
+        if ( pi_test_on(&vmx->pi_desc) )
+        {
+            list_del(&vmx->pi_blocked_vcpu_list);
+            vmx->pi_block_cpu = NR_CPUS;
+
+            /*
+             * We cannot call vcpu_unblock here, since it also needs
+             * 'pi_blocked_vcpu_lock', we store the vCPUs with ON
+             * set in another list and unblock them after we release
+             * 'pi_blocked_vcpu_lock'.
+             */
+            list_add_tail(&vmx->pi_vcpu_on_set_list, &list);
+        }
+    }
+
+    spin_unlock(lock);
+
+    list_for_each_entry_safe(vmx, tmp, &list, pi_vcpu_on_set_list)
+    {
+        v = container_of(vmx, struct vcpu, arch.hvm_vmx);
+        list_del(&vmx->pi_vcpu_on_set_list);
+        vcpu_unblock(v);
+    }
+}
+
 /* Handle VT-d posted-interrupt when VCPU is running. */
 static void pi_notification_interrupt(struct cpu_user_regs *regs)
 {
@@ -2061,7 +2323,10 @@ const struct hvm_function_table * __init start_vmx(void)
     if ( cpu_has_vmx_posted_intr_processing )
     {
         if ( iommu_intpost )
+        {
             alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
+            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
+        }
         else
             alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
     }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 3eefed7..383fd62 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -412,6 +412,8 @@ void vcpu_wake(struct vcpu *v)
     unsigned long flags;
     spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
 
+    arch_vcpu_wake_prepare(v);
+
     if ( likely(vcpu_runnable(v)) )
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
@@ -800,10 +802,13 @@ void vcpu_block(void)
 
     set_bit(_VPF_blocked, &v->pause_flags);
 
+    arch_vcpu_block(v);
+
     /* Check for events /after/ blocking: avoids wakeup waiting race. */
     if ( local_events_need_delivery() )
     {
         clear_bit(_VPF_blocked, &v->pause_flags);
+        arch_vcpu_block_cancel(v);
     }
     else
     {
@@ -837,6 +842,8 @@ static long do_poll(struct sched_poll *sched_poll)
     v->poll_evtchn = -1;
     set_bit(v->vcpu_id, d->poll_mask);
 
+    arch_vcpu_block(v);
+
 #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
     /* Check for events /after/ setting flags: avoids wakeup waiting race. */
     smp_mb();
@@ -854,6 +861,7 @@ static long do_poll(struct sched_poll *sched_poll)
 #endif
 
     rc = 0;
+
     if ( local_events_need_delivery() )
         goto out;
 
@@ -887,6 +895,7 @@ static long do_poll(struct sched_poll *sched_poll)
     v->poll_evtchn = 0;
     clear_bit(v->vcpu_id, d->poll_mask);
     clear_bit(_VPF_blocked, &v->pause_flags);
+    arch_vcpu_block_cancel(v);
     return rc;
 }
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 56aa208..ec2b536 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -301,6 +301,10 @@ static inline register_t vcpuid_to_vaffinity(unsigned int vcpuid)
     return vaff;
 }
 
+static inline void arch_vcpu_wake_prepare(struct vcpu *v) {}
+static inline void arch_vcpu_block(struct vcpu *v) {}
+static inline void arch_vcpu_block_cancel(struct vcpu *v) {}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 0fce09e..c37af4e 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -481,6 +481,10 @@ struct arch_vcpu
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
 
+    void (*vcpu_block) (struct vcpu *);
+    void (*vcpu_block_cancel) (struct vcpu *);
+    void (*vcpu_wake_prepare) (struct vcpu *);
+
     struct vpmu_struct vpmu;
 
     /* Virtual Machine Extensions */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 3cac64f..4cd7fcb 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -545,6 +545,10 @@ static inline bool_t hvm_altp2m_supported(void)
     return hvm_funcs.altp2m_supported;
 }
 
+void arch_vcpu_wake_prepare(struct vcpu *v);
+void arch_vcpu_block(struct vcpu *v);
+void arch_vcpu_block_cancel(struct vcpu *v);
+
 #ifndef NDEBUG
 /* Permit use of the Forced Emulation Prefix in HVM guests */
 extern bool_t opt_hvm_fep;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 81c9e63..23f8192 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -160,6 +160,17 @@ struct arch_vmx_struct {
     struct page_info     *vmwrite_bitmap;
 
     struct page_info     *pml_pg;
+
+    struct list_head     pi_blocked_vcpu_list;
+    struct list_head     pi_vcpu_on_set_list;
+
+    /*
+     * Before vCPU is blocked, it is added to the global per-cpu list
+     * of 'pi_block_cpu', then VT-d engine can send wakeup notification
+     * event to 'pi_block_cpu' and wakeup the related vCPU.
+     */
+    unsigned int         pi_block_cpu;
+    spinlock_t           pi_lock;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 70b254f..2eaea32 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -28,6 +28,8 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/vmx/vmcs.h>
 
+extern uint8_t pi_wakeup_vector;
+
 typedef union {
     struct {
         u64 r       :   1,  /* bit 0 - Read permission */
@@ -557,6 +559,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
+void vmx_pi_per_cpu_init(unsigned int cpu);
+
 /* EPT violation qualifications definitions */
 #define _EPT_READ_VIOLATION         0
 #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 16/17] VT-d: Dump the posted format IRTE
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (14 preceding siblings ...)
  2015-10-12  8:55 ` [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
@ 2015-10-12  8:55 ` Feng Wu
  2015-10-12  8:55 ` [PATCH v8 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, feng.wu

Add the utility to dump the posted format IRTE.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Coding style

v7:
- Remove the two stage loop

v6:
- Fix a typo

v4:
- Newly added

 xen/drivers/passthrough/vtd/utils.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 6daa156..a1b3ebc 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -203,6 +203,9 @@ static void dump_iommu_info(unsigned char key)
             ecap_intr_remap(iommu->ecap) ? "" : "not ",
             (status & DMA_GSTS_IRES) ? " and enabled" : "" );
 
+        printk("  Interrupt Posting: %ssupported.\n",
+               cap_intr_post(iommu->cap) ? "" : "not ");
+
         if ( status & DMA_GSTS_IRES )
         {
             /* Dump interrupt remapping table. */
@@ -213,8 +216,9 @@ static void dump_iommu_info(unsigned char key)
 
             printk("  Interrupt remapping table (nr_entry=%#x. "
                 "Only dump P=1 entries here):\n", nr_entry);
-            printk("       SVT  SQ   SID      DST  V  AVL DLM TM RH DM "
-                   "FPD P\n");
+            printk("R means remapped format, P means posted format.\n");
+            printk("R:       SVT  SQ   SID  V  AVL FPD      DST DLM TM RH DM P\n");
+            printk("P:       SVT  SQ   SID  V  AVL FPD              PDA  URG P\n");
             for ( i = 0; i < nr_entry; i++ )
             {
                 struct iremap_entry *p;
@@ -232,11 +236,21 @@ static void dump_iommu_info(unsigned char key)
 
                 if ( !p->remap.p )
                     continue;
-                printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
-                    "   %x %x\n", i,
-                    p->remap.svt, p->remap.sq, p->remap.sid, p->remap.dst,
-                    p->remap.vector, p->remap.avail, p->remap.dlm, p->remap.tm,
-                    p->remap.rh, p->remap.dm, p->remap.fpd, p->remap.p);
+                if ( !p->remap.im )
+                    printk("R:  %04x:  %x   %x  %04x %02x    %x   %x %08x   %x  %x  %x  %x %x\n",
+                           i,
+                           p->remap.svt, p->remap.sq, p->remap.sid,
+                           p->remap.vector, p->remap.avail, p->remap.fpd,
+                           p->remap.dst, p->remap.dlm, p->remap.tm, p->remap.rh,
+                           p->remap.dm, p->remap.p);
+                else
+                    printk("P:  %04x:  %x   %x  %04x %02x    %x   %x %16lx    %x %x\n",
+                           i,
+                           p->post.svt, p->post.sq, p->post.sid, p->post.vector,
+                           p->post.avail, p->post.fpd,
+                           ((u64)p->post.pda_h << 32) | (p->post.pda_l << 6),
+                           p->post.urg, p->post.p);
+
                 print_cnt++;
             }
             if ( iremap_entries )
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 17/17] Add a command line parameter for VT-d posted-interrupts
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (15 preceding siblings ...)
  2015-10-12  8:55 ` [PATCH v8 16/17] VT-d: Dump the posted format IRTE Feng Wu
@ 2015-10-12  8:55 ` Feng Wu
  2015-10-23  2:12 ` [PATCH v8 00/17] Add VT-d Posted-Interrupts support Wu, Feng
  2015-10-29 10:51 ` Dario Faggioli
  18 siblings, 0 replies; 49+ messages in thread
From: Feng Wu @ 2015-10-12  8:55 UTC (permalink / raw)
  To: xen-devel; +Cc: feng.wu, Jan Beulich

Enable VT-d Posted-Interrupts and add a command line
parameter for it.

CC: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v6:
- Change the default value to 'false' in xen-command-line.markdown

 docs/misc/xen-command-line.markdown | 9 ++++++++-
 xen/drivers/passthrough/iommu.c     | 3 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index a2e427c..ecaf221 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -855,7 +855,7 @@ debug hypervisor only).
 > Default: `new` unless directed-EOI is supported
 
 ### iommu
-> `= List of [ <boolean> | force | required | intremap | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | igfx | verbose | debug ]`
+> `= List of [ <boolean> | force | required | intremap | intpost | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | igfx | verbose | debug ]`
 
 > Sub-options:
 
@@ -882,6 +882,13 @@ debug hypervisor only).
 >> Control the use of interrupt remapping (DMA remapping will always be enabled
 >> if IOMMU functionality is enabled).
 
+> `intpost`
+
+> Default: `false`
+
+>> Control the use of interrupt posting, which depends on the availability of
+>> interrupt remapping.
+
 > `qinval` (VT-d)
 
 > Default: `true`
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 36d5cc0..8d03076 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -38,6 +38,7 @@ static void iommu_dump_p2m_table(unsigned char key);
  *   no-snoop                   Disable VT-d Snoop Control
  *   no-qinval                  Disable VT-d Queued Invalidation
  *   no-intremap                Disable VT-d Interrupt Remapping
+ *   no-intpost                 Disable VT-d Interrupt posting
  */
 custom_param("iommu", parse_iommu_param);
 bool_t __initdata iommu_enable = 1;
@@ -105,6 +106,8 @@ static void __init parse_iommu_param(char *s)
             iommu_qinval = val;
         else if ( !strcmp(s, "intremap") )
             iommu_intremap = val;
+        else if ( !strcmp(s, "intpost") )
+            iommu_intpost = val;
         else if ( !strcmp(s, "debug") )
         {
             iommu_debug = val;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design
  2015-10-12  8:54 ` [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
@ 2015-10-12 10:41   ` Andrew Cooper
  2015-10-13  0:53     ` Wu, Feng
  2015-10-29 15:56   ` Dario Faggioli
  1 sibling, 1 reply; 49+ messages in thread
From: Andrew Cooper @ 2015-10-12 10:41 UTC (permalink / raw)
  To: Feng Wu, xen-devel
  Cc: Yang Zhang, George Dunlap, Kevin Tian, Keir Fraser, Jan Beulich

On 12/10/15 09:54, Feng Wu wrote:
> Add the design doc for VT-d PI.
>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  docs/misc/vtd-pi.txt | 332 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 332 insertions(+)
>  create mode 100644 docs/misc/vtd-pi.txt
>
> diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
> new file mode 100644
> index 0000000..af5409a
> --- /dev/null
> +++ b/docs/misc/vtd-pi.txt
> @@ -0,0 +1,332 @@
> +Authors: Feng Wu <feng.wu@intel.com>
> +
> +VT-d Posted-interrupt (PI) design for XEN
> +
> +Background
> +==========
> +With the development of virtualization, there are more and more device
> +assignment requirements. However, today when a VM is running with
> +assigned devices (such as, NIC), external interrupt handling for the assigned
> +devices always needs VMM intervention.
> +
> +VT-d Posted-interrupt is a more enhanced method to handle interrupts
> +in the virtualization environment. Interrupt posting is the process by
> +which an interrupt request is recorded in a memory-resident
> +posted-interrupt-descriptor structure by the root-complex, followed by
> +an optional notification event issued to the CPU complex.
> +
> +With VT-d Posted-interrupt we can get the following advantages:
> +- Direct delivery of external interrupts to running vCPUs without VMM
> +intervention
> +- Decrease the interrupt migration complexity. On vCPU migration, software
> +can atomically co-migrate all interrupts targeting the migrating vCPU. For
> +virtual machines with assigned devices, migrating a vCPU across pCPUs
> +either incurs the overhead of forwarding interrupts in software (e.g. via VMM
> +generated IPIs), or complexity to independently migrate each interrupt targeting
> +the vCPU to the new pCPU. However, after enabling VT-d PI, the destination vCPU
> +of an external interrupt from assigned devices is stored in the IRTE (i.e.
> +Posted-interrupt Descriptor Address), when vCPU is migrated to another pCPU,
> +we will set this new pCPU in the 'NDST' filed of Posted-interrupt descriptor, this
> +make the interrupt migration automatic.
> +
> +Here is what Xen currently does for external interrupts from assigned devices:
> +
> +When a VM is running and an external interrupt from an assigned device occurs
> +for it. VM-EXIT happens, then:
> +
> +vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci() -->
> +raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
> +
> +softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
> +
> +dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() --> vmsi_deliver() -->
> +vmsi_inj_irq() --> vlapic_set_irq()
> +
> +vlapic_set_irq() does the following things:
> +1. If CPU-side posted-interrupt is supported, call vmx_deliver_posted_intr() to deliver
> +the virtual interrupt via posted-interrupt infrastructure.
> +2. Else if CPU-side posted-interrupt is not supported, set the related vIRR in vLAPIC
> +page and call vcpu_kick() to kick the related vCPU. Before VM-Entry, vmx_intr_assist()
> +will help to inject the interrupt to guests.
> +
> +However, after VT-d PI is supported, when a guest is running in non-root and an
> +external interrupt from an assigned device occurs for it. No VM-Exit is needed,
> +the guest can handle this totally in non-root mode, thus avoiding all the above
> +code flow.
> +
> +Posted-interrupt Introduction
> +========================
> +There are two components to the Posted-interrupt architecture:
> +Processor Support and Root-Complex Support
> +
> +- Processor Support
> +Posted-interrupt processing is a feature by which a processor processes
> +the virtual interrupts by recording them as pending on the virtual-APIC
> +page.
> +
> +Posted-interrupt processing is enabled by setting the process posted
> +interrupts VM-execution control. The processing is performed in response
> +to the arrival of an interrupt with the posted-interrupt notification vector.
> +In response to such an interrupt, the processor processes virtual interrupts
> +recorded in a data structure called a posted-interrupt descriptor.
> +
> +More information about APICv and CPU-side Posted-interrupt, please refer
> +to Chapter 29, and Section 29.6 in the Intel SDM:
> +http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

Please give chapter names rather than numbers (as numbers shift), and
the top level page rather than a specific version of the manual.

> +
> +- Root-Complex Support
> +Interrupt posting is the process by which an interrupt request (from IOAPIC
> +or MSI/MSIx capable sources) is recorded in a memory-resident
> +posted-interrupt-descriptor structure by the root-complex, followed by
> +an optional notification event issued to the CPU complex. The interrupt
> +request arriving at the root-complex carry the identity of the interrupt
> +request source and a 'remapping-index'. The remapping-index is used to
> +look-up an entry from the memory-resident interrupt-remap-table. Unlike
> +interrupt-remapping, the interrupt-remap-table-entry for a posted-interrupt,
> +specifies a virtual-vector and a pointer to the posted-interrupt descriptor.
> +The virtual-vector specifies the vector of the interrupt to be recorded in
> +the posted-interrupt descriptor. The posted-interrupt descriptor hosts storage
> +for the virtual-vectors and contains the attributes of the notification event
> +(interrupt) to be issued to the CPU complex to inform CPU/software about pending
> +interrupts recorded in the posted-interrupt descriptor.
> +
> +More information about VT-d PI, please refer to
> +http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> +
> +Important Definitions
> +==================
> +There are some changes to IRTE and posted-interrupt descriptor after
> +VT-d PI is introduced:
> +IRTE: Interrupt Remapping Table Entry
> +Posted-interrupt Descriptor Address: the address of the posted-interrupt descriptor
> +Virtual Vector: the guest vector of the interrupt
> +URG: indicates if the interrupt is urgent
> +
> +Posted-interrupt descriptor:
> +The Posted Interrupt Descriptor hosts the following fields:
> +Posted Interrupt Request (PIR): Provide storage for posting (recording) interrupts (one bit
> +per vector, for up to 256 vectors).
> +
> +Outstanding Notification (ON): Indicate if there is a notification event outstanding (not
> +processed by processor or software) for this Posted Interrupt Descriptor. When this field is 0,
> +hardware modifies it from 0 to 1 when generating a notification event, and the entity receiving
> +the notification event (processor or software) resets it as part of posted interrupt processing.
> +
> +Suppress Notification (SN): Indicate if a notification event is to be suppressed (not
> +generated) for non-urgent interrupt requests (interrupts processed through an IRTE with
> +URG=0).
> +
> +Notification Vector (NV): Specify the vector for notification event (interrupt).
> +
> +Notification Destination (NDST): Specify the physical APIC-ID of the destination logical
> +processor for the notification event.
> +
> +Design Overview
> +==============
> +In this design, we will cover the following items:
> +1. Add a variable to control whether enable VT-d posted-interrupt or not.
> +2. VT-d PI feature detection.
> +3. Extend posted-interrupt descriptor structure to cover VT-d PI specific items.
> +4. Extend IRTE structure to support VT-d PI.
> +5. Introduce a new global vector which is used for waking up the blocked vCPU.
> +6. Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
> +7. Update posted-interrupt descriptor during vCPU scheduling (when the state
> +of the vCPU is transmitted among RUNSTATE_running / RUNSTATE_blocked/
> +RUNSTATE_runnable / RUNSTATE_offline).
> +8. How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
> +9. New boot command line for Xen, which controls VT-d PI feature by user.
> +10. Multicast/broadcast and lowest priority interrupts consideration.
> +
> +
> +Implementation details
> +===================
> +- New variable to control VT-d PI
> +
> +Like variable 'iommu_intremap' for interrupt remapping, it is very straightforward
> +to add a new one 'iommu_intpost' for posted-interrupt. 'iommu_intpost' is set
> +only when interrupt remapping and VT-d posted-interrupt are both enabled.
> +
> +- VT-d PI feature detection.
> +Bit 59 in VT-d Capability Register is used to report VT-d Posted-interrupt support.
> +
> +- Extend posted-interrupt descriptor structure to cover VT-d PI specific items.
> +Here is the new structure for posted-interrupt descriptor:
> +
> +struct pi_desc {
> +    DECLARE_BITMAP(pir, NR_VECTORS);
> +    union {
> +        struct
> +        {
> +        u16 on     : 1,  /* bit 256 - Outstanding Notification */
> +            sn     : 1,  /* bit 257 - Suppress Notification */
> +            rsvd_1 : 14; /* bit 271:258 - Reserved */
> +        u8  nv;          /* bit 279:272 - Notification Vector */
> +        u8  rsvd_2;      /* bit 287:280 - Reserved */
> +        u32 ndst;        /* bit 319:288 - Notification Destination */
> +        };
> +        u64 control;
> +    };
> +    u32 rsvd[6];
> +} __attribute__ ((aligned (64)));
> +
> +- Extend IRTE structure to support VT-d PI.
> +
> +Here is the new structure for IRTE:
> +/* interrupt remap entry */
> +struct iremap_entry {
> +  union {
> +    struct { u64 lo, hi; };
> +    struct {
> +        u16 p       : 1,
> +            fpd     : 1,
> +            dm      : 1,
> +            rh      : 1,
> +            tm      : 1,
> +            dlm     : 3,
> +            avail   : 4,
> +            res_1   : 4;
> +        u8  vector;
> +        u8  res_2;
> +        u32 dst;
> +        u16 sid;
> +        u16 sq      : 2,
> +            svt     : 2,
> +            res_3   : 12;
> +        u32 res_4   : 32;
> +    } remap;
> +    struct {
> +        u16 p       : 1,
> +            fpd     : 1,
> +            res_1   : 6,
> +            avail   : 4,
> +            res_2   : 2,
> +            urg     : 1,
> +            im      : 1;
> +        u8  vector;
> +        u8  res_3;
> +        u32 res_4   : 6,
> +            pda_l   : 26;
> +        u16 sid;
> +        u16 sq      : 2,
> +            svt     : 2,
> +            res_5   : 12;
> +        u32 pda_h;
> +    } post;
> +  };
> +};
> +
> +- Introduce a new global vector which is used to wake up the blocked vCPU.
> +
> +Currently, there is a global vector 'posted_intr_vector', which is used as the
> +global notification vector for all vCPUs in the system. This vector is stored in
> +VMCS and CPU considers it as a _special_ vector, uses it to notify the related
> +pCPU when an interrupt is recorded in the posted-interrupt descriptor.
> +
> +This existing global vector is a _special_ vector to CPU, CPU handle it in a
> +_special_ way compared to normal vectors, please refer to 29.6 in Intel SDM
> +http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
> +for more information about how CPU handles it.
> +
> +After having VT-d PI, VT-d engine can issue notification event when the
> +assigned devices issue interrupts. We need add a new global vector to
> +wakeup the blocked vCPU, please refer to later section in this design for
> +how to use this new global vector.
> +
> +- Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
> +After VT-d PI is introduced, the format of IRTE is changed as follows:
> +	Descriptor Address: the address of the posted-interrupt descriptor
> +	Virtual Vector: the guest vector of the interrupt
> +	URG: indicates if the interrupt is urgent
> +	Other fields continue to have the same meaning
> +
> +'Descriptor Address' tells the destination vCPU of this interrupt, since
> +each vCPU has a dedicated posted-interrupt descriptor.
> +
> +'Virtual Vector' tells the guest vector of the interrupt.
> +
> +When guest changes the configuration of the interrupts, such as, the
> +cpu affinity, or the vector, we need to update the associated IRTE accordingly.
> +
> +- Update posted-interrupt descriptor during vCPU scheduling
> +
> +The basic idea here is:
> +1. When vCPU's state is RUNSTATE_running,
> +        - Set 'NV' to 'posted_intr_vector'.
> +        - Clear 'SN' to accept posted-interrupts.
> +        - Set 'NDST' to the pCPU on which the vCPU will be running.
> +2. When vCPU's state is RUNSTATE_blocked,
> +        - Set 'NV' to ' pi_wakeup_vector ', so we can wake up the
> +          related vCPU when posted-interrupt happens for it.
> +          Please refer to the above section about the new global vector.
> +        - Clear 'SN' to accept posted-interrupts
> +3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> +        - Set 'SN' to suppress non-urgent interrupts
> +          (Currently, we only support non-urgent interrupts)
> +         When vCPU is in RUNSTATE_runnable or RUNSTATE_offline
> +         It is not needed to accept posted-interrupt notification event
> +         since we don't change the behavior of scheduler when the interrupt
> +         occurs, we still need wait for the next scheduling of the vCPU.
> +         When external interrupts from assigned devices occur, the interrupts
> +         are recorded in PIR, and will be synced to IRR before VM-Entry.
> +        - Set 'NV' to 'posted_intr_vector'.
> +
> +- How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
> +
> +Here is the scenario for the usage of the new global vector:
> +
> +1. vCPU0 is running on pCPU0
> +2. vCPU0 is blocked and vCPU1 is currently running on pCPU0
> +3. An external interrupt from an assigned device occurs for vCPU0, if we
> +still use 'posted_intr_vector' as the notification vector for vCPU0, the
> +notification event for vCPU0 (the event will go to pCPU1) will be consumed
> +by vCPU1 incorrectly (remember this is a special vector to CPU). The worst
> +case is that vCPU0 will never be woken up again since the wakeup event
> +for it is always consumed by other vCPUs incorrectly. So we need introduce
> +another global vector, naming 'pi_wakeup_vector' to wake up the blocked vCPU.
> +
> +After using 'pi_wakeup_vector' for vCPU0, VT-d engine will issue notification
> +event using this new vector. Since this new vector is not a SPECIAL one to CPU,
> +it is just a normal vector. To CPU, it just receives an normal external interrupt,
> +then we can get control in the handler of this new vector. In this case, hypervisor
> +can do something in it, such as wakeup the blocked vCPU.
> +
> +Here are what we do for the blocked vCPU:
> +1. Define a per-cpu list 'pi_blocked_vcpu', which stored the blocked
> +vCPU on the pCPU.
> +2. When the vCPU's state is changed to RUNSTATE_blocked, insert the vCPU
> +to the per-cpu list belonging to the pCPU it was running.
> +3. When the vCPU is unblocked, remove the vCPU from the related pCPU list.
> +
> +In the handler of 'pi_wakeup_vector', we do:
> +1. Get the physical CPU.
> +2. Iterate the list 'pi_blocked_vcpu' of the current pCPU, if 'ON' is set,
> +we unblock the associated vCPU.
> +
> +- New boot command line for Xen, which controls VT-d PI feature by user.
> +
> +Like 'intremap' for interrupt remapping, we add a new boot command line
> +'intpost' for posted-interrupts.
> +
> +- Multicast/broadcast and lowest priority interrupts consideration.
> +
> +With VT-d PI, the destination vCPU information of an external interrupt
> +from assigned devices is stored in IRTE, this makes the following
> +consideration of the design:
> +1. Multicast/broadcast interrupts cannot be posted.
> +2. For lowest-priority interrupts, new Intel CPU/Chipset/root-complex
> +(starting from Nehalem) ignore TPR value, and instead supported two other
> +ways (configurable by BIOS) on how the handle lowest priority interrupts:
> +	A) Round robin: In this method, the chipset simply delivers lowest priority
> +interrupts in a round-robin manner across all the available logical CPUs. While
> +this provides good load balancing, this was not the best thing to do always as
> +interrupts from the same device (like NIC) will start running on all the CPUs
> +thrashing caches and taking locks. This led to the next scheme.
> +	B) Vector hashing: In this method, hardware would apply a hash function
> +on the vector value in the interrupt request, and use that hash to pick a logical
> +CPU to route the lowest priority interrupt. This way, a given vector always goes
> +to the same logical CPU, avoiding the thrashing problem above.
> +
> +So, gist of above is that, lowest priority interrupts has never been delivered as
> +"lowest priority" in physical hardware.
> +
> +Vector hashing is used in this design.

There appears to be a race condition here.  Consider the following setup.

VCPU A with PID A
VCPU B with PID B

Both VCPUs have Posted Interrupts set up and devices assigned.

VCPU A is running in non-root mode on CPU X, and VCPU B is on the
blocked list for CPU X.  i.e. Both PID A and B have the same ndst and nv.

An interrupt arrives to PID B, setting B.ON and sending a notification
vector for CPU X.

At this point, according to the spec, CPU X will collect interrupt
information from PID A and inject any vectors locally.

This this accurate?  If so, what causes a #VMEXIT to occur and for Xen
to service the blocked list on CPU X?

~Andrew

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design
  2015-10-12 10:41   ` Andrew Cooper
@ 2015-10-13  0:53     ` Wu, Feng
  0 siblings, 0 replies; 49+ messages in thread
From: Wu, Feng @ 2015-10-13  0:53 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Jan Beulich, Zhang, Yang Z,
	Keir Fraser



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Monday, October 12, 2015 6:42 PM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Tian, Kevin; Zhang, Yang Z; Jan Beulich; Keir Fraser; George Dunlap
> Subject: Re: [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design
> 
> On 12/10/15 09:54, Feng Wu wrote:
> > Add the design doc for VT-d PI.
> >
> > CC: Kevin Tian <kevin.tian@intel.com>
> > CC: Yang Zhang <yang.z.zhang@intel.com>
> > CC: Jan Beulich <jbeulich@suse.com>
> > CC: Keir Fraser <keir@xen.org>
> > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > CC: George Dunlap <george.dunlap@eu.citrix.com>
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  docs/misc/vtd-pi.txt | 332
> +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 332 insertions(+)
> >  create mode 100644 docs/misc/vtd-pi.txt
> >
> There appears to be a race condition here.  Consider the following setup.
> 
> VCPU A with PID A
> VCPU B with PID B
> 
> Both VCPUs have Posted Interrupts set up and devices assigned.
> 
> VCPU A is running in non-root mode on CPU X, and VCPU B is on the
> blocked list for CPU X.  i.e. Both PID A and B have the same ndst and nv.

vCPUA is running in non-root mode, so its nv is 'posted_intr_vector', and
vCPUS is blocked, so its nv is 'pi_wakeup_vector', the NV is different, so
the situation you describe cannot happen.

Thanks,
Feng

> 
> An interrupt arrives to PID B, setting B.ON and sending a notification
> vector for CPU X.
> 
> At this point, according to the spec, CPU X will collect interrupt
> information from PID A and inject any vectors locally.
> 
> This this accurate?  If so, what causes a #VMEXIT to occur and for Xen
> to service the blocked list on CPU X?
> 
> ~Andrew

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
  2015-10-12  8:54 ` [PATCH v8 02/17] Add cmpxchg16b support for x86-64 Feng Wu
@ 2015-10-13 15:29   ` Jan Beulich
  2015-10-14  5:57     ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2015-10-13 15:29 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 12.10.15 at 10:54, <feng.wu@intel.com> wrote:
> --- a/xen/include/asm-x86/x86_64/system.h
> +++ b/xen/include/asm-x86/x86_64/system.h
> @@ -6,6 +6,39 @@
>                                     (unsigned long)(n),sizeof(*(ptr))))
>  
>  /*
> + * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
> + * identical, store NEW in MEM.  Return the initial value in MEM.
> + * Success is indicated by comparing RETURN with OLD.
> + *
> + * This function can only be called when cpu_has_cx16 is true.
> + */
> +
> +static always_inline __uint128_t __cmpxchg16b(
> +    volatile void *ptr, const __uint128_t *old, const __uint128_t *new)
> +{
> +    __uint128_t prev;
> +    uint64_t new_high = *new >> 64;
> +    uint64_t new_low = *new;
> +
> +    ASSERT(cpu_has_cx16);
> +
> +    asm volatile ( "lock; cmpxchg16b %3"
> +                   : "=A" (prev)
> +                   : "c" (new_high), "b" (new_low),
> +                     "m" (*__xg(ptr)), "0" (*old) );

I was about to apply this when I spotted that this is still wrong:
The (requested) removal of the memory clobber requires that
the memory location (all 16 bytes of it) appear as output. With
__xg() covering 100 bytes, it should be possible to simply make
the operand an in- and output ("+m") instead of just an input.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
  2015-10-13 15:29   ` Jan Beulich
@ 2015-10-14  5:57     ` Wu, Feng
  2015-10-14  9:05       ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2015-10-14  5:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, Wu, Feng, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, October 13, 2015 11:29 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel@lists.xen.org;
> Keir Fraser <keir@xen.org>
> Subject: Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
> 
> >>> On 12.10.15 at 10:54, <feng.wu@intel.com> wrote:
> > --- a/xen/include/asm-x86/x86_64/system.h
> > +++ b/xen/include/asm-x86/x86_64/system.h
> > @@ -6,6 +6,39 @@
> >                                     (unsigned long)(n),sizeof(*(ptr))))
> >
> >  /*
> > + * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
> > + * identical, store NEW in MEM.  Return the initial value in MEM.
> > + * Success is indicated by comparing RETURN with OLD.
> > + *
> > + * This function can only be called when cpu_has_cx16 is true.
> > + */
> > +
> > +static always_inline __uint128_t __cmpxchg16b(
> > +    volatile void *ptr, const __uint128_t *old, const __uint128_t *new)
> > +{
> > +    __uint128_t prev;
> > +    uint64_t new_high = *new >> 64;
> > +    uint64_t new_low = *new;
> > +
> > +    ASSERT(cpu_has_cx16);
> > +
> > +    asm volatile ( "lock; cmpxchg16b %3"
> > +                   : "=A" (prev)
> > +                   : "c" (new_high), "b" (new_low),
> > +                     "m" (*__xg(ptr)), "0" (*old) );
> 
> I was about to apply this when I spotted that this is still wrong:
> The (requested) removal of the memory clobber requires that
> the memory location (all 16 bytes of it) appear as output. 

This is the description about memory clobber from gcc online docs:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers

"The "memory" clobber tells the compiler that the assembly code
performs memory reads or writes to items other than those listed
in the input and output operands"

My understanding to it is if the variable is listed as input of output,
we don't need to add memory clobber. We need to add it only
when it is not listed in the intput/output and the memory may be
changed behind us.

What is your understanding about it? Thanks a lot!

Thanks,
Feng

> With __xg() covering 100 bytes, it should be possible to simply make
> the operand an in- and output ("+m") instead of just an input.
> 
> Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
  2015-10-14  5:57     ` Wu, Feng
@ 2015-10-14  9:05       ` Jan Beulich
  2015-10-14  9:29         ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2015-10-14  9:05 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 14.10.15 at 07:57, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, October 13, 2015 11:29 PM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel@lists.xen.org;
>> Keir Fraser <keir@xen.org>
>> Subject: Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
>> 
>> >>> On 12.10.15 at 10:54, <feng.wu@intel.com> wrote:
>> > --- a/xen/include/asm-x86/x86_64/system.h
>> > +++ b/xen/include/asm-x86/x86_64/system.h
>> > @@ -6,6 +6,39 @@
>> >                                     (unsigned long)(n),sizeof(*(ptr))))
>> >
>> >  /*
>> > + * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
>> > + * identical, store NEW in MEM.  Return the initial value in MEM.
>> > + * Success is indicated by comparing RETURN with OLD.
>> > + *
>> > + * This function can only be called when cpu_has_cx16 is true.
>> > + */
>> > +
>> > +static always_inline __uint128_t __cmpxchg16b(
>> > +    volatile void *ptr, const __uint128_t *old, const __uint128_t *new)
>> > +{
>> > +    __uint128_t prev;
>> > +    uint64_t new_high = *new >> 64;
>> > +    uint64_t new_low = *new;
>> > +
>> > +    ASSERT(cpu_has_cx16);
>> > +
>> > +    asm volatile ( "lock; cmpxchg16b %3"
>> > +                   : "=A" (prev)
>> > +                   : "c" (new_high), "b" (new_low),
>> > +                     "m" (*__xg(ptr)), "0" (*old) );
>> 
>> I was about to apply this when I spotted that this is still wrong:
>> The (requested) removal of the memory clobber requires that
>> the memory location (all 16 bytes of it) appear as output. 
> 
> This is the description about memory clobber from gcc online docs:
> https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers 
> 
> "The "memory" clobber tells the compiler that the assembly code
> performs memory reads or writes to items other than those listed
> in the input and output operands"
> 
> My understanding to it is if the variable is listed as input of output,
> we don't need to add memory clobber. We need to add it only
> when it is not listed in the intput/output and the memory may be
> changed behind us.

Did you misunderstand my reply? I'm not asking the memory clobber
to be re-added (after all it was me who asked for it to be removed).
Instead I'm pointing out that with the removal of the clobber the
memory operand needs to become an input and output one, instead
of just an input (which would make the compiler believe the memory
location's contents don't change).

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
  2015-10-14  9:05       ` Jan Beulich
@ 2015-10-14  9:29         ` Wu, Feng
  2015-10-14  9:45           ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2015-10-14  9:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, Wu, Feng, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, October 14, 2015 5:06 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel@lists.xen.org;
> Keir Fraser <keir@xen.org>
> Subject: RE: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
> 
> >>> On 14.10.15 at 07:57, <feng.wu@intel.com> wrote:
> 
> >
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Tuesday, October 13, 2015 11:29 PM
> >> To: Wu, Feng <feng.wu@intel.com>
> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel@lists.xen.org;
> >> Keir Fraser <keir@xen.org>
> >> Subject: Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
> >>
> >> >>> On 12.10.15 at 10:54, <feng.wu@intel.com> wrote:
> >> > --- a/xen/include/asm-x86/x86_64/system.h
> >> > +++ b/xen/include/asm-x86/x86_64/system.h
> >> > @@ -6,6 +6,39 @@
> >> >                                     (unsigned long)(n),sizeof(*(ptr))))
> >> >
> >> >  /*
> >> > + * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
> >> > + * identical, store NEW in MEM.  Return the initial value in MEM.
> >> > + * Success is indicated by comparing RETURN with OLD.
> >> > + *
> >> > + * This function can only be called when cpu_has_cx16 is true.
> >> > + */
> >> > +
> >> > +static always_inline __uint128_t __cmpxchg16b(
> >> > +    volatile void *ptr, const __uint128_t *old, const __uint128_t *new)
> >> > +{
> >> > +    __uint128_t prev;
> >> > +    uint64_t new_high = *new >> 64;
> >> > +    uint64_t new_low = *new;
> >> > +
> >> > +    ASSERT(cpu_has_cx16);
> >> > +
> >> > +    asm volatile ( "lock; cmpxchg16b %3"
> >> > +                   : "=A" (prev)
> >> > +                   : "c" (new_high), "b" (new_low),
> >> > +                     "m" (*__xg(ptr)), "0" (*old) );
> >>
> >> I was about to apply this when I spotted that this is still wrong:
> >> The (requested) removal of the memory clobber requires that
> >> the memory location (all 16 bytes of it) appear as output.
> >
> > This is the description about memory clobber from gcc online docs:
> > https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers
> >
> > "The "memory" clobber tells the compiler that the assembly code
> > performs memory reads or writes to items other than those listed
> > in the input and output operands"
> >
> > My understanding to it is if the variable is listed as input of output,
> > we don't need to add memory clobber. We need to add it only
> > when it is not listed in the intput/output and the memory may be
> > changed behind us.
> 
> Did you misunderstand my reply? I'm not asking the memory clobber
> to be re-added (after all it was me who asked for it to be removed).
> Instead I'm pointing out that with the removal of the clobber the
> memory operand needs to become an input and output one, instead
> of just an input (which would make the compiler believe the memory
> location's contents don't change).

I understand your comments above. My understanding about the gcc
online doc is we don't need to add memory clobber when the variable
is either an input _or_ output one. However, seems your options is
memory clobber can be removed only when it is an input _and_ output
one. It would be highly appreciated if you can correct me if I understand
the doc incorrectly.

Thanks,
Feng 

> 
> Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
  2015-10-14  9:29         ` Wu, Feng
@ 2015-10-14  9:45           ` Jan Beulich
  2015-10-14 10:03             ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2015-10-14  9:45 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 14.10.15 at 11:29, <feng.wu@intel.com> wrote:
> I understand your comments above. My understanding about the gcc
> online doc is we don't need to add memory clobber when the variable
> is either an input _or_ output one. However, seems your options is
> memory clobber can be removed only when it is an input _and_ output
> one. It would be highly appreciated if you can correct me if I understand
> the doc incorrectly.

Yes, I'm pretty convinced you read it wrong: "performs memory
reads or writes to items other than those listed in the input and
output operands" means _any_ operation not expressed by the
ams operands. The two sentence following that actually explain
this in more detail. IOW - unless your operands correctly express
_everything_ the assembly instruction(s) do(es), you need a
clobber (and this is not limited to memory).

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
  2015-10-14  9:45           ` Jan Beulich
@ 2015-10-14 10:03             ` Wu, Feng
  2015-10-14 10:21               ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2015-10-14 10:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, Wu, Feng, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, October 14, 2015 5:46 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel@lists.xen.org;
> Keir Fraser <keir@xen.org>
> Subject: RE: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
> 
> >>> On 14.10.15 at 11:29, <feng.wu@intel.com> wrote:
> > I understand your comments above. My understanding about the gcc
> > online doc is we don't need to add memory clobber when the variable
> > is either an input _or_ output one. However, seems your options is
> > memory clobber can be removed only when it is an input _and_ output
> > one. It would be highly appreciated if you can correct me if I understand
> > the doc incorrectly.
> 
> Yes, I'm pretty convinced you read it wrong: "performs memory
> reads or writes to items other than those listed in the input and
> output operands" means _any_ operation not expressed by the
> ams operands. The two sentence following that actually explain
> this in more detail. IOW - unless your operands correctly express
> _everything_ the assembly instruction(s) do(es), you need a
> clobber (and this is not limited to memory).

Thanks for the clarification. So do you think it is better to send only
this single patch after fixing the issue in your comments, or wait
for the next version of this whole patch set?

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
  2015-10-14 10:03             ` Wu, Feng
@ 2015-10-14 10:21               ` Jan Beulich
  2015-10-14 10:25                 ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2015-10-14 10:21 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 14.10.15 at 12:03, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, October 14, 2015 5:46 PM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel@lists.xen.org;
>> Keir Fraser <keir@xen.org>
>> Subject: RE: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
>> 
>> >>> On 14.10.15 at 11:29, <feng.wu@intel.com> wrote:
>> > I understand your comments above. My understanding about the gcc
>> > online doc is we don't need to add memory clobber when the variable
>> > is either an input _or_ output one. However, seems your options is
>> > memory clobber can be removed only when it is an input _and_ output
>> > one. It would be highly appreciated if you can correct me if I understand
>> > the doc incorrectly.
>> 
>> Yes, I'm pretty convinced you read it wrong: "performs memory
>> reads or writes to items other than those listed in the input and
>> output operands" means _any_ operation not expressed by the
>> ams operands. The two sentence following that actually explain
>> this in more detail. IOW - unless your operands correctly express
>> _everything_ the assembly instruction(s) do(es), you need a
>> clobber (and this is not limited to memory).
> 
> Thanks for the clarification. So do you think it is better to send only
> this single patch after fixing the issue in your comments, or wait
> for the next version of this whole patch set?

I don't know - I didn't get around to look at the rest of the series yet.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 02/17] Add cmpxchg16b support for x86-64
  2015-10-14 10:21               ` Jan Beulich
@ 2015-10-14 10:25                 ` Wu, Feng
  0 siblings, 0 replies; 49+ messages in thread
From: Wu, Feng @ 2015-10-14 10:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, Wu, Feng, xen-devel

> >>
> >> >>> On 14.10.15 at 11:29, <feng.wu@intel.com> wrote:
> >> > I understand your comments above. My understanding about the gcc
> >> > online doc is we don't need to add memory clobber when the variable
> >> > is either an input _or_ output one. However, seems your options is
> >> > memory clobber can be removed only when it is an input _and_ output
> >> > one. It would be highly appreciated if you can correct me if I understand
> >> > the doc incorrectly.
> >>
> >> Yes, I'm pretty convinced you read it wrong: "performs memory
> >> reads or writes to items other than those listed in the input and
> >> output operands" means _any_ operation not expressed by the
> >> ams operands. The two sentence following that actually explain
> >> this in more detail. IOW - unless your operands correctly express
> >> _everything_ the assembly instruction(s) do(es), you need a
> >> clobber (and this is not limited to memory).
> >
> > Thanks for the clarification. So do you think it is better to send only
> > this single patch after fixing the issue in your comments, or wait
> > for the next version of this whole patch set?
> 
> I don't know - I didn't get around to look at the rest of the series yet.

No problem, then I will wait for your comments about other patches in
this series. 

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 00/17] Add VT-d Posted-Interrupts support
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (16 preceding siblings ...)
  2015-10-12  8:55 ` [PATCH v8 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
@ 2015-10-23  2:12 ` Wu, Feng
  2015-10-23  8:13   ` Jan Beulich
  2015-10-29 10:51 ` Dario Faggioli
  18 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2015-10-23  2:12 UTC (permalink / raw)
  To: xen-devel
  Cc: george.dunlap, dario.faggioli, Wu, Feng, Jan Beulich (JBeulich@suse.com)

Kindly ping ...

Thanks,
Feng

> -----Original Message-----
> From: Wu, Feng
> Sent: Monday, October 12, 2015 4:55 PM
> To: xen-devel@lists.xen.org
> Cc: Wu, Feng <feng.wu@intel.com>
> Subject: [PATCH v8 00/17] Add VT-d Posted-Interrupts support
> 
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> You can find the VT-d Posted-Interrtups Spec. in the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-
> technology/vt-directed-io-spec.html
> 
> Feng Wu (17):
>   VT-d Posted-intterrupt (PI) design
>   Add cmpxchg16b support for x86-64
>   iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
>   vt-d: VT-d Posted-Interrupts feature detection
>   vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
>   vmx: Add some helper functions for Posted-Interrupts
>   vmx: Initialize VT-d Posted-Interrupts Descriptor
>   vmx: Suppress posting interrupts when 'SN' is set
>   VT-d: Remove pointless casts
>   vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
>   vt-d: Add API to update IRTE when VT-d PI is used
>   x86: move some APIC related macros to apicdef.h
>   Update IRTE according to guest interrupt config changes
>   vmx: Properly handle notification event when vCPU is running
>   vmx: VT-d posted-interrupt core logic handling
>   VT-d: Dump the posted format IRTE
>   Add a command line parameter for VT-d posted-interrupts
> 
>  docs/misc/vtd-pi.txt                   | 332 +++++++++++++++++++++++++++++++
>  docs/misc/xen-command-line.markdown    |   9 +-
>  xen/arch/x86/domain.c                  |  12 ++
>  xen/arch/x86/hvm/hvm.c                 |  18 ++
>  xen/arch/x86/hvm/vlapic.c              |   5 -
>  xen/arch/x86/hvm/vmx/vmcs.c            |  24 +++
>  xen/arch/x86/hvm/vmx/vmx.c             | 348
> ++++++++++++++++++++++++++++++++-
>  xen/common/schedule.c                  |   9 +
>  xen/drivers/passthrough/io.c           | 123 +++++++++++-
>  xen/drivers/passthrough/iommu.c        |  16 +-
>  xen/drivers/passthrough/vtd/intremap.c | 212 +++++++++++++++-----
>  xen/drivers/passthrough/vtd/iommu.c    |  14 +-
>  xen/drivers/passthrough/vtd/iommu.h    |  51 +++--
>  xen/drivers/passthrough/vtd/utils.c    |  40 ++--
>  xen/include/asm-arm/domain.h           |   4 +
>  xen/include/asm-x86/apicdef.h          |   3 +
>  xen/include/asm-x86/domain.h           |   4 +
>  xen/include/asm-x86/hvm/hvm.h          |   4 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h     |  25 ++-
>  xen/include/asm-x86/hvm/vmx/vmx.h      |  27 +++
>  xen/include/asm-x86/iommu.h            |   2 +
>  xen/include/asm-x86/x86_64/system.h    |  33 ++++
>  xen/include/xen/iommu.h                |   2 +-
>  23 files changed, 1229 insertions(+), 88 deletions(-)
>  create mode 100644 docs/misc/vtd-pi.txt
> 
> --
> 2.1.0

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 00/17] Add VT-d Posted-Interrupts support
  2015-10-23  2:12 ` [PATCH v8 00/17] Add VT-d Posted-Interrupts support Wu, Feng
@ 2015-10-23  8:13   ` Jan Beulich
  2015-10-23  8:35     ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2015-10-23  8:13 UTC (permalink / raw)
  To: Feng Wu; +Cc: george.dunlap, Lars Kurth, dario.faggioli, Susie Li, xen-devel

>>> On 23.10.15 at 04:12, <feng.wu@intel.com> wrote:
> Kindly ping ...

I can understand that you want to get this to make progress, but as
said a number of times before (in variouw contexts) - as long as Intel
contributors in general and Intel maintainers in particular won't
become more active in reviewing patches, I don't think you can
expect the few of us to carry the load of reviewing all these large
series that have been submitted by Intel during the last couple of
months.

Thanks for your understanding; your series has not been forgotten,
but its (and other new feature) review just has to have lower priority
compared to fixing bugs, reviewing bug fixes, and other day to day
work.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 00/17] Add VT-d Posted-Interrupts support
  2015-10-23  8:13   ` Jan Beulich
@ 2015-10-23  8:35     ` Wu, Feng
  2015-10-23  8:46       ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2015-10-23  8:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Lars Kurth, Wu, Feng, george.dunlap, dario.faggioli, Li, Susie,
	xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, October 23, 2015 4:14 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: dario.faggioli@citrix.com; Lars Kurth <lars.kurth@citrix.com>;
> george.dunlap@eu.citrix.com; Li, Susie <susie.li@intel.com>; xen-
> devel@lists.xen.org
> Subject: RE: [PATCH v8 00/17] Add VT-d Posted-Interrupts support
> 
> >>> On 23.10.15 at 04:12, <feng.wu@intel.com> wrote:
> > Kindly ping ...
> 
> I can understand that you want to get this to make progress, but as
> said a number of times before (in variouw contexts) - as long as Intel
> contributors in general and Intel maintainers in particular won't
> become more active in reviewing patches, I don't think you can
> expect the few of us to carry the load of reviewing all these large
> series that have been submitted by Intel during the last couple of
> months.

Yes, this is a large series, but most of them have been reviewed, right?
And thanks to your review efforts on them. Now I think the remaining
review effort will need be put to patch
"[PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling",
which was understand discussion among you, George, Dario, and me
during the last couple of weeks, since the memory is still fresh, why
not continue to review it? And I think maybe it is more efficient to do
it now.

Thanks,
Feng

> 
> Thanks for your understanding; your series has not been forgotten,
> but its (and other new feature) review just has to have lower priority
> compared to fixing bugs, reviewing bug fixes, and other day to day
> work.
> 
> Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 00/17] Add VT-d Posted-Interrupts support
  2015-10-23  8:35     ` Wu, Feng
@ 2015-10-23  8:46       ` Dario Faggioli
  2015-10-23  8:52         ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2015-10-23  8:46 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich; +Cc: george.dunlap, Lars Kurth, Li, Susie, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2094 bytes --]

On Fri, 2015-10-23 at 08:35 +0000, Wu, Feng wrote:
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > > > > On 23.10.15 at 04:12, <feng.wu@intel.com> wrote:
> > > Kindly ping ...
> > 
> > I can understand that you want to get this to make progress, but as
> > said a number of times before (in variouw contexts) - as long as
> > Intel
> > contributors in general and Intel maintainers in particular won't
> > become more active in reviewing patches, I don't think you can
> > expect the few of us to carry the load of reviewing all these large
> > series that have been submitted by Intel during the last couple of
> > months.
> 
> Yes, this is a large series, but most of them have been reviewed,
> right?
> And thanks to your review efforts on them. Now I think the remaining
> review effort will need be put to patch
> "[PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling",
> which was understand discussion among you, George, Dario, and me
> during the last couple of weeks, since the memory is still fresh, why
> not continue to review it? 
>
I'll have a look ASAP, probably today.

However, "why not continue to review it", I think Jan put down quite
some sensible arguments for it. Also, as he said, there is no chance
that we _don't_ continue to review the series, it's just a matter of
priorities, which are influenced by a lot of factors.

> And I think maybe it is more efficient to do
> it now.
> 
Hey, everyone of us has his own workflow, and his own definitions of
what's "more efficient". :-)

Given the timing, I think it's after all fine that you "Kindly ping"
this series... Just don't get too far with that! :-D :-D

Anyway, and jokes apart, as said, I was already planning to have a look
at this soon.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 00/17] Add VT-d Posted-Interrupts support
  2015-10-23  8:46       ` Dario Faggioli
@ 2015-10-23  8:52         ` Wu, Feng
  0 siblings, 0 replies; 49+ messages in thread
From: Wu, Feng @ 2015-10-23  8:52 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: george.dunlap, Lars Kurth, Li, Susie, Wu, Feng, xen-devel



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Friday, October 23, 2015 4:46 PM
> To: Wu, Feng <feng.wu@intel.com>; Jan Beulich <JBeulich@suse.com>
> Cc: Lars Kurth <lars.kurth@citrix.com>; george.dunlap@eu.citrix.com; Li, Susie
> <susie.li@intel.com>; xen-devel@lists.xen.org
> Subject: Re: [PATCH v8 00/17] Add VT-d Posted-Interrupts support
> 
> On Fri, 2015-10-23 at 08:35 +0000, Wu, Feng wrote:
> >
> > > -----Original Message-----
> > > From: Jan Beulich [mailto:JBeulich@suse.com]
> > > > > > On 23.10.15 at 04:12, <feng.wu@intel.com> wrote:
> > > > Kindly ping ...
> > >
> Hey, everyone of us has his own workflow, and his own definitions of
> what's "more efficient". :-)
> 
> Given the timing, I think it's after all fine that you "Kindly ping"
> this series... Just don't get too far with that! :-D :-D
> 
> Anyway, and jokes apart, as said, I was already planning to have a look
> at this soon.
> 

Thanks a lot, great to hear that!:)

Thanks,
Feng

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-12  8:55 ` [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
@ 2015-10-26 14:39   ` Dario Faggioli
  2015-10-27  5:19     ` Wu, Feng
  2015-10-27 12:22   ` Dario Faggioli
  1 sibling, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2015-10-26 14:39 UTC (permalink / raw)
  To: Feng Wu, xen-devel
  Cc: George Dunlap, Andrew Cooper, Kevin Tian, Keir Fraser, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 7406 bytes --]

Hey,

Here I am reviewing this patch, sorry for the delay.

Ok, we have discussed a lot about all this, and in fact I had to go
back in my mail archive and re-read the rather long sub-thread for this
patch in v7. :-)

Also, in that thread, I found (as I was recalling there being) a couple of open questions, one even pointing to the possibility of adopting a different design for this part of the code, which I am not sure could have been considered a closed matter.

In any case, it would have been nice, given the situation, if you'd have put a few words about, e.g., which solution you ended up implementing and why, either in the cover or here (e.g., in the '---' area).

From the design point of view, I said during v7 that I don't dislike having some of the things that this feature requires dpme in (VMX specific part of) the context switch path, and that is still valid.

What I really don't like much is this blocking cancellation hook you have introduced.

I mean...

On Mon, 2015-10-12 at 16:55 +0800, Feng Wu wrote:
> - Add the following hooks, this part was suggested
>   by George Dunlap <george.dunlap@eu.citrix.com> and
>   Dario Faggioli <dario.faggioli@citrix.com>.
>     * arch_vcpu_block()
>       Called alled before vcpu is blocking and update the PID
>       (posted-interrupt descriptor).
> 
>     * arch_vcpu_block_cancel()
>       Called when interrupts come in during blocking.
> 
... This one.

Reason is, hooks are not, IMO, among the nicest things. You have to
remember to call them, you have to put the call to them in the proper
place, etc., when writing the code. OTOH, when reading the code, they
break the flow and force one to go and figure out what happens in
potentially not so related areas. In summary, they're hard to get
right. :-/

That being said, I can live with this, but I wonder whether we really
can't do without. For instance, Jan said in the v7 thread:

  "Couldn't this be taken care of by, if necessary, adjusting PI state
   in vmx_do_resume()?"

This is actually what started the sub-sub-thread about the alternative
design of doing everything during VMENTERs/VMEXITs. If you are
unconvinced about going that path all the way, would at least do the
fixup in there (i.e., taking care of the case where we called
arch_vcpu_block() but then we did not block) work and make sense?

Actually, I think even another possible implementation variant that was
suggested at some point (by George, in this case, for other reasons and
purposes) could make this adding this hook unnecessary, i.e.:

  "vcpu_block()
    set(_VPF_blocked)
    local_events_need_delivery()
      hvm_vcpu_has_pending_irq()
    ...
  context_switch
     v->arch.block()
      - Add v to pcpu.pi_blocked_vcpu
      - NV => pi_wakeup_vector

  If we do it [this] way, and an interrupt comes in before the context
  switch is finished, it will call posted_intr_vector.  We can, at that
  point, check to see if the current vcpu is marked as blocked.  If it 
  is, we can call vcpu_unblock() without having to modify NV or worry 
  about adding / removing the vcpu from the pi_blocked_vcpu list."

At the time, I "voted against" this design, because it seemed we could
manage to handle interrupt ('regular' and posted) happening during
blocking in one and unified way, and with _only_ arch_vcpu_block(). If
that is no longer the case (and it's not, as we're adding more hooks,
and the need to call the second is a special case being introduced by
PI), it may be worth reconsidering things...

So, all in all, I don't know. As said, I don't like this cancellation
hook because it's one more hook and because --while I see why it's
useful in this specific case-- I don't like having it in generic code
(in schedule.c), and even less having it called in two places
(vcpu_block() and do_pool()). However, if others (Jan and George, I
guess) are not equally concerned about it, I can live with it.

Thoughts?

>     * vmx_pi_switch_from()
>       Called before context switch, we update the PID when the
>       vCPU is preempted or going to sleep.
> 
>     * vmx_pi_switch_to()
>       Called after context switch, we update the PID when the vCPU
>       is going to run.
> 
>     * arch_vcpu_wake_prepare()
>       It will be called when waking up the vCPU, we update
>       the posted interrupt descriptor when the vCPU is
>       unblocked.
> 
The rest of the patch seems fine to me (at least the scheduling related
implications).

Just a few (pretty minor) comments.

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1608,6 +1608,18 @@ void context_switch(struct vcpu *prev, struct
> vcpu *next)
>      if ( (per_cpu(curr_vcpu, cpu) == next) ||
>           (is_idle_domain(nextd) && cpu_online(cpu)) )
>      {
> +        /*
> +         * When we handle the lazy context switch for the following
> +         * two scenarios:
> +         * - Preempted by a tasklet, which uses in an idle context
> +         * - the prev vcpu is in offline and no new available vcpus
> in run queue
> +         * We don't change the 'SN' bit in posted-interrupt
> descriptor, this
> +         * may incur spurious PI notification events, but since PI
> notification
> +         * event is only sent when 'ON' is clear, and once the PI
> notificatoin
> +         * is sent, ON is set by hardware, so not so many spurious
> events and
> +         * it is not a big deal.
> +         */
> +
>          local_irq_enable();
>      }
>
This comment: can't it leave somewhere else, more VMX and/or PI
related?

I know this is arch code already but, still, if I'm here, I am reading
and trying to understand how context switch works, potentially, not
being interested in PI at all... And yet I find this doc comment,
talking about some SN and ON bits, without even defining what they are
and what they mean. :-/

Really, I'm not saying we shouldn't have it. On the contrary, it has
some valuable content in it. Can we just find another place where to
put it?

Also, about the content. The last part, when it talks about spurious
interrupts, it says they're not a problem because we won't get that
many. I think that someone not very familiar with this things could use
being also told that it is ok/safe to get them (i.e., they don't get
lost, etc.). There's an email from George that explain this quite well.
I'd also be ok with this particular thing going in the patch changelog,
rather than in a comment, as far as it is somewhere.

> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 3eefed7..383fd62 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c

> @@ -800,10 +802,13 @@ void vcpu_block(void)
>  
>      set_bit(_VPF_blocked, &v->pause_flags);
>  
> +    arch_vcpu_block(v);
> +
>
This is maybe not so big of a deal but, since we call this pretty early
in the blocking path, and _especially_ if we are to keep the
cancellation hook, we may want to consider arch_vcpu_block_prepare()
(as we did for wake).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-26 14:39   ` Dario Faggioli
@ 2015-10-27  5:19     ` Wu, Feng
  2015-10-27  9:51       ` Jan Beulich
  2015-10-27 12:16       ` Dario Faggioli
  0 siblings, 2 replies; 49+ messages in thread
From: Wu, Feng @ 2015-10-27  5:19 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Andrew Cooper, Jan Beulich,
	Keir Fraser

[-- Attachment #1: Type: text/plain, Size: 9208 bytes --]



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Monday, October 26, 2015 10:40 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>; George
> Dunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Jan Beulich <jbeulich@suse.com>
> Subject: Re: [Xen-devel] [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic
> handling
> 
> Hey,
> 
> Here I am reviewing this patch, sorry for the delay.

Thanks a lot for the review!

> 
> Ok, we have discussed a lot about all this, and in fact I had to go
> back in my mail archive and re-read the rather long sub-thread for this
> patch in v7. :-)

Thanks a lot, and that's why I suggested to review this patch when the memory
is fresh, since there was a long discussion about this patch in the series.

> 
> Also, in that thread, I found (as I was recalling there being) a couple of open
> questions, one even pointing to the possibility of adopting a different design for
> this part of the code, which I am not sure could have been considered a closed
> matter.
> 
> In any case, it would have been nice, given the situation, if you'd have put a few
> words about, e.g., which solution you ended up implementing and why, either in
> the cover or here (e.g., in the '---' area).

Thanks for the suggestion. As I mentioned before, updating the PI descriptor needs
to be atomic, I think it should be a little expensive. So doing it every VM-exit seems
not a good idea to me.

> 
> From the design point of view, I said during v7 that I don't dislike having some of
> the things that this feature requires dpme in (VMX specific part of) the context
> switch path, and that is still valid.
> 
> What I really don't like much is this blocking cancellation hook you have
> introduced.
> 
> I mean...
> 
> On Mon, 2015-10-12 at 16:55 +0800, Feng Wu wrote:
> > - Add the following hooks, this part was suggested
> >   by George Dunlap <george.dunlap@eu.citrix.com> and
> >   Dario Faggioli <dario.faggioli@citrix.com>.
> >     * arch_vcpu_block()
> >       Called alled before vcpu is blocking and update the PID
> >       (posted-interrupt descriptor).
> >
> >     * arch_vcpu_block_cancel()
> >       Called when interrupts come in during blocking.
> >
> ... This one.
> 
> Reason is, hooks are not, IMO, among the nicest things. You have to
> remember to call them, you have to put the call to them in the proper
> place, etc., when writing the code. OTOH, when reading the code, they
> break the flow and force one to go and figure out what happens in
> potentially not so related areas. In summary, they're hard to get
> right. :-/
> 
> That being said, I can live with this, but I wonder whether we really
> can't do without. For instance, Jan said in the v7 thread:
> 
>   "Couldn't this be taken care of by, if necessary, adjusting PI state
>    in vmx_do_resume()?"
> 
> This is actually what started the sub-sub-thread about the alternative
> design of doing everything during VMENTERs/VMEXITs. If you are
> unconvinced about going that path all the way, would at least do the
> fixup in there (i.e., taking care of the case where we called
> arch_vcpu_block() but then we did not block) work and make sense?
> 
> Actually, I think even another possible implementation variant that was
> suggested at some point (by George, in this case, for other reasons and
> purposes) could make this adding this hook unnecessary, i.e.:
> 
>   "vcpu_block()
>     set(_VPF_blocked)
>     local_events_need_delivery()
>       hvm_vcpu_has_pending_irq()
>     ...
>   context_switch
>      v->arch.block()
>       - Add v to pcpu.pi_blocked_vcpu
>       - NV => pi_wakeup_vector
> 
>   If we do it [this] way, and an interrupt comes in before the context
>   switch is finished, it will call posted_intr_vector.  We can, at that
>   point, check to see if the current vcpu is marked as blocked.  If it
>   is, we can call vcpu_unblock() without having to modify NV or worry
>   about adding / removing the vcpu from the pi_blocked_vcpu list."

This is something similar with patch v7 and before, doing vcpu block
during context switch, and seems during the discussion, you guys
prefer doing the vcpu blocking things outside context switch.

> 
> At the time, I "voted against" this design, because it seemed we could
> manage to handle interrupt ('regular' and posted) happening during
> blocking in one and unified way, and with _only_ arch_vcpu_block(). If
> that is no longer the case (and it's not, as we're adding more hooks,
> and the need to call the second is a special case being introduced by
> PI), it may be worth reconsidering things...
> 
> So, all in all, I don't know. As said, I don't like this cancellation
> hook because it's one more hook and because --while I see why it's
> useful in this specific case-- I don't like having it in generic code
> (in schedule.c), and even less having it called in two places
> (vcpu_block() and do_pool()). However, if others (Jan and George, I
> guess) are not equally concerned about it, I can live with it.
> 
> Thoughts?

If I understand it correctly, this block cancel method was suggested
by George, please refer to the attached email. George, what is your
opinion about it? It is better to discuss a clear solution before I
continue to post another version. Thanks a lot!

> 
> >     * vmx_pi_switch_from()
> >       Called before context switch, we update the PID when the
> >       vCPU is preempted or going to sleep.
> >
> >     * vmx_pi_switch_to()
> >       Called after context switch, we update the PID when the vCPU
> >       is going to run.
> >
> >     * arch_vcpu_wake_prepare()
> >       It will be called when waking up the vCPU, we update
> >       the posted interrupt descriptor when the vCPU is
> >       unblocked.
> >
> The rest of the patch seems fine to me (at least the scheduling related
> implications).
> 
> Just a few (pretty minor) comments.
> 
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -1608,6 +1608,18 @@ void context_switch(struct vcpu *prev, struct
> > vcpu *next)
> >      if ( (per_cpu(curr_vcpu, cpu) == next) ||
> >           (is_idle_domain(nextd) && cpu_online(cpu)) )
> >      {
> > +        /*
> > +         * When we handle the lazy context switch for the following
> > +         * two scenarios:
> > +         * - Preempted by a tasklet, which uses in an idle context
> > +         * - the prev vcpu is in offline and no new available vcpus
> > in run queue
> > +         * We don't change the 'SN' bit in posted-interrupt
> > descriptor, this
> > +         * may incur spurious PI notification events, but since PI
> > notification
> > +         * event is only sent when 'ON' is clear, and once the PI
> > notificatoin
> > +         * is sent, ON is set by hardware, so not so many spurious
> > events and
> > +         * it is not a big deal.
> > +         */
> > +
> >          local_irq_enable();
> >      }
> >
> This comment: can't it leave somewhere else, more VMX and/or PI
> related?
> 
> I know this is arch code already but, still, if I'm here, I am reading
> and trying to understand how context switch works, potentially, not
> being interested in PI at all... And yet I find this doc comment,
> talking about some SN and ON bits, without even defining what they are
> and what they mean. :-/
> 
> Really, I'm not saying we shouldn't have it. On the contrary, it has
> some valuable content in it. Can we just find another place where to
> put it?
> 
> Also, about the content. The last part, when it talks about spurious
> interrupts, it says they're not a problem because we won't get that
> many. I think that someone not very familiar with this things could use
> being also told that it is ok/safe to get them (i.e., they don't get
> lost, etc.). There's an email from George that explain this quite well.
> I'd also be ok with this particular thing going in the patch changelog,
> rather than in a comment, as far as it is somewhere.

Okay, I will put the comments in the patch changelog.

> 
> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index 3eefed7..383fd62 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> 
> > @@ -800,10 +802,13 @@ void vcpu_block(void)
> >
> >      set_bit(_VPF_blocked, &v->pause_flags);
> >
> > +    arch_vcpu_block(v);
> > +
> >
> This is maybe not so big of a deal but, since we call this pretty early
> in the blocking path, and _especially_ if we are to keep the
> cancellation hook, we may want to consider arch_vcpu_block_prepare()
> (as we did for wake).

Sure!

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #2: Type: message/rfc822, Size: 9680 bytes --]

From: George Dunlap <george.dunlap@citrix.com>
To: "Wu, Feng" <feng.wu@intel.com>, Dario Faggioli <dario.faggioli@citrix.com>, George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>, "Tian, Kevin" <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>, Andrew Cooper <andrew.cooper3@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic handling
Date: Tue, 22 Sep 2015 10:26:41 +0000
Message-ID: <56012CE1.4090308@citrix.com>

On 09/22/2015 08:19 AM, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
>> Sent: Monday, September 21, 2015 10:25 PM
>> To: Wu, Feng; George Dunlap; George Dunlap
>> Cc: Jan Beulich; Tian, Kevin; Keir Fraser; Andrew Cooper;
>> xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] [PATCH v7 15/17] vmx: VT-d posted-interrupt core logic
>> handling
>>
>> On Mon, 2015-09-21 at 12:22 +0000, Wu, Feng wrote:
>>>
>>>> -----Original Message-----
>>>> From: George Dunlap [mailto:george.dunlap@citrix.com]
>>
>>>> You also need to check that local_events_need_delivery() will
>>>> return
>>>> "true" if you get an interrupt between that time and entering the
>>>> hypervisor.  Will that happen automatically from
>>>> hvm_local_events_need_delivery() -> hvm_vcpu_has_pending_irq() ->
>>>> vlapic_has_pending_irq()?  Or will you need to add a hook in
>>>> hvm_vcpu_has_pending_irq()?
>>>
>>> I think I don't need to add hook in hvm_vcpu_has_pending_irq(), what
>>> I need
>>> to do in vcpu_block() and do_poll() is as below:
>>>
>>> 1. set_bit(_VPF_blocked, &v->pause_flags);
>>>
>>> 2. ret = v->arch.arch_block(), in this hook, we can re-use the same
>>> logic in
>>> vmx_pre_ctx_switch_pi(), and check whether ON bit is set during
>>> updating
>>> posted-interrupt descriptor, can return 1 when ON is set
>>>
>> It think it would be ok for an hook to return a value (maybe, if doing
>> that, let's pick variable names and use comments to explain what goes
>> on as good as we can).
>>
>> I think I also see why you seem to prefer doing it that way, rather
>> than hacking local_events_need_delivery(), but can you please elaborate
>> on that? (My feeling is that you want to avoid having to update the
>> data structures in between _VPF_blocked and the check
>> local_events_need_delivery(), and then having to roll such update back
>> if local_events_need_delivery() ends up being false, is that the
>> case?).
>
> In the arch_block() hook, we actually need to
>       - Put vCPU to the blocking list
>       - Set the NV to wakeup vector
>       - Set NDST to the right pCPU
>       - Clear SN

Nit: We shouldn't need to actually clear SN here; SN should already be
clear because the vcpu should be currently running.  (I don't think
there's a way for a vcpu to go from runnable->blocked, is there?)  And
if it's just been running, then NDST should also already be the correct
pcpu.

> During we are updating the posted-interrupt descriptor, the VT-d
> hardware can issue notification event and hence ON is set. If that is the
> case, it is straightforward to return directly, and it doesn't make sense
> we continue to do the above operations since we don't need to actually.

But checking to see if any interrupts have come in in the middle of your
code just adds unnecessary complication.  We need to have the code in
place to un-do all the blocking steps in the case that
local_events_need_delivery() returns true anyway.

Additionally, while local_events_need_delivery() is only called from
do_block and do_poll, hvm_local_events_need_delivery() is called from a
number of other places, as is hvm_vcpu_has_pending_irq().  All those
places presumably also need to know whether there's a PI pending to work
properly.

>> Code would look better, IMO, if we manage to fold that somehow inside
>> local_events_need_delivery(), but that:
>
> As mentioned above, during updating the PI descriptor for blocking, we
> need to check whether ON is set, and return if it is set. This logic cannot
> be included in local_events_need_delivery(), since the update itself
> has not much relationship with local_events_need_delivery().
>
> However, I do find some issues with my proposal above, see below:
>
> 1. Set _VPF_blocked
> 2. ret = arch_block()
> 3. if ( ret || local_events_need_delivery() )
>       clear_bit(_VPF_blocked, &v->pause_flags);
>
> After step #2, if ret == false, that means, we successfully changed the PI
> descriptor in arch_block(), if local_events_need_delivery() returns true,
> _VPF_blocked is cleared. After that, external interrupt come in, hence
> pi_wakeup_interrupt() --> vcpu_unblock(), but _VPF_blocked is cleared,
> so vcpu_unblock() does nothing, so the vCPU's PI state is incorrect.
>
> One possible solution for it is:
> - Disable the interrupts before the check in step #3 above
> - if local_events_need_delivery() returns true, undo all the operations
>  done in arch_block()
> - Enable interrupts after _VPF_blocked gets cleared.

Well, yes, if as a result of checking to see that there are interrupts
you unblock a processor, then you need to do *all* the things related to
unblocking, including switching the interrupt vector and removing it
from the blocked list.

If we're going to put hooks here in do_block(), then the best thing to
do is as much as possible to make PI interrupts act exactly the same as
other interrupts; i.e.,

* Do a full transition to blocked (set _VPF_blocked, add vcpu to PI
list, switch vector to wake)
* Check to see if there are any pending interrupts (either event
channels, virtual interrupts, or PIs)
* If so, do a full transition to unblocked (clear _VPF_blocked, switch
vector to PI, remove vcpu from list).

We should be able to order the operations such that if interrupts come
in the middle nothing bad happens, without needing to actually disable
interrupts.

OTOH -- I think if we grab a lock during an interrupt, we're not allowed
to grab it with interrupts disabled, correct?  So we may end up having
to disable interrupts anyway.

 -George


[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-27  5:19     ` Wu, Feng
@ 2015-10-27  9:51       ` Jan Beulich
  2015-10-28  1:50         ` Dario Faggioli
  2015-10-28  2:58         ` Wu, Feng
  2015-10-27 12:16       ` Dario Faggioli
  1 sibling, 2 replies; 49+ messages in thread
From: Jan Beulich @ 2015-10-27  9:51 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, GeorgeDunlap, Andrew Cooper,
	Dario Faggioli, xen-devel

>>> On 27.10.15 at 06:19, <feng.wu@intel.com> wrote:
>> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
>> Sent: Monday, October 26, 2015 10:40 PM
>> In any case, it would have been nice, given the situation, if you'd have put a few
>> words about, e.g., which solution you ended up implementing and why, either in
>> the cover or here (e.g., in the '---' area).
> 
> Thanks for the suggestion. As I mentioned before, updating the PI descriptor 
> needs
> to be atomic, I think it should be a little expensive. So doing it every 
> VM-exit seems
> not a good idea to me.

You could check whether any update is needed before doing the
(atomic) cmpxchg16b.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-27  5:19     ` Wu, Feng
  2015-10-27  9:51       ` Jan Beulich
@ 2015-10-27 12:16       ` Dario Faggioli
  2015-10-28  2:40         ` Wu, Feng
  1 sibling, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2015-10-27 12:16 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: George Dunlap, Andrew Cooper, Tian, Kevin, Keir Fraser, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 3930 bytes --]

On Tue, 2015-10-27 at 05:19 +0000, Wu, Feng wrote:
> > -----Original Message-----
> > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > 

> This is something similar with patch v7 and before, doing vcpu block
> during context switch, and seems during the discussion, you guys
> prefer doing the vcpu blocking things outside context switch.
> 
I know, that's why I'm not 100% sure of the path to take (I think I
made that clear).

On one hand, I'm close to convince myself that it's "just" a rollback
of the blocking, which is something we do already, when we clear the
flags. On the other hand, it's two hooks, which is worse than one, IMO,
especially if one is a 'cancel' hook. :-(

> > 
> > At the time, I "voted against" this design, because it seemed we
> > could
> > manage to handle interrupt ('regular' and posted) happening during
> > blocking in one and unified way, and with _only_ arch_vcpu_block().
> > If
> > that is no longer the case (and it's not, as we're adding more
> > hooks,
> > and the need to call the second is a special case being introduced
> > by
> > PI), it may be worth reconsidering things...
> > 
> > So, all in all, I don't know. As said, I don't like this
> > cancellation
> > hook because it's one more hook and because --while I see why it's
> > useful in this specific case-- I don't like having it in generic
> > code
> > (in schedule.c), and even less having it called in two places
> > (vcpu_block() and do_pool()). However, if others (Jan and George, I
> > guess) are not equally concerned about it, I can live with it.
> > 
> If I understand it correctly, this block cancel method was suggested
> by George, please refer to the attached email. George, what is your
> opinion about it? It is better to discuss a clear solution before I
> continue to post another version. Thanks a lot!
> 
Sure.

Thanks for mentioning and attaching the email.

So, bear me with me a bit: do you mind explaining (possibly again, in
which case, sorry) why we need, for instance in vcpu_block(), to call
the hook as early as you're calling it and not later?

I mean, what's the problem with something like this:

void vcpu_block(void)
{
    struct vcpu *v = current;

    set_bit(_VPF_blocked, &v->pause_flags);

    /* Check for events /after/ blocking: avoids wakeup waiting race. */
    if ( local_events_need_delivery() )
    {   
        clear_bit(_VPF_blocked, &v->pause_flags);
    }
    else
    {
 -->    arch_vcpu_block(v);
        TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
        raise_softirq(SCHEDULE_SOFTIRQ);
    }
}

?

In fact, George said this in the mail you mention:
"We shouldn't need to actually clear SN [in the arch_block hook]; SN
should already be clear because the vcpu should be currently running.
And if it's just been running, then NDST should also already be the
correct pcpu."

And that seems correct to me. So, the difference looks to me to be
"only" the NV, and whether or not the vcpu will be in a blocked list
already. The latter, seems something we can easily compensate for (and
you're doing it already, AFAICT); the former, I'm not sure whether it
could be an issue or not.

What am I missing?

Note that this is "just" to understand and form an opinion. Sorry again
if what I asked have been analyzed already, but I don't remember
anything like that, and I'm not super-familiar with these interrupt
things. :-/
Still in that email, there is something about the possibility of having
to disable the interrupts. I guess that didn't end up to be necessary?

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-12  8:55 ` [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
  2015-10-26 14:39   ` Dario Faggioli
@ 2015-10-27 12:22   ` Dario Faggioli
  1 sibling, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2015-10-27 12:22 UTC (permalink / raw)
  To: Feng Wu, xen-devel
  Cc: George Dunlap, Andrew Cooper, Kevin Tian, Keir Fraser, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 3345 bytes --]

Oh, one more thing.

On Mon, 2015-10-12 at 16:55 +0800, Feng Wu wrote: 

> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index e448b31..cad70b4 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c

> +void vmx_vcpu_block_cancel(struct vcpu *v)
> +{
> +    unsigned long flags;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, flags);
> +
> +    if ( !test_bit(_VPF_blocked, &v->pause_flags) )
> +    {
> +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +        unsigned int pi_block_cpu;
> +
> +        /* the vCPU is not on any blocking list. */
> +        pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +        if ( pi_block_cpu == NR_CPUS )
> +            goto out;
> +
> +        /*
> +         * Set 'NV' field back to posted_intr_vector, so the
> +         * Posted-Interrupts can be delivered to the vCPU by
> +         * VT-d HW after it is scheduled to run.
> +         */
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +        spin_lock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
> +
> +        /*
> +         * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the
> vCPU was
> +         * removed from the blocking list while we are acquiring the
> lock.
> +         */
> +        if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> +        {
> +            spin_unlock(&per_cpu(pi_blocked_vcpu_lock,
> pi_block_cpu));
> +            goto out;
> +        }
> +
> +        list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +        v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +        spin_unlock(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu));
> +    }
> +
> +out:
> +    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, flags);
> +}
> +

> +void vmx_vcpu_wake_prepare(struct vcpu *v)
> +{
>
This function looks exactly identical to vmx_vcpu_block_cancel(), with
the only exception of...

> +    unsigned long flags;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, flags);
> +
> +    if ( !test_bit(_VPF_blocked, &v->pause_flags) )
> +    {
> +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +        unsigned int pi_block_cpu;
> +
> +        /* the vCPU is not on any blocking list. */
> +        pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +        if ( pi_block_cpu == NR_CPUS )
> +            goto out;
> +
> +        /*
> +         * We cannot set 'SN' here since we don't change 'SN' during
> lazy
> +         * context switch, if we set 'SN' here, we may end up in the
> situation
> +         * that the vCPU is running with 'SN' set.
> +         */
> +
>
... this comment (which is present here and not there). Is that the
case, or am I overlooking something?

In any case, there seems to be a lot of common code between the twos.

If we are to keep this design, I think common code should be factored.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-27  9:51       ` Jan Beulich
@ 2015-10-28  1:50         ` Dario Faggioli
  2015-10-28  2:58         ` Wu, Feng
  1 sibling, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2015-10-28  1:50 UTC (permalink / raw)
  To: Jan Beulich, Feng Wu
  Cc: GeorgeDunlap, Andrew Cooper, Kevin Tian, Keir Fraser, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1229 bytes --]

On Tue, 2015-10-27 at 03:51 -0600, Jan Beulich wrote:
> > > > On 27.10.15 at 06:19, <feng.wu@intel.com> wrote:
> > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > > Sent: Monday, October 26, 2015 10:40 PM
> > > In any case, it would have been nice, given the situation, if
> > > you'd have put a few
> > > words about, e.g., which solution you ended up implementing and
> > > why, either in
> > > the cover or here (e.g., in the '---' area).
> > 
> > Thanks for the suggestion. As I mentioned before, updating the PI
> > descriptor 
> > needs
> > to be atomic, I think it should be a little expensive. So doing it
> > every 
> > VM-exit seems
> > not a good idea to me.
> 
> You could check whether any update is needed before doing the
> (atomic) cmpxchg16b.
> 
Indeed! Do we have enough state, when in the VMX entry/exit handlers,
to check that? If yes, I really think this solution is worth at least a
try.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-27 12:16       ` Dario Faggioli
@ 2015-10-28  2:40         ` Wu, Feng
  0 siblings, 0 replies; 49+ messages in thread
From: Wu, Feng @ 2015-10-28  2:40 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Andrew Cooper, Jan Beulich,
	Keir Fraser



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Tuesday, October 27, 2015 8:17 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>; George
> Dunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Jan Beulich <jbeulich@suse.com>
> Subject: Re: [Xen-devel] [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic
> handling
> 
> On Tue, 2015-10-27 at 05:19 +0000, Wu, Feng wrote:
> > > -----Original Message-----
> > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > >
> 
> > This is something similar with patch v7 and before, doing vcpu block
> > during context switch, and seems during the discussion, you guys
> > prefer doing the vcpu blocking things outside context switch.
> >
> I know, that's why I'm not 100% sure of the path to take (I think I
> made that clear).
> 
> On one hand, I'm close to convince myself that it's "just" a rollback
> of the blocking, which is something we do already, when we clear the
> flags. On the other hand, it's two hooks, which is worse than one, IMO,
> especially if one is a 'cancel' hook. :-(
> 
> > >
> > > At the time, I "voted against" this design, because it seemed we
> > > could
> > > manage to handle interrupt ('regular' and posted) happening during
> > > blocking in one and unified way, and with _only_ arch_vcpu_block().
> > > If
> > > that is no longer the case (and it's not, as we're adding more
> > > hooks,
> > > and the need to call the second is a special case being introduced
> > > by
> > > PI), it may be worth reconsidering things...
> > >
> > > So, all in all, I don't know. As said, I don't like this
> > > cancellation
> > > hook because it's one more hook and because --while I see why it's
> > > useful in this specific case-- I don't like having it in generic
> > > code
> > > (in schedule.c), and even less having it called in two places
> > > (vcpu_block() and do_pool()). However, if others (Jan and George, I
> > > guess) are not equally concerned about it, I can live with it.
> > >
> > If I understand it correctly, this block cancel method was suggested
> > by George, please refer to the attached email. George, what is your
> > opinion about it? It is better to discuss a clear solution before I
> > continue to post another version. Thanks a lot!
> >
> Sure.
> 
> Thanks for mentioning and attaching the email.
> 
> So, bear me with me a bit: do you mind explaining (possibly again, in
> which case, sorry) why we need, for instance in vcpu_block(), to call
> the hook as early as you're calling it and not later?

No problem, it is my responsibility to explain this to your guys, so
you can give better review! :)

> 
> I mean, what's the problem with something like this:
> 
> void vcpu_block(void)
> {
>     struct vcpu *v = current;
> 
>     set_bit(_VPF_blocked, &v->pause_flags);
> 
>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
>     if ( local_events_need_delivery() )
>     {
>         clear_bit(_VPF_blocked, &v->pause_flags);
>     }
>     else
>     {
>  -->    arch_vcpu_block(v);
>         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
>         raise_softirq(SCHEDULE_SOFTIRQ);
>     }
> }
> 
> ?

Here is the story, in arch_vcpu_block(), we will change the status of
PI descriptor and put the vCPU in the pCPU blocked list. However,
during updating the posted-interrupt descriptor for blocked vCPU,
we need check whether 'ON' is set, if that is the case, we cannot
blocked the vCPU since an notification event comes in, so in patch
v7 and before, we do it this way (in v7, the following blocking related
code is running in context switch) :

+        v->arch.hvm_vmx.pi_block_cpu = v->processor;
+
+        spin_lock(&per_cpu(pi_blocked_vcpu_lock, v->arch.hvm_vmx.pi_block_cpu));
+        list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
+                      &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
+        spin_unlock(&per_cpu(pi_blocked_vcpu_lock,
+                    v->arch.hvm_vmx.pi_block_cpu));
+
+        do {
+            old.control = new.control = pi_desc->control;
+
+            /* Should not block the vCPU if an interrupt was posted for it. */
+            if ( pi_test_on(&old) )
+            {
+                spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, flags);
+                vcpu_unblock(v);
+                return;
+            }
+
+            /*
+             * Change the 'NDST' field to v->arch.hvm_vmx.pi_block_cpu,
+             * so when external interrupts from assigned deivces happen,
+             * wakeup notifiction event will go to
+             * v->arch.hvm_vmx.pi_block_cpu, then in pi_wakeup_interrupt()
+             * we can find the vCPU in the right list to wake up.
+             */
+            dest = cpu_physical_id(v->arch.hvm_vmx.pi_block_cpu);
+
+            if ( x2apic_enabled )
+                new.ndst = dest;
+            else
+                new.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);
+
+            pi_clear_sn(&new);
+            new.nv = pi_wakeup_vector;
+        } while ( cmpxchg(&pi_desc->control, old.control, new.control) !=
+                  old.control );
+    }

However, seems using vcpu_unblock(v) is not a good ideas in context
switch, so we put the above code in a arch_vcpu_block hook, and
George think it is not a good idea to check 'ON' during update the
PI status, and we need roll back after that if needed. So here comes
the current solution in v8: removing the 'ON' checking in vmx_vcpu_block()
and reuse local_events_need_delivery() in vcpu_block(), if it is true,
call arch_vcpu_block_cancel() to roll back the previous blocking
operations.

Back to your suggestion above, if we put the arch_vcpu_block() inside
the else part of vcpu_block(), we still need check if 'ON' is set during
updating PI status, if it is we also need to roll back the operations in
vmx_vcpu_block().

> 
> In fact, George said this in the mail you mention:
> "We shouldn't need to actually clear SN [in the arch_block hook]; SN
> should already be clear because the vcpu should be currently running.
> And if it's just been running, then NDST should also already be the
> correct pcpu."

Yes, I think this is correct to me too.

> 
> And that seems correct to me. So, the difference looks to me to be
> "only" the NV, and whether or not the vcpu will be in a blocked list
> already. The latter, seems something we can easily compensate for (and
> you're doing it already, AFAICT); the former, I'm not sure whether it
> could be an issue or not.
> 
> What am I missing?

Does above explanation make sense to you? If you have any concern,
feel free to ask me! :) 

> 
> Note that this is "just" to understand and form an opinion. Sorry again
> if what I asked have been analyzed already, but I don't remember
> anything like that, and I'm not super-familiar with these interrupt
> things. :-/
> Still in that email, there is something about the possibility of having
> to disable the interrupts. I guess that didn't end up to be necessary?

Yes, after more think, maybe we don't need to disable interrupts if
we handle things carefully. Let's forget this at this moment. :-/

Thanks,
Feng

> 
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-27  9:51       ` Jan Beulich
  2015-10-28  1:50         ` Dario Faggioli
@ 2015-10-28  2:58         ` Wu, Feng
  2015-10-28  9:03           ` Jan Beulich
  1 sibling, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2015-10-28  2:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, GeorgeDunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, October 27, 2015 5:52 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Dario Faggioli
> <dario.faggioli@citrix.com>; GeorgeDunlap <george.dunlap@eu.citrix.com>;
> Tian, Kevin <kevin.tian@intel.com>; xen-devel@lists.xen.org; Keir Fraser
> <keir@xen.org>
> Subject: RE: [Xen-devel] [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic
> handling
> 
> >>> On 27.10.15 at 06:19, <feng.wu@intel.com> wrote:
> >> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> >> Sent: Monday, October 26, 2015 10:40 PM
> >> In any case, it would have been nice, given the situation, if you'd have put a
> few
> >> words about, e.g., which solution you ended up implementing and why,
> either in
> >> the cover or here (e.g., in the '---' area).
> >
> > Thanks for the suggestion. As I mentioned before, updating the PI descriptor
> > needs
> > to be atomic, I think it should be a little expensive. So doing it every
> > VM-exit seems
> > not a good idea to me.
> 
> You could check whether any update is needed before doing the
> (atomic) cmpxchg16b.

I am not sure why you mentioned (atomic) cmpxchg16b here, for atomically
Update PI descriptor, we don't need use cmpxchg16b, right?

In my understanding, It seems there were two suggestions about this:

#1: Set 'SN' on each VM-Exit and clear 'SN' on each VM-Entry, so we don't
need to care about the status of 'SN' in root mode. This is what I mean "not
a good idea", since we add an atomic operation on each VM-Exit/VM-Entry.

#2: For the blocking cancel case, let's take the following code for example:

void vcpu_block(void)
{
    struct vcpu *v = current;

    set_bit(_VPF_blocked, &v->pause_flags);

    arch_vcpu_block(v);

    /* Check for events /after/ blocking: avoids wakeup waiting race. */
    if ( local_events_need_delivery() )
    {
        clear_bit(_VPF_blocked, &v->pause_flags);
        /* arch_vcpu_block_cancel(v); */ 
    }
    else
    {
        TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
        raise_softirq(SCHEDULE_SOFTIRQ);
    }
}

We still call arch_vcpu_block() in vcpu_block(), but we don't need to
call arch_vcpu_block_cancel(v) when local_events_need_delivery()
returns true. Then before VM-Entry, we can check if 'NV' field is
' pi_wakeup_vector', if it is, we change the PI status and remove
it from the blocking list.

Jan, if #2 is what you mean, I think it worth a try. If I don't understand
your comments correctly, could you please elaborate it more? Thanks
a lot!

Thanks,
Feng

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-28  2:58         ` Wu, Feng
@ 2015-10-28  9:03           ` Jan Beulich
  2015-10-28 16:36             ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2015-10-28  9:03 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, GeorgeDunlap, Andrew Cooper,
	Dario Faggioli, xen-devel

>>> On 28.10.15 at 03:58, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, October 27, 2015 5:52 PM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Dario Faggioli
>> <dario.faggioli@citrix.com>; GeorgeDunlap <george.dunlap@eu.citrix.com>;
>> Tian, Kevin <kevin.tian@intel.com>; xen-devel@lists.xen.org; Keir Fraser
>> <keir@xen.org>
>> Subject: RE: [Xen-devel] [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic
>> handling
>> 
>> >>> On 27.10.15 at 06:19, <feng.wu@intel.com> wrote:
>> >> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
>> >> Sent: Monday, October 26, 2015 10:40 PM
>> >> In any case, it would have been nice, given the situation, if you'd have 
> put a
>> few
>> >> words about, e.g., which solution you ended up implementing and why,
>> either in
>> >> the cover or here (e.g., in the '---' area).
>> >
>> > Thanks for the suggestion. As I mentioned before, updating the PI 
> descriptor
>> > needs
>> > to be atomic, I think it should be a little expensive. So doing it every
>> > VM-exit seems
>> > not a good idea to me.
>> 
>> You could check whether any update is needed before doing the
>> (atomic) cmpxchg16b.
> 
> I am not sure why you mentioned (atomic) cmpxchg16b here, for atomically
> Update PI descriptor, we don't need use cmpxchg16b, right?

To be honest I'm not sure which entity was the one needing
compxchg16b. Just take that comment in a more general form:
If there is an update to be done (by whatever means) that is
expensive, try avoiding the update by first checking whether
an update is actually needed. Just like we e.g. do for a couple
of MSR writes, where we keep the last written value in a per-
CPU variable.

> In my understanding, It seems there were two suggestions about this:
> 
> #1: Set 'SN' on each VM-Exit and clear 'SN' on each VM-Entry, so we don't
> need to care about the status of 'SN' in root mode. This is what I mean "not
> a good idea", since we add an atomic operation on each VM-Exit/VM-Entry.
> 
> #2: For the blocking cancel case, let's take the following code for example:
> 
> void vcpu_block(void)
> {
>     struct vcpu *v = current;
> 
>     set_bit(_VPF_blocked, &v->pause_flags);
> 
>     arch_vcpu_block(v);
> 
>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
>     if ( local_events_need_delivery() )
>     {
>         clear_bit(_VPF_blocked, &v->pause_flags);
>         /* arch_vcpu_block_cancel(v); */ 
>     }
>     else
>     {
>         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
>         raise_softirq(SCHEDULE_SOFTIRQ);
>     }
> }
> 
> We still call arch_vcpu_block() in vcpu_block(), but we don't need to
> call arch_vcpu_block_cancel(v) when local_events_need_delivery()
> returns true. Then before VM-Entry, we can check if 'NV' field is
> ' pi_wakeup_vector', if it is, we change the PI status and remove
> it from the blocking list.
> 
> Jan, if #2 is what you mean, I think it worth a try. If I don't understand
> your comments correctly, could you please elaborate it more? Thanks
> a lot!

Ideally we'd avoid both arch_vcpu_*() calls by doing what is needed
in arch code (e.g. the VM entry path). If only avoiding the cancel
hook is reasonably possible this way, then so be it - still better to
have just one hook here than having two.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-28  9:03           ` Jan Beulich
@ 2015-10-28 16:36             ` Dario Faggioli
  2015-10-29  5:39               ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2015-10-28 16:36 UTC (permalink / raw)
  To: Jan Beulich, Feng Wu
  Cc: GeorgeDunlap, Andrew Cooper, Kevin Tian, Keir Fraser, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1728 bytes --]

On Wed, 2015-10-28 at 03:03 -0600, Jan Beulich wrote:
> > > > On 28.10.15 at 03:58, <feng.wu@intel.com> wrote:

> > We still call arch_vcpu_block() in vcpu_block(), but we don't need
> > to
> > call arch_vcpu_block_cancel(v) when local_events_need_delivery()
> > returns true. Then before VM-Entry, we can check if 'NV' field is
> > ' pi_wakeup_vector', if it is, we change the PI status and remove
> > it from the blocking list.
> > 
> > Jan, if #2 is what you mean, I think it worth a try. If I don't
> > understand
> > your comments correctly, could you please elaborate it more? Thanks
> > a lot!
> 
> Ideally we'd avoid both arch_vcpu_*() calls by doing what is needed
> in arch code (e.g. the VM entry path). 
>
+1

> If only avoiding the cancel
> hook is reasonably possible this way, then so be it - still better to
> have just one hook here than having two.
> 
+1 again

As an aside, I've spoked with some ARM people, the context being that
they will get to implement a similar feature in the nearish future.

They said that, although they can't be sure, they don't think they'll
be interested in arch hooks in the scheduling code and that, actually,
having them risk being more harmful than helpful.

Of course, that does not mean that we shouldn't add them for the sake
of this patch, if we can't avoid doing that. But if we can avoid them,
that is perhaps one more reason for doing things that way.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-28 16:36             ` Dario Faggioli
@ 2015-10-29  5:39               ` Wu, Feng
  2015-10-29  9:26                 ` Dario Faggioli
  0 siblings, 1 reply; 49+ messages in thread
From: Wu, Feng @ 2015-10-29  5:39 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, GeorgeDunlap, Andrew Cooper, xen-devel,
	Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Thursday, October 29, 2015 12:37 AM
> To: Jan Beulich <JBeulich@suse.com>; Wu, Feng <feng.wu@intel.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>;
> GeorgeDunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic
> handling
> 
> On Wed, 2015-10-28 at 03:03 -0600, Jan Beulich wrote:
> > > > > On 28.10.15 at 03:58, <feng.wu@intel.com> wrote:
> 
> > > We still call arch_vcpu_block() in vcpu_block(), but we don't need
> > > to
> > > call arch_vcpu_block_cancel(v) when local_events_need_delivery()
> > > returns true. Then before VM-Entry, we can check if 'NV' field is
> > > ' pi_wakeup_vector', if it is, we change the PI status and remove
> > > it from the blocking list.
> > >
> > > Jan, if #2 is what you mean, I think it worth a try. If I don't
> > > understand
> > > your comments correctly, could you please elaborate it more? Thanks
> > > a lot!
> >
> > Ideally we'd avoid both arch_vcpu_*() calls by doing what is needed
> > in arch code (e.g. the VM entry path).
> >
> +1
> 
> > If only avoiding the cancel
> > hook is reasonably possible this way, then so be it - still better to
> > have just one hook here than having two.
> >
> +1 again
> 
> As an aside, I've spoked with some ARM people, the context being that
> they will get to implement a similar feature in the nearish future.
> 
> They said that, although they can't be sure, they don't think they'll
> be interested in arch hooks in the scheduling code and that, actually,
> having them risk being more harmful than helpful.
> 
> Of course, that does not mean that we shouldn't add them for the sake
> of this patch, if we can't avoid doing that. But if we can avoid them,
> that is perhaps one more reason for doing things that way.

Jan & Dario, thanks a lot for you guys' input. If you agree, I will remove
the arch_vcpu_block_cancel() and maybe arch_vcpu_wake_prepare()
as well, and implement the logic before VM-entry in V9, the patch will
be coming soon. Thanks again!

Thanks,
Feng 

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-29  5:39               ` Wu, Feng
@ 2015-10-29  9:26                 ` Dario Faggioli
  2015-10-29 14:07                   ` Wu, Feng
  0 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2015-10-29  9:26 UTC (permalink / raw)
  To: Wu, Feng, Jan Beulich
  Cc: GeorgeDunlap, Andrew Cooper, Tian, Kevin, Keir Fraser, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1107 bytes --]

On Thu, 2015-10-29 at 05:39 +0000, Wu, Feng wrote:
> 
> > -----Original Message-----
> > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]

> > Of course, that does not mean that we shouldn't add them for the
> > sake
> > of this patch, if we can't avoid doing that. But if we can avoid
> > them,
> > that is perhaps one more reason for doing things that way.
> 
> Jan & Dario, thanks a lot for you guys' input. 
>
Thanks to you for listening and replying. :-)

> If you agree, I will remove
> the arch_vcpu_block_cancel() and maybe arch_vcpu_wake_prepare()
> as well, and implement the logic before VM-entry in V9, the patch
> will
> be coming soon. 
>
Ok. Just to be sure I understand, you're plan is removing
arch_vcpu_block_cancel(), arch_vcpu_wake_prepare() and *leaving*
arch_vcpu_block()?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 00/17] Add VT-d Posted-Interrupts support
  2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (17 preceding siblings ...)
  2015-10-23  2:12 ` [PATCH v8 00/17] Add VT-d Posted-Interrupts support Wu, Feng
@ 2015-10-29 10:51 ` Dario Faggioli
  2015-10-29 14:05   ` Wu, Feng
  18 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2015-10-29 10:51 UTC (permalink / raw)
  To: Feng Wu, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1441 bytes --]

On Mon, 2015-10-12 at 16:54 +0800, Feng Wu wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted
> -Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
So, I (super-super-super-quickly) skimmed through the series, and saw
that some patches seems to have been Acked-by by the relevant
maintainer already, some have at least one Reviewed-by while some other
have nothing.

In this cases, given both the size of the series and the time this has
been outstanding, it would be useful to add, here in the cover letter,
a summary of some sort of what are the patches that still need the most
attention.

Reviewers and maintainers will double check things anyway, but
something like that is generally useful to assess the general status of
the series, and to have a quick idea on where to concentrate efforts.

On how to do it, there are many ways... This is how I've done it myself
a few patch series ago:

http://lists.xenproject.org/archives/html/xen-devel/2013-11/msg02513.html

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 00/17] Add VT-d Posted-Interrupts support
  2015-10-29 10:51 ` Dario Faggioli
@ 2015-10-29 14:05   ` Wu, Feng
  0 siblings, 0 replies; 49+ messages in thread
From: Wu, Feng @ 2015-10-29 14:05 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Thursday, October 29, 2015 6:52 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v8 00/17] Add VT-d Posted-Interrupts support
> 
> So, I (super-super-super-quickly) skimmed through the series, and saw
> that some patches seems to have been Acked-by by the relevant
> maintainer already, some have at least one Reviewed-by while some other
> have nothing.
> 
> In this cases, given both the size of the series and the time this has
> been outstanding, it would be useful to add, here in the cover letter,
> a summary of some sort of what are the patches that still need the most
> attention.
> 
> Reviewers and maintainers will double check things anyway, but
> something like that is generally useful to assess the general status of
> the series, and to have a quick idea on where to concentrate efforts.
> 
> On how to do it, there are many ways... This is how I've done it myself
> a few patch series ago:
> 
> http://lists.xenproject.org/archives/html/xen-devel/2013-11/msg02513.html

Dario, thanks a lot for your good suggestion. I will add this kind of information
in v9!

Thanks,
Feng

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling
  2015-10-29  9:26                 ` Dario Faggioli
@ 2015-10-29 14:07                   ` Wu, Feng
  0 siblings, 0 replies; 49+ messages in thread
From: Wu, Feng @ 2015-10-29 14:07 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, GeorgeDunlap, Andrew Cooper, xen-devel,
	Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Thursday, October 29, 2015 5:27 PM
> To: Wu, Feng <feng.wu@intel.com>; Jan Beulich <JBeulich@suse.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>;
> GeorgeDunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic
> handling
> 
> On Thu, 2015-10-29 at 05:39 +0000, Wu, Feng wrote:
> >
> > > -----Original Message-----
> > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> 
> > > Of course, that does not mean that we shouldn't add them for the
> > > sake
> > > of this patch, if we can't avoid doing that. But if we can avoid
> > > them,
> > > that is perhaps one more reason for doing things that way.
> >
> > Jan & Dario, thanks a lot for you guys' input.
> >
> Thanks to you for listening and replying. :-)
> 
> > If you agree, I will remove
> > the arch_vcpu_block_cancel() and maybe arch_vcpu_wake_prepare()
> > as well, and implement the logic before VM-entry in V9, the patch
> > will
> > be coming soon.
> >
> Ok. Just to be sure I understand, you're plan is removing
> arch_vcpu_block_cancel(), arch_vcpu_wake_prepare() and *leaving*
> arch_vcpu_block()?

Yes, that is my plan, It seems to me it is hard to remove arch_vcpu_block(),
or do you have any ideas?

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design
  2015-10-12  8:54 ` [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
  2015-10-12 10:41   ` Andrew Cooper
@ 2015-10-29 15:56   ` Dario Faggioli
  1 sibling, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2015-10-29 15:56 UTC (permalink / raw)
  To: Feng Wu, xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Jan Beulich, Yang Zhang


[-- Attachment #1.1: Type: text/plain, Size: 2225 bytes --]

On Mon, 2015-10-12 at 16:54 +0800, Feng Wu wrote:
> diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
> new file mode 100644
> index 0000000..af5409a
> --- /dev/null
> +++ b/docs/misc/vtd-pi.txt
> @@ -0,0 +1,332 @@
> +Authors: Feng Wu <feng.wu@intel.com>
> +
> +VT-d Posted-interrupt (PI) design for XEN
> +
And one thing about this too...

> +- Update posted-interrupt descriptor during vCPU scheduling
> +
> +The basic idea here is:
> +1. When vCPU's state is RUNSTATE_running,
> +        - Set 'NV' to 'posted_intr_vector'.
> +        - Clear 'SN' to accept posted-interrupts.
> +        - Set 'NDST' to the pCPU on which the vCPU will be running.
> +2. When vCPU's state is RUNSTATE_blocked,
> +        - Set 'NV' to ' pi_wakeup_vector ', so we can wake up the
> +          related vCPU when posted-interrupt happens for it.
> +          Please refer to the above section about the new global
> vector.
> +        - Clear 'SN' to accept posted-interrupts
> +3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> +        - Set 'SN' to suppress non-urgent interrupts
> +          (Currently, we only support non-urgent interrupts)
> +         When vCPU is in RUNSTATE_runnable or RUNSTATE_offline
> +         It is not needed to accept posted-interrupt notification
> event
> +         since we don't change the behavior of scheduler when the
> interrupt
> +         occurs, we still need wait for the next scheduling of the
> vCPU.
> +         When external interrupts from assigned devices occur, the
> interrupts
> +         are recorded in PIR, and will be synced to IRR before VM
> -Entry.
> +        - Set 'NV' to 'posted_intr_vector'.
> +
This still describe the old structure, based on RUNSTATEs, from which
we've moved away. Please, either make it more general, or update it so
that it reflects the actual implementation (hooks and VMX enrty/exit
handlers), or both. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2015-10-29 15:56 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-12  8:54 [PATCH v8 00/17] Add VT-d Posted-Interrupts support Feng Wu
2015-10-12  8:54 ` [PATCH v8 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
2015-10-12 10:41   ` Andrew Cooper
2015-10-13  0:53     ` Wu, Feng
2015-10-29 15:56   ` Dario Faggioli
2015-10-12  8:54 ` [PATCH v8 02/17] Add cmpxchg16b support for x86-64 Feng Wu
2015-10-13 15:29   ` Jan Beulich
2015-10-14  5:57     ` Wu, Feng
2015-10-14  9:05       ` Jan Beulich
2015-10-14  9:29         ` Wu, Feng
2015-10-14  9:45           ` Jan Beulich
2015-10-14 10:03             ` Wu, Feng
2015-10-14 10:21               ` Jan Beulich
2015-10-14 10:25                 ` Wu, Feng
2015-10-12  8:54 ` [PATCH v8 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
2015-10-12  8:54 ` [PATCH v8 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
2015-10-12  8:54 ` [PATCH v8 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
2015-10-12  8:54 ` [PATCH v8 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
2015-10-12  8:54 ` [PATCH v8 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
2015-10-12  8:54 ` [PATCH v8 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
2015-10-12  8:54 ` [PATCH v8 09/17] VT-d: Remove pointless casts Feng Wu
2015-10-12  8:54 ` [PATCH v8 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
2015-10-12  8:54 ` [PATCH v8 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
2015-10-12  8:54 ` [PATCH v8 12/17] x86: move some APIC related macros to apicdef.h Feng Wu
2015-10-12  8:54 ` [PATCH v8 13/17] Update IRTE according to guest interrupt config changes Feng Wu
2015-10-12  8:55 ` [PATCH v8 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
2015-10-12  8:55 ` [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic handling Feng Wu
2015-10-26 14:39   ` Dario Faggioli
2015-10-27  5:19     ` Wu, Feng
2015-10-27  9:51       ` Jan Beulich
2015-10-28  1:50         ` Dario Faggioli
2015-10-28  2:58         ` Wu, Feng
2015-10-28  9:03           ` Jan Beulich
2015-10-28 16:36             ` Dario Faggioli
2015-10-29  5:39               ` Wu, Feng
2015-10-29  9:26                 ` Dario Faggioli
2015-10-29 14:07                   ` Wu, Feng
2015-10-27 12:16       ` Dario Faggioli
2015-10-28  2:40         ` Wu, Feng
2015-10-27 12:22   ` Dario Faggioli
2015-10-12  8:55 ` [PATCH v8 16/17] VT-d: Dump the posted format IRTE Feng Wu
2015-10-12  8:55 ` [PATCH v8 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
2015-10-23  2:12 ` [PATCH v8 00/17] Add VT-d Posted-Interrupts support Wu, Feng
2015-10-23  8:13   ` Jan Beulich
2015-10-23  8:35     ` Wu, Feng
2015-10-23  8:46       ` Dario Faggioli
2015-10-23  8:52         ` Wu, Feng
2015-10-29 10:51 ` Dario Faggioli
2015-10-29 14:05   ` Wu, Feng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.