xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/17] Add VT-d Posted-Interrupts support
@ 2015-08-12  2:35 Feng Wu
  2015-08-12  2:35 ` [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
                   ` (16 more replies)
  0 siblings, 17 replies; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

You can find the VT-d Posted-Interrtups Spec. in the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

Feng Wu (17):
  VT-d Posted-intterrupt (PI) design
  Add cmpxchg16b support for x86-64
  iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  vt-d: VT-d Posted-Interrupts feature detection
  vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  vmx: Add some helper functions for Posted-Interrupts
  vmx: Initialize VT-d Posted-Interrupts Descriptor
  vmx: Suppress posting interrupts when 'SN' is set
  VT-d: Remove pointless casts
  vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  vt-d: Add API to update IRTE when VT-d PI is used
  Update IRTE according to guest interrupt config changes
  vmx: posted-interrupt handling when vCPU is blocked
  vmx: Properly handle notification event when vCPU is running
  vmx: Add some scheduler hooks for VT-d posted interrupts
  VT-d: Dump the posted format IRTE
  Add a command line parameter for VT-d posted-interrupts

 docs/misc/vtd-pi.txt                   | 333 +++++++++++++++++++++++++++++++++
 docs/misc/xen-command-line.markdown    |   9 +-
 xen/arch/x86/domain.c                  |  11 ++
 xen/arch/x86/hvm/vmx/vmcs.c            |  21 +++
 xen/arch/x86/hvm/vmx/vmx.c             | 286 +++++++++++++++++++++++++++-
 xen/common/schedule.c                  |   2 +
 xen/drivers/passthrough/io.c           | 124 +++++++++++-
 xen/drivers/passthrough/iommu.c        |  16 +-
 xen/drivers/passthrough/vtd/intremap.c | 204 +++++++++++++++-----
 xen/drivers/passthrough/vtd/iommu.c    |  17 +-
 xen/drivers/passthrough/vtd/iommu.h    |  46 +++--
 xen/drivers/passthrough/vtd/utils.c    |  57 +++++-
 xen/include/asm-arm/domain.h           |   2 +
 xen/include/asm-x86/domain.h           |   3 +
 xen/include/asm-x86/hvm/hvm.h          |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h     |  26 ++-
 xen/include/asm-x86/hvm/vmx/vmx.h      |  28 +++
 xen/include/asm-x86/iommu.h            |   2 +
 xen/include/asm-x86/x86_64/system.h    |  28 +++
 xen/include/xen/iommu.h                |   2 +-
 20 files changed, 1140 insertions(+), 79 deletions(-)
 create mode 100644 docs/misc/vtd-pi.txt

-- 
2.1.0

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 15:35   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 02/17] Add cmpxchg16b support for x86-64 Feng Wu
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Jan Beulich, Yang Zhang, Feng Wu

Add the design doc for VT-d PI.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Keir Fraser <keir@xen.org>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 docs/misc/vtd-pi.txt | 333 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 333 insertions(+)
 create mode 100644 docs/misc/vtd-pi.txt

diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
new file mode 100644
index 0000000..98a77ba
--- /dev/null
+++ b/docs/misc/vtd-pi.txt
@@ -0,0 +1,333 @@
+Authors: Feng Wu <feng.wu@intel.com>
+
+VT-d Posted-interrupt (PI) design for XEN
+
+Background
+==========
+With the development of virtualization, there are more and more device
+assignment requirements. However, today when a VM is running with
+assigned devices (such as, NIC), external interrupt handling for the assigned
+devices always needs VMM intervention.
+
+VT-d Posted-interrupt is a more enhanced method to handle interrupts
+in the virtualization environment. Interrupt posting is the process by
+which an interrupt request is recorded in a memory-resident
+posted-interrupt-descriptor structure by the root-complex, followed by
+an optional notification event issued to the CPU complex.
+
+With VT-d Posted-interrupt we can get the following advantages:
+- Direct delivery of external interrupts to running vCPUs without VMM
+intervention
+- Decrease the interrupt migration complexity. On vCPU migration, software
+can atomically co-migrate all interrupts targeting the migrating vCPU. For
+virtual machines with assigned devices, migrating a vCPU across pCPUs
+either incur the overhead of forwarding interrupts in software (e.g. via VMM
+generated IPIS), or complexity to independently migrate each interrupt targeting
+the vCPU to the new pCPU. However, after enabling VT-d PI, the destination vCPU
+of an external interrupt from assigned devices is stored in the IRTE (i.e.
+Posted-interrupt Descriptor Address), when vCPU is migrated to another pCPU,
+we will set this new pCPU in the 'NDST' filed of Posted-interrupt descriptor, this
+make the interrupt migration automatic.
+
+Here is what Xen currently does for external interrupts from assigned devices:
+
+When a VM is running and an external interrupt from an assigned device occurs
+for it. VM-EXIT happens, then:
+
+vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci() -->
+raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
+
+softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
+
+dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() --> vmsi_deliver() -->
+vmsi_inj_irq() --> vlapic_set_irq()
+
+vlapic_set_irq() does the following things:
+1. If CPU-side posted-interrupt is supported, call vmx_deliver_posted_intr() to deliver
+the virtual interrupt via posted-interrupt infrastructure.
+2. Else if CPU-side posted-interrupt is not supported, set the related vIRR in vLAPIC
+page and call vcpu_kick() to kick the related vCPU. Before VM-Entry, vmx_intr_assist()
+will help to inject the interrupt to guests.
+
+However, after VT-d PI is supported, when a guest is running in non-root and an
+external interrupt from an assigned device occurs for it. No VM-Exit is needed,
+the guest can handle this totally in non-root mode, thus avoiding all the above
+code flow.
+
+Posted-interrupt Introduction
+========================
+There are two components to the Posted-interrupt architecture:
+Processor Support and Root-Complex Support
+
+- Processor Support
+Posted-interrupt processing is a feature by which a processor processes
+the virtual interrupts by recording them as pending on the virtual-APIC
+page.
+
+Posted-interrupt processing is enabled by setting the process posted
+interrupts VM-execution control. The processing is performed in response
+to the arrival of an interrupt with the posted-interrupt notification vector.
+In response to such an interrupt, the processor processes virtual interrupts
+recorded in a data structure called a posted-interrupt descriptor.
+
+More information about APICv and CPU-side Posted-interrupt, please refer
+to Chapter 29, and Section 29.6 in the Intel SDM:
+http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
+
+- Root-Complex Support
+Interrupt posting is the process by which an interrupt request (from IOAPIC
+or MSI/MSIx capable sources) is recorded in a memory-resident
+posted-interrupt-descriptor structure by the root-complex, followed by
+an optional notification event issued to the CPU complex. The interrupt
+request arriving at the root-complex carry the identity of the interrupt
+request source and a 'remapping-index'. The remapping-index is used to
+look-up an entry from the memory-resident interrupt-remap-table. Unlike
+with interrupt-remapping, the interrupt-remap-table-entry for a posted-
+interrupt, specifies a virtual-vector and a pointer to the posted-interrupt
+descriptor. The virtual-vector specifies the vector of the interrupt to be
+recorded in the posted-interrupt descriptor. The posted-interrupt descriptor
+hosts storage for the virtual-vectors and contains the attributes of the
+notification event (interrupt) to be issued to the CPU complex to inform
+CPU/software about pending interrupts recorded in the posted-interrupt
+descriptor.
+
+More information about VT-d PI, please refer to
+http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
+
+Important Definitions
+==================
+There are some changes to IRTE and posted-interrupt descriptor after
+VT-d PI is introduced:
+IRTE: Interrupt Remapping Table Entry
+Posted-interrupt Descriptor Address: the address of the posted-interrupt descriptor
+Virtual Vector: the guest vector of the interrupt
+URG: indicates if the interrupt is urgent
+
+Posted-interrupt descriptor:
+The Posted Interrupt Descriptor hosts the following fields:
+Posted Interrupt Request (PIR): Provide storage for posting (recording) interrupts (one bit
+per vector, for up to 256 vectors).
+
+Outstanding Notification (ON): Indicate if there is a notification event outstanding (not
+processed by processor or software) for this Posted Interrupt Descriptor. When this field is 0,
+hardware modifies it from 0 to 1 when generating a notification event, and the entity receiving
+the notification event (processor or software) resets it as part of posted interrupt processing.
+
+Suppress Notification (SN): Indicate if a notification event is to be suppressed (not
+generated) for non-urgent interrupt requests (interrupts processed through an IRTE with
+URG=0).
+
+Notification Vector (NV): Specify the vector for notification event (interrupt).
+
+Notification Destination (NDST): Specify the physical APIC-ID of the destination logical
+processor for the notification event.
+
+Design Overview
+==============
+In this design, we will cover the following items:
+1. Add a variable to control whether enable VT-d posted-interrupt or not.
+2. VT-d PI feature detection.
+3. Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.
+4. Extend IRTE structure to support VT-d PI.
+5. Introduce a new global vector which is used for waking up the blocked vCPU.
+6. Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
+7. Update posted-interrupt descriptor during vCPU scheduling (when the state
+of the vCPU is transmitted among RUNSTATE_running / RUNSTATE_blocked/
+RUNSTATE_runnable / RUNSTATE_offline).
+8. How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
+9. New boot command line for Xen, which controls VT-d PI feature by user.
+10. Multicast/broadcast and lowest priority interrupts consideration.
+
+
+Implementation details
+===================
+- New variable to control VT-d PI
+
+Like variable 'iommu_intremap' for interrupt remapping, it is very straightforward
+to add a new one 'iommu_intpost' for posted-interrupt. 'iommu_intpost' is set
+only when interrupt remapping and VT-d posted-interrupt are both enabled.
+
+- VT-d PI feature detection.
+Bit 59 in VT-d Capability Register is used to report VT-d Posted-interrupt support.
+
+- Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.
+Here is the new structure for posted-interrupt descriptor:
+
+struct pi_desc {
+    DECLARE_BITMAP(pir, NR_VECTORS);
+    union {
+        struct
+        {
+        u16 on     : 1,  /* bit 256 - Outstanding Notification */
+            sn     : 1,  /* bit 257 - Suppress Notification */
+            rsvd_1 : 14; /* bit 271:258 - Reserved */
+        u8  nv;          /* bit 279:272 - Notification Vector */
+        u8  rsvd_2;      /* bit 287:280 - Reserved */
+        u32 ndst;        /* bit 319:288 - Notification Destination */
+        };
+        u64 control;
+    };
+    u32 rsvd[6];
+} __attribute__ ((aligned (64)));
+
+- Extend IRTE structure to support VT-d PI.
+
+Here is the new structure for IRTE:
+/* interrupt remap entry */
+struct iremap_entry {
+  union {
+    struct { u64 lo, hi; };
+    struct {
+        u16 p       : 1,
+            fpd     : 1,
+            dm      : 1,
+            rh      : 1,
+            tm      : 1,
+            dlm     : 3,
+            avail   : 4,
+            res_1   : 4;
+        u8  vector;
+        u8  res_2;
+        u32 dst;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_3   : 12;
+        u32 res_4   : 32;
+    } remap;
+    struct {
+        u16 p       : 1,
+            fpd     : 1,
+            res_1   : 6,
+            avail   : 4,
+            res_2   : 2,
+            urg     : 1,
+            im      : 1;
+        u8  vector;
+        u8  res_3;
+        u32 res_4   : 6,
+            pda_l   : 26;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_5   : 12;
+        u32 pda_h;
+    } post;
+  };
+};
+
+- Introduce a new global vector which is used to wake up the blocked vCPU.
+
+Currently, there is a global vector 'posted_intr_vector', which is used as the
+global notification vector for all vCPUs in the system. This vector is stored in
+VMCS and CPU considers it as a _special_ vector, uses it to notify the related
+pCPU when an interrupt is recorded in the posted-interrupt descriptor.
+
+This existing global vector is a _special_ vector to CPU, CPU handle it in a
+_special_ way compared to normal vectors, please refer to 29.6 in Intel SDM
+http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
+for more information about how CPU handles it.
+
+After having VT-d PI, VT-d engine can issue notification event when the
+assigned devices issue interrupts. We need add a new global vector to
+wakeup the blocked vCPU, please refer to later section in this design for
+how to use this new global vector.
+
+- Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
+After VT-d PI is introduced, the format of IRTE is changed as follows:
+	Descriptor Address: the address of the posted-interrupt descriptor
+	Virtual Vector: the guest vector of the interrupt
+	URG: indicates if the interrupt is urgent
+	Other fields continue to have the same meaning
+
+'Descriptor Address' tells the destination vCPU of this interrupt, since
+each vCPU has a dedicated posted-interrupt descriptor.
+
+'Virtual Vector' tells the guest vector of the interrupt.
+
+When guest changes the configuration of the interrupts, such as, the
+cpu affinity, or the vector, we need to update the associated IRTE accordingly.
+
+- Update posted-interrupt descriptor during vCPU scheduling
+
+The basic idea here is:
+1. When vCPU's state is RUNSTATE_running,
+        - Set 'NV' to 'posted_intr_vector'.
+        - Clear 'SN' to accept posted-interrupts.
+        - Set 'NDST' to the pCPU on which the vCPU will be running.
+2. When vCPU's state is RUNSTATE_blocked,
+        - Set 'NV' to ' pi_wakeup_vector ', so we can wake up the
+          related vCPU when posted-interrupt happens for it.
+          Please refer to the above section about the new global vector.
+        - Clear 'SN' to accept posted-interrupts
+3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
+        - Set 'SN' to suppress non-urgent interrupts
+          (Current, we only support non-urgent interrupts)
+         When vCPU is in RUNSTATE_runnable or RUNSTATE_offline,
+         It is not needed to accept posted-interrupt notification event,
+         since we don't change the behavior of scheduler when the interrupt
+         occurs, we still need wait the next scheduling of the vCPU.
+         When external interrupts from assigned devices occur, the interrupts
+         are recorded in PIR, and will be synced to IRR before VM-Entry.
+        - Set 'NV' to 'posted_intr_vector'.
+
+- How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
+
+Here is the scenario for the usage of the new global vector:
+
+1. vCPU0 is running on pCPU0
+2. vCPU0 is blocked and vCPU1 is currently running on pCPU0
+3. An external interrupt from an assigned device occurs for vCPU0, if we
+still use 'posted_intr_vector' as the notification vector for vCPU0, the
+notification event for vCPU0 (the event will go to pCPU1) will be consumed
+by vCPU1 incorrectly (remember this is a special vector to CPU). The worst
+case is that vCPU0 will never be woken up again since the wakeup event
+for it is always consumed by other vCPUs incorrectly. So we need introduce
+another global vector, naming 'pi_wakeup_vector' to wake up the blocked vCPU.
+
+After using 'pi_wakeup_vector' for vCPU0, VT-d engine will issue notification
+event using this new vector. Since this new vector is not a SPECIAL one to CPU,
+it is just a normal vector. To cpu, it just receives an normal external interrupt,
+then we can get control in the handler of this new vector. In this case, hypervisor
+can do something in it, such as wakeup the blocked vCPU.
+
+Here are what we do for the blocked vCPU:
+1. Define a per-cpu list 'pi_blocked_vcpu', which stored the blocked
+vCPU on the pCPU.
+2. When the vCPU's state is changed to RUNSTATE_blocked, insert the vCPU
+to the per-cpu list belonging to the pCPU it was running.
+3. When the vCPU is unblocked, remove the vCPU from the related pCPU list.
+
+In the handler of 'pi_wakeup_vector', we do:
+1. Get the physical CPU.
+2. Iterate the list 'pi_blocked_vcpu' of the current pCPU, if 'ON' is set,
+we unblock the associated vCPU.
+
+- New boot command line for Xen, which controls VT-d PI feature by user.
+
+Like 'intremap' for interrupt remapping, we add a new boot command line
+'intpost' for posted-interrupts.
+
+- Multicast/broadcast and lowest priority interrupts consideration.
+
+With VT-d PI, the destination vCPU information of an external interrupt
+from assigned devices is stored in IRTE, this makes the following
+consideration of the design:
+1. Multicast/broadcast interrupts cannot be posted.
+2. For lowest-priority interrupts, new Intel CPU/Chipset/root-complex
+(starting from Nehalem) ignore TPR value, and instead supported two other
+ways (configurable by BIOS) on how the handle lowest priority interrupts:
+	A) Round robin: In this method, the chipset simply delivers lowest priority
+interrupts in a round-robin manner across all the available logical CPUs. While
+this provides good load balancing, this was not the best thing to do always as
+interrupts from the same device (like NIC) will start running on all the CPUs
+thrashing caches and taking locks. This led to the next scheme.
+	B) Vector hashing: In this method, hardware would apply a hash function
+on the vector value in the interrupt request, and use that hash to pick a logical
+CPU to route the lowest priority interrupt. This way, a given vector always goes
+to the same logical CPU, avoiding the thrashing problem above.
+
+So, gist of above is that, lowest priority interrupts has never been delivered as
+"lowest priority" in physical hardware.
+
+Vector hashing is used in this design.
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 02/17] Add cmpxchg16b support for x86-64
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
  2015-08-12  2:35 ` [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 15:38   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Feng Wu, Jan Beulich, Andrew Cooper

This patch adds cmpxchg16b support for x86-64, so software
can perform 128-bit atomic write/read.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v5:
- Change back the parameters of __cmpxchg16b() to __uint128_t *
- Remove pointless cast for 'ptr'
- Remove pointless parentheses
- Use A constraint for the output

v4:
- Use pointer as the parameter of __cmpxchg16b().
- Use gcc's __uint128_t built-in type
- Make the parameters of __cmpxchg16b() void *

v3:
- Newly added.

 xen/include/asm-x86/x86_64/system.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
index 662813a..ebbe4b5 100644
--- a/xen/include/asm-x86/x86_64/system.h
+++ b/xen/include/asm-x86/x86_64/system.h
@@ -6,6 +6,34 @@
                                    (unsigned long)(n),sizeof(*(ptr))))
 
 /*
+ * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
+ * identical, store NEW in MEM.  Return the initial value in MEM.
+ * Success is indicated by comparing RETURN with OLD.
+ *
+ * This function can only be called when cpu_has_cx16 is ture.
+ */
+
+static always_inline __uint128_t __cmpxchg16b(
+    volatile void *ptr, __uint128_t *old, __uint128_t *new)
+{
+    __uint128_t prev;
+    uint64_t new_high = *new >> 64;
+    uint64_t new_low = (uint64_t)*new;
+
+    ASSERT(cpu_has_cx16);
+
+    asm volatile ( "lock; cmpxchg16b %3"
+                   : "=A" (prev)
+                   : "c" (new_high), "b" (new_low), "m" (*__xg(ptr)), "0" (*old)
+                   : "memory" );
+
+    return prev;
+}
+
+#define cmpxchg16b(ptr,o,n)                                             \
+    __cmpxchg16b((ptr), (__uint128_t *)(o), (__uint128_t *)(n))
+
+/*
  * This function causes value _o to be changed to _n at location _p.
  * If this access causes a fault then we return 1, otherwise we return 0.
  * If no fault occurs then _o is updated to the value we saw at _p. If this
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
  2015-08-12  2:35 ` [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
  2015-08-12  2:35 ` [PATCH v5 02/17] Add cmpxchg16b support for x86-64 Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 15:39   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Kevin Tian, Feng Wu, Jan Beulich

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

This patch adds variable 'iommu_intpost' to control whether enable VT-d
posted-interrupt or not in the generic IOMMU code.

CC: Jan Beulich <jbeulich@suse.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
v5:
- Remove the "if no intremap then no intpost" logic in parse_iommu_param(), which
  can be covered in iommu_setup()

v3:
- Remove pointless initializer for 'iommu_intpost'.
- Some adjustment for "if no intremap then no intpost" logic.
    * For parse_iommu_param(), move it to the end of the function,
      so we don't need to add the some logic when introduing the
      new kernel parameter 'intpost' in later patch.
    * Add this logic in iommu_setup() after iommu_hardware_setup()
      is called.

 xen/drivers/passthrough/iommu.c | 13 ++++++++++++-
 xen/include/xen/iommu.h         |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 11d58cc..8eb77f7 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -50,6 +50,14 @@ bool_t __read_mostly iommu_passthrough;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
+
+/*
+ * In the current implementation of VT-d posted interrupts, in some extreme
+ * cases, the per cpu list which saves the blocked vCPU will be very long,
+ * and this will affect the interrupt latency, so let this feature off by
+ * default until we find a good solution to resolve it.
+ */
+bool_t __read_mostly iommu_intpost;
 bool_t __read_mostly iommu_hap_pt_share = 1;
 bool_t __read_mostly iommu_debug;
 bool_t __read_mostly amd_iommu_perdev_intremap = 1;
@@ -304,6 +312,9 @@ int __init iommu_setup(void)
         panic("Couldn't enable %s and iommu=required/force",
               !iommu_enabled ? "IOMMU" : "Interrupt Remapping");
 
+    if ( !iommu_intremap )
+        iommu_intpost = 0;
+
     if ( !iommu_enabled )
     {
         iommu_snoop = 0;
@@ -371,7 +382,7 @@ void iommu_crash_shutdown(void)
     const struct iommu_ops *ops = iommu_get_ops();
     if ( iommu_enabled )
         ops->crash_shutdown();
-    iommu_enabled = iommu_intremap = 0;
+    iommu_enabled = iommu_intremap = iommu_intpost = 0;
 }
 
 int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 705969b..da326a1 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -30,7 +30,7 @@
 extern bool_t iommu_enable, iommu_enabled;
 extern bool_t force_iommu, iommu_verbose;
 extern bool_t iommu_workaround_bios_bug, iommu_passthrough;
-extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
+extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
 extern bool_t iommu_hap_pt_share;
 extern bool_t iommu_debug;
 extern bool_t amd_iommu_perdev_intremap;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 04/17] vt-d: VT-d Posted-Interrupts feature detection
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (2 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 15:40   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Feng Wu

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v5:
- Remove blank line

v4:
- Correct a logic error when setting iommu_intpost to 0

v3:
- Remove the "if no intremap then no intpost" logic in
  intel_vtd_setup(), it is covered in the iommu_setup().
- Add "if no intremap then no intpost" logic in the end
  of init_vtd_hw() which is called by vtd_resume().

So the logic exists in the following three places:
- parse_iommu_param()
- iommu_setup()
- init_vtd_hw()

 xen/drivers/passthrough/vtd/iommu.c | 17 +++++++++++++++--
 xen/drivers/passthrough/vtd/iommu.h |  1 +
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 1dffc40..52c7cc9 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2079,6 +2079,9 @@ static int init_vtd_hw(void)
                 disable_intremap(drhd->iommu);
     }
 
+    if ( !iommu_intremap )
+        iommu_intpost = 0;
+
     /*
      * Set root entries for each VT-d engine.  After set root entry,
      * must globally invalidate context cache, and then globally
@@ -2147,8 +2150,8 @@ int __init intel_vtd_setup(void)
     }
 
     /* We enable the following features only if they are supported by all VT-d
-     * engines: Snoop Control, DMA passthrough, Queued Invalidation and
-     * Interrupt Remapping.
+     * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
+     * Remapping, and Posted Interrupt
      */
     for_each_drhd_unit ( drhd )
     {
@@ -2176,6 +2179,14 @@ int __init intel_vtd_setup(void)
         if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
             iommu_intremap = 0;
 
+        /*
+         * We cannot use posted interrupt if X86_FEATURE_CX16 is
+         * not supported, since we count on this feature to
+         * atomically update 16-byte IRTE in posted format.
+         */
+        if ( !iommu_intremap || !cap_intr_post(iommu->cap) || !cpu_has_cx16 )
+            iommu_intpost = 0;
+
         if ( !vtd_ept_page_compatible(iommu) )
             iommu_hap_pt_share = 0;
 
@@ -2201,6 +2212,7 @@ int __init intel_vtd_setup(void)
     P(iommu_passthrough, "Dom0 DMA Passthrough");
     P(iommu_qinval, "Queued Invalidation");
     P(iommu_intremap, "Interrupt Remapping");
+    P(iommu_intpost, "Posted Interrupt");
     P(iommu_hap_pt_share, "Shared EPT tables");
 #undef P
 
@@ -2220,6 +2232,7 @@ int __init intel_vtd_setup(void)
     iommu_passthrough = 0;
     iommu_qinval = 0;
     iommu_intremap = 0;
+    iommu_intpost = 0;
     return ret;
 }
 
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index ac71ed1..22abefe 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -61,6 +61,7 @@
 /*
  * Decoding Capability Register
  */
+#define cap_intr_post(c)       (((c) >> 59) & 1)
 #define cap_read_drain(c)      (((c) >> 55) & 1)
 #define cap_write_drain(c)     (((c) >> 54) & 1)
 #define cap_max_amask_val(c)   (((c) >> 48) & 0x3f)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (3 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 15:45   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

Extend struct pi_desc according to VT-d Posted-Interrupts Spec.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v3:
- Use u32 instead of u64 for the bitfield in 'struct pi_desc'

 xen/include/asm-x86/hvm/vmx/vmcs.h | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index f1126d4..7e81752 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -80,8 +80,19 @@ struct vmx_domain {
 
 struct pi_desc {
     DECLARE_BITMAP(pir, NR_VECTORS);
-    u32 control;
-    u32 rsvd[7];
+    union {
+        struct
+        {
+        u16 on     : 1,  /* bit 256 - Outstanding Notification */
+            sn     : 1,  /* bit 257 - Suppress Notification */
+            rsvd_1 : 14; /* bit 271:258 - Reserved */
+        u8  nv;          /* bit 279:272 - Notification Vector */
+        u8  rsvd_2;      /* bit 287:280 - Reserved */
+        u32 ndst;        /* bit 319:288 - Notification Destination */
+        };
+        u64 control;
+    };
+    u32 rsvd[6];
 } __attribute__ ((aligned (64)));
 
 #define ept_get_wl(ept)   ((ept)->ept_wl)
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 06/17] vmx: Add some helper functions for Posted-Interrupts
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (4 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 15:46   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v4:
- Newly added

 xen/include/asm-x86/hvm/vmx/vmx.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 3fbfa44..acd4aec 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -101,6 +101,7 @@ void vmx_update_cpu_exec_control(struct vcpu *v);
 void vmx_update_secondary_exec_control(struct vcpu *v);
 
 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
 static inline int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 {
     return test_and_set_bit(vector, pi_desc->pir);
@@ -121,11 +122,31 @@ static inline int pi_test_and_clear_on(struct pi_desc *pi_desc)
     return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
 }
 
+static inline int pi_test_on(struct pi_desc *pi_desc)
+{
+    return test_bit(POSTED_INTR_ON, &pi_desc->control);
+}
+
 static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 {
     return xchg(&pi_desc->pir[group], 0);
 }
 
+static inline int pi_test_sn(struct pi_desc *pi_desc)
+{
+    return test_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
+static inline void pi_set_sn(struct pi_desc *pi_desc)
+{
+    set_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
+static inline void pi_clear_sn(struct pi_desc *pi_desc)
+{
+    clear_bit(POSTED_INTR_SN, &pi_desc->control);
+}
+
 /*
  * Exit Reasons
  */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (5 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 17:03   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

This patch initializes the VT-d Posted-interrupt Descriptor.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v3:
- Move pi_desc_init() to xen/arch/x86/hvm/vmx/vmcs.c
- Remove the 'inline' flag of pi_desc_init()

 xen/arch/x86/hvm/vmx/vmcs.c       | 18 ++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmx.h |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index a0a97e7..28c553f 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -39,6 +39,7 @@
 #include <asm/flushtlb.h>
 #include <asm/shadow.h>
 #include <asm/tboot.h>
+#include <asm/apic.h>
 
 static bool_t __read_mostly opt_vpid_enabled = 1;
 boolean_param("vpid", opt_vpid_enabled);
@@ -951,6 +952,20 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
     virtual_vmcs_exit(vvmcs);
 }
 
+static void pi_desc_init(struct vcpu *v)
+{
+    uint32_t dest;
+
+    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
+
+    dest = cpu_physical_id(v->processor);
+
+    if ( x2apic_enabled )
+        v->arch.hvm_vmx.pi_desc.ndst = dest;
+    else
+        v->arch.hvm_vmx.pi_desc.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);
+}
+
 static int construct_vmcs(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -1089,6 +1104,9 @@ static int construct_vmcs(struct vcpu *v)
 
     if ( cpu_has_vmx_posted_intr_processing )
     {
+        if ( iommu_intpost )
+            pi_desc_init(v);
+
         __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
         __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
     }
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index acd4aec..03c529c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -88,6 +88,8 @@ typedef enum {
 #define EPT_EMT_WB              6
 #define EPT_EMT_RSV2            7
 
+#define PI_xAPIC_NDST_MASK      0xFF00
+
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
 void vmx_intr_assist(void);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 08/17] vmx: Suppress posting interrupts when 'SN' is set
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (6 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 17:03   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 09/17] VT-d: Remove pointless casts Feng Wu
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

Currently, we don't support urgent interrupt, all interrupts
are recognized as non-urgent interrupt, so we cannot send
posted-interrupt when 'SN' is set.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v5:
- keep the vcpu_kick() at the end of vmx_deliver_posted_intr()
- Keep the 'return' after calling __vmx_deliver_posted_interrupt()

v4:
- Coding style.
- V3 removes a vcpu_kick() from the eoi_exitmap_changed path
  incorrectly, fix it.

v3:
- use cmpxchg to test SN/ON and set ON

 xen/arch/x86/hvm/vmx/vmx.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c32d863..d2a4cfb 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1701,8 +1701,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
          */
         pi_set_on(&v->arch.hvm_vmx.pi_desc);
     }
-    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
+    else
     {
+        struct pi_desc old, new, prev;
+
+        prev.control = 0;
+
+        do {
+            /*
+             * Currently, we don't support urgent interrupt, all
+             * interrupts are recognized as non-urgent interrupt,
+             * so we cannot send posted-interrupt when 'SN' is set.
+             * Besides that, if 'ON' is already set, we cannot set
+             * posted-interrupts as well.
+             */
+            if ( pi_test_sn(&prev) || pi_test_on(&prev) )
+            {
+                vcpu_kick(v);
+                return;
+            }
+
+            old.control = v->arch.hvm_vmx.pi_desc.control &
+                          ~( 1 << POSTED_INTR_ON | 1 << POSTED_INTR_SN );
+            new.control = v->arch.hvm_vmx.pi_desc.control |
+                          1 << POSTED_INTR_ON;
+
+            prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
+                                   old.control, new.control);
+        } while ( prev.control != old.control );
+
         __vmx_deliver_posted_interrupt(v);
         return;
     }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 09/17] VT-d: Remove pointless casts
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (7 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 17:03   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu

Remove pointless casts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v5:
- Newly added.

 xen/drivers/passthrough/vtd/utils.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 44c4ef5..162b764 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -234,10 +234,9 @@ static void dump_iommu_info(unsigned char key)
                     continue;
                 printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
                     "   %x %x\n", i,
-                    (u32)p->hi.svt, (u32)p->hi.sq, (u32)p->hi.sid,
-                    (u32)p->lo.dst, (u32)p->lo.vector, (u32)p->lo.avail,
-                    (u32)p->lo.dlm, (u32)p->lo.tm, (u32)p->lo.rh,
-                    (u32)p->lo.dm, (u32)p->lo.fpd, (u32)p->lo.p);
+                    p->hi.svt, p->hi.sq, p->hi.sid, p->lo.dst, p->lo.vector,
+                    p->lo.avail, p->lo.dlm, p->lo.tm, p->lo.rh, p->lo.dm,
+                    p->lo.fpd, p->lo.p);
                 print_cnt++;
             }
             if ( iremap_entries )
@@ -281,11 +280,10 @@ static void dump_iommu_info(unsigned char key)
 
                 printk("   %02x:  %04x   %x    %x   %x   %x   %x    %x"
                     "    %x     %02x\n", i,
-                    (u32)remap->index_0_14 | ((u32)remap->index_15 << 15),
-                    (u32)remap->format, (u32)remap->mask, (u32)remap->trigger,
-                    (u32)remap->irr, (u32)remap->polarity,
-                    (u32)remap->delivery_status, (u32)remap->delivery_mode,
-                    (u32)remap->vector);
+                    remap->index_0_14 | ((u32)remap->index_15 << 15),
+                    remap->format, remap->mask, remap->trigger, remap->irr,
+                    remap->polarity, remap->delivery_status, remap->delivery_mode,
+                    remap->vector);
             }
         }
     }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (8 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 09/17] VT-d: Remove pointless casts Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12  2:35 ` [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Feng Wu

Extend struct iremap_entry according to VT-d Posted-Interrupts Spec.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v4:
- res_4 is not a bitfiled, correct it.
- Expose 'im' to remapped irte as well.

v3:
- Use u32 instead of u64 to define the bitfields in 'struct iremap_entry'
- Limit using bitfield if possible

 xen/drivers/passthrough/vtd/intremap.c | 92 +++++++++++++++++-----------------
 xen/drivers/passthrough/vtd/iommu.h    | 43 ++++++++++------
 xen/drivers/passthrough/vtd/utils.c    |  8 +--
 3 files changed, 80 insertions(+), 63 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index 987bbe9..e9fffa6 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -122,9 +122,9 @@ static u16 hpetid_to_bdf(unsigned int hpet_id)
 static void set_ire_sid(struct iremap_entry *ire,
                         unsigned int svt, unsigned int sq, unsigned int sid)
 {
-    ire->hi.svt = svt;
-    ire->hi.sq = sq;
-    ire->hi.sid = sid;
+    ire->remap.svt = svt;
+    ire->remap.sq = sq;
+    ire->remap.sid = sid;
 }
 
 static void set_ioapic_source_id(int apic_id, struct iremap_entry *ire)
@@ -219,7 +219,7 @@ static unsigned int alloc_remap_entry(struct iommu *iommu, unsigned int nr)
         else
             p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
 
-        if ( p->lo_val || p->hi_val ) /* not a free entry */
+        if ( p->lo || p->hi ) /* not a free entry */
             found = 0;
         else if ( ++found == nr )
             break;
@@ -253,7 +253,7 @@ static int remap_entry_to_ioapic_rte(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    if ( iremap_entry->hi_val == 0 && iremap_entry->lo_val == 0 )
+    if ( iremap_entry->hi == 0 && iremap_entry->lo == 0 )
     {
         dprintk(XENLOG_ERR VTDPREFIX,
                 "%s: index (%d) get an empty entry!\n",
@@ -263,13 +263,13 @@ static int remap_entry_to_ioapic_rte(
         return -EFAULT;
     }
 
-    old_rte->vector = iremap_entry->lo.vector;
-    old_rte->delivery_mode = iremap_entry->lo.dlm;
-    old_rte->dest_mode = iremap_entry->lo.dm;
-    old_rte->trigger = iremap_entry->lo.tm;
+    old_rte->vector = iremap_entry->remap.vector;
+    old_rte->delivery_mode = iremap_entry->remap.dlm;
+    old_rte->dest_mode = iremap_entry->remap.dm;
+    old_rte->trigger = iremap_entry->remap.tm;
     old_rte->__reserved_2 = 0;
     old_rte->dest.logical.__reserved_1 = 0;
-    old_rte->dest.logical.logical_dest = iremap_entry->lo.dst >> 8;
+    old_rte->dest.logical.logical_dest = iremap_entry->remap.dst >> 8;
 
     unmap_vtd_domain_page(iremap_entries);
     spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
@@ -317,27 +317,28 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu,
     if ( rte_upper )
     {
         if ( x2apic_enabled )
-            new_ire.lo.dst = value;
+            new_ire.remap.dst = value;
         else
-            new_ire.lo.dst = (value >> 24) << 8;
+            new_ire.remap.dst = (value >> 24) << 8;
     }
     else
     {
         *(((u32 *)&new_rte) + 0) = value;
-        new_ire.lo.fpd = 0;
-        new_ire.lo.dm = new_rte.dest_mode;
-        new_ire.lo.tm = new_rte.trigger;
-        new_ire.lo.dlm = new_rte.delivery_mode;
+        new_ire.remap.fpd = 0;
+        new_ire.remap.dm = new_rte.dest_mode;
+        new_ire.remap.tm = new_rte.trigger;
+        new_ire.remap.dlm = new_rte.delivery_mode;
         /* Hardware require RH = 1 for LPR delivery mode */
-        new_ire.lo.rh = (new_ire.lo.dlm == dest_LowestPrio);
-        new_ire.lo.avail = 0;
-        new_ire.lo.res_1 = 0;
-        new_ire.lo.vector = new_rte.vector;
-        new_ire.lo.res_2 = 0;
+        new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+        new_ire.remap.avail = 0;
+        new_ire.remap.res_1 = 0;
+        new_ire.remap.vector = new_rte.vector;
+        new_ire.remap.res_2 = 0;
 
         set_ioapic_source_id(IO_APIC_ID(apic), &new_ire);
-        new_ire.hi.res_1 = 0;
-        new_ire.lo.p = 1;     /* finally, set present bit */
+        new_ire.remap.res_3 = 0;
+        new_ire.remap.res_4 = 0;
+        new_ire.remap.p = 1;     /* finally, set present bit */
 
         /* now construct new ioapic rte entry */
         remap_rte->vector = new_rte.vector;
@@ -510,7 +511,7 @@ static int remap_entry_to_msi_msg(
     GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, index,
                      iremap_entries, iremap_entry);
 
-    if ( iremap_entry->hi_val == 0 && iremap_entry->lo_val == 0 )
+    if ( iremap_entry->hi == 0 && iremap_entry->lo == 0 )
     {
         dprintk(XENLOG_ERR VTDPREFIX,
                 "%s: index (%d) get an empty entry!\n",
@@ -523,25 +524,25 @@ static int remap_entry_to_msi_msg(
     msg->address_hi = MSI_ADDR_BASE_HI;
     msg->address_lo =
         MSI_ADDR_BASE_LO |
-        ((iremap_entry->lo.dm == 0) ?
+        ((iremap_entry->remap.dm == 0) ?
             MSI_ADDR_DESTMODE_PHYS:
             MSI_ADDR_DESTMODE_LOGIC) |
-        ((iremap_entry->lo.dlm != dest_LowestPrio) ?
+        ((iremap_entry->remap.dlm != dest_LowestPrio) ?
             MSI_ADDR_REDIRECTION_CPU:
             MSI_ADDR_REDIRECTION_LOWPRI);
     if ( x2apic_enabled )
-        msg->dest32 = iremap_entry->lo.dst;
+        msg->dest32 = iremap_entry->remap.dst;
     else
-        msg->dest32 = (iremap_entry->lo.dst >> 8) & 0xff;
+        msg->dest32 = (iremap_entry->remap.dst >> 8) & 0xff;
     msg->address_lo |= MSI_ADDR_DEST_ID(msg->dest32);
 
     msg->data =
         MSI_DATA_TRIGGER_EDGE |
         MSI_DATA_LEVEL_ASSERT |
-        ((iremap_entry->lo.dlm != dest_LowestPrio) ?
+        ((iremap_entry->remap.dlm != dest_LowestPrio) ?
             MSI_DATA_DELIVERY_FIXED:
             MSI_DATA_DELIVERY_LOWPRI) |
-        iremap_entry->lo.vector;
+        iremap_entry->remap.vector;
 
     unmap_vtd_domain_page(iremap_entries);
     spin_unlock_irqrestore(&ir_ctrl->iremap_lock, flags);
@@ -600,29 +601,30 @@ static int msi_msg_to_remap_entry(
     memcpy(&new_ire, iremap_entry, sizeof(struct iremap_entry));
 
     /* Set interrupt remapping table entry */
-    new_ire.lo.fpd = 0;
-    new_ire.lo.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
-    new_ire.lo.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
-    new_ire.lo.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
+    new_ire.remap.fpd = 0;
+    new_ire.remap.dm = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
+    new_ire.remap.tm = (msg->data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+    new_ire.remap.dlm = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     /* Hardware require RH = 1 for LPR delivery mode */
-    new_ire.lo.rh = (new_ire.lo.dlm == dest_LowestPrio);
-    new_ire.lo.avail = 0;
-    new_ire.lo.res_1 = 0;
-    new_ire.lo.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
-                        MSI_DATA_VECTOR_MASK;
-    new_ire.lo.res_2 = 0;
+    new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+    new_ire.remap.avail = 0;
+    new_ire.remap.res_1 = 0;
+    new_ire.remap.vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) &
+                            MSI_DATA_VECTOR_MASK;
+    new_ire.remap.res_2 = 0;
     if ( x2apic_enabled )
-        new_ire.lo.dst = msg->dest32;
+        new_ire.remap.dst = msg->dest32;
     else
-        new_ire.lo.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
-                          & 0xff) << 8;
+        new_ire.remap.dst = ((msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT)
+                             & 0xff) << 8;
 
     if ( pdev )
         set_msi_source_id(pdev, &new_ire);
     else
         set_hpet_source_id(msi_desc->hpet_id, &new_ire);
-    new_ire.hi.res_1 = 0;
-    new_ire.lo.p = 1;    /* finally, set present bit */
+    new_ire.remap.res_3 = 0;
+    new_ire.remap.res_4 = 0;
+    new_ire.remap.p = 1;    /* finally, set present bit */
 
     /* now construct new MSI/MSI-X rte entry */
     remap_rte = (struct msi_msg_remap_entry *)msg;
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 22abefe..6fca430 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -281,29 +281,44 @@ struct dma_pte {
 /* interrupt remap entry */
 struct iremap_entry {
   union {
-    u64 lo_val;
+    struct { u64 lo, hi; };
     struct {
-        u64 p       : 1,
+        u16 p       : 1,
             fpd     : 1,
             dm      : 1,
             rh      : 1,
             tm      : 1,
             dlm     : 3,
             avail   : 4,
-            res_1   : 4,
-            vector  : 8,
-            res_2   : 8,
-            dst     : 32;
-    }lo;
-  };
-  union {
-    u64 hi_val;
+            res_1   : 3,
+            im      : 1;
+        u8  vector;
+        u8  res_2;
+        u32 dst;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_3   : 12;
+        u32 res_4;
+    } remap;
     struct {
-        u64 sid     : 16,
-            sq      : 2,
+        u16 p       : 1,
+            fpd     : 1,
+            res_1   : 6,
+            avail   : 4,
+            res_2   : 2,
+            urg     : 1,
+            im      : 1;
+        u8  vector;
+        u8  res_3;
+        u32 res_4   : 6,
+            pda_l   : 26;
+        u16 sid;
+        u16 sq      : 2,
             svt     : 2,
-            res_1   : 44;
-    }hi;
+            res_5   : 12;
+        u32 pda_h;
+    } post;
   };
 };
 
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 162b764..9d556da 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -230,13 +230,13 @@ static void dump_iommu_info(unsigned char key)
                 else
                     p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
 
-                if ( !p->lo.p )
+                if ( !p->remap.p )
                     continue;
                 printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
                     "   %x %x\n", i,
-                    p->hi.svt, p->hi.sq, p->hi.sid, p->lo.dst, p->lo.vector,
-                    p->lo.avail, p->lo.dlm, p->lo.tm, p->lo.rh, p->lo.dm,
-                    p->lo.fpd, p->lo.p);
+                    p->remap.svt, p->remap.sq, p->remap.sid, p->remap.dst,
+                    p->remap.vector, p->remap.avail, p->remap.dlm, p->remap.tm,
+                    p->remap.rh, p->remap.dm, p->remap.fpd, p->remap.p);
                 print_cnt++;
             }
             if ( iremap_entries )
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (9 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 16:23   ` Konrad Rzeszutek Wilk
  2015-08-12 17:41   ` Andrew Cooper
  2015-08-12  2:35 ` [PATCH v5 12/17] Update IRTE according to guest interrupt config changes Feng Wu
                   ` (5 subsequent siblings)
  16 siblings, 2 replies; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Andrew Cooper, Jan Beulich, Yang Zhang, Feng Wu

This patch adds an API which is used to update the IRTE
for posted-interrupt when guest changes MSI/MSI-X information.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v5:
- Make some function parameters const
- Call "spin_unlock_irq(&desc->lock);" a little eariler
- Add "ASSERT(spin_is_locked(&pcidevs_lock))"
- -EBADSLT -> -ENODEV, EBADSLT is removed in the lasted Xen

v4:
- Don't inline setup_posted_irte()
- const struct pi_desc *pi_desc for setup_posted_irte()
- return -EINVAL when pirq_spin_lock_irq_desc() fails.
- Make some variables const
- Release irq desc lock earlier in pi_update_irte()
- Remove the pointless do-while() loop when doing cmpxchg16b()

v3:
- Remove "adding PDA_MASK()" when updating 'pda_l' and 'pda_h' for IRTE.
- Change the return type of pi_update_irte() to int.
- Remove some pointless printk message in pi_update_irte().
- Use structure assignment instead of memcpy() for irte copy.

 xen/drivers/passthrough/vtd/intremap.c | 112 +++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/vtd/iommu.h    |   2 +
 xen/include/asm-x86/iommu.h            |   2 +
 3 files changed, 116 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index e9fffa6..8ec85d3 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -899,3 +899,115 @@ void iommu_disable_x2apic_IR(void)
     for_each_drhd_unit ( drhd )
         disable_qinval(drhd->iommu);
 }
+
+static void setup_posted_irte(
+    struct iremap_entry *new_ire,
+    const struct pi_desc *pi_desc,
+    const uint8_t gvec)
+{
+    new_ire->post.urg = 0;
+    new_ire->post.vector = gvec;
+    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
+    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
+
+    new_ire->post.res_1 = 0;
+    new_ire->post.res_2 = 0;
+    new_ire->post.res_3 = 0;
+    new_ire->post.res_4 = 0;
+    new_ire->post.res_5 = 0;
+
+    new_ire->post.im = 1;
+}
+
+/*
+ * This function is used to update the IRTE for posted-interrupt
+ * when guest changes MSI/MSI-X information.
+ */
+int pi_update_irte(
+    const struct vcpu *v,
+    const struct pirq *pirq,
+    const uint8_t gvec)
+{
+    struct irq_desc *desc;
+    const struct msi_desc *msi_desc;
+    int remap_index;
+    int rc = 0;
+    const struct pci_dev *pci_dev;
+    const struct acpi_drhd_unit *drhd;
+    struct iommu *iommu;
+    struct ir_ctrl *ir_ctrl;
+    struct iremap_entry *iremap_entries = NULL, *p = NULL;
+    struct iremap_entry new_ire, old_ire;
+    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    __uint128_t ret;
+
+    desc = pirq_spin_lock_irq_desc(pirq, NULL);
+    if ( !desc )
+        return -EINVAL;
+
+    msi_desc = desc->msi_desc;
+    if ( !msi_desc )
+    {
+        rc = -ENODEV;
+        goto unlock_out;
+    }
+
+    pci_dev = msi_desc->dev;
+    if ( !pci_dev )
+    {
+        rc = -ENODEV;
+        goto unlock_out;
+    }
+
+    remap_index = msi_desc->remap_index;
+
+    spin_unlock_irq(&desc->lock);
+
+    ASSERT(spin_is_locked(&pcidevs_lock));
+
+    /*
+     * For performance concern, we will store the 'iommu' pointer in
+     * 'struct msi_desc' in some other place, so we don't need to waste
+     * time searching it here. I will fix this later.
+     */
+    drhd = acpi_find_matched_drhd_unit(pci_dev);
+    if ( !drhd )
+    {
+        rc = -ENODEV;
+        goto unlock_out;
+    }
+
+    iommu = drhd->iommu;
+    ir_ctrl = iommu_ir_ctrl(iommu);
+    if ( !ir_ctrl )
+    {
+        rc = -ENODEV;
+        goto unlock_out;
+    }
+
+    spin_lock_irq(&ir_ctrl->iremap_lock);
+
+    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
+
+    old_ire = new_ire = *p;
+
+    /* Setup/Update interrupt remapping table entry. */
+    setup_posted_irte(&new_ire, pi_desc, gvec);
+    ret = cmpxchg16b(p, &old_ire, &new_ire);
+
+    ASSERT(ret == *(__uint128_t *)&old_ire);
+
+    iommu_flush_cache_entry(p, sizeof(*p));
+    iommu_flush_iec_index(iommu, 0, remap_index);
+
+    unmap_vtd_domain_page(iremap_entries);
+
+    spin_unlock_irq(&ir_ctrl->iremap_lock);
+
+    return 0;
+
+ unlock_out:
+    spin_unlock_irq(&desc->lock);
+
+    return rc;
+}
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 6fca430..ff4ceb6 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -322,6 +322,8 @@ struct iremap_entry {
   };
 };
 
+#define PDA_LOW_BIT    26
+
 /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
 #define IREMAP_PAGE_ORDER  8
 
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index 29203d7..92f0900 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -31,6 +31,8 @@ int iommu_supports_eim(void);
 int iommu_enable_x2apic_IR(void);
 void iommu_disable_x2apic_IR(void);
 
+int pi_update_irte(const struct vcpu *v, const struct pirq *pirq, const uint8_t gvec);
+
 #endif /* !__ARCH_X86_IOMMU_H__ */
 /*
  * Local variables:
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 12/17] Update IRTE according to guest interrupt config changes
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (10 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 16:43   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 13/17] vmx: posted-interrupt handling when vCPU is blocked Feng Wu
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu, Jan Beulich

When guest changes its interrupt configuration (such as, vector, etc.)
for direct-assigned devices, we need to update the associated IRTE
with the new guest vector, so external interrupts from the assigned
devices can be injected to guests without VM-Exit.

For lowest-priority interrupts, we use vector-hashing mechamisn to find
the destination vCPU. This follows the hardware behavior, since modern
Intel CPUs use vector hashing to handle the lowest-priority interrupt.

For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
still use interrupt remapping.

CC: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v5:
- Make 'struct vcpu *vcpu' const

v4:
- Make some 'int' variables 'unsigned int' in pi_find_dest_vcpu()
- Make 'dest_id' uint32_t
- Rename 'size' to 'bitmap_array_size'
- find_next_bit() and find_first_bit() always return unsigned int,
  so no need to check whether the return value is less than 0.
- Message error level XENLOG_G_WARNING -> XENLOG_G_INFO
- Remove useless warning message
- Create a seperate function vector_hashing_dest() to find the
- destination of lowest-priority interrupts.
- Change some comments

v3:
- Use bitmap to store the all the possible destination vCPUs of an
  interrupt, then trying to find the right destination from the bitmap
- Typo and some small changes

 xen/drivers/passthrough/io.c | 124 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index bda9374..f62f86c 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -25,6 +25,7 @@
 #include <asm/hvm/iommu.h>
 #include <asm/hvm/support.h>
 #include <xen/hvm/irq.h>
+#include <asm/io_apic.h>
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
@@ -198,6 +199,108 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
     xfree(dpci);
 }
 
+/*
+ * This routine handles lowest-priority interrupts using vector-hashing
+ * mechanism. As an example, modern Intel CPUs use this method to handle
+ * lowest-priority interrupts.
+ *
+ * Here is the details about the vector-hashing mechanism:
+ * 1. For lowest-priority interrupts, store all the possible destination
+ *    vCPUs in an array.
+ * 2. Use "gvec % max number of destination vCPUs" to find the right
+ *    destination vCPU in the array for the lowest-priority interrupt.
+ */
+static struct vcpu *vector_hashing_dest(const struct domain *d,
+                                        uint32_t dest_id,
+                                        bool_t dest_mode,
+                                        uint8_t gvec)
+
+{
+    unsigned long *dest_vcpu_bitmap;
+    unsigned int dest_vcpu_num = 0, idx;
+    unsigned int bitmap_array_size = BITS_TO_LONGS(d->max_vcpus);
+    struct vcpu *v, *dest = NULL;
+    unsigned int i;
+
+    dest_vcpu_bitmap = xzalloc_array(unsigned long, bitmap_array_size);
+    if ( !dest_vcpu_bitmap )
+    {
+        dprintk(XENLOG_G_INFO,
+                "dom%d: failed to allocate memory\n", d->domain_id);
+        return NULL;
+    }
+
+    for_each_vcpu ( d, v )
+    {
+        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
+                                dest_id, dest_mode) )
+            continue;
+
+        __set_bit(v->vcpu_id, dest_vcpu_bitmap);
+        dest_vcpu_num++;
+    }
+
+    if ( dest_vcpu_num != 0 )
+    {
+        idx = 0;
+
+        for ( i = gvec % dest_vcpu_num; i >= 0; i--)
+        {
+            idx = find_next_bit(dest_vcpu_bitmap, d->max_vcpus, idx) + 1;
+            BUG_ON(idx >= d->max_vcpus);
+        }
+        idx--;
+
+        dest = d->vcpu[idx];
+    }
+
+    xfree(dest_vcpu_bitmap);
+
+    return dest;
+}
+
+/*
+ * The purpose of this routine is to find the right destination vCPU for
+ * an interrupt which will be delivered by VT-d posted-interrupt. There
+ * are several cases as below:
+ *
+ * - For lowest-priority interrupts, use vector-hashing mechanism to find
+ *   the destination.
+ * - Otherwise, for single destination interrupt, it is straightforward to
+ *   find the destination vCPU and return true.
+ * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
+ *   so return NULL.
+ */
+static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t dest_id,
+                                      bool_t dest_mode, uint8_t delivery_mode,
+                                      uint8_t gvec)
+{
+    unsigned int dest_vcpu_num = 0;
+    struct vcpu *v, *dest = NULL;
+
+    if ( delivery_mode == dest_LowestPrio )
+        return vector_hashing_dest(d, dest_id, dest_mode, gvec);
+
+    for_each_vcpu ( d, v )
+    {
+        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
+                                dest_id, dest_mode) )
+            continue;
+
+        dest_vcpu_num++;
+        dest = v;
+    }
+
+    /*
+     * For fixed destination, we only handle single-destination
+     * interrupts.
+     */
+    if ( dest_vcpu_num == 1 )
+	return dest;
+
+    return NULL;
+}
+
 int pt_irq_create_bind(
     struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
 {
@@ -256,7 +359,7 @@ int pt_irq_create_bind(
     {
     case PT_IRQ_TYPE_MSI:
     {
-        uint8_t dest, dest_mode;
+        uint8_t dest, dest_mode, delivery_mode;
         int dest_vcpu_id;
 
         if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
@@ -329,11 +432,30 @@ int pt_irq_create_bind(
         /* Calculate dest_vcpu_id for MSI-type pirq migration. */
         dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
         dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
+        delivery_mode = (pirq_dpci->gmsi.gflags >> GFLAGS_SHIFT_DELIV_MODE) &
+                        VMSI_DELIV_MASK;
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
         spin_unlock(&d->event_lock);
         if ( dest_vcpu_id >= 0 )
             hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
+
+        /* Use interrupt posting if it is supported */
+        if ( iommu_intpost )
+        {
+            const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest, dest_mode,
+                                          delivery_mode, pirq_dpci->gmsi.gvec);
+
+            if ( vcpu )
+            {
+                rc = pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
+                if ( unlikely(rc) )
+                    dprintk(XENLOG_G_INFO,
+                            "%pv: failed to update PI IRTE, gvec:%02x\n",
+                            vcpu, pirq_dpci->gmsi.gvec);
+            }
+        }
+
         break;
     }
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 13/17] vmx: posted-interrupt handling when vCPU is blocked
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (11 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 12/17] Update IRTE according to guest interrupt config changes Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 16:59   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

This patch includes the following aspects:
- Add a global vector to wake up the blocked vCPU
  when an interrupt is being posted to it (This
  part was sugguested by Yang Zhang <yang.z.zhang@intel.com>).
- Adds a new per-vCPU tasklet to wakeup the blocked
  vCPU. It can be used in the case vcpu_unblock
  cannot be called directly.
- Define two per-cpu variables:
      * pi_blocked_vcpu:
      A list storing the vCPUs which were blocked on this pCPU.

      * pi_blocked_vcpu_lock:
      The spinlock to protect pi_blocked_vcpu.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v4:
- Use local variables in pi_wakeup_interrupt()
- Remove vcpu from the blocked list when pi_desc.on==1, this
- avoid kick vcpu multiple times.
- Remove tasklet

v3:
- This patch is generated by merging the following three patches in v2:
   [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
   [RFC v2 10/15] vmx: Define two per-cpu variables
   [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
- rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
- Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
- rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
- Make pi_wakeup_interrupt() static
- Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
- move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
- Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
- Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'

 xen/arch/x86/hvm/vmx/vmcs.c        |  3 ++
 xen/arch/x86/hvm/vmx/vmx.c         | 63 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  3 ++
 xen/include/asm-x86/hvm/vmx/vmx.h  |  5 +++
 4 files changed, 74 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 28c553f..2dabf16 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -661,6 +661,9 @@ int vmx_cpu_up(void)
     if ( cpu_has_vmx_vpid )
         vpid_sync_all();
 
+    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
+    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d2a4cfb..e80d888 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -83,7 +83,15 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
 static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
 
+/*
+ * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we
+ * can find which vCPU should be waken up.
+ */
+DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
+DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
+
 uint8_t __read_mostly posted_intr_vector;
+uint8_t __read_mostly pi_wakeup_vector;
 
 static int vmx_domain_initialise(struct domain *d)
 {
@@ -106,6 +114,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
     spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
 
+    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
+
     v->arch.schedule_tail    = vmx_do_resume;
     v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
     v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
@@ -1975,6 +1986,53 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
+/*
+ * Handle VT-d posted-interrupt when VCPU is blocked.
+ */
+static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
+{
+    struct arch_vmx_struct *vmx, *tmp;
+    struct vcpu *v;
+    spinlock_t *lock = &this_cpu(pi_blocked_vcpu_lock);
+    struct list_head *blocked_vcpus = &this_cpu(pi_blocked_vcpu);
+    LIST_HEAD(list);
+
+    spin_lock(lock);
+
+    /*
+     * XXX: The length of the list depends on how many vCPU is current
+     * blocked on this specific pCPU. This may hurt the interrupt latency
+     * if the list grows to too many entries.
+     */
+    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
+    {
+        if ( pi_test_on(&vmx->pi_desc) )
+        {
+            list_del_init(&vmx->pi_blocked_vcpu_list);
+
+            /*
+             * We cannot call vcpu_unblock here, since it also needs
+             * 'pi_blocked_vcpu_lock', we store the vCPUs with ON
+             * set in another list and unblock them after we release
+             * 'pi_blocked_vcpu_lock'.
+             */
+            list_add_tail(&vmx->pi_vcpu_on_set_list, &list);
+        }
+    }
+
+    spin_unlock(lock);
+
+    list_for_each_entry_safe(vmx, tmp, &list, pi_vcpu_on_set_list)
+    {
+        v = container_of(vmx, struct vcpu, arch.hvm_vmx);
+        list_del_init(&vmx->pi_vcpu_on_set_list);
+        vcpu_unblock(v);
+    }
+
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+}
+
 const struct hvm_function_table * __init start_vmx(void)
 {
     set_in_cr4(X86_CR4_VMXE);
@@ -2012,7 +2070,12 @@ const struct hvm_function_table * __init start_vmx(void)
     }
 
     if ( cpu_has_vmx_posted_intr_processing )
+    {
         alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+
+        if ( iommu_intpost )
+            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
+    }
     else
     {
         vmx_function_table.deliver_posted_intr = NULL;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 7e81752..9a986d0 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -161,6 +161,9 @@ struct arch_vmx_struct {
     struct page_info     *vmwrite_bitmap;
 
     struct page_info     *pml_pg;
+
+    struct list_head     pi_blocked_vcpu_list;
+    struct list_head     pi_vcpu_on_set_list;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 03c529c..3fd66ba 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -28,6 +28,11 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/vmx/vmcs.h>
 
+DECLARE_PER_CPU(struct list_head, pi_blocked_vcpu);
+DECLARE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
+
+extern uint8_t pi_wakeup_vector;
+
 typedef union {
     struct {
         u64 r       :   1,  /* bit 0 - Read permission */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 14/17] vmx: Properly handle notification event when vCPU is running
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (12 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 13/17] vmx: posted-interrupt handling when vCPU is blocked Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 17:02   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 15/17] vmx: Add some scheduler hooks for VT-d posted interrupts Feng Wu
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

When a vCPU is running in Root mode and a notification event
has been injected to it. we need to set VCPU_KICK_SOFTIRQ for
the current cpu, so the pending interrupt in PIRR will be
synced to vIRR before VM-Exit in time.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v4:
- Coding style.

v3:
- Make pi_notification_interrupt() static

 xen/arch/x86/hvm/vmx/vmx.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e80d888..c8a4371 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2033,6 +2033,51 @@ static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
     this_cpu(irq_count)++;
 }
 
+/* Handle VT-d posted-interrupt when VCPU is running. */
+static void pi_notification_interrupt(struct cpu_user_regs *regs)
+{
+    /*
+     * We get here when a vCPU is running in root-mode (such as via hypercall,
+     * or any other reasons which can result in VM-Exit), and before vCPU is
+     * back to non-root, external interrupts from an assigned device happen
+     * and a notification event is delivered to this logical CPU.
+     *
+     * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
+     * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will
+     * be synced to vIRR before VM-Exit in time.
+     *
+     * Please refer to the following code fragments from
+     * xen/arch/x86/hvm/vmx/entry.S:
+     *
+     *     .Lvmx_do_vmentry
+     *
+     *      ......
+     *      point 1
+     *
+     *      cmp  %ecx,(%rdx,%rax,1)
+     *      jnz  .Lvmx_process_softirqs
+     *
+     *      ......
+     *
+     *      je   .Lvmx_launch
+     *
+     *      ......
+     *
+     *     .Lvmx_process_softirqs:
+     *      sti
+     *      call do_softirq
+     *      jmp  .Lvmx_do_vmentry
+     *
+     * If VT-d engine issues a notification event at point 1 above, it cannot
+     * be delivered to the guest during this VM-entry without raising the
+     * softirq in this notification handler.
+     */
+    raise_softirq(VCPU_KICK_SOFTIRQ);
+
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+}
+
 const struct hvm_function_table * __init start_vmx(void)
 {
     set_in_cr4(X86_CR4_VMXE);
@@ -2071,7 +2116,7 @@ const struct hvm_function_table * __init start_vmx(void)
 
     if ( cpu_has_vmx_posted_intr_processing )
     {
-        alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+        alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
 
         if ( iommu_intpost )
             alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 15/17] vmx: Add some scheduler hooks for VT-d posted interrupts
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (13 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 17:13   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 16/17] VT-d: Dump the posted format IRTE Feng Wu
  2015-08-12  2:35 ` [PATCH v5 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Feng Wu

This patch adds the following arch hooks in scheduler:
- vmx_pre_ctx_switch_pi():
It is called before context switch, we update the posted
interrupt descriptor when the vCPU is preempted, go to sleep,
or is blocked.

- vmx_post_ctx_switch_pi()
It is called after context switch, we update the posted
interrupt descriptor when the vCPU is going to run.

- arch_vcpu_wake_prepare()
It will be called when waking up the vCPU, we update
the posted interrupt descriptor when the vCPU is unblocked.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
Sugguested-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v5:
- Rename arch_vcpu_wake to arch_vcpu_wake_prepare
- Make arch_vcpu_wake_prepare() inline for ARM
- Merge the ARM dummy hook with together
- Changes to some code comments
- Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if
  PI is disabled or the vCPU is not in HVM
- Coding style

v4:
- Newly added

 xen/arch/x86/domain.c              |  11 +++
 xen/arch/x86/hvm/vmx/vmx.c         | 147 +++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c              |   2 +
 xen/include/asm-arm/domain.h       |   2 +
 xen/include/asm-x86/domain.h       |   3 +
 xen/include/asm-x86/hvm/hvm.h      |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   8 ++
 7 files changed, 175 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..130f859 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1605,9 +1605,20 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
     set_current(next);
 
+    /*
+     * When switching from non-idle to idle, we only do a lazy context switch.
+     * However, in order for posted interrupt (if available and enabled) to
+     * work properly, we at least need to update the descriptors.
+     */
+    if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
+        prev->arch.pi_ctxt_switch_from(prev);
+
     if ( (per_cpu(curr_vcpu, cpu) == next) ||
          (is_idle_domain(nextd) && cpu_online(cpu)) )
     {
+        if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
+            next->arch.pi_ctxt_switch_to(next);
+
         local_irq_enable();
     }
     else
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c8a4371..758809a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -67,6 +67,8 @@ enum handler_return { HNDL_done, HNDL_unhandled, HNDL_exception_raised };
 
 static void vmx_ctxt_switch_from(struct vcpu *v);
 static void vmx_ctxt_switch_to(struct vcpu *v);
+static void vmx_pre_ctx_switch_pi(struct vcpu *v);
+static void vmx_post_ctx_switch_pi(struct vcpu *v);
 
 static int  vmx_alloc_vlapic_mapping(struct domain *d);
 static void vmx_free_vlapic_mapping(struct domain *d);
@@ -117,10 +119,20 @@ static int vmx_vcpu_initialise(struct vcpu *v)
     INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
     INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
 
+    v->arch.hvm_vmx.pi_block_cpu = -1;
+
+    spin_lock_init(&v->arch.hvm_vmx.pi_lock);
+
     v->arch.schedule_tail    = vmx_do_resume;
     v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
     v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
 
+    if ( iommu_intpost && is_hvm_vcpu(v) )
+    {
+        v->arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi;
+        v->arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi;
+    }
+
     if ( (rc = vmx_create_vmcs(v)) != 0 )
     {
         dprintk(XENLOG_WARNING,
@@ -718,6 +730,140 @@ static void vmx_fpu_leave(struct vcpu *v)
     }
 }
 
+void arch_vcpu_wake_prepare(struct vcpu *v)
+{
+    unsigned long gflags;
+
+    if ( !iommu_intpost || !is_hvm_vcpu(v) || !has_arch_pdevs(v->domain) )
+        return;
+
+    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, gflags);
+
+    if ( likely(vcpu_runnable(v)) ||
+         !test_bit(_VPF_blocked, &v->pause_flags) )
+    {
+        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+        unsigned long flags;
+
+        /*
+         * We don't need to send notification event to a non-running
+         * vcpu, the interrupt information will be delivered to it before
+         * VM-ENTRY when the vcpu is scheduled to run next time.
+         */
+        pi_set_sn(pi_desc);
+
+        /*
+         * Set 'NV' feild back to posted_intr_vector, so the
+         * Posted-Interrupts can be delivered to the vCPU by
+         * VT-d HW after it is scheduled to run.
+         */
+        write_atomic((uint8_t*)&pi_desc->nv, posted_intr_vector);
+
+        /*
+         * Delete the vCPU from the related block list
+         * if we are resuming from blocked state
+         */
+        if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
+        {
+            spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
+                              v->arch.hvm_vmx.pi_block_cpu), flags);
+            list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+            spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
+                                    v->arch.hvm_vmx.pi_block_cpu), flags);
+        }
+    }
+
+    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
+}
+
+static void vmx_pre_ctx_switch_pi(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    struct pi_desc old, new;
+    unsigned long flags, gflags;
+
+    if ( !has_arch_pdevs(v->domain) )
+        return;
+
+    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, gflags);
+
+    if ( vcpu_runnable(v) || !test_bit(_VPF_blocked, &v->pause_flags) )
+    {
+        /*
+         * The vCPU has been preempted or went to sleep. We don't need to send
+         * notification event to a non-running vcpu, the interrupt information
+         * will be delivered to it before VM-ENTRY when the vcpu is scheduled
+         * to run next time.
+         */
+        pi_set_sn(pi_desc);
+
+    }
+    else if ( test_bit(_VPF_blocked, &v->pause_flags) )
+    {
+        /*
+         * The vCPU is blocking, we need to add it to one of the per pCPU lists.
+         * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
+         * the per-CPU list, we also save it to posted-interrupt descriptor and
+         * make it as the destination of the wake-up notification event.
+         */
+        v->arch.hvm_vmx.pi_block_cpu = v->processor;
+        spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
+                          v->arch.hvm_vmx.pi_block_cpu), flags);
+        list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
+                      &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
+        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
+                           v->arch.hvm_vmx.pi_block_cpu), flags);
+
+        do {
+            old.control = new.control = pi_desc->control;
+
+            /* Should not block the vCPU if an interrupt was posted for it */
+            if ( pi_test_on(&old) )
+            {
+                spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
+                vcpu_unblock(v);
+                return;
+            }
+
+            /*
+             * Change the 'NDST' field to v->arch.hvm_vmx.pi_block_cpu,
+             * so when external interrupts from assigned deivces happen,
+             * wakeup notifiction event will go to
+             * v->arch.hvm_vmx.pi_block_cpu, then in pi_wakeup_interrupt()
+             * we can find the vCPU in the right list to wake up.
+             */
+            if ( x2apic_enabled )
+                new.ndst = cpu_physical_id(v->arch.hvm_vmx.pi_block_cpu);
+            else
+                new.ndst = MASK_INSR(cpu_physical_id(
+                                 v->arch.hvm_vmx.pi_block_cpu),
+                                 PI_xAPIC_NDST_MASK);
+            pi_clear_sn(&new);
+            new.nv = pi_wakeup_vector;
+        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
+                  != old.control );
+    }
+
+    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
+}
+
+static void vmx_post_ctx_switch_pi(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !has_arch_pdevs(v->domain) )
+        return;
+
+    if ( x2apic_enabled )
+        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
+    else
+        write_atomic(&pi_desc->ndst,
+                     MASK_INSR(cpu_physical_id(v->processor),
+                     PI_xAPIC_NDST_MASK));
+
+    pi_clear_sn(pi_desc);
+}
+
 static void vmx_ctxt_switch_from(struct vcpu *v)
 {
     /*
@@ -756,6 +902,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
 
     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
+    vmx_post_ctx_switch_pi(v);
 }
 
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 3eefed7..bc49098 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -412,6 +412,8 @@ void vcpu_wake(struct vcpu *v)
     unsigned long flags;
     spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
 
+    arch_vcpu_wake_prepare(v);
+
     if ( likely(vcpu_runnable(v)) )
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 56aa208..cffe2c6 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -301,6 +301,8 @@ static inline register_t vcpuid_to_vaffinity(unsigned int vcpuid)
     return vaff;
 }
 
+static inline void arch_vcpu_wake_prepare(struct vcpu *v) {}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 0fce09e..979210a 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -481,6 +481,9 @@ struct arch_vcpu
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
 
+    void (*pi_ctxt_switch_from) (struct vcpu *);
+    void (*pi_ctxt_switch_to) (struct vcpu *);
+
     struct vpmu_struct vpmu;
 
     /* Virtual Machine Extensions */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 3cac64f..95f5357 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -545,6 +545,8 @@ static inline bool_t hvm_altp2m_supported(void)
     return hvm_funcs.altp2m_supported;
 }
 
+void arch_vcpu_wake_prepare(struct vcpu *v);
+
 #ifndef NDEBUG
 /* Permit use of the Forced Emulation Prefix in HVM guests */
 extern bool_t opt_hvm_fep;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 9a986d0..209fb39 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -164,6 +164,14 @@ struct arch_vmx_struct {
 
     struct list_head     pi_blocked_vcpu_list;
     struct list_head     pi_vcpu_on_set_list;
+
+    /*
+     * Before vCPU is blocked, it is added to the global per-cpu list
+     * of 'pi_block_cpu', then VT-d engine can send wakeup notification
+     * event to 'pi_block_cpu' and wakeup the related vCPU.
+     */
+    int                  pi_block_cpu;
+    spinlock_t           pi_lock;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 16/17] VT-d: Dump the posted format IRTE
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (14 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 15/17] vmx: Add some scheduler hooks for VT-d posted interrupts Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 17:14   ` Konrad Rzeszutek Wilk
  2015-08-12  2:35 ` [PATCH v5 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Yang Zhang, Kevin Tian, Feng Wu

Add the utility to dump the posted format IRTE.

CC: Yang Zhang <yang.z.zhang@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v4:
- Newly added

 xen/drivers/passthrough/vtd/utils.c | 41 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 9d556da..1848385 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -203,6 +203,9 @@ static void dump_iommu_info(unsigned char key)
             ecap_intr_remap(iommu->ecap) ? "" : "not ",
             (status & DMA_GSTS_IRES) ? " and enabled" : "" );
 
+        printk("  Interrupt Posting: %ssupported.\n",
+            cap_intr_post(iommu->ecap) ? "" : "not ");
+
         if ( status & DMA_GSTS_IRES )
         {
             /* Dump interrupt remapping table. */
@@ -213,6 +216,7 @@ static void dump_iommu_info(unsigned char key)
 
             printk("  Interrupt remapping table (nr_entry=%#x. "
                 "Only dump P=1 entries here):\n", nr_entry);
+            printk ("Entries for remapped format:\n");
             printk("       SVT  SQ   SID      DST  V  AVL DLM TM RH DM "
                    "FPD P\n");
             for ( i = 0; i < nr_entry; i++ )
@@ -230,7 +234,7 @@ static void dump_iommu_info(unsigned char key)
                 else
                     p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
 
-                if ( !p->remap.p )
+                if ( !p->remap.p || p->remap.im )
                     continue;
                 printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
                     "   %x %x\n", i,
@@ -239,6 +243,41 @@ static void dump_iommu_info(unsigned char key)
                     p->remap.rh, p->remap.dm, p->remap.fpd, p->remap.p);
                 print_cnt++;
             }
+
+            if ( iremap_entries )
+                unmap_vtd_domain_page(iremap_entries);
+
+            iremap_entries = NULL;
+            printk ("\nEntries for posted format:\n");
+            printk("       SVT  SQ   SID              PDA  V  URG AVL FPD P\n");
+            for ( i = 0; i < nr_entry; i++ )
+            {
+                struct iremap_entry *p;
+                if ( i % (1 << IREMAP_ENTRY_ORDER) == 0 )
+                {
+                    /* This entry across page boundry */
+                    if ( iremap_entries )
+                        unmap_vtd_domain_page(iremap_entries);
+
+                    GET_IREMAP_ENTRY(iremap_maddr, i,
+                                     iremap_entries, p);
+                }
+                else
+                    p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
+
+                if ( !p->post.p || !p->post.im )
+                    continue;
+
+                printk("  %04x:  %x   %x  %04x %16lx %02x    %x   %x  %x  %x\n",
+                    i,
+                    p->post.svt, p->post.sq, p->post.sid,
+                    ((u64)p->post.pda_h << 32) | (p->post.pda_l << 6),
+                    p->post.vector, p->post.urg, p->post.avail, p->post.fpd,
+                    p->post.p);
+
+                print_cnt++;
+            }
+
             if ( iremap_entries )
                 unmap_vtd_domain_page(iremap_entries);
             if ( iommu_ir_ctrl(iommu)->iremap_num != print_cnt )
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v5 17/17] Add a command line parameter for VT-d posted-interrupts
  2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
                   ` (15 preceding siblings ...)
  2015-08-12  2:35 ` [PATCH v5 16/17] VT-d: Dump the posted format IRTE Feng Wu
@ 2015-08-12  2:35 ` Feng Wu
  2015-08-12 17:16   ` Konrad Rzeszutek Wilk
  16 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-08-12  2:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu

Enable VT-d Posted-Interrupts and add a command line
parameter for it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 docs/misc/xen-command-line.markdown | 9 ++++++++-
 xen/drivers/passthrough/iommu.c     | 3 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 204e7a4..d83a292 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -855,7 +855,7 @@ debug hypervisor only).
 > Default: `new` unless directed-EOI is supported
 
 ### iommu
-> `= List of [ <boolean> | force | required | intremap | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | verbose | debug ]`
+> `= List of [ <boolean> | force | required | intremap | intpost | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | verbose | debug ]`
 
 > Sub-options:
 
@@ -882,6 +882,13 @@ debug hypervisor only).
 >> Control the use of interrupt remapping (DMA remapping will always be enabled
 >> if IOMMU functionality is enabled).
 
+> `intpost`
+
+> Default: `true`
+
+>> Control the use of interrupt posting, which depends on the availability of
+>> interrupt remapping.
+
 > `qinval` (VT-d)
 
 > Default: `true`
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 8eb77f7..84b1e43 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -38,6 +38,7 @@ static void iommu_dump_p2m_table(unsigned char key);
  *   no-snoop                   Disable VT-d Snoop Control
  *   no-qinval                  Disable VT-d Queued Invalidation
  *   no-intremap                Disable VT-d Interrupt Remapping
+ *   no-intpost                 Disable VT-d Interrupt posting
  */
 custom_param("iommu", parse_iommu_param);
 bool_t __initdata iommu_enable = 1;
@@ -102,6 +103,8 @@ static void __init parse_iommu_param(char *s)
             iommu_qinval = val;
         else if ( !strcmp(s, "intremap") )
             iommu_intremap = val;
+        else if ( !strcmp(s, "intpost") )
+            iommu_intpost = val;
         else if ( !strcmp(s, "debug") )
         {
             iommu_debug = val;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design
  2015-08-12  2:35 ` [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
@ 2015-08-12 15:35   ` Konrad Rzeszutek Wilk
  2015-08-12 17:27     ` Andrew Cooper
  2015-08-13  1:37     ` Wu, Feng
  0 siblings, 2 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 15:35 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper, xen-devel,
	Jan Beulich, Yang Zhang

On Wed, Aug 12, 2015 at 10:35:22AM +0800, Feng Wu wrote:

The title has an extra 'i'.

> Add the design doc for VT-d PI.
> 
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  docs/misc/vtd-pi.txt | 333 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 333 insertions(+)
>  create mode 100644 docs/misc/vtd-pi.txt
> 
> diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
> new file mode 100644
> index 0000000..98a77ba
> --- /dev/null
> +++ b/docs/misc/vtd-pi.txt
> @@ -0,0 +1,333 @@
> +Authors: Feng Wu <feng.wu@intel.com>
> +
> +VT-d Posted-interrupt (PI) design for XEN
> +
> +Background
> +==========
> +With the development of virtualization, there are more and more device
> +assignment requirements. However, today when a VM is running with
> +assigned devices (such as, NIC), external interrupt handling for the assigned
> +devices always needs VMM intervention.
> +
> +VT-d Posted-interrupt is a more enhanced method to handle interrupts
> +in the virtualization environment. Interrupt posting is the process by
> +which an interrupt request is recorded in a memory-resident
> +posted-interrupt-descriptor structure by the root-complex, followed by
> +an optional notification event issued to the CPU complex.
> +
> +With VT-d Posted-interrupt we can get the following advantages:
> +- Direct delivery of external interrupts to running vCPUs without VMM
> +intervention
> +- Decrease the interrupt migration complexity. On vCPU migration, software
> +can atomically co-migrate all interrupts targeting the migrating vCPU. For
> +virtual machines with assigned devices, migrating a vCPU across pCPUs
> +either incur the overhead of forwarding interrupts in software (e.g. via VMM

s/incur/incurs/
> +generated IPIS), or complexity to independently migrate each interrupt targeting

s/IPIS/IPIs
> +the vCPU to the new pCPU. However, after enabling VT-d PI, the destination vCPU
> +of an external interrupt from assigned devices is stored in the IRTE (i.e.
> +Posted-interrupt Descriptor Address), when vCPU is migrated to another pCPU,
> +we will set this new pCPU in the 'NDST' filed of Posted-interrupt descriptor, this
> +make the interrupt migration automatic.
> +
> +Here is what Xen currently does for external interrupts from assigned devices:
> +
> +When a VM is running and an external interrupt from an assigned device occurs
> +for it. VM-EXIT happens, then:
> +
> +vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci() -->
> +raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
> +
> +softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
> +
> +dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() --> vmsi_deliver() -->
> +vmsi_inj_irq() --> vlapic_set_irq()
> +
> +vlapic_set_irq() does the following things:
> +1. If CPU-side posted-interrupt is supported, call vmx_deliver_posted_intr() to deliver
> +the virtual interrupt via posted-interrupt infrastructure.
> +2. Else if CPU-side posted-interrupt is not supported, set the related vIRR in vLAPIC
> +page and call vcpu_kick() to kick the related vCPU. Before VM-Entry, vmx_intr_assist()
> +will help to inject the interrupt to guests.
> +
> +However, after VT-d PI is supported, when a guest is running in non-root and an
> +external interrupt from an assigned device occurs for it. No VM-Exit is needed,
> +the guest can handle this totally in non-root mode, thus avoiding all the above
> +code flow.
> +
> +Posted-interrupt Introduction
> +========================
> +There are two components to the Posted-interrupt architecture:
> +Processor Support and Root-Complex Support
> +
> +- Processor Support
> +Posted-interrupt processing is a feature by which a processor processes
> +the virtual interrupts by recording them as pending on the virtual-APIC
> +page.
> +
> +Posted-interrupt processing is enabled by setting the process posted
> +interrupts VM-execution control. The processing is performed in response
> +to the arrival of an interrupt with the posted-interrupt notification vector.
> +In response to such an interrupt, the processor processes virtual interrupts
> +recorded in a data structure called a posted-interrupt descriptor.
> +
> +More information about APICv and CPU-side Posted-interrupt, please refer
> +to Chapter 29, and Section 29.6 in the Intel SDM:
> +http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
> +
> +- Root-Complex Support
> +Interrupt posting is the process by which an interrupt request (from IOAPIC
> +or MSI/MSIx capable sources) is recorded in a memory-resident
> +posted-interrupt-descriptor structure by the root-complex, followed by
> +an optional notification event issued to the CPU complex. The interrupt
> +request arriving at the root-complex carry the identity of the interrupt
> +request source and a 'remapping-index'. The remapping-index is used to
> +look-up an entry from the memory-resident interrupt-remap-table. Unlike
> +with interrupt-remapping, the interrupt-remap-table-entry for a posted-

s/with//
> +interrupt, specifies a virtual-vector and a pointer to the posted-interrupt
> +descriptor. The virtual-vector specifies the vector of the interrupt to be
> +recorded in the posted-interrupt descriptor. The posted-interrupt descriptor
> +hosts storage for the virtual-vectors and contains the attributes of the
> +notification event (interrupt) to be issued to the CPU complex to inform
> +CPU/software about pending interrupts recorded in the posted-interrupt
> +descriptor.
> +
> +More information about VT-d PI, please refer to
> +http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> +
> +Important Definitions
> +==================
> +There are some changes to IRTE and posted-interrupt descriptor after
> +VT-d PI is introduced:
> +IRTE: Interrupt Remapping Table Entry
> +Posted-interrupt Descriptor Address: the address of the posted-interrupt descriptor
> +Virtual Vector: the guest vector of the interrupt
> +URG: indicates if the interrupt is urgent
> +
> +Posted-interrupt descriptor:
> +The Posted Interrupt Descriptor hosts the following fields:
> +Posted Interrupt Request (PIR): Provide storage for posting (recording) interrupts (one bit
> +per vector, for up to 256 vectors).
> +
> +Outstanding Notification (ON): Indicate if there is a notification event outstanding (not
> +processed by processor or software) for this Posted Interrupt Descriptor. When this field is 0,
> +hardware modifies it from 0 to 1 when generating a notification event, and the entity receiving
> +the notification event (processor or software) resets it as part of posted interrupt processing.
> +
> +Suppress Notification (SN): Indicate if a notification event is to be suppressed (not
> +generated) for non-urgent interrupt requests (interrupts processed through an IRTE with
> +URG=0).
> +
> +Notification Vector (NV): Specify the vector for notification event (interrupt).
> +
> +Notification Destination (NDST): Specify the physical APIC-ID of the destination logical
> +processor for the notification event.
> +
> +Design Overview
> +==============
> +In this design, we will cover the following items:
> +1. Add a variable to control whether enable VT-d posted-interrupt or not.
> +2. VT-d PI feature detection.
> +3. Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.

s/stuff/items/

But that really is up to you :-)

> +4. Extend IRTE structure to support VT-d PI.
> +5. Introduce a new global vector which is used for waking up the blocked vCPU.
> +6. Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
> +7. Update posted-interrupt descriptor during vCPU scheduling (when the state
> +of the vCPU is transmitted among RUNSTATE_running / RUNSTATE_blocked/
> +RUNSTATE_runnable / RUNSTATE_offline).
> +8. How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
> +9. New boot command line for Xen, which controls VT-d PI feature by user.
> +10. Multicast/broadcast and lowest priority interrupts consideration.
> +
> +
> +Implementation details
> +===================
> +- New variable to control VT-d PI
> +
> +Like variable 'iommu_intremap' for interrupt remapping, it is very straightforward
> +to add a new one 'iommu_intpost' for posted-interrupt. 'iommu_intpost' is set
> +only when interrupt remapping and VT-d posted-interrupt are both enabled.
> +
> +- VT-d PI feature detection.
> +Bit 59 in VT-d Capability Register is used to report VT-d Posted-interrupt support.
> +
> +- Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.
> +Here is the new structure for posted-interrupt descriptor:
> +
> +struct pi_desc {
> +    DECLARE_BITMAP(pir, NR_VECTORS);
> +    union {
> +        struct
> +        {
> +        u16 on     : 1,  /* bit 256 - Outstanding Notification */
> +            sn     : 1,  /* bit 257 - Suppress Notification */
> +            rsvd_1 : 14; /* bit 271:258 - Reserved */
> +        u8  nv;          /* bit 279:272 - Notification Vector */
> +        u8  rsvd_2;      /* bit 287:280 - Reserved */
> +        u32 ndst;        /* bit 319:288 - Notification Destination */
> +        };
> +        u64 control;
> +    };
> +    u32 rsvd[6];
> +} __attribute__ ((aligned (64)));
> +
> +- Extend IRTE structure to support VT-d PI.
> +
> +Here is the new structure for IRTE:
> +/* interrupt remap entry */
> +struct iremap_entry {
> +  union {
> +    struct { u64 lo, hi; };
> +    struct {
> +        u16 p       : 1,
> +            fpd     : 1,
> +            dm      : 1,
> +            rh      : 1,
> +            tm      : 1,
> +            dlm     : 3,
> +            avail   : 4,
> +            res_1   : 4;
> +        u8  vector;
> +        u8  res_2;
> +        u32 dst;
> +        u16 sid;
> +        u16 sq      : 2,
> +            svt     : 2,
> +            res_3   : 12;
> +        u32 res_4   : 32;
> +    } remap;
> +    struct {
> +        u16 p       : 1,
> +            fpd     : 1,
> +            res_1   : 6,
> +            avail   : 4,
> +            res_2   : 2,
> +            urg     : 1,
> +            im      : 1;
> +        u8  vector;
> +        u8  res_3;
> +        u32 res_4   : 6,
> +            pda_l   : 26;
> +        u16 sid;
> +        u16 sq      : 2,
> +            svt     : 2,
> +            res_5   : 12;
> +        u32 pda_h;
> +    } post;
> +  };
> +};
> +
> +- Introduce a new global vector which is used to wake up the blocked vCPU.
> +
> +Currently, there is a global vector 'posted_intr_vector', which is used as the
> +global notification vector for all vCPUs in the system. This vector is stored in
> +VMCS and CPU considers it as a _special_ vector, uses it to notify the related
> +pCPU when an interrupt is recorded in the posted-interrupt descriptor.
> +
> +This existing global vector is a _special_ vector to CPU, CPU handle it in a
> +_special_ way compared to normal vectors, please refer to 29.6 in Intel SDM
> +http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
> +for more information about how CPU handles it.
> +
> +After having VT-d PI, VT-d engine can issue notification event when the
> +assigned devices issue interrupts. We need add a new global vector to
> +wakeup the blocked vCPU, please refer to later section in this design for
> +how to use this new global vector.
> +
> +- Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
> +After VT-d PI is introduced, the format of IRTE is changed as follows:
> +	Descriptor Address: the address of the posted-interrupt descriptor
> +	Virtual Vector: the guest vector of the interrupt
> +	URG: indicates if the interrupt is urgent
> +	Other fields continue to have the same meaning
> +
> +'Descriptor Address' tells the destination vCPU of this interrupt, since
> +each vCPU has a dedicated posted-interrupt descriptor.
> +
> +'Virtual Vector' tells the guest vector of the interrupt.
> +
> +When guest changes the configuration of the interrupts, such as, the
> +cpu affinity, or the vector, we need to update the associated IRTE accordingly.
> +
> +- Update posted-interrupt descriptor during vCPU scheduling
> +
> +The basic idea here is:
> +1. When vCPU's state is RUNSTATE_running,
> +        - Set 'NV' to 'posted_intr_vector'.
> +        - Clear 'SN' to accept posted-interrupts.
> +        - Set 'NDST' to the pCPU on which the vCPU will be running.
> +2. When vCPU's state is RUNSTATE_blocked,
> +        - Set 'NV' to ' pi_wakeup_vector ', so we can wake up the
> +          related vCPU when posted-interrupt happens for it.
> +          Please refer to the above section about the new global vector.
> +        - Clear 'SN' to accept posted-interrupts
> +3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> +        - Set 'SN' to suppress non-urgent interrupts
> +          (Current, we only support non-urgent interrupts)

s/Current/Currently/

or s/Current/Right now/


> +         When vCPU is in RUNSTATE_runnable or RUNSTATE_offline,

s/,//
> +         It is not needed to accept posted-interrupt notification event,

s/,//

> +         since we don't change the behavior of scheduler when the interrupt
> +         occurs, we still need wait the next scheduling of the vCPU.

s/wait the next/wait for the next/
> +         When external interrupts from assigned devices occur, the interrupts
> +         are recorded in PIR, and will be synced to IRR before VM-Entry.
> +        - Set 'NV' to 'posted_intr_vector'.
> +
> +- How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
> +
> +Here is the scenario for the usage of the new global vector:
> +
> +1. vCPU0 is running on pCPU0
> +2. vCPU0 is blocked and vCPU1 is currently running on pCPU0
> +3. An external interrupt from an assigned device occurs for vCPU0, if we
> +still use 'posted_intr_vector' as the notification vector for vCPU0, the
> +notification event for vCPU0 (the event will go to pCPU1) will be consumed
> +by vCPU1 incorrectly (remember this is a special vector to CPU). The worst
> +case is that vCPU0 will never be woken up again since the wakeup event
> +for it is always consumed by other vCPUs incorrectly. So we need introduce
> +another global vector, naming 'pi_wakeup_vector' to wake up the blocked vCPU.
> +
> +After using 'pi_wakeup_vector' for vCPU0, VT-d engine will issue notification
> +event using this new vector. Since this new vector is not a SPECIAL one to CPU,
> +it is just a normal vector. To cpu, it just receives an normal external interrupt,

s/cpu/CPU/
> +then we can get control in the handler of this new vector. In this case, hypervisor
> +can do something in it, such as wakeup the blocked vCPU.
> +
> +Here are what we do for the blocked vCPU:
> +1. Define a per-cpu list 'pi_blocked_vcpu', which stored the blocked
> +vCPU on the pCPU.
> +2. When the vCPU's state is changed to RUNSTATE_blocked, insert the vCPU
> +to the per-cpu list belonging to the pCPU it was running.
> +3. When the vCPU is unblocked, remove the vCPU from the related pCPU list.
> +
> +In the handler of 'pi_wakeup_vector', we do:
> +1. Get the physical CPU.
> +2. Iterate the list 'pi_blocked_vcpu' of the current pCPU, if 'ON' is set,
> +we unblock the associated vCPU.
> +
> +- New boot command line for Xen, which controls VT-d PI feature by user.
> +
> +Like 'intremap' for interrupt remapping, we add a new boot command line
> +'intpost' for posted-interrupts.
> +
> +- Multicast/broadcast and lowest priority interrupts consideration.
> +
> +With VT-d PI, the destination vCPU information of an external interrupt
> +from assigned devices is stored in IRTE, this makes the following
> +consideration of the design:
> +1. Multicast/broadcast interrupts cannot be posted.
> +2. For lowest-priority interrupts, new Intel CPU/Chipset/root-complex
> +(starting from Nehalem) ignore TPR value, and instead supported two other
> +ways (configurable by BIOS) on how the handle lowest priority interrupts:
> +	A) Round robin: In this method, the chipset simply delivers lowest priority
> +interrupts in a round-robin manner across all the available logical CPUs. While
> +this provides good load balancing, this was not the best thing to do always as
> +interrupts from the same device (like NIC) will start running on all the CPUs
> +thrashing caches and taking locks. This led to the next scheme.
> +	B) Vector hashing: In this method, hardware would apply a hash function
> +on the vector value in the interrupt request, and use that hash to pick a logical
> +CPU to route the lowest priority interrupt. This way, a given vector always goes
> +to the same logical CPU, avoiding the thrashing problem above.
> +
> +So, gist of above is that, lowest priority interrupts has never been delivered as
> +"lowest priority" in physical hardware.
> +
> +Vector hashing is used in this design.

And with those tiny little changes:

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 02/17] Add cmpxchg16b support for x86-64
  2015-08-12  2:35 ` [PATCH v5 02/17] Add cmpxchg16b support for x86-64 Feng Wu
@ 2015-08-12 15:38   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 15:38 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

On Wed, Aug 12, 2015 at 10:35:23AM +0800, Feng Wu wrote:
> This patch adds cmpxchg16b support for x86-64, so software
> can perform 128-bit atomic write/read.
> 
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v5:
> - Change back the parameters of __cmpxchg16b() to __uint128_t *
> - Remove pointless cast for 'ptr'
> - Remove pointless parentheses
> - Use A constraint for the output
> 
> v4:
> - Use pointer as the parameter of __cmpxchg16b().
> - Use gcc's __uint128_t built-in type
> - Make the parameters of __cmpxchg16b() void *
> 
> v3:
> - Newly added.
> 
>  xen/include/asm-x86/x86_64/system.h | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
> index 662813a..ebbe4b5 100644
> --- a/xen/include/asm-x86/x86_64/system.h
> +++ b/xen/include/asm-x86/x86_64/system.h
> @@ -6,6 +6,34 @@
>                                     (unsigned long)(n),sizeof(*(ptr))))
>  
>  /*
> + * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
> + * identical, store NEW in MEM.  Return the initial value in MEM.
> + * Success is indicated by comparing RETURN with OLD.
> + *
> + * This function can only be called when cpu_has_cx16 is ture.

s/ture/true/

> + */
> +
> +static always_inline __uint128_t __cmpxchg16b(
> +    volatile void *ptr, __uint128_t *old, __uint128_t *new)
> +{
> +    __uint128_t prev;
> +    uint64_t new_high = *new >> 64;
> +    uint64_t new_low = (uint64_t)*new;
> +
> +    ASSERT(cpu_has_cx16);
> +
> +    asm volatile ( "lock; cmpxchg16b %3"
> +                   : "=A" (prev)
> +                   : "c" (new_high), "b" (new_low), "m" (*__xg(ptr)), "0" (*old)
> +                   : "memory" );
> +
> +    return prev;
> +}
> +
> +#define cmpxchg16b(ptr,o,n)                                             \
> +    __cmpxchg16b((ptr), (__uint128_t *)(o), (__uint128_t *)(n))
> +
> +/*
>   * This function causes value _o to be changed to _n at location _p.
>   * If this access causes a fault then we return 1, otherwise we return 0.
>   * If no fault occurs then _o is updated to the value we saw at _p. If this
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature
  2015-08-12  2:35 ` [PATCH v5 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
@ 2015-08-12 15:39   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 15:39 UTC (permalink / raw)
  To: Feng Wu; +Cc: Kevin Tian, Jan Beulich, xen-devel

On Wed, Aug 12, 2015 at 10:35:24AM +0800, Feng Wu wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> This patch adds variable 'iommu_intpost' to control whether enable VT-d
> posted-interrupt or not in the generic IOMMU code.
> 
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v5:
> - Remove the "if no intremap then no intpost" logic in parse_iommu_param(), which
>   can be covered in iommu_setup()
> 
> v3:
> - Remove pointless initializer for 'iommu_intpost'.
> - Some adjustment for "if no intremap then no intpost" logic.
>     * For parse_iommu_param(), move it to the end of the function,
>       so we don't need to add the some logic when introduing the
>       new kernel parameter 'intpost' in later patch.
>     * Add this logic in iommu_setup() after iommu_hardware_setup()
>       is called.
> 
>  xen/drivers/passthrough/iommu.c | 13 ++++++++++++-
>  xen/include/xen/iommu.h         |  2 +-
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 11d58cc..8eb77f7 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -50,6 +50,14 @@ bool_t __read_mostly iommu_passthrough;
>  bool_t __read_mostly iommu_snoop = 1;
>  bool_t __read_mostly iommu_qinval = 1;
>  bool_t __read_mostly iommu_intremap = 1;
> +
> +/*
> + * In the current implementation of VT-d posted interrupts, in some extreme
> + * cases, the per cpu list which saves the blocked vCPU will be very long,
> + * and this will affect the interrupt latency, so let this feature off by
> + * default until we find a good solution to resolve it.
> + */
> +bool_t __read_mostly iommu_intpost;
>  bool_t __read_mostly iommu_hap_pt_share = 1;
>  bool_t __read_mostly iommu_debug;
>  bool_t __read_mostly amd_iommu_perdev_intremap = 1;
> @@ -304,6 +312,9 @@ int __init iommu_setup(void)
>          panic("Couldn't enable %s and iommu=required/force",
>                !iommu_enabled ? "IOMMU" : "Interrupt Remapping");
>  
> +    if ( !iommu_intremap )
> +        iommu_intpost = 0;
> +
>      if ( !iommu_enabled )
>      {
>          iommu_snoop = 0;
> @@ -371,7 +382,7 @@ void iommu_crash_shutdown(void)
>      const struct iommu_ops *ops = iommu_get_ops();
>      if ( iommu_enabled )
>          ops->crash_shutdown();
> -    iommu_enabled = iommu_intremap = 0;
> +    iommu_enabled = iommu_intremap = iommu_intpost = 0;
>  }
>  
>  int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index 705969b..da326a1 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -30,7 +30,7 @@
>  extern bool_t iommu_enable, iommu_enabled;
>  extern bool_t force_iommu, iommu_verbose;
>  extern bool_t iommu_workaround_bios_bug, iommu_passthrough;
> -extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
> +extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
>  extern bool_t iommu_hap_pt_share;
>  extern bool_t iommu_debug;
>  extern bool_t amd_iommu_perdev_intremap;
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 04/17] vt-d: VT-d Posted-Interrupts feature detection
  2015-08-12  2:35 ` [PATCH v5 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
@ 2015-08-12 15:40   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 15:40 UTC (permalink / raw)
  To: Feng Wu; +Cc: Yang Zhang, Kevin Tian, xen-devel

On Wed, Aug 12, 2015 at 10:35:25AM +0800, Feng Wu wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v5:
> - Remove blank line
> 
> v4:
> - Correct a logic error when setting iommu_intpost to 0
> 
> v3:
> - Remove the "if no intremap then no intpost" logic in
>   intel_vtd_setup(), it is covered in the iommu_setup().
> - Add "if no intremap then no intpost" logic in the end
>   of init_vtd_hw() which is called by vtd_resume().
> 
> So the logic exists in the following three places:
> - parse_iommu_param()
> - iommu_setup()
> - init_vtd_hw()
> 
>  xen/drivers/passthrough/vtd/iommu.c | 17 +++++++++++++++--
>  xen/drivers/passthrough/vtd/iommu.h |  1 +
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index 1dffc40..52c7cc9 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2079,6 +2079,9 @@ static int init_vtd_hw(void)
>                  disable_intremap(drhd->iommu);
>      }
>  
> +    if ( !iommu_intremap )
> +        iommu_intpost = 0;
> +
>      /*
>       * Set root entries for each VT-d engine.  After set root entry,
>       * must globally invalidate context cache, and then globally
> @@ -2147,8 +2150,8 @@ int __init intel_vtd_setup(void)
>      }
>  
>      /* We enable the following features only if they are supported by all VT-d
> -     * engines: Snoop Control, DMA passthrough, Queued Invalidation and
> -     * Interrupt Remapping.
> +     * engines: Snoop Control, DMA passthrough, Queued Invalidation, Interrupt
> +     * Remapping, and Posted Interrupt
>       */
>      for_each_drhd_unit ( drhd )
>      {
> @@ -2176,6 +2179,14 @@ int __init intel_vtd_setup(void)
>          if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
>              iommu_intremap = 0;
>  
> +        /*
> +         * We cannot use posted interrupt if X86_FEATURE_CX16 is
> +         * not supported, since we count on this feature to
> +         * atomically update 16-byte IRTE in posted format.
> +         */
> +        if ( !iommu_intremap || !cap_intr_post(iommu->cap) || !cpu_has_cx16 )
> +            iommu_intpost = 0;
> +
>          if ( !vtd_ept_page_compatible(iommu) )
>              iommu_hap_pt_share = 0;
>  
> @@ -2201,6 +2212,7 @@ int __init intel_vtd_setup(void)
>      P(iommu_passthrough, "Dom0 DMA Passthrough");
>      P(iommu_qinval, "Queued Invalidation");
>      P(iommu_intremap, "Interrupt Remapping");
> +    P(iommu_intpost, "Posted Interrupt");
>      P(iommu_hap_pt_share, "Shared EPT tables");
>  #undef P
>  
> @@ -2220,6 +2232,7 @@ int __init intel_vtd_setup(void)
>      iommu_passthrough = 0;
>      iommu_qinval = 0;
>      iommu_intremap = 0;
> +    iommu_intpost = 0;
>      return ret;
>  }
>  
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index ac71ed1..22abefe 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -61,6 +61,7 @@
>  /*
>   * Decoding Capability Register
>   */
> +#define cap_intr_post(c)       (((c) >> 59) & 1)
>  #define cap_read_drain(c)      (((c) >> 55) & 1)
>  #define cap_write_drain(c)     (((c) >> 54) & 1)
>  #define cap_max_amask_val(c)   (((c) >> 48) & 0x3f)
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts
  2015-08-12  2:35 ` [PATCH v5 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
@ 2015-08-12 15:45   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 15:45 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Kevin Tian, Keir Fraser, Jan Beulich, xen-devel

On Wed, Aug 12, 2015 at 10:35:26AM +0800, Feng Wu wrote:
> Extend struct pi_desc according to VT-d Posted-Interrupts Spec.
> 
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v3:
> - Use u32 instead of u64 for the bitfield in 'struct pi_desc'
> 
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index f1126d4..7e81752 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -80,8 +80,19 @@ struct vmx_domain {
>  
>  struct pi_desc {
>      DECLARE_BITMAP(pir, NR_VECTORS);
> -    u32 control;
> -    u32 rsvd[7];
> +    union {
> +        struct
> +        {
> +        u16 on     : 1,  /* bit 256 - Outstanding Notification */
> +            sn     : 1,  /* bit 257 - Suppress Notification */
> +            rsvd_1 : 14; /* bit 271:258 - Reserved */
> +        u8  nv;          /* bit 279:272 - Notification Vector */
> +        u8  rsvd_2;      /* bit 287:280 - Reserved */
> +        u32 ndst;        /* bit 319:288 - Notification Destination */
> +        };
> +        u64 control;
> +    };
> +    u32 rsvd[6];
>  } __attribute__ ((aligned (64)));
>  
>  #define ept_get_wl(ept)   ((ept)->ept_wl)
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 06/17] vmx: Add some helper functions for Posted-Interrupts
  2015-08-12  2:35 ` [PATCH v5 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
@ 2015-08-12 15:46   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 15:46 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Kevin Tian, Keir Fraser, Jan Beulich, xen-devel

On Wed, Aug 12, 2015 at 10:35:27AM +0800, Feng Wu wrote:
> This patch adds some helper functions to manipulate the
> Posted-Interrupts Descriptor.
> 
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v4:
> - Newly added
> 
>  xen/include/asm-x86/hvm/vmx/vmx.h | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 3fbfa44..acd4aec 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -101,6 +101,7 @@ void vmx_update_cpu_exec_control(struct vcpu *v);
>  void vmx_update_secondary_exec_control(struct vcpu *v);
>  
>  #define POSTED_INTR_ON  0
> +#define POSTED_INTR_SN  1
>  static inline int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
>  {
>      return test_and_set_bit(vector, pi_desc->pir);
> @@ -121,11 +122,31 @@ static inline int pi_test_and_clear_on(struct pi_desc *pi_desc)
>      return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
>  }
>  
> +static inline int pi_test_on(struct pi_desc *pi_desc)
> +{
> +    return test_bit(POSTED_INTR_ON, &pi_desc->control);
> +}
> +
>  static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
>  {
>      return xchg(&pi_desc->pir[group], 0);
>  }
>  
> +static inline int pi_test_sn(struct pi_desc *pi_desc)
> +{
> +    return test_bit(POSTED_INTR_SN, &pi_desc->control);
> +}
> +
> +static inline void pi_set_sn(struct pi_desc *pi_desc)
> +{
> +    set_bit(POSTED_INTR_SN, &pi_desc->control);
> +}
> +
> +static inline void pi_clear_sn(struct pi_desc *pi_desc)
> +{
> +    clear_bit(POSTED_INTR_SN, &pi_desc->control);
> +}
> +
>  /*
>   * Exit Reasons
>   */
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-08-12  2:35 ` [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
@ 2015-08-12 16:23   ` Konrad Rzeszutek Wilk
  2015-08-13  8:33     ` Jan Beulich
  2015-08-12 17:41   ` Andrew Cooper
  1 sibling, 1 reply; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 16:23 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, Andrew Cooper, xen-devel, Jan Beulich,
	Yang Zhang

On Wed, Aug 12, 2015 at 10:35:32AM +0800, Feng Wu wrote:
> This patch adds an API which is used to update the IRTE
> for posted-interrupt when guest changes MSI/MSI-X information.
> 
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
> ---
> v5:
> - Make some function parameters const
> - Call "spin_unlock_irq(&desc->lock);" a little eariler
> - Add "ASSERT(spin_is_locked(&pcidevs_lock))"
> - -EBADSLT -> -ENODEV, EBADSLT is removed in the lasted Xen
> 
> v4:
> - Don't inline setup_posted_irte()
> - const struct pi_desc *pi_desc for setup_posted_irte()
> - return -EINVAL when pirq_spin_lock_irq_desc() fails.
> - Make some variables const
> - Release irq desc lock earlier in pi_update_irte()
> - Remove the pointless do-while() loop when doing cmpxchg16b()
> 
> v3:
> - Remove "adding PDA_MASK()" when updating 'pda_l' and 'pda_h' for IRTE.
> - Change the return type of pi_update_irte() to int.
> - Remove some pointless printk message in pi_update_irte().
> - Use structure assignment instead of memcpy() for irte copy.
> 
>  xen/drivers/passthrough/vtd/intremap.c | 112 +++++++++++++++++++++++++++++++++
>  xen/drivers/passthrough/vtd/iommu.h    |   2 +
>  xen/include/asm-x86/iommu.h            |   2 +
>  3 files changed, 116 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
> index e9fffa6..8ec85d3 100644
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -899,3 +899,115 @@ void iommu_disable_x2apic_IR(void)
>      for_each_drhd_unit ( drhd )
>          disable_qinval(drhd->iommu);
>  }
> +
> +static void setup_posted_irte(

space before '{]'

> +    struct iremap_entry *new_ire,
> +    const struct pi_desc *pi_desc,
> +    const uint8_t gvec)

Not sure why but your parameters are on their own line? Shouldnt at least one
be with the function name - and then the rest on the same offset?

> +{
> +    new_ire->post.urg = 0;
> +    new_ire->post.vector = gvec;
> +    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
> +    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
> +
> +    new_ire->post.res_1 = 0;
> +    new_ire->post.res_2 = 0;
> +    new_ire->post.res_3 = 0;
> +    new_ire->post.res_4 = 0;
> +    new_ire->post.res_5 = 0;
> +
> +    new_ire->post.im = 1;
> +}
> +
> +/*
> + * This function is used to update the IRTE for posted-interrupt
> + * when guest changes MSI/MSI-X information.
> + */
> +int pi_update_irte(
> +    const struct vcpu *v,
> +    const struct pirq *pirq,
> +    const uint8_t gvec)

Ditto - perhaps move these to be right after the function name?

> +{
> +    struct irq_desc *desc;
> +    const struct msi_desc *msi_desc;
> +    int remap_index;
> +    int rc = 0;
> +    const struct pci_dev *pci_dev;
> +    const struct acpi_drhd_unit *drhd;
> +    struct iommu *iommu;
> +    struct ir_ctrl *ir_ctrl;
> +    struct iremap_entry *iremap_entries = NULL, *p = NULL;
> +    struct iremap_entry new_ire, old_ire;
> +    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    __uint128_t ret;
> +
> +    desc = pirq_spin_lock_irq_desc(pirq, NULL);
> +    if ( !desc )
> +        return -EINVAL;
> +
> +    msi_desc = desc->msi_desc;
> +    if ( !msi_desc )
> +    {
> +        rc = -ENODEV;
> +        goto unlock_out;
> +    }
> +
> +    pci_dev = msi_desc->dev;
> +    if ( !pci_dev )
> +    {
> +        rc = -ENODEV;
> +        goto unlock_out;
> +    }
> +
> +    remap_index = msi_desc->remap_index;
> +
> +    spin_unlock_irq(&desc->lock);
> +
> +    ASSERT(spin_is_locked(&pcidevs_lock));
> +
> +    /*
> +     * For performance concern, we will store the 'iommu' pointer in
> +     * 'struct msi_desc' in some other place, so we don't need to waste
> +     * time searching it here. I will fix this later.
> +     */
> +    drhd = acpi_find_matched_drhd_unit(pci_dev);
> +    if ( !drhd )
> +    {
> +        rc = -ENODEV;
> +        goto unlock_out;

But you already unlocked desc->lock (so doing unlock_out would do a second
spin unlock)? Should this be an return -ENODEV?

> +    }
> +
> +    iommu = drhd->iommu;
> +    ir_ctrl = iommu_ir_ctrl(iommu);
> +    if ( !ir_ctrl )
> +    {
> +        rc = -ENODEV;
> +        goto unlock_out;

Ditto.

> +    }
> +
> +    spin_lock_irq(&ir_ctrl->iremap_lock);
> +
> +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
> +
> +    old_ire = new_ire = *p;
> +
> +    /* Setup/Update interrupt remapping table entry. */
> +    setup_posted_irte(&new_ire, pi_desc, gvec);
> +    ret = cmpxchg16b(p, &old_ire, &new_ire);
> +
> +    ASSERT(ret == *(__uint128_t *)&old_ire);
> +
> +    iommu_flush_cache_entry(p, sizeof(*p));

The other use sites of iommu_flush_cache_entry are sizeof(struct iremap_entry).
I think if you are doing this modification - then you should also have an
patch to modify the other code to be in sync.

Or just use the sizeof(struct ...).


> +    iommu_flush_iec_index(iommu, 0, remap_index);
> +
> +    unmap_vtd_domain_page(iremap_entries);
> +
> +    spin_unlock_irq(&ir_ctrl->iremap_lock);
> +
> +    return 0;
> +
> + unlock_out:
> +    spin_unlock_irq(&desc->lock);
> +
> +    return rc;
> +}
> diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
> index 6fca430..ff4ceb6 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -322,6 +322,8 @@ struct iremap_entry {
>    };
>  };
> 

Perhaps a comment about it?
 
> +#define PDA_LOW_BIT    26
> +
>  /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
>  #define IREMAP_PAGE_ORDER  8
>  
> diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> index 29203d7..92f0900 100644
> --- a/xen/include/asm-x86/iommu.h
> +++ b/xen/include/asm-x86/iommu.h
> @@ -31,6 +31,8 @@ int iommu_supports_eim(void);
>  int iommu_enable_x2apic_IR(void);
>  void iommu_disable_x2apic_IR(void);
>  
> +int pi_update_irte(const struct vcpu *v, const struct pirq *pirq, const uint8_t gvec);
> +
>  #endif /* !__ARCH_X86_IOMMU_H__ */
>  /*
>   * Local variables:
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 12/17] Update IRTE according to guest interrupt config changes
  2015-08-12  2:35 ` [PATCH v5 12/17] Update IRTE according to guest interrupt config changes Feng Wu
@ 2015-08-12 16:43   ` Konrad Rzeszutek Wilk
  2015-08-13  1:37     ` Wu, Feng
  0 siblings, 1 reply; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 16:43 UTC (permalink / raw)
  To: Feng Wu; +Cc: Jan Beulich, xen-devel

On Wed, Aug 12, 2015 at 10:35:33AM +0800, Feng Wu wrote:
> When guest changes its interrupt configuration (such as, vector, etc.)
> for direct-assigned devices, we need to update the associated IRTE
> with the new guest vector, so external interrupts from the assigned
> devices can be injected to guests without VM-Exit.
> 
> For lowest-priority interrupts, we use vector-hashing mechamisn to find
> the destination vCPU. This follows the hardware behavior, since modern
> Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> 
> For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> still use interrupt remapping.
> 
> CC: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v5:
> - Make 'struct vcpu *vcpu' const
> 
> v4:
> - Make some 'int' variables 'unsigned int' in pi_find_dest_vcpu()
> - Make 'dest_id' uint32_t
> - Rename 'size' to 'bitmap_array_size'
> - find_next_bit() and find_first_bit() always return unsigned int,
>   so no need to check whether the return value is less than 0.
> - Message error level XENLOG_G_WARNING -> XENLOG_G_INFO
> - Remove useless warning message
> - Create a seperate function vector_hashing_dest() to find the
> - destination of lowest-priority interrupts.
> - Change some comments
> 
> v3:
> - Use bitmap to store the all the possible destination vCPUs of an
>   interrupt, then trying to find the right destination from the bitmap
> - Typo and some small changes
> 
>  xen/drivers/passthrough/io.c | 124 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 123 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index bda9374..f62f86c 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -25,6 +25,7 @@
>  #include <asm/hvm/iommu.h>
>  #include <asm/hvm/support.h>
>  #include <xen/hvm/irq.h>
> +#include <asm/io_apic.h>
>  
>  static DEFINE_PER_CPU(struct list_head, dpci_list);
>  
> @@ -198,6 +199,108 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
>      xfree(dpci);
>  }
>  
> +/*
> + * This routine handles lowest-priority interrupts using vector-hashing
> + * mechanism. As an example, modern Intel CPUs use this method to handle
> + * lowest-priority interrupts.
> + *
> + * Here is the details about the vector-hashing mechanism:
> + * 1. For lowest-priority interrupts, store all the possible destination
> + *    vCPUs in an array.
> + * 2. Use "gvec % max number of destination vCPUs" to find the right
> + *    destination vCPU in the array for the lowest-priority interrupt.
> + */
> +static struct vcpu *vector_hashing_dest(const struct domain *d,
> +                                        uint32_t dest_id,
> +                                        bool_t dest_mode,
> +                                        uint8_t gvec)
> +
> +{
> +    unsigned long *dest_vcpu_bitmap;
> +    unsigned int dest_vcpu_num = 0, idx;
> +    unsigned int bitmap_array_size = BITS_TO_LONGS(d->max_vcpus);
> +    struct vcpu *v, *dest = NULL;
> +    unsigned int i;
> +
> +    dest_vcpu_bitmap = xzalloc_array(unsigned long, bitmap_array_size);
> +    if ( !dest_vcpu_bitmap )
> +    {
> +        dprintk(XENLOG_G_INFO,
> +                "dom%d: failed to allocate memory\n", d->domain_id);
> +        return NULL;
> +    }
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,

s/0/APIC_DEST_NOSHORT/

> +                                dest_id, dest_mode) )
> +            continue;
> +
> +        __set_bit(v->vcpu_id, dest_vcpu_bitmap);
> +        dest_vcpu_num++;

Perhaps change the variable to:

dest_vcpus

?
> +    }
> +
> +    if ( dest_vcpu_num != 0 )
> +    {
> +        idx = 0;
> +
> +        for ( i = gvec % dest_vcpu_num; i >= 0; i--)

That loop is not good as it will overflow.

Imagine gvec = 40, dest_vcpu_num = 2
On first iteration i = 0, on the next i = -1 (aka 0xfffffff), and so on.

> +        {
> +            idx = find_next_bit(dest_vcpu_bitmap, d->max_vcpus, idx) + 1;
> +            BUG_ON(idx >= d->max_vcpus);
> +        }
> +        idx--;
> +
> +        dest = d->vcpu[idx];
> +    }
> +
> +    xfree(dest_vcpu_bitmap);
> +
> +    return dest;
> +}
> +
> +/*
> + * The purpose of this routine is to find the right destination vCPU for
> + * an interrupt which will be delivered by VT-d posted-interrupt. There
> + * are several cases as below:
> + *
> + * - For lowest-priority interrupts, use vector-hashing mechanism to find
> + *   the destination.
> + * - Otherwise, for single destination interrupt, it is straightforward to
> + *   find the destination vCPU and return true.
> + * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> + *   so return NULL.
> + */
> +static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t dest_id,
> +                                      bool_t dest_mode, uint8_t delivery_mode,
> +                                      uint8_t gvec)
> +{
> +    unsigned int dest_vcpu_num = 0;
> +    struct vcpu *v, *dest = NULL;
> +
> +    if ( delivery_mode == dest_LowestPrio )
> +        return vector_hashing_dest(d, dest_id, dest_mode, gvec);
> +
> +    for_each_vcpu ( d, v )
> +    {
> +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,

s/0/APIC_DEST_NOSHORT/
> +                                dest_id, dest_mode) )
> +            continue;
> +
> +        dest_vcpu_num++;
> +        dest = v;
> +    }
> +
> +    /*
> +     * For fixed destination, we only handle single-destination
> +     * interrupts.
> +     */
> +    if ( dest_vcpu_num == 1 )
> +	return dest;

Something is off with the tabs here.

> +
> +    return NULL;
> +}
> +
>  int pt_irq_create_bind(
>      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
>  {
> @@ -256,7 +359,7 @@ int pt_irq_create_bind(
>      {
>      case PT_IRQ_TYPE_MSI:
>      {
> -        uint8_t dest, dest_mode;
> +        uint8_t dest, dest_mode, delivery_mode;
>          int dest_vcpu_id;
>  
>          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> @@ -329,11 +432,30 @@ int pt_irq_create_bind(
>          /* Calculate dest_vcpu_id for MSI-type pirq migration. */
>          dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
>          dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> +        delivery_mode = (pirq_dpci->gmsi.gflags >> GFLAGS_SHIFT_DELIV_MODE) &
> +                        VMSI_DELIV_MASK;
>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
>          spin_unlock(&d->event_lock);
>          if ( dest_vcpu_id >= 0 )
>              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> +
> +        /* Use interrupt posting if it is supported */

Missing full stop.

> +        if ( iommu_intpost )
> +        {
> +            const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest, dest_mode,
> +                                          delivery_mode, pirq_dpci->gmsi.gvec);
> +
> +            if ( vcpu )
> +            {
> +                rc = pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
> +                if ( unlikely(rc) )
> +                    dprintk(XENLOG_G_INFO,
> +                            "%pv: failed to update PI IRTE, gvec:%02x\n",
> +                            vcpu, pirq_dpci->gmsi.gvec);
> +            }
> +        }
> +
>          break;
>      }
>  
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 13/17] vmx: posted-interrupt handling when vCPU is blocked
  2015-08-12  2:35 ` [PATCH v5 13/17] vmx: posted-interrupt handling when vCPU is blocked Feng Wu
@ 2015-08-12 16:59   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 16:59 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Kevin Tian, Keir Fraser, Jan Beulich, xen-devel

On Wed, Aug 12, 2015 at 10:35:34AM +0800, Feng Wu wrote:
> This patch includes the following aspects:
> - Add a global vector to wake up the blocked vCPU
>   when an interrupt is being posted to it (This
>   part was sugguested by Yang Zhang <yang.z.zhang@intel.com>).
> - Adds a new per-vCPU tasklet to wakeup the blocked
>   vCPU. It can be used in the case vcpu_unblock
>   cannot be called directly.
> - Define two per-cpu variables:
>       * pi_blocked_vcpu:
>       A list storing the vCPUs which were blocked on this pCPU.
> 
>       * pi_blocked_vcpu_lock:
>       The spinlock to protect pi_blocked_vcpu.
> 
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v4:
> - Use local variables in pi_wakeup_interrupt()
> - Remove vcpu from the blocked list when pi_desc.on==1, this
> - avoid kick vcpu multiple times.
> - Remove tasklet
> 
> v3:
> - This patch is generated by merging the following three patches in v2:
>    [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
>    [RFC v2 10/15] vmx: Define two per-cpu variables
>    [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
> - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
> - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
> - Make pi_wakeup_interrupt() static
> - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
> 
>  xen/arch/x86/hvm/vmx/vmcs.c        |  3 ++
>  xen/arch/x86/hvm/vmx/vmx.c         | 63 ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  3 ++
>  xen/include/asm-x86/hvm/vmx/vmx.h  |  5 +++
>  4 files changed, 74 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 28c553f..2dabf16 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -661,6 +661,9 @@ int vmx_cpu_up(void)
>      if ( cpu_has_vmx_vpid )
>          vpid_sync_all();
>  
> +    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +
>      return 0;
>  }
>  
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index d2a4cfb..e80d888 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,15 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
>  static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
>  
> +/*
> + * We maintian a per-CPU linked-list of vCPU, so in PI wakeup handler we

s/maintian/maintain/

> + * can find which vCPU should be waken up.

s/waken/woken/
> + */
> +DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
>  
>  static int vmx_domain_initialise(struct domain *d)
>  {
> @@ -106,6 +114,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  
>      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>  
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
> +
>      v->arch.schedule_tail    = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> @@ -1975,6 +1986,53 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
>  };
>  
> +/*
> + * Handle VT-d posted-interrupt when VCPU is blocked.
> + */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct arch_vmx_struct *vmx, *tmp;
> +    struct vcpu *v;
> +    spinlock_t *lock = &this_cpu(pi_blocked_vcpu_lock);
> +    struct list_head *blocked_vcpus = &this_cpu(pi_blocked_vcpu);
> +    LIST_HEAD(list);
> +
> +    spin_lock(lock);
> +
> +    /*
> +     * XXX: The length of the list depends on how many vCPU is current
> +     * blocked on this specific pCPU. This may hurt the interrupt latency
> +     * if the list grows to too many entries.
> +     */
> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> +    {
> +        if ( pi_test_on(&vmx->pi_desc) )
> +        {
> +            list_del_init(&vmx->pi_blocked_vcpu_list);
> +
> +            /*
> +             * We cannot call vcpu_unblock here, since it also needs
> +             * 'pi_blocked_vcpu_lock', we store the vCPUs with ON
> +             * set in another list and unblock them after we release
> +             * 'pi_blocked_vcpu_lock'.
> +             */
> +            list_add_tail(&vmx->pi_vcpu_on_set_list, &list);
> +        }
> +    }
> +
> +    spin_unlock(lock);
> +
> +    list_for_each_entry_safe(vmx, tmp, &list, pi_vcpu_on_set_list)
> +    {
> +        v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +        list_del_init(&vmx->pi_vcpu_on_set_list);
> +        vcpu_unblock(v);
> +    }
> +
> +    ack_APIC_irq();

Could this be moved to be right after the spin_unlock?

> +    this_cpu(irq_count)++;
> +}
> +
>  const struct hvm_function_table * __init start_vmx(void)
>  {
>      set_in_cr4(X86_CR4_VMXE);
> @@ -2012,7 +2070,12 @@ const struct hvm_function_table * __init start_vmx(void)
>      }
>  
>      if ( cpu_has_vmx_posted_intr_processing )
> +    {
>          alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
> +
> +        if ( iommu_intpost )
> +            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> +    }
>      else
>      {
>          vmx_function_table.deliver_posted_intr = NULL;
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 7e81752..9a986d0 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -161,6 +161,9 @@ struct arch_vmx_struct {
>      struct page_info     *vmwrite_bitmap;
>  
>      struct page_info     *pml_pg;
> +
> +    struct list_head     pi_blocked_vcpu_list;
> +    struct list_head     pi_vcpu_on_set_list;
>  };
>  
>  int vmx_create_vmcs(struct vcpu *v);
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 03c529c..3fd66ba 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -28,6 +28,11 @@
>  #include <asm/hvm/trace.h>
>  #include <asm/hvm/vmx/vmcs.h>
>  
> +DECLARE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +DECLARE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
> +extern uint8_t pi_wakeup_vector;
> +
>  typedef union {
>      struct {
>          u64 r       :   1,  /* bit 0 - Read permission */
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 14/17] vmx: Properly handle notification event when vCPU is running
  2015-08-12  2:35 ` [PATCH v5 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
@ 2015-08-12 17:02   ` Konrad Rzeszutek Wilk
  2015-08-13  1:37     ` Wu, Feng
  0 siblings, 1 reply; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 17:02 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Kevin Tian, Keir Fraser, Jan Beulich, xen-devel

On Wed, Aug 12, 2015 at 10:35:35AM +0800, Feng Wu wrote:
> When a vCPU is running in Root mode and a notification event
> has been injected to it. we need to set VCPU_KICK_SOFTIRQ for
> the current cpu, so the pending interrupt in PIRR will be
> synced to vIRR before VM-Exit in time.
> 
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
> ---
> v4:
> - Coding style.
> 
> v3:
> - Make pi_notification_interrupt() static
> 
>  xen/arch/x86/hvm/vmx/vmx.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index e80d888..c8a4371 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2033,6 +2033,51 @@ static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
>      this_cpu(irq_count)++;
>  }
>  
> +/* Handle VT-d posted-interrupt when VCPU is running. */
> +static void pi_notification_interrupt(struct cpu_user_regs *regs)
> +{
> +    /*
> +     * We get here when a vCPU is running in root-mode (such as via hypercall,
> +     * or any other reasons which can result in VM-Exit), and before vCPU is
> +     * back to non-root, external interrupts from an assigned device happen
> +     * and a notification event is delivered to this logical CPU.
> +     *
> +     * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
> +     * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will
> +     * be synced to vIRR before VM-Exit in time.
> +     *
> +     * Please refer to the following code fragments from
> +     * xen/arch/x86/hvm/vmx/entry.S:
> +     *
> +     *     .Lvmx_do_vmentry
> +     *
> +     *      ......
> +     *      point 1
> +     *
> +     *      cmp  %ecx,(%rdx,%rax,1)
> +     *      jnz  .Lvmx_process_softirqs
> +     *
> +     *      ......
> +     *
> +     *      je   .Lvmx_launch
> +     *
> +     *      ......
> +     *
> +     *     .Lvmx_process_softirqs:
> +     *      sti
> +     *      call do_softirq
> +     *      jmp  .Lvmx_do_vmentry
> +     *
> +     * If VT-d engine issues a notification event at point 1 above, it cannot
> +     * be delivered to the guest during this VM-entry without raising the
> +     * softirq in this notification handler.
> +     */
> +    raise_softirq(VCPU_KICK_SOFTIRQ);
> +
> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;

Most (except the AMD?) have ack_APIC_irq() and such done at the start
of the functions. Is there a particular need to diverge?

> +}
> +
>  const struct hvm_function_table * __init start_vmx(void)
>  {
>      set_in_cr4(X86_CR4_VMXE);
> @@ -2071,7 +2116,7 @@ const struct hvm_function_table * __init start_vmx(void)
>  
>      if ( cpu_has_vmx_posted_intr_processing )
>      {
> -        alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
> +        alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
>  
>          if ( iommu_intpost )
>              alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor
  2015-08-12  2:35 ` [PATCH v5 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
@ 2015-08-12 17:03   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 17:03 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Kevin Tian, Keir Fraser, Jan Beulich, xen-devel

On Wed, Aug 12, 2015 at 10:35:28AM +0800, Feng Wu wrote:
> This patch initializes the VT-d Posted-interrupt Descriptor.
> 
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v3:
> - Move pi_desc_init() to xen/arch/x86/hvm/vmx/vmcs.c
> - Remove the 'inline' flag of pi_desc_init()
> 
>  xen/arch/x86/hvm/vmx/vmcs.c       | 18 ++++++++++++++++++
>  xen/include/asm-x86/hvm/vmx/vmx.h |  2 ++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index a0a97e7..28c553f 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -39,6 +39,7 @@
>  #include <asm/flushtlb.h>
>  #include <asm/shadow.h>
>  #include <asm/tboot.h>
> +#include <asm/apic.h>
>  
>  static bool_t __read_mostly opt_vpid_enabled = 1;
>  boolean_param("vpid", opt_vpid_enabled);
> @@ -951,6 +952,20 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
>      virtual_vmcs_exit(vvmcs);
>  }
>  
> +static void pi_desc_init(struct vcpu *v)
> +{
> +    uint32_t dest;
> +
> +    v->arch.hvm_vmx.pi_desc.nv = posted_intr_vector;
> +
> +    dest = cpu_physical_id(v->processor);
> +
> +    if ( x2apic_enabled )
> +        v->arch.hvm_vmx.pi_desc.ndst = dest;
> +    else
> +        v->arch.hvm_vmx.pi_desc.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);
> +}
> +
>  static int construct_vmcs(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> @@ -1089,6 +1104,9 @@ static int construct_vmcs(struct vcpu *v)
>  
>      if ( cpu_has_vmx_posted_intr_processing )
>      {
> +        if ( iommu_intpost )
> +            pi_desc_init(v);
> +
>          __vmwrite(PI_DESC_ADDR, virt_to_maddr(&v->arch.hvm_vmx.pi_desc));
>          __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector);
>      }
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index acd4aec..03c529c 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -88,6 +88,8 @@ typedef enum {
>  #define EPT_EMT_WB              6
>  #define EPT_EMT_RSV2            7
>  
> +#define PI_xAPIC_NDST_MASK      0xFF00
> +
>  void vmx_asm_vmexit_handler(struct cpu_user_regs);
>  void vmx_asm_do_vmentry(void);
>  void vmx_intr_assist(void);
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 08/17] vmx: Suppress posting interrupts when 'SN' is set
  2015-08-12  2:35 ` [PATCH v5 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
@ 2015-08-12 17:03   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 17:03 UTC (permalink / raw)
  To: Feng Wu; +Cc: Andrew Cooper, Kevin Tian, Keir Fraser, Jan Beulich, xen-devel

On Wed, Aug 12, 2015 at 10:35:29AM +0800, Feng Wu wrote:
> Currently, we don't support urgent interrupt, all interrupts
> are recognized as non-urgent interrupt, so we cannot send
> posted-interrupt when 'SN' is set.
> 
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v5:
> - keep the vcpu_kick() at the end of vmx_deliver_posted_intr()
> - Keep the 'return' after calling __vmx_deliver_posted_interrupt()
> 
> v4:
> - Coding style.
> - V3 removes a vcpu_kick() from the eoi_exitmap_changed path
>   incorrectly, fix it.
> 
> v3:
> - use cmpxchg to test SN/ON and set ON
> 
>  xen/arch/x86/hvm/vmx/vmx.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index c32d863..d2a4cfb 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1701,8 +1701,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
>           */
>          pi_set_on(&v->arch.hvm_vmx.pi_desc);
>      }
> -    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> +    else
>      {
> +        struct pi_desc old, new, prev;
> +

Could you add a comment saying:

/* To skip over first check in the loop below. */

just to easy somebody in reading the code.
> +        prev.control = 0;
> +
> +        do {
> +            /*
> +             * Currently, we don't support urgent interrupt, all
> +             * interrupts are recognized as non-urgent interrupt,
> +             * so we cannot send posted-interrupt when 'SN' is set.
> +             * Besides that, if 'ON' is already set, we cannot set
> +             * posted-interrupts as well.
> +             */
> +            if ( pi_test_sn(&prev) || pi_test_on(&prev) )
> +            {
> +                vcpu_kick(v);
> +                return;
> +            }
> +
> +            old.control = v->arch.hvm_vmx.pi_desc.control &
> +                          ~( 1 << POSTED_INTR_ON | 1 << POSTED_INTR_SN );
> +            new.control = v->arch.hvm_vmx.pi_desc.control |
> +                          1 << POSTED_INTR_ON;
> +
> +            prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
> +                                   old.control, new.control);
> +        } while ( prev.control != old.control );
> +
>          __vmx_deliver_posted_interrupt(v);
>          return;
>      }

Otherwise:

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 09/17] VT-d: Remove pointless casts
  2015-08-12  2:35 ` [PATCH v5 09/17] VT-d: Remove pointless casts Feng Wu
@ 2015-08-12 17:03   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 17:03 UTC (permalink / raw)
  To: Feng Wu; +Cc: xen-devel

On Wed, Aug 12, 2015 at 10:35:30AM +0800, Feng Wu wrote:
> Remove pointless casts.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>

I am not even sure if it needs a review..
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> ---
> v5:
> - Newly added.
> 
>  xen/drivers/passthrough/vtd/utils.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
> index 44c4ef5..162b764 100644
> --- a/xen/drivers/passthrough/vtd/utils.c
> +++ b/xen/drivers/passthrough/vtd/utils.c
> @@ -234,10 +234,9 @@ static void dump_iommu_info(unsigned char key)
>                      continue;
>                  printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
>                      "   %x %x\n", i,
> -                    (u32)p->hi.svt, (u32)p->hi.sq, (u32)p->hi.sid,
> -                    (u32)p->lo.dst, (u32)p->lo.vector, (u32)p->lo.avail,
> -                    (u32)p->lo.dlm, (u32)p->lo.tm, (u32)p->lo.rh,
> -                    (u32)p->lo.dm, (u32)p->lo.fpd, (u32)p->lo.p);
> +                    p->hi.svt, p->hi.sq, p->hi.sid, p->lo.dst, p->lo.vector,
> +                    p->lo.avail, p->lo.dlm, p->lo.tm, p->lo.rh, p->lo.dm,
> +                    p->lo.fpd, p->lo.p);
>                  print_cnt++;
>              }
>              if ( iremap_entries )
> @@ -281,11 +280,10 @@ static void dump_iommu_info(unsigned char key)
>  
>                  printk("   %02x:  %04x   %x    %x   %x   %x   %x    %x"
>                      "    %x     %02x\n", i,
> -                    (u32)remap->index_0_14 | ((u32)remap->index_15 << 15),
> -                    (u32)remap->format, (u32)remap->mask, (u32)remap->trigger,
> -                    (u32)remap->irr, (u32)remap->polarity,
> -                    (u32)remap->delivery_status, (u32)remap->delivery_mode,
> -                    (u32)remap->vector);
> +                    remap->index_0_14 | ((u32)remap->index_15 << 15),
> +                    remap->format, remap->mask, remap->trigger, remap->irr,
> +                    remap->polarity, remap->delivery_status, remap->delivery_mode,
> +                    remap->vector);
>              }
>          }
>      }
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 15/17] vmx: Add some scheduler hooks for VT-d posted interrupts
  2015-08-12  2:35 ` [PATCH v5 15/17] vmx: Add some scheduler hooks for VT-d posted interrupts Feng Wu
@ 2015-08-12 17:13   ` Konrad Rzeszutek Wilk
  2015-08-13  9:06     ` Dario Faggioli
  0 siblings, 1 reply; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 17:13 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Jan Beulich

On Wed, Aug 12, 2015 at 10:35:36AM +0800, Feng Wu wrote:
> This patch adds the following arch hooks in scheduler:
> - vmx_pre_ctx_switch_pi():
> It is called before context switch, we update the posted
> interrupt descriptor when the vCPU is preempted, go to sleep,
> or is blocked.
> 
> - vmx_post_ctx_switch_pi()
> It is called after context switch, we update the posted
> interrupt descriptor when the vCPU is going to run.
> 
> - arch_vcpu_wake_prepare()
> It will be called when waking up the vCPU, we update
> the posted interrupt descriptor when the vCPU is unblocked.
> 
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> Sugguested-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v5:
> - Rename arch_vcpu_wake to arch_vcpu_wake_prepare
> - Make arch_vcpu_wake_prepare() inline for ARM
> - Merge the ARM dummy hook with together
> - Changes to some code comments
> - Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if
>   PI is disabled or the vCPU is not in HVM
> - Coding style
> 
> v4:
> - Newly added
> 
>  xen/arch/x86/domain.c              |  11 +++
>  xen/arch/x86/hvm/vmx/vmx.c         | 147 +++++++++++++++++++++++++++++++++++++
>  xen/common/schedule.c              |   2 +
>  xen/include/asm-arm/domain.h       |   2 +
>  xen/include/asm-x86/domain.h       |   3 +
>  xen/include/asm-x86/hvm/hvm.h      |   2 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |   8 ++
>  7 files changed, 175 insertions(+)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 045f6ff..130f859 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1605,9 +1605,20 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>  
>      set_current(next);
>  
> +    /*
> +     * When switching from non-idle to idle, we only do a lazy context switch.
> +     * However, in order for posted interrupt (if available and enabled) to
> +     * work properly, we at least need to update the descriptors.
> +     */
> +    if ( prev->arch.pi_ctxt_switch_from && !is_idle_vcpu(prev) )
> +        prev->arch.pi_ctxt_switch_from(prev);
> +

This begs to be made in an inline function..

>      if ( (per_cpu(curr_vcpu, cpu) == next) ||
>           (is_idle_domain(nextd) && cpu_online(cpu)) )
>      {
> +        if ( next->arch.pi_ctxt_switch_to && !is_idle_vcpu(next) )
> +            next->arch.pi_ctxt_switch_to(next);

Ditto.

> +
>          local_irq_enable();
>      }
>      else
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index c8a4371..758809a 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -67,6 +67,8 @@ enum handler_return { HNDL_done, HNDL_unhandled, HNDL_exception_raised };
>  
>  static void vmx_ctxt_switch_from(struct vcpu *v);
>  static void vmx_ctxt_switch_to(struct vcpu *v);
> +static void vmx_pre_ctx_switch_pi(struct vcpu *v);
> +static void vmx_post_ctx_switch_pi(struct vcpu *v);
>  
>  static int  vmx_alloc_vlapic_mapping(struct domain *d);
>  static void vmx_free_vlapic_mapping(struct domain *d);
> @@ -117,10 +119,20 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>      INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>      INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_vcpu_on_set_list);
>  
> +    v->arch.hvm_vmx.pi_block_cpu = -1;
> +
> +    spin_lock_init(&v->arch.hvm_vmx.pi_lock);
> +
>      v->arch.schedule_tail    = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>  
> +    if ( iommu_intpost && is_hvm_vcpu(v) )
> +    {
> +        v->arch.pi_ctxt_switch_from = vmx_pre_ctx_switch_pi;
> +        v->arch.pi_ctxt_switch_to = vmx_post_ctx_switch_pi;
> +    }
> +
>      if ( (rc = vmx_create_vmcs(v)) != 0 )
>      {
>          dprintk(XENLOG_WARNING,
> @@ -718,6 +730,140 @@ static void vmx_fpu_leave(struct vcpu *v)
>      }
>  }
>  
> +void arch_vcpu_wake_prepare(struct vcpu *v)
> +{
> +    unsigned long gflags;
> +
> +    if ( !iommu_intpost || !is_hvm_vcpu(v) || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, gflags);
> +
> +    if ( likely(vcpu_runnable(v)) ||
> +         !test_bit(_VPF_blocked, &v->pause_flags) )
> +    {
> +        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +        unsigned long flags;
> +
> +        /*
> +         * We don't need to send notification event to a non-running
> +         * vcpu, the interrupt information will be delivered to it before
> +         * VM-ENTRY when the vcpu is scheduled to run next time.
> +         */
> +        pi_set_sn(pi_desc);
> +
> +        /*
> +         * Set 'NV' feild back to posted_intr_vector, so the

s/feild/field/

> +         * Posted-Interrupts can be delivered to the vCPU by
> +         * VT-d HW after it is scheduled to run.
> +         */
> +        write_atomic((uint8_t*)&pi_desc->nv, posted_intr_vector);
> +
> +        /*
> +         * Delete the vCPU from the related block list
> +         * if we are resuming from blocked state

Missing full stop.

> +         */
> +        if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
> +        {
> +            spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                              v->arch.hvm_vmx.pi_block_cpu), flags);
> +            list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +            spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                                    v->arch.hvm_vmx.pi_block_cpu), flags);
> +        }
> +    }
> +
> +    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
> +}
> +
> +static void vmx_pre_ctx_switch_pi(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +    struct pi_desc old, new;
> +    unsigned long flags, gflags;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_lock, gflags);
> +
> +    if ( vcpu_runnable(v) || !test_bit(_VPF_blocked, &v->pause_flags) )
> +    {
> +        /*
> +         * The vCPU has been preempted or went to sleep. We don't need to send
> +         * notification event to a non-running vcpu, the interrupt information
> +         * will be delivered to it before VM-ENTRY when the vcpu is scheduled
> +         * to run next time.
> +         */
> +        pi_set_sn(pi_desc);
> +
> +    }
> +    else if ( test_bit(_VPF_blocked, &v->pause_flags) )
> +    {
> +        /*
> +         * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> +         * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> +         * the per-CPU list, we also save it to posted-interrupt descriptor and
> +         * make it as the destination of the wake-up notification event.
> +         */
> +        v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +        spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                          v->arch.hvm_vmx.pi_block_cpu), flags);
> +        list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                      &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +        do {
> +            old.control = new.control = pi_desc->control;
> +
> +            /* Should not block the vCPU if an interrupt was posted for it */
Missing full stop.
> +            if ( pi_test_on(&old) )
> +            {
> +                spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
> +                vcpu_unblock(v);
> +                return;
> +            }
> +
> +            /*
> +             * Change the 'NDST' field to v->arch.hvm_vmx.pi_block_cpu,
> +             * so when external interrupts from assigned deivces happen,
> +             * wakeup notifiction event will go to
> +             * v->arch.hvm_vmx.pi_block_cpu, then in pi_wakeup_interrupt()
> +             * we can find the vCPU in the right list to wake up.
> +             */
> +            if ( x2apic_enabled )
> +                new.ndst = cpu_physical_id(v->arch.hvm_vmx.pi_block_cpu);
> +            else
> +                new.ndst = MASK_INSR(cpu_physical_id(
> +                                 v->arch.hvm_vmx.pi_block_cpu),
> +                                 PI_xAPIC_NDST_MASK);
> +            pi_clear_sn(&new);
> +            new.nv = pi_wakeup_vector;
> +        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                  != old.control );
> +    }
> +
> +    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, gflags);
> +}
> +
> +static void vmx_post_ctx_switch_pi(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    if ( x2apic_enabled )
> +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> +    else
> +        write_atomic(&pi_desc->ndst,
> +                     MASK_INSR(cpu_physical_id(v->processor),
> +                     PI_xAPIC_NDST_MASK));
> +
> +    pi_clear_sn(pi_desc);
> +}
> +
>  static void vmx_ctxt_switch_from(struct vcpu *v)
>  {
>      /*
> @@ -756,6 +902,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
>  
>      vmx_restore_guest_msrs(v);
>      vmx_restore_dr(v);
> +    vmx_post_ctx_switch_pi(v);
>  }
>  
>  
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 3eefed7..bc49098 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -412,6 +412,8 @@ void vcpu_wake(struct vcpu *v)
>      unsigned long flags;
>      spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
>  
> +    arch_vcpu_wake_prepare(v);
> +
>      if ( likely(vcpu_runnable(v)) )
>      {
>          if ( v->runstate.state >= RUNSTATE_blocked )
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 56aa208..cffe2c6 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -301,6 +301,8 @@ static inline register_t vcpuid_to_vaffinity(unsigned int vcpuid)
>      return vaff;
>  }
>  
> +static inline void arch_vcpu_wake_prepare(struct vcpu *v) {}
> +
>  #endif /* __ASM_DOMAIN_H__ */
>  
>  /*
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 0fce09e..979210a 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -481,6 +481,9 @@ struct arch_vcpu
>      void (*ctxt_switch_from) (struct vcpu *);
>      void (*ctxt_switch_to) (struct vcpu *);
>  
> +    void (*pi_ctxt_switch_from) (struct vcpu *);
> +    void (*pi_ctxt_switch_to) (struct vcpu *);
> +
>      struct vpmu_struct vpmu;
>  
>      /* Virtual Machine Extensions */
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 3cac64f..95f5357 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -545,6 +545,8 @@ static inline bool_t hvm_altp2m_supported(void)
>      return hvm_funcs.altp2m_supported;
>  }
>  
> +void arch_vcpu_wake_prepare(struct vcpu *v);
> +
>  #ifndef NDEBUG
>  /* Permit use of the Forced Emulation Prefix in HVM guests */
>  extern bool_t opt_hvm_fep;
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 9a986d0..209fb39 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -164,6 +164,14 @@ struct arch_vmx_struct {
>  
>      struct list_head     pi_blocked_vcpu_list;
>      struct list_head     pi_vcpu_on_set_list;
> +
> +    /*
> +     * Before vCPU is blocked, it is added to the global per-cpu list
> +     * of 'pi_block_cpu', then VT-d engine can send wakeup notification
> +     * event to 'pi_block_cpu' and wakeup the related vCPU.
> +     */
> +    int                  pi_block_cpu;
> +    spinlock_t           pi_lock;
>  };
>  
>  int vmx_create_vmcs(struct vcpu *v);
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 16/17] VT-d: Dump the posted format IRTE
  2015-08-12  2:35 ` [PATCH v5 16/17] VT-d: Dump the posted format IRTE Feng Wu
@ 2015-08-12 17:14   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 17:14 UTC (permalink / raw)
  To: Feng Wu; +Cc: Yang Zhang, Kevin Tian, xen-devel

On Wed, Aug 12, 2015 at 10:35:37AM +0800, Feng Wu wrote:
> Add the utility to dump the posted format IRTE.
> 
> CC: Yang Zhang <yang.z.zhang@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v4:
> - Newly added
> 
>  xen/drivers/passthrough/vtd/utils.c | 41 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
> index 9d556da..1848385 100644
> --- a/xen/drivers/passthrough/vtd/utils.c
> +++ b/xen/drivers/passthrough/vtd/utils.c
> @@ -203,6 +203,9 @@ static void dump_iommu_info(unsigned char key)
>              ecap_intr_remap(iommu->ecap) ? "" : "not ",
>              (status & DMA_GSTS_IRES) ? " and enabled" : "" );
>  
> +        printk("  Interrupt Posting: %ssupported.\n",
> +            cap_intr_post(iommu->ecap) ? "" : "not ");
> +
>          if ( status & DMA_GSTS_IRES )
>          {
>              /* Dump interrupt remapping table. */
> @@ -213,6 +216,7 @@ static void dump_iommu_info(unsigned char key)
>  
>              printk("  Interrupt remapping table (nr_entry=%#x. "
>                  "Only dump P=1 entries here):\n", nr_entry);
> +            printk ("Entries for remapped format:\n");
>              printk("       SVT  SQ   SID      DST  V  AVL DLM TM RH DM "
>                     "FPD P\n");
>              for ( i = 0; i < nr_entry; i++ )
> @@ -230,7 +234,7 @@ static void dump_iommu_info(unsigned char key)
>                  else
>                      p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
>  
> -                if ( !p->remap.p )
> +                if ( !p->remap.p || p->remap.im )
>                      continue;
>                  printk("  %04x:  %x   %x  %04x %08x %02x    %x   %x  %x  %x  %x"
>                      "   %x %x\n", i,
> @@ -239,6 +243,41 @@ static void dump_iommu_info(unsigned char key)
>                      p->remap.rh, p->remap.dm, p->remap.fpd, p->remap.p);
>                  print_cnt++;
>              }
> +
> +            if ( iremap_entries )
> +                unmap_vtd_domain_page(iremap_entries);
> +
> +            iremap_entries = NULL;
> +            printk ("\nEntries for posted format:\n");
> +            printk("       SVT  SQ   SID              PDA  V  URG AVL FPD P\n");
> +            for ( i = 0; i < nr_entry; i++ )
> +            {
> +                struct iremap_entry *p;
> +                if ( i % (1 << IREMAP_ENTRY_ORDER) == 0 )
> +                {
> +                    /* This entry across page boundry */

Missing full stop, and s/boundry/boundary/

> +                    if ( iremap_entries )
> +                        unmap_vtd_domain_page(iremap_entries);
> +
> +                    GET_IREMAP_ENTRY(iremap_maddr, i,
> +                                     iremap_entries, p);
> +                }
> +                else
> +                    p = &iremap_entries[i % (1 << IREMAP_ENTRY_ORDER)];
> +
> +                if ( !p->post.p || !p->post.im )
> +                    continue;
> +
> +                printk("  %04x:  %x   %x  %04x %16lx %02x    %x   %x  %x  %x\n",
> +                    i,
> +                    p->post.svt, p->post.sq, p->post.sid,
> +                    ((u64)p->post.pda_h << 32) | (p->post.pda_l << 6),
> +                    p->post.vector, p->post.urg, p->post.avail, p->post.fpd,
> +                    p->post.p);
> +
> +                print_cnt++;
> +            }
> +
>              if ( iremap_entries )
>                  unmap_vtd_domain_page(iremap_entries);
>              if ( iommu_ir_ctrl(iommu)->iremap_num != print_cnt )
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 17/17] Add a command line parameter for VT-d posted-interrupts
  2015-08-12  2:35 ` [PATCH v5 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
@ 2015-08-12 17:16   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 17:16 UTC (permalink / raw)
  To: Feng Wu; +Cc: xen-devel

On Wed, Aug 12, 2015 at 10:35:38AM +0800, Feng Wu wrote:
> Enable VT-d Posted-Interrupts and add a command line
> parameter for it.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  docs/misc/xen-command-line.markdown | 9 ++++++++-
>  xen/drivers/passthrough/iommu.c     | 3 +++
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 204e7a4..d83a292 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -855,7 +855,7 @@ debug hypervisor only).
>  > Default: `new` unless directed-EOI is supported
>  
>  ### iommu
> -> `= List of [ <boolean> | force | required | intremap | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | verbose | debug ]`
> +> `= List of [ <boolean> | force | required | intremap | intpost | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | verbose | debug ]`
>  
>  > Sub-options:
>  
> @@ -882,6 +882,13 @@ debug hypervisor only).
>  >> Control the use of interrupt remapping (DMA remapping will always be enabled
>  >> if IOMMU functionality is enabled).
>  
> +> `intpost`
> +
> +> Default: `true`

I believe this is false ? Patch #3 suggests this.

> +
> +>> Control the use of interrupt posting, which depends on the availability of
> +>> interrupt remapping.
> +
>  > `qinval` (VT-d)
>  
>  > Default: `true`
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 8eb77f7..84b1e43 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -38,6 +38,7 @@ static void iommu_dump_p2m_table(unsigned char key);
>   *   no-snoop                   Disable VT-d Snoop Control
>   *   no-qinval                  Disable VT-d Queued Invalidation
>   *   no-intremap                Disable VT-d Interrupt Remapping
> + *   no-intpost                 Disable VT-d Interrupt posting
>   */
>  custom_param("iommu", parse_iommu_param);
>  bool_t __initdata iommu_enable = 1;
> @@ -102,6 +103,8 @@ static void __init parse_iommu_param(char *s)
>              iommu_qinval = val;
>          else if ( !strcmp(s, "intremap") )
>              iommu_intremap = val;
> +        else if ( !strcmp(s, "intpost") )
> +            iommu_intpost = val;
>          else if ( !strcmp(s, "debug") )
>          {
>              iommu_debug = val;
> -- 
> 2.1.0
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design
  2015-08-12 15:35   ` Konrad Rzeszutek Wilk
@ 2015-08-12 17:27     ` Andrew Cooper
  2015-08-12 17:46       ` Konrad Rzeszutek Wilk
  2015-08-13  1:37     ` Wu, Feng
  1 sibling, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2015-08-12 17:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, xen-devel, Jan Beulich,
	Yang Zhang

On 12/08/15 16:35, Konrad Rzeszutek Wilk wrote:
> On Wed, Aug 12, 2015 at 10:35:22AM +0800, Feng Wu wrote:
>
> The title has an extra 'i'.

It does?  It certainly has one-too-many t's.

~Andrew

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-08-12  2:35 ` [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
  2015-08-12 16:23   ` Konrad Rzeszutek Wilk
@ 2015-08-12 17:41   ` Andrew Cooper
  2015-08-13  1:41     ` Wu, Feng
  1 sibling, 1 reply; 47+ messages in thread
From: Andrew Cooper @ 2015-08-12 17:41 UTC (permalink / raw)
  To: Feng Wu, xen-devel; +Cc: Yang Zhang, Kevin Tian, Keir Fraser, Jan Beulich

On 12/08/15 03:35, Feng Wu wrote:
> +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
> +
> +    old_ire = new_ire = *p;
> +
> +    /* Setup/Update interrupt remapping table entry. */
> +    setup_posted_irte(&new_ire, pi_desc, gvec);
> +    ret = cmpxchg16b(p, &old_ire, &new_ire);
> +
> +    ASSERT(ret == *(__uint128_t *)&old_ire);

You have still not addressed my issues from earlier series.  This
ASSERT() is buggy until you have a clear, detailed description of why it
is claimed to be safe in this instance.

~Andrew

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design
  2015-08-12 17:27     ` Andrew Cooper
@ 2015-08-12 17:46       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-12 17:46 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Keir Fraser, George Dunlap, xen-devel, Jan Beulich,
	Yang Zhang, Feng Wu

On Wed, Aug 12, 2015 at 06:27:12PM +0100, Andrew Cooper wrote:
> On 12/08/15 16:35, Konrad Rzeszutek Wilk wrote:
> > On Wed, Aug 12, 2015 at 10:35:22AM +0800, Feng Wu wrote:
> >
> > The title has an extra 'i'.
> 
> It does?  It certainly has one-too-many t's.

Indeed!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design
  2015-08-12 15:35   ` Konrad Rzeszutek Wilk
  2015-08-12 17:27     ` Andrew Cooper
@ 2015-08-13  1:37     ` Wu, Feng
  2015-08-13  8:29       ` Jan Beulich
  1 sibling, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2015-08-13  1:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	xen-devel, Jan Beulich, Zhang, Yang Z, Wu, Feng



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Wednesday, August 12, 2015 11:35 PM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser; George Dunlap; Andrew
> Cooper; Jan Beulich; Zhang, Yang Z
> Subject: Re: [Xen-devel] [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design
> 
> On Wed, Aug 12, 2015 at 10:35:22AM +0800, Feng Wu wrote:
> 
> The title has an extra 'i'.
> 
> > Add the design doc for VT-d PI.
> >
> > CC: Kevin Tian <kevin.tian@intel.com>
> > CC: Yang Zhang <yang.z.zhang@intel.com>
> > CC: Jan Beulich <jbeulich@suse.com>
> > CC: Keir Fraser <keir@xen.org>
> > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > CC: George Dunlap <george.dunlap@eu.citrix.com>
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > ---
> >  docs/misc/vtd-pi.txt | 333
> +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 333 insertions(+)
> >  create mode 100644 docs/misc/vtd-pi.txt
> >
> > diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
> > new file mode 100644
> > index 0000000..98a77ba
> > --- /dev/null
> > +++ b/docs/misc/vtd-pi.txt
> > @@ -0,0 +1,333 @@
> > +Authors: Feng Wu <feng.wu@intel.com>
> > +
> > +VT-d Posted-interrupt (PI) design for XEN
> > +
> > +Background
> > +==========
> > +With the development of virtualization, there are more and more device
> > +assignment requirements. However, today when a VM is running with
> > +assigned devices (such as, NIC), external interrupt handling for the assigned
> > +devices always needs VMM intervention.
> > +
> > +VT-d Posted-interrupt is a more enhanced method to handle interrupts
> > +in the virtualization environment. Interrupt posting is the process by
> > +which an interrupt request is recorded in a memory-resident
> > +posted-interrupt-descriptor structure by the root-complex, followed by
> > +an optional notification event issued to the CPU complex.
> > +
> > +With VT-d Posted-interrupt we can get the following advantages:
> > +- Direct delivery of external interrupts to running vCPUs without VMM
> > +intervention
> > +- Decrease the interrupt migration complexity. On vCPU migration, software
> > +can atomically co-migrate all interrupts targeting the migrating vCPU. For
> > +virtual machines with assigned devices, migrating a vCPU across pCPUs
> > +either incur the overhead of forwarding interrupts in software (e.g. via VMM
> 
> s/incur/incurs/
> > +generated IPIS), or complexity to independently migrate each interrupt
> targeting
> 
> s/IPIS/IPIs
> > +the vCPU to the new pCPU. However, after enabling VT-d PI, the destination
> vCPU
> > +of an external interrupt from assigned devices is stored in the IRTE (i.e.
> > +Posted-interrupt Descriptor Address), when vCPU is migrated to another
> pCPU,
> > +we will set this new pCPU in the 'NDST' filed of Posted-interrupt descriptor,
> this
> > +make the interrupt migration automatic.
> > +
> > +Here is what Xen currently does for external interrupts from assigned
> devices:
> > +
> > +When a VM is running and an external interrupt from an assigned device
> occurs
> > +for it. VM-EXIT happens, then:
> > +
> > +vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci()
> -->
> > +raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
> > +
> > +softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
> > +
> > +dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() --> vmsi_deliver()
> -->
> > +vmsi_inj_irq() --> vlapic_set_irq()
> > +
> > +vlapic_set_irq() does the following things:
> > +1. If CPU-side posted-interrupt is supported, call vmx_deliver_posted_intr()
> to deliver
> > +the virtual interrupt via posted-interrupt infrastructure.
> > +2. Else if CPU-side posted-interrupt is not supported, set the related vIRR in
> vLAPIC
> > +page and call vcpu_kick() to kick the related vCPU. Before VM-Entry,
> vmx_intr_assist()
> > +will help to inject the interrupt to guests.
> > +
> > +However, after VT-d PI is supported, when a guest is running in non-root and
> an
> > +external interrupt from an assigned device occurs for it. No VM-Exit is
> needed,
> > +the guest can handle this totally in non-root mode, thus avoiding all the
> above
> > +code flow.
> > +
> > +Posted-interrupt Introduction
> > +========================
> > +There are two components to the Posted-interrupt architecture:
> > +Processor Support and Root-Complex Support
> > +
> > +- Processor Support
> > +Posted-interrupt processing is a feature by which a processor processes
> > +the virtual interrupts by recording them as pending on the virtual-APIC
> > +page.
> > +
> > +Posted-interrupt processing is enabled by setting the process posted
> > +interrupts VM-execution control. The processing is performed in response
> > +to the arrival of an interrupt with the posted-interrupt notification vector.
> > +In response to such an interrupt, the processor processes virtual interrupts
> > +recorded in a data structure called a posted-interrupt descriptor.
> > +
> > +More information about APICv and CPU-side Posted-interrupt, please refer
> > +to Chapter 29, and Section 29.6 in the Intel SDM:
> >
> +http://www.intel.com/content/dam/www/public/us/en/documents/manuals/
> 64-ia-32-architectures-software-developer-manual-325462.pdf
> > +
> > +- Root-Complex Support
> > +Interrupt posting is the process by which an interrupt request (from IOAPIC
> > +or MSI/MSIx capable sources) is recorded in a memory-resident
> > +posted-interrupt-descriptor structure by the root-complex, followed by
> > +an optional notification event issued to the CPU complex. The interrupt
> > +request arriving at the root-complex carry the identity of the interrupt
> > +request source and a 'remapping-index'. The remapping-index is used to
> > +look-up an entry from the memory-resident interrupt-remap-table. Unlike
> > +with interrupt-remapping, the interrupt-remap-table-entry for a posted-
> 
> s/with//
> > +interrupt, specifies a virtual-vector and a pointer to the posted-interrupt
> > +descriptor. The virtual-vector specifies the vector of the interrupt to be
> > +recorded in the posted-interrupt descriptor. The posted-interrupt descriptor
> > +hosts storage for the virtual-vectors and contains the attributes of the
> > +notification event (interrupt) to be issued to the CPU complex to inform
> > +CPU/software about pending interrupts recorded in the posted-interrupt
> > +descriptor.
> > +
> > +More information about VT-d PI, please refer to
> >
> +http://www.intel.com/content/www/us/en/intelligent-systems/intel-technolo
> gy/vt-directed-io-spec.html
> > +
> > +Important Definitions
> > +==================
> > +There are some changes to IRTE and posted-interrupt descriptor after
> > +VT-d PI is introduced:
> > +IRTE: Interrupt Remapping Table Entry
> > +Posted-interrupt Descriptor Address: the address of the posted-interrupt
> descriptor
> > +Virtual Vector: the guest vector of the interrupt
> > +URG: indicates if the interrupt is urgent
> > +
> > +Posted-interrupt descriptor:
> > +The Posted Interrupt Descriptor hosts the following fields:
> > +Posted Interrupt Request (PIR): Provide storage for posting (recording)
> interrupts (one bit
> > +per vector, for up to 256 vectors).
> > +
> > +Outstanding Notification (ON): Indicate if there is a notification event
> outstanding (not
> > +processed by processor or software) for this Posted Interrupt Descriptor.
> When this field is 0,
> > +hardware modifies it from 0 to 1 when generating a notification event, and
> the entity receiving
> > +the notification event (processor or software) resets it as part of posted
> interrupt processing.
> > +
> > +Suppress Notification (SN): Indicate if a notification event is to be
> suppressed (not
> > +generated) for non-urgent interrupt requests (interrupts processed through
> an IRTE with
> > +URG=0).
> > +
> > +Notification Vector (NV): Specify the vector for notification event (interrupt).
> > +
> > +Notification Destination (NDST): Specify the physical APIC-ID of the
> destination logical
> > +processor for the notification event.
> > +
> > +Design Overview
> > +==============
> > +In this design, we will cover the following items:
> > +1. Add a variable to control whether enable VT-d posted-interrupt or not.
> > +2. VT-d PI feature detection.
> > +3. Extend posted-interrupt descriptor structure to cover VT-d PI specific
> stuff.
> 
> s/stuff/items/
> 
> But that really is up to you :-)
> 
> > +4. Extend IRTE structure to support VT-d PI.
> > +5. Introduce a new global vector which is used for waking up the blocked
> vCPU.
> > +6. Update IRTE when guest modifies the interrupt configuration (MSI/MSIx
> configuration).
> > +7. Update posted-interrupt descriptor during vCPU scheduling (when the
> state
> > +of the vCPU is transmitted among RUNSTATE_running / RUNSTATE_blocked/
> > +RUNSTATE_runnable / RUNSTATE_offline).
> > +8. How to wakeup blocked vCPU when an interrupt is posted for it (wakeup
> notification handler).
> > +9. New boot command line for Xen, which controls VT-d PI feature by user.
> > +10. Multicast/broadcast and lowest priority interrupts consideration.
> > +
> > +
> > +Implementation details
> > +===================
> > +- New variable to control VT-d PI
> > +
> > +Like variable 'iommu_intremap' for interrupt remapping, it is very
> straightforward
> > +to add a new one 'iommu_intpost' for posted-interrupt. 'iommu_intpost' is
> set
> > +only when interrupt remapping and VT-d posted-interrupt are both enabled.
> > +
> > +- VT-d PI feature detection.
> > +Bit 59 in VT-d Capability Register is used to report VT-d Posted-interrupt
> support.
> > +
> > +- Extend posted-interrupt descriptor structure to cover VT-d PI specific stuff.
> > +Here is the new structure for posted-interrupt descriptor:
> > +
> > +struct pi_desc {
> > +    DECLARE_BITMAP(pir, NR_VECTORS);
> > +    union {
> > +        struct
> > +        {
> > +        u16 on     : 1,  /* bit 256 - Outstanding Notification */
> > +            sn     : 1,  /* bit 257 - Suppress Notification */
> > +            rsvd_1 : 14; /* bit 271:258 - Reserved */
> > +        u8  nv;          /* bit 279:272 - Notification Vector */
> > +        u8  rsvd_2;      /* bit 287:280 - Reserved */
> > +        u32 ndst;        /* bit 319:288 - Notification Destination */
> > +        };
> > +        u64 control;
> > +    };
> > +    u32 rsvd[6];
> > +} __attribute__ ((aligned (64)));
> > +
> > +- Extend IRTE structure to support VT-d PI.
> > +
> > +Here is the new structure for IRTE:
> > +/* interrupt remap entry */
> > +struct iremap_entry {
> > +  union {
> > +    struct { u64 lo, hi; };
> > +    struct {
> > +        u16 p       : 1,
> > +            fpd     : 1,
> > +            dm      : 1,
> > +            rh      : 1,
> > +            tm      : 1,
> > +            dlm     : 3,
> > +            avail   : 4,
> > +            res_1   : 4;
> > +        u8  vector;
> > +        u8  res_2;
> > +        u32 dst;
> > +        u16 sid;
> > +        u16 sq      : 2,
> > +            svt     : 2,
> > +            res_3   : 12;
> > +        u32 res_4   : 32;
> > +    } remap;
> > +    struct {
> > +        u16 p       : 1,
> > +            fpd     : 1,
> > +            res_1   : 6,
> > +            avail   : 4,
> > +            res_2   : 2,
> > +            urg     : 1,
> > +            im      : 1;
> > +        u8  vector;
> > +        u8  res_3;
> > +        u32 res_4   : 6,
> > +            pda_l   : 26;
> > +        u16 sid;
> > +        u16 sq      : 2,
> > +            svt     : 2,
> > +            res_5   : 12;
> > +        u32 pda_h;
> > +    } post;
> > +  };
> > +};
> > +
> > +- Introduce a new global vector which is used to wake up the blocked vCPU.
> > +
> > +Currently, there is a global vector 'posted_intr_vector', which is used as the
> > +global notification vector for all vCPUs in the system. This vector is stored in
> > +VMCS and CPU considers it as a _special_ vector, uses it to notify the
> related
> > +pCPU when an interrupt is recorded in the posted-interrupt descriptor.
> > +
> > +This existing global vector is a _special_ vector to CPU, CPU handle it in a
> > +_special_ way compared to normal vectors, please refer to 29.6 in Intel SDM
> >
> +http://www.intel.com/content/dam/www/public/us/en/documents/manuals/
> 64-ia-32-architectures-software-developer-manual-325462.pdf
> > +for more information about how CPU handles it.
> > +
> > +After having VT-d PI, VT-d engine can issue notification event when the
> > +assigned devices issue interrupts. We need add a new global vector to
> > +wakeup the blocked vCPU, please refer to later section in this design for
> > +how to use this new global vector.
> > +
> > +- Update IRTE when guest modifies the interrupt configuration (MSI/MSIx
> configuration).
> > +After VT-d PI is introduced, the format of IRTE is changed as follows:
> > +	Descriptor Address: the address of the posted-interrupt descriptor
> > +	Virtual Vector: the guest vector of the interrupt
> > +	URG: indicates if the interrupt is urgent
> > +	Other fields continue to have the same meaning
> > +
> > +'Descriptor Address' tells the destination vCPU of this interrupt, since
> > +each vCPU has a dedicated posted-interrupt descriptor.
> > +
> > +'Virtual Vector' tells the guest vector of the interrupt.
> > +
> > +When guest changes the configuration of the interrupts, such as, the
> > +cpu affinity, or the vector, we need to update the associated IRTE
> accordingly.
> > +
> > +- Update posted-interrupt descriptor during vCPU scheduling
> > +
> > +The basic idea here is:
> > +1. When vCPU's state is RUNSTATE_running,
> > +        - Set 'NV' to 'posted_intr_vector'.
> > +        - Clear 'SN' to accept posted-interrupts.
> > +        - Set 'NDST' to the pCPU on which the vCPU will be running.
> > +2. When vCPU's state is RUNSTATE_blocked,
> > +        - Set 'NV' to ' pi_wakeup_vector ', so we can wake up the
> > +          related vCPU when posted-interrupt happens for it.
> > +          Please refer to the above section about the new global vector.
> > +        - Clear 'SN' to accept posted-interrupts
> > +3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> > +        - Set 'SN' to suppress non-urgent interrupts
> > +          (Current, we only support non-urgent interrupts)
> 
> s/Current/Currently/
> 
> or s/Current/Right now/
> 
> 
> > +         When vCPU is in RUNSTATE_runnable or RUNSTATE_offline,
> 
> s/,//
> > +         It is not needed to accept posted-interrupt notification event,
> 
> s/,//
> 
> > +         since we don't change the behavior of scheduler when the
> interrupt
> > +         occurs, we still need wait the next scheduling of the vCPU.
> 
> s/wait the next/wait for the next/
> > +         When external interrupts from assigned devices occur, the
> interrupts
> > +         are recorded in PIR, and will be synced to IRR before VM-Entry.
> > +        - Set 'NV' to 'posted_intr_vector'.
> > +
> > +- How to wakeup blocked vCPU when an interrupt is posted for it (wakeup
> notification handler).
> > +
> > +Here is the scenario for the usage of the new global vector:
> > +
> > +1. vCPU0 is running on pCPU0
> > +2. vCPU0 is blocked and vCPU1 is currently running on pCPU0
> > +3. An external interrupt from an assigned device occurs for vCPU0, if we
> > +still use 'posted_intr_vector' as the notification vector for vCPU0, the
> > +notification event for vCPU0 (the event will go to pCPU1) will be consumed
> > +by vCPU1 incorrectly (remember this is a special vector to CPU). The worst
> > +case is that vCPU0 will never be woken up again since the wakeup event
> > +for it is always consumed by other vCPUs incorrectly. So we need introduce
> > +another global vector, naming 'pi_wakeup_vector' to wake up the blocked
> vCPU.
> > +
> > +After using 'pi_wakeup_vector' for vCPU0, VT-d engine will issue notification
> > +event using this new vector. Since this new vector is not a SPECIAL one to
> CPU,
> > +it is just a normal vector. To cpu, it just receives an normal external
> interrupt,
> 
> s/cpu/CPU/
> > +then we can get control in the handler of this new vector. In this case,
> hypervisor
> > +can do something in it, such as wakeup the blocked vCPU.
> > +
> > +Here are what we do for the blocked vCPU:
> > +1. Define a per-cpu list 'pi_blocked_vcpu', which stored the blocked
> > +vCPU on the pCPU.
> > +2. When the vCPU's state is changed to RUNSTATE_blocked, insert the vCPU
> > +to the per-cpu list belonging to the pCPU it was running.
> > +3. When the vCPU is unblocked, remove the vCPU from the related pCPU
> list.
> > +
> > +In the handler of 'pi_wakeup_vector', we do:
> > +1. Get the physical CPU.
> > +2. Iterate the list 'pi_blocked_vcpu' of the current pCPU, if 'ON' is set,
> > +we unblock the associated vCPU.
> > +
> > +- New boot command line for Xen, which controls VT-d PI feature by user.
> > +
> > +Like 'intremap' for interrupt remapping, we add a new boot command line
> > +'intpost' for posted-interrupts.
> > +
> > +- Multicast/broadcast and lowest priority interrupts consideration.
> > +
> > +With VT-d PI, the destination vCPU information of an external interrupt
> > +from assigned devices is stored in IRTE, this makes the following
> > +consideration of the design:
> > +1. Multicast/broadcast interrupts cannot be posted.
> > +2. For lowest-priority interrupts, new Intel CPU/Chipset/root-complex
> > +(starting from Nehalem) ignore TPR value, and instead supported two other
> > +ways (configurable by BIOS) on how the handle lowest priority interrupts:
> > +	A) Round robin: In this method, the chipset simply delivers lowest priority
> > +interrupts in a round-robin manner across all the available logical CPUs.
> While
> > +this provides good load balancing, this was not the best thing to do always
> as
> > +interrupts from the same device (like NIC) will start running on all the CPUs
> > +thrashing caches and taking locks. This led to the next scheme.
> > +	B) Vector hashing: In this method, hardware would apply a hash function
> > +on the vector value in the interrupt request, and use that hash to pick a
> logical
> > +CPU to route the lowest priority interrupt. This way, a given vector always
> goes
> > +to the same logical CPU, avoiding the thrashing problem above.
> > +
> > +So, gist of above is that, lowest priority interrupts has never been delivered
> as
> > +"lowest priority" in physical hardware.
> > +
> > +Vector hashing is used in this design.
> 
> And with those tiny little changes:
> 
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Thanks a lot for your review on this whole series, Konrad!

Thanks,
Feng

> > --
> > 2.1.0
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 14/17] vmx: Properly handle notification event when vCPU is running
  2015-08-12 17:02   ` Konrad Rzeszutek Wilk
@ 2015-08-13  1:37     ` Wu, Feng
  0 siblings, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2015-08-13  1:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Wu, Feng, Andrew Cooper, xen-devel, Jan Beulich,
	Keir Fraser



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Thursday, August 13, 2015 1:02 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Keir Fraser; Tian, Kevin; Jan Beulich; Andrew
> Cooper
> Subject: Re: [Xen-devel] [PATCH v5 14/17] vmx: Properly handle notification
> event when vCPU is running
> 
> On Wed, Aug 12, 2015 at 10:35:35AM +0800, Feng Wu wrote:
> > When a vCPU is running in Root mode and a notification event
> > has been injected to it. we need to set VCPU_KICK_SOFTIRQ for
> > the current cpu, so the pending interrupt in PIRR will be
> > synced to vIRR before VM-Exit in time.
> >
> > CC: Kevin Tian <kevin.tian@intel.com>
> > CC: Keir Fraser <keir@xen.org>
> > CC: Jan Beulich <jbeulich@suse.com>
> > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > Acked-by: Kevin Tian <kevin.tian@intel.com>
> > ---
> > v4:
> > - Coding style.
> >
> > v3:
> > - Make pi_notification_interrupt() static
> >
> >  xen/arch/x86/hvm/vmx/vmx.c | 47
> +++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 46 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index e80d888..c8a4371 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -2033,6 +2033,51 @@ static void pi_wakeup_interrupt(struct
> cpu_user_regs *regs)
> >      this_cpu(irq_count)++;
> >  }
> >
> > +/* Handle VT-d posted-interrupt when VCPU is running. */
> > +static void pi_notification_interrupt(struct cpu_user_regs *regs)
> > +{
> > +    /*
> > +     * We get here when a vCPU is running in root-mode (such as via
> hypercall,
> > +     * or any other reasons which can result in VM-Exit), and before vCPU
> is
> > +     * back to non-root, external interrupts from an assigned device
> happen
> > +     * and a notification event is delivered to this logical CPU.
> > +     *
> > +     * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
> > +     * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR
> will
> > +     * be synced to vIRR before VM-Exit in time.
> > +     *
> > +     * Please refer to the following code fragments from
> > +     * xen/arch/x86/hvm/vmx/entry.S:
> > +     *
> > +     *     .Lvmx_do_vmentry
> > +     *
> > +     *      ......
> > +     *      point 1
> > +     *
> > +     *      cmp  %ecx,(%rdx,%rax,1)
> > +     *      jnz  .Lvmx_process_softirqs
> > +     *
> > +     *      ......
> > +     *
> > +     *      je   .Lvmx_launch
> > +     *
> > +     *      ......
> > +     *
> > +     *     .Lvmx_process_softirqs:
> > +     *      sti
> > +     *      call do_softirq
> > +     *      jmp  .Lvmx_do_vmentry
> > +     *
> > +     * If VT-d engine issues a notification event at point 1 above, it cannot
> > +     * be delivered to the guest during this VM-entry without raising the
> > +     * softirq in this notification handler.
> > +     */
> > +    raise_softirq(VCPU_KICK_SOFTIRQ);
> > +
> > +    ack_APIC_irq();
> > +    this_cpu(irq_count)++;
> 
> Most (except the AMD?) have ack_APIC_irq() and such done at the start
> of the functions. Is there a particular need to diverge?

Nothing special here, I can move it to the beginning of this function.

Thanks,
Feng

> 
> > +}
> > +
> >  const struct hvm_function_table * __init start_vmx(void)
> >  {
> >      set_in_cr4(X86_CR4_VMXE);
> > @@ -2071,7 +2116,7 @@ const struct hvm_function_table * __init
> start_vmx(void)
> >
> >      if ( cpu_has_vmx_posted_intr_processing )
> >      {
> > -        alloc_direct_apic_vector(&posted_intr_vector,
> event_check_interrupt);
> > +        alloc_direct_apic_vector(&posted_intr_vector,
> pi_notification_interrupt);
> >
> >          if ( iommu_intpost )
> >              alloc_direct_apic_vector(&pi_wakeup_vector,
> pi_wakeup_interrupt);
> > --
> > 2.1.0
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 12/17] Update IRTE according to guest interrupt config changes
  2015-08-12 16:43   ` Konrad Rzeszutek Wilk
@ 2015-08-13  1:37     ` Wu, Feng
  2015-08-13  8:36       ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2015-08-13  1:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Wu, Feng, Jan Beulich, xen-devel



> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Thursday, August 13, 2015 12:43 AM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v5 12/17] Update IRTE according to guest
> interrupt config changes
> 
> On Wed, Aug 12, 2015 at 10:35:33AM +0800, Feng Wu wrote:
> > When guest changes its interrupt configuration (such as, vector, etc.)
> > for direct-assigned devices, we need to update the associated IRTE
> > with the new guest vector, so external interrupts from the assigned
> > devices can be injected to guests without VM-Exit.
> >
> > For lowest-priority interrupts, we use vector-hashing mechamisn to find
> > the destination vCPU. This follows the hardware behavior, since modern
> > Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> >
> > For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > still use interrupt remapping.
> >
> > CC: Jan Beulich <jbeulich@suse.com>
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> > v5:
> > - Make 'struct vcpu *vcpu' const
> >
> > v4:
> > - Make some 'int' variables 'unsigned int' in pi_find_dest_vcpu()
> > - Make 'dest_id' uint32_t
> > - Rename 'size' to 'bitmap_array_size'
> > - find_next_bit() and find_first_bit() always return unsigned int,
> >   so no need to check whether the return value is less than 0.
> > - Message error level XENLOG_G_WARNING -> XENLOG_G_INFO
> > - Remove useless warning message
> > - Create a seperate function vector_hashing_dest() to find the
> > - destination of lowest-priority interrupts.
> > - Change some comments
> >
> > v3:
> > - Use bitmap to store the all the possible destination vCPUs of an
> >   interrupt, then trying to find the right destination from the bitmap
> > - Typo and some small changes
> >
> >  xen/drivers/passthrough/io.c | 124
> ++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 123 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> > index bda9374..f62f86c 100644
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -25,6 +25,7 @@
> >  #include <asm/hvm/iommu.h>
> >  #include <asm/hvm/support.h>
> >  #include <xen/hvm/irq.h>
> > +#include <asm/io_apic.h>
> >
> >  static DEFINE_PER_CPU(struct list_head, dpci_list);
> >
> > @@ -198,6 +199,108 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
> >      xfree(dpci);
> >  }
> >
> > +/*
> > + * This routine handles lowest-priority interrupts using vector-hashing
> > + * mechanism. As an example, modern Intel CPUs use this method to handle
> > + * lowest-priority interrupts.
> > + *
> > + * Here is the details about the vector-hashing mechanism:
> > + * 1. For lowest-priority interrupts, store all the possible destination
> > + *    vCPUs in an array.
> > + * 2. Use "gvec % max number of destination vCPUs" to find the right
> > + *    destination vCPU in the array for the lowest-priority interrupt.
> > + */
> > +static struct vcpu *vector_hashing_dest(const struct domain *d,
> > +                                        uint32_t dest_id,
> > +                                        bool_t dest_mode,
> > +                                        uint8_t gvec)
> > +
> > +{
> > +    unsigned long *dest_vcpu_bitmap;
> > +    unsigned int dest_vcpu_num = 0, idx;
> > +    unsigned int bitmap_array_size = BITS_TO_LONGS(d->max_vcpus);
> > +    struct vcpu *v, *dest = NULL;
> > +    unsigned int i;
> > +
> > +    dest_vcpu_bitmap = xzalloc_array(unsigned long, bitmap_array_size);
> > +    if ( !dest_vcpu_bitmap )
> > +    {
> > +        dprintk(XENLOG_G_INFO,
> > +                "dom%d: failed to allocate memory\n", d->domain_id);
> > +        return NULL;
> > +    }
> > +
> > +    for_each_vcpu ( d, v )
> > +    {
> > +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> 
> s/0/APIC_DEST_NOSHORT/
> 
> > +                                dest_id, dest_mode) )
> > +            continue;
> > +
> > +        __set_bit(v->vcpu_id, dest_vcpu_bitmap);
> > +        dest_vcpu_num++;
> 
> Perhaps change the variable to:
> 
> dest_vcpus
> 
> ?
> > +    }
> > +
> > +    if ( dest_vcpu_num != 0 )
> > +    {
> > +        idx = 0;
> > +
> > +        for ( i = gvec % dest_vcpu_num; i >= 0; i--)
> 
> That loop is not good as it will overflow.
> 
> Imagine gvec = 40, dest_vcpu_num = 2
> On first iteration i = 0, on the next i = -1 (aka 0xfffffff), and so on.
> 

So we should define 'i' as int, right?

Thanks,
Feng

> > +        {
> > +            idx = find_next_bit(dest_vcpu_bitmap, d->max_vcpus, idx) +
> 1;
> > +            BUG_ON(idx >= d->max_vcpus);
> > +        }
> > +        idx--;
> > +
> > +        dest = d->vcpu[idx];
> > +    }
> > +
> > +    xfree(dest_vcpu_bitmap);
> > +
> > +    return dest;
> > +}
> > +
> > +/*
> > + * The purpose of this routine is to find the right destination vCPU for
> > + * an interrupt which will be delivered by VT-d posted-interrupt. There
> > + * are several cases as below:
> > + *
> > + * - For lowest-priority interrupts, use vector-hashing mechanism to find
> > + *   the destination.
> > + * - Otherwise, for single destination interrupt, it is straightforward to
> > + *   find the destination vCPU and return true.
> > + * - For multicast/broadcast vCPU, we cannot handle it via interrupt posting,
> > + *   so return NULL.
> > + */
> > +static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t
> dest_id,
> > +                                      bool_t dest_mode, uint8_t
> delivery_mode,
> > +                                      uint8_t gvec)
> > +{
> > +    unsigned int dest_vcpu_num = 0;
> > +    struct vcpu *v, *dest = NULL;
> > +
> > +    if ( delivery_mode == dest_LowestPrio )
> > +        return vector_hashing_dest(d, dest_id, dest_mode, gvec);
> > +
> > +    for_each_vcpu ( d, v )
> > +    {
> > +        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0,
> 
> s/0/APIC_DEST_NOSHORT/
> > +                                dest_id, dest_mode) )
> > +            continue;
> > +
> > +        dest_vcpu_num++;
> > +        dest = v;
> > +    }
> > +
> > +    /*
> > +     * For fixed destination, we only handle single-destination
> > +     * interrupts.
> > +     */
> > +    if ( dest_vcpu_num == 1 )
> > +	return dest;
> 
> Something is off with the tabs here.
> 
> > +
> > +    return NULL;
> > +}
> > +
> >  int pt_irq_create_bind(
> >      struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
> >  {
> > @@ -256,7 +359,7 @@ int pt_irq_create_bind(
> >      {
> >      case PT_IRQ_TYPE_MSI:
> >      {
> > -        uint8_t dest, dest_mode;
> > +        uint8_t dest, dest_mode, delivery_mode;
> >          int dest_vcpu_id;
> >
> >          if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
> > @@ -329,11 +432,30 @@ int pt_irq_create_bind(
> >          /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> >          dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
> >          dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> > +        delivery_mode = (pirq_dpci->gmsi.gflags >>
> GFLAGS_SHIFT_DELIV_MODE) &
> > +                        VMSI_DELIV_MASK;
> >          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> >          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> >          spin_unlock(&d->event_lock);
> >          if ( dest_vcpu_id >= 0 )
> >              hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> > +
> > +        /* Use interrupt posting if it is supported */
> 
> Missing full stop.
> 
> > +        if ( iommu_intpost )
> > +        {
> > +            const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest,
> dest_mode,
> > +                                          delivery_mode,
> pirq_dpci->gmsi.gvec);
> > +
> > +            if ( vcpu )
> > +            {
> > +                rc = pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
> > +                if ( unlikely(rc) )
> > +                    dprintk(XENLOG_G_INFO,
> > +                            "%pv: failed to update PI IRTE,
> gvec:%02x\n",
> > +                            vcpu, pirq_dpci->gmsi.gvec);
> > +            }
> > +        }
> > +
> >          break;
> >      }
> >
> > --
> > 2.1.0
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-08-12 17:41   ` Andrew Cooper
@ 2015-08-13  1:41     ` Wu, Feng
  0 siblings, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2015-08-13  1:41 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Zhang, Yang Z, Wu, Feng, Tian, Kevin, Keir Fraser, Jan Beulich



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Thursday, August 13, 2015 1:41 AM
> To: Wu, Feng; xen-devel@lists.xen.org
> Cc: Zhang, Yang Z; Tian, Kevin; Keir Fraser; Jan Beulich
> Subject: Re: [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used
> 
> On 12/08/15 03:35, Feng Wu wrote:
> > +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index,
> iremap_entries, p);
> > +
> > +    old_ire = new_ire = *p;
> > +
> > +    /* Setup/Update interrupt remapping table entry. */
> > +    setup_posted_irte(&new_ire, pi_desc, gvec);
> > +    ret = cmpxchg16b(p, &old_ire, &new_ire);
> > +
> > +    ASSERT(ret == *(__uint128_t *)&old_ire);
> 
> You have still not addressed my issues from earlier series.  This
> ASSERT() is buggy until you have a clear, detailed description of why it
> is claimed to be safe in this instance.

Sure, I am add a description here in the next version! Thanks again!

Thanks,
Feng

> 
> ~Andrew

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design
  2015-08-13  1:37     ` Wu, Feng
@ 2015-08-13  8:29       ` Jan Beulich
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2015-08-13  8:29 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper, xen-devel,
	Yang Z Zhang

>>> On 13.08.15 at 03:37, <feng.wu@intel.com> wrote:
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
>> Sent: Wednesday, August 12, 2015 11:35 PM
>> And with those tiny little changes:
>> 
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> Thanks a lot for your review on this whole series, Konrad!

Especially with a reply like this, you should have - just like I'm doing
now - trimmed almost all of the previous mail's body from your reply.
There's absolutely no reason to have all readers scroll through
many pages just to find this one line response.

Jan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-08-12 16:23   ` Konrad Rzeszutek Wilk
@ 2015-08-13  8:33     ` Jan Beulich
  2015-08-13 18:27       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2015-08-13  8:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kevin Tian, Keir Fraser, Andrew Cooper, xen-devel, Yang Zhang, Feng Wu

>>> On 12.08.15 at 18:23, <konrad.wilk@oracle.com> wrote:
> On Wed, Aug 12, 2015 at 10:35:32AM +0800, Feng Wu wrote:
>> +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
>> +
>> +    old_ire = new_ire = *p;
>> +
>> +    /* Setup/Update interrupt remapping table entry. */
>> +    setup_posted_irte(&new_ire, pi_desc, gvec);
>> +    ret = cmpxchg16b(p, &old_ire, &new_ire);
>> +
>> +    ASSERT(ret == *(__uint128_t *)&old_ire);
>> +
>> +    iommu_flush_cache_entry(p, sizeof(*p));
> 
> The other use sites of iommu_flush_cache_entry are sizeof(struct 
> iremap_entry).
> I think if you are doing this modification - then you should also have an
> patch to modify the other code to be in sync.
> 
> Or just use the sizeof(struct ...).

I asked for this (to avoid proliferation of the bad style); I
certainly wouldn't mind another cleanup patch to deal with the
other use sites, but I certainly don't want to see this go back
to sizeof(struct ...).

Jan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 12/17] Update IRTE according to guest interrupt config changes
  2015-08-13  1:37     ` Wu, Feng
@ 2015-08-13  8:36       ` Jan Beulich
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2015-08-13  8:36 UTC (permalink / raw)
  To: Feng Wu; +Cc: xen-devel

>>> On 13.08.15 at 03:37, <feng.wu@intel.com> wrote:
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
>> Sent: Thursday, August 13, 2015 12:43 AM
>> On Wed, Aug 12, 2015 at 10:35:33AM +0800, Feng Wu wrote:
>> > +    }
>> > +
>> > +    if ( dest_vcpu_num != 0 )
>> > +    {
>> > +        idx = 0;
>> > +
>> > +        for ( i = gvec % dest_vcpu_num; i >= 0; i--)
>> 
>> That loop is not good as it will overflow.
>> 
>> Imagine gvec = 40, dest_vcpu_num = 2
>> On first iteration i = 0, on the next i = -1 (aka 0xfffffff), and so on.
>> 
> 
> So we should define 'i' as int, right?

No, you should fix the loop. 'i' can't validly hold negative values.

Jan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 15/17] vmx: Add some scheduler hooks for VT-d posted interrupts
  2015-08-12 17:13   ` Konrad Rzeszutek Wilk
@ 2015-08-13  9:06     ` Dario Faggioli
  2015-08-13  9:15       ` Wu, Feng
  0 siblings, 1 reply; 47+ messages in thread
From: Dario Faggioli @ 2015-08-13  9:06 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper, xen-devel,
	Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 1450 bytes --]

On Wed, 2015-08-12 at 13:13 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Aug 12, 2015 at 10:35:36AM +0800, Feng Wu wrote:
> > This patch adds the following arch hooks in scheduler:
> > - vmx_pre_ctx_switch_pi():
> > It is called before context switch, we update the posted
> > interrupt descriptor when the vCPU is preempted, go to sleep,
> > or is blocked.
> > 
> > - vmx_post_ctx_switch_pi()
> > It is called after context switch, we update the posted
> > interrupt descriptor when the vCPU is going to run.
> > 
> > - arch_vcpu_wake_prepare()
> > It will be called when waking up the vCPU, we update
> > the posted interrupt descriptor when the vCPU is unblocked.
> > 
> > CC: Keir Fraser <keir@xen.org>
> > CC: Jan Beulich <jbeulich@suse.com>
> > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > CC: Kevin Tian <kevin.tian@intel.com>
> > CC: George Dunlap <george.dunlap@eu.citrix.com>
> > CC: Dario Faggioli <dario.faggioli@citrix.com>
> > Sugguested-by: Dario Faggioli <dario.faggioli@citrix.com>
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
>
With Konrad's points addressed:

Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 15/17] vmx: Add some scheduler hooks for VT-d posted interrupts
  2015-08-13  9:06     ` Dario Faggioli
@ 2015-08-13  9:15       ` Wu, Feng
  0 siblings, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2015-08-13  9:15 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	xen-devel, Jan Beulich, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Thursday, August 13, 2015 5:06 PM
> To: Wu, Feng
> Cc: Konrad Rzeszutek Wilk; xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser;
> George Dunlap; Andrew Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v5 15/17] vmx: Add some scheduler hooks for
> VT-d posted interrupts
> 
> > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> >
> With Konrad's points addressed:
> 
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Thanks for this!

-Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used
  2015-08-13  8:33     ` Jan Beulich
@ 2015-08-13 18:27       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 47+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-08-13 18:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Keir Fraser, Andrew Cooper, xen-devel, Yang Zhang, Feng Wu

On Thu, Aug 13, 2015 at 02:33:03AM -0600, Jan Beulich wrote:
> >>> On 12.08.15 at 18:23, <konrad.wilk@oracle.com> wrote:
> > On Wed, Aug 12, 2015 at 10:35:32AM +0800, Feng Wu wrote:
> >> +    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
> >> +
> >> +    old_ire = new_ire = *p;
> >> +
> >> +    /* Setup/Update interrupt remapping table entry. */
> >> +    setup_posted_irte(&new_ire, pi_desc, gvec);
> >> +    ret = cmpxchg16b(p, &old_ire, &new_ire);
> >> +
> >> +    ASSERT(ret == *(__uint128_t *)&old_ire);
> >> +
> >> +    iommu_flush_cache_entry(p, sizeof(*p));
> > 
> > The other use sites of iommu_flush_cache_entry are sizeof(struct 
> > iremap_entry).
> > I think if you are doing this modification - then you should also have an
> > patch to modify the other code to be in sync.
> > 
> > Or just use the sizeof(struct ...).
> 
> I asked for this (to avoid proliferation of the bad style); I
> certainly wouldn't mind another cleanup patch to deal with the
> other use sites, but I certainly don't want to see this go back
> to sizeof(struct ...).

Aah, I missed that!

Thank you.
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2015-08-13 18:27 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-12  2:35 [PATCH v5 00/17] Add VT-d Posted-Interrupts support Feng Wu
2015-08-12  2:35 ` [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design Feng Wu
2015-08-12 15:35   ` Konrad Rzeszutek Wilk
2015-08-12 17:27     ` Andrew Cooper
2015-08-12 17:46       ` Konrad Rzeszutek Wilk
2015-08-13  1:37     ` Wu, Feng
2015-08-13  8:29       ` Jan Beulich
2015-08-12  2:35 ` [PATCH v5 02/17] Add cmpxchg16b support for x86-64 Feng Wu
2015-08-12 15:38   ` Konrad Rzeszutek Wilk
2015-08-12  2:35 ` [PATCH v5 03/17] iommu: Add iommu_intpost to control VT-d Posted-Interrupts feature Feng Wu
2015-08-12 15:39   ` Konrad Rzeszutek Wilk
2015-08-12  2:35 ` [PATCH v5 04/17] vt-d: VT-d Posted-Interrupts feature detection Feng Wu
2015-08-12 15:40   ` Konrad Rzeszutek Wilk
2015-08-12  2:35 ` [PATCH v5 05/17] vmx: Extend struct pi_desc to support VT-d Posted-Interrupts Feng Wu
2015-08-12 15:45   ` Konrad Rzeszutek Wilk
2015-08-12  2:35 ` [PATCH v5 06/17] vmx: Add some helper functions for Posted-Interrupts Feng Wu
2015-08-12 15:46   ` Konrad Rzeszutek Wilk
2015-08-12  2:35 ` [PATCH v5 07/17] vmx: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
2015-08-12 17:03   ` Konrad Rzeszutek Wilk
2015-08-12  2:35 ` [PATCH v5 08/17] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
2015-08-12 17:03   ` Konrad Rzeszutek Wilk
2015-08-12  2:35 ` [PATCH v5 09/17] VT-d: Remove pointless casts Feng Wu
2015-08-12 17:03   ` Konrad Rzeszutek Wilk
2015-08-12  2:35 ` [PATCH v5 10/17] vt-d: Extend struct iremap_entry to support VT-d Posted-Interrupts Feng Wu
2015-08-12  2:35 ` [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
2015-08-12 16:23   ` Konrad Rzeszutek Wilk
2015-08-13  8:33     ` Jan Beulich
2015-08-13 18:27       ` Konrad Rzeszutek Wilk
2015-08-12 17:41   ` Andrew Cooper
2015-08-13  1:41     ` Wu, Feng
2015-08-12  2:35 ` [PATCH v5 12/17] Update IRTE according to guest interrupt config changes Feng Wu
2015-08-12 16:43   ` Konrad Rzeszutek Wilk
2015-08-13  1:37     ` Wu, Feng
2015-08-13  8:36       ` Jan Beulich
2015-08-12  2:35 ` [PATCH v5 13/17] vmx: posted-interrupt handling when vCPU is blocked Feng Wu
2015-08-12 16:59   ` Konrad Rzeszutek Wilk
2015-08-12  2:35 ` [PATCH v5 14/17] vmx: Properly handle notification event when vCPU is running Feng Wu
2015-08-12 17:02   ` Konrad Rzeszutek Wilk
2015-08-13  1:37     ` Wu, Feng
2015-08-12  2:35 ` [PATCH v5 15/17] vmx: Add some scheduler hooks for VT-d posted interrupts Feng Wu
2015-08-12 17:13   ` Konrad Rzeszutek Wilk
2015-08-13  9:06     ` Dario Faggioli
2015-08-13  9:15       ` Wu, Feng
2015-08-12  2:35 ` [PATCH v5 16/17] VT-d: Dump the posted format IRTE Feng Wu
2015-08-12 17:14   ` Konrad Rzeszutek Wilk
2015-08-12  2:35 ` [PATCH v5 17/17] Add a command line parameter for VT-d posted-interrupts Feng Wu
2015-08-12 17:16   ` Konrad Rzeszutek Wilk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).