* [PATCH v10 0/7] Add VT-d Posted-Interrupts support
@ 2015-12-03  8:35 Feng Wu
  2015-12-03  8:35 ` [PATCH v10 1/7] VT-d Posted-interrupt (PI) design Feng Wu
                   ` (8 more replies)
  0 siblings, 9 replies; 47+ messages in thread
From: Feng Wu @ 2015-12-03  8:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

You can find the VT-d Posted-Interrupts spec at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

Feng Wu (7):
 r   VT-d Posted-interrupt (PI) design
 ra  vmx: Suppress posting interrupts when 'SN' is set
 r   vt-d: Add API to update IRTE when VT-d PI is used
  a  Update IRTE according to guest interrupt config changes
  a  vmx: Properly handle notification event when vCPU is running
     vmx: VT-d posted-interrupt core logic handling
 ra  Add a command line parameter for VT-d posted-interrupts
 
 r = has been 'Reviewed-by'
 a = has been 'Acked-by'

Feng Wu (7):
  VT-d Posted-interrupt (PI) design
  vmx: Suppress posting interrupts when 'SN' is set
  vt-d: Add API to update IRTE when VT-d PI is used
  Update IRTE according to guest interrupt config changes
  vmx: Properly handle notification event when vCPU is running
  vmx: VT-d posted-interrupt core logic handling
  Add a command line parameter for VT-d posted-interrupts

 docs/misc/vtd-pi.txt                   | 336 +++++++++++++++++++++++++++++++++
 docs/misc/xen-command-line.markdown    |   9 +-
 xen/arch/x86/hvm/hvm.c                 |   6 +
 xen/arch/x86/hvm/vmx/vmcs.c            |   2 +
 xen/arch/x86/hvm/vmx/vmx.c             | 255 ++++++++++++++++++++++++-
 xen/common/schedule.c                  |   4 +
 xen/drivers/passthrough/io.c           | 123 +++++++++++-
 xen/drivers/passthrough/iommu.c        |   3 +
 xen/drivers/passthrough/vtd/intremap.c | 126 +++++++++++++
 xen/drivers/passthrough/vtd/iommu.h    |   6 +
 xen/include/asm-arm/domain.h           |   2 +
 xen/include/asm-x86/domain.h           |   2 +
 xen/include/asm-x86/hvm/hvm.h          |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h     |   9 +
 xen/include/asm-x86/hvm/vmx/vmx.h      |   4 +
 xen/include/asm-x86/iommu.h            |   2 +
 16 files changed, 887 insertions(+), 4 deletions(-)
 create mode 100644 docs/misc/vtd-pi.txt

-- 
2.1.0


* [PATCH v10 1/7] VT-d Posted-interrupt (PI) design
  2015-12-03  8:35 [PATCH v10 0/7] Add VT-d Posted-Interrupts support Feng Wu
@ 2015-12-03  8:35 ` Feng Wu
  2015-12-03  8:35 ` [PATCH v10 2/7] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Feng Wu @ 2015-12-03  8:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Jan Beulich, Feng Wu

Add the design doc for VT-d PI.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Keir Fraser <keir@xen.org>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v10:
- Add some new definitions
- Add more description in the vCPU block section
- Use chapter name instead of chapter number when referring to the SDM
- typo

 docs/misc/vtd-pi.txt | 336 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 336 insertions(+)
 create mode 100644 docs/misc/vtd-pi.txt

diff --git a/docs/misc/vtd-pi.txt b/docs/misc/vtd-pi.txt
new file mode 100644
index 0000000..bdf9be1
--- /dev/null
+++ b/docs/misc/vtd-pi.txt
@@ -0,0 +1,336 @@
+Authors: Feng Wu <feng.wu@intel.com>
+
+VT-d Posted-interrupt (PI) design for Xen
+
+Important Definitions
+==================
+VT-d posted-interrupts: posted-interrupt support on the root-complex side
+CPU-side posted-interrupts: posted-interrupt support on the CPU side
+IRTE: Interrupt Remapping Table Entry
+Posted-interrupt Descriptor Address: the address of the posted-interrupt descriptor
+Virtual Vector: the guest vector of the interrupt
+URG: indicates if the interrupt is urgent
+
+Posted-interrupt descriptor:
+The Posted Interrupt Descriptor hosts the following fields:
+Posted Interrupt Request (PIR): Provides storage for posting (recording) interrupts (one bit
+per vector, for up to 256 vectors).
+
+Outstanding Notification (ON): Indicates whether a notification event is outstanding (not
+yet processed by the processor or software) for this Posted Interrupt Descriptor. Hardware
+modifies this field from 0 to 1 when generating a notification event, and the entity receiving
+the notification event (processor or software) resets it as part of posted-interrupt processing.
+
+Suppress Notification (SN): Indicates whether notification events are to be suppressed (not
+generated) for non-urgent interrupt requests (interrupts processed through an IRTE with
+URG=0).
+
+Notification Vector (NV): Specifies the vector for the notification event (interrupt).
+
+Notification Destination (NDST): Specifies the physical APIC ID of the destination logical
+processor for the notification event.
+
+Background
+==========
+With the development of virtualization, device assignment requirements
+keep growing. However, today when a VM runs with assigned devices (such
+as a NIC), external interrupt handling for the assigned devices always
+needs VMM intervention.
+
+VT-d posted-interrupts are an enhanced way to handle interrupts in a
+virtualization environment. Interrupt posting is the process by which
+an interrupt request is recorded in a memory-resident
+posted-interrupt-descriptor structure by the root-complex or software,
+followed by an optional notification event issued to the CPU.
+
+With VT-d posted-interrupts we get the following advantages:
+- Direct delivery of external interrupts to running vCPUs without VMM
+intervention.
+- Decreased interrupt-migration complexity. On vCPU migration, software
+can atomically co-migrate all interrupts targeting the migrating vCPU. For
+virtual machines with assigned devices, migrating a vCPU across pCPUs
+otherwise either incurs the overhead of forwarding interrupts in software
+(e.g. via VMM-generated IPIs) or the complexity of independently migrating
+each interrupt to the new pCPU. With VT-d PI, the destination vCPU of an
+external interrupt from an assigned device is identified via the IRTE
+(specifically, the Posted-interrupt Descriptor Address); when the vCPU is
+migrated to another pCPU, we set the new pCPU in the 'NDST' field of the
+posted-interrupt descriptor, which makes interrupt migration automatic.
+
+Here is what Xen currently does for external interrupts from assigned devices:
+
+When a VM is running and an external interrupt from an assigned device
+occurs for it, a VM-Exit happens, then:
+
+vmx_do_extint() --> do_IRQ() --> __do_IRQ_guest() --> hvm_do_IRQ_dpci() -->
+raise_softirq_for(pirq_dpci) --> raise_softirq(HVM_DPCI_SOFTIRQ)
+
+softirq HVM_DPCI_SOFTIRQ is bound to dpci_softirq()
+
+dpci_softirq() --> hvm_dirq_assist() --> vmsi_deliver_pirq() --> vmsi_deliver() -->
+vmsi_inj_irq() --> vlapic_set_irq()
+
+vlapic_set_irq() does the following things:
+1. If CPU-side posted-interrupts are supported, call vmx_deliver_posted_intr() to deliver
+the virtual interrupt via the posted-interrupt infrastructure.
+2. Otherwise, set the related vIRR bit in the vLAPIC page and call vcpu_kick() to kick
+the related vCPU. Before VM-Entry, vmx_intr_assist() will inject the interrupt into
+the guest.
+
+However, with VT-d PI, when a guest is running in non-root mode and an
+external interrupt from an assigned device occurs for it, no VM-Exit is needed:
+the guest can handle it entirely in non-root mode, avoiding all of the above
+code flow.
+
+Posted-interrupt Introduction
+========================
+There are two components in the Posted-interrupt architecture:
+Processor Support and Root-Complex Support
+
+- Processor Support
+Posted-interrupt processing is a feature by which a processor processes
+the virtual interrupts by recording them as pending on the virtual-APIC
+page.
+
+Posted-interrupt processing is enabled by setting the process posted
+interrupts VM-execution control. The processing is performed in response
+to the arrival of an interrupt with the posted-interrupt notification vector.
+In response to such an interrupt, the processor processes virtual interrupts
+recorded in a data structure called a posted-interrupt descriptor.
+
+For more information about APICv and CPU-side posted-interrupts, please
+refer to the chapter "APIC VIRTUALIZATION AND VIRTUAL INTERRUPTS" and the
+section "POSTED-INTERRUPT PROCESSING" in the Intel SDM:
+http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
+
+- Root-Complex Support
+Interrupt posting is the process by which an interrupt request (from IOAPIC
+or MSI/MSIx capable sources) is recorded in a memory-resident
+posted-interrupt-descriptor structure by the root-complex, followed by
+an optional notification event issued to the CPU complex. The interrupt
+request arriving at the root-complex carries the identity of the interrupt
+request source and a 'remapping-index'. The remapping-index is used to
+look up an entry in the memory-resident interrupt-remap-table. Unlike
+interrupt-remapping, the interrupt-remap-table-entry for a posted-interrupt
+specifies a virtual-vector and a pointer to the posted-interrupt descriptor.
+The virtual-vector specifies the vector of the interrupt to be recorded in
+the posted-interrupt descriptor. The posted-interrupt descriptor hosts storage
+for the virtual-vectors and contains the attributes of the notification event
+(interrupt) to be issued to the CPU complex to inform CPU/software about pending
+interrupts recorded in the posted-interrupt descriptor.
+
+For more information about VT-d PI, please refer to
+http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
+
+Design Overview
+==============
+In this design, we cover the following items:
+1. Add a variable to control whether to enable VT-d posted-interrupts.
+2. VT-d PI feature detection.
+3. Extend the posted-interrupt descriptor structure to cover VT-d PI specific items.
+4. Extend the IRTE structure to support VT-d PI.
+5. Introduce a new global vector which is used for waking up blocked vCPUs.
+6. Update the IRTE when the guest modifies the interrupt configuration (MSI/MSI-X configuration).
+7. Update the posted-interrupt descriptor during vCPU scheduling.
+8. How to wake up a blocked vCPU when an interrupt is posted for it (wakeup notification handler).
+9. A new Xen boot command line parameter that lets the user control the VT-d PI feature.
+10. Multicast/broadcast and lowest-priority interrupt considerations.
+
+
+Implementation details
+===================
+- New variable to control VT-d PI
+
+Like the variable 'iommu_intremap' for interrupt remapping, it is straightforward
+to add a new one, 'iommu_intpost', for posted-interrupts. 'iommu_intpost' is set
+only when interrupt remapping and VT-d posted-interrupts are both enabled.
+
+- VT-d PI feature detection.
+Bit 59 in VT-d Capability Register is used to report VT-d Posted-interrupt support.
+
+- Extend posted-interrupt descriptor structure to cover VT-d PI specific items.
+Here is the new structure for posted-interrupt descriptor:
+
+struct pi_desc {
+    DECLARE_BITMAP(pir, NR_VECTORS);
+    union {
+        struct
+        {
+        u16 on     : 1,  /* bit 256 - Outstanding Notification */
+            sn     : 1,  /* bit 257 - Suppress Notification */
+            rsvd_1 : 14; /* bit 271:258 - Reserved */
+        u8  nv;          /* bit 279:272 - Notification Vector */
+        u8  rsvd_2;      /* bit 287:280 - Reserved */
+        u32 ndst;        /* bit 319:288 - Notification Destination */
+        };
+        u64 control;
+    };
+    u32 rsvd[6];
+} __attribute__ ((aligned (64)));
+
+- Extend IRTE structure to support VT-d PI.
+
+Here is the new structure for IRTE:
+/* interrupt remap entry */
+struct iremap_entry {
+  union {
+    struct { u64 lo, hi; };
+    struct {
+        u16 p       : 1,
+            fpd     : 1,
+            dm      : 1,
+            rh      : 1,
+            tm      : 1,
+            dlm     : 3,
+            avail   : 4,
+            res_1   : 4;
+        u8  vector;
+        u8  res_2;
+        u32 dst;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_3   : 12;
+        u32 res_4   : 32;
+    } remap;
+    struct {
+        u16 p       : 1,
+            fpd     : 1,
+            res_1   : 6,
+            avail   : 4,
+            res_2   : 2,
+            urg     : 1,
+            im      : 1;
+        u8  vector;
+        u8  res_3;
+        u32 res_4   : 6,
+            pda_l   : 26;
+        u16 sid;
+        u16 sq      : 2,
+            svt     : 2,
+            res_5   : 12;
+        u32 pda_h;
+    } post;
+  };
+};
+
+- Introduce a new global vector which is used to wake up the blocked vCPU.
+
+Currently, there is a global vector 'posted_intr_vector', which is used as the
+global notification vector for all vCPUs in the system. This vector is stored
+in the VMCS; the CPU treats it as a _special_ vector and uses it to notify the
+related pCPU when an interrupt is recorded in the posted-interrupt descriptor.
+
+The CPU handles this existing global vector in a _special_ way compared to
+normal vectors; please refer to the section "POSTED-INTERRUPT PROCESSING" in
+the Intel SDM
+http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
+for more information about how the CPU handles it.
+
+With VT-d PI, the VT-d engine can issue a notification event when an assigned
+device issues an interrupt. We need to add a new global vector to wake up
+blocked vCPUs; a later section of this design explains how it is used.
+
+- Update IRTE when guest modifies the interrupt configuration (MSI/MSIx configuration).
+After VT-d PI is introduced, the format of IRTE is changed as follows:
+	Descriptor Address: the address of the posted-interrupt descriptor
+	Virtual Vector: the guest vector of the interrupt
+	URG: indicates if the interrupt is urgent
+	Other fields continue to have the same meaning
+
+'Descriptor Address' identifies the destination vCPU of this interrupt, since
+each vCPU has a dedicated posted-interrupt descriptor.
+
+'Virtual Vector' gives the guest vector of the interrupt.
+
+When the guest changes the configuration of an interrupt, such as its CPU
+affinity or vector, we need to update the associated IRTE accordingly.
+
+- Update posted-interrupt descriptor during vCPU scheduling
+
+The basic idea here is:
+1. When vCPU is running
+        - Set 'NV' to 'posted_intr_vector'.
+        - Clear 'SN' to accept posted-interrupts.
+        - Set 'NDST' to the pCPU on which the vCPU will be running.
+2. When vCPU is blocked
+        - Set 'NV' to 'pi_wakeup_vector', so we can wake up the
+          related vCPU when a posted-interrupt arrives for it.
+          Please refer to the above section about the new global vector.
+        - Clear 'SN' to accept posted-interrupts.
+3. When vCPU is preempted or sleeping
+        - Set 'SN' to suppress non-urgent interrupts.
+          (Currently, we only support non-urgent interrupts.)
+          A preempted or sleeping vCPU doesn't need to accept posted-interrupt
+          notification events, since we don't change the scheduler's behavior
+          when an interrupt occurs; we still wait for the next scheduling of
+          the vCPU. When external interrupts from assigned devices occur, they
+          are recorded in PIR and synced to vIRR before VM-Entry.
+        - Set 'NV' to 'posted_intr_vector'.
+
+- How to wakeup blocked vCPU when an interrupt is posted for it (wakeup notification handler).
+
+Here is the scenario for the usage of the new global vector:
+
+1. vCPU0 is running on pCPU0
+2. vCPU0 is blocked and vCPU1 is currently running on pCPU0
+3. An external interrupt from an assigned device occurs for vCPU0. If we
+still used 'posted_intr_vector' as the notification vector for vCPU0, the
+notification event for vCPU0 (which goes to pCPU0) would be consumed by
+vCPU1 incorrectly (remember this is a special vector to the CPU). In the
+worst case, vCPU0 would never be woken up again, since its wakeup events
+are always consumed by other vCPUs incorrectly. So we need to introduce
+another global vector, named 'pi_wakeup_vector', to wake up blocked vCPUs.
+
+After switching vCPU0 to 'pi_wakeup_vector', the VT-d engine issues
+notification events using this new vector. Since this new vector is not a
+special one to the CPU, it is handled as a normal external interrupt, so we
+get control in its handler; there the hypervisor can act, for example by
+waking up the blocked vCPU.
+
+Here is what we do for blocked vCPUs:
+1. Define a per-cpu list 'pi_blocked_vcpu', which stores the blocked
+vCPUs on that pCPU.
+2. When a vCPU is about to block, insert it into the per-cpu list of
+the pCPU it was running on.
+3. When the vCPU is unblocked, remove it from the related pCPU list.
+
+In the handler of 'pi_wakeup_vector', we do:
+1. Get the current physical CPU.
+2. Iterate over the list 'pi_blocked_vcpu' of the current pCPU; for each
+entry whose 'ON' bit is set, unblock the associated vCPU.
+
+When a vCPU blocks, we change its posted-interrupt descriptor and put it
+on the pCPU's blocking list. We don't change the descriptor back when the
+vCPU is unblocked, or when the blocking operation returns directly because
+there are events to be delivered; instead, we do that exactly before
+VM-Entry.
+
+- New boot command line parameter for Xen, letting the user control VT-d PI.
+
+Like 'intremap' for interrupt remapping, we add a new boot command line
+parameter 'intpost' for posted-interrupts.
+
+- Multicast/broadcast and lowest priority interrupts consideration.
+
+With VT-d PI, the destination-vCPU information of an external interrupt
+from an assigned device is stored in the IRTE. This leads to the following
+design considerations:
+1. Multicast/broadcast interrupts cannot be posted.
+2. For lowest-priority interrupts, newer Intel CPUs/chipsets/root-complexes
+(starting from Nehalem) ignore the TPR value and instead support two other
+ways (configurable by the BIOS) of handling lowest-priority interrupts:
+	A) Round robin: In this method, the chipset simply delivers lowest-priority
+interrupts in a round-robin manner across all the available logical CPUs. While
+this provides good load balancing, it is not always the best thing to do, as
+interrupts from the same device (such as a NIC) end up running on all the CPUs,
+thrashing caches and taking locks. This led to the next scheme.
+	B) Vector hashing: In this method, hardware applies a hash function to
+the vector value in the interrupt request and uses that hash to pick the logical
+CPU to which the lowest-priority interrupt is routed. This way, a given vector
+always goes to the same logical CPU, avoiding the thrashing problem above.
+
+So the gist of the above is that lowest-priority interrupts have never really
+been delivered as "lowest priority" by physical hardware.
+
+Vector hashing is used in this design.
-- 
2.1.0


* [PATCH v10 2/7] vmx: Suppress posting interrupts when 'SN' is set
  2015-12-03  8:35 [PATCH v10 0/7] Add VT-d Posted-Interrupts support Feng Wu
  2015-12-03  8:35 ` [PATCH v10 1/7] VT-d Posted-interrupt (PI) design Feng Wu
@ 2015-12-03  8:35 ` Feng Wu
  2015-12-03  8:35 ` [PATCH v10 3/7] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Feng Wu @ 2015-12-03  8:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

Currently, we don't support urgent interrupts; all interrupts
are recognized as non-urgent, so we must not send a
posted-interrupt notification when 'SN' is set.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: <kevin.tian@intel.com>
---
v10:
- Changed some comments to make them clear.

v8:
- Parenthesize '1 << POSTED_INTR_ON' and '1 << POSTED_INTR_SN'

v7:
- Coding style
- Read the current pi_desc.control as the initial value of prev.control

v6:
- Add some comments

v5:
- keep the vcpu_kick() at the end of vmx_deliver_posted_intr()
- Keep the 'return' after calling __vmx_deliver_posted_interrupt()

v4:
- Coding style.
- V3 removes a vcpu_kick() from the eoi_exitmap_changed path
  incorrectly, fix it.

v3:
- use cmpxchg to test SN/ON and set ON

 xen/arch/x86/hvm/vmx/vmx.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 9ad6d82..91d80b3 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1718,8 +1718,35 @@ static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector)
          */
         pi_set_on(&v->arch.hvm_vmx.pi_desc);
     }
-    else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
+    else
     {
+        struct pi_desc old, new, prev;
+
+        prev.control = v->arch.hvm_vmx.pi_desc.control;
+
+        do {
+            /*
+             * Currently, we don't support urgent interrupts; all
+             * interrupts are recognized as non-urgent. Besides that,
+             * if 'ON' is already set, there is no need to send a
+             * posted-interrupt notification event either, according
+             * to hardware behavior.
+             */
+            if ( pi_test_sn(&prev) || pi_test_on(&prev) )
+            {
+                vcpu_kick(v);
+                return;
+            }
+
+            old.control = v->arch.hvm_vmx.pi_desc.control &
+                          ~((1 << POSTED_INTR_ON) | (1 << POSTED_INTR_SN));
+            new.control = v->arch.hvm_vmx.pi_desc.control |
+                          (1 << POSTED_INTR_ON);
+
+            prev.control = cmpxchg(&v->arch.hvm_vmx.pi_desc.control,
+                                   old.control, new.control);
+        } while ( prev.control != old.control );
+
         __vmx_deliver_posted_interrupt(v);
         return;
     }
-- 
2.1.0


* [PATCH v10 3/7] vt-d: Add API to update IRTE when VT-d PI is used
  2015-12-03  8:35 [PATCH v10 0/7] Add VT-d Posted-Interrupts support Feng Wu
  2015-12-03  8:35 ` [PATCH v10 1/7] VT-d Posted-interrupt (PI) design Feng Wu
  2015-12-03  8:35 ` [PATCH v10 2/7] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
@ 2015-12-03  8:35 ` Feng Wu
  2015-12-10 10:52   ` Tian, Kevin
  2015-12-03  8:35 ` [PATCH v10 4/7] Update IRTE according to guest interrupt config changes Feng Wu
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-12-03  8:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

This patch adds an API which is used to update the IRTE
for posted-interrupt when guest changes MSI/MSI-X information.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v10:
- Add some comments

v8:
- Some minor adjustment

v7:
- Remove __uint128_t cast
- Remove Kevin's Ack due to a bug fix for v6
- Reword some comments
- Setup posted IRTE from zeroed structure

v6:
- In some error cases, the desc->lock will be unlocked twice, fix it.
- Coding style fix.
- Add some comments.

v5:
- Make some function parameters const
- Call "spin_unlock_irq(&desc->lock);" a little earlier
- Add "ASSERT(spin_is_locked(&pcidevs_lock))"
- -EBADSLT -> -ENODEV, EBADSLT is removed in the latest Xen

v4:
- Don't inline setup_posted_irte()
- const struct pi_desc *pi_desc for setup_posted_irte()
- return -EINVAL when pirq_spin_lock_irq_desc() fails.
- Make some variables const
- Release irq desc lock earlier in pi_update_irte()
- Remove the pointless do-while() loop when doing cmpxchg16b()

v3:
- Remove "adding PDA_MASK()" when updating 'pda_l' and 'pda_h' for IRTE.
- Change the return type of pi_update_irte() to int.
- Remove some pointless printk message in pi_update_irte().
- Use structure assignment instead of memcpy() for irte copy.

 xen/drivers/passthrough/vtd/intremap.c | 126 +++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/vtd/iommu.h    |   6 ++
 xen/include/asm-x86/iommu.h            |   2 +
 3 files changed, 134 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
index ed9fb82..0a48cd4 100644
--- a/xen/drivers/passthrough/vtd/intremap.c
+++ b/xen/drivers/passthrough/vtd/intremap.c
@@ -905,3 +905,129 @@ void iommu_disable_x2apic_IR(void)
     for_each_drhd_unit ( drhd )
         disable_qinval(drhd->iommu);
 }
+
+static void setup_posted_irte(
+    struct iremap_entry *new_ire, const struct iremap_entry *old_ire,
+    const struct pi_desc *pi_desc, const uint8_t gvec)
+{
+    memset(new_ire, 0, sizeof(*new_ire));
+
+    /*
+     * The 'im' field decides whether the IRTE is in posted format (value 1)
+     * or remapped format (value 0). If the old IRTE is in remapped format,
+     * we copy fields from the remapped part of 'struct iremap_entry';
+     * otherwise, we copy from the posted part.
+     */
+    if ( !old_ire->remap.im )
+    {
+        new_ire->post.p = old_ire->remap.p;
+        new_ire->post.fpd = old_ire->remap.fpd;
+        new_ire->post.sid = old_ire->remap.sid;
+        new_ire->post.sq = old_ire->remap.sq;
+        new_ire->post.svt = old_ire->remap.svt;
+    }
+    else
+    {
+        new_ire->post.p = old_ire->post.p;
+        new_ire->post.fpd = old_ire->post.fpd;
+        new_ire->post.sid = old_ire->post.sid;
+        new_ire->post.sq = old_ire->post.sq;
+        new_ire->post.svt = old_ire->post.svt;
+        new_ire->post.urg = old_ire->post.urg;
+    }
+
+    new_ire->post.im = 1;
+    new_ire->post.vector = gvec;
+    new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
+    new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
+}
+
+/*
+ * This function is used to update the IRTE for posted-interrupt
+ * when guest changes MSI/MSI-X information.
+ */
+int pi_update_irte(const struct vcpu *v, const struct pirq *pirq,
+    const uint8_t gvec)
+{
+    struct irq_desc *desc;
+    const struct msi_desc *msi_desc;
+    int remap_index;
+    int rc = 0;
+    const struct pci_dev *pci_dev;
+    const struct acpi_drhd_unit *drhd;
+    struct iommu *iommu;
+    struct ir_ctrl *ir_ctrl;
+    struct iremap_entry *iremap_entries = NULL, *p = NULL;
+    struct iremap_entry new_ire, old_ire;
+    const struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+    __uint128_t ret;
+
+    desc = pirq_spin_lock_irq_desc(pirq, NULL);
+    if ( !desc )
+        return -EINVAL;
+
+    msi_desc = desc->msi_desc;
+    if ( !msi_desc )
+    {
+        rc = -ENODEV;
+        goto unlock_out;
+    }
+
+    pci_dev = msi_desc->dev;
+    if ( !pci_dev )
+    {
+        rc = -ENODEV;
+        goto unlock_out;
+    }
+
+    remap_index = msi_desc->remap_index;
+
+    spin_unlock_irq(&desc->lock);
+
+    ASSERT(spin_is_locked(&pcidevs_lock));
+
+    /*
+     * FIXME: For performance reasons we should store the 'iommu' pointer in
+     * 'struct msi_desc' in some other place, so we don't need to waste
+     * time searching it here.
+     */
+    drhd = acpi_find_matched_drhd_unit(pci_dev);
+    if ( !drhd )
+        return -ENODEV;
+
+    iommu = drhd->iommu;
+    ir_ctrl = iommu_ir_ctrl(iommu);
+    if ( !ir_ctrl )
+        return -ENODEV;
+
+    spin_lock_irq(&ir_ctrl->iremap_lock);
+
+    GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
+
+    old_ire = *p;
+
+    /* Setup/Update interrupt remapping table entry. */
+    setup_posted_irte(&new_ire, &old_ire, pi_desc, gvec);
+    ret = cmpxchg16b(p, &old_ire, &new_ire);
+
+    /*
+     * Above, we use cmpxchg16b to atomically update the 128-bit IRTE, and
+     * the hardware cannot update the IRTE behind us, so the return value
+     * of cmpxchg16b should be the same as old_ire. This ASSERT validates it.
+     */
+    ASSERT(ret == old_ire.val);
+
+    iommu_flush_cache_entry(p, sizeof(*p));
+    iommu_flush_iec_index(iommu, 0, remap_index);
+
+    unmap_vtd_domain_page(iremap_entries);
+
+    spin_unlock_irq(&ir_ctrl->iremap_lock);
+
+    return 0;
+
+ unlock_out:
+    spin_unlock_irq(&desc->lock);
+
+    return rc;
+}
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index b440b69..c55ee08 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -323,6 +323,12 @@ struct iremap_entry {
   };
 };
 
+/*
+ * The posted-interrupt descriptor address is 64 bits and 64-byte aligned;
+ * only the upper 26 bits of its least significant 32 bits are available.
+ */
+#define PDA_LOW_BIT    26
+
 /* Max intr remapping table page order is 8, as max number of IRTEs is 64K */
 #define IREMAP_PAGE_ORDER  8
 
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index 7e24b1a..f22b3a5 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -31,6 +31,8 @@ bool_t iommu_supports_eim(void);
 int iommu_enable_x2apic_IR(void);
 void iommu_disable_x2apic_IR(void);
 
+int pi_update_irte(const struct vcpu *v, const struct pirq *pirq, const uint8_t gvec);
+
 #endif /* !__ARCH_X86_IOMMU_H__ */
 /*
  * Local variables:
-- 
2.1.0


* [PATCH v10 4/7] Update IRTE according to guest interrupt config changes
  2015-12-03  8:35 [PATCH v10 0/7] Add VT-d Posted-Interrupts support Feng Wu
                   ` (2 preceding siblings ...)
  2015-12-03  8:35 ` [PATCH v10 3/7] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
@ 2015-12-03  8:35 ` Feng Wu
  2015-12-10 10:53   ` Tian, Kevin
  2015-12-03  8:35 ` [PATCH v10 5/7] vmx: Properly handle notification event when vCPU is running Feng Wu
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Feng Wu @ 2015-12-03  8:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu, Jan Beulich

When guest changes its interrupt configuration (such as, vector, etc.)
for direct-assigned devices, we need to update the associated IRTE
with the new guest vector, so external interrupts from the assigned
devices can be injected to guests without VM-Exit.

For lowest-priority interrupts, we use a vector-hashing mechanism to find
the destination vCPU. This follows the hardware behavior, since modern
Intel CPUs use vector hashing to handle lowest-priority interrupts.

Multicast/broadcast interrupts cannot be handled via interrupt posting;
they still use interrupt remapping.

CC: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v8:
- Remove local variable 'bitmap_array_size'
- Use switch to replace if-else

v7:
- Remove some pointless debug printk
- Fix a logic error when assigning 'delivery_mode'
- Adjust the definition of local variable 'idx'
- Add a dprintk if we cannot find the vCPU from 'pi_find_dest_vcpu'
- Add 'else if ( delivery_mode == dest_Fixed )' in 'pi_find_dest_vcpu'

v6:
- Use macro to replace plain numbers
- Correct the overflow error in a loop

v5:
- Make 'struct vcpu *vcpu' const

v4:
- Make some 'int' variables 'unsigned int' in pi_find_dest_vcpu()
- Make 'dest_id' uint32_t
- Rename 'size' to 'bitmap_array_size'
- find_next_bit() and find_first_bit() always return unsigned int,
  so no need to check whether the return value is less than 0.
- Message error level XENLOG_G_WARNING -> XENLOG_G_INFO
- Remove useless warning message
- Create a separate function vector_hashing_dest() to find the
  destination of lowest-priority interrupts.
- Change some comments

v3:
- Use bitmap to store the all the possible destination vCPUs of an
  interrupt, then trying to find the right destination from the bitmap
- Typo and some small changes

 xen/drivers/passthrough/io.c | 123 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 122 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index bda9374..6b1ee6a 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -25,6 +25,7 @@
 #include <asm/hvm/iommu.h>
 #include <asm/hvm/support.h>
 #include <xen/hvm/irq.h>
+#include <asm/io_apic.h>
 
 static DEFINE_PER_CPU(struct list_head, dpci_list);
 
@@ -198,6 +199,108 @@ void free_hvm_irq_dpci(struct hvm_irq_dpci *dpci)
     xfree(dpci);
 }
 
+/*
+ * This routine handles lowest-priority interrupts using the vector-hashing
+ * mechanism. As an example, modern Intel CPUs use this method to handle
+ * lowest-priority interrupts.
+ *
+ * Here are the details of the vector-hashing mechanism:
+ * 1. For lowest-priority interrupts, store all the possible destination
+ *    vCPUs in an array.
+ * 2. Use "gvec % max number of destination vCPUs" to find the right
+ *    destination vCPU in the array for the lowest-priority interrupt.
+ */
+static struct vcpu *vector_hashing_dest(const struct domain *d,
+                                        uint32_t dest_id,
+                                        bool_t dest_mode,
+                                        uint8_t gvec)
+
+{
+    unsigned long *dest_vcpu_bitmap;
+    unsigned int dest_vcpus = 0;
+    struct vcpu *v, *dest = NULL;
+    unsigned int i;
+
+    dest_vcpu_bitmap = xzalloc_array(unsigned long,
+                                     BITS_TO_LONGS(d->max_vcpus));
+    if ( !dest_vcpu_bitmap )
+        return NULL;
+
+    for_each_vcpu ( d, v )
+    {
+        if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
+                                dest_id, dest_mode) )
+            continue;
+
+        __set_bit(v->vcpu_id, dest_vcpu_bitmap);
+        dest_vcpus++;
+    }
+
+    if ( dest_vcpus != 0 )
+    {
+        unsigned int mod = gvec % dest_vcpus;
+        unsigned int idx = 0;
+
+        for ( i = 0; i <= mod; i++ )
+        {
+            idx = find_next_bit(dest_vcpu_bitmap, d->max_vcpus, idx) + 1;
+            BUG_ON(idx >= d->max_vcpus);
+        }
+
+        dest = d->vcpu[idx - 1];
+    }
+
+    xfree(dest_vcpu_bitmap);
+
+    return dest;
+}
+
+/*
+ * The purpose of this routine is to find the right destination vCPU for
+ * an interrupt which will be delivered by VT-d posted-interrupt. There
+ * are several cases as below:
+ *
+ * - For lowest-priority interrupts, use vector-hashing mechanism to find
+ *   the destination.
+ * - Otherwise, for single destination interrupt, it is straightforward to
+ *   find the destination vCPU and return true.
+ * - For multicast/broadcast interrupts, we cannot handle them via interrupt
+ *   posting, so return NULL.
+ */
+static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t dest_id,
+                                      bool_t dest_mode, uint8_t delivery_mode,
+                                      uint8_t gvec)
+{
+    unsigned int dest_vcpus = 0;
+    struct vcpu *v, *dest = NULL;
+
+    switch ( delivery_mode )
+    {
+    case dest_LowestPrio:
+        return vector_hashing_dest(d, dest_id, dest_mode, gvec);
+    case dest_Fixed:
+        for_each_vcpu ( d, v )
+        {
+            if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, APIC_DEST_NOSHORT,
+                                    dest_id, dest_mode) )
+                continue;
+
+            dest_vcpus++;
+            dest = v;
+        }
+
+        /* For fixed mode, we only handle single-destination interrupts. */
+        if ( dest_vcpus == 1 )
+            return dest;
+
+        break;
+    default:
+        break;
+    }
+
+    return NULL;
+}
+
 int pt_irq_create_bind(
     struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
 {
@@ -256,7 +359,7 @@ int pt_irq_create_bind(
     {
     case PT_IRQ_TYPE_MSI:
     {
-        uint8_t dest, dest_mode;
+        uint8_t dest, dest_mode, delivery_mode;
         int dest_vcpu_id;
 
         if ( !(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
@@ -329,11 +432,29 @@ int pt_irq_create_bind(
         /* Calculate dest_vcpu_id for MSI-type pirq migration. */
         dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
         dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
+        delivery_mode = (pirq_dpci->gmsi.gflags & VMSI_DELIV_MASK) >>
+                         GFLAGS_SHIFT_DELIV_MODE;
+
         dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
         pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
         spin_unlock(&d->event_lock);
         if ( dest_vcpu_id >= 0 )
             hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
+
+        /* Use interrupt posting if it is supported. */
+        if ( iommu_intpost )
+        {
+            const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest, dest_mode,
+                                          delivery_mode, pirq_dpci->gmsi.gvec);
+
+            if ( vcpu )
+                pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
+            else
+                dprintk(XENLOG_G_INFO,
+                        "d%d: deliver interrupt in remapping mode, gvec:%02x\n",
+                        d->domain_id, pirq_dpci->gmsi.gvec);
+        }
+
         break;
     }
 
-- 
2.1.0


* [PATCH v10 5/7] vmx: Properly handle notification event when vCPU is running
  2015-12-03  8:35 [PATCH v10 0/7] Add VT-d Posted-Interrupts support Feng Wu
                   ` (3 preceding siblings ...)
  2015-12-03  8:35 ` [PATCH v10 4/7] Update IRTE according to guest interrupt config changes Feng Wu
@ 2015-12-03  8:35 ` Feng Wu
  2015-12-03  8:35 ` [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling Feng Wu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Feng Wu @ 2015-12-03  8:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Kevin Tian, Feng Wu, Jan Beulich, Andrew Cooper

When a vCPU is running in root mode and a notification event
has been injected to it, we need to set VCPU_KICK_SOFTIRQ for
the current pCPU, so the pending interrupt in PIRR will be
synced to vIRR before VM-Exit in time.

CC: Kevin Tian <kevin.tian@intel.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
v7:
- Retain 'cli' in the comments to make it more understandable.
- Register another notification event handler when VT-d PI is enabled

v6:
- Ack the interrupt in the beginning of pi_notification_interrupt()

v4:
- Coding style.

v3:
- Make pi_notification_interrupt() static

 xen/arch/x86/hvm/vmx/vmx.c | 54 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 91d80b3..39dc500 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2014,6 +2014,53 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
+/* Handle VT-d posted-interrupt when VCPU is running. */
+static void pi_notification_interrupt(struct cpu_user_regs *regs)
+{
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+
+    /*
+     * We get here when a vCPU is running in root mode (e.g. via a hypercall,
+     * or any other reason which can result in a VM-Exit), and before the vCPU
+     * returns to non-root mode, external interrupts from an assigned device
+     * arrive and a notification event is delivered to this logical CPU.
+     *
+     * We need to set VCPU_KICK_SOFTIRQ for the current pCPU, just like
+     * __vmx_deliver_posted_interrupt(), so the pending interrupt in PIRR will
+     * be synced to vIRR before VM-Exit in time.
+     *
+     * Please refer to the following code fragments from
+     * xen/arch/x86/hvm/vmx/entry.S:
+     *
+     *     .Lvmx_do_vmentry
+     *
+     *      ......
+     *
+     *      point 1
+     *
+     *      cli
+     *      cmp  %ecx,(%rdx,%rax,1)
+     *      jnz  .Lvmx_process_softirqs
+     *
+     *      ......
+     *
+     *      je   .Lvmx_launch
+     *
+     *      ......
+     *
+     *     .Lvmx_process_softirqs:
+     *      sti
+     *      call do_softirq
+     *      jmp  .Lvmx_do_vmentry
+     *
+     * If VT-d engine issues a notification event at point 1 above, it cannot
+     * be delivered to the guest during this VM-entry without raising the
+     * softirq in this notification handler.
+     */
+    raise_softirq(VCPU_KICK_SOFTIRQ);
+}
+
 const struct hvm_function_table * __init start_vmx(void)
 {
     set_in_cr4(X86_CR4_VMXE);
@@ -2051,7 +2098,12 @@ const struct hvm_function_table * __init start_vmx(void)
     }
 
     if ( cpu_has_vmx_posted_intr_processing )
-        alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+    {
+        if ( iommu_intpost )
+            alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
+        else
+            alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
+    }
     else
     {
         vmx_function_table.deliver_posted_intr = NULL;
-- 
2.1.0


* [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-03  8:35 [PATCH v10 0/7] Add VT-d Posted-Interrupts support Feng Wu
                   ` (4 preceding siblings ...)
  2015-12-03  8:35 ` [PATCH v10 5/7] vmx: Properly handle notification event when vCPU is running Feng Wu
@ 2015-12-03  8:35 ` Feng Wu
  2015-12-10 11:40   ` Tian, Kevin
                     ` (3 more replies)
  2015-12-03  8:35 ` [PATCH v10 7/7] Add a command line parameter for VT-d posted-interrupts Feng Wu
                   ` (2 subsequent siblings)
  8 siblings, 4 replies; 47+ messages in thread
From: Feng Wu @ 2015-12-03  8:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Feng Wu

This is the core logic handling for VT-d posted-interrupts. Basically it
deals with how and when to update posted-interrupts during the following
scenarios:
- vCPU is preempted
- vCPU is slept
- vCPU is blocked

When a vCPU is preempted or put to sleep, we update the posted-interrupt
descriptor during scheduling by introducing two new architectural
scheduler hooks: vmx_pi_switch_from() and vmx_pi_switch_to(). When a vCPU
is blocked, we introduce a new architectural hook, arch_vcpu_block(), to
update the posted-interrupt descriptor.

Besides that, before VM entry we make sure the 'NV' field is set
to 'posted_intr_vector' and the vCPU is not on any blocking list, which
is needed while the vCPU is running in non-root mode. The reason for this
check is that we change the posted-interrupt descriptor in vcpu_block(),
but we don't change it back in vcpu_unblock() or when vcpu_block()
returns directly due to event delivery (in fact, we don't need to do it
in those two places, which is why we do it before VM entry).

When handling the lazy context switch for the following two scenarios:
- Preempted by a tasklet, which runs in an idle context.
- The previous vCPU is offline and there are no runnable vCPUs in the
  run queue.
we don't change the 'SN' bit in the posted-interrupt descriptor. This
may incur spurious PI notification events, but since a PI notification
event is only sent when 'ON' is clear, and once the PI notification
is sent, 'ON' is set by hardware, there are no more notification events
before 'ON' is cleared again. Besides that, spurious PI notification
events happen from time to time in the Xen hypervisor anyway; for
example, when a guest traps to Xen and a PI notification event arrives,
there is nothing Xen actually needs to do about it, and the interrupts
will be delivered to the guest at the next VM entry.

CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
Suggested-by: Dario Faggioli <dario.faggioli@citrix.com>
Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v10:
- Check iommu_intpost first
- Remove pointless checking of has_hvm_container_vcpu(v)
- Rename 'vmx_pi_state_change' to 'vmx_pi_state_to_normal'
- Since vcpu_unblock() doesn't acquire 'pi_blocked_vcpu_lock', we
  don't need use another list to save the vCPUs with 'ON' set, just
  directly call vcpu_unblock(v).

v9:
- Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare()
- Add vmx_pi_state_change() and call it before VM Entry

v8:
- Remove the lazy context switch handling for PI state transition
- Change PI state in vcpu_block() and do_poll() when the vCPU
  is going to be blocked

v7:
- Merge [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts
  and "[PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked"
  into this patch, so it is self-contained and more convenient
  for code review.
- Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
- Coding style
- Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
- Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
- Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
- Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
- Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
- Use 'spin_lock' and 'spin_unlock' when the interrupt has been
  already disabled.
- Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
- Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
- Call .pi_ctxt_switch_to() in __context_switch() instead of directly
  calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
- Make .pi_block_cpu unsigned int
- Use list_del() instead of list_del_init()
- Coding style

One remaining item in v7:
Jan has concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi(),
need Dario or George's input about this.

v6:
- Add two static inline functions for pi context switch
- Fix typos

v5:
- Rename arch_vcpu_wake to arch_vcpu_wake_prepare
- Make arch_vcpu_wake_prepare() inline for ARM
- Merge the ARM dummy hooks together
- Changes to some code comments
- Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if
  PI is disabled or the vCPU is not in HVM
- Coding style

v4:
- Newly added

Changelog for "vmx: posted-interrupt handling when vCPU is blocked"
v6:
- Fix some typos
- Ack the interrupt right after the spin_unlock in pi_wakeup_interrupt()

v4:
- Use local variables in pi_wakeup_interrupt()
- Remove vcpu from the blocked list when pi_desc.on==1; this
  avoids kicking the vcpu multiple times.
- Remove tasklet

v3:
- This patch is generated by merging the following three patches in v2:
   [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
   [RFC v2 10/15] vmx: Define two per-cpu variables
   [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
- rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
- Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
- rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
- Make pi_wakeup_interrupt() static
- Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
- move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
- Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
- Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'

 xen/arch/x86/hvm/hvm.c             |   6 ++
 xen/arch/x86/hvm/vmx/vmcs.c        |   2 +
 xen/arch/x86/hvm/vmx/vmx.c         | 172 +++++++++++++++++++++++++++++++++++++
 xen/common/schedule.c              |   4 +
 xen/include/asm-arm/domain.h       |   2 +
 xen/include/asm-x86/domain.h       |   2 +
 xen/include/asm-x86/hvm/hvm.h      |   2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   9 ++
 xen/include/asm-x86/hvm/vmx/vmx.h  |   4 +
 9 files changed, 203 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6c2b512..3368cf2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -7019,6 +7019,12 @@ void hvm_domain_soft_reset(struct domain *d)
     hvm_destroy_all_ioreq_servers(d);
 }
 
+void arch_vcpu_block(struct vcpu *v)
+{
+    if ( v->arch.vcpu_block )
+        v->arch.vcpu_block(v);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 000d06e..0f23fce 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -676,6 +676,8 @@ int vmx_cpu_up(void)
     if ( cpu_has_vmx_vpid )
         vpid_sync_all();
 
+    vmx_pi_per_cpu_init(cpu);
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 39dc500..0d9462e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
 static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
 
+/*
+ * We maintain a per-CPU linked list of vCPUs, so in the PI wakeup handler
+ * we can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
+static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
+
 uint8_t __read_mostly posted_intr_vector;
+uint8_t __read_mostly pi_wakeup_vector;
+
+void vmx_pi_per_cpu_init(unsigned int cpu)
+{
+    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
+    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
+}
+
+void vmx_vcpu_block(struct vcpu *v)
+{
+    unsigned long flags;
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !has_arch_pdevs(v->domain) )
+        return;
+
+    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
+
+    /*
+     * The vCPU is blocking; we need to add it to one of the per-pCPU lists.
+     * We save v->processor in v->arch.hvm_vmx.pi_block_cpu and use it for
+     * the per-CPU list; we also save it in the posted-interrupt descriptor,
+     * making it the destination of the wake-up notification event.
+     */
+    v->arch.hvm_vmx.pi_block_cpu = v->processor;
+
+    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
+                      v->arch.hvm_vmx.pi_block_cpu), flags);
+    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
+                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
+    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
+                           v->arch.hvm_vmx.pi_block_cpu), flags);
+
+    ASSERT(!pi_test_sn(pi_desc));
+
+    /*
+     * We don't need to set the 'NDST' field, since it should point to
+     * the same pCPU as v->processor.
+     */
+
+    write_atomic(&pi_desc->nv, pi_wakeup_vector);
+}
+
+static void vmx_pi_switch_from(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !iommu_intpost || !has_arch_pdevs(v->domain) ||
+         test_bit(_VPF_blocked, &v->pause_flags) )
+        return;
+
+    /*
+     * The vCPU has been preempted or went to sleep. We don't need to send
+     * a notification event to a non-running vCPU; the interrupt information
+     * will be delivered to it before VM entry when the vCPU is scheduled
+     * to run next time.
+     */
+    pi_set_sn(pi_desc);
+}
+
+static void vmx_pi_switch_to(struct vcpu *v)
+{
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
+        return;
+
+    if ( x2apic_enabled )
+        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
+    else
+        write_atomic(&pi_desc->ndst,
+                     MASK_INSR(cpu_physical_id(v->processor),
+                     PI_xAPIC_NDST_MASK));
+
+    pi_clear_sn(pi_desc);
+}
+
+static void vmx_pi_state_to_normal(struct vcpu *v)
+{
+    unsigned long flags;
+    unsigned int pi_block_cpu;
+    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
+
+    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
+        return;
+
+    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
+
+    /*
+     * Set 'NV' field back to posted_intr_vector, so the
+     * Posted-Interrupts can be delivered to the vCPU when
+     * it is running in non-root mode.
+     */
+    if ( pi_desc->nv != posted_intr_vector )
+        write_atomic(&pi_desc->nv, posted_intr_vector);
+
+    /* the vCPU is not on any blocking list. */
+    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
+    if ( pi_block_cpu == NR_CPUS )
+        return;
+
+    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
+
+    /*
+     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
+     * removed from the blocking list while we are acquiring the lock.
+     */
+    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
+    {
+        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
+        return;
+    }
+
+    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
+    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
+}
 
 static int vmx_domain_initialise(struct domain *d)
 {
@@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
     spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
 
+    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
+
+    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
+
     v->arch.schedule_tail    = vmx_do_resume;
     v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
     v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
 
+    if ( iommu_intpost )
+        v->arch.vcpu_block = vmx_vcpu_block;
+
     if ( (rc = vmx_create_vmcs(v)) != 0 )
     {
         dprintk(XENLOG_WARNING,
@@ -734,6 +865,7 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
     vmx_save_guest_msrs(v);
     vmx_restore_host_msrs();
     vmx_save_dr(v);
+    vmx_pi_switch_from(v);
 }
 
 static void vmx_ctxt_switch_to(struct vcpu *v)
@@ -758,6 +890,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
 
     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
+    vmx_pi_switch_to(v);
 }
 
 
@@ -2014,6 +2147,40 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
+/* Handle VT-d posted-interrupt when VCPU is blocked. */
+static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
+{
+    struct arch_vmx_struct *vmx, *tmp;
+    struct vcpu *v;
+    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
+    struct list_head *blocked_vcpus =
+                       &per_cpu(pi_blocked_vcpu, smp_processor_id());
+    LIST_HEAD(list);
+
+    ack_APIC_irq();
+    this_cpu(irq_count)++;
+
+    spin_lock(lock);
+
+    /*
+     * XXX: The length of the list depends on how many vCPUs are currently
+     * blocked on this specific pCPU. This may hurt interrupt latency if
+     * the list grows to too many entries.
+     */
+    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
+    {
+        if ( pi_test_on(&vmx->pi_desc) )
+        {
+            list_del(&vmx->pi_blocked_vcpu_list);
+            vmx->pi_block_cpu = NR_CPUS;
+            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
+            vcpu_unblock(v);
+        }
+    }
+
+    spin_unlock(lock);
+}
+
 /* Handle VT-d posted-interrupt when VCPU is running. */
 static void pi_notification_interrupt(struct cpu_user_regs *regs)
 {
@@ -2100,7 +2267,10 @@ const struct hvm_function_table * __init start_vmx(void)
     if ( cpu_has_vmx_posted_intr_processing )
     {
         if ( iommu_intpost )
+        {
             alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
+            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
+        }
         else
             alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
     }
@@ -3543,6 +3713,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
     struct hvm_vcpu_asid *p_asid;
     bool_t need_flush;
 
+    vmx_pi_state_to_normal(curr);
+
     if ( !cpu_has_vmx_vpid )
         goto out;
     if ( nestedhvm_vcpu_in_guestmode(curr) )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index c195129..fc18035 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -802,6 +802,8 @@ void vcpu_block(void)
 
     set_bit(_VPF_blocked, &v->pause_flags);
 
+    arch_vcpu_block(v);
+
     /* Check for events /after/ blocking: avoids wakeup waiting race. */
     if ( local_events_need_delivery() )
     {
@@ -839,6 +841,8 @@ static long do_poll(struct sched_poll *sched_poll)
     v->poll_evtchn = -1;
     set_bit(v->vcpu_id, d->poll_mask);
 
+    arch_vcpu_block(v);
+
 #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
     /* Check for events /after/ setting flags: avoids wakeup waiting race. */
     smp_mb();
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index e7e40da..ada146c 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -310,6 +310,8 @@ static inline void free_vcpu_guest_context(struct vcpu_guest_context *vgc)
     xfree(vgc);
 }
 
+static inline void arch_vcpu_block(struct vcpu *v) {}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index c825975..135f7f9 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -494,6 +494,8 @@ struct arch_vcpu
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
 
+    void (*vcpu_block) (struct vcpu *);
+
     struct vpmu_struct vpmu;
 
     /* Virtual Machine Extensions */
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index f80e143..336fa62 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -560,6 +560,8 @@ void altp2m_vcpu_update_vmfunc_ve(struct vcpu *v);
 /* emulates #VE */
 bool_t altp2m_vcpu_emulate_ve(struct vcpu *v);
 
+void arch_vcpu_block(struct vcpu *v);
+
 #endif /* __ASM_X86_HVM_HVM_H__ */
 
 /*
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index b3b0946..7b3a931 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -160,6 +160,15 @@ struct arch_vmx_struct {
     struct page_info     *vmwrite_bitmap;
 
     struct page_info     *pml_pg;
+
+    struct list_head     pi_blocked_vcpu_list;
+
+    /*
+     * Before the vCPU is blocked, it is added to the per-cpu blocking list
+     * of 'pi_block_cpu', so the VT-d engine can send the wakeup notification
+     * event to 'pi_block_cpu' to wake up the related vCPU.
+     */
+    unsigned int         pi_block_cpu;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 1719965..8129bff 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -28,6 +28,8 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/vmx/vmcs.h>
 
+extern uint8_t pi_wakeup_vector;
+
 typedef union {
     struct {
         u64 r       :   1,  /* bit 0 - Read permission */
@@ -563,6 +565,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
 void p2m_init_hap_data(struct p2m_domain *p2m);
 
+void vmx_pi_per_cpu_init(unsigned int cpu);
+
 /* EPT violation qualifications definitions */
 #define _EPT_READ_VIOLATION         0
 #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
-- 
2.1.0


* [PATCH v10 7/7] Add a command line parameter for VT-d posted-interrupts
  2015-12-03  8:35 [PATCH v10 0/7] Add VT-d Posted-Interrupts support Feng Wu
                   ` (5 preceding siblings ...)
  2015-12-03  8:35 ` [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling Feng Wu
@ 2015-12-03  8:35 ` Feng Wu
  2015-12-03 11:19 ` [PATCH v10 0/7] Add VT-d Posted-Interrupts support Jan Beulich
  2015-12-10 10:48 ` Tian, Kevin
  8 siblings, 0 replies; 47+ messages in thread
From: Feng Wu @ 2015-12-03  8:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Feng Wu, Jan Beulich

Enable VT-d Posted-Interrupts and add a command line
parameter for it.

CC: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v6:
- Change the default value to 'false' in xen-command-line.markdown

 docs/misc/xen-command-line.markdown | 9 ++++++++-
 xen/drivers/passthrough/iommu.c     | 3 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index c103894..cbfb59d 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -868,7 +868,7 @@ debug hypervisor only).
 > Default: `new` unless directed-EOI is supported
 
 ### iommu
-> `= List of [ <boolean> | force | required | intremap | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | igfx | verbose | debug ]`
+> `= List of [ <boolean> | force | required | intremap | intpost | qinval | snoop | sharept | dom0-passthrough | dom0-strict | amd-iommu-perdev-intremap | workaround_bios_bug | igfx | verbose | debug ]`
 
 > Sub-options:
 
@@ -895,6 +895,13 @@ debug hypervisor only).
 >> Control the use of interrupt remapping (DMA remapping will always be enabled
 >> if IOMMU functionality is enabled).
 
+> `intpost`
+
+> Default: `false`
+
+>> Control the use of interrupt posting, which depends on the availability of
+>> interrupt remapping.
+
 > `qinval` (VT-d)
 
 > Default: `true`
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 3d02550..fae2547 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -32,6 +32,7 @@ static void iommu_dump_p2m_table(unsigned char key);
  *   off|no|false|disable       Disable IOMMU (default)
  *   force|required             Don't boot unless IOMMU is enabled
  *   no-intremap                Disable interrupt remapping
+ *   no-intpost                 Disable VT-d Interrupt posting
  *   verbose                    Be more verbose
  *   debug                      Enable debugging messages and checks
  *   workaround_bios_bug        Workaround some bios issue to still enable
@@ -105,6 +106,8 @@ static void __init parse_iommu_param(char *s)
             iommu_qinval = val;
         else if ( !strcmp(s, "intremap") )
             iommu_intremap = val;
+        else if ( !strcmp(s, "intpost") )
+            iommu_intpost = val;
         else if ( !strcmp(s, "debug") )
         {
             iommu_debug = val;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v10 0/7] Add VT-d Posted-Interrupts support
  2015-12-03  8:35 [PATCH v10 0/7] Add VT-d Posted-Interrupts support Feng Wu
                   ` (6 preceding siblings ...)
  2015-12-03  8:35 ` [PATCH v10 7/7] Add a command line parameter for VT-d posted-interrupts Feng Wu
@ 2015-12-03 11:19 ` Jan Beulich
  2015-12-03 13:54   ` Wu, Feng
  2015-12-10 10:48 ` Tian, Kevin
  8 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2015-12-03 11:19 UTC (permalink / raw)
  To: Feng Wu; +Cc: xen-devel

>>> On 03.12.15 at 09:35, <feng.wu@intel.com> wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> You can find the VT-d Posted-Interrupts spec at the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> Feng Wu (17):
>  r   VT-d Posted-interrupt (PI) design
>  ra  vmx: Suppress posting interrupts when 'SN' is set
>  r   vt-d: Add API to update IRTE when VT-d PI is used
>   a  Update IRTE according to guest interrupt config changes
>   a  vmx: Properly handle notification event when vCPU is running
>      vmx: VT-d posted-interrupt core logic handling
>  ra  Add a command line parameter for VT-d posted-interrupts
>  
>  r = has been 'Reviewed-by'
>  a = has been 'Acked-by'

Actually this kind of summary is misleading (and I understand
you're just following what others did before, i.e. this is not a
problem just here): For it to be meaningful, one would need to
know whether reviews/acks are in place by all necessary
maintainers. But of course the summary helps in drawing
attention to things completely un-reviewed/un-acked.

Perhaps one could use upper case letters to indicate full
maintainer review/ack state?

Jan


* Re: [PATCH v10 0/7] Add VT-d Posted-Interrupts support
  2015-12-03 11:19 ` [PATCH v10 0/7] Add VT-d Posted-Interrupts support Jan Beulich
@ 2015-12-03 13:54   ` Wu, Feng
  0 siblings, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2015-12-03 13:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Wu, Feng, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, December 3, 2015 7:19 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v10 0/7] Add VT-d Posted-Interrupts support
> 
> >>> On 03.12.15 at 09:35, <feng.wu@intel.com> wrote:
> > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > With VT-d Posted-Interrupts enabled, external interrupts from
> > direct-assigned devices can be delivered to guests without VMM
> > intervention when guest is running in non-root mode.
> >
> > You can find the VT-d Posted-Interrupts spec at the following URL:
> > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> >
> > Feng Wu (17):
> >  r   VT-d Posted-interrupt (PI) design
> >  ra  vmx: Suppress posting interrupts when 'SN' is set
> >  r   vt-d: Add API to update IRTE when VT-d PI is used
> >   a  Update IRTE according to guest interrupt config changes
> >   a  vmx: Properly handle notification event when vCPU is running
> >      vmx: VT-d posted-interrupt core logic handling
> >  ra  Add a command line parameter for VT-d posted-interrupts
> >
> >  r = has been 'Reviewed-by'
> >  a = has been 'Acked-by'
> 
> Actually this kind of summary is misleading (and I understand
> you're just following what others did before, i.e. this is not a
> problem just here): For it to be meaningful, one would need to
> know whether reviews/acks are in place by all necessary
> maintainers. But of course the summary helps in drawing
> attention to things completely un-reviewed/un-acked.

Yes, I agree with you. For this case, only patch
"vmx: VT-d posted-interrupt core logic handling" doesn't have
reviewed-by or acked-by, so it may need more attention by
the maintainers.

> 
> Perhaps one could use upper case letters to indicate full
> maintainer review/ack state?

Sounds great!

Thanks,
Feng

> 
> Jan


* Re: [PATCH v10 0/7] Add VT-d Posted-Interrupts support
  2015-12-03  8:35 [PATCH v10 0/7] Add VT-d Posted-Interrupts support Feng Wu
                   ` (7 preceding siblings ...)
  2015-12-03 11:19 ` [PATCH v10 0/7] Add VT-d Posted-Interrupts support Jan Beulich
@ 2015-12-10 10:48 ` Tian, Kevin
  2015-12-10 13:40   ` Wu, Feng
  8 siblings, 1 reply; 47+ messages in thread
From: Tian, Kevin @ 2015-12-10 10:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Wu, Feng

> From: Feng Wu
> Sent: Thursday, December 03, 2015 4:35 PM
> 
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> > You can find the VT-d Posted-Interrupts spec at the following URL:
> > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> Feng Wu (17):
> >  r   VT-d Posted-interrupt (PI) design
>  ra  vmx: Suppress posting interrupts when 'SN' is set
>  r   vt-d: Add API to update IRTE when VT-d PI is used
>   a  Update IRTE according to guest interrupt config changes
>   a  vmx: Properly handle notification event when vCPU is running
>      vmx: VT-d posted-interrupt core logic handling
>  ra  Add a command line parameter for VT-d posted-interrupts
> 
>  r = has been 'Reviewed-by'
>  a = has been 'Acked-by'
> 
> Feng Wu (7):
> >   VT-d Posted-interrupt (PI) design
>   vmx: Suppress posting interrupts when 'SN' is set
>   vt-d: Add API to update IRTE when VT-d PI is used
>   Update IRTE according to guest interrupt config changes
>   vmx: Properly handle notification event when vCPU is running
>   vmx: VT-d posted-interrupt core logic handling
>   Add a command line parameter for VT-d posted-interrupts

Only 7 patches in this series because other 10 have been checked in?

Thanks
Kevin 


* Re: [PATCH v10 3/7] vt-d: Add API to update IRTE when VT-d PI is used
  2015-12-03  8:35 ` [PATCH v10 3/7] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
@ 2015-12-10 10:52   ` Tian, Kevin
  0 siblings, 0 replies; 47+ messages in thread
From: Tian, Kevin @ 2015-12-10 10:52 UTC (permalink / raw)
  To: Wu, Feng, xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

> From: Wu, Feng
> Sent: Thursday, December 03, 2015 4:36 PM
> 
> This patch adds an API which is used to update the IRTE
> for posted-interrupt when guest changes MSI/MSI-X information.
> 
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH v10 4/7] Update IRTE according to guest interrupt config changes
  2015-12-03  8:35 ` [PATCH v10 4/7] Update IRTE according to guest interrupt config changes Feng Wu
@ 2015-12-10 10:53   ` Tian, Kevin
  0 siblings, 0 replies; 47+ messages in thread
From: Tian, Kevin @ 2015-12-10 10:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Wu, Feng, Jan Beulich

> From: Feng Wu
> Sent: Thursday, December 03, 2015 4:36 PM
> 
> When a guest changes its interrupt configuration (such as the vector)
> for direct-assigned devices, we need to update the associated IRTE
> with the new guest vector, so external interrupts from the assigned
> devices can be injected to guests without a VM-Exit.
> 
> For lowest-priority interrupts, we use a vector-hashing mechanism to find
> the destination vCPU. This follows the hardware behavior, since modern
> Intel CPUs use vector hashing to handle the lowest-priority interrupt.
> 
> For interrupts with a multicast/broadcast destination, we cannot handle
> them via interrupt posting, and still use interrupt remapping.
> 
> CC: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com> 


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-03  8:35 ` [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling Feng Wu
@ 2015-12-10 11:40   ` Tian, Kevin
  2015-12-11  1:58     ` Wu, Feng
  2015-12-21  6:43   ` Wu, Feng
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 47+ messages in thread
From: Tian, Kevin @ 2015-12-10 11:40 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: George Dunlap, Andrew Cooper, Dario Faggioli, Keir Fraser, Jan Beulich

> From: Wu, Feng
> Sent: Thursday, December 03, 2015 4:36 PM
> 
> This is the core logic handling for VT-d posted-interrupts. Basically it
> deals with how and when to update posted-interrupts during the following
> scenarios:
> - vCPU is preempted
> - vCPU is slept
> - vCPU is blocked
> 
> When vCPU is preempted/slept, we update the posted-interrupts during
> scheduling by introducing two new architectural scheduler hooks:
> vmx_pi_switch_from() and vmx_pi_switch_to(). When vCPU is blocked, we
> introduce a new architectural hooks: arch_vcpu_block() to update

hooks -> hook

> posted-interrupts descriptor.
> 
> Besides that, before VM-entry, we will make sure the 'NV' field is set
> to 'posted_intr_vector' and the vCPU is not in any blocking lists, which
> is needed when vCPU is running in non-root mode. The reason we do this check
> is because we change the posted-interrupts descriptor in vcpu_block(),
> however, we don't change it back in vcpu_unblock() or when vcpu_block()
> directly returns due to event delivery (in fact, we don't need to do it
> in the two places, that is why we do it before VM-Entry).
> 
> When we handle the lazy context switch for the following two scenarios:
> - Preempted by a tasklet, which runs in an idle context.
> - The previous vcpu is offline and there is no runnable vcpu in the
>   run queue.
> We don't change the 'SN' bit in the posted-interrupt descriptor; this
> may incur spurious PI notification events, but since a PI notification
> event is only sent when 'ON' is clear, and once the PI notification
> is sent, 'ON' is set by hardware, no more notification events arrive
> before 'ON' is cleared. Besides that, spurious PI notification events
> happen from time to time in the Xen hypervisor anyway: when a guest
> traps to Xen and a PI notification event arrives, there is nothing Xen
> actually needs to do about it; the interrupts will be delivered to the
> guest at the next VM-Entry.
> 
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
> Suggested-by: Dario Faggioli <dario.faggioli@citrix.com>
> Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 6c2b512..3368cf2 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -7019,6 +7019,12 @@ void hvm_domain_soft_reset(struct domain *d)
>      hvm_destroy_all_ioreq_servers(d);
>  }
> 
> +void arch_vcpu_block(struct vcpu *v)
> +{
> +    if ( v->arch.vcpu_block )
> +        v->arch.vcpu_block(v);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 000d06e..0f23fce 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -676,6 +676,8 @@ int vmx_cpu_up(void)
>      if ( cpu_has_vmx_vpid )
>          vpid_sync_all();
> 
> +    vmx_pi_per_cpu_init(cpu);
> +
>      return 0;
>  }
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 39dc500..0d9462e 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t
> msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
>  static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
> 
> +/*
> + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we

vCPU -> vCPUs

> + * can find which vCPU should be woken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
> +
> +void vmx_pi_per_cpu_init(unsigned int cpu)
> +{
> +    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +}
> +
> +void vmx_vcpu_block(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> +    /*
> +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> +     * make it as the destination of the wake-up notification event.

the 2nd piece - "we also save it to posted-interrupt descriptor" is not
reflected within this function. Do you mean "we have saved it to..."
or "we will save it later to..." in other places?

> +     */
> +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +    ASSERT(!pi_test_sn(pi_desc));
> +
> +    /*
> +     * We don't need to set the 'NDST' field, since it should point to
> +     * the same pCPU as v->processor.
> +     */
> +
> +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> +}
> +
> +static void vmx_pi_switch_from(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) ||
> +         test_bit(_VPF_blocked, &v->pause_flags) )
> +        return;
> +
> +    /*
> +     * The vCPU has been preempted or went to sleep. We don't need to send
> +     * notification event to a non-running vcpu, the interrupt information

I might be wrong on this, but my gut-feeling is that 'non-running' includes
both runnable and blocked vcpu...

> +     * will be delivered to it before VM-ENTRY when the vcpu is scheduled
> +     * to run next time.
> +     */
> +    pi_set_sn(pi_desc);
> +}
> +
> +static void vmx_pi_switch_to(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    if ( x2apic_enabled )
> +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> +    else
> +        write_atomic(&pi_desc->ndst,
> +                     MASK_INSR(cpu_physical_id(v->processor),
> +                     PI_xAPIC_NDST_MASK));
> +
> +    pi_clear_sn(pi_desc);
> +}
> +
> +static void vmx_pi_state_to_normal(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    unsigned int pi_block_cpu;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> +
> +    /*
> +     * Set 'NV' field back to posted_intr_vector, so the
> +     * Posted-Interrupts can be delivered to the vCPU when
> +     * it is running in non-root mode.
> +     */
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */
> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +
> +    /*
> +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
> +     * removed from the blocking list while we are acquiring the lock.
> +     */
> +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> +    {
> +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +        return;
> +    }
> +
> +    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +}
> 
>  static int vmx_domain_initialise(struct domain *d)
>  {
> @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> 
>      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
> 
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +
>      v->arch.schedule_tail    = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> 
> +    if ( iommu_intpost )
> +        v->arch.vcpu_block = vmx_vcpu_block;
> +
>      if ( (rc = vmx_create_vmcs(v)) != 0 )
>      {
>          dprintk(XENLOG_WARNING,
> @@ -734,6 +865,7 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
>      vmx_save_guest_msrs(v);
>      vmx_restore_host_msrs();
>      vmx_save_dr(v);
> +    vmx_pi_switch_from(v);
>  }
> 
>  static void vmx_ctxt_switch_to(struct vcpu *v)
> @@ -758,6 +890,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
> 
>      vmx_restore_guest_msrs(v);
>      vmx_restore_dr(v);
> +    vmx_pi_switch_to(v);
>  }
> 
> 
> @@ -2014,6 +2147,40 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
>      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
>  };
> 
> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct arch_vmx_struct *vmx, *tmp;
> +    struct vcpu *v;
> +    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
> +    struct list_head *blocked_vcpus =
> +                       &per_cpu(pi_blocked_vcpu, smp_processor_id());
> +    LIST_HEAD(list);

what's it?

> +
> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +
> +    spin_lock(lock);
> +
> +    /*
> +     * XXX: The length of the list depends on how many vCPUs are currently
> +     * blocked on this specific pCPU. This may hurt the interrupt latency
> +     * if the list grows to too many entries.
> +     */
> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> +    {
> +        if ( pi_test_on(&vmx->pi_desc) )
> +        {
> +            list_del(&vmx->pi_blocked_vcpu_list);
> +            vmx->pi_block_cpu = NR_CPUS;
> +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +            vcpu_unblock(v);
> +        }
> +    }
> +
> +    spin_unlock(lock);
> +}
> +

I may miss something here. Who will clear ON bit? test read?

It's unclear from design doc:

+Outstanding Notification (ON): Indicate if there is a notification event outstanding (not
+processed by processor or software) for this Posted Interrupt Descriptor. When this field is 0,
+hardware modifies it from 0 to 1 when generating a notification event, and the entity receiving
+the notification event (processor or software) resets it as part of posted interrupt processing.


Thanks
Kevin


* Re: [PATCH v10 0/7] Add VT-d Posted-Interrupts support
  2015-12-10 10:48 ` Tian, Kevin
@ 2015-12-10 13:40   ` Wu, Feng
  0 siblings, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2015-12-10 13:40 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Wu, Feng



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, December 10, 2015 6:49 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Wu, Feng <feng.wu@intel.com>
> Subject: RE: [Xen-devel] [PATCH v10 0/7] Add VT-d Posted-Interrupts support
> 
> > From: Feng Wu
> > Sent: Thursday, December 03, 2015 4:35 PM
> >
> > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > With VT-d Posted-Interrupts enabled, external interrupts from
> > direct-assigned devices can be delivered to guests without VMM
> > intervention when guest is running in non-root mode.
> >
> > You can find the VT-d Posted-Interrupts spec at the following URL:
> > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> >
> > Feng Wu (17):
> > >  r   VT-d Posted-interrupt (PI) design
> >  ra  vmx: Suppress posting interrupts when 'SN' is set
> >  r   vt-d: Add API to update IRTE when VT-d PI is used
> >   a  Update IRTE according to guest interrupt config changes
> >   a  vmx: Properly handle notification event when vCPU is running
> >      vmx: VT-d posted-interrupt core logic handling
> >  ra  Add a command line parameter for VT-d posted-interrupts
> >
> >  r = has been 'Reviewed-by'
> >  a = has been 'Acked-by'
> >
> > Feng Wu (7):
> > >   VT-d Posted-interrupt (PI) design
> >   vmx: Suppress posting interrupts when 'SN' is set
> >   vt-d: Add API to update IRTE when VT-d PI is used
> >   Update IRTE according to guest interrupt config changes
> >   vmx: Properly handle notification event when vCPU is running
> >   vmx: VT-d posted-interrupt core logic handling
> >   Add a command line parameter for VT-d posted-interrupts
> 
> Only 7 patches in this series because other 10 have been checked in?

Yes, the other 10 have been merged by Jan.

Thanks,
Feng

> 
> Thanks
> Kevin


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-10 11:40   ` Tian, Kevin
@ 2015-12-11  1:58     ` Wu, Feng
  2015-12-11  2:27       ` Tian, Kevin
  0 siblings, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2015-12-11  1:58 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Wu, Feng, George Dunlap, Andrew Cooper, Dario Faggioli,
	Jan Beulich, Keir Fraser



> -----Original Message-----
> From: Tian, Kevin
> Sent: Thursday, December 10, 2015 7:40 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Keir Fraser <keir@xen.org>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@eu.citrix.com>; Dario Faggioli <dario.faggioli@citrix.com>
> Subject: RE: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> > From: Wu, Feng
> > Sent: Thursday, December 03, 2015 4:36 PM
> >
> > This is the core logic handling for VT-d posted-interrupts. Basically it
> > deals with how and when to update posted-interrupts during the following
> > scenarios:
> > - vCPU is preempted
> > - vCPU is slept
> > - vCPU is blocked
> >
> > When vCPU is preempted/slept, we update the posted-interrupts during
> > scheduling by introducing two new architectural scheduler hooks:
> > vmx_pi_switch_from() and vmx_pi_switch_to(). When vCPU is blocked, we
> > introduce a new architectural hooks: arch_vcpu_block() to update
> 
> hooks -> hook
> 
> > posted-interrupts descriptor.
> >
> > Besides that, before VM-entry, we will make sure the 'NV' field is set
> > to 'posted_intr_vector' and the vCPU is not in any blocking lists, which
> > is needed when vCPU is running in non-root mode. The reason we do this
> check
> > is because we change the posted-interrupts descriptor in vcpu_block(),
> > however, we don't change it back in vcpu_unblock() or when vcpu_block()
> > directly returns due to event delivery (in fact, we don't need to do it
> > in the two places, that is why we do it before VM-Entry).
> >
> > When we handle the lazy context switch for the following two scenarios:
> > - Preempted by a tasklet, which runs in an idle context.
> > - The previous vcpu is offline and there is no runnable vcpu in the
> >   run queue.
> > We don't change the 'SN' bit in the posted-interrupt descriptor; this
> > may incur spurious PI notification events, but since a PI notification
> > event is only sent when 'ON' is clear, and once the PI notification
> > is sent, 'ON' is set by hardware, no more notification events arrive
> > before 'ON' is cleared. Besides that, spurious PI notification events
> > happen from time to time in the Xen hypervisor anyway: when a guest
> > traps to Xen and a PI notification event arrives, there is nothing Xen
> > actually needs to do about it; the interrupts will be delivered to the
> > guest at the next VM-Entry.
> >
> > CC: Keir Fraser <keir@xen.org>
> > CC: Jan Beulich <jbeulich@suse.com>
> > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > CC: Kevin Tian <kevin.tian@intel.com>
> > CC: George Dunlap <george.dunlap@eu.citrix.com>
> > CC: Dario Faggioli <dario.faggioli@citrix.com>
> > Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
> > Suggested-by: Dario Faggioli <dario.faggioli@citrix.com>
> > Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
> > Suggested-by: Jan Beulich <jbeulich@suse.com>
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 6c2b512..3368cf2 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -7019,6 +7019,12 @@ void hvm_domain_soft_reset(struct domain *d)
> >      hvm_destroy_all_ioreq_servers(d);
> >  }
> >
> > +void arch_vcpu_block(struct vcpu *v)
> > +{
> > +    if ( v->arch.vcpu_block )
> > +        v->arch.vcpu_block(v);
> > +}
> > +
> >  /*
> >   * Local variables:
> >   * mode: C
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index 000d06e..0f23fce 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -676,6 +676,8 @@ int vmx_cpu_up(void)
> >      if ( cpu_has_vmx_vpid )
> >          vpid_sync_all();
> >
> > +    vmx_pi_per_cpu_init(cpu);
> > +
> >      return 0;
> >  }
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index 39dc500..0d9462e 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr,
> uint64_t
> > msr_content);
> >  static void vmx_invlpg_intercept(unsigned long vaddr);
> >  static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
> >
> > +/*
> > + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
> 
> vCPU -> vCPUs
> 
> > + * can find which vCPU should be woken up.
> > + */
> > +static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> > +static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> > +
> >  uint8_t __read_mostly posted_intr_vector;
> > +uint8_t __read_mostly pi_wakeup_vector;
> > +
> > +void vmx_pi_per_cpu_init(unsigned int cpu)
> > +{
> > +    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> > +    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> > +}
> > +
> > +void vmx_vcpu_block(struct vcpu *v)
> > +{
> > +    unsigned long flags;
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    if ( !has_arch_pdevs(v->domain) )
> > +        return;
> > +
> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> > +
> > +    /*
> > +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> > +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> > +     * make it as the destination of the wake-up notification event.
> 
> the 2nd piece - "we also save it to posted-interrupt descriptor" is not
> reflected within this function. Do you mean "we have saved it to..."
> or "we will save it later to..." in other places?
> 
> > +     */
> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> > +
> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> > +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> > +
> > +    ASSERT(!pi_test_sn(pi_desc));
> > +
> > +    /*
> > +     * We don't need to set the 'NDST' field, since it should point to
> > +     * the same pCPU as v->processor.
> > +     */
> > +
> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> > +}
> > +
> > +static void vmx_pi_switch_from(struct vcpu *v)
> > +{
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) ||
> > +         test_bit(_VPF_blocked, &v->pause_flags) )
> > +        return;
> > +
> > +    /*
> > +     * The vCPU has been preempted or went to sleep. We don't need to send
> > +     * notification event to a non-running vcpu, the interrupt information
> 
> I might be wrong on this, but my gut-feeling is that 'non-running' includes
> both runnable and blocked vcpu...

Yes, I should use "We don't need to send notification event to a runnable or
sleeping vcpu ..."

> 
> > +     * will be delivered to it before VM-ENTRY when the vcpu is scheduled
> > +     * to run next time.
> > +     */
> > +    pi_set_sn(pi_desc);
> > +}
> > +
> > +static void vmx_pi_switch_to(struct vcpu *v)
> > +{
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> > +        return;
> > +
> > +    if ( x2apic_enabled )
> > +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> > +    else
> > +        write_atomic(&pi_desc->ndst,
> > +                     MASK_INSR(cpu_physical_id(v->processor),
> > +                     PI_xAPIC_NDST_MASK));
> > +
> > +    pi_clear_sn(pi_desc);
> > +}
> > +
> > +static void vmx_pi_state_to_normal(struct vcpu *v)
> > +{
> > +    unsigned long flags;
> > +    unsigned int pi_block_cpu;
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> > +        return;
> > +
> > +    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> > +
> > +    /*
> > +     * Set 'NV' field back to posted_intr_vector, so the
> > +     * Posted-Interrupts can be delivered to the vCPU when
> > +     * it is running in non-root mode.
> > +     */
> > +    if ( pi_desc->nv != posted_intr_vector )
> > +        write_atomic(&pi_desc->nv, posted_intr_vector);
> > +
> > +    /* the vCPU is not on any blocking list. */
> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> > +    if ( pi_block_cpu == NR_CPUS )
> > +        return;
> > +
> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> > +
> > +    /*
> > +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
> > +     * removed from the blocking list while we are acquiring the lock.
> > +     */
> > +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> > +    {
> > +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> > +        return;
> > +    }
> > +
> > +    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> > +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> > +}
> >
> >  static int vmx_domain_initialise(struct domain *d)
> >  {
> > @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >
> >      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
> >
> > +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> > +
> > +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> > +
> >      v->arch.schedule_tail    = vmx_do_resume;
> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> >
> > +    if ( iommu_intpost )
> > +        v->arch.vcpu_block = vmx_vcpu_block;
> > +
> >      if ( (rc = vmx_create_vmcs(v)) != 0 )
> >      {
> >          dprintk(XENLOG_WARNING,
> > @@ -734,6 +865,7 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
> >      vmx_save_guest_msrs(v);
> >      vmx_restore_host_msrs();
> >      vmx_save_dr(v);
> > +    vmx_pi_switch_from(v);
> >  }
> >
> >  static void vmx_ctxt_switch_to(struct vcpu *v)
> > @@ -758,6 +890,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
> >
> >      vmx_restore_guest_msrs(v);
> >      vmx_restore_dr(v);
> > +    vmx_pi_switch_to(v);
> >  }
> >
> >
> > @@ -2014,6 +2147,40 @@ static struct hvm_function_table __initdata
> > vmx_function_table = {
> >      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
> >  };
> >
> > +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> > +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> > +{
> > +    struct arch_vmx_struct *vmx, *tmp;
> > +    struct vcpu *v;
> > +    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
> > +    struct list_head *blocked_vcpus =
> > +                       &per_cpu(pi_blocked_vcpu, smp_processor_id());
> > +    LIST_HEAD(list);
> 
> what's it?

Oh, this should be removed. I missed it while removing some code from this
function in v9.

> 
> > +
> > +    ack_APIC_irq();
> > +    this_cpu(irq_count)++;
> > +
> > +    spin_lock(lock);
> > +
> > +    /*
> > +     * XXX: The length of the list depends on how many vCPUs are currently
> > +     * blocked on this specific pCPU. This may hurt the interrupt latency
> > +     * if the list grows to too many entries.
> > +     */
> > +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> > +    {
> > +        if ( pi_test_on(&vmx->pi_desc) )
> > +        {
> > +            list_del(&vmx->pi_blocked_vcpu_list);
> > +            vmx->pi_block_cpu = NR_CPUS;
> > +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> > +            vcpu_unblock(v);
> > +        }
> > +    }
> > +
> > +    spin_unlock(lock);
> > +}
> > +
> 
> I may miss something here. Who will clear ON bit? test read?

There are two places where 'ON' can get cleared:
- In vmx_sync_pir_to_irr(), while syncing interrupt information from
  the PIR to the IRR.
- If the guest is running in non-root mode, the CPU hardware clears
  'ON' when handling the notification event. No VM exit occurs in this case.

Thanks,
Feng


> 
> It's unclear from design doc:
> 
> +Outstanding Notification (ON): Indicate if there is a notification event
> outstanding (not
> +processed by processor or software) for this Posted Interrupt Descriptor. When
> this field is 0,
> +hardware modifies it from 0 to 1 when generating a notification event, and the
> entity receiving
> +the notification event (processor or software) resets it as part of posted
> interrupt processing.
> 
> 
> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-11  1:58     ` Wu, Feng
@ 2015-12-11  2:27       ` Tian, Kevin
  2015-12-11  3:12         ` Wu, Feng
  0 siblings, 1 reply; 47+ messages in thread
From: Tian, Kevin @ 2015-12-11  2:27 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: George Dunlap, Andrew Cooper, Dario Faggioli, Keir Fraser, Jan Beulich

> From: Wu, Feng
> Sent: Friday, December 11, 2015 9:59 AM
>
> > > +void vmx_vcpu_block(struct vcpu *v)
> > > +{
> > > +    unsigned long flags;
> > > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > > +
> > > +    if ( !has_arch_pdevs(v->domain) )
> > > +        return;
> > > +
> > > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> > > +
> > > +    /*
> > > +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> > > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> > > +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> > > +     * make it as the destination of the wake-up notification event.
> >
> > the 2nd piece - "we also save it to posted-interrupt descriptor" is not
> > reflected within this function. Do you mean "we have saved it to..."
> > or "we will save it later to..." in other places?

what about this question?

> >
> > > +
> > > +    ack_APIC_irq();
> > > +    this_cpu(irq_count)++;
> > > +
> > > +    spin_lock(lock);
> > > +
> > > +    /*
> > > +     * XXX: The length of the list depends on how many vCPUs are currently
> > > +     * blocked on this specific pCPU. This may hurt the interrupt latency
> > > +     * if the list grows to too many entries.
> > > +     */
> > > +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> > > +    {
> > > +        if ( pi_test_on(&vmx->pi_desc) )
> > > +        {
> > > +            list_del(&vmx->pi_blocked_vcpu_list);
> > > +            vmx->pi_block_cpu = NR_CPUS;
> > > +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> > > +            vcpu_unblock(v);
> > > +        }
> > > +    }
> > > +
> > > +    spin_unlock(lock);
> > > +}
> > > +
> >
> > I may miss something here. Who will clear ON bit? test read?
> 
> There are two places where 'ON' can get cleared:
> - In vmx_sync_pir_to_irr(), while syncing interrupt information from
> the PIR to the IRR.
> - If the guest is running in non-root mode, the CPU hardware clears
> 'ON' when handling the notification event. No VM exit occurs in this case.
> 

Thanks for explanation. It makes sense to me. Indeed no need to
clear ON here.

Thanks
Kevin


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-11  2:27       ` Tian, Kevin
@ 2015-12-11  3:12         ` Wu, Feng
  0 siblings, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2015-12-11  3:12 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Wu, Feng, George Dunlap, Andrew Cooper, Dario Faggioli,
	Jan Beulich, Keir Fraser



> -----Original Message-----
> From: Tian, Kevin
> Sent: Friday, December 11, 2015 10:28 AM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Keir Fraser <keir@xen.org>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@eu.citrix.com>; Dario Faggioli <dario.faggioli@citrix.com>
> Subject: RE: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> > From: Wu, Feng
> > Sent: Friday, December 11, 2015 9:59 AM
> >
> > > > +void vmx_vcpu_block(struct vcpu *v)
> > > > +{
> > > > +    unsigned long flags;
> > > > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > > > +
> > > > +    if ( !has_arch_pdevs(v->domain) )
> > > > +        return;
> > > > +
> > > > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> > > > +
> > > > +    /*
> > > > +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> > > > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it
> for
> > > > +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> > > > +     * make it as the destination of the wake-up notification event.
> > >
> > > the 2nd piece - "we also save it to posted-interrupt descriptor" is not
> > > reflected within this function. Do you mean "we have saved it to..."
> > > or "we will save it later to..." in other places?
> 
> what about this question?

Oh, sorry, I didn't notice this comment. Originally, I saved the pCPU
information in the posted-interrupts descriptor in this function; however,
George pointed out that the NDST field in the posted-interrupts descriptor
should already point to the same pCPU, so there is no need to update it again,
and I removed that code. It seems I forgot to update the comment; I will do it
in the next version. Thanks for pointing it out!

Thanks,
Feng


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-03  8:35 ` [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling Feng Wu
  2015-12-10 11:40   ` Tian, Kevin
@ 2015-12-21  6:43   ` Wu, Feng
  2015-12-21  9:03     ` Dario Faggioli
  2015-12-23  2:21   ` Dario Faggioli
  2016-01-18 15:14   ` Jan Beulich
  3 siblings, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2015-12-21  6:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, Jan Beulich, Wu, Feng

Ping...

> -----Original Message-----
> From: Wu, Feng
> Sent: Thursday, December 3, 2015 4:36 PM
> To: xen-devel@lists.xen.org
> Cc: Wu, Feng <feng.wu@intel.com>; Keir Fraser <keir@xen.org>; Jan Beulich
> <jbeulich@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Tian,
> Kevin <kevin.tian@intel.com>; George Dunlap <george.dunlap@eu.citrix.com>;
> Dario Faggioli <dario.faggioli@citrix.com>
> Subject: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> This is the core logic handling for VT-d posted-interrupts. Basically it
> deals with how and when to update posted-interrupts during the following
> scenarios:
> - vCPU is preempted
> - vCPU is slept
> - vCPU is blocked
> 
> When a vCPU is preempted or put to sleep, we update the posted-interrupts
> descriptor during scheduling by introducing two new architectural scheduler
> hooks: vmx_pi_switch_from() and vmx_pi_switch_to(). When a vCPU is blocked,
> we introduce a new architectural hook, arch_vcpu_block(), to update the
> posted-interrupts descriptor.
> 
> Besides that, before VM entry, we make sure the 'NV' field is set
> to 'posted_intr_vector' and the vCPU is not on any blocking list, which
> is needed when the vCPU is running in non-root mode. The reason we do this
> check is that we change the posted-interrupts descriptor in vcpu_block();
> however, we don't change it back in vcpu_unblock() or when vcpu_block()
> returns directly due to event delivery (in fact, we don't need to do it
> in those two places, which is why we do the check before VM entry).
> 
> When we handle the lazy context switch in the following two scenarios:
> - Preempted by a tasklet, which runs in an idle context.
> - The previous vCPU is offline and there are no runnable vCPUs in the
>   run queue.
> we don't change the 'SN' bit in the posted-interrupt descriptor. This
> may incur spurious PI notification events, but since a PI notification
> event is only sent when 'ON' is clear, and once the notification is
> sent hardware sets 'ON', no further notification events occur before
> 'ON' is cleared. Besides that, spurious PI notification events happen
> from time to time in the Xen hypervisor anyway; for example, when a
> guest traps to Xen and a PI notification event arrives, there is
> nothing Xen actually needs to do about it, and the interrupts will be
> delivered to the guest at the next VM entry.
> 
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
> Suggested-by: Dario Faggioli <dario.faggioli@citrix.com>
> Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v10:
> - Check iommu_intpost first
> - Remove pointless checking of has_hvm_container_vcpu(v)
> - Rename 'vmx_pi_state_change' to 'vmx_pi_state_to_normal'
> - Since vcpu_unblock() doesn't acquire 'pi_blocked_vcpu_lock', we
>   don't need use another list to save the vCPUs with 'ON' set, just
>   directly call vcpu_unblock(v).
> 
> v9:
> - Remove arch_vcpu_block_cancel() and arch_vcpu_wake_prepare()
> - Add vmx_pi_state_change() and call it before VM Entry
> 
> v8:
> - Remove the lazy context switch handling for PI state transition
> - Change PI state in vcpu_block() and do_poll() when the vCPU
>   is going to be blocked
> 
> v7:
> - Merge [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
>   and "[PATCH v6 14/18] vmx: posted-interrupt handling when vCPU is blocked"
>   into this patch, so it is self-contained and more convenient
>   for code review.
> - Make 'pi_blocked_vcpu' and 'pi_blocked_vcpu_lock' static
> - Coding style
> - Use per_cpu() instead of this_cpu() in pi_wakeup_interrupt()
> - Move ack_APIC_irq() to the beginning of pi_wakeup_interrupt()
> - Rename 'pi_ctxt_switch_from' to 'ctxt_switch_prepare'
> - Rename 'pi_ctxt_switch_to' to 'ctxt_switch_cancel'
> - Use 'has_hvm_container_vcpu' instead of 'is_hvm_vcpu'
> - Use 'spin_lock' and 'spin_unlock' when the interrupt has been
>   already disabled.
> - Rename arch_vcpu_wake_prepare to vmx_vcpu_wake_prepare
> - Define vmx_vcpu_wake_prepare in xen/arch/x86/hvm/hvm.c
> - Call .pi_ctxt_switch_to() __context_switch() instead of directly
>   calling vmx_post_ctx_switch_pi() in vmx_ctxt_switch_to()
> - Make .pi_block_cpu unsigned int
> - Use list_del() instead of list_del_init()
> - Coding style
> 
> One remaining item in v7:
> Jan has concern about calling vcpu_unblock() in vmx_pre_ctx_switch_pi(),
> need Dario or George's input about this.
> 
> v6:
> - Add two static inline functions for pi context switch
> - Fix typos
> 
> v5:
> - Rename arch_vcpu_wake to arch_vcpu_wake_prepare
> - Make arch_vcpu_wake_prepare() inline for ARM
> - Merge the ARM dummy hook with together
> - Changes to some code comments
> - Leave 'pi_ctxt_switch_from' and 'pi_ctxt_switch_to' NULL if
>   PI is disabled or the vCPU is not in HVM
> - Coding style
> 
> v4:
> - Newly added
> 
> Changlog for "vmx: posted-interrupt handling when vCPU is blocked"
> v6:
> - Fix some typos
> - Ack the interrupt right after the spin_unlock in pi_wakeup_interrupt()
> 
> v4:
> - Use local variables in pi_wakeup_interrupt()
> - Remove vcpu from the blocked list when pi_desc.on==1, this
> - avoid kick vcpu multiple times.
> - Remove tasklet
> 
> v3:
> - This patch is generated by merging the following three patches in v2:
>    [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
>    [RFC v2 10/15] vmx: Define two per-cpu variables
>    [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
> - rename 'vcpu_wakeup_tasklet' to 'pi_vcpu_wakeup_tasklet'
> - Move the definition of 'pi_vcpu_wakeup_tasklet' to 'struct arch_vmx_struct'
> - rename 'vcpu_wakeup_tasklet_handler' to 'pi_vcpu_wakeup_tasklet_handler'
> - Make pi_wakeup_interrupt() static
> - Rename 'blocked_vcpu_list' to 'pi_blocked_vcpu_list'
> - move 'pi_blocked_vcpu_list' to 'struct arch_vmx_struct'
> - Rename 'blocked_vcpu' to 'pi_blocked_vcpu'
> - Rename 'blocked_vcpu_lock' to 'pi_blocked_vcpu_lock'
> 
>  xen/arch/x86/hvm/hvm.c             |   6 ++
>  xen/arch/x86/hvm/vmx/vmcs.c        |   2 +
>  xen/arch/x86/hvm/vmx/vmx.c         | 172
> +++++++++++++++++++++++++++++++++++++
>  xen/common/schedule.c              |   4 +
>  xen/include/asm-arm/domain.h       |   2 +
>  xen/include/asm-x86/domain.h       |   2 +
>  xen/include/asm-x86/hvm/hvm.h      |   2 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |   9 ++
>  xen/include/asm-x86/hvm/vmx/vmx.h  |   4 +
>  9 files changed, 203 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 6c2b512..3368cf2 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -7019,6 +7019,12 @@ void hvm_domain_soft_reset(struct domain *d)
>      hvm_destroy_all_ioreq_servers(d);
>  }
> 
> +void arch_vcpu_block(struct vcpu *v)
> +{
> +    if ( v->arch.vcpu_block )
> +        v->arch.vcpu_block(v);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 000d06e..0f23fce 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -676,6 +676,8 @@ int vmx_cpu_up(void)
>      if ( cpu_has_vmx_vpid )
>          vpid_sync_all();
> 
> +    vmx_pi_per_cpu_init(cpu);
> +
>      return 0;
>  }
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 39dc500..0d9462e 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr,
> uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
>  static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
> 
> +/*
> + * We maintain a per-CPU linked list of vCPUs, so in the PI wakeup handler
> + * we can find which vCPU should be woken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;
> +
> +void vmx_pi_per_cpu_init(unsigned int cpu)
> +{
> +    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +}
> +
> +void vmx_vcpu_block(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> +    /*
> +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> +     * make it as the destination of the wake-up notification event.
> +     */
> +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +    ASSERT(!pi_test_sn(pi_desc));
> +
> +    /*
> +     * We don't need to set the 'NDST' field, since it should point to
> +     * the same pCPU as v->processor.
> +     */
> +
> +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> +}
> +
> +static void vmx_pi_switch_from(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) ||
> +         test_bit(_VPF_blocked, &v->pause_flags) )
> +        return;
> +
> +    /*
> +     * The vCPU has been preempted or put to sleep. We don't need to send
> +     * a notification event to a runnable or sleeping vCPU; the interrupt
> +     * information will be delivered to it before VM entry when the vCPU
> +     * is scheduled to run next time.
> +     */
> +    pi_set_sn(pi_desc);
> +}
> +
> +static void vmx_pi_switch_to(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    if ( x2apic_enabled )
> +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> +    else
> +        write_atomic(&pi_desc->ndst,
> +                     MASK_INSR(cpu_physical_id(v->processor),
> +                     PI_xAPIC_NDST_MASK));
> +
> +    pi_clear_sn(pi_desc);
> +}
> +
> +static void vmx_pi_state_to_normal(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    unsigned int pi_block_cpu;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> +
> +    /*
> +     * Set 'NV' field back to posted_intr_vector, so the
> +     * Posted-Interrupts can be delivered to the vCPU when
> +     * it is running in non-root mode.
> +     */
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */
> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +
> +    /*
> +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
> +     * removed from the blocking list while we are acquiring the lock.
> +     */
> +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> +    {
> +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> +        return;
> +    }
> +
> +    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> +}
> 
>  static int vmx_domain_initialise(struct domain *d)
>  {
> @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> 
>      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
> 
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +
>      v->arch.schedule_tail    = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> 
> +    if ( iommu_intpost )
> +        v->arch.vcpu_block = vmx_vcpu_block;
> +
>      if ( (rc = vmx_create_vmcs(v)) != 0 )
>      {
>          dprintk(XENLOG_WARNING,
> @@ -734,6 +865,7 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
>      vmx_save_guest_msrs(v);
>      vmx_restore_host_msrs();
>      vmx_save_dr(v);
> +    vmx_pi_switch_from(v);
>  }
> 
>  static void vmx_ctxt_switch_to(struct vcpu *v)
> @@ -758,6 +890,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
> 
>      vmx_restore_guest_msrs(v);
>      vmx_restore_dr(v);
> +    vmx_pi_switch_to(v);
>  }
> 
> 
> @@ -2014,6 +2147,40 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
>      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
>  };
> 
> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct arch_vmx_struct *vmx, *tmp;
> +    struct vcpu *v;
> +    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
> +    struct list_head *blocked_vcpus =
> +                       &per_cpu(pi_blocked_vcpu, smp_processor_id());
> +    LIST_HEAD(list);
> +
> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +
> +    spin_lock(lock);
> +
> +    /*
> +     * XXX: The length of the list depends on how many vCPUs are currently
> +     * blocked on this specific pCPU. This may hurt the interrupt latency
> +     * if the list grows to too many entries.
> +     */
> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> +    {
> +        if ( pi_test_on(&vmx->pi_desc) )
> +        {
> +            list_del(&vmx->pi_blocked_vcpu_list);
> +            vmx->pi_block_cpu = NR_CPUS;
> +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +            vcpu_unblock(v);
> +        }
> +    }
> +
> +    spin_unlock(lock);
> +}
> +
>  /* Handle VT-d posted-interrupt when VCPU is running. */
>  static void pi_notification_interrupt(struct cpu_user_regs *regs)
>  {
> @@ -2100,7 +2267,10 @@ const struct hvm_function_table * __init
> start_vmx(void)
>      if ( cpu_has_vmx_posted_intr_processing )
>      {
>          if ( iommu_intpost )
> +        {
>              alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
> +            alloc_direct_apic_vector(&pi_wakeup_vector, pi_wakeup_interrupt);
> +        }
>          else
>              alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
>      }
> @@ -3543,6 +3713,8 @@ void vmx_vmenter_helper(const struct cpu_user_regs
> *regs)
>      struct hvm_vcpu_asid *p_asid;
>      bool_t need_flush;
> 
> +    vmx_pi_state_to_normal(curr);
> +
>      if ( !cpu_has_vmx_vpid )
>          goto out;
>      if ( nestedhvm_vcpu_in_guestmode(curr) )
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index c195129..fc18035 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -802,6 +802,8 @@ void vcpu_block(void)
> 
>      set_bit(_VPF_blocked, &v->pause_flags);
> 
> +    arch_vcpu_block(v);
> +
>      /* Check for events /after/ blocking: avoids wakeup waiting race. */
>      if ( local_events_need_delivery() )
>      {
> @@ -839,6 +841,8 @@ static long do_poll(struct sched_poll *sched_poll)
>      v->poll_evtchn = -1;
>      set_bit(v->vcpu_id, d->poll_mask);
> 
> +    arch_vcpu_block(v);
> +
>  #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
>      /* Check for events /after/ setting flags: avoids wakeup waiting race. */
>      smp_mb();
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index e7e40da..ada146c 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -310,6 +310,8 @@ static inline void free_vcpu_guest_context(struct
> vcpu_guest_context *vgc)
>      xfree(vgc);
>  }
> 
> +static inline void arch_vcpu_block(struct vcpu *v) {}
> +
>  #endif /* __ASM_DOMAIN_H__ */
> 
>  /*
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index c825975..135f7f9 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -494,6 +494,8 @@ struct arch_vcpu
>      void (*ctxt_switch_from) (struct vcpu *);
>      void (*ctxt_switch_to) (struct vcpu *);
> 
> +    void (*vcpu_block) (struct vcpu *);
> +
>      struct vpmu_struct vpmu;
> 
>      /* Virtual Machine Extensions */
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index f80e143..336fa62 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -560,6 +560,8 @@ void altp2m_vcpu_update_vmfunc_ve(struct vcpu *v);
>  /* emulates #VE */
>  bool_t altp2m_vcpu_emulate_ve(struct vcpu *v);
> 
> +void arch_vcpu_block(struct vcpu *v);
> +
>  #endif /* __ASM_X86_HVM_HVM_H__ */
> 
>  /*
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-
> x86/hvm/vmx/vmcs.h
> index b3b0946..7b3a931 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -160,6 +160,15 @@ struct arch_vmx_struct {
>      struct page_info     *vmwrite_bitmap;
> 
>      struct page_info     *pml_pg;
> +
> +    struct list_head     pi_blocked_vcpu_list;
> +
> +    /*
> > +     * Before the vCPU is blocked, it is added to the per-CPU blocking
> > +     * list of the pCPU recorded here, so the VT-d engine can send the
> > +     * wakeup notification event to that pCPU to wake up the vCPU.
> +     */
> +    unsigned int         pi_block_cpu;
>  };
> 
>  int vmx_create_vmcs(struct vcpu *v);
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-
> x86/hvm/vmx/vmx.h
> index 1719965..8129bff 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -28,6 +28,8 @@
>  #include <asm/hvm/trace.h>
>  #include <asm/hvm/vmx/vmcs.h>
> 
> +extern uint8_t pi_wakeup_vector;
> +
>  typedef union {
>      struct {
>          u64 r       :   1,  /* bit 0 - Read permission */
> @@ -563,6 +565,8 @@ int alloc_p2m_hap_data(struct p2m_domain *p2m);
>  void free_p2m_hap_data(struct p2m_domain *p2m);
>  void p2m_init_hap_data(struct p2m_domain *p2m);
> 
> +void vmx_pi_per_cpu_init(unsigned int cpu);
> +
>  /* EPT violation qualifications definitions */
>  #define _EPT_READ_VIOLATION         0
>  #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
> --
> 2.1.0


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-21  6:43   ` Wu, Feng
@ 2015-12-21  9:03     ` Dario Faggioli
  2015-12-21  9:26       ` Wu, Feng
  0 siblings, 1 reply; 47+ messages in thread
From: Dario Faggioli @ 2015-12-21  9:03 UTC (permalink / raw)
  To: Wu, Feng, xen-devel
  Cc: George Dunlap, Andrew Cooper, Tian, Kevin, Keir Fraser, Jan Beulich



On Mon, 2015-12-21 at 06:43 +0000, Wu, Feng wrote:
> Ping...
> 
Yep, it's on my list of things to do before leaving for the winter
holidays (which will be on Wednesday). I'll send in my comments soon.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-21  9:03     ` Dario Faggioli
@ 2015-12-21  9:26       ` Wu, Feng
  0 siblings, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2015-12-21  9:26 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Andrew Cooper, Jan Beulich,
	Keir Fraser



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Monday, December 21, 2015 5:04 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Keir Fraser <keir@xen.org>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>; Tian, Kevin <kevin.tian@intel.com>;
> George Dunlap <george.dunlap@eu.citrix.com>
> Subject: Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> On Mon, 2015-12-21 at 06:43 +0000, Wu, Feng wrote:
> > Ping...
> >
> Yep, it's on my list of things to do before leaving for the winter
> holidays (which will be on Wednesday). I'll send in my comments soon.

Thanks a lot, Dario!

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-03  8:35 ` [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling Feng Wu
  2015-12-10 11:40   ` Tian, Kevin
  2015-12-21  6:43   ` Wu, Feng
@ 2015-12-23  2:21   ` Dario Faggioli
  2015-12-23  2:28     ` Wu, Feng
  2015-12-23  4:58     ` Wu, Feng
  2016-01-18 15:14   ` Jan Beulich
  3 siblings, 2 replies; 47+ messages in thread
From: Dario Faggioli @ 2015-12-23  2:21 UTC (permalink / raw)
  To: Feng Wu, xen-devel
  Cc: George Dunlap, Andrew Cooper, Kevin Tian, Keir Fraser, Jan Beulich



Here I am,

On Thu, 2015-12-03 at 16:35 +0800, Feng Wu wrote:
> This is the core logic handling for VT-d posted-interrupts. Basically it
> deals with how and when to update posted-interrupts during the following
> scenarios:
> - vCPU is preempted
> - vCPU is slept
> - vCPU is blocked
> 
> [..]
>
Thanks for changing the changelog, making it read much more like a
high-level description of what happens and why.

> v10:
> - Check iommu_intpost first
> - Remove pointless checking of has_hvm_container_vcpu(v)
> - Rename 'vmx_pi_state_change' to 'vmx_pi_state_to_normal'
> - Since vcpu_unblock() doesn't acquire 'pi_blocked_vcpu_lock', we
>   don't need use another list to save the vCPUs with 'ON' set, just
>   directly call vcpu_unblock(v).
> 
These were all my comments on v9 and, as I've verified in the patch,
they have all been addressed... Thanks for this.

This patch looks fine to me now. There are a few minor issues that I'll
raise inline, but in general, I think this is a good design, and the
patch does its job fine at implementing it.

Here are the detailed comments.

First of all, trying to apply it, I got:

<stdin>:105: trailing whitespace.
void arch_vcpu_block(struct vcpu *v)
<stdin>:106: trailing whitespace.
{
<stdin>:107: trailing whitespace.
    if ( v->arch.vcpu_block )
<stdin>:108: trailing whitespace.
        v->arch.vcpu_block(v);
<stdin>:109: trailing whitespace.
}

It may not be accurate, though (i.e., it may be due to what I used for
applying the patches), so, please, double check.

And there are also a couple of long lines, which you should split.

> +void vmx_vcpu_block(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> +    /*
> +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> +     * make it as the destination of the wake-up notification event.
> +     */
> +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +    ASSERT(!pi_test_sn(pi_desc));
> +
> +    /*
> +     * We don't need to set the 'NDST' field, since it should point to
> +     * the same pCPU as v->processor.
> +     */
> +
So, maybe we can ASSERT() that?

> +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> +}

> +static void vmx_pi_switch_from(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) ||
> +         test_bit(_VPF_blocked, &v->pause_flags) )
> +        return;
> +
> +    /*
> +     * The vCPU has been preempted or went to sleep. We don't need to send
> +     * notification event to a non-running vcpu, the interrupt information
> +     * will be delivered to it before VM-ENTRY when the vcpu is scheduled
> +     * to run next time.
> +     */
> +    pi_set_sn(pi_desc);
> +}
> +
> +static void vmx_pi_switch_to(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    if ( x2apic_enabled )
> +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> +    else
> +        write_atomic(&pi_desc->ndst,
> +                     MASK_INSR(cpu_physical_id(v->processor),
> +                     PI_xAPIC_NDST_MASK));
> +
> +    pi_clear_sn(pi_desc);
>
No comment matching the one above (for pi_set_sn(), in
vmx_pi_switch_from())? I don't feel too strong about this, but it would
be nice to have both (or none, but I'd prefer both! :-D).

> +}
> +
> +static void vmx_pi_state_to_normal(struct vcpu *v)
> +{
>
I'm still a bit puzzled about the name... But it's better than in the
previous round, and I can't suggest a solution that I would like myself
better... so I'm fine with this one.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-23  2:21   ` Dario Faggioli
@ 2015-12-23  2:28     ` Wu, Feng
  2015-12-23  5:16       ` Tian, Kevin
  2015-12-23  4:58     ` Wu, Feng
  1 sibling, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2015-12-23  2:28 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Andrew Cooper, Jan Beulich,
	Keir Fraser



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Wednesday, December 23, 2015 10:21 AM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>; George
> Dunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Jan Beulich <jbeulich@suse.com>
> Subject: Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic
> handling
> 
> Here I am,

Nice to have you reviewing the code :)

> 
> On Thu, 2015-12-03 at 16:35 +0800, Feng Wu wrote:
> > This is the core logic handling for VT-d posted-interrupts. Basically
> > it
> > deals with how and when to update posted-interrupts during the
> > following
> > scenarios:
> > - vCPU is preempted
> > - vCPU is slept
> > - vCPU is blocked
> >
> > [..]
> >
> Thanks for changing the changelog; it now reads much more like a
> high-level description of what happens and why.
> 
> > v10:
> > - Check iommu_intpost first
> > - Remove pointless checking of has_hvm_container_vcpu(v)
> > - Rename 'vmx_pi_state_change' to 'vmx_pi_state_to_normal'
> > - Since vcpu_unblock() doesn't acquire 'pi_blocked_vcpu_lock', we
> >   don't need use another list to save the vCPUs with 'ON' set, just
> >   directly call vcpu_unblock(v).
> >
> These were all my comments on v9 and, as I've verified in the patch,
> they have all been addressed... Thanks for this.
> 
> This patch looks fine to me now. There are a few minor issues that I'll
> raise inline, but in general, I think this is a good design, and the
> patch does its job fine at implementing it.
> 
> Here are the detailed comments.
> 
> First of all, trying to apply it, I got:
> 
> <stdin>:105: trailing whitespace.
> void arch_vcpu_block(struct vcpu *v)
> <stdin>:106: trailing whitespace.
> {
> <stdin>:107: trailing whitespace.
>     if ( v->arch.vcpu_block )
> <stdin>:108: trailing whitespace.
>         v->arch.vcpu_block(v);
> <stdin>:109: trailing whitespace.
> }
> 
> It may not be accurate, though (i.e., it may be due to what I used for
> applying the patches), so, please, double check.

Sure, will double check it.

> 
> And there are also a couple of long lines, which you should split.
> 
> > +void vmx_vcpu_block(struct vcpu *v)
> > +{
> > +    unsigned long flags;
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    if ( !has_arch_pdevs(v->domain) )
> > +        return;
> > +
> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> > +
> > +    /*
> > +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> > +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> > +     * make it as the destination of the wake-up notification event.
> > +     */
> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> > +
> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> > +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> > +
> > +    ASSERT(!pi_test_sn(pi_desc));
> > +
> > +    /*
> > +     * We don't need to set the 'NDST' field, since it should point to
> > +     * the same pCPU as v->processor.
> > +     */
> > +
> So, maybe we can ASSERT() that?

Yes, I think ASSERT() is preferred here.

> 
> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> > +}
> 
> > +static void vmx_pi_switch_from(struct vcpu *v)
> > +{
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) ||
> > +         test_bit(_VPF_blocked, &v->pause_flags) )
> > +        return;
> > +
> > +    /*
> > +     * The vCPU has been preempted or went to sleep. We don't need to send
> > +     * notification event to a non-running vcpu, the interrupt information
> > +     * will be delivered to it before VM-ENTRY when the vcpu is scheduled
> > +     * to run next time.
> > +     */
> > +    pi_set_sn(pi_desc);
> > +}
> > +
> > +static void vmx_pi_switch_to(struct vcpu *v)
> > +{
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> > +        return;
> > +
> > +    if ( x2apic_enabled )
> > +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> > +    else
> > +        write_atomic(&pi_desc->ndst,
> > +                     MASK_INSR(cpu_physical_id(v->processor),
> > +                     PI_xAPIC_NDST_MASK));
> > +
> > +    pi_clear_sn(pi_desc);
> >
> No comment matching the one above (for pi_set_sn(), in
> vmx_pi_switch_from())? I don't feel too strong about this, but it would
> be nice to have both (or none, but I'd prefer both! :-D).

I will add some comments here.

> 
> > +}
> > +
> > +static void vmx_pi_state_to_normal(struct vcpu *v)
> > +{
> >
> I'm still a bit puzzled about the name... But it's better than in the
> previous round, and I can't suggest a solution that I would like myself
> better... so I'm fine with this one.

Yeah, I also couldn't find a better name; I will keep thinking about it :)

In the end, thanks for the review, Dario! Merry Christmas and Happy New Year!!

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-23  2:21   ` Dario Faggioli
  2015-12-23  2:28     ` Wu, Feng
@ 2015-12-23  4:58     ` Wu, Feng
  2015-12-23 10:09       ` Jan Beulich
  1 sibling, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2015-12-23  4:58 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, Wu, Feng

Hi Jan,

Kevin and Dario have given some comments on this patch. I would like to
know whether you have any comments as well; it would be highly
appreciated if you could give your opinion, which is very important for
getting it merged as soon as possible. Thank you!

Thanks,
Feng

> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Wednesday, December 23, 2015 10:21 AM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>; George
> Dunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Jan Beulich <jbeulich@suse.com>
> Subject: Re: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic
> handling
> 
> Here I am,
> 
> On Thu, 2015-12-03 at 16:35 +0800, Feng Wu wrote:
> > This is the core logic handling for VT-d posted-interrupts. Basically
> > it
> > deals with how and when to update posted-interrupts during the
> > following
> > scenarios:
> > - vCPU is preempted
> > - vCPU is slept
> > - vCPU is blocked
> >
> > [..]
> >
> Thanks for changing the changelog; it now reads much more like a
> high-level description of what happens and why.
> 
> > v10:
> > - Check iommu_intpost first
> > - Remove pointless checking of has_hvm_container_vcpu(v)
> > - Rename 'vmx_pi_state_change' to 'vmx_pi_state_to_normal'
> > - Since vcpu_unblock() doesn't acquire 'pi_blocked_vcpu_lock', we
> >   don't need use another list to save the vCPUs with 'ON' set, just
> >   directly call vcpu_unblock(v).
> >
> These were all my comments on v9 and, as I've verified in the patch,
> they have all been addressed... Thanks for this.
> 
> This patch looks fine to me now. There are a few minor issues that I'll
> raise inline, but in general, I think this is a good design, and the
> patch does its job fine at implementing it.
> 
> Here are the detailed comments.
> 
> First of all, trying to apply it, I got:
> 
> <stdin>:105: trailing whitespace.
> void arch_vcpu_block(struct vcpu *v)
> <stdin>:106: trailing whitespace.
> {
> <stdin>:107: trailing whitespace.
>     if ( v->arch.vcpu_block )
> <stdin>:108: trailing whitespace.
>         v->arch.vcpu_block(v);
> <stdin>:109: trailing whitespace.
> }
> 
> It may not be accurate, though (i.e., it may be due to what I used for
> applying the patches), so, please, double check.
> 
> And there are also a couple of long lines, which you should split.
> 
> > +void vmx_vcpu_block(struct vcpu *v)
> > +{
> > +    unsigned long flags;
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    if ( !has_arch_pdevs(v->domain) )
> > +        return;
> > +
> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> > +
> > +    /*
> > +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> > +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> > +     * make it as the destination of the wake-up notification event.
> > +     */
> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> > +
> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> > +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> > +
> > +    ASSERT(!pi_test_sn(pi_desc));
> > +
> > +    /*
> > +     * We don't need to set the 'NDST' field, since it should point to
> > +     * the same pCPU as v->processor.
> > +     */
> > +
> So, maybe we can ASSERT() that?
> 
> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> > +}
> 
> > +static void vmx_pi_switch_from(struct vcpu *v)
> > +{
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) ||
> > +         test_bit(_VPF_blocked, &v->pause_flags) )
> > +        return;
> > +
> > +    /*
> > +     * The vCPU has been preempted or went to sleep. We don't need to send
> > +     * notification event to a non-running vcpu, the interrupt information
> > +     * will be delivered to it before VM-ENTRY when the vcpu is scheduled
> > +     * to run next time.
> > +     */
> > +    pi_set_sn(pi_desc);
> > +}
> > +
> > +static void vmx_pi_switch_to(struct vcpu *v)
> > +{
> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> > +
> > +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> > +        return;
> > +
> > +    if ( x2apic_enabled )
> > +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> > +    else
> > +        write_atomic(&pi_desc->ndst,
> > +                     MASK_INSR(cpu_physical_id(v->processor),
> > +                     PI_xAPIC_NDST_MASK));
> > +
> > +    pi_clear_sn(pi_desc);
> >
> No comment matching the one above (for pi_set_sn(), in
> vmx_pi_switch_from())? I don't feel too strong about this, but it would
> be nice to have both (or none, but I'd prefer both! :-D).
> 
> > +}
> > +
> > +static void vmx_pi_state_to_normal(struct vcpu *v)
> > +{
> >
> I'm still a bit puzzled about the name... But it's better than in the
> previous round, and I can't suggest a solution that I would like myself
> better... so I'm fine with this one.
> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-23  2:28     ` Wu, Feng
@ 2015-12-23  5:16       ` Tian, Kevin
  2015-12-23  5:18         ` Wu, Feng
  0 siblings, 1 reply; 47+ messages in thread
From: Tian, Kevin @ 2015-12-23  5:16 UTC (permalink / raw)
  To: Wu, Feng, Dario Faggioli, xen-devel
  Cc: George Dunlap, Andrew Cooper, Keir Fraser, Jan Beulich

> From: Wu, Feng
> Sent: Wednesday, December 23, 2015 10:29 AM
> >
> > > +}
> > > +
> > > +static void vmx_pi_state_to_normal(struct vcpu *v)
> > > +{
> > >
> > I'm still a bit puzzled about the name... But it's better than in the
> > previous round, and I can't suggest a solution that I would like myself
> > better... so I'm fine with this one.
> 
> Yeah, I also couldn't find a better name; I will keep thinking about it :)
> 

I guess by 'normal' you actually mean resuming to non-root mode to get
the posted-interrupt benefit, i.e. direct delivery of external
interrupts w/o VMM intervention.

Then what about vmx_pi_do_resume?

Thanks
Kevin


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-23  5:16       ` Tian, Kevin
@ 2015-12-23  5:18         ` Wu, Feng
  0 siblings, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2015-12-23  5:18 UTC (permalink / raw)
  To: Tian, Kevin, Dario Faggioli, xen-devel
  Cc: George Dunlap, Andrew Cooper, Wu, Feng, Keir Fraser, Jan Beulich



> -----Original Message-----
> From: Tian, Kevin
> Sent: Wednesday, December 23, 2015 1:17 PM
> To: Wu, Feng <feng.wu@intel.com>; Dario Faggioli <dario.faggioli@citrix.com>;
> xen-devel@lists.xen.org
> Cc: Keir Fraser <keir@xen.org>; George Dunlap <george.dunlap@eu.citrix.com>;
> Andrew Cooper <andrew.cooper3@citrix.com>; Jan Beulich
> <jbeulich@suse.com>
> Subject: RE: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic
> handling
> 
> > From: Wu, Feng
> > Sent: Wednesday, December 23, 2015 10:29 AM
> > >
> > > > +}
> > > > +
> > > > +static void vmx_pi_state_to_normal(struct vcpu *v)
> > > > +{
> > > >
> > > I'm still a bit puzzled about the name... But it's better than in the
> > > previous round, and I can't suggest a solution that I would like myself
> > > better... so I'm fine with this one.
> >
> > Yeah, I also couldn't find a better name; I will keep thinking about it :)
> >
> 
> I guess by 'normal' you actually mean resuming to non-root mode to get
> the posted-interrupt benefit, i.e. direct delivery of external
> interrupts w/o VMM intervention.

Exactly.

> 
> Then what about vmx_pi_do_resume?

Sounds great! :)

Thanks,
Feng

> 
> Thanks
> Kevin


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-23  4:58     ` Wu, Feng
@ 2015-12-23 10:09       ` Jan Beulich
  2016-01-18  1:20         ` Wu, Feng
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2015-12-23 10:09 UTC (permalink / raw)
  To: feng.wu, xen-devel
  Cc: george.dunlap, andrew.cooper3, dario.faggioli, keir, kevin.tian

>>> "Wu, Feng" <feng.wu@intel.com> 12/23/15 5:58 AM >>>
>Kevin and Dario have given some comments on this patch. I would like to
>know whether you have any comments as well; it would be highly
>appreciated if you could give your opinion, which is very important for
>getting it merged as soon as possible. Thank you!

Only after January 6th, and even then I can't predict when I'll get to it. Yes,
it has been pending for a long time, but no, this is not something that I can
just quickly slip between two other things: What you're doing is complicated
and not the prettiest of all things, so it needs careful looking at, and hence
enough time (in one piece).

Jan


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-23 10:09       ` Jan Beulich
@ 2016-01-18  1:20         ` Wu, Feng
  2016-01-18  8:40           ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2016-01-18  1:20 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:jbeulich@suse.com]
> Sent: Wednesday, December 23, 2015 6:10 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>;
> keir@xen.org
> Subject: Re: RE: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core
> logic handling
> 
> >>> "Wu, Feng" <feng.wu@intel.com> 12/23/15 5:58 AM >>>
> >Kevin and Dario have given some comments on this patch. I would like to
> >know whether you have any comments as well; it would be highly
> >appreciated if you could give your opinion, which is very important for
> >getting it merged as soon as possible. Thank you!
> 
> Only after January 6th, and even then I can't predict when I'll get to it. Yes,
> it has been pending for a long time, but no, this is not something that I can
> just quickly slip between two other things: What you're doing is complicated
> and not the prettiest of all things, so it needs careful looking at, and hence
> enough time (in one piece).

Hi Jan, have you had a chance to look at this remaining one so far?

Thanks,
Feng


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-18  1:20         ` Wu, Feng
@ 2016-01-18  8:40           ` Jan Beulich
  2016-01-18  8:45             ` Wu, Feng
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2016-01-18  8:40 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 18.01.16 at 02:20, <feng.wu@intel.com> wrote:
> Hi Jan, have you had a chance to look at this remaining one so far?

No, not yet, sorry.

Jan


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-18  8:40           ` Jan Beulich
@ 2016-01-18  8:45             ` Wu, Feng
  2016-01-18  9:03               ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2016-01-18  8:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, January 18, 2016 4:41 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; keir@xen.org
> Subject: RE: RE: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core
> logic handling
> 
> >>> On 18.01.16 at 02:20, <feng.wu@intel.com> wrote:
> > Hi Jan, have you had a chance to look at this remaining one so far?
> 
> No, not yet, sorry.

It has been pending for such a long time, and Dario and Kevin have both
reviewed it; it is now waiting on your comments. Could you tell me when
you will look at it?

Feng

> 
> Jan


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-18  8:45             ` Wu, Feng
@ 2016-01-18  9:03               ` Jan Beulich
  2016-01-19  8:48                 ` Wu, Feng
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2016-01-18  9:03 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel

>>> On 18.01.16 at 09:45, <feng.wu@intel.com> wrote:
> It has been pending for such a long time, and Dario and Kevin have both
> reviewed it; it is now waiting on your comments. Could you tell me when
> you will look at it?

I can't really, and I don't think I've seen e.g. Dario give his
Reviewed-by. This is a complicated topic, which I need sufficient
time (in a single piece) to look into (I had, btw, planned to get to
this today, but just like in so many earlier cases there may again
end up being reasons why I won't make it there). In any event
I view bug fixes (creation as well as review thereof) taking
priority over review of new feature additions, and the next thing
in the order are patches that can be expected to not take as much
time to create/review (yielding the overall better throughput).

I hate to repeat myself, but the best thing to help forward
progress here is for all of you sending out larger changes to the
hypervisor to also help with other patch review (which, perhaps
not in this particular case anymore - since the series has meanwhile
undergone quite a few iterations - but in general, includes for
everyone to look critically at their own patches, as many review
comments could be avoided by doing so). Which, btw, would
also benefit all of you, as it will inevitably lead to a better
understanding of the overall hypervisor code, (hopefully)
ultimately leading to fewer iterations on future patch series.

Jan


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2015-12-03  8:35 ` [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling Feng Wu
                     ` (2 preceding siblings ...)
  2015-12-23  2:21   ` Dario Faggioli
@ 2016-01-18 15:14   ` Jan Beulich
  2016-01-20  7:49     ` Wu, Feng
  2016-01-21  9:05     ` Wu, Feng
  3 siblings, 2 replies; 47+ messages in thread
From: Jan Beulich @ 2016-01-18 15:14 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel

>>> On 03.12.15 at 09:35, <feng.wu@intel.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -7019,6 +7019,12 @@ void hvm_domain_soft_reset(struct domain *d)
>      hvm_destroy_all_ioreq_servers(d);
>  }
>  
> +void arch_vcpu_block(struct vcpu *v)
> +{
> +    if ( v->arch.vcpu_block )
> +        v->arch.vcpu_block(v);
> +}

Perhaps better as an inline function? In no case would it (and its
declaration) belong in a HVM-specific file if the hook isn't HVM-specific.
Or if the hook was, then it ought to be put in struct hvm_vcpu.
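For illustration, the inline form Jan suggests could look like the sketch below. The struct layout here is a hypothetical stand-in for the relevant parts of Xen's struct vcpu, trimmed down so the sketch is self-contained; it is not the real definition.

```c
#include <assert.h>
#include <stddef.h>

struct vcpu;

/* Hypothetical stand-in for the relevant part of Xen's struct arch_vcpu. */
struct arch_vcpu {
    void (*vcpu_block)(struct vcpu *v);   /* NULL unless VT-d PI is in use */
};

struct vcpu {
    struct arch_vcpu arch;
    int block_calls;                      /* instrumentation for this sketch */
};

/*
 * The hook wrapper as a static inline in a header, rather than an
 * out-of-line definition in the HVM-specific hvm.c.
 */
static inline void arch_vcpu_block(struct vcpu *v)
{
    if ( v->arch.vcpu_block )
        v->arch.vcpu_block(v);
}

/* Example hook, standing in for vmx_vcpu_block(). */
static void fake_vmx_vcpu_block(struct vcpu *v)
{
    v->block_calls++;
}
```

Being inline, the wrapper costs nothing when the hook pointer is NULL and keeps the declaration out of HVM-specific code.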

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
>  static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
>  
> +/*
> + * We maintain a per-CPU linked-list of vCPU, so in PI wakeup handler we
> + * can find which vCPU should be woken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, pi_blocked_vcpu);
> +static DEFINE_PER_CPU(spinlock_t, pi_blocked_vcpu_lock);
> +
>  uint8_t __read_mostly posted_intr_vector;
> +uint8_t __read_mostly pi_wakeup_vector;

I'm pretty sure I complained about this before - this is used only in
this file, and hence should be static.

> +void vmx_pi_per_cpu_init(unsigned int cpu)
> +{
> +    INIT_LIST_HEAD(&per_cpu(pi_blocked_vcpu, cpu));
> +    spin_lock_init(&per_cpu(pi_blocked_vcpu_lock, cpu));
> +}
> +
> +void vmx_vcpu_block(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !has_arch_pdevs(v->domain) )
> +        return;

Is there no locking needed for this check to be safe? Same goes
for vmx_pi_switch_{from,to}() then obviously...

> +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> +
> +    /*
> +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it for
> +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> +     * make it as the destination of the wake-up notification event.
> +     */
> +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> +
> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> +
> +    ASSERT(!pi_test_sn(pi_desc));
> +
> +    /*
> +     * We don't need to set the 'NDST' field, since it should point to
> +     * the same pCPU as v->processor.
> +     */
> +
> +    write_atomic(&pi_desc->nv, pi_wakeup_vector);

Stray blank line between comment and corresponding code.

Also the ASSERT() is rather more disconnected from the write
than seems desirable: Wouldn't it be better to cmpxchg() the
whole 32-bit value, validating that SN is clear in the result?
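One way to read that suggestion, sketched as a standalone model: cmpxchg the whole 32-bit control word so the 'NV' update only succeeds while 'SN' is observed clear, replacing the separate ASSERT()/write_atomic() pair. The bit positions follow the VT-d spec's posted-interrupt descriptor layout (ON bit 0, SN bit 1, NV bits 23:16), but the helper name and failure semantics are mine, not Xen's.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout of the PI descriptor's 32-bit control word (per VT-d spec). */
#define PI_ON_BIT    (1u << 0)
#define PI_SN_BIT    (1u << 1)
#define PI_NV_SHIFT  16
#define PI_NV_MASK   (0xffu << PI_NV_SHIFT)

/*
 * Install the wake-up vector in 'NV' with a cmpxchg over the whole
 * 32-bit word; the swap only succeeds while SN is observed clear.
 */
static int pi_set_wakeup_vector(uint32_t *ctrl, uint8_t vector)
{
    uint32_t old = __atomic_load_n(ctrl, __ATOMIC_ACQUIRE);
    uint32_t new;

    do {
        if ( old & PI_SN_BIT )
            return -1;                 /* SN set: the assumption is broken */
        new = (old & ~PI_NV_MASK) | ((uint32_t)vector << PI_NV_SHIFT);
    } while ( !__atomic_compare_exchange_n(ctrl, &old, new, 0,
                                           __ATOMIC_ACQ_REL,
                                           __ATOMIC_ACQUIRE) );
    return 0;
}
```

This ties the SN check to the write itself, so the validation can't drift apart from the update the way a separate ASSERT() can.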

> +static void vmx_pi_switch_to(struct vcpu *v)
> +{
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    if ( x2apic_enabled )
> +        write_atomic(&pi_desc->ndst, cpu_physical_id(v->processor));
> +    else
> +        write_atomic(&pi_desc->ndst,
> +                     MASK_INSR(cpu_physical_id(v->processor),
> +                     PI_xAPIC_NDST_MASK));

Considering these are atomic writes, I'm not sure the compiler
will manage to fold the two. Could I therefore talk you into
morphing this into

    write_atomic(&pi_desc->ndst, x2apic_enabled ? ... : ... );

? Indentation will need to be fixed in any case.
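Concretely, the folded form could look like the sketch below. MASK_INSR and the xAPIC destination mask are reproduced here as I understand them from Xen's headers, so treat their exact definitions as assumptions; write_atomic() is modelled by a single atomic store.

```c
#include <assert.h>
#include <stdint.h>

/* Xen-style helpers, reproduced so the sketch is self-contained. */
#define MASK_INSR(v, m)     ((((uint32_t)(v)) * ((m) & -(m))) & (m))
#define PI_xAPIC_NDST_MASK  0xff00u

static uint32_t ndst;

/* Stand-in for Xen's write_atomic(): one atomic store. */
static void write_atomic_u32(uint32_t *p, uint32_t val)
{
    __atomic_store_n(p, val, __ATOMIC_RELEASE);
}

/* A single write_atomic() call; the value is chosen by a conditional
 * expression, as suggested. */
static void pi_set_ndst(int x2apic_enabled, uint32_t apic_id)
{
    write_atomic_u32(&ndst,
                     x2apic_enabled
                     ? apic_id
                     : MASK_INSR(apic_id, PI_xAPIC_NDST_MASK));
}
```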

> +static void vmx_pi_state_to_normal(struct vcpu *v)
> +{
> +    unsigned long flags;
> +    unsigned int pi_block_cpu;
> +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +
> +    if ( !iommu_intpost || !has_arch_pdevs(v->domain) )
> +        return;
> +
> +    ASSERT(!test_bit(_VPF_blocked, &v->pause_flags));
> +
> +    /*
> +     * Set 'NV' field back to posted_intr_vector, so the
> +     * Posted-Interrupts can be delivered to the vCPU when
> +     * it is running in non-root mode.
> +     */
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */

Comment style.

> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;

Are this condition and the one in the immediately preceding if()
connected in any way? I.e. could the latter perhaps become an
ASSERT() by connecting the two blocks suitably? I'm simply
trying to limit the number of different cases to consider mentally
when looking at this code ...

> +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +
> +    /*
> +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU was
> +     * removed from the blocking list while we are acquiring the lock.
> +     */
> +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )

With you wanting to deal with changes behind your back here,
isn't there some kind of barrier needed between reading and using
pi_block_cpu, such that the compiler won't convert the local
variable accesses into multiple reads of
v->arch.hvm_vmx.pi_block_cpu (which iiuc it is allowed to do)?
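(For illustration, a minimal self-contained sketch of the single forced read the question is about — the macro and struct names here are illustrative stand-ins, not Xen's actual helpers:)

```c
#include <assert.h>

#define NR_CPUS 256u

/* Illustrative stand-in for a read_atomic()/ACCESS_ONCE-style helper:
 * the volatile cast forces exactly one load of the shared field, so the
 * compiler cannot turn later uses of the snapshot into fresh reads of
 * v->arch.hvm_vmx.pi_block_cpu. */
#define READ_ONCE_UINT(x) (*(volatile const unsigned int *)&(x))

struct vmx_model { unsigned int pi_block_cpu; };

static unsigned int snapshot_block_cpu(const struct vmx_model *m)
{
    /* One forced load; the snapshot stays stable afterwards. */
    return READ_ONCE_UINT(m->pi_block_cpu);
}
```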

> +    {
> +        spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +        return;
> +    }
> +
> +    list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;

Perhaps (again for readability's sake) better to invert the if()
condition and move these two lines into its body, ...

> +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu), flags);
> +}

... while this would then be common.

> @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>  
>      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>  
> +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> +
> +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> +
>      v->arch.schedule_tail    = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>  
> +    if ( iommu_intpost )
> +        v->arch.vcpu_block = vmx_vcpu_block;

Considering the locking question above - couldn't this be done only
when a device gets added, and the pointer zapped upon removal
of the last device, at once saving the conditional inside the hook?

> @@ -2014,6 +2147,40 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
>  };
>  
> +/* Handle VT-d posted-interrupt when VCPU is blocked. */
> +static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
> +{
> +    struct arch_vmx_struct *vmx, *tmp;
> +    struct vcpu *v;
> +    spinlock_t *lock = &per_cpu(pi_blocked_vcpu_lock, smp_processor_id());
> +    struct list_head *blocked_vcpus =
> +                       &per_cpu(pi_blocked_vcpu, smp_processor_id());
> +    LIST_HEAD(list);

Unused local variable?

> +    ack_APIC_irq();
> +    this_cpu(irq_count)++;
> +
> +    spin_lock(lock);
> +
> +    /*
> +     * XXX: The length of the list depends on how many vCPU is current
> +     * blocked on this specific pCPU. This may hurt the interrupt latency
> +     * if the list grows to too many entries.
> +     */
> +    list_for_each_entry_safe(vmx, tmp, blocked_vcpus, pi_blocked_vcpu_list)
> +    {
> +        if ( pi_test_on(&vmx->pi_desc) )
> +        {
> +            list_del(&vmx->pi_blocked_vcpu_list);
> +            vmx->pi_block_cpu = NR_CPUS;
> +            v = container_of(vmx, struct vcpu, arch.hvm_vmx);
> +            vcpu_unblock(v);

It not being used elsewhere, I don't see the need for the local
variable v. If you really want to keep it, move its declaration into
this most narrow scope please.

> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -802,6 +802,8 @@ void vcpu_block(void)
>  
>      set_bit(_VPF_blocked, &v->pause_flags);
>  
> +    arch_vcpu_block(v);
> +
>      /* Check for events /after/ blocking: avoids wakeup waiting race. */
>      if ( local_events_need_delivery() )
>      {

ISTR some previous discussion on this placement - wasn't it
determined that putting it in the else part here would be
sufficient for your needs and better in terms of overhead?
Or is there some race possible if moved past the invocation
of local_events_need_delivery()?

> @@ -839,6 +841,8 @@ static long do_poll(struct sched_poll *sched_poll)
>      v->poll_evtchn = -1;
>      set_bit(v->vcpu_id, d->poll_mask);
>  
> +    arch_vcpu_block(v);
> +
>  #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
>      /* Check for events /after/ setting flags: avoids wakeup waiting race. */
>      smp_mb();

Same question here then regarding moving this further down.

> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -160,6 +160,15 @@ struct arch_vmx_struct {
>      struct page_info     *vmwrite_bitmap;
>  
>      struct page_info     *pml_pg;
> +
> +    struct list_head     pi_blocked_vcpu_list;
> +
> +    /*
> +     * Before vCPU is blocked, it is added to the global per-cpu list

global or per-cpu?

> +     * of 'pi_block_cpu', then VT-d engine can send wakeup notification
> +     * event to 'pi_block_cpu' and wakeup the related vCPU.
> +     */

Perhaps to help understanding the comment the part after the
second comma could be a separate sentence?

> +    unsigned int         pi_block_cpu;

Looking at all the use sites I wonder whether it wouldn't make for
easier to read code if you instead stored the lock to acquire. This
would then perhaps also prepare a road for balancing lists growing
too large (i.e. rather than having one list per CPU, you could
then have a variable number of lists, depending on current needs,
and [kind of] freely select one of them).

Jan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-18  9:03               ` Jan Beulich
@ 2016-01-19  8:48                 ` Wu, Feng
  0 siblings, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2016-01-19  8:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, george.dunlap, andrew.cooper3, dario.faggioli,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, January 18, 2016 5:04 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: andrew.cooper3@citrix.com; dario.faggioli@citrix.com;
> george.dunlap@eu.citrix.com; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; keir@xen.org
> Subject: RE: RE: [Xen-devel] [PATCH v10 6/7] vmx: VT-d posted-interrupt core
> logic handling
> 
> >>> On 18.01.16 at 09:45, <feng.wu@intel.com> wrote:
> > It has been pending for such a long time, and Dario and Kevin both
> reviewed
> > it, it is pending on your comments, could you tell when you will look at it?
> 
> I can't really, and I don't think I've seen e.g. Dario give his
> Reviewed-by. 

I didn't say Dario gave his Reviewed-by, but he did review this patch
and said: "This patch looks fine to me now," with some minor
issues. I have addressed those minor issues and just want to get your
feedback before posting version 11 (thanks for your comments in
the following email).

Thanks,
Feng


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-18 15:14   ` Jan Beulich
@ 2016-01-20  7:49     ` Wu, Feng
  2016-01-20  8:35       ` Jan Beulich
  2016-01-21  9:05     ` Wu, Feng
  1 sibling, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2016-01-20  7:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, January 18, 2016 11:14 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Dario Faggioli
> <dario.faggioli@citrix.com>; George Dunlap <george.dunlap@eu.citrix.com>;
> Tian, Kevin <kevin.tian@intel.com>; xen-devel@lists.xen.org; Keir Fraser
> <keir@xen.org>
> Subject: Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> >>> On 03.12.15 at 09:35, <feng.wu@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int
> msr, uint64_t msr_content);
> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> > +
> > +    /*
> > +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it
> for
> > +     * the per-CPU list, we also save it to posted-interrupt descriptor and
> > +     * make it as the destination of the wake-up notification event.
> > +     */
> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> > +
> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> > +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> > +
> > +    ASSERT(!pi_test_sn(pi_desc));
> > +
> > +    /*
> > +     * We don't need to set the 'NDST' field, since it should point to
> > +     * the same pCPU as v->processor.
> > +     */
> > +
> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> 
> Stray blank line between comment and corresponding code.
> 
> Also the ASSERT() is rather more disconnected from the write
> than seems desirable: Wouldn't it be better to cmpxchg() the
> whole 32-bit value, validating that SN is clear in the result?

I don't quite understand this. The control field is 64 bits; do you
mean cmpxchg() the whole 64-bit value as follows:

+        do {
+            old.control = new.control = pi_desc->control;
+            new.nv = pi_wakeup_vector;
+        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
+                  != old.control );

This is somewhat like the implementation in the earlier versions;
why do you want to change it back?

> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> > +    if ( pi_block_cpu == NR_CPUS )
> > +        return;
> 
> Are this condition and the one in the immediately preceding if()
> connected in any way?

I am not sure I understand this correctly. Which one is
"the one in the immediately preceding if()"?

> I.e. could the latter perhaps become an
> ASSERT() by connecting the two blocks suitably? I'm simply
> trying to limit the number of different cases to consider mentally
> when looking at this code ...

If the above check yields true, it means the vCPU was removed by
pi_wakeup_interrupt(), so we just need to return. If it yields false,
we will acquire the spinlock as the following code does; there are two
scenarios:
- pi_wakeup_interrupt() acquires the spinlock before us; then when
we get the lock, 'v->arch.hvm_vmx.pi_block_cpu' has already been
set to NR_CPUS by pi_wakeup_interrupt(), and we need do nothing.
- We get the spinlock before pi_wakeup_interrupt(); this time we need
to remove the vCPU from the list and set 'v->arch.hvm_vmx.pi_block_cpu'
to NR_CPUS.
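(The two scenarios above amount to a double-checked dequeue; here is a toy, single-threaded model of that shape — names are illustrative, and in the real code the re-check happens under a per-CPU spinlock with pi_wakeup_interrupt() as the racing party:)

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 256u

struct vcpu_model {
    unsigned int pi_block_cpu;  /* NR_CPUS == "not on any blocking list" */
    bool on_list;
};

static bool try_dequeue(struct vcpu_model *v)
{
    if ( v->pi_block_cpu == NR_CPUS )   /* fast path: already dequeued */
        return false;

    /* spin_lock_irqsave(...) would go here; between the check above and
     * acquiring the lock, pi_wakeup_interrupt() may dequeue the vCPU. */
    if ( v->pi_block_cpu == NR_CPUS )   /* re-check under the lock */
        return false;                   /* lost the race; nothing to do */

    v->on_list = false;                 /* list_del(...) */
    v->pi_block_cpu = NR_CPUS;
    return true;                        /* spin_unlock_irqrestore(...) */
}
```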

> 
> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> > +
> > +    /*
> > +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU
> was
> > +     * removed from the blocking list while we are acquiring the lock.
> > +     */
> > +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> 
> With you wanting to deal with changes behind your back here,
> isn't there come kind of barrier needed between reading and using
> pi_block_cpu, such that the compiler won't convert the local
> variable accesses into multiple reads of
> v->arch.hvm_vmx.pi_block_cpu (which iiuc it is allowed to do)?

Will think about this.

> 
> > @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >
> >      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
> >
> > +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> > +
> > +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> > +
> >      v->arch.schedule_tail    = vmx_do_resume;
> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> >
> > +    if ( iommu_intpost )
> > +        v->arch.vcpu_block = vmx_vcpu_block;
> 
> Considering the locking question above - couldn't this be done only
> when a device gets added, and the pointer zapped upon removal
> of the last device, at once saving the conditional inside the hook?

That is another solution, but I think it makes a simple thing difficult;
I feel the current logic is simple and clear. Btw, this hook is not on
a critical path, so another conditional in it should not be a big deal.

> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -802,6 +802,8 @@ void vcpu_block(void)
> >
> >      set_bit(_VPF_blocked, &v->pause_flags);
> >
> > +    arch_vcpu_block(v);
> > +
> >      /* Check for events /after/ blocking: avoids wakeup waiting race. */
> >      if ( local_events_need_delivery() )
> >      {
> 
> ISTR some previous discussion on this placement - wasn't it
> determined that putting it in the else part here would be
> sufficient for your needs and better in terms of overhead?
> Or is there some race possible if moved past the invocation
> of local_events_need_delivery()?

The discussion happened several months ago; it is really hard to recall
every detail, and I have to look it up in my mail index. It is really bad
for the review to last this long.

We need to call arch_vcpu_block() before local_events_need_delivery(),
since the VT-d hardware can issue a notification event while we are in
arch_vcpu_block(); if that happens, the 'ON' bit is set, and we will then
catch it in local_events_need_delivery(). If we did arch_vcpu_block() in
the else part, we could not handle this case. This is one reason I can
recall; I am not sure whether there were other concerns, since it has
been a really long time. The current implementation was fully discussed
with the scheduler maintainers.

BTW, I don't think the current implementation introduces
much overhead.

> 
> > @@ -839,6 +841,8 @@ static long do_poll(struct sched_poll *sched_poll)
> >      v->poll_evtchn = -1;
> >      set_bit(v->vcpu_id, d->poll_mask);
> >
> > +    arch_vcpu_block(v);
> > +
> >  #ifndef CONFIG_X86 /* set_bit() implies mb() on x86 */
> >      /* Check for events /after/ setting flags: avoids wakeup waiting race. */
> >      smp_mb();
> 
> Same question here then regarding moving this further down.
> 
> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> 
> > +    unsigned int         pi_block_cpu;
> 
> Looking at all the use sites I wonder whether it wouldn't make for
> easier to read code if you instead stored the lock to acquire. This
> would then perhaps also prepare a road for balancing lists growing
> too large (i.e. rather than having one list per CPU, you could
> then have a variable number of lists, depending on current needs,
> and [kind of] freely select one of them).

Since this patch is still experimental, can we defer the "length of the
list" issue to the next stage, when we turn the feature on by default?

Thanks,
Feng


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-20  7:49     ` Wu, Feng
@ 2016-01-20  8:35       ` Jan Beulich
  2016-01-20 11:12         ` Dario Faggioli
  2016-01-20 11:20         ` Wu, Feng
  0 siblings, 2 replies; 47+ messages in thread
From: Jan Beulich @ 2016-01-20  8:35 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel

>>> On 20.01.16 at 08:49, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, January 18, 2016 11:14 PM
>> >>> On 03.12.15 at 09:35, <feng.wu@intel.com> wrote:
>> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int
>> msr, uint64_t msr_content);
>> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
>> > +
>> > +    /*
>> > +     * The vCPU is blocking, we need to add it to one of the per pCPU lists.
>> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use it
>> for
>> > +     * the per-CPU list, we also save it to posted-interrupt descriptor and
>> > +     * make it as the destination of the wake-up notification event.
>> > +     */
>> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
>> > +
>> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
>> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
>> > +                  &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
>> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
>> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
>> > +
>> > +    ASSERT(!pi_test_sn(pi_desc));
>> > +
>> > +    /*
>> > +     * We don't need to set the 'NDST' field, since it should point to
>> > +     * the same pCPU as v->processor.
>> > +     */
>> > +
>> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
>> 
>> Stray blank line between comment and corresponding code.
>> 
>> Also the ASSERT() is rather more disconnected from the write
>> than seems desirable: Wouldn't it be better to cmpxchg() the
>> whole 32-bit value, validating that SN is clear in the result?
> 
> Not quite understand this. The control field is 64 bits, do you
> mean cmpxchg() the whole 64-bit value like follows:
> 
> +        do {
> +            old.control = new.control = pi_desc->control;
> +            new.nv = pi_wakeup_vector;
> +        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
> +                  != old.control );
> 
> This a somewhat like the implementation in the earlier versions,
> why do you want to change it back?

Did you read what I've said? I'm worried about the disconnect of
assertion and update: You really care about SN being clear
_after_ the update aiui. And since NDST doesn't change, a 32-bit
CMPXCHG would seem to suffice.
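(For illustration, a toy model of the suggestion — update NV with a 32-bit cmpxchg and validate SN in the value actually swapped out. The bit layout here is purely illustrative, not the real posted-interrupt descriptor format:)

```c
#include <assert.h>
#include <stdint.h>

#define PI_SN    (1u << 8)      /* illustrative SN position */
#define PI_NV(x) ((x) & 0xffu)  /* illustrative NV field: bits 0-7 */

static uint32_t pi_set_nv_checked(uint32_t *ctrl, uint8_t nv)
{
    uint32_t old, new;

    do {
        old = *ctrl;
        new = (old & ~0xffu) | nv;   /* only NV changes; NDST is untouched */
    } while ( __sync_val_compare_and_swap(ctrl, old, new) != old );

    /* SN was clear in the value the update was actually applied to,
     * tying the assertion directly to the write. */
    assert(!(old & PI_SN));
    return new;
}
```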

>> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
>> > +    if ( pi_block_cpu == NR_CPUS )
>> > +        return;
>> 
>> Are this condition and the one in the immediately preceding if()
>> connected in any way?
> 
> I am not sure I understand this correctly. Which one is
> " the one in the immediately preceding if()"?

If you hadn't ripped out too much of the context, it would be a
matter of pointing you back up. Now I have to re-quote the full
code:

+    if ( pi_desc->nv != posted_intr_vector )
+        write_atomic(&pi_desc->nv, posted_intr_vector);
+
+    /* the vCPU is not on any blocking list. */
+    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
+    if ( pi_block_cpu == NR_CPUS )
+        return;

These are the two if()-s the question is about.

>> I.e. could the latter perhaps become an
>> ASSERT() by connecting the two blocks suitably? I'm simply
>> trying to limit the number of different cases to consider mentally
>> when looking at this code ...
> 
> If we get a true from above check, it means the vcpu was removed by
> pi_wakeup_interrupt(), then we just need to return. If we get a false,
> we will acquire the spinlock as the following code does, there are two
> scenarios:
> - pi_wakeup_interrupt() acquires the spinlock before us, then when
> we get the lock, 'v->arch.hvm_vmx.pi_block_cpu' has already been
> set to NR_CPUS by pi_wakeup_interrupt(), we need do nothing.
> - We get the spinlock before pi_wakeup_interrupt(), this time we need
> to remove the vCPU from the list and set 'v->arch.hvm_vmx.pi_block_cpu'
> to NR_CPUS.

This is all only about the second condition, but the question really is
whether one being true or false may imply the result of other. In
which case this would better be ASSERT()ed than handled by two
conditionals.

>> > @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>> >
>> >      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>> >
>> > +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> > +
>> > +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
>> > +
>> >      v->arch.schedule_tail    = vmx_do_resume;
>> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>> >
>> > +    if ( iommu_intpost )
>> > +        v->arch.vcpu_block = vmx_vcpu_block;
>> 
>> Considering the locking question above - couldn't this be done only
>> when a device gets added, and the pointer zapped upon removal
>> of the last device, at once saving the conditional inside the hook?
> 
> That is another solution, but I think it make simple thing difficult,
> I feel the current logic is simple and clear. Btw, this hook is not
> a critical path, another conditional in it should not a big deal. 

Then you didn't understand: The question isn't this path, but the
path where the hook gets called if non-NULL (and hence the
possibility to avoid such needless calls).

>> > --- a/xen/common/schedule.c
>> > +++ b/xen/common/schedule.c
>> > @@ -802,6 +802,8 @@ void vcpu_block(void)
>> >
>> >      set_bit(_VPF_blocked, &v->pause_flags);
>> >
>> > +    arch_vcpu_block(v);
>> > +
>> >      /* Check for events /after/ blocking: avoids wakeup waiting race. */
>> >      if ( local_events_need_delivery() )
>> >      {
>> 
>> ISTR some previous discussion on this placement - wasn't it
>> determined that putting it in the else part here would be
>> sufficient for your needs and better in terms of overhead?
>> Or is there some race possible if moved past the invocation
>> of local_events_need_delivery()?
> 
> The discussion happened several month ago, it is really hard to recall
> every details, I have to look for it from my index. It is really bad to
> last for so long for the review.

I understand the frustration on your end, but I can only justify
it by the frustration at my end resulting from everyone dumping
out their patch sets with only very few people helping to deal
with that flood. Plus, after previous iterations, I expected the
review to be more involved than it turned out in the end (i.e.
you've done a good job in simplifying things after the feedback
you had received - thanks for that).

> We need to call arch_vcpu_block() before local_events_need_delivery(),
> since VT-d hardware can issue notification event when we are in
> arch_vcpu_block(), it that happens, 'ON' bit is set, then we will check
> it in local_events_need_delivery(). If we do arch_vcpu_block() in the
> else part, we cannot handle this. This is one reason I can recall, I am
> not sure whether there are other concerns since it has been really
> a long time. The current implementation is fully discussed with scheduler
> maintainers.

Okay, this all makes sense. It's just that I don't see how the ON
bit would get checked by local_events_need_delivery(). But that
may in turn be because, with the long turnaround, I've lost some
of the details of the rest of this series.

>> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> 
>> > +    unsigned int         pi_block_cpu;
>> 
>> Looking at all the use sites I wonder whether it wouldn't make for
>> easier to read code if you instead stored the lock to acquire. This
>> would then perhaps also prepare a road for balancing lists growing
>> too large (i.e. rather than having one list per CPU, you could
>> then have a variable number of lists, depending on current needs,
>> and [kind of] freely select one of them).
> 
> Since this patch is still experimental, can we put the "length of the list"
> issue in the next stage when we turn it to default on?

I in no way meant to say that the list length issue needs to be
dealt with right away. All I was saying was that storing a lock
pointer here instead of a CPU number may make that easier (and
might therefore be a good idea to do right away).

Jan


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-20  8:35       ` Jan Beulich
@ 2016-01-20 11:12         ` Dario Faggioli
  2016-01-20 11:18           ` Wu, Feng
  2016-01-20 11:20         ` Wu, Feng
  1 sibling, 1 reply; 47+ messages in thread
From: Dario Faggioli @ 2016-01-20 11:12 UTC (permalink / raw)
  To: Jan Beulich, Feng Wu
  Cc: George Dunlap, Andrew Cooper, Kevin Tian, Keir Fraser, xen-devel



On Wed, 2016-01-20 at 01:35 -0700, Jan Beulich wrote:
> > > > On 20.01.16 at 08:49, <feng.wu@intel.com> wrote:
> > > 
> > We need to call arch_vcpu_block() before
> > local_events_need_delivery(),
> > since VT-d hardware can issue notification event when we are in
> > arch_vcpu_block(), it that happens, 'ON' bit is set, then we will
> > check
> > it in local_events_need_delivery(). If we do arch_vcpu_block() in
> > the
> > else part, we cannot handle this. This is one reason I can recall,
> > I am
> > not sure whether there are other concerns since it has been really
> > a long time. The current implementation is fully discussed with
> > scheduler
> > maintainers.
> 
> Okay, this all makes sense. 
>
Yes, I proposed / asked about that here:
http://bugs.xenproject.org/xen/mid/%3C1445948204.2937.130.camel@citrix.com%3E

As Feng says, the whole idea is doing one last check before actually
blocking, to see whether we can avoid doing so, if an event has arrived
in the meantime.

It can be seen as an optimization (whether a really effective one, I
can't tell). In fact, an event can well come just immediately after
local_events_need_delivery() has returned false!

We somehow do the same thing already in vcpu_block(), and consistency
with that was what convinced me that we can live with Feng's code as it
is in this patch.

> It's just that I don't see how the ON
> bit would get checked by local_events_need_delivery(). But that
> may in turn be because, with the long turnaround, I've lost some
> of the details of the rest of this series.
> 
I think what Feng means is "if that happens [== that an event arrives],
'ON' bit is set, then we will check it [== whether an event has
arrived] in local_events_need_delivery()."

If *no* event has arrived, no need to do anything particular, we can
just block. If an event *has* arrived in time for
local_events_need_delivery() to see it, we avoid blocking. We do need
to rollback what arch_vcpu_block() did, and that is done while leaving
the hypervisor, as soon as we realize we haven't blocked.

If we put the hook in 'else', we'd always go down the full blocking
path, even if an event became pending while arranging for that, and
we'll have to be woken again, as soon as the event is handled. How
effective an optimization this is depends on how probable and frequent
it is that we get interrupts during the execution of the hook, which,
as said, it's not easy to either just tell or measure.
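(A toy model of the ordering being described — arm the wakeup path first, then do one last event check, so an event arriving during the hook cancels the block. All names are illustrative:)

```c
#include <assert.h>
#include <stdbool.h>

struct vcpu_model {
    bool blocked;
    bool pi_on;     /* models the PI descriptor's 'ON' bit */
    bool armed;     /* models NV having been switched to pi_wakeup_vector */
};

static void vcpu_block_model(struct vcpu_model *v, bool event_during_hook)
{
    v->blocked = true;              /* set_bit(_VPF_blocked, ...) */
    v->armed = true;                /* arch_vcpu_block(v) */
    if ( event_during_hook )
        v->pi_on = true;            /* notification racing with blocking */
    if ( v->pi_on )                 /* local_events_need_delivery() */
        v->blocked = false;         /* avoid sleeping on a pending event */
}
```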

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-20 11:12         ` Dario Faggioli
@ 2016-01-20 11:18           ` Wu, Feng
  0 siblings, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2016-01-20 11:18 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Wednesday, January 20, 2016 7:13 PM
> To: Jan Beulich <JBeulich@suse.com>; Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@eu.citrix.com>; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; Keir Fraser <keir@xen.org>
> Subject: Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> On Wed, 2016-01-20 at 01:35 -0700, Jan Beulich wrote:
> > > > > On 20.01.16 at 08:49, <feng.wu@intel.com> wrote:
> > > >
> 
> If we put the hook in 'else', we'd always go down the full blocking
> path, even if an event became pending while arranging for that, and
> we'll have to be woken again, as soon as the event is handled. How
> effective an optimization this is depends on how probable and frequent
> it is that we get interrupts during the execution of the hook, which,
> as said, it's not easy to either just tell or measure.

Thanks a lot for your explanation, Dario.

Thanks,
Feng


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-20  8:35       ` Jan Beulich
  2016-01-20 11:12         ` Dario Faggioli
@ 2016-01-20 11:20         ` Wu, Feng
  2016-01-20 11:35           ` Jan Beulich
  1 sibling, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2016-01-20 11:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, January 20, 2016 4:35 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Dario Faggioli
> <dario.faggioli@citrix.com>; George Dunlap <george.dunlap@eu.citrix.com>;
> Tian, Kevin <kevin.tian@intel.com>; xen-devel@lists.xen.org; Keir Fraser
> <keir@xen.org>
> Subject: RE: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> >>> On 20.01.16 at 08:49, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Monday, January 18, 2016 11:14 PM
> >> >>> On 03.12.15 at 09:35, <feng.wu@intel.com> wrote:
> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int
> >> msr, uint64_t msr_content);
> >> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> >> > +
> >> > +    /*
> >> > +     * The vCPU is blocking, we need to add it to one of the per pCPU
> lists.
> >> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use
> it
> >> for
> >> > +     * the per-CPU list, we also save it to posted-interrupt descriptor
> and
> >> > +     * make it as the destination of the wake-up notification event.
> >> > +     */
> >> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> >> > +
> >> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> >> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> >> > +                  &per_cpu(pi_blocked_vcpu, v-
> >arch.hvm_vmx.pi_block_cpu));
> >> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> >> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> >> > +
> >> > +    ASSERT(!pi_test_sn(pi_desc));
> >> > +
> >> > +    /*
> >> > +     * We don't need to set the 'NDST' field, since it should point to
> >> > +     * the same pCPU as v->processor.
> >> > +     */
> >> > +
> >> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> >>
> >> Stray blank line between comment and corresponding code.
> >>
> >> Also the ASSERT() is rather more disconnected from the write
> >> than seems desirable: Wouldn't it be better to cmpxchg() the
> >> whole 32-bit value, validating that SN is clear in the result?
> >
> > Not quite understand this. The control field is 64 bits, do you
> > mean cmpxchg() the whole 64-bit value like follows:
> >
> > +        do {
> > +            old.control = new.control = pi_desc->control;
> > +            new.nv = pi_wakeup_vector;
> > +        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
> > +                  != old.control );
> >
> > This a somewhat like the implementation in the earlier versions,
> > why do you want to change it back?
> 
> Did you read what I've said? I'm worried about the disconnect of
> assertion and update: You really care about SN being clear
> _after_ the update aiui. 

No, the ASSERT has no connection with the update. The SN bit should
be clear before updating pi_desc->nv; adding the ASSERT here is just
a kind of protective check.
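If I read the 32-bit suggestion correctly, it would look roughly like the user-space sketch below: update NV with a CMPXCHG loop and check SN in the value the exchange observes. The bit layout (NV in the low byte, SN at bit 9) is illustrative only, not the exact VT-d descriptor, and the names are made up for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SN_MASK (1u << 9)   /* illustrative SN position */

/* Low 32-bit word of a PI-descriptor-like structure (sketch only). */
struct pi_low {
    uint32_t word;          /* bits 0-7: NV, bit 9: SN (illustrative) */
};

/*
 * Swap in the wakeup vector with a 32-bit compare-exchange, asserting
 * that SN is clear in the value the exchange actually operated on, so
 * the check and the update cover the same snapshot.
 */
static void set_wakeup_vector(struct pi_low *p, uint8_t vec)
{
    uint32_t oldv, newv;

    do {
        oldv = __atomic_load_n(&p->word, __ATOMIC_SEQ_CST);
        assert(!(oldv & SN_MASK));          /* SN must be clear */
        newv = (oldv & ~0xffu) | vec;       /* replace NV only */
    } while ( !__atomic_compare_exchange_n(&p->word, &oldv, newv, 0,
                                           __ATOMIC_SEQ_CST,
                                           __ATOMIC_SEQ_CST) );
}
```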

> And since NDST doesn't change, a 32-bit
> CMPXCHG would seem to suffice.
> 
> >> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> >> > +    if ( pi_block_cpu == NR_CPUS )
> >> > +        return;
> >>
> >> Are this condition and the one in the immediately preceding if()
> >> connected in any way?
> >
> > I am not sure I understand this correctly. Which one is
> > " the one in the immediately preceding if()"?
> 
> If you hadn't ripped out too much of the context, it would be a
> matter of pointing you back up. Now I have to re-quote the full
> code:
> 
> +    if ( pi_desc->nv != posted_intr_vector )
> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> +
> +    /* the vCPU is not on any blocking list. */
> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> +    if ( pi_block_cpu == NR_CPUS )
> +        return;
> 
> These are the two if()-s the question is about.
> 
> >> I.e. could the latter perhaps become an
> >> ASSERT() by connecting the two blocks suitably? I'm simply
> >> trying to limit the number of different cases to consider mentally
> >> when looking at this code ...
> >
> > If we get a true from above check, it means the vcpu was removed by
> > pi_wakeup_interrupt(), then we just need to return. If we get a false,
> > we will acquire the spinlock as the following code does, there are two
> > scenarios:
> > - pi_wakeup_interrupt() acquires the spinlock before us, then when
> > we get the lock, 'v->arch.hvm_vmx.pi_block_cpu' has already been
> > set to NR_CPUS by pi_wakeup_interrupt(), we need do nothing.
> > - We get the spinlock before pi_wakeup_interrupt(), this time we need
> > to remove the vCPU from the list and set 'v->arch.hvm_vmx.pi_block_cpu'
> > to NR_CPUS.
> 
> This is all only about the second condition, but the question really is
> whether one being true or false may imply the result of other. In
> which case this would better be ASSERT()ed than handled by two
> conditionals.

Thanks for the explanation. I don't think there is any connection
between " if ( pi_desc->nv != posted_intr_vector )" and
" if ( pi_block_cpu == NR_CPUS )".

> 
> >> > @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >> >
> >> >      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
> >> >
> >> > +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >> > +
> >> > +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
> >> > +
> >> >      v->arch.schedule_tail    = vmx_do_resume;
> >> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
> >> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
> >> >
> >> > +    if ( iommu_intpost )
> >> > +        v->arch.vcpu_block = vmx_vcpu_block;
> >>
> >> Considering the locking question above - couldn't this be done only
> >> when a device gets added, and the pointer zapped upon removal
> >> of the last device, at once saving the conditional inside the hook?
> >
> > That is another solution, but I think it make simple thing difficult,
> > I feel the current logic is simple and clear. Btw, this hook is not
> > a critical path, another conditional in it should not a big deal.
> 
> Then you didn't understand: The question isn't this path, but the
> path where the hook gets called if non-NULL (and hence the
> possibility to avoid such needless calls).

I understand you mean the overhead incurred when the hook
is called. My point is that the hook is not called on a critical path, so
I doubt whether it is worth making the logic more complex.
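For reference, the alternative Jan describes (install the hook only while the domain has assigned devices, zap it when the last device goes away) can be modelled with a small sketch. All names here are hypothetical stand-ins, not Xen's actual API; the point is just that the scheduler's hot path then pays only a NULL test.

```c
#include <assert.h>
#include <stddef.h>

struct vcpu;
typedef void (*block_hook_t)(struct vcpu *);

struct vcpu {
    block_hook_t vcpu_block;   /* NULL when no device is assigned */
    int blocked_calls;         /* instrumentation for this sketch */
};

static void vmx_vcpu_block(struct vcpu *v) { v->blocked_calls++; }

/* Install the hook when the first device is assigned... */
static void device_assigned(struct vcpu *v, int *ndevs)
{
    if ( (*ndevs)++ == 0 )
        v->vcpu_block = vmx_vcpu_block;
}

/* ...and zap it when the last device is removed. */
static void device_removed(struct vcpu *v, int *ndevs)
{
    if ( --(*ndevs) == 0 )
        v->vcpu_block = NULL;
}

/* What the scheduler-side caller would do: a single NULL test. */
static void arch_vcpu_block(struct vcpu *v)
{
    if ( v->vcpu_block )
        v->vcpu_block(v);
}
```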

> 
> >> > --- a/xen/common/schedule.c
> >> > +++ b/xen/common/schedule.c
> >> > @@ -802,6 +802,8 @@ void vcpu_block(void)
> >> >
> >> >      set_bit(_VPF_blocked, &v->pause_flags);
> >> >
> >> > +    arch_vcpu_block(v);
> >> > +
> >> >      /* Check for events /after/ blocking: avoids wakeup waiting race. */
> >> >      if ( local_events_need_delivery() )
> >> >      {
> >>
> >> ISTR some previous discussion on this placement - wasn't it
> >> determined that putting it in the else part here would be
> >> sufficient for your needs and better in terms of overhead?
> >> Or is there some race possible if moved past the invocation
> >> of local_events_need_delivery()?
> >
> > The discussion happened several month ago, it is really hard to recall
> > every details, I have to look for it from my index. It is really bad to
> > last for so long for the review.
> 
> I understand the frustration on your end, but I can only justify
> it by the frustration at my end resulting from everyone dumping
> out their patch sets with only very few people helping to deal
> with that flood. Plus, after previous iterations, I expected the
> review to be more involved than it turned out in the end (i.e.
> you've done a good job in simplifying things after the feedback
> you had received - thanks for that).
> 
> > We need to call arch_vcpu_block() before local_events_need_delivery(),
> > since VT-d hardware can issue notification event when we are in
> > arch_vcpu_block(), it that happens, 'ON' bit is set, then we will check
> > it in local_events_need_delivery(). If we do arch_vcpu_block() in the
> > else part, we cannot handle this. This is one reason I can recall, I am
> > not sure whether there are other concerns since it has been really
> > a long time. The current implementation is fully discussed with scheduler
> > maintainers.
> 
> Okay, this all makes sense. It's just that I don't see how the ON
> bit would get checked by local_events_need_delivery(). But that
> may in turn be because, with the long turnaround, I've lost some
> of the details of the rest of this series.

Here is how ON is checked:
local_events_need_delivery() -->
	hvm_local_events_need_delivery() -->
		hvm_vcpu_has_pending_irq() -->
			vlapic_has_pending_irq() -->
				vlapic_find_highest_irr() -->
					vmx_sync_pir_to_irr() -->
						pi_test_and_clear_on()
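The ON-bit test at the end of that chain can be modelled with a small user-space sketch. The bit position and types below are illustrative, not the hardware descriptor layout or Xen's actual code; the essential point is that a set ON bit makes local_events_need_delivery() report a pending event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PI_ON_BIT 0   /* illustrative position of the ON bit */

struct pi_desc { uint64_t control; };

/* Atomically test and clear ON, returning its previous value,
 * mirroring the semantics of pi_test_and_clear_on(). */
static bool pi_test_and_clear_on(struct pi_desc *d)
{
    uint64_t mask = 1ull << PI_ON_BIT;

    return __atomic_fetch_and(&d->control, ~mask, __ATOMIC_SEQ_CST) & mask;
}

/* Stands in for the hvm_local_events_need_delivery() ->
 * vmx_sync_pir_to_irr() chain above: a set ON bit means a
 * notification arrived while the vCPU was about to block. */
static bool local_events_need_delivery(struct pi_desc *d)
{
    return pi_test_and_clear_on(d);
}
```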

> 
> >> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> >> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> >>
> >> > +    unsigned int         pi_block_cpu;
> >>
> >> Looking at all the use sites I wonder whether it wouldn't make for
> >> easier to read code if you instead stored the lock to acquire. This
> >> would then perhaps also prepare a road for balancing lists growing
> >> too large (i.e. rather than having one list per CPU, you could
> >> then have a variable number of lists, depending on current needs,
> >> and [kind of] freely select one of them).
> >
> > Since this patch is still experimental, can we put the "length of the list"
> > issue in the next stage when we turn it to default on?
> 
> I in no way meant to say that the list length issue needs to be
> dealt with right away. All I was saying was that storing a lock
> pointer here instead of a CPU number may make that easier (and
> might therefore be a good idea to do right away).

Do you mean storing a pointer to 'pi_blocked_vcpu_lock' here?
Do you feel strongly about this method? I will think about it a bit more
to see how much it affects the current implementation.
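For what it's worth, the lock-pointer variant could look roughly like the sketch below. The types and names are hypothetical stand-ins (in Xen these would be spinlock_t and list_head); a NULL lock pointer replaces the pi_block_cpu == NR_CPUS sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for this sketch. */
typedef struct { int dummy; } spinlock_t;
struct list_head { struct list_head *next, *prev; };

/* Per-vCPU PI blocking state storing the lock to acquire rather than
 * a CPU number; lock == NULL means "not on any blocking list". */
struct pi_blocking_vcpu {
    struct list_head list;
    spinlock_t      *lock;
};

static int vcpu_is_on_blocking_list(const struct pi_blocking_vcpu *b)
{
    return b->lock != NULL;
}
```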

Thanks,
Feng

> 
> Jan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-20 11:20         ` Wu, Feng
@ 2016-01-20 11:35           ` Jan Beulich
  2016-01-20 13:30             ` Dario Faggioli
  2016-01-20 13:48             ` Wu, Feng
  0 siblings, 2 replies; 47+ messages in thread
From: Jan Beulich @ 2016-01-20 11:35 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel

>>> On 20.01.16 at 12:20, <feng.wu@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, January 20, 2016 4:35 PM
>> >>> On 20.01.16 at 08:49, <feng.wu@intel.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: Monday, January 18, 2016 11:14 PM
>> >> >>> On 03.12.15 at 09:35, <feng.wu@intel.com> wrote:
>> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> >> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned int
>> >> msr, uint64_t msr_content);
>> >> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
>> >> > +
>> >> > +    /*
>> >> > +     * The vCPU is blocking, we need to add it to one of the per pCPU
>> lists.
>> >> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and use
>> it
>> >> for
>> >> > +     * the per-CPU list, we also save it to posted-interrupt descriptor
>> and
>> >> > +     * make it as the destination of the wake-up notification event.
>> >> > +     */
>> >> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
>> >> > +
>> >> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> >> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
>> >> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
>> >> > +                  &per_cpu(pi_blocked_vcpu, v-
>> >arch.hvm_vmx.pi_block_cpu));
>> >> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
>> >> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
>> >> > +
>> >> > +    ASSERT(!pi_test_sn(pi_desc));
>> >> > +
>> >> > +    /*
>> >> > +     * We don't need to set the 'NDST' field, since it should point to
>> >> > +     * the same pCPU as v->processor.
>> >> > +     */
>> >> > +
>> >> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
>> >>
>> >> Stray blank line between comment and corresponding code.
>> >>
>> >> Also the ASSERT() is rather more disconnected from the write
>> >> than seems desirable: Wouldn't it be better to cmpxchg() the
>> >> whole 32-bit value, validating that SN is clear in the result?
>> >
>> > Not quite understand this. The control field is 64 bits, do you
>> > mean cmpxchg() the whole 64-bit value like follows:
>> >
>> > +        do {
>> > +            old.control = new.control = pi_desc->control;
>> > +            new.nv = pi_wakeup_vector;
>> > +        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
>> > +                  != old.control );
>> >
>> > This a somewhat like the implementation in the earlier versions,
>> > why do you want to change it back?
>> 
>> Did you read what I've said? I'm worried about the disconnect of
>> assertion and update: You really care about SN being clear
>> _after_ the update aiui. 
> 
> No, the ASSERT has no connection with the update. the SN bit should
> be clear before updating pi_desc->nv, adding ASSERT here is just kind
> of protection code.

And SN can't get set behind your back between the ASSERT() and
the update? If the answer is "yes", then the code is indeed fine as is.

>> >> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
>> >> > +    if ( pi_block_cpu == NR_CPUS )
>> >> > +        return;
>> >>
>> >> Are this condition and the one in the immediately preceding if()
>> >> connected in any way?
>> >
>> > I am not sure I understand this correctly. Which one is
>> > " the one in the immediately preceding if()"?
>> 
>> If you hadn't ripped out too much of the context, it would be a
>> matter of pointing you back up. Now I have to re-quote the full
>> code:
>> 
>> +    if ( pi_desc->nv != posted_intr_vector )
>> +        write_atomic(&pi_desc->nv, posted_intr_vector);
>> +
>> +    /* the vCPU is not on any blocking list. */
>> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
>> +    if ( pi_block_cpu == NR_CPUS )
>> +        return;
>> 
>> These are the two if()-s the question is about.
>> 
>> >> I.e. could the latter perhaps become an
>> >> ASSERT() by connecting the two blocks suitably? I'm simply
>> >> trying to limit the number of different cases to consider mentally
>> >> when looking at this code ...
>> >
>> > If we get a true from above check, it means the vcpu was removed by
>> > pi_wakeup_interrupt(), then we just need to return. If we get a false,
>> > we will acquire the spinlock as the following code does, there are two
>> > scenarios:
>> > - pi_wakeup_interrupt() acquires the spinlock before us, then when
>> > we get the lock, 'v->arch.hvm_vmx.pi_block_cpu' has already been
>> > set to NR_CPUS by pi_wakeup_interrupt(), we need do nothing.
>> > - We get the spinlock before pi_wakeup_interrupt(), this time we need
>> > to remove the vCPU from the list and set 'v->arch.hvm_vmx.pi_block_cpu'
>> > to NR_CPUS.
>> 
>> This is all only about the second condition, but the question really is
>> whether one being true or false may imply the result of other. In
>> which case this would better be ASSERT()ed than handled by two
>> conditionals.
> 
> Thanks for the explanation. I don't think there are any connections
> Between " if ( pi_desc->nv != posted_intr_vector )" and
> " if ( pi_block_cpu == NR_CPUS )". 

Well, it would have seemed to me that some of the four possible
combinations can't validly occur, e.g. due to NV always having a
particular value when pi_block_cpu != NR_CPUS.

>> >> > @@ -106,10 +230,17 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>> >> >
>> >> >      spin_lock_init(&v->arch.hvm_vmx.vmcs_lock);
>> >> >
>> >> > +    INIT_LIST_HEAD(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
>> >> > +
>> >> > +    v->arch.hvm_vmx.pi_block_cpu = NR_CPUS;
>> >> > +
>> >> >      v->arch.schedule_tail    = vmx_do_resume;
>> >> >      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>> >> >      v->arch.ctxt_switch_to   = vmx_ctxt_switch_to;
>> >> >
>> >> > +    if ( iommu_intpost )
>> >> > +        v->arch.vcpu_block = vmx_vcpu_block;
>> >>
>> >> Considering the locking question above - couldn't this be done only
>> >> when a device gets added, and the pointer zapped upon removal
>> >> of the last device, at once saving the conditional inside the hook?
>> >
>> > That is another solution, but I think it make simple thing difficult,
>> > I feel the current logic is simple and clear. Btw, this hook is not
>> > a critical path, another conditional in it should not a big deal.
>> 
>> Then you didn't understand: The question isn't this path, but the
>> path where the hook gets called if non-NULL (and hence the
>> possibility to avoid such needless calls).
> 
> I understand you mean the overhead happens when the hooks
> is called. My point is the hook is not called in a critical path, so I doubt
> whether it worth doing so to make the logic complex.

Are you sure scheduling code is not a critical path?

>> >> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
>> >> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
>> >>
>> >> > +    unsigned int         pi_block_cpu;
>> >>
>> >> Looking at all the use sites I wonder whether it wouldn't make for
>> >> easier to read code if you instead stored the lock to acquire. This
>> >> would then perhaps also prepare a road for balancing lists growing
>> >> too large (i.e. rather than having one list per CPU, you could
>> >> then have a variable number of lists, depending on current needs,
>> >> and [kind of] freely select one of them).
>> >
>> > Since this patch is still experimental, can we put the "length of the list"
>> > issue in the next stage when we turn it to default on?
>> 
>> I in no way meant to say that the list length issue needs to be
>> dealt with right away. All I was saying was that storing a lock
>> pointer here instead of a CPU number may make that easier (and
>> might therefore be a good idea to do right away).
> 
> Do you mean storing a pointer to ' pi_blocked_vcpu_lock ' here?

Whatever you name it, but yes.

Jan


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-20 11:35           ` Jan Beulich
@ 2016-01-20 13:30             ` Dario Faggioli
  2016-01-20 13:42               ` Wu, Feng
  2016-01-25  5:26               ` Wu, Feng
  2016-01-20 13:48             ` Wu, Feng
  1 sibling, 2 replies; 47+ messages in thread
From: Dario Faggioli @ 2016-01-20 13:30 UTC (permalink / raw)
  To: Jan Beulich, Feng Wu
  Cc: George Dunlap, Andrew Cooper, Kevin Tian, Keir Fraser, xen-devel



On Wed, 2016-01-20 at 04:35 -0700, Jan Beulich wrote:
> > > > On 20.01.16 at 12:20, <feng.wu@intel.com> wrote:
> > > 
> > > Then you didn't understand: The question isn't this path, but the
> > > path where the hook gets called if non-NULL (and hence the
> > > possibility to avoid such needless calls).
> > 
> > I understand you mean the overhead happens when the hooks
> > is called. My point is the hook is not called in a critical path,
> > so I doubt
> > whether it worth doing so to make the logic complex.
> 
> Are you sure scheduling code is not a critical path?
> 
TBH, I like Jan's point... It's always good to do all we can to avoid
calling the hook when it is unnecessary.

Does it really complicate things a lot? Feng, can you give it a try?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-20 13:30             ` Dario Faggioli
@ 2016-01-20 13:42               ` Wu, Feng
  2016-01-25  5:26               ` Wu, Feng
  1 sibling, 0 replies; 47+ messages in thread
From: Wu, Feng @ 2016-01-20 13:42 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Wednesday, January 20, 2016 9:31 PM
> To: Jan Beulich <JBeulich@suse.com>; Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@eu.citrix.com>; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; Keir Fraser <keir@xen.org>
> Subject: Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> On Wed, 2016-01-20 at 04:35 -0700, Jan Beulich wrote:
> > > > > On 20.01.16 at 12:20, <feng.wu@intel.com> wrote:
> > > >
> > > > Then you didn't understand: The question isn't this path, but the
> > > > path where the hook gets called if non-NULL (and hence the
> > > > possibility to avoid such needless calls).
> > >
> > > I understand you mean the overhead happens when the hooks
> > > is called. My point is the hook is not called in a critical path,
> > > so I doubt
> > > whether it worth doing so to make the logic complex.
> >
> > Are you sure scheduling code is not a critical path?
> >
> TBH, I like Jan's point... It's always good to make all we can to avoid
> calling the hook, if unnecessary.
> 
> Does it really complicates things a lot? Feng, can you give it a try?

Sure, I will try it and discuss with you guys if I meet any issues.

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-20 11:35           ` Jan Beulich
  2016-01-20 13:30             ` Dario Faggioli
@ 2016-01-20 13:48             ` Wu, Feng
  2016-01-20 14:03               ` Jan Beulich
  1 sibling, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2016-01-20 13:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Wu, Feng



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, January 20, 2016 7:36 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Dario Faggioli
> <dario.faggioli@citrix.com>; George Dunlap <george.dunlap@eu.citrix.com>;
> Tian, Kevin <kevin.tian@intel.com>; xen-devel@lists.xen.org; Keir Fraser
> <keir@xen.org>
> Subject: RE: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> >>> On 20.01.16 at 12:20, <feng.wu@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Wednesday, January 20, 2016 4:35 PM
> >> >>> On 20.01.16 at 08:49, <feng.wu@intel.com> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >> Sent: Monday, January 18, 2016 11:14 PM
> >> >> >>> On 03.12.15 at 09:35, <feng.wu@intel.com> wrote:
> >> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> >> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned
> int
> >> >> msr, uint64_t msr_content);
> >> >> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
> >> >> > +
> >> >> > +    /*
> >> >> > +     * The vCPU is blocking, we need to add it to one of the per pCPU
> >> lists.
> >> >> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and
> use
> >> it
> >> >> for
> >> >> > +     * the per-CPU list, we also save it to posted-interrupt descriptor
> >> and
> >> >> > +     * make it as the destination of the wake-up notification event.
> >> >> > +     */
> >> >> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
> >> >> > +
> >> >> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >> >> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
> >> >> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
> >> >> > +                  &per_cpu(pi_blocked_vcpu, v-
> >> >arch.hvm_vmx.pi_block_cpu));
> >> >> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> >> >> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
> >> >> > +
> >> >> > +    ASSERT(!pi_test_sn(pi_desc));
> >> >> > +
> >> >> > +    /*
> >> >> > +     * We don't need to set the 'NDST' field, since it should point to
> >> >> > +     * the same pCPU as v->processor.
> >> >> > +     */
> >> >> > +
> >> >> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
> >> >>
> >> >> Stray blank line between comment and corresponding code.
> >> >>
> >> >> Also the ASSERT() is rather more disconnected from the write
> >> >> than seems desirable: Wouldn't it be better to cmpxchg() the
> >> >> whole 32-bit value, validating that SN is clear in the result?
> >> >
> >> > Not quite understand this. The control field is 64 bits, do you
> >> > mean cmpxchg() the whole 64-bit value like follows:
> >> >
> >> > +        do {
> >> > +            old.control = new.control = pi_desc->control;
> >> > +            new.nv = pi_wakeup_vector;
> >> > +        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
> >> > +                  != old.control );
> >> >
> >> > This a somewhat like the implementation in the earlier versions,
> >> > why do you want to change it back?
> >>
> >> Did you read what I've said? I'm worried about the disconnect of
> >> assertion and update: You really care about SN being clear
> >> _after_ the update aiui.
> >
> > No, the ASSERT has no connection with the update. the SN bit should
> > be clear before updating pi_desc->nv, adding ASSERT here is just kind
> > of protection code.
> 
> And SN can't get set behind your back between the ASSERT() and
> the update?

Yes, it can't.

> If the answer is "yes", then the code is indeed fine as is.
> 
> >> >> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> >> >> > +    if ( pi_block_cpu == NR_CPUS )
> >> >> > +        return;
> >> >>
> >> >> Are this condition and the one in the immediately preceding if()
> >> >> connected in any way?
> >> >
> >> > I am not sure I understand this correctly. Which one is
> >> > " the one in the immediately preceding if()"?
> >>
> >> If you hadn't ripped out too much of the context, it would be a
> >> matter of pointing you back up. Now I have to re-quote the full
> >> code:
> >>
> >> +    if ( pi_desc->nv != posted_intr_vector )
> >> +        write_atomic(&pi_desc->nv, posted_intr_vector);
> >> +
> >> +    /* the vCPU is not on any blocking list. */
> >> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> >> +    if ( pi_block_cpu == NR_CPUS )
> >> +        return;
> >>
> >> These are the two if()-s the question is about.
> >>
> >> >> I.e. could the latter perhaps become an
> >> >> ASSERT() by connecting the two blocks suitably? I'm simply
> >> >> trying to limit the number of different cases to consider mentally
> >> >> when looking at this code ...
> >> >
> >> > If we get a true from above check, it means the vcpu was removed by
> >> > pi_wakeup_interrupt(), then we just need to return. If we get a false,
> >> > we will acquire the spinlock as the following code does, there are two
> >> > scenarios:
> >> > - pi_wakeup_interrupt() acquires the spinlock before us, then when
> >> > we get the lock, 'v->arch.hvm_vmx.pi_block_cpu' has already been
> >> > set to NR_CPUS by pi_wakeup_interrupt(), we need do nothing.
> >> > - We get the spinlock before pi_wakeup_interrupt(), this time we need
> >> > to remove the vCPU from the list and set 'v-
> >arch.hvm_vmx.pi_block_cpu'
> >> > to NR_CPUS.
> >>
> >> This is all only about the second condition, but the question really is
> >> whether one being true or false may imply the result of other. In
> >> which case this would better be ASSERT()ed than handled by two
> >> conditionals.
> >
> > Thanks for the explanation. I don't think there are any connections
> > Between " if ( pi_desc->nv != posted_intr_vector )" and
> > " if ( pi_block_cpu == NR_CPUS )".
> 
> Well, it would have seemed to me that some of the four possible
> combinations can't validly occur, e.g. due to NV always having a
> particular value when pi_block_cpu != NR_CPUS.

If pi_desc->nv != posted_intr_vector, we can have pi_block_cpu == NR_CPUS
or pi_block_cpu != NR_CPUS.

If pi_desc->nv == posted_intr_vector, I think pi_block_cpu == NR_CPUS.

Based on the above facts, how do you think I should add the ASSERT()?
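The invariant stated above (NV holding the normal vector implies the vCPU is on no blocking list, while the converse does not hold) can be encoded as a small check. Values and names below are made up for illustration, not the real vector numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only. */
enum { POSTED_INTR_VECTOR = 0xf2, PI_WAKEUP_VECTOR = 0xe8, NR_CPUS = 256 };

/* nv == posted_intr_vector  =>  pi_block_cpu == NR_CPUS;
 * when nv differs, either value of pi_block_cpu is possible. */
static bool pi_state_valid(unsigned int nv, unsigned int pi_block_cpu)
{
    return nv != POSTED_INTR_VECTOR || pi_block_cpu == NR_CPUS;
}
```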

Thanks,
Feng


* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-20 13:48             ` Wu, Feng
@ 2016-01-20 14:03               ` Jan Beulich
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2016-01-20 14:03 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel

>>> On 20.01.16 at 14:48, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, January 20, 2016 7:36 PM
>> To: Wu, Feng <feng.wu@intel.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Dario Faggioli
>> <dario.faggioli@citrix.com>; George Dunlap <george.dunlap@eu.citrix.com>;
>> Tian, Kevin <kevin.tian@intel.com>; xen-devel@lists.xen.org; Keir Fraser
>> <keir@xen.org>
>> Subject: RE: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
>> 
>> >>> On 20.01.16 at 12:20, <feng.wu@intel.com> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> Sent: Wednesday, January 20, 2016 4:35 PM
>> >> >>> On 20.01.16 at 08:49, <feng.wu@intel.com> wrote:
>> >> >> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >> >> Sent: Monday, January 18, 2016 11:14 PM
>> >> >> >>> On 03.12.15 at 09:35, <feng.wu@intel.com> wrote:
>> >> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> >> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> >> >> > @@ -83,7 +83,131 @@ static int vmx_msr_write_intercept(unsigned
>> int
>> >> >> msr, uint64_t msr_content);
>> >> >> > +    ASSERT(v->arch.hvm_vmx.pi_block_cpu == NR_CPUS);
>> >> >> > +
>> >> >> > +    /*
>> >> >> > +     * The vCPU is blocking, we need to add it to one of the per pCPU
>> >> lists.
>> >> >> > +     * We save v->processor to v->arch.hvm_vmx.pi_block_cpu and
>> use
>> >> it
>> >> >> for
>> >> >> > +     * the per-CPU list, we also save it to posted-interrupt descriptor
>> >> and
>> >> >> > +     * make it as the destination of the wake-up notification event.
>> >> >> > +     */
>> >> >> > +    v->arch.hvm_vmx.pi_block_cpu = v->processor;
>> >> >> > +
>> >> >> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
>> >> >> > +                      v->arch.hvm_vmx.pi_block_cpu), flags);
>> >> >> > +    list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
>> >> >> > +                  &per_cpu(pi_blocked_vcpu, v-
>> >> >arch.hvm_vmx.pi_block_cpu));
>> >> >> > +    spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
>> >> >> > +                           v->arch.hvm_vmx.pi_block_cpu), flags);
>> >> >> > +
>> >> >> > +    ASSERT(!pi_test_sn(pi_desc));
>> >> >> > +
>> >> >> > +    /*
>> >> >> > +     * We don't need to set the 'NDST' field, since it should point to
>> >> >> > +     * the same pCPU as v->processor.
>> >> >> > +     */
>> >> >> > +
>> >> >> > +    write_atomic(&pi_desc->nv, pi_wakeup_vector);
>> >> >>
>> >> >> Stray blank line between comment and corresponding code.
>> >> >>
>> >> >> Also the ASSERT() is rather more disconnected from the write
>> >> >> than seems desirable: Wouldn't it be better to cmpxchg() the
>> >> >> whole 32-bit value, validating that SN is clear in the result?
>> >> >
>> >> > Not quite understand this. The control field is 64 bits, do you
>> >> > mean cmpxchg() the whole 64-bit value like follows:
>> >> >
>> >> > +        do {
>> >> > +            old.control = new.control = pi_desc->control;
>> >> > +            new.nv = pi_wakeup_vector;
>> >> > +        } while ( cmpxchg(&pi_desc->control, old.control, new.control)
>> >> > +                  != old.control );
>> >> >
>> >> > This a somewhat like the implementation in the earlier versions,
>> >> > why do you want to change it back?
>> >>
>> >> Did you read what I've said? I'm worried about the disconnect of
>> >> assertion and update: You really care about SN being clear
>> >> _after_ the update aiui.
>> >
>> > No, the ASSERT has no connection with the update. the SN bit should
>> > be clear before updating pi_desc->nv, adding ASSERT here is just kind
>> > of protection code.
>> 
>> And SN can't get set behind your back between the ASSERT() and
>> the update?
> 
> Yes.
> 
>> If the answer is "yes", then the code is indeed fine as is.
>> 
>> >> >> > +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
>> >> >> > +    if ( pi_block_cpu == NR_CPUS )
>> >> >> > +        return;
>> >> >>
>> >> >> Are this condition and the one in the immediately preceding if()
>> >> >> connected in any way?
>> >> >
>> >> > I am not sure I understand this correctly. Which one is
>> >> > " the one in the immediately preceding if()"?
>> >>
>> >> If you hadn't ripped out too much of the context, it would be a
>> >> matter of pointing you back up. Now I have to re-quote the full
>> >> code:
>> >>
>> >> +    if ( pi_desc->nv != posted_intr_vector )
>> >> +        write_atomic(&pi_desc->nv, posted_intr_vector);
>> >> +
>> >> +    /* the vCPU is not on any blocking list. */
>> >> +    pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
>> >> +    if ( pi_block_cpu == NR_CPUS )
>> >> +        return;
>> >>
>> >> These are the two if()-s the question is about.
>> >>
>> >> >> I.e. could the latter perhaps become an
>> >> >> ASSERT() by connecting the two blocks suitably? I'm simply
>> >> >> trying to limit the number of different cases to consider mentally
>> >> >> when looking at this code ...
>> >> >
>> >> > If we get true from the above check, it means the vCPU was removed
>> >> > by pi_wakeup_interrupt(), and we just need to return. If we get
>> >> > false, we will acquire the spinlock as the following code does;
>> >> > there are two scenarios:
>> >> > - pi_wakeup_interrupt() acquires the spinlock before us; then, when
>> >> > we get the lock, 'v->arch.hvm_vmx.pi_block_cpu' has already been
>> >> > set to NR_CPUS by pi_wakeup_interrupt(), so we need to do nothing.
>> >> > - We get the spinlock before pi_wakeup_interrupt(); this time we
>> >> > need to remove the vCPU from the list and set
>> >> > 'v->arch.hvm_vmx.pi_block_cpu' to NR_CPUS.
>> >>
>> >> This is all only about the second condition, but the question really is
>> >> whether one being true or false may imply the result of the other. In
>> >> which case this would better be ASSERT()ed than handled by two
>> >> conditionals.
>> >
>> > Thanks for the explanation. I don't think there is any connection
>> > between " if ( pi_desc->nv != posted_intr_vector )" and
>> > " if ( pi_block_cpu == NR_CPUS )".
>> 
>> Well, it would have seemed to me that some of the four possible
>> combinations can't validly occur, e.g. due to NV always having a
>> particular value when pi_block_cpu != NR_CPUS.
> 
> If pi_desc->nv != posted_intr_vector, we can have pi_block_cpu == NR_CPUS
> or pi_block_cpu != NR_CPUS
> 
> If pi_desc->nv == posted_intr_vector, I think pi_block_cpu == NR_CPUS
> 
> Based on the above facts, how do you think I should add the ASSERT()?

Due to the former case I then indeed think adding an ASSERT()
would harm readability. Thanks for being patient with me.

Jan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-18 15:14   ` Jan Beulich
  2016-01-20  7:49     ` Wu, Feng
@ 2016-01-21  9:05     ` Wu, Feng
  2016-01-21 10:34       ` Jan Beulich
  1 sibling, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2016-01-21  9:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Wu, Feng


> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
> flags);
> > +
> > +    /*
> > +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU
> was
> > +     * removed from the blocking list while we are acquiring the lock.
> > +     */
> > +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
> 
> With you wanting to deal with changes behind your back here,
> isn't there some kind of barrier needed between reading and using
> pi_block_cpu, such that the compiler won't convert the local
> variable accesses into multiple reads of
> v->arch.hvm_vmx.pi_block_cpu (which iiuc it is allowed to do)?
> 

Thinking about this more: it seems we call spin_lock_irqsave() before
using 'v->arch.hvm_vmx.pi_block_cpu', so can spin_lock_irqsave() be
used as the serializing instruction here? IIRC, spin lock operations
can be used for this purpose, right?

Thanks,
Feng

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-21  9:05     ` Wu, Feng
@ 2016-01-21 10:34       ` Jan Beulich
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2016-01-21 10:34 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel

>>> On 21.01.16 at 10:05, <feng.wu@intel.com> wrote:
>> > +    spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock, pi_block_cpu),
>> flags);
>> > +
>> > +    /*
>> > +     * v->arch.hvm_vmx.pi_block_cpu == NR_CPUS here means the vCPU
>> was
>> > +     * removed from the blocking list while we are acquiring the lock.
>> > +     */
>> > +    if ( v->arch.hvm_vmx.pi_block_cpu == NR_CPUS )
>> 
>> With you wanting to deal with changes behind your back here,
>> isn't there some kind of barrier needed between reading and using
>> pi_block_cpu, such that the compiler won't convert the local
>> variable accesses into multiple reads of
>> v->arch.hvm_vmx.pi_block_cpu (which iiuc it is allowed to do)?
> 
> Thinking about this more: it seems we call spin_lock_irqsave() before
> using 'v->arch.hvm_vmx.pi_block_cpu', so can spin_lock_irqsave() be
> used as the serializing instruction here? IIRC, spin lock operations
> can be used for this purpose, right?

The way I placed the comment may have been misleading, and I'm
sorry for that. The comment really is about the use of the local
variable in acquiring the lock (I may have been misled, by the local
variable and the field having the same name, into thinking that this
latter access also checked the local variable).

Jan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-20 13:30             ` Dario Faggioli
  2016-01-20 13:42               ` Wu, Feng
@ 2016-01-25  5:26               ` Wu, Feng
  2016-01-25 13:59                 ` Jan Beulich
  1 sibling, 1 reply; 47+ messages in thread
From: Wu, Feng @ 2016-01-25  5:26 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, George Dunlap, Andrew Cooper,
	xen-devel, Wu, Feng



> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Wednesday, January 20, 2016 9:31 PM
> To: Jan Beulich <JBeulich@suse.com>; Wu, Feng <feng.wu@intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@eu.citrix.com>; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; Keir Fraser <keir@xen.org>
> Subject: Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
> 
> On Wed, 2016-01-20 at 04:35 -0700, Jan Beulich wrote:
> > > > > On 20.01.16 at 12:20, <feng.wu@intel.com> wrote:
> > > >
> > > > Then you didn't understand: The question isn't this path, but the
> > > > path where the hook gets called if non-NULL (and hence the
> > > > possibility to avoid such needless calls).
> > >
> > > I understand you mean the overhead happens when the hook
> > > is called. My point is that the hook is not called in a critical
> > > path, so I doubt
> > > whether it is worth making the logic complex for this.
> >
> > Are you sure scheduling code is not a critical path?
> >
> TBH, I like Jan's point... It's always good to do all we can to avoid
> calling the hook, if unnecessary.
> 
> Does it really complicate things a lot? Feng, can you give it a try?

After thinking about this more, it seems to me that avoiding the checks
for assigned devices and the spinlocks in vmx_pi_switch_from() and
vmx_pi_switch_to() is more meaningful, since these two functions are
more directly involved in scheduling than arch_vcpu_block() is. Should
we also use the method Jan mentioned above to avoid the overhead in
these two functions? To achieve this we have some issues, since these
two functions are not hooks themselves (they are called from other
hooks). vmx_pi_do_resume() has the same problem as well.

Can we do it this way?
- Define the three functions above as hooks in 'v->arch.hvm_vmx'.
- Initialize them when the first device is assigned and zap them when
the last assigned device is de-assigned.

Jan and Dario, any ideas?

Thanks,
Feng

> 
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
  2016-01-25  5:26               ` Wu, Feng
@ 2016-01-25 13:59                 ` Jan Beulich
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2016-01-25 13:59 UTC (permalink / raw)
  To: Feng Wu
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel

>>> On 25.01.16 at 06:26, <feng.wu@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
>> Sent: Wednesday, January 20, 2016 9:31 PM
>> To: Jan Beulich <JBeulich@suse.com>; Wu, Feng <feng.wu@intel.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
>> <george.dunlap@eu.citrix.com>; Tian, Kevin <kevin.tian@intel.com>; xen-
>> devel@lists.xen.org; Keir Fraser <keir@xen.org>
>> Subject: Re: [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling
>> 
>> On Wed, 2016-01-20 at 04:35 -0700, Jan Beulich wrote:
>> > > > > On 20.01.16 at 12:20, <feng.wu@intel.com> wrote:
>> > > >
>> > > > Then you didn't understand: The question isn't this path, but the
>> > > > path where the hook gets called if non-NULL (and hence the
>> > > > possibility to avoid such needless calls).
>> > >
>> > > I understand you mean the overhead happens when the hook
>> > > is called. My point is that the hook is not called in a critical
>> > > path, so I doubt
>> > > whether it is worth making the logic complex for this.
>> >
>> > Are you sure scheduling code is not a critical path?
>> >
>> TBH, I like Jan's point... It's always good to do all we can to avoid
>> calling the hook, if unnecessary.
>> 
>> Does it really complicate things a lot? Feng, can you give it a try?
> 
> After thinking about this more, it seems to me that avoiding the checks
> for assigned devices and the spinlocks in vmx_pi_switch_from() and
> vmx_pi_switch_to() is more meaningful, since these two functions are
> more directly involved in scheduling than arch_vcpu_block() is. Should
> we also use the method Jan mentioned above to avoid the overhead in
> these two functions? To achieve this we have some issues, since these
> two functions are not hooks themselves (they are called from other
> hooks). vmx_pi_do_resume() has the same problem as well.
> 
> Can we do it this way?
> - Define the three functions above as hooks in 'v->arch.hvm_vmx'.
> - Initialize them when the first device is assigned and zap them when
> the last assigned device is de-assigned.
> 
> Jan and Dario, any ideas?

To be honest, questions of this sort irritate me: I simply can't
tell how neat or ugly the result would be without seeing the
result. Use your judgment in such cases, and simply be
prepared to answer questions as to why a specific choice was
made, or how much worse alternatives would have been.

Jan

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2016-01-25 13:59 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-03  8:35 [PATCH v10 0/7] Add VT-d Posted-Interrupts support Feng Wu
2015-12-03  8:35 ` [PATCH v10 1/7] VT-d Posted-intterrupt (PI) design Feng Wu
2015-12-03  8:35 ` [PATCH v10 2/7] vmx: Suppress posting interrupts when 'SN' is set Feng Wu
2015-12-03  8:35 ` [PATCH v10 3/7] vt-d: Add API to update IRTE when VT-d PI is used Feng Wu
2015-12-10 10:52   ` Tian, Kevin
2015-12-03  8:35 ` [PATCH v10 4/7] Update IRTE according to guest interrupt config changes Feng Wu
2015-12-10 10:53   ` Tian, Kevin
2015-12-03  8:35 ` [PATCH v10 5/7] vmx: Properly handle notification event when vCPU is running Feng Wu
2015-12-03  8:35 ` [PATCH v10 6/7] vmx: VT-d posted-interrupt core logic handling Feng Wu
2015-12-10 11:40   ` Tian, Kevin
2015-12-11  1:58     ` Wu, Feng
2015-12-11  2:27       ` Tian, Kevin
2015-12-11  3:12         ` Wu, Feng
2015-12-21  6:43   ` Wu, Feng
2015-12-21  9:03     ` Dario Faggioli
2015-12-21  9:26       ` Wu, Feng
2015-12-23  2:21   ` Dario Faggioli
2015-12-23  2:28     ` Wu, Feng
2015-12-23  5:16       ` Tian, Kevin
2015-12-23  5:18         ` Wu, Feng
2015-12-23  4:58     ` Wu, Feng
2015-12-23 10:09       ` Jan Beulich
2016-01-18  1:20         ` Wu, Feng
2016-01-18  8:40           ` Jan Beulich
2016-01-18  8:45             ` Wu, Feng
2016-01-18  9:03               ` Jan Beulich
2016-01-19  8:48                 ` Wu, Feng
2016-01-18 15:14   ` Jan Beulich
2016-01-20  7:49     ` Wu, Feng
2016-01-20  8:35       ` Jan Beulich
2016-01-20 11:12         ` Dario Faggioli
2016-01-20 11:18           ` Wu, Feng
2016-01-20 11:20         ` Wu, Feng
2016-01-20 11:35           ` Jan Beulich
2016-01-20 13:30             ` Dario Faggioli
2016-01-20 13:42               ` Wu, Feng
2016-01-25  5:26               ` Wu, Feng
2016-01-25 13:59                 ` Jan Beulich
2016-01-20 13:48             ` Wu, Feng
2016-01-20 14:03               ` Jan Beulich
2016-01-21  9:05     ` Wu, Feng
2016-01-21 10:34       ` Jan Beulich
2015-12-03  8:35 ` [PATCH v10 7/7] Add a command line parameter for VT-d posted-interrupts Feng Wu
2015-12-03 11:19 ` [PATCH v10 0/7] Add VT-d Posted-Interrupts support Jan Beulich
2015-12-03 13:54   ` Wu, Feng
2015-12-10 10:48 ` Tian, Kevin
2015-12-10 13:40   ` Wu, Feng
