linux-kernel.vger.kernel.org archive mirror
* [v3 00/26] Add VT-d Posted-Interrupts support
@ 2014-12-12 15:14 Feng Wu
  2014-12-12 15:14 ` [v3 01/26] genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU Feng Wu
                   ` (28 more replies)
  0 siblings, 29 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

You can find the VT-d Posted-Interrupts spec at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

v1->v2:
* Use the VFIO framework to enable this feature. The VFIO part of this series
  is based on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
* Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
  then revise some irq logic based on the new hierarchy irqdomain patches provided
  by Jiang Liu <jiang.liu@linux.intel.com>

v2->v3:
* Adjust the Posted-interrupts Descriptor updating logic when vCPU is
  preempted or blocked.
* KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
* __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
* Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
  can be used to change back to remapping mode.
* Fix typo

This patch series is made up of the following groups:
1-6: Preparatory changes in the iommu and irq components, based on the
     new hierarchy irqdomain logic.
7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as feature detection
          and a command line parameter.
10-17, 22-25: Changes related to KVM itself.
18-20: Changes in the VFIO component. This part was previously sent out as
"[RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d Posted-Interrupts"
21: x86 irq related changes

Feng Wu (26):
  genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a
    VCPU
  iommu: Add new member capability to struct irq_remap_ops
  iommu, x86: Define new irte structure for VT-d Posted-Interrupts
  iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
  iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  iommu, x86: Add intel_irq_remapping_capability() for Intel
  iommu, x86: define irq_remapping_cap()
  KVM: change struct pi_desc for VT-d Posted-Interrupts
  KVM: Add some helper functions for Posted-Interrupts
  KVM: Initialize VT-d Posted-Interrupts Descriptor
  KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
  KVM: add interfaces to control PI outside vmx
  KVM: Make struct kvm_irq_routing_table accessible
  KVM: make kvm_set_msi_irq() public
  KVM: kvm-vfio: User API for VT-d Posted-Interrupts
  KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
  KVM: x86: kvm-vfio: VT-d posted-interrupts setup
  x86, irq: Define a global vector for VT-d Posted-Interrupts
  KVM: Define a wakeup worker thread for vCPU
  KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  KVM: Suppress posted-interrupt when 'SN' is set
  iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

 Documentation/kernel-parameters.txt        |   1 +
 Documentation/virtual/kvm/devices/vfio.txt |   9 ++
 arch/x86/include/asm/entry_arch.h          |   2 +
 arch/x86/include/asm/hardirq.h             |   1 +
 arch/x86/include/asm/hw_irq.h              |   2 +
 arch/x86/include/asm/irq_remapping.h       |  11 ++
 arch/x86/include/asm/irq_vectors.h         |   1 +
 arch/x86/include/asm/kvm_host.h            |  12 ++
 arch/x86/kernel/apic/msi.c                 |   1 +
 arch/x86/kernel/entry_64.S                 |   2 +
 arch/x86/kernel/irq.c                      |  27 ++++
 arch/x86/kernel/irqinit.c                  |   2 +
 arch/x86/kvm/Makefile                      |   2 +-
 arch/x86/kvm/kvm_vfio_x86.c                |  77 +++++++++
 arch/x86/kvm/vmx.c                         | 244 ++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                         |  22 ++-
 drivers/iommu/intel_irq_remapping.c        |  68 +++++++-
 drivers/iommu/irq_remapping.c              |  24 ++-
 drivers/iommu/irq_remapping.h              |   8 +
 include/linux/dmar.h                       |  32 ++++
 include/linux/intel-iommu.h                |   1 +
 include/linux/irq.h                        |   7 +
 include/linux/kvm_host.h                   |  46 ++++++
 include/uapi/linux/kvm.h                   |  11 ++
 kernel/irq/chip.c                          |  14 ++
 kernel/irq/manage.c                        |  20 +++
 virt/kvm/irq_comm.c                        |  43 ++++-
 virt/kvm/irqchip.c                         |  11 --
 virt/kvm/kvm_main.c                        |  15 ++
 virt/kvm/vfio.c                            | 107 +++++++++++++
 30 files changed, 795 insertions(+), 28 deletions(-)
 create mode 100644 arch/x86/kvm/kvm_vfio_x86.c

-- 
1.9.1



* [v3 01/26] genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 02/26] iommu: Add new member capability to struct irq_remap_ops Feng Wu
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

With Posted-Interrupts support in Intel CPUs and IOMMUs, an external
interrupt from an assigned device can be delivered directly to a
virtual CPU in a virtual machine. Instead of hacking the KVM and Intel
IOMMU drivers, we propose a platform-independent interface to target
an interrupt at a specific virtual CPU in a virtual machine, that is,
to set the virtual CPU affinity for an interrupt.

By adopting this new interface and the hierarchy irqdomain, we could
easily support posted-interrupts on Intel platforms, and also provide
flexible enough interfaces for other platforms to support similar
features.

set_affinity() and set_vcpu_affinity() may also cooperate, in either
the IRQ core or the irq chip drivers.

Here is the usage scenario for this interface:
Guest updates its MSI/MSI-X interrupt configuration
	-->QEMU and KVM handle this
	-->KVM calls this interface (passing the posted-interrupts descriptor
	   and the guest vector)
	-->the irq core transfers control to the IOMMU
	-->the IOMMU does the real work of updating the IRTE (the IRTE has a
	   new format for VT-d Posted-Interrupts)

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 include/linux/irq.h |  4 ++++
 kernel/irq/chip.c   | 14 ++++++++++++++
 kernel/irq/manage.c | 20 ++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index f26e736..83abafc 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -324,6 +324,8 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
  *				irq_request_resources
  * @irq_compose_msi_msg:	optional to compose message content for MSI
  * @irq_write_msi_msg:	optional to write message content for MSI
+ * @irq_set_vcpu_affinity:	optional to target a virtual CPU in a virtual
+ *				machine
  * @flags:		chip specific flags
  */
 struct irq_chip {
@@ -362,6 +364,7 @@ struct irq_chip {
 
 	void		(*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);
 	void		(*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg);
+	int		(*irq_set_vcpu_affinity)(struct irq_data *data, void *vcpu_info);
 
 	unsigned long	flags;
 };
@@ -416,6 +419,7 @@ extern void irq_cpu_online(void);
 extern void irq_cpu_offline(void);
 extern int irq_set_affinity_locked(struct irq_data *data,
 				   const struct cpumask *cpumask, bool force);
+extern int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_PENDING_IRQ)
 void irq_move_irq(struct irq_data *data);
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 6f1c7a5..fe0908f 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -948,6 +948,20 @@ int irq_chip_retrigger_hierarchy(struct irq_data *data)
 
 	return -ENOSYS;
 }
+
+/**
+ * irq_chip_set_vcpu_affinity_parent - Set vcpu affinity on the parent interrupt
+ * @data:	Pointer to interrupt specific data
+ * @vcpu_info:	The vcpu affinity information
+ */
+int irq_chip_set_vcpu_affinity_parent(struct irq_data *data, void *vcpu_info)
+{
+	data = data->parent_data;
+	if (data->chip->irq_set_vcpu_affinity)
+		return data->chip->irq_set_vcpu_affinity(data, vcpu_info);
+
+	return -ENOSYS;
+}
 #endif
 
 /**
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 8069237..bd3a1ba 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -247,6 +247,26 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 }
 EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
 
+int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	struct irq_chip *chip;
+	unsigned long flags;
+	int ret = -ENOSYS;
+
+	if (!desc)
+		return -EINVAL;
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+	chip = desc->irq_data.chip;
+	if (chip && chip->irq_set_vcpu_affinity)
+		ret = chip->irq_set_vcpu_affinity(irq_desc_get_irq_data(desc),
+						  vcpu_info);
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(irq_set_vcpu_affinity);
+
 static void irq_affinity_notify(struct work_struct *work)
 {
 	struct irq_affinity_notify *notify =
-- 
1.9.1



* [v3 02/26] iommu: Add new member capability to struct irq_remap_ops
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
  2014-12-12 15:14 ` [v3 01/26] genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2015-01-28 15:22   ` David Woodhouse
  2014-12-12 15:14 ` [v3 03/26] iommu, x86: Define new irte structure for VT-d Posted-Interrupts Feng Wu
                   ` (26 subsequent siblings)
  28 siblings, 1 reply; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch adds a new member, capability, to struct irq_remap_ops.
This new callback can be used to check whether certain features
are supported, such as VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/include/asm/irq_remapping.h | 4 ++++
 drivers/iommu/irq_remapping.h        | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 6ba2431..f67ae08 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -31,6 +31,10 @@ struct irq_alloc_info;
 
 #ifdef CONFIG_IRQ_REMAP
 
+enum irq_remap_cap {
+	IRQ_POSTING_CAP = 0,
+};
+
 extern void setup_irq_remapping_ops(void);
 extern int irq_remapping_supported(void);
 extern void set_irq_remapping_broken(void);
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 4bd791d..2d991b2 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -28,6 +28,7 @@ struct irq_data;
 struct msi_msg;
 struct irq_domain;
 struct irq_alloc_info;
+enum irq_remap_cap;
 
 extern int disable_irq_remap;
 extern int irq_remap_broken;
@@ -39,6 +40,9 @@ struct irq_remap_ops {
 	/* Check whether Interrupt Remapping is supported */
 	int (*supported)(void);
 
+	/* Check whether a given capability is supported */
+	bool (*capability)(enum irq_remap_cap);
+
 	/* Initializes hardware and makes it ready for remapping interrupts */
 	int  (*prepare)(void);
 
-- 
1.9.1



* [v3 03/26] iommu, x86: Define new irte structure for VT-d Posted-Interrupts
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
  2014-12-12 15:14 ` [v3 01/26] genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU Feng Wu
  2014-12-12 15:14 ` [v3 02/26] iommu: Add new member capability to struct irq_remap_ops Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2015-01-28 15:26   ` David Woodhouse
  2014-12-12 15:14 ` [v3 04/26] iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip Feng Wu
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Add a new irte_pi structure for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 include/linux/dmar.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 8473756..c7f9cda 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -212,6 +212,38 @@ struct irte {
 	};
 };
 
+struct irte_pi {
+	union {
+		struct {
+			__u64   present		: 1,
+				fpd		: 1,
+				__reserved_1	: 6,
+				avail		: 4,
+				__reserved_2	: 2,
+				urg		: 1,
+				pst		: 1,
+				vector		: 8,
+				__reserved_3	: 14,
+				pda_l		: 26;
+		};
+		__u64 low;
+	};
+
+	union {
+		struct {
+			__u64	sid		: 16,
+				sq		: 2,
+				svt		: 2,
+				__reserved_4	: 12,
+				pda_h		: 32;
+		};
+		__u64 high;
+	};
+};
+
+#define PDA_LOW_BIT    26
+#define PDA_HIGH_BIT   32
+
 enum {
 	IRQ_REMAP_XAPIC_MODE,
 	IRQ_REMAP_X2APIC_MODE,
-- 
1.9.1



* [v3 04/26] iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (2 preceding siblings ...)
  2014-12-12 15:14 ` [v3 03/26] iommu, x86: Define new irte structure for VT-d Posted-Interrupts Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2015-01-28 15:26   ` David Woodhouse
  2014-12-12 15:14 ` [v3 05/26] x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller Feng Wu
                   ` (24 subsequent siblings)
  28 siblings, 1 reply; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Implement irq_set_vcpu_affinity for intel_ir_chip.
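
The way the descriptor address is split across the pda_l and pda_h
fields can be sketched in user space as below. This assumes the
Posted-Interrupts Descriptor is 64-byte aligned (as struct pi_desc is
in vmx.c), so the low 6 address bits are not stored. All model_* names
are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define MODEL_PDA_LOW_BITS	26	/* width of pda_l */
#define MODEL_PDA_ALIGN_SHIFT	6	/* 32 - 26: bits dropped by alignment */

struct model_pda {
	uint32_t pda_l;		/* address bits 31:6 */
	uint32_t pda_h;		/* address bits 63:32 */
};

/* Split a 64-byte aligned physical address into the two IRTE fields. */
static struct model_pda model_split_pda(uint64_t pi_desc_addr)
{
	struct model_pda p;

	p.pda_l = (uint32_t)((pi_desc_addr >> MODEL_PDA_ALIGN_SHIFT) &
			     ((UINT64_C(1) << MODEL_PDA_LOW_BITS) - 1));
	p.pda_h = (uint32_t)(pi_desc_addr >> 32);
	return p;
}

/* Inverse operation, useful for checking that nothing was lost. */
static uint64_t model_join_pda(struct model_pda p)
{
	return ((uint64_t)p.pda_h << 32) |
	       ((uint64_t)p.pda_l << MODEL_PDA_ALIGN_SHIFT);
}
```

For an aligned address, joining the two halves again gives back the
original value. An unaligned address would lose its low 6 bits.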

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/include/asm/irq_remapping.h |  5 +++++
 drivers/iommu/intel_irq_remapping.c  | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index f67ae08..f87ac70 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -60,6 +60,11 @@ static inline struct irq_domain *arch_get_ir_parent_domain(void)
 	return x86_vector_domain;
 }
 
+struct vcpu_data {
+	u64 pi_desc_addr;	/* Physical address of PI Descriptor */
+	u32 vector;		/* Guest vector of the interrupt */
+};
+
 #else  /* CONFIG_IRQ_REMAP */
 
 static inline void setup_irq_remapping_ops(void) { }
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index f6da3b2..48c2051 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -42,6 +42,7 @@ struct irq_2_iommu {
 struct intel_ir_data {
 	struct irq_2_iommu			irq_2_iommu;
 	struct irte				irte_entry;
+	struct irte_pi                          irte_pi_entry;
 	union {
 		struct msi_msg			msi_entry;
 	};
@@ -1010,10 +1011,44 @@ static void intel_ir_compose_msi_msg(struct irq_data *irq_data,
 	*msg = ir_data->msi_entry;
 }
 
+static int intel_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
+{
+	struct intel_ir_data *ir_data = data->chip_data;
+	struct irte_pi *irte_pi = &ir_data->irte_pi_entry;
+	struct vcpu_data *vcpu_pi_info;
+
+	/* stop posting interrupts, back to remapping mode */
+	if (!vcpu_info)
+		modify_irte(&ir_data->irq_2_iommu, &ir_data->irte_entry);
+	else {
+		vcpu_pi_info = (struct vcpu_data *)vcpu_info;
+		memcpy(irte_pi, &ir_data->irte_entry, sizeof(struct irte));
+
+		irte_pi->urg = 0;
+		irte_pi->vector = vcpu_pi_info->vector;
+		irte_pi->pda_l = (vcpu_pi_info->pi_desc_addr >>
+				 (32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
+		irte_pi->pda_h = (vcpu_pi_info->pi_desc_addr >> 32) &
+				 ~(-1UL << PDA_HIGH_BIT);
+
+		irte_pi->__reserved_1 = 0;
+		irte_pi->__reserved_2 = 0;
+		irte_pi->__reserved_3 = 0;
+		irte_pi->__reserved_4 = 0;
+
+		irte_pi->pst = 1;
+
+		modify_irte(&ir_data->irq_2_iommu, (struct irte *)irte_pi);
+	}
+
+	return 0;
+}
+
 static struct irq_chip intel_ir_chip = {
 	.irq_ack = ir_ack_apic_edge,
 	.irq_set_affinity = intel_ir_set_affinity,
 	.irq_compose_msi_msg = intel_ir_compose_msi_msg,
+	.irq_set_vcpu_affinity = intel_ir_set_vcpu_affinity,
 };
 
 static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
-- 
1.9.1



* [v3 05/26] x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (3 preceding siblings ...)
  2014-12-12 15:14 ` [v3 04/26] iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts Feng Wu
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Implement irq_set_vcpu_affinity for pci_msi_ir_controller.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/kernel/apic/msi.c | 1 +
 include/linux/irq.h        | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index da163da..b0ed073 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -152,6 +152,7 @@ static struct irq_chip pci_msi_ir_controller = {
 	.irq_mask		= pci_msi_mask_irq,
 	.irq_ack		= irq_chip_ack_parent,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_set_vcpu_affinity	= irq_chip_set_vcpu_affinity_parent,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 83abafc..5dcaa7f 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -464,6 +464,9 @@ extern void irq_chip_eoi_parent(struct irq_data *data);
 extern int irq_chip_set_affinity_parent(struct irq_data *data,
 					const struct cpumask *dest,
 					bool force);
+extern int irq_chip_set_vcpu_affinity_parent(struct irq_data *data,
+					     void *vcpu_info);
+
 #endif
 
 static inline void irq_chip_write_msi_msg(struct irq_data *data,
-- 
1.9.1



* [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (4 preceding siblings ...)
  2014-12-12 15:14 ` [v3 05/26] x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-18 14:26   ` Zhang, Yang Z
  2015-01-28 15:29   ` David Woodhouse
  2014-12-12 15:14 ` [v3 07/26] iommu, x86: Add cap_pi_support() to detect VT-d PI capability Feng Wu
                   ` (22 subsequent siblings)
  28 siblings, 2 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

We don't need to migrate the irqs for VT-d Posted-Interrupts here.
When 'pst' is set in the IRTE, the associated irq is posted to the
guest instead of being remapped. The destination of the
interrupt is set in the Posted-Interrupts Descriptor, and the migration
happens during vCPU scheduling.

However, we still update the cached irte here, so that it can be used
when changing back to remapping mode.
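
The intended behaviour can be illustrated with a small user-space
model. It is a sketch of the idea described above, not the driver code:
the model_* names are hypothetical, and the "posted" flag is an
assumption standing in for however the driver tracks posting mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the fields that set_affinity touches are modelled. */
struct model_irte {
	uint8_t vector;
	uint32_t dest_id;
};

struct model_ir_data {
	struct model_irte cached_remap;	/* software copy, remapped format */
	struct model_irte hw;		/* entry currently seen by hardware */
	bool posted;			/* assumed: true while posting */
};

/* Always refresh the cached entry; touch hardware only when remapping. */
static void model_set_affinity(struct model_ir_data *d,
			       uint8_t vector, uint32_t dest_id)
{
	d->cached_remap.vector = vector;
	d->cached_remap.dest_id = dest_id;

	if (!d->posted)
		d->hw = d->cached_remap;
}

/* Leaving posting mode writes the up-to-date cached entry back. */
static void model_stop_posting(struct model_ir_data *d)
{
	d->posted = false;
	d->hw = d->cached_remap;
}
```

Because the cached entry is kept current, switching back to remapping
mode picks up the latest host vector and destination without a further
affinity change.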

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/iommu/intel_irq_remapping.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 48c2051..ab9057a 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -977,6 +977,7 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
 {
 	struct intel_ir_data *ir_data = data->chip_data;
 	struct irte *irte = &ir_data->irte_entry;
+	struct irte_pi *irte_pi = (struct irte_pi *)irte;
 	struct irq_cfg *cfg = irqd_cfg(data);
 	struct irq_data *parent = data->parent_data;
 	int ret;
@@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	 */
 	irte->vector = cfg->vector;
 	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
-	modify_irte(&ir_data->irq_2_iommu, irte);
+
+	/* We don't need to modify irte if the interrupt is for posting. */
+	if (irte_pi->pst != 1)
+		modify_irte(&ir_data->irq_2_iommu, irte);
 
 	/*
 	 * After this point, all the interrupts will start arriving
-- 
1.9.1



* [v3 07/26] iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (5 preceding siblings ...)
  2014-12-12 15:14 ` [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2015-01-28 15:32   ` David Woodhouse
  2014-12-12 15:14 ` [v3 08/26] iommu, x86: Add intel_irq_remapping_capability() for Intel Feng Wu
                   ` (21 subsequent siblings)
  28 siblings, 1 reply; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Add a helper function to detect the VT-d Posted-Interrupts capability.
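
As a quick illustration, the helper boils down to testing a single bit
of the 64-bit capability register value. The function below is a
user-space sketch. model_cap_pi_support is a hypothetical name, and the
bit position 59 is taken from the macro added by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CAP_PI_BIT	59

/* Return true when the posted-interrupts bit is set in the cap value. */
static bool model_cap_pi_support(uint64_t cap)
{
	return (cap >> MODEL_CAP_PI_BIT) & 1;
}
```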

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 include/linux/intel-iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ecaf3a9..8174ae8 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -87,6 +87,7 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
 /*
  * Decoding Capability Register
  */
+#define cap_pi_support(c)	(((c) >> 59) & 1)
 #define cap_read_drain(c)	(((c) >> 55) & 1)
 #define cap_write_drain(c)	(((c) >> 54) & 1)
 #define cap_max_amask_val(c)	(((c) >> 48) & 0x3f)
-- 
1.9.1



* [v3 08/26] iommu, x86: Add intel_irq_remapping_capability() for Intel
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (6 preceding siblings ...)
  2014-12-12 15:14 ` [v3 07/26] iommu, x86: Add cap_pi_support() to detect VT-d PI capability Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2015-01-28 15:37   ` David Woodhouse
  2014-12-12 15:14 ` [v3 09/26] iommu, x86: define irq_remapping_cap() Feng Wu
                   ` (20 subsequent siblings)
  28 siblings, 1 reply; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Add the Intel-side implementation of the capability callback in
struct irq_remap_ops.
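
The policy implemented for the posting capability can be summarised by
the user-space sketch below: posting is reported only if the user has
not disabled it, irq remapping is enabled, and every IOMMU unit
advertises the capability. The function name and the array-of-caps
representation are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* caps[] holds one capability register value per IOMMU unit. */
static bool model_posting_supported(const uint64_t *caps, size_t nr_units,
				    bool user_disabled,
				    bool remapping_enabled)
{
	size_t i;

	if (user_disabled || !remapping_enabled)
		return false;

	/* A single unit without the PI bit (bit 59) rules posting out. */
	for (i = 0; i < nr_units; i++)
		if (!((caps[i] >> 59) & 1))
			return false;

	return true;
}
```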

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/iommu/intel_irq_remapping.c | 27 +++++++++++++++++++++++++++
 drivers/iommu/irq_remapping.c       |  2 ++
 drivers/iommu/irq_remapping.h       |  4 ++++
 3 files changed, 33 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index ab9057a..08a7c39 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -652,6 +652,32 @@ error:
 	return -1;
 }
 
+static bool intel_irq_remapping_capability(enum irq_remap_cap cap)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+
+	switch (cap) {
+	case IRQ_POSTING_CAP:
+		/*
+		 * If 1) posted-interrupts is disabled by user
+		 * or 2) irq remapping is disabled, posted-interrupts
+		 * is not supported.
+		 */
+		if (disable_irq_post || !irq_remapping_enabled)
+			return 0;
+
+		for_each_iommu(iommu, drhd)
+			if (!cap_pi_support(iommu->cap))
+				return 0;
+
+		return 1;
+	default:
+		pr_warn("Unknown irq remapping capability.\n");
+		return 0;
+	}
+}
+
 static int ir_parse_one_hpet_scope(struct acpi_dmar_device_scope *scope,
 				   struct intel_iommu *iommu,
 				   struct acpi_dmar_hardware_unit *drhd)
@@ -948,6 +974,7 @@ static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
 
 struct irq_remap_ops intel_irq_remap_ops = {
 	.supported		= intel_irq_remapping_supported,
+	.capability		= intel_irq_remapping_capability,
 	.prepare		= dmar_table_init,
 	.enable			= intel_enable_irq_remapping,
 	.disable		= disable_irq_remapping,
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 3c3da04d..e63e969 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -24,6 +24,8 @@ int irq_remap_broken;
 int disable_sourceid_checking;
 int no_x2apic_optout;
 
+int disable_irq_post = 1;
+
 static struct irq_remap_ops *remap_ops;
 
 static void irq_remapping_disable_io_apic(void)
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 2d991b2..cb1f46d 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -36,6 +36,8 @@ extern int disable_sourceid_checking;
 extern int no_x2apic_optout;
 extern int irq_remapping_enabled;
 
+extern int disable_irq_post;
+
 struct irq_remap_ops {
 	/* Check whether Interrupt Remapping is supported */
 	int (*supported)(void);
@@ -76,6 +78,8 @@ extern void ir_ack_apic_edge(struct irq_data *data);
 #define disable_irq_remap     1
 #define irq_remap_broken      0
 
+#define disable_irq_post      1
+
 #endif /* CONFIG_IRQ_REMAP */
 
 #endif /* __IRQ_REMAPPING_H */
-- 
1.9.1



* [v3 09/26] iommu, x86: define irq_remapping_cap()
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (7 preceding siblings ...)
  2014-12-12 15:14 ` [v3 08/26] iommu, x86: Add intel_irq_remapping_capability() for Intel Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 10/26] KVM: change struct pi_desc for VT-d Posted-Interrupts Feng Wu
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch adds a new interface, irq_remapping_cap(), to detect
whether irq remapping supports new features such as VT-d
Posted-Interrupts. The function is exported so that KVM
code can check for this and use the mechanism properly.
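
The shape of the query can be sketched in user space as follows. The
ops table and the disable flag are passed in explicitly here, whereas
the kernel code uses globals. All model_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_irq_remap_cap {
	MODEL_IRQ_POSTING_CAP = 0,
};

/* Stand-in for struct irq_remap_ops with only the optional callback. */
struct model_irq_remap_ops {
	bool (*capability)(enum model_irq_remap_cap cap);
};

/* Report false unless a backend is registered and implements the hook. */
static bool model_irq_remapping_cap(const struct model_irq_remap_ops *ops,
				    bool posting_disabled,
				    enum model_irq_remap_cap cap)
{
	if (posting_disabled)
		return false;

	if (!ops || !ops->capability)
		return false;

	return ops->capability(cap);
}

/* Example backend that reports the posting capability as available. */
static bool model_backend_capability(enum model_irq_remap_cap cap)
{
	return cap == MODEL_IRQ_POSTING_CAP;
}
```

A caller such as KVM would treat a false result as "keep using plain
interrupt remapping".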

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/include/asm/irq_remapping.h |  2 ++
 drivers/iommu/irq_remapping.c        | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index f87ac70..b3ad067 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -37,6 +37,7 @@ enum irq_remap_cap {
 
 extern void setup_irq_remapping_ops(void);
 extern int irq_remapping_supported(void);
+extern bool irq_remapping_cap(enum irq_remap_cap cap);
 extern void set_irq_remapping_broken(void);
 extern int irq_remapping_prepare(void);
 extern int irq_remapping_enable(void);
@@ -69,6 +70,7 @@ struct vcpu_data {
 
 static inline void setup_irq_remapping_ops(void) { }
 static inline int irq_remapping_supported(void) { return 0; }
+static inline bool irq_remapping_cap(enum irq_remap_cap cap) { return 0; }
 static inline void set_irq_remapping_broken(void) { }
 static inline int irq_remapping_prepare(void) { return -ENODEV; }
 static inline int irq_remapping_enable(void) { return -ENODEV; }
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index e63e969..b008663 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -103,6 +103,18 @@ int irq_remapping_supported(void)
 	return remap_ops->supported();
 }
 
+bool irq_remapping_cap(enum irq_remap_cap cap)
+{
+	if (disable_irq_post)
+		return 0;
+
+	if (!remap_ops || !remap_ops->capability)
+		return 0;
+
+	return remap_ops->capability(cap);
+}
+EXPORT_SYMBOL_GPL(irq_remapping_cap);
+
 int __init irq_remapping_prepare(void)
 {
 	if (!remap_ops || !remap_ops->prepare)
-- 
1.9.1



* [v3 10/26] KVM: change struct pi_desc for VT-d Posted-Interrupts
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (8 preceding siblings ...)
  2014-12-12 15:14 ` [v3 09/26] iommu, x86: define irq_remapping_cap() Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 11/26] KVM: Add some helper functions for Posted-Interrupts Feng Wu
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

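Change struct pi_desc for VT-d Posted-Interrupts: replace the 32-bit
control word with a 64-bit control field that exposes the on, sn, ndm,
nv and ndst bit-fields.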
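
For reference, the bit positions implied by the new bit-field layout
can be written out with explicit shifts, as in the user-space sketch
below. The model_* and MODEL_* names are hypothetical. The positions
follow from the field widths in this patch (on:1, sn:1, rsvd_1:13,
ndm:1, nv:8, rsvd_2:8, ndst:32), assuming the usual x86 low-to-high
bit-field allocation.

```c
#include <stdint.h>

#define MODEL_PI_ON_BIT		0
#define MODEL_PI_SN_BIT		1
#define MODEL_PI_NDM_BIT	15
#define MODEL_PI_NV_SHIFT	16
#define MODEL_PI_NDST_SHIFT	32

/* Same overall shape as the descriptor: 32 + 8 + 24 = 64 bytes. */
struct model_pi_desc {
	uint32_t pir[8];
	uint64_t control;
	uint32_t rsvd[6];
};

_Static_assert(sizeof(struct model_pi_desc) == 64,
	       "descriptor must occupy exactly 64 bytes");

/* Build a control value from notification vector and destination. */
static uint64_t model_pi_control(uint8_t nv, uint32_t ndst, int ndm)
{
	return ((uint64_t)ndst << MODEL_PI_NDST_SHIFT) |
	       ((uint64_t)nv << MODEL_PI_NV_SHIFT) |
	       ((uint64_t)(ndm ? 1 : 0) << MODEL_PI_NDM_BIT);
}

static uint8_t model_pi_nv(uint64_t control)
{
	return (uint8_t)(control >> MODEL_PI_NV_SHIFT);
}

static uint32_t model_pi_ndst(uint64_t control)
{
	return (uint32_t)(control >> MODEL_PI_NDST_SHIFT);
}
```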

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 arch/x86/kvm/vmx.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3e556c6..abdb84f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -411,8 +411,19 @@ struct nested_vmx {
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
-	u32 control;	/* bit 0 of control is outstanding notification bit */
-	u32 rsvd[7];
+	union {
+		struct {
+			u64	on	: 1,
+				sn	: 1,
+				rsvd_1	: 13,
+				ndm	: 1,
+				nv	: 8,
+				rsvd_2	: 8,
+				ndst	: 32;
+		};
+		u64 control;
+	};
+	u32 rsvd[6];
 } __aligned(64);
 
 static bool pi_test_and_set_on(struct pi_desc *pi_desc)
-- 
1.9.1



* [v3 11/26] KVM: Add some helper functions for Posted-Interrupts
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (9 preceding siblings ...)
  2014-12-12 15:14 ` [v3 10/26] KVM: change struct pi_desc for VT-d Posted-Interrupts Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
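The helpers added here are thin wrappers around the kernel's atomic bit
operations. A rough userspace equivalent, assuming the GCC/Clang __atomic
builtins as a stand-in for set_bit()/clear_bit()/test_bit(), could look like
the sketch below. pi_test_bit() is a hypothetical merged form of pi_test_on()
and pi_test_sn(), and the functions take the control word directly instead of
a struct pi_desc pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POSTED_INTR_ON  0
#define POSTED_INTR_SN  1

/* Atomically set SN without disturbing the other control bits. */
static void pi_set_sn(uint64_t *control)
{
	__atomic_fetch_or(control, 1ULL << POSTED_INTR_SN, __ATOMIC_SEQ_CST);
}

/* Atomically clear SN without disturbing the other control bits. */
static void pi_clear_sn(uint64_t *control)
{
	__atomic_fetch_and(control, ~(1ULL << POSTED_INTR_SN), __ATOMIC_SEQ_CST);
}

/* Hypothetical helper: read one bit of the control word. */
static bool pi_test_bit(const uint64_t *control, int bit)
{
	assert(bit >= 0 && bit < 64);
	return (__atomic_load_n(control, __ATOMIC_SEQ_CST) >> bit) & 1;
}
```

Read-modify-write atomicity matters here because the descriptor may be
updated by another agent while the CPU changes SN.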
 arch/x86/kvm/vmx.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index abdb84f..0b1383e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -408,6 +408,8 @@ struct nested_vmx {
 };
 
 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
+
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
@@ -443,6 +445,30 @@ static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static void pi_clear_sn(struct pi_desc *pi_desc)
+{
+	clear_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
+static void pi_set_sn(struct pi_desc *pi_desc)
+{
+	set_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_on(struct pi_desc *pi_desc)
+{
+	return test_bit(POSTED_INTR_ON,
+			(unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_sn(struct pi_desc *pi_desc)
+{
+	return test_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
 struct vcpu_vmx {
 	struct kvm_vcpu       vcpu;
 	unsigned long         host_rsp;
-- 
1.9.1



* [v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (10 preceding siblings ...)
  2014-12-12 15:14 ` [v3 11/26] KVM: Add some helper functions for Posted-Interrupts Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-18 15:19   ` Zhang, Yang Z
  2014-12-12 15:14 ` [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI Feng Wu
                   ` (16 subsequent siblings)
  28 siblings, 1 reply; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch initializes the VT-d Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
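The NDST value written by pi_desc_init() depends on the APIC mode. The
sketch below isolates that computation as it appears in the diff;
pi_encode_ndst() is a hypothetical name, and the assert() on the xAPIC ID
range is an addition of this sketch (the patch masks silently instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the NDST computation in pi_desc_init(). */
static uint32_t pi_encode_ndst(uint32_t apic_id, bool x2apic)
{
	if (x2apic)
		return apic_id;		/* x2APIC: the ID is used unchanged */

	/* xAPIC: the patch places the ID in bits 15:8 and masks the rest. */
	assert(apic_id <= 0xff);
	return (apic_id << 8) & 0xFF00;
}
```

For example, physical APIC ID 3 becomes 0x300 in xAPIC mode and stays 3 in
x2APIC mode.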
 arch/x86/kvm/vmx.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0b1383e..66ca275 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -45,6 +45,7 @@
 #include <asm/perf_event.h>
 #include <asm/debugreg.h>
 #include <asm/kexec.h>
+#include <asm/irq_remapping.h>
 
 #include "trace.h"
 
@@ -4433,6 +4434,30 @@ static void ept_set_mmio_spte_mask(void)
 	kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);
 }
 
+static void pi_desc_init(struct vcpu_vmx *vmx)
+{
+	unsigned int dest;
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return;
+
+	/*
+	 * Initialize Posted-Interrupt Descriptor
+	 */
+
+	pi_clear_sn(&vmx->pi_desc);
+	vmx->pi_desc.nv = POSTED_INTR_VECTOR;
+
+	/* Physical mode for Notification Event */
+	vmx->pi_desc.ndm = 0;
+	dest = cpu_physical_id(vmx->vcpu.cpu);
+
+	if (x2apic_enabled())
+		vmx->pi_desc.ndst = dest;
+	else
+		vmx->pi_desc.ndst = (dest << 8) & 0xFF00;
+}
+
 /*
  * Sets up the vmcs for emulated real mode.
  */
@@ -4476,6 +4501,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
 		vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
 		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
+
+		pi_desc_init(vmx);
 	}
 
 	if (ple_gap) {
-- 
1.9.1



* [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (11 preceding siblings ...)
  2014-12-12 15:14 ` [v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-18 14:49   ` Zhang, Yang Z
  2015-01-09 14:54   ` Radim Krčmář
  2014-12-12 15:14 ` [v3 14/26] KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu Feng Wu
                   ` (15 subsequent siblings)
  28 siblings, 2 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch defines a new interface, kvm_find_dest_vcpu(), for
VT-d PI. It returns the destination vCPU of an interrupt for
the guest.

Since VT-d PI cannot handle broadcast/multicast interrupts,
only Fixed and Lowest-priority interrupts are handled here.

The current method of handling guest lowest-priority interrupts
is to keep a counter 'apic_arb_prio' for each vCPU: we choose the
vCPU with the smallest 'apic_arb_prio' and then increase it by 1.
However, this cannot be reused for VT-d PI, since we no longer
control 'apic_arb_prio' once hardware delivers posted interrupts
directly.

Instead, we introduce a scheme similar to 'apic_arb_prio' to handle
guest lowest-priority interrupts when VT-d PI is used. The idea is
as follows:
- Each vCPU has a counter 'round_robin_counter'.
- When the guest sets an interrupt to lowest priority, we choose
  the vCPU with the smallest 'round_robin_counter' as the destination,
  then increase it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
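The lowest-priority selection described above can be sketched outside the
kernel as follows. struct toy_vcpu and pick_lowest_prio() are hypothetical
stand-ins: the two boolean fields replace the kvm_apic_match_dest() and
kvm_lapic_enabled() checks, and the Fixed delivery path is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct kvm_vcpu; all fields are illustrative. */
struct toy_vcpu {
	bool matches_dest;	/* would come from kvm_apic_match_dest() */
	bool lapic_enabled;	/* would come from kvm_lapic_enabled() */
	int32_t round_robin_counter;
};

/* Pick the destination of a lowest-priority interrupt, or NULL if none. */
static struct toy_vcpu *pick_lowest_prio(struct toy_vcpu *vcpus, size_t n)
{
	struct toy_vcpu *dest = NULL;
	size_t i;

	assert(vcpus != NULL || n == 0);

	for (i = 0; i < n; i++) {
		struct toy_vcpu *v = &vcpus[i];

		if (!v->matches_dest || !v->lapic_enabled)
			continue;
		/* Strict '<' keeps the earliest vCPU on ties, as in the patch. */
		if (!dest || v->round_robin_counter < dest->round_robin_counter)
			dest = v;
	}

	/* Charge the chosen vCPU so the next interrupt goes elsewhere. */
	if (dest)
		dest->round_robin_counter++;
	return dest;
}
```

With two eligible vCPUs whose counters start at zero, successive calls
alternate between them.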
 arch/x86/include/asm/kvm_host.h |  4 ++++
 virt/kvm/irq_comm.c             | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6ed0c30..7a41808 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -358,6 +358,7 @@ struct kvm_vcpu_arch {
 	struct kvm_lapic *apic;    /* kernel irqchip context */
 	unsigned long apic_attention;
 	int32_t apic_arb_prio;
+	int32_t round_robin_counter;
 	int mp_state;
 	u64 ia32_misc_enable_msr;
 	bool tpr_access_reporting;
@@ -1093,4 +1094,7 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
 void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
 
+bool kvm_find_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			struct kvm_vcpu **dest_vcpu);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 963b899..f3c5d69 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -317,6 +317,47 @@ out:
 	return r;
 }
 
+int kvm_compare_rr_counter(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
+{
+	return vcpu1->arch.round_robin_counter -
+			vcpu2->arch.round_robin_counter;
+}
+
+bool kvm_find_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+				struct kvm_vcpu **dest_vcpu)
+{
+	int i, r = 0;
+	struct kvm_vcpu *vcpu, *dest = NULL;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+					irq->dest_id, irq->dest_mode))
+			continue;
+
+		if (!kvm_is_dm_lowest_prio(irq)) {
+			r++;
+			*dest_vcpu = vcpu;
+		} else if (kvm_lapic_enabled(vcpu)) {
+			if (!dest)
+				dest = vcpu;
+			else if (kvm_compare_rr_counter(vcpu, dest) < 0)
+				dest = vcpu;
+		}
+	}
+
+	if (dest) {
+		dest->arch.round_robin_counter++;
+		*dest_vcpu = dest;
+		return true;
+	} else if (r == 1)
+		return true;
+
+	return false;
+}
+
 #define IOAPIC_ROUTING_ENTRY(irq) \
 	{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,	\
 	  .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
-- 
1.9.1



* [v3 14/26] KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (12 preceding siblings ...)
  2014-12-12 15:14 ` [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 15/26] KVM: add interfaces to control PI outside vmx Feng Wu
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Define an interface to get the PI descriptor address from the vCPU structure.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx.c              | 12 ++++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7a41808..9b45b78 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -772,6 +772,7 @@ struct kvm_x86_ops {
 	int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
 
 	void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
+	u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 66ca275..81f239b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -562,6 +562,11 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 	return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
 
+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+{
+	return &(to_vmx(vcpu)->pi_desc);
+}
+
 #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
 #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
 #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
@@ -4298,6 +4303,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
 	return;
 }
 
+static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
+{
+	return __pa((u64)vcpu_to_pi_desc(vcpu));
+}
+
 /*
  * Set up the vmcs's constant host-state fields, i.e., host-state fields that
  * will not change in the lifetime of the guest.
@@ -9244,6 +9254,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.check_nested_events = vmx_check_nested_events,
 
 	.sched_in = vmx_sched_in,
+
+	.get_pi_desc_addr = vmx_get_pi_desc_addr,
 };
 
 static int __init vmx_init(void)
-- 
1.9.1



* [v3 15/26] KVM: add interfaces to control PI outside vmx
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (13 preceding siblings ...)
  2014-12-12 15:14 ` [v3 14/26] KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch adds pi_clear_sn and pi_set_sn to struct kvm_x86_ops,
so we can set/clear SN outside vmx.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/vmx.c              | 13 +++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9b45b78..cd4b174 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -773,6 +773,9 @@ struct kvm_x86_ops {
 
 	void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
 	u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
+
+	void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
+	void (*pi_set_sn)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 81f239b..ee3b735 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -567,6 +567,16 @@ struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
 	return &(to_vmx(vcpu)->pi_desc);
 }
 
+static void vmx_pi_clear_sn(struct kvm_vcpu *vcpu)
+{
+	pi_clear_sn(vcpu_to_pi_desc(vcpu));
+}
+
+static void vmx_pi_set_sn(struct kvm_vcpu *vcpu)
+{
+	pi_set_sn(vcpu_to_pi_desc(vcpu));
+}
+
 #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
 #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
 #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
@@ -9256,6 +9266,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.sched_in = vmx_sched_in,
 
 	.get_pi_desc_addr = vmx_get_pi_desc_addr,
+
+	.pi_clear_sn = vmx_pi_clear_sn,
+	.pi_set_sn = vmx_pi_set_sn,
 };
 
 static int __init vmx_init(void)
-- 
1.9.1



* [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (14 preceding siblings ...)
  2014-12-12 15:14 ` [v3 15/26] KVM: add interfaces to control PI outside vmx Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-17 16:17   ` Paolo Bonzini
  2014-12-12 15:14 ` [v3 17/26] KVM: make kvm_set_msi_irq() public Feng Wu
                   ` (12 subsequent siblings)
  28 siblings, 1 reply; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
so we can use it outside of irqchip.c.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 include/linux/kvm_host.h | 19 +++++++++++++++++++
 virt/kvm/irqchip.c       | 11 -----------
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0b9659d..cfa85ac 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -335,6 +335,25 @@ struct kvm_kernel_irq_routing_entry {
 	struct hlist_node link;
 };
 
+#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
+
+struct kvm_irq_routing_table {
+	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
+	struct kvm_kernel_irq_routing_entry *rt_entries;
+	u32 nr_rt_entries;
+	/*
+	 * Array indexed by gsi. Each entry contains list of irq chips
+	 * the gsi is connected to.
+	 */
+	struct hlist_head map[0];
+};
+
+#else
+
+struct kvm_irq_routing_table {};
+
+#endif
+
 #ifndef KVM_PRIVATE_MEM_SLOTS
 #define KVM_PRIVATE_MEM_SLOTS 0
 #endif
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 7f256f3..cdf29a6 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -31,17 +31,6 @@
 #include <trace/events/kvm.h>
 #include "irq.h"
 
-struct kvm_irq_routing_table {
-	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
-	struct kvm_kernel_irq_routing_entry *rt_entries;
-	u32 nr_rt_entries;
-	/*
-	 * Array indexed by gsi. Each entry contains list of irq chips
-	 * the gsi is connected to.
-	 */
-	struct hlist_head map[0];
-};
-
 int kvm_irq_map_gsi(struct kvm *kvm,
 		    struct kvm_kernel_irq_routing_entry *entries, int gsi)
 {
-- 
1.9.1



* [v3 17/26] KVM: make kvm_set_msi_irq() public
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (15 preceding siblings ...)
  2014-12-12 15:14 ` [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-17 17:32   ` Paolo Bonzini
  2014-12-12 15:14 ` [v3 18/26] KVM: kvm-vfio: User API for VT-d Posted-Interrupts Feng Wu
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Make kvm_set_msi_irq() public so that it can be used outside irq_comm.c.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 include/linux/kvm_host.h | 2 ++
 virt/kvm/irq_comm.c      | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cfa85ac..5cd4420 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -785,6 +785,8 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian);
 int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+		     struct kvm_lapic_irq *irq);
 
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
 int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index f3c5d69..231671a 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -106,7 +106,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	return r;
 }
 
-static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
 				   struct kvm_lapic_irq *irq)
 {
 	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
-- 
1.9.1



* [v3 18/26] KVM: kvm-vfio: User API for VT-d Posted-Interrupts
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (16 preceding siblings ...)
  2014-12-12 15:14 ` [v3 17/26] KVM: make kvm_set_msi_irq() public Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 19/26] KVM: kvm-vfio: implement the VFIO skeleton " Feng Wu
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch adds and documents two new attributes,
KVM_DEV_VFIO_DEVICE_POST_IRQ and KVM_DEV_VFIO_DEVICE_UNPOST_IRQ,
in the KVM_DEV_VFIO_DEVICE group. The new attributes are used for
VT-d Posted-Interrupts.

When the guest OS changes the interrupt configuration of an
assigned device, such as the MSI/MSI-X data/address fields,
QEMU will use this IRQ attribute to tell KVM to update the
related IRTE according to the VT-d Posted-Interrupts Specification;
for example, the guest vector must be updated in the related IRTE.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
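To illustrate how userspace might fill the variable-length request, here is
a sketch that builds a struct kvm_vfio_dev_irq in memory. The struct is a
local copy of the one proposed in this patch (uint32_t replaces __u32), and
dev_irq_alloc() is a hypothetical helper. No ioctl is issued here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the struct proposed in this patch. */
struct kvm_vfio_dev_irq {
	uint32_t argsz;
	uint32_t fd;		/* file descriptor of the VFIO device */
	uint32_t index;		/* VFIO device IRQ index */
	uint32_t start;
	uint32_t count;
	uint32_t gsi[];		/* one gsi per IRQ in [start, start + count) */
};

/* Hypothetical helper: allocate and fill a request; the caller frees it. */
static struct kvm_vfio_dev_irq *dev_irq_alloc(uint32_t fd, uint32_t index,
					      uint32_t start, uint32_t count,
					      const uint32_t *gsi)
{
	struct kvm_vfio_dev_irq *req;
	size_t size;

	assert(count == 0 || gsi != NULL);

	/* Refuse counts whose total size would not fit in the 32-bit argsz. */
	if (count > (UINT32_MAX - sizeof(*req)) / sizeof(uint32_t))
		return NULL;

	size = sizeof(*req) + (size_t)count * sizeof(uint32_t);
	req = calloc(1, size);
	if (!req)
		return NULL;

	req->argsz = (uint32_t)size;	/* fixed header plus trailing gsi[] */
	req->fd = fd;
	req->index = index;
	req->start = start;
	req->count = count;
	if (count)
		memcpy(req->gsi, gsi, (size_t)count * sizeof(uint32_t));
	return req;
}
```

Per the documentation hunk in this patch, the resulting pointer would be
passed in kvm_device_attr.addr with the KVM_DEV_VFIO_DEVICE group and one of
the two new attributes.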
 Documentation/virtual/kvm/devices/vfio.txt |  9 +++++++++
 include/uapi/linux/kvm.h                   | 11 +++++++++++
 2 files changed, 20 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index f7aff29..ecfbf61 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -42,3 +42,12 @@ activated before VFIO_DEVICE_SET_IRQS has been called to trigger the IRQ
 or associate an eventfd to it. Unforwarding can only be called while the
 signaling has been disabled with VFIO_DEVICE_SET_IRQS. If this condition is
 not satisfied, the command returns an -EBUSY.
+
+  KVM_DEV_VFIO_DEVICE_POST_IRQ: set a VFIO device IRQ as posted
+  KVM_DEV_VFIO_DEVICE_UNPOST_IRQ: set a VFIO device IRQ as remapped
+For these attributes, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
+
+When the guest OS changes the interrupt configuration of an assigned device,
+such as the MSI/MSI-X data/address fields, QEMU uses this IRQ attribute to
+tell KVM to update the related IRTE according to the VT-d Posted-Interrupts
+Specification; for example, the guest vector must be updated in the IRTE.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a269a42..8f51487 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -949,6 +949,8 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_DEVICE			2
 #define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ			1
 #define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ			2
+#define   KVM_DEV_VFIO_DEVICE_POST_IRQ				3
+#define   KVM_DEV_VFIO_DEVICE_UNPOST_IRQ			4
 
 enum kvm_device_type {
 	KVM_DEV_TYPE_FSL_MPIC_20	= 1,
@@ -973,6 +975,15 @@ struct kvm_arch_forwarded_irq {
 	__u32 gsi; /* gsi, ie. virtual IRQ number */
 };
 
+struct kvm_vfio_dev_irq {
+	__u32	argsz;
+	__u32	fd;		/* file descriptor of the VFIO device */
+	__u32	index;		/* VFIO device IRQ index */
+	__u32	start;
+	__u32	count;
+	__u32	gsi[];		/* gsi, ie. virtual IRQ number */
+};
+
 /*
  * ioctls for VM fds
  */
-- 
1.9.1



* [v3 19/26] KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (17 preceding siblings ...)
  2014-12-12 15:14 ` [v3 18/26] KVM: kvm-vfio: User API for VT-d Posted-Interrupts Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 20/26] KVM: x86: kvm-vfio: VT-d posted-interrupts setup Feng Wu
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch adds the kvm-vfio interface for VT-d Posted-Interrupts.
When a guest updates the MSI/MSI-X information of an assigned device,
QEMU will use the KVM_DEV_VFIO_DEVICE_POST_IRQ attribute to set up
the IRTE for VT-d PI. A userspace program can also use
KVM_DEV_VFIO_DEVICE_UNPOST_IRQ to change back to IRQ remapping mode.
This patch implements these IRQ attributes.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
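kvm_vfio_control_pi() has to validate the user-supplied argsz, start and
count before it copies the gsi[] array. The sketch below restates those
checks in a standalone form. pi_check_range() is a hypothetical helper, and
writing the upper-bound test as a subtraction is a choice of this sketch so
that it does not depend on start + count fitting in 32 bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical validator: return 0 if the request describes IRQs
 * [start, start + count) within [0, max) and argsz covers gsi[].
 */
static int pi_check_range(uint32_t argsz, size_t minsz,
			  uint32_t start, uint32_t count, uint32_t max)
{
	assert(minsz <= UINT32_MAX);

	/* The fixed-size header must be present in full. */
	if (argsz < minsz)
		return -EINVAL;

	/* The trailing gsi[] array must fit in the space after the header. */
	if ((uint64_t)(argsz - minsz) < (uint64_t)count * sizeof(uint32_t))
		return -EINVAL;

	/* Range check expressed without computing start + count. */
	if (start >= max || count > max - start)
		return -EINVAL;

	return 0;
}
```

Here minsz corresponds to offsetofend(struct kvm_vfio_dev_irq, count) and
max to the value returned by kvm_vfio_pci_get_irq_count().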
 include/linux/kvm_host.h |  20 +++++++++
 virt/kvm/vfio.c          | 107 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5cd4420..ca9a393 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1134,6 +1134,26 @@ static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
 }
 #endif
 
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
+/*
+ * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+				 uint32_t guest_irq, bool set);
+#else
+static inline int kvm_arch_vfio_update_pi_irte(struct kvm *kvm,
+		unsigned int host_irq, uint32_t guest_irq, bool set)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 6bc7001..dbc6c3b 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -446,6 +446,99 @@ out:
 	return ret;
 }
 
+static int kvm_vfio_pci_get_irq_count(struct pci_dev *pdev, int irq_type)
+{
+	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
+		u8 pin;
+
+		pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);
+		if (pin)
+			return 1;
+	} else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX)
+		return pci_msi_vec_count(pdev);
+	else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX)
+		return pci_msix_vec_count(pdev);
+
+	return 0;
+}
+
+static int kvm_vfio_control_pi(struct kvm_device *kdev,
+			       int32_t __user *argp, bool set)
+{
+	struct kvm_vfio_dev_irq pi_info;
+	uint32_t *gsi;
+	unsigned long minsz;
+	struct vfio_device *vdev;
+	struct msi_desc *entry;
+	struct device *dev;
+	struct pci_dev *pdev;
+	int i, max, ret;
+
+	minsz = offsetofend(struct kvm_vfio_dev_irq, count);
+
+	if (copy_from_user(&pi_info, (void __user *)argp, minsz))
+		return -EFAULT;
+
+	if (pi_info.argsz < minsz || pi_info.index >= VFIO_PCI_NUM_IRQS)
+		return -EINVAL;
+
+	vdev = kvm_vfio_get_vfio_device(pi_info.fd);
+	if (IS_ERR(vdev))
+		return PTR_ERR(vdev);
+
+	dev = kvm_vfio_external_base_device(vdev);
+	if (!dev || !dev_is_pci(dev)) {
+		ret = -EFAULT;
+		goto put_vfio_device;
+	}
+
+	pdev = to_pci_dev(dev);
+
+	max = kvm_vfio_pci_get_irq_count(pdev, pi_info.index);
+	if (max <= 0) {
+		ret = -EFAULT;
+		goto put_vfio_device;
+	}
+
+	if (pi_info.argsz - minsz < pi_info.count * sizeof(u32) ||
+	    pi_info.start >= max || pi_info.start + pi_info.count > max) {
+		ret = -EINVAL;
+		goto put_vfio_device;
+	}
+
+	gsi = memdup_user((void __user *)((unsigned long)argp + minsz),
+			   pi_info.count * sizeof(u32));
+	if (IS_ERR(gsi)) {
+		ret = PTR_ERR(gsi);
+		goto put_vfio_device;
+	}
+
+#ifdef CONFIG_PCI_MSI
+	for (i = 0; i < pi_info.count; i++) {
+		list_for_each_entry(entry, &pdev->msi_list, list) {
+			if (entry->msi_attrib.entry_nr != pi_info.start+i)
+				continue;
+
+			ret = kvm_arch_vfio_update_pi_irte(kdev->kvm,
+							   entry->irq,
+							   gsi[i],
+							   set);
+			if (ret)
+				goto free_gsi;
+		}
+	}
+#endif
+
+	ret = 0;
+
+free_gsi:
+	kfree(gsi);
+
+put_vfio_device:
+	kvm_vfio_put_vfio_device(vdev);
+	return ret;
+}
+
 static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
 {
 	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
@@ -456,6 +549,14 @@ static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
 	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
 		ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
 		break;
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
+	case KVM_DEV_VFIO_DEVICE_POST_IRQ:
+		ret = kvm_vfio_control_pi(kdev, argp, 1);
+		break;
+	case KVM_DEV_VFIO_DEVICE_UNPOST_IRQ:
+		ret = kvm_vfio_control_pi(kdev, argp, 0);
+		break;
+#endif
 	default:
 		ret = -ENXIO;
 	}
@@ -511,6 +612,12 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
 			return 0;
 #endif
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
+		case KVM_DEV_VFIO_DEVICE_POST_IRQ:
+		case KVM_DEV_VFIO_DEVICE_UNPOST_IRQ:
+			return 0;
+#endif
+
 		}
 		break;
 	}
-- 
1.9.1



* [v3 20/26] KVM: x86: kvm-vfio: VT-d posted-interrupts setup
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (18 preceding siblings ...)
  2014-12-12 15:14 ` [v3 19/26] KVM: kvm-vfio: implement the VFIO skeleton " Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts Feng Wu
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch defines the macro __KVM_HAVE_ARCH_KVM_VFIO_POST and
implements kvm_arch_vfio_update_pi_irte() for the x86 architecture.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/Makefile           |  2 +-
 arch/x86/kvm/kvm_vfio_x86.c     | 77 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kvm/kvm_vfio_x86.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cd4b174..13e3e40 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -82,6 +82,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
 		(base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
 }
 
+#define __KVM_HAVE_ARCH_KVM_VFIO_POST
+
 #define SELECTOR_TI_MASK (1 << 2)
 #define SELECTOR_RPL_MASK 0x03
 
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 25d22b2..8809d58 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)	+= $(KVM)/assigned-dev.o $(KVM)/iommu.o
 kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 
 kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
-			   i8254.o cpuid.o pmu.o
+			   i8254.o cpuid.o pmu.o kvm_vfio_x86.o
 kvm-intel-y		+= vmx.o
 kvm-amd-y		+= svm.o
 
diff --git a/arch/x86/kvm/kvm_vfio_x86.c b/arch/x86/kvm/kvm_vfio_x86.c
new file mode 100644
index 0000000..2ba618e
--- /dev/null
+++ b/arch/x86/kvm/kvm_vfio_x86.c
@@ -0,0 +1,77 @@
+/*
+ * Copyright (C) 2014 Intel Corporation.
+ * Authors: Feng Wu <feng.wu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/irq_remapping.h>
+
+/*
+ * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+				 uint32_t guest_irq, bool set)
+{
+	struct kvm_kernel_irq_routing_entry *e;
+	struct kvm_irq_routing_table *irq_rt;
+	struct kvm_lapic_irq irq;
+	struct kvm_vcpu *vcpu;
+	struct vcpu_data vcpu_info;
+	int idx, ret = -EINVAL;
+
+	idx = srcu_read_lock(&kvm->irq_srcu);
+	irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+	BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
+
+	hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
+		if (e->type != KVM_IRQ_ROUTING_MSI)
+			continue;
+		/*
+		 * VT-d PI cannot support posting multicast/broadcast
+		 * interrupts to a vCPU; we still use interrupt remapping
+		 * for these kinds of interrupts.
+		 */
+
+		kvm_set_msi_irq(e, &irq);
+		if (!kvm_find_dest_vcpu(kvm, &irq, &vcpu))
+			continue;
+
+		vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
+		vcpu_info.vector = irq.vector;
+
+		if (set)
+			ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
+		else {
+			/* suppress notification event before unposting */
+			kvm_x86_ops->pi_set_sn(vcpu);
+			ret = irq_set_vcpu_affinity(host_irq, NULL);
+			kvm_x86_ops->pi_clear_sn(vcpu);
+		}
+
+		if (ret < 0) {
+			printk(KERN_INFO "%s: failed to update PI IRTE\n",
+					__func__);
+			goto out;
+		}
+	}
+
+	ret = 0;
+out:
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+	return ret;
+}
-- 
1.9.1



* [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (19 preceding siblings ...)
  2014-12-12 15:14 ` [v3 20/26] KVM: x86: kvm-vfio: VT-d posted-interrupts setup Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-18 14:54   ` Zhang, Yang Z
                     ` (2 more replies)
  2014-12-12 15:14 ` [v3 22/26] KVM: Define a wakeup worker thread for vCPU Feng Wu
                   ` (7 subsequent siblings)
  28 siblings, 3 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Currently, we use a global vector as the Posted-Interrupts
Notification Event for all the vCPUs in the system. We need
to introduce another global vector for VT-d Posted-Interrupts,
which will be used to wake up a sleeping vCPU when an external
interrupt from a direct-assigned device arrives for that vCPU.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
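Reader's note, not part of the patch: the new handler only counts the
event and then calls an optional hook, which stays NULL until a user
such as KVM registers one. A minimal userspace sketch of that pattern
follows. The names wakeup_ipis, kicks, posted_intr_wakeup_ipi() and
example_wakeup_handler() are hypothetical stand-ins chosen for
illustration; the real handler also does APIC ack and irq_enter/exit.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU kvm_posted_intr_wakeup_ipis counter. */
static unsigned int wakeup_ipis;

/* Optional hook; NULL until someone registers a handler. */
static void (*wakeup_handler_callback)(void) = NULL;

/* Model of the vector handler: count the event, call the hook if present. */
static void posted_intr_wakeup_ipi(void)
{
	wakeup_ipis++;
	if (wakeup_handler_callback)
		wakeup_handler_callback();
}

/* Example hook used only for illustration. */
static unsigned int kicks;

static void example_wakeup_handler(void)
{
	kicks++;
}
```

Calling posted_intr_wakeup_ipi() before a hook is registered only bumps
the counter; after wakeup_handler_callback is assigned, each call also
runs the hook.
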
 arch/x86/include/asm/entry_arch.h  |  2 ++
 arch/x86/include/asm/hardirq.h     |  1 +
 arch/x86/include/asm/hw_irq.h      |  2 ++
 arch/x86/include/asm/irq_vectors.h |  1 +
 arch/x86/kernel/entry_64.S         |  2 ++
 arch/x86/kernel/irq.c              | 27 +++++++++++++++++++++++++++
 arch/x86/kernel/irqinit.c          |  2 ++
 7 files changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
index dc5fa66..27ca0af 100644
--- a/arch/x86/include/asm/entry_arch.h
+++ b/arch/x86/include/asm/entry_arch.h
@@ -23,6 +23,8 @@ BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
 #ifdef CONFIG_HAVE_KVM
 BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR,
 		 smp_kvm_posted_intr_ipi)
+BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi, POSTED_INTR_WAKEUP_VECTOR,
+		 smp_kvm_posted_intr_wakeup_ipi)
 #endif
 
 /*
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 0f5fb6b..9866065 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -14,6 +14,7 @@ typedef struct {
 #endif
 #ifdef CONFIG_HAVE_KVM
 	unsigned int kvm_posted_intr_ipis;
+	unsigned int kvm_posted_intr_wakeup_ipis;
 #endif
 	unsigned int x86_platform_ipis;	/* arch dependent */
 	unsigned int apic_perf_irqs;
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index e7ae6eb..38fac9b 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -29,6 +29,7 @@
 extern asmlinkage void apic_timer_interrupt(void);
 extern asmlinkage void x86_platform_ipi(void);
 extern asmlinkage void kvm_posted_intr_ipi(void);
+extern asmlinkage void kvm_posted_intr_wakeup_ipi(void);
 extern asmlinkage void error_interrupt(void);
 extern asmlinkage void irq_work_interrupt(void);
 
@@ -92,6 +93,7 @@ extern void trace_call_function_single_interrupt(void);
 #define trace_irq_move_cleanup_interrupt  irq_move_cleanup_interrupt
 #define trace_reboot_interrupt  reboot_interrupt
 #define trace_kvm_posted_intr_ipi kvm_posted_intr_ipi
+#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi
 #endif /* CONFIG_TRACING */
 
 struct irq_domain;
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index b26cb12..dca94f2 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -105,6 +105,7 @@
 /* Vector for KVM to deliver posted interrupt IPI */
 #ifdef CONFIG_HAVE_KVM
 #define POSTED_INTR_VECTOR		0xf2
+#define POSTED_INTR_WAKEUP_VECTOR	0xf1
 #endif
 
 /*
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e61c14a..a598447 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -960,6 +960,8 @@ apicinterrupt X86_PLATFORM_IPI_VECTOR \
 #ifdef CONFIG_HAVE_KVM
 apicinterrupt3 POSTED_INTR_VECTOR \
 	kvm_posted_intr_ipi smp_kvm_posted_intr_ipi
+apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR \
+	kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi
 #endif
 
 #ifdef CONFIG_X86_MCE_THRESHOLD
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 922d285..47408c3 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -237,6 +237,9 @@ __visible void smp_x86_platform_ipi(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_HAVE_KVM
+void (*wakeup_handler_callback)(void) = NULL;
+EXPORT_SYMBOL_GPL(wakeup_handler_callback);
+
 /*
  * Handler for POSTED_INTERRUPT_VECTOR.
  */
@@ -256,6 +259,30 @@ __visible void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
 
 	set_irq_regs(old_regs);
 }
+
+/*
+ * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
+ */
+__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs)
+{
+	struct pt_regs *old_regs = set_irq_regs(regs);
+
+	ack_APIC_irq();
+
+	irq_enter();
+
+	exit_idle();
+
+	inc_irq_stat(kvm_posted_intr_wakeup_ipis);
+
+	if (wakeup_handler_callback)
+		wakeup_handler_callback();
+
+	irq_exit();
+
+	set_irq_regs(old_regs);
+}
+
 #endif
 
 __visible void smp_trace_x86_platform_ipi(struct pt_regs *regs)
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 70e181e..844673c 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -144,6 +144,8 @@ static void __init apic_intr_init(void)
 #ifdef CONFIG_HAVE_KVM
 	/* IPI for KVM to deliver posted interrupt */
 	alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
+	/* IPI for KVM to deliver interrupt to wake up tasks */
+	alloc_intr_gate(POSTED_INTR_WAKEUP_VECTOR, kvm_posted_intr_wakeup_ipi);
 #endif
 
 	/* IPI vectors for APIC spurious and error interrupts */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [v3 22/26] KVM: Define a wakeup worker thread for vCPU
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (20 preceding siblings ...)
  2014-12-12 15:14 ` [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-12 15:14 ` [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Define a wakeup work item for each vCPU. Its handler, wakeup_thread(),
calls kvm_vcpu_kick() on the vCPU that embeds the work item.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
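Reader's note, not part of the patch: the handler recovers the owning
vCPU from the embedded work item with container_of(). A minimal
userspace sketch of that idiom follows. The struct layouts, run_work()
and the kicked flag are simplified, hypothetical stand-ins; in the
kernel the work item is queued with schedule_work() and the handler
calls kvm_vcpu_kick().

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for the kernel's struct work_struct. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Simplified stand-in for struct kvm_vcpu. */
struct vcpu {
	int id;
	int kicked;                       /* models the effect of a kick */
	struct work_struct wakeup_worker; /* embedded work item */
};

/* Work handler: map the work item back to its vCPU and "kick" it. */
static void wakeup_thread(struct work_struct *work)
{
	struct vcpu *v = container_of(work, struct vcpu, wakeup_worker);

	v->kicked = 1;
}

/* Models the INIT_WORK() call done at vCPU initialization time. */
static void vcpu_init(struct vcpu *v, int id)
{
	v->id = id;
	v->kicked = 0;
	v->wakeup_worker.func = wakeup_thread;
}

/* Models the workqueue running a queued item. */
static void run_work(struct work_struct *work)
{
	work->func(work);
}
```
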
 include/linux/kvm_host.h | 1 +
 virt/kvm/kvm_main.c      | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ca9a393..3d7242c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -249,6 +249,7 @@ struct kvm_vcpu {
 	int sigset_active;
 	sigset_t sigset;
 	struct kvm_vcpu_stat stat;
+	struct work_struct wakeup_worker;
 
 #ifdef CONFIG_HAS_IOMEM
 	int mmio_needed;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 25ffac9..ba53fd6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -211,6 +211,13 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
+static void wakeup_thread(struct work_struct *work)
+{
+	struct kvm_vcpu *vcpu = container_of(work, struct kvm_vcpu,
+				wakeup_worker);
+	kvm_vcpu_kick(vcpu);
+}
+
 int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 {
 	struct page *page;
@@ -224,6 +231,8 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	init_waitqueue_head(&vcpu->wq);
 	kvm_async_pf_vcpu_init(vcpu);
 
+	INIT_WORK(&vcpu->wakeup_worker, wakeup_thread);
+
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 	if (!page) {
 		r = -ENOMEM;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (21 preceding siblings ...)
  2014-12-12 15:14 ` [v3 22/26] KVM: Define a wakeup worker thread for vCPU Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-17 17:11   ` Paolo Bonzini
  2015-02-23 22:21   ` Marcelo Tosatti
  2014-12-12 15:14 ` [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
                   ` (5 subsequent siblings)
  28 siblings, 2 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress future non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
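Reader's note, not part of the patch: the sched-out/sched-in steps
listed above are read-modify-write updates of the descriptor's 64-bit
control word, retried until the compare-and-exchange succeeds. A
minimal userspace sketch with C11 atomics follows. The bit positions
(ON at bit 0, SN at bit 1, NV at bits 16-23, NDST at bits 32-63) are an
assumption taken from the descriptor layout and are not shown in this
patch; pi_sched_out(), pi_sched_in() and pi_ndst() are hypothetical
helper names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the control word; not defined in this patch. */
#define PI_ON          (UINT64_C(1) << 0)   /* outstanding notification */
#define PI_SN          (UINT64_C(1) << 1)   /* suppress notification */
#define PI_NV_SHIFT    16
#define PI_NV_MASK     (UINT64_C(0xff) << PI_NV_SHIFT)
#define PI_NDST_SHIFT  32
#define PI_NDST_MASK   (UINT64_C(0xffffffff) << PI_NDST_SHIFT)

#define POSTED_INTR_VECTOR 0xf2

/* NDST encoding as in the patch: raw ID for x2APIC, bits 15:8 for xAPIC. */
static uint32_t pi_ndst(uint32_t apic_id, bool x2apic)
{
	return x2apic ? apic_id : ((apic_id << 8) & 0xFF00);
}

/* sched out: set SN. The patch does this only if the vCPU was preempted. */
static void pi_sched_out(_Atomic uint64_t *control)
{
	uint64_t old = atomic_load(control), new;

	do {
		new = old | PI_SN;
	} while (!atomic_compare_exchange_weak(control, &old, new));
}

/* sched in: clear SN, retarget NDST if the vCPU moved, restore NV. */
static void pi_sched_in(_Atomic uint64_t *control, bool migrated,
			uint32_t apic_id, bool x2apic)
{
	uint64_t old = atomic_load(control), new;

	do {
		new = old;
		if (migrated) {
			new &= ~PI_NDST_MASK;
			new |= (uint64_t)pi_ndst(apic_id, x2apic) << PI_NDST_SHIFT;
		}
		new &= ~PI_SN;
		new &= ~PI_NV_MASK;
		new |= (uint64_t)POSTED_INTR_VECTOR << PI_NV_SHIFT;
	} while (!atomic_compare_exchange_weak(control, &old, new));
}
```

The retry loop matters because hardware may set ON in the same word
concurrently; a failed exchange reloads the current value and redoes
the update, so a concurrently posted ON bit is not lost.
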
 arch/x86/kvm/vmx.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ee3b735..bf2e6cd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1916,10 +1916,54 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
 		vmx->loaded_vmcs->cpu = cpu;
 	}
+
+	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+		struct pi_desc old, new;
+		unsigned int dest;
+
+		memset(&old, 0, sizeof(old));
+		memset(&new, 0, sizeof(new));
+
+		do {
+			old.control = new.control = pi_desc->control;
+			if (vcpu->cpu != cpu) {
+				dest = cpu_physical_id(cpu);
+
+				if (x2apic_enabled())
+					new.ndst = dest;
+				else
+					new.ndst = (dest << 8) & 0xFF00;
+			}
+
+			pi_clear_sn(&new);
+
+			/* set 'NV' to 'notification vector' */
+			new.nv = POSTED_INTR_VECTOR;
+		} while (cmpxchg(&pi_desc->control, old.control,
+				new.control) != old.control);
+	}
 }
 
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+		struct pi_desc old, new;
+
+		memset(&old, 0, sizeof(old));
+		memset(&new, 0, sizeof(new));
+
+		/* Set SN when the vCPU is preempted */
+		if (vcpu->preempted) {
+			do {
+				old.control = new.control = pi_desc->control;
+				pi_set_sn(&new);
+			} while (cmpxchg(&pi_desc->control, old.control,
+					new.control) != old.control);
+		}
+	}
+
 	__vmx_load_host_state(to_vmx(vcpu));
 	if (!vmm_exclusive) {
 		__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (22 preceding siblings ...)
  2014-12-12 15:14 ` [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-17 17:09   ` Paolo Bonzini
                     ` (2 more replies)
  2014-12-12 15:14 ` [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set Feng Wu
                   ` (4 subsequent siblings)
  28 siblings, 3 replies; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Clear 'SN'
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
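Reader's note, not part of the patch: the pre-block/post-block steps
above keep a per-CPU list of blocked vCPUs, and the wakeup handler
walks the list of the CPU that received the wakeup vector and kicks
every vCPU whose descriptor has ON set. A minimal single-threaded
sketch follows. NR_CPUS, the singly linked list, and the on/kicked
flags are simplified, hypothetical stand-ins; the patch uses
struct list_head and takes a per-CPU spinlock around every list access,
which this sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical CPU count for the sketch */

struct vcpu {
	bool on;		/* models the descriptor's ON bit */
	bool kicked;		/* models the effect of kvm_vcpu_kick() */
	struct vcpu *next;	/* link in the per-CPU blocked list */
};

/* One blocked-vCPU list head per physical CPU. */
static struct vcpu *blocked_vcpu_on_cpu[NR_CPUS];

/* pre-block: put the vCPU on the list of the CPU that will be notified. */
static void pre_block(struct vcpu *v, int cpu)
{
	v->next = blocked_vcpu_on_cpu[cpu];
	blocked_vcpu_on_cpu[cpu] = v;
}

/* post-block: unlink the vCPU from that CPU's list, if present. */
static void post_block(struct vcpu *v, int cpu)
{
	struct vcpu **pp = &blocked_vcpu_on_cpu[cpu];

	while (*pp && *pp != v)
		pp = &(*pp)->next;
	if (*pp)
		*pp = v->next;
	v->next = NULL;
}

/* Wakeup handler: kick every blocked vCPU on this CPU that has ON set. */
static int wakeup_handler(int cpu)
{
	int kicked = 0;

	for (struct vcpu *v = blocked_vcpu_on_cpu[cpu]; v; v = v->next) {
		if (v->on) {
			v->kicked = true;
			kicked++;
		}
	}
	return kicked;
}
```
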
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/kvm/vmx.c              | 96 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              | 22 +++++++---
 include/linux/kvm_host.h        |  4 ++
 virt/kvm/kvm_main.c             |  6 +++
 5 files changed, 123 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 13e3e40..32c110a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -101,6 +101,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
 
 #define ASYNC_PF_PER_VCPU 64
 
+extern void (*wakeup_handler_callback)(void);
+
 enum kvm_reg {
 	VCPU_REGS_RAX = 0,
 	VCPU_REGS_RCX = 1,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bf2e6cd..a1c83a2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -832,6 +832,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
+/*
+ * We maintain a per-CPU linked list of vCPUs, so in wakeup_handler() we
+ * can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
 static unsigned long *vmx_io_bitmap_a;
 static unsigned long *vmx_io_bitmap_b;
 static unsigned long *vmx_msr_bitmap_legacy;
@@ -1921,6 +1928,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
 		struct pi_desc old, new;
 		unsigned int dest;
+		unsigned long flags;
 
 		memset(&old, 0, sizeof(old));
 		memset(&new, 0, sizeof(new));
@@ -1942,6 +1950,20 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			new.nv = POSTED_INTR_VECTOR;
 		} while (cmpxchg(&pi_desc->control, old.control,
 				new.control) != old.control);
+
+		/*
+		 * Delete the vCPU from the related wakeup queue
+		 * if we are resuming from blocked state
+		 */
+		if (vcpu->blocked) {
+			vcpu->blocked = false;
+			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+				vcpu->wakeup_cpu), flags);
+			list_del(&vcpu->blocked_vcpu_list);
+			spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+				vcpu->wakeup_cpu), flags);
+			vcpu->wakeup_cpu = -1;
+		}
 	}
 }
 
@@ -1950,6 +1972,9 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
 		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
 		struct pi_desc old, new;
+		unsigned long flags;
+		int cpu;
+		struct cpumask cpu_others_mask;
 
 		memset(&old, 0, sizeof(old));
 		memset(&new, 0, sizeof(new));
@@ -1961,6 +1986,54 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 				pi_set_sn(&new);
 			} while (cmpxchg(&pi_desc->control, old.control,
 					new.control) != old.control);
+		} else if (vcpu->blocked) {
+			/*
+			 * The vcpu is blocked on the wait queue.
+			 * Store the blocked vCPU on the list of the
+			 * vcpu->wakeup_cpu, which is the destination
+			 * of the wake-up notification event.
+			 */
+			vcpu->wakeup_cpu = vcpu->cpu;
+			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+					  vcpu->wakeup_cpu), flags);
+			list_add_tail(&vcpu->blocked_vcpu_list,
+				      &per_cpu(blocked_vcpu_on_cpu,
+				      vcpu->wakeup_cpu));
+			spin_unlock_irqrestore(
+					&per_cpu(blocked_vcpu_on_cpu_lock,
+					vcpu->wakeup_cpu), flags);
+
+			do {
+				old.control = new.control = pi_desc->control;
+
+				/*
+				 * We should not block the vCPU if
+				 * an interrupt is posted for it.
+				 */
+				if (pi_test_on(pi_desc) == 1) {
+					/*
+					 * We need to schedule the wakeup worker
+					 * on a CPU other than vcpu->cpu,
+					 * because in some cases schedule_work()
+					 * calls try_to_wake_up(), which needs
+					 * to acquire the rq lock. This can
+					 * cause a deadlock.
+					 */
+					cpumask_copy(&cpu_others_mask,
+						     cpu_online_mask);
+					cpu_clear(vcpu->cpu, cpu_others_mask);
+					cpu = any_online_cpu(cpu_others_mask);
+
+					schedule_work_on(cpu,
+							 &vcpu->wakeup_worker);
+				}
+
+				pi_clear_sn(&new);
+
+				/* set 'NV' to 'wakeup vector' */
+				new.nv = POSTED_INTR_WAKEUP_VECTOR;
+			} while (cmpxchg(&pi_desc->control, old.control,
+				new.control) != old.control);
 		}
 	}
 
@@ -2842,6 +2915,8 @@ static int hardware_enable(void)
 		return -EBUSY;
 
 	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
+	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
 
 	/*
 	 * Now we can enable the vmclear operation in kdump
@@ -9315,6 +9390,25 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.pi_set_sn = vmx_pi_set_sn,
 };
 
+/*
+ * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
+ */
+void wakeup_handler(void)
+{
+	struct kvm_vcpu *vcpu;
+	int cpu = smp_processor_id();
+
+	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
+			blocked_vcpu_list) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+		if (pi_test_on(pi_desc) == 1)
+			kvm_vcpu_kick(vcpu);
+	}
+	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+}
+
 static int __init vmx_init(void)
 {
 	int r, i, msr;
@@ -9429,6 +9523,8 @@ static int __init vmx_init(void)
 
 	update_ple_window_actual_max();
 
+	wakeup_handler_callback = wakeup_handler;
+
 	return 0;
 
 out7:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0033df3..1551a46 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6152,6 +6152,21 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_vcpu_reload_apic_access_page(vcpu);
 	}
 
+	/*
+	 * Since posted-interrupts can be set by VT-d HW now, in this
+	 * case, KVM_REQ_EVENT is not set. We move the following
+	 * operations out of the if statement.
+	 */
+	if (kvm_lapic_enabled(vcpu)) {
+		/*
+		 * Update architecture specific hints for APIC
+		 * virtual interrupt delivery.
+		 */
+		if (kvm_x86_ops->hwapic_irr_update)
+			kvm_x86_ops->hwapic_irr_update(vcpu,
+				kvm_lapic_find_highest_irr(vcpu));
+	}
+
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
 		kvm_apic_accept_events(vcpu);
 		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
@@ -6168,13 +6183,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_x86_ops->enable_irq_window(vcpu);
 
 		if (kvm_lapic_enabled(vcpu)) {
-			/*
-			 * Update architecture specific hints for APIC
-			 * virtual interrupt delivery.
-			 */
-			if (kvm_x86_ops->hwapic_irr_update)
-				kvm_x86_ops->hwapic_irr_update(vcpu,
-					kvm_lapic_find_highest_irr(vcpu));
 			update_cr8_intercept(vcpu);
 			kvm_lapic_sync_to_vapic(vcpu);
 		}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3d7242c..d981d16 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -239,6 +239,9 @@ struct kvm_vcpu {
 	unsigned long requests;
 	unsigned long guest_debug;
 
+	int wakeup_cpu;
+	struct list_head blocked_vcpu_list;
+
 	struct mutex mutex;
 	struct kvm_run *run;
 
@@ -282,6 +285,7 @@ struct kvm_vcpu {
 	} spin_loop;
 #endif
 	bool preempted;
+	bool blocked;
 	struct kvm_vcpu_arch arch;
 };
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ba53fd6..6deb994 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -233,6 +233,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 
 	INIT_WORK(&vcpu->wakeup_worker, wakeup_thread);
 
+	vcpu->wakeup_cpu = -1;
+	INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
+
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 	if (!page) {
 		r = -ENOMEM;
@@ -243,6 +246,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	kvm_vcpu_set_in_spin_loop(vcpu, false);
 	kvm_vcpu_set_dy_eligible(vcpu, false);
 	vcpu->preempted = false;
+	vcpu->blocked = false;
 
 	r = kvm_arch_vcpu_init(vcpu);
 	if (r < 0)
@@ -1752,6 +1756,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	DEFINE_WAIT(wait);
 
 	for (;;) {
+		vcpu->blocked = true;
 		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
 		if (kvm_arch_vcpu_runnable(vcpu)) {
@@ -1767,6 +1772,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	}
 
 	finish_wait(&vcpu->wq, &wait);
+	vcpu->blocked = false;
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_block);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (23 preceding siblings ...)
  2014-12-12 15:14 ` [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
@ 2014-12-12 15:14 ` Feng Wu
  2014-12-17 17:42   ` Paolo Bonzini
  2014-12-12 15:15 ` [v3 26/26] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
                   ` (3 subsequent siblings)
  28 siblings, 1 reply; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:14 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Currently, we don't support urgent interrupts; all interrupts
are treated as non-urgent, so we cannot send a posted-interrupt
notification when 'SN' is set.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
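Reader's note, not part of the patch: after this change the
notification IPI is sent only when ON was previously clear, SN is
clear, and the vCPU is in guest mode; in every other case the existing
else branch runs. A minimal sketch of that predicate follows.
should_send_notification_ipi() is a hypothetical helper name, and the
three booleans stand in for the return value of pi_test_and_set_on(),
the result of pi_test_sn(), and the vcpu->mode == IN_GUEST_MODE check.

```c
#include <stdbool.h>

/*
 * Mirrors the condition "!r && !sn && (vcpu->mode == IN_GUEST_MODE)":
 * send the IPI only if no notification is already outstanding,
 * notifications are not suppressed, and the vCPU is running the guest.
 */
static bool should_send_notification_ipi(bool on_was_set, bool sn,
					 bool in_guest_mode)
{
	return !on_was_set && !sn && in_guest_mode;
}
```
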
 arch/x86/kvm/vmx.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a1c83a2..0aee151 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4401,15 +4401,22 @@ static int vmx_vm_has_apicv(struct kvm *kvm)
 static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	int r;
+	int r, sn;
 
 	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
 		return;
 
+	/*
+	 * Currently, we don't support urgent interrupts; all interrupts
+	 * are treated as non-urgent, so we cannot send a posted-interrupt
+	 * notification when 'SN' is set.
+	 */
+	sn = pi_test_sn(&vmx->pi_desc);
+
 	r = pi_test_and_set_on(&vmx->pi_desc);
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 #ifdef CONFIG_SMP
-	if (!r && (vcpu->mode == IN_GUEST_MODE))
+	if (!r && !sn && (vcpu->mode == IN_GUEST_MODE))
 		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
 				POSTED_INTR_VECTOR);
 	else
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [v3 26/26] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (24 preceding siblings ...)
  2014-12-12 15:14 ` [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set Feng Wu
@ 2014-12-12 15:15 ` Feng Wu
  2015-01-28 15:39   ` David Woodhouse
  2014-12-16  9:04 ` [v3 00/26] Add VT-d Posted-Interrupts support Wu, Feng
                   ` (2 subsequent siblings)
  28 siblings, 1 reply; 140+ messages in thread
From: Feng Wu @ 2014-12-12 15:15 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Feng Wu

Enable VT-d Posted-Interrupts by default and add a "nopost"
command line option to disable it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
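Reader's note, not part of the patch: the option string is a
comma-separated list, and with this change "on" and "off" also toggle
posting while "nopost" disables posting alone. A minimal userspace
sketch of the parsing loop follows, assuming the options arrive as the
value of the intremap= parameter. struct irqremap_opts and
parse_intremap() are hypothetical names; the kernel keeps these flags
in global variables and parses them in setup_irqremap().

```c
#include <string.h>

/* Hypothetical container for the flags the kernel keeps as globals. */
struct irqremap_opts {
	int disable_irq_remap;
	int disable_sourceid_checking;
	int no_x2apic_optout;
	int disable_irq_post;
};

/* Walk a comma-separated option string and update the flags. */
static void parse_intremap(const char *str, struct irqremap_opts *o)
{
	while (*str) {
		if (!strncmp(str, "on", 2)) {
			o->disable_irq_remap = 0;
			o->disable_irq_post = 0;
		} else if (!strncmp(str, "off", 3)) {
			o->disable_irq_remap = 1;
			o->disable_irq_post = 1;
		} else if (!strncmp(str, "nosid", 5))
			o->disable_sourceid_checking = 1;
		else if (!strncmp(str, "no_x2apic_optout", 16))
			o->no_x2apic_optout = 1;
		else if (!strncmp(str, "nopost", 6))
			o->disable_irq_post = 1;

		/* Skip to the next option, tolerating repeated commas. */
		str += strcspn(str, ",");
		while (*str == ',')
			str++;
	}
}
```

For example, parsing "nosid,nopost" leaves remapping enabled but turns
off source-ID checking and interrupt posting.
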
 Documentation/kernel-parameters.txt |  1 +
 drivers/iommu/irq_remapping.c       | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 838f377..324b790 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1453,6 +1453,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			nosid	disable Source ID checking
 			no_x2apic_optout
 				BIOS x2APIC opt-out request will be ignored
+			nopost	disable Interrupt Posting
 
 	iomem=		Disable strict checking of access to MMIO memory
 		strict	regions from userspace.
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index b008663..aa3cd23 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -24,7 +24,7 @@ int irq_remap_broken;
 int disable_sourceid_checking;
 int no_x2apic_optout;
 
-int disable_irq_post = 1;
+int disable_irq_post = 0;
 
 static struct irq_remap_ops *remap_ops;
 
@@ -59,14 +59,18 @@ static __init int setup_irqremap(char *str)
 		return -EINVAL;
 
 	while (*str) {
-		if (!strncmp(str, "on", 2))
+		if (!strncmp(str, "on", 2)) {
 			disable_irq_remap = 0;
-		else if (!strncmp(str, "off", 3))
+			disable_irq_post = 0;
+		} else if (!strncmp(str, "off", 3)) {
 			disable_irq_remap = 1;
-		else if (!strncmp(str, "nosid", 5))
+			disable_irq_post = 1;
+		} else if (!strncmp(str, "nosid", 5))
 			disable_sourceid_checking = 1;
 		else if (!strncmp(str, "no_x2apic_optout", 16))
 			no_x2apic_optout = 1;
+		else if (!strncmp(str, "nopost", 6))
+			disable_irq_post = 1;
 
 		str += strcspn(str, ",");
 		while (*str == ',')
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 140+ messages in thread

* RE: [v3 00/26] Add VT-d Posted-Interrupts support
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (25 preceding siblings ...)
  2014-12-12 15:15 ` [v3 26/26] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
@ 2014-12-16  9:04 ` Wu, Feng
  2015-01-06  1:10 ` Wu, Feng
  2015-01-21  2:25 ` Wu, Feng
  28 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2014-12-16  9:04 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng

Hi Paolo,

Could you please have a look at this series? Thanks a lot!

Thanks,
Feng

> -----Original Message-----
> From: Wu, Feng
> Sent: Friday, December 12, 2014 11:15 PM
> To: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com
> Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
> Subject: [v3 00/26] Add VT-d Posted-Interrupts support
> 
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> You can find the VT-d Posted-Interrupts Spec. at the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> v1->v2:
> * Use VFIO framework to enable this feature, the VFIO part of this series is
>   based on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
> * Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
>   then revise some irq logic based on the new hierarchy irqdomain patches provided
>   by Jiang Liu <jiang.liu@linux.intel.com>
> 
> v2->v3:
> * Adjust the Posted-interrupts Descriptor updating logic when vCPU is
>   preempted or blocked.
> * KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
> * __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
> * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
>   can be used to change back to remapping mode.
> * Fix typo
> 
> This patch series is made of the following groups:
> 1-6: Some preparation changes in iommu and irq component, this is based on the
>      new hierarchy irqdomain logic.
> 7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as, feature detection,
>           command line parameter.
> 10-17, 22-25: Changes related to KVM itself.
> 18-20: Changes in VFIO component, this part was previously sent out as
>        "[RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d Posted-Interrupts"
> 21: x86 irq related changes
> 
> Feng Wu (26):
>   genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a
>     VCPU
>   iommu: Add new member capability to struct irq_remap_ops
>   iommu, x86: Define new irte structure for VT-d Posted-Interrupts
>   iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
>   x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
>   iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
>   iommu, x86: Add cap_pi_support() to detect VT-d PI capability
>   iommu, x86: Add intel_irq_remapping_capability() for Intel
>   iommu, x86: define irq_remapping_cap()
>   KVM: change struct pi_desc for VT-d Posted-Interrupts
>   KVM: Add some helper functions for Posted-Interrupts
>   KVM: Initialize VT-d Posted-Interrupts Descriptor
>   KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
>   KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
>   KVM: add interfaces to control PI outside vmx
>   KVM: Make struct kvm_irq_routing_table accessible
>   KVM: make kvm_set_msi_irq() public
>   KVM: kvm-vfio: User API for VT-d Posted-Interrupts
>   KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
>   KVM: x86: kvm-vfio: VT-d posted-interrupts setup
>   x86, irq: Define a global vector for VT-d Posted-Interrupts
>   KVM: Define a wakeup worker thread for vCPU
>   KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
>   KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
>   KVM: Suppress posted-interrupt when 'SN' is set
>   iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
> 
>  Documentation/kernel-parameters.txt        |   1 +
>  Documentation/virtual/kvm/devices/vfio.txt |   9 ++
>  arch/x86/include/asm/entry_arch.h          |   2 +
>  arch/x86/include/asm/hardirq.h             |   1 +
>  arch/x86/include/asm/hw_irq.h              |   2 +
>  arch/x86/include/asm/irq_remapping.h       |  11 ++
>  arch/x86/include/asm/irq_vectors.h         |   1 +
>  arch/x86/include/asm/kvm_host.h            |  12 ++
>  arch/x86/kernel/apic/msi.c                 |   1 +
>  arch/x86/kernel/entry_64.S                 |   2 +
>  arch/x86/kernel/irq.c                      |  27 ++++
>  arch/x86/kernel/irqinit.c                  |   2 +
>  arch/x86/kvm/Makefile                      |   2 +-
>  arch/x86/kvm/kvm_vfio_x86.c                |  77 +++++++++
>  arch/x86/kvm/vmx.c                         | 244 ++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c                         |  22 ++-
>  drivers/iommu/intel_irq_remapping.c        |  68 +++++++-
>  drivers/iommu/irq_remapping.c              |  24 ++-
>  drivers/iommu/irq_remapping.h              |   8 +
>  include/linux/dmar.h                       |  32 ++++
>  include/linux/intel-iommu.h                |   1 +
>  include/linux/irq.h                        |   7 +
>  include/linux/kvm_host.h                   |  46 ++++++
>  include/uapi/linux/kvm.h                   |  11 ++
>  kernel/irq/chip.c                          |  14 ++
>  kernel/irq/manage.c                        |  20 +++
>  virt/kvm/irq_comm.c                        |  43 ++++-
>  virt/kvm/irqchip.c                         |  11 --
>  virt/kvm/kvm_main.c                        |  15 ++
>  virt/kvm/vfio.c                            | 107 +++++++++++++
>  30 files changed, 795 insertions(+), 28 deletions(-)
>  create mode 100644 arch/x86/kvm/kvm_vfio_x86.c
> 
> --
> 1.9.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible
  2014-12-12 15:14 ` [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
@ 2014-12-17 16:17   ` Paolo Bonzini
  2014-12-19  2:19     ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-17 16:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: iommu, kvm, linux-kernel, kvm



On 12/12/2014 16:14, Feng Wu wrote:
> Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
> so we can use it outside of irqchip.c.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  include/linux/kvm_host.h | 19 +++++++++++++++++++
>  virt/kvm/irqchip.c       | 11 -----------
>  2 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 0b9659d..cfa85ac 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -335,6 +335,25 @@ struct kvm_kernel_irq_routing_entry {
>  	struct hlist_node link;
>  };
>  
> +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> +
> +struct kvm_irq_routing_table {
> +	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> +	struct kvm_kernel_irq_routing_entry *rt_entries;
> +	u32 nr_rt_entries;
> +	/*
> +	 * Array indexed by gsi. Each entry contains list of irq chips
> +	 * the gsi is connected to.
> +	 */
> +	struct hlist_head map[0];
> +};
> +
> +#else
> +
> +struct kvm_irq_routing_table {};

If possible, just make this "struct kvm_irq_routing_table;" and pull
this line to include/linux/kvm_types.h.

Paolo

> +
> +#endif
> +
>  #ifndef KVM_PRIVATE_MEM_SLOTS
>  #define KVM_PRIVATE_MEM_SLOTS 0
>  #endif
> diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> index 7f256f3..cdf29a6 100644
> --- a/virt/kvm/irqchip.c
> +++ b/virt/kvm/irqchip.c
> @@ -31,17 +31,6 @@
>  #include <trace/events/kvm.h>
>  #include "irq.h"
>  
> -struct kvm_irq_routing_table {
> -	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> -	struct kvm_kernel_irq_routing_entry *rt_entries;
> -	u32 nr_rt_entries;
> -	/*
> -	 * Array indexed by gsi. Each entry contains list of irq chips
> -	 * the gsi is connected to.
> -	 */
> -	struct hlist_head map[0];
> -};
> -
>  int kvm_irq_map_gsi(struct kvm *kvm,
>  		    struct kvm_kernel_irq_routing_entry *entries, int gsi)
>  {
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2014-12-12 15:14 ` [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
@ 2014-12-17 17:09   ` Paolo Bonzini
  2014-12-18  3:16     ` Wu, Feng
  2015-02-25 21:50   ` Marcelo Tosatti
  2015-02-26 23:40   ` Marcelo Tosatti
  2 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-17 17:09 UTC (permalink / raw)
  To: Wu, Feng, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Gleb Natapov, Paolo Bonzini, dwmw2, joro, Alex Williamson
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 12/12/2014 16:14, Feng Wu wrote:
> This patch updates the Posted-Interrupts Descriptor when vCPU
> is blocked.
> 
> pre-block:
> - Add the vCPU to the blocked per-CPU list
> - Clear 'SN'

Should SN be already clear (and NV set to POSTED_INTR_VECTOR)?  Can it
happen that you go from sched-out to blocked without doing a sched-in first?

In fact, if this is possible, what happens if vcpu->preempted &&
vcpu->blocked?

> - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> 
> post-block:
> - Remove the vCPU from the per-CPU list

Paolo

> Signed-off-by: Feng Wu <feng.wu@intel.com>

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  2014-12-12 15:14 ` [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
@ 2014-12-17 17:11   ` Paolo Bonzini
  2014-12-18  3:15     ` Wu, Feng
  2015-02-23 22:21   ` Marcelo Tosatti
  1 sibling, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-17 17:11 UTC (permalink / raw)
  To: Wu, Feng, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Gleb Natapov, Paolo Bonzini, dwmw2, joro, Alex Williamson,
	jiang.liu-VuQAYsv1563Yd54FQh9/CA
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 12/12/2014 16:14, Feng Wu wrote:
> +	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +		struct pi_desc old, new;
> +		unsigned int dest;
> +
> +		memset(&old, 0, sizeof(old));
> +		memset(&new, 0, sizeof(new));

This is quite expensive.  Just use an u64 for old_control and
new_control, instead of a full struct.

> 
> +			pi_clear_sn(&new);

This can be simply new.sn = 0.  It does not need atomic operations.

Same in patch 24 (if needed at all there---see the reply there).

> 
> +	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +		struct pi_desc old, new;
> +
> +		memset(&old, 0, sizeof(old));
> +		memset(&new, 0, sizeof(new));
> +

Here you do not need old/new at all because...

> +		if (vcpu->preempted) {
> +			do {
> +				old.control = new.control = pi_desc->control;
> +				pi_set_sn(&new);
> +			} while (cmpxchg(&pi_desc->control, old.control,
> +					new.control) != old.control);

this can do pi_set_sn directly on pi_desc, without the cmpxchg.
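
A minimal userspace sketch of the two suggestions above, using C11 atomics in
place of the kernel's cmpxchg() and bit helpers. struct pi_control, the bit
positions, and both function names are assumptions made for illustration;
they are not taken from the patch. A multi-field update works on a plain
64-bit copy and publishes it with compare-and-swap, while a single-bit update
is one atomic OR with no loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: SN is bit 1, NDST occupies bits 32..63 of the control word. */
#define PI_SN_MASK	(UINT64_C(1) << 1)
#define PI_NDST_SHIFT	32

/* Hypothetical stand-in for the control word inside struct pi_desc. */
struct pi_control {
	_Atomic uint64_t control;
};

/*
 * Multi-field update (sched-in style): snapshot the control word into a
 * plain u64, edit the copy with ordinary operations, then publish it with
 * compare-and-swap. No memset and no full-descriptor copy is involved.
 */
static void pi_update_on_sched_in(struct pi_control *pi, uint32_t ndst)
{
	uint64_t old_control, new_control;

	assert(pi);
	old_control = atomic_load(&pi->control);
	do {
		/* Clear SN and drop the old NDST, keep the other low bits. */
		new_control = old_control & ~PI_SN_MASK & UINT64_C(0xFFFFFFFF);
		new_control |= (uint64_t)ndst << PI_NDST_SHIFT;
		/* On failure old_control is refreshed and the loop retries. */
	} while (!atomic_compare_exchange_weak(&pi->control, &old_control,
					       new_control));
}

/* Single-bit update (sched-out style): one atomic OR, no cmpxchg loop. */
static void pi_set_sn_direct(struct pi_control *pi)
{
	assert(pi);
	atomic_fetch_or(&pi->control, PI_SN_MASK);
}
```

In the kernel the second case would map to an atomic set_bit()-style helper
applied directly to pi_desc, which is what the comment above asks for.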

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 17/26] KVM: make kvm_set_msi_irq() public
  2014-12-12 15:14 ` [v3 17/26] KVM: make kvm_set_msi_irq() public Feng Wu
@ 2014-12-17 17:32   ` Paolo Bonzini
  0 siblings, 0 replies; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-17 17:32 UTC (permalink / raw)
  To: Wu, Feng, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Gleb Natapov, Paolo Bonzini, dwmw2, joro, Alex Williamson
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 12/12/2014 16:14, Feng Wu wrote:
> Make kvm_set_msi_irq() public, we can use this function outside.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  include/linux/kvm_host.h | 2 ++
>  virt/kvm/irq_comm.c      | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index cfa85ac..5cd4420 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -785,6 +785,8 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
>  				   struct kvm_irq_ack_notifier *kian);
>  int kvm_request_irq_source_id(struct kvm *kvm);
>  void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
> +void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> +		     struct kvm_lapic_irq *irq);

This function is now in arch/x86, so please add to
arch/x86/include/asm/kvm_host.h instead.

>  #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
>  int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot);
> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> index f3c5d69..231671a 100644
> --- a/virt/kvm/irq_comm.c
> +++ b/virt/kvm/irq_comm.c
> @@ -106,7 +106,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
>  	return r;
>  }
>  
> -static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> +void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
>  				   struct kvm_lapic_irq *irq)
>  {
>  	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
> 

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-12 15:14 ` [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set Feng Wu
@ 2014-12-17 17:42   ` Paolo Bonzini
  2014-12-18  3:14     ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-17 17:42 UTC (permalink / raw)
  To: Wu, Feng, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Gleb Natapov, Paolo Bonzini, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 12/12/2014 16:14, Feng Wu wrote:
> Currently, we don't support urgent interrupt, all interrupts
> are recognized as non-urgent interrupt, so we cannot send
> posted-interrupt when 'SN' is set.

Can this happen?  If the vcpu is in guest mode, it cannot have been
scheduled out, and that's the only case when SN is set.

Paolo

> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  arch/x86/kvm/vmx.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a1c83a2..0aee151 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4401,15 +4401,22 @@ static int vmx_vm_has_apicv(struct kvm *kvm)
>  static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -	int r;
> +	int r, sn;
>  
>  	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
>  		return;
>  
> +	/*
> +	 * Currently, we don't support urgent interrupt, all interrupts
> +	 * are recognized as non-urgent interrupt, so we cannot send
> +	 * posted-interrupt when 'SN' is set.
> +	 */
> +	sn = pi_test_sn(&vmx->pi_desc);
> +
>  	r = pi_test_and_set_on(&vmx->pi_desc);
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  #ifdef CONFIG_SMP
> -	if (!r && (vcpu->mode == IN_GUEST_MODE))
> +	if (!r && !sn && (vcpu->mode == IN_GUEST_MODE))
>  		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
>  				POSTED_INTR_VECTOR);
>  	else
> -- 

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-17 17:42   ` Paolo Bonzini
@ 2014-12-18  3:14     ` Wu, Feng
  2014-12-18  8:38       ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-18  3:14 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Gleb Natapov, dwmw2, joro, Alex Williamson, Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo Bonzini
> Sent: Thursday, December 18, 2014 1:43 AM
> To: Wu, Feng; Thomas Gleixner; Ingo Molnar; H. Peter Anvin; x86@kernel.org;
> Gleb Natapov; Paolo Bonzini; dwmw2@infradead.org; joro@8bytes.org; Alex
> Williamson; Jiang Liu
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> 
> 
> 
> On 12/12/2014 16:14, Feng Wu wrote:
> > Currently, we don't support urgent interrupt, all interrupts
> > are recognized as non-urgent interrupt, so we cannot send
> > posted-interrupt when 'SN' is set.
> 
> Can this happen?  If the vcpu is in guest mode, it cannot have been
> scheduled out, and that's the only case when SN is set.
> 
> Paolo

Currently, SN is set only when the vCPU is preempted and waiting in the
runqueue to be scheduled again. But I am not sure whether we will need to
set SN for other purposes in the future. The SN check is added here just to
follow the spec: non-urgent interrupts are suppressed when SN is set.

Thanks,
Feng

> 
> > Signed-off-by: Feng Wu
> <feng.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> > ---
> >  arch/x86/kvm/vmx.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index a1c83a2..0aee151 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -4401,15 +4401,22 @@ static int vmx_vm_has_apicv(struct kvm *kvm)
> >  static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > -	int r;
> > +	int r, sn;
> >
> >  	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
> >  		return;
> >
> > +	/*
> > +	 * Currently, we don't support urgent interrupt, all interrupts
> > +	 * are recognized as non-urgent interrupt, so we cannot send
> > +	 * posted-interrupt when 'SN' is set.
> > +	 */
> > +	sn = pi_test_sn(&vmx->pi_desc);
> > +
> >  	r = pi_test_and_set_on(&vmx->pi_desc);
> >  	kvm_make_request(KVM_REQ_EVENT, vcpu);
> >  #ifdef CONFIG_SMP
> > -	if (!r && (vcpu->mode == IN_GUEST_MODE))
> > +	if (!r && !sn && (vcpu->mode == IN_GUEST_MODE))
> >  		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
> >  				POSTED_INTR_VECTOR);
> >  	else
> > --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  2014-12-17 17:11   ` Paolo Bonzini
@ 2014-12-18  3:15     ` Wu, Feng
  2014-12-18  8:32       ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-18  3:15 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Gleb Natapov, dwmw2, joro, Alex Williamson,
	jiang.liu-VuQAYsv1563Yd54FQh9/CA
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Thursday, December 18, 2014 1:11 AM
> To: Wu, Feng; Thomas Gleixner; Ingo Molnar; H. Peter Anvin; x86@kernel.org;
> Gleb Natapov; Paolo Bonzini; dwmw2@infradead.org; joro@8bytes.org; Alex
> Williamson; jiang.liu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: Re: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is preempted
> 
> 
> 
> On 12/12/2014 16:14, Feng Wu wrote:
> > +	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +		struct pi_desc old, new;
> > +		unsigned int dest;
> > +
> > +		memset(&old, 0, sizeof(old));
> > +		memset(&new, 0, sizeof(new));
> 
> This is quite expensive.  Just use an u64 for old_control and
> new_control, instead of a full struct.
> 
> >
> > +			pi_clear_sn(&new);
> 
> This can be simply new.sn = 0.  It does not need atomic operations.

Thanks for your comments, Paolo!

If we use u64 new_control, we cannot use new.sn any more.
Maybe we can change the struct pi_desc {} like this:

typedef struct pid_control {
        u64     on      : 1,
                sn      : 1,
                rsvd_1  : 13,
                ndm     : 1,
                nv      : 8,
                rsvd_2  : 8,
                ndst    : 32;
} pid_control_t;

struct pi_desc {
        u32 pir[8];     /* Posted interrupt requested */
        pid_control_t control;
        u32 rsvd[6];
} __aligned(64);


Then we can define pid_control_t new_control, old_control. And use new_control.sn = 0.

What is your opinion?

Thanks,
Feng

> 
> Same in patch 24 (if needed at all there---see the reply there).
> 
> >
> > +	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +		struct pi_desc old, new;
> > +
> > +		memset(&old, 0, sizeof(old));
> > +		memset(&new, 0, sizeof(new));
> > +
> 
> Here you do not need old/new at all because...
> 
> > +		if (vcpu->preempted) {
> > +			do {
> > +				old.control = new.control = pi_desc->control;
> > +				pi_set_sn(&new);
> > +			} while (cmpxchg(&pi_desc->control, old.control,
> > +					new.control) != old.control);
> 
> this can do pi_set_sn directly on pi_desc, without the cmpxchg.
> 
> Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2014-12-17 17:09   ` Paolo Bonzini
@ 2014-12-18  3:16     ` Wu, Feng
  2014-12-18  8:37       ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-18  3:16 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Gleb Natapov, dwmw2, joro, Alex Williamson
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Paolo Bonzini
> Sent: Thursday, December 18, 2014 1:10 AM
> To: Wu, Feng; Thomas Gleixner; Ingo Molnar; H. Peter Anvin; x86@kernel.org;
> Gleb Natapov; Paolo Bonzini; dwmw2@infradead.org; joro@8bytes.org; Alex
> Williamson
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> 
> 
> On 12/12/2014 16:14, Feng Wu wrote:
> > This patch updates the Posted-Interrupts Descriptor when vCPU
> > is blocked.
> >
> > pre-block:
> > - Add the vCPU to the blocked per-CPU list
> > - Clear 'SN'
> 
> Should SN be already clear (and NV set to POSTED_INTR_VECTOR)? 

I think the SN bit should already be clear here. Clearing it again just makes
sure SN is clear when the vCPU is blocked, so that it can receive the wakeup
notification event later.

> Can it
> happen that you go from sched-out to blocked without doing a sched-in first?
> 

I cannot imagine this scenario, can you please be more specific? Thanks a lot!

> In fact, if this is possible, what happens if vcpu->preempted &&
> vcpu->blocked?

In fact, vcpu->preempted && vcpu->blocked does happen sometimes, but I think
there are no issues. Please refer to the following case:

kvm_vcpu_block() 
	-> vcpu->blocked = true;
	-> prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);

	Before schedule() is called, this vCPU is woken up by someone else, so
	the state of the thread associated with the vCPU changes to TASK_RUNNING.
	Preemption then happens, either after an interrupt or when the following
	schedule() is hit. This calls kvm_sched_out(), in which
	current->state == TASK_RUNNING, so vcpu->preempted is set to true. Now
	vcpu->preempted and vcpu->blocked are both true. In vmx_vcpu_put(), we
	check vcpu->preempted first, so the vCPU will not be handled as blocked,
	and vcpu->blocked will be set to false in vmx_vcpu_load().

	But maybe I need to make a small change to vmx_vcpu_load(), like below:

                /*
                 * Delete the vCPU from the related wakeup queue
                 * if we are resuming from blocked state
                 */
                if (vcpu->blocked) {
                        vcpu->blocked = false;
+                       /*
+                        * If wakeup_cpu == -1, the vCPU is currently not
+                        * blocked on any pCPU, so no dequeue is needed here.
+                        */
+                       if (vcpu->wakeup_cpu != -1) {
                                spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
                                        vcpu->wakeup_cpu), flags);
                                list_del(&vcpu->blocked_vcpu_list);
                                spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
                                        vcpu->wakeup_cpu), flags);
                                vcpu->wakeup_cpu = -1;
+                       }
                }

Any ideas about this? Thanks a lot!

Thanks,
Feng


	-> schedule();


> 
> > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> >
> > post-block:
> > - Remove the vCPU from the per-CPU list
> 
> Paolo
> 
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  2014-12-18  3:15     ` Wu, Feng
@ 2014-12-18  8:32       ` Paolo Bonzini
  2014-12-19  2:09         ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-18  8:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: iommu, kvm, linux-kernel, kvm



On 18/12/2014 04:15, Wu, Feng wrote:
> Thanks for your comments, Paolo!
> 
> If we use u64 new_control, we cannot use new.sn any more.
> Maybe we can change the struct pi_desc {} like this:
> 
> typedef struct pid_control{
>         u64     on      : 1,
>         sn      : 1,
>         rsvd_1  : 13,
>         ndm     : 1,
>         nv      : 8,
>         rsvd_2  : 8,
>         ndst    : 32;
> }pid_control_t;
> 
> struct pi_desc {
>         u32 pir[8];     /* Posted interrupt requested */
> 		pid_control_t control;

Probably something like this to keep the union:

typedef union pid_control {
	u64 full;
	struct {
		u64 on : 1,
		...
	} fields;
} pid_control_t;

>         u32 rsvd[6];
> } __aligned(64);
> 
> 
> Then we can define pid_control_t new_control, old_control. And use new_control.sn = 0.
> 
> What is your opinion?

Sure.  Alternatively, keep using struct pi_desc new; just
do not zero it, nor access any field outside the control word.
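
A minimal sketch of the union variant, filled in with the field list from the
quoted proposal. It assumes GCC or Clang on a little-endian target, where
64-bit bit-fields are accepted and allocated starting from the least
significant bit; uint64_t stands in for the kernel's u64, and
pid_control_prepare() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef union pid_control {
	uint64_t full;		/* whole control word, for cmpxchg-style updates */
	struct {
		uint64_t on     : 1,
			 sn     : 1,
			 rsvd_1 : 13,
			 ndm    : 1,
			 nv     : 8,
			 rsvd_2 : 8,
			 ndst   : 32;
	} fields;
} pid_control_t;

/* The union must stay exactly one 64-bit word wide. */
static_assert(sizeof(pid_control_t) == sizeof(uint64_t),
	      "pid_control_t must be 64 bits");

/*
 * Build the new control word from a snapshot of the old one. Only plain
 * field assignments are used on the local copy; the caller would publish
 * new_control.full with a single compare-and-swap against old_control.full.
 */
static pid_control_t pid_control_prepare(pid_control_t old_control,
					 uint8_t nv, uint32_t ndst)
{
	pid_control_t new_control = old_control;

	new_control.fields.sn = 0;
	new_control.fields.nv = nv;
	new_control.fields.ndst = ndst;
	return new_control;
}
```

Keeping the 'full' member means the existing cmpxchg on the 64-bit control
word can stay as it is, while field updates on the local copy need no atomic
operations.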

Paolo


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2014-12-18  3:16     ` Wu, Feng
@ 2014-12-18  8:37       ` Paolo Bonzini
  2014-12-19  2:51         ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-18  8:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: iommu, kvm, linux-kernel, kvm



On 18/12/2014 04:16, Wu, Feng wrote:
>>> pre-block:
>>> - Add the vCPU to the blocked per-CPU list
>>> - Clear 'SN'
>>
>> Should SN be already clear (and NV set to POSTED_INTR_VECTOR)? 
> 
> I think the SN bit should already be clear here. Clearing it again just makes
> sure SN is clear when the vCPU is blocked, so that it can receive the wakeup
> notification event later.

Then, please, WARN if the SN bit is set inside the if (vcpu->blocked).
Inside that if you can just add the vCPU to the blocked list on vcpu_put.

>> Can it
>> happen that you go from sched-out to blocked without doing a sched-in first?
>>
> 
> I cannot imagine this scenario, can you please be more specific? Thanks a lot!

I cannot either. :)  But it would be the case where SN is not cleared.
So we agree that it cannot happen.

>> In fact, if this is possible, what happens if vcpu->preempted &&
>> vcpu->blocked?
> 
> In fact, vcpu->preempted && vcpu->blocked happens sometimes, but I think there is
> no issues. Please refer to the following case:

I agree that there should be no issues.  But if it can happen, it's better:

1) to separate the handling of preemption and blocking: preemption
handles SN/NV/NDST, blocking handles the wakeup list.

2) to change this

+		} else if (vcpu->blocked) {
+			/*
+			 * The vcpu is blocked on the wait queue.
+			 * Store the blocked vCPU on the list of the
+			 * vcpu->wakeup_cpu, which is the destination
+			 * of the wake-up notification event.

to just

		}
		if (vcpu->blocked) {
			...
		}
> kvm_vcpu_block() 
> 	-> vcpu->blocked = true;
> 	-> prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> 
> 	Before schedule() is called, this vCPU is woken up by someone else, so
> 	the state of the thread associated with the vCPU changes to TASK_RUNNING.
> 	Preemption then happens, either after an interrupt or when the following
> 	schedule() is hit. This calls kvm_sched_out(), in which
> 	current->state == TASK_RUNNING, so vcpu->preempted is set to true. Now
> 	vcpu->preempted and vcpu->blocked are both true. In vmx_vcpu_put(), we
> 	check vcpu->preempted first, so the vCPU will not be handled as blocked,
> 	and vcpu->blocked will be set to false in vmx_vcpu_load().
> 
> 	But maybe I need to make a small change to vmx_vcpu_load(), like below:
> 
>                 /*
>                  * Delete the vCPU from the related wakeup queue
>                  * if we are resuming from blocked state
>                  */
>                 if (vcpu->blocked) {
>                         vcpu->blocked = false;
> +                       /*
> +                        * If wakeup_cpu == -1, the vCPU is currently not
> +                        * blocked on any pCPU, so no dequeue is needed here.
> +                        */
> +                       if (vcpu->wakeup_cpu != -1) {
>                                 spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
>                                         vcpu->wakeup_cpu), flags);
>                                 list_del(&vcpu->blocked_vcpu_list);
>                                 spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
>                                         vcpu->wakeup_cpu), flags);
>                                 vcpu->wakeup_cpu = -1;
> +                       }
>                 }

Good idea.
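
The guarded dequeue quoted above can be modelled in userspace roughly as
follows. This is only a sketch: struct vcpu_sketch, the list helpers, and
vcpu_load_sketch() are hypothetical stand-ins for the kernel structures, and
the per-CPU spinlock is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_node(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_del_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

struct vcpu_sketch {
	bool blocked;
	int wakeup_cpu;		/* -1: not queued on any pCPU's wakeup list */
	struct list_node blocked_vcpu_list;
};

/* Resume path: leave the wakeup list only if the vCPU was ever put on one. */
static void vcpu_load_sketch(struct vcpu_sketch *vcpu)
{
	assert(vcpu);
	if (!vcpu->blocked)
		return;

	vcpu->blocked = false;
	if (vcpu->wakeup_cpu != -1) {
		/* The kernel version would hold the per-CPU spinlock here. */
		list_del_node(&vcpu->blocked_vcpu_list);
		vcpu->wakeup_cpu = -1;
	}
}
```

Without the wakeup_cpu check, a vCPU that set 'blocked' but was preempted
before being queued would run list_del() on a node that was never added.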

Paolo

> Any ideas about this? Thanks a lot!
> 
> Thanks,
> Feng
> 
> 
> 	-> schedule();
> 
> 
>>
>>> - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
>>>
>>> post-block:
>>> - Remove the vCPU from the per-CPU list
>>
>> Paolo
>>
>>> Signed-off-by: Feng Wu <feng.wu@intel.com>


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-18  3:14     ` Wu, Feng
@ 2014-12-18  8:38       ` Paolo Bonzini
  2014-12-18 15:09         ` Zhang, Yang Z
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-18  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: iommu, kvm, linux-kernel, kvm



On 18/12/2014 04:14, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: linux-kernel-owner@vger.kernel.org
>> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo Bonzini
>> Sent: Thursday, December 18, 2014 1:43 AM
>> To: Wu, Feng; Thomas Gleixner; Ingo Molnar; H. Peter Anvin; x86@kernel.org;
>> Gleb Natapov; Paolo Bonzini; dwmw2@infradead.org; joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org; Alex
>> Williamson; Jiang Liu
>> Cc: iommu@lists.linux-foundation.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; KVM list;
>> Eric Auger
>> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
>>
>>
>>
>> On 12/12/2014 16:14, Feng Wu wrote:
>>> Currently, we don't support urgent interrupt, all interrupts
>>> are recognized as non-urgent interrupt, so we cannot send
>>> posted-interrupt when 'SN' is set.
>>
>> Can this happen?  If the vcpu is in guest mode, it cannot have been
>> scheduled out, and that's the only case when SN is set.
>>
>> Paolo
> 
> Currently, SN is set only when the vCPU is preempted and waiting in the
> runqueue to be scheduled again. But I am not sure whether we will need to
> set SN for other purposes in the future. The SN check is added here just to
> follow the spec: non-urgent interrupts are suppressed when SN is set.

I would change that to a WARN_ON_ONCE then.
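
One possible reading of this, as a userspace sketch: the SN test stops gating
the IPI and becomes a one-time warning. warn_on_once() and
should_send_notification() are hypothetical stand-ins for WARN_ON_ONCE() and
for the condition in vmx_deliver_posted_interrupt(); whether the IPI should
still be suppressed when SN is set is an assumption here and is not settled
in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int sn_warnings;		/* number of diagnostics actually printed */

/* Print the diagnostic only the first time cond is true; return cond. */
static bool warn_on_once(bool cond, const char *what)
{
	static bool warned;

	assert(what);
	if (cond && !warned) {
		warned = true;
		sn_warnings++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Decide whether to send the notification IPI. on_was_set is the previous
 * value of ON, sn is the current SN bit. SN being set while the vCPU is in
 * guest mode is treated as "should not happen" and only reported.
 */
static bool should_send_notification(bool on_was_set, bool sn,
				     bool in_guest_mode)
{
	warn_on_once(sn && in_guest_mode,
		     "SN set while vCPU is in guest mode");
	return !on_was_set && in_guest_mode;
}
```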

Paolo

> Thanks,
> Feng
> 
>>
>>> Signed-off-by: Feng Wu
>> <feng.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>>> ---
>>>  arch/x86/kvm/vmx.c | 11 +++++++++--
>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index a1c83a2..0aee151 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -4401,15 +4401,22 @@ static int vmx_vm_has_apicv(struct kvm *kvm)
>>>  static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>>>  {
>>>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>> -	int r;
>>> +	int r, sn;
>>>
>>>  	if (pi_test_and_set_pir(vector, &vmx->pi_desc))
>>>  		return;
>>>
>>> +	/*
>>> +	 * Currently, we don't support urgent interrupt, all interrupts
>>> +	 * are recognized as non-urgent interrupt, so we cannot send
>>> +	 * posted-interrupt when 'SN' is set.
>>> +	 */
>>> +	sn = pi_test_sn(&vmx->pi_desc);
>>> +
>>>  	r = pi_test_and_set_on(&vmx->pi_desc);
>>>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>>>  #ifdef CONFIG_SMP
>>> -	if (!r && (vcpu->mode == IN_GUEST_MODE))
>>> +	if (!r && !sn && (vcpu->mode == IN_GUEST_MODE))
>>>  		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
>>>  				POSTED_INTR_VECTOR);
>>>  	else
>>> --


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-12 15:14 ` [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts Feng Wu
@ 2014-12-18 14:26   ` Zhang, Yang Z
  2014-12-19  1:40     ` Wu, Feng
  2015-01-28 15:29   ` David Woodhouse
  1 sibling, 1 reply; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-18 14:26 UTC (permalink / raw)
  To: Wu, Feng, tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng

Feng Wu wrote on 2014-12-12:
> We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> When 'pst' is set in IRTE, the associated irq will be posted to guests
> instead of interrupt remapping. The destination of the interrupt is
> set in Posted-Interrupts Descriptor, and the migration happens during
> vCPU scheduling.
> 
> However, we still update the cached irte here, which can be used when
> changing back to remapping mode.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
> ---
>  drivers/iommu/intel_irq_remapping.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 48c2051..ab9057a 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -977,6 +977,7 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  {
>  	struct intel_ir_data *ir_data = data->chip_data;
>  	struct irte *irte = &ir_data->irte_entry;
> +	struct irte_pi *irte_pi = (struct irte_pi *)irte;
>  	struct irq_cfg *cfg = irqd_cfg(data);
>  	struct irq_data *parent = data->parent_data;
>  	int ret;
> @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  	 */
>  	irte->vector = cfg->vector;
>  	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
> -	modify_irte(&ir_data->irq_2_iommu, irte);
> +
> +	/* We don't need to modify irte if the interrupt is for posting. */
> +	if (irte_pi->pst != 1)
> +		modify_irte(&ir_data->irq_2_iommu, irte);

What happens if the user changes the IRQ affinity manually?

> 
>  	/*
>  	 * After this point, all the interrupts will start arriving


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-12 15:14 ` [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI Feng Wu
@ 2014-12-18 14:49   ` Zhang, Yang Z
  2014-12-18 16:58     ` Paolo Bonzini
  2015-01-09 14:54   ` Radim Krčmář
  1 sibling, 1 reply; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-18 14:49 UTC (permalink / raw)
  To: Wu, Feng, tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng

Feng Wu wrote on 2014-12-12:
> This patch defines a new interface kvm_find_dest_vcpu for
> VT-d PI, which returns the destination vCPU of the
> interrupt for guests.
> 
> Since VT-d PI cannot handle broadcast/multicast interrupts,
> here we only handle Fixed and Lowest priority interrupts.
> 
> The current method of handling guest lowest priority interrupts
> is to use a counter 'apic_arb_prio' for each vCPU: we choose the
> vCPU with the smallest 'apic_arb_prio' and then increase it by 1.
> However, for VT-d PI, we cannot re-use this, since we no longer
> have control over 'apic_arb_prio' with posted interrupt direct
> delivery by hardware.
> 
> Here, we introduce a way similar to 'apic_arb_prio' to handle guest
> lowest priority interrupts when VT-d PI is used. Here are the ideas:
> - Each vCPU has a counter 'round_robin_counter'.
> - When a guest sets an interrupt to lowest priority, we choose the
>   vCPU with the smallest 'round_robin_counter' as the destination,
>   then increase it.
 
How can this work well? Are all subsequent interrupts delivered to one vCPU?
That shouldn't be the best solution; it needs more consideration. Also, I
think you should take apic_arb_prio into consideration, since the priority
is for the whole vCPU, not for one interrupt.
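
For reference, the selection scheme described in the quoted commit message
can be sketched as below. struct vcpu_rr and pick_lowest_prio_dest() are
hypothetical names, and the sketch only models the counter logic, not the
actual kvm_find_dest_vcpu() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-vCPU state: only the counter from the commit message. */
struct vcpu_rr {
	unsigned int round_robin_counter;
};

/*
 * Pick the candidate with the smallest round_robin_counter (the first one
 * wins on a tie), bump its counter, and return its index.
 * Returns -1 when there are no candidates.
 */
static int pick_lowest_prio_dest(struct vcpu_rr *vcpus, size_t n)
{
	size_t best = 0;

	if (n == 0)
		return -1;
	assert(vcpus != NULL);

	for (size_t i = 1; i < n; i++) {
		if (vcpus[i].round_robin_counter <
		    vcpus[best].round_robin_counter)
			best = i;
	}
	vcpus[best].round_robin_counter++;
	return (int)best;
}
```

The sketch spreads successive selections across vCPUs, but a selection is
made per configured interrupt rather than per delivery, which appears to be
the concern raised above.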

Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts
  2014-12-12 15:14 ` [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts Feng Wu
@ 2014-12-18 14:54   ` Zhang, Yang Z
  2014-12-19  0:52     ` Wu, Feng
  2015-01-30 18:18   ` H. Peter Anvin
  2015-02-23 22:04   ` Marcelo Tosatti
  2 siblings, 1 reply; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-18 14:54 UTC (permalink / raw)
  To: Wu, Feng, tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng

Feng Wu wrote on 2014-12-12:
> Currently, we use a global vector as the Posted-Interrupts
> Notification Event for all the vCPUs in the system. We need to
> introduce another global vector for VT-d Posted-Interrupts, which will
> be used to wake up the sleeping vCPU when an external interrupt from a
> direct-assigned device happens for that vCPU.
> 

Hi Feng, 

Since the idea of the two global vectors mechanism came from me, please add me to the comments.

> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  arch/x86/include/asm/entry_arch.h  |  2 ++
>  arch/x86/include/asm/hardirq.h     |  1 +
>  arch/x86/include/asm/hw_irq.h      |  2 ++
>  arch/x86/include/asm/irq_vectors.h |  1 +
>  arch/x86/kernel/entry_64.S         |  2 ++
>  arch/x86/kernel/irq.c              | 27 +++++++++++++++++++++++++++
>  arch/x86/kernel/irqinit.c          |  2 ++
>  7 files changed, 37 insertions(+)
> diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
> index dc5fa66..27ca0af 100644
> --- a/arch/x86/include/asm/entry_arch.h
> +++ b/arch/x86/include/asm/entry_arch.h
> @@ -23,6 +23,8 @@ BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
>  #ifdef CONFIG_HAVE_KVM
>  BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR,
>  		 smp_kvm_posted_intr_ipi)
> +BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi, POSTED_INTR_WAKEUP_VECTOR,
> +		 smp_kvm_posted_intr_wakeup_ipi)
>  #endif
>  
>  /*
> diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
> index 0f5fb6b..9866065 100644
> --- a/arch/x86/include/asm/hardirq.h
> +++ b/arch/x86/include/asm/hardirq.h
> @@ -14,6 +14,7 @@ typedef struct {
>  #endif
>  #ifdef CONFIG_HAVE_KVM
>  	unsigned int kvm_posted_intr_ipis;
> +	unsigned int kvm_posted_intr_wakeup_ipis;
>  #endif
>  	unsigned int x86_platform_ipis;	/* arch dependent */
>  	unsigned int apic_perf_irqs;
> diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
> index e7ae6eb..38fac9b 100644
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h
> @@ -29,6 +29,7 @@
>  extern asmlinkage void apic_timer_interrupt(void);
>  extern asmlinkage void x86_platform_ipi(void);
>  extern asmlinkage void kvm_posted_intr_ipi(void);
> +extern asmlinkage void kvm_posted_intr_wakeup_ipi(void);
>  extern asmlinkage void error_interrupt(void);
>  extern asmlinkage void irq_work_interrupt(void);
> 
> @@ -92,6 +93,7 @@ extern void trace_call_function_single_interrupt(void);
>  #define trace_irq_move_cleanup_interrupt  irq_move_cleanup_interrupt
>  #define trace_reboot_interrupt  reboot_interrupt
>  #define trace_kvm_posted_intr_ipi kvm_posted_intr_ipi
> +#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi
>  #endif /* CONFIG_TRACING */
>  
>  struct irq_domain;
> diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> index b26cb12..dca94f2 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -105,6 +105,7 @@
>  /* Vector for KVM to deliver posted interrupt IPI */
>  #ifdef CONFIG_HAVE_KVM
>  #define POSTED_INTR_VECTOR		0xf2
> +#define POSTED_INTR_WAKEUP_VECTOR	0xf1
>  #endif
>  
>  /*
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index e61c14a..a598447 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -960,6 +960,8 @@ apicinterrupt X86_PLATFORM_IPI_VECTOR \
>  #ifdef CONFIG_HAVE_KVM
>  apicinterrupt3 POSTED_INTR_VECTOR \
>  	kvm_posted_intr_ipi smp_kvm_posted_intr_ipi
> +apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR \
> +	kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi
>  #endif
>  
>  #ifdef CONFIG_X86_MCE_THRESHOLD
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index 922d285..47408c3 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -237,6 +237,9 @@ __visible void smp_x86_platform_ipi(struct pt_regs *regs)
>  }
> 
>  #ifdef CONFIG_HAVE_KVM
> +void (*wakeup_handler_callback)(void) = NULL;
> +EXPORT_SYMBOL_GPL(wakeup_handler_callback);
> +
>  /*
>   * Handler for POSTED_INTERRUPT_VECTOR.
>   */
> @@ -256,6 +259,30 @@ __visible void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
> 
>  	set_irq_regs(old_regs);
>  }
> +
> +/*
> + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> + */
> +__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs)
> +{
> +	struct pt_regs *old_regs = set_irq_regs(regs);
> +
> +	ack_APIC_irq();
> +
> +	irq_enter();
> +
> +	exit_idle();
> +
> +	inc_irq_stat(kvm_posted_intr_wakeup_ipis);
> +
> +	if (wakeup_handler_callback)
> +		wakeup_handler_callback();
> +
> +	irq_exit();
> +
> +	set_irq_regs(old_regs);
> +}
> +
>  #endif
>  
>  __visible void smp_trace_x86_platform_ipi(struct pt_regs *regs)
> diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
> index 70e181e..844673c 100644
> --- a/arch/x86/kernel/irqinit.c
> +++ b/arch/x86/kernel/irqinit.c
> @@ -144,6 +144,8 @@ static void __init apic_intr_init(void)
>  #ifdef CONFIG_HAVE_KVM
>  	/* IPI for KVM to deliver posted interrupt */
>  	alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
> +	/* IPI for KVM to deliver interrupt to wake up tasks */
> +	alloc_intr_gate(POSTED_INTR_WAKEUP_VECTOR, kvm_posted_intr_wakeup_ipi);
>  #endif
>  
>  	/* IPI vectors for APIC spurious and error interrupts */


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-18  8:38       ` Paolo Bonzini
@ 2014-12-18 15:09         ` Zhang, Yang Z
  2014-12-19  2:58           ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-18 15:09 UTC (permalink / raw)
  To: Paolo Bonzini, kvm; +Cc: iommu, linux-kernel, linux-kernel, kvm

Paolo Bonzini wrote on 2014-12-18:
> 
> 
> On 18/12/2014 04:14, Wu, Feng wrote:
>> 
>> 
>> Paolo Bonzini wrote (via linux-kernel-owner@vger.kernel.org):
>>> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
>>> 
>>> 
>>> 
>>> On 12/12/2014 16:14, Feng Wu wrote:
>>>> Currently, we don't support urgent interrupt, all interrupts are
>>>> recognized as non-urgent interrupt, so we cannot send
>>>> posted-interrupt when 'SN' is set.
>>> 
>>> Can this happen?  If the vcpu is in guest mode, it cannot have been
>>> scheduled out, and that's the only case when SN is set.
>>> 
>>> Paolo
>> 
>> Currently, the only place where SN is set is vCPU is preempted and

If the vCPU is preempted, shouldn't subsequent posted interrupts be ignored? What happens if a PI occurs while the vCPU is preempted?

>> waiting for the next scheduling in the runqueue. But I am not sure
>> whether we need to set SN for other purpose in future. Adding SN
>> checking here is just to follow the Spec. non-urgent interrupts are
>> suppressed
> when SN is set.
> 
> I would change that to a WARN_ON_ONCE then.
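
For illustration only, the rule being discussed (do not notify while 'SN' is set, and warn once if that case is ever hit) can be sketched in plain user-space C. Everything named demo_* is hypothetical, the SN bit position is an assumption made for the example, and demo_warn_on_once() is only a stand-in for the kernel's WARN_ON_ONCE; this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit position of 'SN' in the control word, for illustration only. */
#define DEMO_PI_SN (1u << 1)

/* Number of warnings actually emitted; lets a caller check the "once" part. */
static unsigned int demo_warnings;

/* User-space stand-in for WARN_ON_ONCE: report the first hit, return cond. */
static bool demo_warn_on_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		demo_warnings++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/*
 * Decide whether a notification may be sent for a non-urgent interrupt.
 * If SN is set the notification is suppressed, and because that state is
 * not expected on this path, a one-time warning is emitted.
 */
static bool demo_may_notify(uint32_t control)
{
	if (demo_warn_on_once(control & DEMO_PI_SN, "SN set while posting"))
		return false;
	return true;
}
```

With this shape, calling demo_may_notify() repeatedly with SN set keeps returning false but prints the warning only the first time.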


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor
  2014-12-12 15:14 ` [v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
@ 2014-12-18 15:19   ` Zhang, Yang Z
  0 siblings, 0 replies; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-18 15:19 UTC (permalink / raw)
  To: Wu, Feng, tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng

Feng Wu wrote on 2014-12-12:
> This patch initializes the VT-d Posted-Interrupts Descriptor.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  arch/x86/kvm/vmx.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 0b1383e..66ca275 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -45,6 +45,7 @@
>  #include <asm/perf_event.h>
>  #include <asm/debugreg.h>
>  #include <asm/kexec.h>
> +#include <asm/irq_remapping.h>
> 
>  #include "trace.h"
> @@ -4433,6 +4434,30 @@ static void ept_set_mmio_spte_mask(void)
>  	kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);
>  }
> 
> +static void pi_desc_init(struct vcpu_vmx *vmx)
> +{
> +	unsigned int dest;
> +
> +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> +		return;
> +
> +	/*
> +	 * Initialize Posted-Interrupt Descriptor
> +	 */
> +
> +	pi_clear_sn(&vmx->pi_desc);
> +	vmx->pi_desc.nv = POSTED_INTR_VECTOR;

Here.

> +
> +	/* Physical mode for Notification Event */
> +	vmx->pi_desc.ndm = 0;

And from here..

> +	dest = cpu_physical_id(vmx->vcpu.cpu);
> +
> +	if (x2apic_enabled())
> +		vmx->pi_desc.ndst = dest;
> +	else
> +		vmx->pi_desc.ndst = (dest << 8) & 0xFF00;
> +}
> +

..to here is useless. The right place to update the PI descriptor is where the vCPU gets loaded, not at initialization.
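
As a rough sketch of what would move to the vCPU load path, the destination encoding from the quoted hunk can be isolated in a small helper and recomputed every time the vCPU lands on a physical CPU. The name demo_pi_ndst() is hypothetical and the helper is written as stand-alone user-space C; only the two encodings (the APIC ID as-is for x2APIC, shifted into bits 15:8 otherwise) are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Encode the notification destination for a physical APIC ID.
 * x2APIC mode uses the full 32-bit ID; xAPIC mode carries an 8-bit ID
 * in bits 15:8, matching "(dest << 8) & 0xFF00" in the quoted patch.
 */
static uint32_t demo_pi_ndst(uint32_t apic_id, bool x2apic)
{
	if (x2apic)
		return apic_id;

	/* xAPIC IDs are 8 bits wide; anything larger cannot be encoded. */
	assert(apic_id <= 0xFF);
	return (apic_id << 8) & 0xFF00;
}
```

A load hook would then only need to store demo_pi_ndst(id_of_new_cpu, x2apic_mode) into the descriptor, which also covers the case where the vCPU migrates between physical CPUs.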

>  /*
>   * Sets up the vmcs for emulated real mode.
>   */
> @@ -4476,6 +4501,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
> 
>  		vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
>  		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
> +
> +		pi_desc_init(vmx);
>  	}
>  
>  	if (ple_gap) {


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-18 14:49   ` Zhang, Yang Z
@ 2014-12-18 16:58     ` Paolo Bonzini
  2014-12-19  1:13       ` Zhang, Yang Z
  2014-12-19  1:30       ` Wu, Feng
  0 siblings, 2 replies; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-18 16:58 UTC (permalink / raw)
  To: Zhang, Yang Z, Wu, Feng, tglx, mingo, hpa, x86, gleb, dwmw2,
	joro, alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm



On 18/12/2014 15:49, Zhang, Yang Z wrote:
>>> Here, we introduce a similar way with 'apic_arb_prio' to handle
>>> guest lowest priority interrtups when VT-d PI is used. Here is
>>> the ideas: - Each vCPU has a counter 'round_robin_counter'. -
>>> When guests sets an interrupts to lowest priority, we choose the
>>> vCPU with smallest 'round_robin_counter' as the destination, then
>>> increase it.
> 
> How this can work well? All subsequent interrupts are delivered to
> one vCPU? It shouldn't be the best solution, need more consideration.

Well, it's a hardware limitation.  The alternative (which is easy to
implement) is to only do PI for single-CPU interrupts.  This should work
well for multiqueue NICs (and of course for UP guests :)), so perhaps
it's a good idea to only support that as a first attempt.

Paolo

> Also, I think you should take the apic_arb_prio into consider since
> the priority is for the whole vCPU not for one interrupt.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts
  2014-12-18 14:54   ` Zhang, Yang Z
@ 2014-12-19  0:52     ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  0:52 UTC (permalink / raw)
  To: Zhang, Yang Z, tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2,
	joro, alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng



> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Thursday, December 18, 2014 10:55 PM
> To: Wu, Feng; tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com;
> x86@kernel.org; gleb@kernel.org; pbonzini@redhat.com;
> dwmw2@infradead.org; joro@8bytes.org; alex.williamson@redhat.com;
> jiang.liu@linux.intel.com
> Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
> Subject: RE: [v3 21/26] x86, irq: Define a global vector for VT-d
> Posted-Interrupts
> 
> Feng Wu wrote on 2014-12-12:
> > Currently, we use a global vector as the Posted-Interrupts
> > Notification Event for all the vCPUs in the system. We need to
> > introduce another global vector for VT-d Posted-Interrtups, which will
> > be used to wakeup the sleep vCPU when an external interrupt from a
> direct-assigned device happens for that vCPU.
> >
> 
> Hi Feng,
> 
> Since the idea of two global vectors mechanism is from me, please add me to
> the comments.

No problem, Yang. I will add "Suggested-by: Yang Zhang <yang.z.zhang@intel.com>"
to this patch. Thanks a lot!

Thanks,
Feng

> 
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  arch/x86/include/asm/entry_arch.h  |  2 ++
> >  arch/x86/include/asm/hardirq.h     |  1 +
> >  arch/x86/include/asm/hw_irq.h      |  2 ++
> >  arch/x86/include/asm/irq_vectors.h |  1 +
> >  arch/x86/kernel/entry_64.S         |  2 ++
> >  arch/x86/kernel/irq.c              | 27 +++++++++++++++++++++++++++
> >  arch/x86/kernel/irqinit.c          |  2 ++
> >  7 files changed, 37 insertions(+)
> > diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
> > index dc5fa66..27ca0af 100644
> > --- a/arch/x86/include/asm/entry_arch.h
> > +++ b/arch/x86/include/asm/entry_arch.h
> > @@ -23,6 +23,8 @@ BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
> >  #ifdef CONFIG_HAVE_KVM
> >  BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR,
> >  		 smp_kvm_posted_intr_ipi)
> > +BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi, POSTED_INTR_WAKEUP_VECTOR,
> > +		 smp_kvm_posted_intr_wakeup_ipi)
> >  #endif
> >
> >  /*
> > diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
> > index 0f5fb6b..9866065 100644
> > --- a/arch/x86/include/asm/hardirq.h
> > +++ b/arch/x86/include/asm/hardirq.h
> > @@ -14,6 +14,7 @@ typedef struct {
> >  #endif
> >  #ifdef CONFIG_HAVE_KVM
> >  	unsigned int kvm_posted_intr_ipis;
> > +	unsigned int kvm_posted_intr_wakeup_ipis;
> >  #endif
> >  	unsigned int x86_platform_ipis;	/* arch dependent */
> >  	unsigned int apic_perf_irqs;
> > diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
> > index e7ae6eb..38fac9b 100644
> > --- a/arch/x86/include/asm/hw_irq.h
> > +++ b/arch/x86/include/asm/hw_irq.h
> > @@ -29,6 +29,7 @@
> >  extern asmlinkage void apic_timer_interrupt(void);
> >  extern asmlinkage void x86_platform_ipi(void);
> >  extern asmlinkage void kvm_posted_intr_ipi(void);
> > +extern asmlinkage void kvm_posted_intr_wakeup_ipi(void);
> >  extern asmlinkage void error_interrupt(void);
> >  extern asmlinkage void irq_work_interrupt(void);
> >
> > @@ -92,6 +93,7 @@ extern void trace_call_function_single_interrupt(void);
> >  #define trace_irq_move_cleanup_interrupt  irq_move_cleanup_interrupt
> >  #define trace_reboot_interrupt  reboot_interrupt
> >  #define trace_kvm_posted_intr_ipi kvm_posted_intr_ipi
> > +#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi
> >  #endif /* CONFIG_TRACING */
> >
> >  struct irq_domain;
> > diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> > index b26cb12..dca94f2 100644
> > --- a/arch/x86/include/asm/irq_vectors.h
> > +++ b/arch/x86/include/asm/irq_vectors.h
> > @@ -105,6 +105,7 @@
> >  /* Vector for KVM to deliver posted interrupt IPI */
> >  #ifdef CONFIG_HAVE_KVM
> >  #define POSTED_INTR_VECTOR		0xf2
> > +#define POSTED_INTR_WAKEUP_VECTOR	0xf1
> >  #endif
> >
> >  /*
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index e61c14a..a598447 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -960,6 +960,8 @@ apicinterrupt X86_PLATFORM_IPI_VECTOR \
> >  #ifdef CONFIG_HAVE_KVM
> >  apicinterrupt3 POSTED_INTR_VECTOR \
> >  	kvm_posted_intr_ipi smp_kvm_posted_intr_ipi
> > +apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR \
> > +	kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi
> >  #endif
> >
> >  #ifdef CONFIG_X86_MCE_THRESHOLD
> > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> > index 922d285..47408c3 100644
> > --- a/arch/x86/kernel/irq.c
> > +++ b/arch/x86/kernel/irq.c
> > @@ -237,6 +237,9 @@ __visible void smp_x86_platform_ipi(struct pt_regs *regs)
> >  }
> >
> >  #ifdef CONFIG_HAVE_KVM
> > +void (*wakeup_handler_callback)(void) = NULL;
> > +EXPORT_SYMBOL_GPL(wakeup_handler_callback);
> > +
> >  /*
> >   * Handler for POSTED_INTERRUPT_VECTOR.
> >   */
> > @@ -256,6 +259,30 @@ __visible void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
> >
> >  	set_irq_regs(old_regs);
> >  }
> > +
> > +/*
> > + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> > + */
> > +__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs)
> > +{
> > +	struct pt_regs *old_regs = set_irq_regs(regs);
> > +
> > +	ack_APIC_irq();
> > +
> > +	irq_enter();
> > +
> > +	exit_idle();
> > +
> > +	inc_irq_stat(kvm_posted_intr_wakeup_ipis);
> > +
> > +	if (wakeup_handler_callback)
> > +		wakeup_handler_callback();
> > +
> > +	irq_exit();
> > +
> > +	set_irq_regs(old_regs);
> > +}
> > +
> >  #endif
> >
> >  __visible void smp_trace_x86_platform_ipi(struct pt_regs *regs)
> > diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
> > index 70e181e..844673c 100644
> > --- a/arch/x86/kernel/irqinit.c
> > +++ b/arch/x86/kernel/irqinit.c
> > @@ -144,6 +144,8 @@ static void __init apic_intr_init(void)
> >  #ifdef CONFIG_HAVE_KVM
> >  	/* IPI for KVM to deliver posted interrupt */
> >  	alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
> > +	/* IPI for KVM to deliver interrupt to wake up tasks */
> > +	alloc_intr_gate(POSTED_INTR_WAKEUP_VECTOR, kvm_posted_intr_wakeup_ipi);
> >  #endif
> >
> >  	/* IPI vectors for APIC spurious and error interrupts */
> 
> 
> Best regards,
> Yang
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-18 16:58     ` Paolo Bonzini
@ 2014-12-19  1:13       ` Zhang, Yang Z
  2014-12-19  1:30         ` Wu, Feng
  2014-12-19  1:30       ` Wu, Feng
  1 sibling, 1 reply; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-19  1:13 UTC (permalink / raw)
  To: Paolo Bonzini, Wu, Feng, tglx, mingo, hpa, x86, gleb, dwmw2,
	joro, alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm

Paolo Bonzini wrote on 2014-12-19:
> 
> 
> On 18/12/2014 15:49, Zhang, Yang Z wrote:
>>>> Here, we introduce a similar way with 'apic_arb_prio' to handle
>>>> guest lowest priority interrtups when VT-d PI is used. Here is the
>>>> ideas: - Each vCPU has a counter 'round_robin_counter'. - When
>>>> guests sets an interrupts to lowest priority, we choose the vCPU
>>>> with smallest 'round_robin_counter' as the destination, then
>>>> increase it.
>> 
>> How this can work well? All subsequent interrupts are delivered to
>> one vCPU? It shouldn't be the best solution, need more consideration.
> 
> Well, it's a hardware limitation.  The alternative (which is easy to

Agreed, it is limited by hardware. But lowest-priority mode distributes interrupts more efficiently than fixed mode, and the current implementation effectively turns lowest-priority mode into fixed mode. In an interrupt-intensive environment this may become a bottleneck, and the VM may not benefit much from VT-d PI. But again, I agree that it is really a hardware limitation.

> implement) is to only do PI for single-CPU interrupts.  This should
> work well for multiqueue NICs (and of course for UP guests :)), so
> perhaps it's a good idea to only support that as a first attempt.

An easier way is to deliver the interrupt to the first matching vCPU we find. The round_robin_counter does not really help here, since the interrupt is delivered directly by hardware.

> 
> Paolo
> 
>> Also, I think you should take the apic_arb_prio into consider since
>> the priority is for the whole vCPU not for one interrupt.


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-19  1:13       ` Zhang, Yang Z
@ 2014-12-19  1:30         ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  1:30 UTC (permalink / raw)
  To: Zhang, Yang Z, Paolo Bonzini, tglx, mingo, hpa, x86, gleb, dwmw2,
	joro, alex.williamson, jiang.liu, Sankaran, Rajesh
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng



> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Friday, December 19, 2014 9:14 AM
> To: Paolo Bonzini; Wu, Feng; tglx@linutronix.de; mingo@redhat.com;
> hpa@zytor.com; x86@kernel.org; gleb@kernel.org; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com
> Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
> VT-d PI
> 
> Paolo Bonzini wrote on 2014-12-19:
> >
> >
> > On 18/12/2014 15:49, Zhang, Yang Z wrote:
> >>>> Here, we introduce a similar way with 'apic_arb_prio' to handle
> >>>> guest lowest priority interrtups when VT-d PI is used. Here is the
> >>>> ideas: - Each vCPU has a counter 'round_robin_counter'. - When
> >>>> guests sets an interrupts to lowest priority, we choose the vCPU
> >>>> with smallest 'round_robin_counter' as the destination, then
> >>>> increase it.
> >>
> >> How this can work well? All subsequent interrupts are delivered to
> >> one vCPU? It shouldn't be the best solution, need more consideration.
> >
> > Well, it's a hardware limitation.  The alternative (which is easy to
> 
> Agree, it is limited by hardware. But lowest priority distributes the interrupt
> more efficient than fixed mode. And current implementation more likes to
> switch the lowest priority mode to fixed mode. In case of interrupt intensive
> environment, this may be a bottleneck and VM may not benefit greatly from
> VT-d PI. But agree again, it is really a hardware limitation.
> 
> > implement) is to only do PI for single-CPU interrupts.  This should
> > work well for multiqueue NICs (and of course for UP guests :)), so
> > perhaps it's a good idea to only support that as a first attempt.
> 
> The more easy way is to deliver the interrupt to the first matched VCPU we find.
> The round_robin_counter really helps nothing here since the interrupt is
> delivered by hardware directly.
> 
> >
> > Paolo
> >
> >> Also, I think you should take the apic_arb_prio into consider since
> >> the priority is for the whole vCPU not for one interrupt.
> 
> 
> Best regards,
> Yang

In fact, the current solution was discussed with Rajesh (on the Cc list). Here are Rajesh's original words:

"When you see a guest requesting a lowest priority interrupt (by programming the virtual IOAPIC, or by programming the virtual MSI/MSI-X registers), have KVM associate it to a vCPU.  Or, put another way, use the 'apic_arb_prio' method you describe below, but instead of using it at time of interrupt (which you no longer have control over with posted interrupt direct delivery), do it at time of initializing the interrupt resource.  This way, if the guest asks for 4 lowest priority interrupts, and say you have a guest with two vCPUs, the first interrupt request will be serviced by KVM by assigning it through posting to vCPU0, the next one goes to vCPU1, the next one would go back to vCPU0, and so forth.  You could also choose to do this based on vector hashing instead of round-robin."
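
To make the assignment-at-setup-time idea concrete, here is a minimal user-space sketch of the round-robin choice described above. struct demo_vcpu and demo_pick_dest_vcpu() are hypothetical names invented for this example, not the interface from the patch; only the policy (pick the vCPU with the smallest 'round_robin_counter', then increase it) comes from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a vCPU; not struct kvm_vcpu. */
struct demo_vcpu {
	int id;
	unsigned int round_robin_counter;	/* interrupts assigned so far */
};

/*
 * Called once when the guest sets up a lowest-priority interrupt:
 * pick the vCPU with the smallest counter (ties go to the lowest index)
 * and charge the new interrupt to it.
 */
static struct demo_vcpu *demo_pick_dest_vcpu(struct demo_vcpu *vcpus, size_t n)
{
	struct demo_vcpu *best;
	size_t i;

	assert(vcpus != NULL && n > 0);

	best = &vcpus[0];
	for (i = 1; i < n; i++) {
		if (vcpus[i].round_robin_counter < best->round_robin_counter)
			best = &vcpus[i];
	}
	best->round_robin_counter++;
	return best;
}
```

With two vCPUs and four lowest-priority interrupts, successive calls pick vCPU0, vCPU1, vCPU0, vCPU1, which is the distribution described in the quoted text. Each individual interrupt still always goes to the vCPU chosen at setup time, which is the limitation Yang points out.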

Thanks,
Feng

> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-18 16:58     ` Paolo Bonzini
  2014-12-19  1:13       ` Zhang, Yang Z
@ 2014-12-19  1:30       ` Wu, Feng
  2014-12-19  1:47         ` Zhang, Yang Z
  2014-12-19 11:59         ` Paolo Bonzini
  1 sibling, 2 replies; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  1:30 UTC (permalink / raw)
  To: Paolo Bonzini, Zhang, Yang Z, tglx, mingo, hpa, x86, gleb, dwmw2,
	joro, alex.williamson, jiang.liu, Sankaran, Rajesh
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, December 19, 2014 12:58 AM
> To: Zhang, Yang Z; Wu, Feng; tglx@linutronix.de; mingo@redhat.com;
> hpa@zytor.com; x86@kernel.org; gleb@kernel.org; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com
> Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
> VT-d PI
> 
> 
> 
> On 18/12/2014 15:49, Zhang, Yang Z wrote:
> >>> Here, we introduce a similar way with 'apic_arb_prio' to handle
> >>> guest lowest priority interrtups when VT-d PI is used. Here is
> >>> the ideas: - Each vCPU has a counter 'round_robin_counter'. -
> >>> When guests sets an interrupts to lowest priority, we choose the
> >>> vCPU with smallest 'round_robin_counter' as the destination, then
> >>> increase it.
> >
> > How this can work well? All subsequent interrupts are delivered to
> > one vCPU? It shouldn't be the best solution, need more consideration.
> 
> Well, it's a hardware limitation.  The alternative (which is easy to
> implement) is to only do PI for single-CPU interrupts.  This should work
> well for multiqueue NICs (and of course for UP guests :)), so perhaps
> it's a good idea to only support that as a first attempt.
> 
> Paolo

Paolo, what do you mean by "single-CPU interrupts"? Do you mean that we
would not support lowest-priority interrupts for PI? Linux uses lowest
priority in most cases, so in that case a Linux guest could hardly benefit
from this feature.

Thanks,
Feng

> 
> > Also, I think you should take the apic_arb_prio into consider since
> > the priority is for the whole vCPU not for one interrupt.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-18 14:26   ` Zhang, Yang Z
@ 2014-12-19  1:40     ` Wu, Feng
  2014-12-19  1:46       ` Zhang, Yang Z
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  1:40 UTC (permalink / raw)
  To: Zhang, Yang Z, tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2,
	joro, alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng



> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Thursday, December 18, 2014 10:26 PM
> To: Wu, Feng; tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com;
> x86@kernel.org; gleb@kernel.org; pbonzini@redhat.com;
> dwmw2@infradead.org; joro@8bytes.org; alex.williamson@redhat.com;
> jiang.liu@linux.intel.com
> Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
> Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d
> Posted-Interrupts
> 
> Feng Wu wrote on 2014-12-12:
> > We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> > When 'pst' is set in IRTE, the associated irq will be posted to guests
> > instead of interrupt remapping. The destination of the interrupt is
> > set in Posted-Interrupts Descriptor, and the migration happens during
> > vCPU scheduling.
> >
> > However, we still update the cached irte here, which can be used when
> > changing back to remapping mode.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
> > ---
> >  drivers/iommu/intel_irq_remapping.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> > index 48c2051..ab9057a 100644
> > --- a/drivers/iommu/intel_irq_remapping.c
> > +++ b/drivers/iommu/intel_irq_remapping.c
> > @@ -977,6 +977,7 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >  {
> >  	struct intel_ir_data *ir_data = data->chip_data;
> >  	struct irte *irte = &ir_data->irte_entry;
> > +	struct irte_pi *irte_pi = (struct irte_pi *)irte;
> >  	struct irq_cfg *cfg = irqd_cfg(data);
> >  	struct irq_data *parent = data->parent_data;
> >  	int ret;
> > @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >  	 */
> >  	irte->vector = cfg->vector;
> >  	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
> > -	modify_irte(&ir_data->irq_2_iommu, irte);
> > +
> > +	/* We don't need to modify irte if the interrupt is for posting. */
> > +	if (irte_pi->pst != 1)
> > +		modify_irte(&ir_data->irq_2_iommu, irte);
> 
> What happens if user changes the IRQ affinity manually?

If the IRQ is posted, its affinity is controlled by the guest (irq <---> vCPU <----> pCPU),
so changing its affinity on the host has no effect.

Thanks,
Feng

> 
> >
> >  	/*
> >  	 * After this point, all the interrupts will start arriving
> 
> 
> Best regards,
> Yang
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-19  1:40     ` Wu, Feng
@ 2014-12-19  1:46       ` Zhang, Yang Z
  2014-12-19 11:59         ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-19  1:46 UTC (permalink / raw)
  To: Wu, Feng, tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm

Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-18:
>> jiang.liu@linux.intel.com
>> Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
>> iommu@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
>> Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for
>> VT-d Posted-Interrupts
>> 
>> Feng Wu wrote on 2014-12-12:
>>> We don't need to migrate the irqs for VT-d Posted-Interrupts here.
>>> When 'pst' is set in IRTE, the associated irq will be posted to
>>> guests instead of interrupt remapping. The destination of the
>>> interrupt is set in Posted-Interrupts Descriptor, and the
>>> migration happens during vCPU scheduling.
>>> 
>>> However, we still update the cached irte here, which can be used
>>> when changing back to remapping mode.
>>> 
>>> Signed-off-by: Feng Wu <feng.wu@intel.com>
>>> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
>>> ---
>>>  drivers/iommu/intel_irq_remapping.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
>>> index 48c2051..ab9057a 100644
>>> --- a/drivers/iommu/intel_irq_remapping.c
>>> +++ b/drivers/iommu/intel_irq_remapping.c
>>> @@ -977,6 +977,7 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
>>>  {
>>>  	struct intel_ir_data *ir_data = data->chip_data;
>>>  	struct irte *irte = &ir_data->irte_entry;
>>> +	struct irte_pi *irte_pi = (struct irte_pi *)irte;
>>>  	struct irq_cfg *cfg = irqd_cfg(data);
>>>  	struct irq_data *parent = data->parent_data;
>>>  	int ret;
>>> @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
>>>  	 */
>>>  	irte->vector = cfg->vector;
>>>  	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
>>> -	modify_irte(&ir_data->irq_2_iommu, irte);
>>> +
>>> +	/* We don't need to modify irte if the interrupt is for posting. */
>>> +	if (irte_pi->pst != 1)
>>> +		modify_irte(&ir_data->irq_2_iommu, irte);
>> 
>> What happens if user changes the IRQ affinity manually?
> 
> If the IRQ is posted, its affinity is controlled by guest (irq <--->
> vCPU <----> pCPU), it has no effect when host changes its affinity.

That's the problem: the user can change it on the host, but the change never takes effect because the affinity is actually controlled by the guest. I guess it will break IRQ balancing too.

> 
> Thanks,
> Feng
> 
>> 
>>> 
>>>  	/*
>>>  	 * After this point, all the interrupts will start arriving
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-19  1:30       ` Wu, Feng
@ 2014-12-19  1:47         ` Zhang, Yang Z
  2014-12-19 11:59         ` Paolo Bonzini
  1 sibling, 0 replies; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-19  1:47 UTC (permalink / raw)
  To: Wu, Feng, Paolo Bonzini, tglx, mingo, hpa, x86, gleb, dwmw2,
	joro, alex.williamson, jiang.liu, Sankaran, Rajesh
  Cc: eric.auger, linux-kernel, iommu, kvm

Wu, Feng wrote on 2014-12-19:
> 
> 
> Paolo Bonzini wrote on 2014-12-19:
>> jiang.liu@linux.intel.com
>> Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
>> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
>> Subject: Re: [v3 13/26] KVM: Define a new interface
>> kvm_find_dest_vcpu() for VT-d PI
>> 
>> 
>> 
>> On 18/12/2014 15:49, Zhang, Yang Z wrote:
>>>>> Here, we introduce a similar way with 'apic_arb_prio' to handle
>>>>> guest lowest priority interrtups when VT-d PI is used. Here is
>>>>> the
>>>>> ideas: - Each vCPU has a counter 'round_robin_counter'. - When
>>>>> guests sets an interrupts to lowest priority, we choose the vCPU
>>>>> with smallest 'round_robin_counter' as the destination, then
>>>>> increase it.
>>> 
>>> How this can work well? All subsequent interrupts are delivered to
>>> one vCPU? It shouldn't be the best solution, need more consideration.
>> 
>> Well, it's a hardware limitation.  The alternative (which is easy to
>> implement) is to only do PI for single-CPU interrupts.  This should
>> work well for multiqueue NICs (and of course for UP guests :)), so
>> perhaps it's a good idea to only support that as a first attempt.
>> 
>> Paolo
> 
> Paolo, what do you mean by "single-CPU interrupts"? Do you mean we

It should be the same idea as I mentioned in another thread: deliver the
interrupt to a single CPU (maybe the first matched vCPU?).

> don't support lowest priority interrupts for PI? But Linux OS uses
> lowest priority for most of the case? If so, we can hardly get benefit
> from this feature for Linux guest OS.
> 
> Thanks,
> Feng
> 
>> 
>>> Also, I think you should take the apic_arb_prio into consider
>>> since the priority is for the whole vCPU not for one interrupt.
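
A minimal sketch of the "first matched vCPU" idea mentioned above, only to
make the suggestion concrete. The helper name and the use of a 64-bit
destination mask are assumptions for illustration; this is not code from
the series.

```c
#include <stdint.h>

/*
 * Hypothetical helper: dest_mask has bit i set when vCPU i matches the
 * interrupt's destination. Returns the lowest matching vCPU index, or
 * -1 if no vCPU matches.
 */
static int first_matched_vcpu(uint64_t dest_mask)
{
	for (int i = 0; i < 64; i++)
		if (dest_mask & (UINT64_C(1) << i))
			return i;
	return -1;
}
```

With this policy, every lowest-priority interrupt with the same destination
set lands on the same vCPU, which is the trade-off being discussed.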


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  2014-12-18  8:32       ` Paolo Bonzini
@ 2014-12-19  2:09         ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  2:09 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel; +Cc: iommu, kvm, linux-kernel, kvm, Wu, Feng



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo Bonzini
> Sent: Thursday, December 18, 2014 4:32 PM
> To: linux-kernel@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; kvm@vger.kernel.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is preempted
> 
> 
> 
> On 18/12/2014 04:15, Wu, Feng wrote:
> > Thanks for your comments, Paolo!
> >
> > If we use u64 new_control, we cannot use new.sn any more.
> > Maybe we can change the struct pi_desc {} like this:
> >
> > typedef struct pid_control{
> >         u64     on      : 1,
> >         sn      : 1,
> >         rsvd_1  : 13,
> >         ndm     : 1,
> >         nv      : 8,
> >         rsvd_2  : 8,
> >         ndst    : 32;
> > }pid_control_t;
> >
> > struct pi_desc {
> >         u32 pir[8];     /* Posted interrupt requested */
> > 		pid_control_t control;
> 
> Probably something like this to keep the union:
> 
> typedef union pid_control {
> 	u64 full;
> 	struct {
> 		u64 on : 1,
> 		...
> 	} fields;
> };
> 
> >         u32 rsvd[6];
> > } __aligned(64);
> >
> >
> > Then we can define pid_control_t new_control, old_control. And use
> new_control.sn = 0.
> >
> > What is your opinon?
> 
> Sure.  Alternatively, keep using struct pi_desc new; just
> do not zero it, nor access any field outide the control word.
> 
> Paolo

Yes, this is also a good idea. Thanks!
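
For comparison, here is a userspace sketch of the union-based variant from
the quoted discussion. It uses C11 atomics in place of the kernel's cmpxchg,
and pid_clear_sn() is a hypothetical name. The field layout follows the
bitfield proposed above; bitfield ordering is compiler-specific, so GCC or
Clang on x86 is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Control word of the descriptor, viewable whole or field by field. */
union pid_control {
	uint64_t full;
	struct {
		uint64_t on     : 1,
			 sn     : 1,
			 rsvd_1 : 13,
			 ndm    : 1,
			 nv     : 8,
			 rsvd_2 : 8,
			 ndst   : 32;
	} fields;
};

/*
 * Hypothetical helper: clear SN with a compare-and-exchange loop so that
 * concurrent updates to the other fields are not lost.
 */
static void pid_clear_sn(_Atomic uint64_t *control)
{
	union pid_control old, new;

	old.full = atomic_load(control);
	do {
		new.full = old.full;
		new.fields.sn = 0;
		/* On failure, old.full is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(control, &old.full, new.full));
}
```

Only the 64-bit control word is read and written, so the PIR bits in the
rest of the descriptor are never touched by the update.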

Thanks,
Feng

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible
  2014-12-17 16:17   ` Paolo Bonzini
@ 2014-12-19  2:19     ` Wu, Feng
  2014-12-19 11:59       ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  2:19 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel; +Cc: iommu, kvm, linux-kernel, kvm, Wu, Feng



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo Bonzini
> Sent: Thursday, December 18, 2014 12:18 AM
> To: linux-kernel@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; kvm@vger.kernel.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible
> 
> 
> 
> On 12/12/2014 16:14, Feng Wu wrote:
> > Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
> > so we can use it outside of irqchip.c.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  include/linux/kvm_host.h | 19 +++++++++++++++++++
> >  virt/kvm/irqchip.c       | 11 -----------
> >  2 files changed, 19 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 0b9659d..cfa85ac 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -335,6 +335,25 @@ struct kvm_kernel_irq_routing_entry {
> >  	struct hlist_node link;
> >  };
> >
> > +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> > +
> > +struct kvm_irq_routing_table {
> > +	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> > +	struct kvm_kernel_irq_routing_entry *rt_entries;
> > +	u32 nr_rt_entries;
> > +	/*
> > +	 * Array indexed by gsi. Each entry contains list of irq chips
> > +	 * the gsi is connected to.
> > +	 */
> > +	struct hlist_head map[0];
> > +};
> > +
> > +#else
> > +
> > +struct kvm_irq_routing_table {};
> 
> If possible, just make this "struct kvm_irq_routing_table;" and pull
> this line to include/linux/kvm_types.h.
> 
> Paolo

Do you mean moving the definition of struct kvm_irq_routing_table
to include/linux/kvm_types.h and adding a declaration here?

Thanks,
Feng

> 
> > +
> > +#endif
> > +
> >  #ifndef KVM_PRIVATE_MEM_SLOTS
> >  #define KVM_PRIVATE_MEM_SLOTS 0
> >  #endif
> > diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> > index 7f256f3..cdf29a6 100644
> > --- a/virt/kvm/irqchip.c
> > +++ b/virt/kvm/irqchip.c
> > @@ -31,17 +31,6 @@
> >  #include <trace/events/kvm.h>
> >  #include "irq.h"
> >
> > -struct kvm_irq_routing_table {
> > -	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> > -	struct kvm_kernel_irq_routing_entry *rt_entries;
> > -	u32 nr_rt_entries;
> > -	/*
> > -	 * Array indexed by gsi. Each entry contains list of irq chips
> > -	 * the gsi is connected to.
> > -	 */
> > -	struct hlist_head map[0];
> > -};
> > -
> >  int kvm_irq_map_gsi(struct kvm *kvm,
> >  		    struct kvm_kernel_irq_routing_entry *entries, int gsi)
> >  {
> >
> 

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2014-12-18  8:37       ` Paolo Bonzini
@ 2014-12-19  2:51         ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  2:51 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel; +Cc: iommu, kvm, linux-kernel, kvm, Wu, Feng



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo Bonzini
> Sent: Thursday, December 18, 2014 4:37 PM
> To: linux-kernel@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; kvm@vger.kernel.org;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> 
> 
> On 18/12/2014 04:16, Wu, Feng wrote:
> >>> pre-block:
> >>> - Add the vCPU to the blocked per-CPU list
> >>> - Clear 'SN'
> >>
> >> Should SN be already clear (and NV set to POSTED_INTR_VECTOR)?
> >
> > I think the SN bit should be clear here, Adding it here is just to make sure
> > SN is clear when vCPU is blocked, so it can receive wakeup notification event
> later.
> 
> Then, please, WARN if the SN bit is set inside the if (vcpu->blocked).
> Inside that if you can just add the vCPU to the blocked list on vcpu_put.
> 
> >> Can it
> >> happen that you go from sched-out to blocked without doing a sched-in
> first?
> >>
> >
> > I cannot imagine this scenario, can you please be more specific? Thanks a lot!
> 
> I cannot either. :)  But it would be the case where SN is not cleared.
> So we agree that it cannot happen.
> 
> >> In fact, if this is possible, what happens if vcpu->preempted &&
> >> vcpu->blocked?
> >
> > In fact, vcpu->preempted && vcpu->blocked happens sometimes, but I think
> there is
> > no issues. Please refer to the following case:
> 
> I agree that there should be no issues.  But if it can happen, it's better:
> 
> 1) to separate the handling of preemption and blocking: preemption
> handles SN/NV/NDST, blocking handles the wakeup list.
> 
Sorry, I don't quite understand this.

I think the handling of preemption and blocking is already separated in
vmx_vcpu_put(). For vmx_vcpu_load(), the handling of SN/NV/NDST is common
to preemption and blocking.

Thanks,
Feng

> 2) to change this
> 
> +		} else if (vcpu->blocked) {
> +			/*
> +			 * The vcpu is blocked on the wait queue.
> +			 * Store the blocked vCPU on the list of the
> +			 * vcpu->wakeup_cpu, which is the destination
> +			 * of the wake-up notification event.
> 
> to just
> 
> 		}
> 		if (vcpu->blocked) {
> 			...
> 		}
> > kvm_vcpu_block()
> > 	-> vcpu->blocked = true;
> > 	-> prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> >
> > 	before schedule() is called, this vcpu is woken up by another guy, so
> > 	the state of the vcpu associated thread is changed to TASK_RUNNING,
> > 	then preemption happens after interrupts or the following schedule() is
> > 	hit, this will call kvm_sched_out(), in which current->state ==
> TASK_RUNNING
> > 	and vcpu->preempted is set to true. So now vcpu->preempted and
> vcpu->blocked
> > 	are both true. In vmx_vcpu_put(), we will check vcpu->preempted first, so
> > 	the vCPU will not be blocked, and the vcpu->blocked will be set the false in
> > 	vmx_vcpu_load().
> >
> > 	But maybe I need do a little change to the vmx_vcpu_load() like below:
> >
> >                 /*
> >                  * Delete the vCPU from the related wakeup queue
> >                  * if we are resuming from blocked state
> >                  */
> >                 if (vcpu->blocked) {
> >                         vcpu->blocked = false;
> > +						/* if wakeup_cpu == -1, the vcpu is currently not
> blocked on any
> > +						  pCPU, don't need dequeue here */
> > +						if (vcpu->wakeup_cpu != -1) {
> >
> spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> >                                 vcpu->wakeup_cpu), flags);
> >                     	     list_del(&vcpu->blocked_vcpu_list);
> >
> spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> >                                 vcpu->wakeup_cpu), flags);
> >                         	 vcpu->wakeup_cpu = -1;
> > +						}
> >                 }
> 
> Good idea.
> 
> Paolo
> 
> > Any ideas about this? Thanks a lot!
> >
> > Thanks,
> > Feng
> >
> >
> > 	-> schedule();
> >
> >
> >>
> >>> - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> >>>
> >>> post-block:
> >>> - Remove the vCPU from the per-CPU list
> >>
> >> Paolo
> >>
> >>> Signed-off-by: Feng Wu <feng.wu@intel.com>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-18 15:09         ` Zhang, Yang Z
@ 2014-12-19  2:58           ` Wu, Feng
  2014-12-19  3:32             ` Zhang, Yang Z
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  2:58 UTC (permalink / raw)
  To: Zhang, Yang Z, Paolo Bonzini, kvm; +Cc: iommu, linux-kernel, kvm, Wu, Feng



> -----Original Message-----
> From: iommu-bounces@lists.linux-foundation.org
> [mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of Zhang, Yang Z
> Sent: Thursday, December 18, 2014 11:10 PM
> To: Paolo Bonzini; kvm@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org
> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> 
> Paolo Bonzini wrote on 2014-12-18:
> >
> >
> > On 18/12/2014 04:14, Wu, Feng wrote:
> >>
> >>
> >> linux-kernel-owner@vger.kernel.org wrote on
> mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo:
> >>> x86@kernel.org; Gleb Natapov; Paolo Bonzini; dwmw2@infradead.org;
> >>> joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org; Alex Williamson;
> >>> joro-zLv9SwRftAIdnm+Jiang
> >>> Liu
> >>> Cc: iommu@lists.linux-foundation.org;
> >>> linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; KVM list;
> >>> Eric Auger
> >>> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
> >>> set
> >>>
> >>>
> >>>
> >>> On 12/12/2014 16:14, Feng Wu wrote:
> >>>> Currently, we don't support urgent interrupt, all interrupts are
> >>>> recognized as non-urgent interrupt, so we cannot send
> >>>> posted-interrupt when 'SN' is set.
> >>>
> >>> Can this happen?  If the vcpu is in guest mode, it cannot have been
> >>> scheduled out, and that's the only case when SN is set.
> >>>
> >>> Paolo
> >>
> >> Currently, the only place where SN is set is vCPU is preempted and
> 
> If the vCPU is preempted, shouldn't the subsequent be ignored? What happens
> if a PI is occurs when vCPU is preempted?

If a vCPU is preempted, the 'SN' bit is set, so subsequent interrupts are
suppressed for posting.

Thanks,
Feng

> 
> >> waiting for the next scheduling in the runqueue. But I am not sure
> >> whether we need to set SN for other purpose in future. Adding SN
> >> checking here is just to follow the Spec. non-urgent interrupts are
> >> suppressed
> > when SN is set.
> >
> > I would change that to a WARN_ON_ONCE then.
> 
> 
> Best regards,
> Yang
> 
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-19  2:58           ` Wu, Feng
@ 2014-12-19  3:32             ` Zhang, Yang Z
  2014-12-19  4:34               ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-19  3:32 UTC (permalink / raw)
  To: Wu, Feng, Paolo Bonzini, kvm; +Cc: iommu, linux-kernel, kvm

Wu, Feng wrote on 2014-12-19:
> 
> 
> iommu-bounces@lists.linux-foundation.org wrote on mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of:
>> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> kvm@vger.kernel.org
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Paolo Bonzini wrote on 2014-12-18:
>>> 
>>> 
>>> On 18/12/2014 04:14, Wu, Feng wrote:
>>>> 
>>>> 
>>>> linux-kernel-owner@vger.kernel.org wrote on
>> mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo:
>>>>> x86@kernel.org; Gleb Natapov; Paolo Bonzini;
>>>>> dwmw2@infradead.org;
>>>>> joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org; Alex Williamson;
>>>>> joro-zLv9SwRftAIdnm+Jiang
>>>>> Liu
>>>>> Cc: iommu@lists.linux-foundation.org;
>>>>> linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; KVM list;
>>>>> Eric Auger
>>>>> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>>> is set
>>>>> 
>>>>> 
>>>>> 
>>>>> On 12/12/2014 16:14, Feng Wu wrote:
>>>>>> Currently, we don't support urgent interrupt, all interrupts
>>>>>> are recognized as non-urgent interrupt, so we cannot send
>>>>>> posted-interrupt when 'SN' is set.
>>>>> 
>>>>> Can this happen?  If the vcpu is in guest mode, it cannot have
>>>>> been scheduled out, and that's the only case when SN is set.
>>>>> 
>>>>> Paolo
>>>> 
>>>> Currently, the only place where SN is set is vCPU is preempted
>>>> and
>> 
>> If the vCPU is preempted, shouldn't the subsequent be ignored? What
>> happens if a PI is occurs when vCPU is preempted?
> 
> If a vCPU is preempted, the 'SN' bit is set, the subsequent interrupts
> are suppressed for posting.

I mean, what happens if we don't set the SN bit? From my point of view, if
the preempter has already disabled interrupts, it is OK to leave the SN
bit as zero. But if the preempter has enabled interrupts, doesn't that
mean it allows interrupts to happen? BTW, since there is already the ON
bit, at most one notification interrupt will arrive, so it doesn't hurt
performance. Do we really need to set the SN bit?

> 
> Thanks,
> Feng
> 
>> 
>>>> waiting for the next scheduling in the runqueue. But I am not
>>>> sure whether we need to set SN for other purpose in future.
>>>> Adding SN checking here is just to follow the Spec. non-urgent
>>>> interrupts are suppressed
>>> when SN is set.
>>> 
>>> I would change that to a WARN_ON_ONCE then.
>> 
>> 
>> Best regards,
>> Yang
>> 
>> 


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-19  3:32             ` Zhang, Yang Z
@ 2014-12-19  4:34               ` Wu, Feng
  2014-12-19  4:44                 ` Zhang, Yang Z
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  4:34 UTC (permalink / raw)
  To: Zhang, Yang Z, Paolo Bonzini, kvm; +Cc: iommu, linux-kernel, kvm, Wu, Feng



> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Friday, December 19, 2014 11:33 AM
> To: Wu, Feng; Paolo Bonzini; kvm@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org
> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> 
> Wu, Feng wrote on 2014-12-19:
> >
> >
> > iommu-bounces@lists.linux-foundation.org wrote on
> mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of:
> >> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> >> kvm@vger.kernel.org
> >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
> >> set
> >>
> >> Paolo Bonzini wrote on 2014-12-18:
> >>>
> >>>
> >>> On 18/12/2014 04:14, Wu, Feng wrote:
> >>>>
> >>>>
> >>>> linux-kernel-owner@vger.kernel.org wrote on
> >> mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo:
> >>>>> x86@kernel.org; Gleb Natapov; Paolo Bonzini;
> >>>>> dwmw2@infradead.org;
> >>>>> joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org; Alex Williamson;
> >>>>> joro-zLv9SwRftAIdnm+Jiang
> >>>>> Liu
> >>>>> Cc: iommu@lists.linux-foundation.org;
> >>>>> linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; KVM list;
> >>>>> Eric Auger
> >>>>> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
> >>>>> is set
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 12/12/2014 16:14, Feng Wu wrote:
> >>>>>> Currently, we don't support urgent interrupt, all interrupts
> >>>>>> are recognized as non-urgent interrupt, so we cannot send
> >>>>>> posted-interrupt when 'SN' is set.
> >>>>>
> >>>>> Can this happen?  If the vcpu is in guest mode, it cannot have
> >>>>> been scheduled out, and that's the only case when SN is set.
> >>>>>
> >>>>> Paolo
> >>>>
> >>>> Currently, the only place where SN is set is vCPU is preempted
> >>>> and
> >>
> >> If the vCPU is preempted, shouldn't the subsequent be ignored? What
> >> happens if a PI is occurs when vCPU is preempted?
> >
> > If a vCPU is preempted, the 'SN' bit is set, the subsequent interrupts
> > are suppressed for posting.
> 
> I mean what happens if we don't set SN bit. From my point, if preempter
> already disabled the interrupt, it is ok to leave SN bit as zero. But if preempter
> enabled the interrupt, doesn't this mean he allow interrupt to happen? BTW,
> since there already has ON bit, so this means there only have one interrupt
> arrived at most and it doesn't hurt performance. Do we really need to set SN
> bit?


See this scenario:
vCPU0 is running on pCPU0
--> vCPU0 is preempted by vCPU1
--> Then vCPU1 is running on pCPU0 and vCPU0 is waiting in the runqueue to be scheduled

If we don't set SN for vCPU0, then all subsequent interrupts for vCPU0 are posted
to vCPU1. This consumes hardware and software effort, and in fact it is not needed
at all. If SN is set for vCPU0, the VT-d hardware will not issue a Notification Event
for vCPU0 when an interrupt arrives for it, but will just set the related PIR bit.
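
To make the behaviour described above concrete, here is a small software
model of how a non-urgent interrupt is posted. It is only a sketch: the
struct and function names are hypothetical, and the descriptor is reduced
to the PIR bitmap plus the ON and SN bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a posted-interrupt descriptor. */
struct pi_desc_model {
	uint32_t pir[8];	/* posted-interrupt requests, one bit per vector */
	bool on;		/* outstanding notification */
	bool sn;		/* suppress notification */
};

/*
 * Model of posting a non-urgent interrupt: the vector is always recorded
 * in PIR; a notification event is generated only when SN is clear and no
 * notification is already outstanding (ON clear).
 * Returns true if a notification event would be generated.
 */
static bool pi_model_post(struct pi_desc_model *pd, uint8_t vector)
{
	pd->pir[vector / 32] |= UINT32_C(1) << (vector % 32);

	if (pd->sn || pd->on)
		return false;

	pd->on = true;
	return true;
}
```

In this model, setting SN for the preempted vCPU0 means its interrupts are
still accumulated in PIR, but pCPU0 is not interrupted on their behalf.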

Thanks,
Feng

> 
> >
> > Thanks,
> > Feng
> >
> >>
> >>>> waiting for the next scheduling in the runqueue. But I am not
> >>>> sure whether we need to set SN for other purpose in future.
> >>>> Adding SN checking here is just to follow the Spec. non-urgent
> >>>> interrupts are suppressed
> >>> when SN is set.
> >>>
> >>> I would change that to a WARN_ON_ONCE then.
> >>
> >>
> >> Best regards,
> >> Yang
> >>
> >>
> 
> 
> Best regards,
> Yang
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-19  4:34               ` Wu, Feng
@ 2014-12-19  4:44                 ` Zhang, Yang Z
  2014-12-19  4:49                   ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-19  4:44 UTC (permalink / raw)
  To: Wu, Feng, Paolo Bonzini, kvm; +Cc: iommu, linux-kernel, kvm

Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-19:
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Wu, Feng wrote on 2014-12-19:
>>> 
>>> 
>>> iommu-bounces@lists.linux-foundation.org wrote on
>> mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of:
>>>> Cc: iommu@lists.linux-foundation.org;
>>>> linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>> is set
>>>> 
>>>> Paolo Bonzini wrote on 2014-12-18:
>>>>> 
>>>>> 
>>>>> On 18/12/2014 04:14, Wu, Feng wrote:
>>>>>> 
>>>>>> 
>>>>>> linux-kernel-owner@vger.kernel.org wrote on
>>>> mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo:
>>>>>>> x86@kernel.org; Gleb Natapov; Paolo Bonzini; dwmw2@infradead.org;
>>>>>>> joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org; Alex Williamson;
>>>>>>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
>>>>>>> iommu@lists.linux-foundation.org;
>>>>>>> linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; KVM list;
>>>>>>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt
>>>>>>> when 'SN' is set
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 12/12/2014 16:14, Feng Wu wrote:
>>>>>>>> Currently, we don't support urgent interrupt, all interrupts
>>>>>>>> are recognized as non-urgent interrupt, so we cannot send
>>>>>>>> posted-interrupt when 'SN' is set.
>>>>>>> 
>>>>>>> Can this happen?  If the vcpu is in guest mode, it cannot have
>>>>>>> been scheduled out, and that's the only case when SN is set.
>>>>>>> 
>>>>>>> Paolo
>>>>>> 
>>>>>> Currently, the only place where SN is set is vCPU is preempted
>>>>>> and
>>>> 
>>>> If the vCPU is preempted, shouldn't the subsequent be ignored?
>>>> What happens if a PI is occurs when vCPU is preempted?
>>> 
>>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
>>> interrupts are suppressed for posting.
>> 
>> I mean what happens if we don't set SN bit. From my point, if
>> preempter already disabled the interrupt, it is ok to leave SN bit
>> as zero. But if preempter enabled the interrupt, doesn't this mean
>> he allow interrupt to happen? BTW, since there already has ON bit,
>> so this means there only have one interrupt arrived at most and it
>> doesn't hurt performance. Do we really need to set SN bit?
> 
> 
> See this scenario:
> vCPU0 is running on pCPU0
> --> vCPU0 is preempted by vCPU1
> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for schedule
> --> in runqueue
> 
> If the we don't set SN for vCPU0, then all subsequent interrupts for
> vCPU0 is posted to vCPU1, this will consume hardware and software

The PI vector for vCPU1 is the notification vector, but the PI vector for
vCPU0 should be the wakeup vector. Why would vCPU1 consume this PI event?

> efforts and in fact it is not needed at all. If SN is set for vCPU0,
> VT-d hardware will not issue Notification Event for vCPU0 when an
> interrupt is for it, but just setting the related PIR bit.
> 
> Thanks,
> Feng
> 
>> 
>>> 
>>> Thanks,
>>> Feng
>>> 
>>>> 
>>>>>> waiting for the next scheduling in the runqueue. But I am not
>>>>>> sure whether we need to set SN for other purpose in future.
>>>>>> Adding SN checking here is just to follow the Spec. non-urgent
>>>>>> interrupts are suppressed
>>>>> when SN is set.
>>>>> 
>>>>> I would change that to a WARN_ON_ONCE then.
>>>> 
>>>> 
>>>> Best regards,
>>>> Yang
>>>> 
>>>> 
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-19  4:44                 ` Zhang, Yang Z
@ 2014-12-19  4:49                   ` Wu, Feng
  2014-12-19  5:25                     ` Zhang, Yang Z
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  4:49 UTC (permalink / raw)
  To: Zhang, Yang Z, Paolo Bonzini, kvm; +Cc: iommu, linux-kernel, kvm, Wu, Feng



> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Friday, December 19, 2014 12:44 PM
> To: Wu, Feng; Paolo Bonzini; kvm@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org
> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> 
> Wu, Feng wrote on 2014-12-19:
> >
> >
> > Zhang, Yang Z wrote on 2014-12-19:
> >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
> >> set
> >>
> >> Wu, Feng wrote on 2014-12-19:
> >>>
> >>>
> >>> iommu-bounces@lists.linux-foundation.org wrote on
> >> mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of:
> >>>> Cc: iommu@lists.linux-foundation.org;
> >>>> linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> >>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
> >>>> is set
> >>>>
> >>>> Paolo Bonzini wrote on 2014-12-18:
> >>>>>
> >>>>>
> >>>>> On 18/12/2014 04:14, Wu, Feng wrote:
> >>>>>>
> >>>>>>
> >>>>>> linux-kernel-owner@vger.kernel.org wrote on
> >>>> mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo:
> >>>>>>> x86@kernel.org; Gleb Natapov; Paolo Bonzini;
> dwmw2@infradead.org;
> >>>>>>> joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org; Alex Williamson;
> >>>>>>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
> >>>>>>> iommu@lists.linux-foundation.org;
> >>>>>>> linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; KVM
> list;
> >>>>>>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt
> >>>>>>> when 'SN' is set
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 12/12/2014 16:14, Feng Wu wrote:
> >>>>>>>> Currently, we don't support urgent interrupt, all interrupts
> >>>>>>>> are recognized as non-urgent interrupt, so we cannot send
> >>>>>>>> posted-interrupt when 'SN' is set.
> >>>>>>>
> >>>>>>> Can this happen?  If the vcpu is in guest mode, it cannot have
> >>>>>>> been scheduled out, and that's the only case when SN is set.
> >>>>>>>
> >>>>>>> Paolo
> >>>>>>
> >>>>>> Currently, the only place where SN is set is vCPU is preempted
> >>>>>> and
> >>>>
> >>>> If the vCPU is preempted, shouldn't the subsequent be ignored?
> >>>> What happens if a PI is occurs when vCPU is preempted?
> >>>
> >>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
> >>> interrupts are suppressed for posting.
> >>
> >> I mean what happens if we don't set SN bit. From my point, if
> >> preempter already disabled the interrupt, it is ok to leave SN bit
> >> as zero. But if preempter enabled the interrupt, doesn't this mean
> >> he allow interrupt to happen? BTW, since there already has ON bit,
> >> so this means there only have one interrupt arrived at most and it
> >> doesn't hurt performance. Do we really need to set SN bit?
> >
> >
> > See this scenario:
> > vCPU0 is running on pCPU0
> > --> vCPU0 is preempted by vCPU1
> > --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for schedule
> > --> in runqueue
> >
> > If the we don't set SN for vCPU0, then all subsequent interrupts for
> > vCPU0 is posted to vCPU1, this will consume hardware and software
> 
> The PI vector for vCPU1 is notification vector, but the PI vector for vCPU0
> should be wakeup vector. Why vCPU1 will consume this PI event?

The wakeup vector is only used for the blocking case. When a vCPU is
preempted and waiting in the runqueue, NV is the notification vector.
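
Roughly, the intended settings per vCPU state can be summarised as in the
sketch below. The enum, struct and function names are made up for
illustration, and the vector numbers are placeholders rather than values
taken from this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder vector numbers, for illustration only. */
#define NOTIFICATION_VECTOR	0xf2
#define WAKEUP_VECTOR		0xf1

enum vcpu_state { VCPU_RUNNING, VCPU_PREEMPTED, VCPU_BLOCKED };

struct pi_notify_cfg {
	uint8_t nv;	/* notification vector written to the descriptor */
	bool sn;	/* suppress notification */
};

/* Hypothetical helper mapping a vCPU state to its NV/SN settings. */
static struct pi_notify_cfg pi_cfg_for_state(enum vcpu_state s)
{
	struct pi_notify_cfg cfg = { .nv = NOTIFICATION_VECTOR, .sn = false };

	switch (s) {
	case VCPU_PREEMPTED:
		/* Waiting in the runqueue: keep NV, suppress notifications. */
		cfg.sn = true;
		break;
	case VCPU_BLOCKED:
		/* Blocked: SN clear, notify through the wakeup vector. */
		cfg.nv = WAKEUP_VECTOR;
		break;
	case VCPU_RUNNING:
		break;
	}
	return cfg;
}
```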

Thanks,
Feng

> 
> > efforts and in fact it is not needed at all. If SN is set for vCPU0,
> > VT-d hardware will not issue Notification Event for vCPU0 when an
> > interrupt is for it, but just setting the related PIR bit.
> >
> > Thanks,
> > Feng
> >
> >>
> >>>
> >>> Thanks,
> >>> Feng
> >>>
> >>>>
> >>>>>> waiting for the next scheduling in the runqueue. But I am not
> >>>>>> sure whether we need to set SN for other purpose in future.
> >>>>>> Adding SN checking here is just to follow the Spec. non-urgent
> >>>>>> interrupts are suppressed
> >>>>> when SN is set.
> >>>>>
> >>>>> I would change that to a WARN_ON_ONCE then.
> >>>>
> >>>>
> >>>> Best regards,
> >>>> Yang
> >>>>
> >>>>
> >>
> >>
> >> Best regards,
> >> Yang
> >>
> 
> 
> Best regards,
> Yang
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-19  4:49                   ` Wu, Feng
@ 2014-12-19  5:25                     ` Zhang, Yang Z
  2014-12-19  5:46                       ` Wu, Feng
  2014-12-19 12:00                       ` Paolo Bonzini
  0 siblings, 2 replies; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-19  5:25 UTC (permalink / raw)
  To: Wu, Feng, Paolo Bonzini, kvm; +Cc: iommu, linux-kernel, kvm

Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-19:
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Wu, Feng wrote on 2014-12-19:
>>> 
>>> 
>>> Zhang, Yang Z wrote on 2014-12-19:
>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>> is set
>>>> 
>>>> Wu, Feng wrote on 2014-12-19:
>>>>> 
>>>>> 
>>>>> iommu-bounces@lists.linux-foundation.org wrote on
>>>> mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of:
>>>>>> Cc: iommu@lists.linux-foundation.org;
>>>>>> linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>>>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>>>> is set
>>>>>> 
>>>>>> Paolo Bonzini wrote on 2014-12-18:
>>>>>>> 
>>>>>>> 
>>>>>>> On 18/12/2014 04:14, Wu, Feng wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> linux-kernel-owner@vger.kernel.org wrote on
>>>>>> mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo:
>>>>>>>>> x86@kernel.org; Gleb Natapov; Paolo Bonzini;
>>>>>>>>> dwmw2@infradead.org;
>>>>>>>>> joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org; Alex Williamson;
>>>>>>>>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
>>>>>>>>> iommu@lists.linux-foundation.org;
>>>>>>>>> linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; KVM list;
>>>>>>>>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress
>>>>>>>>> posted-interrupt when 'SN' is set
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 12/12/2014 16:14, Feng Wu wrote:
>>>>>>>>>> Currently, we don't support urgent interrupt, all
>>>>>>>>>> interrupts are recognized as non-urgent interrupt, so we
>>>>>>>>>> cannot send posted-interrupt when 'SN' is set.
>>>>>>>>> 
>>>>>>>>> Can this happen?  If the vcpu is in guest mode, it cannot
>>>>>>>>> have been scheduled out, and that's the only case when SN is set.
>>>>>>>>> 
>>>>>>>>> Paolo
>>>>>>>> 
>>>>>>>> Currently, the only place where SN is set is vCPU is
>>>>>>>> preempted and
>>>>>> 
>>>>>> If the vCPU is preempted, shouldn't the subsequent be ignored?
>>>>>> What happens if a PI is occurs when vCPU is preempted?
>>>>> 
>>>>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
>>>>> interrupts are suppressed for posting.
>>>> 
>>>> I mean what happens if we don't set SN bit. From my point, if
>>>> preempter already disabled the interrupt, it is ok to leave SN
>>>> bit as zero. But if preempter enabled the interrupt, doesn't this
>>>> mean he allow interrupt to happen? BTW, since there already has
>>>> ON bit, so this means there only have one interrupt arrived at
>>>> most and it doesn't hurt performance. Do we really need to set SN bit?
>>> 
>>> 
>>> See this scenario:
>>> vCPU0 is running on pCPU0
>>> --> vCPU0 is preempted by vCPU1
>>> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for
>>> --> schedule in runqueue
>>> 
>>> If the we don't set SN for vCPU0, then all subsequent interrupts
>>> for
>>> vCPU0 is posted to vCPU1, this will consume hardware and software
>> 
>> The PI vector for vCPU1 is notification vector, but the PI vector
>> for
>> vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event?
> 
> Wakeup vector is only used for blocking case, when vCPU is preempted
> and waiting in the runqueue, the NV is the notification vector.

I see your point. But from a performance point of view, it would be
helpful if we could schedule the vCPU on another pCPU to handle the
interrupt. As far as I remember, though, current KVM will not move a vCPU
that is waiting in the run queue (even one that got preempted) to another
pCPU (am I right?). So it may be hard to do.

> 
> Thanks,
> Feng
> 
>> 
>>> efforts and in fact it is not needed at all. If SN is set for
>>> vCPU0, VT-d hardware will not issue Notification Event for vCPU0
>>> when an interrupt is for it, but just setting the related PIR bit.
>>> 
>>> Thanks,
>>> Feng
>>> 
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Feng
>>>>> 
>>>>>> 
>>>>>>>> waiting for the next scheduling in the runqueue. But I am not
>>>>>>>> sure whether we need to set SN for other purpose in future.
>>>>>>>> Adding SN checking here is just to follow the Spec.
>>>>>>>> non-urgent interrupts are suppressed
>>>>>>> when SN is set.
>>>>>>> 
>>>>>>> I would change that to a WARN_ON_ONCE then.
>>>>>> 
>>>>>> 
>>>>>> Best regards,
>>>>>> Yang
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> Best regards,
>>>> Yang
>>>> 
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-19  5:25                     ` Zhang, Yang Z
@ 2014-12-19  5:46                       ` Wu, Feng
  2014-12-19  7:04                         ` Zhang, Yang Z
  2014-12-19 12:00                       ` Paolo Bonzini
  1 sibling, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-19  5:46 UTC (permalink / raw)
  To: Zhang, Yang Z, Paolo Bonzini, kvm; +Cc: iommu, linux-kernel, kvm, Wu, Feng



> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Friday, December 19, 2014 1:26 PM
> To: Wu, Feng; Paolo Bonzini; kvm@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org
> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> 
> Wu, Feng wrote on 2014-12-19:
> >
> >
> > Zhang, Yang Z wrote on 2014-12-19:
> >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
> >> set
> >>
> >> Wu, Feng wrote on 2014-12-19:
> >>>
> >>>
> >>> Zhang, Yang Z wrote on 2014-12-19:
> >>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
> >>>> is set
> >>>>
> >>>> Wu, Feng wrote on 2014-12-19:
> >>>>>
> >>>>>
> >>>>> iommu-bounces@lists.linux-foundation.org wrote on
> >>>> mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of:
> >>>>>> Cc: iommu@lists.linux-foundation.org;
> >>>>>> linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> >>>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
> >>>>>> is set
> >>>>>>
> >>>>>> Paolo Bonzini wrote on 2014-12-18:
> >>>>>>>
> >>>>>>>
> >>>>>>> On 18/12/2014 04:14, Wu, Feng wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> linux-kernel-owner@vger.kernel.org wrote on
> >>>>>> mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo:
> >>>>>>>>> x86@kernel.org; Gleb Natapov; Paolo Bonzini;
> >>>>>>>>> dwmw2@infradead.org;
> >>>>>>>>> joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org; Alex
> Williamson;
> >>>>>>>>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
> >>>>>>>>> iommu@lists.linux-foundation.org;
> >>>>>>>>> linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; KVM
> list;
> >>>>>>>>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress
> >>>>>>>>> posted-interrupt when 'SN' is set
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 12/12/2014 16:14, Feng Wu wrote:
> >>>>>>>>>> Currently, we don't support urgent interrupt, all
> >>>>>>>>>> interrupts are recognized as non-urgent interrupt, so we
> >>>>>>>>>> cannot send posted-interrupt when 'SN' is set.
> >>>>>>>>>
> >>>>>>>>> Can this happen?  If the vcpu is in guest mode, it cannot
> >>>>>>>>> have been scheduled out, and that's the only case when SN is set.
> >>>>>>>>>
> >>>>>>>>> Paolo
> >>>>>>>>
> >>>>>>>> Currently, the only place where SN is set is vCPU is
> >>>>>>>> preempted and
> >>>>>>
> >>>>>> If the vCPU is preempted, shouldn't the subsequent be ignored?
> >>>>>> What happens if a PI is occurs when vCPU is preempted?
> >>>>>
> >>>>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
> >>>>> interrupts are suppressed for posting.
> >>>>
> >>>> I mean what happens if we don't set SN bit. From my point, if
> >>>> preempter already disabled the interrupt, it is ok to leave SN
> >>>> bit as zero. But if preempter enabled the interrupt, doesn't this
> >>>> mean he allow interrupt to happen? BTW, since there already has
> >>>> ON bit, so this means there only have one interrupt arrived at
> >>>> most and it doesn't hurt performance. Do we really need to set SN bit?
> >>>
> >>>
> >>> See this scenario:
> >>> vCPU0 is running on pCPU0
> >>> --> vCPU0 is preempted by vCPU1
> >>> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for
> >>> --> schedule in runqueue
> >>>
> >>> If the we don't set SN for vCPU0, then all subsequent interrupts
> >>> for
> >>> vCPU0 is posted to vCPU1, this will consume hardware and software
> >>
> >> The PI vector for vCPU1 is notification vector, but the PI vector
> >> for
> >> vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event?
> >
> > Wakeup vector is only used for blocking case, when vCPU is preempted
> > and waiting in the runqueue, the NV is the notification vector.
> 
> I see your point. But from performance point, if we can schedule the vCPU to
> another PCPU to handle the interrupt, it would helpful. But I remember current
> KVM will not schedule the vCPU in run queue (even though it got preempted) to
> another pCPU to run(Am I right?). So it may hard to do it.
> 

KVM uses the Linux scheduler. When the preempted vCPU (in the runqueue) is
scheduled again depends on the scheduling algorithm itself, so I think it is
a little hard for us to get involved.

I think what you mentioned is a little like the urgent interrupt in the VT-d
PI Spec. For this kind of interrupt, if an interrupt arrives for a preempted
vCPU (waiting in the run queue), we need to schedule the vCPU immediately.
This is a real-time feature, and we don't support urgent interrupts so far.
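
The SN/ON behaviour discussed in this thread can be sketched as a small
software model. This is only an illustration under stated assumptions: the
struct and function names (pi_desc_model, post_interrupt) are hypothetical,
the layout is not the real descriptor layout, and the "urgent" flag is
modelled only to show why non-urgent interrupts produce no notification
event while SN is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a posted-interrupt descriptor. */
struct pi_desc_model {
	uint64_t pir[4];	/* 256 posted-interrupt request bits */
	bool on;		/* Outstanding Notification */
	bool sn;		/* Suppress Notification */
};

/*
 * Model of what the hardware does when an interrupt is posted.
 * Returns true if a notification event would be generated.
 */
bool post_interrupt(struct pi_desc_model *pd, uint8_t vector, bool urgent)
{
	/* The request is always recorded in PIR, even when SN is set. */
	pd->pir[vector / 64] |= (uint64_t)1 << (vector % 64);

	/* A notification is already outstanding: do not send another. */
	if (pd->on)
		return false;

	/* SN suppresses notifications for non-urgent interrupts. */
	if (pd->sn && !urgent)
		return false;

	pd->on = true;
	return true;
}
```

In this model, an interrupt posted while SN is set is not lost: its PIR bit
stays pending until the vCPU runs again, but no notification event disturbs
whichever vCPU is currently running on that pCPU.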

Thanks,
Feng

> >
> > Thanks,
> > Feng
> >
> >>
> >>> efforts and in fact it is not needed at all. If SN is set for
> >>> vCPU0, VT-d hardware will not issue Notification Event for vCPU0
> >>> when an interrupt is for it, but just setting the related PIR bit.
> >>>
> >>> Thanks,
> >>> Feng
> >>>
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Feng
> >>>>>
> >>>>>>
> >>>>>>>> waiting for the next scheduling in the runqueue. But I am not
> >>>>>>>> sure whether we need to set SN for other purpose in future.
> >>>>>>>> Adding SN checking here is just to follow the Spec.
> >>>>>>>> non-urgent interrupts are suppressed
> >>>>>>> when SN is set.
> >>>>>>>
> >>>>>>> I would change that to a WARN_ON_ONCE then.
> >>>>>>
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Yang
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>> Best regards,
> >>>> Yang
> >>>>
> >>
> >>
> >> Best regards,
> >> Yang
> >>
> 
> 
> Best regards,
> Yang
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-19  5:46                       ` Wu, Feng
@ 2014-12-19  7:04                         ` Zhang, Yang Z
  0 siblings, 0 replies; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-19  7:04 UTC (permalink / raw)
  To: Wu, Feng, Paolo Bonzini, kvm; +Cc: iommu, linux-kernel, kvm

Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-19:
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Wu, Feng wrote on 2014-12-19:
>>> 
>>> 
>>> Zhang, Yang Z wrote on 2014-12-19:
>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>> is set
>>>> 
>>>> Wu, Feng wrote on 2014-12-19:
>>>>> 
>>>>> 
>>>>> Zhang, Yang Z wrote on 2014-12-19:
>>>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>>>> is set
>>>>>> 
>>>>>> Wu, Feng wrote on 2014-12-19:
>>>>>>> 
>>>>>>> 
>>>>>>> iommu-bounces@lists.linux-foundation.org wrote on
>>>>>> mailto:iommu-bounces@lists.linux-foundation.org] On Behalf Of:
>>>>>>>> Cc: iommu@lists.linux-foundation.org;
>>>>>>>> linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>>>>>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>>>>>> is set
>>>>>>>> 
>>>>>>>> Paolo Bonzini wrote on 2014-12-18:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 18/12/2014 04:14, Wu, Feng wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> linux-kernel-owner@vger.kernel.org wrote on
>>>>>>>> mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Paolo:
>>>>>>>>>>> x86@kernel.org; Gleb Natapov; Paolo Bonzini;
>>>>>>>>>>> dwmw2@infradead.org;
>>>>>>>>>>> joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org; Alex Williamson;
>>>>>>>>>>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
>>>>>>>>>>> iommu@lists.linux-foundation.org;
>>>>>>>>>>> linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> KVM
>> list;
>>>>>>>>>>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress
>>>>>>>>>>> posted-interrupt when 'SN' is set
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 12/12/2014 16:14, Feng Wu wrote:
>>>>>>>>>>>> Currently, we don't support urgent interrupt, all
>>>>>>>>>>>> interrupts are recognized as non-urgent interrupt, so we
>>>>>>>>>>>> cannot send posted-interrupt when 'SN' is set.
>>>>>>>>>>> 
>>>>>>>>>>> Can this happen?  If the vcpu is in guest mode, it cannot
>>>>>>>>>>> have been scheduled out, and that's the only case when SN is set.
>>>>>>>>>>> 
>>>>>>>>>>> Paolo
>>>>>>>>>> 
>>>>>>>>>> Currently, the only place where SN is set is vCPU is
>>>>>>>>>> preempted and
>>>>>>>> 
>>>>>>>> If the vCPU is preempted, shouldn't the subsequent be ignored?
>>>>>>>> What happens if a PI is occurs when vCPU is preempted?
>>>>>>> 
>>>>>>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
>>>>>>> interrupts are suppressed for posting.
>>>>>> 
>>>>>> I mean what happens if we don't set SN bit. From my point, if
>>>>>> preempter already disabled the interrupt, it is ok to leave SN
>>>>>> bit as zero. But if preempter enabled the interrupt, doesn't
>>>>>> this mean he allow interrupt to happen? BTW, since there
>>>>>> already has ON bit, so this means there only have one interrupt
>>>>>> arrived at most and it doesn't hurt performance. Do we really need to set SN bit?
>>>>> 
>>>>> 
>>>>> See this scenario:
>>>>> vCPU0 is running on pCPU0
>>>>> --> vCPU0 is preempted by vCPU1
>>>>> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for
>>>>> --> schedule in runqueue
>>>>> 
>>>>> If the we don't set SN for vCPU0, then all subsequent interrupts
>>>>> for
>>>>> vCPU0 is posted to vCPU1, this will consume hardware and
>>>>> software
>>>> 
>>>> The PI vector for vCPU1 is notification vector, but the PI vector
>>>> for
>>>> vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event?
>>> 
>>> Wakeup vector is only used for blocking case, when vCPU is
>>> preempted and waiting in the runqueue, the NV is the notification vector.
>> 
>> I see your point. But from performance point, if we can schedule the
>> vCPU to another PCPU to handle the interrupt, it would helpful. But I
>> remember current KVM will not schedule the vCPU in run queue (even
>> though it got preempted) to another pCPU to run(Am I right?). So it may
>> hard to do it.
>> 
> 
> KVM is using the Linux scheduler, when the preempted vCPU (in
> runqueue) is scheduled again depends on the scheduling algorithm
> itself, I think it is a little hard for us to get involved.
> 
> I think what you mentioned is a little like the urgent interrupt in VT-d PI Spec.
> For this kind of interrupts, if an interrupt is coming for an
> preempted vCPU (waiting in the run queue), we need to schedule the
> vCPU immediately. This is some real time things. And we don't support urgent interrupt so far.

Yes. IIRC, if we use the two global vectors mechanism properly, there should
be no need to use the hardware urgent interrupt mechanism. :)

> 
> Thanks,
> Feng
> 
>>> 
>>> Thanks,
>>> Feng
>>> 
>>>> 
>>>>> efforts and in fact it is not needed at all. If SN is set for
>>>>> vCPU0, VT-d hardware will not issue Notification Event for vCPU0
>>>>> when an interrupt is for it, but just setting the related PIR bit.
>>>>> 
>>>>> Thanks,
>>>>> Feng
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Feng
>>>>>>> 
>>>>>>>> 
>>>>>>>>>> waiting for the next scheduling in the runqueue. But I am
>>>>>>>>>> not sure whether we need to set SN for other purpose in future.
>>>>>>>>>> Adding SN checking here is just to follow the Spec.
>>>>>>>>>> non-urgent interrupts are suppressed
>>>>>>>>> when SN is set.
>>>>>>>>> 
>>>>>>>>> I would change that to a WARN_ON_ONCE then.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> Yang
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Best regards,
>>>>>> Yang
>>>>>> 
>>>> 
>>>> 
>>>> Best regards,
>>>> Yang
>>>> 
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-19  1:30       ` Wu, Feng
  2014-12-19  1:47         ` Zhang, Yang Z
@ 2014-12-19 11:59         ` Paolo Bonzini
  2014-12-19 23:48           ` Wu, Feng
  1 sibling, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-19 11:59 UTC (permalink / raw)
  To: Wu, Feng, Paolo Bonzini, Yang Zhang, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro,
	Alex Williamson, Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 19/12/2014 02:30, Wu, Feng wrote:
>>> How this can work well? All subsequent interrupts are delivered to
>>> one vCPU? It shouldn't be the best solution, need more consideration.
>>
>> Well, it's a hardware limitation.  The alternative (which is easy to
>> implement) is to only do PI for single-CPU interrupts.  This should work
>> well for multiqueue NICs (and of course for UP guests :)), so perhaps
>> it's a good idea to only support that as a first attempt.
>>
>> Paolo
> 
> Paolo, what do you mean by "single-CPU interrupts"? Do you mean we don't
> support lowest priority interrupts for PI? But Linux OS uses lowest priority
> for most of the case? If so, we can hardly get benefit from this feature for
> Linux guest OS.

You can post lowest priority interrupts if they are delivered to a
single CPU, in which case they are effectively fixed priority.

If they are broadcast to multiple CPUs, do not post them.
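
A minimal sketch of this rule, assuming the destination set is available as
a 64-bit vCPU bitmask. The function name single_dest_vcpu is hypothetical
and is not the interface proposed in the patch; it only shows the "post
only when exactly one vCPU is targeted" check.

```c
#include <stdint.h>

/*
 * Return the index of the only vCPU set in dest_mask, or -1 if the
 * mask is empty or names more than one vCPU (in which case the
 * interrupt should not be posted).
 */
int single_dest_vcpu(uint64_t dest_mask)
{
	int idx = 0;

	/* Exactly one bit set: non-zero, and clearing the lowest bit gives 0. */
	if (dest_mask == 0 || (dest_mask & (dest_mask - 1)) != 0)
		return -1;

	while (!(dest_mask & 1)) {
		dest_mask >>= 1;
		idx++;
	}
	return idx;
}
```

With such a check, a lowest priority interrupt whose mask names a single
vCPU is handled like a fixed interrupt, and anything else falls back to the
non-posted path.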

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-19  1:46       ` Zhang, Yang Z
@ 2014-12-19 11:59         ` Paolo Bonzini
  2014-12-23  0:37           ` Zhang, Yang Z
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-19 11:59 UTC (permalink / raw)
  To: Yang Zhang, Wu, Feng, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, Paolo Bonzini, dwmw2, joro,
	Alex Williamson, Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 19/12/2014 02:46, Zhang, Yang Z wrote:
>> If the IRQ is posted, its affinity is controlled by guest (irq <--->
>> vCPU <----> pCPU), it has no effect when host changes its affinity.
> 
> That's the problem: User is able to changes it in host but it never
> takes effect since it is actually controlled by guest. I guess it
> will break the IRQ balance too.

I don't think that's a problem.

Controlling the affinity in the host affects which CPU in the host takes
care of signaling the guest.

If this signaling is done directly by the chipset, there is no need to
do anything in the host and thus the host affinity can be bypassed.

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible
  2014-12-19  2:19     ` Wu, Feng
@ 2014-12-19 11:59       ` Paolo Bonzini
  2014-12-19 23:39         ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-19 11:59 UTC (permalink / raw)
  To: Wu, Feng, linux-kernel; +Cc: iommu, kvm



On 19/12/2014 03:19, Wu, Feng wrote:
>>> > >
>>> > > +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>>> > > +
>>> > > +struct kvm_irq_routing_table {
>>> > > +	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
>>> > > +	struct kvm_kernel_irq_routing_entry *rt_entries;
>>> > > +	u32 nr_rt_entries;
>>> > > +	/*
>>> > > +	 * Array indexed by gsi. Each entry contains list of irq chips
>>> > > +	 * the gsi is connected to.
>>> > > +	 */
>>> > > +	struct hlist_head map[0];
>>> > > +};
>>> > > +
>>> > > +#else
>>> > > +
>>> > > +struct kvm_irq_routing_table {};
>> > 
>> > If possible, just make this "struct kvm_irq_routing_table;" and pull
>> > this line to include/linux/kvm_types.h.
>> > 
>> > Paolo
> Do you mean move the definition of struct kvm_irq_routing_table
> to include/linux/kvm_types.h and add a declaration here?

Move

struct kvm_irq_routing_table;

to include/linux/kvm_types.h.  In kvm_host.h, leave the #ifdef with the
full definition but drop the #else.
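
The pattern can be sketched in a single file as follows. All names here
(routing_table_example, vm_example, EXAMPLE_HAVE_IRQ_ROUTING) are
hypothetical stand-ins, not the real KVM definitions: the forward
declaration plays the role of the line in kvm_types.h, and the #ifdef block
plays the role of the full definition in kvm_host.h, with no #else branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* "Types header": forward declaration only, always visible. */
struct routing_table_example;

/* Code that only stores or passes a pointer needs nothing more. */
struct vm_example {
	struct routing_table_example *irq_routing;
};

/* "Host header": full definition only when the feature is configured. */
#ifdef EXAMPLE_HAVE_IRQ_ROUTING
struct routing_table_example {
	unsigned int nr_rt_entries;
};
#endif

/* Compiles either way: it never looks inside the struct. */
bool vm_has_routing(const struct vm_example *vm)
{
	return vm->irq_routing != NULL;
}
```

Because a pointer to an incomplete type is valid C, the empty placeholder
struct in the #else branch is not needed; only code that dereferences the
table has to sit under the #ifdef.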

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-19  5:25                     ` Zhang, Yang Z
  2014-12-19  5:46                       ` Wu, Feng
@ 2014-12-19 12:00                       ` Paolo Bonzini
  2014-12-19 23:34                         ` Wu, Feng
  1 sibling, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-19 12:00 UTC (permalink / raw)
  To: Yang Zhang, Wu, Feng, Paolo Bonzini, KVM list; +Cc: iommu, linux-kernel



On 19/12/2014 06:25, Zhang, Yang Z wrote:
> I see your point. But from performance point, if we can schedule the
> vCPU to another PCPU to handle the interrupt, it would helpful. But I
> remember current KVM will not schedule the vCPU in run queue (even
> though it got preempted) to another pCPU to run(Am I right?). So it
> may hard to do it.

Yes.  If the vCPU is in the run queue, it means it exhausted its
quantum.  As Feng said, the scheduler can decide to migrate it to
another pCPU, or it can decide to leave it runnable but not start it.
KVM doesn't try to force the scheduler one way or the other.

If the vCPU is I/O bound, it will not exhaust its quantum and will not
be preempted.  It will block, and the wakeup vector will restart it.
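
The three vCPU states mentioned in this thread can be summarised in a small
sketch. The enum, struct, and function names are hypothetical, and the
vector numbers are placeholders chosen for illustration only; the mapping
itself follows what was said above: the wakeup vector is used only for a
blocked vCPU, and SN is set only while a vCPU is preempted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder vector numbers, for illustration only. */
#define EXAMPLE_NOTIFICATION_VECTOR	0xf2
#define EXAMPLE_WAKEUP_VECTOR		0xf1

enum vcpu_state_example {
	VCPU_RUNNING,	/* executing in guest mode */
	VCPU_PREEMPTED,	/* runnable, waiting in the run queue */
	VCPU_BLOCKED,	/* halted, waiting for an event */
};

struct pi_ctrl_example {
	uint8_t nv;	/* notification vector programmed in the descriptor */
	bool sn;	/* suppress notification */
};

struct pi_ctrl_example pi_ctrl_for_state(enum vcpu_state_example state)
{
	struct pi_ctrl_example c = { EXAMPLE_NOTIFICATION_VECTOR, false };

	switch (state) {
	case VCPU_RUNNING:
		break;			/* notify the running vCPU directly */
	case VCPU_PREEMPTED:
		c.sn = true;		/* record in PIR only, no event */
		break;
	case VCPU_BLOCKED:
		c.nv = EXAMPLE_WAKEUP_VECTOR;	/* event wakes the vCPU */
		break;
	}
	return c;
}
```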

I don't think urgent notifications are interesting.  If you want to do
real time work, pin the vCPU to a physical CPU, and isolate the pCPU
with isolcpus.  Then the vCPU will always be running.

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
  2014-12-19 12:00                       ` Paolo Bonzini
@ 2014-12-19 23:34                         ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2014-12-19 23:34 UTC (permalink / raw)
  To: Paolo Bonzini, Zhang, Yang Z, KVM list; +Cc: iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Friday, December 19, 2014 8:01 PM
> To: Zhang, Yang Z; Wu, Feng; Paolo Bonzini; KVM list
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> 
> 
> 
> On 19/12/2014 06:25, Zhang, Yang Z wrote:
> > I see your point. But from performance point, if we can schedule the
> > vCPU to another PCPU to handle the interrupt, it would helpful. But I
> > remember current KVM will not schedule the vCPU in run queue (even
> > though it got preempted) to another pCPU to run(Am I right?). So it
> > may hard to do it.
> 
> Yes.  If the vCPU is in the run queue, it means it exhausted its
> quantum.  As Feng said, the scheduler can decide to migrate it to
> another pCPU, or it can decide to leave it runnable but not start it.
> KVM doesn't try to force the scheduler one way or the other.
> 
> If the vCPU is I/O bound, it will not exhaust its quantum and will not
> be preempted.  It will block, and the wakeup vector will restart it.
> 
> I don't think urgent notifications are interesting.  If you want to do
> real time work, pin the vCPU to a physical CPU, and isolate the pCPU
> with isolcpus.  Then the vCPU will always be running.
> 
> Paolo

I agree, thanks Paolo!

Thanks,
Feng

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible
  2014-12-19 11:59       ` Paolo Bonzini
@ 2014-12-19 23:39         ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2014-12-19 23:39 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel; +Cc: iommu, kvm, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Friday, December 19, 2014 8:00 PM
> To: Wu, Feng; linux-kernel@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible
> 
> 
> 
> On 19/12/2014 03:19, Wu, Feng wrote:
> >>> > >
> >>> > > +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> >>> > > +
> >>> > > +struct kvm_irq_routing_table {
> >>> > > +	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> >>> > > +	struct kvm_kernel_irq_routing_entry *rt_entries;
> >>> > > +	u32 nr_rt_entries;
> >>> > > +	/*
> >>> > > +	 * Array indexed by gsi. Each entry contains list of irq chips
> >>> > > +	 * the gsi is connected to.
> >>> > > +	 */
> >>> > > +	struct hlist_head map[0];
> >>> > > +};
> >>> > > +
> >>> > > +#else
> >>> > > +
> >>> > > +struct kvm_irq_routing_table {};
> >> >
> >> > If possible, just make this "struct kvm_irq_routing_table;" and pull
> >> > this line to include/linux/kvm_types.h.
> >> >
> >> > Paolo
> > Do you mean move the definition of struct kvm_irq_routing_table
> > to include/linux/kvm_types.h and add a declaration here?
> 
> Move
> 
> struct kvm_irq_routing_table;
> 
> to include/linux/kvm_types.h.  In kvm_host.h, leave the #ifdef with the
> full definition but drop the #else.
> 
> Paolo


Paolo, thanks for the explanation. I notice that "struct kvm_irq_routing_table;"
is already in include/linux/kvm_types.h.

Thanks,
Feng

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-19 11:59         ` Paolo Bonzini
@ 2014-12-19 23:48           ` Wu, Feng
  2014-12-20 13:16             ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-19 23:48 UTC (permalink / raw)
  To: Paolo Bonzini, Zhang, Yang Z, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Friday, December 19, 2014 7:59 PM
> To: Wu, Feng; Paolo Bonzini; Zhang, Yang Z; Thomas Gleixner; Ingo Molnar; H.
> Peter Anvin; x86@kernel.org; Gleb Natapov; dwmw2@infradead.org;
> joro@8bytes.org; Alex Williamson; Jiang Liu
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
> VT-d PI
> 
> 
> 
> On 19/12/2014 02:30, Wu, Feng wrote:
> >>> How this can work well? All subsequent interrupts are delivered to
> >>> one vCPU? It shouldn't be the best solution, need more consideration.
> >>
> >> Well, it's a hardware limitation.  The alternative (which is easy to
> >> implement) is to only do PI for single-CPU interrupts.  This should work
> >> well for multiqueue NICs (and of course for UP guests :)), so perhaps
> >> it's a good idea to only support that as a first attempt.
> >>
> >> Paolo
> >
> > Paolo, what do you mean by "single-CPU interrupts"? Do you mean we don't
> > support lowest priority interrupts for PI? But Linux OS uses lowest priority
> > for most of the case? If so, we can hardly get benefit from this feature for
> > Linux guest OS.
> 
> You can post lowest priority interrupts if they are delivered to a
> single CPU, in which case they are effectively fixed priority.
> 
> If they are broadcast to multiple CPUs, do not post them.
> 
> Paolo

In my understanding, lowest priority interrupts are always delivered to a
single CPU, and we need to find the right destination CPU from the cpumask.
This is what I do in this patch. Did I misunderstand something in your
comments? Thanks a lot!

Actually, we don't support posting broadcast/multicast interrupts, because
each interrupt is associated with one Posted-Interrupts descriptor, and
therefore with one vCPU.

Thanks,
Feng


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-19 23:48           ` Wu, Feng
@ 2014-12-20 13:16             ` Paolo Bonzini
  2014-12-22  4:48               ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-20 13:16 UTC (permalink / raw)
  To: Wu, Feng, Paolo Bonzini, Yang Zhang, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro,
	Alex Williamson, Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 20/12/2014 00:48, Wu, Feng wrote:
> In my understanding, lowest priority interrupts are always delivered to a
> Single CPU, we need to find the right destination CPU from the cpumask.

Yes, but which CPU that is differs every time the interrupt is
delivered.  So the emulation here is a bit poor.  For now, please limit
PI to fixed interrupts.

> Actually, we don't support posting broadcast/multicast interrupts, because
> the interrupt is associated with one Posted-interrupts descriptor, then one
> vCPU.

Understood.

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-20 13:16             ` Paolo Bonzini
@ 2014-12-22  4:48               ` Wu, Feng
  2014-12-22  9:27                 ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-22  4:48 UTC (permalink / raw)
  To: Paolo Bonzini, Zhang, Yang Z, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Saturday, December 20, 2014 9:17 PM
> To: Wu, Feng; Paolo Bonzini; Zhang, Yang Z; Thomas Gleixner; Ingo Molnar; H.
> Peter Anvin; x86@kernel.org; Gleb Natapov; dwmw2@infradead.org;
> joro@8bytes.org; Alex Williamson; Jiang Liu
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
> VT-d PI
> 
> 
> 
> On 20/12/2014 00:48, Wu, Feng wrote:
> > In my understanding, lowest priority interrupts are always delivered to a
> > Single CPU, we need to find the right destination CPU from the cpumask.
> 
> Yes, but which CPU however differs every time the interrupt is
> delivered.  So the emulation here is a bit poor.  For now, please limit
> PI to fixed interrupts.

Do you mean we don't support lowest priority interrupts? As I mentioned before,
lowest priority interrupts are widely used in Linux, so I think supporting them
is very important for Linux guests. Do you have any ideas/suggestions about
how to support lowest priority interrupts for PI? Thanks a lot!

Thanks,
Feng

> 
> > Actually, we don't support posting broadcast/multicast interrupts, because
> > the interrupt is associated with one Posted-interrupts descriptor, then one
> > vCPU.
> 
> Understood.
> 
> Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-22  4:48               ` Wu, Feng
@ 2014-12-22  9:27                 ` Paolo Bonzini
  2014-12-22 11:04                   ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-22  9:27 UTC (permalink / raw)
  To: Wu, Feng, Zhang, Yang Z, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 22/12/2014 05:48, Wu, Feng wrote:
> Do you mean we don't support Lowest priority interrupts? As I mentioned before,
> Lowest priority interrupts is widely used in Linux, so I think supporting lowest priority
> interrupts is very important for Linux guest OS. Do you have any ideas/suggestions about
> how to support Lowest priority interrupts for PI? Thanks a lot!

Can you support them only if the destination is a single CPU?

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-22  9:27                 ` Paolo Bonzini
@ 2014-12-22 11:04                   ` Wu, Feng
  2014-12-22 11:06                     ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-22 11:04 UTC (permalink / raw)
  To: Paolo Bonzini, Zhang, Yang Z, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Monday, December 22, 2014 5:28 PM
> To: Wu, Feng; Zhang, Yang Z; Thomas Gleixner; Ingo Molnar; H. Peter Anvin;
> x86@kernel.org; Gleb Natapov; dwmw2@infradead.org; joro@8bytes.org;
> Alex Williamson; Jiang Liu
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
> VT-d PI
> 
> 
> 
> On 22/12/2014 05:48, Wu, Feng wrote:
> > Do you mean we don't support Lowest priority interrupts? As I mentioned
> before,
> > Lowest priority interrupts is widely used in Linux, so I think supporting lowest
> priority
> > interrupts is very important for Linux guest OS. Do you have any
> ideas/suggestions about
> > how to support Lowest priority interrupts for PI? Thanks a lot!
> 
> Can you support them only if the destination is a single CPU?

Sorry, I don't quite understand this. What does "single CPU" mean here?
Lowest-priority interrupts always have a cpumask which contains multiple CPUs.

Thanks,
Feng

> 
> Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-22 11:04                   ` Wu, Feng
@ 2014-12-22 11:06                     ` Paolo Bonzini
  2014-12-22 11:17                       ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-22 11:06 UTC (permalink / raw)
  To: Wu, Feng, Zhang, Yang Z, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 22/12/2014 12:04, Wu, Feng wrote:
> > Can you support them only if the destination is a single CPU?
>
> Sorry, I am not quite understand this. I still don't understand the "single CPU" here.
> Lowest priority interrupts always have a cpumask which contains multiple CPU.

Yes, and those need not be accelerated.  But what if you set affinity to
a single CPU?

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-22 11:06                     ` Paolo Bonzini
@ 2014-12-22 11:17                       ` Wu, Feng
  2014-12-22 11:23                         ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-22 11:17 UTC (permalink / raw)
  To: Paolo Bonzini, Zhang, Yang Z, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Monday, December 22, 2014 7:07 PM
> To: Wu, Feng; Zhang, Yang Z; Thomas Gleixner; Ingo Molnar; H. Peter Anvin;
> x86@kernel.org; Gleb Natapov; dwmw2@infradead.org; joro@8bytes.org;
> Alex Williamson; Jiang Liu
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
> VT-d PI
> 
> 
> 
> On 22/12/2014 12:04, Wu, Feng wrote:
> > > Can you support them only if the destination is a single CPU?
> >
> > Sorry, I am not quite understand this. I still don't understand the "single CPU"
> here.
> > Lowest priority interrupts always have a cpumask which contains multiple
> CPU.
> 
> Yes, and those need not be accelerated.  But what if you set affinity to
> a single CPU?

How do I set affinity to a single CPU if the guest configures a lowest-priority interrupt? Thanks a lot!

Thanks,
Feng

> 
> Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-22 11:17                       ` Wu, Feng
@ 2014-12-22 11:23                         ` Paolo Bonzini
  2014-12-22 14:13                           ` Yong Wang
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-22 11:23 UTC (permalink / raw)
  To: Wu, Feng, Zhang, Yang Z, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 22/12/2014 12:17, Wu, Feng wrote:
>> Yes, and those need not be accelerated.  But what if you set
>> affinity to a single CPU?
>
> How do I set affinity to a single CPU if guest configure a lowest
> priority interrupt? Thanks a lot!

I mean if the guest (via irqbalance and /proc/irq/) configures affinity
to a single vCPU.  In that case, you can use PI.
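
A minimal sketch of the check suggested above, assuming the candidate
destinations are already available as a bitmap of vCPU indices. The helper
name pi_single_dest() and the 64-bit mask are hypothetical and not taken
from the patch series:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a lowest-priority interrupt is eligible for
 * posting only if exactly one vCPU matches its destination.
 * Bit i of dest_vcpu_mask set means vCPU i is a candidate.
 */
bool pi_single_dest(uint64_t dest_vcpu_mask, unsigned int *vcpu_id)
{
	unsigned int id = 0;

	/* Exactly one bit set: non-zero and a power of two. */
	if (dest_vcpu_mask == 0 ||
	    (dest_vcpu_mask & (dest_vcpu_mask - 1)) != 0)
		return false;

	/* Find the index of the single set bit. */
	while (!(dest_vcpu_mask & 1)) {
		dest_vcpu_mask >>= 1;
		id++;
	}
	*vcpu_id = id;
	return true;
}
```

With this kind of check, a mask naming several vCPUs would stay on the
remapped (non-posted) path, and only the single-destination case is posted.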

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-22 11:23                         ` Paolo Bonzini
@ 2014-12-22 14:13                           ` Yong Wang
  0 siblings, 0 replies; 140+ messages in thread
From: Yong Wang @ 2014-12-22 14:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wu, Feng, Zhang, Yang Z, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu, iommu, linux-kernel, KVM list, Eric Auger

On Mon, Dec 22, 2014 at 12:23:36PM +0100, Paolo Bonzini wrote:
> 
> 
> On 22/12/2014 12:17, Wu, Feng wrote:
> >> Yes, and those need not be accelerated.  But what if you set
> >> affinity to a single CPU?
> >
> > How do I set affinity to a single CPU if guest configure a lowest
> > priority interrupt? Thanks a lot!
> 
> I mean if the guest (via irqbalance and /proc/irq/) configures affinity
> to a single vCPU.  In that case, you can use PI.
> 

The problem is that we still need to support PI with lowest-priority delivery mode
even if the guest does not configure irq affinity via /proc/irq/, don't we?

Thanks
-Yong


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-19 11:59         ` Paolo Bonzini
@ 2014-12-23  0:37           ` Zhang, Yang Z
  2014-12-23  8:47             ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-23  0:37 UTC (permalink / raw)
  To: Paolo Bonzini, Wu, Feng, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger

Paolo Bonzini wrote on 2014-12-19:
> 
> 
> On 19/12/2014 02:46, Zhang, Yang Z wrote:
>>> If the IRQ is posted, its affinity is controlled by guest (irq
>>> <---> vCPU <----> pCPU), it has no effect when host changes its affinity.
>> 
>> That's the problem: User is able to changes it in host but it never
>> takes effect since it is actually controlled by guest. I guess it
>> will break the IRQ balance too.
> 
> I don't think that's a problem.
> 
> Controlling the affinity in the host affects which CPU in the host
> takes care of signaling the guest.
> 
> If this signaling is done directly by the chipset, there is no need to
> do anything in the host and thus the host affinity can be bypassed.

I don't quite understand it. If a user sets an interrupt's affinity to a CPU,
but still sees the interrupt delivered to other CPUs in the host, do you think
that is the right behavior?

> 
> Paolo


Best regards,
Yang


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-23  0:37           ` Zhang, Yang Z
@ 2014-12-23  8:47             ` Paolo Bonzini
  2014-12-23  9:07               ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-23  8:47 UTC (permalink / raw)
  To: Yang Zhang, Paolo Bonzini, Wu, Feng, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro,
	Alex Williamson, Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 23/12/2014 01:37, Zhang, Yang Z wrote:
> I don't quite understand it. If user set an interrupt's affinity to a
> CPU, but he still see the interrupt delivers to other CPUs in host.
> Do you think it is a right behavior?

No, the interrupt is not delivered at all in the host.  Normally you'd have:

- interrupt delivered to CPU from host affinity

- VFIO interrupt handler writes to irqfd

- interrupt delivered to vCPU from guest affinity

Here, you just skip the first two steps.  The interrupt is delivered to
the thread that is running the vCPU directly, so the host affinity is
bypassed entirely.
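
The difference between the two paths can be modeled as below. This is only
an illustrative sketch for the case where the vCPU is running; the enum,
struct and function names are hypothetical and do not exist in the series:

```c
#include <stdbool.h>

/* Hypothetical model of the two delivery paths for an assigned device. */
enum delivery_mode {
	DELIVERY_REMAPPED,	/* classic VFIO + irqfd path */
	DELIVERY_POSTED,	/* VT-d posted interrupt, vCPU running */
};

struct delivery_path {
	bool host_cpu_takes_irq;	/* step 1: CPU chosen by host affinity */
	bool vfio_handler_runs;		/* step 2: handler writes to irqfd */
	bool guest_affinity_used;	/* step 3: vCPU chosen by guest affinity */
};

/* Posting skips steps 1 and 2; guest affinity applies in both modes. */
struct delivery_path describe_path(enum delivery_mode mode)
{
	struct delivery_path p = {
		.host_cpu_takes_irq = (mode == DELIVERY_REMAPPED),
		.vfio_handler_runs = (mode == DELIVERY_REMAPPED),
		.guest_affinity_used = true,
	};
	return p;
}
```

The blocked-vCPU case is not modeled here.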

... unless you are considering the case where the vCPU is blocked and
the host is processing the posted interrupt wakeup vector.  In that case
yes, it would be better to set NDST to a CPU matching the host affinity.
 But it would be handled in patch 24.  We also have the same problem
with lowest-priority interrupts; likely the host has configured the
interrupt affinity for any CPU.  So we can do it later when we add
vector hashing support.  In the meanwhile, Feng, please add a FIXME comment.

Does this make sense?

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-23  8:47             ` Paolo Bonzini
@ 2014-12-23  9:07               ` Wu, Feng
  2014-12-23  9:34                 ` Paolo Bonzini
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-23  9:07 UTC (permalink / raw)
  To: Paolo Bonzini, Zhang, Yang Z, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Tuesday, December 23, 2014 4:48 PM
> To: Zhang, Yang Z; Paolo Bonzini; Wu, Feng; Thomas Gleixner; Ingo Molnar; H.
> Peter Anvin; x86@kernel.org; Gleb Natapov; dwmw2@infradead.org;
> joro@8bytes.org; Alex Williamson; Jiang Liu
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: Re: [v3 06/26] iommu, x86: No need to migrating irq for VT-d
> Posted-Interrupts
> 
> 
> 
> On 23/12/2014 01:37, Zhang, Yang Z wrote:
> > I don't quite understand it. If user set an interrupt's affinity to a
> > CPU, but he still see the interrupt delivers to other CPUs in host.
> > Do you think it is a right behavior?
> 
> No, the interrupt is not delivered at all in the host.  Normally you'd have:
> 
> - interrupt delivered to CPU from host affinity
> 
> - VFIO interrupt handler writes to irqfd
> 
> - interrupt delivered to vCPU from guest affinity
> 
> Here, you just skip the first two steps.  The interrupt is delivered to
> the thread that is running the vCPU directly, so the host affinity is
> bypassed entirely.
> 
> ... unless you are considering the case where the vCPU is blocked and
> the host is processing the posted interrupt wakeup vector.  In that case
> yes, it would be better to set NDST to a CPU matching the host affinity.

In my understanding, the wakeup vector should have no relationship with the
host affinity of the irq. The wakeup notification event should be delivered to
the pCPU on which the vCPU was blocked. And from the kernel's point of view,
the irq is not associated with the wakeup vector, right?

Thanks,
Feng

>  But it would be handled in patch 24.  We also have the same problem
> with lowest-priority interrupts; likely the host has configured the
> interrupt affinity for any CPU.  So we can do it later when we add
> vector hashing support.  In the meanwhile, Feng, please add a FIXME
> comment.
> 
> Does this make sense?
> 
> Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-23  9:07               ` Wu, Feng
@ 2014-12-23  9:34                 ` Paolo Bonzini
  2014-12-24  1:38                   ` Zhang, Yang Z
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2014-12-23  9:34 UTC (permalink / raw)
  To: Wu, Feng, Zhang, Yang Z, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger



On 23/12/2014 10:07, Wu, Feng wrote:
>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>> I don't quite understand it. If user set an interrupt's affinity to a
>>> CPU, but he still see the interrupt delivers to other CPUs in host.
>>> Do you think it is a right behavior?
>>
>> No, the interrupt is not delivered at all in the host.  Normally you'd have:
>>
>> - interrupt delivered to CPU from host affinity
>>
>> - VFIO interrupt handler writes to irqfd
>>
>> - interrupt delivered to vCPU from guest affinity
>>
>> Here, you just skip the first two steps.  The interrupt is delivered to
>> the thread that is running the vCPU directly, so the host affinity is
>> bypassed entirely.
>>
>> ... unless you are considering the case where the vCPU is blocked and
>> the host is processing the posted interrupt wakeup vector.  In that case
>> yes, it would be better to set NDST to a CPU matching the host affinity.
> 
> In my understanding, wakeup vector should have no relationship with the
> host affinity of the irq. Wakeup notification event should be delivered to
> the pCPU which the vCPU was blocked on. And in kernel's point of view,
> the irq is not associated with the wakeup vector, right?

That is correct indeed.  It is not associated to the wakeup vector,
hence this patch is right, I think.

However, the wakeup vector has the same function as the VFIO interrupt
handler, so you could argue that it is tied to the host affinity rather
than the guest.  Let's wait for Yang to answer.

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-23  9:34                 ` Paolo Bonzini
@ 2014-12-24  1:38                   ` Zhang, Yang Z
  2014-12-24  2:12                     ` Jiang Liu
  0 siblings, 1 reply; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-24  1:38 UTC (permalink / raw)
  To: Paolo Bonzini, Wu, Feng, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson,
	Jiang Liu
  Cc: iommu, linux-kernel, KVM list, Eric Auger

Paolo Bonzini wrote on 2014-12-23:
> 
> 
> On 23/12/2014 10:07, Wu, Feng wrote:
>>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>>> I don't quite understand it. If user set an interrupt's affinity
>>>> to a CPU, but he still see the interrupt delivers to other CPUs in host.
>>>> Do you think it is a right behavior?
>>> 
>>> No, the interrupt is not delivered at all in the host.  Normally you'd have:
>>> 
>>> - interrupt delivered to CPU from host affinity
>>> 
>>> - VFIO interrupt handler writes to irqfd
>>> 
>>> - interrupt delivered to vCPU from guest affinity
>>> 
>>> Here, you just skip the first two steps.  The interrupt is
>>> delivered to the thread that is running the vCPU directly, so the
>>> host affinity is bypassed entirely.
>>> 
>>> ... unless you are considering the case where the vCPU is blocked
>>> and the host is processing the posted interrupt wakeup vector.  In
>>> that case yes, it would be better to set NDST to a CPU matching the host affinity.
>> 
>> In my understanding, wakeup vector should have no relationship with
>> the host affinity of the irq. Wakeup notification event should be
>> delivered to the pCPU which the vCPU was blocked on. And in kernel's
>> point of view, the irq is not associated with the wakeup vector, right?
> 
> That is correct indeed.  It is not associated to the wakeup vector,
> hence this patch is right, I think.
> 
> However, the wakeup vector has the same function as the VFIO interrupt
> handler, so you could argue that it is tied to the host affinity
> rather than the guest.  Let's wait for Yang to answer.

Actually, that's my original question too. I am wondering what happens if the
user changes the assigned device's affinity in the host's /proc/irq/. If
ignoring it is acceptable, then this patch is OK. But this discussion seems to
be outside my scope; we need some experts to tell us their opinion, since it
will impact the user experience.

> 
> Paolo


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-24  1:38                   ` Zhang, Yang Z
@ 2014-12-24  2:12                     ` Jiang Liu
  2014-12-24  2:32                       ` Zhang, Yang Z
  0 siblings, 1 reply; 140+ messages in thread
From: Jiang Liu @ 2014-12-24  2:12 UTC (permalink / raw)
  To: Zhang, Yang Z, Paolo Bonzini, Wu, Feng, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro,
	Alex Williamson
  Cc: iommu, linux-kernel, KVM list, Eric Auger

On 2014/12/24 9:38, Zhang, Yang Z wrote:
> Paolo Bonzini wrote on 2014-12-23:
>>
>>
>> On 23/12/2014 10:07, Wu, Feng wrote:
>>>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>>>> I don't quite understand it. If user set an interrupt's affinity
>>>>> to a CPU, but he still see the interrupt delivers to other CPUs in host.
>>>>> Do you think it is a right behavior?
>>>>
>>>> No, the interrupt is not delivered at all in the host.  Normally you'd have:
>>>>
>>>> - interrupt delivered to CPU from host affinity
>>>>
>>>> - VFIO interrupt handler writes to irqfd
>>>>
>>>> - interrupt delivered to vCPU from guest affinity
>>>>
>>>> Here, you just skip the first two steps.  The interrupt is
>>>> delivered to the thread that is running the vCPU directly, so the
>>>> host affinity is bypassed entirely.
>>>>
>>>> ... unless you are considering the case where the vCPU is blocked
>>>> and the host is processing the posted interrupt wakeup vector.  In
>>>> that case yes, it would be better to set NDST to a CPU matching the host affinity.
>>>
>>> In my understanding, wakeup vector should have no relationship with
>>> the host affinity of the irq. Wakeup notification event should be
>>> delivered to the pCPU which the vCPU was blocked on. And in kernel's
>>> point of view, the irq is not associated with the wakeup vector, right?
>>
>> That is correct indeed.  It is not associated to the wakeup vector,
>> hence this patch is right, I think.
>>
>> However, the wakeup vector has the same function as the VFIO interrupt
>> handler, so you could argue that it is tied to the host affinity
>> rather than the guest.  Let's wait for Yang to answer.
> 
> Actually, that's my original question too. I am wondering what happens if the user changes the assigned device's affinity in host's /proc/irq/? If ignore it is acceptable, then this patch is ok. But it seems the discussion out of my scope, need some experts to tell us their idea since it will impact the user experience. 
Hi Yang,
	Originally we had a proposal to return failure when the user
sets IRQ affinity through native OS interfaces if an IRQ is in PI
mode. But that proposal would break CPU hot-removal, because the OS needs
to migrate away all IRQs bound to the CPU being offlined. So we then
proposed saving the user's IRQ affinity setting without changing the hardware
configuration (keeping the PI configuration). Later, when PI mode is
disabled, the cached affinity setting will be used to set up the IRQ
destination for the native OS. On the other hand, an IRQ in PI mode
won't be delivered to the native OS, so the user may not notice that
the IRQ is delivered to CPUs other than those in the affinity set.
In that respect, I think it's acceptable:)
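
A minimal sketch of the "cache the setting, apply it later" behaviour
described above. The structure and function names are hypothetical and
only illustrate the idea; they are not the code in this series:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-IRQ state; names are illustrative only. */
struct pi_irq_state {
	bool posted;		/* IRQ is currently in PI mode */
	uint64_t cached_mask;	/* last affinity requested by the user */
	uint64_t hw_mask;	/* affinity programmed for host delivery */
};

/* Never fails, so CPU hot-removal can still migrate IRQs away. */
int pi_set_affinity(struct pi_irq_state *s, uint64_t mask)
{
	s->cached_mask = mask;
	if (!s->posted)
		s->hw_mask = mask;	/* remapped mode: apply immediately */
	return 0;
}

/* Leaving PI mode applies whatever the user asked for in the meantime. */
void pi_disable_posting(struct pi_irq_state *s)
{
	s->posted = false;
	s->hw_mask = s->cached_mask;
}
```

While the IRQ is posted, a write to the affinity succeeds but only updates
the cached value; the hardware destination changes once posting is disabled.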
Regards!
Gerry
> 
>>
>> Paolo
> 
> 
> Best regards,
> Yang
> 
> 

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-24  2:12                     ` Jiang Liu
@ 2014-12-24  2:32                       ` Zhang, Yang Z
  2014-12-24  3:08                         ` Wu, Feng
  2014-12-24  4:54                         ` Jiang Liu
  0 siblings, 2 replies; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-24  2:32 UTC (permalink / raw)
  To: Jiang Liu, Paolo Bonzini, Wu, Feng, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson
  Cc: iommu, linux-kernel, KVM list, Eric Auger

Jiang Liu wrote on 2014-12-24:
> On 2014/12/24 9:38, Zhang, Yang Z wrote:
>> Paolo Bonzini wrote on 2014-12-23:
>>> 
>>> 
>>> On 23/12/2014 10:07, Wu, Feng wrote:
>>>>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>>>>> I don't quite understand it. If user set an interrupt's affinity
>>>>>> to a CPU, but he still see the interrupt delivers to other CPUs in host.
>>>>>> Do you think it is a right behavior?
>>>>> 
>>>>> No, the interrupt is not delivered at all in the host.  Normally you'd have:
>>>>> 
>>>>> - interrupt delivered to CPU from host affinity
>>>>> 
>>>>> - VFIO interrupt handler writes to irqfd
>>>>> 
>>>>> - interrupt delivered to vCPU from guest affinity
>>>>> 
>>>>> Here, you just skip the first two steps.  The interrupt is
>>>>> delivered to the thread that is running the vCPU directly, so the
>>>>> host affinity is bypassed entirely.
>>>>> 
>>>>> ... unless you are considering the case where the vCPU is blocked
>>>>> and the host is processing the posted interrupt wakeup vector.
>>>>> In that case yes, it would be better to set NDST to a CPU
>>>>> matching the host
> affinity.
>>>> 
>>>> In my understanding, wakeup vector should have no relationship
>>>> with the host affinity of the irq. Wakeup notification event
>>>> should be delivered to the pCPU which the vCPU was blocked on. And
>>>> in kernel's point of view, the irq is not associated with the wakeup vector, right?
>>> 
>>> That is correct indeed.  It is not associated to the wakeup vector,
>>> hence this patch is right, I think.
>>> 
>>> However, the wakeup vector has the same function as the VFIO
>>> interrupt handler, so you could argue that it is tied to the host
>>> affinity rather than the guest.  Let's wait for Yang to answer.
>> 
>> Actually, that's my original question too. I am wondering what
>> happens if the
> user changes the assigned device's affinity in host's /proc/irq/? If
> ignore it is acceptable, then this patch is ok. But it seems the
> discussion out of my scope, need some experts to tell us their idea since it will impact the user experience.
> Hi Yang,

Hi Jiang,

> 	Originally we have a proposal to return failure when user sets IRQ
> affinity through native OS interfaces if an IRQ is in PI mode. But
> that proposal will break CPU hot-removal because OS needs to migrate
> away all IRQs binding to the CPU to be offlined. Then we propose
> saving user IRQ affinity setting without changing hardware
> configuration (keeping PI configuration). Later when PI mode is
> disabled, the cached affinity setting will be used to setup IRQ
> destination for native OS. On the other hand, for IRQ in PI mode, it
> won't be delivered to native OS, so user may not sense that the IRQ is delivered to CPUs other than those in the affinity set.

The IRQ is still there, but it will be delivered to the host in the form of a
PI event (if the vCPU is running in root mode). I am not sure whether those
interrupts should be reflected in /proc/interrupts. If the answer is yes, then
which entry should be used: a new PI entry or the original IRQ entry?

> In that aspect, I think it's acceptable:) Regards!

Yes, if all of you guys (especially the IRQ maintainer) think it is acceptable,
then we can follow the current implementation and document it.

> Gerry
>> 
>>> 
>>> Paolo
>> 
>> 
>> Best regards,
>> Yang
>> 
>>


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-24  2:32                       ` Zhang, Yang Z
@ 2014-12-24  3:08                         ` Wu, Feng
  2014-12-24  4:04                           ` Zhang, Yang Z
  2014-12-24  4:54                         ` Jiang Liu
  1 sibling, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2014-12-24  3:08 UTC (permalink / raw)
  To: Zhang, Yang Z, Jiang Liu, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro,
	Alex Williamson
  Cc: iommu, linux-kernel, KVM list, Eric Auger, Wu, Feng



> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Wednesday, December 24, 2014 10:33 AM
> To: Jiang Liu; Paolo Bonzini; Wu, Feng; Thomas Gleixner; Ingo Molnar; H. Peter
> Anvin; x86@kernel.org; Gleb Natapov; dwmw2@infradead.org;
> joro@8bytes.org; Alex Williamson
> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; KVM list;
> Eric Auger
> Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d
> Posted-Interrupts
> 
> Jiang Liu wrote on 2014-12-24:
> > On 2014/12/24 9:38, Zhang, Yang Z wrote:
> >> Paolo Bonzini wrote on 2014-12-23:
> >>>
> >>>
> >>> On 23/12/2014 10:07, Wu, Feng wrote:
> >>>>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
> >>>>>> I don't quite understand it. If user set an interrupt's affinity
> >>>>>> to a CPU, but he still see the interrupt delivers to other CPUs in host.
> >>>>>> Do you think it is a right behavior?
> >>>>>
> >>>>> No, the interrupt is not delivered at all in the host.  Normally you'd
> have:
> >>>>>
> >>>>> - interrupt delivered to CPU from host affinity
> >>>>>
> >>>>> - VFIO interrupt handler writes to irqfd
> >>>>>
> >>>>> - interrupt delivered to vCPU from guest affinity
> >>>>>
> >>>>> Here, you just skip the first two steps.  The interrupt is
> >>>>> delivered to the thread that is running the vCPU directly, so the
> >>>>> host affinity is bypassed entirely.
> >>>>>
> >>>>> ... unless you are considering the case where the vCPU is blocked
> >>>>> and the host is processing the posted interrupt wakeup vector.
> >>>>> In that case yes, it would be better to set NDST to a CPU
> >>>>> matching the host
> > affinity.
> >>>>
> >>>> In my understanding, wakeup vector should have no relationship
> >>>> with the host affinity of the irq. Wakeup notification event
> >>>> should be delivered to the pCPU which the vCPU was blocked on. And
> >>>> in kernel's point of view, the irq is not associated with the wakeup vector,
> right?
> >>>
> >>> That is correct indeed.  It is not associated to the wakeup vector,
> >>> hence this patch is right, I think.
> >>>
> >>> However, the wakeup vector has the same function as the VFIO
> >>> interrupt handler, so you could argue that it is tied to the host
> >>> affinity rather than the guest.  Let's wait for Yang to answer.
> >>
> >> Actually, that's my original question too. I am wondering what
> >> happens if the
> > user changes the assigned device's affinity in host's /proc/irq/? If
> > ignore it is acceptable, then this patch is ok. But it seems the
> > discussion out of my scope, need some experts to tell us their idea since it will
> impact the user experience.
> > Hi Yang,
> 
> Hi Jiang,
> 
> > 	Originally we have a proposal to return failure when user sets IRQ
> > affinity through native OS interfaces if an IRQ is in PI mode. But
> > that proposal will break CPU hot-removal because OS needs to migrate
> > away all IRQs binding to the CPU to be offlined. Then we propose
> > saving user IRQ affinity setting without changing hardware
> > configuration (keeping PI configuration). Later when PI mode is
> > disabled, the cached affinity setting will be used to setup IRQ
> > destination for native OS. On the other hand, for IRQ in PI mode, it
> > won't be delivered to native OS, so user may not sense that the IRQ is
> delivered to CPUs other than those in the affinity set.
> 
> The IRQ is still there but will be delivered to host in the form of PI event(if the
> VCPU is running in root-mode). I am not sure whether those interrupts should
> be reflected in /proc/interrupts? If the answer is yes, then which entries should
> be used, a new PI entry or use the original IRQ entry?

Even so, setting the affinity of the IRQ in the host should not affect the destination of the
PI event (normal notification event or wakeup notification event), because the destination
of the PI event is determined by the NDST field of the Posted-interrupts descriptor, and the PI
notification vector is global. I just had a discussion with Jiang offline; maybe we can add
statistics for the notification vector to /proc/interrupts, just like any other global
interrupt.
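
To illustrate the point about NDST, here is a simplified model of a
Posted-interrupts descriptor. It is a sketch only: the struct is not
bit-exact with the hardware layout in the VT-d specification, and all
names are hypothetical:

```c
#include <stdint.h>

/* Simplified, hypothetical model; NOT the bit-exact hardware layout. */
struct pi_desc_model {
	uint32_t pir[8];	/* 256 posted-interrupt request bits */
	uint8_t on;		/* outstanding notification */
	uint8_t sn;		/* suppress notification */
	uint8_t nv;		/* notification vector (global, not per IRQ) */
	uint32_t ndst;		/* notification destination (APIC ID) */
};

/* The notification target comes from NDST, not from the host affinity. */
uint32_t pi_notification_dest(const struct pi_desc_model *d,
			      uint64_t host_irq_affinity)
{
	(void)host_irq_affinity;	/* deliberately ignored */
	return d->ndst;
}

/* Posting a vector sets its bit in PIR and marks a notification pending. */
void pi_post(struct pi_desc_model *d, uint8_t vector)
{
	d->pir[vector / 32] |= UINT32_C(1) << (vector % 32);
	d->on = 1;
}
```

In this model, changing the host affinity mask never changes the value
returned by pi_notification_dest(); only rewriting NDST does.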

Thanks,
Feng

> 
> > In that aspect, I think it's acceptable:) Regards!
> 
> Yes, if all of you guys(especially the IRQ maintainer) are think it is acceptable
> then we can follow current implementation and document it.
> 
> > Gerry
> >>
> >>>
> >>> Paolo
> >>
> >>
> >> Best regards,
> >> Yang
> >>
> >>
> 
> 
> Best regards,
> Yang
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-24  3:08                         ` Wu, Feng
@ 2014-12-24  4:04                           ` Zhang, Yang Z
  0 siblings, 0 replies; 140+ messages in thread
From: Zhang, Yang Z @ 2014-12-24  4:04 UTC (permalink / raw)
  To: Wu, Feng, Jiang Liu, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro, Alex Williamson
  Cc: iommu, linux-kernel, KVM list, Eric Auger

Wu, Feng wrote on 2014-12-24:
> 
> 
> Zhang, Yang Z wrote on 2014-12-24:
>> Cc: iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> KVM list; Eric Auger
>> Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for
>> VT-d Posted-Interrupts
>> 
>> Jiang Liu wrote on 2014-12-24:
>>> On 2014/12/24 9:38, Zhang, Yang Z wrote:
>>>> Paolo Bonzini wrote on 2014-12-23:
>>>>> 
>>>>> 
>>>>> On 23/12/2014 10:07, Wu, Feng wrote:
>>>>>>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>>>>>>> I don't quite understand it. If user set an interrupt's affinity
>>>>>>>> to a CPU, but he still see the interrupt delivers to other CPUs
>>>>>>>> in host. Do you think it is a right behavior?
>>>>>>> 
>>>>>>> No, the interrupt is not delivered at all in the host. Normally
>>>>>>> you'd have:
>>>>>>> 
>>>>>>> - interrupt delivered to CPU from host affinity
>>>>>>> 
>>>>>>> - VFIO interrupt handler writes to irqfd
>>>>>>> 
>>>>>>> - interrupt delivered to vCPU from guest affinity
>>>>>>> 
>>>>>>> Here, you just skip the first two steps.  The interrupt is
>>>>>>> delivered to the thread that is running the vCPU directly, so
>>>>>>> the host affinity is bypassed entirely.
>>>>>>> 
>>>>>>> ... unless you are considering the case where the vCPU is
>>>>>>> blocked and the host is processing the posted interrupt wakeup vector.
>>>>>>> In that case yes, it would be better to set NDST to a CPU
>>>>>>> matching the host
>>> affinity.
>>>>>> 
>>>>>> In my understanding, wakeup vector should have no relationship
>>>>>> with the host affinity of the irq. Wakeup notification event
>>>>>> should be delivered to the pCPU which the vCPU was blocked on.
>>>>>> And in kernel's point of view, the irq is not associated with
>>>>>> the wakeup vector,
>> right?
>>>>> 
>>>>> That is correct indeed.  It is not associated to the wakeup
>>>>> vector, hence this patch is right, I think.
>>>>> 
>>>>> However, the wakeup vector has the same function as the VFIO
>>>>> interrupt handler, so you could argue that it is tied to the
>>>>> host affinity rather than the guest.  Let's wait for Yang to answer.
>>>> 
>>>> Actually, that's my original question too. I am wondering what
>>>> happens if the
>>> user changes the assigned device's affinity in host's /proc/irq/? If
>>> ignore it is acceptable, then this patch is ok. But it seems the
>>> discussion out of my scope, need some experts to tell us their idea
>>> since it will impact the user experience. Hi Yang,
>> 
>> Hi Jiang,
>> 
>>> 	Originally we have a proposal to return failure when user sets
>>> IRQ affinity through native OS interfaces if an IRQ is in PI mode.
>>> But that proposal will break CPU hot-removal because OS needs to
>>> migrate away all IRQs binding to the CPU to be offlined. Then we
>>> propose saving user IRQ affinity setting without changing hardware
>>> configuration (keeping PI configuration). Later when PI mode is
>>> disabled, the cached affinity setting will be used to setup IRQ
>>> destination for native OS. On the other hand, for IRQ in PI mode,
>>> it won't be delivered to native OS, so user may not sense that the
>>> IRQ is delivered to CPUs other than those in the affinity set.
>> 
>> The IRQ is still there but will be delivered to host in the form of
>> PI event(if the VCPU is running in root-mode). I am not sure whether
>> those interrupts should be reflected in /proc/interrupts? If the
>> answer is yes, then which entries should be used, a new PI entry or
>> use the original IRQ entry?
> 
> Even though, setting the affinity of the IRQ in host should not affect
> the destination of the PI event (normal notification event of wakeup

This is your implementation. To me, disabling PI when the VCPU is going
to run on a CPU outside the IRQ affinity bitmap is also acceptable, and it
would keep the user interface looking the same as before.

Hi Thomas, Ingo, Peter

Could you help review this patch? I would really appreciate any
feedback.

> notification event), because the destination of the PI event is
> determined in NDST field of Posted-interrupts descriptor and PI
> notification vector is global. Just had a discussion with Jiang
> offline, maybe we can add the statistics information for the notification vector in /proc/interrupts just like any other global interrupts.
> 
> Thanks,
> Feng
> 
>> 
>>> In that aspect, I think it's acceptable:) Regards!
>> 
>> Yes, if all of you guys(especially the IRQ maintainer) are think it
>> is acceptable then we can follow current implementation and document it.
>> 
>>> Gerry
>>>> 
>>>>> 
>>>>> Paolo
>>>> 
>>>> 
>>>> Best regards,
>>>> Yang
>>>> 
>>>> 
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-24  2:32                       ` Zhang, Yang Z
  2014-12-24  3:08                         ` Wu, Feng
@ 2014-12-24  4:54                         ` Jiang Liu
  1 sibling, 0 replies; 140+ messages in thread
From: Jiang Liu @ 2014-12-24  4:54 UTC (permalink / raw)
  To: Zhang, Yang Z, Paolo Bonzini, Wu, Feng, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, Gleb Natapov, dwmw2, joro,
	Alex Williamson
  Cc: iommu, linux-kernel, KVM list, Eric Auger

On 2014/12/24 10:32, Zhang, Yang Z wrote:
> Jiang Liu wrote on 2014-12-24:
>> On 2014/12/24 9:38, Zhang, Yang Z wrote:
>>> Paolo Bonzini wrote on 2014-12-23:
>>>>
>>>>
>>>> On 23/12/2014 10:07, Wu, Feng wrote:
>>>>>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>>>>>> I don't quite understand it. If user set an interrupt's affinity
>>>>>>> to a CPU, but he still see the interrupt delivers to other CPUs in host.
>>>>>>> Do you think it is a right behavior?
>>>>>>
>>>>>> No, the interrupt is not delivered at all in the host.  Normally you'd have:
>>>>>>
>>>>>> - interrupt delivered to CPU from host affinity
>>>>>>
>>>>>> - VFIO interrupt handler writes to irqfd
>>>>>>
>>>>>> - interrupt delivered to vCPU from guest affinity
>>>>>>
>>>>>> Here, you just skip the first two steps.  The interrupt is
>>>>>> delivered to the thread that is running the vCPU directly, so the
>>>>>> host affinity is bypassed entirely.
>>>>>>
>>>>>> ... unless you are considering the case where the vCPU is blocked
>>>>>> and the host is processing the posted interrupt wakeup vector.
>>>>>> In that case yes, it would be better to set NDST to a CPU
>>>>>> matching the host affinity.
>>>>>
>>>>> In my understanding, wakeup vector should have no relationship
>>>>> with the host affinity of the irq. Wakeup notification event
>>>>> should be delivered to the pCPU which the vCPU was blocked on. And
>>>>> in kernel's point of view, the irq is not associated with the wakeup vector, right?
>>>>
>>>> That is correct indeed.  It is not associated to the wakeup vector,
>>>> hence this patch is right, I think.
>>>>
>>>> However, the wakeup vector has the same function as the VFIO
>>>> interrupt handler, so you could argue that it is tied to the host
>>>> affinity rather than the guest.  Let's wait for Yang to answer.
>>>
>>> Actually, that's my original question too. I am wondering what
>>> happens if the user changes the assigned device's affinity in host's
>>> /proc/irq/? If ignore it is acceptable, then this patch is ok. But it
>>> seems the discussion out of my scope, need some experts to tell us
>>> their idea since it will impact the user experience.
>> Hi Yang,
> 
> Hi Jiang,
> 
>> 	Originally we have a proposal to return failure when user sets IRQ
>> affinity through native OS interfaces if an IRQ is in PI mode. But
>> that proposal will break CPU hot-removal because OS needs to migrate
>> away all IRQs binding to the CPU to be offlined. Then we propose
>> saving user IRQ affinity setting without changing hardware
>> configuration (keeping PI configuration). Later when PI mode is
>> disabled, the cached affinity setting will be used to setup IRQ
>> destination for native OS. On the other hand, for IRQ in PI mode, it
>> won't be delivered to native OS, so user may not sense that the IRQ is delivered to CPUs other than those in the affinity set.
> 
> The IRQ is still there but will be delivered to host in the form of PI event(if the VCPU is running in root-mode). I am not sure whether those interrupts should be reflected in /proc/interrupts? If the answer is yes, then which entries should be used, a new PI entry or use the original IRQ entry?

You are right, the native interrupt statistics will become inaccurate.
Maybe some document about this behavior is preferred.
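
The cached-affinity proposal quoted above can be sketched roughly as
follows. This is only an illustration under stated assumptions: struct
irq_state and the three helpers are hypothetical names, not kernel
interfaces, and a 64-bit mask stands in for a real cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-IRQ state; not a kernel structure. */
struct irq_state {
	bool     posted;          /* true while the IRQ is in PI mode */
	uint64_t cached_affinity; /* last mask requested by the user */
	uint64_t hw_affinity;     /* mask programmed for remapped delivery */
};

/*
 * Never fails, so CPU hot-removal can still migrate IRQs away.
 * In PI mode the request is only cached; hardware is left alone.
 */
static int set_affinity(struct irq_state *s, uint64_t mask)
{
	assert(mask != 0);
	s->cached_affinity = mask;
	if (!s->posted)
		s->hw_affinity = mask;
	return 0;
}

static void enter_posted_mode(struct irq_state *s)
{
	s->posted = true;
}

/* Leaving PI mode applies whatever the user asked for in the meantime. */
static void leave_posted_mode(struct irq_state *s)
{
	s->posted = false;
	s->hw_affinity = s->cached_affinity;
}
```

With this shape, a write to /proc/irq/ while the IRQ is posted changes
nothing visible until the IRQ returns to remapping mode, which is the
behavior that would need documenting.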

> 
>> In that aspect, I think it's acceptable:) Regards!
> 
> Yes, if all of you guys(especially the IRQ maintainer) are think it is acceptable then we can follow current implementation and document it.
Good suggestion, we will send an email to Thomas for advice after New
Year.
> 
>> Gerry
>>>
>>>>
>>>> Paolo
>>>
>>>
>>> Best regards,
>>> Yang
>>>
>>>
> 
> 
> Best regards,
> Yang
> 
> 

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 00/26] Add VT-d Posted-Interrupts support
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (26 preceding siblings ...)
  2014-12-16  9:04 ` [v3 00/26] Add VT-d Posted-Interrupts support Wu, Feng
@ 2015-01-06  1:10 ` Wu, Feng
  2015-01-09 12:46   ` joro
  2015-01-21  2:25 ` Wu, Feng
  28 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-01-06  1:10 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng

Ping...

Hi Joerg & David,

Could you please have a look at the IOMMU part of this series (patch 02 - 04, patch 06 - 09 , patch 26)?

Hi Thomas, Ingo, & Peter,

Could you please have a look at this series, especially for patch 01, 05, 21?

Thanks,
Feng

> -----Original Message-----
> From: Wu, Feng
> Sent: Friday, December 12, 2014 11:15 PM
> To: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com
> Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
> Subject: [v3 00/26] Add VT-d Posted-Interrupts support
> 
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> You can find the VT-d Posted-Interrtups Spec. in the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> v1->v2:
> * Use VFIO framework to enable this feature, the VFIO part of this series is
>   base on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
> * Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
>   then revise some irq logic based on the new hierarchy irqdomain patches provided
>   by Jiang Liu <jiang.liu@linux.intel.com>
> 
> v2->v3:
> * Adjust the Posted-interrupts Descriptor updating logic when vCPU is
>   preempted or blocked.
> * KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
> * __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
> * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
>   can be used to change back to remapping mode.
> * Fix typo
> 
> This patch series is made of the following groups:
> 1-6: Some preparation changes in iommu and irq component, this is based on the
>      new hierarchy irqdomain logic.
> 7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as, feature detection,
>           command line parameter.
> 10-17, 22-25: Changes related to KVM itself.
> 18-20: Changes in VFIO component, this part was previously sent out as
> "[RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d
> Posted-Interrupts"
> 21: x86 irq related changes
> 
> Feng Wu (26):
>   genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a
>     VCPU
>   iommu: Add new member capability to struct irq_remap_ops
>   iommu, x86: Define new irte structure for VT-d Posted-Interrupts
>   iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
>   x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
>   iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
>   iommu, x86: Add cap_pi_support() to detect VT-d PI capability
>   iommu, x86: Add intel_irq_remapping_capability() for Intel
>   iommu, x86: define irq_remapping_cap()
>   KVM: change struct pi_desc for VT-d Posted-Interrupts
>   KVM: Add some helper functions for Posted-Interrupts
>   KVM: Initialize VT-d Posted-Interrupts Descriptor
>   KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
>   KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
>   KVM: add interfaces to control PI outside vmx
>   KVM: Make struct kvm_irq_routing_table accessible
>   KVM: make kvm_set_msi_irq() public
>   KVM: kvm-vfio: User API for VT-d Posted-Interrupts
>   KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
>   KVM: x86: kvm-vfio: VT-d posted-interrupts setup
>   x86, irq: Define a global vector for VT-d Posted-Interrupts
>   KVM: Define a wakeup worker thread for vCPU
>   KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
>   KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
>   KVM: Suppress posted-interrupt when 'SN' is set
>   iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
> 
>  Documentation/kernel-parameters.txt        |   1 +
>  Documentation/virtual/kvm/devices/vfio.txt |   9 ++
>  arch/x86/include/asm/entry_arch.h          |   2 +
>  arch/x86/include/asm/hardirq.h             |   1 +
>  arch/x86/include/asm/hw_irq.h              |   2 +
>  arch/x86/include/asm/irq_remapping.h       |  11 ++
>  arch/x86/include/asm/irq_vectors.h         |   1 +
>  arch/x86/include/asm/kvm_host.h            |  12 ++
>  arch/x86/kernel/apic/msi.c                 |   1 +
>  arch/x86/kernel/entry_64.S                 |   2 +
>  arch/x86/kernel/irq.c                      |  27 ++++
>  arch/x86/kernel/irqinit.c                  |   2 +
>  arch/x86/kvm/Makefile                      |   2 +-
>  arch/x86/kvm/kvm_vfio_x86.c                |  77 +++++++++
>  arch/x86/kvm/vmx.c                         | 244 ++++++++++++++++++++++++++++-
>  arch/x86/kvm/x86.c                         |  22 ++-
>  drivers/iommu/intel_irq_remapping.c        |  68 +++++++-
>  drivers/iommu/irq_remapping.c              |  24 ++-
>  drivers/iommu/irq_remapping.h              |   8 +
>  include/linux/dmar.h                       |  32 ++++
>  include/linux/intel-iommu.h                |   1 +
>  include/linux/irq.h                        |   7 +
>  include/linux/kvm_host.h                   |  46 ++++++
>  include/uapi/linux/kvm.h                   |  11 ++
>  kernel/irq/chip.c                          |  14 ++
>  kernel/irq/manage.c                        |  20 +++
>  virt/kvm/irq_comm.c                        |  43 ++++-
>  virt/kvm/irqchip.c                         |  11 --
>  virt/kvm/kvm_main.c                        |  15 ++
>  virt/kvm/vfio.c                            | 107 +++++++++++++
>  30 files changed, 795 insertions(+), 28 deletions(-)
>  create mode 100644 arch/x86/kvm/kvm_vfio_x86.c
> 
> --
> 1.9.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 00/26] Add VT-d Posted-Interrupts support
  2015-01-06  1:10 ` Wu, Feng
@ 2015-01-09 12:46   ` joro
  2015-01-09 13:58     ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: joro @ 2015-01-09 12:46 UTC (permalink / raw)
  To: Wu, Feng
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm

Hi Feng,

On Tue, Jan 06, 2015 at 01:10:19AM +0000, Wu, Feng wrote:
> Ping...
> 
> Hi Joerg & David,
> 
> Could you please have a look at the IOMMU part of this series (patch 02 - 04, patch 06 - 09 , patch 26)?
> 
> Hi Thomas, Ingo, & Peter,
> 
> Could you please have a look at this series, especially for patch 01, 05, 21?

I fear this conflicts somewhat with the irq-domain patches from Jiang
Liu. Once we worked this out (should happen soon) I'll have a look at
the VT-d posted interrupt patches.

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 00/26] Add VT-d Posted-Interrupts support
  2015-01-09 12:46   ` joro
@ 2015-01-09 13:58     ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2015-01-09 13:58 UTC (permalink / raw)
  To: joro
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm, Wu, Feng



> -----Original Message-----
> From: joro@8bytes.org [mailto:joro@8bytes.org]
> Sent: Friday, January 09, 2015 8:46 PM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> alex.williamson@redhat.com; jiang.liu@linux.intel.com; eric.auger@linaro.org;
> linux-kernel@vger.kernel.org; iommu@lists.linux-foundation.org;
> kvm@vger.kernel.org
> Subject: Re: [v3 00/26] Add VT-d Posted-Interrupts support
> 
> Hi Feng,
> 
> On Tue, Jan 06, 2015 at 01:10:19AM +0000, Wu, Feng wrote:
> > Ping...
> >
> > Hi Joerg & David,
> >
> > Could you please have a look at the IOMMU part of this series (patch 02 - 04, patch 06 - 09 , patch 26)?
> >
> > Hi Thomas, Ingo, & Peter,
> >
> > Could you please have a look at this series, especially for patch 01, 05, 21?
> 
> I fear this conflicts somewhat with the irq-domain patches from Jiang
> Liu. Once we worked this out (should happen soon) I'll have a look at
> the VT-d posted interrupt patches.
> 
> Regards,
> 
> 	Joerg

Thanks a lot for your feedback on this. In fact, my patches are based on Jiang Liu's
irq-domain patches and have been discussed with Jiang offline. So I don't think
they will conflict with Jiang's work. Anyway, any comments are welcome!

Thanks,
Feng


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2014-12-12 15:14 ` [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI Feng Wu
  2014-12-18 14:49   ` Zhang, Yang Z
@ 2015-01-09 14:54   ` Radim Krčmář
  2015-01-09 14:56     ` Paolo Bonzini
  1 sibling, 1 reply; 140+ messages in thread
From: Radim Krčmář @ 2015-01-09 14:54 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

2014-12-12 23:14+0800, Feng Wu:
> This patch defines a new interface kvm_find_dest_vcpu for
> VT-d PI, which can returns the destination vCPU of the
> interrupt for guests.
> 
> Since VT-d PI cannot handle broadcast/multicast interrupt,
> Here we only handle Fixed and Lowest priority interrupts.
> 
> The current method of handling guest lowest priority interrtups
> is to use a counter 'apic_arb_prio' for each vCPU, we choose the
> vCPU with smallest 'apic_arb_prio' and then increase it by 1.
> However, for VT-d PI, we cannot re-use this, since we no longer
> have control to 'apic_arb_prio' with posted interrupt direct
> delivery by Hardware.
> 
> Here, we introduce a similar way with 'apic_arb_prio' to handle
> guest lowest priority interrtups when VT-d PI is used. Here is the
> ideas:
> - Each vCPU has a counter 'round_robin_counter'.
> - When guests sets an interrupts to lowest priority, we choose
> the vCPU with smallest 'round_robin_counter' as the destination,
> then increase it.
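
The counter scheme quoted above can be sketched as below. It is a
minimal illustration, not the code from the patch: struct vcpu, the
candidate array and pick_lowest_prio() are made-up names, and ties are
assumed to go to the earliest candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vCPU state holding only the per-vCPU counter. */
struct vcpu {
	uint32_t round_robin_counter;
};

/*
 * cand[] lists the indices of the vCPUs that match the interrupt's
 * destination. Pick the one with the smallest counter, then bump it.
 */
static size_t pick_lowest_prio(struct vcpu *v, const size_t *cand, size_t n)
{
	assert(n > 0);
	size_t best = cand[0];

	for (size_t i = 1; i < n; i++)
		if (v[cand[i]].round_robin_counter < v[best].round_robin_counter)
			best = cand[i];

	v[best].round_robin_counter++;
	return best;
}
```

Note that the choice is made once, when the interrupt is configured,
so the counter balances across interrupts rather than across
individual deliveries.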

There are two points relevant to this patch in new KVM's implementation,
("KVM: x86: amend APIC lowest priority arbitration",
 https://lkml.org/lkml/2015/1/9/362)

1) lowest priority depends on TPR
2) there is no need for balancing

(1) has to be considered with PI as well.
I kept (2) to avoid whining from people building on that behaviour, but
lowest priority backed by PI could be transparent without it.

Patch below removes the balancing, but I am not sure this is a price we
allowed ourselves to pay ... what are your opinions?


---8<---
KVM: x86: don't balance lowest priority interrupts

Balancing is not mandated by specification and real hardware most likely
doesn't do it.  We break backward compatibility to allow optimizations.
(Posted interrupts can deliver to only one fixed destination.)

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 1 -
 arch/x86/kvm/lapic.c            | 8 ++------
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 97a5dd0222c8..aa4bd8286232 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -370,7 +370,6 @@ struct kvm_vcpu_arch {
 	u64 apic_base;
 	struct kvm_lapic *apic;    /* kernel irqchip context */
 	unsigned long apic_attention;
-	int32_t apic_arb_prio;
 	int mp_state;
 	u64 ia32_misc_enable_msr;
 	bool tpr_access_reporting;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5b9d8c589bba..eb85af8e8fc0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -749,7 +749,6 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 				  trig_mode, vector);
 	switch (delivery_mode) {
 	case APIC_DM_LOWEST:
-		vcpu->arch.apic_arb_prio++;
 	case APIC_DM_FIXED:
 		/* FIXME add logic for vcpu on reset */
 		if (unlikely(!apic_enabled(apic)))
@@ -837,11 +836,9 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	 *      - uses the APR register (which also considers ISR and IRR),
 	 *      - chooses the highest APIC ID when APRs are identical,
 	 *      - and allows a focus processor.
-	 * XXX: pseudo-balancing with apic_arb_prio is a KVM-specific feature
 	 */
-	int tpr = kvm_apic_get_reg(vcpu1->arch.apic, APIC_TASKPRI) -
-	          kvm_apic_get_reg(vcpu2->arch.apic, APIC_TASKPRI);
-	return tpr ? : vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
+	return kvm_apic_get_reg(vcpu1->arch.apic, APIC_TASKPRI) -
+	       kvm_apic_get_reg(vcpu2->arch.apic, APIC_TASKPRI);
 }
 
 static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
@@ -1595,7 +1592,6 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.pv_eoi.msr_val = 0;
 	apic_update_ppr(apic);
 
-	vcpu->arch.apic_arb_prio = 0;
 	vcpu->arch.apic_attention = 0;
 
 	apic_debug("%s: vcpu=%p, id=%d, base_msr="
-- 
2.2.0


^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-09 14:54   ` Radim Krčmář
@ 2015-01-09 14:56     ` Paolo Bonzini
  2015-01-09 15:12       ` Radim Krčmář
  2015-01-13  0:27       ` Wu, Feng
  0 siblings, 2 replies; 140+ messages in thread
From: Paolo Bonzini @ 2015-01-09 14:56 UTC (permalink / raw)
  To: Radim Krčmář, Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, dwmw2, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm



On 09/01/2015 15:54, Radim Krčmář wrote:
> There are two points relevant to this patch in new KVM's implementation,
> ("KVM: x86: amend APIC lowest priority arbitration",
>  https://lkml.org/lkml/2015/1/9/362)
> 
> 1) lowest priority depends on TPR
> 2) there is no need for balancing
> 
> (1) has to be considered with PI as well.

The chipset doesn't support it. :(

> I kept (2) to avoid whining from people building on that behaviour, but
> lowest priority backed by PI could be transparent without it.
> 
> Patch below removes the balancing, but I am not sure this is a price we
> allowed ourselves to pay ... what are your opinions?

I wouldn't mind, but it requires a lot of benchmarking.

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-09 14:56     ` Paolo Bonzini
@ 2015-01-09 15:12       ` Radim Krčmář
  2015-01-09 15:18         ` Paolo Bonzini
  2015-01-13  0:27       ` Wu, Feng
  1 sibling, 1 reply; 140+ messages in thread
From: Radim Krčmář @ 2015-01-09 15:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Feng Wu, tglx, mingo, hpa, x86, gleb, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

2015-01-09 15:56+0100, Paolo Bonzini:
> 
> 
> On 09/01/2015 15:54, Radim Krčmář wrote:
> > There are two points relevant to this patch in new KVM's implementation,
> > ("KVM: x86: amend APIC lowest priority arbitration",
> >  https://lkml.org/lkml/2015/1/9/362)
> > 
> > 1) lowest priority depends on TPR
> > 2) there is no need for balancing
> > 
> > (1) has to be considered with PI as well.
> 
> The chipset doesn't support it. :(

I meant that we need to recompute PI entries for lowest priority
interrupts every time guest's TPR changes.

Luckily, Linux doesn't use TPR, but other OS might be a reason to drop
lowest priority from PI optimizations.  (Or make it more complicated.)
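
To make the cost concrete, here is a rough sketch of what "recompute on
every TPR change" would mean. All names are hypothetical, ties are
assumed to go to the lowest index, and the real work of rewriting the
posted-interrupt entry is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vCPU state: only the task priority matters here. */
struct vcpu {
	uint8_t tpr;
};

/* Index of the vCPU with the lowest TPR; ties go to the lowest index. */
static size_t lowest_tpr_vcpu(const struct vcpu *v, size_t n)
{
	assert(n > 0);
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (v[i].tpr < v[best].tpr)
			best = i;
	return best;
}

/*
 * Called on every guest TPR write. Returns true when the destination
 * moved, i.e. when the entry for the interrupt would need rewriting.
 */
static bool guest_tpr_write(struct vcpu *v, size_t n, size_t who,
			    uint8_t tpr, size_t *dest)
{
	v[who].tpr = tpr;
	size_t new_dest = lowest_tpr_vcpu(v, n);
	bool changed = (new_dest != *dest);

	*dest = new_dest;
	return changed;
}
```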

> > I kept (2) to avoid whining from people building on that behaviour, but
> > lowest priority backed by PI could be transparent without it.
> > 
> > Patch below removes the balancing, but I am not sure this is a price we
> > allowed ourselves to pay ... what are your opinions?
> 
> I wouldn't mind, but it requires a lot of benchmarking.

(I was afraid it would come to that.)

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-09 15:12       ` Radim Krčmář
@ 2015-01-09 15:18         ` Paolo Bonzini
  2015-01-09 15:47           ` Radim Krčmář
  0 siblings, 1 reply; 140+ messages in thread
From: Paolo Bonzini @ 2015-01-09 15:18 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Feng Wu, tglx, mingo, hpa, x86, gleb, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm



On 09/01/2015 16:12, Radim Krčmář wrote:
> > The chipset doesn't support it. :(
> 
> I meant that we need to recompute PI entries for lowest priority
> interrupts every time guest's TPR changes.
> 
> Luckily, Linux doesn't use TPR, but other OS might be a reason to drop
> lowest priority from PI optimizations.  (Or make it more complicated.)

Doing vector hashing is a possibility as well.  I would like to know
what existing chipsets do in practice, then we can mimic it.

Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-09 15:18         ` Paolo Bonzini
@ 2015-01-09 15:47           ` Radim Krčmář
  0 siblings, 0 replies; 140+ messages in thread
From: Radim Krčmář @ 2015-01-09 15:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Feng Wu, tglx, mingo, hpa, x86, gleb, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

2015-01-09 16:18+0100, Paolo Bonzini:
> On 09/01/2015 16:12, Radim Krčmář wrote:
> > > The chipset doesn't support it. :(
> > 
> > I meant that we need to recompute PI entries for lowest priority
> > interrupts every time guest's TPR changes.
> > 
> > Luckily, Linux doesn't use TPR, but other OS might be a reason to drop
> > lowest priority from PI optimizations.  (Or make it more complicated.)
> 
> Doing vector hashing is a possibility as well.  I would like to know
> what existing chipsets do in practice, then we can mimic it.

When looking at /proc/interrupts from time to time, I have only seen
interrupts landing on the first CPU of the set.

We could also distinguish between AMD and Intel ...
AMD should deliver to the highest APIC ID.
(If we still need to decide after focus processor and APR checks.)

I'll try to check using a more trustworthy approach.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-09 14:56     ` Paolo Bonzini
  2015-01-09 15:12       ` Radim Krčmář
@ 2015-01-13  0:27       ` Wu, Feng
  2015-01-13 16:17         ` Radim Krčmář
  1 sibling, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-01-13  0:27 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: tglx, mingo, hpa, x86, gleb, dwmw2, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm, Wu, Feng




> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, January 09, 2015 10:56 PM
> To: Radim Krčmář; Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; dwmw2@infradead.org; joro@8bytes.org;
> alex.williamson@redhat.com; jiang.liu@linux.intel.com; eric.auger@linaro.org;
> linux-kernel@vger.kernel.org; iommu@lists.linux-foundation.org;
> kvm@vger.kernel.org
> Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
> VT-d PI
> 
> 
> 
> On 09/01/2015 15:54, Radim Krčmář wrote:
> > There are two points relevant to this patch in new KVM's implementation,
> > ("KVM: x86: amend APIC lowest priority arbitration",
> >  https://lkml.org/lkml/2015/1/9/362)
> >
> > 1) lowest priority depends on TPR
> > 2) there is no need for balancing
> >
> > (1) has to be considered with PI as well.
> 
> The chipset doesn't support it. :(
> 
> > I kept (2) to avoid whining from people building on that behaviour, but
> > lowest priority backed by PI could be transparent without it.
> >
> > Patch below removes the balancing, but I am not sure this is a price we
> > allowed ourselves to pay ... what are your opinions?
> 
> I wouldn't mind, but it requires a lot of benchmarking.

In fact, the real hardware may do lowest priority in round robin way, the new
hardware even doesn't consider the TPR for lowest priority interrupts delivery.

As discussed with Paolo before, I will submit a patch to support lowest priority for PI
after this series is merged.

Thanks,
Feng

> 
> Paolo

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-13  0:27       ` Wu, Feng
@ 2015-01-13 16:17         ` Radim Krčmář
  2015-01-14  1:27           ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Radim Krčmář @ 2015-01-13 16:17 UTC (permalink / raw)
  To: Wu, Feng
  Cc: Paolo Bonzini, tglx, mingo, hpa, x86, gleb, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

2015-01-13 00:27+0000, Wu, Feng:
> > On 09/01/2015 15:54, Radim Krčmář wrote:
> > > There are two points relevant to this patch in new KVM's implementation,
> > > ("KVM: x86: amend APIC lowest priority arbitration",
> > >  https://lkml.org/lkml/2015/1/9/362)
> > >
> > > 1) lowest priority depends on TPR
> > > 2) there is no need for balancing
> > >
> > > (1) has to be considered with PI as well.
> > 
> > The chipset doesn't support it. :(
> > 
> > > I kept (2) to avoid whining from people building on that behaviour, but
> > > lowest priority backed by PI could be transparent without it.
> > >
> > > Patch below removes the balancing, but I am not sure this is a price we
> > > allowed ourselves to pay ... what are your opinions?
> > 
> > I wouldn't mind, but it requires a lot of benchmarking.
> 
> In fact, the real hardware may do lowest priority in round robin way,

Yes, but we won't emulate round robin with PI and I think it is wrong to
have backends with significantly different guest-visible behaviors.

>                                                                       the new
> hardware even doesn't consider the TPR for lowest priority interrupts delivery.

A bold move ... what hardware was the first to do so?

> As discussed with Paolo before, I will submit a patch to support lowest priority for PI
> after this series is merged.

Sure, I see only two good solutions though
 1) don't optimize lowest priority with PI
 2) don't balance lowest priority

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-13 16:17         ` Radim Krčmář
@ 2015-01-14  1:27           ` Wu, Feng
  2015-01-14 13:02             ` Paolo Bonzini
  2015-01-14 16:59             ` Radim Krčmář
  0 siblings, 2 replies; 140+ messages in thread
From: Wu, Feng @ 2015-01-14  1:27 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Paolo Bonzini, tglx, mingo, hpa, x86, gleb, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Wu, Feng




> -----Original Message-----
> From: Radim Krčmář [mailto:rkrcmar@redhat.com]
> Sent: Wednesday, January 14, 2015 12:17 AM
> To: Wu, Feng
> Cc: Paolo Bonzini; tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com;
> x86@kernel.org; gleb@kernel.org; dwmw2@infradead.org; joro@8bytes.org;
> alex.williamson@redhat.com; jiang.liu@linux.intel.com; eric.auger@linaro.org;
> linux-kernel@vger.kernel.org; iommu@lists.linux-foundation.org;
> kvm@vger.kernel.org
> Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
> VT-d PI
> 
> 2015-01-13 00:27+0000, Wu, Feng:
> > > On 09/01/2015 15:54, Radim Krčmář wrote:
> > > > There are two points relevant to this patch in new KVM's implementation,
> > > > ("KVM: x86: amend APIC lowest priority arbitration",
> > > >  https://lkml.org/lkml/2015/1/9/362)
> > > >
> > > > 1) lowest priority depends on TPR
> > > > 2) there is no need for balancing
> > > >
> > > > (1) has to be considered with PI as well.
> > >
> > > The chipset doesn't support it. :(
> > >
> > > > I kept (2) to avoid whining from people building on that behaviour, but
> > > > lowest priority backed by PI could be transparent without it.
> > > >
> > > > Patch below removes the balancing, but I am not sure this is a price we
> > > > allowed ourselves to pay ... what are your opinions?
> > >
> > > I wouldn't mind, but it requires a lot of benchmarking.
> >
> > In fact, the real hardware may do lowest priority in round robin way,
> 
> Yes, but we won't emulate round robin with PI and I think it is wrong to
> have backends with significantly different guest-visible behaviors.
> 
> > the new hardware even doesn't consider the TPR for lowest priority interrupts
> > delivery.
> 
> A bold move ... what hardware was the first to do so?

I think it was starting with Nehalem.

> 
> > As discussed with Paolo before, I will submit a patch to support lowest
> > priority for PI after this series is merged.
> 
> Sure, I see only two good solutions though
>  1) don't optimize lowest priority with PI
>  2) don't balance lowest priority

As discussed with Paolo before, as the first stage we only support single-CPU
lowest priority for PI. Since this is enabling a new hardware feature, Paolo tends
to do simple things in the beginning. Later we will support full lowest priority,
for example by using vector hashing (one of the methods hardware uses for
lowest priority today). I need to get some detailed information about this from
the hardware guys before enabling it.

Thanks,
Feng
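
The two stages described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: the helper names are
hypothetical (they are not kernel API), the destination set is modelled as a
64-bit mask with bit i set when vCPU i matches, and "vector modulo number of
candidates" is an assumed hashing scheme, since the actual hardware behaviour
is still to be confirmed.

```c
#include <stdint.h>

/* Hypothetical helpers, not kernel API. Bit i of `mask` is set when
 * vCPU i matches the interrupt's destination. */

/* Count the candidate vCPUs in the destination mask. */
static int pi_count_dests(uint64_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* First stage: post only when exactly one vCPU matches; -1 otherwise. */
static int pi_single_dest(uint64_t mask)
{
	int i;

	if (pi_count_dests(mask) != 1)
		return -1;
	for (i = 0; i < 64; i++)
		if (mask & (1ULL << i))
			return i;
	return -1;
}

/* Later stage (assumed scheme): pick the (vector % count)-th candidate. */
static int pi_hash_dest(uint64_t mask, uint8_t vector)
{
	int count = pi_count_dests(mask);
	int target, i;

	if (count == 0)
		return -1;
	target = vector % count;
	for (i = 0; i < 64; i++) {
		if (!(mask & (1ULL << i)))
			continue;
		if (target-- == 0)
			return i;
	}
	return -1;
}
```

With such a scheme the destination depends only on the vector and the
candidate set, so it can be computed once when the interrupt is set up rather
than on every delivery.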

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-14  1:27           ` Wu, Feng
@ 2015-01-14 13:02             ` Paolo Bonzini
  2015-01-14 16:59             ` Radim Krčmář
  1 sibling, 0 replies; 140+ messages in thread
From: Paolo Bonzini @ 2015-01-14 13:02 UTC (permalink / raw)
  To: Wu, Feng, Radim Krčmář
  Cc: tglx, mingo, hpa, x86, gleb, dwmw2, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm



On 14/01/2015 02:27, Wu, Feng wrote:
> As discussed with Paolo before, as the first stage, we only support single-CPU
> lowest priority for PI, since this is a new hardware feature enabling, Paolo tends
> to do simple things in the beginning.

:)

Nice way to sum it up!

Paolo

> Then we will support full lowest priority for
> it, such as, using vector hashing (this is one method of what hardware do for
> lowest priority today), I need to get some detailed information about this from
> hardware guys before enabling it.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-14  1:27           ` Wu, Feng
  2015-01-14 13:02             ` Paolo Bonzini
@ 2015-01-14 16:59             ` Radim Krčmář
  2015-01-20 21:04               ` Nadav Amit
  1 sibling, 1 reply; 140+ messages in thread
From: Radim Krčmář @ 2015-01-14 16:59 UTC (permalink / raw)
  To: Wu, Feng
  Cc: Paolo Bonzini, tglx, mingo, hpa, x86, gleb, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

2015-01-14 01:27+0000, Wu, Feng:
> > > the new hardware even doesn't consider the TPR for lowest priority interrupts
> > > delivery.
> > 
> > A bold move ... what hardware was the first to do so?
> 
> I think it was starting with Nehalem.

Thanks,  (Could be that QPI can't inform about TPR changes anymore ...)

I played with Linux's TPR on Haswell and found that it has no effect.

> > > As discussed with Paolo before, I will submit a patch to support lowest
> > > priority for PI after this series is merged.
> > 
> > Sure, I see only two good solutions though
> >  1) don't optimize lowest priority with PI
> >  2) don't balance lowest priority
> 
> As discussed with Paolo before, as the first stage, we only support single-CPU
> lowest priority for PI, since this is a new hardware feature enabling, Paolo tends
> to do simple things in the beginning.

I agree, that is the best we can do without changing lowest priority.

I wanted to avoid a future solution that would introduce two behaviors
for lowest priority (round robin and something).
Round robin (anything dynamic) can't be done with PI, hence the question
if we can remove it.

>                                       Then we will support full lowest priority for
> it, such as, using vector hashing (this is one method of what hardware do for
> lowest priority today), I need to get some detailed information about this from
> hardware guys before enabling it.

I wasn't able to confirm hashing, is it a recent addition?
I'm not sure we want it then:  OS still has to take care of proper
distribution, being predictable is better than uncertain gains, and
we'll save code.

The same should apply to hardware though ... do you know the reasons
behind vector hashing?

Thank you.
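
To illustrate why round robin counts as "dynamic" here, a minimal sketch of a
software round-robin pick is shown below. The function name and the 64-bit
candidate mask are hypothetical, chosen only for this example. The point is
that every delivery reads and updates state (`*last`), which software can do
when it injects the interrupt itself, whereas with PI the destination has to
be fixed ahead of time.

```c
#include <stdint.h>

/* Hypothetical software round robin, not kernel API.
 * Bit i of `mask` is set when vCPU i is a candidate. `*last` holds the
 * previously chosen index (start with -1) and is updated on every call.
 * Returns the chosen index, or -1 if there is no candidate. */
static int lowprio_round_robin(uint64_t mask, int *last)
{
	int step;

	if (mask == 0)
		return -1;
	for (step = 1; step <= 64; step++) {
		int i = (*last + step) % 64;	/* wrap around after bit 63 */

		if (mask & (1ULL << i)) {
			*last = i;
			return i;
		}
	}
	return -1;
}
```

Calling it repeatedly with the same mask walks through the candidates in
turn, so two identical interrupts generally land on different vCPUs.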

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-14 16:59             ` Radim Krčmář
@ 2015-01-20 21:04               ` Nadav Amit
  2015-01-21 21:16                 ` Radim Krčmář
  0 siblings, 1 reply; 140+ messages in thread
From: Nadav Amit @ 2015-01-20 21:04 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Wu, Feng, kvm, eric.auger, gleb, x86, linux-kernel, iommu, mingo,
	hpa, Paolo Bonzini, tglx, dwmw2, jiang.liu

Radim Krčmář <rkrcmar@redhat.com> wrote:

> 2015-01-14 01:27+0000, Wu, Feng:
>>>> the new hardware even doesn't consider the TPR for lowest priority interrupts
>>>> delivery.
>>> 
>>> A bold move ... what hardware was the first to do so?
>> 
>> I think it was starting with Nehalem.
> 
> Thanks,  (Could be that QPI can't inform about TPR changes anymore ...)
> 
> I played with Linux's TPR on Haswell and found that it has no effect.

Sorry for jumping into the discussion, but doesn’t it depend on
IA32_MISC_ENABLE[23]? This bit disables xTPR messages. On my machine it is
set (probably by the BIOS), but since IA32_MISC_ENABLE is not
locked for changes, the OS can control it.

Nadav
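
For anyone who wants to check this bit on their own machine, a minimal
userspace sketch follows. It assumes the Linux msr driver is loaded
(/dev/cpu/N/msr) and that the program runs as root. The macro and function
names are made up for this example; the MSR index 0x1a0 (IA32_MISC_ENABLE)
and bit 23 (xTPR Message Disable) are as described in the Intel SDM.

```c
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define MSR_IA32_MISC_ENABLE	0x1a0
#define XTPR_MSG_DISABLE_BIT	23

/* Returns 1 if bit 23 (xTPR Message Disable) is set in the MSR value. */
static int xtpr_messages_disabled(uint64_t misc_enable)
{
	return (int)((misc_enable >> XTPR_MSG_DISABLE_BIT) & 1);
}

/* Reads IA32_MISC_ENABLE on `cpu` through the Linux msr driver.
 * Returns 0 on success, -1 on failure (needs root and the msr module). */
static int read_misc_enable(int cpu, uint64_t *val)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* The file offset selects the MSR index. */
	n = pread(fd, val, sizeof(*val), MSR_IA32_MISC_ENABLE);
	close(fd);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```

A caller would read the value with read_misc_enable(0, &val) and then test it
with xtpr_messages_disabled(val). The MSR is per logical CPU, so the result
may need to be checked on each one.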

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 00/26] Add VT-d Posted-Interrupts support
  2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
                   ` (27 preceding siblings ...)
  2015-01-06  1:10 ` Wu, Feng
@ 2015-01-21  2:25 ` Wu, Feng
  2015-01-28  3:01   ` Wu, Feng
  28 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-01-21  2:25 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng


Hi Paolo, Alex, and other maintainers,

Since this series touches multiple subsystems (IOMMU, irq, x86, VFIO, KVM, etc.),
I am wondering how you have handled this kind of case before. If all the patches are
reviewed and acked by the associated maintainers, do you merge only the patches
related to your own subsystem into your tree? You may also need the other
patches for the build to succeed, so I am a little curious about how you
handle this. Thanks a lot!

Thanks,
Feng


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  2015-01-20 21:04               ` Nadav Amit
@ 2015-01-21 21:16                 ` Radim Krčmář
  0 siblings, 0 replies; 140+ messages in thread
From: Radim Krčmář @ 2015-01-21 21:16 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Wu, Feng, kvm, eric.auger, gleb, x86, linux-kernel, iommu, mingo,
	hpa, Paolo Bonzini, tglx, dwmw2, jiang.liu

2015-01-20 23:04+0200, Nadav Amit:
> Radim Krčmář <rkrcmar@redhat.com> wrote:
> > 2015-01-14 01:27+0000, Wu, Feng:
> >>>> the new hardware even doesn't consider the TPR for lowest priority interrupts
> >>>> delivery.
> >>> 
> >>> A bold move ... what hardware was the first to do so?
> >> 
> >> I think it was starting with Nehalem.
> > 
> > Thanks,  (Could be that QPI can't inform about TPR changes anymore ...)
> > 
> > I played with Linux's TPR on Haswell and found that it has no effect.
> 
> Sorry for jumping into the discussion, but doesn’t it depend on
> IA32_MISC_ENABLE[23]? This bit disables xTPR messages. On my machine it is
> set (probably by the BIOS), but since IA32_MISC_ENABLE is not
> locked for changes, the OS can control it.

Thanks, I didn't know about it.
On Ivy Bridge EP (the only modern machine at hand), the bit was set by
default.  After clearing it, TPR still had no effect.

The most relevant mention of xTPR I found is related to FSB [1].
[2] isn't enlightening, so there might be more from QPI-era ...


---
1: Intel® E7320 Memory Controller Hub (MCH) Datasheet
   http://www.intel.com/content/dam/doc/datasheet/e7320-memory-controller-hub-datasheet.pdf
   5.2.2 System Bus Interrupts
2: Intel® Xeon® Processor E5 v2 Family: Datasheet, Vol. 2
   http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v2-datasheet-vol-2.pdf
   6.1.2 IntControl

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 00/26] Add VT-d Posted-Interrupts support
  2015-01-21  2:25 ` Wu, Feng
@ 2015-01-28  3:01   ` Wu, Feng
  2015-01-28  3:44     ` Alex Williamson
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-01-28  3:01 UTC (permalink / raw)
  To: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng



> -----Original Message-----
> From: Wu, Feng
> Sent: Wednesday, January 21, 2015 10:26 AM
> To: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com
> Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
> Subject: RE: [v3 00/26] Add VT-d Posted-Interrupts support
> 
> 
> 
> Hi Paolo, Alex, and other maintainers,
> 
> Since this series touches multiple subsystems (IOMMU, irq, x86, VFIO, KVM, etc.),
> I am wondering how you have handled this kind of case before. If all the patches are
> reviewed and acked by the associated maintainers, do you merge only the patches
> related to your own subsystem into your tree? You may also need the other
> patches for the build to succeed, so I am a little curious about how you
> handle this. Thanks a lot!

Can anyone share some experiences about this?

Thanks,
Feng

> 
> Thanks,
> Feng
> 
> > --
> > 1.9.1


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 00/26] Add VT-d Posted-Interrupts support
  2015-01-28  3:01   ` Wu, Feng
@ 2015-01-28  3:44     ` Alex Williamson
  2015-01-28  4:44       ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Alex Williamson @ 2015-01-28  3:44 UTC (permalink / raw)
  To: Wu, Feng
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro, jiang.liu,
	eric.auger, linux-kernel, iommu, kvm

On Wed, 2015-01-28 at 03:01 +0000, Wu, Feng wrote:
> 
> > -----Original Message-----
> > From: Wu, Feng
> > Sent: Wednesday, January 21, 2015 10:26 AM
> > To: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com
> > Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > iommu@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
> > Subject: RE: [v3 00/26] Add VT-d Posted-Interrupts support
> > 
> > 
> > 
> > Hi Paolo, Alex, and other maintainers,
> > 
> > Since this series touches multiple subsystems (IOMMU, irq, x86, VFIO, KVM, etc.),
> > I am wondering how you have handled this kind of case before. If all the patches are
> > reviewed and acked by the associated maintainers, do you merge only the patches
> > related to your own subsystem into your tree? You may also need the other
> > patches for the build to succeed, so I am a little curious about how you
> > handle this. Thanks a lot!
> 
> Can anyone share some experiences about this?

Generally you need to split the patches into logical changes per
subsystem and submit them separately through the mailing list and
maintainer.  This helps to make sure the changes make sense and stand on
their own and aren't simply a hack to reach the end goal.  Submit what
you can in parallel, but expect that if you send patches dependent on
other, unmerged series, they will get ignored and will need to be resent
when the dependencies are resolved in -next.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 00/26] Add VT-d Posted-Interrupts support
  2015-01-28  3:44     ` Alex Williamson
@ 2015-01-28  4:44       ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2015-01-28  4:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro, jiang.liu,
	eric.auger, linux-kernel, iommu, kvm, Wu, Feng




> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, January 28, 2015 11:44 AM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; jiang.liu@linux.intel.com; eric.auger@linaro.org;
> linux-kernel@vger.kernel.org; iommu@lists.linux-foundation.org;
> kvm@vger.kernel.org
> Subject: Re: [v3 00/26] Add VT-d Posted-Interrupts support
> 
> On Wed, 2015-01-28 at 03:01 +0000, Wu, Feng wrote:
> >
> > > -----Original Message-----
> > > From: Wu, Feng
> > > Sent: Wednesday, January 21, 2015 10:26 AM
> > > To: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com;
> x86@kernel.org;
> > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com
> > > Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
> > > Subject: RE: [v3 00/26] Add VT-d Posted-Interrupts support
> > >
> > >
> > > > -----Original Message-----
> > > > From: Wu, Feng
> > > > Sent: Friday, December 12, 2014 11:15 PM
> > > > To: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com;
> > > x86@kernel.org;
> > > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com
> > > > Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
> > > > Subject: [v3 00/26] Add VT-d Posted-Interrupts support
> > > >
> > > > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > > > With VT-d Posted-Interrupts enabled, external interrupts from
> > > > direct-assigned devices can be delivered to guests without VMM
> > > > intervention when guest is running in non-root mode.
> > > >
> > > > You can find the VT-d Posted-Interrtups Spec. in the following URL:
> > > >
> > >
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technolog
> > > > y/vt-directed-io-spec.html
> > > >
> > > > v1->v2:
> > > > * Use VFIO framework to enable this feature, the VFIO part of this series is
> > > >   base on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
> > > > * Rebase this patchset on
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
> > > >   then revise some irq logic based on the new hierarchy irqdomain
> patches
> > > > provided
> > > >   by Jiang Liu <jiang.liu@linux.intel.com>
> > > >
> > > > v2->v3:
> > > > * Adjust the Posted-interrupts Descriptor updating logic when vCPU is
> > > >   preempted or blocked.
> > > > * KVM_DEV_VFIO_DEVICE_POSTING_IRQ -->
> > > > KVM_DEV_VFIO_DEVICE_POST_IRQ
> > > > * __KVM_HAVE_ARCH_KVM_VFIO_POSTING -->
> > > > __KVM_HAVE_ARCH_KVM_VFIO_POST
> > > > * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
> > > >   can be used to change back to remapping mode.
> > > > * Fix typo
> > > >
> > > > This patch series is made of the following groups:
> > > > 1-6: Some preparation changes in iommu and irq component, this is based
> on
> > > > the
> > > >      new hierarchy irqdomain logic.
> > > > 7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as, feature
> > > > detection,
> > > >           command line parameter.
> > > > 10-17, 22-25: Changes related to KVM itself.
> > > > 18-20: Changes in VFIO component, this part was previously sent out as
> > > > "[RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d
> > > > Posted-Interrupts"
> > > > 21: x86 irq related changes
> > > >
> > > > Feng Wu (26):
> > > >   genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a
> > > >     VCPU
> > > >   iommu: Add new member capability to struct irq_remap_ops
> > > >   iommu, x86: Define new irte structure for VT-d Posted-Interrupts
> > > >   iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
> > > >   x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
> > > >   iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
> > > >   iommu, x86: Add cap_pi_support() to detect VT-d PI capability
> > > >   iommu, x86: Add intel_irq_remapping_capability() for Intel
> > > >   iommu, x86: define irq_remapping_cap()
> > > >   KVM: change struct pi_desc for VT-d Posted-Interrupts
> > > >   KVM: Add some helper functions for Posted-Interrupts
> > > >   KVM: Initialize VT-d Posted-Interrupts Descriptor
> > > >   KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
> > > >   KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
> > > >   KVM: add interfaces to control PI outside vmx
> > > >   KVM: Make struct kvm_irq_routing_table accessible
> > > >   KVM: make kvm_set_msi_irq() public
> > > >   KVM: kvm-vfio: User API for VT-d Posted-Interrupts
> > > >   KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
> > > >   KVM: x86: kvm-vfio: VT-d posted-interrupts setup
> > > >   x86, irq: Define a global vector for VT-d Posted-Interrupts
> > > >   KVM: Define a wakeup worker thread for vCPU
> > > >   KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
> > > >   KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
> > > >   KVM: Suppress posted-interrupt when 'SN' is set
> > > >   iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
> > > >
> > > >  Documentation/kernel-parameters.txt        |   1 +
> > > >  Documentation/virtual/kvm/devices/vfio.txt |   9 ++
> > > >  arch/x86/include/asm/entry_arch.h          |   2 +
> > > >  arch/x86/include/asm/hardirq.h             |   1 +
> > > >  arch/x86/include/asm/hw_irq.h              |   2 +
> > > >  arch/x86/include/asm/irq_remapping.h       |  11 ++
> > > >  arch/x86/include/asm/irq_vectors.h         |   1 +
> > > >  arch/x86/include/asm/kvm_host.h            |  12 ++
> > > >  arch/x86/kernel/apic/msi.c                 |   1 +
> > > >  arch/x86/kernel/entry_64.S                 |   2 +
> > > >  arch/x86/kernel/irq.c                      |  27 ++++
> > > >  arch/x86/kernel/irqinit.c                  |   2 +
> > > >  arch/x86/kvm/Makefile                      |   2 +-
> > > >  arch/x86/kvm/kvm_vfio_x86.c                |  77 +++++++++
> > > >  arch/x86/kvm/vmx.c                         | 244
> > > > ++++++++++++++++++++++++++++-
> > > >  arch/x86/kvm/x86.c                         |  22 ++-
> > > >  drivers/iommu/intel_irq_remapping.c        |  68 +++++++-
> > > >  drivers/iommu/irq_remapping.c              |  24 ++-
> > > >  drivers/iommu/irq_remapping.h              |   8 +
> > > >  include/linux/dmar.h                       |  32 ++++
> > > >  include/linux/intel-iommu.h                |   1 +
> > > >  include/linux/irq.h                        |   7 +
> > > >  include/linux/kvm_host.h                   |  46 ++++++
> > > >  include/uapi/linux/kvm.h                   |  11 ++
> > > >  kernel/irq/chip.c                          |  14 ++
> > > >  kernel/irq/manage.c                        |  20 +++
> > > >  virt/kvm/irq_comm.c                        |  43 ++++-
> > > >  virt/kvm/irqchip.c                         |  11 --
> > > >  virt/kvm/kvm_main.c                        |  15 ++
> > > >  virt/kvm/vfio.c                            | 107 +++++++++++++
> > > >  30 files changed, 795 insertions(+), 28 deletions(-)
> > > >  create mode 100644 arch/x86/kvm/kvm_vfio_x86.c
> > > >
> > >
> > > Hi Paolo, Alex, and other maintainers,
> > >
> > > Since this series contain multiple subsystems, IOMMU, irq, x86, VFIO, KVM,
> etc.
> > > I am wondering how you guys handled this case before? If all the patches are
> > > reviewed and acked by the associated maintainer, are you only merge the
> > > patches
> > > related to your own subsystem to your tree? However, you may also need
> get
> > > other
> > > patches to make the build successful, so I am a little curious about how you
> guys
> > > handle this? Thanks a lot!
> >
> > Can anyone share some experiences about this?
> 
> Generally you need to split the patches into logical changes per
> subsystem and submit them separately through the mailing list and
> maintainer.  This helps to make sure the changes make sense and stand on
> their own and aren't simply a hack to reach the end goal.  Submit what
> you can in parallel, but expect that if you send patches dependent on
> other, unmerged series, they will get ignored and will need to be resent
> when the dependencies are resolved in -next.  Thanks,
> 
> Alex

Thanks a lot for your response, Alex! I am still waiting for comments on
the generic IRQ and IOMMU parts; once I have them, I will take your advice
into account in the next posting. Thank you!

Thanks,
Feng


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 02/26] iommu: Add new member capability to struct irq_remap_ops
  2014-12-12 15:14 ` [v3 02/26] iommu: Add new member capability to struct irq_remap_ops Feng Wu
@ 2015-01-28 15:22   ` David Woodhouse
  2015-01-29  8:34     ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: David Woodhouse @ 2015-01-28 15:22 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm

[-- Attachment #1: Type: text/plain, Size: 633 bytes --]

On Fri, 2014-12-12 at 23:14 +0800, Feng Wu wrote:
> This patch adds a new member capability to struct irq_remap_ops,
> this new function ops can be used to check whether some
> features are supported, such as VT-d Posted-Interrupts.

> +	/* Check some capability is supported */
> +	bool (*capability)(enum irq_remap_cap);
> +

Does this need to be a function call? Or could we just have a set of
flags in the irq_remap_ops instead, with less overhead to check them?
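
For illustration, a minimal sketch of what the flag-based alternative could
look like, assuming a plain bitmask member. The struct, the 'caps' field, the
IRQ_REMAP_CAP_POSTING bit and the helper are hypothetical names, not part of
the posted patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits, one per feature. */
#define IRQ_REMAP_CAP_POSTING	(1u << 0)

/* Stand-in for struct irq_remap_ops; only the assumed new member is shown. */
struct irq_remap_ops_sketch {
	unsigned int caps;	/* filled in once, when remapping is set up */
};

/* Checking a capability is a load and a mask, with no indirect call. */
static inline bool irq_remap_has_cap(const struct irq_remap_ops_sketch *ops,
				     unsigned int cap)
{
	return ops != NULL && (ops->caps & cap) == cap;
}
```

One trade-off of this shape: a flag word has to be recomputed whenever the
underlying condition can change, whereas a callback re-evaluates it on every
call.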

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5745 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 03/26] iommu, x86: Define new irte structure for VT-d Posted-Interrupts
  2014-12-12 15:14 ` [v3 03/26] iommu, x86: Define new irte structure for VT-d Posted-Interrupts Feng Wu
@ 2015-01-28 15:26   ` David Woodhouse
  0 siblings, 0 replies; 140+ messages in thread
From: David Woodhouse @ 2015-01-28 15:26 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm

[-- Attachment #1: Type: text/plain, Size: 556 bytes --]

On Fri, 2014-12-12 at 23:14 +0800, Feng Wu wrote:
> Add a new irte_pi structure for VT-d Posted-Interrupts.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>

Acked-by: David Woodhouse <David.Woodhouse@intel.com>

I think it makes most sense for this to go along with the other patches
rather than through me, and I'm happy for it to do so.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5745 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 04/26] iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  2014-12-12 15:14 ` [v3 04/26] iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip Feng Wu
@ 2015-01-28 15:26   ` David Woodhouse
  2015-01-29  7:55     ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: David Woodhouse @ 2015-01-28 15:26 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm

[-- Attachment #1: Type: text/plain, Size: 1006 bytes --]

On Fri, 2014-12-12 at 23:14 +0800, Feng Wu wrote:
> Implement irq_set_vcpu_affinity for intel_ir_chip.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>

Acked-by: David Woodhouse <David.Woodhouse@intel.com> assuming a
suitable answer to...

> +		vcpu_pi_info = (struct vcpu_data *)vcpu_info;
> +		memcpy(irte_pi, &ir_data->irte_entry, sizeof(struct irte));
> +
> +		irte_pi->urg = 0;
> +		irte_pi->vector = vcpu_pi_info->vector;
> +		irte_pi->pda_l = (vcpu_pi_info->pi_desc_addr >>
> +				 (32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
> +		irte_pi->pda_h = (vcpu_pi_info->pi_desc_addr >> 32) &
> +				 ~(-1UL << PDA_HIGH_BIT);
> +
> +		irte_pi->__reserved_1 = 0;
> +		irte_pi->__reserved_2 = 0;
> +		irte_pi->__reserved_3 = 0;
> +		irte_pi->__reserved_4 = 0;

.... do we need a barrier here before we set this bit?

> +		irte_pi->pst = 1;
> +
> +		modify_irte(&ir_data->irq_2_iommu, (struct irte *)irte_pi);
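
As a worked example of the pda_l/pda_h arithmetic in the hunk above, here is
a small standalone sketch. It assumes the descriptor address is 64-byte
aligned and that the field widths are 26 and 32 bits; the values of
PDA_LOW_BIT and PDA_HIGH_BIT are not quoted in this mail, so the widths, the
struct and the helper names below are assumptions:

```c
#include <stdint.h>

/* Assumed field widths; not taken from the quoted hunk. */
#define PDA_LOW_BITS	26	/* would hold address bits 31:6 */
#define PDA_HIGH_BITS	32	/* would hold address bits 63:32 */

struct pda_fields {
	uint32_t pda_l;
	uint32_t pda_h;
};

/* Split a 64-byte aligned descriptor address into the two fields. */
static struct pda_fields pda_split(uint64_t pi_desc_addr)
{
	struct pda_fields f;

	f.pda_l = (uint32_t)((pi_desc_addr >> (32 - PDA_LOW_BITS)) &
			     ~(~0ULL << PDA_LOW_BITS));
	f.pda_h = (uint32_t)((pi_desc_addr >> 32) &
			     ~(~0ULL << PDA_HIGH_BITS));
	return f;
}

/* Inverse of pda_split(), handy for checking the shifts and masks. */
static uint64_t pda_join(struct pda_fields f)
{
	return ((uint64_t)f.pda_h << 32) |
	       ((uint64_t)f.pda_l << (32 - PDA_LOW_BITS));
}
```

Under these assumptions, splitting an aligned address and joining the two
fields again gives back the original address, which is a quick way to
sanity-check the masks.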


-- 
dwmw2

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5745 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  2014-12-12 15:14 ` [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts Feng Wu
  2014-12-18 14:26   ` Zhang, Yang Z
@ 2015-01-28 15:29   ` David Woodhouse
  1 sibling, 0 replies; 140+ messages in thread
From: David Woodhouse @ 2015-01-28 15:29 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm

[-- Attachment #1: Type: text/plain, Size: 788 bytes --]

On Fri, 2014-12-12 at 23:14 +0800, Feng Wu wrote:
> We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> When 'pst' is set in IRTE, the associated irq will be posted to
> guests instead of interrupt remapping. The destination of the
> interrupt is set in Posted-Interrupts Descriptor, and the migration
> happens during vCPU scheduling.
> 
> However, we still update the cached irte here, which can be used
> when changing back to remapping mode.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>

Acked-by: David Woodhouse <David.Woodhouse@intel.com>

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5745 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 07/26] iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  2014-12-12 15:14 ` [v3 07/26] iommu, x86: Add cap_pi_support() to detect VT-d PI capability Feng Wu
@ 2015-01-28 15:32   ` David Woodhouse
  0 siblings, 0 replies; 140+ messages in thread
From: David Woodhouse @ 2015-01-28 15:32 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm

[-- Attachment #1: Type: text/plain, Size: 435 bytes --]

On Fri, 2014-12-12 at 23:14 +0800, Feng Wu wrote:
> Add helper function to detect VT-d Posted-Interrupts capability.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>

Acked-by: David Woodhouse <David.Woodhouse@intel.com>

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5745 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 08/26] iommu, x86: Add intel_irq_remapping_capability() for Intel
  2014-12-12 15:14 ` [v3 08/26] iommu, x86: Add intel_irq_remapping_capability() for Intel Feng Wu
@ 2015-01-28 15:37   ` David Woodhouse
  2015-01-29  8:57     ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: David Woodhouse @ 2015-01-28 15:37 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm

[-- Attachment #1: Type: text/plain, Size: 1011 bytes --]

On Fri, 2014-12-12 at 23:14 +0800, Feng Wu wrote:
> Add the Intel side implementation for capability in
> struct irq_remap_ops.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>

> +static bool intel_irq_remapping_capability(enum irq_remap_cap cap)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +
> +	switch (cap) {
> +	case IRQ_POSTING_CAP:
> +		/*
> +		 * If 1) posted-interrupts is disabled by user
> +		 * or 2) irq remapping is disabled, posted-interrupts
> +		 * is not supported.
> +		 */
> +		if (disable_irq_post || !irq_remapping_enabled)
> +			return 0;
> +
> +		for_each_iommu(iommu, drhd)
> +			if (!cap_pi_support(iommu->cap))
> +				return 0;
> +

If a new IOMMU is hotplugged now which doesn't support posted
interrupts, what happens?

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5745 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 26/26] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
  2014-12-12 15:15 ` [v3 26/26] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
@ 2015-01-28 15:39   ` David Woodhouse
  0 siblings, 0 replies; 140+ messages in thread
From: David Woodhouse @ 2015-01-28 15:39 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm

[-- Attachment #1: Type: text/plain, Size: 390 bytes --]

On Fri, 2014-12-12 at 23:15 +0800, Feng Wu wrote:
> Enable VT-d Posted-Interrtups and add a command line
> parameter for it.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>

Acked-by: David Woodhouse <David.Woodhouse@intel.com>

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5745 bytes --]

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 04/26] iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  2015-01-28 15:26   ` David Woodhouse
@ 2015-01-29  7:55     ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2015-01-29  7:55 UTC (permalink / raw)
  To: David Woodhouse
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm, Wu, Feng

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2029 bytes --]



> -----Original Message-----
> From: David Woodhouse [mailto:dwmw2@infradead.org]
> Sent: Wednesday, January 28, 2015 11:27 PM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; joro@8bytes.org;
> alex.williamson@redhat.com; jiang.liu@linux.intel.com; eric.auger@linaro.org;
> linux-kernel@vger.kernel.org; iommu@lists.linux-foundation.org;
> kvm@vger.kernel.org
> Subject: Re: [v3 04/26] iommu, x86: Implement irq_set_vcpu_affinity for
> intel_ir_chip
> 
> On Fri, 2014-12-12 at 23:14 +0800, Feng Wu wrote:
> > Implement irq_set_vcpu_affinity for intel_ir_chip.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
> 
> Acked-by: David.Woodhouse <David.Woodhouse@intel.com> assuming a
> suitable answer to...
> 
> > +		vcpu_pi_info = (struct vcpu_data *)vcpu_info;
> > +		memcpy(irte_pi, &ir_data->irte_entry, sizeof(struct irte));
> > +
> > +		irte_pi->urg = 0;
> > +		irte_pi->vector = vcpu_pi_info->vector;
> > +		irte_pi->pda_l = (vcpu_pi_info->pi_desc_addr >>
> > +				 (32 - PDA_LOW_BIT)) & ~(-1UL << PDA_LOW_BIT);
> > +		irte_pi->pda_h = (vcpu_pi_info->pi_desc_addr >> 32) &
> > +				 ~(-1UL << PDA_HIGH_BIT);
> > +
> > +		irte_pi->__reserved_1 = 0;
> > +		irte_pi->__reserved_2 = 0;
> > +		irte_pi->__reserved_3 = 0;
> > +		irte_pi->__reserved_4 = 0;
> 
> .... do we need a barrier here before we set this bit?

Thanks a lot for your Ack, David!

I cannot see why we would need a barrier here: 'irte_pi' is only a local
variable. The hardware is actually programmed in modify_irte(), which
acquires a spin lock, so there is an implicit barrier there.
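
The pattern described here (fill in a private copy, then publish it under a
lock) can be sketched in userspace C11 as below. This is only an analogy, not
the kernel code: the table, the toy spin lock built on atomic_flag, and all
names are hypothetical, and it says nothing about how hardware observes the
entry:

```c
#include <stdatomic.h>
#include <stdint.h>

struct irte_sketch {
	uint64_t low;
	uint64_t high;
};

#define TABLE_SIZE 4

static struct irte_sketch table[TABLE_SIZE];	/* stands in for the shared table */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;	/* toy spin lock */

/* Analogue of modify_irte(): the shared entry is only written under the lock. */
static void publish_entry(unsigned int idx, const struct irte_sketch *e)
{
	while (atomic_flag_test_and_set_explicit(&table_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	table[idx] = *e;
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Build the new entry in a private copy, then publish it in one step. */
static void set_posted(unsigned int idx, uint64_t low, uint64_t high)
{
	struct irte_sketch local = { 0, 0 };	/* nobody else can see this */

	local.low = low | 1u;	/* bit 0 stands in for a mode bit such as 'pst' */
	local.high = high;
	publish_entry(idx, &local);
}
```

The order in which the fields of 'local' are written does not matter to other
threads, because they can only observe the entry after publish_entry() has
copied it under the lock.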

Thanks,
Feng

> 
> > +		irte_pi->pst = 1;
> > +
> > +		modify_irte(&ir_data->irq_2_iommu, (struct irte *)irte_pi);
> 
> 
> --
> dwmw2

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 02/26] iommu: Add new member capability to struct irq_remap_ops
  2015-01-28 15:22   ` David Woodhouse
@ 2015-01-29  8:34     ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2015-01-29  8:34 UTC (permalink / raw)
  To: David Woodhouse
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm, Wu, Feng

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1441 bytes --]



> -----Original Message-----
> From: David Woodhouse [mailto:dwmw2@infradead.org]
> Sent: Wednesday, January 28, 2015 11:23 PM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; joro@8bytes.org;
> alex.williamson@redhat.com; jiang.liu@linux.intel.com; eric.auger@linaro.org;
> linux-kernel@vger.kernel.org; iommu@lists.linux-foundation.org;
> kvm@vger.kernel.org
> Subject: Re: [v3 02/26] iommu: Add new member capability to struct
> irq_remap_ops
> 
> On Fri, 2014-12-12 at 23:14 +0800, Feng Wu wrote:
> > This patch adds a new member capability to struct irq_remap_ops,
> > this new function ops can be used to check whether some
> > features are supported, such as VT-d Posted-Interrupts.
> 
> > +	/* Check some capability is supported */
> > +	bool (*capability)(enum irq_remap_cap);
> > +
> 
> Does this need to be a function call? Or could we just have a set of
> flags in the irq_remap_ops instead, with less overhead to check them?

Sounds like a good idea; I will do this in the next posting. Thanks for the
comments!

Thanks,
Feng

> 
> --
> David Woodhouse                            Open Source Technology
> Centre
> David.Woodhouse@intel.com                              Intel
> Corporation

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 08/26] iommu, x86: Add intel_irq_remapping_capability() for Intel
  2015-01-28 15:37   ` David Woodhouse
@ 2015-01-29  8:57     ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2015-01-29  8:57 UTC (permalink / raw)
  To: David Woodhouse
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, joro, alex.williamson,
	jiang.liu, eric.auger, linux-kernel, iommu, kvm, Wu, Feng

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2047 bytes --]



> -----Original Message-----
> From: David Woodhouse [mailto:dwmw2@infradead.org]
> Sent: Wednesday, January 28, 2015 11:38 PM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; joro@8bytes.org;
> alex.williamson@redhat.com; jiang.liu@linux.intel.com; eric.auger@linaro.org;
> linux-kernel@vger.kernel.org; iommu@lists.linux-foundation.org;
> kvm@vger.kernel.org
> Subject: Re: [v3 08/26] iommu, x86: Add intel_irq_remapping_capability() for
> Intel
> 
> On Fri, 2014-12-12 at 23:14 +0800, Feng Wu wrote:
> > Add the Intel side implementation for capability in
> > struct irq_remap_ops.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
> 
> > +static bool intel_irq_remapping_capability(enum irq_remap_cap cap)
> > +{
> > +	struct dmar_drhd_unit *drhd;
> > +	struct intel_iommu *iommu;
> > +
> > +	switch (cap) {
> > +	case IRQ_POSTING_CAP:
> > +		/*
> > +		 * If 1) posted-interrupts is disabled by user
> > +		 * or 2) irq remapping is disabled, posted-interrupts
> > +		 * is not supported.
> > +		 */
> > +		if (disable_irq_post || !irq_remapping_enabled)
> > +			return 0;
> > +
> > +		for_each_iommu(iommu, drhd)
> > +			if (!cap_pi_support(iommu->cap))
> > +				return 0;
> > +
> 
> If a new IOMMU is hotplugged now which doesn't support posted
> interrupts, what happens?

Good question. I just had an offline discussion with Jiang Liu, and the
same question actually applies to IR. In the current implementation, if IR
is in use and a new IOMMU without IR capability is hotplugged, the hotplug
is rejected. I think I can simply follow the same policy for PI.
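
A minimal sketch of that policy, assuming a per-unit capability record and a
check run at hotplug time. The struct, the function name and the choice of
-EBUSY as the error code are hypothetical and only meant to show the shape of
the check:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of what a newly added IOMMU unit supports. */
struct iommu_caps_sketch {
	bool ir;	/* interrupt remapping */
	bool pi;	/* posted interrupts */
};

/*
 * Refuse a hotplugged unit that lacks a feature the running system
 * already relies on; accept it otherwise.
 */
static int iommu_hotplug_check(const struct iommu_caps_sketch *new_unit,
			       bool ir_in_use, bool pi_in_use)
{
	if (ir_in_use && !new_unit->ir)
		return -EBUSY;
	if (pi_in_use && !new_unit->pi)
		return -EBUSY;
	return 0;
}
```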

Thanks,
Feng

> 
> --
> David Woodhouse                            Open Source Technology
> Centre
> David.Woodhouse@intel.com                              Intel
> Corporation

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts
  2014-12-12 15:14 ` [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts Feng Wu
  2014-12-18 14:54   ` Zhang, Yang Z
@ 2015-01-30 18:18   ` H. Peter Anvin
  2015-02-02  1:06     ` Wu, Feng
  2015-02-23 22:04   ` Marcelo Tosatti
  2 siblings, 1 reply; 140+ messages in thread
From: H. Peter Anvin @ 2015-01-30 18:18 UTC (permalink / raw)
  To: Feng Wu, tglx, mingo, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm

On 12/12/2014 07:14 AM, Feng Wu wrote:
> Currently, we use a global vector as the Posted-Interrupts
> Notification Event for all the vCPUs in the system. We need
> to introduce another global vector for VT-d Posted-Interrtups,
> which will be used to wakeup the sleep vCPU when an external
> interrupt from a direct-assigned device happens for that vCPU.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
>  

>  #ifdef CONFIG_HAVE_KVM
> +void (*wakeup_handler_callback)(void) = NULL;
> +EXPORT_SYMBOL_GPL(wakeup_handler_callback);
> +

Stylistic nitpick: we generally don't explicitly initialize
global/static pointer variables to NULL (that happens automatically anyway.)

Other than that,

Acked-by: H. Peter Anvin <hpa@linux.intel.com>
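
To illustrate the nitpick: in C, objects with static storage duration and no
explicit initializer are zero-initialized, so a file-scope function pointer
starts out as a null pointer. A small sketch, with hypothetical names ending
in _sketch:

```c
#include <stddef.h>

/* No "= NULL" needed: file-scope objects start out zero-initialized. */
void (*wakeup_handler_callback_sketch)(void);

/* Returns nonzero while no handler has been registered. */
static int wakeup_callback_is_unset(void)
{
	return wakeup_handler_callback_sketch == NULL;
}

/* Callers test the pointer before invoking it. */
static void run_wakeup_callback(void)
{
	if (wakeup_handler_callback_sketch)
		wakeup_handler_callback_sketch();
}
```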


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts
  2015-01-30 18:18   ` H. Peter Anvin
@ 2015-02-02  1:06     ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2015-02-02  1:06 UTC (permalink / raw)
  To: H. Peter Anvin, tglx, mingo, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu
  Cc: eric.auger, linux-kernel, iommu, kvm, Wu, Feng

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1519 bytes --]



> -----Original Message-----
> From: H. Peter Anvin [mailto:hpa@zytor.com]
> Sent: Saturday, January 31, 2015 2:19 AM
> To: Wu, Feng; tglx@linutronix.de; mingo@redhat.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com
> Cc: eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 21/26] x86, irq: Define a global vector for VT-d
> Posted-Interrupts
> 
> On 12/12/2014 07:14 AM, Feng Wu wrote:
> > Currently, we use a global vector as the Posted-Interrupts
> > Notification Event for all the vCPUs in the system. We need
> > to introduce another global vector for VT-d Posted-Interrtups,
> > which will be used to wakeup the sleep vCPU when an external
> > interrupt from a direct-assigned device happens for that vCPU.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> >
> 
> >  #ifdef CONFIG_HAVE_KVM
> > +void (*wakeup_handler_callback)(void) = NULL;
> > +EXPORT_SYMBOL_GPL(wakeup_handler_callback);
> > +
> 
> Stylistic nitpick: we generally don't explicitly initialize
> global/static pointer variables to NULL (that happens automatically anyway.)
> 
> Other than that,
> 
> Acked-by: H. Peter Anvin <hpa@linux.intel.com>

Thanks a lot for your review, Peter!

Thanks,
Feng

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts
  2014-12-12 15:14 ` [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts Feng Wu
  2014-12-18 14:54   ` Zhang, Yang Z
  2015-01-30 18:18   ` H. Peter Anvin
@ 2015-02-23 22:04   ` Marcelo Tosatti
  2 siblings, 0 replies; 140+ messages in thread
From: Marcelo Tosatti @ 2015-02-23 22:04 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

On Fri, Dec 12, 2014 at 11:14:55PM +0800, Feng Wu wrote:
> Currently, we use a global vector as the Posted-Interrupts
> Notification Event for all the vCPUs in the system. We need
> to introduce another global vector for VT-d Posted-Interrtups,
> which will be used to wakeup the sleep vCPU when an external
> interrupt from a direct-assigned device happens for that vCPU.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>

Why is an additional vector necessary?

Can't you simply wake up the vcpu from kvm_posted_intr_ipi, the posted
interrupt vector handler?


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  2014-12-12 15:14 ` [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
  2014-12-17 17:11   ` Paolo Bonzini
@ 2015-02-23 22:21   ` Marcelo Tosatti
  2015-03-02  9:12     ` Wu, Feng
  1 sibling, 1 reply; 140+ messages in thread
From: Marcelo Tosatti @ 2015-02-23 22:21 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

On Fri, Dec 12, 2014 at 11:14:57PM +0800, Feng Wu wrote:
> This patch updates the Posted-Interrupts Descriptor when vCPU
> is preempted.
> 
> sched out:
> - Set 'SN' to suppress furture non-urgent interrupts posted for
> the vCPU.

What wakes the vcpu in the case of a non-urgent interrupt, then?

I wonder how software is supposed to configure the urgent/non-urgent
flag. Can you give examples of (hypothetical) urgent and non-urgent
interrupts?

> sched in:
> - Clear 'SN'
> - Change NDST if vCPU is scheduled to a different CPU
> - Set 'NV' to POSTED_INTR_VECTOR

What about:

POSTED_INTR_VECTOR interrupt handler:
- Wakeup vcpu.
- Set 'SN' to suppress future interrupts.

HLT emulation entry:
- Clear 'SN' to receive VT-d interrupt notification.

> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  arch/x86/kvm/vmx.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ee3b735..bf2e6cd 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1916,10 +1916,54 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
>  		vmx->loaded_vmcs->cpu = cpu;
>  	}
> +
> +	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +		struct pi_desc old, new;
> +		unsigned int dest;
> +
> +		memset(&old, 0, sizeof(old));
> +		memset(&new, 0, sizeof(new));
> +
> +		do {
> +			old.control = new.control = pi_desc->control;
> +			if (vcpu->cpu != cpu) {
> +				dest = cpu_physical_id(cpu);
> +
> +				if (x2apic_enabled())
> +					new.ndst = dest;
> +				else
> +					new.ndst = (dest << 8) & 0xFF00;
> +			}
> +
> +			pi_clear_sn(&new);
> +
> +			/* set 'NV' to 'notification vector' */
> +			new.nv = POSTED_INTR_VECTOR;
> +		} while (cmpxchg(&pi_desc->control, old.control,
> +				new.control) != old.control);
> +	}
>  }
>  
>  static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> +	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +		struct pi_desc old, new;
> +
> +		memset(&old, 0, sizeof(old));
> +		memset(&new, 0, sizeof(new));
> +
> +		/* Set SN when the vCPU is preempted */
> +		if (vcpu->preempted) {
> +			do {
> +				old.control = new.control = pi_desc->control;
> +				pi_set_sn(&new);
> +			} while (cmpxchg(&pi_desc->control, old.control,
> +					new.control) != old.control);
> +		}
> +	}
> +
>  	__vmx_load_host_state(to_vmx(vcpu));
>  	if (!vmm_exclusive) {
>  		__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2014-12-12 15:14 ` [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
  2014-12-17 17:09   ` Paolo Bonzini
@ 2015-02-25 21:50   ` Marcelo Tosatti
  2015-02-26  8:08     ` Wu, Feng
  2015-02-26 23:40   ` Marcelo Tosatti
  2 siblings, 1 reply; 140+ messages in thread
From: Marcelo Tosatti @ 2015-02-25 21:50 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> This patch updates the Posted-Interrupts Descriptor when vCPU
> is blocked.
> 
> pre-block:
> - Add the vCPU to the blocked per-CPU list
> - Clear 'SN'
> - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> 
> post-block:
> - Remove the vCPU from the per-CPU list
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---

I don't see why this is needed; the existing POSTED_INTR_VECTOR can be used:

If in guest mode, the IPI will be handled in VMX non-root mode by
performing the PIR->IRR transfer.

If outside guest mode, the POSTED_INTR_VECTOR IPI will be handled by the
host, which can wake up the guest (in case it is halted).

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-02-25 21:50   ` Marcelo Tosatti
@ 2015-02-26  8:08     ` Wu, Feng
  2015-02-26 23:41       ` Marcelo Tosatti
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-02-26  8:08 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Zhang, Yang Z, Wu, Feng



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Thursday, February 26, 2015 5:50 AM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> > This patch updates the Posted-Interrupts Descriptor when vCPU
> > is blocked.
> >
> > pre-block:
> > - Add the vCPU to the blocked per-CPU list
> > - Clear 'SN'
> > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> >
> > post-block:
> > - Remove the vCPU from the per-CPU list
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> 
> Don't see this is needed, can use the existing POSTED_INTR_VECTOR:
> 
> If in guest mode, IPI will be handled in VMX non-root by performed
> PIR->IRR transfer.
> 
> If outside guest mode, POSTED_INTR_VECTOR IPI will be handled by host
> which can wakeup the guest (in case it is halted).

Please see the following scenario:

1. vCPU0 is running on pCPU0
2. vCPU0 is halted and vCPU1 is currently running on pCPU0
3. An interrupt occurs for vCPU0. If we still use POSTED_INTR_VECTOR
for vCPU0, the notification event for vCPU0 (the event will go to pCPU0)
will be incorrectly consumed by vCPU1. In the worst case, vCPU0
will never be woken up again, since its wakeup events are always
incorrectly consumed by other vCPUs.
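
The scenario above can be modeled in a few lines of user-space C. This is
only a toy sketch, not kernel code: the demo_* names, the fixed-size
blocked array and the two enum values are assumptions made for
illustration. It shows that a notification using the ordinary vector is
absorbed by whichever vCPU is in guest mode, while a dedicated wakeup
vector reaches a host handler that can scan the blocked vCPUs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two notification vectors. */
enum demo_vector { DEMO_POSTED_INTR_VECTOR, DEMO_WAKEUP_VECTOR };

struct demo_vcpu {
	bool on;      /* models the ON bit: an interrupt has been posted */
	bool halted;  /* blocked in HLT emulation */
	bool woken;   /* set once the host has woken the vCPU */
};

struct demo_pcpu {
	struct demo_vcpu *running;    /* vCPU in guest mode, or NULL */
	struct demo_vcpu *blocked[4]; /* vCPUs that halted on this pCPU */
	size_t nr_blocked;
};

/* Post an interrupt for 'target' and notify 'pcpu' with vector 'vec'. */
static void demo_post(struct demo_pcpu *pcpu, struct demo_vcpu *target,
		      enum demo_vector vec)
{
	size_t i;

	target->on = true;

	if (vec == DEMO_POSTED_INTR_VECTOR) {
		/*
		 * With a vCPU in guest mode the notification is consumed
		 * against that vCPU's descriptor; without one, the host
		 * handler in this model does nothing. Either way nobody
		 * looks at 'target'.
		 */
		return;
	}

	/* Wakeup vector: the host handler scans the blocked vCPUs. */
	for (i = 0; i < pcpu->nr_blocked; i++) {
		struct demo_vcpu *v = pcpu->blocked[i];

		if (v->on) {
			v->halted = false;
			v->woken = true;
		}
	}
}
```

With vCPU1 running and vCPU0 on the blocked array, posting with
DEMO_POSTED_INTR_VECTOR leaves vCPU0 halted with ON set, whereas posting
with DEMO_WAKEUP_VECTOR wakes it.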

Thanks,
Feng

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2014-12-12 15:14 ` [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
  2014-12-17 17:09   ` Paolo Bonzini
  2015-02-25 21:50   ` Marcelo Tosatti
@ 2015-02-26 23:40   ` Marcelo Tosatti
  2015-03-02 13:36     ` Wu, Feng
  2 siblings, 1 reply; 140+ messages in thread
From: Marcelo Tosatti @ 2015-02-26 23:40 UTC (permalink / raw)
  To: Feng Wu
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> This patch updates the Posted-Interrupts Descriptor when vCPU
> is blocked.
> 
> pre-block:
> - Add the vCPU to the blocked per-CPU list
> - Clear 'SN'
> - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> 
> post-block:
> - Remove the vCPU from the per-CPU list
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 +
>  arch/x86/kvm/vmx.c              | 96 +++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c              | 22 +++++++---
>  include/linux/kvm_host.h        |  4 ++
>  virt/kvm/kvm_main.c             |  6 +++
>  5 files changed, 123 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 13e3e40..32c110a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -101,6 +101,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
>  
>  #define ASYNC_PF_PER_VCPU 64
>  
> +extern void (*wakeup_handler_callback)(void);
> +
>  enum kvm_reg {
>  	VCPU_REGS_RAX = 0,
>  	VCPU_REGS_RCX = 1,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index bf2e6cd..a1c83a2 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -832,6 +832,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
>  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
>  
> +/*
> + * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
> + * can find which vCPU should be waken up.
> + */
> +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> +
>  static unsigned long *vmx_io_bitmap_a;
>  static unsigned long *vmx_io_bitmap_b;
>  static unsigned long *vmx_msr_bitmap_legacy;
> @@ -1921,6 +1928,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>  		struct pi_desc old, new;
>  		unsigned int dest;
> +		unsigned long flags;
>  
>  		memset(&old, 0, sizeof(old));
>  		memset(&new, 0, sizeof(new));
> @@ -1942,6 +1950,20 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  			new.nv = POSTED_INTR_VECTOR;
>  		} while (cmpxchg(&pi_desc->control, old.control,
>  				new.control) != old.control);
> +
> +		/*
> +		 * Delete the vCPU from the related wakeup queue
> +		 * if we are resuming from blocked state
> +		 */
> +		if (vcpu->blocked) {
> +			vcpu->blocked = false;
> +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +				vcpu->wakeup_cpu), flags);
> +			list_del(&vcpu->blocked_vcpu_list);
> +			spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> +				vcpu->wakeup_cpu), flags);
> +			vcpu->wakeup_cpu = -1;
> +		}
>  	}
>  }
>  
> @@ -1950,6 +1972,9 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
>  	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
>  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>  		struct pi_desc old, new;
> +		unsigned long flags;
> +		int cpu;
> +		struct cpumask cpu_others_mask;
>  
>  		memset(&old, 0, sizeof(old));
>  		memset(&new, 0, sizeof(new));
> @@ -1961,6 +1986,54 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
>  				pi_set_sn(&new);
>  			} while (cmpxchg(&pi_desc->control, old.control,
>  					new.control) != old.control);
> +		} else if (vcpu->blocked) {
> +			/*
> +			 * The vcpu is blocked on the wait queue.
> +			 * Store the blocked vCPU on the list of the
> +			 * vcpu->wakeup_cpu, which is the destination
> +			 * of the wake-up notification event.
> +			 */
> +			vcpu->wakeup_cpu = vcpu->cpu;
> +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +					  vcpu->wakeup_cpu), flags);
> +			list_add_tail(&vcpu->blocked_vcpu_list,
> +				      &per_cpu(blocked_vcpu_on_cpu,
> +				      vcpu->wakeup_cpu));
> +			spin_unlock_irqrestore(
> +					&per_cpu(blocked_vcpu_on_cpu_lock,
> +					vcpu->wakeup_cpu), flags);
> +
> +			do {
> +				old.control = new.control = pi_desc->control;
> +
> +				/*
> +				 * We should not block the vCPU if
> +				 * an interrupt is posted for it.
> +				 */
> +				if (pi_test_on(pi_desc) == 1) {
> +					/*
> +					 * We need schedule the wakeup worker
> +					 * on a different cpu other than
> +					 * vcpu->cpu, because in some case,
> +					 * schedule_work() will call
> +					 * try_to_wake_up() which needs acquire
> +					 * the rq lock. This can cause deadlock.
> +					 */
> +					cpumask_copy(&cpu_others_mask,
> +						     cpu_online_mask);
> +					cpu_clear(vcpu->cpu, cpu_others_mask);
> +					cpu = any_online_cpu(cpu_others_mask);
> +
> +					schedule_work_on(cpu,
> +							 &vcpu->wakeup_worker);
> +				}
> +
> +				pi_clear_sn(&new);
> +
> +				/* set 'NV' to 'wakeup vector' */
> +				new.nv = POSTED_INTR_WAKEUP_VECTOR;
> +			} while (cmpxchg(&pi_desc->control, old.control,
> +				new.control) != old.control);
>  		}

This can be done exclusively on HLT emulation, correct? (that is, on
entry to HLT and exit from HLT).

If the vcpu is scheduled out for any other reason (transition to
userspace or transition to other thread), it will eventually resume
execution. And in that case, continuation of execution does not depend
on the event (VT-d interrupt) notification.

There is a race window with the code above, I believe.

>  	}
>  
> @@ -2842,6 +2915,8 @@ static int hardware_enable(void)
>  		return -EBUSY;
>  
>  	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> +	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> +	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
>  
>  	/*
>  	 * Now we can enable the vmclear operation in kdump
> @@ -9315,6 +9390,25 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.pi_set_sn = vmx_pi_set_sn,
>  };
>  
> +/*
> + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> + */
> +void wakeup_handler(void)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int cpu = smp_processor_id();
> +
> +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> +			blocked_vcpu_list) {
> +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +
> +		if (pi_test_on(pi_desc) == 1)
> +			kvm_vcpu_kick(vcpu);
> +	}
> +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +}

Looping through all blocked vcpus does not scale.
Can you allocate more vectors and then multiplex those
vectors amongst the HLT'ed vcpus?

It seems there are a bunch free:

commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
Author: Alex Shi <alex.shi@intel.com>
Date:   Thu Jun 28 09:02:23 2012 +0800

    x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR

Can you add only vcpus which have posted IRTEs that point to this pCPU
to the HLT'ed vcpu lists? (so for example, vcpus without assigned
devices are not part of the list).
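
One way to picture the vector-multiplexing suggestion is the user-space
sketch below. It is an assumption-laden illustration, not a proposal for
the actual patch: the number of wakeup vectors, the modulo hash on a vCPU
id and all demo_* names are hypothetical. Each pCPU keeps one bucket per
wakeup vector, so a handler scans only the vCPUs that share its vector.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_WAKEUP_VECTORS 4 /* hypothetical number of spare vectors */
#define DEMO_MAX_PER_BUCKET    8 /* arbitrary bound for this sketch */

struct demo_vcpu {
	unsigned int id;
	bool on;     /* models the ON bit */
	bool kicked; /* set when the handler would kick the vCPU */
};

struct demo_bucket {
	struct demo_vcpu *vcpus[DEMO_MAX_PER_BUCKET];
	size_t nr;
};

/* One instance per pCPU: one bucket of blocked vCPUs per wakeup vector. */
struct demo_pcpu {
	struct demo_bucket bucket[DEMO_NR_WAKEUP_VECTORS];
};

static unsigned int demo_vector_index(const struct demo_vcpu *v)
{
	return v->id % DEMO_NR_WAKEUP_VECTORS;
}

/*
 * Called before blocking. Returns the index of the wakeup vector to
 * program into 'NV', or -1 if the bucket is full.
 */
static int demo_block(struct demo_pcpu *p, struct demo_vcpu *v)
{
	unsigned int idx = demo_vector_index(v);
	struct demo_bucket *b = &p->bucket[idx];

	if (b->nr == DEMO_MAX_PER_BUCKET)
		return -1;
	b->vcpus[b->nr++] = v;
	return (int)idx;
}

/*
 * Handler for wakeup vector 'idx'. Scans only that bucket and returns
 * the number of vCPUs it had to examine.
 */
static size_t demo_wakeup(struct demo_pcpu *p, unsigned int idx)
{
	struct demo_bucket *b = &p->bucket[idx];
	size_t i;

	for (i = 0; i < b->nr; i++)
		if (b->vcpus[i]->on)
			b->vcpus[i]->kicked = true;
	return b->nr;
}
```

With eight blocked vCPUs and four vectors, each handler invocation
examines two vCPUs instead of eight. The same structure would also
accommodate the second suggestion: vCPUs without posted IRTEs are simply
never passed to demo_block().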

> +
>  static int __init vmx_init(void)
>  {
>  	int r, i, msr;
> @@ -9429,6 +9523,8 @@ static int __init vmx_init(void)
>  
>  	update_ple_window_actual_max();
>  
> +	wakeup_handler_callback = wakeup_handler;
> +
>  	return 0;
>  
>  out7:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0033df3..1551a46 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6152,6 +6152,21 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			kvm_vcpu_reload_apic_access_page(vcpu);
>  	}
>  
> +	/*
> +	 * Since posted-interrupts can be set by VT-d HW now, in this
> +	 * case, KVM_REQ_EVENT is not set. We move the following
> +	 * operations out of the if statement.
> +	 */
> +	if (kvm_lapic_enabled(vcpu)) {
> +		/*
> +		 * Update architecture specific hints for APIC
> +		 * virtual interrupt delivery.
> +		 */
> +		if (kvm_x86_ops->hwapic_irr_update)
> +			kvm_x86_ops->hwapic_irr_update(vcpu,
> +				kvm_lapic_find_highest_irr(vcpu));
> +	}
> +

This is a hot fast path. You can set KVM_REQ_EVENT from wakeup_handler.

>  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>  		kvm_apic_accept_events(vcpu);
>  		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> @@ -6168,13 +6183,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			kvm_x86_ops->enable_irq_window(vcpu);
>  
>  		if (kvm_lapic_enabled(vcpu)) {
> -			/*
> -			 * Update architecture specific hints for APIC
> -			 * virtual interrupt delivery.
> -			 */
> -			if (kvm_x86_ops->hwapic_irr_update)
> -				kvm_x86_ops->hwapic_irr_update(vcpu,
> -					kvm_lapic_find_highest_irr(vcpu));
>  			update_cr8_intercept(vcpu);
>  			kvm_lapic_sync_to_vapic(vcpu);
>  		}
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 3d7242c..d981d16 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -239,6 +239,9 @@ struct kvm_vcpu {
>  	unsigned long requests;
>  	unsigned long guest_debug;
>  
> +	int wakeup_cpu;
> +	struct list_head blocked_vcpu_list;
> +
>  	struct mutex mutex;
>  	struct kvm_run *run;
>  
> @@ -282,6 +285,7 @@ struct kvm_vcpu {
>  	} spin_loop;
>  #endif
>  	bool preempted;
> +	bool blocked;
>  	struct kvm_vcpu_arch arch;
>  };

Please remove blocked and wakeup_cpu, they should not be necessary.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ba53fd6..6deb994 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -233,6 +233,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>  
>  	INIT_WORK(&vcpu->wakeup_worker, wakeup_thread);
>  
> +	vcpu->wakeup_cpu = -1;
> +	INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
> +
>  	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>  	if (!page) {
>  		r = -ENOMEM;
> @@ -243,6 +246,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>  	kvm_vcpu_set_in_spin_loop(vcpu, false);
>  	kvm_vcpu_set_dy_eligible(vcpu, false);
>  	vcpu->preempted = false;
> +	vcpu->blocked = false;
>  
>  	r = kvm_arch_vcpu_init(vcpu);
>  	if (r < 0)
> @@ -1752,6 +1756,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  	DEFINE_WAIT(wait);
>  
>  	for (;;) {
> +		vcpu->blocked = true;
>  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>  
>  		if (kvm_arch_vcpu_runnable(vcpu)) {
> @@ -1767,6 +1772,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  	}
>  
>  	finish_wait(&vcpu->wq, &wait);
> +	vcpu->blocked = false;
>  }
>  EXPORT_SYMBOL_GPL(kvm_vcpu_block);
>  
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-02-26  8:08     ` Wu, Feng
@ 2015-02-26 23:41       ` Marcelo Tosatti
  0 siblings, 0 replies; 140+ messages in thread
From: Marcelo Tosatti @ 2015-02-26 23:41 UTC (permalink / raw)
  To: Wu, Feng
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Zhang, Yang Z

On Thu, Feb 26, 2015 at 08:08:15AM +0000, Wu, Feng wrote:
> 
> 
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > Sent: Thursday, February 26, 2015 5:50 AM
> > To: Wu, Feng
> > Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> > is blocked
> > 
> > On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> > > This patch updates the Posted-Interrupts Descriptor when vCPU
> > > is blocked.
> > >
> > > pre-block:
> > > - Add the vCPU to the blocked per-CPU list
> > > - Clear 'SN'
> > > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> > >
> > > post-block:
> > > - Remove the vCPU from the per-CPU list
> > >
> > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > ---
> > 
> > Don't see this is needed, can use the existing POSTED_INTR_VECTOR:
> > 
> > If in guest mode, IPI will be handled in VMX non-root by performed
> > PIR->IRR transfer.
> > 
> > If outside guest mode, POSTED_INTR_VECTOR IPI will be handled by host
> > which can wakeup the guest (in case it is halted).
> 
> Please see the following scenario:
> 
> 1. vCPU0 is running on pCPU0
> 2. vCPU0 is halted and vCPU1 is currently running on pCPU0
> 3. An interrupt occurs for vCPU0, if we still use POSTED_INTR_VECTOR
> for vCPU0, the notification event for vCPU0 (the event will go to pCPU0)
> will be consumed by vCPU1 incorrectly. The worst case is that vCPU0
> will never be woken up again since the wakeup event for it is always
> consumed by other vCPUs incorrectly.
> 
> Thanks,
> Feng

Ouch, yes.


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  2015-02-23 22:21   ` Marcelo Tosatti
@ 2015-03-02  9:12     ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2015-03-02  9:12 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Wu, Feng



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Tuesday, February 24, 2015 6:22 AM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is preempted
> 
> On Fri, Dec 12, 2014 at 11:14:57PM +0800, Feng Wu wrote:
> > This patch updates the Posted-Interrupts Descriptor when vCPU
> > is preempted.
> >
> > sched out:
> > - Set 'SN' to suppress furture non-urgent interrupts posted for
> > the vCPU.
> 
> What wakes the vcpu in the case of a non-urgent interrupt, then?

Here we set 'SN' when the vCPU's state transitions from running to
runnable (waiting in the runqueue). After the vCPU is chosen to run
again, 'SN' will be cleared, so there is no need to wake it up explicitly.
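
As a rough user-space illustration of this SN lifecycle, the sketch below
models the 64-bit control word with C11 atomics. The bit positions (ON in
bit 0, SN in bit 1, NV in bits 16-23, NDST in bits 32-63) are assumed from
the pi_desc layout used by the series, the vector number is made up, and
NDST is written x2APIC-style without the xAPIC shift, so treat all of it
as a sketch rather than the patch's code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout of the control word (see the note above). */
#define PI_ON          (UINT64_C(1) << 0)
#define PI_SN          (UINT64_C(1) << 1)
#define PI_NV_SHIFT    16
#define PI_NV_MASK     (UINT64_C(0xff) << PI_NV_SHIFT)
#define PI_NDST_SHIFT  32
#define PI_NDST_MASK   (UINT64_C(0xffffffff) << PI_NDST_SHIFT)

/* Hypothetical vector number, for illustration only. */
#define DEMO_POSTED_INTR_VECTOR 0xf2

struct pi_model {
	_Atomic uint64_t control;
};

/* running -> runnable: set SN and leave every other field untouched. */
static void pi_sched_out(struct pi_model *pi)
{
	uint64_t old = atomic_load(&pi->control);
	uint64_t new;

	do {
		new = old | PI_SN;
		/* On failure 'old' is refreshed and the loop retries. */
	} while (!atomic_compare_exchange_weak(&pi->control, &old, new));
}

/* runnable -> running on 'dest': clear SN, retarget NDST, set NV. */
static void pi_sched_in(struct pi_model *pi, uint32_t dest)
{
	uint64_t old = atomic_load(&pi->control);
	uint64_t new;

	do {
		new = old & ~(PI_SN | PI_NV_MASK | PI_NDST_MASK);
		new |= (uint64_t)DEMO_POSTED_INTR_VECTOR << PI_NV_SHIFT;
		new |= (uint64_t)dest << PI_NDST_SHIFT;
	} while (!atomic_compare_exchange_weak(&pi->control, &old, new));
}
```

The compare-and-exchange loop matters because hardware may set ON
concurrently; recomputing 'new' from the refreshed 'old' keeps a posted
interrupt from being lost while SN is toggled.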

> 
> I wonder how is software suppose to configure the urgent/non-urgent
> flag. Can you give examples of (hypothetical) urgent and non-urgent
> interrupts.

Well, the urgent/non-urgent flag is supported in hardware. I think the
original purpose of urgent interrupts is real-time usage: when
such an urgent interrupt happens, we can change the behavior of the
scheduler and make the related vCPU run immediately. However, from
software's point of view, we didn't find a clear picture of which
interrupts should be urgent and how to configure them, so we don't support
this currently.

> 
> > sched in:
> > - Clear 'SN'
> > - Change NDST if vCPU is scheduled to a different CPU
> > - Set 'NV' to POSTED_INTR_VECTOR
> 
> What about:
> 
> POSTED_INTR_VECTOR interrupt handler:
> - Wakeup vcpu.
- If the vCPU is still running (not preempted), we don't need
to wake it up.
- In the POSTED_INTR_VECTOR interrupt handler, it is a little hard
to get vCPU-related information; even if we can get it, it is not accurate
and may harm performance (it requires a search).

> - Set 'SN' to suppress future interrupts.
We only need to set 'SN' when the vCPU is waiting on the runqueue,
so setting 'SN' in this handler does not seem like a good idea.

> 
> HLT emulation entry:
> - Clear 'SN' to receive VT-d interrupt notification.
> 
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  arch/x86/kvm/vmx.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 44 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index ee3b735..bf2e6cd 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -1916,10 +1916,54 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
> >  		vmx->loaded_vmcs->cpu = cpu;
> >  	}
> > +
> > +	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +		struct pi_desc old, new;
> > +		unsigned int dest;
> > +
> > +		memset(&old, 0, sizeof(old));
> > +		memset(&new, 0, sizeof(new));
> > +
> > +		do {
> > +			old.control = new.control = pi_desc->control;
> > +			if (vcpu->cpu != cpu) {
> > +				dest = cpu_physical_id(cpu);
> > +
> > +				if (x2apic_enabled())
> > +					new.ndst = dest;
> > +				else
> > +					new.ndst = (dest << 8) & 0xFF00;
> > +			}
> > +
> > +			pi_clear_sn(&new);
> > +
> > +			/* set 'NV' to 'notification vector' */
> > +			new.nv = POSTED_INTR_VECTOR;
> > +		} while (cmpxchg(&pi_desc->control, old.control,
> > +				new.control) != old.control);
> > +	}
> >  }
> >
> >  static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> >  {
> > +	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +		struct pi_desc old, new;
> > +
> > +		memset(&old, 0, sizeof(old));
> > +		memset(&new, 0, sizeof(new));
> > +
> > +		/* Set SN when the vCPU is preempted */
> > +		if (vcpu->preempted) {
> > +			do {
> > +				old.control = new.control = pi_desc->control;
> > +				pi_set_sn(&new);
> > +			} while (cmpxchg(&pi_desc->control, old.control,
> > +					new.control) != old.control);
> > +		}
> > +	}
> > +
> >  	__vmx_load_host_state(to_vmx(vcpu));
> >  	if (!vmm_exclusive) {
> >  		__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
> > --
> > 1.9.1
> >

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-02-26 23:40   ` Marcelo Tosatti
@ 2015-03-02 13:36     ` Wu, Feng
  2015-03-04 12:06       ` Marcelo Tosatti
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-03-02 13:36 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Wu, Feng



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Friday, February 27, 2015 7:41 AM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> > This patch updates the Posted-Interrupts Descriptor when vCPU
> > is blocked.
> >
> > pre-block:
> > - Add the vCPU to the blocked per-CPU list
> > - Clear 'SN'
> > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> >
> > post-block:
> > - Remove the vCPU from the per-CPU list
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  2 +
> >  arch/x86/kvm/vmx.c              | 96 +++++++++++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/x86.c              | 22 +++++++---
> >  include/linux/kvm_host.h        |  4 ++
> >  virt/kvm/kvm_main.c             |  6 +++
> >  5 files changed, 123 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 13e3e40..32c110a 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -101,6 +101,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
> >
> >  #define ASYNC_PF_PER_VCPU 64
> >
> > +extern void (*wakeup_handler_callback)(void);
> > +
> >  enum kvm_reg {
> >  	VCPU_REGS_RAX = 0,
> >  	VCPU_REGS_RCX = 1,
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index bf2e6cd..a1c83a2 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -832,6 +832,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> >
> > +/*
> > + * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
> > + * can find which vCPU should be waken up.
> > + */
> > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > +
> >  static unsigned long *vmx_io_bitmap_a;
> >  static unsigned long *vmx_io_bitmap_b;
> >  static unsigned long *vmx_msr_bitmap_legacy;
> > @@ -1921,6 +1928,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> >  		struct pi_desc old, new;
> >  		unsigned int dest;
> > +		unsigned long flags;
> >
> >  		memset(&old, 0, sizeof(old));
> >  		memset(&new, 0, sizeof(new));
> > @@ -1942,6 +1950,20 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  			new.nv = POSTED_INTR_VECTOR;
> >  		} while (cmpxchg(&pi_desc->control, old.control,
> >  				new.control) != old.control);
> > +
> > +		/*
> > +		 * Delete the vCPU from the related wakeup queue
> > +		 * if we are resuming from blocked state
> > +		 */
> > +		if (vcpu->blocked) {
> > +			vcpu->blocked = false;
> > +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +				vcpu->wakeup_cpu), flags);
> > +			list_del(&vcpu->blocked_vcpu_list);
> > +			spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +				vcpu->wakeup_cpu), flags);
> > +			vcpu->wakeup_cpu = -1;
> > +		}
> >  	}
> >  }
> >
> > @@ -1950,6 +1972,9 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> >  	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> >  		struct pi_desc old, new;
> > +		unsigned long flags;
> > +		int cpu;
> > +		struct cpumask cpu_others_mask;
> >
> >  		memset(&old, 0, sizeof(old));
> >  		memset(&new, 0, sizeof(new));
> > @@ -1961,6 +1986,54 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> >  				pi_set_sn(&new);
> >  			} while (cmpxchg(&pi_desc->control, old.control,
> >  					new.control) != old.control);
> > +		} else if (vcpu->blocked) {
> > +			/*
> > +			 * The vcpu is blocked on the wait queue.
> > +			 * Store the blocked vCPU on the list of the
> > +			 * vcpu->wakeup_cpu, which is the destination
> > +			 * of the wake-up notification event.
> > +			 */
> > +			vcpu->wakeup_cpu = vcpu->cpu;
> > +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +					  vcpu->wakeup_cpu), flags);
> > +			list_add_tail(&vcpu->blocked_vcpu_list,
> > +				      &per_cpu(blocked_vcpu_on_cpu,
> > +				      vcpu->wakeup_cpu));
> > +			spin_unlock_irqrestore(
> > +					&per_cpu(blocked_vcpu_on_cpu_lock,
> > +					vcpu->wakeup_cpu), flags);
> > +
> > +			do {
> > +				old.control = new.control = pi_desc->control;
> > +
> > +				/*
> > +				 * We should not block the vCPU if
> > +				 * an interrupt is posted for it.
> > +				 */
> > +				if (pi_test_on(pi_desc) == 1) {
> > +					/*
> > +					 * We need schedule the wakeup worker
> > +					 * on a different cpu other than
> > +					 * vcpu->cpu, because in some case,
> > +					 * schedule_work() will call
> > +					 * try_to_wake_up() which needs acquire
> > +					 * the rq lock. This can cause deadlock.
> > +					 */
> > +					cpumask_copy(&cpu_others_mask,
> > +						     cpu_online_mask);
> > +					cpu_clear(vcpu->cpu, cpu_others_mask);
> > +					cpu = any_online_cpu(cpu_others_mask);
> > +
> > +					schedule_work_on(cpu,
> > +							 &vcpu->wakeup_worker);
> > +				}
> > +
> > +				pi_clear_sn(&new);
> > +
> > +				/* set 'NV' to 'wakeup vector' */
> > +				new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > +			} while (cmpxchg(&pi_desc->control, old.control,
> > +				new.control) != old.control);
> >  		}
> 
> This can be done exclusively on HLT emulation, correct? (that is, on
> entry to HLT and exit from HLT).

Do you mean the following?
In kvm_emulate_halt(), we do:
1. Add the vCPU to the blocking list
2. Clear 'SN'
3. Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

In __vcpu_run(), after kvm_vcpu_block(), we remove the vCPU from the
blocking list.
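
The sequence asked about above could be sketched in user-space C roughly
as follows. This is a hedged model, not the eventual implementation: the
demo_* names, the vector numbers, the single bool standing in for the
per-CPU list and the assumed bit positions of the control word are all
illustrative. The check of ON after switching 'NV' follows the rule in the
patch comment that a vCPU should not block if an interrupt is already
posted for it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed control-word bits: ON in bit 0, SN in bit 1, NV in 16-23. */
#define PI_ON        (UINT64_C(1) << 0)
#define PI_SN        (UINT64_C(1) << 1)
#define PI_NV_SHIFT  16
#define PI_NV_MASK   (UINT64_C(0xff) << PI_NV_SHIFT)

/* Hypothetical vector numbers, for illustration only. */
#define DEMO_POSTED_INTR_VECTOR 0xf2
#define DEMO_WAKEUP_VECTOR      0xf1

struct demo_vcpu {
	_Atomic uint64_t control;
	bool on_blocked_list; /* stands in for the per-CPU list entry */
};

/* HLT exit: restore the normal vector, then leave the blocking list. */
static void demo_post_block(struct demo_vcpu *v)
{
	uint64_t old = atomic_load(&v->control);
	uint64_t new;

	do {
		new = old & ~PI_NV_MASK;
		new |= (uint64_t)DEMO_POSTED_INTR_VECTOR << PI_NV_SHIFT;
	} while (!atomic_compare_exchange_weak(&v->control, &old, new));

	v->on_blocked_list = false;
}

/* HLT entry: returns false if the vCPU must not block after all. */
static bool demo_pre_block(struct demo_vcpu *v)
{
	uint64_t old, new;

	v->on_blocked_list = true;                  /* step 1 */

	old = atomic_load(&v->control);
	do {
		new = old & ~(PI_SN | PI_NV_MASK);  /* step 2 */
		new |= (uint64_t)DEMO_WAKEUP_VECTOR << PI_NV_SHIFT; /* step 3 */
	} while (!atomic_compare_exchange_weak(&v->control, &old, new));

	/* 'old' now holds the value that was replaced. */
	if (old & PI_ON) {
		/* An interrupt was already posted: undo and keep running. */
		demo_post_block(v);
		return false;
	}
	return true;
}
```

Adding the vCPU to the list before the vector is switched means a wakeup
notification that arrives right after the switch already finds the vCPU
on the list.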

> 
> If the vcpu is scheduled out for any other reason (transition to
> userspace or transition to other thread), it will eventually resume
> execution. And in that case, continuation of execution does not depend
> on the event (VT-d interrupt) notification.

Yes, I think this is true for my current implementation, right?

> 
> There is a race window with the code above, I believe.

I reviewed the above code carefully, back and forth. It would
be highly appreciated if you could point out the race window!

> 
> >  	}
> >
> > @@ -2842,6 +2915,8 @@ static int hardware_enable(void)
> >  		return -EBUSY;
> >
> >  	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> > +	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> > +	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> >
> >  	/*
> >  	 * Now we can enable the vmclear operation in kdump
> > @@ -9315,6 +9390,25 @@ static struct kvm_x86_ops vmx_x86_ops = {
> >  	.pi_set_sn = vmx_pi_set_sn,
> >  };
> >
> > +/*
> > + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> > + */
> > +void wakeup_handler(void)
> > +{
> > +	struct kvm_vcpu *vcpu;
> > +	int cpu = smp_processor_id();
> > +
> > +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > +			blocked_vcpu_list) {
> > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > +
> > +		if (pi_test_on(pi_desc) == 1)
> > +			kvm_vcpu_kick(vcpu);
> > +	}
> > +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > +}
> 
> Looping through all blocked vcpus does not scale:
> Can you allocate more vectors and then multiplex those
> vectors amongst the HLT'ed vcpus?

I am a little confused about this. Can you elaborate a bit more?
Thanks a lot!

> 
> It seems there is a bunch free:
> 
> commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
> Author: Alex Shi <alex.shi@intel.com>
> Date:   Thu Jun 28 09:02:23 2012 +0800
> 
>     x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
> 
> Can you add only vcpus which have posted IRTEs that point to this pCPU
> to the HLT'ed vcpu lists? (so for example, vcpus without assigned
> devices are not part of the list).

Is it easy to find whether a vCPU (or the associated domain) has assigned devices?
If so, we can add only those vCPUs with assigned devices.

> 
> > +
> >  static int __init vmx_init(void)
> >  {
> >  	int r, i, msr;
> > @@ -9429,6 +9523,8 @@ static int __init vmx_init(void)
> >
> >  	update_ple_window_actual_max();
> >
> > +	wakeup_handler_callback = wakeup_handler;
> > +
> >  	return 0;
> >
> >  out7:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 0033df3..1551a46 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -6152,6 +6152,21 @@ static int vcpu_enter_guest(struct kvm_vcpu
> *vcpu)
> >  			kvm_vcpu_reload_apic_access_page(vcpu);
> >  	}
> >
> > +	/*
> > +	 * Since posted-interrupts can be set by VT-d HW now, in this
> > +	 * case, KVM_REQ_EVENT is not set. We move the following
> > +	 * operations out of the if statement.
> > +	 */
> > +	if (kvm_lapic_enabled(vcpu)) {
> > +		/*
> > +		 * Update architecture specific hints for APIC
> > +		 * virtual interrupt delivery.
> > +		 */
> > +		if (kvm_x86_ops->hwapic_irr_update)
> > +			kvm_x86_ops->hwapic_irr_update(vcpu,
> > +				kvm_lapic_find_highest_irr(vcpu));
> > +	}
> > +
> 
> This is a hot fast path. You can set KVM_REQ_EVENT from wakeup_handler.

I am afraid setting KVM_REQ_EVENT from wakeup_handler doesn't help much.
If the vCPU is running in root mode and the VT-d hardware issues a notification
event, it is the POSTED_INTR_VECTOR interrupt handler that will be called,
rather than wakeup_handler.

Again, the POSTED_INTR_VECTOR interrupt handler may be called very frequently,
and it is hard to get vCPU-related information in it. Even if we could get it,
it would not be accurate, and the lookup (a search would be needed) may harm
performance.

> 
> >  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >  		kvm_apic_accept_events(vcpu);
> >  		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> > @@ -6168,13 +6183,6 @@ static int vcpu_enter_guest(struct kvm_vcpu
> *vcpu)
> >  			kvm_x86_ops->enable_irq_window(vcpu);
> >
> >  		if (kvm_lapic_enabled(vcpu)) {
> > -			/*
> > -			 * Update architecture specific hints for APIC
> > -			 * virtual interrupt delivery.
> > -			 */
> > -			if (kvm_x86_ops->hwapic_irr_update)
> > -				kvm_x86_ops->hwapic_irr_update(vcpu,
> > -					kvm_lapic_find_highest_irr(vcpu));
> >  			update_cr8_intercept(vcpu);
> >  			kvm_lapic_sync_to_vapic(vcpu);
> >  		}
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 3d7242c..d981d16 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -239,6 +239,9 @@ struct kvm_vcpu {
> >  	unsigned long requests;
> >  	unsigned long guest_debug;
> >
> > +	int wakeup_cpu;
> > +	struct list_head blocked_vcpu_list;
> > +
> >  	struct mutex mutex;
> >  	struct kvm_run *run;
> >
> > @@ -282,6 +285,7 @@ struct kvm_vcpu {
> >  	} spin_loop;
> >  #endif
> >  	bool preempted;
> > +	bool blocked;
> >  	struct kvm_vcpu_arch arch;
> >  };
> 
> Please remove blocked and wakeup_cpu, they should not be necessary.

Why do you think wakeup_cpu is not needed? When a vCPU is blocked,
wakeup_cpu records the CPU it was blocked on. After the vCPU is woken
up it can run on a different CPU, so we need wakeup_cpu to find the
list the vCPU was queued on.
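
To illustrate the point, here is a minimal user-space sketch (not the
patch code). It assumes a singly linked list per pCPU and omits the
per-CPU spinlock; the names vcpu_model, vcpu_block and vcpu_unblock are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

struct vcpu_model {
	int cpu;                  /* pCPU the vCPU currently runs on */
	int wakeup_cpu;           /* pCPU whose list holds this vCPU, or -1 */
	struct vcpu_model *next;
};

/* One list of blocked vCPUs per pCPU (locking omitted in this model). */
static struct vcpu_model *blocked[NR_CPUS];

/* Queue the vCPU on the list of the pCPU it blocks on and remember which. */
static void vcpu_block(struct vcpu_model *v)
{
	assert(v->cpu >= 0 && v->cpu < NR_CPUS);
	v->wakeup_cpu = v->cpu;
	v->next = blocked[v->wakeup_cpu];
	blocked[v->wakeup_cpu] = v;
}

/* Remove the vCPU from the list it was queued on, wherever it runs now. */
static void vcpu_unblock(struct vcpu_model *v)
{
	struct vcpu_model **pp;

	assert(v->wakeup_cpu >= 0 && v->wakeup_cpu < NR_CPUS);
	for (pp = &blocked[v->wakeup_cpu]; *pp; pp = &(*pp)->next) {
		if (*pp == v) {
			*pp = v->next;
			break;
		}
	}
	v->next = NULL;
	v->wakeup_cpu = -1;
}
```

If the vCPU blocks on CPU 1 and later resumes on CPU 3, v->cpu no longer
identifies the list it sits on; only wakeup_cpu does.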

Thanks,
Feng

> 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index ba53fd6..6deb994 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -233,6 +233,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm
> *kvm, unsigned id)
> >
> >  	INIT_WORK(&vcpu->wakeup_worker, wakeup_thread);
> >
> > +	vcpu->wakeup_cpu = -1;
> > +	INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
> > +
> >  	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> >  	if (!page) {
> >  		r = -ENOMEM;
> > @@ -243,6 +246,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm
> *kvm, unsigned id)
> >  	kvm_vcpu_set_in_spin_loop(vcpu, false);
> >  	kvm_vcpu_set_dy_eligible(vcpu, false);
> >  	vcpu->preempted = false;
> > +	vcpu->blocked = false;
> >
> >  	r = kvm_arch_vcpu_init(vcpu);
> >  	if (r < 0)
> > @@ -1752,6 +1756,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >  	DEFINE_WAIT(wait);
> >
> >  	for (;;) {
> > +		vcpu->blocked = true;
> >  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> >
> >  		if (kvm_arch_vcpu_runnable(vcpu)) {
> > @@ -1767,6 +1772,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >  	}
> >
> >  	finish_wait(&vcpu->wq, &wait);
> > +	vcpu->blocked = false;
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_vcpu_block);
> >
> > --
> > 1.9.1
> >


* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-02 13:36     ` Wu, Feng
@ 2015-03-04 12:06       ` Marcelo Tosatti
  2015-03-06  6:51         ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Marcelo Tosatti @ 2015-03-04 12:06 UTC (permalink / raw)
  To: Wu, Feng
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

On Mon, Mar 02, 2015 at 01:36:51PM +0000, Wu, Feng wrote:
> 
> 
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > Sent: Friday, February 27, 2015 7:41 AM
> > To: Wu, Feng
> > Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> > is blocked
> > 
> > On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> > > This patch updates the Posted-Interrupts Descriptor when vCPU
> > > is blocked.
> > >
> > > pre-block:
> > > - Add the vCPU to the blocked per-CPU list
> > > - Clear 'SN'
> > > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> > >
> > > post-block:
> > > - Remove the vCPU from the per-CPU list
> > >
> > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > ---
> > >  arch/x86/include/asm/kvm_host.h |  2 +
> > >  arch/x86/kvm/vmx.c              | 96
> > +++++++++++++++++++++++++++++++++++++++++
> > >  arch/x86/kvm/x86.c              | 22 +++++++---
> > >  include/linux/kvm_host.h        |  4 ++
> > >  virt/kvm/kvm_main.c             |  6 +++
> > >  5 files changed, 123 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h
> > > index 13e3e40..32c110a 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -101,6 +101,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t
> > base_gfn, int level)
> > >
> > >  #define ASYNC_PF_PER_VCPU 64
> > >
> > > +extern void (*wakeup_handler_callback)(void);
> > > +
> > >  enum kvm_reg {
> > >  	VCPU_REGS_RAX = 0,
> > >  	VCPU_REGS_RCX = 1,
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index bf2e6cd..a1c83a2 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -832,6 +832,13 @@ static DEFINE_PER_CPU(struct vmcs *,
> > current_vmcs);
> > >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> > >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> > >
> > > +/*
> > > + * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
> > > + * can find which vCPU should be waken up.
> > > + */
> > > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > > +
> > >  static unsigned long *vmx_io_bitmap_a;
> > >  static unsigned long *vmx_io_bitmap_b;
> > >  static unsigned long *vmx_msr_bitmap_legacy;
> > > @@ -1921,6 +1928,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu,
> > int cpu)
> > >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > >  		struct pi_desc old, new;
> > >  		unsigned int dest;
> > > +		unsigned long flags;
> > >
> > >  		memset(&old, 0, sizeof(old));
> > >  		memset(&new, 0, sizeof(new));
> > > @@ -1942,6 +1950,20 @@ static void vmx_vcpu_load(struct kvm_vcpu
> > *vcpu, int cpu)
> > >  			new.nv = POSTED_INTR_VECTOR;
> > >  		} while (cmpxchg(&pi_desc->control, old.control,
> > >  				new.control) != old.control);
> > > +
> > > +		/*
> > > +		 * Delete the vCPU from the related wakeup queue
> > > +		 * if we are resuming from blocked state
> > > +		 */
> > > +		if (vcpu->blocked) {
> > > +			vcpu->blocked = false;
> > > +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > +				vcpu->wakeup_cpu), flags);
> > > +			list_del(&vcpu->blocked_vcpu_list);
> > > +			spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > +				vcpu->wakeup_cpu), flags);
> > > +			vcpu->wakeup_cpu = -1;
> > > +		}
> > >  	}
> > >  }
> > >
> > > @@ -1950,6 +1972,9 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> > >  	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> > >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > >  		struct pi_desc old, new;
> > > +		unsigned long flags;
> > > +		int cpu;
> > > +		struct cpumask cpu_others_mask;
> > >
> > >  		memset(&old, 0, sizeof(old));
> > >  		memset(&new, 0, sizeof(new));
> > > @@ -1961,6 +1986,54 @@ static void vmx_vcpu_put(struct kvm_vcpu
> > *vcpu)
> > >  				pi_set_sn(&new);
> > >  			} while (cmpxchg(&pi_desc->control, old.control,
> > >  					new.control) != old.control);
> > > +		} else if (vcpu->blocked) {
> > > +			/*
> > > +			 * The vcpu is blocked on the wait queue.
> > > +			 * Store the blocked vCPU on the list of the
> > > +			 * vcpu->wakeup_cpu, which is the destination
> > > +			 * of the wake-up notification event.
> > > +			 */
> > > +			vcpu->wakeup_cpu = vcpu->cpu;
> > > +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > +					  vcpu->wakeup_cpu), flags);
> > > +			list_add_tail(&vcpu->blocked_vcpu_list,
> > > +				      &per_cpu(blocked_vcpu_on_cpu,
> > > +				      vcpu->wakeup_cpu));
> > > +			spin_unlock_irqrestore(
> > > +					&per_cpu(blocked_vcpu_on_cpu_lock,
> > > +					vcpu->wakeup_cpu), flags);
> > > +
> > > +			do {
> > > +				old.control = new.control = pi_desc->control;
> > > +
> > > +				/*
> > > +				 * We should not block the vCPU if
> > > +				 * an interrupt is posted for it.
> > > +				 */
> > > +				if (pi_test_on(pi_desc) == 1) {
> > > +					/*
> > > +					 * We need schedule the wakeup worker
> > > +					 * on a different cpu other than
> > > +					 * vcpu->cpu, because in some case,
> > > +					 * schedule_work() will call
> > > +					 * try_to_wake_up() which needs acquire
> > > +					 * the rq lock. This can cause deadlock.
> > > +					 */
> > > +					cpumask_copy(&cpu_others_mask,
> > > +						     cpu_online_mask);
> > > +					cpu_clear(vcpu->cpu, cpu_others_mask);
> > > +					cpu = any_online_cpu(cpu_others_mask);
> > > +
> > > +					schedule_work_on(cpu,
> > > +							 &vcpu->wakeup_worker);
> > > +				}
> > > +
> > > +				pi_clear_sn(&new);
> > > +
> > > +				/* set 'NV' to 'wakeup vector' */
> > > +				new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > > +			} while (cmpxchg(&pi_desc->control, old.control,
> > > +				new.control) != old.control);
> > >  		}
> > 
> > This can be done exclusively on HLT emulation, correct? (that is, on
> > entry to HLT and exit from HLT).
> 
> Do you mean the following?
> In kvm_emulate_halt(), we do:
> 1. Add vCPU in the blocking list
> 2. Clear 'SN'
> 3. set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> 
> In __vcpu_run(), after kvm_vcpu_block(), we remove the vCPU from the
> blocking list.

Yes (please check that it's OK to do this...).

> > If the vcpu is scheduled out for any other reason (transition to
> > userspace or transition to other thread), it will eventually resume
> > execution. And in that case, continuation of execution does not depend
> > on the event (VT-d interrupt) notification.
> 
> Yes, I think this is true for my current implementation, right?
> 
> > 
> > There is a race window with the code above, I believe.
> 
> I did careful code review back and forth for the above code, It will
> be highly appreciated if you can point out the race window!

So the remapping HW sees either POSTED_INTR_VECTOR or 
POSTED_INTR_WAKEUP_VECTOR.

You should:

1. Set POSTED_INTR_WAKEUP_VECTOR.
2. Check for PIR / ON bit, which might have been set by
POSTED_INTR_VECTOR notification.
3. emulate HLT.
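
A rough user-space model of that ordering, using C11 atomics rather
than the kernel's cmpxchg(). The control-word layout and vector values
below are simplified assumptions for illustration, and pi_model, pi_nv
and pi_pre_block are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed, simplified layout of the descriptor control word. */
#define PI_ON        (1u << 0)            /* outstanding notification */
#define PI_SN        (1u << 1)            /* suppress notification */
#define PI_NV_SHIFT  16
#define PI_NV_MASK   (0xffu << PI_NV_SHIFT)

#define VEC_POSTED   0xf2u  /* stands in for POSTED_INTR_VECTOR */
#define VEC_WAKEUP   0xf1u  /* stands in for POSTED_INTR_WAKEUP_VECTOR */

struct pi_model {
	_Atomic uint32_t control;
};

static uint32_t pi_nv(uint32_t control)
{
	return (control & PI_NV_MASK) >> PI_NV_SHIFT;
}

/*
 * Step 1: atomically switch NV to the wakeup vector (and clear SN).
 * Step 2: check ON, which an earlier VEC_POSTED notification may have set.
 * Returns true if the caller may go on to emulate HLT (step 3).
 */
static bool pi_pre_block(struct pi_model *pi)
{
	uint32_t old, new;

	assert(pi != NULL);
	old = atomic_load(&pi->control);
	do {
		new = old & ~(PI_SN | PI_NV_MASK);
		new |= VEC_WAKEUP << PI_NV_SHIFT;
	} while (!atomic_compare_exchange_weak(&pi->control, &old, new));

	/* From here on, new notifications arrive on VEC_WAKEUP. */
	return !(atomic_load(&pi->control) & PI_ON);
}
```

Because NV is switched before ON is tested, a notification either shows
up as ON in step 2 or is delivered on the wakeup vector afterwards.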

> > >  	} 
> > >
> > > @@ -2842,6 +2915,8 @@ static int hardware_enable(void)
> > >  		return -EBUSY;
> > >
> > >  	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> > > +	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> > > +	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > >
> > >  	/*
> > >  	 * Now we can enable the vmclear operation in kdump
> > > @@ -9315,6 +9390,25 @@ static struct kvm_x86_ops vmx_x86_ops = {
> > >  	.pi_set_sn = vmx_pi_set_sn,
> > >  };
> > >
> > > +/*
> > > + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> > > + */
> > > +void wakeup_handler(void)
> > > +{
> > > +	struct kvm_vcpu *vcpu;
> > > +	int cpu = smp_processor_id();
> > > +
> > > +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > > +			blocked_vcpu_list) {
> > > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > +
> > > +		if (pi_test_on(pi_desc) == 1)
> > > +			kvm_vcpu_kick(vcpu);
> > > +	}
> > > +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > +}
> > 
> > Looping through all blocked vcpus does not scale:
> > Can you allocate more vectors and then multiplex those
> > vectors amongst the HLT'ed vcpus?
> 
> I am a little confused about this, can you elaborate it a bit more?
> Thanks a lot!

Picture the following overcommitment scenario:

* A high vCPU-to-pCPU ratio, say 128:1 (exaggerated to demonstrate the
issue).
* Every VT-d interrupt then has to scan 128 entries in the list.

Moreover, the test:

		if (pi_test_on(pi_desc) == 1)
			kvm_vcpu_kick(vcpu);

can trigger for vCPUs that are being woken up not by VT-d interrupts
but by other interrupts.

You can allocate, say 16 vectors on the pCPU for VT-d interrupts:

POSTED_INTERRUPT_WAKEUP_VECTOR_1, POSTED_INTERRUPT_WAKEUP_VECTOR_2,
...
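
A minimal user-space sketch of that multiplexing, assuming vCPUs are
spread over the vectors by id modulo the number of vectors. That policy
and the names wakeup_slot, block_vcpu and wakeup_handler_slot are
hypothetical, and locking is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_WAKEUP_VECTORS 16   /* the "say 16" from above */
#define MAX_VCPUS         128  /* the exaggerated 128:1 case */

struct vcpu_model {
	int id;
	bool on;                  /* stands in for the descriptor's ON bit */
	bool kicked;              /* stands in for kvm_vcpu_kick() */
	struct vcpu_model *next;
};

/* One list of blocked vCPUs per wakeup vector on a given pCPU. */
static struct vcpu_model *blocked[NR_WAKEUP_VECTORS];

/* Assumed policy: pick the vector from the vCPU id. */
static unsigned int wakeup_slot(const struct vcpu_model *v)
{
	return (unsigned int)v->id % NR_WAKEUP_VECTORS;
}

static void block_vcpu(struct vcpu_model *v)
{
	unsigned int slot = wakeup_slot(v);

	v->next = blocked[slot];
	blocked[slot] = v;
}

/* Handler for wakeup vector number 'slot'; returns entries visited. */
static size_t wakeup_handler_slot(unsigned int slot)
{
	size_t visited = 0;

	assert(slot < NR_WAKEUP_VECTORS);
	for (struct vcpu_model *v = blocked[slot]; v; v = v->next) {
		visited++;
		if (v->on)
			v->kicked = true;
	}
	return visited;
}
```

With 128 blocked vCPUs and 16 vectors, each handler invocation in this
model walks 8 entries instead of 128.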

> > It seems there is a bunch free:
> > 
> > commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
> > Author: Alex Shi <alex.shi@intel.com>
> > Date:   Thu Jun 28 09:02:23 2012 +0800
> > 
> >     x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
> > 
> > Can you add only vcpus which have posted IRTEs that point to this pCPU
> > to the HLT'ed vcpu lists? (so for example, vcpus without assigned
> > devices are not part of the list).
> 
> Is it easy to find whether a vCPU (or the associated domain) has assigned devices?
> If so, we can only add those vCPUs with assigned devices.

When configuring IRTE, at kvm_arch_vfio_update_pi_irte?

> > > +
> > >  static int __init vmx_init(void)
> > >  {
> > >  	int r, i, msr;
> > > @@ -9429,6 +9523,8 @@ static int __init vmx_init(void)
> > >
> > >  	update_ple_window_actual_max();
> > >
> > > +	wakeup_handler_callback = wakeup_handler;
> > > +
> > >  	return 0;
> > >
> > >  out7:
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 0033df3..1551a46 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -6152,6 +6152,21 @@ static int vcpu_enter_guest(struct kvm_vcpu
> > *vcpu)
> > >  			kvm_vcpu_reload_apic_access_page(vcpu);
> > >  	}
> > >
> > > +	/*
> > > +	 * Since posted-interrupts can be set by VT-d HW now, in this
> > > +	 * case, KVM_REQ_EVENT is not set. We move the following
> > > +	 * operations out of the if statement.
> > > +	 */
> > > +	if (kvm_lapic_enabled(vcpu)) {
> > > +		/*
> > > +		 * Update architecture specific hints for APIC
> > > +		 * virtual interrupt delivery.
> > > +		 */
> > > +		if (kvm_x86_ops->hwapic_irr_update)
> > > +			kvm_x86_ops->hwapic_irr_update(vcpu,
> > > +				kvm_lapic_find_highest_irr(vcpu));
> > > +	}
> > > +
> > 
> > This is a hot fast path. You can set KVM_REQ_EVENT from wakeup_handler.
> 
> I am afraid Setting KVM_REQ_EVENT from wakeup_handler doesn't help much,
> if vCPU is running in ROOT mode, and VT-d hardware issues an notification event,
> POSTED_INTR_VECTOR interrupt handler will be called.

If vCPU is in root mode, remapping HW will find IRTE configured with
vector == POSTED_INTR_WAKEUP_VECTOR, use that vector, which will
VM-exit, and execute the interrupt handler wakeup_handler. Right?

The point of this comment is that you can keep the 

"if (kvm_x86_ops->hwapic_irr_update)
	kvm_x86_ops->hwapic_irr_update(vcpu,
			kvm_lapic_find_highest_irr(vcpu));
"

code inside the KVM_REQ_EVENT handling section of vcpu_run, as long as
wakeup_handler sets KVM_REQ_EVENT.
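
As a sketch of the idea (a user-space model with C11 atomics, not KVM
code; REQ_EVENT, wakeup_make_request, check_request and enter_guest are
hypothetical stand-ins):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_EVENT (1ul << 0)  /* stands in for KVM_REQ_EVENT */

struct vcpu_model {
	atomic_ulong requests;
	int irr_updates;          /* counts the expensive IRR updates */
};

/* Called from the wakeup interrupt handler: flag pending event work. */
static void wakeup_make_request(struct vcpu_model *v)
{
	atomic_fetch_or(&v->requests, REQ_EVENT);
}

/* Test and clear a request bit, in the spirit of kvm_check_request(). */
static bool check_request(struct vcpu_model *v, unsigned long req)
{
	assert(req != 0);
	return atomic_fetch_and(&v->requests, ~req) & req;
}

/* Guest entry: the IRR update stays inside the request-gated section. */
static void enter_guest(struct vcpu_model *v)
{
	if (check_request(v, REQ_EVENT))
		v->irr_updates++;
}
```

Entries with no pending request skip the update entirely, so the fast
path pays for it only after the handler has flagged the vCPU.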

> Again, POSTED_INTR_VECTOR interrupt handler may be called very frequently,
> it is a little hard to get vCPU related information in it, even if we get, it is not
> accurate and may harm the performance.(need search)
> 
> > 
> > >  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> > >  		kvm_apic_accept_events(vcpu);
> > >  		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> > > @@ -6168,13 +6183,6 @@ static int vcpu_enter_guest(struct kvm_vcpu
> > *vcpu)
> > >  			kvm_x86_ops->enable_irq_window(vcpu);
> > >
> > >  		if (kvm_lapic_enabled(vcpu)) {
> > > -			/*
> > > -			 * Update architecture specific hints for APIC
> > > -			 * virtual interrupt delivery.
> > > -			 */
> > > -			if (kvm_x86_ops->hwapic_irr_update)
> > > -				kvm_x86_ops->hwapic_irr_update(vcpu,
> > > -					kvm_lapic_find_highest_irr(vcpu));
> > >  			update_cr8_intercept(vcpu);
> > >  			kvm_lapic_sync_to_vapic(vcpu);
> > >  		}
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 3d7242c..d981d16 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -239,6 +239,9 @@ struct kvm_vcpu {
> > >  	unsigned long requests;
> > >  	unsigned long guest_debug;
> > >
> > > +	int wakeup_cpu;
> > > +	struct list_head blocked_vcpu_list;
> > > +
> > >  	struct mutex mutex;
> > >  	struct kvm_run *run;
> > >
> > > @@ -282,6 +285,7 @@ struct kvm_vcpu {
> > >  	} spin_loop;
> > >  #endif
> > >  	bool preempted;
> > > +	bool blocked;
> > >  	struct kvm_vcpu_arch arch;
> > >  };
> > 
> > Please remove blocked and wakeup_cpu, they should not be necessary.
> 
> Why do you think wakeup_cpu is not needed, when vCPU is blocked, 
> wakeup_cpu saves the cpu which the vCPU is blocked on, after vCPU
> is woken up, it can run on a different cpu, so we need wakeup_cpu to
> find the right list to wake up the vCPU.

If the vCPU was moved, shouldn't it have updated the IRTE destination
field to the pCPU it moved to?



* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-04 12:06       ` Marcelo Tosatti
@ 2015-03-06  6:51         ` Wu, Feng
  2015-03-12  1:15           ` Marcelo Tosatti
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-03-06  6:51 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Wu, Feng



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Wednesday, March 04, 2015 8:06 PM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> On Mon, Mar 02, 2015 at 01:36:51PM +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > Sent: Friday, February 27, 2015 7:41 AM
> > > To: Wu, Feng
> > > Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com;
> x86@kernel.org;
> > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when
> vCPU
> > > is blocked
> > >
> > > On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> > > > This patch updates the Posted-Interrupts Descriptor when vCPU
> > > > is blocked.
> > > >
> > > > pre-block:
> > > > - Add the vCPU to the blocked per-CPU list
> > > > - Clear 'SN'
> > > > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> > > >
> > > > post-block:
> > > > - Remove the vCPU from the per-CPU list
> > > >
> > > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > > ---
> > > >  arch/x86/include/asm/kvm_host.h |  2 +
> > > >  arch/x86/kvm/vmx.c              | 96
> > > +++++++++++++++++++++++++++++++++++++++++
> > > >  arch/x86/kvm/x86.c              | 22 +++++++---
> > > >  include/linux/kvm_host.h        |  4 ++
> > > >  virt/kvm/kvm_main.c             |  6 +++
> > > >  5 files changed, 123 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/arch/x86/include/asm/kvm_host.h
> > > b/arch/x86/include/asm/kvm_host.h
> > > > index 13e3e40..32c110a 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -101,6 +101,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t
> > > base_gfn, int level)
> > > >
> > > >  #define ASYNC_PF_PER_VCPU 64
> > > >
> > > > +extern void (*wakeup_handler_callback)(void);
> > > > +
> > > >  enum kvm_reg {
> > > >  	VCPU_REGS_RAX = 0,
> > > >  	VCPU_REGS_RCX = 1,
> > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > index bf2e6cd..a1c83a2 100644
> > > > --- a/arch/x86/kvm/vmx.c
> > > > +++ b/arch/x86/kvm/vmx.c
> > > > @@ -832,6 +832,13 @@ static DEFINE_PER_CPU(struct vmcs *,
> > > current_vmcs);
> > > >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> > > >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> > > >
> > > > +/*
> > > > + * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
> > > > + * can find which vCPU should be waken up.
> > > > + */
> > > > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > > > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > > > +
> > > >  static unsigned long *vmx_io_bitmap_a;
> > > >  static unsigned long *vmx_io_bitmap_b;
> > > >  static unsigned long *vmx_msr_bitmap_legacy;
> > > > @@ -1921,6 +1928,7 @@ static void vmx_vcpu_load(struct kvm_vcpu
> *vcpu,
> > > int cpu)
> > > >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > >  		struct pi_desc old, new;
> > > >  		unsigned int dest;
> > > > +		unsigned long flags;
> > > >
> > > >  		memset(&old, 0, sizeof(old));
> > > >  		memset(&new, 0, sizeof(new));
> > > > @@ -1942,6 +1950,20 @@ static void vmx_vcpu_load(struct kvm_vcpu
> > > *vcpu, int cpu)
> > > >  			new.nv = POSTED_INTR_VECTOR;
> > > >  		} while (cmpxchg(&pi_desc->control, old.control,
> > > >  				new.control) != old.control);
> > > > +
> > > > +		/*
> > > > +		 * Delete the vCPU from the related wakeup queue
> > > > +		 * if we are resuming from blocked state
> > > > +		 */
> > > > +		if (vcpu->blocked) {
> > > > +			vcpu->blocked = false;
> > > > +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > +				vcpu->wakeup_cpu), flags);
> > > > +			list_del(&vcpu->blocked_vcpu_list);
> > > > +
> 	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > +				vcpu->wakeup_cpu), flags);
> > > > +			vcpu->wakeup_cpu = -1;
> > > > +		}
> > > >  	}
> > > >  }
> > > >
> > > > @@ -1950,6 +1972,9 @@ static void vmx_vcpu_put(struct kvm_vcpu
> *vcpu)
> > > >  	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> > > >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > >  		struct pi_desc old, new;
> > > > +		unsigned long flags;
> > > > +		int cpu;
> > > > +		struct cpumask cpu_others_mask;
> > > >
> > > >  		memset(&old, 0, sizeof(old));
> > > >  		memset(&new, 0, sizeof(new));
> > > > @@ -1961,6 +1986,54 @@ static void vmx_vcpu_put(struct kvm_vcpu
> > > *vcpu)
> > > >  				pi_set_sn(&new);
> > > >  			} while (cmpxchg(&pi_desc->control, old.control,
> > > >  					new.control) != old.control);
> > > > +		} else if (vcpu->blocked) {
> > > > +			/*
> > > > +			 * The vcpu is blocked on the wait queue.
> > > > +			 * Store the blocked vCPU on the list of the
> > > > +			 * vcpu->wakeup_cpu, which is the destination
> > > > +			 * of the wake-up notification event.
> > > > +			 */
> > > > +			vcpu->wakeup_cpu = vcpu->cpu;
> > > > +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > +					  vcpu->wakeup_cpu), flags);
> > > > +			list_add_tail(&vcpu->blocked_vcpu_list,
> > > > +				      &per_cpu(blocked_vcpu_on_cpu,
> > > > +				      vcpu->wakeup_cpu));
> > > > +			spin_unlock_irqrestore(
> > > > +					&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > +					vcpu->wakeup_cpu), flags);
> > > > +
> > > > +			do {
> > > > +				old.control = new.control = pi_desc->control;
> > > > +
> > > > +				/*
> > > > +				 * We should not block the vCPU if
> > > > +				 * an interrupt is posted for it.
> > > > +				 */
> > > > +				if (pi_test_on(pi_desc) == 1) {
> > > > +					/*
> > > > +					 * We need schedule the wakeup worker
> > > > +					 * on a different cpu other than
> > > > +					 * vcpu->cpu, because in some case,
> > > > +					 * schedule_work() will call
> > > > +					 * try_to_wake_up() which needs acquire
> > > > +					 * the rq lock. This can cause deadlock.
> > > > +					 */
> > > > +					cpumask_copy(&cpu_others_mask,
> > > > +						     cpu_online_mask);
> > > > +					cpu_clear(vcpu->cpu, cpu_others_mask);
> > > > +					cpu = any_online_cpu(cpu_others_mask);
> > > > +
> > > > +					schedule_work_on(cpu,
> > > > +							 &vcpu->wakeup_worker);
> > > > +				}
> > > > +
> > > > +				pi_clear_sn(&new);
> > > > +
> > > > +				/* set 'NV' to 'wakeup vector' */
> > > > +				new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > > > +			} while (cmpxchg(&pi_desc->control, old.control,
> > > > +				new.control) != old.control);
> > > >  		}
> > >
> > > This can be done exclusively on HLT emulation, correct? (that is, on
> > > entry to HLT and exit from HLT).
> >
> > Do you mean the following?
> > In kvm_emulate_halt(), we do:
> > 1. Add vCPU in the blocking list
> > 2. Clear 'SN'
> > 3. set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> >
> > In __vcpu_run(), after kvm_vcpu_block(), we remove the vCPU from the
> > blocking list.
> 
> Yes (please check its OK to do this...).

I thought about this for some time, and I feel it may be another way
to implement this. Would you mind sharing why you think this
alternative is better than the current one? Thanks a lot!

> 
> > > If the vcpu is scheduled out for any other reason (transition to
> > > userspace or transition to other thread), it will eventually resume
> > > execution. And in that case, continuation of execution does not depend
> > > on the event (VT-d interrupt) notification.
> >
> > Yes, I think this is true for my current implementation, right?
> >
> > >
> > > There is a race window with the code above, I believe.
> >
> > I did careful code review back and forth for the above code, It will
> > be highly appreciated if you can point out the race window!
> 
> So the remapping HW sees either POSTED_INTR_VECTOR or
> POSTED_INTR_WAKEUP_VECTOR.
> 
> You should:
> 
> 1. Set POSTED_INTR_WAKEUP_VECTOR.
> 2. Check for PIR / ON bit, which might have been set by
> POSTED_INTR_VECTOR notification.
> 3. emulate HLT.

My original idea for pre-block operation is:
1. Add vCPU to the per-cpu blocking list. Here is the code for this in my patch:
+                       vcpu->wakeup_cpu = vcpu->cpu;
+                       spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+                                         vcpu->wakeup_cpu), flags);
+                       list_add_tail(&vcpu->blocked_vcpu_list,
+                                     &per_cpu(blocked_vcpu_on_cpu,
+                                     vcpu->wakeup_cpu));
+                       spin_unlock_irqrestore(
+                                       &per_cpu(blocked_vcpu_on_cpu_lock,
+                                       vcpu->wakeup_cpu), flags);
2. Update Posted-interrupt descriptor, here is the code in my patch:
+                       do {
+                               old.control = new.control = pi_desc->control;
+
+                               /*
+                                * We should not block the vCPU if
+                                * an interrupt is posted for it.
+                                */
+                               if (pi_test_on(pi_desc) == 1) {
+                                       /*
+                                        * We need schedule the wakeup worker
+                                        * on a different cpu other than
+                                        * vcpu->cpu, because in some case,
+                                        * schedule_work() will call
+                                        * try_to_wake_up() which needs acquire
+                                        * the rq lock. This can cause deadlock.
+                                        */
+                                       cpumask_copy(&cpu_others_mask,
+                                                    cpu_online_mask);
+                                       cpu_clear(vcpu->cpu, cpu_others_mask);
+                                       cpu = any_online_cpu(cpu_others_mask);
+
+                                       schedule_work_on(cpu,
+                                                        &vcpu->wakeup_worker);
+                               }
+
+                               WARN((pi_desc->sn == 1),
+                                    "Warning: SN field of posted-interrupts "
+                                    "is set before blocking\n");
+
+                               /* set 'NV' to 'wakeup vector' */
+                               new.nv = POSTED_INTR_WAKEUP_VECTOR;
+                       } while (cmpxchg(&pi_desc->control, old.control,
+                               new.control) != old.control);

If the PIR/ON bit is set by a POSTED_INTR_VECTOR notification during the above operation, we stop
blocking the vCPU, as above. But it seems I missed something in the code that I had in mind from
the beginning: I should add a 'break' at the end of the 'if (pi_test_on(pi_desc) == 1) {}' block,
so that in this case the 'NV' field remains unchanged.
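
Roughly, the loop with the early exit would behave like this user-space
model (C11 atomics instead of cmpxchg(); the bit layout, the vector
value and the name pi_switch_to_wakeup are assumptions for
illustration). Note that the model tests ON in the snapshot it is about
to exchange, so the decision and the update refer to the same value:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed, simplified layout of the descriptor control word. */
#define PI_ON        (1u << 0)
#define PI_NV_SHIFT  16
#define PI_NV_MASK   (0xffu << PI_NV_SHIFT)
#define VEC_WAKEUP   0xf1u  /* stands in for POSTED_INTR_WAKEUP_VECTOR */

/*
 * Returns true if NV was switched to the wakeup vector, false if ON was
 * already set, in which case the descriptor is left untouched.
 */
static bool pi_switch_to_wakeup(_Atomic uint32_t *control)
{
	uint32_t old, new;

	assert(control != NULL);
	old = atomic_load(control);
	do {
		if (old & PI_ON)
			return false;  /* the 'break': do not block, keep NV */
		new = (old & ~PI_NV_MASK) | (VEC_WAKEUP << PI_NV_SHIFT);
	} while (!atomic_compare_exchange_weak(control, &old, new));

	return true;
}
```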

> 
> > > >  	}
> > > >
> > > > @@ -2842,6 +2915,8 @@ static int hardware_enable(void)
> > > >  		return -EBUSY;
> > > >
> > > >  	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> > > > +	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> > > > +	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > >
> > > >  	/*
> > > >  	 * Now we can enable the vmclear operation in kdump
> > > > @@ -9315,6 +9390,25 @@ static struct kvm_x86_ops vmx_x86_ops = {
> > > >  	.pi_set_sn = vmx_pi_set_sn,
> > > >  };
> > > >
> > > > +/*
> > > > + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> > > > + */
> > > > +void wakeup_handler(void)
> > > > +{
> > > > +	struct kvm_vcpu *vcpu;
> > > > +	int cpu = smp_processor_id();
> > > > +
> > > > +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > > > +			blocked_vcpu_list) {
> > > > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > > +
> > > > +		if (pi_test_on(pi_desc) == 1)
> > > > +			kvm_vcpu_kick(vcpu);
> > > > +	}
> > > > +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > +}
> > >
> > > Looping through all blocked vcpus does not scale:
> > > Can you allocate more vectors and then multiplex those
> > > vectors amongst the HLT'ed vcpus?
> >
> > I am a little confused about this, can you elaborate it a bit more?
> > Thanks a lot!
> 
> Picture the following overcommitment scenario:
> 
> * High ratio of vCPUs/pCPUs, in the ratio 128/1 (this is exaggerated
> to demonstrate the issue).
> * Every VT-d interrupt is going to scan 128 entries in the list.
> 
> Moreover, the test:
> 
> 		if (pi_test_on(pi_desc) == 1)
> 			kvm_vcpu_kick(vcpu);
> 
> Can trigger for vCPUs which have not been waken up due
> to VT-d interrupts, but for other interrupts.
> 
> You can allocate, say 16 vectors on the pCPU for VT-d interrupts:
> 
> POSTED_INTERRUPT_WAKEUP_VECTOR_1,
> POSTED_INTERRUPT_WAKEUP_VECTOR_2,
> ...
> 

Global vectors are a limited resource in the system, and this involves
changes to the common x86 interrupt code. I am not sure we can allocate
so many dedicated global vectors for KVM usage.

> > > It seems there is a bunch free:
> > >
> > > commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
> > > Author: Alex Shi <alex.shi@intel.com>
> > > Date:   Thu Jun 28 09:02:23 2012 +0800
> > >
> > >     x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
> > >
> > > Can you add only vcpus which have posted IRTEs that point to this pCPU
> > > to the HLT'ed vcpu lists? (so for example, vcpus without assigned
> > > devices are not part of the list).
> >
> > Is it easy to find whether a vCPU (or the associated domain) has assigned
> > devices?
> > If so, we can only add those vCPUs with assigned devices.
> 
> When configuring IRTE, at kvm_arch_vfio_update_pi_irte?

Yes.

> 
> > > > +
> > > >  static int __init vmx_init(void)
> > > >  {
> > > >  	int r, i, msr;
> > > > @@ -9429,6 +9523,8 @@ static int __init vmx_init(void)
> > > >
> > > >  	update_ple_window_actual_max();
> > > >
> > > > +	wakeup_handler_callback = wakeup_handler;
> > > > +
> > > >  	return 0;
> > > >
> > > >  out7:
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 0033df3..1551a46 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -6152,6 +6152,21 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > > >  			kvm_vcpu_reload_apic_access_page(vcpu);
> > > >  	}
> > > >
> > > > +	/*
> > > > +	 * Since posted-interrupts can be set by VT-d HW now, in this
> > > > +	 * case, KVM_REQ_EVENT is not set. We move the following
> > > > +	 * operations out of the if statement.
> > > > +	 */
> > > > +	if (kvm_lapic_enabled(vcpu)) {
> > > > +		/*
> > > > +		 * Update architecture specific hints for APIC
> > > > +		 * virtual interrupt delivery.
> > > > +		 */
> > > > +		if (kvm_x86_ops->hwapic_irr_update)
> > > > +			kvm_x86_ops->hwapic_irr_update(vcpu,
> > > > +				kvm_lapic_find_highest_irr(vcpu));
> > > > +	}
> > > > +
> > >
> > > This is a hot fast path. You can set KVM_REQ_EVENT from wakeup_handler.
> >
> > I am afraid Setting KVM_REQ_EVENT from wakeup_handler doesn't help much,
> > if vCPU is running in ROOT mode, and VT-d hardware issues an notification event,
> > POSTED_INTR_VECTOR interrupt handler will be called.
> 
> If vCPU is in root mode, remapping HW will find IRTE configured with
> vector == POSTED_INTR_WAKEUP_VECTOR, use that vector, which will
> VM-exit, and execute the interrupt handler wakeup_handler. Right?

There are two cases:
Case 1: the vCPU is blocked, so it is in root mode; this is what you described above.
Case 2: the vCPU is running in root mode, for example handling VM-exits. In this case
the notification vector is 'POSTED_INTR_VECTOR', and if external interrupts
from assigned devices arrive, the handler of 'POSTED_INTR_VECTOR' is
called (it is in fact 'smp_kvm_posted_intr_ipi'). This routine doesn't need to
do any real work, since the pending interrupts in PIR will be synced to vIRR before
VM-entry (this code has been there since CPU-side posted interrupts were enabled
along with APICv). As I said before, it is a little hard to
get vCPU-related information in it; even if we do, it is not accurate and may harm
performance (a search is needed).

So setting KVM_REQ_EVENT only in wakeup_handler cannot cover the notification
event for 'POSTED_INTR_VECTOR'.
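
The two cases, together with the preempted case handled in vmx_vcpu_put, amount to a small mapping from vCPU scheduling state to descriptor settings. A minimal sketch of that mapping follows. The enum, the struct, the function name and the vector values are hypothetical, and it assumes SN is clear whenever the vCPU is loaded.

```c
#include <stdbool.h>

/* Placeholder vector numbers, not taken from the patch. */
#define POSTED_INTR_VECTOR        0xf2
#define POSTED_INTR_WAKEUP_VECTOR 0xf1

enum vcpu_sched_state {
	VCPU_LOADED,	/* running, in guest or in root mode (Case 2) */
	VCPU_PREEMPTED,	/* scheduled out while still runnable */
	VCPU_BLOCKED,	/* halted, waiting for an event (Case 1) */
};

struct pi_settings {
	bool sn;		/* suppress notification */
	unsigned int nv;	/* notification vector */
};

/* Descriptor settings the thread describes for each state. */
static struct pi_settings pi_settings_for(enum vcpu_sched_state state)
{
	struct pi_settings s = { .sn = false, .nv = POSTED_INTR_VECTOR };

	switch (state) {
	case VCPU_LOADED:
		/* Notifications use POSTED_INTR_VECTOR; PIR is synced at entry. */
		break;
	case VCPU_PREEMPTED:
		/* No notification needed; the vCPU will run again anyway. */
		s.sn = true;
		break;
	case VCPU_BLOCKED:
		/* The notification must wake the vCPU up. */
		s.nv = POSTED_INTR_WAKEUP_VECTOR;
		break;
	}
	return s;
}
```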

> 
> The point of this comment is that you can keep the
> 
> "if (kvm_x86_ops->hwapic_irr_update)
> 	kvm_x86_ops->hwapic_irr_update(vcpu,
> 			kvm_lapic_find_highest_irr(vcpu));
> "
> 
> Code inside KVM_REQ_EVENT handling section of vcpu_run, as long as
> wakeup_handler sets KVM_REQ_EVENT.

Please see above.

> 
> > Again, POSTED_INTR_VECTOR interrupt handler may be called very frequently,
> > it is a little hard to get vCPU related information in it, even if we get, it is not
> > accurate and may harm the performance.(need search)
> >
> > >
> > > >  	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> > > >  		kvm_apic_accept_events(vcpu);
> > > >  		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> > > > @@ -6168,13 +6183,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > > >  			kvm_x86_ops->enable_irq_window(vcpu);
> > > >
> > > >  		if (kvm_lapic_enabled(vcpu)) {
> > > > -			/*
> > > > -			 * Update architecture specific hints for APIC
> > > > -			 * virtual interrupt delivery.
> > > > -			 */
> > > > -			if (kvm_x86_ops->hwapic_irr_update)
> > > > -				kvm_x86_ops->hwapic_irr_update(vcpu,
> > > > -					kvm_lapic_find_highest_irr(vcpu));
> > > >  			update_cr8_intercept(vcpu);
> > > >  			kvm_lapic_sync_to_vapic(vcpu);
> > > >  		}
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index 3d7242c..d981d16 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -239,6 +239,9 @@ struct kvm_vcpu {
> > > >  	unsigned long requests;
> > > >  	unsigned long guest_debug;
> > > >
> > > > +	int wakeup_cpu;
> > > > +	struct list_head blocked_vcpu_list;
> > > > +
> > > >  	struct mutex mutex;
> > > >  	struct kvm_run *run;
> > > >
> > > > @@ -282,6 +285,7 @@ struct kvm_vcpu {
> > > >  	} spin_loop;
> > > >  #endif
> > > >  	bool preempted;
> > > > +	bool blocked;
> > > >  	struct kvm_vcpu_arch arch;
> > > >  };
> > >
> > > Please remove blocked and wakeup_cpu, they should not be necessary.
> >
> > Why do you think wakeup_cpu is not needed, when vCPU is blocked,
> > wakeup_cpu saves the cpu which the vCPU is blocked on, after vCPU
> > is woken up, it can run on a different cpu, so we need wakeup_cpu to
> > find the right list to wake up the vCPU.
> 
> If the vCPU was moved it should have updated IRTE destination field
> to the pCPU which it has moved to?

Every time a vCPU is scheduled to a new pCPU, the IRTE destination field
is updated accordingly.

When a vCPU is blocked, we need to find which list it is blocked on in order
to wake it up, and this is what wakeup_cpu is used for.
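
A user-space sketch of why the list owner has to be remembered. The names are hypothetical, and a simple atomic-flag spinlock stands in for the kernel's per-CPU spinlock. The vCPU is queued on the list of the pCPU it blocked on, and removal must take that same list's lock even if the vCPU has since moved to another pCPU.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 8	/* hypothetical number of pCPUs */

struct list_node {
	struct list_node *prev, *next;
};

struct blocked_list {
	atomic_flag lock;	/* stand-in for the per-CPU spinlock */
	struct list_node head;
	size_t len;
};

struct vcpu_blk {
	int cpu;		/* pCPU the vCPU last ran on */
	int wakeup_cpu;		/* pCPU whose list holds this vCPU, or -1 */
	struct list_node node;
};

static struct blocked_list blocked_on_cpu[NR_CPUS_SKETCH];

/* Counterpart of the INIT_LIST_HEAD/spin_lock_init calls in the patch. */
static void blocked_lists_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		struct blocked_list *bl = &blocked_on_cpu[cpu];

		atomic_flag_clear(&bl->lock);
		bl->head.prev = bl->head.next = &bl->head;
		bl->len = 0;
	}
}

static void lock_list(struct blocked_list *bl)
{
	while (atomic_flag_test_and_set_explicit(&bl->lock,
						 memory_order_acquire))
		;	/* spin */
}

static void unlock_list(struct blocked_list *bl)
{
	atomic_flag_clear_explicit(&bl->lock, memory_order_release);
}

/* Queue the vCPU on the list of the pCPU it is blocking on. */
static void vcpu_pre_block(struct vcpu_blk *v)
{
	struct blocked_list *bl;

	v->wakeup_cpu = v->cpu;
	bl = &blocked_on_cpu[v->wakeup_cpu];

	lock_list(bl);
	v->node.prev = bl->head.prev;
	v->node.next = &bl->head;
	bl->head.prev->next = &v->node;
	bl->head.prev = &v->node;
	bl->len++;
	unlock_list(bl);
}

/* Dequeue using wakeup_cpu, not the pCPU the vCPU resumes on. */
static void vcpu_post_block(struct vcpu_blk *v)
{
	struct blocked_list *bl = &blocked_on_cpu[v->wakeup_cpu];

	lock_list(bl);
	v->node.prev->next = v->node.next;
	v->node.next->prev = v->node.prev;
	bl->len--;
	unlock_list(bl);

	v->wakeup_cpu = -1;
}
```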


Thanks,
Feng


* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-06  6:51         ` Wu, Feng
@ 2015-03-12  1:15           ` Marcelo Tosatti
  2015-03-16 11:42             ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Marcelo Tosatti @ 2015-03-12  1:15 UTC (permalink / raw)
  To: Wu, Feng
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

On Fri, Mar 06, 2015 at 06:51:52AM +0000, Wu, Feng wrote:
> 
> 
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > Sent: Wednesday, March 04, 2015 8:06 PM
> > To: Wu, Feng
> > Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> > is blocked
> > 
> > On Mon, Mar 02, 2015 at 01:36:51PM +0000, Wu, Feng wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > > Sent: Friday, February 27, 2015 7:41 AM
> > > > To: Wu, Feng
> > > > Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> > > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> > > > is blocked
> > > >
> > > > On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> > > > > This patch updates the Posted-Interrupts Descriptor when vCPU
> > > > > is blocked.
> > > > >
> > > > > pre-block:
> > > > > - Add the vCPU to the blocked per-CPU list
> > > > > - Clear 'SN'
> > > > > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> > > > >
> > > > > post-block:
> > > > > - Remove the vCPU from the per-CPU list
> > > > >
> > > > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > > > ---
> > > > >  arch/x86/include/asm/kvm_host.h |  2 +
> > > > >  arch/x86/kvm/vmx.c              | 96 +++++++++++++++++++++++++++++++++++++++++
> > > > >  arch/x86/kvm/x86.c              | 22 +++++++---
> > > > >  include/linux/kvm_host.h        |  4 ++
> > > > >  virt/kvm/kvm_main.c             |  6 +++
> > > > >  5 files changed, 123 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > index 13e3e40..32c110a 100644
> > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > @@ -101,6 +101,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
> > > > >
> > > > >  #define ASYNC_PF_PER_VCPU 64
> > > > >
> > > > > +extern void (*wakeup_handler_callback)(void);
> > > > > +
> > > > >  enum kvm_reg {
> > > > >  	VCPU_REGS_RAX = 0,
> > > > >  	VCPU_REGS_RCX = 1,
> > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > index bf2e6cd..a1c83a2 100644
> > > > > --- a/arch/x86/kvm/vmx.c
> > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > @@ -832,6 +832,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> > > > >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> > > > >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> > > > >
> > > > > +/*
> > > > > + * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
> > > > > + * can find which vCPU should be waken up.
> > > > > + */
> > > > > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > > > > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > > > > +
> > > > >  static unsigned long *vmx_io_bitmap_a;
> > > > >  static unsigned long *vmx_io_bitmap_b;
> > > > >  static unsigned long *vmx_msr_bitmap_legacy;
> > > > > @@ -1921,6 +1928,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> > > > >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > > >  		struct pi_desc old, new;
> > > > >  		unsigned int dest;
> > > > > +		unsigned long flags;
> > > > >
> > > > >  		memset(&old, 0, sizeof(old));
> > > > >  		memset(&new, 0, sizeof(new));
> > > > > @@ -1942,6 +1950,20 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> > > > >  			new.nv = POSTED_INTR_VECTOR;
> > > > >  		} while (cmpxchg(&pi_desc->control, old.control,
> > > > >  				new.control) != old.control);
> > > > > +
> > > > > +		/*
> > > > > +		 * Delete the vCPU from the related wakeup queue
> > > > > +		 * if we are resuming from blocked state
> > > > > +		 */
> > > > > +		if (vcpu->blocked) {
> > > > > +			vcpu->blocked = false;
> > > > > +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > +				vcpu->wakeup_cpu), flags);
> > > > > +			list_del(&vcpu->blocked_vcpu_list);
> > > > > +			spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > +				vcpu->wakeup_cpu), flags);
> > > > > +			vcpu->wakeup_cpu = -1;
> > > > > +		}
> > > > >  	}
> > > > >  }
> > > > >
> > > > > @@ -1950,6 +1972,9 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> > > > >  	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> > > > >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > > >  		struct pi_desc old, new;
> > > > > +		unsigned long flags;
> > > > > +		int cpu;
> > > > > +		struct cpumask cpu_others_mask;
> > > > >
> > > > >  		memset(&old, 0, sizeof(old));
> > > > >  		memset(&new, 0, sizeof(new));
> > > > > @@ -1961,6 +1986,54 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> > > > >  				pi_set_sn(&new);
> > > > >  			} while (cmpxchg(&pi_desc->control, old.control,
> > > > >  					new.control) != old.control);
> > > > > +		} else if (vcpu->blocked) {
> > > > > +			/*
> > > > > +			 * The vcpu is blocked on the wait queue.
> > > > > +			 * Store the blocked vCPU on the list of the
> > > > > +			 * vcpu->wakeup_cpu, which is the destination
> > > > > +			 * of the wake-up notification event.
> > > > > +			 */
> > > > > +			vcpu->wakeup_cpu = vcpu->cpu;
> > > > > +			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > +					  vcpu->wakeup_cpu), flags);
> > > > > +			list_add_tail(&vcpu->blocked_vcpu_list,
> > > > > +				      &per_cpu(blocked_vcpu_on_cpu,
> > > > > +				      vcpu->wakeup_cpu));
> > > > > +			spin_unlock_irqrestore(
> > > > > +					&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > +					vcpu->wakeup_cpu), flags);
> > > > > +
> > > > > +			do {
> > > > > +				old.control = new.control = pi_desc->control;
> > > > > +
> > > > > +				/*
> > > > > +				 * We should not block the vCPU if
> > > > > +				 * an interrupt is posted for it.
> > > > > +				 */
> > > > > +				if (pi_test_on(pi_desc) == 1) {
> > > > > +					/*
> > > > > +					 * We need schedule the wakeup worker
> > > > > +					 * on a different cpu other than
> > > > > +					 * vcpu->cpu, because in some case,
> > > > > +					 * schedule_work() will call
> > > > > +					 * try_to_wake_up() which needs acquire
> > > > > +					 * the rq lock. This can cause deadlock.
> > > > > +					 */
> > > > > +					cpumask_copy(&cpu_others_mask,
> > > > > +						     cpu_online_mask);
> > > > > +					cpu_clear(vcpu->cpu, cpu_others_mask);
> > > > > +					cpu = any_online_cpu(cpu_others_mask);
> > > > > +
> > > > > +					schedule_work_on(cpu,
> > > > > +							 &vcpu->wakeup_worker);
> > > > > +				}
> > > > > +
> > > > > +				pi_clear_sn(&new);
> > > > > +
> > > > > +				/* set 'NV' to 'wakeup vector' */
> > > > > +				new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > > > > +			} while (cmpxchg(&pi_desc->control, old.control,
> > > > > +				new.control) != old.control);
> > > > >  		}
> > > >
> > > > This can be done exclusively on HLT emulation, correct? (that is, on
> > > > entry to HLT and exit from HLT).
> > >
> > > Do you mean the following?
> > > In kvm_emulate_halt(), we do:
> > > 1. Add vCPU in the blocking list
> > > 2. Clear 'SN'
> > > 3. set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> > >
> > > In __vcpu_run(), after kvm_vcpu_block(), we remove the vCPU from the
> > > blocking list.
> > 
> > Yes (please check its OK to do this...).
> 
> I think about this for some time, and I feel this may be another solution
> to implement it. Do you mind sharing your ideas about why do you think
> this alternative is better than the current one? Thanks a lot!

Two reasons:

1) It does not add overhead to vcpu_puts that are not due to
HLT. Doing so also removes the "vcpu->blocked" variable (it's implicit in the
code anyway).
2) Easier to spot races.

Do you have any reason why having the code at vcpu_put/vcpu_load is
better than the proposal to have the code at kvm_vcpu_block?

> > > > If the vcpu is scheduled out for any other reason (transition to
> > > > userspace or transition to other thread), it will eventually resume
> > > > execution. And in that case, continuation of execution does not depend
> > > > on the event (VT-d interrupt) notification.
> > >
> > > Yes, I think this is true for my current implementation, right?
> > >
> > > >
> > > > There is a race window with the code above, I believe.
> > >
> > > I did careful code review back and forth for the above code, It will
> > > be highly appreciated if you can point out the race window!
> > 
> > So the remapping HW sees either POSTED_INTR_VECTOR or
> > POSTED_INTR_WAKEUP_VECTOR.
> > 
> > You should:
> > 
> > 1. Set POSTED_INTR_WAKEUP_VECTOR.
> > 2. Check for PIR / ON bit, which might have been set by
> > POSTED_INTR_VECTOR notification.
> > 3. emulate HLT.
> 
> My original idea for pre-block operation is:
> 1. Add vCPU to the per-cpu blocking list. Here is the code for this in my patch:
> +                       vcpu->wakeup_cpu = vcpu->cpu;
> +                       spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> +                                         vcpu->wakeup_cpu), flags);
> +                       list_add_tail(&vcpu->blocked_vcpu_list,
> +                                     &per_cpu(blocked_vcpu_on_cpu,
> +                                     vcpu->wakeup_cpu));
> +                       spin_unlock_irqrestore(
> +                                       &per_cpu(blocked_vcpu_on_cpu_lock,
> +                                       vcpu->wakeup_cpu), flags);
> 2. Update Posted-interrupt descriptor, here is the code in my patch:
> +                       do {
> +                               old.control = new.control = pi_desc->control;
> +
> +                               /*
> +                                * We should not block the vCPU if
> +                                * an interrupt is posted for it.
> +                                */
> +                               if (pi_test_on(pi_desc) == 1) {
> +                                       /*
> +                                        * We need schedule the wakeup worker
> +                                        * on a different cpu other than
> +                                        * vcpu->cpu, because in some case,
> +                                        * schedule_work() will call
> +                                        * try_to_wake_up() which needs acquire
> +                                        * the rq lock. This can cause deadlock.
> +                                        */
> +                                       cpumask_copy(&cpu_others_mask,
> +                                                    cpu_online_mask);
> +                                       cpu_clear(vcpu->cpu, cpu_others_mask);
> +                                       cpu = any_online_cpu(cpu_others_mask);
> +
> +                                       schedule_work_on(cpu,
> +                                                        &vcpu->wakeup_worker);
> +                               }
> +
> +                               WARN((pi_desc->sn == 1),
> +                                    "Warning: SN field of posted-interrupts "
> +                                    "is set before blocking\n");
> +
> +                               /* set 'NV' to 'wakeup vector' */
> +                               new.nv = POSTED_INTR_WAKEUP_VECTOR;
> +                       } while (cmpxchg(&pi_desc->control, old.control,
> +                               new.control) != old.control);
> 
> If PIR/ON bit is set by POSTED_INTR_VECTOR notification during the above operation, we will stop
> blocking the vCPU like about. But seems I missed something in the above code which should be in
> my mind from the beginning, I should add a 'break' in the end the above ' if (pi_test_on(pi_desc) == 1) {}',
> so in this case, the 'NV' filed remains unchanged.

Right, we have to think carefully about all cases.

> > > > >  	}
> > > > >
> > > > > @@ -2842,6 +2915,8 @@ static int hardware_enable(void)
> > > > >  		return -EBUSY;
> > > > >
> > > > >  	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> > > > > +	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> > > > > +	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > >
> > > > >  	/*
> > > > >  	 * Now we can enable the vmclear operation in kdump
> > > > > @@ -9315,6 +9390,25 @@ static struct kvm_x86_ops vmx_x86_ops = {
> > > > >  	.pi_set_sn = vmx_pi_set_sn,
> > > > >  };
> > > > >
> > > > > +/*
> > > > > + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> > > > > + */
> > > > > +void wakeup_handler(void)
> > > > > +{
> > > > > +	struct kvm_vcpu *vcpu;
> > > > > +	int cpu = smp_processor_id();
> > > > > +
> > > > > +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > > +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > > > > +			blocked_vcpu_list) {
> > > > > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > > > +
> > > > > +		if (pi_test_on(pi_desc) == 1)
> > > > > +			kvm_vcpu_kick(vcpu);
> > > > > +	}
> > > > > +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > > +}
> > > >
> > > > Looping through all blocked vcpus does not scale:
> > > > Can you allocate more vectors and then multiplex those
> > > > vectors amongst the HLT'ed vcpus?
> > >
> > > I am a little confused about this, can you elaborate it a bit more?
> > > Thanks a lot!
> > 
> > Picture the following overcommitment scenario:
> > 
> > * High ratio of vCPUs/pCPUs, in the ratio 128/1 (this is exaggerated
> > to demonstrate the issue).
> > * Every VT-d interrupt is going to scan 128 entries in the list.
> > 
> > Moreover, the test:
> > 
> > 		if (pi_test_on(pi_desc) == 1)
> > 			kvm_vcpu_kick(vcpu);
> > 
> > Can trigger for vCPUs which have not been waken up due
> > to VT-d interrupts, but for other interrupts.
> > 
> > You can allocate, say 16 vectors on the pCPU for VT-d interrupts:
> > 
> > POSTED_INTERRUPT_WAKEUP_VECTOR_1,
> > POSTED_INTERRUPT_WAKEUP_VECTOR_2,
> > ...
> > 
> 
> Global vector is a limited resources in the system, and this involves
> common x86 interrupt code changes. I am not sure we can allocate
> so many dedicated global vector for KVM usage.

Why not? Have KVM use all free vectors (so if vectors are necessary for
other purposes, people should shrink the KVM vector pool).

BTW the Intel docs talk about that ("one vector per vCPU").
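
To make the multiplexing idea concrete, here is a user-space sketch with hypothetical names. Locking is omitted, and a pool of 16 vectors is an assumption taken from the example above. Blocked vCPUs are spread over one list per wakeup vector, so the handler for a given vector scans only its own list rather than every blocked vCPU on the pCPU.

```c
#include <stddef.h>

#define NR_WAKEUP_VECTORS 16	/* hypothetical pool size */
#define MAX_VCPUS 128		/* the exaggerated 128:1 overcommit case */

struct vcpu_sketch {
	int id;
	int on;				/* stands in for the ON bit */
	int kicked;			/* set where kvm_vcpu_kick() would run */
	struct vcpu_sketch *next;	/* link in the per-vector list */
};

/* One list of blocked vCPUs per wakeup vector. */
static struct vcpu_sketch *blocked[NR_WAKEUP_VECTORS];

/* Static assignment of a vCPU to one vector of the pool. */
static unsigned int wakeup_vector_index(const struct vcpu_sketch *v)
{
	return (unsigned int)v->id % NR_WAKEUP_VECTORS;
}

static void block_vcpu(struct vcpu_sketch *v)
{
	unsigned int idx = wakeup_vector_index(v);

	v->next = blocked[idx];
	blocked[idx] = v;
}

/*
 * Handler for wakeup vector 'idx'. Returns how many entries it scanned,
 * which is the cost the multiplexing is meant to reduce.
 */
static size_t wakeup_handler_sketch(unsigned int idx)
{
	size_t scanned = 0;

	for (struct vcpu_sketch *v = blocked[idx]; v; v = v->next) {
		scanned++;
		if (v->on)
			v->kicked = 1;
	}
	return scanned;
}
```

With 128 blocked vCPUs and 16 vectors, each handler invocation walks 8 entries instead of 128.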

> > > > It seems there is a bunch free:
> > > >
> > > > commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
> > > > Author: Alex Shi <alex.shi@intel.com>
> > > > Date:   Thu Jun 28 09:02:23 2012 +0800
> > > >
> > > >     x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
> > > >
> > > > Can you add only vcpus which have posted IRTEs that point to this pCPU
> > > > to the HLT'ed vcpu lists? (so for example, vcpus without assigned
> > > > devices are not part of the list).
> > >
> > > Is it easy to find whether a vCPU (or the associated domain) has assigned
> > > devices?
> > > If so, we can only add those vCPUs with assigned devices.
> > 
> > When configuring IRTE, at kvm_arch_vfio_update_pi_irte?
> 
> Yes.
> 
> > 
> > > > > +
> > > > >  static int __init vmx_init(void)
> > > > >  {
> > > > >  	int r, i, msr;
> > > > > @@ -9429,6 +9523,8 @@ static int __init vmx_init(void)
> > > > >
> > > > >  	update_ple_window_actual_max();
> > > > >
> > > > > +	wakeup_handler_callback = wakeup_handler;
> > > > > +
> > > > >  	return 0;
> > > > >
> > > > >  out7:
> > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > index 0033df3..1551a46 100644
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -6152,6 +6152,21 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > > > >  			kvm_vcpu_reload_apic_access_page(vcpu);
> > > > >  	}
> > > > >
> > > > > +	/*
> > > > > +	 * Since posted-interrupts can be set by VT-d HW now, in this
> > > > > +	 * case, KVM_REQ_EVENT is not set. We move the following
> > > > > +	 * operations out of the if statement.
> > > > > +	 */
> > > > > +	if (kvm_lapic_enabled(vcpu)) {
> > > > > +		/*
> > > > > +		 * Update architecture specific hints for APIC
> > > > > +		 * virtual interrupt delivery.
> > > > > +		 */
> > > > > +		if (kvm_x86_ops->hwapic_irr_update)
> > > > > +			kvm_x86_ops->hwapic_irr_update(vcpu,
> > > > > +				kvm_lapic_find_highest_irr(vcpu));
> > > > > +	}
> > > > > +
> > > >
> > > > This is a hot fast path. You can set KVM_REQ_EVENT from wakeup_handler.
> > >
> > > I am afraid Setting KVM_REQ_EVENT from wakeup_handler doesn't help much,
> > > if vCPU is running in ROOT mode, and VT-d hardware issues an notification event,
> > > POSTED_INTR_VECTOR interrupt handler will be called.
> > 
> > If vCPU is in root mode, remapping HW will find IRTE configured with
> > vector == POSTED_INTR_WAKEUP_VECTOR, use that vector, which will
> > VM-exit, and execute the interrupt handler wakeup_handler. Right?
> 
> There are two cases:
> Case 1: vCPU is blocked, so it is in root mode, this is what you described above.
> Case 2, vCPU is running in root mode, such as, handling vm-exits, in this case,
> the notification vector is 'POSTED_INTR_VECTOR', and if external interrupts
> from assigned devices happen, the handled of 'POSTED_INTR_VECTOR' will
> be called ( it is 'smp_kvm_posted_intr_ipi' in fact), this routine doesn't need
> do real things, since the pending interrupts in PIR will be synced to vIRR before
> VM-Entry (this code have already been there when enabling CPU-side
> posted-interrupt along with APICv). Like what I said before, it is a little hard to
> get vCPU related information in it, even if we get, it is not accurate and may harm
> the performance.(need search)
> 
> So only setting KVM_REQ_EVENT in wakeup_handler cannot cover the notification
> event for 'POSTED_INTR_VECTOR'.
> 
> > 
> > The point of this comment is that you can keep the
> > 
> > "if (kvm_x86_ops->hwapic_irr_update)
> > 	kvm_x86_ops->hwapic_irr_update(vcpu,
> > 			kvm_lapic_find_highest_irr(vcpu));
> > "
> > 
> > Code inside KVM_REQ_EVENT handling section of vcpu_run, as long as
> > wakeup_handler sets KVM_REQ_EVENT.
> 
> Please see above.

OK, can you set KVM_REQ_EVENT in case the ON bit is set,
after disabling interrupts?

kvm_lapic_find_highest_irr(vcpu) eats some cache
(4 cachelines) versus 1 cacheline for reading the ON bit.
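
A sketch of that check, using hypothetical names and a made-up request bit number. It is not KVM code. It assumes the caller has already disabled interrupts and is about to enter the guest, and it reads only the descriptor's control word before raising the request.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PI_ON		(UINT64_C(1) << 0)	/* assumed position of ON */
#define REQ_EVENT_BIT	0			/* hypothetical request bit */

struct vcpu_entry_sketch {
	_Atomic uint64_t pi_control;	/* control word of the PI descriptor */
	_Atomic unsigned long requests;	/* pending request bits */
};

/*
 * Meant to run with interrupts disabled, just before guest entry.
 * Touches one word of the descriptor and raises the event request
 * only when hardware has posted an interrupt.
 */
static bool request_event_if_posted(struct vcpu_entry_sketch *v)
{
	if (!(atomic_load(&v->pi_control) & PI_ON))
		return false;

	atomic_fetch_or(&v->requests, 1UL << REQ_EVENT_BIT);
	return true;
}
```

The more expensive IRR scan can then stay inside the request-handling path and run only when the request bit is set.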

> > > > Please remove blocked and wakeup_cpu, they should not be necessary.
> > >
> > > Why do you think wakeup_cpu is not needed, when vCPU is blocked,
> > > wakeup_cpu saves the cpu which the vCPU is blocked on, after vCPU
> > > is woken up, it can run on a different cpu, so we need wakeup_cpu to
> > > find the right list to wake up the vCPU.
> > 
> > If the vCPU was moved it should have updated IRTE destination field
> > to the pCPU which it has moved to?
> 
> Every time a vCPU is scheduled to a new pCPU, the IRTE destination filed
> would be updated accordingly. 
> 
> When vCPU is blocked. To wake up the blocked vCPU, we need to find which
> list the vCPU is blocked on, and this is what wakeup_cpu used for?

Right, perhaps prev_vcpu is a better name.



* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-12  1:15           ` Marcelo Tosatti
@ 2015-03-16 11:42             ` Wu, Feng
  2015-03-25 23:17               ` Marcelo Tosatti
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-03-16 11:42 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Wu, Feng



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Thursday, March 12, 2015 9:15 AM
> To: Wu, Feng
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> On Fri, Mar 06, 2015 at 06:51:52AM +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > Sent: Wednesday, March 04, 2015 8:06 PM
> > > To: Wu, Feng
> > > Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> > > is blocked
> > >
> > > On Mon, Mar 02, 2015 at 01:36:51PM +0000, Wu, Feng wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > > > Sent: Friday, February 27, 2015 7:41 AM
> > > > > To: Wu, Feng
> > > > > Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> > > > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > > > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> > > > > is blocked
> > > > >
> > > > > On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> > > > > > This patch updates the Posted-Interrupts Descriptor when vCPU
> > > > > > is blocked.
> > > > > >
> > > > > > pre-block:
> > > > > > - Add the vCPU to the blocked per-CPU list
> > > > > > - Clear 'SN'
> > > > > > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> > > > > >
> > > > > > post-block:
> > > > > > - Remove the vCPU from the per-CPU list
> > > > > >
> > > > > > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > > > > > ---
> > > > > >  arch/x86/include/asm/kvm_host.h |  2 +
> > > > > >  arch/x86/kvm/vmx.c              | 96 +++++++++++++++++++++++++++++++++++++++++
> > > > > >  arch/x86/kvm/x86.c              | 22 +++++++---
> > > > > >  include/linux/kvm_host.h        |  4 ++
> > > > > >  virt/kvm/kvm_main.c             |  6 +++
> > > > > >  5 files changed, 123 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > > > index 13e3e40..32c110a 100644
> > > > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > > > @@ -101,6 +101,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
> > > > > >
> > > > > >  #define ASYNC_PF_PER_VCPU 64
> > > > > >
> > > > > > +extern void (*wakeup_handler_callback)(void);
> > > > > > +
> > > > > >  enum kvm_reg {
> > > > > >  	VCPU_REGS_RAX = 0,
> > > > > >  	VCPU_REGS_RCX = 1,
> > > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > > index bf2e6cd..a1c83a2 100644
> > > > > > --- a/arch/x86/kvm/vmx.c
> > > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > > @@ -832,6 +832,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> > > > > >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> > > > > >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> > > > > >
> > > > > > +/*
> > > > > > + * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
> > > > > > + * can find which vCPU should be waken up.
> > > > > > + */
> > > > > > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > > > > > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > > > > > +
> > > > > >  static unsigned long *vmx_io_bitmap_a;
> > > > > >  static unsigned long *vmx_io_bitmap_b;
> > > > > >  static unsigned long *vmx_msr_bitmap_legacy;
> > > > > > @@ -1921,6 +1928,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> > > > > >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > > > >  		struct pi_desc old, new;
> > > > > >  		unsigned int dest;
> > > > > > +		unsigned long flags;
> > > > > >
> > > > > >  		memset(&old, 0, sizeof(old));
> > > > > >  		memset(&new, 0, sizeof(new));
> > > > > > @@ -1942,6 +1950,20 @@ static void vmx_vcpu_load(struct
> kvm_vcpu
> > > > > *vcpu, int cpu)
> > > > > >  			new.nv = POSTED_INTR_VECTOR;
> > > > > >  		} while (cmpxchg(&pi_desc->control, old.control,
> > > > > >  				new.control) != old.control);
> > > > > > +
> > > > > > +		/*
> > > > > > +		 * Delete the vCPU from the related wakeup queue
> > > > > > +		 * if we are resuming from blocked state
> > > > > > +		 */
> > > > > > +		if (vcpu->blocked) {
> > > > > > +			vcpu->blocked = false;
> > > > > > +
> 	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > > +				vcpu->wakeup_cpu), flags);
> > > > > > +			list_del(&vcpu->blocked_vcpu_list);
> > > > > > +
> > > 	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > > +				vcpu->wakeup_cpu), flags);
> > > > > > +			vcpu->wakeup_cpu = -1;
> > > > > > +		}
> > > > > >  	}
> > > > > >  }
> > > > > >
> > > > > > @@ -1950,6 +1972,9 @@ static void vmx_vcpu_put(struct kvm_vcpu
> > > *vcpu)
> > > > > >  	if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> > > > > >  		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > > > >  		struct pi_desc old, new;
> > > > > > +		unsigned long flags;
> > > > > > +		int cpu;
> > > > > > +		struct cpumask cpu_others_mask;
> > > > > >
> > > > > >  		memset(&old, 0, sizeof(old));
> > > > > >  		memset(&new, 0, sizeof(new));
> > > > > > @@ -1961,6 +1986,54 @@ static void vmx_vcpu_put(struct
> kvm_vcpu
> > > > > *vcpu)
> > > > > >  				pi_set_sn(&new);
> > > > > >  			} while (cmpxchg(&pi_desc->control, old.control,
> > > > > >  					new.control) != old.control);
> > > > > > +		} else if (vcpu->blocked) {
> > > > > > +			/*
> > > > > > +			 * The vcpu is blocked on the wait queue.
> > > > > > +			 * Store the blocked vCPU on the list of the
> > > > > > +			 * vcpu->wakeup_cpu, which is the destination
> > > > > > +			 * of the wake-up notification event.
> > > > > > +			 */
> > > > > > +			vcpu->wakeup_cpu = vcpu->cpu;
> > > > > > +
> 	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > > +					  vcpu->wakeup_cpu), flags);
> > > > > > +			list_add_tail(&vcpu->blocked_vcpu_list,
> > > > > > +				      &per_cpu(blocked_vcpu_on_cpu,
> > > > > > +				      vcpu->wakeup_cpu));
> > > > > > +			spin_unlock_irqrestore(
> > > > > > +					&per_cpu(blocked_vcpu_on_cpu_lock,
> > > > > > +					vcpu->wakeup_cpu), flags);
> > > > > > +
> > > > > > +			do {
> > > > > > +				old.control = new.control = pi_desc->control;
> > > > > > +
> > > > > > +				/*
> > > > > > +				 * We should not block the vCPU if
> > > > > > +				 * an interrupt is posted for it.
> > > > > > +				 */
> > > > > > +				if (pi_test_on(pi_desc) == 1) {
> > > > > > +					/*
> > > > > > +					 * We need schedule the wakeup worker
> > > > > > +					 * on a different cpu other than
> > > > > > +					 * vcpu->cpu, because in some case,
> > > > > > +					 * schedule_work() will call
> > > > > > +					 * try_to_wake_up() which needs acquire
> > > > > > +					 * the rq lock. This can cause deadlock.
> > > > > > +					 */
> > > > > > +					cpumask_copy(&cpu_others_mask,
> > > > > > +						     cpu_online_mask);
> > > > > > +					cpu_clear(vcpu->cpu, cpu_others_mask);
> > > > > > +					cpu = any_online_cpu(cpu_others_mask);
> > > > > > +
> > > > > > +					schedule_work_on(cpu,
> > > > > > +							 &vcpu->wakeup_worker);
> > > > > > +				}
> > > > > > +
> > > > > > +				pi_clear_sn(&new);
> > > > > > +
> > > > > > +				/* set 'NV' to 'wakeup vector' */
> > > > > > +				new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > > > > > +			} while (cmpxchg(&pi_desc->control, old.control,
> > > > > > +				new.control) != old.control);
> > > > > >  		}
> > > > >
> > > > > This can be done exclusively on HLT emulation, correct? (that is, on
> > > > > entry to HLT and exit from HLT).
> > > >
> > > > Do you mean the following?
> > > > In kvm_emulate_halt(), we do:
> > > > 1. Add vCPU in the blocking list
> > > > 2. Clear 'SN'
> > > > 3. set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> > > >
> > > > In __vcpu_run(), after kvm_vcpu_block(), we remove the vCPU from the
> > > > blocking list.
> > >
> > > Yes (please check its OK to do this...).
> >
> > I think about this for some time, and I feel this may be another solution
> > to implement it. Do you mind sharing your ideas about why do you think
> > this alternative is better than the current one? Thanks a lot!
> 
> Two reasons:
> 
> 1) Because it does not add overhead to vcpu_puts that are not due to
> HLT. Doing so removes the "vcpu->blocked" variable (it's implicit in the
> code anyway).
> 2) Easier to spot races.
> 
> Do you have any reason why having the code at vcpu_put/vcpu_load is
> better than the proposal to have the code at kvm_vcpu_block?

I think your proposal is good; I just want to better understand your idea here. :)

One thing: even if we move the code to kvm_vcpu_block, we still need code
at vcpu_put/vcpu_load for the preemption case, as in the current patch.
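
To make the two cases concrete, here is a rough user-space model (not kernel
code) of how the descriptor fields would differ between the preemption case and
the blocking case. Every name prefixed with model_/MODEL_ and both vector
numbers are made up for illustration, and the sched-in behaviour is an
assumption based on the patch description rather than the exact patch code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector numbers, chosen only for this sketch. */
enum {
	MODEL_POSTED_INTR_VECTOR = 0xf2,
	MODEL_WAKEUP_VECTOR      = 0xf1,
};

/* Simplified stand-in for two fields of the posted-interrupt descriptor. */
struct model_pi_desc {
	bool sn;      /* suppress notification */
	uint8_t nv;   /* notification vector */
};

enum model_out_reason { MODEL_PREEMPTED, MODEL_BLOCKED };

/* The vCPU leaves its pCPU, either preempted or blocked (HLT). */
static void model_sched_out(struct model_pi_desc *pi, enum model_out_reason why)
{
	assert(pi != NULL);
	if (why == MODEL_PREEMPTED) {
		/* Still runnable: suppress notifications while it waits. */
		pi->sn = true;
	} else {
		/* Blocked: notifications must arrive, on the wakeup vector. */
		pi->sn = false;
		pi->nv = MODEL_WAKEUP_VECTOR;
	}
}

/* The vCPU is loaded on a pCPU again (assumed behaviour). */
static void model_sched_in(struct model_pi_desc *pi)
{
	assert(pi != NULL);
	pi->sn = false;
	pi->nv = MODEL_POSTED_INTR_VECTOR;
}
```

In this model only the MODEL_BLOCKED branch could move to kvm_vcpu_block;
the MODEL_PREEMPTED branch has no equivalent hook there, which is why
vcpu_put/vcpu_load would still be touched.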

> 
> > > > > If the vcpu is scheduled out for any other reason (transition to
> > > > > userspace or transition to other thread), it will eventually resume
> > > > > execution. And in that case, continuation of execution does not depend
> > > > > on the event (VT-d interrupt) notification.
> > > >
> > > > Yes, I think this is true for my current implementation, right?
> > > >
> > > > >
> > > > > There is a race window with the code above, I believe.
> > > >
> > > > I did careful code review back and forth for the above code, It will
> > > > be highly appreciated if you can point out the race window!
> > >
> > > So the remapping HW sees either POSTED_INTR_VECTOR or
> > > POSTED_INTR_WAKEUP_VECTOR.
> > >
> > > You should:
> > >
> > > 1. Set POSTED_INTR_WAKEUP_VECTOR.
> > > 2. Check for PIR / ON bit, which might have been set by
> > > POSTED_INTR_VECTOR notification.
> > > 3. emulate HLT.
> >
> > My original idea for pre-block operation is:
> > 1. Add vCPU to the per-cpu blocking list. Here is the code for this in my patch:
> > +                       vcpu->wakeup_cpu = vcpu->cpu;
> > +                       spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > +                                         vcpu->wakeup_cpu), flags);
> > +                       list_add_tail(&vcpu->blocked_vcpu_list,
> > +                                     &per_cpu(blocked_vcpu_on_cpu,
> > +                                     vcpu->wakeup_cpu));
> > +                       spin_unlock_irqrestore(
> > +                                       &per_cpu(blocked_vcpu_on_cpu_lock,
> > +                                       vcpu->wakeup_cpu), flags);
> > 2. Update Posted-interrupt descriptor, here is the code in my patch:
> > +                       do {
> > +                               old.control = new.control = pi_desc->control;
> > +
> > +                               /*
> > +                                * We should not block the vCPU if
> > +                                * an interrupt is posted for it.
> > +                                */
> > +                               if (pi_test_on(pi_desc) == 1) {
> > +                                       /*
> > +                                        * We need schedule the wakeup worker
> > +                                        * on a different cpu other than
> > +                                        * vcpu->cpu, because in some case,
> > +                                        * schedule_work() will call
> > +                                        * try_to_wake_up() which needs acquire
> > +                                        * the rq lock. This can cause deadlock.
> > +                                        */
> > +                                       cpumask_copy(&cpu_others_mask,
> > +                                                    cpu_online_mask);
> > +                                       cpu_clear(vcpu->cpu, cpu_others_mask);
> > +                                       cpu = any_online_cpu(cpu_others_mask);
> > +
> > +                                       schedule_work_on(cpu,
> > +                                                        &vcpu->wakeup_worker);
> > +                               }
> > +
> > +                               WARN((pi_desc->sn == 1),
> > +                                    "Warning: SN field of posted-interrupts "
> > +                                    "is set before blocking\n");
> > +
> > +                               /* set 'NV' to 'wakeup vector' */
> > +                               new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > +                       } while (cmpxchg(&pi_desc->control, old.control,
> > +                               new.control) != old.control);
> >
> > If the PIR/ON bit is set by a POSTED_INTR_VECTOR notification during the above
> > operation, we will stop blocking the vCPU, as above. But it seems I missed
> > something in the above code that I had in mind from the beginning: I should add
> > a 'break' at the end of the 'if (pi_test_on(pi_desc) == 1) {}' block, so that in
> > this case the 'NV' field remains unchanged.
> 
> Right have to think carefully about all cases.
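
For reference, the "bail out of the loop when ON is already set" variant
discussed above could be sketched in user space with C11 atomics as follows.
This is only a model: the 32-bit control word, the bit positions and the
model_pre_block() name are assumptions made for the example, not the layout or
helpers used by the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, assumed layout of the descriptor control word. */
#define MODEL_ON_BIT   (1u << 0)            /* outstanding notification */
#define MODEL_SN_BIT   (1u << 1)            /* suppress notification */
#define MODEL_NV_SHIFT 16
#define MODEL_NV_MASK  (0xffu << MODEL_NV_SHIFT)

/*
 * Returns true if the vCPU may block (SN cleared, NV switched to the wakeup
 * vector), or false if an interrupt is already posted, in which case the
 * control word is left untouched.
 */
static bool model_pre_block(_Atomic uint32_t *control, uint8_t wakeup_vector)
{
	uint32_t old, next;

	assert(control != NULL);
	old = atomic_load(control);
	do {
		if (old & MODEL_ON_BIT)
			return false;   /* the 'break': NV stays unchanged */
		next = old & ~(MODEL_SN_BIT | MODEL_NV_MASK);
		next |= (uint32_t)wakeup_vector << MODEL_NV_SHIFT;
		/* On failure, 'old' is reloaded and ON is re-checked. */
	} while (!atomic_compare_exchange_weak(control, &old, next));

	return true;
}
```

Because the ON test sits inside the retry loop, a notification that lands
between the load and the compare-exchange makes the exchange fail and is seen
on the next iteration.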
> 
> > > > > >  	}
> > > > > >
> > > > > > @@ -2842,6 +2915,8 @@ static int hardware_enable(void)
> > > > > >  		return -EBUSY;
> > > > > >
> > > > > >  	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> > > > > > +	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
> > > > > > +	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > > >
> > > > > >  	/*
> > > > > >  	 * Now we can enable the vmclear operation in kdump
> > > > > > @@ -9315,6 +9390,25 @@ static struct kvm_x86_ops vmx_x86_ops =
> {
> > > > > >  	.pi_set_sn = vmx_pi_set_sn,
> > > > > >  };
> > > > > >
> > > > > > +/*
> > > > > > + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> > > > > > + */
> > > > > > +void wakeup_handler(void)
> > > > > > +{
> > > > > > +	struct kvm_vcpu *vcpu;
> > > > > > +	int cpu = smp_processor_id();
> > > > > > +
> > > > > > +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > > > > +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> > > > > > > +			blocked_vcpu_list) {
> > > > > > +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > > > > +
> > > > > > +		if (pi_test_on(pi_desc) == 1)
> > > > > > +			kvm_vcpu_kick(vcpu);
> > > > > > +	}
> > > > > > +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> > > > > > +}
> > > > >
> > > > > Looping through all blocked vcpus does not scale:
> > > > > Can you allocate more vectors and then multiplex those
> > > > > vectors amongst the HLT'ed vcpus?
> > > >
> > > > I am a little confused about this, can you elaborate it a bit more?
> > > > Thanks a lot!
> > >
> > > Picture the following overcommitment scenario:
> > >
> > > * High ratio of vCPUs/pCPUs, in the ratio 128/1 (this is exaggerated
> > > to demonstrate the issue).
> > > * Every VT-d interrupt is going to scan 128 entries in the list.
> > >
> > > Moreover, the test:
> > >
> > > 		if (pi_test_on(pi_desc) == 1)
> > > 			kvm_vcpu_kick(vcpu);
> > >
> > > Can trigger for vCPUs which have not been waken up due
> > > to VT-d interrupts, but for other interrupts.
> > >
> > > You can allocate, say 16 vectors on the pCPU for VT-d interrupts:
> > >
> > > POSTED_INTERRUPT_WAKEUP_VECTOR_1,
> > > POSTED_INTERRUPT_WAKEUP_VECTOR_2,
> > > ...
> > >
> >
> > Global vector is a limited resources in the system, and this involves
> > common x86 interrupt code changes. I am not sure we can allocate
> > so many dedicated global vector for KVM usage.
> 
> Why not? Have KVM use all free vectors (so if vectors are necessary for
> other purposes, people should shrink the KVM vector pool).

If we want to allocate more global vectors for this usage, we need hpa's
input on it. Peter, what is your opinion?

> 
> BTW the Intel docs talk about that ("one vector per vCPU").
Yes, the Spec talks about this, but using one vector per vCPU is more complex.
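
As a point of comparison, the intermediate option raised earlier (a small pool
of wakeup vectors shared among blocked vCPUs) could be modelled roughly as
below. The pool size of 16 comes from the discussion; the base vector, the
modulo assignment and the model_* names are assumptions for illustration only.

```c
#include <assert.h>

#define MODEL_NR_WAKEUP_VECTORS  16     /* "say 16 vectors" */
#define MODEL_WAKEUP_VECTOR_BASE 0xe0   /* hypothetical base vector */

static_assert(MODEL_NR_WAKEUP_VECTORS > 0, "need at least one wakeup vector");

/* Pick the wakeup vector (and thus the per-vector list) for a vCPU. */
static unsigned int model_wakeup_vector(unsigned int vcpu_id)
{
	return MODEL_WAKEUP_VECTOR_BASE + vcpu_id % MODEL_NR_WAKEUP_VECTORS;
}

/*
 * Entries one wakeup interrupt has to scan, assuming the blocked vCPUs
 * are spread evenly over the vectors (rounded up).
 */
static unsigned int model_scan_len(unsigned int nr_blocked_vcpus)
{
	return (nr_blocked_vcpus + MODEL_NR_WAKEUP_VECTORS - 1) /
	       MODEL_NR_WAKEUP_VECTORS;
}
```

With the exaggerated 128 blocked vCPUs on one pCPU, this model scans 8 entries
per interrupt instead of 128, at the cost of 16 global vectors.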

> 
> > > > > It seems there is a bunch free:
> > > > >
> > > > > commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
> > > > > Author: Alex Shi <alex.shi@intel.com>
> > > > > Date:   Thu Jun 28 09:02:23 2012 +0800
> > > > >
> > > > > >     x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
> > > > >
> > > > > Can you add only vcpus which have posted IRTEs that point to this pCPU
> > > > > to the HLT'ed vcpu lists? (so for example, vcpus without assigned
> > > > > devices are not part of the list).
> > > >
> > > > Is it easy to find whether a vCPU (or the associated domain) has assigned
> > > > devices? If so, we can only add those vCPUs with assigned devices.
> > >
> > > When configuring IRTE, at kvm_arch_vfio_update_pi_irte?
> >
> > Yes.
> >
> > >
> > > > > > +
> > > > > >  static int __init vmx_init(void)
> > > > > >  {
> > > > > >  	int r, i, msr;
> > > > > > @@ -9429,6 +9523,8 @@ static int __init vmx_init(void)
> > > > > >
> > > > > >  	update_ple_window_actual_max();
> > > > > >
> > > > > > +	wakeup_handler_callback = wakeup_handler;
> > > > > > +
> > > > > >  	return 0;
> > > > > >
> > > > > >  out7:
> > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > index 0033df3..1551a46 100644
> > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > @@ -6152,6 +6152,21 @@ static int vcpu_enter_guest(struct
> kvm_vcpu
> > > > > *vcpu)
> > > > > >  			kvm_vcpu_reload_apic_access_page(vcpu);
> > > > > >  	}
> > > > > >
> > > > > > +	/*
> > > > > > +	 * Since posted-interrupts can be set by VT-d HW now, in this
> > > > > > +	 * case, KVM_REQ_EVENT is not set. We move the following
> > > > > > +	 * operations out of the if statement.
> > > > > > +	 */
> > > > > > +	if (kvm_lapic_enabled(vcpu)) {
> > > > > > +		/*
> > > > > > +		 * Update architecture specific hints for APIC
> > > > > > +		 * virtual interrupt delivery.
> > > > > > +		 */
> > > > > > +		if (kvm_x86_ops->hwapic_irr_update)
> > > > > > +			kvm_x86_ops->hwapic_irr_update(vcpu,
> > > > > > +				kvm_lapic_find_highest_irr(vcpu));
> > > > > > +	}
> > > > > > +
> > > > >
> > > > > This is a hot fast path. You can set KVM_REQ_EVENT from wakeup_handler.
> > > >
> > > > I am afraid setting KVM_REQ_EVENT from wakeup_handler doesn't help much:
> > > > if the vCPU is running in root mode and VT-d hardware issues a notification
> > > > event, the POSTED_INTR_VECTOR interrupt handler will be called.
> > >
> > > If vCPU is in root mode, remapping HW will find IRTE configured with
> > > vector == POSTED_INTR_WAKEUP_VECTOR, use that vector, which will
> > > VM-exit, and execute the interrupt handler wakeup_handler. Right?
> >
> > > There are two cases:
> > > Case 1: the vCPU is blocked, so it is in root mode; this is what you described
> > > above.
> > > Case 2: the vCPU is running in root mode, e.g. handling VM-exits. In this case
> > > the notification vector is 'POSTED_INTR_VECTOR', and if external interrupts
> > > from assigned devices arrive, the handler of 'POSTED_INTR_VECTOR' will be
> > > called (it is 'smp_kvm_posted_intr_ipi' in fact). This routine doesn't need to
> > > do any real work, since the pending interrupts in PIR will be synced to vIRR
> > > before VM-entry (this code has been there since CPU-side posted interrupts
> > > were enabled along with APICv). As I said before, it is a little hard to get
> > > vCPU-related information in it; even if we could, it would not be accurate and
> > > may harm performance (a search would be needed).
> > >
> > > So setting KVM_REQ_EVENT only in wakeup_handler cannot cover the notification
> > > event for 'POSTED_INTR_VECTOR'.
> >
> > >
> > > The point of this comment is that you can keep the
> > >
> > > "if (kvm_x86_ops->hwapic_irr_update)
> > > 	kvm_x86_ops->hwapic_irr_update(vcpu,
> > > 			kvm_lapic_find_highest_irr(vcpu));
> > > "
> > >
> > > Code inside KVM_REQ_EVENT handling section of vcpu_run, as long as
> > > wakeup_handler sets KVM_REQ_EVENT.
> >
> > Please see above.
> 
> OK can you set KVM_REQ_EVENT in case the ON bit is set,
> after disabling interrupts ?
> 
Currently, the following code is executed before local_irq_disable() is called,
so do you mean 1) moving local_irq_disable() to before it; 2) after interrupts
are disabled, setting KVM_REQ_EVENT in case the ON bit is set?

"if (kvm_x86_ops->hwapic_irr_update)
	kvm_x86_ops->hwapic_irr_update(vcpu,
			kvm_lapic_find_highest_irr(vcpu));
"

> kvm_lapic_find_highest_irr(vcpu) eats some cache
> (4 cachelines) versus 1 cacheline for reading ON bit.
> 
> > > > > Please remove blocked and wakeup_cpu, they should not be necessary.
> > > >
> > > > Why do you think wakeup_cpu is not needed, when vCPU is blocked,
> > > > wakeup_cpu saves the cpu which the vCPU is blocked on, after vCPU
> > > > is woken up, it can run on a different cpu, so we need wakeup_cpu to
> > > > find the right list to wake up the vCPU.
> > >
> > > If the vCPU was moved it should have updated IRTE destination field
> > > to the pCPU which it has moved to?
> >
> > Every time a vCPU is scheduled to a new pCPU, the IRTE destination filed
> > would be updated accordingly.
> >
> > When vCPU is blocked. To wake up the blocked vCPU, we need to find which
> > list the vCPU is blocked on, and this is what wakeup_cpu used for?
> 
> Right, perhaps prev_vcpu is a better name.

Do you mean "prev_pcpu"?

Thanks,
Feng



* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-16 11:42             ` Wu, Feng
@ 2015-03-25 23:17               ` Marcelo Tosatti
  2015-03-27  6:34                 ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Marcelo Tosatti @ 2015-03-25 23:17 UTC (permalink / raw)
  To: Wu, Feng, hpa
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

On Mon, Mar 16, 2015 at 11:42:06AM +0000, Wu, Feng wrote:
> > Do you have any reason why having the code at vcpu_put/vcpu_load is
> > better than the proposal to have the code at kvm_vcpu_block?
> 
> I think your proposal is good, I just want to better understand your idea here.:)

Reduce the overhead of vcpu sched in / vcpu sched out, basically.

> One thing, even we put the code to kvm_vcpu_block, we still need to add code
> at vcpu_put/vcpu_load for the preemption case like what I did now.
> 
> > 
> > >
> > > Global vector is a limited resources in the system, and this involves
> > > common x86 interrupt code changes. I am not sure we can allocate
> > > so many dedicated global vector for KVM usage.
> > 
> > Why not? Have KVM use all free vectors (so if vectors are necessary for
> > other purposes, people should shrink the KVM vector pool).
> 
> If we want to allocate more global vector for this usage, we need hpa's
> input about it. Peter, what is your opinion?

Peter?

> > BTW the Intel docs talk about that ("one vector per vCPU").
> Yes, the Spec talks about this, but it is more complex using one vector per vCPU.
> 
> > 
> > > > > > It seems there is a bunch free:
> > > > > >
> > > > > > commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
> > > > > > Author: Alex Shi <alex.shi@intel.com>
> > > > > > Date:   Thu Jun 28 09:02:23 2012 +0800
> > > > > >
> > > > > >     x86/tlb: replace INVALIDATE_TLB_VECTOR by
> > > > CALL_FUNCTION_VECTOR
> > > > > >
> > > > > > Can you add only vcpus which have posted IRTEs that point to this pCPU
> > > > > > to the HLT'ed vcpu lists? (so for example, vcpus without assigned
> > > > > > devices are not part of the list).
> > > > >
> > > > > Is it easy to find whether a vCPU (or the associated domain) has assigned
> > > > devices?
> > > > > If so, we can only add those vCPUs with assigned devices.
> > > >
> > > > When configuring IRTE, at kvm_arch_vfio_update_pi_irte?
> > >
> > > Yes.
> > >
> > > >
> > > > > > > +
> > > > > > >  static int __init vmx_init(void)
> > > > > > >  {
> > > > > > >  	int r, i, msr;
> > > > > > > @@ -9429,6 +9523,8 @@ static int __init vmx_init(void)
> > > > > > >
> > > > > > >  	update_ple_window_actual_max();
> > > > > > >
> > > > > > > +	wakeup_handler_callback = wakeup_handler;
> > > > > > > +
> > > > > > >  	return 0;
> > > > > > >
> > > > > > >  out7:
> > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > index 0033df3..1551a46 100644
> > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > @@ -6152,6 +6152,21 @@ static int vcpu_enter_guest(struct
> > kvm_vcpu
> > > > > > *vcpu)
> > > > > > >  			kvm_vcpu_reload_apic_access_page(vcpu);
> > > > > > >  	}
> > > > > > >
> > > > > > > +	/*
> > > > > > > +	 * Since posted-interrupts can be set by VT-d HW now, in this
> > > > > > > +	 * case, KVM_REQ_EVENT is not set. We move the following
> > > > > > > +	 * operations out of the if statement.
> > > > > > > +	 */
> > > > > > > +	if (kvm_lapic_enabled(vcpu)) {
> > > > > > > +		/*
> > > > > > > +		 * Update architecture specific hints for APIC
> > > > > > > +		 * virtual interrupt delivery.
> > > > > > > +		 */
> > > > > > > +		if (kvm_x86_ops->hwapic_irr_update)
> > > > > > > +			kvm_x86_ops->hwapic_irr_update(vcpu,
> > > > > > > +				kvm_lapic_find_highest_irr(vcpu));
> > > > > > > +	}
> > > > > > > +
> > > > > >
> > > > > > This is a hot fast path. You can set KVM_REQ_EVENT from
> > wakeup_handler.
> > > > >
> > > > > I am afraid Setting KVM_REQ_EVENT from wakeup_handler doesn't help
> > > > much,
> > > > > if vCPU is running in ROOT mode, and VT-d hardware issues an notification
> > > > event,
> > > > > POSTED_INTR_VECTOR interrupt handler will be called.
> > > >
> > > > If vCPU is in root mode, remapping HW will find IRTE configured with
> > > > vector == POSTED_INTR_WAKEUP_VECTOR, use that vector, which will
> > > > VM-exit, and execute the interrupt handler wakeup_handler. Right?
> > >
> > > > There are two cases:
> > > > Case 1: the vCPU is blocked, so it is in root mode; this is what you described
> > > > above.
> > > > Case 2: the vCPU is running in root mode, e.g. handling VM-exits. In this case
> > > > the notification vector is 'POSTED_INTR_VECTOR', and if external interrupts
> > > > from assigned devices arrive, the handler of 'POSTED_INTR_VECTOR' will be
> > > > called (it is 'smp_kvm_posted_intr_ipi' in fact). This routine doesn't need to
> > > > do any real work, since the pending interrupts in PIR will be synced to vIRR
> > > > before VM-entry (this code has been there since CPU-side posted interrupts
> > > > were enabled along with APICv). As I said before, it is a little hard to get
> > > > vCPU-related information in it; even if we could, it would not be accurate and
> > > > may harm performance (a search would be needed).
> > > >
> > > > So setting KVM_REQ_EVENT only in wakeup_handler cannot cover the notification
> > > > event for 'POSTED_INTR_VECTOR'.
> > >
> > > >
> > > > The point of this comment is that you can keep the
> > > >
> > > > "if (kvm_x86_ops->hwapic_irr_update)
> > > > 	kvm_x86_ops->hwapic_irr_update(vcpu,
> > > > 			kvm_lapic_find_highest_irr(vcpu));
> > > > "
> > > >
> > > > Code inside KVM_REQ_EVENT handling section of vcpu_run, as long as
> > > > wakeup_handler sets KVM_REQ_EVENT.
> > >
> > > Please see above.
> > 
> > OK can you set KVM_REQ_EVENT in case the ON bit is set,
> > after disabling interrupts ?
> > 
> Currently, the following code is executed before local_irq_disable() is called,
> so do you mean 1)moving local_irq_disable() to the place before it. 2) after interrupt
> is disabled, set KVM_REQ_EVENT in case the ON bit is set?

2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit 
is set.

> 
> "if (kvm_x86_ops->hwapic_irr_update)
> 	kvm_x86_ops->hwapic_irr_update(vcpu,
> 			kvm_lapic_find_highest_irr(vcpu));
> 
> > kvm_lapic_find_highest_irr(vcpu) eats some cache
> > (4 cachelines) versus 1 cacheline for reading ON bit.
> > 
> > > > > > Please remove blocked and wakeup_cpu, they should not be necessary.
> > > > >
> > > > > Why do you think wakeup_cpu is not needed, when vCPU is blocked,
> > > > > wakeup_cpu saves the cpu which the vCPU is blocked on, after vCPU
> > > > > is woken up, it can run on a different cpu, so we need wakeup_cpu to
> > > > > find the right list to wake up the vCPU.
> > > >
> > > > If the vCPU was moved it should have updated IRTE destination field
> > > > to the pCPU which it has moved to?
> > >
> > > Every time a vCPU is scheduled to a new pCPU, the IRTE destination filed
> > > would be updated accordingly.
> > >
> > > When vCPU is blocked. To wake up the blocked vCPU, we need to find which
> > > list the vCPU is blocked on, and this is what wakeup_cpu used for?
> > 
> > Right, perhaps prev_vcpu is a better name.
> 
> Do you mean "prev_pcpu"?

Yes.




* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-25 23:17               ` Marcelo Tosatti
@ 2015-03-27  6:34                 ` Wu, Feng
  2015-03-27 19:30                   ` Marcelo Tosatti
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-03-27  6:34 UTC (permalink / raw)
  To: Marcelo Tosatti, hpa
  Cc: tglx, mingo, hpa, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Wu, Feng



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Thursday, March 26, 2015 7:18 AM
> To: Wu, Feng; hpa@zytor.com
> Cc: tglx@linutronix.de; mingo@redhat.com; hpa@zytor.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> On Mon, Mar 16, 2015 at 11:42:06AM +0000, Wu, Feng wrote:
> > > Do you have any reason why having the code at vcpu_put/vcpu_load is
> > > better than the proposal to have the code at kvm_vcpu_block?
> >
> > I think your proposal is good, I just want to better understand your idea
> here.:)
> 
> Reduce the overhead of vcpu sched in / vcpu sched out, basically.
> 
> > One thing, even we put the code to kvm_vcpu_block, we still need to add
> code
> > at vcpu_put/vcpu_load for the preemption case like what I did now.
> >
> > >
> > > >
> > > > Global vector is a limited resources in the system, and this involves
> > > > common x86 interrupt code changes. I am not sure we can allocate
> > > > so many dedicated global vector for KVM usage.
> > >
> > > Why not? Have KVM use all free vectors (so if vectors are necessary for
> > > other purposes, people should shrink the KVM vector pool).
> >
> > If we want to allocate more global vector for this usage, we need hpa's
> > input about it. Peter, what is your opinion?
> 
> Peter?
> 
> > > BTW the Intel docs talk about that ("one vector per vCPU").
> > Yes, the Spec talks about this, but it is more complex using one vector per
> vCPU.
> >
> > >
> > > > > > > It seems there is a bunch free:
> > > > > > >
> > > > > > > commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
> > > > > > > Author: Alex Shi <alex.shi@intel.com>
> > > > > > > Date:   Thu Jun 28 09:02:23 2012 +0800
> > > > > > >
> > > > > > >     x86/tlb: replace INVALIDATE_TLB_VECTOR by
> > > > > CALL_FUNCTION_VECTOR
> > > > > > >
> > > > > > > Can you add only vcpus which have posted IRTEs that point to this
> pCPU
> > > > > > > to the HLT'ed vcpu lists? (so for example, vcpus without assigned
> > > > > > > devices are not part of the list).
> > > > > >
> > > > > > Is it easy to find whether a vCPU (or the associated domain) has
> assigned
> > > > > devices?
> > > > > > If so, we can only add those vCPUs with assigned devices.
> > > > >
> > > > > When configuring IRTE, at kvm_arch_vfio_update_pi_irte?
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > > > > > +
> > > > > > > >  static int __init vmx_init(void)
> > > > > > > >  {
> > > > > > > >  	int r, i, msr;
> > > > > > > > @@ -9429,6 +9523,8 @@ static int __init vmx_init(void)
> > > > > > > >
> > > > > > > >  	update_ple_window_actual_max();
> > > > > > > >
> > > > > > > > +	wakeup_handler_callback = wakeup_handler;
> > > > > > > > +
> > > > > > > >  	return 0;
> > > > > > > >
> > > > > > > >  out7:
> > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > > index 0033df3..1551a46 100644
> > > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > > @@ -6152,6 +6152,21 @@ static int vcpu_enter_guest(struct
> > > kvm_vcpu
> > > > > > > *vcpu)
> > > > > > > >  			kvm_vcpu_reload_apic_access_page(vcpu);
> > > > > > > >  	}
> > > > > > > >
> > > > > > > > +	/*
> > > > > > > > +	 * Since posted-interrupts can be set by VT-d HW now, in this
> > > > > > > > +	 * case, KVM_REQ_EVENT is not set. We move the following
> > > > > > > > +	 * operations out of the if statement.
> > > > > > > > +	 */
> > > > > > > > +	if (kvm_lapic_enabled(vcpu)) {
> > > > > > > > +		/*
> > > > > > > > +		 * Update architecture specific hints for APIC
> > > > > > > > +		 * virtual interrupt delivery.
> > > > > > > > +		 */
> > > > > > > > +		if (kvm_x86_ops->hwapic_irr_update)
> > > > > > > > +			kvm_x86_ops->hwapic_irr_update(vcpu,
> > > > > > > > +				kvm_lapic_find_highest_irr(vcpu));
> > > > > > > > +	}
> > > > > > > > +
> > > > > > >
> > > > > > > This is a hot fast path. You can set KVM_REQ_EVENT from
> > > wakeup_handler.
> > > > > >
> > > > > > I am afraid Setting KVM_REQ_EVENT from wakeup_handler doesn't
> help
> > > > > much,
> > > > > > if vCPU is running in ROOT mode, and VT-d hardware issues an
> notification
> > > > > event,
> > > > > > POSTED_INTR_VECTOR interrupt handler will be called.
> > > > >
> > > > > If vCPU is in root mode, remapping HW will find IRTE configured with
> > > > > vector == POSTED_INTR_WAKEUP_VECTOR, use that vector, which will
> > > > > VM-exit, and execute the interrupt handler wakeup_handler. Right?
> > > >
> > > > > There are two cases:
> > > > > Case 1: the vCPU is blocked, so it is in root mode; this is what you described
> > > > > above.
> > > > > Case 2: the vCPU is running in root mode, e.g. handling VM-exits. In this case
> > > > > the notification vector is 'POSTED_INTR_VECTOR', and if external interrupts
> > > > > from assigned devices arrive, the handler of 'POSTED_INTR_VECTOR' will be
> > > > > called (it is 'smp_kvm_posted_intr_ipi' in fact). This routine doesn't need to
> > > > > do any real work, since the pending interrupts in PIR will be synced to vIRR
> > > > > before VM-entry (this code has been there since CPU-side posted interrupts
> > > > > were enabled along with APICv). As I said before, it is a little hard to get
> > > > > vCPU-related information in it; even if we could, it would not be accurate and
> > > > > may harm performance (a search would be needed).
> > > > >
> > > > > So setting KVM_REQ_EVENT only in wakeup_handler cannot cover the notification
> > > > > event for 'POSTED_INTR_VECTOR'.
> > > >
> > > > >
> > > > > The point of this comment is that you can keep the
> > > > >
> > > > > "if (kvm_x86_ops->hwapic_irr_update)
> > > > > 	kvm_x86_ops->hwapic_irr_update(vcpu,
> > > > > 			kvm_lapic_find_highest_irr(vcpu));
> > > > > "
> > > > >
> > > > > Code inside KVM_REQ_EVENT handling section of vcpu_run, as long as
> > > > > wakeup_handler sets KVM_REQ_EVENT.
> > > >
> > > > Please see above.
> > >
> > > OK can you set KVM_REQ_EVENT in case the ON bit is set,
> > > after disabling interrupts ?
> > >
> > Currently, the following code is executed before local_irq_disable() is called,
> > so do you mean 1)moving local_irq_disable() to the place before it. 2) after
> interrupt
> > is disabled, set KVM_REQ_EVENT in case the ON bit is set?
> 
> 2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit
> is set.

Here is my understanding of your comments:
- Disable interrupts
- Check 'ON'
- Set KVM_REQ_EVENT if 'ON' is set

Then we can put the above code back inside "if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win)",
just like it used to be. However, I still have some questions about this comment:

1. Where should I set KVM_REQ_EVENT? In vcpu_enter_guest(), or somewhere else?
If in vcpu_enter_guest(): since local_irq_disable() is currently called after 'KVM_REQ_EVENT'
is checked, does it help to set KVM_REQ_EVENT after local_irq_disable() is called?
2. 'ON' is set by VT-d hardware, so it can be set even while interrupts are disabled (the related bit in PIR is also set).
So does it make sense to check 'ON' and set KVM_REQ_EVENT accordingly after interrupts are disabled?

I may have missed something in your comments; if so, please point it out. Thanks a lot!

Thanks,
Feng

> 
> >
> > "if (kvm_x86_ops->hwapic_irr_update)
> > 	kvm_x86_ops->hwapic_irr_update(vcpu,
> > 			kvm_lapic_find_highest_irr(vcpu));
> >
> > > kvm_lapic_find_highest_irr(vcpu) eats some cache
> > > (4 cachelines) versus 1 cacheline for reading ON bit.
> > >
> > > > > > > Please remove blocked and wakeup_cpu, they should not be
> > > > > > > necessary.
> > > > > >
> > > > > > Why do you think wakeup_cpu is not needed, when vCPU is blocked,
> > > > > > wakeup_cpu saves the cpu which the vCPU is blocked on, after vCPU
> > > > > > is woken up, it can run on a different cpu, so we need wakeup_cpu to
> > > > > > find the right list to wake up the vCPU.
> > > > >
> > > > > If the vCPU was moved it should have updated IRTE destination field
> > > > > to the pCPU which it has moved to?
> > > >
> > > > Every time a vCPU is scheduled to a new pCPU, the IRTE destination field
> > > > would be updated accordingly.
> > > >
> > > > When a vCPU is blocked, to wake it up we need to find which list the
> > > > vCPU is blocked on, and this is what wakeup_cpu is used for.
> > >
> > > Right, perhaps prev_vcpu is a better name.
> >
> > Do you mean "prev_pcpu"?
> 
> Yes.
> 


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-27  6:34                 ` Wu, Feng
@ 2015-03-27 19:30                   ` Marcelo Tosatti
  2015-03-30  4:46                     ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Marcelo Tosatti @ 2015-03-27 19:30 UTC (permalink / raw)
  To: Wu, Feng
  Cc: hpa, tglx, mingo, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

On Fri, Mar 27, 2015 at 06:34:14AM +0000, Wu, Feng wrote:
> > > Currently, the following code is executed before local_irq_disable() is called,
> > > so do you mean 1)moving local_irq_disable() to the place before it. 2) after
> > interrupt
> > > is disabled, set KVM_REQ_EVENT in case the ON bit is set?
> > 
> > 2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit
> > is set.
> 
> Here is my understanding about your comments here:
> - Disable interrupts
> - Check 'ON'
> - Set KVM_REQ_EVENT if 'ON' is set
> 
> Then we can put the above code inside " if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) "
> just like it used to be. However, I still have some questions about this comment:
> 
> 1. Where should I set KVM_REQ_EVENT? In function vcpu_enter_guest(), or other places?

See below:

> If in vcpu_enter_guest(), since currently local_irq_disable() is called after 'KVM_REQ_EVENT'
> is checked, is it helpful to set KVM_REQ_EVENT after local_irq_disable() is called?

        local_irq_disable();

	*** add code here ***

        if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
						^^^^^^^^^^^^^^
            || need_resched() || signal_pending(current)) {
                vcpu->mode = OUTSIDE_GUEST_MODE;
                smp_wmb();
                local_irq_enable();
                preempt_enable();
                vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
                r = 1;
                goto cancel_injection;
        }

> 2. 'ON' is set by VT-d hardware, it can be set even when interrupt is disabled (the related bit in PIR is also set).

Yes, we are checking whether the HW has set an interrupt in PIR while
outside the VM (which requires a PIR->VIRR transfer by software).

If the interrupt is set by hardware after local_irq_disable(),
VMX-entry will handle the interrupt, perform the PIR->VIRR
transfer, and reevaluate interrupts, injecting into the guest
if necessary. Is that correct?

> So does it make sense to check 'ON' and set KVM_REQ_EVENT accordingly after interrupt is disabled?

To replace the costly 

+            */
+           if (kvm_x86_ops->hwapic_irr_update)
+                   kvm_x86_ops->hwapic_irr_update(vcpu,
+                           kvm_lapic_find_highest_irr(vcpu));

Yes, I think so.

> I might miss something in your comments, if so please point out. Thanks a lot!
> 
> Thanks,
> Feng
> 
> > 
> > >
> > > "if (kvm_x86_ops->hwapic_irr_update)
> > > 	kvm_x86_ops->hwapic_irr_update(vcpu,
> > > 			kvm_lapic_find_highest_irr(vcpu));
> > >
> > > > kvm_lapic_find_highest_irr(vcpu) eats some cache
> > > > (4 cachelines) versus 1 cacheline for reading ON bit.
> > > >
> > > > > > > > Please remove blocked and wakeup_cpu, they should not be
> > necessary.
> > > > > > >
> > > > > > > Why do you think wakeup_cpu is not needed, when vCPU is blocked,
> > > > > > > wakeup_cpu saves the cpu which the vCPU is blocked on, after vCPU
> > > > > > > is woken up, it can run on a different cpu, so we need wakeup_cpu to
> > > > > > > find the right list to wake up the vCPU.
> > > > > >
> > > > > > If the vCPU was moved it should have updated IRTE destination field
> > > > > > to the pCPU which it has moved to?
> > > > >
> > > > > Every time a vCPU is scheduled to a new pCPU, the IRTE destination filed
> > > > > would be updated accordingly.
> > > > >
> > > > > When vCPU is blocked. To wake up the blocked vCPU, we need to find
> > which
> > > > > list the vCPU is blocked on, and this is what wakeup_cpu used for?
> > > >
> > > > Right, perhaps prev_vcpu is a better name.
> > >
> > > Do you mean "prev_pcpu"?
> > 
> > Yes.
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-27 19:30                   ` Marcelo Tosatti
@ 2015-03-30  4:46                     ` Wu, Feng
  2015-03-30 23:55                       ` Marcelo Tosatti
  0 siblings, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-03-30  4:46 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: hpa, tglx, mingo, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Wu, Feng



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Saturday, March 28, 2015 3:30 AM
> To: Wu, Feng
> Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> On Fri, Mar 27, 2015 at 06:34:14AM +0000, Wu, Feng wrote:
> > > > Currently, the following code is executed before local_irq_disable() is
> called,
> > > > so do you mean 1)moving local_irq_disable() to the place before it. 2) after
> > > interrupt
> > > > is disabled, set KVM_REQ_EVENT in case the ON bit is set?
> > >
> > > 2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit
> > > is set.
> >
> > Here is my understanding about your comments here:
> > - Disable interrupts
> > - Check 'ON'
> > - Set KVM_REQ_EVENT if 'ON' is set
> >
> > Then we can put the above code inside " if
> (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) "
> > just like it used to be. However, I still have some questions about this
> comment:
> >
> > 1. Where should I set KVM_REQ_EVENT? In function vcpu_enter_guest(), or
> other places?
> 
> See below:
> 
> > If in vcpu_enter_guest(), since currently local_irq_disable() is called after
> 'KVM_REQ_EVENT'
> > is checked, is it helpful to set KVM_REQ_EVENT after local_irq_disable() is
> called?
> 
>         local_irq_disable();
> 
> 	*** add code here ***

So we need to add code like the following here, right?

          if ('ON' is set)
              kvm_make_request(KVM_REQ_EVENT, vcpu);
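
To make the pseudocode above concrete, here is a minimal user-space sketch
of the check. Everything in it is an assumption made for illustration: the
sketch_* names, the simplified descriptor layout, and the request bit are
hypothetical stand-ins, not the kernel's definitions, and the real code
would need atomic bit operations because the hardware can set 'ON'
concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for illustration only. */
#define SKETCH_REQ_EVENT (1u << 0)  /* plays the role of KVM_REQ_EVENT */
#define SKETCH_PI_ON     (1u << 0)  /* models the 'ON' bit */

struct sketch_pi_desc {
	uint32_t pir[8];   /* 256 posted-interrupt request bits */
	uint32_t control;  /* bit 0 models 'ON' */
};

struct sketch_vcpu {
	struct sketch_pi_desc pi;
	uint32_t requests; /* pending request bits, like vcpu->requests */
};

/* Read-only test of 'ON'; it reads one word and does not scan pir[]. */
static bool sketch_pi_test_on(const struct sketch_pi_desc *pi)
{
	return (pi->control & SKETCH_PI_ON) != 0;
}

/* The check discussed for the spot right after local_irq_disable(). */
static void sketch_check_on(struct sketch_vcpu *vcpu)
{
	assert(vcpu);
	if (sketch_pi_test_on(&vcpu->pi))
		vcpu->requests |= SKETCH_REQ_EVENT;
}
```

Calling sketch_check_on() on a zeroed sketch_vcpu leaves requests at 0;
after setting SKETCH_PI_ON in pi.control, the same call raises
SKETCH_REQ_EVENT.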

> 
>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> 						^^^^^^^^^^^^^^
>             || need_resched() || signal_pending(current)) {
>                 vcpu->mode = OUTSIDE_GUEST_MODE;
>                 smp_wmb();
>                 local_irq_enable();
>                 preempt_enable();
>                 vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
>                 r = 1;
>                 goto cancel_injection;
>         }
> 
> > 2. 'ON' is set by VT-d hardware, it can be set even when interrupt is disabled
> (the related bit in PIR is also set).
> 
> Yes, we are checking if the HW has set an interrupt in PIR while
> outside VM (which requires PIR->VIRR transfer by software).
> 
> If the interrupt it set by hardware after local_irq_disable(),
> VMX-entry will handle the interrupt and perform the PIR->VIRR
> transfer and reevaluate interrupts, injecting to guest
> if necessary, is that correct ?
> 
> > So does it make sense to check 'ON' and set KVM_REQ_EVENT accordingly
> after interrupt is disabled?
> 
> To replace the costly
> 
> +            */
> +           if (kvm_x86_ops->hwapic_irr_update)
> +                   kvm_x86_ops->hwapic_irr_update(vcpu,
> +                           kvm_lapic_find_highest_irr(vcpu));
> 
> Yes, i think so.

After adding the "check ON and set KVM_REQ_EVENT" operations listed in my
comments above, do you mean we still need to keep the costly code above
inside "if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {}" in
vcpu_enter_guest() as it used to be? If yes, my question is: what is the exact purpose
of the "check ON and set KVM_REQ_EVENT" operations? Here is the code flow in
vcpu_enter_guest():

1. Check KVM_REQ_EVENT; if it is set, sync pir->virr
2. Disable interrupts
3. Check ON and set KVM_REQ_EVENT -- Here we set KVM_REQ_EVENT, but it was
already checked in step 1, which means we get no benefit from setting it here:
the "pir->virr" sync was done in step 1, and between step 3 and VM-Entry
we don't synchronize the pir to virr. So even if we set KVM_REQ_EVENT here, the interrupts
remaining in PIR cannot be delivered to the guest during this VM-Entry, right?

Thanks,
Feng

> 
> > I might miss something in your comments, if so please point out. Thanks a
> lot!
> >
> > Thanks,
> > Feng
> >
> > >
> > > >
> > > > "if (kvm_x86_ops->hwapic_irr_update)
> > > > 	kvm_x86_ops->hwapic_irr_update(vcpu,
> > > > 			kvm_lapic_find_highest_irr(vcpu));
> > > >
> > > > > kvm_lapic_find_highest_irr(vcpu) eats some cache
> > > > > (4 cachelines) versus 1 cacheline for reading ON bit.
> > > > >
> > > > > > > > > Please remove blocked and wakeup_cpu, they should not be
> > > necessary.
> > > > > > > >
> > > > > > > > Why do you think wakeup_cpu is not needed, when vCPU is
> blocked,
> > > > > > > > wakeup_cpu saves the cpu which the vCPU is blocked on, after
> vCPU
> > > > > > > > is woken up, it can run on a different cpu, so we need wakeup_cpu
> to
> > > > > > > > find the right list to wake up the vCPU.
> > > > > > >
> > > > > > > If the vCPU was moved it should have updated IRTE destination field
> > > > > > > to the pCPU which it has moved to?
> > > > > >
> > > > > > Every time a vCPU is scheduled to a new pCPU, the IRTE destination
> filed
> > > > > > would be updated accordingly.
> > > > > >
> > > > > > When vCPU is blocked. To wake up the blocked vCPU, we need to find
> > > which
> > > > > > list the vCPU is blocked on, and this is what wakeup_cpu used for?
> > > > >
> > > > > Right, perhaps prev_vcpu is a better name.
> > > >
> > > > Do you mean "prev_pcpu"?
> > >
> > > Yes.
> > >
> >

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-30  4:46                     ` Wu, Feng
@ 2015-03-30 23:55                       ` Marcelo Tosatti
  2015-03-31  1:13                         ` Wu, Feng
  2015-04-14  7:37                         ` Wu, Feng
  0 siblings, 2 replies; 140+ messages in thread
From: Marcelo Tosatti @ 2015-03-30 23:55 UTC (permalink / raw)
  To: Wu, Feng, f
  Cc: hpa, tglx, mingo, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

On Mon, Mar 30, 2015 at 04:46:55AM +0000, Wu, Feng wrote:
> 
> 
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > Sent: Saturday, March 28, 2015 3:30 AM
> > To: Wu, Feng
> > Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com; x86@kernel.org;
> > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> > is blocked
> > 
> > On Fri, Mar 27, 2015 at 06:34:14AM +0000, Wu, Feng wrote:
> > > > > Currently, the following code is executed before local_irq_disable() is
> > called,
> > > > > so do you mean 1)moving local_irq_disable() to the place before it. 2) after
> > > > interrupt
> > > > > is disabled, set KVM_REQ_EVENT in case the ON bit is set?
> > > >
> > > > 2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit
> > > > is set.
> > >
> > > Here is my understanding about your comments here:
> > > - Disable interrupts
> > > - Check 'ON'
> > > - Set KVM_REQ_EVENT if 'ON' is set
> > >
> > > Then we can put the above code inside " if
> > (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) "
> > > just like it used to be. However, I still have some questions about this
> > comment:
> > >
> > > 1. Where should I set KVM_REQ_EVENT? In function vcpu_enter_guest(), or
> > other places?
> > 
> > See below:
> > 
> > > If in vcpu_enter_guest(), since currently local_irq_disable() is called after
> > 'KVM_REQ_EVENT'
> > > is checked, is it helpful to set KVM_REQ_EVENT after local_irq_disable() is
> > called?
> > 
> >         local_irq_disable();
> > 
> > 	*** add code here ***
> 
> So we need add code like the following here, right?
> 
>           if ('ON' is set)
>               kvm_make_request(KVM_REQ_EVENT, vcpu);

Yes.

> >         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> > 						^^^^^^^^^^^^^^

Point *1.

> >             || need_resched() || signal_pending(current)) {
> >                 vcpu->mode = OUTSIDE_GUEST_MODE;
> >                 smp_wmb();
> >                 local_irq_enable();
> >                 preempt_enable();
> >                 vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> >                 r = 1;
> >                 goto cancel_injection;
> >         }
> > 
> > > 2. 'ON' is set by VT-d hardware, it can be set even when interrupt is disabled
> > (the related bit in PIR is also set).
> > 
> > Yes, we are checking if the HW has set an interrupt in PIR while
> > outside VM (which requires PIR->VIRR transfer by software).
> > 
> > If the interrupt it set by hardware after local_irq_disable(),
> > VMX-entry will handle the interrupt and perform the PIR->VIRR
> > transfer and reevaluate interrupts, injecting to guest
> > if necessary, is that correct ?
> > 
> > > So does it make sense to check 'ON' and set KVM_REQ_EVENT accordingly
> > after interrupt is disabled?
> > 
> > To replace the costly
> > 
> > +            */
> > +           if (kvm_x86_ops->hwapic_irr_update)
> > +                   kvm_x86_ops->hwapic_irr_update(vcpu,
> > +                           kvm_lapic_find_highest_irr(vcpu));
> > 
> > Yes, i think so.
> 
> After adding the "checking ON and setting KVM_REQ_EVENT" operations listed in my
> comments above, do you mean we still need to keep the costly code above
> inside "if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {}" in function
> vcpu_enter_guest() as it used to be? If yes, my question is what is the exact purpose
> of "checking ON and setting KVM_REQ_EVENT" operations? Here is the code flow in
> vcpu_enter_guest():
> 
> 1. Check KVM_REQ_EVENT, if it is set, sync pir->virr
> 2. Disable interrupts
> 3. Check ON and set KVM_REQ_EVENT -- Here, we set KVM_REQ_EVENT, but it is
> checked in the step 1, which means, we cannot get any benefits even we set it here,
> since the "pir->virr" sync operation was done in step 1, between step 3 and VM-Entry,
> we don't synchronize the pir to virr. So even we set KVM_REQ_EVENT here, the interrupts
> remaining in PIR cannot be delivered to guest during this VM-Entry, right?

Please check point *1 above. Since vcpu->requests is then non-zero, the
entry is cancelled and the code will go back to

"if (kvm_check_request(KVM_REQ_EVENT, vcpu))"

and perform the pir->virr sync.
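
The retry behaviour at point *1 can be sketched as a small user-space
model. This is only an illustration under stated assumptions: the sketch_*
names and the boolean fields are hypothetical, the PIR and vIRR are each
reduced to a single pending flag, and interrupt disabling is represented
by a comment rather than modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REQ_EVENT (1u << 0)  /* plays the role of KVM_REQ_EVENT */

/* Hypothetical model of the state discussed in this thread. */
struct sketch_vcpu {
	uint32_t requests;     /* like vcpu->requests */
	bool     pi_on;        /* models the 'ON' bit */
	bool     pir_pending;  /* an interrupt is posted in PIR */
	bool     virr_pending; /* the interrupt is visible in vIRR */
	int      sync_count;   /* how many pir->virr syncs ran */
};

/* One pass of the entry path; false means "cancelled, try again". */
static bool sketch_enter_guest_once(struct sketch_vcpu *v)
{
	assert(v);

	/* Step 1: handle KVM_REQ_EVENT, which includes the pir->virr sync. */
	if (v->requests & SKETCH_REQ_EVENT) {
		v->requests &= ~SKETCH_REQ_EVENT;
		if (v->pir_pending) {
			v->pir_pending = false;
			v->virr_pending = true;
		}
		v->pi_on = false;
		v->sync_count++;
	}

	/* Step 2: local_irq_disable() would happen here. */

	/* Step 3: the proposed check of 'ON'. */
	if (v->pi_on)
		v->requests |= SKETCH_REQ_EVENT;

	/* Point *1: any pending request cancels this entry attempt. */
	if (v->requests)
		return false;

	return true; /* VM-Entry proceeds */
}

/* Retry until an entry attempt succeeds; returns the attempt count. */
static int sketch_vcpu_run(struct sketch_vcpu *v)
{
	int attempts = 1;

	while (!sketch_enter_guest_once(v))
		attempts++;
	return attempts;
}
```

With pi_on and pir_pending set and no request pending, the first attempt
is cancelled at point *1 and the second attempt performs the sync in step
1, so the interrupt reaches the vIRR before the entry that actually
succeeds.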


^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-30 23:55                       ` Marcelo Tosatti
@ 2015-03-31  1:13                         ` Wu, Feng
  2015-04-14  7:37                         ` Wu, Feng
  1 sibling, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2015-03-31  1:13 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: hpa, tglx, mingo, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Wu, Feng



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Tuesday, March 31, 2015 7:56 AM
> To: Wu, Feng
> Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is
> blocked
> 
> On Mon, Mar 30, 2015 at 04:46:55AM +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > Sent: Saturday, March 28, 2015 3:30 AM
> > > To: Wu, Feng
> > > Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com;
> x86@kernel.org;
> > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when
> vCPU
> > > is blocked
> > >
> > > On Fri, Mar 27, 2015 at 06:34:14AM +0000, Wu, Feng wrote:
> > > > > > Currently, the following code is executed before local_irq_disable() is
> > > called,
> > > > > > so do you mean 1)moving local_irq_disable() to the place before it. 2)
> after
> > > > > interrupt
> > > > > > is disabled, set KVM_REQ_EVENT in case the ON bit is set?
> > > > >
> > > > > 2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit
> > > > > is set.
> > > >
> > > > Here is my understanding about your comments here:
> > > > - Disable interrupts
> > > > - Check 'ON'
> > > > - Set KVM_REQ_EVENT if 'ON' is set
> > > >
> > > > Then we can put the above code inside " if
> > > (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) "
> > > > just like it used to be. However, I still have some questions about this
> > > comment:
> > > >
> > > > 1. Where should I set KVM_REQ_EVENT? In function vcpu_enter_guest(),
> or
> > > other places?
> > >
> > > See below:
> > >
> > > > If in vcpu_enter_guest(), since currently local_irq_disable() is called after
> > > 'KVM_REQ_EVENT'
> > > > is checked, is it helpful to set KVM_REQ_EVENT after local_irq_disable() is
> > > called?
> > >
> > >         local_irq_disable();
> > >
> > > 	*** add code here ***
> >
> > So we need add code like the following here, right?
> >
> >           if ('ON' is set)
> >               kvm_make_request(KVM_REQ_EVENT, vcpu);
> 
> Yes.
> 
> > >         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> > > 						^^^^^^^^^^^^^^
> 
> Point *1.
> 
> > >             || need_resched() || signal_pending(current)) {
> > >                 vcpu->mode = OUTSIDE_GUEST_MODE;
> > >                 smp_wmb();
> > >                 local_irq_enable();
> > >                 preempt_enable();
> > >                 vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> > >                 r = 1;
> > >                 goto cancel_injection;
> > >         }
> > >
> > > > 2. 'ON' is set by VT-d hardware, it can be set even when interrupt is
> disabled
> > > (the related bit in PIR is also set).
> > >
> > > Yes, we are checking if the HW has set an interrupt in PIR while
> > > outside VM (which requires PIR->VIRR transfer by software).
> > >
> > > If the interrupt it set by hardware after local_irq_disable(),
> > > VMX-entry will handle the interrupt and perform the PIR->VIRR
> > > transfer and reevaluate interrupts, injecting to guest
> > > if necessary, is that correct ?
> > >
> > > > So does it make sense to check 'ON' and set KVM_REQ_EVENT accordingly
> > > after interrupt is disabled?
> > >
> > > To replace the costly
> > >
> > > +            */
> > > +           if (kvm_x86_ops->hwapic_irr_update)
> > > +                   kvm_x86_ops->hwapic_irr_update(vcpu,
> > > +                           kvm_lapic_find_highest_irr(vcpu));
> > >
> > > Yes, i think so.
> >
> > After adding the "checking ON and setting KVM_REQ_EVENT" operations
> listed in my
> > comments above, do you mean we still need to keep the costly code above
> > inside "if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {}" in
> function
> > vcpu_enter_guest() as it used to be? If yes, my question is what is the exact
> purpose
> > of "checking ON and setting KVM_REQ_EVENT" operations? Here is the code
> flow in
> > vcpu_enter_guest():
> >
> > 1. Check KVM_REQ_EVENT, if it is set, sync pir->virr
> > 2. Disable interrupts
> > 3. Check ON and set KVM_REQ_EVENT -- Here, we set KVM_REQ_EVENT, but
> it is
> > checked in the step 1, which means, we cannot get any benefits even we set it
> here,
> > since the "pir->virr" sync operation was done in step 1, between step 3 and
> VM-Entry,
> > we don't synchronize the pir to virr. So even we set KVM_REQ_EVENT here,
> the interrupts
> > remaining in PIR cannot be delivered to guest during this VM-Entry, right?
> 
> Please check point *1 above. The code will go back to
> 
> "if (kvm_check_request(KVM_REQ_EVENT, vcpu)"
> 
> And perform the pir->virr sync.

Ah, yes, that is the point I was missing. Thanks for pointing this out!

Thanks,
Feng

^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-03-30 23:55                       ` Marcelo Tosatti
  2015-03-31  1:13                         ` Wu, Feng
@ 2015-04-14  7:37                         ` Wu, Feng
  2015-06-05 21:59                           ` Marcelo Tosatti
  1 sibling, 1 reply; 140+ messages in thread
From: Wu, Feng @ 2015-04-14  7:37 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: hpa, tglx, mingo, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Wu, Feng



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Tuesday, March 31, 2015 7:56 AM
> To: Wu, Feng
> Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> On Mon, Mar 30, 2015 at 04:46:55AM +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > Sent: Saturday, March 28, 2015 3:30 AM
> > > To: Wu, Feng
> > > Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com;
> x86@kernel.org;
> > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when
> vCPU
> > > is blocked
> > >
> > > On Fri, Mar 27, 2015 at 06:34:14AM +0000, Wu, Feng wrote:
> > > > > > Currently, the following code is executed before local_irq_disable() is
> > > called,
> > > > > > so do you mean 1)moving local_irq_disable() to the place before it. 2)
> after
> > > > > interrupt
> > > > > > is disabled, set KVM_REQ_EVENT in case the ON bit is set?
> > > > >
> > > > > 2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit
> > > > > is set.
> > > >
> > > > Here is my understanding about your comments here:
> > > > - Disable interrupts
> > > > - Check 'ON'
> > > > - Set KVM_REQ_EVENT if 'ON' is set
> > > >
> > > > Then we can put the above code inside " if
> > > (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) "
> > > > just like it used to be. However, I still have some questions about this
> > > comment:
> > > >
> > > > 1. Where should I set KVM_REQ_EVENT? In function vcpu_enter_guest(),
> or
> > > other places?
> > >
> > > See below:
> > >
> > > > If in vcpu_enter_guest(), since currently local_irq_disable() is called after
> > > 'KVM_REQ_EVENT'
> > > > is checked, is it helpful to set KVM_REQ_EVENT after local_irq_disable() is
> > > called?
> > >
> > >         local_irq_disable();
> > >
> > > 	*** add code here ***
> >
> > So we need add code like the following here, right?
> >
> >           if ('ON' is set)
> >               kvm_make_request(KVM_REQ_EVENT, vcpu);
> 

Hi Marcelo,

I changed the code as above, and then found that the ping latency was extremely high (70ms - 400ms).
I dug into it and found the root cause. We cannot use the 'ON' check as the criterion, since 'ON'
can be cleared by hypervisor software in lots of places. In that case, KVM_REQ_EVENT is not
set when we check the 'ON' bit, hence the interrupts are not injected into the guest in time.

Please refer to the following call chain, in which the 'ON' bit can be cleared:

apic_find_highest_irr () --> vmx_sync_pir_to_irr () --> pi_test_and_clear_on()

Tracing through the code step by step, apic_find_highest_irr() is called from many other places.
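
The failure mode can be illustrated with a small user-space model. This is
one possible reading of the problem, not the actual KVM code: the sketch_*
names are hypothetical, PIR and vIRR are each reduced to a single pending
flag, and the retry at the requests check is folded into a single function
for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REQ_EVENT (1u << 0)  /* plays the role of KVM_REQ_EVENT */

/* Hypothetical model of the state discussed in this thread. */
struct sketch_vcpu {
	uint32_t requests;     /* like vcpu->requests */
	bool     pi_on;        /* models the 'ON' bit */
	bool     pir_pending;  /* an interrupt is posted in PIR */
	bool     virr_pending; /* the interrupt is visible in vIRR */
};

/* Models the sync path: test-and-clear 'ON', then move PIR to vIRR. */
static void sketch_sync_pir_to_irr(struct sketch_vcpu *v)
{
	if (!v->pi_on)
		return;
	v->pi_on = false;
	if (v->pir_pending) {
		v->pir_pending = false;
		v->virr_pending = true;
	}
}

/* Any lookup of the highest IRR runs the sync as a side effect. */
static bool sketch_find_highest_irr(struct sketch_vcpu *v)
{
	assert(v);
	sketch_sync_pir_to_irr(v);
	return v->virr_pending;
}

/*
 * Entry path that relies on 'ON' alone to raise the request.
 * Returns true if the pending interrupt was evaluated on this entry.
 */
static bool sketch_enter_guest(struct sketch_vcpu *v)
{
	if (v->pi_on)
		v->requests |= SKETCH_REQ_EVENT;

	if (v->requests & SKETCH_REQ_EVENT) {
		v->requests &= ~SKETCH_REQ_EVENT;
		return sketch_find_highest_irr(v);
	}
	return false; /* nothing asked for interrupt evaluation */
}
```

If nothing else touches the descriptor, sketch_enter_guest() sees 'ON' and
evaluates the interrupt. If an unrelated caller runs
sketch_find_highest_irr() first, 'ON' is already clear at entry, no
request is raised, and the interrupt sits in the vIRR until some other
event triggers evaluation, which would be consistent with the latency I
observed.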

Thanks,
Feng

> Yes.
> 
> > >         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> > > 						^^^^^^^^^^^^^^
> 
> Point *1.
> 
> > >             || need_resched() || signal_pending(current)) {
> > >                 vcpu->mode = OUTSIDE_GUEST_MODE;
> > >                 smp_wmb();
> > >                 local_irq_enable();
> > >                 preempt_enable();
> > >                 vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> > >                 r = 1;
> > >                 goto cancel_injection;
> > >         }
> > >
> > > > 2. 'ON' is set by VT-d hardware, it can be set even when interrupt is
> disabled
> > > (the related bit in PIR is also set).
> > >
> > > Yes, we are checking if the HW has set an interrupt in PIR while
> > > outside VM (which requires PIR->VIRR transfer by software).
> > >
> > > If the interrupt it set by hardware after local_irq_disable(),
> > > VMX-entry will handle the interrupt and perform the PIR->VIRR
> > > transfer and reevaluate interrupts, injecting to guest
> > > if necessary, is that correct ?
> > >
> > > > So does it make sense to check 'ON' and set KVM_REQ_EVENT accordingly
> > > after interrupt is disabled?
> > >
> > > To replace the costly
> > >
> > > +            */
> > > +           if (kvm_x86_ops->hwapic_irr_update)
> > > +                   kvm_x86_ops->hwapic_irr_update(vcpu,
> > > +                           kvm_lapic_find_highest_irr(vcpu));
> > >
> > > Yes, i think so.
> >
> > After adding the "checking ON and setting KVM_REQ_EVENT" operations
> listed in my
> > comments above, do you mean we still need to keep the costly code above
> > inside "if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {}" in
> function
> > vcpu_enter_guest() as it used to be? If yes, my question is what is the exact
> purpose
> > of "checking ON and setting KVM_REQ_EVENT" operations? Here is the code
> flow in
> > vcpu_enter_guest():
> >
> > 1. Check KVM_REQ_EVENT, if it is set, sync pir->virr
> > 2. Disable interrupts
> > 3. Check ON and set KVM_REQ_EVENT -- Here, we set KVM_REQ_EVENT, but
> it is
> > checked in the step 1, which means, we cannot get any benefits even we set
> it here,
> > since the "pir->virr" sync operation was done in step 1, between step 3 and
> VM-Entry,
> > we don't synchronize the pir to virr. So even we set KVM_REQ_EVENT here,
> the interrupts
> > remaining in PIR cannot be delivered to guest during this VM-Entry, right?
> 
> Please check point *1 above. The code will go back to
> 
> "if (kvm_check_request(KVM_REQ_EVENT, vcpu)"
> 
> And perform the pir->virr sync.


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-04-14  7:37                         ` Wu, Feng
@ 2015-06-05 21:59                           ` Marcelo Tosatti
  2015-06-08  1:43                             ` Wu, Feng
  0 siblings, 1 reply; 140+ messages in thread
From: Marcelo Tosatti @ 2015-06-05 21:59 UTC (permalink / raw)
  To: Wu, Feng
  Cc: hpa, tglx, mingo, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm

On Tue, Apr 14, 2015 at 07:37:44AM +0000, Wu, Feng wrote:
> 
> 
> > -----Original Message-----
> > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > Sent: Tuesday, March 31, 2015 7:56 AM
> > To: Wu, Feng
> > Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com; x86@kernel.org;
> > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> > is blocked
> > 
> > On Mon, Mar 30, 2015 at 04:46:55AM +0000, Wu, Feng wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > > Sent: Saturday, March 28, 2015 3:30 AM
> > > > To: Wu, Feng
> > > > Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com;
> > x86@kernel.org;
> > > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when
> > vCPU
> > > > is blocked
> > > >
> > > > On Fri, Mar 27, 2015 at 06:34:14AM +0000, Wu, Feng wrote:
> > > > > > > Currently, the following code is executed before local_irq_disable() is
> > > > called,
> > > > > > > so do you mean 1)moving local_irq_disable() to the place before it. 2)
> > after
> > > > > > interrupt
> > > > > > > is disabled, set KVM_REQ_EVENT in case the ON bit is set?
> > > > > >
> > > > > > 2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit
> > > > > > is set.
> > > > >
> > > > > Here is my understanding about your comments here:
> > > > > - Disable interrupts
> > > > > - Check 'ON'
> > > > > - Set KVM_REQ_EVENT if 'ON' is set
> > > > >
> > > > > > Then we can put the above code inside
> > > > > > "if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win)"
> > > > > > just like it used to be. However, I still have some questions about
> > > > > > this comment:
> > > > >
> > > > > > 1. Where should I set KVM_REQ_EVENT? In function vcpu_enter_guest(),
> > > > > > or other places?
> > > >
> > > > See below:
> > > >
> > > > > > If in vcpu_enter_guest(), since currently local_irq_disable() is called
> > > > > > after 'KVM_REQ_EVENT' is checked, is it helpful to set KVM_REQ_EVENT
> > > > > > after local_irq_disable() is called?
> > > >
> > > >         local_irq_disable();
> > > >
> > > > 	*** add code here ***
> > >
> > > So we need add code like the following here, right?
> > >
> > >           if ('ON' is set)
> > >               kvm_make_request(KVM_REQ_EVENT, vcpu);
> > 
> 
> Hi Marcelo,
> 
> I changed the code as above, and then found that the ping latency was
> extremely high (70ms - 400ms). I dug into it and found the root cause:
> we cannot use a check of the 'ON' bit as the criterion, since 'ON' can be
> cleared by hypervisor software in lots of places. In that case 'ON' is
> already clear when we check it, so KVM_REQ_EVENT is not set and the
> interrupts are not injected into the guest in time.
>
> Please refer to the following code path, in which the 'ON' bit can be cleared:
>
> apic_find_highest_irr () --> vmx_sync_pir_to_irr () --> pi_test_and_clear_on()
>
> Tracing through the code step by step, apic_find_highest_irr() can be
> called from many other places.
> 
> Thanks,

Ok then, ignore my suggestion.

Can you resend the latest version, please?
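
For readers of the archive, the problem described in the quoted report
can be sketched with a small user-space model. This is only an
illustration under stated assumptions, not kernel code: the model_*
types and functions are hypothetical, the descriptor is reduced to a
64-bit PIR and an ON flag, and req_event merely stands in for
KVM_REQ_EVENT. The one behaviour taken from the thread is that the sync
path (vmx_sync_pir_to_irr() via pi_test_and_clear_on()) clears 'ON'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy posted-interrupt descriptor; NOT the kernel's struct pi_desc. */
struct model_pi_desc {
	uint64_t pir;	/* posted requests, one bit per vector (0..63 here) */
	bool on;	/* outstanding-notification ('ON') bit */
};

/* Toy vCPU state; all fields are hypothetical stand-ins. */
struct model_vcpu {
	struct model_pi_desc pi;
	uint64_t irr;	/* vectors already moved out of PIR */
	bool req_event;	/* stands in for KVM_REQ_EVENT */
};

/* Posting side: record the vector and raise 'ON'. */
static void model_post_interrupt(struct model_vcpu *v, unsigned int vector)
{
	assert(vector < 64);
	v->pi.pir |= UINT64_C(1) << vector;
	v->pi.on = true;
}

/* Models the sync path: test-and-clear 'ON', then move PIR into IRR. */
static void model_sync_pir_to_irr(struct model_vcpu *v)
{
	if (!v->pi.on)
		return;
	v->pi.on = false;
	v->irr |= v->pi.pir;
	v->pi.pir = 0;
}

/* The check that was tried: request an event only if 'ON' is still set. */
static void model_check_on_before_entry(struct model_vcpu *v)
{
	if (v->pi.on)
		v->req_event = true;
}

/*
 * Returns true when the vector is pending in IRR but no event was
 * requested, i.e. an earlier sync consumed 'ON' before the check ran.
 */
static bool model_interrupt_missed(unsigned int vector)
{
	struct model_vcpu v = {0};

	model_post_interrupt(&v, vector);
	model_sync_pir_to_irr(&v);	/* some earlier caller clears 'ON' */
	model_check_on_before_entry(&v);

	return (v.irr & (UINT64_C(1) << vector)) && !v.req_event;
}
```

In this model, model_interrupt_missed() returns true for any valid
vector: the interrupt is pending but the 'ON'-based check never requests
an event. If the sync step is skipped, the same check does set
req_event, which is why the check only fails when another path has
already cleared 'ON'.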




* RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-06-05 21:59                           ` Marcelo Tosatti
@ 2015-06-08  1:43                             ` Wu, Feng
  0 siblings, 0 replies; 140+ messages in thread
From: Wu, Feng @ 2015-06-08  1:43 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: hpa, tglx, mingo, x86, gleb, pbonzini, dwmw2, joro,
	alex.williamson, jiang.liu, eric.auger, linux-kernel, iommu, kvm,
	Wu, Feng



> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> Sent: Saturday, June 06, 2015 5:59 AM
> To: Wu, Feng
> Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com; x86@kernel.org;
> gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> On Tue, Apr 14, 2015 at 07:37:44AM +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > Sent: Tuesday, March 31, 2015 7:56 AM
> > > To: Wu, Feng
> > > Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com;
> x86@kernel.org;
> > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > joro@8bytes.org; alex.williamson@redhat.com; jiang.liu@linux.intel.com;
> > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when
> vCPU
> > > is blocked
> > >
> > > On Mon, Mar 30, 2015 at 04:46:55AM +0000, Wu, Feng wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Marcelo Tosatti [mailto:mtosatti@redhat.com]
> > > > > Sent: Saturday, March 28, 2015 3:30 AM
> > > > > To: Wu, Feng
> > > > > Cc: hpa@zytor.com; tglx@linutronix.de; mingo@redhat.com;
> > > x86@kernel.org;
> > > > > gleb@kernel.org; pbonzini@redhat.com; dwmw2@infradead.org;
> > > > > joro@8bytes.org; alex.williamson@redhat.com;
> jiang.liu@linux.intel.com;
> > > > > eric.auger@linaro.org; linux-kernel@vger.kernel.org;
> > > > > iommu@lists.linux-foundation.org; kvm@vger.kernel.org
> > > > > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when
> > > vCPU
> > > > > is blocked
> > > > >
> > > > > On Fri, Mar 27, 2015 at 06:34:14AM +0000, Wu, Feng wrote:
> > > > > > > > > Currently, the following code is executed before local_irq_disable()
> > > > > > > > > is called, so do you mean 1) moving local_irq_disable() to the place
> > > > > > > > > before it. 2) after interrupt is disabled, set KVM_REQ_EVENT in case
> > > > > > > > > the ON bit is set?
> > > > > > >
> > > > > > > > 2) after interrupt is disabled, set KVM_REQ_EVENT in case the ON bit
> > > > > > > > is set.
> > > > > >
> > > > > > Here is my understanding about your comments here:
> > > > > > - Disable interrupts
> > > > > > - Check 'ON'
> > > > > > - Set KVM_REQ_EVENT if 'ON' is set
> > > > > >
> > > > > > > Then we can put the above code inside
> > > > > > > "if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win)"
> > > > > > > just like it used to be. However, I still have some questions about
> > > > > > > this comment:
> > > > > >
> > > > > > > 1. Where should I set KVM_REQ_EVENT? In function vcpu_enter_guest(),
> > > > > > > or other places?
> > > > >
> > > > > See below:
> > > > >
> > > > > > > If in vcpu_enter_guest(), since currently local_irq_disable() is called
> > > > > > > after 'KVM_REQ_EVENT' is checked, is it helpful to set KVM_REQ_EVENT
> > > > > > > after local_irq_disable() is called?
> > > > >
> > > > >         local_irq_disable();
> > > > >
> > > > > 	*** add code here ***
> > > >
> > > > So we need add code like the following here, right?
> > > >
> > > >           if ('ON' is set)
> > > >               kvm_make_request(KVM_REQ_EVENT, vcpu);
> > >
> >
> > Hi Marcelo,
> >
> > I changed the code as above, and then found that the ping latency was
> > extremely high (70ms - 400ms). I dug into it and found the root cause:
> > we cannot use a check of the 'ON' bit as the criterion, since 'ON' can be
> > cleared by hypervisor software in lots of places. In that case 'ON' is
> > already clear when we check it, so KVM_REQ_EVENT is not set and the
> > interrupts are not injected into the guest in time.
> >
> > Please refer to the following code path, in which the 'ON' bit can be cleared:
> >
> > apic_find_highest_irr () --> vmx_sync_pir_to_irr () --> pi_test_and_clear_on()
> >
> > Tracing through the code step by step, apic_find_highest_irr() can be
> > called from many other places.
> >
> > Thanks,
> 
> Ok then, ignore my suggestion.
> 
> Can you resend the latest version, please?

Thanks for your review, I will send the new version soon.

Thanks,
Feng




end of thread, other threads:[~2015-06-08  1:43 UTC | newest]

Thread overview: 140+ messages
2014-12-12 15:14 [v3 00/26] Add VT-d Posted-Interrupts support Feng Wu
2014-12-12 15:14 ` [v3 01/26] genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU Feng Wu
2014-12-12 15:14 ` [v3 02/26] iommu: Add new member capability to struct irq_remap_ops Feng Wu
2015-01-28 15:22   ` David Woodhouse
2015-01-29  8:34     ` Wu, Feng
2014-12-12 15:14 ` [v3 03/26] iommu, x86: Define new irte structure for VT-d Posted-Interrupts Feng Wu
2015-01-28 15:26   ` David Woodhouse
2014-12-12 15:14 ` [v3 04/26] iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip Feng Wu
2015-01-28 15:26   ` David Woodhouse
2015-01-29  7:55     ` Wu, Feng
2014-12-12 15:14 ` [v3 05/26] x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller Feng Wu
2014-12-12 15:14 ` [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts Feng Wu
2014-12-18 14:26   ` Zhang, Yang Z
2014-12-19  1:40     ` Wu, Feng
2014-12-19  1:46       ` Zhang, Yang Z
2014-12-19 11:59         ` Paolo Bonzini
2014-12-23  0:37           ` Zhang, Yang Z
2014-12-23  8:47             ` Paolo Bonzini
2014-12-23  9:07               ` Wu, Feng
2014-12-23  9:34                 ` Paolo Bonzini
2014-12-24  1:38                   ` Zhang, Yang Z
2014-12-24  2:12                     ` Jiang Liu
2014-12-24  2:32                       ` Zhang, Yang Z
2014-12-24  3:08                         ` Wu, Feng
2014-12-24  4:04                           ` Zhang, Yang Z
2014-12-24  4:54                         ` Jiang Liu
2015-01-28 15:29   ` David Woodhouse
2014-12-12 15:14 ` [v3 07/26] iommu, x86: Add cap_pi_support() to detect VT-d PI capability Feng Wu
2015-01-28 15:32   ` David Woodhouse
2014-12-12 15:14 ` [v3 08/26] iommu, x86: Add intel_irq_remapping_capability() for Intel Feng Wu
2015-01-28 15:37   ` David Woodhouse
2015-01-29  8:57     ` Wu, Feng
2014-12-12 15:14 ` [v3 09/26] iommu, x86: define irq_remapping_cap() Feng Wu
2014-12-12 15:14 ` [v3 10/26] KVM: change struct pi_desc for VT-d Posted-Interrupts Feng Wu
2014-12-12 15:14 ` [v3 11/26] KVM: Add some helper functions for Posted-Interrupts Feng Wu
2014-12-12 15:14 ` [v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor Feng Wu
2014-12-18 15:19   ` Zhang, Yang Z
2014-12-12 15:14 ` [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI Feng Wu
2014-12-18 14:49   ` Zhang, Yang Z
2014-12-18 16:58     ` Paolo Bonzini
2014-12-19  1:13       ` Zhang, Yang Z
2014-12-19  1:30         ` Wu, Feng
2014-12-19  1:30       ` Wu, Feng
2014-12-19  1:47         ` Zhang, Yang Z
2014-12-19 11:59         ` Paolo Bonzini
2014-12-19 23:48           ` Wu, Feng
2014-12-20 13:16             ` Paolo Bonzini
2014-12-22  4:48               ` Wu, Feng
2014-12-22  9:27                 ` Paolo Bonzini
2014-12-22 11:04                   ` Wu, Feng
2014-12-22 11:06                     ` Paolo Bonzini
2014-12-22 11:17                       ` Wu, Feng
2014-12-22 11:23                         ` Paolo Bonzini
2014-12-22 14:13                           ` Yong Wang
2015-01-09 14:54   ` Radim Krčmář
2015-01-09 14:56     ` Paolo Bonzini
2015-01-09 15:12       ` Radim Krčmář
2015-01-09 15:18         ` Paolo Bonzini
2015-01-09 15:47           ` Radim Krčmář
2015-01-13  0:27       ` Wu, Feng
2015-01-13 16:17         ` Radim Krčmář
2015-01-14  1:27           ` Wu, Feng
2015-01-14 13:02             ` Paolo Bonzini
2015-01-14 16:59             ` Radim Krčmář
2015-01-20 21:04               ` Nadav Amit
2015-01-21 21:16                 ` Radim Krčmář
2014-12-12 15:14 ` [v3 14/26] KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu Feng Wu
2014-12-12 15:14 ` [v3 15/26] KVM: add interfaces to control PI outside vmx Feng Wu
2014-12-12 15:14 ` [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
2014-12-17 16:17   ` Paolo Bonzini
2014-12-19  2:19     ` Wu, Feng
2014-12-19 11:59       ` Paolo Bonzini
2014-12-19 23:39         ` Wu, Feng
2014-12-12 15:14 ` [v3 17/26] KVM: make kvm_set_msi_irq() public Feng Wu
2014-12-17 17:32   ` Paolo Bonzini
2014-12-12 15:14 ` [v3 18/26] KVM: kvm-vfio: User API for VT-d Posted-Interrupts Feng Wu
2014-12-12 15:14 ` [v3 19/26] KVM: kvm-vfio: implement the VFIO skeleton " Feng Wu
2014-12-12 15:14 ` [v3 20/26] KVM: x86: kvm-vfio: VT-d posted-interrupts setup Feng Wu
2014-12-12 15:14 ` [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts Feng Wu
2014-12-18 14:54   ` Zhang, Yang Z
2014-12-19  0:52     ` Wu, Feng
2015-01-30 18:18   ` H. Peter Anvin
2015-02-02  1:06     ` Wu, Feng
2015-02-23 22:04   ` Marcelo Tosatti
2014-12-12 15:14 ` [v3 22/26] KVM: Define a wakeup worker thread for vCPU Feng Wu
2014-12-12 15:14 ` [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
2014-12-17 17:11   ` Paolo Bonzini
2014-12-18  3:15     ` Wu, Feng
2014-12-18  8:32       ` Paolo Bonzini
2014-12-19  2:09         ` Wu, Feng
2015-02-23 22:21   ` Marcelo Tosatti
2015-03-02  9:12     ` Wu, Feng
2014-12-12 15:14 ` [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
2014-12-17 17:09   ` Paolo Bonzini
2014-12-18  3:16     ` Wu, Feng
2014-12-18  8:37       ` Paolo Bonzini
2014-12-19  2:51         ` Wu, Feng
2015-02-25 21:50   ` Marcelo Tosatti
2015-02-26  8:08     ` Wu, Feng
2015-02-26 23:41       ` Marcelo Tosatti
2015-02-26 23:40   ` Marcelo Tosatti
2015-03-02 13:36     ` Wu, Feng
2015-03-04 12:06       ` Marcelo Tosatti
2015-03-06  6:51         ` Wu, Feng
2015-03-12  1:15           ` Marcelo Tosatti
2015-03-16 11:42             ` Wu, Feng
2015-03-25 23:17               ` Marcelo Tosatti
2015-03-27  6:34                 ` Wu, Feng
2015-03-27 19:30                   ` Marcelo Tosatti
2015-03-30  4:46                     ` Wu, Feng
2015-03-30 23:55                       ` Marcelo Tosatti
2015-03-31  1:13                         ` Wu, Feng
2015-04-14  7:37                         ` Wu, Feng
2015-06-05 21:59                           ` Marcelo Tosatti
2015-06-08  1:43                             ` Wu, Feng
2014-12-12 15:14 ` [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set Feng Wu
2014-12-17 17:42   ` Paolo Bonzini
2014-12-18  3:14     ` Wu, Feng
2014-12-18  8:38       ` Paolo Bonzini
2014-12-18 15:09         ` Zhang, Yang Z
2014-12-19  2:58           ` Wu, Feng
2014-12-19  3:32             ` Zhang, Yang Z
2014-12-19  4:34               ` Wu, Feng
2014-12-19  4:44                 ` Zhang, Yang Z
2014-12-19  4:49                   ` Wu, Feng
2014-12-19  5:25                     ` Zhang, Yang Z
2014-12-19  5:46                       ` Wu, Feng
2014-12-19  7:04                         ` Zhang, Yang Z
2014-12-19 12:00                       ` Paolo Bonzini
2014-12-19 23:34                         ` Wu, Feng
2014-12-12 15:15 ` [v3 26/26] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
2015-01-28 15:39   ` David Woodhouse
2014-12-16  9:04 ` [v3 00/26] Add VT-d Posted-Interrupts support Wu, Feng
2015-01-06  1:10 ` Wu, Feng
2015-01-09 12:46   ` joro
2015-01-09 13:58     ` Wu, Feng
2015-01-21  2:25 ` Wu, Feng
2015-01-28  3:01   ` Wu, Feng
2015-01-28  3:44     ` Alex Williamson
2015-01-28  4:44       ` Wu, Feng
