linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 00/13] Add VT-d Posted-Interrupts support
@ 2015-09-16  8:49 Feng Wu
  2015-09-16  8:49 ` [PATCH v8 01/13] KVM: Extend struct pi_desc for VT-d Posted-Interrupts Feng Wu
                   ` (13 more replies)
  0 siblings, 14 replies; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:49 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

You can find the VT-d Posted-Interrtups Spec. in the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

v8:
refer to the changelog in each patch

v7:
* Define two weak irq bypass callbacks:
  - kvm_arch_irq_bypass_start()
  - kvm_arch_irq_bypass_stop()
* Remove the x86 dummy implementation of the above two functions.
* Print some useful information instead of WARN_ON() when the
  irq bypass consumer unregistration fails.
* Fix an issue when calling pi_pre_block and pi_post_block.

v6:
* Rebase on 4.2.0-rc6
* Rebase on https://lkml.org/lkml/2015/8/6/526 and http://www.gossamer-threads.com/lists/linux/kernel/2235623
* Make the add_consumer and del_consumer callbacks static
* Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
* Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
* Remove optional dummy callbacks for irq producer

v4:
* For lowest-priority interrupt, only support single-CPU destination
interrupts at the current stage, more common lowest priority support
will be added later.
* Accoring to Marcelo's suggestion, when vCPU is blocked, we handle
the posted-interrupts in the HLT emulation path.
* Some small changes (coding style, typo, add some code comments)

v3:
* Adjust the Posted-interrupts Descriptor updating logic when vCPU is
  preempted or blocked.
* KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
* __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
* Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
  can be used to change back to remapping mode.
* Fix typo

v2:
* Use VFIO framework to enable this feature, the VFIO part of this series is
  base on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
* Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
  then revise some irq logic based on the new hierarchy irqdomain patches provided
  by Jiang Liu <jiang.liu@linux.intel.com>

Feng Wu (13):
  KVM: Extend struct pi_desc for VT-d Posted-Interrupts
  KVM: Add some helper functions for Posted-Interrupts
  KVM: Define a new interface kvm_intr_is_single_vcpu()
  KVM: Make struct kvm_irq_routing_table accessible
  KVM: make kvm_set_msi_irq() public
  vfio: Register/unregister irq_bypass_producer
  KVM: x86: Update IRTE for posted-interrupts
  KVM: Implement IRQ bypass consumer callbacks for x86
  KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
  KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  KVM: Warn if 'SN' is set during posting interrupts by software
  iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

 Documentation/kernel-parameters.txt |   1 +
 arch/x86/include/asm/kvm_host.h     |  24 +++
 arch/x86/kvm/Kconfig                |   1 +
 arch/x86/kvm/irq_comm.c             |  99 +++++++++-
 arch/x86/kvm/lapic.c                |   5 +-
 arch/x86/kvm/lapic.h                |   2 +
 arch/x86/kvm/trace.h                |  33 ++++
 arch/x86/kvm/vmx.c                  | 361 +++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                  | 106 ++++++++++-
 drivers/iommu/irq_remapping.c       |  12 +-
 drivers/vfio/pci/Kconfig            |   1 +
 drivers/vfio/pci/vfio_pci_intrs.c   |   9 +
 drivers/vfio/pci/vfio_pci_private.h |   2 +
 include/linux/kvm_host.h            |  19 ++
 virt/kvm/eventfd.c                  |  31 +++-
 virt/kvm/irqchip.c                  |  10 -
 virt/kvm/kvm_main.c                 |   3 +
 17 files changed, 687 insertions(+), 32 deletions(-)

-- 
2.1.0


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v8 01/13] KVM: Extend struct pi_desc for VT-d Posted-Interrupts
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
@ 2015-09-16  8:49 ` Feng Wu
  2015-09-16  8:49 ` [PATCH v8 02/13] KVM: Add some helper functions for Posted-Interrupts Feng Wu
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:49 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Extend struct pi_desc for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 arch/x86/kvm/vmx.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..271dd70 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -446,8 +446,24 @@ struct nested_vmx {
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
-	u32 control;	/* bit 0 of control is outstanding notification bit */
-	u32 rsvd[7];
+	union {
+		struct {
+				/* bit 256 - Outstanding Notification */
+			u16	on	: 1,
+				/* bit 257 - Suppress Notification */
+				sn	: 1,
+				/* bit 271:258 - Reserved */
+				rsvd_1	: 14;
+				/* bit 279:272 - Notification Vector */
+			u8	nv;
+				/* bit 287:280 - Reserved */
+			u8	rsvd_2;
+				/* bit 319:288 - Notification Destination */
+			u32	ndst;
+		};
+		u64 control;
+	};
+	u32 rsvd[6];
 } __aligned(64);
 
 static bool pi_test_and_set_on(struct pi_desc *pi_desc)
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 02/13] KVM: Add some helper functions for Posted-Interrupts
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
  2015-09-16  8:49 ` [PATCH v8 01/13] KVM: Extend struct pi_desc for VT-d Posted-Interrupts Feng Wu
@ 2015-09-16  8:49 ` Feng Wu
  2015-09-16  8:49 ` [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu() Feng Wu
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:49 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 271dd70..316f9bf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -443,6 +443,8 @@ struct nested_vmx {
 };
 
 #define POSTED_INTR_ON  0
+#define POSTED_INTR_SN  1
+
 /* Posted-Interrupt Descriptor */
 struct pi_desc {
 	u32 pir[8];     /* Posted interrupt requested */
@@ -483,6 +485,30 @@ static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
 	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static void pi_clear_sn(struct pi_desc *pi_desc)
+{
+	return clear_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
+static void pi_set_sn(struct pi_desc *pi_desc)
+{
+	return set_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_on(struct pi_desc *pi_desc)
+{
+	return test_bit(POSTED_INTR_ON,
+			(unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_sn(struct pi_desc *pi_desc)
+{
+	return test_bit(POSTED_INTR_SN,
+			(unsigned long *)&pi_desc->control);
+}
+
 struct vcpu_vmx {
 	struct kvm_vcpu       vcpu;
 	unsigned long         host_rsp;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
  2015-09-16  8:49 ` [PATCH v8 01/13] KVM: Extend struct pi_desc for VT-d Posted-Interrupts Feng Wu
  2015-09-16  8:49 ` [PATCH v8 02/13] KVM: Add some helper functions for Posted-Interrupts Feng Wu
@ 2015-09-16  8:49 ` Feng Wu
  2015-09-16  9:23   ` Paolo Bonzini
  2015-09-16  8:50 ` [PATCH v8 04/13] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:49 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch defines a new interface kvm_intr_is_single_vcpu(),
which can returns whether the interrupt is for single-CPU or not.

It is used by VT-d PI, since now we only support single-CPU
interrupts, For lowest-priority interrupts, if user configures
it via /proc/irq or uses irqbalance to make it single-CPU, we
can use PI to deliver the interrupts to it. Full functionality
of lowest-priority support will be added later.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Some optimizations in kvm_intr_is_single_vcpu().
- Expose kvm_intr_is_single_vcpu() so we can use it in vmx code.
- Add kvm_intr_is_single_vcpu_fast() as the fast path to find
  the target vCPU for the single-destination interrupt

 arch/x86/include/asm/kvm_host.h |  3 ++
 arch/x86/kvm/irq_comm.c         | 94 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/lapic.c            |  5 +--
 arch/x86/kvm/lapic.h            |  2 +
 4 files changed, 101 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 49ec903..af11bca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1204,4 +1204,7 @@ int __x86_set_memory_region(struct kvm *kvm,
 int x86_set_memory_region(struct kvm *kvm,
 			  const struct kvm_userspace_memory_region *mem);
 
+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			     struct kvm_vcpu **dest_vcpu);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 9efff9e..97ba1d6 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -297,6 +297,100 @@ out:
 	return r;
 }
 
+static bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,
+					 struct kvm_lapic_irq *irq,
+					 struct kvm_vcpu **dest_vcpu)
+{
+	struct kvm_apic_map *map;
+	bool ret = false;
+	struct kvm_lapic *dst = NULL;
+
+	if (irq->shorthand)
+		return false;
+
+	rcu_read_lock();
+	map = rcu_dereference(kvm->arch.apic_map);
+
+	if (!map)
+		goto out;
+
+	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
+		if (irq->dest_id == 0xFF)
+			goto out;
+
+		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
+			WARN_ON_ONCE(1);
+			goto out;
+		}
+
+		dst = map->phys_map[irq->dest_id];
+		if (dst && kvm_apic_present(dst->vcpu))
+			*dest_vcpu = dst->vcpu;
+		else
+			goto out;
+	} else {
+		u16 cid;
+		unsigned long bitmap = 1;
+		int i, r = 0;
+
+		if (!kvm_apic_logical_map_valid(map)) {
+			WARN_ON_ONCE(1);
+			goto out;
+		}
+
+		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
+
+		if (cid >= ARRAY_SIZE(map->logical_map)) {
+			WARN_ON_ONCE(1);
+			goto out;
+		}
+
+		for_each_set_bit(i, &bitmap, 16) {
+			dst = map->logical_map[cid][i];
+			if (++r == 2)
+				goto out;
+		}
+
+		if (dst && kvm_apic_present(dst->vcpu))
+			*dest_vcpu = dst->vcpu;
+		else
+			goto out;
+	}
+
+	ret = true;
+out:
+	rcu_read_unlock();
+	return ret;
+}
+
+
+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+			     struct kvm_vcpu **dest_vcpu)
+{
+	int i, r = 0;
+	struct kvm_vcpu *vcpu;
+
+	if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu))
+		return true;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+					irq->dest_id, irq->dest_mode))
+			continue;
+
+		if (++r == 2)
+			return false;
+
+		*dest_vcpu = vcpu;
+	}
+
+	return r == 1;
+}
+EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu);
+
 #define IOAPIC_ROUTING_ENTRY(irq) \
 	{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,	\
 	  .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2a5ca97..9848cd50 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -136,13 +136,12 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
 /* The logical map is definitely wrong if we have multiple
  * modes at the same time.  (Physical map is always right.)
  */
-static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
+bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
 {
 	return !(map->mode & (map->mode - 1));
 }
 
-static inline void
-apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
+void apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
 {
 	unsigned lid_bits;
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 7195274..6798b87 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -169,4 +169,6 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
 void wait_lapic_expire(struct kvm_vcpu *vcpu);
 
+void apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid);
+bool kvm_apic_logical_map_valid(struct kvm_apic_map *map);
 #endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 04/13] KVM: Make struct kvm_irq_routing_table accessible
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (2 preceding siblings ...)
  2015-09-16  8:49 ` [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu() Feng Wu
@ 2015-09-16  8:50 ` Feng Wu
  2015-09-16  8:50 ` [PATCH v8 05/13] KVM: make kvm_set_msi_irq() public Feng Wu
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:50 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
so we can use it outside of irqchip.c.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h | 14 ++++++++++++++
 virt/kvm/irqchip.c       | 10 ----------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5ac8d21..5f183fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -328,6 +328,20 @@ struct kvm_kernel_irq_routing_entry {
 	struct hlist_node link;
 };
 
+#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
+
+struct kvm_irq_routing_table {
+	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
+	u32 nr_rt_entries;
+	/*
+	 * Array indexed by gsi. Each entry contains list of irq chips
+	 * the gsi is connected to.
+	 */
+	struct hlist_head map[0];
+};
+
+#endif
+
 #ifndef KVM_PRIVATE_MEM_SLOTS
 #define KVM_PRIVATE_MEM_SLOTS 0
 #endif
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 21c1424..2cf45d3 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -31,16 +31,6 @@
 #include <trace/events/kvm.h>
 #include "irq.h"
 
-struct kvm_irq_routing_table {
-	int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
-	u32 nr_rt_entries;
-	/*
-	 * Array indexed by gsi. Each entry contains list of irq chips
-	 * the gsi is connected to.
-	 */
-	struct hlist_head map[0];
-};
-
 int kvm_irq_map_gsi(struct kvm *kvm,
 		    struct kvm_kernel_irq_routing_entry *entries, int gsi)
 {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 05/13] KVM: make kvm_set_msi_irq() public
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (3 preceding siblings ...)
  2015-09-16  8:50 ` [PATCH v8 04/13] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
@ 2015-09-16  8:50 ` Feng Wu
  2015-09-16  8:50 ` [PATCH v8 06/13] vfio: Register/unregister irq_bypass_producer Feng Wu
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:50 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Make kvm_set_msi_irq() public, we can use this function outside.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
v8:
- Export kvm_set_msi_irq() so we can use it in vmx code

 arch/x86/include/asm/kvm_host.h | 4 ++++
 arch/x86/kvm/irq_comm.c         | 5 +++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af11bca..daa6126 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -175,6 +175,8 @@ enum {
  */
 #define KVM_APIC_PV_EOI_PENDING	1
 
+struct kvm_kernel_irq_routing_entry;
+
 /*
  * We don't want allocation failures within the mmu code, so we preallocate
  * enough memory for a single page fault in a cache.
@@ -1207,4 +1209,6 @@ int x86_set_memory_region(struct kvm *kvm,
 bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			     struct kvm_vcpu **dest_vcpu);
 
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+		     struct kvm_lapic_irq *irq);
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 97ba1d6..add52d8 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -91,8 +91,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	return r;
 }
 
-static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
-				   struct kvm_lapic_irq *irq)
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+		     struct kvm_lapic_irq *irq)
 {
 	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
 
@@ -108,6 +108,7 @@ static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
 	irq->level = 1;
 	irq->shorthand = 0;
 }
+EXPORT_SYMBOL_GPL(kvm_set_msi_irq);
 
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 		struct kvm *kvm, int irq_source_id, int level, bool line_status)
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 06/13] vfio: Register/unregister irq_bypass_producer
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (4 preceding siblings ...)
  2015-09-16  8:50 ` [PATCH v8 05/13] KVM: make kvm_set_msi_irq() public Feng Wu
@ 2015-09-16  8:50 ` Feng Wu
  2015-09-16  8:50 ` [PATCH v8 07/13] KVM: x86: Update IRTE for posted-interrupts Feng Wu
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:50 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch adds the registration/unregistration of an
irq_bypass_producer for MSI/MSIx on vfio pci devices.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Merge "[PATCH v7 08/17] vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices"
  into this patch.

v6:
- Make the add_consumer and del_consumer callbacks static
- Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
- Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
- Remove optional dummy callbacks for irq producer

 drivers/vfio/pci/Kconfig            | 1 +
 drivers/vfio/pci/vfio_pci_intrs.c   | 9 +++++++++
 drivers/vfio/pci/vfio_pci_private.h | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 579d83b..02912f1 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -2,6 +2,7 @@ config VFIO_PCI
 	tristate "VFIO support for PCI devices"
 	depends on VFIO && PCI && EVENTFD
 	select VFIO_VIRQFD
+	select IRQ_BYPASS_MANAGER
 	help
 	  Support for the PCI VFIO bus driver.  This is required to make
 	  use of PCI drivers using the VFIO framework.
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..c65299d 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -319,6 +319,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 
 	if (vdev->ctx[vector].trigger) {
 		free_irq(irq, vdev->ctx[vector].trigger);
+		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
 		kfree(vdev->ctx[vector].name);
 		eventfd_ctx_put(vdev->ctx[vector].trigger);
 		vdev->ctx[vector].trigger = NULL;
@@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		return ret;
 	}
 
+	vdev->ctx[vector].producer.token = trigger;
+	vdev->ctx[vector].producer.irq = irq;
+	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
+	if (unlikely(ret))
+		dev_info(&pdev->dev,
+		"irq bypass producer (token %p) registeration fails: %d\n",
+		vdev->ctx[vector].producer.token, ret);
+
 	vdev->ctx[vector].trigger = trigger;
 
 	return 0;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index ae0e1b4..0e7394f 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -13,6 +13,7 @@
 
 #include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/irqbypass.h>
 
 #ifndef VFIO_PCI_PRIVATE_H
 #define VFIO_PCI_PRIVATE_H
@@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
 	struct virqfd		*mask;
 	char			*name;
 	bool			masked;
+	struct irq_bypass_producer	producer;
 };
 
 struct vfio_pci_device {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 07/13] KVM: x86: Update IRTE for posted-interrupts
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (5 preceding siblings ...)
  2015-09-16  8:50 ` [PATCH v8 06/13] vfio: Register/unregister irq_bypass_producer Feng Wu
@ 2015-09-16  8:50 ` Feng Wu
  2015-09-16  8:50 ` [PATCH v8 08/13] KVM: Implement IRQ bypass consumer callbacks for x86 Feng Wu
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:50 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch adds the routine to update IRTE for posted-interrupts
when guest changes the interrupt configuration.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Move 'kvm_arch_update_pi_irte' to vmx.c as a callback
- Only update the PI irte when VM has assigned devices
- Add a trace point for VT-d posted-interrupts when we update
  or disable it for a specific irq.

 arch/x86/include/asm/kvm_host.h |  3 ++
 arch/x86/kvm/trace.h            | 33 ++++++++++++++++
 arch/x86/kvm/vmx.c              | 83 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              |  2 +
 4 files changed, 121 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index daa6126..8c44286 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -862,6 +862,9 @@ struct kvm_x86_ops {
 					   gfn_t offset, unsigned long mask);
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
+
+	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
+			      uint32_t guest_irq, bool set);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4eae7c3..539a9e4 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -974,6 +974,39 @@ TRACE_EVENT(kvm_enter_smm,
 		  __entry->smbase)
 );
 
+/*
+ * Tracepoint for VT-d posted-interrupts.
+ */
+TRACE_EVENT(kvm_pi_irte_update,
+	TP_PROTO(unsigned int vcpu_id, unsigned int gsi,
+		 unsigned int gvec, u64 pi_desc_addr, bool set),
+	TP_ARGS(vcpu_id, gsi, gvec, pi_desc_addr, set),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	vcpu_id		)
+		__field(	unsigned int,	gsi		)
+		__field(	unsigned int,	gvec		)
+		__field(	u64,		pi_desc_addr	)
+		__field(	bool,		set		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id	= vcpu_id;
+		__entry->gsi		= gsi;
+		__entry->gvec		= gvec;
+		__entry->pi_desc_addr	= pi_desc_addr;
+		__entry->set		= set;
+	),
+
+	TP_printk("VT-d PI is %s for this irq, vcpu %u, gsi: 0x%x, "
+		  "gvec: 0x%x, pi_desc_addr: 0x%llx",
+		  __entry->set ? "enabled and being updated" : "disabled",
+		  __entry->vcpu_id,
+		  __entry->gsi,
+		  __entry->gvec,
+		  __entry->pi_desc_addr)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 316f9bf..5a25651 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -45,6 +45,7 @@
 #include <asm/debugreg.h>
 #include <asm/kexec.h>
 #include <asm/apic.h>
+#include <asm/irq_remapping.h>
 
 #include "trace.h"
 #include "pmu.h"
@@ -605,6 +606,11 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 	return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
 
+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+{
+	return &(to_vmx(vcpu)->pi_desc);
+}
+
 #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
 #define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
 #define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
@@ -10344,6 +10350,81 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
 }
 
+/*
+ * vmx_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+		       uint32_t guest_irq, bool set)
+{
+	struct kvm_kernel_irq_routing_entry *e;
+	struct kvm_irq_routing_table *irq_rt;
+	struct kvm_lapic_irq irq;
+	struct kvm_vcpu *vcpu;
+	struct vcpu_data vcpu_info;
+	int idx, ret = -EINVAL;
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP) ||
+		(!kvm_arch_has_assigned_device(kvm)))
+		return 0;
+
+	idx = srcu_read_lock(&kvm->irq_srcu);
+	irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+	BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
+
+	hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
+		if (e->type != KVM_IRQ_ROUTING_MSI)
+			continue;
+		/*
+		 * VT-d PI cannot support posting multicast/broadcast
+		 * interrupts to a vCPU, we still use interrupt remapping
+		 * for these kind of interrupts.
+		 *
+		 * For lowest-priority interrupts, we only support
+		 * those with single CPU as the destination, e.g. user
+		 * configures the interrupts via /proc/irq or uses
+		 * irqbalance to make the interrupts single-CPU.
+		 *
+		 * We will support full lowest-priority interrupt later.
+		 */
+
+		kvm_set_msi_irq(e, &irq);
+		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
+			continue;
+
+		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
+		vcpu_info.vector = irq.vector;
+
+		trace_kvm_pi_irte_update(vcpu->vcpu_id, e->gsi,
+				vcpu_info.vector, vcpu_info.pi_desc_addr, set);
+
+		if (set)
+			ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
+		else {
+			/* suppress notification event before unposting */
+			pi_set_sn(vcpu_to_pi_desc(vcpu));
+			ret = irq_set_vcpu_affinity(host_irq, NULL);
+			pi_clear_sn(vcpu_to_pi_desc(vcpu));
+		}
+
+		if (ret < 0) {
+			printk(KERN_INFO "%s: failed to update PI IRTE\n",
+					__func__);
+			goto out;
+		}
+	}
+
+	ret = 0;
+out:
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+	return ret;
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -10461,6 +10542,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
 
 	.pmu_ops = &intel_pmu_ops,
+
+	.update_pi_irte = vmx_update_pi_irte,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5ef2560..9dcd501 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -63,6 +63,7 @@
 #include <asm/fpu/internal.h> /* Ugh! */
 #include <asm/pvclock.h>
 #include <asm/div64.h>
+#include <asm/irq_remapping.h>
 
 #define MAX_IO_MSRS 256
 #define KVM_MAX_MCE_BANKS 32
@@ -8263,3 +8264,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 08/13] KVM: Implement IRQ bypass consumer callbacks for x86
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (6 preceding siblings ...)
  2015-09-16  8:50 ` [PATCH v8 07/13] KVM: x86: Update IRTE for posted-interrupts Feng Wu
@ 2015-09-16  8:50 ` Feng Wu
  2015-09-16  8:50 ` [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd' Feng Wu
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:50 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Implement the following callbacks for x86:

- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop: dummy callback
- kvm_arch_irq_bypass_resume: dummy callback

and set CONFIG_HAVE_KVM_IRQ_BYPASS for x86.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Move the weak irq bypas stop and irq bypass start to this patch.
- Call kvm_x86_ops->update_pi_irte() instead of kvm_arch_update_pi_irte().

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/x86.c              | 44 +++++++++++++++++++++++++++++++++++++++++
 virt/kvm/eventfd.c              | 12 +++++++++++
 4 files changed, 58 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8c44286..0ddd353 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -24,6 +24,7 @@
 #include <linux/perf_event.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/clocksource.h>
+#include <linux/irqbypass.h>
 
 #include <asm/pvclock-abi.h>
 #include <asm/desc.h>
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index c951d44..b90776f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -30,6 +30,7 @@ config KVM
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQFD
 	select IRQ_BYPASS_MANAGER
+	select HAVE_KVM_IRQ_BYPASS
 	select HAVE_KVM_IRQ_ROUTING
 	select HAVE_KVM_EVENTFD
 	select KVM_APIC_ARCHITECTURE
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9dcd501..79dac02 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -50,6 +50,8 @@
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
 #include <linux/pvclock_gtod.h>
+#include <linux/kvm_irqfd.h>
+#include <linux/irqbypass.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -8249,6 +8251,48 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
 
+int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
+				      struct irq_bypass_producer *prod)
+{
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+	if (kvm_x86_ops->update_pi_irte) {
+		irqfd->producer = prod;
+		return kvm_x86_ops->update_pi_irte(irqfd->kvm,
+				prod->irq, irqfd->gsi, 1);
+	}
+
+	return -EINVAL;
+}
+
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
+				      struct irq_bypass_producer *prod)
+{
+	int ret;
+	struct kvm_kernel_irqfd *irqfd =
+		container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+	if (!kvm_x86_ops->update_pi_irte) {
+		WARN_ON(irqfd->producer != NULL);
+		return;
+	}
+
+	WARN_ON(irqfd->producer != prod);
+	irqfd->producer = NULL;
+
+	/*
+	 * When producer of consumer is unregistered, we change back to
+	 * remapped mode, so we can re-use the current implementation
+	 * when the irq is masked/disabed or the consumer side (KVM
+	 * int this case doesn't want to receive the interrupts.
+	*/
+	ret = kvm_x86_ops->update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 0);
+	if (ret)
+		printk(KERN_INFO "irq bypass consumer (token %p) unregistration"
+		       " fails: %d\n", irqfd->consumer.token, ret);
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index d7a230f..c0a56a1 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -256,6 +256,18 @@ static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
 	write_seqcount_end(&irqfd->irq_entry_sc);
 }
 
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+void __attribute__((weak)) kvm_arch_irq_bypass_stop(
+				struct irq_bypass_consumer *cons)
+{
+}
+
+void __attribute__((weak)) kvm_arch_irq_bypass_start(
+				struct irq_bypass_consumer *cons)
+{
+}
+#endif
+
 static int
 kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (7 preceding siblings ...)
  2015-09-16  8:50 ` [PATCH v8 08/13] KVM: Implement IRQ bypass consumer callbacks for x86 Feng Wu
@ 2015-09-16  8:50 ` Feng Wu
  2015-09-16  9:27   ` Paolo Bonzini
  2015-09-16  8:50 ` [PATCH v8 10/13] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:50 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch adds an arch specific hooks 'arch_update' in
'struct kvm_kernel_irqfd'. On Intel side, it is used to
update the IRTE when VT-d posted-interrupts is used.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Remove callback .arch_update()
- Remove kvm_arch_irqfd_init()
- Call kvm_arch_update_irqfd_routing() instead.

 arch/x86/kvm/x86.c       |  7 +++++++
 include/linux/kvm_host.h |  2 ++
 virt/kvm/eventfd.c       | 19 ++++++++++++++++++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 79dac02..e189a94 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8293,6 +8293,13 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
 		       " fails: %d\n", irqfd->consumer.token, ret);
 }
 
+int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
+				   uint32_t guest_irq, bool set)
+{
+	return !kvm_x86_ops->update_pi_irte ? -EINVAL :
+		kvm_x86_ops->update_pi_irte(kvm, host_irq, guest_irq, set);
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5f183fb..feba1fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1174,6 +1174,8 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *,
 			   struct irq_bypass_producer *);
 void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *);
 void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *);
+int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
+				  uint32_t guest_irq, bool set);
 #endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */
 #endif
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c0a56a1..89c9635 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -266,6 +266,12 @@ void __attribute__((weak)) kvm_arch_irq_bypass_start(
 				struct irq_bypass_consumer *cons)
 {
 }
+int  __attribute__((weak)) kvm_arch_update_irqfd_routing(
+				struct kvm *kvm, unsigned int host_irq,
+				uint32_t guest_irq, bool set)
+{
+	return 0;
+}
 #endif
 
 static int
@@ -582,13 +588,24 @@ kvm_irqfd_release(struct kvm *kvm)
  */
 void kvm_irq_routing_update(struct kvm *kvm)
 {
+	int ret;
 	struct kvm_kernel_irqfd *irqfd;
 
 	spin_lock_irq(&kvm->irqfds.lock);
 
-	list_for_each_entry(irqfd, &kvm->irqfds.items, list)
+	list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
 		irqfd_update(kvm, irqfd);
 
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+		if (irqfd->producer) {
+			ret = kvm_arch_update_irqfd_routing(
+					irqfd->kvm, irqfd->producer->irq,
+					irqfd->gsi, 1);
+			WARN_ON(ret);
+		}
+#endif
+	}
+
 	spin_unlock_irq(&kvm->irqfds.lock);
 }
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 10/13] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (8 preceding siblings ...)
  2015-09-16  8:50 ` [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd' Feng Wu
@ 2015-09-16  8:50 ` Feng Wu
  2015-09-16  9:29   ` Paolo Bonzini
  2015-09-16  8:50 ` [PATCH v8 11/13] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:50 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress furture non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Add two wrapper fucntion vmx_vcpu_pi_load() and vmx_vcpu_pi_put().
- Only handle VT-d PI related logic when the VM has assigned devices.

 arch/x86/kvm/vmx.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5a25651..5ceb280 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1943,6 +1943,52 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 	preempt_enable();
 }
 
+static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+	struct pi_desc old, new;
+	unsigned int dest;
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP) ||
+		(!kvm_arch_has_assigned_device(vcpu->kvm)))
+		return;
+
+	do {
+		old.control = new.control = pi_desc->control;
+
+		/*
+		 * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there
+		 * are two possible cases:
+		 * 1. After running 'pi_pre_block', context switch
+		 *    happened. For this case, 'sn' was set in
+		 *    vmx_vcpu_put(), so we need to clear it here.
+		 * 2. After running 'pi_pre_block', we were blocked,
+		 *    and woken up by some other guy. For this case,
+		 *    we don't need to do anything, 'pi_post_block'
+		 *    will do everything for us. However, we cannot
+		 *    check whether it is case #1 or case #2 here
+		 *    (maybe, not needed), so we also clear sn here,
+		 *    I think it is not a big deal.
+		 */
+		if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) {
+			if (vcpu->cpu != cpu) {
+				dest = cpu_physical_id(cpu);
+
+				if (x2apic_enabled())
+					new.ndst = dest;
+				else
+					new.ndst = (dest << 8) & 0xFF00;
+			}
+
+			/* set 'NV' to 'notification vector' */
+			new.nv = POSTED_INTR_VECTOR;
+		}
+
+		/* Allow posting non-urgent interrupts */
+		new.sn = 0;
+	} while (cmpxchg(&pi_desc->control, old.control,
+			new.control) != old.control);
+}
 /*
  * Switches to specified vcpu, until a matching vcpu_put(), but assumes
  * vcpu mutex is already taken.
@@ -1993,10 +2039,27 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
 		vmx->loaded_vmcs->cpu = cpu;
 	}
+
+	vmx_vcpu_pi_load(vcpu, cpu);
+}
+
+static void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP) ||
+		(!kvm_arch_has_assigned_device(vcpu->kvm)))
+		return;
+
+	/* Set SN when the vCPU is preempted */
+	if (vcpu->preempted)
+		pi_set_sn(pi_desc);
 }
 
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	vmx_vcpu_pi_put(vcpu);
+
 	__vmx_load_host_state(to_vmx(vcpu));
 	if (!vmm_exclusive) {
 		__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 11/13] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (9 preceding siblings ...)
  2015-09-16  8:50 ` [PATCH v8 10/13] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
@ 2015-09-16  8:50 ` Feng Wu
  2015-09-16  9:32   ` Paolo Bonzini
  2015-09-16  8:50 ` [PATCH v8 12/13] KVM: Warn if 'SN' is set during posting interrupts by software Feng Wu
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:50 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
v8:
- Rename 'pi_pre_block' to 'pre_block'
- Rename 'pi_post_block' to 'post_block'
- Change some comments
- Only add the vCPU to the blocking list when the VM has assigned devices.

 arch/x86/include/asm/kvm_host.h |  13 ++++
 arch/x86/kvm/vmx.c              | 157 +++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  53 +++++++++++---
 include/linux/kvm_host.h        |   3 +
 virt/kvm/kvm_main.c             |   3 +
 5 files changed, 217 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0ddd353..304fbb5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -552,6 +552,8 @@ struct kvm_vcpu_arch {
 	 */
 	bool write_fault_to_shadow_pgtable;
 
+	bool halted;
+
 	/* set at EPT violation at this point */
 	unsigned long exit_qualification;
 
@@ -864,6 +866,17 @@ struct kvm_x86_ops {
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
 
+	/*
+	 * Architecture specific hooks for vCPU blocking due to
+	 * HLT instruction.
+	 * Returns for .pre_block():
+	 *    - 0 means continue to block the vCPU.
+	 *    - 1 means we cannot block the vCPU since some event
+	 *        happens during this period, such as, 'ON' bit in
+	 *        posted-interrupts descriptor is set.
+	 */
+	int (*pre_block)(struct kvm_vcpu *vcpu);
+	void (*post_block)(struct kvm_vcpu *vcpu);
 	int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
 			      uint32_t guest_irq, bool set);
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5ceb280..9888c43 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -879,6 +879,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
 
+/*
+ * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
+ * can find which vCPU should be waken up.
+ */
+static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
 static unsigned long *vmx_io_bitmap_a;
 static unsigned long *vmx_io_bitmap_b;
 static unsigned long *vmx_msr_bitmap_legacy;
@@ -1959,10 +1966,10 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
 		/*
 		 * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there
 		 * are two possible cases:
-		 * 1. After running 'pi_pre_block', context switch
+		 * 1. After running 'pre_block', context switch
 		 *    happened. For this case, 'sn' was set in
 		 *    vmx_vcpu_put(), so we need to clear it here.
-		 * 2. After running 'pi_pre_block', we were blocked,
+		 * 2. After running 'pre_block', we were blocked,
 		 *    and woken up by some other guy. For this case,
 		 *    we don't need to do anything, 'pi_post_block'
 		 *    will do everything for us. However, we cannot
@@ -2985,6 +2992,8 @@ static int hardware_enable(void)
 		return -EBUSY;
 
 	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
+	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
 
 	/*
 	 * Now we can enable the vmclear operation in kdump
@@ -6105,6 +6114,25 @@ static void update_ple_window_actual_max(void)
 			                    ple_window_grow, INT_MIN);
 }
 
+/*
+ * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
+ */
+static void wakeup_handler(void)
+{
+	struct kvm_vcpu *vcpu;
+	int cpu = smp_processor_id();
+
+	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
+			blocked_vcpu_list) {
+		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+		if (pi_test_on(pi_desc) == 1)
+			kvm_vcpu_kick(vcpu);
+	}
+	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+}
+
 static __init int hardware_setup(void)
 {
 	int r = -ENOMEM, i, msr;
@@ -6289,6 +6317,8 @@ static __init int hardware_setup(void)
 		kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
 	}
 
+	kvm_set_posted_intr_wakeup_handler(wakeup_handler);
+
 	return alloc_kvm_area();
 
 out8:
@@ -10414,6 +10444,126 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
 }
 
 /*
+ * This routine does the following things for vCPU which is going
+ * to be blocked if VT-d PI is enabled.
+ * - Store the vCPU to the wakeup list, so when interrupts happen
+ *   we can find the right vCPU to wake up.
+ * - Change the Posted-interrupt descriptor as below:
+ *      'NDST' <-- vcpu->pre_pcpu
+ *      'NV' <-- POSTED_INTR_WAKEUP_VECTOR
+ * - If 'ON' is set during this process, which means at least one
+ *   interrupt is posted for this vCPU, we cannot block it, in
+ *   this case, return 1, otherwise, return 0.
+ *
+ */
+static int vmx_pre_block(struct kvm_vcpu *vcpu)
+{
+	unsigned long flags;
+	unsigned int dest;
+	struct pi_desc old, new;
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP) ||
+		(!kvm_arch_has_assigned_device(vcpu->kvm)))
+		return 0;
+
+	vcpu->pre_pcpu = vcpu->cpu;
+	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+			  vcpu->pre_pcpu), flags);
+	list_add_tail(&vcpu->blocked_vcpu_list,
+		      &per_cpu(blocked_vcpu_on_cpu,
+		      vcpu->pre_pcpu));
+	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+			       vcpu->pre_pcpu), flags);
+
+	do {
+		old.control = new.control = pi_desc->control;
+
+		/*
+		 * We should not block the vCPU if
+		 * an interrupt is posted for it.
+		 */
+		if (pi_test_on(pi_desc) == 1) {
+			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+					  vcpu->pre_pcpu), flags);
+			list_del(&vcpu->blocked_vcpu_list);
+			spin_unlock_irqrestore(
+					&per_cpu(blocked_vcpu_on_cpu_lock,
+					vcpu->pre_pcpu), flags);
+			vcpu->pre_pcpu = -1;
+
+			return 1;
+		}
+
+		WARN((pi_desc->sn == 1),
+		     "Warning: SN field of posted-interrupts "
+		     "is set before blocking\n");
+
+		/*
+		 * Since vCPU can be preempted during this process,
+		 * vcpu->cpu could be different with pre_pcpu, we
+		 * need to set pre_pcpu as the destination of wakeup
+		 * notification event, then we can find the right vCPU
+		 * to wakeup in wakeup handler if interrupts happen
+		 * when the vCPU is in blocked state.
+		 */
+		dest = cpu_physical_id(vcpu->pre_pcpu);
+
+		if (x2apic_enabled())
+			new.ndst = dest;
+		else
+			new.ndst = (dest << 8) & 0xFF00;
+
+		/* set 'NV' to 'wakeup vector' */
+		new.nv = POSTED_INTR_WAKEUP_VECTOR;
+	} while (cmpxchg(&pi_desc->control, old.control,
+			new.control) != old.control);
+
+	return 0;
+}
+
+static void vmx_post_block(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+	struct pi_desc old, new;
+	unsigned int dest;
+	unsigned long flags;
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP) ||
+		(!kvm_arch_has_assigned_device(vcpu->kvm)))
+		return;
+
+	do {
+		old.control = new.control = pi_desc->control;
+
+		dest = cpu_physical_id(vcpu->cpu);
+
+		if (x2apic_enabled())
+			new.ndst = dest;
+		else
+			new.ndst = (dest << 8) & 0xFF00;
+
+		/* Allow posting non-urgent interrupts */
+		new.sn = 0;
+
+		/* set 'NV' to 'notification vector' */
+		new.nv = POSTED_INTR_VECTOR;
+	} while (cmpxchg(&pi_desc->control, old.control,
+			new.control) != old.control);
+
+	if(vcpu->pre_pcpu != -1) {
+		spin_lock_irqsave(
+			&per_cpu(blocked_vcpu_on_cpu_lock,
+			vcpu->pre_pcpu), flags);
+		list_del(&vcpu->blocked_vcpu_list);
+		spin_unlock_irqrestore(
+			&per_cpu(blocked_vcpu_on_cpu_lock,
+			vcpu->pre_pcpu), flags);
+		vcpu->pre_pcpu = -1;
+	}
+}
+
+/*
  * vmx_update_pi_irte - set IRTE for Posted-Interrupts
  *
  * @kvm: kvm
@@ -10604,6 +10754,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.flush_log_dirty = vmx_flush_log_dirty,
 	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
 
+	.pre_block = vmx_pre_block,
+	.post_block = vmx_post_block,
+
 	.pmu_ops = &intel_pmu_ops,
 
 	.update_pi_irte = vmx_update_pi_irte,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e189a94..106a0c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5869,7 +5869,12 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
 	++vcpu->stat.halt_exits;
 	if (irqchip_in_kernel(vcpu->kvm)) {
-		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+		/* Handle posted-interrupt when vCPU is to be halted */
+		if (!kvm_x86_ops->pre_block ||
+				kvm_x86_ops->pre_block(vcpu) == 0) {
+			vcpu->arch.halted = true;
+			vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+		}
 		return 1;
 	} else {
 		vcpu->run->exit_reason = KVM_EXIT_HLT;
@@ -6518,6 +6523,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_vcpu_reload_apic_access_page(vcpu);
 	}
 
+	/*
+	 * KVM_REQ_EVENT is not set when posted interrupts are set by
+	 * VT-d hardware, so we have to update RVI unconditionally.
+	 */
+	if (kvm_lapic_enabled(vcpu)) {
+		/*
+		 * Update architecture specific hints for APIC
+		 * virtual interrupt delivery.
+		 */
+		if (kvm_x86_ops->hwapic_irr_update)
+			kvm_x86_ops->hwapic_irr_update(vcpu,
+				kvm_lapic_find_highest_irr(vcpu));
+	}
+
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
 		kvm_apic_accept_events(vcpu);
 		if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
@@ -6534,13 +6553,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_x86_ops->enable_irq_window(vcpu);
 
 		if (kvm_lapic_enabled(vcpu)) {
-			/*
-			 * Update architecture specific hints for APIC
-			 * virtual interrupt delivery.
-			 */
-			if (kvm_x86_ops->hwapic_irr_update)
-				kvm_x86_ops->hwapic_irr_update(vcpu,
-					kvm_lapic_find_highest_irr(vcpu));
 			update_cr8_intercept(vcpu);
 			kvm_lapic_sync_to_vapic(vcpu);
 		}
@@ -6711,10 +6723,31 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 
 	for (;;) {
 		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
-		    !vcpu->arch.apf.halted)
+		    !vcpu->arch.apf.halted) {
+			/*
+			 * For some cases, we can get here with
+			 * vcpu->arch.halted being true.
+			 */
+			if (kvm_x86_ops->post_block && vcpu->arch.halted) {
+				kvm_x86_ops->post_block(vcpu);
+				vcpu->arch.halted = false;
+			}
+
 			r = vcpu_enter_guest(vcpu);
-		else
+		} else {
 			r = vcpu_block(kvm, vcpu);
+
+			/*
+			 * post_block() must be called after
+			 * pre_block() which is called in
+			 * kvm_vcpu_halt().
+			 */
+			if (kvm_x86_ops->post_block && vcpu->arch.halted) {
+				kvm_x86_ops->post_block(vcpu);
+				vcpu->arch.halted = false;
+			}
+		}
+
 		if (r <= 0)
 			break;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index feba1fb..bf462e7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -231,6 +231,9 @@ struct kvm_vcpu {
 	unsigned long requests;
 	unsigned long guest_debug;
 
+	int pre_pcpu;
+	struct list_head blocked_vcpu_list;
+
 	struct mutex mutex;
 	struct kvm_run *run;
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..191c7eb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -220,6 +220,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	init_waitqueue_head(&vcpu->wq);
 	kvm_async_pf_vcpu_init(vcpu);
 
+	vcpu->pre_pcpu = -1;
+	INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
+
 	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 	if (!page) {
 		r = -ENOMEM;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 12/13] KVM: Warn if 'SN' is set during posting interrupts by software
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (10 preceding siblings ...)
  2015-09-16  8:50 ` [PATCH v8 11/13] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
@ 2015-09-16  8:50 ` Feng Wu
  2015-09-16  9:32   ` Paolo Bonzini
  2015-09-16  8:50 ` [PATCH v8 13/13] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
  2015-09-16  9:34 ` [PATCH v8 00/13] Add VT-d Posted-Interrupts support Paolo Bonzini
  13 siblings, 1 reply; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:50 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Currently, we don't support urgent interrupt, all interrupts
are recognized as non-urgent interrupt, so we cannot post
interrupts when 'SN' is set.

If the vcpu is in guest mode, it cannot have been scheduled out,
and that's the only case when SN is set currently, warning if
SN is set.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9888c43..58fbbc6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4498,6 +4498,22 @@ static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_SMP
 	if (vcpu->mode == IN_GUEST_MODE) {
+		struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+		/*
+		 * Currently, we don't support urgent interrupt,
+		 * all interrupts are recognized as non-urgent
+		 * interrupt, so we cannot post interrupts when
+		 * 'SN' is set.
+		 *
+		 * If the vcpu is in guest mode, it means it is
+		 * running instead of being scheduled out and
+		 * waiting in the run queue, and that's the only
+		 * case when 'SN' is set currently, warning if
+		 * 'SN' is set.
+		 */
+		WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc));
+
 		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
 				POSTED_INTR_VECTOR);
 		return true;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v8 13/13] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (11 preceding siblings ...)
  2015-09-16  8:50 ` [PATCH v8 12/13] KVM: Warn if 'SN' is set during posting interrupts by software Feng Wu
@ 2015-09-16  8:50 ` Feng Wu
  2015-09-16  9:34 ` [PATCH v8 00/13] Add VT-d Posted-Interrupts support Paolo Bonzini
  13 siblings, 0 replies; 32+ messages in thread
From: Feng Wu @ 2015-09-16  8:50 UTC (permalink / raw)
  To: pbonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Feng Wu

Enable VT-d Posted-Interrtups and add a command line
parameter for it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/kernel-parameters.txt |  1 +
 drivers/iommu/irq_remapping.c       | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f045..52aca36 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1547,6 +1547,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			nosid	disable Source ID checking
 			no_x2apic_optout
 				BIOS x2APIC opt-out request will be ignored
+			nopost	disable Interrupt Posting
 
 	iomem=		Disable strict checking of access to MMIO memory
 		strict	regions from userspace.
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d99930..d8c3997 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -22,7 +22,7 @@ int irq_remap_broken;
 int disable_sourceid_checking;
 int no_x2apic_optout;
 
-int disable_irq_post = 1;
+int disable_irq_post = 0;
 
 static int disable_irq_remap;
 static struct irq_remap_ops *remap_ops;
@@ -58,14 +58,18 @@ static __init int setup_irqremap(char *str)
 		return -EINVAL;
 
 	while (*str) {
-		if (!strncmp(str, "on", 2))
+		if (!strncmp(str, "on", 2)) {
 			disable_irq_remap = 0;
-		else if (!strncmp(str, "off", 3))
+			disable_irq_post = 0;
+		} else if (!strncmp(str, "off", 3)) {
 			disable_irq_remap = 1;
-		else if (!strncmp(str, "nosid", 5))
+			disable_irq_post = 1;
+		} else if (!strncmp(str, "nosid", 5))
 			disable_sourceid_checking = 1;
 		else if (!strncmp(str, "no_x2apic_optout", 16))
 			no_x2apic_optout = 1;
+		else if (!strncmp(str, "nopost", 6))
+			disable_irq_post = 1;
 
 		str += strcspn(str, ",");
 		while (*str == ',')
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-16  8:49 ` [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu() Feng Wu
@ 2015-09-16  9:23   ` Paolo Bonzini
  2015-09-17  3:17     ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-16  9:23 UTC (permalink / raw)
  To: Feng Wu, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 16/09/2015 10:49, Feng Wu wrote:
> This patch defines a new interface kvm_intr_is_single_vcpu(),
> which can returns whether the interrupt is for single-CPU or not.
> 
> It is used by VT-d PI, since now we only support single-CPU
> interrupts, For lowest-priority interrupts, if user configures
> it via /proc/irq or uses irqbalance to make it single-CPU, we
> can use PI to deliver the interrupts to it. Full functionality
> of lowest-priority support will be added later.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> v8:
> - Some optimizations in kvm_intr_is_single_vcpu().
> - Expose kvm_intr_is_single_vcpu() so we can use it in vmx code.
> - Add kvm_intr_is_single_vcpu_fast() as the fast path to find
>   the target vCPU for the single-destination interrupt
> 
>  arch/x86/include/asm/kvm_host.h |  3 ++
>  arch/x86/kvm/irq_comm.c         | 94 +++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/lapic.c            |  5 +--
>  arch/x86/kvm/lapic.h            |  2 +
>  4 files changed, 101 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 49ec903..af11bca 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1204,4 +1204,7 @@ int __x86_set_memory_region(struct kvm *kvm,
>  int x86_set_memory_region(struct kvm *kvm,
>  			  const struct kvm_userspace_memory_region *mem);
>  
> +bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
> +			     struct kvm_vcpu **dest_vcpu);
> +
>  #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 9efff9e..97ba1d6 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -297,6 +297,100 @@ out:
>  	return r;
>  }
>  
> +static bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,
> +					 struct kvm_lapic_irq *irq,
> +					 struct kvm_vcpu **dest_vcpu)

Please put this in lapic.c, similar to kvm_irq_delivery_to_apic_fast, so
that you do not have to export other functions.

> +{
> +	struct kvm_apic_map *map;
> +	bool ret = false;
> +	struct kvm_lapic *dst = NULL;
> +
> +	if (irq->shorthand)
> +		return false;
> +
> +	rcu_read_lock();
> +	map = rcu_dereference(kvm->arch.apic_map);
> +
> +	if (!map)
> +		goto out;
> +
> +	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
> +		if (irq->dest_id == 0xFF)
> +			goto out;
> +
> +		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {

Warning here is wrong, the guest can trigger it.

> +			WARN_ON_ONCE(1);
> +			goto out;
> +		}
> +
> +		dst = map->phys_map[irq->dest_id];
> +		if (dst && kvm_apic_present(dst->vcpu))
> +			*dest_vcpu = dst->vcpu;
> +		else
> +			goto out;
> +	} else {
> +		u16 cid;
> +		unsigned long bitmap = 1;
> +		int i, r = 0;
> +
> +		if (!kvm_apic_logical_map_valid(map)) {
> +			WARN_ON_ONCE(1);

Same here.

> +			goto out;
> +		}
> +
> +		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
> +
> +		if (cid >= ARRAY_SIZE(map->logical_map)) {
> +			WARN_ON_ONCE(1);

Same here.

Otherwise looks good.

Paolo

> +			goto out;
> +		}
> +
> +		for_each_set_bit(i, &bitmap, 16) {
> +			dst = map->logical_map[cid][i];
> +			if (++r == 2)
> +				goto out;
> +		}
> +
> +		if (dst && kvm_apic_present(dst->vcpu))
> +			*dest_vcpu = dst->vcpu;
> +		else
> +			goto out;
> +	}
> +
> +	ret = true;
> +out:
> +	rcu_read_unlock();
> +	return ret;
> +}
> +
> +
> +bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
> +			     struct kvm_vcpu **dest_vcpu)
> +{
> +	int i, r = 0;
> +	struct kvm_vcpu *vcpu;
> +
> +	if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu))
> +		return true;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_apic_present(vcpu))
> +			continue;
> +
> +		if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
> +					irq->dest_id, irq->dest_mode))
> +			continue;
> +
> +		if (++r == 2)
> +			return false;
> +
> +		*dest_vcpu = vcpu;
> +	}
> +
> +	return r == 1;
> +}
> +EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu);
> +
>  #define IOAPIC_ROUTING_ENTRY(irq) \
>  	{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,	\
>  	  .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 2a5ca97..9848cd50 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -136,13 +136,12 @@ static inline int kvm_apic_id(struct kvm_lapic *apic)
>  /* The logical map is definitely wrong if we have multiple
>   * modes at the same time.  (Physical map is always right.)
>   */
> -static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
> +bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
>  {
>  	return !(map->mode & (map->mode - 1));
>  }
>  
> -static inline void
> -apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
> +void apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
>  {
>  	unsigned lid_bits;
>  
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 7195274..6798b87 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -169,4 +169,6 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
>  
>  void wait_lapic_expire(struct kvm_vcpu *vcpu);
>  
> +void apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid);
> +bool kvm_apic_logical_map_valid(struct kvm_apic_map *map);
>  #endif
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
  2015-09-16  8:50 ` [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd' Feng Wu
@ 2015-09-16  9:27   ` Paolo Bonzini
  2015-09-17  1:51     ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-16  9:27 UTC (permalink / raw)
  To: Feng Wu, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 16/09/2015 10:50, Feng Wu wrote:
> +int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
> +				   uint32_t guest_irq, bool set)
> +{
> +	return !kvm_x86_ops->update_pi_irte ? -EINVAL :
> +		kvm_x86_ops->update_pi_irte(kvm, host_irq, guest_irq, set);
> +}
> +

Just use "if" here.  No need to resend if this is the only comment.

> 
>  }
> +int  __attribute__((weak)) kvm_arch_update_irqfd_routing(
> +				struct kvm *kvm, unsigned

Empty line after "}".

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 10/13] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  2015-09-16  8:50 ` [PATCH v8 10/13] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
@ 2015-09-16  9:29   ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-16  9:29 UTC (permalink / raw)
  To: Feng Wu, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 16/09/2015 10:50, Feng Wu wrote:
> +
> +	if (!irq_remapping_cap(IRQ_POSTING_CAP) ||
> +		(!kvm_arch_has_assigned_device(vcpu->kvm)))
> +		return;
> +

Better:

	if (!arch_has_assigned_device(vcpu->kvm)) ||
	    !irq_remapping_cap(IRQ_POSTING_CAP))
		return;

(In the future we might add a static_key here).

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 11/13] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  2015-09-16  8:50 ` [PATCH v8 11/13] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
@ 2015-09-16  9:32   ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-16  9:32 UTC (permalink / raw)
  To: Feng Wu, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 16/09/2015 10:50, Feng Wu wrote:
>  		 * are two possible cases:
> -		 * 1. After running 'pi_pre_block', context switch
> +		 * 1. After running 'pre_block', context switch

Please fold this in the previous patch.

>  		 *    happened. For this case, 'sn' was set in
>  		 *    vmx_vcpu_put(), so we need to clear it here.
> -		 * 2. After running 'pi_pre_block', we were blocked,
> +		 * 2. After running 'pre_block', we were blocked,
>  		 *    and woken up by some other guy. For this case,

(Same).

> +	spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +	list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
> +			blocked_vcpu_list) {
> +		struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> +
> +		if (pi_test_on(pi_desc) == 1)
> +			kvm_vcpu_kick(vcpu);
> +	}
> +	spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
> +}

Please document the lock in Documentation/virtual/kvm/locking.txt.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 12/13] KVM: Warn if 'SN' is set during posting interrupts by software
  2015-09-16  8:50 ` [PATCH v8 12/13] KVM: Warn if 'SN' is set during posting interrupts by software Feng Wu
@ 2015-09-16  9:32   ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-16  9:32 UTC (permalink / raw)
  To: Feng Wu, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 16/09/2015 10:50, Feng Wu wrote:
> Currently, we don't support urgent interrupt, all interrupts
> are recognized as non-urgent interrupt, so we cannot post
> interrupts when 'SN' is set.
> 
> If the vcpu is in guest mode, it cannot have been scheduled out,
> and that's the only case when SN is set currently, warning if
> SN is set.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Please fold this into patch 10.

Paolo

> ---
>  arch/x86/kvm/vmx.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 9888c43..58fbbc6 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4498,6 +4498,22 @@ static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu)
>  {
>  #ifdef CONFIG_SMP
>  	if (vcpu->mode == IN_GUEST_MODE) {
> +		struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +		/*
> +		 * Currently, we don't support urgent interrupt,
> +		 * all interrupts are recognized as non-urgent
> +		 * interrupt, so we cannot post interrupts when
> +		 * 'SN' is set.
> +		 *
> +		 * If the vcpu is in guest mode, it means it is
> +		 * running instead of being scheduled out and
> +		 * waiting in the run queue, and that's the only
> +		 * case when 'SN' is set currently, warning if
> +		 * 'SN' is set.
> +		 */
> +		WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc));
> +
>  		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
>  				POSTED_INTR_VECTOR);
>  		return true;
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 00/13] Add VT-d Posted-Interrupts support
  2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
                   ` (12 preceding siblings ...)
  2015-09-16  8:50 ` [PATCH v8 13/13] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
@ 2015-09-16  9:34 ` Paolo Bonzini
  13 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-16  9:34 UTC (permalink / raw)
  To: Feng Wu, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 16/09/2015 10:49, Feng Wu wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
> 
> You can find the VT-d Posted-Interrtups Spec. in the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> 
> v8:
> refer to the changelog in each patch

Thanks, it mostly looks good.

Since we've more or less converged, could you post the whole series for
v9, including the other prerequisite series?

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
  2015-09-16  9:27   ` Paolo Bonzini
@ 2015-09-17  1:51     ` Wu, Feng
  2015-09-17  9:38       ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2015-09-17  1:51 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Wednesday, September 16, 2015 5:27 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct
> kvm_kernel_irqfd'
> 
> 
> 
> On 16/09/2015 10:50, Feng Wu wrote:
> > +int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
> > +				   uint32_t guest_irq, bool set)
> > +{
> > +	return !kvm_x86_ops->update_pi_irte ? -EINVAL :
> > +		kvm_x86_ops->update_pi_irte(kvm, host_irq, guest_irq, set);
> > +}
> > +
> 
> Just use "if" here.  No need to resend if this is the only comment.

I am sorry, I don't quite understand. Do you mean I don't need to include
this patch in v9? If so, what about other patches with your Reviewed-by?

Thanks,
Feng

> 
> >
> >  }
> > +int  __attribute__((weak)) kvm_arch_update_irqfd_routing(
> > +				struct kvm *kvm, unsigned
> 
> Empty line after "}".
> 
> Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-16  9:23   ` Paolo Bonzini
@ 2015-09-17  3:17     ` Wu, Feng
  2015-09-17  9:42       ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2015-09-17  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Wednesday, September 16, 2015 5:23 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v8 03/13] KVM: Define a new interface
> kvm_intr_is_single_vcpu()
> 
> 
> 
> On 16/09/2015 10:49, Feng Wu wrote:
> > This patch defines a new interface kvm_intr_is_single_vcpu(),
> > which can returns whether the interrupt is for single-CPU or not.
> >
> > It is used by VT-d PI, since now we only support single-CPU
> > interrupts, For lowest-priority interrupts, if user configures
> > it via /proc/irq or uses irqbalance to make it single-CPU, we
> > can use PI to deliver the interrupts to it. Full functionality
> > of lowest-priority support will be added later.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> > v8:
> > - Some optimizations in kvm_intr_is_single_vcpu().
> > - Expose kvm_intr_is_single_vcpu() so we can use it in vmx code.
> > - Add kvm_intr_is_single_vcpu_fast() as the fast path to find
> >   the target vCPU for the single-destination interrupt
> >
> >  arch/x86/include/asm/kvm_host.h |  3 ++
> >  arch/x86/kvm/irq_comm.c         | 94
> +++++++++++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/lapic.c            |  5 +--
> >  arch/x86/kvm/lapic.h            |  2 +
> >  4 files changed, 101 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h
> b/arch/x86/include/asm/kvm_host.h
> > index 49ec903..af11bca 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1204,4 +1204,7 @@ int __x86_set_memory_region(struct kvm *kvm,
> >  int x86_set_memory_region(struct kvm *kvm,
> >  			  const struct kvm_userspace_memory_region *mem);
> >
> > +bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
> > +			     struct kvm_vcpu **dest_vcpu);
> > +
> >  #endif /* _ASM_X86_KVM_HOST_H */
> > diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> > index 9efff9e..97ba1d6 100644
> > --- a/arch/x86/kvm/irq_comm.c
> > +++ b/arch/x86/kvm/irq_comm.c
> > @@ -297,6 +297,100 @@ out:
> >  	return r;
> >  }
> >
> > +static bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,
> > +					 struct kvm_lapic_irq *irq,
> > +					 struct kvm_vcpu **dest_vcpu)
> 
> Please put this in lapic.c, similar to kvm_irq_delivery_to_apic_fast, so
> that you do not have to export other functions.
> 
> > +{
> > +	struct kvm_apic_map *map;
> > +	bool ret = false;
> > +	struct kvm_lapic *dst = NULL;
> > +
> > +	if (irq->shorthand)
> > +		return false;
> > +
> > +	rcu_read_lock();
> > +	map = rcu_dereference(kvm->arch.apic_map);
> > +
> > +	if (!map)
> > +		goto out;
> > +
> > +	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
> > +		if (irq->dest_id == 0xFF)
> > +			goto out;
> > +
> > +		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
> 
> Warning here is wrong, the guest can trigger it.

Could you please share more information about how the guest
triggers these conditions (including the following two), Thanks
a lot!

Thanks,
Feng

> 
> > +			WARN_ON_ONCE(1);
> > +			goto out;
> > +		}
> > +
> > +		dst = map->phys_map[irq->dest_id];
> > +		if (dst && kvm_apic_present(dst->vcpu))
> > +			*dest_vcpu = dst->vcpu;
> > +		else
> > +			goto out;
> > +	} else {
> > +		u16 cid;
> > +		unsigned long bitmap = 1;
> > +		int i, r = 0;
> > +
> > +		if (!kvm_apic_logical_map_valid(map)) {
> > +			WARN_ON_ONCE(1);
> 
> Same here.
> 
> > +			goto out;
> > +		}
> > +
> > +		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
> > +
> > +		if (cid >= ARRAY_SIZE(map->logical_map)) {
> > +			WARN_ON_ONCE(1);
> 
> Same here.
> 
> Otherwise looks good.
> 
> Paolo
> 
> > +			goto out;
> > +		}
> > +
> > +		for_each_set_bit(i, &bitmap, 16) {
> > +			dst = map->logical_map[cid][i];
> > +			if (++r == 2)
> > +				goto out;
> > +		}
> > +
> > +		if (dst && kvm_apic_present(dst->vcpu))
> > +			*dest_vcpu = dst->vcpu;
> > +		else
> > +			goto out;
> > +	}
> > +
> > +	ret = true;
> > +out:
> > +	rcu_read_unlock();
> > +	return ret;
> > +}
> > +
> > +
> > +bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
> > +			     struct kvm_vcpu **dest_vcpu)
> > +{
> > +	int i, r = 0;
> > +	struct kvm_vcpu *vcpu;
> > +
> > +	if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu))
> > +		return true;
> > +
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		if (!kvm_apic_present(vcpu))
> > +			continue;
> > +
> > +		if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
> > +					irq->dest_id, irq->dest_mode))
> > +			continue;
> > +
> > +		if (++r == 2)
> > +			return false;
> > +
> > +		*dest_vcpu = vcpu;
> > +	}
> > +
> > +	return r == 1;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu);
> > +
> >  #define IOAPIC_ROUTING_ENTRY(irq) \
> >  	{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,	\
> >  	  .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 2a5ca97..9848cd50 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -136,13 +136,12 @@ static inline int kvm_apic_id(struct kvm_lapic
> *apic)
> >  /* The logical map is definitely wrong if we have multiple
> >   * modes at the same time.  (Physical map is always right.)
> >   */
> > -static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
> > +bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
> >  {
> >  	return !(map->mode & (map->mode - 1));
> >  }
> >
> > -static inline void
> > -apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
> > +void apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16
> *lid)
> >  {
> >  	unsigned lid_bits;
> >
> > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> > index 7195274..6798b87 100644
> > --- a/arch/x86/kvm/lapic.h
> > +++ b/arch/x86/kvm/lapic.h
> > @@ -169,4 +169,6 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu,
> int vector);
> >
> >  void wait_lapic_expire(struct kvm_vcpu *vcpu);
> >
> > +void apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16
> *lid);
> > +bool kvm_apic_logical_map_valid(struct kvm_apic_map *map);
> >  #endif
> >

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
  2015-09-17  1:51     ` Wu, Feng
@ 2015-09-17  9:38       ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-17  9:38 UTC (permalink / raw)
  To: Wu, Feng, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 17/09/2015 03:51, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> Sent: Wednesday, September 16, 2015 5:27 PM
>> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
>> mtosatti@redhat.com
>> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
>> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct
>> kvm_kernel_irqfd'
>>
>>
>>
>> On 16/09/2015 10:50, Feng Wu wrote:
>>> +int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
>>> +				   uint32_t guest_irq, bool set)
>>> +{
>>> +	return !kvm_x86_ops->update_pi_irte ? -EINVAL :
>>> +		kvm_x86_ops->update_pi_irte(kvm, host_irq, guest_irq, set);
>>> +}
>>> +
>>
>> Just use "if" here.  No need to resend if this is the only comment.
> 
> I am sorry, I don't quite understand. Do you mean I don't need to include
> this patch in v9? If so, what about other patches with your Reviewed-by?

No, I just wrote this email first.  If this was the only comment, I
could have fixed it up myself.  But since there were a few other
comments, please send v9 with all the required changes.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-17  3:17     ` Wu, Feng
@ 2015-09-17  9:42       ` Paolo Bonzini
  2015-09-17 13:36         ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-17  9:42 UTC (permalink / raw)
  To: Wu, Feng, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel



On 17/09/2015 05:17, Wu, Feng wrote:
>>> > > +	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
>>> > > +		if (irq->dest_id == 0xFF)
>>> > > +			goto out;
>>> > > +
>>> > > +		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
>> > 
>> > Warning here is wrong, the guest can trigger it.
> Could you please share more information about how the guest
> triggers these conditions (including the following two), Thanks
> a lot!

irq->dest_id is a 16-bit value, so it can be > 255.

> +		if (!kvm_apic_logical_map_valid(map)) {
> +			WARN_ON_ONCE(1);

Here, the guest can trigger it by setting a few APICs in flat mode and
others in cluster mode, for example.

> +		if (cid >= ARRAY_SIZE(map->logical_map)) {
> +			WARN_ON_ONCE(1);

In x2apic mode irq->dest_id could have bits 12..15 set.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-17  9:42       ` Paolo Bonzini
@ 2015-09-17 13:36         ` Wu, Feng
  2015-09-17 14:24           ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2015-09-17 13:36 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Thursday, September 17, 2015 5:42 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v8 03/13] KVM: Define a new interface
> kvm_intr_is_single_vcpu()
> 
> 
> 
> On 17/09/2015 05:17, Wu, Feng wrote:
> >>> > > +	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
> >>> > > +		if (irq->dest_id == 0xFF)
> >>> > > +			goto out;
> >>> > > +
> >>> > > +		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
> >> >
> >> > Warning here is wrong, the guest can trigger it.
> > Could you please share more information about how the guest
> > triggers these conditions (including the following two), Thanks
> > a lot!
> 
> irq->dest_id is a 16-bit value, so it can be > 255.

Yes, irq->dest_id is defined as u32, but by looking the current KVM
code, seems desst_id is used as an u8 variable, even in x2apic mode
the dest_id will not beyond 255 (except for broadcast dest in in x2apic
mode). Correct me if I am wrong. Thanks a lot!

> 
> > +		if (!kvm_apic_logical_map_valid(map)) {
> > +			WARN_ON_ONCE(1);
> 
> Here, the guest can trigger it by setting a few APICs in flat mode and
> others in cluster mode, for example.

Oh, right, the logical map works only when the destination mode of all
the vCPUs are the same.

> 
> > +		if (cid >= ARRAY_SIZE(map->logical_map)) {
> > +			WARN_ON_ONCE(1);
> 
> In x2apic mode irq->dest_id could have bits 12..15 set.

cid is gotten from bit 16 ..31 of the ldr (in apic_logical_id()), and
in x2apic mode, ldr is constructed in kvm_apic_set_x2apic_id() as
below:

u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));

So in fact, cid is (id >> 4), I cannot think of why cid can beyond 15.
Do I miss something here? Thanks!

Thanks,
Feng

> 
> Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-17 13:36         ` Wu, Feng
@ 2015-09-17 14:24           ` Paolo Bonzini
  2015-09-17 15:58             ` Radim Krčmář
  2015-09-17 23:15             ` Wu, Feng
  0 siblings, 2 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-17 14:24 UTC (permalink / raw)
  To: Wu, Feng, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Radim Krčmář

>> On 17/09/2015 05:17, Wu, Feng wrote:
>>>>>>> +	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
>>>>>>> +		if (irq->dest_id == 0xFF)
>>>>>>> +			goto out;
>>>>>>> +
>>>>>>> +		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
>>>>>
>>>>> Warning here is wrong, the guest can trigger it.
>>> Could you please share more information about how the guest
>>> triggers these conditions (including the following two), Thanks
>>> a lot!
>>
>> irq->dest_id is a 16-bit value, so it can be > 255.
> 
> Yes, irq->dest_id is defined as u32, but by looking the current KVM
> code, seems desst_id is used as an u8 variable, even in x2apic mode
> the dest_id will not beyond 255 (except for broadcast dest in in x2apic
> mode). Correct me if I am wrong. Thanks a lot!

Actually you're right, the MSI destination is only 8 bits.  I was
confused because of

#define	 MSI_ADDR_DEST_ID_MASK		0x00ffff0

in arch/x86/include/asm/msidef.h.  But there may be a bug, see below...

>>> +		if (cid >= ARRAY_SIZE(map->logical_map)) {
>>> +			WARN_ON_ONCE(1);
>>
>> In x2apic mode irq->dest_id could have bits 12..15 set.
> 
> cid is gotten from bit 16 ..31 of the ldr (in apic_logical_id()), and
> in x2apic mode, ldr is constructed in kvm_apic_set_x2apic_id() as
> below:
> 
> u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> 
> So in fact, cid is (id >> 4), I cannot think of why cid can beyond 15.

I think kvm_apic_match_logical_addr for MSI and IOAPIC interrupts is
buggy in x2apic mode.

It does:

        if (apic_x2apic_mode(apic))
                return ((logical_id >> 16) == (mda >> 16))
                       && (logical_id & mda & 0xffff) != 0;

But mda is only 8-bits for MSI and IOAPIC interrupts.

Radim, should kvm_apic_mda also handle the !ipi && x2apic_mda && dest_id
!= APIC_BROADCAST case?  It never triggers with Linux because it uses
only the physical mode (that's not super-easy to see;
ioapic_set_affinity looks for the RTEs in irq_data->chip_data and that
is allocated with kzalloc).

> Do I miss something here? Thanks!

No, you were right.

But still I think the WARNs are unnecessary; it is conceivable that some
future chipset adds support for more than 8-bits in the dest_id.

Paolo

> Thanks,
> Feng
> 
>>
>> Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-17 14:24           ` Paolo Bonzini
@ 2015-09-17 15:58             ` Radim Krčmář
  2015-09-17 16:00               ` Paolo Bonzini
  2015-09-17 23:15             ` Wu, Feng
  1 sibling, 1 reply; 32+ messages in thread
From: Radim Krčmář @ 2015-09-17 15:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wu, Feng, alex.williamson, joro, mtosatti, eric.auger, kvm,
	iommu, linux-kernel

2015-09-17 16:24+0200, Paolo Bonzini:
> I think kvm_apic_match_logical_addr for MSI and IOAPIC interrupts is
> buggy in x2apic mode.
> 
> It does:
> 
>         if (apic_x2apic_mode(apic))
>                 return ((logical_id >> 16) == (mda >> 16))
>                        && (logical_id & mda & 0xffff) != 0;
> 
> But mda is only 8-bits for MSI and IOAPIC interrupts.
> 
> Radim, should kvm_apic_mda also handle the !ipi && x2apic_mda && dest_id
> != APIC_BROADCAST case?  It never triggers with Linux because it uses
> only the physical mode (that's not super-easy to see;
> ioapic_set_affinity looks for the RTEs in irq_data->chip_data and that
> is allocated with kzalloc).

KVM handles that case, it's just convoluted.
(I wish we scrapped the IR-less x2APIC mode.)

For interrupts from MSI and IOxAPIC:
- Flat logical interrupts are delivered as if we had natural
  (CPU0<->bit0, CPU1<->bit1, ...) flat logical xAPIC for first 8 VCPUs.
- Cluster logical doesn't work much, it's interpreted like flat logical.
  I didn't care about xAPIC cluster because Linux, the sole user of our
  paravirtualized x2APIC, doesn't configure it.

I'll paste kvm_apic_mda() source for better explanation:

  static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
                                                struct kvm_lapic *target)
  {
  	bool ipi = source != NULL;
  	bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);
  
  	if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
  		return X2APIC_BROADCAST;
  
  	return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
  }

MSI/IOxAPIC interrupt means that source is NULL and if the target is in
x2APIC mode, the original 'dest_id' is returned as mda => a flat logical
xAPIC to 0x0f will get interpreted as (cluster) logical x2APIC 0xf in
kvm_apic_match_logical_addr().
xAPIC address are only 8 bit long so they always get delivered to x2APIC
cluster 0, where first 16 bits work like xAPIC flat logical mode.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-17 15:58             ` Radim Krčmář
@ 2015-09-17 16:00               ` Paolo Bonzini
  2015-09-17 23:18                 ` Wu, Feng
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-17 16:00 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Wu, Feng, alex.williamson, joro, mtosatti, eric.auger, kvm,
	iommu, linux-kernel



On 17/09/2015 17:58, Radim Krčmář wrote:
> For interrupts from MSI and IOxAPIC:
> - Flat logical interrupts are delivered as if we had natural
>   (CPU0<->bit0, CPU1<->bit1, ...) flat logical xAPIC for first 8 VCPUs.
> - Cluster logical doesn't work much, it's interpreted like flat logical.
>   I didn't care about xAPIC cluster because Linux, the sole user of our
>   paravirtualized x2APIC, doesn't configure it.
> 
> I'll paste kvm_apic_mda() source for better explanation:
> 
>   static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
>                                                 struct kvm_lapic *target)
>   {
>   	bool ipi = source != NULL;
>   	bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);
>   
>   	if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
>   		return X2APIC_BROADCAST;
>   
>   	return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
>   }
> 
> MSI/IOxAPIC interrupt means that source is NULL and if the target is in
> x2APIC mode, the original 'dest_id' is returned as mda => a flat logical
> xAPIC to 0x0f will get interpreted as (cluster) logical x2APIC 0xf in
> kvm_apic_match_logical_addr().
> xAPIC address are only 8 bit long so they always get delivered to x2APIC
> cluster 0, where first 16 bits work like xAPIC flat logical mode.

Ok, I was wondering whether this was the correct interpretation.  Thanks!

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-17 14:24           ` Paolo Bonzini
  2015-09-17 15:58             ` Radim Krčmář
@ 2015-09-17 23:15             ` Wu, Feng
  1 sibling, 0 replies; 32+ messages in thread
From: Wu, Feng @ 2015-09-17 23:15 UTC (permalink / raw)
  To: Paolo Bonzini, alex.williamson, joro, mtosatti
  Cc: eric.auger, kvm, iommu, linux-kernel, Radim Kr?má?, Wu, Feng



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Thursday, September 17, 2015 10:25 PM
> To: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com
> Cc: eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org; Radim
> Krčmář
> Subject: Re: [PATCH v8 03/13] KVM: Define a new interface
> kvm_intr_is_single_vcpu()
> 
> >> On 17/09/2015 05:17, Wu, Feng wrote:
> >>>>>>> +	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
> >>>>>>> +		if (irq->dest_id == 0xFF)
> >>>>>>> +			goto out;
> >>>>>>> +
> >>>>>>> +		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
> >>>>>
> >>>>> Warning here is wrong, the guest can trigger it.
> >>> Could you please share more information about how the guest
> >>> triggers these conditions (including the following two), Thanks
> >>> a lot!
> >>
> >> irq->dest_id is a 16-bit value, so it can be > 255.
> >
> > Yes, irq->dest_id is defined as u32, but by looking the current KVM
> > code, seems desst_id is used as an u8 variable, even in x2apic mode
> > the dest_id will not beyond 255 (except for broadcast dest in in x2apic
> > mode). Correct me if I am wrong. Thanks a lot!
> 
> Actually you're right, the MSI destination is only 8 bits.  I was
> confused because of
> 
> #define	 MSI_ADDR_DEST_ID_MASK		0x00ffff0
> 
> in arch/x86/include/asm/msidef.h.  But there may be a bug, see below...
> 
> >>> +		if (cid >= ARRAY_SIZE(map->logical_map)) {
> >>> +			WARN_ON_ONCE(1);
> >>
> >> In x2apic mode irq->dest_id could have bits 12..15 set.
> >
> > cid is gotten from bit 16 ..31 of the ldr (in apic_logical_id()), and
> > in x2apic mode, ldr is constructed in kvm_apic_set_x2apic_id() as
> > below:
> >
> > u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
> >
> > So in fact, cid is (id >> 4), I cannot think of why cid can beyond 15.
> 
> I think kvm_apic_match_logical_addr for MSI and IOAPIC interrupts is
> buggy in x2apic mode.
> 
> It does:
> 
>         if (apic_x2apic_mode(apic))
>                 return ((logical_id >> 16) == (mda >> 16))
>                        && (logical_id & mda & 0xffff) != 0;
> 
> But mda is only 8-bits for MSI and IOAPIC interrupts.
> 
> Radim, should kvm_apic_mda also handle the !ipi && x2apic_mda && dest_id
> != APIC_BROADCAST case?  It never triggers with Linux because it uses
> only the physical mode (that's not super-easy to see;
> ioapic_set_affinity looks for the RTEs in irq_data->chip_data and that
> is allocated with kzalloc).
> 
> > Do I miss something here? Thanks!
> 
> No, you were right.
> 
> But still I think the WARNs are unnecessary; it is conceivable that some
> future chipset adds support for more than 8-bits in the dest_id.

No problem, I agree with it. Here I just want clarify some questions, thanks
for the elaboration!

Thanks,
Feng

> 
> Paolo
> 
> > Thanks,
> > Feng
> >
> >>
> >> Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-17 16:00               ` Paolo Bonzini
@ 2015-09-17 23:18                 ` Wu, Feng
  2015-09-18 16:16                   ` Radim Krčmář
  0 siblings, 1 reply; 32+ messages in thread
From: Wu, Feng @ 2015-09-17 23:18 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Kr?má?
  Cc: alex.williamson, joro, mtosatti, eric.auger, kvm, iommu,
	linux-kernel, Wu, Feng

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2194 bytes --]



> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, September 18, 2015 12:00 AM
> To: Radim Krčmář
> Cc: Wu, Feng; alex.williamson@redhat.com; joro@8bytes.org;
> mtosatti@redhat.com; eric.auger@linaro.org; kvm@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v8 03/13] KVM: Define a new interface
> kvm_intr_is_single_vcpu()
> 
> 
> 
> On 17/09/2015 17:58, Radim Krčmář wrote:
> > For interrupts from MSI and IOxAPIC:
> > - Flat logical interrupts are delivered as if we had natural
> >   (CPU0<->bit0, CPU1<->bit1, ...) flat logical xAPIC for first 8 VCPUs.
> > - Cluster logical doesn't work much, it's interpreted like flat logical.
> >   I didn't care about xAPIC cluster because Linux, the sole user of our
> >   paravirtualized x2APIC, doesn't configure it.
> >
> > I'll paste kvm_apic_mda() source for better explanation:
> >
> >   static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
> >                                                 struct kvm_lapic
> *target)
> >   {
> >   	bool ipi = source != NULL;
> >   	bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);
> >
> >   	if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
> >   		return X2APIC_BROADCAST;
> >
> >   	return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
> >   }
> >
> > MSI/IOxAPIC interrupt means that source is NULL and if the target is in
> > x2APIC mode, the original 'dest_id' is returned as mda => a flat logical
> > xAPIC to 0x0f will get interpreted as (cluster) logical x2APIC 0xf in
> > kvm_apic_match_logical_addr().
> > xAPIC address are only 8 bit long so they always get delivered to x2APIC
> > cluster 0, where first 16 bits work like xAPIC flat logical mode.
> 
> Ok, I was wondering whether this was the correct interpretation.  Thanks!

Paolo, I don't think Radim clarify your concern, right? Since mda is 8-bit, it
is wrong with mda >> 16, this is your concern, right?

Thanks,
Feng

> 
> Paolo
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-17 23:18                 ` Wu, Feng
@ 2015-09-18 16:16                   ` Radim Krčmář
  2015-09-18 16:17                     ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Radim Krčmář @ 2015-09-18 16:16 UTC (permalink / raw)
  To: Wu, Feng
  Cc: Paolo Bonzini, alex.williamson, joro, mtosatti, eric.auger, kvm,
	iommu, linux-kernel

2015-09-17 23:18+0000, Wu, Feng:
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> On 17/09/2015 17:58, Radim Krčmář wrote:
>>> xAPIC address are only 8 bit long so they always get delivered to x2APIC
>>> cluster 0, where first 16 bits work like xAPIC flat logical mode.
>> 
>> Ok, I was wondering whether this was the correct interpretation.  Thanks!
> 
> Paolo, I don't think Radim clarify your concern, right? Since mda is 8-bit, it
> is wrong with mda >> 16, this is your concern, right?

In case it was:  mda is u32 so the bitshift is defined by C.
(xAPIC destinations in KVM's x2APIC mode are stored in lowest 8 bits of
 mda, hence the cluster is always 0.)

Or am I still missing the point?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
  2015-09-18 16:16                   ` Radim Krčmář
@ 2015-09-18 16:17                     ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2015-09-18 16:17 UTC (permalink / raw)
  To: Radim Krčmář, Wu, Feng
  Cc: alex.williamson, joro, mtosatti, eric.auger, kvm, iommu, linux-kernel



On 18/09/2015 18:16, Radim Krčmář wrote:
>>> >> Ok, I was wondering whether this was the correct interpretation.  Thanks!
>> > 
>> > Paolo, I don't think Radim clarify your concern, right? Since mda is 8-bit, it
>> > is wrong with mda >> 16, this is your concern, right?
> In case it was:  mda is u32 so the bitshift is defined by C.
> (xAPIC destinations in KVM's x2APIC mode are stored in lowest 8 bits of
>  mda, hence the cluster is always 0.)
> 
> Or am I still missing the point?

Yes, remembering that the cluster is always 0 solved my doubt.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2015-09-18 16:17 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-16  8:49 [PATCH v8 00/13] Add VT-d Posted-Interrupts support Feng Wu
2015-09-16  8:49 ` [PATCH v8 01/13] KVM: Extend struct pi_desc for VT-d Posted-Interrupts Feng Wu
2015-09-16  8:49 ` [PATCH v8 02/13] KVM: Add some helper functions for Posted-Interrupts Feng Wu
2015-09-16  8:49 ` [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu() Feng Wu
2015-09-16  9:23   ` Paolo Bonzini
2015-09-17  3:17     ` Wu, Feng
2015-09-17  9:42       ` Paolo Bonzini
2015-09-17 13:36         ` Wu, Feng
2015-09-17 14:24           ` Paolo Bonzini
2015-09-17 15:58             ` Radim Krčmář
2015-09-17 16:00               ` Paolo Bonzini
2015-09-17 23:18                 ` Wu, Feng
2015-09-18 16:16                   ` Radim Krčmář
2015-09-18 16:17                     ` Paolo Bonzini
2015-09-17 23:15             ` Wu, Feng
2015-09-16  8:50 ` [PATCH v8 04/13] KVM: Make struct kvm_irq_routing_table accessible Feng Wu
2015-09-16  8:50 ` [PATCH v8 05/13] KVM: make kvm_set_msi_irq() public Feng Wu
2015-09-16  8:50 ` [PATCH v8 06/13] vfio: Register/unregister irq_bypass_producer Feng Wu
2015-09-16  8:50 ` [PATCH v8 07/13] KVM: x86: Update IRTE for posted-interrupts Feng Wu
2015-09-16  8:50 ` [PATCH v8 08/13] KVM: Implement IRQ bypass consumer callbacks for x86 Feng Wu
2015-09-16  8:50 ` [PATCH v8 09/13] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd' Feng Wu
2015-09-16  9:27   ` Paolo Bonzini
2015-09-17  1:51     ` Wu, Feng
2015-09-17  9:38       ` Paolo Bonzini
2015-09-16  8:50 ` [PATCH v8 10/13] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted Feng Wu
2015-09-16  9:29   ` Paolo Bonzini
2015-09-16  8:50 ` [PATCH v8 11/13] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked Feng Wu
2015-09-16  9:32   ` Paolo Bonzini
2015-09-16  8:50 ` [PATCH v8 12/13] KVM: Warn if 'SN' is set during posting interrupts by software Feng Wu
2015-09-16  9:32   ` Paolo Bonzini
2015-09-16  8:50 ` [PATCH v8 13/13] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts Feng Wu
2015-09-16  9:34 ` [PATCH v8 00/13] Add VT-d Posted-Interrupts support Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).