kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/14] KVM: x86: break the xAPIC barrier
@ 2016-07-12 20:09 Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 01/14] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
                   ` (13 more replies)
  0 siblings, 14 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

This series allows userspace to create and send interrupts to VCPUs with
APIC ID > 255.

v3:
[6/14] ignore APIC ID register in x2apic mode [Zhang and Paolo]
[10/14]
* use sub-feature flags -- they allow userspace to postpone enablement
* check invalid msi route in kvm_arch_set_irq_inatomic [Zhang]
[11/14] new

v2: http://www.spinics.net/lists/kvm/msg135220.html


Radim Krčmář (14):
  KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240
  KVM: x86: add kvm_apic_map_get_dest_lapic
  KVM: x86: use physical LAPIC array for logical x2APIC
  KVM: x86: dynamic kvm_apic_map
  KVM: x86: use generic function for MSI parsing
  KVM: x86: use hardware-compatible format for APIC ID register
  KVM: x86: reset APIC ID when enabling LAPIC
  KVM: VMX: optimize APIC ID read with APICv
  KVM: x86: reset lapic base in kvm_lapic_reset
  KVM: pass struct kvm to kvm_set_routing_entry
  KVM: x86: add KVM_CAP_X2APIC_API
  KVM: x86: add a flag to disable KVM x2apic broadcast quirk
  KVM: x86: bump MAX_VCPUS to 288
  KVM: x86: bump KVM_MAX_VCPU_ID to 1023

 Documentation/virtual/kvm/api.txt |  45 +++++
 arch/powerpc/kvm/mpic.c           |   3 +-
 arch/s390/kvm/interrupt.c         |   3 +-
 arch/x86/include/asm/kvm_host.h   |  19 +-
 arch/x86/kvm/irq_comm.c           |  47 +++--
 arch/x86/kvm/lapic.c              | 408 ++++++++++++++++++++------------------
 arch/x86/kvm/lapic.h              |  14 +-
 arch/x86/kvm/vmx.c                |   5 +-
 arch/x86/kvm/x86.c                |  30 ++-
 include/linux/kvm_host.h          |   3 +-
 include/trace/events/kvm.h        |   5 +-
 include/uapi/linux/kvm.h          |   4 +
 virt/kvm/irqchip.c                |   7 +-
 13 files changed, 361 insertions(+), 232 deletions(-)

-- 
2.9.0

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 01/14] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 02/14] KVM: x86: add kvm_apic_map_get_dest_lapic Radim Krčmář
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

240 has been well tested by Red Hat.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7a628fb6a2c2..53d39771842b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -35,7 +35,7 @@
 #include <asm/kvm_page_track.h>
 
 #define KVM_MAX_VCPUS 255
-#define KVM_SOFT_MAX_VCPUS 160
+#define KVM_SOFT_MAX_VCPUS 240
 #define KVM_USER_MEM_SLOTS 509
 /* memory slots that are not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 3
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 02/14] KVM: x86: add kvm_apic_map_get_dest_lapic
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 01/14] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 03/14] KVM: x86: use physical LAPIC array for logical x2APIC Radim Krčmář
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

kvm_irq_delivery_to_apic_fast and kvm_intr_is_single_vcpu_fast both
compute the interrupt destination.  Factor the code.

'struct kvm_lapic **dst = NULL' had to be added to silence GCC.
GCC might complain about potential NULL access in the future, because it
missed conditions that avoided uninitialized uses of dst.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/lapic.c | 242 ++++++++++++++++++++++-----------------------------
 1 file changed, 104 insertions(+), 138 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 22a6474af220..2987843657db 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -671,14 +671,99 @@ static void kvm_apic_disabled_lapic_found(struct kvm *kvm)
 	}
 }
 
+/* Return true if the interrupt can be handled by using *bitmap as index mask
+ * for valid destinations in *dst array.
+ * Return false if kvm_apic_map_get_dest_lapic did nothing useful.
+ * Note: we may have zero kvm_lapic destinations when we return true, which
+ * means that the interrupt should be dropped.  In this case, *bitmap would be
+ * zero and *dst undefined.
+ */
+static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
+		struct kvm_lapic **src, struct kvm_lapic_irq *irq,
+		struct kvm_apic_map *map, struct kvm_lapic ***dst,
+		unsigned long *bitmap)
+{
+	int i, lowest;
+	bool x2apic_ipi;
+	u16 cid;
+
+	if (irq->shorthand == APIC_DEST_SELF && src) {
+		*dst = src;
+		*bitmap = 1;
+		return true;
+	} else if (irq->shorthand)
+		return false;
+
+	x2apic_ipi = src && *src && apic_x2apic_mode(*src);
+	if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
+		return false;
+
+	if (!map)
+		return false;
+
+	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
+		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
+			*bitmap = 0;
+		} else {
+			*dst = &map->phys_map[irq->dest_id];
+			*bitmap = 1;
+		}
+		return true;
+	}
+
+	if (!kvm_apic_logical_map_valid(map))
+		return false;
+
+	apic_logical_id(map, irq->dest_id, &cid, (u16 *)bitmap);
+
+	if (cid >= ARRAY_SIZE(map->logical_map)) {
+		*bitmap = 0;
+		return true;
+	}
+
+	*dst = map->logical_map[cid];
+
+	if (!kvm_lowest_prio_delivery(irq))
+		return true;
+
+	if (!kvm_vector_hashing_enabled()) {
+		lowest = -1;
+		for_each_set_bit(i, bitmap, 16) {
+			if (!(*dst)[i])
+				continue;
+			if (lowest < 0)
+				lowest = i;
+			else if (kvm_apic_compare_prio((*dst)[i]->vcpu,
+						(*dst)[lowest]->vcpu) < 0)
+				lowest = i;
+		}
+	} else {
+		if (!*bitmap)
+			return true;
+
+		lowest = kvm_vector_to_index(irq->vector, hweight16(*bitmap),
+				bitmap, 16);
+
+		if (!(*dst)[lowest]) {
+			kvm_apic_disabled_lapic_found(kvm);
+			*bitmap = 0;
+			return true;
+		}
+	}
+
+	*bitmap = (lowest >= 0) ? 1 << lowest : 0;
+
+	return true;
+}
+
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map)
 {
 	struct kvm_apic_map *map;
-	unsigned long bitmap = 1;
-	struct kvm_lapic **dst;
+	unsigned long bitmap;
+	struct kvm_lapic **dst = NULL;
 	int i;
-	bool ret, x2apic_ipi;
+	bool ret;
 
 	*r = -1;
 
@@ -687,86 +772,19 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		return true;
 	}
 
-	if (irq->shorthand)
-		return false;
-
-	x2apic_ipi = src && apic_x2apic_mode(src);
-	if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
-		return false;
-
-	ret = true;
 	rcu_read_lock();
 	map = rcu_dereference(kvm->arch.apic_map);
 
-	if (!map) {
-		ret = false;
-		goto out;
-	}
-
-	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-		if (irq->dest_id >= ARRAY_SIZE(map->phys_map))
-			goto out;
-
-		dst = &map->phys_map[irq->dest_id];
-	} else {
-		u16 cid;
-
-		if (!kvm_apic_logical_map_valid(map)) {
-			ret = false;
-			goto out;
+	ret = kvm_apic_map_get_dest_lapic(kvm, &src, irq, map, &dst, &bitmap);
+	if (ret)
+		for_each_set_bit(i, &bitmap, 16) {
+			if (!dst[i])
+				continue;
+			if (*r < 0)
+				*r = 0;
+			*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
 		}
 
-		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
-
-		if (cid >= ARRAY_SIZE(map->logical_map))
-			goto out;
-
-		dst = map->logical_map[cid];
-
-		if (!kvm_lowest_prio_delivery(irq))
-			goto set_irq;
-
-		if (!kvm_vector_hashing_enabled()) {
-			int l = -1;
-			for_each_set_bit(i, &bitmap, 16) {
-				if (!dst[i])
-					continue;
-				if (l < 0)
-					l = i;
-				else if (kvm_apic_compare_prio(dst[i]->vcpu,
-							dst[l]->vcpu) < 0)
-					l = i;
-			}
-			bitmap = (l >= 0) ? 1 << l : 0;
-		} else {
-			int idx;
-			unsigned int dest_vcpus;
-
-			dest_vcpus = hweight16(bitmap);
-			if (dest_vcpus == 0)
-				goto out;
-
-			idx = kvm_vector_to_index(irq->vector,
-				dest_vcpus, &bitmap, 16);
-
-			if (!dst[idx]) {
-				kvm_apic_disabled_lapic_found(kvm);
-				goto out;
-			}
-
-			bitmap = (idx >= 0) ? 1 << idx : 0;
-		}
-	}
-
-set_irq:
-	for_each_set_bit(i, &bitmap, 16) {
-		if (!dst[i])
-			continue;
-		if (*r < 0)
-			*r = 0;
-		*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
-	}
-out:
 	rcu_read_unlock();
 	return ret;
 }
@@ -789,8 +807,9 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu)
 {
 	struct kvm_apic_map *map;
+	unsigned long bitmap;
+	struct kvm_lapic **dst = NULL;
 	bool ret = false;
-	struct kvm_lapic *dst = NULL;
 
 	if (irq->shorthand)
 		return false;
@@ -798,69 +817,16 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 	rcu_read_lock();
 	map = rcu_dereference(kvm->arch.apic_map);
 
-	if (!map)
-		goto out;
+	if (kvm_apic_map_get_dest_lapic(kvm, NULL, irq, map, &dst, &bitmap) &&
+			hweight16(bitmap) == 1) {
+		unsigned long i = find_first_bit(&bitmap, 16);
 
-	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-		if (irq->dest_id == 0xFF)
-			goto out;
-
-		if (irq->dest_id >= ARRAY_SIZE(map->phys_map))
-			goto out;
-
-		dst = map->phys_map[irq->dest_id];
-		if (dst && kvm_apic_present(dst->vcpu))
-			*dest_vcpu = dst->vcpu;
-		else
-			goto out;
-	} else {
-		u16 cid;
-		unsigned long bitmap = 1;
-		int i, r = 0;
-
-		if (!kvm_apic_logical_map_valid(map))
-			goto out;
-
-		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
-
-		if (cid >= ARRAY_SIZE(map->logical_map))
-			goto out;
-
-		if (kvm_vector_hashing_enabled() &&
-				kvm_lowest_prio_delivery(irq)) {
-			int idx;
-			unsigned int dest_vcpus;
-
-			dest_vcpus = hweight16(bitmap);
-			if (dest_vcpus == 0)
-				goto out;
-
-			idx = kvm_vector_to_index(irq->vector, dest_vcpus,
-						  &bitmap, 16);
-
-			dst = map->logical_map[cid][idx];
-			if (!dst) {
-				kvm_apic_disabled_lapic_found(kvm);
-				goto out;
-			}
-
-			*dest_vcpu = dst->vcpu;
-		} else {
-			for_each_set_bit(i, &bitmap, 16) {
-				dst = map->logical_map[cid][i];
-				if (++r == 2)
-					goto out;
-			}
-
-			if (dst && kvm_apic_present(dst->vcpu))
-				*dest_vcpu = dst->vcpu;
-			else
-				goto out;
+		if (dst[i]) {
+			*dest_vcpu = dst[i]->vcpu;
+			ret = true;
 		}
 	}
 
-	ret = true;
-out:
 	rcu_read_unlock();
 	return ret;
 }
-- 
2.9.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 03/14] KVM: x86: use physical LAPIC array for logical x2APIC
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 01/14] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 02/14] KVM: x86: add kvm_apic_map_get_dest_lapic Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-08-02 13:46   ` Wanpeng Li
  2016-07-12 20:09 ` [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map Radim Krčmář
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

Logical x2APIC IDs map injectively to physical x2APIC IDs, so we can
reuse the physical array for them.  This allows us to save space by
separating logical xAPIC maps.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  6 ++--
 arch/x86/kvm/lapic.c            | 69 +++++++++++++++++++++--------------------
 2 files changed, 39 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 53d39771842b..3194b19b9c7b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -683,8 +683,10 @@ struct kvm_apic_map {
 	struct rcu_head rcu;
 	u8 mode;
 	struct kvm_lapic *phys_map[256];
-	/* first index is cluster id second is cpu id in a cluster */
-	struct kvm_lapic *logical_map[16][16];
+	union {
+		struct kvm_lapic *xapic_flat_map[8];
+		struct kvm_lapic *xapic_cluster_map[16][4];
+	};
 };
 
 /* Hyper-V emulation context */
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2987843657db..9880d03f533d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -115,26 +115,36 @@ static inline int apic_enabled(struct kvm_lapic *apic)
 	(LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
 	 APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
 
-/* The logical map is definitely wrong if we have multiple
- * modes at the same time.  (Physical map is always right.)
- */
-static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
-{
-	return !(map->mode & (map->mode - 1));
-}
+static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
+		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
+	switch (map->mode) {
+	case KVM_APIC_MODE_X2APIC: {
+		u32 offset = (dest_id >> 16) * 16;
+		u32 max_apic_id = ARRAY_SIZE(map->phys_map) - 1;
 
-static inline void
-apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
-{
-	unsigned lid_bits;
+		if (offset <= max_apic_id) {
+			u8 cluster_size = min(max_apic_id - offset + 1, 16U);
 
-	BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_CLUSTER !=  4);
-	BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_FLAT    !=  8);
-	BUILD_BUG_ON(KVM_APIC_MODE_X2APIC        != 16);
-	lid_bits = map->mode;
+			*cluster = &map->phys_map[offset];
+			*mask = dest_id & (0xffff >> (16 - cluster_size));
+		} else {
+			*mask = 0;
+		}
 
-	*cid = dest_id >> lid_bits;
-	*lid = dest_id & ((1 << lid_bits) - 1);
+		return true;
+		}
+	case KVM_APIC_MODE_XAPIC_FLAT:
+		*cluster = map->xapic_flat_map;
+		*mask = dest_id & 0xff;
+		return true;
+	case KVM_APIC_MODE_XAPIC_CLUSTER:
+		*cluster = map->xapic_cluster_map[dest_id >> 4];
+		*mask = dest_id & 0xf;
+		return true;
+	default:
+		/* Not optimized. */
+		return false;
+	}
 }
 
 static void recalculate_apic_map(struct kvm *kvm)
@@ -152,7 +162,8 @@ static void recalculate_apic_map(struct kvm *kvm)
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvm_lapic *apic = vcpu->arch.apic;
-		u16 cid, lid;
+		struct kvm_lapic **cluster;
+		u16 mask;
 		u32 ldr, aid;
 
 		if (!kvm_apic_present(vcpu))
@@ -174,13 +185,11 @@ static void recalculate_apic_map(struct kvm *kvm)
 				new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
 		}
 
-		if (!kvm_apic_logical_map_valid(new))
+		if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
 			continue;
 
-		apic_logical_id(new, ldr, &cid, &lid);
-
-		if (lid && cid < ARRAY_SIZE(new->logical_map))
-			new->logical_map[cid][ffs(lid) - 1] = apic;
+		if (mask)
+			cluster[ffs(mask) - 1] = apic;
 	}
 out:
 	old = rcu_dereference_protected(kvm->arch.apic_map,
@@ -685,7 +694,6 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 {
 	int i, lowest;
 	bool x2apic_ipi;
-	u16 cid;
 
 	if (irq->shorthand == APIC_DEST_SELF && src) {
 		*dst = src;
@@ -711,18 +719,11 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 		return true;
 	}
 
-	if (!kvm_apic_logical_map_valid(map))
+	*bitmap = 0;
+	if (!kvm_apic_map_get_logical_dest(map, irq->dest_id, dst,
+				(u16 *)bitmap))
 		return false;
 
-	apic_logical_id(map, irq->dest_id, &cid, (u16 *)bitmap);
-
-	if (cid >= ARRAY_SIZE(map->logical_map)) {
-		*bitmap = 0;
-		return true;
-	}
-
-	*dst = map->logical_map[cid];
-
 	if (!kvm_lowest_prio_delivery(irq))
 		return true;
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (2 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 03/14] KVM: x86: use physical LAPIC array for logical x2APIC Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-13  8:29   ` Paolo Bonzini
  2016-08-02 11:39   ` Wanpeng Li
  2016-07-12 20:09 ` [PATCH v3 05/14] KVM: x86: use generic function for MSI parsing Radim Krčmář
                   ` (9 subsequent siblings)
  13 siblings, 2 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will
have slighly less VCPUs.  Dynamic size saves memory at the cost of
turning one constant into a variable.

apic_map mutex had to be moved before allocation to avoid races with cpu
hotplug.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/lapic.c            | 18 +++++++++++++-----
 arch/x86/kvm/lapic.h            |  2 +-
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3194b19b9c7b..643e3dffcd85 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -682,11 +682,12 @@ struct kvm_arch_memory_slot {
 struct kvm_apic_map {
 	struct rcu_head rcu;
 	u8 mode;
-	struct kvm_lapic *phys_map[256];
+	u32 max_apic_id;
 	union {
 		struct kvm_lapic *xapic_flat_map[8];
 		struct kvm_lapic *xapic_cluster_map[16][4];
 	};
+	struct kvm_lapic *phys_map[];
 };
 
 /* Hyper-V emulation context */
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9880d03f533d..224fc1c5fcc6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -120,7 +120,7 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
 	switch (map->mode) {
 	case KVM_APIC_MODE_X2APIC: {
 		u32 offset = (dest_id >> 16) * 16;
-		u32 max_apic_id = ARRAY_SIZE(map->phys_map) - 1;
+		u32 max_apic_id = map->max_apic_id;
 
 		if (offset <= max_apic_id) {
 			u8 cluster_size = min(max_apic_id - offset + 1, 16U);
@@ -152,14 +152,22 @@ static void recalculate_apic_map(struct kvm *kvm)
 	struct kvm_apic_map *new, *old = NULL;
 	struct kvm_vcpu *vcpu;
 	int i;
-
-	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
+	u32 max_id = 255;
 
 	mutex_lock(&kvm->arch.apic_map_lock);
 
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		if (kvm_apic_present(vcpu))
+			max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
+
+	new = kzalloc(sizeof(struct kvm_apic_map) +
+	              sizeof(struct kvm_lapic *) * (max_id + 1), GFP_KERNEL);
+
 	if (!new)
 		goto out;
 
+	new->max_apic_id = max_id;
+
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvm_lapic *apic = vcpu->arch.apic;
 		struct kvm_lapic **cluster;
@@ -172,7 +180,7 @@ static void recalculate_apic_map(struct kvm *kvm)
 		aid = kvm_apic_id(apic);
 		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
 
-		if (aid < ARRAY_SIZE(new->phys_map))
+		if (aid <= new->max_apic_id)
 			new->phys_map[aid] = apic;
 
 		if (apic_x2apic_mode(apic)) {
@@ -710,7 +718,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 		return false;
 
 	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
+		if (irq->dest_id > map->max_apic_id) {
 			*bitmap = 0;
 		} else {
 			*dst = &map->phys_map[irq->dest_id];
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 336ba51bb16e..8d811139d2b3 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -200,7 +200,7 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 	return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
 }
 
-static inline int kvm_apic_id(struct kvm_lapic *apic)
+static inline u32 kvm_apic_id(struct kvm_lapic *apic)
 {
 	return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 05/14] KVM: x86: use generic function for MSI parsing
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (3 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 06/14] KVM: x86: use hardware-compatible format for APIC ID register Radim Krčmář
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/irq_comm.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index dfb4c6476877..47ad681a33fd 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -388,21 +388,16 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 			       kvm->arch.nr_reserved_ioapic_pins);
 	for (i = 0; i < nr_ioapic_pins; ++i) {
 		hlist_for_each_entry(entry, &table->map[i], link) {
-			u32 dest_id, dest_mode;
-			bool level;
+			struct kvm_lapic_irq irq;
 
 			if (entry->type != KVM_IRQ_ROUTING_MSI)
 				continue;
-			dest_id = (entry->msi.address_lo >> 12) & 0xff;
-			dest_mode = (entry->msi.address_lo >> 2) & 0x1;
-			level = entry->msi.data & MSI_DATA_TRIGGER_LEVEL;
-			if (level && kvm_apic_match_dest(vcpu, NULL, 0,
-						dest_id, dest_mode)) {
-				u32 vector = entry->msi.data & 0xff;
 
-				__set_bit(vector,
-					  ioapic_handled_vectors);
-			}
+			kvm_set_msi_irq(entry, &irq);
+
+			if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
+						irq.dest_id, irq.dest_mode))
+				__set_bit(irq.vector, ioapic_handled_vectors);
 		}
 	}
 	srcu_read_unlock(&kvm->irq_srcu, idx);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 06/14] KVM: x86: use hardware-compatible format for APIC ID register
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (4 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 05/14] KVM: x86: use generic function for MSI parsing Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 07/14] KVM: x86: reset APIC ID when enabling LAPIC Radim Krčmář
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

We currently always shift APIC ID as if APIC was in xAPIC mode.
x2APIC mode wants to use more bits and storing a hardware-compabible
value is the the sanest option.

KVM API to set the lapic expects that bottom 8 bits of APIC ID are in
top 8 bits of APIC_ID register, so the register needs to be shifted in
x2APIC mode.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v3: ignore APIC ID register in x2apic mode [Zhang and Paolo]

 arch/x86/kvm/lapic.c | 52 +++++++++++++++++++++++++++++++++++++---------------
 arch/x86/kvm/lapic.h | 12 +++++++++---
 arch/x86/kvm/x86.c   | 10 ++++++----
 3 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 224fc1c5fcc6..41089dbeeafc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -227,7 +227,7 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
 	}
 }
 
-static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
+static inline void kvm_apic_set_xapic_id(struct kvm_lapic *apic, u8 id)
 {
 	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
 	recalculate_apic_map(apic->vcpu->kvm);
@@ -239,11 +239,11 @@ static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id)
 	recalculate_apic_map(apic->vcpu->kvm);
 }
 
-static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u8 id)
+static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u32 id)
 {
 	u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
 
-	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
+	kvm_lapic_set_reg(apic, APIC_ID, id);
 	kvm_lapic_set_reg(apic, APIC_LDR, ldr);
 	recalculate_apic_map(apic->vcpu->kvm);
 }
@@ -1102,12 +1102,6 @@ static u32 __apic_read(struct kvm_lapic *apic, unsigned int offset)
 		return 0;
 
 	switch (offset) {
-	case APIC_ID:
-		if (apic_x2apic_mode(apic))
-			val = kvm_apic_id(apic);
-		else
-			val = kvm_apic_id(apic) << 24;
-		break;
 	case APIC_ARBPRI:
 		apic_debug("Access APIC ARBPRI register which is for P6\n");
 		break;
@@ -1465,7 +1459,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 	switch (reg) {
 	case APIC_ID:		/* Local APIC ID */
 		if (!apic_x2apic_mode(apic))
-			kvm_apic_set_id(apic, val >> 24);
+			kvm_apic_set_xapic_id(apic, val >> 24);
 		else
 			ret = 1;
 		break;
@@ -1769,7 +1763,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 	hrtimer_cancel(&apic->lapic_timer.timer);
 
 	if (!init_event)
-		kvm_apic_set_id(apic, vcpu->vcpu_id);
+		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
 	kvm_apic_set_version(apic->vcpu);
 
 	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
@@ -1990,17 +1984,43 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 	return vector;
 }
 
-void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
-		struct kvm_lapic_state *s)
+static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
+		struct kvm_lapic_state *s, bool set)
+{
+	if (apic_x2apic_mode(vcpu->arch.apic)) {
+		u32 *id = (u32 *)(s->regs + APIC_ID);
+
+		if (set)
+			*id >>= 24;
+		else
+			*id <<= 24;
+	}
+
+	return 0;
+}
+
+int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
+{
+	memcpy(s->regs, vcpu->arch.apic->regs, sizeof(*s));
+	return kvm_apic_state_fixup(vcpu, s, false);
+}
+
+int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	int r;
+
 
 	kvm_lapic_set_base(vcpu, vcpu->arch.apic_base);
 	/* set SPIV separately to get count of SW disabled APICs right */
 	apic_set_spiv(apic, *((u32 *)(s->regs + APIC_SPIV)));
+
+	r = kvm_apic_state_fixup(vcpu, s, true);
+	if (r)
+		return r;
 	memcpy(vcpu->arch.apic->regs, s->regs, sizeof *s);
-	/* call kvm_apic_set_id() to put apic into apic_map */
-	kvm_apic_set_id(apic, kvm_apic_id(apic));
+
+	recalculate_apic_map(vcpu->kvm);
 	kvm_apic_set_version(vcpu);
 
 	apic_update_ppr(apic);
@@ -2026,6 +2046,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 		kvm_rtc_eoi_tracking_restore_one(vcpu);
 
 	vcpu->arch.apic_arb_prio = 0;
+
+	return 0;
 }
 
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 8d811139d2b3..f60d01c29d51 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -81,8 +81,8 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 
 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu);
 int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
-void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
-		struct kvm_lapic_state *s);
+int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
+int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
 
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
@@ -202,7 +202,13 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 
 static inline u32 kvm_apic_id(struct kvm_lapic *apic)
 {
-	return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
+	/* To avoid a race between apic_base and following APIC_ID update when
+	 * switching to x2apic_mode, the x2apic mode returns initial x2apic id.
+	 */
+	if (apic_x2apic_mode(apic))
+		return apic->vcpu->vcpu_id;
+
+	return kvm_lapic_get_reg(apic, APIC_ID) >> 24;
 }
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0cc6cf834cdd..b178c8c12717 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2779,15 +2779,17 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 	if (vcpu->arch.apicv_active)
 		kvm_x86_ops->sync_pir_to_irr(vcpu);
 
-	memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
-
-	return 0;
+	return kvm_apic_get_state(vcpu, s);
 }
 
 static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
 				    struct kvm_lapic_state *s)
 {
-	kvm_apic_post_state_restore(vcpu, s);
+	int r;
+
+	r = kvm_apic_set_state(vcpu, s);
+	if (r)
+		return r;
 	update_cr8_intercept(vcpu);
 
 	return 0;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 07/14] KVM: x86: reset APIC ID when enabling LAPIC
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (5 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 06/14] KVM: x86: use hardware-compatible format for APIC ID register Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-08-02 13:26   ` Wanpeng Li
  2016-07-12 20:09 ` [PATCH v3 08/14] KVM: VMX: optimize APIC ID read with APICv Radim Krčmář
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

APIC ID should be set to the initial APIC ID when enabling LAPIC.
This only matters if the guest changes APIC ID.  No sane OS does that.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/lapic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 41089dbeeafc..0fce77fdbe91 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1720,9 +1720,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 
 	/* update jump label if enable bit changes */
 	if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE) {
-		if (value & MSR_IA32_APICBASE_ENABLE)
+		if (value & MSR_IA32_APICBASE_ENABLE) {
+			kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
 			static_key_slow_dec_deferred(&apic_hw_disabled);
-		else
+		} else
 			static_key_slow_inc(&apic_hw_disabled.key);
 		recalculate_apic_map(vcpu->kvm);
 	}
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 08/14] KVM: VMX: optimize APIC ID read with APICv
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (6 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 07/14] KVM: x86: reset APIC ID when enabling LAPIC Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 09/14] KVM: x86: reset lapic base in kvm_lapic_reset Radim Krčmář
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

The register is in hardware-compatible format now, so there is not need
to intercept.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e51503063181..24b505fda7ad 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6456,9 +6456,6 @@ static __init int hardware_setup(void)
 	for (msr = 0x800; msr <= 0x8ff; msr++)
 		vmx_disable_intercept_msr_read_x2apic(msr);
 
-	/* According SDM, in x2apic mode, the whole id reg is used.  But in
-	 * KVM, it only use the highest eight bits. Need to intercept it */
-	vmx_enable_intercept_msr_read_x2apic(0x802);
 	/* TMCCT */
 	vmx_enable_intercept_msr_read_x2apic(0x839);
 	/* TPR */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 09/14] KVM: x86: reset lapic base in kvm_lapic_reset
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (7 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 08/14] KVM: VMX: optimize APIC ID read with APICv Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 10/14] KVM: pass struct kvm to kvm_set_routing_entry Radim Krčmář
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

LAPIC is reset in xAPIC mode and the surrounding code expects that.
KVM never resets after initialization.  This patch is just for sanity.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/lapic.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0fce77fdbe91..3c2a8c113054 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1763,8 +1763,11 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 	/* Stop the timer in case it's a reset to an active apic */
 	hrtimer_cancel(&apic->lapic_timer.timer);
 
-	if (!init_event)
+	if (!init_event) {
+		kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE |
+		                         MSR_IA32_APICBASE_ENABLE);
 		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
+	}
 	kvm_apic_set_version(apic->vcpu);
 
 	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
@@ -1903,9 +1906,6 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 	 * thinking that APIC satet has changed.
 	 */
 	vcpu->arch.apic_base = MSR_IA32_APICBASE_ENABLE;
-	kvm_lapic_set_base(vcpu,
-			APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
-
 	static_key_slow_inc(&apic_sw_disabled.key); /* sw disabled at reset */
 	kvm_lapic_reset(vcpu, false);
 	kvm_iodevice_init(&apic->dev, &apic_mmio_ops);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 10/14] KVM: pass struct kvm to kvm_set_routing_entry
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (8 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 09/14] KVM: x86: reset lapic base in kvm_lapic_reset Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 11/14] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

Arch-specific code will use it.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/powerpc/kvm/mpic.c   | 3 ++-
 arch/s390/kvm/interrupt.c | 3 ++-
 arch/x86/kvm/irq_comm.c   | 3 ++-
 include/linux/kvm_host.h  | 3 ++-
 virt/kvm/irqchip.c        | 7 ++++---
 5 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 6249cdc834d1..ed38f8114118 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -1823,7 +1823,8 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 	return 0;
 }
 
-int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
+int kvm_set_routing_entry(struct kvm *kvm,
+			  struct kvm_kernel_irq_routing_entry *e,
 			  const struct kvm_irq_routing_entry *ue)
 {
 	int r = -EINVAL;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index ca19627779db..24524c0f3ef8 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2246,7 +2246,8 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
 	return ret;
 }
 
-int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
+int kvm_set_routing_entry(struct kvm *kvm,
+			  struct kvm_kernel_irq_routing_entry *e,
 			  const struct kvm_irq_routing_entry *ue)
 {
 	int ret;
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 47ad681a33fd..889563d50c55 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -248,7 +248,8 @@ static int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e,
 	return kvm_hv_synic_set_irq(kvm, e->hv_sint.vcpu, e->hv_sint.sint);
 }
 
-int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
+int kvm_set_routing_entry(struct kvm *kvm,
+			  struct kvm_kernel_irq_routing_entry *e,
 			  const struct kvm_irq_routing_entry *ue)
 {
 	int r = -EINVAL;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 66b2f6159aad..60d339faa94c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1011,7 +1011,8 @@ int kvm_set_irq_routing(struct kvm *kvm,
 			const struct kvm_irq_routing_entry *entries,
 			unsigned nr,
 			unsigned flags);
-int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
+int kvm_set_routing_entry(struct kvm *kvm,
+			  struct kvm_kernel_irq_routing_entry *e,
 			  const struct kvm_irq_routing_entry *ue);
 void kvm_free_irq_routing(struct kvm *kvm);
 
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 8db197bb6c7a..df99e9c3b64d 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -135,7 +135,8 @@ void kvm_free_irq_routing(struct kvm *kvm)
 	free_irq_routing_table(rt);
 }
 
-static int setup_routing_entry(struct kvm_irq_routing_table *rt,
+static int setup_routing_entry(struct kvm *kvm,
+			       struct kvm_irq_routing_table *rt,
 			       struct kvm_kernel_irq_routing_entry *e,
 			       const struct kvm_irq_routing_entry *ue)
 {
@@ -154,7 +155,7 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,
 
 	e->gsi = ue->gsi;
 	e->type = ue->type;
-	r = kvm_set_routing_entry(e, ue);
+	r = kvm_set_routing_entry(kvm, e, ue);
 	if (r)
 		goto out;
 	if (e->type == KVM_IRQ_ROUTING_IRQCHIP)
@@ -211,7 +212,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
 			kfree(e);
 			goto out;
 		}
-		r = setup_routing_entry(new, e, ue);
+		r = setup_routing_entry(kvm, new, e, ue);
 		if (r) {
 			kfree(e);
 			goto out;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 11/14] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (9 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 10/14] KVM: pass struct kvm to kvm_set_routing_entry Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-13  8:41   ` Paolo Bonzini
  2016-07-12 20:09 ` [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk Radim Krčmář
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

KVM_CAP_X2APIC_API is a capability for features related to x2APIC
enablement.  KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
Both are needed to support x2APIC.

The feature has to be enableable and disabled by default, because
get/set ioctl shifted and truncated APIC ID to 8 bits by using a
non-standard protocol inspired by xAPIC and the change is not
backward-compatible.

Changes to MSI addresses follow the format used by interrupt remapping
unit.  The upper address word, that used to be 0, contains upper 24 bits
of the LAPIC address in its upper 24 bits.  Lower 8 bits are reserved as
0.  Using the upper address word is not backward-compatible either as we
didn't check that userspace zeroed the word.  Reserved bits are still
not explicitly checked, but non-zero data will affect LAPIC addresses,
which will cause a bug.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v3:
 * use sub-feature flags -- they allow userspace to postpone enablement
   (helps a bit in the [12/14] patch, because QEMU intitializes kvm
    before before iommu.)
 * check invalid msi route in kvm_arch_set_irq_inatomic [Zhang]

 Documentation/virtual/kvm/api.txt | 40 +++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/kvm_host.h   |  4 +++-
 arch/x86/kvm/irq_comm.c           | 29 +++++++++++++++++++++++-----
 arch/x86/kvm/lapic.c              | 13 +++++++++----
 arch/x86/kvm/vmx.c                |  2 +-
 arch/x86/kvm/x86.c                | 15 +++++++++++++++
 include/trace/events/kvm.h        |  5 +++--
 include/uapi/linux/kvm.h          |  3 +++
 8 files changed, 98 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 09efa9eb3926..10e2bf903e57 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1482,6 +1482,11 @@ struct kvm_irq_routing_msi {
 	__u32 pad;
 };
 
+On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_FORMAT
+feature of KVM_CAP_X2APIC_API capability is enabled.  If it is enabled,
+address_hi bits 31-8 provide bits 31-8 of the destination id.  Bits 7-0 of
+address_hi must be zero.
+
 struct kvm_irq_routing_s390_adapter {
 	__u64 ind_addr;
 	__u64 summary_addr;
@@ -1583,6 +1588,16 @@ struct kvm_lapic_state {
 Reads the Local APIC registers and copies them into the input argument.  The
 data format and layout are the same as documented in the architecture manual.
 
+If KVM_X2APIC_API_USE_32BIT_FORMAT feature of KVM_CAP_X2APIC_API is enabled,
+then the format of APIC_ID register depends on the APIC mode (reported by
+MSR_IA32_APICBASE) of its VCPU.  x2APIC stores APIC ID in the APIC_ID register
+(bytes 32-35).  xAPIC only allows an 8-bit APIC ID which is stored in bits
+31-24 of the APIC register, or equivalently in byte 35 of struct
+kvm_lapic_state's regs field.
+
+If KVM_X2APIC_API_USE_32BIT_FORMAT feature is disabled, struct kvm_lapic_state
+always uses xAPIC format.
+
 
 4.58 KVM_SET_LAPIC
 
@@ -1600,6 +1615,10 @@ struct kvm_lapic_state {
 Copies the input argument into the Local APIC registers.  The data format
 and layout are the same as documented in the architecture manual.
 
+The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
+regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
+See the note in KVM_GET_LAPIC.
+
 
 4.59 KVM_IOEVENTFD
 
@@ -2180,6 +2199,10 @@ struct kvm_msi {
 
 No flags are defined so far. The corresponding field must be 0.
 
+On x86, address_hi is ignored unless the KVM_CAP_X2APIC_API capability is
+enabled.  If it is enabled, address_hi bits 31-8 provide bits 31-8 of the
+destination id.  Bits 7-0 of address_hi must be zero.
+
 
 4.71 KVM_CREATE_PIT2
 
@@ -3811,6 +3834,23 @@ Allows use of runtime-instrumentation introduced with zEC12 processor.
 Will return -EINVAL if the machine does not support runtime-instrumentation.
 Will return -EBUSY if a VCPU has already been created.
 
+7.7 KVM_CAP_X2APIC_API
+
+Architectures: x86
+Parameters: args[0] - features that should be enabled
+Returns: 0 on success, -EINVAL when args[0] contains invalid features
+
+Valid feature flags in args[0] are
+
+#define KVM_X2APIC_API_USE_32BIT_FORMAT         (1ULL << 0)
+
+Enabling KVM_X2APIC_API_USE_32BIT_FORMAT changes the behavior of
+KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC, allowing
+the use of 32-bit APIC IDs.  See KVM_CAP_X2APIC_API in their respective
+sections.
+
+
+
 8. Other capabilities.
 ----------------------
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 643e3dffcd85..f13522f85a1d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -782,6 +782,8 @@ struct kvm_arch {
 	u32 ldr_mode;
 	struct page *avic_logical_id_table_page;
 	struct page *avic_physical_id_table_page;
+
+	bool x2apic_format;
 };
 
 struct kvm_vm_stat {
@@ -1364,7 +1366,7 @@ bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
 bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			     struct kvm_vcpu **dest_vcpu);
 
-void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
 		     struct kvm_lapic_irq *irq);
 
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 889563d50c55..25810b144b58 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -110,13 +110,17 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	return r;
 }
 
-void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
 		     struct kvm_lapic_irq *irq)
 {
-	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
+	trace_kvm_msi_set_irq(e->msi.address_lo | (kvm->arch.x2apic_format ?
+	                                     (u64)e->msi.address_hi << 32 : 0),
+	                      e->msi.data);
 
 	irq->dest_id = (e->msi.address_lo &
 			MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+	if (kvm->arch.x2apic_format)
+		irq->dest_id |= MSI_ADDR_EXT_DEST_ID(e->msi.address_hi);
 	irq->vector = (e->msi.data &
 			MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
 	irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
@@ -129,15 +133,24 @@ void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
 }
 EXPORT_SYMBOL_GPL(kvm_set_msi_irq);
 
+static inline bool kvm_msi_route_invalid(struct kvm *kvm,
+		struct kvm_kernel_irq_routing_entry *e)
+{
+	return kvm->arch.x2apic_format && (e->msi.address_hi & 0xff);
+}
+
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 		struct kvm *kvm, int irq_source_id, int level, bool line_status)
 {
 	struct kvm_lapic_irq irq;
 
+	if (kvm_msi_route_invalid(kvm, e))
+		return -EINVAL;
+
 	if (!level)
 		return -1;
 
-	kvm_set_msi_irq(e, &irq);
+	kvm_set_msi_irq(kvm, e, &irq);
 
 	return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
 }
@@ -153,7 +166,10 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
 	if (unlikely(e->type != KVM_IRQ_ROUTING_MSI))
 		return -EWOULDBLOCK;
 
-	kvm_set_msi_irq(e, &irq);
+	if (kvm_msi_route_invalid(kvm, e))
+		return -EINVAL;
+
+	kvm_set_msi_irq(kvm, e, &irq);
 
 	if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r, NULL))
 		return r;
@@ -286,6 +302,9 @@ int kvm_set_routing_entry(struct kvm *kvm,
 		e->msi.address_lo = ue->u.msi.address_lo;
 		e->msi.address_hi = ue->u.msi.address_hi;
 		e->msi.data = ue->u.msi.data;
+
+		if (kvm_msi_route_invalid(kvm, e))
+			goto out;
 		break;
 	case KVM_IRQ_ROUTING_HV_SINT:
 		e->set = kvm_hv_set_sint;
@@ -394,7 +413,7 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 			if (entry->type != KVM_IRQ_ROUTING_MSI)
 				continue;
 
-			kvm_set_msi_irq(entry, &irq);
+			kvm_set_msi_irq(vcpu->kvm, entry, &irq);
 
 			if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
 						irq.dest_id, irq.dest_mode))
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3c2a8c113054..d27a7829a4ce 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1991,10 +1991,15 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 	if (apic_x2apic_mode(vcpu->arch.apic)) {
 		u32 *id = (u32 *)(s->regs + APIC_ID);
 
-		if (set)
-			*id >>= 24;
-		else
-			*id <<= 24;
+		if (vcpu->kvm->arch.x2apic_format) {
+			if (*id != vcpu->vcpu_id)
+				return -EINVAL;
+		} else {
+			if (set)
+				*id >>= 24;
+			else
+				*id <<= 24;
+		}
 	}
 
 	return 0;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 24b505fda7ad..a12c239eb732 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11097,7 +11097,7 @@ static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 		 * We will support full lowest-priority interrupt later.
 		 */
 
-		kvm_set_msi_irq(e, &irq);
+		kvm_set_msi_irq(kvm, e, &irq);
 		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
 			/*
 			 * Make sure the IRTE is in remapped mode if
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b178c8c12717..330276315daa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -90,6 +90,8 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 #define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
 #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
 
+#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_FORMAT)
+
 static void update_cr8_intercept(struct kvm_vcpu *vcpu);
 static void process_nmi(struct kvm_vcpu *vcpu);
 static void enter_smm(struct kvm_vcpu *vcpu);
@@ -2625,6 +2627,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_TSC_CONTROL:
 		r = kvm_has_tsc_control;
 		break;
+	case KVM_CAP_X2APIC_API:
+		r = KVM_X2APIC_API_VALID_FLAGS;
+		break;
 	default:
 		r = 0;
 		break;
@@ -3799,6 +3804,16 @@ split_irqchip_unlock:
 		mutex_unlock(&kvm->lock);
 		break;
 	}
+	case KVM_CAP_X2APIC_API:
+		r = -EINVAL;
+		if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
+			break;
+
+		if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_FORMAT)
+			kvm->arch.x2apic_format = true;
+
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index f28292d73ddb..8ade3eb6c640 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -151,8 +151,9 @@ TRACE_EVENT(kvm_msi_set_irq,
 		__entry->data		= data;
 	),
 
-	TP_printk("dst %u vec %u (%s|%s|%s%s)",
-		  (u8)(__entry->address >> 12), (u8)__entry->data,
+	TP_printk("dst %llx vec %u (%s|%s|%s%s)",
+		  (u8)(__entry->address >> 12) | ((__entry->address >> 32) & 0xffffff00),
+		  (u8)__entry->data,
 		  __print_symbolic((__entry->data >> 8 & 0x7), kvm_deliver_mode),
 		  (__entry->address & (1<<2)) ? "logical" : "physical",
 		  (__entry->data & (1<<15)) ? "level" : "edge",
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 05ebf475104c..b6de3febabaa 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_ARM_PMU_V3 126
 #define KVM_CAP_VCPU_ATTRIBUTES 127
 #define KVM_CAP_MAX_VCPU_ID 128
+#define KVM_CAP_X2APIC_API 129
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1313,4 +1314,6 @@ struct kvm_assigned_msix_entry {
 	__u16 padding[3];
 };
 
+#define KVM_X2APIC_API_USE_32BIT_FORMAT         (1ULL << 0)
+
 #endif /* __LINUX_KVM_H */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (10 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 11/14] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-13  8:38   ` Paolo Bonzini
  2016-07-13 10:15   ` Paolo Bonzini
  2016-07-12 20:09 ` [PATCH v3 13/14] KVM: x86: bump MAX_VCPUS to 288 Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 14/14] KVM: x86: bump KVM_MAX_VCPU_ID to 1023 Radim Krčmář
  13 siblings, 2 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
KVM_CAP_X2APIC_API.

The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
The enableable capability is needed in order to support standard x2APIC and
remain backward compatible.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v3: new

 Documentation/virtual/kvm/api.txt |  5 +++++
 arch/x86/include/asm/kvm_host.h   |  1 +
 arch/x86/kvm/lapic.c              | 40 +++++++++++++++++++++++++++++----------
 arch/x86/kvm/x86.c                |  5 ++++-
 include/uapi/linux/kvm.h          |  1 +
 5 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 10e2bf903e57..c2753ff4a499 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3843,12 +3843,17 @@ Returns: 0 on success, -EINVAL when args[0] contains invalid features
 Valid feature flags in args[0] are
 
 #define KVM_X2APIC_API_USE_32BIT_FORMAT         (1ULL << 0)
+#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
 
 Enabling KVM_X2APIC_API_USE_32BIT_FORMAT changes the behavior of
 KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC, allowing
 the use of 32-bit APIC IDs.  See KVM_CAP_X2APIC_API in their respective
 sections.
 
+KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK must be enabled for x2APIC to work in
+logical mode or with more than 255 VCPUs.  A KVM quirk treats 0xff as a
+broadcast even in x2APIC mode (a consequence of allowing physical x2APIC
+without interrupt remapping).
 
 
 8. Other capabilities.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f13522f85a1d..25bf9d627c15 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -784,6 +784,7 @@ struct kvm_arch {
 	struct page *avic_physical_id_table_page;
 
 	bool x2apic_format;
+	bool x2apic_broadcast_quirk_disabled;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d27a7829a4ce..46f1d1c93420 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -619,14 +619,17 @@ static bool kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
 /* KVM APIC implementation has two quirks
  *  - dest always begins at 0 while xAPIC MDA has offset 24,
  *  - IOxAPIC messages have to be delivered (directly) to x2APIC.
+ *
+ * The broadcast quirk can be disabled with KVM_CAP_X2APIC_API.
  */
-static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
-                                              struct kvm_lapic *target)
+static u32 kvm_apic_mda(struct kvm_vcpu *vcpu, unsigned int dest_id,
+		struct kvm_lapic *source, struct kvm_lapic *target)
 {
 	bool ipi = source != NULL;
 	bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);
 
-	if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
+	if (!vcpu->kvm->arch.x2apic_broadcast_quirk_disabled &&
+	    !ipi && dest_id == APIC_BROADCAST && x2apic_mda)
 		return X2APIC_BROADCAST;
 
 	return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
@@ -636,7 +639,7 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 			   int short_hand, unsigned int dest, int dest_mode)
 {
 	struct kvm_lapic *target = vcpu->arch.apic;
-	u32 mda = kvm_apic_mda(dest, source, target);
+	u32 mda = kvm_apic_mda(vcpu, dest, source, target);
 
 	apic_debug("target %p, source %p, dest 0x%x, "
 		   "dest_mode 0x%x, short_hand 0x%x\n",
@@ -688,6 +691,28 @@ static void kvm_apic_disabled_lapic_found(struct kvm *kvm)
 	}
 }
 
+static bool kvm_apic_map_need_slowpath(struct kvm *kvm, struct kvm_lapic **src,
+		struct kvm_lapic_irq *irq, struct kvm_apic_map *map)
+{
+	if (!map)
+		return true;
+
+	if (kvm->arch.x2apic_broadcast_quirk_disabled) {
+		if ((irq->dest_id == APIC_BROADCAST &&
+				map->mode != KVM_APIC_MODE_X2APIC))
+			return true;
+		if (irq->dest_id == X2APIC_BROADCAST)
+			return true;
+	} else {
+		bool x2apic_ipi = src && *src && apic_x2apic_mode(*src);
+		if (irq->dest_id == (x2apic_ipi ?
+		                     X2APIC_BROADCAST : APIC_BROADCAST))
+			return true;
+	}
+
+	return false;
+}
+
 /* Return true if the interrupt can be handled by using *bitmap as index mask
  * for valid destinations in *dst array.
  * Return false if kvm_apic_map_get_dest_lapic did nothing useful.
@@ -701,7 +726,6 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 		unsigned long *bitmap)
 {
 	int i, lowest;
-	bool x2apic_ipi;
 
 	if (irq->shorthand == APIC_DEST_SELF && src) {
 		*dst = src;
@@ -710,11 +734,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 	} else if (irq->shorthand)
 		return false;
 
-	x2apic_ipi = src && *src && apic_x2apic_mode(*src);
-	if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
-		return false;
-
-	if (!map)
+	if (kvm_apic_map_need_slowpath(kvm, src, irq, map))
 		return false;
 
 	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 330276315daa..4bab85f82a96 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -90,7 +90,8 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 #define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
 #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
 
-#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_FORMAT)
+#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_FORMAT | \
+                                    KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu);
 static void process_nmi(struct kvm_vcpu *vcpu);
@@ -3811,6 +3812,8 @@ split_irqchip_unlock:
 
 		if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_FORMAT)
 			kvm->arch.x2apic_format = true;
+		if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
+			kvm->arch.x2apic_broadcast_quirk_disabled = true;
 
 		r = 0;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6de3febabaa..13b30dfd93b2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1315,5 +1315,6 @@ struct kvm_assigned_msix_entry {
 };
 
 #define KVM_X2APIC_API_USE_32BIT_FORMAT         (1ULL << 0)
+#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
 
 #endif /* __LINUX_KVM_H */
-- 
2.9.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 13/14] KVM: x86: bump MAX_VCPUS to 288
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (11 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  2016-07-12 20:09 ` [PATCH v3 14/14] KVM: x86: bump KVM_MAX_VCPU_ID to 1023 Radim Krčmář
  13 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

288 is in high demand because of Knights Landing CPU.
We cannot set the limit to 640k, because that would be wasting space.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 25bf9d627c15..276d6f449d13 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -34,7 +34,7 @@
 #include <asm/asm.h>
 #include <asm/kvm_page_track.h>
 
-#define KVM_MAX_VCPUS 255
+#define KVM_MAX_VCPUS 288
 #define KVM_SOFT_MAX_VCPUS 240
 #define KVM_USER_MEM_SLOTS 509
 /* memory slots that are not exposed to userspace */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 14/14] KVM: x86: bump KVM_MAX_VCPU_ID to 1023
  2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (12 preceding siblings ...)
  2016-07-12 20:09 ` [PATCH v3 13/14] KVM: x86: bump MAX_VCPUS to 288 Radim Krčmář
@ 2016-07-12 20:09 ` Radim Krčmář
  13 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-12 20:09 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu,
	Yang Zhang

kzalloc was replaced with kvm_kvzalloc to allow non-contiguous areas and
rcu had to be modified to cope with it.

The practical limit for KVM_MAX_VCPU_ID right now is INT_MAX, but lower
value was chosen in case there were bugs.  1023 is sufficient maximum
APIC ID for 288 VCPUs.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/lapic.c            | 13 ++++++++++---
 arch/x86/kvm/x86.c              |  2 +-
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 276d6f449d13..30241a2af2a6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -36,6 +36,7 @@
 
 #define KVM_MAX_VCPUS 288
 #define KVM_SOFT_MAX_VCPUS 240
+#define KVM_MAX_VCPU_ID 1023
 #define KVM_USER_MEM_SLOTS 509
 /* memory slots that are not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 3
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 46f1d1c93420..41d782212cea 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -147,6 +147,13 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
 	}
 }
 
+static void kvm_apic_map_free(struct rcu_head *rcu)
+{
+	struct kvm_apic_map *map = container_of(rcu, struct kvm_apic_map, rcu);
+
+	kvfree(map);
+}
+
 static void recalculate_apic_map(struct kvm *kvm)
 {
 	struct kvm_apic_map *new, *old = NULL;
@@ -160,8 +167,8 @@ static void recalculate_apic_map(struct kvm *kvm)
 		if (kvm_apic_present(vcpu))
 			max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
 
-	new = kzalloc(sizeof(struct kvm_apic_map) +
-	              sizeof(struct kvm_lapic *) * (max_id + 1), GFP_KERNEL);
+	new = kvm_kvzalloc(sizeof(struct kvm_apic_map) +
+	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1));
 
 	if (!new)
 		goto out;
@@ -206,7 +213,7 @@ out:
 	mutex_unlock(&kvm->arch.apic_map_lock);
 
 	if (old)
-		kfree_rcu(old, rcu);
+		call_rcu(&old->rcu, kvm_apic_map_free);
 
 	kvm_make_scan_ioapic_request(kvm);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4bab85f82a96..c41a51fe7221 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7927,7 +7927,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kfree(kvm->arch.vpic);
 	kfree(kvm->arch.vioapic);
 	kvm_free_vcpus(kvm);
-	kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
+	kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 	kvm_mmu_uninit_vm(kvm);
 }
 
-- 
2.9.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map
  2016-07-12 20:09 ` [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map Radim Krčmář
@ 2016-07-13  8:29   ` Paolo Bonzini
  2016-07-13 14:37     ` Radim Krčmář
  2016-08-02 11:39   ` Wanpeng Li
  1 sibling, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2016-07-13  8:29 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu, Yang Zhang



On 12/07/2016 22:09, Radim Krčmář wrote:
> x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will
> have slighly less VCPUs.

It's "fewer", not "less" (that's important to make the joke less uncanny
for native speakers :)).  Less goes with uncountable nouns such as milk.

Paolo

> Dynamic size saves memory at the cost of
> turning one constant into a variable.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk
  2016-07-12 20:09 ` [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk Radim Krčmář
@ 2016-07-13  8:38   ` Paolo Bonzini
  2016-07-13 15:14     ` Radim Krčmář
  2016-07-13 10:15   ` Paolo Bonzini
  1 sibling, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2016-07-13  8:38 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu, Yang Zhang



On 12/07/2016 22:09, Radim Krčmář wrote:
> @@ -619,14 +619,17 @@ static bool kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
>  /* KVM APIC implementation has two quirks
>   *  - dest always begins at 0 while xAPIC MDA has offset 24,
>   *  - IOxAPIC messages have to be delivered (directly) to x2APIC.
> + *
> + * The broadcast quirk can be disabled with KVM_CAP_X2APIC_API.

--verbose version:

 /* KVM APIC implementation has two quirks
- *  - dest always begins at 0 while xAPIC MDA has offset 24,
- *  - IOxAPIC messages have to be delivered (directly) to x2APIC.
+ *  - dest always begins at 0 while xAPIC MDA has offset 24.  This is
+ *    just a quirk in the API and is not problematic.
  *
- * The broadcast quirk can be disabled with KVM_CAP_X2APIC_API.
+ *  - in-kernel IOAPIC messages have to be delivered directly to
+ *    x2APIC, because the kernel does not support interrupt remapping.
+ *    In order to support broadcast without interrupt remapping, x2APIC
+ *    rewrites the destination of non-IPI messages (and also of IPIs sent
+ *    from xAPIC-mode LAPICs) from APIC_BROADCAST to X2APIC_BROADCAST.
+ *
+ * The broadcast quirk can be disabled with KVM_CAP_X2APIC_API.  This is
+ * important when userspace wants to use x2APIC-format MSIs, because
+ * APIC_BROADCAST (0xff) is a legal route for "cluster 0, CPUs 0-7".
  */

Sounds good?

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 11/14] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-12 20:09 ` [PATCH v3 11/14] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
@ 2016-07-13  8:41   ` Paolo Bonzini
  2016-07-13 14:40     ` Radim Krčmář
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2016-07-13  8:41 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu, Yang Zhang



On 12/07/2016 22:09, Radim Krčmář wrote:
> KVM_CAP_X2APIC_API is a capability for features related to x2APIC
> enablement.  KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
> extend APIC ID in get/set ioctl and MSI addresses to 32 bits.

Let's make it KVM_X2APIC_API_USE_32BIT_IDS instead, as this matches the
name of Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK ("verb adjective noun").

Paolo

> Both are needed to support x2APIC.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk
  2016-07-12 20:09 ` [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk Radim Krčmář
  2016-07-13  8:38   ` Paolo Bonzini
@ 2016-07-13 10:15   ` Paolo Bonzini
  2016-07-13 14:52     ` Radim Krčmář
  1 sibling, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2016-07-13 10:15 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu, Yang Zhang



On 12/07/2016 22:09, Radim Krčmář wrote:
> +static bool kvm_apic_map_need_slowpath(struct kvm *kvm, struct kvm_lapic **src,
> +		struct kvm_lapic_irq *irq, struct kvm_apic_map *map)
> +{
> +	if (!map)
> +		return true;
> +
> +	if (kvm->arch.x2apic_broadcast_quirk_disabled) {
> +		if ((irq->dest_id == APIC_BROADCAST &&
> +				map->mode != KVM_APIC_MODE_X2APIC))
> +			return true;
> +		if (irq->dest_id == X2APIC_BROADCAST)
> +			return true;
> +	} else {
> +		bool x2apic_ipi = src && *src && apic_x2apic_mode(*src);
> +		if (irq->dest_id == (x2apic_ipi ?
> +		                     X2APIC_BROADCAST : APIC_BROADCAST))
> +			return true;
> +	}
> +
> +	return false;
> +}

This isn't the only case where you go through the slowpath.  What about
renaming to kvm_apic_dest_is_broadcast?

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map
  2016-07-13  8:29   ` Paolo Bonzini
@ 2016-07-13 14:37     ` Radim Krčmář
  0 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-13 14:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka,
	Peter Xu, Yang Zhang

2016-07-13 10:29+0200, Paolo Bonzini:
> On 12/07/2016 22:09, Radim Krčmář wrote:
>> x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will
>> have slighly less VCPUs.
> 
> It's "fewer", not "less" (that's important to make the joke less uncanny
> for native speakers :)).  Less goes with uncountable nouns such as milk.

Thanks.  My syntax should be enough to creep native speakers. :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 11/14] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-13  8:41   ` Paolo Bonzini
@ 2016-07-13 14:40     ` Radim Krčmář
  0 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-13 14:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka,
	Peter Xu, Yang Zhang

2016-07-13 10:41+0200, Paolo Bonzini:
> On 12/07/2016 22:09, Radim Krčmář wrote:
>> KVM_CAP_X2APIC_API is a capability for features related to x2APIC
>> enablement.  KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
>> extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
> 
> Let's make it KVM_X2APIC_API_USE_32BIT_IDS instead, as this matches the
> name of Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK ("verb adjective noun").

Yeah, thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk
  2016-07-13 10:15   ` Paolo Bonzini
@ 2016-07-13 14:52     ` Radim Krčmář
  0 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-07-13 14:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka,
	Peter Xu, Yang Zhang

2016-07-13 12:15+0200, Paolo Bonzini:
> On 12/07/2016 22:09, Radim Krčmář wrote:
>> +static bool kvm_apic_map_need_slowpath(struct kvm *kvm, struct kvm_lapic **src,
>> +		struct kvm_lapic_irq *irq, struct kvm_apic_map *map)
>> +{
>> +	if (!map)
>> +		return true;
>> +
>> +	if (kvm->arch.x2apic_broadcast_quirk_disabled) {
>> +		if ((irq->dest_id == APIC_BROADCAST &&
>> +				map->mode != KVM_APIC_MODE_X2APIC))
>> +			return true;
>> +		if (irq->dest_id == X2APIC_BROADCAST)
>> +			return true;
>> +	} else {
>> +		bool x2apic_ipi = src && *src && apic_x2apic_mode(*src);
>> +		if (irq->dest_id == (x2apic_ipi ?
>> +		                     X2APIC_BROADCAST : APIC_BROADCAST))
>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
> 
> This isn't the only case where you go through the slowpath.  What about
> renaming to kvm_apic_dest_is_broadcast?

Yeah, sorry.  I have a patch that adds a check for 0xff in flat lapic
mode (can be handled in the fast path) and used the same name even
before the check ...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk
  2016-07-13  8:38   ` Paolo Bonzini
@ 2016-07-13 15:14     ` Radim Krčmář
  2016-07-13 15:30       ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Radim Krčmář @ 2016-07-13 15:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka,
	Peter Xu, Yang Zhang

2016-07-13 10:38+0200, Paolo Bonzini:
> On 12/07/2016 22:09, Radim Krčmář wrote:
>> @@ -619,14 +619,17 @@ static bool kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
>>  /* KVM APIC implementation has two quirks
>>   *  - dest always begins at 0 while xAPIC MDA has offset 24,
>>   *  - IOxAPIC messages have to be delivered (directly) to x2APIC.
>> + *
>> + * The broadcast quirk can be disabled with KVM_CAP_X2APIC_API.
> 
> --verbose version:
> 
>  /* KVM APIC implementation has two quirks
> - *  - dest always begins at 0 while xAPIC MDA has offset 24,
> - *  - IOxAPIC messages have to be delivered (directly) to x2APIC.
> + *  - dest always begins at 0 while xAPIC MDA has offset 24.  This is
> + *    just a quirk in the API and is not problematic.
>   *
> - * The broadcast quirk can be disabled with KVM_CAP_X2APIC_API.
> + *  - in-kernel IOAPIC messages have to be delivered directly to
> + *    x2APIC, because the kernel does not support interrupt remapping.
> + *    In order to support broadcast without interrupt remapping, x2APIC
> + *    rewrites the destination of non-IPI messages (and also of IPIs sent
> + *    from xAPIC-mode LAPICs) from APIC_BROADCAST to X2APIC_BROADCAST.

KVM doesn't do the operation in parentheses since 394457a928e0 ("KVM:
x86: some apic broadcast modes does not work").

Following patches added only !ipi case (I already regret posting them).

> + *
> + * The broadcast quirk can be disabled with KVM_CAP_X2APIC_API.  This is
> + * important when userspace wants to use x2APIC-format MSIs, because
> + * APIC_BROADCAST (0xff) is a legal route for "cluster 0, CPUs 0-7".
>   */
> 
> Sounds good?

Except that one bit.  (Code alone really can't explain the what happens
here.)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk
  2016-07-13 15:14     ` Radim Krčmář
@ 2016-07-13 15:30       ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2016-07-13 15:30 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka,
	Peter Xu, Yang Zhang



On 13/07/2016 17:14, Radim Krčmář wrote:
> 2016-07-13 10:38+0200, Paolo Bonzini:
>> On 12/07/2016 22:09, Radim Krčmář wrote:
>>> @@ -619,14 +619,17 @@ static bool kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
>>>  /* KVM APIC implementation has two quirks
>>>   *  - dest always begins at 0 while xAPIC MDA has offset 24,
>>>   *  - IOxAPIC messages have to be delivered (directly) to x2APIC.
>>> + *
>>> + * The broadcast quirk can be disabled with KVM_CAP_X2APIC_API.
>>
>> --verbose version:
>>
>>  /* KVM APIC implementation has two quirks
>> - *  - dest always begins at 0 while xAPIC MDA has offset 24,
>> - *  - IOxAPIC messages have to be delivered (directly) to x2APIC.
>> + *  - dest always begins at 0 while xAPIC MDA has offset 24.  This is
>> + *    just a quirk in the API and is not problematic.
>>   *
>> - * The broadcast quirk can be disabled with KVM_CAP_X2APIC_API.
>> + *  - in-kernel IOAPIC messages have to be delivered directly to
>> + *    x2APIC, because the kernel does not support interrupt remapping.
>> + *    In order to support broadcast without interrupt remapping, x2APIC
>> + *    rewrites the destination of non-IPI messages (and also of IPIs sent
>> + *    from xAPIC-mode LAPICs) from APIC_BROADCAST to X2APIC_BROADCAST.
> 
> KVM doesn't do the operation in parentheses since 394457a928e0 ("KVM:
> x86: some apic broadcast modes does not work").
> 
> Following patches added only !ipi case (I already regret posting them).
> 
>> + *
>> + * The broadcast quirk can be disabled with KVM_CAP_X2APIC_API.  This is
>> + * important when userspace wants to use x2APIC-format MSIs, because
>> + * APIC_BROADCAST (0xff) is a legal route for "cluster 0, CPUs 0-7".
>>   */
>>
>> Sounds good?
> 
> Except that one bit.  (Code alone really can't explain the what happens
> here.)
> 

Ok, will fix both bits tomorrow and repush to kvm/queue.  Then test,
push to kvm/next, and pass the tree to you.

Thanks!

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map
  2016-07-12 20:09 ` [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map Radim Krčmář
  2016-07-13  8:29   ` Paolo Bonzini
@ 2016-08-02 11:39   ` Wanpeng Li
  2016-08-02 12:22     ` Radim Krčmář
  1 sibling, 1 reply; 32+ messages in thread
From: Wanpeng Li @ 2016-08-02 11:39 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Paolo Bonzini, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu, Yang Zhang

2016-07-13 4:09 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
[...]
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9880d03f533d..224fc1c5fcc6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -120,7 +120,7 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
>         switch (map->mode) {
>         case KVM_APIC_MODE_X2APIC: {
>                 u32 offset = (dest_id >> 16) * 16;
> -               u32 max_apic_id = ARRAY_SIZE(map->phys_map) - 1;
> +               u32 max_apic_id = map->max_apic_id;
>
>                 if (offset <= max_apic_id) {
>                         u8 cluster_size = min(max_apic_id - offset + 1, 16U);
> @@ -152,14 +152,22 @@ static void recalculate_apic_map(struct kvm *kvm)
>         struct kvm_apic_map *new, *old = NULL;
>         struct kvm_vcpu *vcpu;
>         int i;
> -
> -       new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
> +       u32 max_id = 255;

If this should be max_id = KVM_MAX_VCPU_ID? I have a patch on hand to
fix it, but I didn't know whether it is your desired behavior or not.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map
  2016-08-02 11:39   ` Wanpeng Li
@ 2016-08-02 12:22     ` Radim Krčmář
  2016-08-02 13:15       ` Wanpeng Li
  0 siblings, 1 reply; 32+ messages in thread
From: Radim Krčmář @ 2016-08-02 12:22 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu, Yang Zhang

2016-08-02 19:39+0800, Wanpeng Li:
> 2016-07-13 4:09 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> [...]
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> @@ -152,14 +152,22 @@ static void recalculate_apic_map(struct kvm *kvm)
>>         struct kvm_apic_map *new, *old = NULL;
>>         struct kvm_vcpu *vcpu;
>>         int i;
>> -
>> -       new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
>> +       u32 max_id = 255;
> 
> If this should be max_id = KVM_MAX_VCPU_ID? I have a patch on hand to
> fix it, but I didn't know whether it is your desired behavior or not.

No, we could just have static array then.  KVM_MAX_VCPU_ID is expected
to be raised to INT_MAX and eventually UINT_MAX, so it would not be
practical.  The dynamic array is there to avoid wasting space in the
common case, where VMs have only low APIC IDs.

Inintial patches had 255 to minimize a chance of regressions as the
static array was that big, but starting with max_id = 0 is our goal and
should be ok even without other changes.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map
  2016-08-02 12:22     ` Radim Krčmář
@ 2016-08-02 13:15       ` Wanpeng Li
  0 siblings, 0 replies; 32+ messages in thread
From: Wanpeng Li @ 2016-08-02 13:15 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Paolo Bonzini, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu, Yang Zhang

2016-08-02 20:22 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2016-08-02 19:39+0800, Wanpeng Li:
>> 2016-07-13 4:09 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> [...]
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> @@ -152,14 +152,22 @@ static void recalculate_apic_map(struct kvm *kvm)
>>>         struct kvm_apic_map *new, *old = NULL;
>>>         struct kvm_vcpu *vcpu;
>>>         int i;
>>> -
>>> -       new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
>>> +       u32 max_id = 255;
>>
>> If this should be max_id = KVM_MAX_VCPU_ID? I have a patch on hand to
>> fix it, but I didn't know whether it is your desired behavior or not.
>
> No, we could just have static array then.  KVM_MAX_VCPU_ID is expected
> to be raised to INT_MAX and eventually UINT_MAX, so it would not be
> practical.  The dynamic array is there to avoid wasting space in the
> common case, where VMs have only low APIC IDs.
>
> Inintial patches had 255 to minimize a chance of regressions as the
> static array was that big, but starting with max_id = 0 is our goal and
> should be ok even without other changes.

I see, thanks. :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 07/14] KVM: x86: reset APIC ID when enabling LAPIC
  2016-07-12 20:09 ` [PATCH v3 07/14] KVM: x86: reset APIC ID when enabling LAPIC Radim Krčmář
@ 2016-08-02 13:26   ` Wanpeng Li
  2016-08-02 13:52     ` Radim Krčmář
  0 siblings, 1 reply; 32+ messages in thread
From: Wanpeng Li @ 2016-08-02 13:26 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Paolo Bonzini, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu, Yang Zhang

2016-07-13 4:09 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> APIC ID should be set to the initial APIC ID when enabling LAPIC.
> This only matters if the guest changes APIC ID.  No sane OS does that.
>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 41089dbeeafc..0fce77fdbe91 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1720,9 +1720,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
>
>         /* update jump label if enable bit changes */
>         if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE) {
> -               if (value & MSR_IA32_APICBASE_ENABLE)
> +               if (value & MSR_IA32_APICBASE_ENABLE) {
> +                       kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
>                         static_key_slow_dec_deferred(&apic_hw_disabled);
> -               else
> +               } else
>                         static_key_slow_inc(&apic_hw_disabled.key);
>                 recalculate_apic_map(vcpu->kvm);
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If we can drop this recalculate_apic_map() since it has already been
called by kvm_apic_set_xapic_id().

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 03/14] KVM: x86: use physical LAPIC array for logical x2APIC
  2016-07-12 20:09 ` [PATCH v3 03/14] KVM: x86: use physical LAPIC array for logical x2APIC Radim Krčmář
@ 2016-08-02 13:46   ` Wanpeng Li
  2016-08-02 14:22     ` Radim Krčmář
  0 siblings, 1 reply; 32+ messages in thread
From: Wanpeng Li @ 2016-08-02 13:46 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Paolo Bonzini, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu, Yang Zhang

2016-07-13 4:09 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> Logical x2APIC IDs map injectively to physical x2APIC IDs, so we can
> reuse the physical array for them.  This allows us to save space by
> separating logical xAPIC maps.
>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  6 ++--
>  arch/x86/kvm/lapic.c            | 69 +++++++++++++++++++++--------------------
>  2 files changed, 39 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 53d39771842b..3194b19b9c7b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -683,8 +683,10 @@ struct kvm_apic_map {
>         struct rcu_head rcu;
>         u8 mode;
>         struct kvm_lapic *phys_map[256];
> -       /* first index is cluster id second is cpu id in a cluster */
> -       struct kvm_lapic *logical_map[16][16];
> +       union {
> +               struct kvm_lapic *xapic_flat_map[8];
> +               struct kvm_lapic *xapic_cluster_map[16][4];
> +       };
>  };
>
>  /* Hyper-V emulation context */
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 2987843657db..9880d03f533d 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -115,26 +115,36 @@ static inline int apic_enabled(struct kvm_lapic *apic)
>         (LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
>          APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
>
> -/* The logical map is definitely wrong if we have multiple
> - * modes at the same time.  (Physical map is always right.)
> - */
> -static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
> -{
> -       return !(map->mode & (map->mode - 1));
> -}
> +static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> +               u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
> +       switch (map->mode) {
> +       case KVM_APIC_MODE_X2APIC: {
> +               u32 offset = (dest_id >> 16) * 16;
> +               u32 max_apic_id = ARRAY_SIZE(map->phys_map) - 1;
>
> -static inline void
> -apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
> -{
> -       unsigned lid_bits;
> +               if (offset <= max_apic_id) {
> +                       u8 cluster_size = min(max_apic_id - offset + 1, 16U);
>
> -       BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_CLUSTER !=  4);
> -       BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_FLAT    !=  8);
> -       BUILD_BUG_ON(KVM_APIC_MODE_X2APIC        != 16);
> -       lid_bits = map->mode;
> +                       *cluster = &map->phys_map[offset];
> +                       *mask = dest_id & (0xffff >> (16 - cluster_size));
> +               } else {
> +                       *mask = 0;
> +               }
>
> -       *cid = dest_id >> lid_bits;
> -       *lid = dest_id & ((1 << lid_bits) - 1);
> +               return true;
> +               }
> +       case KVM_APIC_MODE_XAPIC_FLAT:
> +               *cluster = map->xapic_flat_map;
> +               *mask = dest_id & 0xff;
> +               return true;
> +       case KVM_APIC_MODE_XAPIC_CLUSTER:
> +               *cluster = map->xapic_cluster_map[dest_id >> 4];
> +               *mask = dest_id & 0xf;
> +               return true;
> +       default:
> +               /* Not optimized. */
> +               return false;
> +       }
>  }
>
>  static void recalculate_apic_map(struct kvm *kvm)
> @@ -152,7 +162,8 @@ static void recalculate_apic_map(struct kvm *kvm)
>
>         kvm_for_each_vcpu(i, vcpu, kvm) {
>                 struct kvm_lapic *apic = vcpu->arch.apic;
> -               u16 cid, lid;
> +               struct kvm_lapic **cluster;
> +               u16 mask;
>                 u32 ldr, aid;
>
>                 if (!kvm_apic_present(vcpu))
> @@ -174,13 +185,11 @@ static void recalculate_apic_map(struct kvm *kvm)
>                                 new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
>                 }
>
> -               if (!kvm_apic_logical_map_valid(new))
> +               if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
>                         continue;

As the comments of the removed function kvm_apic_logical_map_valid()
mentioned: "The logical map is definitely wrong if we have multiple
modes at the same time". So once one lapic offend this, we will just
skip it. However, the offend mode is still not removed from the
new->mode, which results in  all the lapics after the offend one will
be skipped and they can't get the benefit of the apic map table. I can
make a patch once it make sense to you.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 07/14] KVM: x86: reset APIC ID when enabling LAPIC
  2016-08-02 13:26   ` Wanpeng Li
@ 2016-08-02 13:52     ` Radim Krčmář
  0 siblings, 0 replies; 32+ messages in thread
From: Radim Krčmář @ 2016-08-02 13:52 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu, Yang Zhang

2016-08-02 21:26+0800, Wanpeng Li:
> 2016-07-13 4:09 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> APIC ID should be set to the initial APIC ID when enabling LAPIC.
>> This only matters if the guest changes APIC ID.  No sane OS does that.
>>
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> ---
>>  arch/x86/kvm/lapic.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 41089dbeeafc..0fce77fdbe91 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -1720,9 +1720,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
>>
>>         /* update jump label if enable bit changes */
>>         if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE) {
>> -               if (value & MSR_IA32_APICBASE_ENABLE)
>> +               if (value & MSR_IA32_APICBASE_ENABLE) {
>> +                       kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
>>                         static_key_slow_dec_deferred(&apic_hw_disabled);
>> -               else
>> +               } else
>>                         static_key_slow_inc(&apic_hw_disabled.key);
>>                 recalculate_apic_map(vcpu->kvm);
>                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> If we can drop this recalculate_apic_map() since it has already been
> called by kvm_apic_set_xapic_id().

Good point.  We would need to call it in the else branch, though.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 03/14] KVM: x86: use physical LAPIC array for logical x2APIC
  2016-08-02 13:46   ` Wanpeng Li
@ 2016-08-02 14:22     ` Radim Krčmář
  2016-08-02 16:40       ` Nadav Amit
  0 siblings, 1 reply; 32+ messages in thread
From: Radim Krčmář @ 2016-08-02 14:22 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu, Yang Zhang

2016-08-02 21:46+0800, Wanpeng Li:
> 2016-07-13 4:09 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> Logical x2APIC IDs map injectively to physical x2APIC IDs, so we can
>> reuse the physical array for them.  This allows us to save space by
>> separating logical xAPIC maps.
>>
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |  6 ++--
>>  arch/x86/kvm/lapic.c            | 69 +++++++++++++++++++++--------------------
>>  2 files changed, 39 insertions(+), 36 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 53d39771842b..3194b19b9c7b 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -683,8 +683,10 @@ struct kvm_apic_map {
>>         struct rcu_head rcu;
>>         u8 mode;
>>         struct kvm_lapic *phys_map[256];
>> -       /* first index is cluster id second is cpu id in a cluster */
>> -       struct kvm_lapic *logical_map[16][16];
>> +       union {
>> +               struct kvm_lapic *xapic_flat_map[8];
>> +               struct kvm_lapic *xapic_cluster_map[16][4];
>> +       };
>>  };
>>
>>  /* Hyper-V emulation context */
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 2987843657db..9880d03f533d 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -115,26 +115,36 @@ static inline int apic_enabled(struct kvm_lapic *apic)
>>         (LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
>>          APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
>>
>> -/* The logical map is definitely wrong if we have multiple
>> - * modes at the same time.  (Physical map is always right.)
>> - */
>> -static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
>> -{
>> -       return !(map->mode & (map->mode - 1));
>> -}
>> +static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
>> +               u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
>> +       switch (map->mode) {
>> +       case KVM_APIC_MODE_X2APIC: {
>> +               u32 offset = (dest_id >> 16) * 16;
>> +               u32 max_apic_id = ARRAY_SIZE(map->phys_map) - 1;
>> +               if (offset <= max_apic_id) {
>> +                       u8 cluster_size = min(max_apic_id - offset + 1, 16U);
>> +                       *cluster = &map->phys_map[offset];
>> +                       *mask = dest_id & (0xffff >> (16 - cluster_size));
>> +               } else {
>> +                       *mask = 0;
>> +               }
>> +               return true;
>> +               }
>> +       case KVM_APIC_MODE_XAPIC_FLAT:
>> +               *cluster = map->xapic_flat_map;
>> +               *mask = dest_id & 0xff;
>> +               return true;
>> +       case KVM_APIC_MODE_XAPIC_CLUSTER:
>> +               *cluster = map->xapic_cluster_map[dest_id >> 4];
>> +               *mask = dest_id & 0xf;
>> +               return true;
>> +       default:
>> +               /* Not optimized. */
>> +               return false;
>> +       }
> 
> As the comments of the removed function kvm_apic_logical_map_valid()
> mentioned: "The logical map is definitely wrong if we have multiple
> modes at the same time". So once one lapic offend this, we will just
> skip it. However, the offend mode is still not removed from the
> new->mode, which results in  all the lapics after the offend one will
> be skipped and they can't get the benefit of the apic map table. I can
> make a patch once it make sense to you.

Offending LAPIC marks the logical mapping as unusable, which was
intended.  kvm_apic_map_get_logical_dest() returns false.

The main problem is that SDM does not clearly forbid LAPICs in x2APIC
mode while some xAPIC LAPICs have nonzero LDR, so we should handle it
somewhat gracefully.
In combination with our IR-less IOAPIC hack, this would mean that we
would have to check both interpretations (x2 and xAPIC) of a logical
address before delivering.  Not to mention that we would also have to
decide how to handle a case when both interpretations are valid.

Mixed logical mode is rare and the slow path works fine with it, so I
think it doesn't make sense to do more than what we do now, i.e. detect
mixed in the fast path and fall back to the slow path.

(The map needs to keep a mode in any case, so we're not losing much by
 remembering that the mode is weird.)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 03/14] KVM: x86: use physical LAPIC array for logical x2APIC
  2016-08-02 14:22     ` Radim Krčmář
@ 2016-08-02 16:40       ` Nadav Amit
  0 siblings, 0 replies; 32+ messages in thread
From: Nadav Amit @ 2016-08-02 16:40 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Wanpeng Li, linux-kernel, kvm, Paolo Bonzini, Lan, Tianyu,
	Igor Mammedov, Jan Kiszka, Peter Xu, Yang Zhang

Radim Krčmář <rkrcmar@redhat.com> wrote:
> 
> Offending LAPIC marks the logical mapping as unusable, which was
> intended.  kvm_apic_map_get_logical_dest() returns false.
> 
> The main problem is that SDM does not clearly forbid LAPICs in x2APIC
> mode while some xAPIC LAPICs have nonzero LDR, so we should handle it
> somewhat gracefully.
> In combination with our IR-less IOAPIC hack, this would mean that we
> would have to check both interpretations (x2 and xAPIC) of a logical
> address before delivering.  Not to mention that we would also have to
> decide how to handle a case when both interpretations are valid.

My recollection of running Intel tests is that they do leave sometimes
some of the cores in the xAPIC mode while others are in x2APIC mode.
In this case, IIRC they do not generate IPIs from the xAPIC core.

IIRC they do expect broadcast interrupts to be received by all cores,
including those with xAPIC mode. However, I am unsure about this last
point.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-08-02 16:40 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-12 20:09 [PATCH v3 00/14] KVM: x86: break the xAPIC barrier Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 01/14] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 02/14] KVM: x86: add kvm_apic_map_get_dest_lapic Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 03/14] KVM: x86: use physical LAPIC array for logical x2APIC Radim Krčmář
2016-08-02 13:46   ` Wanpeng Li
2016-08-02 14:22     ` Radim Krčmář
2016-08-02 16:40       ` Nadav Amit
2016-07-12 20:09 ` [PATCH v3 04/14] KVM: x86: dynamic kvm_apic_map Radim Krčmář
2016-07-13  8:29   ` Paolo Bonzini
2016-07-13 14:37     ` Radim Krčmář
2016-08-02 11:39   ` Wanpeng Li
2016-08-02 12:22     ` Radim Krčmář
2016-08-02 13:15       ` Wanpeng Li
2016-07-12 20:09 ` [PATCH v3 05/14] KVM: x86: use generic function for MSI parsing Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 06/14] KVM: x86: use hardware-compatible format for APIC ID register Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 07/14] KVM: x86: reset APIC ID when enabling LAPIC Radim Krčmář
2016-08-02 13:26   ` Wanpeng Li
2016-08-02 13:52     ` Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 08/14] KVM: VMX: optimize APIC ID read with APICv Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 09/14] KVM: x86: reset lapic base in kvm_lapic_reset Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 10/14] KVM: pass struct kvm to kvm_set_routing_entry Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 11/14] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
2016-07-13  8:41   ` Paolo Bonzini
2016-07-13 14:40     ` Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 12/14] KVM: x86: add a flag to disable KVM x2apic broadcast quirk Radim Krčmář
2016-07-13  8:38   ` Paolo Bonzini
2016-07-13 15:14     ` Radim Krčmář
2016-07-13 15:30       ` Paolo Bonzini
2016-07-13 10:15   ` Paolo Bonzini
2016-07-13 14:52     ` Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 13/14] KVM: x86: bump MAX_VCPUS to 288 Radim Krčmář
2016-07-12 20:09 ` [PATCH v3 14/14] KVM: x86: bump KVM_MAX_VCPU_ID to 1023 Radim Krčmář

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).