linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/13] KVM: x86: break the xAPIC barrier
@ 2016-07-07 17:15 Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 01/13] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
                   ` (14 more replies)
  0 siblings, 15 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

This series allows userspace to create and send interrupts to VCPUs with
APIC ID > 255.

v2:
dropped "KVM: x86: use u16 for logical VCPU mask in lapic" [Paolo]
[1,5,8,9,12/13] r-b Paolo
[7,10,13/13] new [Paolo]
[2/13] do not point to out-of-scope value (return &src) [Paolo]
[3/13]
 * split from the next patch [Paolo]
 * handle all sizes of the lapic array [Paolo]
[4/13]
 * replaced size with max_apic_id to minimize chances of overflow [Andrew]
 * fixed allocation size [Paolo]
[6/13]
 * removed an obvious comment about recalculate_apic_id in state_set
 * refactor kvm_apic_state_fixup and kvm_apic_post_state_restore [Paolo]
 * reset apic id on xapic->disabled->xapic transitions [Paolo]
[11/13]
 * pass struct kvm into kvm_set_msi_irq [Paolo]
 * trace address_hi [David]
 * use hex dst, like other tracepoins
 * strict reserved MSI bits checking [Paolo]
 * loose reserved capability bits checking [Paolo]
 * improved documentation [Paolo]

v1: http://www.spinics.net/lists/kvm/msg134921.html


Radim Krčmář (13):
  KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240
  KVM: x86: add kvm_apic_map_get_dest_lapic
  KVM: x86: use physical LAPIC array for logical x2APIC
  KVM: x86: dynamic kvm_apic_map
  KVM: x86: use generic function for MSI parsing
  KVM: x86: use hardware-compatible format for APIC ID register
  KVM: x86: reset APIC ID when enabling LAPIC
  KVM: VMX: optimize APIC ID read with APICv
  KVM: x86: reset lapic base in kvm_lapic_reset
  KVM: pass struct kvm to kvm_set_routing_entry
  KVM: x86: add KVM_CAP_X2APIC_API
  KVM: x86: bump MAX_VCPUS to 288
  KVM: x86: bump KVM_MAX_VCPU_ID to 1023

 Documentation/virtual/kvm/api.txt |  32 ++++
 arch/powerpc/kvm/mpic.c           |   3 +-
 arch/s390/kvm/interrupt.c         |   3 +-
 arch/x86/include/asm/kvm_host.h   |  18 +-
 arch/x86/kvm/irq_comm.c           |  44 +++--
 arch/x86/kvm/lapic.c              | 380 +++++++++++++++++++-------------------
 arch/x86/kvm/lapic.h              |  10 +-
 arch/x86/kvm/vmx.c                |   5 +-
 arch/x86/kvm/x86.c                |  17 +-
 include/linux/kvm_host.h          |   3 +-
 include/trace/events/kvm.h        |   5 +-
 include/uapi/linux/kvm.h          |   1 +
 virt/kvm/irqchip.c                |   7 +-
 13 files changed, 300 insertions(+), 228 deletions(-)

-- 
2.9.0

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 01/13] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 02/13] KVM: x86: add kvm_apic_map_get_dest_lapic Radim Krčmář
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

240 has been well tested by Red Hat.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: r-b Paolo

 arch/x86/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7a628fb6a2c2..53d39771842b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -35,7 +35,7 @@
 #include <asm/kvm_page_track.h>
 
 #define KVM_MAX_VCPUS 255
-#define KVM_SOFT_MAX_VCPUS 160
+#define KVM_SOFT_MAX_VCPUS 240
 #define KVM_USER_MEM_SLOTS 509
 /* memory slots that are not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 3
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 02/13] KVM: x86: add kvm_apic_map_get_dest_lapic
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 01/13] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 03/13] KVM: x86: use physical LAPIC array for logical x2APIC Radim Krčmář
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

kvm_irq_delivery_to_apic_fast and kvm_intr_is_single_vcpu_fast both
compute the interrupt destination.  Factor the code.

'struct kvm_lapic **dst = NULL' had to be added to silence GCC.
GCC might complain about potential NULL access in the future, because it
missed conditions that avoided uninitialized uses of dst.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: do not point to out-of-scope value (return &src) [Paolo]

 arch/x86/kvm/lapic.c | 242 ++++++++++++++++++++++-----------------------------
 1 file changed, 104 insertions(+), 138 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 22a6474af220..2987843657db 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -671,14 +671,99 @@ static void kvm_apic_disabled_lapic_found(struct kvm *kvm)
 	}
 }
 
+/* Return true if the interrupt can be handled by using *bitmap as index mask
+ * for valid destinations in *dst array.
+ * Return false if kvm_apic_map_get_dest_lapic did nothing useful.
+ * Note: we may have zero kvm_lapic destinations when we return true, which
+ * means that the interrupt should be dropped.  In this case, *bitmap would be
+ * zero and *dst undefined.
+ */
+static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
+		struct kvm_lapic **src, struct kvm_lapic_irq *irq,
+		struct kvm_apic_map *map, struct kvm_lapic ***dst,
+		unsigned long *bitmap)
+{
+	int i, lowest;
+	bool x2apic_ipi;
+	u16 cid;
+
+	if (irq->shorthand == APIC_DEST_SELF && src) {
+		*dst = src;
+		*bitmap = 1;
+		return true;
+	} else if (irq->shorthand)
+		return false;
+
+	x2apic_ipi = src && *src && apic_x2apic_mode(*src);
+	if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
+		return false;
+
+	if (!map)
+		return false;
+
+	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
+		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
+			*bitmap = 0;
+		} else {
+			*dst = &map->phys_map[irq->dest_id];
+			*bitmap = 1;
+		}
+		return true;
+	}
+
+	if (!kvm_apic_logical_map_valid(map))
+		return false;
+
+	apic_logical_id(map, irq->dest_id, &cid, (u16 *)bitmap);
+
+	if (cid >= ARRAY_SIZE(map->logical_map)) {
+		*bitmap = 0;
+		return true;
+	}
+
+	*dst = map->logical_map[cid];
+
+	if (!kvm_lowest_prio_delivery(irq))
+		return true;
+
+	if (!kvm_vector_hashing_enabled()) {
+		lowest = -1;
+		for_each_set_bit(i, bitmap, 16) {
+			if (!(*dst)[i])
+				continue;
+			if (lowest < 0)
+				lowest = i;
+			else if (kvm_apic_compare_prio((*dst)[i]->vcpu,
+						(*dst)[lowest]->vcpu) < 0)
+				lowest = i;
+		}
+	} else {
+		if (!*bitmap)
+			return true;
+
+		lowest = kvm_vector_to_index(irq->vector, hweight16(*bitmap),
+				bitmap, 16);
+
+		if (!(*dst)[lowest]) {
+			kvm_apic_disabled_lapic_found(kvm);
+			*bitmap = 0;
+			return true;
+		}
+	}
+
+	*bitmap = (lowest >= 0) ? 1 << lowest : 0;
+
+	return true;
+}
+
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map)
 {
 	struct kvm_apic_map *map;
-	unsigned long bitmap = 1;
-	struct kvm_lapic **dst;
+	unsigned long bitmap;
+	struct kvm_lapic **dst = NULL;
 	int i;
-	bool ret, x2apic_ipi;
+	bool ret;
 
 	*r = -1;
 
@@ -687,86 +772,19 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		return true;
 	}
 
-	if (irq->shorthand)
-		return false;
-
-	x2apic_ipi = src && apic_x2apic_mode(src);
-	if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
-		return false;
-
-	ret = true;
 	rcu_read_lock();
 	map = rcu_dereference(kvm->arch.apic_map);
 
-	if (!map) {
-		ret = false;
-		goto out;
-	}
-
-	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-		if (irq->dest_id >= ARRAY_SIZE(map->phys_map))
-			goto out;
-
-		dst = &map->phys_map[irq->dest_id];
-	} else {
-		u16 cid;
-
-		if (!kvm_apic_logical_map_valid(map)) {
-			ret = false;
-			goto out;
+	ret = kvm_apic_map_get_dest_lapic(kvm, &src, irq, map, &dst, &bitmap);
+	if (ret)
+		for_each_set_bit(i, &bitmap, 16) {
+			if (!dst[i])
+				continue;
+			if (*r < 0)
+				*r = 0;
+			*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
 		}
 
-		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
-
-		if (cid >= ARRAY_SIZE(map->logical_map))
-			goto out;
-
-		dst = map->logical_map[cid];
-
-		if (!kvm_lowest_prio_delivery(irq))
-			goto set_irq;
-
-		if (!kvm_vector_hashing_enabled()) {
-			int l = -1;
-			for_each_set_bit(i, &bitmap, 16) {
-				if (!dst[i])
-					continue;
-				if (l < 0)
-					l = i;
-				else if (kvm_apic_compare_prio(dst[i]->vcpu,
-							dst[l]->vcpu) < 0)
-					l = i;
-			}
-			bitmap = (l >= 0) ? 1 << l : 0;
-		} else {
-			int idx;
-			unsigned int dest_vcpus;
-
-			dest_vcpus = hweight16(bitmap);
-			if (dest_vcpus == 0)
-				goto out;
-
-			idx = kvm_vector_to_index(irq->vector,
-				dest_vcpus, &bitmap, 16);
-
-			if (!dst[idx]) {
-				kvm_apic_disabled_lapic_found(kvm);
-				goto out;
-			}
-
-			bitmap = (idx >= 0) ? 1 << idx : 0;
-		}
-	}
-
-set_irq:
-	for_each_set_bit(i, &bitmap, 16) {
-		if (!dst[i])
-			continue;
-		if (*r < 0)
-			*r = 0;
-		*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
-	}
-out:
 	rcu_read_unlock();
 	return ret;
 }
@@ -789,8 +807,9 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu)
 {
 	struct kvm_apic_map *map;
+	unsigned long bitmap;
+	struct kvm_lapic **dst = NULL;
 	bool ret = false;
-	struct kvm_lapic *dst = NULL;
 
 	if (irq->shorthand)
 		return false;
@@ -798,69 +817,16 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 	rcu_read_lock();
 	map = rcu_dereference(kvm->arch.apic_map);
 
-	if (!map)
-		goto out;
+	if (kvm_apic_map_get_dest_lapic(kvm, NULL, irq, map, &dst, &bitmap) &&
+			hweight16(bitmap) == 1) {
+		unsigned long i = find_first_bit(&bitmap, 16);
 
-	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-		if (irq->dest_id == 0xFF)
-			goto out;
-
-		if (irq->dest_id >= ARRAY_SIZE(map->phys_map))
-			goto out;
-
-		dst = map->phys_map[irq->dest_id];
-		if (dst && kvm_apic_present(dst->vcpu))
-			*dest_vcpu = dst->vcpu;
-		else
-			goto out;
-	} else {
-		u16 cid;
-		unsigned long bitmap = 1;
-		int i, r = 0;
-
-		if (!kvm_apic_logical_map_valid(map))
-			goto out;
-
-		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
-
-		if (cid >= ARRAY_SIZE(map->logical_map))
-			goto out;
-
-		if (kvm_vector_hashing_enabled() &&
-				kvm_lowest_prio_delivery(irq)) {
-			int idx;
-			unsigned int dest_vcpus;
-
-			dest_vcpus = hweight16(bitmap);
-			if (dest_vcpus == 0)
-				goto out;
-
-			idx = kvm_vector_to_index(irq->vector, dest_vcpus,
-						  &bitmap, 16);
-
-			dst = map->logical_map[cid][idx];
-			if (!dst) {
-				kvm_apic_disabled_lapic_found(kvm);
-				goto out;
-			}
-
-			*dest_vcpu = dst->vcpu;
-		} else {
-			for_each_set_bit(i, &bitmap, 16) {
-				dst = map->logical_map[cid][i];
-				if (++r == 2)
-					goto out;
-			}
-
-			if (dst && kvm_apic_present(dst->vcpu))
-				*dest_vcpu = dst->vcpu;
-			else
-				goto out;
+		if (dst[i]) {
+			*dest_vcpu = dst[i]->vcpu;
+			ret = true;
 		}
 	}
 
-	ret = true;
-out:
 	rcu_read_unlock();
 	return ret;
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 03/13] KVM: x86: use physical LAPIC array for logical x2APIC
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 01/13] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 02/13] KVM: x86: add kvm_apic_map_get_dest_lapic Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map Radim Krčmář
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

Logical x2APIC IDs map injectively to physical x2APIC IDs, so we can
reuse the physical array for them.  This allows us to save space by
separating logical xAPIC maps.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2:
 * split from the next patch [Paolo]
 * handle all sizes of the lapic array [Paolo]

 arch/x86/include/asm/kvm_host.h |  6 ++--
 arch/x86/kvm/lapic.c            | 69 +++++++++++++++++++++--------------------
 2 files changed, 39 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 53d39771842b..3194b19b9c7b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -683,8 +683,10 @@ struct kvm_apic_map {
 	struct rcu_head rcu;
 	u8 mode;
 	struct kvm_lapic *phys_map[256];
-	/* first index is cluster id second is cpu id in a cluster */
-	struct kvm_lapic *logical_map[16][16];
+	union {
+		struct kvm_lapic *xapic_flat_map[8];
+		struct kvm_lapic *xapic_cluster_map[16][4];
+	};
 };
 
 /* Hyper-V emulation context */
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2987843657db..9880d03f533d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -115,26 +115,36 @@ static inline int apic_enabled(struct kvm_lapic *apic)
 	(LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
 	 APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
 
-/* The logical map is definitely wrong if we have multiple
- * modes at the same time.  (Physical map is always right.)
- */
-static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
-{
-	return !(map->mode & (map->mode - 1));
-}
+static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
+		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
+	switch (map->mode) {
+	case KVM_APIC_MODE_X2APIC: {
+		u32 offset = (dest_id >> 16) * 16;
+		u32 max_apic_id = ARRAY_SIZE(map->phys_map) - 1;
 
-static inline void
-apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
-{
-	unsigned lid_bits;
+		if (offset <= max_apic_id) {
+			u8 cluster_size = min(max_apic_id - offset + 1, 16U);
 
-	BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_CLUSTER !=  4);
-	BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_FLAT    !=  8);
-	BUILD_BUG_ON(KVM_APIC_MODE_X2APIC        != 16);
-	lid_bits = map->mode;
+			*cluster = &map->phys_map[offset];
+			*mask = dest_id & (0xffff >> (16 - cluster_size));
+		} else {
+			*mask = 0;
+		}
 
-	*cid = dest_id >> lid_bits;
-	*lid = dest_id & ((1 << lid_bits) - 1);
+		return true;
+		}
+	case KVM_APIC_MODE_XAPIC_FLAT:
+		*cluster = map->xapic_flat_map;
+		*mask = dest_id & 0xff;
+		return true;
+	case KVM_APIC_MODE_XAPIC_CLUSTER:
+		*cluster = map->xapic_cluster_map[dest_id >> 4];
+		*mask = dest_id & 0xf;
+		return true;
+	default:
+		/* Not optimized. */
+		return false;
+	}
 }
 
 static void recalculate_apic_map(struct kvm *kvm)
@@ -152,7 +162,8 @@ static void recalculate_apic_map(struct kvm *kvm)
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvm_lapic *apic = vcpu->arch.apic;
-		u16 cid, lid;
+		struct kvm_lapic **cluster;
+		u16 mask;
 		u32 ldr, aid;
 
 		if (!kvm_apic_present(vcpu))
@@ -174,13 +185,11 @@ static void recalculate_apic_map(struct kvm *kvm)
 				new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
 		}
 
-		if (!kvm_apic_logical_map_valid(new))
+		if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
 			continue;
 
-		apic_logical_id(new, ldr, &cid, &lid);
-
-		if (lid && cid < ARRAY_SIZE(new->logical_map))
-			new->logical_map[cid][ffs(lid) - 1] = apic;
+		if (mask)
+			cluster[ffs(mask) - 1] = apic;
 	}
 out:
 	old = rcu_dereference_protected(kvm->arch.apic_map,
@@ -685,7 +694,6 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 {
 	int i, lowest;
 	bool x2apic_ipi;
-	u16 cid;
 
 	if (irq->shorthand == APIC_DEST_SELF && src) {
 		*dst = src;
@@ -711,18 +719,11 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 		return true;
 	}
 
-	if (!kvm_apic_logical_map_valid(map))
+	*bitmap = 0;
+	if (!kvm_apic_map_get_logical_dest(map, irq->dest_id, dst,
+				(u16 *)bitmap))
 		return false;
 
-	apic_logical_id(map, irq->dest_id, &cid, (u16 *)bitmap);
-
-	if (cid >= ARRAY_SIZE(map->logical_map)) {
-		*bitmap = 0;
-		return true;
-	}
-
-	*dst = map->logical_map[cid];
-
 	if (!kvm_lowest_prio_delivery(irq))
 		return true;
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (2 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 03/13] KVM: x86: use physical LAPIC array for logical x2APIC Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-11  6:07   ` Yang Zhang
  2016-07-07 17:15 ` [PATCH v2 05/13] KVM: x86: use generic function for MSI parsing Radim Krčmář
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will
have slighly less VCPUs.  Dynamic size saves memory at the cost of
turning one constant into a variable.

apic_map mutex had to be moved before allocation to avoid races with cpu
hotplug.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2:
 * replaced size with max_apic_id to minimize chances of overflow [Andrew]
 * fixed allocation size [Paolo]

 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/lapic.c            | 18 +++++++++++++-----
 arch/x86/kvm/lapic.h            |  2 +-
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3194b19b9c7b..643e3dffcd85 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -682,11 +682,12 @@ struct kvm_arch_memory_slot {
 struct kvm_apic_map {
 	struct rcu_head rcu;
 	u8 mode;
-	struct kvm_lapic *phys_map[256];
+	u32 max_apic_id;
 	union {
 		struct kvm_lapic *xapic_flat_map[8];
 		struct kvm_lapic *xapic_cluster_map[16][4];
 	};
+	struct kvm_lapic *phys_map[];
 };
 
 /* Hyper-V emulation context */
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9880d03f533d..224fc1c5fcc6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -120,7 +120,7 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
 	switch (map->mode) {
 	case KVM_APIC_MODE_X2APIC: {
 		u32 offset = (dest_id >> 16) * 16;
-		u32 max_apic_id = ARRAY_SIZE(map->phys_map) - 1;
+		u32 max_apic_id = map->max_apic_id;
 
 		if (offset <= max_apic_id) {
 			u8 cluster_size = min(max_apic_id - offset + 1, 16U);
@@ -152,14 +152,22 @@ static void recalculate_apic_map(struct kvm *kvm)
 	struct kvm_apic_map *new, *old = NULL;
 	struct kvm_vcpu *vcpu;
 	int i;
-
-	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
+	u32 max_id = 255;
 
 	mutex_lock(&kvm->arch.apic_map_lock);
 
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		if (kvm_apic_present(vcpu))
+			max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
+
+	new = kzalloc(sizeof(struct kvm_apic_map) +
+	              sizeof(struct kvm_lapic *) * (max_id + 1), GFP_KERNEL);
+
 	if (!new)
 		goto out;
 
+	new->max_apic_id = max_id;
+
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvm_lapic *apic = vcpu->arch.apic;
 		struct kvm_lapic **cluster;
@@ -172,7 +180,7 @@ static void recalculate_apic_map(struct kvm *kvm)
 		aid = kvm_apic_id(apic);
 		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
 
-		if (aid < ARRAY_SIZE(new->phys_map))
+		if (aid <= new->max_apic_id)
 			new->phys_map[aid] = apic;
 
 		if (apic_x2apic_mode(apic)) {
@@ -710,7 +718,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
 		return false;
 
 	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
+		if (irq->dest_id > map->max_apic_id) {
 			*bitmap = 0;
 		} else {
 			*dst = &map->phys_map[irq->dest_id];
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 336ba51bb16e..8d811139d2b3 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -200,7 +200,7 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 	return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
 }
 
-static inline int kvm_apic_id(struct kvm_lapic *apic)
+static inline u32 kvm_apic_id(struct kvm_lapic *apic)
 {
 	return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 05/13] KVM: x86: use generic function for MSI parsing
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (3 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 06/13] KVM: x86: use hardware-compatible format for APIC ID register Radim Krčmář
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: r-b Paolo

 arch/x86/kvm/irq_comm.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index dfb4c6476877..47ad681a33fd 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -388,21 +388,16 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 			       kvm->arch.nr_reserved_ioapic_pins);
 	for (i = 0; i < nr_ioapic_pins; ++i) {
 		hlist_for_each_entry(entry, &table->map[i], link) {
-			u32 dest_id, dest_mode;
-			bool level;
+			struct kvm_lapic_irq irq;
 
 			if (entry->type != KVM_IRQ_ROUTING_MSI)
 				continue;
-			dest_id = (entry->msi.address_lo >> 12) & 0xff;
-			dest_mode = (entry->msi.address_lo >> 2) & 0x1;
-			level = entry->msi.data & MSI_DATA_TRIGGER_LEVEL;
-			if (level && kvm_apic_match_dest(vcpu, NULL, 0,
-						dest_id, dest_mode)) {
-				u32 vector = entry->msi.data & 0xff;
 
-				__set_bit(vector,
-					  ioapic_handled_vectors);
-			}
+			kvm_set_msi_irq(entry, &irq);
+
+			if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
+						irq.dest_id, irq.dest_mode))
+				__set_bit(irq.vector, ioapic_handled_vectors);
 		}
 	}
 	srcu_read_unlock(&kvm->irq_srcu, idx);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 06/13] KVM: x86: use hardware-compatible format for APIC ID register
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (4 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 05/13] KVM: x86: use generic function for MSI parsing Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 07/13] KVM: x86: reset APIC ID when enabling LAPIC Radim Krčmář
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

We currently always shift APIC ID as if APIC was in xAPIC mode.
x2APIC mode wants to use more bits and storing a hardware-compabible
value is the the sanest option.

KVM API to set the lapic expects that bottom 8 bits of APIC ID are in
top 8 bits of APIC_ID register, so the register needs to be shifted in
x2APIC mode.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2:
 * removed an obvious comment about recalculate_apic_id in state_set
 * refactor kvm_apic_state_fixup and kvm_apic_post_state_restore [Paolo]
 * reset apic id on xapic->disabled->xapic transitions [Paolo]

 arch/x86/kvm/lapic.c | 52 +++++++++++++++++++++++++++++++++++++---------------
 arch/x86/kvm/lapic.h |  8 +++++---
 arch/x86/kvm/x86.c   | 10 ++++++----
 3 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 224fc1c5fcc6..41089dbeeafc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -227,7 +227,7 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
 	}
 }
 
-static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
+static inline void kvm_apic_set_xapic_id(struct kvm_lapic *apic, u8 id)
 {
 	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
 	recalculate_apic_map(apic->vcpu->kvm);
@@ -239,11 +239,11 @@ static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id)
 	recalculate_apic_map(apic->vcpu->kvm);
 }
 
-static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u8 id)
+static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u32 id)
 {
 	u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
 
-	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
+	kvm_lapic_set_reg(apic, APIC_ID, id);
 	kvm_lapic_set_reg(apic, APIC_LDR, ldr);
 	recalculate_apic_map(apic->vcpu->kvm);
 }
@@ -1102,12 +1102,6 @@ static u32 __apic_read(struct kvm_lapic *apic, unsigned int offset)
 		return 0;
 
 	switch (offset) {
-	case APIC_ID:
-		if (apic_x2apic_mode(apic))
-			val = kvm_apic_id(apic);
-		else
-			val = kvm_apic_id(apic) << 24;
-		break;
 	case APIC_ARBPRI:
 		apic_debug("Access APIC ARBPRI register which is for P6\n");
 		break;
@@ -1465,7 +1459,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 	switch (reg) {
 	case APIC_ID:		/* Local APIC ID */
 		if (!apic_x2apic_mode(apic))
-			kvm_apic_set_id(apic, val >> 24);
+			kvm_apic_set_xapic_id(apic, val >> 24);
 		else
 			ret = 1;
 		break;
@@ -1769,7 +1763,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 	hrtimer_cancel(&apic->lapic_timer.timer);
 
 	if (!init_event)
-		kvm_apic_set_id(apic, vcpu->vcpu_id);
+		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
 	kvm_apic_set_version(apic->vcpu);
 
 	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
@@ -1990,17 +1984,43 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 	return vector;
 }
 
-void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
-		struct kvm_lapic_state *s)
+static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
+		struct kvm_lapic_state *s, bool set)
+{
+	if (apic_x2apic_mode(vcpu->arch.apic)) {
+		u32 *id = (u32 *)(s->regs + APIC_ID);
+
+		if (set)
+			*id >>= 24;
+		else
+			*id <<= 24;
+	}
+
+	return 0;
+}
+
+int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
+{
+	memcpy(s->regs, vcpu->arch.apic->regs, sizeof(*s));
+	return kvm_apic_state_fixup(vcpu, s, false);
+}
+
+int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
+	int r;
+
 
 	kvm_lapic_set_base(vcpu, vcpu->arch.apic_base);
 	/* set SPIV separately to get count of SW disabled APICs right */
 	apic_set_spiv(apic, *((u32 *)(s->regs + APIC_SPIV)));
+
+	r = kvm_apic_state_fixup(vcpu, s, true);
+	if (r)
+		return r;
 	memcpy(vcpu->arch.apic->regs, s->regs, sizeof *s);
-	/* call kvm_apic_set_id() to put apic into apic_map */
-	kvm_apic_set_id(apic, kvm_apic_id(apic));
+
+	recalculate_apic_map(vcpu->kvm);
 	kvm_apic_set_version(vcpu);
 
 	apic_update_ppr(apic);
@@ -2026,6 +2046,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 		kvm_rtc_eoi_tracking_restore_one(vcpu);
 
 	vcpu->arch.apic_arb_prio = 0;
+
+	return 0;
 }
 
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 8d811139d2b3..b904127d6fbf 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -81,8 +81,8 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 
 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu);
 int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
-void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
-		struct kvm_lapic_state *s);
+int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
+int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
 
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
@@ -202,7 +202,9 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 
 static inline u32 kvm_apic_id(struct kvm_lapic *apic)
 {
-	return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
+	u32 id = kvm_lapic_get_reg(apic, APIC_ID);
+
+	return apic_x2apic_mode(apic) ? id : id >> 24;
 }
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0cc6cf834cdd..b178c8c12717 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2779,15 +2779,17 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 	if (vcpu->arch.apicv_active)
 		kvm_x86_ops->sync_pir_to_irr(vcpu);
 
-	memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
-
-	return 0;
+	return kvm_apic_get_state(vcpu, s);
 }
 
 static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
 				    struct kvm_lapic_state *s)
 {
-	kvm_apic_post_state_restore(vcpu, s);
+	int r;
+
+	r = kvm_apic_set_state(vcpu, s);
+	if (r)
+		return r;
 	update_cr8_intercept(vcpu);
 
 	return 0;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 07/13] KVM: x86: reset APIC ID when enabling LAPIC
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (5 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 06/13] KVM: x86: use hardware-compatible format for APIC ID register Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 08/13] KVM: VMX: optimize APIC ID read with APICv Radim Krčmář
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

APIC ID should be set to the initial APIC ID when enabling LAPIC.
This only matters if the guest changes APIC ID.  No sane OS does that.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new [Paolo]

 arch/x86/kvm/lapic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 41089dbeeafc..0fce77fdbe91 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1720,9 +1720,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 
 	/* update jump label if enable bit changes */
 	if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE) {
-		if (value & MSR_IA32_APICBASE_ENABLE)
+		if (value & MSR_IA32_APICBASE_ENABLE) {
+			kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
 			static_key_slow_dec_deferred(&apic_hw_disabled);
-		else
+		} else
 			static_key_slow_inc(&apic_hw_disabled.key);
 		recalculate_apic_map(vcpu->kvm);
 	}
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 08/13] KVM: VMX: optimize APIC ID read with APICv
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (6 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 07/13] KVM: x86: reset APIC ID when enabling LAPIC Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 09/13] KVM: x86: reset lapic base in kvm_lapic_reset Radim Krčmář
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

The register is in hardware-compatible format now, so there is not need
to intercept.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: r-b Paolo

 arch/x86/kvm/vmx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e564fa2c7ac8..0a6a8845ea0c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6447,9 +6447,6 @@ static __init int hardware_setup(void)
 	for (msr = 0x800; msr <= 0x8ff; msr++)
 		vmx_disable_intercept_msr_read_x2apic(msr);
 
-	/* According SDM, in x2apic mode, the whole id reg is used.  But in
-	 * KVM, it only use the highest eight bits. Need to intercept it */
-	vmx_enable_intercept_msr_read_x2apic(0x802);
 	/* TMCCT */
 	vmx_enable_intercept_msr_read_x2apic(0x839);
 	/* TPR */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 09/13] KVM: x86: reset lapic base in kvm_lapic_reset
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (7 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 08/13] KVM: VMX: optimize APIC ID read with APICv Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 10/13] KVM: pass struct kvm to kvm_set_routing_entry Radim Krčmář
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

LAPIC is reset in xAPIC mode and the surrounding code expects that.
KVM never resets after initialization.  This patch is just for sanity.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: r-b Paolo

 arch/x86/kvm/lapic.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0fce77fdbe91..3c2a8c113054 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1763,8 +1763,11 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 	/* Stop the timer in case it's a reset to an active apic */
 	hrtimer_cancel(&apic->lapic_timer.timer);
 
-	if (!init_event)
+	if (!init_event) {
+		kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE |
+		                         MSR_IA32_APICBASE_ENABLE);
 		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
+	}
 	kvm_apic_set_version(apic->vcpu);
 
 	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
@@ -1903,9 +1906,6 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 	 * thinking that APIC satet has changed.
 	 */
 	vcpu->arch.apic_base = MSR_IA32_APICBASE_ENABLE;
-	kvm_lapic_set_base(vcpu,
-			APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
-
 	static_key_slow_inc(&apic_sw_disabled.key); /* sw disabled at reset */
 	kvm_lapic_reset(vcpu, false);
 	kvm_iodevice_init(&apic->dev, &apic_mmio_ops);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 10/13] KVM: pass struct kvm to kvm_set_routing_entry
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (8 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 09/13] KVM: x86: reset lapic base in kvm_lapic_reset Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

Arch-specific code will use it.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new [Paolo]

 arch/powerpc/kvm/mpic.c   | 3 ++-
 arch/s390/kvm/interrupt.c | 3 ++-
 arch/x86/kvm/irq_comm.c   | 3 ++-
 include/linux/kvm_host.h  | 3 ++-
 virt/kvm/irqchip.c        | 7 ++++---
 5 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 6249cdc834d1..ed38f8114118 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -1823,7 +1823,8 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 	return 0;
 }
 
-int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
+int kvm_set_routing_entry(struct kvm *kvm,
+			  struct kvm_kernel_irq_routing_entry *e,
 			  const struct kvm_irq_routing_entry *ue)
 {
 	int r = -EINVAL;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index ca19627779db..24524c0f3ef8 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2246,7 +2246,8 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
 	return ret;
 }
 
-int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
+int kvm_set_routing_entry(struct kvm *kvm,
+			  struct kvm_kernel_irq_routing_entry *e,
 			  const struct kvm_irq_routing_entry *ue)
 {
 	int ret;
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 47ad681a33fd..889563d50c55 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -248,7 +248,8 @@ static int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e,
 	return kvm_hv_synic_set_irq(kvm, e->hv_sint.vcpu, e->hv_sint.sint);
 }
 
-int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
+int kvm_set_routing_entry(struct kvm *kvm,
+			  struct kvm_kernel_irq_routing_entry *e,
 			  const struct kvm_irq_routing_entry *ue)
 {
 	int r = -EINVAL;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 66b2f6159aad..60d339faa94c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1011,7 +1011,8 @@ int kvm_set_irq_routing(struct kvm *kvm,
 			const struct kvm_irq_routing_entry *entries,
 			unsigned nr,
 			unsigned flags);
-int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
+int kvm_set_routing_entry(struct kvm *kvm,
+			  struct kvm_kernel_irq_routing_entry *e,
 			  const struct kvm_irq_routing_entry *ue);
 void kvm_free_irq_routing(struct kvm *kvm);
 
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 8db197bb6c7a..df99e9c3b64d 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -135,7 +135,8 @@ void kvm_free_irq_routing(struct kvm *kvm)
 	free_irq_routing_table(rt);
 }
 
-static int setup_routing_entry(struct kvm_irq_routing_table *rt,
+static int setup_routing_entry(struct kvm *kvm,
+			       struct kvm_irq_routing_table *rt,
 			       struct kvm_kernel_irq_routing_entry *e,
 			       const struct kvm_irq_routing_entry *ue)
 {
@@ -154,7 +155,7 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,
 
 	e->gsi = ue->gsi;
 	e->type = ue->type;
-	r = kvm_set_routing_entry(e, ue);
+	r = kvm_set_routing_entry(kvm, e, ue);
 	if (r)
 		goto out;
 	if (e->type == KVM_IRQ_ROUTING_IRQCHIP)
@@ -211,7 +212,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
 			kfree(e);
 			goto out;
 		}
-		r = setup_routing_entry(new, e, ue);
+		r = setup_routing_entry(kvm, new, e, ue);
 		if (r) {
 			kfree(e);
 			goto out;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (9 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 10/13] KVM: pass struct kvm to kvm_set_routing_entry Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-11  6:06   ` Yang Zhang
  2016-07-07 17:15 ` [PATCH v2 12/13] KVM: x86: bump MAX_VCPUS to 288 Radim Krčmář
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

KVM_CAP_X2APIC_API can be enabled to extend APIC ID in get/set ioctl and MSI
addresses to 32 bits.  Both are needed to support x2APIC.

The capability has to be toggleable and disabled by default, because get/set
ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol
inspired by xAPIC and the change is not backward-compatible.

Changes to MSI addresses follow the format used by interrupt remapping unit.
The upper address word, that used to be 0, contains upper 24 bits of the LAPIC
address in its upper 24 bits.  Lower 8 bits are reserved as 0.
Using the upper address word is not backward-compatible either as we didn't
check that userspace zeroed the word.  Reserved bits are still not explicitly
checked, but non-zero data will affect LAPIC addresses, which will cause a bug.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2:
 * pass struct kvm into kvm_set_msi_irq [Paolo]
 * trace address_hi [David]
 * use hex dst, like other tracepoins
 * strict reserved MSI bits checking [Paolo]
 * loose reserved capability bits checking [Paolo]
 * improved documentation [Paolo]

 Documentation/virtual/kvm/api.txt | 32 ++++++++++++++++++++++++++++++++
 arch/x86/include/asm/kvm_host.h   |  4 +++-
 arch/x86/kvm/irq_comm.c           | 26 +++++++++++++++++++++-----
 arch/x86/kvm/lapic.c              | 13 +++++++++----
 arch/x86/kvm/vmx.c                |  2 +-
 arch/x86/kvm/x86.c                |  5 +++++
 include/trace/events/kvm.h        |  5 +++--
 include/uapi/linux/kvm.h          |  1 +
 8 files changed, 75 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 09efa9eb3926..a8f2ef910f98 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1482,6 +1482,10 @@ struct kvm_irq_routing_msi {
 	__u32 pad;
 };
 
+On x86, address_hi is ignored unless the KVM_CAP_X2APIC_API capability is
+enabled.  If it is enabled, address_hi bits 31-8 provide bits 31-8 of the
+destination id.  Bits 7-0 of address_hi must be zero.
+
 struct kvm_irq_routing_s390_adapter {
 	__u64 ind_addr;
 	__u64 summary_addr;
@@ -1583,6 +1587,15 @@ struct kvm_lapic_state {
 Reads the Local APIC registers and copies them into the input argument.  The
 data format and layout are the same as documented in the architecture manual.
 
+If KVM_CAP_X2APIC_API is enabled, then the format of APIC_ID register depends
+on the APIC mode (reported by MSR_IA32_APICBASE) of its VCPU.  x2APIC stores
+APIC ID in the APIC_ID register (bytes 32-35).  xAPIC only allows an 8-bit APIC
+ID which is stored in bits 31-24 of the APIC register, or equivalently in byte
+35 of struct kvm_lapic_state's regs field.
+
+If KVM_CAP_X2APIC_API is disabled, struct kvm_lapic_state always uses xAPIC
+format.
+
 
 4.58 KVM_SET_LAPIC
 
@@ -1600,6 +1613,10 @@ struct kvm_lapic_state {
 Copies the input argument into the Local APIC registers.  The data format
 and layout are the same as documented in the architecture manual.
 
+The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
+regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
+See the note in KVM_GET_LAPIC.
+
 
 4.59 KVM_IOEVENTFD
 
@@ -2180,6 +2197,10 @@ struct kvm_msi {
 
 No flags are defined so far. The corresponding field must be 0.
 
+On x86, address_hi is ignored unless the KVM_CAP_X2APIC_API capability is
+enabled.  If it is enabled, address_hi bits 31-8 provide bits 31-8 of the
+destination id.  Bits 7-0 of address_hi must be zero.
+
 
 4.71 KVM_CREATE_PIT2
 
@@ -3811,6 +3832,17 @@ Allows use of runtime-instrumentation introduced with zEC12 processor.
 Will return -EINVAL if the machine does not support runtime-instrumentation.
 Will return -EBUSY if a VCPU has already been created.
 
+7.7 KVM_CAP_X2APIC_API
+
+Architectures: x86
+Parameters: none
+Returns: 0
+
+Enabling this capability changes the behavior of KVM_SET_GSI_ROUTING,
+KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC, allowing the use of 32-bit
+APIC IDs.  See KVM_CAP_X2APIC_API in their respective sections.
+
+
 8. Other capabilities.
 ----------------------
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 643e3dffcd85..f1b202b34c72 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -782,6 +782,8 @@ struct kvm_arch {
 	u32 ldr_mode;
 	struct page *avic_logical_id_table_page;
 	struct page *avic_physical_id_table_page;
+
+	bool x2apic_api;
 };
 
 struct kvm_vm_stat {
@@ -1364,7 +1366,7 @@ bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
 bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			     struct kvm_vcpu **dest_vcpu);
 
-void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
 		     struct kvm_lapic_irq *irq);
 
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 889563d50c55..2ff9691e4603 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -110,13 +110,17 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	return r;
 }
 
-void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
 		     struct kvm_lapic_irq *irq)
 {
-	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
+	trace_kvm_msi_set_irq(e->msi.address_lo | (kvm->arch.x2apic_api ?
+	                                     (u64)e->msi.address_hi << 32 : 0),
+	                      e->msi.data);
 
 	irq->dest_id = (e->msi.address_lo &
 			MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+	if (kvm->arch.x2apic_api)
+		irq->dest_id |= MSI_ADDR_EXT_DEST_ID(e->msi.address_hi);
 	irq->vector = (e->msi.data &
 			MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
 	irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
@@ -129,15 +133,24 @@ void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
 }
 EXPORT_SYMBOL_GPL(kvm_set_msi_irq);
 
+static inline bool kvm_msi_route_invalid(struct kvm *kvm,
+                                       struct kvm_kernel_irq_routing_entry *e)
+{
+	return kvm->arch.x2apic_api && (e->msi.address_hi & 0xff);
+}
+
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 		struct kvm *kvm, int irq_source_id, int level, bool line_status)
 {
 	struct kvm_lapic_irq irq;
 
+	if (kvm_msi_route_invalid(kvm, e))
+		return -EINVAL;
+
 	if (!level)
 		return -1;
 
-	kvm_set_msi_irq(e, &irq);
+	kvm_set_msi_irq(kvm, e, &irq);
 
 	return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
 }
@@ -153,7 +166,7 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
 	if (unlikely(e->type != KVM_IRQ_ROUTING_MSI))
 		return -EWOULDBLOCK;
 
-	kvm_set_msi_irq(e, &irq);
+	kvm_set_msi_irq(kvm, e, &irq);
 
 	if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r, NULL))
 		return r;
@@ -286,6 +299,9 @@ int kvm_set_routing_entry(struct kvm *kvm,
 		e->msi.address_lo = ue->u.msi.address_lo;
 		e->msi.address_hi = ue->u.msi.address_hi;
 		e->msi.data = ue->u.msi.data;
+
+		if (kvm_msi_route_invalid(kvm, e))
+			goto out;
 		break;
 	case KVM_IRQ_ROUTING_HV_SINT:
 		e->set = kvm_hv_set_sint;
@@ -394,7 +410,7 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 			if (entry->type != KVM_IRQ_ROUTING_MSI)
 				continue;
 
-			kvm_set_msi_irq(entry, &irq);
+			kvm_set_msi_irq(vcpu->kvm, entry, &irq);
 
 			if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
 						irq.dest_id, irq.dest_mode))
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3c2a8c113054..b47a82bfc8c4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1991,10 +1991,15 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 	if (apic_x2apic_mode(vcpu->arch.apic)) {
 		u32 *id = (u32 *)(s->regs + APIC_ID);
 
-		if (set)
-			*id >>= 24;
-		else
-			*id <<= 24;
+		if (vcpu->kvm->arch.x2apic_api) {
+			if (*id != vcpu->vcpu_id)
+				return -EINVAL;
+		} else {
+			if (set)
+				*id >>= 24;
+			else
+				*id <<= 24;
+		}
 	}
 
 	return 0;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0a6a8845ea0c..ada9437d6e18 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11075,7 +11075,7 @@ static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 		 * We will support full lowest-priority interrupt later.
 		 */
 
-		kvm_set_msi_irq(e, &irq);
+		kvm_set_msi_irq(kvm, e, &irq);
 		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
 			/*
 			 * Make sure the IRTE is in remapped mode if
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b178c8c12717..5d4089dcb3eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2576,6 +2576,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_DISABLE_QUIRKS:
 	case KVM_CAP_SET_BOOT_CPU_ID:
  	case KVM_CAP_SPLIT_IRQCHIP:
+	case KVM_CAP_X2APIC_API:
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
 	case KVM_CAP_ASSIGN_DEV_IRQ:
 	case KVM_CAP_PCI_2_3:
@@ -3799,6 +3800,10 @@ split_irqchip_unlock:
 		mutex_unlock(&kvm->lock);
 		break;
 	}
+	case KVM_CAP_X2APIC_API:
+		kvm->arch.x2apic_api = true;
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index f28292d73ddb..8ade3eb6c640 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -151,8 +151,9 @@ TRACE_EVENT(kvm_msi_set_irq,
 		__entry->data		= data;
 	),
 
-	TP_printk("dst %u vec %u (%s|%s|%s%s)",
-		  (u8)(__entry->address >> 12), (u8)__entry->data,
+	TP_printk("dst %llx vec %u (%s|%s|%s%s)",
+		  (u8)(__entry->address >> 12) | ((__entry->address >> 32) & 0xffffff00),
+		  (u8)__entry->data,
 		  __print_symbolic((__entry->data >> 8 & 0x7), kvm_deliver_mode),
 		  (__entry->address & (1<<2)) ? "logical" : "physical",
 		  (__entry->data & (1<<15)) ? "level" : "edge",
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 05ebf475104c..43b355d6db7b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_ARM_PMU_V3 126
 #define KVM_CAP_VCPU_ATTRIBUTES 127
 #define KVM_CAP_MAX_VCPU_ID 128
+#define KVM_CAP_X2APIC_API 129
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 12/13] KVM: x86: bump MAX_VCPUS to 288
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (10 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-07 17:15 ` [PATCH v2 13/13] KVM: x86: bump KVM_MAX_VCPU_ID to 1023 Radim Krčmář
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

288 is in high demand because of Knights Landing CPU.
We cannot set the limit to 640k, because that would be wasting space.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: r-b Paolo

 arch/x86/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f1b202b34c72..e362de872139 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -34,7 +34,7 @@
 #include <asm/asm.h>
 #include <asm/kvm_page_track.h>
 
-#define KVM_MAX_VCPUS 255
+#define KVM_MAX_VCPUS 288
 #define KVM_SOFT_MAX_VCPUS 240
 #define KVM_USER_MEM_SLOTS 509
 /* memory slots that are not exposed to userspace */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 13/13] KVM: x86: bump KVM_MAX_VCPU_ID to 1023
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (11 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 12/13] KVM: x86: bump MAX_VCPUS to 288 Radim Krčmář
@ 2016-07-07 17:15 ` Radim Krčmář
  2016-07-08 10:00 ` [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Paolo Bonzini
  2016-07-08 16:29 ` Radim Krčmář
  14 siblings, 0 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-07 17:15 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

kzalloc was replaced with kvm_kvzalloc to allow non-contiguous areas and
rcu had to be modified to cope with it.

The practical limit for KVM_MAX_VCPU_ID right now is INT_MAX, but lower
value was chosen in case there were bugs.  1023 is sufficient maximum
APIC ID for 288 VCPUs.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v2: new [Paolo]

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/lapic.c            | 13 ++++++++++---
 arch/x86/kvm/x86.c              |  2 +-
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e362de872139..b771667e8e99 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -36,6 +36,7 @@
 
 #define KVM_MAX_VCPUS 288
 #define KVM_SOFT_MAX_VCPUS 240
+#define KVM_MAX_VCPU_ID 1023
 #define KVM_USER_MEM_SLOTS 509
 /* memory slots that are not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 3
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b47a82bfc8c4..3fcc590aa155 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -147,6 +147,13 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
 	}
 }
 
+static void kvm_apic_map_free(struct rcu_head *rcu)
+{
+	struct kvm_apic_map *map = container_of(rcu, struct kvm_apic_map, rcu);
+
+	kvfree(map);
+}
+
 static void recalculate_apic_map(struct kvm *kvm)
 {
 	struct kvm_apic_map *new, *old = NULL;
@@ -160,8 +167,8 @@ static void recalculate_apic_map(struct kvm *kvm)
 		if (kvm_apic_present(vcpu))
 			max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
 
-	new = kzalloc(sizeof(struct kvm_apic_map) +
-	              sizeof(struct kvm_lapic *) * (max_id + 1), GFP_KERNEL);
+	new = kvm_kvzalloc(sizeof(struct kvm_apic_map) +
+	                   sizeof(struct kvm_lapic *) * ((u64)max_id + 1));
 
 	if (!new)
 		goto out;
@@ -206,7 +213,7 @@ out:
 	mutex_unlock(&kvm->arch.apic_map_lock);
 
 	if (old)
-		kfree_rcu(old, rcu);
+		call_rcu(&old->rcu, kvm_apic_map_free);
 
 	kvm_make_scan_ioapic_request(kvm);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d4089dcb3eb..411f786950ab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7914,7 +7914,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kfree(kvm->arch.vpic);
 	kfree(kvm->arch.vioapic);
 	kvm_free_vcpus(kvm);
-	kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
+	kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 	kvm_mmu_uninit_vm(kvm);
 }
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/13] KVM: x86: break the xAPIC barrier
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (12 preceding siblings ...)
  2016-07-07 17:15 ` [PATCH v2 13/13] KVM: x86: bump KVM_MAX_VCPU_ID to 1023 Radim Krčmář
@ 2016-07-08 10:00 ` Paolo Bonzini
  2016-07-08 16:29 ` Radim Krčmář
  14 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2016-07-08 10:00 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 07/07/2016 19:15, Radim Krčmář wrote:
> This series allows userspace to create and send interrupts to VCPUs with
> APIC ID > 255.
> 
> v2:
> dropped "KVM: x86: use u16 for logical VCPU mask in lapic" [Paolo]
> [1,5,8,9,12/13] r-b Paolo
> [7,10,13/13] new [Paolo]
> [2/13] do not point to out-of-scope value (return &src) [Paolo]
> [3/13]
>  * split from the next patch [Paolo]
>  * handle all sizes of the lapic array [Paolo]
> [4/13]
>  * replaced size with max_apic_id to minimize chances of overflow [Andrew]
>  * fixed allocation size [Paolo]
> [6/13]
>  * removed an obvious comment about recalculate_apic_id in state_set
>  * refactor kvm_apic_state_fixup and kvm_apic_post_state_restore [Paolo]
>  * reset apic id on xapic->disabled->xapic transitions [Paolo]
> [11/13]
>  * pass struct kvm into kvm_set_msi_irq [Paolo]
>  * trace address_hi [David]
>  * use hex dst, like other tracepoins
>  * strict reserved MSI bits checking [Paolo]
>  * loose reserved capability bits checking [Paolo]
>  * improved documentation [Paolo]
> 
> v1: http://www.spinics.net/lists/kvm/msg134921.html

Looks good, thanks!

Paolo

> 
> Radim Krčmář (13):
>   KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240
>   KVM: x86: add kvm_apic_map_get_dest_lapic
>   KVM: x86: use physical LAPIC array for logical x2APIC
>   KVM: x86: dynamic kvm_apic_map
>   KVM: x86: use generic function for MSI parsing
>   KVM: x86: use hardware-compatible format for APIC ID register
>   KVM: x86: reset APIC ID when enabling LAPIC
>   KVM: VMX: optimize APIC ID read with APICv
>   KVM: x86: reset lapic base in kvm_lapic_reset
>   KVM: pass struct kvm to kvm_set_routing_entry
>   KVM: x86: add KVM_CAP_X2APIC_API
>   KVM: x86: bump MAX_VCPUS to 288
>   KVM: x86: bump KVM_MAX_VCPU_ID to 1023
> 
>  Documentation/virtual/kvm/api.txt |  32 ++++
>  arch/powerpc/kvm/mpic.c           |   3 +-
>  arch/s390/kvm/interrupt.c         |   3 +-
>  arch/x86/include/asm/kvm_host.h   |  18 +-
>  arch/x86/kvm/irq_comm.c           |  44 +++--
>  arch/x86/kvm/lapic.c              | 380 +++++++++++++++++++-------------------
>  arch/x86/kvm/lapic.h              |  10 +-
>  arch/x86/kvm/vmx.c                |   5 +-
>  arch/x86/kvm/x86.c                |  17 +-
>  include/linux/kvm_host.h          |   3 +-
>  include/trace/events/kvm.h        |   5 +-
>  include/uapi/linux/kvm.h          |   1 +
>  virt/kvm/irqchip.c                |   7 +-
>  13 files changed, 300 insertions(+), 228 deletions(-)
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/13] KVM: x86: break the xAPIC barrier
  2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (13 preceding siblings ...)
  2016-07-08 10:00 ` [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Paolo Bonzini
@ 2016-07-08 16:29 ` Radim Krčmář
  2016-07-08 16:30   ` Paolo Bonzini
  14 siblings, 1 reply; 31+ messages in thread
From: Radim Krčmář @ 2016-07-08 16:29 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

NACK, sorry.

KVM allows x2APIC even without interrupt remapping and therefore
interprets IOAPIC destination 0xff as a broadcast, which causes a bug
when x2APIC uses 0xff.

I'll fix that in v3.  The hack will be disabled when x2apic_api
capability is enabled.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/13] KVM: x86: break the xAPIC barrier
  2016-07-08 16:29 ` Radim Krčmář
@ 2016-07-08 16:30   ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2016-07-08 16:30 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 08/07/2016 18:29, Radim Krčmář wrote:
> NACK, sorry.
> 
> KVM allows x2APIC even without interrupt remapping and therefore
> interprets IOAPIC destination 0xff as a broadcast, which causes a bug
> when x2APIC uses 0xff.
> 
> I'll fix that in v3.  The hack will be disabled when x2apic_api
> capability is enabled.

Right, I was going to suggest the same.  It's fine, just provide
patch(es) on top and I can squash them.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-07 17:15 ` [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
@ 2016-07-11  6:06   ` Yang Zhang
  2016-07-11  7:44     ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Yang Zhang @ 2016-07-11  6:06 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

On 2016/7/8 1:15, Radim Krčmář wrote:
> KVM_CAP_X2APIC_API can be enabled to extend APIC ID in get/set ioctl and MSI
> addresses to 32 bits.  Both are needed to support x2APIC.
>
> The capability has to be toggleable and disabled by default, because get/set
> ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol
> inspired by xAPIC and the change is not backward-compatible.
>
> Changes to MSI addresses follow the format used by interrupt remapping unit.
> The upper address word, that used to be 0, contains upper 24 bits of the LAPIC
> address in its upper 24 bits.  Lower 8 bits are reserved as 0.
> Using the upper address word is not backward-compatible either as we didn't
> check that userspace zeroed the word.  Reserved bits are still not explicitly

Does this means we cannot migrate the VM from KVM_CAP_X2APIC_API enabled 
host to the disable host even VM doesn't have more than 255 VCPUs?

> checked, but non-zero data will affect LAPIC addresses, which will cause a bug.
>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  v2:
>  * pass struct kvm into kvm_set_msi_irq [Paolo]
>  * trace address_hi [David]
>  * use hex dst, like other tracepoins
>  * strict reserved MSI bits checking [Paolo]
>  * loose reserved capability bits checking [Paolo]
>  * improved documentation [Paolo]
>
>  Documentation/virtual/kvm/api.txt | 32 ++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/kvm_host.h   |  4 +++-
>  arch/x86/kvm/irq_comm.c           | 26 +++++++++++++++++++++-----
>  arch/x86/kvm/lapic.c              | 13 +++++++++----
>  arch/x86/kvm/vmx.c                |  2 +-
>  arch/x86/kvm/x86.c                |  5 +++++
>  include/trace/events/kvm.h        |  5 +++--
>  include/uapi/linux/kvm.h          |  1 +
>  8 files changed, 75 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 09efa9eb3926..a8f2ef910f98 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1482,6 +1482,10 @@ struct kvm_irq_routing_msi {
>  	__u32 pad;
>  };
>
> +On x86, address_hi is ignored unless the KVM_CAP_X2APIC_API capability is
> +enabled.  If it is enabled, address_hi bits 31-8 provide bits 31-8 of the
> +destination id.  Bits 7-0 of address_hi must be zero.
> +
>  struct kvm_irq_routing_s390_adapter {
>  	__u64 ind_addr;
>  	__u64 summary_addr;
> @@ -1583,6 +1587,15 @@ struct kvm_lapic_state {
>  Reads the Local APIC registers and copies them into the input argument.  The
>  data format and layout are the same as documented in the architecture manual.
>
> +If KVM_CAP_X2APIC_API is enabled, then the format of APIC_ID register depends
> +on the APIC mode (reported by MSR_IA32_APICBASE) of its VCPU.  x2APIC stores
> +APIC ID in the APIC_ID register (bytes 32-35).  xAPIC only allows an 8-bit APIC
> +ID which is stored in bits 31-24 of the APIC register, or equivalently in byte
> +35 of struct kvm_lapic_state's regs field.
> +
> +If KVM_CAP_X2APIC_API is disabled, struct kvm_lapic_state always uses xAPIC
> +format.
> +
>
>  4.58 KVM_SET_LAPIC
>
> @@ -1600,6 +1613,10 @@ struct kvm_lapic_state {
>  Copies the input argument into the Local APIC registers.  The data format
>  and layout are the same as documented in the architecture manual.
>
> +The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
> +regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
> +See the note in KVM_GET_LAPIC.
> +
>
>  4.59 KVM_IOEVENTFD
>
> @@ -2180,6 +2197,10 @@ struct kvm_msi {
>
>  No flags are defined so far. The corresponding field must be 0.
>
> +On x86, address_hi is ignored unless the KVM_CAP_X2APIC_API capability is
> +enabled.  If it is enabled, address_hi bits 31-8 provide bits 31-8 of the
> +destination id.  Bits 7-0 of address_hi must be zero.
> +
>
>  4.71 KVM_CREATE_PIT2
>
> @@ -3811,6 +3832,17 @@ Allows use of runtime-instrumentation introduced with zEC12 processor.
>  Will return -EINVAL if the machine does not support runtime-instrumentation.
>  Will return -EBUSY if a VCPU has already been created.
>
> +7.7 KVM_CAP_X2APIC_API
> +
> +Architectures: x86
> +Parameters: none
> +Returns: 0
> +
> +Enabling this capability changes the behavior of KVM_SET_GSI_ROUTING,
> +KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC, allowing the use of 32-bit
> +APIC IDs.  See KVM_CAP_X2APIC_API in their respective sections.
> +
> +
>  8. Other capabilities.
>  ----------------------
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 643e3dffcd85..f1b202b34c72 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -782,6 +782,8 @@ struct kvm_arch {
>  	u32 ldr_mode;
>  	struct page *avic_logical_id_table_page;
>  	struct page *avic_physical_id_table_page;
> +
> +	bool x2apic_api;
>  };
>
>  struct kvm_vm_stat {
> @@ -1364,7 +1366,7 @@ bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
>  bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
>  			     struct kvm_vcpu **dest_vcpu);
>
> -void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> +void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
>  		     struct kvm_lapic_irq *irq);
>
>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 889563d50c55..2ff9691e4603 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -110,13 +110,17 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
>  	return r;
>  }
>
> -void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> +void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
>  		     struct kvm_lapic_irq *irq)
>  {
> -	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
> +	trace_kvm_msi_set_irq(e->msi.address_lo | (kvm->arch.x2apic_api ?
> +	                                     (u64)e->msi.address_hi << 32 : 0),
> +	                      e->msi.data);
>
>  	irq->dest_id = (e->msi.address_lo &
>  			MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
> +	if (kvm->arch.x2apic_api)
> +		irq->dest_id |= MSI_ADDR_EXT_DEST_ID(e->msi.address_hi);
>  	irq->vector = (e->msi.data &
>  			MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
>  	irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
> @@ -129,15 +133,24 @@ void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_msi_irq);
>
> +static inline bool kvm_msi_route_invalid(struct kvm *kvm,
> +                                       struct kvm_kernel_irq_routing_entry *e)
> +{
> +	return kvm->arch.x2apic_api && (e->msi.address_hi & 0xff);
> +}
> +
>  int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>  		struct kvm *kvm, int irq_source_id, int level, bool line_status)
>  {
>  	struct kvm_lapic_irq irq;
>
> +	if (kvm_msi_route_invalid(kvm, e))
> +		return -EINVAL;
> +
>  	if (!level)
>  		return -1;
>
> -	kvm_set_msi_irq(e, &irq);
> +	kvm_set_msi_irq(kvm, e, &irq);
>
>  	return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
>  }
> @@ -153,7 +166,7 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
>  	if (unlikely(e->type != KVM_IRQ_ROUTING_MSI))
>  		return -EWOULDBLOCK;
>

Shouldn't we check the msi_route_invalid in kvm_arch_set_irq_inatomic?

> -	kvm_set_msi_irq(e, &irq);
> +	kvm_set_msi_irq(kvm, e, &irq);
>
>  	if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r, NULL))
>  		return r;
> @@ -286,6 +299,9 @@ int kvm_set_routing_entry(struct kvm *kvm,
>  		e->msi.address_lo = ue->u.msi.address_lo;
>  		e->msi.address_hi = ue->u.msi.address_hi;
>  		e->msi.data = ue->u.msi.data;
> +
> +		if (kvm_msi_route_invalid(kvm, e))
> +			goto out;
>  		break;
>  	case KVM_IRQ_ROUTING_HV_SINT:
>  		e->set = kvm_hv_set_sint;
> @@ -394,7 +410,7 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>  			if (entry->type != KVM_IRQ_ROUTING_MSI)
>  				continue;
>
> -			kvm_set_msi_irq(entry, &irq);
> +			kvm_set_msi_irq(vcpu->kvm, entry, &irq);
>
>  			if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
>  						irq.dest_id, irq.dest_mode))
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 3c2a8c113054..b47a82bfc8c4 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1991,10 +1991,15 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>  	if (apic_x2apic_mode(vcpu->arch.apic)) {
>  		u32 *id = (u32 *)(s->regs + APIC_ID);
>
> -		if (set)
> -			*id >>= 24;
> -		else
> -			*id <<= 24;
> +		if (vcpu->kvm->arch.x2apic_api) {
> +			if (*id != vcpu->vcpu_id)
> +				return -EINVAL;
> +		} else {
> +			if (set)
> +				*id >>= 24;
> +			else
> +				*id <<= 24;
> +		}
>  	}
>
>  	return 0;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 0a6a8845ea0c..ada9437d6e18 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11075,7 +11075,7 @@ static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
>  		 * We will support full lowest-priority interrupt later.
>  		 */
>
> -		kvm_set_msi_irq(e, &irq);
> +		kvm_set_msi_irq(kvm, e, &irq);
>  		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>  			/*
>  			 * Make sure the IRTE is in remapped mode if
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b178c8c12717..5d4089dcb3eb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2576,6 +2576,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_DISABLE_QUIRKS:
>  	case KVM_CAP_SET_BOOT_CPU_ID:
>   	case KVM_CAP_SPLIT_IRQCHIP:
> +	case KVM_CAP_X2APIC_API:
>  #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
>  	case KVM_CAP_ASSIGN_DEV_IRQ:
>  	case KVM_CAP_PCI_2_3:
> @@ -3799,6 +3800,10 @@ split_irqchip_unlock:
>  		mutex_unlock(&kvm->lock);
>  		break;
>  	}
> +	case KVM_CAP_X2APIC_API:
> +		kvm->arch.x2apic_api = true;
> +		r = 0;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
> index f28292d73ddb..8ade3eb6c640 100644
> --- a/include/trace/events/kvm.h
> +++ b/include/trace/events/kvm.h
> @@ -151,8 +151,9 @@ TRACE_EVENT(kvm_msi_set_irq,
>  		__entry->data		= data;
>  	),
>
> -	TP_printk("dst %u vec %u (%s|%s|%s%s)",
> -		  (u8)(__entry->address >> 12), (u8)__entry->data,
> +	TP_printk("dst %llx vec %u (%s|%s|%s%s)",
> +		  (u8)(__entry->address >> 12) | ((__entry->address >> 32) & 0xffffff00),
> +		  (u8)__entry->data,
>  		  __print_symbolic((__entry->data >> 8 & 0x7), kvm_deliver_mode),
>  		  (__entry->address & (1<<2)) ? "logical" : "physical",
>  		  (__entry->data & (1<<15)) ? "level" : "edge",
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 05ebf475104c..43b355d6db7b 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_ARM_PMU_V3 126
>  #define KVM_CAP_VCPU_ATTRIBUTES 127
>  #define KVM_CAP_MAX_VCPU_ID 128
> +#define KVM_CAP_X2APIC_API 129
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
>


-- 
best regards
yang

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map
  2016-07-07 17:15 ` [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map Radim Krčmář
@ 2016-07-11  6:07   ` Yang Zhang
  2016-07-11  7:43     ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Yang Zhang @ 2016-07-11  6:07 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

On 2016/7/8 1:15, Radim Krčmář wrote:
> x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will
> have slighly less VCPUs.  Dynamic size saves memory at the cost of
> turning one constant into a variable.
>
> apic_map mutex had to be moved before allocation to avoid races with cpu
> hotplug.
>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  v2:
>  * replaced size with max_apic_id to minimize chances of overflow [Andrew]
>  * fixed allocation size [Paolo]
>
>  arch/x86/include/asm/kvm_host.h |  3 ++-
>  arch/x86/kvm/lapic.c            | 18 +++++++++++++-----
>  arch/x86/kvm/lapic.h            |  2 +-
>  3 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3194b19b9c7b..643e3dffcd85 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -682,11 +682,12 @@ struct kvm_arch_memory_slot {
>  struct kvm_apic_map {
>  	struct rcu_head rcu;
>  	u8 mode;
> -	struct kvm_lapic *phys_map[256];
> +	u32 max_apic_id;
>  	union {
>  		struct kvm_lapic *xapic_flat_map[8];
>  		struct kvm_lapic *xapic_cluster_map[16][4];
>  	};
> +	struct kvm_lapic *phys_map[];
>  };
>
>  /* Hyper-V emulation context */
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9880d03f533d..224fc1c5fcc6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -120,7 +120,7 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
>  	switch (map->mode) {
>  	case KVM_APIC_MODE_X2APIC: {
>  		u32 offset = (dest_id >> 16) * 16;
> -		u32 max_apic_id = ARRAY_SIZE(map->phys_map) - 1;
> +		u32 max_apic_id = map->max_apic_id;
>
>  		if (offset <= max_apic_id) {
>  			u8 cluster_size = min(max_apic_id - offset + 1, 16U);
> @@ -152,14 +152,22 @@ static void recalculate_apic_map(struct kvm *kvm)
>  	struct kvm_apic_map *new, *old = NULL;
>  	struct kvm_vcpu *vcpu;
>  	int i;
> -
> -	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
> +	u32 max_id = 255;
>
>  	mutex_lock(&kvm->arch.apic_map_lock);
>
> +	kvm_for_each_vcpu(i, vcpu, kvm)
> +		if (kvm_apic_present(vcpu))
> +			max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
> +
> +	new = kzalloc(sizeof(struct kvm_apic_map) +
> +	              sizeof(struct kvm_lapic *) * (max_id + 1), GFP_KERNEL);
> +

I think this may cause the host runs out of memory if a malicious guest 
did follow thing:
1. vcpu a is doing apic map recalculation.
2. vcpu b write the apic id with 0xff
3. then vcpu b enable the x2apic: in kvm_lapic_set_base(), we will set 
apic_base to new value before reset the apic id.
4. vcpu a may see the x2apic enabled in vcpu b plus an old apic 
id(0xff), and max_id will become (0xff >> 24).

>  	if (!new)
>  		goto out;
>
> +	new->max_apic_id = max_id;
> +
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		struct kvm_lapic *apic = vcpu->arch.apic;
>  		struct kvm_lapic **cluster;
> @@ -172,7 +180,7 @@ static void recalculate_apic_map(struct kvm *kvm)
>  		aid = kvm_apic_id(apic);
>  		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
>
> -		if (aid < ARRAY_SIZE(new->phys_map))
> +		if (aid <= new->max_apic_id)
>  			new->phys_map[aid] = apic;
>
>  		if (apic_x2apic_mode(apic)) {
> @@ -710,7 +718,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm,
>  		return false;
>
>  	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
> -		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
> +		if (irq->dest_id > map->max_apic_id) {
>  			*bitmap = 0;
>  		} else {
>  			*dst = &map->phys_map[irq->dest_id];
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 336ba51bb16e..8d811139d2b3 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -200,7 +200,7 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
>  	return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
>  }
>
> -static inline int kvm_apic_id(struct kvm_lapic *apic)
> +static inline u32 kvm_apic_id(struct kvm_lapic *apic)
>  {
>  	return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
>  }
>


-- 
best regards
yang

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map
  2016-07-11  6:07   ` Yang Zhang
@ 2016-07-11  7:43     ` Paolo Bonzini
  2016-07-11 10:14       ` Yang Zhang
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2016-07-11  7:43 UTC (permalink / raw)
  To: Yang Zhang, Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 11/07/2016 08:07, Yang Zhang wrote:
>>
>>      mutex_lock(&kvm->arch.apic_map_lock);
>>
>> +    kvm_for_each_vcpu(i, vcpu, kvm)
>> +        if (kvm_apic_present(vcpu))
>> +            max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
>> +
>> +    new = kzalloc(sizeof(struct kvm_apic_map) +
>> +                  sizeof(struct kvm_lapic *) * (max_id + 1),
>> GFP_KERNEL);
>> +
> 
> I think this may cause the host runs out of memory if a malicious guest
> did follow thing:
> 1. vcpu a is doing apic map recalculation.
> 2. vcpu b write the apic id with 0xff
> 3. then vcpu b enable the x2apic: in kvm_lapic_set_base(), we will set
> apic_base to new value before reset the apic id.
> 4. vcpu a may see the x2apic enabled in vcpu b plus an old apic
> id(0xff), and max_id will become (0xff >> 24).

The bug is not really here but in patch 6---but you're right nevertheless!

I guess the easiest solution is to replace kvm_apic_id with a field in
struct kvm_lapic, which is already shifted right by 24 in xAPIC mode.

It can be added easily in patch 6 itself, it's like 3 new lines of code
because all reads and writes go through kvm_apic_id and kvm_apic_set_id;
the kvm_apic_id wrapper can be kept for simplicity.

Thanks again!

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-11  6:06   ` Yang Zhang
@ 2016-07-11  7:44     ` Paolo Bonzini
  2016-07-11  8:56       ` Yang Zhang
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2016-07-11  7:44 UTC (permalink / raw)
  To: Yang Zhang, Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 11/07/2016 08:06, Yang Zhang wrote:
>> Changes to MSI addresses follow the format used by interrupt remapping
>> unit.
>> The upper address word, that used to be 0, contains upper 24 bits of
>> the LAPIC
>> address in its upper 24 bits.  Lower 8 bits are reserved as 0.
>> Using the upper address word is not backward-compatible either as we
>> didn't
>> check that userspace zeroed the word.  Reserved bits are still not
>> explicitly
> 
> Does this means we cannot migrate the VM from KVM_CAP_X2APIC_API enabled
> host to the disable host even VM doesn't have more than 255 VCPUs?

Yes, but that's why KVM_CAP_X2APIC_API is enabled manually.  The idea is
that QEMU will not use KVM_CAP_X2APIC_API except on the newest machine type.

If interrupt remapping is on, KVM_CAP_X2APIC_API is needed even with 8
VCPUs, I think.  Otherwise KVM will believe that 0xff is "broadcast"
rather than "cluster 0, CPUs 0-7".

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-11  7:44     ` Paolo Bonzini
@ 2016-07-11  8:56       ` Yang Zhang
  2016-07-11  9:17         ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Yang Zhang @ 2016-07-11  8:56 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

On 2016/7/11 15:44, Paolo Bonzini wrote:
>
>
> On 11/07/2016 08:06, Yang Zhang wrote:
>>> Changes to MSI addresses follow the format used by interrupt remapping
>>> unit.
>>> The upper address word, that used to be 0, contains upper 24 bits of
>>> the LAPIC
>>> address in its upper 24 bits.  Lower 8 bits are reserved as 0.
>>> Using the upper address word is not backward-compatible either as we
>>> didn't
>>> check that userspace zeroed the word.  Reserved bits are still not
>>> explicitly
>>
>> Does this means we cannot migrate the VM from KVM_CAP_X2APIC_API enabled
>> host to the disable host even VM doesn't have more than 255 VCPUs?
>
> Yes, but that's why KVM_CAP_X2APIC_API is enabled manually.  The idea is
> that QEMU will not use KVM_CAP_X2APIC_API except on the newest machine type.

Thanks for confirmation. And when the KVM_CAP_X2APIC_API will be enabled 
in Qemu?

>
> If interrupt remapping is on, KVM_CAP_X2APIC_API is needed even with 8
> VCPUs, I think.  Otherwise KVM will believe that 0xff is "broadcast"
> rather than "cluster 0, CPUs 0-7".

If interrupt remapping is using, what 0xff means is relying on which 
mode the destination CPU is in. I think there is no KVM_CAP_X2APIC_API 
needed since interrupt remapping table gives all the information.

-- 
best regards
yang

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-11  8:56       ` Yang Zhang
@ 2016-07-11  9:17         ` Paolo Bonzini
  2016-07-11 10:33           ` Yang Zhang
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2016-07-11  9:17 UTC (permalink / raw)
  To: Yang Zhang, Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 11/07/2016 10:56, Yang Zhang wrote:
> On 2016/7/11 15:44, Paolo Bonzini wrote:
>>
>>
>> On 11/07/2016 08:06, Yang Zhang wrote:
>>>> Changes to MSI addresses follow the format used by interrupt remapping
>>>> unit.
>>>> The upper address word, that used to be 0, contains upper 24 bits of
>>>> the LAPIC
>>>> address in its upper 24 bits.  Lower 8 bits are reserved as 0.
>>>> Using the upper address word is not backward-compatible either as we
>>>> didn't
>>>> check that userspace zeroed the word.  Reserved bits are still not
>>>> explicitly
>>>
>>> Does this means we cannot migrate the VM from KVM_CAP_X2APIC_API enabled
>>> host to the disable host even VM doesn't have more than 255 VCPUs?
>>
>> Yes, but that's why KVM_CAP_X2APIC_API is enabled manually.  The idea is
>> that QEMU will not use KVM_CAP_X2APIC_API except on the newest machine
>> type.
> 
> Thanks for confirmation. And when the KVM_CAP_X2APIC_API will be enabled
> in Qemu?

It could be 2.7 or 2.8.

>>
>> If interrupt remapping is on, KVM_CAP_X2APIC_API is needed even with 8
>> VCPUs, I think.  Otherwise KVM will believe that 0xff is "broadcast"
>> rather than "cluster 0, CPUs 0-7".
> 
> If interrupt remapping is using, what 0xff means is relying on which
> mode the destination CPU is in. I think there is no KVM_CAP_X2APIC_API
> needed since interrupt remapping table gives all the information.

If you have EIM 0xff never means broadcast, but KVM sees a 0xff in the
interrupt route or KVM_SIGNAL_MSI argument and translates it into a
broadcast.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map
  2016-07-11  7:43     ` Paolo Bonzini
@ 2016-07-11 10:14       ` Yang Zhang
  2016-07-11 13:48         ` Radim Krčmář
  0 siblings, 1 reply; 31+ messages in thread
From: Yang Zhang @ 2016-07-11 10:14 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

On 2016/7/11 15:43, Paolo Bonzini wrote:
>
>
> On 11/07/2016 08:07, Yang Zhang wrote:
>>>
>>>      mutex_lock(&kvm->arch.apic_map_lock);
>>>
>>> +    kvm_for_each_vcpu(i, vcpu, kvm)
>>> +        if (kvm_apic_present(vcpu))
>>> +            max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
>>> +
>>> +    new = kzalloc(sizeof(struct kvm_apic_map) +
>>> +                  sizeof(struct kvm_lapic *) * (max_id + 1),
>>> GFP_KERNEL);
>>> +
>>
>> I think this may cause the host runs out of memory if a malicious guest
>> did follow thing:
>> 1. vcpu a is doing apic map recalculation.
>> 2. vcpu b write the apic id with 0xff
>> 3. then vcpu b enable the x2apic: in kvm_lapic_set_base(), we will set
>> apic_base to new value before reset the apic id.
>> 4. vcpu a may see the x2apic enabled in vcpu b plus an old apic
>> id(0xff), and max_id will become (0xff >> 24).
>
> The bug is not really here but in patch 6---but you're right nevertheless!
>
> I guess the easiest solution is to replace kvm_apic_id with a field in
> struct kvm_lapic, which is already shifted right by 24 in xAPIC mode.

Or we can just simply put the assignment of apic_base to the end.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fdc05ae..9c69059 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1745,7 +1745,6 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 
value)
                 return;
         }

-       vcpu->arch.apic_base = value;

         /* update jump label if enable bit changes */
         if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE) {
@@ -1753,7 +1752,6 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 
value)
                         static_key_slow_dec_deferred(&apic_hw_disabled);
                 else
                         static_key_slow_inc(&apic_hw_disabled.key);
-               recalculate_apic_map(vcpu->kvm);
         }

         if ((old_value ^ value) & X2APIC_ENABLE) {
@@ -1764,6 +1762,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 
value)
                         kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
         }

+       vcpu->arch.apic_base = value;
+       recalculate_apic_map(vcpu->kvm);
         apic->base_address = apic->vcpu->arch.apic_base &
                              MSR_IA32_APICBASE_BASE;


btw, i noticed that there is no apic map recalculation after turn off 
the x2apic mode.Is it correct?

-- 
best regards
yang

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-11  9:17         ` Paolo Bonzini
@ 2016-07-11 10:33           ` Yang Zhang
  2016-07-11 10:40             ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Yang Zhang @ 2016-07-11 10:33 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

On 2016/7/11 17:17, Paolo Bonzini wrote:
>
>
> On 11/07/2016 10:56, Yang Zhang wrote:
>> On 2016/7/11 15:44, Paolo Bonzini wrote:
>>>
>>>
>>> On 11/07/2016 08:06, Yang Zhang wrote:
>>>>> Changes to MSI addresses follow the format used by interrupt remapping
>>>>> unit.
>>>>> The upper address word, that used to be 0, contains upper 24 bits of
>>>>> the LAPIC
>>>>> address in its upper 24 bits.  Lower 8 bits are reserved as 0.
>>>>> Using the upper address word is not backward-compatible either as we
>>>>> didn't
>>>>> check that userspace zeroed the word.  Reserved bits are still not
>>>>> explicitly
>>>>
>>>> Does this means we cannot migrate the VM from KVM_CAP_X2APIC_API enabled
>>>> host to the disable host even VM doesn't have more than 255 VCPUs?
>>>
>>> Yes, but that's why KVM_CAP_X2APIC_API is enabled manually.  The idea is
>>> that QEMU will not use KVM_CAP_X2APIC_API except on the newest machine
>>> type.
>>
>> Thanks for confirmation. And when the KVM_CAP_X2APIC_API will be enabled
>> in Qemu?
>
> It could be 2.7 or 2.8.
>
>>>
>>> If interrupt remapping is on, KVM_CAP_X2APIC_API is needed even with 8
>>> VCPUs, I think.  Otherwise KVM will believe that 0xff is "broadcast"
>>> rather than "cluster 0, CPUs 0-7".
>>
>> If interrupt remapping is using, what 0xff means is relying on which
>> mode the destination CPU is in. I think there is no KVM_CAP_X2APIC_API
>> needed since interrupt remapping table gives all the information.
>
> If you have EIM 0xff never means broadcast, but KVM sees a 0xff in the
> interrupt route or KVM_SIGNAL_MSI argument and translates it into a
> broadcast.

I see your point. I thought there would be a new irq router(like 
KVM_IRQ_ROUTING_IR) to handle all interrupts after turn on IR and 
KVM_CAP_X2APIC_API would be dropped. So we will continue to use 
KVM_IRQ_ROUTING_IRQCHIP and KVM_IRQ_ROUTING_MSI for interrupt from IR, 
right?

-- 
best regards
yang

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-11 10:33           ` Yang Zhang
@ 2016-07-11 10:40             ` Paolo Bonzini
  0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2016-07-11 10:40 UTC (permalink / raw)
  To: Yang Zhang, Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 11/07/2016 12:33, Yang Zhang wrote:
> On 2016/7/11 17:17, Paolo Bonzini wrote:
>> On 11/07/2016 10:56, Yang Zhang wrote:
>>> On 2016/7/11 15:44, Paolo Bonzini wrote:
>>>> On 11/07/2016 08:06, Yang Zhang wrote:
>>>> If interrupt remapping is on, KVM_CAP_X2APIC_API is needed even with 8
>>>> VCPUs, I think.  Otherwise KVM will believe that 0xff is "broadcast"
>>>> rather than "cluster 0, CPUs 0-7".
>>>
>>> If interrupt remapping is using, what 0xff means is relying on which
>>> mode the destination CPU is in. I think there is no KVM_CAP_X2APIC_API
>>> needed since interrupt remapping table gives all the information.
>>
>> If you have EIM 0xff never means broadcast, but KVM sees a 0xff in the
>> interrupt route or KVM_SIGNAL_MSI argument and translates it into a
>> broadcast.
> 
> I see your point. I thought there would be a new irq router(like
> KVM_IRQ_ROUTING_IR) to handle all interrupts after turn on IR and
> KVM_CAP_X2APIC_API would be dropped.

KVM_CAP_X2APIC_API seems simpler to me than a new type of irq routing.

> So we will continue to use
> KVM_IRQ_ROUTING_IRQCHIP and KVM_IRQ_ROUTING_MSI for interrupt from IR,
> right?

Actually only KVM_IRQ_ROUTING_MSI, because that's the only one that is
available with split irqchip.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map
  2016-07-11 10:14       ` Yang Zhang
@ 2016-07-11 13:48         ` Radim Krčmář
  2016-07-11 14:14           ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Radim Krčmář @ 2016-07-11 13:48 UTC (permalink / raw)
  To: Yang Zhang
  Cc: Paolo Bonzini, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu

2016-07-11 18:14+0800, Yang Zhang:
> On 2016/7/11 15:43, Paolo Bonzini wrote:
>> On 11/07/2016 08:07, Yang Zhang wrote:
>> > > 
>> > >      mutex_lock(&kvm->arch.apic_map_lock);
>> > > 
>> > > +    kvm_for_each_vcpu(i, vcpu, kvm)
>> > > +        if (kvm_apic_present(vcpu))
>> > > +            max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
>> > > +
>> > > +    new = kzalloc(sizeof(struct kvm_apic_map) +
>> > > +                  sizeof(struct kvm_lapic *) * (max_id + 1),
>> > > GFP_KERNEL);
>> > > +
>> > 
>> > I think this may cause the host runs out of memory if a malicious guest
>> > did follow thing:
>> > 1. vcpu a is doing apic map recalculation.
>> > 2. vcpu b write the apic id with 0xff
>> > 3. then vcpu b enable the x2apic: in kvm_lapic_set_base(), we will set
>> > apic_base to new value before reset the apic id.
>> > 4. vcpu a may see the x2apic enabled in vcpu b plus an old apic
>> > id(0xff), and max_id will become (0xff >> 24).

Indeed, thanks.  The guest doesn't even have to be malicious ...

>> The bug is not really here but in patch 6---but you're right nevertheless!

Yes.

>> I guess the easiest solution is to replace kvm_apic_id with a field in
>> struct kvm_lapic, which is already shifted right by 24 in xAPIC mode.

(I guess the fewest LOC is to look at vcpu->vcpu_id, which is equal to
 x2apic id.  xapic id cannot be greater than 255 and all of those are
 covered by the initial value of max_id.)

> Or we can just simply put the assignment of apic_base to the end.

Yes, this would work, I'd also remove recalculates from
kvm_apic_set_*apic_id() and add a compiler barrier with comment for good
measure, even though set_virtual_x2apic_mode() serves as one.
(What makes a bit wary is that it doesn't avoid the same problem if we
 changed KVM to reset apic id to xapic id first when disabling apic.)

Races in recalculation and APIC ID changes also lead to invalid physical
maps, which haven't been taken care of properly ...
Having apic id stored in big endian, or "0-7,8-31" format, would be
safest.  I wanted to change the apic map to do incremental updates with
with respect to the APIC that has changed, instead of being completely
recomputed, so maybe the time is now. :)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map
  2016-07-11 13:48         ` Radim Krčmář
@ 2016-07-11 14:14           ` Paolo Bonzini
  2016-07-11 15:52             ` Radim Krčmář
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2016-07-11 14:14 UTC (permalink / raw)
  To: Radim Krčmář, Yang Zhang
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 11/07/2016 15:48, Radim Krčmář wrote:
>>> I guess the easiest solution is to replace kvm_apic_id with a field in
>>> struct kvm_lapic, which is already shifted right by 24 in xAPIC mode.
> 
> (I guess the fewest LOC is to look at vcpu->vcpu_id, which is equal to
>  x2apic id.  xapic id cannot be greater than 255 and all of those are
>  covered by the initial value of max_id.)

Yes, this would work too.  Or even better perhaps, look at vcpu->vcpu_id
in kvm_apic_id?

>> Or we can just simply put the assignment of apic_base to the end.
> 
> Yes, this would work, I'd also remove recalculates from
> kvm_apic_set_*apic_id() and add a compiler barrier with comment for good
> measure, even though set_virtual_x2apic_mode() serves as one.

Why a compiler barrier?

> (What makes a bit wary is that it doesn't avoid the same problem if we
>  changed KVM to reset apic id to xapic id first when disabling apic.)

Yes, this is why I prefer it fixed once and for all in kvm_apic_id...

> Races in recalculation and APIC ID changes also lead to invalid physical
> maps, which haven't been taken care of properly ...

Hmm, true, but can be fixed separately.  Probably the mutex should be
renamed so that it can be taken outside recalculate_apic_map...

Paolo

> Having apic id stored in big endian, or "0-7,8-31" format, would be
> safest.  I wanted to change the apic map to do incremental updates with
> with respect to the APIC that has changed, instead of being completely
> recomputed, so maybe the time is now. :)
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map
  2016-07-11 14:14           ` Paolo Bonzini
@ 2016-07-11 15:52             ` Radim Krčmář
  2016-07-11 16:04               ` Paolo Bonzini
  2016-07-12  3:27               ` Yang Zhang
  0 siblings, 2 replies; 31+ messages in thread
From: Radim Krčmář @ 2016-07-11 15:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Yang Zhang, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu

2016-07-11 16:14+0200, Paolo Bonzini:
> On 11/07/2016 15:48, Radim Krčmář wrote:
>>>> I guess the easiest solution is to replace kvm_apic_id with a field in
>>>> struct kvm_lapic, which is already shifted right by 24 in xAPIC mode.
>> 
>> (I guess the fewest LOC is to look at vcpu->vcpu_id, which is equal to
>>  x2apic id.  xapic id cannot be greater than 255 and all of those are
>>  covered by the initial value of max_id.)
> 
> Yes, this would work too.  Or even better perhaps, look at vcpu->vcpu_id
> in kvm_apic_id?

APIC ID is writeable in xAPIC mode, which would make the implementation
weird without an extra variable.  Always read-only APIC ID would be
best, IMO.

>>> Or we can just simply put the assignment of apic_base to the end.
>> 
>> Yes, this would work, I'd also remove recalculates from
>> kvm_apic_set_*apic_id() and add a compiler barrier with comment for good
>> measure, even though set_virtual_x2apic_mode() serves as one.
> 
> Why a compiler barrier?

True, it should be a proper pair of smp_wmb() and smp_rmb() in
recalculate ... and current kvm_apic_id() reads in a wrong order, so
changing the apic_base alone update wouldn't get rid of this race.

>> (What makes a bit wary is that it doesn't avoid the same problem if we
>>  changed KVM to reset apic id to xapic id first when disabling apic.)
> 
> Yes, this is why I prefer it fixed once and for all in kvm_apic_id...

Seems most reasonable.  We'll need to be careful to have a correct value
in the apic page, but there shouldn't be any races there.

>> Races in recalculation and APIC ID changes also lead to invalid physical
>> maps, which haven't been taken care of properly ...
> 
> Hmm, true, but can be fixed separately.  Probably the mutex should be
> renamed so that it can be taken outside recalculate_apic_map...

Good point, it'll make reasoning easier and shouldn't introduce any
extra scalability issues.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map
  2016-07-11 15:52             ` Radim Krčmář
@ 2016-07-11 16:04               ` Paolo Bonzini
  2016-07-12  3:27               ` Yang Zhang
  1 sibling, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2016-07-11 16:04 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Yang Zhang, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu



On 11/07/2016 17:52, Radim Krčmář wrote:
> 2016-07-11 16:14+0200, Paolo Bonzini:
>> On 11/07/2016 15:48, Radim Krčmář wrote:
>>>>> I guess the easiest solution is to replace kvm_apic_id with a field in
>>>>> struct kvm_lapic, which is already shifted right by 24 in xAPIC mode.
>>>
>>> (I guess the fewest LOC is to look at vcpu->vcpu_id, which is equal to
>>>  x2apic id.  xapic id cannot be greater than 255 and all of those are
>>>  covered by the initial value of max_id.)
>>
>> Yes, this would work too.  Or even better perhaps, look at vcpu->vcpu_id
>> in kvm_apic_id?
> 
> APIC ID is writeable in xAPIC mode, which would make the implementation
> weird without an extra variable.  Always read-only APIC ID would be
> best, IMO.

You can do

	if (x2apic mode)
		return lapic->vcpu->vcpu_id;
	else
		return get_reg(APIC_ID) >> 24;

The point is to avoid returning a shifted APIC_ID without shifting it.

The alternative of course is just caching it, which at this point is not
particularly harder...

Paolo

>>> (What makes a bit wary is that it doesn't avoid the same problem if we
>>>  changed KVM to reset apic id to xapic id first when disabling apic.)
>>
>> Yes, this is why I prefer it fixed once and for all in kvm_apic_id...
> 
> Seems most reasonable.  We'll need to be careful to have a correct value
> in the apic page, but there shouldn't be any races there.
> 
>>> Races in recalculation and APIC ID changes also lead to invalid physical
>>> maps, which haven't been taken care of properly ...
>>
>> Hmm, true, but can be fixed separately.  Probably the mutex should be
>> renamed so that it can be taken outside recalculate_apic_map...
> 
> Good point, it'll make reasoning easier and shouldn't introduce any
> extra scalability issues.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map
  2016-07-11 15:52             ` Radim Krčmář
  2016-07-11 16:04               ` Paolo Bonzini
@ 2016-07-12  3:27               ` Yang Zhang
  1 sibling, 0 replies; 31+ messages in thread
From: Yang Zhang @ 2016-07-12  3:27 UTC (permalink / raw)
  To: Radim Krčmář, Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

On 2016/7/11 23:52, Radim Krčmář wrote:
> 2016-07-11 16:14+0200, Paolo Bonzini:
>> On 11/07/2016 15:48, Radim Krčmář wrote:
>>>>> I guess the easiest solution is to replace kvm_apic_id with a field in
>>>>> struct kvm_lapic, which is already shifted right by 24 in xAPIC mode.
>>>
>>> (I guess the fewest LOC is to look at vcpu->vcpu_id, which is equal to
>>>  x2apic id.  xapic id cannot be greater than 255 and all of those are
>>>  covered by the initial value of max_id.)
>>
>> Yes, this would work too.  Or even better perhaps, look at vcpu->vcpu_id
>> in kvm_apic_id?
>
> APIC ID is writeable in xAPIC mode, which would make the implementation
> weird without an extra variable.  Always read-only APIC ID would be
> best, IMO.
>
>>>> Or we can just simply put the assignment of apic_base to the end.
>>>
>>> Yes, this would work, I'd also remove recalculates from
>>> kvm_apic_set_*apic_id() and add a compiler barrier with comment for good
>>> measure, even though set_virtual_x2apic_mode() serves as one.
>>
>> Why a compiler barrier?
>
> True, it should be a proper pair of smp_wmb() and smp_rmb() in
> recalculate ... and current kvm_apic_id() reads in a wrong order, so
> changing the apic_base alone update wouldn't get rid of this race.
>
>>> (What makes a bit wary is that it doesn't avoid the same problem if we
>>>  changed KVM to reset apic id to xapic id first when disabling apic.)
>>
>> Yes, this is why I prefer it fixed once and for all in kvm_apic_id...
>
> Seems most reasonable.  We'll need to be careful to have a correct value
> in the apic page, but there shouldn't be any races there.

Yes, it is more reasonable.

>
>>> Races in recalculation and APIC ID changes also lead to invalid physical
>>> maps, which haven't been taken care of properly ...
>>
>> Hmm, true, but can be fixed separately.  Probably the mutex should be
>> renamed so that it can be taken outside recalculate_apic_map...
>
> Good point, it'll make reasoning easier and shouldn't introduce any
> extra scalability issues.

If we can ensure all the updates to LDR,DFR,ID and apic mode are in 
correct sequence and followed with apic map recalculation, it should be 
enough. It's guest's responsibility to ensure the apic updating must
happen in right time(means no interrupt is in flying), otherwise the 
interrupt may deliver to wrong VCPU.

-- 
best regards
yang

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2016-07-12  3:27 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-07 17:15 [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 01/13] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 02/13] KVM: x86: add kvm_apic_map_get_dest_lapic Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 03/13] KVM: x86: use physical LAPIC array for logical x2APIC Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 04/13] KVM: x86: dynamic kvm_apic_map Radim Krčmář
2016-07-11  6:07   ` Yang Zhang
2016-07-11  7:43     ` Paolo Bonzini
2016-07-11 10:14       ` Yang Zhang
2016-07-11 13:48         ` Radim Krčmář
2016-07-11 14:14           ` Paolo Bonzini
2016-07-11 15:52             ` Radim Krčmář
2016-07-11 16:04               ` Paolo Bonzini
2016-07-12  3:27               ` Yang Zhang
2016-07-07 17:15 ` [PATCH v2 05/13] KVM: x86: use generic function for MSI parsing Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 06/13] KVM: x86: use hardware-compatible format for APIC ID register Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 07/13] KVM: x86: reset APIC ID when enabling LAPIC Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 08/13] KVM: VMX: optimize APIC ID read with APICv Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 09/13] KVM: x86: reset lapic base in kvm_lapic_reset Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 10/13] KVM: pass struct kvm to kvm_set_routing_entry Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 11/13] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
2016-07-11  6:06   ` Yang Zhang
2016-07-11  7:44     ` Paolo Bonzini
2016-07-11  8:56       ` Yang Zhang
2016-07-11  9:17         ` Paolo Bonzini
2016-07-11 10:33           ` Yang Zhang
2016-07-11 10:40             ` Paolo Bonzini
2016-07-07 17:15 ` [PATCH v2 12/13] KVM: x86: bump MAX_VCPUS to 288 Radim Krčmář
2016-07-07 17:15 ` [PATCH v2 13/13] KVM: x86: bump KVM_MAX_VCPU_ID to 1023 Radim Krčmář
2016-07-08 10:00 ` [PATCH v2 00/13] KVM: x86: break the xAPIC barrier Paolo Bonzini
2016-07-08 16:29 ` Radim Krčmář
2016-07-08 16:30   ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).