All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 00/11] KVM: x86: break the xAPIC barrier
@ 2016-06-30 20:54 Radim Krčmář
  2016-06-30 20:54 ` [PATCH v1 01/11] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
                   ` (10 more replies)
  0 siblings, 11 replies; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

RFC: http://www.spinics.net/lists/kvm/msg132036.html

This series allows userspace to create and send interrupts to VCPUs with
APIC ID > 255.

v1: 32 bit API extensions now use a toggleable capability called
KVM_CAP_X2APIC_API.

* new [1/11], loosely related and should have been posted long ago
* [2/11] improved comment for kvm_apic_map_get_dest_lapic() [Peter]
* [3/11] use round_up [Peter]
* [6/11] drop big endian format in lapic get/set ioctl [Paolo]
* [6/11] exclude APICv optimization of APIC ID register
* [6/11] rename __kvm_apic_state_fixup() parameter
* new [7/11], split from the previous patch
* [8/11] added r-b [Peter]
* [10/11] rewritten with a toggleable capability [Paolo]
* [10/11] dropped MSI_ADDR_EXT_DEST_ID to enforce reserved bits

Latest Linux, FreeBSD, and Windows run fine, but I haven't tested VFIO.
Linux works even with APIC ID over 255.  (QEMU and seabios patches to
achieve that are not yet posted.)


Radim Krčmář (11):
  KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240
  KVM: x86: add kvm_apic_map_get_dest_lapic
  KVM: x86: dynamic kvm_apic_map
  KVM: x86: use u16 for logical VCPU mask in lapic
  KVM: x86: use generic function for MSI parsing
  KVM: x86: use hardware-compatible format for APIC ID register
  KVM: VMX: optimize APIC ID read with APICv
  KVM: x86: directly call recalculate_apic_map on lapic restore
  KVM: x86: reset lapic base in kvm_lapic_reset
  KVM: x86: add KVM_CAP_X2APIC_API
  KVM: x86: bump MAX_VCPUS to 288

 Documentation/virtual/kvm/api.txt |  26 +++
 arch/x86/include/asm/kvm_host.h   |  17 +-
 arch/x86/kvm/irq_comm.c           |  29 ++--
 arch/x86/kvm/lapic.c              | 357 ++++++++++++++++++--------------------
 arch/x86/kvm/lapic.h              |   9 +-
 arch/x86/kvm/vmx.c                |   5 +-
 arch/x86/kvm/x86.c                |  14 ++
 include/uapi/linux/kvm.h          |   1 +
 8 files changed, 248 insertions(+), 210 deletions(-)

-- 
2.9.0

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v1 01/11] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-07-01  8:42   ` Paolo Bonzini
  2016-06-30 20:54 ` [PATCH v1 02/11] KVM: x86: add kvm_apic_map_get_dest_lapic Radim Krčmář
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

240 has been well tested by Red Hat.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v1: new, loosely related and should have been posted long ago

 arch/x86/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7a628fb6a2c2..53d39771842b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -35,7 +35,7 @@
 #include <asm/kvm_page_track.h>
 
 #define KVM_MAX_VCPUS 255
-#define KVM_SOFT_MAX_VCPUS 160
+#define KVM_SOFT_MAX_VCPUS 240
 #define KVM_USER_MEM_SLOTS 509
 /* memory slots that are not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 3
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 02/11] KVM: x86: add kvm_apic_map_get_dest_lapic
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
  2016-06-30 20:54 ` [PATCH v1 01/11] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-07-01  7:57   ` Paolo Bonzini
  2016-06-30 20:54 ` [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map Radim Krčmář
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

kvm_irq_delivery_to_apic_fast and kvm_intr_is_single_vcpu_fast both
compute the interrupt destination.  Factor the code.

'struct kvm_lapic **dst = NULL' had to be added to silence GCC.
GCC might complain about potential NULL access in the future, because it
missed conditions that avoided uninitialized uses of dst.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v1: improved comment for kvm_apic_map_get_dest_lapic() [Peter]

 arch/x86/kvm/lapic.c | 241 ++++++++++++++++++++++-----------------------------
 1 file changed, 103 insertions(+), 138 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 22a6474af220..238b87b068db 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -671,14 +671,98 @@ static void kvm_apic_disabled_lapic_found(struct kvm *kvm)
 	}
 }
 
+/* Return true if the interrupt can be handled by using *bitmap as index mask
+ * for valid destinations in *dst array.
+ * Return false if kvm_apic_map_get_dest_lapic did nothing useful.
+ * Note: we may have zero kvm_lapic destinations when we return true, which
+ * means that the interrupt should be dropped.  In this case, *bitmap would be
+ * zero and *dst undefined.
+ */
+static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic *src,
+		struct kvm_lapic_irq *irq, struct kvm_apic_map *map,
+		struct kvm_lapic ***dst, unsigned long *bitmap)
+{
+	int i, lowest;
+	bool x2apic_ipi;
+	u16 cid;
+
+	if (irq->shorthand == APIC_DEST_SELF) {
+		*dst = &src;
+		*bitmap = 1;
+		return true;
+	} else if (irq->shorthand)
+		return false;
+
+	x2apic_ipi = src && apic_x2apic_mode(src);
+	if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
+		return false;
+
+	if (!map)
+		return false;
+
+	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
+		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
+			*bitmap = 0;
+		} else {
+			*dst = &map->phys_map[irq->dest_id];
+			*bitmap = 1;
+		}
+		return true;
+	}
+
+	if (!kvm_apic_logical_map_valid(map))
+		return false;
+
+	apic_logical_id(map, irq->dest_id, &cid, (u16 *)bitmap);
+
+	if (cid >= ARRAY_SIZE(map->logical_map)) {
+		*bitmap = 0;
+		return true;
+	}
+
+	*dst = map->logical_map[cid];
+
+	if (!kvm_lowest_prio_delivery(irq))
+		return true;
+
+	if (!kvm_vector_hashing_enabled()) {
+		lowest = -1;
+		for_each_set_bit(i, bitmap, 16) {
+			if (!(*dst)[i])
+				continue;
+			if (lowest < 0)
+				lowest = i;
+			else if (kvm_apic_compare_prio((*dst)[i]->vcpu,
+						(*dst)[lowest]->vcpu) < 0)
+				lowest = i;
+		}
+	} else {
+		if (!*bitmap)
+			return true;
+
+		lowest = kvm_vector_to_index(irq->vector, hweight16(*bitmap),
+				bitmap, 16);
+
+		if (!(*dst)[lowest]) {
+			kvm_apic_disabled_lapic_found(kvm);
+			*bitmap = 0;
+			return true;
+		}
+	}
+
+	*bitmap = (lowest >= 0) ? 1 << lowest : 0;
+
+	return true;
+}
+
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map)
 {
 	struct kvm_apic_map *map;
-	unsigned long bitmap = 1;
-	struct kvm_lapic **dst;
+	unsigned long bitmap;
+	struct kvm_lapic **dst = NULL;
 	int i;
-	bool ret, x2apic_ipi;
+	bool ret;
 
 	*r = -1;
 
@@ -687,86 +771,19 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		return true;
 	}
 
-	if (irq->shorthand)
-		return false;
-
-	x2apic_ipi = src && apic_x2apic_mode(src);
-	if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
-		return false;
-
-	ret = true;
 	rcu_read_lock();
 	map = rcu_dereference(kvm->arch.apic_map);
 
-	if (!map) {
-		ret = false;
-		goto out;
-	}
-
-	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-		if (irq->dest_id >= ARRAY_SIZE(map->phys_map))
-			goto out;
-
-		dst = &map->phys_map[irq->dest_id];
-	} else {
-		u16 cid;
-
-		if (!kvm_apic_logical_map_valid(map)) {
-			ret = false;
-			goto out;
+	ret = kvm_apic_map_get_dest_lapic(kvm, src, irq, map, &dst, &bitmap);
+	if (ret)
+		for_each_set_bit(i, &bitmap, 16) {
+			if (!dst[i])
+				continue;
+			if (*r < 0)
+				*r = 0;
+			*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
 		}
 
-		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
-
-		if (cid >= ARRAY_SIZE(map->logical_map))
-			goto out;
-
-		dst = map->logical_map[cid];
-
-		if (!kvm_lowest_prio_delivery(irq))
-			goto set_irq;
-
-		if (!kvm_vector_hashing_enabled()) {
-			int l = -1;
-			for_each_set_bit(i, &bitmap, 16) {
-				if (!dst[i])
-					continue;
-				if (l < 0)
-					l = i;
-				else if (kvm_apic_compare_prio(dst[i]->vcpu,
-							dst[l]->vcpu) < 0)
-					l = i;
-			}
-			bitmap = (l >= 0) ? 1 << l : 0;
-		} else {
-			int idx;
-			unsigned int dest_vcpus;
-
-			dest_vcpus = hweight16(bitmap);
-			if (dest_vcpus == 0)
-				goto out;
-
-			idx = kvm_vector_to_index(irq->vector,
-				dest_vcpus, &bitmap, 16);
-
-			if (!dst[idx]) {
-				kvm_apic_disabled_lapic_found(kvm);
-				goto out;
-			}
-
-			bitmap = (idx >= 0) ? 1 << idx : 0;
-		}
-	}
-
-set_irq:
-	for_each_set_bit(i, &bitmap, 16) {
-		if (!dst[i])
-			continue;
-		if (*r < 0)
-			*r = 0;
-		*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
-	}
-out:
 	rcu_read_unlock();
 	return ret;
 }
@@ -789,8 +806,9 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu)
 {
 	struct kvm_apic_map *map;
+	unsigned long bitmap;
+	struct kvm_lapic **dst = NULL;
 	bool ret = false;
-	struct kvm_lapic *dst = NULL;
 
 	if (irq->shorthand)
 		return false;
@@ -798,69 +816,16 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 	rcu_read_lock();
 	map = rcu_dereference(kvm->arch.apic_map);
 
-	if (!map)
-		goto out;
+	if (kvm_apic_map_get_dest_lapic(kvm, NULL, irq, map, &dst, &bitmap) &&
+			hweight16(bitmap) == 1) {
+		unsigned long i = find_first_bit(&bitmap, 16);
 
-	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-		if (irq->dest_id == 0xFF)
-			goto out;
-
-		if (irq->dest_id >= ARRAY_SIZE(map->phys_map))
-			goto out;
-
-		dst = map->phys_map[irq->dest_id];
-		if (dst && kvm_apic_present(dst->vcpu))
-			*dest_vcpu = dst->vcpu;
-		else
-			goto out;
-	} else {
-		u16 cid;
-		unsigned long bitmap = 1;
-		int i, r = 0;
-
-		if (!kvm_apic_logical_map_valid(map))
-			goto out;
-
-		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
-
-		if (cid >= ARRAY_SIZE(map->logical_map))
-			goto out;
-
-		if (kvm_vector_hashing_enabled() &&
-				kvm_lowest_prio_delivery(irq)) {
-			int idx;
-			unsigned int dest_vcpus;
-
-			dest_vcpus = hweight16(bitmap);
-			if (dest_vcpus == 0)
-				goto out;
-
-			idx = kvm_vector_to_index(irq->vector, dest_vcpus,
-						  &bitmap, 16);
-
-			dst = map->logical_map[cid][idx];
-			if (!dst) {
-				kvm_apic_disabled_lapic_found(kvm);
-				goto out;
-			}
-
-			*dest_vcpu = dst->vcpu;
-		} else {
-			for_each_set_bit(i, &bitmap, 16) {
-				dst = map->logical_map[cid][i];
-				if (++r == 2)
-					goto out;
-			}
-
-			if (dst && kvm_apic_present(dst->vcpu))
-				*dest_vcpu = dst->vcpu;
-			else
-				goto out;
+		if (dst[i]) {
+			*dest_vcpu = dst[i]->vcpu;
+			ret = true;
 		}
 	}
 
-	ret = true;
-out:
 	rcu_read_unlock();
 	return ret;
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
  2016-06-30 20:54 ` [PATCH v1 01/11] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
  2016-06-30 20:54 ` [PATCH v1 02/11] KVM: x86: add kvm_apic_map_get_dest_lapic Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-06-30 22:15   ` Andrew Honig
  2016-07-01  7:33   ` Paolo Bonzini
  2016-06-30 20:54 ` [PATCH v1 04/11] KVM: x86: use u16 for logical VCPU mask in lapic Radim Krčmář
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will
have slighly less VCPUs.  Dynamic size saves memory at the cost of
turning one constant into a variable.

apic_map mutex had to be moved before allocation to avoid races with cpu
hotplug.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v1: use round_up [Peter]

 arch/x86/include/asm/kvm_host.h |  9 +++--
 arch/x86/kvm/lapic.c            | 86 ++++++++++++++++++++++-------------------
 arch/x86/kvm/lapic.h            |  2 +-
 3 files changed, 53 insertions(+), 44 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 53d39771842b..459a789cb3da 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -682,9 +682,12 @@ struct kvm_arch_memory_slot {
 struct kvm_apic_map {
 	struct rcu_head rcu;
 	u8 mode;
-	struct kvm_lapic *phys_map[256];
-	/* first index is cluster id second is cpu id in a cluster */
-	struct kvm_lapic *logical_map[16][16];
+	u32 size;
+	union {
+		struct kvm_lapic *xapic_flat_map[8];
+		struct kvm_lapic *xapic_cluster_map[16][4];
+	};
+	struct kvm_lapic *phys_map[];
 };
 
 /* Hyper-V emulation context */
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 238b87b068db..eed5af46e619 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -115,26 +115,31 @@ static inline int apic_enabled(struct kvm_lapic *apic)
 	(LVT_MASK | APIC_MODE_MASK | APIC_INPUT_POLARITY | \
 	 APIC_LVT_REMOTE_IRR | APIC_LVT_LEVEL_TRIGGER)
 
-/* The logical map is definitely wrong if we have multiple
- * modes at the same time.  (Physical map is always right.)
- */
-static inline bool kvm_apic_logical_map_valid(struct kvm_apic_map *map)
-{
-	return !(map->mode & (map->mode - 1));
-}
+static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
+		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
+	switch (map->mode) {
+	case KVM_APIC_MODE_X2APIC: {
+		u32 offset = (dest_id >> 16) * 16;
+		if (offset < map->size) {
+			*cluster = &map->phys_map[offset];
+			*mask = dest_id & 0xffff;
+		} else
+			*mask = 0;
 
-static inline void
-apic_logical_id(struct kvm_apic_map *map, u32 dest_id, u16 *cid, u16 *lid)
-{
-	unsigned lid_bits;
-
-	BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_CLUSTER !=  4);
-	BUILD_BUG_ON(KVM_APIC_MODE_XAPIC_FLAT    !=  8);
-	BUILD_BUG_ON(KVM_APIC_MODE_X2APIC        != 16);
-	lid_bits = map->mode;
-
-	*cid = dest_id >> lid_bits;
-	*lid = dest_id & ((1 << lid_bits) - 1);
+		return true;
+		}
+	case KVM_APIC_MODE_XAPIC_FLAT:
+		*cluster = map->xapic_flat_map;
+		*mask = dest_id & 0xff;
+		return true;
+	case KVM_APIC_MODE_XAPIC_CLUSTER:
+		*cluster = map->xapic_cluster_map[dest_id >> 4];
+		*mask = dest_id & 0xf;
+		return true;
+	default:
+		/* Not optimized. */
+		return false;
+	}
 }
 
 static void recalculate_apic_map(struct kvm *kvm)
@@ -142,17 +147,28 @@ static void recalculate_apic_map(struct kvm *kvm)
 	struct kvm_apic_map *new, *old = NULL;
 	struct kvm_vcpu *vcpu;
 	int i;
-
-	new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
+	u32 size, max_id = 255;
 
 	mutex_lock(&kvm->arch.apic_map_lock);
 
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		if (kvm_apic_present(vcpu))
+			max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
+
+	/* kvm_apic_map_get_logical_dest() expects multiples of 16 */
+	size = round_up(max_id + 1, 16);
+	new = kzalloc(sizeof(struct kvm_apic_map) +
+	              sizeof(struct kvm_lapic) * size, GFP_KERNEL);
+
 	if (!new)
 		goto out;
 
+	new->size = size;
+
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvm_lapic *apic = vcpu->arch.apic;
-		u16 cid, lid;
+		struct kvm_lapic **cluster;
+		u16 mask;
 		u32 ldr, aid;
 
 		if (!kvm_apic_present(vcpu))
@@ -161,7 +177,7 @@ static void recalculate_apic_map(struct kvm *kvm)
 		aid = kvm_apic_id(apic);
 		ldr = kvm_lapic_get_reg(apic, APIC_LDR);
 
-		if (aid < ARRAY_SIZE(new->phys_map))
+		if (aid < size)
 			new->phys_map[aid] = apic;
 
 		if (apic_x2apic_mode(apic)) {
@@ -174,13 +190,11 @@ static void recalculate_apic_map(struct kvm *kvm)
 				new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
 		}
 
-		if (!kvm_apic_logical_map_valid(new))
+		if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
 			continue;
 
-		apic_logical_id(new, ldr, &cid, &lid);
-
-		if (lid && cid < ARRAY_SIZE(new->logical_map))
-			new->logical_map[cid][ffs(lid) - 1] = apic;
+		if (mask)
+			cluster[ffs(mask) - 1] = apic;
 	}
 out:
 	old = rcu_dereference_protected(kvm->arch.apic_map,
@@ -684,7 +698,6 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic
 {
 	int i, lowest;
 	bool x2apic_ipi;
-	u16 cid;
 
 	if (irq->shorthand == APIC_DEST_SELF) {
 		*dst = &src;
@@ -701,7 +714,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic
 		return false;
 
 	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
-		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
+		if (irq->dest_id >= map->size) {
 			*bitmap = 0;
 		} else {
 			*dst = &map->phys_map[irq->dest_id];
@@ -710,18 +723,11 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic
 		return true;
 	}
 
-	if (!kvm_apic_logical_map_valid(map))
+	*bitmap = 0;
+	if (!kvm_apic_map_get_logical_dest(map, irq->dest_id, dst,
+				(u16 *)bitmap))
 		return false;
 
-	apic_logical_id(map, irq->dest_id, &cid, (u16 *)bitmap);
-
-	if (cid >= ARRAY_SIZE(map->logical_map)) {
-		*bitmap = 0;
-		return true;
-	}
-
-	*dst = map->logical_map[cid];
-
 	if (!kvm_lowest_prio_delivery(irq))
 		return true;
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 336ba51bb16e..8d811139d2b3 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -200,7 +200,7 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 	return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
 }
 
-static inline int kvm_apic_id(struct kvm_lapic *apic)
+static inline u32 kvm_apic_id(struct kvm_lapic *apic)
 {
 	return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 04/11] KVM: x86: use u16 for logical VCPU mask in lapic
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (2 preceding siblings ...)
  2016-06-30 20:54 ` [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-07-01  7:56   ` Paolo Bonzini
  2016-06-30 20:54 ` [PATCH v1 05/11] KVM: x86: use generic function for MSI parsing Radim Krčmář
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

for_all_bits() never were type safe, so we don't lose safety when
casting u16 * to unsigned long * and u16 is the size we want to avoid
mishaps when sending the pointer to functions that operate only on 16
bits.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/lapic.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index eed5af46e619..4296bb7353ff 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -694,7 +694,7 @@ static void kvm_apic_disabled_lapic_found(struct kvm *kvm)
  */
 static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, struct kvm_apic_map *map,
-		struct kvm_lapic ***dst, unsigned long *bitmap)
+		struct kvm_lapic ***dst, u16 *bitmap)
 {
 	int i, lowest;
 	bool x2apic_ipi;
@@ -723,9 +723,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic
 		return true;
 	}
 
-	*bitmap = 0;
-	if (!kvm_apic_map_get_logical_dest(map, irq->dest_id, dst,
-				(u16 *)bitmap))
+	if (!kvm_apic_map_get_logical_dest(map, irq->dest_id, dst, bitmap))
 		return false;
 
 	if (!kvm_lowest_prio_delivery(irq))
@@ -733,7 +731,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic
 
 	if (!kvm_vector_hashing_enabled()) {
 		lowest = -1;
-		for_each_set_bit(i, bitmap, 16) {
+		for_each_set_bit(i, (unsigned long *)bitmap, 16) {
 			if (!(*dst)[i])
 				continue;
 			if (lowest < 0)
@@ -747,7 +745,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic
 			return true;
 
 		lowest = kvm_vector_to_index(irq->vector, hweight16(*bitmap),
-				bitmap, 16);
+				(unsigned long *)bitmap, 16);
 
 		if (!(*dst)[lowest]) {
 			kvm_apic_disabled_lapic_found(kvm);
@@ -765,7 +763,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map)
 {
 	struct kvm_apic_map *map;
-	unsigned long bitmap;
+	u16 bitmap;
 	struct kvm_lapic **dst = NULL;
 	int i;
 	bool ret;
@@ -782,7 +780,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 
 	ret = kvm_apic_map_get_dest_lapic(kvm, src, irq, map, &dst, &bitmap);
 	if (ret)
-		for_each_set_bit(i, &bitmap, 16) {
+		for_each_set_bit(i, (unsigned long *)&bitmap, 16) {
 			if (!dst[i])
 				continue;
 			if (*r < 0)
@@ -812,7 +810,7 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu)
 {
 	struct kvm_apic_map *map;
-	unsigned long bitmap;
+	u16 bitmap;
 	struct kvm_lapic **dst = NULL;
 	bool ret = false;
 
@@ -824,7 +822,7 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 
 	if (kvm_apic_map_get_dest_lapic(kvm, NULL, irq, map, &dst, &bitmap) &&
 			hweight16(bitmap) == 1) {
-		unsigned long i = find_first_bit(&bitmap, 16);
+		unsigned long i = find_first_bit((unsigned long *)&bitmap, 16);
 
 		if (dst[i]) {
 			*dest_vcpu = dst[i]->vcpu;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 05/11] KVM: x86: use generic function for MSI parsing
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (3 preceding siblings ...)
  2016-06-30 20:54 ` [PATCH v1 04/11] KVM: x86: use u16 for logical VCPU mask in lapic Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-07-01  8:42   ` Paolo Bonzini
  2016-06-30 20:54 ` [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register Radim Krčmář
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/irq_comm.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index dfb4c6476877..47ad681a33fd 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -388,21 +388,16 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 			       kvm->arch.nr_reserved_ioapic_pins);
 	for (i = 0; i < nr_ioapic_pins; ++i) {
 		hlist_for_each_entry(entry, &table->map[i], link) {
-			u32 dest_id, dest_mode;
-			bool level;
+			struct kvm_lapic_irq irq;
 
 			if (entry->type != KVM_IRQ_ROUTING_MSI)
 				continue;
-			dest_id = (entry->msi.address_lo >> 12) & 0xff;
-			dest_mode = (entry->msi.address_lo >> 2) & 0x1;
-			level = entry->msi.data & MSI_DATA_TRIGGER_LEVEL;
-			if (level && kvm_apic_match_dest(vcpu, NULL, 0,
-						dest_id, dest_mode)) {
-				u32 vector = entry->msi.data & 0xff;
 
-				__set_bit(vector,
-					  ioapic_handled_vectors);
-			}
+			kvm_set_msi_irq(entry, &irq);
+
+			if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
+						irq.dest_id, irq.dest_mode))
+				__set_bit(irq.vector, ioapic_handled_vectors);
 		}
 	}
 	srcu_read_unlock(&kvm->irq_srcu, idx);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (4 preceding siblings ...)
  2016-06-30 20:54 ` [PATCH v1 05/11] KVM: x86: use generic function for MSI parsing Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-07-01  8:33   ` Paolo Bonzini
  2016-06-30 20:54 ` [PATCH v1 07/11] KVM: VMX: optimize APIC ID read with APICv Radim Krčmář
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

We currently always shift APIC ID as if APIC was in xAPIC mode.
x2APIC mode wants to use more bits and storing a hardware-compabible
value is the the sanest option.

KVM API to set the lapic expects that bottom 8 bits of APIC ID are in
top 8 bits of APIC_ID register, so the register needs to be shifted in
x2APIC mode.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v1:
 * drop big endian format in lapic get/set ioctl [Paolo]
 * exclude APICv optimization of APIC ID register
 * rename __kvm_apic_state_fixup() parameter from 'restore' to 'set'

 arch/x86/kvm/lapic.c | 44 ++++++++++++++++++++++++++++++++------------
 arch/x86/kvm/lapic.h |  7 ++++++-
 arch/x86/kvm/x86.c   |  2 ++
 3 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4296bb7353ff..d914b5351fdc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -224,7 +224,7 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
 	}
 }
 
-static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id)
+static inline void kvm_apic_set_xapic_id(struct kvm_lapic *apic, u8 id)
 {
 	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
 	recalculate_apic_map(apic->vcpu->kvm);
@@ -236,11 +236,11 @@ static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id)
 	recalculate_apic_map(apic->vcpu->kvm);
 }
 
-static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u8 id)
+static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u32 id)
 {
 	u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf));
 
-	kvm_lapic_set_reg(apic, APIC_ID, id << 24);
+	kvm_lapic_set_reg(apic, APIC_ID, id);
 	kvm_lapic_set_reg(apic, APIC_LDR, ldr);
 	recalculate_apic_map(apic->vcpu->kvm);
 }
@@ -1096,12 +1096,6 @@ static u32 __apic_read(struct kvm_lapic *apic, unsigned int offset)
 		return 0;
 
 	switch (offset) {
-	case APIC_ID:
-		if (apic_x2apic_mode(apic))
-			val = kvm_apic_id(apic);
-		else
-			val = kvm_apic_id(apic) << 24;
-		break;
 	case APIC_ARBPRI:
 		apic_debug("Access APIC ARBPRI register which is for P6\n");
 		break;
@@ -1459,7 +1453,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 	switch (reg) {
 	case APIC_ID:		/* Local APIC ID */
 		if (!apic_x2apic_mode(apic))
-			kvm_apic_set_id(apic, val >> 24);
+			kvm_apic_set_xapic_id(apic, val >> 24);
 		else
 			ret = 1;
 		break;
@@ -1731,8 +1725,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 		if (value & X2APIC_ENABLE) {
 			kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
 			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, true);
-		} else
+		} else {
+			kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
 			kvm_x86_ops->set_virtual_x2apic_mode(vcpu, false);
+		}
 	}
 
 	apic->base_address = apic->vcpu->arch.apic_base &
@@ -1763,7 +1759,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 	hrtimer_cancel(&apic->lapic_timer.timer);
 
 	if (!init_event)
-		kvm_apic_set_id(apic, vcpu->vcpu_id);
+		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
 	kvm_apic_set_version(apic->vcpu);
 
 	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
@@ -1984,6 +1980,30 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 	return vector;
 }
 
+static void __kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
+		struct kvm_lapic_state *s, bool set)
+{
+	if (apic_x2apic_mode(vcpu->arch.apic)) {
+		u32 *id = (u32 *)(s->regs + APIC_ID);
+		if (set)
+			*id >>= 24;
+		else
+			*id <<= 24;
+	}
+}
+
+void kvm_apic_state_get_fixup(struct kvm_vcpu *vcpu,
+		struct kvm_lapic_state *s)
+{
+	__kvm_apic_state_fixup(vcpu, s, false);
+}
+
+void kvm_apic_state_set_fixup(struct kvm_vcpu *vcpu,
+		struct kvm_lapic_state *s)
+{
+	__kvm_apic_state_fixup(vcpu, s, true);
+}
+
 void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 		struct kvm_lapic_state *s)
 {
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 8d811139d2b3..8d33cc56ee62 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -81,6 +81,10 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 
 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu);
 int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
+void kvm_apic_state_get_fixup(struct kvm_vcpu *vcpu,
+		struct kvm_lapic_state *s);
+void kvm_apic_state_set_fixup(struct kvm_vcpu *vcpu,
+		struct kvm_lapic_state *s);
 void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 		struct kvm_lapic_state *s);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
@@ -202,7 +206,8 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
 
 static inline u32 kvm_apic_id(struct kvm_lapic *apic)
 {
-	return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff;
+	u32 id = kvm_lapic_get_reg(apic, APIC_ID);
+	return apic_x2apic_mode(apic) ? id : id >> 24;
 }
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0cc6cf834cdd..043f110f2210 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2780,6 +2780,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 		kvm_x86_ops->sync_pir_to_irr(vcpu);
 
 	memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
+	kvm_apic_state_get_fixup(vcpu, s);
 
 	return 0;
 }
@@ -2787,6 +2788,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
 				    struct kvm_lapic_state *s)
 {
+	kvm_apic_state_set_fixup(vcpu, s);
 	kvm_apic_post_state_restore(vcpu, s);
 	update_cr8_intercept(vcpu);
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 07/11] KVM: VMX: optimize APIC ID read with APICv
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (5 preceding siblings ...)
  2016-06-30 20:54 ` [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-07-01  8:42   ` Paolo Bonzini
  2016-06-30 20:54 ` [PATCH v1 08/11] KVM: x86: directly call recalculate_apic_map on lapic restore Radim Krčmář
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

The register is in hardware-compatible format now, so there is not need
to intercept.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v1: new, split from the previous patch

 arch/x86/kvm/vmx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85e2f0a882ca..a10038258b80 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6447,9 +6447,6 @@ static __init int hardware_setup(void)
 	for (msr = 0x800; msr <= 0x8ff; msr++)
 		vmx_disable_intercept_msr_read_x2apic(msr);
 
-	/* According SDM, in x2apic mode, the whole id reg is used.  But in
-	 * KVM, it only use the highest eight bits. Need to intercept it */
-	vmx_enable_intercept_msr_read_x2apic(0x802);
 	/* TMCCT */
 	vmx_enable_intercept_msr_read_x2apic(0x839);
 	/* TPR */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 08/11] KVM: x86: directly call recalculate_apic_map on lapic restore
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (6 preceding siblings ...)
  2016-06-30 20:54 ` [PATCH v1 07/11] KVM: VMX: optimize APIC ID read with APICv Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-07-01  8:43   ` Paolo Bonzini
  2016-06-30 20:54 ` [PATCH v1 09/11] KVM: x86: reset lapic base in kvm_lapic_reset Radim Krčmář
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

The get/set dance was just for that.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/lapic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d914b5351fdc..143df33f451e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2013,8 +2013,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
 	/* set SPIV separately to get count of SW disabled APICs right */
 	apic_set_spiv(apic, *((u32 *)(s->regs + APIC_SPIV)));
 	memcpy(vcpu->arch.apic->regs, s->regs, sizeof *s);
-	/* call kvm_apic_set_id() to put apic into apic_map */
-	kvm_apic_set_id(apic, kvm_apic_id(apic));
+	/* put apic into apic_map */
+	recalculate_apic_map(vcpu->kvm);
 	kvm_apic_set_version(vcpu);
 
 	apic_update_ppr(apic);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 09/11] KVM: x86: reset lapic base in kvm_lapic_reset
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (7 preceding siblings ...)
  2016-06-30 20:54 ` [PATCH v1 08/11] KVM: x86: directly call recalculate_apic_map on lapic restore Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-07-01  8:43   ` Paolo Bonzini
  2016-06-30 20:54 ` [PATCH v1 10/11] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
  2016-06-30 20:54 ` [PATCH v1 11/11] KVM: x86: bump MAX_VCPUS to 288 Radim Krčmář
  10 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

LAPIC is reset in xAPIC mode and the surrounding code expects that.
KVM never resets after initialization.  This patch is just for sanity.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/lapic.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 143df33f451e..46eb71c425cf 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1758,8 +1758,11 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 	/* Stop the timer in case it's a reset to an active apic */
 	hrtimer_cancel(&apic->lapic_timer.timer);
 
-	if (!init_event)
+	if (!init_event) {
+		kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE |
+		                         MSR_IA32_APICBASE_ENABLE);
 		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
+	}
 	kvm_apic_set_version(apic->vcpu);
 
 	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
@@ -1898,9 +1901,6 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 	 * thinking that APIC satet has changed.
 	 */
 	vcpu->arch.apic_base = MSR_IA32_APICBASE_ENABLE;
-	kvm_lapic_set_base(vcpu,
-			APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
-
 	static_key_slow_inc(&apic_sw_disabled.key); /* sw disabled at reset */
 	kvm_lapic_reset(vcpu, false);
 	kvm_iodevice_init(&apic->dev, &apic_mmio_ops);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 10/11] KVM: x86: add KVM_CAP_X2APIC_API
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (8 preceding siblings ...)
  2016-06-30 20:54 ` [PATCH v1 09/11] KVM: x86: reset lapic base in kvm_lapic_reset Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-07-01  8:24   ` Paolo Bonzini
  2016-07-01 18:09   ` David Matlack
  2016-06-30 20:54 ` [PATCH v1 11/11] KVM: x86: bump MAX_VCPUS to 288 Radim Krčmář
  10 siblings, 2 replies; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

KVM_CAP_X2APIC_API can be enabled to extend APIC ID in get/set ioctl and MSI
addresses to 32 bits.  Both are needed to support x2APIC.

The capability has to be toggleable and disabled by default, because get/set
ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol
inspired by xAPIC and the change is not backward-compatible.

Changes to MSI addresses follow the format used by interrupt remapping unit.
The upper address word, that used to be 0, contains upper 24 bits of the LAPIC
address in its upper 24 bits.  Lower 8 bits are reserved as 0.
Using the upper address word is not backward-compatible either as we didn't
check that userspace zeroed the word.  Reserved bits are still not explicitly
checked, but non-zero data will affect LAPIC addresses, which will cause a bug.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 v1:
 * rewritten with a toggleable capability [Paolo]
 * dropped MSI_ADDR_EXT_DEST_ID to enforce reserved bits

 Documentation/virtual/kvm/api.txt | 26 ++++++++++++++++++++++++++
 arch/x86/include/asm/kvm_host.h   |  4 +++-
 arch/x86/kvm/irq_comm.c           | 14 ++++++++++----
 arch/x86/kvm/lapic.c              |  2 +-
 arch/x86/kvm/vmx.c                |  2 +-
 arch/x86/kvm/x86.c                | 12 ++++++++++++
 include/uapi/linux/kvm.h          |  1 +
 7 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 09efa9eb3926..0f978089a0f6 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1482,6 +1482,9 @@ struct kvm_irq_routing_msi {
 	__u32 pad;
 };
 
+If KVM_CAP_X2APIC_API is enabled, then address_hi bits 31-8 contain bits 31-8
+of destination id and address_hi bits 7-0 is must be 0.
+
 struct kvm_irq_routing_s390_adapter {
 	__u64 ind_addr;
 	__u64 summary_addr;
@@ -1583,6 +1586,13 @@ struct kvm_lapic_state {
 Reads the Local APIC registers and copies them into the input argument.  The
 data format and layout are the same as documented in the architecture manual.
 
+If KVM_CAP_X2APIC_API is enabled, then the format of APIC_ID register depends
+on APIC mode (reported by MSR_IA32_APICBASE) of its VCPU.  The format follows
+xAPIC otherwise.
+
+x2APIC stores APIC ID as little endian in bits 31-0 of APIC_ID register.
+xAPIC stores bits 7-0 of APIC ID in register bits 31-24.
+
 
 4.58 KVM_SET_LAPIC
 
@@ -1600,6 +1610,8 @@ struct kvm_lapic_state {
 Copies the input argument into the Local APIC registers.  The data format
 and layout are the same as documented in the architecture manual.
 
+See the note about APIC_ID register in KVM_GET_LAPIC.
+
 
 4.59 KVM_IOEVENTFD
 
@@ -2180,6 +2192,9 @@ struct kvm_msi {
 
 No flags are defined so far. The corresponding field must be 0.
 
+If KVM_CAP_X2APIC_API is enabled, then address_hi bits 31-8 contain bits 31-8
+of destination id and address_hi bits 7-0 is must be 0.
+
 
 4.71 KVM_CREATE_PIT2
 
@@ -3811,6 +3826,17 @@ Allows use of runtime-instrumentation introduced with zEC12 processor.
 Will return -EINVAL if the machine does not support runtime-instrumentation.
 Will return -EBUSY if a VCPU has already been created.
 
+7.7 KVM_CAP_X2APIC_API
+
+Architectures: x86
+Parameters: none
+Returns: 0 on success, -EINVAL if reserved parameters are not 0
+
+Enabling this capability changes the behavior of KVM_SET_GSI_ROUTING,
+KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC.  See KVM_CAP_X2APIC_API
+in their respective sections.
+
+
 8. Other capabilities.
 ----------------------
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 459a789cb3da..48b0ca18066c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -782,6 +782,8 @@ struct kvm_arch {
 	u32 ldr_mode;
 	struct page *avic_logical_id_table_page;
 	struct page *avic_physical_id_table_page;
+
+	bool x2apic_api;
 };
 
 struct kvm_vm_stat {
@@ -1365,7 +1367,7 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			     struct kvm_vcpu **dest_vcpu);
 
 void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
-		     struct kvm_lapic_irq *irq);
+		     struct kvm_lapic_irq *irq, bool x2apic_api);
 
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 47ad681a33fd..4594644ab090 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -111,12 +111,17 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 }
 
 void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
-		     struct kvm_lapic_irq *irq)
+		     struct kvm_lapic_irq *irq, bool x2apic_api)
 {
 	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
 
 	irq->dest_id = (e->msi.address_lo &
 			MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+	if (x2apic_api)
+		/* MSI_ADDR_EXT_DEST_ID() is omitted to introduce bugs on
+		 * userspaces that set reserved bits 0-7.
+		 */
+		irq->dest_id |= e->msi.address_hi;
 	irq->vector = (e->msi.data &
 			MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
 	irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
@@ -137,7 +142,7 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 	if (!level)
 		return -1;
 
-	kvm_set_msi_irq(e, &irq);
+	kvm_set_msi_irq(e, &irq, kvm->arch.x2apic_api);
 
 	return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
 }
@@ -153,7 +158,7 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
 	if (unlikely(e->type != KVM_IRQ_ROUTING_MSI))
 		return -EWOULDBLOCK;
 
-	kvm_set_msi_irq(e, &irq);
+	kvm_set_msi_irq(e, &irq, kvm->arch.x2apic_api);
 
 	if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r, NULL))
 		return r;
@@ -393,7 +398,8 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
 			if (entry->type != KVM_IRQ_ROUTING_MSI)
 				continue;
 
-			kvm_set_msi_irq(entry, &irq);
+			kvm_set_msi_irq(entry, &irq,
+					vcpu->kvm->arch.x2apic_api);
 
 			if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
 						irq.dest_id, irq.dest_mode))
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 46eb71c425cf..178605635df5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1983,7 +1983,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 static void __kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 		struct kvm_lapic_state *s, bool set)
 {
-	if (apic_x2apic_mode(vcpu->arch.apic)) {
+	if (apic_x2apic_mode(vcpu->arch.apic) && !vcpu->kvm->arch.x2apic_api) {
 		u32 *id = (u32 *)(s->regs + APIC_ID);
 		if (set)
 			*id >>= 24;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a10038258b80..ea1f439b444e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11075,7 +11075,7 @@ static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 		 * We will support full lowest-priority interrupt later.
 		 */
 
-		kvm_set_msi_irq(e, &irq);
+		kvm_set_msi_irq(e, &irq, kvm->arch.x2apic_api);
 		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
 			/*
 			 * Make sure the IRTE is in remapped mode if
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 043f110f2210..16b55f09dd16 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2576,6 +2576,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_DISABLE_QUIRKS:
 	case KVM_CAP_SET_BOOT_CPU_ID:
  	case KVM_CAP_SPLIT_IRQCHIP:
+	case KVM_CAP_X2APIC_API:
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
 	case KVM_CAP_ASSIGN_DEV_IRQ:
 	case KVM_CAP_PCI_2_3:
@@ -3799,6 +3800,17 @@ split_irqchip_unlock:
 		mutex_unlock(&kvm->lock);
 		break;
 	}
+	case KVM_CAP_X2APIC_API: {
+		struct kvm_enable_cap valid = {.cap = KVM_CAP_X2APIC_API};
+
+		r = -EINVAL;
+		if (memcmp(cap, &valid, sizeof(valid)))
+			break;
+
+		kvm->arch.x2apic_api = true;
+		r = 0;
+		break;
+	}
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 05ebf475104c..43b355d6db7b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_ARM_PMU_V3 126
 #define KVM_CAP_VCPU_ATTRIBUTES 127
 #define KVM_CAP_MAX_VCPU_ID 128
+#define KVM_CAP_X2APIC_API 129
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v1 11/11] KVM: x86: bump MAX_VCPUS to 288
  2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
                   ` (9 preceding siblings ...)
  2016-06-30 20:54 ` [PATCH v1 10/11] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
@ 2016-06-30 20:54 ` Radim Krčmář
  2016-07-01  8:43   ` Paolo Bonzini
  10 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-06-30 20:54 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

288 is in high demand because of Knights Landing CPU.
We cannot set the limit to 640k, because that would be wasting space.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 48b0ca18066c..411da14a675e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -34,7 +34,7 @@
 #include <asm/asm.h>
 #include <asm/kvm_page_track.h>
 
-#define KVM_MAX_VCPUS 255
+#define KVM_MAX_VCPUS 288
 #define KVM_SOFT_MAX_VCPUS 240
 #define KVM_USER_MEM_SLOTS 509
 /* memory slots that are not exposed to userspace */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-06-30 20:54 ` [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map Radim Krčmář
@ 2016-06-30 22:15   ` Andrew Honig
  2016-07-01  8:42     ` Paolo Bonzini
  2016-07-01  7:33   ` Paolo Bonzini
  1 sibling, 1 reply; 45+ messages in thread
From: Andrew Honig @ 2016-06-30 22:15 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Paolo Bonzini, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu

> -
> -       new = kzalloc(sizeof(struct kvm_apic_map), GFP_KERNEL);
> +       u32 size, max_id = 255;
>
>         mutex_lock(&kvm->arch.apic_map_lock);
>
> +       kvm_for_each_vcpu(i, vcpu, kvm)
> +               if (kvm_apic_present(vcpu))
> +                       max_id = max(max_id, kvm_apic_id(vcpu->arch.apic));
> +
> +       /* kvm_apic_map_get_logical_dest() expects multiples of 16 */
> +       size = round_up(max_id + 1, 16);

Now that you're using the full range of apic_id values, could this
calculation overflow?  Perhaps max_id could be u64?

> +       new = kzalloc(sizeof(struct kvm_apic_map) +
> +                     sizeof(struct kvm_lapic) * size, GFP_KERNEL);
> +
>         if (!new)
>                 goto out;
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-06-30 20:54 ` [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map Radim Krčmář
  2016-06-30 22:15   ` Andrew Honig
@ 2016-07-01  7:33   ` Paolo Bonzini
  1 sibling, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  7:33 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> x2APIC supports up to 2^32-1 LAPICs, but most guest in coming years will
> have slighly less VCPUs.  Dynamic size saves memory at the cost of
> turning one constant into a variable.
> 
> apic_map mutex had to be moved before allocation to avoid races with cpu
> hotplug.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

It's important to note another change here, which is that you start
using the phys_map for x2apic clustered mode.  It's so important that
you could make it a separate patch while keeping the phys_map static! :)

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 04/11] KVM: x86: use u16 for logical VCPU mask in lapic
  2016-06-30 20:54 ` [PATCH v1 04/11] KVM: x86: use u16 for logical VCPU mask in lapic Radim Krčmář
@ 2016-07-01  7:56   ` Paolo Bonzini
  2016-07-01 12:48     ` Radim Krčmář
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  7:56 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> @@ -747,7 +745,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic
>  			return true;
>  
>  		lowest = kvm_vector_to_index(irq->vector, hweight16(*bitmap),
> -				bitmap, 16);
> +				(unsigned long *)bitmap, 16);
>  

kvm_vector_to_index's last argument is always 16, so we might as well
omit it and make the third argument u16* too.

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 02/11] KVM: x86: add kvm_apic_map_get_dest_lapic
  2016-06-30 20:54 ` [PATCH v1 02/11] KVM: x86: add kvm_apic_map_get_dest_lapic Radim Krčmář
@ 2016-07-01  7:57   ` Paolo Bonzini
  2016-07-01 12:39     ` Radim Krčmář
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  7:57 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> kvm_irq_delivery_to_apic_fast and kvm_intr_is_single_vcpu_fast both
> compute the interrupt destination.  Factor the code.
> 
> 'struct kvm_lapic **dst = NULL' had to be added to silence GCC.
> GCC might complain about potential NULL access in the future, because it
> missed conditions that avoided uninitialized uses of dst.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  v1: improved comment for kvm_apic_map_get_dest_lapic() [Peter]
> 
>  arch/x86/kvm/lapic.c | 241 ++++++++++++++++++++++-----------------------------
>  1 file changed, 103 insertions(+), 138 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 22a6474af220..238b87b068db 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -671,14 +671,98 @@ static void kvm_apic_disabled_lapic_found(struct kvm *kvm)
>  	}
>  }
>  
> +/* Return true if the interrupt can be handled by using *bitmap as index mask
> + * for valid destinations in *dst array.
> + * Return false if kvm_apic_map_get_dest_lapic did nothing useful.
> + * Note: we may have zero kvm_lapic destinations when we return true, which
> + * means that the interrupt should be dropped.  In this case, *bitmap would be
> + * zero and *dst undefined.
> + */
> +static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic *src,
> +		struct kvm_lapic_irq *irq, struct kvm_apic_map *map,
> +		struct kvm_lapic ***dst, unsigned long *bitmap)
> +{
> +	int i, lowest;
> +	bool x2apic_ipi;
> +	u16 cid;
> +
> +	if (irq->shorthand == APIC_DEST_SELF) {
> +		*dst = &src;

This is not valid, &src dies as soon as you leave the function.  You
need to pass &src into kvm_apic_map_get_dest_lapic and here do something
like

	if (irq->shorthand) {
		if (irq->shorthand == APIC_DEST_SELF && p_src) {
			*dst = p_src;
			*bitmap = 1;
			return true;
		}
		return false;
	}

If it's not too hard, I'd like to have patch 4 before this one.

Paolo

> +		*bitmap = 1;
> +		return true;
> +	} else if (irq->shorthand)
> +		return false;
> +
> +	x2apic_ipi = src && apic_x2apic_mode(src);
> +	if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
> +		return false;
> +
> +	if (!map)
> +		return false;
> +
> +	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
> +		if (irq->dest_id >= ARRAY_SIZE(map->phys_map)) {
> +			*bitmap = 0;
> +		} else {
> +			*dst = &map->phys_map[irq->dest_id];
> +			*bitmap = 1;
> +		}
> +		return true;
> +	}
> +
> +	if (!kvm_apic_logical_map_valid(map))
> +		return false;
> +
> +	apic_logical_id(map, irq->dest_id, &cid, (u16 *)bitmap);
> +
> +	if (cid >= ARRAY_SIZE(map->logical_map)) {
> +		*bitmap = 0;
> +		return true;
> +	}
> +
> +	*dst = map->logical_map[cid];
> +
> +	if (!kvm_lowest_prio_delivery(irq))
> +		return true;
> +
> +	if (!kvm_vector_hashing_enabled()) {
> +		lowest = -1;
> +		for_each_set_bit(i, bitmap, 16) {
> +			if (!(*dst)[i])
> +				continue;
> +			if (lowest < 0)
> +				lowest = i;
> +			else if (kvm_apic_compare_prio((*dst)[i]->vcpu,
> +						(*dst)[lowest]->vcpu) < 0)
> +				lowest = i;
> +		}
> +	} else {
> +		if (!*bitmap)
> +			return true;
> +
> +		lowest = kvm_vector_to_index(irq->vector, hweight16(*bitmap),
> +				bitmap, 16);
> +
> +		if (!(*dst)[lowest]) {
> +			kvm_apic_disabled_lapic_found(kvm);
> +			*bitmap = 0;
> +			return true;
> +		}
> +	}
> +
> +	*bitmap = (lowest >= 0) ? 1 << lowest : 0;
> +
> +	return true;
> +}
> +
>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>  		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map)
>  {
>  	struct kvm_apic_map *map;
> -	unsigned long bitmap = 1;
> -	struct kvm_lapic **dst;
> +	unsigned long bitmap;
> +	struct kvm_lapic **dst = NULL;
>  	int i;
> -	bool ret, x2apic_ipi;
> +	bool ret;
>  
>  	*r = -1;
>  
> @@ -687,86 +771,19 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>  		return true;
>  	}
>  
> -	if (irq->shorthand)
> -		return false;
> -
> -	x2apic_ipi = src && apic_x2apic_mode(src);
> -	if (irq->dest_id == (x2apic_ipi ? X2APIC_BROADCAST : APIC_BROADCAST))
> -		return false;
> -
> -	ret = true;
>  	rcu_read_lock();
>  	map = rcu_dereference(kvm->arch.apic_map);
>  
> -	if (!map) {
> -		ret = false;
> -		goto out;
> -	}
> -
> -	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
> -		if (irq->dest_id >= ARRAY_SIZE(map->phys_map))
> -			goto out;
> -
> -		dst = &map->phys_map[irq->dest_id];
> -	} else {
> -		u16 cid;
> -
> -		if (!kvm_apic_logical_map_valid(map)) {
> -			ret = false;
> -			goto out;
> +	ret = kvm_apic_map_get_dest_lapic(kvm, src, irq, map, &dst, &bitmap);
> +	if (ret)
> +		for_each_set_bit(i, &bitmap, 16) {
> +			if (!dst[i])
> +				continue;
> +			if (*r < 0)
> +				*r = 0;
> +			*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
>  		}
>  
> -		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
> -
> -		if (cid >= ARRAY_SIZE(map->logical_map))
> -			goto out;
> -
> -		dst = map->logical_map[cid];
> -
> -		if (!kvm_lowest_prio_delivery(irq))
> -			goto set_irq;
> -
> -		if (!kvm_vector_hashing_enabled()) {
> -			int l = -1;
> -			for_each_set_bit(i, &bitmap, 16) {
> -				if (!dst[i])
> -					continue;
> -				if (l < 0)
> -					l = i;
> -				else if (kvm_apic_compare_prio(dst[i]->vcpu,
> -							dst[l]->vcpu) < 0)
> -					l = i;
> -			}
> -			bitmap = (l >= 0) ? 1 << l : 0;
> -		} else {
> -			int idx;
> -			unsigned int dest_vcpus;
> -
> -			dest_vcpus = hweight16(bitmap);
> -			if (dest_vcpus == 0)
> -				goto out;
> -
> -			idx = kvm_vector_to_index(irq->vector,
> -				dest_vcpus, &bitmap, 16);
> -
> -			if (!dst[idx]) {
> -				kvm_apic_disabled_lapic_found(kvm);
> -				goto out;
> -			}
> -
> -			bitmap = (idx >= 0) ? 1 << idx : 0;
> -		}
> -	}
> -
> -set_irq:
> -	for_each_set_bit(i, &bitmap, 16) {
> -		if (!dst[i])
> -			continue;
> -		if (*r < 0)
> -			*r = 0;
> -		*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
> -	}
> -out:
>  	rcu_read_unlock();
>  	return ret;
>  }
> @@ -789,8 +806,9 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
>  			struct kvm_vcpu **dest_vcpu)
>  {
>  	struct kvm_apic_map *map;
> +	unsigned long bitmap;
> +	struct kvm_lapic **dst = NULL;
>  	bool ret = false;
> -	struct kvm_lapic *dst = NULL;
>  
>  	if (irq->shorthand)
>  		return false;
> @@ -798,69 +816,16 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
>  	rcu_read_lock();
>  	map = rcu_dereference(kvm->arch.apic_map);
>  
> -	if (!map)
> -		goto out;
> +	if (kvm_apic_map_get_dest_lapic(kvm, NULL, irq, map, &dst, &bitmap) &&
> +			hweight16(bitmap) == 1) {
> +		unsigned long i = find_first_bit(&bitmap, 16);
>  
> -	if (irq->dest_mode == APIC_DEST_PHYSICAL) {
> -		if (irq->dest_id == 0xFF)
> -			goto out;
> -
> -		if (irq->dest_id >= ARRAY_SIZE(map->phys_map))
> -			goto out;
> -
> -		dst = map->phys_map[irq->dest_id];
> -		if (dst && kvm_apic_present(dst->vcpu))
> -			*dest_vcpu = dst->vcpu;
> -		else
> -			goto out;
> -	} else {
> -		u16 cid;
> -		unsigned long bitmap = 1;
> -		int i, r = 0;
> -
> -		if (!kvm_apic_logical_map_valid(map))
> -			goto out;
> -
> -		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
> -
> -		if (cid >= ARRAY_SIZE(map->logical_map))
> -			goto out;
> -
> -		if (kvm_vector_hashing_enabled() &&
> -				kvm_lowest_prio_delivery(irq)) {
> -			int idx;
> -			unsigned int dest_vcpus;
> -
> -			dest_vcpus = hweight16(bitmap);
> -			if (dest_vcpus == 0)
> -				goto out;
> -
> -			idx = kvm_vector_to_index(irq->vector, dest_vcpus,
> -						  &bitmap, 16);
> -
> -			dst = map->logical_map[cid][idx];
> -			if (!dst) {
> -				kvm_apic_disabled_lapic_found(kvm);
> -				goto out;
> -			}
> -
> -			*dest_vcpu = dst->vcpu;
> -		} else {
> -			for_each_set_bit(i, &bitmap, 16) {
> -				dst = map->logical_map[cid][i];
> -				if (++r == 2)
> -					goto out;
> -			}
> -
> -			if (dst && kvm_apic_present(dst->vcpu))
> -				*dest_vcpu = dst->vcpu;
> -			else
> -				goto out;
> +		if (dst[i]) {
> +			*dest_vcpu = dst[i]->vcpu;
> +			ret = true;
>  		}
>  	}
>  
> -	ret = true;
> -out:
>  	rcu_read_unlock();
>  	return ret;
>  }
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 10/11] KVM: x86: add KVM_CAP_X2APIC_API
  2016-06-30 20:54 ` [PATCH v1 10/11] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
@ 2016-07-01  8:24   ` Paolo Bonzini
  2016-07-01 13:25     ` Radim Krčmář
  2016-07-01 18:09   ` David Matlack
  1 sibling, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  8:24 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> KVM_CAP_X2APIC_API can be enabled to extend APIC ID in get/set ioctl and MSI
> addresses to 32 bits.  Both are needed to support x2APIC.
> 
> The capability has to be toggleable and disabled by default, because get/set
> ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol
> inspired by xAPIC and the change is not backward-compatible.
> 
> Changes to MSI addresses follow the format used by interrupt remapping unit.
> The upper address word, that used to be 0, contains upper 24 bits of the LAPIC
> address in its upper 24 bits.  Lower 8 bits are reserved as 0.
> Using the upper address word is not backward-compatible either as we didn't
> check that userspace zeroed the word.  Reserved bits are still not explicitly
> checked, but non-zero data will affect LAPIC addresses, which will cause a bug.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  v1:
>  * rewritten with a toggleable capability [Paolo]
>  * dropped MSI_ADDR_EXT_DEST_ID to enforce reserved bits
> 
>  Documentation/virtual/kvm/api.txt | 26 ++++++++++++++++++++++++++
>  arch/x86/include/asm/kvm_host.h   |  4 +++-
>  arch/x86/kvm/irq_comm.c           | 14 ++++++++++----
>  arch/x86/kvm/lapic.c              |  2 +-
>  arch/x86/kvm/vmx.c                |  2 +-
>  arch/x86/kvm/x86.c                | 12 ++++++++++++
>  include/uapi/linux/kvm.h          |  1 +
>  7 files changed, 54 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 09efa9eb3926..0f978089a0f6 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1482,6 +1482,9 @@ struct kvm_irq_routing_msi {
>  	__u32 pad;
>  };
>  
> +If KVM_CAP_X2APIC_API is enabled, then address_hi bits 31-8 contain bits 31-8
> +of destination id and address_hi bits 7-0 is must be 0.

On x86, address_hi is ignored unless the KVM_CAP_X2APIC_API capability
is enabled.  If it is enabled, address_hi bits 31-8 provide bits 31-8 of
the destination id.  Bits 7-0 of address_hi must be zero.

>  struct kvm_irq_routing_s390_adapter {
>  	__u64 ind_addr;
>  	__u64 summary_addr;
> @@ -1583,6 +1586,13 @@ struct kvm_lapic_state {
>  Reads the Local APIC registers and copies them into the input argument.  The
>  data format and layout are the same as documented in the architecture manual.
>  
> +If KVM_CAP_X2APIC_API is enabled, then the format of APIC_ID register depends
> +on APIC mode (reported by MSR_IA32_APICBASE) of its VCPU.  The format follows
> +xAPIC otherwise.
> +
> +x2APIC stores APIC ID as little endian in bits 31-0 of APIC_ID register.
> +xAPIC stores bits 7-0 of APIC ID in register bits 31-24.

If KVM_CAP_X2APIC_API is enabled, then the format of APIC_ID register
depends on the APIC mode (reported by MSR_IA32_APICBASE) of its VCPU.
x2APIC stores APIC ID in the APIC_ID register (bytes 32-35).  xAPIC only
allows an 8-bit APIC ID which is stored in bits 31-24 of the APIC
register, or equivalently in byte 35 of struct kvm_lapic_state's regs field.

If KVM_CAP_X2APIC_API is disabled, struct kvm_lapic_state always uses
xAPIC format.

> +
>  
>  4.58 KVM_SET_LAPIC
>  
> @@ -1600,6 +1610,8 @@ struct kvm_lapic_state {
>  Copies the input argument into the Local APIC registers.  The data format
>  and layout are the same as documented in the architecture manual.
>  
> +See the note about APIC_ID register in KVM_GET_LAPIC.

The format of the APIC ID register (bytes 32-35 of struct
kvm_lapic_state's regs field) depends on the state of the
KVM_CAP_X2APIC_API capability.  See the note in KVM_GET_LAPIC.
>  
>  4.59 KVM_IOEVENTFD
>  
> @@ -2180,6 +2192,9 @@ struct kvm_msi {
>  
>  No flags are defined so far. The corresponding field must be 0.
>  
> +If KVM_CAP_X2APIC_API is enabled, then address_hi bits 31-8 contain bits 31-8
> +of destination id and address_hi bits 7-0 is must be 0.

Please use the same wording as in KVM_SET_GSI_ROUTING.

>  
>  4.71 KVM_CREATE_PIT2
>  
> @@ -3811,6 +3826,17 @@ Allows use of runtime-instrumentation introduced with zEC12 processor.
>  Will return -EINVAL if the machine does not support runtime-instrumentation.
>  Will return -EBUSY if a VCPU has already been created.
>  
> +7.7 KVM_CAP_X2APIC_API
> +
> +Architectures: x86
> +Parameters: none
> +Returns: 0 on success, -EINVAL if reserved parameters are not 0
> +
> +Enabling this capability changes the behavior of KVM_SET_GSI_ROUTING,
> +KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC

... ", allowing the use of 32-bit APIC IDs".

> .  See KVM_CAP_X2APIC_API
> +in their respective sections.
> +
> +
>  8. Other capabilities.
>  ----------------------
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 459a789cb3da..48b0ca18066c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -782,6 +782,8 @@ struct kvm_arch {
>  	u32 ldr_mode;
>  	struct page *avic_logical_id_table_page;
>  	struct page *avic_physical_id_table_page;
> +
> +	bool x2apic_api;
>  };
>  
>  struct kvm_vm_stat {
> @@ -1365,7 +1367,7 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
>  			     struct kvm_vcpu **dest_vcpu);
>  
>  void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> -		     struct kvm_lapic_irq *irq);
> +		     struct kvm_lapic_irq *irq, bool x2apic_api);

Just pass a struct kvm as the first argument.

>  
>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
>  {
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 47ad681a33fd..4594644ab090 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -111,12 +111,17 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
>  }
>  
>  void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> -		     struct kvm_lapic_irq *irq)
> +		     struct kvm_lapic_irq *irq, bool x2apic_api)
>  {
>  	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
>  
>  	irq->dest_id = (e->msi.address_lo &
>  			MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
> +	if (x2apic_api)
> +		/* MSI_ADDR_EXT_DEST_ID() is omitted to introduce bugs on
> +		 * userspaces that set reserved bits 0-7.
> +		 */

Reread Rusty's API design guidelines and come back. ;)

Seriously, please validate the address_hi at both places
(KVM_SET_GSI_ROUTING and KVM_SIGNAL_MSI) and WARN here if you get
non-zero bits 7-0.

> +		irq->dest_id |= e->msi.address_hi;
>  	irq->vector = (e->msi.data &
>  			MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
>  	irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
> @@ -137,7 +142,7 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>  	if (!level)
>  		return -1;
>  
> -	kvm_set_msi_irq(e, &irq);
> +	kvm_set_msi_irq(e, &irq, kvm->arch.x2apic_api);
>  
>  	return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
>  }
> @@ -153,7 +158,7 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
>  	if (unlikely(e->type != KVM_IRQ_ROUTING_MSI))
>  		return -EWOULDBLOCK;
>  
> -	kvm_set_msi_irq(e, &irq);
> +	kvm_set_msi_irq(e, &irq, kvm->arch.x2apic_api);
>  
>  	if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r, NULL))
>  		return r;
> @@ -393,7 +398,8 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>  			if (entry->type != KVM_IRQ_ROUTING_MSI)
>  				continue;
>  
> -			kvm_set_msi_irq(entry, &irq);
> +			kvm_set_msi_irq(entry, &irq,
> +					vcpu->kvm->arch.x2apic_api);
>  
>  			if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
>  						irq.dest_id, irq.dest_mode))
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 46eb71c425cf..178605635df5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1983,7 +1983,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
>  static void __kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>  		struct kvm_lapic_state *s, bool set)
>  {
> -	if (apic_x2apic_mode(vcpu->arch.apic)) {
> +	if (apic_x2apic_mode(vcpu->arch.apic) && !vcpu->kvm->arch.x2apic_api) {
>  		u32 *id = (u32 *)(s->regs + APIC_ID);
>  		if (set)
>  			*id >>= 24;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a10038258b80..ea1f439b444e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11075,7 +11075,7 @@ static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
>  		 * We will support full lowest-priority interrupt later.
>  		 */
>  
> -		kvm_set_msi_irq(e, &irq);
> +		kvm_set_msi_irq(e, &irq, kvm->arch.x2apic_api);
>  		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>  			/*
>  			 * Make sure the IRTE is in remapped mode if
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 043f110f2210..16b55f09dd16 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2576,6 +2576,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_DISABLE_QUIRKS:
>  	case KVM_CAP_SET_BOOT_CPU_ID:
>   	case KVM_CAP_SPLIT_IRQCHIP:
> +	case KVM_CAP_X2APIC_API:
>  #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
>  	case KVM_CAP_ASSIGN_DEV_IRQ:
>  	case KVM_CAP_PCI_2_3:
> @@ -3799,6 +3800,17 @@ split_irqchip_unlock:
>  		mutex_unlock(&kvm->lock);
>  		break;
>  	}
> +	case KVM_CAP_X2APIC_API: {
> +		struct kvm_enable_cap valid = {.cap = KVM_CAP_X2APIC_API};
> +
> +		r = -EINVAL;
> +		if (memcmp(cap, &valid, sizeof(valid)))
> +			break;

Nice trick, and strict argument checking in general is a good idea.
However it's ugly to do it only for KVM_CAP_X2APIC_API and we've really
bad at strict argument checking elsewhere.  For consistency, please
check that args[0] is zero, and forgo other violations. :(

Paolo

> +		kvm->arch.x2apic_api = true;
> +		r = 0;
> +		break;
> +	}
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 05ebf475104c..43b355d6db7b 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_ARM_PMU_V3 126
>  #define KVM_CAP_VCPU_ATTRIBUTES 127
>  #define KVM_CAP_MAX_VCPU_ID 128
> +#define KVM_CAP_X2APIC_API 129
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register
  2016-06-30 20:54 ` [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register Radim Krčmář
@ 2016-07-01  8:33   ` Paolo Bonzini
  2016-07-01 13:11     ` Radim Krčmář
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  8:33 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> +static void __kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> +		struct kvm_lapic_state *s, bool set)
> +{
> +	if (apic_x2apic_mode(vcpu->arch.apic)) {
> +		u32 *id = (u32 *)(s->regs + APIC_ID);
> +		if (set)
> +			*id >>= 24;
> +		else
> +			*id <<= 24;
> +	}

When setting, this should read from the apic_base being set.  So I think
you cannot use apic_x2apic_mode.

With this change, you don't need the struct kvm_vcpu argument now; add a
struct kvm argument instead in patch 10.

> @@ -2780,6 +2780,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
>  		kvm_x86_ops->sync_pir_to_irr(vcpu);
>  
>  	memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
> +	kvm_apic_state_get_fixup(vcpu, s);

Instead of kvm_apic_state_get/set_fixup, group the memcpy and
__kvm_apic_state_fixup in a new function kvm_apic_get_state(struct
kvm_lapic *apic, char *regs).

>  	return 0;
>  }
> @@ -2787,6 +2788,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
>  static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
>  				    struct kvm_lapic_state *s)
>  {
> +	kvm_apic_state_set_fixup(vcpu, s);
>  	kvm_apic_post_state_restore(vcpu, s);

... and likewise merge these two in a refactored
kvm_apic_post_state_restore called kvm_apic_set_state(struct kvm_lapic
*apic, char *regs), by calling __kvm_apic_state_fixup before
kvm_apic_post_state_restore's memcpy.

With these changes I guess there's no need for the underscores in
__kvm_apic_state_fixup.  And you can also change the struct
kvm_lapic_state pointer in the function to a char *.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-06-30 22:15   ` Andrew Honig
@ 2016-07-01  8:42     ` Paolo Bonzini
  2016-07-01 12:44       ` Radim Krčmář
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  8:42 UTC (permalink / raw)
  To: Andrew Honig, Radim Krčmář
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 01/07/2016 00:15, Andrew Honig wrote:
>> > +       /* kvm_apic_map_get_logical_dest() expects multiples of 16 */
>> > +       size = round_up(max_id + 1, 16);
> Now that you're using the full range of apic_id values, could this
> calculation overflow?  Perhaps max_id could be u64?

Good point, but I wonder if it's a good idea to let userspace allocate
32 GB of memory. :)

Let's put a limit on the maximum supported APIC ID, and report it
through KVM_CHECK_EXTENSION on the new KVM_CAP_X2APIC_API capability.
If 767 is enough for Knights Landing, the allocation below fits in two
pages.  If you need to make it higher, please change the allocation to
use kvm_kvzalloc and kvfree.

Last but not least...

>> > +       new = kzalloc(sizeof(struct kvm_apic_map) +
>> > +                     sizeof(struct kvm_lapic) * size, GFP_KERNEL);
                           ^^^^^^^^^^^^^^^^^^^^^^^^
... the sizeof here must be sizeof(struct kvm_lapic *).

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 01/11] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240
  2016-06-30 20:54 ` [PATCH v1 01/11] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
@ 2016-07-01  8:42   ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  8:42 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> 240 has been well tested by Red Hat.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  v1: new, loosely related and should have been posted long ago
> 
>  arch/x86/include/asm/kvm_host.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7a628fb6a2c2..53d39771842b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -35,7 +35,7 @@
>  #include <asm/kvm_page_track.h>
>  
>  #define KVM_MAX_VCPUS 255
> -#define KVM_SOFT_MAX_VCPUS 160
> +#define KVM_SOFT_MAX_VCPUS 240
>  #define KVM_USER_MEM_SLOTS 509
>  /* memory slots that are not exposed to userspace */
>  #define KVM_PRIVATE_MEM_SLOTS 3
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 05/11] KVM: x86: use generic function for MSI parsing
  2016-06-30 20:54 ` [PATCH v1 05/11] KVM: x86: use generic function for MSI parsing Radim Krčmář
@ 2016-07-01  8:42   ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  8:42 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/irq_comm.c | 17 ++++++-----------
>  1 file changed, 6 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index dfb4c6476877..47ad681a33fd 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -388,21 +388,16 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>  			       kvm->arch.nr_reserved_ioapic_pins);
>  	for (i = 0; i < nr_ioapic_pins; ++i) {
>  		hlist_for_each_entry(entry, &table->map[i], link) {
> -			u32 dest_id, dest_mode;
> -			bool level;
> +			struct kvm_lapic_irq irq;
>  
>  			if (entry->type != KVM_IRQ_ROUTING_MSI)
>  				continue;
> -			dest_id = (entry->msi.address_lo >> 12) & 0xff;
> -			dest_mode = (entry->msi.address_lo >> 2) & 0x1;
> -			level = entry->msi.data & MSI_DATA_TRIGGER_LEVEL;
> -			if (level && kvm_apic_match_dest(vcpu, NULL, 0,
> -						dest_id, dest_mode)) {
> -				u32 vector = entry->msi.data & 0xff;
>  
> -				__set_bit(vector,
> -					  ioapic_handled_vectors);
> -			}
> +			kvm_set_msi_irq(entry, &irq);
> +
> +			if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
> +						irq.dest_id, irq.dest_mode))
> +				__set_bit(irq.vector, ioapic_handled_vectors);
>  		}
>  	}
>  	srcu_read_unlock(&kvm->irq_srcu, idx);
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 07/11] KVM: VMX: optimize APIC ID read with APICv
  2016-06-30 20:54 ` [PATCH v1 07/11] KVM: VMX: optimize APIC ID read with APICv Radim Krčmář
@ 2016-07-01  8:42   ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  8:42 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> The register is in hardware-compatible format now, so there is not need
> to intercept.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  v1: new, split from the previous patch
> 
>  arch/x86/kvm/vmx.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 85e2f0a882ca..a10038258b80 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6447,9 +6447,6 @@ static __init int hardware_setup(void)
>  	for (msr = 0x800; msr <= 0x8ff; msr++)
>  		vmx_disable_intercept_msr_read_x2apic(msr);
>  
> -	/* According SDM, in x2apic mode, the whole id reg is used.  But in
> -	 * KVM, it only use the highest eight bits. Need to intercept it */
> -	vmx_enable_intercept_msr_read_x2apic(0x802);
>  	/* TMCCT */
>  	vmx_enable_intercept_msr_read_x2apic(0x839);
>  	/* TPR */
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 08/11] KVM: x86: directly call recalculate_apic_map on lapic restore
  2016-06-30 20:54 ` [PATCH v1 08/11] KVM: x86: directly call recalculate_apic_map on lapic restore Radim Krčmář
@ 2016-07-01  8:43   ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  8:43 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> The get/set dance was just for that.
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index d914b5351fdc..143df33f451e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2013,8 +2013,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
>  	/* set SPIV separately to get count of SW disabled APICs right */
>  	apic_set_spiv(apic, *((u32 *)(s->regs + APIC_SPIV)));
>  	memcpy(vcpu->arch.apic->regs, s->regs, sizeof *s);
> -	/* call kvm_apic_set_id() to put apic into apic_map */
> -	kvm_apic_set_id(apic, kvm_apic_id(apic));
> +	/* put apic into apic_map */
> +	recalculate_apic_map(vcpu->kvm);
>  	kvm_apic_set_version(vcpu);
>  
>  	apic_update_ppr(apic);
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 09/11] KVM: x86: reset lapic base in kvm_lapic_reset
  2016-06-30 20:54 ` [PATCH v1 09/11] KVM: x86: reset lapic base in kvm_lapic_reset Radim Krčmář
@ 2016-07-01  8:43   ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  8:43 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> LAPIC is reset in xAPIC mode and the surrounding code expects that.
> KVM never resets after initialization.  This patch is just for sanity.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 143df33f451e..46eb71c425cf 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1758,8 +1758,11 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
>  	/* Stop the timer in case it's a reset to an active apic */
>  	hrtimer_cancel(&apic->lapic_timer.timer);
>  
> -	if (!init_event)
> +	if (!init_event) {
> +		kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE |
> +		                         MSR_IA32_APICBASE_ENABLE);
>  		kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
> +	}
>  	kvm_apic_set_version(apic->vcpu);
>  
>  	for (i = 0; i < KVM_APIC_LVT_NUM; i++)
> @@ -1898,9 +1901,6 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
>  	 * thinking that APIC satet has changed.
>  	 */
>  	vcpu->arch.apic_base = MSR_IA32_APICBASE_ENABLE;
> -	kvm_lapic_set_base(vcpu,
> -			APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
> -
>  	static_key_slow_inc(&apic_sw_disabled.key); /* sw disabled at reset */
>  	kvm_lapic_reset(vcpu, false);
>  	kvm_iodevice_init(&apic->dev, &apic_mmio_ops);
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 11/11] KVM: x86: bump MAX_VCPUS to 288
  2016-06-30 20:54 ` [PATCH v1 11/11] KVM: x86: bump MAX_VCPUS to 288 Radim Krčmář
@ 2016-07-01  8:43   ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01  8:43 UTC (permalink / raw)
  To: Radim Krčmář, linux-kernel, kvm
  Cc: Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 30/06/2016 22:54, Radim Krčmář wrote:
> 288 is in high demand because of Knights Landing CPU.
> We cannot set the limit to 640k, because that would be wasting space.
> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 48b0ca18066c..411da14a675e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -34,7 +34,7 @@
>  #include <asm/asm.h>
>  #include <asm/kvm_page_track.h>
>  
> -#define KVM_MAX_VCPUS 255
> +#define KVM_MAX_VCPUS 288
>  #define KVM_SOFT_MAX_VCPUS 240
>  #define KVM_USER_MEM_SLOTS 509
>  /* memory slots that are not exposed to userspace */
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 02/11] KVM: x86: add kvm_apic_map_get_dest_lapic
  2016-07-01  7:57   ` Paolo Bonzini
@ 2016-07-01 12:39     ` Radim Krčmář
  0 siblings, 0 replies; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 12:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

2016-07-01 09:57+0200, Paolo Bonzini:
> On 30/06/2016 22:54, Radim Krčmář wrote:
>> kvm_irq_delivery_to_apic_fast and kvm_intr_is_single_vcpu_fast both
>> compute the interrupt destination.  Factor the code.
>> 
>> 'struct kvm_lapic **dst = NULL' had to be added to silence GCC.
>> GCC might complain about potential NULL access in the future, because it
>> missed conditions that avoided uninitialized uses of dst.
>> 
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> ---
>>  v1: improved comment for kvm_apic_map_get_dest_lapic() [Peter]
>> 
>>  arch/x86/kvm/lapic.c | 241 ++++++++++++++++++++++-----------------------------
>>  1 file changed, 103 insertions(+), 138 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 22a6474af220..238b87b068db 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -671,14 +671,98 @@ static void kvm_apic_disabled_lapic_found(struct kvm *kvm)
>>  	}
>>  }
>>  
>> +/* Return true if the interrupt can be handled by using *bitmap as index mask
>> + * for valid destinations in *dst array.
>> + * Return false if kvm_apic_map_get_dest_lapic did nothing useful.
>> + * Note: we may have zero kvm_lapic destinations when we return true, which
>> + * means that the interrupt should be dropped.  In this case, *bitmap would be
>> + * zero and *dst undefined.
>> + */
>> +static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic *src,
>> +		struct kvm_lapic_irq *irq, struct kvm_apic_map *map,
>> +		struct kvm_lapic ***dst, unsigned long *bitmap)
>> +{
>> +	int i, lowest;
>> +	bool x2apic_ipi;
>> +	u16 cid;
>> +
>> +	if (irq->shorthand == APIC_DEST_SELF) {
>> +		*dst = &src;
> 
> This is not valid, &src dies as soon as you leave the function.  You
> need to pass &src into kvm_apic_map_get_dest_lapic and here do something
> like

Indeed, that would have been bad.

> 	if (irq->shorthand) {
> 		if (irq->shorthand == APIC_DEST_SELF && p_src) {
> 			*dst = p_src;
> 			*bitmap = 1;
> 			return true;
> 		}
> 		return false;
> 	}
> 
> If it's not too hard, I'd like to have patch 4 before this one.

Sure, thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-07-01  8:42     ` Paolo Bonzini
@ 2016-07-01 12:44       ` Radim Krčmář
  2016-07-01 14:03         ` Paolo Bonzini
  0 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrew Honig, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu

2016-07-01 10:42+0200, Paolo Bonzini:
> On 01/07/2016 00:15, Andrew Honig wrote:
>>> > +       /* kvm_apic_map_get_logical_dest() expects multiples of 16 */
>>> > +       size = round_up(max_id + 1, 16);
>> Now that you're using the full range of apic_id values, could this
>> calculation overflow?  Perhaps max_id could be u64?
> 
> Good point, but I wonder if it's a good idea to let userspace allocate
> 32 GB of memory. :)

Yes, both could happen.  I'll change it to u64 to make it future proof.

> Let's put a limit on the maximum supported APIC ID, and report it
> through KVM_CHECK_EXTENSION on the new KVM_CAP_X2APIC_API capability.
> If 767 is enough for Knights Landing, the allocation below fits in two
> pages.  If you need to make it higher, please change the allocation to
> use kvm_kvzalloc and kvfree.

We sort of have a capability for maximum APIC ID, KVM_MAX_VCPU_ID,
because VCPU ID is initial APIC ID and x2APIC ID should always be the
initial APIC ID.

Userspace is able to change x2APIC with LAPIC_GET/SET ioctl -- what
about forbidding that?

> Last but not least...
> 
> >> > +       new = kzalloc(sizeof(struct kvm_apic_map) +
> >> > +                     sizeof(struct kvm_lapic) * size, GFP_KERNEL);
>                            ^^^^^^^^^^^^^^^^^^^^^^^^
> ... the sizeof here must be sizeof(struct kvm_lapic *).

Oops, thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 04/11] KVM: x86: use u16 for logical VCPU mask in lapic
  2016-07-01  7:56   ` Paolo Bonzini
@ 2016-07-01 12:48     ` Radim Krčmář
  2016-07-01 14:04       ` Paolo Bonzini
  0 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 12:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

2016-07-01 09:56+0200, Paolo Bonzini:
> On 30/06/2016 22:54, Radim Krčmář wrote:
>> @@ -747,7 +745,7 @@ static inline bool kvm_apic_map_get_dest_lapic(struct kvm *kvm, struct kvm_lapic
>>  			return true;
>>  
>>  		lowest = kvm_vector_to_index(irq->vector, hweight16(*bitmap),
>> -				bitmap, 16);
>> +				(unsigned long *)bitmap, 16);
>>  
> 
> kvm_vector_to_index's last argument is always 16, so we might as well
> omit it and make the third argument u16* too.

The other callsite in kvm_irq_delivery_to_apic() does not pass 16:

  int idx = kvm_vector_to_index(irq->vector, dest_vcpus,
  			dest_vcpu_bitmap, KVM_MAX_VCPUS);

kvm_vector_to_index() uses find_next_bit(), so we would have two casts
if we changed the type.  This patch is not really needed, so I could
drop it instead.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register
  2016-07-01  8:33   ` Paolo Bonzini
@ 2016-07-01 13:11     ` Radim Krčmář
  2016-07-01 14:12       ` Paolo Bonzini
  0 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 13:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

2016-07-01 10:33+0200, Paolo Bonzini:
> On 30/06/2016 22:54, Radim Krčmář wrote:
>> +static void __kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>> +		struct kvm_lapic_state *s, bool set)
>> +{
>> +	if (apic_x2apic_mode(vcpu->arch.apic)) {
>> +		u32 *id = (u32 *)(s->regs + APIC_ID);
>> +		if (set)
>> +			*id >>= 24;
>> +		else
>> +			*id <<= 24;
>> +	}
> 
> When setting, this should read from the apic_base being set.  So I think
> you cannot use apic_x2apic_mode.

apic_x2apic_mode uses apic_base MSR, so its value does not depend on
LAPIC_SET/GET.  I don't like the dependency much, but all combinations
of values/orders should work well.

> With this change, you don't need the struct kvm_vcpu argument now; add a
> struct kvm argument instead in patch 10.

I don't think we can use hardware-compatible format in API without
knowing the x2apic mode through vcpu.
(The big endian format could do without knowing apic mode.)

>> @@ -2780,6 +2780,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
>>  		kvm_x86_ops->sync_pir_to_irr(vcpu);
>>  
>>  	memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
>> +	kvm_apic_state_get_fixup(vcpu, s);
> 
> Instead of kvm_apic_state_get/set_fixup, group the memcpy and
> __kvm_apic_state_fixup in a new function kvm_apic_get_state(struct
> kvm_lapic *apic, char *regs).

Ok.

>>  	return 0;
>>  }
>> @@ -2787,6 +2788,7 @@ static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
>>  static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
>>  				    struct kvm_lapic_state *s)
>>  {
>> +	kvm_apic_state_set_fixup(vcpu, s);
>>  	kvm_apic_post_state_restore(vcpu, s);
> 
> ... and likewise merge these two in a refactored
> kvm_apic_post_state_restore called kvm_apic_set_state(struct kvm_lapic
> *apic, char *regs), by calling __kvm_apic_state_fixup before
> kvm_apic_post_state_restore's memcpy.
> 
> With these changes I guess there's no need for the underscores in
> __kvm_apic_state_fixup.

Yes, good idea.

>                          And you can also change the struct
> kvm_lapic_state pointer in the function to a char *.

Ok, I'll pass just s->regs.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 10/11] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-01  8:24   ` Paolo Bonzini
@ 2016-07-01 13:25     ` Radim Krčmář
  0 siblings, 0 replies; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 13:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

2016-07-01 10:24+0200, Paolo Bonzini:
> On 30/06/2016 22:54, Radim Krčmář wrote:
>> KVM_CAP_X2APIC_API can be enabled to extend APIC ID in get/set ioctl and MSI
>> addresses to 32 bits.  Both are needed to support x2APIC.
>> 
>> The capability has to be toggleable and disabled by default, because get/set
>> ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol
>> inspired by xAPIC and the change is not backward-compatible.
>> 
>> Changes to MSI addresses follow the format used by interrupt remapping unit.
>> The upper address word, that used to be 0, contains upper 24 bits of the LAPIC
>> address in its upper 24 bits.  Lower 8 bits are reserved as 0.
>> Using the upper address word is not backward-compatible either as we didn't
>> check that userspace zeroed the word.  Reserved bits are still not explicitly
>> checked, but non-zero data will affect LAPIC addresses, which will cause a bug.
>> 
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> ---
>>  v1:
>>  * rewritten with a toggleable capability [Paolo]
>>  * dropped MSI_ADDR_EXT_DEST_ID to enforce reserved bits
>> 
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>  [Rewritten documentation]

Will apply, thanks.

>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> @@ -1365,7 +1367,7 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
>>  			     struct kvm_vcpu **dest_vcpu);
>>  
>>  void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
>> -		     struct kvm_lapic_irq *irq);
>> +		     struct kvm_lapic_irq *irq, bool x2apic_api);
> 
> Just pass a struct kvm as the first argument.

Ok.

>> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
>> @@ -111,12 +111,17 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
>>  }
>>  
>>  void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
>> -		     struct kvm_lapic_irq *irq)
>> +		     struct kvm_lapic_irq *irq, bool x2apic_api)
>>  {
>>  	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
>>  
>>  	irq->dest_id = (e->msi.address_lo &
>>  			MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
>> +	if (x2apic_api)
>> +		/* MSI_ADDR_EXT_DEST_ID() is omitted to introduce bugs on
>> +		 * userspaces that set reserved bits 0-7.
>> +		 */
> 
> Reread Rusty's API design guidelines and come back. ;)

I still consider it as an improvement over not checking at all. ;)

> Seriously, please validate the address_hi at both places
> (KVM_SET_GSI_ROUTING and KVM_SIGNAL_MSI) and WARN here if you get
> non-zero bits 7-0.

This is of course better, will do necessary changes.

>> +		irq->dest_id |= e->msi.address_hi;
>>  	irq->vector = (e->msi.data &
>>  			MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
>>  	irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> @@ -3799,6 +3800,17 @@ split_irqchip_unlock:
>> +	case KVM_CAP_X2APIC_API: {
>> +		struct kvm_enable_cap valid = {.cap = KVM_CAP_X2APIC_API};
>> +
>> +		r = -EINVAL;
>> +		if (memcmp(cap, &valid, sizeof(valid)))
>> +			break;
> 
> Nice trick, and strict argument checking in general is a good idea.
> However it's ugly to do it only for KVM_CAP_X2APIC_API and we've really
> bad at strict argument checking elsewhere.  For consistency, please
> check that args[0] is zero, and forgo other violations. :(

Ok.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-07-01 12:44       ` Radim Krčmář
@ 2016-07-01 14:03         ` Paolo Bonzini
  2016-07-01 14:38           ` Radim Krčmář
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01 14:03 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Andrew Honig, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu



On 01/07/2016 14:44, Radim Krčmář wrote:
> 2016-07-01 10:42+0200, Paolo Bonzini:
>> On 01/07/2016 00:15, Andrew Honig wrote:
>>>>> +       /* kvm_apic_map_get_logical_dest() expects multiples of 16 */
>>>>> +       size = round_up(max_id + 1, 16);
>>> Now that you're using the full range of apic_id values, could this
>>> calculation overflow?  Perhaps max_id could be u64?
>>
>> Good point, but I wonder if it's a good idea to let userspace allocate
>> 32 GB of memory. :)
> 
> Yes, both could happen.  I'll change it to u64 to make it future proof.

It's not necessary to change it to u64 if you put a limit, but you can
add a WARN_ON(size == 0).

Also if kvm_apic_map_get_logical_dest() expects multiples of 16, it
should warn whenever the invariant is not respected.

>> Let's put a limit on the maximum supported APIC ID, and report it
>> through KVM_CHECK_EXTENSION on the new KVM_CAP_X2APIC_API capability.
>> If 767 is enough for Knights Landing, the allocation below fits in two
>> pages.  If you need to make it higher, please change the allocation to
>> use kvm_kvzalloc and kvfree.
> 
> We sort of have a capability for maximum APIC ID, KVM_MAX_VCPU_ID,
> because VCPU ID is initial APIC ID and x2APIC ID should always be the
> initial APIC ID.

Should it?  According to QEMU if you have e.g. 3 cores per socket one
socket take 4 APIC IDs.  For Knights Landing the "worst" prime factor in
288 is 3^2 so you need APIC IDs up to 288 * (4/3)^2 = 512.

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 04/11] KVM: x86: use u16 for logical VCPU mask in lapic
  2016-07-01 12:48     ` Radim Krčmář
@ 2016-07-01 14:04       ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01 14:04 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 01/07/2016 14:48, Radim Krčmář wrote:
> The other callsite in kvm_irq_delivery_to_apic() does not pass 16:
> 
>   int idx = kvm_vector_to_index(irq->vector, dest_vcpus,
>   			dest_vcpu_bitmap, KVM_MAX_VCPUS);

Ah, missed it.

> kvm_vector_to_index() uses find_next_bit(), so we would have two casts
> if we changed the type.  This patch is not really needed, so I could
> drop it instead.

Yes, let's leave it aside (and then you don't have to put it before
patch 2!).

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register
  2016-07-01 13:11     ` Radim Krčmář
@ 2016-07-01 14:12       ` Paolo Bonzini
  2016-07-01 14:54         ` Radim Krčmář
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01 14:12 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 01/07/2016 15:11, Radim Krčmář wrote:
>>> >> +static void __kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>>> >> +		struct kvm_lapic_state *s, bool set)
>>> >> +{
>>> >> +	if (apic_x2apic_mode(vcpu->arch.apic)) {
>>> >> +		u32 *id = (u32 *)(s->regs + APIC_ID);
>>> >> +		if (set)
>>> >> +			*id >>= 24;
>>> >> +		else
>>> >> +			*id <<= 24;
>>> >> +	}
>> > 
>> > When setting, this should read from the apic_base being set.  So I think
>> > you cannot use apic_x2apic_mode.
> 
> apic_x2apic_mode uses apic_base MSR, so its value does not depend on
> LAPIC_SET/GET.  I don't like the dependency much, but all combinations
> of values/orders should work well.

You're right of course.  However it should be documented in the
KVM_GET_LAPIC/KVM_SET_LAPIC ioctl docs that KVM_SET_MSR for apic_base
should be performed first.

Should kvm_lapic_set_base change the value of the ID register when
x2apic mode is left just like we do for entering x2apic mode?  (Hint: we
want kvm-unit-tests for this).

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-07-01 14:03         ` Paolo Bonzini
@ 2016-07-01 14:38           ` Radim Krčmář
  2016-07-01 15:06             ` Paolo Bonzini
  0 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 14:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrew Honig, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu

2016-07-01 16:03+0200, Paolo Bonzini:
> On 01/07/2016 14:44, Radim Krčmář wrote:
>> 2016-07-01 10:42+0200, Paolo Bonzini:
>>> On 01/07/2016 00:15, Andrew Honig wrote:
>>>>>> +       /* kvm_apic_map_get_logical_dest() expects multiples of 16 */
>>>>>> +       size = round_up(max_id + 1, 16);
>>>> Now that you're using the full range of apic_id values, could this
>>>> calculation overflow?  Perhaps max_id could be u64?
>>>
>>> Good point, but I wonder if it's a good idea to let userspace allocate
>>> 32 GB of memory. :)
>> 
>> Yes, both could happen.  I'll change it to u64 to make it future proof.
> 
> It's not necessary to change it to u64 if you put a limit, but you can
> add a WARN_ON(size == 0).

Hm, to save 4 bytes and avoid a WARN_ON, I'll change it to u32
max_apic_id instead of u32 size.

> Also if kvm_apic_map_get_logical_dest() expects multiples of 16, it
> should warn whenever the invariant is not respected.

It was to optimize the fast path ... kvm_apic_map_get_logical_dest() can
handle arbitrary values, so I'll do that instead of checking or assuming
an alignment.

>>> Let's put a limit on the maximum supported APIC ID, and report it
>>> through KVM_CHECK_EXTENSION on the new KVM_CAP_X2APIC_API capability.
>>> If 767 is enough for Knights Landing, the allocation below fits in two
>>> pages.  If you need to make it higher, please change the allocation to
>>> use kvm_kvzalloc and kvfree.
>> 
>> We sort of have a capability for maximum APIC ID, KVM_MAX_VCPU_ID,
>> because VCPU ID is initial APIC ID and x2APIC ID should always be the
>> initial APIC ID.
> 
> Should it?

Yes, x2APIC ID cannot be changed in hardware and is initialized to the
intitial APIC ID.
Letting LAPIC_SET change x2APIC ID would allow scenarios where userspace
reuses old VMs instead of building new ones after reconfiguration.
I don't think it's a sensible use case and it it is currently broken,
because we don't exit to userspace when changing APIC mode, so KVM would
just set APIC ID to VCPU ID on any transition and userspace couldn't
amend it.

>             According to QEMU if you have e.g. 3 cores per socket one
> socket take 4 APIC IDs.  For Knights Landing the "worst" prime factor in
> 288 is 3^2 so you need APIC IDs up to 288 * (4/3)^2 = 512.

The topology can result in sparse APIC ID and APIC ID is initialized
from VCPU ID, so userspace has to pick VCPU ID accordingly.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register
  2016-07-01 14:12       ` Paolo Bonzini
@ 2016-07-01 14:54         ` Radim Krčmář
  2016-07-01 15:07           ` Paolo Bonzini
  0 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 14:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

2016-07-01 16:12+0200, Paolo Bonzini:
> On 01/07/2016 15:11, Radim Krčmář wrote:
>>>> >> +static void __kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>>>> >> +		struct kvm_lapic_state *s, bool set)
>>>> >> +{
>>>> >> +	if (apic_x2apic_mode(vcpu->arch.apic)) {
>>>> >> +		u32 *id = (u32 *)(s->regs + APIC_ID);
>>>> >> +		if (set)
>>>> >> +			*id >>= 24;
>>>> >> +		else
>>>> >> +			*id <<= 24;
>>>> >> +	}
>>> > 
>>> > When setting, this should read from the apic_base being set.  So I think
>>> > you cannot use apic_x2apic_mode.
>> 
>> apic_x2apic_mode uses apic_base MSR, so its value does not depend on
>> LAPIC_SET/GET.  I don't like the dependency much, but all combinations
>> of values/orders should work well.
> 
> You're right of course.  However it should be documented in the
> KVM_GET_LAPIC/KVM_SET_LAPIC ioctl docs that KVM_SET_MSR for apic_base
> should be performed first.

Ok, that will make our life easier.

> Should kvm_lapic_set_base change the value of the ID register when
> x2apic mode is left just like we do for entering x2apic mode?

Yes, the patch does it.  The only possible change from x2APIC mode is to
hardware disabled mode and re-enabled xAPIC starts with initial APIC ID
in upper 8 bits.

>                                                                (Hint: we
> want kvm-unit-tests for this).

:) Something like this?

  enable_xapic()
  id = apic_id()
  set_apic_id(id+1) // ?
  enable_x2apic()
  id == apic_id() & 0xff
  disable_apic()
  enable_xapic()
  id == apic_id()

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-07-01 14:38           ` Radim Krčmář
@ 2016-07-01 15:06             ` Paolo Bonzini
  2016-07-01 15:12               ` Paolo Bonzini
  2016-07-01 15:35               ` Radim Krčmář
  0 siblings, 2 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01 15:06 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Andrew Honig, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu



On 01/07/2016 16:38, Radim Krčmář wrote:
>> > Should it?
> Yes, x2APIC ID cannot be changed in hardware and is initialized to the
> intitial APIC ID.
> Letting LAPIC_SET change x2APIC ID would allow scenarios where userspace
> reuses old VMs instead of building new ones after reconfiguration.
> I don't think it's a sensible use case and it it is currently broken,
> because we don't exit to userspace when changing APIC mode, so KVM would
> just set APIC ID to VCPU ID on any transition and userspace couldn't
> amend it.
> 
>> >             According to QEMU if you have e.g. 3 cores per socket one
>> > socket take 4 APIC IDs.  For Knights Landing the "worst" prime factor in
>> > 288 is 3^2 so you need APIC IDs up to 288 * (4/3)^2 = 512.
> The topology can result in sparse APIC ID and APIC ID is initialized
> from VCPU ID, so userspace has to pick VCPU ID accordingly.

Right, I was confusing KVM_MAX_VCPUS and KVM_MAX_VCPU_ID.  So the
overflow case cannot happen and neither can the 32GB allocation.  On the
other hand, I suspect you need to bump KVM_MAX_VCPU_ID beyond its
current default setting (which is equal to KVM_MAX_VCPUS), up to 511 or
1023.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register
  2016-07-01 14:54         ` Radim Krčmář
@ 2016-07-01 15:07           ` Paolo Bonzini
  2016-07-01 15:53             ` Radim Krčmář
  0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01 15:07 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 01/07/2016 16:54, Radim Krčmář wrote:
>> >                                                                (Hint: we
>> > want kvm-unit-tests for this).
> :) Something like this?
> 
>   enable_xapic()
>   id = apic_id()
>   set_apic_id(id+1) // ?
>   enable_x2apic()
>   id == apic_id() & 0xff
>   disable_apic()
>   enable_xapic()
>   id == apic_id()
> 

Yes, plus checking that it "moves" appropriately between low and high bits.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-07-01 15:06             ` Paolo Bonzini
@ 2016-07-01 15:12               ` Paolo Bonzini
  2016-07-01 15:43                 ` Radim Krčmář
  2016-07-01 15:35               ` Radim Krčmář
  1 sibling, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01 15:12 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Andrew Honig, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu



On 01/07/2016 17:06, Paolo Bonzini wrote:
>>> >> > Should it?
>> Yes, x2APIC ID cannot be changed in hardware and is initialized to the
>> intitial APIC ID.
>> Letting LAPIC_SET change x2APIC ID would allow scenarios where userspace
>> reuses old VMs instead of building new ones after reconfiguration.
>> I don't think it's a sensible use case and it it is currently broken,
>> because we don't exit to userspace when changing APIC mode, so KVM would
>> just set APIC ID to VCPU ID on any transition and userspace couldn't
>> amend it.

Forgot to reply about this: letting SET_LAPIC change x2APIC IDs is nonsense.

In x2APIC mode + new capability disabled SET_LAPIC should ignore the id
register altogether for backwards compatibility.

In x2APIC mode + new capability enabled it should either ignore it, or
fail if the x2APIC ID doesn't match the VCPU id.  I suspect the latter
is better because it would help catching the case where userspace is
erroneously shifting the id left to bits 31-24.

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-07-01 15:06             ` Paolo Bonzini
  2016-07-01 15:12               ` Paolo Bonzini
@ 2016-07-01 15:35               ` Radim Krčmář
  1 sibling, 0 replies; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 15:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrew Honig, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu

2016-07-01 17:06+0200, Paolo Bonzini:
> On 01/07/2016 16:38, Radim Krčmář wrote:
>                                                                   On the
> other hand, I suspect you need to bump KVM_MAX_VCPU_ID beyond its
> current default setting (which is equal to KVM_MAX_VCPUS), up to 511 or
> 1023.

Yes, thanks for pointing it out.  APIC ID 1023 is reasonably low and I'm
pretty sure that it cannot be surpassed with any topology of 288 VCPUs.
(The limit should be 543, with 257 cores per package.)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-07-01 15:12               ` Paolo Bonzini
@ 2016-07-01 15:43                 ` Radim Krčmář
  2016-07-01 16:38                   ` Paolo Bonzini
  0 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 15:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrew Honig, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu

2016-07-01 17:12+0200, Paolo Bonzini:
> On 01/07/2016 17:06, Paolo Bonzini wrote:
>>>> >> > Should it?
>>> Yes, x2APIC ID cannot be changed in hardware and is initialized to the
>>> intitial APIC ID.
>>> Letting LAPIC_SET change x2APIC ID would allow scenarios where userspace
>>> reuses old VMs instead of building new ones after reconfiguration.
>>> I don't think it's a sensible use case and it it is currently broken,
>>> because we don't exit to userspace when changing APIC mode, so KVM would
>>> just set APIC ID to VCPU ID on any transition and userspace couldn't
>>> amend it.
> 
> Forgot to reply about this: letting SET_LAPIC change x2APIC IDs is nonsense.
> 
> In x2APIC mode + new capability disabled SET_LAPIC should ignore the id
> register altogether for backwards compatibility.

I'd still shift SET_LAPIC APIC ID to have internal APIC ID register in
hardware-compatible format.

> In x2APIC mode + new capability enabled it should either ignore it, or
> fail if the x2APIC ID doesn't match the VCPU id.  I suspect the latter
> is better because it would help catching the case where userspace is
> erroneously shifting the id left to bits 31-24.

Yes, I'll make it EINVAL.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register
  2016-07-01 15:07           ` Paolo Bonzini
@ 2016-07-01 15:53             ` Radim Krčmář
  2016-07-01 16:37               ` Paolo Bonzini
  0 siblings, 1 reply; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 15:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu

2016-07-01 17:07+0200, Paolo Bonzini:
> On 01/07/2016 16:54, Radim Krčmář wrote:
>>> >                                                                (Hint: we
>>> > want kvm-unit-tests for this).
>> :) Something like this?
>> 
>>   enable_xapic()
>>   id = apic_id()
>>   set_apic_id(id+1) // ?
>>   enable_x2apic()
>>   id == apic_id() & 0xff
>>   disable_apic()
>>   enable_xapic()
>>   id == apic_id()
>> 
> 
> Yes, plus checking that it "moves" appropriately between low and high bits.

x2APIC cannot use MMIO interface, so apic_id() already does the best we
can ... if KVM is shifting wrong somwhere, then the id should differ.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register
  2016-07-01 15:53             ` Radim Krčmář
@ 2016-07-01 16:37               ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01 16:37 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, Lan, Tianyu, Igor Mammedov, Jan Kiszka, Peter Xu



On 01/07/2016 17:53, Radim Krčmář wrote:
>> >>   enable_xapic()
>> >>   id = apic_id()
>> >>   set_apic_id(id+1) // ?
>> >>   enable_x2apic()
>> >>   id == apic_id() & 0xff
>> >>   disable_apic()
>> >>   enable_xapic()
>> >>   id == apic_id()
>> >> 
> > 
> > Yes, plus checking that it "moves" appropriately between low and high bits.
> 
> x2APIC cannot use MMIO interface, so apic_id() already does the best we
> can ... if KVM is shifting wrong somwhere, then the id should differ.

Yeah, fair enough.

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map
  2016-07-01 15:43                 ` Radim Krčmář
@ 2016-07-01 16:38                   ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2016-07-01 16:38 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Andrew Honig, linux-kernel, kvm, Lan, Tianyu, Igor Mammedov,
	Jan Kiszka, Peter Xu



On 01/07/2016 17:43, Radim Krčmář wrote:
> > Forgot to reply about this: letting SET_LAPIC change x2APIC IDs is nonsense.
> > 
> > In x2APIC mode + new capability disabled SET_LAPIC should ignore the id
> > register altogether for backwards compatibility.
> 
> I'd still shift SET_LAPIC APIC ID to have internal APIC ID register in
> hardware-compatible format.

With the capability disabled, APIC ID should always be in bits 31-24 for
both GET and SET.  But I think we agree, it's simpler to reason in v2
code and testcases. :)

Paolo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 10/11] KVM: x86: add KVM_CAP_X2APIC_API
  2016-06-30 20:54 ` [PATCH v1 10/11] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
  2016-07-01  8:24   ` Paolo Bonzini
@ 2016-07-01 18:09   ` David Matlack
  2016-07-01 18:31     ` Radim Krčmář
  1 sibling, 1 reply; 45+ messages in thread
From: David Matlack @ 2016-07-01 18:09 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm list, Paolo Bonzini, Lan, Tianyu,
	Igor Mammedov, Jan Kiszka, Peter Xu

On Thu, Jun 30, 2016 at 1:54 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> KVM_CAP_X2APIC_API can be enabled to extend APIC ID in get/set ioctl and MSI
> addresses to 32 bits.  Both are needed to support x2APIC.
>
> The capability has to be toggleable and disabled by default, because get/set
> ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol
> inspired by xAPIC and the change is not backward-compatible.
>
> Changes to MSI addresses follow the format used by interrupt remapping unit.
> The upper address word, that used to be 0, contains upper 24 bits of the LAPIC
> address in its upper 24 bits.  Lower 8 bits are reserved as 0.
> Using the upper address word is not backward-compatible either as we didn't
> check that userspace zeroed the word.  Reserved bits are still not explicitly
> checked, but non-zero data will affect LAPIC addresses, which will cause a bug.
>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  v1:
>  * rewritten with a toggleable capability [Paolo]
>  * dropped MSI_ADDR_EXT_DEST_ID to enforce reserved bits
>
>  Documentation/virtual/kvm/api.txt | 26 ++++++++++++++++++++++++++
>  arch/x86/include/asm/kvm_host.h   |  4 +++-
>  arch/x86/kvm/irq_comm.c           | 14 ++++++++++----
>  arch/x86/kvm/lapic.c              |  2 +-
>  arch/x86/kvm/vmx.c                |  2 +-
>  arch/x86/kvm/x86.c                | 12 ++++++++++++
>  include/uapi/linux/kvm.h          |  1 +
>  7 files changed, 54 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 09efa9eb3926..0f978089a0f6 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1482,6 +1482,9 @@ struct kvm_irq_routing_msi {
>         __u32 pad;
>  };
>
> +If KVM_CAP_X2APIC_API is enabled, then address_hi bits 31-8 contain bits 31-8
> +of destination id and address_hi bits 7-0 is must be 0.
> +
>  struct kvm_irq_routing_s390_adapter {
>         __u64 ind_addr;
>         __u64 summary_addr;
> @@ -1583,6 +1586,13 @@ struct kvm_lapic_state {
>  Reads the Local APIC registers and copies them into the input argument.  The
>  data format and layout are the same as documented in the architecture manual.
>
> +If KVM_CAP_X2APIC_API is enabled, then the format of APIC_ID register depends
> +on APIC mode (reported by MSR_IA32_APICBASE) of its VCPU.  The format follows
> +xAPIC otherwise.
> +
> +x2APIC stores APIC ID as little endian in bits 31-0 of APIC_ID register.
> +xAPIC stores bits 7-0 of APIC ID in register bits 31-24.
> +
>
>  4.58 KVM_SET_LAPIC
>
> @@ -1600,6 +1610,8 @@ struct kvm_lapic_state {
>  Copies the input argument into the Local APIC registers.  The data format
>  and layout are the same as documented in the architecture manual.
>
> +See the note about APIC_ID register in KVM_GET_LAPIC.
> +
>
>  4.59 KVM_IOEVENTFD
>
> @@ -2180,6 +2192,9 @@ struct kvm_msi {
>
>  No flags are defined so far. The corresponding field must be 0.
>
> +If KVM_CAP_X2APIC_API is enabled, then address_hi bits 31-8 contain bits 31-8
> +of destination id and address_hi bits 7-0 is must be 0.
> +
>
>  4.71 KVM_CREATE_PIT2
>
> @@ -3811,6 +3826,17 @@ Allows use of runtime-instrumentation introduced with zEC12 processor.
>  Will return -EINVAL if the machine does not support runtime-instrumentation.
>  Will return -EBUSY if a VCPU has already been created.
>
> +7.7 KVM_CAP_X2APIC_API
> +
> +Architectures: x86
> +Parameters: none
> +Returns: 0 on success, -EINVAL if reserved parameters are not 0
> +
> +Enabling this capability changes the behavior of KVM_SET_GSI_ROUTING,
> +KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC.  See KVM_CAP_X2APIC_API
> +in their respective sections.
> +
> +
>  8. Other capabilities.
>  ----------------------
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 459a789cb3da..48b0ca18066c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -782,6 +782,8 @@ struct kvm_arch {
>         u32 ldr_mode;
>         struct page *avic_logical_id_table_page;
>         struct page *avic_physical_id_table_page;
> +
> +       bool x2apic_api;
>  };
>
>  struct kvm_vm_stat {
> @@ -1365,7 +1367,7 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
>                              struct kvm_vcpu **dest_vcpu);
>
>  void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> -                    struct kvm_lapic_irq *irq);
> +                    struct kvm_lapic_irq *irq, bool x2apic_api);
>
>  static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
>  {
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 47ad681a33fd..4594644ab090 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -111,12 +111,17 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
>  }
>
>  void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> -                    struct kvm_lapic_irq *irq)
> +                    struct kvm_lapic_irq *irq, bool x2apic_api)
>  {
>         trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);

This tracepoint should start reporting e->msi.address_hi as well now.

>
>         irq->dest_id = (e->msi.address_lo &
>                         MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
> +       if (x2apic_api)
> +               /* MSI_ADDR_EXT_DEST_ID() is omitted to introduce bugs on
> +                * userspaces that set reserved bits 0-7.
> +                */
> +               irq->dest_id |= e->msi.address_hi;
>         irq->vector = (e->msi.data &
>                         MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
>         irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
> @@ -137,7 +142,7 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>         if (!level)
>                 return -1;
>
> -       kvm_set_msi_irq(e, &irq);
> +       kvm_set_msi_irq(e, &irq, kvm->arch.x2apic_api);
>
>         return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
>  }
> @@ -153,7 +158,7 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
>         if (unlikely(e->type != KVM_IRQ_ROUTING_MSI))
>                 return -EWOULDBLOCK;
>
> -       kvm_set_msi_irq(e, &irq);
> +       kvm_set_msi_irq(e, &irq, kvm->arch.x2apic_api);
>
>         if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r, NULL))
>                 return r;
> @@ -393,7 +398,8 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
>                         if (entry->type != KVM_IRQ_ROUTING_MSI)
>                                 continue;
>
> -                       kvm_set_msi_irq(entry, &irq);
> +                       kvm_set_msi_irq(entry, &irq,
> +                                       vcpu->kvm->arch.x2apic_api);
>
>                         if (irq.level && kvm_apic_match_dest(vcpu, NULL, 0,
>                                                 irq.dest_id, irq.dest_mode))
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 46eb71c425cf..178605635df5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1983,7 +1983,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
>  static void __kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>                 struct kvm_lapic_state *s, bool set)
>  {
> -       if (apic_x2apic_mode(vcpu->arch.apic)) {
> +       if (apic_x2apic_mode(vcpu->arch.apic) && !vcpu->kvm->arch.x2apic_api) {
>                 u32 *id = (u32 *)(s->regs + APIC_ID);
>                 if (set)
>                         *id >>= 24;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a10038258b80..ea1f439b444e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11075,7 +11075,7 @@ static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
>                  * We will support full lowest-priority interrupt later.
>                  */
>
> -               kvm_set_msi_irq(e, &irq);
> +               kvm_set_msi_irq(e, &irq, kvm->arch.x2apic_api);
>                 if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>                         /*
>                          * Make sure the IRTE is in remapped mode if
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 043f110f2210..16b55f09dd16 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2576,6 +2576,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>         case KVM_CAP_DISABLE_QUIRKS:
>         case KVM_CAP_SET_BOOT_CPU_ID:
>         case KVM_CAP_SPLIT_IRQCHIP:
> +       case KVM_CAP_X2APIC_API:
>  #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
>         case KVM_CAP_ASSIGN_DEV_IRQ:
>         case KVM_CAP_PCI_2_3:
> @@ -3799,6 +3800,17 @@ split_irqchip_unlock:
>                 mutex_unlock(&kvm->lock);
>                 break;
>         }
> +       case KVM_CAP_X2APIC_API: {
> +               struct kvm_enable_cap valid = {.cap = KVM_CAP_X2APIC_API};
> +
> +               r = -EINVAL;
> +               if (memcmp(cap, &valid, sizeof(valid)))
> +                       break;
> +
> +               kvm->arch.x2apic_api = true;
> +               r = 0;
> +               break;
> +       }
>         default:
>                 r = -EINVAL;
>                 break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 05ebf475104c..43b355d6db7b 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_ARM_PMU_V3 126
>  #define KVM_CAP_VCPU_ATTRIBUTES 127
>  #define KVM_CAP_MAX_VCPU_ID 128
> +#define KVM_CAP_X2APIC_API 129
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 2.9.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v1 10/11] KVM: x86: add KVM_CAP_X2APIC_API
  2016-07-01 18:09   ` David Matlack
@ 2016-07-01 18:31     ` Radim Krčmář
  0 siblings, 0 replies; 45+ messages in thread
From: Radim Krčmář @ 2016-07-01 18:31 UTC (permalink / raw)
  To: David Matlack
  Cc: linux-kernel, kvm list, Paolo Bonzini, Lan, Tianyu,
	Igor Mammedov, Jan Kiszka, Peter Xu

2016-07-01 11:09-0700, David Matlack:
> On Thu, Jun 30, 2016 at 1:54 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> KVM_CAP_X2APIC_API can be enabled to extend APIC ID in get/set ioctl and MSI
>> addresses to 32 bits.  Both are needed to support x2APIC.
>>
>> The capability has to be toggleable and disabled by default, because get/set
>> ioctl shifted and truncated APIC ID to 8 bits by using a non-standard protocol
>> inspired by xAPIC and the change is not backward-compatible.
>>
>> Changes to MSI addresses follow the format used by interrupt remapping unit.
>> The upper address word, that used to be 0, contains upper 24 bits of the LAPIC
>> address in its upper 24 bits.  Lower 8 bits are reserved as 0.
>> Using the upper address word is not backward-compatible either as we didn't
>> check that userspace zeroed the word.  Reserved bits are still not explicitly
>> checked, but non-zero data will affect LAPIC addresses, which will cause a bug.
>>
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
>> ---
>> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
>> @@ -111,12 +111,17 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
>>  }
>>
>>  void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
>> -                    struct kvm_lapic_irq *irq)
>> +                    struct kvm_lapic_irq *irq, bool x2apic_api)
>>  {
>>         trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
> 
> This tracepoint should start reporting e->msi.address_hi as well now.

Good catch, thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2016-07-01 18:31 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-30 20:54 [PATCH v1 00/11] KVM: x86: break the xAPIC barrier Radim Krčmář
2016-06-30 20:54 ` [PATCH v1 01/11] KVM: x86: bump KVM_SOFT_MAX_VCPUS to 240 Radim Krčmář
2016-07-01  8:42   ` Paolo Bonzini
2016-06-30 20:54 ` [PATCH v1 02/11] KVM: x86: add kvm_apic_map_get_dest_lapic Radim Krčmář
2016-07-01  7:57   ` Paolo Bonzini
2016-07-01 12:39     ` Radim Krčmář
2016-06-30 20:54 ` [PATCH v1 03/11] KVM: x86: dynamic kvm_apic_map Radim Krčmář
2016-06-30 22:15   ` Andrew Honig
2016-07-01  8:42     ` Paolo Bonzini
2016-07-01 12:44       ` Radim Krčmář
2016-07-01 14:03         ` Paolo Bonzini
2016-07-01 14:38           ` Radim Krčmář
2016-07-01 15:06             ` Paolo Bonzini
2016-07-01 15:12               ` Paolo Bonzini
2016-07-01 15:43                 ` Radim Krčmář
2016-07-01 16:38                   ` Paolo Bonzini
2016-07-01 15:35               ` Radim Krčmář
2016-07-01  7:33   ` Paolo Bonzini
2016-06-30 20:54 ` [PATCH v1 04/11] KVM: x86: use u16 for logical VCPU mask in lapic Radim Krčmář
2016-07-01  7:56   ` Paolo Bonzini
2016-07-01 12:48     ` Radim Krčmář
2016-07-01 14:04       ` Paolo Bonzini
2016-06-30 20:54 ` [PATCH v1 05/11] KVM: x86: use generic function for MSI parsing Radim Krčmář
2016-07-01  8:42   ` Paolo Bonzini
2016-06-30 20:54 ` [PATCH v1 06/11] KVM: x86: use hardware-compatible format for APIC ID register Radim Krčmář
2016-07-01  8:33   ` Paolo Bonzini
2016-07-01 13:11     ` Radim Krčmář
2016-07-01 14:12       ` Paolo Bonzini
2016-07-01 14:54         ` Radim Krčmář
2016-07-01 15:07           ` Paolo Bonzini
2016-07-01 15:53             ` Radim Krčmář
2016-07-01 16:37               ` Paolo Bonzini
2016-06-30 20:54 ` [PATCH v1 07/11] KVM: VMX: optimize APIC ID read with APICv Radim Krčmář
2016-07-01  8:42   ` Paolo Bonzini
2016-06-30 20:54 ` [PATCH v1 08/11] KVM: x86: directly call recalculate_apic_map on lapic restore Radim Krčmář
2016-07-01  8:43   ` Paolo Bonzini
2016-06-30 20:54 ` [PATCH v1 09/11] KVM: x86: reset lapic base in kvm_lapic_reset Radim Krčmář
2016-07-01  8:43   ` Paolo Bonzini
2016-06-30 20:54 ` [PATCH v1 10/11] KVM: x86: add KVM_CAP_X2APIC_API Radim Krčmář
2016-07-01  8:24   ` Paolo Bonzini
2016-07-01 13:25     ` Radim Krčmář
2016-07-01 18:09   ` David Matlack
2016-07-01 18:31     ` Radim Krčmář
2016-06-30 20:54 ` [PATCH v1 11/11] KVM: x86: bump MAX_VCPUS to 288 Radim Krčmář
2016-07-01  8:43   ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.