* [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
@ 2017-08-16 19:40 Radim Krčmář
  2017-08-16 19:40 ` [PATCH RFC 1/2] KVM: remove unused __KVM_HAVE_ARCH_VM_ALLOC Radim Krčmář
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Radim Krčmář @ 2017-08-16 19:40 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-mips, kvm-ppc, linux-s390, linux-arm-kernel
  Cc: Paolo Bonzini, David Hildenbrand, Christoffer Dall, Marc Zyngier,
	Christian Borntraeger, Cornelia Huck, James Hogan,
	Paul Mackerras, Alexander Graf

The goal is to increase KVM_MAX_VCPUS without worrying about memory
impact of many small guests.

This is the second of three major "dynamic" options:
 1) size vcpu array at VM creation time
 2) resize vcpu array when new VCPUs are created
 3) use a lockless list/tree for VCPUs

The disadvantage of (1) is that it requires userspace changes and limits
flexibility, because userspace must provide the maximal count at
start.  The main advantage is that kvm->vcpus will work like it does
now.  It has been posted as "[PATCH 0/4] KVM: add KVM_CREATE_VM2 to
allow dynamic kvm->vcpus array",
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1377285.html

The main problem of (2), this series, is that we cannot extend the array
in place and therefore require some kind of protection when moving it.
RCU seems best, but it makes the code slower and harder to deal with.
The main advantage is that we do not need userspace changes.

The third option wasn't explored yet.  It would solve the ugly
kvm_for_each_vcpu() of (2), but kvm_get_vcpu() would become linear.
(We could mitigate that with a list of vcpu arrays; a lockless,
 sequentially growing "tree" would be logarithmic and not much more
 complicated to implement.)
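
For illustration, a rough sketch of how the lookup could work for (3)
with a list of vcpu arrays -- nothing below is implemented and all
names are made up:

  struct kvm_vcpu_block {
          struct kvm_vcpu_block *next;    /* blocks are only appended */
          struct kvm_vcpu *vcpus[KVM_VCPU_BLOCK_SIZE];
  };

  /* linear in the number of blocks, constant within a block */
  static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
  {
          struct kvm_vcpu_block *b = READ_ONCE(kvm->vcpu_blocks);

          while (b && i >= KVM_VCPU_BLOCK_SIZE) {
                  i -= KVM_VCPU_BLOCK_SIZE;
                  b = READ_ONCE(b->next);
          }
          return b ? READ_ONCE(b->vcpus[i]) : NULL;
  }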

Which option do you think is the best?

Thanks.

Radim Krčmář (2):
  KVM: remove unused __KVM_HAVE_ARCH_VM_ALLOC
  KVM: RCU protected dynamic vcpus array

 arch/mips/kvm/mips.c       |  8 +++--
 arch/powerpc/kvm/powerpc.c |  6 ++--
 arch/s390/kvm/kvm-s390.c   | 27 ++++++++++++-----
 arch/x86/kvm/hyperv.c      |  3 +-
 arch/x86/kvm/vmx.c         |  3 +-
 arch/x86/kvm/x86.c         |  5 ++--
 include/linux/kvm_host.h   | 73 +++++++++++++++++++++++++++++-----------------
 virt/kvm/arm/arm.c         | 10 +++----
 virt/kvm/kvm_main.c        | 72 +++++++++++++++++++++++++++++++++++++--------
 9 files changed, 144 insertions(+), 63 deletions(-)

-- 
2.13.3


* [PATCH RFC 1/2] KVM: remove unused __KVM_HAVE_ARCH_VM_ALLOC
  2017-08-16 19:40 [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array Radim Krčmář
@ 2017-08-16 19:40 ` Radim Krčmář
  2017-08-21 13:48   ` Christian Borntraeger
  2017-08-16 19:40 ` [PATCH RFC 2/2] KVM: RCU protected dynamic vcpus array Radim Krčmář
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Radim Krčmář @ 2017-08-16 19:40 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-mips, kvm-ppc, linux-s390, linux-arm-kernel
  Cc: Paolo Bonzini, David Hildenbrand, Christoffer Dall, Marc Zyngier,
	Christian Borntraeger, Cornelia Huck, James Hogan,
	Paul Mackerras, Alexander Graf

Moving the VM allocation to generic code will allow us to extend it
with ease.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 include/linux/kvm_host.h | 12 ------------
 virt/kvm/kvm_main.c      | 16 +++++++++++++---
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6882538eda32..c8df733eed41 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -803,18 +803,6 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
-#ifndef __KVM_HAVE_ARCH_VM_ALLOC
-static inline struct kvm *kvm_arch_alloc_vm(void)
-{
-	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
-}
-
-static inline void kvm_arch_free_vm(struct kvm *kvm)
-{
-	kfree(kvm);
-}
-#endif
-
 #ifdef __KVM_HAVE_ARCH_NONCOHERENT_DMA
 void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
 void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e17c40d986f3..2eac2c62795f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -640,10 +640,20 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 	return 0;
 }
 
+static inline struct kvm *kvm_alloc_vm(void)
+{
+	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
+}
+
+static inline void kvm_free_vm(struct kvm *kvm)
+{
+	kfree(kvm);
+}
+
 static struct kvm *kvm_create_vm(unsigned long type)
 {
 	int r, i;
-	struct kvm *kvm = kvm_arch_alloc_vm();
+	struct kvm *kvm = kvm_alloc_vm();
 
 	if (!kvm)
 		return ERR_PTR(-ENOMEM);
@@ -720,7 +730,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 		kfree(kvm_get_bus(kvm, i));
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
 		kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
-	kvm_arch_free_vm(kvm);
+	kvm_free_vm(kvm);
 	mmdrop(current->mm);
 	return ERR_PTR(r);
 }
@@ -771,7 +781,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 		kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
 	cleanup_srcu_struct(&kvm->irq_srcu);
 	cleanup_srcu_struct(&kvm->srcu);
-	kvm_arch_free_vm(kvm);
+	kvm_free_vm(kvm);
 	preempt_notifier_dec();
 	hardware_disable_all();
 	mmdrop(mm);
-- 
2.13.3


* [PATCH RFC 2/2] KVM: RCU protected dynamic vcpus array
  2017-08-16 19:40 [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array Radim Krčmář
  2017-08-16 19:40 ` [PATCH RFC 1/2] KVM: remove unused __KVM_HAVE_ARCH_VM_ALLOC Radim Krčmář
@ 2017-08-16 19:40 ` Radim Krčmář
  2017-08-17  8:07   ` Cornelia Huck
  2017-08-17 11:14   ` David Hildenbrand
  2017-08-17  7:04 ` [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array Alexander Graf
  2017-08-17  7:29 ` David Hildenbrand
  3 siblings, 2 replies; 24+ messages in thread
From: Radim Krčmář @ 2017-08-16 19:40 UTC (permalink / raw)
  To: linux-kernel, kvm, linux-mips, kvm-ppc, linux-s390, linux-arm-kernel
  Cc: Paolo Bonzini, David Hildenbrand, Christoffer Dall, Marc Zyngier,
	Christian Borntraeger, Cornelia Huck, James Hogan,
	Paul Mackerras, Alexander Graf

This is a prototype with many TODO comments to give a better idea of
what would be needed.

The main missing piece is a rework of every kvm_for_each_vcpu() into a
less inefficient loop, but RCU readers cannot block, so the rewrite
cannot be scripted.  Is there a more suitable protection scheme?

I didn't test it much ... I am still leaning towards the internally
simpler version, (1), even if it requires userspace changes.
---
 arch/mips/kvm/mips.c       |  8 +++---
 arch/powerpc/kvm/powerpc.c |  6 +++--
 arch/s390/kvm/kvm-s390.c   | 27 ++++++++++++++------
 arch/x86/kvm/hyperv.c      |  3 +--
 arch/x86/kvm/vmx.c         |  3 ++-
 arch/x86/kvm/x86.c         |  5 ++--
 include/linux/kvm_host.h   | 61 ++++++++++++++++++++++++++++++++++------------
 virt/kvm/arm/arm.c         | 10 +++-----
 virt/kvm/kvm_main.c        | 58 +++++++++++++++++++++++++++++++++++--------
 9 files changed, 132 insertions(+), 49 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index bce2a6431430..4c9d383babe7 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -164,6 +164,7 @@ void kvm_mips_free_vcpus(struct kvm *kvm)
 {
 	unsigned int i;
 	struct kvm_vcpu *vcpu;
+	struct kvm_vcpus *vcpus;
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		kvm_arch_vcpu_free(vcpu);
@@ -171,8 +172,9 @@ void kvm_mips_free_vcpus(struct kvm *kvm)
 
 	mutex_lock(&kvm->lock);
 
-	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-		kvm->vcpus[i] = NULL;
+	// TODO: resetting online vcpus shouldn't be needed
+	vcpus = rcu_dereference_protected(kvm->vcpus, lockdep_is_held(&kvm->lock));
+	vcpus->online = 0;
 
 	atomic_set(&kvm->online_vcpus, 0);
 
@@ -499,7 +501,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
 	if (irq->cpu == -1)
 		dvcpu = vcpu;
 	else
-		dvcpu = vcpu->kvm->vcpus[irq->cpu];
+		dvcpu = kvm_get_vcpu(vcpu->kvm, irq->cpu);
 
 	if (intr == 2 || intr == 3 || intr == 4) {
 		kvm_mips_callbacks->queue_io_int(dvcpu, irq);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3480faaf1ef8..9d1a16fd629f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -460,6 +460,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	unsigned int i;
 	struct kvm_vcpu *vcpu;
+	struct kvm_vcpus *vcpus;
 
 #ifdef CONFIG_KVM_XICS
 	/*
@@ -475,8 +476,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 		kvm_arch_vcpu_free(vcpu);
 
 	mutex_lock(&kvm->lock);
-	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-		kvm->vcpus[i] = NULL;
+
+	vcpus = rcu_dereference_protected(kvm->vcpus, lockdep_is_held(&kvm->lock));
+	vcpus->online = 0;
 
 	atomic_set(&kvm->online_vcpus, 0);
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 9f23a9e81a91..dd8592c67ef4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1945,15 +1945,16 @@ static void kvm_free_vcpus(struct kvm *kvm)
 {
 	unsigned int i;
 	struct kvm_vcpu *vcpu;
+	struct kvm_vcpus *vcpus;
 
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_arch_vcpu_destroy(vcpu);
 
 	mutex_lock(&kvm->lock);
-	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-		kvm->vcpus[i] = NULL;
 
-	atomic_set(&kvm->online_vcpus, 0);
+	vcpus = rcu_dereference_protected(kvm->vcpus, lockdep_is_held(&kvm->lock));
+	vcpus->online = 0;
+
 	mutex_unlock(&kvm->lock);
 }
 
@@ -3415,6 +3416,7 @@ static void __enable_ibs_on_vcpu(struct kvm_vcpu *vcpu)
 void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
 {
 	int i, online_vcpus, started_vcpus = 0;
+	struct kvm_vcpus *vcpus;
 
 	if (!is_vcpu_stopped(vcpu))
 		return;
@@ -3422,12 +3424,16 @@ void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_vcpu_start_stop(vcpu->vcpu_id, 1);
 	/* Only one cpu at a time may enter/leave the STOPPED state. */
 	spin_lock(&vcpu->kvm->arch.start_stop_lock);
-	online_vcpus = atomic_read(&vcpu->kvm->online_vcpus);
 
-	for (i = 0; i < online_vcpus; i++) {
-		if (!is_vcpu_stopped(vcpu->kvm->vcpus[i]))
+	rcu_read_lock();
+	vcpus = rcu_dereference(vcpu->kvm->vcpus);
+	// TODO: this pattern is kvm_for_each_vcpu
+	for (i = 0; i < vcpus->online; i++) {
+		if (!is_vcpu_stopped(vcpus->array[i]))
 			started_vcpus++;
+		// TODO: if (started_vcpus > 1) break;
 	}
+	rcu_read_unlock();
 
 	if (started_vcpus == 0) {
 		/* we're the only active VCPU -> speed it up */
@@ -3455,6 +3461,7 @@ void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
 {
 	int i, online_vcpus, started_vcpus = 0;
 	struct kvm_vcpu *started_vcpu = NULL;
+	struct kvm_vcpus *vcpus;
 
 	if (is_vcpu_stopped(vcpu))
 		return;
@@ -3470,12 +3477,16 @@ void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
 	atomic_or(CPUSTAT_STOPPED, &vcpu->arch.sie_block->cpuflags);
 	__disable_ibs_on_vcpu(vcpu);
 
+	rcu_read_lock();
+	vcpus = rcu_dereference(vcpu->kvm->vcpus);
+	// TODO: use kvm_for_each_vcpu
 	for (i = 0; i < online_vcpus; i++) {
-		if (!is_vcpu_stopped(vcpu->kvm->vcpus[i])) {
+		if (!is_vcpu_stopped(vcpus->array[i])) {
 			started_vcpus++;
-			started_vcpu = vcpu->kvm->vcpus[i];
+			started_vcpu = vcpus->array[i];
 		}
 	}
+	rcu_read_unlock();
 
 	if (started_vcpus == 1) {
 		/*
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index dc97f2544b6f..bf4037344729 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -111,8 +111,7 @@ static struct kvm_vcpu *get_vcpu_by_vpidx(struct kvm *kvm, u32 vpidx)
 	struct kvm_vcpu *vcpu = NULL;
 	int i;
 
-	if (vpidx < KVM_MAX_VCPUS)
-		vcpu = kvm_get_vcpu(kvm, vpidx);
+	vcpu = kvm_get_vcpu(kvm, vpidx);
 	if (vcpu && vcpu_to_hv_vcpu(vcpu)->vp_index == vpidx)
 		return vcpu;
 	kvm_for_each_vcpu(i, vcpu, kvm)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index df8d2f127508..1f492a1b64e9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11742,7 +11742,8 @@ static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 
 	if (!kvm_arch_has_assigned_device(kvm) ||
 		!irq_remapping_cap(IRQ_POSTING_CAP) ||
-		!kvm_vcpu_apicv_active(kvm->vcpus[0]))
+		// TODO: make apicv state accessible directly from struct kvm
+		!kvm_vcpu_apicv_active(kvm_get_vcpu(kvm, 0)))
 		return 0;
 
 	idx = srcu_read_lock(&kvm->irq_srcu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e10eda86bc7b..5d8af3e4eab1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8081,6 +8081,7 @@ static void kvm_free_vcpus(struct kvm *kvm)
 {
 	unsigned int i;
 	struct kvm_vcpu *vcpu;
+	struct kvm_vcpus *vcpus;
 
 	/*
 	 * Unpin any mmu pages first.
@@ -8093,8 +8094,8 @@ static void kvm_free_vcpus(struct kvm *kvm)
 		kvm_arch_vcpu_free(vcpu);
 
 	mutex_lock(&kvm->lock);
-	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
-		kvm->vcpus[i] = NULL;
+	vcpus = rcu_dereference_protected(kvm->vcpus, lockdep_is_held(&kvm->lock));
+	vcpus->online = 0;
 
 	atomic_set(&kvm->online_vcpus, 0);
 	mutex_unlock(&kvm->lock);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c8df733eed41..eb9fb5b493ac 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -386,12 +386,17 @@ struct kvm_memslots {
 	int used_slots;
 };
 
+struct kvm_vcpus {
+	u32 online;
+	struct kvm_vcpu *array[];
+};
+
 struct kvm {
 	spinlock_t mmu_lock;
 	struct mutex slots_lock;
 	struct mm_struct *mm; /* userspace tied to this vm */
 	struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
-	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+	struct kvm_vcpus *vcpus;
 
 	/*
 	 * created_vcpus is protected by kvm->lock, and is incremented
@@ -483,45 +488,71 @@ static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
 
 static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
 {
-	/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu, in case
-	 * the caller has read kvm->online_vcpus before (as is the case
-	 * for kvm_for_each_vcpu, for example).
-	 */
-	smp_rmb();
-	return kvm->vcpus[i];
+	struct kvm_vcpu *r = NULL;
+	struct kvm_vcpus *vcpus;
+
+	rcu_read_lock();
+	vcpus = rcu_dereference(kvm->vcpus);
+	if (i < vcpus->online)
+		r = vcpus->array[i];
+	// TODO: check for bounds & return NULL in that case?
+	rcu_read_unlock();
+
+	return r;
 }
 
+// This is unacceptably inefficient implementation, but rcu critical section
+// imposes limitations on preemption and we'll have to check all users before
+// converting them.
 #define kvm_for_each_vcpu(idx, vcpup, kvm) \
 	for (idx = 0; \
-	     idx < atomic_read(&kvm->online_vcpus) && \
-	     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
+	     ({struct kvm_vcpus *__vcpus; \
+	       rcu_read_lock(); \
+	       __vcpus = rcu_dereference(kvm->vcpus); \
+	       vcpup = idx < __vcpus->online ? __vcpus->array[idx] : NULL; \
+	       rcu_read_unlock(); \
+	       vcpup;}); \
 	     idx++)
 
+#define kvm_for_each_vcpu_rcu(idx, vcpup, vcpus, kvm) \
+	for (vcpus = rcu_dereference(kvm->vcpus), idx = 0; \
+	     idx < vcpus->online && (vcpup = vcpus->array[idx]); \
+	     idx++) \
+
 static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
 {
 	struct kvm_vcpu *vcpu = NULL;
+	struct kvm_vcpus *vcpus;
 	int i;
 
 	if (id < 0)
 		return NULL;
-	if (id < KVM_MAX_VCPUS)
-		vcpu = kvm_get_vcpu(kvm, id);
+	vcpu = kvm_get_vcpu(kvm, id);
 	if (vcpu && vcpu->vcpu_id == id)
 		return vcpu;
-	kvm_for_each_vcpu(i, vcpu, kvm)
-		if (vcpu->vcpu_id == id)
+	rcu_read_lock();
+	kvm_for_each_vcpu_rcu(i, vcpu, vcpus, kvm)
+		if (vcpu->vcpu_id == id) {
+			rcu_read_unlock();
 			return vcpu;
+		}
+	rcu_read_unlock();
 	return NULL;
 }
 
 static inline int kvm_vcpu_get_idx(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu *tmp;
+	struct kvm_vcpus *vcpus;
 	int idx;
 
-	kvm_for_each_vcpu(idx, tmp, vcpu->kvm)
-		if (tmp == vcpu)
+	rcu_read_lock();
+	kvm_for_each_vcpu_rcu(idx, tmp, vcpus, vcpu->kvm)
+		if (tmp == vcpu) {
+			rcu_read_unlock();
 			return idx;
+		}
+	rcu_read_unlock();
 	BUG();
 }
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index b9f68e4add71..f0774a08083d 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -174,16 +174,14 @@ int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
+	struct kvm_vcpu *vcpu;
 
 	free_percpu(kvm->arch.last_vcpu_ran);
 	kvm->arch.last_vcpu_ran = NULL;
 
-	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-		if (kvm->vcpus[i]) {
-			kvm_arch_vcpu_free(kvm->vcpus[i]);
-			kvm->vcpus[i] = NULL;
-		}
-	}
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvm_arch_vcpu_free(vcpu);
+	// other arches zeroed online here, for no apparent reason :)
 
 	kvm_vgic_destroy(kvm);
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2eac2c62795f..578285ff74a4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -640,13 +640,31 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 	return 0;
 }
 
+// TODO: preallocate some VCPUs
 static inline struct kvm *kvm_alloc_vm(void)
 {
-	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
+	struct kvm *kvm;
+	struct kvm_vcpus *vcpus;
+
+	kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
+	if (!kvm)
+		return NULL;
+
+	vcpus = kvzalloc(sizeof(*vcpus), GFP_KERNEL);
+	if (!vcpus) {
+		kfree(kvm);
+		return NULL;
+	}
+	vcpus->online = 0;
+	rcu_assign_pointer(kvm->vcpus, vcpus);
+
+	return kvm;
 }
 
 static inline void kvm_free_vm(struct kvm *kvm)
 {
+	if (kvm)
+		kvfree(rcu_dereference_protected(kvm->vcpus, 1));
 	kfree(kvm);
 }
 
@@ -2473,6 +2491,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 {
 	int r;
 	struct kvm_vcpu *vcpu;
+	struct kvm_vcpus *old, *new = NULL;
 
 	if (id >= KVM_MAX_VCPU_ID)
 		return -EINVAL;
@@ -2508,7 +2527,22 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 		goto unlock_vcpu_destroy;
 	}
 
-	BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]);
+	old = rcu_dereference_protected(kvm->vcpus, lockdep_is_held(&kvm->lock));
+
+	// TODO: make it a function for it that returns new/NULL
+	{
+		// OPTIMIZE: do not allocate every time.  Requires atomic online
+		// counter.
+		u32 size = old->online + 1;
+
+		new = kvzalloc(sizeof(*new) + size * sizeof(*new->array), GFP_KERNEL);
+		if (!new) {
+			r = -ENOMEM;
+			goto unlock_vcpu_destroy;
+		}
+		new->online = size;
+		memcpy(new->array, old->array, old->online * sizeof(*old->array));
+	}
 
 	/* Now it's all set up, let userspace reach it */
 	kvm_get_kvm(kvm);
@@ -2518,20 +2552,24 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 		goto unlock_vcpu_destroy;
 	}
 
-	kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
-
-	/*
-	 * Pairs with smp_rmb() in kvm_get_vcpu.  Write kvm->vcpus
-	 * before kvm->online_vcpu's incremented value.
-	 */
-	smp_wmb();
-	atomic_inc(&kvm->online_vcpus);
+	new->array[old->online] = vcpu;
+	rcu_assign_pointer(kvm->vcpus, new);
 
 	mutex_unlock(&kvm->lock);
+
+	// we could schedule a callback instead
+	synchronize_rcu();
+	kfree(old);
+
+	// TODO: No longer synchronizes anything in the common code.
+	// Remove if the arch-specific uses were mostly hacks.
+	atomic_inc(&kvm->online_vcpus);
+
 	kvm_arch_vcpu_postcreate(vcpu);
 	return r;
 
 unlock_vcpu_destroy:
+	kvfree(new);
 	mutex_unlock(&kvm->lock);
 	debugfs_remove_recursive(vcpu->debugfs_dentry);
 vcpu_destroy:
-- 
2.13.3


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-16 19:40 [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array Radim Krčmář
  2017-08-16 19:40 ` [PATCH RFC 1/2] KVM: remove unused __KVM_HAVE_ARCH_VM_ALLOC Radim Krčmář
  2017-08-16 19:40 ` [PATCH RFC 2/2] KVM: RCU protected dynamic vcpus array Radim Krčmář
@ 2017-08-17  7:04 ` Alexander Graf
  2017-08-17  7:36   ` Cornelia Huck
  2017-08-17 14:54   ` Radim Krčmář
  2017-08-17  7:29 ` David Hildenbrand
  3 siblings, 2 replies; 24+ messages in thread
From: Alexander Graf @ 2017-08-17  7:04 UTC (permalink / raw)
  To: Radim Krčmář,
	linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390
  Cc: Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras, Cornelia Huck,
	David Hildenbrand, Paolo Bonzini



On 16.08.17 21:40, Radim Krčmář  wrote:
> The goal is to increase KVM_MAX_VCPUS without worrying about memory
> impact of many small guests.
> 
> This is a second out of three major "dynamic" options:
>   1) size vcpu array at VM creation time
>   2) resize vcpu array when new VCPUs are created
>   3) use a lockless list/tree for VCPUs
> 
> The disadvantage of (1) is its requirement on userspace changes and
> limited flexibility because userspace must provide the maximal count on
> start.  The main advantage is that kvm->vcpus will work like it does
> now.  It has been posted as "[PATCH 0/4] KVM: add KVM_CREATE_VM2 to
> allow dynamic kvm->vcpus array",
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1377285.html
> 
> The main problem of (2), this series, is that we cannot extend the array
> in place and therefore require some kind of protection when moving it.
> RCU seems best, but it makes the code slower and harder to deal with.
> The main advantage is that we do not need userspace changes.

Creating/Destroying vcpus is not something I consider a fast path, so 
why should we optimize for it? The case that needs to be fast is execution.

What if we just sent a "vcpu move" request to all vcpus with the new 
pointer after it moved? That way the vcpu thread itself would be 
responsible for the migration to the new memory region. Only if all 
vcpus successfully moved, keep rolling (and allow foreign get_vcpu again).

That way we should be basically lock-less and scale well. For additional 
icing, feel free to increase the vcpu array x2 every time it grows to 
not run into the slow path too often.
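
A minimal sketch of the idea -- KVM_REQ_VCPUS_MOVED and vcpu->vcpus are
made up here, and when exactly the old array may be freed is the open
part:

  /* vcpu thread, before re-entering the guest */
  if (kvm_check_request(KVM_REQ_VCPUS_MOVED, vcpu))
          vcpu->vcpus = READ_ONCE(vcpu->kvm->vcpus);

  /* resizer, under kvm->lock */
  WRITE_ONCE(kvm->vcpus, new);
  kvm_make_all_cpus_request(kvm, KVM_REQ_VCPUS_MOVED);
  /* old can only be freed once every vcpu thread has switched over */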


Alex


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-16 19:40 [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array Radim Krčmář
                   ` (2 preceding siblings ...)
  2017-08-17  7:04 ` [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array Alexander Graf
@ 2017-08-17  7:29 ` David Hildenbrand
  2017-08-17  7:37   ` Cornelia Huck
  3 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2017-08-17  7:29 UTC (permalink / raw)
  To: Radim Krčmář,
	linux-kernel, kvm, linux-mips, kvm-ppc, linux-s390,
	linux-arm-kernel
  Cc: Paolo Bonzini, Christoffer Dall, Marc Zyngier,
	Christian Borntraeger, Cornelia Huck, James Hogan,
	Paul Mackerras, Alexander Graf

On 16.08.2017 21:40, Radim Krčmář wrote:
> The goal is to increase KVM_MAX_VCPUS without worrying about memory
> impact of many small guests.
> 
> This is a second out of three major "dynamic" options:
>  1) size vcpu array at VM creation time
>  2) resize vcpu array when new VCPUs are created
>  3) use a lockless list/tree for VCPUs
> 
> The disadvantage of (1) is its requirement on userspace changes and
> limited flexibility because userspace must provide the maximal count on
> start.  The main advantage is that kvm->vcpus will work like it does
> now.  It has been posted as "[PATCH 0/4] KVM: add KVM_CREATE_VM2 to
> allow dynamic kvm->vcpus array",
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1377285.html
> 
> The main problem of (2), this series, is that we cannot extend the array
> in place and therefore require some kind of protection when moving it.
> RCU seems best, but it makes the code slower and harder to deal with.
> The main advantage is that we do not need userspace changes.
> 
> The third option wasn't explored yet.  It would solve the ugly
> kvm_for_each_vcpu() of (2), but kvm_get_vcpu() would become linear.
> (We could mitigate it by having list of vcpu arrays and A lockless
>  sequentially growing "tree" would be logarithmic and not that much more
>  complicated to implement.)

That sounds interesting but also too complicated.

> 
> Which option do you think is the best?

I actually think the RCU variant doesn't look bad at all. Execution time
should be ok.

As Alex said, doubling the size every time we run out of space could be
done.

I clearly favor a solution that doesn't require user space changes.

-- 

Thanks,

David


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17  7:04 ` [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array Alexander Graf
@ 2017-08-17  7:36   ` Cornelia Huck
  2017-08-17  9:16     ` Paolo Bonzini
  2017-08-17 14:54   ` Radim Krčmář
  1 sibling, 1 reply; 24+ messages in thread
From: Cornelia Huck @ 2017-08-17  7:36 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Radim Krčmář,
	linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras, David Hildenbrand,
	Paolo Bonzini

On Thu, 17 Aug 2017 09:04:14 +0200
Alexander Graf <agraf@suse.de> wrote:

> On 16.08.17 21:40, Radim Krčmář  wrote:
> > The goal is to increase KVM_MAX_VCPUS without worrying about memory
> > impact of many small guests.
> > 
> > This is a second out of three major "dynamic" options:
> >   1) size vcpu array at VM creation time
> >   2) resize vcpu array when new VCPUs are created
> >   3) use a lockless list/tree for VCPUs
> > 
> > The disadvantage of (1) is its requirement on userspace changes and
> > limited flexibility because userspace must provide the maximal count on
> > start.  The main advantage is that kvm->vcpus will work like it does
> > now.  It has been posted as "[PATCH 0/4] KVM: add KVM_CREATE_VM2 to
> > allow dynamic kvm->vcpus array",
> > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1377285.html
> > 
> > The main problem of (2), this series, is that we cannot extend the array
> > in place and therefore require some kind of protection when moving it.
> > RCU seems best, but it makes the code slower and harder to deal with.
> > The main advantage is that we do not need userspace changes.  
> 
> Creating/Destroying vcpus is not something I consider a fast path, so 
> why should we optimize for it? The case that needs to be fast is execution.
> 
> What if we just sent a "vcpu move" request to all vcpus with the new 
> pointer after it moved? That way the vcpu thread itself would be 
> responsible for the migration to the new memory region. Only if all 
> vcpus successfully moved, keep rolling (and allow foreign get_vcpu again).
> 
> That way we should be basically lock-less and scale well. For additional 
> icing, feel free to increase the vcpu array x2 every time it grows to 
> not run into the slow path too often.

I'd prefer the rcu approach: This is a mechanism already understood
well, no need to come up with a new one that will likely have its own
share of problems.


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17  7:29 ` David Hildenbrand
@ 2017-08-17  7:37   ` Cornelia Huck
  0 siblings, 0 replies; 24+ messages in thread
From: Cornelia Huck @ 2017-08-17  7:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Radim Krčmář,
	linux-kernel, kvm, linux-mips, kvm-ppc, linux-s390,
	linux-arm-kernel, Paolo Bonzini, Christoffer Dall, Marc Zyngier,
	Christian Borntraeger, James Hogan, Paul Mackerras,
	Alexander Graf

On Thu, 17 Aug 2017 09:29:51 +0200
David Hildenbrand <david@redhat.com> wrote:

> As Alex said, doubling the size every time we run out of space could be
> done.
> 
> I clearly favor a solution that doesn't require user space changes.

+1 to both of this.


* Re: [PATCH RFC 2/2] KVM: RCU protected dynamic vcpus array
  2017-08-16 19:40 ` [PATCH RFC 2/2] KVM: RCU protected dynamic vcpus array Radim Krčmář
@ 2017-08-17  8:07   ` Cornelia Huck
  2017-08-17 11:14   ` David Hildenbrand
  1 sibling, 0 replies; 24+ messages in thread
From: Cornelia Huck @ 2017-08-17  8:07 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-kernel, kvm, linux-mips, kvm-ppc, linux-s390,
	linux-arm-kernel, Paolo Bonzini, David Hildenbrand,
	Christoffer Dall, Marc Zyngier, Christian Borntraeger,
	James Hogan, Paul Mackerras, Alexander Graf

On Wed, 16 Aug 2017 21:40:37 +0200
Radim Krčmář <rkrcmar@redhat.com> wrote:

> This is a prototype with many TODO comments to give a better idea of
> what would be needed.

Just a very superficial reading...

> 
> The main missing piece a rework of every kvm_for_each_vcpu() into a less
> inefficient loop, but RCU readers cannot block, so the rewrite cannot be
> scripted.   Is there a more suitable protection scheme?
> 
> I didn't test it much ... I am still leaning towards the internally
> simpler version, (1), even if it requires userspace changes.
> ---
>  arch/mips/kvm/mips.c       |  8 +++---
>  arch/powerpc/kvm/powerpc.c |  6 +++--
>  arch/s390/kvm/kvm-s390.c   | 27 ++++++++++++++------
>  arch/x86/kvm/hyperv.c      |  3 +--
>  arch/x86/kvm/vmx.c         |  3 ++-
>  arch/x86/kvm/x86.c         |  5 ++--
>  include/linux/kvm_host.h   | 61 ++++++++++++++++++++++++++++++++++------------
>  virt/kvm/arm/arm.c         | 10 +++-----
>  virt/kvm/kvm_main.c        | 58 +++++++++++++++++++++++++++++++++++--------
>  9 files changed, 132 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index bce2a6431430..4c9d383babe7 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -164,6 +164,7 @@ void kvm_mips_free_vcpus(struct kvm *kvm)
>  {
>  	unsigned int i;
>  	struct kvm_vcpu *vcpu;
> +	struct kvm_vcpus *vcpus;
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		kvm_arch_vcpu_free(vcpu);
> @@ -171,8 +172,9 @@ void kvm_mips_free_vcpus(struct kvm *kvm)
>  
>  	mutex_lock(&kvm->lock);
>  
> -	for (i = 0; i < atomic_read(&kvm->online_vcpus); i++)
> -		kvm->vcpus[i] = NULL;
> +	// TODO: resetting online vcpus shouldn't be needed
> +	vcpus = rcu_dereference_protected(kvm->vcpus, lockdep_is_held(&kvm->lock));
> +	vcpus->online = 0;

This seems to be a pattern used on nearly all architectures, so maybe
it was simply copied?

Iff we really need it (probably not), it seems like something that can
be done by common code.

>  
>  	atomic_set(&kvm->online_vcpus, 0);
>  
(...)
> @@ -3422,12 +3424,16 @@ void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
>  	trace_kvm_s390_vcpu_start_stop(vcpu->vcpu_id, 1);
>  	/* Only one cpu at a time may enter/leave the STOPPED state. */
>  	spin_lock(&vcpu->kvm->arch.start_stop_lock);
> -	online_vcpus = atomic_read(&vcpu->kvm->online_vcpus);
>  
> -	for (i = 0; i < online_vcpus; i++) {
> -		if (!is_vcpu_stopped(vcpu->kvm->vcpus[i]))
> +	rcu_read_lock();
> +	vcpus = rcu_dereference(vcpu->kvm->vcpus);
> +	// TODO: this pattern is kvm_for_each_vcpu
> +	for (i = 0; i < vcpus->online; i++) {
> +		if (!is_vcpu_stopped(vcpus->array[i]))
>  			started_vcpus++;
> +		// TODO: if (started_vcpus > 1) break;
>  	}
> +	rcu_read_unlock();
>  
>  	if (started_vcpus == 0) {
>  		/* we're the only active VCPU -> speed it up */
> @@ -3455,6 +3461,7 @@ void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
>  {
>  	int i, online_vcpus, started_vcpus = 0;
>  	struct kvm_vcpu *started_vcpu = NULL;
> +	struct kvm_vcpus *vcpus;
>  
>  	if (is_vcpu_stopped(vcpu))
>  		return;
> @@ -3470,12 +3477,16 @@ void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
>  	atomic_or(CPUSTAT_STOPPED, &vcpu->arch.sie_block->cpuflags);
>  	__disable_ibs_on_vcpu(vcpu);
>  
> +	rcu_read_lock();
> +	vcpus = rcu_dereference(vcpu->kvm->vcpus);
> +	// TODO: use kvm_for_each_vcpu
>  	for (i = 0; i < online_vcpus; i++) {
> -		if (!is_vcpu_stopped(vcpu->kvm->vcpus[i])) {
> +		if (!is_vcpu_stopped(vcpus->array[i])) {
>  			started_vcpus++;
> -			started_vcpu = vcpu->kvm->vcpus[i];
> +			started_vcpu = vcpus->array[i];
>  		}
>  	}
> +	rcu_read_unlock();

These two only care for two cases: 0 started cpus <-> 1 started cpu and
1 started cpu <-> 2 started cpus. Maybe it is more reasonable to track
that in the arch code instead of walking the array.
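
A rough sketch of that idea -- kvm->arch.started_vcpus below does not
exist, it would live next to the existing start_stop_lock:

  /* caller holds kvm->arch.start_stop_lock */
  static int kvm_s390_started_vcpus_inc(struct kvm *kvm)
  {
          return kvm->arch.started_vcpus++;       /* returns the old count */
  }

  static int kvm_s390_started_vcpus_dec(struct kvm *kvm)
  {
          return --kvm->arch.started_vcpus;       /* returns the new count */
  }

kvm_s390_vcpu_start() would then only check whether the old count was 0
and kvm_s390_vcpu_stop() whether the new count is 1, instead of scanning
all VCPUs.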

>  
>  	if (started_vcpus == 1) {
>  		/*
(...)
> @@ -2518,20 +2552,24 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>  		goto unlock_vcpu_destroy;
>  	}
>  
> -	kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
> -
> -	/*
> -	 * Pairs with smp_rmb() in kvm_get_vcpu.  Write kvm->vcpus
> -	 * before kvm->online_vcpu's incremented value.
> -	 */
> -	smp_wmb();
> -	atomic_inc(&kvm->online_vcpus);
> +	new->array[old->online] = vcpu;
> +	rcu_assign_pointer(kvm->vcpus, new);
>  
>  	mutex_unlock(&kvm->lock);
> +
> +	// we could schedule a callback instead
> +	synchronize_rcu();
> +	kfree(old);
> +
> +	// TODO: No longer synchronizes anything in the common code.
> +	// Remove if the arch-specific uses were mostly hacks.
> +	atomic_inc(&kvm->online_vcpus);

Much of the arch code seems to care about one of two things:
- What is the upper limit for cpu searches?
- Has at least one cpu been created?

> +
>  	kvm_arch_vcpu_postcreate(vcpu);
>  	return r;
>  
>  unlock_vcpu_destroy:
> +	kvfree(new);
>  	mutex_unlock(&kvm->lock);
>  	debugfs_remove_recursive(vcpu->debugfs_dentry);
>  vcpu_destroy:


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17  7:36   ` Cornelia Huck
@ 2017-08-17  9:16     ` Paolo Bonzini
  2017-08-17  9:28       ` Cornelia Huck
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2017-08-17  9:16 UTC (permalink / raw)
  To: Cornelia Huck, Alexander Graf
  Cc: Radim Krčmář,
	linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras, David Hildenbrand

On 17/08/2017 09:36, Cornelia Huck wrote:
>> What if we just sent a "vcpu move" request to all vcpus with the new 
>> pointer after it moved? That way the vcpu thread itself would be 
>> responsible for the migration to the new memory region. Only if all 
>> vcpus successfully moved, keep rolling (and allow foreign get_vcpu again).
>>
>> That way we should be basically lock-less and scale well. For additional 
>> icing, feel free to increase the vcpu array x2 every time it grows to 
>> not run into the slow path too often.
> 
> I'd prefer the rcu approach: This is a mechanism already understood
> well, no need to come up with a new one that will likely have its own
> share of problems.

What Alex is proposing _is_ RCU, except with a homegrown
synchronize_rcu.  Using kvm->srcu seems to be the best of both worlds.

Paolo


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17  9:16     ` Paolo Bonzini
@ 2017-08-17  9:28       ` Cornelia Huck
  2017-08-17  9:44         ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Cornelia Huck @ 2017-08-17  9:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alexander Graf, Radim Krčmář,
	linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras, David Hildenbrand

On Thu, 17 Aug 2017 11:16:59 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 17/08/2017 09:36, Cornelia Huck wrote:
> >> What if we just sent a "vcpu move" request to all vcpus with the new 
> >> pointer after it moved? That way the vcpu thread itself would be 
> >> responsible for the migration to the new memory region. Only if all 
> >> vcpus successfully moved, keep rolling (and allow foreign get_vcpu again).
> >>
> >> That way we should be basically lock-less and scale well. For additional 
> >> icing, feel free to increase the vcpu array x2 every time it grows to 
> >> not run into the slow path too often.  
> > 
> > I'd prefer the rcu approach: This is a mechanism already understood
> > well, no need to come up with a new one that will likely have its own
> > share of problems.  
> 
> What Alex is proposing _is_ RCU, except with a homegrown
> synchronize_rcu.  Using kvm->srcu seems to be the best of both worlds.

I'm worried a bit about the 'homegrown' part, though.

I also may be misunderstanding what Alex means with "vcpu move"...


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17  9:28       ` Cornelia Huck
@ 2017-08-17  9:44         ` Paolo Bonzini
  2017-08-17  9:55           ` David Hildenbrand
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2017-08-17  9:44 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alexander Graf, Radim Krčmář,
	linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras, David Hildenbrand

On 17/08/2017 11:28, Cornelia Huck wrote:
> On Thu, 17 Aug 2017 11:16:59 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 17/08/2017 09:36, Cornelia Huck wrote:
>>>> What if we just sent a "vcpu move" request to all vcpus with the new 
>>>> pointer after it moved? That way the vcpu thread itself would be 
>>>> responsible for the migration to the new memory region. Only if all 
>>>> vcpus successfully moved, keep rolling (and allow foreign get_vcpu again).
>>>>
>>>> That way we should be basically lock-less and scale well. For additional 
>>>> icing, feel free to increase the vcpu array x2 every time it grows to 
>>>> not run into the slow path too often.  
>>>
>>> I'd prefer the rcu approach: This is a mechanism already understood
>>> well, no need to come up with a new one that will likely have its own
>>> share of problems.  
>>
>> What Alex is proposing _is_ RCU, except with a homegrown
>> synchronize_rcu.  Using kvm->srcu seems to be the best of both worlds.
> 
> I'm worried a bit about the 'homegrown' part, though.

I agree, that's why I'm suggesting SRCU instead.  But it's a trick that
has its uses.  For example, if you were only doing reads from a work
queue, flush_work_queue could be used as the "homegrown
synchronize_rcu".  In KVM you might use kvm_make_all_cpus_request, I guess.

> I also may be misunderstanding what Alex means with "vcpu move"...

My interpretation was "resizing the array" (so it moves in memory).

Paolo


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17  9:44         ` Paolo Bonzini
@ 2017-08-17  9:55           ` David Hildenbrand
  2017-08-17 10:18             ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2017-08-17  9:55 UTC (permalink / raw)
  To: Paolo Bonzini, Cornelia Huck
  Cc: Alexander Graf, Radim Krčmář,
	linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras

On 17.08.2017 11:44, Paolo Bonzini wrote:
> On 17/08/2017 11:28, Cornelia Huck wrote:
>> On Thu, 17 Aug 2017 11:16:59 +0200
>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>>> On 17/08/2017 09:36, Cornelia Huck wrote:
>>>>> What if we just sent a "vcpu move" request to all vcpus with the new 
>>>>> pointer after it moved? That way the vcpu thread itself would be 
>>>>> responsible for the migration to the new memory region. Only if all 
>>>>> vcpus successfully moved, keep rolling (and allow foreign get_vcpu again).
>>>>>
>>>>> That way we should be basically lock-less and scale well. For additional 
>>>>> icing, feel free to increase the vcpu array x2 every time it grows to 
>>>>> not run into the slow path too often.  
>>>>
>>>> I'd prefer the rcu approach: This is a mechanism already understood
>>>> well, no need to come up with a new one that will likely have its own
>>>> share of problems.  
>>>
>>> What Alex is proposing _is_ RCU, except with a homegrown
>>> synchronize_rcu.  Using kvm->srcu seems to be the best of both worlds.
>>
>> I'm worried a bit about the 'homegrown' part, though.
> 
> I agree, that's why I'm suggesting SRCU instead.  But it's a trick that
> has its uses.  For example, if you were only doing reads from a work
> queue, flush_work_queue could be used as the "homegrown
> synchronize_rcu".  In KVM you might use kvm_make_all_cpus_request, I guess.
> 
>> I also may be misunderstanding what Alex means with "vcpu move"...
> 
> My interpretation was "resizing the array" (so it moves in memory).
> 
> Paolo
> 

Unpopular opinion: Let's keep it simple first (straight rcu) and
optimize later on.

-- 

Thanks,

David


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17  9:55           ` David Hildenbrand
@ 2017-08-17 10:18             ` Paolo Bonzini
  2017-08-17 10:20               ` David Hildenbrand
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2017-08-17 10:18 UTC (permalink / raw)
  To: David Hildenbrand, Cornelia Huck
  Cc: Alexander Graf, Radim Krčmář,
	linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras

On 17/08/2017 11:55, David Hildenbrand wrote:
> On 17.08.2017 11:44, Paolo Bonzini wrote:
>> On 17/08/2017 11:28, Cornelia Huck wrote:
>>> On Thu, 17 Aug 2017 11:16:59 +0200
>>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>>> On 17/08/2017 09:36, Cornelia Huck wrote:
>>>>>> What if we just sent a "vcpu move" request to all vcpus with the new 
>>>>>> pointer after it moved? That way the vcpu thread itself would be 
>>>>>> responsible for the migration to the new memory region. Only if all 
>>>>>> vcpus successfully moved, keep rolling (and allow foreign get_vcpu again).
>>>>>>
>>>>>> That way we should be basically lock-less and scale well. For additional 
>>>>>> icing, feel free to increase the vcpu array x2 every time it grows to 
>>>>>> not run into the slow path too often.  
>>>>>
>>>>> I'd prefer the rcu approach: This is a mechanism already understood
>>>>> well, no need to come up with a new one that will likely have its own
>>>>> share of problems.  
>>>>
>>>> What Alex is proposing _is_ RCU, except with a homegrown
>>>> synchronize_rcu.  Using kvm->srcu seems to be the best of both worlds.
>>>
>>> I'm worried a bit about the 'homegrown' part, though.
>>
>> I agree, that's why I'm suggesting SRCU instead.  But it's a trick that
>> has its uses.  For example, if you were only doing reads from a work
>> queue, flush_work_queue could be used as the "homegrown
>> synchronize_rcu".  In KVM you might use kvm_make_all_cpus_request, I guess.
>>
>>> I also may be misunderstanding what Alex means with "vcpu move"...
>>
>> My interpretation was "resizing the array" (so it moves in memory).
> 
> Unpopular opinion: Let's keep it simple first (straight rcu) and
> optimize later on.

RCU vs. SRCU is about correctness, not optimization...

Paolo


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17 10:18             ` Paolo Bonzini
@ 2017-08-17 10:20               ` David Hildenbrand
  2017-08-17 10:23                 ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2017-08-17 10:20 UTC (permalink / raw)
  To: Paolo Bonzini, Cornelia Huck
  Cc: Alexander Graf, Radim Krčmář,
	linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras

On 17.08.2017 12:18, Paolo Bonzini wrote:
> On 17/08/2017 11:55, David Hildenbrand wrote:
>> On 17.08.2017 11:44, Paolo Bonzini wrote:
>>> On 17/08/2017 11:28, Cornelia Huck wrote:
>>>> On Thu, 17 Aug 2017 11:16:59 +0200
>>>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>
>>>>> On 17/08/2017 09:36, Cornelia Huck wrote:
>>>>>>> What if we just sent a "vcpu move" request to all vcpus with the new 
>>>>>>> pointer after it moved? That way the vcpu thread itself would be 
>>>>>>> responsible for the migration to the new memory region. Only if all 
>>>>>>> vcpus successfully moved, keep rolling (and allow foreign get_vcpu again).
>>>>>>>
>>>>>>> That way we should be basically lock-less and scale well. For additional 
>>>>>>> icing, feel free to increase the vcpu array x2 every time it grows to 
>>>>>>> not run into the slow path too often.  
>>>>>>
>>>>>> I'd prefer the rcu approach: This is a mechanism already understood
>>>>>> well, no need to come up with a new one that will likely have its own
>>>>>> share of problems.  
>>>>>
>>>>> What Alex is proposing _is_ RCU, except with a homegrown
>>>>> synchronize_rcu.  Using kvm->srcu seems to be the best of both worlds.
>>>>
>>>> I'm worried a bit about the 'homegrown' part, though.
>>>
>>> I agree, that's why I'm suggesting SRCU instead.  But it's a trick that
>>> has its uses.  For example, if you were only doing reads from a work
>>> queue, flush_work_queue could be used as the "homegrown
>>> synchronize_rcu".  In KVM you might use kvm_make_all_cpus_request, I guess.
>>>
>>>> I also may be misunderstanding what Alex means with "vcpu move"...
>>>
>>> My interpretation was "resizing the array" (so it moves in memory).
>>
>> Unpopular opinion: Let's keep it simple first (straight rcu) and
>> optimize later on.
> 
> RCU vs. SRCU is about correctness, not optimization...
> 
> Paolo
> 

Guess I am still missing the point why RCU cannot be used here.

-- 

Thanks,

David


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17 10:20               ` David Hildenbrand
@ 2017-08-17 10:23                 ` Paolo Bonzini
  2017-08-17 10:31                   ` David Hildenbrand
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2017-08-17 10:23 UTC (permalink / raw)
  To: David Hildenbrand, Cornelia Huck
  Cc: Alexander Graf, Radim Krčmář,
	linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras

On 17/08/2017 12:20, David Hildenbrand wrote:
> On 17.08.2017 12:18, Paolo Bonzini wrote:
>> On 17/08/2017 11:55, David Hildenbrand wrote:
>>> On 17.08.2017 11:44, Paolo Bonzini wrote:
>>>> On 17/08/2017 11:28, Cornelia Huck wrote:
>>>>> On Thu, 17 Aug 2017 11:16:59 +0200
>>>>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>
>>>>>> On 17/08/2017 09:36, Cornelia Huck wrote:
>>>>>>>> What if we just sent a "vcpu move" request to all vcpus with the new 
>>>>>>>> pointer after it moved? That way the vcpu thread itself would be 
>>>>>>>> responsible for the migration to the new memory region. Only if all 
>>>>>>>> vcpus successfully moved, keep rolling (and allow foreign get_vcpu again).
>>>>>>>>
>>>>>>>> That way we should be basically lock-less and scale well. For additional 
>>>>>>>> icing, feel free to increase the vcpu array x2 every time it grows to 
>>>>>>>> not run into the slow path too often.  
>>>>>>>
>>>>>>> I'd prefer the rcu approach: This is a mechanism already understood
>>>>>>> well, no need to come up with a new one that will likely have its own
>>>>>>> share of problems.  
>>>>>>
>>>>>> What Alex is proposing _is_ RCU, except with a homegrown
>>>>>> synchronize_rcu.  Using kvm->srcu seems to be the best of both worlds.
>>>>>
>>>>> I'm worried a bit about the 'homegrown' part, though.
>>>>
>>>> I agree, that's why I'm suggesting SRCU instead.  But it's a trick that
>>>> has its uses.  For example, if you were only doing reads from a work
>>>> queue, flush_work_queue could be used as the "homegrown
>>>> synchronize_rcu".  In KVM you might use kvm_make_all_cpus_request, I guess.
>>>>
>>>>> I also may be misunderstanding what Alex means with "vcpu move"...
>>>>
>>>> My interpretation was "resizing the array" (so it moves in memory).
>>>
>>> Unpopular opinion: Let's keep it simple first (straight rcu) and
>>> optimize later on.
>>
>> RCU vs. SRCU is about correctness, not optimization...
> 
> Guess I am still missing the point why RCU cannot be used here.

Because the body of kvm_for_each_vcpu might sleep.
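
For example, with the kvm_vcpus layout from this series, an
SRCU-protected walk could look roughly like this (kvm->srcu already
exists; the loop body may sleep, which rcu_read_lock() would not
allow):

  int idx, i;
  struct kvm_vcpus *vcpus;
  struct kvm_vcpu *vcpu;

  idx = srcu_read_lock(&kvm->srcu);
  vcpus = srcu_dereference(kvm->vcpus, &kvm->srcu);
  for (i = 0; i < vcpus->online; i++) {
          vcpu = vcpus->array[i];
          mutex_lock(&vcpu->mutex);       /* sleeping is fine under SRCU */
          mutex_unlock(&vcpu->mutex);
  }
  srcu_read_unlock(&kvm->srcu, idx);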

Paolo


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17 10:23                 ` Paolo Bonzini
@ 2017-08-17 10:31                   ` David Hildenbrand
  0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2017-08-17 10:31 UTC (permalink / raw)
  To: Paolo Bonzini, Cornelia Huck
  Cc: Alexander Graf, Radim Krčmář,
	linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras

On 17.08.2017 12:23, Paolo Bonzini wrote:
> On 17/08/2017 12:20, David Hildenbrand wrote:
>> On 17.08.2017 12:18, Paolo Bonzini wrote:
>>> On 17/08/2017 11:55, David Hildenbrand wrote:
>>>> On 17.08.2017 11:44, Paolo Bonzini wrote:
>>>>> On 17/08/2017 11:28, Cornelia Huck wrote:
>>>>>> On Thu, 17 Aug 2017 11:16:59 +0200
>>>>>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>>
>>>>>>> On 17/08/2017 09:36, Cornelia Huck wrote:
>>>>>>>>> What if we just sent a "vcpu move" request to all vcpus with the new 
>>>>>>>>> pointer after it moved? That way the vcpu thread itself would be 
>>>>>>>>> responsible for the migration to the new memory region. Only if all 
>>>>>>>>> vcpus successfully moved, keep rolling (and allow foreign get_vcpu again).
>>>>>>>>>
>>>>>>>>> That way we should be basically lock-less and scale well. For additional 
>>>>>>>>> icing, feel free to increase the vcpu array x2 every time it grows to 
>>>>>>>>> not run into the slow path too often.  
>>>>>>>>
>>>>>>>> I'd prefer the rcu approach: This is a mechanism already understood
>>>>>>>> well, no need to come up with a new one that will likely have its own
>>>>>>>> share of problems.  
>>>>>>>
>>>>>>> What Alex is proposing _is_ RCU, except with a homegrown
>>>>>>> synchronize_rcu.  Using kvm->srcu seems to be the best of both worlds.
>>>>>>
>>>>>> I'm worried a bit about the 'homegrown' part, though.
>>>>>
>>>>> I agree, that's why I'm suggesting SRCU instead.  But it's a trick that
>>>>> has its uses.  For example, if you were only doing reads from a work
>>>>> queue, flush_work_queue could be used as the "homegrown
>>>>> synchronize_rcu".  In KVM you might use kvm_make_all_cpus_request, I guess.
>>>>>
>>>>>> I also may be misunderstanding what Alex means with "vcpu move"...
>>>>>
>>>>> My interpretation was "resizing the array" (so it moves in memory).
>>>>
>>>> Unpopular opinion: Let's keep it simple first (straight rcu) and
>>>> optimize later on.
>>>
>>> RCU vs. SRCU is about correctness, not optimization...
>>
>> Guess I am still missing the point why RCU cannot be used here.
> 
> Because the body of kvm_foreach_vcpu might sleep.
> 

Thanks, now I get it, then of course srcu is the right thing to do.

> Paolo
> 


-- 

Thanks,

David


* Re: [PATCH RFC 2/2] KVM: RCU protected dynamic vcpus array
  2017-08-16 19:40 ` [PATCH RFC 2/2] KVM: RCU protected dynamic vcpus array Radim Krčmář
  2017-08-17  8:07   ` Cornelia Huck
@ 2017-08-17 11:14   ` David Hildenbrand
  2017-08-17 16:50     ` Radim Krčmář
  1 sibling, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2017-08-17 11:14 UTC (permalink / raw)
  To: Radim Krčmář,
	linux-kernel, kvm, linux-mips, kvm-ppc, linux-s390,
	linux-arm-kernel
  Cc: Paolo Bonzini, Christoffer Dall, Marc Zyngier,
	Christian Borntraeger, Cornelia Huck, James Hogan,
	Paul Mackerras, Alexander Graf


>  	atomic_set(&kvm->online_vcpus, 0);
>  	mutex_unlock(&kvm->lock);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c8df733eed41..eb9fb5b493ac 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -386,12 +386,17 @@ struct kvm_memslots {
>  	int used_slots;
>  };
>  
> +struct kvm_vcpus {
> +	u32 online;
> +	struct kvm_vcpu *array[];

One option could be to simply chunk it:

+struct kvm_vcpus {
+       struct kvm_vcpu *vcpus[32];
+};
+
 /*
  * Note:
  * memslots are not sorted by id anymore, please use id_to_memslot()
@@ -391,7 +395,7 @@ struct kvm {
        struct mutex slots_lock;
        struct mm_struct *mm; /* userspace tied to this vm */
        struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
-       struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+       struct kvm_vcpus *vcpus[(KVM_MAX_VCPUS + 31) / 32];

        /*
         * created_vcpus is protected by kvm->lock, and is incremented
@@ -483,12 +487,14 @@ static inline struct kvm_io_bus
*kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)


1. make nobody access kvm->vcpus directly (factor out)
2. allocate next chunk if necessary when creating a VCPU and store
pointer using WRITE_ONCE
3. use READ_ONCE to test for availability of the current chunk

kvm_for_each_vcpu just has to use READ_ONCE to access/test for the right
chunk. Pointers never become invalid. No RCU needed. Sleeping in the loop
is possible.
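
Roughly, with the chunk layout and names just for illustration:

  #define KVM_VCPU_CHUNK_SHIFT    5       /* 32 VCPUs per chunk */
  #define KVM_VCPU_CHUNK_SIZE     (1 << KVM_VCPU_CHUNK_SHIFT)

  struct kvm_vcpu_chunk {
          struct kvm_vcpu *vcpus[KVM_VCPU_CHUNK_SIZE];
  };

  /*
   * in struct kvm, replacing the flat array:
   * struct kvm_vcpu_chunk *vcpu_chunks[DIV_ROUND_UP(KVM_MAX_VCPUS,
   *                                                 KVM_VCPU_CHUNK_SIZE)];
   */

  static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
  {
          /* chunks are installed under kvm->lock and never freed or moved */
          struct kvm_vcpu_chunk *c =
                  READ_ONCE(kvm->vcpu_chunks[i >> KVM_VCPU_CHUNK_SHIFT]);

          if (!c)
                  return NULL;
          return READ_ONCE(c->vcpus[i & (KVM_VCPU_CHUNK_SIZE - 1)]);
  }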

-- 

Thanks,

David


* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17  7:04 ` [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array Alexander Graf
  2017-08-17  7:36   ` Cornelia Huck
@ 2017-08-17 14:54   ` Radim Krčmář
  2017-08-17 19:17     ` Alexander Graf
  1 sibling, 1 reply; 24+ messages in thread
From: Radim Krčmář @ 2017-08-17 14:54 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras, Cornelia Huck,
	David Hildenbrand, Paolo Bonzini

2017-08-17 09:04+0200, Alexander Graf:
> On 16.08.17 21:40, Radim Krčmář  wrote:
> > The goal is to increase KVM_MAX_VCPUS without worrying about memory
> > impact of many small guests.
> > 
> > This is a second out of three major "dynamic" options:
> >   1) size vcpu array at VM creation time
> >   2) resize vcpu array when new VCPUs are created
> >   3) use a lockless list/tree for VCPUs
> > 
> > The disadvantage of (1) is its requirement on userspace changes and
> > limited flexibility because userspace must provide the maximal count on
> > start.  The main advantage is that kvm->vcpus will work like it does
> > now.  It has been posted as "[PATCH 0/4] KVM: add KVM_CREATE_VM2 to
> > allow dynamic kvm->vcpus array",
> > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1377285.html
> > 
> > The main problem of (2), this series, is that we cannot extend the array
> > in place and therefore require some kind of protection when moving it.
> > RCU seems best, but it makes the code slower and harder to deal with.
> > The main advantage is that we do not need userspace changes.
> 
> Creating/Destroying vcpus is not something I consider a fast path, so why
> should we optimize for it? The case that needs to be fast is execution.

Right, the creation is not important.  I was concerned about the use of
lock() and unlock() needed for every access -- both in performance and
code, because the common case where hotplug doesn't happen and all VCPUs
are created upfront doesn't even need any runtime protection.

> What if we just sent a "vcpu move" request to all vcpus with the new pointer
> after it moved? That way the vcpu thread itself would be responsible for the
> migration to the new memory region. Only if all vcpus successfully moved,
> keep rolling (and allow foreign get_vcpu again).

I'm not sure if I understood this.  You propose to cache kvm->vcpus in
vcpu->vcpus and do an extension of this,

  int vcpu_create(...) {
    if (resize_needed(kvm->vcpus)) {
      old_vcpus = kvm->vcpus
      kvm->vcpus = make_bigger(kvm->vcpus)
      kvm_make_all_cpus_request(kvm, KVM_REQ_UPDATE_VCPUS)
      free(old_vcpus)
    }
    vcpu->vcpus = kvm->vcpus
  }

with added extra locking, (S)RCU, on accesses that do not come from
VCPUs (irqfd and VM ioctl)?

> That way we should be basically lock-less and scale well. For additional
> icing, feel free to increase the vcpu array x2 every time it grows to not
> run into the slow path too often.

Yeah, I skipped the growing as it was not necessary for the
illustration.


* Re: [PATCH RFC 2/2] KVM: RCU protected dynamic vcpus array
  2017-08-17 11:14   ` David Hildenbrand
@ 2017-08-17 16:50     ` Radim Krčmář
  2017-08-17 16:54       ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Radim Krčmář @ 2017-08-17 16:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, kvm, linux-mips, kvm-ppc, linux-s390,
	linux-arm-kernel, Paolo Bonzini, Christoffer Dall, Marc Zyngier,
	Christian Borntraeger, Cornelia Huck, James Hogan,
	Paul Mackerras, Alexander Graf

2017-08-17 13:14+0200, David Hildenbrand:
> >  	atomic_set(&kvm->online_vcpus, 0);
> >  	mutex_unlock(&kvm->lock);
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index c8df733eed41..eb9fb5b493ac 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -386,12 +386,17 @@ struct kvm_memslots {
> >  	int used_slots;
> >  };
> >  
> > +struct kvm_vcpus {
> > +	u32 online;
> > +	struct kvm_vcpu *array[];
> 
> One option could be to simply chunk it:
> 
> +struct kvm_vcpus {
> +       struct kvm_vcpu vcpus[32];

I'm thinking of 128/256.

> +};
> +
>  /*
>   * Note:
>   * memslots are not sorted by id anymore, please use id_to_memslot()
> @@ -391,7 +395,7 @@ struct kvm {
>         struct mutex slots_lock;
>         struct mm_struct *mm; /* userspace tied to this vm */
>         struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
> -       struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
> +       struct kvm_vcpus vcpus[(KVM_MAX_VCPUS + 31) / 32];
>         /*
>          * created_vcpus is protected by kvm->lock, and is incremented
> @@ -483,12 +487,14 @@ static inline struct kvm_io_bus
> *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
> 
> 
> 1. make nobody access kvm->vcpus directly (factor out)
> 2. allocate next chunk if necessary when creating a VCPU and store
> pointer using WRITE_ONCE
> 3. use READ_ONCE to test for availability of the current chunk

We can also use kvm->online_vcpus exactly like we do now.

> kvm_for_each_vcpu just has to use READ_ONCE to access/test for the right
> chunk. Pointers never get invalid. No RCU needed. Sleeping in the loop
> is possible.

I like this better than SRCU because it keeps the internal code mostly
intact, even though it is a compromise solution with a tunable.
(SRCU gives us more protection than we need.)

It is very similar to (3).  Chunks will usually take more memory and have
slower access to the first chunk than an extendable list would, but are
faster from the third chunk onward, which is a reasonable deal.

We are just postponing the problem until a higher number of VCPUs, but
then we can extend it into more levels using the same principle and I
think it will still give better trade-offs than SRCU.
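
The iteration side could then stay very close to today's macro, e.g.
(sketch only; chunked_get_vcpu() stands for a hypothetical chunk-aware
lookup that does one READ_ONCE per level):

  #define chunked_for_each_vcpu(idx, vcpup, kvm) \
          for ((idx) = 0; \
               (idx) < atomic_read(&(kvm)->online_vcpus) && \
               ((vcpup) = chunked_get_vcpu(kvm, (idx))) != NULL; \
               (idx)++)

Callers would look exactly like kvm_for_each_vcpu() users do now, and
because chunks are never freed, sleeping inside the loop stays safe.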

I'd do this for v2,

thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC 2/2] KVM: RCU protected dynamic vcpus array
  2017-08-17 16:50     ` Radim Krčmář
@ 2017-08-17 16:54       ` Paolo Bonzini
  0 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2017-08-17 16:54 UTC (permalink / raw)
  To: Radim Krčmář, David Hildenbrand
  Cc: linux-kernel, kvm, linux-mips, kvm-ppc, linux-s390,
	linux-arm-kernel, Christoffer Dall, Marc Zyngier,
	Christian Borntraeger, Cornelia Huck, James Hogan,
	Paul Mackerras, Alexander Graf

On 17/08/2017 18:50, Radim Krčmář wrote:
> 2017-08-17 13:14+0200, David Hildenbrand:
>>>  	atomic_set(&kvm->online_vcpus, 0);
>>>  	mutex_unlock(&kvm->lock);
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index c8df733eed41..eb9fb5b493ac 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -386,12 +386,17 @@ struct kvm_memslots {
>>>  	int used_slots;
>>>  };
>>>  
>>> +struct kvm_vcpus {
>>> +	u32 online;
>>> +	struct kvm_vcpu *array[];
>>
>> On option could be to simply chunk it:
>>
>> +struct kvm_vcpus {
>> +       struct kvm_vcpu vcpus[32];
> 
> I'm thinking of 128/256.
> 
>> +};
>> +
>>  /*
>>   * Note:
>>   * memslots are not sorted by id anymore, please use id_to_memslot()
>> @@ -391,7 +395,7 @@ struct kvm {
>>         struct mutex slots_lock;
>>         struct mm_struct *mm; /* userspace tied to this vm */
>>         struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
>> -       struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
>> +       struct kvm_vcpus vcpus[(KVM_MAX_VCPUS + 31) / 32];
>>         /*
>>          * created_vcpus is protected by kvm->lock, and is incremented
>> @@ -483,12 +487,14 @@ static inline struct kvm_io_bus
>> *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
>>
>>
>> 1. make nobody access kvm->vcpus directly (factor out)
>> 2. allocate next chunk if necessary when creating a VCPU and store
>> pointer using WRITE_ONCE
>> 3. use READ_ONCE to test for availability of the current chunk
> 
> We can also use kvm->online_vcpus exactly like we did now.
> 
>> kvm_for_each_vcpu just has to use READ_ONCE to access/test for the right
>> chunk. Pointers never get invalid. No RCU needed. Sleeping in the loop
>> is possible.
> 
> I like this better than SRCU because it keeps the internal code mostly
> intact, even though it is compromise solution with a tunable.
> (SRCU gives us more protection than we need.)
>
> I'd do this for v2,

Sounds good!

Paolo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17 14:54   ` Radim Krčmář
@ 2017-08-17 19:17     ` Alexander Graf
  2017-08-18 14:10       ` Radim Krčmář
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander Graf @ 2017-08-17 19:17 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras, Cornelia Huck,
	David Hildenbrand, Paolo Bonzini



On 17.08.17 16:54, Radim Krčmář wrote:
> 2017-08-17 09:04+0200, Alexander Graf:
>> On 16.08.17 21:40, Radim Krčmář  wrote:
>>> The goal is to increase KVM_MAX_VCPUS without worrying about memory
>>> impact of many small guests.
>>>
>>> This is a second out of three major "dynamic" options:
>>>    1) size vcpu array at VM creation time
>>>    2) resize vcpu array when new VCPUs are created
>>>    3) use a lockless list/tree for VCPUs
>>>
>>> The disadvantage of (1) is its requirement on userspace changes and
>>> limited flexibility because userspace must provide the maximal count on
>>> start.  The main advantage is that kvm->vcpus will work like it does
>>> now.  It has been posted as "[PATCH 0/4] KVM: add KVM_CREATE_VM2 to
>>> allow dynamic kvm->vcpus array",
>>> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1377285.html
>>>
>>> The main problem of (2), this series, is that we cannot extend the array
>>> in place and therefore require some kind of protection when moving it.
>>> RCU seems best, but it makes the code slower and harder to deal with.
>>> The main advantage is that we do not need userspace changes.
>>
>> Creating/Destroying vcpus is not something I consider a fast path, so why
>> should we optimize for it? The case that needs to be fast is execution.
> 
> Right, the creation is not important.  I was concerned about the use of
> lock() and unlock() needed for every access -- both in performance and
> code, because the common case where hotplug doesn't happen and all VCPUs
> are created upfront doesn't even need any runtime protection.
> 
>> What if we just sent a "vcpu move" request to all vcpus with the new pointer
>> after it moved? That way the vcpu thread itself would be responsible for the
>> migration to the new memory region. Only if all vcpus successfully moved,
>> keep rolling (and allow foreign get_vcpu again).
> 
> I'm not sure if I understood this.  You propose to cache kvm->vcpus in
> vcpu->vcpus and do an extensions of this,
> 
>    int vcpu_create(...) {
>      if (resize_needed(kvm->vcpus)) {
>        old_vcpus = kvm->vcpus
>        kvm->vcpus = make_bigger(kvm->vcpus)

if (kvm->vcpus != old_vcpus) :)

>        kvm_make_all_cpus_request(kvm, KVM_REQ_UPDATE_VCPUS)

IIRC you'd need some manual bookkeeping to ensure that all users have 
switched to the new array. Or set the KVM_REQUEST_WAIT flag :).

>        free(old_vcpus)
>      }
>      vcpu->vcpus = kvm->vcpus
>    }
> 
> with added extra locking, (S)RCU, on accesses that do not come from
> VCPUs (irqfd and VM ioctl)?

Well, in an ideal world we wouldn't have any users of vcpu structs 
outside of the vcpus obviously. Every time we do, we should either 
reconsider whether the design is smart and if we think it is, protect 
them accordingly. Maybe even hard code separate request mechanisms for 
the few cases where it's reasonable?

> 
>> That way we should be basically lock-less and scale well. For additional
>> icing, feel free to increase the vcpu array x2 every time it grows to not
>> run into the slow path too often.
> 
> Yeah, I skipped the growing as it was not necessary for the
> illustration.

Sure.

I'm also not fully advocating my solution here, but wanted to make sure 
we have it on the radar. I *think* this option has the least runtime 
overhead and best readability score, as it sticks to the same frameworks 
we already have and use throughout the code base ;).

That said, I'd love to get proven wrong.


Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-17 19:17     ` Alexander Graf
@ 2017-08-18 14:10       ` Radim Krčmář
  2017-08-18 14:22         ` Marc Zyngier
  0 siblings, 1 reply; 24+ messages in thread
From: Radim Krčmář @ 2017-08-18 14:10 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Marc Zyngier, Christian Borntraeger, James Hogan,
	Christoffer Dall, Paul Mackerras, Cornelia Huck,
	David Hildenbrand, Paolo Bonzini

2017-08-17 21:17+0200, Alexander Graf:
> On 17.08.17 16:54, Radim Krčmář wrote:
> > 2017-08-17 09:04+0200, Alexander Graf:
> > > What if we just sent a "vcpu move" request to all vcpus with the new pointer
> > > after it moved? That way the vcpu thread itself would be responsible for the
> > > migration to the new memory region. Only if all vcpus successfully moved,
> > > keep rolling (and allow foreign get_vcpu again).
> > 
> > I'm not sure if I understood this.  You propose to cache kvm->vcpus in
> > vcpu->vcpus and do an extensions of this,
> > 
> >    int vcpu_create(...) {
> >      if (resize_needed(kvm->vcpus)) {
> >        old_vcpus = kvm->vcpus
> >        kvm->vcpus = make_bigger(kvm->vcpus)
> 
> if (kvm->vcpus != old_vcpus) :)
> 
> >        kvm_make_all_cpus_request(kvm, KVM_REQ_UPDATE_VCPUS)
> 
> IIRC you'd need some manual bookkeeping to ensure that all users have
> switched to the new array. Or set the KVM_REQUEST_WAIT flag :).

Absolutely.  I was thinking about synchronous execution, which might
need extra work to expedite halted VCPUs.  Letting the last user free it
is plausible and would need more protection against races.

> >        free(old_vcpus)
> >      }
> >      vcpu->vcpus = kvm->vcpus
> >    }
> > 
> > with added extra locking, (S)RCU, on accesses that do not come from
> > VCPUs (irqfd and VM ioctl)?
> 
> Well, in an ideal world we wouldn't have any users to vcpu structs outside
> of the vcpus obviously. Every time we do, we should either reconsider
> whether the design is smart and if we think it is, protect them accordingly.

And there would be no linear access to all VCPUs. :)

The main user of kvm->vcpus is kvm_for_each_vcpu(), which is well suited
for a list, so we can change the design of kvm_for_each_vcpu() to use a
list head in struct kvm_vcpu with head/tail in struct kvm.
(The list is trivial to make lockless as we only append.)

This would allow more flexibility with the remaining uses.
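
Roughly (sketch only; vcpus_next, vcpus_head and vcpus_tail are
hypothetical fields, with appends serialized by kvm->lock as VCPU
creation already is):

  static void vcpu_list_append(struct kvm *kvm, struct kvm_vcpu *vcpu)
  {
          vcpu->vcpus_next = NULL;
          if (!kvm->vcpus_head)
                  /* release so readers see a fully initialized vcpu */
                  smp_store_release(&kvm->vcpus_head, vcpu);
          else
                  smp_store_release(&kvm->vcpus_tail->vcpus_next, vcpu);
          kvm->vcpus_tail = vcpu;
  }

  #define vcpu_list_for_each(vcpu, kvm) \
          for ((vcpu) = smp_load_acquire(&(kvm)->vcpus_head); (vcpu); \
               (vcpu) = smp_load_acquire(&(vcpu)->vcpus_next))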

> Maybe even hard code separate request mechanisms for the few cases where
> it's reasonable?

All non-kvm_for_each_vcpu() uses seem to need access outside of VCPU scope.

We have a few awkward accesses that can be handled by keeping track of kvm
state, and all remaining uses need some kind of "int -> struct kvm_vcpu"
mapping, where the integer is arbitrary.

All users of kvm_get_vcpu_by_id() need a vcpu_id mapping, but hijack
kvm->vcpus for O(1) access if lucky, with fallback to
kvm_for_each_vcpu().  Adding a vcpu_id mapping seems reasonable.
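
A minimal sketch of such a mapping (hypothetical fields; an idr keyed by
vcpu_id would work just as well):

  #include <linux/hashtable.h>

  /* hypothetical additions:
   *      struct kvm      { ...; DECLARE_HASHTABLE(vcpu_by_id, 7); };
   *      struct kvm_vcpu { ...; struct hlist_node id_node; };
   * entries would be hash_add()ed at VCPU creation under kvm->lock.
   */

  static struct kvm_vcpu *example_get_vcpu_by_id(struct kvm *kvm, int id)
  {
          struct kvm_vcpu *vcpu;

          hash_for_each_possible(kvm->vcpu_by_id, vcpu, id_node, id)
                  if (vcpu->vcpu_id == id)
                          return vcpu;
          return NULL;
  }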

s390 __floating_irq_kick() and x86 kvm_irq_delivery_to_apic() keep a
bitmap of kvm->vcpus indices.  They want compact indices, which cannot
be provided by a vcpu_id mapping.
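
(For illustration only, the pattern is roughly the following; this is
not the actual s390/x86 code:)

  static void example_kick_marked(struct kvm *kvm, unsigned long *dest_bitmap)
  {
          struct kvm_vcpu *vcpu;
          int i;

          /* dest_bitmap is indexed by the VCPU's position in kvm->vcpus */
          kvm_for_each_vcpu(i, vcpu, kvm)
                  if (test_bit(i, dest_bitmap))
                          kvm_vcpu_kick(vcpu);
  }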

I think that MIPS and ARM use the index in kvm->vcpus for userspace
communication, which looks dangerous as userspace shouldn't know the
position.  Not much we can do because of that.

> > > That way we should be basically lock-less and scale well. For additional
> > > icing, feel free to increase the vcpu array x2 every time it grows to not
> > > run into the slow path too often.
> > 
> > Yeah, I skipped the growing as it was not necessary for the
> > illustration.
> 
> Sure.
> 
> I'm also not fully advocating my solution here, but wanted to make sure we
> have it on the radar. I *think* this option has the least runtime overhead
> and best readability score, as it sticks to the same frameworks we already
> have and use throughout the code base ;).
> 
> That said, I'd love to get proven wrong.

Unless I missed some uses, the linked list for kvm_for_each_vcpu() and a
use-case specific protection for the rest look better ... kvm->vcpus is
terribly overloaded.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array
  2017-08-18 14:10       ` Radim Krčmář
@ 2017-08-18 14:22         ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2017-08-18 14:22 UTC (permalink / raw)
  To: Radim Krčmář, Alexander Graf
  Cc: linux-mips, linux-arm-kernel, kvm, kvm-ppc, linux-kernel,
	linux-s390, Christian Borntraeger, James Hogan, Christoffer Dall,
	Paul Mackerras, Cornelia Huck, David Hildenbrand, Paolo Bonzini

On 18/08/17 15:10, Radim Krčmář wrote:
> 2017-08-17 21:17+0200, Alexander Graf:
>> On 17.08.17 16:54, Radim Krčmář wrote:
>>> 2017-08-17 09:04+0200, Alexander Graf:
>>>> What if we just sent a "vcpu move" request to all vcpus with the new pointer
>>>> after it moved? That way the vcpu thread itself would be responsible for the
>>>> migration to the new memory region. Only if all vcpus successfully moved,
>>>> keep rolling (and allow foreign get_vcpu again).
>>>
>>> I'm not sure if I understood this.  You propose to cache kvm->vcpus in
>>> vcpu->vcpus and do an extensions of this,
>>>
>>>    int vcpu_create(...) {
>>>      if (resize_needed(kvm->vcpus)) {
>>>        old_vcpus = kvm->vcpus
>>>        kvm->vcpus = make_bigger(kvm->vcpus)
>>
>> if (kvm->vcpus != old_vcpus) :)
>>
>>>        kvm_make_all_cpus_request(kvm, KVM_REQ_UPDATE_VCPUS)
>>
>> IIRC you'd need some manual bookkeeping to ensure that all users have
>> switched to the new array. Or set the KVM_REQUEST_WAIT flag :).
> 
> Absolutely.  I was thinking about synchronous execution, which might
> need extra work to expedite halted VCPUs.  Letting the last user free it
> is plausible and would need more protection against races.
> 
>>>        free(old_vcpus)
>>>      }
>>>      vcpu->vcpus = kvm->vcpus
>>>    }
>>>
>>> with added extra locking, (S)RCU, on accesses that do not come from
>>> VCPUs (irqfd and VM ioctl)?
>>
>> Well, in an ideal world we wouldn't have any users to vcpu structs outside
>> of the vcpus obviously. Every time we do, we should either reconsider
>> whether the design is smart and if we think it is, protect them accordingly.
> 
> And there would be no linear access to all VCPUs. :)
> 
> The main user of kvm->vcpus is kvm_for_each_vcpu(), which is well suited
> for a list, so we can change the design of kvm_for_each_vcpu() to use a
> list head in struct kvm_vcpu with head/tail in struct kvm.
> (The list is trivial to make lockless as we only append.)
> 
> This would allow more flexibility with the remaining uses.
> 
>> Maybe even hard code separate request mechanisms for the few cases where
>> it's reasonable?
> 
> All non-kvm_for_each_vcpu() seem to need accesss outside of VCPU scope.
> 
> We have few awkward accesses that can be handled keeping track of kvm
> state and all remaining uses need some kind of "int -> struct kvm_vcpu"
> mapping, where the integer is arbitrary.
> 
> All users of kvm_get_vcpu_by_id() need a vcpu_id mapping, but hijack
> kvm->vcpus for O(1) access if lucky, with fallback to
> kvm_for_each_vcpu().  Adding a vcpu_id mapping seems reasonable.
> 
> s390 __floating_irq_kick() and x86 kvm_irq_delivery_to_apic() are
> keeping a bitmap for kvm->vcpus indices.  They want compact indices,
> which cannot be provided by vcpu_id mapping.
> 
> I think that MIPS and ARM use the index in kvm->vcpus for userspace
> communication, which looks dangerous as userspace shouldn't know the
> position.  Not much we can do because of that.

I think (at least for the ARM side) that we could switch whatever use we
have of the index to a vcpu_id. The worst offender (as far as I can
remember) is when injecting an interrupt, and that could be creatively
re-purposed to describe an affinity value in a backward compatible way.

Probably.

	N,
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC 1/2] KVM: remove unused __KVM_HAVE_ARCH_VM_ALLOC
  2017-08-16 19:40 ` [PATCH RFC 1/2] KVM: remove unused __KVM_HAVE_ARCH_VM_ALLOC Radim Krčmář
@ 2017-08-21 13:48   ` Christian Borntraeger
  0 siblings, 0 replies; 24+ messages in thread
From: Christian Borntraeger @ 2017-08-21 13:48 UTC (permalink / raw)
  To: Radim Krčmář,
	linux-kernel, kvm, linux-mips, kvm-ppc, linux-s390,
	linux-arm-kernel
  Cc: Paolo Bonzini, David Hildenbrand, Christoffer Dall, Marc Zyngier,
	Cornelia Huck, James Hogan, Paul Mackerras, Alexander Graf

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

Since we removed ia64 from KVM we no longer need an _arch_ variant.

On 08/16/2017 09:40 PM, Radim Krčmář wrote:
> Moving it to generic code will allow us to extend it with ease.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  include/linux/kvm_host.h | 12 ------------
>  virt/kvm/kvm_main.c      | 16 +++++++++++++---
>  2 files changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 6882538eda32..c8df733eed41 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -803,18 +803,6 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
>  bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
> 
> -#ifndef __KVM_HAVE_ARCH_VM_ALLOC
> -static inline struct kvm *kvm_arch_alloc_vm(void)
> -{
> -	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
> -}
> -
> -static inline void kvm_arch_free_vm(struct kvm *kvm)
> -{
> -	kfree(kvm);
> -}
> -#endif
> -
>  #ifdef __KVM_HAVE_ARCH_NONCOHERENT_DMA
>  void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
>  void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e17c40d986f3..2eac2c62795f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -640,10 +640,20 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
>  	return 0;
>  }
> 
> +static inline struct kvm *kvm_alloc_vm(void)
> +{
> +	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
> +}
> +
> +static inline void kvm_free_vm(struct kvm *kvm)
> +{
> +	kfree(kvm);
> +}
> +
>  static struct kvm *kvm_create_vm(unsigned long type)
>  {
>  	int r, i;
> -	struct kvm *kvm = kvm_arch_alloc_vm();
> +	struct kvm *kvm = kvm_alloc_vm();
> 
>  	if (!kvm)
>  		return ERR_PTR(-ENOMEM);
> @@ -720,7 +730,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>  		kfree(kvm_get_bus(kvm, i));
>  	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
>  		kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
> -	kvm_arch_free_vm(kvm);
> +	kvm_free_vm(kvm);
>  	mmdrop(current->mm);
>  	return ERR_PTR(r);
>  }
> @@ -771,7 +781,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>  		kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
>  	cleanup_srcu_struct(&kvm->irq_srcu);
>  	cleanup_srcu_struct(&kvm->srcu);
> -	kvm_arch_free_vm(kvm);
> +	kvm_free_vm(kvm);
>  	preempt_notifier_dec();
>  	hardware_disable_all();
>  	mmdrop(mm);
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-08-21 13:49 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-16 19:40 [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array Radim Krčmář
2017-08-16 19:40 ` [PATCH RFC 1/2] KVM: remove unused __KVM_HAVE_ARCH_VM_ALLOC Radim Krčmář
2017-08-21 13:48   ` Christian Borntraeger
2017-08-16 19:40 ` [PATCH RFC 2/2] KVM: RCU protected dynamic vcpus array Radim Krčmář
2017-08-17  8:07   ` Cornelia Huck
2017-08-17 11:14   ` David Hildenbrand
2017-08-17 16:50     ` Radim Krčmář
2017-08-17 16:54       ` Paolo Bonzini
2017-08-17  7:04 ` [PATCH RFC 0/2] KVM: use RCU to allow dynamic kvm->vcpus array Alexander Graf
2017-08-17  7:36   ` Cornelia Huck
2017-08-17  9:16     ` Paolo Bonzini
2017-08-17  9:28       ` Cornelia Huck
2017-08-17  9:44         ` Paolo Bonzini
2017-08-17  9:55           ` David Hildenbrand
2017-08-17 10:18             ` Paolo Bonzini
2017-08-17 10:20               ` David Hildenbrand
2017-08-17 10:23                 ` Paolo Bonzini
2017-08-17 10:31                   ` David Hildenbrand
2017-08-17 14:54   ` Radim Krčmář
2017-08-17 19:17     ` Alexander Graf
2017-08-18 14:10       ` Radim Krčmář
2017-08-18 14:22         ` Marc Zyngier
2017-08-17  7:29 ` David Hildenbrand
2017-08-17  7:37   ` Cornelia Huck
