* [PATCH 0/4] KVM: PPC: Book3S: Fix potential deadlocks
@ 2019-05-23  6:34 ` Paul Mackerras
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2019-05-23  6:34 UTC (permalink / raw)
  To: kvm; +Cc: kvm-ppc, Cédric Le Goater

Recent reports of lockdep splats in the HV KVM code revealed that it
was taking the kvm->lock mutex in several contexts where a vcpu mutex
was already held.  Lockdep only started warning once I added code
to take the vcpu mutexes in the XIVE device release functions, but
since Documentation/virtual/kvm/locking.txt specifies that the vcpu
mutexes nest inside kvm->lock, it seems that the new code is correct
and it is most of the old uses of kvm->lock that are wrong.

This series should fix the problems, by adding new mutexes that nest
inside the vcpu mutexes and using them instead of kvm->lock.
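
For reference, the lock ordering this series establishes looks like
the sketch below.  This is illustrative only -- the function name is
made up and nothing here is code from the tree:

/*
 * Lock ordering after this series:
 *
 *   kvm->lock                      outermost; VM-wide ioctls
 *     vcpu->mutex                  per-vcpu ioctls, including KVM_RUN
 *       kvm->arch.mmu_setup_lock   added in patch 2
 *       kvm->arch.rtas_token_lock  added in patch 3
 */
static void lock_nesting_sketch(struct kvm_vcpu *vcpu)
{
	mutex_lock(&vcpu->mutex);	/* as taken by kvm_vcpu_ioctl() */
	mutex_lock(&vcpu->kvm->arch.mmu_setup_lock);
	/* ... MMU setup or teardown work ... */
	mutex_unlock(&vcpu->kvm->arch.mmu_setup_lock);
	mutex_unlock(&vcpu->mutex);
}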

Paul.


* [PATCH 1/4] KVM: PPC: Book3S HV: Avoid touching arch.mmu_ready in XIVE release functions
  2019-05-23  6:34 ` Paul Mackerras
@ 2019-05-23  6:35   ` Paul Mackerras
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2019-05-23  6:35 UTC (permalink / raw)
  To: kvm; +Cc: kvm-ppc, Cédric Le Goater

Currently, kvmppc_xive_release() and kvmppc_xive_native_release() clear
kvm->arch.mmu_ready and call kick_all_cpus_sync() as a way of ensuring
that no vcpus are executing in the guest.  However, future patches will
change the mutex associated with kvm->arch.mmu_ready to a new mutex that
nests inside the vcpu mutexes, making it difficult to continue to use
this method.

In fact, taking the vcpu mutex for a vcpu excludes execution of that
vcpu, and we already take the vcpu mutex around the call to
kvmppc_xive_[native_]cleanup_vcpu().  Once the cleanup function is
done and we release the vcpu mutex, the vcpu can execute once again,
but because we have cleared vcpu->arch.xive_vcpu, vcpu->arch.irq_type,
vcpu->arch.xive_esc_vaddr and vcpu->arch.xive_esc_raddr, that vcpu will
not be going into XIVE code any more.  Thus, once we have cleaned up
all of the vcpus, we are safe to clean up the rest of the XIVE state,
and we don't need to use kvm->arch.mmu_ready to hold off vcpu execution.
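
(For context: holding vcpu->mutex excludes guest entry because the
generic code serializes every per-vcpu ioctl, including KVM_RUN, on
that mutex.  Heavily abridged from virt/kvm/kvm_main.c of this era:)

static long kvm_vcpu_ioctl(struct file *filp,
			   unsigned int ioctl, unsigned long arg)
{
	struct kvm_vcpu *vcpu = filp->private_data;
	long r = -EINVAL;

	/* ... */
	if (mutex_lock_killable(&vcpu->mutex))
		return -EINTR;
	switch (ioctl) {
	case KVM_RUN:
		/* guest entry, including the XIVE push/pull, happens here */
		r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
		break;
	/* ... one_reg get/set and the other per-vcpu ioctls ... */
	}
	mutex_unlock(&vcpu->mutex);
	return r;
}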

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_xive.c        | 28 ++++++++++++----------------
 arch/powerpc/kvm/book3s_xive_native.c | 28 ++++++++++++----------------
 2 files changed, 24 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 4953957..f623451 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1859,21 +1859,10 @@ static void kvmppc_xive_release(struct kvm_device *dev)
 	struct kvm *kvm = xive->kvm;
 	struct kvm_vcpu *vcpu;
 	int i;
-	int was_ready;
 
 	pr_devel("Releasing xive device\n");
 
-	debugfs_remove(xive->dentry);
-
 	/*
-	 * Clearing mmu_ready temporarily while holding kvm->lock
-	 * is a way of ensuring that no vcpus can enter the guest
-	 * until we drop kvm->lock.  Doing kick_all_cpus_sync()
-	 * ensures that any vcpu executing inside the guest has
-	 * exited the guest.  Once kick_all_cpus_sync() has finished,
-	 * we know that no vcpu can be executing the XIVE push or
-	 * pull code, or executing a XICS hcall.
-	 *
 	 * Since this is the device release function, we know that
 	 * userspace does not have any open fd referring to the
 	 * device.  Therefore there can not be any of the device
@@ -1881,9 +1870,8 @@ static void kvmppc_xive_release(struct kvm_device *dev)
 	 * and similarly, the connect_vcpu and set/clr_mapped
 	 * functions also cannot be being executed.
 	 */
-	was_ready = kvm->arch.mmu_ready;
-	kvm->arch.mmu_ready = 0;
-	kick_all_cpus_sync();
+
+	debugfs_remove(xive->dentry);
 
 	/*
 	 * We should clean up the vCPU interrupt presenters first.
@@ -1892,12 +1880,22 @@ static void kvmppc_xive_release(struct kvm_device *dev)
 		/*
 		 * Take vcpu->mutex to ensure that no one_reg get/set ioctl
 		 * (i.e. kvmppc_xive_[gs]et_icp) can be done concurrently.
+		 * Holding the vcpu->mutex also means that the vcpu cannot
+		 * be executing the KVM_RUN ioctl, and therefore it cannot
+		 * be executing the XIVE push or pull code or accessing
+		 * the XIVE MMIO regions.
 		 */
 		mutex_lock(&vcpu->mutex);
 		kvmppc_xive_cleanup_vcpu(vcpu);
 		mutex_unlock(&vcpu->mutex);
 	}
 
+	/*
+	 * Now that we have cleared vcpu->arch.xive_vcpu, vcpu->arch.irq_type
+	 * and vcpu->arch.xive_esc_[vr]addr on each vcpu, we are safe
+	 * against xive code getting called during vcpu execution or
+	 * set/get one_reg operations.
+	 */
 	kvm->arch.xive = NULL;
 
 	/* Mask and free interrupts */
@@ -1911,8 +1909,6 @@ static void kvmppc_xive_release(struct kvm_device *dev)
 	if (xive->vp_base != XIVE_INVALID_VP)
 		xive_native_free_vp_block(xive->vp_base);
 
-	kvm->arch.mmu_ready = was_ready;
-
 	/*
 	 * A reference of the kvmppc_xive pointer is now kept under
 	 * the xive_devices struct of the machine for reuse. It is
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 6a8e698..da31dd0 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -973,21 +973,10 @@ static void kvmppc_xive_native_release(struct kvm_device *dev)
 	struct kvm *kvm = xive->kvm;
 	struct kvm_vcpu *vcpu;
 	int i;
-	int was_ready;
-
-	debugfs_remove(xive->dentry);
 
 	pr_devel("Releasing xive native device\n");
 
 	/*
-	 * Clearing mmu_ready temporarily while holding kvm->lock
-	 * is a way of ensuring that no vcpus can enter the guest
-	 * until we drop kvm->lock.  Doing kick_all_cpus_sync()
-	 * ensures that any vcpu executing inside the guest has
-	 * exited the guest.  Once kick_all_cpus_sync() has finished,
-	 * we know that no vcpu can be executing the XIVE push or
-	 * pull code or accessing the XIVE MMIO regions.
-	 *
 	 * Since this is the device release function, we know that
 	 * userspace does not have any open fd or mmap referring to
 	 * the device.  Therefore there can not be any of the
@@ -996,9 +985,8 @@ static void kvmppc_xive_native_release(struct kvm_device *dev)
 	 * connect_vcpu and set/clr_mapped functions also cannot
 	 * be being executed.
 	 */
-	was_ready = kvm->arch.mmu_ready;
-	kvm->arch.mmu_ready = 0;
-	kick_all_cpus_sync();
+
+	debugfs_remove(xive->dentry);
 
 	/*
 	 * We should clean up the vCPU interrupt presenters first.
@@ -1007,12 +995,22 @@ static void kvmppc_xive_native_release(struct kvm_device *dev)
 		/*
 		 * Take vcpu->mutex to ensure that no one_reg get/set ioctl
 		 * (i.e. kvmppc_xive_native_[gs]et_vp) can be being done.
+		 * Holding the vcpu->mutex also means that the vcpu cannot
+		 * be executing the KVM_RUN ioctl, and therefore it cannot
+		 * be executing the XIVE push or pull code or accessing
+		 * the XIVE MMIO regions.
 		 */
 		mutex_lock(&vcpu->mutex);
 		kvmppc_xive_native_cleanup_vcpu(vcpu);
 		mutex_unlock(&vcpu->mutex);
 	}
 
+	/*
+	 * Now that we have cleared vcpu->arch.xive_vcpu, vcpu->arch.irq_type
+	 * and vcpu->arch.xive_esc_[vr]addr on each vcpu, we are safe
+	 * against xive code getting called during vcpu execution or
+	 * set/get one_reg operations.
+	 */
 	kvm->arch.xive = NULL;
 
 	for (i = 0; i <= xive->max_sbid; i++) {
@@ -1025,8 +1023,6 @@ static void kvmppc_xive_native_release(struct kvm_device *dev)
 	if (xive->vp_base != XIVE_INVALID_VP)
 		xive_native_free_vp_block(xive->vp_base);
 
-	kvm->arch.mmu_ready = was_ready;
-
 	/*
 	 * A reference of the kvmppc_xive pointer is now kept under
 	 * the xive_devices struct of the machine for reuse. It is
-- 
2.7.4



* [PATCH 2/4] KVM: PPC: Book3S HV: Use new mutex to synchronize MMU setup
  2019-05-23  6:34 ` Paul Mackerras
@ 2019-05-23  6:35   ` Paul Mackerras
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2019-05-23  6:35 UTC (permalink / raw)
  To: kvm; +Cc: kvm-ppc, Cédric Le Goater

Currently the HV KVM code uses kvm->lock in conjunction with a flag,
kvm->arch.mmu_ready, to synchronize MMU setup and hold off vcpu
execution until the MMU-related data structures are ready.  However,
this means that kvm->lock is being taken inside vcpu->mutex, which
is contrary to Documentation/virtual/kvm/locking.txt and results in
lockdep warnings.

To fix this, we add a new mutex, kvm->arch.mmu_setup_lock, which nests
inside the vcpu mutexes, and is taken in the places where kvm->lock
was taken that are related to MMU setup.

Additionally we take the new mutex in the vcpu creation code at the
point where we are creating a new vcore, in order to provide mutual
exclusion with kvmppc_update_lpcr() and ensure that an update to
kvm->arch.lpcr doesn't get missed, which could otherwise lead to a
stale vcore->lpcr value.
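
(For context: the inversion arises because kvmhv_setup_mmu() runs on
the KVM_RUN path, i.e. with vcpu->mutex already held, so taking
kvm->lock there nests the locks in the wrong order.  Below is a
made-up example -- names invented, not part of this series -- of the
pattern a new MMU-setup path should follow, mirroring what
kvm_htab_write() does after this patch:)

static int example_reset_mmu(struct kvm *kvm)
{
	mutex_lock(&kvm->arch.mmu_setup_lock);
	if (kvm->arch.mmu_ready) {
		kvm->arch.mmu_ready = 0;	/* temporarily */
		/* order mmu_ready vs. vcpus_running */
		smp_mb();
		if (atomic_read(&kvm->arch.vcpus_running)) {
			/* a vcpu is (about to be) running; back off */
			kvm->arch.mmu_ready = 1;
			mutex_unlock(&kvm->arch.mmu_setup_lock);
			return -EBUSY;
		}
	}
	/* ... safe to tear down / rebuild HPT or radix structures ... */
	mutex_unlock(&kvm->arch.mmu_setup_lock);
	return 0;
}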

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 36 ++++++++++++++++++------------------
 arch/powerpc/kvm/book3s_hv.c        | 31 +++++++++++++++++++++++--------
 3 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 013c76a..26b3ce4 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -325,6 +325,7 @@ struct kvm_arch {
 #endif
 	struct kvmppc_ops *kvm_ops;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	struct mutex mmu_setup_lock;	/* nests inside vcpu mutexes */
 	u64 l1_ptcr;
 	int max_nested_lpid;
 	struct kvm_nested_guest *nested_guests[KVM_MAX_NESTED_GUESTS];
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index ab3d484..5197131 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -63,7 +63,7 @@ struct kvm_resize_hpt {
 	struct work_struct work;
 	u32 order;
 
-	/* These fields protected by kvm->lock */
+	/* These fields protected by kvm->arch.mmu_setup_lock */
 
 	/* Possible values and their usage:
 	 *  <0     an error occurred during allocation,
@@ -73,7 +73,7 @@ struct kvm_resize_hpt {
 	int error;
 
 	/* Private to the work thread, until error != -EBUSY,
-	 * then protected by kvm->lock.
+	 * then protected by kvm->arch.mmu_setup_lock.
 	 */
 	struct kvm_hpt_info hpt;
 };
@@ -139,7 +139,7 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
 	long err = -EBUSY;
 	struct kvm_hpt_info info;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.mmu_setup_lock);
 	if (kvm->arch.mmu_ready) {
 		kvm->arch.mmu_ready = 0;
 		/* order mmu_ready vs. vcpus_running */
@@ -183,7 +183,7 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
 		/* Ensure that each vcpu will flush its TLB on next entry. */
 		cpumask_setall(&kvm->arch.need_tlb_flush);
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.mmu_setup_lock);
 	return err;
 }
 
@@ -1447,7 +1447,7 @@ static void resize_hpt_pivot(struct kvm_resize_hpt *resize)
 
 static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt *resize)
 {
-	if (WARN_ON(!mutex_is_locked(&kvm->lock)))
+	if (WARN_ON(!mutex_is_locked(&kvm->arch.mmu_setup_lock)))
 		return;
 
 	if (!resize)
@@ -1474,14 +1474,14 @@ static void resize_hpt_prepare_work(struct work_struct *work)
 	if (WARN_ON(resize->error != -EBUSY))
 		return;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.mmu_setup_lock);
 
 	/* Request is still current? */
 	if (kvm->arch.resize_hpt == resize) {
 		/* We may request large allocations here:
-		 * do not sleep with kvm->lock held for a while.
+		 * do not sleep with kvm->arch.mmu_setup_lock held for a while.
 		 */
-		mutex_unlock(&kvm->lock);
+		mutex_unlock(&kvm->arch.mmu_setup_lock);
 
 		resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = %d\n",
 				 resize->order);
@@ -1494,9 +1494,9 @@ static void resize_hpt_prepare_work(struct work_struct *work)
 		if (WARN_ON(err == -EBUSY))
 			err = -EINPROGRESS;
 
-		mutex_lock(&kvm->lock);
+		mutex_lock(&kvm->arch.mmu_setup_lock);
 		/* It is possible that kvm->arch.resize_hpt != resize
-		 * after we grab kvm->lock again.
+		 * after we grab kvm->arch.mmu_setup_lock again.
 		 */
 	}
 
@@ -1505,7 +1505,7 @@ static void resize_hpt_prepare_work(struct work_struct *work)
 	if (kvm->arch.resize_hpt != resize)
 		resize_hpt_release(kvm, resize);
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.mmu_setup_lock);
 }
 
 long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
@@ -1522,7 +1522,7 @@ long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
 	if (shift && ((shift < 18) || (shift > 46)))
 		return -EINVAL;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.mmu_setup_lock);
 
 	resize = kvm->arch.resize_hpt;
 
@@ -1565,7 +1565,7 @@ long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
 	ret = 100; /* estimated time in ms */
 
 out:
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.mmu_setup_lock);
 	return ret;
 }
 
@@ -1588,7 +1588,7 @@ long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
 	if (shift && ((shift < 18) || (shift > 46)))
 		return -EINVAL;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.mmu_setup_lock);
 
 	resize = kvm->arch.resize_hpt;
 
@@ -1625,7 +1625,7 @@ long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
 	smp_mb();
 out_no_hpt:
 	resize_hpt_release(kvm, resize);
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.mmu_setup_lock);
 	return ret;
 }
 
@@ -1868,7 +1868,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 		return -EINVAL;
 
 	/* lock out vcpus from running while we're doing this */
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.mmu_setup_lock);
 	mmu_ready = kvm->arch.mmu_ready;
 	if (mmu_ready) {
 		kvm->arch.mmu_ready = 0;	/* temporarily */
@@ -1876,7 +1876,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 		smp_mb();
 		if (atomic_read(&kvm->arch.vcpus_running)) {
 			kvm->arch.mmu_ready = 1;
-			mutex_unlock(&kvm->lock);
+			mutex_unlock(&kvm->arch.mmu_setup_lock);
 			return -EBUSY;
 		}
 	}
@@ -1963,7 +1963,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 	/* Order HPTE updates vs. mmu_ready */
 	smp_wmb();
 	kvm->arch.mmu_ready = mmu_ready;
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.mmu_setup_lock);
 
 	if (err)
 		return err;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d5fc624..b1c0a9b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2338,11 +2338,17 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 			pr_devel("KVM: collision on id %u", id);
 			vcore = NULL;
 		} else if (!vcore) {
+			/*
+			 * Take mmu_setup_lock for mutual exclusion
+			 * with kvmppc_update_lpcr().
+			 */
 			err = -ENOMEM;
 			vcore = kvmppc_vcore_create(kvm,
 					id & ~(kvm->arch.smt_mode - 1));
+			mutex_lock(&kvm->arch.mmu_setup_lock);
 			kvm->arch.vcores[core] = vcore;
 			kvm->arch.online_vcores++;
+			mutex_unlock(&kvm->arch.mmu_setup_lock);
 		}
 	}
 	mutex_unlock(&kvm->lock);
@@ -3859,7 +3865,7 @@ static int kvmhv_setup_mmu(struct kvm_vcpu *vcpu)
 	int r = 0;
 	struct kvm *kvm = vcpu->kvm;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.mmu_setup_lock);
 	if (!kvm->arch.mmu_ready) {
 		if (!kvm_is_radix(kvm))
 			r = kvmppc_hv_setup_htab_rma(vcpu);
@@ -3869,7 +3875,7 @@ static int kvmhv_setup_mmu(struct kvm_vcpu *vcpu)
 			kvm->arch.mmu_ready = 1;
 		}
 	}
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.mmu_setup_lock);
 	return r;
 }
 
@@ -4478,7 +4484,8 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
 
 /*
  * Update LPCR values in kvm->arch and in vcores.
- * Caller must hold kvm->lock.
+ * Caller must hold kvm->arch.mmu_setup_lock (for mutual exclusion
+ * of kvm->arch.lpcr update).
  */
 void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr, unsigned long mask)
 {
@@ -4530,7 +4537,7 @@ void kvmppc_setup_partition_table(struct kvm *kvm)
 
 /*
  * Set up HPT (hashed page table) and RMA (real-mode area).
- * Must be called with kvm->lock held.
+ * Must be called with kvm->arch.mmu_setup_lock held.
  */
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 {
@@ -4618,7 +4625,10 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 	goto out_srcu;
 }
 
-/* Must be called with kvm->lock held and mmu_ready = 0 and no vcpus running */
+/*
+ * Must be called with kvm->arch.mmu_setup_lock held and
+ * mmu_ready = 0 and no vcpus running.
+ */
 int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
 {
 	if (nesting_enabled(kvm))
@@ -4635,7 +4645,10 @@ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
 	return 0;
 }
 
-/* Must be called with kvm->lock held and mmu_ready = 0 and no vcpus running */
+/*
+ * Must be called with kvm->arch.mmu_setup_lock held and
+ * mmu_ready = 0 and no vcpus running.
+ */
 int kvmppc_switch_mmu_to_radix(struct kvm *kvm)
 {
 	int err;
@@ -4740,6 +4753,8 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 	char buf[32];
 	int ret;
 
+	mutex_init(&kvm->arch.mmu_setup_lock);
+
 	/* Allocate the guest's logical partition ID */
 
 	lpid = kvmppc_alloc_lpid();
@@ -5265,7 +5280,7 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
 	if (kvmhv_on_pseries() && !radix)
 		return -EINVAL;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.mmu_setup_lock);
 	if (radix != kvm_is_radix(kvm)) {
 		if (kvm->arch.mmu_ready) {
 			kvm->arch.mmu_ready = 0;
@@ -5293,7 +5308,7 @@ static int kvmhv_configure_mmu(struct kvm *kvm, struct kvm_ppc_mmuv3_cfg *cfg)
 	err = 0;
 
  out_unlock:
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.mmu_setup_lock);
 	return err;
 }
 
-- 
2.7.4



* [PATCH 3/4] KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
  2019-05-23  6:34 ` Paul Mackerras
@ 2019-05-23  6:36   ` Paul Mackerras
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2019-05-23  6:36 UTC (permalink / raw)
  To: kvm; +Cc: kvm-ppc, Cédric Le Goater

Currently the Book 3S KVM code uses kvm->lock to synchronize access
to the kvm->arch.rtas_tokens list.  Because this list is scanned
inside kvmppc_rtas_hcall(), which is called with the vcpu mutex held,
taking kvm->lock causes a lock inversion problem, which could lead to
a deadlock.

To fix this, we add a new mutex, kvm->arch.rtas_token_lock, which nests
inside the vcpu mutexes, and use that instead of kvm->lock when
accessing the rtas token list.
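
(For context: the new lock is a leaf -- the ioctl path takes it with
no vcpu mutex held, and the hcall path takes it inside vcpu->mutex,
so neither ordering conflicts with locking.txt.  A made-up helper,
name invented, following the same convention as the converted
functions:)

static bool rtas_token_is_defined(struct kvm *kvm, u64 token)
{
	struct rtas_token_definition *d;

	/* every walk of rtas_tokens must hold rtas_token_lock */
	lockdep_assert_held(&kvm->arch.rtas_token_lock);

	list_for_each_entry(d, &kvm->arch.rtas_tokens, list) {
		if (d->token == token)
			return true;
	}
	return false;
}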

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s.c           |  1 +
 arch/powerpc/kvm/book3s_rtas.c      | 14 +++++++-------
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 26b3ce4..d10df67 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -309,6 +309,7 @@ struct kvm_arch {
 #ifdef CONFIG_PPC_BOOK3S_64
 	struct list_head spapr_tce_tables;
 	struct list_head rtas_tokens;
+	struct mutex rtas_token_lock;
 	DECLARE_BITMAP(enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 #endif
 #ifdef CONFIG_KVM_MPIC
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 61a212d..ac56648 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -902,6 +902,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 #ifdef CONFIG_PPC64
 	INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables);
 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
+	mutex_init(&kvm->arch.rtas_token_lock);
 #endif
 
 	return kvm->arch.kvm_ops->init_vm(kvm);
diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
index 4e178c4..47279a5 100644
--- a/arch/powerpc/kvm/book3s_rtas.c
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -146,7 +146,7 @@ static int rtas_token_undefine(struct kvm *kvm, char *name)
 {
 	struct rtas_token_definition *d, *tmp;
 
-	lockdep_assert_held(&kvm->lock);
+	lockdep_assert_held(&kvm->arch.rtas_token_lock);
 
 	list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
 		if (rtas_name_matches(d->handler->name, name)) {
@@ -167,7 +167,7 @@ static int rtas_token_define(struct kvm *kvm, char *name, u64 token)
 	bool found;
 	int i;
 
-	lockdep_assert_held(&kvm->lock);
+	lockdep_assert_held(&kvm->arch.rtas_token_lock);
 
 	list_for_each_entry(d, &kvm->arch.rtas_tokens, list) {
 		if (d->token == token)
@@ -206,14 +206,14 @@ int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp)
 	if (copy_from_user(&args, argp, sizeof(args)))
 		return -EFAULT;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.rtas_token_lock);
 
 	if (args.token)
 		rc = rtas_token_define(kvm, args.name, args.token);
 	else
 		rc = rtas_token_undefine(kvm, args.name);
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.rtas_token_lock);
 
 	return rc;
 }
@@ -245,7 +245,7 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
 	orig_rets = args.rets;
 	args.rets = &args.args[be32_to_cpu(args.nargs)];
 
-	mutex_lock(&vcpu->kvm->lock);
+	mutex_lock(&vcpu->kvm->arch.rtas_token_lock);
 
 	rc = -ENOENT;
 	list_for_each_entry(d, &vcpu->kvm->arch.rtas_tokens, list) {
@@ -256,7 +256,7 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	mutex_unlock(&vcpu->kvm->lock);
+	mutex_unlock(&vcpu->kvm->arch.rtas_token_lock);
 
 	if (rc == 0) {
 		args.rets = orig_rets;
@@ -282,7 +282,7 @@ void kvmppc_rtas_tokens_free(struct kvm *kvm)
 {
 	struct rtas_token_definition *d, *tmp;
 
-	lockdep_assert_held(&kvm->lock);
+	lockdep_assert_held(&kvm->arch.rtas_token_lock);
 
 	list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
 		list_del(&d->list);
-- 
2.7.4



* [PATCH 4/4] KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu
  2019-05-23  6:34 ` Paul Mackerras
@ 2019-05-23  6:36   ` Paul Mackerras
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2019-05-23  6:36 UTC (permalink / raw)
  To: kvm; +Cc: kvm-ppc, Cédric Le Goater

Currently the HV KVM code takes the kvm->lock around calls to
kvm_for_each_vcpu() and kvm_get_vcpu_by_id() (which can call
kvm_for_each_vcpu() internally).  However, that leads to a lock
order inversion problem, because these are called in contexts where
the vcpu mutex is held, but the vcpu mutexes nest within kvm->lock
according to Documentation/virtual/kvm/locking.txt.  Hence there
is a possibility of deadlock.

To fix this, we simply don't take the kvm->lock mutex around these
calls.  This is safe because the implementations of kvm_for_each_vcpu()
and kvm_get_vcpu_by_id() are designed to be called locklessly.
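
(For reference, the lockless design relied on here: kvm->vcpus[] is
append-only for the lifetime of the VM, and online_vcpus is only
incremented after the array slot is published, with paired barriers.
Abridged from include/linux/kvm_host.h of this era:)

static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
{
	int num_vcpus = atomic_read(&kvm->online_vcpus);

	i = array_index_nospec(i, num_vcpus);
	/* Pairs with the smp_wmb() in kvm_vm_ioctl_create_vcpu(). */
	smp_rmb();
	return kvm->vcpus[i];
}

#define kvm_for_each_vcpu(idx, vcpup, kvm) \
	for (idx = 0; \
	     idx < atomic_read(&kvm->online_vcpus) && \
	     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
	     idx++)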

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_hv.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b1c0a9b..27054d3 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -446,12 +446,7 @@ static void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
 
 static struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
 {
-	struct kvm_vcpu *ret;
-
-	mutex_lock(&kvm->lock);
-	ret = kvm_get_vcpu_by_id(kvm, id);
-	mutex_unlock(&kvm->lock);
-	return ret;
+	return kvm_get_vcpu_by_id(kvm, id);
 }
 
 static void init_vpa(struct kvm_vcpu *vcpu, struct lppaca *vpa)
@@ -1583,7 +1578,6 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 	u64 mask;
 
-	mutex_lock(&kvm->lock);
 	spin_lock(&vc->lock);
 	/*
 	 * If ILE (interrupt little-endian) has changed, update the
@@ -1623,7 +1617,6 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
 		mask &= 0xFFFFFFFF;
 	vc->lpcr = (vc->lpcr & ~mask) | (new_lpcr & mask);
 	spin_unlock(&vc->lock);
-	mutex_unlock(&kvm->lock);
 }
 
 static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
-- 
2.7.4



* Re: [PATCH 4/4] KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu
  2019-05-23  6:36   ` Paul Mackerras
@ 2019-05-23  7:11     ` Cédric Le Goater
  -1 siblings, 0 replies; 28+ messages in thread
From: Cédric Le Goater @ 2019-05-23  7:11 UTC (permalink / raw)
  To: Paul Mackerras, kvm; +Cc: kvm-ppc

On 5/23/19 8:36 AM, Paul Mackerras wrote:
> Currently the HV KVM code takes the kvm->lock around calls to
> kvm_for_each_vcpu() and kvm_get_vcpu_by_id() (which can call
> kvm_for_each_vcpu() internally).  However, that leads to a lock
> order inversion problem, because these are called in contexts where
> the vcpu mutex is held, but the vcpu mutexes nest within kvm->lock
> according to Documentation/virtual/kvm/locking.txt.  Hence there
> is a possibility of deadlock.
> 
> To fix this, we simply don't take the kvm->lock mutex around these
> calls.  This is safe because the implementations of kvm_for_each_vcpu()
> and kvm_get_vcpu_by_id() are designed to be called locklessly.

Yes.

Reviewed-by: Cédric Le Goater <clg@kaod.org>

Thanks,

C.

> 
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kvm/book3s_hv.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index b1c0a9b..27054d3 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -446,12 +446,7 @@ static void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
>  
>  static struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
>  {
> -	struct kvm_vcpu *ret;
> -
> -	mutex_lock(&kvm->lock);
> -	ret = kvm_get_vcpu_by_id(kvm, id);
> -	mutex_unlock(&kvm->lock);
> -	return ret;
> +	return kvm_get_vcpu_by_id(kvm, id);
>  }
>  
>  static void init_vpa(struct kvm_vcpu *vcpu, struct lppaca *vpa)
> @@ -1583,7 +1578,6 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
>  	struct kvmppc_vcore *vc = vcpu->arch.vcore;
>  	u64 mask;
>  
> -	mutex_lock(&kvm->lock);
>  	spin_lock(&vc->lock);
>  	/*
>  	 * If ILE (interrupt little-endian) has changed, update the
> @@ -1623,7 +1617,6 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
>  		mask &= 0xFFFFFFFF;
>  	vc->lpcr = (vc->lpcr & ~mask) | (new_lpcr & mask);
>  	spin_unlock(&vc->lock);
> -	mutex_unlock(&kvm->lock);
>  }
>  
>  static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
> 



* Re: [PATCH 3/4] KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
  2019-05-23  6:36   ` Paul Mackerras
@ 2019-05-23  7:11     ` Cédric Le Goater
  -1 siblings, 0 replies; 28+ messages in thread
From: Cédric Le Goater @ 2019-05-23  7:11 UTC (permalink / raw)
  To: Paul Mackerras, kvm; +Cc: kvm-ppc

On 5/23/19 8:36 AM, Paul Mackerras wrote:
> Currently the Book 3S KVM code uses kvm->lock to synchronize access
> to the kvm->arch.rtas_tokens list.  Because this list is scanned
> inside kvmppc_rtas_hcall(), which is called with the vcpu mutex held,
> taking kvm->lock causes a lock inversion problem, which could lead to
> a deadlock.
> 
> To fix this, we add a new mutex, kvm->arch.rtas_token_lock, which nests
> inside the vcpu mutexes, and use that instead of kvm->lock when
> accessing the rtas token list.

We still need to remove the use of kvm->lock in the RTAS "set-xive"
call, which does the EQ provisioning for all the vCPUs of the VM.
I am looking at that part.

> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

Reviewed-by: Cédric Le Goater <clg@kaod.org>

Thanks,

C.

> ---
>  arch/powerpc/include/asm/kvm_host.h |  1 +
>  arch/powerpc/kvm/book3s.c           |  1 +
>  arch/powerpc/kvm/book3s_rtas.c      | 14 +++++++-------
>  3 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 26b3ce4..d10df67 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -309,6 +309,7 @@ struct kvm_arch {
>  #ifdef CONFIG_PPC_BOOK3S_64
>  	struct list_head spapr_tce_tables;
>  	struct list_head rtas_tokens;
> +	struct mutex rtas_token_lock;
>  	DECLARE_BITMAP(enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
>  #endif
>  #ifdef CONFIG_KVM_MPIC
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 61a212d..ac56648 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -902,6 +902,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
>  #ifdef CONFIG_PPC64
>  	INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables);
>  	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
> +	mutex_init(&kvm->arch.rtas_token_lock);
>  #endif
>  
>  	return kvm->arch.kvm_ops->init_vm(kvm);
> diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
> index 4e178c4..47279a5 100644
> --- a/arch/powerpc/kvm/book3s_rtas.c
> +++ b/arch/powerpc/kvm/book3s_rtas.c
> @@ -146,7 +146,7 @@ static int rtas_token_undefine(struct kvm *kvm, char *name)
>  {
>  	struct rtas_token_definition *d, *tmp;
>  
> -	lockdep_assert_held(&kvm->lock);
> +	lockdep_assert_held(&kvm->arch.rtas_token_lock);
>  
>  	list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
>  		if (rtas_name_matches(d->handler->name, name)) {
> @@ -167,7 +167,7 @@ static int rtas_token_define(struct kvm *kvm, char *name, u64 token)
>  	bool found;
>  	int i;
>  
> -	lockdep_assert_held(&kvm->lock);
> +	lockdep_assert_held(&kvm->arch.rtas_token_lock);
>  
>  	list_for_each_entry(d, &kvm->arch.rtas_tokens, list) {
>  		if (d->token == token)
> @@ -206,14 +206,14 @@ int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp)
>  	if (copy_from_user(&args, argp, sizeof(args)))
>  		return -EFAULT;
>  
> -	mutex_lock(&kvm->lock);
> +	mutex_lock(&kvm->arch.rtas_token_lock);
>  
>  	if (args.token)
>  		rc = rtas_token_define(kvm, args.name, args.token);
>  	else
>  		rc = rtas_token_undefine(kvm, args.name);
>  
> -	mutex_unlock(&kvm->lock);
> +	mutex_unlock(&kvm->arch.rtas_token_lock);
>  
>  	return rc;
>  }
> @@ -245,7 +245,7 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
>  	orig_rets = args.rets;
>  	args.rets = &args.args[be32_to_cpu(args.nargs)];
>  
> -	mutex_lock(&vcpu->kvm->lock);
> +	mutex_lock(&vcpu->kvm->arch.rtas_token_lock);
>  
>  	rc = -ENOENT;
>  	list_for_each_entry(d, &vcpu->kvm->arch.rtas_tokens, list) {
> @@ -256,7 +256,7 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> -	mutex_unlock(&vcpu->kvm->lock);
> +	mutex_unlock(&vcpu->kvm->arch.rtas_token_lock);
>  
>  	if (rc == 0) {
>  		args.rets = orig_rets;
> @@ -282,7 +282,7 @@ void kvmppc_rtas_tokens_free(struct kvm *kvm)
>  {
>  	struct rtas_token_definition *d, *tmp;
>  
> -	lockdep_assert_held(&kvm->lock);
> +	lockdep_assert_held(&kvm->arch.rtas_token_lock);
>  
>  	list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
>  		list_del(&d->list);
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/4] KVM: PPC: Book3S: Fix potential deadlocks
  2019-05-23  6:34 ` Paul Mackerras
@ 2019-05-23  7:21   ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 28+ messages in thread
From: Alexey Kardashevskiy @ 2019-05-23  7:21 UTC (permalink / raw)
  To: Paul Mackerras, kvm; +Cc: kvm-ppc, Cédric Le Goater



On 23/05/2019 16:34, Paul Mackerras wrote:
> Recent reports of lockdep splats in the HV KVM code revealed that it
> was taking the kvm->lock mutex in several contexts where a vcpu mutex
> was already held.  Lockdep has only started warning since I added code
> to take the vcpu mutexes in the XIVE device release functions, but
> since Documentation/virtual/kvm/locking.txt specifies that the vcpu
> mutexes nest inside kvm->lock, it seems that the new code is correct
> and it is most of the old uses of kvm->lock that are wrong.
> 
> This series should fix the problems, by adding new mutexes that nest
> inside the vcpu mutexes and using them instead of kvm->lock.


I applied these 4, compiled, installed, rebooted, and tried running a
guest (which failed because I had also updated QEMU and its CLI changed),
and got this. So the VM was created and then destroyed without executing
a single instruction, if that matters.


systemd-journald[1278]: Successfully sent stream file descriptor to
service manager.
systemd-journald[1278]: Successfully sent stream file descriptor to
service manager.
WARNING: CPU: 3 PID: 7697 at arch/powerpc/kvm/book3s_rtas.c:285
kvmppc_rtas_tokens_free+0x100/0x108
[kvm]

Modules linked in: bridge stp llc kvm_hv kvm rpcrdma ib_iser ib_srp
rdma_ucm ib_umad sunrpc rdma_cm
ib_ipoib iw_cm libiscsi ib_cm scsi_transport_iscsi mlx5_ib ib_uverbs
ib_core vmx_crypto crct10dif_vp
msum crct10dif_common at24 sg xfs libcrc32c crc32c_vpmsum mlx5_core
mlxfw autofs4
CPU: 3 PID: 7697 Comm: qemu-kvm Not tainted
5.2.0-rc1-le_nv2_aikATfstn1-p1 #496
NIP:  c00800000f3ab678 LR: c00800000f3ab66c CTR: c000000000198210

REGS: c000003fdf873680 TRAP: 0700   Not tainted
(5.2.0-rc1-le_nv2_aikATfstn1-p1)
MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR:
24008882  XER: 20040000
CFAR: c000000000198364 IRQMASK: 0

GPR00: c00800000f3ab66c c000003fdf873910 c00800000f3d8f00
0000000000000000
GPR04: c000003f4c984630 0000000000000000 00000000d5952dee
0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000
c00800000f3b95c0
GPR12: c000000000198210 c000003fffffbb80 00007ffff20d57a8
0000000000000000
GPR16: 0000000000000000 0000000000000008 0000000119dcd0c0
0000000000000001
GPR20: c000003f4c98f530 c000003f4c980098 c000003f4c984188
0000000000000000
GPR24: c000003f4c98a740 0000000000000000 c000003fdf8dd200
c00800000f3d1c18
GPR28: c000003f4c98a858 c000003f4c980000 c000003f4c9840c8
c000003f4c980000
NIP [c00800000f3ab678] kvmppc_rtas_tokens_free+0x100/0x108 [kvm]

LR [c00800000f3ab66c] kvmppc_rtas_tokens_free+0xf4/0x108 [kvm]

Call Trace:

[c000003fdf873910] [c00800000f3ab66c] kvmppc_rtas_tokens_free+0xf4/0x108
[kvm] (unreliable)
[c000003fdf873960] [c00800000f3aa640] kvmppc_core_destroy_vm+0x48/0xa8
[kvm]
[c000003fdf873990] [c00800000f3a4b08] kvm_arch_destroy_vm+0x130/0x190
[kvm]
[c000003fdf8739d0] [c00800000f3985dc] kvm_put_kvm+0x204/0x500 [kvm]

[c000003fdf873a60] [c00800000f398910] kvm_vm_release+0x38/0x60 [kvm]

[c000003fdf873a90] [c0000000004345fc] __fput+0xcc/0x2f0

[c000003fdf873af0] [c000000000139318] task_work_run+0x108/0x150

[c000003fdf873b30] [c000000000109408] do_exit+0x438/0xe10

[c000003fdf873c00] [c000000000109eb0] do_group_exit+0x60/0xe0

[c000003fdf873c40] [c00000000011ea24] get_signal+0x1b4/0xce0

[c000003fdf873d30] [c000000000025ea8] do_notify_resume+0x1a8/0x430

[c000003fdf873e20] [c00000000000e444] ret_from_except_lite+0x70/0x74

Instruction dump:

ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 3880ffff 38634630

4800df59 e8410018 2fa30000 409eff44 <0fe00000> 4bffff3c 7c0802a6
60000000
irq event stamp: 114938

hardirqs last  enabled at (114937): [<c0000000003ab8f4>]
free_unref_page+0xd4/0x100
hardirqs last disabled at (114938): [<c000000000009060>]
program_check_common+0x170/0x180
softirqs last  enabled at (114424): [<c000000000a06bdc>]
peernet2id+0x6c/0xb0
softirqs last disabled at (114422): [<c000000000a06bb4>]
peernet2id+0x44/0xb0
---[ end trace c33c9599a1a73dd2 ]---

systemd-journald[1278]: Compressed data object 650 -> 280 using XZ

tun: Universal TUN/TAP device driver, 1.6

virbr0: port 1(virbr0-nic) entered blocking state





-- 
Alexey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/4] KVM: PPC: Book3S: Fix potential deadlocks
  2019-05-23  7:21   ` Alexey Kardashevskiy
@ 2019-05-23  7:41     ` Cédric Le Goater
  -1 siblings, 0 replies; 28+ messages in thread
From: Cédric Le Goater @ 2019-05-23  7:41 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Paul Mackerras, kvm; +Cc: kvm-ppc

On 5/23/19 9:21 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 23/05/2019 16:34, Paul Mackerras wrote:
>> Recent reports of lockdep splats in the HV KVM code revealed that it
>> was taking the kvm->lock mutex in several contexts where a vcpu mutex
>> was already held.  Lockdep has only started warning since I added code
>> to take the vcpu mutexes in the XIVE device release functions, but
>> since Documentation/virtual/kvm/locking.txt specifies that the vcpu
>> mutexes nest inside kvm->lock, it seems that the new code is correct
>> and it is most of the old uses of kvm->lock that are wrong.
>>
>> This series should fix the problems, by adding new mutexes that nest
>> inside the vcpu mutexes and using them instead of kvm->lock.
> 
> 
> I applied these 4, compiled, installed, rebooted, and tried running a
> guest (which failed because I had also updated QEMU and its CLI changed),
> and got this. So the VM was created and then destroyed without executing
> a single instruction, if that matters.


kvm->lock is still held when kvmppc_rtas_tokens_free() is called,
but kvm->arch.rtas_token_lock isn't.

Maybe we should change the lockdep annotation back to what it was
before?
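
Restoring the old annotation would be a one-line change in
kvmppc_rtas_tokens_free() (illustrative diff; the alternative is to
drop the assertion there entirely):

-	lockdep_assert_held(&kvm->arch.rtas_token_lock);
+	lockdep_assert_held(&kvm->lock);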

C. 


> 
> 
> systemd-journald[1278]: Successfully sent stream file descriptor to
> service manager.
> systemd-journald[1278]: Successfully sent stream file descriptor to
> service manager.
> WARNING: CPU: 3 PID: 7697 at arch/powerpc/kvm/book3s_rtas.c:285
> kvmppc_rtas_tokens_free+0x100/0x108
> [kvm]
> 
> Modules linked in: bridge stp llc kvm_hv kvm rpcrdma ib_iser ib_srp
> rdma_ucm ib_umad sunrpc rdma_cm
> ib_ipoib iw_cm libiscsi ib_cm scsi_transport_iscsi mlx5_ib ib_uverbs
> ib_core vmx_crypto crct10dif_vp
> msum crct10dif_common at24 sg xfs libcrc32c crc32c_vpmsum mlx5_core
> mlxfw autofs4
> CPU: 3 PID: 7697 Comm: qemu-kvm Not tainted
> 5.2.0-rc1-le_nv2_aikATfstn1-p1 #496
> NIP:  c00800000f3ab678 LR: c00800000f3ab66c CTR: c000000000198210
> 
> REGS: c000003fdf873680 TRAP: 0700   Not tainted
> (5.2.0-rc1-le_nv2_aikATfstn1-p1)
> MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR:
> 24008882  XER: 20040000
> CFAR: c000000000198364 IRQMASK: 0
> 
> GPR00: c00800000f3ab66c c000003fdf873910 c00800000f3d8f00
> 0000000000000000
> GPR04: c000003f4c984630 0000000000000000 00000000d5952dee
> 0000000000000000
> GPR08: 0000000000000000 0000000000000000 0000000000000000
> c00800000f3b95c0
> GPR12: c000000000198210 c000003fffffbb80 00007ffff20d57a8
> 0000000000000000
> GPR16: 0000000000000000 0000000000000008 0000000119dcd0c0
> 0000000000000001
> GPR20: c000003f4c98f530 c000003f4c980098 c000003f4c984188
> 0000000000000000
> GPR24: c000003f4c98a740 0000000000000000 c000003fdf8dd200
> c00800000f3d1c18
> GPR28: c000003f4c98a858 c000003f4c980000 c000003f4c9840c8
> c000003f4c980000
> NIP [c00800000f3ab678] kvmppc_rtas_tokens_free+0x100/0x108 [kvm]
> 
> LR [c00800000f3ab66c] kvmppc_rtas_tokens_free+0xf4/0x108 [kvm]
> 
> Call Trace:
> 
> [c000003fdf873910] [c00800000f3ab66c] kvmppc_rtas_tokens_free+0xf4/0x108
> [kvm] (unreliable)
> [c000003fdf873960] [c00800000f3aa640] kvmppc_core_destroy_vm+0x48/0xa8
> [kvm]
> [c000003fdf873990] [c00800000f3a4b08] kvm_arch_destroy_vm+0x130/0x190
> [kvm]
> [c000003fdf8739d0] [c00800000f3985dc] kvm_put_kvm+0x204/0x500 [kvm]
> 
> [c000003fdf873a60] [c00800000f398910] kvm_vm_release+0x38/0x60 [kvm]
> 
> [c000003fdf873a90] [c0000000004345fc] __fput+0xcc/0x2f0
> 
> [c000003fdf873af0] [c000000000139318] task_work_run+0x108/0x150
> 
> [c000003fdf873b30] [c000000000109408] do_exit+0x438/0xe10
> 
> [c000003fdf873c00] [c000000000109eb0] do_group_exit+0x60/0xe0
> 
> [c000003fdf873c40] [c00000000011ea24] get_signal+0x1b4/0xce0
> 
> [c000003fdf873d30] [c000000000025ea8] do_notify_resume+0x1a8/0x430
> 
> [c000003fdf873e20] [c00000000000e444] ret_from_except_lite+0x70/0x74
> 
> Instruction dump:
> 
> ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 3880ffff 38634630
> 
> 4800df59 e8410018 2fa30000 409eff44 <0fe00000> 4bffff3c 7c0802a6
> 60000000
> irq event stamp: 114938
> 
> hardirqs last  enabled at (114937): [<c0000000003ab8f4>]
> free_unref_page+0xd4/0x100
> hardirqs last disabled at (114938): [<c000000000009060>]
> program_check_common+0x170/0x180
> softirqs last  enabled at (114424): [<c000000000a06bdc>]
> peernet2id+0x6c/0xb0
> softirqs last disabled at (114422): [<c000000000a06bb4>]
> peernet2id+0x44/0xb0
> ---[ end trace c33c9599a1a73dd2 ]---
> 
> systemd-journald[1278]: Compressed data object 650 -> 280 using XZ
> 
> tun: Universal TUN/TAP device driver, 1.6
> 
> virbr0: port 1(virbr0-nic) entered blocking state
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/4] KVM: PPC: Book3S: Fix potential deadlocks
  2019-05-23  7:21   ` Alexey Kardashevskiy
@ 2019-05-23  8:31     ` Paul Mackerras
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2019-05-23  8:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: kvm, kvm-ppc, Cédric Le Goater

On Thu, May 23, 2019 at 05:21:00PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 23/05/2019 16:34, Paul Mackerras wrote:
> > Recent reports of lockdep splats in the HV KVM code revealed that it
> > was taking the kvm->lock mutex in several contexts where a vcpu mutex
> > was already held.  Lockdep has only started warning since I added code
> > to take the vcpu mutexes in the XIVE device release functions, but
> > since Documentation/virtual/kvm/locking.txt specifies that the vcpu
> > mutexes nest inside kvm->lock, it seems that the new code is correct
> > and it is most of the old uses of kvm->lock that are wrong.
> > 
> > This series should fix the problems, by adding new mutexes that nest
> > inside the vcpu mutexes and using them instead of kvm->lock.
> 
> 
> I applied these 4, compiled, installed, rebooted, and tried running a
> guest (which failed because I had also updated QEMU and its CLI changed),
> and got this. So the VM was created and then destroyed without executing
> a single instruction, if that matters.

Looks like I need to remove the
	lockdep_assert_held(&kvm->arch.rtas_token_lock);

in kvmppc_rtas_tokens_free().  We don't hold the rtas_token_lock there,
but it doesn't matter because we are destroying the VM and nothing else
has a reference to it by now.
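
The resulting function is then simply (no locking needed, since nothing
else can be looking at the list during VM destruction; this matches the
v2 patch later in the thread):

	void kvmppc_rtas_tokens_free(struct kvm *kvm)
	{
		struct rtas_token_definition *d, *tmp;

		/* No lock: only called while the VM is being torn down. */
		list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
			list_del(&d->list);
			kfree(d);
		}
	}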

Paul.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/4] KVM: PPC: Book3S: Fix potential deadlocks
  2019-05-23  6:34 ` Paul Mackerras
@ 2019-05-24  9:17   ` Cédric Le Goater
  -1 siblings, 0 replies; 28+ messages in thread
From: Cédric Le Goater @ 2019-05-24  9:17 UTC (permalink / raw)
  To: Paul Mackerras, kvm; +Cc: kvm-ppc

On 5/23/19 8:34 AM, Paul Mackerras wrote:
> Recent reports of lockdep splats in the HV KVM code revealed that it
> was taking the kvm->lock mutex in several contexts where a vcpu mutex
> was already held.  Lockdep has only started warning since I added code
> to take the vcpu mutexes in the XIVE device release functions, but
> since Documentation/virtual/kvm/locking.txt specifies that the vcpu
> mutexes nest inside kvm->lock, it seems that the new code is correct
> and it is most of the old uses of kvm->lock that are wrong.
> 
> This series should fix the problems, by adding new mutexes that nest
> inside the vcpu mutexes and using them instead of kvm->lock.

Hello,

I get this warning when running a guest with this patchset:

[  228.686461] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
[  228.686480] WARNING: CPU: 116 PID: 3803 at ../kernel/locking/lockdep.c:4219 check_flags.part.23+0x21c/0x270
[  228.686544] Modules linked in: vhost_net vhost xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter fuse kvm_hv kvm at24 ipmi_powernv regmap_i2c ipmi_devintf uio_pdrv_genirq ofpart ipmi_msghandler uio powernv_flash mtd ibmpowernv opal_prd ip_tables ext4 mbcache jbd2 btrfs zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 ses sd_mod enclosure scsi_transport_sas ast i2c_opal i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e e1000e cxl aacraid tg3 drm_panel_orientation_quirks i2c_core
[  228.686859] CPU: 116 PID: 3803 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc1-xive+ #42
[  228.686911] NIP:  c0000000001b394c LR: c0000000001b3948 CTR: c000000000bfad20
[  228.686963] REGS: c000200cdb50f570 TRAP: 0700   Not tainted  (5.2.0-rc1-xive+)
[  228.687001] MSR:  9000000002823033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 48222222  XER: 20040000
[  228.687060] CFAR: c000000000116db0 IRQMASK: 1 
[  228.687060] GPR00: c0000000001b3948 c000200cdb50f800 c0000000015e7600 000000000000002e 
[  228.687060] GPR04: 0000000000000001 c0000000001c71a0 000000006e655f73 72727563284e4f5f 
[  228.687060] GPR08: 0000200e60680000 0000000000000000 c000200cdb486180 0000000000000000 
[  228.687060] GPR12: 0000000000002000 c000200fff61a680 0000000000000000 00007fffb75c0000 
[  228.687060] GPR16: 0000000000000000 0000000000000000 c0000000017d6900 c000000001124900 
[  228.687060] GPR20: 0000000000000074 c008000006916f68 0000000000000074 0000000000000074 
[  228.687060] GPR24: ffffffffffffffff ffffffffffffffff 0000000000000003 c000200d4b600000 
[  228.687060] GPR28: c000000001627e58 c000000001489908 c000000001627e58 c000000002304de0 
[  228.687377] NIP [c0000000001b394c] check_flags.part.23+0x21c/0x270
[  228.687415] LR [c0000000001b3948] check_flags.part.23+0x218/0x270
[  228.687466] Call Trace:
[  228.687488] [c000200cdb50f800] [c0000000001b3948] check_flags.part.23+0x218/0x270 (unreliable)
[  228.687542] [c000200cdb50f870] [c0000000001b6548] lock_is_held_type+0x188/0x1c0
[  228.687595] [c000200cdb50f8d0] [c0000000001d939c] rcu_read_lock_sched_held+0xdc/0x100
[  228.687646] [c000200cdb50f900] [c0000000001dd704] rcu_note_context_switch+0x304/0x340
[  228.687701] [c000200cdb50f940] [c0080000068fcc58] kvmhv_run_single_vcpu+0xdb0/0x1120 [kvm_hv]
[  228.687756] [c000200cdb50fa20] [c0080000068fd5b0] kvmppc_vcpu_run_hv+0x5e8/0xe40 [kvm_hv]
[  228.687816] [c000200cdb50faf0] [c0080000071797dc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[  228.687863] [c000200cdb50fb10] [c0080000071755dc] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
[  228.687916] [c000200cdb50fba0] [c008000007165ccc] kvm_vcpu_ioctl+0x424/0x838 [kvm]
[  228.687957] [c000200cdb50fd10] [c000000000433a24] do_vfs_ioctl+0xd4/0xcd0
[  228.687995] [c000200cdb50fdb0] [c000000000434724] ksys_ioctl+0x104/0x120
[  228.688033] [c000200cdb50fe00] [c000000000434768] sys_ioctl+0x28/0x80
[  228.688072] [c000200cdb50fe20] [c00000000000b888] system_call+0x5c/0x70
[  228.688109] Instruction dump:
[  228.688142] 4bf6342d 60000000 0fe00000 e8010080 7c0803a6 4bfffe60 3c82ff87 3c62ff87 
[  228.688196] 388472d0 3863d738 4bf63405 60000000 <0fe00000> 4bffff4c 3c82ff87 3c62ff87 
[  228.688251] irq event stamp: 205
[  228.688287] hardirqs last  enabled at (205): [<c0080000068fc1b4>] kvmhv_run_single_vcpu+0x30c/0x1120 [kvm_hv]
[  228.688344] hardirqs last disabled at (204): [<c0080000068fbff0>] kvmhv_run_single_vcpu+0x148/0x1120 [kvm_hv]
[  228.688412] softirqs last  enabled at (180): [<c000000000c0b2ac>] __do_softirq+0x4ac/0x5d4
[  228.688464] softirqs last disabled at (169): [<c000000000122aa8>] irq_exit+0x1f8/0x210
[  228.688513] ---[ end trace eb16f6260022a812 ]---
[  228.688548] possible reason: unannotated irqs-off.
[  228.688571] irq event stamp: 205
[  228.688607] hardirqs last  enabled at (205): [<c0080000068fc1b4>] kvmhv_run_single_vcpu+0x30c/0x1120 [kvm_hv]
[  228.688664] hardirqs last disabled at (204): [<c0080000068fbff0>] kvmhv_run_single_vcpu+0x148/0x1120 [kvm_hv]
[  228.688719] softirqs last  enabled at (180): [<c000000000c0b2ac>] __do_softirq+0x4ac/0x5d4
[  228.688758] softirqs last disabled at (169): [<c000000000122aa8>] irq_exit+0x1f8/0x210



C.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/4] KVM: PPC: Book3S: Fix potential deadlocks
  2019-05-24  9:17   ` Cédric Le Goater
@ 2019-05-28  4:54     ` Paul Mackerras
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2019-05-28  4:54 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm, kvm-ppc

On Fri, May 24, 2019 at 11:17:16AM +0200, Cédric Le Goater wrote:
> On 5/23/19 8:34 AM, Paul Mackerras wrote:
> > Recent reports of lockdep splats in the HV KVM code revealed that it
> > was taking the kvm->lock mutex in several contexts where a vcpu mutex
> > was already held.  Lockdep has only started warning since I added code
> > to take the vcpu mutexes in the XIVE device release functions, but
> > since Documentation/virtual/kvm/locking.txt specifies that the vcpu
> > mutexes nest inside kvm->lock, it seems that the new code is correct
> > and it is most of the old uses of kvm->lock that are wrong.
> > 
> > This series should fix the problems, by adding new mutexes that nest
> > inside the vcpu mutexes and using them instead of kvm->lock.
> 
> Hello,
> 
> I get this warning when running a guest with this patchset:

Looks like we need the equivalent of 3309bec85e60 applied to the
p9/radix streamlined entry path.
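
That is, annotate the irq state around the streamlined guest entry.
A sketch only, assuming the same trace_hardirqs_on()/trace_hardirqs_off()
annotation around guest entry as in that commit; the exact placement in
kvmhv_run_single_vcpu() is illustrative:

	/*
	 * Interrupts are hard-disabled at this point, but they will
	 * be enabled while we are in the guest, so tell lockdep on
	 * the way in and again on the way out.
	 */
	trace_hardirqs_on();

	trap = kvmhv_p9_guest_entry(vcpu, time_limit, lpcr);

	trace_hardirqs_off();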

Paul.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 3/4] KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
  2019-05-23  6:36   ` Paul Mackerras
@ 2019-05-29  1:54     ` Paul Mackerras
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2019-05-29  1:54 UTC (permalink / raw)
  To: kvm; +Cc: kvm-ppc, Cédric Le Goater

Currently the Book 3S KVM code uses kvm->lock to synchronize access
to the kvm->arch.rtas_tokens list.  Because this list is scanned
inside kvmppc_rtas_hcall(), which is called with the vcpu mutex held,
taking kvm->lock causes a lock inversion problem, which could lead to
a deadlock.

To fix this, we add a new mutex, kvm->arch.rtas_token_lock, which nests
inside the vcpu mutexes, and use that instead of kvm->lock when
accessing the rtas token list.

This removes the lockdep_assert_held() in kvmppc_rtas_tokens_free().
At this point we don't hold the new mutex, but that is OK because
kvmppc_rtas_tokens_free() is only called when the whole VM is being
destroyed, and at that point nothing can be looking up a token in
the list.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s.c           |  1 +
 arch/powerpc/kvm/book3s_rtas.c      | 14 ++++++--------
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 013c76a..fb7d363 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -309,6 +309,7 @@ struct kvm_arch {
 #ifdef CONFIG_PPC_BOOK3S_64
 	struct list_head spapr_tce_tables;
 	struct list_head rtas_tokens;
+	struct mutex rtas_token_lock;
 	DECLARE_BITMAP(enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 #endif
 #ifdef CONFIG_KVM_MPIC
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 61a212d..ac56648 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -902,6 +902,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 #ifdef CONFIG_PPC64
 	INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables);
 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
+	mutex_init(&kvm->arch.rtas_token_lock);
 #endif
 
 	return kvm->arch.kvm_ops->init_vm(kvm);
diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
index 4e178c4..b7ae3df 100644
--- a/arch/powerpc/kvm/book3s_rtas.c
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -146,7 +146,7 @@ static int rtas_token_undefine(struct kvm *kvm, char *name)
 {
 	struct rtas_token_definition *d, *tmp;
 
-	lockdep_assert_held(&kvm->lock);
+	lockdep_assert_held(&kvm->arch.rtas_token_lock);
 
 	list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
 		if (rtas_name_matches(d->handler->name, name)) {
@@ -167,7 +167,7 @@ static int rtas_token_define(struct kvm *kvm, char *name, u64 token)
 	bool found;
 	int i;
 
-	lockdep_assert_held(&kvm->lock);
+	lockdep_assert_held(&kvm->arch.rtas_token_lock);
 
 	list_for_each_entry(d, &kvm->arch.rtas_tokens, list) {
 		if (d->token == token)
@@ -206,14 +206,14 @@ int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp)
 	if (copy_from_user(&args, argp, sizeof(args)))
 		return -EFAULT;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.rtas_token_lock);
 
 	if (args.token)
 		rc = rtas_token_define(kvm, args.name, args.token);
 	else
 		rc = rtas_token_undefine(kvm, args.name);
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.rtas_token_lock);
 
 	return rc;
 }
@@ -245,7 +245,7 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
 	orig_rets = args.rets;
 	args.rets = &args.args[be32_to_cpu(args.nargs)];
 
-	mutex_lock(&vcpu->kvm->lock);
+	mutex_lock(&vcpu->kvm->arch.rtas_token_lock);
 
 	rc = -ENOENT;
 	list_for_each_entry(d, &vcpu->kvm->arch.rtas_tokens, list) {
@@ -256,7 +256,7 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	mutex_unlock(&vcpu->kvm->lock);
+	mutex_unlock(&vcpu->kvm->arch.rtas_token_lock);
 
 	if (rc == 0) {
 		args.rets = orig_rets;
@@ -282,8 +282,6 @@ void kvmppc_rtas_tokens_free(struct kvm *kvm)
 {
 	struct rtas_token_definition *d, *tmp;
 
-	lockdep_assert_held(&kvm->lock);
-
 	list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
 		list_del(&d->list);
 		kfree(d);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/4] KVM: PPC: Book3S: Fix potential deadlocks
  2019-05-23  6:34 ` Paul Mackerras
@ 2019-05-31  6:32   ` Paul Mackerras
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2019-05-31  6:32 UTC (permalink / raw)
  To: kvm; +Cc: kvm-ppc, Cédric Le Goater

On Thu, May 23, 2019 at 04:34:24PM +1000, Paul Mackerras wrote:
> Recent reports of lockdep splats in the HV KVM code revealed that it
> was taking the kvm->lock mutex in several contexts where a vcpu mutex
> was already held.  Lockdep has only started warning since I added code
> to take the vcpu mutexes in the XIVE device release functions, but
> since Documentation/virtual/kvm/locking.txt specifies that the vcpu
> mutexes nest inside kvm->lock, it seems that the new code is correct
> and it is most of the old uses of kvm->lock that are wrong.
> 
> This series should fix the problems, by adding new mutexes that nest
> inside the vcpu mutexes and using them instead of kvm->lock.

Series applied to my kvm-ppc-fixes branch (with v2 of 3/4).

Paul.

^ permalink raw reply	[flat|nested] 28+ messages in thread
