kvmarm.lists.cs.columbia.edu archive mirror
* [PATCH v2 0/4] KVM: arm64: Fix vcpu->mutex v. kvm->lock inversion
@ 2023-03-16 21:14 Oliver Upton
  2023-03-16 21:14 ` [PATCH v2 1/4] KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON Oliver Upton
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Oliver Upton @ 2023-03-16 21:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson, Oliver Upton

As it so happens, lock ordering in KVM/arm64 is completely backwards.
There's a significant amount of VM-wide state that needs to be accessed
from the context of a vCPU. Until now, this was accomplished by
acquiring the kvm->lock, but that cannot be nested within vcpu->mutex.

This series fixes the issue with some fine-grained locking for MP state
and a new, dedicated mutex that can be nested inside both kvm->lock and
vcpu->mutex.
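
For reference, the lock ordering that falls out of this series (and which
is taught to lockdep explicitly at VM/vCPU creation time) is roughly:

  kvm->lock (mutex)
    vcpu->mutex (mutex)
      kvm->arch.config_lock (mutex)
        ... the vgic/ITS locks (see the updated comment in vgic.c)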

Tested with kvmtool and QEMU scaled up to 64 vCPUs on a kernel w/
lockdep enabled. Applies to kvmarm/next.

v1: http://lore.kernel.org/kvmarm/20230308083947.3760066-1-oliver.upton@linux.dev

v1 -> v2:
 - Add a dedicated lock for serializing writes to MP state
 - Inform lockdep of acquisition order at time of VM/vCPU creation
 - Plug a race with GIC creation (Sean)
 - Use the config_lock in GIC ITS flows as well. There is now a single
   (valid) use of kvm->lock when enabling MTE.

Oliver Upton (4):
  KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON
  KVM: arm64: Avoid lock inversion when setting the VM register width
  KVM: arm64: Use config_lock to protect data ordered against KVM_RUN
  KVM: arm64: Use config_lock to protect vgic state

 arch/arm64/include/asm/kvm_host.h     |  4 ++
 arch/arm64/kvm/arm.c                  | 45 +++++++++++++++++++----
 arch/arm64/kvm/guest.c                |  2 +
 arch/arm64/kvm/hypercalls.c           |  4 +-
 arch/arm64/kvm/pmu-emul.c             | 23 +++---------
 arch/arm64/kvm/psci.c                 | 19 +++++-----
 arch/arm64/kvm/reset.c                | 16 ++++----
 arch/arm64/kvm/vgic/vgic-debug.c      |  8 ++--
 arch/arm64/kvm/vgic/vgic-init.c       | 33 ++++++++++-------
 arch/arm64/kvm/vgic/vgic-its.c        | 29 ++++++---------
 arch/arm64/kvm/vgic/vgic-kvm-device.c | 53 ++++++++++++---------------
 arch/arm64/kvm/vgic/vgic-mmio-v3.c    |  4 +-
 arch/arm64/kvm/vgic/vgic-mmio.c       | 12 +++---
 arch/arm64/kvm/vgic/vgic-v4.c         | 11 +++---
 arch/arm64/kvm/vgic/vgic.c            |  2 +-
 15 files changed, 146 insertions(+), 119 deletions(-)

-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 1/4] KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON
  2023-03-16 21:14 [PATCH v2 0/4] KVM: arm64: Fix vcpu->mutex v. kvm->lock inversion Oliver Upton
@ 2023-03-16 21:14 ` Oliver Upton
  2023-03-22 12:02   ` Marc Zyngier
  2023-03-16 21:14 ` [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width Oliver Upton
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2023-03-16 21:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson, Oliver Upton

KVM/arm64 had the lock ordering backwards on vcpu->mutex and kvm->lock
from the very beginning. One such example is the way vCPU resets are
handled: the kvm->lock is acquired while handling a guest CPU_ON PSCI
call.

Add a dedicated lock to serialize writes to kvm_vcpu_arch::mp_state.
Hold the lock in kvm_psci_vcpu_on() to protect against changes while
the reset state is being configured. Ensure that readers read mp_state
exactly once. While at it, plug yet another race by taking the
mp_state_lock in the KVM_SET_MP_STATE ioctl handler.

As changes to MP state are now guarded with a dedicated lock, drop the
kvm->lock acquisition from the PSCI CPU_ON path. Similarly, move the
reader of reset_state outside of the kvm->lock and insert a barrier to
ensure reset_state is read before dropping the pending reset state.

While the kvm->lock inversion still exists in kvm_reset_vcpu(), at least
now PSCI CPU_ON no longer depends on it for serializing vCPU reset.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/arm.c              | 23 ++++++++++++++++++-----
 arch/arm64/kvm/psci.c             | 19 ++++++++++---------
 arch/arm64/kvm/reset.c            | 10 ++++++----
 4 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bcd774d74f34..917586237a4d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -522,6 +522,7 @@ struct kvm_vcpu_arch {
 
 	/* vcpu power state */
 	struct kvm_mp_state mp_state;
+	spinlock_t mp_state_lock;
 
 	/* Cache some mmu pages needed inside spinlock regions */
 	struct kvm_mmu_memory_cache mmu_page_cache;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3bd732eaf087..731a78f85915 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -326,6 +326,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 {
 	int err;
 
+	spin_lock_init(&vcpu->arch.mp_state_lock);
+
 	/* Force users to call KVM_ARM_VCPU_INIT */
 	vcpu->arch.target = -1;
 	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
@@ -443,16 +445,23 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	vcpu->cpu = -1;
 }
 
-void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
+static void __kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.mp_state.mp_state = KVM_MP_STATE_STOPPED;
 	kvm_make_request(KVM_REQ_SLEEP, vcpu);
 	kvm_vcpu_kick(vcpu);
 }
 
+void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
+{
+	spin_lock(&vcpu->arch.mp_state_lock);
+	__kvm_arm_vcpu_power_off(vcpu);
+	spin_unlock(&vcpu->arch.mp_state_lock);
+}
+
 bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.mp_state.mp_state == KVM_MP_STATE_STOPPED;
+	return READ_ONCE(vcpu->arch.mp_state.mp_state) == KVM_MP_STATE_STOPPED;
 }
 
 static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
@@ -464,13 +473,13 @@ static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
 
 static bool kvm_arm_vcpu_suspended(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.mp_state.mp_state == KVM_MP_STATE_SUSPENDED;
+	return READ_ONCE(vcpu->arch.mp_state.mp_state) == KVM_MP_STATE_SUSPENDED;
 }
 
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	*mp_state = vcpu->arch.mp_state;
+	*mp_state = READ_ONCE(vcpu->arch.mp_state);
 
 	return 0;
 }
@@ -480,12 +489,14 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 {
 	int ret = 0;
 
+	spin_lock(&vcpu->arch.mp_state_lock);
+
 	switch (mp_state->mp_state) {
 	case KVM_MP_STATE_RUNNABLE:
 		vcpu->arch.mp_state = *mp_state;
 		break;
 	case KVM_MP_STATE_STOPPED:
-		kvm_arm_vcpu_power_off(vcpu);
+		__kvm_arm_vcpu_power_off(vcpu);
 		break;
 	case KVM_MP_STATE_SUSPENDED:
 		kvm_arm_vcpu_suspend(vcpu);
@@ -494,6 +505,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 		ret = -EINVAL;
 	}
 
+	spin_unlock(&vcpu->arch.mp_state_lock);
+
 	return ret;
 }
 
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 7fbc4c1b9df0..7f1bff1cd670 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -62,6 +62,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	struct vcpu_reset_state *reset_state;
 	struct kvm *kvm = source_vcpu->kvm;
 	struct kvm_vcpu *vcpu = NULL;
+	int ret = PSCI_RET_SUCCESS;
 	unsigned long cpu_id;
 
 	cpu_id = smccc_get_arg1(source_vcpu);
@@ -76,11 +77,15 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	 */
 	if (!vcpu)
 		return PSCI_RET_INVALID_PARAMS;
+
+	spin_lock(&vcpu->arch.mp_state_lock);
 	if (!kvm_arm_vcpu_stopped(vcpu)) {
 		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
-			return PSCI_RET_ALREADY_ON;
+			ret = PSCI_RET_ALREADY_ON;
 		else
-			return PSCI_RET_INVALID_PARAMS;
+			ret = PSCI_RET_INVALID_PARAMS;
+
+		goto out_unlock;
 	}
 
 	reset_state = &vcpu->arch.reset_state;
@@ -108,7 +113,9 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	vcpu->arch.mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
 	kvm_vcpu_wake_up(vcpu);
 
-	return PSCI_RET_SUCCESS;
+out_unlock:
+	spin_unlock(&vcpu->arch.mp_state_lock);
+	return ret;
 }
 
 static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
@@ -229,7 +236,6 @@ static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32
 
 static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
 {
-	struct kvm *kvm = vcpu->kvm;
 	u32 psci_fn = smccc_get_function(vcpu);
 	unsigned long val;
 	int ret = 1;
@@ -254,9 +260,7 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
 		kvm_psci_narrow_to_32bit(vcpu);
 		fallthrough;
 	case PSCI_0_2_FN64_CPU_ON:
-		mutex_lock(&kvm->lock);
 		val = kvm_psci_vcpu_on(vcpu);
-		mutex_unlock(&kvm->lock);
 		break;
 	case PSCI_0_2_FN_AFFINITY_INFO:
 		kvm_psci_narrow_to_32bit(vcpu);
@@ -395,7 +399,6 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
 
 static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
 {
-	struct kvm *kvm = vcpu->kvm;
 	u32 psci_fn = smccc_get_function(vcpu);
 	unsigned long val;
 
@@ -405,9 +408,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
 		val = PSCI_RET_SUCCESS;
 		break;
 	case KVM_PSCI_FN_CPU_ON:
-		mutex_lock(&kvm->lock);
 		val = kvm_psci_vcpu_on(vcpu);
-		mutex_unlock(&kvm->lock);
 		break;
 	default:
 		val = PSCI_RET_NOT_SUPPORTED;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 49a3257dec46..b874ec6a37c1 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -264,15 +264,17 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 
 	mutex_lock(&vcpu->kvm->lock);
 	ret = kvm_set_vm_width(vcpu);
-	if (!ret) {
-		reset_state = vcpu->arch.reset_state;
-		WRITE_ONCE(vcpu->arch.reset_state.reset, false);
-	}
 	mutex_unlock(&vcpu->kvm->lock);
 
 	if (ret)
 		return ret;
 
+	reset_state = vcpu->arch.reset_state;
+
+	/* Ensure reset_state is read before clearing the pending state */
+	smp_rmb();
+	WRITE_ONCE(vcpu->arch.reset_state.reset, false);
+
 	/* Reset PMU outside of the non-preemptible section */
 	kvm_pmu_vcpu_reset(vcpu);
 
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width
  2023-03-16 21:14 [PATCH v2 0/4] KVM: arm64: Fix vcpu->mutex v. kvm->lock inversion Oliver Upton
  2023-03-16 21:14 ` [PATCH v2 1/4] KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON Oliver Upton
@ 2023-03-16 21:14 ` Oliver Upton
  2023-03-22 12:02   ` Marc Zyngier
  2023-03-23 20:09   ` Jeremy Linton
  2023-03-16 21:14 ` [PATCH v2 3/4] KVM: arm64: Use config_lock to protect data ordered against KVM_RUN Oliver Upton
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Oliver Upton @ 2023-03-16 21:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson, Oliver Upton,
	Jeremy Linton

kvm->lock must be taken outside of the vcpu->mutex. Of course, the
locking documentation for KVM makes this abundantly clear. Nonetheless,
the locking order in KVM/arm64 has been wrong for quite a while; we
acquire the kvm->lock while holding the vcpu->mutex all over the shop.

All was seemingly fine until commit 42a90008f890 ("KVM: Ensure lockdep
knows about kvm->lock vs. vcpu->mutex ordering rule") caught us with our
pants down, leading to lockdep barfing:

 ======================================================
 WARNING: possible circular locking dependency detected
 6.2.0-rc7+ #19 Not tainted
 ------------------------------------------------------
 qemu-system-aar/859 is trying to acquire lock:
 ffff5aa69269eba0 (&host_kvm->lock){+.+.}-{3:3}, at: kvm_reset_vcpu+0x34/0x274

 but task is already holding lock:
 ffff5aa68768c0b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8c/0xba0

 which lock already depends on the new lock.

Add a dedicated lock to serialize writes to VM-scoped configuration from
the context of a vCPU. Protect the register width flags with the new
lock, thus avoiding the need to grab the kvm->lock while holding
vcpu->mutex in kvm_reset_vcpu().

Reported-by: Jeremy Linton <jeremy.linton@arm.com>
Link: https://lore.kernel.org/kvmarm/f6452cdd-65ff-34b8-bab0-5c06416da5f6@arm.com/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/arm.c              | 18 ++++++++++++++++++
 arch/arm64/kvm/reset.c            |  6 +++---
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 917586237a4d..1f4b9708a775 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -185,6 +185,9 @@ struct kvm_protected_vm {
 };
 
 struct kvm_arch {
+	/* Protects VM-scoped configuration data */
+	struct mutex config_lock;
+
 	struct kvm_s2_mmu mmu;
 
 	/* VTCR_EL2 value for this VM */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 731a78f85915..1478bec52767 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -128,6 +128,16 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	int ret;
 
+	mutex_init(&kvm->arch.config_lock);
+
+#ifdef CONFIG_LOCKDEP
+	/* Clue in lockdep that the config_lock must be taken inside kvm->lock */
+	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.config_lock);
+	mutex_unlock(&kvm->arch.config_lock);
+	mutex_unlock(&kvm->lock);
+#endif
+
 	ret = kvm_share_hyp(kvm, kvm + 1);
 	if (ret)
 		return ret;
@@ -328,6 +338,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	spin_lock_init(&vcpu->arch.mp_state_lock);
 
+#ifdef CONFIG_LOCKDEP
+	/* Inform lockdep that the config_lock is acquired after vcpu->mutex */
+	mutex_lock(&vcpu->mutex);
+	mutex_lock(&vcpu->kvm->arch.config_lock);
+	mutex_unlock(&vcpu->kvm->arch.config_lock);
+	mutex_unlock(&vcpu->mutex);
+#endif
+
 	/* Force users to call KVM_ARM_VCPU_INIT */
 	vcpu->arch.target = -1;
 	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index b874ec6a37c1..ee445a7a1ef8 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -205,7 +205,7 @@ static int kvm_set_vm_width(struct kvm_vcpu *vcpu)
 
 	is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
 
-	lockdep_assert_held(&kvm->lock);
+	lockdep_assert_held(&kvm->arch.config_lock);
 
 	if (test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &kvm->arch.flags)) {
 		/*
@@ -262,9 +262,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	bool loaded;
 	u32 pstate;
 
-	mutex_lock(&vcpu->kvm->lock);
+	mutex_lock(&vcpu->kvm->arch.config_lock);
 	ret = kvm_set_vm_width(vcpu);
-	mutex_unlock(&vcpu->kvm->lock);
+	mutex_unlock(&vcpu->kvm->arch.config_lock);
 
 	if (ret)
 		return ret;
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 3/4] KVM: arm64: Use config_lock to protect data ordered against KVM_RUN
  2023-03-16 21:14 [PATCH v2 0/4] KVM: arm64: Fix vcpu->mutex v. kvm->lock inversion Oliver Upton
  2023-03-16 21:14 ` [PATCH v2 1/4] KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON Oliver Upton
  2023-03-16 21:14 ` [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width Oliver Upton
@ 2023-03-16 21:14 ` Oliver Upton
  2023-03-16 21:14 ` [PATCH v2 4/4] KVM: arm64: Use config_lock to protect vgic state Oliver Upton
  2023-03-23 22:48 ` [PATCH v2 0/4] KVM: arm64: Fix vcpu->mutex v. kvm->lock inversion Jeremy Linton
  4 siblings, 0 replies; 17+ messages in thread
From: Oliver Upton @ 2023-03-16 21:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson, Oliver Upton

There are various bits of VM-scoped data that can only be configured
before the first call to KVM_RUN, such as the hypercall bitmaps and
the PMU. As these fields are protected by the kvm->lock and accessed
while holding vcpu->mutex, this is yet another example of lock
inversion.

Change out the kvm->lock for kvm->arch.config_lock in all of these
instances. Opportunistically simplify the locking mechanics of the
PMU configuration by holding the config_lock for the entirety of
kvm_arm_pmu_v3_set_attr().

Note that this also addresses a couple of bugs. There is an unguarded
read of the PMU version in KVM_ARM_VCPU_PMU_V3_FILTER which could race
with KVM_ARM_VCPU_PMU_V3_SET_PMU. Additionally, until now writes to the
per-vCPU vPMU irq were not serialized VM-wide, meaning concurrent calls
to KVM_ARM_VCPU_PMU_V3_IRQ could lead to a false positive in
pmu_irq_is_valid().

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/arm.c        |  4 ++--
 arch/arm64/kvm/guest.c      |  2 ++
 arch/arm64/kvm/hypercalls.c |  4 ++--
 arch/arm64/kvm/pmu-emul.c   | 23 ++++++-----------------
 4 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 1478bec52767..a4793961b73d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -624,9 +624,9 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 	if (kvm_vm_is_protected(kvm))
 		kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.config_lock);
 	set_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags);
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.config_lock);
 
 	return ret;
 }
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 07444fa22888..481c79cf22cd 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -957,7 +957,9 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
 
 	switch (attr->group) {
 	case KVM_ARM_VCPU_PMU_V3_CTRL:
+		mutex_lock(&vcpu->kvm->arch.config_lock);
 		ret = kvm_arm_pmu_v3_set_attr(vcpu, attr);
+		mutex_unlock(&vcpu->kvm->arch.config_lock);
 		break;
 	case KVM_ARM_VCPU_TIMER_CTRL:
 		ret = kvm_arm_timer_set_attr(vcpu, attr);
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 5da884e11337..fbdbf4257f76 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -377,7 +377,7 @@ static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
 	if (val & ~fw_reg_features)
 		return -EINVAL;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.config_lock);
 
 	if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags) &&
 	    val != *fw_reg_bmap) {
@@ -387,7 +387,7 @@ static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
 
 	WRITE_ONCE(*fw_reg_bmap, val);
 out:
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.config_lock);
 	return ret;
 }
 
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index c243b10f3e15..82991d89c2ea 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -875,7 +875,7 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
 	struct arm_pmu *arm_pmu;
 	int ret = -ENXIO;
 
-	mutex_lock(&kvm->lock);
+	lockdep_assert_held(&kvm->arch.config_lock);
 	mutex_lock(&arm_pmus_lock);
 
 	list_for_each_entry(entry, &arm_pmus, entry) {
@@ -895,7 +895,6 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
 	}
 
 	mutex_unlock(&arm_pmus_lock);
-	mutex_unlock(&kvm->lock);
 	return ret;
 }
 
@@ -903,22 +902,20 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
 	struct kvm *kvm = vcpu->kvm;
 
+	lockdep_assert_held(&kvm->arch.config_lock);
+
 	if (!kvm_vcpu_has_pmu(vcpu))
 		return -ENODEV;
 
 	if (vcpu->arch.pmu.created)
 		return -EBUSY;
 
-	mutex_lock(&kvm->lock);
 	if (!kvm->arch.arm_pmu) {
 		/* No PMU set, get the default one */
 		kvm->arch.arm_pmu = kvm_pmu_probe_armpmu();
-		if (!kvm->arch.arm_pmu) {
-			mutex_unlock(&kvm->lock);
+		if (!kvm->arch.arm_pmu)
 			return -ENODEV;
-		}
 	}
-	mutex_unlock(&kvm->lock);
 
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_PMU_V3_IRQ: {
@@ -962,19 +959,13 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		     filter.action != KVM_PMU_EVENT_DENY))
 			return -EINVAL;
 
-		mutex_lock(&kvm->lock);
-
-		if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
-			mutex_unlock(&kvm->lock);
+		if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags))
 			return -EBUSY;
-		}
 
 		if (!kvm->arch.pmu_filter) {
 			kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT);
-			if (!kvm->arch.pmu_filter) {
-				mutex_unlock(&kvm->lock);
+			if (!kvm->arch.pmu_filter)
 				return -ENOMEM;
-			}
 
 			/*
 			 * The default depends on the first applied filter.
@@ -993,8 +984,6 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		else
 			bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents);
 
-		mutex_unlock(&kvm->lock);
-
 		return 0;
 	}
 	case KVM_ARM_VCPU_PMU_V3_SET_PMU: {
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 4/4] KVM: arm64: Use config_lock to protect vgic state
  2023-03-16 21:14 [PATCH v2 0/4] KVM: arm64: Fix vcpu->mutex v. kvm->lock inversion Oliver Upton
                   ` (2 preceding siblings ...)
  2023-03-16 21:14 ` [PATCH v2 3/4] KVM: arm64: Use config_lock to protect data ordered against KVM_RUN Oliver Upton
@ 2023-03-16 21:14 ` Oliver Upton
  2023-03-22 12:02   ` Marc Zyngier
  2023-03-23 22:48 ` [PATCH v2 0/4] KVM: arm64: Fix vcpu->mutex v. kvm->lock inversion Jeremy Linton
  4 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2023-03-16 21:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson, Oliver Upton

Almost all of the vgic state is VM-scoped but accessed from the context
of a vCPU. These accesses were serialized on the kvm->lock which cannot
be nested within a vcpu->mutex critical section.

Move the vgic state over to the config_lock. Tweak the lock
ordering where necessary to ensure that the config_lock is acquired
after the vcpu->mutex. Acquire the config_lock in kvm_vgic_create() to
avoid a race between the converted flows and GIC creation.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/vgic/vgic-debug.c      |  8 ++--
 arch/arm64/kvm/vgic/vgic-init.c       | 33 ++++++++++-------
 arch/arm64/kvm/vgic/vgic-its.c        | 29 ++++++---------
 arch/arm64/kvm/vgic/vgic-kvm-device.c | 53 ++++++++++++---------------
 arch/arm64/kvm/vgic/vgic-mmio-v3.c    |  4 +-
 arch/arm64/kvm/vgic/vgic-mmio.c       | 12 +++---
 arch/arm64/kvm/vgic/vgic-v4.c         | 11 +++---
 arch/arm64/kvm/vgic/vgic.c            |  2 +-
 8 files changed, 75 insertions(+), 77 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
index 78cde687383c..07aa0437125a 100644
--- a/arch/arm64/kvm/vgic/vgic-debug.c
+++ b/arch/arm64/kvm/vgic/vgic-debug.c
@@ -85,7 +85,7 @@ static void *vgic_debug_start(struct seq_file *s, loff_t *pos)
 	struct kvm *kvm = s->private;
 	struct vgic_state_iter *iter;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.config_lock);
 	iter = kvm->arch.vgic.iter;
 	if (iter) {
 		iter = ERR_PTR(-EBUSY);
@@ -104,7 +104,7 @@ static void *vgic_debug_start(struct seq_file *s, loff_t *pos)
 	if (end_of_vgic(iter))
 		iter = NULL;
 out:
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.config_lock);
 	return iter;
 }
 
@@ -132,12 +132,12 @@ static void vgic_debug_stop(struct seq_file *s, void *v)
 	if (IS_ERR(v))
 		return;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.config_lock);
 	iter = kvm->arch.vgic.iter;
 	kfree(iter->lpi_array);
 	kfree(iter);
 	kvm->arch.vgic.iter = NULL;
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.config_lock);
 }
 
 static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index cd134db41a57..b1690063e17d 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -74,9 +74,6 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 	unsigned long i;
 	int ret;
 
-	if (irqchip_in_kernel(kvm))
-		return -EEXIST;
-
 	/*
 	 * This function is also called by the KVM_CREATE_IRQCHIP handler,
 	 * which had no chance yet to check the availability of the GICv2
@@ -91,6 +88,13 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 	if (!lock_all_vcpus(kvm))
 		return ret;
 
+	mutex_lock(&kvm->arch.config_lock);
+
+	if (irqchip_in_kernel(kvm)) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (vcpu_has_run_once(vcpu))
 			goto out_unlock;
@@ -118,6 +122,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 		INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
 
 out_unlock:
+	mutex_unlock(&kvm->arch.config_lock);
 	unlock_all_vcpus(kvm);
 	return ret;
 }
@@ -227,9 +232,9 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
 	 * KVM io device for the redistributor that belongs to this VCPU.
 	 */
 	if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
-		mutex_lock(&vcpu->kvm->lock);
+		mutex_lock(&vcpu->kvm->arch.config_lock);
 		ret = vgic_register_redist_iodev(vcpu);
-		mutex_unlock(&vcpu->kvm->lock);
+		mutex_unlock(&vcpu->kvm->arch.config_lock);
 	}
 	return ret;
 }
@@ -250,7 +255,6 @@ static void kvm_vgic_vcpu_enable(struct kvm_vcpu *vcpu)
  * The function is generally called when nr_spis has been explicitly set
  * by the guest through the KVM DEVICE API. If not nr_spis is set to 256.
  * vgic_initialized() returns true when this function has succeeded.
- * Must be called with kvm->lock held!
  */
 int vgic_init(struct kvm *kvm)
 {
@@ -259,6 +263,8 @@ int vgic_init(struct kvm *kvm)
 	int ret = 0, i;
 	unsigned long idx;
 
+	lockdep_assert_held(&kvm->arch.config_lock);
+
 	if (vgic_initialized(kvm))
 		return 0;
 
@@ -373,12 +379,13 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
 	vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
 }
 
-/* To be called with kvm->lock held */
 static void __kvm_vgic_destroy(struct kvm *kvm)
 {
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
 
+	lockdep_assert_held(&kvm->arch.config_lock);
+
 	vgic_debug_destroy(kvm);
 
 	kvm_for_each_vcpu(i, vcpu, kvm)
@@ -389,9 +396,9 @@ static void __kvm_vgic_destroy(struct kvm *kvm)
 
 void kvm_vgic_destroy(struct kvm *kvm)
 {
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.config_lock);
 	__kvm_vgic_destroy(kvm);
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.config_lock);
 }
 
 /**
@@ -414,9 +421,9 @@ int vgic_lazy_init(struct kvm *kvm)
 		if (kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V2)
 			return -EBUSY;
 
-		mutex_lock(&kvm->lock);
+		mutex_lock(&kvm->arch.config_lock);
 		ret = vgic_init(kvm);
-		mutex_unlock(&kvm->lock);
+		mutex_unlock(&kvm->arch.config_lock);
 	}
 
 	return ret;
@@ -441,7 +448,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
 	if (likely(vgic_ready(kvm)))
 		return 0;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.config_lock);
 	if (vgic_ready(kvm))
 		goto out;
 
@@ -459,7 +466,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
 		dist->ready = true;
 
 out:
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.config_lock);
 	return ret;
 }
 
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 2642e9ce2819..ca55065102e7 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -2043,7 +2043,10 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
 	if (offset & align)
 		return -EINVAL;
 
-	mutex_lock(&dev->kvm->lock);
+	if (!lock_all_vcpus(dev->kvm))
+		return -EBUSY;
+
+	mutex_lock(&dev->kvm->arch.config_lock);
 
 	if (IS_VGIC_ADDR_UNDEF(its->vgic_its_base)) {
 		ret = -ENXIO;
@@ -2058,11 +2061,6 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
 		goto out;
 	}
 
-	if (!lock_all_vcpus(dev->kvm)) {
-		ret = -EBUSY;
-		goto out;
-	}
-
 	addr = its->vgic_its_base + offset;
 
 	len = region->access_flags & VGIC_ACCESS_64bit ? 8 : 4;
@@ -2076,9 +2074,9 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
 	} else {
 		*reg = region->its_read(dev->kvm, its, addr, len);
 	}
-	unlock_all_vcpus(dev->kvm);
 out:
-	mutex_unlock(&dev->kvm->lock);
+	mutex_unlock(&dev->kvm->arch.config_lock);
+	unlock_all_vcpus(dev->kvm);
 	return ret;
 }
 
@@ -2748,14 +2746,11 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 	if (attr == KVM_DEV_ARM_VGIC_CTRL_INIT) /* Nothing to do */
 		return 0;
 
-	mutex_lock(&kvm->lock);
-	mutex_lock(&its->its_lock);
-
-	if (!lock_all_vcpus(kvm)) {
-		mutex_unlock(&its->its_lock);
-		mutex_unlock(&kvm->lock);
+	if (!lock_all_vcpus(kvm))
 		return -EBUSY;
-	}
+
+	mutex_lock(&kvm->arch.config_lock);
+	mutex_lock(&its->its_lock);
 
 	switch (attr) {
 	case KVM_DEV_ARM_ITS_CTRL_RESET:
@@ -2769,9 +2764,9 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 		break;
 	}
 
-	unlock_all_vcpus(kvm);
 	mutex_unlock(&its->its_lock);
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.config_lock);
+	unlock_all_vcpus(kvm);
 	return ret;
 }
 
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index edeac2380591..d5f8a81d6a92 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -46,7 +46,7 @@ int kvm_set_legacy_vgic_v2_addr(struct kvm *kvm, struct kvm_arm_device_addr *dev
 	struct vgic_dist *vgic = &kvm->arch.vgic;
 	int r;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.config_lock);
 	switch (FIELD_GET(KVM_ARM_DEVICE_TYPE_MASK, dev_addr->id)) {
 	case KVM_VGIC_V2_ADDR_TYPE_DIST:
 		r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
@@ -68,7 +68,7 @@ int kvm_set_legacy_vgic_v2_addr(struct kvm *kvm, struct kvm_arm_device_addr *dev
 		r = -ENODEV;
 	}
 
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.config_lock);
 
 	return r;
 }
@@ -102,7 +102,7 @@ static int kvm_vgic_addr(struct kvm *kvm, struct kvm_device_attr *attr, bool wri
 		if (get_user(addr, uaddr))
 			return -EFAULT;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->arch.config_lock);
 	switch (attr->attr) {
 	case KVM_VGIC_V2_ADDR_TYPE_DIST:
 		r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
@@ -191,7 +191,7 @@ static int kvm_vgic_addr(struct kvm *kvm, struct kvm_device_attr *attr, bool wri
 	}
 
 out:
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->arch.config_lock);
 
 	if (!r && !write)
 		r =  put_user(addr, uaddr);
@@ -227,7 +227,7 @@ static int vgic_set_common_attr(struct kvm_device *dev,
 		    (val & 31))
 			return -EINVAL;
 
-		mutex_lock(&dev->kvm->lock);
+		mutex_lock(&dev->kvm->arch.config_lock);
 
 		if (vgic_ready(dev->kvm) || dev->kvm->arch.vgic.nr_spis)
 			ret = -EBUSY;
@@ -235,16 +235,16 @@ static int vgic_set_common_attr(struct kvm_device *dev,
 			dev->kvm->arch.vgic.nr_spis =
 				val - VGIC_NR_PRIVATE_IRQS;
 
-		mutex_unlock(&dev->kvm->lock);
+		mutex_unlock(&dev->kvm->arch.config_lock);
 
 		return ret;
 	}
 	case KVM_DEV_ARM_VGIC_GRP_CTRL: {
 		switch (attr->attr) {
 		case KVM_DEV_ARM_VGIC_CTRL_INIT:
-			mutex_lock(&dev->kvm->lock);
+			mutex_lock(&dev->kvm->arch.config_lock);
 			r = vgic_init(dev->kvm);
-			mutex_unlock(&dev->kvm->lock);
+			mutex_unlock(&dev->kvm->arch.config_lock);
 			return r;
 		case KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES:
 			/*
@@ -254,15 +254,14 @@ static int vgic_set_common_attr(struct kvm_device *dev,
 			 */
 			if (vgic_check_type(dev->kvm, KVM_DEV_TYPE_ARM_VGIC_V3))
 				return -ENXIO;
-			mutex_lock(&dev->kvm->lock);
 
-			if (!lock_all_vcpus(dev->kvm)) {
-				mutex_unlock(&dev->kvm->lock);
+			if (!lock_all_vcpus(dev->kvm))
 				return -EBUSY;
-			}
+
+			mutex_lock(&dev->kvm->arch.config_lock);
 			r = vgic_v3_save_pending_tables(dev->kvm);
+			mutex_unlock(&dev->kvm->arch.config_lock);
 			unlock_all_vcpus(dev->kvm);
-			mutex_unlock(&dev->kvm->lock);
 			return r;
 		}
 		break;
@@ -409,17 +408,15 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
 		if (get_user(val, uaddr))
 			return -EFAULT;
 
-	mutex_lock(&dev->kvm->lock);
+	if (!lock_all_vcpus(dev->kvm))
+		return -EBUSY;
+
+	mutex_lock(&dev->kvm->arch.config_lock);
 
 	ret = vgic_init(dev->kvm);
 	if (ret)
 		goto out;
 
-	if (!lock_all_vcpus(dev->kvm)) {
-		ret = -EBUSY;
-		goto out;
-	}
-
 	switch (attr->group) {
 	case KVM_DEV_ARM_VGIC_GRP_CPU_REGS:
 		ret = vgic_v2_cpuif_uaccess(vcpu, is_write, addr, &val);
@@ -432,9 +429,9 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
 		break;
 	}
 
-	unlock_all_vcpus(dev->kvm);
 out:
-	mutex_unlock(&dev->kvm->lock);
+	mutex_unlock(&dev->kvm->arch.config_lock);
+	unlock_all_vcpus(dev->kvm);
 
 	if (!ret && !is_write)
 		ret = put_user(val, uaddr);
@@ -567,14 +564,12 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 			return -EFAULT;
 	}
 
-	mutex_lock(&dev->kvm->lock);
+	if (!lock_all_vcpus(dev->kvm))
+		return -EBUSY;
 
-	if (unlikely(!vgic_initialized(dev->kvm))) {
-		ret = -EBUSY;
-		goto out;
-	}
+	mutex_lock(&dev->kvm->arch.config_lock);
 
-	if (!lock_all_vcpus(dev->kvm)) {
+	if (unlikely(!vgic_initialized(dev->kvm))) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -609,9 +604,9 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 		break;
 	}
 
-	unlock_all_vcpus(dev->kvm);
 out:
-	mutex_unlock(&dev->kvm->lock);
+	mutex_unlock(&dev->kvm->arch.config_lock);
+	unlock_all_vcpus(dev->kvm);
 
 	if (!ret && uaccess && !is_write) {
 		u32 __user *uaddr = (u32 __user *)(unsigned long)attr->addr;
diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 91201f743033..472b18ac92a2 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -111,7 +111,7 @@ static void vgic_mmio_write_v3_misc(struct kvm_vcpu *vcpu,
 	case GICD_CTLR: {
 		bool was_enabled, is_hwsgi;
 
-		mutex_lock(&vcpu->kvm->lock);
+		mutex_lock(&vcpu->kvm->arch.config_lock);
 
 		was_enabled = dist->enabled;
 		is_hwsgi = dist->nassgireq;
@@ -139,7 +139,7 @@ static void vgic_mmio_write_v3_misc(struct kvm_vcpu *vcpu,
 		else if (!was_enabled && dist->enabled)
 			vgic_kick_vcpus(vcpu->kvm);
 
-		mutex_unlock(&vcpu->kvm->lock);
+		mutex_unlock(&vcpu->kvm->arch.config_lock);
 		break;
 	}
 	case GICD_TYPER:
diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
index e67b3b2c8044..1939c94e0b24 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio.c
@@ -530,13 +530,13 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
 	u32 val;
 
-	mutex_lock(&vcpu->kvm->lock);
+	mutex_lock(&vcpu->kvm->arch.config_lock);
 	vgic_access_active_prepare(vcpu, intid);
 
 	val = __vgic_mmio_read_active(vcpu, addr, len);
 
 	vgic_access_active_finish(vcpu, intid);
-	mutex_unlock(&vcpu->kvm->lock);
+	mutex_unlock(&vcpu->kvm->arch.config_lock);
 
 	return val;
 }
@@ -625,13 +625,13 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
 {
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
 
-	mutex_lock(&vcpu->kvm->lock);
+	mutex_lock(&vcpu->kvm->arch.config_lock);
 	vgic_access_active_prepare(vcpu, intid);
 
 	__vgic_mmio_write_cactive(vcpu, addr, len, val);
 
 	vgic_access_active_finish(vcpu, intid);
-	mutex_unlock(&vcpu->kvm->lock);
+	mutex_unlock(&vcpu->kvm->arch.config_lock);
 }
 
 int vgic_mmio_uaccess_write_cactive(struct kvm_vcpu *vcpu,
@@ -662,13 +662,13 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
 {
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
 
-	mutex_lock(&vcpu->kvm->lock);
+	mutex_lock(&vcpu->kvm->arch.config_lock);
 	vgic_access_active_prepare(vcpu, intid);
 
 	__vgic_mmio_write_sactive(vcpu, addr, len, val);
 
 	vgic_access_active_finish(vcpu, intid);
-	mutex_unlock(&vcpu->kvm->lock);
+	mutex_unlock(&vcpu->kvm->arch.config_lock);
 }
 
 int vgic_mmio_uaccess_write_sactive(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index a413718be92b..3bb003478060 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -232,9 +232,8 @@ int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq)
  * @kvm:	Pointer to the VM being initialized
  *
  * We may be called each time a vITS is created, or when the
- * vgic is initialized. This relies on kvm->lock to be
- * held. In both cases, the number of vcpus should now be
- * fixed.
+ * vgic is initialized. In both cases, the number of vcpus
+ * should now be fixed.
  */
 int vgic_v4_init(struct kvm *kvm)
 {
@@ -243,6 +242,8 @@ int vgic_v4_init(struct kvm *kvm)
 	int nr_vcpus, ret;
 	unsigned long i;
 
+	lockdep_assert_held(&kvm->arch.config_lock);
+
 	if (!kvm_vgic_global_state.has_gicv4)
 		return 0; /* Nothing to see here... move along. */
 
@@ -309,14 +310,14 @@ int vgic_v4_init(struct kvm *kvm)
 /**
  * vgic_v4_teardown - Free the GICv4 data structures
  * @kvm:	Pointer to the VM being destroyed
- *
- * Relies on kvm->lock to be held.
  */
 void vgic_v4_teardown(struct kvm *kvm)
 {
 	struct its_vm *its_vm = &kvm->arch.vgic.its_vm;
 	int i;
 
+	lockdep_assert_held(&kvm->arch.config_lock);
+
 	if (!its_vm->vpes)
 		return;
 
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index d97e6080b421..afea64f60086 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -23,7 +23,7 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
 
 /*
  * Locking order is always:
- * kvm->lock (mutex)
+ * kvm->arch.config_lock (mutex)
  *   its->cmd_lock (mutex)
  *     its->its_lock (mutex)
  *       vgic_cpu->ap_list_lock		must be taken with IRQs disabled
-- 
2.40.0.rc1.284.g88254d51c5-goog



* Re: [PATCH v2 4/4] KVM: arm64: Use config_lock to protect vgic state
  2023-03-16 21:14 ` [PATCH v2 4/4] KVM: arm64: Use config_lock to protect vgic state Oliver Upton
@ 2023-03-22 12:02   ` Marc Zyngier
  2023-03-23 19:18     ` Oliver Upton
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2023-03-22 12:02 UTC (permalink / raw)
  To: Oliver Upton
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson

On Thu, 16 Mar 2023 21:14:12 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Almost all of the vgic state is VM-scoped but accessed from the context
> of a vCPU. These accesses were serialized on the kvm->lock which cannot
> be nested within a vcpu->mutex critical section.
> 
> Move the vgic state over to the config_lock. Tweak the lock
> ordering where necessary to ensure that the config_lock is acquired
> after the vcpu->mutex. Acquire the config_lock in kvm_vgic_create() to
> avoid a race between the converted flows and GIC creation.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/kvm/vgic/vgic-debug.c      |  8 ++--
>  arch/arm64/kvm/vgic/vgic-init.c       | 33 ++++++++++-------
>  arch/arm64/kvm/vgic/vgic-its.c        | 29 ++++++---------
>  arch/arm64/kvm/vgic/vgic-kvm-device.c | 53 ++++++++++++---------------
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c    |  4 +-
>  arch/arm64/kvm/vgic/vgic-mmio.c       | 12 +++---
>  arch/arm64/kvm/vgic/vgic-v4.c         | 11 +++---
>  arch/arm64/kvm/vgic/vgic.c            |  2 +-
>  8 files changed, 75 insertions(+), 77 deletions(-)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
> index 78cde687383c..07aa0437125a 100644
> --- a/arch/arm64/kvm/vgic/vgic-debug.c
> +++ b/arch/arm64/kvm/vgic/vgic-debug.c
> @@ -85,7 +85,7 @@ static void *vgic_debug_start(struct seq_file *s, loff_t *pos)
>  	struct kvm *kvm = s->private;
>  	struct vgic_state_iter *iter;
>  
> -	mutex_lock(&kvm->lock);
> +	mutex_lock(&kvm->arch.config_lock);
>  	iter = kvm->arch.vgic.iter;
>  	if (iter) {
>  		iter = ERR_PTR(-EBUSY);
> @@ -104,7 +104,7 @@ static void *vgic_debug_start(struct seq_file *s, loff_t *pos)
>  	if (end_of_vgic(iter))
>  		iter = NULL;
>  out:
> -	mutex_unlock(&kvm->lock);
> +	mutex_unlock(&kvm->arch.config_lock);
>  	return iter;
>  }
>  
> @@ -132,12 +132,12 @@ static void vgic_debug_stop(struct seq_file *s, void *v)
>  	if (IS_ERR(v))
>  		return;
>  
> -	mutex_lock(&kvm->lock);
> +	mutex_lock(&kvm->arch.config_lock);
>  	iter = kvm->arch.vgic.iter;
>  	kfree(iter->lpi_array);
>  	kfree(iter);
>  	kvm->arch.vgic.iter = NULL;
> -	mutex_unlock(&kvm->lock);
> +	mutex_unlock(&kvm->arch.config_lock);
>  }
>  
>  static void print_dist_state(struct seq_file *s, struct vgic_dist *dist)
> diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> index cd134db41a57..b1690063e17d 100644
> --- a/arch/arm64/kvm/vgic/vgic-init.c
> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> @@ -74,9 +74,6 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>  	unsigned long i;
>  	int ret;
>  
> -	if (irqchip_in_kernel(kvm))
> -		return -EEXIST;
> -
>  	/*
>  	 * This function is also called by the KVM_CREATE_IRQCHIP handler,
>  	 * which had no chance yet to check the availability of the GICv2
> @@ -91,6 +88,13 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>  	if (!lock_all_vcpus(kvm))
>  		return ret;
>  
> +	mutex_lock(&kvm->arch.config_lock);
> +
> +	if (irqchip_in_kernel(kvm)) {
> +		ret = -EEXIST;
> +		goto out_unlock;
> +	}
> +
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		if (vcpu_has_run_once(vcpu))
>  			goto out_unlock;
> @@ -118,6 +122,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>  		INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
>  
>  out_unlock:
> +	mutex_unlock(&kvm->arch.config_lock);
>  	unlock_all_vcpus(kvm);
>  	return ret;
>  }
> @@ -227,9 +232,9 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>  	 * KVM io device for the redistributor that belongs to this VCPU.
>  	 */
>  	if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
> -		mutex_lock(&vcpu->kvm->lock);
> +		mutex_lock(&vcpu->kvm->arch.config_lock);
>  		ret = vgic_register_redist_iodev(vcpu);
> -		mutex_unlock(&vcpu->kvm->lock);
> +		mutex_unlock(&vcpu->kvm->arch.config_lock);
>  	}
>  	return ret;
>  }
> @@ -250,7 +255,6 @@ static void kvm_vgic_vcpu_enable(struct kvm_vcpu *vcpu)
>   * The function is generally called when nr_spis has been explicitly set
>   * by the guest through the KVM DEVICE API. If not nr_spis is set to 256.
>   * vgic_initialized() returns true when this function has succeeded.
> - * Must be called with kvm->lock held!
>   */
>  int vgic_init(struct kvm *kvm)
>  {
> @@ -259,6 +263,8 @@ int vgic_init(struct kvm *kvm)
>  	int ret = 0, i;
>  	unsigned long idx;
>  
> +	lockdep_assert_held(&kvm->arch.config_lock);
> +
>  	if (vgic_initialized(kvm))
>  		return 0;
>  
> @@ -373,12 +379,13 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>  	vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
>  }
>  
> -/* To be called with kvm->lock held */
>  static void __kvm_vgic_destroy(struct kvm *kvm)
>  {
>  	struct kvm_vcpu *vcpu;
>  	unsigned long i;
>  
> +	lockdep_assert_held(&kvm->arch.config_lock);
> +
>  	vgic_debug_destroy(kvm);
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm)
> @@ -389,9 +396,9 @@ static void __kvm_vgic_destroy(struct kvm *kvm)
>  
>  void kvm_vgic_destroy(struct kvm *kvm)
>  {
> -	mutex_lock(&kvm->lock);
> +	mutex_lock(&kvm->arch.config_lock);
>  	__kvm_vgic_destroy(kvm);
> -	mutex_unlock(&kvm->lock);
> +	mutex_unlock(&kvm->arch.config_lock);
>  }
>  
>  /**
> @@ -414,9 +421,9 @@ int vgic_lazy_init(struct kvm *kvm)
>  		if (kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V2)
>  			return -EBUSY;
>  
> -		mutex_lock(&kvm->lock);
> +		mutex_lock(&kvm->arch.config_lock);
>  		ret = vgic_init(kvm);
> -		mutex_unlock(&kvm->lock);
> +		mutex_unlock(&kvm->arch.config_lock);
>  	}
>  
>  	return ret;
> @@ -441,7 +448,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
>  	if (likely(vgic_ready(kvm)))
>  		return 0;
>  
> -	mutex_lock(&kvm->lock);
> +	mutex_lock(&kvm->arch.config_lock);
>  	if (vgic_ready(kvm))
>  		goto out;
>  
> @@ -459,7 +466,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
>  		dist->ready = true;
>  
>  out:
> -	mutex_unlock(&kvm->lock);
> +	mutex_unlock(&kvm->arch.config_lock);
>  	return ret;
>  }
>  
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index 2642e9ce2819..ca55065102e7 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -2043,7 +2043,10 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
>  	if (offset & align)
>  		return -EINVAL;
>  
> -	mutex_lock(&dev->kvm->lock);
> +	if (!lock_all_vcpus(dev->kvm))
> +		return -EBUSY;
> +
> +	mutex_lock(&dev->kvm->arch.config_lock);

Huh, that's fishy. The whole "lock the VM and then lock the individual
vcpus" is there to prevent a concurrent creation of a vcpu while we're
doing stuff that affects them all. Allowing a new vcpu to come online
while this sequence is happening is ... unexpected.

Why do we need to drop this initial lock? I'd expect them to be
completely cumulative.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v2 1/4] KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON
  2023-03-16 21:14 ` [PATCH v2 1/4] KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON Oliver Upton
@ 2023-03-22 12:02   ` Marc Zyngier
  2023-03-23 19:47     ` Oliver Upton
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2023-03-22 12:02 UTC (permalink / raw)
  To: Oliver Upton
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson

On Thu, 16 Mar 2023 21:14:09 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> KVM/arm64 had the lock ordering backwards on vcpu->mutex and kvm->lock
> from the very beginning. One such example is the way vCPU resets are
> handled: the kvm->lock is acquired while handling a guest CPU_ON PSCI
> call.
> 
> Add a dedicated lock to serialize writes to kvm_vcpu_arch::mp_state.
> Hold the lock in kvm_psci_vcpu_on() to protect against changes while
> the reset state is being configured. Ensure that readers read mp_state
> exactly once. While at it, plug yet another race by taking the
> mp_state_lock in the KVM_SET_MP_STATE ioctl handler.
> 
> As changes to MP state are now guarded with a dedicated lock, drop the
> kvm->lock acquisition from the PSCI CPU_ON path. Similarly, move the
> reader of reset_state outside of the kvm->lock and insert a barrier to
> ensure reset_state is read before dropping the pending reset state.
> 
> While the kvm->lock inversion still exists in kvm_reset_vcpu(), at least
> now PSCI CPU_ON no longer depends on it for serializing vCPU reset.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  arch/arm64/kvm/arm.c              | 23 ++++++++++++++++++-----
>  arch/arm64/kvm/psci.c             | 19 ++++++++++---------
>  arch/arm64/kvm/reset.c            | 10 ++++++----
>  4 files changed, 35 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index bcd774d74f34..917586237a4d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -522,6 +522,7 @@ struct kvm_vcpu_arch {
>  
>  	/* vcpu power state */
>  	struct kvm_mp_state mp_state;
> +	spinlock_t mp_state_lock;
>  
>  	/* Cache some mmu pages needed inside spinlock regions */
>  	struct kvm_mmu_memory_cache mmu_page_cache;
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 3bd732eaf087..731a78f85915 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -326,6 +326,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  {
>  	int err;
>  
> +	spin_lock_init(&vcpu->arch.mp_state_lock);
> +
>  	/* Force users to call KVM_ARM_VCPU_INIT */
>  	vcpu->arch.target = -1;
>  	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> @@ -443,16 +445,23 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	vcpu->cpu = -1;
>  }
>  
> -void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
> +static void __kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
>  {
>  	vcpu->arch.mp_state.mp_state = KVM_MP_STATE_STOPPED;
>  	kvm_make_request(KVM_REQ_SLEEP, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  }
>  
> +void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
> +{
> +	spin_lock(&vcpu->arch.mp_state_lock);
> +	__kvm_arm_vcpu_power_off(vcpu);
> +	spin_unlock(&vcpu->arch.mp_state_lock);
> +}
> +
>  bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->arch.mp_state.mp_state == KVM_MP_STATE_STOPPED;
> +	return READ_ONCE(vcpu->arch.mp_state.mp_state) == KVM_MP_STATE_STOPPED;
>  }
>  
>  static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
> @@ -464,13 +473,13 @@ static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
>  
>  static bool kvm_arm_vcpu_suspended(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu->arch.mp_state.mp_state == KVM_MP_STATE_SUSPENDED;
> +	return READ_ONCE(vcpu->arch.mp_state.mp_state) == KVM_MP_STATE_SUSPENDED;
>  }
>  
>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>  				    struct kvm_mp_state *mp_state)
>  {
> -	*mp_state = vcpu->arch.mp_state;
> +	*mp_state = READ_ONCE(vcpu->arch.mp_state);
>  
>  	return 0;
>  }
> @@ -480,12 +489,14 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  {
>  	int ret = 0;
>  
> +	spin_lock(&vcpu->arch.mp_state_lock);
> +
>  	switch (mp_state->mp_state) {
>  	case KVM_MP_STATE_RUNNABLE:
>  		vcpu->arch.mp_state = *mp_state;

Given that the above helpers snapshot mp_state without holding the
lock, it'd be better to at least turn this into a WRITE_ONCE().
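
(untested, but I'd expect something along the lines of

	WRITE_ONCE(vcpu->arch.mp_state, *mp_state);

so that it pairs with the READ_ONCE() on the reader side.)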

>  		break;
>  	case KVM_MP_STATE_STOPPED:
> -		kvm_arm_vcpu_power_off(vcpu);
> +		__kvm_arm_vcpu_power_off(vcpu);
>  		break;
>  	case KVM_MP_STATE_SUSPENDED:
>  		kvm_arm_vcpu_suspend(vcpu);
> @@ -494,6 +505,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  		ret = -EINVAL;
>  	}
>  
> +	spin_unlock(&vcpu->arch.mp_state_lock);
> +
>  	return ret;
>  }
>  
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 7fbc4c1b9df0..7f1bff1cd670 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -62,6 +62,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  	struct vcpu_reset_state *reset_state;
>  	struct kvm *kvm = source_vcpu->kvm;
>  	struct kvm_vcpu *vcpu = NULL;
> +	int ret = PSCI_RET_SUCCESS;
>  	unsigned long cpu_id;
>  
>  	cpu_id = smccc_get_arg1(source_vcpu);
> @@ -76,11 +77,15 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  	 */
>  	if (!vcpu)
>  		return PSCI_RET_INVALID_PARAMS;
> +
> +	spin_lock(&vcpu->arch.mp_state_lock);
>  	if (!kvm_arm_vcpu_stopped(vcpu)) {
>  		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
> -			return PSCI_RET_ALREADY_ON;
> +			ret = PSCI_RET_ALREADY_ON;
>  		else
> -			return PSCI_RET_INVALID_PARAMS;
> +			ret = PSCI_RET_INVALID_PARAMS;
> +
> +		goto out_unlock;
>  	}
>  
>  	reset_state = &vcpu->arch.reset_state;
> @@ -108,7 +113,9 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  	vcpu->arch.mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
>  	kvm_vcpu_wake_up(vcpu);
>  
> -	return PSCI_RET_SUCCESS;
> +out_unlock:
> +	spin_unlock(&vcpu->arch.mp_state_lock);
> +	return ret;
>  }
>  
>  static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
> @@ -229,7 +236,6 @@ static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32
>  
>  static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm *kvm = vcpu->kvm;
>  	u32 psci_fn = smccc_get_function(vcpu);
>  	unsigned long val;
>  	int ret = 1;
> @@ -254,9 +260,7 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
>  		kvm_psci_narrow_to_32bit(vcpu);
>  		fallthrough;
>  	case PSCI_0_2_FN64_CPU_ON:
> -		mutex_lock(&kvm->lock);
>  		val = kvm_psci_vcpu_on(vcpu);
> -		mutex_unlock(&kvm->lock);
>  		break;
>  	case PSCI_0_2_FN_AFFINITY_INFO:
>  		kvm_psci_narrow_to_32bit(vcpu);
> @@ -395,7 +399,6 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
>  
>  static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm *kvm = vcpu->kvm;
>  	u32 psci_fn = smccc_get_function(vcpu);
>  	unsigned long val;
>  
> @@ -405,9 +408,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
>  		val = PSCI_RET_SUCCESS;
>  		break;
>  	case KVM_PSCI_FN_CPU_ON:
> -		mutex_lock(&kvm->lock);
>  		val = kvm_psci_vcpu_on(vcpu);
> -		mutex_unlock(&kvm->lock);
>  		break;
>  	default:
>  		val = PSCI_RET_NOT_SUPPORTED;
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 49a3257dec46..b874ec6a37c1 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -264,15 +264,17 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  
>  	mutex_lock(&vcpu->kvm->lock);
>  	ret = kvm_set_vm_width(vcpu);
> -	if (!ret) {
> -		reset_state = vcpu->arch.reset_state;
> -		WRITE_ONCE(vcpu->arch.reset_state.reset, false);
> -	}
>  	mutex_unlock(&vcpu->kvm->lock);
>  
>  	if (ret)
>  		return ret;
>  
> +	reset_state = vcpu->arch.reset_state;
> +
> +	/* Ensure reset_state is read before clearing the pending state */
> +	smp_rmb();
> +	WRITE_ONCE(vcpu->arch.reset_state.reset, false);

What prevents a concurrent PSCI call from messing with this state? I'd
have expected this to be done while holding the mp_state lock... There
must be something, somewhere, but I fail to spot it right now.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width
  2023-03-16 21:14 ` [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width Oliver Upton
@ 2023-03-22 12:02   ` Marc Zyngier
  2023-03-23 19:20     ` Oliver Upton
  2023-03-23 20:09   ` Jeremy Linton
  1 sibling, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2023-03-22 12:02 UTC (permalink / raw)
  To: Oliver Upton
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson, Jeremy Linton

On Thu, 16 Mar 2023 21:14:10 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> kvm->lock must be taken outside of the vcpu->mutex. Of course, the
> locking documentation for KVM makes this abundantly clear. Nonetheless,
> the locking order in KVM/arm64 has been wrong for quite a while; we
> acquire the kvm->lock while holding the vcpu->mutex all over the shop.
> 
> All was seemingly fine until commit 42a90008f890 ("KVM: Ensure lockdep
> knows about kvm->lock vs. vcpu->mutex ordering rule") caught us with our
> pants down, leading to lockdep barfing:
> 
>  ======================================================
>  WARNING: possible circular locking dependency detected
>  6.2.0-rc7+ #19 Not tainted
>  ------------------------------------------------------
>  qemu-system-aar/859 is trying to acquire lock:
>  ffff5aa69269eba0 (&host_kvm->lock){+.+.}-{3:3}, at: kvm_reset_vcpu+0x34/0x274
> 
>  but task is already holding lock:
>  ffff5aa68768c0b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8c/0xba0
> 
>  which lock already depends on the new lock.
> 
> Add a dedicated lock to serialize writes to VM-scoped configuration from
> the context of a vCPU. Protect the register width flags with the new
> lock, thus avoiding the need to grab the kvm->lock while holding
> vcpu->mutex in kvm_reset_vcpu().
> 
> Reported-by: Jeremy Linton <jeremy.linton@arm.com>
> Link: https://lore.kernel.org/kvmarm/f6452cdd-65ff-34b8-bab0-5c06416da5f6@arm.com/
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/include/asm/kvm_host.h |  3 +++
>  arch/arm64/kvm/arm.c              | 18 ++++++++++++++++++
>  arch/arm64/kvm/reset.c            |  6 +++---
>  3 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 917586237a4d..1f4b9708a775 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -185,6 +185,9 @@ struct kvm_protected_vm {
>  };
>  
>  struct kvm_arch {
> +	/* Protects VM-scoped configuration data */
> +	struct mutex config_lock;
> +

nit: can we move this down into the structure and keep the MM stuff on
its own at the top? Placing it next to the flags would make some
sense, as these flags are definitely related to configuration matters.

>  	struct kvm_s2_mmu mmu;
>  
>  	/* VTCR_EL2 value for this VM */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 731a78f85915..1478bec52767 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -128,6 +128,16 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  {
>  	int ret;
>  
> +	mutex_init(&kvm->arch.config_lock);
> +
> +#ifdef CONFIG_LOCKDEP
> +	/* Clue in lockdep that the config_lock must be taken inside kvm->lock */
> +	mutex_lock(&kvm->lock);
> +	mutex_lock(&kvm->arch.config_lock);
> +	mutex_unlock(&kvm->arch.config_lock);
> +	mutex_unlock(&kvm->lock);
> +#endif
> +
>  	ret = kvm_share_hyp(kvm, kvm + 1);
>  	if (ret)
>  		return ret;
> @@ -328,6 +338,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  
>  	spin_lock_init(&vcpu->arch.mp_state_lock);
>  
> +#ifdef CONFIG_LOCKDEP
> +	/* Inform lockdep that the config_lock is acquired after vcpu->mutex */
> +	mutex_lock(&vcpu->mutex);
> +	mutex_lock(&vcpu->kvm->arch.config_lock);
> +	mutex_unlock(&vcpu->kvm->arch.config_lock);
> +	mutex_unlock(&vcpu->mutex);
> +#endif

Shouldn't this hunk be moved to the previous patch?

> +
>  	/* Force users to call KVM_ARM_VCPU_INIT */
>  	vcpu->arch.target = -1;
>  	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index b874ec6a37c1..ee445a7a1ef8 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -205,7 +205,7 @@ static int kvm_set_vm_width(struct kvm_vcpu *vcpu)
>  
>  	is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
>  
> -	lockdep_assert_held(&kvm->lock);
> +	lockdep_assert_held(&kvm->arch.config_lock);
>  
>  	if (test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &kvm->arch.flags)) {
>  		/*
> @@ -262,9 +262,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  	bool loaded;
>  	u32 pstate;
>  
> -	mutex_lock(&vcpu->kvm->lock);
> +	mutex_lock(&vcpu->kvm->arch.config_lock);
>  	ret = kvm_set_vm_width(vcpu);
> -	mutex_unlock(&vcpu->kvm->lock);
> +	mutex_unlock(&vcpu->kvm->arch.config_lock);
>  
>  	if (ret)
>  		return ret;

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/4] KVM: arm64: Use config_lock to protect vgic state
  2023-03-22 12:02   ` Marc Zyngier
@ 2023-03-23 19:18     ` Oliver Upton
  0 siblings, 0 replies; 17+ messages in thread
From: Oliver Upton @ 2023-03-23 19:18 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson

On Wed, Mar 22, 2023 at 12:02:15PM +0000, Marc Zyngier wrote:
> On Thu, 16 Mar 2023 21:14:12 +0000,
> Oliver Upton <oliver.upton@linux.dev> wrote:

[...]

> > --- a/arch/arm64/kvm/vgic/vgic-its.c
> > +++ b/arch/arm64/kvm/vgic/vgic-its.c
> > @@ -2043,7 +2043,10 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
> >  	if (offset & align)
> >  		return -EINVAL;
> >  
> > -	mutex_lock(&dev->kvm->lock);
> > +	if (!lock_all_vcpus(dev->kvm))
> > +		return -EBUSY;
> > +
> > +	mutex_lock(&dev->kvm->arch.config_lock);
> 
> Huh, that's fishy. The whole "lock the VM and then lock the individual
> vcpus" is there to prevent a concurrent creation of a vcpu while we're
> doing stuff that affects them all. Allowing a new vcpu to come online
> while this sequence is happening is ... unexpected.
> 
> Why do we need to drop this initial lock? I'd expect them to be
> completely cumulative.

Urgh.. Yes, you're right. I'll go with kvm->lock -> lock_all_vcpus() ->
kvm->config_lock in the next spin to guard against the vCPU creation
race.
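
Roughly something like this (just a sketch of the ordering I have in
mind, not the final patch; unlock_all_vcpus() is assumed here as the
counterpart of lock_all_vcpus()):

	mutex_lock(&dev->kvm->lock);

	if (!lock_all_vcpus(dev->kvm)) {
		mutex_unlock(&dev->kvm->lock);
		return -EBUSY;
	}

	mutex_lock(&dev->kvm->arch.config_lock);

	/* ... access the ITS registers ... */

	mutex_unlock(&dev->kvm->arch.config_lock);
	unlock_all_vcpus(dev->kvm);
	mutex_unlock(&dev->kvm->lock);

That way kvm->lock is held across the whole sequence, so vCPU creation
can't race with it, and the config_lock still nests correctly inside it.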

-- 
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width
  2023-03-22 12:02   ` Marc Zyngier
@ 2023-03-23 19:20     ` Oliver Upton
  2023-03-23 19:43       ` Marc Zyngier
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2023-03-23 19:20 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson, Jeremy Linton

On Wed, Mar 22, 2023 at 12:02:40PM +0000, Marc Zyngier wrote:
> On Thu, 16 Mar 2023 21:14:10 +0000,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> > 
> > kvm->lock must be taken outside of the vcpu->mutex. Of course, the
> > locking documentation for KVM makes this abundantly clear. Nonetheless,
> > the locking order in KVM/arm64 has been wrong for quite a while; we
> > acquire the kvm->lock while holding the vcpu->mutex all over the shop.
> > 
> > All was seemingly fine until commit 42a90008f890 ("KVM: Ensure lockdep
> > knows about kvm->lock vs. vcpu->mutex ordering rule") caught us with our
> > pants down, leading to lockdep barfing:
> > 
> >  ======================================================
> >  WARNING: possible circular locking dependency detected
> >  6.2.0-rc7+ #19 Not tainted
> >  ------------------------------------------------------
> >  qemu-system-aar/859 is trying to acquire lock:
> >  ffff5aa69269eba0 (&host_kvm->lock){+.+.}-{3:3}, at: kvm_reset_vcpu+0x34/0x274
> > 
> >  but task is already holding lock:
> >  ffff5aa68768c0b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8c/0xba0
> > 
> >  which lock already depends on the new lock.
> > 
> > Add a dedicated lock to serialize writes to VM-scoped configuration from
> > the context of a vCPU. Protect the register width flags with the new
> > lock, thus avoiding the need to grab the kvm->lock while holding
> > vcpu->mutex in kvm_reset_vcpu().
> > 
> > Reported-by: Jeremy Linton <jeremy.linton@arm.com>
> > Link: https://lore.kernel.org/kvmarm/f6452cdd-65ff-34b8-bab0-5c06416da5f6@arm.com/
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  3 +++
> >  arch/arm64/kvm/arm.c              | 18 ++++++++++++++++++
> >  arch/arm64/kvm/reset.c            |  6 +++---
> >  3 files changed, 24 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 917586237a4d..1f4b9708a775 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -185,6 +185,9 @@ struct kvm_protected_vm {
> >  };
> >  
> >  struct kvm_arch {
> > +	/* Protects VM-scoped configuration data */
> > +	struct mutex config_lock;
> > +
> 
> nit: can we move this down into the structure and keep the MM stuff on
> its own at the top? Placing it next to the flags would make some
> sense, as these flags are definitely related to configuration matters.

Sure thing!

> >  	struct kvm_s2_mmu mmu;
> >  
> >  	/* VTCR_EL2 value for this VM */
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 731a78f85915..1478bec52767 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -128,6 +128,16 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >  {
> >  	int ret;
> >  
> > +	mutex_init(&kvm->arch.config_lock);
> > +
> > +#ifdef CONFIG_LOCKDEP
> > +	/* Clue in lockdep that the config_lock must be taken inside kvm->lock */
> > +	mutex_lock(&kvm->lock);
> > +	mutex_lock(&kvm->arch.config_lock);
> > +	mutex_unlock(&kvm->arch.config_lock);
> > +	mutex_unlock(&kvm->lock);
> > +#endif
> > +
> >  	ret = kvm_share_hyp(kvm, kvm + 1);
> >  	if (ret)
> >  		return ret;
> > @@ -328,6 +338,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> >  
> >  	spin_lock_init(&vcpu->arch.mp_state_lock);
> >  
> > +#ifdef CONFIG_LOCKDEP
> > +	/* Inform lockdep that the config_lock is acquired after vcpu->mutex */
> > +	mutex_lock(&vcpu->mutex);
> > +	mutex_lock(&vcpu->kvm->arch.config_lock);
> > +	mutex_unlock(&vcpu->kvm->arch.config_lock);
> > +	mutex_unlock(&vcpu->mutex);
> > +#endif
> 
> Shouldn't this hunk be moved to the previous patch?

Uh, I don't believe so since this is the patch that actually introduces
kvm_arch::config_lock. The last patch was aimed at a separate lock for
mp state.

-- 
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width
  2023-03-23 19:20     ` Oliver Upton
@ 2023-03-23 19:43       ` Marc Zyngier
  2023-03-23 19:49         ` Oliver Upton
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2023-03-23 19:43 UTC (permalink / raw)
  To: Oliver Upton
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson, Jeremy Linton

On 2023-03-23 19:20, Oliver Upton wrote:
> On Wed, Mar 22, 2023 at 12:02:40PM +0000, Marc Zyngier wrote:
>> On Thu, 16 Mar 2023 21:14:10 +0000,
>> Oliver Upton <oliver.upton@linux.dev> wrote:
>> >
>> > kvm->lock must be taken outside of the vcpu->mutex. Of course, the
>> > locking documentation for KVM makes this abundantly clear. Nonetheless,
>> > the locking order in KVM/arm64 has been wrong for quite a while; we
>> > acquire the kvm->lock while holding the vcpu->mutex all over the shop.
>> >
>> > All was seemingly fine until commit 42a90008f890 ("KVM: Ensure lockdep
>> > knows about kvm->lock vs. vcpu->mutex ordering rule") caught us with our
>> > pants down, leading to lockdep barfing:
>> >
>> >  ======================================================
>> >  WARNING: possible circular locking dependency detected
>> >  6.2.0-rc7+ #19 Not tainted
>> >  ------------------------------------------------------
>> >  qemu-system-aar/859 is trying to acquire lock:
>> >  ffff5aa69269eba0 (&host_kvm->lock){+.+.}-{3:3}, at: kvm_reset_vcpu+0x34/0x274
>> >
>> >  but task is already holding lock:
>> >  ffff5aa68768c0b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8c/0xba0
>> >
>> >  which lock already depends on the new lock.
>> >
>> > Add a dedicated lock to serialize writes to VM-scoped configuration from
>> > the context of a vCPU. Protect the register width flags with the new
>> > lock, thus avoiding the need to grab the kvm->lock while holding
>> > vcpu->mutex in kvm_reset_vcpu().
>> >
>> > Reported-by: Jeremy Linton <jeremy.linton@arm.com>
>> > Link: https://lore.kernel.org/kvmarm/f6452cdd-65ff-34b8-bab0-5c06416da5f6@arm.com/
>> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
>> > ---
>> >  arch/arm64/include/asm/kvm_host.h |  3 +++
>> >  arch/arm64/kvm/arm.c              | 18 ++++++++++++++++++
>> >  arch/arm64/kvm/reset.c            |  6 +++---
>> >  3 files changed, 24 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> > index 917586237a4d..1f4b9708a775 100644
>> > --- a/arch/arm64/include/asm/kvm_host.h
>> > +++ b/arch/arm64/include/asm/kvm_host.h
>> > @@ -185,6 +185,9 @@ struct kvm_protected_vm {
>> >  };
>> >
>> >  struct kvm_arch {
>> > +	/* Protects VM-scoped configuration data */
>> > +	struct mutex config_lock;
>> > +
>> 
>> nit: can we move this down into the structure and keep the MM stuff on
>> its own at the top? Placing it next to the flags would make some
>> sense, as these flags are definitely related to configuration matters.
> 
> Sure thing!
> 
>> >  	struct kvm_s2_mmu mmu;
>> >
>> >  	/* VTCR_EL2 value for this VM */
>> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> > index 731a78f85915..1478bec52767 100644
>> > --- a/arch/arm64/kvm/arm.c
>> > +++ b/arch/arm64/kvm/arm.c
>> > @@ -128,6 +128,16 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>> >  {
>> >  	int ret;
>> >
>> > +	mutex_init(&kvm->arch.config_lock);
>> > +
>> > +#ifdef CONFIG_LOCKDEP
>> > +	/* Clue in lockdep that the config_lock must be taken inside kvm->lock */
>> > +	mutex_lock(&kvm->lock);
>> > +	mutex_lock(&kvm->arch.config_lock);
>> > +	mutex_unlock(&kvm->arch.config_lock);
>> > +	mutex_unlock(&kvm->lock);
>> > +#endif
>> > +
>> >  	ret = kvm_share_hyp(kvm, kvm + 1);
>> >  	if (ret)
>> >  		return ret;
>> > @@ -328,6 +338,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>> >
>> >  	spin_lock_init(&vcpu->arch.mp_state_lock);
>> >
>> > +#ifdef CONFIG_LOCKDEP
>> > +	/* Inform lockdep that the config_lock is acquired after vcpu->mutex */
>> > +	mutex_lock(&vcpu->mutex);
>> > +	mutex_lock(&vcpu->kvm->arch.config_lock);
>> > +	mutex_unlock(&vcpu->kvm->arch.config_lock);
>> > +	mutex_unlock(&vcpu->mutex);
>> > +#endif
>> 
>> Shouldn't this hunk be moved to the previous patch?
> 
> Uh, I don't believe so since this is the patch that actually introduces
> kvm_arch::config_lock. The last patch was aimed at a separate lock for
> mp state.

Nah, you're obviously right. I reviewed this at 4am being jet-lagged.
Not the brightest comment... :-/ Sorry for the noise.

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/4] KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON
  2023-03-22 12:02   ` Marc Zyngier
@ 2023-03-23 19:47     ` Oliver Upton
  0 siblings, 0 replies; 17+ messages in thread
From: Oliver Upton @ 2023-03-23 19:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson

On Wed, Mar 22, 2023 at 12:02:24PM +0000, Marc Zyngier wrote:
> On Thu, 16 Mar 2023 21:14:09 +0000,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> > 
> > KVM/arm64 had the lock ordering backwards on vcpu->mutex and kvm->lock
> > from the very beginning. One such example is the way vCPU resets are
> > handled: the kvm->lock is acquired while handling a guest CPU_ON PSCI
> > call.
> > 
> > Add a dedicated lock to serialize writes to kvm_vcpu_arch::mp_state.
> > Hold the lock in kvm_psci_vcpu_on() to protect against changes while
> > the reset state is being configured. Ensure that readers read mp_state
> > exactly once. While at it, plug yet another race by taking the
> > mp_state_lock in the KVM_SET_MP_STATE ioctl handler.
> > 
> > As changes to MP state are now guarded with a dedicated lock, drop the
> > kvm->lock acquisition from the PSCI CPU_ON path. Similarly, move the
> > reader of reset_state outside of the kvm->lock and insert a barrier to
> > ensure reset_state is read before dropping the pending reset state.
> > 
> > While the kvm->lock inversion still exists in kvm_reset_vcpu(), at least
> > now PSCI CPU_ON no longer depends on it for serializing vCPU reset.
> > 
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  1 +
> >  arch/arm64/kvm/arm.c              | 23 ++++++++++++++++++-----
> >  arch/arm64/kvm/psci.c             | 19 ++++++++++---------
> >  arch/arm64/kvm/reset.c            | 10 ++++++----
> >  4 files changed, 35 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index bcd774d74f34..917586237a4d 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -522,6 +522,7 @@ struct kvm_vcpu_arch {
> >  
> >  	/* vcpu power state */
> >  	struct kvm_mp_state mp_state;
> > +	spinlock_t mp_state_lock;
> >  
> >  	/* Cache some mmu pages needed inside spinlock regions */
> >  	struct kvm_mmu_memory_cache mmu_page_cache;
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 3bd732eaf087..731a78f85915 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -326,6 +326,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> >  {
> >  	int err;
> >  
> > +	spin_lock_init(&vcpu->arch.mp_state_lock);
> > +
> >  	/* Force users to call KVM_ARM_VCPU_INIT */
> >  	vcpu->arch.target = -1;
> >  	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> > @@ -443,16 +445,23 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >  	vcpu->cpu = -1;
> >  }
> >  
> > -void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
> > +static void __kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
> >  {
> >  	vcpu->arch.mp_state.mp_state = KVM_MP_STATE_STOPPED;
> >  	kvm_make_request(KVM_REQ_SLEEP, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> >  }
> >  
> > +void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
> > +{
> > +	spin_lock(&vcpu->arch.mp_state_lock);
> > +	__kvm_arm_vcpu_power_off(vcpu);
> > +	spin_unlock(&vcpu->arch.mp_state_lock);
> > +}
> > +
> >  bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu)
> >  {
> > -	return vcpu->arch.mp_state.mp_state == KVM_MP_STATE_STOPPED;
> > +	return READ_ONCE(vcpu->arch.mp_state.mp_state) == KVM_MP_STATE_STOPPED;
> >  }
> >  
> >  static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
> > @@ -464,13 +473,13 @@ static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
> >  
> >  static bool kvm_arm_vcpu_suspended(struct kvm_vcpu *vcpu)
> >  {
> > -	return vcpu->arch.mp_state.mp_state == KVM_MP_STATE_SUSPENDED;
> > +	return READ_ONCE(vcpu->arch.mp_state.mp_state) == KVM_MP_STATE_SUSPENDED;
> >  }
> >  
> >  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> >  				    struct kvm_mp_state *mp_state)
> >  {
> > -	*mp_state = vcpu->arch.mp_state;
> > +	*mp_state = READ_ONCE(vcpu->arch.mp_state);
> >  
> >  	return 0;
> >  }
> > @@ -480,12 +489,14 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> >  {
> >  	int ret = 0;
> >  
> > +	spin_lock(&vcpu->arch.mp_state_lock);
> > +
> >  	switch (mp_state->mp_state) {
> >  	case KVM_MP_STATE_RUNNABLE:
> >  		vcpu->arch.mp_state = *mp_state;
> 
> Given that the above helpers snapshot mp_state without holding the
> lock, it'd be better to at least turn this into a WRITE_ONCE().

Yeah, definitely need that.
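
i.e. (sketch only; this relies on struct kvm_mp_state being a single
32-bit field, so WRITE_ONCE() is happy with the whole struct):

	case KVM_MP_STATE_RUNNABLE:
		WRITE_ONCE(vcpu->arch.mp_state, *mp_state);
		break;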

> >  		break;
> >  	case KVM_MP_STATE_STOPPED:
> > -		kvm_arm_vcpu_power_off(vcpu);
> > +		__kvm_arm_vcpu_power_off(vcpu);
> >  		break;
> >  	case KVM_MP_STATE_SUSPENDED:
> >  		kvm_arm_vcpu_suspend(vcpu);
> > @@ -494,6 +505,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> >  		ret = -EINVAL;
> >  	}
> >  
> > +	spin_unlock(&vcpu->arch.mp_state_lock);
> > +
> >  	return ret;
> >  }
> >  
> > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > index 7fbc4c1b9df0..7f1bff1cd670 100644
> > --- a/arch/arm64/kvm/psci.c
> > +++ b/arch/arm64/kvm/psci.c
> > @@ -62,6 +62,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> >  	struct vcpu_reset_state *reset_state;
> >  	struct kvm *kvm = source_vcpu->kvm;
> >  	struct kvm_vcpu *vcpu = NULL;
> > +	int ret = PSCI_RET_SUCCESS;
> >  	unsigned long cpu_id;
> >  
> >  	cpu_id = smccc_get_arg1(source_vcpu);
> > @@ -76,11 +77,15 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> >  	 */
> >  	if (!vcpu)
> >  		return PSCI_RET_INVALID_PARAMS;
> > +
> > +	spin_lock(&vcpu->arch.mp_state_lock);
> >  	if (!kvm_arm_vcpu_stopped(vcpu)) {
> >  		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
> > -			return PSCI_RET_ALREADY_ON;
> > +			ret = PSCI_RET_ALREADY_ON;
> >  		else
> > -			return PSCI_RET_INVALID_PARAMS;
> > +			ret = PSCI_RET_INVALID_PARAMS;
> > +
> > +		goto out_unlock;
> >  	}
> >  
> >  	reset_state = &vcpu->arch.reset_state;
> > @@ -108,7 +113,9 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> >  	vcpu->arch.mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
> >  	kvm_vcpu_wake_up(vcpu);
> >  
> > -	return PSCI_RET_SUCCESS;
> > +out_unlock:
> > +	spin_unlock(&vcpu->arch.mp_state_lock);
> > +	return ret;
> >  }
> >  
> >  static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
> > @@ -229,7 +236,6 @@ static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32
> >  
> >  static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
> >  {
> > -	struct kvm *kvm = vcpu->kvm;
> >  	u32 psci_fn = smccc_get_function(vcpu);
> >  	unsigned long val;
> >  	int ret = 1;
> > @@ -254,9 +260,7 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
> >  		kvm_psci_narrow_to_32bit(vcpu);
> >  		fallthrough;
> >  	case PSCI_0_2_FN64_CPU_ON:
> > -		mutex_lock(&kvm->lock);
> >  		val = kvm_psci_vcpu_on(vcpu);
> > -		mutex_unlock(&kvm->lock);
> >  		break;
> >  	case PSCI_0_2_FN_AFFINITY_INFO:
> >  		kvm_psci_narrow_to_32bit(vcpu);
> > @@ -395,7 +399,6 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
> >  
> >  static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
> >  {
> > -	struct kvm *kvm = vcpu->kvm;
> >  	u32 psci_fn = smccc_get_function(vcpu);
> >  	unsigned long val;
> >  
> > @@ -405,9 +408,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
> >  		val = PSCI_RET_SUCCESS;
> >  		break;
> >  	case KVM_PSCI_FN_CPU_ON:
> > -		mutex_lock(&kvm->lock);
> >  		val = kvm_psci_vcpu_on(vcpu);
> > -		mutex_unlock(&kvm->lock);
> >  		break;
> >  	default:
> >  		val = PSCI_RET_NOT_SUPPORTED;
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > index 49a3257dec46..b874ec6a37c1 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -264,15 +264,17 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >  
> >  	mutex_lock(&vcpu->kvm->lock);
> >  	ret = kvm_set_vm_width(vcpu);
> > -	if (!ret) {
> > -		reset_state = vcpu->arch.reset_state;
> > -		WRITE_ONCE(vcpu->arch.reset_state.reset, false);
> > -	}
> >  	mutex_unlock(&vcpu->kvm->lock);
> >  
> >  	if (ret)
> >  		return ret;
> >  
> > +	reset_state = vcpu->arch.reset_state;
> > +
> > +	/* Ensure reset_state is read before clearing the pending state */
> > +	smp_rmb();
> > +	WRITE_ONCE(vcpu->arch.reset_state.reset, false);
> 
> What prevents a concurrent PSCI call from messing with this state? I'd
> have expected this to be done while holding the mp_state lock... There
> must be something, somewhere, but I fail to spot it right now.

It is a bit shaky, but my expectation was that the vCPU couldn't be
transitioned from RUNNABLE -> STOPPED w/o holding the vcpu->mutex, thus
any PSCI CPU_ON calls would fail as the vCPU is marked as RUNNABLE until
the reset completes.

However, looking at this with fresh eyes, the kvm_prepare_system_event()
flow breaks my expectation and does an unguarded transition to STOPPED
state.

So, in the interest of making the whole dance correct I'll take the
mp_state_lock here and in kvm_prepare_system_event(). Thanks for
spotting this.
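
Concretely, something along these lines on the consumer side in
kvm_reset_vcpu() (just a sketch of the intent, the exact shape may
change in v3):

	if (ret)
		return ret;

	spin_lock(&vcpu->arch.mp_state_lock);
	reset_state = vcpu->arch.reset_state;
	WRITE_ONCE(vcpu->arch.reset_state.reset, false);
	spin_unlock(&vcpu->arch.mp_state_lock);

with kvm_prepare_system_event() doing its STOPPED transitions under the
same per-vCPU lock. I'd expect the explicit barrier to become
unnecessary once both sides are serialized by the lock.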

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width
  2023-03-23 19:43       ` Marc Zyngier
@ 2023-03-23 19:49         ` Oliver Upton
  0 siblings, 0 replies; 17+ messages in thread
From: Oliver Upton @ 2023-03-23 19:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson, Jeremy Linton

On Thu, Mar 23, 2023 at 07:43:04PM +0000, Marc Zyngier wrote:
> On 2023-03-23 19:20, Oliver Upton wrote:
> > On Wed, Mar 22, 2023 at 12:02:40PM +0000, Marc Zyngier wrote:
> > > On Thu, 16 Mar 2023 21:14:10 +0000, Oliver Upton <oliver.upton@linux.dev> wrote:

[...]

> > > > +#ifdef CONFIG_LOCKDEP
> > > > +	/* Inform lockdep that the config_lock is acquired after vcpu->mutex */
> > > > +	mutex_lock(&vcpu->mutex);
> > > > +	mutex_lock(&vcpu->kvm->arch.config_lock);
> > > > +	mutex_unlock(&vcpu->kvm->arch.config_lock);
> > > > +	mutex_unlock(&vcpu->mutex);
> > > > +#endif
> > > 
> > > Shouldn't this hunk be moved to the previous patch?
> > 
> > Uh, I don't believe so since this is the patch that actually introduces
> > kvm_arch::config_lock. The last patch was aimed at a separate lock for
> > mp state.
> 
> Nah, you're obviously right. I reviewed this at 4am being jet-lagged.
> Not the brightest comment... :-/ Sorry for the noise.

No problem at all! I wish I had as good of an excuse as you for the times
I gave far worse feedback...

-- 
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width
  2023-03-16 21:14 ` [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width Oliver Upton
  2023-03-22 12:02   ` Marc Zyngier
@ 2023-03-23 20:09   ` Jeremy Linton
  2023-03-23 20:45     ` Oliver Upton
  1 sibling, 1 reply; 17+ messages in thread
From: Jeremy Linton @ 2023-03-23 20:09 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson

Hi,

On 3/16/23 16:14, Oliver Upton wrote:
> kvm->lock must be taken outside of the vcpu->mutex. Of course, the
> locking documentation for KVM makes this abundantly clear. Nonetheless,
> the locking order in KVM/arm64 has been wrong for quite a while; we
> acquire the kvm->lock while holding the vcpu->mutex all over the shop.
> 
> All was seemingly fine until commit 42a90008f890 ("KVM: Ensure lockdep
> knows about kvm->lock vs. vcpu->mutex ordering rule") caught us with our
> pants down, leading to lockdep barfing:

Thanks for looking at this! It had a bit of fuzz applying to -rc3, did I 
miss a required patch?

This patch makes the lockdep warnings I was seeing go away, but I'm 
seeing a similar lockdep problem while running the kvm kselftests. In 
particular I think it was "selftests: kvm: debug-exceptions" which threw 
the warning.

So, I'm not sure it's completely fixed. I ran the previous one a couple 
weeks back before you respun it, and IIRC didn't see any errors.

> 
>   ======================================================
>   WARNING: possible circular locking dependency detected
>   6.2.0-rc7+ #19 Not tainted
>   ------------------------------------------------------
>   qemu-system-aar/859 is trying to acquire lock:
>   ffff5aa69269eba0 (&host_kvm->lock){+.+.}-{3:3}, at: kvm_reset_vcpu+0x34/0x274
> 
>   but task is already holding lock:
>   ffff5aa68768c0b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8c/0xba0
> 
>   which lock already depends on the new lock.
> 
> Add a dedicated lock to serialize writes to VM-scoped configuration from
> the context of a vCPU. Protect the register width flags with the new
> lock, thus avoiding the need to grab the kvm->lock while holding
> vcpu->mutex in kvm_reset_vcpu().
> 
> Reported-by: Jeremy Linton <jeremy.linton@arm.com>
> Link: https://lore.kernel.org/kvmarm/f6452cdd-65ff-34b8-bab0-5c06416da5f6@arm.com/
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>   arch/arm64/include/asm/kvm_host.h |  3 +++
>   arch/arm64/kvm/arm.c              | 18 ++++++++++++++++++
>   arch/arm64/kvm/reset.c            |  6 +++---
>   3 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 917586237a4d..1f4b9708a775 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -185,6 +185,9 @@ struct kvm_protected_vm {
>   };
>   
>   struct kvm_arch {
> +	/* Protects VM-scoped configuration data */
> +	struct mutex config_lock;
> +
>   	struct kvm_s2_mmu mmu;
>   
>   	/* VTCR_EL2 value for this VM */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 731a78f85915..1478bec52767 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -128,6 +128,16 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>   {
>   	int ret;
>   
> +	mutex_init(&kvm->arch.config_lock);
> +
> +#ifdef CONFIG_LOCKDEP
> +	/* Clue in lockdep that the config_lock must be taken inside kvm->lock */
> +	mutex_lock(&kvm->lock);
> +	mutex_lock(&kvm->arch.config_lock);
> +	mutex_unlock(&kvm->arch.config_lock);
> +	mutex_unlock(&kvm->lock);
> +#endif
> +
>   	ret = kvm_share_hyp(kvm, kvm + 1);
>   	if (ret)
>   		return ret;
> @@ -328,6 +338,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>   
>   	spin_lock_init(&vcpu->arch.mp_state_lock);
>   
> +#ifdef CONFIG_LOCKDEP
> +	/* Inform lockdep that the config_lock is acquired after vcpu->mutex */
> +	mutex_lock(&vcpu->mutex);
> +	mutex_lock(&vcpu->kvm->arch.config_lock);
> +	mutex_unlock(&vcpu->kvm->arch.config_lock);
> +	mutex_unlock(&vcpu->mutex);
> +#endif
> +
>   	/* Force users to call KVM_ARM_VCPU_INIT */
>   	vcpu->arch.target = -1;
>   	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index b874ec6a37c1..ee445a7a1ef8 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -205,7 +205,7 @@ static int kvm_set_vm_width(struct kvm_vcpu *vcpu)
>   
>   	is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
>   
> -	lockdep_assert_held(&kvm->lock);
> +	lockdep_assert_held(&kvm->arch.config_lock);
>   
>   	if (test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &kvm->arch.flags)) {
>   		/*
> @@ -262,9 +262,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   	bool loaded;
>   	u32 pstate;
>   
> -	mutex_lock(&vcpu->kvm->lock);
> +	mutex_lock(&vcpu->kvm->arch.config_lock);
>   	ret = kvm_set_vm_width(vcpu);
> -	mutex_unlock(&vcpu->kvm->lock);
> +	mutex_unlock(&vcpu->kvm->arch.config_lock);
>   
>   	if (ret)
>   		return ret;


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width
  2023-03-23 20:09   ` Jeremy Linton
@ 2023-03-23 20:45     ` Oliver Upton
  2023-03-23 22:45       ` Jeremy Linton
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2023-03-23 20:45 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Marc Zyngier, James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson

Jeremy,

On Thu, Mar 23, 2023 at 03:09:54PM -0500, Jeremy Linton wrote:
> Hi,
> 
> On 3/16/23 16:14, Oliver Upton wrote:
> > kvm->lock must be taken outside of the vcpu->mutex. Of course, the
> > locking documentation for KVM makes this abundantly clear. Nonetheless,
> > the locking order in KVM/arm64 has been wrong for quite a while; we
> > acquire the kvm->lock while holding the vcpu->mutex all over the shop.
> > 
> > All was seemingly fine until commit 42a90008f890 ("KVM: Ensure lockdep
> > knows about kvm->lock vs. vcpu->mutex ordering rule") caught us with our
> > pants down, leading to lockdep barfing:
> 
> Thanks for looking at this! It had a bit of fuzz applying to -rc3, did I
> miss a required patch?
> 
> This patch makes the lockdep warnings I was seeing go away, but I'm seeing
> a similar lockdep problem while running the kvm kselftests. In particular I
> think it was "selftests: kvm: debug-exceptions" which threw the warning.

Hmm, that's odd. IIRC the only test that exploded for me w/o the commit
below was arch_timer.

> So, I'm not sure it's completely fixed. I ran the previous one a couple weeks
> back before you respun it, and IIRC didn't see any errors.

Thanks for taking this for a spin! This series depends on commit 47053904e182
("KVM: arm64: timers: Convert per-vcpu virtual offset to a global value")
which is only present in kvmarm/fixes at the moment. I had sent out a PR for
this last week but Paolo has yet to pull.

With both Marc's patch and this series I'm unable to reproduce lockdep
warnings w/ selftests or kvmtool. If it isn't too much trouble can you
give kvmarm/fixes plus this series a whirl and see if everything is
addressed?

Otherwise you may have stumbled into some more crud we'll need to
address :)

-- 
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] KVM: arm64: Avoid lock inversion when setting the VM register width
  2023-03-23 20:45     ` Oliver Upton
@ 2023-03-23 22:45       ` Jeremy Linton
  0 siblings, 0 replies; 17+ messages in thread
From: Jeremy Linton @ 2023-03-23 22:45 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson

Hi,

On 3/23/23 15:45, Oliver Upton wrote:
> Jeremy,
> 
> On Thu, Mar 23, 2023 at 03:09:54PM -0500, Jeremy Linton wrote:
>> Hi,
>>
>> On 3/16/23 16:14, Oliver Upton wrote:
>>> kvm->lock must be taken outside of the vcpu->mutex. Of course, the
>>> locking documentation for KVM makes this abundantly clear. Nonetheless,
>>> the locking order in KVM/arm64 has been wrong for quite a while; we
>>> acquire the kvm->lock while holding the vcpu->mutex all over the shop.
>>>
>>> All was seemingly fine until commit 42a90008f890 ("KVM: Ensure lockdep
>>> knows about kvm->lock vs. vcpu->mutex ordering rule") caught us with our
>>> pants down, leading to lockdep barfing:
>>
>> Thanks for looking at this! It had a bit of fuzz applying to -rc3, did I
>> miss a required patch?
>>
>> This patch makes the lockdep warnings I was seeing go away, but I'm seeing
>> a similar lockdep problem while running the kvm kselftests. In particular I
>> think it was "selftests: kvm: debug-exceptions" which threw the warning.
> 
> Hmm, that's odd. IIRC the only test that exploded for me w/o the commit
> below was arch_timer.
> 
>> So, I'm not sure it's completely fixed. I ran the previous one a couple weeks
>> back before you respun it, and IIRC didn't see any errors.
> 
> Thanks for taking this for a spin! This series depends on commit 47053904e182
> ("KVM: arm64: timers: Convert per-vcpu virtual offset to a global value")
> which is only present in kvmarm/fixes at the moment. I had sent out a PR for
> this last week but Paolo has yet to pull.
> 
> With both Marc's patch and this series I'm unable to reproduce lockdep
> warnings w/ selftests or kvmtool. If it isn't too much trouble can you
> give kvmarm/fixes plus this series a whirl and see if everything is
> addressed?
> 
> Otherwise you may have stumbled into some more crud we'll need to
> address :)

Ok, I reran everything, including with the latest arm/kvm patches, and 
it seems I created the problem myself, which I realized almost 
immediately after sending the original comment, as is too frequently 
the case.

I will put the Tested-by on the main set, although it should be noted 
that the selftest doesn't complete cleanly on ARMv8.0 machines because 
of the:

GUEST_ASSERT(guest_check_lse()) line in guest_cas().

So, this error was largely noise.


Thanks,



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/4] KVM: arm64: Fix vcpu->mutex v. kvm->lock inversion
  2023-03-16 21:14 [PATCH v2 0/4] KVM: arm64: Fix vcpu->mutex v. kvm->lock inversion Oliver Upton
                   ` (3 preceding siblings ...)
  2023-03-16 21:14 ` [PATCH v2 4/4] KVM: arm64: Use config_lock to protect vgic state Oliver Upton
@ 2023-03-23 22:48 ` Jeremy Linton
  4 siblings, 0 replies; 17+ messages in thread
From: Jeremy Linton @ 2023-03-23 22:48 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier
  Cc: James Morse, Suzuki K Poulose, kvmarm, Zenghui Yu,
	linux-arm-kernel, Sean Christopherson

Hi,

On 3/16/23 16:14, Oliver Upton wrote:
> As it so happens, lock ordering in KVM/arm64 is completely backwards.
> There's a significant amount of VM-wide state that needs to be accessed
> from the context of a vCPU. Until now, this was accomplished by
> acquiring the kvm->lock, but that cannot be nested within vcpu->mutex.
> 
> This series fixes the issue with some fine-grained locking for MP state
> and a new, dedicated mutex that can nest with both kvm->lock and
> vcpu->mutex.
> 
> Tested with kvmtool and QEMU scaled up to 64 vCPUs on a kernel w/
> lockdep enabled. Applies to kvmarm/next.

This set makes the reported lockdep error go away for me, and it also 
appears to complete some basic kernel/kvm testing without error. So,

Tested-by: Jeremy Linton <jeremy.linton@arm.com>


Thanks,

> 
> v1: http://lore.kernel.org/kvmarm/20230308083947.3760066-1-oliver.upton@linux.dev
> 
> v1 -> v2:
>   - Add a dedicated lock for serializing writes to MP state
>   - Inform lockdep of acquisition order at time of VM/vCPU creation
>   - Plug a race with GIC creation (Sean)
>   - Use the config_lock in GIC ITS flows as well. There is now a single
>     (valid) use of kvm->lock when enabling MTE.
> 
> Oliver Upton (4):
>    KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON
>    KVM: arm64: Avoid lock inversion when setting the VM register width
>    KVM: arm64: Use config_lock to protect data ordered against KVM_RUN
>    KVM: arm64: Use config_lock to protect vgic state
> 
>   arch/arm64/include/asm/kvm_host.h     |  4 ++
>   arch/arm64/kvm/arm.c                  | 45 +++++++++++++++++++----
>   arch/arm64/kvm/guest.c                |  2 +
>   arch/arm64/kvm/hypercalls.c           |  4 +-
>   arch/arm64/kvm/pmu-emul.c             | 23 +++---------
>   arch/arm64/kvm/psci.c                 | 19 +++++-----
>   arch/arm64/kvm/reset.c                | 16 ++++----
>   arch/arm64/kvm/vgic/vgic-debug.c      |  8 ++--
>   arch/arm64/kvm/vgic/vgic-init.c       | 33 ++++++++++-------
>   arch/arm64/kvm/vgic/vgic-its.c        | 29 ++++++---------
>   arch/arm64/kvm/vgic/vgic-kvm-device.c | 53 ++++++++++++---------------
>   arch/arm64/kvm/vgic/vgic-mmio-v3.c    |  4 +-
>   arch/arm64/kvm/vgic/vgic-mmio.c       | 12 +++---
>   arch/arm64/kvm/vgic/vgic-v4.c         | 11 +++---
>   arch/arm64/kvm/vgic/vgic.c            |  2 +-
>   15 files changed, 146 insertions(+), 119 deletions(-)
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread
