Date: Tue, 11 Jan 2022 13:30:41 +0000
Message-ID: <875yqqtn5q.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Reiji Watanabe <reijiw@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>, kvmarm@lists.cs.columbia.edu,
	kvm@vger.kernel.org, Linux ARM <linux-arm-kernel@lists.infradead.org>,
	James Morse <james.morse@arm.com>, Suzuki K Poulose <suzuki.poulose@arm.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Will Deacon <will@kernel.org>,
	Peter Shier <pshier@google.com>, Ricardo Koller <ricarkol@google.com>,
	Oliver Upton <oupton@google.com>, Jing Zhang <jingzhangos@google.com>,
	Raghavendra Rao Anata <rananta@google.com>
Subject: Re: [PATCH 1/2] KVM: arm64: mixed-width check should be skipped for
	uninitialized vCPUs
References: <20220110054042.1079932-1-reijiw@google.com>

On Tue, 11 Jan 2022 07:37:57 +0000,
Reiji Watanabe <reijiw@google.com> wrote:
> 
> Hi Alex,
> 
> On Mon, Jan 10, 2022 at 2:48 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Hi Reiji,
> >
> > On Sun, Jan 09, 2022 at 09:40:41PM -0800, Reiji Watanabe wrote:
> > > vcpu_allowed_register_width() checks if all the VCPUs are either
> > > all 32bit or all 64bit.
> > > Since the checking is done even for vCPUs that are not
> > > initialized yet (KVM_ARM_VCPU_INIT has not been done), the
> > > non-initialized vCPUs are erroneously treated as 64bit vCPUs,
> > > which causes the function to incorrectly detect a mixed-width VM.
> > >
> > > Fix vcpu_allowed_register_width() to skip the check for vCPUs that
> > > are not initialized yet.
> > >
> > > Fixes: 66e94d5cafd4 ("KVM: arm64: Prevent mixed-width VM creation")
> > > Signed-off-by: Reiji Watanabe <reijiw@google.com>
> > > ---
> > >  arch/arm64/kvm/reset.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > >
> > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > > index 426bd7fbc3fd..ef78bbc7566a 100644
> > > --- a/arch/arm64/kvm/reset.c
> > > +++ b/arch/arm64/kvm/reset.c
> > > @@ -180,8 +180,19 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
> > >  	if (kvm_has_mte(vcpu->kvm) && is32bit)
> > >  		return false;
> > >
> > > +	/*
> > > +	 * Make sure vcpu->arch.target setting is visible from others so
> > > +	 * that the width consistency checking between two vCPUs is done
> > > +	 * by at least one of them at KVM_ARM_VCPU_INIT.
> > > +	 */
> > > +	smp_mb();
> >
> > From ARM DDI 0487G.a, page B2-146 ("Data Memory Barrier (DMB)"):
> >
> > "The DMB instruction is a memory barrier instruction that ensures the
> > relative order of memory accesses before the barrier with memory
> > accesses after the barrier."
> >
> > I'm going to assume from the comment that you are referring to
> > completion of memory accesses ("Make sure [..] is visible from
> > others"). Please correct me if I am wrong. In this case, DMB ensures
> > ordering of memory accesses with regard to writes and reads, not
> > *completion*. Have a look at
> > tools/memory-model/litmus-tests/MP+fencewmbonceonce+fencermbonceonce.litmus
> > for the classic message-passing pattern as an example of memory
> > ordering. Message passing and other patterns are also explained in
> > ARM DDI 0487G.a, page K11-8363.
> >
> > I'm not saying that your approach is incorrect, but the commit
> > message should explain what memory accesses are being ordered
> > relative to each other and why.
>
> Thank you so much for the review.
> What I meant with the comment was:
> ---
> DMB is used to make sure that writing @vcpu->arch.target, which is done
> by kvm_vcpu_set_target() before getting here, is visible to other PEs
> before the following kvm_for_each_vcpu iteration reads the other vCPUs'
> target field.
> ---
> Did the comment become clearer? (Or am I using DMB incorrectly?)
>
> > > +
> > >  	/* Check that the vcpus are either all 32bit or all 64bit */
> > >  	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> > > +		/* Skip if KVM_ARM_VCPU_INIT is not done for the vcpu yet */
> > > +		if (tmp->arch.target == -1)
> > > +			continue;
>
> I just noticed that DMB(ishld) is needed here to ensure ordering
> between reading tmp->arch.target and reading vcpu->arch.features for
> this fix. Similarly, kvm_vcpu_set_target() needs DMB(ishst) to ensure
> ordering between writing vcpu->arch.features and writing
> vcpu->arch.target... I am going to fix them in the v2 series.

Yes, you'd need at least this, and preferably in their
smp_rmb()/smp_wmb() variants (a sketch of the pairing follows below).
However, this looks like a pretty fragile construct, as there are
multiple paths where we can change the target (including some error
paths from the run loop). I'd rather all changes to the target and the
feature bits happen under the kvm->lock, and take that lock when
checking for consistency in vcpu_allowed_register_width(), as this
isn't a fast path.
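For reference, the smp_wmb()/smp_rmb() pairing being discussed is the
classic message-passing pattern. A minimal sketch of what it would look
like here (illustrative only; the helper names are made up, and this is
not the proposed patch):

/* Writer side, on the KVM_ARM_VCPU_INIT path. */
static void example_publish_target(struct kvm_vcpu *vcpu, int target)
{
	/* All writes to vcpu->arch.features happen first... */
	set_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features);

	/* ...and must be observable before the write to the target. */
	smp_wmb();
	WRITE_ONCE(vcpu->arch.target, target);
}

/* Reader side, in the vcpu_allowed_register_width() loop. */
static bool example_vcpu_is_32bit(struct kvm_vcpu *tmp)
{
	/* A vCPU that hasn't seen KVM_ARM_VCPU_INIT still has target == -1. */
	if (READ_ONCE(tmp->arch.target) == -1)
		return false;

	/*
	 * Pairs with the smp_wmb() above: having observed the write to
	 * the target, we must also observe the features written before it.
	 */
	smp_rmb();
	return test_bit(KVM_ARM_VCPU_EL1_32BIT, tmp->arch.features);
}

The catch is that *every* path writing the target would have to follow
the same discipline, which is what makes the lock-based approach below
less fragile.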
I wrote the following, which is obviously incomplete and, as usual,
untested.

Thanks,

	M.

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e4727dc771bf..42f2ab80646c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1061,7 +1061,8 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
 static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 			       const struct kvm_vcpu_init *init)
 {
-	unsigned int i, ret;
+	unsigned int i;
+	int ret = 0;
 	u32 phys_target = kvm_target_cpu();
 
 	if (init->target != phys_target)
@@ -1074,32 +1075,46 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 	if (vcpu->arch.target != -1 && vcpu->arch.target != init->target)
 		return -EINVAL;
 
+	/* Hazard against a concurrent check of the target in kvm_reset_vcpu() */
+	mutex_lock(&vcpu->kvm->lock);
+
 	/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
 	for (i = 0; i < sizeof(init->features) * 8; i++) {
 		bool set = (init->features[i / 32] & (1 << (i % 32)));
 
-		if (set && i >= KVM_VCPU_MAX_FEATURES)
-			return -ENOENT;
+		if (set && i >= KVM_VCPU_MAX_FEATURES) {
+			ret = -ENOENT;
+			break;
+		}
 
 		/*
 		 * Secondary and subsequent calls to KVM_ARM_VCPU_INIT must
 		 * use the same feature set.
 		 */
 		if (vcpu->arch.target != -1 && i < KVM_VCPU_MAX_FEATURES &&
-		    test_bit(i, vcpu->arch.features) != set)
-			return -EINVAL;
+		    test_bit(i, vcpu->arch.features) != set) {
+			ret = -EINVAL;
+			break;
+		}
 
 		if (set)
 			set_bit(i, vcpu->arch.features);
 	}
 
-	vcpu->arch.target = phys_target;
+	if (!ret)
+		vcpu->arch.target = phys_target;
+
+	mutex_unlock(&vcpu->kvm->lock);
+	if (ret)
+		return ret;
 
 	/* Now we know what it is, we can reset it. */
 	ret = kvm_reset_vcpu(vcpu);
 	if (ret) {
+		mutex_lock(&vcpu->kvm->lock);
 		vcpu->arch.target = -1;
 		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
+		mutex_unlock(&vcpu->kvm->lock);
 	}
 
 	return ret;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index ef78bbc7566a..fae88a703140 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -180,13 +180,6 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
 	if (kvm_has_mte(vcpu->kvm) && is32bit)
 		return false;
 
-	/*
-	 * Make sure vcpu->arch.target setting is visible from others so
-	 * that the width consistency checking between two vCPUs is done
-	 * by at least one of them at KVM_ARM_VCPU_INIT.
-	 */
-	smp_mb();
-
 	/* Check that the vcpus are either all 32bit or all 64bit */
 	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
 		/* Skip if KVM_ARM_VCPU_INIT is not done for the vcpu yet */
@@ -222,14 +215,19 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_reset_state reset_state;
-	int ret;
+	int ret = -EINVAL;
 	bool loaded;
 	u32 pstate;
 
 	mutex_lock(&vcpu->kvm->lock);
-	reset_state = vcpu->arch.reset_state;
-	WRITE_ONCE(vcpu->arch.reset_state.reset, false);
+	if (vcpu_allowed_register_width(vcpu)) {
+		reset_state = vcpu->arch.reset_state;
+		WRITE_ONCE(vcpu->arch.reset_state.reset, false);
+		ret = 0;
+	}
 	mutex_unlock(&vcpu->kvm->lock);
 
+	if (ret)
+		goto out;
+
 	/* Reset PMU outside of the non-preemptible section */
 	kvm_pmu_vcpu_reset(vcpu);
 
@@ -257,11 +255,6 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	if (!vcpu_allowed_register_width(vcpu)) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {

-- 
Without deviation from the norm, progress is not possible.