Re: [PATCH 1/2] KVM: arm64: mixed-width check should be skipped for uninitialized vCPUs

From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Reiji Watanabe <reijiw@google.com>,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Will Deacon <will@kernel.org>, Peter Shier <pshier@google.com>,
	Ricardo Koller <ricarkol@google.com>,
	Oliver Upton <oupton@google.com>,
	Jing Zhang <jingzhangos@google.com>,
	Raghavendra Rao Anata <rananta@google.com>
Subject: Re: [PATCH 1/2] KVM: arm64: mixed-width check should be skipped for uninitialized vCPUs
Date: Tue, 11 Jan 2022 16:11:25 +0000	[thread overview]
Message-ID: <Yd2sLbiw/XPCZe7q@monolith.localdoman> (raw)
In-Reply-To: <875yqqtn5q.wl-maz@kernel.org>

Hi Marc,

On Tue, Jan 11, 2022 at 01:30:41PM +0000, Marc Zyngier wrote:
> On Tue, 11 Jan 2022 07:37:57 +0000,
> Reiji Watanabe <reijiw@google.com> wrote:
> > 
> > Hi Alex,
> > 
> > On Mon, Jan 10, 2022 at 2:48 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > Hi Reiji,
> > >
> > > On Sun, Jan 09, 2022 at 09:40:41PM -0800, Reiji Watanabe wrote:
> > > > vcpu_allowed_register_width() checks if all the VCPUs are either
> > > > all 32bit or all 64bit.  Since the checking is done even for vCPUs
> > > > that are not initialized (KVM_ARM_VCPU_INIT has not been done) yet,
> > > > the non-initialized vCPUs are erroneously treated as 64bit vCPU,
> > > > which causes the function to incorrectly detect a mixed-width VM.
> > > >
> > > > Fix vcpu_allowed_register_width() to skip the check for vCPUs that
> > > > are not initialized yet.
> > > >
> > > > Fixes: 66e94d5cafd4 ("KVM: arm64: Prevent mixed-width VM creation")
> > > > Signed-off-by: Reiji Watanabe <reijiw@google.com>
> > > > ---
> > > >  arch/arm64/kvm/reset.c | 11 +++++++++++
> > > >  1 file changed, 11 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > > > index 426bd7fbc3fd..ef78bbc7566a 100644
> > > > --- a/arch/arm64/kvm/reset.c
> > > > +++ b/arch/arm64/kvm/reset.c
> > > > @@ -180,8 +180,19 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
> > > >       if (kvm_has_mte(vcpu->kvm) && is32bit)
> > > >               return false;
> > > >
> > > > +     /*
> > > > +      * Make sure vcpu->arch.target setting is visible from others so
> > > > +      * that the width consistency checking between two vCPUs is done
> > > > +      * by at least one of them at KVM_ARM_VCPU_INIT.
> > > > +      */
> > > > +     smp_mb();
> > >
> > > From ARM DDI 0487G.a, page B2-146 ("Data Memory Barrier (DMB)"):
> > >
> > > "The DMB instruction is a memory barrier instruction that ensures the relative
> > > order of memory accesses before the barrier with memory accesses after the
> > > barrier."
> > >
> > > I'm going to assume from the comment that you are referring to completion of
> > > memory accesses ("Make sure [..] is visible from others"). Please correct me if
> > > I am wrong. In this case, DMB ensures ordering of memory accesses with regards
> > > to writes and reads, not *completion*.  Have a look at
> > > tools/memory-model/litmus-tests/MP+fencewmbonceonce+fencermbonceonce.litmus for
> > > the classic message passing example as an example of memory ordering.
> > > Message passing and other patterns are also explained in ARM DDI 0487G.a, page
> > > K11-8363.
> > >
> > > I'm not saying that your approach is incorrect, but the commit message should
> > > explain what memory accesses are being ordered relative to each other and why.
> > 
> > Thank you so much for the review.
> > What I meant with the comment was:
> > ---
> >   DMB is used to make sure that writing @vcpu->arch.target, which is done
> >   by kvm_vcpu_set_target() before getting here, is visible to other PEs
> >   before the following kvm_for_each_vcpu iteration reads the other vCPUs'
> >   target field.
> > ---
> > Did the comment become more clear ?? (Or do I use DMB incorrectly ?)
> > 
> > > > +
> > > >       /* Check that the vcpus are either all 32bit or all 64bit */
> > > >       kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> > > > +             /* Skip if KVM_ARM_VCPU_INIT is not done for the vcpu yet */
> > > > +             if (tmp->arch.target == -1)
> > > > +                     continue;
> > 
> > I just noticed DMB(ishld) is needed here to assure ordering between
> > reading tmp->arch.target and reading vcpu->arch.features for this fix.
> > Similarly, kvm_vcpu_set_target() needs DMB(ishst) to assure ordering
> > between writing vcpu->arch.features and writing vcpu->arch.target...
> > I am going to fix them in the v2 series.
> 
> Yes, you'd need at least this, and preferably in their smp_rmb/wmb
> variants.
> 
> However, this looks like a pretty fragile construct, as there are
> multiple paths where we can change target (including some error
> paths from the run loop).
> 
> I'd rather all changes to target and the feature bits happen under the
> kvm->lock, and take that lock when checking for consistency in
> vcpu_allowed_register_width(), as this isn't a fast path. I wrote the
> following, which is obviously incomplete and as usual untested.

I think this is the better approach, because we also want to make sure that
a PE observes changes to target and features as soon as they have been
made, to avoid situations where one PE sets the target and the 32bit
feature, and another PE reads the old values and skips the check, in which
case memory ordering is not enough.

Thanks,
Alex

> 
> Thanks,
> 
> 	M.
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e4727dc771bf..42f2ab80646c 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1061,7 +1061,8 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
>  static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  			       const struct kvm_vcpu_init *init)
>  {
> -	unsigned int i, ret;
> +	unsigned int i;
> +	int ret = 0;
>  	u32 phys_target = kvm_target_cpu();
>  
>  	if (init->target != phys_target)
> @@ -1074,32 +1075,46 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  	if (vcpu->arch.target != -1 && vcpu->arch.target != init->target)
>  		return -EINVAL;
>  
> +	/* Hazard against a concurent check of the target in kvm_reset_vcpu() */
> +	mutex_lock(&vcpu->kvm->lock);
> +
>  	/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
>  	for (i = 0; i < sizeof(init->features) * 8; i++) {
>  		bool set = (init->features[i / 32] & (1 << (i % 32)));
>  
> -		if (set && i >= KVM_VCPU_MAX_FEATURES)
> -			return -ENOENT;
> +		if (set && i >= KVM_VCPU_MAX_FEATURES) {
> +			ret = -ENOENT;
> +			break;
> +		}
>  
>  		/*
>  		 * Secondary and subsequent calls to KVM_ARM_VCPU_INIT must
>  		 * use the same feature set.
>  		 */
>  		if (vcpu->arch.target != -1 && i < KVM_VCPU_MAX_FEATURES &&
> -		    test_bit(i, vcpu->arch.features) != set)
> -			return -EINVAL;
> +		    test_bit(i, vcpu->arch.features) != set) {
> +			ret = -EINVAL;
> +			break;
> +		}
>  
>  		if (set)
>  			set_bit(i, vcpu->arch.features);
>  	}
>  
> -	vcpu->arch.target = phys_target;
> +	if (!ret)
> +		vcpu->arch.target = phys_target;
> +
> +	mutex_unlock(&vcpu->kvm->lock);
> +	if (ret)
> +		return ret;
>  
>  	/* Now we know what it is, we can reset it. */
>  	ret = kvm_reset_vcpu(vcpu);
>  	if (ret) {
> +		mutex_lock(&vcpu->kvm->lock);
>  		vcpu->arch.target = -1;
>  		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> +		mutex_unlock(&vcpu->kvm->lock);
>  	}
>  
>  	return ret;
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index ef78bbc7566a..fae88a703140 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -180,13 +180,6 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
>  	if (kvm_has_mte(vcpu->kvm) && is32bit)
>  		return false;
>  
> -	/*
> -	 * Make sure vcpu->arch.target setting is visible from others so
> -	 * that the width consistency checking between two vCPUs is done
> -	 * by at least one of them at KVM_ARM_VCPU_INIT.
> -	 */
> -	smp_mb();
> -
>  	/* Check that the vcpus are either all 32bit or all 64bit */
>  	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
>  		/* Skip if KVM_ARM_VCPU_INIT is not done for the vcpu yet */
> @@ -222,14 +215,19 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_reset_state reset_state;
> -	int ret;
> +	int ret = -EINVAL;
>  	bool loaded;
>  	u32 pstate;
>  
>  	mutex_lock(&vcpu->kvm->lock);
> -	reset_state = vcpu->arch.reset_state;
> -	WRITE_ONCE(vcpu->arch.reset_state.reset, false);
> +	if (vcpu_allowed_register_width(vcpu)) {
> +		reset_state = vcpu->arch.reset_state;
> +		WRITE_ONCE(vcpu->arch.reset_state.reset, false);
> +		ret = 0;
> +	}
>  	mutex_unlock(&vcpu->kvm->lock);
> +	if (ret)
> +		goto out;
>  
>  	/* Reset PMU outside of the non-preemptible section */
>  	kvm_pmu_vcpu_reset(vcpu);
> @@ -257,11 +255,6 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> -	if (!vcpu_allowed_register_width(vcpu)) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> -
>  	switch (vcpu->arch.target) {
>  	default:
>  		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
> 
> -- 
> Without deviation from the norm, progress is not possible.