From: Marc Zyngier <maz@kernel.org>
To: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: Mark Brown <broonie@kernel.org>,
	linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	Oliver Upton <oliver.upton@linux.dev>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	James Morse <james.morse@arm.com>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	asahi@lists.linux.dev, Alyssa Rosenzweig <alyssa@rosenzweig.io>,
	Sven Peter <sven@svenpeter.dev>, Hector Martin <marcan@marcan.st>
Subject: Re: [PATCH v4 7/7] KVM: arm64: Normalize cache configuration
Date: Sun, 25 Dec 2022 13:45:36 +0000	[thread overview]
Message-ID: <875ydzfvgf.wl-maz@kernel.org> (raw)
In-Reply-To: <20221221204016.658874-8-akihiko.odaki@daynix.com>

On Wed, 21 Dec 2022 20:40:16 +0000,
Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> 
> Before this change, the cache configuration of the physical CPU was
> exposed to vcpus. This is problematic because the cache configuration
> a vcpu sees can change when it migrates between physical CPUs with
> different cache configurations.
> 
> Fabricate cache configuration from the sanitized value, which holds the
> CTR_EL0 value userspace sees regardless of which physical CPU the vcpu
> resides on.
> 
> CLIDR_EL1 and CCSIDR_EL1 are now writable from userspace so that
> the VMM can restore the values saved with the old kernel.
> 
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>  arch/arm64/include/asm/cache.h    |   3 +
>  arch/arm64/include/asm/kvm_host.h |   4 +
>  arch/arm64/kvm/reset.c            |   1 +
>  arch/arm64/kvm/sys_regs.c         | 229 +++++++++++++++++-------------
>  4 files changed, 141 insertions(+), 96 deletions(-)

[...]

>  /* Which cache CCSIDR represents depends on CSSELR value. */
> -static u32 get_ccsidr(u32 csselr)
> +static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
> +{
> +	if (vcpu->arch.ccsidr)
> +		return vcpu->arch.ccsidr[csselr];
> +
> +	/*
> +	 * Fabricate a CCSIDR value as the overriding value does not exist.
> +	 * The real CCSIDR value will not be used as it can vary by the
> +	 * physical CPU which the vcpu currently resides in.
> +	 *
> +	 * The line size is determined with get_min_cache_line_size(), which
> +	 * should be valid for all CPUs even if they have different cache
> +	 * configuration.
> +	 *
> +	 * The associativity bits are cleared, meaning the geometry of all data
> +	 * and unified caches (which are guaranteed to be PIPT and thus
> +	 * non-aliasing) is 1 set and 1 way.
> +	 * Guests should not be doing cache operations by set/way at all, and
> +	 * for this reason, we trap them and attempt to infer the intent, so
> +	 * that we can flush the entire guest's address space at the appropriate
> +	 * time. The exposed geometry minimizes the number of traps.
> +	 * [If guests should attempt to infer aliasing properties from the
> +	 * geometry (which is not permitted by the architecture), they would
> +	 * only do so for virtually indexed caches.]
> +	 */
> +	return get_min_cache_line_size(csselr) << CCSIDR_EL1_LineSize_SHIFT;

It'd be nice to have a comment that says this relies on CCSIDR_EL1
being allowed to return an UNKNOWN value when CSSELR_EL1 does not
specify an implemented cache level (you always return something with
the I or D line-size).
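
Something like this, maybe (only a sketch, and the exact wording is no
more than a suggestion):

	/*
	 * If CSSELR_EL1 selects a level that is not implemented, the
	 * architecture allows CCSIDR_EL1 to return an UNKNOWN value, so
	 * always returning a value built from the minimum I- or D-side
	 * line size is architecturally fine.
	 */
	return get_min_cache_line_size(csselr) << CCSIDR_EL1_LineSize_SHIFT;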

> +}
> +
> +static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
>  {
> -	u32 ccsidr;
> +	u8 line_size = FIELD_GET(CCSIDR_EL1_LineSize, val);
> +	u32 *ccsidr = vcpu->arch.ccsidr;
> +	u32 i;
> +
> +	if ((val & CCSIDR_EL1_RES0) || line_size < get_min_cache_line_size(csselr))
> +		return -EINVAL;
> +
> +	if (!ccsidr) {
> +		if (val == get_ccsidr(vcpu, csselr))
> +			return 0;
> +
> +		ccsidr = kmalloc_array(CSSELR_MAX, sizeof(u32), GFP_KERNEL);
> +		if (!ccsidr)
> +			return -ENOMEM;
> +
> +		for (i = 0; i < CSSELR_MAX; i++)
> +			ccsidr[i] = get_ccsidr(vcpu, i);
> +
> +		vcpu->arch.ccsidr = ccsidr;
> +	}
>  
> -	/* Make sure noone else changes CSSELR during this! */
> -	local_irq_disable();
> -	write_sysreg(csselr, csselr_el1);
> -	isb();
> -	ccsidr = read_sysreg(ccsidr_el1);
> -	local_irq_enable();
> +	ccsidr[csselr] = val;
>  
> -	return ccsidr;
> +	return 0;
>  }
>  
>  /*
> @@ -1281,10 +1332,64 @@ static bool access_clidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  	if (p->is_write)
>  		return write_to_read_only(vcpu, p, r);
>  
> -	p->regval = read_sysreg(clidr_el1);
> +	p->regval = __vcpu_sys_reg(vcpu, r->reg);
>  	return true;
>  }
>  
> +/*
> + * Fabricate a CLIDR_EL1 value instead of using the real value, which can vary
> + * by the physical CPU which the vcpu currently resides in.
> + */
> +static void reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> +{
> +	u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
> +	u64 clidr;
> +	u8 loc;
> +
> +	if ((ctr_el0 & CTR_EL0_IDC) || cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {

Having looked into this again, I *think* we can drop the FWB check, as
the above read_sanitised_ftr_reg() is populated from
read_cpuid_effective_cachetype(), which factors in the *effects* of
FWB.

Please check though, as I only had a cursory look.
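
If that holds, the change would be as simple as (untested, and only
valid under the assumption above):

-	if ((ctr_el0 & CTR_EL0_IDC) || cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
+	if (ctr_el0 & CTR_EL0_IDC) {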

> +		/*
> +		 * Data cache clean to the PoU is not required so LoUU and LoUIS
> +		 * will not be set and a unified cache, which will be marked as
> +		 * LoC, will be added.
> +		 *
> +		 * If not DIC, make the unified cache L2 so that an instruction
> +		 * cache can be added as L1 later.
> +		 */
> +		loc = (ctr_el0 & CTR_EL0_DIC) ? 1 : 2;
> +		clidr = CACHE_TYPE_UNIFIED << CLIDR_CTYPE_SHIFT(loc);
> +	} else {
> +		/*
> +		 * Data cache clean to the PoU is required so let L1 have a data
> +		 * cache and mark it as LoUU and LoUIS. As L1 has a data cache,
> +		 * it can be marked as LoC too.
> +		 */
> +		loc = 1;
> +		clidr = 1 << CLIDR_LOUU_SHIFT;
> +		clidr |= 1 << CLIDR_LOUIS_SHIFT;
> +		clidr |= CACHE_TYPE_DATA << CLIDR_CTYPE_SHIFT(1);
> +	}
> +
> +	/*
> +	 * Instruction cache invalidation to the PoU is required so let L1 have
> +	 * an instruction cache. If L1 already has a data cache, it will be
> +	 * CACHE_TYPE_SEPARATE.
> +	 */
> +	if (!(ctr_el0 & CTR_EL0_DIC))
> +		clidr |= CACHE_TYPE_INST << CLIDR_CTYPE_SHIFT(1);
> +
> +	clidr |= loc << CLIDR_LOC_SHIFT;
> +
> +	/*
> +	 * Add a tag cache unified with the data cache. Allocation tags and
> +	 * data share a cache line, so this looks valid even though there is
> +	 * only one cache line.
> +	 */
> +	if (kvm_has_mte(vcpu->kvm))
> +		clidr |= 2 << CLIDR_TTYPE_SHIFT(loc);
> +
> +	__vcpu_sys_reg(vcpu, r->reg) = clidr;
> +}
> +
>  static bool access_csselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  			  const struct sys_reg_desc *r)
>  {
> @@ -1306,22 +1411,12 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  		return write_to_read_only(vcpu, p, r);
>  
>  	csselr = vcpu_read_sys_reg(vcpu, CSSELR_EL1);
> -	p->regval = get_ccsidr(csselr);
> +	csselr &= CSSELR_EL1_Level | CSSELR_EL1_InD;
> +	if (csselr >= CSSELR_MAX)
> +		return undef_access(vcpu, p, r);

I really dislike this UNDEF injection. Yes, this is allowed, but this
is also inconsistent, as you can still set random values of TnD and
not suffer an UNDEF. It also doesn't check for the values that have
been advertised in CLIDR_EL1. I'd rather you return something without
creating havoc.
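
Something in this spirit would be enough (untested sketch, and the
exact value returned for an unimplemented level is my own assumption;
the architecture only requires it to be UNKNOWN):

	csselr = vcpu_read_sys_reg(vcpu, CSSELR_EL1);
	csselr &= CSSELR_EL1_Level | CSSELR_EL1_InD;

	/*
	 * An out-of-range selector targets an unimplemented cache level,
	 * for which CCSIDR_EL1 is allowed to read as UNKNOWN: return a
	 * benign value instead of injecting an UNDEF.
	 */
	if (csselr < CSSELR_MAX)
		p->regval = get_ccsidr(vcpu, csselr);
	else
		p->regval = 0;

	return true;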

> +
> +	p->regval = get_ccsidr(vcpu, csselr);
>  
> -	/*
> -	 * Guests should not be doing cache operations by set/way at all, and
> -	 * for this reason, we trap them and attempt to infer the intent, so
> -	 * that we can flush the entire guest's address space at the appropriate
> -	 * time.
> -	 * To prevent this trapping from causing performance problems, let's
> -	 * expose the geometry of all data and unified caches (which are
> -	 * guaranteed to be PIPT and thus non-aliasing) as 1 set and 1 way.
> -	 * [If guests should attempt to infer aliasing properties from the
> -	 * geometry (which is not permitted by the architecture), they would
> -	 * only do so for virtually indexed caches.]
> -	 */
> -	if (!(csselr & 1)) // data or unified cache
> -		p->regval &= ~GENMASK(27, 3);
>  	return true;
>  }
>  
> @@ -1610,7 +1705,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_CNTKCTL_EL1), NULL, reset_val, CNTKCTL_EL1, 0},
>  
>  	{ SYS_DESC(SYS_CCSIDR_EL1), access_ccsidr },
> -	{ SYS_DESC(SYS_CLIDR_EL1), access_clidr },
> +	{ SYS_DESC(SYS_CLIDR_EL1), access_clidr, reset_clidr, CLIDR_EL1 },

Maybe I wasn't clear enough in my previous reviews, so allow me to put
it bluntly: you MUST validate CLIDR_EL1 as it is written from
userspace, and refuse CLIDR_EL1.{LoUU,LoC,LoUIS} values that cannot be
safely handled on this host.

For example, CLIDR_EL1.LoC=0, which implies a CTR_EL0.IDC behaviour,
cannot be restored on a host that cannot offer the IDC behaviour. This
is a critical aspect, as the guest will otherwise observe something
that looks like memory corruption.

So these 3 values must be checked across the board, and you must not
accept a zero value if the HW has a non-zero value for the same
field. The main difficulty I can see is that this needs to be
tracked on all CPUs, probably as some boot-time information, and
result in a sanitised version of CLIDR_EL1, just like we do for
CTR_EL0.
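
As a starting point, the userspace accessor could look vaguely like
this (untested sketch; it only captures the IDC angle, not the full
per-CPU sanitisation described above, and the LoUU/LoUIS/LoC masking
is my own shorthand):

static int set_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
		     u64 val)
{
	u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
	u8 louis = (val >> CLIDR_LOUIS_SHIFT) & 0x7;
	u8 louu  = (val >> CLIDR_LOUU_SHIFT) & 0x7;
	u8 loc   = (val >> CLIDR_LOC_SHIFT) & 0x7;
	bool claims_idc = !loc || (!louis && !louu);

	/*
	 * Refuse a CLIDR_EL1 that claims no data cache clean is needed
	 * for coherency (IDC-like behaviour) when the host cannot
	 * actually provide it.
	 */
	if (claims_idc && !(ctr_el0 & CTR_EL0_IDC))
		return -EINVAL;

	__vcpu_sys_reg(vcpu, rd->reg) = val;

	return 0;
}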

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

Thread overview:
2022-12-21 20:40 [PATCH v4 0/7] KVM: arm64: Normalize cache configuration Akihiko Odaki
2022-12-21 20:40 ` [PATCH v4 1/7] arm64: Allow the definition of UNKNOWN system register fields Akihiko Odaki
2022-12-21 20:40 ` [PATCH v4 2/7] arm64/sysreg: Convert CCSIDR_EL1 to automatic generation Akihiko Odaki
2022-12-22 11:45   ` Mark Brown
2022-12-21 20:40 ` [PATCH v4 3/7] arm64/sysreg: Add CCSIDR2_EL1 Akihiko Odaki
2022-12-21 20:40 ` [PATCH v4 4/7] arm64/cache: Move CLIDR macro definitions Akihiko Odaki
2022-12-21 20:40 ` [PATCH v4 5/7] KVM: arm64: Always set HCR_TID2 Akihiko Odaki
2022-12-21 20:40 ` [PATCH v4 6/7] KVM: arm64: Mask FEAT_CCIDX Akihiko Odaki
2023-01-05 20:48   ` Oliver Upton
2022-12-21 20:40 ` [PATCH v4 7/7] KVM: arm64: Normalize cache configuration Akihiko Odaki
2022-12-25 13:45   ` Marc Zyngier [this message]
