* [PATCH v4 0/2] MTE support for KVM guest
@ 2020-10-26 15:57 Steven Price
  2020-10-26 15:57 ` [PATCH v4 1/2] arm64: kvm: Save/restore MTE registers Steven Price
  2020-10-26 15:57 ` [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature Steven Price
From: Steven Price @ 2020-10-26 15:57 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Dave Martin, Juan Quintela, Richard Henderson, linux-kernel,
	Steven Price, James Morse, Julien Thierry, Thomas Gleixner,
	kvmarm, linux-arm-kernel

This series adds support for Arm's Memory Tagging Extension (MTE) to
KVM, allowing KVM guests to make use of it. This builds on the existing
user space support already in v5.10-rc1, see [1] for an overview.

[1] https://lwn.net/Articles/834289/

Changes since v3[2]:

 * Rebased on v5.10-rc1 (required updating KVM_CAP number).

 * Clarified redundant test for system_supports_mte() with a comment.

 * Added Reviewed-by tags from Andrew - thanks!

[2] https://lore.kernel.org/r/20200925093607.3051-1-steven.price%40arm.com

Haibo plans to start looking at the QEMU support for this. I believe
what is in this series should be sufficient, but there is still some
concern that we need more kernel support for easily accessing the tags
for migrating the VM. I don't expect any extra support to change the
interfaces defined here, but rather build on them. My only reservation
would be whether we want to expose the KVM_CAP before everything is
ready.
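
(For illustration: assuming the VMM maps guest memory with PROT_MTE, the
tags are in principle already reachable from user space via the MTE
load-tag instruction, so migration support can likely build on the
interfaces here. The helper below is a hypothetical sketch, not part of
this series:

static inline unsigned char mte_load_tag(void *granule)
{
	unsigned long tagged = (unsigned long)granule;

	/* LDG only updates bits [59:56] of the destination register */
	asm(".arch_extension memtag\n"
	    "ldg %0, [%1]"
	    : "+r" (tagged)
	    : "r" (granule));

	return (tagged >> 56) & 0xf;
}

The cost is the VMM mapping all of guest memory with PROT_MTE.)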

Steven Price (2):
  arm64: kvm: Save/restore MTE registers
  arm64: kvm: Introduce MTE VCPU feature

 arch/arm64/include/asm/kvm_emulate.h       |  3 +++
 arch/arm64/include/asm/kvm_host.h          |  7 +++++++
 arch/arm64/include/asm/sysreg.h            |  3 ++-
 arch/arm64/kvm/arm.c                       |  9 +++++++++
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++
 arch/arm64/kvm/mmu.c                       | 20 ++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c                  | 20 +++++++++++++++-----
 include/uapi/linux/kvm.h                   |  1 +
 8 files changed, 71 insertions(+), 6 deletions(-)

-- 
2.20.1



* [PATCH v4 1/2] arm64: kvm: Save/restore MTE registers
  2020-10-26 15:57 [PATCH v4 0/2] MTE support for KVM guest Steven Price
@ 2020-10-26 15:57 ` Steven Price
  2020-11-17 19:20   ` Marc Zyngier
  2020-10-26 15:57 ` [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature Steven Price
From: Steven Price @ 2020-10-26 15:57 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Dave Martin, Juan Quintela, Richard Henderson, linux-kernel,
	Steven Price, James Morse, Julien Thierry, Thomas Gleixner,
	kvmarm, linux-arm-kernel

Define the new system registers that MTE introduces and context switch
them. The MTE feature is still hidden from the ID register as it isn't
supported in a VM yet.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm64/include/asm/kvm_host.h          |  4 ++++
 arch/arm64/include/asm/sysreg.h            |  3 ++-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++
 arch/arm64/kvm/sys_regs.c                  | 14 ++++++++++----
 4 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0aecbab6a7fb..95ab7345dcc8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -134,6 +134,8 @@ enum vcpu_sysreg {
 	SCTLR_EL1,	/* System Control Register */
 	ACTLR_EL1,	/* Auxiliary Control Register */
 	CPACR_EL1,	/* Coprocessor Access Control */
+	RGSR_EL1,	/* Random Allocation Tag Seed Register */
+	GCR_EL1,	/* Tag Control Register */
 	ZCR_EL1,	/* SVE Control */
 	TTBR0_EL1,	/* Translation Table Base Register 0 */
 	TTBR1_EL1,	/* Translation Table Base Register 1 */
@@ -150,6 +152,8 @@ enum vcpu_sysreg {
 	TPIDR_EL1,	/* Thread ID, Privileged */
 	AMAIR_EL1,	/* Aux Memory Attribute Indirection Register */
 	CNTKCTL_EL1,	/* Timer Control Register (EL1) */
+	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
+	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
 	PAR_EL1,	/* Physical Address Register */
 	MDSCR_EL1,	/* Monitor Debug System Control Register */
 	MDCCINT_EL1,	/* Monitor Debug Comms Channel Interrupt Enable Reg */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index d52c1b3ce589..7727df0bc09d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -565,7 +565,8 @@
 #define SCTLR_ELx_M	(BIT(0))
 
 #define SCTLR_ELx_FLAGS	(SCTLR_ELx_M  | SCTLR_ELx_A | SCTLR_ELx_C | \
-			 SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB)
+			 SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB | \
+			 SCTLR_ELx_ITFSB)
 
 /* SCTLR_EL2 specific flags. */
 #define SCTLR_EL2_RES1	((BIT(4))  | (BIT(5))  | (BIT(11)) | (BIT(16)) | \
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index 7a986030145f..a124ffa49ba3 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -18,6 +18,11 @@
 static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
 	ctxt_sys_reg(ctxt, MDSCR_EL1)	= read_sysreg(mdscr_el1);
+	if (system_supports_mte()) {
+		ctxt_sys_reg(ctxt, RGSR_EL1)	= read_sysreg_s(SYS_RGSR_EL1);
+		ctxt_sys_reg(ctxt, GCR_EL1)	= read_sysreg_s(SYS_GCR_EL1);
+		ctxt_sys_reg(ctxt, TFSRE0_EL1)	= read_sysreg_s(SYS_TFSRE0_EL1);
+	}
 }
 
 static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
@@ -45,6 +50,8 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, CNTKCTL_EL1)	= read_sysreg_el1(SYS_CNTKCTL);
 	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
 	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
+	if (system_supports_mte())
+		ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR);
 
 	ctxt_sys_reg(ctxt, SP_EL1)	= read_sysreg(sp_el1);
 	ctxt_sys_reg(ctxt, ELR_EL1)	= read_sysreg_el1(SYS_ELR);
@@ -63,6 +70,11 @@ static inline void __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
 static inline void __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
 {
 	write_sysreg(ctxt_sys_reg(ctxt, MDSCR_EL1),  mdscr_el1);
+	if (system_supports_mte()) {
+		write_sysreg_s(ctxt_sys_reg(ctxt, RGSR_EL1), SYS_RGSR_EL1);
+		write_sysreg_s(ctxt_sys_reg(ctxt, GCR_EL1), SYS_GCR_EL1);
+		write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
+	}
 }
 
 static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
@@ -106,6 +118,8 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	write_sysreg_el1(ctxt_sys_reg(ctxt, CNTKCTL_EL1), SYS_CNTKCTL);
 	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
 	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
+	if (system_supports_mte())
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
 
 	if (!has_vhe() &&
 	    cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index d9117bc56237..430e36e1a13d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1391,6 +1391,12 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return false;
 }
 
+static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
+				   const struct sys_reg_desc *rd)
+{
+	return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
+}
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {			\
 	SYS_DESC(SYS_##name),			\
@@ -1557,8 +1563,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
 	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
 
-	{ SYS_DESC(SYS_RGSR_EL1), access_mte_regs },
-	{ SYS_DESC(SYS_GCR_EL1), access_mte_regs },
+	{ SYS_DESC(SYS_RGSR_EL1), access_mte_regs, reset_unknown, RGSR_EL1, .visibility = mte_visibility },
+	{ SYS_DESC(SYS_GCR_EL1), access_mte_regs, reset_unknown, GCR_EL1, .visibility = mte_visibility },
 
 	{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility },
 	{ SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
@@ -1584,8 +1590,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi },
 	{ SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi },
 
-	{ SYS_DESC(SYS_TFSR_EL1), access_mte_regs },
-	{ SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs },
+	{ SYS_DESC(SYS_TFSR_EL1), access_mte_regs, reset_unknown, TFSR_EL1, .visibility = mte_visibility },
+	{ SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs, reset_unknown, TFSRE0_EL1, .visibility = mte_visibility },
 
 	{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
-- 
2.20.1



* [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-10-26 15:57 [PATCH v4 0/2] MTE support for KVM guest Steven Price
  2020-10-26 15:57 ` [PATCH v4 1/2] arm64: kvm: Save/restore MTE registers Steven Price
@ 2020-10-26 15:57 ` Steven Price
  2020-11-17 16:07   ` Catalin Marinas
  2020-11-17 19:35   ` Marc Zyngier
From: Steven Price @ 2020-10-26 15:57 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Dave Martin, Juan Quintela, Richard Henderson, linux-kernel,
	Steven Price, James Morse, Julien Thierry, Thomas Gleixner,
	kvmarm, linux-arm-kernel

Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging
for a VM. This exposes the feature to the guest and automatically marks
memory pages touched by the VM as PG_mte_tagged (and clears the tag
storage) to ensure that the guest cannot see stale tags, and so that the
tags are correctly saved/restored across swap.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm64/include/asm/kvm_emulate.h |  3 +++
 arch/arm64/include/asm/kvm_host.h    |  3 +++
 arch/arm64/kvm/arm.c                 |  9 +++++++++
 arch/arm64/kvm/mmu.c                 | 20 ++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            |  6 +++++-
 include/uapi/linux/kvm.h             |  1 +
 6 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5ef2669ccd6c..66c0d9e7c2b4 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
 	    vcpu_el1_is_32bit(vcpu))
 		vcpu->arch.hcr_el2 |= HCR_TID2;
+
+	if (vcpu->kvm->arch.mte_enabled)
+		vcpu->arch.hcr_el2 |= HCR_ATA;
 }
 
 static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 95ab7345dcc8..cd993aec0440 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -118,6 +118,9 @@ struct kvm_arch {
 	 */
 	unsigned long *pmu_filter;
 	unsigned int pmuver;
+
+	/* Memory Tagging Extension enabled for the guest */
+	bool mte_enabled;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f56122eedffc..7ee93bcac017 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -89,6 +89,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		r = 0;
 		kvm->arch.return_nisv_io_abort_to_user = true;
 		break;
+	case KVM_CAP_ARM_MTE:
+		if (!system_supports_mte() || kvm->created_vcpus)
+			return -EINVAL;
+		r = 0;
+		kvm->arch.mte_enabled = true;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -210,6 +216,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		 */
 		r = 1;
 		break;
+	case KVM_CAP_ARM_MTE:
+		r = system_supports_mte();
+		break;
 	case KVM_CAP_STEAL_TIME:
 		r = kvm_arm_pvtime_supported();
 		break;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 19aacc7d64de..38fe25310ca1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
 							   &pfn, &fault_ipa);
+
+	/*
+	 * The otherwise redundant test for system_supports_mte() allows the
+	 * code to be compiled out when CONFIG_ARM64_MTE is not present.
+	 */
+	if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
+		/*
+		 * VM will be able to see the page's tags, so we must ensure
+		 * they have been initialised.
+		 */
+		struct page *page = pfn_to_page(pfn);
+		long i, nr_pages = compound_nr(page);
+
+		/* if PG_mte_tagged is set, tags have already been initialised */
+		for (i = 0; i < nr_pages; i++, page++) {
+			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+				mte_clear_page_tags(page_address(page));
+		}
+	}
+
 	if (writable) {
 		prot |= KVM_PGTABLE_PROT_W;
 		kvm_set_pfn_dirty(pfn);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 430e36e1a13d..35a3dc448231 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 		    arm64_get_spectre_v2_state() == SPECTRE_UNAFFECTED)
 			val |= (1UL << ID_AA64PFR0_CSV2_SHIFT);
 	} else if (id == SYS_ID_AA64PFR1_EL1) {
-		val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
+		if (!vcpu->kvm->arch.mte_enabled)
+			val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
 	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
 		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
 			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
@@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 				   const struct sys_reg_desc *rd)
 {
+	if (vcpu->kvm->arch.mte_enabled)
+		return 0;
+
 	return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
 }
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ca41220b40b8..3e6fb5b580a9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_X86_USER_SPACE_MSR 188
 #define KVM_CAP_X86_MSR_FILTER 189
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
+#define KVM_CAP_ARM_MTE 191
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.20.1



* Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-10-26 15:57 ` [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature Steven Price
@ 2020-11-17 16:07   ` Catalin Marinas
  2020-11-18 16:01     ` Steven Price
  2020-11-17 19:35   ` Marc Zyngier
  1 sibling, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2020-11-17 16:07 UTC (permalink / raw)
  To: Steven Price
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Marc Zyngier, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, James Morse, linux-arm-kernel, Thomas Gleixner,
	Will Deacon, kvmarm, Julien Thierry

Hi Steven,

On Mon, Oct 26, 2020 at 03:57:27PM +0000, Steven Price wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 19aacc7d64de..38fe25310ca1 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>  							   &pfn, &fault_ipa);
> +
> +	/*
> +	 * The otherwise redundant test for system_supports_mte() allows the
> +	 * code to be compiled out when CONFIG_ARM64_MTE is not present.
> +	 */
> +	if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
> +		/*
> +		 * VM will be able to see the page's tags, so we must ensure
> +		 * they have been initialised.
> +		 */
> +		struct page *page = pfn_to_page(pfn);
> +		long i, nr_pages = compound_nr(page);
> +
> +		/* if PG_mte_tagged is set, tags have already been initialised */
> +		for (i = 0; i < nr_pages; i++, page++) {
> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> +				mte_clear_page_tags(page_address(page));
> +		}
> +	}

If this page was swapped out and mapped back in, where does the
restoring from swap happen?

I may have asked in the past, is user_mem_abort() the only path for
mapping Normal pages into stage 2?

-- 
Catalin


* Re: [PATCH v4 1/2] arm64: kvm: Save/restore MTE registers
  2020-10-26 15:57 ` [PATCH v4 1/2] arm64: kvm: Save/restore MTE registers Steven Price
@ 2020-11-17 19:20   ` Marc Zyngier
  2020-11-18 16:01     ` Steven Price
From: Marc Zyngier @ 2020-11-17 19:20 UTC (permalink / raw)
  To: Steven Price
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Catalin Marinas, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, James Morse, linux-arm-kernel, Thomas Gleixner,
	Will Deacon, kvmarm, Julien Thierry

Hi Steven,

These patches unfortunately don't apply to -rc4 anymore, as we repainted
quite a bit while working on fixes. I'd be grateful if you could rebase 
them.

A few other things though:

On 2020-10-26 15:57, Steven Price wrote:
> Define the new system registers that MTE introduces and context switch
> them. The MTE feature is still hidden from the ID register as it isn't
> supported in a VM yet.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm64/include/asm/kvm_host.h          |  4 ++++
>  arch/arm64/include/asm/sysreg.h            |  3 ++-
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++
>  arch/arm64/kvm/sys_regs.c                  | 14 ++++++++++----
>  4 files changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h
> b/arch/arm64/include/asm/kvm_host.h
> index 0aecbab6a7fb..95ab7345dcc8 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -134,6 +134,8 @@ enum vcpu_sysreg {
>  	SCTLR_EL1,	/* System Control Register */
>  	ACTLR_EL1,	/* Auxiliary Control Register */
>  	CPACR_EL1,	/* Coprocessor Access Control */
> +	RGSR_EL1,	/* Random Allocation Tag Seed Register */
> +	GCR_EL1,	/* Tag Control Register */
>  	ZCR_EL1,	/* SVE Control */
>  	TTBR0_EL1,	/* Translation Table Base Register 0 */
>  	TTBR1_EL1,	/* Translation Table Base Register 1 */
> @@ -150,6 +152,8 @@ enum vcpu_sysreg {
>  	TPIDR_EL1,	/* Thread ID, Privileged */
>  	AMAIR_EL1,	/* Aux Memory Attribute Indirection Register */
>  	CNTKCTL_EL1,	/* Timer Control Register (EL1) */
> +	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
> +	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
>  	PAR_EL1,	/* Physical Address Register */
>  	MDSCR_EL1,	/* Monitor Debug System Control Register */
>  	MDCCINT_EL1,	/* Monitor Debug Comms Channel Interrupt Enable Reg */
> diff --git a/arch/arm64/include/asm/sysreg.h 
> b/arch/arm64/include/asm/sysreg.h
> index d52c1b3ce589..7727df0bc09d 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -565,7 +565,8 @@
>  #define SCTLR_ELx_M	(BIT(0))
> 
>  #define SCTLR_ELx_FLAGS	(SCTLR_ELx_M  | SCTLR_ELx_A | SCTLR_ELx_C | \
> -			 SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB)
> +			 SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB | \
> +			 SCTLR_ELx_ITFSB)
> 
>  /* SCTLR_EL2 specific flags. */
>  #define SCTLR_EL2_RES1	((BIT(4))  | (BIT(5))  | (BIT(11)) | (BIT(16)) 
> | \
> diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> index 7a986030145f..a124ffa49ba3 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> @@ -18,6 +18,11 @@
>  static inline void __sysreg_save_common_state(struct kvm_cpu_context 
> *ctxt)
>  {
>  	ctxt_sys_reg(ctxt, MDSCR_EL1)	= read_sysreg(mdscr_el1);
> +	if (system_supports_mte()) {
> +		ctxt_sys_reg(ctxt, RGSR_EL1)	= read_sysreg_s(SYS_RGSR_EL1);
> +		ctxt_sys_reg(ctxt, GCR_EL1)	= read_sysreg_s(SYS_GCR_EL1);
> +		ctxt_sys_reg(ctxt, TFSRE0_EL1)	= read_sysreg_s(SYS_TFSRE0_EL1);

As far as I can tell, HCR_EL2.ATA is still clear when running a guest.
So why do we save/restore this state yet?

Also, I wonder whether we should keep these in the C code. If one day
we enable MTE in the kernel, we will have to move them to the assembly
part, much like we do for PAuth. And I fear that "one day" is pretty
soon:

https://lore.kernel.org/linux-arm-kernel/cover.1605046192.git.andreyknvl@google.com/


> +	}
>  }
> 
>  static inline void __sysreg_save_user_state(struct kvm_cpu_context 
> *ctxt)
> @@ -45,6 +50,8 @@ static inline void __sysreg_save_el1_state(struct
> kvm_cpu_context *ctxt)
>  	ctxt_sys_reg(ctxt, CNTKCTL_EL1)	= read_sysreg_el1(SYS_CNTKCTL);
>  	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
>  	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
> +	if (system_supports_mte())
> +		ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR);
> 
>  	ctxt_sys_reg(ctxt, SP_EL1)	= read_sysreg(sp_el1);
>  	ctxt_sys_reg(ctxt, ELR_EL1)	= read_sysreg_el1(SYS_ELR);
> @@ -63,6 +70,11 @@ static inline void
> __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
>  static inline void __sysreg_restore_common_state(struct 
> kvm_cpu_context *ctxt)
>  {
>  	write_sysreg(ctxt_sys_reg(ctxt, MDSCR_EL1),  mdscr_el1);
> +	if (system_supports_mte()) {
> +		write_sysreg_s(ctxt_sys_reg(ctxt, RGSR_EL1), SYS_RGSR_EL1);
> +		write_sysreg_s(ctxt_sys_reg(ctxt, GCR_EL1), SYS_GCR_EL1);
> +		write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
> +	}
>  }
> 
>  static inline void __sysreg_restore_user_state(struct kvm_cpu_context 
> *ctxt)
> @@ -106,6 +118,8 @@ static inline void
> __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
>  	write_sysreg_el1(ctxt_sys_reg(ctxt, CNTKCTL_EL1), SYS_CNTKCTL);
>  	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
>  	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
> +	if (system_supports_mte())
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
> 
>  	if (!has_vhe() &&
>  	    cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index d9117bc56237..430e36e1a13d 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1391,6 +1391,12 @@ static bool access_mte_regs(struct kvm_vcpu
> *vcpu, struct sys_reg_params *p,
>  	return false;
>  }
> 
> +static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
> +				   const struct sys_reg_desc *rd)
> +{
> +	return REG_HIDDEN_USER | REG_HIDDEN_GUEST;

The handling of visibility has changed somehow since 01fe5ace92dd.

> +}
> +
>  /* sys_reg_desc initialiser for known cpufeature ID registers */
>  #define ID_SANITISED(name) {			\
>  	SYS_DESC(SYS_##name),			\
> @@ -1557,8 +1563,8 @@ static const struct sys_reg_desc sys_reg_descs[] 
> = {
>  	{ SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
>  	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
> 
> -	{ SYS_DESC(SYS_RGSR_EL1), access_mte_regs },
> -	{ SYS_DESC(SYS_GCR_EL1), access_mte_regs },
> +	{ SYS_DESC(SYS_RGSR_EL1), access_mte_regs, reset_unknown, RGSR_EL1,
> .visibility = mte_visibility },
> +	{ SYS_DESC(SYS_GCR_EL1), access_mte_regs, reset_unknown, GCR_EL1,
> .visibility = mte_visibility },
> 
>  	{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility =
> sve_visibility },
>  	{ SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
> @@ -1584,8 +1590,8 @@ static const struct sys_reg_desc sys_reg_descs[] 
> = {
>  	{ SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi },
>  	{ SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi },
> 
> -	{ SYS_DESC(SYS_TFSR_EL1), access_mte_regs },
> -	{ SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs },
> +	{ SYS_DESC(SYS_TFSR_EL1), access_mte_regs, reset_unknown, TFSR_EL1,
> .visibility = mte_visibility },
> +	{ SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs, reset_unknown,
> TFSRE0_EL1, .visibility = mte_visibility },
> 
>  	{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
>  	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-10-26 15:57 ` [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature Steven Price
  2020-11-17 16:07   ` Catalin Marinas
@ 2020-11-17 19:35   ` Marc Zyngier
  2020-11-18 16:01     ` Steven Price
From: Marc Zyngier @ 2020-11-17 19:35 UTC (permalink / raw)
  To: Steven Price
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Catalin Marinas, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, James Morse, linux-arm-kernel, Thomas Gleixner,
	Will Deacon, kvmarm, Julien Thierry

Hi Steven,

On 2020-10-26 15:57, Steven Price wrote:
> Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging
> for a VM. This exposes the feature to the guest and automatically marks
> memory pages touched by the VM as PG_mte_tagged (and clears the tag
> storage) to ensure that the guest cannot see stale tags, and so that
> the tags are correctly saved/restored across swap.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |  3 +++
>  arch/arm64/include/asm/kvm_host.h    |  3 +++
>  arch/arm64/kvm/arm.c                 |  9 +++++++++
>  arch/arm64/kvm/mmu.c                 | 20 ++++++++++++++++++++
>  arch/arm64/kvm/sys_regs.c            |  6 +++++-
>  include/uapi/linux/kvm.h             |  1 +
>  6 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h
> b/arch/arm64/include/asm/kvm_emulate.h
> index 5ef2669ccd6c..66c0d9e7c2b4 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu 
> *vcpu)
>  	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
>  	    vcpu_el1_is_32bit(vcpu))
>  		vcpu->arch.hcr_el2 |= HCR_TID2;
> +
> +	if (vcpu->kvm->arch.mte_enabled)

Please add a predicate (vcpu_has_mte() or kvm_has_mte()?) for this.
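
Something like this would do (untested sketch, name and placement as you
see fit):

static inline bool kvm_has_mte(struct kvm *kvm)
{
	return system_supports_mte() && kvm->arch.mte_enabled;
}

which would also let you drop the open-coded "system_supports_mte() &&
kvm->arch.mte_enabled" pairing elsewhere in the series.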

> +		vcpu->arch.hcr_el2 |= HCR_ATA;
>  }
> 
>  static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/include/asm/kvm_host.h
> b/arch/arm64/include/asm/kvm_host.h
> index 95ab7345dcc8..cd993aec0440 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -118,6 +118,9 @@ struct kvm_arch {
>  	 */
>  	unsigned long *pmu_filter;
>  	unsigned int pmuver;
> +
> +	/* Memory Tagging Extension enabled for the guest */
> +	bool mte_enabled;
>  };
> 
>  struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index f56122eedffc..7ee93bcac017 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -89,6 +89,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		r = 0;
>  		kvm->arch.return_nisv_io_abort_to_user = true;
>  		break;
> +	case KVM_CAP_ARM_MTE:
> +		if (!system_supports_mte() || kvm->created_vcpus)
> +			return -EINVAL;

You also want to avoid 32bit guests. Also, what is the rationale for
this being a VM capability as opposed to a CPU feature, similar
to SVE and PMU?

> +		r = 0;
> +		kvm->arch.mte_enabled = true;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> @@ -210,6 +216,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
> long ext)
>  		 */
>  		r = 1;
>  		break;
> +	case KVM_CAP_ARM_MTE:
> +		r = system_supports_mte();

Same comment about 32bit.

> +		break;
>  	case KVM_CAP_STEAL_TIME:
>  		r = kvm_arm_pvtime_supported();
>  		break;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 19aacc7d64de..38fe25310ca1 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
> phys_addr_t fault_ipa,
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>  							   &pfn, &fault_ipa);
> +
> +	/*
> +	 * The otherwise redundant test for system_supports_mte() allows the
> +	 * code to be compiled out when CONFIG_ARM64_MTE is not present.
> +	 */
> +	if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) 
> {
> +		/*
> +		 * VM will be able to see the page's tags, so we must ensure
> +		 * they have been initialised.
> +		 */
> +		struct page *page = pfn_to_page(pfn);
> +		long i, nr_pages = compound_nr(page);
> +
> +		/* if PG_mte_tagged is set, tags have already been initialised */
> +		for (i = 0; i < nr_pages; i++, page++) {
> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> +				mte_clear_page_tags(page_address(page));
> +		}
> +	}

What are the visibility requirements for the tags, especially if the
guest has its MMU off? Is there any cache management that needs to
occur?

Another thing is device-like memory that is managed by userspace,
such as the QEMU emulated flash, for which there also might be tags.
How is that dealt with? In general, what are the expectations for
tags on memory shared between host and guest? Who owns them?

> +
>  	if (writable) {
>  		prot |= KVM_PGTABLE_PROT_W;
>  		kvm_set_pfn_dirty(pfn);
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 430e36e1a13d..35a3dc448231 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu 
> *vcpu,
>  		    arm64_get_spectre_v2_state() == SPECTRE_UNAFFECTED)
>  			val |= (1UL << ID_AA64PFR0_CSV2_SHIFT);
>  	} else if (id == SYS_ID_AA64PFR1_EL1) {
> -		val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
> +		if (!vcpu->kvm->arch.mte_enabled)
> +			val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
>  	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
>  		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
>  			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
> @@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu
> *vcpu, struct sys_reg_params *p,
>  static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
>  				   const struct sys_reg_desc *rd)
>  {
> +	if (vcpu->kvm->arch.mte_enabled)
> +		return 0;
> +
>  	return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
>  }
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index ca41220b40b8..3e6fb5b580a9 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_X86_USER_SPACE_MSR 188
>  #define KVM_CAP_X86_MSR_FILTER 189
>  #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
> +#define KVM_CAP_ARM_MTE 191
> 
>  #ifdef KVM_CAP_IRQ_ROUTING

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v4 1/2] arm64: kvm: Save/restore MTE registers
  2020-11-17 19:20   ` Marc Zyngier
@ 2020-11-18 16:01     ` Steven Price
  2020-11-18 17:02       ` Catalin Marinas
From: Steven Price @ 2020-11-18 16:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Catalin Marinas, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, James Morse, linux-arm-kernel, Thomas Gleixner,
	Will Deacon, kvmarm, Julien Thierry

On 17/11/2020 19:20, Marc Zyngier wrote:
> Hi Steven,

Hi Marc

> These patches unfortunately don't apply to -rc4 anymore, as we repainted
> quite a bit while working on fixes. I'd be grateful if you could rebase 
> them.

No problem - the changes look relatively minor.

> 
> A few other things though:
> 
> On 2020-10-26 15:57, Steven Price wrote:
>> Define the new system registers that MTE introduces and context switch
>> them. The MTE feature is still hidden from the ID register as it isn't
>> supported in a VM yet.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>> ---
>>  arch/arm64/include/asm/kvm_host.h          |  4 ++++
>>  arch/arm64/include/asm/sysreg.h            |  3 ++-
>>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++
>>  arch/arm64/kvm/sys_regs.c                  | 14 ++++++++++----
>>  4 files changed, 30 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h
>> b/arch/arm64/include/asm/kvm_host.h
>> index 0aecbab6a7fb..95ab7345dcc8 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -134,6 +134,8 @@ enum vcpu_sysreg {
>>      SCTLR_EL1,    /* System Control Register */
>>      ACTLR_EL1,    /* Auxiliary Control Register */
>>      CPACR_EL1,    /* Coprocessor Access Control */
>> +    RGSR_EL1,    /* Random Allocation Tag Seed Register */
>> +    GCR_EL1,    /* Tag Control Register */
>>      ZCR_EL1,    /* SVE Control */
>>      TTBR0_EL1,    /* Translation Table Base Register 0 */
>>      TTBR1_EL1,    /* Translation Table Base Register 1 */
>> @@ -150,6 +152,8 @@ enum vcpu_sysreg {
>>      TPIDR_EL1,    /* Thread ID, Privileged */
>>      AMAIR_EL1,    /* Aux Memory Attribute Indirection Register */
>>      CNTKCTL_EL1,    /* Timer Control Register (EL1) */
>> +    TFSRE0_EL1,    /* Tag Fault Status Register (EL0) */
>> +    TFSR_EL1,    /* Tag Fault Status Register (EL1) */
>>      PAR_EL1,    /* Physical Address Register */
>>      MDSCR_EL1,    /* Monitor Debug System Control Register */
>>      MDCCINT_EL1,    /* Monitor Debug Comms Channel Interrupt Enable 
>> Reg */
>> diff --git a/arch/arm64/include/asm/sysreg.h 
>> b/arch/arm64/include/asm/sysreg.h
>> index d52c1b3ce589..7727df0bc09d 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -565,7 +565,8 @@
>>  #define SCTLR_ELx_M    (BIT(0))
>>
>>  #define SCTLR_ELx_FLAGS    (SCTLR_ELx_M  | SCTLR_ELx_A | SCTLR_ELx_C | \
>> -             SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB)
>> +             SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB | \
>> +             SCTLR_ELx_ITFSB)
>>
>>  /* SCTLR_EL2 specific flags. */
>>  #define SCTLR_EL2_RES1    ((BIT(4))  | (BIT(5))  | (BIT(11)) | 
>> (BIT(16)) | \
>> diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>> b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>> index 7a986030145f..a124ffa49ba3 100644
>> --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>> +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>> @@ -18,6 +18,11 @@
>>  static inline void __sysreg_save_common_state(struct kvm_cpu_context 
>> *ctxt)
>>  {
>>      ctxt_sys_reg(ctxt, MDSCR_EL1)    = read_sysreg(mdscr_el1);
>> +    if (system_supports_mte()) {
>> +        ctxt_sys_reg(ctxt, RGSR_EL1)    = read_sysreg_s(SYS_RGSR_EL1);
>> +        ctxt_sys_reg(ctxt, GCR_EL1)    = read_sysreg_s(SYS_GCR_EL1);
>> +        ctxt_sys_reg(ctxt, TFSRE0_EL1)    = 
>> read_sysreg_s(SYS_TFSRE0_EL1);
> 
> As far as I can tell, HCR_EL2.ATA is still clear when running a guest.
> So why do we save/restore this state yet?

At this stage it is indeed not necessary. Clearly it's needed after the 
second patch because ATA is enabled for the guest. This is just an 
artifact of doing this as two patches. The first patch adds all the 
save/restore logic, the second the machinery for enabling ATA safely. 
If you've got any suggestions about how to better split it (or indeed if 
you'd prefer the patches squashed) let me know. The only alternative I 
can think of is three patches: the 'mte_enabled' machinery (but without 
a way of enabling it), this patch, followed by a way of turning 
mte_enabled on. But that doesn't seem an improvement to anything other 
than my patch count ;)

> 
> Also, I wonder whether we should keep these in the C code. If one day
> we enable MTE in the kernel, we will have to move them to the assembly
> part, much like we do for PAuth. And I fear that "one day" is pretty
> soon:
> 
> https://lore.kernel.org/linux-arm-kernel/cover.1605046192.git.andreyknvl@google.com/ 

Good point. For MTE, though, we do have the option of setting TCO in 
PSTATE, so this could remain in C if we're not bothered about the 'gap' 
in KASAN coverage. I haven't yet got my head around how (or indeed if) 
that series handles guests.
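
Something along these lines, perhaps (untested sketch; the helper names
are placeholders, and I'm assuming SET_PSTATE_TCO() from sysreg.h can be
used in this context):

static inline void __tag_checks_suspend(void)
{
	/* PSTATE.TCO masks tag check faults on subsequent accesses */
	asm volatile(SET_PSTATE_TCO(1));
}

static inline void __tag_checks_resume(void)
{
	asm volatile(SET_PSTATE_TCO(0));
}

wrapped around the save/restore functions - with the obvious downside
that KASAN would be blind to anything that happens in between.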

> 
> 
> 
>> +    }
>>  }
>>
>>  static inline void __sysreg_save_user_state(struct kvm_cpu_context 
>> *ctxt)
>> @@ -45,6 +50,8 @@ static inline void __sysreg_save_el1_state(struct
>> kvm_cpu_context *ctxt)
>>      ctxt_sys_reg(ctxt, CNTKCTL_EL1)    = read_sysreg_el1(SYS_CNTKCTL);
>>      ctxt_sys_reg(ctxt, PAR_EL1)    = read_sysreg(par_el1);
>>      ctxt_sys_reg(ctxt, TPIDR_EL1)    = read_sysreg(tpidr_el1);
>> +    if (system_supports_mte())
>> +        ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR);
>>
>>      ctxt_sys_reg(ctxt, SP_EL1)    = read_sysreg(sp_el1);
>>      ctxt_sys_reg(ctxt, ELR_EL1)    = read_sysreg_el1(SYS_ELR);
>> @@ -63,6 +70,11 @@ static inline void
>> __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
>>  static inline void __sysreg_restore_common_state(struct 
>> kvm_cpu_context *ctxt)
>>  {
>>      write_sysreg(ctxt_sys_reg(ctxt, MDSCR_EL1),  mdscr_el1);
>> +    if (system_supports_mte()) {
>> +        write_sysreg_s(ctxt_sys_reg(ctxt, RGSR_EL1), SYS_RGSR_EL1);
>> +        write_sysreg_s(ctxt_sys_reg(ctxt, GCR_EL1), SYS_GCR_EL1);
>> +        write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
>> +    }
>>  }
>>
>>  static inline void __sysreg_restore_user_state(struct kvm_cpu_context 
>> *ctxt)
>> @@ -106,6 +118,8 @@ static inline void
>> __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
>>      write_sysreg_el1(ctxt_sys_reg(ctxt, CNTKCTL_EL1), SYS_CNTKCTL);
>>      write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),    par_el1);
>>      write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),    tpidr_el1);
>> +    if (system_supports_mte())
>> +        write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
>>
>>      if (!has_vhe() &&
>>          cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index d9117bc56237..430e36e1a13d 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -1391,6 +1391,12 @@ static bool access_mte_regs(struct kvm_vcpu
>> *vcpu, struct sys_reg_params *p,
>>      return false;
>>  }
>>
>> +static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
>> +                   const struct sys_reg_desc *rd)
>> +{
>> +    return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
> 
> The handling of visibility has changed somehow since 01fe5ace92dd.

Thanks for the pointer!

Steve

>> +}
>> +
>>  /* sys_reg_desc initialiser for known cpufeature ID registers */
>>  #define ID_SANITISED(name) {            \
>>      SYS_DESC(SYS_##name),            \
>> @@ -1557,8 +1563,8 @@ static const struct sys_reg_desc sys_reg_descs[] 
>> = {
>>      { SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
>>      { SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
>>
>> -    { SYS_DESC(SYS_RGSR_EL1), access_mte_regs },
>> -    { SYS_DESC(SYS_GCR_EL1), access_mte_regs },
>> +    { SYS_DESC(SYS_RGSR_EL1), access_mte_regs, reset_unknown, RGSR_EL1,
>> .visibility = mte_visibility },
>> +    { SYS_DESC(SYS_GCR_EL1), access_mte_regs, reset_unknown, GCR_EL1,
>> .visibility = mte_visibility },
>>
>>      { SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility =
>> sve_visibility },
>>      { SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, 
>> TTBR0_EL1 },
>> @@ -1584,8 +1590,8 @@ static const struct sys_reg_desc sys_reg_descs[] 
>> = {
>>      { SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi },
>>      { SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi },
>>
>> -    { SYS_DESC(SYS_TFSR_EL1), access_mte_regs },
>> -    { SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs },
>> +    { SYS_DESC(SYS_TFSR_EL1), access_mte_regs, reset_unknown, TFSR_EL1,
>> .visibility = mte_visibility },
>> +    { SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs, reset_unknown,
>> TFSRE0_EL1, .visibility = mte_visibility },
>>
>>      { SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
>>      { SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
> 
> Thanks,
> 
>          M.



* Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-11-17 16:07   ` Catalin Marinas
@ 2020-11-18 16:01     ` Steven Price
  2020-11-18 16:50       ` Catalin Marinas
  2020-11-25 18:13       ` James Morse
From: Steven Price @ 2020-11-18 16:01 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Marc Zyngier, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, James Morse, linux-arm-kernel, Thomas Gleixner,
	Will Deacon, kvmarm, Julien Thierry

On 17/11/2020 16:07, Catalin Marinas wrote:
> Hi Steven,
> 
> On Mon, Oct 26, 2020 at 03:57:27PM +0000, Steven Price wrote:
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 19aacc7d64de..38fe25310ca1 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>   		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>   							   &pfn, &fault_ipa);
>> +
>> +	/*
>> +	 * The otherwise redundant test for system_supports_mte() allows the
>> +	 * code to be compiled out when CONFIG_ARM64_MTE is not present.
>> +	 */
>> +	if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
>> +		/*
>> +		 * VM will be able to see the page's tags, so we must ensure
>> +		 * they have been initialised.
>> +		 */
>> +		struct page *page = pfn_to_page(pfn);
>> +		long i, nr_pages = compound_nr(page);
>> +
>> +		/* if PG_mte_tagged is set, tags have already been initialised */
>> +		for (i = 0; i < nr_pages; i++, page++) {
>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>> +				mte_clear_page_tags(page_address(page));
>> +		}
>> +	}
> 
> If this page was swapped out and mapped back in, where does the
> restoring from swap happen?

Restoring from swap happens above this, in the call to gfn_to_pfn_prot().

> I may have asked in the past, is user_mem_abort() the only path for
> mapping Normal pages into stage 2?
> 

That is my understanding (and yes, you asked before) and no one has 
corrected me! ;)

Steve


* Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-11-17 19:35   ` Marc Zyngier
@ 2020-11-18 16:01     ` Steven Price
From: Steven Price @ 2020-11-18 16:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Catalin Marinas, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, James Morse, linux-arm-kernel, Thomas Gleixner,
	Will Deacon, kvmarm, Julien Thierry

On 17/11/2020 19:35, Marc Zyngier wrote:
> Hi Steven,
> 
> On 2020-10-26 15:57, Steven Price wrote:
>> Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging
>> for a VM. This exposes the feature to the guest and automatically marks
>> memory pages touched by the VM as PG_mte_tagged (and clears the tag
>> storage) to ensure that the guest cannot see stale tags, and so that the
>> tags are correctly saved/restored across swap.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> Reviewed-by: Andrew Jones <drjones@redhat.com>
>> ---
>>  arch/arm64/include/asm/kvm_emulate.h |  3 +++
>>  arch/arm64/include/asm/kvm_host.h    |  3 +++
>>  arch/arm64/kvm/arm.c                 |  9 +++++++++
>>  arch/arm64/kvm/mmu.c                 | 20 ++++++++++++++++++++
>>  arch/arm64/kvm/sys_regs.c            |  6 +++++-
>>  include/uapi/linux/kvm.h             |  1 +
>>  6 files changed, 41 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index 5ef2669ccd6c..66c0d9e7c2b4 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu 
>> *vcpu)
>>      if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
>>          vcpu_el1_is_32bit(vcpu))
>>          vcpu->arch.hcr_el2 |= HCR_TID2;
>> +
>> +    if (vcpu->kvm->arch.mte_enabled)
> 
> Please add a predicate (vcpu_has_mte() or kvm_has_mte()?) for this.

Sure

>> +        vcpu->arch.hcr_el2 |= HCR_ATA;
>>  }
>>
>>  static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
>> diff --git a/arch/arm64/include/asm/kvm_host.h
>> b/arch/arm64/include/asm/kvm_host.h
>> index 95ab7345dcc8..cd993aec0440 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -118,6 +118,9 @@ struct kvm_arch {
>>       */
>>      unsigned long *pmu_filter;
>>      unsigned int pmuver;
>> +
>> +    /* Memory Tagging Extension enabled for the guest */
>> +    bool mte_enabled;
>>  };
>>
>>  struct kvm_vcpu_fault_info {
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index f56122eedffc..7ee93bcac017 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -89,6 +89,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>          r = 0;
>>          kvm->arch.return_nisv_io_abort_to_user = true;
>>          break;
>> +    case KVM_CAP_ARM_MTE:
>> +        if (!system_supports_mte() || kvm->created_vcpus)
>> +            return -EINVAL;
> 
> You also want to avoid 32bit guests. Also, what is the rationale for

Interesting point. However, if I understand correctly, the 32 bit flag is 
a per-VCPU flag, and at this point kvm->created_vcpus==0, so I don't 
believe we actually know whether the guest is 32 bit or not at the point 
of this test. And since this is a per-VM flag it can actually make sense 
for a heterogeneous VM, if anyone is crazy enough to want such a thing.

> this being a VM capability as opposed to a CPU feature, similar
> to SVE and PMU?

v1/v2 actually had it as a CPU feature. However you need a per-VM flag 
to enforce the use of tagged memory (the code in user_mem_abort() below).

>> +        r = 0;
>> +        kvm->arch.mte_enabled = true;
>> +        break;
>>      default:
>>          r = -EINVAL;
>>          break;
>> @@ -210,6 +216,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
>> long ext)
>>           */
>>          r = 1;
>>          break;
>> +    case KVM_CAP_ARM_MTE:
>> +        r = system_supports_mte();
> 
> Same comment about 32bit.

As above, we don't know if we're launching a 32 bit guest or not.

>> +        break;
>>      case KVM_CAP_STEAL_TIME:
>>          r = kvm_arm_pvtime_supported();
>>          break;
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 19aacc7d64de..38fe25310ca1 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
>> phys_addr_t fault_ipa,
>>      if (vma_pagesize == PAGE_SIZE && !force_pte)
>>          vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>                                 &pfn, &fault_ipa);
>> +
>> +    /*
>> +     * The otherwise redundant test for system_supports_mte() allows the
>> +     * code to be compiled out when CONFIG_ARM64_MTE is not present.
>> +     */
>> +    if (system_supports_mte() && kvm->arch.mte_enabled && 
>> pfn_valid(pfn)) {
>> +        /*
>> +         * VM will be able to see the page's tags, so we must ensure
>> +         * they have been initialised.
>> +         */
>> +        struct page *page = pfn_to_page(pfn);
>> +        long i, nr_pages = compound_nr(page);
>> +
>> +        /* if PG_mte_tagged is set, tags have already been 
>> initialised */
>> +        for (i = 0; i < nr_pages; i++, page++) {
>> +            if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>> +                mte_clear_page_tags(page_address(page));
>> +        }
>> +    }
> 
> What are the visibility requirements for the tags, especially if the
> guest has its MMU off? Is there any cache management that needs to
> occur?

If the guest has its MMU off then the memory would be treated as Untagged 
by the architecture (the stage 1 page table must provide the 'tagged' 
flag). Architecturally the tag bits are handled the same as the data 
bits, so no extra/different cache management is required. The only exception in 
the architecture is that the tag values are optionally exposed in the 
normal PA space (in a potentially non-coherent way) - but if that is the 
case that PA space shouldn't be touched by Linux.

> Another thing is device-like memory that is managed by userspace,
> such as the QEMU emulated flash, for which there also might be tags.
> How is that dealt with? In general, what are the expectations for
> tags on memory shared between host and guest? Who owns them?

Actual device-like memory shouldn't be expected to have tags - they 
wouldn't have tags on a real host.

In terms of memory shared between host and guest - the tags are also 
shared, effectively tags are just data. Clearly the host and guest need 
to decide how to share the tag space. In general I would expect the tags 
to be ignored (and tag checking to be disabled) in those shared regions. 
Sadly the architecture doesn't provide a method to prevent the guest 
accessing tags in a region (without also crippling cacheability).

This is one of the areas that is potentially tricky for migration: the 
VMM may want to use MTE but must disable tag checking while touching the 
guest's memory, because the guest may be using the tag memory for its 
own purposes. There is some discussion about this in the cover letter 
of v2:

https://lore.kernel.org/kvmarm/20200904160018.29481-1-steven.price@arm.com/
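
(For example, the VMM can already switch off tag check faults for the
current thread around such accesses with the existing prctl() interface
- purely an illustration, not something this series adds:

	prctl(PR_SET_TAGGED_ADDR_CTRL,
	      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_NONE, 0, 0, 0);

being per-thread state, this needs some care in a multi-threaded VMM.)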
Thanks,

Steve

>> +
>>      if (writable) {
>>          prot |= KVM_PGTABLE_PROT_W;
>>          kvm_set_pfn_dirty(pfn);
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 430e36e1a13d..35a3dc448231 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
>>              arm64_get_spectre_v2_state() == SPECTRE_UNAFFECTED)
>>              val |= (1UL << ID_AA64PFR0_CSV2_SHIFT);
>>      } else if (id == SYS_ID_AA64PFR1_EL1) {
>> -        val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
>> +        if (!vcpu->kvm->arch.mte_enabled)
>> +            val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
>>      } else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
>>          val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
>>               (0xfUL << ID_AA64ISAR1_API_SHIFT) |
>> @@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu
>> *vcpu, struct sys_reg_params *p,
>>  static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
>>                     const struct sys_reg_desc *rd)
>>  {
>> +    if (vcpu->kvm->arch.mte_enabled)
>> +        return 0;
>> +
>>      return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
>>  }
>>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index ca41220b40b8..3e6fb5b580a9 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_X86_USER_SPACE_MSR 188
>>  #define KVM_CAP_X86_MSR_FILTER 189
>>  #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
>> +#define KVM_CAP_ARM_MTE 191
>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
> 
> Thanks,
> 
>          M.



* Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-11-18 16:01     ` Steven Price
@ 2020-11-18 16:50       ` Catalin Marinas
  2020-11-18 17:05         ` Andrew Jones
  2020-11-25 18:13       ` James Morse
From: Catalin Marinas @ 2020-11-18 16:50 UTC (permalink / raw)
  To: Steven Price
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Marc Zyngier, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, James Morse, linux-arm-kernel, Thomas Gleixner,
	Will Deacon, kvmarm, Julien Thierry

On Wed, Nov 18, 2020 at 04:01:20PM +0000, Steven Price wrote:
> On 17/11/2020 16:07, Catalin Marinas wrote:
> > On Mon, Oct 26, 2020 at 03:57:27PM +0000, Steven Price wrote:
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 19aacc7d64de..38fe25310ca1 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > >   	if (vma_pagesize == PAGE_SIZE && !force_pte)
> > >   		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> > >   							   &pfn, &fault_ipa);
> > > +
> > > +	/*
> > > +	 * The otherwise redundant test for system_supports_mte() allows the
> > > +	 * code to be compiled out when CONFIG_ARM64_MTE is not present.
> > > +	 */
> > > +	if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
> > > +		/*
> > > +		 * VM will be able to see the page's tags, so we must ensure
> > > +		 * they have been initialised.
> > > +		 */
> > > +		struct page *page = pfn_to_page(pfn);
> > > +		long i, nr_pages = compound_nr(page);
> > > +
> > > +		/* if PG_mte_tagged is set, tags have already been initialised */
> > > +		for (i = 0; i < nr_pages; i++, page++) {
> > > +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> > > +				mte_clear_page_tags(page_address(page));
> > > +		}
> > > +	}
> > 
> > If this page was swapped out and mapped back in, where does the
> > restoring from swap happen?
> 
> Restoring from swap happens above this in the call to gfn_to_pfn_prot()

Looking at the call chain, gfn_to_pfn_prot() ends up with
get_user_pages() using the current->mm (the VMM) and that does a
set_pte_at(), presumably restoring the tags. Does this mean that all
memory mapped by the VMM in user space should have PROT_MTE set?
Otherwise we don't take the mte_sync_tags() path in set_pte_at() and no
tags are restored from swap (we do save them, since PG_mte_tagged was set
when they were mapped).

So I think the code above should be similar to mte_sync_tags(), even
calling a common function, but I'm not sure where to get the swap pte
from.

An alternative is to only enable HCR_EL2.ATA and MTE in the guest if the
VMM mapped the memory with PROT_MTE.

Yet another option is to always call mte_sync_tags() from set_pte_at()
and defer the pte_tagged() or is_swap_pte() checks to the MTE code.

-- 
Catalin


* Re: [PATCH v4 1/2] arm64: kvm: Save/restore MTE registers
  2020-11-18 16:01     ` Steven Price
@ 2020-11-18 17:02       ` Catalin Marinas
  2020-11-19 12:45         ` Steven Price
  0 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2020-11-18 17:02 UTC (permalink / raw)
  To: Steven Price
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Marc Zyngier, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, James Morse, linux-arm-kernel, Thomas Gleixner,
	Will Deacon, kvmarm, Julien Thierry

On Wed, Nov 18, 2020 at 04:01:18PM +0000, Steven Price wrote:
> On 17/11/2020 19:20, Marc Zyngier wrote:
> > On 2020-10-26 15:57, Steven Price wrote:
> > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > > index d52c1b3ce589..7727df0bc09d 100644
> > > --- a/arch/arm64/include/asm/sysreg.h
> > > +++ b/arch/arm64/include/asm/sysreg.h
> > > @@ -565,7 +565,8 @@
> > >  #define SCTLR_ELx_M    (BIT(0))
> > >
> > >  #define SCTLR_ELx_FLAGS    (SCTLR_ELx_M  | SCTLR_ELx_A | SCTLR_ELx_C | \
> > > -             SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB)
> > > +             SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB | \
> > > +             SCTLR_ELx_ITFSB)
> > >
> > >  /* SCTLR_EL2 specific flags. */
> > >  #define SCTLR_EL2_RES1    ((BIT(4))  | (BIT(5))  | (BIT(11)) | (BIT(16)) | \
> > > diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> > > index 7a986030145f..a124ffa49ba3 100644
> > > --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> > > +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> > > @@ -18,6 +18,11 @@
> > >  static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
> > >  {
> > >      ctxt_sys_reg(ctxt, MDSCR_EL1)    = read_sysreg(mdscr_el1);
> > > +    if (system_supports_mte()) {
> > > +        ctxt_sys_reg(ctxt, RGSR_EL1)    = read_sysreg_s(SYS_RGSR_EL1);
> > > +        ctxt_sys_reg(ctxt, GCR_EL1)    = read_sysreg_s(SYS_GCR_EL1);
> > > +        ctxt_sys_reg(ctxt, TFSRE0_EL1)    = read_sysreg_s(SYS_TFSRE0_EL1);
> > 
> > As far as I can tell, HCR_EL2.ATA is still clear when running a guest.
> > So why do we save/restore this state yet?
> > 
> > Also, I wonder whether we should keep these in the C code. If one day
> > we enable MTE in the kernel, we will have to move them to the assembly
> > part, much like we do for PAuth. And I fear that "one day" is pretty
> > soon:
> > 
> > https://lore.kernel.org/linux-arm-kernel/cover.1605046192.git.andreyknvl@google.com/
> 
> Good point. Although for MTE we do have the option of setting TCO in PSTATE
> so this could remain in C if we're not bothered about the 'gap' in KASAN
> coverage. I haven't yet got my head around how (or indeed if) that series
> handles guests.

I think we should be fine with the currently proposed in-kernel MTE
support. However, setting GCR_EL1 can get in the way if stack tagging is
ever enabled (it breaks the single kernel image). The compiler uses
GCR_EL1 to generate different colours for variables on the stack, and
changing it in the middle of a function may cause confusion. You'd have
to set PSTATE.TCO for the whole function, either from the caller or, if
the compiler gets smarter, via some function attribute.

-- 
Catalin


* Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-11-18 16:50       ` Catalin Marinas
@ 2020-11-18 17:05         ` Andrew Jones
  2020-11-19 12:45           ` Steven Price
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Jones @ 2020-11-18 17:05 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Peter Maydell, Haibo Xu, Suzuki K Poulose,
	qemu-devel, Marc Zyngier, Juan Quintela, Richard Henderson,
	Dr. David Alan Gilbert, Steven Price, James Morse,
	linux-arm-kernel, kvmarm, Thomas Gleixner, Julien Thierry,
	Will Deacon, Dave Martin, linux-kernel

On Wed, Nov 18, 2020 at 04:50:01PM +0000, Catalin Marinas wrote:
> On Wed, Nov 18, 2020 at 04:01:20PM +0000, Steven Price wrote:
> > On 17/11/2020 16:07, Catalin Marinas wrote:
> > > On Mon, Oct 26, 2020 at 03:57:27PM +0000, Steven Price wrote:
> > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > index 19aacc7d64de..38fe25310ca1 100644
> > > > --- a/arch/arm64/kvm/mmu.c
> > > > +++ b/arch/arm64/kvm/mmu.c
> > > > @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > >   	if (vma_pagesize == PAGE_SIZE && !force_pte)
> > > >   		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> > > >   							   &pfn, &fault_ipa);
> > > > +
> > > > +	/*
> > > > +	 * The otherwise redundant test for system_supports_mte() allows the
> > > > +	 * code to be compiled out when CONFIG_ARM64_MTE is not present.
> > > > +	 */
> > > > +	if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
> > > > +		/*
> > > > +		 * VM will be able to see the page's tags, so we must ensure
> > > > +		 * they have been initialised.
> > > > +		 */
> > > > +		struct page *page = pfn_to_page(pfn);
> > > > +		long i, nr_pages = compound_nr(page);
> > > > +
> > > > +		/* if PG_mte_tagged is set, tags have already been initialised */
> > > > +		for (i = 0; i < nr_pages; i++, page++) {
> > > > +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> > > > +				mte_clear_page_tags(page_address(page));
> > > > +		}
> > > > +	}
> > > 
> > > If this page was swapped out and mapped back in, where does the
> > > restoring from swap happen?
> > 
> > Restoring from swap happens above this in the call to gfn_to_pfn_prot()
> 
> Looking at the call chain, gfn_to_pfn_prot() ends up with
> get_user_pages() using the current->mm (the VMM) and that does a
> set_pte_at(), presumably restoring the tags. Does this mean that all
> memory mapped by the VMM in user space should have PROT_MTE set?
> Otherwise we don't take the mte_sync_tags() path in set_pte_at() and no
> tags are restored from swap (we do save them, since PG_mte_tagged was set
> when they were mapped).
> 
> So I think the code above should be similar to mte_sync_tags(), even
> calling a common function, but I'm not sure where to get the swap pte
> from.
> 
> An alternative is to only enable HCR_EL2.ATA and MTE in the guest if the
> VMM mapped the memory with PROT_MTE.

This is a very reasonable alternative. The VMM must be aware of whether
the guest may use MTE anyway. Asking it to map the memory with PROT_MTE
when it wants to offer the guest that option is a reasonable requirement.
If the memory is not mapped as such, then the host kernel shouldn't assume
MTE may be used by the guest, and it should even enforce that it is not
(by not enabling the feature).
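
As a rough sketch of what that requirement means on the VMM side, assuming
guest RAM is ordinary anonymous memory (PROT_MTE is the arm64 mmap/mprotect
flag from the existing user-space MTE support; the fallback define is for
older headers):

/* Sketch only: allocate guest RAM so the kernel tracks (and swaps) its tags */
#include <sys/mman.h>

#ifndef PROT_MTE
#define PROT_MTE	0x20	/* arm64-specific */
#endif

static void *alloc_guest_ram(size_t size)
{
	return mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_MTE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}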

Thanks,
drew

> 
> Yet another option is to always call mte_sync_tags() from set_pte_at()
> and defer the pte_tagged() or is_swap_pte() checks to the MTE code.
> 
> -- 
> Catalin
> 



* Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-11-18 17:05         ` Andrew Jones
@ 2020-11-19 12:45           ` Steven Price
  2020-11-19 16:24             ` Catalin Marinas
  0 siblings, 1 reply; 17+ messages in thread
From: Steven Price @ 2020-11-19 12:45 UTC (permalink / raw)
  To: Andrew Jones, Catalin Marinas
  Cc: Mark Rutland, Peter Maydell, Haibo Xu, Suzuki K Poulose,
	Marc Zyngier, Juan Quintela, Richard Henderson, qemu-devel,
	Dr. David Alan Gilbert, James Morse, linux-arm-kernel, kvmarm,
	Thomas Gleixner, Julien Thierry, Will Deacon, Dave Martin,
	linux-kernel

On 18/11/2020 17:05, Andrew Jones wrote:
> On Wed, Nov 18, 2020 at 04:50:01PM +0000, Catalin Marinas wrote:
>> On Wed, Nov 18, 2020 at 04:01:20PM +0000, Steven Price wrote:
>>> On 17/11/2020 16:07, Catalin Marinas wrote:
>>>> On Mon, Oct 26, 2020 at 03:57:27PM +0000, Steven Price wrote:
>>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>>> index 19aacc7d64de..38fe25310ca1 100644
>>>>> --- a/arch/arm64/kvm/mmu.c
>>>>> +++ b/arch/arm64/kvm/mmu.c
>>>>> @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>    	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>>>>    		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>>>    							   &pfn, &fault_ipa);
>>>>> +
>>>>> +	/*
>>>>> +	 * The otherwise redundant test for system_supports_mte() allows the
>>>>> +	 * code to be compiled out when CONFIG_ARM64_MTE is not present.
>>>>> +	 */
>>>>> +	if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
>>>>> +		/*
>>>>> +		 * VM will be able to see the page's tags, so we must ensure
>>>>> +		 * they have been initialised.
>>>>> +		 */
>>>>> +		struct page *page = pfn_to_page(pfn);
>>>>> +		long i, nr_pages = compound_nr(page);
>>>>> +
>>>>> +		/* if PG_mte_tagged is set, tags have already been initialised */
>>>>> +		for (i = 0; i < nr_pages; i++, page++) {
>>>>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>>>>> +				mte_clear_page_tags(page_address(page));
>>>>> +		}
>>>>> +	}
>>>>
>>>> If this page was swapped out and mapped back in, where does the
>>>> restoring from swap happen?
>>>
>>> Restoring from swap happens above this in the call to gfn_to_pfn_prot()
>>
>> Looking at the call chain, gfn_to_pfn_prot() ends up with
>> get_user_pages() using the current->mm (the VMM) and that does a
>> set_pte_at(), presumably restoring the tags. Does this mean that all
>> memory mapped by the VMM in user space should have PROT_MTE set?
>> Otherwise we don't take the mte_sync_tags() path in set_pte_at() and no
>> tags are restored from swap (we do save them, since PG_mte_tagged was set
>> when they were mapped).
>>
>> So I think the code above should be similar to mte_sync_tags(), even
>> calling a common function, but I'm not sure where to get the swap pte
>> from.

You're right - the code is broken as it stands. I've just been able to 
reproduce the loss of tags due to swap.
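
For illustration, a user-space reproduction along these lines can be sketched
as follows. STG/LDG and MADV_PAGEOUT are real interfaces, but this program is
an assumed stand-in for the actual test; it needs an MTE-capable CPU and
kernel, active swap, and an MTE-aware toolchain (e.g. built with
-march=armv8.5-a+memtag):

/*
 * Assumed repro sketch (not the actual test): colour a PROT_MTE page,
 * push it to swap with MADV_PAGEOUT (Linux 5.4+), fault it back in and
 * read a tag back with LDG. With the bug, the tag comes back as 0 instead
 * of 5. Tag checking is left disabled, so untagged accesses don't fault.
 */
#include <stdio.h>
#include <sys/mman.h>

#ifndef PROT_MTE
#define PROT_MTE	0x20
#endif

int main(void)
{
	unsigned char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_MTE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	unsigned long tagged = (unsigned long)p | (5UL << 56);
	unsigned long back = (unsigned long)p;
	int i;

	for (i = 0; i < 4096; i += 16)	/* tag each 16-byte granule */
		asm volatile("stg %0, [%0]" : : "r"(tagged + i) : "memory");

	madvise(p, 4096, MADV_PAGEOUT);	/* reclaim the page to swap */
	p[0] = 1;			/* fault it back in */

	/* LDG merges the in-memory allocation tag into bits 59:56 of 'back' */
	asm volatile("ldg %0, [%1]" : "+r"(back) : "r"(p));
	printf("tag after swap-in: %lu (expect 5)\n", (back >> 56) & 0xf);
	return 0;
}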

The problem is that we also don't have a suitable pte from which to
restore the tags from swap. So either set_pte_at() would have to
unconditionally check for MTE tags for all previous swap entries, as you
suggest below. I had a quick go at testing this and hit issues with the
idle task getting killed during boot - I fear there are some fun issues
regarding initialisation order here.

Or we enforce PROT_MTE...

>> An alternative is to only enable HCR_EL2.ATA and MTE in the guest if the
>> VMM mapped the memory with PROT_MTE.
> 
> This is a very reasonable alternative. The VMM must be aware of whether
> the guest may use MTE anyway. Asking it to map the memory with PROT_MTE
> when it wants to offer the guest that option is a reasonable requirement.
> If the memory is not mapped as such, then the host kernel shouldn't assume
> MTE may be used by the guest, and it should even enforce that it is not
> (by not enabling the feature).

The main issue with this is that the VMM can change the mappings while
the guest is running, so the only place we can reliably check this is
during user_mem_abort(). That means we can't just downgrade HCR_EL2.ATA.
It also makes the error reporting not so great, as the offending memory
access simply faults. However, I do have this working, and it's actually
(slightly) less code.

Another drawback is that the VMM needs to be more careful with the tags 
- e.g. for virtualised devices the VMM can't simply have a non-PROT_MTE 
mapping and ignore what the guest is doing with tags.

Steve


* Re: [PATCH v4 1/2] arm64: kvm: Save/restore MTE registers
  2020-11-18 17:02       ` Catalin Marinas
@ 2020-11-19 12:45         ` Steven Price
  0 siblings, 0 replies; 17+ messages in thread
From: Steven Price @ 2020-11-19 12:45 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Marc Zyngier, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, James Morse, linux-arm-kernel, Thomas Gleixner,
	Will Deacon, kvmarm, Julien Thierry

On 18/11/2020 17:02, Catalin Marinas wrote:
> On Wed, Nov 18, 2020 at 04:01:18PM +0000, Steven Price wrote:
>> On 17/11/2020 19:20, Marc Zyngier wrote:
>>> On 2020-10-26 15:57, Steven Price wrote:
>>>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>>>> index d52c1b3ce589..7727df0bc09d 100644
>>>> --- a/arch/arm64/include/asm/sysreg.h
>>>> +++ b/arch/arm64/include/asm/sysreg.h
>>>> @@ -565,7 +565,8 @@
>>>>  #define SCTLR_ELx_M    (BIT(0))
>>>>
>>>>  #define SCTLR_ELx_FLAGS    (SCTLR_ELx_M  | SCTLR_ELx_A | SCTLR_ELx_C | \
>>>> -             SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB)
>>>> +             SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB | \
>>>> +             SCTLR_ELx_ITFSB)
>>>>
>>>>  /* SCTLR_EL2 specific flags. */
>>>>  #define SCTLR_EL2_RES1    ((BIT(4))  | (BIT(5))  | (BIT(11)) | (BIT(16)) | \
>>>> diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>>>> index 7a986030145f..a124ffa49ba3 100644
>>>> --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>>>> +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>>>> @@ -18,6 +18,11 @@
>>>>  static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
>>>>  {
>>>>      ctxt_sys_reg(ctxt, MDSCR_EL1)    = read_sysreg(mdscr_el1);
>>>> +    if (system_supports_mte()) {
>>>> +        ctxt_sys_reg(ctxt, RGSR_EL1)    = read_sysreg_s(SYS_RGSR_EL1);
>>>> +        ctxt_sys_reg(ctxt, GCR_EL1)    = read_sysreg_s(SYS_GCR_EL1);
>>>> +        ctxt_sys_reg(ctxt, TFSRE0_EL1)    = read_sysreg_s(SYS_TFSRE0_EL1);
>>>
>>> As far as I can tell, HCR_EL2.ATA is still clear when running a guest.
>>> So why do we save/restore this state yet?
>>>
>>> Also, I wonder whether we should keep these in the C code. If one day
>>> we enable MTE in the kernel, we will have to move them to the assembly
>>> part, much like we do for PAuth. And I fear that "one day" is pretty
>>> soon:
>>>
>>> https://lore.kernel.org/linux-arm-kernel/cover.1605046192.git.andreyknvl@google.com/
>>
>> Good point. Although for MTE we do have the option of setting TCO in PSTATE
>> so this could remain in C if we're not bothered about the 'gap' in KASAN
>> coverage. I haven't yet got my head around how (or indeed if) that series
>> handles guests.
> 
> I think we should be fine with the currently proposed in-kernel MTE
> support. However, setting GCR_EL1 can get in the way if stack tagging is
> ever enabled (it breaks the single kernel image). The compiler uses
> GCR_EL1 to generate different colours for variables on the stack, and
> changing it in the middle of a function may cause confusion. You'd have
> to set PSTATE.TCO for the whole function, either from the caller or, if
> the compiler gets smarter, via some function attribute.
> 

If the compiler might start playing with TCO then this could also be an
issue for VMMs, which will (at least with the current design) need to use
TCO to safely access guest memory - especially if we enforce PROT_MTE
mappings for the VMM.

Steve
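
To make the TCO mechanism concrete, a hedged sketch (the MSR-immediate form
of writing PSTATE.TCO is architectural, but the helper names are invented
and an MTE-aware assembler is assumed):

/*
 * Sketch, not VMM code: suppress tag checks around an access to guest
 * memory whose tags the VMM doesn't know, by toggling PSTATE.TCO.
 */
static inline void tag_checks_off(void)
{
	asm volatile(".arch_extension memtag\n\tmsr tco, #1" : : : "memory");
}

static inline void tag_checks_on(void)
{
	asm volatile(".arch_extension memtag\n\tmsr tco, #0" : : : "memory");
}

static void vmm_read_guest(void *dst, const void *guest_src, unsigned long len)
{
	tag_checks_off();
	__builtin_memcpy(dst, guest_src, len);
	tag_checks_on();
}

A compiler that starts managing TCO on its own could silently undo the
off/on pairing above, which is exactly the concern raised here.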


* Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-11-19 12:45           ` Steven Price
@ 2020-11-19 16:24             ` Catalin Marinas
  2020-11-20  9:33               ` Steven Price
  0 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2020-11-19 16:24 UTC (permalink / raw)
  To: Steven Price
  Cc: Mark Rutland, Peter Maydell, Andrew Jones, Haibo Xu,
	Suzuki K Poulose, Marc Zyngier, Juan Quintela, Richard Henderson,
	Dr. David Alan Gilbert, qemu-devel, James Morse,
	linux-arm-kernel, kvmarm, Thomas Gleixner, Julien Thierry,
	Will Deacon, Dave Martin, linux-kernel

On Thu, Nov 19, 2020 at 12:45:52PM +0000, Steven Price wrote:
> On 18/11/2020 17:05, Andrew Jones wrote:
> > On Wed, Nov 18, 2020 at 04:50:01PM +0000, Catalin Marinas wrote:
> > > On Wed, Nov 18, 2020 at 04:01:20PM +0000, Steven Price wrote:
> > > > On 17/11/2020 16:07, Catalin Marinas wrote:
> > > > > On Mon, Oct 26, 2020 at 03:57:27PM +0000, Steven Price wrote:
> > > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > > > index 19aacc7d64de..38fe25310ca1 100644
> > > > > > --- a/arch/arm64/kvm/mmu.c
> > > > > > +++ b/arch/arm64/kvm/mmu.c
> > > > > > @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > > > >    	if (vma_pagesize == PAGE_SIZE && !force_pte)
> > > > > >    		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> > > > > >    							   &pfn, &fault_ipa);
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * The otherwise redundant test for system_supports_mte() allows the
> > > > > > +	 * code to be compiled out when CONFIG_ARM64_MTE is not present.
> > > > > > +	 */
> > > > > > +	if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
> > > > > > +		/*
> > > > > > +		 * VM will be able to see the page's tags, so we must ensure
> > > > > > +		 * they have been initialised.
> > > > > > +		 */
> > > > > > +		struct page *page = pfn_to_page(pfn);
> > > > > > +		long i, nr_pages = compound_nr(page);
> > > > > > +
> > > > > > +		/* if PG_mte_tagged is set, tags have already been initialised */
> > > > > > +		for (i = 0; i < nr_pages; i++, page++) {
> > > > > > +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> > > > > > +				mte_clear_page_tags(page_address(page));
> > > > > > +		}
> > > > > > +	}
> > > > > 
> > > > > If this page was swapped out and mapped back in, where does the
> > > > > restoring from swap happen?
> > > > 
> > > > Restoring from swap happens above this in the call to gfn_to_pfn_prot()
> > > 
> > > Looking at the call chain, gfn_to_pfn_prot() ends up with
> > > get_user_pages() using the current->mm (the VMM) and that does a
> > > set_pte_at(), presumably restoring the tags. Does this mean that all
> > > memory mapped by the VMM in user space should have PROT_MTE set?
> > > Otherwise we don't take the mte_sync_tags() path in set_pte_at() and no
> > > tags are restored from swap (we do save them, since PG_mte_tagged was set
> > > when they were mapped).
> > > 
> > > So I think the code above should be similar to mte_sync_tags(), even
> > > calling a common function, but I'm not sure where to get the swap pte
> > > from.
> 
> You're right - the code is broken as it stands. I've just been able to
> reproduce the loss of tags due to swap.
> 
> The problem is that we also don't have a suitable pte from which to
> restore the tags from swap. So either set_pte_at() would have to
> unconditionally check for MTE tags for all previous swap entries, as you
> suggest below. I had a quick go at testing this and hit issues with the
> idle task getting killed during boot - I fear there are some fun issues
> regarding initialisation order here.

My attempt here but not fully tested (just booted, no swap support):

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b35833259f08..27d7fd336a16 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -304,7 +304,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 		__sync_icache_dcache(pte);
 
 	if (system_supports_mte() &&
-	    pte_present(pte) && pte_tagged(pte) && !pte_special(pte))
+	    pte_present(pte) && pte_valid_user(pte) && !pte_special(pte))
 		mte_sync_tags(ptep, pte);
 
 	__check_racy_pte_update(mm, ptep, pte);
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 52a0638ed967..bbd6c56d33d9 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -20,18 +20,24 @@
 #include <asm/ptrace.h>
 #include <asm/sysreg.h>
 
-static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
+static void mte_sync_page_tags(struct page *page, pte_t *ptep, pte_t pte,
+			       bool check_swap)
 {
 	pte_t old_pte = READ_ONCE(*ptep);
 
 	if (check_swap && is_swap_pte(old_pte)) {
 		swp_entry_t entry = pte_to_swp_entry(old_pte);
 
-		if (!non_swap_entry(entry) && mte_restore_tags(entry, page))
+		if (!non_swap_entry(entry) && mte_restore_tags(entry, page)) {
+			set_bit(PG_mte_tagged, &page->flags);
 			return;
+		}
 	}
 
-	mte_clear_page_tags(page_address(page));
+	if (pte_tagged(pte)) {
+		mte_clear_page_tags(page_address(page));
+		set_bit(PG_mte_tagged, &page->flags);
+	}
 }
 
 void mte_sync_tags(pte_t *ptep, pte_t pte)
@@ -42,8 +48,8 @@ void mte_sync_tags(pte_t *ptep, pte_t pte)
 
 	/* if PG_mte_tagged is set, tags have already been initialised */
 	for (i = 0; i < nr_pages; i++, page++) {
-		if (!test_and_set_bit(PG_mte_tagged, &page->flags))
-			mte_sync_page_tags(page, ptep, check_swap);
+		if (!test_bit(PG_mte_tagged, &page->flags))
+			mte_sync_page_tags(page, ptep, pte, check_swap);
 	}
 }

-- 
Catalin


* Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-11-19 16:24             ` Catalin Marinas
@ 2020-11-20  9:33               ` Steven Price
  0 siblings, 0 replies; 17+ messages in thread
From: Steven Price @ 2020-11-20  9:33 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Peter Maydell, Andrew Jones, Haibo Xu,
	Suzuki K Poulose, Marc Zyngier, Juan Quintela, Richard Henderson,
	Dr. David Alan Gilbert, qemu-devel, James Morse,
	linux-arm-kernel, kvmarm, Thomas Gleixner, Julien Thierry,
	Will Deacon, Dave Martin, linux-kernel

On 19/11/2020 16:24, Catalin Marinas wrote:
> On Thu, Nov 19, 2020 at 12:45:52PM +0000, Steven Price wrote:
>> On 18/11/2020 17:05, Andrew Jones wrote:
>>> On Wed, Nov 18, 2020 at 04:50:01PM +0000, Catalin Marinas wrote:
>>>> On Wed, Nov 18, 2020 at 04:01:20PM +0000, Steven Price wrote:
>>>>> On 17/11/2020 16:07, Catalin Marinas wrote:
>>>>>> On Mon, Oct 26, 2020 at 03:57:27PM +0000, Steven Price wrote:
>>>>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>>>>> index 19aacc7d64de..38fe25310ca1 100644
>>>>>>> --- a/arch/arm64/kvm/mmu.c
>>>>>>> +++ b/arch/arm64/kvm/mmu.c
>>>>>>> @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>>>     	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>>>>>>     		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>>>>>     							   &pfn, &fault_ipa);
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * The otherwise redundant test for system_supports_mte() allows the
>>>>>>> +	 * code to be compiled out when CONFIG_ARM64_MTE is not present.
>>>>>>> +	 */
>>>>>>> +	if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
>>>>>>> +		/*
>>>>>>> +		 * VM will be able to see the page's tags, so we must ensure
>>>>>>> +		 * they have been initialised.
>>>>>>> +		 */
>>>>>>> +		struct page *page = pfn_to_page(pfn);
>>>>>>> +		long i, nr_pages = compound_nr(page);
>>>>>>> +
>>>>>>> +		/* if PG_mte_tagged is set, tags have already been initialised */
>>>>>>> +		for (i = 0; i < nr_pages; i++, page++) {
>>>>>>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>>>>>>> +				mte_clear_page_tags(page_address(page));
>>>>>>> +		}
>>>>>>> +	}
>>>>>>
>>>>>> If this page was swapped out and mapped back in, where does the
>>>>>> restoring from swap happen?
>>>>>
>>>>> Restoring from swap happens above this in the call to gfn_to_pfn_prot()
>>>>
>>>> Looking at the call chain, gfn_to_pfn_prot() ends up with
>>>> get_user_pages() using the current->mm (the VMM) and that does a
>>>> set_pte_at(), presumably restoring the tags. Does this mean that all
>>>> memory mapped by the VMM in user space should have PROT_MTE set?
>>>> Otherwise we don't take the mte_sync_tags() path in set_pte_at() and no
>>>> tags are restored from swap (we do save them, since PG_mte_tagged was set
>>>> when they were mapped).
>>>>
>>>> So I think the code above should be similar to mte_sync_tags(), even
>>>> calling a common function, but I'm not sure where to get the swap pte
>>>> from.
>>
>> You're right - the code is broken as it stands. I've just been able to
>> reproduce the loss of tags due to swap.
>>
>> The problem is that we also don't have a suitable pte from which to
>> restore the tags from swap. So either set_pte_at() would have to
>> unconditionally check for MTE tags for all previous swap entries, as you
>> suggest below. I had a quick go at testing this and hit issues with the
>> idle task getting killed during boot - I fear there are some fun issues
>> regarding initialisation order here.
> 
> My attempt here but not fully tested (just booted, no swap support):

Ah, very similar to what I had, just without the silly mistake... ;)

I just did a quick test with this and it seems to work. I obviously 
should have looked harder before giving up on this approach.

Thanks!

Steve

> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index b35833259f08..27d7fd336a16 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -304,7 +304,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>   		__sync_icache_dcache(pte);
>   
>   	if (system_supports_mte() &&
> -	    pte_present(pte) && pte_tagged(pte) && !pte_special(pte))
> +	    pte_present(pte) && pte_valid_user(pte) && !pte_special(pte))
>   		mte_sync_tags(ptep, pte);
>   
>   	__check_racy_pte_update(mm, ptep, pte);
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index 52a0638ed967..bbd6c56d33d9 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -20,18 +20,24 @@
>   #include <asm/ptrace.h>
>   #include <asm/sysreg.h>
>   
> -static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
> +static void mte_sync_page_tags(struct page *page, pte_t *ptep, pte_t pte,
> +			       bool check_swap)
>   {
>   	pte_t old_pte = READ_ONCE(*ptep);
>   
>   	if (check_swap && is_swap_pte(old_pte)) {
>   		swp_entry_t entry = pte_to_swp_entry(old_pte);
>   
> -		if (!non_swap_entry(entry) && mte_restore_tags(entry, page))
> +		if (!non_swap_entry(entry) && mte_restore_tags(entry, page)) {
> +			set_bit(PG_mte_tagged, &page->flags);
>   			return;
> +		}
>   	}
>   
> -	mte_clear_page_tags(page_address(page));
> +	if (pte_tagged(pte)) {
> +		mte_clear_page_tags(page_address(page));
> +		set_bit(PG_mte_tagged, &page->flags);
> +	}
>   }
>   
>   void mte_sync_tags(pte_t *ptep, pte_t pte)
> @@ -42,8 +48,8 @@ void mte_sync_tags(pte_t *ptep, pte_t pte)
>   
>   	/* if PG_mte_tagged is set, tags have already been initialised */
>   	for (i = 0; i < nr_pages; i++, page++) {
> -		if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> -			mte_sync_page_tags(page, ptep, check_swap);
> +		if (!test_bit(PG_mte_tagged, &page->flags))
> +			mte_sync_page_tags(page, ptep, pte, check_swap);
>   	}
>   }
> 



* Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-11-18 16:01     ` Steven Price
  2020-11-18 16:50       ` Catalin Marinas
@ 2020-11-25 18:13       ` James Morse
  1 sibling, 0 replies; 17+ messages in thread
From: James Morse @ 2020-11-25 18:13 UTC (permalink / raw)
  To: Steven Price, Catalin Marinas
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Marc Zyngier, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, Julien Thierry, Thomas Gleixner, Will Deacon,
	kvmarm, linux-arm-kernel

Hi Steven, Catalin,

On 18/11/2020 16:01, Steven Price wrote:
> On 17/11/2020 16:07, Catalin Marinas wrote:
>> On Mon, Oct 26, 2020 at 03:57:27PM +0000, Steven Price wrote:
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 19aacc7d64de..38fe25310ca1 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>       if (vma_pagesize == PAGE_SIZE && !force_pte)
>>>           vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>                                  &pfn, &fault_ipa);
>>> +
>>> +    /*
>>> +     * The otherwise redundant test for system_supports_mte() allows the
>>> +     * code to be compiled out when CONFIG_ARM64_MTE is not present.
>>> +     */
>>> +    if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
>>> +        /*
>>> +         * VM will be able to see the page's tags, so we must ensure
>>> +         * they have been initialised.
>>> +         */
>>> +        struct page *page = pfn_to_page(pfn);
>>> +        long i, nr_pages = compound_nr(page);
>>> +
>>> +        /* if PG_mte_tagged is set, tags have already been initialised */
>>> +        for (i = 0; i < nr_pages; i++, page++) {
>>> +            if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>>> +                mte_clear_page_tags(page_address(page));
>>> +        }
>>> +    }
>>
>> If this page was swapped out and mapped back in, where does the
>> restoring from swap happen?
> 
> Restoring from swap happens above this in the call to gfn_to_pfn_prot()
> 
>> I may have asked in the past, is user_mem_abort() the only path for
>> mapping Normal pages into stage 2?
>>
> 
> That is my understanding (and yes you asked before) and no one has corrected me! ;)

A recent discovery: copy-on-write will cause kvm_set_spte_handler() to fix up
the mapping (instead of just invalidating it), on the assumption that the
guest is going to read whatever was written.

It's possible user_mem_abort() will go and stomp on that mapping a second
time, but if the VMM triggers this at stage 1, you won't have a vcpu for the
update.


Thanks,

James

