* [PATCH v2 0/2] MTE support for KVM guest
@ 2020-09-04 16:00 ` Steven Price
  0 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-04 16:00 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu

Arm's Memory Tagging Extension (MTE) adds 4 bits of tag data to every 16
bytes of memory in the system. This, along with stashing a tag within the
high bits of virtual addresses, allows runtime checking of memory
accesses.

These patches add support to KVM to enable MTE within a guest. They are
based on Catalin's v9 MTE user-space support series[1].

I'd welcome feedback on the proposed user-kernel ABI. Specifically this
series currently:

 1. Requires the VMM to enable MTE per-VCPU (see the sketch following
    this list).
 2. Automatically promotes (normal host) memory given to the guest to be
    tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
    tags are cleared if the memory wasn't previously MTE enabled.
 3. Doesn't provide any new methods for the VMM to access the tags on
    memory.
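
As a reference for (1), here is a minimal user-space sketch of what
enabling MTE per-VCPU could look like against this series. It assumes a
VM fd and VCPU fd have already been created; KVM_ARM_VCPU_MTE is the
feature bit added by patch 2/2, everything else is the existing
KVM_ARM_VCPU_INIT flow, and error handling is trimmed:

#include <linux/kvm.h>
#include <sys/ioctl.h>

#ifndef KVM_ARM_VCPU_MTE
#define KVM_ARM_VCPU_MTE	7	/* added by patch 2/2 */
#endif

/* Enable MTE on an already-created VCPU (sketch, not tested). */
static int vcpu_enable_mte(int vm_fd, int vcpu_fd)
{
	struct kvm_vcpu_init init;

	/* Start from the host's preferred target. */
	if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init) < 0)
		return -1;

	init.features[0] |= 1u << KVM_ARM_VCPU_MTE;

	/* With this series, fails with EINVAL if the host lacks MTE. */
	return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
}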

(2) and (3) are particularly interesting from the aspect of VM migration.
The guest is able to store/retrieve data in the tags (presumably for the
purpose of tag checking, but architecturally it could be used as just
storage). This means that when migrating a guest the tag data also needs
to be transferred (or saved/restored).

MTE tags are controlled by the same permission model as normal pages
(i.e. a read-only page has read-only tags), so the normal methods of
detecting guest changes to pages can be used. But this would also
require the tags within a page to be migrated at the same time as the
data (since the access control for tags is the same as the normal data
within a page).

(3) may be problematic and I'd welcome input from those familiar with
VMMs. User space cannot access tags unless the memory is mapped with the
PROT_MTE flag. However, enabling PROT_MTE also enables tag checking for
the user space process (assuming the VMM enables tag checking for the
process), and since the tags in memory are controlled by the guest it's
unlikely the VMM would have an appropriately tagged pointer for its
accesses. This means the VMM would either need to maintain two mappings
of the memory (one to access tags, the other to access data) or disable
tag checking while accessing the data.
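
To make the "disable tag checking" option concrete, the VMM could map
guest RAM with PROT_MTE (so the tags are reachable at all) while leaving
its own tag check fault mode at 'none', so its ordinary untagged pointers
keep working for data accesses. A rough, untested sketch, assuming the
PROT_MTE and PR_MTE_* definitions from the user-space series [1] are
available in the headers:

#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#ifndef PROT_MTE
#define PROT_MTE	0x20		/* from [1] */
#endif
#ifndef PR_MTE_TCF_NONE
#define PR_MTE_TCF_NONE	(0UL << 1)	/* from [1]: no tag check faults */
#endif

/* Map tag-capable guest RAM without enabling tag checks in the VMM. */
static void *map_guest_ram(size_t size)
{
	void *p;

	if (prctl(PR_SET_TAGGED_ADDR_CTRL,
		  PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_NONE, 0, 0, 0))
		return NULL;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_MTE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}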

If it's not practical to either disable tag checking in the VMM or
maintain multiple mappings then the alternatives I'm aware of are:

 * Provide a KVM-specific method to extract the tags from guest memory.
   This might also have benefits in terms of providing an easy way to
   read bulk tag data from guest memory, since the LDGM instruction
   isn't available at EL0 (see the per-granule LDG sketch below).
 * Provide support for user space setting the TCMA0 or TCMA1 bits in
   TCR_EL1. These would allow the VMM to generate pointers which are not
   tag checked.
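
For comparison, what user space can do today on a PROT_MTE mapping is
read tags back one 16-byte granule at a time with LDG, which is why a
bulk (LDGM-style) interface looks attractive for migration. A rough
sketch, assuming a compiler invoked with MTE support enabled (e.g.
-march=armv8.5-a+memtag):

#include <stdint.h>

/*
 * Read the allocation tag of the 16-byte granule containing 'addr'.
 * LDG inserts the tag into bits 59:56 of the pointer register, so the
 * tag is extracted from there afterwards. One instruction per granule,
 * hence the interest in a bulk interface for whole pages.
 */
static inline uint8_t load_tag(const void *addr)
{
	uint64_t ptr = (uintptr_t)addr;

	asm volatile("ldg %0, [%0]" : "+r" (ptr));

	return (ptr >> 56) & 0xf;
}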

Feedback is welcome, and feel free to ask questions if anything in the
above doesn't make sense.

Changes since the previous v1 posting[2]:

 * Rebasing clean-ups
 * sysreg visibility is now controlled based on whether the VCPU has MTE
   enabled or not

[1] https://lore.kernel.org/r/20200904103029.32083-1-catalin.marinas@arm.com
[2] https://lore.kernel.org/r/20200713100102.53664-1-steven.price%40arm.com

Steven Price (2):
  arm64: kvm: Save/restore MTE registers
  arm64: kvm: Introduce MTE VCPU feature

 arch/arm64/include/asm/kvm_emulate.h       |  3 +++
 arch/arm64/include/asm/kvm_host.h          |  9 ++++++++-
 arch/arm64/include/asm/sysreg.h            |  3 ++-
 arch/arm64/include/uapi/asm/kvm.h          |  1 +
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++
 arch/arm64/kvm/mmu.c                       | 15 +++++++++++++++
 arch/arm64/kvm/reset.c                     |  8 ++++++++
 arch/arm64/kvm/sys_regs.c                  | 20 +++++++++++++++-----
 8 files changed, 66 insertions(+), 7 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 1/2] arm64: kvm: Save/restore MTE registers
  2020-09-04 16:00 ` Steven Price
@ 2020-09-04 16:00   ` Steven Price
  -1 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-04 16:00 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu

Define the new system registers that MTE introduces and context switch
them. The MTE feature is still hidden from the ID register as it isn't
supported in a VM yet.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_host.h          |  4 ++++
 arch/arm64/include/asm/sysreg.h            |  3 ++-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++
 arch/arm64/kvm/sys_regs.c                  | 14 ++++++++++----
 4 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e52c927aade5..4f4360dd149e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -126,6 +126,8 @@ enum vcpu_sysreg {
 	SCTLR_EL1,	/* System Control Register */
 	ACTLR_EL1,	/* Auxiliary Control Register */
 	CPACR_EL1,	/* Coprocessor Access Control */
+	RGSR_EL1,	/* Random Allocation Tag Seed Register */
+	GCR_EL1,	/* Tag Control Register */
 	ZCR_EL1,	/* SVE Control */
 	TTBR0_EL1,	/* Translation Table Base Register 0 */
 	TTBR1_EL1,	/* Translation Table Base Register 1 */
@@ -142,6 +144,8 @@ enum vcpu_sysreg {
 	TPIDR_EL1,	/* Thread ID, Privileged */
 	AMAIR_EL1,	/* Aux Memory Attribute Indirection Register */
 	CNTKCTL_EL1,	/* Timer Control Register (EL1) */
+	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
+	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
 	PAR_EL1,	/* Physical Address Register */
 	MDSCR_EL1,	/* Monitor Debug System Control Register */
 	MDCCINT_EL1,	/* Monitor Debug Comms Channel Interrupt Enable Reg */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 52eefe2f7d95..cd60677551b7 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -563,7 +563,8 @@
 #define SCTLR_ELx_M	(BIT(0))
 
 #define SCTLR_ELx_FLAGS	(SCTLR_ELx_M  | SCTLR_ELx_A | SCTLR_ELx_C | \
-			 SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB)
+			 SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB | \
+			 SCTLR_ELx_ITFSB)
 
 /* SCTLR_EL2 specific flags. */
 #define SCTLR_EL2_RES1	((BIT(4))  | (BIT(5))  | (BIT(11)) | (BIT(16)) | \
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index 7a986030145f..a124ffa49ba3 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -18,6 +18,11 @@
 static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
 	ctxt_sys_reg(ctxt, MDSCR_EL1)	= read_sysreg(mdscr_el1);
+	if (system_supports_mte()) {
+		ctxt_sys_reg(ctxt, RGSR_EL1)	= read_sysreg_s(SYS_RGSR_EL1);
+		ctxt_sys_reg(ctxt, GCR_EL1)	= read_sysreg_s(SYS_GCR_EL1);
+		ctxt_sys_reg(ctxt, TFSRE0_EL1)	= read_sysreg_s(SYS_TFSRE0_EL1);
+	}
 }
 
 static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
@@ -45,6 +50,8 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, CNTKCTL_EL1)	= read_sysreg_el1(SYS_CNTKCTL);
 	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
 	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
+	if (system_supports_mte())
+		ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR);
 
 	ctxt_sys_reg(ctxt, SP_EL1)	= read_sysreg(sp_el1);
 	ctxt_sys_reg(ctxt, ELR_EL1)	= read_sysreg_el1(SYS_ELR);
@@ -63,6 +70,11 @@ static inline void __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
 static inline void __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
 {
 	write_sysreg(ctxt_sys_reg(ctxt, MDSCR_EL1),  mdscr_el1);
+	if (system_supports_mte()) {
+		write_sysreg_s(ctxt_sys_reg(ctxt, RGSR_EL1), SYS_RGSR_EL1);
+		write_sysreg_s(ctxt_sys_reg(ctxt, GCR_EL1), SYS_GCR_EL1);
+		write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
+	}
 }
 
 static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
@@ -106,6 +118,8 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	write_sysreg_el1(ctxt_sys_reg(ctxt, CNTKCTL_EL1), SYS_CNTKCTL);
 	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
 	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
+	if (system_supports_mte())
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
 
 	if (!has_vhe() &&
 	    cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 379f4969d0bd..a655f172b5ad 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1391,6 +1391,12 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return false;
 }
 
+static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
+				   const struct sys_reg_desc *rd)
+{
+	return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
+}
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {			\
 	SYS_DESC(SYS_##name),			\
@@ -1557,8 +1563,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
 	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
 
-	{ SYS_DESC(SYS_RGSR_EL1), access_mte_regs },
-	{ SYS_DESC(SYS_GCR_EL1), access_mte_regs },
+	{ SYS_DESC(SYS_RGSR_EL1), access_mte_regs, reset_unknown, RGSR_EL1, .visibility = mte_visibility },
+	{ SYS_DESC(SYS_GCR_EL1), access_mte_regs, reset_unknown, GCR_EL1, .visibility = mte_visibility },
 
 	{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility },
 	{ SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
@@ -1584,8 +1590,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi },
 	{ SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi },
 
-	{ SYS_DESC(SYS_TFSR_EL1), access_mte_regs },
-	{ SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs },
+	{ SYS_DESC(SYS_TFSR_EL1), access_mte_regs, reset_unknown, TFSR_EL1, .visibility = mte_visibility },
+	{ SYS_DESC(SYS_TFSRE0_EL1), access_mte_regs, reset_unknown, TFSRE0_EL1, .visibility = mte_visibility },
 
 	{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-09-04 16:00 ` Steven Price
@ 2020-09-04 16:00   ` Steven Price
  -1 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-04 16:00 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu

Add a new VCPU feature 'KVM_ARM_VCPU_MTE' which enables memory tagging
on a VCPU. When enabled on any VCPU in the virtual machine, this causes
all pages that are faulted into the VM to have the PG_mte_tagged flag
set (and the tag storage cleared if this is the first use).

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h |  3 +++
 arch/arm64/include/asm/kvm_host.h    |  5 ++++-
 arch/arm64/include/uapi/asm/kvm.h    |  1 +
 arch/arm64/kvm/mmu.c                 | 15 +++++++++++++++
 arch/arm64/kvm/reset.c               |  8 ++++++++
 arch/arm64/kvm/sys_regs.c            |  6 +++++-
 6 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 49a55be2b9a2..0042323a4b7f 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
 	    vcpu_el1_is_32bit(vcpu))
 		vcpu->arch.hcr_el2 |= HCR_TID2;
+
+	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
+		vcpu->arch.hcr_el2 |= HCR_ATA;
 }
 
 static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4f4360dd149e..b1190366242b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -37,7 +37,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 7
+#define KVM_VCPU_MAX_FEATURES 8
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
@@ -110,6 +110,9 @@ struct kvm_arch {
 	 * supported.
 	 */
 	bool return_nisv_io_abort_to_user;
+
+	/* If any VCPU has MTE enabled then all memory must be MTE enabled */
+	bool vcpu_has_mte;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ba85bb23f060..2677e1ab8c16 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_MTE		7 /* VCPU supports Memory Tagging */
 
 struct kvm_vcpu_init {
 	__u32 target;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ba00bcc0c884..e8891bacd76f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1949,6 +1949,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
 							   &pfn, &fault_ipa);
+	if (system_supports_mte() && kvm->arch.vcpu_has_mte && pfn_valid(pfn)) {
+		/*
+		 * VM will be able to see the page's tags, so we must ensure
+		 * they have been initialised.
+		 */
+		struct page *page = pfn_to_page(pfn);
+		long i, nr_pages = compound_nr(page);
+
+		/* if PG_mte_tagged is set, tags have already been initialised */
+		for (i = 0; i < nr_pages; i++, page++) {
+			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+				mte_clear_page_tags(page_address(page));
+		}
+	}
+
 	if (writable)
 		kvm_set_pfn_dirty(pfn);
 
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index ee33875c5c2a..82f3883d717f 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -274,6 +274,14 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features)) {
+		if (!system_supports_mte()) {
+			ret = -EINVAL;
+			goto out;
+		}
+		vcpu->kvm->arch.vcpu_has_mte = true;
+	}
+
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index a655f172b5ad..6a971b201e81 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 			val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
 		val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
 	} else if (id == SYS_ID_AA64PFR1_EL1) {
-		val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
+		if (!test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
+			val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
 	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
 		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
 			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
@@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 				   const struct sys_reg_desc *rd)
 {
+	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
+		return 0;
+
 	return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 96+ messages in thread

* [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature
@ 2020-09-04 16:00   ` Steven Price
  0 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-04 16:00 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Mark Rutland, Dr. David Alan Gilbert, Peter Maydell, Haibo Xu,
	Suzuki K Poulose, qemu-devel, Dave Martin, Juan Quintela,
	Richard Henderson, linux-kernel, Steven Price, James Morse,
	Julien Thierry, Thomas Gleixner, kvmarm, linux-arm-kernel

Add a new VCPU features 'KVM_ARM_VCPU_MTE' which enables memory tagging
on a VCPU. When enabled on any VCPU in the virtual machine this causes
all pages that are faulted into the VM to have the PG_mte_tagged flag
set (and the tag storage cleared if this is the first use).

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h |  3 +++
 arch/arm64/include/asm/kvm_host.h    |  5 ++++-
 arch/arm64/include/uapi/asm/kvm.h    |  1 +
 arch/arm64/kvm/mmu.c                 | 15 +++++++++++++++
 arch/arm64/kvm/reset.c               |  8 ++++++++
 arch/arm64/kvm/sys_regs.c            |  6 +++++-
 6 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 49a55be2b9a2..0042323a4b7f 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
 	    vcpu_el1_is_32bit(vcpu))
 		vcpu->arch.hcr_el2 |= HCR_TID2;
+
+	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
+		vcpu->arch.hcr_el2 |= HCR_ATA;
 }
 
 static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4f4360dd149e..b1190366242b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -37,7 +37,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 7
+#define KVM_VCPU_MAX_FEATURES 8
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
@@ -110,6 +110,9 @@ struct kvm_arch {
 	 * supported.
 	 */
 	bool return_nisv_io_abort_to_user;
+
+	/* If any VCPU has MTE enabled then all memory must be MTE enabled */
+	bool vcpu_has_mte;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ba85bb23f060..2677e1ab8c16 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_MTE		7 /* VCPU supports Memory Tagging */
 
 struct kvm_vcpu_init {
 	__u32 target;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ba00bcc0c884..e8891bacd76f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1949,6 +1949,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
 							   &pfn, &fault_ipa);
+	if (system_supports_mte() && kvm->arch.vcpu_has_mte && pfn_valid(pfn)) {
+		/*
+		 * VM will be able to see the page's tags, so we must ensure
+		 * they have been initialised.
+		 */
+		struct page *page = pfn_to_page(pfn);
+		long i, nr_pages = compound_nr(page);
+
+		/* if PG_mte_tagged is set, tags have already been initialised */
+		for (i = 0; i < nr_pages; i++, page++) {
+			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+				mte_clear_page_tags(page_address(page));
+		}
+	}
+
 	if (writable)
 		kvm_set_pfn_dirty(pfn);
 
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index ee33875c5c2a..82f3883d717f 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -274,6 +274,14 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features)) {
+		if (!system_supports_mte()) {
+			ret = -EINVAL;
+			goto out;
+		}
+		vcpu->kvm->arch.vcpu_has_mte = true;
+	}
+
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index a655f172b5ad..6a971b201e81 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 			val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
 		val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
 	} else if (id == SYS_ID_AA64PFR1_EL1) {
-		val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
+		if (!test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
+			val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
 	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
 		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
 			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
@@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 				   const struct sys_reg_desc *rd)
 {
+	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
+		return 0;
+
 	return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-04 16:00 ` Steven Price
@ 2020-09-07 15:28   ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 96+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-07 15:28 UTC (permalink / raw)
  To: Steven Price, eric.auger
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Julien Thierry, Suzuki K Poulose, kvmarm, linux-arm-kernel,
	linux-kernel, Dave Martin, Mark Rutland, Thomas Gleixner,
	qemu-devel, Juan Quintela, Richard Henderson, Peter Maydell,
	Haibo Xu

(cc'ing in Eric Auger)

* Steven Price (steven.price@arm.com) wrote:
> Arm's Memory Tagging Extension (MTE) adds 4 bits of tag data to every 16
> bytes of memory in the system. This along with stashing a tag within the
> high bit of virtual addresses allows runtime checking of memory
> accesses.
> 
> These patches add support to KVM to enable MTE within a guest. They are
> based on Catalin's v9 MTE user-space support series[1].
> 
> I'd welcome feedback on the proposed user-kernel ABI. Specifically this
> series currently:
> 
>  1. Requires the VMM to enable MTE per-VCPU.
>  2. Automatically promotes (normal host) memory given to the guest to be
>     tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
>     tags are cleared if the memory wasn't previously MTE enabled.
>  3. Doesn't provide any new methods for the VMM to access the tags on
>     memory.
> 
> (2) and (3) are particularly interesting from the aspect of VM migration.
> The guest is able to store/retrieve data in the tags (presumably for the
> purpose of tag checking, but architecturally it could be used as just
> storage). This means that when migrating a guest the data needs to be
> transferred (or saved/restored).
> 
> MTE tags are controlled by the same permission model as normal pages
> (i.e. a read-only page has read-only tags), so the normal methods of
> detecting guest changes to pages can be used. But this would also
> require the tags within a page to be migrated at the same time as the
> data (since the access control for tags is the same as the normal data
> within a page).

(Without understanding anything about your tag system...)

Note that during (normal, non-postcopy) migration the consistency can
be a little loose - until the guest starts running; i.e. you can send
a page that's in the middle of being modified as long as you make sure
you send it again later so that what the guest sees on the destination
when it runs is consistent; i.e. it would be fine to send your tags
separately to your data and allow them to get a little out of sync, as
long as they caught up before the guest ran.

> (3) may be problematic and I'd welcome input from those familiar with
> VMMs. User space cannot access tags unless the memory is mapped with the
> PROT_MTE flag. However enabling PROT_MTE will also enable tag checking
> for the user space process (assuming the VMM enables tag checking for
> the process) and since the tags in memory are controlled by the guest
> it's unlikely the VMM would have an appropriately tagged pointer for its
> access. This means the VMM would either need to maintain two mappings of
> memory (one to access tags, the other to access data) or disable tag
> checking during the accesses to data.

Imagine I had a second mapping; what would it look like; how would I get
and restore the tags?

In terms of migration stream, I guess we have two ways to do this,
either it rides shotgun on the main RAM section pages, transmitting
those few extra bytes whenever we transmit a page, or you have a
separate iteratable device for RAMtags, and it just transmits those.
How you keep the two together is an interesting question.
The shotgun method sounds nasty; it would mean putting special cases in the
already hairy RAM code.

> If it's not practical to either disable tag checking in the VMM or
> maintain multiple mappings then the alternatives I'm aware of are:
> 
>  * Provide a KVM-specific method to extract the tags from guest memory.
>    This might also have benefits in terms of providing an easy way to
>    read bulk tag data from guest memory (since the LDGM instruction
>    isn't available at EL0).
>  * Provide support for user space setting the TCMA0 or TCMA1 bits in
>    TCR_EL1. These would allow the VMM to generate pointers which are not
>    tag checked.

I guess you want the VMM to do as much tag-checked access as possible
on its own data structures?

How do things like virtio work where QEMU or the kernel is accessing
guest memory for I/O?

Dave

> Feedback is welcome, and feel free to ask questions if anything in the
> above doesn't make sense.
> 
> Changes since the previous v1 posting[2]:
> 
>  * Rebasing clean-ups
>  * sysreg visibility is now controlled based on whether the VCPU has MTE
>    enabled or not
> 
> [1] https://lore.kernel.org/r/20200904103029.32083-1-catalin.marinas@arm.com
> [2] https://lore.kernel.org/r/20200713100102.53664-1-steven.price%40arm.com
> 
> Steven Price (2):
>   arm64: kvm: Save/restore MTE registers
>   arm64: kvm: Introduce MTE VCPU feature
> 
>  arch/arm64/include/asm/kvm_emulate.h       |  3 +++
>  arch/arm64/include/asm/kvm_host.h          |  9 ++++++++-
>  arch/arm64/include/asm/sysreg.h            |  3 ++-
>  arch/arm64/include/uapi/asm/kvm.h          |  1 +
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++
>  arch/arm64/kvm/mmu.c                       | 15 +++++++++++++++
>  arch/arm64/kvm/reset.c                     |  8 ++++++++
>  arch/arm64/kvm/sys_regs.c                  | 20 +++++++++++++++-----
>  8 files changed, 66 insertions(+), 7 deletions(-)
> 
> -- 
> 2.20.1
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-07 15:28   ` Dr. David Alan Gilbert
@ 2020-09-09  9:15     ` Steven Price
  -1 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-09  9:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, eric.auger
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Julien Thierry, Suzuki K Poulose, kvmarm, linux-arm-kernel,
	linux-kernel, Dave Martin, Mark Rutland, Thomas Gleixner,
	qemu-devel, Juan Quintela, Richard Henderson, Peter Maydell,
	Haibo Xu

On 07/09/2020 16:28, Dr. David Alan Gilbert wrote:
> (cc'ing in Eric Auger)
> 
> * Steven Price (steven.price@arm.com) wrote:
>> Arm's Memory Tagging Extension (MTE) adds 4 bits of tag data to every 16
>> bytes of memory in the system. This along with stashing a tag within the
>> high bit of virtual addresses allows runtime checking of memory
>> accesses.
>>
>> These patches add support to KVM to enable MTE within a guest. They are
>> based on Catalin's v9 MTE user-space support series[1].
>>
>> I'd welcome feedback on the proposed user-kernel ABI. Specifically this
>> series currently:
>>
>>   1. Requires the VMM to enable MTE per-VCPU.
>>   2. Automatically promotes (normal host) memory given to the guest to be
>>      tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
>>      tags are cleared if the memory wasn't previously MTE enabled.
>>   3. Doesn't provide any new methods for the VMM to access the tags on
>>      memory.
>>
>> (2) and (3) are particularly interesting from the aspect of VM migration.
>> The guest is able to store/retrieve data in the tags (presumably for the
>> purpose of tag checking, but architecturally it could be used as just
>> storage). This means that when migrating a guest the data needs to be
>> transferred (or saved/restored).
>>
>> MTE tags are controlled by the same permission model as normal pages
>> (i.e. a read-only page has read-only tags), so the normal methods of
>> detecting guest changes to pages can be used. But this would also
>> require the tags within a page to be migrated at the same time as the
>> data (since the access control for tags is the same as the normal data
>> within a page).
> 
> (Without understanding anything about your tag system...)
> 
> Note that during (normal, non-postcopy) migration the consistency can
> be a little loose - until the guest starts running; i.e. you can send
> a page that's in themiddle of being modified as long as you make sure
> you send it again later so that what the guest sees on the destination
> when it runs is consistent; i.e. it would be fine to send your tags
> separately to your data and allow them to get a little out of sync, as
> long as they caught up before the guest ran.

Yes, you can obviously pro-actively send data early as long as you
appropriately deal with any potential changes that the guest might make.
I'm not very familiar with exactly how QEMU handles this, so it might 
not be a problem - I just wanted to point out that we don't have 
separate access permissions.

>> (3) may be problematic and I'd welcome input from those familiar with
>> VMMs. User space cannot access tags unless the memory is mapped with the
>> PROT_MTE flag. However enabling PROT_MTE will also enable tag checking
>> for the user space process (assuming the VMM enables tag checking for
>> the process) and since the tags in memory are controlled by the guest
>> it's unlikely the VMM would have an appropriately tagged pointer for its
>> access. This means the VMM would either need to maintain two mappings of
>> memory (one to access tags, the other to access data) or disable tag
>> checking during the accesses to data.
> 
> Imagine I had a second mapping; what would it look like; how would I get
> and restore the tags?

At a very simple level you could do something like:

  normal_mapping = mmap(..., PROT_READ | PROT_WRITE, ..., fd, 0);
  mte_mapping = mmap(..., PROT_READ | PROT_WRITE | PROT_MTE, ..., fd, 0);

  /* access normal mapping as normal */
  normal_mapping[offset] = 0xf00 + normal_mapping[offset + 1];

  /* read tag from mte_mapping */
  uint64_t tag = ldg(&mte_mapping[offset]);

  /* write a new tag value (8)
   * NOTE: tags are stored in the top byte, hence the shift
   */
  stg(0x8ULL << 56, &mte_mapping[offset]);

Where stg() and ldg() are simple wrappers around the new instructions:

  stg:
         STG x0, [x1]
         RET

  ldg:
         LDG x0, [x0]
         RET
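
In C these could be thin inline-asm wrappers - a sketch, assuming a
toolchain that accepts the MTE mnemonics (e.g. compiled with
-march=armv8.5-a+memtag); illustrative only:

  #include <stdint.h>

  /* store the tag held in bits 59:56 of 'tagged' to the granule at 'addr' */
  static inline void stg(uint64_t tagged, void *addr)
  {
          asm volatile("stg %0, [%1]" : : "r" (tagged), "r" (addr) : "memory");
  }

  /* return 'addr' with the tag of its granule inserted into bits 59:56 */
  static inline uint64_t ldg(void *addr)
  {
          uint64_t ptr = (uintptr_t)addr;

          asm volatile("ldg %0, [%0]" : "+r" (ptr) : : "memory");
          return ptr;
  }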

> In terms of migration stream, I guess we have two ways to do this,
> either it rides shotgun on the main RAM section pages, transmitting
> those few extra bytes whenever we transmit a page, or you have a
> separate iteratable device for RAMtags, and it just transmits those.
> How you keep the two together is an interesting question.
> The shotgun method sounds nasty to avoid putting special cases in the,
> already hairy, RAM code.

As you say above it may be possible to simply let the normal RAM and 
tags get out of sync. E.g. if you send all the normal RAM (marking 
read-only as you go), then all the tags (not changing the permissions) 
you will end up with all the pages that have remained read-only (i.e. 
the guest hasn't modified) being consistent on the destination. Pages 
that have been written by the guest will be inconsistent, but you were 
going to have to resend those anyway.

However for the post-migration copy you need to copy *both* normal RAM and
tags before resuming the guest. You might need special cases for this.
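
For illustration, saving the tags of one 4K page through such a PROT_MTE
mapping could look roughly like this (one 4-bit tag per 16-byte granule,
so 128 bytes of tag data per page; names are made up, and a bulk interface
such as LDGM or a KVM-specific ioctl would clearly be more efficient):

  #define MTE_GRANULE_SIZE	16
  #define TAGS_PER_PAGE		(4096 / MTE_GRANULE_SIZE)	/* 256 */

  /* pack two 4-bit tags per byte into buf (128 bytes per 4K page) */
  static void save_page_tags(uint8_t *mte_page, uint8_t *buf)
  {
          unsigned int i;

          for (i = 0; i < TAGS_PER_PAGE; i++) {
                  uint8_t tag = (ldg(mte_page + i * MTE_GRANULE_SIZE) >> 56) & 0xf;

                  if (i % 2)
                          buf[i / 2] |= tag << 4;
                  else
                          buf[i / 2] = tag;
          }
  }

Restoring on the destination would be the mirror image using stg().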

>> If it's not practical to either disable tag checking in the VMM or
>> maintain multiple mappings then the alternatives I'm aware of are:
>>
>>   * Provide a KVM-specific method to extract the tags from guest memory.
>>     This might also have benefits in terms of providing an easy way to
>>     read bulk tag data from guest memory (since the LDGM instruction
>>     isn't available at EL0).
>>   * Provide support for user space setting the TCMA0 or TCMA1 bits in
>>     TCR_EL1. These would allow the VMM to generate pointers which are not
>>     tag checked.
> 
> I guess you want the VMM to do as much tagged checked access as possible
> on it's own data structures?

Ideally yes, you would want the VMM to have checked accesses for all 
its internal data structures because that gives the maximum benefit 
from MTE.
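
For completeness, tag checking in the VMM process itself would be turned
on with the prctl() interface from the user-space series - a sketch,
assuming that series' uapi, with synchronous tag check faults:

  #include <sys/prctl.h>

  /*
   * PR_MTE_TCF_SYNC comes from the user-space series' <linux/prctl.h>;
   * enable the tagged address ABI plus synchronous tag check faults.
   */
  prctl(PR_SET_TAGGED_ADDR_CTRL,
        PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC,
        0, 0, 0);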

> How do things like virtio work where the qemu or kernel is accessing
> guest memory for IO?

Since virtio is effectively emulating a device it should be treated like 
a device - no tag checking and no tag storage used. This would be the 
obvious situation where you would use "normal_mapping" as above so tags 
wouldn't be visible or checked.

Really the VMM is only interested in guest tags for the migration case 
where it simply needs to preserve them. I don't expect the guest and VMM 
(or hypervisor) to communicate using tagged memory.

Steve

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
@ 2020-09-09  9:15     ` Steven Price
  0 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-09  9:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, eric.auger
  Cc: Mark Rutland, Peter Maydell, Haibo Xu, Suzuki K Poulose,
	qemu-devel, Catalin Marinas, Juan Quintela, Richard Henderson,
	linux-kernel, Dave Martin, James Morse, linux-arm-kernel,
	Marc Zyngier, Thomas Gleixner, Will Deacon, kvmarm,
	Julien Thierry

On 07/09/2020 16:28, Dr. David Alan Gilbert wrote:
> (cc'ing in Eric Auger)
> 
> * Steven Price (steven.price@arm.com) wrote:
>> Arm's Memory Tagging Extension (MTE) adds 4 bits of tag data to every 16
>> bytes of memory in the system. This along with stashing a tag within the
>> high bit of virtual addresses allows runtime checking of memory
>> accesses.
>>
>> These patches add support to KVM to enable MTE within a guest. They are
>> based on Catalin's v9 MTE user-space support series[1].
>>
>> I'd welcome feedback on the proposed user-kernel ABI. Specifically this
>> series currently:
>>
>>   1. Requires the VMM to enable MTE per-VCPU.
>>   2. Automatically promotes (normal host) memory given to the guest to be
>>      tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
>>      tags are cleared if the memory wasn't previously MTE enabled.
>>   3. Doesn't provide any new methods for the VMM to access the tags on
>>      memory.
>>
>> (2) and (3) are particularly interesting from the aspect of VM migration.
>> The guest is able to store/retrieve data in the tags (presumably for the
>> purpose of tag checking, but architecturally it could be used as just
>> storage). This means that when migrating a guest the data needs to be
>> transferred (or saved/restored).
>>
>> MTE tags are controlled by the same permission model as normal pages
>> (i.e. a read-only page has read-only tags), so the normal methods of
>> detecting guest changes to pages can be used. But this would also
>> require the tags within a page to be migrated at the same time as the
>> data (since the access control for tags is the same as the normal data
>> within a page).
> 
> (Without understanding anything about your tag system...)
> 
> Note that during (normal, non-postcopy) migration the consistency can
> be a little loose - until the guest starts running; i.e. you can send
> a page that's in themiddle of being modified as long as you make sure
> you send it again later so that what the guest sees on the destination
> when it runs is consistent; i.e. it would be fine to send your tags
> separately to your data and allow them to get a little out of sync, as
> long as they caught up before the guest ran.

Yes, you can obviously pro-actively send data early as you as you 
appropriately deal with any potential changes that the guest might make. 
I'm not very familiar with exactly how QEMU handles this, so it might 
not be a problem - I just wanted to point out that we don't have 
separate access permissions.

>> (3) may be problematic and I'd welcome input from those familiar with
>> VMMs. User space cannot access tags unless the memory is mapped with the
>> PROT_MTE flag. However enabling PROT_MTE will also enable tag checking
>> for the user space process (assuming the VMM enables tag checking for
>> the process) and since the tags in memory are controlled by the guest
>> it's unlikely the VMM would have an appropriately tagged pointer for its
>> access. This means the VMM would either need to maintain two mappings of
>> memory (one to access tags, the other to access data) or disable tag
>> checking during the accesses to data.
> 
> Imagine I had a second mapping; what would it look like; how would I get
> and restore the tags?

At a very simple level you could do something like:

  normal_mapping = mmap(..., PROT_READ | PROT_WRITE, ..., fd, 0);
  mte_mapping = mmap(..., PROT_READ | PROT_WRITE | PROT_MTE, ..., fd, 0);

  /* access normal mapping as normal */
  normal_mapping[offset] = 0xf00 + normal_mapping[offset + 1];

  /* read tag from mte_mapping */
  uint64_t tag = ldg(&mte_mapping[offset]);

  /* write a new tag value (8)
   * NOTE: tags are stored in the top byte, hence the shift
   */
  stg(0x8ULL << 56, &mte_mapping[offset]);

Where stg() and ldg() are simple wrappers around the new instructions:

  stg:
         STG x0, [x1]
         RET

  ldg:
         LDG x0, [x0]
         RET

> In terms of migration stream, I guess we have two ways to do this,
> either it rides shotgun on the main RAM section pages, transmitting
> those few extra bytes whenever we transmit a page, or you have a
> separate iteratable device for RAMtags, and it just transmits those.
> How you keep the two together is an interesting question.
> The shotgun method sounds nasty to avoid putting special cases in the,
> already hairy, RAM code.

As you say above it may be possible to simply let the normal RAM and 
tags get out of sync. E.g. if you send all the normal RAM (marking 
read-only as you go), then all the tags (not changing the permissions) 
you will end up with all the pages that have remained read-only (i.e. 
the guest hasn't modified) being consistent on the destination. Pages 
that have been written by the guest will be inconsistent, but you were 
going to have to resend those anyway.

However for post-migration copy you need to copy *both* normal RAM and 
tags before resuming the guest. You might need special cases for this.

>> If it's not practical to either disable tag checking in the VMM or
>> maintain multiple mappings then the alternatives I'm aware of are:
>>
>>   * Provide a KVM-specific method to extract the tags from guest memory.
>>     This might also have benefits in terms of providing an easy way to
>>     read bulk tag data from guest memory (since the LDGM instruction
>>     isn't available at EL0).
>>   * Provide support for user space setting the TCMA0 or TCMA1 bits in
>>     TCR_EL1. These would allow the VMM to generate pointers which are not
>>     tag checked.
> 
> I guess you want the VMM to do as much tagged checked access as possible
> on it's own data structures?

Ideally yes, you would want the VMM to have checked accesses for all 
it's internal data structures because that gives the maximum benefit 
from MTE.

> How do things like virtio work where the qemu or kernel is accessing
> guest memory for IO?

Since virtio is effectively emulating a device it should be treated like 
a device - no tag checking and no tag storage used. This would be the 
obvious situation where you would use "normal_mapping" as above so tags 
wouldn't be visible or checked.

Really the VMM is only interested in guest tags for the migration case 
where it simply needs to preserve them. I don't expect the guest and VMM 
(or hypervisor) to communicate using tagged memory.

Steve


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
@ 2020-09-09  9:15     ` Steven Price
  0 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-09  9:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, eric.auger
  Cc: Peter Maydell, qemu-devel, Catalin Marinas, Juan Quintela,
	Richard Henderson, linux-kernel, Dave Martin, linux-arm-kernel,
	Marc Zyngier, Thomas Gleixner, Will Deacon, kvmarm

On 07/09/2020 16:28, Dr. David Alan Gilbert wrote:
> (cc'ing in Eric Auger)
> 
> * Steven Price (steven.price@arm.com) wrote:
>> Arm's Memory Tagging Extension (MTE) adds 4 bits of tag data to every 16
>> bytes of memory in the system. This along with stashing a tag within the
>> high bit of virtual addresses allows runtime checking of memory
>> accesses.
>>
>> These patches add support to KVM to enable MTE within a guest. They are
>> based on Catalin's v9 MTE user-space support series[1].
>>
>> I'd welcome feedback on the proposed user-kernel ABI. Specifically this
>> series currently:
>>
>>   1. Requires the VMM to enable MTE per-VCPU.
>>   2. Automatically promotes (normal host) memory given to the guest to be
>>      tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
>>      tags are cleared if the memory wasn't previously MTE enabled.
>>   3. Doesn't provide any new methods for the VMM to access the tags on
>>      memory.
>>
>> (2) and (3) are particularly interesting from the aspect of VM migration.
>> The guest is able to store/retrieve data in the tags (presumably for the
>> purpose of tag checking, but architecturally it could be used as just
>> storage). This means that when migrating a guest the data needs to be
>> transferred (or saved/restored).
>>
>> MTE tags are controlled by the same permission model as normal pages
>> (i.e. a read-only page has read-only tags), so the normal methods of
>> detecting guest changes to pages can be used. But this would also
>> require the tags within a page to be migrated at the same time as the
>> data (since the access control for tags is the same as the normal data
>> within a page).
> 
> (Without understanding anything about your tag system...)
> 
> Note that during (normal, non-postcopy) migration the consistency can
> be a little loose - until the guest starts running; i.e. you can send
> a page that's in themiddle of being modified as long as you make sure
> you send it again later so that what the guest sees on the destination
> when it runs is consistent; i.e. it would be fine to send your tags
> separately to your data and allow them to get a little out of sync, as
> long as they caught up before the guest ran.

Yes, you can obviously pro-actively send data early as you as you 
appropriately deal with any potential changes that the guest might make. 
I'm not very familiar with exactly how QEMU handles this, so it might 
not be a problem - I just wanted to point out that we don't have 
separate access permissions.

>> (3) may be problematic and I'd welcome input from those familiar with
>> VMMs. User space cannot access tags unless the memory is mapped with the
>> PROT_MTE flag. However enabling PROT_MTE will also enable tag checking
>> for the user space process (assuming the VMM enables tag checking for
>> the process) and since the tags in memory are controlled by the guest
>> it's unlikely the VMM would have an appropriately tagged pointer for its
>> access. This means the VMM would either need to maintain two mappings of
>> memory (one to access tags, the other to access data) or disable tag
>> checking during the accesses to data.
> 
> Imagine I had a second mapping; what would it look like; how would I get
> and restore the tags?

At a very simple level you could do something like:

  normal_mapping = mmap(..., PROT_READ | PROT_WRITE, ..., fd, 0);
  mte_mapping = mmap(..., PROT_READ | PROT_WRITE | PROT_MTE, ..., fd, 0);

  /* access normal mapping as normal */
  normal_mapping[offset] = 0xf00 + normal_mapping[offset + 1];

  /* read tag from mte_mapping */
  uint64_t tag = ldg(&mte_mapping[offset]);

  /* write a new tag value (8)
   * NOTE: tags are stored in the top byte, hence the shift
   */
  stg(0x8ULL << 56, &mte_mapping[offset]);

Where stg() and ldg() are simple wrappers around the new instructions:

  stg:
         STG x0, [x1]
         RET

  ldg:
         LDG x0, [x0]
         RET

> In terms of migration stream, I guess we have two ways to do this,
> either it rides shotgun on the main RAM section pages, transmitting
> those few extra bytes whenever we transmit a page, or you have a
> separate iteratable device for RAMtags, and it just transmits those.
> How you keep the two together is an interesting question.
> The shotgun method sounds nasty to avoid putting special cases in the,
> already hairy, RAM code.

As you say above it may be possible to simply let the normal RAM and 
tags get out of sync. E.g. if you send all the normal RAM (marking 
read-only as you go), then all the tags (not changing the permissions) 
you will end up with all the pages that have remained read-only (i.e. 
the guest hasn't modified) being consistent on the destination. Pages 
that have been written by the guest will be inconsistent, but you were 
going to have to resend those anyway.

However for post-migration copy you need to copy *both* normal RAM and 
tags before resuming the guest. You might need special cases for this.

>> If it's not practical to either disable tag checking in the VMM or
>> maintain multiple mappings then the alternatives I'm aware of are:
>>
>>   * Provide a KVM-specific method to extract the tags from guest memory.
>>     This might also have benefits in terms of providing an easy way to
>>     read bulk tag data from guest memory (since the LDGM instruction
>>     isn't available at EL0).
>>   * Provide support for user space setting the TCMA0 or TCMA1 bits in
>>     TCR_EL1. These would allow the VMM to generate pointers which are not
>>     tag checked.
> 
> I guess you want the VMM to do as much tagged checked access as possible
> on it's own data structures?

Ideally yes, you would want the VMM to have checked accesses for all 
it's internal data structures because that gives the maximum benefit 
from MTE.

> How do things like virtio work where the qemu or kernel is accessing
> guest memory for IO?

Since virtio is effectively emulating a device it should be treated like 
a device - no tag checking and no tag storage used. This would be the 
obvious situation where you would use "normal_mapping" as above so tags 
wouldn't be visible or checked.

Really the VMM is only interested in guest tags for the migration case 
where it simply needs to preserve them. I don't expect the guest and VMM 
(or hypervisor) to communicate using tagged memory.

Steve
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
@ 2020-09-09  9:15     ` Steven Price
  0 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-09  9:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, eric.auger
  Cc: Mark Rutland, Peter Maydell, Haibo Xu, Suzuki K Poulose,
	qemu-devel, Catalin Marinas, Juan Quintela, Richard Henderson,
	linux-kernel, Dave Martin, James Morse, linux-arm-kernel,
	Marc Zyngier, Thomas Gleixner, Will Deacon, kvmarm,
	Julien Thierry

On 07/09/2020 16:28, Dr. David Alan Gilbert wrote:
> (cc'ing in Eric Auger)
> 
> * Steven Price (steven.price@arm.com) wrote:
>> Arm's Memory Tagging Extension (MTE) adds 4 bits of tag data to every 16
>> bytes of memory in the system. This along with stashing a tag within the
>> high bit of virtual addresses allows runtime checking of memory
>> accesses.
>>
>> These patches add support to KVM to enable MTE within a guest. They are
>> based on Catalin's v9 MTE user-space support series[1].
>>
>> I'd welcome feedback on the proposed user-kernel ABI. Specifically this
>> series currently:
>>
>>   1. Requires the VMM to enable MTE per-VCPU.
>>   2. Automatically promotes (normal host) memory given to the guest to be
>>      tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
>>      tags are cleared if the memory wasn't previously MTE enabled.
>>   3. Doesn't provide any new methods for the VMM to access the tags on
>>      memory.
>>
>> (2) and (3) are particularly interesting from the aspect of VM migration.
>> The guest is able to store/retrieve data in the tags (presumably for the
>> purpose of tag checking, but architecturally it could be used as just
>> storage). This means that when migrating a guest the data needs to be
>> transferred (or saved/restored).
>>
>> MTE tags are controlled by the same permission model as normal pages
>> (i.e. a read-only page has read-only tags), so the normal methods of
>> detecting guest changes to pages can be used. But this would also
>> require the tags within a page to be migrated at the same time as the
>> data (since the access control for tags is the same as the normal data
>> within a page).
> 
> (Without understanding anything about your tag system...)
> 
> Note that during (normal, non-postcopy) migration the consistency can
> be a little loose - until the guest starts running; i.e. you can send
> a page that's in themiddle of being modified as long as you make sure
> you send it again later so that what the guest sees on the destination
> when it runs is consistent; i.e. it would be fine to send your tags
> separately to your data and allow them to get a little out of sync, as
> long as they caught up before the guest ran.

Yes, you can obviously pro-actively send data early as you as you 
appropriately deal with any potential changes that the guest might make. 
I'm not very familiar with exactly how QEMU handles this, so it might 
not be a problem - I just wanted to point out that we don't have 
separate access permissions.

>> (3) may be problematic and I'd welcome input from those familiar with
>> VMMs. User space cannot access tags unless the memory is mapped with the
>> PROT_MTE flag. However enabling PROT_MTE will also enable tag checking
>> for the user space process (assuming the VMM enables tag checking for
>> the process) and since the tags in memory are controlled by the guest
>> it's unlikely the VMM would have an appropriately tagged pointer for its
>> access. This means the VMM would either need to maintain two mappings of
>> memory (one to access tags, the other to access data) or disable tag
>> checking during the accesses to data.
> 
> Imagine I had a second mapping; what would it look like; how would I get
> and restore the tags?

At a very simple level you could do something like:

  normal_mapping = mmap(..., PROT_READ | PROT_WRITE, ..., fd, 0);
  mte_mapping = mmap(..., PROT_READ | PROT_WRITE | PROT_MTE, ..., fd, 0);

  /* access normal mapping as normal */
  normal_mapping[offset] = 0xf00 + normal_mapping[offset + 1];

  /* read tag from mte_mapping */
  uint64_t tag = ldg(&mte_mapping[offset]);

  /* write a new tag value (8)
   * NOTE: the allocation tag lives in bits 59:56 of the pointer, hence
   * the shift by 56
   */
  stg(0x8ULL << 56, &mte_mapping[offset]);

Where stg() and ldg() are simple wrappers around the new instructions:

  stg:
         STG x0, [x1]
         RET

  ldg:
         LDG x0, [x0]
         RET
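
In C those wrappers could be written as inline asm along these lines (a 
rough, untested sketch; it assumes an assembler that accepts the memtag 
extension directive):

  #include <stdint.h>

  /* Return 'addr' with its allocation tag merged into bits 59:56
   * (LDG leaves the other bits of the register unchanged). */
  static inline uint64_t ldg(void *addr)
  {
          uint64_t ptr = (uint64_t)addr;

          asm volatile(".arch_extension memtag\n"
                       "ldg %0, [%0]"
                       : "+r" (ptr) : : "memory");
          return ptr;
  }

  /* Store the tag held in bits 59:56 of 'tagged' for the 16-byte
   * granule containing 'addr'. */
  static inline void stg(uint64_t tagged, void *addr)
  {
          asm volatile(".arch_extension memtag\n"
                       "stg %0, [%1]"
                       : : "r" (tagged), "r" (addr) : "memory");
  }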

> In terms of migration stream, I guess we have two ways to do this,
> either it rides shotgun on the main RAM section pages, transmitting
> those few extra bytes whenever we transmit a page, or you have a
> separate iteratable device for RAMtags, and it just transmits those.
> How you keep the two together is an interesting question.
> The shotgun method sounds nasty to avoid putting special cases in the,
> already hairy, RAM code.

As you say above it may be possible to simply let the normal RAM and 
tags get out of sync. E.g. if you send all the normal RAM (marking 
read-only as you go), then all the tags (not changing the permissions) 
you will end up with all the pages that have remained read-only (i.e. 
the guest hasn't modified) being consistent on the destination. Pages 
that have been written by the guest will be inconsistent, but you were 
going to have to resend those anyway.

However for postcopy migration you need to copy *both* the normal RAM 
and the tags for a page before the guest accesses that page on the 
destination. You might need special cases for this.

>> If it's not practical to either disable tag checking in the VMM or
>> maintain multiple mappings then the alternatives I'm aware of are:
>>
>>   * Provide a KVM-specific method to extract the tags from guest memory.
>>     This might also have benefits in terms of providing an easy way to
>>     read bulk tag data from guest memory (since the LDGM instruction
>>     isn't available at EL0).
>>   * Provide support for user space setting the TCMA0 or TCMA1 bits in
>>     TCR_EL1. These would allow the VMM to generate pointers which are not
>>     tag checked.
> 
> I guess you want the VMM to do as much tagged checked access as possible
> on it's own data structures?

Ideally yes, you would want the VMM to have checked accesses for all 
its internal data structures because that gives the maximum benefit 
from MTE.

> How do things like virtio work where the qemu or kernel is accessing
> guest memory for IO?

Since virtio is effectively emulating a device it should be treated like 
a device - no tag checking and no tag storage used. This would be the 
obvious situation where you would use "normal_mapping" as above so tags 
wouldn't be visible or checked.

Really the VMM is only interested in guest tags for the migration case 
where it simply needs to preserve them. I don't expect the guest and VMM 
(or hypervisor) to communicate using tagged memory.

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-04 16:00 ` Steven Price
  (?)
  (?)
@ 2020-09-09 15:25   ` Andrew Jones
  -1 siblings, 0 replies; 96+ messages in thread
From: Andrew Jones @ 2020-09-09 15:25 UTC (permalink / raw)
  To: Steven Price
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon,
	Dr. David Alan Gilbert, Peter Maydell, qemu-devel, Dave Martin,
	Juan Quintela, Richard Henderson, linux-kernel, Thomas Gleixner,
	kvmarm, linux-arm-kernel

On Fri, Sep 04, 2020 at 05:00:16PM +0100, Steven Price wrote:
> Arm's Memory Tagging Extension (MTE) adds 4 bits of tag data to every 16
> bytes of memory in the system. This along with stashing a tag within the
> high bit of virtual addresses allows runtime checking of memory
> accesses.
> 
> These patches add support to KVM to enable MTE within a guest. They are
> based on Catalin's v9 MTE user-space support series[1].
> 
> I'd welcome feedback on the proposed user-kernel ABI. Specifically this
> series currently:
>
   0. Feature probing

Probably a KVM cap, rather than requiring userspace to attempt VCPU
features one at a time with a scratch VCPU.
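
Something the VMM could then check up front, e.g. (hypothetical cap 
name, nothing is defined yet):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* kvm_fd is the fd returned by open("/dev/kvm") */
  if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_MTE) > 0) {
          /* safe to request the MTE feature when creating VCPUs */
  }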
 
>  1. Requires the VMM to enable MTE per-VCPU.

I suppose. We're collecting many KVM features that enable CPU features,
so they map nicely to VCPU features, yet they're effectively VM features
due to a shared resource such as an irq or memory.

>  2. Automatically promotes (normal host) memory given to the guest to be
>     tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
>     tags are cleared if the memory wasn't previously MTE enabled.

Shouldn't this be up to the guest? Or is this required in order for the
guest to use tagging at all? Something like making the guest IPAs memtag
capable, but if the guest doesn't enable tagging then there is no guest
impact? In any case, shouldn't userspace be the one that adds PROT_MTE
to the memory regions it wants the guest to be able to use tagging with,
rather than KVM adding the attribute page by page?

>  3. Doesn't provide any new methods for the VMM to access the tags on
>     memory.
> 
> (2) and (3) are particularly interesting from the aspect of VM migration.
> The guest is able to store/retrieve data in the tags (presumably for the
> purpose of tag checking, but architecturally it could be used as just
> storage). This means that when migrating a guest the data needs to be
> transferred (or saved/restored).
> 
> MTE tags are controlled by the same permission model as normal pages
> (i.e. a read-only page has read-only tags), so the normal methods of
> detecting guest changes to pages can be used. But this would also
> require the tags within a page to be migrated at the same time as the
> data (since the access control for tags is the same as the normal data
> within a page).
> 
> (3) may be problematic and I'd welcome input from those familiar with
> VMMs. User space cannot access tags unless the memory is mapped with the
> PROT_MTE flag. However enabling PROT_MTE will also enable tag checking
> for the user space process (assuming the VMM enables tag checking for
> the process) and since the tags in memory are controlled by the guest
> it's unlikely the VMM would have an appropriately tagged pointer for its
> access. This means the VMM would either need to maintain two mappings of
> memory (one to access tags, the other to access data) or disable tag
> checking during the accesses to data.

If userspace needs to write to guest memory then it should be due to
a device DMA or other specific hardware emulation. Those accesses can
be done with tag checking disabled.
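
For example, the thread doing the emulation could run with tag check 
faults disabled (using the prctl() interface from the MTE user-space 
series) while still keeping its PROT_MTE mapping around, roughly:

  #include <sys/prctl.h>

  /* Tagged addresses stay enabled, but loads/stores don't fault on
   * tag mismatch (PR_MTE_TCF_NONE). This is a per-thread setting. */
  prctl(PR_SET_TAGGED_ADDR_CTRL,
        PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_NONE, 0, 0, 0);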

> 
> If it's not practical to either disable tag checking in the VMM or
> maintain multiple mappings then the alternatives I'm aware of are:
> 
>  * Provide a KVM-specific method to extract the tags from guest memory.
>    This might also have benefits in terms of providing an easy way to
>    read bulk tag data from guest memory (since the LDGM instruction
>    isn't available at EL0).

Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
the tags for all addresses of each dirty page.
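
Purely as a strawman, the uapi could look something like this (nothing 
of the sort exists today):

  struct kvm_dirty_log_mte {
          struct kvm_dirty_log log; /* existing dirty bitmap request */
          __u64 tag_buf_size;
          __u64 tag_buf;            /* userspace buffer: one tag byte per
                                       16 bytes of each dirty page, in
                                       bitmap order */
  };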

>  * Provide support for user space setting the TCMA0 or TCMA1 bits in
>    TCR_EL1. These would allow the VMM to generate pointers which are not
>    tag checked.

So this is necessary to allow the VMM to keep tag checking enabled for
itself, plus map guest memory as PROT_MTE, and write to that memory when
needed? 

Thanks,
drew

> 
> Feedback is welcome, and feel free to ask questions if anything in the
> above doesn't make sense.
> 
> Changes since the previous v1 posting[2]:
> 
>  * Rebasing clean-ups
>  * sysreg visibility is now controlled based on whether the VCPU has MTE
>    enabled or not
> 
> [1] https://lore.kernel.org/r/20200904103029.32083-1-catalin.marinas@arm.com
> [2] https://lore.kernel.org/r/20200713100102.53664-1-steven.price%40arm.com
> 
> Steven Price (2):
>   arm64: kvm: Save/restore MTE registers
>   arm64: kvm: Introduce MTE VCPU feature
> 
>  arch/arm64/include/asm/kvm_emulate.h       |  3 +++
>  arch/arm64/include/asm/kvm_host.h          |  9 ++++++++-
>  arch/arm64/include/asm/sysreg.h            |  3 ++-
>  arch/arm64/include/uapi/asm/kvm.h          |  1 +
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++
>  arch/arm64/kvm/mmu.c                       | 15 +++++++++++++++
>  arch/arm64/kvm/reset.c                     |  8 ++++++++
>  arch/arm64/kvm/sys_regs.c                  | 20 +++++++++++++++-----
>  8 files changed, 66 insertions(+), 7 deletions(-)
> 
> -- 
> 2.20.1
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-09-04 16:00   ` Steven Price
  (?)
  (?)
@ 2020-09-09 15:48     ` Andrew Jones
  -1 siblings, 0 replies; 96+ messages in thread
From: Andrew Jones @ 2020-09-09 15:48 UTC (permalink / raw)
  To: Steven Price
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon,
	Dr. David Alan Gilbert, Peter Maydell, qemu-devel, Dave Martin,
	Juan Quintela, Richard Henderson, linux-kernel, Thomas Gleixner,
	kvmarm, linux-arm-kernel

On Fri, Sep 04, 2020 at 05:00:18PM +0100, Steven Price wrote:
> Add a new VCPU features 'KVM_ARM_VCPU_MTE' which enables memory tagging
> on a VCPU. When enabled on any VCPU in the virtual machine this causes
> all pages that are faulted into the VM to have the PG_mte_tagged flag
> set (and the tag storage cleared if this is the first use).
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |  3 +++
>  arch/arm64/include/asm/kvm_host.h    |  5 ++++-
>  arch/arm64/include/uapi/asm/kvm.h    |  1 +
>  arch/arm64/kvm/mmu.c                 | 15 +++++++++++++++
>  arch/arm64/kvm/reset.c               |  8 ++++++++
>  arch/arm64/kvm/sys_regs.c            |  6 +++++-
>  6 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 49a55be2b9a2..0042323a4b7f 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>  	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
>  	    vcpu_el1_is_32bit(vcpu))
>  		vcpu->arch.hcr_el2 |= HCR_TID2;
> +
> +	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
> +		vcpu->arch.hcr_el2 |= HCR_ATA;
>  }
>  
>  static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 4f4360dd149e..b1190366242b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -37,7 +37,7 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> -#define KVM_VCPU_MAX_FEATURES 7
> +#define KVM_VCPU_MAX_FEATURES 8
>  
>  #define KVM_REQ_SLEEP \
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> @@ -110,6 +110,9 @@ struct kvm_arch {
>  	 * supported.
>  	 */
>  	bool return_nisv_io_abort_to_user;
> +
> +	/* If any VCPU has MTE enabled then all memory must be MTE enabled */
> +	bool vcpu_has_mte;

It looks like this is unnecessary as it's only used once, where a feature
check could be used.
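
i.e. something along the lines of the sketch below (names are just for
illustration, not a real interface):

  static bool kvm_vcpu_has_mte(struct kvm_vcpu *vcpu)
  {
          return system_supports_mte() &&
                 test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features);
  }

  /* ... and in user_mem_abort(): */
  if (kvm_vcpu_has_mte(vcpu) && pfn_valid(pfn)) {
          ...
  }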

>  };
>  
>  struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index ba85bb23f060..2677e1ab8c16 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -106,6 +106,7 @@ struct kvm_regs {
>  #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
>  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
>  #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
> +#define KVM_ARM_VCPU_MTE		7 /* VCPU supports Memory Tagging */
>  
>  struct kvm_vcpu_init {
>  	__u32 target;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ba00bcc0c884..e8891bacd76f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1949,6 +1949,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>  							   &pfn, &fault_ipa);
> +	if (system_supports_mte() && kvm->arch.vcpu_has_mte && pfn_valid(pfn)) {
> +		/*
> +		 * VM will be able to see the page's tags, so we must ensure
> +		 * they have been initialised.
> +		 */
> +		struct page *page = pfn_to_page(pfn);
> +		long i, nr_pages = compound_nr(page);
> +
> +		/* if PG_mte_tagged is set, tags have already been initialised */
> +		for (i = 0; i < nr_pages; i++, page++) {
> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> +				mte_clear_page_tags(page_address(page));
> +		}
> +	}
> +
>  	if (writable)
>  		kvm_set_pfn_dirty(pfn);
>  
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index ee33875c5c2a..82f3883d717f 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -274,6 +274,14 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> +	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features)) {
> +		if (!system_supports_mte()) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +		vcpu->kvm->arch.vcpu_has_mte = true;
> +	}

We either need a KVM cap or a new CPU feature probing interface to avoid
making userspace try features one at a time. It's too bad that VCPU_INIT
doesn't clear all offending features from the feature set when returning
EINVAL, because then userspace could create a scratch VCPU with everything
it supports in order to see what KVM also supports in one go.
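
For reference, today's one-at-a-time probing looks roughly like this
(error handling omitted; scratch vm/vcpu fds assumed to already exist):

  struct kvm_vcpu_init init;

  ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init);
  init.features[KVM_ARM_VCPU_MTE / 32] |= 1U << (KVM_ARM_VCPU_MTE % 32);
  if (ioctl(scratch_vcpu_fd, KVM_ARM_VCPU_INIT, &init) == 0)
          ; /* MTE is supported */
  else if (errno == EINVAL)
          ; /* unsupported - and we learn nothing about any other
               feature bits we set, hence the request above */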

> +
>  	switch (vcpu->arch.target) {
>  	default:
>  		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index a655f172b5ad..6a971b201e81 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
>  			val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
>  		val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
>  	} else if (id == SYS_ID_AA64PFR1_EL1) {
> -		val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
> +		if (!test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
> +			val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
>  	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
>  		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
>  			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
> @@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
>  				   const struct sys_reg_desc *rd)
>  {
> +	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
> +		return 0;
> +
>  	return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
>  }
>  
> -- 
> 2.20.1
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>

Thanks,
drew


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-09-09 15:48     ` Andrew Jones
  (?)
  (?)
@ 2020-09-09 15:53       ` Peter Maydell
  -1 siblings, 0 replies; 96+ messages in thread
From: Peter Maydell @ 2020-09-09 15:53 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Steven Price, Peter Maydell, Juan Quintela, Catalin Marinas,
	Richard Henderson, QEMU Developers, Dr. David Alan Gilbert,
	kvmarm, arm-mail-list, Marc Zyngier, Thomas Gleixner,
	Will Deacon, Dave Martin, lkml - Kernel Mailing List

On Wed, 9 Sep 2020 at 16:48, Andrew Jones <drjones@redhat.com> wrote:
> We either need a KVM cap or a new CPU feature probing interface to avoid
> making userspace try features one at a time. It's too bad that VCPU_INIT
> doesn't clear all offending features from the feature set when returning
> EINVAL, because then userspace could create a scratch VCPU with everything
> it supports in order to see what KVM also supports in one go.

You could add one if you wanted -- add a new feature bit
TELL_ME_WHAT_YOU_HAVE. If the kernel sees that, then on failure
it clears out feature bits it doesn't support and also clears
TELL_ME_WHAT_YOU_HAVE. If QEMU sees EINVAL and TELL_ME_WHAT_YOU_HAVE
is still set, then it knows it's dealing with an old kernel
and has to do one-at-a-time probing. If it sees EINVAL but not
TELL_ME_WHAT_YOU_HAVE then it knows it has a new kernel and
has just got all the info.
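
(As a sketch of the QEMU side, with made-up names - BIT() is just
1U << n:)

  init.features[0] = qemu_supported_features | BIT(TELL_ME_WHAT_YOU_HAVE);
  if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init) < 0 && errno == EINVAL) {
      if (init.features[0] & BIT(TELL_ME_WHAT_YOU_HAVE)) {
          /* old kernel: fall back to one-at-a-time probing */
      } else {
          /* new kernel: init.features[] now holds only what KVM supports */
      }
  }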

-- PMM

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-09 15:25   ` Andrew Jones
@ 2020-09-09 16:04     ` Steven Price
  -1 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-09 16:04 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon,
	Dr. David Alan Gilbert, Peter Maydell, qemu-devel, Dave Martin,
	Juan Quintela, Richard Henderson, linux-kernel, Thomas Gleixner,
	kvmarm, linux-arm-kernel

On 09/09/2020 16:25, Andrew Jones wrote:
> On Fri, Sep 04, 2020 at 05:00:16PM +0100, Steven Price wrote:
>> Arm's Memory Tagging Extension (MTE) adds 4 bits of tag data to every 16
>> bytes of memory in the system. This along with stashing a tag within the
>> high bit of virtual addresses allows runtime checking of memory
>> accesses.
>>
>> These patches add support to KVM to enable MTE within a guest. They are
>> based on Catalin's v9 MTE user-space support series[1].
>>
>> I'd welcome feedback on the proposed user-kernel ABI. Specifically this
>> series currently:
>>
>     0. Feature probing
> 
> Probably a KVM cap, rather than requiring userspace to attempt VCPU
> features one at a time with a scratch VCPU.

Ah, good point - thanks for pointing that out.

>>   1. Requires the VMM to enable MTE per-VCPU.
> 
> I suppose. We're collecting many features that are enabling CPU features,
> so they map nicely to VCPU features, yet they're effectively VM features
> due to a shared resource such as an irq or memory.

Yeah this is a little weird I'll admit. The architectural feature is 
described per-CPU (well "processing element"), but it makes little sense 
to have it only on some CPUs, and it has effects on the rest of the memory 
system. Given that it's theoretically possible to build e.g. a 
big.LITTLE setup with only some CPUs supporting MTE, it seemed more 
future-proof to design the API to allow it even though I hope no-one 
will use it.

>>   2. Automatically promotes (normal host) memory given to the guest to be
>>      tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
>>      tags are cleared if the memory wasn't previously MTE enabled.
> 
> Shouldn't this be up to the guest? Or, is this required in order for the
> guest to use tagging at all. Something like making the guest IPAs memtag
> capable, but if the guest doesn't enable tagging then there is no guest
> impact? In any case, shouldn't userspace be the one that adds PROT_MTE
> to the memory regions it wants the guest to be able to use tagging with,
> rather than KVM adding the attribute page by page?

I think I've probably explained this badly.

The guest can choose how to populate the stage 1 mapping - so can choose 
which parts of memory are accessed tagged or not. However, the 
hypervisor cannot restrict this in stage 2 (except by e.g. making the 
memory uncached but that's obviously not great - however devices forwarded 
to the guest can be handled like this).

Because the hypervisor cannot restrict the guest's access to the tags, 
the hypervisor must assume that all memory given to the guest could have 
the tags accessed. So it must (a) clear any stale data from the tags, 
and (b) ensure that the tags are preserved (e.g. when swapping pages out).

Because of the above the current series automatically sets PG_mte_tagged 
on the pages. Note that this doesn't change the mappings that the VMM 
has (a non-PROT_MTE mapping will still not have access to the tags).

It's a shame that the stage-2 can't usefully restrict tag access, but 
this matches the architectural expectation: that if MTE is supported 
then all standard memory will be MTE-enabled.
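
To make that concrete, the "promotion" step amounts to something like 
the sketch below. This is illustrative only (not the actual patch), 
assuming the PG_mte_tagged page flag and an mte_clear_page_tags() 
helper along the lines of Catalin's user-space series:

  #include <linux/mm.h>
  #include <asm/mte.h>

  /* Illustrative: make a page handed to the guest safe to tag-access. */
  static void promote_guest_page(struct page *page)
  {
          if (!test_bit(PG_mte_tagged, &page->flags)) {
                  /* Scrub stale tag storage before the guest can read it. */
                  mte_clear_page_tags(page_address(page));
                  set_bit(PG_mte_tagged, &page->flags);
          }
  }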

>>   3. Doesn't provide any new methods for the VMM to access the tags on
>>      memory.
>>
>> (2) and (3) are particularly interesting from the aspect of VM migration.
>> The guest is able to store/retrieve data in the tags (presumably for the
>> purpose of tag checking, but architecturally it could be used as just
>> storage). This means that when migrating a guest the data needs to be
>> transferred (or saved/restored).
>>
>> MTE tags are controlled by the same permission model as normal pages
>> (i.e. a read-only page has read-only tags), so the normal methods of
>> detecting guest changes to pages can be used. But this would also
>> require the tags within a page to be migrated at the same time as the
>> data (since the access control for tags is the same as the normal data
>> within a page).
>>
>> (3) may be problematic and I'd welcome input from those familiar with
>> VMMs. User space cannot access tags unless the memory is mapped with the
>> PROT_MTE flag. However enabling PROT_MTE will also enable tag checking
>> for the user space process (assuming the VMM enables tag checking for
>> the process) and since the tags in memory are controlled by the guest
>> it's unlikely the VMM would have an appropriately tagged pointer for its
>> access. This means the VMM would either need to maintain two mappings of
>> memory (one to access tags, the other to access data) or disable tag
>> checking during the accesses to data.
> 
> If userspace needs to write to guest memory then it should be due to
> a device DMA or other specific hardware emulation. Those accesses can
> be done with tag checking disabled.

Yes, the question is whether the VMM can (sensibly) wrap the accesses in a 
disable/re-enable tag checking sequence for the process. The alternative 
at the moment is to maintain a separate (untagged) mapping for the 
purpose, which might present its own problems.
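
For illustration, the "wrap the access" option might look roughly like 
the sketch below, assuming the PR_MTE_TCF_* prctl() controls from 
Catalin's user-space series. Whether toggling the mode around every 
access like this is cheap and safe enough is exactly the open question:

  #include <string.h>
  #include <sys/prctl.h>

  /* Sketch: write to PROT_MTE guest memory with tag checks disabled. */
  static void vmm_write_guest(void *guest_va, const void *buf, size_t len)
  {
          int old = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);

          /* Switch this thread to "ignore tag check faults"... */
          prctl(PR_SET_TAGGED_ADDR_CTRL,
                (old & ~PR_MTE_TCF_MASK) | PR_MTE_TCF_NONE, 0, 0, 0);

          memcpy(guest_va, buf, len);

          /* ...and restore the previous tag check mode. */
          prctl(PR_SET_TAGGED_ADDR_CTRL, old, 0, 0, 0);
  }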

>>
>> If it's not practical to either disable tag checking in the VMM or
>> maintain multiple mappings then the alternatives I'm aware of are:
>>
>>   * Provide a KVM-specific method to extract the tags from guest memory.
>>     This might also have benefits in terms of providing an easy way to
>>     read bulk tag data from guest memory (since the LDGM instruction
>>     isn't available at EL0).
> 
> Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
> the tags for all addresses of each dirty page.

Certainly possible, although it seems to conflate two operations: "get 
list of dirty pages", "get tags from page". It would also require a lot 
of return space (size of slot/32).

>>   * Provide support for user space setting the TCMA0 or TCMA1 bits in
>>     TCR_EL1. These would allow the VMM to generate pointers which are not
>>     tag checked.
> 
> So this is necessary to allow the VMM to keep tag checking enabled for
> itself, plus map guest memory as PROT_MTE, and write to that memory when
> needed?

This is certainly one option. The architecture provides two "magic" 
values (all-0s and all-1s) which can be configured using TCMAx to be 
treated differently. The VMM could therefore construct pointers to 
otherwise tagged memory which would be treated as untagged.

However, Catalin's user space series doesn't at the moment expose this 
functionality.
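
If it were exposed, the VMM side would be as simple as constructing a 
pointer whose logical tag is the "magic" all-zeros value (sketch only, 
and only meaningful with TCMA0 set for the process):

  #include <stdint.h>

  /* With TCMA0 set, a pointer with logical tag 0b0000 is never checked. */
  static inline void *untagged(void *p)
  {
          return (void *)((uintptr_t)p & ~(0xfULL << 56));   /* clear bits 59:56 */
  }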

Steve

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-04 16:00 ` Steven Price
@ 2020-09-10  0:33   ` Richard Henderson
  -1 siblings, 0 replies; 96+ messages in thread
From: Richard Henderson @ 2020-09-10  0:33 UTC (permalink / raw)
  To: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, kvmarm,
	linux-arm-kernel, linux-kernel, Dave Martin, Mark Rutland,
	Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Peter Maydell, Haibo Xu

On 9/4/20 9:00 AM, Steven Price wrote:
>  3. Doesn't provide any new methods for the VMM to access the tags on
>     memory.
...
> (3) may be problematic and I'd welcome input from those familiar with
> VMMs. User space cannot access tags unless the memory is mapped with the
> PROT_MTE flag. However enabling PROT_MTE will also enable tag checking
> for the user space process (assuming the VMM enables tag checking for
> the process)...

The latest version of the kernel patches for user mte support has separate
controls for how tag check fail is reported.  Including

> +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults

That may be less than optimal once userland starts using tags itself, e.g.
running qemu itself with an mte-aware malloc.

Independent of that, there's also the TCO bit, which can be toggled by any
piece of code that wants to disable checking locally.

However, none of that is required for accessing tags.  User space can always
load/store tags via LDG/STG.  That's going to be slow, though.
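
For reference, reading one allocation tag from user space is just the
following (untested sketch; needs a memtag-capable assembler and a
PROT_MTE mapping of the address):

  #include <stdint.h>

  /* LDG inserts the allocation tag for 'addr' into bits 59:56 of %0. */
  static inline uint8_t load_alloc_tag(const void *addr)
  {
          uint64_t t = (uint64_t)addr;

          asm volatile(".arch_extension memtag\n\t"
                       "ldg %0, [%1]"
                       : "+r"(t)
                       : "r"(addr));
          return (t >> 56) & 0xf;
  }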

It's a shame that LDGM/STGM are privileged instructions.  I don't understand
why that was done, since there's absolutely nothing that those insns can do
that you can't do with (up to) 16x LDG/STG.

I think it might be worth adding some sort of kernel entry point that can bulk
copy tags, e.g. page aligned quantities.  But that's just a speed of migration
thing and could come later.


r~

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-09 15:25   ` Andrew Jones
@ 2020-09-10  1:45     ` Richard Henderson
  -1 siblings, 0 replies; 96+ messages in thread
From: Richard Henderson @ 2020-09-10  1:45 UTC (permalink / raw)
  To: Andrew Jones, Steven Price
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon,
	Dr. David Alan Gilbert, Peter Maydell, qemu-devel, Dave Martin,
	Juan Quintela, linux-kernel, Thomas Gleixner, kvmarm,
	linux-arm-kernel

On 9/9/20 8:25 AM, Andrew Jones wrote:
>>  * Provide a KVM-specific method to extract the tags from guest memory.
>>    This might also have benefits in terms of providing an easy way to
>>    read bulk tag data from guest memory (since the LDGM instruction
>>    isn't available at EL0).
> 
> Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
> the tags for all addresses of each dirty page.

KVM_GET_DIRTY_LOG just provides one bit per dirty page, no?  Then the VMM copies
the data out from its local address to guest memory.

There'd be no difference with or without tags, afaik.  It's just about how VMM
copies the data, with or without tags.

>>  * Provide support for user space setting the TCMA0 or TCMA1 bits in
>>    TCR_EL1. These would allow the VMM to generate pointers which are not
>>    tag checked.
> 
> So this is necessary to allow the VMM to keep tag checking enabled for
> itself, plus map guest memory as PROT_MTE, and write to that memory when
> needed? 

I don't see a requirement for the VMM to set TCMA0.


r~

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-10  1:45     ` Richard Henderson
@ 2020-09-10  5:44       ` Andrew Jones
  -1 siblings, 0 replies; 96+ messages in thread
From: Andrew Jones @ 2020-09-10  5:44 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Steven Price, Peter Maydell, Juan Quintela, Catalin Marinas,
	qemu-devel, Dr. David Alan Gilbert, kvmarm, linux-arm-kernel,
	Marc Zyngier, Thomas Gleixner, Will Deacon, Dave Martin,
	linux-kernel

On Wed, Sep 09, 2020 at 06:45:33PM -0700, Richard Henderson wrote:
> On 9/9/20 8:25 AM, Andrew Jones wrote:
> >>  * Provide a KVM-specific method to extract the tags from guest memory.
> >>    This might also have benefits in terms of providing an easy way to
> >>    read bulk tag data from guest memory (since the LDGM instruction
> >>    isn't available at EL0).
> > 
> > Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
> > the tags for all addresses of each dirty page.
> 
> KVM_GET_DIRTY_LOG just provides one bit per dirty page, no?  Then VMM copies
> the data out from its local address to guest memory.
> 
> There'd be no difference with or without tags, afaik.  It's just about how VMM
> copies the data, with or without tags.

Right, as long as it's fast enough to do

  for_each_dirty_page(page, dirty_log)
    for (i = 0; i < host-page-size; i += 16)
      append_tag(LDG(page + i))

to get all the tags for each dirty page. I understood it would be faster
to use LDGM, but we'd need a new ioctl for that. So I was proposing we
just piggyback on a new dirty-log ioctl instead.
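
Purely to illustrate the shape of such an interface (nothing like this
exists today; the names and layout below are made up), it could be the
existing dirty log plus a buffer for packed tags:

  #include <linux/types.h>

  /* Hypothetical: KVM_GET_DIRTY_LOG with tag data appended. */
  struct kvm_dirty_log_mte {
          __u32 slot;
          __u32 pad;
          __u64 dirty_bitmap;   /* userspace pointer: one bit per page */
          __u64 tag_buf;        /* userspace pointer: page_size/32 bytes per dirty page */
          __u64 tag_buf_size;
  };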

Thanks,
drew 


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-09 16:04     ` Steven Price
@ 2020-09-10  6:29       ` Andrew Jones
  -1 siblings, 0 replies; 96+ messages in thread
From: Andrew Jones @ 2020-09-10  6:29 UTC (permalink / raw)
  To: Steven Price
  Cc: Peter Maydell, Juan Quintela, Catalin Marinas, Richard Henderson,
	qemu-devel, Dr. David Alan Gilbert, kvmarm, linux-arm-kernel,
	Marc Zyngier, Thomas Gleixner, Will Deacon, Dave Martin,
	linux-kernel

On Wed, Sep 09, 2020 at 05:04:15PM +0100, Steven Price wrote:
> On 09/09/2020 16:25, Andrew Jones wrote:
> > On Fri, Sep 04, 2020 at 05:00:16PM +0100, Steven Price wrote:
> > >   2. Automatically promotes (normal host) memory given to the guest to be
> > >      tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
> > >      tags are cleared if the memory wasn't previously MTE enabled.
> > 
> > Shouldn't this be up to the guest? Or, is this required in order for the
> > guest to use tagging at all. Something like making the guest IPAs memtag
> > capable, but if the guest doesn't enable tagging then there is no guest
> > impact? In any case, shouldn't userspace be the one that adds PROT_MTE
> > to the memory regions it wants the guest to be able to use tagging with,
> > rather than KVM adding the attribute page by page?
> 
> I think I've probably explained this badly.
> 
> The guest can choose how to populate the stage 1 mapping - so can choose
> which parts of memory are accessed tagged or not. However, the hypervisor
> cannot restrict this in stage 2 (except by e.g. making the memory uncached
> but that's obviously not great - however devices forward to the guest can be
> handled like this).
> 
> Because the hypervisor cannot restrict the guest's access to the tags, the
> hypervisor must assume that all memory given to the guest could have the
> tags accessed. So it must (a) clear any stale data from the tags, and (b)
> ensure that the tags are preserved (e.g. when swapping pages out).
> 

Yes, this is how I understood it.

> Because of the above the current series automatically sets PG_mte_tagged on
> the pages. Note that this doesn't change the mappings that the VMM has (a
> non-PROT_MTE mapping will still not have access to the tags).

But if userspace created the memslots with memory already set with
PROT_MTE, then this wouldn't be necessary, right? And, as long as
there's still a way to access the memory with tag checking disabled,
then it shouldn't be a problem.

> > 
> > If userspace needs to write to guest memory then it should be due to
> > a device DMA or other specific hardware emulation. Those accesses can
> > be done with tag checking disabled.
> 
> Yes, the question is can the VMM (sensibly) wrap the accesses with a
> disable/renable tag checking for the process sequence. The alternative at
> the moment is to maintain a separate (untagged) mapping for the purpose
> which might present it's own problems.

Hmm, so there's no easy way to disable tag checking when necessary? If we
don't map the guest ram with PROT_MTE and continue setting the attribute
in KVM, as this series does, then we don't need to worry about it tag
checking when accessing the memory, but then we can't access the tags for
migration.

> 
> > > 
> > > If it's not practical to either disable tag checking in the VMM or
> > > maintain multiple mappings then the alternatives I'm aware of are:
> > > 
> > >   * Provide a KVM-specific method to extract the tags from guest memory.
> > >     This might also have benefits in terms of providing an easy way to
> > >     read bulk tag data from guest memory (since the LDGM instruction
> > >     isn't available at EL0).
> > 
> > Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
> > the tags for all addresses of each dirty page.
> 
> Certainly possible, although it seems to conflate two operations: "get list
> of dirty pages", "get tags from page". It would also require a lot of return
> space (size of slot/32).
>

It would require num-set-bits * host-page-size / 16 / 2, right?
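
(Concretely: for a 4KiB page that's 4096/16 = 256 granules at 4 bits
each, i.e. 128 bytes of tag data per dirty page.)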
 
> > >   * Provide support for user space setting the TCMA0 or TCMA1 bits in
> > >     TCR_EL1. These would allow the VMM to generate pointers which are not
> > >     tag checked.
> > 
> > So this is necessary to allow the VMM to keep tag checking enabled for
> > itself, plus map guest memory as PROT_MTE, and write to that memory when
> > needed?
> 
> This is certainly one option. The architecture provides two "magic" values
> (all-0s and all-1s) which can be configured using TCMAx to be treated
> differently. The VMM could therefore construct pointers to otherwise tagged
> memory which would be treated as untagged.
> 
> However, Catalin's user space series doesn't at the moment expose this
> functionality.
>

So if I understand correctly this would allow us to map the guest memory
with PROT_MTE and still access the memory when needed. If so, then this
sounds interesting.

Thanks,
drew 


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-09-09 15:53       ` Peter Maydell
  (?)
  (?)
@ 2020-09-10  6:38         ` Andrew Jones
  -1 siblings, 0 replies; 96+ messages in thread
From: Andrew Jones @ 2020-09-10  6:38 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Peter Maydell, lkml - Kernel Mailing List, Juan Quintela,
	Catalin Marinas, Richard Henderson, QEMU Developers,
	Steven Price, arm-mail-list, Marc Zyngier, Thomas Gleixner,
	Will Deacon, kvmarm, Dr. David Alan Gilbert, Dave Martin

On Wed, Sep 09, 2020 at 04:53:02PM +0100, Peter Maydell wrote:
> On Wed, 9 Sep 2020 at 16:48, Andrew Jones <drjones@redhat.com> wrote:
> > We either need a KVM cap or a new CPU feature probing interface to avoid
> > making userspace try features one at a time. It's too bad that VCPU_INIT
> > doesn't clear all offending features from the feature set when returning
> > EINVAL, because then userspace could create a scratch VCPU with everything
> > it supports in order to see what KVM also supports in one go.
> 
> You could add one if you wanted -- add a new feature bit
> TELL_ME_WHAT_YOU_HAVE. If the kernel sees that then on failure
> it clears out feature bits it doesn't support and also clears
> TELL_ME_WHAT_YOU_HAVE. If QEMU sees EINVAL and TELL_ME_WHAT_YOU_HAVE
> is still set, then it knows it's dealing with an old kernel
> and has to do one-at-a-time probing. If it sees EINVAL but not
> TELL_ME_WHAT_YOU_HAVE then it knows it has a new kernel and
> has just got all the info.
>

That's a great proposal. I'll try to find time to send the patches.
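
Roughly what I'd expect the userspace side to then look like - purely a
sketch, the TELL_ME_WHAT_YOU_HAVE bit and the set_all_optional_features()
helper below are hypothetical and don't exist anywhere yet:

/* Needs <linux/kvm.h>, <sys/ioctl.h>, <errno.h>; vcpu_fd is an already
 * created VCPU fd. KVM_ARM_VCPU_TELL_ME_WHAT_YOU_HAVE is made up. */
struct kvm_vcpu_init init = { .target = KVM_ARM_TARGET_GENERIC_V8 };

set_all_optional_features(&init);  /* every feature userspace knows about */
init.features[0] |= 1u << KVM_ARM_VCPU_TELL_ME_WHAT_YOU_HAVE;

if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init) < 0 && errno == EINVAL) {
	if (init.features[0] & (1u << KVM_ARM_VCPU_TELL_ME_WHAT_YOU_HAVE)) {
		/* old kernel: fall back to one-at-a-time probing */
	} else {
		/* new kernel: init.features now reflects what KVM supports */
	}
}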

Thanks,
drew


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-10  6:29       ` Andrew Jones
  (?)
  (?)
@ 2020-09-10  9:21         ` Steven Price
  -1 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-10  9:21 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Peter Maydell, Juan Quintela, Catalin Marinas, Richard Henderson,
	qemu-devel, Dr. David Alan Gilbert, kvmarm, linux-arm-kernel,
	Marc Zyngier, Thomas Gleixner, Will Deacon, Dave Martin,
	linux-kernel

On 10/09/2020 07:29, Andrew Jones wrote:
> On Wed, Sep 09, 2020 at 05:04:15PM +0100, Steven Price wrote:
>> On 09/09/2020 16:25, Andrew Jones wrote:
>>> On Fri, Sep 04, 2020 at 05:00:16PM +0100, Steven Price wrote:
>>>>    2. Automatically promotes (normal host) memory given to the guest to be
>>>>       tag enabled (sets PG_mte_tagged), if any VCPU has MTE enabled. The
>>>>       tags are cleared if the memory wasn't previously MTE enabled.
>>>
>>> Shouldn't this be up to the guest? Or, is this required in order for the
>>> guest to use tagging at all. Something like making the guest IPAs memtag
>>> capable, but if the guest doesn't enable tagging then there is no guest
>>> impact? In any case, shouldn't userspace be the one that adds PROT_MTE
>>> to the memory regions it wants the guest to be able to use tagging with,
>>> rather than KVM adding the attribute page by page?
>>
>> I think I've probably explained this badly.
>>
>> The guest can choose how to populate the stage 1 mapping - so can choose
>> which parts of memory are accessed tagged or not. However, the hypervisor
>> cannot restrict this in stage 2 (except by e.g. making the memory uncached
>> but that's obviously not great - however devices forwarded to the guest can be
>> handled like this).
>>
>> Because the hypervisor cannot restrict the guest's access to the tags, the
>> hypervisor must assume that all memory given to the guest could have the
>> tags accessed. So it must (a) clear any stale data from the tags, and (b)
>> ensure that the tags are preserved (e.g. when swapping pages out).
>>
> 
> Yes, this is how I understood it.

Ok, I've obviously misunderstood your comment instead ;)

>> Because of the above the current series automatically sets PG_mte_tagged on
>> the pages. Note that this doesn't change the mappings that the VMM has (a
>> non-PROT_MTE mapping will still not have access to the tags).
> 
> But if userspace created the memslots with memory already set with
> PROT_MTE, then this wouldn't be necessary, right? And, as long as
> there's still a way to access the memory with tag checking disabled,
> then it shouldn't be a problem.

Yes, so one option would be to attempt to validate that the VMM has 
provided memory pages with the PG_mte_tagged bit set (e.g. by mapping 
with PROT_MTE). The tricky part here is that we support KVM_CAP_SYNC_MMU 
which means that the VMM can change the memory backing at any time - so 
we could end up in user_mem_abort() discovering that a page doesn't have 
PG_mte_tagged set - at that point there's no nice way of handling it 
(other than silently upgrading the page) so the VM is dead.

So since enforcing that PG_mte_tagged is set isn't easy and provides a 
hard-to-debug foot gun to the VMM I decided the better option was to let 
the kernel set the bit automatically.

>>>
>>> If userspace needs to write to guest memory then it should be due to
>>> a device DMA or other specific hardware emulation. Those accesses can
>>> be done with tag checking disabled.
>>
>> Yes, the question is whether the VMM can (sensibly) wrap the accesses in a
>> disable/re-enable tag checking sequence for the process. The alternative at
>> the moment is to maintain a separate (untagged) mapping for the purpose,
>> which might present its own problems.
> 
> Hmm, so there's no easy way to disable tag checking when necessary? If we
> don't map the guest RAM with PROT_MTE and continue setting the attribute
> in KVM, as this series does, then we don't need to worry about tag
> checking when accessing the memory, but then we can't access the tags for
> migration.

There's a "TCO" (Tag Check Override) bit in PSTATE which allows 
disabling tag checking, so if it's reasonable to wrap accesses to the 
memory you can simply set the TCO bit, perform the memory access and 
then unset TCO. That would mean a single mapping with MTE enabled would 
work fine. What I don't have a clue about is whether it's practical in 
the VMM to wrap guest accesses like this.
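
Something like this is the sort of wrapping I mean - just a sketch, and it
assumes a toolchain new enough to accept the MTE "tco" pstate name in the
asm (otherwise the raw MSR-immediate encoding would be needed):

#include <stddef.h>
#include <string.h>

/* Sketch only: suppress tag checks for the current thread around an
 * access to guest memory by toggling PSTATE.TCO. */
static inline void set_tco(int on)
{
	if (on)
		asm volatile("msr tco, #1" ::: "memory");
	else
		asm volatile("msr tco, #0" ::: "memory");
}

static void vmm_write_guest(void *guest_va, const void *src, size_t len)
{
	set_tco(1);                  /* tag checking off           */
	memcpy(guest_va, src, len);  /* access via the MTE mapping */
	set_tco(0);                  /* tag checking back on       */
}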

>>
>>>>
>>>> If it's not practical to either disable tag checking in the VMM or
>>>> maintain multiple mappings then the alternatives I'm aware of are:
>>>>
>>>>    * Provide a KVM-specific method to extract the tags from guest memory.
>>>>      This might also have benefits in terms of providing an easy way to
>>>>      read bulk tag data from guest memory (since the LDGM instruction
>>>>      isn't available at EL0).
>>>
>>> Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
>>> the tags for all addresses of each dirty page.
>>
>> Certainly possible, although it seems to conflate two operations: "get list
>> of dirty pages", "get tags from page". It would also require a lot of return
>> space (size of slot/32).
>>
> 
> It would require num-set-bits * host-page-size / 16 / 2, right?

Yes, where the worst case is all bits set, which is size/32. Since you
don't know at the time of the call how many bits are going to be set, I'm
not sure how you would design the API so that it doesn't require
preallocating for the worst case.

>>>>    * Provide support for user space setting the TCMA0 or TCMA1 bits in
>>>>      TCR_EL1. These would allow the VMM to generate pointers which are not
>>>>      tag checked.
>>>
>>> So this is necessary to allow the VMM to keep tag checking enabled for
>>> itself, plus map guest memory as PROT_MTE, and write to that memory when
>>> needed?
>>
>> This is certainly one option. The architecture provides two "magic" values
>> (all-0s and all-1s) which can be configured using TCMAx to be treated
>> differently. The VMM could therefore construct pointers to otherwise tagged
>> memory which would be treated as untagged.
>>
>> However, Catalin's user space series doesn't at the moment expose this
>> functionality.
>>
> 
> So if I understand correctly this would allow us to map the guest memory
> with PROT_MTE and still access the memory when needed. If so, then this
> sounds interesting.

Yes - you could derive a pointer which didn't perform tag checking. Note 
that this also requires the rest of user space to play along (i.e. 
understand that the tag value is reserved). I believe for user space we 
have to use the all-0s value which means that a standard pointer 
(top-byte is 0) would be unchecked.
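
i.e. something along these lines - sketch only, and it assumes both that
the all-0s match-all value is what gets exposed and that the logical tag
sits in the top byte of the pointer:

#include <stdint.h>

/* Derive an unchecked alias of a tagged pointer by clearing the top
 * byte, giving it the all-0s "match all" tag (relies on TCMA0 being
 * set for the process, which isn't currently exposed to user space). */
static inline void *untag_ptr(void *p)
{
	return (void *)((uintptr_t)p & ~(0xffULL << 56));
}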

Steve

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-09-09 15:48     ` Andrew Jones
  (?)
  (?)
@ 2020-09-10  9:21       ` Steven Price
  -1 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-10  9:21 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon,
	Dr. David Alan Gilbert, Peter Maydell, qemu-devel, Dave Martin,
	Juan Quintela, Richard Henderson, linux-kernel, Thomas Gleixner,
	kvmarm, linux-arm-kernel

On 09/09/2020 16:48, Andrew Jones wrote:
> On Fri, Sep 04, 2020 at 05:00:18PM +0100, Steven Price wrote:
>> Add a new VCPU features 'KVM_ARM_VCPU_MTE' which enables memory tagging
>> on a VCPU. When enabled on any VCPU in the virtual machine this causes
>> all pages that are faulted into the VM to have the PG_mte_tagged flag
>> set (and the tag storage cleared if this is the first use).
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   arch/arm64/include/asm/kvm_emulate.h |  3 +++
>>   arch/arm64/include/asm/kvm_host.h    |  5 ++++-
>>   arch/arm64/include/uapi/asm/kvm.h    |  1 +
>>   arch/arm64/kvm/mmu.c                 | 15 +++++++++++++++
>>   arch/arm64/kvm/reset.c               |  8 ++++++++
>>   arch/arm64/kvm/sys_regs.c            |  6 +++++-
>>   6 files changed, 36 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 49a55be2b9a2..0042323a4b7f 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>>   	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
>>   	    vcpu_el1_is_32bit(vcpu))
>>   		vcpu->arch.hcr_el2 |= HCR_TID2;
>> +
>> +	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
>> +		vcpu->arch.hcr_el2 |= HCR_ATA;
>>   }
>>   
>>   static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 4f4360dd149e..b1190366242b 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -37,7 +37,7 @@
>>   
>>   #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>>   
>> -#define KVM_VCPU_MAX_FEATURES 7
>> +#define KVM_VCPU_MAX_FEATURES 8
>>   
>>   #define KVM_REQ_SLEEP \
>>   	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>> @@ -110,6 +110,9 @@ struct kvm_arch {
>>   	 * supported.
>>   	 */
>>   	bool return_nisv_io_abort_to_user;
>> +
>> +	/* If any VCPU has MTE enabled then all memory must be MTE enabled */
>> +	bool vcpu_has_mte;
> 
> It looks like this is unnecessary as it's only used once, where a feature
> check could be used.

It's used in user_mem_abort(), which runs every time we fault a page into
the VM - so having to iterate over all VCPUs to check whether any have the
feature bit set seems too expensive.

Although perhaps I should just accept that this is realistically a VM 
setting and move it out of the VCPU.

>>   };
>>   
>>   struct kvm_vcpu_fault_info {
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>> index ba85bb23f060..2677e1ab8c16 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -106,6 +106,7 @@ struct kvm_regs {
>>   #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
>>   #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
>>   #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
>> +#define KVM_ARM_VCPU_MTE		7 /* VCPU supports Memory Tagging */
>>   
>>   struct kvm_vcpu_init {
>>   	__u32 target;
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index ba00bcc0c884..e8891bacd76f 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1949,6 +1949,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>   		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>   							   &pfn, &fault_ipa);
>> +	if (system_supports_mte() && kvm->arch.vcpu_has_mte && pfn_valid(pfn)) {
>> +		/*
>> +		 * VM will be able to see the page's tags, so we must ensure
>> +		 * they have been initialised.
>> +		 */
>> +		struct page *page = pfn_to_page(pfn);
>> +		long i, nr_pages = compound_nr(page);
>> +
>> +		/* if PG_mte_tagged is set, tags have already been initialised */
>> +		for (i = 0; i < nr_pages; i++, page++) {
>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>> +				mte_clear_page_tags(page_address(page));
>> +		}
>> +	}
>> +
>>   	if (writable)
>>   		kvm_set_pfn_dirty(pfn);
>>   
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index ee33875c5c2a..82f3883d717f 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -274,6 +274,14 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>>   		}
>>   	}
>>   
>> +	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features)) {
>> +		if (!system_supports_mte()) {
>> +			ret = -EINVAL;
>> +			goto out;
>> +		}
>> +		vcpu->kvm->arch.vcpu_has_mte = true;
>> +	}
> 
> We either need a KVM cap or a new CPU feature probing interface to avoid
> making userspace try features one at a time. It's too bad that VCPU_INIT
> doesn't clear all offending features from the feature set when returning
> EINVAL, because then userspace could create a scratch VCPU with everything
> it supports in order to see what KVM also supports in one go.

If Peter's TELL_ME_WHAT_YOU_HAVE idea works out then perhaps we don't 
need the cap? Or would it still be useful?

Thanks,

Steve

>> +
>>   	switch (vcpu->arch.target) {
>>   	default:
>>   		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index a655f172b5ad..6a971b201e81 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
>>   			val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
>>   		val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
>>   	} else if (id == SYS_ID_AA64PFR1_EL1) {
>> -		val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
>> +		if (!test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
>> +			val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
>>   	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
>>   		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
>>   			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |
>> @@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>>   static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
>>   				   const struct sys_reg_desc *rd)
>>   {
>> +	if (test_bit(KVM_ARM_VCPU_MTE, vcpu->arch.features))
>> +		return 0;
>> +
>>   	return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
>>   }
>>   
>> -- 
>> 2.20.1
>>
>> _______________________________________________
>> kvmarm mailing list
>> kvmarm@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>>
> 
> Thanks,
> drew
> 


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-09-10  6:38         ` Andrew Jones
@ 2020-09-10 10:01           ` Andrew Jones
  -1 siblings, 0 replies; 96+ messages in thread
From: Andrew Jones @ 2020-09-10 10:01 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Peter Maydell, Juan Quintela, Catalin Marinas, Richard Henderson,
	QEMU Developers, lkml - Kernel Mailing List,
	Dr. David Alan Gilbert, Marc Zyngier, Thomas Gleixner,
	Steven Price, Will Deacon, kvmarm, arm-mail-list, Dave Martin

On Thu, Sep 10, 2020 at 08:38:54AM +0200, Andrew Jones wrote:
> On Wed, Sep 09, 2020 at 04:53:02PM +0100, Peter Maydell wrote:
> > On Wed, 9 Sep 2020 at 16:48, Andrew Jones <drjones@redhat.com> wrote:
> > > We either need a KVM cap or a new CPU feature probing interface to avoid
> > > making userspace try features one at a time. It's too bad that VCPU_INIT
> > > doesn't clear all offending features from the feature set when returning
> > > EINVAL, because then userspace could create a scratch VCPU with everything
> > > it supports in order to see what KVM also supports in one go.
> > 
> > You could add one if you wanted -- add a new feature bit
> > TELL_ME_WHAT_YOU_HAVE. If the kernel sees that then on failure
> > it clears out feature bits it doesn't support and also clears
> > TELL_ME_WHAT_YOU_HAVE. If QEMU sees EINVAL and TELL_ME_WHAT_YOU_HAVE
> > is still set, then it knows it's dealing with an old kernel
> > and has to do one-at-a-time probing. If it sees EINVAL but not
> > TELL_ME_WHAT_YOU_HAVE then it knows it has a new kernel and
> > has just got all the info.
> >
> 
> That's a great proposal. I'll try to find time to send the patches.
>

We also have KVM_ARM_PREFERRED_TARGET, which is documented as

"""
...
The ioctl returns struct kvm_vcpu_init instance containing information
about preferred CPU target type and recommended features for it.  The
kvm_vcpu_init->features bitmap returned will have feature bits set if
the preferred target recommends setting these features, but this is
not mandatory.
...
"""

But, it says "recommended" features, not "all supported" features,
and the current implementation of KVM_ARM_PREFERRED_TARGET only
zeros out features. So, I think we should just leave
KVM_ARM_PREFERRED_TARGET as is and stick to the plan of extending
VCPU_INIT.
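
Just to sketch what that extended VCPU_INIT could look like from the VMM
side (rough and untested; the TELL_ME_WHAT_YOU_HAVE bit and its number
are stand-ins for Peter's proposal, not an existing KVM feature):

  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Hypothetical probe bit -- does not exist in the kernel today. */
  #define KVM_ARM_VCPU_TELL_ME_WHAT_YOU_HAVE  8

  /*
   * Return the subset of 'wanted' features[0] bits the kernel accepts,
   * using a single scratch VCPU rather than one VCPU_INIT per feature.
   */
  static __u32 probe_vcpu_features(int vm_fd, int scratch_vcpu_fd, __u32 wanted)
  {
      struct kvm_vcpu_init init;

      if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init) < 0)
          return 0;

      init.features[0] = wanted | (1u << KVM_ARM_VCPU_TELL_ME_WHAT_YOU_HAVE);

      if (ioctl(scratch_vcpu_fd, KVM_ARM_VCPU_INIT, &init) == 0)
          return wanted;               /* everything accepted as-is */

      if (errno == EINVAL &&
          !(init.features[0] & (1u << KVM_ARM_VCPU_TELL_ME_WHAT_YOU_HAVE)))
          return init.features[0];     /* new kernel filtered the set */

      return 0;                        /* old kernel: probe one at a time */
  }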

Thanks,
drew


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-10  0:33   ` Richard Henderson
@ 2020-09-10 10:24     ` Steven Price
  -1 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-10 10:24 UTC (permalink / raw)
  To: Richard Henderson, Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, kvmarm,
	linux-arm-kernel, linux-kernel, Dave Martin, Mark Rutland,
	Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Peter Maydell, Haibo Xu

On 10/09/2020 01:33, Richard Henderson wrote:
> On 9/4/20 9:00 AM, Steven Price wrote:
>>   3. Doesn't provide any new methods for the VMM to access the tags on
>>      memory.
> ...
>> (3) may be problematic and I'd welcome input from those familiar with
>> VMMs. User space cannot access tags unless the memory is mapped with the
>> PROT_MTE flag. However enabling PROT_MTE will also enable tag checking
>> for the user space process (assuming the VMM enables tag checking for
>> the process)...
> 
> The latest version of the kernel patches for user mte support has separate
> controls for how tag check fail is reported.  Including
> 
>> +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
> 
> That may be less than optimal once userland starts using tags itself, e.g.
> running qemu itself with an mte-aware malloc.
> 
> Independent of that, there's also the TCO bit, which can be toggled by any
> piece of code that wants to disable checking locally.

Yes, I would expect the TCO bit is the best option for wrapping accesses 
to make them unchecked.
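
Something like this rough (untested) sketch, assuming a toolchain that
knows the MTE PSTATE field (e.g. built with -march=armv8.5-a+memtag):

  #include <stddef.h>
  #include <string.h>

  /* Copy guest data without tripping tag checks by toggling PSTATE.TCO
   * around the access. */
  static void copy_unchecked(void *dst, const void *src, size_t len)
  {
      asm volatile("msr tco, #1" ::: "memory");   /* tag checks off */
      memcpy(dst, src, len);
      asm volatile("msr tco, #0" ::: "memory");   /* tag checks back on */
  }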

> However, none of that is required for accessing tags.  User space can always
> load/store tags via LDG/STG.  That's going to be slow, though.

Yes as things stand LDG/STG is the way for user space to access tags. 
Since I don't have any real hardware I can't really comment on speed.
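
For reference, reading a tag back with LDG might look something like the
following (untested sketch; it only works on memory the VMM has mapped
with PROT_MTE and needs an MTE-aware toolchain):

  #include <stdint.h>

  /* Read the allocation tag of the 16-byte granule containing 'addr'.
   * LDG writes the tag into bits [59:56] of the destination register
   * and leaves the other bits alone. */
  static inline uint8_t load_tag(const void *addr)
  {
      uint64_t t = (uintptr_t)addr;

      asm("ldg %0, [%0]" : "+r"(t));
      return (t >> 56) & 0xf;
  }

The restore side of a migration would be the matching STG loop, storing
one tag back per granule.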

> It's a shame that LDGM/STGM are privileged instructions.  I don't understand
> why that was done, since there's absolutely nothing that those insns can do
> that you can't do with (up to) 16x LDG/STG.

It is a shame, however I suspect this is because to use those 
instructions you need to know the block size held in GMID_EL1. And at 
least in theory that could vary between CPUs.

> I think it might be worth adding some sort of kernel entry point that can bulk
> copy tags, e.g. page aligned quantities.  But that's just a speed of migration
> thing and could come later.

When we have some real hardware it would be worth profiling this. At the 
moment I've no idea whether the kernel entry overhead would make such an 
interface useful from a performance perspective or not.

Steve

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature
  2020-09-10  9:21       ` Steven Price
@ 2020-09-10 11:49         ` Andrew Jones
  -1 siblings, 0 replies; 96+ messages in thread
From: Andrew Jones @ 2020-09-10 11:49 UTC (permalink / raw)
  To: Steven Price
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon,
	Dr. David Alan Gilbert, Peter Maydell, qemu-devel, Dave Martin,
	Juan Quintela, Richard Henderson, linux-kernel, Thomas Gleixner,
	kvmarm, linux-arm-kernel

On Thu, Sep 10, 2020 at 10:21:07AM +0100, Steven Price wrote:
> > We either need a KVM cap or a new CPU feature probing interface to avoid
> > making userspace try features one at a time. It's too bad that VCPU_INIT
> > doesn't clear all offending features from the feature set when returning
> > EINVAL, because then userspace could create a scratch VCPU with everything
> > it supports in order to see what KVM also supports in one go.
> 
> If Peter's TELL_ME_WHAT_YOU_HAVE idea works out then perhaps we don't need
> the cap? Or would it still be useful?
>

We wouldn't need it, but we don't _need_ it now either. It's not very
convenient to probe vcpu features with scratch vcpus, especially if we
must probe one at a time, but it works. The TELL_ME_WHAT_YOU_HAVE idea
will only fix the one-at-a-time issue, but will still require a vcpu fd. If
this feature becomes a VM feature then a cap or VM-level API would help
reduce the userspace probing work.

Thanks,
drew


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-10  5:44       ` Andrew Jones
@ 2020-09-10 13:27         ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 96+ messages in thread
From: Dr. David Alan Gilbert @ 2020-09-10 13:27 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Richard Henderson, Steven Price, Peter Maydell, Juan Quintela,
	Catalin Marinas, qemu-devel, kvmarm, linux-arm-kernel,
	Marc Zyngier, Thomas Gleixner, Will Deacon, Dave Martin,
	linux-kernel

* Andrew Jones (drjones@redhat.com) wrote:
> On Wed, Sep 09, 2020 at 06:45:33PM -0700, Richard Henderson wrote:
> > On 9/9/20 8:25 AM, Andrew Jones wrote:
> > >>  * Provide a KVM-specific method to extract the tags from guest memory.
> > >>    This might also have benefits in terms of providing an easy way to
> > >>    read bulk tag data from guest memory (since the LDGM instruction
> > >>    isn't available at EL0).
> > > 
> > > Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
> > > the tags for all addresses of each dirty page.
> > 
> > KVM_GET_DIRTY_LOG just provides one bit per dirty page, no?  Then VMM copies
> > the data out from its local address to guest memory.
> > 
> > There'd be no difference with or without tags, afaik.  It's just about how VMM
> > copies the data, with or without tags.
> 
> Right, as long as it's fast enough to do
> 
>   for_each_dirty_page(page, dirty_log)
>     for (i = 0; i < host-page-size; i += 16)
>       append_tag(LDG(page + i))
> 
> to get all the tags for each dirty page. I understood it would be faster
> to use LDGM, but we'd need a new ioctl for that. So I was proposing we
> just piggyback on a new dirty-log ioctl instead.

That feels like a bad idea to me; there are a couple of different ways dirty
page checking can work; let's keep extracting the tags separate.

Dave

> Thanks,
> drew 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-10 13:27         ` Dr. David Alan Gilbert
@ 2020-09-10 13:39           ` Andrew Jones
  -1 siblings, 0 replies; 96+ messages in thread
From: Andrew Jones @ 2020-09-10 13:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Richard Henderson, Steven Price, Peter Maydell, Juan Quintela,
	Catalin Marinas, qemu-devel, kvmarm, linux-arm-kernel,
	Marc Zyngier, Thomas Gleixner, Will Deacon, Dave Martin,
	linux-kernel

On Thu, Sep 10, 2020 at 02:27:48PM +0100, Dr. David Alan Gilbert wrote:
> * Andrew Jones (drjones@redhat.com) wrote:
> > On Wed, Sep 09, 2020 at 06:45:33PM -0700, Richard Henderson wrote:
> > > On 9/9/20 8:25 AM, Andrew Jones wrote:
> > > >>  * Provide a KVM-specific method to extract the tags from guest memory.
> > > >>    This might also have benefits in terms of providing an easy way to
> > > >>    read bulk tag data from guest memory (since the LDGM instruction
> > > >>    isn't available at EL0).
> > > > 
> > > > Maybe we need a new version of KVM_GET_DIRTY_LOG that also provides
> > > > the tags for all addresses of each dirty page.
> > > 
> > > KVM_GET_DIRTY_LOG just provides one bit per dirty page, no?  Then VMM copies
> > > the data out from its local address to guest memory.
> > > 
> > > There'd be no difference with or without tags, afaik.  It's just about how VMM
> > > copies the data, with or without tags.
> > 
> > Right, as long as it's fast enough to do
> > 
> >   for_each_dirty_page(page, dirty_log)
> >     for (i = 0; i < host-page-size; i += 16)
> >       append_tag(LDG(page + i))
> > 
> > to get all the tags for each dirty page. I understood it would be faster
> > to use LDGM, but we'd need a new ioctl for that. So I was proposing we
> > just piggyback on a new dirty-log ioctl instead.
> 
> That feels like a bad idea to me; there are a couple of different ways
> dirty page checking can work; let's keep extracting the tags separate.
>

It's sounding like it was a premature optimization anyway. We don't yet
know if an ioctl for LDGM is worth it. Looping over LDG may work fine.

Thanks,
drew 
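
Should an LDGM-based kernel interface turn out to be worthwhile after
all, the user-space view of it might be shaped roughly like the sketch
below. The ioctl name and structure here are purely hypothetical
(nothing like this is defined in the series); they only illustrate the
"one tag byte per 16-byte granule" shape such an interface would likely
have.

  #include <stdint.h>

  /* Hypothetical bulk tag read: guest physical range in, packed tags
   * out (len/16 tag bytes written to the user buffer). */
  struct kvm_arm_mte_tags {
          uint64_t guest_ipa;   /* start of range, 16-byte aligned  */
          uint64_t len;         /* length of the range in bytes     */
          uint64_t tags_addr;   /* user buffer of len/16 tag bytes  */
          uint64_t reserved[2];
  };

  /* hypothetical: a KVM_ARM_GET_MTE_TAGS ioctl would take the struct
   * above on a VM fd and fill the user buffer from the guest's tags */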


* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-10  9:21         ` Steven Price
@ 2020-09-10 13:56           ` Andrew Jones
  -1 siblings, 0 replies; 96+ messages in thread
From: Andrew Jones @ 2020-09-10 13:56 UTC (permalink / raw)
  To: Steven Price
  Cc: Peter Maydell, linux-kernel, Juan Quintela, Catalin Marinas,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Marc Zyngier, Thomas Gleixner, Will Deacon, kvmarm,
	linux-arm-kernel, Dave Martin

On Thu, Sep 10, 2020 at 10:21:04AM +0100, Steven Price wrote:
> On 10/09/2020 07:29, Andrew Jones wrote:
> > But if userspace created the memslots with memory already set with
> > PROT_MTE, then this wouldn't be necessary, right? And, as long as
> > there's still a way to access the memory with tag checking disabled,
> > then it shouldn't be a problem.
> 
> Yes, so one option would be to attempt to validate that the VMM has provided
> memory pages with the PG_mte_tagged bit set (e.g. by mapping with PROT_MTE).
> The tricky part here is that we support KVM_CAP_SYNC_MMU which means that
> the VMM can change the memory backing at any time - so we could end up in
> user_mem_abort() discovering that a page doesn't have PG_mte_tagged set - at
> that point there's no nice way of handling it (other than silently upgrading
> the page) so the VM is dead.
> 
> So since enforcing that PG_mte_tagged is set isn't easy and provides a
> hard-to-debug foot gun to the VMM I decided the better option was to let the
> kernel set the bit automatically.
>

The foot gun still exists when migration is considered, no? If userspace
is telling a guest it can use MTE on its normal memory, but then doesn't
prepare that memory correctly, or remember to migrate the tags correctly
(which requires knowing the memory has tags and knowing how to get them),
then I guess the VM is in trouble one way or another.

I feel like we should trust the VMM to ensure MTE will work on any memory
the guest could use it on, and change the action in user_mem_abort() to
abort the guest with a big error message if it sees the flag is missing.
 
> > > > 
> > > > If userspace needs to write to guest memory then it should be due to
> > > > a device DMA or other specific hardware emulation. Those accesses can
> > > > be done with tag checking disabled.
> > > 
> > > Yes, the question is whether the VMM can (sensibly) wrap the accesses with a
> > > disable/re-enable tag checking sequence for the process. The alternative at
> > > the moment is to maintain a separate (untagged) mapping for the purpose,
> > > which might present its own problems.
> > 
> > Hmm, so there's no easy way to disable tag checking when necessary? If we
> > don't map the guest RAM with PROT_MTE and continue setting the attribute
> > in KVM, as this series does, then we don't need to worry about tag
> > checking when accessing the memory, but then we can't access the tags for
> > migration.
> 
> There's a "TCO" (Tag Check Override) bit in PSTATE which allows disabling
> tag checking, so if it's reasonable to wrap accesses to the memory you can
> simply set the TCO bit, perform the memory access and then unset TCO. That
> would mean a single mapping with MTE enabled would work fine. What I don't
> have a clue about is whether it's practical in the VMM to wrap guest
> accesses like this.
> 

At least QEMU goes through many abstractions to get to memory already.
There may already be a hook we could use, if not, it probably wouldn't
be too hard to add one (famous last words).

Thanks,
drew
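
For reference, the PSTATE.TCO wrapping mentioned above is small in
itself; the open question is only whether QEMU's guest-memory accesses
can all be funnelled through something like it. A minimal sketch,
assuming an MTE-capable CPU and toolchain (the helper names are
illustrative, not from QEMU):

  #include <stddef.h>
  #include <string.h>

  /* PSTATE.TCO = 1 suppresses tag checking for this thread,
   * PSTATE.TCO = 0 enforces it again; both are EL0-accessible. */
  static inline void tag_checks_off(void)
  {
          asm volatile(".arch_extension memtag\n"
                       "msr tco, #1" : : : "memory");
  }

  static inline void tag_checks_on(void)
  {
          asm volatile(".arch_extension memtag\n"
                       "msr tco, #0" : : : "memory");
  }

  /* e.g. an emulated DMA write into guest RAM from an untagged buffer */
  static void guest_write_unchecked(void *guest_va, const void *buf,
                                    size_t len)
  {
          tag_checks_off();
          memcpy(guest_va, buf, len);
          tag_checks_on();
  }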


* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-10 13:56           ` Andrew Jones
@ 2020-09-10 14:14             ` Steven Price
  -1 siblings, 0 replies; 96+ messages in thread
From: Steven Price @ 2020-09-10 14:14 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Peter Maydell, linux-kernel, Juan Quintela, Catalin Marinas,
	Richard Henderson, qemu-devel, Dr. David Alan Gilbert,
	Marc Zyngier, Thomas Gleixner, Will Deacon, kvmarm,
	linux-arm-kernel, Dave Martin

On 10/09/2020 14:56, Andrew Jones wrote:
> On Thu, Sep 10, 2020 at 10:21:04AM +0100, Steven Price wrote:
>> On 10/09/2020 07:29, Andrew Jones wrote:
>>> But if userspace created the memslots with memory already set with
>>> PROT_MTE, then this wouldn't be necessary, right? And, as long as
>>> there's still a way to access the memory with tag checking disabled,
>>> then it shouldn't be a problem.
>>
>> Yes, so one option would be to attempt to validate that the VMM has provided
>> memory pages with the PG_mte_tagged bit set (e.g. by mapping with PROT_MTE).
>> The tricky part here is that we support KVM_CAP_SYNC_MMU which means that
>> the VMM can change the memory backing at any time - so we could end up in
>> user_mem_abort() discovering that a page doesn't have PG_mte_tagged set - at
>> that point there's no nice way of handling it (other than silently upgrading
>> the page) so the VM is dead.
>>
>> So since enforcing that PG_mte_tagged is set isn't easy and provides a
>> hard-to-debug foot gun to the VMM I decided the better option was to let the
>> kernel set the bit automatically.
>>
> 
> The foot gun still exists when migration is considered, no? If userspace
> is telling a guest it can use MTE on its normal memory, but then doesn't
> prepare that memory correctly, or remember to migrate the tags correctly
> (which requires knowing the memory has tags and knowing how to get them),
> then I guess the VM is in trouble one way or another.

Well, not all VMMs support migration, and it's only migration that is 
affected by this for a simple VMM (e.g. the changes to kvmtool are 
minimal for MTE). But yes, fundamentally, if a VMM enables MTE it needs to 
know how to deal with the extra tags everywhere.

> I feel like we should trust the VMM to ensure MTE will work on any memory
> the guest could use it on, and change the action in user_mem_abort() to
> abort the guest with a big error message if it sees the flag is missing.

I'm happy to change it, if you feel this is easier to debug.

>>>>>
>>>>> If userspace needs to write to guest memory then it should be due to
>>>>> a device DMA or other specific hardware emulation. Those accesses can
>>>>> be done with tag checking disabled.
>>>>
>>>> Yes, the question is whether the VMM can (sensibly) wrap the accesses with a
>>>> disable/re-enable tag checking sequence for the process. The alternative at
>>>> the moment is to maintain a separate (untagged) mapping for the purpose,
>>>> which might present its own problems.
>>>
>>> Hmm, so there's no easy way to disable tag checking when necessary? If we
>>> don't map the guest RAM with PROT_MTE and continue setting the attribute
>>> in KVM, as this series does, then we don't need to worry about tag
>>> checking when accessing the memory, but then we can't access the tags for
>>> migration.
>>
>> There's a "TCO" (Tag Check Override) bit in PSTATE which allows disabling
>> tag checking, so if it's reasonable to wrap accesses to the memory you can
>> simply set the TCO bit, perform the memory access and then unset TCO. That
>> would mean a single mapping with MTE enabled would work fine. What I don't
>> have a clue about is whether it's practical in the VMM to wrap guest
>> accesses like this.
>>
> 
> At least QEMU goes through many abstractions to get to memory already.
> There may already be a hook we could use, if not, it probably wouldn't
> be too hard to add one (famous last words).

Sounds good. My hope was that the abstractions were already in there.

Thanks,

Steve
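
The "memslots already set up with PROT_MTE" option discussed above
would, on the VMM side, amount to little more than the sketch below. It
assumes Catalin's user-space MTE series; the PROT_MTE value is taken
from that series' uapi header and may not be in libc yet, and a real
VMM would check for MTE support before using it.

  #include <stddef.h>
  #include <sys/mman.h>

  #ifndef PROT_MTE
  #define PROT_MTE 0x20   /* arm64 value from the MTE uapi header */
  #endif

  /* Back a memslot with memory that is tag-enabled from the start,
   * so every page already has PG_mte_tagged when KVM maps it. */
  static void *alloc_guest_ram(size_t size)
  {
          void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_MTE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          return mem == MAP_FAILED ? NULL : mem;
  }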

* Re: [PATCH v2 0/2] MTE support for KVM guest
  2020-09-10 10:24     ` Steven Price
@ 2020-09-10 15:36       ` Richard Henderson
  -1 siblings, 0 replies; 96+ messages in thread
From: Richard Henderson @ 2020-09-10 15:36 UTC (permalink / raw)
  To: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, kvmarm,
	linux-arm-kernel, linux-kernel, Dave Martin, Mark Rutland,
	Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Peter Maydell, Haibo Xu

On 9/10/20 3:24 AM, Steven Price wrote:
> It is a shame; however, I suspect this is because to use those instructions you
> need to know the block size held in GMID_EL1. And at least in theory that could
> vary between CPUs.

Which is no different from having to read DCZID_EL0 in order to implement
memset, in my opinion.  But, whatever.


> When we have some real hardware it would be worth profiling this. At the moment
> I've no idea whether the kernel entry overhead would make such an interface
> useful from a performance perspective or not.

Yep.


r~
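
The memset analogy above refers to the way user space already discovers
the DC ZVA block size from DCZID_EL0, roughly as sketched below; the
difference for MTE is that GMID_EL1, which holds the LDGM/STGM block
size, is not readable at EL0 at all.

  /* DCZID_EL0[3:0] is log2 of the DC ZVA block size in words, so the
   * block size in bytes is 4 << BS; bit 4 set means DC ZVA is
   * prohibited and memset must not use it. */
  static inline unsigned long dc_zva_block_size(void)
  {
          unsigned long dczid;

          asm volatile("mrs %0, dczid_el0" : "=r" (dczid));
          return 4UL << (dczid & 0xf);
  }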


Thread overview: 96+ messages
2020-09-04 16:00 [PATCH v2 0/2] MTE support for KVM guest Steven Price
2020-09-04 16:00 ` [PATCH v2 1/2] arm64: kvm: Save/restore MTE registers Steven Price
2020-09-04 16:00 ` [PATCH v2 2/2] arm64: kvm: Introduce MTE VCPU feature Steven Price
2020-09-09 15:48   ` Andrew Jones
2020-09-09 15:53     ` Peter Maydell
2020-09-10  6:38       ` Andrew Jones
2020-09-10 10:01         ` Andrew Jones
2020-09-10  9:21     ` Steven Price
2020-09-10 11:49       ` Andrew Jones
2020-09-07 15:28 ` [PATCH v2 0/2] MTE support for KVM guest Dr. David Alan Gilbert
2020-09-09  9:15   ` Steven Price
2020-09-09 15:25 ` Andrew Jones
2020-09-09 16:04   ` Steven Price
2020-09-10  6:29     ` Andrew Jones
2020-09-10  9:21       ` Steven Price
2020-09-10 13:56         ` Andrew Jones
2020-09-10 14:14           ` Steven Price
2020-09-10  1:45   ` Richard Henderson
2020-09-10  5:44     ` Andrew Jones
2020-09-10 13:27       ` Dr. David Alan Gilbert
2020-09-10 13:39         ` Andrew Jones
2020-09-10  0:33 ` Richard Henderson
2020-09-10 10:24   ` Steven Price
2020-09-10 15:36     ` Richard Henderson
