linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v14 0/8] MTE support for KVM guest
@ 2021-06-07 11:08 Steven Price
  2021-06-07 11:08 ` [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags Steven Price
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Steven Price @ 2021-06-07 11:08 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

This series adds support for using the Arm Memory Tagging Extensions
(MTE) in a KVM guest.

Changes since v13[1]:

 * Add Reviewed-by tags from Catalin - thanks!

 * Introduce a new function mte_prepare_page_tags() for handling the
   initialisation of pages ready for a KVM guest. This takes the big
   tag_sync_lock removing any race with the VMM (or another guest)
   around clearing the tags and setting PG_mte_tagged.

 * The ioctl to fetch/store tags now returns the number of bytes
   processed so userspace can tell how far a partial fetch/store got.

 * Some minor refactoring to tidy up the code thanks to pointers from
   Catalin.

[1] https://lore.kernel.org/r/20210524104513.13258-1-steven.price%40arm.com

Catalin Marinas (1):
  arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage()

Steven Price (7):
  arm64: mte: Handle race when synchronising tags
  arm64: mte: Sync tags for pages where PTE is untagged
  KVM: arm64: Introduce MTE VM feature
  KVM: arm64: Save/restore MTE registers
  KVM: arm64: Expose KVM_ARM_CAP_MTE
  KVM: arm64: ioctl to fetch/store tags in a guest
  KVM: arm64: Document MTE capability and ioctl

 Documentation/virt/kvm/api.rst             | 57 +++++++++++++++
 arch/arm64/include/asm/kvm_arm.h           |  3 +-
 arch/arm64/include/asm/kvm_emulate.h       |  3 +
 arch/arm64/include/asm/kvm_host.h          | 12 ++++
 arch/arm64/include/asm/kvm_mte.h           | 66 +++++++++++++++++
 arch/arm64/include/asm/mte-def.h           |  1 +
 arch/arm64/include/asm/mte.h               |  8 ++-
 arch/arm64/include/asm/page.h              |  6 +-
 arch/arm64/include/asm/pgtable.h           | 22 +++++-
 arch/arm64/include/asm/sysreg.h            |  3 +-
 arch/arm64/include/uapi/asm/kvm.h          | 11 +++
 arch/arm64/kernel/asm-offsets.c            |  2 +
 arch/arm64/kernel/mte.c                    | 54 ++++++++++++--
 arch/arm64/kvm/arm.c                       | 16 +++++
 arch/arm64/kvm/guest.c                     | 82 ++++++++++++++++++++++
 arch/arm64/kvm/hyp/entry.S                 |  7 ++
 arch/arm64/kvm/hyp/exception.c             |  3 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 21 ++++++
 arch/arm64/kvm/mmu.c                       | 42 ++++++++++-
 arch/arm64/kvm/reset.c                     |  3 +-
 arch/arm64/kvm/sys_regs.c                  | 32 +++++++--
 arch/arm64/mm/fault.c                      | 21 ++++++
 include/uapi/linux/kvm.h                   |  2 +
 23 files changed, 454 insertions(+), 23 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_mte.h

-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags
  2021-06-07 11:08 [PATCH v14 0/8] MTE support for KVM guest Steven Price
@ 2021-06-07 11:08 ` Steven Price
  2021-06-09 10:30   ` Marc Zyngier
  2021-06-07 11:08 ` [PATCH v14 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage() Steven Price
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Steven Price @ 2021-06-07 11:08 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

mte_sync_tags() used test_and_set_bit() to set the PG_mte_tagged flag
before restoring/zeroing the MTE tags. However if another thread were to
race and attempt to sync the tags on the same page before the first
thread had completed restoring/zeroing then it would see the flag is
already set and continue without waiting. This would potentially expose
the previous contents of the tags to user space, and cause any updates
that user space makes before the restoring/zeroing has completed to
potentially be lost.

Since this code is run from atomic contexts we can't just lock the page
during the process. Instead implement a new (global) spinlock to protect
the mte_sync_page_tags() function.

Fixes: 34bfeea4a9e9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kernel/mte.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 125a10e413e9..a3583a7fd400 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -25,6 +25,7 @@
 u64 gcr_kernel_excl __ro_after_init;
 
 static bool report_fault_once = true;
+static DEFINE_SPINLOCK(tag_sync_lock);
 
 #ifdef CONFIG_KASAN_HW_TAGS
 /* Whether the MTE asynchronous mode is enabled. */
@@ -34,13 +35,22 @@ EXPORT_SYMBOL_GPL(mte_async_mode);
 
 static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
 {
+	unsigned long flags;
 	pte_t old_pte = READ_ONCE(*ptep);
 
+	spin_lock_irqsave(&tag_sync_lock, flags);
+
+	/* Recheck with the lock held */
+	if (test_bit(PG_mte_tagged, &page->flags))
+		goto out;
+
 	if (check_swap && is_swap_pte(old_pte)) {
 		swp_entry_t entry = pte_to_swp_entry(old_pte);
 
-		if (!non_swap_entry(entry) && mte_restore_tags(entry, page))
-			return;
+		if (!non_swap_entry(entry) && mte_restore_tags(entry, page)) {
+			set_bit(PG_mte_tagged, &page->flags);
+			goto out;
+		}
 	}
 
 	page_kasan_tag_reset(page);
@@ -53,6 +63,10 @@ static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
 	 */
 	smp_wmb();
 	mte_clear_page_tags(page_address(page));
+	set_bit(PG_mte_tagged, &page->flags);
+
+out:
+	spin_unlock_irqrestore(&tag_sync_lock, flags);
 }
 
 void mte_sync_tags(pte_t *ptep, pte_t pte)
@@ -63,7 +77,7 @@ void mte_sync_tags(pte_t *ptep, pte_t pte)
 
 	/* if PG_mte_tagged is set, tags have already been initialised */
 	for (i = 0; i < nr_pages; i++, page++) {
-		if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+		if (!test_bit(PG_mte_tagged, &page->flags))
 			mte_sync_page_tags(page, ptep, check_swap);
 	}
 }
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v14 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage()
  2021-06-07 11:08 [PATCH v14 0/8] MTE support for KVM guest Steven Price
  2021-06-07 11:08 ` [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags Steven Price
@ 2021-06-07 11:08 ` Steven Price
  2021-06-07 17:07   ` Catalin Marinas
  2021-06-07 11:08 ` [PATCH v14 3/8] arm64: mte: Sync tags for pages where PTE is untagged Steven Price
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Steven Price @ 2021-06-07 11:08 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

From: Catalin Marinas <catalin.marinas@arm.com>

Currently, on an anonymous page fault, the kernel allocates a zeroed
page and maps it in user space. If the mapping is tagged (PROT_MTE),
set_pte_at() additionally clears the tags under a spinlock to avoid a
race on the page->flags. In order to optimise the lock, clear the page
tags on allocation in __alloc_zeroed_user_highpage() if the vma flags
have VM_MTE set.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/page.h |  6 ++++--
 arch/arm64/mm/fault.c         | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 012cffc574e8..97853570d0f1 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -13,6 +13,7 @@
 #ifndef __ASSEMBLY__
 
 #include <linux/personality.h> /* for READ_IMPLIES_EXEC */
+#include <linux/types.h>
 #include <asm/pgtable-types.h>
 
 struct page;
@@ -28,8 +29,9 @@ void copy_user_highpage(struct page *to, struct page *from,
 void copy_highpage(struct page *to, struct page *from);
 #define __HAVE_ARCH_COPY_HIGHPAGE
 
-#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
-	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
+struct page *__alloc_zeroed_user_highpage(gfp_t movableflags,
+					  struct vm_area_struct *vma,
+					  unsigned long vaddr);
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 871c82ab0a30..5a03428e97f3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -921,3 +921,24 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned int esr,
 	debug_exception_exit(regs);
 }
 NOKPROBE_SYMBOL(do_debug_exception);
+
+/*
+ * Used during anonymous page fault handling.
+ */
+struct page *__alloc_zeroed_user_highpage(gfp_t movableflags,
+					  struct vm_area_struct *vma,
+					  unsigned long vaddr)
+{
+	struct page *page;
+	bool tagged = system_supports_mte() && (vma->vm_flags & VM_MTE);
+
+	page = alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma,
+			      vaddr);
+	if (tagged && page) {
+		mte_clear_page_tags(page_address(page));
+		page_kasan_tag_reset(page);
+		set_bit(PG_mte_tagged, &page->flags);
+	}
+
+	return page;
+}
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v14 3/8] arm64: mte: Sync tags for pages where PTE is untagged
  2021-06-07 11:08 [PATCH v14 0/8] MTE support for KVM guest Steven Price
  2021-06-07 11:08 ` [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags Steven Price
  2021-06-07 11:08 ` [PATCH v14 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage() Steven Price
@ 2021-06-07 11:08 ` Steven Price
  2021-06-07 11:08 ` [PATCH v14 4/8] KVM: arm64: Introduce MTE VM feature Steven Price
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Steven Price @ 2021-06-07 11:08 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

A KVM guest could store tags in a page even if the VMM hasn't mapped
the page with PROT_MTE. So when restoring pages from swap we will
need to check to see if there are any saved tags even if !pte_tagged().

However don't check pages for which pte_access_permitted() returns false
as these will not have been swapped out.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/mte.h     |  4 ++--
 arch/arm64/include/asm/pgtable.h | 22 +++++++++++++++++++---
 arch/arm64/kernel/mte.c          | 17 +++++++++++++----
 3 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index bc88a1ced0d7..347ef38a35f7 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -37,7 +37,7 @@ void mte_free_tag_storage(char *storage);
 /* track which pages have valid allocation tags */
 #define PG_mte_tagged	PG_arch_2
 
-void mte_sync_tags(pte_t *ptep, pte_t pte);
+void mte_sync_tags(pte_t old_pte, pte_t pte);
 void mte_copy_page_tags(void *kto, const void *kfrom);
 void mte_thread_init_user(void);
 void mte_thread_switch(struct task_struct *next);
@@ -53,7 +53,7 @@ int mte_ptrace_copy_tags(struct task_struct *child, long request,
 /* unused if !CONFIG_ARM64_MTE, silence the compiler */
 #define PG_mte_tagged	0
 
-static inline void mte_sync_tags(pte_t *ptep, pte_t pte)
+static inline void mte_sync_tags(pte_t old_pte, pte_t pte)
 {
 }
 static inline void mte_copy_page_tags(void *kto, const void *kfrom)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0b10204e72fc..db5402168841 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -314,9 +314,25 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 	if (pte_present(pte) && pte_user_exec(pte) && !pte_special(pte))
 		__sync_icache_dcache(pte);
 
-	if (system_supports_mte() &&
-	    pte_present(pte) && pte_tagged(pte) && !pte_special(pte))
-		mte_sync_tags(ptep, pte);
+	/*
+	 * If the PTE would provide user space access to the tags associated
+	 * with it then ensure that the MTE tags are synchronised.  Although
+	 * pte_access_permitted() returns false for exec only mappings, they
+	 * don't expose tags (instruction fetches don't check tags).
+	 */
+	if (system_supports_mte() && pte_access_permitted(pte, false) &&
+	    !pte_special(pte)) {
+		pte_t old_pte = READ_ONCE(*ptep);
+		/*
+		 * We only need to synchronise if the new PTE has tags enabled
+		 * or if swapping in (in which case another mapping may have
+		 * set tags in the past even if this PTE isn't tagged).
+		 * (!pte_none() && !pte_present()) is an open coded version of
+		 * is_swap_pte()
+		 */
+		if (pte_tagged(pte) || (!pte_none(old_pte) && !pte_present(old_pte)))
+			mte_sync_tags(old_pte, pte);
+	}
 
 	__check_racy_pte_update(mm, ptep, pte);
 
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index a3583a7fd400..ae0a3c68fece 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -33,10 +33,10 @@ DEFINE_STATIC_KEY_FALSE(mte_async_mode);
 EXPORT_SYMBOL_GPL(mte_async_mode);
 #endif
 
-static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
+static void mte_sync_page_tags(struct page *page, pte_t old_pte,
+			       bool check_swap, bool pte_is_tagged)
 {
 	unsigned long flags;
-	pte_t old_pte = READ_ONCE(*ptep);
 
 	spin_lock_irqsave(&tag_sync_lock, flags);
 
@@ -53,6 +53,9 @@ static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
 		}
 	}
 
+	if (!pte_is_tagged)
+		goto out;
+
 	page_kasan_tag_reset(page);
 	/*
 	 * We need smp_wmb() in between setting the flags and clearing the
@@ -69,16 +72,22 @@ static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
 	spin_unlock_irqrestore(&tag_sync_lock, flags);
 }
 
-void mte_sync_tags(pte_t *ptep, pte_t pte)
+void mte_sync_tags(pte_t old_pte, pte_t pte)
 {
 	struct page *page = pte_page(pte);
 	long i, nr_pages = compound_nr(page);
 	bool check_swap = nr_pages == 1;
+	bool pte_is_tagged = pte_tagged(pte);
+
+	/* Early out if there's nothing to do */
+	if (!check_swap && !pte_is_tagged)
+		return;
 
 	/* if PG_mte_tagged is set, tags have already been initialised */
 	for (i = 0; i < nr_pages; i++, page++) {
 		if (!test_bit(PG_mte_tagged, &page->flags))
-			mte_sync_page_tags(page, ptep, check_swap);
+			mte_sync_page_tags(page, old_pte, check_swap,
+					   pte_is_tagged);
 	}
 }
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v14 4/8] KVM: arm64: Introduce MTE VM feature
  2021-06-07 11:08 [PATCH v14 0/8] MTE support for KVM guest Steven Price
                   ` (2 preceding siblings ...)
  2021-06-07 11:08 ` [PATCH v14 3/8] arm64: mte: Sync tags for pages where PTE is untagged Steven Price
@ 2021-06-07 11:08 ` Steven Price
  2021-06-07 11:08 ` [PATCH v14 5/8] KVM: arm64: Save/restore MTE registers Steven Price
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Steven Price @ 2021-06-07 11:08 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging
for a VM. This will expose the feature to the guest and automatically
tag memory pages touched by the VM as PG_mte_tagged (and clear the tag
storage) to ensure that the guest cannot see stale tags, and so that
the tags are correctly saved/restored across swap.

Actually exposing the new capability to user space happens in a later
patch.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h |  3 ++
 arch/arm64/include/asm/kvm_host.h    |  3 ++
 arch/arm64/include/asm/mte.h         |  4 +++
 arch/arm64/kernel/mte.c              | 17 +++++++++++
 arch/arm64/kvm/hyp/exception.c       |  3 +-
 arch/arm64/kvm/mmu.c                 | 42 +++++++++++++++++++++++++++-
 arch/arm64/kvm/sys_regs.c            |  7 +++++
 include/uapi/linux/kvm.h             |  1 +
 8 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index f612c090f2e4..6bf776c2399c 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -84,6 +84,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
 	    vcpu_el1_is_32bit(vcpu))
 		vcpu->arch.hcr_el2 |= HCR_TID2;
+
+	if (kvm_has_mte(vcpu->kvm))
+		vcpu->arch.hcr_el2 |= HCR_ATA;
 }
 
 static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cd7d5c8c4bc..afaa5333f0e4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -132,6 +132,8 @@ struct kvm_arch {
 
 	u8 pfr0_csv2;
 	u8 pfr0_csv3;
+	/* Memory Tagging Extension enabled for the guest */
+	bool mte_enabled;
 };
 
 struct kvm_vcpu_fault_info {
@@ -769,6 +771,7 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 #define kvm_arm_vcpu_sve_finalized(vcpu) \
 	((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
 
+#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
 #define kvm_vcpu_has_pmu(vcpu)					\
 	(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
 
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 347ef38a35f7..be1de541a11c 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -37,6 +37,7 @@ void mte_free_tag_storage(char *storage);
 /* track which pages have valid allocation tags */
 #define PG_mte_tagged	PG_arch_2
 
+void mte_prepare_page_tags(struct page *page);
 void mte_sync_tags(pte_t old_pte, pte_t pte);
 void mte_copy_page_tags(void *kto, const void *kfrom);
 void mte_thread_init_user(void);
@@ -53,6 +54,9 @@ int mte_ptrace_copy_tags(struct task_struct *child, long request,
 /* unused if !CONFIG_ARM64_MTE, silence the compiler */
 #define PG_mte_tagged	0
 
+static inline void mte_prepare_page_tags(struct page *page)
+{
+}
 static inline void mte_sync_tags(pte_t old_pte, pte_t pte)
 {
 }
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index ae0a3c68fece..b120f82a2258 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -72,6 +72,23 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
 	spin_unlock_irqrestore(&tag_sync_lock, flags);
 }
 
+void mte_prepare_page_tags(struct page *page)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&tag_sync_lock, flags);
+
+	/* Recheck with the lock held */
+	if (test_bit(PG_mte_tagged, &page->flags))
+		goto out;
+
+	mte_clear_page_tags(page_address(page));
+	set_bit(PG_mte_tagged, &page->flags);
+
+out:
+	spin_unlock_irqrestore(&tag_sync_lock, flags);
+}
+
 void mte_sync_tags(pte_t old_pte, pte_t pte)
 {
 	struct page *page = pte_page(pte);
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 73629094f903..56426565600c 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -112,7 +112,8 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 	new |= (old & PSR_C_BIT);
 	new |= (old & PSR_V_BIT);
 
-	// TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
+	if (kvm_has_mte(vcpu->kvm))
+		new |= PSR_TCO_BIT;
 
 	new |= (old & PSR_DIT_BIT);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c5d1f3c87dbd..ed7c624e7362 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -822,6 +822,36 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
 	return PAGE_SIZE;
 }
 
+static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
+			     unsigned long size)
+{
+	unsigned long i, nr_pages = size >> PAGE_SHIFT;
+	struct page *page;
+
+	if (!kvm_has_mte(kvm))
+		return 0;
+
+	/*
+	 * The page will be mapped in stage 2 as Normal Cacheable, so
+	 * the VM will be able to see the page's tags and therefore
+	 * they must be initialised first. If PG_mte_tagged is set,
+	 * tags have already been initialised.
+	 * pfn_to_online_page() is used to reject ZONE_DEVICE pages
+	 * that may not support tags.
+	 */
+	page = pfn_to_online_page(pfn);
+
+	if (!page)
+		return -EFAULT;
+
+	for (i = 0; i < nr_pages; i++, page++) {
+		if (!test_bit(PG_mte_tagged, &page->flags))
+			mte_prepare_page_tags(page);
+	}
+
+	return 0;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
@@ -971,8 +1001,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
-	if (fault_status != FSC_PERM && !device)
+	if (fault_status != FSC_PERM && !device) {
+		ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
+		if (ret)
+			goto out_unlock;
+
 		clean_dcache_guest_page(pfn, vma_pagesize);
+	}
 
 	if (exec_fault) {
 		prot |= KVM_PGTABLE_PROT_X;
@@ -1168,12 +1203,17 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	kvm_pfn_t pfn = pte_pfn(range->pte);
+	int ret;
 
 	if (!kvm->arch.mmu.pgt)
 		return 0;
 
 	WARN_ON(range->end - range->start != 1);
 
+	ret = sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
+	if (ret)
+		return false;
+
 	/*
 	 * We've moved a page around, probably through CoW, so let's treat it
 	 * just like a translation fault and clean the cache to the PoC.
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 76ea2800c33e..4a98902eaf1a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1047,6 +1047,13 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 		break;
 	case SYS_ID_AA64PFR1_EL1:
 		val &= ~FEATURE(ID_AA64PFR1_MTE);
+		if (kvm_has_mte(vcpu->kvm)) {
+			u64 pfr, mte;
+
+			pfr = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
+			mte = cpuid_feature_extract_unsigned_field(pfr, ID_AA64PFR1_MTE_SHIFT);
+			val |= FIELD_PREP(FEATURE(ID_AA64PFR1_MTE), mte);
+		}
 		break;
 	case SYS_ID_AA64ISAR1_EL1:
 		if (!vcpu_has_ptrauth(vcpu))
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..8c95ba0fadda 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SGX_ATTRIBUTE 196
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_ARM_MTE 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v14 5/8] KVM: arm64: Save/restore MTE registers
  2021-06-07 11:08 [PATCH v14 0/8] MTE support for KVM guest Steven Price
                   ` (3 preceding siblings ...)
  2021-06-07 11:08 ` [PATCH v14 4/8] KVM: arm64: Introduce MTE VM feature Steven Price
@ 2021-06-07 11:08 ` Steven Price
  2021-06-07 11:08 ` [PATCH v14 6/8] KVM: arm64: Expose KVM_ARM_CAP_MTE Steven Price
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Steven Price @ 2021-06-07 11:08 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

Define the new system registers that MTE introduces and context switch
them. The MTE feature is still hidden from the ID register as it isn't
supported in a VM yet.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h           |  3 +-
 arch/arm64/include/asm/kvm_host.h          |  6 ++
 arch/arm64/include/asm/kvm_mte.h           | 66 ++++++++++++++++++++++
 arch/arm64/include/asm/sysreg.h            |  3 +-
 arch/arm64/kernel/asm-offsets.c            |  2 +
 arch/arm64/kvm/hyp/entry.S                 |  7 +++
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 21 +++++++
 arch/arm64/kvm/sys_regs.c                  | 22 ++++++--
 8 files changed, 124 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_mte.h

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 692c9049befa..d436831dd706 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -12,7 +12,8 @@
 #include <asm/types.h>
 
 /* Hyp Configuration Register (HCR) bits */
-#define HCR_ATA		(UL(1) << 56)
+#define HCR_ATA_SHIFT	56
+#define HCR_ATA		(UL(1) << HCR_ATA_SHIFT)
 #define HCR_FWB		(UL(1) << 46)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index afaa5333f0e4..309e36cc1b42 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -208,6 +208,12 @@ enum vcpu_sysreg {
 	CNTP_CVAL_EL0,
 	CNTP_CTL_EL0,
 
+	/* Memory Tagging Extension registers */
+	RGSR_EL1,	/* Random Allocation Tag Seed Register */
+	GCR_EL1,	/* Tag Control Register */
+	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
+	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
+
 	/* 32bit specific registers. Keep them at the end of the range */
 	DACR32_EL2,	/* Domain Access Control Register */
 	IFSR32_EL2,	/* Instruction Fault Status Register */
diff --git a/arch/arm64/include/asm/kvm_mte.h b/arch/arm64/include/asm/kvm_mte.h
new file mode 100644
index 000000000000..88dd1199670b
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_mte.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020-2021 ARM Ltd.
+ */
+#ifndef __ASM_KVM_MTE_H
+#define __ASM_KVM_MTE_H
+
+#ifdef __ASSEMBLY__
+
+#include <asm/sysreg.h>
+
+#ifdef CONFIG_ARM64_MTE
+
+.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
+alternative_if_not ARM64_MTE
+	b	.L__skip_switch\@
+alternative_else_nop_endif
+	mrs	\reg1, hcr_el2
+	tbz	\reg1, #(HCR_ATA_SHIFT), .L__skip_switch\@
+
+	mrs_s	\reg1, SYS_RGSR_EL1
+	str	\reg1, [\h_ctxt, #CPU_RGSR_EL1]
+	mrs_s	\reg1, SYS_GCR_EL1
+	str	\reg1, [\h_ctxt, #CPU_GCR_EL1]
+
+	ldr	\reg1, [\g_ctxt, #CPU_RGSR_EL1]
+	msr_s	SYS_RGSR_EL1, \reg1
+	ldr	\reg1, [\g_ctxt, #CPU_GCR_EL1]
+	msr_s	SYS_GCR_EL1, \reg1
+
+.L__skip_switch\@:
+.endm
+
+.macro mte_switch_to_hyp g_ctxt, h_ctxt, reg1
+alternative_if_not ARM64_MTE
+	b	.L__skip_switch\@
+alternative_else_nop_endif
+	mrs	\reg1, hcr_el2
+	tbz	\reg1, #(HCR_ATA_SHIFT), .L__skip_switch\@
+
+	mrs_s	\reg1, SYS_RGSR_EL1
+	str	\reg1, [\g_ctxt, #CPU_RGSR_EL1]
+	mrs_s	\reg1, SYS_GCR_EL1
+	str	\reg1, [\g_ctxt, #CPU_GCR_EL1]
+
+	ldr	\reg1, [\h_ctxt, #CPU_RGSR_EL1]
+	msr_s	SYS_RGSR_EL1, \reg1
+	ldr	\reg1, [\h_ctxt, #CPU_GCR_EL1]
+	msr_s	SYS_GCR_EL1, \reg1
+
+	isb
+
+.L__skip_switch\@:
+.endm
+
+#else /* CONFIG_ARM64_MTE */
+
+.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
+.endm
+
+.macro mte_switch_to_hyp g_ctxt, h_ctxt, reg1
+.endm
+
+#endif /* CONFIG_ARM64_MTE */
+#endif /* __ASSEMBLY__ */
+#endif /* __ASM_KVM_MTE_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 65d15700a168..347ccac2341e 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -651,7 +651,8 @@
 
 #define INIT_SCTLR_EL2_MMU_ON						\
 	(SCTLR_ELx_M  | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I |	\
-	 SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 | SCTLR_EL2_RES1)
+	 SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 |		\
+	 SCTLR_ELx_ITFSB | SCTLR_EL2_RES1)
 
 #define INIT_SCTLR_EL2_MMU_OFF \
 	(SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 0cb34ccb6e73..6f0044cb233e 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -111,6 +111,8 @@ int main(void)
   DEFINE(VCPU_WORKAROUND_FLAGS,	offsetof(struct kvm_vcpu, arch.workaround_flags));
   DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_cpu_context, regs));
+  DEFINE(CPU_RGSR_EL1,		offsetof(struct kvm_cpu_context, sys_regs[RGSR_EL1]));
+  DEFINE(CPU_GCR_EL1,		offsetof(struct kvm_cpu_context, sys_regs[GCR_EL1]));
   DEFINE(CPU_APIAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APIAKEYLO_EL1]));
   DEFINE(CPU_APIBKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APIBKEYLO_EL1]));
   DEFINE(CPU_APDAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APDAKEYLO_EL1]));
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index e831d3dfd50d..435346ea1504 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -13,6 +13,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_mte.h>
 #include <asm/kvm_ptrauth.h>
 
 	.text
@@ -51,6 +52,9 @@ alternative_else_nop_endif
 
 	add	x29, x0, #VCPU_CONTEXT
 
+	// mte_switch_to_guest(g_ctxt, h_ctxt, tmp1)
+	mte_switch_to_guest x29, x1, x2
+
 	// Macro ptrauth_switch_to_guest format:
 	// 	ptrauth_switch_to_guest(guest cxt, tmp1, tmp2, tmp3)
 	// The below macro to restore guest keys is not implemented in C code
@@ -142,6 +146,9 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
 	// when this feature is enabled for kernel code.
 	ptrauth_switch_to_hyp x1, x2, x3, x4, x5
 
+	// mte_switch_to_hyp(g_ctxt, h_ctxt, reg1)
+	mte_switch_to_hyp x1, x2, x3
+
 	// Restore hyp's sp_el0
 	restore_sp_el0 x2, x3
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index cce43bfe158f..de7e14c862e6 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -14,6 +14,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
@@ -26,6 +27,16 @@ static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, TPIDRRO_EL0)	= read_sysreg(tpidrro_el0);
 }
 
+static inline bool ctxt_has_mte(struct kvm_cpu_context *ctxt)
+{
+	struct kvm_vcpu *vcpu = ctxt->__hyp_running_vcpu;
+
+	if (!vcpu)
+		vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	return kvm_has_mte(kern_hyp_va(vcpu->kvm));
+}
+
 static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 {
 	ctxt_sys_reg(ctxt, CSSELR_EL1)	= read_sysreg(csselr_el1);
@@ -46,6 +57,11 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg_par();
 	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
 
+	if (ctxt_has_mte(ctxt)) {
+		ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR);
+		ctxt_sys_reg(ctxt, TFSRE0_EL1) = read_sysreg_s(SYS_TFSRE0_EL1);
+	}
+
 	ctxt_sys_reg(ctxt, SP_EL1)	= read_sysreg(sp_el1);
 	ctxt_sys_reg(ctxt, ELR_EL1)	= read_sysreg_el1(SYS_ELR);
 	ctxt_sys_reg(ctxt, SPSR_EL1)	= read_sysreg_el1(SYS_SPSR);
@@ -107,6 +123,11 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
 	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
 
+	if (ctxt_has_mte(ctxt)) {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
+		write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
+	}
+
 	if (!has_vhe() &&
 	    cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
 	    ctxt->__hyp_running_vcpu) {
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4a98902eaf1a..440315a556c2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1309,6 +1309,20 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return true;
 }
 
+static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
+				   const struct sys_reg_desc *rd)
+{
+	return REG_HIDDEN;
+}
+
+#define MTE_REG(name) {				\
+	SYS_DESC(SYS_##name),			\
+	.access = undef_access,			\
+	.reset = reset_unknown,			\
+	.reg = name,				\
+	.visibility = mte_visibility,		\
+}
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {			\
 	SYS_DESC(SYS_##name),			\
@@ -1477,8 +1491,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
 	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
 
-	{ SYS_DESC(SYS_RGSR_EL1), undef_access },
-	{ SYS_DESC(SYS_GCR_EL1), undef_access },
+	MTE_REG(RGSR_EL1),
+	MTE_REG(GCR_EL1),
 
 	{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility },
 	{ SYS_DESC(SYS_TRFCR_EL1), undef_access },
@@ -1505,8 +1519,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi },
 	{ SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi },
 
-	{ SYS_DESC(SYS_TFSR_EL1), undef_access },
-	{ SYS_DESC(SYS_TFSRE0_EL1), undef_access },
+	MTE_REG(TFSR_EL1),
+	MTE_REG(TFSRE0_EL1),
 
 	{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v14 6/8] KVM: arm64: Expose KVM_ARM_CAP_MTE
  2021-06-07 11:08 [PATCH v14 0/8] MTE support for KVM guest Steven Price
                   ` (4 preceding siblings ...)
  2021-06-07 11:08 ` [PATCH v14 5/8] KVM: arm64: Save/restore MTE registers Steven Price
@ 2021-06-07 11:08 ` Steven Price
  2021-06-07 11:08 ` [PATCH v14 7/8] KVM: arm64: ioctl to fetch/store tags in a guest Steven Price
  2021-06-07 11:08 ` [PATCH v14 8/8] KVM: arm64: Document MTE capability and ioctl Steven Price
  7 siblings, 0 replies; 20+ messages in thread
From: Steven Price @ 2021-06-07 11:08 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

It's now safe for the VMM to enable MTE in a guest, so expose the
capability to user space.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/kvm/arm.c      | 9 +++++++++
 arch/arm64/kvm/reset.c    | 3 ++-
 arch/arm64/kvm/sys_regs.c | 3 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 1cb39c0803a4..e89a5e275e25 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -93,6 +93,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		r = 0;
 		kvm->arch.return_nisv_io_abort_to_user = true;
 		break;
+	case KVM_CAP_ARM_MTE:
+		if (!system_supports_mte() || kvm->created_vcpus)
+			return -EINVAL;
+		r = 0;
+		kvm->arch.mte_enabled = true;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -237,6 +243,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		 */
 		r = 1;
 		break;
+	case KVM_CAP_ARM_MTE:
+		r = system_supports_mte();
+		break;
 	case KVM_CAP_STEAL_TIME:
 		r = kvm_arm_pvtime_supported();
 		break;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 956cdc240148..50635eacfa43 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -220,7 +220,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
-			if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1)) {
+			if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1) ||
+			    vcpu->kvm->arch.mte_enabled) {
 				ret = -EINVAL;
 				goto out;
 			}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 440315a556c2..d4e1c1b1a08d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1312,6 +1312,9 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 				   const struct sys_reg_desc *rd)
 {
+	if (kvm_has_mte(vcpu->kvm))
+		return 0;
+
 	return REG_HIDDEN;
 }
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v14 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
  2021-06-07 11:08 [PATCH v14 0/8] MTE support for KVM guest Steven Price
                   ` (5 preceding siblings ...)
  2021-06-07 11:08 ` [PATCH v14 6/8] KVM: arm64: Expose KVM_ARM_CAP_MTE Steven Price
@ 2021-06-07 11:08 ` Steven Price
  2021-06-07 17:21   ` Catalin Marinas
  2021-06-07 11:08 ` [PATCH v14 8/8] KVM: arm64: Document MTE capability and ioctl Steven Price
  7 siblings, 1 reply; 20+ messages in thread
From: Steven Price @ 2021-06-07 11:08 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

The VMM may not wish to have it's own mapping of guest memory mapped
with PROT_MTE because this causes problems if the VMM has tag checking
enabled (the guest controls the tags in physical RAM and it's unlikely
the tags are correct for the VMM).

Instead add a new ioctl which allows the VMM to easily read/write the
tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE
while the VMM can still read/write the tags for the purpose of
migration.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/include/asm/mte-def.h  |  1 +
 arch/arm64/include/uapi/asm/kvm.h | 11 +++++
 arch/arm64/kvm/arm.c              |  7 +++
 arch/arm64/kvm/guest.c            | 82 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |  1 +
 6 files changed, 105 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 309e36cc1b42..6a2ac4636d42 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -729,6 +729,9 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 			       struct kvm_device_attr *attr);
 
+long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
+				struct kvm_arm_copy_mte_tags *copy_tags);
+
 /* Guest/host FPSIMD coordination helpers */
 int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/mte-def.h b/arch/arm64/include/asm/mte-def.h
index cf241b0f0a42..626d359b396e 100644
--- a/arch/arm64/include/asm/mte-def.h
+++ b/arch/arm64/include/asm/mte-def.h
@@ -7,6 +7,7 @@
 
 #define MTE_GRANULE_SIZE	UL(16)
 #define MTE_GRANULE_MASK	(~(MTE_GRANULE_SIZE - 1))
+#define MTE_GRANULES_PER_PAGE	(PAGE_SIZE / MTE_GRANULE_SIZE)
 #define MTE_TAG_SHIFT		56
 #define MTE_TAG_SIZE		4
 #define MTE_TAG_MASK		GENMASK((MTE_TAG_SHIFT + (MTE_TAG_SIZE - 1)), MTE_TAG_SHIFT)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 24223adae150..b3edde68bc3e 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -184,6 +184,17 @@ struct kvm_vcpu_events {
 	__u32 reserved[12];
 };
 
+struct kvm_arm_copy_mte_tags {
+	__u64 guest_ipa;
+	__u64 length;
+	void __user *addr;
+	__u64 flags;
+	__u64 reserved[2];
+};
+
+#define KVM_ARM_TAGS_TO_GUEST		0
+#define KVM_ARM_TAGS_FROM_GUEST		1
+
 /* If you need to interpret the index values, here is the key: */
 #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
 #define KVM_REG_ARM_COPROC_SHIFT	16
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e89a5e275e25..baa33359e477 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1345,6 +1345,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
 		return 0;
 	}
+	case KVM_ARM_MTE_COPY_TAGS: {
+		struct kvm_arm_copy_mte_tags copy_tags;
+
+		if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
+			return -EFAULT;
+		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
+	}
 	default:
 		return -EINVAL;
 	}
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5cb4a1cd5603..4ddb20017b2f 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -995,3 +995,85 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 
 	return ret;
 }
+
+long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
+				struct kvm_arm_copy_mte_tags *copy_tags)
+{
+	gpa_t guest_ipa = copy_tags->guest_ipa;
+	size_t length = copy_tags->length;
+	void __user *tags = copy_tags->addr;
+	gpa_t gfn;
+	bool write = !(copy_tags->flags & KVM_ARM_TAGS_FROM_GUEST);
+	int ret = 0;
+
+	if (!kvm_has_mte(kvm))
+		return -EINVAL;
+
+	if (copy_tags->reserved[0] || copy_tags->reserved[1])
+		return -EINVAL;
+
+	if (copy_tags->flags & ~KVM_ARM_TAGS_FROM_GUEST)
+		return -EINVAL;
+
+	if (length & ~PAGE_MASK || guest_ipa & ~PAGE_MASK)
+		return -EINVAL;
+
+	gfn = gpa_to_gfn(guest_ipa);
+
+	mutex_lock(&kvm->slots_lock);
+
+	while (length > 0) {
+		kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
+		void *maddr;
+		unsigned long num_tags;
+		struct page *page;
+
+		if (is_error_noslot_pfn(pfn)) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		page = pfn_to_online_page(pfn);
+		if (!page) {
+			/* Reject ZONE_DEVICE memory */
+			ret = -EFAULT;
+			goto out;
+		}
+		maddr = page_address(page);
+
+		if (!write) {
+			if (test_bit(PG_mte_tagged, &page->flags))
+				num_tags = mte_copy_tags_to_user(tags, maddr,
+							MTE_GRANULES_PER_PAGE);
+			else
+				/* No tags in memory, so write zeros */
+				num_tags = MTE_GRANULES_PER_PAGE -
+					clear_user(tags, MTE_GRANULES_PER_PAGE);
+			kvm_release_pfn_clean(pfn);
+		} else {
+			num_tags = mte_copy_tags_from_user(maddr, tags,
+							MTE_GRANULES_PER_PAGE);
+			kvm_release_pfn_dirty(pfn);
+		}
+
+		if (num_tags != MTE_GRANULES_PER_PAGE) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		/* Set the flag after checking the write completed fully */
+		if (write)
+			set_bit(PG_mte_tagged, &page->flags);
+
+		gfn++;
+		tags += num_tags;
+		length -= PAGE_SIZE;
+	}
+
+out:
+	mutex_unlock(&kvm->slots_lock);
+	/* If some data has been copied report the number of bytes copied */
+	if (length != copy_tags->length)
+		return copy_tags->length - length;
+	return ret;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8c95ba0fadda..4c011c60d468 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1428,6 +1428,7 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_PMU_EVENT_FILTER */
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
+#define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v14 8/8] KVM: arm64: Document MTE capability and ioctl
  2021-06-07 11:08 [PATCH v14 0/8] MTE support for KVM guest Steven Price
                   ` (6 preceding siblings ...)
  2021-06-07 11:08 ` [PATCH v14 7/8] KVM: arm64: ioctl to fetch/store tags in a guest Steven Price
@ 2021-06-07 11:08 ` Steven Price
  2021-06-07 17:32   ` Catalin Marinas
  7 siblings, 1 reply; 20+ messages in thread
From: Steven Price @ 2021-06-07 11:08 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

A new capability (KVM_CAP_ARM_MTE) identifies that the kernel supports
granting a guest access to the tags, and provides a mechanism for the
VMM to enable it.

A new ioctl (KVM_ARM_MTE_COPY_TAGS) provides a simple way for a VMM to
access the tags of a guest without having to maintain a PROT_MTE mapping
in userspace. The above capability gates access to the ioctl.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 Documentation/virt/kvm/api.rst | 57 ++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 22d077562149..fc6f0cbc30b3 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5034,6 +5034,42 @@ see KVM_XEN_VCPU_SET_ATTR above.
 The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
 with the KVM_XEN_VCPU_GET_ATTR ioctl.
 
+4.130 KVM_ARM_MTE_COPY_TAGS
+---------------------------
+
+:Capability: KVM_CAP_ARM_MTE
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: struct kvm_arm_copy_mte_tags
+:Returns: number of bytes copied, < 0 on error
+
+::
+
+  struct kvm_arm_copy_mte_tags {
+	__u64 guest_ipa;
+	__u64 length;
+	void __user *addr;
+	__u64 flags;
+	__u64 reserved[2];
+  };
+
+Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
+``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
+fieldmust point to a buffer which the tags will be copied to or from.
+
+``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
+``KVM_ARM_TAGS_FROM_GUEST``.
+
+The size of the buffer to store the tags is ``(length / 16)`` bytes
+(granules in MTE are 16 bytes long). Each byte contains a single tag
+value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
+``PTRACE_POKEMTETAGS``.
+
+If an error occurs before any data is copied then a negative error code is
+returned. If some tags have been copied before an error occurs then the number
+of bytes successfully copied is returned. If the call completes successfully
+then ``length`` is returned.
+
 5. The kvm_run structure
 ========================
 
@@ -6362,6 +6398,27 @@ default.
 
 See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
 
+7.26 KVM_CAP_ARM_MTE
+--------------------
+
+:Architectures: arm64
+:Parameters: none
+
+This capability indicates that KVM (and the hardware) supports exposing the
+Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the
+VMM before creating any VCPUs to allow the guest access. Note that MTE is only
+available to a guest running in AArch64 mode and enabling this capability will
+cause attempts to create AArch32 VCPUs to fail.
+
+When enabled the guest is able to access tags associated with any memory given
+to the guest. KVM will ensure that the pages are flagged ``PG_mte_tagged`` so
+that the tags are maintained during swap or hibernation of the host; however
+the VMM needs to manually save/restore the tags as appropriate if the VM is
+migrated.
+
+When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
+perform a bulk copy of tags to/from the guest.
+
 8. Other capabilities.
 ======================
 
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage()
  2021-06-07 11:08 ` [PATCH v14 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage() Steven Price
@ 2021-06-07 17:07   ` Catalin Marinas
  2021-06-09  8:56     ` Steven Price
  0 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2021-06-07 17:07 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Mon, Jun 07, 2021 at 12:08:10PM +0100, Steven Price wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> Currently, on an anonymous page fault, the kernel allocates a zeroed
> page and maps it in user space. If the mapping is tagged (PROT_MTE),
> set_pte_at() additionally clears the tags under a spinlock to avoid a
> race on the page->flags. In order to optimise the lock, clear the page
> tags on allocation in __alloc_zeroed_user_highpage() if the vma flags
> have VM_MTE set.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>

I think you can drop this patch now that Peter's series has been queued
via the arm64 tree:

https://lore.kernel.org/r/20210602235230.3928842-4-pcc@google.com

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
  2021-06-07 11:08 ` [PATCH v14 7/8] KVM: arm64: ioctl to fetch/store tags in a guest Steven Price
@ 2021-06-07 17:21   ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2021-06-07 17:21 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Mon, Jun 07, 2021 at 12:08:15PM +0100, Steven Price wrote:
> The VMM may not wish to have it's own mapping of guest memory mapped
> with PROT_MTE because this causes problems if the VMM has tag checking
> enabled (the guest controls the tags in physical RAM and it's unlikely
> the tags are correct for the VMM).
> 
> Instead add a new ioctl which allows the VMM to easily read/write the
> tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE
> while the VMM can still read/write the tags for the purpose of
> migration.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 8/8] KVM: arm64: Document MTE capability and ioctl
  2021-06-07 11:08 ` [PATCH v14 8/8] KVM: arm64: Document MTE capability and ioctl Steven Price
@ 2021-06-07 17:32   ` Catalin Marinas
  2021-06-09  8:56     ` Steven Price
  0 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2021-06-07 17:32 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Mon, Jun 07, 2021 at 12:08:16PM +0100, Steven Price wrote:
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 22d077562149..fc6f0cbc30b3 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5034,6 +5034,42 @@ see KVM_XEN_VCPU_SET_ATTR above.
>  The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
>  with the KVM_XEN_VCPU_GET_ATTR ioctl.
>  
> +4.130 KVM_ARM_MTE_COPY_TAGS
> +---------------------------
> +
> +:Capability: KVM_CAP_ARM_MTE
> +:Architectures: arm64
> +:Type: vm ioctl
> +:Parameters: struct kvm_arm_copy_mte_tags
> +:Returns: number of bytes copied, < 0 on error

I guess you can be a bit more specific here, -EINVAL on incorrect
arguments, -EFAULT if the guest memory cannot be accessed.

> +
> +::
> +
> +  struct kvm_arm_copy_mte_tags {
> +	__u64 guest_ipa;
> +	__u64 length;
> +	void __user *addr;
> +	__u64 flags;
> +	__u64 reserved[2];
> +  };
> +
> +Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
> +``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
> +fieldmust point to a buffer which the tags will be copied to or from.

s/fieldmust/field must/

> +
> +``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
> +``KVM_ARM_TAGS_FROM_GUEST``.
> +
> +The size of the buffer to store the tags is ``(length / 16)`` bytes
> +(granules in MTE are 16 bytes long). Each byte contains a single tag
> +value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
> +``PTRACE_POKEMTETAGS``.

One difference I think with ptrace() is that iov_len (length here) is
the actual buffer size. But for kvm I think this works better since
length is tied to the guest_ipa.

> +
> +If an error occurs before any data is copied then a negative error code is
> +returned. If some tags have been copied before an error occurs then the number
> +of bytes successfully copied is returned. If the call completes successfully
> +then ``length`` is returned.
> +
>  5. The kvm_run structure
>  ========================
>  
> @@ -6362,6 +6398,27 @@ default.
>  
>  See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
>  
> +7.26 KVM_CAP_ARM_MTE
> +--------------------
> +
> +:Architectures: arm64
> +:Parameters: none
> +
> +This capability indicates that KVM (and the hardware) supports exposing the
> +Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the
> +VMM before creating any VCPUs to allow the guest access. Note that MTE is only
> +available to a guest running in AArch64 mode and enabling this capability will
> +cause attempts to create AArch32 VCPUs to fail.
> +
> +When enabled the guest is able to access tags associated with any memory given
> +to the guest. KVM will ensure that the pages are flagged ``PG_mte_tagged`` so
> +that the tags are maintained during swap or hibernation of the host; however

I'd drop PG_mte_tagged here, that's just how the implementation handles
it, not necessary for describing the API. You can just say "KVM will
ensure that the tags are maintained during swap or hibernation of the
host"

> +the VMM needs to manually save/restore the tags as appropriate if the VM is
> +migrated.
> +
> +When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
> +perform a bulk copy of tags to/from the guest.
> +
>  8. Other capabilities.
>  ======================
>  
> -- 
> 2.20.1

Otherwise, feel free to add:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage()
  2021-06-07 17:07   ` Catalin Marinas
@ 2021-06-09  8:56     ` Steven Price
  0 siblings, 0 replies; 20+ messages in thread
From: Steven Price @ 2021-06-09  8:56 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On 07/06/2021 18:07, Catalin Marinas wrote:
> On Mon, Jun 07, 2021 at 12:08:10PM +0100, Steven Price wrote:
>> From: Catalin Marinas <catalin.marinas@arm.com>
>>
>> Currently, on an anonymous page fault, the kernel allocates a zeroed
>> page and maps it in user space. If the mapping is tagged (PROT_MTE),
>> set_pte_at() additionally clears the tags under a spinlock to avoid a
>> race on the page->flags. In order to optimise the lock, clear the page
>> tags on allocation in __alloc_zeroed_user_highpage() if the vma flags
>> have VM_MTE set.
>>
>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Steven Price <steven.price@arm.com>
> 
> I think you can drop this patch now that Peter's series has been queued
> via the arm64 tree:
> 
> https://lore.kernel.org/r/20210602235230.3928842-4-pcc@google.com
> 

Thanks for the heads up - I hadn't seen that land. I'll drop this patch
from the next posting.

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 8/8] KVM: arm64: Document MTE capability and ioctl
  2021-06-07 17:32   ` Catalin Marinas
@ 2021-06-09  8:56     ` Steven Price
  0 siblings, 0 replies; 20+ messages in thread
From: Steven Price @ 2021-06-09  8:56 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On 07/06/2021 18:32, Catalin Marinas wrote:
> On Mon, Jun 07, 2021 at 12:08:16PM +0100, Steven Price wrote:
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 22d077562149..fc6f0cbc30b3 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -5034,6 +5034,42 @@ see KVM_XEN_VCPU_SET_ATTR above.
>>  The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
>>  with the KVM_XEN_VCPU_GET_ATTR ioctl.
>>  
>> +4.130 KVM_ARM_MTE_COPY_TAGS
>> +---------------------------
>> +
>> +:Capability: KVM_CAP_ARM_MTE
>> +:Architectures: arm64
>> +:Type: vm ioctl
>> +:Parameters: struct kvm_arm_copy_mte_tags
>> +:Returns: number of bytes copied, < 0 on error
> 
> I guess you can be a bit more specific here, -EINVAL on incorrect
> arguments, -EFAULT if the guest memory cannot be accessed.

Sure. Note that -EFAULT can also be returned if the VMM's memory cannot
be accessed (the other end of the copy).

>> +
>> +::
>> +
>> +  struct kvm_arm_copy_mte_tags {
>> +	__u64 guest_ipa;
>> +	__u64 length;
>> +	void __user *addr;
>> +	__u64 flags;
>> +	__u64 reserved[2];
>> +  };
>> +
>> +Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
>> +``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
>> +fieldmust point to a buffer which the tags will be copied to or from.
> 
> s/fieldmust/field must/

Thanks - Vim's spell checker missed that apparently because it's syntax
highlighter got confused.

>> +
>> +``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
>> +``KVM_ARM_TAGS_FROM_GUEST``.
>> +
>> +The size of the buffer to store the tags is ``(length / 16)`` bytes
>> +(granules in MTE are 16 bytes long). Each byte contains a single tag
>> +value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
>> +``PTRACE_POKEMTETAGS``.
> 
> One difference I think with ptrace() is that iov_len (length here) is
> the actual buffer size. But for kvm I think this works better since
> length is tied to the guest_ipa.

What I intended to say is that the storage in memory patches ptrace (one
byte per tag). In the kernel (e.g. for swap) we store it more compactly
(two tags per byte). As you say I think having 'length' match
'guest_ipa' is sensible rather than deducing it from the buffer size.

>> +
>> +If an error occurs before any data is copied then a negative error code is
>> +returned. If some tags have been copied before an error occurs then the number
>> +of bytes successfully copied is returned. If the call completes successfully
>> +then ``length`` is returned.
>> +
>>  5. The kvm_run structure
>>  ========================
>>  
>> @@ -6362,6 +6398,27 @@ default.
>>  
>>  See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
>>  
>> +7.26 KVM_CAP_ARM_MTE
>> +--------------------
>> +
>> +:Architectures: arm64
>> +:Parameters: none
>> +
>> +This capability indicates that KVM (and the hardware) supports exposing the
>> +Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the
>> +VMM before creating any VCPUs to allow the guest access. Note that MTE is only
>> +available to a guest running in AArch64 mode and enabling this capability will
>> +cause attempts to create AArch32 VCPUs to fail.
>> +
>> +When enabled the guest is able to access tags associated with any memory given
>> +to the guest. KVM will ensure that the pages are flagged ``PG_mte_tagged`` so
>> +that the tags are maintained during swap or hibernation of the host; however
> 
> I'd drop PG_mte_tagged here, that's just how the implementation handles
> it, not necessary for describing the API. You can just say "KVM will
> ensure that the tags are maintained during swap or hibernation of the
> host"

Good point - will update with your wording.

>> +the VMM needs to manually save/restore the tags as appropriate if the VM is
>> +migrated.
>> +
>> +When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
>> +perform a bulk copy of tags to/from the guest.
>> +
>>  8. Other capabilities.
>>  ======================
>>  
>> -- 
>> 2.20.1
> 
> Otherwise, feel free to add:
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> 

Thanks!

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags
  2021-06-07 11:08 ` [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags Steven Price
@ 2021-06-09 10:30   ` Marc Zyngier
  2021-06-09 10:51     ` Steven Price
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-06-09 10:30 UTC (permalink / raw)
  To: Steven Price
  Cc: Catalin Marinas, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Mon, 07 Jun 2021 12:08:09 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> mte_sync_tags() used test_and_set_bit() to set the PG_mte_tagged flag
> before restoring/zeroing the MTE tags. However if another thread were to
> race and attempt to sync the tags on the same page before the first
> thread had completed restoring/zeroing then it would see the flag is
> already set and continue without waiting. This would potentially expose
> the previous contents of the tags to user space, and cause any updates
> that user space makes before the restoring/zeroing has completed to
> potentially be lost.
> 
> Since this code is run from atomic contexts we can't just lock the page
> during the process. Instead implement a new (global) spinlock to protect
> the mte_sync_page_tags() function.
> 
> Fixes: 34bfeea4a9e9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  arch/arm64/kernel/mte.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index 125a10e413e9..a3583a7fd400 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -25,6 +25,7 @@
>  u64 gcr_kernel_excl __ro_after_init;
>  
>  static bool report_fault_once = true;
> +static DEFINE_SPINLOCK(tag_sync_lock);
>  
>  #ifdef CONFIG_KASAN_HW_TAGS
>  /* Whether the MTE asynchronous mode is enabled. */
> @@ -34,13 +35,22 @@ EXPORT_SYMBOL_GPL(mte_async_mode);
>  
>  static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
>  {
> +	unsigned long flags;
>  	pte_t old_pte = READ_ONCE(*ptep);
>  
> +	spin_lock_irqsave(&tag_sync_lock, flags);

having though a bit more about this after an offline discussion with
Catalin: why can't this lock be made per mm? We can't really share
tags across processes anyway, so this is limited to threads from the
same process.

I'd also like it to be documented that page sharing can only reliably
work with tagging if only one of the mappings is using tags.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags
  2021-06-09 10:30   ` Marc Zyngier
@ 2021-06-09 10:51     ` Steven Price
  2021-06-09 11:19       ` Marc Zyngier
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Price @ 2021-06-09 10:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On 09/06/2021 11:30, Marc Zyngier wrote:
> On Mon, 07 Jun 2021 12:08:09 +0100,
> Steven Price <steven.price@arm.com> wrote:
>>
>> mte_sync_tags() used test_and_set_bit() to set the PG_mte_tagged flag
>> before restoring/zeroing the MTE tags. However if another thread were to
>> race and attempt to sync the tags on the same page before the first
>> thread had completed restoring/zeroing then it would see the flag is
>> already set and continue without waiting. This would potentially expose
>> the previous contents of the tags to user space, and cause any updates
>> that user space makes before the restoring/zeroing has completed to
>> potentially be lost.
>>
>> Since this code is run from atomic contexts we can't just lock the page
>> during the process. Instead implement a new (global) spinlock to protect
>> the mte_sync_page_tags() function.
>>
>> Fixes: 34bfeea4a9e9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
>> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  arch/arm64/kernel/mte.c | 20 +++++++++++++++++---
>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
>> index 125a10e413e9..a3583a7fd400 100644
>> --- a/arch/arm64/kernel/mte.c
>> +++ b/arch/arm64/kernel/mte.c
>> @@ -25,6 +25,7 @@
>>  u64 gcr_kernel_excl __ro_after_init;
>>  
>>  static bool report_fault_once = true;
>> +static DEFINE_SPINLOCK(tag_sync_lock);
>>  
>>  #ifdef CONFIG_KASAN_HW_TAGS
>>  /* Whether the MTE asynchronous mode is enabled. */
>> @@ -34,13 +35,22 @@ EXPORT_SYMBOL_GPL(mte_async_mode);
>>  
>>  static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
>>  {
>> +	unsigned long flags;
>>  	pte_t old_pte = READ_ONCE(*ptep);
>>  
>> +	spin_lock_irqsave(&tag_sync_lock, flags);
> 
> having though a bit more about this after an offline discussion with
> Catalin: why can't this lock be made per mm? We can't really share
> tags across processes anyway, so this is limited to threads from the
> same process.

Currently there's nothing stopping processes sharing tags (mmap(...,
PROT_MTE, MAP_SHARED)) - I agree making use of this is tricky and it
would have been nice if this had just been prevented from the beginning.

Given the above, clearly the lock can't be per mm and robust.

> I'd also like it to be documented that page sharing can only reliably
> work with tagging if only one of the mappings is using tags.

I'm not entirely clear whether you mean "can only reliably work" to be
"is practically impossible to coordinate tag values", or whether you are
proposing to (purposefully) introduce the race with a per-mm lock? (and
document it).

I guess we could have a per-mm lock and handle the race if user space
screws up with the outcome being lost tags (double clear).

But it feels to me like it could come back to bite in the future since
VM_SHARED|VM_MTE will almost always work and I fear someone will start
using it since it's permitted by the kernel.

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags
  2021-06-09 10:51     ` Steven Price
@ 2021-06-09 11:19       ` Marc Zyngier
  2021-06-09 17:41         ` Catalin Marinas
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-06-09 11:19 UTC (permalink / raw)
  To: Steven Price
  Cc: Catalin Marinas, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Wed, 09 Jun 2021 11:51:34 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> On 09/06/2021 11:30, Marc Zyngier wrote:
> > On Mon, 07 Jun 2021 12:08:09 +0100,
> > Steven Price <steven.price@arm.com> wrote:
> >>
> >> mte_sync_tags() used test_and_set_bit() to set the PG_mte_tagged flag
> >> before restoring/zeroing the MTE tags. However if another thread were to
> >> race and attempt to sync the tags on the same page before the first
> >> thread had completed restoring/zeroing then it would see the flag is
> >> already set and continue without waiting. This would potentially expose
> >> the previous contents of the tags to user space, and cause any updates
> >> that user space makes before the restoring/zeroing has completed to
> >> potentially be lost.
> >>
> >> Since this code is run from atomic contexts we can't just lock the page
> >> during the process. Instead implement a new (global) spinlock to protect
> >> the mte_sync_page_tags() function.
> >>
> >> Fixes: 34bfeea4a9e9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
> >> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> >> Signed-off-by: Steven Price <steven.price@arm.com>
> >> ---
> >>  arch/arm64/kernel/mte.c | 20 +++++++++++++++++---
> >>  1 file changed, 17 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> >> index 125a10e413e9..a3583a7fd400 100644
> >> --- a/arch/arm64/kernel/mte.c
> >> +++ b/arch/arm64/kernel/mte.c
> >> @@ -25,6 +25,7 @@
> >>  u64 gcr_kernel_excl __ro_after_init;
> >>  
> >>  static bool report_fault_once = true;
> >> +static DEFINE_SPINLOCK(tag_sync_lock);
> >>  
> >>  #ifdef CONFIG_KASAN_HW_TAGS
> >>  /* Whether the MTE asynchronous mode is enabled. */
> >> @@ -34,13 +35,22 @@ EXPORT_SYMBOL_GPL(mte_async_mode);
> >>  
> >>  static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
> >>  {
> >> +	unsigned long flags;
> >>  	pte_t old_pte = READ_ONCE(*ptep);
> >>  
> >> +	spin_lock_irqsave(&tag_sync_lock, flags);
> > 
> > having though a bit more about this after an offline discussion with
> > Catalin: why can't this lock be made per mm? We can't really share
> > tags across processes anyway, so this is limited to threads from the
> > same process.
> 
> Currently there's nothing stopping processes sharing tags (mmap(...,
> PROT_MTE, MAP_SHARED)) - I agree making use of this is tricky and it
> would have been nice if this had just been prevented from the
> beginning.

I don't think it should be prevented. I think it should be made clear
that it is unreliable and that it will result in tag corruption.

> Given the above, clearly the lock can't be per mm and robust.

I don't think we need to make it robust. The architecture actively
prevents sharing if the tags are also shared, just like we can't
really expect the VMM to share tags with the guest.

> > I'd also like it to be documented that page sharing can only reliably
> > work with tagging if only one of the mappings is using tags.
> 
> I'm not entirely clear whether you mean "can only reliably work" to be
> "is practically impossible to coordinate tag values", or whether you are
> proposing to (purposefully) introduce the race with a per-mm lock? (and
> document it).

The latter. You can obviously communicate your tags to another task,
but this should come with attached restrictions (mlock?).

> I guess we could have a per-mm lock and handle the race if user space
> screws up with the outcome being lost tags (double clear).
> 
> But it feels to me like it could come back to bite in the future since
> VM_SHARED|VM_MTE will almost always work and I fear someone will start
> using it since it's permitted by the kernel.

I'm really worried that performance is going to suck even on a small
system, and this global lock will be heavily contended, even without
considering KVM.

Catalin, any though?

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags
  2021-06-09 11:19       ` Marc Zyngier
@ 2021-06-09 17:41         ` Catalin Marinas
  2021-06-10  8:05           ` Steven Price
  0 siblings, 1 reply; 20+ messages in thread
From: Catalin Marinas @ 2021-06-09 17:41 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Steven Price, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Wed, Jun 09, 2021 at 12:19:31PM +0100, Marc Zyngier wrote:
> On Wed, 09 Jun 2021 11:51:34 +0100,
> Steven Price <steven.price@arm.com> wrote:
> > On 09/06/2021 11:30, Marc Zyngier wrote:
> > > On Mon, 07 Jun 2021 12:08:09 +0100,
> > > Steven Price <steven.price@arm.com> wrote:
> > >> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> > >> index 125a10e413e9..a3583a7fd400 100644
> > >> --- a/arch/arm64/kernel/mte.c
> > >> +++ b/arch/arm64/kernel/mte.c
> > >> @@ -25,6 +25,7 @@
> > >>  u64 gcr_kernel_excl __ro_after_init;
> > >>  
> > >>  static bool report_fault_once = true;
> > >> +static DEFINE_SPINLOCK(tag_sync_lock);
> > >>  
> > >>  #ifdef CONFIG_KASAN_HW_TAGS
> > >>  /* Whether the MTE asynchronous mode is enabled. */
> > >> @@ -34,13 +35,22 @@ EXPORT_SYMBOL_GPL(mte_async_mode);
> > >>  
> > >>  static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
> > >>  {
> > >> +	unsigned long flags;
> > >>  	pte_t old_pte = READ_ONCE(*ptep);
> > >>  
> > >> +	spin_lock_irqsave(&tag_sync_lock, flags);
> > > 
> > > having though a bit more about this after an offline discussion with
> > > Catalin: why can't this lock be made per mm? We can't really share
> > > tags across processes anyway, so this is limited to threads from the
> > > same process.
> > 
> > Currently there's nothing stopping processes sharing tags (mmap(...,
> > PROT_MTE, MAP_SHARED)) - I agree making use of this is tricky and it
> > would have been nice if this had just been prevented from the
> > beginning.
> 
> I don't think it should be prevented. I think it should be made clear
> that it is unreliable and that it will result in tag corruption.
> 
> > Given the above, clearly the lock can't be per mm and robust.
> 
> I don't think we need to make it robust. The architecture actively
> prevents sharing if the tags are also shared, just like we can't
> really expect the VMM to share tags with the guest.

The architecture does not prevent MTE tag sharing (if that's what you
meant). The tags are just an additional metadata stored in physical
memory. It's not associated with the VA (as in the CHERI-style
capability tags), only checked against the logical tag in a pointer. If
the architecture prevented MAP_SHARED, we would have prevented PROT_MTE
on them (well, it's not too late to do this ;)).

I went with Steven a few times through this exercise, though I tend to
forget it quickly after. The use-case we had in mind when deciding to
allow MTE on shared mappings is something like:

	int fd = memfd_create("jitted-code", MFD_ALLOW_SEALING);
	ftruncate(fd, size);

	void* rw_mapping = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	void* rx_mapping = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);

	close(fd);

The above is within the same mm but you might as well have a fork and
the rx mapping in a child process. Any of the mappings may have
PROT_MTE from the start or set later with mprotect(), though it's
probably the rw one only.

The race we have is in set_pte_at() and the equivalent KVM setting for
stage 2 (in any combination of these). To detect a page that was not
previously tagged (first time mapped, remapped with new attributes), we
have a test like this via set_pte_at():

	if (!test_bit(PG_mte_tagged, &page->flags)) {
		mte_clear_page_tags(page);
		set_bit(PG_mte_tagged, &page->flags);
	}

Calling the above concurrently on a page may cause some tag loss in the
absence of any locking. Note that it only matters if one of the mappings
is writable (to write tags), so this excludes CoW (fork, KSM).

For stage 1, I think almost all cases that end up in set_pte_at() also
have the page->lock held and the ptl. The exception is mprotect() which
doesn't bother to look up each page and lock it, it just takes the ptl
lock. Within the same mm, mprotect() also takes the mmap_lock as a
writer, so it's all fine. The race is between two mms, one doing an
mprotect(PROT_MTE) with the page already mapped in its address space and
the other taking a fault and mapping the page via set_pte_at(). Two
faults in two mms again are fine because of the page lock.

For stage 2, the race between the VMM doing an mprotect() and the VM
going via user_mem_abort() is fine because the former calls
mmap_write_lock() while the latter mmap_read_lock(). So, as in stage 1,
the problem in stage 2 is for a MAP_SHARED region that another process
(maybe spawned by the VMM) calls mprotect(PROT_MTE).

There is another case of MAP_SHARED in the VMM that does not involve
mprotect(). The shared page is mapped on fault in VMM2, initially mapped
as PROT_MTE while VMM1 handles a user_mem_abort() -> hva_to_pfn(). If in
VMM1 the page was not mapped with PROT_MTE but the pte is accessible,
get_user_pages_fast() won't touch the VMM1 pte, so we have the race
between user_mem_abort() in VMM1 and set_pte_at() in VMM2.

So, AFAICT, MAP_SHARED between two different mms is the only problem
(both for stage 1 and stage 2), hence the big lock that Steven
introduced. I don't like the lock either but we couldn't come up with a
better solution.

I'm happy to document that MAP_SHARED may lose tags but we need to be
more precise than that as people may still want to use MTE with shared
memory as per my first example (otherwise we can block it upfront easily
in arch_validate_flags()). Also the exact behaviour in a MAP_SHARED case
may be quite fragile.

An alternative is to set PG_mte_tagged before we even end up in a
set_pte_at() or user_mem_abort() and that's what patch 2 does here for
stage 1. There are other options like finer grained locking via another
page flag (well, inventing a new page lock that doesn't sleep), maybe an
array of locks indexed by a hash of the pfn to mitigate the big lock
(multiple big locks ;)).

With patch 2 in this series (and an equivalent one in Peter's
optimisation series), PG_mte_tagged is set on page allocation in the
majority of the cases, so we'd very rarely get to the big lock path. We
could do a similar trick with a new vma flag which is set by kvm when a
slot is added. Subsequent __alloc_zeroed_user_highpage() just set
PG_mte_tagged. Note that for MAP_SHARED, we'd still end up on the big
lock path (both for stage 1 and stage 2) since they don't use
__alloc_zeroed_user_highpage().

Another big hammer approach is to scrap PG_mte_tagged altogether. That's
what I had in some early versions before Steven added swap support. Any
page is considered tagged, we always zero the tags in clear_page() and
copy tags in copy_page() (is DC GZVA as fast as DC ZVA? What's the
overhead of copying tags? I don't have access to hardware to benchmark).
The page comparison for KSM would also need to compare the tags (I have
such patch already).

Other suggestions are welcomed, including banning MAP_SHARED with
PROT_MTE.

> > > I'd also like it to be documented that page sharing can only reliably
> > > work with tagging if only one of the mappings is using tags.
> > 
> > I'm not entirely clear whether you mean "can only reliably work" to be
> > "is practically impossible to coordinate tag values", or whether you are
> > proposing to (purposefully) introduce the race with a per-mm lock? (and
> > document it).
> 
> The latter. You can obviously communicate your tags to another task,
> but this should come with attached restrictions (mlock?).

No, it wouldn't, you may confuse them with CHERI tags.

> > I guess we could have a per-mm lock and handle the race if user space
> > screws up with the outcome being lost tags (double clear).
> > 
> > But it feels to me like it could come back to bite in the future since
> > VM_SHARED|VM_MTE will almost always work and I fear someone will start
> > using it since it's permitted by the kernel.
> 
> I'm really worried that performance is going to suck even on a small
> system, and this global lock will be heavily contended, even without
> considering KVM.

I agree, as it currently stands, enabling MTE in the guest will always
serialise user_mem_abort() through a big lock shared by all VMs.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags
  2021-06-09 17:41         ` Catalin Marinas
@ 2021-06-10  8:05           ` Steven Price
  2021-06-10 14:07             ` Catalin Marinas
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Price @ 2021-06-10  8:05 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier
  Cc: Will Deacon, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

On 09/06/2021 18:41, Catalin Marinas wrote:
> On Wed, Jun 09, 2021 at 12:19:31PM +0100, Marc Zyngier wrote:
>> On Wed, 09 Jun 2021 11:51:34 +0100,
>> Steven Price <steven.price@arm.com> wrote:
>>> On 09/06/2021 11:30, Marc Zyngier wrote:
>>>> On Mon, 07 Jun 2021 12:08:09 +0100,
>>>> Steven Price <steven.price@arm.com> wrote:
>>>>> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
>>>>> index 125a10e413e9..a3583a7fd400 100644
>>>>> --- a/arch/arm64/kernel/mte.c
>>>>> +++ b/arch/arm64/kernel/mte.c
>>>>> @@ -25,6 +25,7 @@
>>>>>  u64 gcr_kernel_excl __ro_after_init;
>>>>>  
>>>>>  static bool report_fault_once = true;
>>>>> +static DEFINE_SPINLOCK(tag_sync_lock);
>>>>>  
>>>>>  #ifdef CONFIG_KASAN_HW_TAGS
>>>>>  /* Whether the MTE asynchronous mode is enabled. */
>>>>> @@ -34,13 +35,22 @@ EXPORT_SYMBOL_GPL(mte_async_mode);
>>>>>  
>>>>>  static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
>>>>>  {
>>>>> +	unsigned long flags;
>>>>>  	pte_t old_pte = READ_ONCE(*ptep);
>>>>>  
>>>>> +	spin_lock_irqsave(&tag_sync_lock, flags);
>>>>
>>>> having though a bit more about this after an offline discussion with
>>>> Catalin: why can't this lock be made per mm? We can't really share
>>>> tags across processes anyway, so this is limited to threads from the
>>>> same process.
>>>
>>> Currently there's nothing stopping processes sharing tags (mmap(...,
>>> PROT_MTE, MAP_SHARED)) - I agree making use of this is tricky and it
>>> would have been nice if this had just been prevented from the
>>> beginning.
>>
>> I don't think it should be prevented. I think it should be made clear
>> that it is unreliable and that it will result in tag corruption.
>>
>>> Given the above, clearly the lock can't be per mm and robust.
>>
>> I don't think we need to make it robust. The architecture actively
>> prevents sharing if the tags are also shared, just like we can't
>> really expect the VMM to share tags with the guest.
> 
> The architecture does not prevent MTE tag sharing (if that's what you
> meant). The tags are just an additional metadata stored in physical
> memory. It's not associated with the VA (as in the CHERI-style
> capability tags), only checked against the logical tag in a pointer. If
> the architecture prevented MAP_SHARED, we would have prevented PROT_MTE
> on them (well, it's not too late to do this ;)).
> 
> I went with Steven a few times through this exercise, though I tend to
> forget it quickly after. The use-case we had in mind when deciding to
> allow MTE on shared mappings is something like:
> 
> 	int fd = memfd_create("jitted-code", MFD_ALLOW_SEALING);
> 	ftruncate(fd, size);
> 
> 	void* rw_mapping = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> 	void* rx_mapping = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
> 
> 	close(fd);
> 
> The above is within the same mm but you might as well have a fork and
> the rx mapping in a child process. Any of the mappings may have
> PROT_MTE from the start or set later with mprotect(), though it's
> probably the rw one only.
> 
> The race we have is in set_pte_at() and the equivalent KVM setting for
> stage 2 (in any combination of these). To detect a page that was not
> previously tagged (first time mapped, remapped with new attributes), we
> have a test like this via set_pte_at():
> 
> 	if (!test_bit(PG_mte_tagged, &page->flags)) {
> 		mte_clear_page_tags(page);
> 		set_bit(PG_mte_tagged, &page->flags);
> 	}
> 
> Calling the above concurrently on a page may cause some tag loss in the
> absence of any locking. Note that it only matters if one of the mappings
> is writable (to write tags), so this excludes CoW (fork, KSM).
> 
> For stage 1, I think almost all cases that end up in set_pte_at() also
> have the page->lock held and the ptl. The exception is mprotect() which
> doesn't bother to look up each page and lock it, it just takes the ptl
> lock. Within the same mm, mprotect() also takes the mmap_lock as a
> writer, so it's all fine. The race is between two mms, one doing an
> mprotect(PROT_MTE) with the page already mapped in its address space and
> the other taking a fault and mapping the page via set_pte_at(). Two
> faults in two mms again are fine because of the page lock.
> 
> For stage 2, the race between the VMM doing an mprotect() and the VM
> going via user_mem_abort() is fine because the former calls
> mmap_write_lock() while the latter mmap_read_lock(). So, as in stage 1,
> the problem in stage 2 is for a MAP_SHARED region that another process
> (maybe spawned by the VMM) calls mprotect(PROT_MTE).
> 
> There is another case of MAP_SHARED in the VMM that does not involve
> mprotect(). The shared page is mapped on fault in VMM2, initially mapped
> as PROT_MTE while VMM1 handles a user_mem_abort() -> hva_to_pfn(). If in
> VMM1 the page was not mapped with PROT_MTE but the pte is accessible,
> get_user_pages_fast() won't touch the VMM1 pte, so we have the race
> between user_mem_abort() in VMM1 and set_pte_at() in VMM2.
> 
> So, AFAICT, MAP_SHARED between two different mms is the only problem
> (both for stage 1 and stage 2), hence the big lock that Steven
> introduced. I don't like the lock either but we couldn't come up with a
> better solution.
> 
> I'm happy to document that MAP_SHARED may lose tags but we need to be
> more precise than that as people may still want to use MTE with shared
> memory as per my first example (otherwise we can block it upfront easily
> in arch_validate_flags()). Also the exact behaviour in a MAP_SHARED case
> may be quite fragile.
> 
> An alternative is to set PG_mte_tagged before we even end up in a
> set_pte_at() or user_mem_abort() and that's what patch 2 does here for
> stage 1. There are other options like finer grained locking via another
> page flag (well, inventing a new page lock that doesn't sleep), maybe an
> array of locks indexed by a hash of the pfn to mitigate the big lock
> (multiple big locks ;)).
> 
> With patch 2 in this series (and an equivalent one in Peter's
> optimisation series), PG_mte_tagged is set on page allocation in the
> majority of the cases, so we'd very rarely get to the big lock path. We
> could do a similar trick with a new vma flag which is set by kvm when a
> slot is added. Subsequent __alloc_zeroed_user_highpage() just set
> PG_mte_tagged. Note that for MAP_SHARED, we'd still end up on the big
> lock path (both for stage 1 and stage 2) since they don't use
> __alloc_zeroed_user_highpage().
> 
> Another big hammer approach is to scrap PG_mte_tagged altogether. That's
> what I had in some early versions before Steven added swap support. Any
> page is considered tagged, we always zero the tags in clear_page() and
> copy tags in copy_page() (is DC GZVA as fast as DC ZVA? What's the
> overhead of copying tags? I don't have access to hardware to benchmark).
> The page comparison for KSM would also need to compare the tags (I have
> such patch already).
> 
> Other suggestions are welcomed, including banning MAP_SHARED with
> PROT_MTE.

The finer grained locking could also be done using a table of spin_locks
and hashing the struct page pointer to choose a lock. This is what
page_waitqueue() does - but unlike the existing mechanism we need
spinlocks not wait queues.

>>>> I'd also like it to be documented that page sharing can only reliably
>>>> work with tagging if only one of the mappings is using tags.
>>>
>>> I'm not entirely clear whether you mean "can only reliably work" to be
>>> "is practically impossible to coordinate tag values", or whether you are
>>> proposing to (purposefully) introduce the race with a per-mm lock? (and
>>> document it).
>>
>> The latter. You can obviously communicate your tags to another task,
>> but this should come with attached restrictions (mlock?).
> 
> No, it wouldn't, you may confuse them with CHERI tags.
> 
>>> I guess we could have a per-mm lock and handle the race if user space
>>> screws up with the outcome being lost tags (double clear).
>>>
>>> But it feels to me like it could come back to bite in the future since
>>> VM_SHARED|VM_MTE will almost always work and I fear someone will start
>>> using it since it's permitted by the kernel.
>>
>> I'm really worried that performance is going to suck even on a small
>> system, and this global lock will be heavily contended, even without
>> considering KVM.
> 
> I agree, as it currently stands, enabling MTE in the guest will always
> serialise user_mem_abort() through a big lock shared by all VMs.
> 

We only serialise in the case where PG_mte_tagged isn't set. Admittedly
this is likely to be VM startup which I know tends to be an important
use case. One other option there would be to provide a mechanism to the
VMM to proactively clear the tags (e.g. a new VM_MTE_INIT flag which
causes the PG_mte_tagged bit to be set but doesn't affect user space's
PTEs).

The problem I've got is without real hardware to benchmark on it's
impossible to know how big of a performance problem this is and
therefore which of the available options (if any) is appropriate. Some
like the "big hammer" of scrapping PG_mte_tagged could hit the
performance of non-MTE paths - but equally might not have any affect and
would simplify the code.

My belief is that this series is now at the stage where it is
functionally correct and it doesn't impact the performance when MTE
isn't in use, so no impact to existing systems or MTE-enabled systems
where user space/VMs are not using MTE. We also have a number of ideas
to investigate if/when we see performance problems with the MTE use
cases. I'm not sure what else we can do until hardware for benchmarking
is readily available.

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags
  2021-06-10  8:05           ` Steven Price
@ 2021-06-10 14:07             ` Catalin Marinas
  0 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2021-06-10 14:07 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Thu, Jun 10, 2021 at 09:05:18AM +0100, Steven Price wrote:
> On 09/06/2021 18:41, Catalin Marinas wrote:
> > On Wed, Jun 09, 2021 at 12:19:31PM +0100, Marc Zyngier wrote:
> >> On Wed, 09 Jun 2021 11:51:34 +0100,
> >> Steven Price <steven.price@arm.com> wrote:
> >>> On 09/06/2021 11:30, Marc Zyngier wrote:
> >>>> On Mon, 07 Jun 2021 12:08:09 +0100,
> >>>> Steven Price <steven.price@arm.com> wrote:
> >>>>> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> >>>>> index 125a10e413e9..a3583a7fd400 100644
> >>>>> --- a/arch/arm64/kernel/mte.c
> >>>>> +++ b/arch/arm64/kernel/mte.c
> >>>>> @@ -25,6 +25,7 @@
> >>>>>  u64 gcr_kernel_excl __ro_after_init;
> >>>>>  
> >>>>>  static bool report_fault_once = true;
> >>>>> +static DEFINE_SPINLOCK(tag_sync_lock);
> >>>>>  
> >>>>>  #ifdef CONFIG_KASAN_HW_TAGS
> >>>>>  /* Whether the MTE asynchronous mode is enabled. */
> >>>>> @@ -34,13 +35,22 @@ EXPORT_SYMBOL_GPL(mte_async_mode);
> >>>>>  
> >>>>>  static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
> >>>>>  {
> >>>>> +	unsigned long flags;
> >>>>>  	pte_t old_pte = READ_ONCE(*ptep);
> >>>>>  
> >>>>> +	spin_lock_irqsave(&tag_sync_lock, flags);
> >>>>
> >>>> having though a bit more about this after an offline discussion with
> >>>> Catalin: why can't this lock be made per mm? We can't really share
> >>>> tags across processes anyway, so this is limited to threads from the
> >>>> same process.
> >>>
> >>> Currently there's nothing stopping processes sharing tags (mmap(...,
> >>> PROT_MTE, MAP_SHARED)) - I agree making use of this is tricky and it
> >>> would have been nice if this had just been prevented from the
> >>> beginning.
> >>
> >> I don't think it should be prevented. I think it should be made clear
> >> that it is unreliable and that it will result in tag corruption.
> >>
> >>> Given the above, clearly the lock can't be per mm and robust.
> >>
> >> I don't think we need to make it robust. The architecture actively
> >> prevents sharing if the tags are also shared, just like we can't
> >> really expect the VMM to share tags with the guest.
[...]
> > So, AFAICT, MAP_SHARED between two different mms is the only problem
> > (both for stage 1 and stage 2), hence the big lock that Steven
> > introduced. I don't like the lock either but we couldn't come up with a
> > better solution.
[...]
> > Other suggestions are welcomed, including banning MAP_SHARED with
> > PROT_MTE.
> 
> The finer grained locking could also be done using a table of spin_locks
> and hashing the struct page pointer to choose a lock. This is what
> page_waitqueue() does - but unlike the existing mechanism we need
> spinlocks not wait queues.

That's an option indeed.

> >>> I guess we could have a per-mm lock and handle the race if user space
> >>> screws up with the outcome being lost tags (double clear).
> >>>
> >>> But it feels to me like it could come back to bite in the future since
> >>> VM_SHARED|VM_MTE will almost always work and I fear someone will start
> >>> using it since it's permitted by the kernel.
> >>
> >> I'm really worried that performance is going to suck even on a small
> >> system, and this global lock will be heavily contended, even without
> >> considering KVM.
> > 
> > I agree, as it currently stands, enabling MTE in the guest will always
> > serialise user_mem_abort() through a big lock shared by all VMs.
> 
> We only serialise in the case where PG_mte_tagged isn't set. Admittedly
> this is likely to be VM startup which I know tends to be an important
> use case. One other option there would be to provide a mechanism to the
> VMM to proactively clear the tags (e.g. a new VM_MTE_INIT flag which
> causes the PG_mte_tagged bit to be set but doesn't affect user space's
> PTEs).
> 
> The problem I've got is without real hardware to benchmark on it's
> impossible to know how big of a performance problem this is and
> therefore which of the available options (if any) is appropriate. Some
> like the "big hammer" of scrapping PG_mte_tagged could hit the
> performance of non-MTE paths - but equally might not have any affect and
> would simplify the code.

One reason to keep PG_mte_tagged around for normal user programs,
especially those not using MTE, is that it doesn't affect CoW. Page
copying in two separate loops (for data and tags) does affect
performance. Now, if there's no fork, you may not notice but in the
Android land with zygote, I think it will be quickly observed.

For KVM, I guess we could benchmark a dummy lock array without any MTE
hardware but I don't have a setup around where I can time multiple VMs
starting in parallel.

> My belief is that this series is now at the stage where it is
> functionally correct and it doesn't impact the performance when MTE
> isn't in use, so no impact to existing systems or MTE-enabled systems
> where user space/VMs are not using MTE.

I agree.

> We also have a number of ideas
> to investigate if/when we see performance problems with the MTE use
> cases. I'm not sure what else we can do until hardware for benchmarking
> is readily available.

My concern is that when we hit a bottleneck on that big lock (and I'm
pretty sure we will), do we have a good mitigation or we should rather
break the ABI now and disallow MAP_SHARED? The latter would need to be
done fairly soon as a fix, backported.

I think the possible workarounds are pretty good but maybe we should
also ask user-space people how important MAP_SHARED + PROT_MTE is. If
they don't use it, maybe we shouldn't bother with all this (well,
assuming there's no other race between different mms that we missed).

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-06-10 14:09 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-07 11:08 [PATCH v14 0/8] MTE support for KVM guest Steven Price
2021-06-07 11:08 ` [PATCH v14 1/8] arm64: mte: Handle race when synchronising tags Steven Price
2021-06-09 10:30   ` Marc Zyngier
2021-06-09 10:51     ` Steven Price
2021-06-09 11:19       ` Marc Zyngier
2021-06-09 17:41         ` Catalin Marinas
2021-06-10  8:05           ` Steven Price
2021-06-10 14:07             ` Catalin Marinas
2021-06-07 11:08 ` [PATCH v14 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage() Steven Price
2021-06-07 17:07   ` Catalin Marinas
2021-06-09  8:56     ` Steven Price
2021-06-07 11:08 ` [PATCH v14 3/8] arm64: mte: Sync tags for pages where PTE is untagged Steven Price
2021-06-07 11:08 ` [PATCH v14 4/8] KVM: arm64: Introduce MTE VM feature Steven Price
2021-06-07 11:08 ` [PATCH v14 5/8] KVM: arm64: Save/restore MTE registers Steven Price
2021-06-07 11:08 ` [PATCH v14 6/8] KVM: arm64: Expose KVM_ARM_CAP_MTE Steven Price
2021-06-07 11:08 ` [PATCH v14 7/8] KVM: arm64: ioctl to fetch/store tags in a guest Steven Price
2021-06-07 17:21   ` Catalin Marinas
2021-06-07 11:08 ` [PATCH v14 8/8] KVM: arm64: Document MTE capability and ioctl Steven Price
2021-06-07 17:32   ` Catalin Marinas
2021-06-09  8:56     ` Steven Price

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).