* [PATCH v13 0/8] MTE support for KVM guest
@ 2021-05-24 10:45 ` Steven Price
  0 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

This series adds support for using the Arm Memory Tagging Extensions
(MTE) in a KVM guest.

Changes since v12[1]:

 * Use DEFINE_SPINLOCK() to define tag_sync_lock.

 * Refactor mte_sync_tags() to take the old PTE value rather than a
   pointer to the PTE. The checks in set_pte_at() are also strengthened to
   avoid the function call when possible.

 * Fix prefix on a couple of patches ("arm64: kvm" -> "KVM: arm64").

 * Reorder arguments to sanitise_mte_tags() ("size, pfn" -> "pfn,
   size").

 * Add/improve comments in several places.

 * Report the host's sanitised version of ID_AA64PFR1_EL1:MTE rather
   than making up one for the guest.

 * Insert ISB at the end of mte_switch_to_hyp macro.

 * Drop the definition of CPU_TFSRE0_EL1 in asm-offsets.c as it isn't
   used anymore.

 * Prevent creation of 32-bit vCPUs when MTE is enabled for the guest
   (and document it).

 * Move kvm_vm_ioctl_mte_copy_tags() to guest.c.

 * Reject ZONE_DEVICE memory in kvm_vm_ioctl_mte_copy_tags() and
   correctly handle pages where PG_mte_tagged hasn't been set yet.

 * Define MTE_GRANULES_PER_PAGE rather than open coding the division
   PAGE_SIZE / MTE_GRANULE_SIZE.

 * Correct the definition of struct kvm_arm_copy_mte_tags in the docs.
   Also avoid mentioning MTE_GRANULE_SIZE as it isn't exported to
   userspace.

[1] https://lore.kernel.org/r/20210517123239.8025-1-steven.price@arm.com/

Catalin Marinas (1):
  arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage()

Steven Price (7):
  arm64: mte: Handle race when synchronising tags
  arm64: mte: Sync tags for pages where PTE is untagged
  KVM: arm64: Introduce MTE VM feature
  KVM: arm64: Save/restore MTE registers
  KVM: arm64: Expose KVM_ARM_CAP_MTE
  KVM: arm64: ioctl to fetch/store tags in a guest
  KVM: arm64: Document MTE capability and ioctl

 Documentation/virt/kvm/api.rst             | 52 ++++++++++++++
 arch/arm64/include/asm/kvm_emulate.h       |  3 +
 arch/arm64/include/asm/kvm_host.h          | 12 ++++
 arch/arm64/include/asm/kvm_mte.h           | 68 +++++++++++++++++++
 arch/arm64/include/asm/mte-def.h           |  1 +
 arch/arm64/include/asm/mte.h               |  4 +-
 arch/arm64/include/asm/page.h              |  6 +-
 arch/arm64/include/asm/pgtable.h           | 22 +++++-
 arch/arm64/include/asm/sysreg.h            |  3 +-
 arch/arm64/include/uapi/asm/kvm.h          | 11 +++
 arch/arm64/kernel/asm-offsets.c            |  2 +
 arch/arm64/kernel/mte.c                    | 37 ++++++++--
 arch/arm64/kvm/arm.c                       | 16 +++++
 arch/arm64/kvm/guest.c                     | 79 ++++++++++++++++++++++
 arch/arm64/kvm/hyp/entry.S                 |  7 ++
 arch/arm64/kvm/hyp/exception.c             |  3 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 21 ++++++
 arch/arm64/kvm/mmu.c                       | 48 ++++++++++++-
 arch/arm64/kvm/reset.c                     |  3 +-
 arch/arm64/kvm/sys_regs.c                  | 32 +++++++--
 arch/arm64/mm/fault.c                      | 21 ++++++
 include/uapi/linux/kvm.h                   |  2 +
 22 files changed, 431 insertions(+), 22 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_mte.h

-- 
2.20.1


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v13 1/8] arm64: mte: Handle race when synchronising tags
  2021-05-24 10:45 ` Steven Price
  (?)
  (?)
@ 2021-05-24 10:45   ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

mte_sync_tags() used test_and_set_bit() to set the PG_mte_tagged flag
before restoring/zeroing the MTE tags. However, if another thread were to
race and attempt to sync the tags on the same page before the first
thread had completed restoring/zeroing, it would see the flag already
set and continue without waiting. This could expose the previous
contents of the tags to user space, and any updates that user space
makes before the restoring/zeroing has completed could be lost.

Since this code is run from atomic contexts, we can't just lock the page
during the process. Instead, implement a new (global) spinlock to protect
the mte_sync_page_tags() function.

Fixes: 34bfeea4a9e9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
---
 arch/arm64/kernel/mte.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 125a10e413e9..45fac0e9c323 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -25,6 +25,7 @@
 u64 gcr_kernel_excl __ro_after_init;
 
 static bool report_fault_once = true;
+static DEFINE_SPINLOCK(tag_sync_lock);
 
 #ifdef CONFIG_KASAN_HW_TAGS
 /* Whether the MTE asynchronous mode is enabled. */
@@ -34,13 +35,22 @@ EXPORT_SYMBOL_GPL(mte_async_mode);
 
 static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
 {
+	unsigned long flags;
 	pte_t old_pte = READ_ONCE(*ptep);
 
+	spin_lock_irqsave(&tag_sync_lock, flags);
+
+	/* Recheck with the lock held */
+	if (test_bit(PG_mte_tagged, &page->flags))
+		goto out;
+
 	if (check_swap && is_swap_pte(old_pte)) {
 		swp_entry_t entry = pte_to_swp_entry(old_pte);
 
-		if (!non_swap_entry(entry) && mte_restore_tags(entry, page))
-			return;
+		if (!non_swap_entry(entry) && mte_restore_tags(entry, page)) {
+			set_bit(PG_mte_tagged, &page->flags);
+			goto out;
+		}
 	}
 
 	page_kasan_tag_reset(page);
@@ -53,6 +63,10 @@ static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
 	 */
 	smp_wmb();
 	mte_clear_page_tags(page_address(page));
+	set_bit(PG_mte_tagged, &page->flags);
+
+out:
+	spin_unlock_irqrestore(&tag_sync_lock, flags);
 }
 
 void mte_sync_tags(pte_t *ptep, pte_t pte)
@@ -60,10 +74,11 @@ void mte_sync_tags(pte_t *ptep, pte_t pte)
 	struct page *page = pte_page(pte);
 	long i, nr_pages = compound_nr(page);
 	bool check_swap = nr_pages == 1;
+	bool pte_is_tagged = pte_tagged(pte);
 
 	/* if PG_mte_tagged is set, tags have already been initialised */
 	for (i = 0; i < nr_pages; i++, page++) {
-		if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+		if (!test_bit(PG_mte_tagged, &page->flags))
 			mte_sync_page_tags(page, ptep, check_swap);
 	}
 }
-- 
2.20.1


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v13 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage()
  2021-05-24 10:45 ` Steven Price
  (?)
  (?)
@ 2021-05-24 10:45   ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

From: Catalin Marinas <catalin.marinas@arm.com>

Currently, on an anonymous page fault, the kernel allocates a zeroed
page and maps it in user space. If the mapping is tagged (PROT_MTE),
set_pte_at() additionally clears the tags under a spinlock to avoid a
race on the page->flags. To avoid taking that lock in the common case,
clear the page tags on allocation in __alloc_zeroed_user_highpage() if
the vma flags have VM_MTE set.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/page.h |  6 ++++--
 arch/arm64/mm/fault.c         | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 012cffc574e8..97853570d0f1 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -13,6 +13,7 @@
 #ifndef __ASSEMBLY__
 
 #include <linux/personality.h> /* for READ_IMPLIES_EXEC */
+#include <linux/types.h>
 #include <asm/pgtable-types.h>
 
 struct page;
@@ -28,8 +29,9 @@ void copy_user_highpage(struct page *to, struct page *from,
 void copy_highpage(struct page *to, struct page *from);
 #define __HAVE_ARCH_COPY_HIGHPAGE
 
-#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
-	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
+struct page *__alloc_zeroed_user_highpage(gfp_t movableflags,
+					  struct vm_area_struct *vma,
+					  unsigned long vaddr);
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 871c82ab0a30..5a03428e97f3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -921,3 +921,24 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned int esr,
 	debug_exception_exit(regs);
 }
 NOKPROBE_SYMBOL(do_debug_exception);
+
+/*
+ * Used during anonymous page fault handling.
+ */
+struct page *__alloc_zeroed_user_highpage(gfp_t movableflags,
+					  struct vm_area_struct *vma,
+					  unsigned long vaddr)
+{
+	struct page *page;
+	bool tagged = system_supports_mte() && (vma->vm_flags & VM_MTE);
+
+	page = alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma,
+			      vaddr);
+	if (tagged && page) {
+		mte_clear_page_tags(page_address(page));
+		page_kasan_tag_reset(page);
+		set_bit(PG_mte_tagged, &page->flags);
+	}
+
+	return page;
+}
-- 
2.20.1


* [PATCH v13 3/8] arm64: mte: Sync tags for pages where PTE is untagged
  2021-05-24 10:45 ` Steven Price
@ 2021-05-24 10:45   ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

A KVM guest could store tags in a page even if the VMM hasn't mapped
the page with PROT_MTE. So when restoring pages from swap we need to
check for saved tags even if !pte_tagged().

However, don't check pages for which pte_access_permitted() returns
false, as these will not have been swapped out.
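
The resulting decision in set_pte_at() can be modelled in user space
roughly as follows. This is only a sketch of the condition, not kernel
code: struct sketch_pte and both functions are hypothetical, and each
boolean field stands in for the corresponding kernel predicate
(pte_none(), pte_present(), pte_access_permitted(pte, false),
pte_special(), pte_tagged()).

```c
#include <stdbool.h>

/* Hypothetical model of the PTE properties the check looks at. */
struct sketch_pte {
	bool none;		/* models pte_none() */
	bool present;		/* models pte_present() */
	bool user_access;	/* models pte_access_permitted(pte, false) */
	bool special;		/* models pte_special() */
	bool tagged;		/* models pte_tagged() */
};

/* Open-coded swap test: neither empty nor present. */
bool sketch_is_swap_pte(struct sketch_pte pte)
{
	return !pte.none && !pte.present;
}

/*
 * Should tags be synchronised when new_pte replaces old_pte?
 * Only for PTEs that give user space access to the page, and then only
 * if the new PTE is tagged or the old entry was a swap entry.
 */
bool sketch_should_sync_tags(bool cpu_has_mte, struct sketch_pte old_pte,
			     struct sketch_pte new_pte)
{
	if (!cpu_has_mte || !new_pte.user_access || new_pte.special)
		return false;

	return new_pte.tagged || sketch_is_swap_pte(old_pte);
}
```

The case this patch adds is the second half of the final expression: an
untagged but user-accessible PTE replacing a swap entry now triggers a
sync, because the page may have had tags saved when it was swapped out.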

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/mte.h     |  4 ++--
 arch/arm64/include/asm/pgtable.h | 22 +++++++++++++++++++---
 arch/arm64/kernel/mte.c          | 16 ++++++++++++----
 3 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index bc88a1ced0d7..347ef38a35f7 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -37,7 +37,7 @@ void mte_free_tag_storage(char *storage);
 /* track which pages have valid allocation tags */
 #define PG_mte_tagged	PG_arch_2
 
-void mte_sync_tags(pte_t *ptep, pte_t pte);
+void mte_sync_tags(pte_t old_pte, pte_t pte);
 void mte_copy_page_tags(void *kto, const void *kfrom);
 void mte_thread_init_user(void);
 void mte_thread_switch(struct task_struct *next);
@@ -53,7 +53,7 @@ int mte_ptrace_copy_tags(struct task_struct *child, long request,
 /* unused if !CONFIG_ARM64_MTE, silence the compiler */
 #define PG_mte_tagged	0
 
-static inline void mte_sync_tags(pte_t *ptep, pte_t pte)
+static inline void mte_sync_tags(pte_t old_pte, pte_t pte)
 {
 }
 static inline void mte_copy_page_tags(void *kto, const void *kfrom)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0b10204e72fc..db5402168841 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -314,9 +314,25 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 	if (pte_present(pte) && pte_user_exec(pte) && !pte_special(pte))
 		__sync_icache_dcache(pte);
 
-	if (system_supports_mte() &&
-	    pte_present(pte) && pte_tagged(pte) && !pte_special(pte))
-		mte_sync_tags(ptep, pte);
+	/*
+	 * If the PTE would provide user space access to the tags associated
+	 * with it then ensure that the MTE tags are synchronised.  Although
+	 * pte_access_permitted() returns false for exec only mappings, they
+	 * don't expose tags (instruction fetches don't check tags).
+	 */
+	if (system_supports_mte() && pte_access_permitted(pte, false) &&
+	    !pte_special(pte)) {
+		pte_t old_pte = READ_ONCE(*ptep);
+		/*
+		 * We only need to synchronise if the new PTE has tags enabled
+		 * or if swapping in (in which case another mapping may have
+		 * set tags in the past even if this PTE isn't tagged).
+		 * (!pte_none() && !pte_present()) is an open coded version of
+		 * is_swap_pte()
+		 */
+		if (pte_tagged(pte) || (!pte_none(old_pte) && !pte_present(old_pte)))
+			mte_sync_tags(old_pte, pte);
+	}
 
 	__check_racy_pte_update(mm, ptep, pte);
 
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 45fac0e9c323..ae0a3c68fece 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -33,10 +33,10 @@ DEFINE_STATIC_KEY_FALSE(mte_async_mode);
 EXPORT_SYMBOL_GPL(mte_async_mode);
 #endif
 
-static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
+static void mte_sync_page_tags(struct page *page, pte_t old_pte,
+			       bool check_swap, bool pte_is_tagged)
 {
 	unsigned long flags;
-	pte_t old_pte = READ_ONCE(*ptep);
 
 	spin_lock_irqsave(&tag_sync_lock, flags);
 
@@ -53,6 +53,9 @@ static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
 		}
 	}
 
+	if (!pte_is_tagged)
+		goto out;
+
 	page_kasan_tag_reset(page);
 	/*
 	 * We need smp_wmb() in between setting the flags and clearing the
@@ -69,17 +72,22 @@ static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
 	spin_unlock_irqrestore(&tag_sync_lock, flags);
 }
 
-void mte_sync_tags(pte_t *ptep, pte_t pte)
+void mte_sync_tags(pte_t old_pte, pte_t pte)
 {
 	struct page *page = pte_page(pte);
 	long i, nr_pages = compound_nr(page);
 	bool check_swap = nr_pages == 1;
 	bool pte_is_tagged = pte_tagged(pte);
 
+	/* Early out if there's nothing to do */
+	if (!check_swap && !pte_is_tagged)
+		return;
+
 	/* if PG_mte_tagged is set, tags have already been initialised */
 	for (i = 0; i < nr_pages; i++, page++) {
 		if (!test_bit(PG_mte_tagged, &page->flags))
-			mte_sync_page_tags(page, ptep, check_swap);
+			mte_sync_page_tags(page, old_pte, check_swap,
+					   pte_is_tagged);
 	}
 }
 
-- 
2.20.1


* [PATCH v13 4/8] KVM: arm64: Introduce MTE VM feature
  2021-05-24 10:45 ` Steven Price
@ 2021-05-24 10:45   ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

Add a new VM feature 'KVM_CAP_ARM_MTE' which enables memory tagging
for a VM. This will expose the feature to the guest and automatically
tag memory pages touched by the VM as PG_mte_tagged (and clear the tag
storage) to ensure that the guest cannot see stale tags, and so that
the tags are correctly saved/restored across swap.

Actually exposing the new capability to user space happens in a later
patch.
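
For orientation, the per-VM gating used throughout this patch can be
sketched in user space as below. This is an illustration under stated
assumptions rather than the kernel implementation: the struct and
function names are hypothetical, the host capability is passed in as a
plain boolean instead of calling system_supports_mte(), and the two bit
positions are assumed values that should be checked against the
architecture headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define SKETCH_HCR_ATA		(UINT64_C(1) << 56)	/* allocation tag access */
#define SKETCH_PSR_TCO_BIT	(UINT64_C(1) << 25)	/* tag check override */

/* Hypothetical stand-in for the per-VM state added by the patch. */
struct sketch_vm {
	bool mte_enabled;
};

/* Models kvm_has_mte(): the host must support MTE and the VM must opt in. */
bool sketch_vm_has_mte(bool host_has_mte, const struct sketch_vm *vm)
{
	return host_has_mte && vm->mte_enabled;
}

/* Models the vcpu_reset_hcr() change: grant the guest tag access. */
uint64_t sketch_reset_hcr(uint64_t hcr, bool host_has_mte,
			  const struct sketch_vm *vm)
{
	if (sketch_vm_has_mte(host_has_mte, vm))
		hcr |= SKETCH_HCR_ATA;
	return hcr;
}

/* Models the enter_exception64() change: set TCO on exception entry. */
uint64_t sketch_exception_pstate(uint64_t new_pstate, bool host_has_mte,
				 const struct sketch_vm *vm)
{
	if (sketch_vm_has_mte(host_has_mte, vm))
		new_pstate |= SKETCH_PSR_TCO_BIT;
	return new_pstate;
}
```

Routing every decision through one predicate means a VM that has not
enabled the feature, or a host without MTE, leaves both registers
untouched.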

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h |  3 ++
 arch/arm64/include/asm/kvm_host.h    |  3 ++
 arch/arm64/kvm/hyp/exception.c       |  3 +-
 arch/arm64/kvm/mmu.c                 | 48 +++++++++++++++++++++++++++-
 arch/arm64/kvm/sys_regs.c            |  7 ++++
 include/uapi/linux/kvm.h             |  1 +
 6 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index f612c090f2e4..6bf776c2399c 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -84,6 +84,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
 	    vcpu_el1_is_32bit(vcpu))
 		vcpu->arch.hcr_el2 |= HCR_TID2;
+
+	if (kvm_has_mte(vcpu->kvm))
+		vcpu->arch.hcr_el2 |= HCR_ATA;
 }
 
 static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cd7d5c8c4bc..afaa5333f0e4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -132,6 +132,8 @@ struct kvm_arch {
 
 	u8 pfr0_csv2;
 	u8 pfr0_csv3;
+	/* Memory Tagging Extension enabled for the guest */
+	bool mte_enabled;
 };
 
 struct kvm_vcpu_fault_info {
@@ -769,6 +771,7 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 #define kvm_arm_vcpu_sve_finalized(vcpu) \
 	((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
 
+#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
 #define kvm_vcpu_has_pmu(vcpu)					\
 	(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
 
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 73629094f903..56426565600c 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -112,7 +112,8 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 	new |= (old & PSR_C_BIT);
 	new |= (old & PSR_V_BIT);
 
-	// TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
+	if (kvm_has_mte(vcpu->kvm))
+		new |= PSR_TCO_BIT;
 
 	new |= (old & PSR_DIT_BIT);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c5d1f3c87dbd..226035cf7d6c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -822,6 +822,42 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
 	return PAGE_SIZE;
 }
 
+static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
+			     unsigned long size)
+{
+	if (kvm_has_mte(kvm)) {
+		/*
+		 * The page will be mapped in stage 2 as Normal Cacheable, so
+		 * the VM will be able to see the page's tags and therefore
+		 * they must be initialised first. If PG_mte_tagged is set,
+		 * tags have already been initialised.
+		 * pfn_to_online_page() is used to reject ZONE_DEVICE pages
+		 * that may not support tags.
+		 */
+		unsigned long i, nr_pages = size >> PAGE_SHIFT;
+		struct page *page = pfn_to_online_page(pfn);
+
+		if (!page)
+			return -EFAULT;
+
+		for (i = 0; i < nr_pages; i++, page++) {
+			/*
+			 * There is a potential (but very unlikely) race
+			 * between two VMs which are sharing a physical page
+			 * entering this at the same time. However by splitting
+			 * the test/set the only risk is tags being overwritten
+			 * by the mte_clear_page_tags() call.
+			 */
+			if (!test_bit(PG_mte_tagged, &page->flags)) {
+				mte_clear_page_tags(page_address(page));
+				set_bit(PG_mte_tagged, &page->flags);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
@@ -971,8 +1007,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
-	if (fault_status != FSC_PERM && !device)
+	if (fault_status != FSC_PERM && !device) {
+		ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
+		if (ret)
+			goto out_unlock;
+
 		clean_dcache_guest_page(pfn, vma_pagesize);
+	}
 
 	if (exec_fault) {
 		prot |= KVM_PGTABLE_PROT_X;
@@ -1168,12 +1209,17 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	kvm_pfn_t pfn = pte_pfn(range->pte);
+	int ret;
 
 	if (!kvm->arch.mmu.pgt)
 		return 0;
 
 	WARN_ON(range->end - range->start != 1);
 
+	ret = sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
+	if (ret)
+		return false;
+
 	/*
 	 * We've moved a page around, probably through CoW, so let's treat it
 	 * just like a translation fault and clean the cache to the PoC.
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 76ea2800c33e..4a98902eaf1a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1047,6 +1047,13 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 		break;
 	case SYS_ID_AA64PFR1_EL1:
 		val &= ~FEATURE(ID_AA64PFR1_MTE);
+		if (kvm_has_mte(vcpu->kvm)) {
+			u64 pfr, mte;
+
+			pfr = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
+			mte = cpuid_feature_extract_unsigned_field(pfr, ID_AA64PFR1_MTE_SHIFT);
+			val |= FIELD_PREP(FEATURE(ID_AA64PFR1_MTE), mte);
+		}
 		break;
 	case SYS_ID_AA64ISAR1_EL1:
 		if (!vcpu_has_ptrauth(vcpu))
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..8c95ba0fadda 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SGX_ATTRIBUTE 196
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_ARM_MTE 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.20.1



* [PATCH v13 5/8] KVM: arm64: Save/restore MTE registers
  2021-05-24 10:45 ` Steven Price
  (?)
  (?)
@ 2021-05-24 10:45   ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

Define the new system registers that MTE introduces and context switch
them. The MTE feature is still hidden from the ID register as it isn't
supported in a VM yet.

Signed-off-by: Steven Price <steven.price@arm.com>
---
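Not part of the patch: a rough C model of what the two assembly macros
in kvm_mte.h do, for readers who prefer C to the macro syntax. The
model_* names and structures are hypothetical. The real macros are also
patched out when the ARM64_MTE capability is absent, and
mte_switch_to_hyp ends with an ISB; neither is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_HCR_ATA	(UINT64_C(1) << 56)	/* HCR_EL2.ATA is bit 56 */

/* The two registers switched by the assembly macros. */
struct model_mte_regs {
	uint64_t rgsr_el1;
	uint64_t gcr_el1;
};

/* Hypothetical stand-in for the CPU state the macros read and write. */
struct model_cpu {
	uint64_t hcr_el2;
	struct model_mte_regs hw;	/* models the live system registers */
};

/* Save the live registers into 'save', then load them from 'load'. */
static bool model_mte_switch(struct model_cpu *cpu,
			     struct model_mte_regs *save,
			     const struct model_mte_regs *load)
{
	assert(cpu && save && load);

	/* Mirrors the cbz to .L__skip_switch when HCR_ATA is clear. */
	if (!(cpu->hcr_el2 & MODEL_HCR_ATA))
		return false;

	*save = cpu->hw;
	cpu->hw = *load;
	return true;
}

/* Entering the guest: host values are saved, guest values are loaded. */
static bool model_switch_to_guest(struct model_cpu *cpu,
				  struct model_mte_regs *g_ctxt,
				  struct model_mte_regs *h_ctxt)
{
	return model_mte_switch(cpu, h_ctxt, g_ctxt);
}

/* Leaving the guest: guest values are saved, host values are restored. */
static bool model_switch_to_hyp(struct model_cpu *cpu,
				struct model_mte_regs *g_ctxt,
				struct model_mte_regs *h_ctxt)
{
	return model_mte_switch(cpu, g_ctxt, h_ctxt);
}
```

TFSR_EL1 and TFSRE0_EL1 are not in this model because the patch saves
and restores them from C, in __sysreg_save_el1_state() and
__sysreg_restore_el1_state().
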
 arch/arm64/include/asm/kvm_host.h          |  6 ++
 arch/arm64/include/asm/kvm_mte.h           | 68 ++++++++++++++++++++++
 arch/arm64/include/asm/sysreg.h            |  3 +-
 arch/arm64/kernel/asm-offsets.c            |  2 +
 arch/arm64/kvm/hyp/entry.S                 |  7 +++
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 21 +++++++
 arch/arm64/kvm/sys_regs.c                  | 22 +++++--
 7 files changed, 124 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_mte.h

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index afaa5333f0e4..309e36cc1b42 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -208,6 +208,12 @@ enum vcpu_sysreg {
 	CNTP_CVAL_EL0,
 	CNTP_CTL_EL0,
 
+	/* Memory Tagging Extension registers */
+	RGSR_EL1,	/* Random Allocation Tag Seed Register */
+	GCR_EL1,	/* Tag Control Register */
+	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
+	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
+
 	/* 32bit specific registers. Keep them at the end of the range */
 	DACR32_EL2,	/* Domain Access Control Register */
 	IFSR32_EL2,	/* Instruction Fault Status Register */
diff --git a/arch/arm64/include/asm/kvm_mte.h b/arch/arm64/include/asm/kvm_mte.h
new file mode 100644
index 000000000000..eae4bce9e269
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_mte.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 ARM Ltd.
+ */
+#ifndef __ASM_KVM_MTE_H
+#define __ASM_KVM_MTE_H
+
+#ifdef __ASSEMBLY__
+
+#include <asm/sysreg.h>
+
+#ifdef CONFIG_ARM64_MTE
+
+.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
+alternative_if_not ARM64_MTE
+	b	.L__skip_switch\@
+alternative_else_nop_endif
+	mrs	\reg1, hcr_el2
+	and	\reg1, \reg1, #(HCR_ATA)
+	cbz	\reg1, .L__skip_switch\@
+
+	mrs_s	\reg1, SYS_RGSR_EL1
+	str	\reg1, [\h_ctxt, #CPU_RGSR_EL1]
+	mrs_s	\reg1, SYS_GCR_EL1
+	str	\reg1, [\h_ctxt, #CPU_GCR_EL1]
+
+	ldr	\reg1, [\g_ctxt, #CPU_RGSR_EL1]
+	msr_s	SYS_RGSR_EL1, \reg1
+	ldr	\reg1, [\g_ctxt, #CPU_GCR_EL1]
+	msr_s	SYS_GCR_EL1, \reg1
+
+.L__skip_switch\@:
+.endm
+
+.macro mte_switch_to_hyp g_ctxt, h_ctxt, reg1
+alternative_if_not ARM64_MTE
+	b	.L__skip_switch\@
+alternative_else_nop_endif
+	mrs	\reg1, hcr_el2
+	and	\reg1, \reg1, #(HCR_ATA)
+	cbz	\reg1, .L__skip_switch\@
+
+	mrs_s	\reg1, SYS_RGSR_EL1
+	str	\reg1, [\g_ctxt, #CPU_RGSR_EL1]
+	mrs_s	\reg1, SYS_GCR_EL1
+	str	\reg1, [\g_ctxt, #CPU_GCR_EL1]
+
+	ldr	\reg1, [\h_ctxt, #CPU_RGSR_EL1]
+	msr_s	SYS_RGSR_EL1, \reg1
+	ldr	\reg1, [\h_ctxt, #CPU_GCR_EL1]
+	msr_s	SYS_GCR_EL1, \reg1
+
+	isb
+
+.L__skip_switch\@:
+.endm
+
+#else /* CONFIG_ARM64_MTE */
+
+.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
+.endm
+
+.macro mte_switch_to_hyp g_ctxt, h_ctxt, reg1
+.endm
+
+#endif /* CONFIG_ARM64_MTE */
+#endif /* __ASSEMBLY__ */
+#endif /* __ASM_KVM_MTE_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 65d15700a168..347ccac2341e 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -651,7 +651,8 @@
 
 #define INIT_SCTLR_EL2_MMU_ON						\
 	(SCTLR_ELx_M  | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I |	\
-	 SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 | SCTLR_EL2_RES1)
+	 SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 |		\
+	 SCTLR_ELx_ITFSB | SCTLR_EL2_RES1)
 
 #define INIT_SCTLR_EL2_MMU_OFF \
 	(SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 0cb34ccb6e73..6f0044cb233e 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -111,6 +111,8 @@ int main(void)
   DEFINE(VCPU_WORKAROUND_FLAGS,	offsetof(struct kvm_vcpu, arch.workaround_flags));
   DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_cpu_context, regs));
+  DEFINE(CPU_RGSR_EL1,		offsetof(struct kvm_cpu_context, sys_regs[RGSR_EL1]));
+  DEFINE(CPU_GCR_EL1,		offsetof(struct kvm_cpu_context, sys_regs[GCR_EL1]));
   DEFINE(CPU_APIAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APIAKEYLO_EL1]));
   DEFINE(CPU_APIBKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APIBKEYLO_EL1]));
   DEFINE(CPU_APDAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APDAKEYLO_EL1]));
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index e831d3dfd50d..435346ea1504 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -13,6 +13,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_mte.h>
 #include <asm/kvm_ptrauth.h>
 
 	.text
@@ -51,6 +52,9 @@ alternative_else_nop_endif
 
 	add	x29, x0, #VCPU_CONTEXT
 
+	// mte_switch_to_guest(g_ctxt, h_ctxt, tmp1)
+	mte_switch_to_guest x29, x1, x2
+
 	// Macro ptrauth_switch_to_guest format:
 	// 	ptrauth_switch_to_guest(guest cxt, tmp1, tmp2, tmp3)
 	// The below macro to restore guest keys is not implemented in C code
@@ -142,6 +146,9 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
 	// when this feature is enabled for kernel code.
 	ptrauth_switch_to_hyp x1, x2, x3, x4, x5
 
+	// mte_switch_to_hyp(g_ctxt, h_ctxt, reg1)
+	mte_switch_to_hyp x1, x2, x3
+
 	// Restore hyp's sp_el0
 	restore_sp_el0 x2, x3
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index cce43bfe158f..de7e14c862e6 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -14,6 +14,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
@@ -26,6 +27,16 @@ static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, TPIDRRO_EL0)	= read_sysreg(tpidrro_el0);
 }
 
+static inline bool ctxt_has_mte(struct kvm_cpu_context *ctxt)
+{
+	struct kvm_vcpu *vcpu = ctxt->__hyp_running_vcpu;
+
+	if (!vcpu)
+		vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	return kvm_has_mte(kern_hyp_va(vcpu->kvm));
+}
+
 static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 {
 	ctxt_sys_reg(ctxt, CSSELR_EL1)	= read_sysreg(csselr_el1);
@@ -46,6 +57,11 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg_par();
 	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
 
+	if (ctxt_has_mte(ctxt)) {
+		ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR);
+		ctxt_sys_reg(ctxt, TFSRE0_EL1) = read_sysreg_s(SYS_TFSRE0_EL1);
+	}
+
 	ctxt_sys_reg(ctxt, SP_EL1)	= read_sysreg(sp_el1);
 	ctxt_sys_reg(ctxt, ELR_EL1)	= read_sysreg_el1(SYS_ELR);
 	ctxt_sys_reg(ctxt, SPSR_EL1)	= read_sysreg_el1(SYS_SPSR);
@@ -107,6 +123,11 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
 	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
 
+	if (ctxt_has_mte(ctxt)) {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
+		write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
+	}
+
 	if (!has_vhe() &&
 	    cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
 	    ctxt->__hyp_running_vcpu) {
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4a98902eaf1a..440315a556c2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1309,6 +1309,20 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return true;
 }
 
+static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
+				   const struct sys_reg_desc *rd)
+{
+	return REG_HIDDEN;
+}
+
+#define MTE_REG(name) {				\
+	SYS_DESC(SYS_##name),			\
+	.access = undef_access,			\
+	.reset = reset_unknown,			\
+	.reg = name,				\
+	.visibility = mte_visibility,		\
+}
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {			\
 	SYS_DESC(SYS_##name),			\
@@ -1477,8 +1491,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
 	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
 
-	{ SYS_DESC(SYS_RGSR_EL1), undef_access },
-	{ SYS_DESC(SYS_GCR_EL1), undef_access },
+	MTE_REG(RGSR_EL1),
+	MTE_REG(GCR_EL1),
 
 	{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility },
 	{ SYS_DESC(SYS_TRFCR_EL1), undef_access },
@@ -1505,8 +1519,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi },
 	{ SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi },
 
-	{ SYS_DESC(SYS_TFSR_EL1), undef_access },
-	{ SYS_DESC(SYS_TFSRE0_EL1), undef_access },
+	MTE_REG(TFSR_EL1),
+	MTE_REG(TFSRE0_EL1),
 
 	{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
-- 
2.20.1


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v13 5/8] KVM: arm64: Save/restore MTE registers
@ 2021-05-24 10:45   ` Steven Price
  0 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

Define the new system registers that MTE introduces and context switch
them. The MTE feature is still hidden from the ID register as it isn't
supported in a VM yet.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_host.h          |  6 ++
 arch/arm64/include/asm/kvm_mte.h           | 68 ++++++++++++++++++++++
 arch/arm64/include/asm/sysreg.h            |  3 +-
 arch/arm64/kernel/asm-offsets.c            |  2 +
 arch/arm64/kvm/hyp/entry.S                 |  7 +++
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 21 +++++++
 arch/arm64/kvm/sys_regs.c                  | 22 +++++--
 7 files changed, 124 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_mte.h

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index afaa5333f0e4..309e36cc1b42 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -208,6 +208,12 @@ enum vcpu_sysreg {
 	CNTP_CVAL_EL0,
 	CNTP_CTL_EL0,
 
+	/* Memory Tagging Extension registers */
+	RGSR_EL1,	/* Random Allocation Tag Seed Register */
+	GCR_EL1,	/* Tag Control Register */
+	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
+	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
+
 	/* 32bit specific registers. Keep them at the end of the range */
 	DACR32_EL2,	/* Domain Access Control Register */
 	IFSR32_EL2,	/* Instruction Fault Status Register */
diff --git a/arch/arm64/include/asm/kvm_mte.h b/arch/arm64/include/asm/kvm_mte.h
new file mode 100644
index 000000000000..eae4bce9e269
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_mte.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 ARM Ltd.
+ */
+#ifndef __ASM_KVM_MTE_H
+#define __ASM_KVM_MTE_H
+
+#ifdef __ASSEMBLY__
+
+#include <asm/sysreg.h>
+
+#ifdef CONFIG_ARM64_MTE
+
+.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
+alternative_if_not ARM64_MTE
+	b	.L__skip_switch\@
+alternative_else_nop_endif
+	mrs	\reg1, hcr_el2
+	and	\reg1, \reg1, #(HCR_ATA)
+	cbz	\reg1, .L__skip_switch\@
+
+	mrs_s	\reg1, SYS_RGSR_EL1
+	str	\reg1, [\h_ctxt, #CPU_RGSR_EL1]
+	mrs_s	\reg1, SYS_GCR_EL1
+	str	\reg1, [\h_ctxt, #CPU_GCR_EL1]
+
+	ldr	\reg1, [\g_ctxt, #CPU_RGSR_EL1]
+	msr_s	SYS_RGSR_EL1, \reg1
+	ldr	\reg1, [\g_ctxt, #CPU_GCR_EL1]
+	msr_s	SYS_GCR_EL1, \reg1
+
+.L__skip_switch\@:
+.endm
+
+.macro mte_switch_to_hyp g_ctxt, h_ctxt, reg1
+alternative_if_not ARM64_MTE
+	b	.L__skip_switch\@
+alternative_else_nop_endif
+	mrs	\reg1, hcr_el2
+	and	\reg1, \reg1, #(HCR_ATA)
+	cbz	\reg1, .L__skip_switch\@
+
+	mrs_s	\reg1, SYS_RGSR_EL1
+	str	\reg1, [\g_ctxt, #CPU_RGSR_EL1]
+	mrs_s	\reg1, SYS_GCR_EL1
+	str	\reg1, [\g_ctxt, #CPU_GCR_EL1]
+
+	ldr	\reg1, [\h_ctxt, #CPU_RGSR_EL1]
+	msr_s	SYS_RGSR_EL1, \reg1
+	ldr	\reg1, [\h_ctxt, #CPU_GCR_EL1]
+	msr_s	SYS_GCR_EL1, \reg1
+
+	isb
+
+.L__skip_switch\@:
+.endm
+
+#else /* CONFIG_ARM64_MTE */
+
+.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
+.endm
+
+.macro mte_switch_to_hyp g_ctxt, h_ctxt, reg1
+.endm
+
+#endif /* CONFIG_ARM64_MTE */
+#endif /* __ASSEMBLY__ */
+#endif /* __ASM_KVM_MTE_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 65d15700a168..347ccac2341e 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -651,7 +651,8 @@
 
 #define INIT_SCTLR_EL2_MMU_ON						\
 	(SCTLR_ELx_M  | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I |	\
-	 SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 | SCTLR_EL2_RES1)
+	 SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 |		\
+	 SCTLR_ELx_ITFSB | SCTLR_EL2_RES1)
 
 #define INIT_SCTLR_EL2_MMU_OFF \
 	(SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 0cb34ccb6e73..6f0044cb233e 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -111,6 +111,8 @@ int main(void)
   DEFINE(VCPU_WORKAROUND_FLAGS,	offsetof(struct kvm_vcpu, arch.workaround_flags));
   DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_cpu_context, regs));
+  DEFINE(CPU_RGSR_EL1,		offsetof(struct kvm_cpu_context, sys_regs[RGSR_EL1]));
+  DEFINE(CPU_GCR_EL1,		offsetof(struct kvm_cpu_context, sys_regs[GCR_EL1]));
   DEFINE(CPU_APIAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APIAKEYLO_EL1]));
   DEFINE(CPU_APIBKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APIBKEYLO_EL1]));
   DEFINE(CPU_APDAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APDAKEYLO_EL1]));
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index e831d3dfd50d..435346ea1504 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -13,6 +13,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_mte.h>
 #include <asm/kvm_ptrauth.h>
 
 	.text
@@ -51,6 +52,9 @@ alternative_else_nop_endif
 
 	add	x29, x0, #VCPU_CONTEXT
 
+	// mte_switch_to_guest(g_ctxt, h_ctxt, tmp1)
+	mte_switch_to_guest x29, x1, x2
+
 	// Macro ptrauth_switch_to_guest format:
 	// 	ptrauth_switch_to_guest(guest cxt, tmp1, tmp2, tmp3)
 	// The below macro to restore guest keys is not implemented in C code
@@ -142,6 +146,9 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
 	// when this feature is enabled for kernel code.
 	ptrauth_switch_to_hyp x1, x2, x3, x4, x5
 
+	// mte_switch_to_hyp(g_ctxt, h_ctxt, reg1)
+	mte_switch_to_hyp x1, x2, x3
+
 	// Restore hyp's sp_el0
 	restore_sp_el0 x2, x3
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index cce43bfe158f..de7e14c862e6 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -14,6 +14,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
@@ -26,6 +27,16 @@ static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, TPIDRRO_EL0)	= read_sysreg(tpidrro_el0);
 }
 
+static inline bool ctxt_has_mte(struct kvm_cpu_context *ctxt)
+{
+	struct kvm_vcpu *vcpu = ctxt->__hyp_running_vcpu;
+
+	if (!vcpu)
+		vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	return kvm_has_mte(kern_hyp_va(vcpu->kvm));
+}
+
 static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 {
 	ctxt_sys_reg(ctxt, CSSELR_EL1)	= read_sysreg(csselr_el1);
@@ -46,6 +57,11 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg_par();
 	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
 
+	if (ctxt_has_mte(ctxt)) {
+		ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR);
+		ctxt_sys_reg(ctxt, TFSRE0_EL1) = read_sysreg_s(SYS_TFSRE0_EL1);
+	}
+
 	ctxt_sys_reg(ctxt, SP_EL1)	= read_sysreg(sp_el1);
 	ctxt_sys_reg(ctxt, ELR_EL1)	= read_sysreg_el1(SYS_ELR);
 	ctxt_sys_reg(ctxt, SPSR_EL1)	= read_sysreg_el1(SYS_SPSR);
@@ -107,6 +123,11 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
 	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
 
+	if (ctxt_has_mte(ctxt)) {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR);
+		write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
+	}
+
 	if (!has_vhe() &&
 	    cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
 	    ctxt->__hyp_running_vcpu) {
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4a98902eaf1a..440315a556c2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1309,6 +1309,20 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return true;
 }
 
+static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
+				   const struct sys_reg_desc *rd)
+{
+	return REG_HIDDEN;
+}
+
+#define MTE_REG(name) {				\
+	SYS_DESC(SYS_##name),			\
+	.access = undef_access,			\
+	.reset = reset_unknown,			\
+	.reg = name,				\
+	.visibility = mte_visibility,		\
+}
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {			\
 	SYS_DESC(SYS_##name),			\
@@ -1477,8 +1491,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
 	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
 
-	{ SYS_DESC(SYS_RGSR_EL1), undef_access },
-	{ SYS_DESC(SYS_GCR_EL1), undef_access },
+	MTE_REG(RGSR_EL1),
+	MTE_REG(GCR_EL1),
 
 	{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility },
 	{ SYS_DESC(SYS_TRFCR_EL1), undef_access },
@@ -1505,8 +1519,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi },
 	{ SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi },
 
-	{ SYS_DESC(SYS_TFSR_EL1), undef_access },
-	{ SYS_DESC(SYS_TFSRE0_EL1), undef_access },
+	MTE_REG(TFSR_EL1),
+	MTE_REG(TFSRE0_EL1),
 
 	{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v13 6/8] KVM: arm64: Expose KVM_CAP_ARM_MTE
  2021-05-24 10:45 ` Steven Price
  (?)
  (?)
@ 2021-05-24 10:45   ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

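It's now safe for the VMM to enable MTE in a guest, so expose the
capability to user space. The capability must be enabled before any
vCPUs are created, and 32-bit EL1 vCPUs are rejected once it is enabled.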

Signed-off-by: Steven Price <steven.price@arm.com>
---
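The diff below only lets user space enable MTE while the VM has no vCPUs,
and refuses 32-bit EL1 vCPUs once MTE is enabled. A minimal user-space
model of that ordering rule might look like the following. It is a sketch
for illustration, not kernel code: struct ex_vm, ex_enable_mte() and
ex_create_vcpu() are hypothetical names, and host_has_mte merely stands in
for system_supports_mte().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-VM state this patch looks at. */
struct ex_vm {
	bool host_has_mte;          /* stands in for system_supports_mte() */
	unsigned int created_vcpus; /* stands in for kvm->created_vcpus */
	bool mte_enabled;           /* stands in for kvm->arch.mte_enabled */
};

/* Models the KVM_CAP_ARM_MTE case added to kvm_vm_ioctl_enable_cap(). */
static int ex_enable_mte(struct ex_vm *vm)
{
	assert(vm != NULL);

	/* Refused on hosts without MTE or once any vCPU exists. */
	if (!vm->host_has_mte || vm->created_vcpus)
		return -EINVAL;

	vm->mte_enabled = true;
	return 0;
}

/*
 * Models only the check added to kvm_reset_vcpu(): a 32-bit EL1 vCPU is
 * rejected when MTE is enabled. The pre-existing ARM64_HAS_32BIT_EL1
 * check is left out of this sketch.
 */
static int ex_create_vcpu(struct ex_vm *vm, bool el1_32bit)
{
	assert(vm != NULL);

	if (el1_32bit && vm->mte_enabled)
		return -EINVAL;

	vm->created_vcpus++;
	return 0;
}
```

In this model a VMM that wants MTE calls ex_enable_mte() first and creates
its (64-bit) vCPUs afterwards; doing it the other way round fails with
-EINVAL, as in the diff.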
 arch/arm64/kvm/arm.c      | 9 +++++++++
 arch/arm64/kvm/reset.c    | 3 ++-
 arch/arm64/kvm/sys_regs.c | 3 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 1cb39c0803a4..e89a5e275e25 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -93,6 +93,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		r = 0;
 		kvm->arch.return_nisv_io_abort_to_user = true;
 		break;
+	case KVM_CAP_ARM_MTE:
+		if (!system_supports_mte() || kvm->created_vcpus)
+			return -EINVAL;
+		r = 0;
+		kvm->arch.mte_enabled = true;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -237,6 +243,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		 */
 		r = 1;
 		break;
+	case KVM_CAP_ARM_MTE:
+		r = system_supports_mte();
+		break;
 	case KVM_CAP_STEAL_TIME:
 		r = kvm_arm_pvtime_supported();
 		break;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 956cdc240148..50635eacfa43 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -220,7 +220,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
-			if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1)) {
+			if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1) ||
+			    vcpu->kvm->arch.mte_enabled) {
 				ret = -EINVAL;
 				goto out;
 			}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 440315a556c2..d4e1c1b1a08d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1312,6 +1312,9 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 				   const struct sys_reg_desc *rd)
 {
+	if (kvm_has_mte(vcpu->kvm))
+		return 0;
+
 	return REG_HIDDEN;
 }
 
-- 
2.20.1



* [PATCH v13 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
  2021-05-24 10:45 ` Steven Price
  (?)
  (?)
@ 2021-05-24 10:45   ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

The VMM may not wish to have its own mapping of guest memory mapped
with PROT_MTE because this causes problems if the VMM has tag checking
enabled (the guest controls the tags in physical RAM and it's unlikely
the tags are correct for the VMM).

Instead add a new ioctl, KVM_ARM_MTE_COPY_TAGS, which allows the VMM to
read/write the tags of guest memory. The VMM's mapping can then remain
non-PROT_MTE while the VMM can still read/write the tags for the purpose
of migration.

Signed-off-by: Steven Price <steven.price@arm.com>
---
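As a rough guide for VMM authors, the argument checks at the top of
kvm_vm_ioctl_mte_copy_tags() and the size of the tag buffer can be
sketched in plain user-space C as below. The names prefixed with ex_/EX_
are hypothetical, the struct is a local mirror of struct
kvm_arm_copy_mte_tags, and a 4K page size is assumed; a real VMM would
include <linux/kvm.h> and query the page size at run time. The ioctl
copies one tag per byte of the user buffer and MTE_GRANULES_PER_PAGE tags
per page, so the buffer needs length / 16 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration only. */
#define EX_PAGE_SIZE        4096ULL /* assumed 4K pages */
#define EX_MTE_GRANULE_SIZE 16ULL   /* same value as MTE_GRANULE_SIZE */
#define EX_TAGS_FROM_GUEST  1ULL    /* same value as KVM_ARM_TAGS_FROM_GUEST */

/* Local mirror of struct kvm_arm_copy_mte_tags from this patch. */
struct ex_copy_mte_tags {
	uint64_t guest_ipa;
	uint64_t length;
	void *addr;
	uint64_t flags;
	uint64_t reserved[2];
};

/* Bytes of tag storage for 'length' bytes of guest memory: one per granule. */
static size_t ex_tag_buf_size(uint64_t length)
{
	return (size_t)(length / EX_MTE_GRANULE_SIZE);
}

/* Mirrors the up-front checks that make the ioctl return -EINVAL. */
static bool ex_args_valid(const struct ex_copy_mte_tags *c)
{
	assert(c != NULL);

	if (c->reserved[0] || c->reserved[1])
		return false;	/* reserved fields must be zero */
	if (c->flags & ~EX_TAGS_FROM_GUEST)
		return false;	/* only the direction bit is defined */
	if ((c->length | c->guest_ipa) & (EX_PAGE_SIZE - 1))
		return false;	/* both must be page aligned */
	return true;
}
```

For example, saving the tags of two 4K pages needs a 512-byte buffer with
flags set to the FROM_GUEST value; restoring them on the destination uses
the same buffer with flags set to 0.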
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/include/asm/mte-def.h  |  1 +
 arch/arm64/include/uapi/asm/kvm.h | 11 +++++
 arch/arm64/kvm/arm.c              |  7 +++
 arch/arm64/kvm/guest.c            | 79 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |  1 +
 6 files changed, 102 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 309e36cc1b42..66b6339df949 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -729,6 +729,9 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 			       struct kvm_device_attr *attr);
 
+int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
+			       struct kvm_arm_copy_mte_tags *copy_tags);
+
 /* Guest/host FPSIMD coordination helpers */
 int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/mte-def.h b/arch/arm64/include/asm/mte-def.h
index cf241b0f0a42..626d359b396e 100644
--- a/arch/arm64/include/asm/mte-def.h
+++ b/arch/arm64/include/asm/mte-def.h
@@ -7,6 +7,7 @@
 
 #define MTE_GRANULE_SIZE	UL(16)
 #define MTE_GRANULE_MASK	(~(MTE_GRANULE_SIZE - 1))
+#define MTE_GRANULES_PER_PAGE	(PAGE_SIZE / MTE_GRANULE_SIZE)
 #define MTE_TAG_SHIFT		56
 #define MTE_TAG_SIZE		4
 #define MTE_TAG_MASK		GENMASK((MTE_TAG_SHIFT + (MTE_TAG_SIZE - 1)), MTE_TAG_SHIFT)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 24223adae150..b3edde68bc3e 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -184,6 +184,17 @@ struct kvm_vcpu_events {
 	__u32 reserved[12];
 };
 
+struct kvm_arm_copy_mte_tags {
+	__u64 guest_ipa;
+	__u64 length;
+	void __user *addr;
+	__u64 flags;
+	__u64 reserved[2];
+};
+
+#define KVM_ARM_TAGS_TO_GUEST		0
+#define KVM_ARM_TAGS_FROM_GUEST		1
+
 /* If you need to interpret the index values, here is the key: */
 #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
 #define KVM_REG_ARM_COPROC_SHIFT	16
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e89a5e275e25..baa33359e477 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1345,6 +1345,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
 		return 0;
 	}
+	case KVM_ARM_MTE_COPY_TAGS: {
+		struct kvm_arm_copy_mte_tags copy_tags;
+
+		if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
+			return -EFAULT;
+		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
+	}
 	default:
 		return -EINVAL;
 	}
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5cb4a1cd5603..7a1e181eb463 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -995,3 +995,82 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 
 	return ret;
 }
+
+int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
+			       struct kvm_arm_copy_mte_tags *copy_tags)
+{
+	gpa_t guest_ipa = copy_tags->guest_ipa;
+	size_t length = copy_tags->length;
+	void __user *tags = copy_tags->addr;
+	gpa_t gfn;
+	bool write = !(copy_tags->flags & KVM_ARM_TAGS_FROM_GUEST);
+	int ret = 0;
+
+	if (!kvm_has_mte(kvm))
+		return -EINVAL;
+
+	if (copy_tags->reserved[0] || copy_tags->reserved[1])
+		return -EINVAL;
+
+	if (copy_tags->flags & ~KVM_ARM_TAGS_FROM_GUEST)
+		return -EINVAL;
+
+	if (length & ~PAGE_MASK || guest_ipa & ~PAGE_MASK)
+		return -EINVAL;
+
+	gfn = gpa_to_gfn(guest_ipa);
+
+	mutex_lock(&kvm->slots_lock);
+
+	while (length > 0) {
+		kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
+		void *maddr;
+		unsigned long num_tags;
+		struct page *page;
+
+		if (is_error_noslot_pfn(pfn)) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		page = pfn_to_online_page(pfn);
+		if (!page) {
+			/* Reject ZONE_DEVICE memory */
+			ret = -EFAULT;
+			goto out;
+		}
+		maddr = page_address(page);
+
+		if (!write) {
+			if (test_bit(PG_mte_tagged, &page->flags))
+				num_tags = mte_copy_tags_to_user(tags, maddr,
+							MTE_GRANULES_PER_PAGE);
+			else
+				/* No tags in memory, so write zeros */
+				num_tags = MTE_GRANULES_PER_PAGE -
+					clear_user(tags, MTE_GRANULES_PER_PAGE);
+			kvm_release_pfn_clean(pfn);
+		} else {
+			num_tags = mte_copy_tags_from_user(maddr, tags,
+							MTE_GRANULES_PER_PAGE);
+			kvm_release_pfn_dirty(pfn);
+		}
+
+		if (num_tags != MTE_GRANULES_PER_PAGE) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		/* Set the flag after checking the write completed fully */
+		if (write)
+			set_bit(PG_mte_tagged, &page->flags);
+
+		gfn++;
+		tags += num_tags;
+		length -= PAGE_SIZE;
+	}
+
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return ret;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8c95ba0fadda..4c011c60d468 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1428,6 +1428,7 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_PMU_EVENT_FILTER */
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
+#define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
-- 
2.20.1



* [PATCH v13 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
@ 2021-05-24 10:45   ` Steven Price
  0 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

The VMM may not wish to have its own mapping of guest memory mapped
with PROT_MTE because this causes problems if the VMM has tag checking
enabled (the guest controls the tags in physical RAM and it's unlikely
the tags are correct for the VMM).

Instead add a new ioctl, KVM_ARM_MTE_COPY_TAGS, which allows the VMM to
read/write the tags of guest memory. The VMM's mapping can then remain
non-PROT_MTE while the VMM can still read/write the tags for the purpose
of migration.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/include/asm/mte-def.h  |  1 +
 arch/arm64/include/uapi/asm/kvm.h | 11 +++++
 arch/arm64/kvm/arm.c              |  7 +++
 arch/arm64/kvm/guest.c            | 79 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |  1 +
 6 files changed, 102 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 309e36cc1b42..66b6339df949 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -729,6 +729,9 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 			       struct kvm_device_attr *attr);
 
+int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
+			       struct kvm_arm_copy_mte_tags *copy_tags);
+
 /* Guest/host FPSIMD coordination helpers */
 int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/mte-def.h b/arch/arm64/include/asm/mte-def.h
index cf241b0f0a42..626d359b396e 100644
--- a/arch/arm64/include/asm/mte-def.h
+++ b/arch/arm64/include/asm/mte-def.h
@@ -7,6 +7,7 @@
 
 #define MTE_GRANULE_SIZE	UL(16)
 #define MTE_GRANULE_MASK	(~(MTE_GRANULE_SIZE - 1))
+#define MTE_GRANULES_PER_PAGE	(PAGE_SIZE / MTE_GRANULE_SIZE)
 #define MTE_TAG_SHIFT		56
 #define MTE_TAG_SIZE		4
 #define MTE_TAG_MASK		GENMASK((MTE_TAG_SHIFT + (MTE_TAG_SIZE - 1)), MTE_TAG_SHIFT)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 24223adae150..b3edde68bc3e 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -184,6 +184,17 @@ struct kvm_vcpu_events {
 	__u32 reserved[12];
 };
 
+struct kvm_arm_copy_mte_tags {
+	__u64 guest_ipa;
+	__u64 length;
+	void __user *addr;
+	__u64 flags;
+	__u64 reserved[2];
+};
+
+#define KVM_ARM_TAGS_TO_GUEST		0
+#define KVM_ARM_TAGS_FROM_GUEST		1
+
 /* If you need to interpret the index values, here is the key: */
 #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
 #define KVM_REG_ARM_COPROC_SHIFT	16
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e89a5e275e25..baa33359e477 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1345,6 +1345,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
 		return 0;
 	}
+	case KVM_ARM_MTE_COPY_TAGS: {
+		struct kvm_arm_copy_mte_tags copy_tags;
+
+		if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
+			return -EFAULT;
+		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
+	}
 	default:
 		return -EINVAL;
 	}
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5cb4a1cd5603..7a1e181eb463 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -995,3 +995,82 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 
 	return ret;
 }
+
+int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
+			       struct kvm_arm_copy_mte_tags *copy_tags)
+{
+	gpa_t guest_ipa = copy_tags->guest_ipa;
+	size_t length = copy_tags->length;
+	void __user *tags = copy_tags->addr;
+	gpa_t gfn;
+	bool write = !(copy_tags->flags & KVM_ARM_TAGS_FROM_GUEST);
+	int ret = 0;
+
+	if (!kvm_has_mte(kvm))
+		return -EINVAL;
+
+	if (copy_tags->reserved[0] || copy_tags->reserved[1])
+		return -EINVAL;
+
+	if (copy_tags->flags & ~KVM_ARM_TAGS_FROM_GUEST)
+		return -EINVAL;
+
+	if (length & ~PAGE_MASK || guest_ipa & ~PAGE_MASK)
+		return -EINVAL;
+
+	gfn = gpa_to_gfn(guest_ipa);
+
+	mutex_lock(&kvm->slots_lock);
+
+	while (length > 0) {
+		kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
+		void *maddr;
+		unsigned long num_tags;
+		struct page *page;
+
+		if (is_error_noslot_pfn(pfn)) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		page = pfn_to_online_page(pfn);
+		if (!page) {
+			/* Reject ZONE_DEVICE memory */
+			ret = -EFAULT;
+			goto out;
+		}
+		maddr = page_address(page);
+
+		if (!write) {
+			if (test_bit(PG_mte_tagged, &page->flags))
+				num_tags = mte_copy_tags_to_user(tags, maddr,
+							MTE_GRANULES_PER_PAGE);
+			else
+				/* No tags in memory, so write zeros */
+				num_tags = MTE_GRANULES_PER_PAGE -
+					clear_user(tags, MTE_GRANULES_PER_PAGE);
+			kvm_release_pfn_clean(pfn);
+		} else {
+			num_tags = mte_copy_tags_from_user(maddr, tags,
+							MTE_GRANULES_PER_PAGE);
+			kvm_release_pfn_dirty(pfn);
+		}
+
+		if (num_tags != MTE_GRANULES_PER_PAGE) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		/* Set the flag after checking the write completed fully */
+		if (write)
+			set_bit(PG_mte_tagged, &page->flags);
+
+		gfn++;
+		tags += num_tags;
+		length -= PAGE_SIZE;
+	}
+
+out:
+	mutex_unlock(&kvm->slots_lock);
+	return ret;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8c95ba0fadda..4c011c60d468 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1428,6 +1428,7 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_PMU_EVENT_FILTER */
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
+#define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
-- 
2.20.1
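
For illustration only (not part of the patch): a minimal userspace sketch
of how a VMM might fill in the ioctl argument and size the tag buffer.
The struct and helper names (copy_mte_tags_args, tag_buffer_size,
prepare_copy_tags) are hypothetical mirrors of the uapi additions above,
so the sketch builds without arm64 KVM headers. The alignment and flag
checks follow the ones in kvm_vm_ioctl_mte_copy_tags(); the buffer-length
check is an extra assumption on the caller's side, since the kernel has no
way to verify it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the uapi additions in this patch. */
#define TAGS_TO_GUEST   0u
#define TAGS_FROM_GUEST 1u
#define GRANULE_SIZE    16u /* bytes of memory covered by one tag */

/* Hypothetical userspace mirror of struct kvm_arm_copy_mte_tags. */
struct copy_mte_tags_args {
	uint64_t guest_ipa;
	uint64_t length;
	void *addr;
	uint64_t flags;
	uint64_t reserved[2];
};

/* One tag byte is transferred per 16-byte granule of guest memory. */
static size_t tag_buffer_size(uint64_t length)
{
	return (size_t)(length / GRANULE_SIZE);
}

/*
 * Validate the request the same way the kernel side does and fill in
 * the argument structure. Returns 0 on success or -EINVAL.
 */
static int prepare_copy_tags(struct copy_mte_tags_args *args,
			     uint64_t guest_ipa, uint64_t length,
			     void *buf, size_t buf_len,
			     uint64_t flags, uint64_t page_size)
{
	/* Only the direction bit is defined. */
	if (flags & ~(uint64_t)TAGS_FROM_GUEST)
		return -EINVAL;

	/* Both the IPA and the length must be page aligned. */
	if ((guest_ipa | length) & (page_size - 1))
		return -EINVAL;

	/* Caller-side assumption: the buffer must hold every tag. */
	if (buf_len < tag_buffer_size(length))
		return -EINVAL;

	/* Zeroing also clears the reserved fields, which must be 0. */
	memset(args, 0, sizeof(*args));
	args->guest_ipa = guest_ipa;
	args->length = length;
	args->addr = buf;
	args->flags = flags;
	return 0;
}
```

On a real arm64 host the same values would go into the struct from
<linux/kvm.h> and be passed to ioctl() on the VM file descriptor with
KVM_ARM_MTE_COPY_TAGS; page_size would typically come from
sysconf(_SC_PAGESIZE).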


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [PATCH v13 8/8] KVM: arm64: Document MTE capability and ioctl
  2021-05-24 10:45 ` Steven Price
  (?)
  (?)
@ 2021-05-24 10:45   ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-05-24 10:45 UTC (permalink / raw)
  To: Catalin Marinas, Marc Zyngier, Will Deacon
  Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose,
	kvmarm, linux-arm-kernel, linux-kernel, Dave Martin,
	Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela,
	Dr. David Alan Gilbert, Richard Henderson, Peter Maydell,
	Haibo Xu, Andrew Jones

A new capability (KVM_CAP_ARM_MTE) identifies that the kernel supports
granting a guest access to the tags, and provides a mechanism for the
VMM to enable it.

A new ioctl (KVM_ARM_MTE_COPY_TAGS) provides a simple way for a VMM to
access the tags of a guest without having to maintain a PROT_MTE mapping
in userspace. The above capability gates access to the ioctl.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 Documentation/virt/kvm/api.rst | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 22d077562149..ab45d7fe2aa5 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5034,6 +5034,37 @@ see KVM_XEN_VCPU_SET_ATTR above.
 The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
 with the KVM_XEN_VCPU_GET_ATTR ioctl.
 
+4.130 KVM_ARM_MTE_COPY_TAGS
+---------------------------
+
+:Capability: KVM_CAP_ARM_MTE
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: struct kvm_arm_copy_mte_tags
+:Returns: 0 on success, < 0 on error
+
+::
+
+  struct kvm_arm_copy_mte_tags {
+	__u64 guest_ipa;
+	__u64 length;
+	void __user *addr;
+	__u64 flags;
+	__u64 reserved[2];
+  };
+
+Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
+``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
+field must point to a buffer which the tags will be copied to or from.
+
+``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
+``KVM_ARM_TAGS_FROM_GUEST``.
+
+The size of the buffer to store the tags is ``(length / 16)`` bytes
+(granules in MTE are 16 bytes long). Each byte contains a single tag
+value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
+``PTRACE_POKEMTETAGS``.
+
 5. The kvm_run structure
 ========================
 
@@ -6362,6 +6393,27 @@ default.
 
 See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
 
+7.26 KVM_CAP_ARM_MTE
+--------------------
+
+:Architectures: arm64
+:Parameters: none
+
+This capability indicates that KVM (and the hardware) supports exposing the
+Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the
+VMM before creating any VCPUs to allow the guest access. Note that MTE is only
+available to a guest running in AArch64 mode and enabling this capability will
+cause attempts to create AArch32 VCPUs to fail.
+
+When enabled the guest is able to access tags associated with any memory given
+to the guest. KVM will ensure that the pages are flagged ``PG_mte_tagged`` so
+that the tags are maintained during swap or hibernation of the host; however
+the VMM needs to manually save/restore the tags as appropriate if the VM is
+migrated.
+
+When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
+perform a bulk copy of tags to/from the guest.
+
 8. Other capabilities.
 ======================
 
-- 
2.20.1
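
For illustration only (not part of the patch): a small sketch of the tag
buffer layout described in the documentation above, one byte per 16-byte
granule. The helper names are hypothetical. The sketch assumes the tag
sits in the low four bits of each byte, which is what the 4-bit
MTE_TAG_SIZE suggests, and that a logical tag lives in address bits 59:56
(MTE_TAG_SHIFT is 56 in mte-def.h).

```c
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SIZE 16u  /* MTE_GRANULE_SIZE; not exported to userspace */
#define TAG_SHIFT    56   /* MTE_TAG_SHIFT */
#define TAG_MASK     0xfu /* MTE_TAG_SIZE is 4 bits */

/* Tag of the granule that contains byte 'offset' of the copied range. */
static uint8_t tag_at_offset(const uint8_t *tags, uint64_t offset)
{
	return tags[offset / GRANULE_SIZE] & TAG_MASK;
}

/* Replace the logical tag (bits 59:56) of an address with 'tag'. */
static uint64_t address_with_tag(uint64_t addr, uint8_t tag)
{
	uint64_t mask = (uint64_t)TAG_MASK << TAG_SHIFT;

	return (addr & ~mask) | ((uint64_t)(tag & TAG_MASK) << TAG_SHIFT);
}
```

With 4 KiB pages each page contributes 256 tag bytes, so byte offsets 0
to 15 of the range map to tags[0] and offset 4095 maps to tags[255].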


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 3/8] arm64: mte: Sync tags for pages where PTE is untagged
  2021-05-24 10:45   ` Steven Price
  (?)
  (?)
@ 2021-06-03 14:20     ` Catalin Marinas
  -1 siblings, 0 replies; 92+ messages in thread
From: Catalin Marinas @ 2021-06-03 14:20 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Mon, May 24, 2021 at 11:45:08AM +0100, Steven Price wrote:
> A KVM guest could store tags in a page even if the VMM hasn't mapped
> the page with PROT_MTE. So when restoring pages from swap we will
> need to check to see if there are any saved tags even if !pte_tagged().
> 
> However don't check pages for which pte_access_permitted() returns false
> as these will not have been swapped out.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 4/8] KVM: arm64: Introduce MTE VM feature
  2021-05-24 10:45   ` Steven Price
  (?)
  (?)
@ 2021-06-03 16:00     ` Catalin Marinas
  -1 siblings, 0 replies; 92+ messages in thread
From: Catalin Marinas @ 2021-06-03 16:00 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Mon, May 24, 2021 at 11:45:09AM +0100, Steven Price wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c5d1f3c87dbd..226035cf7d6c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -822,6 +822,42 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
>  	return PAGE_SIZE;
>  }
>  
> +static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
> +			     unsigned long size)
> +{
> +	if (kvm_has_mte(kvm)) {

Nitpick (less indentation):

	if (!kvm_has_mte(kvm))
		return 0;

> +		/*
> +		 * The page will be mapped in stage 2 as Normal Cacheable, so
> +		 * the VM will be able to see the page's tags and therefore
> +		 * they must be initialised first. If PG_mte_tagged is set,
> +		 * tags have already been initialised.
> +		 * pfn_to_online_page() is used to reject ZONE_DEVICE pages
> +		 * that may not support tags.
> +		 */
> +		unsigned long i, nr_pages = size >> PAGE_SHIFT;
> +		struct page *page = pfn_to_online_page(pfn);
> +
> +		if (!page)
> +			return -EFAULT;
> +
> +		for (i = 0; i < nr_pages; i++, page++) {
> +			/*
> +			 * There is a potential (but very unlikely) race
> +			 * between two VMs which are sharing a physical page
> +			 * entering this at the same time. However by splitting
> +			 * the test/set the only risk is tags being overwritten
> +			 * by the mte_clear_page_tags() call.
> +			 */

And I think the real risk here is when the page is writable by at least
one of the VMs sharing the page. This excludes KSM, so it only leaves
the MAP_SHARED mappings.

> +			if (!test_bit(PG_mte_tagged, &page->flags)) {
> +				mte_clear_page_tags(page_address(page));
> +				set_bit(PG_mte_tagged, &page->flags);
> +			}
> +		}

If we want to cover this race (I'd say in a separate patch), we can call
mte_sync_page_tags(page, __pte(0), false, true) directly (hopefully I
got the arguments right). We can avoid the big lock in most cases if
kvm_arch_prepare_memory_region() sets a VM_MTE_RESET (tag clear etc.)
and __alloc_zeroed_user_highpage() clears the tags on allocation (as we
do for VM_MTE but the new flag would not affect the stage 1 VMM page
attributes).

> +	}
> +
> +	return 0;
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_memory_slot *memslot, unsigned long hva,
>  			  unsigned long fault_status)
> @@ -971,8 +1007,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (writable)
>  		prot |= KVM_PGTABLE_PROT_W;
>  
> -	if (fault_status != FSC_PERM && !device)
> +	if (fault_status != FSC_PERM && !device) {
> +		ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
> +		if (ret)
> +			goto out_unlock;

Maybe it was discussed in a previous version, why do we need this in
addition to kvm_set_spte_gfn()?

> +
>  		clean_dcache_guest_page(pfn, vma_pagesize);
> +	}
>  
>  	if (exec_fault) {
>  		prot |= KVM_PGTABLE_PROT_X;
> @@ -1168,12 +1209,17 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  {
>  	kvm_pfn_t pfn = pte_pfn(range->pte);
> +	int ret;
>  
>  	if (!kvm->arch.mmu.pgt)
>  		return 0;
>  
>  	WARN_ON(range->end - range->start != 1);
>  
> +	ret = sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
> +	if (ret)
> +		return false;
> +
>  	/*
>  	 * We've moved a page around, probably through CoW, so let's treat it
>  	 * just like a translation fault and clean the cache to the PoC.

Otherwise the patch looks fine.

-- 
Catalin
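
For illustration only (not from the thread): a userspace model of the
early-return shape suggested in the nitpick above. Everything here is a
stand-in: model_page, its mte_tagged field (playing the role of
PG_mte_tagged) and model_sanitise_tags are hypothetical, and memset()
takes the place of mte_clear_page_tags(). The model is single-threaded
and makes no attempt to address the test/set race discussed in the
review.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_GRANULES  (MODEL_PAGE_SIZE / 16u)

/* Stand-in for struct page plus its tag storage. */
struct model_page {
	bool mte_tagged;                    /* models PG_mte_tagged */
	unsigned char tags[MODEL_GRANULES]; /* one tag per granule */
};

static int model_sanitise_tags(bool vm_has_mte, struct model_page *pages,
			       size_t nr_pages)
{
	size_t i;

	/* Guard clause: nothing to do for a VM without MTE. */
	if (!vm_has_mte)
		return 0;

	/* Models the pfn_to_online_page() failure path (-EFAULT). */
	if (!pages)
		return -1;

	for (i = 0; i < nr_pages; i++) {
		/* Pages already flagged keep their existing tags. */
		if (pages[i].mte_tagged)
			continue;

		memset(pages[i].tags, 0, sizeof(pages[i].tags));
		pages[i].mte_tagged = true;
	}

	return 0;
}
```

The body of the function sits one indentation level shallower than in the
quoted hunk, which is the only point of the suggestion; the behaviour is
meant to be unchanged.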

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 5/8] KVM: arm64: Save/restore MTE registers
  2021-05-24 10:45   ` Steven Price
  (?)
  (?)
@ 2021-06-03 16:48     ` Catalin Marinas
  -1 siblings, 0 replies; 92+ messages in thread
From: Catalin Marinas @ 2021-06-03 16:48 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Mon, May 24, 2021 at 11:45:10AM +0100, Steven Price wrote:
> diff --git a/arch/arm64/include/asm/kvm_mte.h b/arch/arm64/include/asm/kvm_mte.h
> new file mode 100644
> index 000000000000..eae4bce9e269
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_mte.h
> @@ -0,0 +1,68 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2020 ARM Ltd.

You may want to update some of the years.

> + */
> +#ifndef __ASM_KVM_MTE_H
> +#define __ASM_KVM_MTE_H
> +
> +#ifdef __ASSEMBLY__
> +
> +#include <asm/sysreg.h>
> +
> +#ifdef CONFIG_ARM64_MTE
> +
> +.macro mte_switch_to_guest g_ctxt, h_ctxt, reg1
> +alternative_if_not ARM64_MTE
> +	b	.L__skip_switch\@
> +alternative_else_nop_endif
> +	mrs	\reg1, hcr_el2
> +	and	\reg1, \reg1, #(HCR_ATA)
> +	cbz	\reg1, .L__skip_switch\@

Nitpick: TBZ would be shorter, though you need the bit number.
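
Roughly, the AND + CBZ pair would become a single
"tbz \reg1, #HCR_ATA_SHIFT, .L__skip_switch\@". HCR_ATA_SHIFT is an
assumed name here; the bit number itself (56) is the position of ATA in
HCR_EL2. A small C model of the two forms, just to show they take the
same branch (the helper names are made up for the sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2.ATA is bit 56; the _SHIFT name is an assumption of this sketch. */
#define HCR_ATA_SHIFT	56
#define HCR_ATA		(UINT64_C(1) << HCR_ATA_SHIFT)

/* Models "and reg, reg, #HCR_ATA" followed by "cbz reg, skip". */
static bool skip_switch_and_cbz(uint64_t hcr_el2)
{
	uint64_t masked = hcr_el2 & HCR_ATA;

	return masked == 0;
}

/* Models "tbz reg, #bit, skip": the branch tests a single bit by index. */
static bool bit_is_clear(uint64_t val, unsigned int bit)
{
	assert(bit < 64);	/* TBZ encodes a bit index in 0..63 */
	return ((val >> bit) & 1) == 0;
}

static bool skip_switch_tbz(uint64_t hcr_el2)
{
	return bit_is_clear(hcr_el2, HCR_ATA_SHIFT);
}
```

Both helpers return true (skip the switch) exactly when the ATA bit is
clear, so the TBZ form only saves the intermediate AND.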

The patch looks fine (as per my understanding of the KVM context
switching code):

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 6/8] KVM: arm64: Expose KVM_ARM_CAP_MTE
  2021-05-24 10:45   ` Steven Price
  (?)
  (?)
@ 2021-06-03 16:58     ` Catalin Marinas
  -1 siblings, 0 replies; 92+ messages in thread
From: Catalin Marinas @ 2021-06-03 16:58 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Mon, May 24, 2021 at 11:45:11AM +0100, Steven Price wrote:
> It's now safe for the VMM to enable MTE in a guest, so expose the
> capability to user space.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
  2021-05-24 10:45   ` Steven Price
  (?)
  (?)
@ 2021-06-03 17:13     ` Catalin Marinas
  -1 siblings, 0 replies; 92+ messages in thread
From: Catalin Marinas @ 2021-06-03 17:13 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Mon, May 24, 2021 at 11:45:12AM +0100, Steven Price wrote:
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 24223adae150..b3edde68bc3e 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -184,6 +184,17 @@ struct kvm_vcpu_events {
>  	__u32 reserved[12];
>  };
>  
> +struct kvm_arm_copy_mte_tags {
> +	__u64 guest_ipa;
> +	__u64 length;
> +	void __user *addr;
> +	__u64 flags;
> +	__u64 reserved[2];
> +};
> +
> +#define KVM_ARM_TAGS_TO_GUEST		0
> +#define KVM_ARM_TAGS_FROM_GUEST		1
> +
>  /* If you need to interpret the index values, here is the key: */
>  #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
>  #define KVM_REG_ARM_COPROC_SHIFT	16
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e89a5e275e25..baa33359e477 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1345,6 +1345,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  
>  		return 0;
>  	}
> +	case KVM_ARM_MTE_COPY_TAGS: {
> +		struct kvm_arm_copy_mte_tags copy_tags;
> +
> +		if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
> +			return -EFAULT;
> +		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
> +	}

I wonder whether we need to update the user structure following a
fault, e.g. with how much was copied. In case of an error, some tags may
already have been copied and the VMM may want to skip the page before
continuing. But no such information is provided here.

On the ptrace interface, we return 0 from the syscall if any bytes were
copied and update iov_len to that number. Maybe you want to still return
an error here, but updating copy_tags.length would be nice (and, of
course, doing a copy_to_user() back).
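
As a rough illustration of what I mean, here is a userspace model (not
the real ioctl) that reports progress through the length field even when
it fails. copy_tags_model, MODEL_PAGE_SIZE and the fault_ipa parameter
are all invented for the sketch; only guest_ipa and length correspond to
fields in the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/* Cut-down stand-in for struct kvm_arm_copy_mte_tags. */
struct copy_tags_model {
	uint64_t guest_ipa;
	uint64_t length;	/* in: bytes requested, out: bytes handled */
};

/*
 * Walk the range page by page. fault_ipa is a hypothetical address at
 * which the copy fails; pass UINT64_MAX for "never fails".
 * Returns 0 or -EFAULT and always writes the progress back to ->length.
 */
static int copy_tags_model(struct copy_tags_model *req, uint64_t fault_ipa)
{
	uint64_t done = 0;
	int ret = 0;

	assert(req != NULL);

	while (done < req->length) {
		uint64_t step = req->length - done;

		if (step > MODEL_PAGE_SIZE)
			step = MODEL_PAGE_SIZE;
		if (req->guest_ipa + done == fault_ipa) {
			ret = -EFAULT;
			break;
		}
		done += step;
	}

	req->length = done;	/* the kernel side would copy_to_user() this */
	return ret;
}
```

On -EFAULT the VMM can then look at length, skip the page at
guest_ipa + length and carry on with the remainder, rather than guessing
how far the copy got.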

-- 
Catalin

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 4/8] KVM: arm64: Introduce MTE VM feature
  2021-06-03 16:00     ` Catalin Marinas
  (?)
  (?)
@ 2021-06-04  9:01       ` Catalin Marinas
  -1 siblings, 0 replies; 92+ messages in thread
From: Catalin Marinas @ 2021-06-04  9:01 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Thu, Jun 03, 2021 at 05:00:31PM +0100, Catalin Marinas wrote:
> On Mon, May 24, 2021 at 11:45:09AM +0100, Steven Price wrote:
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c5d1f3c87dbd..226035cf7d6c 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -822,6 +822,42 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
> >  	return PAGE_SIZE;
> >  }
> >  
> > +static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
> > +			     unsigned long size)
> > +{
> > +	if (kvm_has_mte(kvm)) {
> > +		/*
> > +		 * The page will be mapped in stage 2 as Normal Cacheable, so
> > +		 * the VM will be able to see the page's tags and therefore
> > +		 * they must be initialised first. If PG_mte_tagged is set,
> > +		 * tags have already been initialised.
> > +		 * pfn_to_online_page() is used to reject ZONE_DEVICE pages
> > +		 * that may not support tags.
> > +		 */
> > +		unsigned long i, nr_pages = size >> PAGE_SHIFT;
> > +		struct page *page = pfn_to_online_page(pfn);
> > +
> > +		if (!page)
> > +			return -EFAULT;
> > +
> > +		for (i = 0; i < nr_pages; i++, page++) {
> > +			/*
> > +			 * There is a potential (but very unlikely) race
> > +			 * between two VMs which are sharing a physical page
> > +			 * entering this at the same time. However by splitting
> > +			 * the test/set the only risk is tags being overwritten
> > +			 * by the mte_clear_page_tags() call.
> > +			 */
> 
> And I think the real risk here is when the page is writable by at least
> one of the VMs sharing the page. This excludes KSM, so it only leaves
> the MAP_SHARED mappings.
> 
> > +			if (!test_bit(PG_mte_tagged, &page->flags)) {
> > +				mte_clear_page_tags(page_address(page));
> > +				set_bit(PG_mte_tagged, &page->flags);
> > +			}
> > +		}
> 
> If we want to cover this race (I'd say in a separate patch), we can call
> mte_sync_page_tags(page, __pte(0), false, true) directly (hopefully I
> got the arguments right). We can avoid the big lock in most cases if
> kvm_arch_prepare_memory_region() sets a VM_MTE_RESET (tag clear etc.)
> and __alloc_zeroed_user_highpage() clears the tags on allocation (as we
> do for VM_MTE but the new flag would not affect the stage 1 VMM page
> attributes).

Another idea: if VM_SHARED is found for any vma within a region in
kvm_arch_prepare_memory_region(), we either prevent the enabling of MTE
for the guest or reject the memory slot if MTE was already enabled.
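
Something along these lines, written as a userspace model rather than
real KVM code. vm_model, vma_model, the two helpers and the choice of
-EINVAL are assumptions of the sketch; VM_SHARED_MODEL merely stands in
for VM_SHARED:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_SHARED_MODEL	0x8ul	/* stand-in for the kernel's VM_SHARED */

struct vma_model {
	unsigned long vm_flags;
};

struct vm_model {
	bool mte_enabled;	/* guest already has MTE enabled */
	bool has_shared_slot;	/* a memslot backed by a shared vma exists */
};

/* Models the check in kvm_arch_prepare_memory_region(). */
static int prepare_memslot_model(struct vm_model *vm,
				 const struct vma_model *vmas, size_t nr)
{
	size_t i;

	assert(vm != NULL && vmas != NULL);

	for (i = 0; i < nr; i++) {
		if (!(vmas[i].vm_flags & VM_SHARED_MODEL))
			continue;
		if (vm->mte_enabled)
			return -EINVAL;	/* reject the memory slot */
		vm->has_shared_slot = true;
	}
	return 0;
}

/* Models enabling the MTE capability on the VM. */
static int enable_mte_model(struct vm_model *vm)
{
	assert(vm != NULL);

	if (vm->has_shared_slot)
		return -EINVAL;	/* prevent enabling MTE */
	vm->mte_enabled = true;
	return 0;
}
```

Whichever of the two operations comes second is the one that fails, so a
guest never ends up with both MTE and a shared mapping in a memslot.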

An alternative here would be to clear VM_MTE_ALLOWED so that any
subsequent mprotect(PROT_MTE) in the VMM would fail in
arch_validate_flags(). MTE would still be allowed in the guest but not
in the VMM for the guest memory regions. We can probably do this
irrespective of VM_SHARED. Of course, the VMM can still mmap() the
memory initially with PROT_MTE but that's not an issue IIRC, only the
concurrent mprotect().
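
A small userspace model of that behaviour, assuming the validation rule
is "VM_MTE is only valid together with VM_MTE_ALLOWED". The flag values
and helper names below are invented for the sketch and are not the
kernel's:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_MTE_MODEL		0x1ul	/* stand-in for VM_MTE */
#define VM_MTE_ALLOWED_MODEL	0x2ul	/* stand-in for VM_MTE_ALLOWED */

/* Models arch_validate_flags(): VM_MTE requires VM_MTE_ALLOWED. */
static bool validate_flags_model(unsigned long vm_flags)
{
	return !(vm_flags & VM_MTE_MODEL) ||
	       (vm_flags & VM_MTE_ALLOWED_MODEL);
}

/* Models mprotect(PROT_MTE): try to add VM_MTE to an existing vma. */
static int mprotect_mte_model(unsigned long *vm_flags)
{
	unsigned long newflags;

	assert(vm_flags != NULL);

	newflags = *vm_flags | VM_MTE_MODEL;
	if (!validate_flags_model(newflags))
		return -EINVAL;
	*vm_flags = newflags;
	return 0;
}

/* Models what memslot registration would do to the backing vma. */
static void memslot_clear_mte_allowed_model(unsigned long *vm_flags)
{
	assert(vm_flags != NULL);

	*vm_flags &= ~VM_MTE_ALLOWED_MODEL;
}
```

A vma that was mapped with PROT_MTE up front keeps VM_MTE after the
clear; only a later attempt to add it is refused.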

-- 
Catalin

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 4/8] KVM: arm64: Introduce MTE VM feature
  2021-06-03 16:00     ` Catalin Marinas
  (?)
  (?)
@ 2021-06-04 10:42       ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-06-04 10:42 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On 03/06/2021 17:00, Catalin Marinas wrote:
> On Mon, May 24, 2021 at 11:45:09AM +0100, Steven Price wrote:
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c5d1f3c87dbd..226035cf7d6c 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -822,6 +822,42 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
>>  	return PAGE_SIZE;
>>  }
>>  
>> +static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>> +			     unsigned long size)
>> +{
>> +	if (kvm_has_mte(kvm)) {
> 
> Nitpick (less indentation):
> 
> 	if (!kvm_has_mte(kvm))
> 		return 0;

Thanks, will change.
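
For reference, the early-return shape would look roughly like the
userspace sketch below. This is not the patch itself: struct sketch_page,
the has_mte parameter and sketch_sanitise_tags() are stand-ins invented
for illustration, and the tag clearing is reduced to a comment.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct page; 'tagged' plays the role of PG_mte_tagged. */
struct sketch_page {
	bool tagged;
};

static int sketch_sanitise_tags(bool has_mte, struct sketch_page *pages,
				size_t nr_pages)
{
	size_t i;

	/* Early return keeps the main body one indentation level shallower. */
	if (!has_mte)
		return 0;

	/* Models the pfn_to_online_page() failure case. */
	if (!pages)
		return -EFAULT;

	for (i = 0; i < nr_pages; i++) {
		if (!pages[i].tagged) {
			/* The real code would clear the page's tags here. */
			pages[i].tagged = true;
		}
	}

	return 0;
}
```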

>> +		/*
>> +		 * The page will be mapped in stage 2 as Normal Cacheable, so
>> +		 * the VM will be able to see the page's tags and therefore
>> +		 * they must be initialised first. If PG_mte_tagged is set,
>> +		 * tags have already been initialised.
>> +		 * pfn_to_online_page() is used to reject ZONE_DEVICE pages
>> +		 * that may not support tags.
>> +		 */
>> +		unsigned long i, nr_pages = size >> PAGE_SHIFT;
>> +		struct page *page = pfn_to_online_page(pfn);
>> +
>> +		if (!page)
>> +			return -EFAULT;
>> +
>> +		for (i = 0; i < nr_pages; i++, page++) {
>> +			/*
>> +			 * There is a potential (but very unlikely) race
>> +			 * between two VMs which are sharing a physical page
>> +			 * entering this at the same time. However by splitting
>> +			 * the test/set the only risk is tags being overwritten
>> +			 * by the mte_clear_page_tags() call.
>> +			 */
> 
> And I think the real risk here is when the page is writable by at least
> one of the VMs sharing the page. This excludes KSM, so it only leaves
> the MAP_SHARED mappings.
> 
>> +			if (!test_bit(PG_mte_tagged, &page->flags)) {
>> +				mte_clear_page_tags(page_address(page));
>> +				set_bit(PG_mte_tagged, &page->flags);
>> +			}
>> +		}
> 
> If we want to cover this race (I'd say in a separate patch), we can call
> mte_sync_page_tags(page, __pte(0), false, true) directly (hopefully I
> got the arguments right). We can avoid the big lock in most cases if
> kvm_arch_prepare_memory_region() sets a VM_MTE_RESET (tag clear etc.)
> and __alloc_zeroed_user_highpage() clears the tags on allocation (as we
> do for VM_MTE but the new flag would not affect the stage 1 VMM page
> attributes).

To be honest I'm coming round to just exporting a
mte_prepare_page_tags() function which does the clear/set with the lock
held. I doubt it's such a performance-critical path that it will cause
any noticeable issues. Then if we run into performance problems in the
future we can start experimenting with extra VM flags etc. as necessary.
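
Roughly what I have in mind, as a userspace model rather than the real
implementation. The model_* names, the tag buffer size and the C11
atomic_flag spinlock are assumptions for illustration; in the kernel the
lock would be the existing tag_sync_lock and the clear would be
mte_clear_page_tags().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_TAG_BYTES 128	/* hypothetical per-page tag storage size */

struct model_page {
	bool mte_tagged;			/* stands in for PG_mte_tagged */
	unsigned char tags[MODEL_TAG_BYTES];	/* stands in for the page's tags */
};

/* Stands in for the global tag_sync_lock spinlock. */
static atomic_flag model_tag_sync_lock = ATOMIC_FLAG_INIT;

/*
 * Test, clear and set all happen under the lock, so two callers racing
 * on the same page cannot both clear the tags.
 * Returns true if this call initialised the tags.
 */
static bool model_prepare_page_tags(struct model_page *page)
{
	bool cleared = false;

	while (atomic_flag_test_and_set_explicit(&model_tag_sync_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */

	if (!page->mte_tagged) {
		memset(page->tags, 0, sizeof(page->tags));
		page->mte_tagged = true;
		cleared = true;
	}

	atomic_flag_clear_explicit(&model_tag_sync_lock, memory_order_release);

	return cleared;
}
```

The second caller to reach the lock sees the flag already set and leaves
the tags alone, so tags written after the first initialisation are never
overwritten.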

And from your later email:
> Another idea: if VM_SHARED is found for any vma within a region in
> kvm_arch_prepare_memory_region(), we either prevent the enabling of MTE
> for the guest or reject the memory slot if MTE was already enabled.
> 
> An alternative here would be to clear VM_MTE_ALLOWED so that any
> subsequent mprotect(PROT_MTE) in the VMM would fail in
> arch_validate_flags(). MTE would still be allowed in the guest but not in
> the VMM for the guest memory regions. We can probably do this
> irrespective of VM_SHARED. Of course, the VMM can still mmap() the
> memory initially with PROT_MTE but that's not an issue IIRC, only the
> concurrent mprotect().

This could work, but I worry that it's potentially fragile. Also the rules
for what user space can do are not obvious and may be surprising. I'd
also want to look into the likes of mremap() to see how easy it would be
to ensure that we couldn't end up with VM_SHARED (or VM_MTE_ALLOWED)
memory sneaking into a memslot.

Unless you think it's worth complicating the ABI in the hope of avoiding
the big lock overhead, I think it's probably best to stick with the big
lock, at least until we have more data on the overhead.

>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  struct kvm_memory_slot *memslot, unsigned long hva,
>>  			  unsigned long fault_status)
>> @@ -971,8 +1007,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	if (writable)
>>  		prot |= KVM_PGTABLE_PROT_W;
>>  
>> -	if (fault_status != FSC_PERM && !device)
>> +	if (fault_status != FSC_PERM && !device) {
>> +		ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
>> +		if (ret)
>> +			goto out_unlock;
> 
> Maybe it was discussed in a previous version, why do we need this in
> addition to kvm_set_spte_gfn()?

kvm_set_spte_gfn() is only used for the MMU notifier path (e.g. if a
memslot is changed by the VMM). For the initial access we will normally
fault the page into stage 2 with user_mem_abort().

>> +
>>  		clean_dcache_guest_page(pfn, vma_pagesize);
>> +	}
>>  
>>  	if (exec_fault) {
>>  		prot |= KVM_PGTABLE_PROT_X;
>> @@ -1168,12 +1209,17 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>>  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>>  {
>>  	kvm_pfn_t pfn = pte_pfn(range->pte);
>> +	int ret;
>>  
>>  	if (!kvm->arch.mmu.pgt)
>>  		return 0;
>>  
>>  	WARN_ON(range->end - range->start != 1);
>>  
>> +	ret = sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
>> +	if (ret)
>> +		return false;
>> +
>>  	/*
>>  	 * We've moved a page around, probably through CoW, so let's treat it
>>  	 * just like a translation fault and clean the cache to the PoC.
> 
> Otherwise the patch looks fine.
> 

Thanks for the review.

Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 4/8] KVM: arm64: Introduce MTE VM feature
@ 2021-06-04 10:42       ` Steven Price
  0 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-06-04 10:42 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On 03/06/2021 17:00, Catalin Marinas wrote:
> On Mon, May 24, 2021 at 11:45:09AM +0100, Steven Price wrote:
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c5d1f3c87dbd..226035cf7d6c 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -822,6 +822,42 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
>>  	return PAGE_SIZE;
>>  }
>>  
>> +static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>> +			     unsigned long size)
>> +{
>> +	if (kvm_has_mte(kvm)) {
> 
> Nitpick (less indentation):
> 
> 	if (!kvm_has_mte(kvm))
> 		return 0;

Thanks, will change.

>> +		/*
>> +		 * The page will be mapped in stage 2 as Normal Cacheable, so
>> +		 * the VM will be able to see the page's tags and therefore
>> +		 * they must be initialised first. If PG_mte_tagged is set,
>> +		 * tags have already been initialised.
>> +		 * pfn_to_online_page() is used to reject ZONE_DEVICE pages
>> +		 * that may not support tags.
>> +		 */
>> +		unsigned long i, nr_pages = size >> PAGE_SHIFT;
>> +		struct page *page = pfn_to_online_page(pfn);
>> +
>> +		if (!page)
>> +			return -EFAULT;
>> +
>> +		for (i = 0; i < nr_pages; i++, page++) {
>> +			/*
>> +			 * There is a potential (but very unlikely) race
>> +			 * between two VMs which are sharing a physical page
>> +			 * entering this at the same time. However by splitting
>> +			 * the test/set the only risk is tags being overwritten
>> +			 * by the mte_clear_page_tags() call.
>> +			 */
> 
> And I think the real risk here is when the page is writable by at least
> one of the VMs sharing the page. This excludes KSM, so it only leaves
> the MAP_SHARED mappings.
> 
>> +			if (!test_bit(PG_mte_tagged, &page->flags)) {
>> +				mte_clear_page_tags(page_address(page));
>> +				set_bit(PG_mte_tagged, &page->flags);
>> +			}
>> +		}
> 
> If we want to cover this race (I'd say in a separate patch), we can call
> mte_sync_page_tags(page, __pte(0), false, true) directly (hopefully I
> got the arguments right). We can avoid the big lock in most cases if
> kvm_arch_prepare_memory_region() sets a VM_MTE_RESET (tag clear etc.)
> and __alloc_zeroed_user_highpage() clears the tags on allocation (as we
> do for VM_MTE but the new flag would not affect the stage 1 VMM page
> attributes).

To be honest I'm coming round to just exporting a
mte_prepare_page_tags() function which does the clear/set with the lock
held. I doubt it's such a performance-critical path that it will cause
any noticeable issues. Then if we run into performance problems in the
future we can start experimenting with extra VM flags etc. as necessary.

And from your later email:
> Another idea: if VM_SHARED is found for any vma within a region in
> kvm_arch_prepare_memory_region(), we either prevent the enabling of MTE
> for the guest or reject the memory slot if MTE was already enabled.
> 
> An alternative here would be to clear VM_MTE_ALLOWED so that any
> subsequent mprotect(PROT_MTE) in the VMM would fail in
> arch_validate_flags(). MTE would still be allowed in the guest but not in
> the VMM for the guest memory regions. We can probably do this
> irrespective of VM_SHARED. Of course, the VMM can still mmap() the
> memory initially with PROT_MTE but that's not an issue IIRC, only the
> concurrent mprotect().

This could work, but I worry that it's potentially fragile. Also the rules
for what user space can do are not obvious and may be surprising. I'd
also want to look into the likes of mremap() to see how easy it would be
to ensure that we couldn't end up with VM_SHARED (or VM_MTE_ALLOWED)
memory sneaking into a memslot.

Unless you think it's worth complicating the ABI in the hope of avoiding
the big lock overhead I think it's probably best to stick with the big
lock at least until we have more data on the overhead.
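
For illustration only, the "clear/set with the lock held" idea can be
modelled in plain user-space C as below. This is a rough sketch, not the
proposed kernel code: struct model_page, model_prepare_page_tags(), the
spinlock helpers and the 256-tags-per-page figure (4096-byte page, 16-byte
granule) are all stand-ins invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: 4096-byte page / 16-byte MTE granule = 256 tags per page. */
#define MODEL_TAGS_PER_PAGE 256

/* Hypothetical stand-in for struct page plus its tag storage. */
struct model_page {
	bool mte_tagged;                          /* models PG_mte_tagged */
	unsigned char tags[MODEL_TAGS_PER_PAGE];  /* models the allocation tags */
};

/* Models one global spinlock (the "big lock" discussed above). */
static atomic_flag model_tag_lock = ATOMIC_FLAG_INIT;

static void model_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&model_tag_lock,
						 memory_order_acquire))
		; /* spin until the previous holder releases the lock */
}

static void model_unlock(void)
{
	atomic_flag_clear_explicit(&model_tag_lock, memory_order_release);
}

/*
 * Test, clear and set as one critical section, so a second caller can
 * never clear tags after the first caller has marked the page as tagged.
 * Returns true if this call performed the initialisation.
 */
bool model_prepare_page_tags(struct model_page *page)
{
	bool did_init = false;

	model_lock();
	if (!page->mte_tagged) {
		memset(page->tags, 0, sizeof(page->tags));
		page->mte_tagged = true;
		did_init = true;
	}
	model_unlock();

	return did_init;
}
```

The point of the sketch is only that the test and the set sit inside the
same critical section; whether a single global lock is acceptable is the
open performance question.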

>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  struct kvm_memory_slot *memslot, unsigned long hva,
>>  			  unsigned long fault_status)
>> @@ -971,8 +1007,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	if (writable)
>>  		prot |= KVM_PGTABLE_PROT_W;
>>  
>> -	if (fault_status != FSC_PERM && !device)
>> +	if (fault_status != FSC_PERM && !device) {
>> +		ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
>> +		if (ret)
>> +			goto out_unlock;
> 
> Maybe it was discussed in a previous version, why do we need this in
> addition to kvm_set_spte_gfn()?

kvm_set_spte_gfn() is only used for the MMU notifier path (e.g. if a
memslot is changed by the VMM). For the initial access we will normally
fault the page into stage 2 with user_mem_abort().

>> +
>>  		clean_dcache_guest_page(pfn, vma_pagesize);
>> +	}
>>  
>>  	if (exec_fault) {
>>  		prot |= KVM_PGTABLE_PROT_X;
>> @@ -1168,12 +1209,17 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>>  bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>>  {
>>  	kvm_pfn_t pfn = pte_pfn(range->pte);
>> +	int ret;
>>  
>>  	if (!kvm->arch.mmu.pgt)
>>  		return 0;
>>  
>>  	WARN_ON(range->end - range->start != 1);
>>  
>> +	ret = sanitise_mte_tags(kvm, pfn, PAGE_SIZE);
>> +	if (ret)
>> +		return false;
>> +
>>  	/*
>>  	 * We've moved a page around, probably through CoW, so let's treat it
>>  	 * just like a translation fault and clean the cache to the PoC.
> 
> Otherwise the patch looks fine.
> 

Thanks for the review.

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
  2021-06-03 17:13     ` Catalin Marinas
  (?)
  (?)
@ 2021-06-04 11:15       ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-06-04 11:15 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On 03/06/2021 18:13, Catalin Marinas wrote:
> On Mon, May 24, 2021 at 11:45:12AM +0100, Steven Price wrote:
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>> index 24223adae150..b3edde68bc3e 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -184,6 +184,17 @@ struct kvm_vcpu_events {
>>  	__u32 reserved[12];
>>  };
>>  
>> +struct kvm_arm_copy_mte_tags {
>> +	__u64 guest_ipa;
>> +	__u64 length;
>> +	void __user *addr;
>> +	__u64 flags;
>> +	__u64 reserved[2];
>> +};
>> +
>> +#define KVM_ARM_TAGS_TO_GUEST		0
>> +#define KVM_ARM_TAGS_FROM_GUEST		1
>> +
>>  /* If you need to interpret the index values, here is the key: */
>>  #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
>>  #define KVM_REG_ARM_COPROC_SHIFT	16
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index e89a5e275e25..baa33359e477 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -1345,6 +1345,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>  
>>  		return 0;
>>  	}
>> +	case KVM_ARM_MTE_COPY_TAGS: {
>> +		struct kvm_arm_copy_mte_tags copy_tags;
>> +
>> +		if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
>> +			return -EFAULT;
>> +		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
>> +	}
> 
> I wonder whether we need an update of the user structure following a
> fault, like how much was copied etc. In case of an error, some tags were
> copied and the VMM may want to skip the page before continuing. But here
> there's no such information provided.
> 
> On the ptrace interface, we return 0 on the syscall if any bytes were
> copied and update iov_len to such number. Maybe you want to still return
> an error here but updating copy_tags.length would be nice (and, of
> course, a copy_to_user() back).
> 

Good idea - as you suggest I'll make it update length with the number of
bytes not processed. Although in general I think we're expecting the VMM
to know where the memory is so this is more of a programming error - but
could still be useful for debugging.
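
As a rough sketch of what "update length with the number of bytes not
processed" could look like from the caller's side, here is a user-space
model. It is not the ioctl implementation: struct model_copy_req,
model_copy_tags(), the fail_ipa parameter and the 4096-byte page size are
assumptions made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumption: 4K pages */

/* Hypothetical, cut-down version of the request structure. */
struct model_copy_req {
	uint64_t guest_ipa;  /* start of the range */
	uint64_t length;     /* in: bytes to process, out: bytes NOT processed */
};

/*
 * Walk the range one page at a time. To simulate a fault, any page at or
 * above fail_ipa is treated as unbacked. On return, req->length holds the
 * number of bytes that were not processed, whether or not an error occurred.
 */
int model_copy_tags(struct model_copy_req *req, uint64_t fail_ipa)
{
	uint64_t ipa = req->guest_ipa;
	uint64_t remaining = req->length;
	int ret = 0;

	/* Reject unaligned requests up front and leave the request untouched. */
	if (ipa % MODEL_PAGE_SIZE || remaining % MODEL_PAGE_SIZE)
		return -EINVAL;

	while (remaining > 0) {
		if (ipa >= fail_ipa) {
			ret = -EFAULT;
			break;
		}
		/* A real implementation would copy one page of tags here. */
		ipa += MODEL_PAGE_SIZE;
		remaining -= MODEL_PAGE_SIZE;
	}

	req->length = remaining;  /* models the copy_to_user() of the result */
	return ret;
}
```

With this shape a VMM that gets -EFAULT can compute how far the copy got
as (original length - returned length) and decide whether to skip the
offending page.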

Thanks,

Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 4/8] KVM: arm64: Introduce MTE VM feature
  2021-06-04 10:42       ` Steven Price
  (?)
  (?)
@ 2021-06-04 11:36         ` Catalin Marinas
  -1 siblings, 0 replies; 92+ messages in thread
From: Catalin Marinas @ 2021-06-04 11:36 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Fri, Jun 04, 2021 at 11:42:11AM +0100, Steven Price wrote:
> On 03/06/2021 17:00, Catalin Marinas wrote:
> > On Mon, May 24, 2021 at 11:45:09AM +0100, Steven Price wrote:
> >> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> >> index c5d1f3c87dbd..226035cf7d6c 100644
> >> --- a/arch/arm64/kvm/mmu.c
> >> +++ b/arch/arm64/kvm/mmu.c
> >> @@ -822,6 +822,42 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
> >>  	return PAGE_SIZE;
> >>  }
> >>  
> >> +static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
> >> +			     unsigned long size)
> >> +{
> >> +	if (kvm_has_mte(kvm)) {
> >> +		/*
> >> +		 * The page will be mapped in stage 2 as Normal Cacheable, so
> >> +		 * the VM will be able to see the page's tags and therefore
> >> +		 * they must be initialised first. If PG_mte_tagged is set,
> >> +		 * tags have already been initialised.
> >> +		 * pfn_to_online_page() is used to reject ZONE_DEVICE pages
> >> +		 * that may not support tags.
> >> +		 */
> >> +		unsigned long i, nr_pages = size >> PAGE_SHIFT;
> >> +		struct page *page = pfn_to_online_page(pfn);
> >> +
> >> +		if (!page)
> >> +			return -EFAULT;
> >> +
> >> +		for (i = 0; i < nr_pages; i++, page++) {
> >> +			/*
> >> +			 * There is a potential (but very unlikely) race
> >> +			 * between two VMs which are sharing a physical page
> >> +			 * entering this at the same time. However by splitting
> >> +			 * the test/set the only risk is tags being overwritten
> >> +			 * by the mte_clear_page_tags() call.
> >> +			 */
> > 
> > And I think the real risk here is when the page is writable by at least
> > one of the VMs sharing the page. This excludes KSM, so it only leaves
> > the MAP_SHARED mappings.
> > 
> >> +			if (!test_bit(PG_mte_tagged, &page->flags)) {
> >> +				mte_clear_page_tags(page_address(page));
> >> +				set_bit(PG_mte_tagged, &page->flags);
> >> +			}
> >> +		}
> > 
> > If we want to cover this race (I'd say in a separate patch), we can call
> > mte_sync_page_tags(page, __pte(0), false, true) directly (hopefully I
> > got the arguments right). We can avoid the big lock in most cases if
> > kvm_arch_prepare_memory_region() sets a VM_MTE_RESET (tag clear etc.)
> > and __alloc_zeroed_user_highpage() clears the tags on allocation (as we
> > do for VM_MTE but the new flag would not affect the stage 1 VMM page
> > attributes).
> 
> To be honest I'm coming round to just exporting a
> mte_prepare_page_tags() function which does the clear/set with the lock
> held. I doubt it's such a performance critical path that it will cause
> any noticeable issues. Then if we run into performance problems in the
> future we can start experimenting with extra VM flags etc as necessary.

It works for me.

> And from your later email:
> > Another idea: if VM_SHARED is found for any vma within a region in
> > kvm_arch_prepare_memory_region(), we either prevent the enabling of MTE
> > for the guest or reject the memory slot if MTE was already enabled.
> > 
> > An alternative here would be to clear VM_MTE_ALLOWED so that any
> > subsequent mprotect(PROT_MTE) in the VMM would fail in
> > arch_validate_flags(). MTE would still be allowed in the guest but not
> > in the VMM for the guest memory regions. We can probably do this
> > irrespective of VM_SHARED. Of course, the VMM can still mmap() the
> > memory initially with PROT_MTE but that's not an issue IIRC, only the
> > concurrent mprotect().
> 
> This could work, but I worry that it's potentially fragile. Also the rules
> for what user space can do are not obvious and may be surprising. I'd
> also want to look into the likes of mremap() to see how easy it would be
> to ensure that we couldn't end up with VM_SHARED (or VM_MTE_ALLOWED)
> memory sneaking into a memslot.
> 
> Unless you think it's worth complicating the ABI in the hope of avoiding
> the big lock overhead I think it's probably best to stick with the big
> lock at least until we have more data on the overhead.

It's up to Marc but I think for now just make it safe and once we get
our hands on hardware, we can assess the impact. For example, starting
multiple VMs simultaneously will contend on such a big lock but we have an
option to optimise it by setting PG_mte_tagged on allocation via a new
VM_* flag.

For my last suggestion above, changing the VMM ABI afterwards is a bit
tricky, so we could state now that VM_SHARED and MTE are not allowed
(though it needs a patch to enforce it). That's assuming that mprotect()
in the VMM cannot race with the user_mem_abort() on another CPU which
makes the lock necessary anyway.
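
To make the "VM_SHARED and MTE are not allowed" rule concrete, a check of
roughly the following shape could run when a memory slot is prepared. This
is only a user-space model under stated assumptions: struct model_vma,
MODEL_VM_SHARED and model_check_memslot() are hypothetical names, and the
real check would have to walk the VMAs under the appropriate mm locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value standing in for VM_SHARED. */
#define MODEL_VM_SHARED 0x8ul

/* Hypothetical, cut-down VMA descriptor: [start, end) plus flags. */
struct model_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/*
 * Reject a memslot covering [start, end) if MTE is enabled for the guest
 * and any overlapping VMA is a shared mapping. Returns 0 if the slot is
 * acceptable, -EINVAL otherwise.
 */
int model_check_memslot(const struct model_vma *vmas, size_t nr_vmas,
			unsigned long start, unsigned long end,
			bool mte_enabled)
{
	size_t i;

	if (!mte_enabled)
		return 0;

	for (i = 0; i < nr_vmas; i++) {
		/* Skip VMAs that do not overlap the memslot range. */
		if (vmas[i].end <= start || vmas[i].start >= end)
			continue;
		if (vmas[i].flags & MODEL_VM_SHARED)
			return -EINVAL;
	}

	return 0;
}
```

The sketch does not address the mprotect()/mremap() concerns raised
earlier; it only shows the memslot-time rejection.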

> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  			  struct kvm_memory_slot *memslot, unsigned long hva,
> >>  			  unsigned long fault_status)
> >> @@ -971,8 +1007,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	if (writable)
> >>  		prot |= KVM_PGTABLE_PROT_W;
> >>  
> >> -	if (fault_status != FSC_PERM && !device)
> >> +	if (fault_status != FSC_PERM && !device) {
> >> +		ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
> >> +		if (ret)
> >> +			goto out_unlock;
> > 
> > Maybe it was discussed in a previous version, why do we need this in
> > addition to kvm_set_spte_gfn()?
> 
> kvm_set_spte_gfn() is only used for the MMU notifier path (e.g. if a
> memslot is changed by the VMM). For the initial access we will normally
> fault the page into stage 2 with user_mem_abort().

Right. Can we move the sanitise_mte_tags() call to
kvm_pgtable_stage2_map() instead, or do we not have all the
information needed there?

-- 
Catalin

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
  2021-06-04 11:15       ` Steven Price
  (?)
  (?)
@ 2021-06-04 11:42         ` Catalin Marinas
  -1 siblings, 0 replies; 92+ messages in thread
From: Catalin Marinas @ 2021-06-04 11:42 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Fri, Jun 04, 2021 at 12:15:56PM +0100, Steven Price wrote:
> On 03/06/2021 18:13, Catalin Marinas wrote:
> > On Mon, May 24, 2021 at 11:45:12AM +0100, Steven Price wrote:
> >> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> >> index 24223adae150..b3edde68bc3e 100644
> >> --- a/arch/arm64/include/uapi/asm/kvm.h
> >> +++ b/arch/arm64/include/uapi/asm/kvm.h
> >> @@ -184,6 +184,17 @@ struct kvm_vcpu_events {
> >>  	__u32 reserved[12];
> >>  };
> >>  
> >> +struct kvm_arm_copy_mte_tags {
> >> +	__u64 guest_ipa;
> >> +	__u64 length;
> >> +	void __user *addr;
> >> +	__u64 flags;
> >> +	__u64 reserved[2];
> >> +};
> >> +
> >> +#define KVM_ARM_TAGS_TO_GUEST		0
> >> +#define KVM_ARM_TAGS_FROM_GUEST		1
> >> +
> >>  /* If you need to interpret the index values, here is the key: */
> >>  #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
> >>  #define KVM_REG_ARM_COPROC_SHIFT	16
> >> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> >> index e89a5e275e25..baa33359e477 100644
> >> --- a/arch/arm64/kvm/arm.c
> >> +++ b/arch/arm64/kvm/arm.c
> >> @@ -1345,6 +1345,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >>  
> >>  		return 0;
> >>  	}
> >> +	case KVM_ARM_MTE_COPY_TAGS: {
> >> +		struct kvm_arm_copy_mte_tags copy_tags;
> >> +
> >> +		if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
> >> +			return -EFAULT;
> >> +		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
> >> +	}
> > 
> > I wonder whether we need an update of the user structure following a
> > fault, like how much was copied etc. In case of an error, some tags were
> > copied and the VMM may want to skip the page before continuing. But here
> > there's no such information provided.
> > 
> > On the ptrace interface, we return 0 on the syscall if any bytes were
> > copied and update iov_len to such number. Maybe you want to still return
> > an error here but updating copy_tags.length would be nice (and, of
> > course, a copy_to_user() back).
> 
> Good idea - as you suggest I'll make it update length with the number of
> bytes not processed. Although in general I think we're expecting the VMM
> to know where the memory is so this is more of a programming error - but
> could still be useful for debugging.

Or update it to the number of bytes copied, to be consistent with
ptrace()'s iov_len. On success, the structure is effectively left
unchanged.
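
A minimal userspace sketch of that convention, assuming a hypothetical
copy_tags() helper and struct copy_req (neither is the real ioctl or its
uAPI structure): on failure the function still returns an error, but
length is rewritten to the number of bytes actually copied; on success
the request is left as it was passed in.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request, loosely modelled on the struct quoted above. */
struct copy_req {
	size_t offset;		/* start offset into the source buffer */
	size_t length;		/* in: bytes requested; out on error: bytes copied */
	unsigned char *dst;	/* destination buffer */
};

/*
 * Copy req->length bytes from src (src_size bytes long) to req->dst.
 * If the request runs past the end of src, copy what is available,
 * report that count in req->length and return -EFAULT.
 */
static int copy_tags(const unsigned char *src, size_t src_size,
		     struct copy_req *req)
{
	size_t avail = req->offset < src_size ? src_size - req->offset : 0;
	size_t n = req->length < avail ? req->length : avail;

	if (n)
		memcpy(req->dst, src + req->offset, n);

	if (n < req->length) {
		req->length = n;	/* tell the caller how far we got */
		return -EFAULT;
	}

	return 0;	/* success: req is left unchanged */
}
```

A caller that sees -EFAULT can then inspect length to decide whether to
skip the failing page and continue from the next one.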

-- 
Catalin

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 4/8] KVM: arm64: Introduce MTE VM feature
  2021-06-04 11:36         ` Catalin Marinas
  (?)
  (?)
@ 2021-06-04 12:51           ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-06-04 12:51 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On 04/06/2021 12:36, Catalin Marinas wrote:
> On Fri, Jun 04, 2021 at 11:42:11AM +0100, Steven Price wrote:
>> On 03/06/2021 17:00, Catalin Marinas wrote:
>>> On Mon, May 24, 2021 at 11:45:09AM +0100, Steven Price wrote:
>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>> index c5d1f3c87dbd..226035cf7d6c 100644
>>>> --- a/arch/arm64/kvm/mmu.c
>>>> +++ b/arch/arm64/kvm/mmu.c
>>>> @@ -822,6 +822,42 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
>>>>  	return PAGE_SIZE;
>>>>  }
>>>>  
>>>> +static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>>>> +			     unsigned long size)
>>>> +{
>>>> +	if (kvm_has_mte(kvm)) {
>>>> +		/*
>>>> +		 * The page will be mapped in stage 2 as Normal Cacheable, so
>>>> +		 * the VM will be able to see the page's tags and therefore
>>>> +		 * they must be initialised first. If PG_mte_tagged is set,
>>>> +		 * tags have already been initialised.
>>>> +		 * pfn_to_online_page() is used to reject ZONE_DEVICE pages
>>>> +		 * that may not support tags.
>>>> +		 */
>>>> +		unsigned long i, nr_pages = size >> PAGE_SHIFT;
>>>> +		struct page *page = pfn_to_online_page(pfn);
>>>> +
>>>> +		if (!page)
>>>> +			return -EFAULT;
>>>> +
>>>> +		for (i = 0; i < nr_pages; i++, page++) {
>>>> +			/*
>>>> +			 * There is a potential (but very unlikely) race
>>>> +			 * between two VMs which are sharing a physical page
>>>> +			 * entering this at the same time. However by splitting
>>>> +			 * the test/set the only risk is tags being overwritten
>>>> +			 * by the mte_clear_page_tags() call.
>>>> +			 */
>>>
>>> And I think the real risk here is when the page is writable by at least
>>> one of the VMs sharing the page. This excludes KSM, so it only leaves
>>> the MAP_SHARED mappings.
>>>
>>>> +			if (!test_bit(PG_mte_tagged, &page->flags)) {
>>>> +				mte_clear_page_tags(page_address(page));
>>>> +				set_bit(PG_mte_tagged, &page->flags);
>>>> +			}
>>>> +		}
>>>
>>> If we want to cover this race (I'd say in a separate patch), we can call
>>> mte_sync_page_tags(page, __pte(0), false, true) directly (hopefully I
>>> got the arguments right). We can avoid the big lock in most cases if
>>> kvm_arch_prepare_memory_region() sets a VM_MTE_RESET (tag clear etc.)
>>> and __alloc_zeroed_user_highpage() clears the tags on allocation (as we
>>> do for VM_MTE but the new flag would not affect the stage 1 VMM page
>>> attributes).
>>
>> To be honest I'm coming round to just exporting a
>> mte_prepare_page_tags() function which does the clear/set with the lock
>> held. I doubt it's such a performance critical path that it will cause
>> any noticeable issues. Then if we run into performance problems in the
>> future we can start experimenting with extra VM flags etc as necessary.
> 
> It works for me.
> 
>> And from your later email:
>>> Another idea: if VM_SHARED is found for any vma within a region in
>>> kvm_arch_prepare_memory_region(), we either prevent the enabling of MTE
>>> for the guest or reject the memory slot if MTE was already enabled.
>>>
>>> An alternative here would be to clear VM_MTE_ALLOWED so that any
>>> subsequent mprotect(PROT_MTE) in the VMM would fail in
>>> arch_validate_flags(). MTE would still be allowed in the guest but not
>>> in the VMM for the guest memory regions. We can probably do this
>>> irrespective of VM_SHARED. Of course, the VMM can still mmap() the
>>> memory initially with PROT_MTE but that's not an issue IIRC, only the
>>> concurrent mprotect().
>>
>> This could work, but I worry that it's potentially fragile. Also the rules
>> for what user space can do are not obvious and may be surprising. I'd
>> also want to look into the likes of mremap() to see how easy it would be
>> to ensure that we couldn't end up with VM_SHARED (or VM_MTE_ALLOWED)
>> memory sneaking into a memslot.
>>
>> Unless you think it's worth complicating the ABI in the hope of avoiding
>> the big lock overhead I think it's probably best to stick with the big
>> lock at least until we have more data on the overhead.
> 
> It's up to Marc but I think for now just make it safe and once we get
> our hands on hardware, we can assess the impact. For example, starting
> multiple VMs simultaneously will contend on such big lock but we have an
> option to optimise it by setting PG_mte_tagged on allocation via a new
> VM_* flag.
> 
> For my last suggestion above, changing the VMM ABI afterwards is a bit
> tricky, so we could state now that VM_SHARED and MTE are not allowed
> (though it needs a patch to enforce it). That's assuming that mprotect()
> in the VMM cannot race with the user_mem_abort() on another CPU which
> makes the lock necessary anyway.
> 
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  			  struct kvm_memory_slot *memslot, unsigned long hva,
>>>>  			  unsigned long fault_status)
>>>> @@ -971,8 +1007,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  	if (writable)
>>>>  		prot |= KVM_PGTABLE_PROT_W;
>>>>  
>>>> -	if (fault_status != FSC_PERM && !device)
>>>> +	if (fault_status != FSC_PERM && !device) {
>>>> +		ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
>>>> +		if (ret)
>>>> +			goto out_unlock;
>>>
>>> Maybe it was discussed in a previous version, why do we need this in
>>> addition to kvm_set_spte_gfn()?
>>
>> kvm_set_spte_gfn() is only used for the MMU notifier path (e.g. if a
>> memslot is changed by the VMM). For the initial access we will normally
>> fault the page into stage 2 with user_mem_abort().
> 
> Right. Can we move the sanitise_mte_tags() call to
> kvm_pgtable_stage2_map() instead, or do we not have all the
> information needed?

I tried that before: kvm_pgtable_stage2_map() is shared with the
hypervisor so sadly we can't go poking around in the host as this breaks
on nVHE. I mentioned it in the v12 cover letter but it was in a wall of
text:

 * Move the code to sanitise tags out of user_mem_abort() into its own
   function. Also call this new function from kvm_set_spte_gfn() as that
   path was missing the sanitising.

   Originally I was going to move the code all the way down to
   kvm_pgtable_stage2_map(). Sadly, as that is also part of the EL2
   hypervisor, this breaks nVHE as the code needs to perform actions in
   the host.

The only other option I could see would be to provide a wrapper for
kvm_pgtable_stage2_map() in mmu.c which could do the sanitising as
necessary. But considering we know the call site in
kvm_phys_addr_ioremap() doesn't need handling (PROT_DEVICE is always
specified) and there are only two more, it seemed easier just to add the
two necessary calls to the new sanitise_mte_tags().
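
For comparison, the wrapper alternative mentioned above might look
roughly like the following userspace model. All names here
(stage2_map_checked(), fake_stage2_map(), fake_sanitise(), PROT_DEV) are
hypothetical stand-ins, not kernel APIs; the point is only that the
host-side check runs before the shared mapping routine and is skipped
for device mappings.

```c
#include <errno.h>
#include <stdbool.h>

#define PROT_DEV	(1u << 0)	/* hypothetical "device mapping" flag */

static int sanitise_calls;	/* counts how often sanitising ran */
static int map_calls;		/* counts how often the mapping ran */

/* Stand-in for the host-only tag sanitising; fails for offline pages. */
static int fake_sanitise(bool page_online)
{
	sanitise_calls++;
	return page_online ? 0 : -EFAULT;
}

/* Stand-in for the mapping routine shared with the hypervisor. */
static int fake_stage2_map(unsigned int prot)
{
	(void)prot;
	map_calls++;
	return 0;
}

/* Host-side wrapper: sanitise first, unless this is a device mapping. */
static int stage2_map_checked(unsigned int prot, bool page_online)
{
	if (!(prot & PROT_DEV)) {
		int ret = fake_sanitise(page_online);

		if (ret)
			return ret;
	}

	return fake_stage2_map(prot);
}
```

With only two non-device call sites, the wrapper saves little over
calling the sanitising function directly at each site.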

We also have a direct pointer to 'kvm' this way which is much nicer than
pointer chasing it out of the kvm_pgtable structure.

Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 4/8] KVM: arm64: Introduce MTE VM feature
@ 2021-06-04 12:51           ` Steven Price
  0 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-06-04 12:51 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Mark Rutland, Peter Maydell, Dr. David Alan Gilbert,
	Andrew Jones, Haibo Xu, Suzuki K Poulose, qemu-devel,
	Marc Zyngier, Juan Quintela, Richard Henderson, linux-kernel,
	Dave Martin, James Morse, linux-arm-kernel, Thomas Gleixner,
	Will Deacon, kvmarm, Julien Thierry

On 04/06/2021 12:36, Catalin Marinas wrote:
> On Fri, Jun 04, 2021 at 11:42:11AM +0100, Steven Price wrote:
>> On 03/06/2021 17:00, Catalin Marinas wrote:
>>> On Mon, May 24, 2021 at 11:45:09AM +0100, Steven Price wrote:
>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>> index c5d1f3c87dbd..226035cf7d6c 100644
>>>> --- a/arch/arm64/kvm/mmu.c
>>>> +++ b/arch/arm64/kvm/mmu.c
>>>> @@ -822,6 +822,42 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
>>>>  	return PAGE_SIZE;
>>>>  }
>>>>  
>>>> +static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>>>> +			     unsigned long size)
>>>> +{
>>>> +	if (kvm_has_mte(kvm)) {
>>>> +		/*
>>>> +		 * The page will be mapped in stage 2 as Normal Cacheable, so
>>>> +		 * the VM will be able to see the page's tags and therefore
>>>> +		 * they must be initialised first. If PG_mte_tagged is set,
>>>> +		 * tags have already been initialised.
>>>> +		 * pfn_to_online_page() is used to reject ZONE_DEVICE pages
>>>> +		 * that may not support tags.
>>>> +		 */
>>>> +		unsigned long i, nr_pages = size >> PAGE_SHIFT;
>>>> +		struct page *page = pfn_to_online_page(pfn);
>>>> +
>>>> +		if (!page)
>>>> +			return -EFAULT;
>>>> +
>>>> +		for (i = 0; i < nr_pages; i++, page++) {
>>>> +			/*
>>>> +			 * There is a potential (but very unlikely) race
>>>> +			 * between two VMs which are sharing a physical page
>>>> +			 * entering this at the same time. However by splitting
>>>> +			 * the test/set the only risk is tags being overwritten
>>>> +			 * by the mte_clear_page_tags() call.
>>>> +			 */
>>>
>>> And I think the real risk here is when the page is writable by at least
>>> one of the VMs sharing the page. This excludes KSM, so it only leaves
>>> the MAP_SHARED mappings.
>>>
>>>> +			if (!test_bit(PG_mte_tagged, &page->flags)) {
>>>> +				mte_clear_page_tags(page_address(page));
>>>> +				set_bit(PG_mte_tagged, &page->flags);
>>>> +			}
>>>> +		}
>>>
>>> If we want to cover this race (I'd say in a separate patch), we can call
>>> mte_sync_page_tags(page, __pte(0), false, true) directly (hopefully I
>>> got the arguments right). We can avoid the big lock in most cases if
>>> kvm_arch_prepare_memory_region() sets a VM_MTE_RESET (tag clear etc.)
>>> and __alloc_zeroed_user_highpage() clears the tags on allocation (as we
>>> do for VM_MTE but the new flag would not affect the stage 1 VMM page
>>> attributes).
>>
>> To be honest I'm coming round to just exporting a
>> mte_prepare_page_tags() function which does the clear/set with the lock
>> held. I doubt it's such a performance critical path that it will cause
>> any noticeable issues. Then if we run into performance problems in the
>> future we can start experimenting with extra VM flags etc as necessary.
> 
> It works for me.
> 
>> And from your later email:
>>> Another idea: if VM_SHARED is found for any vma within a region in
>>> kvm_arch_prepare_memory_region(), we either prevent the enabling of MTE
>>> for the guest or reject the memory slot if MTE was already enabled.
>>>
>>> An alternative here would be to clear VM_MTE_ALLOWED so that any
>>> subsequent mprotect(PROT_MTE) in the VMM would fail in
>>> arch_validate_flags(). MTE would still be allowed in the guest but in
>>> the VMM for the guest memory regions. We can probably do this
>>> irrespective of VM_SHARED. Of course, the VMM can still mmap() the
>>> memory initially with PROT_MTE but that's not an issue IIRC, only the
>>> concurrent mprotect().
>>
>> This could work, but I worry that it's potential fragile. Also the rules
>> for what user space can do are not obvious and may be surprising. I'd
>> also want to look into the likes of mremap() to see how easy it would be
>> to ensure that we couldn't end up with VM_SHARED (or VM_MTE_ALLOWED)
>> memory sneaking into a memslot.
>>
>> Unless you think it's worth complicating the ABI in the hope of avoiding
>> the big lock overhead I think it's probably best to stick with the big
>> lock at least until we have more data on the overhead.
> 
> It's up to Marc but I think for now just make it safe and once we get
> our hands on hardware, we can assess the impact. For example, starting
> multiple VMs simultaneously will contend on such big lock but we have an
> option to optimise it by setting PG_mte_tagged on allocation via a new
> VM_* flag.
> 
> For my last suggestion above, changing the VMM ABI afterwards is a bit
> tricky, so we could state now that VM_SHARED and MTE are not allowed
> (though it needs a patch to enforce it). That's assuming that mprotect()
> in the VMM cannot race with the user_mem_abort() on another CPU which
> makes the lock necessary anyway.
> 
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  			  struct kvm_memory_slot *memslot, unsigned long hva,
>>>>  			  unsigned long fault_status)
>>>> @@ -971,8 +1007,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  	if (writable)
>>>>  		prot |= KVM_PGTABLE_PROT_W;
>>>>  
>>>> -	if (fault_status != FSC_PERM && !device)
>>>> +	if (fault_status != FSC_PERM && !device) {
>>>> +		ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
>>>> +		if (ret)
>>>> +			goto out_unlock;
>>>
>>> Maybe it was discussed in a previous version, why do we need this in
>>> addition to kvm_set_spte_gfn()?
>>
>> kvm_set_spte_gfn() is only used for the MMU notifier path (e.g. if a
>> memslot is changed by the VMM). For the initial access we will normally
>> fault the page into stage 2 with user_mem_abort().
> 
> Right. Can we move the sanitise_mte_tags() call to
> kvm_pgtable_stage2_map() instead, or do we not have all the
> information needed?

I tried that before: kvm_pgtable_stage2_map() is shared with the
hypervisor so sadly we can't go poking around in the host as this breaks
on nVHE. I mentioned it in the v12 cover letter but it was in a wall of
text:

 * Move the code to sanitise tags out of user_mem_abort() into its own
   function. Also call this new function from kvm_set_spte_gfn() as that
   path was missing the sanitising.

   Originally I was going to move the code all the way down to
   kvm_pgtable_stage2_map(). Sadly as that is also part of the EL2
   hypervisor this breaks nVHE as the code needs to perform actions in
   the host.

The only other option I could see would be to provide a wrapper for
kvm_pgtable_stage2_map() in mmu.c which could do the sanitising as
necessary. But considering we know the call site in
kvm_phys_addr_ioremap() doesn't need handling (PROT_DEVICE is always
specified) and there are only two more, it seemed easier just to add the
two calls necessary to the new sanitise_mte_tags().

We also have a direct pointer to 'kvm' this way which is much nicer than
pointer chasing it out of the kvm_pgtable structure.
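
For illustration only, the wrapper option mentioned above could be modelled
roughly as below. This is a userspace sketch, not kernel code: every model_*
name, the struct layouts and the -1 error value are hypothetical stand-ins
invented for the example, and the real sanitise_mte_tags() also clears the
tags rather than just setting a flag. It only shows the shape of the idea:
the host-side wrapper sanitises unless the mapping is a device mapping, and
the shared map function never touches host state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
enum model_prot { MODEL_PROT_R = 1, MODEL_PROT_W = 2, MODEL_PROT_DEVICE = 4 };

struct model_page { bool mte_tagged; };

struct model_vm {
	bool has_mte;              /* models kvm_has_mte() */
	struct model_page *pages;  /* models the host's struct page array */
	size_t nr_pages;
};

/* Counts calls that reach the shared mapping code. */
int model_map_calls;

/* Models the map function shared with EL2: no access to host state. */
static int model_stage2_map(size_t pfn, size_t nr, int prot)
{
	(void)pfn; (void)nr; (void)prot;
	model_map_calls++;
	return 0;
}

/* Models the host-only sanitising step (flag handling only). */
static int model_sanitise_tags(struct model_vm *vm, size_t pfn, size_t nr)
{
	if (!vm->has_mte)
		return 0;
	/* Models rejecting pfns without a usable page (-EFAULT in the patch). */
	if (pfn >= vm->nr_pages || nr > vm->nr_pages - pfn)
		return -1;
	for (size_t i = 0; i < nr; i++)
		vm->pages[pfn + i].mte_tagged = true;
	return 0;
}

/* Host-side wrapper: sanitise first, then hand over to the shared code. */
int model_map_sanitised(struct model_vm *vm, size_t pfn, size_t nr, int prot)
{
	if (!(prot & MODEL_PROT_DEVICE)) {
		int ret = model_sanitise_tags(vm, pfn, nr);

		if (ret)
			return ret;
	}
	return model_stage2_map(pfn, nr, prot);
}
```

In this model a device mapping skips the sanitising step entirely, which is
the reason the kvm_phys_addr_ioremap() call site needs no extra handling.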

Steve


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
  2021-06-04 11:42         ` Catalin Marinas
  (?)
  (?)
@ 2021-06-04 13:09           ` Steven Price
  -1 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-06-04 13:09 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On 04/06/2021 12:42, Catalin Marinas wrote:
> On Fri, Jun 04, 2021 at 12:15:56PM +0100, Steven Price wrote:
>> On 03/06/2021 18:13, Catalin Marinas wrote:
>>> On Mon, May 24, 2021 at 11:45:12AM +0100, Steven Price wrote:
>>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>>>> index 24223adae150..b3edde68bc3e 100644
>>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>>> @@ -184,6 +184,17 @@ struct kvm_vcpu_events {
>>>>  	__u32 reserved[12];
>>>>  };
>>>>  
>>>> +struct kvm_arm_copy_mte_tags {
>>>> +	__u64 guest_ipa;
>>>> +	__u64 length;
>>>> +	void __user *addr;
>>>> +	__u64 flags;
>>>> +	__u64 reserved[2];
>>>> +};
>>>> +
>>>> +#define KVM_ARM_TAGS_TO_GUEST		0
>>>> +#define KVM_ARM_TAGS_FROM_GUEST		1
>>>> +
>>>>  /* If you need to interpret the index values, here is the key: */
>>>>  #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
>>>>  #define KVM_REG_ARM_COPROC_SHIFT	16
>>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>>> index e89a5e275e25..baa33359e477 100644
>>>> --- a/arch/arm64/kvm/arm.c
>>>> +++ b/arch/arm64/kvm/arm.c
>>>> @@ -1345,6 +1345,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>>>  
>>>>  		return 0;
>>>>  	}
>>>> +	case KVM_ARM_MTE_COPY_TAGS: {
>>>> +		struct kvm_arm_copy_mte_tags copy_tags;
>>>> +
>>>> +		if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
>>>> +			return -EFAULT;
>>>> +		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
>>>> +	}
>>>
>>> I wonder whether we need an update of the user structure following a
>>> fault, like how much was copied etc. In case of an error, some tags were
>>> copied and the VMM may want to skip the page before continuing. But here
>>> there's no such information provided.
>>>
>>> On the ptrace interface, we return 0 on the syscall if any bytes were
>>> copied and update iov_len to such number. Maybe you want to still return
>>> an error here but updating copy_tags.length would be nice (and, of
>>> course, a copy_to_user() back).
>>
>> Good idea - as you suggest I'll make it update length with the number of
>> bytes not processed. Although in general I think we're expecting the VMM
>> to know where the memory is so this is more of a programming error - but
>> could still be useful for debugging.
> 
> Or update it to the number of bytes copied to be consistent with
> ptrace()'s iov.len. On success, the structure is effectively left
> unchanged.

I was avoiding that because it confuses the error code when the initial
copy_from_user() fails. In that case the structure is clearly unchanged,
so you can only tell from a -EFAULT return that nothing happened. By
returning the number of bytes left you can return an error code along
with the information that the copy only half completed.

It also seems cleaner to leave the structure unchanged if e.g. the flags
or reserved fields are invalid rather than having to set length=0 to
signal that nothing was done.

Although I do feel like arguing whether to use a ptrace() interface or a
copy_{to,from}_user() interface is somewhat ridiculous considering
neither is exactly considered good.

Rather than changing the structure we could return either an error code
(if nothing was copied) or the number of bytes left. That way ioctl()==0
means complete success, >0 means partial success and <0 means complete
failure and provides a detailed error code. The ioctl() can be repeated
(with adjusted pointers) if it returns >0 and a detailed error is needed.
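
As a rough sketch of how a caller might consume that return convention
(0 for complete success, >0 for the number of bytes left, <0 for an error
with nothing copied): the code below is a userspace model, not the real
ioctl. struct copy_req, copy_fn, copy_all() and fake_copy() are hypothetical
names made up for the example, and a real caller would also advance its tag
buffer pointer by the matching amount on each retry.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of the request; not the uapi structure. */
struct copy_req {
	uint64_t guest_ipa;
	uint64_t length;
};

/*
 * Hypothetical backend following the proposed convention:
 * 0 = everything copied, >0 = bytes left, <0 = error, nothing copied.
 */
typedef long (*copy_fn)(struct copy_req *req, void *ctx);

/*
 * Repeat the call with an adjusted request until it either completes or
 * reports a detailed error. *done receives the bytes copied in total.
 */
long copy_all(copy_fn fn, void *ctx, struct copy_req req, uint64_t *done)
{
	*done = 0;

	while (req.length > 0) {
		long ret = fn(&req, ctx);

		if (ret < 0)
			return ret;             /* detailed error code */
		if (ret == 0) {
			*done += req.length;    /* complete success */
			return 0;
		}
		/* Partial success must mean some progress; otherwise bail out. */
		if ((uint64_t)ret >= req.length)
			return -EIO;

		uint64_t copied = req.length - (uint64_t)ret;

		*done += copied;
		req.guest_ipa += copied;
		req.length = (uint64_t)ret;
	}
	return 0;
}

/* Demo backend: pretends guest memory is only valid below *ctx. */
long fake_copy(struct copy_req *req, void *ctx)
{
	uint64_t limit = *(uint64_t *)ctx;

	if (req->guest_ipa >= limit)
		return -EFAULT;
	if (req->guest_ipa + req->length <= limit)
		return 0;
	return (long)(req->guest_ipa + req->length - limit);
}
```

With this shape the first call reports how far the copy got and the repeated
call, which now starts at the failing address, returns the detailed error.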

Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
@ 2021-06-04 13:09           ` Steven Price
  0 siblings, 0 replies; 92+ messages in thread
From: Steven Price @ 2021-06-04 13:09 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On 04/06/2021 12:42, Catalin Marinas wrote:
> On Fri, Jun 04, 2021 at 12:15:56PM +0100, Steven Price wrote:
>> On 03/06/2021 18:13, Catalin Marinas wrote:
>>> On Mon, May 24, 2021 at 11:45:12AM +0100, Steven Price wrote:
>>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>>>> index 24223adae150..b3edde68bc3e 100644
>>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>>> @@ -184,6 +184,17 @@ struct kvm_vcpu_events {
>>>>  	__u32 reserved[12];
>>>>  };
>>>>  
>>>> +struct kvm_arm_copy_mte_tags {
>>>> +	__u64 guest_ipa;
>>>> +	__u64 length;
>>>> +	void __user *addr;
>>>> +	__u64 flags;
>>>> +	__u64 reserved[2];
>>>> +};
>>>> +
>>>> +#define KVM_ARM_TAGS_TO_GUEST		0
>>>> +#define KVM_ARM_TAGS_FROM_GUEST		1
>>>> +
>>>>  /* If you need to interpret the index values, here is the key: */
>>>>  #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
>>>>  #define KVM_REG_ARM_COPROC_SHIFT	16
>>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>>> index e89a5e275e25..baa33359e477 100644
>>>> --- a/arch/arm64/kvm/arm.c
>>>> +++ b/arch/arm64/kvm/arm.c
>>>> @@ -1345,6 +1345,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>>>  
>>>>  		return 0;
>>>>  	}
>>>> +	case KVM_ARM_MTE_COPY_TAGS: {
>>>> +		struct kvm_arm_copy_mte_tags copy_tags;
>>>> +
>>>> +		if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
>>>> +			return -EFAULT;
>>>> +		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
>>>> +	}
>>>
>>> I wonder whether we need an update of the user structure following a
>>> fault, like how much was copied etc. In case of an error, some tags were
>>> copied and the VMM may want to skip the page before continuing. But here
>>> there's no such information provided.
>>>
>>> On the ptrace interface, we return 0 on the syscall if any bytes were
>>> copied and update iov_len to such number. Maybe you want to still return
>>> an error here but updating copy_tags.length would be nice (and, of
>>> course, a copy_to_user() back).
>>
>> Good idea - as you suggest I'll make it update length with the number of
>> bytes not processed. Although in general I think we're expecting the VMM
>> to know where the memory is so this is more of a programming error - but
>> could still be useful for debugging.
> 
> Or update it to the number of bytes copied to be consistent with
> ptrace()'s iov.len. On success, the structure is effectively left
> unchanged.

I was avoiding that because it confuses the error code when the initial
copy_from_user() fails. In that case the structure is clearly unchanged,
so you can only tell from a -EFAULT return that nothing happened. By
returning the number of bytes left you can return an error code along
with the information that the copy only half completed.

It also seems cleaner to leave the structure unchanged if e.g. the flags
or reserved fields are invalid rather than having to set length=0 to
signal that nothing was done.

Although I do feel like arguing whether to use a ptrace() interface or a
copy_{to,from}_user() interface is somewhat ridiculous considering
neither are exactly considered good.

Rather than changing the structure we could return either an error code
(if nothing was copied) or the number of bytes left. That way ioctl()==0
means complete success, >0 means partial success and <0 means complete
failure and provides a detailed error code. The ioctl() can be repeated
(with adjusted pointers) if it returns >0 and a detailed error is needed.
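
A minimal sketch of how a caller might decode the return convention described above. It is not taken from the series: the names interpret_copy_ret(), struct copy_result and enum copy_outcome are hypothetical, and the sketch only models the arithmetic, not the ioctl itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of one call's result. */
enum copy_outcome { COPY_COMPLETE, COPY_PARTIAL, COPY_FAILED };

struct copy_result {
	enum copy_outcome outcome;
	size_t bytes_done;	/* bytes processed before the call stopped */
	long err;		/* negative error code, or 0 if none was reported */
};

/*
 * Decode the return value `ret` of a call that was asked to process
 * `requested` bytes, under the proposed convention:
 *   ret == 0  everything was processed
 *   ret  > 0  `ret` bytes were left unprocessed (partial success)
 *   ret  < 0  nothing was processed; `ret` is the error code
 */
static struct copy_result interpret_copy_ret(long ret, size_t requested)
{
	struct copy_result r = { COPY_COMPLETE, requested, 0 };

	if (ret < 0) {
		r.outcome = COPY_FAILED;
		r.bytes_done = 0;
		r.err = ret;
	} else if (ret > 0) {
		/* A well-behaved callee never reports more left than was asked for. */
		assert((size_t)ret <= requested);
		r.outcome = COPY_PARTIAL;
		r.bytes_done = requested - (size_t)ret;
	}
	return r;
}
```

A caller that sees COPY_PARTIAL and wants the detailed error would repeat the request starting bytes_done bytes further in, which should then fail outright with a negative code.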

Steve

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 4/8] KVM: arm64: Introduce MTE VM feature
  2021-06-04 12:51           ` Steven Price
  (?)
  (?)
@ 2021-06-04 14:05             ` Catalin Marinas
  -1 siblings, 0 replies; 92+ messages in thread
From: Catalin Marinas @ 2021-06-04 14:05 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Fri, Jun 04, 2021 at 01:51:38PM +0100, Steven Price wrote:
> On 04/06/2021 12:36, Catalin Marinas wrote:
> > On Fri, Jun 04, 2021 at 11:42:11AM +0100, Steven Price wrote:
> >> On 03/06/2021 17:00, Catalin Marinas wrote:
> >>> On Mon, May 24, 2021 at 11:45:09AM +0100, Steven Price wrote:
> >>>> @@ -971,8 +1007,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>>>  	if (writable)
> >>>>  		prot |= KVM_PGTABLE_PROT_W;
> >>>>  
> >>>> -	if (fault_status != FSC_PERM && !device)
> >>>> +	if (fault_status != FSC_PERM && !device) {
> >>>> +		ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
> >>>> +		if (ret)
> >>>> +			goto out_unlock;
> >>>
> >>> Maybe it was discussed in a previous version, why do we need this in
> >>> addition to kvm_set_spte_gfn()?
> >>
> >> kvm_set_spte_gfn() is only used for the MMU notifier path (e.g. if a
> >> memslot is changed by the VMM). For the initial access we will normally
> >> fault the page into stage 2 with user_mem_abort().
> > 
> > Right. Can we move the sanitise_mte_tags() call to
> > kvm_pgtable_stage2_map() instead, or do we not have all the
> > information needed?
> 
> I tried that before: kvm_pgtable_stage2_map() is shared with the
> hypervisor so sadly we can't go poking around in the host as this breaks
> on nVHE. I mentioned it in the v12 cover letter but it was in a wall of
> text:

Ah, I missed this in the cover letter (haven't read it).

So, apart from the nitpick with the early return for less indentation,
feel free to add:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v13 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
  2021-06-04 13:09           ` Steven Price
  (?)
  (?)
@ 2021-06-04 15:34             ` Catalin Marinas
  -1 siblings, 0 replies; 92+ messages in thread
From: Catalin Marinas @ 2021-06-04 15:34 UTC (permalink / raw)
  To: Steven Price
  Cc: Marc Zyngier, Will Deacon, James Morse, Julien Thierry,
	Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel,
	Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel,
	Juan Quintela, Dr. David Alan Gilbert, Richard Henderson,
	Peter Maydell, Haibo Xu, Andrew Jones

On Fri, Jun 04, 2021 at 02:09:50PM +0100, Steven Price wrote:
> On 04/06/2021 12:42, Catalin Marinas wrote:
> > On Fri, Jun 04, 2021 at 12:15:56PM +0100, Steven Price wrote:
> >> On 03/06/2021 18:13, Catalin Marinas wrote:
> >>> On Mon, May 24, 2021 at 11:45:12AM +0100, Steven Price wrote:
> >>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> >>>> index 24223adae150..b3edde68bc3e 100644
> >>>> --- a/arch/arm64/include/uapi/asm/kvm.h
> >>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
> >>>> @@ -184,6 +184,17 @@ struct kvm_vcpu_events {
> >>>>  	__u32 reserved[12];
> >>>>  };
> >>>>  
> >>>> +struct kvm_arm_copy_mte_tags {
> >>>> +	__u64 guest_ipa;
> >>>> +	__u64 length;
> >>>> +	void __user *addr;
> >>>> +	__u64 flags;
> >>>> +	__u64 reserved[2];
> >>>> +};
> >>>> +
> >>>> +#define KVM_ARM_TAGS_TO_GUEST		0
> >>>> +#define KVM_ARM_TAGS_FROM_GUEST		1
> >>>> +
> >>>>  /* If you need to interpret the index values, here is the key: */
> >>>>  #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
> >>>>  #define KVM_REG_ARM_COPROC_SHIFT	16
> >>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> >>>> index e89a5e275e25..baa33359e477 100644
> >>>> --- a/arch/arm64/kvm/arm.c
> >>>> +++ b/arch/arm64/kvm/arm.c
> >>>> @@ -1345,6 +1345,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >>>>  
> >>>>  		return 0;
> >>>>  	}
> >>>> +	case KVM_ARM_MTE_COPY_TAGS: {
> >>>> +		struct kvm_arm_copy_mte_tags copy_tags;
> >>>> +
> >>>> +		if (copy_from_user(&copy_tags, argp, sizeof(copy_tags)))
> >>>> +			return -EFAULT;
> >>>> +		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
> >>>> +	}
> >>>
> >>> I wonder whether we need an update of the user structure following a
> >>> fault, like how much was copied etc. In case of an error, some tags were
> >>> copied and the VMM may want to skip the page before continuing. But here
> >>> there's no such information provided.
> >>>
> >>> On the ptrace interface, we return 0 on the syscall if any bytes were
> >>> copied and update iov_len to such number. Maybe you want to still return
> >>> an error here but updating copy_tags.length would be nice (and, of
> >>> course, a copy_to_user() back).
> >>
> >> Good idea - as you suggest I'll make it update length with the number of
> >> bytes not processed. Although in general I think we're expecting the VMM
> >> to know where the memory is so this is more of a programming error - but
> >> could still be useful for debugging.
> > 
> > Or update it to the number of bytes copied to be consistent with
> > ptrace()'s iov_len. On success, the structure is effectively left
> > unchanged.
> 
> I was avoiding that because it confuses the error code when the initial
> copy_from_user() fails. In that case the structure is clearly unchanged,
> so you can only tell from a -EFAULT return that nothing happened. By
> returning the number of bytes left you can return an error code along
> with the information that the copy only half completed.
> 
> It also seems cleaner to leave the structure unchanged if e.g. the flags
> or reserved fields are invalid rather than having to set length=0 to
> signal that nothing was done.
> 
> Although I do feel like arguing whether to use a ptrace() interface or a
> copy_{to,from}_user() interface is somewhat ridiculous considering
> neither are exactly considered good.
> 
> Rather than changing the structure we could return either an error code
> (if nothing was copied) or the number of bytes left. That way ioctl()==0
> means complete success, >0 means partial success and <0 means complete
> failure and provides a detailed error code. The ioctl() can be repeated
> (with adjusted pointers) if it returns >0 and a detailed error is needed.

That would be more like read/write (nearly, those always return the
amount copied). Anyway, I don't have any strong preference, I'll leave
the details up to you as long as there is some indication of how much
was copied or left.
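
To make the difference between the two conventions concrete, here is a small sketch of the callee side. It is an illustration only, not code from the series: ret_bytes_left() and ret_bytes_copied() are hypothetical helpers, and -EFAULT is assumed as the error for a copy that makes no progress.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Model a request for `requested` bytes that stopped after `copied` bytes.
 *
 * "Bytes left" convention: 0 on complete success, the number of bytes
 * still outstanding on partial success, and a negative error code if
 * nothing at all was copied.
 */
static long ret_bytes_left(size_t requested, size_t copied)
{
	if (copied >= requested)
		return 0;
	if (copied == 0)
		return -EFAULT;
	return (long)(requested - copied);
}

/*
 * read()/write() style: return the number of bytes copied, and a negative
 * error code only if a non-empty request made no progress.
 */
static long ret_bytes_copied(size_t requested, size_t copied)
{
	if (copied > requested)
		copied = requested;
	if (copied == 0 && requested != 0)
		return -EFAULT;
	return (long)copied;
}
```

Either form gives the caller enough information to tell how far the copy got. With the first, a return of 0 is an unambiguous "all done"; with the second, the caller compares the return value against the requested length.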

-- 
Catalin

^ permalink raw reply	[flat|nested] 92+ messages in thread

end of thread, other threads:[~2021-06-04 15:36 UTC | newest]

Thread overview: 92+ messages
2021-05-24 10:45 [PATCH v13 0/8] MTE support for KVM guest Steven Price
2021-05-24 10:45 ` [PATCH v13 1/8] arm64: mte: Handle race when synchronising tags Steven Price
2021-05-24 10:45 ` [PATCH v13 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage() Steven Price
2021-05-24 10:45 ` [PATCH v13 3/8] arm64: mte: Sync tags for pages where PTE is untagged Steven Price
2021-06-03 14:20   ` Catalin Marinas
2021-05-24 10:45 ` [PATCH v13 4/8] KVM: arm64: Introduce MTE VM feature Steven Price
2021-06-03 16:00   ` Catalin Marinas
2021-06-04  9:01     ` Catalin Marinas
2021-06-04 10:42     ` Steven Price
2021-06-04 11:36       ` Catalin Marinas
2021-06-04 12:51         ` Steven Price
2021-06-04 14:05           ` Catalin Marinas
2021-05-24 10:45 ` [PATCH v13 5/8] KVM: arm64: Save/restore MTE registers Steven Price
2021-06-03 16:48   ` Catalin Marinas
2021-05-24 10:45 ` [PATCH v13 6/8] KVM: arm64: Expose KVM_ARM_CAP_MTE Steven Price
2021-06-03 16:58   ` Catalin Marinas
2021-05-24 10:45 ` [PATCH v13 7/8] KVM: arm64: ioctl to fetch/store tags in a guest Steven Price
2021-06-03 17:13   ` Catalin Marinas
2021-06-04 11:15     ` Steven Price
2021-06-04 11:42       ` Catalin Marinas
2021-06-04 13:09         ` Steven Price
2021-06-04 15:34           ` Catalin Marinas
2021-05-24 10:45 ` [PATCH v13 8/8] KVM: arm64: Document MTE capability and ioctl Steven Price
