Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v7 00/26]  arm64: Memory Tagging Extension user-space support
@ 2020-07-15 17:08 Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 01/29] arm64: mte: system register definitions Catalin Marinas
                   ` (28 more replies)
  0 siblings, 29 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

This is version 7 (6th version here [1]) of the series adding user-space
support for the ARMv8.5 Memory Tagging Extension ([2], [3]). The patches
are also available on this branch:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/mte

There have been no further ABI changes and still aiming for a 5.9 merge.
I would be grateful for acks (or naks) on the following patches:

mm:

  [PATCH v7 06/29] mm: Add PG_arch_2 page flag
  [PATCH v7 07/29] mm: Preserve the PG_arch_2 flag in __split_huge_page_tail()
  [PATCH v7 13/29] mm: Introduce arch_calc_vm_flag_bits()

fs:
  
  [PATCH v7 24/29] fs: Handle intra-page faults in copy_mount_options()

arm64 KVM (small new addition in v7):

  [PATCH v7 02/29] arm64: mte: CPU feature detection and initial sysreg configuration

Changes in this version:

- __split_huge_page_tail() function update to preserve the PG_arch_2
  flag only. The PG_arch_1 will be submitted separately since s390 and
  x86 may be affected.

- ptrace NT_ARM_TAGGED_ADDR_CTRL regset for access to the tagged address
  ABI and MTE configuration of a process, together with documentation
  update.

- Kconfig update to check for the presence of the STGM instruction in
  addition to '.arch armv8.5-a+memtag'. A late architecture update
  introduced the LDGM/STGM and there are binutils versions out there
  that don't support it.

- Re-ordering of the CNP and MTE initialisation to cater for late
  secondary CPU bring-up (post the arch_initcall()).

[1] https://lore.kernel.org/linux-arm-kernel/20200703153718.16973-1-catalin.marinas@arm.com/
[2] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety
[3] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Arm_Memory_Tagging_Extension_Whitepaper.pdf
[4] https://sourceware.org/pipermail/libc-alpha/2020-June/115039.html

Catalin Marinas (17):
  arm64: mte: Use Normal Tagged attributes for the linear map
  mm: Preserve the PG_arch_2 flag in __split_huge_page_tail()
  arm64: mte: Clear the tags when a page is mapped in user-space with
    PROT_MTE
  arm64: Avoid unnecessary clear_user_page() indirection
  arm64: mte: Tags-aware aware memcmp_pages() implementation
  arm64: mte: Handle the MAIR_EL1 changes for late CPU bring-up
  arm64: mte: Add PROT_MTE support to mmap() and mprotect()
  mm: Introduce arch_validate_flags()
  arm64: mte: Validate the PROT_MTE request via arch_validate_flags()
  mm: Allow arm64 mmap(PROT_MTE) on RAM-based files
  arm64: mte: Allow user control of the tag check mode via prctl()
  arm64: mte: Allow user control of the generated random tags via
    prctl()
  arm64: mte: Restore the GCR_EL1 register after a suspend
  arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks
  arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
  arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset
  fs: Handle intra-page faults in copy_mount_options()

Kevin Brodsky (1):
  mm: Introduce arch_calc_vm_flag_bits()

Steven Price (4):
  mm: Add PG_arch_2 page flag
  mm: Add arch hooks for saving/restoring tags
  arm64: mte: Enable swap of tagged pages
  arm64: mte: Save tags when hibernating

Vincenzo Frascino (7):
  arm64: mte: system register definitions
  arm64: mte: CPU feature detection and initial sysreg configuration
  arm64: mte: Add specific SIGSEGV codes
  arm64: mte: Handle synchronous and asynchronous tag check faults
  arm64: mte: Tags-aware copy_{user_,}highpage() implementations
  arm64: mte: Kconfig entry
  arm64: mte: Add Memory Tagging Extension documentation

 Documentation/arm64/cpu-feature-registers.rst |   2 +
 Documentation/arm64/elf_hwcaps.rst            |   4 +
 Documentation/arm64/index.rst                 |   1 +
 .../arm64/memory-tagging-extension.rst        | 305 ++++++++++++++++
 arch/arm64/Kconfig                            |  31 ++
 arch/arm64/include/asm/cpucaps.h              |   5 +-
 arch/arm64/include/asm/cpufeature.h           |   6 +
 arch/arm64/include/asm/hwcap.h                |   1 +
 arch/arm64/include/asm/kvm_arm.h              |   3 +-
 arch/arm64/include/asm/memory.h               |  17 +-
 arch/arm64/include/asm/mman.h                 |  56 ++-
 arch/arm64/include/asm/mte.h                  |  86 +++++
 arch/arm64/include/asm/page.h                 |  19 +-
 arch/arm64/include/asm/pgtable-prot.h         |   2 +
 arch/arm64/include/asm/pgtable.h              |  46 ++-
 arch/arm64/include/asm/processor.h            |  12 +-
 arch/arm64/include/asm/sysreg.h               |  61 ++++
 arch/arm64/include/asm/thread_info.h          |   4 +-
 arch/arm64/include/uapi/asm/hwcap.h           |   1 +
 arch/arm64/include/uapi/asm/mman.h            |   1 +
 arch/arm64/include/uapi/asm/ptrace.h          |   4 +
 arch/arm64/kernel/Makefile                    |   1 +
 arch/arm64/kernel/cpufeature.c                |  67 ++++
 arch/arm64/kernel/cpuinfo.c                   |   1 +
 arch/arm64/kernel/entry.S                     |  37 ++
 arch/arm64/kernel/hibernate.c                 | 118 ++++++
 arch/arm64/kernel/mte.c                       | 337 ++++++++++++++++++
 arch/arm64/kernel/process.c                   |  45 ++-
 arch/arm64/kernel/ptrace.c                    |  53 ++-
 arch/arm64/kernel/signal.c                    |   9 +
 arch/arm64/kernel/suspend.c                   |   4 +
 arch/arm64/kernel/syscall.c                   |  10 +
 arch/arm64/kvm/sys_regs.c                     |   2 +
 arch/arm64/lib/Makefile                       |   2 +
 arch/arm64/lib/mte.S                          | 151 ++++++++
 arch/arm64/mm/Makefile                        |   1 +
 arch/arm64/mm/copypage.c                      |  25 +-
 arch/arm64/mm/dump.c                          |   4 +
 arch/arm64/mm/fault.c                         |   9 +-
 arch/arm64/mm/mmu.c                           |  22 +-
 arch/arm64/mm/mteswap.c                       |  83 +++++
 arch/arm64/mm/proc.S                          |   8 +-
 arch/x86/kernel/signal_compat.c               |   2 +-
 fs/namespace.c                                |  25 +-
 fs/proc/page.c                                |   3 +
 fs/proc/task_mmu.c                            |   4 +
 include/linux/kernel-page-flags.h             |   1 +
 include/linux/mm.h                            |   8 +
 include/linux/mman.h                          |  23 +-
 include/linux/page-flags.h                    |   3 +
 include/linux/pgtable.h                       |  28 ++
 include/trace/events/mmflags.h                |   9 +-
 include/uapi/asm-generic/siginfo.h            |   4 +-
 include/uapi/linux/elf.h                      |   1 +
 include/uapi/linux/prctl.h                    |   9 +
 mm/huge_memory.c                              |   3 +
 mm/mmap.c                                     |   9 +
 mm/mprotect.c                                 |   6 +
 mm/page_io.c                                  |  10 +
 mm/shmem.c                                    |   9 +
 mm/swapfile.c                                 |   2 +
 mm/util.c                                     |   2 +-
 tools/vm/page-types.c                         |   2 +
 63 files changed, 1760 insertions(+), 59 deletions(-)
 create mode 100644 Documentation/arm64/memory-tagging-extension.rst
 create mode 100644 arch/arm64/include/asm/mte.h
 create mode 100644 arch/arm64/kernel/mte.c
 create mode 100644 arch/arm64/lib/mte.S
 create mode 100644 arch/arm64/mm/mteswap.c



^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 01/29] arm64: mte: system register definitions
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 02/29] arm64: mte: CPU feature detection and initial sysreg configuration Catalin Marinas
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Add Memory Tagging Extension system register definitions together with
the relevant bitfields.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v2:
    - Added SET_PSTATE_TCO() macro.

 arch/arm64/include/asm/kvm_arm.h     |  1 +
 arch/arm64/include/asm/sysreg.h      | 53 ++++++++++++++++++++++++++++
 arch/arm64/include/uapi/asm/ptrace.h |  1 +
 arch/arm64/kernel/ptrace.c           |  2 +-
 4 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 51c1d9918999..8a1cbfd544d6 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -12,6 +12,7 @@
 #include <asm/types.h>
 
 /* Hyp Configuration Register (HCR) bits */
+#define HCR_ATA		(UL(1) << 56)
 #define HCR_FWB		(UL(1) << 46)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 463175f80341..97bc523882f3 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -91,10 +91,12 @@
 #define PSTATE_PAN			pstate_field(0, 4)
 #define PSTATE_UAO			pstate_field(0, 3)
 #define PSTATE_SSBS			pstate_field(3, 1)
+#define PSTATE_TCO			pstate_field(3, 4)
 
 #define SET_PSTATE_PAN(x)		__emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_UAO(x)		__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_SSBS(x)		__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
+#define SET_PSTATE_TCO(x)		__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
 
 #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
 	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
@@ -181,6 +183,8 @@
 #define SYS_SCTLR_EL1			sys_reg(3, 0, 1, 0, 0)
 #define SYS_ACTLR_EL1			sys_reg(3, 0, 1, 0, 1)
 #define SYS_CPACR_EL1			sys_reg(3, 0, 1, 0, 2)
+#define SYS_RGSR_EL1			sys_reg(3, 0, 1, 0, 5)
+#define SYS_GCR_EL1			sys_reg(3, 0, 1, 0, 6)
 
 #define SYS_ZCR_EL1			sys_reg(3, 0, 1, 2, 0)
 
@@ -218,6 +222,8 @@
 #define SYS_ERXADDR_EL1			sys_reg(3, 0, 5, 4, 3)
 #define SYS_ERXMISC0_EL1		sys_reg(3, 0, 5, 5, 0)
 #define SYS_ERXMISC1_EL1		sys_reg(3, 0, 5, 5, 1)
+#define SYS_TFSR_EL1			sys_reg(3, 0, 5, 6, 0)
+#define SYS_TFSRE0_EL1			sys_reg(3, 0, 5, 6, 1)
 
 #define SYS_FAR_EL1			sys_reg(3, 0, 6, 0, 0)
 #define SYS_PAR_EL1			sys_reg(3, 0, 7, 4, 0)
@@ -368,6 +374,7 @@
 
 #define SYS_CCSIDR_EL1			sys_reg(3, 1, 0, 0, 0)
 #define SYS_CLIDR_EL1			sys_reg(3, 1, 0, 0, 1)
+#define SYS_GMID_EL1			sys_reg(3, 1, 0, 0, 4)
 #define SYS_AIDR_EL1			sys_reg(3, 1, 0, 0, 7)
 
 #define SYS_CSSELR_EL1			sys_reg(3, 2, 0, 0, 0)
@@ -460,6 +467,7 @@
 #define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
 #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
 #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
+#define SYS_TFSR_EL2			sys_reg(3, 4, 5, 6, 0)
 #define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
 
 #define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
@@ -516,6 +524,7 @@
 #define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
 #define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
 #define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
+#define SYS_TFSR_EL12			sys_reg(3, 5, 5, 6, 0)
 #define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
 #define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
 #define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
@@ -531,6 +540,15 @@
 
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
+#define SCTLR_ELx_ATA	(BIT(43))
+
+#define SCTLR_ELx_TCF_SHIFT	40
+#define SCTLR_ELx_TCF_NONE	(UL(0x0) << SCTLR_ELx_TCF_SHIFT)
+#define SCTLR_ELx_TCF_SYNC	(UL(0x1) << SCTLR_ELx_TCF_SHIFT)
+#define SCTLR_ELx_TCF_ASYNC	(UL(0x2) << SCTLR_ELx_TCF_SHIFT)
+#define SCTLR_ELx_TCF_MASK	(UL(0x3) << SCTLR_ELx_TCF_SHIFT)
+
+#define SCTLR_ELx_ITFSB	(BIT(37))
 #define SCTLR_ELx_ENIA	(BIT(31))
 #define SCTLR_ELx_ENIB	(BIT(30))
 #define SCTLR_ELx_ENDA	(BIT(27))
@@ -559,6 +577,14 @@
 #endif
 
 /* SCTLR_EL1 specific flags. */
+#define SCTLR_EL1_ATA0		(BIT(42))
+
+#define SCTLR_EL1_TCF0_SHIFT	38
+#define SCTLR_EL1_TCF0_NONE	(UL(0x0) << SCTLR_EL1_TCF0_SHIFT)
+#define SCTLR_EL1_TCF0_SYNC	(UL(0x1) << SCTLR_EL1_TCF0_SHIFT)
+#define SCTLR_EL1_TCF0_ASYNC	(UL(0x2) << SCTLR_EL1_TCF0_SHIFT)
+#define SCTLR_EL1_TCF0_MASK	(UL(0x3) << SCTLR_EL1_TCF0_SHIFT)
+
 #define SCTLR_EL1_BT1		(BIT(36))
 #define SCTLR_EL1_BT0		(BIT(35))
 #define SCTLR_EL1_UCI		(BIT(26))
@@ -595,6 +621,7 @@
 #define MAIR_ATTR_DEVICE_GRE		UL(0x0c)
 #define MAIR_ATTR_NORMAL_NC		UL(0x44)
 #define MAIR_ATTR_NORMAL_WT		UL(0xbb)
+#define MAIR_ATTR_NORMAL_TAGGED		UL(0xf0)
 #define MAIR_ATTR_NORMAL		UL(0xff)
 #define MAIR_ATTR_MASK			UL(0xff)
 
@@ -683,6 +710,10 @@
 #define ID_AA64PFR1_SSBS_PSTATE_INSNS	2
 #define ID_AA64PFR1_BT_BTI		0x1
 
+#define ID_AA64PFR1_MTE_NI		0x0
+#define ID_AA64PFR1_MTE_EL0		0x1
+#define ID_AA64PFR1_MTE			0x2
+
 /* id_aa64zfr0 */
 #define ID_AA64ZFR0_F64MM_SHIFT		56
 #define ID_AA64ZFR0_F32MM_SHIFT		52
@@ -875,6 +906,28 @@
 #define CPACR_EL1_ZEN_EL0EN	(BIT(17)) /* enable EL0 access, if EL1EN set */
 #define CPACR_EL1_ZEN		(CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN)
 
+/* TCR EL1 Bit Definitions */
+#define SYS_TCR_EL1_TCMA1	(BIT(58))
+#define SYS_TCR_EL1_TCMA0	(BIT(57))
+
+/* GCR_EL1 Definitions */
+#define SYS_GCR_EL1_RRND	(BIT(16))
+#define SYS_GCR_EL1_EXCL_MASK	0xffffUL
+
+/* RGSR_EL1 Definitions */
+#define SYS_RGSR_EL1_TAG_MASK	0xfUL
+#define SYS_RGSR_EL1_SEED_SHIFT	8
+#define SYS_RGSR_EL1_SEED_MASK	0xffffUL
+
+/* GMID_EL1 field definitions */
+#define SYS_GMID_EL1_BS_SHIFT	0
+#define SYS_GMID_EL1_BS_SIZE	4
+
+/* TFSR{,E0}_EL1 bit definitions */
+#define SYS_TFSR_EL1_TF0_SHIFT	0
+#define SYS_TFSR_EL1_TF1_SHIFT	1
+#define SYS_TFSR_EL1_TF0	(UL(1) << SYS_TFSR_EL1_TF0_SHIFT)
+#define SYS_TFSR_EL1_TF1	(UK(2) << SYS_TFSR_EL1_TF1_SHIFT)
 
 /* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
 #define SYS_MPIDR_SAFE_VAL	(BIT(31))
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 42cbe34d95ce..06413d9f2341 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -51,6 +51,7 @@
 #define PSR_PAN_BIT	0x00400000
 #define PSR_UAO_BIT	0x00800000
 #define PSR_DIT_BIT	0x01000000
+#define PSR_TCO_BIT	0x02000000
 #define PSR_V_BIT	0x10000000
 #define PSR_C_BIT	0x20000000
 #define PSR_Z_BIT	0x40000000
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 68b7f34a08f5..4582014dda25 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1873,7 +1873,7 @@ void syscall_trace_exit(struct pt_regs *regs)
  * We also reserve IL for the kernel; SS is handled dynamically.
  */
 #define SPSR_EL1_AARCH64_RES0_BITS \
-	(GENMASK_ULL(63, 32) | GENMASK_ULL(27, 25) | GENMASK_ULL(23, 22) | \
+	(GENMASK_ULL(63, 32) | GENMASK_ULL(27, 26) | GENMASK_ULL(23, 22) | \
 	 GENMASK_ULL(20, 13) | GENMASK_ULL(5, 5))
 #define SPSR_EL1_AARCH32_RES0_BITS \
 	(GENMASK_ULL(63, 32) | GENMASK_ULL(22, 22) | GENMASK_ULL(20, 20))


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 02/29] arm64: mte: CPU feature detection and initial sysreg configuration
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 01/29] arm64: mte: system register definitions Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 03/29] arm64: mte: Use Normal Tagged attributes for the linear map Catalin Marinas
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Marc Zyngier, Suzuki K Poulose

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Add the cpufeature and hwcap entries to detect the presence of MTE on
the boot CPUs (primary and secondary). Any late secondary CPU not
supporting the feature, if detected during boot, will be parked.

In addition, add the minimum SCTLR_EL1 and HCR_EL2 bits for enabling
MTE. Without subsequent setting of MAIR, these bits do not have an
effect on tag checking.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
---

Notes:
    v7:
    - Hide the MTE ID register field for guests until MTE gains support for KVM.

 arch/arm64/include/asm/cpucaps.h    |  3 ++-
 arch/arm64/include/asm/cpufeature.h |  6 ++++++
 arch/arm64/include/asm/hwcap.h      |  1 +
 arch/arm64/include/asm/kvm_arm.h    |  2 +-
 arch/arm64/include/asm/sysreg.h     |  1 +
 arch/arm64/include/uapi/asm/hwcap.h |  1 +
 arch/arm64/kernel/cpufeature.c      | 30 +++++++++++++++++++++++++++++
 arch/arm64/kernel/cpuinfo.c         |  1 +
 arch/arm64/kvm/sys_regs.c           |  2 ++
 9 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index d7b3bb0cb180..6bc3e21e5929 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -62,7 +62,8 @@
 #define ARM64_HAS_GENERIC_AUTH			52
 #define ARM64_HAS_32BIT_EL1			53
 #define ARM64_BTI				54
+#define ARM64_MTE				55
 
-#define ARM64_NCAPS				55
+#define ARM64_NCAPS				56
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 5d1f4ae42799..c673283abd31 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -681,6 +681,12 @@ static inline bool system_uses_irq_prio_masking(void)
 	       cpus_have_const_cap(ARM64_HAS_IRQ_PRIO_MASKING);
 }
 
+static inline bool system_supports_mte(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_MTE) &&
+		cpus_have_const_cap(ARM64_MTE);
+}
+
 static inline bool system_has_prio_mask_debugging(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_DEBUG_PRIORITY_MASKING) &&
diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
index d683bcbf1e7c..0d4a6741b6a5 100644
--- a/arch/arm64/include/asm/hwcap.h
+++ b/arch/arm64/include/asm/hwcap.h
@@ -95,6 +95,7 @@
 #define KERNEL_HWCAP_DGH		__khwcap2_feature(DGH)
 #define KERNEL_HWCAP_RNG		__khwcap2_feature(RNG)
 #define KERNEL_HWCAP_BTI		__khwcap2_feature(BTI)
+#define KERNEL_HWCAP_MTE		__khwcap2_feature(MTE)
 
 /*
  * This yields a mask that user programs can use to figure out what
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8a1cbfd544d6..6c3b2fc922bb 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -78,7 +78,7 @@
 			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
 			 HCR_FMO | HCR_IMO)
 #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
-#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK)
+#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
 #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
 
 /* TCR_EL2 Registers bits */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 97bc523882f3..2e12d8049d1c 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -613,6 +613,7 @@
 			 SCTLR_EL1_SA0  | SCTLR_EL1_SED  | SCTLR_ELx_I    |\
 			 SCTLR_EL1_DZE  | SCTLR_EL1_UCT                   |\
 			 SCTLR_EL1_NTWE | SCTLR_ELx_IESB | SCTLR_EL1_SPAN |\
+			 SCTLR_ELx_ITFSB| SCTLR_ELx_ATA  | SCTLR_EL1_ATA0 |\
 			 ENDIAN_SET_EL1 | SCTLR_EL1_UCI  | SCTLR_EL1_RES1)
 
 /* MAIR_ELx memory attributes (used by Linux) */
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index 2d6ba1c2592e..b8f41aa234ee 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -74,5 +74,6 @@
 #define HWCAP2_DGH		(1 << 15)
 #define HWCAP2_RNG		(1 << 16)
 #define HWCAP2_BTI		(1 << 17)
+#define HWCAP2_MTE		(1 << 18)
 
 #endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9f63053a63a9..ccfdcb93d736 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -240,6 +240,8 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
 static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_MPAMFRAC_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_RASFRAC_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_MTE),
+		       FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_MTE_SHIFT, 4, ID_AA64PFR1_MTE_NI),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SSBS_SHIFT, 4, ID_AA64PFR1_SSBS_PSTATE_NI),
 	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_BTI),
 				    FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_BT_SHIFT, 4, 0),
@@ -1657,6 +1659,18 @@ static void bti_enable(const struct arm64_cpu_capabilities *__unused)
 }
 #endif /* CONFIG_ARM64_BTI */
 
+#ifdef CONFIG_ARM64_MTE
+static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
+{
+	/* all non-zero tags excluded by default */
+	write_sysreg_s(SYS_GCR_EL1_RRND | SYS_GCR_EL1_EXCL_MASK, SYS_GCR_EL1);
+	write_sysreg_s(0, SYS_TFSR_EL1);
+	write_sysreg_s(0, SYS_TFSRE0_EL1);
+
+	isb();
+}
+#endif /* CONFIG_ARM64_MTE */
+
 /* Internal helper functions to match cpu capability type */
 static bool
 cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
@@ -2056,6 +2070,19 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.sign = FTR_UNSIGNED,
 	},
 #endif
+#ifdef CONFIG_ARM64_MTE
+	{
+		.desc = "Memory Tagging Extension",
+		.capability = ARM64_MTE,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		.sys_reg = SYS_ID_AA64PFR1_EL1,
+		.field_pos = ID_AA64PFR1_MTE_SHIFT,
+		.min_field_value = ID_AA64PFR1_MTE,
+		.sign = FTR_UNSIGNED,
+		.cpu_enable = cpu_enable_mte,
+	},
+#endif /* CONFIG_ARM64_MTE */
 	{},
 };
 
@@ -2172,6 +2199,9 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
 	HWCAP_MULTI_CAP(ptr_auth_hwcap_addr_matches, CAP_HWCAP, KERNEL_HWCAP_PACA),
 	HWCAP_MULTI_CAP(ptr_auth_hwcap_gen_matches, CAP_HWCAP, KERNEL_HWCAP_PACG),
 #endif
+#ifdef CONFIG_ARM64_MTE
+	HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_MTE_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_MTE, CAP_HWCAP, KERNEL_HWCAP_MTE),
+#endif /* CONFIG_ARM64_MTE */
 	{},
 };
 
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 86637466daa8..5ce478c0b4b1 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -93,6 +93,7 @@ static const char *const hwcap_str[] = {
 	"dgh",
 	"rng",
 	"bti",
+	"mte",
 	NULL
 };
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index baf5ce9225ce..552a7d860a45 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1104,6 +1104,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 		if (!vcpu_has_sve(vcpu))
 			val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
 		val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
+	} else if (id == SYS_ID_AA64PFR1_EL1) {
+		val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
 	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
 		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
 			 (0xfUL << ID_AA64ISAR1_API_SHIFT) |


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 03/29] arm64: mte: Use Normal Tagged attributes for the linear map
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 01/29] arm64: mte: system register definitions Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 02/29] arm64: mte: CPU feature detection and initial sysreg configuration Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 04/29] arm64: mte: Add specific SIGSEGV codes Catalin Marinas
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Suzuki K Poulose

Once user space is given access to tagged memory, the kernel must be
able to clear/save/restore tags visible to the user. This is done via
the linear mapping, therefore map it as such. The new MT_NORMAL_TAGGED
index for MAIR_EL1 is initially mapped as Normal memory and later
changed to Normal Tagged via the cpufeature infrastructure. From a
mismatched attribute aliases perspective, the Tagged memory is
considered a permission and it won't lead to undefined behaviour.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
---

Notes:
    v5:
    - Move the clearing of the zero page since clear_page() to a later
      patch.
    
    v3:
    - Restrict the safe attribute change in pgattr_change_is_safe() only to
      Normal to/from Normal-Tagged (old version allow any other type as long
      as old or new was Normal(-Tagged)).

 arch/arm64/include/asm/memory.h       |  1 +
 arch/arm64/include/asm/pgtable-prot.h |  2 ++
 arch/arm64/kernel/cpufeature.c        | 24 ++++++++++++++++++++++++
 arch/arm64/mm/dump.c                  |  4 ++++
 arch/arm64/mm/mmu.c                   | 22 ++++++++++++++++++++--
 arch/arm64/mm/proc.S                  |  8 ++++++--
 6 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index a1871bb32bb1..472c77a68225 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -136,6 +136,7 @@
 #define MT_NORMAL_NC		3
 #define MT_NORMAL		4
 #define MT_NORMAL_WT		5
+#define MT_NORMAL_TAGGED	6
 
 /*
  * Memory types for Stage-2 translation
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 2e7e0f452301..292f1d58b96e 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -50,6 +50,7 @@ extern bool arm64_use_ng_mappings;
 #define PROT_NORMAL_NC		(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL_NC))
 #define PROT_NORMAL_WT		(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL_WT))
 #define PROT_NORMAL		(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL))
+#define PROT_NORMAL_TAGGED	(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL_TAGGED))
 
 #define PROT_SECT_DEVICE_nGnRE	(PROT_SECT_DEFAULT | PMD_SECT_PXN | PMD_SECT_UXN | PMD_ATTRINDX(MT_DEVICE_nGnRE))
 #define PROT_SECT_NORMAL	(PROT_SECT_DEFAULT | PMD_SECT_PXN | PMD_SECT_UXN | PMD_ATTRINDX(MT_NORMAL))
@@ -59,6 +60,7 @@ extern bool arm64_use_ng_mappings;
 #define _HYP_PAGE_DEFAULT	_PAGE_DEFAULT
 
 #define PAGE_KERNEL		__pgprot(PROT_NORMAL)
+#define PAGE_KERNEL_TAGGED	__pgprot(PROT_NORMAL_TAGGED)
 #define PAGE_KERNEL_RO		__pgprot((PROT_NORMAL & ~PTE_WRITE) | PTE_RDONLY)
 #define PAGE_KERNEL_ROX		__pgprot((PROT_NORMAL & ~(PTE_WRITE | PTE_PXN)) | PTE_RDONLY)
 #define PAGE_KERNEL_EXEC	__pgprot(PROT_NORMAL & ~PTE_PXN)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index ccfdcb93d736..790b0e1da136 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1662,13 +1662,37 @@ static void bti_enable(const struct arm64_cpu_capabilities *__unused)
 #ifdef CONFIG_ARM64_MTE
 static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
 {
+	u64 mair;
+
 	/* all non-zero tags excluded by default */
 	write_sysreg_s(SYS_GCR_EL1_RRND | SYS_GCR_EL1_EXCL_MASK, SYS_GCR_EL1);
 	write_sysreg_s(0, SYS_TFSR_EL1);
 	write_sysreg_s(0, SYS_TFSRE0_EL1);
 
+	/*
+	 * Update the MT_NORMAL_TAGGED index in MAIR_EL1. Tag checking is
+	 * disabled for the kernel, so there won't be any observable effect
+	 * other than allowing the kernel to read and write tags.
+	 */
+	mair = read_sysreg_s(SYS_MAIR_EL1);
+	mair &= ~MAIR_ATTRIDX(MAIR_ATTR_MASK, MT_NORMAL_TAGGED);
+	mair |= MAIR_ATTRIDX(MAIR_ATTR_NORMAL_TAGGED, MT_NORMAL_TAGGED);
+	write_sysreg_s(mair, SYS_MAIR_EL1);
+
 	isb();
 }
+
+static int __init system_enable_mte(void)
+{
+	if (!system_supports_mte())
+		return 0;
+
+	/* Ensure the TLB does not have stale MAIR attributes */
+	flush_tlb_all();
+
+	return 0;
+}
+core_initcall(system_enable_mte);
 #endif /* CONFIG_ARM64_MTE */
 
 /* Internal helper functions to match cpu capability type */
diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 0b8da1cc1c07..ba6d1d89f9b2 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -169,6 +169,10 @@ static const struct prot_bits pte_bits[] = {
 		.mask	= PTE_ATTRINDX_MASK,
 		.val	= PTE_ATTRINDX(MT_NORMAL),
 		.set	= "MEM/NORMAL",
+	}, {
+		.mask	= PTE_ATTRINDX_MASK,
+		.val	= PTE_ATTRINDX(MT_NORMAL_TAGGED),
+		.set	= "MEM/NORMAL-TAGGED",
 	}
 };
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 1df25f26571d..0bbe96c006ad 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -121,7 +121,7 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
 	 * The following mapping attributes may be updated in live
 	 * kernel mappings without the need for break-before-make.
 	 */
-	static const pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
+	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
 
 	/* creating or taking down mappings is always safe */
 	if (old == 0 || new == 0)
@@ -135,6 +135,19 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
 	if (old & ~new & PTE_NG)
 		return false;
 
+	if (system_supports_mte()) {
+		/*
+		 * Changing the memory type between Normal and Normal-Tagged
+		 * is safe since Tagged is considered a permission attribute
+		 * from the mismatched attribute aliases perspective.
+		 */
+		if (((old & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL) ||
+		     (old & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL_TAGGED)) &&
+		    ((new & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL) ||
+		     (new & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL_TAGGED)))
+			mask |= PTE_ATTRINDX_MASK;
+	}
+
 	return ((old ^ new) & ~mask) == 0;
 }
 
@@ -490,7 +503,12 @@ static void __init map_mem(pgd_t *pgdp)
 		if (memblock_is_nomap(reg))
 			continue;
 
-		__map_memblock(pgdp, start, end, PAGE_KERNEL, flags);
+		/*
+		 * The linear map must allow allocation tags reading/writing
+		 * if MTE is present. Otherwise, it has the same attributes as
+		 * PAGE_KERNEL.
+		 */
+		__map_memblock(pgdp, start, end, PAGE_KERNEL_TAGGED, flags);
 	}
 
 	/*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 796e47a571e6..152d74f2cc9c 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -44,14 +44,18 @@
 #define TCR_KASAN_FLAGS 0
 #endif
 
-/* Default MAIR_EL1 */
+/*
+ * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal memory and
+ * changed later to Normal Tagged if the system supports MTE.
+ */
 #define MAIR_EL1_SET							\
 	(MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) |	\
 	 MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) |	\
 	 MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) |		\
 	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) |		\
 	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) |			\
-	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT))
+	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) |		\
+	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
 
 #ifdef CONFIG_CPU_PM
 /**


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 04/29] arm64: mte: Add specific SIGSEGV codes
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (2 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 03/29] arm64: mte: Use Normal Tagged attributes for the linear map Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 05/29] arm64: mte: Handle synchronous and asynchronous tag check faults Catalin Marinas
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Arnd Bergmann

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Add MTE-specific SIGSEGV codes to siginfo.h and update the x86
BUILD_BUG_ON(NSIGSEGV != 7) compile check.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
[catalin.marinas@arm.com: renamed precise/imprecise to sync/async]
[catalin.marinas@arm.com: dropped #ifdef __aarch64__, renumbered]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v3:
    - Fixed the BUILD_BUG_ON(NSIGSEGV != 7) on x86
    - Updated the commit log
    
    v2:
    - Dropped the #ifdef __aarch64__.
    - Renumbered the SEGV_MTE* values to avoid clash with ADI.

 arch/x86/kernel/signal_compat.c    | 2 +-
 include/uapi/asm-generic/siginfo.h | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf0576cd0..a7f3e12cfbdb 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void)
 	 */
 	BUILD_BUG_ON(NSIGILL  != 11);
 	BUILD_BUG_ON(NSIGFPE  != 15);
-	BUILD_BUG_ON(NSIGSEGV != 7);
+	BUILD_BUG_ON(NSIGSEGV != 9);
 	BUILD_BUG_ON(NSIGBUS  != 5);
 	BUILD_BUG_ON(NSIGTRAP != 5);
 	BUILD_BUG_ON(NSIGCHLD != 6);
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index cb3d6c267181..7aacf9389010 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -229,7 +229,9 @@ typedef struct siginfo {
 #define SEGV_ACCADI	5	/* ADI not enabled for mapped object */
 #define SEGV_ADIDERR	6	/* Disrupting MCD error */
 #define SEGV_ADIPERR	7	/* Precise MCD exception */
-#define NSIGSEGV	7
+#define SEGV_MTEAERR	8	/* Asynchronous ARM MTE error */
+#define SEGV_MTESERR	9	/* Synchronous ARM MTE exception */
+#define NSIGSEGV	9
 
 /*
  * SIGBUS si_codes


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 05/29] arm64: mte: Handle synchronous and asynchronous tag check faults
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (3 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 04/29] arm64: mte: Add specific SIGSEGV codes Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 06/29] mm: Add PG_arch_2 page flag Catalin Marinas
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

The Memory Tagging Extension has two modes of notifying a tag check
fault at EL0, configurable through the SCTLR_EL1.TCF0 field:

1. Synchronous raising of a Data Abort exception with DFSC 17.
2. Asynchronous setting of a cumulative bit in TFSRE0_EL1.

Add the exception handler for the synchronous exception and handling of
the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in
do_notify_resume().

On a tag check failure in user-space, whether synchronous or
asynchronous, a SIGSEGV will be raised on the faulting thread.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v6:
    - Fix sparse warning on the 0 used as pointer.
    
    v4:
    - Use send_signal_fault() instead of fault_signal_inject() for
      asynchronous tag check faults as execution can continue even if this
      signal is masked.
    - Add DSB ISH prior to writing TFSRE0_EL1 in the clear_mte_async_tcf
      macro.
    - Move clear_mte_async_tcf just after returning to user since
      do_notify_resume() may still cause async tag faults via do_signal().
    
    v3:
    - Asynchronous tag check faults during the uaccess routines in the
      kernel are ignored.
    - Fix check_mte_async_tcf calling site as it expects the first argument
      to be the thread flags.
    - Move the mte_thread_switch() definition and call to a later patch as
      this became empty with the removal of async uaccess checking.
    - Add dsb() and clearing of TFSRE0_EL1 in flush_mte_state(), in case
      execve() triggered a asynchronous tag check fault.
    - Clear TIF_MTE_ASYNC_FAULT in arch_dup_task_struct() so that the child
      does not inherit any pending tag fault in the parent.
    
    v2:
    - Clear PSTATE.TCO on exception entry (automatically set by the hardware).
    - On syscall entry, for asynchronous tag check faults from user space,
      generate the signal early via syscall restarting.
    - Before context switch, save any potential async tag check fault
      generated by the kernel to the TIF flag (this follows an architecture
      update where the uaccess routines use the TCF0 mode).
    - Moved the flush_mte_state() and mte_thread_switch() function to a new
      mte.c file.

 arch/arm64/include/asm/mte.h         | 23 +++++++++++++++++
 arch/arm64/include/asm/thread_info.h |  4 ++-
 arch/arm64/kernel/Makefile           |  1 +
 arch/arm64/kernel/entry.S            | 37 ++++++++++++++++++++++++++++
 arch/arm64/kernel/mte.c              | 21 ++++++++++++++++
 arch/arm64/kernel/process.c          |  5 ++++
 arch/arm64/kernel/signal.c           |  9 +++++++
 arch/arm64/kernel/syscall.c          | 10 ++++++++
 arch/arm64/mm/fault.c                |  9 ++++++-
 9 files changed, 117 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/include/asm/mte.h
 create mode 100644 arch/arm64/kernel/mte.c

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
new file mode 100644
index 000000000000..a0bf310da74b
--- /dev/null
+++ b/arch/arm64/include/asm/mte.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 ARM Ltd.
+ */
+#ifndef __ASM_MTE_H
+#define __ASM_MTE_H
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_ARM64_MTE
+
+void flush_mte_state(void);
+
+#else
+
+static inline void flush_mte_state(void)
+{
+}
+
+#endif
+
+#endif /* __ASSEMBLY__ */
+#endif /* __ASM_MTE_H  */
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 6ea8b6a26ae9..c91605faa9cb 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -67,6 +67,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_FSCHECK		5	/* Check FS is USER_DS on return */
+#define TIF_MTE_ASYNC_FAULT	6	/* MTE Asynchronous Tag Check Fault */
 #define TIF_SYSCALL_TRACE	8	/* syscall trace active */
 #define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
 #define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
@@ -95,10 +96,11 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define _TIF_FSCHECK		(1 << TIF_FSCHECK)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 #define _TIF_SVE		(1 << TIF_SVE)
+#define _TIF_MTE_ASYNC_FAULT	(1 << TIF_MTE_ASYNC_FAULT)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE | _TIF_FSCHECK)
+				 _TIF_UPROBE | _TIF_FSCHECK | _TIF_MTE_ASYNC_FAULT)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index a561cbb91d4d..5fb9b728459b 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
 obj-$(CONFIG_ARM64_SSBD)		+= ssbd.o
 obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
 obj-$(CONFIG_SHADOW_CALL_STACK)		+= scs.o
+obj-$(CONFIG_ARM64_MTE)			+= mte.o
 
 obj-y					+= vdso/ probes/
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32/
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 5304d193c79d..cde127508e38 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -146,6 +146,32 @@ alternative_cb_end
 #endif
 	.endm
 
+	/* Check for MTE asynchronous tag check faults */
+	.macro check_mte_async_tcf, flgs, tmp
+#ifdef CONFIG_ARM64_MTE
+alternative_if_not ARM64_MTE
+	b	1f
+alternative_else_nop_endif
+	mrs_s	\tmp, SYS_TFSRE0_EL1
+	tbz	\tmp, #SYS_TFSR_EL1_TF0_SHIFT, 1f
+	/* Asynchronous TCF occurred for TTBR0 access, set the TI flag */
+	orr	\flgs, \flgs, #_TIF_MTE_ASYNC_FAULT
+	str	\flgs, [tsk, #TSK_TI_FLAGS]
+	msr_s	SYS_TFSRE0_EL1, xzr
+1:
+#endif
+	.endm
+
+	/* Clear the MTE asynchronous tag check faults */
+	.macro clear_mte_async_tcf
+#ifdef CONFIG_ARM64_MTE
+alternative_if ARM64_MTE
+	dsb	ish
+	msr_s	SYS_TFSRE0_EL1, xzr
+alternative_else_nop_endif
+#endif
+	.endm
+
 	.macro	kernel_entry, el, regsize = 64
 	.if	\regsize == 32
 	mov	w0, w0				// zero upper 32 bits of x0
@@ -177,6 +203,8 @@ alternative_cb_end
 	ldr	x19, [tsk, #TSK_TI_FLAGS]
 	disable_step_tsk x19, x20
 
+	/* Check for asynchronous tag check faults in user space */
+	check_mte_async_tcf x19, x22
 	apply_ssbd 1, x22, x23
 
 	ptrauth_keys_install_kernel tsk, x20, x22, x23
@@ -247,6 +275,13 @@ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
 	str	x20, [sp, #S_PMR_SAVE]
 alternative_else_nop_endif
 
+	/* Re-enable tag checking (TCO set on exception entry) */
+#ifdef CONFIG_ARM64_MTE
+alternative_if ARM64_MTE
+	SET_PSTATE_TCO(0)
+alternative_else_nop_endif
+#endif
+
 	/*
 	 * Registers that may be useful after this macro is invoked:
 	 *
@@ -755,6 +790,8 @@ SYM_CODE_START_LOCAL(ret_to_user)
 	and	x2, x1, #_TIF_WORK_MASK
 	cbnz	x2, work_pending
 finish_ret_to_user:
+	/* Ignore asynchronous tag check faults in the uaccess routines */
+	clear_mte_async_tcf
 	enable_step_tsk x1, x2
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 	bl	stackleak_erase
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
new file mode 100644
index 000000000000..032016823957
--- /dev/null
+++ b/arch/arm64/kernel/mte.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 ARM Ltd.
+ */
+
+#include <linux/thread_info.h>
+
+#include <asm/cpufeature.h>
+#include <asm/mte.h>
+#include <asm/sysreg.h>
+
+void flush_mte_state(void)
+{
+	if (!system_supports_mte())
+		return;
+
+	/* clear any pending asynchronous tag fault */
+	dsb(ish);
+	write_sysreg_s(0, SYS_TFSRE0_EL1);
+	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
+}
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6089638c7d43..695705d1f8e5 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -52,6 +52,7 @@
 #include <asm/exec.h>
 #include <asm/fpsimd.h>
 #include <asm/mmu_context.h>
+#include <asm/mte.h>
 #include <asm/processor.h>
 #include <asm/pointer_auth.h>
 #include <asm/stacktrace.h>
@@ -338,6 +339,7 @@ void flush_thread(void)
 	tls_thread_flush();
 	flush_ptrace_hw_breakpoint(current);
 	flush_tagged_addr_state();
+	flush_mte_state();
 }
 
 void release_thread(struct task_struct *dead_task)
@@ -370,6 +372,9 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	dst->thread.sve_state = NULL;
 	clear_tsk_thread_flag(dst, TIF_SVE);
 
+	/* clear any pending asynchronous tag fault raised by the parent */
+	clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
+
 	return 0;
 }
 
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 801d56cdf701..c2a14a2a0d8e 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -748,6 +748,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
 		regs->pstate |= PSR_BTYPE_C;
 	}
 
+	/* TCO (Tag Check Override) always cleared for signal handlers */
+	regs->pstate &= ~PSR_TCO_BIT;
+
 	if (ka->sa.sa_flags & SA_RESTORER)
 		sigtramp = ka->sa.sa_restorer;
 	else
@@ -939,6 +942,12 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 			if (thread_flags & _TIF_UPROBE)
 				uprobe_notify_resume(regs);
 
+			if (thread_flags & _TIF_MTE_ASYNC_FAULT) {
+				clear_thread_flag(TIF_MTE_ASYNC_FAULT);
+				send_sig_fault(SIGSEGV, SEGV_MTEAERR,
+					       (void __user *)NULL, current);
+			}
+
 			if (thread_flags & _TIF_SIGPENDING)
 				do_signal(regs);
 
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 5f5b868292f5..e4b977e1cf0b 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -120,6 +120,16 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	local_daif_restore(DAIF_PROCCTX);
 	user_exit();
 
+	if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
+		/*
+		 * Process the asynchronous tag check fault before the actual
+		 * syscall. do_notify_resume() will send a signal to userspace
+		 * before the syscall is restarted.
+		 */
+		regs->regs[0] = -ERESTARTNOINTR;
+		return;
+	}
+
 	if (has_syscall_work(flags)) {
 		/* set default errno for user-issued syscall(-1) */
 		if (scno == NO_SYSCALL)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8afb238ff335..5e832b3387f1 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -658,6 +658,13 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 0;
 }
 
+static int do_tag_check_fault(unsigned long addr, unsigned int esr,
+			      struct pt_regs *regs)
+{
+	do_bad_area(addr, esr, regs);
+	return 0;
+}
+
 static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"ttbr address size fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"level 1 address size fault"	},
@@ -676,7 +683,7 @@ static const struct fault_info fault_info[] = {
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
 	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous external abort"	},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 17"			},
+	{ do_tag_check_fault,	SIGSEGV, SEGV_MTESERR,	"synchronous tag check fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 18"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 19"			},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 (translation table walk)"	},


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 06/29] mm: Add PG_arch_2 page flag
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (4 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 05/29] arm64: mte: Handle synchronous and asynchronous tag check faults Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 07/29] mm: Preserve the PG_arch_2 flag in __split_huge_page_tail() Catalin Marinas
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Steven Price

From: Steven Price <steven.price@arm.com>

For arm64 MTE support it is necessary to be able to mark pages that
contain user space visible tags that will need to be saved/restored e.g.
when swapped out.

To support this add a new arch specific flag (PG_arch_2). This flag is
only available on 64-bit architectures due to the limited number of
spare page flags on the 32-bit ones.

Signed-off-by: Steven Price <steven.price@arm.com>
[catalin.marinas@arm.com: use CONFIG_64BIT for guarding this new flag]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---

Notes:
    v6:
    - Using CONFIG_64BIT instead of a new CONFIG_ARCH_USES_PG_ARCH_2 option.
    
    New in v4.

 fs/proc/page.c                    | 3 +++
 include/linux/kernel-page-flags.h | 1 +
 include/linux/page-flags.h        | 3 +++
 include/trace/events/mmflags.h    | 9 ++++++++-
 tools/vm/page-types.c             | 2 ++
 5 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index f909243d4a66..9f1077d94cde 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -217,6 +217,9 @@ u64 stable_page_flags(struct page *page)
 	u |= kpf_copy_bit(k, KPF_PRIVATE_2,	PG_private_2);
 	u |= kpf_copy_bit(k, KPF_OWNER_PRIVATE,	PG_owner_priv_1);
 	u |= kpf_copy_bit(k, KPF_ARCH,		PG_arch_1);
+#ifdef CONFIG_64BIT
+	u |= kpf_copy_bit(k, KPF_ARCH_2,	PG_arch_2);
+#endif
 
 	return u;
 };
diff --git a/include/linux/kernel-page-flags.h b/include/linux/kernel-page-flags.h
index abd20ef93c98..eee1877a354e 100644
--- a/include/linux/kernel-page-flags.h
+++ b/include/linux/kernel-page-flags.h
@@ -17,5 +17,6 @@
 #define KPF_ARCH		38
 #define KPF_UNCACHED		39
 #define KPF_SOFTDIRTY		40
+#define KPF_ARCH_2		41
 
 #endif /* LINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6be1aa559b1e..276140c94f4a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -135,6 +135,9 @@ enum pageflags {
 #if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
 	PG_young,
 	PG_idle,
+#endif
+#ifdef CONFIG_64BIT
+	PG_arch_2,
 #endif
 	__NR_PAGEFLAGS,
 
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 5fb752034386..67018d367b9f 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -79,6 +79,12 @@
 #define IF_HAVE_PG_IDLE(flag,string)
 #endif
 
+#ifdef CONFIG_64BIT
+#define IF_HAVE_PG_ARCH_2(flag,string) ,{1UL << flag, string}
+#else
+#define IF_HAVE_PG_ARCH_2(flag,string)
+#endif
+
 #define __def_pageflag_names						\
 	{1UL << PG_locked,		"locked"	},		\
 	{1UL << PG_waiters,		"waiters"	},		\
@@ -105,7 +111,8 @@ IF_HAVE_PG_MLOCK(PG_mlocked,		"mlocked"	)		\
 IF_HAVE_PG_UNCACHED(PG_uncached,	"uncached"	)		\
 IF_HAVE_PG_HWPOISON(PG_hwpoison,	"hwpoison"	)		\
 IF_HAVE_PG_IDLE(PG_young,		"young"		)		\
-IF_HAVE_PG_IDLE(PG_idle,		"idle"		)
+IF_HAVE_PG_IDLE(PG_idle,		"idle"		)		\
+IF_HAVE_PG_ARCH_2(PG_arch_2,		"arch_2"	)
 
 #define show_page_flags(flags)						\
 	(flags) ? __print_flags(flags, "|",				\
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 58c0eab71bca..0517c744b04e 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -78,6 +78,7 @@
 #define KPF_ARCH		38
 #define KPF_UNCACHED		39
 #define KPF_SOFTDIRTY		40
+#define KPF_ARCH_2		41
 
 /* [48-] take some arbitrary free slots for expanding overloaded flags
  * not part of kernel API
@@ -135,6 +136,7 @@ static const char * const page_flag_names[] = {
 	[KPF_ARCH]		= "h:arch",
 	[KPF_UNCACHED]		= "c:uncached",
 	[KPF_SOFTDIRTY]		= "f:softdirty",
+	[KPF_ARCH_2]		= "H:arch_2",
 
 	[KPF_READAHEAD]		= "I:readahead",
 	[KPF_SLOB_FREE]		= "P:slob_free",


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 07/29] mm: Preserve the PG_arch_2 flag in __split_huge_page_tail()
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (5 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 06/29] mm: Add PG_arch_2 page flag Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 08/29] arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE Catalin Marinas
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

When a huge page is split into normal pages, part of the head page flags
are transferred to the tail pages. However, the PG_arch_* flags are not
part of the preserved set.

PG_arch_2 is used by the arm64 MTE support to mark pages that have valid
tags. The absence of such flag would cause the arm64 set_pte_at() to
clear the tags in order to avoid stale tags exposed to user or the
swapping out hooks to ignore the tags. Not preserving PG_arch_2 on huge
page splitting leads to tag corruption in the tail pages.

Preserve the newly added PG_arch_2 flag in __split_huge_page_tail().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---

Notes:
    v7:
    - Only preserve PG_arch_2 in __split_huge_page_tail(). The PG_arch_1
      flag will be discussed separately as it may potentially impact s390
      and x86.
    
    New in v6.

 mm/huge_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 78c84bee7e29..21ca2fff6b19 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2364,6 +2364,9 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_workingset) |
 			 (1L << PG_locked) |
 			 (1L << PG_unevictable) |
+#ifdef CONFIG_64BIT
+			 (1L << PG_arch_2) |
+#endif
 			 (1L << PG_dirty)));
 
 	/* ->mapping in first tail page is compound_mapcount */


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 08/29] arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (6 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 07/29] mm: Preserve the PG_arch_2 flag in __split_huge_page_tail() Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 09/29] arm64: mte: Tags-aware copy_{user_,}highpage() implementations Catalin Marinas
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Steven Price

Pages allocated by the kernel are not guaranteed to have the tags
zeroed, especially as the kernel does not (yet) use MTE itself. To
ensure the user can still access such pages when mapped into its address
space, clear the tags via set_pte_at(). A new page flag - PG_mte_tagged
(PG_arch_2) - is used to track pages with valid allocation tags.

Since the zero page is mapped as pte_special(), it won't be covered by
the above set_pte_at() mechanism. Clear its tags during early MTE
initialisation.

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v5:
    - Fix the handling of compound pages. Previously, set_pte_at() could
      have erased already valid tags if the first page in a compound one
      did not have the PG_mte_tagged flag set.
    - Move the multi_tag_transfer_size macro from assembler.h to mte.S.
    - Ignore pte_special() mappings and clear the tags in the zero page
      separately (since it's mapped as a special pte).
    - Clearing the tags of the zero page was moved to this patch from an
      earlier one since mte_clear_page_tags() was not available.
    
    New in v4. Replacing a previous page zeroing the tags in clear_page().

 arch/arm64/include/asm/mte.h     | 16 +++++++++++++++
 arch/arm64/include/asm/pgtable.h |  7 +++++++
 arch/arm64/kernel/cpufeature.c   |  7 +++++++
 arch/arm64/kernel/mte.c          | 14 +++++++++++++
 arch/arm64/lib/Makefile          |  2 ++
 arch/arm64/lib/mte.S             | 34 ++++++++++++++++++++++++++++++++
 6 files changed, 80 insertions(+)
 create mode 100644 arch/arm64/lib/mte.S

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index a0bf310da74b..1716b3d02489 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -7,12 +7,28 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/page-flags.h>
+
+#include <asm/pgtable-types.h>
+
+void mte_clear_page_tags(void *addr);
+
 #ifdef CONFIG_ARM64_MTE
 
+/* track which pages have valid allocation tags */
+#define PG_mte_tagged	PG_arch_2
+
+void mte_sync_tags(pte_t *ptep, pte_t pte);
 void flush_mte_state(void);
 
 #else
 
+/* unused if !CONFIG_ARM64_MTE, silence the compiler */
+#define PG_mte_tagged	0
+
+static inline void mte_sync_tags(pte_t *ptep, pte_t pte)
+{
+}
 static inline void flush_mte_state(void)
 {
 }
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 758e2d1577d0..f9401a3205a8 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -9,6 +9,7 @@
 #include <asm/proc-fns.h>
 
 #include <asm/memory.h>
+#include <asm/mte.h>
 #include <asm/pgtable-hwdef.h>
 #include <asm/pgtable-prot.h>
 #include <asm/tlbflush.h>
@@ -80,6 +81,8 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 #define pte_user_exec(pte)	(!(pte_val(pte) & PTE_UXN))
 #define pte_cont(pte)		(!!(pte_val(pte) & PTE_CONT))
 #define pte_devmap(pte)		(!!(pte_val(pte) & PTE_DEVMAP))
+#define pte_tagged(pte)		((pte_val(pte) & PTE_ATTRINDX_MASK) == \
+				 PTE_ATTRINDX(MT_NORMAL_TAGGED))
 
 #define pte_cont_addr_end(addr, end)						\
 ({	unsigned long __boundary = ((addr) + CONT_PTE_SIZE) & CONT_PTE_MASK;	\
@@ -274,6 +277,10 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 	if (pte_present(pte) && pte_user_exec(pte) && !pte_special(pte))
 		__sync_icache_dcache(pte);
 
+	if (system_supports_mte() &&
+	    pte_present(pte) && pte_tagged(pte) && !pte_special(pte))
+		mte_sync_tags(ptep, pte);
+
 	__check_racy_pte_update(mm, ptep, pte);
 
 	set_pte(ptep, pte);
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 790b0e1da136..c1df72bfede4 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -75,6 +75,7 @@
 #include <asm/cpu_ops.h>
 #include <asm/fpsimd.h>
 #include <asm/mmu_context.h>
+#include <asm/mte.h>
 #include <asm/processor.h>
 #include <asm/sysreg.h>
 #include <asm/traps.h>
@@ -1690,6 +1691,12 @@ static int __init system_enable_mte(void)
 	/* Ensure the TLB does not have stale MAIR attributes */
 	flush_tlb_all();
 
+	/*
+	 * Clear the tags in the zero page. This needs to be done via the
+	 * linear map which has the Tagged attribute.
+	 */
+	mte_clear_page_tags(lm_alias(empty_zero_page));
+
 	return 0;
 }
 core_initcall(system_enable_mte);
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 032016823957..5bf9bbed5a25 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -3,12 +3,26 @@
  * Copyright (C) 2020 ARM Ltd.
  */
 
+#include <linux/bitops.h>
+#include <linux/mm.h>
 #include <linux/thread_info.h>
 
 #include <asm/cpufeature.h>
 #include <asm/mte.h>
 #include <asm/sysreg.h>
 
+void mte_sync_tags(pte_t *ptep, pte_t pte)
+{
+	struct page *page = pte_page(pte);
+	long i, nr_pages = compound_nr(page);
+
+	/* if PG_mte_tagged is set, tags have already been initialised */
+	for (i = 0; i < nr_pages; i++, page++) {
+		if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+			mte_clear_page_tags(page_address(page));
+	}
+}
+
 void flush_mte_state(void)
 {
 	if (!system_supports_mte())
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 2fc253466dbf..d31e1169d9b8 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -16,3 +16,5 @@ lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
 obj-$(CONFIG_CRC32) += crc32.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
+
+obj-$(CONFIG_ARM64_MTE) += mte.o
diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
new file mode 100644
index 000000000000..a36705640086
--- /dev/null
+++ b/arch/arm64/lib/mte.S
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 ARM Ltd.
+ */
+#include <linux/linkage.h>
+
+#include <asm/assembler.h>
+#include <asm/sysreg.h>
+
+	.arch	armv8.5-a+memtag
+
+/*
+ * multitag_transfer_size - set \reg to the block size that is accessed by the
+ * LDGM/STGM instructions.
+ */
+	.macro	multitag_transfer_size, reg, tmp
+	mrs_s	\reg, SYS_GMID_EL1
+	ubfx	\reg, \reg, #SYS_GMID_EL1_BS_SHIFT, #SYS_GMID_EL1_BS_SIZE
+	mov	\tmp, #4
+	lsl	\reg, \tmp, \reg
+	.endm
+
+/*
+ * Clear the tags in a page
+ *   x0 - address of the page to be cleared
+ */
+SYM_FUNC_START(mte_clear_page_tags)
+	multitag_transfer_size x1, x2
+1:	stgm	xzr, [x0]
+	add	x0, x0, x1
+	tst	x0, #(PAGE_SIZE - 1)
+	b.ne	1b
+	ret
+SYM_FUNC_END(mte_clear_page_tags)


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 09/29] arm64: mte: Tags-aware copy_{user_,}highpage() implementations
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (7 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 08/29] arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 10/29] arm64: Avoid unnecessary clear_user_page() indirection Catalin Marinas
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

When the Memory Tagging Extension is enabled, the tags need to be
preserved across page copy (e.g. for copy-on-write, page migration).

Introduce MTE-aware copy_{user_,}highpage() functions to copy tags to
the destination if the source page has the PG_mte_tagged flag set.
copy_user_page() does not need to handle tag copying since, with this
patch, it is only called by the DAX code where there is no source page
structure (and no source tags).

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v5:
    - Handle tags in copy_highpage() (previously only copy_user_highpage()).
    - Ignore tags in copy_user_page() since it is only called directly by
      the DAX code where there is no source page structure.
    - Fix missing ret in mte_copy_page_tags().
    
    v4:
    - Moved the tag copying to a separate function in mte.S and only called
      if the source page has the PG_mte_tagged flag set.

 arch/arm64/include/asm/mte.h  |  4 ++++
 arch/arm64/include/asm/page.h | 14 +++++++++++---
 arch/arm64/lib/mte.S          | 19 +++++++++++++++++++
 arch/arm64/mm/copypage.c      | 25 +++++++++++++++++++++----
 4 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 1716b3d02489..b2577eee62c2 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -19,6 +19,7 @@ void mte_clear_page_tags(void *addr);
 #define PG_mte_tagged	PG_arch_2
 
 void mte_sync_tags(pte_t *ptep, pte_t pte);
+void mte_copy_page_tags(void *kto, const void *kfrom);
 void flush_mte_state(void);
 
 #else
@@ -29,6 +30,9 @@ void flush_mte_state(void);
 static inline void mte_sync_tags(pte_t *ptep, pte_t pte)
 {
 }
+static inline void mte_copy_page_tags(void *kto, const void *kfrom)
+{
+}
 static inline void flush_mte_state(void)
 {
 }
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index c01b52add377..11734ce29702 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -15,18 +15,26 @@
 #include <linux/personality.h> /* for READ_IMPLIES_EXEC */
 #include <asm/pgtable-types.h>
 
+struct page;
+struct vm_area_struct;
+
 extern void __cpu_clear_user_page(void *p, unsigned long user);
-extern void __cpu_copy_user_page(void *to, const void *from,
-				 unsigned long user);
 extern void copy_page(void *to, const void *from);
 extern void clear_page(void *to);
 
+void copy_user_highpage(struct page *to, struct page *from,
+			unsigned long vaddr, struct vm_area_struct *vma);
+#define __HAVE_ARCH_COPY_USER_HIGHPAGE
+
+void copy_highpage(struct page *to, struct page *from);
+#define __HAVE_ARCH_COPY_HIGHPAGE
+
 #define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 #define clear_user_page(addr,vaddr,pg)  __cpu_clear_user_page(addr, vaddr)
-#define copy_user_page(to,from,vaddr,pg) __cpu_copy_user_page(to, from, vaddr)
+#define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
 
 typedef struct page *pgtable_t;
 
diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
index a36705640086..3c3d0edbbca3 100644
--- a/arch/arm64/lib/mte.S
+++ b/arch/arm64/lib/mte.S
@@ -5,6 +5,7 @@
 #include <linux/linkage.h>
 
 #include <asm/assembler.h>
+#include <asm/page.h>
 #include <asm/sysreg.h>
 
 	.arch	armv8.5-a+memtag
@@ -32,3 +33,21 @@ SYM_FUNC_START(mte_clear_page_tags)
 	b.ne	1b
 	ret
 SYM_FUNC_END(mte_clear_page_tags)
+
+/*
+ * Copy the tags from the source page to the destination one
+ *   x0 - address of the destination page
+ *   x1 - address of the source page
+ */
+SYM_FUNC_START(mte_copy_page_tags)
+	mov	x2, x0
+	mov	x3, x1
+	multitag_transfer_size x5, x6
+1:	ldgm	x4, [x3]
+	stgm	x4, [x2]
+	add	x2, x2, x5
+	add	x3, x3, x5
+	tst	x2, #(PAGE_SIZE - 1)
+	b.ne	1b
+	ret
+SYM_FUNC_END(mte_copy_page_tags)
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 2ee7b73433a5..4a2233fa674e 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -6,18 +6,35 @@
  * Copyright (C) 2012 ARM Ltd.
  */
 
+#include <linux/bitops.h>
 #include <linux/mm.h>
 
 #include <asm/page.h>
 #include <asm/cacheflush.h>
+#include <asm/cpufeature.h>
+#include <asm/mte.h>
 
-void __cpu_copy_user_page(void *kto, const void *kfrom, unsigned long vaddr)
+void copy_highpage(struct page *to, struct page *from)
 {
-	struct page *page = virt_to_page(kto);
+	struct page *kto = page_address(to);
+	struct page *kfrom = page_address(from);
+
 	copy_page(kto, kfrom);
-	flush_dcache_page(page);
+
+	if (system_supports_mte() && test_bit(PG_mte_tagged, &from->flags)) {
+		set_bit(PG_mte_tagged, &to->flags);
+		mte_copy_page_tags(kto, kfrom);
+	}
+}
+EXPORT_SYMBOL(copy_highpage);
+
+void copy_user_highpage(struct page *to, struct page *from,
+			unsigned long vaddr, struct vm_area_struct *vma)
+{
+	copy_highpage(to, from);
+	flush_dcache_page(to);
 }
-EXPORT_SYMBOL_GPL(__cpu_copy_user_page);
+EXPORT_SYMBOL_GPL(copy_user_highpage);
 
 void __cpu_clear_user_page(void *kaddr, unsigned long vaddr)
 {


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 10/29] arm64: Avoid unnecessary clear_user_page() indirection
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (8 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 09/29] arm64: mte: Tags-aware copy_{user_,}highpage() implementations Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 11/29] arm64: mte: Tags-aware aware memcmp_pages() implementation Catalin Marinas
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

Since clear_user_page() calls clear_page() directly, avoid the
unnecessary indirection.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    New in v5.

 arch/arm64/include/asm/page.h | 3 +--
 arch/arm64/mm/copypage.c      | 6 ------
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 11734ce29702..d918cb1d83a6 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -18,7 +18,6 @@
 struct page;
 struct vm_area_struct;
 
-extern void __cpu_clear_user_page(void *p, unsigned long user);
 extern void copy_page(void *to, const void *from);
 extern void clear_page(void *to);
 
@@ -33,7 +32,7 @@ void copy_highpage(struct page *to, struct page *from);
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
-#define clear_user_page(addr,vaddr,pg)  __cpu_clear_user_page(addr, vaddr)
+#define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
 
 typedef struct page *pgtable_t;
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 4a2233fa674e..70a71f38b6a9 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -35,9 +35,3 @@ void copy_user_highpage(struct page *to, struct page *from,
 	flush_dcache_page(to);
 }
 EXPORT_SYMBOL_GPL(copy_user_highpage);
-
-void __cpu_clear_user_page(void *kaddr, unsigned long vaddr)
-{
-	clear_page(kaddr);
-}
-EXPORT_SYMBOL_GPL(__cpu_clear_user_page);


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 11/29] arm64: mte: Tags-aware aware memcmp_pages() implementation
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (9 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 10/29] arm64: Avoid unnecessary clear_user_page() indirection Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 12/29] arm64: mte: Handle the MAIR_EL1 changes for late CPU bring-up Catalin Marinas
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

When the Memory Tagging Extension is enabled, two pages are identical
only if both their data and tags are identical.

Make the generic memcmp_pages() a __weak function and add an
arm64-specific implementation which returns non-zero if any of the two
pages contain valid MTE tags (PG_mte_tagged set). There isn't much
benefit in comparing the tags of two pages since these are normally used
for heap allocations and likely to differ anyway.

Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v4:
    - Remove page tag comparison. This is not very useful to detect
      identical pages as long as set_pte_at() can zero the tags on a page
      without copy-on-write if mapped with PROT_MTE. This can be improved
      if a real case appears but it's unlikely for heap pages to be
      identical across multiple processes.
    - Move the memcmp_pages() function to mte.c.

 arch/arm64/kernel/mte.c | 26 ++++++++++++++++++++++++++
 mm/util.c               |  2 +-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 5bf9bbed5a25..5f54fd140610 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -5,6 +5,7 @@
 
 #include <linux/bitops.h>
 #include <linux/mm.h>
+#include <linux/string.h>
 #include <linux/thread_info.h>
 
 #include <asm/cpufeature.h>
@@ -23,6 +24,31 @@ void mte_sync_tags(pte_t *ptep, pte_t pte)
 	}
 }
 
+int memcmp_pages(struct page *page1, struct page *page2)
+{
+	char *addr1, *addr2;
+	int ret;
+
+	addr1 = page_address(page1);
+	addr2 = page_address(page2);
+	ret = memcmp(addr1, addr2, PAGE_SIZE);
+
+	if (!system_supports_mte() || ret)
+		return ret;
+
+	/*
+	 * If the page content is identical but at least one of the pages is
+	 * tagged, return non-zero to avoid KSM merging. If only one of the
+	 * pages is tagged, set_pte_at() may zero or change the tags of the
+	 * other page via mte_sync_tags().
+	 */
+	if (test_bit(PG_mte_tagged, &page1->flags) ||
+	    test_bit(PG_mte_tagged, &page2->flags))
+		return addr1 != addr2;
+
+	return ret;
+}
+
 void flush_mte_state(void)
 {
 	if (!system_supports_mte())
diff --git a/mm/util.c b/mm/util.c
index c63c8e47be57..c856f5fec69d 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -911,7 +911,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
 	return res;
 }
 
-int memcmp_pages(struct page *page1, struct page *page2)
+int __weak memcmp_pages(struct page *page1, struct page *page2)
 {
 	char *addr1, *addr2;
 	int ret;


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 12/29] arm64: mte: Handle the MAIR_EL1 changes for late CPU bring-up
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (10 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 11/29] arm64: mte: Tags-aware aware memcmp_pages() implementation Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 13/29] mm: Introduce arch_calc_vm_flag_bits() Catalin Marinas
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

CnP must be enabled only after the MAIR_EL1 register has been set up by
the cpu_enable_mte() function. Inconsistent MAIR_EL1 between CPUs
sharing the same TLB may lead to the wrong memory type being used for a
brief window during CPU power-up.

Move the ARM64_HAS_CNP capability to a higher number and add a
corresponding BUILD_BUG_ON() to check for any inadvertent future
change in the relative positions of MTE and CnP. The cpufeature.c code
ensures that the cpu_enable() function is called in the ascending order
of the capability number.

In addition, move the TLB invalidation to cpu_enable_mte() since late
CPUs brought up won't be covered by the flush_tlb_all() in
system_enable_mte().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    New in v7.

 arch/arm64/include/asm/cpucaps.h |  4 ++--
 arch/arm64/kernel/cpufeature.c   | 14 ++++++++++----
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 6bc3e21e5929..bc39fdbf0706 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -22,7 +22,7 @@
 #define ARM64_WORKAROUND_CAVIUM_27456		12
 #define ARM64_HAS_32BIT_EL0			13
 #define ARM64_HARDEN_EL2_VECTORS		14
-#define ARM64_HAS_CNP				15
+#define ARM64_MTE				15
 #define ARM64_HAS_NO_FPSIMD			16
 #define ARM64_WORKAROUND_REPEAT_TLBI		17
 #define ARM64_WORKAROUND_QCOM_FALKOR_E1003	18
@@ -62,7 +62,7 @@
 #define ARM64_HAS_GENERIC_AUTH			52
 #define ARM64_HAS_32BIT_EL1			53
 #define ARM64_BTI				54
-#define ARM64_MTE				55
+#define ARM64_HAS_CNP				55
 
 #define ARM64_NCAPS				56
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c1df72bfede4..4d3abb51f7d4 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1670,6 +1670,14 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
 	write_sysreg_s(0, SYS_TFSR_EL1);
 	write_sysreg_s(0, SYS_TFSRE0_EL1);
 
+	/*
+	 * CnP must be enabled only after the MAIR_EL1 register has been set
+	 * up. Inconsistent MAIR_EL1 between CPUs sharing the same TLB may
+	 * lead to the wrong memory type being used for a brief window during
+	 * CPU power-up.
+	 */
+	BUILD_BUG_ON(ARM64_HAS_CNP < ARM64_MTE);
+
 	/*
 	 * Update the MT_NORMAL_TAGGED index in MAIR_EL1. Tag checking is
 	 * disabled for the kernel, so there won't be any observable effect
@@ -1679,8 +1687,9 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
 	mair &= ~MAIR_ATTRIDX(MAIR_ATTR_MASK, MT_NORMAL_TAGGED);
 	mair |= MAIR_ATTRIDX(MAIR_ATTR_NORMAL_TAGGED, MT_NORMAL_TAGGED);
 	write_sysreg_s(mair, SYS_MAIR_EL1);
-
 	isb();
+
+	local_flush_tlb_all();
 }
 
 static int __init system_enable_mte(void)
@@ -1688,9 +1697,6 @@ static int __init system_enable_mte(void)
 	if (!system_supports_mte())
 		return 0;
 
-	/* Ensure the TLB does not have stale MAIR attributes */
-	flush_tlb_all();
-
 	/*
 	 * Clear the tags in the zero page. This needs to be done via the
 	 * linear map which has the Tagged attribute.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 13/29] mm: Introduce arch_calc_vm_flag_bits()
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (11 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 12/29] arm64: mte: Handle the MAIR_EL1 changes for late CPU bring-up Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 14/29] arm64: mte: Add PROT_MTE support to mmap() and mprotect() Catalin Marinas
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Kevin Brodsky

From: Kevin Brodsky <Kevin.Brodsky@arm.com>

Similarly to arch_calc_vm_prot_bits(), introduce a dummy
arch_calc_vm_flag_bits() invoked from calc_vm_flag_bits(). This macro
can be overridden by architectures to insert specific VM_* flags derived
from the mmap() MAP_* flags.

Signed-off-by: Kevin Brodsky <Kevin.Brodsky@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---

Notes:
    v6:
    - Added comment on where the arch code should define overriding arch_*
      macros (asm/mman.h).
    
    v2:
    - Updated the comment above arch_calc_vm_prot_bits().
    - Changed author since this patch had already been posted (internally).

 include/linux/mman.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 4b08e9c9c538..9ea3f95faa90 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -74,13 +74,18 @@ static inline void vm_unacct_memory(long pages)
 }
 
 /*
- * Allow architectures to handle additional protection bits
+ * Allow architectures to handle additional protection and flag bits. The
+ * overriding macros must be defined in the arch-specific asm/mman.h file.
  */
 
 #ifndef arch_calc_vm_prot_bits
 #define arch_calc_vm_prot_bits(prot, pkey) 0
 #endif
 
+#ifndef arch_calc_vm_flag_bits
+#define arch_calc_vm_flag_bits(flags) 0
+#endif
+
 #ifndef arch_vm_get_page_prot
 #define arch_vm_get_page_prot(vm_flags) __pgprot(0)
 #endif
@@ -131,7 +136,8 @@ calc_vm_flag_bits(unsigned long flags)
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
-	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      );
+	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
+	       arch_calc_vm_flag_bits(flags);
 }
 
 unsigned long vm_commit_limit(void);


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 14/29] arm64: mte: Add PROT_MTE support to mmap() and mprotect()
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (12 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 13/29] mm: Introduce arch_calc_vm_flag_bits() Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 15/29] mm: Introduce arch_validate_flags() Catalin Marinas
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

To enable tagging on a memory range, the user must explicitly opt in via
a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
memory type in the AttrIndx field of a pte, simplify the or'ing of these
bits over the protection_map[] attributes by making MT_NORMAL index 0.

There are two conditions for arch_vm_get_page_prot() to return the
MT_NORMAL_TAGGED memory type: (1) the user requested it via PROT_MTE,
registered as VM_MTE in the vm_flags, and (2) the vma supports MTE,
decided during the mmap() call (only) and registered as VM_MTE_ALLOWED.

arch_calc_vm_prot_bits() is responsible for registering the user request
as VM_MTE. The newly introduced arch_calc_vm_flag_bits() sets
VM_MTE_ALLOWED if the mapping is MAP_ANONYMOUS. An MTE-capable
filesystem (RAM-based) may be able to set VM_MTE_ALLOWED during its
mmap() file ops call.

In addition, update VM_DATA_DEFAULT_FLAGS to allow mprotect(PROT_MTE) on
stack or brk area.

The Linux mmap() syscall currently ignores unknown PROT_* flags. In the
presence of MTE, an mmap(PROT_MTE) on a file which does not support MTE
will not report an error and the memory will not be mapped as Normal
Tagged. For consistency, mprotect(PROT_MTE) will not report an error
either if the memory range does not support MTE. Two subsequent patches
in the series will propose tightening of this behaviour.

Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v2:
    - Add VM_MTE_ALLOWED to show_smap_vma_flags().

 arch/arm64/include/asm/memory.h    | 18 +++++++-----
 arch/arm64/include/asm/mman.h      | 44 ++++++++++++++++++++++++++++--
 arch/arm64/include/asm/page.h      |  2 +-
 arch/arm64/include/asm/pgtable.h   |  7 ++++-
 arch/arm64/include/uapi/asm/mman.h |  1 +
 fs/proc/task_mmu.c                 |  4 +++
 include/linux/mm.h                 |  8 ++++++
 7 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 472c77a68225..770535b7ca35 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -129,14 +129,18 @@
 
 /*
  * Memory types available.
+ *
+ * IMPORTANT: MT_NORMAL must be index 0 since vm_get_page_prot() may 'or' in
+ *	      the MT_NORMAL_TAGGED memory type for PROT_MTE mappings. Note
+ *	      that protection_map[] only contains MT_NORMAL attributes.
  */
-#define MT_DEVICE_nGnRnE	0
-#define MT_DEVICE_nGnRE		1
-#define MT_DEVICE_GRE		2
-#define MT_NORMAL_NC		3
-#define MT_NORMAL		4
-#define MT_NORMAL_WT		5
-#define MT_NORMAL_TAGGED	6
+#define MT_NORMAL		0
+#define MT_NORMAL_TAGGED	1
+#define MT_NORMAL_NC		2
+#define MT_NORMAL_WT		3
+#define MT_DEVICE_nGnRnE	4
+#define MT_DEVICE_nGnRE		5
+#define MT_DEVICE_GRE		6
 
 /*
  * Memory types for Stage-2 translation
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 081ec8de9ea6..b01051be7750 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -9,16 +9,51 @@
 static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
 	unsigned long pkey __always_unused)
 {
+	unsigned long ret = 0;
+
 	if (system_supports_bti() && (prot & PROT_BTI))
-		return VM_ARM64_BTI;
+		ret |= VM_ARM64_BTI;
 
-	return 0;
+	if (system_supports_mte() && (prot & PROT_MTE))
+		ret |= VM_MTE;
+
+	return ret;
 }
 #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
 
+static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
+{
+	/*
+	 * Only allow MTE on anonymous mappings as these are guaranteed to be
+	 * backed by tags-capable memory. The vm_flags may be overridden by a
+	 * filesystem supporting MTE (RAM-based).
+	 */
+	if (system_supports_mte() && (flags & MAP_ANONYMOUS))
+		return VM_MTE_ALLOWED;
+
+	return 0;
+}
+#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
+
 static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 {
-	return (vm_flags & VM_ARM64_BTI) ? __pgprot(PTE_GP) : __pgprot(0);
+	pteval_t prot = 0;
+
+	if (vm_flags & VM_ARM64_BTI)
+		prot |= PTE_GP;
+
+	/*
+	 * There are two conditions required for returning a Normal Tagged
+	 * memory type: (1) the user requested it via PROT_MTE passed to
+	 * mmap() or mprotect() and (2) the corresponding vma supports MTE. We
+	 * register (1) as VM_MTE in the vma->vm_flags and (2) as
+	 * VM_MTE_ALLOWED. Note that the latter can only be set during the
+	 * mmap() call since mprotect() does not accept MAP_* flags.
+	 */
+	if ((vm_flags & VM_MTE) && (vm_flags & VM_MTE_ALLOWED))
+		prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);
+
+	return __pgprot(prot);
 }
 #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
 
@@ -30,6 +65,9 @@ static inline bool arch_validate_prot(unsigned long prot,
 	if (system_supports_bti())
 		supported |= PROT_BTI;
 
+	if (system_supports_mte())
+		supported |= PROT_MTE;
+
 	return (prot & ~supported) == 0;
 }
 #define arch_validate_prot(prot, addr) arch_validate_prot(prot, addr)
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index d918cb1d83a6..012cffc574e8 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -43,7 +43,7 @@ extern int pfn_valid(unsigned long);
 
 #endif /* !__ASSEMBLY__ */
 
-#define VM_DATA_DEFAULT_FLAGS	VM_DATA_FLAGS_TSK_EXEC
+#define VM_DATA_DEFAULT_FLAGS	(VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED)
 
 #include <asm-generic/getorder.h>
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f9401a3205a8..78a545536a45 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -671,8 +671,13 @@ static inline unsigned long p4d_page_vaddr(p4d_t p4d)
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
+	/*
+	 * Normal and Normal-Tagged are two different memory types and indices
+	 * in MAIR_EL1. The mask below has to include PTE_ATTRINDX_MASK.
+	 */
 	const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
-			      PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP;
+			      PTE_PROT_NONE | PTE_VALID | PTE_WRITE | PTE_GP |
+			      PTE_ATTRINDX_MASK;
 	/* preserve the hardware dirty information */
 	if (pte_hw_dirty(pte))
 		pte = pte_mkdirty(pte);
diff --git a/arch/arm64/include/uapi/asm/mman.h b/arch/arm64/include/uapi/asm/mman.h
index 6fdd71eb644f..1e6482a838e1 100644
--- a/arch/arm64/include/uapi/asm/mman.h
+++ b/arch/arm64/include/uapi/asm/mman.h
@@ -5,5 +5,6 @@
 #include <asm-generic/mman.h>
 
 #define PROT_BTI	0x10		/* BTI guarded page */
+#define PROT_MTE	0x20		/* Normal Tagged mapping */
 
 #endif /* ! _UAPI__ASM_MMAN_H */
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index dbda4499a859..4f8ae353a8c5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -653,6 +653,10 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_MERGEABLE)]	= "mg",
 		[ilog2(VM_UFFD_MISSING)]= "um",
 		[ilog2(VM_UFFD_WP)]	= "uw",
+#ifdef CONFIG_ARM64_MTE
+		[ilog2(VM_MTE)]		= "mt",
+		[ilog2(VM_MTE_ALLOWED)]	= "",
+#endif
 #ifdef CONFIG_ARCH_HAS_PKEYS
 		/* These come out via ProtectionKey: */
 		[ilog2(VM_PKEY_BIT0)]	= "",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc7b87310c10..65cbbfaa739b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -333,6 +333,14 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1	/* T if mapped copy of data (nommu mmap) */
 #endif
 
+#if defined(CONFIG_ARM64_MTE)
+# define VM_MTE		VM_HIGH_ARCH_0	/* Use Tagged memory for access control */
+# define VM_MTE_ALLOWED	VM_HIGH_ARCH_1	/* Tagged memory permitted */
+#else
+# define VM_MTE		VM_NONE
+# define VM_MTE_ALLOWED	VM_NONE
+#endif
+
 #ifndef VM_GROWSUP
 # define VM_GROWSUP	VM_NONE
 #endif


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 15/29] mm: Introduce arch_validate_flags()
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (13 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 14/29] arm64: mte: Add PROT_MTE support to mmap() and mprotect() Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 16/29] arm64: mte: Validate the PROT_MTE request via arch_validate_flags() Catalin Marinas
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

Similarly to arch_validate_prot() called from do_mprotect_pkey(), an
architecture may need to sanity-check the new vm_flags.

Define a dummy function always returning true. In addition to
do_mprotect_pkey(), also invoke it from mmap_region() prior to updating
vma->vm_page_prot to allow the architecture code to veto potentially
inconsistent vm_flags.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
---

Notes:
    v2:
    - Some comments updated.

 include/linux/mman.h | 13 +++++++++++++
 mm/mmap.c            |  9 +++++++++
 mm/mprotect.c        |  6 ++++++
 3 files changed, 28 insertions(+)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 9ea3f95faa90..6733f2fc858b 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -104,6 +104,19 @@ static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
 #define arch_validate_prot arch_validate_prot
 #endif
 
+#ifndef arch_validate_flags
+/*
+ * This is called from mmap() and mprotect() with the updated vma->vm_flags.
+ *
+ * Returns true if the VM_* flags are valid.
+ */
+static inline bool arch_validate_flags(unsigned long flags)
+{
+	return true;
+}
+#define arch_validate_flags arch_validate_flags
+#endif
+
 /*
  * Optimisation macro.  It is equivalent to:
  *      (x & bit1) ? bit2 : 0
diff --git a/mm/mmap.c b/mm/mmap.c
index 59a4682ebf3f..19518a03fe9a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1792,6 +1792,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		vma_set_anonymous(vma);
 	}
 
+	/* Allow architectures to sanity-check the vm_flags */
+	if (!arch_validate_flags(vma->vm_flags)) {
+		error = -EINVAL;
+		if (file)
+			goto unmap_and_free_vma;
+		else
+			goto free_vma;
+	}
+
 	vma_link(mm, vma, prev, rb_link, rb_parent);
 	/* Once vma denies write, undo our temporary denial count */
 	if (file) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ce8b8a5eacbb..56c02beb6041 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -603,6 +603,12 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 			goto out;
 		}
 
+		/* Allow architectures to sanity-check the new flags */
+		if (!arch_validate_flags(newflags)) {
+			error = -EINVAL;
+			goto out;
+		}
+
 		error = security_file_mprotect(vma, reqprot, prot);
 		if (error)
 			goto out;


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 16/29] arm64: mte: Validate the PROT_MTE request via arch_validate_flags()
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (14 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 15/29] mm: Introduce arch_validate_flags() Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 17/29] mm: Allow arm64 mmap(PROT_MTE) on RAM-based files Catalin Marinas
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

Make use of the newly introduced arch_validate_flags() hook to
sanity-check the PROT_MTE request passed to mmap() and mprotect(). If
the mapping does not support MTE, these syscalls will return -EINVAL.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/mman.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index b01051be7750..e3e28f7daf62 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -49,8 +49,10 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 	 * register (1) as VM_MTE in the vma->vm_flags and (2) as
 	 * VM_MTE_ALLOWED. Note that the latter can only be set during the
 	 * mmap() call since mprotect() does not accept MAP_* flags.
+	 * Checking for VM_MTE only is sufficient since arch_validate_flags()
+	 * does not permit (VM_MTE & !VM_MTE_ALLOWED).
 	 */
-	if ((vm_flags & VM_MTE) && (vm_flags & VM_MTE_ALLOWED))
+	if (vm_flags & VM_MTE)
 		prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);
 
 	return __pgprot(prot);
@@ -72,4 +74,14 @@ static inline bool arch_validate_prot(unsigned long prot,
 }
 #define arch_validate_prot(prot, addr) arch_validate_prot(prot, addr)
 
+static inline bool arch_validate_flags(unsigned long vm_flags)
+{
+	if (!system_supports_mte())
+		return true;
+
+	/* only allow VM_MTE if VM_MTE_ALLOWED has been set previously */
+	return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED);
+}
+#define arch_validate_flags(vm_flags) arch_validate_flags(vm_flags)
+
 #endif /* ! __ASM_MMAN_H__ */


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 17/29] mm: Allow arm64 mmap(PROT_MTE) on RAM-based files
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (15 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 16/29] arm64: mte: Validate the PROT_MTE request via arch_validate_flags() Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl() Catalin Marinas
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

Since arm64 memory (allocation) tags can only be stored in RAM, mapping
files with PROT_MTE is not allowed by default. RAM-based files like
those in a tmpfs mount or memfd_create() can support memory tagging, so
update the vm_flags accordingly in shmem_mmap().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/shmem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index a0dbe62f8042..dacee627dae6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2206,6 +2206,9 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 			vma->vm_flags &= ~(VM_MAYWRITE);
 	}
 
+	/* arm64 - allow memory tagging on RAM-based files */
+	vma->vm_flags |= VM_MTE_ALLOWED;
+
 	file_accessed(file);
 	vma->vm_ops = &shmem_vm_ops;
 	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl()
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (16 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 17/29] mm: Allow arm64 mmap(PROT_MTE) on RAM-based files Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-20 15:30   ` Kevin Brodsky
  2020-08-04 19:34   ` Kevin Brodsky
  2020-07-15 17:08 ` [PATCH v7 19/29] arm64: mte: Allow user control of the generated random tags " Catalin Marinas
                   ` (10 subsequent siblings)
  28 siblings, 2 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

By default, even if PROT_MTE is set on a memory range, there is no tag
check fault reporting (SIGSEGV). Introduce a set of option to the
exiting prctl(PR_SET_TAGGED_ADDR_CTRL) to allow user control of the tag
check fault mode:

  PR_MTE_TCF_NONE  - no reporting (default)
  PR_MTE_TCF_SYNC  - synchronous tag check fault reporting
  PR_MTE_TCF_ASYNC - asynchronous tag check fault reporting

These options translate into the corresponding SCTLR_EL1.TCF0 bitfield,
context-switched by the kernel. Note that uaccess done by the kernel is
not checked and cannot be configured by the user.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v3:
    - Use SCTLR_EL1_TCF0_NONE instead of 0 for consistency.
    - Move mte_thread_switch() in this patch from an earlier one. In
      addition, it is called after the dsb() in __switch_to() so that any
      asynchronous tag check faults have been registered in the TFSR_EL1
      registers (to be added with the in-kernel MTE support.
    
    v2:
    - Handle SCTLR_EL1_TCF0_NONE explicitly for consistency with PR_MTE_TCF_NONE.
    - Fix SCTLR_EL1 register setting in flush_mte_state() (thanks to Peter
      Collingbourne).
    - Added ISB to update_sctlr_el1_tcf0() since, with the latest
      architecture update/fix, the TCF0 field is used by the uaccess
      routines.

 arch/arm64/include/asm/mte.h       | 14 ++++++
 arch/arm64/include/asm/processor.h |  3 ++
 arch/arm64/kernel/mte.c            | 77 ++++++++++++++++++++++++++++++
 arch/arm64/kernel/process.c        | 26 ++++++++--
 include/uapi/linux/prctl.h         |  6 +++
 5 files changed, 123 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index b2577eee62c2..df2efbc9f8f1 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -21,6 +21,9 @@ void mte_clear_page_tags(void *addr);
 void mte_sync_tags(pte_t *ptep, pte_t pte);
 void mte_copy_page_tags(void *kto, const void *kfrom);
 void flush_mte_state(void);
+void mte_thread_switch(struct task_struct *next);
+long set_mte_ctrl(unsigned long arg);
+long get_mte_ctrl(void);
 
 #else
 
@@ -36,6 +39,17 @@ static inline void mte_copy_page_tags(void *kto, const void *kfrom)
 static inline void flush_mte_state(void)
 {
 }
+static inline void mte_thread_switch(struct task_struct *next)
+{
+}
+static inline long set_mte_ctrl(unsigned long arg)
+{
+	return 0;
+}
+static inline long get_mte_ctrl(void)
+{
+	return 0;
+}
 
 #endif
 
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 240fe5e5b720..80e7f0573309 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -151,6 +151,9 @@ struct thread_struct {
 	struct ptrauth_keys_user	keys_user;
 	struct ptrauth_keys_kernel	keys_kernel;
 #endif
+#ifdef CONFIG_ARM64_MTE
+	u64			sctlr_tcf0;
+#endif
 };
 
 static inline void arch_thread_struct_whitelist(unsigned long *offset,
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 5f54fd140610..375483a1f573 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -5,6 +5,8 @@
 
 #include <linux/bitops.h>
 #include <linux/mm.h>
+#include <linux/prctl.h>
+#include <linux/sched.h>
 #include <linux/string.h>
 #include <linux/thread_info.h>
 
@@ -49,6 +51,26 @@ int memcmp_pages(struct page *page1, struct page *page2)
 	return ret;
 }
 
+static void update_sctlr_el1_tcf0(u64 tcf0)
+{
+	/* ISB required for the kernel uaccess routines */
+	sysreg_clear_set(sctlr_el1, SCTLR_EL1_TCF0_MASK, tcf0);
+	isb();
+}
+
+static void set_sctlr_el1_tcf0(u64 tcf0)
+{
+	/*
+	 * mte_thread_switch() checks current->thread.sctlr_tcf0 as an
+	 * optimisation. Disable preemption so that it does not see
+	 * the variable update before the SCTLR_EL1.TCF0 one.
+	 */
+	preempt_disable();
+	current->thread.sctlr_tcf0 = tcf0;
+	update_sctlr_el1_tcf0(tcf0);
+	preempt_enable();
+}
+
 void flush_mte_state(void)
 {
 	if (!system_supports_mte())
@@ -58,4 +80,59 @@ void flush_mte_state(void)
 	dsb(ish);
 	write_sysreg_s(0, SYS_TFSRE0_EL1);
 	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
+	/* disable tag checking */
+	set_sctlr_el1_tcf0(SCTLR_EL1_TCF0_NONE);
+}
+
+void mte_thread_switch(struct task_struct *next)
+{
+	if (!system_supports_mte())
+		return;
+
+	/* avoid expensive SCTLR_EL1 accesses if no change */
+	if (current->thread.sctlr_tcf0 != next->thread.sctlr_tcf0)
+		update_sctlr_el1_tcf0(next->thread.sctlr_tcf0);
+}
+
+long set_mte_ctrl(unsigned long arg)
+{
+	u64 tcf0;
+
+	if (!system_supports_mte())
+		return 0;
+
+	switch (arg & PR_MTE_TCF_MASK) {
+	case PR_MTE_TCF_NONE:
+		tcf0 = SCTLR_EL1_TCF0_NONE;
+		break;
+	case PR_MTE_TCF_SYNC:
+		tcf0 = SCTLR_EL1_TCF0_SYNC;
+		break;
+	case PR_MTE_TCF_ASYNC:
+		tcf0 = SCTLR_EL1_TCF0_ASYNC;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	set_sctlr_el1_tcf0(tcf0);
+
+	return 0;
+}
+
+long get_mte_ctrl(void)
+{
+	if (!system_supports_mte())
+		return 0;
+
+	switch (current->thread.sctlr_tcf0) {
+	case SCTLR_EL1_TCF0_NONE:
+		return PR_MTE_TCF_NONE;
+	case SCTLR_EL1_TCF0_SYNC:
+		return PR_MTE_TCF_SYNC;
+	case SCTLR_EL1_TCF0_ASYNC:
+		return PR_MTE_TCF_ASYNC;
+	}
+
+	return 0;
 }
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 695705d1f8e5..d19ce8053a03 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -544,6 +544,13 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	 */
 	dsb(ish);
 
+	/*
+	 * MTE thread switching must happen after the DSB above to ensure that
+	 * any asynchronous tag check faults have been logged in the TFSR*_EL1
+	 * registers.
+	 */
+	mte_thread_switch(next);
+
 	/* the actual thread switch */
 	last = cpu_switch_to(prev, next);
 
@@ -603,9 +610,15 @@ static unsigned int tagged_addr_disabled;
 
 long set_tagged_addr_ctrl(unsigned long arg)
 {
+	unsigned long valid_mask = PR_TAGGED_ADDR_ENABLE;
+
 	if (is_compat_task())
 		return -EINVAL;
-	if (arg & ~PR_TAGGED_ADDR_ENABLE)
+
+	if (system_supports_mte())
+		valid_mask |= PR_MTE_TCF_MASK;
+
+	if (arg & ~valid_mask)
 		return -EINVAL;
 
 	/*
@@ -615,6 +628,9 @@ long set_tagged_addr_ctrl(unsigned long arg)
 	if (arg & PR_TAGGED_ADDR_ENABLE && tagged_addr_disabled)
 		return -EINVAL;
 
+	if (set_mte_ctrl(arg) != 0)
+		return -EINVAL;
+
 	update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);
 
 	return 0;
@@ -622,13 +638,17 @@ long set_tagged_addr_ctrl(unsigned long arg)
 
 long get_tagged_addr_ctrl(void)
 {
+	long ret = 0;
+
 	if (is_compat_task())
 		return -EINVAL;
 
 	if (test_thread_flag(TIF_TAGGED_ADDR))
-		return PR_TAGGED_ADDR_ENABLE;
+		ret = PR_TAGGED_ADDR_ENABLE;
 
-	return 0;
+	ret |= get_mte_ctrl();
+
+	return ret;
 }
 
 /*
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..2390ab324afa 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -233,6 +233,12 @@ struct prctl_mm_map {
 #define PR_SET_TAGGED_ADDR_CTRL		55
 #define PR_GET_TAGGED_ADDR_CTRL		56
 # define PR_TAGGED_ADDR_ENABLE		(1UL << 0)
+/* MTE tag check fault modes */
+# define PR_MTE_TCF_SHIFT		1
+# define PR_MTE_TCF_NONE		(0UL << PR_MTE_TCF_SHIFT)
+# define PR_MTE_TCF_SYNC		(1UL << PR_MTE_TCF_SHIFT)
+# define PR_MTE_TCF_ASYNC		(2UL << PR_MTE_TCF_SHIFT)
+# define PR_MTE_TCF_MASK		(3UL << PR_MTE_TCF_SHIFT)
 
 /* Control reclaim behavior when allocating memory */
 #define PR_SET_IO_FLUSHER		57


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 19/29] arm64: mte: Allow user control of the generated random tags via prctl()
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (17 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl() Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 20/29] arm64: mte: Restore the GCR_EL1 register after a suspend Catalin Marinas
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

The IRG, ADDG and SUBG instructions insert a random tag in the resulting
address. Certain tags can be excluded via the GCR_EL1.Exclude bitmap
when, for example, the user wants a certain colour for freed buffers.
Since the GCR_EL1 register is not accessible at EL0, extend the
prctl(PR_SET_TAGGED_ADDR_CTRL) interface to include a 16-bit field in
the first argument for controlling which tags can be generated by the
above instruction (an include rather than exclude mask). Note that by
default all non-zero tags are excluded. This setting is per-thread.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v5:
    - Rename gcr_incl to gcr_user_incl (there will be a subsequent
      gcr_kernel when support for in-kernel MTE is added).
    
    v2:
    - Switch from an exclude mask to an include one for the prctl()
      interface.
    - Reset the allowed tags mask during flush_thread().

 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/include/asm/sysreg.h    |  7 ++++++
 arch/arm64/kernel/mte.c            | 35 +++++++++++++++++++++++++++---
 arch/arm64/kernel/process.c        |  2 +-
 include/uapi/linux/prctl.h         |  3 +++
 5 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 80e7f0573309..e1b1c2a6086e 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -153,6 +153,7 @@ struct thread_struct {
 #endif
 #ifdef CONFIG_ARM64_MTE
 	u64			sctlr_tcf0;
+	u64			gcr_user_incl;
 #endif
 };
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 2e12d8049d1c..d6357c4ea015 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -1033,6 +1033,13 @@
 		write_sysreg(__scs_new, sysreg);			\
 } while (0)
 
+#define sysreg_clear_set_s(sysreg, clear, set) do {			\
+	u64 __scs_val = read_sysreg_s(sysreg);				\
+	u64 __scs_new = (__scs_val & ~(u64)(clear)) | (set);		\
+	if (__scs_new != __scs_val)					\
+		write_sysreg_s(__scs_new, sysreg);			\
+} while (0)
+
 #endif
 
 #endif	/* __ASM_SYSREG_H */
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 375483a1f573..07798b8d5039 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -71,6 +71,25 @@ static void set_sctlr_el1_tcf0(u64 tcf0)
 	preempt_enable();
 }
 
+static void update_gcr_el1_excl(u64 incl)
+{
+	u64 excl = ~incl & SYS_GCR_EL1_EXCL_MASK;
+
+	/*
+	 * Note that 'incl' is an include mask (controlled by the user via
+	 * prctl()) while GCR_EL1 accepts an exclude mask.
+	 * No need for ISB since this only affects EL0 currently, implicit
+	 * with ERET.
+	 */
+	sysreg_clear_set_s(SYS_GCR_EL1, SYS_GCR_EL1_EXCL_MASK, excl);
+}
+
+static void set_gcr_el1_excl(u64 incl)
+{
+	current->thread.gcr_user_incl = incl;
+	update_gcr_el1_excl(incl);
+}
+
 void flush_mte_state(void)
 {
 	if (!system_supports_mte())
@@ -82,6 +101,8 @@ void flush_mte_state(void)
 	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
 	/* disable tag checking */
 	set_sctlr_el1_tcf0(SCTLR_EL1_TCF0_NONE);
+	/* reset tag generation mask */
+	set_gcr_el1_excl(0);
 }
 
 void mte_thread_switch(struct task_struct *next)
@@ -92,6 +113,7 @@ void mte_thread_switch(struct task_struct *next)
 	/* avoid expensive SCTLR_EL1 accesses if no change */
 	if (current->thread.sctlr_tcf0 != next->thread.sctlr_tcf0)
 		update_sctlr_el1_tcf0(next->thread.sctlr_tcf0);
+	update_gcr_el1_excl(next->thread.gcr_user_incl);
 }
 
 long set_mte_ctrl(unsigned long arg)
@@ -116,23 +138,30 @@ long set_mte_ctrl(unsigned long arg)
 	}
 
 	set_sctlr_el1_tcf0(tcf0);
+	set_gcr_el1_excl((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT);
 
 	return 0;
 }
 
 long get_mte_ctrl(void)
 {
+	unsigned long ret;
+
 	if (!system_supports_mte())
 		return 0;
 
+	ret = current->thread.gcr_user_incl << PR_MTE_TAG_SHIFT;
+
 	switch (current->thread.sctlr_tcf0) {
 	case SCTLR_EL1_TCF0_NONE:
 		return PR_MTE_TCF_NONE;
 	case SCTLR_EL1_TCF0_SYNC:
-		return PR_MTE_TCF_SYNC;
+		ret |= PR_MTE_TCF_SYNC;
+		break;
 	case SCTLR_EL1_TCF0_ASYNC:
-		return PR_MTE_TCF_ASYNC;
+		ret |= PR_MTE_TCF_ASYNC;
+		break;
 	}
 
-	return 0;
+	return ret;
 }
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index d19ce8053a03..b5c1c975d38e 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -616,7 +616,7 @@ long set_tagged_addr_ctrl(unsigned long arg)
 		return -EINVAL;
 
 	if (system_supports_mte())
-		valid_mask |= PR_MTE_TCF_MASK;
+		valid_mask |= PR_MTE_TCF_MASK | PR_MTE_TAG_MASK;
 
 	if (arg & ~valid_mask)
 		return -EINVAL;
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 2390ab324afa..7f0827705c9a 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -239,6 +239,9 @@ struct prctl_mm_map {
 # define PR_MTE_TCF_SYNC		(1UL << PR_MTE_TCF_SHIFT)
 # define PR_MTE_TCF_ASYNC		(2UL << PR_MTE_TCF_SHIFT)
 # define PR_MTE_TCF_MASK		(3UL << PR_MTE_TCF_SHIFT)
+/* MTE tag inclusion mask */
+# define PR_MTE_TAG_SHIFT		3
+# define PR_MTE_TAG_MASK		(0xffffUL << PR_MTE_TAG_SHIFT)
 
 /* Control reclaim behavior when allocating memory */
 #define PR_SET_IO_FLUSHER		57


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 20/29] arm64: mte: Restore the GCR_EL1 register after a suspend
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (18 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 19/29] arm64: mte: Allow user control of the generated random tags " Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 21/29] arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks Catalin Marinas
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

The CPU resume/suspend routines only take care of the common system
registers. Restore GCR_EL1 in addition via the __cpu_suspend_exit()
function.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---

Notes:
    New in v3.

 arch/arm64/include/asm/mte.h | 4 ++++
 arch/arm64/kernel/mte.c      | 8 ++++++++
 arch/arm64/kernel/suspend.c  | 4 ++++
 3 files changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index df2efbc9f8f1..c93047eff9fe 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -22,6 +22,7 @@ void mte_sync_tags(pte_t *ptep, pte_t pte);
 void mte_copy_page_tags(void *kto, const void *kfrom);
 void flush_mte_state(void);
 void mte_thread_switch(struct task_struct *next);
+void mte_suspend_exit(void);
 long set_mte_ctrl(unsigned long arg);
 long get_mte_ctrl(void);
 
@@ -42,6 +43,9 @@ static inline void flush_mte_state(void)
 static inline void mte_thread_switch(struct task_struct *next)
 {
 }
+static inline void mte_suspend_exit(void)
+{
+}
 static inline long set_mte_ctrl(unsigned long arg)
 {
 	return 0;
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 07798b8d5039..09cf76fc1090 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -116,6 +116,14 @@ void mte_thread_switch(struct task_struct *next)
 	update_gcr_el1_excl(next->thread.gcr_user_incl);
 }
 
+void mte_suspend_exit(void)
+{
+	if (!system_supports_mte())
+		return;
+
+	update_gcr_el1_excl(current->thread.gcr_user_incl);
+}
+
 long set_mte_ctrl(unsigned long arg)
 {
 	u64 tcf0;
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index c1dee9066ff9..62c239cd60c2 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -10,6 +10,7 @@
 #include <asm/daifflags.h>
 #include <asm/debug-monitors.h>
 #include <asm/exec.h>
+#include <asm/mte.h>
 #include <asm/memory.h>
 #include <asm/mmu_context.h>
 #include <asm/smp_plat.h>
@@ -74,6 +75,9 @@ void notrace __cpu_suspend_exit(void)
 	 */
 	if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE)
 		arm64_set_ssbd_mitigation(false);
+
+	/* Restore additional MTE-specific configuration */
+	mte_suspend_exit();
 }
 
 /*


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 21/29] arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (19 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 20/29] arm64: mte: Restore the GCR_EL1 register after a suspend Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 22/29] arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

In preparation for ptrace() access to the prctl() value, allow calling
these functions on non-current tasks.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    New in v7.

 arch/arm64/include/asm/mte.h       |  8 ++++----
 arch/arm64/include/asm/processor.h |  8 ++++----
 arch/arm64/kernel/mte.c            | 18 ++++++++++++------
 arch/arm64/kernel/process.c        | 18 ++++++++++--------
 4 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index c93047eff9fe..1a919905295b 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -23,8 +23,8 @@ void mte_copy_page_tags(void *kto, const void *kfrom);
 void flush_mte_state(void);
 void mte_thread_switch(struct task_struct *next);
 void mte_suspend_exit(void);
-long set_mte_ctrl(unsigned long arg);
-long get_mte_ctrl(void);
+long set_mte_ctrl(struct task_struct *task, unsigned long arg);
+long get_mte_ctrl(struct task_struct *task);
 
 #else
 
@@ -46,11 +46,11 @@ static inline void mte_thread_switch(struct task_struct *next)
 static inline void mte_suspend_exit(void)
 {
 }
-static inline long set_mte_ctrl(unsigned long arg)
+static inline long set_mte_ctrl(struct task_struct *task, unsigned long arg)
 {
 	return 0;
 }
-static inline long get_mte_ctrl(void)
+static inline long get_mte_ctrl(struct task_struct *task)
 {
 	return 0;
 }
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index e1b1c2a6086e..fec204d28fce 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -319,10 +319,10 @@ extern void __init minsigstksz_setup(void);
 
 #ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
 /* PR_{SET,GET}_TAGGED_ADDR_CTRL prctl */
-long set_tagged_addr_ctrl(unsigned long arg);
-long get_tagged_addr_ctrl(void);
-#define SET_TAGGED_ADDR_CTRL(arg)	set_tagged_addr_ctrl(arg)
-#define GET_TAGGED_ADDR_CTRL()		get_tagged_addr_ctrl()
+long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg);
+long get_tagged_addr_ctrl(struct task_struct *task);
+#define SET_TAGGED_ADDR_CTRL(arg)	set_tagged_addr_ctrl(current, arg)
+#define GET_TAGGED_ADDR_CTRL()		get_tagged_addr_ctrl(current)
 #endif
 
 /*
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 09cf76fc1090..e80c49af74af 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -124,9 +124,10 @@ void mte_suspend_exit(void)
 	update_gcr_el1_excl(current->thread.gcr_user_incl);
 }
 
-long set_mte_ctrl(unsigned long arg)
+long set_mte_ctrl(struct task_struct *task, unsigned long arg)
 {
 	u64 tcf0;
+	u64 gcr_incl = (arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT;
 
 	if (!system_supports_mte())
 		return 0;
@@ -145,22 +146,27 @@ long set_mte_ctrl(unsigned long arg)
 		return -EINVAL;
 	}
 
-	set_sctlr_el1_tcf0(tcf0);
-	set_gcr_el1_excl((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT);
+	if (task != current) {
+		task->thread.sctlr_tcf0 = tcf0;
+		task->thread.gcr_user_incl = gcr_incl;
+	} else {
+		set_sctlr_el1_tcf0(tcf0);
+		set_gcr_el1_excl(gcr_incl);
+	}
 
 	return 0;
 }
 
-long get_mte_ctrl(void)
+long get_mte_ctrl(struct task_struct *task)
 {
 	unsigned long ret;
 
 	if (!system_supports_mte())
 		return 0;
 
-	ret = current->thread.gcr_user_incl << PR_MTE_TAG_SHIFT;
+	ret = task->thread.gcr_user_incl << PR_MTE_TAG_SHIFT;
 
-	switch (current->thread.sctlr_tcf0) {
+	switch (task->thread.sctlr_tcf0) {
 	case SCTLR_EL1_TCF0_NONE:
 		return PR_MTE_TCF_NONE;
 	case SCTLR_EL1_TCF0_SYNC:
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index b5c1c975d38e..35090dbd7363 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -608,11 +608,12 @@ void arch_setup_new_exec(void)
  */
 static unsigned int tagged_addr_disabled;
 
-long set_tagged_addr_ctrl(unsigned long arg)
+long set_tagged_addr_ctrl(struct task_struct *task, unsigned long arg)
 {
 	unsigned long valid_mask = PR_TAGGED_ADDR_ENABLE;
+	struct thread_info *ti = task_thread_info(task);
 
-	if (is_compat_task())
+	if (is_compat_thread(ti))
 		return -EINVAL;
 
 	if (system_supports_mte())
@@ -628,25 +629,26 @@ long set_tagged_addr_ctrl(unsigned long arg)
 	if (arg & PR_TAGGED_ADDR_ENABLE && tagged_addr_disabled)
 		return -EINVAL;
 
-	if (set_mte_ctrl(arg) != 0)
+	if (set_mte_ctrl(task, arg) != 0)
 		return -EINVAL;
 
-	update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);
+	update_ti_thread_flag(ti, TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);
 
 	return 0;
 }
 
-long get_tagged_addr_ctrl(void)
+long get_tagged_addr_ctrl(struct task_struct *task)
 {
 	long ret = 0;
+	struct thread_info *ti = task_thread_info(task);
 
-	if (is_compat_task())
+	if (is_compat_thread(ti))
 		return -EINVAL;
 
-	if (test_thread_flag(TIF_TAGGED_ADDR))
+	if (test_ti_thread_flag(ti, TIF_TAGGED_ADDR))
 		ret = PR_TAGGED_ADDR_ENABLE;
 
-	ret |= get_mte_ctrl();
+	ret |= get_mte_ctrl(task);
 
 	return ret;
 }


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 22/29] arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (20 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 21/29] arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-08-13 14:01   ` Luis Machado
  2020-07-15 17:08 ` [PATCH v7 23/29] arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset Catalin Marinas
                   ` (6 subsequent siblings)
  28 siblings, 1 reply; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Alan Hayward, Luis Machado, Omair Javaid

Add support for bulk setting/getting of the MTE tags in a tracee's
address space at 'addr' in the ptrace() syscall prototype. 'data' points
to a struct iovec in the tracer's address space with iov_base
representing the address of a tracer's buffer of length iov_len. The
tags to be copied to/from the tracer's buffer are stored as one tag per
byte.

On successfully copying at least one tag, ptrace() returns 0 and updates
the tracer's iov_len with the number of tags copied. In case of error,
either -EIO or -EFAULT is returned, trying to follow the ptrace() man
page.

Note that the tag copying functions are not performance critical,
therefore they lack optimisations found in typical memory copy routines.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Alan Hayward <Alan.Hayward@arm.com>
Cc: Luis Machado <luis.machado@linaro.org>
Cc: Omair Javaid <omair.javaid@linaro.org>
---

Notes:
    v4:
    - Following the change to only clear the tags in a page if it is mapped
      to user with PROT_MTE, ptrace() now will refuse to access tags in
      pages not previously mapped with PROT_MTE (PG_mte_tagged set). This is
      primarily to avoid leaking uninitialised tags to user via ptrace().
    - Fix SYM_FUNC_END argument typo.
    - Rename MTE_ALLOC_* to MTE_GRANULE_*.
    - Use uao_user_alternative for the user access in case we ever want to
      call mte_copy_tags_* with a kernel buffer. It also matches the other
      uaccess routines in the kernel.
    - Simplify arch_ptrace() slightly.
    - Reorder down_write_killable() with access_ok() in
      __access_remote_tags().
    - Handle copy length 0 in mte_copy_tags_{to,from}_user().
    - Use put_user() instead of __put_user().
    
    New in v3.

 arch/arm64/include/asm/mte.h         |  17 ++++
 arch/arm64/include/uapi/asm/ptrace.h |   3 +
 arch/arm64/kernel/mte.c              | 139 +++++++++++++++++++++++++++
 arch/arm64/kernel/ptrace.c           |   7 ++
 arch/arm64/lib/mte.S                 |  53 ++++++++++
 5 files changed, 219 insertions(+)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 1a919905295b..7ea0c0e526d1 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -5,6 +5,11 @@
 #ifndef __ASM_MTE_H
 #define __ASM_MTE_H
 
+#define MTE_GRANULE_SIZE	UL(16)
+#define MTE_GRANULE_MASK	(~(MTE_GRANULE_SIZE - 1))
+#define MTE_TAG_SHIFT		56
+#define MTE_TAG_SIZE		4
+
 #ifndef __ASSEMBLY__
 
 #include <linux/page-flags.h>
@@ -12,6 +17,10 @@
 #include <asm/pgtable-types.h>
 
 void mte_clear_page_tags(void *addr);
+unsigned long mte_copy_tags_from_user(void *to, const void __user *from,
+				      unsigned long n);
+unsigned long mte_copy_tags_to_user(void __user *to, void *from,
+				    unsigned long n);
 
 #ifdef CONFIG_ARM64_MTE
 
@@ -25,6 +34,8 @@ void mte_thread_switch(struct task_struct *next);
 void mte_suspend_exit(void);
 long set_mte_ctrl(struct task_struct *task, unsigned long arg);
 long get_mte_ctrl(struct task_struct *task);
+int mte_ptrace_copy_tags(struct task_struct *child, long request,
+			 unsigned long addr, unsigned long data);
 
 #else
 
@@ -54,6 +65,12 @@ static inline long get_mte_ctrl(struct task_struct *task)
 {
 	return 0;
 }
+static inline int mte_ptrace_copy_tags(struct task_struct *child,
+				       long request, unsigned long addr,
+				       unsigned long data)
+{
+	return -EIO;
+}
 
 #endif
 
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 06413d9f2341..758ae984ff97 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -76,6 +76,9 @@
 /* syscall emulation path in ptrace */
 #define PTRACE_SYSEMU		  31
 #define PTRACE_SYSEMU_SINGLESTEP  32
+/* MTE allocation tag access */
+#define PTRACE_PEEKMTETAGS	  33
+#define PTRACE_POKEMTETAGS	  34
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index e80c49af74af..0a8b90afe9d7 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -4,14 +4,18 @@
  */
 
 #include <linux/bitops.h>
+#include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/prctl.h>
 #include <linux/sched.h>
+#include <linux/sched/mm.h>
 #include <linux/string.h>
 #include <linux/thread_info.h>
+#include <linux/uio.h>
 
 #include <asm/cpufeature.h>
 #include <asm/mte.h>
+#include <asm/ptrace.h>
 #include <asm/sysreg.h>
 
 void mte_sync_tags(pte_t *ptep, pte_t pte)
@@ -179,3 +183,138 @@ long get_mte_ctrl(struct task_struct *task)
 
 	return ret;
 }
+
+/*
+ * Access MTE tags in another process' address space as given in mm. Update
+ * the number of tags copied. Return 0 if any tags copied, error otherwise.
+ * Inspired by __access_remote_vm().
+ */
+static int __access_remote_tags(struct task_struct *tsk, struct mm_struct *mm,
+				unsigned long addr, struct iovec *kiov,
+				unsigned int gup_flags)
+{
+	struct vm_area_struct *vma;
+	void __user *buf = kiov->iov_base;
+	size_t len = kiov->iov_len;
+	int ret;
+	int write = gup_flags & FOLL_WRITE;
+
+	if (!access_ok(buf, len))
+		return -EFAULT;
+
+	if (mmap_read_lock_killable(mm))
+		return -EIO;
+
+	while (len) {
+		unsigned long tags, offset;
+		void *maddr;
+		struct page *page = NULL;
+
+		ret = get_user_pages_remote(tsk, mm, addr, 1, gup_flags,
+					    &page, &vma, NULL);
+		if (ret <= 0)
+			break;
+
+		/*
+		 * Only copy tags if the page has been mapped as PROT_MTE
+		 * (PG_mte_tagged set). Otherwise the tags are not valid and
+		 * not accessible to user. Moreover, an mprotect(PROT_MTE)
+		 * would cause the existing tags to be cleared if the page
+		 * was never mapped with PROT_MTE.
+		 */
+		if (!test_bit(PG_mte_tagged, &page->flags)) {
+			ret = -EOPNOTSUPP;
+			put_page(page);
+			break;
+		}
+
+		/* limit access to the end of the page */
+		offset = offset_in_page(addr);
+		tags = min(len, (PAGE_SIZE - offset) / MTE_GRANULE_SIZE);
+
+		maddr = page_address(page);
+		if (write) {
+			tags = mte_copy_tags_from_user(maddr + offset, buf, tags);
+			set_page_dirty_lock(page);
+		} else {
+			tags = mte_copy_tags_to_user(buf, maddr + offset, tags);
+		}
+		put_page(page);
+
+		/* error accessing the tracer's buffer */
+		if (!tags)
+			break;
+
+		len -= tags;
+		buf += tags;
+		addr += tags * MTE_GRANULE_SIZE;
+	}
+	mmap_read_unlock(mm);
+
+	/* return an error if no tags copied */
+	kiov->iov_len = buf - kiov->iov_base;
+	if (!kiov->iov_len) {
+		/* check for error accessing the tracee's address space */
+		if (ret <= 0)
+			return -EIO;
+		else
+			return -EFAULT;
+	}
+
+	return 0;
+}
+
+/*
+ * Copy MTE tags in another process' address space at 'addr' to/from tracer's
+ * iovec buffer. Return 0 on success. Inspired by ptrace_access_vm().
+ */
+static int access_remote_tags(struct task_struct *tsk, unsigned long addr,
+			      struct iovec *kiov, unsigned int gup_flags)
+{
+	struct mm_struct *mm;
+	int ret;
+
+	mm = get_task_mm(tsk);
+	if (!mm)
+		return -EPERM;
+
+	if (!tsk->ptrace || (current != tsk->parent) ||
+	    ((get_dumpable(mm) != SUID_DUMP_USER) &&
+	     !ptracer_capable(tsk, mm->user_ns))) {
+		mmput(mm);
+		return -EPERM;
+	}
+
+	ret = __access_remote_tags(tsk, mm, addr, kiov, gup_flags);
+	mmput(mm);
+
+	return ret;
+}
+
+int mte_ptrace_copy_tags(struct task_struct *child, long request,
+			 unsigned long addr, unsigned long data)
+{
+	int ret;
+	struct iovec kiov;
+	struct iovec __user *uiov = (void __user *)data;
+	unsigned int gup_flags = FOLL_FORCE;
+
+	if (!system_supports_mte())
+		return -EIO;
+
+	if (get_user(kiov.iov_base, &uiov->iov_base) ||
+	    get_user(kiov.iov_len, &uiov->iov_len))
+		return -EFAULT;
+
+	if (request == PTRACE_POKEMTETAGS)
+		gup_flags |= FOLL_WRITE;
+
+	/* align addr to the MTE tag granule */
+	addr &= MTE_GRANULE_MASK;
+
+	ret = access_remote_tags(child, addr, &kiov, gup_flags);
+	if (!ret)
+		ret = put_user(kiov.iov_len, &uiov->iov_len);
+
+	return ret;
+}
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 4582014dda25..653a03598c75 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -34,6 +34,7 @@
 #include <asm/cpufeature.h>
 #include <asm/debug-monitors.h>
 #include <asm/fpsimd.h>
+#include <asm/mte.h>
 #include <asm/pointer_auth.h>
 #include <asm/stacktrace.h>
 #include <asm/syscall.h>
@@ -1796,6 +1797,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
 long arch_ptrace(struct task_struct *child, long request,
 		 unsigned long addr, unsigned long data)
 {
+	switch (request) {
+	case PTRACE_PEEKMTETAGS:
+	case PTRACE_POKEMTETAGS:
+		return mte_ptrace_copy_tags(child, request, addr, data);
+	}
+
 	return ptrace_request(child, request, addr, data);
 }
 
diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
index 3c3d0edbbca3..434f81d9a180 100644
--- a/arch/arm64/lib/mte.S
+++ b/arch/arm64/lib/mte.S
@@ -4,7 +4,9 @@
  */
 #include <linux/linkage.h>
 
+#include <asm/alternative.h>
 #include <asm/assembler.h>
+#include <asm/mte.h>
 #include <asm/page.h>
 #include <asm/sysreg.h>
 
@@ -51,3 +53,54 @@ SYM_FUNC_START(mte_copy_page_tags)
 	b.ne	1b
 	ret
 SYM_FUNC_END(mte_copy_page_tags)
+
+/*
+ * Read tags from a user buffer (one tag per byte) and set the corresponding
+ * tags at the given kernel address. Used by PTRACE_POKEMTETAGS.
+ *   x0 - kernel address (to)
+ *   x1 - user buffer (from)
+ *   x2 - number of tags/bytes (n)
+ * Returns:
+ *   x0 - number of tags read/set
+ */
+SYM_FUNC_START(mte_copy_tags_from_user)
+	mov	x3, x1
+	cbz	x2, 2f
+1:
+	uao_user_alternative 2f, ldrb, ldtrb, w4, x1, 0
+	lsl	x4, x4, #MTE_TAG_SHIFT
+	stg	x4, [x0], #MTE_GRANULE_SIZE
+	add	x1, x1, #1
+	subs	x2, x2, #1
+	b.ne	1b
+
+	// exception handling and function return
+2:	sub	x0, x1, x3		// update the number of tags set
+	ret
+SYM_FUNC_END(mte_copy_tags_from_user)
+
+/*
+ * Get the tags from a kernel address range and write the tag values to the
+ * given user buffer (one tag per byte). Used by PTRACE_PEEKMTETAGS.
+ *   x0 - user buffer (to)
+ *   x1 - kernel address (from)
+ *   x2 - number of tags/bytes (n)
+ * Returns:
+ *   x0 - number of tags read/set
+ */
+SYM_FUNC_START(mte_copy_tags_to_user)
+	mov	x3, x0
+	cbz	x2, 2f
+1:
+	ldg	x4, [x1]
+	ubfx	x4, x4, #MTE_TAG_SHIFT, #MTE_TAG_SIZE
+	uao_user_alternative 2f, strb, sttrb, w4, x0, 0
+	add	x0, x0, #1
+	add	x1, x1, #MTE_GRANULE_SIZE
+	subs	x2, x2, #1
+	b.ne	1b
+
+	// exception handling and function return
+2:	sub	x0, x0, x3		// update the number of tags copied
+	ret
+SYM_FUNC_END(mte_copy_tags_to_user)


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 23/29] arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (21 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 22/29] arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 24/29] fs: Handle intra-page faults in copy_mount_options() Catalin Marinas
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Alan Hayward, Luis Machado, Omair Javaid

This regset allows read/write access to a ptraced process
prctl(PR_SET_TAGGED_ADDR_CTRL) setting.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Alan Hayward <Alan.Hayward@arm.com>
Cc: Luis Machado <luis.machado@linaro.org>
Cc: Omair Javaid <omair.javaid@linaro.org>
---

Notes:
    New in v7.

 arch/arm64/kernel/ptrace.c | 44 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/elf.h   |  1 +
 2 files changed, 45 insertions(+)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 653a03598c75..e21ce360348b 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1108,6 +1108,37 @@ static int pac_generic_keys_set(struct task_struct *target,
 #endif /* CONFIG_CHECKPOINT_RESTORE */
 #endif /* CONFIG_ARM64_PTR_AUTH */
 
+#ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
+static int tagged_addr_ctrl_get(struct task_struct *target,
+				const struct user_regset *regset,
+				unsigned int pos, unsigned int count,
+				void *kbuf, void __user *ubuf)
+{
+	long ctrl = get_tagged_addr_ctrl(target);
+
+	if (IS_ERR_VALUE(ctrl))
+		return ctrl;
+
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				   &ctrl, 0, -1);
+}
+
+static int tagged_addr_ctrl_set(struct task_struct *target, const struct
+				user_regset *regset, unsigned int pos,
+				unsigned int count, const void *kbuf, const
+				void __user *ubuf)
+{
+	int ret;
+	long ctrl;
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &ctrl, 0, -1);
+	if (ret)
+		return ret;
+
+	return set_tagged_addr_ctrl(target, ctrl);
+}
+#endif
+
 enum aarch64_regset {
 	REGSET_GPR,
 	REGSET_FPR,
@@ -1127,6 +1158,9 @@ enum aarch64_regset {
 	REGSET_PACG_KEYS,
 #endif
 #endif
+#ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
+	REGSET_TAGGED_ADDR_CTRL,
+#endif
 };
 
 static const struct user_regset aarch64_regsets[] = {
@@ -1225,6 +1259,16 @@ static const struct user_regset aarch64_regsets[] = {
 	},
 #endif
 #endif
+#ifdef CONFIG_ARM64_TAGGED_ADDR_ABI
+	[REGSET_TAGGED_ADDR_CTRL] = {
+		.core_note_type = NT_ARM_TAGGED_ADDR_CTRL,
+		.n = 1,
+		.size = sizeof(long),
+		.align = sizeof(long),
+		.get = tagged_addr_ctrl_get,
+		.set = tagged_addr_ctrl_set,
+	},
+#endif
 };
 
 static const struct user_regset_view user_aarch64_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index c6dd0215482e..0a806c1c4a02 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -425,6 +425,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_PAC_MASK		0x406	/* ARM pointer authentication code masks */
 #define NT_ARM_PACA_KEYS	0x407	/* ARM pointer authentication address keys */
 #define NT_ARM_PACG_KEYS	0x408	/* ARM pointer authentication generic key */
+#define NT_ARM_TAGGED_ADDR_CTRL	0x409	/* arm64 tagged address control (prctl()) */
 #define NT_ARC_V2	0x600		/* ARCv2 accumulator/extra registers */
 #define NT_VMCOREDD	0x700		/* Vmcore Device Dump Note */
 #define NT_MIPS_DSP	0x800		/* MIPS DSP ASE registers */


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 24/29] fs: Handle intra-page faults in copy_mount_options()
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (22 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 23/29] arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 25/29] mm: Add arch hooks for saving/restoring tags Catalin Marinas
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	linux-fsdevel, Alexander Viro

The copy_mount_options() function takes a user pointer argument but no
size and it tries to read up to a PAGE_SIZE. However, copy_from_user()
is not guaranteed to return all the accessible bytes if, for example,
the access crosses a page boundary and gets a fault on the second page.
To work around this, the current copy_mount_options() implementation
performs two copy_from_user() passes, first to the end of the current
page and the second to what's left in the subsequent page.

On arm64 with MTE enabled, access to a user page may trigger a fault
after part of the buffer in a page has been copied (when the user
pointer tag, bits 56-59, no longer matches the allocation tag stored in
memory). Allow copy_mount_options() to handle such intra-page faults by
resorting to byte at a time copy in case of copy_from_user() failure.

Note that copy_from_user() handles the zeroing of the kernel buffer in
case of error.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---

Al, I thought I'd not spam you with the whole arm64 MTE series, so only
cc'ing you on a patch in the middle. Would you mind ack'ing/nak'ing this
patch? I intend to push it upstream with the rest of the series for 5.9
(well, assuming nothing falls apart). It's not as elegant as the
previous approach but with MTE we can get faults in the middle of a
page, so we'd have to fall back to byte by byte copying.

Thanks.

Notes:
    v6:
    - Simplified logic to fall-back to byte-by-byte if the copy_from_user()
      fails.
    
    v4:
    - Rewrite to avoid arch_has_exact_copy_from_user()
    
    New in v3.

 fs/namespace.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f30ed401cc6d..fc45020c244e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3074,7 +3074,7 @@ static void shrink_submounts(struct mount *mnt)
 void *copy_mount_options(const void __user * data)
 {
 	char *copy;
-	unsigned size;
+	unsigned left, offset;
 
 	if (!data)
 		return NULL;
@@ -3083,16 +3083,27 @@ void *copy_mount_options(const void __user * data)
 	if (!copy)
 		return ERR_PTR(-ENOMEM);
 
-	size = PAGE_SIZE - offset_in_page(data);
+	left = copy_from_user(copy, data, PAGE_SIZE);
 
-	if (copy_from_user(copy, data, size)) {
+	/*
+	 * Not all architectures have an exact copy_from_user(). Resort to
+	 * byte at a time.
+	 */
+	offset = PAGE_SIZE - left;
+	while (left) {
+		char c;
+		if (get_user(c, (const char __user *)data + offset))
+			break;
+		copy[offset] = c;
+		left--;
+		offset++;
+	}
+
+	if (left == PAGE_SIZE) {
 		kfree(copy);
 		return ERR_PTR(-EFAULT);
 	}
-	if (size != PAGE_SIZE) {
-		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
-			memset(copy + size, 0, PAGE_SIZE - size);
-	}
+
 	return copy;
 }
 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 25/29] mm: Add arch hooks for saving/restoring tags
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (23 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 24/29] fs: Handle intra-page faults in copy_mount_options() Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 26/29] arm64: mte: Enable swap of tagged pages Catalin Marinas
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Steven Price

From: Steven Price <steven.price@arm.com>

Arm's Memory Tagging Extension (MTE) adds some metadata (tags) to
every physical page, when swapping pages out to disk it is necessary to
save these tags, and later restore them when reading the pages back.

Add some hooks along with dummy implementations to enable the
arch code to handle this.

Three new hooks are added to the swap code:
 * arch_prepare_to_swap() and
 * arch_swap_invalidate_page() / arch_swap_invalidate_area().
One new hook is added to shmem:
 * arch_swap_restore()

Signed-off-by: Steven Price <steven.price@arm.com>
[catalin.marinas@arm.com: add unlock_page() on the error path]
[catalin.marinas@arm.com: dropped the _tags suffix]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
---

Notes:
    v6:
    - Added comment on where the arch code should define the overriding
      macros (asm/pgtable.h).
    - Dropped _tags suffix from arch_swap_restore_tags().
    
    New in v4.

 include/linux/pgtable.h | 28 ++++++++++++++++++++++++++++
 mm/page_io.c            | 10 ++++++++++
 mm/shmem.c              |  6 ++++++
 mm/swapfile.c           |  2 ++
 4 files changed, 46 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 56c1e8eb7bb0..cbe775d3446f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -631,6 +631,34 @@ static inline int arch_unmap_one(struct mm_struct *mm,
 }
 #endif
 
+/*
+ * Allow architectures to preserve additional metadata associated with
+ * swapped-out pages. The corresponding __HAVE_ARCH_SWAP_* macros and function
+ * prototypes must be defined in the arch-specific asm/pgtable.h file.
+ */
+#ifndef __HAVE_ARCH_PREPARE_TO_SWAP
+static inline int arch_prepare_to_swap(struct page *page)
+{
+	return 0;
+}
+#endif
+
+#ifndef __HAVE_ARCH_SWAP_INVALIDATE
+static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
+{
+}
+
+static inline void arch_swap_invalidate_area(int type)
+{
+}
+#endif
+
+#ifndef __HAVE_ARCH_SWAP_RESTORE
+static inline void arch_swap_restore(swp_entry_t entry, struct page *page)
+{
+}
+#endif
+
 #ifndef __HAVE_ARCH_PGD_OFFSET_GATE
 #define pgd_offset_gate(mm, addr)	pgd_offset(mm, addr)
 #endif
diff --git a/mm/page_io.c b/mm/page_io.c
index e8726f3e3820..9f3835161002 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -252,6 +252,16 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 		unlock_page(page);
 		goto out;
 	}
+	/*
+	 * Arch code may have to preserve more data than just the page
+	 * contents, e.g. memory tags.
+	 */
+	ret = arch_prepare_to_swap(page);
+	if (ret) {
+		set_page_dirty(page);
+		unlock_page(page);
+		goto out;
+	}
 	if (frontswap_store(page) == 0) {
 		set_page_writeback(page);
 		unlock_page(page);
diff --git a/mm/shmem.c b/mm/shmem.c
index dacee627dae6..66024b1884c1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1673,6 +1673,12 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
 	}
 	wait_on_page_writeback(page);
 
+	/*
+	 * Some architectures may have to restore extra metadata to the
+	 * physical page after reading from swap.
+	 */
+	arch_swap_restore(swap, page);
+
 	if (shmem_should_replace_page(page, gfp)) {
 		error = shmem_replace_page(&page, gfp, info, index);
 		if (error)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 987276c557d1..b7a3ed45e606 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -716,6 +716,7 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	else
 		swap_slot_free_notify = NULL;
 	while (offset <= end) {
+		arch_swap_invalidate_page(si->type, offset);
 		frontswap_invalidate_page(si->type, offset);
 		if (swap_slot_free_notify)
 			swap_slot_free_notify(si->bdev, offset);
@@ -2675,6 +2676,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	frontswap_map = frontswap_map_get(p);
 	spin_unlock(&p->lock);
 	spin_unlock(&swap_lock);
+	arch_swap_invalidate_area(p->type);
 	frontswap_invalidate_area(p->type);
 	frontswap_map_set(p, NULL);
 	mutex_unlock(&swapon_mutex);


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 26/29] arm64: mte: Enable swap of tagged pages
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (24 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 25/29] mm: Add arch hooks for saving/restoring tags Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 27/29] arm64: mte: Save tags when hibernating Catalin Marinas
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Steven Price

From: Steven Price <steven.price@arm.com>

When swapping pages out to disk it is necessary to save any tags that
have been set, and restore when swapping back in. Make use of the new
page flag (PG_ARCH_2, locally named PG_mte_tagged) to identify pages
with tags. When swapping out these pages the tags are stored in memory
and later restored when the pages are brought back in. Because shmem can
swap pages back in without restoring the userspace PTE it is also
necessary to add a hook for shmem.

Signed-off-by: Steven Price <steven.price@arm.com>
[catalin.marinas@arm.com: move function prototypes to mte.h]
[catalin.marinas@arm.com: drop '_tags' from arch_swap_restore_tags()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v6:
    - Remove stale copy of include/asm-generic/pgtable.h (bad conflict
      resolution in v5).
    - check_swap should be true for nr_pages == 1.
    
    New in v4.

 arch/arm64/include/asm/mte.h     |  8 +++
 arch/arm64/include/asm/pgtable.h | 32 ++++++++++++
 arch/arm64/kernel/mte.c          | 19 +++++++-
 arch/arm64/lib/mte.S             | 45 +++++++++++++++++
 arch/arm64/mm/Makefile           |  1 +
 arch/arm64/mm/mteswap.c          | 83 ++++++++++++++++++++++++++++++++
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/mm/mteswap.c

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 7ea0c0e526d1..1c99fcadb58c 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -21,6 +21,14 @@ unsigned long mte_copy_tags_from_user(void *to, const void __user *from,
 				      unsigned long n);
 unsigned long mte_copy_tags_to_user(void __user *to, void *from,
 				    unsigned long n);
+int mte_save_tags(struct page *page);
+void mte_save_page_tags(const void *page_addr, void *tag_storage);
+bool mte_restore_tags(swp_entry_t entry, struct page *page);
+void mte_restore_page_tags(void *page_addr, const void *tag_storage);
+void mte_invalidate_tags(int type, pgoff_t offset);
+void mte_invalidate_tags_area(int type);
+void *mte_allocate_tag_storage(void);
+void mte_free_tag_storage(char *storage);
 
 #ifdef CONFIG_ARM64_MTE
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 78a545536a45..6150d5bcc7d8 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -857,6 +857,38 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 
 extern int kern_addr_valid(unsigned long addr);
 
+#ifdef CONFIG_ARM64_MTE
+
+#define __HAVE_ARCH_PREPARE_TO_SWAP
+static inline int arch_prepare_to_swap(struct page *page)
+{
+	if (system_supports_mte())
+		return mte_save_tags(page);
+	return 0;
+}
+
+#define __HAVE_ARCH_SWAP_INVALIDATE
+static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
+{
+	if (system_supports_mte())
+		mte_invalidate_tags(type, offset);
+}
+
+static inline void arch_swap_invalidate_area(int type)
+{
+	if (system_supports_mte())
+		mte_invalidate_tags_area(type);
+}
+
+#define __HAVE_ARCH_SWAP_RESTORE
+static inline void arch_swap_restore(swp_entry_t entry, struct page *page)
+{
+	if (system_supports_mte() && mte_restore_tags(entry, page))
+		set_bit(PG_mte_tagged, &page->flags);
+}
+
+#endif /* CONFIG_ARM64_MTE */
+
 /*
  * On AArch64, the cache coherency is handled via the set_pte_at() function.
  */
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 0a8b90afe9d7..eb39504e390a 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -10,6 +10,8 @@
 #include <linux/sched.h>
 #include <linux/sched/mm.h>
 #include <linux/string.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
 #include <linux/thread_info.h>
 #include <linux/uio.h>
 
@@ -18,15 +20,30 @@
 #include <asm/ptrace.h>
 #include <asm/sysreg.h>
 
+static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap)
+{
+	pte_t old_pte = READ_ONCE(*ptep);
+
+	if (check_swap && is_swap_pte(old_pte)) {
+		swp_entry_t entry = pte_to_swp_entry(old_pte);
+
+		if (!non_swap_entry(entry) && mte_restore_tags(entry, page))
+			return;
+	}
+
+	mte_clear_page_tags(page_address(page));
+}
+
 void mte_sync_tags(pte_t *ptep, pte_t pte)
 {
 	struct page *page = pte_page(pte);
 	long i, nr_pages = compound_nr(page);
+	bool check_swap = nr_pages == 1;
 
 	/* if PG_mte_tagged is set, tags have already been initialised */
 	for (i = 0; i < nr_pages; i++, page++) {
 		if (!test_and_set_bit(PG_mte_tagged, &page->flags))
-			mte_clear_page_tags(page_address(page));
+			mte_sync_page_tags(page, ptep, check_swap);
 	}
 }
 
diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
index 434f81d9a180..03ca6d8b8670 100644
--- a/arch/arm64/lib/mte.S
+++ b/arch/arm64/lib/mte.S
@@ -104,3 +104,48 @@ SYM_FUNC_START(mte_copy_tags_to_user)
 2:	sub	x0, x0, x3		// update the number of tags copied
 	ret
 SYM_FUNC_END(mte_copy_tags_to_user)
+
+/*
+ * Save the tags in a page
+ *   x0 - page address
+ *   x1 - tag storage
+ */
+SYM_FUNC_START(mte_save_page_tags)
+	multitag_transfer_size x7, x5
+1:
+	mov	x2, #0
+2:
+	ldgm	x5, [x0]
+	orr	x2, x2, x5
+	add	x0, x0, x7
+	tst	x0, #0xFF		// 16 tag values fit in a register,
+	b.ne	2b			// which is 16*16=256 bytes
+
+	str	x2, [x1], #8
+
+	tst	x0, #(PAGE_SIZE - 1)
+	b.ne	1b
+
+	ret
+SYM_FUNC_END(mte_save_page_tags)
+
+/*
+ * Restore the tags in a page
+ *   x0 - page address
+ *   x1 - tag storage
+ */
+SYM_FUNC_START(mte_restore_page_tags)
+	multitag_transfer_size x7, x5
+1:
+	ldr	x2, [x1], #8
+2:
+	stgm	x2, [x0]
+	add	x0, x0, x7
+	tst	x0, #0xFF
+	b.ne	2b
+
+	tst	x0, #(PAGE_SIZE - 1)
+	b.ne	1b
+
+	ret
+SYM_FUNC_END(mte_restore_page_tags)
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index d91030f0ffee..5bcc9e0aa259 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_PTDUMP_CORE)	+= dump.o
 obj-$(CONFIG_PTDUMP_DEBUGFS)	+= ptdump_debugfs.o
 obj-$(CONFIG_NUMA)		+= numa.o
 obj-$(CONFIG_DEBUG_VIRTUAL)	+= physaddr.o
+obj-$(CONFIG_ARM64_MTE)		+= mteswap.o
 KASAN_SANITIZE_physaddr.o	+= n
 
 obj-$(CONFIG_KASAN)		+= kasan_init.o
diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
new file mode 100644
index 000000000000..c52c1847079c
--- /dev/null
+++ b/arch/arm64/mm/mteswap.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/pagemap.h>
+#include <linux/xarray.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <asm/mte.h>
+
+static DEFINE_XARRAY(mte_pages);
+
+void *mte_allocate_tag_storage(void)
+{
+	/* tags granule is 16 bytes, 2 tags stored per byte */
+	return kmalloc(PAGE_SIZE / 16 / 2, GFP_KERNEL);
+}
+
+void mte_free_tag_storage(char *storage)
+{
+	kfree(storage);
+}
+
+int mte_save_tags(struct page *page)
+{
+	void *tag_storage, *ret;
+
+	if (!test_bit(PG_mte_tagged, &page->flags))
+		return 0;
+
+	tag_storage = mte_allocate_tag_storage();
+	if (!tag_storage)
+		return -ENOMEM;
+
+	mte_save_page_tags(page_address(page), tag_storage);
+
+	/* page_private contains the swap entry.val set in do_swap_page */
+	ret = xa_store(&mte_pages, page_private(page), tag_storage, GFP_KERNEL);
+	if (WARN(xa_is_err(ret), "Failed to store MTE tags")) {
+		mte_free_tag_storage(tag_storage);
+		return xa_err(ret);
+	} else if (ret) {
+		/* Entry is being replaced, free the old entry */
+		mte_free_tag_storage(ret);
+	}
+
+	return 0;
+}
+
+bool mte_restore_tags(swp_entry_t entry, struct page *page)
+{
+	void *tags = xa_load(&mte_pages, entry.val);
+
+	if (!tags)
+		return false;
+
+	mte_restore_page_tags(page_address(page), tags);
+
+	return true;
+}
+
+void mte_invalidate_tags(int type, pgoff_t offset)
+{
+	swp_entry_t entry = swp_entry(type, offset);
+	void *tags = xa_erase(&mte_pages, entry.val);
+
+	mte_free_tag_storage(tags);
+}
+
+void mte_invalidate_tags_area(int type)
+{
+	swp_entry_t entry = swp_entry(type, 0);
+	swp_entry_t last_entry = swp_entry(type + 1, 0);
+	void *tags;
+
+	XA_STATE(xa_state, &mte_pages, entry.val);
+
+	xa_lock(&mte_pages);
+	xas_for_each(&xa_state, tags, last_entry.val - 1) {
+		__xa_erase(&mte_pages, xa_state.xa_index);
+		mte_free_tag_storage(tags);
+	}
+	xa_unlock(&mte_pages);
+}


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 27/29] arm64: mte: Save tags when hibernating
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (25 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 26/29] arm64: mte: Enable swap of tagged pages Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 28/29] arm64: mte: Kconfig entry Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation Catalin Marinas
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Steven Price, James Morse

From: Steven Price <steven.price@arm.com>

When hibernating the contents of all pages in the system are written to
disk, however the MTE tags are not visible to the generic hibernation
code. So just before the hibernation image is created copy the tags out
of the physical tag storage into standard memory so they will be
included in the hibernation image. After hibernation apply the tags back
into the physical tag storage.

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    New in v4.

 arch/arm64/kernel/hibernate.c | 118 ++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)

diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 68e14152d6e9..23467092e24d 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -31,6 +31,7 @@
 #include <asm/kexec.h>
 #include <asm/memory.h>
 #include <asm/mmu_context.h>
+#include <asm/mte.h>
 #include <asm/pgalloc.h>
 #include <asm/pgtable-hwdef.h>
 #include <asm/sections.h>
@@ -285,6 +286,117 @@ static int create_safe_exec_page(void *src_start, size_t length,
 
 #define dcache_clean_range(start, end)	__flush_dcache_area(start, (end - start))
 
+#ifdef CONFIG_ARM64_MTE
+
+static DEFINE_XARRAY(mte_pages);
+
+static int save_tags(struct page *page, unsigned long pfn)
+{
+	void *tag_storage, *ret;
+
+	tag_storage = mte_allocate_tag_storage();
+	if (!tag_storage)
+		return -ENOMEM;
+
+	mte_save_page_tags(page_address(page), tag_storage);
+
+	ret = xa_store(&mte_pages, pfn, tag_storage, GFP_KERNEL);
+	if (WARN(xa_is_err(ret), "Failed to store MTE tags")) {
+		mte_free_tag_storage(tag_storage);
+		return xa_err(ret);
+	} else if (WARN(ret, "swsusp: %s: Duplicate entry", __func__)) {
+		mte_free_tag_storage(ret);
+	}
+
+	return 0;
+}
+
+static void swsusp_mte_free_storage(void)
+{
+	XA_STATE(xa_state, &mte_pages, 0);
+	void *tags;
+
+	xa_lock(&mte_pages);
+	xas_for_each(&xa_state, tags, ULONG_MAX) {
+		mte_free_tag_storage(tags);
+	}
+	xa_unlock(&mte_pages);
+
+	xa_destroy(&mte_pages);
+}
+
+static int swsusp_mte_save_tags(void)
+{
+	struct zone *zone;
+	unsigned long pfn, max_zone_pfn;
+	int ret = 0;
+	int n = 0;
+
+	if (!system_supports_mte())
+		return 0;
+
+	for_each_populated_zone(zone) {
+		max_zone_pfn = zone_end_pfn(zone);
+		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
+			struct page *page = pfn_to_online_page(pfn);
+
+			if (!page)
+				continue;
+
+			if (!test_bit(PG_mte_tagged, &page->flags))
+				continue;
+
+			ret = save_tags(page, pfn);
+			if (ret) {
+				swsusp_mte_free_storage();
+				goto out;
+			}
+
+			n++;
+		}
+	}
+	pr_info("Saved %d MTE pages\n", n);
+
+out:
+	return ret;
+}
+
+static void swsusp_mte_restore_tags(void)
+{
+	XA_STATE(xa_state, &mte_pages, 0);
+	int n = 0;
+	void *tags;
+
+	xa_lock(&mte_pages);
+	xas_for_each(&xa_state, tags, ULONG_MAX) {
+		unsigned long pfn = xa_state.xa_index;
+		struct page *page = pfn_to_online_page(pfn);
+
+		mte_restore_page_tags(page_address(page), tags);
+
+		mte_free_tag_storage(tags);
+		n++;
+	}
+	xa_unlock(&mte_pages);
+
+	pr_info("Restored %d MTE pages\n", n);
+
+	xa_destroy(&mte_pages);
+}
+
+#else	/* CONFIG_ARM64_MTE */
+
+static int swsusp_mte_save_tags(void)
+{
+	return 0;
+}
+
+static void swsusp_mte_restore_tags(void)
+{
+}
+
+#endif	/* CONFIG_ARM64_MTE */
+
 int swsusp_arch_suspend(void)
 {
 	int ret = 0;
@@ -302,6 +414,10 @@ int swsusp_arch_suspend(void)
 		/* make the crash dump kernel image visible/saveable */
 		crash_prepare_suspend();
 
+		ret = swsusp_mte_save_tags();
+		if (ret)
+			return ret;
+
 		sleep_cpu = smp_processor_id();
 		ret = swsusp_save();
 	} else {
@@ -315,6 +431,8 @@ int swsusp_arch_suspend(void)
 			dcache_clean_range(__hyp_text_start, __hyp_text_end);
 		}
 
+		swsusp_mte_restore_tags();
+
 		/* make the crash dump kernel image protected again */
 		crash_post_resume();
 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 28/29] arm64: mte: Kconfig entry
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (26 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 27/29] arm64: mte: Save tags when hibernating Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-15 17:08 ` [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation Catalin Marinas
  28 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Add Memory Tagging Extension support to the arm64 kbuild.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v7:
    - Binutils gained initial support for MTE in 2.32.0. However, a late
      architecture addition (LDGM/STGM) is only supported in the newer
      2.32.x and 2.33 versions. Change the AS_HAS_MTE option to also check
      for stgm in addition to .arch armv8.5-a+memtag.
    
    v6:
    - Remove select ARCH_USES_PG_ARCH_2, no longer defined.
    
    v5:
    - Remove duplicate ARMv8.5 menu entry.
    
    v4:
    - select ARCH_USES_PG_ARCH_2.
    - remove ARCH_NO_SWAP.
    - default y.

 arch/arm64/Kconfig | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 66dc41fd49f2..32ceff21acc1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1664,6 +1664,37 @@ config ARCH_RANDOM
 	  provides a high bandwidth, cryptographically secure
 	  hardware random number generator.
 
+config ARM64_AS_HAS_MTE
+	# Binutils gained initial support for MTE in 2.32.0. However, a
+	# late architecture addition (LDGM/STGM) is only supported in
+	# the newer 2.32.x and 2.33 versions.
+	def_bool $(as-instr,.arch armv8.5-a+memtag\nstgm xzr$(comma)[x0])
+
+config ARM64_MTE
+	bool "Memory Tagging Extension support"
+	default y
+	depends on ARM64_AS_HAS_MTE && ARM64_TAGGED_ADDR_ABI
+	select ARCH_USES_HIGH_VMA_FLAGS
+	help
+	  Memory Tagging (part of the ARMv8.5 Extensions) provides
+	  architectural support for run-time, always-on detection of
+	  various classes of memory error to aid with software debugging
+	  to eliminate vulnerabilities arising from memory-unsafe
+	  languages.
+
+	  This option enables the support for the Memory Tagging
+	  Extension at EL0 (i.e. for userspace).
+
+	  Selecting this option allows the feature to be detected at
+	  runtime. Any secondary CPU not implementing this feature will
+	  not be allowed a late bring-up.
+
+	  Userspace binaries that want to use this feature must
+	  explicitly opt in. The mechanism for the userspace is
+	  described in:
+
+	  Documentation/arm64/memory-tagging-extension.rst.
+
 endmenu
 
 config ARM64_SVE


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (27 preceding siblings ...)
  2020-07-15 17:08 ` [PATCH v7 28/29] arm64: mte: Kconfig entry Catalin Marinas
@ 2020-07-15 17:08 ` Catalin Marinas
  2020-07-27 16:36   ` Szabolcs Nagy
  28 siblings, 1 reply; 53+ messages in thread
From: Catalin Marinas @ 2020-07-15 17:08 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
a mechanism to detect the sources of memory related errors which
may be vulnerable to exploitation, including bounds violations,
use-after-free, use-after-return, use-out-of-scope and use before
initialization errors.

Add Memory Tagging Extension documentation for the arm64 linux
kernel support.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v7:
    - Add information on ptrace() regset access (NT_ARM_TAGGED_ADDR_CTRL).
    
    v4:
    - Document behaviour of madvise(MADV_DONTNEED/MADV_FREE).
    - Document the initial process state on fork/execve.
    - Clarify when the kernel uaccess checks the tags.
    - Minor updates to the example code.
    - A few other minor clean-ups following review.
    
    v3:
    - Modify the uaccess checking conditions: only when the sync mode is
      selected by the user. In async mode, the kernel uaccesses are not
      checked.
    - Clarify that an include mask of 0 (exclude mask 0xffff) results in
      always generating tag 0.
    - Document the ptrace() interface.
    
    v2:
    - Documented the uaccess kernel tag checking mode.
    - Removed the BTI definitions from cpu-feature-registers.rst.
    - Removed the paragraph stating that MTE depends on the tagged address
      ABI (while the Kconfig entry does, there is no requirement for the
      user to enable both).
    - Changed the GCR_EL1.Exclude handling description following the change
      in the prctl() interface (include vs exclude mask).
    - Updated the example code.

 Documentation/arm64/cpu-feature-registers.rst |   2 +
 Documentation/arm64/elf_hwcaps.rst            |   4 +
 Documentation/arm64/index.rst                 |   1 +
 .../arm64/memory-tagging-extension.rst        | 305 ++++++++++++++++++
 4 files changed, 312 insertions(+)
 create mode 100644 Documentation/arm64/memory-tagging-extension.rst

diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
index 314fa5bc2655..27d8559d565b 100644
--- a/Documentation/arm64/cpu-feature-registers.rst
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -174,6 +174,8 @@ infrastructure:
      +------------------------------+---------+---------+
      | Name                         |  bits   | visible |
      +------------------------------+---------+---------+
+     | MTE                          | [11-8]  |    y    |
+     +------------------------------+---------+---------+
      | SSBS                         | [7-4]   |    y    |
      +------------------------------+---------+---------+
      | BT                           | [3-0]   |    y    |
diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
index 84a9fd2d41b4..bbd9cf54db6c 100644
--- a/Documentation/arm64/elf_hwcaps.rst
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -240,6 +240,10 @@ HWCAP2_BTI
 
     Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001.
 
+HWCAP2_MTE
+
+    Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
+    by Documentation/arm64/memory-tagging-extension.rst.
 
 4. Unused AT_HWCAP bits
 -----------------------
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 09cbb4ed2237..4cd0e696f064 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -14,6 +14,7 @@ ARM64 Architecture
     hugetlbpage
     legacy_instructions
     memory
+    memory-tagging-extension
     pointer-authentication
     silicon-errata
     sve
diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
new file mode 100644
index 000000000000..e3709b536b89
--- /dev/null
+++ b/Documentation/arm64/memory-tagging-extension.rst
@@ -0,0 +1,305 @@
+===============================================
+Memory Tagging Extension (MTE) in AArch64 Linux
+===============================================
+
+Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
+         Catalin Marinas <catalin.marinas@arm.com>
+
+Date: 2020-02-25
+
+This document describes the provision of the Memory Tagging Extension
+functionality in AArch64 Linux.
+
+Introduction
+============
+
+ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
+feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
+(Top Byte Ignore) feature and allows software to access a 4-bit
+allocation tag for each 16-byte granule in the physical address space.
+Such memory range must be mapped with the Normal-Tagged memory
+attribute. A logical tag is derived from bits 59-56 of the virtual
+address used for the memory access. A CPU with MTE enabled will compare
+the logical tag against the allocation tag and potentially raise an
+exception on mismatch, subject to system registers configuration.
+
+Userspace Support
+=================
+
+When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
+supported by the hardware, the kernel advertises the feature to
+userspace via ``HWCAP2_MTE``.
+
+PROT_MTE
+--------
+
+To access the allocation tags, a user process must enable the Tagged
+memory attribute on an address range using a new ``prot`` flag for
+``mmap()`` and ``mprotect()``:
+
+``PROT_MTE`` - Pages allow access to the MTE allocation tags.
+
+The allocation tag is set to 0 when such pages are first mapped in the
+user address space and preserved on copy-on-write. ``MAP_SHARED`` is
+supported and the allocation tags can be shared between processes.
+
+**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
+RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
+types of mapping will result in ``-EINVAL`` returned by these system
+calls.
+
+**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
+be cleared by ``mprotect()``.
+
+**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
+``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
+point after the system call.
+
+Tag Check Faults
+----------------
+
+When ``PROT_MTE`` is enabled on an address range and a mismatch between
+the logical and allocation tags occurs on access, there are three
+configurable behaviours:
+
+- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
+  tag check fault.
+
+- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
+  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
+  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
+  by the offending thread, the containing process is terminated with a
+  ``coredump``.
+
+- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
+  thread, asynchronously following one or multiple tag check faults,
+  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
+  address is unknown).
+
+The user can select the above modes, per thread, using the
+``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
+``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
+bit-field:
+
+- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
+- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
+- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
+
+The current tag check fault mode can be read using the
+``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
+
+Tag checking can also be disabled for a user thread by setting the
+``PSTATE.TCO`` bit with ``MSR TCO, #1``.
+
+**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
+irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
+``sigreturn()``.
+
+**Note**: There are no *match-all* logical tags available for user
+applications.
+
+**Note**: Kernel accesses to the user address space (e.g. ``read()``
+system call) are not checked if the user thread tag checking mode is
+``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
+``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
+address accesses, however it cannot always guarantee it.
+
+Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
+-----------------------------------------------------------------
+
+The architecture allows excluding certain tags to be randomly generated
+via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux
+excludes all tags other than 0. A user thread can enable specific tags
+in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
+flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
+in the ``PR_MTE_TAG_MASK`` bit-field.
+
+**Note**: The hardware uses an exclude mask but the ``prctl()``
+interface provides an include mask. An include mask of ``0`` (exclusion
+mask ``0xffff``) results in the CPU always generating tag ``0``.
+
+Initial process state
+---------------------
+
+On ``execve()``, the new process has the following configuration:
+
+- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
+- Tag checking mode set to ``PR_MTE_TCF_NONE``
+- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
+- ``PSTATE.TCO`` set to 0
+- ``PROT_MTE`` not set on any of the initial memory maps
+
+On ``fork()``, the new process inherits the parent's configuration and
+memory map attributes with the exception of the ``madvise()`` ranges
+with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
+to 0).
+
+The ``ptrace()`` interface
+--------------------------
+
+``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
+the tags from or set the tags to a tracee's address space. The
+``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
+data)`` where:
+
+- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_PEEKMTETAGS``.
+- ``pid`` - the tracee's PID.
+- ``addr`` - address in the tracee's address space.
+- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
+  a buffer of ``iov_len`` length in the tracer's address space.
+
+The tags in the tracer's ``iov_base`` buffer are represented as one
+4-bit tag per byte and correspond to a 16-byte MTE tag granule in the
+tracee's address space.
+
+**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
+will use the corresponding aligned address.
+
+``ptrace()`` return value:
+
+- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
+  number of tags transferred. This may be smaller than the requested
+  ``iov_len`` if the requested address range in the tracee's or the
+  tracer's space cannot be accessed or does not have valid tags.
+- ``-EPERM`` - the specified process cannot be traced.
+- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
+  address) and no tags copied. ``iov_len`` not updated.
+- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
+  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
+- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
+  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.
+
+**Note**: There are no transient errors for the requests above, so user
+programs should not retry in case of a non-zero system call return.
+
+``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr ==
+``NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
+address ABI control and MTE configuration of a process as per the
+``prctl()`` options described in
+Documentation/arm64/tagged-address-abi.rst and above. The corresponding
+``regset`` is 1 element of 8 bytes (``sizeof(long))``).
+
+Example of correct usage
+========================
+
+*MTE Example code*
+
+.. code-block:: c
+
+    /*
+     * To be compiled with -march=armv8.5-a+memtag
+     */
+    #include <errno.h>
+    #include <stdint.h>
+    #include <stdio.h>
+    #include <stdlib.h>
+    #include <unistd.h>
+    #include <sys/auxv.h>
+    #include <sys/mman.h>
+    #include <sys/prctl.h>
+
+    /*
+     * From arch/arm64/include/uapi/asm/hwcap.h
+     */
+    #define HWCAP2_MTE              (1 << 18)
+
+    /*
+     * From arch/arm64/include/uapi/asm/mman.h
+     */
+    #define PROT_MTE                 0x20
+
+    /*
+     * From include/uapi/linux/prctl.h
+     */
+    #define PR_SET_TAGGED_ADDR_CTRL 55
+    #define PR_GET_TAGGED_ADDR_CTRL 56
+    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
+    # define PR_MTE_TCF_SHIFT       1
+    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TAG_SHIFT       3
+    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)
+
+    /*
+     * Insert a random logical tag into the given pointer.
+     */
+    #define insert_random_tag(ptr) ({                       \
+            uint64_t __val;                                 \
+            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
+            __val;                                          \
+    })
+
+    /*
+     * Set the allocation tag on the destination address.
+     */
+    #define set_tag(tagged_addr) do {                                      \
+            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
+    } while (0)
+
+    int main()
+    {
+            unsigned char *a;
+            unsigned long page_sz = sysconf(_SC_PAGESIZE);
+            unsigned long hwcap2 = getauxval(AT_HWCAP2);
+
+            /* check if MTE is present */
+            if (!(hwcap2 & HWCAP2_MTE))
+                    return EXIT_FAILURE;
+
+            /*
+             * Enable the tagged address ABI, synchronous MTE tag check faults and
+             * allow all non-zero tags in the randomly generated set.
+             */
+            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
+                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT),
+                      0, 0, 0)) {
+                    perror("prctl() failed");
+                    return EXIT_FAILURE;
+            }
+
+            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
+                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+            if (a == MAP_FAILED) {
+                    perror("mmap() failed");
+                    return EXIT_FAILURE;
+            }
+
+            /*
+             * Enable MTE on the above anonymous mmap. The flag could be passed
+             * directly to mmap() and skip this step.
+             */
+            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
+                    perror("mprotect() failed");
+                    return EXIT_FAILURE;
+            }
+
+            /* access with the default tag (0) */
+            a[0] = 1;
+            a[1] = 2;
+
+            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);
+
+            /* set the logical and allocation tags */
+            a = (unsigned char *)insert_random_tag(a);
+            set_tag(a);
+
+            printf("%p\n", a);
+
+            /* non-zero tag access */
+            a[0] = 3;
+            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);
+
+            /*
+             * If MTE is enabled correctly the next instruction will generate an
+             * exception.
+             */
+            printf("Expecting SIGSEGV...\n");
+            a[16] = 0xdd;
+
+            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
+            printf("...haven't got one\n");
+
+            return EXIT_FAILURE;
+    }


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl()
  2020-07-15 17:08 ` [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl() Catalin Marinas
@ 2020-07-20 15:30   ` Kevin Brodsky
  2020-07-20 17:00     ` Dave Martin
  2020-07-22 11:09     ` Catalin Marinas
  2020-08-04 19:34   ` Kevin Brodsky
  1 sibling, 2 replies; 53+ messages in thread
From: Kevin Brodsky @ 2020-07-20 15:30 UTC (permalink / raw)
  To: Catalin Marinas, linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Andrey Konovalov,
	Peter Collingbourne, Andrew Morton

On 15/07/2020 18:08, Catalin Marinas wrote:
> By default, even if PROT_MTE is set on a memory range, there is no tag
> check fault reporting (SIGSEGV). Introduce a set of option to the
> exiting prctl(PR_SET_TAGGED_ADDR_CTRL) to allow user control of the tag
> check fault mode:
>
>    PR_MTE_TCF_NONE  - no reporting (default)
>    PR_MTE_TCF_SYNC  - synchronous tag check fault reporting
>    PR_MTE_TCF_ASYNC - asynchronous tag check fault reporting
>
> These options translate into the corresponding SCTLR_EL1.TCF0 bitfield,
> context-switched by the kernel. Note that uaccess done by the kernel is
> not checked and cannot be configured by the user.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>
> Notes:
>      v3:
>      - Use SCTLR_EL1_TCF0_NONE instead of 0 for consistency.
>      - Move mte_thread_switch() in this patch from an earlier one. In
>        addition, it is called after the dsb() in __switch_to() so that any
>        asynchronous tag check faults have been registered in the TFSR_EL1
>        registers (to be added with the in-kernel MTE support.
>      
>      v2:
>      - Handle SCTLR_EL1_TCF0_NONE explicitly for consistency with PR_MTE_TCF_NONE.
>      - Fix SCTLR_EL1 register setting in flush_mte_state() (thanks to Peter
>        Collingbourne).
>      - Added ISB to update_sctlr_el1_tcf0() since, with the latest
>        architecture update/fix, the TCF0 field is used by the uaccess
>        routines.
>
>   arch/arm64/include/asm/mte.h       | 14 ++++++
>   arch/arm64/include/asm/processor.h |  3 ++
>   arch/arm64/kernel/mte.c            | 77 ++++++++++++++++++++++++++++++
>   arch/arm64/kernel/process.c        | 26 ++++++++--
>   include/uapi/linux/prctl.h         |  6 +++
>   5 files changed, 123 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index b2577eee62c2..df2efbc9f8f1 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -21,6 +21,9 @@ void mte_clear_page_tags(void *addr);
>   void mte_sync_tags(pte_t *ptep, pte_t pte);
>   void mte_copy_page_tags(void *kto, const void *kfrom);
>   void flush_mte_state(void);
> +void mte_thread_switch(struct task_struct *next);
> +long set_mte_ctrl(unsigned long arg);
> +long get_mte_ctrl(void);
>   
>   #else
>   
> @@ -36,6 +39,17 @@ static inline void mte_copy_page_tags(void *kto, const void *kfrom)
>   static inline void flush_mte_state(void)
>   {
>   }
> +static inline void mte_thread_switch(struct task_struct *next)
> +{
> +}
> +static inline long set_mte_ctrl(unsigned long arg)
> +{
> +	return 0;
> +}
> +static inline long get_mte_ctrl(void)
> +{
> +	return 0;
> +}
>   
>   #endif
>   
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 240fe5e5b720..80e7f0573309 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -151,6 +151,9 @@ struct thread_struct {
>   	struct ptrauth_keys_user	keys_user;
>   	struct ptrauth_keys_kernel	keys_kernel;
>   #endif
> +#ifdef CONFIG_ARM64_MTE
> +	u64			sctlr_tcf0;
> +#endif
>   };
>   
>   static inline void arch_thread_struct_whitelist(unsigned long *offset,
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index 5f54fd140610..375483a1f573 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -5,6 +5,8 @@
>   
>   #include <linux/bitops.h>
>   #include <linux/mm.h>
> +#include <linux/prctl.h>
> +#include <linux/sched.h>
>   #include <linux/string.h>
>   #include <linux/thread_info.h>
>   
> @@ -49,6 +51,26 @@ int memcmp_pages(struct page *page1, struct page *page2)
>   	return ret;
>   }
>   
> +static void update_sctlr_el1_tcf0(u64 tcf0)
> +{
> +	/* ISB required for the kernel uaccess routines */
> +	sysreg_clear_set(sctlr_el1, SCTLR_EL1_TCF0_MASK, tcf0);
> +	isb();
> +}
> +
> +static void set_sctlr_el1_tcf0(u64 tcf0)
> +{
> +	/*
> +	 * mte_thread_switch() checks current->thread.sctlr_tcf0 as an
> +	 * optimisation. Disable preemption so that it does not see
> +	 * the variable update before the SCTLR_EL1.TCF0 one.
> +	 */
> +	preempt_disable();
> +	current->thread.sctlr_tcf0 = tcf0;
> +	update_sctlr_el1_tcf0(tcf0);
> +	preempt_enable();
> +}
> +
>   void flush_mte_state(void)
>   {
>   	if (!system_supports_mte())
> @@ -58,4 +80,59 @@ void flush_mte_state(void)
>   	dsb(ish);
>   	write_sysreg_s(0, SYS_TFSRE0_EL1);
>   	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
> +	/* disable tag checking */
> +	set_sctlr_el1_tcf0(SCTLR_EL1_TCF0_NONE);
> +}
> +
> +void mte_thread_switch(struct task_struct *next)
> +{
> +	if (!system_supports_mte())
> +		return;
> +
> +	/* avoid expensive SCTLR_EL1 accesses if no change */
> +	if (current->thread.sctlr_tcf0 != next->thread.sctlr_tcf0)

I think this could be improved by checking whether `next` is a kernel thread, in 
which case thread.sctlr_tcf0 is 0 but there is no point in setting SCTLR_EL1.TCF0, 
since there should not be any access via TTBR0.

Kevin

> +		update_sctlr_el1_tcf0(next->thread.sctlr_tcf0);
> +}
> +
> +long set_mte_ctrl(unsigned long arg)
> +{
> +	u64 tcf0;
> +
> +	if (!system_supports_mte())
> +		return 0;
> +
> +	switch (arg & PR_MTE_TCF_MASK) {
> +	case PR_MTE_TCF_NONE:
> +		tcf0 = SCTLR_EL1_TCF0_NONE;
> +		break;
> +	case PR_MTE_TCF_SYNC:
> +		tcf0 = SCTLR_EL1_TCF0_SYNC;
> +		break;
> +	case PR_MTE_TCF_ASYNC:
> +		tcf0 = SCTLR_EL1_TCF0_ASYNC;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	set_sctlr_el1_tcf0(tcf0);
> +
> +	return 0;
> +}
> +
> +long get_mte_ctrl(void)
> +{
> +	if (!system_supports_mte())
> +		return 0;
> +
> +	switch (current->thread.sctlr_tcf0) {
> +	case SCTLR_EL1_TCF0_NONE:
> +		return PR_MTE_TCF_NONE;
> +	case SCTLR_EL1_TCF0_SYNC:
> +		return PR_MTE_TCF_SYNC;
> +	case SCTLR_EL1_TCF0_ASYNC:
> +		return PR_MTE_TCF_ASYNC;
> +	}
> +
> +	return 0;
>   }
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 695705d1f8e5..d19ce8053a03 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -544,6 +544,13 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
>   	 */
>   	dsb(ish);
>   
> +	/*
> +	 * MTE thread switching must happen after the DSB above to ensure that
> +	 * any asynchronous tag check faults have been logged in the TFSR*_EL1
> +	 * registers.
> +	 */
> +	mte_thread_switch(next);
> +
>   	/* the actual thread switch */
>   	last = cpu_switch_to(prev, next);
>   
> @@ -603,9 +610,15 @@ static unsigned int tagged_addr_disabled;
>   
>   long set_tagged_addr_ctrl(unsigned long arg)
>   {
> +	unsigned long valid_mask = PR_TAGGED_ADDR_ENABLE;
> +
>   	if (is_compat_task())
>   		return -EINVAL;
> -	if (arg & ~PR_TAGGED_ADDR_ENABLE)
> +
> +	if (system_supports_mte())
> +		valid_mask |= PR_MTE_TCF_MASK;
> +
> +	if (arg & ~valid_mask)
>   		return -EINVAL;
>   
>   	/*
> @@ -615,6 +628,9 @@ long set_tagged_addr_ctrl(unsigned long arg)
>   	if (arg & PR_TAGGED_ADDR_ENABLE && tagged_addr_disabled)
>   		return -EINVAL;
>   
> +	if (set_mte_ctrl(arg) != 0)
> +		return -EINVAL;
> +
>   	update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);
>   
>   	return 0;
> @@ -622,13 +638,17 @@ long set_tagged_addr_ctrl(unsigned long arg)
>   
>   long get_tagged_addr_ctrl(void)
>   {
> +	long ret = 0;
> +
>   	if (is_compat_task())
>   		return -EINVAL;
>   
>   	if (test_thread_flag(TIF_TAGGED_ADDR))
> -		return PR_TAGGED_ADDR_ENABLE;
> +		ret = PR_TAGGED_ADDR_ENABLE;
>   
> -	return 0;
> +	ret |= get_mte_ctrl();
> +
> +	return ret;
>   }
>   
>   /*
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 07b4f8131e36..2390ab324afa 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -233,6 +233,12 @@ struct prctl_mm_map {
>   #define PR_SET_TAGGED_ADDR_CTRL		55
>   #define PR_GET_TAGGED_ADDR_CTRL		56
>   # define PR_TAGGED_ADDR_ENABLE		(1UL << 0)
> +/* MTE tag check fault modes */
> +# define PR_MTE_TCF_SHIFT		1
> +# define PR_MTE_TCF_NONE		(0UL << PR_MTE_TCF_SHIFT)
> +# define PR_MTE_TCF_SYNC		(1UL << PR_MTE_TCF_SHIFT)
> +# define PR_MTE_TCF_ASYNC		(2UL << PR_MTE_TCF_SHIFT)
> +# define PR_MTE_TCF_MASK		(3UL << PR_MTE_TCF_SHIFT)
>   
>   /* Control reclaim behavior when allocating memory */
>   #define PR_SET_IO_FLUSHER		57



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl()
  2020-07-20 15:30   ` Kevin Brodsky
@ 2020-07-20 17:00     ` Dave Martin
  2020-07-22 10:28       ` Catalin Marinas
  2020-07-23 19:33       ` Kevin Brodsky
  2020-07-22 11:09     ` Catalin Marinas
  1 sibling, 2 replies; 53+ messages in thread
From: Dave Martin @ 2020-07-20 17:00 UTC (permalink / raw)
  To: Kevin Brodsky
  Cc: Catalin Marinas, linux-arm-kernel, linux-arch, Szabolcs Nagy,
	Andrey Konovalov, Peter Collingbourne, linux-mm, Andrew Morton,
	Vincenzo Frascino, Will Deacon

On Mon, Jul 20, 2020 at 04:30:35PM +0100, Kevin Brodsky wrote:
> On 15/07/2020 18:08, Catalin Marinas wrote:
> >By default, even if PROT_MTE is set on a memory range, there is no tag
> >check fault reporting (SIGSEGV). Introduce a set of option to the
> >exiting prctl(PR_SET_TAGGED_ADDR_CTRL) to allow user control of the tag
> >check fault mode:
> >
> >   PR_MTE_TCF_NONE  - no reporting (default)
> >   PR_MTE_TCF_SYNC  - synchronous tag check fault reporting
> >   PR_MTE_TCF_ASYNC - asynchronous tag check fault reporting
> >
> >These options translate into the corresponding SCTLR_EL1.TCF0 bitfield,
> >context-switched by the kernel. Note that uaccess done by the kernel is
> >not checked and cannot be configured by the user.
> >
> >Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> >Cc: Will Deacon <will@kernel.org>
> >---
> >
> >Notes:
> >     v3:
> >     - Use SCTLR_EL1_TCF0_NONE instead of 0 for consistency.
> >     - Move mte_thread_switch() in this patch from an earlier one. In
> >       addition, it is called after the dsb() in __switch_to() so that any
> >       asynchronous tag check faults have been registered in the TFSR_EL1
> >       registers (to be added with the in-kernel MTE support.
> >     v2:
> >     - Handle SCTLR_EL1_TCF0_NONE explicitly for consistency with PR_MTE_TCF_NONE.
> >     - Fix SCTLR_EL1 register setting in flush_mte_state() (thanks to Peter
> >       Collingbourne).
> >     - Added ISB to update_sctlr_el1_tcf0() since, with the latest
> >       architecture update/fix, the TCF0 field is used by the uaccess
> >       routines.

[...]

> >diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c

[...]

> >+void mte_thread_switch(struct task_struct *next)
> >+{
> >+	if (!system_supports_mte())
> >+		return;
> >+
> >+	/* avoid expensive SCTLR_EL1 accesses if no change */
> >+	if (current->thread.sctlr_tcf0 != next->thread.sctlr_tcf0)
> 
> I think this could be improved by checking whether `next` is a kernel
> thread, in which case thread.sctlr_tcf0 is 0 but there is no point in
> setting SCTLR_EL1.TCF0, since there should not be any access via TTBR0.

Out of interest, do we have a nice way of testing for a kernel thread
now?

I remember fpsimd_thread_switch() used to check for task->mm, but we
seem to have got rid of that at some point.  set_mm() can defeat this,
and anyway the heavy lifting for FPSIMD is now deferred until returning
to userspace.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl()
  2020-07-20 17:00     ` Dave Martin
@ 2020-07-22 10:28       ` Catalin Marinas
  2020-07-23 19:33       ` Kevin Brodsky
  1 sibling, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-22 10:28 UTC (permalink / raw)
  To: Dave Martin
  Cc: Kevin Brodsky, linux-arm-kernel, linux-arch, Szabolcs Nagy,
	Andrey Konovalov, Peter Collingbourne, linux-mm, Andrew Morton,
	Vincenzo Frascino, Will Deacon

On Mon, Jul 20, 2020 at 06:00:50PM +0100, Dave P Martin wrote:
> On Mon, Jul 20, 2020 at 04:30:35PM +0100, Kevin Brodsky wrote:
> > On 15/07/2020 18:08, Catalin Marinas wrote:
> > >+void mte_thread_switch(struct task_struct *next)
> > >+{
> > >+	if (!system_supports_mte())
> > >+		return;
> > >+
> > >+	/* avoid expensive SCTLR_EL1 accesses if no change */
> > >+	if (current->thread.sctlr_tcf0 != next->thread.sctlr_tcf0)
> > 
> > I think this could be improved by checking whether `next` is a kernel
> > thread, in which case thread.sctlr_tcf0 is 0 but there is no point in
> > setting SCTLR_EL1.TCF0, since there should not be any access via TTBR0.
> 
> Out of interest, do we have a nice way of testing for a kernel thread
> now?
> 
> I remember fpsimd_thread_switch() used to check for task->mm, but we
> seem to have got rid of that at some point.  set_mm() can defeat this,
> and anyway the heavy lifting for FPSIMD is now deferred until returning
> to userspace.

I think a better way is (current->flags & PF_KTHREAD). kthread_use_mm()
indeed changes current->mm.

-- 
Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl()
  2020-07-20 15:30   ` Kevin Brodsky
  2020-07-20 17:00     ` Dave Martin
@ 2020-07-22 11:09     ` Catalin Marinas
  1 sibling, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-07-22 11:09 UTC (permalink / raw)
  To: Kevin Brodsky
  Cc: linux-arm-kernel, linux-mm, linux-arch, Will Deacon,
	Dave P Martin, Vincenzo Frascino, Szabolcs Nagy,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

On Mon, Jul 20, 2020 at 04:30:35PM +0100, Kevin Brodsky wrote:
> On 15/07/2020 18:08, Catalin Marinas wrote:
> > +void mte_thread_switch(struct task_struct *next)
> > +{
> > +	if (!system_supports_mte())
> > +		return;
> > +
> > +	/* avoid expensive SCTLR_EL1 accesses if no change */
> > +	if (current->thread.sctlr_tcf0 != next->thread.sctlr_tcf0)
> 
> I think this could be improved by checking whether `next` is a kernel
> thread, in which case thread.sctlr_tcf0 is 0 but there is no point in
> setting SCTLR_EL1.TCF0, since there should not be any access via TTBR0.

It's not about kernel or user thread here. kthread_use_mm() (just
use_mm() in older kernels) would set an mm on a kernel thread,
temporarily making it behave as a user one. Since the sctlr_tcf0 is per
thread, not per mm, we need to switch to the default TCF0 for kthreads
so that user accesses (if use_mm() is called) don't generate any tag
check faults. Note that switch_mm() does not touch TCF0.

If we did allow a global, per-mm TCF0 setting, such kthreads could only
handle synchronous faults and no SIGSEGV generated (as we do with
copy_{from,to}_user() for normal threads).

If we want to revisit per-thread vs per-mm TCF0 setting, now is the
time.

-- 
Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl()
  2020-07-20 17:00     ` Dave Martin
  2020-07-22 10:28       ` Catalin Marinas
@ 2020-07-23 19:33       ` Kevin Brodsky
  1 sibling, 0 replies; 53+ messages in thread
From: Kevin Brodsky @ 2020-07-23 19:33 UTC (permalink / raw)
  To: Dave Martin
  Cc: Catalin Marinas, linux-arm-kernel, linux-arch, Szabolcs Nagy,
	Andrey Konovalov, Peter Collingbourne, linux-mm, Andrew Morton,
	Vincenzo Frascino, Will Deacon

On 20/07/2020 18:00, Dave Martin wrote:
> On Mon, Jul 20, 2020 at 04:30:35PM +0100, Kevin Brodsky wrote:
>> On 15/07/2020 18:08, Catalin Marinas wrote:
>>> By default, even if PROT_MTE is set on a memory range, there is no tag
>>> check fault reporting (SIGSEGV). Introduce a set of option to the
>>> exiting prctl(PR_SET_TAGGED_ADDR_CTRL) to allow user control of the tag
>>> check fault mode:
>>>
>>>    PR_MTE_TCF_NONE  - no reporting (default)
>>>    PR_MTE_TCF_SYNC  - synchronous tag check fault reporting
>>>    PR_MTE_TCF_ASYNC - asynchronous tag check fault reporting
>>>
>>> These options translate into the corresponding SCTLR_EL1.TCF0 bitfield,
>>> context-switched by the kernel. Note that uaccess done by the kernel is
>>> not checked and cannot be configured by the user.
>>>
>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> ---
>>>
>>> Notes:
>>>      v3:
>>>      - Use SCTLR_EL1_TCF0_NONE instead of 0 for consistency.
>>>      - Move mte_thread_switch() in this patch from an earlier one. In
>>>        addition, it is called after the dsb() in __switch_to() so that any
>>>        asynchronous tag check faults have been registered in the TFSR_EL1
>>>        registers (to be added with the in-kernel MTE support.
>>>      v2:
>>>      - Handle SCTLR_EL1_TCF0_NONE explicitly for consistency with PR_MTE_TCF_NONE.
>>>      - Fix SCTLR_EL1 register setting in flush_mte_state() (thanks to Peter
>>>        Collingbourne).
>>>      - Added ISB to update_sctlr_el1_tcf0() since, with the latest
>>>        architecture update/fix, the TCF0 field is used by the uaccess
>>>        routines.
> [...]
>
>>> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> [...]
>
>>> +void mte_thread_switch(struct task_struct *next)
>>> +{
>>> +	if (!system_supports_mte())
>>> +		return;
>>> +
>>> +	/* avoid expensive SCTLR_EL1 accesses if no change */
>>> +	if (current->thread.sctlr_tcf0 != next->thread.sctlr_tcf0)
>> I think this could be improved by checking whether `next` is a kernel
>> thread, in which case thread.sctlr_tcf0 is 0 but there is no point in
>> setting SCTLR_EL1.TCF0, since there should not be any access via TTBR0.
> Out of interest, do we have a nice way of testing for a kernel thread
> now?

Isn't it as simple as checking if PF_KTHREAD is set in tsk->flags? At least this is 
what ssbs_thread_switch() does.

Kevin

> I remember fpsimd_thread_switch() used to check for task->mm, but we
> seem to have got rid of that at some point.  set_mm() can defeat this,
> and anyway the heavy lifting for FPSIMD is now deferred until returning
> to userspace.
>
> Cheers
> ---Dave



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-07-15 17:08 ` [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation Catalin Marinas
@ 2020-07-27 16:36   ` Szabolcs Nagy
  2020-07-28 11:08     ` Dave Martin
  0 siblings, 1 reply; 53+ messages in thread
From: Szabolcs Nagy @ 2020-07-27 16:36 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux-mm, linux-arch, Will Deacon,
	Dave P Martin, Vincenzo Frascino, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

The 07/15/2020 18:08, Catalin Marinas wrote:
> From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> 
> Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
> a mechanism to detect the sources of memory related errors which
> may be vulnerable to exploitation, including bounds violations,
> use-after-free, use-after-return, use-out-of-scope and use before
> initialization errors.
> 
> Add Memory Tagging Extension documentation for the arm64 linux
> kernel support.
> 
> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
> 
> Notes:
>     v7:
>     - Add information on ptrace() regset access (NT_ARM_TAGGED_ADDR_CTRL).
>     
>     v4:
>     - Document behaviour of madvise(MADV_DONTNEED/MADV_FREE).
>     - Document the initial process state on fork/execve.
>     - Clarify when the kernel uaccess checks the tags.
>     - Minor updates to the example code.
>     - A few other minor clean-ups following review.
>     
>     v3:
>     - Modify the uaccess checking conditions: only when the sync mode is
>       selected by the user. In async mode, the kernel uaccesses are not
>       checked.
>     - Clarify that an include mask of 0 (exclude mask 0xffff) results in
>       always generating tag 0.
>     - Document the ptrace() interface.
>     
>     v2:
>     - Documented the uaccess kernel tag checking mode.
>     - Removed the BTI definitions from cpu-feature-registers.rst.
>     - Removed the paragraph stating that MTE depends on the tagged address
>       ABI (while the Kconfig entry does, there is no requirement for the
>       user to enable both).
>     - Changed the GCR_EL1.Exclude handling description following the change
>       in the prctl() interface (include vs exclude mask).
>     - Updated the example code.
> 
>  Documentation/arm64/cpu-feature-registers.rst |   2 +
>  Documentation/arm64/elf_hwcaps.rst            |   4 +
>  Documentation/arm64/index.rst                 |   1 +
>  .../arm64/memory-tagging-extension.rst        | 305 ++++++++++++++++++
>  4 files changed, 312 insertions(+)
>  create mode 100644 Documentation/arm64/memory-tagging-extension.rst
> 
> diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
...
> +Tag Check Faults
> +----------------
> +
> +When ``PROT_MTE`` is enabled on an address range and a mismatch between
> +the logical and allocation tags occurs on access, there are three
> +configurable behaviours:
> +
> +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
> +  tag check fault.
> +
> +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
> +  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
> +  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
> +  by the offending thread, the containing process is terminated with a
> +  ``coredump``.
> +
> +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
> +  thread, asynchronously following one or multiple tag check faults,
> +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
> +  address is unknown).
> +
> +The user can select the above modes, per thread, using the
> +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
> +``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> +bit-field:
> +
> +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
> +- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
> +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
> +
> +The current tag check fault mode can be read using the
> +``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.

we discussed the need for per process prctl off list, i will
try to summarize the requirement here:

- it cannot be guaranteed in general that a library initializer
or first call into a library happens when the process is still
single threaded.

- user code currently has no way to call prctl in all threads of
a process and even within the c runtime doing so is problematic
(it has to signal all threads, which requires a reserved signal
and dealing with exiting threads and signal masks, such mechanism
can break qemu user and various other userspace tooling).

- we don't yet have defined contract in userspace about how user
code may enable mte (i.e. use the prctl call), but it seems that
there will be use cases for it: LD_PRELOADing malloc for heap
tagging is one such case, but any library or custom allocator
that wants to use mte will have this issue: when it enables mte
it wants to enable it for all threads in the process. (or at
least all threads managed by the c runtime).

- even if user code is not allowed to call the prctl directly,
i.e. the prctl settings are owned by the libc, there will be
cases when the settings have to be changed in a multithreaded
process (e.g. dlopening a library that requires a particular
mte state).

a solution is to introduce a flag like SECCOMP_FILTER_FLAG_TSYNC
that means the prctl is for all threads in the process not just
for the current one. however the exact semantics is not obvious
if there are inconsistent settings in different threads or user
code tries to use the prctl concurrently: first checking then
setting the mte state via separate prctl calls is racy. but if
the userspace contract for enabling mte limits who and when can
call the prctl then i think the simple sync flag approach works.

(the sync flag should apply to all prctl settings: tagged addr
syscall abi, mte check fault mode, irg tag excludes. ideally it
would work for getting the process wide state and it would fail
in case of inconsistent settings.)

we may need to document some memory ordering details when
memory accesses in other threads are affected, but i think
that can be something simple that leaves it unspecified
what happens with memory accesses that are not synchrnized
with the prctl call.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-07-27 16:36   ` Szabolcs Nagy
@ 2020-07-28 11:08     ` Dave Martin
  2020-07-28 14:53       ` Szabolcs Nagy
  0 siblings, 1 reply; 53+ messages in thread
From: Dave Martin @ 2020-07-28 11:08 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Catalin Marinas, linux-arch, Peter Collingbourne,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Andrew Morton,
	Vincenzo Frascino, Will Deacon, linux-arm-kernel

On Mon, Jul 27, 2020 at 05:36:35PM +0100, Szabolcs Nagy wrote:
> The 07/15/2020 18:08, Catalin Marinas wrote:
> > From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > 
> > Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
> > a mechanism to detect the sources of memory related errors which
> > may be vulnerable to exploitation, including bounds violations,
> > use-after-free, use-after-return, use-out-of-scope and use before
> > initialization errors.
> > 
> > Add Memory Tagging Extension documentation for the arm64 linux
> > kernel support.
> > 
> > Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > ---
> > 
> > Notes:
> >     v7:
> >     - Add information on ptrace() regset access (NT_ARM_TAGGED_ADDR_CTRL).
> >     
> >     v4:
> >     - Document behaviour of madvise(MADV_DONTNEED/MADV_FREE).
> >     - Document the initial process state on fork/execve.
> >     - Clarify when the kernel uaccess checks the tags.
> >     - Minor updates to the example code.
> >     - A few other minor clean-ups following review.
> >     
> >     v3:
> >     - Modify the uaccess checking conditions: only when the sync mode is
> >       selected by the user. In async mode, the kernel uaccesses are not
> >       checked.
> >     - Clarify that an include mask of 0 (exclude mask 0xffff) results in
> >       always generating tag 0.
> >     - Document the ptrace() interface.
> >     
> >     v2:
> >     - Documented the uaccess kernel tag checking mode.
> >     - Removed the BTI definitions from cpu-feature-registers.rst.
> >     - Removed the paragraph stating that MTE depends on the tagged address
> >       ABI (while the Kconfig entry does, there is no requirement for the
> >       user to enable both).
> >     - Changed the GCR_EL1.Exclude handling description following the change
> >       in the prctl() interface (include vs exclude mask).
> >     - Updated the example code.
> > 
> >  Documentation/arm64/cpu-feature-registers.rst |   2 +
> >  Documentation/arm64/elf_hwcaps.rst            |   4 +
> >  Documentation/arm64/index.rst                 |   1 +
> >  .../arm64/memory-tagging-extension.rst        | 305 ++++++++++++++++++
> >  4 files changed, 312 insertions(+)
> >  create mode 100644 Documentation/arm64/memory-tagging-extension.rst
> > 
> > diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
> ...
> > +Tag Check Faults
> > +----------------
> > +
> > +When ``PROT_MTE`` is enabled on an address range and a mismatch between
> > +the logical and allocation tags occurs on access, there are three
> > +configurable behaviours:
> > +
> > +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
> > +  tag check fault.
> > +
> > +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
> > +  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
> > +  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
> > +  by the offending thread, the containing process is terminated with a
> > +  ``coredump``.
> > +
> > +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
> > +  thread, asynchronously following one or multiple tag check faults,
> > +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
> > +  address is unknown).
> > +
> > +The user can select the above modes, per thread, using the
> > +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
> > +``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> > +bit-field:
> > +
> > +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
> > +- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
> > +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
> > +
> > +The current tag check fault mode can be read using the
> > +``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
> 
> we discussed the need for per process prctl off list, i will
> try to summarize the requirement here:
> 
> - it cannot be guaranteed in general that a library initializer
> or first call into a library happens when the process is still
> single threaded.
> 
> - user code currently has no way to call prctl in all threads of
> a process and even within the c runtime doing so is problematic
> (it has to signal all threads, which requires a reserved signal
> and dealing with exiting threads and signal masks, such mechanism
> can break qemu user and various other userspace tooling).

When working on the SVE support, I came to the conclusion that this
kind of thing would normally either be done by the runtime itself, or in
close cooperation with the runtime.  However, for SVE it never makes
sense for one thread to asynchronously change the vector length of
another thread -- that's different from the MTE situation.

> - we don't yet have defined contract in userspace about how user
> code may enable mte (i.e. use the prctl call), but it seems that
> there will be use cases for it: LD_PRELOADing malloc for heap
> tagging is one such case, but any library or custom allocator
> that wants to use mte will have this issue: when it enables mte
> it wants to enable it for all threads in the process. (or at
> least all threads managed by the c runtime).

What are the situations where we anticipate a need to twiddle MTE in
multiple threads simultaneously, other than during process startup?

> - even if user code is not allowed to call the prctl directly,
> i.e. the prctl settings are owned by the libc, there will be
> cases when the settings have to be changed in a multithreaded
> process (e.g. dlopening a library that requires a particular
> mte state).

Could be avoided by refusing to dlopen a library that is incompatible
with the current process.

dlopen()ing a library that doesn't support tagged addresses, in a
process that does use tagged addresses, seems undesirable even if tag
checking is currently turned off.


> a solution is to introduce a flag like SECCOMP_FILTER_FLAG_TSYNC
> that means the prctl is for all threads in the process not just
> for the current one. however the exact semantics is not obvious
> if there are inconsistent settings in different threads or user
> code tries to use the prctl concurrently: first checking then
> setting the mte state via separate prctl calls is racy. but if
> the userspace contract for enabling mte limits who and when can
> call the prctl then i think the simple sync flag approach works.
> 
> (the sync flag should apply to all prctl settings: tagged addr
> syscall abi, mte check fault mode, irg tag excludes. ideally it
> would work for getting the process wide state and it would fail
> in case of inconsistent settings.)

If going down this route, perhaps we could have sets of settings:
so for each setting we have a process-wide value and a per-thread
value, with defines rules about how they combine.

Since MTE is a debugging feature, we might be able to be less aggressive
about synchronisation than in the SECCOMP case.

> we may need to document some memory ordering details when
> memory accesses in other threads are affected, but i think
> that can be something simple that leaves it unspecified
> what happens with memory accesses that are not synchrnized
> with the prctl call.

Hmmm...

Cheers
---Dave


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-07-28 11:08     ` Dave Martin
@ 2020-07-28 14:53       ` Szabolcs Nagy
  2020-07-28 19:59         ` Catalin Marinas
  0 siblings, 1 reply; 53+ messages in thread
From: Szabolcs Nagy @ 2020-07-28 14:53 UTC (permalink / raw)
  To: Dave Martin
  Cc: Catalin Marinas, linux-arch, Peter Collingbourne,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Andrew Morton,
	Vincenzo Frascino, Will Deacon, linux-arm-kernel, nd

The 07/28/2020 12:08, Dave Martin wrote:
> On Mon, Jul 27, 2020 at 05:36:35PM +0100, Szabolcs Nagy wrote:
> > The 07/15/2020 18:08, Catalin Marinas wrote:
> > > +The user can select the above modes, per thread, using the
> > > +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
> > > +``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> > > +bit-field:
> > > +
> > > +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
> > > +- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
> > > +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
> > > +
> > > +The current tag check fault mode can be read using the
> > > +``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
> > 
> > we discussed the need for per process prctl off list, i will
> > try to summarize the requirement here:
> > 
> > - it cannot be guaranteed in general that a library initializer
> > or first call into a library happens when the process is still
> > single threaded.
> > 
> > - user code currently has no way to call prctl in all threads of
> > a process and even within the c runtime doing so is problematic
> > (it has to signal all threads, which requires a reserved signal
> > and dealing with exiting threads and signal masks, such mechanism
> > can break qemu user and various other userspace tooling).
> 
> When working on the SVE support, I came to the conclusion that this
> kind of thing would normally either be done by the runtime itself, or in
> close cooperation with the runtime.  However, for SVE it never makes
> sense for one thread to asynchronously change the vector length of
> another thread -- that's different from the MTE situation.

currently there is libc mechanism to do some operation
in all threads (e.g. for set*id) but this is fragile
and not something that can be exposed to user code.
(on the kernel side it should be much simpler to do)

> > - we don't yet have defined contract in userspace about how user
> > code may enable mte (i.e. use the prctl call), but it seems that
> > there will be use cases for it: LD_PRELOADing malloc for heap
> > tagging is one such case, but any library or custom allocator
> > that wants to use mte will have this issue: when it enables mte
> > it wants to enable it for all threads in the process. (or at
> > least all threads managed by the c runtime).
> 
> What are the situations where we anticipate a need to twiddle MTE in
> multiple threads simultaneously, other than during process startup?
> 
> > - even if user code is not allowed to call the prctl directly,
> > i.e. the prctl settings are owned by the libc, there will be
> > cases when the settings have to be changed in a multithreaded
> > process (e.g. dlopening a library that requires a particular
> > mte state).
> 
> Could be avoided by refusing to dlopen a library that is incompatible
> with the current process.
> 
> dlopen()ing a library that doesn't support tagged addresses, in a
> process that does use tagged addresses, seems undesirable even if tag
> checking is currently turned off.

yes but it can go the other way too:

at startup the libc does not enable tag checks
for performance reasons, but at dlopen time a
library is detected to use mte (e.g. stack
tagging or custom allocator).

then libc or the dlopened library has to ensure
that checks are enabled in all threads. (in case
of stack tagging the libc has to mark existing
stacks with PROT_MTE too, there is mechanism for
this in glibc to deal with dlopened libraries
that require executable stack and only reject
the dlopen if this cannot be performed.)

another usecase is that the libc is mte-safe
(it accepts tagged pointers and memory in its
interfaces), but it does not enable mte (this
will be the case with glibc 2.32) and user
libraries have to enable mte to use it (custom
allocator or malloc interposition are examples).

and i think this is necessary if userpsace wants
to turn async tag check into sync tag check at
runtime when a failure is detected.

> > a solution is to introduce a flag like SECCOMP_FILTER_FLAG_TSYNC
> > that means the prctl is for all threads in the process not just
> > for the current one. however the exact semantics is not obvious
> > if there are inconsistent settings in different threads or user
> > code tries to use the prctl concurrently: first checking then
> > setting the mte state via separate prctl calls is racy. but if
> > the userspace contract for enabling mte limits who and when can
> > call the prctl then i think the simple sync flag approach works.
> > 
> > (the sync flag should apply to all prctl settings: tagged addr
> > syscall abi, mte check fault mode, irg tag excludes. ideally it
> > would work for getting the process wide state and it would fail
> > in case of inconsistent settings.)
> 
> If going down this route, perhaps we could have sets of settings:
> so for each setting we have a process-wide value and a per-thread
> value, with defines rules about how they combine.
> 
> Since MTE is a debugging feature, we might be able to be less aggressive
> about synchronisation than in the SECCOMP case.

separate process-wide and per-thread value
works for me and i expect most uses will
be process wide settings.

i don't think mte is less of a security
feature than seccomp.

if linux does not want to add a per process
setting then only libc will be able to opt-in
to mte and only at very early in the startup
process (before executing any user code that
may start threads). this is not out of question,
but i think it limits the usage and deployment
options.

> > we may need to document some memory ordering details when
> > memory accesses in other threads are affected, but i think
> > that can be something simple that leaves it unspecified
> > what happens with memory accesses that are not synchrnized
> > with the prctl call.
> 
> Hmmm...

e.g. it may be enough if the spec only works if
there is no PROT_MTE memory mapped yet, and no
tagged addresses are present in the multi-threaded
process when the prctl is called.



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-07-28 14:53       ` Szabolcs Nagy
@ 2020-07-28 19:59         ` Catalin Marinas
  2020-08-03 12:43           ` Szabolcs Nagy
  0 siblings, 1 reply; 53+ messages in thread
From: Catalin Marinas @ 2020-07-28 19:59 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Dave Martin, linux-arch, Peter Collingbourne, Andrey Konovalov,
	Kevin Brodsky, linux-mm, Andrew Morton, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

On Tue, Jul 28, 2020 at 03:53:51PM +0100, Szabolcs Nagy wrote:
> The 07/28/2020 12:08, Dave Martin wrote:
> > On Mon, Jul 27, 2020 at 05:36:35PM +0100, Szabolcs Nagy wrote:
> > > a solution is to introduce a flag like SECCOMP_FILTER_FLAG_TSYNC
> > > that means the prctl is for all threads in the process not just
> > > for the current one. however the exact semantics is not obvious
> > > if there are inconsistent settings in different threads or user
> > > code tries to use the prctl concurrently: first checking then
> > > setting the mte state via separate prctl calls is racy. but if
> > > the userspace contract for enabling mte limits who and when can
> > > call the prctl then i think the simple sync flag approach works.
> > > 
> > > (the sync flag should apply to all prctl settings: tagged addr
> > > syscall abi, mte check fault mode, irg tag excludes. ideally it
> > > would work for getting the process wide state and it would fail
> > > in case of inconsistent settings.)
> > 
> > If going down this route, perhaps we could have sets of settings:
> > so for each setting we have a process-wide value and a per-thread
> > value, with defines rules about how they combine.
> > 
> > Since MTE is a debugging feature, we might be able to be less aggressive
> > about synchronisation than in the SECCOMP case.
> 
> separate process-wide and per-thread value
> works for me and i expect most uses will
> be process wide settings.

The problem with the thread synchronisation is, unlike SECCOMP, that we
need to update the SCTLR_EL1.TCF0 field across all the CPUs that may run
threads of the current process. I haven't convinced myself that this is
race-free without heavy locking. If we go for some heavy mechanism like
stop_machine(), that opens the kernel to DoS attacks from user. Still
investigating if something like membarrier() would be sufficient.

SECCOMP gets away with this as it only needs to set some variable
without IPI'ing the other CPUs.

> i don't think mte is less of a security
> feature than seccomp.

Well, MTE is probabilistic, SECCOMP seems to be more precise ;).

> if linux does not want to add a per process
> setting then only libc will be able to opt-in
> to mte and only at very early in the startup
> process (before executing any user code that
> may start threads). this is not out of question,
> but i think it limits the usage and deployment
> options.

There is also the risk that we try to be too flexible at this stage
without a real use-case.

-- 
Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-07-28 19:59         ` Catalin Marinas
@ 2020-08-03 12:43           ` Szabolcs Nagy
  2020-08-07 15:19             ` Catalin Marinas
  0 siblings, 1 reply; 53+ messages in thread
From: Szabolcs Nagy @ 2020-08-03 12:43 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Dave Martin, linux-arch, Peter Collingbourne, Andrey Konovalov,
	Kevin Brodsky, linux-mm, Andrew Morton, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

The 07/28/2020 20:59, Catalin Marinas wrote:
> On Tue, Jul 28, 2020 at 03:53:51PM +0100, Szabolcs Nagy wrote:
> > if linux does not want to add a per process
> > setting then only libc will be able to opt-in
> > to mte and only at very early in the startup
> > process (before executing any user code that
> > may start threads). this is not out of question,
> > but i think it limits the usage and deployment
> > options.
> 
> There is also the risk that we try to be too flexible at this stage
> without a real use-case.

i don't know how mte will be turned on in libc.

if we can always turn sync tag checks on early
whenever mte is available then i think there is
no issue.

but if we have to make the decision later for
compatibility or performance reasons then per
thread setting is problematic.

use of the prctl outside of libc is very limited
if it's per thread only:

- application code may use it in a (elf specific)
  pre-initialization function, but that's a bit
  obscure (not exposed in c) and it is reasonable
  for an application to enable mte checks after
  it registered a signal handler for mte faults.
  (and at that point it may be multi-threaded).

- library code normally initializes per thread
  state on the first call into the library from a
  given thread, but with mte, as soon as memory /
  pointers are tagged in one thread, all threads
  are affected: not performing checks in other
  threads is less secure (may be ok) and it means
  incompatible syscall abi (not ok). so at least
  PR_TAGGED_ADDR_ENABLE should have process wide
  setting for this usage.

but i guess it is fine to design the mechanism
for these in a later linux version, until then
such usage will be unreliable (will depend on
how early threads are created).


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl()
  2020-07-15 17:08 ` [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl() Catalin Marinas
  2020-07-20 15:30   ` Kevin Brodsky
@ 2020-08-04 19:34   ` Kevin Brodsky
  2020-08-05  9:24     ` Catalin Marinas
  1 sibling, 1 reply; 53+ messages in thread
From: Kevin Brodsky @ 2020-08-04 19:34 UTC (permalink / raw)
  To: Catalin Marinas, linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Andrey Konovalov,
	Peter Collingbourne, Andrew Morton

On 15/07/2020 18:08, Catalin Marinas wrote:
> By default, even if PROT_MTE is set on a memory range, there is no tag
> check fault reporting (SIGSEGV). Introduce a set of option to the
> exiting prctl(PR_SET_TAGGED_ADDR_CTRL) to allow user control of the tag
> check fault mode:
>
>    PR_MTE_TCF_NONE  - no reporting (default)
>    PR_MTE_TCF_SYNC  - synchronous tag check fault reporting
>    PR_MTE_TCF_ASYNC - asynchronous tag check fault reporting
>
> These options translate into the corresponding SCTLR_EL1.TCF0 bitfield,
> context-switched by the kernel. Note that uaccess done by the kernel is
> not checked and cannot be configured by the user.

The last sentence is outdated, it should probably say that uaccess is only checked in 
in synchronous mode.

Kevin

> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>
> Notes:
>      v3:
>      - Use SCTLR_EL1_TCF0_NONE instead of 0 for consistency.
>      - Move mte_thread_switch() in this patch from an earlier one. In
>        addition, it is called after the dsb() in __switch_to() so that any
>        asynchronous tag check faults have been registered in the TFSR_EL1
>        registers (to be added with the in-kernel MTE support.
>      
>      v2:
>      - Handle SCTLR_EL1_TCF0_NONE explicitly for consistency with PR_MTE_TCF_NONE.
>      - Fix SCTLR_EL1 register setting in flush_mte_state() (thanks to Peter
>        Collingbourne).
>      - Added ISB to update_sctlr_el1_tcf0() since, with the latest
>        architecture update/fix, the TCF0 field is used by the uaccess
>        routines.
>
>   arch/arm64/include/asm/mte.h       | 14 ++++++
>   arch/arm64/include/asm/processor.h |  3 ++
>   arch/arm64/kernel/mte.c            | 77 ++++++++++++++++++++++++++++++
>   arch/arm64/kernel/process.c        | 26 ++++++++--
>   include/uapi/linux/prctl.h         |  6 +++
>   5 files changed, 123 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index b2577eee62c2..df2efbc9f8f1 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -21,6 +21,9 @@ void mte_clear_page_tags(void *addr);
>   void mte_sync_tags(pte_t *ptep, pte_t pte);
>   void mte_copy_page_tags(void *kto, const void *kfrom);
>   void flush_mte_state(void);
> +void mte_thread_switch(struct task_struct *next);
> +long set_mte_ctrl(unsigned long arg);
> +long get_mte_ctrl(void);
>   
>   #else
>   
> @@ -36,6 +39,17 @@ static inline void mte_copy_page_tags(void *kto, const void *kfrom)
>   static inline void flush_mte_state(void)
>   {
>   }
> +static inline void mte_thread_switch(struct task_struct *next)
> +{
> +}
> +static inline long set_mte_ctrl(unsigned long arg)
> +{
> +	return 0;
> +}
> +static inline long get_mte_ctrl(void)
> +{
> +	return 0;
> +}
>   
>   #endif
>   
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 240fe5e5b720..80e7f0573309 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -151,6 +151,9 @@ struct thread_struct {
>   	struct ptrauth_keys_user	keys_user;
>   	struct ptrauth_keys_kernel	keys_kernel;
>   #endif
> +#ifdef CONFIG_ARM64_MTE
> +	u64			sctlr_tcf0;
> +#endif
>   };
>   
>   static inline void arch_thread_struct_whitelist(unsigned long *offset,
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index 5f54fd140610..375483a1f573 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -5,6 +5,8 @@
>   
>   #include <linux/bitops.h>
>   #include <linux/mm.h>
> +#include <linux/prctl.h>
> +#include <linux/sched.h>
>   #include <linux/string.h>
>   #include <linux/thread_info.h>
>   
> @@ -49,6 +51,26 @@ int memcmp_pages(struct page *page1, struct page *page2)
>   	return ret;
>   }
>   
> +static void update_sctlr_el1_tcf0(u64 tcf0)
> +{
> +	/* ISB required for the kernel uaccess routines */
> +	sysreg_clear_set(sctlr_el1, SCTLR_EL1_TCF0_MASK, tcf0);
> +	isb();
> +}
> +
> +static void set_sctlr_el1_tcf0(u64 tcf0)
> +{
> +	/*
> +	 * mte_thread_switch() checks current->thread.sctlr_tcf0 as an
> +	 * optimisation. Disable preemption so that it does not see
> +	 * the variable update before the SCTLR_EL1.TCF0 one.
> +	 */
> +	preempt_disable();
> +	current->thread.sctlr_tcf0 = tcf0;
> +	update_sctlr_el1_tcf0(tcf0);
> +	preempt_enable();
> +}
> +
>   void flush_mte_state(void)
>   {
>   	if (!system_supports_mte())
> @@ -58,4 +80,59 @@ void flush_mte_state(void)
>   	dsb(ish);
>   	write_sysreg_s(0, SYS_TFSRE0_EL1);
>   	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
> +	/* disable tag checking */
> +	set_sctlr_el1_tcf0(SCTLR_EL1_TCF0_NONE);
> +}
> +
> +void mte_thread_switch(struct task_struct *next)
> +{
> +	if (!system_supports_mte())
> +		return;
> +
> +	/* avoid expensive SCTLR_EL1 accesses if no change */
> +	if (current->thread.sctlr_tcf0 != next->thread.sctlr_tcf0)
> +		update_sctlr_el1_tcf0(next->thread.sctlr_tcf0);
> +}
> +
> +long set_mte_ctrl(unsigned long arg)
> +{
> +	u64 tcf0;
> +
> +	if (!system_supports_mte())
> +		return 0;
> +
> +	switch (arg & PR_MTE_TCF_MASK) {
> +	case PR_MTE_TCF_NONE:
> +		tcf0 = SCTLR_EL1_TCF0_NONE;
> +		break;
> +	case PR_MTE_TCF_SYNC:
> +		tcf0 = SCTLR_EL1_TCF0_SYNC;
> +		break;
> +	case PR_MTE_TCF_ASYNC:
> +		tcf0 = SCTLR_EL1_TCF0_ASYNC;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	set_sctlr_el1_tcf0(tcf0);
> +
> +	return 0;
> +}
> +
> +long get_mte_ctrl(void)
> +{
> +	if (!system_supports_mte())
> +		return 0;
> +
> +	switch (current->thread.sctlr_tcf0) {
> +	case SCTLR_EL1_TCF0_NONE:
> +		return PR_MTE_TCF_NONE;
> +	case SCTLR_EL1_TCF0_SYNC:
> +		return PR_MTE_TCF_SYNC;
> +	case SCTLR_EL1_TCF0_ASYNC:
> +		return PR_MTE_TCF_ASYNC;
> +	}
> +
> +	return 0;
>   }
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 695705d1f8e5..d19ce8053a03 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -544,6 +544,13 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
>   	 */
>   	dsb(ish);
>   
> +	/*
> +	 * MTE thread switching must happen after the DSB above to ensure that
> +	 * any asynchronous tag check faults have been logged in the TFSR*_EL1
> +	 * registers.
> +	 */
> +	mte_thread_switch(next);
> +
>   	/* the actual thread switch */
>   	last = cpu_switch_to(prev, next);
>   
> @@ -603,9 +610,15 @@ static unsigned int tagged_addr_disabled;
>   
>   long set_tagged_addr_ctrl(unsigned long arg)
>   {
> +	unsigned long valid_mask = PR_TAGGED_ADDR_ENABLE;
> +
>   	if (is_compat_task())
>   		return -EINVAL;
> -	if (arg & ~PR_TAGGED_ADDR_ENABLE)
> +
> +	if (system_supports_mte())
> +		valid_mask |= PR_MTE_TCF_MASK;
> +
> +	if (arg & ~valid_mask)
>   		return -EINVAL;
>   
>   	/*
> @@ -615,6 +628,9 @@ long set_tagged_addr_ctrl(unsigned long arg)
>   	if (arg & PR_TAGGED_ADDR_ENABLE && tagged_addr_disabled)
>   		return -EINVAL;
>   
> +	if (set_mte_ctrl(arg) != 0)
> +		return -EINVAL;
> +
>   	update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);
>   
>   	return 0;
> @@ -622,13 +638,17 @@ long set_tagged_addr_ctrl(unsigned long arg)
>   
>   long get_tagged_addr_ctrl(void)
>   {
> +	long ret = 0;
> +
>   	if (is_compat_task())
>   		return -EINVAL;
>   
>   	if (test_thread_flag(TIF_TAGGED_ADDR))
> -		return PR_TAGGED_ADDR_ENABLE;
> +		ret = PR_TAGGED_ADDR_ENABLE;
>   
> -	return 0;
> +	ret |= get_mte_ctrl();
> +
> +	return ret;
>   }
>   
>   /*
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 07b4f8131e36..2390ab324afa 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -233,6 +233,12 @@ struct prctl_mm_map {
>   #define PR_SET_TAGGED_ADDR_CTRL		55
>   #define PR_GET_TAGGED_ADDR_CTRL		56
>   # define PR_TAGGED_ADDR_ENABLE		(1UL << 0)
> +/* MTE tag check fault modes */
> +# define PR_MTE_TCF_SHIFT		1
> +# define PR_MTE_TCF_NONE		(0UL << PR_MTE_TCF_SHIFT)
> +# define PR_MTE_TCF_SYNC		(1UL << PR_MTE_TCF_SHIFT)
> +# define PR_MTE_TCF_ASYNC		(2UL << PR_MTE_TCF_SHIFT)
> +# define PR_MTE_TCF_MASK		(3UL << PR_MTE_TCF_SHIFT)
>   
>   /* Control reclaim behavior when allocating memory */
>   #define PR_SET_IO_FLUSHER		57



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl()
  2020-08-04 19:34   ` Kevin Brodsky
@ 2020-08-05  9:24     ` Catalin Marinas
  0 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-08-05  9:24 UTC (permalink / raw)
  To: Kevin Brodsky
  Cc: linux-arm-kernel, linux-mm, linux-arch, Will Deacon,
	Dave P Martin, Vincenzo Frascino, Szabolcs Nagy,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton

On Tue, Aug 04, 2020 at 08:34:42PM +0100, Kevin Brodsky wrote:
> On 15/07/2020 18:08, Catalin Marinas wrote:
> > By default, even if PROT_MTE is set on a memory range, there is no tag
> > check fault reporting (SIGSEGV). Introduce a set of option to the
> > exiting prctl(PR_SET_TAGGED_ADDR_CTRL) to allow user control of the tag
> > check fault mode:
> > 
> >    PR_MTE_TCF_NONE  - no reporting (default)
> >    PR_MTE_TCF_SYNC  - synchronous tag check fault reporting
> >    PR_MTE_TCF_ASYNC - asynchronous tag check fault reporting
> > 
> > These options translate into the corresponding SCTLR_EL1.TCF0 bitfield,
> > context-switched by the kernel. Note that uaccess done by the kernel is
> > not checked and cannot be configured by the user.
> 
> The last sentence is outdated, it should probably say that uaccess is only
> checked in in synchronous mode.

Thanks, I forgot about the commit log. The documentation was updated to:

**Note**: Kernel accesses to the user address space (e.g. ``read()``
system call) are not checked if the user thread tag checking mode is
``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
address accesses, however it cannot always guarantee it.

-- 
Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-08-03 12:43           ` Szabolcs Nagy
@ 2020-08-07 15:19             ` Catalin Marinas
  2020-08-10 14:13               ` Szabolcs Nagy
  0 siblings, 1 reply; 53+ messages in thread
From: Catalin Marinas @ 2020-08-07 15:19 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Dave Martin, linux-arch, Peter Collingbourne, Andrey Konovalov,
	Kevin Brodsky, linux-mm, Andrew Morton, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

Hi Szabolcs,

On Mon, Aug 03, 2020 at 01:43:10PM +0100, Szabolcs Nagy wrote:
> The 07/28/2020 20:59, Catalin Marinas wrote:
> > On Tue, Jul 28, 2020 at 03:53:51PM +0100, Szabolcs Nagy wrote:
> > > if linux does not want to add a per process setting then only libc
> > > will be able to opt-in to mte and only at very early in the
> > > startup process (before executing any user code that may start
> > > threads). this is not out of question, but i think it limits the
> > > usage and deployment options.
> > 
> > There is also the risk that we try to be too flexible at this stage
> > without a real use-case.
> 
> i don't know how mte will be turned on in libc.
> 
> if we can always turn sync tag checks on early whenever mte is
> available then i think there is no issue.
> 
> but if we have to make the decision later for compatibility or
> performance reasons then per thread setting is problematic.

At least for libc, I'm not sure how you could even turn MTE on at
run-time. The heap allocations would have to be mapped with PROT_MTE as
we can't easily change them (well, you could mprotect(), assuming the
user doesn't use tagged pointers on them).

There is a case to switch tag checking from asynchronous to synchronous
at run-time based on a signal but that's rather specific to Android
where zygote controls the signal handler. I don't think you can do this
with libc. Even on Android, since the async fault signal is delivered
per thread, it probably does this lazily (alternatively, it could issue
a SIGUSRx to the other threads for synchronisation).

> use of the prctl outside of libc is very limited if it's per thread
> only:

In the non-Android context, I think the prctl() for MTE control should
be restricted to the libc. You can control the mode prior to the process
being started using environment variables. I really don't see how the
libc could handle the changing of the MTE behaviour at run-time without
itself handling signals.

> - application code may use it in a (elf specific) pre-initialization
>   function, but that's a bit obscure (not exposed in c) and it is
>   reasonable for an application to enable mte checks after it
>   registered a signal handler for mte faults. (and at that point it
>   may be multi-threaded).

Since the app can install signal handlers, it can also deal with
notifying other threads with a SIGUSRx, assuming that it decided this
after multiple threads were created. If it does this while
single-threaded, subsequent threads would inherit the first one.

The only use-case I see for doing this in the kernel is if the code
requiring an MTE behaviour change cannot install signal handlers. More
on this below.

> - library code normally initializes per thread state on the first call
>   into the library from a given thread, but with mte, as soon as
>   memory / pointers are tagged in one thread, all threads are
>   affected: not performing checks in other threads is less secure (may
>   be ok) and it means incompatible syscall abi (not ok). so at least
>   PR_TAGGED_ADDR_ENABLE should have process wide setting for this
>   usage.

My assumption with MTE is that the libc will initialise it when the
library is loaded (something __attribute__((constructor))) and it's
still in single-threaded mode. Does it wait until the first malloc()
call? Also, is there such thing as a per-thread initialiser for a
dynamic library (not sure it can be implemented in practice though)?

The PR_TAGGED_ADDR_ENABLE synchronisation at least doesn't require IPIs
to other CPUs to change the hardware state. However, it can still race
with thread creation or a prctl() on another thread, not sure what we
can define here, especially as it depends on the kernel internals: e.g.
thread creation copies some data structures of the calling thread but at
the same time another thread wants to change such structures for all
threads of that process. The ordering of events here looks pretty
fragile.

Maybe with another global status (per process) which takes priority over
the per thread one would be easier. But such priority is not temporal
(i.e. whoever called prctl() last) but pretty strict: once a global
control was requested, it will remain global no matter what subsequent
threads request (or we can do it the other way around).

> but i guess it is fine to design the mechanism for these in a later
> linux version, until then such usage will be unreliable (will depend
> on how early threads are created).

Until we have a real use-case, I'd not complicate the matters further.
For example, I'm still not sure how realistic it is for an application
to load a new heap allocator after some threads were created. Even the
glibc support, I don't think it needs this.

Could an LD_PRELOADED library be initialised after threads were created
(I guess it could if another preloaded library created threads)? Even if
it does, do we have an example or it's rather theoretical.

If this becomes an essential use-case, we can look at adding a new flag
for prctl() which would set the option globally, with the caveats
mentioned above. It doesn't need to be in the initial ABI (and the
PR_TAGGED_ADDR_ENABLE is already upstream).

Thanks.

-- 
Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-08-07 15:19             ` Catalin Marinas
@ 2020-08-10 14:13               ` Szabolcs Nagy
  2020-08-11 17:20                 ` Catalin Marinas
  0 siblings, 1 reply; 53+ messages in thread
From: Szabolcs Nagy @ 2020-08-10 14:13 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Dave Martin, linux-arch, Peter Collingbourne, Andrey Konovalov,
	Kevin Brodsky, linux-mm, Andrew Morton, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

The 08/07/2020 16:19, Catalin Marinas wrote:
> On Mon, Aug 03, 2020 at 01:43:10PM +0100, Szabolcs Nagy wrote:
> > if we can always turn sync tag checks on early whenever mte is
> > available then i think there is no issue.
> > 
> > but if we have to make the decision later for compatibility or
> > performance reasons then per thread setting is problematic.
> 
> At least for libc, I'm not sure how you could even turn MTE on at
> run-time. The heap allocations would have to be mapped with PROT_MTE as
> we can't easily change them (well, you could mprotect(), assuming the
> user doesn't use tagged pointers on them).

e.g. dlopen of library with stack tagging.
(libc can mark stacks with PROT_MTE at that time)

or just turn on sync tag checks later when using
heap tagging.

> 
> There is a case to switch tag checking from asynchronous to synchronous
> at run-time based on a signal but that's rather specific to Android
> where zygote controls the signal handler. I don't think you can do this
> with libc. Even on Android, since the async fault signal is delivered
> per thread, it probably does this lazily (alternatively, it could issue
> a SIGUSRx to the other threads for synchronisation).

i think what that zygote is doing is a valid use-case but
in a normal linux setup the application owns the signal
handlers so the tag check switch has to be done by the
application. the libc can expose some api for it, so in
principle it's enough if the libc can do the runtime
switch, but we dont plan to add new libc apis for mte.

> > use of the prctl outside of libc is very limited if it's per thread
> > only:
> 
> In the non-Android context, I think the prctl() for MTE control should
> be restricted to the libc. You can control the mode prior to the process
> being started using environment variables. I really don't see how the
> libc could handle the changing of the MTE behaviour at run-time without
> itself handling signals.
> 
> > - application code may use it in a (elf specific) pre-initialization
> >   function, but that's a bit obscure (not exposed in c) and it is
> >   reasonable for an application to enable mte checks after it
> >   registered a signal handler for mte faults. (and at that point it
> >   may be multi-threaded).
> 
> Since the app can install signal handlers, it can also deal with
> notifying other threads with a SIGUSRx, assuming that it decided this
> after multiple threads were created. If it does this while
> single-threaded, subsequent threads would inherit the first one.

the application does not know what libraries create what
threads in the background, i dont think there is a way to
send signals to each thread (e.g. /proc/self/task cannot
be read atomically with respect to thread creation/exit).

the libc controls thread creation and exit so it can have
a list of threads it can notify, but an application cannot
do this. (libc could provide an api so applications can
do some per thread operation, but a libc would not do this
happily: currently there are locks around thread creation
and exit that are only needed for this "signal all threads"
mechanism which makes it hard to expose to users)

one way applications sometimes work this around is to
self re-exec. but that's a big hammer and not entirely
reliable (e.g. the exe may not be available on the
filesystem any more or the commandline args may need
to be preserved but they are clobbered, or some complex
application state needs to be recreated etc)

> 
> The only use-case I see for doing this in the kernel is if the code
> requiring an MTE behaviour change cannot install signal handlers. More
> on this below.
> 
> > - library code normally initializes per thread state on the first call
> >   into the library from a given thread, but with mte, as soon as
> >   memory / pointers are tagged in one thread, all threads are
> >   affected: not performing checks in other threads is less secure (may
> >   be ok) and it means incompatible syscall abi (not ok). so at least
> >   PR_TAGGED_ADDR_ENABLE should have process wide setting for this
> >   usage.
> 
> My assumption with MTE is that the libc will initialise it when the
> library is loaded (something __attribute__((constructor))) and it's
> still in single-threaded mode. Does it wait until the first malloc()
> call? Also, is there such thing as a per-thread initialiser for a
> dynamic library (not sure it can be implemented in practice though)?

there is no per thread initializer in an elf module.
(tls state is usually initialized lazily in threads
when necessary.)

malloc calls can happen before the ctors of an LD_PRELOAD
library and threads can be created before both.
glibc runs ldpreload ctors after other library ctors.

custom allocator can be of course dlopened.
(i'd expect several language runtimes to have their
own allocator and support dlopening the runtime)

> 
> The PR_TAGGED_ADDR_ENABLE synchronisation at least doesn't require IPIs
> to other CPUs to change the hardware state. However, it can still race
> with thread creation or a prctl() on another thread, not sure what we
> can define here, especially as it depends on the kernel internals: e.g.
> thread creation copies some data structures of the calling thread but at
> the same time another thread wants to change such structures for all
> threads of that process. The ordering of events here looks pretty
> fragile.
> 
> Maybe with another global status (per process) which takes priority over
> the per thread one would be easier. But such priority is not temporal
> (i.e. whoever called prctl() last) but pretty strict: once a global
> control was requested, it will remain global no matter what subsequent
> threads request (or we can do it the other way around).

i see.

> > but i guess it is fine to design the mechanism for these in a later
> > linux version, until then such usage will be unreliable (will depend
> > on how early threads are created).
> 
> Until we have a real use-case, I'd not complicate the matters further.
> For example, I'm still not sure how realistic it is for an application
> to load a new heap allocator after some threads were created. Even the
> glibc support, I don't think it needs this.
> 
> Could an LD_PRELOADED library be initialised after threads were created
> (I guess it could if another preloaded library created threads)? Even if
> it does, do we have an example or it's rather theoretical.

i believe this happens e.g. in applications built
with tsan. (the thread sanitizer creates a
background thread early which i think does not
call malloc itself but may want to access
malloced memory, but i don't have a setup with
tsan support to test this)

> 
> If this becomes an essential use-case, we can look at adding a new flag
> for prctl() which would set the option globally, with the caveats
> mentioned above. It doesn't need to be in the initial ABI (and the
> PR_TAGGED_ADDR_ENABLE is already upstream).
> 
> Thanks.
> 
> -- 
> Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-08-10 14:13               ` Szabolcs Nagy
@ 2020-08-11 17:20                 ` Catalin Marinas
  2020-08-12 12:45                   ` Szabolcs Nagy
  0 siblings, 1 reply; 53+ messages in thread
From: Catalin Marinas @ 2020-08-11 17:20 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Dave Martin, linux-arch, Peter Collingbourne, Andrey Konovalov,
	Kevin Brodsky, linux-mm, Andrew Morton, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

On Mon, Aug 10, 2020 at 03:13:09PM +0100, Szabolcs Nagy wrote:
> The 08/07/2020 16:19, Catalin Marinas wrote:
> > On Mon, Aug 03, 2020 at 01:43:10PM +0100, Szabolcs Nagy wrote:
> > > if we can always turn sync tag checks on early whenever mte is
> > > available then i think there is no issue.
> > > 
> > > but if we have to make the decision later for compatibility or
> > > performance reasons then per thread setting is problematic.
> > 
> > At least for libc, I'm not sure how you could even turn MTE on at
> > run-time. The heap allocations would have to be mapped with PROT_MTE as
> > we can't easily change them (well, you could mprotect(), assuming the
> > user doesn't use tagged pointers on them).
> 
> e.g. dlopen of library with stack tagging. (libc can mark stacks with
> PROT_MTE at that time)

If we allow such mixed object support with stack tagging enabled at
dlopen, PROT_MTE would need to be turned on for each thread stack. This
wouldn't require synchronisation, only knowing where the thread stacks
are, but you'd need to make sure threads don't call into the new library
until the stacks have been mprotect'ed. Doing this midway through a
function execution may corrupt the tags.

So I'm not sure how safe any of this is without explicit user
synchronisation (i.e. don't call into the library until all threads have
been updated). Even changing options like GCR_EL1.Excl across multiple
threads may have unwanted effects. See this comment from Peter, the
difference being that instead of an explicit prctl() call on the current
stack, another thread would do it:

https://lore.kernel.org/linux-arch/CAMn1gO5rhOG1W+nVe103v=smvARcFFp_Ct9XqH2Ca4BUMfpDdg@mail.gmail.com/

> or just turn on sync tag checks later when using heap tagging.

I wonder whether setting the synchronous tag check mode by default would
improve this aspect. This would not have any effect until PROT_MTE is
used. If software wants some better performance they can explicitly opt
in to asynchronous mode or disable tag checking after some SIGSEGV +
reporting (this shouldn't exclude the environment variables you
currently use for controlling the tag check mode).

Also, if there are saner defaults for the user GCR_EL1.Excl (currently
all masked), we should decide them now.

If stack tagging will come with some ELF information, we could make the
default tag checking and GCR_EL1.Excl choices based on that, otherwise
maybe we should revisit the default configuration the kernel sets for
the user in the absence of any other information.

> > There is a case to switch tag checking from asynchronous to synchronous
> > at run-time based on a signal but that's rather specific to Android
> > where zygote controls the signal handler. I don't think you can do this
> > with libc. Even on Android, since the async fault signal is delivered
> > per thread, it probably does this lazily (alternatively, it could issue
> > a SIGUSRx to the other threads for synchronisation).
> 
> i think what that zygote is doing is a valid use-case but
> in a normal linux setup the application owns the signal
> handlers so the tag check switch has to be done by the
> application. the libc can expose some api for it, so in
> principle it's enough if the libc can do the runtime
> switch, but we dont plan to add new libc apis for mte.

Due to the synchronisation aspect especially regarding the stack
tagging, I'm not sure the kernel alone can safely do this.

Changing the tagged address syscall ABI across multiple threads should
be safer (well, at least the relaxing part). But if we don't solve the
other aspects I mentioned above, I don't think there is much point in
only doing it for this.

> > > - library code normally initializes per thread state on the first call
> > >   into the library from a given thread, but with mte, as soon as
> > >   memory / pointers are tagged in one thread, all threads are
> > >   affected: not performing checks in other threads is less secure (may
> > >   be ok) and it means incompatible syscall abi (not ok). so at least
> > >   PR_TAGGED_ADDR_ENABLE should have process wide setting for this
> > >   usage.
> > 
> > My assumption with MTE is that the libc will initialise it when the
> > library is loaded (something __attribute__((constructor))) and it's
> > still in single-threaded mode. Does it wait until the first malloc()
> > call? Also, is there such thing as a per-thread initialiser for a
> > dynamic library (not sure it can be implemented in practice though)?
> 
> there is no per thread initializer in an elf module.
> (tls state is usually initialized lazily in threads
> when necessary.)
> 
> malloc calls can happen before the ctors of an LD_PRELOAD
> library and threads can be created before both.
> glibc runs ldpreload ctors after other library ctors.

In the presence of stack tagging, I think any subsequent MTE config
change across all threads is unsafe, irrespective of whether it's done
by the kernel or user via SIGUSRx. I think the best we can do here is
start with more appropriate defaults or enable them based on an ELF note
before the application is started. The dynamic loader would not have to
do anything extra here.

If we ignore stack tagging, the global configuration change may be
achievable. I think for the MTE bits, this could be done lazily by the
libc (e.g. on malloc()/free() call). The tag checking won't happen
before such calls unless we change the kernel defaults. There is still
the tagged address ABI enabling, could this be done lazily on syscall by
the libc? If not, the kernel could synchronise (force) this on syscall
entry from each thread based on some global prctl() bit.

-- 
Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-08-11 17:20                 ` Catalin Marinas
@ 2020-08-12 12:45                   ` Szabolcs Nagy
  2020-08-19  9:54                     ` Catalin Marinas
  0 siblings, 1 reply; 53+ messages in thread
From: Szabolcs Nagy @ 2020-08-12 12:45 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Dave Martin, linux-arch, Peter Collingbourne, Andrey Konovalov,
	Kevin Brodsky, linux-mm, Andrew Morton, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

The 08/11/2020 18:20, Catalin Marinas wrote:
> On Mon, Aug 10, 2020 at 03:13:09PM +0100, Szabolcs Nagy wrote:
> > The 08/07/2020 16:19, Catalin Marinas wrote:
> > > On Mon, Aug 03, 2020 at 01:43:10PM +0100, Szabolcs Nagy wrote:
> > > > if we can always turn sync tag checks on early whenever mte is
> > > > available then i think there is no issue.
> > > > 
> > > > but if we have to make the decision later for compatibility or
> > > > performance reasons then per thread setting is problematic.
> > > 
> > > At least for libc, I'm not sure how you could even turn MTE on at
> > > run-time. The heap allocations would have to be mapped with PROT_MTE as
> > > we can't easily change them (well, you could mprotect(), assuming the
> > > user doesn't use tagged pointers on them).
> > 
> > e.g. dlopen of library with stack tagging. (libc can mark stacks with
> > PROT_MTE at that time)
> 
> If we allow such mixed object support with stack tagging enabled at
> dlopen, PROT_MTE would need to be turned on for each thread stack. This
> wouldn't require synchronisation, only knowing where the thread stacks
> are, but you'd need to make sure threads don't call into the new library
> until the stacks have been mprotect'ed. Doing this midway through a
> function execution may corrupt the tags.
> 
> So I'm not sure how safe any of this is without explicit user
> synchronisation (i.e. don't call into the library until all threads have
> been updated). Even changing options like GCR_EL1.Excl across multiple
> threads may have unwanted effects. See this comment from Peter, the
> difference being that instead of an explicit prctl() call on the current
> stack, another thread would do it:
> 
> https://lore.kernel.org/linux-arch/CAMn1gO5rhOG1W+nVe103v=smvARcFFp_Ct9XqH2Ca4BUMfpDdg@mail.gmail.com/

there is no midway problem: the libc (ld.so) would do
the PROT_MTE at dlopen time based on some elf marking
(which can be handled before relocation processing,
so before library code can run, the midway problem
happens when a library, e.g libc, wants to turn on
stack tagging on itself).

the libc already does this when a library is loaded
that requires executable stack (it marks stacks as
PROT_EXEC at dlopen time or fails the dlopen if that
is not possible, this does not require running code
in other threads, only synchronization with thread
creation and exit. but changing the check mode for
mte needs per thread code execution.).

i'm not entirely sure if this is a good idea, but i
expect stack tagging not to be used in the libc
(because libc needs to run on all hw and we don't yet
have a backward compatible stack tagging solution),
so stack tagging should work when only some elf modules
in a process are built with it, which implies that
enabling it at dlopen time should work otherwise
it will not be very useful.

> > or just turn on sync tag checks later when using heap tagging.
> 
> I wonder whether setting the synchronous tag check mode by default would
> improve this aspect. This would not have any effect until PROT_MTE is
> used. If software wants some better performance they can explicitly opt
> in to asynchronous mode or disable tag checking after some SIGSEGV +
> reporting (this shouldn't exclude the environment variables you
> currently use for controlling the tag check mode).
> 
> Also, if there are saner defaults for the user GCR_EL1.Excl (currently
> all masked), we should decide them now.
> 
> If stack tagging will come with some ELF information, we could make the
> default tag checking and GCR_EL1.Excl choices based on that, otherwise
> maybe we should revisit the default configuration the kernel sets for
> the user in the absence of any other information.

do tag checks have overhead if PROT_MTE is not used?
i'd expect some checks are still done at memory access.
(and the tagged address syscall abi has to be in use.)

turning sync tag checks on early would enable the
most of the interesting usecases (only PROT_MTE has
to be handled at runtime not the prctls. however i
don't yet know how userspace will deal with compat
issues, i.e. it may not be valid to unconditionally
turn tag checks on early).

> > > > - library code normally initializes per thread state on the first call
> > > >   into the library from a given thread, but with mte, as soon as
> > > >   memory / pointers are tagged in one thread, all threads are
> > > >   affected: not performing checks in other threads is less secure (may
> > > >   be ok) and it means incompatible syscall abi (not ok). so at least
> > > >   PR_TAGGED_ADDR_ENABLE should have process wide setting for this
> > > >   usage.
> > > 
> > > My assumption with MTE is that the libc will initialise it when the
> > > library is loaded (something __attribute__((constructor))) and it's
> > > still in single-threaded mode. Does it wait until the first malloc()
> > > call? Also, is there such thing as a per-thread initialiser for a
> > > dynamic library (not sure it can be implemented in practice though)?
> > 
> > there is no per thread initializer in an elf module.
> > (tls state is usually initialized lazily in threads
> > when necessary.)
> > 
> > malloc calls can happen before the ctors of an LD_PRELOAD
> > library and threads can be created before both.
> > glibc runs ldpreload ctors after other library ctors.
> 
> In the presence of stack tagging, I think any subsequent MTE config
> change across all threads is unsafe, irrespective of whether it's done
> by the kernel or user via SIGUSRx. I think the best we can do here is
> start with more appropriate defaults or enable them based on an ELF note
> before the application is started. The dynamic loader would not have to
> do anything extra here.
> 
> If we ignore stack tagging, the global configuration change may be
> achievable. I think for the MTE bits, this could be done lazily by the
> libc (e.g. on malloc()/free() call). The tag checking won't happen
> before such calls unless we change the kernel defaults. There is still
> the tagged address ABI enabling, could this be done lazily on syscall by
> the libc? If not, the kernel could synchronise (force) this on syscall
> entry from each thread based on some global prctl() bit.

i think the interesting use-cases are all about
changing mte settings before mte is in use in any
way but after there are multiple threads.
(the async -> sync mode change on tag faults is
i think less interesting to the gnu linux world.)

i guess lazy syscall abi switch works, but it is
ugly: raw syscall usage will be problematic and
doing checks before calling into the vdso might
have unwanted overhead.

based on the discussion it seems we should design
the userspace abis so that per process prctl is
not required and then see how far we get.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 22/29] arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-07-15 17:08 ` [PATCH v7 22/29] arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
@ 2020-08-13 14:01   ` Luis Machado
  2020-08-22 10:56     ` Catalin Marinas
  0 siblings, 1 reply; 53+ messages in thread
From: Luis Machado @ 2020-08-13 14:01 UTC (permalink / raw)
  To: Catalin Marinas, linux-arm-kernel
  Cc: linux-mm, linux-arch, Will Deacon, Dave P Martin,
	Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Alan Hayward, Omair Javaid

Hi Catalin,

There has been changes to GDB's/LLDB's side to incorporate a tag type 
into the peek/poke requests. This is an attempt to anticipate required 
support for other tag types, like CHERI's tags. Those are somewhat 
different from MTE's tags.

The core file design for storing tags, which is in progress and 
currently in my court, is also taking into account other types of tags.

Given the above, should we consider passing a type to the kernel ptrace 
requests as well?

Also, since the ptrace requests would have to handle different types of 
tags, should we rename PEEKMTETAGS/POKEMTETAGS to PEEKTAGS/POKETAGS 
instead and make those requests generic?

Regards,
Luis

On 7/15/20 2:08 PM, Catalin Marinas wrote:
> Add support for bulk setting/getting of the MTE tags in a tracee's
> address space at 'addr' in the ptrace() syscall prototype. 'data' points
> to a struct iovec in the tracer's address space with iov_base
> representing the address of a tracer's buffer of length iov_len. The
> tags to be copied to/from the tracer's buffer are stored as one tag per
> byte.
> 
> On successfully copying at least one tag, ptrace() returns 0 and updates
> the tracer's iov_len with the number of tags copied. In case of error,
> either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> page.
> 
> Note that the tag copying functions are not performance critical,
> therefore they lack optimisations found in typical memory copy routines.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Alan Hayward <Alan.Hayward@arm.com>
> Cc: Luis Machado <luis.machado@linaro.org>
> Cc: Omair Javaid <omair.javaid@linaro.org>
> ---
> 
> Notes:
>      v4:
>      - Following the change to only clear the tags in a page if it is mapped
>        to user with PROT_MTE, ptrace() now will refuse to access tags in
>        pages not previously mapped with PROT_MTE (PG_mte_tagged set). This is
>        primarily to avoid leaking uninitialised tags to user via ptrace().
>      - Fix SYM_FUNC_END argument typo.
>      - Rename MTE_ALLOC_* to MTE_GRANULE_*.
>      - Use uao_user_alternative for the user access in case we ever want to
>        call mte_copy_tags_* with a kernel buffer. It also matches the other
>        uaccess routines in the kernel.
>      - Simplify arch_ptrace() slightly.
>      - Reorder down_write_killable() with access_ok() in
>        __access_remote_tags().
>      - Handle copy length 0 in mte_copy_tags_{to,from}_user().
>      - Use put_user() instead of __put_user().
>      
>      New in v3.
> 
>   arch/arm64/include/asm/mte.h         |  17 ++++
>   arch/arm64/include/uapi/asm/ptrace.h |   3 +
>   arch/arm64/kernel/mte.c              | 139 +++++++++++++++++++++++++++
>   arch/arm64/kernel/ptrace.c           |   7 ++
>   arch/arm64/lib/mte.S                 |  53 ++++++++++
>   5 files changed, 219 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index 1a919905295b..7ea0c0e526d1 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -5,6 +5,11 @@
>   #ifndef __ASM_MTE_H
>   #define __ASM_MTE_H
>   
> +#define MTE_GRANULE_SIZE	UL(16)
> +#define MTE_GRANULE_MASK	(~(MTE_GRANULE_SIZE - 1))
> +#define MTE_TAG_SHIFT		56
> +#define MTE_TAG_SIZE		4
> +
>   #ifndef __ASSEMBLY__
>   
>   #include <linux/page-flags.h>
> @@ -12,6 +17,10 @@
>   #include <asm/pgtable-types.h>
>   
>   void mte_clear_page_tags(void *addr);
> +unsigned long mte_copy_tags_from_user(void *to, const void __user *from,
> +				      unsigned long n);
> +unsigned long mte_copy_tags_to_user(void __user *to, void *from,
> +				    unsigned long n);
>   
>   #ifdef CONFIG_ARM64_MTE
>   
> @@ -25,6 +34,8 @@ void mte_thread_switch(struct task_struct *next);
>   void mte_suspend_exit(void);
>   long set_mte_ctrl(struct task_struct *task, unsigned long arg);
>   long get_mte_ctrl(struct task_struct *task);
> +int mte_ptrace_copy_tags(struct task_struct *child, long request,
> +			 unsigned long addr, unsigned long data);
>   
>   #else
>   
> @@ -54,6 +65,12 @@ static inline long get_mte_ctrl(struct task_struct *task)
>   {
>   	return 0;
>   }
> +static inline int mte_ptrace_copy_tags(struct task_struct *child,
> +				       long request, unsigned long addr,
> +				       unsigned long data)
> +{
> +	return -EIO;
> +}
>   
>   #endif
>   
> diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
> index 06413d9f2341..758ae984ff97 100644
> --- a/arch/arm64/include/uapi/asm/ptrace.h
> +++ b/arch/arm64/include/uapi/asm/ptrace.h
> @@ -76,6 +76,9 @@
>   /* syscall emulation path in ptrace */
>   #define PTRACE_SYSEMU		  31
>   #define PTRACE_SYSEMU_SINGLESTEP  32
> +/* MTE allocation tag access */
> +#define PTRACE_PEEKMTETAGS	  33
> +#define PTRACE_POKEMTETAGS	  34
>   
>   #ifndef __ASSEMBLY__
>   
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index e80c49af74af..0a8b90afe9d7 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -4,14 +4,18 @@
>    */
>   
>   #include <linux/bitops.h>
> +#include <linux/kernel.h>
>   #include <linux/mm.h>
>   #include <linux/prctl.h>
>   #include <linux/sched.h>
> +#include <linux/sched/mm.h>
>   #include <linux/string.h>
>   #include <linux/thread_info.h>
> +#include <linux/uio.h>
>   
>   #include <asm/cpufeature.h>
>   #include <asm/mte.h>
> +#include <asm/ptrace.h>
>   #include <asm/sysreg.h>
>   
>   void mte_sync_tags(pte_t *ptep, pte_t pte)
> @@ -179,3 +183,138 @@ long get_mte_ctrl(struct task_struct *task)
>   
>   	return ret;
>   }
> +
> +/*
> + * Access MTE tags in another process' address space as given in mm. Update
> + * the number of tags copied. Return 0 if any tags copied, error otherwise.
> + * Inspired by __access_remote_vm().
> + */
> +static int __access_remote_tags(struct task_struct *tsk, struct mm_struct *mm,
> +				unsigned long addr, struct iovec *kiov,
> +				unsigned int gup_flags)
> +{
> +	struct vm_area_struct *vma;
> +	void __user *buf = kiov->iov_base;
> +	size_t len = kiov->iov_len;
> +	int ret;
> +	int write = gup_flags & FOLL_WRITE;
> +
> +	if (!access_ok(buf, len))
> +		return -EFAULT;
> +
> +	if (mmap_read_lock_killable(mm))
> +		return -EIO;
> +
> +	while (len) {
> +		unsigned long tags, offset;
> +		void *maddr;
> +		struct page *page = NULL;
> +
> +		ret = get_user_pages_remote(tsk, mm, addr, 1, gup_flags,
> +					    &page, &vma, NULL);
> +		if (ret <= 0)
> +			break;
> +
> +		/*
> +		 * Only copy tags if the page has been mapped as PROT_MTE
> +		 * (PG_mte_tagged set). Otherwise the tags are not valid and
> +		 * not accessible to user. Moreover, an mprotect(PROT_MTE)
> +		 * would cause the existing tags to be cleared if the page
> +		 * was never mapped with PROT_MTE.
> +		 */
> +		if (!test_bit(PG_mte_tagged, &page->flags)) {
> +			ret = -EOPNOTSUPP;
> +			put_page(page);
> +			break;
> +		}
> +
> +		/* limit access to the end of the page */
> +		offset = offset_in_page(addr);
> +		tags = min(len, (PAGE_SIZE - offset) / MTE_GRANULE_SIZE);
> +
> +		maddr = page_address(page);
> +		if (write) {
> +			tags = mte_copy_tags_from_user(maddr + offset, buf, tags);
> +			set_page_dirty_lock(page);
> +		} else {
> +			tags = mte_copy_tags_to_user(buf, maddr + offset, tags);
> +		}
> +		put_page(page);
> +
> +		/* error accessing the tracer's buffer */
> +		if (!tags)
> +			break;
> +
> +		len -= tags;
> +		buf += tags;
> +		addr += tags * MTE_GRANULE_SIZE;
> +	}
> +	mmap_read_unlock(mm);
> +
> +	/* return an error if no tags copied */
> +	kiov->iov_len = buf - kiov->iov_base;
> +	if (!kiov->iov_len) {
> +		/* check for error accessing the tracee's address space */
> +		if (ret <= 0)
> +			return -EIO;
> +		else
> +			return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Copy MTE tags in another process' address space at 'addr' to/from tracer's
> + * iovec buffer. Return 0 on success. Inspired by ptrace_access_vm().
> + */
> +static int access_remote_tags(struct task_struct *tsk, unsigned long addr,
> +			      struct iovec *kiov, unsigned int gup_flags)
> +{
> +	struct mm_struct *mm;
> +	int ret;
> +
> +	mm = get_task_mm(tsk);
> +	if (!mm)
> +		return -EPERM;
> +
> +	if (!tsk->ptrace || (current != tsk->parent) ||
> +	    ((get_dumpable(mm) != SUID_DUMP_USER) &&
> +	     !ptracer_capable(tsk, mm->user_ns))) {
> +		mmput(mm);
> +		return -EPERM;
> +	}
> +
> +	ret = __access_remote_tags(tsk, mm, addr, kiov, gup_flags);
> +	mmput(mm);
> +
> +	return ret;
> +}
> +
> +int mte_ptrace_copy_tags(struct task_struct *child, long request,
> +			 unsigned long addr, unsigned long data)
> +{
> +	int ret;
> +	struct iovec kiov;
> +	struct iovec __user *uiov = (void __user *)data;
> +	unsigned int gup_flags = FOLL_FORCE;
> +
> +	if (!system_supports_mte())
> +		return -EIO;
> +
> +	if (get_user(kiov.iov_base, &uiov->iov_base) ||
> +	    get_user(kiov.iov_len, &uiov->iov_len))
> +		return -EFAULT;
> +
> +	if (request == PTRACE_POKEMTETAGS)
> +		gup_flags |= FOLL_WRITE;
> +
> +	/* align addr to the MTE tag granule */
> +	addr &= MTE_GRANULE_MASK;
> +
> +	ret = access_remote_tags(child, addr, &kiov, gup_flags);
> +	if (!ret)
> +		ret = put_user(kiov.iov_len, &uiov->iov_len);
> +
> +	return ret;
> +}
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 4582014dda25..653a03598c75 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -34,6 +34,7 @@
>   #include <asm/cpufeature.h>
>   #include <asm/debug-monitors.h>
>   #include <asm/fpsimd.h>
> +#include <asm/mte.h>
>   #include <asm/pointer_auth.h>
>   #include <asm/stacktrace.h>
>   #include <asm/syscall.h>
> @@ -1796,6 +1797,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
>   long arch_ptrace(struct task_struct *child, long request,
>   		 unsigned long addr, unsigned long data)
>   {
> +	switch (request) {
> +	case PTRACE_PEEKMTETAGS:
> +	case PTRACE_POKEMTETAGS:
> +		return mte_ptrace_copy_tags(child, request, addr, data);
> +	}
> +
>   	return ptrace_request(child, request, addr, data);
>   }
>   
> diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
> index 3c3d0edbbca3..434f81d9a180 100644
> --- a/arch/arm64/lib/mte.S
> +++ b/arch/arm64/lib/mte.S
> @@ -4,7 +4,9 @@
>    */
>   #include <linux/linkage.h>
>   
> +#include <asm/alternative.h>
>   #include <asm/assembler.h>
> +#include <asm/mte.h>
>   #include <asm/page.h>
>   #include <asm/sysreg.h>
>   
> @@ -51,3 +53,54 @@ SYM_FUNC_START(mte_copy_page_tags)
>   	b.ne	1b
>   	ret
>   SYM_FUNC_END(mte_copy_page_tags)
> +
> +/*
> + * Read tags from a user buffer (one tag per byte) and set the corresponding
> + * tags at the given kernel address. Used by PTRACE_POKEMTETAGS.
> + *   x0 - kernel address (to)
> + *   x1 - user buffer (from)
> + *   x2 - number of tags/bytes (n)
> + * Returns:
> + *   x0 - number of tags read/set
> + */
> +SYM_FUNC_START(mte_copy_tags_from_user)
> +	mov	x3, x1
> +	cbz	x2, 2f
> +1:
> +	uao_user_alternative 2f, ldrb, ldtrb, w4, x1, 0
> +	lsl	x4, x4, #MTE_TAG_SHIFT
> +	stg	x4, [x0], #MTE_GRANULE_SIZE
> +	add	x1, x1, #1
> +	subs	x2, x2, #1
> +	b.ne	1b
> +
> +	// exception handling and function return
> +2:	sub	x0, x1, x3		// update the number of tags set
> +	ret
> +SYM_FUNC_END(mte_copy_tags_from_user)
> +
> +/*
> + * Get the tags from a kernel address range and write the tag values to the
> + * given user buffer (one tag per byte). Used by PTRACE_PEEKMTETAGS.
> + *   x0 - user buffer (to)
> + *   x1 - kernel address (from)
> + *   x2 - number of tags/bytes (n)
> + * Returns:
> + *   x0 - number of tags read/set
> + */
> +SYM_FUNC_START(mte_copy_tags_to_user)
> +	mov	x3, x0
> +	cbz	x2, 2f
> +1:
> +	ldg	x4, [x1]
> +	ubfx	x4, x4, #MTE_TAG_SHIFT, #MTE_TAG_SIZE
> +	uao_user_alternative 2f, strb, sttrb, w4, x0, 0
> +	add	x0, x0, #1
> +	add	x1, x1, #MTE_GRANULE_SIZE
> +	subs	x2, x2, #1
> +	b.ne	1b
> +
> +	// exception handling and function return
> +2:	sub	x0, x0, x3		// update the number of tags copied
> +	ret
> +SYM_FUNC_END(mte_copy_tags_to_user)
> 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-08-12 12:45                   ` Szabolcs Nagy
@ 2020-08-19  9:54                     ` Catalin Marinas
  2020-08-20 16:43                       ` Szabolcs Nagy
  0 siblings, 1 reply; 53+ messages in thread
From: Catalin Marinas @ 2020-08-19  9:54 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Dave Martin, linux-arch, Peter Collingbourne, Andrey Konovalov,
	Kevin Brodsky, linux-mm, Andrew Morton, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

On Wed, Aug 12, 2020 at 01:45:21PM +0100, Szabolcs Nagy wrote:
> On 08/11/2020 18:20, Catalin Marinas wrote:
> > If we allow such mixed object support with stack tagging enabled at
> > dlopen, PROT_MTE would need to be turned on for each thread stack. This
> > wouldn't require synchronisation, only knowing where the thread stacks
> > are, but you'd need to make sure threads don't call into the new library
> > until the stacks have been mprotect'ed. Doing this midway through a
> > function execution may corrupt the tags.
> > 
> > So I'm not sure how safe any of this is without explicit user
> > synchronisation (i.e. don't call into the library until all threads have
> > been updated). Even changing options like GCR_EL1.Excl across multiple
> > threads may have unwanted effects. See this comment from Peter, the
> > difference being that instead of an explicit prctl() call on the current
> > stack, another thread would do it:
> > 
> > https://lore.kernel.org/linux-arch/CAMn1gO5rhOG1W+nVe103v=smvARcFFp_Ct9XqH2Ca4BUMfpDdg@mail.gmail.com/
> 
> there is no midway problem: the libc (ld.so) would do the PROT_MTE at
> dlopen time based on some elf marking (which can be handled before
> relocation processing, so before library code can run, the midway
> problem happens when a library, e.g libc, wants to turn on stack
> tagging on itself).

OK, that makes sense, you can't call into the new object until the
relocations have been resolved.

> the libc already does this when a library is loaded that requires
> executable stack (it marks stacks as PROT_EXEC at dlopen time or fails
> the dlopen if that is not possible, this does not require running code
> in other threads, only synchronization with thread creation and exit.
> but changing the check mode for mte needs per thread code execution.).
> 
> i'm not entirely sure if this is a good idea, but i expect stack
> tagging not to be used in the libc (because libc needs to run on all
> hw and we don't yet have a backward compatible stack tagging
> solution),

In theory, you could have two libc deployed in your distro and ldd gets
smarter to pick the right one. I still hope we'd find a compromise with
stack tagging and single binary.

> so stack tagging should work when only some elf modules in a process
> are built with it, which implies that enabling it at dlopen time
> should work otherwise it will not be very useful.

There is still the small risk of an old object using tagged pointers to
the stack. Since the stack would be shared between such objects, turning
PROT_MTE on would cause issues. Hopefully such problems are minor and
not really a concern for the kernel.

> do tag checks have overhead if PROT_MTE is not used? i'd expect some
> checks are still done at memory access. (and the tagged address
> syscall abi has to be in use.)

My understanding from talking to hardware engineers is that there won't
be an overhead if PROT_MTE is not used, no tags being fetched or
checked. But I can't guarantee until we get real silicon.

> turning sync tag checks on early would enable the most of the
> interesting usecases (only PROT_MTE has to be handled at runtime not
> the prctls. however i don't yet know how userspace will deal with
> compat issues, i.e. it may not be valid to unconditionally turn tag
> checks on early).

If we change the defaults so that no prctl() is required for the
standard use-case, it would solve most of the common deployment issues:

1. Tagged address ABI default on when HWCAP2_MTE is present
2. Synchronous TCF by default
3. GCR_EL1.Excl allows all tags except 0 by default

Any other configuration diverging from the above is considered
specialist deployment and will have to issue the prctl() on a per-thread
basis.

Compat issues in user-space will be dealt with via environment
variables but pretty much on/off rather than fine-grained tag checking
mode. So for glibc, you'd have only _MTAG=0 or 1 and the only effect is
using PROT_MTE + tagged pointers or no-PROT_MTE + tag 0.

> > In the presence of stack tagging, I think any subsequent MTE config
> > change across all threads is unsafe, irrespective of whether it's done
> > by the kernel or user via SIGUSRx. I think the best we can do here is
> > start with more appropriate defaults or enable them based on an ELF note
> > before the application is started. The dynamic loader would not have to
> > do anything extra here.
> > 
> > If we ignore stack tagging, the global configuration change may be
> > achievable. I think for the MTE bits, this could be done lazily by the
> > libc (e.g. on malloc()/free() call). The tag checking won't happen
> > before such calls unless we change the kernel defaults. There is still
> > the tagged address ABI enabling, could this be done lazily on syscall by
> > the libc? If not, the kernel could synchronise (force) this on syscall
> > entry from each thread based on some global prctl() bit.
> 
> i think the interesting use-cases are all about changing mte settings
> before mte is in use in any way but after there are multiple threads.
> (the async -> sync mode change on tag faults is i think less
> interesting to the gnu linux world.)

So let's consider async/sync/no-check specialist uses and glibc would
not have to handle them. I don't think async mode is useful on its own
unless you have a way to turn on sync mode at run-time for more precise
error identification (well, hoping that it will happen again).

> i guess lazy syscall abi switch works, but it is ugly: raw syscall
> usage will be problematic and doing checks before calling into the
> vdso might have unwanted overhead.

This lazy ABI switch could be handled by the kernel, though I wonder
whether we should just relax it permanently when HWCAP2_MTE is present.

-- 
Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-08-19  9:54                     ` Catalin Marinas
@ 2020-08-20 16:43                       ` Szabolcs Nagy
  2020-08-20 17:27                         ` Paul Eggert
  2020-08-22 11:28                         ` Catalin Marinas
  0 siblings, 2 replies; 53+ messages in thread
From: Szabolcs Nagy @ 2020-08-20 16:43 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Dave Martin, linux-arch, Peter Collingbourne, Andrey Konovalov,
	Kevin Brodsky, linux-mm, Andrew Morton, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd, libc-alpha, Richard Earnshaw,
	Matthew Malcomson

adding libc-alpha on cc, they might have
additional input about compat issue handling:

the discussion is about enabling tag checks,
which cannot be done reasonably at runtime when
a process is already multi-threaded so it has to
be decided early either in the libc or in the
kernel. such early decision might have backward
compat issues (we dont yet know if any allocator
wants to use tagging opportunistically later for
debugging or if there will be code loaded later
that is incompatible with tag checks).

The 08/19/2020 10:54, Catalin Marinas wrote:
> On Wed, Aug 12, 2020 at 01:45:21PM +0100, Szabolcs Nagy wrote:
> > On 08/11/2020 18:20, Catalin Marinas wrote:
> > > If we allow such mixed object support with stack tagging enabled at
> > > dlopen, PROT_MTE would need to be turned on for each thread stack. This
> > > wouldn't require synchronisation, only knowing where the thread stacks
> > > are, but you'd need to make sure threads don't call into the new library
> > > until the stacks have been mprotect'ed. Doing this midway through a
> > > function execution may corrupt the tags.
...
> > there is no midway problem: the libc (ld.so) would do the PROT_MTE at
> > dlopen time based on some elf marking (which can be handled before
> > relocation processing, so before library code can run, the midway
> > problem happens when a library, e.g libc, wants to turn on stack
> > tagging on itself).
> 
> OK, that makes sense, you can't call into the new object until the
> relocations have been resolved.
> 
> > the libc already does this when a library is loaded that requires
> > executable stack (it marks stacks as PROT_EXEC at dlopen time or fails
> > the dlopen if that is not possible, this does not require running code
> > in other threads, only synchronization with thread creation and exit.
> > but changing the check mode for mte needs per thread code execution.).
> > 
> > i'm not entirely sure if this is a good idea, but i expect stack
> > tagging not to be used in the libc (because libc needs to run on all
> > hw and we don't yet have a backward compatible stack tagging
> > solution),
> 
> In theory, you could have two libc deployed in your distro and ldd gets
> smarter to pick the right one. I still hope we'd find a compromise with
> stack tagging and single binary.

distros don't like the two libc solution, i
think we will look at backward compat stack
tagging support, but we might end up not
having any stack tagging in libc, only in
user binaries (used for debugging).

> > so stack tagging should work when only some elf modules in a process
> > are built with it, which implies that enabling it at dlopen time
> > should work otherwise it will not be very useful.
> 
> There is still the small risk of an old object using tagged pointers to
> the stack. Since the stack would be shared between such objects, turning
> PROT_MTE on would cause issues. Hopefully such problems are minor and
> not really a concern for the kernel.
> 
> > do tag checks have overhead if PROT_MTE is not used? i'd expect some
> > checks are still done at memory access. (and the tagged address
> > syscall abi has to be in use.)
> 
> My understanding from talking to hardware engineers is that there won't
> be an overhead if PROT_MTE is not used, no tags being fetched or
> checked. But I can't guarantee until we get real silicon.
> 
> > turning sync tag checks on early would enable the most of the
> > interesting usecases (only PROT_MTE has to be handled at runtime not
> > the prctls. however i don't yet know how userspace will deal with
> > compat issues, i.e. it may not be valid to unconditionally turn tag
> > checks on early).
> 
> If we change the defaults so that no prctl() is required for the
> standard use-case, it would solve most of the common deployment issues:
> 
> 1. Tagged address ABI default on when HWCAP2_MTE is present
> 2. Synchronous TCF by default
> 3. GCR_EL1.Excl allows all tags except 0 by default
> 
> Any other configuration diverging from the above is considered
> specialist deployment and will have to issue the prctl() on a per-thread
> basis.
> 
> Compat issues in user-space will be dealt with via environment
> variables but pretty much on/off rather than fine-grained tag checking
> mode. So for glibc, you'd have only _MTAG=0 or 1 and the only effect is
> using PROT_MTE + tagged pointers or no-PROT_MTE + tag 0.

enabling mte checks by default would be
nice and simple (a libc can support tagging
allocators without any change assuming its
code is mte safe which is true e.g. for the
latest glibc release and for musl libc).

the compat issue with this is existing code
using pointer top bits which i assume faults
when dereferenced with the mte checks enabled.
(although this should be very rare since
top byte ignore on deref is aarch64 specific.)

i see two options:

- don't care about top bit compat issues:
  change the default in the kernel as you
  described (so checks are enabled and users
  only need PROT_MTE mapping if they want to
  use taggging).

- care about top bit issues:
  leave the kernel abi as in the patch set
  and do the mte setup early in the libc.
  add elf markings to new binaries that
  they are mte compatible and libc can use
  that marking for the mte setup. dlopening
  incompatible libraries will fail. the
  issue with this is that we have no idea
  how to add the marking and the marking
  prevents mte use with existing binaries
  (and eg. ldpreload malloc would require
  an updated libc).

for me it's hard to figure out which is
the right direction for mte.

> > > In the presence of stack tagging, I think any subsequent MTE config
> > > change across all threads is unsafe, irrespective of whether it's done
> > > by the kernel or user via SIGUSRx. I think the best we can do here is
> > > start with more appropriate defaults or enable them based on an ELF note
> > > before the application is started. The dynamic loader would not have to
> > > do anything extra here.
> > > 
> > > If we ignore stack tagging, the global configuration change may be
> > > achievable. I think for the MTE bits, this could be done lazily by the
> > > libc (e.g. on malloc()/free() call). The tag checking won't happen
> > > before such calls unless we change the kernel defaults. There is still
> > > the tagged address ABI enabling, could this be done lazily on syscall by
> > > the libc? If not, the kernel could synchronise (force) this on syscall
> > > entry from each thread based on some global prctl() bit.
> > 
> > i think the interesting use-cases are all about changing mte settings
> > before mte is in use in any way but after there are multiple threads.
> > (the async -> sync mode change on tag faults is i think less
> > interesting to the gnu linux world.)
> 
> So let's consider async/sync/no-check specialist uses and glibc would
> not have to handle them. I don't think async mode is useful on its own
> unless you have a way to turn on sync mode at run-time for more precise
> error identification (well, hoping that it will happen again).
> 
> > i guess lazy syscall abi switch works, but it is ugly: raw syscall
> > usage will be problematic and doing checks before calling into the
> > vdso might have unwanted overhead.
> 
> This lazy ABI switch could be handled by the kernel, though I wonder
> whether we should just relax it permanently when HWCAP2_MTE is present.

yeah i don't immediately see a problem with that.
but ideally there would be an escape hatch
(a way to opt out from the change).

thanks



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-08-20 16:43                       ` Szabolcs Nagy
@ 2020-08-20 17:27                         ` Paul Eggert
  2020-08-22 11:31                           ` Catalin Marinas
  2020-08-22 11:28                         ` Catalin Marinas
  1 sibling, 1 reply; 53+ messages in thread
From: Paul Eggert @ 2020-08-20 17:27 UTC (permalink / raw)
  To: Szabolcs Nagy, Catalin Marinas
  Cc: linux-arch, Richard Earnshaw, nd, libc-alpha, Will Deacon,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Matthew Malcomson,
	Andrew Morton, Vincenzo Frascino, Peter Collingbourne,
	Dave Martin, linux-arm-kernel

On 8/20/20 9:43 AM, Szabolcs Nagy wrote:
> the compat issue with this is existing code
> using pointer top bits which i assume faults
> when dereferenced with the mte checks enabled.
> (although this should be very rare since
> top byte ignore on deref is aarch64 specific.)

Does anyone know of significant aarch64-specific application code that depends 
on top byte ignore? I would think it's so rare (nonexistent?) as to not be worth 
worrying about.

Even in the bad old days when Emacs used pointer top bits for typechecking, it 
carefully removed those bits before dereferencing. Any other reasonably-portable 
application would have to do the same of course.

This whole thing reminds me of the ancient IBM S/360 mainframes that were 
documented to ignore the top 8 bits of 32-bit addresses merely because a single 
model (the IBM 360/30, circa 1965) was so underpowered that it couldn't quickly 
check that the top bits were zero. This has caused countless software hassles 
over the years. Even today, the IBM z-Series hardware and software still 
supports 24-bit addressing mode because of that early-1960s design mistake. See:

Mashey JR. The Long Road to 64 Bits. ACM Queue. 2006-10-10. 
https://queue.acm.org/detail.cfm?id=1165766


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 22/29] arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-08-13 14:01   ` Luis Machado
@ 2020-08-22 10:56     ` Catalin Marinas
  0 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-08-22 10:56 UTC (permalink / raw)
  To: Luis Machado
  Cc: linux-arm-kernel, linux-mm, linux-arch, Will Deacon,
	Dave P Martin, Vincenzo Frascino, Szabolcs Nagy, Kevin Brodsky,
	Andrey Konovalov, Peter Collingbourne, Andrew Morton,
	Alan Hayward, Omair Javaid

Hi Luis,

On Thu, Aug 13, 2020 at 11:01:06AM -0300, Luis Machado wrote:
> There has been changes to GDB's/LLDB's side to incorporate a tag type into
> the peek/poke requests. This is an attempt to anticipate required support
> for other tag types, like CHERI's tags. Those are somewhat different from
> MTE's tags.

Please note that Morello (the ARM's CHERI implementation) won't go into
mainline Linux for the time being. It's a development board to
experiment with CHERI and the architecture may eventually turn out
slightly different. Also note that the current Morello hardware doesn't
support MTE.

The tags are indeed different from the MTE ones, though both are just
additional metadata associated with a set of bytes in memory. It happens
that a tag in both cases corresponds to a 16-byte memory range.

> The core file design for storing tags, which is in progress and currently in
> my court, is also taking into account other types of tags.

It makes sense for the core file.

> Given the above, should we consider passing a type to the kernel ptrace
> requests as well?
> 
> Also, since the ptrace requests would have to handle different types of
> tags, should we rename PEEKMTETAGS/POKEMTETAGS to PEEKTAGS/POKETAGS instead
> and make those requests generic?

I'm not sure how we could pass a type since ptrace() only takes a single
argument for the request. We could use a different structure than iovec
and encode a type in a field in the new structure but I'd rather keep
the generic struct iovec. So basically the "MTE" part in PEEKMTETAGS is
the type.

Internally, the kernel implementation will probably translate the
request into a common function call with a tag type but for the
user-visible ptrace() interface, I don't see what benefits it would
bring. If you have a better suggestion on how to encode the type, I'm
open to discuss it.

-- 
Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-08-20 16:43                       ` Szabolcs Nagy
  2020-08-20 17:27                         ` Paul Eggert
@ 2020-08-22 11:28                         ` Catalin Marinas
  1 sibling, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-08-22 11:28 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Dave Martin, linux-arch, Peter Collingbourne, Andrey Konovalov,
	Kevin Brodsky, linux-mm, Andrew Morton, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd, libc-alpha, Richard Earnshaw,
	Matthew Malcomson

On Thu, Aug 20, 2020 at 05:43:15PM +0100, Szabolcs Nagy wrote:
> The 08/19/2020 10:54, Catalin Marinas wrote:
> > On Wed, Aug 12, 2020 at 01:45:21PM +0100, Szabolcs Nagy wrote:
> > > On 08/11/2020 18:20, Catalin Marinas wrote:
> > > turning sync tag checks on early would enable the most of the
> > > interesting usecases (only PROT_MTE has to be handled at runtime not
> > > the prctls. however i don't yet know how userspace will deal with
> > > compat issues, i.e. it may not be valid to unconditionally turn tag
> > > checks on early).
> > 
> > If we change the defaults so that no prctl() is required for the
> > standard use-case, it would solve most of the common deployment issues:
> > 
> > 1. Tagged address ABI default on when HWCAP2_MTE is present
> > 2. Synchronous TCF by default
> > 3. GCR_EL1.Excl allows all tags except 0 by default
> > 
> > Any other configuration diverging from the above is considered
> > specialist deployment and will have to issue the prctl() on a per-thread
> > basis.
> > 
> > Compat issues in user-space will be dealt with via environment
> > variables but pretty much on/off rather than fine-grained tag checking
> > mode. So for glibc, you'd have only _MTAG=0 or 1 and the only effect is
> > using PROT_MTE + tagged pointers or no-PROT_MTE + tag 0.
> 
> enabling mte checks by default would be nice and simple (a libc can
> support tagging allocators without any change assuming its code is mte
> safe which is true e.g. for the latest glibc release and for musl
> libc).

While talking to the Android folk, it occurred to me that the default
tag checking mode doesn't even need to be decided by the kernel. The
dynamic loader can set the desired tag check mode and the tagged address
ABI based on environment variables (_MTAG_ENABLE=x) and do a prctl()
before any threads have been created. Subsequent malloc() calls or
dlopen() can mmap/mprotect different memory regions to PROT_MTE and all
threads will be affected equally.

The only configuration a heap allocator may want to change is the tag
exclude mask (GCR_EL1.Excl) but even this can, by convention, be
configured by the dynamic loader.

> the compat issue with this is existing code using pointer top bits
> which i assume faults when dereferenced with the mte checks enabled.
> (although this should be very rare since top byte ignore on deref is
> aarch64 specific.)

They'd fault only if they dereference PROT_MTE memory and the tag check
mode is async or sync.

> i see two options:
> 
> - don't care about top bit compat issues:
>   change the default in the kernel as you described (so checks are
>   enabled and users only need PROT_MTE mapping if they want to use
>   taggging).

As I said above, suggested by the Google guys, this default choice can
be left with the dynamic loader before any threads are started.

> - care about top bit issues:
>   leave the kernel abi as in the patch set and do the mte setup early
>   in the libc. add elf markings to new binaries that they are mte
>   compatible and libc can use that marking for the mte setup.
>   dlopening incompatible libraries will fail. the issue with this is
>   that we have no idea how to add the marking and the marking prevents
>   mte use with existing binaries (and eg. ldpreload malloc would
>   require an updated libc).

Maybe a third option (which leaves the kernel ABI as is):

If the ELF markings only control the PROT_MTE regions (stack or heap),
we can configure the tag checking mode and tagged address ABI early
through environment variables (_MTAG_ENABLE). If you have a problematic
binary, just set _MTAG_ENABLE=0 and a dlopen, even if loading an
MTE-capable object, would not map the stack with PROT_MTE. Heap
allocators could also ignore _MTAG_ENABLE since PROT_MTE doesn't have an
effect if no tag checking is in place. This way we can probably mix
objects as long as we have a control.

So, in summary, I think we can get away with only issuing the prctl() in
the dynamic loader before any threads start and using PROT_MTE later at
run-time, multi-threaded, as needed by malloc(), dlopen etc.

-- 
Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation
  2020-08-20 17:27                         ` Paul Eggert
@ 2020-08-22 11:31                           ` Catalin Marinas
  0 siblings, 0 replies; 53+ messages in thread
From: Catalin Marinas @ 2020-08-22 11:31 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Szabolcs Nagy, linux-arch, Richard Earnshaw, nd, libc-alpha,
	Will Deacon, Andrey Konovalov, Kevin Brodsky, linux-mm,
	Matthew Malcomson, Andrew Morton, Vincenzo Frascino,
	Peter Collingbourne, Dave Martin, linux-arm-kernel

On Thu, Aug 20, 2020 at 10:27:43AM -0700, Paul Eggert wrote:
> On 8/20/20 9:43 AM, Szabolcs Nagy wrote:
> > the compat issue with this is existing code
> > using pointer top bits which i assume faults
> > when dereferenced with the mte checks enabled.
> > (although this should be very rare since
> > top byte ignore on deref is aarch64 specific.)
> 
> Does anyone know of significant aarch64-specific application code that
> depends on top byte ignore? I would think it's so rare (nonexistent?) as to
> not be worth worrying about.

Apart from the LLVM hwasan feature, I'm not aware of code relying on the
top byte ignore. There were discussions in the past to use it with some
JITs but I'm not sure they ever materialised.

I think the Mozilla JS engine uses (used?) additional bits on top of a
pointer but they are masked out before the access.

> Even in the bad old days when Emacs used pointer top bits for typechecking,
> it carefully removed those bits before dereferencing. Any other
> reasonably-portable application would have to do the same of course.

I agree.

-- 
Catalin


^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, back to index

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15 17:08 [PATCH v7 00/26] arm64: Memory Tagging Extension user-space support Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 01/29] arm64: mte: system register definitions Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 02/29] arm64: mte: CPU feature detection and initial sysreg configuration Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 03/29] arm64: mte: Use Normal Tagged attributes for the linear map Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 04/29] arm64: mte: Add specific SIGSEGV codes Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 05/29] arm64: mte: Handle synchronous and asynchronous tag check faults Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 06/29] mm: Add PG_arch_2 page flag Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 07/29] mm: Preserve the PG_arch_2 flag in __split_huge_page_tail() Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 08/29] arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 09/29] arm64: mte: Tags-aware copy_{user_,}highpage() implementations Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 10/29] arm64: Avoid unnecessary clear_user_page() indirection Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 11/29] arm64: mte: Tags-aware aware memcmp_pages() implementation Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 12/29] arm64: mte: Handle the MAIR_EL1 changes for late CPU bring-up Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 13/29] mm: Introduce arch_calc_vm_flag_bits() Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 14/29] arm64: mte: Add PROT_MTE support to mmap() and mprotect() Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 15/29] mm: Introduce arch_validate_flags() Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 16/29] arm64: mte: Validate the PROT_MTE request via arch_validate_flags() Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 17/29] mm: Allow arm64 mmap(PROT_MTE) on RAM-based files Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 18/29] arm64: mte: Allow user control of the tag check mode via prctl() Catalin Marinas
2020-07-20 15:30   ` Kevin Brodsky
2020-07-20 17:00     ` Dave Martin
2020-07-22 10:28       ` Catalin Marinas
2020-07-23 19:33       ` Kevin Brodsky
2020-07-22 11:09     ` Catalin Marinas
2020-08-04 19:34   ` Kevin Brodsky
2020-08-05  9:24     ` Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 19/29] arm64: mte: Allow user control of the generated random tags " Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 20/29] arm64: mte: Restore the GCR_EL1 register after a suspend Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 21/29] arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 22/29] arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
2020-08-13 14:01   ` Luis Machado
2020-08-22 10:56     ` Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 23/29] arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 24/29] fs: Handle intra-page faults in copy_mount_options() Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 25/29] mm: Add arch hooks for saving/restoring tags Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 26/29] arm64: mte: Enable swap of tagged pages Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 27/29] arm64: mte: Save tags when hibernating Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 28/29] arm64: mte: Kconfig entry Catalin Marinas
2020-07-15 17:08 ` [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation Catalin Marinas
2020-07-27 16:36   ` Szabolcs Nagy
2020-07-28 11:08     ` Dave Martin
2020-07-28 14:53       ` Szabolcs Nagy
2020-07-28 19:59         ` Catalin Marinas
2020-08-03 12:43           ` Szabolcs Nagy
2020-08-07 15:19             ` Catalin Marinas
2020-08-10 14:13               ` Szabolcs Nagy
2020-08-11 17:20                 ` Catalin Marinas
2020-08-12 12:45                   ` Szabolcs Nagy
2020-08-19  9:54                     ` Catalin Marinas
2020-08-20 16:43                       ` Szabolcs Nagy
2020-08-20 17:27                         ` Paul Eggert
2020-08-22 11:31                           ` Catalin Marinas
2020-08-22 11:28                         ` Catalin Marinas

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git