linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support
@ 2020-04-21 14:25 Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 01/23] arm64: alternative: Allow alternative_insn to always issue the first instruction Catalin Marinas
                   ` (22 more replies)
  0 siblings, 23 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

Hi,

This is the third version (second version here [1]) of the series
adding user-space support for the ARMv8.5 Memory Tagging Extension ([2],
[3]). The patches are also available on this branch:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/mte-v3

Changes in this version:

- Ignore asynchronous tag check faults caused by the kernel accessing
  user space (uaccess routines). One of the reasons is that the kernel
  has a few places where it over-reads the user buffer
  (copy_mount_options(), strncpy_from user()) causing an incorrect
  reporting of a tag check fault. The second reason is that such
  asynchronous tag check fault in uaccess was reported as a SIGSEGV
  while the uaccess synchronous faults usually cause an error being
  returned (-EFAULT).

- Device Tree support for marking a memory node as MTE-capable. MTE will
  only be enabled if all the memory nodes support the feature. In the
  absence of any memory node in the DT or if the system is booted with
  ACPI, MTE will be disabled.

- ptrace() support to access the tags in another process address space.
  See the documentation patch for details.

- fs patch for copy_mount_options() to cope with in-page faults. Prior
  to 5.7, this function had a byte-by-byte fallback. It has since been
  updated so that a fault in the first page would lead to the copy being
  aborted altogether.

- GCR_EL1 restoring after a CPU suspend.

- Fix the pgattr_change_is_safe() to only allow changes between Normal
  and Normal-Tagged.

- Update a BUILD_BUG on x86 for NSIGSEGV != 7 following the generic
  value update.

- Rebased to 5.7-rc2.

Swap support is available but not included with this series as we'd like
some feedback from linux-mm folk on the right approach. To be posted
shortly.

Kselftest patches will also be made available soon.

To be decided/implemented:

- mmap(tagged_addr, PROT_MTE) pre-tagging the memory with the tag given
  in the tagged_addr hint.

- coredump (user) to also dump the tags.

[1] https://lore.kernel.org/linux-arm-kernel/20200226180526.3272848-1-catalin.marinas@arm.com/
[2] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety
[3] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Arm_Memory_Tagging_Extension_Whitepaper.pdf

Catalin Marinas (14):
  arm64: alternative: Allow alternative_insn to always issue the first
    instruction
  arm64: mte: Use Normal Tagged attributes for the linear map
  arm64: mte: Assembler macros and default architecture for .S files
  arm64: Tags-aware memcmp_pages() implementation
  arm64: mte: Add PROT_MTE support to mmap() and mprotect()
  mm: Introduce arch_validate_flags()
  arm64: mte: Validate the PROT_MTE request via arch_validate_flags()
  mm: Allow arm64 mmap(PROT_MTE) on RAM-based files
  arm64: mte: Allow user control of the tag check mode via prctl()
  arm64: mte: Allow user control of the generated random tags via
    prctl()
  arm64: mte: Restore the GCR_EL1 register after a suspend
  arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  fs: Allow copy_mount_options() to access user-space in a single pass
  arm64: mte: Check the DT memory nodes for MTE support

Kevin Brodsky (1):
  mm: Introduce arch_calc_vm_flag_bits()

Vincenzo Frascino (8):
  arm64: mte: system register definitions
  arm64: mte: CPU feature detection and initial sysreg configuration
  arm64: mte: Tags-aware clear_page() implementation
  arm64: mte: Tags-aware copy_page() implementation
  arm64: mte: Add specific SIGSEGV codes
  arm64: mte: Handle synchronous and asynchronous tag check faults
  arm64: mte: Kconfig entry
  arm64: mte: Add Memory Tagging Extension documentation

 Documentation/arm64/cpu-feature-registers.rst |   2 +
 Documentation/arm64/elf_hwcaps.rst            |   5 +
 Documentation/arm64/index.rst                 |   1 +
 .../arm64/memory-tagging-extension.rst        | 260 +++++++++++++++++
 arch/arm64/Kconfig                            |  32 +++
 arch/arm64/boot/dts/arm/fvp-base-revc.dts     |   1 +
 arch/arm64/include/asm/alternative.h          |   8 +-
 arch/arm64/include/asm/assembler.h            |  17 ++
 arch/arm64/include/asm/cpucaps.h              |   4 +-
 arch/arm64/include/asm/cpufeature.h           |   6 +
 arch/arm64/include/asm/hwcap.h                |   1 +
 arch/arm64/include/asm/kvm_arm.h              |   3 +-
 arch/arm64/include/asm/memory.h               |  17 +-
 arch/arm64/include/asm/mman.h                 |  78 ++++++
 arch/arm64/include/asm/mte.h                  |  56 ++++
 arch/arm64/include/asm/page.h                 |   2 +-
 arch/arm64/include/asm/pgtable-prot.h         |   2 +
 arch/arm64/include/asm/pgtable.h              |   7 +-
 arch/arm64/include/asm/processor.h            |   4 +
 arch/arm64/include/asm/sysreg.h               |  62 +++++
 arch/arm64/include/asm/thread_info.h          |   4 +-
 arch/arm64/include/asm/uaccess.h              |  11 +
 arch/arm64/include/uapi/asm/hwcap.h           |   2 +
 arch/arm64/include/uapi/asm/mman.h            |  14 +
 arch/arm64/include/uapi/asm/ptrace.h          |   4 +
 arch/arm64/kernel/Makefile                    |   1 +
 arch/arm64/kernel/cpufeature.c                | 107 +++++++
 arch/arm64/kernel/cpuinfo.c                   |   2 +
 arch/arm64/kernel/entry.S                     |  36 +++
 arch/arm64/kernel/mte.c                       | 262 ++++++++++++++++++
 arch/arm64/kernel/process.c                   |  31 ++-
 arch/arm64/kernel/ptrace.c                    |  17 +-
 arch/arm64/kernel/signal.c                    |   8 +
 arch/arm64/kernel/suspend.c                   |   4 +
 arch/arm64/kernel/syscall.c                   |  10 +
 arch/arm64/lib/Makefile                       |   2 +
 arch/arm64/lib/clear_page.S                   |   7 +-
 arch/arm64/lib/copy_page.S                    |  23 ++
 arch/arm64/lib/mte.S                          |  96 +++++++
 arch/arm64/mm/Makefile                        |   1 +
 arch/arm64/mm/cmppages.c                      |  26 ++
 arch/arm64/mm/dump.c                          |   4 +
 arch/arm64/mm/fault.c                         |   9 +-
 arch/arm64/mm/mmu.c                           |  22 +-
 arch/arm64/mm/proc.S                          |   8 +-
 arch/x86/kernel/signal_compat.c               |   2 +-
 fs/namespace.c                                |   7 +-
 fs/proc/task_mmu.c                            |   4 +
 include/linux/mm.h                            |   8 +
 include/linux/mman.h                          |  22 +-
 include/linux/uaccess.h                       |   8 +
 include/uapi/asm-generic/siginfo.h            |   4 +-
 include/uapi/linux/prctl.h                    |   9 +
 mm/mmap.c                                     |   9 +
 mm/mprotect.c                                 |   6 +
 mm/shmem.c                                    |   3 +
 mm/util.c                                     |   2 +-
 57 files changed, 1332 insertions(+), 31 deletions(-)
 create mode 100644 Documentation/arm64/memory-tagging-extension.rst
 create mode 100644 arch/arm64/include/asm/mman.h
 create mode 100644 arch/arm64/include/asm/mte.h
 create mode 100644 arch/arm64/include/uapi/asm/mman.h
 create mode 100644 arch/arm64/kernel/mte.c
 create mode 100644 arch/arm64/lib/mte.S
 create mode 100644 arch/arm64/mm/cmppages.c



^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 01/23] arm64: alternative: Allow alternative_insn to always issue the first instruction
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-27 16:57   ` Dave Martin
  2020-04-21 14:25 ` [PATCH v3 02/23] arm64: mte: system register definitions Catalin Marinas
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

There are situations where we do not want to disable the whole block
based on a config option, only the alternative part while keeping the
first instruction. Improve the alternative_insn assembler macro to take
a 'first_insn' argument, default 0, to preserve the current behaviour.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/alternative.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 5e5dc05d63a0..67d7cc608336 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -111,7 +111,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
 	.byte \alt_len
 .endm
 
-.macro alternative_insn insn1, insn2, cap, enable = 1
+/*
+ * Disable the whole block if enable == 0, unless first_insn == 1 in which
+ * case insn1 will always be issued but without an alternative insn2.
+ */
+.macro alternative_insn insn1, insn2, cap, enable = 1, first_insn = 0
 	.if \enable
 661:	\insn1
 662:	.pushsection .altinstructions, "a"
@@ -122,6 +126,8 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
 664:	.popsection
 	.org	. - (664b-663b) + (662b-661b)
 	.org	. - (662b-661b) + (664b-663b)
+	.elseif \first_insn
+	\insn1
 	.endif
 .endm
 


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 02/23] arm64: mte: system register definitions
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 01/23] arm64: alternative: Allow alternative_insn to always issue the first instruction Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 03/23] arm64: mte: CPU feature detection and initial sysreg configuration Catalin Marinas
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Add Memory Tagging Extension system register definitions together with
the relevant bitfields.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v2:
    - Added SET_PSTATE_TCO() macro.

 arch/arm64/include/asm/kvm_arm.h     |  1 +
 arch/arm64/include/asm/sysreg.h      | 54 ++++++++++++++++++++++++++++
 arch/arm64/include/uapi/asm/ptrace.h |  1 +
 arch/arm64/kernel/ptrace.c           |  2 +-
 4 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 51c1d9918999..8a1cbfd544d6 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -12,6 +12,7 @@
 #include <asm/types.h>
 
 /* Hyp Configuration Register (HCR) bits */
+#define HCR_ATA		(UL(1) << 56)
 #define HCR_FWB		(UL(1) << 46)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index c4ac0ac25a00..e823e93b7429 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -91,10 +91,12 @@
 #define PSTATE_PAN			pstate_field(0, 4)
 #define PSTATE_UAO			pstate_field(0, 3)
 #define PSTATE_SSBS			pstate_field(3, 1)
+#define PSTATE_TCO			pstate_field(3, 4)
 
 #define SET_PSTATE_PAN(x)		__emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_UAO(x)		__emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift))
 #define SET_PSTATE_SSBS(x)		__emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift))
+#define SET_PSTATE_TCO(x)		__emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift))
 
 #define __SYS_BARRIER_INSN(CRm, op2, Rt) \
 	__emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f))
@@ -174,6 +176,8 @@
 #define SYS_SCTLR_EL1			sys_reg(3, 0, 1, 0, 0)
 #define SYS_ACTLR_EL1			sys_reg(3, 0, 1, 0, 1)
 #define SYS_CPACR_EL1			sys_reg(3, 0, 1, 0, 2)
+#define SYS_RGSR_EL1			sys_reg(3, 0, 1, 0, 5)
+#define SYS_GCR_EL1			sys_reg(3, 0, 1, 0, 6)
 
 #define SYS_ZCR_EL1			sys_reg(3, 0, 1, 2, 0)
 
@@ -211,6 +215,8 @@
 #define SYS_ERXADDR_EL1			sys_reg(3, 0, 5, 4, 3)
 #define SYS_ERXMISC0_EL1		sys_reg(3, 0, 5, 5, 0)
 #define SYS_ERXMISC1_EL1		sys_reg(3, 0, 5, 5, 1)
+#define SYS_TFSR_EL1			sys_reg(3, 0, 5, 6, 0)
+#define SYS_TFSRE0_EL1			sys_reg(3, 0, 5, 6, 1)
 
 #define SYS_FAR_EL1			sys_reg(3, 0, 6, 0, 0)
 #define SYS_PAR_EL1			sys_reg(3, 0, 7, 4, 0)
@@ -361,6 +367,7 @@
 
 #define SYS_CCSIDR_EL1			sys_reg(3, 1, 0, 0, 0)
 #define SYS_CLIDR_EL1			sys_reg(3, 1, 0, 0, 1)
+#define SYS_GMID_EL1			sys_reg(3, 1, 0, 0, 4)
 #define SYS_AIDR_EL1			sys_reg(3, 1, 0, 0, 7)
 
 #define SYS_CSSELR_EL1			sys_reg(3, 2, 0, 0, 0)
@@ -453,6 +460,7 @@
 #define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
 #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
 #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
+#define SYS_TFSR_EL2			sys_reg(3, 4, 5, 6, 0)
 #define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
 
 #define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
@@ -509,6 +517,7 @@
 #define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
 #define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
 #define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
+#define SYS_TFSR_EL12			sys_reg(3, 5, 5, 6, 0)
 #define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
 #define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
 #define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
@@ -524,6 +533,15 @@
 
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
+#define SCTLR_ELx_ATA	(BIT(43))
+
+#define SCTLR_ELx_TCF_SHIFT	40
+#define SCTLR_ELx_TCF_NONE	(UL(0x0) << SCTLR_ELx_TCF_SHIFT)
+#define SCTLR_ELx_TCF_SYNC	(UL(0x1) << SCTLR_ELx_TCF_SHIFT)
+#define SCTLR_ELx_TCF_ASYNC	(UL(0x2) << SCTLR_ELx_TCF_SHIFT)
+#define SCTLR_ELx_TCF_MASK	(UL(0x3) << SCTLR_ELx_TCF_SHIFT)
+
+#define SCTLR_ELx_ITFSB	(BIT(37))
 #define SCTLR_ELx_ENIA	(BIT(31))
 #define SCTLR_ELx_ENIB	(BIT(30))
 #define SCTLR_ELx_ENDA	(BIT(27))
@@ -552,6 +570,14 @@
 #endif
 
 /* SCTLR_EL1 specific flags. */
+#define SCTLR_EL1_ATA0		(BIT(42))
+
+#define SCTLR_EL1_TCF0_SHIFT	38
+#define SCTLR_EL1_TCF0_NONE	(UL(0x0) << SCTLR_EL1_TCF0_SHIFT)
+#define SCTLR_EL1_TCF0_SYNC	(UL(0x1) << SCTLR_EL1_TCF0_SHIFT)
+#define SCTLR_EL1_TCF0_ASYNC	(UL(0x2) << SCTLR_EL1_TCF0_SHIFT)
+#define SCTLR_EL1_TCF0_MASK	(UL(0x3) << SCTLR_EL1_TCF0_SHIFT)
+
 #define SCTLR_EL1_UCI		(BIT(26))
 #define SCTLR_EL1_E0E		(BIT(24))
 #define SCTLR_EL1_SPAN		(BIT(23))
@@ -586,6 +612,7 @@
 #define MAIR_ATTR_DEVICE_GRE		UL(0x0c)
 #define MAIR_ATTR_NORMAL_NC		UL(0x44)
 #define MAIR_ATTR_NORMAL_WT		UL(0xbb)
+#define MAIR_ATTR_NORMAL_TAGGED		UL(0xf0)
 #define MAIR_ATTR_NORMAL		UL(0xff)
 #define MAIR_ATTR_MASK			UL(0xff)
 
@@ -660,11 +687,16 @@
 
 /* id_aa64pfr1 */
 #define ID_AA64PFR1_SSBS_SHIFT		4
+#define ID_AA64PFR1_MTE_SHIFT		8
 
 #define ID_AA64PFR1_SSBS_PSTATE_NI	0
 #define ID_AA64PFR1_SSBS_PSTATE_ONLY	1
 #define ID_AA64PFR1_SSBS_PSTATE_INSNS	2
 
+#define ID_AA64PFR1_MTE_NI		0x0
+#define ID_AA64PFR1_MTE_EL0		0x1
+#define ID_AA64PFR1_MTE			0x2
+
 /* id_aa64zfr0 */
 #define ID_AA64ZFR0_F64MM_SHIFT		56
 #define ID_AA64ZFR0_F32MM_SHIFT		52
@@ -822,6 +854,28 @@
 #define CPACR_EL1_ZEN_EL0EN	(BIT(17)) /* enable EL0 access, if EL1EN set */
 #define CPACR_EL1_ZEN		(CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN)
 
+/* TCR EL1 Bit Definitions */
+#define SYS_TCR_EL1_TCMA1	(BIT(58))
+#define SYS_TCR_EL1_TCMA0	(BIT(57))
+
+/* GCR_EL1 Definitions */
+#define SYS_GCR_EL1_RRND	(BIT(16))
+#define SYS_GCR_EL1_EXCL_MASK	0xffffUL
+
+/* RGSR_EL1 Definitions */
+#define SYS_RGSR_EL1_TAG_MASK	0xfUL
+#define SYS_RGSR_EL1_SEED_SHIFT	8
+#define SYS_RGSR_EL1_SEED_MASK	0xffffUL
+
+/* GMID_EL1 field definitions */
+#define SYS_GMID_EL1_BS_SHIFT	0
+#define SYS_GMID_EL1_BS_SIZE	4
+
+/* TFSR{,E0}_EL1 bit definitions */
+#define SYS_TFSR_EL1_TF0_SHIFT	0
+#define SYS_TFSR_EL1_TF1_SHIFT	1
+#define SYS_TFSR_EL1_TF0	(UL(1) << SYS_TFSR_EL1_TF0_SHIFT)
+#define SYS_TFSR_EL1_TF1	(UK(2) << SYS_TFSR_EL1_TF1_SHIFT)
 
 /* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
 #define SYS_MPIDR_SAFE_VAL	(BIT(31))
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index d1bb5b69f1ce..1daf6dda8af0 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -50,6 +50,7 @@
 #define PSR_PAN_BIT	0x00400000
 #define PSR_UAO_BIT	0x00800000
 #define PSR_DIT_BIT	0x01000000
+#define PSR_TCO_BIT	0x02000000
 #define PSR_V_BIT	0x10000000
 #define PSR_C_BIT	0x20000000
 #define PSR_Z_BIT	0x40000000
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index b3d3005d9515..077e352495eb 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1873,7 +1873,7 @@ void syscall_trace_exit(struct pt_regs *regs)
  * We also reserve IL for the kernel; SS is handled dynamically.
  */
 #define SPSR_EL1_AARCH64_RES0_BITS \
-	(GENMASK_ULL(63, 32) | GENMASK_ULL(27, 25) | GENMASK_ULL(23, 22) | \
+	(GENMASK_ULL(63, 32) | GENMASK_ULL(27, 26) | GENMASK_ULL(23, 22) | \
 	 GENMASK_ULL(20, 13) | GENMASK_ULL(11, 10) | GENMASK_ULL(5, 5))
 #define SPSR_EL1_AARCH32_RES0_BITS \
 	(GENMASK_ULL(63, 32) | GENMASK_ULL(22, 22) | GENMASK_ULL(20, 20))


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 03/23] arm64: mte: CPU feature detection and initial sysreg configuration
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 01/23] arm64: alternative: Allow alternative_insn to always issue the first instruction Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 02/23] arm64: mte: system register definitions Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 04/23] arm64: mte: Use Normal Tagged attributes for the linear map Catalin Marinas
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Suzuki K Poulose

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Add the cpufeature and hwcap entries to detect the presence of MTE on
the boot CPUs (primary and secondary). Any late secondary CPU not
supporting the feature, if detected during boot, will be parked.

In addition, add the minimum SCTLR_EL1 and HCR_EL2 bits for enabling
MTE. Without subsequent setting of MAIR, these bits do not have an
effect on tag checking.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
---
 arch/arm64/include/asm/cpucaps.h    |  4 +++-
 arch/arm64/include/asm/cpufeature.h |  6 ++++++
 arch/arm64/include/asm/hwcap.h      |  1 +
 arch/arm64/include/asm/kvm_arm.h    |  2 +-
 arch/arm64/include/asm/sysreg.h     |  1 +
 arch/arm64/include/uapi/asm/hwcap.h |  2 ++
 arch/arm64/kernel/cpufeature.c      | 30 +++++++++++++++++++++++++++++
 arch/arm64/kernel/cpuinfo.c         |  2 ++
 8 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8eb5a088ae65..4731ebacff54 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -61,7 +61,9 @@
 #define ARM64_HAS_AMU_EXTN			51
 #define ARM64_HAS_ADDRESS_AUTH			52
 #define ARM64_HAS_GENERIC_AUTH			53
+/* 54 reserved for ARM64_BTI */
+#define ARM64_MTE				55
 
-#define ARM64_NCAPS				54
+#define ARM64_NCAPS				56
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index afe08251ff95..afc315814563 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -674,6 +674,12 @@ static inline bool system_uses_irq_prio_masking(void)
 	       cpus_have_const_cap(ARM64_HAS_IRQ_PRIO_MASKING);
 }
 
+static inline bool system_supports_mte(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_MTE) &&
+		cpus_have_const_cap(ARM64_MTE);
+}
+
 static inline bool system_has_prio_mask_debugging(void)
 {
 	return IS_ENABLED(CONFIG_ARM64_DEBUG_PRIORITY_MASKING) &&
diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
index 0f00265248b5..8b302c88cfeb 100644
--- a/arch/arm64/include/asm/hwcap.h
+++ b/arch/arm64/include/asm/hwcap.h
@@ -94,6 +94,7 @@
 #define KERNEL_HWCAP_BF16		__khwcap2_feature(BF16)
 #define KERNEL_HWCAP_DGH		__khwcap2_feature(DGH)
 #define KERNEL_HWCAP_RNG		__khwcap2_feature(RNG)
+#define KERNEL_HWCAP_MTE		__khwcap2_feature(MTE)
 
 /*
  * This yields a mask that user programs can use to figure out what
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8a1cbfd544d6..6c3b2fc922bb 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -78,7 +78,7 @@
 			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
 			 HCR_FMO | HCR_IMO)
 #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
-#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK)
+#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
 #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
 
 /* TCR_EL2 Registers bits */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index e823e93b7429..86236ae6c4e7 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -604,6 +604,7 @@
 			 SCTLR_EL1_SA0  | SCTLR_EL1_SED  | SCTLR_ELx_I    |\
 			 SCTLR_EL1_DZE  | SCTLR_EL1_UCT                   |\
 			 SCTLR_EL1_NTWE | SCTLR_ELx_IESB | SCTLR_EL1_SPAN |\
+			 SCTLR_ELx_ITFSB| SCTLR_ELx_ATA  | SCTLR_EL1_ATA0 |\
 			 ENDIAN_SET_EL1 | SCTLR_EL1_UCI  | SCTLR_EL1_RES1)
 
 /* MAIR_ELx memory attributes (used by Linux) */
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index 7752d93bb50f..73ac5aede18c 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -73,5 +73,7 @@
 #define HWCAP2_BF16		(1 << 14)
 #define HWCAP2_DGH		(1 << 15)
 #define HWCAP2_RNG		(1 << 16)
+/* bit 17 reserved for HWCAP2_BTI */
+#define HWCAP2_MTE		(1 << 18)
 
 #endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9fac745aa7bb..512a8b24c5df 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -182,6 +182,8 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SSBS_SHIFT, 4, ID_AA64PFR1_SSBS_PSTATE_NI),
+	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_MTE),
+		       FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_MTE_SHIFT, 4, ID_AA64PFR1_MTE_NI),
 	ARM64_FTR_END,
 };
 
@@ -1409,6 +1411,18 @@ static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry,
 }
 #endif
 
+#ifdef CONFIG_ARM64_MTE
+static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
+{
+	/* all non-zero tags excluded by default */
+	write_sysreg_s(SYS_GCR_EL1_RRND | SYS_GCR_EL1_EXCL_MASK, SYS_GCR_EL1);
+	write_sysreg_s(0, SYS_TFSR_EL1);
+	write_sysreg_s(0, SYS_TFSRE0_EL1);
+
+	isb();
+}
+#endif /* CONFIG_ARM64_MTE */
+
 /* Internal helper functions to match cpu capability type */
 static bool
 cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
@@ -1779,6 +1793,19 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.min_field_value = 1,
 	},
 #endif
+#ifdef CONFIG_ARM64_MTE
+	{
+		.desc = "Memory Tagging Extension",
+		.capability = ARM64_MTE,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		.sys_reg = SYS_ID_AA64PFR1_EL1,
+		.field_pos = ID_AA64PFR1_MTE_SHIFT,
+		.min_field_value = ID_AA64PFR1_MTE,
+		.sign = FTR_UNSIGNED,
+		.cpu_enable = cpu_enable_mte,
+	},
+#endif /* CONFIG_ARM64_MTE */
 	{},
 };
 
@@ -1892,6 +1919,9 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
 	HWCAP_MULTI_CAP(ptr_auth_hwcap_addr_matches, CAP_HWCAP, KERNEL_HWCAP_PACA),
 	HWCAP_MULTI_CAP(ptr_auth_hwcap_gen_matches, CAP_HWCAP, KERNEL_HWCAP_PACG),
 #endif
+#ifdef CONFIG_ARM64_MTE
+	HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_MTE_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_MTE, CAP_HWCAP, KERNEL_HWCAP_MTE),
+#endif /* CONFIG_ARM64_MTE */
 	{},
 };
 
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 86136075ae41..d14b29de2c73 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -92,6 +92,8 @@ static const char *const hwcap_str[] = {
 	"bf16",
 	"dgh",
 	"rng",
+	"",		/* reserved for BTI */
+	"mte",
 	NULL
 };
 


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 04/23] arm64: mte: Use Normal Tagged attributes for the linear map
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (2 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 03/23] arm64: mte: CPU feature detection and initial sysreg configuration Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 05/23] arm64: mte: Assembler macros and default architecture for .S files Catalin Marinas
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Suzuki K Poulose

Once user space is given access to tagged memory, the kernel must be
able to clear/save/restore tags visible to the user. This is done via
the linear mapping, therefore map it as such. The new MT_NORMAL_TAGGED
index for MAIR_EL1 is initially mapped as Normal memory and later
changed to Normal Tagged via the cpufeature infrastructure. From a
mismatched attribute aliases perspective, the Tagged memory is
considered a permission and it won't lead to undefined behaviour.

The empty_zero_page is cleared to ensure that the tags it contains are
already zeroed. The actual tags-aware clear_page() implementation is
part of a subsequent patch.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
---

Notes:
    v3:
    - Restrict the safe attribute change in pgattr_change_is_safe() only to
      Normal to/from Normal-Tagged (old version allow any other type as long
      as old or new was Normal(-Tagged)).

 arch/arm64/include/asm/memory.h       |  1 +
 arch/arm64/include/asm/pgtable-prot.h |  2 ++
 arch/arm64/kernel/cpufeature.c        | 30 +++++++++++++++++++++++++++
 arch/arm64/mm/dump.c                  |  4 ++++
 arch/arm64/mm/mmu.c                   | 22 ++++++++++++++++++--
 arch/arm64/mm/proc.S                  |  8 +++++--
 6 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index a1871bb32bb1..472c77a68225 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -136,6 +136,7 @@
 #define MT_NORMAL_NC		3
 #define MT_NORMAL		4
 #define MT_NORMAL_WT		5
+#define MT_NORMAL_TAGGED	6
 
 /*
  * Memory types for Stage-2 translation
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 1305e28225fc..9c924b09d5c8 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -39,6 +39,7 @@ extern bool arm64_use_ng_mappings;
 #define PROT_NORMAL_NC		(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL_NC))
 #define PROT_NORMAL_WT		(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL_WT))
 #define PROT_NORMAL		(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL))
+#define PROT_NORMAL_TAGGED	(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_NORMAL_TAGGED))
 
 #define PROT_SECT_DEVICE_nGnRE	(PROT_SECT_DEFAULT | PMD_SECT_PXN | PMD_SECT_UXN | PMD_ATTRINDX(MT_DEVICE_nGnRE))
 #define PROT_SECT_NORMAL	(PROT_SECT_DEFAULT | PMD_SECT_PXN | PMD_SECT_UXN | PMD_ATTRINDX(MT_NORMAL))
@@ -48,6 +49,7 @@ extern bool arm64_use_ng_mappings;
 #define _HYP_PAGE_DEFAULT	_PAGE_DEFAULT
 
 #define PAGE_KERNEL		__pgprot(PROT_NORMAL)
+#define PAGE_KERNEL_TAGGED	__pgprot(PROT_NORMAL_TAGGED)
 #define PAGE_KERNEL_RO		__pgprot((PROT_NORMAL & ~PTE_WRITE) | PTE_RDONLY)
 #define PAGE_KERNEL_ROX		__pgprot((PROT_NORMAL & ~(PTE_WRITE | PTE_PXN)) | PTE_RDONLY)
 #define PAGE_KERNEL_EXEC	__pgprot(PROT_NORMAL & ~PTE_PXN)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 512a8b24c5df..d2fe8ff72324 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1414,13 +1414,43 @@ static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry,
 #ifdef CONFIG_ARM64_MTE
 static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
 {
+	u64 mair;
+
 	/* all non-zero tags excluded by default */
 	write_sysreg_s(SYS_GCR_EL1_RRND | SYS_GCR_EL1_EXCL_MASK, SYS_GCR_EL1);
 	write_sysreg_s(0, SYS_TFSR_EL1);
 	write_sysreg_s(0, SYS_TFSRE0_EL1);
 
+	/*
+	 * Update the MT_NORMAL_TAGGED index in MAIR_EL1. Tag checking is
+	 * disabled for the kernel, so there won't be any observable effect
+	 * other than allowing the kernel to read and write tags.
+	 */
+	mair = read_sysreg_s(SYS_MAIR_EL1);
+	mair &= ~MAIR_ATTRIDX(MAIR_ATTR_MASK, MT_NORMAL_TAGGED);
+	mair |= MAIR_ATTRIDX(MAIR_ATTR_NORMAL_TAGGED, MT_NORMAL_TAGGED);
+	write_sysreg_s(mair, SYS_MAIR_EL1);
+
 	isb();
 }
+
+static int __init system_enable_mte(void)
+{
+	if (!system_supports_mte())
+		return 0;
+
+	/* Ensure the TLB does not have stale MAIR attributes */
+	flush_tlb_all();
+
+	/*
+	 * Clear the zero page (again) so that tags are reset. This needs to
+	 * be done via the linear map which has the Tagged attribute.
+	 */
+	clear_page(lm_alias(empty_zero_page));
+
+	return 0;
+}
+core_initcall(system_enable_mte);
 #endif /* CONFIG_ARM64_MTE */
 
 /* Internal helper functions to match cpu capability type */
diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 860c00ec8bd3..416a2404ac83 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -165,6 +165,10 @@ static const struct prot_bits pte_bits[] = {
 		.mask	= PTE_ATTRINDX_MASK,
 		.val	= PTE_ATTRINDX(MT_NORMAL),
 		.set	= "MEM/NORMAL",
+	}, {
+		.mask	= PTE_ATTRINDX_MASK,
+		.val	= PTE_ATTRINDX(MT_NORMAL_TAGGED),
+		.set	= "MEM/NORMAL-TAGGED",
 	}
 };
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a374e4f51a62..37bb5b19bdf4 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -121,7 +121,7 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
 	 * The following mapping attributes may be updated in live
 	 * kernel mappings without the need for break-before-make.
 	 */
-	static const pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
+	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
 
 	/* creating or taking down mappings is always safe */
 	if (old == 0 || new == 0)
@@ -135,6 +135,19 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
 	if (old & ~new & PTE_NG)
 		return false;
 
+	if (system_supports_mte()) {
+		/*
+		 * Changing the memory type between Normal and Normal-Tagged
+		 * is safe since Tagged is considered a permission attribute
+		 * from the mismatched attribute aliases perspective.
+		 */
+		if (((old & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL) ||
+		     (old & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL_TAGGED)) &&
+		    ((new & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL) ||
+		     (new & PTE_ATTRINDX_MASK) == PTE_ATTRINDX(MT_NORMAL_TAGGED)))
+			mask |= PTE_ATTRINDX_MASK;
+	}
+
 	return ((old ^ new) & ~mask) == 0;
 }
 
@@ -489,7 +502,12 @@ static void __init map_mem(pgd_t *pgdp)
 		if (memblock_is_nomap(reg))
 			continue;
 
-		__map_memblock(pgdp, start, end, PAGE_KERNEL, flags);
+		/*
+		 * The linear map must allow allocation tags reading/writing
+		 * if MTE is present. Otherwise, it has the same attributes as
+		 * PAGE_KERNEL.
+		 */
+		__map_memblock(pgdp, start, end, PAGE_KERNEL_TAGGED, flags);
 	}
 
 	/*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 197a9ba2d5ea..48891095eaed 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -44,14 +44,18 @@
 #define TCR_KASAN_FLAGS 0
 #endif
 
-/* Default MAIR_EL1 */
+/*
+ * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal memory and
+ * changed later to Normal Tagged if the system supports MTE.
+ */
 #define MAIR_EL1_SET							\
 	(MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) |	\
 	 MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) |	\
 	 MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) |		\
 	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) |		\
 	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) |			\
-	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT))
+	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) |		\
+	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
 
 #ifdef CONFIG_CPU_PM
 /**


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 05/23] arm64: mte: Assembler macros and default architecture for .S files
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (3 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 04/23] arm64: mte: Use Normal Tagged attributes for the linear map Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 06/23] arm64: mte: Tags-aware clear_page() implementation Catalin Marinas
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

Add the multitag_transfer_size macro to the arm64 assembler.h, together
with '.arch armv8.5-a+memtag' when CONFIG_ARM64_MTE is enabled.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    This patch may change as ".arch armv8.5-a" will be required for other
    features like BTI.
    
    v2:
    - Separate .arch armv8.5-a from .arch_extension memtag.

 arch/arm64/include/asm/assembler.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 0bff325117b4..e7338e129dfd 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -21,8 +21,14 @@
 #include <asm/page.h>
 #include <asm/pgtable-hwdef.h>
 #include <asm/ptrace.h>
+#include <asm/sysreg.h>
 #include <asm/thread_info.h>
 
+#ifdef CONFIG_ARM64_MTE
+	.arch		armv8.5-a
+	.arch_extension memtag
+#endif
+
 	.macro save_and_disable_daif, flags
 	mrs	\flags, daif
 	msr	daifset, #0xf
@@ -736,4 +742,15 @@ USER(\label, ic	ivau, \tmp2)			// invalidate I line PoU
 .Lyield_out_\@ :
 	.endm
 
+/*
+ * multitag_transfer_size - set \reg to the block size that is accessed by the
+ * LDGM/STGM instructions.
+ */
+	.macro	multitag_transfer_size, reg, tmp
+	mrs_s	\reg, SYS_GMID_EL1
+	ubfx	\reg, \reg, #SYS_GMID_EL1_BS_SHIFT, #SYS_GMID_EL1_BS_SIZE
+	mov	\tmp, #4
+	lsl	\reg, \tmp, \reg
+	.endm
+
 #endif	/* __ASM_ASSEMBLER_H */


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 06/23] arm64: mte: Tags-aware clear_page() implementation
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (4 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 05/23] arm64: mte: Assembler macros and default architecture for .S files Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 07/23] arm64: mte: Tags-aware copy_page() implementation Catalin Marinas
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

When the Memory Tagging Extension is enabled, the tags need to be set to
zero a page is cleared as they are visible to the user.

Introduce an MTE-aware clear_page() which clears the tags in addition to
data.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/lib/clear_page.S | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
index 073acbf02a7c..9f85a4cf9568 100644
--- a/arch/arm64/lib/clear_page.S
+++ b/arch/arm64/lib/clear_page.S
@@ -5,7 +5,9 @@
 
 #include <linux/linkage.h>
 #include <linux/const.h>
+#include <asm/alternative.h>
 #include <asm/assembler.h>
+#include <asm/cpufeature.h>
 #include <asm/page.h>
 
 /*
@@ -19,8 +21,9 @@ SYM_FUNC_START(clear_page)
 	and	w1, w1, #0xf
 	mov	x2, #4
 	lsl	x1, x2, x1
-
-1:	dc	zva, x0
+1:
+alternative_insn "dc zva, x0", "stzgm xzr, [x0]", \
+			 ARM64_MTE, IS_ENABLED(CONFIG_ARM64_MTE), 1
 	add	x0, x0, x1
 	tst	x0, #(PAGE_SIZE - 1)
 	b.ne	1b


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 07/23] arm64: mte: Tags-aware copy_page() implementation
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (5 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 06/23] arm64: mte: Tags-aware clear_page() implementation Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 08/23] arm64: Tags-aware memcmp_pages() implementation Catalin Marinas
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

When the Memory Tagging Extension is enabled, the tags need to be
preserved across page copy (e.g. for copy-on-write).

Introduce MTE-aware copy_page() which preserves the tags across page
copy.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/lib/copy_page.S | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
index e7a793961408..c3234175efe0 100644
--- a/arch/arm64/lib/copy_page.S
+++ b/arch/arm64/lib/copy_page.S
@@ -25,6 +25,29 @@ alternative_if ARM64_HAS_NO_HW_PREFETCH
 	prfm	pldl1strm, [x1, #384]
 alternative_else_nop_endif
 
+#ifdef CONFIG_ARM64_MTE
+alternative_if_not ARM64_MTE
+	b	2f
+alternative_else_nop_endif
+	/*
+	 * Copy tags if MTE has been enabled.
+	 */
+	mov	x2, x0
+	mov	x3, x1
+
+	multitag_transfer_size x7, x5
+1:
+	ldgm	x4, [x3]
+	stgm	x4, [x2]
+
+	add	x2, x2, x7
+	add	x3, x3, x7
+
+	tst	x2, #(PAGE_SIZE - 1)
+	b.ne	1b
+2:
+#endif
+
 	ldp	x2, x3, [x1]
 	ldp	x4, x5, [x1, #16]
 	ldp	x6, x7, [x1, #32]


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 08/23] arm64: Tags-aware memcmp_pages() implementation
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (6 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 07/23] arm64: mte: Tags-aware copy_page() implementation Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 09/23] arm64: mte: Add specific SIGSEGV codes Catalin Marinas
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

When the Memory Tagging Extension is enabled, two pages are identical
only if both their data and tags are identical.

Make the generic memcmp_pages() a __weak function and add an
arm64-specific implementation which takes care of the tags comparison.

Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/mte.h | 11 +++++++++
 arch/arm64/lib/Makefile      |  2 ++
 arch/arm64/lib/mte.S         | 46 ++++++++++++++++++++++++++++++++++++
 arch/arm64/mm/Makefile       |  1 +
 arch/arm64/mm/cmppages.c     | 26 ++++++++++++++++++++
 mm/util.c                    |  2 +-
 6 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/mte.h
 create mode 100644 arch/arm64/lib/mte.S
 create mode 100644 arch/arm64/mm/cmppages.c

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
new file mode 100644
index 000000000000..64e814273659
--- /dev/null
+++ b/arch/arm64/include/asm/mte.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MTE_H
+#define __ASM_MTE_H
+
+#ifndef __ASSEMBLY__
+
+/* Memory Tagging API */
+int mte_memcmp_pages(const void *page1_addr, const void *page2_addr);
+
+#endif /* __ASSEMBLY__ */
+#endif /* __ASM_MTE_H  */
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 2fc253466dbf..d31e1169d9b8 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -16,3 +16,5 @@ lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
 obj-$(CONFIG_CRC32) += crc32.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
+
+obj-$(CONFIG_ARM64_MTE) += mte.o
diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
new file mode 100644
index 000000000000..bd51ea7e2fcb
--- /dev/null
+++ b/arch/arm64/lib/mte.S
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 ARM Ltd.
+ */
+#include <linux/linkage.h>
+
+#include <asm/assembler.h>
+
+/*
+ * Compare tags of two pages
+ *   x0 - page1 address
+ *   x1 - page2 address
+ * Returns:
+ *   w0 - negative, zero or positive value if the tag in the first page is
+ *	  less than, equal to or greater than the tag in the second page
+ */
+SYM_FUNC_START(mte_memcmp_pages)
+	multitag_transfer_size x7, x5
+1:
+	ldgm	x2, [x0]
+	ldgm	x3, [x1]
+
+	eor	x4, x2, x3
+	cbnz	x4, 2f
+
+	add	x0, x0, x7
+	add	x1, x1, x7
+
+	tst	x0, #(PAGE_SIZE - 1)
+	b.ne	1b
+
+	mov	w0, #0
+	ret
+2:
+	rbit	x4, x4
+	clz	x4, x4			// count the least significant equal bits
+	and	x4, x4, #~3		// round down to a multiple of 4 (bits per tag)
+
+	lsr	x2, x2, x4		// remove equal tags
+	lsr	x3, x3, x4
+
+	lsl	w2, w2, #28		// compare the differing tags
+	sub	w0, w2, w3, lsl #28
+
+	ret
+SYM_FUNC_END(mte_memcmp_pages)
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index d91030f0ffee..e93d696295d0 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_PTDUMP_CORE)	+= dump.o
 obj-$(CONFIG_PTDUMP_DEBUGFS)	+= ptdump_debugfs.o
 obj-$(CONFIG_NUMA)		+= numa.o
 obj-$(CONFIG_DEBUG_VIRTUAL)	+= physaddr.o
+obj-$(CONFIG_ARM64_MTE)		+= cmppages.o
 KASAN_SANITIZE_physaddr.o	+= n
 
 obj-$(CONFIG_KASAN)		+= kasan_init.o
diff --git a/arch/arm64/mm/cmppages.c b/arch/arm64/mm/cmppages.c
new file mode 100644
index 000000000000..943c1877e014
--- /dev/null
+++ b/arch/arm64/mm/cmppages.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2019 ARM Ltd.
+ */
+
+#include <linux/mm.h>
+#include <linux/string.h>
+
+#include <asm/cpufeature.h>
+#include <asm/mte.h>
+
+int memcmp_pages(struct page *page1, struct page *page2)
+{
+	char *addr1, *addr2;
+	int ret;
+
+	addr1 = page_address(page1);
+	addr2 = page_address(page2);
+
+	ret = memcmp(addr1, addr2, PAGE_SIZE);
+	/* if page content identical, check the tags */
+	if (ret == 0 && system_supports_mte())
+		ret = mte_memcmp_pages(addr1, addr2);
+
+	return ret;
+}
diff --git a/mm/util.c b/mm/util.c
index 988d11e6c17c..662fb3da6d01 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -899,7 +899,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen)
 	return res;
 }
 
-int memcmp_pages(struct page *page1, struct page *page2)
+int __weak memcmp_pages(struct page *page1, struct page *page2)
 {
 	char *addr1, *addr2;
 	int ret;


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 09/23] arm64: mte: Add specific SIGSEGV codes
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (7 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 08/23] arm64: Tags-aware memcmp_pages() implementation Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 10/23] arm64: mte: Handle synchronous and asynchronous tag check faults Catalin Marinas
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Arnd Bergmann

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Add MTE-specific SIGSEGV codes to siginfo.h and update the x86
BUILD_BUG_ON(NSIGSEGV != 7) compile check.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
[catalin.marinas@arm.com: renamed precise/imprecise to sync/async]
[catalin.marinas@arm.com: dropped #ifdef __aarch64__, renumbered]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v3:
    - Fixed the BUILD_BUG_ON(NSIGSEGV != 7) on x86
    - Updated the commit log
    
    v2:
    - Dropped the #ifdef __aarch64__.
    - Renumbered the SEGV_MTE* values to avoid clash with ADI.

 arch/x86/kernel/signal_compat.c    | 2 +-
 include/uapi/asm-generic/siginfo.h | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index 9ccbf0576cd0..a7f3e12cfbdb 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void)
 	 */
 	BUILD_BUG_ON(NSIGILL  != 11);
 	BUILD_BUG_ON(NSIGFPE  != 15);
-	BUILD_BUG_ON(NSIGSEGV != 7);
+	BUILD_BUG_ON(NSIGSEGV != 9);
 	BUILD_BUG_ON(NSIGBUS  != 5);
 	BUILD_BUG_ON(NSIGTRAP != 5);
 	BUILD_BUG_ON(NSIGCHLD != 6);
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index cb3d6c267181..7aacf9389010 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -229,7 +229,9 @@ typedef struct siginfo {
 #define SEGV_ACCADI	5	/* ADI not enabled for mapped object */
 #define SEGV_ADIDERR	6	/* Disrupting MCD error */
 #define SEGV_ADIPERR	7	/* Precise MCD exception */
-#define NSIGSEGV	7
+#define SEGV_MTEAERR	8	/* Asynchronous ARM MTE error */
+#define SEGV_MTESERR	9	/* Synchronous ARM MTE exception */
+#define NSIGSEGV	9
 
 /*
  * SIGBUS si_codes


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 10/23] arm64: mte: Handle synchronous and asynchronous tag check faults
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (8 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 09/23] arm64: mte: Add specific SIGSEGV codes Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-23 10:38   ` Catalin Marinas
  2020-04-27 16:58   ` Dave Martin
  2020-04-21 14:25 ` [PATCH v3 11/23] mm: Introduce arch_calc_vm_flag_bits() Catalin Marinas
                   ` (12 subsequent siblings)
  22 siblings, 2 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

The Memory Tagging Extension has two modes of notifying a tag check
fault at EL0, configurable through the SCTLR_EL1.TCF0 field:

1. Synchronous raising of a Data Abort exception with DFSC 17.
2. Asynchronous setting of a cumulative bit in TFSRE0_EL1.

Add the exception handler for the synchronous exception and handling of
the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in
do_notify_resume().

On a tag check failure in user-space, whether synchronous or
asynchronous, a SIGSEGV will be raised on the faulting thread.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v3:
    - Asynchronous tag check faults during the uaccess routines in the
      kernel are ignored.
    - Fix check_mte_async_tcf calling site as it expects the first argument
      to be the thread flags.
    - Move the mte_thread_switch() definition and call to a later patch as
      this became empty with the removal of async uaccess checking.
    - Add dsb() and clearing of TFSRE0_EL1 in flush_mte_state(), in case
      execve() triggered a asynchronous tag check fault.
    - Clear TIF_MTE_ASYNC_FAULT in arch_dup_task_struct() so that the child
      does not inherit any pending tag fault in the parent.
    
    v2:
    - Clear PSTATE.TCO on exception entry (automatically set by the hardware).
    - On syscall entry, for asynchronous tag check faults from user space,
      generate the signal early via syscall restarting.
    - Before context switch, save any potential async tag check fault
      generated by the kernel to the TIF flag (this follows an architecture
      update where the uaccess routines use the TCF0 mode).
    - Moved the flush_mte_state() and mte_thread_switch() function to a new
      mte.c file.

 arch/arm64/include/asm/mte.h         | 10 ++++++++
 arch/arm64/include/asm/thread_info.h |  4 +++-
 arch/arm64/kernel/Makefile           |  1 +
 arch/arm64/kernel/entry.S            | 36 ++++++++++++++++++++++++++++
 arch/arm64/kernel/mte.c              | 21 ++++++++++++++++
 arch/arm64/kernel/process.c          |  5 ++++
 arch/arm64/kernel/signal.c           |  8 +++++++
 arch/arm64/kernel/syscall.c          | 10 ++++++++
 arch/arm64/mm/fault.c                |  9 ++++++-
 9 files changed, 102 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kernel/mte.c

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 64e814273659..e9711ea51eb5 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -4,8 +4,18 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/sched.h>
+
 /* Memory Tagging API */
 int mte_memcmp_pages(const void *page1_addr, const void *page2_addr);
 
+#ifdef CONFIG_ARM64_MTE
+void flush_mte_state(void);
+#else
+static inline void flush_mte_state(void)
+{
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ASM_MTE_H  */
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 512174a8e789..0c6e5523b932 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -63,6 +63,7 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define TIF_FOREIGN_FPSTATE	3	/* CPU's FP state is not current's */
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_FSCHECK		5	/* Check FS is USER_DS on return */
+#define TIF_MTE_ASYNC_FAULT	6	/* MTE Asynchronous Tag Check Fault */
 #define TIF_SYSCALL_TRACE	8	/* syscall trace active */
 #define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
 #define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
@@ -91,10 +92,11 @@ void arch_release_task_struct(struct task_struct *tsk);
 #define _TIF_FSCHECK		(1 << TIF_FSCHECK)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 #define _TIF_SVE		(1 << TIF_SVE)
+#define _TIF_MTE_ASYNC_FAULT	(1 << TIF_MTE_ASYNC_FAULT)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
-				 _TIF_UPROBE | _TIF_FSCHECK)
+				 _TIF_UPROBE | _TIF_FSCHECK | _TIF_MTE_ASYNC_FAULT)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 4e5b8ee31442..dbede7a4c5fb 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_CRASH_CORE)		+= crash_core.o
 obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
 obj-$(CONFIG_ARM64_SSBD)		+= ssbd.o
 obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
+obj-$(CONFIG_ARM64_MTE)			+= mte.o
 
 obj-y					+= vdso/ probes/
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32/
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index ddcde093c433..3650a0a77ed0 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -145,6 +145,31 @@ alternative_cb_end
 #endif
 	.endm
 
+	/* Check for MTE asynchronous tag check faults */
+	.macro check_mte_async_tcf, flgs, tmp
+#ifdef CONFIG_ARM64_MTE
+alternative_if_not ARM64_MTE
+	b	1f
+alternative_else_nop_endif
+	mrs_s	\tmp, SYS_TFSRE0_EL1
+	tbz	\tmp, #SYS_TFSR_EL1_TF0_SHIFT, 1f
+	/* Asynchronous TCF occurred for TTBR0 access, set the TI flag */
+	orr	\flgs, \flgs, #_TIF_MTE_ASYNC_FAULT
+	str	\flgs, [tsk, #TSK_TI_FLAGS]
+	msr_s	SYS_TFSRE0_EL1, xzr
+1:
+#endif
+	.endm
+
+	/* Clear the MTE asynchronous tag check faults */
+	.macro clear_mte_async_tcf
+#ifdef CONFIG_ARM64_MTE
+alternative_if ARM64_MTE
+	msr_s	SYS_TFSRE0_EL1, xzr
+alternative_else_nop_endif
+#endif
+	.endm
+
 	.macro	kernel_entry, el, regsize = 64
 	.if	\regsize == 32
 	mov	w0, w0				// zero upper 32 bits of x0
@@ -176,6 +201,8 @@ alternative_cb_end
 	ldr	x19, [tsk, #TSK_TI_FLAGS]
 	disable_step_tsk x19, x20
 
+	/* Check for asynchronous tag check faults in user space */
+	check_mte_async_tcf x19, x22
 	apply_ssbd 1, x22, x23
 
 	ptrauth_keys_install_kernel tsk, 1, x20, x22, x23
@@ -244,6 +271,13 @@ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
 	str	x20, [sp, #S_PMR_SAVE]
 alternative_else_nop_endif
 
+	/* Re-enable tag checking (TCO set on exception entry) */
+#ifdef CONFIG_ARM64_MTE
+alternative_if ARM64_MTE
+	SET_PSTATE_TCO(0)
+alternative_else_nop_endif
+#endif
+
 	/*
 	 * Registers that may be useful after this macro is invoked:
 	 *
@@ -744,6 +778,8 @@ work_pending:
 ret_to_user:
 	disable_daif
 	gic_prio_kentry_setup tmp=x3
+	/* Ignore asynchronous tag check faults in the uaccess routines */
+	clear_mte_async_tcf
 	ldr	x1, [tsk, #TSK_TI_FLAGS]
 	and	x2, x1, #_TIF_WORK_MASK
 	cbnz	x2, work_pending
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
new file mode 100644
index 000000000000..032016823957
--- /dev/null
+++ b/arch/arm64/kernel/mte.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 ARM Ltd.
+ */
+
+#include <linux/thread_info.h>
+
+#include <asm/cpufeature.h>
+#include <asm/mte.h>
+#include <asm/sysreg.h>
+
+void flush_mte_state(void)
+{
+	if (!system_supports_mte())
+		return;
+
+	/* clear any pending asynchronous tag fault */
+	dsb(ish);
+	write_sysreg_s(0, SYS_TFSRE0_EL1);
+	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
+}
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 56be4cbf771f..740047c9cd13 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -50,6 +50,7 @@
 #include <asm/exec.h>
 #include <asm/fpsimd.h>
 #include <asm/mmu_context.h>
+#include <asm/mte.h>
 #include <asm/processor.h>
 #include <asm/pointer_auth.h>
 #include <asm/stacktrace.h>
@@ -323,6 +324,7 @@ void flush_thread(void)
 	tls_thread_flush();
 	flush_ptrace_hw_breakpoint(current);
 	flush_tagged_addr_state();
+	flush_mte_state();
 }
 
 void release_thread(struct task_struct *dead_task)
@@ -355,6 +357,9 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	dst->thread.sve_state = NULL;
 	clear_tsk_thread_flag(dst, TIF_SVE);
 
+	/* clear any pending asynchronous tag fault raised by the parent */
+	clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
+
 	return 0;
 }
 
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 339882db5a91..e377d77c065e 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -732,6 +732,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
 	regs->regs[29] = (unsigned long)&user->next_frame->fp;
 	regs->pc = (unsigned long)ka->sa.sa_handler;
 
+	/* TCO (Tag Check Override) always cleared for signal handlers */
+	regs->pstate &= ~PSR_TCO_BIT;
+
 	if (ka->sa.sa_flags & SA_RESTORER)
 		sigtramp = ka->sa.sa_restorer;
 	else
@@ -923,6 +926,11 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 			if (thread_flags & _TIF_UPROBE)
 				uprobe_notify_resume(regs);
 
+			if (thread_flags & _TIF_MTE_ASYNC_FAULT) {
+				clear_thread_flag(TIF_MTE_ASYNC_FAULT);
+				force_signal_inject(SIGSEGV, SEGV_MTEAERR, 0);
+			}
+
 			if (thread_flags & _TIF_SIGPENDING)
 				do_signal(regs);
 
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index a12c0c88d345..db25f5d6a07c 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -102,6 +102,16 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	local_daif_restore(DAIF_PROCCTX);
 	user_exit();
 
+	if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
+		/*
+		 * Process the asynchronous tag check fault before the actual
+		 * syscall. do_notify_resume() will send a signal to userspace
+		 * before the syscall is restarted.
+		 */
+		regs->regs[0] = -ERESTARTNOINTR;
+		return;
+	}
+
 	if (has_syscall_work(flags)) {
 		/* set default errno for user-issued syscall(-1) */
 		if (scno == NO_SYSCALL)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index c9cedc0432d2..38b59cace3e3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -650,6 +650,13 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 0;
 }
 
+static int do_tag_check_fault(unsigned long addr, unsigned int esr,
+			      struct pt_regs *regs)
+{
+	do_bad_area(addr, esr, regs);
+	return 0;
+}
+
 static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"ttbr address size fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"level 1 address size fault"	},
@@ -668,7 +675,7 @@ static const struct fault_info fault_info[] = {
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
 	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous external abort"	},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 17"			},
+	{ do_tag_check_fault,	SIGSEGV, SEGV_MTESERR,	"synchronous tag check fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 18"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 19"			},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 (translation table walk)"	},


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 11/23] mm: Introduce arch_calc_vm_flag_bits()
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (9 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 10/23] arm64: mte: Handle synchronous and asynchronous tag check faults Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 12/23] arm64: mte: Add PROT_MTE support to mmap() and mprotect() Catalin Marinas
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Kevin Brodsky

From: Kevin Brodsky <Kevin.Brodsky@arm.com>

Similarly to arch_calc_vm_prot_bits(), introduce a dummy
arch_calc_vm_flag_bits() invoked from calc_vm_flag_bits(). This macro
can be overridden by architectures to insert specific VM_* flags derived
from the mmap() MAP_* flags.

Signed-off-by: Kevin Brodsky <Kevin.Brodsky@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v2:
    - Updated the comment above arch_calc_vm_prot_bits().
    - Changed author since this patch had already been posted (internally).

 include/linux/mman.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 4b08e9c9c538..15c1162b9d65 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -74,13 +74,17 @@ static inline void vm_unacct_memory(long pages)
 }
 
 /*
- * Allow architectures to handle additional protection bits
+ * Allow architectures to handle additional protection and flag bits
  */
 
 #ifndef arch_calc_vm_prot_bits
 #define arch_calc_vm_prot_bits(prot, pkey) 0
 #endif
 
+#ifndef arch_calc_vm_flag_bits
+#define arch_calc_vm_flag_bits(flags) 0
+#endif
+
 #ifndef arch_vm_get_page_prot
 #define arch_vm_get_page_prot(vm_flags) __pgprot(0)
 #endif
@@ -131,7 +135,8 @@ calc_vm_flag_bits(unsigned long flags)
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
-	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      );
+	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
+	       arch_calc_vm_flag_bits(flags);
 }
 
 unsigned long vm_commit_limit(void);


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 12/23] arm64: mte: Add PROT_MTE support to mmap() and mprotect()
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (10 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 11/23] mm: Introduce arch_calc_vm_flag_bits() Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 13/23] mm: Introduce arch_validate_flags() Catalin Marinas
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

To enable tagging on a memory range, the user must explicitly opt in via
a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
memory type in the AttrIndx field of a pte, simplify the or'ing of these
bits over the protection_map[] attributes by making MT_NORMAL index 0.

There are two conditions for arch_vm_get_page_prot() to return the
MT_NORMAL_TAGGED memory type: (1) the user requested it via PROT_MTE,
registered as VM_MTE in the vm_flags, and (2) the vma supports MTE,
decided during the mmap() call (only) and registered as VM_MTE_ALLOWED.

arch_calc_vm_prot_bits() is responsible for registering the user request
as VM_MTE. The newly introduced arch_calc_vm_flag_bits() sets
VM_MTE_ALLOWED if the mapping is MAP_ANONYMOUS. An MTE-capable
filesystem (RAM-based) may be able to set VM_MTE_ALLOWED during its
mmap() file ops call.

In addition, update VM_DATA_DEFAULT_FLAGS to allow mprotect(PROT_MTE) on
stack or brk area.

The Linux mmap() syscall currently ignores unknown PROT_* flags. In the
presence of MTE, an mmap(PROT_MTE) on a file which does not support MTE
will not report an error and the memory will not be mapped as Normal
Tagged. For consistency, mprotect(PROT_MTE) will not report an error
either if the memory range does not support MTE. Two subsequent patches
in the series will propose tightening of this behaviour.

Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v2:
    - Add VM_MTE_ALLOWED to show_smap_vma_flags().

 arch/arm64/include/asm/memory.h    | 18 +++++----
 arch/arm64/include/asm/mman.h      | 64 ++++++++++++++++++++++++++++++
 arch/arm64/include/asm/page.h      |  2 +-
 arch/arm64/include/asm/pgtable.h   |  7 +++-
 arch/arm64/include/uapi/asm/mman.h | 14 +++++++
 fs/proc/task_mmu.c                 |  4 ++
 include/linux/mm.h                 |  8 ++++
 7 files changed, 108 insertions(+), 9 deletions(-)
 create mode 100644 arch/arm64/include/asm/mman.h
 create mode 100644 arch/arm64/include/uapi/asm/mman.h

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 472c77a68225..770535b7ca35 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -129,14 +129,18 @@
 
 /*
  * Memory types available.
+ *
+ * IMPORTANT: MT_NORMAL must be index 0 since vm_get_page_prot() may 'or' in
+ *	      the MT_NORMAL_TAGGED memory type for PROT_MTE mappings. Note
+ *	      that protection_map[] only contains MT_NORMAL attributes.
  */
-#define MT_DEVICE_nGnRnE	0
-#define MT_DEVICE_nGnRE		1
-#define MT_DEVICE_GRE		2
-#define MT_NORMAL_NC		3
-#define MT_NORMAL		4
-#define MT_NORMAL_WT		5
-#define MT_NORMAL_TAGGED	6
+#define MT_NORMAL		0
+#define MT_NORMAL_TAGGED	1
+#define MT_NORMAL_NC		2
+#define MT_NORMAL_WT		3
+#define MT_DEVICE_nGnRnE	4
+#define MT_DEVICE_nGnRE		5
+#define MT_DEVICE_GRE		6
 
 /*
  * Memory types for Stage-2 translation
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
new file mode 100644
index 000000000000..c77a23869223
--- /dev/null
+++ b/arch/arm64/include/asm/mman.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MMAN_H__
+#define __ASM_MMAN_H__
+
+#include <uapi/asm/mman.h>
+
+/*
+ * There are two conditions required for returning a Normal Tagged memory type
+ * in arch_vm_get_page_prot(): (1) the user requested it via PROT_MTE passed
+ * to mmap() or mprotect() and (2) the corresponding vma supports MTE. We
+ * register (1) as VM_MTE in the vma->vm_flags and (2) as VM_MTE_ALLOWED. Note
+ * that the latter can only be set during the mmap() call since mprotect()
+ * does not accept MAP_* flags.
+ */
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+						   unsigned long pkey)
+{
+	if (!system_supports_mte())
+		return 0;
+
+	if (prot & PROT_MTE)
+		return VM_MTE;
+
+	return 0;
+}
+#define arch_calc_vm_prot_bits arch_calc_vm_prot_bits
+
+static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
+{
+	if (!system_supports_mte())
+		return 0;
+
+	/*
+	 * Only allow MTE on anonymous mappings as these are guaranteed to be
+	 * backed by tags-capable memory. The vm_flags may be overridden by a
+	 * filesystem supporting MTE (RAM-based).
+	 */
+	if (flags & MAP_ANONYMOUS)
+		return VM_MTE_ALLOWED;
+
+	return 0;
+}
+#define arch_calc_vm_flag_bits arch_calc_vm_flag_bits
+
+static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
+{
+	return (vm_flags & VM_MTE) && (vm_flags & VM_MTE_ALLOWED) ?
+		__pgprot(PTE_ATTRINDX(MT_NORMAL_TAGGED)) :
+		__pgprot(0);
+}
+#define arch_vm_get_page_prot arch_vm_get_page_prot
+
+static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
+{
+	unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
+
+	if (system_supports_mte())
+		supported |= PROT_MTE;
+
+	return (prot & ~supported) == 0;
+}
+#define arch_validate_prot arch_validate_prot
+
+#endif /* !__ASM_MMAN_H__ */
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index c01b52add377..673033e0393b 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -36,7 +36,7 @@ extern int pfn_valid(unsigned long);
 
 #endif /* !__ASSEMBLY__ */
 
-#define VM_DATA_DEFAULT_FLAGS	VM_DATA_FLAGS_TSK_EXEC
+#define VM_DATA_DEFAULT_FLAGS	(VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED)
 
 #include <asm-generic/getorder.h>
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 538c85e62f86..39a372bf8afc 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -659,8 +659,13 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
+	/*
+	 * Normal and Normal-Tagged are two different memory types and indices
+	 * in MAIR_EL1. The mask below has to include PTE_ATTRINDX_MASK.
+	 */
 	const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
-			      PTE_PROT_NONE | PTE_VALID | PTE_WRITE;
+			      PTE_PROT_NONE | PTE_VALID | PTE_WRITE |
+			      PTE_ATTRINDX_MASK;
 	/* preserve the hardware dirty information */
 	if (pte_hw_dirty(pte))
 		pte = pte_mkdirty(pte);
diff --git a/arch/arm64/include/uapi/asm/mman.h b/arch/arm64/include/uapi/asm/mman.h
new file mode 100644
index 000000000000..d7677ee84878
--- /dev/null
+++ b/arch/arm64/include/uapi/asm/mman.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI__ASM_MMAN_H
+#define _UAPI__ASM_MMAN_H
+
+#include <asm-generic/mman.h>
+
+/*
+ * The generic mman.h file reserves 0x10 and 0x20 for arch-specific PROT_*
+ * flags.
+ */
+/* 0x10 reserved for PROT_BTI */
+#define PROT_MTE	 0x20		/* Normal Tagged mapping */
+
+#endif /* !_UAPI__ASM_MMAN_H */
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8d382d4ec067..2f26112ebb77 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -647,6 +647,10 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_MERGEABLE)]	= "mg",
 		[ilog2(VM_UFFD_MISSING)]= "um",
 		[ilog2(VM_UFFD_WP)]	= "uw",
+#ifdef CONFIG_ARM64_MTE
+		[ilog2(VM_MTE)]		= "mt",
+		[ilog2(VM_MTE_ALLOWED)]	= "",
+#endif
 #ifdef CONFIG_ARCH_HAS_PKEYS
 		/* These come out via ProtectionKey: */
 		[ilog2(VM_PKEY_BIT0)]	= "",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5a323422d783..132ca88e407d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -336,6 +336,14 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MPX		VM_NONE
 #endif
 
+#if defined(CONFIG_ARM64_MTE)
+# define VM_MTE		VM_HIGH_ARCH_0	/* Use Tagged memory for access control */
+# define VM_MTE_ALLOWED	VM_HIGH_ARCH_1	/* Tagged memory permitted */
+#else
+# define VM_MTE		VM_NONE
+# define VM_MTE_ALLOWED	VM_NONE
+#endif
+
 #ifndef VM_GROWSUP
 # define VM_GROWSUP	VM_NONE
 #endif


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 13/23] mm: Introduce arch_validate_flags()
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (11 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 12/23] arm64: mte: Add PROT_MTE support to mmap() and mprotect() Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 14/23] arm64: mte: Validate the PROT_MTE request via arch_validate_flags() Catalin Marinas
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

Similarly to arch_validate_prot() called from do_mprotect_pkey(), an
architecture may need to sanity-check the new vm_flags.

Define a dummy function always returning true. In addition to
do_mprotect_pkey(), also invoke it from mmap_region() prior to updating
vma->vm_page_prot to allow the architecture code to veto potentially
inconsistent vm_flags.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v2:
    - Some comments updated.

 include/linux/mman.h | 13 +++++++++++++
 mm/mmap.c            |  9 +++++++++
 mm/mprotect.c        |  6 ++++++
 3 files changed, 28 insertions(+)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 15c1162b9d65..09dd414b81b6 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -103,6 +103,19 @@ static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
 #define arch_validate_prot arch_validate_prot
 #endif
 
+#ifndef arch_validate_flags
+/*
+ * This is called from mmap() and mprotect() with the updated vma->vm_flags.
+ *
+ * Returns true if the VM_* flags are valid.
+ */
+static inline bool arch_validate_flags(unsigned long flags)
+{
+	return true;
+}
+#define arch_validate_flags arch_validate_flags
+#endif
+
 /*
  * Optimisation macro.  It is equivalent to:
  *      (x & bit1) ? bit2 : 0
diff --git a/mm/mmap.c b/mm/mmap.c
index f609e9ec4a25..d5fc93c2072e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1792,6 +1792,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		vma_set_anonymous(vma);
 	}
 
+	/* Allow architectures to sanity-check the vm_flags */
+	if (!arch_validate_flags(vma->vm_flags)) {
+		error = -EINVAL;
+		if (file)
+			goto unmap_and_free_vma;
+		else
+			goto free_vma;
+	}
+
 	vma_link(mm, vma, prev, rb_link, rb_parent);
 	/* Once vma denies write, undo our temporary denial count */
 	if (file) {
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 494192ca954b..04b1d2cf0e74 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -603,6 +603,12 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 			goto out;
 		}
 
+		/* Allow architectures to sanity-check the new flags */
+		if (!arch_validate_flags(newflags)) {
+			error = -EINVAL;
+			goto out;
+		}
+
 		error = security_file_mprotect(vma, reqprot, prot);
 		if (error)
 			goto out;


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 14/23] arm64: mte: Validate the PROT_MTE request via arch_validate_flags()
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (12 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 13/23] mm: Introduce arch_validate_flags() Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 15/23] mm: Allow arm64 mmap(PROT_MTE) on RAM-based files Catalin Marinas
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

Make use of the newly introduced arch_validate_flags() hook to
sanity-check the PROT_MTE request passed to mmap() and mprotect(). If
the mapping does not support MTE, these syscalls will return -EINVAL.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/mman.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index c77a23869223..5c356d1ca266 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -44,7 +44,11 @@ static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
 
 static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 {
-	return (vm_flags & VM_MTE) && (vm_flags & VM_MTE_ALLOWED) ?
+	/*
+	 * Checking for VM_MTE only is sufficient since arch_validate_flags()
+	 * does not permit (VM_MTE & !VM_MTE_ALLOWED).
+	 */
+	return (vm_flags & VM_MTE) ?
 		__pgprot(PTE_ATTRINDX(MT_NORMAL_TAGGED)) :
 		__pgprot(0);
 }
@@ -61,4 +65,14 @@ static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
 }
 #define arch_validate_prot arch_validate_prot
 
+static inline bool arch_validate_flags(unsigned long flags)
+{
+	if (!system_supports_mte())
+		return true;
+
+	/* only allow VM_MTE if VM_MTE_ALLOWED has been set previously */
+	return !(flags & VM_MTE) || (flags & VM_MTE_ALLOWED);
+}
+#define arch_validate_flags arch_validate_flags
+
 #endif /* !__ASM_MMAN_H__ */


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 15/23] mm: Allow arm64 mmap(PROT_MTE) on RAM-based files
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (13 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 14/23] arm64: mte: Validate the PROT_MTE request via arch_validate_flags() Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 16/23] arm64: mte: Allow user control of the tag check mode via prctl() Catalin Marinas
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

Since arm64 memory (allocation) tags can only be stored in RAM, mapping
files with PROT_MTE is not allowed by default. RAM-based files like
those in a tmpfs mount or memfd_create() can support memory tagging, so
update the vm_flags accordingly in shmem_mmap().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 mm/shmem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index d722eb830317..73754ed7af69 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2221,6 +2221,9 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 			vma->vm_flags &= ~(VM_MAYWRITE);
 	}
 
+	/* arm64 - allow memory tagging on RAM-based files */
+	vma->vm_flags |= VM_MTE_ALLOWED;
+
 	file_accessed(file);
 	vma->vm_ops = &shmem_vm_ops;
 	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 16/23] arm64: mte: Allow user control of the tag check mode via prctl()
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (14 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 15/23] mm: Allow arm64 mmap(PROT_MTE) on RAM-based files Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 17/23] arm64: mte: Allow user control of the generated random tags " Catalin Marinas
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

By default, even if PROT_MTE is set on a memory range, there is no tag
check fault reporting (SIGSEGV). Introduce a set of option to the
exiting prctl(PR_SET_TAGGED_ADDR_CTRL) to allow user control of the tag
check fault mode:

  PR_MTE_TCF_NONE  - no reporting (default)
  PR_MTE_TCF_SYNC  - synchronous tag check fault reporting
  PR_MTE_TCF_ASYNC - asynchronous tag check fault reporting

These options translate into the corresponding SCTLR_EL1.TCF0 bitfield,
context-switched by the kernel. Note that uaccess done by the kernel is
not checked and cannot be configured by the user.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v3:
    - Use SCTLR_EL1_TCF0_NONE instead of 0 for consistency.
    - Move mte_thread_switch() in this patch from an earlier one. In
      addition, it is called after the dsb() in __switch_to() so that any
      asynchronous tag check faults have been registered in the TFSR_EL1
      registers (to be added with the in-kernel MTE support.
    
    v2:
    - Handle SCTLR_EL1_TCF0_NONE explicitly for consistency with PR_MTE_TCF_NONE.
    - Fix SCTLR_EL1 register setting in flush_mte_state() (thanks to Peter
      Collingbourne).
    - Added ISB to update_sctlr_el1_tcf0() since, with the latest
      architecture update/fix, the TCF0 field is used by the uaccess
      routines.

 arch/arm64/include/asm/mte.h       | 14 ++++++
 arch/arm64/include/asm/processor.h |  3 ++
 arch/arm64/kernel/mte.c            | 77 ++++++++++++++++++++++++++++++
 arch/arm64/kernel/process.c        | 26 ++++++++--
 include/uapi/linux/prctl.h         |  6 +++
 5 files changed, 123 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index e9711ea51eb5..3dc0a7977124 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -11,10 +11,24 @@ int mte_memcmp_pages(const void *page1_addr, const void *page2_addr);
 
 #ifdef CONFIG_ARM64_MTE
 void flush_mte_state(void);
+void mte_thread_switch(struct task_struct *next);
+long set_mte_ctrl(unsigned long arg);
+long get_mte_ctrl(void);
 #else
 static inline void flush_mte_state(void)
 {
 }
+static inline void mte_thread_switch(struct task_struct *next)
+{
+}
+static inline long set_mte_ctrl(unsigned long arg)
+{
+	return 0;
+}
+static inline long get_mte_ctrl(void)
+{
+	return 0;
+}
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 240fe5e5b720..80e7f0573309 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -151,6 +151,9 @@ struct thread_struct {
 	struct ptrauth_keys_user	keys_user;
 	struct ptrauth_keys_kernel	keys_kernel;
 #endif
+#ifdef CONFIG_ARM64_MTE
+	u64			sctlr_tcf0;
+#endif
 };
 
 static inline void arch_thread_struct_whitelist(unsigned long *offset,
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 032016823957..e62d02890d12 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -3,12 +3,34 @@
  * Copyright (C) 2020 ARM Ltd.
  */
 
+#include <linux/prctl.h>
+#include <linux/sched.h>
 #include <linux/thread_info.h>
 
 #include <asm/cpufeature.h>
 #include <asm/mte.h>
 #include <asm/sysreg.h>
 
+static void update_sctlr_el1_tcf0(u64 tcf0)
+{
+	/* ISB required for the kernel uaccess routines */
+	sysreg_clear_set(sctlr_el1, SCTLR_EL1_TCF0_MASK, tcf0);
+	isb();
+}
+
+static void set_sctlr_el1_tcf0(u64 tcf0)
+{
+	/*
+	 * mte_thread_switch() checks current->thread.sctlr_tcf0 as an
+	 * optimisation. Disable preemption so that it does not see
+	 * the variable update before the SCTLR_EL1.TCF0 one.
+	 */
+	preempt_disable();
+	current->thread.sctlr_tcf0 = tcf0;
+	update_sctlr_el1_tcf0(tcf0);
+	preempt_enable();
+}
+
 void flush_mte_state(void)
 {
 	if (!system_supports_mte())
@@ -18,4 +40,59 @@ void flush_mte_state(void)
 	dsb(ish);
 	write_sysreg_s(0, SYS_TFSRE0_EL1);
 	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
+	/* disable tag checking */
+	set_sctlr_el1_tcf0(SCTLR_EL1_TCF0_NONE);
+}
+
+void mte_thread_switch(struct task_struct *next)
+{
+	if (!system_supports_mte())
+		return;
+
+	/* avoid expensive SCTLR_EL1 accesses if no change */
+	if (current->thread.sctlr_tcf0 != next->thread.sctlr_tcf0)
+		update_sctlr_el1_tcf0(next->thread.sctlr_tcf0);
+}
+
+long set_mte_ctrl(unsigned long arg)
+{
+	u64 tcf0;
+
+	if (!system_supports_mte())
+		return 0;
+
+	switch (arg & PR_MTE_TCF_MASK) {
+	case PR_MTE_TCF_NONE:
+		tcf0 = SCTLR_EL1_TCF0_NONE;
+		break;
+	case PR_MTE_TCF_SYNC:
+		tcf0 = SCTLR_EL1_TCF0_SYNC;
+		break;
+	case PR_MTE_TCF_ASYNC:
+		tcf0 = SCTLR_EL1_TCF0_ASYNC;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	set_sctlr_el1_tcf0(tcf0);
+
+	return 0;
+}
+
+long get_mte_ctrl(void)
+{
+	if (!system_supports_mte())
+		return 0;
+
+	switch (current->thread.sctlr_tcf0) {
+	case SCTLR_EL1_TCF0_NONE:
+		return PR_MTE_TCF_NONE;
+	case SCTLR_EL1_TCF0_SYNC:
+		return PR_MTE_TCF_SYNC;
+	case SCTLR_EL1_TCF0_ASYNC:
+		return PR_MTE_TCF_ASYNC;
+	}
+
+	return 0;
 }
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 740047c9cd13..ff6031a398d0 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -529,6 +529,13 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	 */
 	dsb(ish);
 
+	/*
+	 * MTE thread switching must happen after the DSB above to ensure that
+	 * any asynchronous tag check faults have been logged in the TFSR*_EL1
+	 * registers.
+	 */
+	mte_thread_switch(next);
+
 	/* the actual thread switch */
 	last = cpu_switch_to(prev, next);
 
@@ -588,9 +595,15 @@ static unsigned int tagged_addr_disabled;
 
 long set_tagged_addr_ctrl(unsigned long arg)
 {
+	unsigned long valid_mask = PR_TAGGED_ADDR_ENABLE;
+
 	if (is_compat_task())
 		return -EINVAL;
-	if (arg & ~PR_TAGGED_ADDR_ENABLE)
+
+	if (system_supports_mte())
+		valid_mask |= PR_MTE_TCF_MASK;
+
+	if (arg & ~valid_mask)
 		return -EINVAL;
 
 	/*
@@ -600,6 +613,9 @@ long set_tagged_addr_ctrl(unsigned long arg)
 	if (arg & PR_TAGGED_ADDR_ENABLE && tagged_addr_disabled)
 		return -EINVAL;
 
+	if (set_mte_ctrl(arg) != 0)
+		return -EINVAL;
+
 	update_thread_flag(TIF_TAGGED_ADDR, arg & PR_TAGGED_ADDR_ENABLE);
 
 	return 0;
@@ -607,13 +623,17 @@ long set_tagged_addr_ctrl(unsigned long arg)
 
 long get_tagged_addr_ctrl(void)
 {
+	long ret = 0;
+
 	if (is_compat_task())
 		return -EINVAL;
 
 	if (test_thread_flag(TIF_TAGGED_ADDR))
-		return PR_TAGGED_ADDR_ENABLE;
+		ret = PR_TAGGED_ADDR_ENABLE;
 
-	return 0;
+	ret |= get_mte_ctrl();
+
+	return ret;
 }
 
 /*
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..2390ab324afa 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -233,6 +233,12 @@ struct prctl_mm_map {
 #define PR_SET_TAGGED_ADDR_CTRL		55
 #define PR_GET_TAGGED_ADDR_CTRL		56
 # define PR_TAGGED_ADDR_ENABLE		(1UL << 0)
+/* MTE tag check fault modes */
+# define PR_MTE_TCF_SHIFT		1
+# define PR_MTE_TCF_NONE		(0UL << PR_MTE_TCF_SHIFT)
+# define PR_MTE_TCF_SYNC		(1UL << PR_MTE_TCF_SHIFT)
+# define PR_MTE_TCF_ASYNC		(2UL << PR_MTE_TCF_SHIFT)
+# define PR_MTE_TCF_MASK		(3UL << PR_MTE_TCF_SHIFT)
 
 /* Control reclaim behavior when allocating memory */
 #define PR_SET_IO_FLUSHER		57


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 17/23] arm64: mte: Allow user control of the generated random tags via prctl()
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (15 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 16/23] arm64: mte: Allow user control of the tag check mode via prctl() Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-21 14:25 ` [PATCH v3 18/23] arm64: mte: Restore the GCR_EL1 register after a suspend Catalin Marinas
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

The IRG, ADDG and SUBG instructions insert a random tag in the resulting
address. Certain tags can be excluded via the GCR_EL1.Exclude bitmap
when, for example, the user wants a certain colour for freed buffers.
Since the GCR_EL1 register is not accessible at EL0, extend the
prctl(PR_SET_TAGGED_ADDR_CTRL) interface to include a 16-bit field in
the first argument for controlling which tags can be generated by the
above instruction (an include rather than exclude mask). Note that by
default all non-zero tags are excluded. This setting is per-thread.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v2:
    - Switch from an exclude mask to an include one for the prctl()
      interface.
    - Reset the allowed tags mask during flush_thread().

 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/include/asm/sysreg.h    |  7 ++++++
 arch/arm64/kernel/mte.c            | 35 +++++++++++++++++++++++++++---
 arch/arm64/kernel/process.c        |  2 +-
 include/uapi/linux/prctl.h         |  3 +++
 5 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 80e7f0573309..996b882a32d9 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -153,6 +153,7 @@ struct thread_struct {
 #endif
 #ifdef CONFIG_ARM64_MTE
 	u64			sctlr_tcf0;
+	u64			gcr_incl;
 #endif
 };
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 86236ae6c4e7..cb247f2f75ca 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -981,6 +981,13 @@
 		write_sysreg(__scs_new, sysreg);			\
 } while (0)
 
+#define sysreg_clear_set_s(sysreg, clear, set) do {			\
+	u64 __scs_val = read_sysreg_s(sysreg);				\
+	u64 __scs_new = (__scs_val & ~(u64)(clear)) | (set);		\
+	if (__scs_new != __scs_val)					\
+		write_sysreg_s(__scs_new, sysreg);			\
+} while (0)
+
 #endif
 
 #endif	/* __ASM_SYSREG_H */
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index e62d02890d12..212b9fac294d 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -31,6 +31,25 @@ static void set_sctlr_el1_tcf0(u64 tcf0)
 	preempt_enable();
 }
 
+static void update_gcr_el1_excl(u64 incl)
+{
+	u64 excl = ~incl & SYS_GCR_EL1_EXCL_MASK;
+
+	/*
+	 * Note that 'incl' is an include mask (controlled by the user via
+	 * prctl()) while GCR_EL1 accepts an exclude mask.
+	 * No need for ISB since this only affects EL0 currently, implicit
+	 * with ERET.
+	 */
+	sysreg_clear_set_s(SYS_GCR_EL1, SYS_GCR_EL1_EXCL_MASK, excl);
+}
+
+static void set_gcr_el1_excl(u64 incl)
+{
+	current->thread.gcr_incl = incl;
+	update_gcr_el1_excl(incl);
+}
+
 void flush_mte_state(void)
 {
 	if (!system_supports_mte())
@@ -42,6 +61,8 @@ void flush_mte_state(void)
 	clear_thread_flag(TIF_MTE_ASYNC_FAULT);
 	/* disable tag checking */
 	set_sctlr_el1_tcf0(SCTLR_EL1_TCF0_NONE);
+	/* reset tag generation mask */
+	set_gcr_el1_excl(0);
 }
 
 void mte_thread_switch(struct task_struct *next)
@@ -52,6 +73,7 @@ void mte_thread_switch(struct task_struct *next)
 	/* avoid expensive SCTLR_EL1 accesses if no change */
 	if (current->thread.sctlr_tcf0 != next->thread.sctlr_tcf0)
 		update_sctlr_el1_tcf0(next->thread.sctlr_tcf0);
+	update_gcr_el1_excl(next->thread.gcr_incl);
 }
 
 long set_mte_ctrl(unsigned long arg)
@@ -76,23 +98,30 @@ long set_mte_ctrl(unsigned long arg)
 	}
 
 	set_sctlr_el1_tcf0(tcf0);
+	set_gcr_el1_excl((arg & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT);
 
 	return 0;
 }
 
 long get_mte_ctrl(void)
 {
+	unsigned long ret;
+
 	if (!system_supports_mte())
 		return 0;
 
+	ret = current->thread.gcr_incl << PR_MTE_TAG_SHIFT;
+
 	switch (current->thread.sctlr_tcf0) {
 	case SCTLR_EL1_TCF0_NONE:
 		return PR_MTE_TCF_NONE;
 	case SCTLR_EL1_TCF0_SYNC:
-		return PR_MTE_TCF_SYNC;
+		ret |= PR_MTE_TCF_SYNC;
+		break;
 	case SCTLR_EL1_TCF0_ASYNC:
-		return PR_MTE_TCF_ASYNC;
+		ret |= PR_MTE_TCF_ASYNC;
+		break;
 	}
 
-	return 0;
+	return ret;
 }
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index ff6031a398d0..697571be259b 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -601,7 +601,7 @@ long set_tagged_addr_ctrl(unsigned long arg)
 		return -EINVAL;
 
 	if (system_supports_mte())
-		valid_mask |= PR_MTE_TCF_MASK;
+		valid_mask |= PR_MTE_TCF_MASK | PR_MTE_TAG_MASK;
 
 	if (arg & ~valid_mask)
 		return -EINVAL;
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 2390ab324afa..7f0827705c9a 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -239,6 +239,9 @@ struct prctl_mm_map {
 # define PR_MTE_TCF_SYNC		(1UL << PR_MTE_TCF_SHIFT)
 # define PR_MTE_TCF_ASYNC		(2UL << PR_MTE_TCF_SHIFT)
 # define PR_MTE_TCF_MASK		(3UL << PR_MTE_TCF_SHIFT)
+/* MTE tag inclusion mask */
+# define PR_MTE_TAG_SHIFT		3
+# define PR_MTE_TAG_MASK		(0xffffUL << PR_MTE_TAG_SHIFT)
 
 /* Control reclaim behavior when allocating memory */
 #define PR_SET_IO_FLUSHER		57


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 18/23] arm64: mte: Restore the GCR_EL1 register after a suspend
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (16 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 17/23] arm64: mte: Allow user control of the generated random tags " Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-23 15:23   ` Lorenzo Pieralisi
  2020-04-21 14:25 ` [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Lorenzo Pieralisi

The CPU resume/suspend routines only take care of the common system
registers. Restore GCR_EL1 in addition via the __cpu_suspend_exit()
function.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
---

Notes:
    New in v3.

 arch/arm64/include/asm/mte.h | 4 ++++
 arch/arm64/kernel/mte.c      | 8 ++++++++
 arch/arm64/kernel/suspend.c  | 4 ++++
 3 files changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 3dc0a7977124..22eb3e06f311 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -12,6 +12,7 @@ int mte_memcmp_pages(const void *page1_addr, const void *page2_addr);
 #ifdef CONFIG_ARM64_MTE
 void flush_mte_state(void);
 void mte_thread_switch(struct task_struct *next);
+void mte_suspend_exit(void);
 long set_mte_ctrl(unsigned long arg);
 long get_mte_ctrl(void);
 #else
@@ -21,6 +22,9 @@ static inline void flush_mte_state(void)
 static inline void mte_thread_switch(struct task_struct *next)
 {
 }
+static inline void mte_suspend_exit(void)
+{
+}
 static inline long set_mte_ctrl(unsigned long arg)
 {
 	return 0;
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 212b9fac294d..fa4a4196b248 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -76,6 +76,14 @@ void mte_thread_switch(struct task_struct *next)
 	update_gcr_el1_excl(next->thread.gcr_incl);
 }
 
+void mte_suspend_exit(void)
+{
+	if (!system_supports_mte())
+		return;
+
+	update_gcr_el1_excl(current->thread.gcr_incl);
+}
+
 long set_mte_ctrl(unsigned long arg)
 {
 	u64 tcf0;
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index 9405d1b7f4b0..1d405b73d009 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -9,6 +9,7 @@
 #include <asm/daifflags.h>
 #include <asm/debug-monitors.h>
 #include <asm/exec.h>
+#include <asm/mte.h>
 #include <asm/pgtable.h>
 #include <asm/memory.h>
 #include <asm/mmu_context.h>
@@ -74,6 +75,9 @@ void notrace __cpu_suspend_exit(void)
 	 */
 	if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE)
 		arm64_set_ssbd_mitigation(false);
+
+	/* Restore additional MTE-specific configuration */
+	mte_suspend_exit();
 }
 
 /*


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (17 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 18/23] arm64: mte: Restore the GCR_EL1 register after a suspend Catalin Marinas
@ 2020-04-21 14:25 ` Catalin Marinas
  2020-04-24 23:28   ` Peter Collingbourne
                     ` (4 more replies)
  2020-04-21 14:26 ` [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass Catalin Marinas
                   ` (3 subsequent siblings)
  22 siblings, 5 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:25 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Alan Hayward, Luis Machado, Omair Javaid

Add support for bulk setting/getting of the MTE tags in a tracee's
address space at 'addr' in the ptrace() syscall prototype. 'data' points
to a struct iovec in the tracer's address space with iov_base
representing the address of a tracer's buffer of length iov_len. The
tags to be copied to/from the tracer's buffer are stored as one tag per
byte.

On successfully copying at least one tag, ptrace() returns 0 and updates
the tracer's iov_len with the number of tags copied. In case of error,
either -EIO or -EFAULT is returned, trying to follow the ptrace() man
page.

Note that the tag copying functions are not performance critical,
therefore they lack optimisations found in typical memory copy routines.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Alan Hayward <Alan.Hayward@arm.com>
Cc: Luis Machado <luis.machado@linaro.org>
Cc: Omair Javaid <omair.javaid@linaro.org>
---

Notes:
    New in v3.

 arch/arm64/include/asm/mte.h         |  17 ++++
 arch/arm64/include/uapi/asm/ptrace.h |   3 +
 arch/arm64/kernel/mte.c              | 127 +++++++++++++++++++++++++++
 arch/arm64/kernel/ptrace.c           |  15 +++-
 arch/arm64/lib/mte.S                 |  50 +++++++++++
 5 files changed, 211 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 22eb3e06f311..0ca2aaff07a1 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -2,12 +2,21 @@
 #ifndef __ASM_MTE_H
 #define __ASM_MTE_H
 
+#define MTE_ALLOC_SIZE	UL(16)
+#define MTE_ALLOC_MASK	(~(MTE_ALLOC_SIZE - 1))
+#define MTE_TAG_SHIFT	(56)
+#define MTE_TAG_SIZE	(4)
+
 #ifndef __ASSEMBLY__
 
 #include <linux/sched.h>
 
 /* Memory Tagging API */
 int mte_memcmp_pages(const void *page1_addr, const void *page2_addr);
+unsigned long mte_copy_tags_from_user(void *to, const void __user *from,
+				      unsigned long n);
+unsigned long mte_copy_tags_to_user(void __user *to, void *from,
+				    unsigned long n);
 
 #ifdef CONFIG_ARM64_MTE
 void flush_mte_state(void);
@@ -15,6 +24,8 @@ void mte_thread_switch(struct task_struct *next);
 void mte_suspend_exit(void);
 long set_mte_ctrl(unsigned long arg);
 long get_mte_ctrl(void);
+int mte_ptrace_copy_tags(struct task_struct *child, long request,
+			 unsigned long addr, unsigned long data);
 #else
 static inline void flush_mte_state(void)
 {
@@ -33,6 +44,12 @@ static inline long get_mte_ctrl(void)
 {
 	return 0;
 }
+static inline int mte_ptrace_copy_tags(struct task_struct *child,
+				       long request, unsigned long addr,
+				       unsigned long data)
+{
+	return -EIO;
+}
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 1daf6dda8af0..cd2a4a164de3 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -67,6 +67,9 @@
 /* syscall emulation path in ptrace */
 #define PTRACE_SYSEMU		  31
 #define PTRACE_SYSEMU_SINGLESTEP  32
+/* MTE allocation tag access */
+#define PTRACE_PEEKMTETAGS	  33
+#define PTRACE_POKEMTETAGS	  34
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index fa4a4196b248..0cb496ed9bf9 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -3,12 +3,17 @@
  * Copyright (C) 2020 ARM Ltd.
  */
 
+#include <linux/kernel.h>
+#include <linux/mm.h>
 #include <linux/prctl.h>
 #include <linux/sched.h>
+#include <linux/sched/mm.h>
 #include <linux/thread_info.h>
+#include <linux/uio.h>
 
 #include <asm/cpufeature.h>
 #include <asm/mte.h>
+#include <asm/ptrace.h>
 #include <asm/sysreg.h>
 
 static void update_sctlr_el1_tcf0(u64 tcf0)
@@ -133,3 +138,125 @@ long get_mte_ctrl(void)
 
 	return ret;
 }
+
+/*
+ * Access MTE tags in another process' address space as given in mm. Update
+ * the number of tags copied. Return 0 if any tags copied, error otherwise.
+ * Inspired by __access_remote_vm().
+ */
+static int __access_remote_tags(struct task_struct *tsk, struct mm_struct *mm,
+				unsigned long addr, struct iovec *kiov,
+				unsigned int gup_flags)
+{
+	struct vm_area_struct *vma;
+	void __user *buf = kiov->iov_base;
+	size_t len = kiov->iov_len;
+	int ret;
+	int write = gup_flags & FOLL_WRITE;
+
+	if (down_read_killable(&mm->mmap_sem))
+		return -EIO;
+
+	if (!access_ok(buf, len))
+		return -EFAULT;
+
+	while (len) {
+		unsigned long tags, offset;
+		void *maddr;
+		struct page *page = NULL;
+
+		ret = get_user_pages_remote(tsk, mm, addr, 1, gup_flags,
+					    &page, &vma, NULL);
+		if (ret <= 0)
+			break;
+
+		/* limit access to the end of the page */
+		offset = offset_in_page(addr);
+		tags = min(len, (PAGE_SIZE - offset) / MTE_ALLOC_SIZE);
+
+		maddr = page_address(page);
+		if (write) {
+			tags = mte_copy_tags_from_user(maddr + offset, buf, tags);
+			set_page_dirty_lock(page);
+		} else {
+			tags = mte_copy_tags_to_user(buf, maddr + offset, tags);
+		}
+		put_page(page);
+
+		/* error accessing the tracer's buffer */
+		if (!tags)
+			break;
+
+		len -= tags;
+		buf += tags;
+		addr += tags * MTE_ALLOC_SIZE;
+	}
+	up_read(&mm->mmap_sem);
+
+	/* return an error if no tags copied */
+	kiov->iov_len = buf - kiov->iov_base;
+	if (!kiov->iov_len) {
+		/* check for error accessing the tracee's address space */
+		if (ret <= 0)
+			return -EIO;
+		else
+			return -EFAULT;
+	}
+
+	return 0;
+}
+
+/*
+ * Copy MTE tags in another process' address space at 'addr' to/from tracer's
+ * iovec buffer. Return 0 on success. Inspired by ptrace_access_vm().
+ */
+static int access_remote_tags(struct task_struct *tsk, unsigned long addr,
+			      struct iovec *kiov, unsigned int gup_flags)
+{
+	struct mm_struct *mm;
+	int ret;
+
+	mm = get_task_mm(tsk);
+	if (!mm)
+		return -EPERM;
+
+	if (!tsk->ptrace || (current != tsk->parent) ||
+	    ((get_dumpable(mm) != SUID_DUMP_USER) &&
+	     !ptracer_capable(tsk, mm->user_ns))) {
+		mmput(mm);
+		return -EPERM;
+	}
+
+	ret = __access_remote_tags(tsk, mm, addr, kiov, gup_flags);
+	mmput(mm);
+
+	return ret;
+}
+
+int mte_ptrace_copy_tags(struct task_struct *child, long request,
+			 unsigned long addr, unsigned long data)
+{
+	int ret;
+	struct iovec kiov;
+	struct iovec __user *uiov = (void __user *)data;
+	unsigned int gup_flags = FOLL_FORCE;
+
+	if (!system_supports_mte())
+		return -EIO;
+
+	if (get_user(kiov.iov_base, &uiov->iov_base) ||
+	    get_user(kiov.iov_len, &uiov->iov_len))
+		return -EFAULT;
+
+	if (request == PTRACE_POKEMTETAGS)
+		gup_flags |= FOLL_WRITE;
+
+	/* align addr to the MTE tag granule */
+	addr &= MTE_ALLOC_MASK;
+
+	ret = access_remote_tags(child, addr, &kiov, gup_flags);
+	if (!ret)
+		ret = __put_user(kiov.iov_len, &uiov->iov_len);
+
+	return ret;
+}
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 077e352495eb..1fdb841ad536 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -34,6 +34,7 @@
 #include <asm/cpufeature.h>
 #include <asm/debug-monitors.h>
 #include <asm/fpsimd.h>
+#include <asm/mte.h>
 #include <asm/pgtable.h>
 #include <asm/pointer_auth.h>
 #include <asm/stacktrace.h>
@@ -1797,7 +1798,19 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
 long arch_ptrace(struct task_struct *child, long request,
 		 unsigned long addr, unsigned long data)
 {
-	return ptrace_request(child, request, addr, data);
+	int ret;
+
+	switch (request) {
+	case PTRACE_PEEKMTETAGS:
+	case PTRACE_POKEMTETAGS:
+		ret = mte_ptrace_copy_tags(child, request, addr, data);
+		break;
+	default:
+		ret = ptrace_request(child, request, addr, data);
+		break;
+	}
+
+	return ret;
 }
 
 enum ptrace_syscall_dir {
diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
index bd51ea7e2fcb..45be04a8c73c 100644
--- a/arch/arm64/lib/mte.S
+++ b/arch/arm64/lib/mte.S
@@ -5,6 +5,7 @@
 #include <linux/linkage.h>
 
 #include <asm/assembler.h>
+#include <asm/mte.h>
 
 /*
  * Compare tags of two pages
@@ -44,3 +45,52 @@ SYM_FUNC_START(mte_memcmp_pages)
 
 	ret
 SYM_FUNC_END(mte_memcmp_pages)
+
+/*
+ * Read tags from a user buffer (one tag per byte) and set the corresponding
+ * tags at the given kernel address. Used by PTRACE_POKEMTETAGS.
+ *   x0 - kernel address (to)
+ *   x1 - user buffer (from)
+ *   x2 - number of tags/bytes (n)
+ * Returns:
+ *   x0 - number of tags read/set
+ */
+SYM_FUNC_START(mte_copy_tags_from_user)
+	mov	x3, x1
+1:
+USER(2f, ldtrb	w4, [x1])
+	lsl	x4, x4, #MTE_TAG_SHIFT
+	stg	x4, [x0], #MTE_ALLOC_SIZE
+	add	x1, x1, #1
+	subs	x2, x2, #1
+	b.ne	1b
+
+	// exception handling and function return
+2:	sub	x0, x1, x3		// update the number of tags set
+	ret
+SYM_FUNC_END(mte_copy_tags_from_user)
+
+/*
+ * Get the tags from a kernel address range and write the tag values to the
+ * given user buffer (one tag per byte). Used by PTRACE_PEEKMTETAGS.
+ *   x0 - user buffer (to)
+ *   x1 - kernel address (from)
+ *   x2 - number of tags/bytes (n)
+ * Returns:
+ *   x0 - number of tags read/set
+ */
+SYM_FUNC_START(mte_copy_tags_to_user)
+	mov	x3, x0
+1:
+	ldg	x4, [x1]
+	ubfx	x4, x4, #MTE_TAG_SHIFT, #MTE_TAG_SIZE
+USER(2f, sttrb	w4, [x0])
+	add	x0, x0, #1
+	add	x1, x1, #MTE_ALLOC_SIZE
+	subs	x2, x2, #1
+	b.ne	1b
+
+	// exception handling and function return
+2:	sub	x0, x0, x3		// update the number of tags copied
+	ret
+SYM_FUNC_END(mte_copy_tags_from_user)


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (18 preceding siblings ...)
  2020-04-21 14:25 ` [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
@ 2020-04-21 14:26 ` Catalin Marinas
  2020-04-21 15:29   ` Al Viro
                     ` (4 more replies)
  2020-04-21 14:26 ` [PATCH v3 21/23] arm64: mte: Check the DT memory nodes for MTE support Catalin Marinas
                   ` (2 subsequent siblings)
  22 siblings, 5 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Alexander Viro

The copy_mount_options() function takes a user pointer argument but not
a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
not guaranteed to return all the accessible bytes if, for example, the
access crosses a page boundary and gets a fault on the second page. To
work around this, the current copy_mount_options() implementations
performs to copy_from_user() passes, first to the end of the current
page and the second to what's left in the subsequent page.

Some architectures like arm64 can guarantee an exact copy_from_user()
depending on the size (since the arch function performs some alignment
on the source register). Introduce an arch_has_exact_copy_from_user()
function and allow copy_mount_options() to perform the user access in a
single pass.

While this function is not on a critical path, the single-pass behaviour
is required for arm64 MTE (memory tagging) support where a uaccess can
trigger intra-page faults (tag not matching). With the current
implementation, if this happens during the first page, the function will
return -EFAULT.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    New in v3.

 arch/arm64/include/asm/uaccess.h | 11 +++++++++++
 fs/namespace.c                   |  7 +++++--
 include/linux/uaccess.h          |  8 ++++++++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 32fc8061aa76..566da441eba2 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
 #define INLINE_COPY_TO_USER
 #define INLINE_COPY_FROM_USER
 
+static inline bool arch_has_exact_copy_from_user(unsigned long n)
+{
+	/*
+	 * copy_from_user() aligns the source pointer if the size is greater
+	 * than 15. Since all the loads are naturally aligned, they can only
+	 * fail on the first byte.
+	 */
+	return n > 15;
+}
+#define arch_has_exact_copy_from_user
+
 extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n);
 static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n)
 {
diff --git a/fs/namespace.c b/fs/namespace.c
index a28e4db075ed..8febc50dfc5d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
 	if (!copy)
 		return ERR_PTR(-ENOMEM);
 
-	size = PAGE_SIZE - offset_in_page(data);
+	size = PAGE_SIZE;
+	if (!arch_has_exact_copy_from_user(size))
+		size -= offset_in_page(data);
 
-	if (copy_from_user(copy, data, size)) {
+	if (copy_from_user(copy, data, size) == size) {
 		kfree(copy);
 		return ERR_PTR(-EFAULT);
 	}
 	if (size != PAGE_SIZE) {
+		WARN_ON(1);
 		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
 			memset(copy + size, 0, PAGE_SIZE - size);
 	}
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 67f016010aad..00e097a9e8d6 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -152,6 +152,14 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
 		n = _copy_to_user(to, from, n);
 	return n;
 }
+
+#ifndef arch_has_exact_copy_from_user
+static inline bool arch_has_exact_copy_from_user(unsigned long n)
+{
+	return false;
+}
+#endif
+
 #ifdef CONFIG_COMPAT
 static __always_inline unsigned long __must_check
 copy_in_user(void __user *to, const void __user *from, unsigned long n)


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 21/23] arm64: mte: Check the DT memory nodes for MTE support
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (19 preceding siblings ...)
  2020-04-21 14:26 ` [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass Catalin Marinas
@ 2020-04-21 14:26 ` Catalin Marinas
  2020-04-24 13:57   ` Catalin Marinas
  2020-04-21 14:26 ` [PATCH v3 22/23] arm64: mte: Kconfig entry Catalin Marinas
  2020-04-21 14:26 ` [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation Catalin Marinas
  22 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Rob Herring, Mark Rutland, Suzuki K Poulose

Even if the ID_AA64PFR1_EL1 register advertises the presence of MTE, it
is not guaranteed that the memory system on the SoC supports the
feature. In the absence of system-wide MTE support, the behaviour is
undefined and the kernel should not enable the MTE memory type in
MAIR_EL1.

For FDT, add an 'arm,armv8.5-memtag' property to the /memory nodes and
check for its presence during MTE probing. For example:

	memory@80000000 {
		device_type = "memory";
		arm,armv8.5-memtag;
		reg = <0x00000000 0x80000000 0 0x80000000>,
		      <0x00000008 0x80000000 0 0x80000000>;
	};

If the /memory nodes are not present in DT or if at least one node does
not support MTE, the feature will be disabled. On EFI systems, it is
assumed that the memory description matches the EFI memory map (if not,
it is considered a firmware bug).

MTE is not currently supported on ACPI systems.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Rob Herring <Rob.Herring@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
---

Notes:
    New in v3.
    
    Ongoing (internal) discussions on whether this is the right approach.
    The issue needs to be solved similarly for ACPI systems.

 arch/arm64/boot/dts/arm/fvp-base-revc.dts |  1 +
 arch/arm64/kernel/cpufeature.c            | 51 ++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/arm/fvp-base-revc.dts b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
index 66381d89c1ce..c620a289f15e 100644
--- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts
+++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
@@ -94,6 +94,7 @@
 
 	memory@80000000 {
 		device_type = "memory";
+		arm,armv8.5-memtag;
 		reg = <0x00000000 0x80000000 0 0x80000000>,
 		      <0x00000008 0x80000000 0 0x80000000>;
 	};
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index d2fe8ff72324..a32aad1d5b57 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -7,6 +7,7 @@
 
 #define pr_fmt(fmt) "CPU features: " fmt
 
+#include <linux/acpi.h>
 #include <linux/bsearch.h>
 #include <linux/cpumask.h>
 #include <linux/crash_dump.h>
@@ -14,6 +15,7 @@
 #include <linux/stop_machine.h>
 #include <linux/types.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/cpu.h>
 #include <asm/cpu.h>
 #include <asm/cpufeature.h>
@@ -1412,6 +1414,51 @@ static bool can_use_gic_priorities(const struct arm64_cpu_capabilities *entry,
 #endif
 
 #ifdef CONFIG_ARM64_MTE
+static bool has_usable_mte(const struct arm64_cpu_capabilities *entry,
+			   int scope)
+{
+	struct device_node *np;
+	bool memory_checked = false;
+	bool mte_capable = true;
+
+	if (!has_cpuid_feature(entry, scope))
+		return false;
+
+	/*
+	 * If !SCOPE_SYSTEM, return true as per the above CPUID check (late
+	 * CPU bring-up/hotplug). Otherwise, perform addtional checks on the
+	 * system memory MTE support.
+	 */
+	if (scope != SCOPE_SYSTEM)
+		return true;
+
+	if (!acpi_disabled) {
+		pr_warn("MTE not supported on ACPI systems\n");
+		return false;
+	}
+
+	/* check the "memory" nodes for MTE support */
+	for_each_node_by_type(np, "memory") {
+		memory_checked = true;
+		mte_capable &= of_property_read_bool(np, "arm,armv8.5-memtag");
+	}
+
+	if (!memory_checked || !mte_capable) {
+		pr_warn("System memory is not MTE-capable\n");
+		return false;
+	}
+
+	return true;
+}
+
+static bool has_hwcap_mte(const struct arm64_cpu_capabilities *entry,
+			  int scope)
+{
+	if (scope == SCOPE_SYSTEM)
+		return system_supports_mte();
+	return this_cpu_has_cap(ARM64_MTE);
+}
+
 static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
 {
 	u64 mair;
@@ -1828,7 +1875,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.desc = "Memory Tagging Extension",
 		.capability = ARM64_MTE,
 		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
-		.matches = has_cpuid_feature,
+		.matches = has_usable_mte,
 		.sys_reg = SYS_ID_AA64PFR1_EL1,
 		.field_pos = ID_AA64PFR1_MTE_SHIFT,
 		.min_field_value = ID_AA64PFR1_MTE,
@@ -1950,7 +1997,7 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
 	HWCAP_MULTI_CAP(ptr_auth_hwcap_gen_matches, CAP_HWCAP, KERNEL_HWCAP_PACG),
 #endif
 #ifdef CONFIG_ARM64_MTE
-	HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_MTE_SHIFT, FTR_UNSIGNED, ID_AA64PFR1_MTE, CAP_HWCAP, KERNEL_HWCAP_MTE),
+	HWCAP_CAP_MATCH(has_hwcap_mte, CAP_HWCAP, KERNEL_HWCAP_MTE),
 #endif /* CONFIG_ARM64_MTE */
 	{},
 };


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 22/23] arm64: mte: Kconfig entry
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (20 preceding siblings ...)
  2020-04-21 14:26 ` [PATCH v3 21/23] arm64: mte: Check the DT memory nodes for MTE support Catalin Marinas
@ 2020-04-21 14:26 ` Catalin Marinas
  2020-04-21 14:26 ` [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation Catalin Marinas
  22 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Add Memory Tagging Extension support to the arm64 kbuild.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/Kconfig | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 40fb05d96c60..af2e6e5dae1b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1606,6 +1606,38 @@ config ARCH_RANDOM
 
 endmenu
 
+menu "ARMv8.5 architectural features"
+
+config ARM64_AS_HAS_MTE
+	def_bool $(as-instr,.arch armv8.5-a+memtag)
+
+config ARM64_MTE
+	bool "Memory Tagging Extension support"
+	depends on ARM64_AS_HAS_MTE && ARM64_TAGGED_ADDR_ABI
+	select ARCH_USES_HIGH_VMA_FLAGS
+	select ARCH_NO_SWAP
+	help
+	  Memory Tagging (part of the ARMv8.5 Extensions) provides
+	  architectural support for run-time, always-on detection of
+	  various classes of memory error to aid with software debugging
+	  to eliminate vulnerabilities arising from memory-unsafe
+	  languages.
+
+	  This option enables the support for the Memory Tagging
+	  Extension at EL0 (i.e. for userspace).
+
+	  Selecting this option allows the feature to be detected at
+	  runtime. Any secondary CPU not implementing this feature will
+	  not be allowed a late bring-up.
+
+	  Userspace binaries that want to use this feature must
+	  explicitly opt in. The mechanism for the userspace is
+	  described in:
+
+	  Documentation/arm64/memory-tagging-extension.rst.
+
+endmenu
+
 config ARM64_SVE
 	bool "ARM Scalable Vector Extension support"
 	default y


^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
                   ` (21 preceding siblings ...)
  2020-04-21 14:26 ` [PATCH v3 22/23] arm64: mte: Kconfig entry Catalin Marinas
@ 2020-04-21 14:26 ` Catalin Marinas
  2020-04-29 16:47   ` Dave Martin
  2020-05-05 10:32   ` Szabolcs Nagy
  22 siblings, 2 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 14:26 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

From: Vincenzo Frascino <vincenzo.frascino@arm.com>

Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
a mechanism to detect the sources of memory related errors which
may be vulnerable to exploitation, including bounds violations,
use-after-free, use-after-return, use-out-of-scope and use before
initialization errors.

Add Memory Tagging Extension documentation for the arm64 linux
kernel support.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v3:
    - Modify the uaccess checking conditions: only when the sync mode is
      selected by the user. In async mode, the kernel uaccesses are not
      checked.
    - Clarify that an include mask of 0 (exclude mask 0xffff) results in
      always generating tag 0.
    - Document the ptrace() interface.
    
    v2:
    - Documented the uaccess kernel tag checking mode.
    - Removed the BTI definitions from cpu-feature-registers.rst.
    - Removed the paragraph stating that MTE depends on the tagged address
      ABI (while the Kconfig entry does, there is no requirement for the
      user to enable both).
    - Changed the GCR_EL1.Exclude handling description following the change
      in the prctl() interface (include vs exclude mask).
    - Updated the example code.

 Documentation/arm64/cpu-feature-registers.rst |   2 +
 Documentation/arm64/elf_hwcaps.rst            |   5 +
 Documentation/arm64/index.rst                 |   1 +
 .../arm64/memory-tagging-extension.rst        | 260 ++++++++++++++++++
 4 files changed, 268 insertions(+)
 create mode 100644 Documentation/arm64/memory-tagging-extension.rst

diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
index 41937a8091aa..b5679fa85ad9 100644
--- a/Documentation/arm64/cpu-feature-registers.rst
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -174,6 +174,8 @@ infrastructure:
      +------------------------------+---------+---------+
      | Name                         |  bits   | visible |
      +------------------------------+---------+---------+
+     | MTE                          | [11-8]  |    y    |
+     +------------------------------+---------+---------+
      | SSBS                         | [7-4]   |    y    |
      +------------------------------+---------+---------+
 
diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
index 7dfb97dfe416..ca7f90e99e3a 100644
--- a/Documentation/arm64/elf_hwcaps.rst
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -236,6 +236,11 @@ HWCAP2_RNG
 
     Functionality implied by ID_AA64ISAR0_EL1.RNDR == 0b0001.
 
+HWCAP2_MTE
+
+    Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
+    by Documentation/arm64/memory-tagging-extension.rst.
+
 4. Unused AT_HWCAP bits
 -----------------------
 
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 09cbb4ed2237..4cd0e696f064 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -14,6 +14,7 @@ ARM64 Architecture
     hugetlbpage
     legacy_instructions
     memory
+    memory-tagging-extension
     pointer-authentication
     silicon-errata
     sve
diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
new file mode 100644
index 000000000000..f82dfbd70061
--- /dev/null
+++ b/Documentation/arm64/memory-tagging-extension.rst
@@ -0,0 +1,260 @@
+===============================================
+Memory Tagging Extension (MTE) in AArch64 Linux
+===============================================
+
+Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
+         Catalin Marinas <catalin.marinas@arm.com>
+
+Date: 2020-02-25
+
+This document describes the provision of the Memory Tagging Extension
+functionality in AArch64 Linux.
+
+Introduction
+============
+
+ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
+feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
+(Top Byte Ignore) feature and allows software to access a 4-bit
+allocation tag for each 16-byte granule in the physical address space.
+Such memory range must be mapped with the Normal-Tagged memory
+attribute. A logical tag is derived from bits 59-56 of the virtual
+address used for the memory access. A CPU with MTE enabled will compare
+the logical tag against the allocation tag and potentially raise an
+exception on mismatch, subject to system registers configuration.
+
+Userspace Support
+=================
+
+When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
+supported by the hardware, the kernel advertises the feature to
+userspace via ``HWCAP2_MTE``.
+
+PROT_MTE
+--------
+
+To access the allocation tags, a user process must enable the Tagged
+memory attribute on an address range using a new ``prot`` flag for
+``mmap()`` and ``mprotect()``:
+
+``PROT_MTE`` - Pages allow access to the MTE allocation tags.
+
+The allocation tag is set to 0 when such pages are first mapped in the
+user address space and preserved on copy-on-write. ``MAP_SHARED`` is
+supported and the allocation tags can be shared between processes.
+
+**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
+RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
+types of mapping will result in ``-EINVAL`` returned by these system
+calls.
+
+**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
+be cleared by ``mprotect()``.
+
+Tag Check Faults
+----------------
+
+When ``PROT_MTE`` is enabled on an address range and a mismatch between
+the logical and allocation tags occurs on access, there are three
+configurable behaviours:
+
+- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
+  tag check fault.
+
+- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
+  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
+  memory access is not performed.
+
+- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the current
+  thread, asynchronously following one or multiple tag check faults,
+  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0``.
+
+**Note**: There are no *match-all* logical tags available for user
+applications.
+
+The user can select the above modes, per thread, using the
+``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
+``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
+bit-field:
+
+- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
+- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
+- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
+
+Tag checking can also be disabled for a user thread by setting the
+``PSTATE.TCO`` bit with ``MSR TCO, #1``.
+
+**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
+irrespective of the interrupted context.
+
+**Note**: Kernel accesses to user memory (e.g. ``read()`` system call)
+are only checked if the current thread tag checking mode is
+PR_MTE_TCF_SYNC.
+
+Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
+-----------------------------------------------------------------
+
+The architecture allows excluding certain tags to be randomly generated
+via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux
+excludes all tags other than 0. A user thread can enable specific tags
+in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
+flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
+in the ``PR_MTE_TAG_MASK`` bit-field.
+
+**Note**: The hardware uses an exclude mask but the ``prctl()``
+interface provides an include mask. An include mask of ``0`` (exclusion
+mask ``0xffff``) results in the CPU always generating tag ``0``.
+
+The ``ptrace()`` interface
+--------------------------
+
+``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
+the tags from or set the tags to a tracee's address space. The
+``ptrace()`` syscall is invoked as ``ptrace(request, pid, addr, data)``
+where:
+
+- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_PEEKMTETAGS``.
+- ``pid`` - the tracee's PID.
+- ``addr`` - address in the tracee's address space.
+- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
+  a buffer of ``iov_len`` length in the tracer's address space.
+
+The tags in the tracer's ``iov_base`` buffer are represented as one tag
+per byte and correspond to a 16-byte MTE tag granule in the tracee's
+address space.
+
+``ptrace()`` return value:
+
+- 0 - success, the tracer's ``iov_len`` was updated to the number of
+  tags copied (it may be smaller than the requested ``iov_len`` if the
+  requested address range in the tracee's or the tracer's space cannot
+  be fully accessed).
+- ``-EPERM`` - the specified process cannot be traced.
+- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
+  address) and no tags copied. ``iov_len`` not updated.
+- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
+  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
+
+Example of correct usage
+========================
+
+*MTE Example code*
+
+.. code-block:: c
+
+    /*
+     * To be compiled with -march=armv8.5-a+memtag
+     */
+    #include <errno.h>
+    #include <stdio.h>
+    #include <stdlib.h>
+    #include <unistd.h>
+    #include <sys/auxv.h>
+    #include <sys/mman.h>
+    #include <sys/prctl.h>
+
+    /*
+     * From arch/arm64/include/uapi/asm/hwcap.h
+     */
+    #define HWCAP2_MTE              (1 << 18)
+
+    /*
+     * From arch/arm64/include/uapi/asm/mman.h
+     */
+    #define PROT_MTE                 0x20
+
+    /*
+     * From include/uapi/linux/prctl.h
+     */
+    #define PR_SET_TAGGED_ADDR_CTRL 55
+    #define PR_GET_TAGGED_ADDR_CTRL 56
+    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
+    # define PR_MTE_TCF_SHIFT       1
+    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TAG_SHIFT       3
+    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)
+
+    /*
+     * Insert a random logical tag into the given pointer.
+     */
+    #define insert_random_tag(ptr) ({                       \
+            __u64 __val;                                    \
+            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
+            __val;                                          \
+    })
+
+    /*
+     * Set the allocation tag on the destination address.
+     */
+    #define set_tag(tagged_addr) do {                                      \
+            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
+    } while (0)
+
+    int main()
+    {
+            unsigned long *a;
+            unsigned long page_sz = getpagesize();
+            unsigned long hwcap2 = getauxval(AT_HWCAP2);
+
+            /* check if MTE is present */
+            if (!(hwcap2 & HWCAP2_MTE))
+                    return -1;
+
+            /*
+             * Enable the tagged address ABI, synchronous MTE tag check faults and
+             * allow all non-zero tags in the randomly generated set.
+             */
+            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
+                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT),
+                      0, 0, 0)) {
+                    perror("prctl() failed");
+                    return -1;
+            }
+
+            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
+                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+            if (a == MAP_FAILED) {
+                    perror("mmap() failed");
+                    return -1;
+            }
+
+            /*
+             * Enable MTE on the above anonymous mmap. The flag could be passed
+             * directly to mmap() and skip this step.
+             */
+            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
+                    perror("mprotect() failed");
+                    return -1;
+            }
+
+            /* access with the default tag (0) */
+            a[0] = 1;
+            a[1] = 2;
+
+            printf("a[0] = %lu a[1] = %lu\n", a[0], a[1]);
+
+            /* set the logical and allocation tags */
+            a = (unsigned long *)insert_random_tag(a);
+            set_tag(a);
+
+            printf("%p\n", a);
+
+            /* non-zero tag access */
+            a[0] = 3;
+            printf("a[0] = %lu a[1] = %lu\n", a[0], a[1]);
+
+            /*
+             * If MTE is enabled correctly the next instruction will generate an
+             * exception.
+             */
+            printf("Expecting SIGSEGV...\n");
+            a[2] = 0xdead;
+
+            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
+            printf("...done\n");
+
+            return 0;
+    }


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-21 14:26 ` [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass Catalin Marinas
@ 2020-04-21 15:29   ` Al Viro
  2020-04-21 16:45     ` Catalin Marinas
  2020-04-27 16:56   ` Dave Martin
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 81+ messages in thread
From: Al Viro @ 2020-04-21 15:29 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch

On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:

> While this function is not on a critical path, the single-pass behaviour
> is required for arm64 MTE (memory tagging) support where a uaccess can
> trigger intra-page faults (tag not matching). With the current
> implementation, if this happens during the first page, the function will
> return -EFAULT.

Details, please.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-21 15:29   ` Al Viro
@ 2020-04-21 16:45     ` Catalin Marinas
  0 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-21 16:45 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch

On Tue, Apr 21, 2020 at 04:29:48PM +0100, Al Viro wrote:
> On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> > While this function is not on a critical path, the single-pass behaviour
> > is required for arm64 MTE (memory tagging) support where a uaccess can
> > trigger intra-page faults (tag not matching). With the current
> > implementation, if this happens during the first page, the function will
> > return -EFAULT.
> 
> Details, please.

With the arm64 MTE support (memory tagging extensions, see [1] for the
full series), bits 56..59 of a pointer (the tag) are checked against the
corresponding tag/colour set in memory (on a 16-byte granule). When
copy_mount_options() gets such tagged user pointer, it attempts to read
4K even though the user buffer is smaller. The user would only guarantee
the same matching tag for the data it masses to mount(), not the whole
4K or to the end of a page. The side effect is that the first
copy_from_user() could still fault after reading some bytes but before
reaching the end of the page.

Prior to commit 12efec560274 ("saner copy_mount_options()"), this code
had a fallback to byte-by-byte copying. I thought I'd not revert this
commit as the copy_mount_options() now looks cleaner.

[1] https://lore.kernel.org/linux-arm-kernel/20200421142603.3894-1-catalin.marinas@arm.com/

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 10/23] arm64: mte: Handle synchronous and asynchronous tag check faults
  2020-04-21 14:25 ` [PATCH v3 10/23] arm64: mte: Handle synchronous and asynchronous tag check faults Catalin Marinas
@ 2020-04-23 10:38   ` Catalin Marinas
  2020-04-27 16:58   ` Dave Martin
  1 sibling, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-23 10:38 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch

On Tue, Apr 21, 2020 at 03:25:50PM +0100, Catalin Marinas wrote:
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ddcde093c433..3650a0a77ed0 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -145,6 +145,31 @@ alternative_cb_end
>  #endif
>  	.endm
>  
> +	/* Check for MTE asynchronous tag check faults */
> +	.macro check_mte_async_tcf, flgs, tmp
> +#ifdef CONFIG_ARM64_MTE
> +alternative_if_not ARM64_MTE
> +	b	1f
> +alternative_else_nop_endif
> +	mrs_s	\tmp, SYS_TFSRE0_EL1
> +	tbz	\tmp, #SYS_TFSR_EL1_TF0_SHIFT, 1f
> +	/* Asynchronous TCF occurred for TTBR0 access, set the TI flag */
> +	orr	\flgs, \flgs, #_TIF_MTE_ASYNC_FAULT
> +	str	\flgs, [tsk, #TSK_TI_FLAGS]
> +	msr_s	SYS_TFSRE0_EL1, xzr
> +1:
> +#endif
> +	.endm
> +
> +	/* Clear the MTE asynchronous tag check faults */
> +	.macro clear_mte_async_tcf
> +#ifdef CONFIG_ARM64_MTE
> +alternative_if ARM64_MTE
> +	msr_s	SYS_TFSRE0_EL1, xzr
> +alternative_else_nop_endif

This needs a 'dsb ish' prior to the msr as an indirect write (async tag
check fault) to the TFSRE0_EL1 register is not ordered with a subsequent
direct write (msr) to this register.

The check_mte_async_tcf macro is fine as we execute it after taking an
exception with SCTLR_EL1.ITFSB bit set (which triggers such
synchronisation).

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 18/23] arm64: mte: Restore the GCR_EL1 register after a suspend
  2020-04-21 14:25 ` [PATCH v3 18/23] arm64: mte: Restore the GCR_EL1 register after a suspend Catalin Marinas
@ 2020-04-23 15:23   ` Lorenzo Pieralisi
  0 siblings, 0 replies; 81+ messages in thread
From: Lorenzo Pieralisi @ 2020-04-23 15:23 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch

On Tue, Apr 21, 2020 at 03:25:58PM +0100, Catalin Marinas wrote:
> The CPU resume/suspend routines only take care of the common system
> registers. Restore GCR_EL1 in addition via the __cpu_suspend_exit()
> function.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
> ---
> 
> Notes:
>     New in v3.
> 
>  arch/arm64/include/asm/mte.h | 4 ++++
>  arch/arm64/kernel/mte.c      | 8 ++++++++
>  arch/arm64/kernel/suspend.c  | 4 ++++
>  3 files changed, 16 insertions(+)

Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index 3dc0a7977124..22eb3e06f311 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -12,6 +12,7 @@ int mte_memcmp_pages(const void *page1_addr, const void *page2_addr);
>  #ifdef CONFIG_ARM64_MTE
>  void flush_mte_state(void);
>  void mte_thread_switch(struct task_struct *next);
> +void mte_suspend_exit(void);
>  long set_mte_ctrl(unsigned long arg);
>  long get_mte_ctrl(void);
>  #else
> @@ -21,6 +22,9 @@ static inline void flush_mte_state(void)
>  static inline void mte_thread_switch(struct task_struct *next)
>  {
>  }
> +static inline void mte_suspend_exit(void)
> +{
> +}
>  static inline long set_mte_ctrl(unsigned long arg)
>  {
>  	return 0;
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index 212b9fac294d..fa4a4196b248 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -76,6 +76,14 @@ void mte_thread_switch(struct task_struct *next)
>  	update_gcr_el1_excl(next->thread.gcr_incl);
>  }
>  
> +void mte_suspend_exit(void)
> +{
> +	if (!system_supports_mte())
> +		return;
> +
> +	update_gcr_el1_excl(current->thread.gcr_incl);
> +}
> +
>  long set_mte_ctrl(unsigned long arg)
>  {
>  	u64 tcf0;
> diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
> index 9405d1b7f4b0..1d405b73d009 100644
> --- a/arch/arm64/kernel/suspend.c
> +++ b/arch/arm64/kernel/suspend.c
> @@ -9,6 +9,7 @@
>  #include <asm/daifflags.h>
>  #include <asm/debug-monitors.h>
>  #include <asm/exec.h>
> +#include <asm/mte.h>
>  #include <asm/pgtable.h>
>  #include <asm/memory.h>
>  #include <asm/mmu_context.h>
> @@ -74,6 +75,9 @@ void notrace __cpu_suspend_exit(void)
>  	 */
>  	if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE)
>  		arm64_set_ssbd_mitigation(false);
> +
> +	/* Restore additional MTE-specific configuration */
> +	mte_suspend_exit();
>  }
>  
>  /*


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 21/23] arm64: mte: Check the DT memory nodes for MTE support
  2020-04-21 14:26 ` [PATCH v3 21/23] arm64: mte: Check the DT memory nodes for MTE support Catalin Marinas
@ 2020-04-24 13:57   ` Catalin Marinas
  2020-04-24 16:17     ` Catalin Marinas
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-24 13:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Rob Herring, Mark Rutland, Suzuki K Poulose

On Tue, Apr 21, 2020 at 03:26:01PM +0100, Catalin Marinas wrote:
> Even if the ID_AA64PFR1_EL1 register advertises the presence of MTE, it
> is not guaranteed that the memory system on the SoC supports the
> feature. In the absence of system-wide MTE support, the behaviour is
> undefined and the kernel should not enable the MTE memory type in
> MAIR_EL1.
> 
> For FDT, add an 'arm,armv8.5-memtag' property to the /memory nodes and
> check for its presence during MTE probing. For example:
> 
> 	memory@80000000 {
> 		device_type = "memory";
> 		arm,armv8.5-memtag;
> 		reg = <0x00000000 0x80000000 0 0x80000000>,
> 		      <0x00000008 0x80000000 0 0x80000000>;
> 	};
> 
> If the /memory nodes are not present in DT or if at least one node does
> not support MTE, the feature will be disabled. On EFI systems, it is
> assumed that the memory description matches the EFI memory map (if not,
> it is considered a firmware bug).
> 
> MTE is not currently supported on ACPI systems.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Rob Herring <Rob.Herring@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>

This patch turns out to be incomplete. While it does not expose the
HWCAP2_MTE to user when the above DT property is not present, it still
allows user access to the ID_AA64PFR1_EL1.MTE field (via MRS emulations)
since it is marked as visible.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 21/23] arm64: mte: Check the DT memory nodes for MTE support
  2020-04-24 13:57   ` Catalin Marinas
@ 2020-04-24 16:17     ` Catalin Marinas
  2020-04-27 11:14       ` Suzuki K Poulose
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-24 16:17 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Rob Herring, Mark Rutland, Suzuki K Poulose

On Fri, Apr 24, 2020 at 02:57:36PM +0100, Catalin Marinas wrote:
> On Tue, Apr 21, 2020 at 03:26:01PM +0100, Catalin Marinas wrote:
> > Even if the ID_AA64PFR1_EL1 register advertises the presence of MTE, it
> > is not guaranteed that the memory system on the SoC supports the
> > feature. In the absence of system-wide MTE support, the behaviour is
> > undefined and the kernel should not enable the MTE memory type in
> > MAIR_EL1.
> > 
> > For FDT, add an 'arm,armv8.5-memtag' property to the /memory nodes and
> > check for its presence during MTE probing. For example:
> > 
> > 	memory@80000000 {
> > 		device_type = "memory";
> > 		arm,armv8.5-memtag;
> > 		reg = <0x00000000 0x80000000 0 0x80000000>,
> > 		      <0x00000008 0x80000000 0 0x80000000>;
> > 	};
> > 
> > If the /memory nodes are not present in DT or if at least one node does
> > not support MTE, the feature will be disabled. On EFI systems, it is
> > assumed that the memory description matches the EFI memory map (if not,
> > it is considered a firmware bug).
> > 
> > MTE is not currently supported on ACPI systems.
> > 
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Rob Herring <Rob.Herring@arm.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
> 
> This patch turns out to be incomplete. While it does not expose the
> HWCAP2_MTE to user when the above DT property is not present, it still
> allows user access to the ID_AA64PFR1_EL1.MTE field (via MRS emulations)
> since it is marked as visible.

Attempt below at moving the check to the CPUID fields setup. This way we
can avoid the original patch entirely since the sanitised id regs will
have a zero MTE field if DT doesn't support it.

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index afc315814563..0a24d36bf231 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -61,6 +61,7 @@ struct arm64_ftr_bits {
 	u8		shift;
 	u8		width;
 	s64		safe_val; /* safe value for FTR_EXACT features */
+	s64		(*filter)(const struct arm64_ftr_bits *, s64);
 };
 
 /*
@@ -542,7 +543,10 @@ cpuid_feature_extract_field(u64 features, int field, bool sign)
 
 static inline s64 arm64_ftr_value(const struct arm64_ftr_bits *ftrp, u64 val)
 {
-	return (s64)cpuid_feature_extract_field_width(val, ftrp->shift, ftrp->width, ftrp->sign);
+	s64 fval = (s64)cpuid_feature_extract_field_width(val, ftrp->shift, ftrp->width, ftrp->sign);
+	if (ftrp->filter)
+		fval = ftrp->filter(ftrp, fval);
+	return fval;
 }
 
 static inline bool id_aa64mmfr0_mixed_endian_el0(u64 mmfr0)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a32aad1d5b57..b0f37c77ec77 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -89,23 +89,28 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(cpu_hwcap_keys, ARM64_NCAPS);
 EXPORT_SYMBOL(cpu_hwcap_keys);
 
 #define __ARM64_FTR_BITS(SIGNED, VISIBLE, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL) \
-	{						\
 		.sign = SIGNED,				\
 		.visible = VISIBLE,			\
 		.strict = STRICT,			\
 		.type = TYPE,				\
 		.shift = SHIFT,				\
 		.width = WIDTH,				\
-		.safe_val = SAFE_VAL,			\
-	}
+		.safe_val = SAFE_VAL
 
 /* Define a feature with unsigned values */
 #define ARM64_FTR_BITS(VISIBLE, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL) \
-	__ARM64_FTR_BITS(FTR_UNSIGNED, VISIBLE, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL)
+	{ __ARM64_FTR_BITS(FTR_UNSIGNED, VISIBLE, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL), }
 
 /* Define a feature with a signed value */
 #define S_ARM64_FTR_BITS(VISIBLE, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL) \
-	__ARM64_FTR_BITS(FTR_SIGNED, VISIBLE, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL)
+	{ __ARM64_FTR_BITS(FTR_SIGNED, VISIBLE, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL), }
+
+/* Define a feature with a filter function to process the field value */
+#define FILTERED_ARM64_FTR_BITS(SIGNED, VISIBLE, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL, filter_fn) \
+	{											\
+		__ARM64_FTR_BITS(SIGNED, VISIBLE, STRICT, TYPE, SHIFT, WIDTH, SAFE_VAL),	\
+		.filter = filter_fn,								\
+	}
 
 #define ARM64_FTR_END					\
 	{						\
@@ -120,6 +125,42 @@ static void cpu_enable_cnp(struct arm64_cpu_capabilities const *cap);
 
 static bool __system_matches_cap(unsigned int n);
 
+#ifdef CONFIG_ARM64_MTE
+s64 mte_ftr_filter(const struct arm64_ftr_bits *ftrp, s64 val)
+{
+	struct device_node *np;
+	static bool memory_checked = false;
+	static bool mte_capable = true;
+
+	/* EL0-only MTE is not supported by Linux, don't expose it */
+	if (val < ID_AA64PFR1_MTE)
+		return ID_AA64PFR1_MTE_NI;
+
+	if (memory_checked)
+		return mte_capable ? val : ID_AA64PFR1_MTE_NI;
+
+	if (!acpi_disabled) {
+		pr_warn("MTE not supported on ACPI systems\n");
+		return ID_AA64PFR1_MTE_NI;
+	}
+
+	/* check the DT "memory" nodes for MTE support */
+	for_each_node_by_type(np, "memory") {
+		memory_checked = true;
+		mte_capable &= of_property_read_bool(np, "arm,armv8.5-memtag");
+	}
+
+	if (!memory_checked || !mte_capable) {
+		pr_warn("System memory is not MTE-capable\n");
+		memory_checked = true;
+		mte_capable = false;
+		return ID_AA64PFR1_MTE_NI;
+	}
+
+	return val;
+}
+#endif
+
 /*
  * NOTE: Any changes to the visibility of features should be kept in
  * sync with the documentation of the CPU feature register ABI.
@@ -184,8 +225,10 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SSBS_SHIFT, 4, ID_AA64PFR1_SSBS_PSTATE_NI),
-	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_MTE),
-		       FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_MTE_SHIFT, 4, ID_AA64PFR1_MTE_NI),
+#ifdef CONFIG_ARM64_MTE
+	FILTERED_ARM64_FTR_BITS(FTR_UNSIGNED, FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE,
+				ID_AA64PFR1_MTE_SHIFT, 4, ID_AA64PFR1_MTE_NI, mte_ftr_filter),
+#endif
 	ARM64_FTR_END,
 };
 

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-04-21 14:25 ` [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
@ 2020-04-24 23:28   ` Peter Collingbourne
  2020-04-29 10:27   ` Kevin Brodsky
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 81+ messages in thread
From: Peter Collingbourne @ 2020-04-24 23:28 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linux ARM, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov, linux-mm,
	linux-arch, Alan Hayward, Luis Machado, Omair Javaid

On Tue, Apr 21, 2020 at 7:26 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> Add support for bulk setting/getting of the MTE tags in a tracee's
> address space at 'addr' in the ptrace() syscall prototype. 'data' points
> to a struct iovec in the tracer's address space with iov_base
> representing the address of a tracer's buffer of length iov_len. The
> tags to be copied to/from the tracer's buffer are stored as one tag per
> byte.
>
> On successfully copying at least one tag, ptrace() returns 0 and updates
> the tracer's iov_len with the number of tags copied. In case of error,
> either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> page.
>
> Note that the tag copying functions are not performance critical,
> therefore they lack optimisations found in typical memory copy routines.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Alan Hayward <Alan.Hayward@arm.com>
> Cc: Luis Machado <luis.machado@linaro.org>
> Cc: Omair Javaid <omair.javaid@linaro.org>
> ---
>
> Notes:
>     New in v3.
>
>  arch/arm64/include/asm/mte.h         |  17 ++++
>  arch/arm64/include/uapi/asm/ptrace.h |   3 +
>  arch/arm64/kernel/mte.c              | 127 +++++++++++++++++++++++++++
>  arch/arm64/kernel/ptrace.c           |  15 +++-
>  arch/arm64/lib/mte.S                 |  50 +++++++++++
>  5 files changed, 211 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index 22eb3e06f311..0ca2aaff07a1 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -2,12 +2,21 @@
>  #ifndef __ASM_MTE_H
>  #define __ASM_MTE_H
>
> +#define MTE_ALLOC_SIZE UL(16)
> +#define MTE_ALLOC_MASK (~(MTE_ALLOC_SIZE - 1))
> +#define MTE_TAG_SHIFT  (56)
> +#define MTE_TAG_SIZE   (4)
> +
>  #ifndef __ASSEMBLY__
>
>  #include <linux/sched.h>
>
>  /* Memory Tagging API */
>  int mte_memcmp_pages(const void *page1_addr, const void *page2_addr);
> +unsigned long mte_copy_tags_from_user(void *to, const void __user *from,
> +                                     unsigned long n);
> +unsigned long mte_copy_tags_to_user(void __user *to, void *from,
> +                                   unsigned long n);
>
>  #ifdef CONFIG_ARM64_MTE
>  void flush_mte_state(void);
> @@ -15,6 +24,8 @@ void mte_thread_switch(struct task_struct *next);
>  void mte_suspend_exit(void);
>  long set_mte_ctrl(unsigned long arg);
>  long get_mte_ctrl(void);
> +int mte_ptrace_copy_tags(struct task_struct *child, long request,
> +                        unsigned long addr, unsigned long data);
>  #else
>  static inline void flush_mte_state(void)
>  {
> @@ -33,6 +44,12 @@ static inline long get_mte_ctrl(void)
>  {
>         return 0;
>  }
> +static inline int mte_ptrace_copy_tags(struct task_struct *child,
> +                                      long request, unsigned long addr,
> +                                      unsigned long data)
> +{
> +       return -EIO;
> +}
>  #endif
>
>  #endif /* __ASSEMBLY__ */
> diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
> index 1daf6dda8af0..cd2a4a164de3 100644
> --- a/arch/arm64/include/uapi/asm/ptrace.h
> +++ b/arch/arm64/include/uapi/asm/ptrace.h
> @@ -67,6 +67,9 @@
>  /* syscall emulation path in ptrace */
>  #define PTRACE_SYSEMU            31
>  #define PTRACE_SYSEMU_SINGLESTEP  32
> +/* MTE allocation tag access */
> +#define PTRACE_PEEKMTETAGS       33
> +#define PTRACE_POKEMTETAGS       34
>
>  #ifndef __ASSEMBLY__
>
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index fa4a4196b248..0cb496ed9bf9 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -3,12 +3,17 @@
>   * Copyright (C) 2020 ARM Ltd.
>   */
>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
>  #include <linux/prctl.h>
>  #include <linux/sched.h>
> +#include <linux/sched/mm.h>
>  #include <linux/thread_info.h>
> +#include <linux/uio.h>
>
>  #include <asm/cpufeature.h>
>  #include <asm/mte.h>
> +#include <asm/ptrace.h>
>  #include <asm/sysreg.h>
>
>  static void update_sctlr_el1_tcf0(u64 tcf0)
> @@ -133,3 +138,125 @@ long get_mte_ctrl(void)
>
>         return ret;
>  }
> +
> +/*
> + * Access MTE tags in another process' address space as given in mm. Update
> + * the number of tags copied. Return 0 if any tags copied, error otherwise.
> + * Inspired by __access_remote_vm().
> + */
> +static int __access_remote_tags(struct task_struct *tsk, struct mm_struct *mm,
> +                               unsigned long addr, struct iovec *kiov,
> +                               unsigned int gup_flags)
> +{
> +       struct vm_area_struct *vma;
> +       void __user *buf = kiov->iov_base;
> +       size_t len = kiov->iov_len;
> +       int ret;
> +       int write = gup_flags & FOLL_WRITE;
> +
> +       if (down_read_killable(&mm->mmap_sem))
> +               return -EIO;
> +
> +       if (!access_ok(buf, len))
> +               return -EFAULT;
> +
> +       while (len) {
> +               unsigned long tags, offset;
> +               void *maddr;
> +               struct page *page = NULL;
> +
> +               ret = get_user_pages_remote(tsk, mm, addr, 1, gup_flags,
> +                                           &page, &vma, NULL);
> +               if (ret <= 0)
> +                       break;
> +
> +               /* limit access to the end of the page */
> +               offset = offset_in_page(addr);
> +               tags = min(len, (PAGE_SIZE - offset) / MTE_ALLOC_SIZE);
> +
> +               maddr = page_address(page);
> +               if (write) {
> +                       tags = mte_copy_tags_from_user(maddr + offset, buf, tags);
> +                       set_page_dirty_lock(page);
> +               } else {
> +                       tags = mte_copy_tags_to_user(buf, maddr + offset, tags);
> +               }
> +               put_page(page);
> +
> +               /* error accessing the tracer's buffer */
> +               if (!tags)
> +                       break;
> +
> +               len -= tags;
> +               buf += tags;
> +               addr += tags * MTE_ALLOC_SIZE;
> +       }
> +       up_read(&mm->mmap_sem);
> +
> +       /* return an error if no tags copied */
> +       kiov->iov_len = buf - kiov->iov_base;
> +       if (!kiov->iov_len) {
> +               /* check for error accessing the tracee's address space */
> +               if (ret <= 0)
> +                       return -EIO;
> +               else
> +                       return -EFAULT;
> +       }
> +
> +       return 0;
> +}
> +
> +/*
> + * Copy MTE tags in another process' address space at 'addr' to/from tracer's
> + * iovec buffer. Return 0 on success. Inspired by ptrace_access_vm().
> + */
> +static int access_remote_tags(struct task_struct *tsk, unsigned long addr,
> +                             struct iovec *kiov, unsigned int gup_flags)
> +{
> +       struct mm_struct *mm;
> +       int ret;
> +
> +       mm = get_task_mm(tsk);
> +       if (!mm)
> +               return -EPERM;
> +
> +       if (!tsk->ptrace || (current != tsk->parent) ||
> +           ((get_dumpable(mm) != SUID_DUMP_USER) &&
> +            !ptracer_capable(tsk, mm->user_ns))) {
> +               mmput(mm);
> +               return -EPERM;
> +       }
> +
> +       ret = __access_remote_tags(tsk, mm, addr, kiov, gup_flags);
> +       mmput(mm);
> +
> +       return ret;
> +}
> +
> +int mte_ptrace_copy_tags(struct task_struct *child, long request,
> +                        unsigned long addr, unsigned long data)
> +{
> +       int ret;
> +       struct iovec kiov;
> +       struct iovec __user *uiov = (void __user *)data;
> +       unsigned int gup_flags = FOLL_FORCE;
> +
> +       if (!system_supports_mte())
> +               return -EIO;
> +
> +       if (get_user(kiov.iov_base, &uiov->iov_base) ||
> +           get_user(kiov.iov_len, &uiov->iov_len))
> +               return -EFAULT;
> +
> +       if (request == PTRACE_POKEMTETAGS)
> +               gup_flags |= FOLL_WRITE;
> +
> +       /* align addr to the MTE tag granule */
> +       addr &= MTE_ALLOC_MASK;
> +
> +       ret = access_remote_tags(child, addr, &kiov, gup_flags);
> +       if (!ret)
> +               ret = __put_user(kiov.iov_len, &uiov->iov_len);
> +
> +       return ret;
> +}
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 077e352495eb..1fdb841ad536 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -34,6 +34,7 @@
>  #include <asm/cpufeature.h>
>  #include <asm/debug-monitors.h>
>  #include <asm/fpsimd.h>
> +#include <asm/mte.h>
>  #include <asm/pgtable.h>
>  #include <asm/pointer_auth.h>
>  #include <asm/stacktrace.h>
> @@ -1797,7 +1798,19 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
>  long arch_ptrace(struct task_struct *child, long request,
>                  unsigned long addr, unsigned long data)
>  {
> -       return ptrace_request(child, request, addr, data);
> +       int ret;
> +
> +       switch (request) {
> +       case PTRACE_PEEKMTETAGS:
> +       case PTRACE_POKEMTETAGS:
> +               ret = mte_ptrace_copy_tags(child, request, addr, data);
> +               break;
> +       default:
> +               ret = ptrace_request(child, request, addr, data);
> +               break;
> +       }
> +
> +       return ret;
>  }
>
>  enum ptrace_syscall_dir {
> diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
> index bd51ea7e2fcb..45be04a8c73c 100644
> --- a/arch/arm64/lib/mte.S
> +++ b/arch/arm64/lib/mte.S
> @@ -5,6 +5,7 @@
>  #include <linux/linkage.h>
>
>  #include <asm/assembler.h>
> +#include <asm/mte.h>
>
>  /*
>   * Compare tags of two pages
> @@ -44,3 +45,52 @@ SYM_FUNC_START(mte_memcmp_pages)
>
>         ret
>  SYM_FUNC_END(mte_memcmp_pages)
> +
> +/*
> + * Read tags from a user buffer (one tag per byte) and set the corresponding
> + * tags at the given kernel address. Used by PTRACE_POKEMTETAGS.
> + *   x0 - kernel address (to)
> + *   x1 - user buffer (from)
> + *   x2 - number of tags/bytes (n)
> + * Returns:
> + *   x0 - number of tags read/set
> + */
> +SYM_FUNC_START(mte_copy_tags_from_user)
> +       mov     x3, x1
> +1:
> +USER(2f, ldtrb w4, [x1])
> +       lsl     x4, x4, #MTE_TAG_SHIFT
> +       stg     x4, [x0], #MTE_ALLOC_SIZE
> +       add     x1, x1, #1
> +       subs    x2, x2, #1
> +       b.ne    1b
> +
> +       // exception handling and function return
> +2:     sub     x0, x1, x3              // update the number of tags set
> +       ret
> +SYM_FUNC_END(mte_copy_tags_from_user)
> +
> +/*
> + * Get the tags from a kernel address range and write the tag values to the
> + * given user buffer (one tag per byte). Used by PTRACE_PEEKMTETAGS.
> + *   x0 - user buffer (to)
> + *   x1 - kernel address (from)
> + *   x2 - number of tags/bytes (n)
> + * Returns:
> + *   x0 - number of tags read/set
> + */
> +SYM_FUNC_START(mte_copy_tags_to_user)
> +       mov     x3, x0
> +1:
> +       ldg     x4, [x1]
> +       ubfx    x4, x4, #MTE_TAG_SHIFT, #MTE_TAG_SIZE
> +USER(2f, sttrb w4, [x0])
> +       add     x0, x0, #1
> +       add     x1, x1, #MTE_ALLOC_SIZE
> +       subs    x2, x2, #1
> +       b.ne    1b
> +
> +       // exception handling and function return
> +2:     sub     x0, x0, x3              // update the number of tags copied
> +       ret
> +SYM_FUNC_END(mte_copy_tags_from_user)

Nit: should be SYM_FUNC_END(mte_copy_tags_to_user).

Peter


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 21/23] arm64: mte: Check the DT memory nodes for MTE support
  2020-04-24 16:17     ` Catalin Marinas
@ 2020-04-27 11:14       ` Suzuki K Poulose
  0 siblings, 0 replies; 81+ messages in thread
From: Suzuki K Poulose @ 2020-04-27 11:14 UTC (permalink / raw)
  To: catalin.marinas, linux-arm-kernel
  Cc: will, Vincenzo.Frascino, Szabolcs.Nagy, Richard.Earnshaw,
	kevin.brodsky, andreyknvl, pcc, linux-mm, linux-arch,
	Rob.Herring, mark.rutland

Hi Catalin,

On 04/24/2020 05:17 PM, Catalin Marinas wrote:
> On Fri, Apr 24, 2020 at 02:57:36PM +0100, Catalin Marinas wrote:
>> On Tue, Apr 21, 2020 at 03:26:01PM +0100, Catalin Marinas wrote:
>>> Even if the ID_AA64PFR1_EL1 register advertises the presence of MTE, it
>>> is not guaranteed that the memory system on the SoC supports the
>>> feature. In the absence of system-wide MTE support, the behaviour is
>>> undefined and the kernel should not enable the MTE memory type in
>>> MAIR_EL1.
>>>
>>> For FDT, add an 'arm,armv8.5-memtag' property to the /memory nodes and
>>> check for its presence during MTE probing. For example:
>>>
>>> 	memory@80000000 {
>>> 		device_type = "memory";
>>> 		arm,armv8.5-memtag;
>>> 		reg = <0x00000000 0x80000000 0 0x80000000>,
>>> 		      <0x00000008 0x80000000 0 0x80000000>;
>>> 	};
>>>
>>> If the /memory nodes are not present in DT or if at least one node does
>>> not support MTE, the feature will be disabled. On EFI systems, it is
>>> assumed that the memory description matches the EFI memory map (if not,
>>> it is considered a firmware bug).
>>>
>>> MTE is not currently supported on ACPI systems.
>>>
>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Rob Herring <Rob.Herring@arm.com>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
>>
>> This patch turns out to be incomplete. While it does not expose the
>> HWCAP2_MTE to user when the above DT property is not present, it still
>> allows user access to the ID_AA64PFR1_EL1.MTE field (via MRS emulations)
>> since it is marked as visible.
> 
> Attempt below at moving the check to the CPUID fields setup. This way we
> can avoid the original patch entirely since the sanitised id regs will
> have a zero MTE field if DT doesn't support it.
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index afc315814563..0a24d36bf231 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -61,6 +61,7 @@ struct arm64_ftr_bits {
>   	u8		shift;
>   	u8		width;
>   	s64		safe_val; /* safe value for FTR_EXACT features */
> +	s64		(*filter)(const struct arm64_ftr_bits *, s64);
>   };
>   
>   /*
> @@ -542,7 +543,10 @@ cpuid_feature_extract_field(u64 features, int field, bool sign)
>   
>   static inline s64 arm64_ftr_value(const struct arm64_ftr_bits *ftrp, u64 val)
>   {
> -	return (s64)cpuid_feature_extract_field_width(val, ftrp->shift, ftrp->width, ftrp->sign);
> +	s64 fval = (s64)cpuid_feature_extract_field_width(val, ftrp->shift, ftrp->width, ftrp->sign);
> +	if (ftrp->filter)
> +		fval = ftrp->filter(ftrp, fval);
> +	return fval;
>   }
>   

This change makes sure that the sanitised infrastructure is initialised
with masked value and all consumers see a "sanitised" value, including
KVM (unless they emulate it directly on the local CPU)




>   
> +#ifdef CONFIG_ARM64_MTE
> +s64 mte_ftr_filter(const struct arm64_ftr_bits *ftrp, s64 val)
> +{
> +	struct device_node *np;
> +	static bool memory_checked = false;
> +	static bool mte_capable = true;
> +
> +	/* EL0-only MTE is not supported by Linux, don't expose it */
> +	if (val < ID_AA64PFR1_MTE)
> +		return ID_AA64PFR1_MTE_NI;
> +
> +	if (memory_checked)
> +		return mte_capable ? val : ID_AA64PFR1_MTE_NI;
> +
> +	if (!acpi_disabled) {
> +		pr_warn("MTE not supported on ACPI systems\n");
> +		return ID_AA64PFR1_MTE_NI;
> +	}
> +
> +	/* check the DT "memory" nodes for MTE support */
> +	for_each_node_by_type(np, "memory") {
> +		memory_checked = true;
> +		mte_capable &= of_property_read_bool(np, "arm,armv8.5-memtag");
> +	}
> +
> +	if (!memory_checked || !mte_capable) {
> +		pr_warn("System memory is not MTE-capable\n");
> +		memory_checked = true;
> +		mte_capable = false;
> +		return ID_AA64PFR1_MTE_NI;
> +	}
> +
> +	return val;
> +}
> +#endif
> +
>   /*
>    * NOTE: Any changes to the visibility of features should be kept in
>    * sync with the documentation of the CPU feature register ABI.
> @@ -184,8 +225,10 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
>   
>   static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
>   	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SSBS_SHIFT, 4, ID_AA64PFR1_SSBS_PSTATE_NI),
> -	ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_MTE),
> -		       FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_MTE_SHIFT, 4, ID_AA64PFR1_MTE_NI),
> +#ifdef CONFIG_ARM64_MTE
> +	FILTERED_ARM64_FTR_BITS(FTR_UNSIGNED, FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE,
> +				ID_AA64PFR1_MTE_SHIFT, 4, ID_AA64PFR1_MTE_NI, mte_ftr_filter),
> +#endif
>   	ARM64_FTR_END,
>   };
>   

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-21 14:26 ` [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass Catalin Marinas
  2020-04-21 15:29   ` Al Viro
@ 2020-04-27 16:56   ` Dave Martin
  2020-04-28 14:06     ` Catalin Marinas
  2020-04-28 18:16   ` Kevin Brodsky
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 81+ messages in thread
From: Dave Martin @ 2020-04-27 16:56 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, Peter Collingbourne, linux-mm,
	Alexander Viro, Vincenzo Frascino, Will Deacon

On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> The copy_mount_options() function takes a user pointer argument but not
> a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
> not guaranteed to return all the accessible bytes if, for example, the
> access crosses a page boundary and gets a fault on the second page. To
> work around this, the current copy_mount_options() implementations
> performs to copy_from_user() passes, first to the end of the current
> page and the second to what's left in the subsequent page.
> 
> Some architectures like arm64 can guarantee an exact copy_from_user()
> depending on the size (since the arch function performs some alignment
> on the source register). Introduce an arch_has_exact_copy_from_user()
> function and allow copy_mount_options() to perform the user access in a
> single pass.
> 
> While this function is not on a critical path, the single-pass behaviour
> is required for arm64 MTE (memory tagging) support where a uaccess can
> trigger intra-page faults (tag not matching). With the current
> implementation, if this happens during the first page, the function will
> return -EFAULT.

Do you know how much extra overhead we'd incur if we read at must one
tag granule at a time, instead of PAGE_SIZE?

I'm guessing that in practice strcpy_from_user() type operations copy
much less than a page most of the time, so what we lose in uaccess
overheads we _might_ regain in less redundant copying.

Would need behchmarking though.

[...]

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 01/23] arm64: alternative: Allow alternative_insn to always issue the first instruction
  2020-04-21 14:25 ` [PATCH v3 01/23] arm64: alternative: Allow alternative_insn to always issue the first instruction Catalin Marinas
@ 2020-04-27 16:57   ` Dave Martin
  2020-04-28 11:43     ` Catalin Marinas
  0 siblings, 1 reply; 81+ messages in thread
From: Dave Martin @ 2020-04-27 16:57 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, Peter Collingbourne, linux-mm,
	Vincenzo Frascino, Will Deacon

On Tue, Apr 21, 2020 at 03:25:41PM +0100, Catalin Marinas wrote:
> There are situations where we do not want to disable the whole block
> based on a config option, only the alternative part while keeping the
> first instruction. Improve the alternative_insn assembler macro to take
> a 'first_insn' argument, default 0, to preserve the current behaviour.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/alternative.h | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
> index 5e5dc05d63a0..67d7cc608336 100644
> --- a/arch/arm64/include/asm/alternative.h
> +++ b/arch/arm64/include/asm/alternative.h
> @@ -111,7 +111,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
>  	.byte \alt_len
>  .endm
>  
> -.macro alternative_insn insn1, insn2, cap, enable = 1
> +/*
> + * Disable the whole block if enable == 0, unless first_insn == 1 in which
> + * case insn1 will always be issued but without an alternative insn2.
> + */
> +.macro alternative_insn insn1, insn2, cap, enable = 1, first_insn = 0
>  	.if \enable
>  661:	\insn1
>  662:	.pushsection .altinstructions, "a"
> @@ -122,6 +126,8 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
>  664:	.popsection
>  	.org	. - (664b-663b) + (662b-661b)
>  	.org	. - (662b-661b) + (664b-663b)
> +	.elseif \first_insn
> +	\insn1

This becomes quite unreadable at the invocation site, especially when
invoked as "alternative_insn ..., 1".  "... first_insn=1" is not much
better either).

I'm struggling to find non-trivial users of this that actually want the
whole block to be deleted dependent on the config.

Can we instead just always behave as if first_insn=1 instead?  This this
works intuitively as an alternative, not the current weird 3-way choice
between insn1, insn2 and nothing at all.  The only time that makes sense
is when one of the insns is a branch that skips the block, but that's
handled via the alternative_if macros instead.

Behaving always like first_insn=1 provides an if-else that is statically
optimised if the relevant feature is configured out, which I think is
the only thing people are ever going to want.

Maybe something depends on the current behaviour, but I can't see it so
far...

[...]

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 10/23] arm64: mte: Handle synchronous and asynchronous tag check faults
  2020-04-21 14:25 ` [PATCH v3 10/23] arm64: mte: Handle synchronous and asynchronous tag check faults Catalin Marinas
  2020-04-23 10:38   ` Catalin Marinas
@ 2020-04-27 16:58   ` Dave Martin
  2020-04-28 13:43     ` Catalin Marinas
  1 sibling, 1 reply; 81+ messages in thread
From: Dave Martin @ 2020-04-27 16:58 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, Peter Collingbourne, linux-mm,
	Vincenzo Frascino, Will Deacon

On Tue, Apr 21, 2020 at 03:25:50PM +0100, Catalin Marinas wrote:
> From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> 
> The Memory Tagging Extension has two modes of notifying a tag check
> fault at EL0, configurable through the SCTLR_EL1.TCF0 field:
> 
> 1. Synchronous raising of a Data Abort exception with DFSC 17.
> 2. Asynchronous setting of a cumulative bit in TFSRE0_EL1.
> 
> Add the exception handler for the synchronous exception and handling of
> the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in
> do_notify_resume().
> 
> On a tag check failure in user-space, whether synchronous or
> asynchronous, a SIGSEGV will be raised on the faulting thread.

Has there been any discussion on whether this should be SIGSEGV or
SIGBUS?

Probably neither is much more appropriate than the other.

> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>

[...]

> diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> index 339882db5a91..e377d77c065e 100644
> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -732,6 +732,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
>  	regs->regs[29] = (unsigned long)&user->next_frame->fp;
>  	regs->pc = (unsigned long)ka->sa.sa_handler;
>  
> +	/* TCO (Tag Check Override) always cleared for signal handlers */
> +	regs->pstate &= ~PSR_TCO_BIT;
> +
>  	if (ka->sa.sa_flags & SA_RESTORER)
>  		sigtramp = ka->sa.sa_restorer;
>  	else
> @@ -923,6 +926,11 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>  			if (thread_flags & _TIF_UPROBE)
>  				uprobe_notify_resume(regs);
>  
> +			if (thread_flags & _TIF_MTE_ASYNC_FAULT) {
> +				clear_thread_flag(TIF_MTE_ASYNC_FAULT);
> +				force_signal_inject(SIGSEGV, SEGV_MTEAERR, 0);
> +			}
> +

Should this definitely be a force_signal_inject()?

SEGV_MTEAERR is not intrinsically fatal: it must be possible to run past
the error, because that's the whole point -- chances are we already did.

Compare this with MTESERR where running past the signal would lead to a
spin.


If MTEAERR is forced, a martian tag check failure might land in the
middle of a "normal" SIGSEGV, when SIGSEGV would usually be blocked for
good reasons, defeating the process' own handling mechanisms for no good
reason: delivering the MTEAERR when SIGSEGV is next unblocked seems
perfectly reasonable in such a case.

Only braindead software would block or ignore things like SIGSEGV across
exec, so software shouldn't end up ignoring these non-forced signals
unless it does so on purpose.

Alternatively, perhaps asynchronous errors should be delivered via a
different signal.  I don't have a good suggestion though.

[...]

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 01/23] arm64: alternative: Allow alternative_insn to always issue the first instruction
  2020-04-27 16:57   ` Dave Martin
@ 2020-04-28 11:43     ` Catalin Marinas
  2020-04-29 10:26       ` Dave Martin
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-28 11:43 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, Peter Collingbourne, linux-mm,
	Vincenzo Frascino, Will Deacon

Hi Dave,

On Mon, Apr 27, 2020 at 05:57:37PM +0100, Dave P Martin wrote:
> On Tue, Apr 21, 2020 at 03:25:41PM +0100, Catalin Marinas wrote:
> > There are situations where we do not want to disable the whole block
> > based on a config option, only the alternative part while keeping the
> > first instruction. Improve the alternative_insn assembler macro to take
> > a 'first_insn' argument, default 0, to preserve the current behaviour.
> > 
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/include/asm/alternative.h | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
> > index 5e5dc05d63a0..67d7cc608336 100644
> > --- a/arch/arm64/include/asm/alternative.h
> > +++ b/arch/arm64/include/asm/alternative.h
> > @@ -111,7 +111,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
> >  	.byte \alt_len
> >  .endm
> >  
> > -.macro alternative_insn insn1, insn2, cap, enable = 1
> > +/*
> > + * Disable the whole block if enable == 0, unless first_insn == 1 in which
> > + * case insn1 will always be issued but without an alternative insn2.
> > + */
> > +.macro alternative_insn insn1, insn2, cap, enable = 1, first_insn = 0
> >  	.if \enable
> >  661:	\insn1
> >  662:	.pushsection .altinstructions, "a"
> > @@ -122,6 +126,8 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
> >  664:	.popsection
> >  	.org	. - (664b-663b) + (662b-661b)
> >  	.org	. - (662b-661b) + (664b-663b)
> > +	.elseif \first_insn
> > +	\insn1
> 
> This becomes quite unreadable at the invocation site, especially when
> invoked as "alternative_insn ..., 1".  "... first_insn=1" is not much
> better either).

That I agree.

The reason I didn't leave the alternative in place here is that if gas
doesn't support MTE, it will fail to compile. I wanted to avoid the
several #ifdef's.

> I'm struggling to find non-trivial users of this that actually want the
> whole block to be deleted dependent on the config.

Some of the errata stuff like CONFIG_ARM64_REPEAT_TLBI ends up with
unnecessary nops. Similarly for CONFIG_ARM64_UAO/PAN and maybe a few
others (it's all additional nops). We also have a few errata workaround
where we didn't bother with the config enable option at all.

While this is C code + inline asm, I'd like to have a consistent
behaviour of ALTERNATIVE between C and .S files. Now, given that some of
them (like UAO/PAN) are on by default, it probably doesn't make any
difference if we always keep the first block (non-alternative).

We could add a new macro ALTERNATIVE_OR_NOP.

> Can we instead just always behave as if first_insn=1 instead?  This this
> works intuitively as an alternative, not the current weird 3-way choice
> between insn1, insn2 and nothing at all.  The only time that makes sense
> is when one of the insns is a branch that skips the block, but that's
> handled via the alternative_if macros instead.
> 
> Behaving always like first_insn=1 provides an if-else that is statically
> optimised if the relevant feature is configured out, which I think is
> the only thing people are ever going to want.
> 
> Maybe something depends on the current behaviour, but I can't see it so
> far...

I'll give it a go in v4 and see how it looks.

Another option would be an alternative_else which takes an enable
argument.

Thanks.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 10/23] arm64: mte: Handle synchronous and asynchronous tag check faults
  2020-04-27 16:58   ` Dave Martin
@ 2020-04-28 13:43     ` Catalin Marinas
  2020-04-29 10:26       ` Dave Martin
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-28 13:43 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, Peter Collingbourne, linux-mm,
	Vincenzo Frascino, Will Deacon

On Mon, Apr 27, 2020 at 05:58:22PM +0100, Dave P Martin wrote:
> On Tue, Apr 21, 2020 at 03:25:50PM +0100, Catalin Marinas wrote:
> > From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > 
> > The Memory Tagging Extension has two modes of notifying a tag check
> > fault at EL0, configurable through the SCTLR_EL1.TCF0 field:
> > 
> > 1. Synchronous raising of a Data Abort exception with DFSC 17.
> > 2. Asynchronous setting of a cumulative bit in TFSRE0_EL1.
> > 
> > Add the exception handler for the synchronous exception and handling of
> > the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in
> > do_notify_resume().
> > 
> > On a tag check failure in user-space, whether synchronous or
> > asynchronous, a SIGSEGV will be raised on the faulting thread.
> 
> Has there been any discussion on whether this should be SIGSEGV or
> SIGBUS?
> 
> Probably neither is much more appropriate than the other.

You could argue either way. I don't recall a firm conclusion on this, so
I picked one that follows SPARC ADI.

> > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> > index 339882db5a91..e377d77c065e 100644
> > --- a/arch/arm64/kernel/signal.c
> > +++ b/arch/arm64/kernel/signal.c
> > @@ -732,6 +732,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
> >  	regs->regs[29] = (unsigned long)&user->next_frame->fp;
> >  	regs->pc = (unsigned long)ka->sa.sa_handler;
> >  
> > +	/* TCO (Tag Check Override) always cleared for signal handlers */
> > +	regs->pstate &= ~PSR_TCO_BIT;
> > +
> >  	if (ka->sa.sa_flags & SA_RESTORER)
> >  		sigtramp = ka->sa.sa_restorer;
> >  	else
> > @@ -923,6 +926,11 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
> >  			if (thread_flags & _TIF_UPROBE)
> >  				uprobe_notify_resume(regs);
> >  
> > +			if (thread_flags & _TIF_MTE_ASYNC_FAULT) {
> > +				clear_thread_flag(TIF_MTE_ASYNC_FAULT);
> > +				force_signal_inject(SIGSEGV, SEGV_MTEAERR, 0);
> > +			}
> > +
> 
> Should this definitely be a force_signal_inject()?
> 
> SEGV_MTEAERR is not intrinsically fatal: it must be possible to run past
> the error, because that's the whole point -- chances are we already did.
> 
> Compare this with MTESERR where running past the signal would lead to a
> spin.

Good point. This can be a send_sig_fault() (I need to check the right
API).

Thanks.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-27 16:56   ` Dave Martin
@ 2020-04-28 14:06     ` Catalin Marinas
  2020-04-29 10:28       ` Dave Martin
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-28 14:06 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, Peter Collingbourne, linux-mm,
	Alexander Viro, Vincenzo Frascino, Will Deacon

On Mon, Apr 27, 2020 at 05:56:42PM +0100, Dave P Martin wrote:
> On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> > The copy_mount_options() function takes a user pointer argument but not
> > a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
> > not guaranteed to return all the accessible bytes if, for example, the
> > access crosses a page boundary and gets a fault on the second page. To
> > work around this, the current copy_mount_options() implementations
> > performs to copy_from_user() passes, first to the end of the current
> > page and the second to what's left in the subsequent page.
> > 
> > Some architectures like arm64 can guarantee an exact copy_from_user()
> > depending on the size (since the arch function performs some alignment
> > on the source register). Introduce an arch_has_exact_copy_from_user()
> > function and allow copy_mount_options() to perform the user access in a
> > single pass.
> > 
> > While this function is not on a critical path, the single-pass behaviour
> > is required for arm64 MTE (memory tagging) support where a uaccess can
> > trigger intra-page faults (tag not matching). With the current
> > implementation, if this happens during the first page, the function will
> > return -EFAULT.
> 
> Do you know how much extra overhead we'd incur if we read at must one
> tag granule at a time, instead of PAGE_SIZE?

Our copy routines already read 16 bytes at a time, so that's the tag
granule. With current copy_mount_options() we have the issue that it
assumes a fault in the first page is fatal.

Even if we change it to a loop of smaller uaccess, we still have the
issue of unaligned accesses which can fail without reading all that's
possible (i.e. the access goes across a tag granule boundary).

The previous copy_mount_options() implementation (from couple of months
ago I think) had a fallback to byte-by-byte, didn't have this issue.

> I'm guessing that in practice strcpy_from_user() type operations copy
> much less than a page most of the time, so what we lose in uaccess
> overheads we _might_ regain in less redundant copying.

strncpy_from_user() has a fallback to byte by byte, so we don't have an
issue here.

The above is only for synchronous accesses. For async, in v3 I disabled
such checks for the uaccess routines.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-21 14:26 ` [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass Catalin Marinas
  2020-04-21 15:29   ` Al Viro
  2020-04-27 16:56   ` Dave Martin
@ 2020-04-28 18:16   ` Kevin Brodsky
  2020-04-28 19:40     ` Catalin Marinas
  2020-04-29 11:58     ` Catalin Marinas
  2020-04-28 19:36   ` Catalin Marinas
  2020-04-29 10:26   ` Dave Martin
  4 siblings, 2 replies; 81+ messages in thread
From: Kevin Brodsky @ 2020-04-28 18:16 UTC (permalink / raw)
  To: Catalin Marinas, linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Andrey Konovalov, Peter Collingbourne, linux-mm, linux-arch,
	Alexander Viro

On 21/04/2020 15:26, Catalin Marinas wrote:
> The copy_mount_options() function takes a user pointer argument but not
> a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
> not guaranteed to return all the accessible bytes if, for example, the
> access crosses a page boundary and gets a fault on the second page. To
> work around this, the current copy_mount_options() implementations
> performs to copy_from_user() passes, first to the end of the current
> page and the second to what's left in the subsequent page.
>
> Some architectures like arm64 can guarantee an exact copy_from_user()
> depending on the size (since the arch function performs some alignment
> on the source register). Introduce an arch_has_exact_copy_from_user()
> function and allow copy_mount_options() to perform the user access in a
> single pass.
>
> While this function is not on a critical path, the single-pass behaviour
> is required for arm64 MTE (memory tagging) support where a uaccess can
> trigger intra-page faults (tag not matching). With the current
> implementation, if this happens during the first page, the function will
> return -EFAULT.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Will Deacon <will@kernel.org>
> ---
>
> Notes:
>      New in v3.
>
>   arch/arm64/include/asm/uaccess.h | 11 +++++++++++
>   fs/namespace.c                   |  7 +++++--
>   include/linux/uaccess.h          |  8 ++++++++
>   3 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 32fc8061aa76..566da441eba2 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
>   #define INLINE_COPY_TO_USER
>   #define INLINE_COPY_FROM_USER
>   
> +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> +{
> +	/*
> +	 * copy_from_user() aligns the source pointer if the size is greater
> +	 * than 15. Since all the loads are naturally aligned, they can only
> +	 * fail on the first byte.
> +	 */
> +	return n > 15;
> +}
> +#define arch_has_exact_copy_from_user
> +
>   extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n);
>   static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n)
>   {
> diff --git a/fs/namespace.c b/fs/namespace.c
> index a28e4db075ed..8febc50dfc5d 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
>   	if (!copy)
>   		return ERR_PTR(-ENOMEM);
>   
> -	size = PAGE_SIZE - offset_in_page(data);
> +	size = PAGE_SIZE;
> +	if (!arch_has_exact_copy_from_user(size))
> +		size -= offset_in_page(data);
>   
> -	if (copy_from_user(copy, data, size)) {
> +	if (copy_from_user(copy, data, size) == size) {
>   		kfree(copy);
>   		return ERR_PTR(-EFAULT);
>   	}
>   	if (size != PAGE_SIZE) {
> +		WARN_ON(1);

I'm not sure I understand the rationale here. If we don't have exact copy_from_user 
for size, then we will attempt to copy up to the end of the page. Assuming this 
doesn't fault, we then want to carry on copying from the start of the next page, 
until we reach a total size of up to 4K. Why would we warn in that case? AIUI, if you 
don't have exact copy_from_user, there are 3 cases:
1. copy_from_user() returns size, we bail out.
2. copy_from_user() returns 0, we carry on copying from the next page.
3. copy_from_user() returns anything else, we return immediately.

I think you're not handling case 3 here.

Kevin

>   		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
>   			memset(copy + size, 0, PAGE_SIZE - size);
>   	}
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 67f016010aad..00e097a9e8d6 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -152,6 +152,14 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
>   		n = _copy_to_user(to, from, n);
>   	return n;
>   }
> +
> +#ifndef arch_has_exact_copy_from_user
> +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> +{
> +	return false;
> +}
> +#endif
> +
>   #ifdef CONFIG_COMPAT
>   static __always_inline unsigned long __must_check
>   copy_in_user(void __user *to, const void __user *from, unsigned long n)



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-21 14:26 ` [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass Catalin Marinas
                     ` (2 preceding siblings ...)
  2020-04-28 18:16   ` Kevin Brodsky
@ 2020-04-28 19:36   ` Catalin Marinas
  2020-04-29 10:26   ` Dave Martin
  4 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-28 19:36 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Alexander Viro

On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
>  	if (!copy)
>  		return ERR_PTR(-ENOMEM);
>  
> -	size = PAGE_SIZE - offset_in_page(data);
> +	size = PAGE_SIZE;
> +	if (!arch_has_exact_copy_from_user(size))
> +		size -= offset_in_page(data);
>  
> -	if (copy_from_user(copy, data, size)) {
> +	if (copy_from_user(copy, data, size) == size) {
>  		kfree(copy);
>  		return ERR_PTR(-EFAULT);
>  	}
>  	if (size != PAGE_SIZE) {
> +		WARN_ON(1);
>  		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
>  			memset(copy + size, 0, PAGE_SIZE - size);
>  	}

Argh, this WARN_ON should not be here at all. It's something I added to
make check that I don't reach this part in arm64. Will remove in v4.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-28 18:16   ` Kevin Brodsky
@ 2020-04-28 19:40     ` Catalin Marinas
  2020-04-29 11:58     ` Catalin Marinas
  1 sibling, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-28 19:40 UTC (permalink / raw)
  To: Kevin Brodsky
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Andrey Konovalov, Peter Collingbourne,
	linux-mm, linux-arch, Alexander Viro

On Tue, Apr 28, 2020 at 07:16:29PM +0100, Kevin Brodsky wrote:
> On 21/04/2020 15:26, Catalin Marinas wrote:
> > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > index 32fc8061aa76..566da441eba2 100644
> > --- a/arch/arm64/include/asm/uaccess.h
> > +++ b/arch/arm64/include/asm/uaccess.h
> > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
> >   #define INLINE_COPY_TO_USER
> >   #define INLINE_COPY_FROM_USER
> > +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> > +{
> > +	/*
> > +	 * copy_from_user() aligns the source pointer if the size is greater
> > +	 * than 15. Since all the loads are naturally aligned, they can only
> > +	 * fail on the first byte.
> > +	 */
> > +	return n > 15;
> > +}
> > +#define arch_has_exact_copy_from_user
> > +
> >   extern unsigned long __must_check __arch_clear_user(void __user *to, unsigned long n);
> >   static inline unsigned long __must_check __clear_user(void __user *to, unsigned long n)
> >   {
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index a28e4db075ed..8febc50dfc5d 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
> >   	if (!copy)
> >   		return ERR_PTR(-ENOMEM);
> > -	size = PAGE_SIZE - offset_in_page(data);
> > +	size = PAGE_SIZE;
> > +	if (!arch_has_exact_copy_from_user(size))
> > +		size -= offset_in_page(data);
> > -	if (copy_from_user(copy, data, size)) {
> > +	if (copy_from_user(copy, data, size) == size) {
> >   		kfree(copy);
> >   		return ERR_PTR(-EFAULT);
> >   	}
> >   	if (size != PAGE_SIZE) {
> > +		WARN_ON(1);
> 
> I'm not sure I understand the rationale here. If we don't have exact
> copy_from_user for size, then we will attempt to copy up to the end of the
> page. Assuming this doesn't fault, we then want to carry on copying from the
> start of the next page, until we reach a total size of up to 4K. Why would
> we warn in that case?

We shouldn't warn, thanks for spotting this. I added it for some testing
and somehow ended up in the commit.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 01/23] arm64: alternative: Allow alternative_insn to always issue the first instruction
  2020-04-28 11:43     ` Catalin Marinas
@ 2020-04-29 10:26       ` Dave Martin
  2020-04-29 14:04         ` Catalin Marinas
  0 siblings, 1 reply; 81+ messages in thread
From: Dave Martin @ 2020-04-29 10:26 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arch, Richard Earnshaw, Will Deacon, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Vincenzo Frascino,
	Peter Collingbourne, linux-arm-kernel

On Tue, Apr 28, 2020 at 12:43:54PM +0100, Catalin Marinas wrote:
> Hi Dave,
> 
> On Mon, Apr 27, 2020 at 05:57:37PM +0100, Dave P Martin wrote:
> > On Tue, Apr 21, 2020 at 03:25:41PM +0100, Catalin Marinas wrote:
> > > There are situations where we do not want to disable the whole block
> > > based on a config option, only the alternative part while keeping the
> > > first instruction. Improve the alternative_insn assembler macro to take
> > > a 'first_insn' argument, default 0, to preserve the current behaviour.
> > > 
> > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/include/asm/alternative.h | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
> > > index 5e5dc05d63a0..67d7cc608336 100644
> > > --- a/arch/arm64/include/asm/alternative.h
> > > +++ b/arch/arm64/include/asm/alternative.h
> > > @@ -111,7 +111,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
> > >  	.byte \alt_len
> > >  .endm
> > >  
> > > -.macro alternative_insn insn1, insn2, cap, enable = 1
> > > +/*
> > > + * Disable the whole block if enable == 0, unless first_insn == 1 in which
> > > + * case insn1 will always be issued but without an alternative insn2.
> > > + */
> > > +.macro alternative_insn insn1, insn2, cap, enable = 1, first_insn = 0
> > >  	.if \enable
> > >  661:	\insn1
> > >  662:	.pushsection .altinstructions, "a"
> > > @@ -122,6 +126,8 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
> > >  664:	.popsection
> > >  	.org	. - (664b-663b) + (662b-661b)
> > >  	.org	. - (662b-661b) + (664b-663b)
> > > +	.elseif \first_insn
> > > +	\insn1
> > 
> > This becomes quite unreadable at the invocation site, especially when
> > invoked as "alternative_insn ..., 1".  "... first_insn=1" is not much
> > better either).
> 
> That I agree.
> 
> The reason I didn't leave the alternative in place here is that if gas
> doesn't support MTE, it will fail to compile. I wanted to avoid the
> several #ifdef's.

We could solve that by synthesising the opcodes instead of relying on
gas (as we do for other extensions).

But I'd agree that's just pushing the problem around rather than solving
it.  It seems dumb to go to that trouble for a case where the affected
insn isn't going to be emitted...


> > I'm struggling to find non-trivial users of this that actually want the
> > whole block to be deleted dependent on the config.
> 
> Some of the errata stuff like CONFIG_ARM64_REPEAT_TLBI ends up with
> unnecessary nops. Similarly for CONFIG_ARM64_UAO/PAN and maybe a few
> others (it's all additional nops). We also have a few errata workaround
> where we didn't bother with the config enable option at all.

OK, looks like I may have missed some cases.  There's a dense thicket of
macros that call each other here, and I've not looked at it for a while ;)

> While this is C code + inline asm, I'd like to have a consistent
> behaviour of ALTERNATIVE between C and .S files. Now, given that some of
> them (like UAO/PAN) are on by default, it probably doesn't make any
> difference if we always keep the first block (non-alternative).
> 
> We could add a new macro ALTERNATIVE_OR_NOP.

alternative_insn doesn't seem exist for C at all.  Did I miss something?


> > Can we instead just always behave as if first_insn=1 instead?  This this
> > works intuitively as an alternative, not the current weird 3-way choice
> > between insn1, insn2 and nothing at all.  The only time that makes sense
> > is when one of the insns is a branch that skips the block, but that's
> > handled via the alternative_if macros instead.
> > 
> > Behaving always like first_insn=1 provides an if-else that is statically
> > optimised if the relevant feature is configured out, which I think is
> > the only thing people are ever going to want.
> > 
> > Maybe something depends on the current behaviour, but I can't see it so
> > far...
> 
> I'll give it a go in v4 and see how it looks.
> 
> Another option would be an alternative_else which takes an enable
> argument.

Sure, I think it could make sense to have a different wrapper so that
the meaning of invocations is clearer for this special case.


For the underlying macro, maybe it would be simpler to make it truly
3-way:

.macro alternative_insn insn_with_cap:req, insn_without_cap:req, cap:req, \
				enable_alternative=1, fallback_insn=
	// ...
	.if (\enable_alternative)
		// as currently
	.else
	\fallback_insn
	.endif
.endm

Then we can rejig the various frontends around that.

If you don't want anything when the alternative is disabled, you just
omit fallback_insn.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 10/23] arm64: mte: Handle synchronous and asynchronous tag check faults
  2020-04-28 13:43     ` Catalin Marinas
@ 2020-04-29 10:26       ` Dave Martin
  0 siblings, 0 replies; 81+ messages in thread
From: Dave Martin @ 2020-04-29 10:26 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arch, Richard Earnshaw, Will Deacon, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Vincenzo Frascino,
	Peter Collingbourne, linux-arm-kernel

On Tue, Apr 28, 2020 at 02:43:01PM +0100, Catalin Marinas wrote:
> On Mon, Apr 27, 2020 at 05:58:22PM +0100, Dave P Martin wrote:
> > On Tue, Apr 21, 2020 at 03:25:50PM +0100, Catalin Marinas wrote:
> > > From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > 
> > > The Memory Tagging Extension has two modes of notifying a tag check
> > > fault at EL0, configurable through the SCTLR_EL1.TCF0 field:
> > > 
> > > 1. Synchronous raising of a Data Abort exception with DFSC 17.
> > > 2. Asynchronous setting of a cumulative bit in TFSRE0_EL1.
> > > 
> > > Add the exception handler for the synchronous exception and handling of
> > > the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in
> > > do_notify_resume().
> > > 
> > > On a tag check failure in user-space, whether synchronous or
> > > asynchronous, a SIGSEGV will be raised on the faulting thread.
> > 
> > Has there been any discussion on whether this should be SIGSEGV or
> > SIGBUS?
> > 
> > Probably neither is much more appropriate than the other.
> 
> You could argue either way. I don't recall a firm conclusion on this, so
> I picked one that follows SPARC ADI.

Agreed, that precedent is good enough for me.  I hadn't refreshed my
memory of how sparc was using these signals.

> 
> > > diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
> > > index 339882db5a91..e377d77c065e 100644
> > > --- a/arch/arm64/kernel/signal.c
> > > +++ b/arch/arm64/kernel/signal.c
> > > @@ -732,6 +732,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
> > >  	regs->regs[29] = (unsigned long)&user->next_frame->fp;
> > >  	regs->pc = (unsigned long)ka->sa.sa_handler;
> > >  
> > > +	/* TCO (Tag Check Override) always cleared for signal handlers */
> > > +	regs->pstate &= ~PSR_TCO_BIT;
> > > +
> > >  	if (ka->sa.sa_flags & SA_RESTORER)
> > >  		sigtramp = ka->sa.sa_restorer;
> > >  	else
> > > @@ -923,6 +926,11 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
> > >  			if (thread_flags & _TIF_UPROBE)
> > >  				uprobe_notify_resume(regs);
> > >  
> > > +			if (thread_flags & _TIF_MTE_ASYNC_FAULT) {
> > > +				clear_thread_flag(TIF_MTE_ASYNC_FAULT);
> > > +				force_signal_inject(SIGSEGV, SEGV_MTEAERR, 0);
> > > +			}
> > > +
> > 
> > Should this definitely be a force_signal_inject()?
> > 
> > SEGV_MTEAERR is not intrinsically fatal: it must be possible to run past
> > the error, because that's the whole point -- chances are we already did.
> > 
> > Compare this with MTESERR where running past the signal would lead to a
> > spin.
> 
> Good point. This can be a send_sig_fault() (I need to check the right
> API).

Sounds fair.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-21 14:26 ` [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass Catalin Marinas
                     ` (3 preceding siblings ...)
  2020-04-28 19:36   ` Catalin Marinas
@ 2020-04-29 10:26   ` Dave Martin
  2020-04-29 13:52     ` Catalin Marinas
  4 siblings, 1 reply; 81+ messages in thread
From: Dave Martin @ 2020-04-29 10:26 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, Peter Collingbourne, linux-mm,
	Alexander Viro, Vincenzo Frascino, Will Deacon

On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> The copy_mount_options() function takes a user pointer argument but not
> a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
> not guaranteed to return all the accessible bytes if, for example, the
> access crosses a page boundary and gets a fault on the second page. To
> work around this, the current copy_mount_options() implementations
> performs to copy_from_user() passes, first to the end of the current

implementation performs two

> page and the second to what's left in the subsequent page.
> 
> Some architectures like arm64 can guarantee an exact copy_from_user()
> depending on the size (since the arch function performs some alignment
> on the source register). Introduce an arch_has_exact_copy_from_user()
> function and allow copy_mount_options() to perform the user access in a
> single pass.
> 
> While this function is not on a critical path, the single-pass behaviour
> is required for arm64 MTE (memory tagging) support where a uaccess can
> trigger intra-page faults (tag not matching). With the current
> implementation, if this happens during the first page, the function will
> return -EFAULT.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Will Deacon <will@kernel.org>
> ---
> 
> Notes:
>     New in v3.
> 
>  arch/arm64/include/asm/uaccess.h | 11 +++++++++++
>  fs/namespace.c                   |  7 +++++--
>  include/linux/uaccess.h          |  8 ++++++++
>  3 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 32fc8061aa76..566da441eba2 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
>  #define INLINE_COPY_TO_USER
>  #define INLINE_COPY_FROM_USER
>  
> +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> +{
> +	/*
> +	 * copy_from_user() aligns the source pointer if the size is greater
> +	 * than 15. Since all the loads are naturally aligned, they can only
> +	 * fail on the first byte.
> +	 */
> +	return n > 15;
> +}
> +#define arch_has_exact_copy_from_user

Did you mean:

#define arch_has_exact_copy_from_user arch_has_exact_copy_from_user

Mind you, if this expands to 1 I'd have expected copy_mount_options()
not to compile, so I may be missing something.

[...]

> diff --git a/fs/namespace.c b/fs/namespace.c
> index a28e4db075ed..8febc50dfc5d 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)

[ Is this applying a band-aid to duct tape?

The fs presumably knows ahead of time whether it's expecting a string or
a fixed-size blob for data, so I'd hope we could just DTRT rather than
playing SEGV roulette here.

This might require more refactoring than makes sense for this series
though. ]

>  	if (!copy)
>  		return ERR_PTR(-ENOMEM);
>  
> -	size = PAGE_SIZE - offset_in_page(data);
> +	size = PAGE_SIZE;
> +	if (!arch_has_exact_copy_from_user(size))
> +		size -= offset_in_page(data);
>  
> -	if (copy_from_user(copy, data, size)) {
> +	if (copy_from_user(copy, data, size) == size) {
>  		kfree(copy);
>  		return ERR_PTR(-EFAULT);
>  	}
>  	if (size != PAGE_SIZE) {
> +		WARN_ON(1);
>  		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
>  			memset(copy + size, 0, PAGE_SIZE - size);
>  	}

[...]

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-04-21 14:25 ` [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
  2020-04-24 23:28   ` Peter Collingbourne
@ 2020-04-29 10:27   ` Kevin Brodsky
  2020-04-29 15:24     ` Catalin Marinas
  2020-04-29 16:46   ` Dave Martin
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 81+ messages in thread
From: Kevin Brodsky @ 2020-04-29 10:27 UTC (permalink / raw)
  To: Catalin Marinas, linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Andrey Konovalov, Peter Collingbourne, linux-mm, linux-arch,
	Alan Hayward, Luis Machado, Omair Javaid

On 21/04/2020 15:25, Catalin Marinas wrote:
> Add support for bulk setting/getting of the MTE tags in a tracee's
> address space at 'addr' in the ptrace() syscall prototype. 'data' points
> to a struct iovec in the tracer's address space with iov_base
> representing the address of a tracer's buffer of length iov_len. The
> tags to be copied to/from the tracer's buffer are stored as one tag per
> byte.
>
> On successfully copying at least one tag, ptrace() returns 0 and updates
> the tracer's iov_len with the number of tags copied. In case of error,
> either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> page.
>
> Note that the tag copying functions are not performance critical,
> therefore they lack optimisations found in typical memory copy routines.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Alan Hayward <Alan.Hayward@arm.com>
> Cc: Luis Machado <luis.machado@linaro.org>
> Cc: Omair Javaid <omair.javaid@linaro.org>
> ---
>
> Notes:
>      New in v3.
>
>   arch/arm64/include/asm/mte.h         |  17 ++++
>   arch/arm64/include/uapi/asm/ptrace.h |   3 +
>   arch/arm64/kernel/mte.c              | 127 +++++++++++++++++++++++++++
>   arch/arm64/kernel/ptrace.c           |  15 +++-
>   arch/arm64/lib/mte.S                 |  50 +++++++++++
>   5 files changed, 211 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index 22eb3e06f311..0ca2aaff07a1 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -2,12 +2,21 @@
>   #ifndef __ASM_MTE_H
>   #define __ASM_MTE_H
>   
> +#define MTE_ALLOC_SIZE	UL(16)
> +#define MTE_ALLOC_MASK	(~(MTE_ALLOC_SIZE - 1))

Nit: maybe MTE_GRANULE_* would be clearer than MTE_ALLOC_*?

> +#define MTE_TAG_SHIFT	(56)
> +#define MTE_TAG_SIZE	(4)
> +
>   #ifndef __ASSEMBLY__
>   
>   #include <linux/sched.h>
>   
>   /* Memory Tagging API */
>   int mte_memcmp_pages(const void *page1_addr, const void *page2_addr);
> +unsigned long mte_copy_tags_from_user(void *to, const void __user *from,
> +				      unsigned long n);
> +unsigned long mte_copy_tags_to_user(void __user *to, void *from,
> +				    unsigned long n);
>   
>   #ifdef CONFIG_ARM64_MTE
>   void flush_mte_state(void);
> @@ -15,6 +24,8 @@ void mte_thread_switch(struct task_struct *next);
>   void mte_suspend_exit(void);
>   long set_mte_ctrl(unsigned long arg);
>   long get_mte_ctrl(void);
> +int mte_ptrace_copy_tags(struct task_struct *child, long request,
> +			 unsigned long addr, unsigned long data);
>   #else
>   static inline void flush_mte_state(void)
>   {
> @@ -33,6 +44,12 @@ static inline long get_mte_ctrl(void)
>   {
>   	return 0;
>   }
> +static inline int mte_ptrace_copy_tags(struct task_struct *child,
> +				       long request, unsigned long addr,
> +				       unsigned long data)
> +{
> +	return -EIO;
> +}
>   #endif
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
> index 1daf6dda8af0..cd2a4a164de3 100644
> --- a/arch/arm64/include/uapi/asm/ptrace.h
> +++ b/arch/arm64/include/uapi/asm/ptrace.h
> @@ -67,6 +67,9 @@
>   /* syscall emulation path in ptrace */
>   #define PTRACE_SYSEMU		  31
>   #define PTRACE_SYSEMU_SINGLESTEP  32
> +/* MTE allocation tag access */
> +#define PTRACE_PEEKMTETAGS	  33
> +#define PTRACE_POKEMTETAGS	  34
>   
>   #ifndef __ASSEMBLY__
>   
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index fa4a4196b248..0cb496ed9bf9 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -3,12 +3,17 @@
>    * Copyright (C) 2020 ARM Ltd.
>    */
>   
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
>   #include <linux/prctl.h>
>   #include <linux/sched.h>
> +#include <linux/sched/mm.h>
>   #include <linux/thread_info.h>
> +#include <linux/uio.h>
>   
>   #include <asm/cpufeature.h>
>   #include <asm/mte.h>
> +#include <asm/ptrace.h>
>   #include <asm/sysreg.h>
>   
>   static void update_sctlr_el1_tcf0(u64 tcf0)
> @@ -133,3 +138,125 @@ long get_mte_ctrl(void)
>   
>   	return ret;
>   }
> +
> +/*
> + * Access MTE tags in another process' address space as given in mm. Update
> + * the number of tags copied. Return 0 if any tags copied, error otherwise.
> + * Inspired by __access_remote_vm().
> + */
> +static int __access_remote_tags(struct task_struct *tsk, struct mm_struct *mm,
> +				unsigned long addr, struct iovec *kiov,
> +				unsigned int gup_flags)
> +{
> +	struct vm_area_struct *vma;
> +	void __user *buf = kiov->iov_base;
> +	size_t len = kiov->iov_len;
> +	int ret;
> +	int write = gup_flags & FOLL_WRITE;
> +
> +	if (down_read_killable(&mm->mmap_sem))
> +		return -EIO;
> +
> +	if (!access_ok(buf, len))
> +		return -EFAULT;
> +
> +	while (len) {
> +		unsigned long tags, offset;
> +		void *maddr;
> +		struct page *page = NULL;
> +
> +		ret = get_user_pages_remote(tsk, mm, addr, 1, gup_flags,
> +					    &page, &vma, NULL);
> +		if (ret <= 0)
> +			break;
> +
> +		/* limit access to the end of the page */
> +		offset = offset_in_page(addr);
> +		tags = min(len, (PAGE_SIZE - offset) / MTE_ALLOC_SIZE);
> +
> +		maddr = page_address(page);
> +		if (write) {
> +			tags = mte_copy_tags_from_user(maddr + offset, buf, tags);
> +			set_page_dirty_lock(page);
> +		} else {
> +			tags = mte_copy_tags_to_user(buf, maddr + offset, tags);
> +		}
> +		put_page(page);
> +
> +		/* error accessing the tracer's buffer */
> +		if (!tags)
> +			break;
> +
> +		len -= tags;
> +		buf += tags;
> +		addr += tags * MTE_ALLOC_SIZE;
> +	}
> +	up_read(&mm->mmap_sem);
> +
> +	/* return an error if no tags copied */
> +	kiov->iov_len = buf - kiov->iov_base;
> +	if (!kiov->iov_len) {
> +		/* check for error accessing the tracee's address space */
> +		if (ret <= 0)
> +			return -EIO;
> +		else
> +			return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Copy MTE tags in another process' address space at 'addr' to/from tracer's
> + * iovec buffer. Return 0 on success. Inspired by ptrace_access_vm().
> + */
> +static int access_remote_tags(struct task_struct *tsk, unsigned long addr,
> +			      struct iovec *kiov, unsigned int gup_flags)
> +{
> +	struct mm_struct *mm;
> +	int ret;
> +
> +	mm = get_task_mm(tsk);
> +	if (!mm)
> +		return -EPERM;
> +
> +	if (!tsk->ptrace || (current != tsk->parent) ||
> +	    ((get_dumpable(mm) != SUID_DUMP_USER) &&
> +	     !ptracer_capable(tsk, mm->user_ns))) {
> +		mmput(mm);
> +		return -EPERM;
> +	}
> +
> +	ret = __access_remote_tags(tsk, mm, addr, kiov, gup_flags);
> +	mmput(mm);
> +
> +	return ret;
> +}
> +
> +int mte_ptrace_copy_tags(struct task_struct *child, long request,
> +			 unsigned long addr, unsigned long data)
> +{
> +	int ret;
> +	struct iovec kiov;
> +	struct iovec __user *uiov = (void __user *)data;
> +	unsigned int gup_flags = FOLL_FORCE;
> +
> +	if (!system_supports_mte())
> +		return -EIO;
> +
> +	if (get_user(kiov.iov_base, &uiov->iov_base) ||
> +	    get_user(kiov.iov_len, &uiov->iov_len))
> +		return -EFAULT;
> +
> +	if (request == PTRACE_POKEMTETAGS)
> +		gup_flags |= FOLL_WRITE;
> +
> +	/* align addr to the MTE tag granule */
> +	addr &= MTE_ALLOC_MASK;
> +
> +	ret = access_remote_tags(child, addr, &kiov, gup_flags);
> +	if (!ret)
> +		ret = __put_user(kiov.iov_len, &uiov->iov_len);
> +
> +	return ret;
> +}
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 077e352495eb..1fdb841ad536 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -34,6 +34,7 @@
>   #include <asm/cpufeature.h>
>   #include <asm/debug-monitors.h>
>   #include <asm/fpsimd.h>
> +#include <asm/mte.h>
>   #include <asm/pgtable.h>
>   #include <asm/pointer_auth.h>
>   #include <asm/stacktrace.h>
> @@ -1797,7 +1798,19 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
>   long arch_ptrace(struct task_struct *child, long request,
>   		 unsigned long addr, unsigned long data)
>   {
> -	return ptrace_request(child, request, addr, data);
> +	int ret;
> +
> +	switch (request) {
> +	case PTRACE_PEEKMTETAGS:
> +	case PTRACE_POKEMTETAGS:
> +		ret = mte_ptrace_copy_tags(child, request, addr, data);
> +		break;
> +	default:
> +		ret = ptrace_request(child, request, addr, data);
> +		break;
> +	}
> +
> +	return ret;
>   }
>   
>   enum ptrace_syscall_dir {
> diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
> index bd51ea7e2fcb..45be04a8c73c 100644
> --- a/arch/arm64/lib/mte.S
> +++ b/arch/arm64/lib/mte.S
> @@ -5,6 +5,7 @@
>   #include <linux/linkage.h>
>   
>   #include <asm/assembler.h>
> +#include <asm/mte.h>
>   
>   /*
>    * Compare tags of two pages
> @@ -44,3 +45,52 @@ SYM_FUNC_START(mte_memcmp_pages)
>   
>   	ret
>   SYM_FUNC_END(mte_memcmp_pages)
> +
> +/*
> + * Read tags from a user buffer (one tag per byte) and set the corresponding
> + * tags at the given kernel address. Used by PTRACE_POKEMTETAGS.
> + *   x0 - kernel address (to)
> + *   x1 - user buffer (from)
> + *   x2 - number of tags/bytes (n)
> + * Returns:
> + *   x0 - number of tags read/set
> + */
> +SYM_FUNC_START(mte_copy_tags_from_user)
> +	mov	x3, x1
> +1:
> +USER(2f, ldtrb	w4, [x1])

Here we are making either of the following assumptions:
1. The __user pointer (here `from`) actually points to user memory, not kernel memory 
(and we have set_fs(USER_DS) in place).
2. CONFIG_ARM64_UAO is enabled and the hardware implements UAO.

1. is currently true because these functions are only used for the new ptrace 
requests, which indeed pass pointers to user memory. However, future users of these 
functions may not know about this requirement.
2. is not necessarily true because ARM64_MTE does not depend on ARM64_UAO.

It is unlikely that future users of these functions actually need to pass __user 
pointers to kernel memory, so adding a comment spelling out the first assumption is 
probably fine.

Kevin

> +	lsl	x4, x4, #MTE_TAG_SHIFT
> +	stg	x4, [x0], #MTE_ALLOC_SIZE
> +	add	x1, x1, #1
> +	subs	x2, x2, #1
> +	b.ne	1b
> +
> +	// exception handling and function return
> +2:	sub	x0, x1, x3		// update the number of tags set
> +	ret
> +SYM_FUNC_END(mte_copy_tags_from_user)
> +
> +/*
> + * Get the tags from a kernel address range and write the tag values to the
> + * given user buffer (one tag per byte). Used by PTRACE_PEEKMTETAGS.
> + *   x0 - user buffer (to)
> + *   x1 - kernel address (from)
> + *   x2 - number of tags/bytes (n)
> + * Returns:
> + *   x0 - number of tags read/set
> + */
> +SYM_FUNC_START(mte_copy_tags_to_user)
> +	mov	x3, x0
> +1:
> +	ldg	x4, [x1]
> +	ubfx	x4, x4, #MTE_TAG_SHIFT, #MTE_TAG_SIZE
> +USER(2f, sttrb	w4, [x0])
> +	add	x0, x0, #1
> +	add	x1, x1, #MTE_ALLOC_SIZE
> +	subs	x2, x2, #1
> +	b.ne	1b
> +
> +	// exception handling and function return
> +2:	sub	x0, x0, x3		// update the number of tags copied
> +	ret
> +SYM_FUNC_END(mte_copy_tags_from_user)



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-28 14:06     ` Catalin Marinas
@ 2020-04-29 10:28       ` Dave Martin
  0 siblings, 0 replies; 81+ messages in thread
From: Dave Martin @ 2020-04-29 10:28 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arch, Richard Earnshaw, Will Deacon, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Alexander Viro,
	Vincenzo Frascino, Peter Collingbourne, linux-arm-kernel

On Tue, Apr 28, 2020 at 03:06:27PM +0100, Catalin Marinas wrote:
> On Mon, Apr 27, 2020 at 05:56:42PM +0100, Dave P Martin wrote:
> > On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> > > The copy_mount_options() function takes a user pointer argument but not
> > > a size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
> > > not guaranteed to return all the accessible bytes if, for example, the
> > > access crosses a page boundary and gets a fault on the second page. To
> > > work around this, the current copy_mount_options() implementations
> > > performs to copy_from_user() passes, first to the end of the current
> > > page and the second to what's left in the subsequent page.
> > > 
> > > Some architectures like arm64 can guarantee an exact copy_from_user()
> > > depending on the size (since the arch function performs some alignment
> > > on the source register). Introduce an arch_has_exact_copy_from_user()
> > > function and allow copy_mount_options() to perform the user access in a
> > > single pass.
> > > 
> > > While this function is not on a critical path, the single-pass behaviour
> > > is required for arm64 MTE (memory tagging) support where a uaccess can
> > > trigger intra-page faults (tag not matching). With the current
> > > implementation, if this happens during the first page, the function will
> > > return -EFAULT.
> > 
> > Do you know how much extra overhead we'd incur if we read at must one
> > tag granule at a time, instead of PAGE_SIZE?
> 
> Our copy routines already read 16 bytes at a time, so that's the tag
> granule. With current copy_mount_options() we have the issue that it
> assumes a fault in the first page is fatal.
> 
> Even if we change it to a loop of smaller uaccess, we still have the
> issue of unaligned accesses which can fail without reading all that's
> possible (i.e. the access goes across a tag granule boundary).
> 
> The previous copy_mount_options() implementation (from couple of months
> ago I think) had a fallback to byte-by-byte, didn't have this issue.
> 
> > I'm guessing that in practice strcpy_from_user() type operations copy
> > much less than a page most of the time, so what we lose in uaccess
> > overheads we _might_ regain in less redundant copying.
> 
> strncpy_from_user() has a fallback to byte by byte, so we don't have an
> issue here.
> 
> The above is only for synchronous accesses. For async, in v3 I disabled
> such checks for the uaccess routines.

Fair enough, I hadn't fully got my head around what's going on here.

(But see my other reply.)


I was suspicious about the WARN_ON(), but I see people are on top of
that.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-28 18:16   ` Kevin Brodsky
  2020-04-28 19:40     ` Catalin Marinas
@ 2020-04-29 11:58     ` Catalin Marinas
  1 sibling, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-29 11:58 UTC (permalink / raw)
  To: Kevin Brodsky
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Andrey Konovalov, Peter Collingbourne,
	linux-mm, linux-arch, Alexander Viro

On Tue, Apr 28, 2020 at 07:16:29PM +0100, Kevin Brodsky wrote:
> On 21/04/2020 15:26, Catalin Marinas wrote:
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index a28e4db075ed..8febc50dfc5d 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
> >   	if (!copy)
> >   		return ERR_PTR(-ENOMEM);
> > -	size = PAGE_SIZE - offset_in_page(data);
> > +	size = PAGE_SIZE;
> > +	if (!arch_has_exact_copy_from_user(size))
> > +		size -= offset_in_page(data);
> > -	if (copy_from_user(copy, data, size)) {
> > +	if (copy_from_user(copy, data, size) == size) {
> >   		kfree(copy);
> >   		return ERR_PTR(-EFAULT);
> >   	}
> >   	if (size != PAGE_SIZE) {
> > +		WARN_ON(1);
> 
> I'm not sure I understand the rationale here. If we don't have exact
> copy_from_user for size, then we will attempt to copy up to the end of the
> page. Assuming this doesn't fault, we then want to carry on copying from the
> start of the next page, until we reach a total size of up to 4K. Why would
> we warn in that case? AIUI, if you don't have exact copy_from_user, there
> are 3 cases:
> 1. copy_from_user() returns size, we bail out.
> 2. copy_from_user() returns 0, we carry on copying from the next page.
> 3. copy_from_user() returns anything else, we return immediately.
> 
> I think you're not handling case 3 here.

(3) is still handled as (2) since the only check we have is whether
copy_from_user() returned size. Since size is not updated, it falls
through the second if block (where WARN_ON should have disappeared).

Thinking some more about this, I think it can be simplified without
adding arch_has_exact_copy_from_user(). We do have to guarantee on arm64
that a copy_from_user() to the end of a page (4K aligned, hence tag
granule aligned) is exact but that's just matching the current
semantics.

What about this new patch below, replacing the current one:

-------------8<-------------------------------
From cf9a1c9668ce77af3ef6589ee8038e91df127dab Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Wed, 15 Apr 2020 18:45:44 +0100
Subject: [PATCH] fs: Handle intra-page faults in copy_mount_options()

The copy_mount_options() function takes a user pointer argument but no
size. It tries to read up to a PAGE_SIZE. However, copy_from_user() is
not guaranteed to return all the accessible bytes if, for example, the
access crosses a page boundary and gets a fault on the second page. To
work around this, the current copy_mount_options() implementation
performs two copy_from_user() passes, first to the end of the current
page and the second to what's left in the subsequent page.

On arm64 with MTE enabled, access to a user page may trigger a fault
after part of the buffer has been copied (when the user pointer tag,
bits 56-59, no longer matches the allocation tag stored in memory).
Allow copy_mount_options() to handle such case by only returning -EFAULT
if the first copy_from_user() has not copied any bytes.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Will Deacon <will@kernel.org>
---
 fs/namespace.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a28e4db075ed..51eecbd8ea89 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3016,7 +3016,7 @@ static void shrink_submounts(struct mount *mnt)
 void *copy_mount_options(const void __user * data)
 {
 	char *copy;
-	unsigned size;
+	unsigned size, left;
 
 	if (!data)
 		return NULL;
@@ -3027,11 +3027,22 @@ void *copy_mount_options(const void __user * data)
 
 	size = PAGE_SIZE - offset_in_page(data);
 
-	if (copy_from_user(copy, data, size)) {
+	/*
+	 * Attempt to copy to the end of the first user page. On success,
+	 * left == 0, copy the rest from the second user page (if it is
+	 * accessible).
+	 *
+	 * On architectures with intra-page faults (arm64 with MTE), the read
+	 * from the first page may fail after copying part of the user data
+	 * (left > 0 && left < size). Do not attempt the second copy in this
+	 * case as the end of the valid user buffer has already been reached.
+	 */
+	left = copy_from_user(copy, data, size);
+	if (left == size) {
 		kfree(copy);
 		return ERR_PTR(-EFAULT);
 	}
-	if (size != PAGE_SIZE) {
+	if (left == 0 && size != PAGE_SIZE) {
 		if (copy_from_user(copy + size, data + size, PAGE_SIZE - size))
 			memset(copy + size, 0, PAGE_SIZE - size);
 	}


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-29 10:26   ` Dave Martin
@ 2020-04-29 13:52     ` Catalin Marinas
  2020-05-04 16:40       ` Dave Martin
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-29 13:52 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, Peter Collingbourne, linux-mm,
	Alexander Viro, Vincenzo Frascino, Will Deacon

On Wed, Apr 29, 2020 at 11:26:51AM +0100, Dave P Martin wrote:
> On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > index 32fc8061aa76..566da441eba2 100644
> > --- a/arch/arm64/include/asm/uaccess.h
> > +++ b/arch/arm64/include/asm/uaccess.h
> > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
> >  #define INLINE_COPY_TO_USER
> >  #define INLINE_COPY_FROM_USER
> >  
> > +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> > +{
> > +	/*
> > +	 * copy_from_user() aligns the source pointer if the size is greater
> > +	 * than 15. Since all the loads are naturally aligned, they can only
> > +	 * fail on the first byte.
> > +	 */
> > +	return n > 15;
> > +}
> > +#define arch_has_exact_copy_from_user
> 
> Did you mean:
> 
> #define arch_has_exact_copy_from_user arch_has_exact_copy_from_user

Yes (and I shouldn't write patches late in the day).

> Mind you, if this expands to 1 I'd have expected copy_mount_options()
> not to compile, so I may be missing something.

I think arch_has_exact_copy_from_user() (with the braces) is looked up
in the function namespace, so the macro isn't expanded. So arguably the
patch is correct but pretty dodgy ;).

I scrapped this in my second attempt in reply to Kevin.

> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index a28e4db075ed..8febc50dfc5d 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
> 
> [ Is this applying a band-aid to duct tape?
> 
> The fs presumably knows ahead of time whether it's expecting a string or
> a fixed-size blob for data, so I'd hope we could just DTRT rather than
> playing SEGV roulette here.
> 
> This might require more refactoring than makes sense for this series
> though. ]

That's possible but it means moving the copy from sys_mount() to the
specific places where it has additional information (the filesystems).
I'm not even sure it's guaranteed to be strings. If it is, we could just
replace all this with a strncpy_from_user().

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 01/23] arm64: alternative: Allow alternative_insn to always issue the first instruction
  2020-04-29 10:26       ` Dave Martin
@ 2020-04-29 14:04         ` Catalin Marinas
  0 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-29 14:04 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arch, Richard Earnshaw, Will Deacon, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Vincenzo Frascino,
	Peter Collingbourne, linux-arm-kernel

On Wed, Apr 29, 2020 at 11:26:00AM +0100, Dave P Martin wrote:
> On Tue, Apr 28, 2020 at 12:43:54PM +0100, Catalin Marinas wrote:
> > On Mon, Apr 27, 2020 at 05:57:37PM +0100, Dave P Martin wrote:
> > > On Tue, Apr 21, 2020 at 03:25:41PM +0100, Catalin Marinas wrote:
> > > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
> > > > index 5e5dc05d63a0..67d7cc608336 100644
> > > > --- a/arch/arm64/include/asm/alternative.h
> > > > +++ b/arch/arm64/include/asm/alternative.h
> > > > @@ -111,7 +111,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
> > > >  	.byte \alt_len
> > > >  .endm
> > > >  
> > > > -.macro alternative_insn insn1, insn2, cap, enable = 1
> > > > +/*
> > > > + * Disable the whole block if enable == 0, unless first_insn == 1 in which
> > > > + * case insn1 will always be issued but without an alternative insn2.
> > > > + */
> > > > +.macro alternative_insn insn1, insn2, cap, enable = 1, first_insn = 0
> > > >  	.if \enable
> > > >  661:	\insn1
> > > >  662:	.pushsection .altinstructions, "a"
> > > > @@ -122,6 +126,8 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
> > > >  664:	.popsection
> > > >  	.org	. - (664b-663b) + (662b-661b)
> > > >  	.org	. - (662b-661b) + (664b-663b)
> > > > +	.elseif \first_insn
> > > > +	\insn1
> > > 
> > > This becomes quite unreadable at the invocation site, especially when
> > > invoked as "alternative_insn ..., 1".  "... first_insn=1" is not much
> > > better either).
> > 
> > That I agree.
> > 
> > The reason I didn't leave the alternative in place here is that if gas
> > doesn't support MTE, it will fail to compile. I wanted to avoid the
> > several #ifdef's.
> 
> We could solve that by synthesising the opcodes instead of relying on
> gas (as we do for other extensions).

While in this particular case the instruction takes only one register,
we need gas with MTE support anyway for more complex instructions in the
other .S files. I don't think it's worth the effort of writing our own
assembler in the kernel as macros.

> > While this is C code + inline asm, I'd like to have a consistent
> > behaviour of ALTERNATIVE between C and .S files. Now, given that some of
> > them (like UAO/PAN) are on by default, it probably doesn't make any
> > difference if we always keep the first block (non-alternative).
> > 
> > We could add a new macro ALTERNATIVE_OR_NOP.
> 
> alternative_insn doesn't seem exist for C at all.  Did I miss something?

There is ALTERNATIVE() which is defined for both C and asm (the latter
ends up using alternative_insn).

> > > Can we instead just always behave as if first_insn=1 instead?  This this
> > > works intuitively as an alternative, not the current weird 3-way choice
> > > between insn1, insn2 and nothing at all.  The only time that makes sense
> > > is when one of the insns is a branch that skips the block, but that's
> > > handled via the alternative_if macros instead.
> > > 
> > > Behaving always like first_insn=1 provides an if-else that is statically
> > > optimised if the relevant feature is configured out, which I think is
> > > the only thing people are ever going to want.
> > > 
> > > Maybe something depends on the current behaviour, but I can't see it so
> > > far...
> > 
> > I'll give it a go in v4 and see how it looks.
> > 
> > Another option would be an alternative_else which takes an enable
> > argument.
> 
> Sure, I think it could make sense to have a different wrapper so that
> the meaning of invocations is clearer for this special case.
> 
> 
> For the underlying macro, maybe it would be simpler to make it truly
> 3-way:
> 
> .macro alternative_insn insn_with_cap:req, insn_without_cap:req, cap:req, \
> 				enable_alternative=1, fallback_insn=

'fallback' is an option as well.

See below for what it takes to always emit the first instruction in the
alternative blocks (replacing this patch). The clear_page() zeroing line
would become:

ALTERNATIVE("dc zva, x0", "stzgm xzr, [x0]", ARM64_MTE, CONFIG_ARM64_MTE)

(or alternative_insn, the above save an IS_ENABLED).

--------8<------------------------
From 73f3869cb68fab1505d7b400ae8a39a19c5fc9e9 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Wed, 27 Nov 2019 09:07:30 +0000
Subject: [PATCH] arm64: alternative: Always emit the first instruction in
 ALTERNATIVE blocks

Currently with the ALTERNATIVE macro or alternative_insn, the cfg (or
enable) arguments disable the entire asm block. Change the macros to
only omit the alternative block on !IS_ENABLED(cfg). In addition, remove
the cfg arguments to to ALTERNATIVE in those few calls where it is still
passed. There is no change to the resulting kernel image with defconfig.

alternative_insn's enable argument will be used in a subsequent patch
and we are keeping the ALTERNATIVE C macro arguments in line with the
asm version.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/alternative.h | 13 ++++++++-----
 arch/arm64/include/asm/tlbflush.h    |  6 ++----
 arch/arm64/include/asm/uaccess.h     | 15 +++++----------
 arch/arm64/kvm/hyp/entry.S           |  2 +-
 4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 5e5dc05d63a0..ecb44cb0d6b1 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -66,9 +66,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
  * Alternatives with callbacks do not generate replacement instructions.
  */
 #define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled)	\
-	".if "__stringify(cfg_enabled)" == 1\n"				\
 	"661:\n\t"							\
 	oldinstr "\n"							\
+	".if "__stringify(cfg_enabled)" == 1\n"				\
 	"662:\n"							\
 	".pushsection .altinstructions,\"a\"\n"				\
 	ALTINSTR_ENTRY(feature)						\
@@ -83,9 +83,9 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
 	".endif\n"
 
 #define __ALTERNATIVE_CFG_CB(oldinstr, feature, cfg_enabled, cb)	\
-	".if "__stringify(cfg_enabled)" == 1\n"				\
 	"661:\n\t"							\
 	oldinstr "\n"							\
+	".if "__stringify(cfg_enabled)" == 1\n"				\
 	"662:\n"							\
 	".pushsection .altinstructions,\"a\"\n"				\
 	ALTINSTR_ENTRY_CB(feature, cb)					\
@@ -111,9 +111,12 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
 	.byte \alt_len
 .endm
 
+/*
+ * If enable == 0, the alternative block will be omitted.
+ */
 .macro alternative_insn insn1, insn2, cap, enable = 1
-	.if \enable
 661:	\insn1
+	.if \enable
 662:	.pushsection .altinstructions, "a"
 	altinstruction_entry 661b, 663f, \cap, 662b-661b, 664f-663f
 	.popsection
@@ -289,8 +292,8 @@ alternative_endif
  * Usage: asm(ALTERNATIVE(oldinstr, newinstr, feature));
  *
  * Usage: asm(ALTERNATIVE(oldinstr, newinstr, feature, CONFIG_FOO));
- * N.B. If CONFIG_FOO is specified, but not selected, the whole block
- *      will be omitted, including oldinstr.
+ * N.B. If CONFIG_FOO is specified, but not selected, the alternative block
+ *      will be omitted.
  */
 #define ALTERNATIVE(oldinstr, newinstr, ...)   \
 	_ALTERNATIVE_CFG(oldinstr, newinstr, __VA_ARGS__, 1)
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc3949064725..8c79f12900ce 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -30,15 +30,13 @@
 #define __TLBI_0(op, arg) asm ("tlbi " #op "\n"				       \
 		   ALTERNATIVE("nop\n			nop",		       \
 			       "dsb ish\n		tlbi " #op,	       \
-			       ARM64_WORKAROUND_REPEAT_TLBI,		       \
-			       CONFIG_ARM64_WORKAROUND_REPEAT_TLBI)	       \
+			       ARM64_WORKAROUND_REPEAT_TLBI)		       \
 			    : : )
 
 #define __TLBI_1(op, arg) asm ("tlbi " #op ", %0\n"			       \
 		   ALTERNATIVE("nop\n			nop",		       \
 			       "dsb ish\n		tlbi " #op ", %0",     \
-			       ARM64_WORKAROUND_REPEAT_TLBI,		       \
-			       CONFIG_ARM64_WORKAROUND_REPEAT_TLBI)	       \
+			       ARM64_WORKAROUND_REPEAT_TLBI)		       \
 			    : : "r" (arg))
 
 #define __TLBI_N(op, arg, n, ...) __TLBI_##n(op, arg)
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 32fc8061aa76..d1812cdaab01 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -45,8 +45,7 @@ static inline void set_fs(mm_segment_t fs)
 	if (IS_ENABLED(CONFIG_ARM64_UAO) && fs == KERNEL_DS)
 		asm(ALTERNATIVE("nop", SET_PSTATE_UAO(1), ARM64_HAS_UAO));
 	else
-		asm(ALTERNATIVE("nop", SET_PSTATE_UAO(0), ARM64_HAS_UAO,
-				CONFIG_ARM64_UAO));
+		asm(ALTERNATIVE("nop", SET_PSTATE_UAO(0), ARM64_HAS_UAO));
 }
 
 #define segment_eq(a, b)	((a) == (b))
@@ -175,28 +174,24 @@ static inline bool uaccess_ttbr0_enable(void)
 
 static inline void __uaccess_disable_hw_pan(void)
 {
-	asm(ALTERNATIVE("nop", SET_PSTATE_PAN(0), ARM64_HAS_PAN,
-			CONFIG_ARM64_PAN));
+	asm(ALTERNATIVE("nop", SET_PSTATE_PAN(0), ARM64_HAS_PAN));
 }
 
 static inline void __uaccess_enable_hw_pan(void)
 {
-	asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN,
-			CONFIG_ARM64_PAN));
+	asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN));
 }
 
 #define __uaccess_disable(alt)						\
 do {									\
 	if (!uaccess_ttbr0_disable())					\
-		asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), alt,		\
-				CONFIG_ARM64_PAN));			\
+		asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), alt));	\
 } while (0)
 
 #define __uaccess_enable(alt)						\
 do {									\
 	if (!uaccess_ttbr0_enable())					\
-		asm(ALTERNATIVE("nop", SET_PSTATE_PAN(0), alt,		\
-				CONFIG_ARM64_PAN));			\
+		asm(ALTERNATIVE("nop", SET_PSTATE_PAN(0), alt));	\
 } while (0)
 
 static inline void uaccess_disable(void)
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index d22d0534dd60..88b096c18223 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -109,7 +109,7 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
 
 	add	x1, x1, #VCPU_CONTEXT
 
-	ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
+	ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN)
 
 	// Store the guest regs x2 and x3
 	stp	x2, x3,   [x1, #CPU_XREG_OFFSET(2)]


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-04-29 10:27   ` Kevin Brodsky
@ 2020-04-29 15:24     ` Catalin Marinas
  0 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-04-29 15:24 UTC (permalink / raw)
  To: Kevin Brodsky
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Andrey Konovalov, Peter Collingbourne,
	linux-mm, linux-arch, Alan Hayward, Luis Machado, Omair Javaid

On Wed, Apr 29, 2020 at 11:27:10AM +0100, Kevin Brodsky wrote:
> On 21/04/2020 15:25, Catalin Marinas wrote:
> > diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
> > index bd51ea7e2fcb..45be04a8c73c 100644
> > --- a/arch/arm64/lib/mte.S
> > +++ b/arch/arm64/lib/mte.S
> > @@ -5,6 +5,7 @@
> >   #include <linux/linkage.h>
> >   #include <asm/assembler.h>
> > +#include <asm/mte.h>
> >   /*
> >    * Compare tags of two pages
> > @@ -44,3 +45,52 @@ SYM_FUNC_START(mte_memcmp_pages)
> >   	ret
> >   SYM_FUNC_END(mte_memcmp_pages)
> > +
> > +/*
> > + * Read tags from a user buffer (one tag per byte) and set the corresponding
> > + * tags at the given kernel address. Used by PTRACE_POKEMTETAGS.
> > + *   x0 - kernel address (to)
> > + *   x1 - user buffer (from)
> > + *   x2 - number of tags/bytes (n)
> > + * Returns:
> > + *   x0 - number of tags read/set
> > + */
> > +SYM_FUNC_START(mte_copy_tags_from_user)
> > +	mov	x3, x1
> > +1:
> > +USER(2f, ldtrb	w4, [x1])
> 
> Here we are making either of the following assumptions:
> 1. The __user pointer (here `from`) actually points to user memory, not
> kernel memory (and we have set_fs(USER_DS) in place).
> 2. CONFIG_ARM64_UAO is enabled and the hardware implements UAO.
> 
> 1. is currently true because these functions are only used for the new
> ptrace requests, which indeed pass pointers to user memory. However, future
> users of these functions may not know about this requirement.
> 2. is not necessarily true because ARM64_MTE does not depend on ARM64_UAO.
> 
> It is unlikely that future users of these functions actually need to pass
> __user pointers to kernel memory, so adding a comment spelling out the first
> assumption is probably fine.

I found it easier to add uao_user_alternative rather than writing a
comment ;).

Thanks.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-04-21 14:25 ` [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
  2020-04-24 23:28   ` Peter Collingbourne
  2020-04-29 10:27   ` Kevin Brodsky
@ 2020-04-29 16:46   ` Dave Martin
  2020-04-30 10:21     ` Catalin Marinas
  2020-05-05 18:03   ` Luis Machado
  2020-05-12 19:05   ` Luis Machado
  4 siblings, 1 reply; 81+ messages in thread
From: Dave Martin @ 2020-04-29 16:46 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Luis Machado,
	Omair Javaid, Szabolcs Nagy, Andrey Konovalov, Kevin Brodsky,
	Peter Collingbourne, linux-mm, Alan Hayward, Vincenzo Frascino,
	Will Deacon

On Tue, Apr 21, 2020 at 03:25:59PM +0100, Catalin Marinas wrote:
> Add support for bulk setting/getting of the MTE tags in a tracee's
> address space at 'addr' in the ptrace() syscall prototype. 'data' points
> to a struct iovec in the tracer's address space with iov_base
> representing the address of a tracer's buffer of length iov_len. The
> tags to be copied to/from the tracer's buffer are stored as one tag per
> byte.
> 
> On successfully copying at least one tag, ptrace() returns 0 and updates
> the tracer's iov_len with the number of tags copied. In case of error,
> either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> page.
> 
> Note that the tag copying functions are not performance critical,
> therefore they lack optimisations found in typical memory copy routines.

Doesn't quite belong here, but:

Can we dump the tags and possible the faulting mode etc. when dumping
core?

That information seems potentially valuable for debugging.
Tweaking the fault mode from a debugger may also be useful (which is
quite easy to achieve if coredump support is done by wrapping the MTE
control word in a regset).

These could probably be added later, though.


> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Alan Hayward <Alan.Hayward@arm.com>
> Cc: Luis Machado <luis.machado@linaro.org>
> Cc: Omair Javaid <omair.javaid@linaro.org>
> ---
> 
> Notes:
>     New in v3.
> 
>  arch/arm64/include/asm/mte.h         |  17 ++++
>  arch/arm64/include/uapi/asm/ptrace.h |   3 +
>  arch/arm64/kernel/mte.c              | 127 +++++++++++++++++++++++++++
>  arch/arm64/kernel/ptrace.c           |  15 +++-
>  arch/arm64/lib/mte.S                 |  50 +++++++++++
>  5 files changed, 211 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index 22eb3e06f311..0ca2aaff07a1 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -2,12 +2,21 @@
>  #ifndef __ASM_MTE_H
>  #define __ASM_MTE_H
>  
> +#define MTE_ALLOC_SIZE	UL(16)
> +#define MTE_ALLOC_MASK	(~(MTE_ALLOC_SIZE - 1))
> +#define MTE_TAG_SHIFT	(56)
> +#define MTE_TAG_SIZE	(4)
> +

Nit: pointless () on the last two #defines.

[...]

> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index fa4a4196b248..0cb496ed9bf9 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -3,12 +3,17 @@
>   * Copyright (C) 2020 ARM Ltd.
>   */
>  
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
>  #include <linux/prctl.h>
>  #include <linux/sched.h>
> +#include <linux/sched/mm.h>
>  #include <linux/thread_info.h>
> +#include <linux/uio.h>
>  
>  #include <asm/cpufeature.h>
>  #include <asm/mte.h>
> +#include <asm/ptrace.h>
>  #include <asm/sysreg.h>
>  
>  static void update_sctlr_el1_tcf0(u64 tcf0)
> @@ -133,3 +138,125 @@ long get_mte_ctrl(void)
>  
>  	return ret;
>  }
> +
> +/*
> + * Access MTE tags in another process' address space as given in mm. Update
> + * the number of tags copied. Return 0 if any tags copied, error otherwise.
> + * Inspired by __access_remote_vm().
> + */
> +static int __access_remote_tags(struct task_struct *tsk, struct mm_struct *mm,
> +				unsigned long addr, struct iovec *kiov,
> +				unsigned int gup_flags)
> +{
> +	struct vm_area_struct *vma;
> +	void __user *buf = kiov->iov_base;
> +	size_t len = kiov->iov_len;
> +	int ret;
> +	int write = gup_flags & FOLL_WRITE;
> +
> +	if (down_read_killable(&mm->mmap_sem))
> +		return -EIO;
> +
> +	if (!access_ok(buf, len))
> +		return -EFAULT;

Leaked down_read()?

> +
> +	while (len) {
> +		unsigned long tags, offset;
> +		void *maddr;
> +		struct page *page = NULL;
> +
> +		ret = get_user_pages_remote(tsk, mm, addr, 1, gup_flags,
> +					    &page, &vma, NULL);
> +		if (ret <= 0)
> +			break;
> +
> +		/* limit access to the end of the page */
> +		offset = offset_in_page(addr);
> +		tags = min(len, (PAGE_SIZE - offset) / MTE_ALLOC_SIZE);
> +
> +		maddr = page_address(page);
> +		if (write) {
> +			tags = mte_copy_tags_from_user(maddr + offset, buf, tags);
> +			set_page_dirty_lock(page);
> +		} else {
> +			tags = mte_copy_tags_to_user(buf, maddr + offset, tags);
> +		}
> +		put_page(page);
> +
> +		/* error accessing the tracer's buffer */
> +		if (!tags)
> +			break;
> +
> +		len -= tags;
> +		buf += tags;
> +		addr += tags * MTE_ALLOC_SIZE;
> +	}
> +	up_read(&mm->mmap_sem);
> +
> +	/* return an error if no tags copied */
> +	kiov->iov_len = buf - kiov->iov_base;
> +	if (!kiov->iov_len) {
> +		/* check for error accessing the tracee's address space */
> +		if (ret <= 0)
> +			return -EIO;
> +		else
> +			return -EFAULT;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Copy MTE tags in another process' address space at 'addr' to/from tracer's
> + * iovec buffer. Return 0 on success. Inspired by ptrace_access_vm().
> + */
> +static int access_remote_tags(struct task_struct *tsk, unsigned long addr,
> +			      struct iovec *kiov, unsigned int gup_flags)
> +{
> +	struct mm_struct *mm;
> +	int ret;
> +
> +	mm = get_task_mm(tsk);
> +	if (!mm)
> +		return -EPERM;
> +
> +	if (!tsk->ptrace || (current != tsk->parent) ||
> +	    ((get_dumpable(mm) != SUID_DUMP_USER) &&
> +	     !ptracer_capable(tsk, mm->user_ns))) {
> +		mmput(mm);
> +		return -EPERM;
> +	}
> +
> +	ret = __access_remote_tags(tsk, mm, addr, kiov, gup_flags);
> +	mmput(mm);
> +
> +	return ret;
> +}
> +
> +int mte_ptrace_copy_tags(struct task_struct *child, long request,
> +			 unsigned long addr, unsigned long data)
> +{
> +	int ret;
> +	struct iovec kiov;
> +	struct iovec __user *uiov = (void __user *)data;
> +	unsigned int gup_flags = FOLL_FORCE;
> +
> +	if (!system_supports_mte())
> +		return -EIO;
> +
> +	if (get_user(kiov.iov_base, &uiov->iov_base) ||
> +	    get_user(kiov.iov_len, &uiov->iov_len))
> +		return -EFAULT;
> +
> +	if (request == PTRACE_POKEMTETAGS)
> +		gup_flags |= FOLL_WRITE;
> +
> +	/* align addr to the MTE tag granule */
> +	addr &= MTE_ALLOC_MASK;
> +
> +	ret = access_remote_tags(child, addr, &kiov, gup_flags);
> +	if (!ret)
> +		ret = __put_user(kiov.iov_len, &uiov->iov_len);

Should this be put_user()?  We didn't use __get_user() above, and I
don't see what guards the access.

> +
> +	return ret;
> +}
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 077e352495eb..1fdb841ad536 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -34,6 +34,7 @@
>  #include <asm/cpufeature.h>
>  #include <asm/debug-monitors.h>
>  #include <asm/fpsimd.h>
> +#include <asm/mte.h>
>  #include <asm/pgtable.h>
>  #include <asm/pointer_auth.h>
>  #include <asm/stacktrace.h>
> @@ -1797,7 +1798,19 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
>  long arch_ptrace(struct task_struct *child, long request,
>  		 unsigned long addr, unsigned long data)
>  {
> -	return ptrace_request(child, request, addr, data);
> +	int ret;
> +
> +	switch (request) {
> +	case PTRACE_PEEKMTETAGS:
> +	case PTRACE_POKEMTETAGS:
> +		ret = mte_ptrace_copy_tags(child, request, addr, data);
> +		break;

Nit: return mte_trace_copy_tags()?

This is a new function, so we don't need to follow the verbose style of
the core code.  Not everyone likes returning out of switches though.

> +	default:
> +		ret = ptrace_request(child, request, addr, data);
> +		break;
> +	}
> +
> +	return ret;
>  }
>  
>  enum ptrace_syscall_dir {
> diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
> index bd51ea7e2fcb..45be04a8c73c 100644
> --- a/arch/arm64/lib/mte.S
> +++ b/arch/arm64/lib/mte.S
> @@ -5,6 +5,7 @@
>  #include <linux/linkage.h>
>  
>  #include <asm/assembler.h>
> +#include <asm/mte.h>
>  
>  /*
>   * Compare tags of two pages
> @@ -44,3 +45,52 @@ SYM_FUNC_START(mte_memcmp_pages)
>  
>  	ret
>  SYM_FUNC_END(mte_memcmp_pages)
> +
> +/*
> + * Read tags from a user buffer (one tag per byte) and set the corresponding
> + * tags at the given kernel address. Used by PTRACE_POKEMTETAGS.
> + *   x0 - kernel address (to)
> + *   x1 - user buffer (from)
> + *   x2 - number of tags/bytes (n)

Is it worth checking for x2 == 0?  Currently, x2 will underflow and
we'll try to loop 2^64 times (until a fault stops us).

I don't think callers currently pass 0 here, but it feels like an
accident waiting to happen.  Things like memcpy() usually try to close
this loophole.

Similarly for _to_user().

Cheers
---Dave

> + * Returns:
> + *   x0 - number of tags read/set
> + */
> +SYM_FUNC_START(mte_copy_tags_from_user)
> +	mov	x3, x1
> +1:
> +USER(2f, ldtrb	w4, [x1])
> +	lsl	x4, x4, #MTE_TAG_SHIFT
> +	stg	x4, [x0], #MTE_ALLOC_SIZE
> +	add	x1, x1, #1
> +	subs	x2, x2, #1
> +	b.ne	1b
> +
> +	// exception handling and function return
> +2:	sub	x0, x1, x3		// update the number of tags set
> +	ret
> +SYM_FUNC_END(mte_copy_tags_from_user)
> +
> +/*
> + * Get the tags from a kernel address range and write the tag values to the
> + * given user buffer (one tag per byte). Used by PTRACE_PEEKMTETAGS.
> + *   x0 - user buffer (to)
> + *   x1 - kernel address (from)
> + *   x2 - number of tags/bytes (n)
> + * Returns:
> + *   x0 - number of tags read/set
> + */
> +SYM_FUNC_START(mte_copy_tags_to_user)
> +	mov	x3, x0
> +1:
> +	ldg	x4, [x1]
> +	ubfx	x4, x4, #MTE_TAG_SHIFT, #MTE_TAG_SIZE
> +USER(2f, sttrb	w4, [x0])
> +	add	x0, x0, #1
> +	add	x1, x1, #MTE_ALLOC_SIZE
> +	subs	x2, x2, #1
> +	b.ne	1b
> +
> +	// exception handling and function return
> +2:	sub	x0, x0, x3		// update the number of tags copied
> +	ret
> +SYM_FUNC_END(mte_copy_tags_from_user)
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-04-21 14:26 ` [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation Catalin Marinas
@ 2020-04-29 16:47   ` Dave Martin
  2020-04-30 16:23     ` Catalin Marinas
  2020-05-05 10:32   ` Szabolcs Nagy
  1 sibling, 1 reply; 81+ messages in thread
From: Dave Martin @ 2020-04-29 16:47 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, Peter Collingbourne, linux-mm,
	Vincenzo Frascino, Will Deacon

On Tue, Apr 21, 2020 at 03:26:03PM +0100, Catalin Marinas wrote:
> From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> 
> Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
> a mechanism to detect the sources of memory related errors which
> may be vulnerable to exploitation, including bounds violations,
> use-after-free, use-after-return, use-out-of-scope and use before
> initialization errors.
> 
> Add Memory Tagging Extension documentation for the arm64 linux
> kernel support.
> 
> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
> 
> Notes:
>     v3:
>     - Modify the uaccess checking conditions: only when the sync mode is
>       selected by the user. In async mode, the kernel uaccesses are not
>       checked.
>     - Clarify that an include mask of 0 (exclude mask 0xffff) results in
>       always generating tag 0.
>     - Document the ptrace() interface.
>     
>     v2:
>     - Documented the uaccess kernel tag checking mode.
>     - Removed the BTI definitions from cpu-feature-registers.rst.
>     - Removed the paragraph stating that MTE depends on the tagged address
>       ABI (while the Kconfig entry does, there is no requirement for the
>       user to enable both).
>     - Changed the GCR_EL1.Exclude handling description following the change
>       in the prctl() interface (include vs exclude mask).
>     - Updated the example code.
> 
>  Documentation/arm64/cpu-feature-registers.rst |   2 +
>  Documentation/arm64/elf_hwcaps.rst            |   5 +
>  Documentation/arm64/index.rst                 |   1 +
>  .../arm64/memory-tagging-extension.rst        | 260 ++++++++++++++++++
>  4 files changed, 268 insertions(+)
>  create mode 100644 Documentation/arm64/memory-tagging-extension.rst
> 
> diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
> index 41937a8091aa..b5679fa85ad9 100644
> --- a/Documentation/arm64/cpu-feature-registers.rst
> +++ b/Documentation/arm64/cpu-feature-registers.rst
> @@ -174,6 +174,8 @@ infrastructure:
>       +------------------------------+---------+---------+
>       | Name                         |  bits   | visible |
>       +------------------------------+---------+---------+
> +     | MTE                          | [11-8]  |    y    |
> +     +------------------------------+---------+---------+
>       | SSBS                         | [7-4]   |    y    |
>       +------------------------------+---------+---------+
>  
> diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
> index 7dfb97dfe416..ca7f90e99e3a 100644
> --- a/Documentation/arm64/elf_hwcaps.rst
> +++ b/Documentation/arm64/elf_hwcaps.rst
> @@ -236,6 +236,11 @@ HWCAP2_RNG
>  
>      Functionality implied by ID_AA64ISAR0_EL1.RNDR == 0b0001.
>  
> +HWCAP2_MTE
> +
> +    Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
> +    by Documentation/arm64/memory-tagging-extension.rst.
> +
>  4. Unused AT_HWCAP bits
>  -----------------------
>  
> diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
> index 09cbb4ed2237..4cd0e696f064 100644
> --- a/Documentation/arm64/index.rst
> +++ b/Documentation/arm64/index.rst
> @@ -14,6 +14,7 @@ ARM64 Architecture
>      hugetlbpage
>      legacy_instructions
>      memory
> +    memory-tagging-extension
>      pointer-authentication
>      silicon-errata
>      sve
> diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
> new file mode 100644
> index 000000000000..f82dfbd70061
> --- /dev/null
> +++ b/Documentation/arm64/memory-tagging-extension.rst
> @@ -0,0 +1,260 @@
> +===============================================
> +Memory Tagging Extension (MTE) in AArch64 Linux
> +===============================================
> +
> +Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
> +         Catalin Marinas <catalin.marinas@arm.com>
> +
> +Date: 2020-02-25
> +
> +This document describes the provision of the Memory Tagging Extension
> +functionality in AArch64 Linux.
> +
> +Introduction
> +============
> +
> +ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
> +feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
> +(Top Byte Ignore) feature and allows software to access a 4-bit
> +allocation tag for each 16-byte granule in the physical address space.
> +Such memory range must be mapped with the Normal-Tagged memory
> +attribute. A logical tag is derived from bits 59-56 of the virtual
> +address used for the memory access. A CPU with MTE enabled will compare
> +the logical tag against the allocation tag and potentially raise an
> +exception on mismatch, subject to system registers configuration.
> +
> +Userspace Support
> +=================
> +
> +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
> +supported by the hardware, the kernel advertises the feature to
> +userspace via ``HWCAP2_MTE``.
> +
> +PROT_MTE
> +--------
> +
> +To access the allocation tags, a user process must enable the Tagged
> +memory attribute on an address range using a new ``prot`` flag for
> +``mmap()`` and ``mprotect()``:
> +
> +``PROT_MTE`` - Pages allow access to the MTE allocation tags.
> +
> +The allocation tag is set to 0 when such pages are first mapped in the
> +user address space and preserved on copy-on-write. ``MAP_SHARED`` is
> +supported and the allocation tags can be shared between processes.
> +
> +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
> +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
> +types of mapping will result in ``-EINVAL`` returned by these system
> +calls.
> +
> +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
> +be cleared by ``mprotect()``.

What enforces this?  I don't have my head fully around the code yet.

I'm wondering whether attempting to clear PROT_MTE should be reported as
an error.  Is there any rationale for not doing so?


> +
> +Tag Check Faults
> +----------------
> +
> +When ``PROT_MTE`` is enabled on an address range and a mismatch between
> +the logical and allocation tags occurs on access, there are three
> +configurable behaviours:
> +
> +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
> +  tag check fault.
> +
> +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
> +  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
> +  memory access is not performed.

Also say that if in this case, if SIGSEGV is ignored or blocked by the
offending thread then containing processes is terminated with a coredump
(at least, that's what ought to happen).

> +
> +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the current
> +  thread, asynchronously following one or multiple tag check faults,
> +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0``.

For "current thread": that's a kernel concept.  For user-facing
documentation, can we say "the offending thread" or similar?

For clarity, it's worth saying that the faulting address is not
reported.  Or, we could be optimistic that someday this information will
be available and say that si_addr is the faulting address if available,
with 0 meaning the address is not available.

Maybe (void *)-1 would be better duff address, but I can't see it
mattering much.  If there's already precedent for si_addr==0 elsewhere,
it makes sense to follow it.

> +
> +**Note**: There are no *match-all* logical tags available for user
> +applications.

This note seems misplaced.

> +
> +The user can select the above modes, per thread, using the
> +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where

PR_GET_TAGGED_ADDR_CTRL seems to be missing here.

> +``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> +bit-field:
> +
> +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
> +- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
> +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode

Done naively, this will destroy the PR_MTE_TAG_MASK field.  Is there a
preferred way to change only parts of this control word?  If the answer
is "cache the value in userspace if you care about performance, or
otherwise use PR_GET_TAGGED_ADDR_CTRL as part of a read-modify-write,"
so be it.

If we think this might be an issue for software, it might be worth
splitting out separate prctls for each field.)

> +
> +Tag checking can also be disabled for a user thread by setting the
> +``PSTATE.TCO`` bit with ``MSR TCO, #1``.

Users should probably not touch this unless they know what they're
doing -- should this flag ever be left set across function boundaries
etc.?

What's it for?  Temporarily masking MTE faults in critical sections?
Is this self-synchronising... what happens to pending asynchronous
faults?  Are faults occurring while the flag is set pended or discarded?

(Deliberately not reading the spec here -- if the explanation is not
straightforward, then it may be sufficient to tell people to go read
it.)

> +
> +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
> +irrespective of the interrupted context.

Rationale?  Do we have advice on what signal handlers should do?

Is PSTATE.TC0 restored by sigreturn?

> +
> +**Note**: Kernel accesses to user memory (e.g. ``read()`` system call)
> +are only checked if the current thread tag checking mode is
> +PR_MTE_TCF_SYNC.

Vague?  Can we make a precise statement about when the kernel will and
won't check such accesses?  And aren't there limitations (like use of
get_user_pages() etc.)?

> +
> +Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
> +-----------------------------------------------------------------
> +
> +The architecture allows excluding certain tags to be randomly generated
> +via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux

Can we have a separate section on what execve() and fork()/clone() do
to the MTE controls and PSTATE.TCO?  "By default" could mean a variety
of things, and I'm not sure we cover everything.

Is PROT_MTE ever set on the initial pages mapped by execve()?

> +excludes all tags other than 0. A user thread can enable specific tags
> +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
> +in the ``PR_MTE_TAG_MASK`` bit-field.
> +
> +**Note**: The hardware uses an exclude mask but the ``prctl()``
> +interface provides an include mask. An include mask of ``0`` (exclusion
> +mask ``0xffff``) results in the CPU always generating tag ``0``.

Is there no way to make this default to 1 rather than having a magic
meaning for 0?

> +
> +The ``ptrace()`` interface
> +--------------------------
> +
> +``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
> +the tags from or set the tags to a tracee's address space. The
> +``ptrace()`` syscall is invoked as ``ptrace(request, pid, addr, data)``
> +where:
> +
> +- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_PEEKMTETAGS``.
> +- ``pid`` - the tracee's PID.
> +- ``addr`` - address in the tracee's address space.

What if addr is not 16-byte aligned?  Is this considered valid use?

> +- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
> +  a buffer of ``iov_len`` length in the tracer's address space.

What's the data format for the copied tags?

> +
> +The tags in the tracer's ``iov_base`` buffer are represented as one tag
> +per byte and correspond to a 16-byte MTE tag granule in the tracee's
> +address space.

We could say that the whole operation accesses the tags for 16 * iov_len
bytes of the tracee's address space.  Maybe superfluous though.

> +
> +``ptrace()`` return value:
> +
> +- 0 - success, the tracer's ``iov_len`` was updated to the number of
> +  tags copied (it may be smaller than the requested ``iov_len`` if the
> +  requested address range in the tracee's or the tracer's space cannot
> +  be fully accessed).

I'd replace "success" with something like "some tags were copied:
``iov_len`` is updated to indicate the actual number of tags
transferred.  This may be fewer than requested: [...]"

Can we get a short PEEKTAGS/POKETAGS for transient reasons (like minor
page faults)?  i.e., should the caller attempt to retry, or is that a
a stupid thing to do?

> +- ``-EPERM`` - the specified process cannot be traced.
> +- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
> +  address) and no tags copied. ``iov_len`` not updated.
> +- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
> +  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
> +
> +Example of correct usage
> +========================
> +
> +*MTE Example code*
> +
> +.. code-block:: c
> +
> +    /*
> +     * To be compiled with -march=armv8.5-a+memtag
> +     */
> +    #include <errno.h>
> +    #include <stdio.h>
> +    #include <stdlib.h>
> +    #include <unistd.h>
> +    #include <sys/auxv.h>
> +    #include <sys/mman.h>
> +    #include <sys/prctl.h>
> +
> +    /*
> +     * From arch/arm64/include/uapi/asm/hwcap.h
> +     */
> +    #define HWCAP2_MTE              (1 << 18)
> +
> +    /*
> +     * From arch/arm64/include/uapi/asm/mman.h
> +     */
> +    #define PROT_MTE                 0x20
> +
> +    /*
> +     * From include/uapi/linux/prctl.h
> +     */
> +    #define PR_SET_TAGGED_ADDR_CTRL 55
> +    #define PR_GET_TAGGED_ADDR_CTRL 56
> +    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
> +    # define PR_MTE_TCF_SHIFT       1
> +    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
> +    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
> +    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
> +    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
> +    # define PR_MTE_TAG_SHIFT       3
> +    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)
> +
> +    /*
> +     * Insert a random logical tag into the given pointer.
> +     */
> +    #define insert_random_tag(ptr) ({                       \
> +            __u64 __val;                                    \
> +            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
> +            __val;                                          \
> +    })
> +
> +    /*
> +     * Set the allocation tag on the destination address.
> +     */
> +    #define set_tag(tagged_addr) do {                                      \
> +            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
> +    } while (0)
> +
> +    int main()
> +    {
> +            unsigned long *a;
> +            unsigned long page_sz = getpagesize();

Nit: obsolete in POSIX.  Prefer sysconf(_SC_PAGESIZE).

> +            unsigned long hwcap2 = getauxval(AT_HWCAP2);
> +
> +            /* check if MTE is present */
> +            if (!(hwcap2 & HWCAP2_MTE))
> +                    return -1;

Nit: -1 isn't a valid exit code, so it's preferable to return 1 or
EXIT_FAILURE.

> +
> +            /*
> +             * Enable the tagged address ABI, synchronous MTE tag check faults and
> +             * allow all non-zero tags in the randomly generated set.
> +             */
> +            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
> +                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT),
> +                      0, 0, 0)) {
> +                    perror("prctl() failed");
> +                    return -1;
> +            }
> +
> +            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
> +                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

Is this a vaild assignment?

I can't remember whether C's "pointer values must be correctly aligned"
rule applies only to dereferences, or whether it applies to conversions
too.  From memory I have a feeling that it does.

If so, the compiler could legimitately optimise the failure check away,
since MAP_FAILED is not correctly aligned for unsigned long.

> +            if (a == MAP_FAILED) {
> +                    perror("mmap() failed");
> +                    return -1;
> +            }
> +
> +            /*
> +             * Enable MTE on the above anonymous mmap. The flag could be passed
> +             * directly to mmap() and skip this step.
> +             */
> +            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
> +                    perror("mprotect() failed");
> +                    return -1;
> +            }
> +
> +            /* access with the default tag (0) */
> +            a[0] = 1;
> +            a[1] = 2;
> +
> +            printf("a[0] = %lu a[1] = %lu\n", a[0], a[1]);
> +
> +            /* set the logical and allocation tags */
> +            a = (unsigned long *)insert_random_tag(a);
> +            set_tag(a);
> +
> +            printf("%p\n", a);
> +
> +            /* non-zero tag access */
> +            a[0] = 3;
> +            printf("a[0] = %lu a[1] = %lu\n", a[0], a[1]);
> +
> +            /*
> +             * If MTE is enabled correctly the next instruction will generate an
> +             * exception.
> +             */
> +            printf("Expecting SIGSEGV...\n");
> +            a[2] = 0xdead;
> +
> +            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
> +            printf("...done\n");
> +
> +            return 0;
> +    }

Since this shouldn't happen, can we print an error and return nonzero?

[...]

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-04-29 16:46   ` Dave Martin
@ 2020-04-30 10:21     ` Catalin Marinas
  2020-05-04 16:40       ` Dave Martin
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-30 10:21 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Luis Machado,
	Omair Javaid, Szabolcs Nagy, Andrey Konovalov, Kevin Brodsky,
	Peter Collingbourne, linux-mm, Alan Hayward, Vincenzo Frascino,
	Will Deacon

On Wed, Apr 29, 2020 at 05:46:07PM +0100, Dave P Martin wrote:
> On Tue, Apr 21, 2020 at 03:25:59PM +0100, Catalin Marinas wrote:
> > Add support for bulk setting/getting of the MTE tags in a tracee's
> > address space at 'addr' in the ptrace() syscall prototype. 'data' points
> > to a struct iovec in the tracer's address space with iov_base
> > representing the address of a tracer's buffer of length iov_len. The
> > tags to be copied to/from the tracer's buffer are stored as one tag per
> > byte.
> > 
> > On successfully copying at least one tag, ptrace() returns 0 and updates
> > the tracer's iov_len with the number of tags copied. In case of error,
> > either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> > page.
> > 
> > Note that the tag copying functions are not performance critical,
> > therefore they lack optimisations found in typical memory copy routines.
> 
> Doesn't quite belong here, but:
> 
> Can we dump the tags and possible the faulting mode etc. when dumping
> core?

Yes, a regset containing GCR_EL1 and SCTLR_EL1.TCF0 bits, maybe
TFSRE_EL1 could be useful. Discussing with Luis M (cc'ed, working on gdb
support), he didn't have an immediate need for this but it can be added
as a new patch.

Also coredump containing the tags may also be useful, I just have to
figure out how.

> These could probably be added later, though.

Yes, it wouldn't be a (breaking) ABI change if we do them later, just an
addition.

> > diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> > index fa4a4196b248..0cb496ed9bf9 100644
> > --- a/arch/arm64/kernel/mte.c
> > +++ b/arch/arm64/kernel/mte.c
> > @@ -133,3 +138,125 @@ long get_mte_ctrl(void)
> >  
> >  	return ret;
> >  }
> > +
> > +/*
> > + * Access MTE tags in another process' address space as given in mm. Update
> > + * the number of tags copied. Return 0 if any tags copied, error otherwise.
> > + * Inspired by __access_remote_vm().
> > + */
> > +static int __access_remote_tags(struct task_struct *tsk, struct mm_struct *mm,
> > +				unsigned long addr, struct iovec *kiov,
> > +				unsigned int gup_flags)
> > +{
> > +	struct vm_area_struct *vma;
> > +	void __user *buf = kiov->iov_base;
> > +	size_t len = kiov->iov_len;
> > +	int ret;
> > +	int write = gup_flags & FOLL_WRITE;
> > +
> > +	if (down_read_killable(&mm->mmap_sem))
> > +		return -EIO;
> > +
> > +	if (!access_ok(buf, len))
> > +		return -EFAULT;
> 
> Leaked down_read()?

Ah, wrongly placed access_ok() check.

> > +int mte_ptrace_copy_tags(struct task_struct *child, long request,
> > +			 unsigned long addr, unsigned long data)
> > +{
> > +	int ret;
> > +	struct iovec kiov;
> > +	struct iovec __user *uiov = (void __user *)data;
> > +	unsigned int gup_flags = FOLL_FORCE;
> > +
> > +	if (!system_supports_mte())
> > +		return -EIO;
> > +
> > +	if (get_user(kiov.iov_base, &uiov->iov_base) ||
> > +	    get_user(kiov.iov_len, &uiov->iov_len))
> > +		return -EFAULT;
> > +
> > +	if (request == PTRACE_POKEMTETAGS)
> > +		gup_flags |= FOLL_WRITE;
> > +
> > +	/* align addr to the MTE tag granule */
> > +	addr &= MTE_ALLOC_MASK;
> > +
> > +	ret = access_remote_tags(child, addr, &kiov, gup_flags);
> > +	if (!ret)
> > +		ret = __put_user(kiov.iov_len, &uiov->iov_len);
> 
> Should this be put_user()?  We didn't use __get_user() above, and I
> don't see what guards the access.

It doesn't make any difference on arm64 (it's just put_user) but we had
get_user() to check the access to &uiov->iov_len already above.

> > +	default:
> > +		ret = ptrace_request(child, request, addr, data);
> > +		break;
> > +	}
> > +
> > +	return ret;
> >  }
> >  
> >  enum ptrace_syscall_dir {
> > diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
> > index bd51ea7e2fcb..45be04a8c73c 100644
> > --- a/arch/arm64/lib/mte.S
> > +++ b/arch/arm64/lib/mte.S
> > @@ -5,6 +5,7 @@
> >  #include <linux/linkage.h>
> >  
> >  #include <asm/assembler.h>
> > +#include <asm/mte.h>
> >  
> >  /*
> >   * Compare tags of two pages
> > @@ -44,3 +45,52 @@ SYM_FUNC_START(mte_memcmp_pages)
> >  
> >  	ret
> >  SYM_FUNC_END(mte_memcmp_pages)
> > +
> > +/*
> > + * Read tags from a user buffer (one tag per byte) and set the corresponding
> > + * tags at the given kernel address. Used by PTRACE_POKEMTETAGS.
> > + *   x0 - kernel address (to)
> > + *   x1 - user buffer (from)
> > + *   x2 - number of tags/bytes (n)
> 
> Is it worth checking for x2 == 0?  Currently, x2 will underflow and
> we'll try to loop 2^64 times (until a fault stops us).
> 
> I don't think callers currently pass 0 here, but it feels like an
> accident waiting to happen.  Things like memcpy() usually try to close
> this loophole.

Good point.

Thanks.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-04-29 16:47   ` Dave Martin
@ 2020-04-30 16:23     ` Catalin Marinas
  2020-05-04 16:46       ` Dave Martin
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-04-30 16:23 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, linux-arch, Richard Earnshaw, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, Peter Collingbourne, linux-mm,
	Vincenzo Frascino, Will Deacon

On Wed, Apr 29, 2020 at 05:47:05PM +0100, Dave P Martin wrote:
> On Tue, Apr 21, 2020 at 03:26:03PM +0100, Catalin Marinas wrote:
> > +Userspace Support
> > +=================
> > +
> > +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
> > +supported by the hardware, the kernel advertises the feature to
> > +userspace via ``HWCAP2_MTE``.
> > +
> > +PROT_MTE
> > +--------
> > +
> > +To access the allocation tags, a user process must enable the Tagged
> > +memory attribute on an address range using a new ``prot`` flag for
> > +``mmap()`` and ``mprotect()``:
> > +
> > +``PROT_MTE`` - Pages allow access to the MTE allocation tags.
> > +
> > +The allocation tag is set to 0 when such pages are first mapped in the
> > +user address space and preserved on copy-on-write. ``MAP_SHARED`` is
> > +supported and the allocation tags can be shared between processes.
> > +
> > +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
> > +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
> > +types of mapping will result in ``-EINVAL`` returned by these system
> > +calls.
> > +
> > +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
> > +be cleared by ``mprotect()``.
> 
> What enforces this?  I don't have my head fully around the code yet.
> 
> I'm wondering whether attempting to clear PROT_MTE should be reported as
> an error.  Is there any rationale for not doing so?

A use-case is a JIT compiler where the memory is allocated by some
malloc() code with PROT_MTE set and passed down to a code generator
library which may not be MTE aware (and doesn't need to be, only tagged
ptr aware). Such library, once it generated the code, may do an
mprotect(PROT_READ|PROT_EXEC) without PROT_MTE. We didn't want to
inadvertently clear PROT_MTE, especially if the memory will be given
back to the original allocator (free) at some point.

Basically mprotect() may be done outside the heap allocator but it
should not interfere with allocator's decision to use MTE. For this
reason, I wouldn't report an error but silently ignore the lack of
PROT_MTE.

The way we handle this is by not including VM_MTE in VM_ARCH_CLEAR
(VM_MPX isn't either, though VM_SPARC_ADI is but when they added it, the
syscall ABI didn't even accept tagged pointers).

> > +Tag Check Faults
> > +----------------
> > +
> > +When ``PROT_MTE`` is enabled on an address range and a mismatch between
> > +the logical and allocation tags occurs on access, there are three
> > +configurable behaviours:
> > +
> > +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
> > +  tag check fault.
> > +
> > +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
> > +  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
> > +  memory access is not performed.
> 
> Also say that if in this case, if SIGSEGV is ignored or blocked by the
> offending thread then containing processes is terminated with a coredump
> (at least, that's what ought to happen).

Makes sense.

> > +
> > +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the current
> > +  thread, asynchronously following one or multiple tag check faults,
> > +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0``.
> 
> For "current thread": that's a kernel concept.  For user-facing
> documentation, can we say "the offending thread" or similar?
> 
> For clarity, it's worth saying that the faulting address is not
> reported.  Or, we could be optimistic that someday this information will
> be available and say that si_addr is the faulting address if available,
> with 0 meaning the address is not available.
> 
> Maybe (void *)-1 would be better duff address, but I can't see it
> mattering much.  If there's already precedent for si_addr==0 elsewhere,
> it makes sense to follow it.

At a quick grep, I can see a few instances on other architectures where
si_addr==0. I'll add a comment here.

If the hardware gives us something in the future, it will likely be in a
separate register and we can present it as a new sigcontext structure.
In the meantime I'll add a some text that the faulting address is
unknown.

> > +**Note**: There are no *match-all* logical tags available for user
> > +applications.
> 
> This note seems misplaced.

This was in the context of tag checking. I'll move it further down when
talking about PSTATE.TCO.

> > +
> > +The user can select the above modes, per thread, using the
> > +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
> 
> PR_GET_TAGGED_ADDR_CTRL seems to be missing here.

Added.

> > +``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> > +bit-field:
> > +
> > +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
> > +- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
> > +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
> 
> Done naively, this will destroy the PR_MTE_TAG_MASK field.  Is there a
> preferred way to change only parts of this control word?  If the answer
> is "cache the value in userspace if you care about performance, or
> otherwise use PR_GET_TAGGED_ADDR_CTRL as part of a read-modify-write,"
> so be it.
> 
> If we think this might be an issue for software, it might be worth
> splitting out separate prctls for each field.)

We lack some feedback from user space people on how this prctl is going
to be used. I worked on the assumption that it is a one-off event during
libc setup, potentially driven by some environment variable (but that's
user's problem).

There were some suggestions that on an async SIGSEGV, the handler may
switch to synchronous mode. Since that's a rare event, a get/set
approach would be fine.

Anyway, with an additional argument to prctl (we have 3 spare), we could
do a set/clear mask approach. The current behaviour could be emulated
as:

  prctl(PR_SET_TAGGED_ADDR_CTRL, PR_MTE_bits, -1UL, 0, 0);

where -1 is the clear mask. The mask can be 0 for the initial prctl() or
we can say that if the mask is non-zero, only the bits in the mask will
be set.

If you want to only set the TCF bits:

  prctl(PR_SET_TAGGED_ADDR_CTRL, PR_MTE_TCF_SYNC, PR_MTE_TCF_MASK, 0, 0);

> > +Tag checking can also be disabled for a user thread by setting the
> > +``PSTATE.TCO`` bit with ``MSR TCO, #1``.
> 
> Users should probably not touch this unless they know what they're
> doing -- should this flag ever be left set across function boundaries
> etc.?

We can't control function boundaries from the kernel anyway.

> What's it for?  Temporarily masking MTE faults in critical sections?
> Is this self-synchronising... what happens to pending asynchronous
> faults?  Are faults occurring while the flag is set pended or discarded?

Something like a garbage collector scanning the memory. Since we do not
allow tag 0 as a match-all, it needs a cheaper option than prctl().

> > +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
> > +irrespective of the interrupted context.
> 
> Rationale?  Do we have advice on what signal handlers should do?

Well, that's the default mode - tag check override = 0, it means that
tag checking takes place.

> Is PSTATE.TC0 restored by sigreturn?

s/TC0/TCO/

Yes, it is restored on sigreturn.

> > +**Note**: Kernel accesses to user memory (e.g. ``read()`` system call)
> > +are only checked if the current thread tag checking mode is
> > +PR_MTE_TCF_SYNC.
> 
> Vague?  Can we make a precise statement about when the kernel will and
> won't check such accesses?  And aren't there limitations (like use of
> get_user_pages() etc.)?

We could make it slightly clearer by say "kernel accesses to the user
address space".

> > +Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
> > +-----------------------------------------------------------------
> > +
> > +The architecture allows excluding certain tags to be randomly generated
> > +via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux
> 
> Can we have a separate section on what execve() and fork()/clone() do
> to the MTE controls and PSTATE.TCO?  "By default" could mean a variety
> of things, and I'm not sure we cover everything.

Good point. I'll add a note on initial state for processes and threads.

> Is PROT_MTE ever set on the initial pages mapped by execve()?

No. There were discussions about mapping the initial stack with PROT_MTE
based on some ELF note but it can also be done in userspace with
mprotect(). I think we concluded that the .data/.bss sections will be
untagged.

> > +excludes all tags other than 0. A user thread can enable specific tags
> > +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> > +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
> > +in the ``PR_MTE_TAG_MASK`` bit-field.
> > +
> > +**Note**: The hardware uses an exclude mask but the ``prctl()``
> > +interface provides an include mask. An include mask of ``0`` (exclusion
> > +mask ``0xffff``) results in the CPU always generating tag ``0``.
> 
> Is there no way to make this default to 1 rather than having a magic
> meaning for 0?

We follow the hardware behaviour where 0xffff and 0xfffe give the same
result.

> > +The ``ptrace()`` interface
> > +--------------------------
> > +
> > +``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
> > +the tags from or set the tags to a tracee's address space. The
> > +``ptrace()`` syscall is invoked as ``ptrace(request, pid, addr, data)``
> > +where:
> > +
> > +- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_PEEKMTETAGS``.
> > +- ``pid`` - the tracee's PID.
> > +- ``addr`` - address in the tracee's address space.
> 
> What if addr is not 16-byte aligned?  Is this considered valid use?

Yes, I don't think we should impose a restriction here. Each address in
a 16-byte range has the same (shared) tag.

> > +- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
> > +  a buffer of ``iov_len`` length in the tracer's address space.
> 
> What's the data format for the copied tags?

I could state that the tag are placed in the lower 4-bit of the byte
with the upper 4-bit set to 0.

> > +The tags in the tracer's ``iov_base`` buffer are represented as one tag
> > +per byte and correspond to a 16-byte MTE tag granule in the tracee's
> > +address space.
> 
> We could say that the whole operation accesses the tags for 16 * iov_len
> bytes of the tracee's address space.  Maybe superfluous though.
> 
> > +
> > +``ptrace()`` return value:
> > +
> > +- 0 - success, the tracer's ``iov_len`` was updated to the number of
> > +  tags copied (it may be smaller than the requested ``iov_len`` if the
> > +  requested address range in the tracee's or the tracer's space cannot
> > +  be fully accessed).
> 
> I'd replace "success" with something like "some tags were copied:
> ``iov_len`` is updated to indicate the actual number of tags
> transferred.  This may be fewer than requested: [...]"
> 
> Can we get a short PEEKTAGS/POKETAGS for transient reasons (like minor
> page faults)?  i.e., should the caller attempt to retry, or is that a
> a stupid thing to do?

I initially thought it should retry but managed to get the interface so
that no retries are needed. If fewer tags were transferred, it's for a
good reason (e.g. permission fault).

[...]

> > +            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
> > +                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> 
> Is this a vaild assignment?
> 
> I can't remember whether C's "pointer values must be correctly aligned"
> rule applies only to dereferences, or whether it applies to conversions
> too.  From memory I have a feeling that it does.
> 
> If so, the compiler could legimitately optimise the failure check away,
> since MAP_FAILED is not correctly aligned for unsigned long.

I'm not going to dig into standards ;). I can change this to an unsigned
char *.

> > +            printf("Expecting SIGSEGV...\n");
> > +            a[2] = 0xdead;
> > +
> > +            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
> > +            printf("...done\n");
> > +
> > +            return 0;
> > +    }
> 
> Since this shouldn't happen, can we print an error and return nonzero?

Fair enough. I also agree with the other points you raised but to which
I haven't explicitly commented.

Thanks for the review, really useful.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-04-30 10:21     ` Catalin Marinas
@ 2020-05-04 16:40       ` Dave Martin
  0 siblings, 0 replies; 81+ messages in thread
From: Dave Martin @ 2020-05-04 16:40 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arch, Richard Earnshaw, Luis Machado, Will Deacon,
	Omair Javaid, Szabolcs Nagy, Andrey Konovalov, Kevin Brodsky,
	linux-mm, Alan Hayward, Vincenzo Frascino, Peter Collingbourne,
	linux-arm-kernel

On Thu, Apr 30, 2020 at 11:21:32AM +0100, Catalin Marinas wrote:
> On Wed, Apr 29, 2020 at 05:46:07PM +0100, Dave P Martin wrote:
> > On Tue, Apr 21, 2020 at 03:25:59PM +0100, Catalin Marinas wrote:
> > > Add support for bulk setting/getting of the MTE tags in a tracee's
> > > address space at 'addr' in the ptrace() syscall prototype. 'data' points
> > > to a struct iovec in the tracer's address space with iov_base
> > > representing the address of a tracer's buffer of length iov_len. The
> > > tags to be copied to/from the tracer's buffer are stored as one tag per
> > > byte.
> > > 
> > > On successfully copying at least one tag, ptrace() returns 0 and updates
> > > the tracer's iov_len with the number of tags copied. In case of error,
> > > either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> > > page.
> > > 
> > > Note that the tag copying functions are not performance critical,
> > > therefore they lack optimisations found in typical memory copy routines.
> > 
> > Doesn't quite belong here, but:
> > 
> > Can we dump the tags and possible the faulting mode etc. when dumping
> > core?
> 
> Yes, a regset containing GCR_EL1 and SCTLR_EL1.TCF0 bits, maybe
> TFSRE_EL1 could be useful. Discussing with Luis M (cc'ed, working on gdb
> support), he didn't have an immediate need for this but it can be added
> as a new patch.
> 
> Also coredump containing the tags may also be useful, I just have to
> figure out how.
> 
> > These could probably be added later, though.
> 
> Yes, it wouldn't be a (breaking) ABI change if we do them later, just an
> addition.

Agreed

> > > diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> > > index fa4a4196b248..0cb496ed9bf9 100644
> > > --- a/arch/arm64/kernel/mte.c
> > > +++ b/arch/arm64/kernel/mte.c
> > > @@ -133,3 +138,125 @@ long get_mte_ctrl(void)
> > >  
> > >  	return ret;
> > >  }
> > > +
> > > +/*
> > > + * Access MTE tags in another process' address space as given in mm. Update
> > > + * the number of tags copied. Return 0 if any tags copied, error otherwise.
> > > + * Inspired by __access_remote_vm().
> > > + */
> > > +static int __access_remote_tags(struct task_struct *tsk, struct mm_struct *mm,
> > > +				unsigned long addr, struct iovec *kiov,
> > > +				unsigned int gup_flags)
> > > +{
> > > +	struct vm_area_struct *vma;
> > > +	void __user *buf = kiov->iov_base;
> > > +	size_t len = kiov->iov_len;
> > > +	int ret;
> > > +	int write = gup_flags & FOLL_WRITE;
> > > +
> > > +	if (down_read_killable(&mm->mmap_sem))
> > > +		return -EIO;
> > > +
> > > +	if (!access_ok(buf, len))
> > > +		return -EFAULT;
> > 
> > Leaked down_read()?
> 
> Ah, wrongly placed access_ok() check.
> 
> > > +int mte_ptrace_copy_tags(struct task_struct *child, long request,
> > > +			 unsigned long addr, unsigned long data)
> > > +{
> > > +	int ret;
> > > +	struct iovec kiov;
> > > +	struct iovec __user *uiov = (void __user *)data;
> > > +	unsigned int gup_flags = FOLL_FORCE;
> > > +
> > > +	if (!system_supports_mte())
> > > +		return -EIO;
> > > +
> > > +	if (get_user(kiov.iov_base, &uiov->iov_base) ||
> > > +	    get_user(kiov.iov_len, &uiov->iov_len))
> > > +		return -EFAULT;
> > > +
> > > +	if (request == PTRACE_POKEMTETAGS)
> > > +		gup_flags |= FOLL_WRITE;
> > > +
> > > +	/* align addr to the MTE tag granule */
> > > +	addr &= MTE_ALLOC_MASK;
> > > +
> > > +	ret = access_remote_tags(child, addr, &kiov, gup_flags);
> > > +	if (!ret)
> > > +		ret = __put_user(kiov.iov_len, &uiov->iov_len);
> > 
> > Should this be put_user()?  We didn't use __get_user() above, and I
> > don't see what guards the access.
> 
> It doesn't make any difference on arm64 (it's just put_user) but we had
> get_user() to check the access to &uiov->iov_len already above.

Given that this isn't a critical path, I'd opt for now relying on side-
effects, since this could lead to mismaintenance in the future -- or
badly educate people who read the code.

That's just my preference though.

[...]

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass
  2020-04-29 13:52     ` Catalin Marinas
@ 2020-05-04 16:40       ` Dave Martin
  0 siblings, 0 replies; 81+ messages in thread
From: Dave Martin @ 2020-05-04 16:40 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arch, Richard Earnshaw, Will Deacon, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Alexander Viro,
	Vincenzo Frascino, Peter Collingbourne, linux-arm-kernel

On Wed, Apr 29, 2020 at 02:52:25PM +0100, Catalin Marinas wrote:
> On Wed, Apr 29, 2020 at 11:26:51AM +0100, Dave P Martin wrote:
> > On Tue, Apr 21, 2020 at 03:26:00PM +0100, Catalin Marinas wrote:
> > > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > > index 32fc8061aa76..566da441eba2 100644
> > > --- a/arch/arm64/include/asm/uaccess.h
> > > +++ b/arch/arm64/include/asm/uaccess.h
> > > @@ -416,6 +416,17 @@ extern unsigned long __must_check __arch_copy_in_user(void __user *to, const voi
> > >  #define INLINE_COPY_TO_USER
> > >  #define INLINE_COPY_FROM_USER
> > >  
> > > +static inline bool arch_has_exact_copy_from_user(unsigned long n)
> > > +{
> > > +	/*
> > > +	 * copy_from_user() aligns the source pointer if the size is greater
> > > +	 * than 15. Since all the loads are naturally aligned, they can only
> > > +	 * fail on the first byte.
> > > +	 */
> > > +	return n > 15;
> > > +}
> > > +#define arch_has_exact_copy_from_user
> > 
> > Did you mean:
> > 
> > #define arch_has_exact_copy_from_user arch_has_exact_copy_from_user
> 
> Yes (and I shouldn't write patches late in the day).
> 
> > Mind you, if this expands to 1 I'd have expected copy_mount_options()
> > not to compile, so I may be missing something.
> 
> I think arch_has_exact_copy_from_user() (with the braces) is looked up
> in the function namespace, so the macro isn't expanded. So arguably the
> patch is correct but pretty dodgy ;).
> 
> I scrapped this in my second attempt in reply to Kevin.

Problem solved!

> > > diff --git a/fs/namespace.c b/fs/namespace.c
> > > index a28e4db075ed..8febc50dfc5d 100644
> > > --- a/fs/namespace.c
> > > +++ b/fs/namespace.c
> > > @@ -3025,13 +3025,16 @@ void *copy_mount_options(const void __user * data)
> > 
> > [ Is this applying a band-aid to duct tape?
> > 
> > The fs presumably knows ahead of time whether it's expecting a string or
> > a fixed-size blob for data, so I'd hope we could just DTRT rather than
> > playing SEGV roulette here.
> > 
> > This might require more refactoring than makes sense for this series
> > though. ]
> 
> That's possible but it means moving the copy from sys_mount() to the
> specific places where it has additional information (the filesystems).
> I'm not even sure it's guaranteed to be strings. If it is, we could just
> replace all this with a strncpy_from_user().

Fair enough.  I'll add it to my wishlist...

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-04-30 16:23     ` Catalin Marinas
@ 2020-05-04 16:46       ` Dave Martin
  2020-05-11 16:40         ` Catalin Marinas
  0 siblings, 1 reply; 81+ messages in thread
From: Dave Martin @ 2020-05-04 16:46 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arch, Richard Earnshaw, Will Deacon, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Vincenzo Frascino,
	Peter Collingbourne, linux-arm-kernel

On Thu, Apr 30, 2020 at 05:23:17PM +0100, Catalin Marinas wrote:
> On Wed, Apr 29, 2020 at 05:47:05PM +0100, Dave P Martin wrote:
> > On Tue, Apr 21, 2020 at 03:26:03PM +0100, Catalin Marinas wrote:
> > > +Userspace Support
> > > +=================
> > > +
> > > +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
> > > +supported by the hardware, the kernel advertises the feature to
> > > +userspace via ``HWCAP2_MTE``.
> > > +
> > > +PROT_MTE
> > > +--------
> > > +
> > > +To access the allocation tags, a user process must enable the Tagged
> > > +memory attribute on an address range using a new ``prot`` flag for
> > > +``mmap()`` and ``mprotect()``:
> > > +
> > > +``PROT_MTE`` - Pages allow access to the MTE allocation tags.
> > > +
> > > +The allocation tag is set to 0 when such pages are first mapped in the
> > > +user address space and preserved on copy-on-write. ``MAP_SHARED`` is
> > > +supported and the allocation tags can be shared between processes.
> > > +
> > > +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
> > > +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
> > > +types of mapping will result in ``-EINVAL`` returned by these system
> > > +calls.
> > > +
> > > +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
> > > +be cleared by ``mprotect()``.
> > 
> > What enforces this?  I don't have my head fully around the code yet.
> > 
> > I'm wondering whether attempting to clear PROT_MTE should be reported as
> > an error.  Is there any rationale for not doing so?
> 
> A use-case is a JIT compiler where the memory is allocated by some
> malloc() code with PROT_MTE set and passed down to a code generator
> library which may not be MTE aware (and doesn't need to be, only tagged
> ptr aware). Such library, once it generated the code, may do an
> mprotect(PROT_READ|PROT_EXEC) without PROT_MTE. We didn't want to
> inadvertently clear PROT_MTE, especially if the memory will be given
> back to the original allocator (free) at some point.
> 
> Basically mprotect() may be done outside the heap allocator but it
> should not interfere with allocator's decision to use MTE. For this
> reason, I wouldn't report an error but silently ignore the lack of
> PROT_MTE.
> 
> The way we handle this is by not including VM_MTE in VM_ARCH_CLEAR
> (VM_MPX isn't either, though VM_SPARC_ADI is but when they added it, the
> syscall ABI didn't even accept tagged pointers).

OK, I think this makes sense.

For BTI, I think mprotect() will clear PROT_BTI unless it's included in
prot, but that's a bit different: PROT_BTI relates to the memory
contents (i.e., it's BTI-aware code), where PROT_MTE is a property of
the memory itself.

> > > +Tag Check Faults
> > > +----------------
> > > +
> > > +When ``PROT_MTE`` is enabled on an address range and a mismatch between
> > > +the logical and allocation tags occurs on access, there are three
> > > +configurable behaviours:
> > > +
> > > +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
> > > +  tag check fault.
> > > +
> > > +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
> > > +  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
> > > +  memory access is not performed.
> > 
> > Also say that if in this case, if SIGSEGV is ignored or blocked by the
> > offending thread then containing processes is terminated with a coredump
> > (at least, that's what ought to happen).
> 
> Makes sense.
> 
> > > +
> > > +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the current
> > > +  thread, asynchronously following one or multiple tag check faults,
> > > +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0``.
> > 
> > For "current thread": that's a kernel concept.  For user-facing
> > documentation, can we say "the offending thread" or similar?
> > 
> > For clarity, it's worth saying that the faulting address is not
> > reported.  Or, we could be optimistic that someday this information will
> > be available and say that si_addr is the faulting address if available,
> > with 0 meaning the address is not available.
> > 
> > Maybe (void *)-1 would be better duff address, but I can't see it
> > mattering much.  If there's already precedent for si_addr==0 elsewhere,
> > it makes sense to follow it.
> 
> At a quick grep, I can see a few instances on other architectures where
> si_addr==0. I'll add a comment here.

OK, cool

Except: what if we're in PR_MTE_TCF_ASYNC mode.  If the SIGSEGV handler
triggers an asynchronous MTE fault itself, we could then get into a
spin.  Hmm.

I take it we drain any pending MTE faults when crossing EL boundaries?
In that case, an asynchronous MTE fault pending at sigreturn must have
been caused by the signal handler.  We could make that particular case
of MTE_AERR a force_sig.

> If the hardware gives us something in the future, it will likely be in a
> separate register and we can present it as a new sigcontext structure.
> In the meantime I'll add a some text that the faulting address is
> unknown.

I guess we can decide that later.  I think that if we can put something
sensible in si_addr we should do so, but that doesn't stop us also
putting more detailed info somewhere else.

> 
> > > +**Note**: There are no *match-all* logical tags available for user
> > > +applications.
> > 
> > This note seems misplaced.
> 
> This was in the context of tag checking. I'll move it further down when
> talking about PSTATE.TCO.

OK

> > > +
> > > +The user can select the above modes, per thread, using the
> > > +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
> > 
> > PR_GET_TAGGED_ADDR_CTRL seems to be missing here.
> 
> Added.
> 
> > > +``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> > > +bit-field:
> > > +
> > > +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
> > > +- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
> > > +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
> > 
> > Done naively, this will destroy the PR_MTE_TAG_MASK field.  Is there a
> > preferred way to change only parts of this control word?  If the answer
> > is "cache the value in userspace if you care about performance, or
> > otherwise use PR_GET_TAGGED_ADDR_CTRL as part of a read-modify-write,"
> > so be it.
> > 
> > If we think this might be an issue for software, it might be worth
> > splitting out separate prctls for each field.)
> 
> We lack some feedback from user space people on how this prctl is going
> to be used. I worked on the assumption that it is a one-off event during
> libc setup, potentially driven by some environment variable (but that's
> user's problem).
> 
> There were some suggestions that on an async SIGSEGV, the handler may
> switch to synchronous mode. Since that's a rare event, a get/set
> approach would be fine.
> 
> Anyway, with an additional argument to prctl (we have 3 spare), we could
> do a set/clear mask approach. The current behaviour could be emulated
> as:
> 
>   prctl(PR_SET_TAGGED_ADDR_CTRL, PR_MTE_bits, -1UL, 0, 0);
> 
> where -1 is the clear mask. The mask can be 0 for the initial prctl() or
> we can say that if the mask is non-zero, only the bits in the mask will
> be set.
> 
> If you want to only set the TCF bits:
> 
>   prctl(PR_SET_TAGGED_ADDR_CTRL, PR_MTE_TCF_SYNC, PR_MTE_TCF_MASK, 0, 0);

If this isn't critical path, I guess it's not a big deal either way.

If we make that mask argument an mask of bits _not_ to change than we
can add it as a backwards compatible extension later on without having
to define it now.  As you suggest, it may never matter.

So, I don't object to this staying as-is.

> > > +Tag checking can also be disabled for a user thread by setting the
> > > +``PSTATE.TCO`` bit with ``MSR TCO, #1``.
> > 
> > Users should probably not touch this unless they know what they're
> > doing -- should this flag ever be left set across function boundaries
> > etc.?
> 
> We can't control function boundaries from the kernel anyway.
> 
> > What's it for?  Temporarily masking MTE faults in critical sections?
> > Is this self-synchronising... what happens to pending asynchronous
> > faults?  Are faults occurring while the flag is set pended or discarded?
> 
> Something like a garbage collector scanning the memory. Since we do not
> allow tag 0 as a match-all, it needs a cheaper option than prctl().
> 
> > > +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
> > > +irrespective of the interrupted context.
> > 
> > Rationale?  Do we have advice on what signal handlers should do?
> 
> Well, that's the default mode - tag check override = 0, it means that
> tag checking takes place.

Sort of implies that a SIGSEGV handler must be careful not to trigger
any more faults.  But I guess that's nothing new.

> 
> > Is PSTATE.TC0 restored by sigreturn?
> 
> s/TC0/TCO/
> 
> Yes, it is restored on sigreturn.

OK.  I think it's worth mentioning (does no harm, anyway).

> 
> > > +**Note**: Kernel accesses to user memory (e.g. ``read()`` system call)
> > > +are only checked if the current thread tag checking mode is
> > > +PR_MTE_TCF_SYNC.
> > 
> > Vague?  Can we make a precise statement about when the kernel will and
> > won't check such accesses?  And aren't there limitations (like use of
> > get_user_pages() etc.)?
> 
> We could make it slightly clearer by say "kernel accesses to the user
> address space".

That's not the ambiguity.

My question is

1) Does the kernel guarantee not to check tags on kernel accesses to user memory without PR_MTE_TCF_SYNC?

2) Does the kernel guarantee to check tags on kernel accesses to user memory with PR_MTE_TCF_SYNC?


In practice, this note sounds to be more like a kernel implementation
detail rather than advice to userspace.

Would it make sense to say something like:

 * PR_MTE_TCF_NONE: the kernel does not check tags for kernel accesses
   to use memory done by syscalls in the thread.

 * PR_MTE_TCF_ASYNC: the kernel may check some tags for kernel accesses
   to user memory done by syscalls.  (Should we guarantee that such
   faults are reported synchronously on syscall exit?  In practice I
   think they are.  Should we use SEGV_MTESERR in this case?  Perhaps
   it's not worth making this a special case.)
   
 * PR_MTE_TCF_SYNC: the kernel makes best efforts to check tags for
   kernel accesses to user memory done by the syscalls, but does not
   guarantee to check everything (or does it?  I thought we can't really
   do that for some odd cases...)

> > > +Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
> > > +-----------------------------------------------------------------
> > > +
> > > +The architecture allows excluding certain tags to be randomly generated
> > > +via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux
> > 
> > Can we have a separate section on what execve() and fork()/clone() do
> > to the MTE controls and PSTATE.TCO?  "By default" could mean a variety
> > of things, and I'm not sure we cover everything.
> 
> Good point. I'll add a note on initial state for processes and threads.
> 
> > Is PROT_MTE ever set on the initial pages mapped by execve()?
> 
> No. There were discussions about mapping the initial stack with PROT_MTE
> based on some ELF note but it can also be done in userspace with
> mprotect(). I think we concluded that the .data/.bss sections will be
> untagged.

Yes, I recall.  Sounds fine: probably worth mentioning here that
PROT_MTE is never set on the exec mappings for now.

> > > +excludes all tags other than 0. A user thread can enable specific tags
> > > +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> > > +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
> > > +in the ``PR_MTE_TAG_MASK`` bit-field.
> > > +
> > > +**Note**: The hardware uses an exclude mask but the ``prctl()``
> > > +interface provides an include mask. An include mask of ``0`` (exclusion
> > > +mask ``0xffff``) results in the CPU always generating tag ``0``.
> > 
> > Is there no way to make this default to 1 rather than having a magic
> > meaning for 0?
> 
> We follow the hardware behaviour where 0xffff and 0xfffe give the same
> result.

Exposing this through a purely software interface seems a bit odd:
because the exclude mask is privileged-access-only, the architecture
could amend it to assign a different meaning to 0xffff, providing this
was an opt-in change.  Then we'd have to make a mess here.

Can't we just forbid the nonsense value 0 here, or are there other
reasons why that's problematic?

I presume the architecture defines a meaning for 0 to avoid making
it UNPREDICTABLE etc., not because this is deemed useful.

> > > +The ``ptrace()`` interface
> > > +--------------------------
> > > +
> > > +``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
> > > +the tags from or set the tags to a tracee's address space. The
> > > +``ptrace()`` syscall is invoked as ``ptrace(request, pid, addr, data)``
> > > +where:
> > > +
> > > +- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_PEEKMTETAGS``.
> > > +- ``pid`` - the tracee's PID.
> > > +- ``addr`` - address in the tracee's address space.
> > 
> > What if addr is not 16-byte aligned?  Is this considered valid use?
> 
> Yes, I don't think we should impose a restriction here. Each address in
> a 16-byte range has the same (shared) tag.

OK.  We might want to clarify what this means when addr is misaligned:
we do not colour the 16 bytes starting at addr, but the reader might
assume that's what happens.

> > > +- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
> > > +  a buffer of ``iov_len`` length in the tracer's address space.
> > 
> > What's the data format for the copied tags?
> 
> I could state that the tag are placed in the lower 4-bit of the byte
> with the upper 4-bit set to 0.

What if it's not?  I didn't find this in the architecture spec, but I
didn't look very hard so far...

> > > +The tags in the tracer's ``iov_base`` buffer are represented as one tag
> > > +per byte and correspond to a 16-byte MTE tag granule in the tracee's
> > > +address space.
> > 
> > We could say that the whole operation accesses the tags for 16 * iov_len
> > bytes of the tracee's address space.  Maybe superfluous though.
> > 
> > > +
> > > +``ptrace()`` return value:
> > > +
> > > +- 0 - success, the tracer's ``iov_len`` was updated to the number of
> > > +  tags copied (it may be smaller than the requested ``iov_len`` if the
> > > +  requested address range in the tracee's or the tracer's space cannot
> > > +  be fully accessed).
> > 
> > I'd replace "success" with something like "some tags were copied:
> > ``iov_len`` is updated to indicate the actual number of tags
> > transferred.  This may be fewer than requested: [...]"
> > 
> > Can we get a short PEEKTAGS/POKETAGS for transient reasons (like minor
> > page faults)?  i.e., should the caller attempt to retry, or is that a
> > a stupid thing to do?
> 
> I initially thought it should retry but managed to get the interface so
> that no retries are needed. If fewer tags were transferred, it's for a
> good reason (e.g. permission fault).

OK, we should mention that here then.  Software that retries things that
can't make progress can get stuck in a loop (or at least waste cycles).

> > > +            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
> > > +                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > 
> > Is this a vaild assignment?
> > 
> > I can't remember whether C's "pointer values must be correctly aligned"
> > rule applies only to dereferences, or whether it applies to conversions
> > too.  From memory I have a feeling that it does.
> > 
> > If so, the compiler could legimitately optimise the failure check away,
> > since MAP_FAILED is not correctly aligned for unsigned long.
> 
> I'm not going to dig into standards ;). I can change this to an unsigned
> char *.

Sure, I guess that solves the problem.

Something like

	void *p;
	unsigned long *a;

	p = mmap( ... );
	if (p == MAP_FAILED) {
		/* barf */
	}

	a = p;

might provide a clue that care is needed, but it's not essential.

> 
> > > +            printf("Expecting SIGSEGV...\n");
> > > +            a[2] = 0xdead;
> > > +
> > > +            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
> > > +            printf("...done\n");
> > > +
> > > +            return 0;
> > > +    }
> > 
> > Since this shouldn't happen, can we print an error and return nonzero?
> 
> Fair enough. I also agree with the other points you raised but to which
> I haven't explicitly commented.
> 
> Thanks for the review, really useful.

Np

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-04-21 14:26 ` [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation Catalin Marinas
  2020-04-29 16:47   ` Dave Martin
@ 2020-05-05 10:32   ` Szabolcs Nagy
  2020-05-05 17:30     ` Catalin Marinas
  1 sibling, 1 reply; 81+ messages in thread
From: Szabolcs Nagy @ 2020-05-05 10:32 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch, nd

The 04/21/2020 15:26, Catalin Marinas wrote:
> diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
> new file mode 100644
> index 000000000000..f82dfbd70061
> --- /dev/null
> +++ b/Documentation/arm64/memory-tagging-extension.rst
> @@ -0,0 +1,260 @@
> +===============================================
> +Memory Tagging Extension (MTE) in AArch64 Linux
> +===============================================
> +
> +Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
> +         Catalin Marinas <catalin.marinas@arm.com>
> +
> +Date: 2020-02-25
> +
> +This document describes the provision of the Memory Tagging Extension
> +functionality in AArch64 Linux.
> +
> +Introduction
> +============
> +
> +ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
> +feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
> +(Top Byte Ignore) feature and allows software to access a 4-bit
> +allocation tag for each 16-byte granule in the physical address space.
> +Such memory range must be mapped with the Normal-Tagged memory
> +attribute. A logical tag is derived from bits 59-56 of the virtual
> +address used for the memory access. A CPU with MTE enabled will compare
> +the logical tag against the allocation tag and potentially raise an
> +exception on mismatch, subject to system registers configuration.
> +
> +Userspace Support
> +=================
> +
> +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
> +supported by the hardware, the kernel advertises the feature to
> +userspace via ``HWCAP2_MTE``.
> +
> +PROT_MTE
> +--------
> +
> +To access the allocation tags, a user process must enable the Tagged
> +memory attribute on an address range using a new ``prot`` flag for
> +``mmap()`` and ``mprotect()``:
> +
> +``PROT_MTE`` - Pages allow access to the MTE allocation tags.
> +
> +The allocation tag is set to 0 when such pages are first mapped in the
> +user address space and preserved on copy-on-write. ``MAP_SHARED`` is
> +supported and the allocation tags can be shared between processes.
> +
> +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
> +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
> +types of mapping will result in ``-EINVAL`` returned by these system
> +calls.
> +
> +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
> +be cleared by ``mprotect()``.

i think there are some non-obvious madvise operations that may
be worth documenting too for mte specific semantics.

e.g. MADV_DONTNEED or MADV_FREE can presumably drop tags which
means that existing pointers can no longer write to the memory
which is a change of behaviour compared to the non-mte case.
(affects most malloc implementations that will have to deal
with this when implementing heap coloring) there might be other
similar problems like MADV_WIPEONFORK that wont work as
currently expected when mte is enabled.

if such behaviour changes cause serious problems to existing
software there may be a need to have a way to opt out from
these changes (e.g. MADV_ flag variant that only affects the
memory content but not the tags) or to make that the default
behaviour. (but i can't tell how widely these are used in
ways that can be expected to work with PROT_MTE)


> +Tag Check Faults
> +----------------
> +
> +When ``PROT_MTE`` is enabled on an address range and a mismatch between
> +the logical and allocation tags occurs on access, there are three
> +configurable behaviours:
> +
> +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
> +  tag check fault.
> +
> +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
> +  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
> +  memory access is not performed.
> +
> +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the current
> +  thread, asynchronously following one or multiple tag check faults,
> +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0``.
> +
> +**Note**: There are no *match-all* logical tags available for user
> +applications.
> +
> +The user can select the above modes, per thread, using the
> +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
> +``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> +bit-field:
> +
> +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
> +- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
> +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
> +
> +Tag checking can also be disabled for a user thread by setting the
> +``PSTATE.TCO`` bit with ``MSR TCO, #1``.
> +
> +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
> +irrespective of the interrupted context.
> +
> +**Note**: Kernel accesses to user memory (e.g. ``read()`` system call)
> +are only checked if the current thread tag checking mode is
> +PR_MTE_TCF_SYNC.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-05 10:32   ` Szabolcs Nagy
@ 2020-05-05 17:30     ` Catalin Marinas
  0 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-05-05 17:30 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch, nd

On Tue, May 05, 2020 at 11:32:33AM +0100, Szabolcs Nagy wrote:
> The 04/21/2020 15:26, Catalin Marinas wrote:
> > +PROT_MTE
> > +--------
> > +
> > +To access the allocation tags, a user process must enable the Tagged
> > +memory attribute on an address range using a new ``prot`` flag for
> > +``mmap()`` and ``mprotect()``:
> > +
> > +``PROT_MTE`` - Pages allow access to the MTE allocation tags.
> > +
> > +The allocation tag is set to 0 when such pages are first mapped in the
> > +user address space and preserved on copy-on-write. ``MAP_SHARED`` is
> > +supported and the allocation tags can be shared between processes.
> > +
> > +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
> > +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
> > +types of mapping will result in ``-EINVAL`` returned by these system
> > +calls.
> > +
> > +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
> > +be cleared by ``mprotect()``.
> 
> i think there are some non-obvious madvise operations that may
> be worth documenting too for mte specific semantics.
> 
> e.g. MADV_DONTNEED or MADV_FREE can presumably drop tags which
> means that existing pointers can no longer write to the memory
> which is a change of behaviour compared to the non-mte case.
> (affects most malloc implementations that will have to deal
> with this when implementing heap coloring) there might be other
> similar problems like MADV_WIPEONFORK that wont work as
> currently expected when mte is enabled.
> 
> if such behaviour changes cause serious problems to existing
> software there may be a need to have a way to opt out from
> these changes (e.g. MADV_ flag variant that only affects the
> memory content but not the tags) or to make that the default
> behaviour. (but i can't tell how widely these are used in
> ways that can be expected to work with PROT_MTE)

Thanks. I'll document this behaviour as it may not be obvious.

For the record (as we discussed this internally), I think the kernel
behaviour is entirely expected. On mmap(PROT_MTE), the kernel would
return pages with tags set to 0. On madvise(MADV_DONTNEED), the kernel
may free the pages but map them back on access using the same conditions
they were previously given to the user, i.e. tags set to 0. There isn't
any expectations for the kernel to preserve the tags of
MADV_DONTNEED/FREE pages (which defeats the point of dontneed/free).

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-04-21 14:25 ` [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
                     ` (2 preceding siblings ...)
  2020-04-29 16:46   ` Dave Martin
@ 2020-05-05 18:03   ` Luis Machado
  2020-05-12 19:05   ` Luis Machado
  4 siblings, 0 replies; 81+ messages in thread
From: Luis Machado @ 2020-05-05 18:03 UTC (permalink / raw)
  To: Catalin Marinas, linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Alan Hayward, Omair Javaid

On 4/21/20 11:25 AM, Catalin Marinas wrote:
> Add support for bulk setting/getting of the MTE tags in a tracee's
> address space at 'addr' in the ptrace() syscall prototype. 'data' points
> to a struct iovec in the tracer's address space with iov_base
> representing the address of a tracer's buffer of length iov_len. The
> tags to be copied to/from the tracer's buffer are stored as one tag per
> byte.
> 
> On successfully copying at least one tag, ptrace() returns 0 and updates
> the tracer's iov_len with the number of tags copied. In case of error,
> either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> page.
> 
> Note that the tag copying functions are not performance critical,
> therefore they lack optimisations found in typical memory copy routines.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Alan Hayward <Alan.Hayward@arm.com>
> Cc: Luis Machado <luis.machado@linaro.org>
> Cc: Omair Javaid <omair.javaid@linaro.org>
> ---
> 
> Notes:
>      New in v3.
> 
>   arch/arm64/include/asm/mte.h         |  17 ++++
>   arch/arm64/include/uapi/asm/ptrace.h |   3 +
>   arch/arm64/kernel/mte.c              | 127 +++++++++++++++++++++++++++
>   arch/arm64/kernel/ptrace.c           |  15 +++-
>   arch/arm64/lib/mte.S                 |  50 +++++++++++
>   5 files changed, 211 insertions(+), 1 deletion(-)

I'll try to exercise the new ptrace requests with QEMU and GDB.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-04 16:46       ` Dave Martin
@ 2020-05-11 16:40         ` Catalin Marinas
  2020-05-13 15:48           ` Dave Martin
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-05-11 16:40 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arch, Richard Earnshaw, Will Deacon, Szabolcs Nagy,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Vincenzo Frascino,
	Peter Collingbourne, linux-arm-kernel

On Mon, May 04, 2020 at 05:46:17PM +0100, Dave P Martin wrote:
> On Thu, Apr 30, 2020 at 05:23:17PM +0100, Catalin Marinas wrote:
> > On Wed, Apr 29, 2020 at 05:47:05PM +0100, Dave P Martin wrote:
> > > On Tue, Apr 21, 2020 at 03:26:03PM +0100, Catalin Marinas wrote:
> > > > +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the current
> > > > +  thread, asynchronously following one or multiple tag check faults,
> > > > +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0``.
> > > 
> > > For "current thread": that's a kernel concept.  For user-facing
> > > documentation, can we say "the offending thread" or similar?
> > > 
> > > For clarity, it's worth saying that the faulting address is not
> > > reported.  Or, we could be optimistic that someday this information will
> > > be available and say that si_addr is the faulting address if available,
> > > with 0 meaning the address is not available.
> > > 
> > > Maybe (void *)-1 would be better duff address, but I can't see it
> > > mattering much.  If there's already precedent for si_addr==0 elsewhere,
> > > it makes sense to follow it.
> > 
> > At a quick grep, I can see a few instances on other architectures where
> > si_addr==0. I'll add a comment here.
> 
> OK, cool
> 
> Except: what if we're in PR_MTE_TCF_ASYNC mode.  If the SIGSEGV handler
> triggers an asynchronous MTE fault itself, we could then get into a
> spin.  Hmm.

How do we handle standard segfaults here? Presumably a signal handler
can trigger a SIGSEGV itself.

> I take it we drain any pending MTE faults when crossing EL boundaries?

We clear the hardware bit on entry to EL1 from EL0 and set a TIF flag.

> In that case, an asynchronous MTE fault pending at sigreturn must have
> been caused by the signal handler.  We could make that particular case
> of MTE_AERR a force_sig.

We clear the TIF flag when delivering the signal. I don't think there is
a way for the kernel to detect when it is running in a signal handler.
sigreturn() is not mandatory either.

> > > > +**Note**: Kernel accesses to user memory (e.g. ``read()`` system call)
> > > > +are only checked if the current thread tag checking mode is
> > > > +PR_MTE_TCF_SYNC.
> > > 
> > > Vague?  Can we make a precise statement about when the kernel will and
> > > won't check such accesses?  And aren't there limitations (like use of
> > > get_user_pages() etc.)?
> > 
> > We could make it slightly clearer by say "kernel accesses to the user
> > address space".
> 
> That's not the ambiguity.
> 
> My question is
> 
> 1) Does the kernel guarantee not to check tags on kernel accesses to
> user memory without PR_MTE_TCF_SYNC?

For ASYNC and NONE, yes, we can guarantee this.

> 2) Does the kernel guarantee to check tags on kernel accesses to user
> memory with PR_MTE_TCF_SYNC?

I'd say yes but it depends on how much knowledge one has about the
syscall implementation. If it's access to user address directly, it
would be checked. If it goes via get_user_pages(), it won't. Since the
user doesn't need to have knowledge of the kernel internals, you are
right that we don't guarantee this.

> In practice, this note sounds to be more like a kernel implementation
> detail rather than advice to userspace.
> 
> Would it make sense to say something like:
> 
>  * PR_MTE_TCF_NONE: the kernel does not check tags for kernel accesses
>    to use memory done by syscalls in the thread.
> 
>  * PR_MTE_TCF_ASYNC: the kernel may check some tags for kernel accesses
>    to user memory done by syscalls.  (Should we guarantee that such
>    faults are reported synchronously on syscall exit?  In practice I
>    think they are.  Should we use SEGV_MTESERR in this case?  Perhaps
>    it's not worth making this a special case.)

Both NONE and ASYNC are now the same for kernel uaccess - not checked.

For background information, I decided against ASYNC uaccess checking
since (1) there are some cases where the kernel overreads
(strncpy_from_user) and (2) we don't normally generate SIGSEGV on
uaccess but rather return -EFAULT. The latter is not possible to contain
since we only learn about the fault asynchronously, usually after the
transfer.

>  * PR_MTE_TCF_SYNC: the kernel makes best efforts to check tags for
>    kernel accesses to user memory done by the syscalls, but does not
>    guarantee to check everything (or does it?  I thought we can't really
>    do that for some odd cases...)

It doesn't. I'll add some notes along the lines of your text above.

> > > > +excludes all tags other than 0. A user thread can enable specific tags
> > > > +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> > > > +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
> > > > +in the ``PR_MTE_TAG_MASK`` bit-field.
> > > > +
> > > > +**Note**: The hardware uses an exclude mask but the ``prctl()``
> > > > +interface provides an include mask. An include mask of ``0`` (exclusion
> > > > +mask ``0xffff``) results in the CPU always generating tag ``0``.
> > > 
> > > Is there no way to make this default to 1 rather than having a magic
> > > meaning for 0?
> > 
> > We follow the hardware behaviour where 0xffff and 0xfffe give the same
> > result.
> 
> Exposing this through a purely software interface seems a bit odd:
> because the exclude mask is privileged-access-only, the architecture
> could amend it to assign a different meaning to 0xffff, providing this
> was an opt-in change.  Then we'd have to make a mess here.

You have a point. An include mask of 0 translates to an exclude mask of
0xffff as per the current patches. If the hardware gains support for one
more bit (32 colours), old software running on new hardware may run into
unexpected results with an exclude mask of 0xffff.

> Can't we just forbid the nonsense value 0 here, or are there other
> reasons why that's problematic?

It was just easier to start with a default. I wonder whether we should
actually switch back to the exclude mask, as per the hardware
definition. This way 0 would mean all tags allowed. We can still
disallow 0xffff as an exclude mask.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-04-21 14:25 ` [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
                     ` (3 preceding siblings ...)
  2020-05-05 18:03   ` Luis Machado
@ 2020-05-12 19:05   ` Luis Machado
  2020-05-13 10:48     ` Catalin Marinas
  4 siblings, 1 reply; 81+ messages in thread
From: Luis Machado @ 2020-05-12 19:05 UTC (permalink / raw)
  To: Catalin Marinas, linux-arm-kernel
  Cc: Will Deacon, Vincenzo Frascino, Szabolcs Nagy, Richard Earnshaw,
	Kevin Brodsky, Andrey Konovalov, Peter Collingbourne, linux-mm,
	linux-arch, Alan Hayward, Omair Javaid

Hi Catalin,

On 4/21/20 11:25 AM, Catalin Marinas wrote:
> Add support for bulk setting/getting of the MTE tags in a tracee's
> address space at 'addr' in the ptrace() syscall prototype. 'data' points
> to a struct iovec in the tracer's address space with iov_base
> representing the address of a tracer's buffer of length iov_len. The
> tags to be copied to/from the tracer's buffer are stored as one tag per
> byte.
> 
> On successfully copying at least one tag, ptrace() returns 0 and updates
> the tracer's iov_len with the number of tags copied. In case of error,
> either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> page.
> 
> Note that the tag copying functions are not performance critical,
> therefore they lack optimisations found in typical memory copy routines.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Alan Hayward <Alan.Hayward@arm.com>
> Cc: Luis Machado <luis.machado@linaro.org>
> Cc: Omair Javaid <omair.javaid@linaro.org>
> ---
> 
> Notes:
>      New in v3.
> 
>   arch/arm64/include/asm/mte.h         |  17 ++++
>   arch/arm64/include/uapi/asm/ptrace.h |   3 +
>   arch/arm64/kernel/mte.c              | 127 +++++++++++++++++++++++++++
>   arch/arm64/kernel/ptrace.c           |  15 +++-
>   arch/arm64/lib/mte.S                 |  50 +++++++++++
>   5 files changed, 211 insertions(+), 1 deletion(-)
>
I started working on MTE support for GDB and I'm wondering if we've 
already defined a way to check for runtime MTE support (as opposed to a 
HWCAP2-based check) in a traced process.

Originally we were going to do it via empty-parameter ptrace calls, but 
you had mentioned something about a proc-based method, if I'm not mistaken.

Regards,
Luis


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-05-12 19:05   ` Luis Machado
@ 2020-05-13 10:48     ` Catalin Marinas
  2020-05-13 12:52       ` Luis Machado
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-05-13 10:48 UTC (permalink / raw)
  To: Luis Machado
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch, Alan Hayward,
	Omair Javaid

Hi Luis,

On Tue, May 12, 2020 at 04:05:15PM -0300, Luis Machado wrote:
> On 4/21/20 11:25 AM, Catalin Marinas wrote:
> > Add support for bulk setting/getting of the MTE tags in a tracee's
> > address space at 'addr' in the ptrace() syscall prototype. 'data' points
> > to a struct iovec in the tracer's address space with iov_base
> > representing the address of a tracer's buffer of length iov_len. The
> > tags to be copied to/from the tracer's buffer are stored as one tag per
> > byte.
> > 
> > On successfully copying at least one tag, ptrace() returns 0 and updates
> > the tracer's iov_len with the number of tags copied. In case of error,
> > either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> > page.
> > 
> > Note that the tag copying functions are not performance critical,
> > therefore they lack optimisations found in typical memory copy routines.
> > 
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Alan Hayward <Alan.Hayward@arm.com>
> > Cc: Luis Machado <luis.machado@linaro.org>
> > Cc: Omair Javaid <omair.javaid@linaro.org>
> > ---
> > 
> > Notes:
> >      New in v3.
> > 
> >   arch/arm64/include/asm/mte.h         |  17 ++++
> >   arch/arm64/include/uapi/asm/ptrace.h |   3 +
> >   arch/arm64/kernel/mte.c              | 127 +++++++++++++++++++++++++++
> >   arch/arm64/kernel/ptrace.c           |  15 +++-
> >   arch/arm64/lib/mte.S                 |  50 +++++++++++
> >   5 files changed, 211 insertions(+), 1 deletion(-)
> > 
> I started working on MTE support for GDB and I'm wondering if we've already
> defined a way to check for runtime MTE support (as opposed to a HWCAP2-based
> check) in a traced process.
> 
> Originally we were going to do it via empty-parameter ptrace calls, but you
> had mentioned something about a proc-based method, if I'm not mistaken.

We could expose more information via proc_pid_arch_status() but that
would be the tagged address ABI and tag check fault mode and intended
for human consumption mostly. We don't have any ptrace interface that
exposes HWCAPs. Since the gdbserver runs on the same machine as the
debugged process, it can check the HWCAPs itself, they are the same for
all processes.

BTW, in my pre-v4 patches (hopefully I'll post v4 this week), I changed
the ptrace tag access slightly to return an error (and no tags copied)
if the page has not been mapped with PROT_MTE. The other option would
have been read-as-zero/write-ignored as per the hardware behaviour.
Either option is fine by me but I thought the write-ignored part would
be more confusing for the debugger. If you have any preference here,
please let me know.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-05-13 10:48     ` Catalin Marinas
@ 2020-05-13 12:52       ` Luis Machado
  2020-05-13 14:11         ` Catalin Marinas
  0 siblings, 1 reply; 81+ messages in thread
From: Luis Machado @ 2020-05-13 12:52 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch, Alan Hayward,
	Omair Javaid

On 5/13/20 7:48 AM, Catalin Marinas wrote:
> Hi Luis,
> 
> On Tue, May 12, 2020 at 04:05:15PM -0300, Luis Machado wrote:
>> On 4/21/20 11:25 AM, Catalin Marinas wrote:
>>> Add support for bulk setting/getting of the MTE tags in a tracee's
>>> address space at 'addr' in the ptrace() syscall prototype. 'data' points
>>> to a struct iovec in the tracer's address space with iov_base
>>> representing the address of a tracer's buffer of length iov_len. The
>>> tags to be copied to/from the tracer's buffer are stored as one tag per
>>> byte.
>>>
>>> On successfully copying at least one tag, ptrace() returns 0 and updates
>>> the tracer's iov_len with the number of tags copied. In case of error,
>>> either -EIO or -EFAULT is returned, trying to follow the ptrace() man
>>> page.
>>>
>>> Note that the tag copying functions are not performance critical,
>>> therefore they lack optimisations found in typical memory copy routines.
>>>
>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Will Deacon <will@kernel.org>
>>> Cc: Alan Hayward <Alan.Hayward@arm.com>
>>> Cc: Luis Machado <luis.machado@linaro.org>
>>> Cc: Omair Javaid <omair.javaid@linaro.org>
>>> ---
>>>
>>> Notes:
>>>       New in v3.
>>>
>>>    arch/arm64/include/asm/mte.h         |  17 ++++
>>>    arch/arm64/include/uapi/asm/ptrace.h |   3 +
>>>    arch/arm64/kernel/mte.c              | 127 +++++++++++++++++++++++++++
>>>    arch/arm64/kernel/ptrace.c           |  15 +++-
>>>    arch/arm64/lib/mte.S                 |  50 +++++++++++
>>>    5 files changed, 211 insertions(+), 1 deletion(-)
>>>
>> I started working on MTE support for GDB and I'm wondering if we've already
>> defined a way to check for runtime MTE support (as opposed to a HWCAP2-based
>> check) in a traced process.
>>
>> Originally we were going to do it via empty-parameter ptrace calls, but you
>> had mentioned something about a proc-based method, if I'm not mistaken.
> 
> We could expose more information via proc_pid_arch_status() but that
> would be the tagged address ABI and tag check fault mode and intended
> for human consumption mostly. We don't have any ptrace interface that
> exposes HWCAPs. Since the gdbserver runs on the same machine as the
> debugged process, it can check the HWCAPs itself, they are the same for
> all processes.

Sorry, I think i haven't made it clear. I already have access to HWCAP2 
both from GDB's and gdbserver's side. But HWCAP2 only indicates the 
availability of a particular feature in a CPU, it doesn't necessarily 
means the traced process is actively using MTE, right?

So GDB/gdbserver would need runtime checks to be able to tell if a 
process is using MTE, in which case the tools will pay attention to tags 
and additional MTE-related registers (sctlr and gcr) we plan to make 
available to userspace.

This would be similar to SVE, where we have a HWCAP bit indicating the 
presence of the feature, but it may not be in use at runtime for a 
particular running process.

The original proposal was to have GDB send PTRACE_PEEKMTETAGS with a 
NULL address and check the result. Then GDB would be able to decide if 
the process is using MTE or not.

> 
> BTW, in my pre-v4 patches (hopefully I'll post v4 this week), I changed
> the ptrace tag access slightly to return an error (and no tags copied)
> if the page has not been mapped with PROT_MTE. The other option would
> have been read-as-zero/write-ignored as per the hardware behaviour.
> Either option is fine by me but I thought the write-ignored part would
> be more confusing for the debugger. If you have any preference here,
> please let me know.
> 

I think erroring out is a better alternative, as long as the debugger 
can tell what the error means, like, for example, "this particular 
address doesn't make use of tags".


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-05-13 12:52       ` Luis Machado
@ 2020-05-13 14:11         ` Catalin Marinas
  2020-05-13 15:09           ` Luis Machado
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-05-13 14:11 UTC (permalink / raw)
  To: Luis Machado
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch, Alan Hayward,
	Omair Javaid

On Wed, May 13, 2020 at 09:52:52AM -0300, Luis Machado wrote:
> On 5/13/20 7:48 AM, Catalin Marinas wrote:
> > On Tue, May 12, 2020 at 04:05:15PM -0300, Luis Machado wrote:
> > > On 4/21/20 11:25 AM, Catalin Marinas wrote:
> > > > Add support for bulk setting/getting of the MTE tags in a tracee's
> > > > address space at 'addr' in the ptrace() syscall prototype. 'data' points
> > > > to a struct iovec in the tracer's address space with iov_base
> > > > representing the address of a tracer's buffer of length iov_len. The
> > > > tags to be copied to/from the tracer's buffer are stored as one tag per
> > > > byte.
> > > > 
> > > > On successfully copying at least one tag, ptrace() returns 0 and updates
> > > > the tracer's iov_len with the number of tags copied. In case of error,
> > > > either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> > > > page.
> > > > 
> > > > Note that the tag copying functions are not performance critical,
> > > > therefore they lack optimisations found in typical memory copy routines.
> > > > 
> > > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > Cc: Alan Hayward <Alan.Hayward@arm.com>
> > > > Cc: Luis Machado <luis.machado@linaro.org>
> > > > Cc: Omair Javaid <omair.javaid@linaro.org>
> > > 
> > > I started working on MTE support for GDB and I'm wondering if we've already
> > > defined a way to check for runtime MTE support (as opposed to a HWCAP2-based
> > > check) in a traced process.
> > > 
> > > Originally we were going to do it via empty-parameter ptrace calls, but you
> > > had mentioned something about a proc-based method, if I'm not mistaken.
> > 
> > We could expose more information via proc_pid_arch_status() but that
> > would be the tagged address ABI and tag check fault mode and intended
> > for human consumption mostly. We don't have any ptrace interface that
> > exposes HWCAPs. Since the gdbserver runs on the same machine as the
> > debugged process, it can check the HWCAPs itself, they are the same for
> > all processes.
> 
> Sorry, I think i haven't made it clear. I already have access to HWCAP2 both
> from GDB's and gdbserver's side. But HWCAP2 only indicates the availability
> of a particular feature in a CPU, it doesn't necessarily means the traced
> process is actively using MTE, right?

Right, but "actively" is not well defined either. The only way to tell
whether a process is using MTE is to look for any PROT_MTE mappings. You
can access these via /proc/<pid>/maps. In theory, one can use MTE
without enabling the tagged address ABI or even tag checking (i.e. no
prctl() call).

> So GDB/gdbserver would need runtime checks to be able to tell if a process
> is using MTE, in which case the tools will pay attention to tags and
> additional MTE-related registers (sctlr and gcr) we plan to make available
> to userspace.

I'm happy to expose GCR_EL1.Excl and the SCTLR_EL1.TCF0 bits via ptrace
as a thread state. The tags, however, are a property of the memory range
rather than a per-thread state. That's what makes it different from
other register-based features like SVE.

> The original proposal was to have GDB send PTRACE_PEEKMTETAGS with a NULL
> address and check the result. Then GDB would be able to decide if the
> process is using MTE or not.

We don't store this information in the kernel as a bool and I don't
think it would be useful either. I think gdb, when displaying memory,
should attempt to show tags as well if the corresponding range was
mapped with PROT_MTE. Just probing whether a thread ever used MTE
doesn't help since you need to be more precise on which address supports
tags.

> > BTW, in my pre-v4 patches (hopefully I'll post v4 this week), I changed
> > the ptrace tag access slightly to return an error (and no tags copied)
> > if the page has not been mapped with PROT_MTE. The other option would
> > have been read-as-zero/write-ignored as per the hardware behaviour.
> > Either option is fine by me but I thought the write-ignored part would
> > be more confusing for the debugger. If you have any preference here,
> > please let me know.
> 
> I think erroring out is a better alternative, as long as the debugger can
> tell what the error means, like, for example, "this particular address
> doesn't make use of tags".

And you could use this for probing whether the range has tags or not.
With my current patches it returns -EFAULT but happy to change this to
-EOPNOTSUPP or -EINVAL. Note that it only returns an error if no tags
copied. If gdb asks for a range of two pages and only the first one has
PROT_MTE, it will return 0 and set the number of tags copied equivalent
to the first page. A subsequent call would return an error.

In my discussion with Dave on the documentation patch, I thought retries
wouldn't be needed but in the above case it may be useful to get an
error code. That's unless we change the interface to return an error and
also update the user iovec structure.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-05-13 14:11         ` Catalin Marinas
@ 2020-05-13 15:09           ` Luis Machado
  2020-05-13 16:45             ` Luis Machado
  0 siblings, 1 reply; 81+ messages in thread
From: Luis Machado @ 2020-05-13 15:09 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch, Alan Hayward,
	Omair Javaid

On 5/13/20 11:11 AM, Catalin Marinas wrote:
> On Wed, May 13, 2020 at 09:52:52AM -0300, Luis Machado wrote:
>> On 5/13/20 7:48 AM, Catalin Marinas wrote:
>>> On Tue, May 12, 2020 at 04:05:15PM -0300, Luis Machado wrote:
>>>> On 4/21/20 11:25 AM, Catalin Marinas wrote:
>>>>> Add support for bulk setting/getting of the MTE tags in a tracee's
>>>>> address space at 'addr' in the ptrace() syscall prototype. 'data' points
>>>>> to a struct iovec in the tracer's address space with iov_base
>>>>> representing the address of a tracer's buffer of length iov_len. The
>>>>> tags to be copied to/from the tracer's buffer are stored as one tag per
>>>>> byte.
>>>>>
>>>>> On successfully copying at least one tag, ptrace() returns 0 and updates
>>>>> the tracer's iov_len with the number of tags copied. In case of error,
>>>>> either -EIO or -EFAULT is returned, trying to follow the ptrace() man
>>>>> page.
>>>>>
>>>>> Note that the tag copying functions are not performance critical,
>>>>> therefore they lack optimisations found in typical memory copy routines.
>>>>>
>>>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>>>> Cc: Will Deacon <will@kernel.org>
>>>>> Cc: Alan Hayward <Alan.Hayward@arm.com>
>>>>> Cc: Luis Machado <luis.machado@linaro.org>
>>>>> Cc: Omair Javaid <omair.javaid@linaro.org>
>>>>
>>>> I started working on MTE support for GDB and I'm wondering if we've already
>>>> defined a way to check for runtime MTE support (as opposed to a HWCAP2-based
>>>> check) in a traced process.
>>>>
>>>> Originally we were going to do it via empty-parameter ptrace calls, but you
>>>> had mentioned something about a proc-based method, if I'm not mistaken.
>>>
>>> We could expose more information via proc_pid_arch_status() but that
>>> would be the tagged address ABI and tag check fault mode and intended
>>> for human consumption mostly. We don't have any ptrace interface that
>>> exposes HWCAPs. Since the gdbserver runs on the same machine as the
>>> debugged process, it can check the HWCAPs itself, they are the same for
>>> all processes.
>>
>> Sorry, I think i haven't made it clear. I already have access to HWCAP2 both
>> from GDB's and gdbserver's side. But HWCAP2 only indicates the availability
>> of a particular feature in a CPU, it doesn't necessarily means the traced
>> process is actively using MTE, right?
> 
> Right, but "actively" is not well defined either. The only way to tell
> whether a process is using MTE is to look for any PROT_MTE mappings. You
> can access these via /proc/<pid>/maps. In theory, one can use MTE
> without enabling the tagged address ABI or even tag checking (i.e. no
> prctl() call).
> 

I see the problem. I was hoping for a more immediate form of runtime 
check. One debuggers would validate and enable all the tag checks and 
register access at process attach/startup.

With that said, checking for PROT_MTE in /proc/<pid>/maps may still be 
useful, but a process with no immediate PROT_MTE maps doesn't mean such 
process won't attempt to use PROT_MTE later on. I'll have to factor that 
in, but I think it'll work.

I guess HWCAP2_MTE will be useful after all. We can just assume that 
whenever we have HWCAP2_MTE, we can fetch MTE registers and check for 
PROT_MTE.

>> So GDB/gdbserver would need runtime checks to be able to tell if a process
>> is using MTE, in which case the tools will pay attention to tags and
>> additional MTE-related registers (sctlr and gcr) we plan to make available
>> to userspace.
> 
> I'm happy to expose GCR_EL1.Excl and the SCTLR_EL1.TCF0 bits via ptrace
> as a thread state. The tags, however, are a property of the memory range
> rather than a per-thread state. That's what makes it different from
> other register-based features like SVE.

That's my understanding as well. I'm assuming, based on our previous 
discussion, that we'll have those couple registers under a regset (maybe 
NT_ARM_MTE).

> 
>> The original proposal was to have GDB send PTRACE_PEEKMTETAGS with a NULL
>> address and check the result. Then GDB would be able to decide if the
>> process is using MTE or not.
> 
> We don't store this information in the kernel as a bool and I don't
> think it would be useful either. I think gdb, when displaying memory,
> should attempt to show tags as well if the corresponding range was
> mapped with PROT_MTE. Just probing whether a thread ever used MTE
> doesn't help since you need to be more precise on which address supports
> tags.

Thanks for making this clear. Checking with ptrace won't work then. It 
seems like /proc/<pid>/maps is the way to go.

> 
>>> BTW, in my pre-v4 patches (hopefully I'll post v4 this week), I changed
>>> the ptrace tag access slightly to return an error (and no tags copied)
>>> if the page has not been mapped with PROT_MTE. The other option would
>>> have been read-as-zero/write-ignored as per the hardware behaviour.
>>> Either option is fine by me but I thought the write-ignored part would
>>> be more confusing for the debugger. If you have any preference here,
>>> please let me know.
>>
>> I think erroring out is a better alternative, as long as the debugger can
>> tell what the error means, like, for example, "this particular address
>> doesn't make use of tags".
> 
> And you could use this for probing whether the range has tags or not.
> With my current patches it returns -EFAULT but happy to change this to
> -EOPNOTSUPP or -EINVAL. Note that it only returns an error if no tags
> copied. If gdb asks for a range of two pages and only the first one has
> PROT_MTE, it will return 0 and set the number of tags copied equivalent
> to the first page. A subsequent call would return an error.
> 
> In my discussion with Dave on the documentation patch, I thought retries
> wouldn't be needed but in the above case it may be useful to get an
> error code. That's unless we change the interface to return an error and
> also update the user iovec structure.
> 

Let me think about this for a bit. I'm trying to factor in the 
/proc/<pid>/maps contents. If debuggers know which pages have PROT_MTE 
set, then we can teach the tools not to PEEK/POKE tags from/to those 
memory ranges, which simplifies the error handling a bit.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-11 16:40         ` Catalin Marinas
@ 2020-05-13 15:48           ` Dave Martin
  2020-05-14 11:37             ` Catalin Marinas
  0 siblings, 1 reply; 81+ messages in thread
From: Dave Martin @ 2020-05-13 15:48 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arch, Richard Earnshaw, Szabolcs Nagy, Andrey Konovalov,
	Kevin Brodsky, Peter Collingbourne, linux-mm, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel

On Mon, May 11, 2020 at 05:40:19PM +0100, Catalin Marinas wrote:
> On Mon, May 04, 2020 at 05:46:17PM +0100, Dave P Martin wrote:
> > On Thu, Apr 30, 2020 at 05:23:17PM +0100, Catalin Marinas wrote:
> > > On Wed, Apr 29, 2020 at 05:47:05PM +0100, Dave P Martin wrote:
> > > > On Tue, Apr 21, 2020 at 03:26:03PM +0100, Catalin Marinas wrote:
> > > > > +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the current
> > > > > +  thread, asynchronously following one or multiple tag check faults,
> > > > > +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0``.
> > > > 
> > > > For "current thread": that's a kernel concept.  For user-facing
> > > > documentation, can we say "the offending thread" or similar?
> > > > 
> > > > For clarity, it's worth saying that the faulting address is not
> > > > reported.  Or, we could be optimistic that someday this information will
> > > > be available and say that si_addr is the faulting address if available,
> > > > with 0 meaning the address is not available.
> > > > 
> > > > Maybe (void *)-1 would be better duff address, but I can't see it
> > > > mattering much.  If there's already precedent for si_addr==0 elsewhere,
> > > > it makes sense to follow it.
> > > 
> > > At a quick grep, I can see a few instances on other architectures where
> > > si_addr==0. I'll add a comment here.
> > 
> > OK, cool
> > 
> > Except: what if we're in PR_MTE_TCF_ASYNC mode.  If the SIGSEGV handler
> > triggers an asynchronous MTE fault itself, we could then get into a
> > spin.  Hmm.
> 
> How do we handle standard segfaults here? Presumably a signal handler
> can trigger a SIGSEGV itself.

This is similar to the problem is a data abort inside the data abort
handler.  It can of course happen, but if you don't want this to be
fatal then you code the handler carefully so this can't happen.

> > I take it we drain any pending MTE faults when crossing EL boundaries?
> 
> We clear the hardware bit on entry to EL1 from EL0 and set a TIF flag.
> 
> > In that case, an asynchronous MTE fault pending at sigreturn must have
> > been caused by the signal handler.  We could make that particular case
> > of MTE_AERR a force_sig.
> 
> We clear the TIF flag when delivering the signal. I don't think there is
> a way for the kernel to detect when it is running in a signal handler.
> sigreturn() is not mandatory either.

I guess we can put up with this signal not being fatal then.

If you have a SEGV handler at all, you're supposed to code it carefully.

This brings us back to force_sig for SERR and a normal signal for AERR.
That's probably OK.

> 
> > > > > +**Note**: Kernel accesses to user memory (e.g. ``read()`` system call)
> > > > > +are only checked if the current thread tag checking mode is
> > > > > +PR_MTE_TCF_SYNC.
> > > > 
> > > > Vague?  Can we make a precise statement about when the kernel will and
> > > > won't check such accesses?  And aren't there limitations (like use of
> > > > get_user_pages() etc.)?
> > > 
> > > We could make it slightly clearer by say "kernel accesses to the user
> > > address space".
> > 
> > That's not the ambiguity.
> > 
> > My question is
> > 
> > 1) Does the kernel guarantee not to check tags on kernel accesses to
> > user memory without PR_MTE_TCF_SYNC?
> 
> For ASYNC and NONE, yes, we can guarantee this.
> 
> > 2) Does the kernel guarantee to check tags on kernel accesses to user
> > memory with PR_MTE_TCF_SYNC?
> 
> I'd say yes but it depends on how much knowledge one has about the
> syscall implementation. If it's access to user address directly, it
> would be checked. If it goes via get_user_pages(), it won't. Since the
> user doesn't need to have knowledge of the kernel internals, you are
> right that we don't guarantee this.

So, from userspace it's not guaranteed.

This is what I'd describe as "making best efforts", but not a guarantee.

> > In practice, this note sounds to be more like a kernel implementation
> > detail rather than advice to userspace.
> > 
> > Would it make sense to say something like:
> > 
> >  * PR_MTE_TCF_NONE: the kernel does not check tags for kernel accesses
> >    to use memory done by syscalls in the thread.
> > 
> >  * PR_MTE_TCF_ASYNC: the kernel may check some tags for kernel accesses
> >    to user memory done by syscalls.  (Should we guarantee that such
> >    faults are reported synchronously on syscall exit?  In practice I
> >    think they are.  Should we use SEGV_MTESERR in this case?  Perhaps
> >    it's not worth making this a special case.)
> 
> Both NONE and ASYNC are now the same for kernel uaccess - not checked.
>
> For background information, I decided against ASYNC uaccess checking
> since (1) there are some cases where the kernel overreads
> (strncpy_from_user) and (2) we don't normally generate SIGSEGV on
> uaccess but rather return -EFAULT. The latter is not possible to contain
> since we only learn about the fault asynchronously, usually after the
> transfer.

I may be missing something here.  Do we still rely on the hardware to
detect tag mismatches in kernel accesses to user memory?  I was assuming
we do some kind of explicit checking, but now I think that's nonsense
(except for get_user_pages() etc.)


Since MTE is a new opt-in feature, I think we might have the option to
report failures with SIGSEGV instead of -EFAULT.  This seems exactly to
implement the concept of an asynchronous versus synchronous error. 

The kernel may not normally do this, but software usually doesn't use
raw syscalls.  In reality "syscalls" can trigger a SIGSEGV in the libc
wrapper anyway.  From the caller's point of view the whole thing is a
black box.

Probably needs discussion with the bionic / glibc folks though (though
likely this has been discussed already...)


My concern is that the spirit of asynchrous checking in the architecture
is that accesses _are_ checked, and we seem to be breaking that
principle here.

Although MTE's guarantees are statistical, based on small random numbers
not matching, this imperfection is quite different from systematically
not checking at all, ever, on certain major code paths.

> 
> >  * PR_MTE_TCF_SYNC: the kernel makes best efforts to check tags for
> >    kernel accesses to user memory done by the syscalls, but does not
> >    guarantee to check everything (or does it?  I thought we can't really
> >    do that for some odd cases...)
> 
> It doesn't. I'll add some notes along the lines of your text above.

OK

> > > > > +excludes all tags other than 0. A user thread can enable specific tags
> > > > > +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> > > > > +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
> > > > > +in the ``PR_MTE_TAG_MASK`` bit-field.
> > > > > +
> > > > > +**Note**: The hardware uses an exclude mask but the ``prctl()``
> > > > > +interface provides an include mask. An include mask of ``0`` (exclusion
> > > > > +mask ``0xffff``) results in the CPU always generating tag ``0``.
> > > > 
> > > > Is there no way to make this default to 1 rather than having a magic
> > > > meaning for 0?
> > > 
> > > We follow the hardware behaviour where 0xffff and 0xfffe give the same
> > > result.
> > 
> > Exposing this through a purely software interface seems a bit odd:
> > because the exclude mask is privileged-access-only, the architecture
> > could amend it to assign a different meaning to 0xffff, providing this
> > was an opt-in change.  Then we'd have to make a mess here.
> 
> You have a point. An include mask of 0 translates to an exclude mask of
> 0xffff as per the current patches. If the hardware gains support for one
> more bit (32 colours), old software running on new hardware may run into
> unexpected results with an exclude mask of 0xffff.
> 
> > Can't we just forbid the nonsense value 0 here, or are there other
> > reasons why that's problematic?
> 
> It was just easier to start with a default. I wonder whether we should
> actually switch back to the exclude mask, as per the hardware
> definition. This way 0 would mean all tags allowed. We can still
> disallow 0xffff as an exclude mask.

If the number of bits might grow, I guess we can make the exclude mask
full-width.

For example, the hardware can trivially exclude tags 16 and up, because
they don't exist anyway.

Similarly, the hardware can trivially include tags 16 and up: inclusion
only means that the hardware is allowed to generate them, not that it
guarantees to.

The only configuration that doesn't make sense is "no tags allowed", so
I'd argue for explicity blocking that, even if the architeture alises
that encoding to something else.

If we prefer 0 as a default value so that init inherits the correct
value from the kernel without any special acrobatics, then we make it an
exclude mask, with the semantics that the hardware is allowed to
generate any of these tags, but does not have to be capable of
generating all of them.

Make sense?  This is bikeshedding from my end...


Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-05-13 15:09           ` Luis Machado
@ 2020-05-13 16:45             ` Luis Machado
  2020-05-13 17:11               ` Catalin Marinas
  2020-05-18 16:47               ` Dave Martin
  0 siblings, 2 replies; 81+ messages in thread
From: Luis Machado @ 2020-05-13 16:45 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch, Alan Hayward,
	Omair Javaid

On 5/13/20 12:09 PM, Luis Machado wrote:
> On 5/13/20 11:11 AM, Catalin Marinas wrote:
>> On Wed, May 13, 2020 at 09:52:52AM -0300, Luis Machado wrote:
>>> On 5/13/20 7:48 AM, Catalin Marinas wrote:
>>>> On Tue, May 12, 2020 at 04:05:15PM -0300, Luis Machado wrote:
>>>>> On 4/21/20 11:25 AM, Catalin Marinas wrote:
>>>>>> Add support for bulk setting/getting of the MTE tags in a tracee's
>>>>>> address space at 'addr' in the ptrace() syscall prototype. 'data' 
>>>>>> points
>>>>>> to a struct iovec in the tracer's address space with iov_base
>>>>>> representing the address of a tracer's buffer of length iov_len. The
>>>>>> tags to be copied to/from the tracer's buffer are stored as one 
>>>>>> tag per
>>>>>> byte.
>>>>>>
>>>>>> On successfully copying at least one tag, ptrace() returns 0 and 
>>>>>> updates
>>>>>> the tracer's iov_len with the number of tags copied. In case of 
>>>>>> error,
>>>>>> either -EIO or -EFAULT is returned, trying to follow the ptrace() man
>>>>>> page.
>>>>>>
>>>>>> Note that the tag copying functions are not performance critical,
>>>>>> therefore they lack optimisations found in typical memory copy 
>>>>>> routines.
>>>>>>
>>>>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>>>>> Cc: Will Deacon <will@kernel.org>
>>>>>> Cc: Alan Hayward <Alan.Hayward@arm.com>
>>>>>> Cc: Luis Machado <luis.machado@linaro.org>
>>>>>> Cc: Omair Javaid <omair.javaid@linaro.org>
>>>>>
>>>>> I started working on MTE support for GDB and I'm wondering if we've 
>>>>> already
>>>>> defined a way to check for runtime MTE support (as opposed to a 
>>>>> HWCAP2-based
>>>>> check) in a traced process.
>>>>>
>>>>> Originally we were going to do it via empty-parameter ptrace calls, 
>>>>> but you
>>>>> had mentioned something about a proc-based method, if I'm not 
>>>>> mistaken.
>>>>
>>>> We could expose more information via proc_pid_arch_status() but that
>>>> would be the tagged address ABI and tag check fault mode and intended
>>>> for human consumption mostly. We don't have any ptrace interface that
>>>> exposes HWCAPs. Since the gdbserver runs on the same machine as the
>>>> debugged process, it can check the HWCAPs itself, they are the same for
>>>> all processes.
>>>
>>> Sorry, I think i haven't made it clear. I already have access to 
>>> HWCAP2 both
>>> from GDB's and gdbserver's side. But HWCAP2 only indicates the 
>>> availability
>>> of a particular feature in a CPU, it doesn't necessarily means the 
>>> traced
>>> process is actively using MTE, right?
>>
>> Right, but "actively" is not well defined either. The only way to tell
>> whether a process is using MTE is to look for any PROT_MTE mappings. You
>> can access these via /proc/<pid>/maps. In theory, one can use MTE
>> without enabling the tagged address ABI or even tag checking (i.e. no
>> prctl() call).
>>
> 
> I see the problem. I was hoping for a more immediate form of runtime 
> check. One debuggers would validate and enable all the tag checks and 
> register access at process attach/startup.
> 
> With that said, checking for PROT_MTE in /proc/<pid>/maps may still be 
> useful, but a process with no immediate PROT_MTE maps doesn't mean such 
> process won't attempt to use PROT_MTE later on. I'll have to factor that 
> in, but I think it'll work.
> 
> I guess HWCAP2_MTE will be useful after all. We can just assume that 
> whenever we have HWCAP2_MTE, we can fetch MTE registers and check for 
> PROT_MTE.
> 
>>> So GDB/gdbserver would need runtime checks to be able to tell if a 
>>> process
>>> is using MTE, in which case the tools will pay attention to tags and
>>> additional MTE-related registers (sctlr and gcr) we plan to make 
>>> available
>>> to userspace.
>>
>> I'm happy to expose GCR_EL1.Excl and the SCTLR_EL1.TCF0 bits via ptrace
>> as a thread state. The tags, however, are a property of the memory range
>> rather than a per-thread state. That's what makes it different from
>> other register-based features like SVE.
> 
> That's my understanding as well. I'm assuming, based on our previous 
> discussion, that we'll have those couple registers under a regset (maybe 
> NT_ARM_MTE).
> 
>>
>>> The original proposal was to have GDB send PTRACE_PEEKMTETAGS with a 
>>> NULL
>>> address and check the result. Then GDB would be able to decide if the
>>> process is using MTE or not.
>>
>> We don't store this information in the kernel as a bool and I don't
>> think it would be useful either. I think gdb, when displaying memory,
>> should attempt to show tags as well if the corresponding range was
>> mapped with PROT_MTE. Just probing whether a thread ever used MTE
>> doesn't help since you need to be more precise on which address supports
>> tags.
> 
> Thanks for making this clear. Checking with ptrace won't work then. It 
> seems like /proc/<pid>/maps is the way to go.
> 
>>
>>>> BTW, in my pre-v4 patches (hopefully I'll post v4 this week), I changed
>>>> the ptrace tag access slightly to return an error (and no tags copied)
>>>> if the page has not been mapped with PROT_MTE. The other option would
>>>> have been read-as-zero/write-ignored as per the hardware behaviour.
>>>> Either option is fine by me but I thought the write-ignored part would
>>>> be more confusing for the debugger. If you have any preference here,
>>>> please let me know.
>>>
>>> I think erroring out is a better alternative, as long as the debugger 
>>> can
>>> tell what the error means, like, for example, "this particular address
>>> doesn't make use of tags".
>>
>> And you could use this for probing whether the range has tags or not.
>> With my current patches it returns -EFAULT but happy to change this to
>> -EOPNOTSUPP or -EINVAL. Note that it only returns an error if no tags
>> copied. If gdb asks for a range of two pages and only the first one has
>> PROT_MTE, it will return 0 and set the number of tags copied equivalent
>> to the first page. A subsequent call would return an error.
>>
>> In my discussion with Dave on the documentation patch, I thought retries
>> wouldn't be needed but in the above case it may be useful to get an
>> error code. That's unless we change the interface to return an error and
>> also update the user iovec structure.
>>
> 
> Let me think about this for a bit. I'm trying to factor in the 
> /proc/<pid>/maps contents. If debuggers know which pages have PROT_MTE 
> set, then we can teach the tools not to PEEK/POKE tags from/to those 
> memory ranges, which simplifies the error handling a bit.

I was checking the output of /proc/<pid>/maps and it doesn't seem to 
contain flags against which i can match PROT_MTE. It seems 
/proc/<pid>/smaps is the one that contains the flags (mt) for MTE. Am i 
missing something?

Is this the only place debuggers can check for PROT_MTE? If so, that's 
unfortunate. /proc/<pid>/smaps doesn't seem to be convenient for parsing.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-05-13 16:45             ` Luis Machado
@ 2020-05-13 17:11               ` Catalin Marinas
  2020-05-18 16:47               ` Dave Martin
  1 sibling, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-05-13 17:11 UTC (permalink / raw)
  To: Luis Machado
  Cc: linux-arm-kernel, Will Deacon, Vincenzo Frascino, Szabolcs Nagy,
	Richard Earnshaw, Kevin Brodsky, Andrey Konovalov,
	Peter Collingbourne, linux-mm, linux-arch, Alan Hayward,
	Omair Javaid

On Wed, May 13, 2020 at 01:45:27PM -0300, Luis Machado wrote:
> On 5/13/20 12:09 PM, Luis Machado wrote:
> > Let me think about this for a bit. I'm trying to factor in the
> > /proc/<pid>/maps contents. If debuggers know which pages have PROT_MTE
> > set, then we can teach the tools not to PEEK/POKE tags from/to those
> > memory ranges, which simplifies the error handling a bit.
> 
> I was checking the output of /proc/<pid>/maps and it doesn't seem to contain
> flags against which i can match PROT_MTE. It seems /proc/<pid>/smaps is the
> one that contains the flags (mt) for MTE. Am i missing something?

You are right, the smaps is the one with the MTE information.

> Is this the only place debuggers can check for PROT_MTE? If so, that's
> unfortunate. /proc/<pid>/smaps doesn't seem to be convenient for parsing.

We can't change 'maps' as it's a pretty standard format with rwxp
properties only.

If you don't want to check any /proc file, just attempt to read the tags
and check the ptrace return code. The downside is that you can't easily
probe if a process is using MTE or not. But is this piece of information
relevant? The gdb user should know what to look for (well, it's been a
while since I used a debugger ;)).

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-13 15:48           ` Dave Martin
@ 2020-05-14 11:37             ` Catalin Marinas
  2020-05-15 10:38               ` Catalin Marinas
  2020-05-18 17:13               ` Catalin Marinas
  0 siblings, 2 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-05-14 11:37 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arch, Richard Earnshaw, Szabolcs Nagy, Andrey Konovalov,
	Kevin Brodsky, Peter Collingbourne, linux-mm, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel

On Wed, May 13, 2020 at 04:48:46PM +0100, Dave P Martin wrote:
> On Mon, May 11, 2020 at 05:40:19PM +0100, Catalin Marinas wrote:
> > On Mon, May 04, 2020 at 05:46:17PM +0100, Dave P Martin wrote:
> > > On Thu, Apr 30, 2020 at 05:23:17PM +0100, Catalin Marinas wrote:
> > > > On Wed, Apr 29, 2020 at 05:47:05PM +0100, Dave P Martin wrote:
> > > > > On Tue, Apr 21, 2020 at 03:26:03PM +0100, Catalin Marinas wrote:
> > > > > > +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the current
> > > > > > +  thread, asynchronously following one or multiple tag check faults,
> > > > > > +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0``.
> > > > > 
> > > > > For "current thread": that's a kernel concept.  For user-facing
> > > > > documentation, can we say "the offending thread" or similar?
> > > > > 
> > > > > For clarity, it's worth saying that the faulting address is not
> > > > > reported.  Or, we could be optimistic that someday this information will
> > > > > be available and say that si_addr is the faulting address if available,
> > > > > with 0 meaning the address is not available.
> > > > > 
> > > > > Maybe (void *)-1 would be better duff address, but I can't see it
> > > > > mattering much.  If there's already precedent for si_addr==0 elsewhere,
> > > > > it makes sense to follow it.
> > > > 
> > > > At a quick grep, I can see a few instances on other architectures where
> > > > si_addr==0. I'll add a comment here.
> > > 
> > > OK, cool
> > > 
> > > Except: what if we're in PR_MTE_TCF_ASYNC mode.  If the SIGSEGV handler
> > > triggers an asynchronous MTE fault itself, we could then get into a
> > > spin.  Hmm.
[...]
> > > In that case, an asynchronous MTE fault pending at sigreturn must have
> > > been caused by the signal handler.  We could make that particular case
> > > of MTE_AERR a force_sig.
> > 
> > We clear the TIF flag when delivering the signal. I don't think there is
> > a way for the kernel to detect when it is running in a signal handler.
> > sigreturn() is not mandatory either.
> 
> I guess we can put up with this signal not being fatal then.
> 
> If you have a SEGV handler at all, you're supposed to code it carefully.
> 
> This brings us back to force_sig for SERR and a normal signal for AERR.
> That's probably OK.

I think we are in agreement now but please check the patches when I post
the v4.

> > > > > > +**Note**: Kernel accesses to user memory (e.g. ``read()`` system call)
> > > > > > +are only checked if the current thread tag checking mode is
> > > > > > +PR_MTE_TCF_SYNC.
> > > > > 
> > > > > Vague?  Can we make a precise statement about when the kernel will and
> > > > > won't check such accesses?  And aren't there limitations (like use of
> > > > > get_user_pages() etc.)?
> > > > 
> > > > We could make it slightly clearer by say "kernel accesses to the user
> > > > address space".
> > > 
> > > That's not the ambiguity.
> > > 
> > > My question is
> > > 
> > > 1) Does the kernel guarantee not to check tags on kernel accesses to
> > > user memory without PR_MTE_TCF_SYNC?
[...]
> > > 2) Does the kernel guarantee to check tags on kernel accesses to user
> > > memory with PR_MTE_TCF_SYNC?
[...]
> > > In practice, this note sounds to be more like a kernel implementation
> > > detail rather than advice to userspace.
> > > 
> > > Would it make sense to say something like:
> > > 
> > >  * PR_MTE_TCF_NONE: the kernel does not check tags for kernel accesses
> > >    to use memory done by syscalls in the thread.
> > > 
> > >  * PR_MTE_TCF_ASYNC: the kernel may check some tags for kernel accesses
> > >    to user memory done by syscalls.  (Should we guarantee that such
> > >    faults are reported synchronously on syscall exit?  In practice I
> > >    think they are.  Should we use SEGV_MTESERR in this case?  Perhaps
> > >    it's not worth making this a special case.)
> > 
> > Both NONE and ASYNC are now the same for kernel uaccess - not checked.
> >
> > For background information, I decided against ASYNC uaccess checking
> > since (1) there are some cases where the kernel overreads
> > (strncpy_from_user) and (2) we don't normally generate SIGSEGV on
> > uaccess but rather return -EFAULT. The latter is not possible to contain
> > since we only learn about the fault asynchronously, usually after the
> > transfer.
> 
> I may be missing something here.  Do we still rely on the hardware to
> detect tag mismatches in kernel accesses to user memory?  I was assuming
> we do some kind of explicit checking, but now I think that's nonsense
> (except for get_user_pages() etc.)

For synchronous tag checking, we expect the uaccess (via the user
address, e.g. copy_from_user()) to be checked by the hardware. If the
access happens via a kernel mapping (get_user_pages()), the access is
unchecked. There is no point in an explicit tag access+check from the
kernel since the get_user_pages() accesses are not expected to generate
faults anyway (once the pages have been returned). We also most likely
lost the actual user address at the point of access, so not easy to
infer the original tag.

> Since MTE is a new opt-in feature, I think we might have the option to
> report failures with SIGSEGV instead of -EFAULT.  This seems exactly to
> implement the concept of an asynchronous versus synchronous error. 

With synchronous checking, we return -EFAULT, smaller number of bytes
etc. since no/less data was copied. With async, the uaccess would
perform all the accesses, only that the user may get a SIGSEGV delivered
on return from the syscall.

> The kernel may not normally do this, but software usually doesn't use
> raw syscalls.  In reality "syscalls" can trigger a SIGSEGV in the libc
> wrapper anyway.  From the caller's point of view the whole thing is a
> black box.
> 
> Probably needs discussion with the bionic / glibc folks though (though
> likely this has been discussed already...)

The initial plan was to generate SIGSEGV on asynchronous faults for
uaccess (on syscall return). This changed when we noticed (in version 3
I think) that the kernel over-reads buffers in some cases
(strncpy_from_user(), copy_mount_options()) and triggers false
positives.

We could fix the above two cases, though in different ways:
strncpy_from_user() can align its source (user) address and would no
longer be expected to trigger a fault if the string is correctly tagged.
copy_mount_options(), OTOH, always reads 4K (not zero-terminated), so it
will trip over some tag mismatch. The workaround is to contain the async
tag check fault (with DSB before and after the access) and ignore it.

However, are these the only two cases where the kernel over-reads user
buffers? Without MTE, such faults on uaccess (page faults) were handled
by the kernel transparently. We may now start delivering SIGSEGV every
time some piece of uaccess kernel code changes and over-reads.

> My concern is that the spirit of asynchrous checking in the
> architecture is that accesses _are_ checked, and we seem to be
> breaking that principle here.

I agree with you on the principle but my concern is about the
practicality of chasing any future code changes and plugging potentially
fatal SIGSEGVs sent to the user.

Maybe we need a way to log this so that user (admin) can do something
about it like force synchronous. Or we could also toggle synchronous
uaccesses irrespective of the user mode or expose this option as a
prctl().

Also, do we want some big knob (sysctl) to force some of these modes for
all user processes: e.g. force-upgrade async to sync?

> > > > > > +excludes all tags other than 0. A user thread can enable specific tags
> > > > > > +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> > > > > > +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
> > > > > > +in the ``PR_MTE_TAG_MASK`` bit-field.
> > > > > > +
> > > > > > +**Note**: The hardware uses an exclude mask but the ``prctl()``
> > > > > > +interface provides an include mask. An include mask of ``0`` (exclusion
> > > > > > +mask ``0xffff``) results in the CPU always generating tag ``0``.
> > > > > 
> > > > > Is there no way to make this default to 1 rather than having a magic
> > > > > meaning for 0?
> > > > 
> > > > We follow the hardware behaviour where 0xffff and 0xfffe give the same
> > > > result.
> > > 
> > > Exposing this through a purely software interface seems a bit odd:
> > > because the exclude mask is privileged-access-only, the architecture
> > > could amend it to assign a different meaning to 0xffff, providing this
> > > was an opt-in change.  Then we'd have to make a mess here.
> > 
> > You have a point. An include mask of 0 translates to an exclude mask of
> > 0xffff as per the current patches. If the hardware gains support for one
> > more bit (32 colours), old software running on new hardware may run into
> > unexpected results with an exclude mask of 0xffff.
> > 
> > > Can't we just forbid the nonsense value 0 here, or are there other
> > > reasons why that's problematic?
> > 
> > It was just easier to start with a default. I wonder whether we should
> > actually switch back to the exclude mask, as per the hardware
> > definition. This way 0 would mean all tags allowed. We can still
> > disallow 0xffff as an exclude mask.
[...]
> The only configuration that doesn't make sense is "no tags allowed", so
> I'd argue for explicity blocking that, even if the architeture aliases
> that encoding to something else.
> 
> If we prefer 0 as a default value so that init inherits the correct
> value from the kernel without any special acrobatics, then we make it an
> exclude mask, with the semantics that the hardware is allowed to
> generate any of these tags, but does not have to be capable of
> generating all of them.

That's more of a question to the libc people and their preference.
We have two options with suboptions:

1. prctl() gets an exclude mask with 0xffff illegal even though the
   hardware accepts it:
   a) default exclude mask 0, allowing all tags to be generated by IRG
   b) default exclude mask of 0xfffe so that only tag 0 is generated

2. prctl() gets an include mask with 0 illegal:
   a) default include mask is 0xffff, allowing all tags to be generated
   b) default include mask 0f 0x0001 so that only tag 0 is generated

We currently have (2) with mask 0 but could be changed to (2.b). If we
are to follow the hardware description (which makes more sense to me but
I don't write the C library), (1.a) is the most appropriate.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-14 11:37             ` Catalin Marinas
@ 2020-05-15 10:38               ` Catalin Marinas
  2020-05-15 11:14                 ` Szabolcs Nagy
  2020-05-18 17:13               ` Catalin Marinas
  1 sibling, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-05-15 10:38 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arch, Richard Earnshaw, Szabolcs Nagy, Andrey Konovalov,
	Kevin Brodsky, Peter Collingbourne, linux-mm, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel

On Thu, May 14, 2020 at 12:37:22PM +0100, Catalin Marinas wrote:
> On Wed, May 13, 2020 at 04:48:46PM +0100, Dave P Martin wrote:
> > > > > On Wed, Apr 29, 2020 at 05:47:05PM +0100, Dave P Martin wrote:
> > > > > > On Tue, Apr 21, 2020 at 03:26:03PM +0100, Catalin Marinas wrote:
> > > > > > > +excludes all tags other than 0. A user thread can enable specific tags
> > > > > > > +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> > > > > > > +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
> > > > > > > +in the ``PR_MTE_TAG_MASK`` bit-field.
> > > > > > > +
> > > > > > > +**Note**: The hardware uses an exclude mask but the ``prctl()``
> > > > > > > +interface provides an include mask. An include mask of ``0`` (exclusion
> > > > > > > +mask ``0xffff``) results in the CPU always generating tag ``0``.
> > > > > > 
> > > > > > Is there no way to make this default to 1 rather than having a magic
> > > > > > meaning for 0?
> [...]
> > The only configuration that doesn't make sense is "no tags allowed", so
> > I'd argue for explicity blocking that, even if the architeture aliases
> > that encoding to something else.
> > 
> > If we prefer 0 as a default value so that init inherits the correct
> > value from the kernel without any special acrobatics, then we make it an
> > exclude mask, with the semantics that the hardware is allowed to
> > generate any of these tags, but does not have to be capable of
> > generating all of them.
> 
> That's more of a question to the libc people and their preference.
> We have two options with suboptions:
> 
> 1. prctl() gets an exclude mask with 0xffff illegal even though the
>    hardware accepts it:
>    a) default exclude mask 0, allowing all tags to be generated by IRG
>    b) default exclude mask of 0xfffe so that only tag 0 is generated
> 
> 2. prctl() gets an include mask with 0 illegal:
>    a) default include mask is 0xffff, allowing all tags to be generated
>    b) default include mask 0f 0x0001 so that only tag 0 is generated
> 
> We currently have (2) with mask 0 but could be changed to (2.b). If we
> are to follow the hardware description (which makes more sense to me but
> I don't write the C library), (1.a) is the most appropriate.

Thinking some more about this, as we are to expose the GCR_EL1.Excl via
a ptrace interface as a regset, it makes more sense to move back to an
exclude mask here with default 0. That would be option 1.a above.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-15 10:38               ` Catalin Marinas
@ 2020-05-15 11:14                 ` Szabolcs Nagy
  2020-05-15 11:27                   ` Catalin Marinas
  0 siblings, 1 reply; 81+ messages in thread
From: Szabolcs Nagy @ 2020-05-15 11:14 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Dave Martin, linux-arch, Richard Earnshaw, Andrey Konovalov,
	Kevin Brodsky, Peter Collingbourne, linux-mm, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

The 05/15/2020 11:38, Catalin Marinas wrote:
> On Thu, May 14, 2020 at 12:37:22PM +0100, Catalin Marinas wrote:
> > On Wed, May 13, 2020 at 04:48:46PM +0100, Dave P Martin wrote:
> > > > > > On Wed, Apr 29, 2020 at 05:47:05PM +0100, Dave P Martin wrote:
> > > > > > > On Tue, Apr 21, 2020 at 03:26:03PM +0100, Catalin Marinas wrote:
> > > > > > > > +excludes all tags other than 0. A user thread can enable specific tags
> > > > > > > > +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> > > > > > > > +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
> > > > > > > > +in the ``PR_MTE_TAG_MASK`` bit-field.
> > > > > > > > +
> > > > > > > > +**Note**: The hardware uses an exclude mask but the ``prctl()``
> > > > > > > > +interface provides an include mask. An include mask of ``0`` (exclusion
> > > > > > > > +mask ``0xffff``) results in the CPU always generating tag ``0``.
> > > > > > > 
> > > > > > > Is there no way to make this default to 1 rather than having a magic
> > > > > > > meaning for 0?
> > [...]
> > > The only configuration that doesn't make sense is "no tags allowed", so
> > > I'd argue for explicity blocking that, even if the architeture aliases
> > > that encoding to something else.
> > > 
> > > If we prefer 0 as a default value so that init inherits the correct
> > > value from the kernel without any special acrobatics, then we make it an
> > > exclude mask, with the semantics that the hardware is allowed to
> > > generate any of these tags, but does not have to be capable of
> > > generating all of them.
> > 
> > That's more of a question to the libc people and their preference.
> > We have two options with suboptions:
> > 
> > 1. prctl() gets an exclude mask with 0xffff illegal even though the
> >    hardware accepts it:
> >    a) default exclude mask 0, allowing all tags to be generated by IRG
> >    b) default exclude mask of 0xfffe so that only tag 0 is generated
> > 
> > 2. prctl() gets an include mask with 0 illegal:
> >    a) default include mask is 0xffff, allowing all tags to be generated
> >    b) default include mask 0f 0x0001 so that only tag 0 is generated
> > 
> > We currently have (2) with mask 0 but could be changed to (2.b). If we
> > are to follow the hardware description (which makes more sense to me but
> > I don't write the C library), (1.a) is the most appropriate.
> 
> Thinking some more about this, as we are to expose the GCR_EL1.Excl via
> a ptrace interface as a regset, it makes more sense to move back to an
> exclude mask here with default 0. That would be option 1.a above.

i think the libc has to do a prctl call to set
mte up and at that point it will use whatever
arguments necessary, so 1.a should work (just
like the other options).

likely libc will disable 0 for irg and possibly
one or two other fixed colors (which will have
specific use).

the difference i see between 1 vs 2 is forward
compatibility if the architecture changes (e.g.
adding more tag bits) but then likely new prctl
flag will be needed for handling that so it's
probably not an issue.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-15 11:14                 ` Szabolcs Nagy
@ 2020-05-15 11:27                   ` Catalin Marinas
  2020-05-15 12:04                     ` Szabolcs Nagy
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-05-15 11:27 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Dave Martin, linux-arch, Richard Earnshaw, Andrey Konovalov,
	Kevin Brodsky, Peter Collingbourne, linux-mm, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

On Fri, May 15, 2020 at 12:14:00PM +0100, Szabolcs Nagy wrote:
> The 05/15/2020 11:38, Catalin Marinas wrote:
> > On Thu, May 14, 2020 at 12:37:22PM +0100, Catalin Marinas wrote:
> > > We have two options with suboptions:
> > > 
> > > 1. prctl() gets an exclude mask with 0xffff illegal even though the
> > >    hardware accepts it:
> > >    a) default exclude mask 0, allowing all tags to be generated by IRG
> > >    b) default exclude mask of 0xfffe so that only tag 0 is generated
> > > 
> > > 2. prctl() gets an include mask with 0 illegal:
> > >    a) default include mask is 0xffff, allowing all tags to be generated
> > >    b) default include mask 0f 0x0001 so that only tag 0 is generated
> > > 
> > > We currently have (2) with mask 0 but could be changed to (2.b). If we
> > > are to follow the hardware description (which makes more sense to me but
> > > I don't write the C library), (1.a) is the most appropriate.
> > 
> > Thinking some more about this, as we are to expose the GCR_EL1.Excl via
> > a ptrace interface as a regset, it makes more sense to move back to an
> > exclude mask here with default 0. That would be option 1.a above.
> 
> i think the libc has to do a prctl call to set
> mte up and at that point it will use whatever
> arguments necessary, so 1.a should work (just
> like the other options).
> 
> likely libc will disable 0 for irg and possibly
> one or two other fixed colors (which will have
> specific use).
> 
> the difference i see between 1 vs 2 is forward
> compatibility if the architecture changes (e.g.
> adding more tag bits) but then likely new prctl
> flag will be needed for handling that so it's
> probably not an issue.

Thanks Szabolcs. While we are at this, no-one so far asked for the
GCR_EL1.RRND to be exposed to user (and this implies RGSR_EL1.SEED).
Since RRND=1 guarantees a distribution "no worse" than that of RRND=0, I
thought there isn't much point in exposing this configuration to the
user. The only advantage of RRND=0 I see is that the kernel can change
the seed randomly but, with only 4 bits per tag, it really doesn't
matter much.

Anyway, mentioning it here in case anyone is surprised later about the
lack of RRND configurability.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-15 11:27                   ` Catalin Marinas
@ 2020-05-15 12:04                     ` Szabolcs Nagy
  2020-05-15 12:13                       ` Catalin Marinas
  0 siblings, 1 reply; 81+ messages in thread
From: Szabolcs Nagy @ 2020-05-15 12:04 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Dave Martin, linux-arch, Richard Earnshaw, Andrey Konovalov,
	Kevin Brodsky, Peter Collingbourne, linux-mm, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

The 05/15/2020 12:27, Catalin Marinas wrote:
> Thanks Szabolcs. While we are at this, no-one so far asked for the
> GCR_EL1.RRND to be exposed to user (and this implies RGSR_EL1.SEED).
> Since RRND=1 guarantees a distribution "no worse" than that of RRND=0, I
> thought there isn't much point in exposing this configuration to the
> user. The only advantage of RRND=0 I see is that the kernel can change

it seems RRND=1 is the impl specific algorithm.

> the seed randomly but, with only 4 bits per tag, it really doesn't
> matter much.
> 
> Anyway, mentioning it here in case anyone is surprised later about the
> lack of RRND configurability.

i'm not familiar with how irg works.

is the seed per process state that's set
up at process startup in some way?
or shared (and thus effectively irg is
non-deterministic in userspace)?


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-15 12:04                     ` Szabolcs Nagy
@ 2020-05-15 12:13                       ` Catalin Marinas
  2020-05-15 12:53                         ` Szabolcs Nagy
  0 siblings, 1 reply; 81+ messages in thread
From: Catalin Marinas @ 2020-05-15 12:13 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Dave Martin, linux-arch, Richard Earnshaw, Andrey Konovalov,
	Kevin Brodsky, Peter Collingbourne, linux-mm, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

On Fri, May 15, 2020 at 01:04:33PM +0100, Szabolcs Nagy wrote:
> The 05/15/2020 12:27, Catalin Marinas wrote:
> > Thanks Szabolcs. While we are at this, no-one so far asked for the
> > GCR_EL1.RRND to be exposed to user (and this implies RGSR_EL1.SEED).
> > Since RRND=1 guarantees a distribution "no worse" than that of RRND=0, I
> > thought there isn't much point in exposing this configuration to the
> > user. The only advantage of RRND=0 I see is that the kernel can change
> 
> it seems RRND=1 is the impl specific algorithm.

Yes, that's the implementation specific algorithm which shouldn't be
worse than the standard one.

> > the seed randomly but, with only 4 bits per tag, it really doesn't
> > matter much.
> > 
> > Anyway, mentioning it here in case anyone is surprised later about the
> > lack of RRND configurability.
> 
> i'm not familiar with how irg works.

It generates a random tag based on some algorithm.

> is the seed per process state that's set up at process startup in some
> way? or shared (and thus effectively irg is non-deterministic in
> userspace)?

The seed is only relevant if the standard algorithm is used (RRND=0).

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-15 12:13                       ` Catalin Marinas
@ 2020-05-15 12:53                         ` Szabolcs Nagy
  2020-05-18 16:52                           ` Dave Martin
  0 siblings, 1 reply; 81+ messages in thread
From: Szabolcs Nagy @ 2020-05-15 12:53 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Dave Martin, linux-arch, Richard Earnshaw, Andrey Konovalov,
	Kevin Brodsky, Peter Collingbourne, linux-mm, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel, nd

The 05/15/2020 13:13, Catalin Marinas wrote:
> On Fri, May 15, 2020 at 01:04:33PM +0100, Szabolcs Nagy wrote:
> > The 05/15/2020 12:27, Catalin Marinas wrote:
> > > Thanks Szabolcs. While we are at this, no-one so far asked for the
> > > GCR_EL1.RRND to be exposed to user (and this implies RGSR_EL1.SEED).
> > > Since RRND=1 guarantees a distribution "no worse" than that of RRND=0, I
> > > thought there isn't much point in exposing this configuration to the
> > > user. The only advantage of RRND=0 I see is that the kernel can change
> > 
> > it seems RRND=1 is the impl specific algorithm.
> 
> Yes, that's the implementation specific algorithm which shouldn't be
> worse than the standard one.
> 
> > > the seed randomly but, with only 4 bits per tag, it really doesn't
> > > matter much.
> > > 
> > > Anyway, mentioning it here in case anyone is surprised later about the
> > > lack of RRND configurability.
> > 
> > i'm not familiar with how irg works.
> 
> It generates a random tag based on some algorithm.
> 
> > is the seed per process state that's set up at process startup in some
> > way? or shared (and thus effectively irg is non-deterministic in
> > userspace)?
> 
> The seed is only relevant if the standard algorithm is used (RRND=0).

i wanted to understand if we can get deterministic
irg behaviour in user space (which may be useful
for debugging to get reproducible tag failures).

i guess if no control is exposed that means non-
deterministic irg. i think this is fine.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-05-13 16:45             ` Luis Machado
  2020-05-13 17:11               ` Catalin Marinas
@ 2020-05-18 16:47               ` Dave Martin
  2020-05-18 17:12                 ` Luis Machado
  1 sibling, 1 reply; 81+ messages in thread
From: Dave Martin @ 2020-05-18 16:47 UTC (permalink / raw)
  To: Luis Machado
  Cc: Catalin Marinas, linux-arch, Richard Earnshaw, Omair Javaid,
	Szabolcs Nagy, Andrey Konovalov, Kevin Brodsky,
	Peter Collingbourne, linux-mm, Alan Hayward, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel

On Wed, May 13, 2020 at 01:45:27PM -0300, Luis Machado wrote:
> On 5/13/20 12:09 PM, Luis Machado wrote:
> >On 5/13/20 11:11 AM, Catalin Marinas wrote:
> >>On Wed, May 13, 2020 at 09:52:52AM -0300, Luis Machado wrote:
> >>>On 5/13/20 7:48 AM, Catalin Marinas wrote:
> >>>>On Tue, May 12, 2020 at 04:05:15PM -0300, Luis Machado wrote:
> >>>>>On 4/21/20 11:25 AM, Catalin Marinas wrote:
> >>>>>>Add support for bulk setting/getting of the MTE tags in a tracee's
> >>>>>>address space at 'addr' in the ptrace() syscall prototype.
> >>>>>>'data' points
> >>>>>>to a struct iovec in the tracer's address space with iov_base
> >>>>>>representing the address of a tracer's buffer of length iov_len. The
> >>>>>>tags to be copied to/from the tracer's buffer are stored as one
> >>>>>>tag per
> >>>>>>byte.
> >>>>>>
> >>>>>>On successfully copying at least one tag, ptrace() returns 0 and
> >>>>>>updates
> >>>>>>the tracer's iov_len with the number of tags copied. In case of
> >>>>>>error,
> >>>>>>either -EIO or -EFAULT is returned, trying to follow the ptrace() man
> >>>>>>page.
> >>>>>>
> >>>>>>Note that the tag copying functions are not performance critical,
> >>>>>>therefore they lack optimisations found in typical memory copy
> >>>>>>routines.
> >>>>>>
> >>>>>>Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> >>>>>>Cc: Will Deacon <will@kernel.org>
> >>>>>>Cc: Alan Hayward <Alan.Hayward@arm.com>
> >>>>>>Cc: Luis Machado <luis.machado@linaro.org>
> >>>>>>Cc: Omair Javaid <omair.javaid@linaro.org>
> >>>>>
> >>>>>I started working on MTE support for GDB and I'm wondering if
> >>>>>we've already
> >>>>>defined a way to check for runtime MTE support (as opposed to a
> >>>>>HWCAP2-based
> >>>>>check) in a traced process.
> >>>>>
> >>>>>Originally we were going to do it via empty-parameter ptrace
> >>>>>calls, but you
> >>>>>had mentioned something about a proc-based method, if I'm not
> >>>>>mistaken.
> >>>>
> >>>>We could expose more information via proc_pid_arch_status() but that
> >>>>would be the tagged address ABI and tag check fault mode and intended
> >>>>for human consumption mostly. We don't have any ptrace interface that
> >>>>exposes HWCAPs. Since the gdbserver runs on the same machine as the
> >>>>debugged process, it can check the HWCAPs itself, they are the same for
> >>>>all processes.
> >>>
> >>>Sorry, I think i haven't made it clear. I already have access to
> >>>HWCAP2 both
> >>>from GDB's and gdbserver's side. But HWCAP2 only indicates the
> >>>availability
> >>>of a particular feature in a CPU, it doesn't necessarily means the
> >>>traced
> >>>process is actively using MTE, right?
> >>
> >>Right, but "actively" is not well defined either. The only way to tell
> >>whether a process is using MTE is to look for any PROT_MTE mappings. You
> >>can access these via /proc/<pid>/maps. In theory, one can use MTE
> >>without enabling the tagged address ABI or even tag checking (i.e. no
> >>prctl() call).
> >>
> >
> >I see the problem. I was hoping for a more immediate form of runtime
> >check. One debuggers would validate and enable all the tag checks and
> >register access at process attach/startup.
> >
> >With that said, checking for PROT_MTE in /proc/<pid>/maps may still be
> >useful, but a process with no immediate PROT_MTE maps doesn't mean such
> >process won't attempt to use PROT_MTE later on. I'll have to factor that
> >in, but I think it'll work.
> >
> >I guess HWCAP2_MTE will be useful after all. We can just assume that
> >whenever we have HWCAP2_MTE, we can fetch MTE registers and check for
> >PROT_MTE.
> >
> >>>So GDB/gdbserver would need runtime checks to be able to tell if a
> >>>process
> >>>is using MTE, in which case the tools will pay attention to tags and
> >>>additional MTE-related registers (sctlr and gcr) we plan to make
> >>>available
> >>>to userspace.
> >>
> >>I'm happy to expose GCR_EL1.Excl and the SCTLR_EL1.TCF0 bits via ptrace
> >>as a thread state. The tags, however, are a property of the memory range
> >>rather than a per-thread state. That's what makes it different from
> >>other register-based features like SVE.
> >
> >That's my understanding as well. I'm assuming, based on our previous
> >discussion, that we'll have those couple registers under a regset (maybe
> >NT_ARM_MTE).
> >
> >>
> >>>The original proposal was to have GDB send PTRACE_PEEKMTETAGS with a
> >>>NULL
> >>>address and check the result. Then GDB would be able to decide if the
> >>>process is using MTE or not.
> >>
> >>We don't store this information in the kernel as a bool and I don't
> >>think it would be useful either. I think gdb, when displaying memory,
> >>should attempt to show tags as well if the corresponding range was
> >>mapped with PROT_MTE. Just probing whether a thread ever used MTE
> >>doesn't help since you need to be more precise on which address supports
> >>tags.
> >
> >Thanks for making this clear. Checking with ptrace won't work then. It
> >seems like /proc/<pid>/maps is the way to go.
> >
> >>
> >>>>BTW, in my pre-v4 patches (hopefully I'll post v4 this week), I changed
> >>>>the ptrace tag access slightly to return an error (and no tags copied)
> >>>>if the page has not been mapped with PROT_MTE. The other option would
> >>>>have been read-as-zero/write-ignored as per the hardware behaviour.
> >>>>Either option is fine by me but I thought the write-ignored part would
> >>>>be more confusing for the debugger. If you have any preference here,
> >>>>please let me know.
> >>>
> >>>I think erroring out is a better alternative, as long as the debugger
> >>>can
> >>>tell what the error means, like, for example, "this particular address
> >>>doesn't make use of tags".
> >>
> >>And you could use this for probing whether the range has tags or not.
> >>With my current patches it returns -EFAULT but happy to change this to
> >>-EOPNOTSUPP or -EINVAL. Note that it only returns an error if no tags
> >>copied. If gdb asks for a range of two pages and only the first one has
> >>PROT_MTE, it will return 0 and set the number of tags copied equivalent
> >>to the first page. A subsequent call would return an error.
> >>
> >>In my discussion with Dave on the documentation patch, I thought retries
> >>wouldn't be needed but in the above case it may be useful to get an
> >>error code. That's unless we change the interface to return an error and
> >>also update the user iovec structure.
> >>
> >
> >Let me think about this for a bit. I'm trying to factor in the
> >/proc/<pid>/maps contents. If debuggers know which pages have PROT_MTE
> >set, then we can teach the tools not to PEEK/POKE tags from/to those
> >memory ranges, which simplifies the error handling a bit.
> 
> I was checking the output of /proc/<pid>/maps and it doesn't seem to contain
> flags against which i can match PROT_MTE. It seems /proc/<pid>/smaps is the
> one that contains the flags (mt) for MTE. Am i missing something?
> 
> Is this the only place debuggers can check for PROT_MTE? If so, that's
> unfortunate. /proc/<pid>/smaps doesn't seem to be convenient for parsing.

Does the /proc approach work for gdbserver?

For the SVE ptrace interface we eventually went with existence of the
NT_ARM_SVE regset as being the canonical way of detecting whether SVE is
present.

As has been discussed here, I think we probably do want to expose the
current MTE config for a thread via a new regset.  Without this, I can't
see how the debugger can know for sure what's going on.


Wrinkle: just because MTE is "off", pages might still be mapped with
PROT_MTE and have arbitrary tags set on them, and the debugger perhaps
needs a way to know that.  Currently grubbing around in /proc is the
only way to discover that.  Dunno whether it matters.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-15 12:53                         ` Szabolcs Nagy
@ 2020-05-18 16:52                           ` Dave Martin
  0 siblings, 0 replies; 81+ messages in thread
From: Dave Martin @ 2020-05-18 16:52 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Catalin Marinas, linux-arch, Richard Earnshaw, nd, Will Deacon,
	Andrey Konovalov, Kevin Brodsky, linux-mm, Vincenzo Frascino,
	Peter Collingbourne, linux-arm-kernel

On Fri, May 15, 2020 at 01:53:32PM +0100, Szabolcs Nagy wrote:
> The 05/15/2020 13:13, Catalin Marinas wrote:
> > On Fri, May 15, 2020 at 01:04:33PM +0100, Szabolcs Nagy wrote:
> > > The 05/15/2020 12:27, Catalin Marinas wrote:
> > > > Thanks Szabolcs. While we are at this, no-one so far asked for the
> > > > GCR_EL1.RRND to be exposed to user (and this implies RGSR_EL1.SEED).
> > > > Since RRND=1 guarantees a distribution "no worse" than that of RRND=0, I
> > > > thought there isn't much point in exposing this configuration to the
> > > > user. The only advantage of RRND=0 I see is that the kernel can change
> > > 
> > > it seems RRND=1 is the impl specific algorithm.
> > 
> > Yes, that's the implementation specific algorithm which shouldn't be
> > worse than the standard one.
> > 
> > > > the seed randomly but, with only 4 bits per tag, it really doesn't
> > > > matter much.
> > > > 
> > > > Anyway, mentioning it here in case anyone is surprised later about the
> > > > lack of RRND configurability.
> > > 
> > > i'm not familiar with how irg works.
> > 
> > It generates a random tag based on some algorithm.
> > 
> > > is the seed per process state that's set up at process startup in some
> > > way? or shared (and thus effectively irg is non-deterministic in
> > > userspace)?
> > 
> > The seed is only relevant if the standard algorithm is used (RRND=0).
> 
> i wanted to understand if we can get deterministic
> irg behaviour in user space (which may be useful
> for debugging to get reproducible tag failures).
> 
> i guess if no control is exposed that means non-
> deterministic irg. i think this is fine.

Hmmm, I guess this might eventually be wanted.  But it's probably OK not
to have it to begin with.

Things like CRIU restores won't be reproducible unless the seeds can be
saved/restored.

Doesn't seem essential from day 1 though.

Cheers
---Dave


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-05-18 16:47               ` Dave Martin
@ 2020-05-18 17:12                 ` Luis Machado
  2020-05-19 16:10                   ` Catalin Marinas
  0 siblings, 1 reply; 81+ messages in thread
From: Luis Machado @ 2020-05-18 17:12 UTC (permalink / raw)
  To: Dave Martin
  Cc: Catalin Marinas, linux-arch, Richard Earnshaw, Omair Javaid,
	Szabolcs Nagy, Andrey Konovalov, Kevin Brodsky,
	Peter Collingbourne, linux-mm, Alan Hayward, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel

On 5/18/20 1:47 PM, Dave Martin wrote:
> On Wed, May 13, 2020 at 01:45:27PM -0300, Luis Machado wrote:
>> On 5/13/20 12:09 PM, Luis Machado wrote:
>>> On 5/13/20 11:11 AM, Catalin Marinas wrote:
>>>> On Wed, May 13, 2020 at 09:52:52AM -0300, Luis Machado wrote:
>>>>> On 5/13/20 7:48 AM, Catalin Marinas wrote:
>>>>>> On Tue, May 12, 2020 at 04:05:15PM -0300, Luis Machado wrote:
>>>>>>> On 4/21/20 11:25 AM, Catalin Marinas wrote:
>>>>>>>> Add support for bulk setting/getting of the MTE tags in a tracee's
>>>>>>>> address space at 'addr' in the ptrace() syscall prototype.
>>>>>>>> 'data' points
>>>>>>>> to a struct iovec in the tracer's address space with iov_base
>>>>>>>> representing the address of a tracer's buffer of length iov_len. The
>>>>>>>> tags to be copied to/from the tracer's buffer are stored as one
>>>>>>>> tag per
>>>>>>>> byte.
>>>>>>>>
>>>>>>>> On successfully copying at least one tag, ptrace() returns 0 and
>>>>>>>> updates
>>>>>>>> the tracer's iov_len with the number of tags copied. In case of
>>>>>>>> error,
>>>>>>>> either -EIO or -EFAULT is returned, trying to follow the ptrace() man
>>>>>>>> page.
>>>>>>>>
>>>>>>>> Note that the tag copying functions are not performance critical,
>>>>>>>> therefore they lack optimisations found in typical memory copy
>>>>>>>> routines.
>>>>>>>>
>>>>>>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>>>>>>> Cc: Will Deacon <will@kernel.org>
>>>>>>>> Cc: Alan Hayward <Alan.Hayward@arm.com>
>>>>>>>> Cc: Luis Machado <luis.machado@linaro.org>
>>>>>>>> Cc: Omair Javaid <omair.javaid@linaro.org>
>>>>>>>
>>>>>>> I started working on MTE support for GDB and I'm wondering if
>>>>>>> we've already
>>>>>>> defined a way to check for runtime MTE support (as opposed to a
>>>>>>> HWCAP2-based
>>>>>>> check) in a traced process.
>>>>>>>
>>>>>>> Originally we were going to do it via empty-parameter ptrace
>>>>>>> calls, but you
>>>>>>> had mentioned something about a proc-based method, if I'm not
>>>>>>> mistaken.
>>>>>>
>>>>>> We could expose more information via proc_pid_arch_status() but that
>>>>>> would be the tagged address ABI and tag check fault mode and intended
>>>>>> for human consumption mostly. We don't have any ptrace interface that
>>>>>> exposes HWCAPs. Since the gdbserver runs on the same machine as the
>>>>>> debugged process, it can check the HWCAPs itself, they are the same for
>>>>>> all processes.
>>>>>
>>>>> Sorry, I think i haven't made it clear. I already have access to
>>>>> HWCAP2 both
>>>> >from GDB's and gdbserver's side. But HWCAP2 only indicates the
>>>>> availability
>>>>> of a particular feature in a CPU, it doesn't necessarily means the
>>>>> traced
>>>>> process is actively using MTE, right?
>>>>
>>>> Right, but "actively" is not well defined either. The only way to tell
>>>> whether a process is using MTE is to look for any PROT_MTE mappings. You
>>>> can access these via /proc/<pid>/maps. In theory, one can use MTE
>>>> without enabling the tagged address ABI or even tag checking (i.e. no
>>>> prctl() call).
>>>>
>>>
>>> I see the problem. I was hoping for a more immediate form of runtime
>>> check. One debuggers would validate and enable all the tag checks and
>>> register access at process attach/startup.
>>>
>>> With that said, checking for PROT_MTE in /proc/<pid>/maps may still be
>>> useful, but a process with no immediate PROT_MTE maps doesn't mean such
>>> process won't attempt to use PROT_MTE later on. I'll have to factor that
>>> in, but I think it'll work.
>>>
>>> I guess HWCAP2_MTE will be useful after all. We can just assume that
>>> whenever we have HWCAP2_MTE, we can fetch MTE registers and check for
>>> PROT_MTE.
>>>
>>>>> So GDB/gdbserver would need runtime checks to be able to tell if a
>>>>> process
>>>>> is using MTE, in which case the tools will pay attention to tags and
>>>>> additional MTE-related registers (sctlr and gcr) we plan to make
>>>>> available
>>>>> to userspace.
>>>>
>>>> I'm happy to expose GCR_EL1.Excl and the SCTLR_EL1.TCF0 bits via ptrace
>>>> as a thread state. The tags, however, are a property of the memory range
>>>> rather than a per-thread state. That's what makes it different from
>>>> other register-based features like SVE.
>>>
>>> That's my understanding as well. I'm assuming, based on our previous
>>> discussion, that we'll have those couple registers under a regset (maybe
>>> NT_ARM_MTE).
>>>
>>>>
>>>>> The original proposal was to have GDB send PTRACE_PEEKMTETAGS with a
>>>>> NULL
>>>>> address and check the result. Then GDB would be able to decide if the
>>>>> process is using MTE or not.
>>>>
>>>> We don't store this information in the kernel as a bool and I don't
>>>> think it would be useful either. I think gdb, when displaying memory,
>>>> should attempt to show tags as well if the corresponding range was
>>>> mapped with PROT_MTE. Just probing whether a thread ever used MTE
>>>> doesn't help since you need to be more precise on which address supports
>>>> tags.
>>>
>>> Thanks for making this clear. Checking with ptrace won't work then. It
>>> seems like /proc/<pid>/maps is the way to go.
>>>
>>>>
>>>>>> BTW, in my pre-v4 patches (hopefully I'll post v4 this week), I changed
>>>>>> the ptrace tag access slightly to return an error (and no tags copied)
>>>>>> if the page has not been mapped with PROT_MTE. The other option would
>>>>>> have been read-as-zero/write-ignored as per the hardware behaviour.
>>>>>> Either option is fine by me but I thought the write-ignored part would
>>>>>> be more confusing for the debugger. If you have any preference here,
>>>>>> please let me know.
>>>>>
>>>>> I think erroring out is a better alternative, as long as the debugger
>>>>> can
>>>>> tell what the error means, like, for example, "this particular address
>>>>> doesn't make use of tags".
>>>>
>>>> And you could use this for probing whether the range has tags or not.
>>>> With my current patches it returns -EFAULT but happy to change this to
>>>> -EOPNOTSUPP or -EINVAL. Note that it only returns an error if no tags
>>>> copied. If gdb asks for a range of two pages and only the first one has
>>>> PROT_MTE, it will return 0 and set the number of tags copied equivalent
>>>> to the first page. A subsequent call would return an error.
>>>>
>>>> In my discussion with Dave on the documentation patch, I thought retries
>>>> wouldn't be needed but in the above case it may be useful to get an
>>>> error code. That's unless we change the interface to return an error and
>>>> also update the user iovec structure.
>>>>
>>>
>>> Let me think about this for a bit. I'm trying to factor in the
>>> /proc/<pid>/maps contents. If debuggers know which pages have PROT_MTE
>>> set, then we can teach the tools not to PEEK/POKE tags from/to those
>>> memory ranges, which simplifies the error handling a bit.
>>
>> I was checking the output of /proc/<pid>/maps and it doesn't seem to contain
>> flags against which i can match PROT_MTE. It seems /proc/<pid>/smaps is the
>> one that contains the flags (mt) for MTE. Am i missing something?
>>
>> Is this the only place debuggers can check for PROT_MTE? If so, that's
>> unfortunate. /proc/<pid>/smaps doesn't seem to be convenient for parsing.
> 
> Does the /proc approach work for gdbserver?

gdbserver also has access to /proc and reads memory from there 
(/proc/<pid>/mem.

> 
> For the SVE ptrace interface we eventually went with existence of the
> NT_ARM_SVE regset as being the canonical way of detecting whether SVE is
> present.

Do you mean "present" as in "this process is actively using SVE 
registers" or do you mean the CPU and kernel support SVE, but there's no 
guarantee SVE is being used?

 From what i remember, SVE runtime usage check is based on header data 
returned by the NT_ARM_SVE regset.

Right now i have a HWCAP2_MTE check for MTE. And for GDB, having 
HWCAP2_MTE implies having the NT_ARM_MTE regset.

> 
> As has been discussed here, I think we probably do want to expose the
> current MTE config for a thread via a new regset.  Without this, I can't
> see how the debugger can know for sure what's going on.

What kind of information would the debugger be looking for in those 
registers (sctlr and gcr)? Can MTE be switched on/off via those registers?

> 
> 
> Wrinkle: just because MTE is "off", pages might still be mapped with
> PROT_MTE and have arbitrary tags set on them, and the debugger perhaps
> needs a way to know that.  Currently grubbing around in /proc is the
> only way to discover that.  Dunno whether it matters.

That is the sort of thing that may confused the debugger.

If MTE is "off" (and thus the debugger doesn't need to validate tags), 
then the pages mapped with PROT_MTE that show up in /proc/<pid>/smaps 
should be ignored?

I'm looking for a precise way to tell if MTE is being used or not for a 
particular process/thread. This, in turn, will tell debuggers when to 
look for PROT_MTE mappings in /proc/<pid>/smaps and when to validate 
tagged addresses.

So far my assumption was that MTE will always be "on" when HWCAP2_MTE is 
present. So having HWCAP2_MTE means we have the NT_ARM_MTE regset and 
that PROT_MTE pages have to be checked.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation
  2020-05-14 11:37             ` Catalin Marinas
  2020-05-15 10:38               ` Catalin Marinas
@ 2020-05-18 17:13               ` Catalin Marinas
  1 sibling, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-05-18 17:13 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arch, Richard Earnshaw, Szabolcs Nagy, Andrey Konovalov,
	Kevin Brodsky, Peter Collingbourne, linux-mm, Vincenzo Frascino,
	Will Deacon, linux-arm-kernel

On Thu, May 14, 2020 at 12:37:22PM +0100, Catalin Marinas wrote:
> On Wed, May 13, 2020 at 04:48:46PM +0100, Dave P Martin wrote:
> > On Mon, May 11, 2020 at 05:40:19PM +0100, Catalin Marinas wrote:
> > > On Mon, May 04, 2020 at 05:46:17PM +0100, Dave P Martin wrote:
> > > > On Thu, Apr 30, 2020 at 05:23:17PM +0100, Catalin Marinas wrote:
> > > > > On Wed, Apr 29, 2020 at 05:47:05PM +0100, Dave P Martin wrote:
> > > > > > On Tue, Apr 21, 2020 at 03:26:03PM +0100, Catalin Marinas wrote:
> > > > > > > +excludes all tags other than 0. A user thread can enable specific tags
> > > > > > > +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> > > > > > > +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
> > > > > > > +in the ``PR_MTE_TAG_MASK`` bit-field.
> > > > > > > +
> > > > > > > +**Note**: The hardware uses an exclude mask but the ``prctl()``
> > > > > > > +interface provides an include mask. An include mask of ``0`` (exclusion
> > > > > > > +mask ``0xffff``) results in the CPU always generating tag ``0``.
> > > > > > 
> > > > > > Is there no way to make this default to 1 rather than having a magic
> > > > > > meaning for 0?
> > > > > 
> > > > > We follow the hardware behaviour where 0xffff and 0xfffe give the same
> > > > > result.
> > > > 
> > > > Exposing this through a purely software interface seems a bit odd:
> > > > because the exclude mask is privileged-access-only, the architecture
> > > > could amend it to assign a different meaning to 0xffff, providing this
> > > > was an opt-in change.  Then we'd have to make a mess here.
> > > 
> > > You have a point. An include mask of 0 translates to an exclude mask of
> > > 0xffff as per the current patches. If the hardware gains support for one
> > > more bit (32 colours), old software running on new hardware may run into
> > > unexpected results with an exclude mask of 0xffff.
> > > 
> > > > Can't we just forbid the nonsense value 0 here, or are there other
> > > > reasons why that's problematic?
> > > 
> > > It was just easier to start with a default. I wonder whether we should
> > > actually switch back to the exclude mask, as per the hardware
> > > definition. This way 0 would mean all tags allowed. We can still
> > > disallow 0xffff as an exclude mask.
> [...]
> > The only configuration that doesn't make sense is "no tags allowed", so
> > I'd argue for explicity blocking that, even if the architeture aliases
> > that encoding to something else.
> > 
> > If we prefer 0 as a default value so that init inherits the correct
> > value from the kernel without any special acrobatics, then we make it an
> > exclude mask, with the semantics that the hardware is allowed to
> > generate any of these tags, but does not have to be capable of
> > generating all of them.
> 
> That's more of a question to the libc people and their preference.
> We have two options with suboptions:
> 
> 1. prctl() gets an exclude mask with 0xffff illegal even though the
>    hardware accepts it:
>    a) default exclude mask 0, allowing all tags to be generated by IRG
>    b) default exclude mask of 0xfffe so that only tag 0 is generated
> 
> 2. prctl() gets an include mask with 0 illegal:
>    a) default include mask is 0xffff, allowing all tags to be generated
>    b) default include mask 0f 0x0001 so that only tag 0 is generated
> 
> We currently have (2) with mask 0 but could be changed to (2.b). If we
> are to follow the hardware description (which makes more sense to me but
> I don't write the C library), (1.a) is the most appropriate.

As Peter pointed out on Friday (call), 2.b doesn't work as it breaks the
existing prctl() for turning on the tagged address ABI. So we have to
accept 0 as the tag mask field.

Dave, if you feel strongly about avoiding the exclude mask confusion
with 0xffff equivalent to 0xfffe, I'll go for 1.a. I have not changed
this in the v4 series of the patches (no ABI change in there apart from
some minor ptrace tweaks).

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support
  2020-05-18 17:12                 ` Luis Machado
@ 2020-05-19 16:10                   ` Catalin Marinas
  0 siblings, 0 replies; 81+ messages in thread
From: Catalin Marinas @ 2020-05-19 16:10 UTC (permalink / raw)
  To: Luis Machado
  Cc: Dave Martin, linux-arch, Richard Earnshaw, Will Deacon,
	Omair Javaid, Szabolcs Nagy, Kevin Brodsky, linux-mm,
	Andrey Konovalov, Vincenzo Frascino, Peter Collingbourne,
	Alan Hayward, linux-arm-kernel

On Mon, May 18, 2020 at 02:12:24PM -0300, Luis Machado wrote:
> On 5/18/20 1:47 PM, Dave Martin wrote:
> > Wrinkle: just because MTE is "off", pages might still be mapped with
> > PROT_MTE and have arbitrary tags set on them, and the debugger perhaps
> > needs a way to know that.  Currently grubbing around in /proc is the
> > only way to discover that.  Dunno whether it matters.
> 
> That is the sort of thing that may confused the debugger.
> 
> If MTE is "off" (and thus the debugger doesn't need to validate tags), then
> the pages mapped with PROT_MTE that show up in /proc/<pid>/smaps should be
> ignored?

There is no such thing as global MTE "off". If the HWCAP is present, a
user program can map an address with PROT_MTE and access tags. Maybe it
uses it for extra storage, you never know, doesn't have to be heap
allocation related.

> I'm looking for a precise way to tell if MTE is being used or not for a
> particular process/thread. This, in turn, will tell debuggers when to look
> for PROT_MTE mappings in /proc/<pid>/smaps and when to validate tagged
> addresses.
> 
> So far my assumption was that MTE will always be "on" when HWCAP2_MTE is
> present. So having HWCAP2_MTE means we have the NT_ARM_MTE regset and that
> PROT_MTE pages have to be checked.

Yes. I haven't figured out what to put in the regset yet, most likely
the prctl value as it has other software-only controls like the tagged
address ABI.

-- 
Catalin


^ permalink raw reply	[flat|nested] 81+ messages in thread

end of thread, other threads:[~2020-05-19 16:11 UTC | newest]

Thread overview: 81+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-21 14:25 [PATCH v3 00/23] arm64: Memory Tagging Extension user-space support Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 01/23] arm64: alternative: Allow alternative_insn to always issue the first instruction Catalin Marinas
2020-04-27 16:57   ` Dave Martin
2020-04-28 11:43     ` Catalin Marinas
2020-04-29 10:26       ` Dave Martin
2020-04-29 14:04         ` Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 02/23] arm64: mte: system register definitions Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 03/23] arm64: mte: CPU feature detection and initial sysreg configuration Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 04/23] arm64: mte: Use Normal Tagged attributes for the linear map Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 05/23] arm64: mte: Assembler macros and default architecture for .S files Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 06/23] arm64: mte: Tags-aware clear_page() implementation Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 07/23] arm64: mte: Tags-aware copy_page() implementation Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 08/23] arm64: Tags-aware memcmp_pages() implementation Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 09/23] arm64: mte: Add specific SIGSEGV codes Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 10/23] arm64: mte: Handle synchronous and asynchronous tag check faults Catalin Marinas
2020-04-23 10:38   ` Catalin Marinas
2020-04-27 16:58   ` Dave Martin
2020-04-28 13:43     ` Catalin Marinas
2020-04-29 10:26       ` Dave Martin
2020-04-21 14:25 ` [PATCH v3 11/23] mm: Introduce arch_calc_vm_flag_bits() Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 12/23] arm64: mte: Add PROT_MTE support to mmap() and mprotect() Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 13/23] mm: Introduce arch_validate_flags() Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 14/23] arm64: mte: Validate the PROT_MTE request via arch_validate_flags() Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 15/23] mm: Allow arm64 mmap(PROT_MTE) on RAM-based files Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 16/23] arm64: mte: Allow user control of the tag check mode via prctl() Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 17/23] arm64: mte: Allow user control of the generated random tags " Catalin Marinas
2020-04-21 14:25 ` [PATCH v3 18/23] arm64: mte: Restore the GCR_EL1 register after a suspend Catalin Marinas
2020-04-23 15:23   ` Lorenzo Pieralisi
2020-04-21 14:25 ` [PATCH v3 19/23] arm64: mte: Add PTRACE_{PEEK,POKE}MTETAGS support Catalin Marinas
2020-04-24 23:28   ` Peter Collingbourne
2020-04-29 10:27   ` Kevin Brodsky
2020-04-29 15:24     ` Catalin Marinas
2020-04-29 16:46   ` Dave Martin
2020-04-30 10:21     ` Catalin Marinas
2020-05-04 16:40       ` Dave Martin
2020-05-05 18:03   ` Luis Machado
2020-05-12 19:05   ` Luis Machado
2020-05-13 10:48     ` Catalin Marinas
2020-05-13 12:52       ` Luis Machado
2020-05-13 14:11         ` Catalin Marinas
2020-05-13 15:09           ` Luis Machado
2020-05-13 16:45             ` Luis Machado
2020-05-13 17:11               ` Catalin Marinas
2020-05-18 16:47               ` Dave Martin
2020-05-18 17:12                 ` Luis Machado
2020-05-19 16:10                   ` Catalin Marinas
2020-04-21 14:26 ` [PATCH v3 20/23] fs: Allow copy_mount_options() to access user-space in a single pass Catalin Marinas
2020-04-21 15:29   ` Al Viro
2020-04-21 16:45     ` Catalin Marinas
2020-04-27 16:56   ` Dave Martin
2020-04-28 14:06     ` Catalin Marinas
2020-04-29 10:28       ` Dave Martin
2020-04-28 18:16   ` Kevin Brodsky
2020-04-28 19:40     ` Catalin Marinas
2020-04-29 11:58     ` Catalin Marinas
2020-04-28 19:36   ` Catalin Marinas
2020-04-29 10:26   ` Dave Martin
2020-04-29 13:52     ` Catalin Marinas
2020-05-04 16:40       ` Dave Martin
2020-04-21 14:26 ` [PATCH v3 21/23] arm64: mte: Check the DT memory nodes for MTE support Catalin Marinas
2020-04-24 13:57   ` Catalin Marinas
2020-04-24 16:17     ` Catalin Marinas
2020-04-27 11:14       ` Suzuki K Poulose
2020-04-21 14:26 ` [PATCH v3 22/23] arm64: mte: Kconfig entry Catalin Marinas
2020-04-21 14:26 ` [PATCH v3 23/23] arm64: mte: Add Memory Tagging Extension documentation Catalin Marinas
2020-04-29 16:47   ` Dave Martin
2020-04-30 16:23     ` Catalin Marinas
2020-05-04 16:46       ` Dave Martin
2020-05-11 16:40         ` Catalin Marinas
2020-05-13 15:48           ` Dave Martin
2020-05-14 11:37             ` Catalin Marinas
2020-05-15 10:38               ` Catalin Marinas
2020-05-15 11:14                 ` Szabolcs Nagy
2020-05-15 11:27                   ` Catalin Marinas
2020-05-15 12:04                     ` Szabolcs Nagy
2020-05-15 12:13                       ` Catalin Marinas
2020-05-15 12:53                         ` Szabolcs Nagy
2020-05-18 16:52                           ` Dave Martin
2020-05-18 17:13               ` Catalin Marinas
2020-05-05 10:32   ` Szabolcs Nagy
2020-05-05 17:30     ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).