* [PATCHv14 00/17] Linear Address Masking enabling
@ 2023-01-11 12:37 Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user() Kirill A. Shutemov
                   ` (17 more replies)
  0 siblings, 18 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

Linear Address Masking[1] (LAM) modifies the checking that is applied to
64-bit linear addresses, allowing software to use the untranslated
address bits for metadata.

The capability can be used for efficient implementation of address
sanitizers (ASAN) and for optimizations in JITs and virtual machines.

The patchset brings support for LAM for userspace addresses. Only LAM_U57
is supported at this time.
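
For illustration, here is a minimal userspace sketch of what LAM_U57
provides once this series is applied (assumes a LAM-capable CPU; the
tag value is arbitrary and error handling is kept minimal):

	#include <stdint.h>
	#include <stdlib.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#define ARCH_ENABLE_TAGGED_ADDR	0x4002	/* added by this series */

	int main(void)
	{
		/* Request 6 tag bits; the kernel selects LAM_U57 */
		if (syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, 6))
			return 1;

		uint64_t *p = malloc(sizeof(*p));
		/* Stash metadata in bits 62:57 of the pointer */
		uint64_t *tagged = (uint64_t *)((uint64_t)p | (0x2aULL << 57));

		*tagged = 42;	/* hardware masks the tag on dereference */

		free(p);
		return 0;
	}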

Please review and consider applying.

git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git lam

v14:
  - Rework address range check in get_user() and put_user();
  - Introduce CONFIG_ADDRESS_MASKING;
  - Cache untag masking in per-CPU variable;
  - Reject LAM enabling via PTRACE_ARCH_PRCTL;
  - Fix locking around untagged_addr_remote();
  - Fix typo in MM_CONTEXT_* conversion patch;
  - Fix selftest;
v13:
  - Fix race between untagged_addr() and LAM enabling:
    + Do not allow enabling LAM after the process has spawned a second thread;
    + untagged_addr() untags the address according to the rules of the
      current process;
    + untagged_addr_remote() can be used for untagging addresses of a foreign
      process. It requires the mmap lock of the target process to be taken;
v12:
  - Rebased onto tip/x86/mm;
  - Drop VM_WARN_ON() that may produce false-positive on race between context
    switch and LAM enabling;
  - Adjust comments to explain the possible race;
  - Use READ_ONCE() in mm_lam_cr3_mask();
  - Do not assume &init_mm == mm in initialize_tlbstate_and_flush();
  - Ack by Andy;
v11:
  - Move untag_mask to /proc/$PID/status;
  - s/SVM/SVA/g;
  - static inline arch_pgtable_dma_compat() instead of macros;
  - Replace pasid_valid() with mm_valid_pasid();
  - Acks from Ashok and Jacob (forgot to apply from v9);
v10:
  - Rebased to v6.1-rc1;
  - Add selftest for SVM vs LAM;
v9:
  - Fix race between LAM enabling and the check that KVM memslot addresses
    don't have any tags;
  - Reduce untagged_addr() overhead until the first LAM user;
  - Clarify SVM vs. LAM semantics;
  - Use mmap_lock to serialize LAM enabling;
v8:
  - Drop redundant smp_mb() in prctl_enable_tagged_addr();
  - Cleanup code around build_cr3();
  - Fix commit messages;
  - Selftests updates;
  - Acked/Reviewed/Tested-bys from Alexander and Peter;
v7:
  - Drop redundant smp_mb() in prctl_enable_tagged_addr();
  - Cleanup code around build_cr3();
  - Fix commit message;
  - Fix indentation;
v6:
  - Rebased onto v6.0-rc1
  - LAM_U48 excluded from the patchset. Still available in the git tree;
  - add ARCH_GET_MAX_TAG_BITS;
  - Fix build without CONFIG_DEBUG_VM;
  - Update comments;
  - Reviewed/Tested-by from Alexander;
v5:
  - Do not use switch_mm() in enable_lam_func()
  - Use mb()/READ_ONCE() pair on LAM enabling;
  - Add self-test by Weihong Zhang;
  - Add comments;
v4:
  - Fix untagged_addr() for LAM_U48;
  - Remove no-threads restriction on LAM enabling;
  - Fix mm_struct access from /proc/$PID/arch_status
  - Fix LAM handling in initialize_tlbstate_and_flush()
  - Pack tlb_state better;
  - Comments and commit messages;
v3:
  - Rebased onto v5.19-rc1
  - Per-process enabling;
  - API overhaul (again);
  - Avoid branches and costly computations in the fast path;
  - LAM_U48 is in optional patch.
v2:
  - Rebased onto v5.18-rc1
  - New arch_prctl(2)-based API
  - Expose status of LAM (or other thread features) in
    /proc/$PID/arch_status

[1] ISE, Chapter 10. https://cdrdv2.intel.com/v1/dl/getContent/671368
Kirill A. Shutemov (12):
  x86/mm: Rework address range check in get_user() and put_user()
  x86: Allow atomic MM_CONTEXT flags setting
  x86: CPUID and CR3/CR4 flags for Linear Address Masking
  x86/mm: Handle LAM on context switch
  mm: Introduce untagged_addr_remote()
  x86/uaccess: Provide untagged_addr() and remove tags before address
    check
  x86/mm: Provide arch_prctl() interface for LAM
  x86/mm: Reduce untagged_addr() overhead until the first LAM user
  mm: Expose untagging mask in /proc/$PID/status
  iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
  x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
  selftests/x86/lam: Add test cases for LAM vs thread creation

Weihong Zhang (5):
  selftests/x86/lam: Add malloc and tag-bits test cases for
    linear-address masking
  selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address
    masking
  selftests/x86/lam: Add io_uring test cases for linear-address masking
  selftests/x86/lam: Add inherit test cases for linear-address masking
  selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for
    linear-address masking

 arch/arm64/include/asm/mmu_context.h        |    6 +
 arch/sparc/include/asm/mmu_context_64.h     |    6 +
 arch/sparc/include/asm/uaccess_64.h         |    2 +
 arch/x86/Kconfig                            |   11 +
 arch/x86/entry/vsyscall/vsyscall_64.c       |    2 +-
 arch/x86/include/asm/cpufeatures.h          |    1 +
 arch/x86/include/asm/mmu.h                  |   18 +-
 arch/x86/include/asm/mmu_context.h          |   49 +-
 arch/x86/include/asm/processor-flags.h      |    2 +
 arch/x86/include/asm/tlbflush.h             |   48 +-
 arch/x86/include/asm/uaccess.h              |   35 +-
 arch/x86/include/uapi/asm/prctl.h           |    5 +
 arch/x86/include/uapi/asm/processor-flags.h |    6 +
 arch/x86/kernel/process.c                   |    6 +
 arch/x86/kernel/process_64.c                |   70 +-
 arch/x86/kernel/traps.c                     |    6 +-
 arch/x86/lib/getuser.S                      |   83 +-
 arch/x86/lib/putuser.S                      |   54 +-
 arch/x86/mm/init.c                          |    5 +
 arch/x86/mm/tlb.c                           |   53 +-
 drivers/iommu/iommu-sva.c                   |    8 +-
 drivers/vfio/vfio_iommu_type1.c             |    2 +-
 fs/proc/array.c                             |    6 +
 fs/proc/task_mmu.c                          |    9 +-
 include/linux/ioasid.h                      |    9 -
 include/linux/mm.h                          |   11 -
 include/linux/mmu_context.h                 |   14 +
 include/linux/sched/mm.h                    |    8 +-
 include/linux/uaccess.h                     |   22 +
 mm/debug.c                                  |    1 +
 mm/gup.c                                    |    4 +-
 mm/madvise.c                                |    5 +-
 mm/migrate.c                                |   11 +-
 tools/testing/selftests/x86/Makefile        |    2 +-
 tools/testing/selftests/x86/lam.c           | 1241 +++++++++++++++++++
 35 files changed, 1673 insertions(+), 148 deletions(-)
 create mode 100644 tools/testing/selftests/x86/lam.c

-- 
2.38.2



* [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user()
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-18 15:49   ` Peter Zijlstra
  2023-01-11 12:37 ` [PATCHv14 02/17] x86: Allow atomic MM_CONTEXT flags setting Kirill A. Shutemov
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

The functions get_user() and put_user() check that the target address
range resides in the user space portion of the virtual address space.
In order to perform this check, the functions compare the end of the
range against TASK_SIZE_MAX.

For kernels compiled with CONFIG_X86_5LEVEL, this process requires some
additional trickery using ALTERNATIVE, as TASK_SIZE_MAX depends on the
paging mode in use.

Linus suggested that this check could be simplified for 64-bit kernels.
It is sufficient to check bit 63 of the address to ensure that the range
belongs to user space. Additionally, the use of branches can be avoided
by setting the target address to all ones if bit 63 is set.
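
In C terms, the new check is roughly the following branchless transform
(a sketch of the logic only; the actual implementation is the
check_range asm macro below):

	unsigned long check_range(unsigned long addr)
	{
		/*
		 * Sign-extend bit 63 over the whole word: 0 for
		 * userspace addresses, all-ones for kernel addresses.
		 */
		long sign = (long)addr >> 63;

		/*
		 * Userspace addresses pass through unchanged. An address
		 * with bit 63 set becomes all-ones, which is guaranteed
		 * to fault with #GP on access.
		 */
		return addr | sign;
	}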

There's no need to check the end of the access range as there's a huge
gap between the end of the userspace range and the start of the kernel
range. The gap consists of the canonical hole and unused ranges on both
the kernel and userspace sides.

If an address with bit 63 set is passed down, it will trigger a #GP
exception. _ASM_EXTABLE_UA() complains about this. Replace it with
plain _ASM_EXTABLE() as this is expected behaviour now.

The updated get_user() and put_user() checks are also compatible with
Linear Address Masking, which allows user space to encode metadata in
the upper bits of pointers and eliminates the need to untag the address
before handling it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/x86/lib/getuser.S | 83 ++++++++++++++++--------------------------
 arch/x86/lib/putuser.S | 54 ++++++++++++---------------
 2 files changed, 55 insertions(+), 82 deletions(-)

diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index b70d98d79a9d..b64a2bd1a1ef 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -37,22 +37,22 @@
 
 #define ASM_BARRIER_NOSPEC ALTERNATIVE "", "lfence", X86_FEATURE_LFENCE_RDTSC
 
-#ifdef CONFIG_X86_5LEVEL
-#define LOAD_TASK_SIZE_MINUS_N(n) \
-	ALTERNATIVE __stringify(mov $((1 << 47) - 4096 - (n)),%rdx), \
-		    __stringify(mov $((1 << 56) - 4096 - (n)),%rdx), X86_FEATURE_LA57
-#else
-#define LOAD_TASK_SIZE_MINUS_N(n) \
-	mov $(TASK_SIZE_MAX - (n)),%_ASM_DX
-#endif
+.macro check_range size:req
+.if IS_ENABLED(CONFIG_X86_64)
+	mov %rax, %rdx
+	sar $63, %rdx
+	or %rdx, %rax
+.else
+	cmp $TASK_SIZE_MAX-\size+1, %eax
+	jae .Lbad_get_user
+	sbb %edx, %edx		/* array_index_mask_nospec() */
+	and %edx, %eax
+.endif
+.endm
 
 	.text
 SYM_FUNC_START(__get_user_1)
-	LOAD_TASK_SIZE_MINUS_N(0)
-	cmp %_ASM_DX,%_ASM_AX
-	jae bad_get_user
-	sbb %_ASM_DX, %_ASM_DX		/* array_index_mask_nospec() */
-	and %_ASM_DX, %_ASM_AX
+	check_range size=1
 	ASM_STAC
 1:	movzbl (%_ASM_AX),%edx
 	xor %eax,%eax
@@ -62,11 +62,7 @@ SYM_FUNC_END(__get_user_1)
 EXPORT_SYMBOL(__get_user_1)
 
 SYM_FUNC_START(__get_user_2)
-	LOAD_TASK_SIZE_MINUS_N(1)
-	cmp %_ASM_DX,%_ASM_AX
-	jae bad_get_user
-	sbb %_ASM_DX, %_ASM_DX		/* array_index_mask_nospec() */
-	and %_ASM_DX, %_ASM_AX
+	check_range size=2
 	ASM_STAC
 2:	movzwl (%_ASM_AX),%edx
 	xor %eax,%eax
@@ -76,11 +72,7 @@ SYM_FUNC_END(__get_user_2)
 EXPORT_SYMBOL(__get_user_2)
 
 SYM_FUNC_START(__get_user_4)
-	LOAD_TASK_SIZE_MINUS_N(3)
-	cmp %_ASM_DX,%_ASM_AX
-	jae bad_get_user
-	sbb %_ASM_DX, %_ASM_DX		/* array_index_mask_nospec() */
-	and %_ASM_DX, %_ASM_AX
+	check_range size=4
 	ASM_STAC
 3:	movl (%_ASM_AX),%edx
 	xor %eax,%eax
@@ -90,30 +82,17 @@ SYM_FUNC_END(__get_user_4)
 EXPORT_SYMBOL(__get_user_4)
 
 SYM_FUNC_START(__get_user_8)
-#ifdef CONFIG_X86_64
-	LOAD_TASK_SIZE_MINUS_N(7)
-	cmp %_ASM_DX,%_ASM_AX
-	jae bad_get_user
-	sbb %_ASM_DX, %_ASM_DX		/* array_index_mask_nospec() */
-	and %_ASM_DX, %_ASM_AX
+	check_range size=8
 	ASM_STAC
+#ifdef CONFIG_X86_64
 4:	movq (%_ASM_AX),%rdx
-	xor %eax,%eax
-	ASM_CLAC
-	RET
 #else
-	LOAD_TASK_SIZE_MINUS_N(7)
-	cmp %_ASM_DX,%_ASM_AX
-	jae bad_get_user_8
-	sbb %_ASM_DX, %_ASM_DX		/* array_index_mask_nospec() */
-	and %_ASM_DX, %_ASM_AX
-	ASM_STAC
 4:	movl (%_ASM_AX),%edx
 5:	movl 4(%_ASM_AX),%ecx
+#endif
 	xor %eax,%eax
 	ASM_CLAC
 	RET
-#endif
 SYM_FUNC_END(__get_user_8)
 EXPORT_SYMBOL(__get_user_8)
 
@@ -166,7 +145,7 @@ EXPORT_SYMBOL(__get_user_nocheck_8)
 
 SYM_CODE_START_LOCAL(.Lbad_get_user_clac)
 	ASM_CLAC
-bad_get_user:
+.Lbad_get_user:
 	xor %edx,%edx
 	mov $(-EFAULT),%_ASM_AX
 	RET
@@ -184,23 +163,23 @@ SYM_CODE_END(.Lbad_get_user_8_clac)
 #endif
 
 /* get_user */
-	_ASM_EXTABLE_UA(1b, .Lbad_get_user_clac)
-	_ASM_EXTABLE_UA(2b, .Lbad_get_user_clac)
-	_ASM_EXTABLE_UA(3b, .Lbad_get_user_clac)
+	_ASM_EXTABLE(1b, .Lbad_get_user_clac)
+	_ASM_EXTABLE(2b, .Lbad_get_user_clac)
+	_ASM_EXTABLE(3b, .Lbad_get_user_clac)
 #ifdef CONFIG_X86_64
-	_ASM_EXTABLE_UA(4b, .Lbad_get_user_clac)
+	_ASM_EXTABLE(4b, .Lbad_get_user_clac)
 #else
-	_ASM_EXTABLE_UA(4b, .Lbad_get_user_8_clac)
-	_ASM_EXTABLE_UA(5b, .Lbad_get_user_8_clac)
+	_ASM_EXTABLE(4b, .Lbad_get_user_8_clac)
+	_ASM_EXTABLE(5b, .Lbad_get_user_8_clac)
 #endif
 
 /* __get_user */
-	_ASM_EXTABLE_UA(6b, .Lbad_get_user_clac)
-	_ASM_EXTABLE_UA(7b, .Lbad_get_user_clac)
-	_ASM_EXTABLE_UA(8b, .Lbad_get_user_clac)
+	_ASM_EXTABLE(6b, .Lbad_get_user_clac)
+	_ASM_EXTABLE(7b, .Lbad_get_user_clac)
+	_ASM_EXTABLE(8b, .Lbad_get_user_clac)
 #ifdef CONFIG_X86_64
-	_ASM_EXTABLE_UA(9b, .Lbad_get_user_clac)
+	_ASM_EXTABLE(9b, .Lbad_get_user_clac)
 #else
-	_ASM_EXTABLE_UA(9b, .Lbad_get_user_8_clac)
-	_ASM_EXTABLE_UA(10b, .Lbad_get_user_8_clac)
+	_ASM_EXTABLE(9b, .Lbad_get_user_8_clac)
+	_ASM_EXTABLE(10b, .Lbad_get_user_8_clac)
 #endif
diff --git a/arch/x86/lib/putuser.S b/arch/x86/lib/putuser.S
index 32125224fcca..3062d09a776d 100644
--- a/arch/x86/lib/putuser.S
+++ b/arch/x86/lib/putuser.S
@@ -33,20 +33,20 @@
  * as they get called from within inline assembly.
  */
 
-#ifdef CONFIG_X86_5LEVEL
-#define LOAD_TASK_SIZE_MINUS_N(n) \
-	ALTERNATIVE __stringify(mov $((1 << 47) - 4096 - (n)),%rbx), \
-		    __stringify(mov $((1 << 56) - 4096 - (n)),%rbx), X86_FEATURE_LA57
-#else
-#define LOAD_TASK_SIZE_MINUS_N(n) \
-	mov $(TASK_SIZE_MAX - (n)),%_ASM_BX
-#endif
+.macro check_range size:req
+.if IS_ENABLED(CONFIG_X86_64)
+	mov %rcx, %rbx
+	sar $63, %rbx
+	or %rbx, %rcx
+.else
+	cmp $TASK_SIZE_MAX-\size+1, %ecx
+	jae .Lbad_put_user
+.endif
+.endm
 
 .text
 SYM_FUNC_START(__put_user_1)
-	LOAD_TASK_SIZE_MINUS_N(0)
-	cmp %_ASM_BX,%_ASM_CX
-	jae .Lbad_put_user
+	check_range size=1
 	ASM_STAC
 1:	movb %al,(%_ASM_CX)
 	xor %ecx,%ecx
@@ -66,9 +66,7 @@ SYM_FUNC_END(__put_user_nocheck_1)
 EXPORT_SYMBOL(__put_user_nocheck_1)
 
 SYM_FUNC_START(__put_user_2)
-	LOAD_TASK_SIZE_MINUS_N(1)
-	cmp %_ASM_BX,%_ASM_CX
-	jae .Lbad_put_user
+	check_range size=2
 	ASM_STAC
 3:	movw %ax,(%_ASM_CX)
 	xor %ecx,%ecx
@@ -88,9 +86,7 @@ SYM_FUNC_END(__put_user_nocheck_2)
 EXPORT_SYMBOL(__put_user_nocheck_2)
 
 SYM_FUNC_START(__put_user_4)
-	LOAD_TASK_SIZE_MINUS_N(3)
-	cmp %_ASM_BX,%_ASM_CX
-	jae .Lbad_put_user
+	check_range size=4
 	ASM_STAC
 5:	movl %eax,(%_ASM_CX)
 	xor %ecx,%ecx
@@ -110,9 +106,7 @@ SYM_FUNC_END(__put_user_nocheck_4)
 EXPORT_SYMBOL(__put_user_nocheck_4)
 
 SYM_FUNC_START(__put_user_8)
-	LOAD_TASK_SIZE_MINUS_N(7)
-	cmp %_ASM_BX,%_ASM_CX
-	jae .Lbad_put_user
+	check_range size=8
 	ASM_STAC
 7:	mov %_ASM_AX,(%_ASM_CX)
 #ifdef CONFIG_X86_32
@@ -144,15 +138,15 @@ SYM_CODE_START_LOCAL(.Lbad_put_user_clac)
 	RET
 SYM_CODE_END(.Lbad_put_user_clac)
 
-	_ASM_EXTABLE_UA(1b, .Lbad_put_user_clac)
-	_ASM_EXTABLE_UA(2b, .Lbad_put_user_clac)
-	_ASM_EXTABLE_UA(3b, .Lbad_put_user_clac)
-	_ASM_EXTABLE_UA(4b, .Lbad_put_user_clac)
-	_ASM_EXTABLE_UA(5b, .Lbad_put_user_clac)
-	_ASM_EXTABLE_UA(6b, .Lbad_put_user_clac)
-	_ASM_EXTABLE_UA(7b, .Lbad_put_user_clac)
-	_ASM_EXTABLE_UA(9b, .Lbad_put_user_clac)
+	_ASM_EXTABLE(1b, .Lbad_put_user_clac)
+	_ASM_EXTABLE(2b, .Lbad_put_user_clac)
+	_ASM_EXTABLE(3b, .Lbad_put_user_clac)
+	_ASM_EXTABLE(4b, .Lbad_put_user_clac)
+	_ASM_EXTABLE(5b, .Lbad_put_user_clac)
+	_ASM_EXTABLE(6b, .Lbad_put_user_clac)
+	_ASM_EXTABLE(7b, .Lbad_put_user_clac)
+	_ASM_EXTABLE(9b, .Lbad_put_user_clac)
 #ifdef CONFIG_X86_32
-	_ASM_EXTABLE_UA(8b, .Lbad_put_user_clac)
-	_ASM_EXTABLE_UA(10b, .Lbad_put_user_clac)
+	_ASM_EXTABLE(8b, .Lbad_put_user_clac)
+	_ASM_EXTABLE(10b, .Lbad_put_user_clac)
 #endif
-- 
2.38.2



* [PATCHv14 02/17] x86: Allow atomic MM_CONTEXT flags setting
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user() Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 03/17] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

So far there has been no need for atomic setting of MM context flags in
mm_context_t::flags. The flags are set early in exec and never change
after that.

LAM enabling requires atomic flag setting. The upcoming flag
MM_CONTEXT_FORCE_TAGGED_SVA can be set much later in the process
lifetime, when multiple threads exist.

Convert the field to unsigned long and do MM_CONTEXT_* accesses with
__set_bit() and test_bit().

No functional changes.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 2 +-
 arch/x86/include/asm/mmu.h            | 6 +++---
 arch/x86/include/asm/mmu_context.h    | 2 +-
 arch/x86/kernel/process_64.c          | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 4af81df133ee..aa226f451c52 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -317,7 +317,7 @@ static struct vm_area_struct gate_vma __ro_after_init = {
 struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
 {
 #ifdef CONFIG_COMPAT
-	if (!mm || !(mm->context.flags & MM_CONTEXT_HAS_VSYSCALL))
+	if (!mm || !test_bit(MM_CONTEXT_HAS_VSYSCALL, &mm->context.flags))
 		return NULL;
 #endif
 	if (vsyscall_mode == NONE)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 5d7494631ea9..efa3eaee522c 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,9 +9,9 @@
 #include <linux/bits.h>
 
 /* Uprobes on this MM assume 32-bit code */
-#define MM_CONTEXT_UPROBE_IA32	BIT(0)
+#define MM_CONTEXT_UPROBE_IA32		0
 /* vsyscall page is accessible on this MM */
-#define MM_CONTEXT_HAS_VSYSCALL	BIT(1)
+#define MM_CONTEXT_HAS_VSYSCALL		1
 
 /*
  * x86 has arch-specific MMU state beyond what lives in mm_struct.
@@ -39,7 +39,7 @@ typedef struct {
 #endif
 
 #ifdef CONFIG_X86_64
-	unsigned short flags;
+	unsigned long flags;
 #endif
 
 	struct mutex lock;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b8d40ddeab00..53ef591a6166 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -181,7 +181,7 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
 static inline bool is_64bit_mm(struct mm_struct *mm)
 {
 	return	!IS_ENABLED(CONFIG_IA32_EMULATION) ||
-		!(mm->context.flags & MM_CONTEXT_UPROBE_IA32);
+		!test_bit(MM_CONTEXT_UPROBE_IA32, &mm->context.flags);
 }
 #else
 static inline bool is_64bit_mm(struct mm_struct *mm)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4e34b3b68ebd..8b06034e8c70 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -671,7 +671,7 @@ void set_personality_64bit(void)
 	task_pt_regs(current)->orig_ax = __NR_execve;
 	current_thread_info()->status &= ~TS_COMPAT;
 	if (current->mm)
-		current->mm->context.flags = MM_CONTEXT_HAS_VSYSCALL;
+		__set_bit(MM_CONTEXT_HAS_VSYSCALL, &current->mm->context.flags);
 
 	/* TBD: overwrites user setup. Should have two bits.
 	   But 64bit processes have always behaved this way,
@@ -708,7 +708,7 @@ static void __set_personality_ia32(void)
 		 * uprobes applied to this MM need to know this and
 		 * cannot use user_64bit_mode() at that time.
 		 */
-		current->mm->context.flags = MM_CONTEXT_UPROBE_IA32;
+		__set_bit(MM_CONTEXT_UPROBE_IA32, &current->mm->context.flags);
 	}
 
 	current->personality |= force_personality32;
-- 
2.38.2



* [PATCHv14 03/17] x86: CPUID and CR3/CR4 flags for Linear Address Masking
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user() Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 02/17] x86: Allow atomic MM_CONTEXT flags setting Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 04/17] x86/mm: Handle LAM on context switch Kirill A. Shutemov
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

Enumerate Linear Address Masking and provide defines for CR3 and CR4
flags.

The new CONFIG_ADDRESS_MASKING option enables the feature support in
the kernel.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
---
 arch/x86/Kconfig                            | 11 +++++++++++
 arch/x86/include/asm/cpufeatures.h          |  1 +
 arch/x86/include/asm/processor-flags.h      |  2 ++
 arch/x86/include/uapi/asm/processor-flags.h |  6 ++++++
 4 files changed, 20 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3604074a878b..211869aa618d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2290,6 +2290,17 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 
 	  If unsure, leave at the default value.
 
+config ADDRESS_MASKING
+	bool "Linear Address Masking support"
+	depends on X86_64
+	help
+	  Linear Address Masking (LAM) modifies the checking that is applied
+	  to 64-bit linear addresses, allowing software to use the
+	  untranslated address bits for metadata.
+
+	  The capability can be used for efficient address sanitizers (ASAN)
+	  implementation and for optimizations in JITs.
+
 config HOTPLUG_CPU
 	def_bool y
 	depends on SMP
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 61012476d66e..bc662c80b99d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -314,6 +314,7 @@
 #define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* "" CMPccXADD instructions */
 #define X86_FEATURE_AMX_FP16		(12*32+21) /* "" AMX fp16 Support */
 #define X86_FEATURE_AVX_IFMA            (12*32+23) /* "" Support for VPMADD52[H,L]UQ */
+#define X86_FEATURE_LAM			(12*32+26) /* Linear Address Masking */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index a7f3d9100adb..d8cccadc83a6 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -28,6 +28,8 @@
  * On systems with SME, one bit (in a variable position!) is stolen to indicate
  * that the top-level paging structure is encrypted.
  *
+ * On systems with LAM, bits 61 and 62 are used to indicate LAM mode.
+ *
  * All of the remaining bits indicate the physical address of the top-level
  * paging structure.
  *
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index c47cc7f2feeb..d898432947ff 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -82,6 +82,10 @@
 #define X86_CR3_PCID_BITS	12
 #define X86_CR3_PCID_MASK	(_AC((1UL << X86_CR3_PCID_BITS) - 1, UL))
 
+#define X86_CR3_LAM_U57_BIT	61 /* Activate LAM for userspace, 62:57 bits masked */
+#define X86_CR3_LAM_U57		_BITULL(X86_CR3_LAM_U57_BIT)
+#define X86_CR3_LAM_U48_BIT	62 /* Activate LAM for userspace, 62:48 bits masked */
+#define X86_CR3_LAM_U48		_BITULL(X86_CR3_LAM_U48_BIT)
 #define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
 #define X86_CR3_PCID_NOFLUSH    _BITULL(X86_CR3_PCID_NOFLUSH_BIT)
 
@@ -132,6 +136,8 @@
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
 #define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
 #define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
+#define X86_CR4_LAM_SUP_BIT	28 /* LAM for supervisor pointers */
+#define X86_CR4_LAM_SUP		_BITUL(X86_CR4_LAM_SUP_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
-- 
2.38.2



* [PATCHv14 04/17] x86/mm: Handle LAM on context switch
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 03/17] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 13:49   ` Linus Torvalds
  2023-01-11 12:37 ` [PATCHv14 05/17] mm: Introduce untagged_addr_remote() Kirill A. Shutemov
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

Linear Address Masking mode for userspace pointers is encoded in CR3 bits.
The mode is selected per-process and stored in mm_context_t.

switch_mm_irqs_off() now respects the selected LAM mode and constructs
CR3 accordingly.

The active LAM mode gets recorded in the tlb_state.
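
For reference, the resulting CR3 layout on systems with PCID and LAM
(a schematic based on the defines from the previous patch, not code
from this series):

	/*
	 *  bit  63    - X86_CR3_PCID_NOFLUSH (preserve old PCID)
	 *  bit  62    - X86_CR3_LAM_U48 (mask bits 62:48)
	 *  bit  61    - X86_CR3_LAM_U57 (mask bits 62:57)
	 *  bits 12+   - physical address of the top-level paging structure
	 *  bits 11:0  - PCID
	 */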

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
---
 arch/x86/include/asm/mmu.h         |  5 +++
 arch/x86/include/asm/mmu_context.h | 24 ++++++++++++++
 arch/x86/include/asm/tlbflush.h    | 38 ++++++++++++++++++++-
 arch/x86/mm/tlb.c                  | 53 +++++++++++++++++++++---------
 4 files changed, 103 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index efa3eaee522c..22fc9fbf1d0a 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -42,6 +42,11 @@ typedef struct {
 	unsigned long flags;
 #endif
 
+#ifdef CONFIG_ADDRESS_MASKING
+	/* Active LAM mode:  X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
+	unsigned long lam_cr3_mask;
+#endif
+
 	struct mutex lock;
 	void __user *vdso;			/* vdso base address */
 	const struct vdso_image *vdso_image;	/* vdso image in use */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 53ef591a6166..8388fccc4700 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -91,6 +91,29 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
 }
 #endif
 
+#ifdef CONFIG_ADDRESS_MASKING
+static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
+{
+	return READ_ONCE(mm->context.lam_cr3_mask);
+}
+
+static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
+}
+
+#else
+
+static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
+{
+	return 0;
+}
+
+static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+}
+#endif
+
 #define enter_lazy_tlb enter_lazy_tlb
 extern void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
 
@@ -168,6 +191,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 	arch_dup_pkeys(oldmm, mm);
 	paravirt_arch_dup_mmap(oldmm, mm);
+	dup_lam(oldmm, mm);
 	return ldt_dup_context(oldmm, mm);
 }
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index cda3118f3b27..e8b47f57bd4a 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_X86_TLBFLUSH_H
 #define _ASM_X86_TLBFLUSH_H
 
-#include <linux/mm.h>
+#include <linux/mm_types.h>
 #include <linux/sched.h>
 
 #include <asm/processor.h>
@@ -12,6 +12,7 @@
 #include <asm/invpcid.h>
 #include <asm/pti.h>
 #include <asm/processor-flags.h>
+#include <asm/pgtable.h>
 
 void __flush_tlb_all(void);
 
@@ -101,6 +102,16 @@ struct tlb_state {
 	 */
 	bool invalidate_other;
 
+#ifdef CONFIG_ADDRESS_MASKING
+	/*
+	 * Active LAM mode.
+	 *
+	 * X86_CR3_LAM_U57/U48 shifted right by X86_CR3_LAM_U57_BIT or 0 if LAM
+	 * disabled.
+	 */
+	u8 lam;
+#endif
+
 	/*
 	 * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate
 	 * the corresponding user PCID needs a flush next time we
@@ -357,6 +368,31 @@ static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)
 }
 #define huge_pmd_needs_flush huge_pmd_needs_flush
 
+#ifdef CONFIG_ADDRESS_MASKING
+static inline  u64 tlbstate_lam_cr3_mask(void)
+{
+	u64 lam = this_cpu_read(cpu_tlbstate.lam);
+
+	return lam << X86_CR3_LAM_U57_BIT;
+}
+
+static inline void set_tlbstate_lam_mode(struct mm_struct *mm)
+{
+	this_cpu_write(cpu_tlbstate.lam,
+		       mm->context.lam_cr3_mask >> X86_CR3_LAM_U57_BIT);
+}
+
+#else
+
+static inline u64 tlbstate_lam_cr3_mask(void)
+{
+	return 0;
+}
+
+static inline void set_tlbstate_lam_mode(struct mm_struct *mm)
+{
+}
+#endif
 #endif /* !MODULE */
 
 static inline void __native_tlb_flush_global(unsigned long cr4)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index c1e31e9a85d7..8c330a6d0ece 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -154,26 +154,30 @@ static inline u16 user_pcid(u16 asid)
 	return ret;
 }
 
-static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
+static inline unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam)
 {
+	unsigned long cr3 = __sme_pa(pgd) | lam;
+
 	if (static_cpu_has(X86_FEATURE_PCID)) {
-		return __sme_pa(pgd) | kern_pcid(asid);
+		VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
+		cr3 |= kern_pcid(asid);
 	} else {
 		VM_WARN_ON_ONCE(asid != 0);
-		return __sme_pa(pgd);
 	}
+
+	return cr3;
 }
 
-static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
+static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid,
+					      unsigned long lam)
 {
-	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
 	/*
 	 * Use boot_cpu_has() instead of this_cpu_has() as this function
 	 * might be called during early boot. This should work even after
 	 * boot because all CPU's the have same capabilities:
 	 */
 	VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
-	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
+	return build_cr3(pgd, asid, lam) | CR3_NOFLUSH;
 }
 
 /*
@@ -274,15 +278,16 @@ static inline void invalidate_user_asid(u16 asid)
 		  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
 }
 
-static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
+static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, unsigned long lam,
+			    bool need_flush)
 {
 	unsigned long new_mm_cr3;
 
 	if (need_flush) {
 		invalidate_user_asid(new_asid);
-		new_mm_cr3 = build_cr3(pgdir, new_asid);
+		new_mm_cr3 = build_cr3(pgdir, new_asid, lam);
 	} else {
-		new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);
+		new_mm_cr3 = build_cr3_noflush(pgdir, new_asid, lam);
 	}
 
 	/*
@@ -491,6 +496,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 {
 	struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	unsigned long new_lam = mm_lam_cr3_mask(next);
 	bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy);
 	unsigned cpu = smp_processor_id();
 	u64 next_tlb_gen;
@@ -520,7 +526,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * isn't free.
 	 */
 #ifdef CONFIG_DEBUG_VM
-	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
+	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid,
+						   tlbstate_lam_cr3_mask()))) {
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
 		 * the system so hard that we don't see the call trace.
@@ -552,9 +559,15 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * instruction.
 	 */
 	if (real_prev == next) {
+		/* Not actually switching mm's */
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			   next->context.ctx_id);
 
+		/*
+		 * If this races with another thread that enables lam, 'new_lam'
+		 * might not match tlbstate_lam_cr3_mask().
+		 */
+
 		/*
 		 * Even in lazy TLB mode, the CPU should stay set in the
 		 * mm_cpumask. The TLB shootdown code can figure out from
@@ -622,15 +635,16 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		barrier();
 	}
 
+	set_tlbstate_lam_mode(next);
 	if (need_flush) {
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
-		load_new_mm_cr3(next->pgd, new_asid, true);
+		load_new_mm_cr3(next->pgd, new_asid, new_lam, true);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 	} else {
 		/* The new ASID is already up to date. */
-		load_new_mm_cr3(next->pgd, new_asid, false);
+		load_new_mm_cr3(next->pgd, new_asid, new_lam, false);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
 	}
@@ -691,6 +705,10 @@ void initialize_tlbstate_and_flush(void)
 	/* Assert that CR3 already references the right mm. */
 	WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
 
+	/* LAM expected to be disabled */
+	WARN_ON(cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57));
+	WARN_ON(mm_lam_cr3_mask(mm));
+
 	/*
 	 * Assert that CR4.PCIDE is set if needed.  (CR4.PCIDE initialization
 	 * doesn't work like other CR4 bits because it can only be set from
@@ -699,8 +717,8 @@ void initialize_tlbstate_and_flush(void)
 	WARN_ON(boot_cpu_has(X86_FEATURE_PCID) &&
 		!(cr4_read_shadow() & X86_CR4_PCIDE));
 
-	/* Force ASID 0 and force a TLB flush. */
-	write_cr3(build_cr3(mm->pgd, 0));
+	/* Disable LAM, force ASID 0 and force a TLB flush. */
+	write_cr3(build_cr3(mm->pgd, 0, 0));
 
 	/* Reinitialize tlbstate. */
 	this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_INIT);
@@ -708,6 +726,7 @@ void initialize_tlbstate_and_flush(void)
 	this_cpu_write(cpu_tlbstate.next_asid, 1);
 	this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
 	this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen);
+	set_tlbstate_lam_mode(mm);
 
 	for (i = 1; i < TLB_NR_DYN_ASIDS; i++)
 		this_cpu_write(cpu_tlbstate.ctxs[i].ctx_id, 0);
@@ -1071,8 +1090,10 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
  */
 unsigned long __get_current_cr3_fast(void)
 {
-	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
-		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+	unsigned long cr3 =
+		build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
+			  this_cpu_read(cpu_tlbstate.loaded_mm_asid),
+			  tlbstate_lam_cr3_mask());
 
 	/* For now, be very restrictive about when this can be called. */
 	VM_WARN_ON(in_nmi() || preemptible());
-- 
2.38.2



* [PATCHv14 05/17] mm: Introduce untagged_addr_remote()
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 04/17] x86/mm: Handle LAM on context switch Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 06/17] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

untagged_addr() removes tags/metadata from the address and brings it to
the canonical form. The helper is implemented on arm64 and sparc. Both of
them do untagging based on global rules.

However, Linear Address Masking (LAM) on x86 introduces per-process
settings for untagging. As a result, untagged_addr() is now only
suitable for untagging addresses of the current process.

The new helper untagged_addr_remote() has to be used when the address
targets a remote process. It requires the mmap lock for the target mm
to be taken.
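
The expected calling pattern is roughly (a sketch; see the actual
conversions in the diff below):

	mmap_read_lock(mm);
	addr = untagged_addr_remote(mm, addr);
	/* ... vma_lookup(), page table walks, etc. ... */
	mmap_read_unlock(mm);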

Export dump_mm() as it now has its first modular user: VFIO can be
compiled as a module and untagged_addr_remote() triggers dump_mm() via
mmap_assert_locked().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/sparc/include/asm/uaccess_64.h |  2 ++
 drivers/vfio/vfio_iommu_type1.c     |  2 +-
 fs/proc/task_mmu.c                  |  9 +++++++--
 include/linux/mm.h                  | 11 -----------
 include/linux/uaccess.h             | 22 ++++++++++++++++++++++
 mm/debug.c                          |  1 +
 mm/gup.c                            |  4 ++--
 mm/madvise.c                        |  5 +++--
 mm/migrate.c                        | 11 ++++++-----
 9 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/arch/sparc/include/asm/uaccess_64.h b/arch/sparc/include/asm/uaccess_64.h
index 94266a5c5b04..b825a5dd0210 100644
--- a/arch/sparc/include/asm/uaccess_64.h
+++ b/arch/sparc/include/asm/uaccess_64.h
@@ -8,8 +8,10 @@
 
 #include <linux/compiler.h>
 #include <linux/string.h>
+#include <linux/mm_types.h>
 #include <asm/asi.h>
 #include <asm/spitfire.h>
+#include <asm/pgtable.h>
 
 #include <asm/processor.h>
 #include <asm-generic/access_ok.h>
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 23c24fe98c00..daf34f957b7b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -573,7 +573,7 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 		goto done;
 	}
 
-	vaddr = untagged_addr(vaddr);
+	vaddr = untagged_addr_remote(mm, vaddr);
 
 retry:
 	vma = vma_lookup(mm, vaddr);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e35a0398db63..df139a717230 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1692,8 +1692,13 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 
 	/* watch out for wraparound */
 	start_vaddr = end_vaddr;
-	if (svpfn <= (ULONG_MAX >> PAGE_SHIFT))
-		start_vaddr = untagged_addr(svpfn << PAGE_SHIFT);
+	if (svpfn <= (ULONG_MAX >> PAGE_SHIFT)) {
+		ret = mmap_read_lock_killable(mm);
+		if (ret)
+			goto out_free;
+		start_vaddr = untagged_addr_remote(mm, svpfn << PAGE_SHIFT);
+		mmap_read_unlock(mm);
+	}
 
 	/* Ensure the address is inside the task */
 	if (start_vaddr > mm->task_size)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f3f196e4d66d..6b28eb9c6ea2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -96,17 +96,6 @@ extern int mmap_rnd_compat_bits __read_mostly;
 #include <asm/page.h>
 #include <asm/processor.h>
 
-/*
- * Architectures that support memory tagging (assigning tags to memory regions,
- * embedding these tags into addresses that point to these memory regions, and
- * checking that the memory and the pointer tags match on memory accesses)
- * redefine this macro to strip tags from pointers.
- * It's defined as noop for architectures that don't support memory tagging.
- */
-#ifndef untagged_addr
-#define untagged_addr(addr) (addr)
-#endif
-
 #ifndef __pa_symbol
 #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
 #endif
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index afb18f198843..bfdadf5f8bbb 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -10,6 +10,28 @@
 
 #include <asm/uaccess.h>
 
+/*
+ * Architectures that support memory tagging (assigning tags to memory regions,
+ * embedding these tags into addresses that point to these memory regions, and
+ * checking that the memory and the pointer tags match on memory accesses)
+ * redefine this macro to strip tags from pointers.
+ *
+ * Passing down mm_struct allows to define untagging rules on per-process
+ * basis.
+ *
+ * It's defined as noop for architectures that don't support memory tagging.
+ */
+#ifndef untagged_addr
+#define untagged_addr(addr) (addr)
+#endif
+
+#ifndef untagged_addr_remote
+#define untagged_addr_remote(mm, addr)	({		\
+	mmap_assert_locked(mm);				\
+	untagged_addr(addr);				\
+})
+#endif
+
 /*
  * Architectures should provide two primitives (raw_copy_{to,from}_user())
  * and get rid of their private instances of copy_{to,from}_user() and
diff --git a/mm/debug.c b/mm/debug.c
index 7f8e5f744e42..3c1b490c7e2b 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -215,6 +215,7 @@ void dump_mm(const struct mm_struct *mm)
 		mm->def_flags, &mm->def_flags
 	);
 }
+EXPORT_SYMBOL_GPL(dump_mm);
 
 static bool page_init_poisoning __read_mostly = true;
 
diff --git a/mm/gup.c b/mm/gup.c
index f45a3a5be53a..e28d787ba8f8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1091,7 +1091,7 @@ static long __get_user_pages(struct mm_struct *mm,
 	if (!nr_pages)
 		return 0;
 
-	start = untagged_addr(start);
+	start = untagged_addr_remote(mm, start);
 
 	VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN)));
 
@@ -1265,7 +1265,7 @@ int fixup_user_fault(struct mm_struct *mm,
 	struct vm_area_struct *vma;
 	vm_fault_t ret;
 
-	address = untagged_addr(address);
+	address = untagged_addr_remote(mm, address);
 
 	if (unlocked)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
diff --git a/mm/madvise.c b/mm/madvise.c
index a56a6d17e201..90cd4a442fd2 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1407,8 +1407,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	size_t len;
 	struct blk_plug plug;
 
-	start = untagged_addr(start);
-
 	if (!madvise_behavior_valid(behavior))
 		return -EINVAL;
 
@@ -1440,6 +1438,9 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 		mmap_read_lock(mm);
 	}
 
+	start = untagged_addr_remote(mm, start);
+	end = start + len;
+
 	blk_start_plug(&plug);
 	error = madvise_walk_vmas(mm, start, end, behavior,
 			madvise_vma_behavior);
diff --git a/mm/migrate.c b/mm/migrate.c
index a4d3fc65085f..dae5022d94b0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1719,14 +1719,17 @@ static int do_move_pages_to_node(struct mm_struct *mm,
  *         target node
  *     1 - when it has been queued
  */
-static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
+static int add_page_for_migration(struct mm_struct *mm, const void __user *p,
 		int node, struct list_head *pagelist, bool migrate_all)
 {
 	struct vm_area_struct *vma;
+	unsigned long addr;
 	struct page *page;
 	int err;
 
 	mmap_read_lock(mm);
+	addr = (unsigned long)untagged_addr_remote(mm, p);
+
 	err = -EFAULT;
 	vma = vma_lookup(mm, addr);
 	if (!vma || !vma_migratable(vma))
@@ -1831,7 +1834,6 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 
 	for (i = start = 0; i < nr_pages; i++) {
 		const void __user *p;
-		unsigned long addr;
 		int node;
 
 		err = -EFAULT;
@@ -1839,7 +1841,6 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 			goto out_flush;
 		if (get_user(node, nodes + i))
 			goto out_flush;
-		addr = (unsigned long)untagged_addr(p);
 
 		err = -ENODEV;
 		if (node < 0 || node >= MAX_NUMNODES)
@@ -1867,8 +1868,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		 * Errors in the page lookup or isolation are not fatal and we simply
 		 * report them via status
 		 */
-		err = add_page_for_migration(mm, addr, current_node,
-				&pagelist, flags & MPOL_MF_MOVE_ALL);
+		err = add_page_for_migration(mm, p, current_node, &pagelist,
+					     flags & MPOL_MF_MOVE_ALL);
 
 		if (err > 0) {
 			/* The page is successfully queued for migration */
-- 
2.38.2



* [PATCHv14 06/17] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 05/17] mm: Introduce untagged_addr_remote() Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 07/17] x86/mm: Provide arch_prctl() interface for LAM Kirill A. Shutemov
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

untagged_addr() is a helper used by the core-mm to strip tag bits and
bring the address to its canonical shape based on the rules of the
current thread. It only handles userspace addresses.

The untagging mask is stored in a per-CPU variable and set on context
switch to the task.

The tags must not be included in the check of whether it's okay to access
the userspace address. Strip tags in access_ok().
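
The untagging relies on sign-extension so that kernel addresses pass
through unchanged. The same logic in plain C (a sketch; the actual
implementation is the __untagged_addr() macro below):

	u64 untag(u64 addr, u64 untag_mask)
	{
		/* 0 for userspace addresses, all-ones for kernel ones */
		s64 sign = (s64)addr >> 63;

		/*
		 * For userspace addresses the mask strips the tag bits,
		 * e.g. ~GENMASK(62, 57) for LAM_U57. For kernel addresses
		 * 'sign' turns the mask into all-ones, leaving the
		 * address intact.
		 */
		return addr & (untag_mask | sign);
	}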

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mmu.h         |  3 +++
 arch/x86/include/asm/mmu_context.h | 11 +++++++++++
 arch/x86/include/asm/tlbflush.h    | 10 ++++++++++
 arch/x86/include/asm/uaccess.h     | 31 ++++++++++++++++++++++++++++--
 arch/x86/kernel/process.c          |  3 +++
 arch/x86/mm/init.c                 |  5 +++++
 6 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 22fc9fbf1d0a..9cac8c45a647 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -45,6 +45,9 @@ typedef struct {
 #ifdef CONFIG_ADDRESS_MASKING
 	/* Active LAM mode:  X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
 	unsigned long lam_cr3_mask;
+
+	/* Significant bits of the virtual address. Excludes tag bits. */
+	u64 untag_mask;
 #endif
 
 	struct mutex lock;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 8388fccc4700..1d0b743daebb 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -100,6 +100,12 @@ static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
 static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
+	mm->context.untag_mask = oldmm->context.untag_mask;
+}
+
+static inline void mm_reset_untag_mask(struct mm_struct *mm)
+{
+	mm->context.untag_mask = -1UL;
 }
 
 #else
@@ -112,6 +118,10 @@ static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
 static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 }
+
+static inline void mm_reset_untag_mask(struct mm_struct *mm)
+{
+}
 #endif
 
 #define enter_lazy_tlb enter_lazy_tlb
@@ -138,6 +148,7 @@ static inline int init_new_context(struct task_struct *tsk,
 		mm->context.execute_only_pkey = -1;
 	}
 #endif
+	mm_reset_untag_mask(mm);
 	init_new_context_ldt(mm);
 	return 0;
 }
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index e8b47f57bd4a..75bfaa421030 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -54,6 +54,15 @@ static inline void cr4_clear_bits(unsigned long mask)
 	local_irq_restore(flags);
 }
 
+#ifdef CONFIG_ADDRESS_MASKING
+DECLARE_PER_CPU(u64, tlbstate_untag_mask);
+
+static inline u64 current_untag_mask(void)
+{
+	return this_cpu_read(tlbstate_untag_mask);
+}
+#endif
+
 #ifndef MODULE
 /*
  * 6 because 6 should be plenty and struct tlb_state will fit in two cache
@@ -380,6 +389,7 @@ static inline void set_tlbstate_lam_mode(struct mm_struct *mm)
 {
 	this_cpu_write(cpu_tlbstate.lam,
 		       mm->context.lam_cr3_mask >> X86_CR3_LAM_U57_BIT);
+	this_cpu_write(tlbstate_untag_mask, mm->context.untag_mask);
 }
 
 #else
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 1cc756eafa44..32c9dd052e43 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -7,11 +7,13 @@
 #include <linux/compiler.h>
 #include <linux/instrumented.h>
 #include <linux/kasan-checks.h>
+#include <linux/mm_types.h>
 #include <linux/string.h>
 #include <asm/asm.h>
 #include <asm/page.h>
 #include <asm/smap.h>
 #include <asm/extable.h>
+#include <asm/tlbflush.h>
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 static inline bool pagefault_disabled(void);
@@ -21,6 +23,31 @@ static inline bool pagefault_disabled(void);
 # define WARN_ON_IN_IRQ()
 #endif
 
+#ifdef CONFIG_ADDRESS_MASKING
+/*
+ * Mask out tag bits from the address.
+ *
+ * Magic with the 'sign' allows untagging a userspace pointer without any
+ * branches while leaving kernel addresses intact.
+ */
+#define __untagged_addr(untag_mask, addr)	({			\
+	u64 __addr = (__force u64)(addr);				\
+	s64 sign = (s64)__addr >> 63;					\
+	__addr &= untag_mask | sign;					\
+	(__force __typeof__(addr))__addr;				\
+})
+
+#define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)
+
+#define untagged_addr_remote(mm, addr)	({				\
+	mmap_assert_locked(mm);						\
+	__untagged_addr((mm)->context.untag_mask, addr);		\
+})
+
+#else
+#define untagged_addr(addr)    (addr)
+#endif
+
 /**
  * access_ok - Checks if a user space pointer is valid
  * @addr: User space pointer to start of block to check
@@ -38,10 +65,10 @@ static inline bool pagefault_disabled(void);
  * Return: true (nonzero) if the memory block may be valid, false (zero)
  * if it is definitely invalid.
  */
-#define access_ok(addr, size)					\
+#define access_ok(addr, size)						\
 ({									\
 	WARN_ON_IN_IRQ();						\
-	likely(__access_ok(addr, size));				\
+	likely(__access_ok(untagged_addr(addr), size));			\
 })
 
 #include <asm-generic/access_ok.h>
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 40d156a31676..ef6bde1d40d8 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -47,6 +47,7 @@
 #include <asm/frame.h>
 #include <asm/unwind.h>
 #include <asm/tdx.h>
+#include <asm/mmu_context.h>
 
 #include "process.h"
 
@@ -367,6 +368,8 @@ void arch_setup_new_exec(void)
 		task_clear_spec_ssb_noexec(current);
 		speculation_ctrl_update(read_thread_flags());
 	}
+
+	mm_reset_untag_mask(current->mm);
 }
 
 #ifdef CONFIG_X86_IOPL_IOPERM
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d3987359d441..be5c7d1c0265 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -1044,6 +1044,11 @@ __visible DEFINE_PER_CPU_ALIGNED(struct tlb_state, cpu_tlbstate) = {
 	.cr4 = ~0UL,	/* fail hard if we screw up cr4 shadow initialization */
 };
 
+#ifdef CONFIG_ADDRESS_MASKING
+DEFINE_PER_CPU(u64, tlbstate_untag_mask);
+EXPORT_PER_CPU_SYMBOL(tlbstate_untag_mask);
+#endif
+
 void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache)
 {
 	/* entry 0 MUST be WB (hardwired to speed up translations) */
-- 
2.38.2



* [PATCHv14 07/17] x86/mm: Provide arch_prctl() interface for LAM
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 06/17] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user Kirill A. Shutemov
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

Add a few arch_prctl() handles:

 - ARCH_ENABLE_TAGGED_ADDR enables LAM. The argument is the required
   number of tag bits. It is rounded up to the nearest LAM mode that can
   provide it. For now only LAM_U57 is supported, with 6 tag bits.

 - ARCH_GET_UNTAG_MASK returns the untag mask. It indicates where the
   tag bits are located in the address.

 - ARCH_GET_MAX_TAG_BITS returns the maximum number of tag bits a user
   can request. Zero if LAM is not supported.
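
A minimal usage sketch from userspace (constants from the uapi header
below; error handling omitted):

	unsigned long bits, mask;

	/* Query the maximum number of tag bits; zero means no LAM */
	syscall(SYS_arch_prctl, ARCH_GET_MAX_TAG_BITS, &bits);

	/* Request 6 tag bits; the kernel rounds up to LAM_U57 */
	syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, 6);

	/* Read back the untag mask: ~GENMASK(62, 57) for LAM_U57 */
	syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, &mask);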

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mmu.h        |  2 ++
 arch/x86/include/uapi/asm/prctl.h |  4 +++
 arch/x86/kernel/process.c         |  3 ++
 arch/x86/kernel/process_64.c      | 55 ++++++++++++++++++++++++++++++-
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 9cac8c45a647..e80762e998ce 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -12,6 +12,8 @@
 #define MM_CONTEXT_UPROBE_IA32		0
 /* vsyscall page is accessible on this MM */
 #define MM_CONTEXT_HAS_VSYSCALL		1
+/* Do not allow changing LAM mode */
+#define MM_CONTEXT_LOCK_LAM		2
 
 /*
  * x86 has arch-specific MMU state beyond what lives in mm_struct.
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 500b96e71f18..a31e27b95b19 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -20,4 +20,8 @@
 #define ARCH_MAP_VDSO_32		0x2002
 #define ARCH_MAP_VDSO_64		0x2003
 
+#define ARCH_GET_UNTAG_MASK		0x4001
+#define ARCH_ENABLE_TAGGED_ADDR		0x4002
+#define ARCH_GET_MAX_TAG_BITS		0x4003
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ef6bde1d40d8..cc0677f58f42 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -162,6 +162,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 
 	savesegment(es, p->thread.es);
 	savesegment(ds, p->thread.ds);
+
+	if (p->mm && (clone_flags & (CLONE_VM | CLONE_VFORK)) == CLONE_VM)
+		set_bit(MM_CONTEXT_LOCK_LAM, &p->mm->context.flags);
 #else
 	p->thread.sp0 = (unsigned long) (childregs + 1);
 	savesegment(gs, p->thread.gs);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 8b06034e8c70..88aae519c8f8 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -743,6 +743,48 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 }
 #endif
 
+#ifdef CONFIG_ADDRESS_MASKING
+
+#define LAM_U57_BITS 6
+
+static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_LAM))
+		return -ENODEV;
+
+	/* PTRACE_ARCH_PRCTL */
+	if (current->mm != mm)
+		return -EINVAL;
+
+	if (mmap_write_lock_killable(mm))
+		return -EINTR;
+
+	if (test_bit(MM_CONTEXT_LOCK_LAM, &mm->context.flags)) {
+		mmap_write_unlock(mm);
+		return -EBUSY;
+	}
+
+	if (!nr_bits) {
+		mmap_write_unlock(mm);
+		return -EINVAL;
+	} else if (nr_bits <= LAM_U57_BITS) {
+		mm->context.lam_cr3_mask = X86_CR3_LAM_U57;
+		mm->context.untag_mask =  ~GENMASK(62, 57);
+	} else {
+		mmap_write_unlock(mm);
+		return -EINVAL;
+	}
+
+	write_cr3(__read_cr3() | mm->context.lam_cr3_mask);
+	set_tlbstate_lam_mode(mm);
+	set_bit(MM_CONTEXT_LOCK_LAM, &mm->context.flags);
+
+	mmap_write_unlock(mm);
+
+	return 0;
+}
+#endif
+
 long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 {
 	int ret = 0;
@@ -830,7 +872,18 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 	case ARCH_MAP_VDSO_64:
 		return prctl_map_vdso(&vdso_image_64, arg2);
 #endif
-
+#ifdef CONFIG_ADDRESS_MASKING
+	case ARCH_GET_UNTAG_MASK:
+		return put_user(task->mm->context.untag_mask,
+				(unsigned long __user *)arg2);
+	case ARCH_ENABLE_TAGGED_ADDR:
+		return prctl_enable_tagged_addr(task->mm, arg2);
+	case ARCH_GET_MAX_TAG_BITS:
+		if (!cpu_feature_enabled(X86_FEATURE_LAM))
+			return put_user(0, (unsigned long __user *)arg2);
+		else
+			return put_user(LAM_U57_BITS, (unsigned long __user *)arg2);
+#endif
 	default:
 		ret = -EINVAL;
 		break;
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 07/17] x86/mm: Provide arch_prctl() interface for LAM Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-17 13:05   ` Peter Zijlstra
  2023-01-11 12:37 ` [PATCHv14 09/17] mm: Expose untagging mask in /proc/$PID/status Kirill A. Shutemov
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

Use a static key to reduce untagged_addr() overhead.

The key only gets enabled when the first process enables LAM.
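
As a worked example of the masking arithmetic in __untagged_addr()
(illustrative values; the LAM_U57 untag mask is ~GENMASK(62, 57)):

	u64 untag_mask = ~0x7e00000000000000ULL; /* ~GENMASK(62, 57) */
	u64 user = 0x2a007f1234567000ULL;        /* tag 0x15 in bits 62:57 */
	u64 kern = 0xffff888012345000ULL;        /* bit 63 set */
	s64 sign;

	sign = (s64)user >> 63;    /* 0 for a user pointer */
	user &= untag_mask | sign; /* -> 0x00007f1234567000, tag cleared */

	sign = (s64)kern >> 63;    /* -1 for a kernel pointer */
	kern &= untag_mask | sign; /* unchanged: mask | -1 == ~0 */

The sign-extension keeps kernel addresses intact without a branch; the
only remaining cost is the masking itself, which the static key hides
until the first LAM user.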

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/uaccess.h | 8 ++++++--
 arch/x86/kernel/process_64.c   | 4 ++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 32c9dd052e43..f9f85d596581 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -24,6 +24,8 @@ static inline bool pagefault_disabled(void);
 #endif
 
 #ifdef CONFIG_ADDRESS_MASKING
+DECLARE_STATIC_KEY_FALSE(tagged_addr_key);
+
 /*
  * Mask out tag bits from the address.
  *
@@ -32,8 +34,10 @@ static inline bool pagefault_disabled(void);
  */
 #define __untagged_addr(untag_mask, addr)	({			\
 	u64 __addr = (__force u64)(addr);				\
-	s64 sign = (s64)__addr >> 63;					\
-	__addr &= untag_mask | sign;					\
+	if (static_branch_likely(&tagged_addr_key)) {			\
+		s64 sign = (s64)__addr >> 63;				\
+		__addr &= untag_mask | sign;				\
+	}								\
 	(__force __typeof__(addr))__addr;				\
 })
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 88aae519c8f8..1b41c60ebf6e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -745,6 +745,9 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 
 #ifdef CONFIG_ADDRESS_MASKING
 
+DEFINE_STATIC_KEY_FALSE(tagged_addr_key);
+EXPORT_SYMBOL_GPL(tagged_addr_key);
+
 #define LAM_U57_BITS 6
 
 static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
@@ -781,6 +784,7 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 
 	mmap_write_unlock(mm);
 
+	static_branch_enable(&tagged_addr_key);
 	return 0;
 }
 #endif
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv14 09/17] mm: Expose untagging mask in /proc/$PID/status
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 10/17] iommu/sva: Replace pasid_valid() helper with mm_valid_pasid() Kirill A. Shutemov
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov,
	Catalin Marinas

Add a line to /proc/$PID/status to report untag_mask. It can be used
to find out the LAM status of a process from the outside, which is
useful for debuggers.
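
For example, a process that has enabled LAM_U57 on x86 would report
(illustrative value, ~GENMASK(62, 57)):

	untag_mask:	0x81ffffffffffffff

while a process with no address tagging reports 0xffffffffffffffff,
and arm64 with TBI reports -1UL >> 8.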

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/mmu_context.h    | 6 ++++++
 arch/sparc/include/asm/mmu_context_64.h | 6 ++++++
 arch/x86/include/asm/mmu_context.h      | 6 ++++++
 fs/proc/array.c                         | 6 ++++++
 include/linux/mmu_context.h             | 7 +++++++
 5 files changed, 31 insertions(+)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 72dbd6400549..56911691bef0 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -288,6 +288,12 @@ void post_ttbr_update_workaround(void);
 unsigned long arm64_mm_context_get(struct mm_struct *mm);
 void arm64_mm_context_put(struct mm_struct *mm);
 
+#define mm_untag_mask mm_untag_mask
+static inline unsigned long mm_untag_mask(struct mm_struct *mm)
+{
+	return -1UL >> 8;
+}
+
 #include <asm-generic/mmu_context.h>
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/sparc/include/asm/mmu_context_64.h b/arch/sparc/include/asm/mmu_context_64.h
index 7a8380c63aab..799e797c5cdd 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -185,6 +185,12 @@ static inline void finish_arch_post_lock_switch(void)
 	}
 }
 
+#define mm_untag_mask mm_untag_mask
+static inline unsigned long mm_untag_mask(struct mm_struct *mm)
+{
+       return -1UL >> adi_nbits();
+}
+
 #include <asm-generic/mmu_context.h>
 
 #endif /* !(__ASSEMBLY__) */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 1d0b743daebb..4b77c96da07f 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -103,6 +103,12 @@ static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 	mm->context.untag_mask = oldmm->context.untag_mask;
 }
 
+#define mm_untag_mask mm_untag_mask
+static inline unsigned long mm_untag_mask(struct mm_struct *mm)
+{
+	return mm->context.untag_mask;
+}
+
 static inline void mm_reset_untag_mask(struct mm_struct *mm)
 {
 	mm->context.untag_mask = -1UL;
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 49283b8103c7..d2a94eafe9a3 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -428,6 +428,11 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
 	seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
 }
 
+static inline void task_untag_mask(struct seq_file *m, struct mm_struct *mm)
+{
+	seq_printf(m, "untag_mask:\t%#lx\n", mm_untag_mask(mm));
+}
+
 int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task)
 {
@@ -443,6 +448,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 		task_mem(m, mm);
 		task_core_dumping(m, task);
 		task_thp_status(m, mm);
+		task_untag_mask(m, mm);
 		mmput(mm);
 	}
 	task_sig(m, task);
diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h
index b9b970f7ab45..14b9c1fa05c4 100644
--- a/include/linux/mmu_context.h
+++ b/include/linux/mmu_context.h
@@ -28,4 +28,11 @@ static inline void leave_mm(int cpu) { }
 # define task_cpu_possible(cpu, p)	cpumask_test_cpu((cpu), task_cpu_possible_mask(p))
 #endif
 
+#ifndef mm_untag_mask
+static inline unsigned long mm_untag_mask(struct mm_struct *mm)
+{
+	return -1UL;
+}
+#endif
+
 #endif
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv14 10/17] iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (8 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 09/17] mm: Expose untagging mask in /proc/$PID/status Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 11/17] x86/mm/iommu/sva: Make LAM and SVA mutually exclusive Kirill A. Shutemov
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

The kernel has a few users of pasid_valid(), and all but one check
whether the process has a PASID allocated. The helper takes an ioasid_t
as input.

Replace the helper with mm_valid_pasid(), which takes an mm_struct as
the argument. The only caller that checks a PASID not tied to an
mm_struct is open-coded now.

This is a preparatory patch. It helps avoid ifdeffery: there is no need
to dereference mm->pasid in generic code to check whether the process
has a PASID.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/traps.c   | 6 +++---
 drivers/iommu/iommu-sva.c | 4 ++--
 include/linux/ioasid.h    | 9 ---------
 include/linux/sched/mm.h  | 8 +++++++-
 4 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d317dc3d06a3..8b83d8fbce71 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -671,15 +671,15 @@ static bool try_fixup_enqcmd_gp(void)
 	if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
 		return false;
 
-	pasid = current->mm->pasid;
-
 	/*
 	 * If the mm has not been allocated a
 	 * PASID, the #GP can not be fixed up.
 	 */
-	if (!pasid_valid(pasid))
+	if (!mm_valid_pasid(current->mm))
 		return false;
 
+	pasid = current->mm->pasid;
+
 	/*
 	 * Did this thread already have its PASID activated?
 	 * If so, the #GP must be from something else.
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 24bf9b2b58aa..4ee2929f0d7a 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -34,14 +34,14 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
 
 	mutex_lock(&iommu_sva_lock);
 	/* Is a PASID already associated with this mm? */
-	if (pasid_valid(mm->pasid)) {
+	if (mm_valid_pasid(mm)) {
 		if (mm->pasid < min || mm->pasid >= max)
 			ret = -EOVERFLOW;
 		goto out;
 	}
 
 	pasid = ioasid_alloc(&iommu_sva_pasid, min, max, mm);
-	if (!pasid_valid(pasid))
+	if (pasid == INVALID_IOASID)
 		ret = -ENOMEM;
 	else
 		mm_pasid_set(mm, pasid);
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index af1c9d62e642..836ae09e92c2 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -40,10 +40,6 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
 int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
 void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
 int ioasid_set_data(ioasid_t ioasid, void *data);
-static inline bool pasid_valid(ioasid_t ioasid)
-{
-	return ioasid != INVALID_IOASID;
-}
 
 #else /* !CONFIG_IOASID */
 static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
@@ -74,10 +70,5 @@ static inline int ioasid_set_data(ioasid_t ioasid, void *data)
 	return -ENOTSUPP;
 }
 
-static inline bool pasid_valid(ioasid_t ioasid)
-{
-	return false;
-}
-
 #endif /* CONFIG_IOASID */
 #endif /* __LINUX_IOASID_H */
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 2a243616f222..b69fe7e8c0ac 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -457,6 +457,11 @@ static inline void mm_pasid_init(struct mm_struct *mm)
 	mm->pasid = INVALID_IOASID;
 }
 
+static inline bool mm_valid_pasid(struct mm_struct *mm)
+{
+	return mm->pasid != INVALID_IOASID;
+}
+
 /* Associate a PASID with an mm_struct: */
 static inline void mm_pasid_set(struct mm_struct *mm, u32 pasid)
 {
@@ -465,13 +470,14 @@ static inline void mm_pasid_set(struct mm_struct *mm, u32 pasid)
 
 static inline void mm_pasid_drop(struct mm_struct *mm)
 {
-	if (pasid_valid(mm->pasid)) {
+	if (mm_valid_pasid(mm)) {
 		ioasid_free(mm->pasid);
 		mm->pasid = INVALID_IOASID;
 	}
 }
 #else
 static inline void mm_pasid_init(struct mm_struct *mm) {}
+static inline bool mm_valid_pasid(struct mm_struct *mm) { return false; }
 static inline void mm_pasid_set(struct mm_struct *mm, u32 pasid) {}
 static inline void mm_pasid_drop(struct mm_struct *mm) {}
 #endif
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv14 11/17] x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (9 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 10/17] iommu/sva: Replace pasid_valid() helper with mm_valid_pasid() Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 12/17] selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking Kirill A. Shutemov
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

IOMMU and SVA-capable devices know nothing about LAM and only expect
canonical addresses. An attempt to pass down a tagged pointer will lead
to an address translation failure.

By default, do not allow both LAM and SVA to be enabled in the same
process.

The new ARCH_FORCE_TAGGED_SVA arch_prctl() overrides the limitation.
By using the arch_prctl(), userspace takes responsibility to never pass
a tagged address to the device.
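
A sketch of the expected opt-in flow (illustrative, using the prctl
numbers from this series; error handling elided):

	static int force_lam_with_sva(void)
	{
		/* Promise to pass only untagged addresses to the device */
		if (syscall(SYS_arch_prctl, ARCH_FORCE_TAGGED_SVA))
			return -1;

		/* With the override set, LAM and SVA may coexist */
		return syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, 6);
	}

Without the override, whichever of the two is requested second fails:
prctl_enable_tagged_addr() rejects an mm with a valid PASID, and
iommu_sva_alloc_pasid() rejects an mm with LAM enabled.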

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/mmu.h         | 2 ++
 arch/x86/include/asm/mmu_context.h | 6 ++++++
 arch/x86/include/uapi/asm/prctl.h  | 1 +
 arch/x86/kernel/process_64.c       | 7 +++++++
 drivers/iommu/iommu-sva.c          | 4 ++++
 include/linux/mmu_context.h        | 7 +++++++
 6 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index e80762e998ce..0da5c227f490 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -14,6 +14,8 @@
 #define MM_CONTEXT_HAS_VSYSCALL		1
 /* Do not allow changing LAM mode */
 #define MM_CONTEXT_LOCK_LAM		2
+/* Allow LAM and SVA coexisting */
+#define MM_CONTEXT_FORCE_TAGGED_SVA	3
 
 /*
  * x86 has arch-specific MMU state beyond what lives in mm_struct.
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 4b77c96da07f..6ffc42dfd59d 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -114,6 +114,12 @@ static inline void mm_reset_untag_mask(struct mm_struct *mm)
 	mm->context.untag_mask = -1UL;
 }
 
+#define arch_pgtable_dma_compat arch_pgtable_dma_compat
+static inline bool arch_pgtable_dma_compat(struct mm_struct *mm)
+{
+	return !mm_lam_cr3_mask(mm) ||
+		test_bit(MM_CONTEXT_FORCE_TAGGED_SVA, &mm->context.flags);
+}
 #else
 
 static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index a31e27b95b19..eb290d89cb32 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -23,5 +23,6 @@
 #define ARCH_GET_UNTAG_MASK		0x4001
 #define ARCH_ENABLE_TAGGED_ADDR		0x4002
 #define ARCH_GET_MAX_TAG_BITS		0x4003
+#define ARCH_FORCE_TAGGED_SVA		0x4004
 
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1b41c60ebf6e..0831d2be190f 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -759,6 +759,10 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 	if (current->mm != mm)
 		return -EINVAL;
 
+	if (mm_valid_pasid(mm) &&
+	    !test_bit(MM_CONTEXT_FORCE_TAGGED_SVA, &mm->context.flags))
+		return -EINTR;
+
 	if (mmap_write_lock_killable(mm))
 		return -EINTR;
 
@@ -882,6 +886,9 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 				(unsigned long __user *)arg2);
 	case ARCH_ENABLE_TAGGED_ADDR:
 		return prctl_enable_tagged_addr(task->mm, arg2);
+	case ARCH_FORCE_TAGGED_SVA:
+		set_bit(MM_CONTEXT_FORCE_TAGGED_SVA, &task->mm->context.flags);
+		return 0;
 	case ARCH_GET_MAX_TAG_BITS:
 		if (!cpu_feature_enabled(X86_FEATURE_LAM))
 			return put_user(0, (unsigned long __user *)arg2);
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 4ee2929f0d7a..dd76a1a09cf7 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -2,6 +2,7 @@
 /*
  * Helpers for IOMMU drivers implementing SVA
  */
+#include <linux/mmu_context.h>
 #include <linux/mutex.h>
 #include <linux/sched/mm.h>
 #include <linux/iommu.h>
@@ -32,6 +33,9 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
 	    min == 0 || max < min)
 		return -EINVAL;
 
+	if (!arch_pgtable_dma_compat(mm))
+		return -EBUSY;
+
 	mutex_lock(&iommu_sva_lock);
 	/* Is a PASID already associated with this mm? */
 	if (mm_valid_pasid(mm)) {
diff --git a/include/linux/mmu_context.h b/include/linux/mmu_context.h
index 14b9c1fa05c4..f2b7a3f04099 100644
--- a/include/linux/mmu_context.h
+++ b/include/linux/mmu_context.h
@@ -35,4 +35,11 @@ static inline unsigned long mm_untag_mask(struct mm_struct *mm)
 }
 #endif
 
+#ifndef arch_pgtable_dma_compat
+static inline bool arch_pgtable_dma_compat(struct mm_struct *mm)
+{
+	return true;
+}
+#endif
+
 #endif
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv14 12/17] selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (10 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 11/17] x86/mm/iommu/sva: Make LAM and SVA mutually exclusive Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 13/17] selftests/x86/lam: Add mmap and SYSCALL " Kirill A. Shutemov
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Weihong Zhang,
	Kirill A . Shutemov

From: Weihong Zhang <weihong.zhang@intel.com>

LAM is supported only in 64-bit mode and applies only to addresses used
for data accesses. In 64-bit mode, linear addresses have 64 bits. LAM
is applied to 64-bit linear addresses and allows software to use the
high bits for metadata. LAM supports configurations that differ
regarding which pointer bits are masked and can be used for metadata.

LAM includes the following mode:

 - LAM_U57: pointer bits in positions 62:57 are masked (LAM width 6),
   allowing bits 62:57 of a user pointer to be used as metadata (see
   the sketch below).
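
For instance, a 6-bit tag can be placed in a pointer like this (an
illustrative fragment mirroring what set_metadata() below does):

	uint64_t ptr = (uint64_t)malloc(MALLOC_LEN);
	uint64_t tag = 0x2a;	/* any non-zero 6-bit value */
	char *tagged = (char *)(ptr | (tag << 57));

	/* With LAM_U57 enabled, both names reference the same memory */
	*tagged = 'x';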

There are several arch_prctls:

ARCH_ENABLE_TAGGED_ADDR: enable LAM mode, masking the high bits of a
user pointer.
ARCH_GET_UNTAG_MASK: get the current untag mask.
ARCH_GET_MAX_TAG_BITS: the maximum number of tag bits a user can
request; zero if LAM is not supported.

LAM mode is per-process, and a process has only one chance to set it;
there is no API to disable LAM mode. So all test cases are run in
child processes.

Functions of this test:

MALLOC

 - LAM_U57 masks bits 62:57 of a user pointer. A userspace process can
   dereference such pointers.

 - With LAM disabled, dereferencing a pointer with metadata above bit 48
   or bit 57 triggers SIGSEGV.

TAG_BITS

 - The maximum number of tag bits for LAM_U57 is 6.

Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/Makefile |   2 +-
 tools/testing/selftests/x86/lam.c    | 326 +++++++++++++++++++++++++++
 2 files changed, 327 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/lam.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 0388c4d60af0..c1a16a9d4f2f 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -18,7 +18,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
 TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
-			corrupt_xstate_header amx
+			corrupt_xstate_header amx lam
 # Some selftests require 32bit support enabled also on 64bit systems
 TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
 
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
new file mode 100644
index 000000000000..268c1d2749af
--- /dev/null
+++ b/tools/testing/selftests/x86/lam.c
@@ -0,0 +1,326 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/syscall.h>
+#include <time.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <inttypes.h>
+
+#include "../kselftest.h"
+
+#ifndef __x86_64__
+# error This test is 64-bit only
+#endif
+
+/* LAM modes, these definitions were copied from kernel code */
+#define LAM_NONE                0
+#define LAM_U57_BITS            6
+
+#define LAM_U57_MASK            (0x3fULL << 57)
+/* arch prctl for LAM */
+#define ARCH_GET_UNTAG_MASK     0x4001
+#define ARCH_ENABLE_TAGGED_ADDR 0x4002
+#define ARCH_GET_MAX_TAG_BITS   0x4003
+
+/* Specified test function bits */
+#define FUNC_MALLOC             0x1
+#define FUNC_BITS               0x2
+
+#define TEST_MASK               0x3
+
+#define MALLOC_LEN              32
+
+struct testcases {
+	unsigned int later;
+	int expected; /* 2: SIGSEGV Error; 1: other errors */
+	unsigned long lam;
+	uint64_t addr;
+	int (*test_func)(struct testcases *test);
+	const char *msg;
+};
+
+int tests_cnt;
+jmp_buf segv_env;
+
+static void segv_handler(int sig)
+{
+	ksft_print_msg("Get segmentation fault(%d).", sig);
+	siglongjmp(segv_env, 1);
+}
+
+static inline int cpu_has_lam(void)
+{
+	unsigned int cpuinfo[4];
+
+	__cpuid_count(0x7, 1, cpuinfo[0], cpuinfo[1], cpuinfo[2], cpuinfo[3]);
+
+	return (cpuinfo[0] & (1 << 26));
+}
+
+/*
+ * Enable tagged addressing and read back the untag mask.
+ * Check if the untag mask is as expected.
+ *
+ * @return:
+ * 0: Set LAM mode successfully
+ * others: failed to set LAM
+ */
+static int set_lam(unsigned long lam)
+{
+	int ret = 0;
+	uint64_t ptr = 0;
+
+	if (lam != LAM_U57_BITS && lam != LAM_NONE)
+		return -1;
+
+	/* Skip check return */
+	syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, lam);
+
+	/* Get untagged mask */
+	syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, &ptr);
+
+	/* Check mask returned is expected */
+	if (lam == LAM_U57_BITS)
+		ret = (ptr != ~(LAM_U57_MASK));
+	else if (lam == LAM_NONE)
+		ret = (ptr != -1ULL);
+
+	return ret;
+}
+
+static unsigned long get_default_tag_bits(void)
+{
+	pid_t pid;
+	int lam = LAM_NONE;
+	int ret = 0;
+
+	pid = fork();
+	if (pid < 0) {
+		perror("Fork failed.");
+	} else if (pid == 0) {
+		/* Set LAM mode in child process */
+		if (set_lam(LAM_U57_BITS) == 0)
+			lam = LAM_U57_BITS;
+		else
+			lam = LAM_NONE;
+		exit(lam);
+	} else {
+		wait(&ret);
+		lam = WEXITSTATUS(ret);
+	}
+
+	return lam;
+}
+
+/* According to LAM mode, set metadata in high bits */
+static uint64_t set_metadata(uint64_t src, unsigned long lam)
+{
+	uint64_t metadata;
+
+	srand(time(NULL));
+
+	switch (lam) {
+	case LAM_U57_BITS: /* Set metadata in bits 62:57 */
+		/* Get a random non-zero value as metadata */
+		metadata = (rand() % ((1UL << LAM_U57_BITS) - 1) + 1) << 57;
+		metadata |= (src & ~(LAM_U57_MASK));
+		break;
+	default:
+		metadata = src;
+		break;
+	}
+
+	return metadata;
+}
+
+/*
+ * Set metadata in a user pointer and compare the new pointer with the
+ * original pointer; both should point to the same address.
+ *
+ * @return:
+ * 0: the value seen through the pointer with metadata matches the original
+ * 1: the values do not match.
+ */
+static int handle_lam_test(void *src, unsigned int lam)
+{
+	char *ptr;
+
+	strcpy((char *)src, "USER POINTER");
+
+	ptr = (char *)set_metadata((uint64_t)src, lam);
+	if (src == ptr)
+		return 0;
+
+	/* Copy a string into the pointer with metadata */
+	strcpy((char *)ptr, "METADATA POINTER");
+
+	return (!!strcmp((char *)src, (char *)ptr));
+}
+
+
+int handle_max_bits(struct testcases *test)
+{
+	unsigned long exp_bits = get_default_tag_bits();
+	unsigned long bits = 0;
+
+	if (exp_bits != LAM_NONE)
+		exp_bits = LAM_U57_BITS;
+
+	/* Get LAM max tag bits */
+	if (syscall(SYS_arch_prctl, ARCH_GET_MAX_TAG_BITS, &bits) == -1)
+		return 1;
+
+	return (exp_bits != bits);
+}
+
+/*
+ * Test the LAM feature by dereferencing a pointer obtained from malloc().
+ * @return 0: pass; 1: failure during the test; 2: SIGSEGV received.
+ */
+static int handle_malloc(struct testcases *test)
+{
+	char *ptr = NULL;
+	int ret = 0;
+
+	if (test->later == 0 && test->lam != 0)
+		if (set_lam(test->lam) == -1)
+			return 1;
+
+	ptr = (char *)malloc(MALLOC_LEN);
+	if (ptr == NULL) {
+		perror("malloc() failure\n");
+		return 1;
+	}
+
+	/* Set signal handler */
+	if (sigsetjmp(segv_env, 1) == 0) {
+		signal(SIGSEGV, segv_handler);
+		ret = handle_lam_test(ptr, test->lam);
+	} else {
+		ret = 2;
+	}
+
+	if (test->later != 0 && test->lam != 0)
+		if (set_lam(test->lam) == -1 && ret == 0)
+			ret = 1;
+
+	free(ptr);
+
+	return ret;
+}
+
+static int fork_test(struct testcases *test)
+{
+	int ret, child_ret;
+	pid_t pid;
+
+	pid = fork();
+	if (pid < 0) {
+		perror("Fork failed.");
+		ret = 1;
+	} else if (pid == 0) {
+		ret = test->test_func(test);
+		exit(ret);
+	} else {
+		wait(&child_ret);
+		ret = WEXITSTATUS(child_ret);
+	}
+
+	return ret;
+}
+
+static void run_test(struct testcases *test, int count)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < count; i++) {
+		struct testcases *t = test + i;
+
+		/* fork a process to run test case */
+		ret = fork_test(t);
+		if (ret != 0)
+			ret = (t->expected == ret);
+		else
+			ret = !(t->expected);
+
+		tests_cnt++;
+		ksft_test_result(ret, t->msg);
+	}
+}
+
+static struct testcases malloc_cases[] = {
+	{
+		.later = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_malloc,
+		.msg = "MALLOC: LAM_U57. Dereferencing pointer with metadata\n",
+	},
+	{
+		.later = 1,
+		.expected = 2,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_malloc,
+		.msg = "MALLOC:[Negative] Disable LAM. Dereferencing pointer with metadata.\n",
+	},
+};
+
+
+static struct testcases bits_cases[] = {
+	{
+		.test_func = handle_max_bits,
+		.msg = "BITS: Check default tag bits\n",
+	},
+};
+
+static void cmd_help(void)
+{
+	printf("usage: lam [-h] [-t test list]\n");
+	printf("\t-t test list: run tests specified in the test list, default:0x%x\n", TEST_MASK);
+	printf("\t\t0x1:malloc; 0x2:max_bits;\n");
+	printf("\t-h: help\n");
+}
+
+int main(int argc, char **argv)
+{
+	int c = 0;
+	unsigned int tests = TEST_MASK;
+
+	tests_cnt = 0;
+
+	if (!cpu_has_lam()) {
+		ksft_print_msg("Unsupported LAM feature!\n");
+		return -1;
+	}
+
+	while ((c = getopt(argc, argv, "ht:")) != -1) {
+		switch (c) {
+		case 't':
+			tests = strtoul(optarg, NULL, 16);
+			if (!(tests & TEST_MASK)) {
+				ksft_print_msg("Invalid argument!\n");
+				return -1;
+			}
+			break;
+		case 'h':
+			cmd_help();
+			return 0;
+		default:
+			ksft_print_msg("Invalid argument\n");
+			return -1;
+		}
+	}
+
+	if (tests & FUNC_MALLOC)
+		run_test(malloc_cases, ARRAY_SIZE(malloc_cases));
+
+	if (tests & FUNC_BITS)
+		run_test(bits_cases, ARRAY_SIZE(bits_cases));
+
+	ksft_set_plan(tests_cnt);
+
+	return ksft_exit_pass();
+}
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv14 13/17] selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (11 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 12/17] selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 14/17] selftests/x86/lam: Add io_uring " Kirill A. Shutemov
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Weihong Zhang,
	Kirill A . Shutemov

From: Weihong Zhang <weihong.zhang@intel.com>

Add mmap and SYSCALL test cases.

SYSCALL test cases:

 - LAM supports setting metadata in the high bits 62:57 (LAM_U57) of a
   user pointer. Pass such a pointer to a SYSCALL; the SYSCALL can
   dereference the pointer and return the correct result (see the
   sketch below).

 - With LAM disabled, pass a pointer with metadata in the high bits to
   a SYSCALL; the SYSCALL returns -1 (EFAULT).
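
An illustrative fragment of the SYSCALL case (uname(2) stands in for
any pointer-taking syscall, as in handle_syscall() below):

	struct utsname un;
	struct utsname *tagged =
		(struct utsname *)((uint64_t)&un | (0x15ULL << 57));

	/* Succeeds with LAM_U57 enabled; fails with EFAULT otherwise */
	if (uname(tagged) < 0)
		perror("uname");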

MMAP test cases:

 - Enable LAM_U57 and mmap() a low address (below bit 47); with
   metadata set in the high bits of the address, dereferencing the
   address should be allowed.

 - Enable LAM_U57 and mmap() a high address (above bit 47); with
   metadata set in the high bits of the address, dereferencing the
   address should be allowed. A placement sketch follows below.
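
The high-address cases are only mappable on machines with 5-level
paging (LA57); the test skips them otherwise. An illustrative placement
fragment (HIGH_ADDR matches the constant used by the test):

	/* Above bit 47: requires LA57; below bit 47 always works */
	void *ptr = mmap((void *)(0x3UL << 48), PAGE_SIZE,
			 PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);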

Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/lam.c | 144 +++++++++++++++++++++++++++++-
 1 file changed, 140 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 268c1d2749af..39ebfc511685 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -7,6 +7,7 @@
 #include <signal.h>
 #include <setjmp.h>
 #include <sys/mman.h>
+#include <sys/utsname.h>
 #include <sys/wait.h>
 #include <inttypes.h>
 
@@ -29,11 +30,18 @@
 /* Specified test function bits */
 #define FUNC_MALLOC             0x1
 #define FUNC_BITS               0x2
+#define FUNC_MMAP               0x4
+#define FUNC_SYSCALL            0x8
 
-#define TEST_MASK               0x3
+#define TEST_MASK               0xf
+
+#define LOW_ADDR                (0x1UL << 30)
+#define HIGH_ADDR               (0x3UL << 48)
 
 #define MALLOC_LEN              32
 
+#define PAGE_SIZE               (4 << 10)
+
 struct testcases {
 	unsigned int later;
 	int expected; /* 2: SIGSEGV Error; 1: other errors */
@@ -49,6 +57,7 @@ jmp_buf segv_env;
 static void segv_handler(int sig)
 {
 	ksft_print_msg("Get segmentation fault(%d).", sig);
+
 	siglongjmp(segv_env, 1);
 }
 
@@ -61,6 +70,16 @@ static inline int cpu_has_lam(void)
 	return (cpuinfo[0] & (1 << 26));
 }
 
+/* Check 5-level page table feature in CPUID.(EAX=07H, ECX=00H):ECX.[bit 16] */
+static inline int cpu_has_la57(void)
+{
+	unsigned int cpuinfo[4];
+
+	__cpuid_count(0x7, 0, cpuinfo[0], cpuinfo[1], cpuinfo[2], cpuinfo[3]);
+
+	return (cpuinfo[2] & (1 << 16));
+}
+
 /*
  * Set tagged address and read back untag mask.
  * check if the untagged mask is expected.
@@ -213,6 +232,68 @@ static int handle_malloc(struct testcases *test)
 	return ret;
 }
 
+static int handle_mmap(struct testcases *test)
+{
+	void *ptr;
+	unsigned int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED;
+	int ret = 0;
+
+	if (test->later == 0 && test->lam != 0)
+		if (set_lam(test->lam) != 0)
+			return 1;
+
+	ptr = mmap((void *)test->addr, PAGE_SIZE, PROT_READ | PROT_WRITE,
+		   flags, -1, 0);
+	if (ptr == MAP_FAILED) {
+		if (test->addr == HIGH_ADDR)
+			if (!cpu_has_la57())
+				return 3; /* LA57 not supported */
+		return 1;
+	}
+
+	if (test->later != 0 && test->lam != 0)
+		if (set_lam(test->lam) != 0)
+			ret = 1;
+
+	if (ret == 0) {
+		if (sigsetjmp(segv_env, 1) == 0) {
+			signal(SIGSEGV, segv_handler);
+			ret = handle_lam_test(ptr, test->lam);
+		} else {
+			ret = 2;
+		}
+	}
+
+	munmap(ptr, PAGE_SIZE);
+	return ret;
+}
+
+static int handle_syscall(struct testcases *test)
+{
+	struct utsname unme, *pu;
+	int ret = 0;
+
+	if (test->later == 0 && test->lam != 0)
+		if (set_lam(test->lam) != 0)
+			return 1;
+
+	if (sigsetjmp(segv_env, 1) == 0) {
+		signal(SIGSEGV, segv_handler);
+		pu = (struct utsname *)set_metadata((uint64_t)&unme, test->lam);
+		ret = uname(pu);
+		if (ret < 0)
+			ret = 1;
+	} else {
+		ret = 2;
+	}
+
+	if (test->later != 0 && test->lam != 0)
+		if (set_lam(test->lam) != -1 && ret == 0)
+			ret = 1;
+
+	return ret;
+}
+
 static int fork_test(struct testcases *test)
 {
 	int ret, child_ret;
@@ -241,13 +322,20 @@ static void run_test(struct testcases *test, int count)
 		struct testcases *t = test + i;
 
 		/* fork a process to run test case */
+		tests_cnt++;
 		ret = fork_test(t);
+
+		/* return 3 means LA57 is not supported; skip the case */
+		if (ret == 3) {
+			ksft_test_result_skip(t->msg);
+			continue;
+		}
+
 		if (ret != 0)
 			ret = (t->expected == ret);
 		else
 			ret = !(t->expected);
 
-		tests_cnt++;
 		ksft_test_result(ret, t->msg);
 	}
 }
@@ -268,7 +356,6 @@ static struct testcases malloc_cases[] = {
 	},
 };
 
-
 static struct testcases bits_cases[] = {
 	{
 		.test_func = handle_max_bits,
@@ -276,11 +363,54 @@ static struct testcases bits_cases[] = {
 	},
 };
 
+static struct testcases syscall_cases[] = {
+	{
+		.later = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_syscall,
+		.msg = "SYSCALL: LAM_U57. syscall with metadata\n",
+	},
+	{
+		.later = 1,
+		.expected = 1,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_syscall,
+		.msg = "SYSCALL:[Negative] Disable LAM. Dereferencing pointer with metadata.\n",
+	},
+};
+
+static struct testcases mmap_cases[] = {
+	{
+		.later = 1,
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.addr = HIGH_ADDR,
+		.test_func = handle_mmap,
+		.msg = "MMAP: First mmap high address, then set LAM_U57.\n",
+	},
+	{
+		.later = 0,
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.addr = HIGH_ADDR,
+		.test_func = handle_mmap,
+		.msg = "MMAP: First LAM_U57, then High address.\n",
+	},
+	{
+		.later = 0,
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.addr = LOW_ADDR,
+		.test_func = handle_mmap,
+		.msg = "MMAP: First LAM_U57, then Low address.\n",
+	},
+};
+
 static void cmd_help(void)
 {
 	printf("usage: lam [-h] [-t test list]\n");
 	printf("\t-t test list: run tests specified in the test list, default:0x%x\n", TEST_MASK);
-	printf("\t\t0x1:malloc; 0x2:max_bits;\n");
+	printf("\t\t0x1:malloc; 0x2:max_bits; 0x4:mmap; 0x8:syscall.\n");
 	printf("\t-h: help\n");
 }
 
@@ -320,6 +450,12 @@ int main(int argc, char **argv)
 	if (tests & FUNC_BITS)
 		run_test(bits_cases, ARRAY_SIZE(bits_cases));
 
+	if (tests & FUNC_MMAP)
+		run_test(mmap_cases, ARRAY_SIZE(mmap_cases));
+
+	if (tests & FUNC_SYSCALL)
+		run_test(syscall_cases, ARRAY_SIZE(syscall_cases));
+
 	ksft_set_plan(tests_cnt);
 
 	return ksft_exit_pass();
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv14 14/17] selftests/x86/lam: Add io_uring test cases for linear-address masking
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (12 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 13/17] selftests/x86/lam: Add mmap and SYSCALL " Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 15/17] selftests/x86/lam: Add inherit " Kirill A. Shutemov
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Weihong Zhang,
	Kirill A . Shutemov

From: Weihong Zhang <weihong.zhang@intel.com>

LAM should also be supported in kernel threads; io_uring is used to
verify the LAM feature. The test cases read a file through io_uring,
choosing an iovec array as the receiving buffer and, according to the
LAM mode, setting metadata in the high bits of the buffer pointers.

io_uring can deal with buffer pointers that carry metadata
in high bits.
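
The essential pattern, as a fragment (mirroring handle_uring_sq()
below): only the iovec base pointers carry tags, while the sqe fields
keep plain values:

	struct iovec iov;
	void *buf;

	if (posix_memalign(&buf, URING_BLOCK_SZ, URING_BLOCK_SZ))
		return 1;

	/* The kernel-side read path dereferences the tagged pointer */
	iov.iov_base = (void *)((uint64_t)buf | (0x15ULL << 57));
	iov.iov_len = URING_BLOCK_SZ;

	/* then queue IORING_OP_READV with sqe->addr = (unsigned long)&iov */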

Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/lam.c | 341 +++++++++++++++++++++++++++++-
 1 file changed, 339 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 39ebfc511685..52750ebd0887 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -9,8 +9,12 @@
 #include <sys/mman.h>
 #include <sys/utsname.h>
 #include <sys/wait.h>
+#include <sys/stat.h>
+#include <fcntl.h>
 #include <inttypes.h>
 
+#include <sys/uio.h>
+#include <linux/io_uring.h>
 #include "../kselftest.h"
 
 #ifndef __x86_64__
@@ -32,8 +36,9 @@
 #define FUNC_BITS               0x2
 #define FUNC_MMAP               0x4
 #define FUNC_SYSCALL            0x8
+#define FUNC_URING              0x10
 
-#define TEST_MASK               0xf
+#define TEST_MASK               0x1f
 
 #define LOW_ADDR                (0x1UL << 30)
 #define HIGH_ADDR               (0x3UL << 48)
@@ -42,6 +47,13 @@
 
 #define PAGE_SIZE               (4 << 10)
 
+#define barrier() ({						\
+		   __asm__ __volatile__("" : : : "memory");	\
+})
+
+#define URING_QUEUE_SZ 1
+#define URING_BLOCK_SZ 2048
+
 struct testcases {
 	unsigned int later;
 	int expected; /* 2: SIGSEGV Error; 1: other errors */
@@ -51,6 +63,33 @@ struct testcases {
 	const char *msg;
 };
 
+/* Used by CQ of uring, source file handler and file's size */
+struct file_io {
+	int file_fd;
+	off_t file_sz;
+	struct iovec iovecs[];
+};
+
+struct io_uring_queue {
+	unsigned int *head;
+	unsigned int *tail;
+	unsigned int *ring_mask;
+	unsigned int *ring_entries;
+	unsigned int *flags;
+	unsigned int *array;
+	union {
+		struct io_uring_cqe *cqes;
+		struct io_uring_sqe *sqes;
+	} queue;
+	size_t ring_sz;
+};
+
+struct io_ring {
+	int ring_fd;
+	struct io_uring_queue sq_ring;
+	struct io_uring_queue cq_ring;
+};
+
 int tests_cnt;
 jmp_buf segv_env;
 
@@ -294,6 +333,285 @@ static int handle_syscall(struct testcases *test)
 	return ret;
 }
 
+int sys_uring_setup(unsigned int entries, struct io_uring_params *p)
+{
+	return (int)syscall(__NR_io_uring_setup, entries, p);
+}
+
+int sys_uring_enter(int fd, unsigned int to, unsigned int min, unsigned int flags)
+{
+	return (int)syscall(__NR_io_uring_enter, fd, to, min, flags, NULL, 0);
+}
+
+/* Init submission queue and completion queue */
+int mmap_io_uring(struct io_uring_params p, struct io_ring *s)
+{
+	struct io_uring_queue *sring = &s->sq_ring;
+	struct io_uring_queue *cring = &s->cq_ring;
+
+	sring->ring_sz = p.sq_off.array + p.sq_entries * sizeof(unsigned int);
+	cring->ring_sz = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
+
+	if (p.features & IORING_FEAT_SINGLE_MMAP) {
+		if (cring->ring_sz > sring->ring_sz)
+			sring->ring_sz = cring->ring_sz;
+
+		cring->ring_sz = sring->ring_sz;
+	}
+
+	void *sq_ptr = mmap(0, sring->ring_sz, PROT_READ | PROT_WRITE,
+			    MAP_SHARED | MAP_POPULATE, s->ring_fd,
+			    IORING_OFF_SQ_RING);
+
+	if (sq_ptr == MAP_FAILED) {
+		perror("sub-queue!");
+		return 1;
+	}
+
+	void *cq_ptr = sq_ptr;
+
+	if (!(p.features & IORING_FEAT_SINGLE_MMAP)) {
+		cq_ptr = mmap(0, cring->ring_sz, PROT_READ | PROT_WRITE,
+			      MAP_SHARED | MAP_POPULATE, s->ring_fd,
+			      IORING_OFF_CQ_RING);
+		if (cq_ptr == MAP_FAILED) {
+			perror("cpl-queue!");
+			munmap(sq_ptr, sring->ring_sz);
+			return 1;
+		}
+	}
+
+	sring->head = sq_ptr + p.sq_off.head;
+	sring->tail = sq_ptr + p.sq_off.tail;
+	sring->ring_mask = sq_ptr + p.sq_off.ring_mask;
+	sring->ring_entries = sq_ptr + p.sq_off.ring_entries;
+	sring->flags = sq_ptr + p.sq_off.flags;
+	sring->array = sq_ptr + p.sq_off.array;
+
+	/* Map a queue as mem map */
+	s->sq_ring.queue.sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe),
+				     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+				     s->ring_fd, IORING_OFF_SQES);
+	if (s->sq_ring.queue.sqes == MAP_FAILED) {
+		munmap(sq_ptr, sring->ring_sz);
+		if (sq_ptr != cq_ptr) {
+			ksft_print_msg("failed to mmap uring queue!");
+			munmap(cq_ptr, cring->ring_sz);
+			return 1;
+		}
+	}
+
+	cring->head = cq_ptr + p.cq_off.head;
+	cring->tail = cq_ptr + p.cq_off.tail;
+	cring->ring_mask = cq_ptr + p.cq_off.ring_mask;
+	cring->ring_entries = cq_ptr + p.cq_off.ring_entries;
+	cring->queue.cqes = cq_ptr + p.cq_off.cqes;
+
+	return 0;
+}
+
+/* Init io_uring queues */
+int setup_io_uring(struct io_ring *s)
+{
+	struct io_uring_params para;
+
+	memset(&para, 0, sizeof(para));
+	s->ring_fd = sys_uring_setup(URING_QUEUE_SZ, &para);
+	if (s->ring_fd < 0)
+		return 1;
+
+	return mmap_io_uring(para, s);
+}
+
+/*
+ * Get data from the completion queue. The data buffers hold the file data.
+ * return 0: success; others: error;
+ */
+int handle_uring_cq(struct io_ring *s)
+{
+	struct file_io *fi = NULL;
+	struct io_uring_queue *cring = &s->cq_ring;
+	struct io_uring_cqe *cqe;
+	unsigned int head;
+	off_t len = 0;
+
+	head = *cring->head;
+
+	do {
+		barrier();
+		if (head == *cring->tail)
+			break;
+		/* Get the entry */
+		cqe = &cring->queue.cqes[head & *s->cq_ring.ring_mask];
+		fi = (struct file_io *)cqe->user_data;
+		if (cqe->res < 0)
+			break;
+
+		int blocks = (int)(fi->file_sz + URING_BLOCK_SZ - 1) / URING_BLOCK_SZ;
+
+		for (int i = 0; i < blocks; i++)
+			len += fi->iovecs[i].iov_len;
+
+		head++;
+	} while (1);
+
+	*cring->head = head;
+	barrier();
+
+	return (len != fi->file_sz);
+}
+
+/*
+ * Submit the submission queue entry, specifying IORING_OP_READV.
+ * The buffers need metadata set according to the LAM mode.
+ */
+int handle_uring_sq(struct io_ring *ring, struct file_io *fi, unsigned long lam)
+{
+	int file_fd = fi->file_fd;
+	struct io_uring_queue *sring = &ring->sq_ring;
+	unsigned int index = 0, cur_block = 0, tail = 0, next_tail = 0;
+	struct io_uring_sqe *sqe;
+
+	off_t remain = fi->file_sz;
+	int blocks = (int)(remain + URING_BLOCK_SZ - 1) / URING_BLOCK_SZ;
+
+	while (remain) {
+		off_t bytes = remain;
+		void *buf;
+
+		if (bytes > URING_BLOCK_SZ)
+			bytes = URING_BLOCK_SZ;
+
+		fi->iovecs[cur_block].iov_len = bytes;
+
+		if (posix_memalign(&buf, URING_BLOCK_SZ, URING_BLOCK_SZ))
+			return 1;
+
+		fi->iovecs[cur_block].iov_base = (void *)set_metadata((uint64_t)buf, lam);
+		remain -= bytes;
+		cur_block++;
+	}
+
+	next_tail = *sring->tail;
+	tail = next_tail;
+	next_tail++;
+
+	barrier();
+
+	index = tail & *ring->sq_ring.ring_mask;
+
+	sqe = &ring->sq_ring.queue.sqes[index];
+	sqe->fd = file_fd;
+	sqe->flags = 0;
+	sqe->opcode = IORING_OP_READV;
+	sqe->addr = (unsigned long)fi->iovecs;
+	sqe->len = blocks;
+	sqe->off = 0;
+	sqe->user_data = (uint64_t)fi;
+
+	sring->array[index] = index;
+	tail = next_tail;
+
+	if (*sring->tail != tail) {
+		*sring->tail = tail;
+		barrier();
+	}
+
+	if (sys_uring_enter(ring->ring_fd, 1, 1, IORING_ENTER_GETEVENTS) < 0)
+		return 1;
+
+	return 0;
+}
+
+/*
+ * Test LAM in async I/O via io_uring: read the current binary through io_uring.
+ * Metadata is set in the pointers of the iovec buffers.
+ */
+int do_uring(unsigned long lam)
+{
+	struct io_ring *ring;
+	struct file_io *fi;
+	struct stat st;
+	int ret = 1;
+	char path[PATH_MAX];
+
+	/* get current process path */
+	if (readlink("/proc/self/exe", path, PATH_MAX) <= 0)
+		return 1;
+
+	int file_fd = open(path, O_RDONLY);
+
+	if (file_fd < 0)
+		return 1;
+
+	if (fstat(file_fd, &st) < 0)
+		return 1;
+
+	off_t file_sz = st.st_size;
+
+	int blocks = (int)(file_sz + URING_BLOCK_SZ - 1) / URING_BLOCK_SZ;
+
+	fi = malloc(sizeof(*fi) + sizeof(struct iovec) * blocks);
+	if (!fi)
+		return 1;
+
+	fi->file_sz = file_sz;
+	fi->file_fd = file_fd;
+
+	ring = malloc(sizeof(*ring));
+	if (!ring)
+		return 1;
+
+	memset(ring, 0, sizeof(struct io_ring));
+
+	if (setup_io_uring(ring))
+		goto out;
+
+	if (handle_uring_sq(ring, fi, lam))
+		goto out;
+
+	ret = handle_uring_cq(ring);
+
+out:
+	free(ring);
+
+	for (int i = 0; i < blocks; i++) {
+		if (fi->iovecs[i].iov_base) {
+			uint64_t addr = ((uint64_t)fi->iovecs[i].iov_base);
+
+			switch (lam) {
+			case LAM_U57_BITS: /* Clear bits 62:57 */
+				addr = (addr & ~(0x3fULL << 57));
+				break;
+			}
+			free((void *)addr);
+			fi->iovecs[i].iov_base = NULL;
+		}
+	}
+
+	free(fi);
+
+	return ret;
+}
+
+int handle_uring(struct testcases *test)
+{
+	int ret = 0;
+
+	if (test->later == 0 && test->lam != 0)
+		if (set_lam(test->lam) != 0)
+			return 1;
+
+	if (sigsetjmp(segv_env, 1) == 0) {
+		signal(SIGSEGV, segv_handler);
+		ret = do_uring(test->lam);
+	} else {
+		ret = 2;
+	}
+
+	return ret;
+}
+
 static int fork_test(struct testcases *test)
 {
 	int ret, child_ret;
@@ -340,6 +658,22 @@ static void run_test(struct testcases *test, int count)
 	}
 }
 
+static struct testcases uring_cases[] = {
+	{
+		.later = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_uring,
+		.msg = "URING: LAM_U57. Dereferencing pointer with metadata\n",
+	},
+	{
+		.later = 1,
+		.expected = 1,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_uring,
+		.msg = "URING:[Negative] Disable LAM. Dereferencing pointer with metadata.\n",
+	},
+};
+
 static struct testcases malloc_cases[] = {
 	{
 		.later = 0,
@@ -410,7 +744,7 @@ static void cmd_help(void)
 {
 	printf("usage: lam [-h] [-t test list]\n");
 	printf("\t-t test list: run tests specified in the test list, default:0x%x\n", TEST_MASK);
-	printf("\t\t0x1:malloc; 0x2:max_bits; 0x4:mmap; 0x8:syscall.\n");
+	printf("\t\t0x1:malloc; 0x2:max_bits; 0x4:mmap; 0x8:syscall; 0x10:io_uring.\n");
 	printf("\t-h: help\n");
 }
 
@@ -456,6 +790,9 @@ int main(int argc, char **argv)
 	if (tests & FUNC_SYSCALL)
 		run_test(syscall_cases, ARRAY_SIZE(syscall_cases));
 
+	if (tests & FUNC_URING)
+		run_test(uring_cases, ARRAY_SIZE(uring_cases));
+
 	ksft_set_plan(tests_cnt);
 
 	return ksft_exit_pass();
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv14 15/17] selftests/x86/lam: Add inherit test cases for linear-address masking
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (13 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 14/17] selftests/x86/lam: Add io_uring " Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 16/17] selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA " Kirill A. Shutemov
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Weihong Zhang,
	Kirill A . Shutemov

From: Weihong Zhang <weihong.zhang@intel.com>

LAM is enabled per-process and gets inherited on fork(2)/clone(2).
exec() reverts LAM status to the default disabled state.

There are two test scenarios:

 - Fork test cases:

   These cases test the inheritance of LAM across fork(). A child
   process generated by fork() should inherit the LAM feature from its
   parent and see the same LAM mode as the parent process.

 - Execve test cases:

   Unlike processes generated by fork(), processes generated by
   execve() revert the LAM status to the disabled state. A condensed
   sketch of both scenarios follows below.
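
A condensed sketch (illustrative; mirrors handle_inheritance() and
handle_execve() below, with path obtained via
readlink("/proc/self/exe") as the test does):

	unsigned long mask;

	syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, 6);

	if (fork() == 0) {
		/* Child: inherits LAM, mask is still ~(0x3fULL << 57) */
		syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, &mask);

		/* After execve() the new image reports mask == -1ULL */
		execlp(path, path, "-t 0x0", NULL);
	}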

Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/lam.c | 125 +++++++++++++++++++++++++++++-
 1 file changed, 121 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 52750ebd0887..ebabd4333b7d 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -37,8 +37,9 @@
 #define FUNC_MMAP               0x4
 #define FUNC_SYSCALL            0x8
 #define FUNC_URING              0x10
+#define FUNC_INHERITE           0x20
 
-#define TEST_MASK               0x1f
+#define TEST_MASK               0x3f
 
 #define LOW_ADDR                (0x1UL << 30)
 #define HIGH_ADDR               (0x3UL << 48)
@@ -174,6 +175,28 @@ static unsigned long get_default_tag_bits(void)
 	return lam;
 }
 
+/*
+ * Read back the untag mask via ARCH_GET_UNTAG_MASK and translate
+ * it into the corresponding LAM mode.
+ */
+static int get_lam(void)
+{
+	uint64_t ptr = 0;
+	int ret = -1;
+	/* Get untagged mask */
+	if (syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, &ptr) == -1)
+		return -1;
+
+	/* Check mask returned is expected */
+	if (ptr == ~(LAM_U57_MASK))
+		ret = LAM_U57_BITS;
+	else if (ptr == -1ULL)
+		ret = LAM_NONE;
+
+
+	return ret;
+}
+
 /* According to LAM mode, set metadata in high bits */
 static uint64_t set_metadata(uint64_t src, unsigned long lam)
 {
@@ -581,7 +604,7 @@ int do_uring(unsigned long lam)
 
 			switch (lam) {
 			case LAM_U57_BITS: /* Clear bits 62:57 */
-				addr = (addr & ~(0x3fULL << 57));
+				addr = (addr & ~(LAM_U57_MASK));
 				break;
 			}
 			free((void *)addr);
@@ -632,6 +655,72 @@ static int fork_test(struct testcases *test)
 	return ret;
 }
 
+static int handle_execve(struct testcases *test)
+{
+	int ret, child_ret;
+	int lam = test->lam;
+	pid_t pid;
+
+	pid = fork();
+	if (pid < 0) {
+		perror("Fork failed.");
+		ret = 1;
+	} else if (pid == 0) {
+		char path[PATH_MAX];
+
+		/* Set LAM mode in parent process */
+		if (set_lam(lam) != 0)
+			return 1;
+
+		/* Get current binary's path and the binary was run by execve */
+		if (readlink("/proc/self/exe", path, PATH_MAX) <= 0)
+			exit(-1);
+
+		/* run binary to get LAM mode and return to parent process */
+		if (execlp(path, path, "-t 0x0", NULL) < 0) {
+			perror("error on exec");
+			exit(-1);
+		}
+	} else {
+		wait(&child_ret);
+		ret = WEXITSTATUS(child_ret);
+		if (ret != LAM_NONE)
+			return 1;
+	}
+
+	return 0;
+}
+
+static int handle_inheritance(struct testcases *test)
+{
+	int ret, child_ret;
+	int lam = test->lam;
+	pid_t pid;
+
+	/* Set LAM mode in parent process */
+	if (set_lam(lam) != 0)
+		return 1;
+
+	pid = fork();
+	if (pid < 0) {
+		perror("Fork failed.");
+		return 1;
+	} else if (pid == 0) {
+		/* Set LAM mode in parent process */
+		int child_lam = get_lam();
+
+		exit(child_lam);
+	} else {
+		wait(&child_ret);
+		ret = WEXITSTATUS(child_ret);
+
+		if (lam != ret)
+			return 1;
+	}
+
+	return 0;
+}
+
 static void run_test(struct testcases *test, int count)
 {
 	int i, ret = 0;
@@ -740,11 +829,26 @@ static struct testcases mmap_cases[] = {
 	},
 };
 
+static struct testcases inheritance_cases[] = {
+	{
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_inheritance,
+		.msg = "FORK: LAM_U57, child process should get LAM mode same as parent\n",
+	},
+	{
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_execve,
+		.msg = "EXECVE: LAM_U57, child process should get disabled LAM mode\n",
+	},
+};
+
 static void cmd_help(void)
 {
 	printf("usage: lam [-h] [-t test list]\n");
 	printf("\t-t test list: run tests specified in the test list, default:0x%x\n", TEST_MASK);
-	printf("\t\t0x1:malloc; 0x2:max_bits; 0x4:mmap; 0x8:syscall; 0x10:io_uring.\n");
+	printf("\t\t0x1:malloc; 0x2:max_bits; 0x4:mmap; 0x8:syscall; 0x10:io_uring; 0x20:inherit;\n");
 	printf("\t-h: help\n");
 }
 
@@ -764,7 +868,7 @@ int main(int argc, char **argv)
 		switch (c) {
 		case 't':
 			tests = strtoul(optarg, NULL, 16);
-			if (!(tests & TEST_MASK)) {
+			if (tests && !(tests & TEST_MASK)) {
 				ksft_print_msg("Invalid argument!\n");
 				return -1;
 			}
@@ -778,6 +882,16 @@ int main(int argc, char **argv)
 		}
 	}
 
+	/*
+	 * When tests is 0, it is not a real test case;
+	 * the execve test case uses this option to check the LAM mode of
+	 * a process generated by execve(): the process reads back its LAM
+	 * mode and the parent compares it with its own.
+	 */
+	if (!tests)
+		return (get_lam());
+
+	/* Run test cases */
 	if (tests & FUNC_MALLOC)
 		run_test(malloc_cases, ARRAY_SIZE(malloc_cases));
 
@@ -793,6 +907,9 @@ int main(int argc, char **argv)
 	if (tests & FUNC_URING)
 		run_test(uring_cases, ARRAY_SIZE(uring_cases));
 
+	if (tests & FUNC_INHERITE)
+		run_test(inheritance_cases, ARRAY_SIZE(inheritance_cases));
+
 	ksft_set_plan(tests_cnt);
 
 	return ksft_exit_pass();
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv14 16/17] selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for linear-address masking
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (14 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 15/17] selftests/x86/lam: Add inherit " Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-11 12:37 ` [PATCHv14 17/17] selftests/x86/lam: Add test cases for LAM vs thread creation Kirill A. Shutemov
  2023-01-18 16:49 ` [PATCHv14 00/17] Linear Address Masking enabling Peter Zijlstra
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Weihong Zhang,
	Kirill A . Shutemov

From: Weihong Zhang <weihong.zhang@intel.com>

By default, do not allow both LAM and SVA to be enabled in the same
process. The new ARCH_FORCE_TAGGED_SVA arch_prctl() overrides the
limitation.

Add new test cases for the new arch_prctl: before ARCH_FORCE_TAGGED_SVA
is used, enabling LAM and SVA together should not be allowed, so these
test cases are negative.

The test depends on the idxd driver and the IOMMU. Before the test, add
"intel_iommu=on,sm_on" to the kernel command line and load the idxd
driver.

Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/lam.c | 237 +++++++++++++++++++++++++++++-
 1 file changed, 235 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index ebabd4333b7d..a8c91829b616 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -30,6 +30,7 @@
 #define ARCH_GET_UNTAG_MASK     0x4001
 #define ARCH_ENABLE_TAGGED_ADDR 0x4002
 #define ARCH_GET_MAX_TAG_BITS   0x4003
+#define ARCH_FORCE_TAGGED_SVA	0x4004
 
 /* Specified test function bits */
 #define FUNC_MALLOC             0x1
@@ -38,8 +39,9 @@
 #define FUNC_SYSCALL            0x8
 #define FUNC_URING              0x10
 #define FUNC_INHERITE           0x20
+#define FUNC_PASID              0x40
 
-#define TEST_MASK               0x3f
+#define TEST_MASK               0x7f
 
 #define LOW_ADDR                (0x1UL << 30)
 #define HIGH_ADDR               (0x3UL << 48)
@@ -55,11 +57,19 @@
 #define URING_QUEUE_SZ 1
 #define URING_BLOCK_SZ 2048
 
+/* Pasid test define */
+#define LAM_CMD_BIT 0x1
+#define PAS_CMD_BIT 0x2
+#define SVA_CMD_BIT 0x4
+
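+/*
+ * PAS_CMD() packs three one-hot step selectors into nibbles: step one
+ * in bits 0-3, step two in bits 4-7 and step three in bits 8-11;
+ * handle_pasid() below consumes one nibble per iteration (tmp >>= 4).
+ */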
+#define PAS_CMD(cmd1, cmd2, cmd3) (((cmd3) << 8) | ((cmd2) << 4) | ((cmd1) << 0))
+
 struct testcases {
 	unsigned int later;
 	int expected; /* 2: SIGSEGV Error; 1: other errors */
 	unsigned long lam;
 	uint64_t addr;
+	uint64_t cmd;
 	int (*test_func)(struct testcases *test);
 	const char *msg;
 };
@@ -556,7 +566,7 @@ int do_uring(unsigned long lam)
 	struct file_io *fi;
 	struct stat st;
 	int ret = 1;
-	char path[PATH_MAX];
+	char path[PATH_MAX] = {0};
 
 	/* get current process path */
 	if (readlink("/proc/self/exe", path, PATH_MAX) <= 0)
@@ -852,6 +862,226 @@ static void cmd_help(void)
 	printf("\t-h: help\n");
 }
 
+/* Check for file existence */
+uint8_t file_Exists(const char *fileName)
+{
+	struct stat buffer;
+
+	uint8_t ret = (stat(fileName, &buffer) == 0);
+
+	return ret;
+}
+
+/* Sysfs idxd files */
+const char *dsa_configs[] = {
+	"echo 1 > /sys/bus/dsa/devices/dsa0/wq0.1/group_id",
+	"echo shared > /sys/bus/dsa/devices/dsa0/wq0.1/mode",
+	"echo 10 > /sys/bus/dsa/devices/dsa0/wq0.1/priority",
+	"echo 16 > /sys/bus/dsa/devices/dsa0/wq0.1/size",
+	"echo 15 > /sys/bus/dsa/devices/dsa0/wq0.1/threshold",
+	"echo user > /sys/bus/dsa/devices/dsa0/wq0.1/type",
+	"echo MyApp1 > /sys/bus/dsa/devices/dsa0/wq0.1/name",
+	"echo 1 > /sys/bus/dsa/devices/dsa0/engine0.1/group_id",
+	"echo dsa0 > /sys/bus/dsa/drivers/idxd/bind",
+	/* bind files and devices; generates a device file in /dev */
+	"echo wq0.1 > /sys/bus/dsa/drivers/user/bind",
+};
+
+/* DSA device file */
+const char *dsaDeviceFile = "/dev/dsa/wq0.1";
+/* sysfs file showing whether PASID is enabled */
+const char *dsaPasidEnable = "/sys/bus/dsa/devices/dsa0/pasid_enabled";
+
+/*
+ * DSA depends on the kernel cmdline "intel_iommu=on,sm_on".
+ * Returns pasid_enabled (0: disabled, 1: enabled).
+ */
+int Check_DSA_Kernel_Setting(void)
+{
+	char command[256] = "";
+	char buf[256] = "";
+	char *ptr;
+	int rv = -1;
+
+	snprintf(command, sizeof(command) - 1, "cat %s", dsaPasidEnable);
+
+	FILE *cmd = popen(command, "r");
+
+	if (cmd) {
+		while (fgets(buf, sizeof(buf) - 1, cmd) != NULL);
+
+		pclose(cmd);
+		rv = strtol(buf, &ptr, 16);
+	}
+
+	return rv;
+}
+
+/*
+ * Configure DSA's sysfs files to set up a shared DSA WQ.
+ * Generates the device file /dev/dsa/wq0.1.
+ * Return: 0 OK; 1 failed; 3 skip (SVA disabled).
+ */
+int Dsa_Init_Sysfs(void)
+{
+	uint len = ARRAY_SIZE(dsa_configs);
+	const char **p = dsa_configs;
+
+	if (file_Exists(dsaDeviceFile) == 1)
+		return 0;
+
+	/* check the idxd driver */
+	if (file_Exists(dsaPasidEnable) != 1) {
+		printf("Please make sure idxd driver was loaded\n");
+		return 3;
+	}
+
+	/* Check SVA feature */
+	if (Check_DSA_Kernel_Setting() != 1) {
+		printf("Please enable SVA.(Add intel_iommu=on,sm_on in kernel cmdline)\n");
+		return 3;
+	}
+
+	/* Run the sysfs commands to configure the WQ */
+	for (int i = 0; i < len; i++) {
+		if (system(p[i]))
+			return 1;
+	}
+
+	/* After config, /dev/dsa/wq0.1 should be generated */
+	return (file_Exists(dsaDeviceFile) != 1);
+}
+
+/*
+ * Open the DSA device file, triggering iommu_sva_alloc_pasid().
+ */
+void *allocate_dsa_pasid(void)
+{
+	int fd;
+	void *wq;
+
+	fd = open(dsaDeviceFile, O_RDWR);
+	if (fd < 0) {
+		perror("open");
+		return MAP_FAILED;
+	}
+
+	wq = mmap(NULL, 0x1000, PROT_WRITE,
+			   MAP_SHARED | MAP_POPULATE, fd, 0);
+	if (wq == MAP_FAILED)
+		perror("mmap");
+
+	return wq;
+}
+
+int set_force_svm(void)
+{
+	int ret = 0;
+
+	ret = syscall(SYS_arch_prctl, ARCH_FORCE_TAGGED_SVA);
+
+	return ret;
+}
+
+int handle_pasid(struct testcases *test)
+{
+	uint tmp = test->cmd;
+	uint runed = 0x0;
+	int ret = 0;
+	void *wq = NULL;
+
+	ret = Dsa_Init_Sysfs();
+	if (ret != 0)
+		return ret;
+
+	for (int i = 0; i < 3; i++) {
+		int err = 0;
+
+		if (tmp & 0x1) {
+			/* run set lam mode */
+			if ((runed & 0x1) == 0)	{
+				err = set_lam(LAM_U57_BITS);
+				runed = runed | 0x1;
+			} else
+				err = 1;
+		} else if (tmp & 0x4) {
+			/* run force svm */
+			if ((runed & 0x4) == 0)	{
+				err = set_force_svm();
+				runed = runed | 0x4;
+			} else
+				err = 1;
+		} else if (tmp & 0x2) {
+			/* run allocate pasid */
+			if ((runed & 0x2) == 0) {
+				runed = runed | 0x2;
+				wq = allocate_dsa_pasid();
+				if (wq == MAP_FAILED)
+					err = 1;
+			} else
+				err = 1;
+		}
+
+		ret = ret + err;
+		if (ret > 0)
+			break;
+
+		tmp = tmp >> 4;
+	}
+
+	if (wq != MAP_FAILED && wq != NULL)
+		if (munmap(wq, 0x1000))
+			printf("munmap failed %d\n", errno);
+
+	if (runed != 0x7)
+		ret = 1;
+
+	return (ret != 0);
+}
+
+/*
+ * The PASID tests depend on idxd and SVA; the kernel must enable the
+ * IOMMU and scalable mode (cmdline: intel_iommu=on,sm_on).
+ */
+static struct testcases pasid_cases[] = {
+	{
+		.expected = 1,
+		.cmd = PAS_CMD(LAM_CMD_BIT, PAS_CMD_BIT, SVA_CMD_BIT),
+		.test_func = handle_pasid,
+		.msg = "PASID: [Negative] Execute LAM, PASID, SVA in sequence\n",
+	},
+	{
+		.expected = 0,
+		.cmd = PAS_CMD(LAM_CMD_BIT, SVA_CMD_BIT, PAS_CMD_BIT),
+		.test_func = handle_pasid,
+		.msg = "PASID: Execute LAM, SVA, PASID in sequence\n",
+	},
+	{
+		.expected = 1,
+		.cmd = PAS_CMD(PAS_CMD_BIT, LAM_CMD_BIT, SVA_CMD_BIT),
+		.test_func = handle_pasid,
+		.msg = "PASID: [Negative] Execute PASID, LAM, SVA in sequence\n",
+	},
+	{
+		.expected = 0,
+		.cmd = PAS_CMD(PAS_CMD_BIT, SVA_CMD_BIT, LAM_CMD_BIT),
+		.test_func = handle_pasid,
+		.msg = "PASID: Execute PASID, SVA, LAM in sequence\n",
+	},
+	{
+		.expected = 0,
+		.cmd = PAS_CMD(SVA_CMD_BIT, LAM_CMD_BIT, PAS_CMD_BIT),
+		.test_func = handle_pasid,
+		.msg = "PASID: Execute SVA, LAM, PASID in sequence\n",
+	},
+	{
+		.expected = 0,
+		.cmd = PAS_CMD(SVA_CMD_BIT, PAS_CMD_BIT, LAM_CMD_BIT),
+		.test_func = handle_pasid,
+		.msg = "PASID: Execute SVA, PASID, LAM in sequence\n",
+	},
+};
+
 int main(int argc, char **argv)
 {
 	int c = 0;
@@ -910,6 +1140,9 @@ int main(int argc, char **argv)
 	if (tests & FUNC_INHERITE)
 		run_test(inheritance_cases, ARRAY_SIZE(inheritance_cases));
 
+	if (tests & FUNC_PASID)
+		run_test(pasid_cases, ARRAY_SIZE(pasid_cases));
+
 	ksft_set_plan(tests_cnt);
 
 	return ksft_exit_pass();
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCHv14 17/17] selftests/x86/lam: Add test cases for LAM vs thread creation
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (15 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 16/17] selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA " Kirill A. Shutemov
@ 2023-01-11 12:37 ` Kirill A. Shutemov
  2023-01-18 16:49 ` [PATCHv14 00/17] Linear Address Masking enabling Peter Zijlstra
  17 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 12:37 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Kirill A. Shutemov

LAM enabling is only allowed when the process has a single thread.
LAM mode is inherited by child threads.

Trying to enable LAM after spawning a thread has to fail.
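
In userspace terms, the negative case boils down to something like this
sketch (hypothetical, compile with -pthread; the actual selftest uses
clone() with CLONE_VM, and the ARCH_ENABLE_TAGGED_ADDR value and the 6
LAM_U57 tag bits are taken from the selftest):

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	#define ARCH_ENABLE_TAGGED_ADDR	0x4002

	static void *idle(void *arg) { pause(); return NULL; }

	int main(void)
	{
		pthread_t t;

		pthread_create(&t, NULL, idle, NULL);
		/* Multi-threaded now: enabling LAM is expected to fail */
		return syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, 6) == 0;
	}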

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/lam.c | 92 +++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index a8c91829b616..eb0e46905bf9 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -12,6 +13,7 @@
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <inttypes.h>
+#include <sched.h>
 
 #include <sys/uio.h>
 #include <linux/io_uring.h>
@@ -50,6 +52,8 @@
 
 #define PAGE_SIZE               (4 << 10)
 
+#define STACK_SIZE		65536
+
 #define barrier() ({						\
 		   __asm__ __volatile__("" : : : "memory");	\
 })
@@ -731,6 +735,75 @@ static int handle_inheritance(struct testcases *test)
 	return 0;
 }
 
+static int thread_fn_get_lam(void *arg)
+{
+	return get_lam();
+}
+
+static int thread_fn_set_lam(void *arg)
+{
+	struct testcases *test = arg;
+
+	return set_lam(test->lam);
+}
+
+static int handle_thread(struct testcases *test)
+{
+	char stack[STACK_SIZE];
+	int ret, child_ret;
+	int lam = 0;
+	pid_t pid;
+
+	/* Set LAM mode in parent process */
+	if (!test->later) {
+		lam = test->lam;
+		if (set_lam(lam) != 0)
+			return 1;
+	}
+
+	pid = clone(thread_fn_get_lam, stack + STACK_SIZE,
+		    SIGCHLD | CLONE_FILES | CLONE_FS | CLONE_VM, NULL);
+	if (pid < 0) {
+		perror("Clone failed.");
+		return 1;
+	}
+
+	waitpid(pid, &child_ret, 0);
+	ret = WEXITSTATUS(child_ret);
+
+	if (lam != ret)
+		return 1;
+
+	if (test->later) {
+		if (set_lam(test->lam) != 0)
+			return 1;
+	}
+
+	return 0;
+}
+
+static int handle_thread_enable(struct testcases *test)
+{
+	char stack[STACK_SIZE];
+	int ret, child_ret;
+	int lam = test->lam;
+	pid_t pid;
+
+	pid = clone(thread_fn_set_lam, stack + STACK_SIZE,
+		    SIGCHLD | CLONE_FILES | CLONE_FS | CLONE_VM, test);
+	if (pid < 0) {
+		perror("Clone failed.");
+		return 1;
+	}
+
+	waitpid(pid, &child_ret, 0);
+	ret = WEXITSTATUS(child_ret);
+
+	if (lam != ret)
+		return 1;
+
+	return 0;
+}
+
 static void run_test(struct testcases *test, int count)
 {
 	int i, ret = 0;
@@ -846,6 +919,25 @@ static struct testcases inheritance_cases[] = {
 		.test_func = handle_inheritance,
 		.msg = "FORK: LAM_U57, child process should get LAM mode same as parent\n",
 	},
+	{
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_thread,
+		.msg = "THREAD: LAM_U57, child thread should get LAM mode same as parent\n",
+	},
+	{
+		.expected = 1,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_thread_enable,
+		.msg = "THREAD: [NEGATIVE] Enable LAM in child.\n",
+	},
+	{
+		.expected = 1,
+		.later = 1,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_thread,
+		.msg = "THREAD: [NEGATIVE] Enable LAM in parent after thread created.\n",
+	},
 	{
 		.expected = 0,
 		.lam = LAM_U57_BITS,
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 04/17] x86/mm: Handle LAM on context switch
  2023-01-11 12:37 ` [PATCHv14 04/17] x86/mm: Handle LAM on context switch Kirill A. Shutemov
@ 2023-01-11 13:49   ` Linus Torvalds
  2023-01-11 14:14     ` Kirill A. Shutemov
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2023-01-11 13:49 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Alexander Potapenko, Taras Madan,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe,
	Bharata B Rao, Jacob Pan, Ashok Raj, Linux-MM,
	Linux Kernel Mailing List


Sorry, I'm traveling, and right now only on mobile. So I'm reading patches
on my phone, and answering on it too, so html garbage..

On Wed, Jan 11, 2023, 07:24 Kirill A. Shutemov <
kirill.shutemov@linux.intel.com> wrote:

>
> +static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
> +{
> +       return READ_ONCE(mm->context.lam_cr3_mask);
> +}
>

I mentioned this before - in the original version this needed (but didn't
have, iirc) that READ_ONCE, but in the new non-thread situation I don't
think that's true. There should be no concurrent changes that can interfere
with the read, no?

If there is a reason for that READ_ONCE still, please add a comment for it

         Linus


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 04/17] x86/mm: Handle LAM on context switch
  2023-01-11 13:49   ` Linus Torvalds
@ 2023-01-11 14:14     ` Kirill A. Shutemov
  2023-01-11 14:37       ` [PATCHv14.1 " Kirill A. Shutemov
  0 siblings, 1 reply; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 14:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Alexander Potapenko, Taras Madan,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe,
	Bharata B Rao, Jacob Pan, Ashok Raj, Linux-MM,
	Linux Kernel Mailing List

On Wed, Jan 11, 2023 at 07:49:30AM -0600, Linus Torvalds wrote:
> Sorry, I'm traveling, and right now only on mobile. So I'm reading patches
> on my phone, and answering on it too, so html garbage..
> 
> On Wed, Jan 11, 2023, 07:24 Kirill A. Shutemov <
> kirill.shutemov@linux.intel.com> wrote:
> 
> >
> > +static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
> > +{
> > +       return READ_ONCE(mm->context.lam_cr3_mask);
> > +}
> >
> 
> I mentioned this before - in the original version this needed (but didn't
> have, iirc) that READ_ONCE, but in the new non-thread situation I don't
> think that's true. There should be no concurrent changes that can interfere
> with the read, no?

It should be safe, yes. 

It is reachable from iommu_sva_bind_device(), but it is only called
with mm == current->mm.

I will drop the READ_ONCE().

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCHv14.1 04/17] x86/mm: Handle LAM on context switch
  2023-01-11 14:14     ` Kirill A. Shutemov
@ 2023-01-11 14:37       ` Kirill A. Shutemov
  0 siblings, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-11 14:37 UTC (permalink / raw)
  To: kirill
  Cc: ak, andreyknvl, ashok.raj, bharata, dave.hansen, dvyukov, glider,
	hjl.tools, jacob.jun.pan, kcc, kirill.shutemov, linux-kernel,
	linux-mm, luto, peterz, rick.p.edgecombe, ryabinin.a.a,
	tarasmadan, torvalds, x86

Linear Address Masking mode for userspace pointers is encoded in CR3
bits. The mode is selected per-process and stored in mm_context_t.

switch_mm_irqs_off() now respects the selected LAM mode and constructs
CR3 accordingly.

The active LAM mode gets recorded in the tlb_state.
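
For reference, the CR3 value ends up composed roughly like this (a
sketch of what build_cr3() below computes, not an authoritative bit
map):

	cr3 = __sme_pa(pgd)		/* page-table root (SME-adjusted) */
	    | lam			/* X86_CR3_LAM_U57/U48 bits, or 0  */
	    | kern_pcid(asid);		/* only with X86_FEATURE_PCID     */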

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Alexander Potapenko <glider@google.com>
---
 v14.1:
  - Drop unneeded READ_ONCE();
---
 arch/x86/include/asm/mmu.h         |  5 +++
 arch/x86/include/asm/mmu_context.h | 24 ++++++++++++++
 arch/x86/include/asm/tlbflush.h    | 38 ++++++++++++++++++++-
 arch/x86/mm/tlb.c                  | 53 +++++++++++++++++++++---------
 4 files changed, 103 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index efa3eaee522c..22fc9fbf1d0a 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -42,6 +42,11 @@ typedef struct {
 	unsigned long flags;
 #endif
 
+#ifdef CONFIG_ADDRESS_MASKING
+	/* Active LAM mode:  X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
+	unsigned long lam_cr3_mask;
+#endif
+
 	struct mutex lock;
 	void __user *vdso;			/* vdso base address */
 	const struct vdso_image *vdso_image;	/* vdso image in use */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 53ef591a6166..a62e70801ea8 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -91,6 +91,29 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
 }
 #endif
 
+#ifdef CONFIG_ADDRESS_MASKING
+static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
+{
+	return mm->context.lam_cr3_mask;
+}
+
+static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
+}
+
+#else
+
+static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
+{
+	return 0;
+}
+
+static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+}
+#endif
+
 #define enter_lazy_tlb enter_lazy_tlb
 extern void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
 
@@ -168,6 +191,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 	arch_dup_pkeys(oldmm, mm);
 	paravirt_arch_dup_mmap(oldmm, mm);
+	dup_lam(oldmm, mm);
 	return ldt_dup_context(oldmm, mm);
 }
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index cda3118f3b27..e8b47f57bd4a 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_X86_TLBFLUSH_H
 #define _ASM_X86_TLBFLUSH_H
 
-#include <linux/mm.h>
+#include <linux/mm_types.h>
 #include <linux/sched.h>
 
 #include <asm/processor.h>
@@ -12,6 +12,7 @@
 #include <asm/invpcid.h>
 #include <asm/pti.h>
 #include <asm/processor-flags.h>
+#include <asm/pgtable.h>
 
 void __flush_tlb_all(void);
 
@@ -101,6 +102,16 @@ struct tlb_state {
 	 */
 	bool invalidate_other;
 
+#ifdef CONFIG_ADDRESS_MASKING
+	/*
+	 * Active LAM mode.
+	 *
+	 * X86_CR3_LAM_U57/U48 shifted right by X86_CR3_LAM_U57_BIT or 0 if LAM
+	 * disabled.
+	 */
+	u8 lam;
+#endif
+
 	/*
 	 * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate
 	 * the corresponding user PCID needs a flush next time we
@@ -357,6 +368,31 @@ static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)
 }
 #define huge_pmd_needs_flush huge_pmd_needs_flush
 
+#ifdef CONFIG_ADDRESS_MASKING
+static inline u64 tlbstate_lam_cr3_mask(void)
+{
+	u64 lam = this_cpu_read(cpu_tlbstate.lam);
+
+	return lam << X86_CR3_LAM_U57_BIT;
+}
+
+static inline void set_tlbstate_lam_mode(struct mm_struct *mm)
+{
+	this_cpu_write(cpu_tlbstate.lam,
+		       mm->context.lam_cr3_mask >> X86_CR3_LAM_U57_BIT);
+}
+
+#else
+
+static inline u64 tlbstate_lam_cr3_mask(void)
+{
+	return 0;
+}
+
+static inline void set_tlbstate_lam_mode(struct mm_struct *mm)
+{
+}
+#endif
 #endif /* !MODULE */
 
 static inline void __native_tlb_flush_global(unsigned long cr4)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index c1e31e9a85d7..8c330a6d0ece 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -154,26 +154,30 @@ static inline u16 user_pcid(u16 asid)
 	return ret;
 }
 
-static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
+static inline unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam)
 {
+	unsigned long cr3 = __sme_pa(pgd) | lam;
+
 	if (static_cpu_has(X86_FEATURE_PCID)) {
-		return __sme_pa(pgd) | kern_pcid(asid);
+		VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
+		cr3 |= kern_pcid(asid);
 	} else {
 		VM_WARN_ON_ONCE(asid != 0);
-		return __sme_pa(pgd);
 	}
+
+	return cr3;
 }
 
-static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
+static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid,
+					      unsigned long lam)
 {
-	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
 	/*
 	 * Use boot_cpu_has() instead of this_cpu_has() as this function
 	 * might be called during early boot. This should work even after
 	 * boot because all CPU's the have same capabilities:
 	 */
 	VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
-	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
+	return build_cr3(pgd, asid, lam) | CR3_NOFLUSH;
 }
 
 /*
@@ -274,15 +278,16 @@ static inline void invalidate_user_asid(u16 asid)
 		  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
 }
 
-static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
+static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, unsigned long lam,
+			    bool need_flush)
 {
 	unsigned long new_mm_cr3;
 
 	if (need_flush) {
 		invalidate_user_asid(new_asid);
-		new_mm_cr3 = build_cr3(pgdir, new_asid);
+		new_mm_cr3 = build_cr3(pgdir, new_asid, lam);
 	} else {
-		new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);
+		new_mm_cr3 = build_cr3_noflush(pgdir, new_asid, lam);
 	}
 
 	/*
@@ -491,6 +496,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 {
 	struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	unsigned long new_lam = mm_lam_cr3_mask(next);
 	bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy);
 	unsigned cpu = smp_processor_id();
 	u64 next_tlb_gen;
@@ -520,7 +526,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * isn't free.
 	 */
 #ifdef CONFIG_DEBUG_VM
-	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
+	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid,
+						   tlbstate_lam_cr3_mask()))) {
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
 		 * the system so hard that we don't see the call trace.
@@ -552,9 +559,15 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * instruction.
 	 */
 	if (real_prev == next) {
+		/* Not actually switching mm's */
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			   next->context.ctx_id);
 
+		/*
+		 * If this races with another thread that enables lam, 'new_lam'
+		 * might not match tlbstate_lam_cr3_mask().
+		 */
+
 		/*
 		 * Even in lazy TLB mode, the CPU should stay set in the
 		 * mm_cpumask. The TLB shootdown code can figure out from
@@ -622,15 +635,16 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		barrier();
 	}
 
+	set_tlbstate_lam_mode(next);
 	if (need_flush) {
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
-		load_new_mm_cr3(next->pgd, new_asid, true);
+		load_new_mm_cr3(next->pgd, new_asid, new_lam, true);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 	} else {
 		/* The new ASID is already up to date. */
-		load_new_mm_cr3(next->pgd, new_asid, false);
+		load_new_mm_cr3(next->pgd, new_asid, new_lam, false);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
 	}
@@ -691,6 +705,10 @@ void initialize_tlbstate_and_flush(void)
 	/* Assert that CR3 already references the right mm. */
 	WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
 
+	/* LAM expected to be disabled */
+	WARN_ON(cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57));
+	WARN_ON(mm_lam_cr3_mask(mm));
+
 	/*
 	 * Assert that CR4.PCIDE is set if needed.  (CR4.PCIDE initialization
 	 * doesn't work like other CR4 bits because it can only be set from
@@ -699,8 +717,8 @@ void initialize_tlbstate_and_flush(void)
 	WARN_ON(boot_cpu_has(X86_FEATURE_PCID) &&
 		!(cr4_read_shadow() & X86_CR4_PCIDE));
 
-	/* Force ASID 0 and force a TLB flush. */
-	write_cr3(build_cr3(mm->pgd, 0));
+	/* Disable LAM, force ASID 0 and force a TLB flush. */
+	write_cr3(build_cr3(mm->pgd, 0, 0));
 
 	/* Reinitialize tlbstate. */
 	this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_INIT);
@@ -708,6 +726,7 @@ void initialize_tlbstate_and_flush(void)
 	this_cpu_write(cpu_tlbstate.next_asid, 1);
 	this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
 	this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen);
+	set_tlbstate_lam_mode(mm);
 
 	for (i = 1; i < TLB_NR_DYN_ASIDS; i++)
 		this_cpu_write(cpu_tlbstate.ctxs[i].ctx_id, 0);
@@ -1071,8 +1090,10 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
  */
 unsigned long __get_current_cr3_fast(void)
 {
-	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
-		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+	unsigned long cr3 =
+		build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
+			  this_cpu_read(cpu_tlbstate.loaded_mm_asid),
+			  tlbstate_lam_cr3_mask());
 
 	/* For now, be very restrictive about when this can be called. */
 	VM_WARN_ON(in_nmi() || preemptible());
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-11 12:37 ` [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user Kirill A. Shutemov
@ 2023-01-17 13:05   ` Peter Zijlstra
  2023-01-17 13:57     ` Kirill A. Shutemov
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2023-01-17 13:05 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Taras Madan, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel

On Wed, Jan 11, 2023 at 03:37:27PM +0300, Kirill A. Shutemov wrote:

>  #define __untagged_addr(untag_mask, addr)	({			\
>  	u64 __addr = (__force u64)(addr);				\
> -	s64 sign = (s64)__addr >> 63;					\
> -	__addr &= untag_mask | sign;					\
> +	if (static_branch_likely(&tagged_addr_key)) {			\
> +		s64 sign = (s64)__addr >> 63;				\
> +		__addr &= untag_mask | sign;				\
> +	}								\
>  	(__force __typeof__(addr))__addr;				\
>  })
>  
> #define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)

Is the compiler clever enough to put the memop inside the branch?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 13:05   ` Peter Zijlstra
@ 2023-01-17 13:57     ` Kirill A. Shutemov
  2023-01-17 15:02       ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-17 13:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Taras Madan, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel

On Tue, Jan 17, 2023 at 02:05:22PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 11, 2023 at 03:37:27PM +0300, Kirill A. Shutemov wrote:
> 
> >  #define __untagged_addr(untag_mask, addr)
> >  	u64 __addr = (__force u64)(addr);				\
> > -	s64 sign = (s64)__addr >> 63;					\
> > -	__addr &= untag_mask | sign;					\
> > +	if (static_branch_likely(&tagged_addr_key)) {			\
> > +		s64 sign = (s64)__addr >> 63;				\
> > +		__addr &= untag_mask | sign;				\
> > +	}								\
> >  	(__force __typeof__(addr))__addr;				\
> >  })
> >  
> > #define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)
> 
> Is the compiler clever enough to put the memop inside the branch?

Hm. You mean current_untag_mask() inside static_branch_likely()?

But it is the preprocessor that does this, not the compiler. So, yes,
the memop is inside the branch.

Or I didn't understand your question.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 13:57     ` Kirill A. Shutemov
@ 2023-01-17 15:02       ` Peter Zijlstra
  2023-01-17 17:18         ` Linus Torvalds
  2023-01-19 23:06         ` Kirill A. Shutemov
  0 siblings, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2023-01-17 15:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Taras Madan, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Sami Tolvanen,
	ndesaulniers, joao

On Tue, Jan 17, 2023 at 04:57:03PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jan 17, 2023 at 02:05:22PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 11, 2023 at 03:37:27PM +0300, Kirill A. Shutemov wrote:
> > 
> > >  #define __untagged_addr(untag_mask, addr)
> > >  	u64 __addr = (__force u64)(addr);				\
> > > -	s64 sign = (s64)__addr >> 63;					\
> > > -	__addr &= untag_mask | sign;					\
> > > +	if (static_branch_likely(&tagged_addr_key)) {			\
> > > +		s64 sign = (s64)__addr >> 63;				\
> > > +		__addr &= untag_mask | sign;				\
> > > +	}								\
> > >  	(__force __typeof__(addr))__addr;				\
> > >  })
> > >  
> > > #define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)
> > 
> > Is the compiler clever enough to put the memop inside the branch?
> 
> Hm. You mean current_untag_mask() inside static_branch_likely()?
> 
> But it is preprocessor who does this, not compiler. So, yes, the memop is
> inside the branch.
> 
> Or I didn't understand your question.

Nah, call it a pre-lunch dip, I overlooked the whole CPP angle -- d'0h.

That said, I did just put it through a compiler to see wth it did and it
is pretty gross:

GCC-12.2:

0000 00000000000023b0 <write_ok_or_segv>:
0000     23b0:  48 89 fa                mov    %rdi,%rdx
0003     23b3:  eb 76                   jmp    242b <write_ok_or_segv+0x7b>
0005     23b5:  65 48 8b 0d 00 00 00 00         mov    %gs:0x0(%rip),%rcx        # 23bd <write_ok_or_segv+0xd>  23b9: R_X86_64_PC32     tlbstate_untag_mask-0x4
000d     23bd:  48 89 f8                mov    %rdi,%rax
0010     23c0:  48 c1 f8 3f             sar    $0x3f,%rax
0014     23c4:  48 09 c8                or     %rcx,%rax
0017     23c7:  48 21 f8                and    %rdi,%rax
001a     23ca:  48 b9 00 f0 ff ff ff 7f 00 00   movabs $0x7ffffffff000,%rcx
0024     23d4:  48 39 f1                cmp    %rsi,%rcx
0027     23d7:  72 14                   jb     23ed <write_ok_or_segv+0x3d>
0029     23d9:  48 29 f1                sub    %rsi,%rcx
002c     23dc:  be 01 00 00 00          mov    $0x1,%esi
0031     23e1:  48 39 c1                cmp    %rax,%rcx
0034     23e4:  72 07                   jb     23ed <write_ok_or_segv+0x3d>
0036     23e6:  89 f0                   mov    %esi,%eax
0038     23e8:  e9 00 00 00 00          jmp    23ed <write_ok_or_segv+0x3d>     23e9: R_X86_64_PLT32    __x86_return_thunk-0x4
003d     23ed:  65 48 8b 04 25 00 00 00 00      mov    %gs:0x0,%rax     23f2: R_X86_64_32S      pcpu_hot
0046     23f6:  48 89 90 68 0b 00 00    mov    %rdx,0xb68(%rax)
004d     23fd:  be 01 00 00 00          mov    $0x1,%esi
0052     2402:  bf 0b 00 00 00          mov    $0xb,%edi
0057     2407:  48 c7 80 78 0b 00 00 06 00 00 00        movq   $0x6,0xb78(%rax)
0062     2412:  48 c7 80 70 0b 00 00 0e 00 00 00        movq   $0xe,0xb70(%rax)
006d     241d:  e8 00 00 00 00          call   2422 <write_ok_or_segv+0x72>     241e: R_X86_64_PLT32    force_sig_fault-0x4
0072     2422:  31 f6                   xor    %esi,%esi
0074     2424:  89 f0                   mov    %esi,%eax
0076     2426:  e9 00 00 00 00          jmp    242b <write_ok_or_segv+0x7b>     2427: R_X86_64_PLT32    __x86_return_thunk-0x4
007b     242b:  48 89 f8                mov    %rdi,%rax
007e     242e:  eb 9a                   jmp    23ca <write_ok_or_segv+0x1a>

Note the stupid jump to the end and back. Not all sites do this, mind
you, but a fair number seem to do it.

Let me try llvm to see if it is any smarter.

CLANG-16:

0000 0000000000002d50 <write_ok_or_segv>:
0000     2d50:	41 57                	push   %r15
0002     2d52:	41 56                	push   %r14
0004     2d54:	41 54                	push   %r12
0006     2d56:	53                   	push   %rbx
0007     2d57:	48 89 f3             	mov    %rsi,%rbx
000a     2d5a:	48 89 fa             	mov    %rdi,%rdx
000d     2d5d:	49 89 fe             	mov    %rdi,%r14
0010     2d60:	eb 15                	jmp    2d77 <write_ok_or_segv+0x27>
0012     2d62:	48 89 d0             	mov    %rdx,%rax
0015     2d65:	48 c1 f8 3f          	sar    $0x3f,%rax
0019     2d69:	65 4c 8b 35 00 00 00 00 	mov    %gs:0x0(%rip),%r14        # 2d71 <write_ok_or_segv+0x21>	2d6d: R_X86_64_PC32	tlbstate_untag_mask-0x4
0021     2d71:	49 09 c6             	or     %rax,%r14
0024     2d74:	49 21 d6             	and    %rdx,%r14
0027     2d77:	f3 0f 1e fa          	endbr64
002b     2d7b:	49 bf 00 f0 ff ff ff 7f 00 00 	movabs $0x7ffffffff000,%r15
0035     2d85:	4d 89 fc             	mov    %r15,%r12
0038     2d88:	49 29 dc             	sub    %rbx,%r12
003b     2d8b:	72 05                	jb     2d92 <write_ok_or_segv+0x42>
003d     2d8d:	4d 39 f4             	cmp    %r14,%r12
0040     2d90:	73 34                	jae    2dc6 <write_ok_or_segv+0x76>
0042     2d92:	65 48 8b 05 00 00 00 00 	mov    %gs:0x0(%rip),%rax        # 2d9a <write_ok_or_segv+0x4a>	2d96: R_X86_64_PC32	pcpu_hot-0x4
004a     2d9a:	48 c7 80 78 0b 00 00 06 00 00 00 	movq   $0x6,0xb78(%rax)
0055     2da5:	48 89 90 68 0b 00 00 	mov    %rdx,0xb68(%rax)
005c     2dac:	48 c7 80 70 0b 00 00 0e 00 00 00 	movq   $0xe,0xb70(%rax)
0067     2db7:	bf 0b 00 00 00       	mov    $0xb,%edi
006c     2dbc:	be 01 00 00 00       	mov    $0x1,%esi
0071     2dc1:	e8 00 00 00 00       	call   2dc6 <write_ok_or_segv+0x76>	2dc2: R_X86_64_PLT32	force_sig_fault-0x4
0076     2dc6:	4d 39 f4             	cmp    %r14,%r12
0079     2dc9:	0f 93 c1             	setae  %cl
007c     2dcc:	49 39 df             	cmp    %rbx,%r15
007f     2dcf:	0f 93 c0             	setae  %al
0082     2dd2:	20 c8                	and    %cl,%al
0084     2dd4:	5b                   	pop    %rbx
0085     2dd5:	41 5c                	pop    %r12
0087     2dd7:	41 5e                	pop    %r14
0089     2dd9:	41 5f                	pop    %r15
008b     2ddb:	e9 00 00 00 00       	jmp    2de0 <__pfx_get_gate_vma>	2ddc: R_X86_64_PLT32	__x86_return_thunk-0x4

Well, it got the untag right, but OMG.. :-( Joao, Sami, any idea why it
put an ENDBR in there?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 15:02       ` Peter Zijlstra
@ 2023-01-17 17:18         ` Linus Torvalds
  2023-01-17 17:28           ` Linus Torvalds
                             ` (2 more replies)
  2023-01-19 23:06         ` Kirill A. Shutemov
  1 sibling, 3 replies; 39+ messages in thread
From: Linus Torvalds @ 2023-01-17 17:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel, Sami Tolvanen, ndesaulniers, joao

On Tue, Jan 17, 2023 at 7:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 17, 2023 at 04:57:03PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jan 17, 2023 at 02:05:22PM +0100, Peter Zijlstra wrote:
> > > On Wed, Jan 11, 2023 at 03:37:27PM +0300, Kirill A. Shutemov wrote:
> > >
> > > >  #define __untagged_addr(untag_mask, addr)
> > > >   u64 __addr = (__force u64)(addr);                               \
> > > > - s64 sign = (s64)__addr >> 63;                                   \
> > > > - __addr &= untag_mask | sign;                                    \
> > > > + if (static_branch_likely(&tagged_addr_key)) {                   \
> > > > +         s64 sign = (s64)__addr >> 63;                           \
> > > > +         __addr &= untag_mask | sign;                            \
> > > > + }                                                               \
> > > >   (__force __typeof__(addr))__addr;                               \
> > > >  })
> > > >
> > > > #define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)
> > >
> > > Is the compiler clever enough to put the memop inside the branch?
> >
> > Hm. You mean current_untag_mask() inside static_branch_likely()?
> >
> > But it is preprocessor who does this, not compiler. So, yes, the memop is
> > inside the branch.
> >
> > Or I didn't understand your question.
>
> Nah, call it a pre-lunch dip, I overlooked the whole CPP angle -- d'0h.
>
> That said, I did just put it through a compiler to see wth it did and it
> is pretty gross:

Yeah, I think the static branch likely just makes things worse.

And if we really want to make the "no untag mask exists" case better,
I think the code should probably use static_branch_unlikely() rather
than *_likely(). That should make it jump to the masking code, and
leave the unmasked code as a fallthrough, no?
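
A minimal sketch of that variant (untested; same tagged_addr_key as in
the patch, only the branch polarity flipped):

	#define __untagged_addr(untag_mask, addr)	({		\
		u64 __addr = (__force u64)(addr);			\
		if (static_branch_unlikely(&tagged_addr_key)) {		\
			s64 sign = (s64)__addr >> 63;			\
			__addr &= untag_mask | sign;			\
		}							\
		(__force __typeof__(addr))__addr;			\
	})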

The reason clang seems to generate saner code is that clang seems to
largely ignore the whole "__builtin_expect()", at least not to the
point where it tries to make the unlikely case be out-of-line.

But on the whole, I think we'd be better off without this whole static branch.

The cost of "untagged_addr()" generally shouldn't be worth this. There
are few performance-critical users - the most common case is, I think,
just mmap() and friends, and the single load is going to be a
non-issue there.

Looking around, I think the only situation where we may care is
strnlen_user() and strncpy_from_user(). Those *can* be
performance-critical. They're used for paths and for execve() strings,
and can be a bit hot.

And both of those cases actually just use it because of the whole
"maximum address" calculation to avoid traversing into kernel
addresses, so I wonder if we could use alternatives there, kind of
like the get_user/put_user cases did. Except it's generic code, so ..

But maybe even those aren't worth worrying about. At least they do the
unmasking outside the loop - although then in the case of execve(),
the string copies themselves are obviously done in a loop anyway.

Kirill, do you have clear numbers for that static key being a noticeable win?

                 Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 17:18         ` Linus Torvalds
@ 2023-01-17 17:28           ` Linus Torvalds
  2023-01-17 18:26             ` Nick Desaulniers
  2023-01-17 18:14           ` Peter Zijlstra
  2023-01-17 18:21           ` Peter Zijlstra
  2 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2023-01-17 17:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel, Sami Tolvanen, ndesaulniers, joao

On Tue, Jan 17, 2023 at 9:18 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The reason clang seems to generate saner code is that clang seems to
> largely ignore the whole "__builtin_expect()", at least not to the
> point where it tries to make the unlikely case be out-of-line.

Side note: that's not something new or unusual. It's been the case
since I started testing clang - we have several code-paths where we
use "unlikely()" to try to get very unlikely cases to be out-of-line,
and clang just mostly ignores it, or treats it as a very weak hint. I
think the only way to get clang to treat it as a *strong* hint is to
use PGO.

And in this case it actually made code generation look better,
probably because this particular use of static_branch_likely() is a
bit confused about which side should be the preferred one.  It's using
the static branch to make the old case not have the masked load, but
then it's saying that the new case is the likely one.

So clang ignoring the likely() hint is probably the right thing here,
and then the wrong thing in some other places.

              Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 17:18         ` Linus Torvalds
  2023-01-17 17:28           ` Linus Torvalds
@ 2023-01-17 18:14           ` Peter Zijlstra
  2023-01-17 18:21           ` Peter Zijlstra
  2 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2023-01-17 18:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel, Sami Tolvanen, ndesaulniers, joao

On Tue, Jan 17, 2023 at 09:18:01AM -0800, Linus Torvalds wrote:

> The reason clang seems to generate saner code is that clang seems to
> largely ignore the whole "__builtin_expect()", at least not to the
> point where it tries to make the unlikely case be out-of-line.

So in this case there is only a 'likely' hint, we're explicitly trying
to keep the thing in-line so we can jump over it.

It is GCC that generated an implicit else (and marked it 'unlikely' --
which we didn't ask for), but worse, it failed to spot the else case is
in fact shared with the normal case and it could've simply lifted that
mov instruction.

That is, instead of this:

0003     23b3:  eb 76                   jmp    242b <write_ok_or_segv+0x7b>
0005     23b5:  65 48 8b 0d 00 00 00 00         mov    %gs:0x0(%rip),%rcx        # 23bd <write_ok_or_segv+0xd>  23b9: R_X86_64_PC32     tlbstate_untag_mask-0x4
000d     23bd:  48 89 f8                mov    %rdi,%rax
0010     23c0:  48 c1 f8 3f             sar    $0x3f,%rax
0014     23c4:  48 09 c8                or     %rcx,%rax
0017     23c7:  48 21 f8                and    %rdi,%rax

001a     23ca:  48 b9 00 f0 ff ff ff 7f 00 00   movabs $0x7ffffffff000,%rcx

007b     242b:  48 89 f8                mov    %rdi,%rax
007e     242e:  eb 9a                   jmp    23ca <write_ok_or_segv+0x1a>

It could've just done:

0003     48 89 f8                mov    %rdi,%rax
0006     eb 76                   jmp    +18
0008     65 48 8b 0d 00 00 00 00         mov    %gs:0x0(%rip),%rcx        # 23bd <write_ok_or_segv+0xd>  23b9: R_X86_64_PC32     tlbstate_untag_mask-0x4
0010     48 c1 f8 3f             sar    $0x3f,%rax
0014     48 09 c8                or     %rcx,%rax
0017     48 21 f8                and    %rdi,%rax

001a     48 b9 00 f0 ff ff ff 7f 00 00   movabs $0x7ffffffff000,%rcx

and everything would've been good. In all the cases I've seen it do
this, it was the same: this silly mov out of line that is also part of
the regular branch.

That is, I like __builtin_expect() to be a strong hint. If I don't want
things out of line, I shouldn't have put unlikely on it. What I don't
like is that implicit else branches get the opposite strong hint.

What I like even less is that it found it needed that else branch at
all.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 17:18         ` Linus Torvalds
  2023-01-17 17:28           ` Linus Torvalds
  2023-01-17 18:14           ` Peter Zijlstra
@ 2023-01-17 18:21           ` Peter Zijlstra
  2 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2023-01-17 18:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel, Sami Tolvanen, ndesaulniers, joao

On Tue, Jan 17, 2023 at 09:18:01AM -0800, Linus Torvalds wrote:

> But maybe even those aren't worth worrying about. At least they do the
> unmasking outside the loop - although then in the case of execve(),
> the string copies themselves are obviously done in a loop anyway.
> 
> Kirill, do you have clear numbers for that static key being a noticeable win?

Numbers would be good; I think this all started as a precaution, but
clearly that's not really working out so well :/

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 17:28           ` Linus Torvalds
@ 2023-01-17 18:26             ` Nick Desaulniers
  2023-01-17 18:33               ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: Nick Desaulniers @ 2023-01-17 18:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Kirill A. Shutemov, Dave Hansen, Andy Lutomirski,
	x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel, Sami Tolvanen, joao

On Tue, Jan 17, 2023 at 9:29 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Jan 17, 2023 at 9:18 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > The reason clang seems to generate saner code is that clang seems to
> > largely ignore the whole "__builtin_expect()", at least not to the
> > point where it tries to make the unlikely case be out-of-line.
>
> Side note: that's not something new or unusual. It's been the case
> since I started testing clang - we have several code-paths where we
> use "unlikely()" to try to get very unlikely cases to be out-of-line,
> and clang just mostly ignores it, or treats it as a very weak hint. I
> think the only way to get clang to treat it as a *strong* hint is to
> use PGO.

I'd be surprised if that were intentional or by design.

Do you guys have a bug report we could look at?

> So clang ignoring the likely() hint is probably the right thing here,
> and then the wrong thing in some other places.

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 18:26             ` Nick Desaulniers
@ 2023-01-17 18:33               ` Linus Torvalds
  2023-01-17 19:17                 ` Nick Desaulniers
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2023-01-17 18:33 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Peter Zijlstra, Kirill A. Shutemov, Dave Hansen, Andy Lutomirski,
	x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel, Sami Tolvanen, joao

On Tue, Jan 17, 2023 at 10:26 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Tue, Jan 17, 2023 at 9:29 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Side note: that's not something new or unusual. It's been the case
> > since I started testing clang - we have several code-paths where we
> > use "unlikely()" to try to get very unlikely cases to be out-of-line,
> > and clang just mostly ignores it, or treats it as a very weak hint. I
> > think the only way to get clang to treat it as a *strong* hint is to
> > use PGO.
>
> I'd be surprised if that were intentional or by design.
>
> Do you guys have a bug report we could look at?

Heh. I actually sent you an example long ago. Let me go fish it out of
my mail archives and quote some of it below so that you can find it in
yours..

              Linus

[ Time passes. Found this email to you and Bill Wendling from Feb 16,
2020, Message-ID
CAHk-=wigVshsByCMjkUiZyQSR5N5zi2aAeQc+VJCzQV=nm8E7g@mail.gmail.com ]:

  Anyway, I'm looking at clang code generation, and comparing it with
  gcc on one of my "this has been optimized to hell and back" functions:
  __d_lookup_rcu().

  It looks like clang does frame pointers, and ignores our
  likely/unlikely annotations.

  So this code:

                if (unlikely(parent->d_flags & DCACHE_OP_COMPARE)) {
                        int tlen;
                        const char *tname;
                        ......

  doesn't actually jump out of line, but instead generates the unlikely
  case as the fallthrough:

        testb   $2, (%r12)
        je      .LBB50_9
        ... unlikely code goes here...

  and then the likely case ends up having unfortunate reloads inside the
  hot loop. Possibly because it has one fewer free registers than gcc
  because of the frame pointer.

  I didn't look into _why_ clang generates frame pointers but gcc
  doesn't. It may be just a compiler default, I think we don't end up
  explicitly asking either way.

[ And then Bill replied with this ]

  It's not a no-op. We add branch probabilities to the IR, whether
  they're honored or not depends on the transformation. But they
  *should* be honored when available. I've seen in the past that instead
  of moving unlikely blocks out of the way (like gcc, which moves them
  below the function's "ret" instruction), LLVM will do something like
  this:

    <normal code>
    <jmp to loop conditional test>
        <unlikely code>
        <more likely code>
    <loop conditional test>
    <...>

  I.e. the loop is rotated and the unlikely code is first and the hotter
  code is closer together but between the unlikely and conditional test.
  Could this be what's going on? Otherwise, maybe clang decided that
  it's not beneficial to move the code out-of-line because the benefit
  was minimal? (I'm guessing here.)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 18:33               ` Linus Torvalds
@ 2023-01-17 19:17                 ` Nick Desaulniers
  2023-01-17 20:10                   ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: Nick Desaulniers @ 2023-01-17 19:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Kirill A. Shutemov, Dave Hansen, Andy Lutomirski,
	x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel, Sami Tolvanen, joao

On Tue, Jan 17, 2023 at 10:34 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Jan 17, 2023 at 10:26 AM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > On Tue, Jan 17, 2023 at 9:29 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Side note: that's not something new or unusual. It's been the case
> > > since I started testing clang - we have several code-paths where we
> > > use "unlikely()" to try to get very unlikely cases to be out-of-line,
> > > and clang just mostly ignores it, or treats it as a very weak hint. I
> > > think the only way to get clang to treat it as a *strong* hint is to
> > > use PGO.
> >
> > I'd be surprised if that were intentional or by design.
> >
> > Do you guys have a bug report we could look at?
>
> Heh. I actually sent you an example long ago. Let me go fish it out of
> my mail archives and quote some of it below so that you can find it in
> yours..
>
>               Linus
>
> [ Time passes. Found this email to you and Bill Wendling from Feb 16,
> 2020, Message-ID
> CAHk-=wigVshsByCMjkUiZyQSR5N5zi2aAeQc+VJCzQV=nm8E7g@mail.gmail.com ]:
>
>   Anyway, I'm looking at clang code generation, and comparing it with
>   gcc on one of my "this has been optimized to hell and back" functions:
>   __d_lookup_rcu().
>
>   It looks like clang does frame pointers, and ignores our
>   likely/unlikely annotations.
>
>   So this code:
>
>                 if (unlikely(parent->d_flags & DCACHE_OP_COMPARE)) {
>                         int tlen;
>                         const char *tname;
>                         ......
>
>   doesn't actually jump out of line, but instead generates the unlikely
>   case as the fallthrough:
>
>         testb   $2, (%r12)
>         je      .LBB50_9
>         ... unlikely code goes here...


Perhaps that was compiler version or config specific?

$ make LLVM=1 -j128 defconfig fs/dcache.o
$ llvm-objdump -d  --no-show-raw-insn
--disassemble-symbols=__d_lookup_rcu fs/dcache.o
0000000000003210 <__d_lookup_rcu>:
    3210:      endbr64
    3214:      pushq %rbp
    3215:      pushq %r15
    3217:      pushq %r14
    3219:      pushq %r12
    321b:      pushq %rbx
    321c:      testb $0x2, (%rdi)
    321f:      jne 0x32d7 <__d_lookup_rcu+0xc7>
...
    32d7:      popq %rbx
    32d8:      popq %r12
    32da:      popq %r14
    32dc:      popq %r15
    32de:      popq %rbp
    32df:      jmp 0x3300 <__d_lookup_rcu_op_compare>

That looks like what you want, yeah?

Your original report was from nearly 3 years ago; we may have fixed a
few instances of branch weights not getting propagated since then.

What's going on in this case in this thread? I know we don't support
hot/cold attributes on labels yet, but if static_branch_likely (or
friends) is being used, we assign the indirect branches a 0%
likeliness/branch-weight.

>
>   and then the likely case ends up having unfortunate reloads inside the
>   hot loop. Possibly because it has one fewer free registers than gcc
>   because of the frame pointer.
>
>   I didn't look into _why_ clang generates frame pointers but gcc
>   doesn't. It may be just a compiler default, I think we don't end up
>   explicitly asking either way.
>
> [ And then Bill replied with this ]
>
>   It's not a no-op. We add branch probabilities to the IR, whether
>   they're honored or not depends on the transformation. But they
>   *should* be honored when available. I've seen in the past that instead
>   of moving unlikely blocks out of the way (like gcc, which moves them
>   below the function's "ret" instruction), LLVM will do something like
>   this:
>
>     <normal code>
>     <jmp to loop conditional test>
>         <unlikely code>
>         <more likely code>
>     <loop conditional test>
>     <...>
>
>   I.e. the loop is rotated and the unlikely code is first and the hotter
>   code is closer together but between the unlikely and conditional test.
>   Could this be what's going on? Otherwise, maybe clang decided that
>   it's not beneficial to move the code out-of-line because the benefit
>   was minimal? (I'm guessing here.)



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 19:17                 ` Nick Desaulniers
@ 2023-01-17 20:10                   ` Linus Torvalds
  2023-01-17 20:43                     ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2023-01-17 20:10 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Peter Zijlstra, Kirill A. Shutemov, Dave Hansen, Andy Lutomirski,
	x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel, Sami Tolvanen, joao

On Tue, Jan 17, 2023 at 11:17 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> Perhaps that was compiler version or config specific?

Possible, but...

The clang code generation annoyed me enough that I actually ended up
rewriting the unlikely test to be outside the loop in commit
ae2a823643d7 ("dcache: move the DCACHE_OP_COMPARE case out of the
__d_lookup_rcu loop").

I think that then made clang no longer have the whole "rotate loop
with unlikely case in the middle" issue.

And then because clang *still* messed up by trying to be too clever (see

    https://lore.kernel.org/all/CAHk-=wjyOB66pofW0mfzDN7SO8zS1EMRZuR-_2aHeO+7kuSrAg@mail.gmail.com/

for details), I also ended up doing commit c4e34dd99f2e ("x86:
simplify load_unaligned_zeropad() implementation").

The end result is that now the compiler almost *cannot* mess up any more.

So the reason clang now does a good job on __d_lookup_rcu() is largely
that I took away all the places where it did badly ;)

That said, clang still generates more register pressure than gcc,
causing the function prologue and epilogue to be rather bigger
(pushing and popping six registers, as opposed to gcc that only needs
three)

Gcc is also better able to schedule the prologue and epilogue together
with the work of the function, whereas clang always seems to do it as a
"push all" and "pop all" sequence.

That scheduling doesn't matter in that particular place (although it
does make the unlikely case of calling __d_lookup_rcu_op_compare
pointlessly push all regs only to then pop them), but I've seen a few
other cases where it ends up meaning that it always does that full
function prologue even when the *likely* case then returns early and
doesn't actually need any of that work because it didn't use any of
those registers.

But yeah, the RCU pathname lookup looks fine these days. And I don't
actually think it was due to clang changes ;)

                Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 20:10                   ` Linus Torvalds
@ 2023-01-17 20:43                     ` Linus Torvalds
  0 siblings, 0 replies; 39+ messages in thread
From: Linus Torvalds @ 2023-01-17 20:43 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Peter Zijlstra, Kirill A. Shutemov, Dave Hansen, Andy Lutomirski,
	x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel, Sami Tolvanen, joao

On Tue, Jan 17, 2023 at 12:10 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That said, clang still generates more register pressure than gcc,
> causing the function prologue and epilogue to be rather bigger
> (pushing and popping six registers, as opposed to gcc, which only needs
> three).

.. and at least part of that is the same thing with the bad byte mask
generation (see that "clang *still* messed up" link for details).

Basically, the byte mask is computed by

        mask = bytemask_from_count(tcount);

where we have

   #define bytemask_from_count(cnt)       (~(~0ul << (cnt)*8))

and clang tries very very hard to avoid that "multiply by 8", so
instead it keeps a shadow copy of that "(cnt)*8" value in the loop.
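
(Roughly, what that strength reduction amounts to -- a C sketch of the
idea, not actual compiler output; clang_style_mask() is a made-up name:)

	static unsigned long clang_style_mask(unsigned int tcount)
	{
		unsigned int cnt8 = tcount * 8;	/* shadow of (cnt)*8 */

		while (tcount >= sizeof(unsigned long)) {
			/* ... word-at-a-time comparison elided ... */
			tcount -= sizeof(unsigned long);
			cnt8 -= 64;	/* becomes "addl $-64, %ecx" */
		}
		/* cnt8 == tcount*8 here, and only cnt8 & 63 matters */
		return ~(~0ul << cnt8);
	}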

That is wrong for a couple of reasons:

 (a) it adds register pressure for no good reason

 (b) when you shift left by that value, only the low 6 bits of that
value matter

And guess how that "tcount" is updated? It's this:

                tcount -= sizeof(unsigned long);

in the loop, and thus the update of that shadow value of "(cnt)*8" is done as

        addl    $-64, %ecx

inside that loop.

This is truly stupid and wasted work, because the low 6 bits of the
value - remember, the only part that matters - DO NOT CHANGE when
you do that.
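
(A quick standalone check of that claim -- nothing to do with the
kernel sources:)

	#include <assert.h>

	int main(void)
	{
		/* x86-64 variable shifts use only count & 63, and
		 * subtracting 64 never changes those low 6 bits */
		for (unsigned int n = 64; n < 256; n += 8)
			assert(((n - 64) & 63) == (n & 63));
		return 0;
	}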

So clang has decided that it needs to

 (a) avoid the "expensive" multiply-by-8 at the end by turning it into
a repeated "add $-64" inside the loop

 (b) add register pressure and one extra instruction inside the loop

 (c) not realize that that extra instruction doesn't actually *do*
anything, because it only affects the bits that don't actually matter
in the end.

which is all kind of silly, wouldn't you agree? Every single step
there was pointless.
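
The straightforward alternative is to keep only "tcount" live in the
loop and do the *8 once at the end (a sketch of the pattern, with a
made-up function name, not the exact fs/dcache.c code):

	static unsigned long final_bytemask(unsigned int tcount)
	{
		while (tcount >= sizeof(unsigned long)) {
			/* ... word-at-a-time comparison elided ... */
			tcount -= sizeof(unsigned long);
		}
		/* a single shl at the end; the hardware masks the
		 * count to 6 bits anyway */
		return ~(~0ul << (tcount * 8));
	}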

But with my other simplifications, the fact that clang does these
extra things is no longer all that noticeable. It *used* to be a
horrible disaster because the extra register pressure ended up meaning
that you had spills and all kinds of nastiness. Now the function is
simple enough that even with the extra register pressure, there's no
need for spills.

.. until you look at the 32-bit version, which still needs spills. Gcc
does too, but clang just makes it worse by having the extra pointless
shadow variable.

If I cared about 32-bit, I might write up a bugzilla entry. As it is,
it's just "clang tries to be clever, and in the process is actually
being stupid".

              Linus


* Re: [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user()
  2023-01-11 12:37 ` [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user() Kirill A. Shutemov
@ 2023-01-18 15:49   ` Peter Zijlstra
  2023-01-18 15:59     ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2023-01-18 15:49 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Taras Madan, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel

On Wed, Jan 11, 2023 at 03:37:20PM +0300, Kirill A. Shutemov wrote:
> The functions get_user() and put_user() check that the target address
> range resides in the user space portion of the virtual address space.
> In order to perform this check, the functions compare the end of the
> range against TASK_SIZE_MAX.
> 
> For kernels compiled with CONFIG_X86_5LEVEL, this process requires some
> additional trickery using ALTERNATIVE, as TASK_SIZE_MAX depends on the
> paging mode in use.
> 
> Linus suggested that this check could be simplified for 64-bit kernels.
> It is sufficient to check bit 63 of the address to ensure that the range
> belongs to user space. Additionally, the use of branches can be avoided
> by setting the target address to all ones if bit 63 is set.
> 
> There's no need to check the end of the access range as there's a huge
> gap between the end of the userspace range and the start of the kernel
> range. The gap consists of the canonical hole and unused ranges on both
> the kernel and userspace sides.

So far I can follow, however

> If an address with bit 63 set is passed down, it will trigger a #GP
> exception. _ASM_EXTABLE_UA() complains about this. Replace it with
> plain _ASM_EXTABLE() as it is expected behaviour now.

here I don't. The new logic basically squishes every kernel address to
-1L -- a known unmapped address, but getting that address in
{get,put}_user() is still a fail, right?
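
(If I read the patch right, the squish is roughly this -- a sketch, not
the exact getuser.S asm:)

	mov %rax, %rdx	/* rdx = addr */
	sar $63, %rdx	/* 0 for user pointers, -1 for kernel */
	or  %rdx, %rax	/* kernel pointers become all-ones */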

We used to manually branch to bad_get_user when outside TASK_SIZE_MAX,
now we rely on #GP.

So why silence it?



* Re: [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user()
  2023-01-18 15:49   ` Peter Zijlstra
@ 2023-01-18 15:59     ` Linus Torvalds
  2023-01-18 16:48       ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2023-01-18 15:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel

On Wed, Jan 18, 2023 at 7:50 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Jan 11, 2023 at 03:37:20PM +0300, Kirill A. Shutemov wrote:
>
> > If an address with bit 63 set is passed down, it will trigger a #GP
> > exception. _ASM_EXTABLE_UA() complains about this. Replace it with
> > plain _ASM_EXTABLE() as it is expected behaviour now.
>
> here I don't. The new logic basically squishes every kernel address to
> -1L -- a known unmapped address, but getting that address in
> {get,put}_user() is still a fail, right?
>
> We used to manually branch to bad_get_user when outside TASK_SIZE_MAX,
> now we rely on #GP.
>
> So why silence it?

We don't silence it - for a kernel address that turns into an all-ones
address, the _ASM_EXTABLE() will still cause the -EFAULT due to the
page fault.

But it's not the high bit set case that is the problem here.

The problem is a "positive" address that is non-canonical.

Testing against TASK_SIZE_MAX would catch non-canonical addresses
before the access, and we'd return -EFAULT.

But now that we don't test against TASK_SIZE_MAX any more,
non-canonical accesses will cause a GP fault, and *that* message is
what we want to silence.

We'll still return -EFAULT, of course, we're just getting rid of the

        WARN_ONCE(trapnr == X86_TRAP_GP,
                  "General protection fault in user access. Non-canonical address?");

issue that comes from not being so exact about the address limit any more.

                 Linus


* Re: [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user()
  2023-01-18 15:59     ` Linus Torvalds
@ 2023-01-18 16:48       ` Peter Zijlstra
  2023-01-18 17:01         ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2023-01-18 16:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel

On Wed, Jan 18, 2023 at 07:59:21AM -0800, Linus Torvalds wrote:

> We don't silence it - for a kernel address that turns into an all-ones
> address, the the _ASM_EXTABLE() will still cause the -EFAULT due to
> the page fault.

> But it's not the high bit set case that is the problem here.

Yes, and the explicit bad_get_user jump would not print the message,
and now with _UA removed it won't either (I seem to have my wires
crossed just now).

> The problem is a "positive" address that is non-canonical.
> 
> Testing against TASK_SIZE_MAX would catch non-canonical addresses
> before the access, and we'd return -EFAULT.
> 
> But now that we don't test against TASK_SIZE_MAX any more,
> non-canonical accesses will cause a GP fault, and *that* message is
> what we want to silence.

Right, but I was thinking that we'd explicitly allowed those because
with LAM enabled we'd actually accept those addresses.

> We'll still return -EFAULT, of course, we're just getting rid of the
> 
>         WARN_ONCE(trapnr == X86_TRAP_GP,
>                 "General protection fault in user access.
> Non-canonical address?");
> 
> issue that comes from not being so exact about the address limit any more.

Ah indeed, so for !LAM we'd now print the message where we would not
before (the whole TASK_SIZE_MAX+ range).

OK, agreed.


* Re: [PATCHv14 00/17] Linear Address Masking enabling
  2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
                   ` (16 preceding siblings ...)
  2023-01-11 12:37 ` [PATCHv14 17/17] selftests/x86/lam: Add test cases for LAM vs thread creation Kirill A. Shutemov
@ 2023-01-18 16:49 ` Peter Zijlstra
  17 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2023-01-18 16:49 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Taras Madan, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel

On Wed, Jan 11, 2023 at 03:37:19PM +0300, Kirill A. Shutemov wrote:

> Kirill A. Shutemov (12):
>   x86/mm: Rework address range check in get_user() and put_user()
>   x86: Allow atomic MM_CONTEXT flags setting
>   x86: CPUID and CR3/CR4 flags for Linear Address Masking
>   x86/mm: Handle LAM on context switch
>   mm: Introduce untagged_addr_remote()
>   x86/uaccess: Provide untagged_addr() and remove tags before address
>     check
>   x86/mm: Provide arch_prctl() interface for LAM
>   x86/mm: Reduce untagged_addr() overhead until the first LAM user
>   mm: Expose untagging mask in /proc/$PID/status
>   iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
>   x86/mm/iommu/sva: Make LAM and SVA mutually exclusive

+- the static_branch() thing,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* Re: [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user()
  2023-01-18 16:48       ` Peter Zijlstra
@ 2023-01-18 17:01         ` Linus Torvalds
  0 siblings, 0 replies; 39+ messages in thread
From: Linus Torvalds @ 2023-01-18 17:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	linux-mm, linux-kernel

On Wed, Jan 18, 2023 at 8:49 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> > We'll still return -EFAULT, of course, we're just getting rid of the
> >
> >         WARN_ONCE(trapnr == X86_TRAP_GP,
> >                   "General protection fault in user access. Non-canonical address?");
> >
> > issue that comes from not being so exact about the address limit any more.
>
> Ah indeed, so for !LAM we'd now print the message were we would not
> before (the whole TASK_SIZE_MAX+ range).

Yeah.

We could just remove that warning entirely, but it has been useful for
syzbot catching random user addresses that weren't caught by
"access_ok()" when people did bad bad things (ie using the
non-checking "__copy_from_user()" and friends).

I'm not sure how much that warning is worth any more - and for
get_user() and put_user() themselves it buys us nothing, since by
definition _those_ do the range checking. Christoph getting rid of the
set_fs() model simplified a lot of our user address checking.

But I think it's easier to just keep that existing warning about "how
did you get a non-canonical address here" for other user accesses, and
just make get/put_user() use that _ASM_EXTABLE() version that doesn't
do it.

               Linus


* Re: [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user
  2023-01-17 15:02       ` Peter Zijlstra
  2023-01-17 17:18         ` Linus Torvalds
@ 2023-01-19 23:06         ` Kirill A. Shutemov
  1 sibling, 0 replies; 39+ messages in thread
From: Kirill A. Shutemov @ 2023-01-19 23:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Bharata B Rao, Jacob Pan, Ashok Raj,
	Linus Torvalds, linux-mm, linux-kernel, Sami Tolvanen,
	ndesaulniers, joao, Andrew Cooper

On Tue, Jan 17, 2023 at 04:02:06PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 17, 2023 at 04:57:03PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jan 17, 2023 at 02:05:22PM +0100, Peter Zijlstra wrote:
> > > On Wed, Jan 11, 2023 at 03:37:27PM +0300, Kirill A. Shutemov wrote:
> > > 
> > > >  #define __untagged_addr(untag_mask, addr)
> > > >  	u64 __addr = (__force u64)(addr);				\
> > > > -	s64 sign = (s64)__addr >> 63;					\
> > > > -	__addr &= untag_mask | sign;					\
> > > > +	if (static_branch_likely(&tagged_addr_key)) {			\
> > > > +		s64 sign = (s64)__addr >> 63;				\
> > > > +		__addr &= untag_mask | sign;				\
> > > > +	}								\
> > > >  	(__force __typeof__(addr))__addr;				\
> > > >  })
> > > >  
> > > > #define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)
> > > 
> > > Is the compiler clever enough to put the memop inside the branch?
> > 
> > Hm. You mean current_untag_mask() inside static_branch_likely()?
> > 
> > But it is the preprocessor who does this, not the compiler. So, yes,
> > the memop is inside the branch.
> > 
> > Or I didn't understand your question.
> 
> Nah, call it a pre-lunch dip, I overlooked the whole CPP angle -- d'0h.
> 
> That said, I did just put it through a compiler to see wth it did and it
> is pretty gross:

I tried to replace the static branch with an alternative. It kinda works,
but requires a few hacks. Thanks to Andrew Cooper for helping to untangle
them.

I am not sure if it is worth the effort. I don't have any evidence that
it helps. The untagged_addr() overhead is rather small and hides in the
noise of the syscall cost.

I only made an alternative for untagged_addr(), but not for
untagged_addr_remote(). The _remote() case has very few users.

BTW, it would be nice to be able to apply the alternative later, delaying
it until the first user of LAM, like I did with the static branch.
We don't have a way to do this, right?

Any opinions? I am okay dropping the patch altogether.

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index c44b56f7ffba..3f0c31044f02 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -75,6 +75,12 @@
 # define DISABLE_CALL_DEPTH_TRACKING	(1 << (X86_FEATURE_CALL_DEPTH & 31))
 #endif
 
+#ifdef CONFIG_ADDRESS_MASKING
+# define DISABLE_LAM		0
+#else
+# define DISABLE_LAM		(1 << (X86_FEATURE_LAM & 31))
+#endif
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 # define DISABLE_ENQCMD		0
 #else
@@ -115,7 +121,7 @@
 #define DISABLED_MASK10	0
 #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
 			 DISABLE_CALL_DEPTH_TRACKING)
-#define DISABLED_MASK12	0
+#define DISABLED_MASK12	(DISABLE_LAM)
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index f9f85d596581..57ccb91fcccf 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -9,6 +9,7 @@
 #include <linux/kasan-checks.h>
 #include <linux/mm_types.h>
 #include <linux/string.h>
+#include <linux/mmap_lock.h>
 #include <asm/asm.h>
 #include <asm/page.h>
 #include <asm/smap.h>
@@ -24,28 +25,48 @@ static inline bool pagefault_disabled(void);
 #endif
 
 #ifdef CONFIG_ADDRESS_MASKING
-DECLARE_STATIC_KEY_FALSE(tagged_addr_key);
+static inline unsigned long __untagged_addr(unsigned long addr)
+{
+	/*
+	 * Magic with the 'sign' allows untagging a userspace pointer without
+	 * any branches while leaving kernel addresses intact.
+	 */
+	long sign;
+
+	/*
+	 * Refer to tlbstate_untag_mask directly to avoid a RIP-relative
+	 * relocation in the alternative instructions. The relocation goes
+	 * wrong when it gets copied to the target place.
+	 */
+	asm (ALTERNATIVE("",
+			 "sar $63, %[sign]\n\t" /* user_ptr ? 0 : -1UL */
+			 "or %%gs:tlbstate_untag_mask, %[sign]\n\t"
+			 "and %[sign], %[addr]\n\t", X86_FEATURE_LAM)
+	     : [addr] "+r" (addr), [sign] "=r" (sign)
+	     : "m" (tlbstate_untag_mask), "[sign]" (addr));
+
+	return addr;
+}
 
-/*
- * Mask out tag bits from the address.
- *
- * Magic with the 'sign' allows to untag userspace pointer without any branches
- * while leaving kernel addresses intact.
- */
-#define __untagged_addr(untag_mask, addr)	({			\
-	u64 __addr = (__force u64)(addr);				\
-	if (static_branch_likely(&tagged_addr_key)) {			\
-		s64 sign = (s64)__addr >> 63;				\
-		__addr &= untag_mask | sign;				\
-	}								\
-	(__force __typeof__(addr))__addr;				\
+#define untagged_addr(addr)	({					\
+	unsigned long __addr = (__force unsigned long)(addr);		\
+	(__force __typeof__(addr))__untagged_addr(__addr);		\
 })
 
-#define untagged_addr(addr) __untagged_addr(current_untag_mask(), addr)
+static inline unsigned long __untagged_addr_remote(struct mm_struct *mm,
+						   unsigned long addr)
+{
+	long sign = (long)addr >> 63;
+
+	mmap_assert_locked(mm);
+	addr &= (mm)->context.untag_mask | sign;
+
+	return addr;
+}
 
 #define untagged_addr_remote(mm, addr)	({				\
-	mmap_assert_locked(mm);						\
-	__untagged_addr((mm)->context.untag_mask, addr);		\
+	unsigned long __addr = (__force unsigned long)(addr);		\
+	(__force __typeof__(addr))__untagged_addr_remote(mm, __addr);	\
 })
 
 #else
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 0831d2be190f..e006725afdf1 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -745,9 +745,6 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 
 #ifdef CONFIG_ADDRESS_MASKING
 
-DEFINE_STATIC_KEY_FALSE(tagged_addr_key);
-EXPORT_SYMBOL_GPL(tagged_addr_key);
-
 #define LAM_U57_BITS 6
 
 static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
@@ -787,8 +784,6 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 	set_bit(MM_CONTEXT_LOCK_LAM, &mm->context.flags);
 
 	mmap_write_unlock(mm);
-
-	static_branch_enable(&tagged_addr_key);
 	return 0;
 }
 #endif
-- 
  Kiryl Shutsemau / Kirill A. Shutemov


end of thread

Thread overview: 39+ messages
2023-01-11 12:37 [PATCHv14 00/17] Linear Address Masking enabling Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 01/17] x86/mm: Rework address range check in get_user() and put_user() Kirill A. Shutemov
2023-01-18 15:49   ` Peter Zijlstra
2023-01-18 15:59     ` Linus Torvalds
2023-01-18 16:48       ` Peter Zijlstra
2023-01-18 17:01         ` Linus Torvalds
2023-01-11 12:37 ` [PATCHv14 02/17] x86: Allow atomic MM_CONTEXT flags setting Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 03/17] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 04/17] x86/mm: Handle LAM on context switch Kirill A. Shutemov
2023-01-11 13:49   ` Linus Torvalds
2023-01-11 14:14     ` Kirill A. Shutemov
2023-01-11 14:37       ` [PATCHv14.1 " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 05/17] mm: Introduce untagged_addr_remote() Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 06/17] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 07/17] x86/mm: Provide arch_prctl() interface for LAM Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 08/17] x86/mm: Reduce untagged_addr() overhead until the first LAM user Kirill A. Shutemov
2023-01-17 13:05   ` Peter Zijlstra
2023-01-17 13:57     ` Kirill A. Shutemov
2023-01-17 15:02       ` Peter Zijlstra
2023-01-17 17:18         ` Linus Torvalds
2023-01-17 17:28           ` Linus Torvalds
2023-01-17 18:26             ` Nick Desaulniers
2023-01-17 18:33               ` Linus Torvalds
2023-01-17 19:17                 ` Nick Desaulniers
2023-01-17 20:10                   ` Linus Torvalds
2023-01-17 20:43                     ` Linus Torvalds
2023-01-17 18:14           ` Peter Zijlstra
2023-01-17 18:21           ` Peter Zijlstra
2023-01-19 23:06         ` Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 09/17] mm: Expose untagging mask in /proc/$PID/status Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 10/17] iommu/sva: Replace pasid_valid() helper with mm_valid_pasid() Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 11/17] x86/mm/iommu/sva: Make LAM and SVA mutually exclusive Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 12/17] selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 13/17] selftests/x86/lam: Add mmap and SYSCALL " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 14/17] selftests/x86/lam: Add io_uring " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 15/17] selftests/x86/lam: Add inherit " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 16/17] selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA " Kirill A. Shutemov
2023-01-11 12:37 ` [PATCHv14 17/17] selftests/x86/lam: Add test cases for LAM vs thread creation Kirill A. Shutemov
2023-01-18 16:49 ` [PATCHv14 00/17] Linear Address Masking enabling Peter Zijlstra
