linux-kernel.vger.kernel.org archive mirror
* [PATCHv5 00/13] Linear Address Masking enabling
@ 2022-07-12 23:13 Kirill A. Shutemov
  2022-07-12 23:13 ` [PATCHv5 01/13] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
                   ` (13 more replies)
  0 siblings, 14 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Kirill A. Shutemov

Linear Address Masking[1] (LAM) modifies the checking that is applied to
64-bit linear addresses, allowing software to use the untranslated
address bits for metadata.

The patchset brings support for LAM for userspace addresses.

LAM_U48 enabling is controversial since it competes for bits with
5-level paging. Its enabling is isolated into an optional last patch
that can be applied at the maintainer's discretion.

Please review and consider applying.

v5:
  - Do not use switch_mm() in enable_lam_func()
  - Use mb()/READ_ONCE() pair on LAM enabling;
  - Add self-test by Weihong Zhang;
  - Add comments;
v4:
  - Fix untagged_addr() for LAM_U48;
  - Remove no-threads restriction on LAM enabling;
  - Fix mm_struct access from /proc/$PID/arch_status
  - Fix LAM handling in initialize_tlbstate_and_flush()
  - Pack tlb_state better;
  - Comments and commit messages;
v3:
  - Rebased onto v5.19-rc1
  - Per-process enabling;
  - API overhaul (again);
  - Avoid branches and costly computations in the fast path;
  - LAM_U48 is in optional patch.
v2:
  - Rebased onto v5.18-rc1
  - New arch_prctl(2)-based API
  - Expose status of LAM (or other thread features) in
    /proc/$PID/arch_status

[1] ISE, Chapter 10. https://cdrdv2.intel.com/v1/dl/getContent/671368

Kirill A. Shutemov (8):
  x86/mm: Fix CR3_ADDR_MASK
  x86: CPUID and CR3/CR4 flags for Linear Address Masking
  mm: Pass down mm_struct to untagged_addr()
  x86/mm: Handle LAM on context switch
  x86/uaccess: Provide untagged_addr() and remove tags before address
    check
  x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  x86: Expose untagging mask in /proc/$PID/arch_status
  x86/mm: Extend LAM to support to LAM_U48

Weihong Zhang (5):
  selftests/x86/lam: Add malloc test cases for linear-address masking
  selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address
    masking
  selftests/x86/lam: Add io_uring test cases for linear-address masking
  selftests/x86/lam: Add inherit test cases for linear-address masking
  selftests/x86/lam: Add tests cases for LAM_U48

 arch/arm64/include/asm/memory.h               |   4 +-
 arch/arm64/include/asm/signal.h               |   2 +-
 arch/arm64/include/asm/uaccess.h              |   4 +-
 arch/arm64/kernel/hw_breakpoint.c             |   2 +-
 arch/arm64/kernel/traps.c                     |   4 +-
 arch/arm64/mm/fault.c                         |  10 +-
 arch/sparc/include/asm/pgtable_64.h           |   2 +-
 arch/sparc/include/asm/uaccess_64.h           |   2 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/elf.h                    |   3 +-
 arch/x86/include/asm/mmu.h                    |   6 +
 arch/x86/include/asm/mmu_context.h            |  58 ++
 arch/x86/include/asm/processor-flags.h        |   2 +-
 arch/x86/include/asm/tlbflush.h               |  36 +
 arch/x86/include/asm/uaccess.h                |  42 +-
 arch/x86/include/uapi/asm/prctl.h             |   3 +
 arch/x86/include/uapi/asm/processor-flags.h   |   6 +
 arch/x86/kernel/Makefile                      |   2 +
 arch/x86/kernel/fpu/xstate.c                  |  47 -
 arch/x86/kernel/proc.c                        |  60 ++
 arch/x86/kernel/process.c                     |   3 +
 arch/x86/kernel/process_64.c                  |  83 +-
 arch/x86/kernel/sys_x86_64.c                  |   5 +-
 arch/x86/mm/hugetlbpage.c                     |   6 +-
 arch/x86/mm/mmap.c                            |  10 +-
 arch/x86/mm/tlb.c                             |  42 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |   2 +-
 drivers/gpu/drm/radeon/radeon_gem.c           |   2 +-
 drivers/infiniband/hw/mlx4/mr.c               |   2 +-
 drivers/media/common/videobuf2/frame_vector.c |   2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |   2 +-
 .../staging/media/atomisp/pci/hmm/hmm_bo.c    |   2 +-
 drivers/tee/tee_shm.c                         |   2 +-
 drivers/vfio/vfio_iommu_type1.c               |   2 +-
 fs/proc/task_mmu.c                            |   2 +-
 include/linux/mm.h                            |  11 -
 include/linux/uaccess.h                       |  15 +
 lib/strncpy_from_user.c                       |   2 +-
 lib/strnlen_user.c                            |   2 +-
 mm/gup.c                                      |   6 +-
 mm/madvise.c                                  |   2 +-
 mm/mempolicy.c                                |   6 +-
 mm/migrate.c                                  |   2 +-
 mm/mincore.c                                  |   2 +-
 mm/mlock.c                                    |   4 +-
 mm/mmap.c                                     |   2 +-
 mm/mprotect.c                                 |   2 +-
 mm/mremap.c                                   |   2 +-
 mm/msync.c                                    |   2 +-
 tools/testing/selftests/x86/Makefile          |   2 +-
 tools/testing/selftests/x86/lam.c             | 913 ++++++++++++++++++
 virt/kvm/kvm_main.c                           |   2 +-
 53 files changed, 1315 insertions(+), 127 deletions(-)
 create mode 100644 arch/x86/kernel/proc.c
 create mode 100644 tools/testing/selftests/x86/lam.c

-- 
2.35.1



* [PATCHv5 01/13] x86/mm: Fix CR3_ADDR_MASK
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-21 13:10   ` Alexander Potapenko
  2022-07-29  3:00   ` Hu, Robert
  2022-07-12 23:13 ` [PATCHv5 02/13] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Kirill A. Shutemov

The mask must not include bits above the physical address mask. These bits
are reserved and can be used for other things. Bits 61 and 62 are used
for Linear Address Masking.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/processor-flags.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 02c2cbda4a74..a7f3d9100adb 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -35,7 +35,7 @@
  */
 #ifdef CONFIG_X86_64
 /* Mask off the address space ID and SME encryption bits. */
-#define CR3_ADDR_MASK	__sme_clr(0x7FFFFFFFFFFFF000ull)
+#define CR3_ADDR_MASK	__sme_clr(PHYSICAL_PAGE_MASK)
 #define CR3_PCID_MASK	0xFFFull
 #define CR3_NOFLUSH	BIT_ULL(63)
 
-- 
2.35.1



* [PATCHv5 02/13] x86: CPUID and CR3/CR4 flags for Linear Address Masking
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
  2022-07-12 23:13 ` [PATCHv5 01/13] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-21 13:10   ` Alexander Potapenko
  2022-07-12 23:13 ` [PATCHv5 03/13] mm: Pass down mm_struct to untagged_addr() Kirill A. Shutemov
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Kirill A. Shutemov

Enumerate Linear Address Masking and provide defines for CR3 and CR4
flags.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/uapi/asm/processor-flags.h | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 03acc823838a..6ad5841e087f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -300,6 +300,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LAM			(12*32+26) /* Linear Address Masking */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index c47cc7f2feeb..d898432947ff 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -82,6 +82,10 @@
 #define X86_CR3_PCID_BITS	12
 #define X86_CR3_PCID_MASK	(_AC((1UL << X86_CR3_PCID_BITS) - 1, UL))
 
+#define X86_CR3_LAM_U57_BIT	61 /* Activate LAM for userspace, 62:57 bits masked */
+#define X86_CR3_LAM_U57		_BITULL(X86_CR3_LAM_U57_BIT)
+#define X86_CR3_LAM_U48_BIT	62 /* Activate LAM for userspace, 62:48 bits masked */
+#define X86_CR3_LAM_U48		_BITULL(X86_CR3_LAM_U48_BIT)
 #define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
 #define X86_CR3_PCID_NOFLUSH    _BITULL(X86_CR3_PCID_NOFLUSH_BIT)
 
@@ -132,6 +136,8 @@
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
 #define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
 #define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
+#define X86_CR4_LAM_SUP_BIT	28 /* LAM for supervisor pointers */
+#define X86_CR4_LAM_SUP		_BITUL(X86_CR4_LAM_SUP_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
-- 
2.35.1



* [PATCHv5 03/13] mm: Pass down mm_struct to untagged_addr()
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
  2022-07-12 23:13 ` [PATCHv5 01/13] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
  2022-07-12 23:13 ` [PATCHv5 02/13] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-21 13:12   ` Alexander Potapenko
  2022-07-12 23:13 ` [PATCHv5 04/13] x86/mm: Handle LAM on context switch Kirill A. Shutemov
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Kirill A. Shutemov

Intel Linear Address Masking (LAM) brings per-mm untagging rules. Pass
down mm_struct to the untagging helper so that it can apply the right
untagging policy.

In most cases, current->mm is the one to use, but there are some
exceptions, such as get_user_pages_remote().

Move dummy implementation of untagged_addr() from <linux/mm.h> to
<linux/uaccess.h>. <asm/uaccess.h> can override the implementation.
Moving the dummy implementation out of <linux/mm.h> helps to avoid
header hell if you need to dereference mm_struct within the helper.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/arm64/include/asm/memory.h                  |  4 ++--
 arch/arm64/include/asm/signal.h                  |  2 +-
 arch/arm64/include/asm/uaccess.h                 |  4 ++--
 arch/arm64/kernel/hw_breakpoint.c                |  2 +-
 arch/arm64/kernel/traps.c                        |  4 ++--
 arch/arm64/mm/fault.c                            | 10 +++++-----
 arch/sparc/include/asm/pgtable_64.h              |  2 +-
 arch/sparc/include/asm/uaccess_64.h              |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c          |  2 +-
 drivers/gpu/drm/radeon/radeon_gem.c              |  2 +-
 drivers/infiniband/hw/mlx4/mr.c                  |  2 +-
 drivers/media/common/videobuf2/frame_vector.c    |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c    |  2 +-
 drivers/staging/media/atomisp/pci/hmm/hmm_bo.c   |  2 +-
 drivers/tee/tee_shm.c                            |  2 +-
 drivers/vfio/vfio_iommu_type1.c                  |  2 +-
 fs/proc/task_mmu.c                               |  2 +-
 include/linux/mm.h                               | 11 -----------
 include/linux/uaccess.h                          | 15 +++++++++++++++
 lib/strncpy_from_user.c                          |  2 +-
 lib/strnlen_user.c                               |  2 +-
 mm/gup.c                                         |  6 +++---
 mm/madvise.c                                     |  2 +-
 mm/mempolicy.c                                   |  6 +++---
 mm/migrate.c                                     |  2 +-
 mm/mincore.c                                     |  2 +-
 mm/mlock.c                                       |  4 ++--
 mm/mmap.c                                        |  2 +-
 mm/mprotect.c                                    |  2 +-
 mm/mremap.c                                      |  2 +-
 mm/msync.c                                       |  2 +-
 virt/kvm/kvm_main.c                              |  2 +-
 33 files changed, 59 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0af70d9abede..88bee513b74c 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -215,8 +215,8 @@ static inline unsigned long kaslr_offset(void)
 #define __untagged_addr(addr)	\
 	((__force __typeof__(addr))sign_extend64((__force u64)(addr), 55))
 
-#define untagged_addr(addr)	({					\
-	u64 __addr = (__force u64)(addr);					\
+#define untagged_addr(mm, addr)	({					\
+	u64 __addr = (__force u64)(addr);				\
 	__addr &= __untagged_addr(__addr);				\
 	(__force __typeof__(addr))__addr;				\
 })
diff --git a/arch/arm64/include/asm/signal.h b/arch/arm64/include/asm/signal.h
index ef449f5f4ba8..0899c355c398 100644
--- a/arch/arm64/include/asm/signal.h
+++ b/arch/arm64/include/asm/signal.h
@@ -18,7 +18,7 @@ static inline void __user *arch_untagged_si_addr(void __user *addr,
 	if (sig == SIGTRAP && si_code == TRAP_BRKPT)
 		return addr;
 
-	return untagged_addr(addr);
+	return untagged_addr(current->mm, addr);
 }
 #define arch_untagged_si_addr arch_untagged_si_addr
 
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 63f9c828f1a7..bdcc014bd297 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -44,7 +44,7 @@ static inline int access_ok(const void __user *addr, unsigned long size)
 	 */
 	if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
 	    (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
-		addr = untagged_addr(addr);
+		addr = untagged_addr(current->mm, addr);
 
 	return likely(__access_ok(addr, size));
 }
@@ -217,7 +217,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	"	csel	%0, %1, xzr, eq\n"
 	: "=&r" (safe_ptr)
 	: "r" (ptr), "r" (TASK_SIZE_MAX - 1),
-	  "r" (untagged_addr(ptr))
+	  "r" (untagged_addr(current->mm, ptr))
 	: "cc");
 
 	csdb();
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index b29a311bb055..d637cee7b771 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -715,7 +715,7 @@ static u64 get_distance_from_watchpoint(unsigned long addr, u64 val,
 	u64 wp_low, wp_high;
 	u32 lens, lene;
 
-	addr = untagged_addr(addr);
+	addr = untagged_addr(current->mm, addr);
 
 	lens = __ffs(ctrl->len);
 	lene = __fls(ctrl->len);
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 9ac7a81b79be..385612d9890b 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -476,7 +476,7 @@ void arm64_notify_segfault(unsigned long addr)
 	int code;
 
 	mmap_read_lock(current->mm);
-	if (find_vma(current->mm, untagged_addr(addr)) == NULL)
+	if (find_vma(current->mm, untagged_addr(current->mm, addr)) == NULL)
 		code = SEGV_MAPERR;
 	else
 		code = SEGV_ACCERR;
@@ -540,7 +540,7 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
 	int ret = 0;
 
 	tagged_address = pt_regs_read_reg(regs, rt);
-	address = untagged_addr(tagged_address);
+	address = untagged_addr(current->mm, tagged_address);
 
 	switch (crm) {
 	case ESR_ELx_SYS64_ISS_CRM_DC_CVAU:	/* DC CVAU, gets promoted */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index c5e11768e5c1..9577d7e37f36 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -454,7 +454,7 @@ static void set_thread_esr(unsigned long address, unsigned long esr)
 static void do_bad_area(unsigned long far, unsigned long esr,
 			struct pt_regs *regs)
 {
-	unsigned long addr = untagged_addr(far);
+	unsigned long addr = untagged_addr(current->mm, far);
 
 	/*
 	 * If we are in kernel mode at this point, we have no context to
@@ -524,7 +524,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	vm_fault_t fault;
 	unsigned long vm_flags;
 	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
-	unsigned long addr = untagged_addr(far);
+	unsigned long addr = untagged_addr(mm, far);
 
 	if (kprobe_page_fault(regs, esr))
 		return 0;
@@ -675,7 +675,7 @@ static int __kprobes do_translation_fault(unsigned long far,
 					  unsigned long esr,
 					  struct pt_regs *regs)
 {
-	unsigned long addr = untagged_addr(far);
+	unsigned long addr = untagged_addr(current->mm, far);
 
 	if (is_ttbr0_addr(addr))
 		return do_page_fault(far, esr, regs);
@@ -719,7 +719,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
 		 * UNKNOWN for synchronous external aborts. Mask them out now
 		 * so that userspace doesn't see them.
 		 */
-		siaddr  = untagged_addr(far);
+		siaddr  = untagged_addr(current->mm, far);
 	}
 	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
 
@@ -809,7 +809,7 @@ static const struct fault_info fault_info[] = {
 void do_mem_abort(unsigned long far, unsigned long esr, struct pt_regs *regs)
 {
 	const struct fault_info *inf = esr_to_fault_info(esr);
-	unsigned long addr = untagged_addr(far);
+	unsigned long addr = untagged_addr(current->mm, far);
 
 	if (!inf->fn(far, esr, regs))
 		return;
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 4679e45c8348..1336d7bfaab9 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -1071,7 +1071,7 @@ static inline unsigned long __untagged_addr(unsigned long start)
 
 	return start;
 }
-#define untagged_addr(addr) \
+#define untagged_addr(mm, addr) \
 	((__typeof__(addr))(__untagged_addr((unsigned long)(addr))))
 
 static inline bool pte_access_permitted(pte_t pte, bool write)
diff --git a/arch/sparc/include/asm/uaccess_64.h b/arch/sparc/include/asm/uaccess_64.h
index 94266a5c5b04..b825a5dd0210 100644
--- a/arch/sparc/include/asm/uaccess_64.h
+++ b/arch/sparc/include/asm/uaccess_64.h
@@ -8,8 +8,10 @@
 
 #include <linux/compiler.h>
 #include <linux/string.h>
+#include <linux/mm_types.h>
 #include <asm/asi.h>
 #include <asm/spitfire.h>
+#include <asm/pgtable.h>
 
 #include <asm/processor.h>
 #include <asm-generic/access_ok.h>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 6b6d46e29e6e..b37199b16643 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1491,7 +1491,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
 			if (!offset || !*offset)
 				return -EINVAL;
-			user_addr = untagged_addr(*offset);
+			user_addr = untagged_addr(current->mm, *offset);
 		} else if (flags & (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
 				    KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
 			bo_type = ttm_bo_type_sg;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 8ef31d687ef3..691dfb3f2c0e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -382,7 +382,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,
 	uint32_t handle;
 	int r;
 
-	args->addr = untagged_addr(args->addr);
+	args->addr = untagged_addr(current->mm, args->addr);
 
 	if (offset_in_page(args->addr | args->size))
 		return -EINVAL;
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index 8c01a7f0e027..2c3980677f64 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -371,7 +371,7 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
 	uint32_t handle;
 	int r;
 
-	args->addr = untagged_addr(args->addr);
+	args->addr = untagged_addr(current->mm, args->addr);
 
 	if (offset_in_page(args->addr | args->size))
 		return -EINVAL;
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 04a67b481608..b2860feeae3c 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -379,7 +379,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_device *device, u64 start,
 	 * again
 	 */
 	if (!ib_access_writable(access_flags)) {
-		unsigned long untagged_start = untagged_addr(start);
+		unsigned long untagged_start = untagged_addr(current->mm, start);
 		struct vm_area_struct *vma;
 
 		mmap_read_lock(current->mm);
diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index 542dde9d2609..7e62f7a2555d 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -47,7 +47,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
 		nr_frames = vec->nr_allocated;
 
-	start = untagged_addr(start);
+	start = untagged_addr(mm, start);
 
 	ret = pin_user_pages_fast(start, nr_frames,
 				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..a1444f8afa05 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -157,8 +157,8 @@ static void videobuf_dma_contig_user_put(struct videobuf_dma_contig_memory *mem)
 static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 					struct videobuf_buffer *vb)
 {
-	unsigned long untagged_baddr = untagged_addr(vb->baddr);
 	struct mm_struct *mm = current->mm;
+	unsigned long untagged_baddr = untagged_addr(mm, vb->baddr);
 	struct vm_area_struct *vma;
 	unsigned long prev_pfn, this_pfn;
 	unsigned long pages_done, user_address;
diff --git a/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c b/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
index 0168f9839c90..863d30a7ad23 100644
--- a/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
+++ b/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
@@ -913,7 +913,7 @@ static int alloc_user_pages(struct hmm_buffer_object *bo,
 	 * and map to user space
 	 */
 
-	userptr = untagged_addr(userptr);
+	userptr = untagged_addr(current->mm, userptr);
 
 	bo->pages = pages;
 
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index f2b1bcefcadd..386be09cb2cd 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -261,7 +261,7 @@ register_shm_helper(struct tee_context *ctx, unsigned long addr,
 	shm->flags = flags;
 	shm->ctx = ctx;
 	shm->id = id;
-	addr = untagged_addr(addr);
+	addr = untagged_addr(current->mm, addr);
 	start = rounddown(addr, PAGE_SIZE);
 	shm->offset = addr - start;
 	shm->size = length;
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e357..5ac6c61d7caa 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -561,7 +561,7 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 		goto done;
 	}
 
-	vaddr = untagged_addr(vaddr);
+	vaddr = untagged_addr(mm, vaddr);
 
 retry:
 	vma = vma_lookup(mm, vaddr);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2d04e3470d4c..c7d262bd6d6b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1659,7 +1659,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	/* watch out for wraparound */
 	start_vaddr = end_vaddr;
 	if (svpfn <= (ULONG_MAX >> PAGE_SHIFT))
-		start_vaddr = untagged_addr(svpfn << PAGE_SHIFT);
+		start_vaddr = untagged_addr(mm, svpfn << PAGE_SHIFT);
 
 	/* Ensure the address is inside the task */
 	if (start_vaddr > mm->task_size)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..f0cb92ff1391 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -94,17 +94,6 @@ extern int mmap_rnd_compat_bits __read_mostly;
 #include <asm/page.h>
 #include <asm/processor.h>
 
-/*
- * Architectures that support memory tagging (assigning tags to memory regions,
- * embedding these tags into addresses that point to these memory regions, and
- * checking that the memory and the pointer tags match on memory accesses)
- * redefine this macro to strip tags from pointers.
- * It's defined as noop for architectures that don't support memory tagging.
- */
-#ifndef untagged_addr
-#define untagged_addr(addr) (addr)
-#endif
-
 #ifndef __pa_symbol
 #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
 #endif
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5a328cf02b75..46fd816179d7 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -10,6 +10,21 @@
 
 #include <asm/uaccess.h>
 
+/*
+ * Architectures that support memory tagging (assigning tags to memory regions,
+ * embedding these tags into addresses that point to these memory regions, and
+ * checking that the memory and the pointer tags match on memory accesses)
+ * redefine this macro to strip tags from pointers.
+ *
+ * Passing down mm_struct allows to define untagging rules on per-process
+ * basis.
+ *
+ * It's defined as noop for architectures that don't support memory tagging.
+ */
+#ifndef untagged_addr
+#define untagged_addr(mm, addr) (addr)
+#endif
+
 /*
  * Architectures should provide two primitives (raw_copy_{to,from}_user())
  * and get rid of their private instances of copy_{to,from}_user() and
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index 6432b8c3e431..6e1e2aa0c994 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -121,7 +121,7 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
 		return 0;
 
 	max_addr = TASK_SIZE_MAX;
-	src_addr = (unsigned long)untagged_addr(src);
+	src_addr = (unsigned long)untagged_addr(current->mm, src);
 	if (likely(src_addr < max_addr)) {
 		unsigned long max = max_addr - src_addr;
 		long retval;
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index feeb935a2299..abc096a68f05 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -97,7 +97,7 @@ long strnlen_user(const char __user *str, long count)
 		return 0;
 
 	max_addr = TASK_SIZE_MAX;
-	src_addr = (unsigned long)untagged_addr(str);
+	src_addr = (unsigned long)untagged_addr(current->mm, str);
 	if (likely(src_addr < max_addr)) {
 		unsigned long max = max_addr - src_addr;
 		long retval;
diff --git a/mm/gup.c b/mm/gup.c
index 551264407624..dbe825faf842 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1104,7 +1104,7 @@ static long __get_user_pages(struct mm_struct *mm,
 	if (!nr_pages)
 		return 0;
 
-	start = untagged_addr(start);
+	start = untagged_addr(mm, start);
 
 	VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN)));
 
@@ -1285,7 +1285,7 @@ int fixup_user_fault(struct mm_struct *mm,
 	struct vm_area_struct *vma;
 	vm_fault_t ret;
 
-	address = untagged_addr(address);
+	address = untagged_addr(mm, address);
 
 	if (unlocked)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
@@ -2865,7 +2865,7 @@ static int internal_get_user_pages_fast(unsigned long start,
 	if (!(gup_flags & FOLL_FAST_ONLY))
 		might_lock_read(&current->mm->mmap_lock);
 
-	start = untagged_addr(start) & PAGE_MASK;
+	start = untagged_addr(current->mm, start) & PAGE_MASK;
 	len = nr_pages << PAGE_SHIFT;
 	if (check_add_overflow(start, len, &end))
 		return 0;
diff --git a/mm/madvise.c b/mm/madvise.c
index d7b4f2602949..e3c668ddb099 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1373,7 +1373,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	size_t len;
 	struct blk_plug plug;
 
-	start = untagged_addr(start);
+	start = untagged_addr(mm, start);
 
 	if (!madvise_behavior_valid(behavior))
 		return -EINVAL;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d39b01fd52fe..a03b4d2bc26a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1458,7 +1458,7 @@ static long kernel_mbind(unsigned long start, unsigned long len,
 	int lmode = mode;
 	int err;
 
-	start = untagged_addr(start);
+	start = untagged_addr(current->mm, start);
 	err = sanitize_mpol_flags(&lmode, &mode_flags);
 	if (err)
 		return err;
@@ -1481,7 +1481,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
 	unsigned long end;
 	int err = -ENOENT;
 
-	start = untagged_addr(start);
+	start = untagged_addr(mm, start);
 	if (start & ~PAGE_MASK)
 		return -EINVAL;
 	/*
@@ -1684,7 +1684,7 @@ static int kernel_get_mempolicy(int __user *policy,
 	if (nmask != NULL && maxnode < nr_node_ids)
 		return -EINVAL;
 
-	addr = untagged_addr(addr);
+	addr = untagged_addr(current->mm, addr);
 
 	err = do_get_mempolicy(&pval, &nodes, addr, flags);
 
diff --git a/mm/migrate.c b/mm/migrate.c
index e51588e95f57..af05049b055b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1714,7 +1714,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 			goto out_flush;
 		if (get_user(node, nodes + i))
 			goto out_flush;
-		addr = (unsigned long)untagged_addr(p);
+		addr = (unsigned long)untagged_addr(mm, p);
 
 		err = -ENODEV;
 		if (node < 0 || node >= MAX_NUMNODES)
diff --git a/mm/mincore.c b/mm/mincore.c
index fa200c14185f..72c55bd9d184 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -236,7 +236,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
 	unsigned long pages;
 	unsigned char *tmp;
 
-	start = untagged_addr(start);
+	start = untagged_addr(current->mm, start);
 
 	/* Check the start address: needs to be page-aligned.. */
 	if (start & ~PAGE_MASK)
diff --git a/mm/mlock.c b/mm/mlock.c
index 716caf851043..054168d3e648 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -571,7 +571,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
 	unsigned long lock_limit;
 	int error = -ENOMEM;
 
-	start = untagged_addr(start);
+	start = untagged_addr(current->mm, start);
 
 	if (!can_do_mlock())
 		return -EPERM;
@@ -634,7 +634,7 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 {
 	int ret;
 
-	start = untagged_addr(start);
+	start = untagged_addr(current->mm, start);
 
 	len = PAGE_ALIGN(len + (offset_in_page(start)));
 	start &= PAGE_MASK;
diff --git a/mm/mmap.c b/mm/mmap.c
index 61e6135c54ef..1a7baf6b6b8e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2926,7 +2926,7 @@ EXPORT_SYMBOL(vm_munmap);
 
 SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
 {
-	addr = untagged_addr(addr);
+	addr = untagged_addr(current->mm, addr);
 	return __vm_munmap(addr, len, true);
 }
 
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ba5592655ee3..871e954f6155 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -622,7 +622,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 				(prot & PROT_READ);
 	struct mmu_gather tlb;
 
-	start = untagged_addr(start);
+	start = untagged_addr(current->mm, start);
 
 	prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP);
 	if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */
diff --git a/mm/mremap.c b/mm/mremap.c
index b522cd0259a0..f76648bc4f67 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -906,7 +906,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	 *
 	 * See Documentation/arm64/tagged-address-abi.rst for more information.
 	 */
-	addr = untagged_addr(addr);
+	addr = untagged_addr(mm, addr);
 
 	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
 		return ret;
diff --git a/mm/msync.c b/mm/msync.c
index 137d1c104f3e..5fe989bd3c4b 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -37,7 +37,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
 	int unmapped_error = 0;
 	int error = -EINVAL;
 
-	start = untagged_addr(start);
+	start = untagged_addr(mm, start);
 
 	if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
 		goto out;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a49df8988cd6..03f7ad0ebc8a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1876,7 +1876,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		return -EINVAL;
 	/* We can read the guest memory with __xxx_user() later on. */
 	if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
-	    (mem->userspace_addr != untagged_addr(mem->userspace_addr)) ||
+	    (mem->userspace_addr != untagged_addr(kvm->mm, mem->userspace_addr)) ||
 	     !access_ok((void __user *)(unsigned long)mem->userspace_addr,
 			mem->memory_size))
 		return -EINVAL;
-- 
2.35.1



* [PATCHv5 04/13] x86/mm: Handle LAM on context switch
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 03/13] mm: Pass down mm_struct to untagged_addr() Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-12 23:13 ` [PATCHv5 05/13] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Kirill A. Shutemov

Linear Address Masking mode for userspace pointers is encoded in CR3 bits.
The mode is selected per-process and stored in the mm context
(mm->context.lam_cr3_mask).

switch_mm_irqs_off() now respects these flags and constructs CR3
accordingly.

The active LAM mode gets recorded in the tlb_state.
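
As a rough userspace illustration of the bookkeeping described above,
the sketch below packs the CR3 LAM bits into a single byte (as the u8
field in tlb_state does) and ORs them back into a CR3 value. It assumes
LAM_U57 is CR3 bit 61 and LAM_U48 is CR3 bit 62; the helper names
(lam_pack, lam_unpack, build_cr3_sketch) are hypothetical, and the raw
pcid argument stands in for whatever kern_pcid() computes in the kernel.

```c
#include <stdint.h>

/* Assumed bit positions: LAM_U57 is CR3 bit 61, LAM_U48 is CR3 bit 62. */
#define X86_CR3_LAM_U57_BIT	61
#define X86_CR3_LAM_U57		(1ULL << 61)
#define X86_CR3_LAM_U48		(1ULL << 62)

/* Shift the CR3 LAM bits down so that they fit into one byte. */
static inline uint8_t lam_pack(uint64_t lam_cr3_mask)
{
	return (uint8_t)(lam_cr3_mask >> X86_CR3_LAM_U57_BIT);
}

/* Inverse of lam_pack(): rebuild the CR3 mask from the stored byte. */
static inline uint64_t lam_unpack(uint8_t lam)
{
	return (uint64_t)lam << X86_CR3_LAM_U57_BIT;
}

/* Hypothetical stand-in for build_cr3(): page table address, PCID, LAM. */
static inline uint64_t build_cr3_sketch(uint64_t pgd_pa, uint16_t pcid,
					uint64_t lam)
{
	return pgd_pa | pcid | lam;
}
```

Storing the shifted value keeps the per-CPU state small, and the
round trip through lam_pack()/lam_unpack() is lossless for the two
LAM bits.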

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mmu.h         |  3 +++
 arch/x86/include/asm/mmu_context.h | 24 +++++++++++++++++
 arch/x86/include/asm/tlbflush.h    | 36 +++++++++++++++++++++++++
 arch/x86/mm/tlb.c                  | 42 +++++++++++++++++++-----------
 4 files changed, 90 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 5d7494631ea9..002889ca8978 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -40,6 +40,9 @@ typedef struct {
 
 #ifdef CONFIG_X86_64
 	unsigned short flags;
+
+	/* Active LAM mode:  X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
+	unsigned long lam_cr3_mask;
 #endif
 
 	struct mutex lock;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b8d40ddeab00..69c943b2ae90 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -91,6 +91,29 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
 }
 #endif
 
+#ifdef CONFIG_X86_64
+static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
+{
+	return mm->context.lam_cr3_mask;
+}
+
+static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
+}
+
+#else
+
+static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
+{
+	return 0;
+}
+
+static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+}
+#endif
+
 #define enter_lazy_tlb enter_lazy_tlb
 extern void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
 
@@ -168,6 +191,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 	arch_dup_pkeys(oldmm, mm);
 	paravirt_arch_dup_mmap(oldmm, mm);
+	dup_lam(oldmm, mm);
 	return ldt_dup_context(oldmm, mm);
 }
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 4af5579c7ef7..66db94d4daf4 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -100,6 +100,16 @@ struct tlb_state {
 	 */
 	bool invalidate_other;
 
+#ifdef CONFIG_X86_64
+	/*
+	 * Active LAM mode.
+	 *
+	 * X86_CR3_LAM_U57/U48 shifted right by X86_CR3_LAM_U57_BIT or 0 if LAM
+	 * disabled.
+	 */
+	u8 lam;
+#endif
+
 	/*
 	 * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate
 	 * the corresponding user PCID needs a flush next time we
@@ -363,4 +373,30 @@ static inline void __native_tlb_flush_global(unsigned long cr4)
 	native_write_cr4(cr4 ^ X86_CR4_PGE);
 	native_write_cr4(cr4);
 }
+
+#ifdef CONFIG_X86_64
+static inline unsigned long tlbstate_lam_cr3_mask(void)
+{
+	unsigned long lam = this_cpu_read(cpu_tlbstate.lam);
+
+	return lam << X86_CR3_LAM_U57_BIT;
+}
+
+static inline void set_tlbstate_cr3_lam_mask(unsigned long mask)
+{
+	this_cpu_write(cpu_tlbstate.lam, mask >> X86_CR3_LAM_U57_BIT);
+}
+
+#else
+
+static inline unsigned long tlbstate_lam_cr3_mask(void)
+{
+	return 0;
+}
+
+static inline void set_tlbstate_cr3_lam_mask(u64 mask)
+{
+}
+#endif
+
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d400b6d9d246..4c93f87a8928 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -154,17 +154,18 @@ static inline u16 user_pcid(u16 asid)
 	return ret;
 }
 
-static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
+static inline unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam)
 {
 	if (static_cpu_has(X86_FEATURE_PCID)) {
-		return __sme_pa(pgd) | kern_pcid(asid);
+		return __sme_pa(pgd) | kern_pcid(asid) | lam;
 	} else {
 		VM_WARN_ON_ONCE(asid != 0);
-		return __sme_pa(pgd);
+		return __sme_pa(pgd) | lam;
 	}
 }
 
-static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
+static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid,
+					      unsigned long lam)
 {
 	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
 	/*
@@ -173,7 +174,7 @@ static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
 	 * boot because all CPU's the have same capabilities:
 	 */
 	VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
-	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
+	return __sme_pa(pgd) | kern_pcid(asid) | lam | CR3_NOFLUSH;
 }
 
 /*
@@ -274,15 +275,16 @@ static inline void invalidate_user_asid(u16 asid)
 		  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
 }
 
-static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
+static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, unsigned long lam,
+			    bool need_flush)
 {
 	unsigned long new_mm_cr3;
 
 	if (need_flush) {
 		invalidate_user_asid(new_asid);
-		new_mm_cr3 = build_cr3(pgdir, new_asid);
+		new_mm_cr3 = build_cr3(pgdir, new_asid, lam);
 	} else {
-		new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);
+		new_mm_cr3 = build_cr3_noflush(pgdir, new_asid, lam);
 	}
 
 	/*
@@ -491,6 +493,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 {
 	struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	unsigned long prev_lam = tlbstate_lam_cr3_mask();
+	unsigned long new_lam = mm_lam_cr3_mask(next);
 	bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy);
 	unsigned cpu = smp_processor_id();
 	u64 next_tlb_gen;
@@ -520,7 +524,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * isn't free.
 	 */
 #ifdef CONFIG_DEBUG_VM
-	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
+	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid, prev_lam))) {
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
 		 * the system so hard that we don't see the call trace.
@@ -622,15 +626,16 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		barrier();
 	}
 
+	set_tlbstate_cr3_lam_mask(new_lam);
 	if (need_flush) {
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
-		load_new_mm_cr3(next->pgd, new_asid, true);
+		load_new_mm_cr3(next->pgd, new_asid, new_lam, true);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 	} else {
 		/* The new ASID is already up to date. */
-		load_new_mm_cr3(next->pgd, new_asid, false);
+		load_new_mm_cr3(next->pgd, new_asid, new_lam, false);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
 	}
@@ -691,6 +696,10 @@ void initialize_tlbstate_and_flush(void)
 	/* Assert that CR3 already references the right mm. */
 	WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
 
+	/* LAM expected to be disabled in CR3 and init_mm */
+	WARN_ON(cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57));
+	WARN_ON(mm_lam_cr3_mask(&init_mm));
+
 	/*
 	 * Assert that CR4.PCIDE is set if needed.  (CR4.PCIDE initialization
 	 * doesn't work like other CR4 bits because it can only be set from
@@ -699,8 +708,8 @@ void initialize_tlbstate_and_flush(void)
 	WARN_ON(boot_cpu_has(X86_FEATURE_PCID) &&
 		!(cr4_read_shadow() & X86_CR4_PCIDE));
 
-	/* Force ASID 0 and force a TLB flush. */
-	write_cr3(build_cr3(mm->pgd, 0));
+	/* Disable LAM, force ASID 0 and force a TLB flush. */
+	write_cr3(build_cr3(mm->pgd, 0, 0));
 
 	/* Reinitialize tlbstate. */
 	this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_INIT);
@@ -708,6 +717,7 @@ void initialize_tlbstate_and_flush(void)
 	this_cpu_write(cpu_tlbstate.next_asid, 1);
 	this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
 	this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen);
+	set_tlbstate_cr3_lam_mask(0);
 
 	for (i = 1; i < TLB_NR_DYN_ASIDS; i++)
 		this_cpu_write(cpu_tlbstate.ctxs[i].ctx_id, 0);
@@ -1047,8 +1057,10 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
  */
 unsigned long __get_current_cr3_fast(void)
 {
-	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
-		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+	unsigned long cr3 =
+		build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
+		this_cpu_read(cpu_tlbstate.loaded_mm_asid),
+		tlbstate_lam_cr3_mask());
 
 	/* For now, be very restrictive about when this can be called. */
 	VM_WARN_ON(in_nmi() || preemptible());
-- 
2.35.1



* [PATCHv5 05/13] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 04/13] x86/mm: Handle LAM on context switch Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-13 15:02   ` [PATCHv5.1 04/13] x86/mm: Handle LAM on context switch Kirill A. Shutemov
  2022-07-21 13:14   ` [PATCHv5 05/13] x86/uaccess: Provide untagged_addr() and remove tags before address check Alexander Potapenko
  2022-07-12 23:13 ` [PATCHv5 06/13] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Kirill A. Shutemov

untagged_addr() is a helper used by the core-mm to strip tag bits and
bring the address to the canonical shape. It only handles userspace
addresses. The untagging mask is stored in mmu_context and will be set
when LAM is enabled for the process.

The tags must not be included in the check of whether it's okay to
access the userspace address.

Strip tags in access_ok().

get_user() and put_user() don't use access_ok(), but check access
against TASK_SIZE directly in assembly. Strip tags before calling into
the assembly helper.
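
The branchless untagging can be tried out in userspace. The following
is a minimal sketch, not the kernel macro itself: it assumes the
LAM_U57 untag mask (bits 62:57 cleared), the function name untag() is
hypothetical, and it relies on the compiler performing an arithmetic
right shift on signed values, as GCC and Clang do.

```c
#include <stdint.h>

/* Assumed LAM_U57 untag mask: everything except bits 62:57. */
#define UNTAG_MASK_LAM_U57	(~(0x3fULL << 57))

/* Strip tag bits from a user address, leave kernel addresses intact. */
static inline uint64_t untag(uint64_t untag_mask, uint64_t addr)
{
	/*
	 * Arithmetic shift replicates bit 63: all ones for a kernel
	 * address, zero for a userspace one.
	 */
	uint64_t sign = (uint64_t)((int64_t)addr >> 63);

	/* For kernel addresses the mask becomes all ones: a no-op. */
	return addr & (untag_mask | sign);
}
```

With this mask, a user pointer such as 0x7e007fffdeadb000 comes back as
0x00007fffdeadb000, while a kernel address with bit 63 set passes
through unchanged.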

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mmu.h         |  3 +++
 arch/x86/include/asm/mmu_context.h | 11 ++++++++
 arch/x86/include/asm/uaccess.h     | 42 +++++++++++++++++++++++++++---
 arch/x86/kernel/process.c          |  3 +++
 4 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 002889ca8978..2fdb390040b5 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -43,6 +43,9 @@ typedef struct {
 
 	/* Active LAM mode:  X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
 	unsigned long lam_cr3_mask;
+
+	/* Significant bits of the virtual address. Excludes tag bits. */
+	u64 untag_mask;
 #endif
 
 	struct mutex lock;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 69c943b2ae90..5bd3d46685dc 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -100,6 +100,12 @@ static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
 static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
+	mm->context.untag_mask = oldmm->context.untag_mask;
+}
+
+static inline void mm_reset_untag_mask(struct mm_struct *mm)
+{
+	mm->context.untag_mask = -1UL;
 }
 
 #else
@@ -112,6 +118,10 @@ static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
 static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 }
+
+static inline void mm_reset_untag_mask(struct mm_struct *mm)
+{
+}
 #endif
 
 #define enter_lazy_tlb enter_lazy_tlb
@@ -138,6 +148,7 @@ static inline int init_new_context(struct task_struct *tsk,
 		mm->context.execute_only_pkey = -1;
 	}
 #endif
+	mm_reset_untag_mask(mm);
 	init_new_context_ldt(mm);
 	return 0;
 }
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 913e593a3b45..803241dfc473 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -6,6 +6,7 @@
  */
 #include <linux/compiler.h>
 #include <linux/kasan-checks.h>
+#include <linux/mm_types.h>
 #include <linux/string.h>
 #include <asm/asm.h>
 #include <asm/page.h>
@@ -20,6 +21,30 @@ static inline bool pagefault_disabled(void);
 # define WARN_ON_IN_IRQ()
 #endif
 
+#ifdef CONFIG_X86_64
+/*
+ * Mask out tag bits from the address.
+ *
+ * Magic with the 'sign' allows untagging a userspace pointer without any
+ * branches while leaving kernel addresses intact.
+ */
+#define untagged_addr(mm, addr)	({					\
+	u64 __addr = (__force u64)(addr);				\
+	s64 sign = (s64)__addr >> 63;					\
+	__addr &= (mm)->context.untag_mask | sign;			\
+	(__force __typeof__(addr))__addr;				\
+})
+
+#define untagged_ptr(mm, ptr)	({					\
+	u64 __ptrval = (__force u64)(ptr);				\
+	__ptrval = untagged_addr(mm, __ptrval);				\
+	(__force __typeof__(*(ptr)) *)__ptrval;				\
+})
+#else
+#define untagged_addr(mm, addr)	(addr)
+#define untagged_ptr(mm, ptr)	(ptr)
+#endif
+
 /**
  * access_ok - Checks if a user space pointer is valid
  * @addr: User space pointer to start of block to check
@@ -40,7 +65,7 @@ static inline bool pagefault_disabled(void);
 #define access_ok(addr, size)					\
 ({									\
 	WARN_ON_IN_IRQ();						\
-	likely(__access_ok(addr, size));				\
+	likely(__access_ok(untagged_addr(current->mm, addr), size));	\
 })
 
 #include <asm-generic/access_ok.h>
@@ -125,7 +150,13 @@ extern int __get_user_bad(void);
  * Return: zero on success, or -EFAULT on error.
  * On error, the variable @x is set to zero.
  */
-#define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
+#define get_user(x,ptr)							\
+({									\
+	__typeof__(*(ptr)) __user *__ptr_clean;				\
+	__ptr_clean = untagged_ptr(current->mm, ptr);			\
+	might_fault();							\
+	do_get_user_call(get_user,x,__ptr_clean);			\
+})
 
 /**
  * __get_user - Get a simple variable from user space, with less checking.
@@ -222,7 +253,12 @@ extern void __put_user_nocheck_8(void);
  *
  * Return: zero on success, or -EFAULT on error.
  */
-#define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); })
+#define put_user(x, ptr) ({						\
+	__typeof__(*(ptr)) __user *__ptr_clean;				\
+	__ptr_clean = untagged_ptr(current->mm, ptr);			\
+	might_fault();							\
+	do_put_user_call(put_user,x,__ptr_clean);			\
+})
 
 /**
  * __put_user - Write a simple value into user space, with less checking.
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9b2772b7e1f3..18b2bfdf7b9b 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -47,6 +47,7 @@
 #include <asm/frame.h>
 #include <asm/unwind.h>
 #include <asm/tdx.h>
+#include <asm/mmu_context.h>
 
 #include "process.h"
 
@@ -367,6 +368,8 @@ void arch_setup_new_exec(void)
 		task_clear_spec_ssb_noexec(current);
 		speculation_ctrl_update(read_thread_flags());
 	}
+
+	mm_reset_untag_mask(current->mm);
 }
 
 #ifdef CONFIG_X86_IOPL_IOPERM
-- 
2.35.1



* [PATCHv5 06/13] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 05/13] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-18 17:47   ` Alexander Potapenko
  2022-07-12 23:13 ` [PATCHv5 07/13] x86: Expose untagging mask in /proc/$PID/arch_status Kirill A. Shutemov
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Kirill A. Shutemov

Add a couple of arch_prctl() handlers:

 - ARCH_ENABLE_TAGGED_ADDR enables LAM. The argument is the required
   number of tag bits. It is rounded up to the nearest LAM mode that can
   provide it. For now only LAM_U57 is supported, with 6 tag bits.

 - ARCH_GET_UNTAG_MASK returns the untag mask. It indicates where the
   tag bits are located in the address.
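
From userspace, the two calls could be wrapped roughly as below. This
is a sketch under assumptions: the wrapper names (lam_enable,
lam_get_untag_mask, tag_bits_from_mask) are hypothetical, the option
values are the ones this patch adds to the uapi header, and on a kernel
or CPU without LAM support the enable call is expected to fail.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Option values as added to arch/x86/include/uapi/asm/prctl.h. */
#define ARCH_GET_UNTAG_MASK	0x4001
#define ARCH_ENABLE_TAGGED_ADDR	0x4002

/* Request at least nr_bits tag bits. 0 on success, -1 with errno set. */
static inline long lam_enable(unsigned long nr_bits)
{
#ifdef SYS_arch_prctl
	return syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, nr_bits);
#else
	errno = ENOSYS;	/* arch_prctl() is not available on this arch */
	return -1;
#endif
}

/* Fetch the untag mask of the calling process into *mask. */
static inline long lam_get_untag_mask(unsigned long *mask)
{
#ifdef SYS_arch_prctl
	return syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, mask);
#else
	errno = ENOSYS;
	return -1;
#endif
}

/* Tag bits are the zero bits of the untag mask; count them. */
static inline int tag_bits_from_mask(uint64_t untag_mask)
{
	int n = 0;

	for (uint64_t z = ~untag_mask; z; z &= z - 1)
		n++;
	return n;
}
```

A caller would typically invoke lam_enable(6) once, early in the
process lifetime, then read the mask back and check that
tag_bits_from_mask() reports the number of bits it asked for or more.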

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/uapi/asm/prctl.h |  3 ++
 arch/x86/kernel/process_64.c      | 60 ++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 500b96e71f18..38164a05c23c 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -20,4 +20,7 @@
 #define ARCH_MAP_VDSO_32		0x2002
 #define ARCH_MAP_VDSO_64		0x2003
 
+#define ARCH_GET_UNTAG_MASK		0x4001
+#define ARCH_ENABLE_TAGGED_ADDR		0x4002
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1962008fe743..82a19168bfa4 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -742,6 +742,60 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 }
 #endif
 
+static void enable_lam_func(void *mm)
+{
+	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	unsigned long lam_mask;
+	unsigned long cr3;
+
+	if (loaded_mm != mm)
+		return;
+
+	lam_mask = READ_ONCE(loaded_mm->context.lam_cr3_mask);
+
+	/* Update CR3 to get LAM active on the CPU */
+	cr3 = __read_cr3();
+	cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
+	cr3 |= lam_mask;
+	write_cr3(cr3);
+	set_tlbstate_cr3_lam_mask(lam_mask);
+}
+
+static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
+{
+	int ret = 0;
+
+	if (!cpu_feature_enabled(X86_FEATURE_LAM))
+		return -ENODEV;
+
+	mutex_lock(&mm->context.lock);
+
+	/* Already enabled? */
+	if (mm->context.lam_cr3_mask) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	if (!nr_bits) {
+		ret = -EINVAL;
+		goto out;
+	} else if (nr_bits <= 6) {
+		mm->context.lam_cr3_mask = X86_CR3_LAM_U57;
+		mm->context.untag_mask =  ~GENMASK(62, 57);
+	} else {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* Make lam_cr3_mask and untag_mask visible on other CPUs */
+	smp_mb();
+
+	on_each_cpu_mask(mm_cpumask(mm), enable_lam_func, mm, true);
+out:
+	mutex_unlock(&mm->context.lock);
+	return ret;
+}
+
 long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 {
 	int ret = 0;
@@ -829,7 +883,11 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 	case ARCH_MAP_VDSO_64:
 		return prctl_map_vdso(&vdso_image_64, arg2);
 #endif
-
+	case ARCH_GET_UNTAG_MASK:
+		return put_user(task->mm->context.untag_mask,
+				(unsigned long __user *)arg2);
+	case ARCH_ENABLE_TAGGED_ADDR:
+		return prctl_enable_tagged_addr(task->mm, arg2);
 	default:
 		ret = -EINVAL;
 		break;
-- 
2.35.1



* [PATCHv5 07/13] x86: Expose untagging mask in /proc/$PID/arch_status
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 06/13] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-21 13:47   ` Alexander Potapenko
  2022-07-12 23:13 ` [PATCHv5 08/13] selftests/x86/lam: Add malloc test cases for linear-address masking Kirill A. Shutemov
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Kirill A. Shutemov

Add a line in /proc/$PID/arch_status to report untag_mask. It can be
used to find out the LAM status of the process from the outside, which
is useful for debuggers.
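
A debugger-side reader might look roughly like the sketch below. It
assumes the line format produced by this patch ("untag_mask:" followed
by a tab and a hexadecimal value); the function names
parse_untag_mask_line() and read_untag_mask() are hypothetical.

```c
#include <stdio.h>

/* Returns 0 and fills *mask if 'line' is an untag_mask line, else -1. */
static inline int parse_untag_mask_line(const char *line,
					unsigned long *mask)
{
	/* The blank in the format matches the tab; %lx accepts "0x". */
	return sscanf(line, "untag_mask: %lx", mask) == 1 ? 0 : -1;
}

/* Scan /proc/<pid>/arch_status for the untag_mask line. */
static inline int read_untag_mask(int pid, unsigned long *mask)
{
	char path[64], line[128];
	int ret = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/arch_status", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		if (!parse_untag_mask_line(line, mask)) {
			ret = 0;
			break;
		}
	}

	fclose(f);
	return ret;
}
```

A mask of all ones means no tag bits are in use, so LAM is not enabled
for that process.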

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mmu_context.h | 10 +++++
 arch/x86/kernel/Makefile           |  2 +
 arch/x86/kernel/fpu/xstate.c       | 47 -----------------------
 arch/x86/kernel/proc.c             | 60 ++++++++++++++++++++++++++++++
 4 files changed, 72 insertions(+), 47 deletions(-)
 create mode 100644 arch/x86/kernel/proc.c

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 5bd3d46685dc..b0e9ea23758b 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -103,6 +103,11 @@ static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 	mm->context.untag_mask = oldmm->context.untag_mask;
 }
 
+static inline unsigned long mm_untag_mask(struct mm_struct *mm)
+{
+	return mm->context.untag_mask;
+}
+
 static inline void mm_reset_untag_mask(struct mm_struct *mm)
 {
 	mm->context.untag_mask = -1UL;
@@ -119,6 +124,11 @@ static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 }
 
+static inline unsigned long mm_untag_mask(struct mm_struct *mm)
+{
+	return -1UL;
+}
+
 static inline void mm_reset_untag_mask(struct mm_struct *mm)
 {
 }
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4c8b6ae802ac..313f1d8e7783 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -141,6 +141,8 @@ obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev.o
 
+obj-$(CONFIG_PROC_FS)			+= proc.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..838a6f0627fd 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -10,8 +10,6 @@
 #include <linux/mman.h>
 #include <linux/nospec.h>
 #include <linux/pkeys.h>
-#include <linux/seq_file.h>
-#include <linux/proc_fs.h>
 #include <linux/vmalloc.h>
 
 #include <asm/fpu/api.h>
@@ -1745,48 +1743,3 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
 		return -EINVAL;
 	}
 }
-
-#ifdef CONFIG_PROC_PID_ARCH_STATUS
-/*
- * Report the amount of time elapsed in millisecond since last AVX512
- * use in the task.
- */
-static void avx512_status(struct seq_file *m, struct task_struct *task)
-{
-	unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
-	long delta;
-
-	if (!timestamp) {
-		/*
-		 * Report -1 if no AVX512 usage
-		 */
-		delta = -1;
-	} else {
-		delta = (long)(jiffies - timestamp);
-		/*
-		 * Cap to LONG_MAX if time difference > LONG_MAX
-		 */
-		if (delta < 0)
-			delta = LONG_MAX;
-		delta = jiffies_to_msecs(delta);
-	}
-
-	seq_put_decimal_ll(m, "AVX512_elapsed_ms:\t", delta);
-	seq_putc(m, '\n');
-}
-
-/*
- * Report architecture specific information
- */
-int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
-			struct pid *pid, struct task_struct *task)
-{
-	/*
-	 * Report AVX512 state if the processor and build option supported.
-	 */
-	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
-		avx512_status(m, task);
-
-	return 0;
-}
-#endif /* CONFIG_PROC_PID_ARCH_STATUS */
diff --git a/arch/x86/kernel/proc.c b/arch/x86/kernel/proc.c
new file mode 100644
index 000000000000..9765b4d05ce4
--- /dev/null
+++ b/arch/x86/kernel/proc.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/proc_fs.h>
+#include <linux/sched/mm.h>
+#include <linux/seq_file.h>
+#include <uapi/asm/prctl.h>
+#include <asm/mmu_context.h>
+
+/*
+ * Report the amount of time elapsed in millisecond since last AVX512
+ * use in the task.
+ */
+static void avx512_status(struct seq_file *m, struct task_struct *task)
+{
+	unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
+	long delta;
+
+	if (!timestamp) {
+		/*
+		 * Report -1 if no AVX512 usage
+		 */
+		delta = -1;
+	} else {
+		delta = (long)(jiffies - timestamp);
+		/*
+		 * Cap to LONG_MAX if time difference > LONG_MAX
+		 */
+		if (delta < 0)
+			delta = LONG_MAX;
+		delta = jiffies_to_msecs(delta);
+	}
+
+	seq_put_decimal_ll(m, "AVX512_elapsed_ms:\t", delta);
+	seq_putc(m, '\n');
+}
+
+/*
+ * Report architecture specific information
+ */
+int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
+			struct pid *pid, struct task_struct *task)
+{
+	struct mm_struct *mm;
+	unsigned long untag_mask = -1UL;
+
+	/*
+	 * Report AVX512 state if the processor and build option supported.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
+		avx512_status(m, task);
+
+	mm = get_task_mm(task);
+	if (mm) {
+		untag_mask = mm_untag_mask(mm);
+		mmput(mm);
+	}
+
+	seq_printf(m, "untag_mask:\t%#lx\n", untag_mask);
+
+	return 0;
+}
-- 
2.35.1



* [PATCHv5 08/13] selftests/x86/lam: Add malloc test cases for linear-address masking
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 07/13] x86: Expose untagging mask in /proc/$PID/arch_status Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-12 23:13 ` [PATCHv5 09/13] selftests/x86/lam: Add mmap and SYSCALL " Kirill A. Shutemov
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Weihong Zhang, Kirill A . Shutemov

From: Weihong Zhang <weihong.zhang@intel.com>

LAM is supported only in 64-bit mode and applies only to addresses used for
data accesses. In 64-bit mode, linear addresses have 64 bits. LAM is applied
to 64-bit linear addresses and allows software to use high bits for metadata.
LAM supports configurations that differ regarding which pointer bits are
masked and can be used for metadata.

LAM includes the following mode:

 - LAM_U57, pointer bits in positions 62:57 are masked (LAM width 6),
   allows bits 62:57 of a user pointer to be used as metadata.

There are two arch_prctls:
ARCH_ENABLE_TAGGED_ADDR: enable LAM mode, mask high bits of a user pointer.
ARCH_GET_UNTAG_MASK: get the current untag mask.

The LAM mode is per-process, and a process has only one chance to set it:
there is no API to disable LAM once it is enabled. So all test cases are run
in a child process.

Functions of this test:

 - LAM_U57 masks bits 62:57 of a user pointer. A userspace process
   can dereference such pointers.

 - With LAM disabled, dereferencing a pointer with metadata above bit 48
   or bit 57 triggers SIGSEGV.
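
For illustration, placing and reading a 6-bit tag in bits 62:57 can be
sketched as follows. The helper names lam_u57_set_tag() and
lam_u57_get_tag() are hypothetical and not part of the selftest; the
result is only safe to dereference after LAM_U57 has been enabled for
the process.

```c
#include <stdint.h>

#define LAM_U57_TAG_SHIFT	57
#define LAM_U57_TAG_MASK	(0x3fULL << LAM_U57_TAG_SHIFT)

/* Replace bits 62:57 of 'addr' with the low 6 bits of 'tag'. */
static inline uint64_t lam_u57_set_tag(uint64_t addr, uint64_t tag)
{
	return (addr & ~LAM_U57_TAG_MASK) |
	       ((tag & 0x3f) << LAM_U57_TAG_SHIFT);
}

/* Extract the 6-bit tag stored in bits 62:57. */
static inline uint64_t lam_u57_get_tag(uint64_t addr)
{
	return (addr & LAM_U57_TAG_MASK) >> LAM_U57_TAG_SHIFT;
}
```

Bit 63 is left alone, so the tagged value still looks like a userspace
address to the sign-based check.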

Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/Makefile |   2 +-
 tools/testing/selftests/x86/lam.c    | 263 +++++++++++++++++++++++++++
 2 files changed, 264 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/lam.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 0388c4d60af0..c1a16a9d4f2f 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -18,7 +18,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
 TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
-			corrupt_xstate_header amx
+			corrupt_xstate_header amx lam
 # Some selftests require 32bit support enabled also on 64bit systems
 TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
 
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
new file mode 100644
index 000000000000..4aaf6ad107c3
--- /dev/null
+++ b/tools/testing/selftests/x86/lam.c
@@ -0,0 +1,263 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/syscall.h>
+#include <time.h>
+#include <signal.h>
+#include <setjmp.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <inttypes.h>
+
+#include "../kselftest.h"
+
+#ifndef __x86_64__
+# error This test is 64-bit only
+#endif
+
+/* LAM modes, these definitions were copied from kernel code */
+#define LAM_NONE                0
+#define LAM_U57_BITS            6
+/* arch prctl for LAM */
+#define ARCH_GET_UNTAG_MASK     0x4001
+#define ARCH_ENABLE_TAGGED_ADDR 0x4002
+
+/* Specified test function bits  */
+#define FUNC_MALLOC             0x1
+
+#define TEST_MASK               0x1
+
+#define MALLOC_LEN              32
+
+struct testcases {
+	unsigned int later;
+	int expected; /* 2: SIGSEGV Error; 1: other errors */
+	unsigned long lam;
+	uint64_t addr;
+	int (*test_func)(struct testcases *test);
+	const char *msg;
+};
+
+int tests_cnt;
+jmp_buf segv_env;
+
+static void segv_handler(int sig)
+{
+	ksft_print_msg("Get segmentation fault(%d).", sig);
+	siglongjmp(segv_env, 1);
+}
+
+static inline int cpu_has_lam(void)
+{
+	unsigned int cpuinfo[4];
+
+	__cpuid_count(0x7, 1, cpuinfo[0], cpuinfo[1], cpuinfo[2], cpuinfo[3]);
+
+	return (cpuinfo[0] & (1 << 26));
+}
+
+/*
+ * Set tagged address and read back untag mask.
+ * check if the untagged mask is expected.
+ */
+static int set_lam(unsigned long lam)
+{
+	int ret = 0;
+	uint64_t ptr = 0;
+
+	if (lam != LAM_U57_BITS && lam != LAM_NONE)
+		return -1;
+
+	/* Skip check return */
+	syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, lam);
+
+	/* Get untagged mask */
+	syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, &ptr);
+
+	/* Check mask returned is expected */
+	if (lam == LAM_U57_BITS)
+		ret = (ptr != ~(0x3fULL << 57));
+	else if (lam == LAM_NONE)
+		ret = (ptr != -1ULL);
+
+	return ret;
+}
+
+/* According to LAM mode, set metadata in high bits */
+static uint64_t get_metadata(uint64_t src, unsigned long lam)
+{
+	uint64_t metadata;
+
+	srand(time(NULL));
+	/* Get a random value as metadata */
+	metadata = rand();
+
+	switch (lam) {
+	case LAM_U57_BITS: /* Set metadata in bits 62:57 */
+		metadata = (src & ~(0x3fULL << 57)) | ((metadata & 0x3f) << 57);
+		break;
+	default:
+		metadata = src;
+		break;
+	}
+
+	return metadata;
+}
+
+/*
+ * Set metadata in user pointer, compare new pointer with original pointer.
+ * both pointers should point to the same address.
+ */
+static int handle_lam_test(void *src, unsigned int lam)
+{
+	char *ptr;
+
+	strcpy((char *)src, "USER POINTER");
+
+	ptr = (char *)get_metadata((uint64_t)src, lam);
+	if (src == ptr)
+		return 0;
+
+	/* Copy a string into the pointer with metadata */
+	strcpy((char *)ptr, "METADATA POINTER");
+
+	return (!!strcmp((char *)src, (char *)ptr));
+}
+
+/*
+ * Test lam feature through dereference pointer get from malloc.
+ * @return 0: Pass test. 1: Get failure during test 2: Get SIGSEGV
+ */
+static int handle_malloc(struct testcases *test)
+{
+	char *ptr = NULL;
+	int ret = 0;
+
+	if (test->later == 0 && test->lam != 0)
+		if (set_lam(test->lam) == -1)
+			return 1;
+
+	ptr = (char *)malloc(MALLOC_LEN);
+	if (ptr == NULL) {
+		perror("malloc() failure\n");
+		return 1;
+	}
+
+	/* Set signal handler */
+	if (sigsetjmp(segv_env, 1) == 0) {
+		signal(SIGSEGV, segv_handler);
+		ret = handle_lam_test(ptr, test->lam);
+	} else {
+		ret = 2;
+	}
+
+	if (test->later != 0 && test->lam != 0)
+		if (set_lam(test->lam) == -1 && ret == 0)
+			ret = 1;
+
+	free(ptr);
+
+	return ret;
+}
+
+static int fork_test(struct testcases *test)
+{
+	int ret, child_ret;
+	pid_t pid;
+
+	pid = fork();
+	if (pid < 0) {
+		perror("Fork failed.");
+		ret = 1;
+	} else if (pid == 0) {
+		ret = test->test_func(test);
+		exit(ret);
+	} else {
+		wait(&child_ret);
+		ret = WEXITSTATUS(child_ret);
+	}
+
+	return ret;
+}
+
+static void run_test(struct testcases *test, int count)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < count; i++) {
+		struct testcases *t = test + i;
+
+		/* fork a process to run test case */
+		ret = fork_test(t);
+		if (ret != 0)
+			ret = (t->expected == ret);
+		else
+			ret = !(t->expected);
+
+		tests_cnt++;
+		ksft_test_result(ret, t->msg);
+	}
+}
+
+static struct testcases malloc_cases[] = {
+	{
+		.later = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_malloc,
+		.msg = "MALLOC: LAM_U57. Dereferencing pointer with metadata\n",
+	},
+	{
+		.later = 1,
+		.expected = 2,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_malloc,
+		.msg = "MALLOC:[Negative] Disable LAM. Dereferencing pointer with metadata.\n",
+	},
+};
+
+static void cmd_help(void)
+{
+	printf("usage: lam [-h] [-t test list]\n");
+	printf("\t-t test list: run tests specified in the test list, default:0x%x\n", TEST_MASK);
+	printf("\t\t0x1:malloc;\n");
+	printf("\t-h: help\n");
+}
+
+int main(int argc, char **argv)
+{
+	int c = 0;
+	unsigned int tests = TEST_MASK;
+
+	tests_cnt = 0;
+
+	if (!cpu_has_lam()) {
+		ksft_print_msg("Unsupported LAM feature!\n");
+		return -1;
+	}
+
+	while ((c = getopt(argc, argv, "ht:")) != -1) {
+		switch (c) {
+		case 't':
+			tests = strtoul(optarg, NULL, 16);
+			if (!(tests & TEST_MASK)) {
+				ksft_print_msg("Invalid argument!\n");
+				return -1;
+			}
+			break;
+		case 'h':
+			cmd_help();
+			return 0;
+		default:
+			ksft_print_msg("Invalid argument\n");
+			return -1;
+		}
+	}
+
+	if (tests & FUNC_MALLOC)
+		run_test(malloc_cases, ARRAY_SIZE(malloc_cases));
+
+	ksft_set_plan(tests_cnt);
+
+	return ksft_exit_pass();
+}
-- 
2.35.1



* [PATCHv5 09/13] selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 08/13] selftests/x86/lam: Add malloc test cases for linear-address masking Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-12 23:13 ` [PATCHv5 10/13] selftests/x86/lam: Add io_uring " Kirill A. Shutemov
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Weihong Zhang, Kirill A . Shutemov

From: Weihong Zhang <weihong.zhang@intel.com>

Add mmap and SYSCALL test cases.

SYSCALL test cases:

 - With LAM_U57 enabled, set metadata in bits 62:57 of a user pointer and
   pass the pointer to a syscall. The syscall can dereference the pointer
   and returns the correct result.

 - With LAM disabled, pass a pointer with metadata in the high bits to a
   syscall. The syscall returns -1 (EFAULT).

MMAP test cases:

 - Enable LAM_U57, mmap() at a low address (below bit 47) and set metadata
   in the high bits of the address. Dereferencing the address should be
   allowed.

 - Enable LAM_U57, mmap() at a high address (above bit 47) and set metadata
   in the high bits of the address. Dereferencing the address should be
   allowed.
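
The SYSCALL cases boil down to handing the kernel a tagged pointer and
looking at the result. A rough sketch, assuming a 64-bit build;
uname_with_tag() is a hypothetical helper that uses a fixed tag instead of
the random one lam.c generates:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/utsname.h>

/*
 * Call uname(2) with 'tag' placed in bits 62:57 of the buffer pointer.
 * Returns 0 on success, or the errno value reported by the syscall.
 * The tagged pointer is only handed to the kernel, never dereferenced
 * in userspace.
 */
static int uname_with_tag(uint8_t tag)
{
	struct utsname buf;
	uint64_t addr = (uint64_t)(uintptr_t)&buf;
	struct utsname *p;

	addr = (addr & ~(0x3fULL << 57)) | (((uint64_t)tag & 0x3f) << 57);
	p = (struct utsname *)(uintptr_t)addr;

	if (uname(p) == 0)
		return 0;

	return errno;
}
```

A zero tag leaves the pointer unchanged, so the call succeeds on any system.
A non-zero tag is expected to succeed only after LAM_U57 has been enabled for
the process; otherwise the kernel rejects the address.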

Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/lam.c | 134 +++++++++++++++++++++++++++++-
 1 file changed, 132 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 4aaf6ad107c3..70c01cc9386b 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -7,6 +7,7 @@
 #include <signal.h>
 #include <setjmp.h>
 #include <sys/mman.h>
+#include <sys/utsname.h>
 #include <sys/wait.h>
 #include <inttypes.h>
 
@@ -25,11 +26,18 @@
 
 /* Specified test function bits  */
 #define FUNC_MALLOC             0x1
+#define FUNC_MMAP               0x2
+#define FUNC_SYSCALL            0x4
 
-#define TEST_MASK               0x1
+#define TEST_MASK               0x7
+
+#define LOW_ADDR                (0x1UL << 30)
+#define HIGH_ADDR               (0x3UL << 48)
 
 #define MALLOC_LEN              32
 
+#define PAGE_SIZE               (4 << 10)
+
 struct testcases {
 	unsigned int later;
 	int expected; /* 2: SIGSEGV Error; 1: other errors */
@@ -45,6 +53,7 @@ jmp_buf segv_env;
 static void segv_handler(int sig)
 {
 	ksft_print_msg("Get segmentation fault(%d).", sig);
+
 	siglongjmp(segv_env, 1);
 }
 
@@ -57,6 +66,16 @@ static inline int cpu_has_lam(void)
 	return (cpuinfo[0] & (1 << 26));
 }
 
+/* Check 5-level page table feature in CPUID.(EAX=07H, ECX=00H):ECX.[bit 16] */
+static inline int cpu_has_la57(void)
+{
+	unsigned int cpuinfo[4];
+
+	__cpuid_count(0x7, 0, cpuinfo[0], cpuinfo[1], cpuinfo[2], cpuinfo[3]);
+
+	return (cpuinfo[2] & (1 << 16));
+}
+
 /*
  * Set tagged address and read back untag mask.
  * check if the untagged mask is expected.
@@ -161,6 +180,68 @@ static int handle_malloc(struct testcases *test)
 	return ret;
 }
 
+static int handle_mmap(struct testcases *test)
+{
+	void *ptr;
+	unsigned int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED;
+	int ret = 0;
+
+	if (test->later == 0 && test->lam != 0)
+		if (set_lam(test->lam) != 0)
+			return 1;
+
+	ptr = mmap((void *)test->addr, PAGE_SIZE, PROT_READ | PROT_WRITE,
+		   flags, -1, 0);
+	if (ptr == MAP_FAILED) {
+		if (test->addr == HIGH_ADDR)
+			if (!cpu_has_la57())
+				return 3; /* unsupport LA57 */
+		return 1;
+	}
+
+	if (test->later != 0 && test->lam != 0)
+		if (set_lam(test->lam) != 0)
+			ret = 1;
+
+	if (ret == 0) {
+		if (sigsetjmp(segv_env, 1) == 0) {
+			signal(SIGSEGV, segv_handler);
+			ret = handle_lam_test(ptr, test->lam);
+		} else {
+			ret = 2;
+		}
+	}
+
+	munmap(ptr, PAGE_SIZE);
+	return ret;
+}
+
+static int handle_syscall(struct testcases *test)
+{
+	struct utsname unme, *pu;
+	int ret = 0;
+
+	if (test->later == 0 && test->lam != 0)
+		if (set_lam(test->lam) != 0)
+			return 1;
+
+	if (sigsetjmp(segv_env, 1) == 0) {
+		signal(SIGSEGV, segv_handler);
+		pu = (struct utsname *)get_metadata((uint64_t)&unme, test->lam);
+		ret = uname(pu);
+		if (ret < 0)
+			ret = 1;
+	} else {
+		ret = 2;
+	}
+
+	if (test->later != 0 && test->lam != 0)
+		if (set_lam(test->lam) != -1 && ret == 0)
+			ret = 1;
+
+	return ret;
+}
+
 static int fork_test(struct testcases *test)
 {
 	int ret, child_ret;
@@ -216,11 +297,54 @@ static struct testcases malloc_cases[] = {
 	},
 };
 
+static struct testcases syscall_cases[] = {
+	{
+		.later = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_syscall,
+		.msg = "SYSCALL: LAM_U57. syscall with metadata\n",
+	},
+	{
+		.later = 1,
+		.expected = 1,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_syscall,
+		.msg = "SYSCALL:[Negative] Disable LAM. Dereferencing pointer with metadata.\n",
+	},
+};
+
+static struct testcases mmap_cases[] = {
+	{
+		.later = 1,
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.addr = HIGH_ADDR,
+		.test_func = handle_mmap,
+		.msg = "MMAP: First mmap high address, then set LAM_U57.\n",
+	},
+	{
+		.later = 0,
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.addr = HIGH_ADDR,
+		.test_func = handle_mmap,
+		.msg = "MMAP: First LAM_U57, then High address.\n",
+	},
+	{
+		.later = 0,
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.addr = LOW_ADDR,
+		.test_func = handle_mmap,
+		.msg = "MMAP: First LAM_U57, then Low address.\n",
+	},
+};
+
 static void cmd_help(void)
 {
 	printf("usage: lam [-h] [-t test list]\n");
 	printf("\t-t test list: run tests specified in the test list, default:0x%x\n", TEST_MASK);
-	printf("\t\t0x1:malloc;\n");
+	printf("\t\t0x1:malloc; 0x2:mmap; 0x4:syscall.\n");
 	printf("\t-h: help\n");
 }
 
@@ -257,6 +381,12 @@ int main(int argc, char **argv)
 	if (tests & FUNC_MALLOC)
 		run_test(malloc_cases, ARRAY_SIZE(malloc_cases));
 
+	if (tests & FUNC_MMAP)
+		run_test(mmap_cases, ARRAY_SIZE(mmap_cases));
+
+	if (tests & FUNC_SYSCALL)
+		run_test(syscall_cases, ARRAY_SIZE(syscall_cases));
+
 	ksft_set_plan(tests_cnt);
 
 	return ksft_exit_pass();
-- 
2.35.1



* [PATCHv5 10/13] selftests/x86/lam: Add io_uring test cases for linear-address masking
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (8 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 09/13] selftests/x86/lam: Add mmap and SYSCALL " Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-12 23:13 ` [PATCHv5 11/13] selftests/x86/lam: Add inherit " Kirill A. Shutemov
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Weihong Zhang, Kirill A . Shutemov

From: Weihong Zhang <weihong.zhang@intel.com>

LAM should also work for kernel threads; use io_uring to verify this.
The test cases read a file through io_uring into an iovec array of receive
buffers, with metadata set in the high bits of the buffer pointers according
to the LAM mode.

io_uring can handle buffers whose pointers carry metadata in the high bits.
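
How the receive buffers are split into blocks and tagged can be sketched
without any ring setup. This is a simplified illustration, not the code from
the patch: build_tagged_iovecs() and free_tagged_iovecs() are hypothetical
names, the tag is passed in rather than random, and BLOCK_SZ mirrors
URING_BLOCK_SZ:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>

#define BLOCK_SZ 2048
#define TAG_MASK (0x3fULL << 57)	/* LAM_U57 metadata bits 62:57 */

/* Free n buffers, stripping the tag first: free() needs the real address. */
static void free_tagged_iovecs(struct iovec *iov, int n)
{
	for (int i = 0; i < n; i++) {
		uint64_t addr = (uint64_t)(uintptr_t)iov[i].iov_base;

		free((void *)(uintptr_t)(addr & ~TAG_MASK));
		iov[i].iov_base = NULL;
	}
}

/*
 * Cover file_sz bytes with BLOCK_SZ-sized buffers whose base pointers
 * carry 'tag' in bits 62:57. Returns the number of blocks used, or -1
 * if iov is too small or an allocation fails.
 */
static int build_tagged_iovecs(struct iovec *iov, int max_blocks,
			       size_t file_sz, uint8_t tag)
{
	int n = 0;

	while (file_sz > 0) {
		size_t bytes = file_sz < BLOCK_SZ ? file_sz : BLOCK_SZ;
		uint64_t addr;
		void *buf;

		if (n == max_blocks ||
		    posix_memalign(&buf, BLOCK_SZ, BLOCK_SZ)) {
			free_tagged_iovecs(iov, n);
			return -1;
		}

		addr = (uint64_t)(uintptr_t)buf;
		addr = (addr & ~TAG_MASK) | (((uint64_t)tag & 0x3f) << 57);

		iov[n].iov_base = (void *)(uintptr_t)addr;
		iov[n].iov_len = bytes;
		file_sz -= bytes;
		n++;
	}

	return n;
}
```

The resulting array is what an IORING_OP_READV submission would point at; the
kernel side then has to untag each iov_base before touching the buffer.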

Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/lam.c | 343 +++++++++++++++++++++++++++++-
 1 file changed, 340 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 70c01cc9386b..d2ae75b3bdc0 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -9,8 +9,12 @@
 #include <sys/mman.h>
 #include <sys/utsname.h>
 #include <sys/wait.h>
+#include <sys/stat.h>
+#include <fcntl.h>
 #include <inttypes.h>
 
+#include <sys/uio.h>
+#include <linux/io_uring.h>
 #include "../kselftest.h"
 
 #ifndef __x86_64__
@@ -24,12 +28,13 @@
 #define ARCH_GET_UNTAG_MASK     0x4001
 #define ARCH_ENABLE_TAGGED_ADDR 0x4002
 
-/* Specified test function bits  */
+/* Specified test function bits */
 #define FUNC_MALLOC             0x1
 #define FUNC_MMAP               0x2
 #define FUNC_SYSCALL            0x4
+#define FUNC_URING              0x8
 
-#define TEST_MASK               0x7
+#define TEST_MASK               0xf
 
 #define LOW_ADDR                (0x1UL << 30)
 #define HIGH_ADDR               (0x3UL << 48)
@@ -38,6 +43,13 @@
 
 #define PAGE_SIZE               (4 << 10)
 
+#define barrier() ({						\
+		   __asm__ __volatile__("" : : : "memory");	\
+})
+
+#define URING_QUEUE_SZ 1
+#define URING_BLOCK_SZ 2048
+
 struct testcases {
 	unsigned int later;
 	int expected; /* 2: SIGSEGV Error; 1: other errors */
@@ -47,6 +59,33 @@ struct testcases {
 	const char *msg;
 };
 
+/* Used by CQ of uring, source file handler and file's size */
+struct file_io {
+	int file_fd;
+	off_t file_sz;
+	struct iovec iovecs[];
+};
+
+struct io_uring_queue {
+	unsigned int *head;
+	unsigned int *tail;
+	unsigned int *ring_mask;
+	unsigned int *ring_entries;
+	unsigned int *flags;
+	unsigned int *array;
+	union {
+		struct io_uring_cqe *cqes;
+		struct io_uring_sqe *sqes;
+	} queue;
+	size_t ring_sz;
+};
+
+struct io_ring {
+	int ring_fd;
+	struct io_uring_queue sq_ring;
+	struct io_uring_queue cq_ring;
+};
+
 int tests_cnt;
 jmp_buf segv_env;
 
@@ -242,6 +281,285 @@ static int handle_syscall(struct testcases *test)
 	return ret;
 }
 
+int sys_uring_setup(unsigned int entries, struct io_uring_params *p)
+{
+	return (int)syscall(__NR_io_uring_setup, entries, p);
+}
+
+int sys_uring_enter(int fd, unsigned int to, unsigned int min, unsigned int flags)
+{
+	return (int)syscall(__NR_io_uring_enter, fd, to, min, flags, NULL, 0);
+}
+
+/* Init submission queue and completion queue */
+int mmap_io_uring(struct io_uring_params p, struct io_ring *s)
+{
+	struct io_uring_queue *sring = &s->sq_ring;
+	struct io_uring_queue *cring = &s->cq_ring;
+
+	sring->ring_sz = p.sq_off.array + p.sq_entries * sizeof(unsigned int);
+	cring->ring_sz = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
+
+	if (p.features & IORING_FEAT_SINGLE_MMAP) {
+		if (cring->ring_sz > sring->ring_sz)
+			sring->ring_sz = cring->ring_sz;
+
+		cring->ring_sz = sring->ring_sz;
+	}
+
+	void *sq_ptr = mmap(0, sring->ring_sz, PROT_READ | PROT_WRITE,
+			    MAP_SHARED | MAP_POPULATE, s->ring_fd,
+			    IORING_OFF_SQ_RING);
+
+	if (sq_ptr == MAP_FAILED) {
+		perror("sub-queue!");
+		return 1;
+	}
+
+	void *cq_ptr = sq_ptr;
+
+	if (!(p.features & IORING_FEAT_SINGLE_MMAP)) {
+		cq_ptr = mmap(0, cring->ring_sz, PROT_READ | PROT_WRITE,
+			      MAP_SHARED | MAP_POPULATE, s->ring_fd,
+			      IORING_OFF_CQ_RING);
+		if (cq_ptr == MAP_FAILED) {
+			perror("cpl-queue!");
+			munmap(sq_ptr, sring->ring_sz);
+			return 1;
+		}
+	}
+
+	sring->head = sq_ptr + p.sq_off.head;
+	sring->tail = sq_ptr + p.sq_off.tail;
+	sring->ring_mask = sq_ptr + p.sq_off.ring_mask;
+	sring->ring_entries = sq_ptr + p.sq_off.ring_entries;
+	sring->flags = sq_ptr + p.sq_off.flags;
+	sring->array = sq_ptr + p.sq_off.array;
+
+	/* Map a queue as mem map */
+	s->sq_ring.queue.sqes = mmap(0, p.sq_entries * sizeof(struct io_uring_sqe),
+				     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+				     s->ring_fd, IORING_OFF_SQES);
+	if (s->sq_ring.queue.sqes == MAP_FAILED) {
+		munmap(sq_ptr, sring->ring_sz);
+		if (sq_ptr != cq_ptr) {
+			ksft_print_msg("failed to mmap uring queue!");
+			munmap(cq_ptr, cring->ring_sz);
+			return 1;
+		}
+	}
+
+	cring->head = cq_ptr + p.cq_off.head;
+	cring->tail = cq_ptr + p.cq_off.tail;
+	cring->ring_mask = cq_ptr + p.cq_off.ring_mask;
+	cring->ring_entries = cq_ptr + p.cq_off.ring_entries;
+	cring->queue.cqes = cq_ptr + p.cq_off.cqes;
+
+	return 0;
+}
+
+/* Init io_uring queues */
+int setup_io_uring(struct io_ring *s)
+{
+	struct io_uring_params para;
+
+	memset(&para, 0, sizeof(para));
+	s->ring_fd = sys_uring_setup(URING_QUEUE_SZ, &para);
+	if (s->ring_fd < 0)
+		return 1;
+
+	return mmap_io_uring(para, s);
+}
+
+/*
+ * Get data from completion queue. the data buffer saved the file data
+ * return 0: success; others: error;
+ */
+int handle_uring_cq(struct io_ring *s)
+{
+	struct file_io *fi = NULL;
+	struct io_uring_queue *cring = &s->cq_ring;
+	struct io_uring_cqe *cqe;
+	unsigned int head;
+	off_t len = 0;
+
+	head = *cring->head;
+
+	do {
+		barrier();
+		if (head == *cring->tail)
+			break;
+		/* Get the entry */
+		cqe = &cring->queue.cqes[head & *s->cq_ring.ring_mask];
+		fi = (struct file_io *)cqe->user_data;
+		if (cqe->res < 0)
+			break;
+
+		int blocks = (int)(fi->file_sz + URING_BLOCK_SZ - 1) / URING_BLOCK_SZ;
+
+		for (int i = 0; i < blocks; i++)
+			len += fi->iovecs[i].iov_len;
+
+		head++;
+	} while (1);
+
+	*cring->head = head;
+	barrier();
+
+	return (len != fi->file_sz);
+}
+
+/*
+ * Submit squeue. specify via IORING_OP_READV.
+ * the buffer need to be set metadata according to LAM mode
+ */
+int handle_uring_sq(struct io_ring *ring, struct file_io *fi, unsigned long lam)
+{
+	int file_fd = fi->file_fd;
+	struct io_uring_queue *sring = &ring->sq_ring;
+	unsigned int index = 0, cur_block = 0, tail = 0, next_tail = 0;
+	struct io_uring_sqe *sqe;
+
+	off_t remain = fi->file_sz;
+	int blocks = (int)(remain + URING_BLOCK_SZ - 1) / URING_BLOCK_SZ;
+
+	while (remain) {
+		off_t bytes = remain;
+		void *buf;
+
+		if (bytes > URING_BLOCK_SZ)
+			bytes = URING_BLOCK_SZ;
+
+		fi->iovecs[cur_block].iov_len = bytes;
+
+		if (posix_memalign(&buf, URING_BLOCK_SZ, URING_BLOCK_SZ))
+			return 1;
+
+		fi->iovecs[cur_block].iov_base = (void *)get_metadata((uint64_t)buf, lam);
+		remain -= bytes;
+		cur_block++;
+	}
+
+	next_tail = *sring->tail;
+	tail = next_tail;
+	next_tail++;
+
+	barrier();
+
+	index = tail & *ring->sq_ring.ring_mask;
+
+	sqe = &ring->sq_ring.queue.sqes[index];
+	sqe->fd = file_fd;
+	sqe->flags = 0;
+	sqe->opcode = IORING_OP_READV;
+	sqe->addr = (unsigned long)fi->iovecs;
+	sqe->len = blocks;
+	sqe->off = 0;
+	sqe->user_data = (uint64_t)fi;
+
+	sring->array[index] = index;
+	tail = next_tail;
+
+	if (*sring->tail != tail) {
+		*sring->tail = tail;
+		barrier();
+	}
+
+	if (sys_uring_enter(ring->ring_fd, 1, 1, IORING_ENTER_GETEVENTS) < 0)
+		return 1;
+
+	return 0;
+}
+
+/*
+ * Test LAM in async I/O and io_uring, read current binary through io_uring
+ * Set metadata in pointers to iovecs buffer.
+ */
+int do_uring(unsigned long lam)
+{
+	struct io_ring *ring;
+	struct file_io *fi;
+	struct stat st;
+	int ret = 1;
+	char path[PATH_MAX];
+
+	/* get current process path */
+	if (readlink("/proc/self/exe", path, PATH_MAX) <= 0)
+		return 1;
+
+	int file_fd = open(path, O_RDONLY);
+
+	if (file_fd < 0)
+		return 1;
+
+	if (fstat(file_fd, &st) < 0)
+		return 1;
+
+	off_t file_sz = st.st_size;
+
+	int blocks = (int)(file_sz + URING_BLOCK_SZ - 1) / URING_BLOCK_SZ;
+
+	fi = malloc(sizeof(*fi) + sizeof(struct iovec) * blocks);
+	if (!fi)
+		return 1;
+
+	fi->file_sz = file_sz;
+	fi->file_fd = file_fd;
+
+	ring = malloc(sizeof(*ring));
+	if (!ring)
+		return 1;
+
+	memset(ring, 0, sizeof(struct io_ring));
+
+	if (setup_io_uring(ring))
+		goto out;
+
+	if (handle_uring_sq(ring, fi, lam))
+		goto out;
+
+	ret = handle_uring_cq(ring);
+
+out:
+	free(ring);
+
+	for (int i = 0; i < blocks; i++) {
+		if (fi->iovecs[i].iov_base) {
+			uint64_t addr = ((uint64_t)fi->iovecs[i].iov_base);
+
+			switch (lam) {
+			case LAM_U57_BITS: /* Clear bits 62:57 */
+				addr = (addr & ~(0x3fULL << 57));
+				break;
+			}
+			free((void *)addr);
+			fi->iovecs[i].iov_base = NULL;
+		}
+	}
+
+	free(fi);
+
+	return ret;
+}
+
+int handle_uring(struct testcases *test)
+{
+	int ret = 0;
+
+	if (test->later == 0 && test->lam != 0)
+		if (set_lam(test->lam) != 0)
+			return 1;
+
+	if (sigsetjmp(segv_env, 1) == 0) {
+		signal(SIGSEGV, segv_handler);
+		ret = do_uring(test->lam);
+	} else {
+		ret = 2;
+	}
+
+	return ret;
+}
+
 static int fork_test(struct testcases *test)
 {
 	int ret, child_ret;
@@ -281,6 +599,22 @@ static void run_test(struct testcases *test, int count)
 	}
 }
 
+static struct testcases uring_cases[] = {
+	{
+		.later = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_uring,
+		.msg = "URING: LAM_U57. Dereferencing pointer with metadata\n",
+	},
+	{
+		.later = 1,
+		.expected = 1,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_uring,
+		.msg = "URING:[Negative] Disable LAM. Dereferencing pointer with metadata.\n",
+	},
+};
+
 static struct testcases malloc_cases[] = {
 	{
 		.later = 0,
@@ -344,7 +678,7 @@ static void cmd_help(void)
 {
 	printf("usage: lam [-h] [-t test list]\n");
 	printf("\t-t test list: run tests specified in the test list, default:0x%x\n", TEST_MASK);
-	printf("\t\t0x1:malloc; 0x2:mmap; 0x4:syscall.\n");
+	printf("\t\t0x1:malloc; 0x2:mmap; 0x4:syscall; 0x8:io_uring.\n");
 	printf("\t-h: help\n");
 }
 
@@ -387,6 +721,9 @@ int main(int argc, char **argv)
 	if (tests & FUNC_SYSCALL)
 		run_test(syscall_cases, ARRAY_SIZE(syscall_cases));
 
+	if (tests & FUNC_URING)
+		run_test(uring_cases, ARRAY_SIZE(uring_cases));
+
 	ksft_set_plan(tests_cnt);
 
 	return ksft_exit_pass();
-- 
2.35.1



* [PATCHv5 11/13] selftests/x86/lam: Add inherit test cases for linear-address masking
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (9 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 10/13] selftests/x86/lam: Add io_uring " Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-12 23:13 ` [PATCHv5 OPTIONAL 12/13] x86/mm: Extend LAM to support to LAM_U48 Kirill A. Shutemov
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Weihong Zhang, Kirill A . Shutemov

From: Weihong Zhang <weihong.zhang@intel.com>

LAM is enabled per-process and gets inherited on fork(2)/clone(2). exec()
reverts LAM status to the default disabled state.

There are two test scenarios:

 - Fork test cases:

   These cases test inheritance of the LAM mode. A child process
   created by fork() should inherit LAM from its parent, so the child
   reads back the same LAM mode as the parent.

 - Execve test cases:

   Unlike a process created by fork(), a process that goes through
   execve() has its LAM status reverted to disabled.
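
The fork case can be sketched as "child reads the untag mask and reports it
through its exit status". The MODE_* values and both helper names below are
assumptions made for this illustration; only ARCH_GET_UNTAG_MASK and the two
mask values come from the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ARCH_GET_UNTAG_MASK 0x4001

#define MODE_NONE    0
#define MODE_U57     6
#define MODE_UNKNOWN 255	/* fits in an 8-bit exit status */

/* Translate the current untag mask into one of the MODE_* values. */
static int read_lam_mode(void)
{
	uint64_t mask = 0;
	long ret = -1;

#if defined(__x86_64__) && defined(SYS_arch_prctl)
	ret = syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, &mask);
#endif
	if (ret != 0)
		return MODE_UNKNOWN;

	if (mask == ~(0x3fULL << 57))
		return MODE_U57;
	if (mask == ~0ULL)
		return MODE_NONE;

	return MODE_UNKNOWN;
}

/* Fork, let the child read its own mode, and return what it reported. */
static int child_lam_mode(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return MODE_UNKNOWN;
	if (pid == 0)
		_exit(read_lam_mode());
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return MODE_UNKNOWN;

	return WEXITSTATUS(status);
}
```

Inheritance holds when child_lam_mode() equals read_lam_mode() in the parent.
The execve case would instead compare the value reported by a freshly exec'ed
binary against MODE_NONE.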

Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/lam.c | 124 +++++++++++++++++++++++++++++-
 1 file changed, 121 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index d2ae75b3bdc0..fcac5feb47d0 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -34,7 +34,9 @@
 #define FUNC_SYSCALL            0x4
 #define FUNC_URING              0x8
 
-#define TEST_MASK               0xf
+#define FUNC_INHERITE           0x10
+
+#define TEST_MASK               0x1f
 
 #define LOW_ADDR                (0x1UL << 30)
 #define HIGH_ADDR               (0x3UL << 48)
@@ -142,6 +144,28 @@ static int set_lam(unsigned long lam)
 	return ret;
 }
 
+/*
+ * Read back the untag mask and translate it into a LAM mode.
+ * Returns LAM_U57_BITS, LAM_NONE, or -1 if the mask is unexpected.
+ */
+static int get_lam(void)
+{
+	uint64_t ptr = 0;
+	int ret = -1;
+	/* Get untagged mask */
+	if (syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, &ptr) == -1)
+		return -1;
+
+	/* Check mask returned is expected */
+	if (ptr == ~(0x3fULL << 57))
+		ret = LAM_U57_BITS;
+	else if (ptr == -1ULL)
+		ret = LAM_NONE;
+
+
+	return ret;
+}
+
 /* According to LAM mode, set metadata in high bits */
 static uint64_t get_metadata(uint64_t src, unsigned long lam)
 {
@@ -580,6 +604,72 @@ static int fork_test(struct testcases *test)
 	return ret;
 }
 
+static int handle_execve(struct testcases *test)
+{
+	int ret, child_ret;
+	int lam = test->lam;
+	pid_t pid;
+
+	pid = fork();
+	if (pid < 0) {
+		perror("Fork failed.");
+		ret = 1;
+	} else if (pid == 0) {
+		char path[PATH_MAX];
+
+		/* Enable LAM in the child before it calls exec */
+		if (set_lam(lam) != 0)
+			return 1;
+
+		/* Get current binary's path and the binary was run by execve */
+		if (readlink("/proc/self/exe", path, PATH_MAX) <= 0)
+			exit(-1);
+
+		/* run binary to get LAM mode and return to parent process */
+		if (execlp(path, path, "-t 0x0", NULL) < 0) {
+			perror("error on exec");
+			exit(-1);
+		}
+	} else {
+		wait(&child_ret);
+		ret = WEXITSTATUS(child_ret);
+		if (ret != LAM_NONE)
+			return 1;
+	}
+
+	return 0;
+}
+
+static int handle_inheritance(struct testcases *test)
+{
+	int ret, child_ret;
+	int lam = test->lam;
+	pid_t pid;
+
+	/* Set LAM mode in parent process */
+	if (set_lam(lam) != 0)
+		return 1;
+
+	pid = fork();
+	if (pid < 0) {
+		perror("Fork failed.");
+		return 1;
+	} else if (pid == 0) {
+		/* Read back the LAM mode inherited by the child */
+		int child_lam = get_lam();
+
+		exit(child_lam);
+	} else {
+		wait(&child_ret);
+		ret = WEXITSTATUS(child_ret);
+
+		if (lam != ret)
+			return 1;
+	}
+
+	return 0;
+}
+
 static void run_test(struct testcases *test, int count)
 {
 	int i, ret = 0;
@@ -674,11 +764,26 @@ static struct testcases mmap_cases[] = {
 	},
 };
 
+static struct testcases inheritance_cases[] = {
+	{
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_inheritance,
+		.msg = "FORK: LAM_U57, child process should get LAM mode same as parent\n",
+	},
+	{
+		.expected = 0,
+		.lam = LAM_U57_BITS,
+		.test_func = handle_execve,
+		.msg = "EXECVE: LAM_U57, child process should get disabled LAM mode\n",
+	},
+};
+
 static void cmd_help(void)
 {
 	printf("usage: lam [-h] [-t test list]\n");
 	printf("\t-t test list: run tests specified in the test list, default:0x%x\n", TEST_MASK);
-	printf("\t\t0x1:malloc; 0x2:mmap; 0x4:syscall; 0x8:io_uring.\n");
+	printf("\t\t0x1:malloc; 0x2:mmap; 0x4:syscall; 0x8:io_uring; 0x10:inherit;\n");
 	printf("\t-h: help\n");
 }
 
@@ -698,7 +803,7 @@ int main(int argc, char **argv)
 		switch (c) {
 		case 't':
 			tests = strtoul(optarg, NULL, 16);
-			if (!(tests & TEST_MASK)) {
+			if (tests && !(tests & TEST_MASK)) {
 				ksft_print_msg("Invalid argument!\n");
 				return -1;
 			}
@@ -712,6 +817,16 @@ int main(int argc, char **argv)
 		}
 	}
 
+	/*
+	 * When tests is 0, it is not a real test case;
+	 * the option used by test case(execve) to check the lam mode in
+	 * process generated by execve, the process read back lam mode and
+	 * check with lam mode in parent process.
+	 */
+	if (!tests)
+		return (get_lam());
+
+	/* Run test cases */
 	if (tests & FUNC_MALLOC)
 		run_test(malloc_cases, ARRAY_SIZE(malloc_cases));
 
@@ -724,6 +839,9 @@ int main(int argc, char **argv)
 	if (tests & FUNC_URING)
 		run_test(uring_cases, ARRAY_SIZE(uring_cases));
 
+	if (tests & FUNC_INHERITE)
+		run_test(inheritance_cases, ARRAY_SIZE(inheritance_cases));
+
 	ksft_set_plan(tests_cnt);
 
 	return ksft_exit_pass();
-- 
2.35.1



* [PATCHv5 OPTIONAL 12/13] x86/mm: Extend LAM to support to LAM_U48
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (10 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 11/13] selftests/x86/lam: Add inherit " Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-12 23:13 ` [PATCHv5 OPTIONAL 13/13] selftests/x86/lam: Add tests cases for LAM_U48 Kirill A. Shutemov
  2022-07-18 17:39 ` [PATCHv5 00/13] Linear Address Masking enabling Alexander Potapenko
  13 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Kirill A. Shutemov

LAM_U48 allows encoding 15 bits of tags into an address.

LAM_U48 steals the bits above bit 47 for tags and makes it impossible for
userspace to use the full address space on a 5-level paging machine.

Make these features mutually exclusive: whichever gets enabled first
blocks the other one.
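
For reference, the mapping from the requested number of tag bits to the untag
mask can be reproduced in userspace. A sketch mirroring the branches visible
in prctl_enable_tagged_addr(); genmask64() stands in for the kernel's
GENMASK(), untag_mask_for() is a made-up name, and rejecting nr_bits == 0 is
an assumption, since that branch is not visible in the hunk:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for GENMASK(h, l): bits h..l set, inclusive. */
static uint64_t genmask64(unsigned int h, unsigned int l)
{
	assert(l <= h && h < 64);
	return (~0ULL >> (63 - h)) & (~0ULL << l);
}

/*
 * Compute the untag mask for a request of nr_bits tag bits.
 * Returns 0 and fills *mask on success, -1 if the request is rejected.
 */
static int untag_mask_for(unsigned long nr_bits, uint64_t *mask)
{
	if (nr_bits == 0)
		return -1;			/* assumption, see above */

	if (nr_bits <= 6)
		*mask = ~genmask64(62, 57);	/* LAM_U57: 6 tag bits */
	else if (nr_bits <= 15)
		*mask = ~genmask64(62, 48);	/* LAM_U48: 15 tag bits */
	else
		return -1;

	return 0;
}
```

A request for 7 to 15 bits lands on LAM_U48, which is where the conflict with
mappings above the 47-bit window comes from.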

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/elf.h         |  3 ++-
 arch/x86/include/asm/mmu_context.h | 13 +++++++++++++
 arch/x86/kernel/process_64.c       | 23 +++++++++++++++++++++++
 arch/x86/kernel/sys_x86_64.c       |  5 +++--
 arch/x86/mm/hugetlbpage.c          |  6 ++++--
 arch/x86/mm/mmap.c                 | 10 +++++++++-
 6 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index cb0ff1055ab1..4df13497a770 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -317,7 +317,8 @@ static inline int mmap_is_ia32(void)
 extern unsigned long task_size_32bit(void);
 extern unsigned long task_size_64bit(int full_addr_space);
 extern unsigned long get_mmap_base(int is_legacy);
-extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len);
+extern bool mmap_address_hint_valid(struct mm_struct *mm,
+				    unsigned long addr, unsigned long len);
 extern unsigned long get_sigframe_size(void);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b0e9ea23758b..3736f41948e9 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -263,6 +263,19 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
 
 unsigned long __get_current_cr3_fast(void);
 
+#ifdef CONFIG_X86_5LEVEL
+static inline bool full_va_allowed(struct mm_struct *mm)
+{
+	/* LAM_U48 steals VA bits above 47-bit for tags */
+	return mm->context.lam_cr3_mask != X86_CR3_LAM_U48;
+}
+#else
+static inline bool full_va_allowed(struct mm_struct *mm)
+{
+	return false;
+}
+#endif
+
 #include <asm-generic/mmu_context.h>
 
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 82a19168bfa4..cfa2e42a135a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -761,6 +761,16 @@ static void enable_lam_func(void *mm)
 	set_tlbstate_cr3_lam_mask(lam_mask);
 }
 
+static bool lam_u48_allowed(void)
+{
+	struct mm_struct *mm = current->mm;
+
+	if (!full_va_allowed(mm))
+		return true;
+
+	return find_vma(mm, DEFAULT_MAP_WINDOW) == NULL;
+}
+
 static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 {
 	int ret = 0;
@@ -768,6 +778,10 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 	if (!cpu_feature_enabled(X86_FEATURE_LAM))
 		return -ENODEV;
 
+	/* lam_u48_allowed() requires mmap_lock */
+	if (mmap_write_lock_killable(mm))
+		return -EINTR;
+
 	mutex_lock(&mm->context.lock);
 
 	/* Already enabled? */
@@ -782,6 +796,14 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 	} else if (nr_bits <= 6) {
 		mm->context.lam_cr3_mask = X86_CR3_LAM_U57;
 		mm->context.untag_mask =  ~GENMASK(62, 57);
+	} else if (nr_bits <= 15) {
+		if (!lam_u48_allowed()) {
+			ret = -EBUSY;
+			goto out;
+		}
+
+		mm->context.lam_cr3_mask = X86_CR3_LAM_U48;
+		mm->context.untag_mask =  ~GENMASK(62, 48);
 	} else {
 		ret = -EINVAL;
 		goto out;
@@ -793,6 +815,7 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 	on_each_cpu_mask(mm_cpumask(mm), enable_lam_func, mm, true);
 out:
 	mutex_unlock(&mm->context.lock);
+	mmap_write_unlock(mm);
 	return ret;
 }
 
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 8cc653ffdccd..5ea6aaed89ba 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -21,6 +21,7 @@
 
 #include <asm/elf.h>
 #include <asm/ia32.h>
+#include <asm/mmu_context.h>
 
 /*
  * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
@@ -182,7 +183,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	/* requesting a specific address */
 	if (addr) {
 		addr &= PAGE_MASK;
-		if (!mmap_address_hint_valid(addr, len))
+		if (!mmap_address_hint_valid(mm, addr, len))
 			goto get_unmapped_area;
 
 		vma = find_vma(mm, addr);
@@ -203,7 +204,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	 * !in_32bit_syscall() check to avoid high addresses for x32
 	 * (and make it no op on native i386).
 	 */
-	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
+	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall() && full_va_allowed(mm))
 		info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
 
 	info.align_mask = 0;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index a0d023cb4292..9fdc8db42365 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -18,6 +18,7 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 #include <asm/elf.h>
+#include <asm/mmu_context.h>
 
 #if 0	/* This is just for testing */
 struct page *
@@ -103,6 +104,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 		unsigned long pgoff, unsigned long flags)
 {
 	struct hstate *h = hstate_file(file);
+	struct mm_struct *mm = current->mm;
 	struct vm_unmapped_area_info info;
 
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
@@ -114,7 +116,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
 	 * in the full address space.
 	 */
-	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
+	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall() && full_va_allowed(mm))
 		info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
 
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
@@ -161,7 +163,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 
 	if (addr) {
 		addr &= huge_page_mask(h);
-		if (!mmap_address_hint_valid(addr, len))
+		if (!mmap_address_hint_valid(mm, addr, len))
 			goto get_unmapped_area;
 
 		vma = find_vma(mm, addr);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index c90c20904a60..aa0086722a38 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -21,6 +21,7 @@
 #include <linux/elf-randomize.h>
 #include <asm/elf.h>
 #include <asm/io.h>
+#include <asm/mmu_context.h>
 
 #include "physaddr.h"
 
@@ -35,6 +36,8 @@ unsigned long task_size_32bit(void)
 
 unsigned long task_size_64bit(int full_addr_space)
 {
+	if (!full_va_allowed(current->mm))
+		return DEFAULT_MAP_WINDOW;
 	return full_addr_space ? TASK_SIZE_MAX : DEFAULT_MAP_WINDOW;
 }
 
@@ -170,6 +173,7 @@ const char *arch_vma_name(struct vm_area_struct *vma)
 
 /**
  * mmap_address_hint_valid - Validate the address hint of mmap
+ * @mm:		Address space
  * @addr:	Address hint
  * @len:	Mapping length
  *
@@ -206,11 +210,15 @@ const char *arch_vma_name(struct vm_area_struct *vma)
  * the failure of such a fixed mapping request, so the restriction is not
  * applied.
  */
-bool mmap_address_hint_valid(unsigned long addr, unsigned long len)
+bool mmap_address_hint_valid(struct mm_struct *mm,
+			     unsigned long addr, unsigned long len)
 {
 	if (TASK_SIZE - len < addr)
 		return false;
 
+	if (addr + len > DEFAULT_MAP_WINDOW && !full_va_allowed(mm))
+		return false;
+
 	return (addr > DEFAULT_MAP_WINDOW) == (addr + len > DEFAULT_MAP_WINDOW);
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv5 OPTIONAL 13/13] selftests/x86/lam: Add tests cases for LAM_U48
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (11 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 OPTIONAL 12/13] x86/mm: Extend LAM to support to LAM_U48 Kirill A. Shutemov
@ 2022-07-12 23:13 ` Kirill A. Shutemov
  2022-07-18 17:39 ` [PATCHv5 00/13] Linear Address Masking enabling Alexander Potapenko
  13 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-12 23:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel,
	Weihong Zhang, Kirill A . Shutemov

From: Weihong Zhang <weihong.zhang@intel.com>

LAM supports configurations that differ in which pointer bits are masked.
With LAM_U48, bits 62:48 of a pointer can be used as metadata bits, so the
LAM width is 15.
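
As a rough sketch of the bit layout described above (helper names are made
up for illustration and are not taken from the selftest), tagging and
untagging a LAM_U48 pointer is plain mask arithmetic on bits 62:48; bit 63
is left untouched:

```c
#include <stdint.h>

#define LAM_U48_TAG_SHIFT 48
#define LAM_U48_TAG_MASK  (0x7fffULL << LAM_U48_TAG_SHIFT) /* bits 62:48 */

/* Hypothetical helper: store a 15-bit tag in bits 62:48 of ptr. */
uint64_t lam_u48_set_tag(uint64_t ptr, uint16_t tag)
{
	return (ptr & ~LAM_U48_TAG_MASK) |
	       (((uint64_t)tag & 0x7fff) << LAM_U48_TAG_SHIFT);
}

/* Hypothetical helper: read the tag back. */
uint64_t lam_u48_get_tag(uint64_t ptr)
{
	return (ptr & LAM_U48_TAG_MASK) >> LAM_U48_TAG_SHIFT;
}

/* Hypothetical helper: drop the tag and recover the plain address. */
uint64_t lam_u48_clear_tag(uint64_t ptr)
{
	return ptr & ~LAM_U48_TAG_MASK;
}
```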

Add test cases to the existing test scenarios:

MALLOC:

 - Enable LAM_U48, which masks bits 62:48 of user pointers as metadata;
   the process can dereference these pointers.

MMAP:

 - With LAM_U48 enabled, mapping at a high address (above bit 47) has to
   fail, which triggers SIGSEGV.

 - LAM_U48 can't be enabled if a mapping at a high address (above bit 47)
   already exists before LAM_U48 is enabled.

SYSCALL:

 - LAM supports setting metadata in the high bits 62:48 (LAM_U48) of user
   pointers. Pass these pointers to a syscall; the syscall can dereference
   them and return the correct result.

IO_URING:

 - Add a LAM_U48 test for IO_URING: enable LAM_U48 and set metadata in bits
   62:48 of the buffers; IO_URING must handle these buffers correctly.

FORK/EXEC:

 - Add LAM_U48 test cases to the inheritance scenarios. These cases should
   behave the same as for LAM_U57.

Signed-off-by: Weihong Zhang <weihong.zhang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/lam.c | 67 ++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index fcac5feb47d0..b354e57bf072 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -24,6 +24,7 @@
 /* LAM modes, these definitions were copied from kernel code */
 #define LAM_NONE                0
 #define LAM_U57_BITS            6
+#define LAM_U48_BITS            15
 /* arch prctl for LAM */
 #define ARCH_GET_UNTAG_MASK     0x4001
 #define ARCH_ENABLE_TAGGED_ADDR 0x4002
@@ -126,7 +127,7 @@ static int set_lam(unsigned long lam)
 	int ret = 0;
 	uint64_t ptr = 0;
 
-	if (lam != LAM_U57_BITS && lam != LAM_NONE)
+	if (lam != LAM_U48_BITS && lam != LAM_U57_BITS && lam != LAM_NONE)
 		return -1;
 
 	/* Skip check return */
@@ -138,6 +139,8 @@ static int set_lam(unsigned long lam)
 	/* Check mask returned is expected */
 	if (lam == LAM_U57_BITS)
 		ret = (ptr != ~(0x3fULL << 57));
+	else if (lam == LAM_U48_BITS)
+		ret = (ptr != ~(0x7fffULL << 48));
 	else if (lam == LAM_NONE)
 		ret = (ptr != -1ULL);
 
@@ -161,6 +164,8 @@ static int get_lam(void)
 		ret = LAM_U57_BITS;
 	else if (ptr == -1ULL)
 		ret = LAM_NONE;
+	else if (ptr == ~(0x7fffULL << 48))
+		ret = LAM_U48_BITS;
 
 
 	return ret;
@@ -176,6 +181,9 @@ static uint64_t get_metadata(uint64_t src, unsigned long lam)
 	metadata = rand();
 
 	switch (lam) {
+	case LAM_U48_BITS: /* Set metadata in bits 62:48 */
+		metadata = (src & ~(0x7fffULL << 48)) | ((metadata & 0x7fff) << 48);
+		break;
 	case LAM_U57_BITS: /* Set metadata in bits 62:57 */
 		metadata = (src & ~(0x3fULL << 57)) | ((metadata & 0x3f) << 57);
 		break;
@@ -552,6 +560,9 @@ int do_uring(unsigned long lam)
 			uint64_t addr = ((uint64_t)fi->iovecs[i].iov_base);
 
 			switch (lam) {
+			case LAM_U48_BITS: /* Clear bits 62:48 */
+				addr = (addr & ~(0x7fffULL << 48));
+				break;
 			case LAM_U57_BITS: /* Clear bits 62:57 */
 				addr = (addr & ~(0x3fULL << 57));
 				break;
@@ -696,6 +707,12 @@ static struct testcases uring_cases[] = {
 		.test_func = handle_uring,
 		.msg = "URING: LAM_U57. Dereferencing pointer with metadata\n",
 	},
+	{
+		.later = 0,
+		.lam = LAM_U48_BITS,
+		.test_func = handle_uring,
+		.msg = "URING: LAM_U48. Dereferencing pointer with metadata.\n",
+	},
 	{
 		.later = 1,
 		.expected = 1,
@@ -712,6 +729,12 @@ static struct testcases malloc_cases[] = {
 		.test_func = handle_malloc,
 		.msg = "MALLOC: LAM_U57. Dereferencing pointer with metadata\n",
 	},
+	{
+		.later = 0,
+		.lam = LAM_U48_BITS,
+		.test_func = handle_malloc,
+		.msg = "MALLOC: LAM_U48. Dereferencing pointer with metadata.\n",
+	},
 	{
 		.later = 1,
 		.expected = 2,
@@ -728,6 +751,12 @@ static struct testcases syscall_cases[] = {
 		.test_func = handle_syscall,
 		.msg = "SYSCALL: LAM_U57. syscall with metadata\n",
 	},
+	{
+		.later = 0,
+		.lam = LAM_U48_BITS,
+		.test_func = handle_syscall,
+		.msg = "SYSCALL: LAM_U48. syscall with metadata\n",
+	},
 	{
 		.later = 1,
 		.expected = 1,
@@ -738,6 +767,14 @@ static struct testcases syscall_cases[] = {
 };
 
 static struct testcases mmap_cases[] = {
+	{
+		.later = 0,
+		.expected = 2,
+		.lam = LAM_U48_BITS,
+		.addr = HIGH_ADDR,
+		.test_func = handle_mmap,
+		.msg = "MMAP: [Negative] First LAM_U48, then High address.\n",
+	},
 	{
 		.later = 1,
 		.expected = 0,
@@ -746,6 +783,14 @@ static struct testcases mmap_cases[] = {
 		.test_func = handle_mmap,
 		.msg = "MMAP: First mmap high address, then set LAM_U57.\n",
 	},
+	{
+		.later = 1,
+		.expected = 1,
+		.lam = LAM_U48_BITS,
+		.addr = HIGH_ADDR,
+		.test_func = handle_mmap,
+		.msg = "MMAP: [Negative] First mmap high address, then set LAM_U48.\n",
+	},
 	{
 		.later = 0,
 		.expected = 0,
@@ -762,6 +807,14 @@ static struct testcases mmap_cases[] = {
 		.test_func = handle_mmap,
 		.msg = "MMAP: First LAM_U57, then Low address.\n",
 	},
+	{
+		.later = 0,
+		.expected = 0,
+		.lam = LAM_U48_BITS,
+		.addr = LOW_ADDR,
+		.test_func = handle_mmap,
+		.msg = "MMAP: First LAM_U48, then low address.\n",
+	},
 };
 
 static struct testcases inheritance_cases[] = {
@@ -771,12 +824,24 @@ static struct testcases inheritance_cases[] = {
 		.test_func = handle_inheritance,
 		.msg = "FORK: LAM_U57, child process should get LAM mode same as parent\n",
 	},
+	{
+		.expected = 0,
+		.lam = LAM_U48_BITS,
+		.test_func = handle_inheritance,
+		.msg = "FORK: LAM_U48, child process should get LAM mode same as parent\n",
+	},
 	{
 		.expected = 0,
 		.lam = LAM_U57_BITS,
 		.test_func = handle_execve,
 		.msg = "EXECVE: LAM_U57, child process should get disabled LAM mode\n",
 	},
+	{
+		.expected = 0,
+		.lam = LAM_U48_BITS,
+		.test_func = handle_execve,
+		.msg = "EXECVE: LAM_U48, child process should get disabled LAM mode\n",
+	},
 };
 
 static void cmd_help(void)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCHv5.1 04/13] x86/mm: Handle LAM on context switch
  2022-07-12 23:13 ` [PATCHv5 05/13] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
@ 2022-07-13 15:02   ` Kirill A. Shutemov
  2022-07-20  8:57     ` Alexander Potapenko
  2022-07-21 13:13     ` Alexander Potapenko
  2022-07-21 13:14   ` [PATCHv5 05/13] x86/uaccess: Provide untagged_addr() and remove tags before address check Alexander Potapenko
  1 sibling, 2 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-13 15:02 UTC (permalink / raw)
  To: kirill.shutemov
  Cc: ak, andreyknvl, dave.hansen, dvyukov, glider, hjl.tools, kcc,
	linux-kernel, linux-mm, luto, peterz, rick.p.edgecombe,
	ryabinin.a.a, tarasmadan, x86

The Linear Address Masking mode for userspace pointers is encoded in CR3 bits.
The mode is selected per-process and stored in mm->context.lam_cr3_mask.

switch_mm_irqs_off() now respects the selected LAM mode and constructs CR3
accordingly.

The active LAM mode gets recorded in the tlb_state.
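
A minimal userspace sketch of the bookkeeping, for illustration only: the
CR3 LAM bits are assumed to sit at bit 61 (LAM_U57) and bit 62 (LAM_U48),
as defined by an earlier patch in the series that is not shown here, so
shifting right by 61 lets the mode fit in a u8. All TOY_/toy_ names are
hypothetical, and the PCID is passed in directly, whereas the kernel derives
it from the ASID via kern_pcid().

```c
#include <stdint.h>

/* Assumed bit positions of the LAM controls in CR3. */
#define TOY_CR3_LAM_U57_BIT 61
#define TOY_CR3_LAM_U57     (1ULL << 61)
#define TOY_CR3_LAM_U48     (1ULL << 62)

/* Compact form kept in the per-CPU state: 0 (off), 1 (U57) or 2 (U48). */
uint8_t toy_pack_lam(uint64_t cr3_mask)
{
	return (uint8_t)(cr3_mask >> TOY_CR3_LAM_U57_BIT);
}

/* Expand the compact form back into a CR3 bit mask. */
uint64_t toy_unpack_lam(uint8_t lam)
{
	return (uint64_t)lam << TOY_CR3_LAM_U57_BIT;
}

/* CR3 value: page-table physical address, PCID and LAM bits OR-ed together. */
uint64_t toy_build_cr3(uint64_t pgd_pa, uint16_t pcid, uint64_t lam)
{
	return pgd_pa | pcid | lam;
}
```

Keeping the packed value next to the loaded mm and ASID is what allows the
debug check in switch_mm_irqs_off() to rebuild the expected CR3 without
reading the previous mm again.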

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 v5.1:
  - Fix build issue with CONFIG_MODULE=y
---
 arch/x86/include/asm/mmu.h         |  3 +++
 arch/x86/include/asm/mmu_context.h | 24 +++++++++++++++++
 arch/x86/include/asm/tlbflush.h    | 35 +++++++++++++++++++++++++
 arch/x86/mm/tlb.c                  | 42 +++++++++++++++++++-----------
 4 files changed, 89 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 5d7494631ea9..002889ca8978 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -40,6 +40,9 @@ typedef struct {
 
 #ifdef CONFIG_X86_64
 	unsigned short flags;
+
+	/* Active LAM mode:  X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
+	unsigned long lam_cr3_mask;
 #endif
 
 	struct mutex lock;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b8d40ddeab00..69c943b2ae90 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -91,6 +91,29 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
 }
 #endif
 
+#ifdef CONFIG_X86_64
+static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
+{
+	return mm->context.lam_cr3_mask;
+}
+
+static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
+}
+
+#else
+
+static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
+{
+	return 0;
+}
+
+static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+}
+#endif
+
 #define enter_lazy_tlb enter_lazy_tlb
 extern void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
 
@@ -168,6 +191,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 	arch_dup_pkeys(oldmm, mm);
 	paravirt_arch_dup_mmap(oldmm, mm);
+	dup_lam(oldmm, mm);
 	return ldt_dup_context(oldmm, mm);
 }
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 4af5579c7ef7..efe83d33327f 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -100,6 +100,16 @@ struct tlb_state {
 	 */
 	bool invalidate_other;
 
+#ifdef CONFIG_X86_64
+	/*
+	 * Active LAM mode.
+	 *
+	 * X86_CR3_LAM_U57/U48 shifted right by X86_CR3_LAM_U57_BIT or 0 if LAM
+	 * disabled.
+	 */
+	u8 lam;
+#endif
+
 	/*
 	 * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate
 	 * the corresponding user PCID needs a flush next time we
@@ -356,6 +366,30 @@ static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)
 }
 #define huge_pmd_needs_flush huge_pmd_needs_flush
 
+#ifdef CONFIG_X86_64
+static inline unsigned long tlbstate_lam_cr3_mask(void)
+{
+	unsigned long lam = this_cpu_read(cpu_tlbstate.lam);
+
+	return lam << X86_CR3_LAM_U57_BIT;
+}
+
+static inline void set_tlbstate_cr3_lam_mask(unsigned long mask)
+{
+	this_cpu_write(cpu_tlbstate.lam, mask >> X86_CR3_LAM_U57_BIT);
+}
+
+#else
+
+static inline unsigned long tlbstate_lam_cr3_mask(void)
+{
+	return 0;
+}
+
+static inline void set_tlbstate_cr3_lam_mask(u64 mask)
+{
+}
+#endif
 #endif /* !MODULE */
 
 static inline void __native_tlb_flush_global(unsigned long cr4)
@@ -363,4 +397,5 @@ static inline void __native_tlb_flush_global(unsigned long cr4)
 	native_write_cr4(cr4 ^ X86_CR4_PGE);
 	native_write_cr4(cr4);
 }
+
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d400b6d9d246..4c93f87a8928 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -154,17 +154,18 @@ static inline u16 user_pcid(u16 asid)
 	return ret;
 }
 
-static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
+static inline unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam)
 {
 	if (static_cpu_has(X86_FEATURE_PCID)) {
-		return __sme_pa(pgd) | kern_pcid(asid);
+		return __sme_pa(pgd) | kern_pcid(asid) | lam;
 	} else {
 		VM_WARN_ON_ONCE(asid != 0);
-		return __sme_pa(pgd);
+		return __sme_pa(pgd) | lam;
 	}
 }
 
-static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
+static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid,
+					      unsigned long lam)
 {
 	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
 	/*
@@ -173,7 +174,7 @@ static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
 	 * boot because all CPU's the have same capabilities:
 	 */
 	VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
-	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
+	return __sme_pa(pgd) | kern_pcid(asid) | lam | CR3_NOFLUSH;
 }
 
 /*
@@ -274,15 +275,16 @@ static inline void invalidate_user_asid(u16 asid)
 		  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
 }
 
-static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
+static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, unsigned long lam,
+			    bool need_flush)
 {
 	unsigned long new_mm_cr3;
 
 	if (need_flush) {
 		invalidate_user_asid(new_asid);
-		new_mm_cr3 = build_cr3(pgdir, new_asid);
+		new_mm_cr3 = build_cr3(pgdir, new_asid, lam);
 	} else {
-		new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);
+		new_mm_cr3 = build_cr3_noflush(pgdir, new_asid, lam);
 	}
 
 	/*
@@ -491,6 +493,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 {
 	struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	unsigned long prev_lam = tlbstate_lam_cr3_mask();
+	unsigned long new_lam = mm_lam_cr3_mask(next);
 	bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy);
 	unsigned cpu = smp_processor_id();
 	u64 next_tlb_gen;
@@ -520,7 +524,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * isn't free.
 	 */
 #ifdef CONFIG_DEBUG_VM
-	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
+	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid, prev_lam))) {
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
 		 * the system so hard that we don't see the call trace.
@@ -622,15 +626,16 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		barrier();
 	}
 
+	set_tlbstate_cr3_lam_mask(new_lam);
 	if (need_flush) {
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
-		load_new_mm_cr3(next->pgd, new_asid, true);
+		load_new_mm_cr3(next->pgd, new_asid, new_lam, true);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 	} else {
 		/* The new ASID is already up to date. */
-		load_new_mm_cr3(next->pgd, new_asid, false);
+		load_new_mm_cr3(next->pgd, new_asid, new_lam, false);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
 	}
@@ -691,6 +696,10 @@ void initialize_tlbstate_and_flush(void)
 	/* Assert that CR3 already references the right mm. */
 	WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
 
+	/* LAM expected to be disabled in CR3 and init_mm */
+	WARN_ON(cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57));
+	WARN_ON(mm_lam_cr3_mask(&init_mm));
+
 	/*
 	 * Assert that CR4.PCIDE is set if needed.  (CR4.PCIDE initialization
 	 * doesn't work like other CR4 bits because it can only be set from
@@ -699,8 +708,8 @@ void initialize_tlbstate_and_flush(void)
 	WARN_ON(boot_cpu_has(X86_FEATURE_PCID) &&
 		!(cr4_read_shadow() & X86_CR4_PCIDE));
 
-	/* Force ASID 0 and force a TLB flush. */
-	write_cr3(build_cr3(mm->pgd, 0));
+	/* Disable LAM, force ASID 0 and force a TLB flush. */
+	write_cr3(build_cr3(mm->pgd, 0, 0));
 
 	/* Reinitialize tlbstate. */
 	this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_INIT);
@@ -708,6 +717,7 @@ void initialize_tlbstate_and_flush(void)
 	this_cpu_write(cpu_tlbstate.next_asid, 1);
 	this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
 	this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen);
+	set_tlbstate_cr3_lam_mask(0);
 
 	for (i = 1; i < TLB_NR_DYN_ASIDS; i++)
 		this_cpu_write(cpu_tlbstate.ctxs[i].ctx_id, 0);
@@ -1047,8 +1057,10 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
  */
 unsigned long __get_current_cr3_fast(void)
 {
-	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
-		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+	unsigned long cr3 =
+		build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
+		this_cpu_read(cpu_tlbstate.loaded_mm_asid),
+		tlbstate_lam_cr3_mask());
 
 	/* For now, be very restrictive about when this can be called. */
 	VM_WARN_ON(in_nmi() || preemptible());
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 00/13] Linear Address Masking enabling
  2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
                   ` (12 preceding siblings ...)
  2022-07-12 23:13 ` [PATCHv5 OPTIONAL 13/13] selftests/x86/lam: Add tests cases for LAM_U48 Kirill A. Shutemov
@ 2022-07-18 17:39 ` Alexander Potapenko
  2022-07-20  0:59   ` Kirill A. Shutemov
  13 siblings, 1 reply; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-18 17:39 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Linear Address Masking[1] (LAM) modifies the checking that is applied to
> 64-bit linear addresses, allowing software to use of the untranslated
> address bits for metadata.
>
> The patchset brings support for LAM for userspace addresses.
>
> LAM_U48 enabling is controversial since it competes for bits with
> 5-level paging. Its enabling isolated into an optional last patch that
> can be applied at maintainer's discretion.

I believe having optional patches will put an unnecessary burden on
distro maintainers.
Soon after U48 support lands, other changes will start piling on top
of it, and it will be impossible to maintain a kernel with this patch
removed.
It also won't make any difference for upstream, where this patch
will always be present.

We'd better decide now whether we need U48 or not, and either keep it
or delete it.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 06/13] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-07-12 23:13 ` [PATCHv5 06/13] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
@ 2022-07-18 17:47   ` Alexander Potapenko
  2022-07-20  0:57     ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-18 17:47 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Add a couple of arch_prctl() handles:
>
>  - ARCH_ENABLE_TAGGED_ADDR enabled LAM. The argument is required number
>    of tag bits. It is rounded up to the nearest LAM mode that can
>    provide it. For now only LAM_U57 is supported, with 6 tag bits.
>
>  - ARCH_GET_UNTAG_MASK returns untag mask. It can indicates where tag
>    bits located in the address.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/uapi/asm/prctl.h |  3 ++
>  arch/x86/kernel/process_64.c      | 60 ++++++++++++++++++++++++++++++-
>  2 files changed, 62 insertions(+), 1 deletion(-)


> +
> +static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
> +{
> +       int ret = 0;
> +
> +       if (!cpu_feature_enabled(X86_FEATURE_LAM))
> +               return -ENODEV;

Hm, I used to think ENODEV is specific to devices, and -EINVAL is more
appropriate here.
On the other hand, e.g. prctl(PR_SET_SPECULATION_CTRL) can also return ENODEV...


>  long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
>  {
>         int ret = 0;
> @@ -829,7 +883,11 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
>         case ARCH_MAP_VDSO_64:
>                 return prctl_map_vdso(&vdso_image_64, arg2);
>  #endif
> -
> +       case ARCH_GET_UNTAG_MASK:
> +               return put_user(task->mm->context.untag_mask,
> +                               (unsigned long __user *)arg2);

Can we have ARCH_GET_UNTAG_MASK return the same error value (ENODEV or
EINVAL) as ARCH_ENABLE_TAGGED_ADDR in the case the host doesn't
support LAM?
After all, the mask does not make much sense in this case.

> +       case ARCH_ENABLE_TAGGED_ADDR:
> +               return prctl_enable_tagged_addr(task->mm, arg2);
>         default:
>                 ret = -EINVAL;
>                 break;
> --
> 2.35.1
>


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 06/13] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-07-18 17:47   ` Alexander Potapenko
@ 2022-07-20  0:57     ` Kirill A. Shutemov
  2022-07-20  8:19       ` Alexander Potapenko
  0 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-20  0:57 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Mon, Jul 18, 2022 at 07:47:44PM +0200, Alexander Potapenko wrote:
> On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > Add a couple of arch_prctl() handles:
> >
> >  - ARCH_ENABLE_TAGGED_ADDR enabled LAM. The argument is required number
> >    of tag bits. It is rounded up to the nearest LAM mode that can
> >    provide it. For now only LAM_U57 is supported, with 6 tag bits.
> >
> >  - ARCH_GET_UNTAG_MASK returns untag mask. It can indicates where tag
> >    bits located in the address.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/include/uapi/asm/prctl.h |  3 ++
> >  arch/x86/kernel/process_64.c      | 60 ++++++++++++++++++++++++++++++-
> >  2 files changed, 62 insertions(+), 1 deletion(-)
> 
> 
> > +
> > +static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
> > +{
> > +       int ret = 0;
> > +
> > +       if (!cpu_feature_enabled(X86_FEATURE_LAM))
> > +               return -ENODEV;
> 
> Hm, I used to think ENODEV is specific to devices, and -EINVAL is more
> appropriate here.
> On the other hand, e.g. prctl(PR_SET_SPECULATION_CTRL) can also return ENODEV...

I'm fine either way, although there are way too many -EINVALs around, so
it does not communicate much to the user.

> >  long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> >  {
> >         int ret = 0;
> > @@ -829,7 +883,11 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> >         case ARCH_MAP_VDSO_64:
> >                 return prctl_map_vdso(&vdso_image_64, arg2);
> >  #endif
> > -
> > +       case ARCH_GET_UNTAG_MASK:
> > +               return put_user(task->mm->context.untag_mask,
> > +                               (unsigned long __user *)arg2);
> 
> Can we have ARCH_GET_UNTAG_MASK return the same error value (ENODEV or
> EINVAL) as ARCH_ENABLE_TAGGED_ADDR in the case the host doesn't
> support LAM?
> After all, the mask does not make much sense in this case.

I'm not sure about this.

As it is, ARCH_GET_UNTAG_MASK returns a -1UL mask if LAM is not present or
not enabled. Applying this mask will give the correct result in both cases.

Why is -ENODEV better here? Looks like just more work for userspace.
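
To illustrate the point (a sketch for user pointers only; the names are made
up and this is not the kernel's untagged_addr()), userspace can apply
whatever mask it got back without branching on whether LAM is active:

```c
#include <stdint.h>

/* Mask reported when LAM is absent or disabled: keeps every bit. */
#define TOY_UNTAG_MASK_NONE (~0ULL)
/* Mask reported for LAM_U57: clears tag bits 62:57. */
#define TOY_UNTAG_MASK_U57  (~(0x3fULL << 57))

/* Strip the tag from a user pointer with a single AND. */
uint64_t toy_untag(uint64_t addr, uint64_t untag_mask)
{
	return addr & untag_mask;
}
```

With the all-ones mask the address comes back unchanged, so the same code
path works on hardware without LAM.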

> 
> > +       case ARCH_ENABLE_TAGGED_ADDR:
> > +               return prctl_enable_tagged_addr(task->mm, arg2);
> >         default:
> >                 ret = -EINVAL;
> >                 break;
> > --
> > 2.35.1
> >
> 
> 
> -- 
> Alexander Potapenko
> Software Engineer
> 
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
> 
> Geschäftsführer: Paul Manicle, Liana Sebastian
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 00/13] Linear Address Masking enabling
  2022-07-18 17:39 ` [PATCHv5 00/13] Linear Address Masking enabling Alexander Potapenko
@ 2022-07-20  0:59   ` Kirill A. Shutemov
  2022-07-21 13:09     ` Alexander Potapenko
  2022-07-21 17:07     ` Dave Hansen
  0 siblings, 2 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-20  0:59 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Alexander Potapenko
  Cc: Peter Zijlstra, the arch/x86 maintainers, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Taras Madan, Dmitry Vyukov,
	H . J . Lu, Andi Kleen, Rick Edgecombe,
	Linux Memory Management List, LKML

On Mon, Jul 18, 2022 at 07:39:22PM +0200, Alexander Potapenko wrote:
> On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > 64-bit linear addresses, allowing software to use of the untranslated
> > address bits for metadata.
> >
> > The patchset brings support for LAM for userspace addresses.
> >
> > LAM_U48 enabling is controversial since it competes for bits with
> > 5-level paging. Its enabling isolated into an optional last patch that
> > can be applied at maintainer's discretion.
> 
> I believe having optional patches will put unnecessary burden on
> distro maintainers.
> Soon after landing U48 support other changes will start piling on top
> of it, and it will be impossible to maintain a kernel with this patch
> removed.
> It also won't make any difference for the upstream, where this patch
> will be always present.
> 
> We'd better decide now whether we need U48 or not, and either keep it
> or delete it.

Dave, Andy, any position on this?

I wrote LAM_U48 support to prove that the interface is flexible enough, but
I see why it can be a problem if a distro picks it up ahead of
upstream.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 06/13] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-07-20  0:57     ` Kirill A. Shutemov
@ 2022-07-20  8:19       ` Alexander Potapenko
  2022-07-20 12:47         ` Kirill A. Shutemov
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-20  8:19 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 20, 2022 at 2:57 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Mon, Jul 18, 2022 at 07:47:44PM +0200, Alexander Potapenko wrote:
> > On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > Add a couple of arch_prctl() handles:
> > >
> > >  - ARCH_ENABLE_TAGGED_ADDR enables LAM. The argument is the required number
> > >    of tag bits. It is rounded up to the nearest LAM mode that can
> > >    provide it. For now only LAM_U57 is supported, with 6 tag bits.
> > >
> > >  - ARCH_GET_UNTAG_MASK returns the untag mask. It indicates where the tag
> > >    bits are located in the address.
> > >
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > ---
> > >  arch/x86/include/uapi/asm/prctl.h |  3 ++
> > >  arch/x86/kernel/process_64.c      | 60 ++++++++++++++++++++++++++++++-
> > >  2 files changed, 62 insertions(+), 1 deletion(-)
> >
> >
> > > +
> > > +static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
> > > +{
> > > +       int ret = 0;
> > > +
> > > +       if (!cpu_feature_enabled(X86_FEATURE_LAM))
> > > +               return -ENODEV;
> >
> > Hm, I used to think ENODEV is specific to devices, and -EINVAL is more
> > appropriate here.
> > On the other hand, e.g. prctl(PR_SET_SPECULATION_CTRL) can also return ENODEV...
>
> I'm fine either way. Although there are way too many -EINVALs around, so
> it does not communicate much to user.
>
> > >  long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> > >  {
> > >         int ret = 0;
> > > @@ -829,7 +883,11 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> > >         case ARCH_MAP_VDSO_64:
> > >                 return prctl_map_vdso(&vdso_image_64, arg2);
> > >  #endif
> > > -
> > > +       case ARCH_GET_UNTAG_MASK:
> > > +               return put_user(task->mm->context.untag_mask,
> > > +                               (unsigned long __user *)arg2);
> >
> > Can we have ARCH_GET_UNTAG_MASK return the same error value (ENODEV or
> > EINVAL) as ARCH_ENABLE_TAGGED_ADDR in the case the host doesn't
> > support LAM?
> > After all, the mask does not make much sense in this case.
>
> I'm not sure about this.
>
> As it is ARCH_GET_UNTAG_MASK returns -1UL mask if LAM is not present or
> not enabled. Applying this mask will give correct result for both.

Is anyone going to use this mask if tagging is unsupported?
Tools like HWASan won't even try to proceed in that case.

> Why is -ENODEV better here? Looks like just more work for userspace.

This boils down to the question of detecting LAM support I raised previously.
It's nice to have a syscall without side effects to check whether LAM
can be enabled at all (e.g. one can do the check in the parent process
and conditionally enable LAM in certain, but not all, child processes).
CPUID won't help here, because the presence of the LAM bit in CPUID
doesn't guarantee its support in the kernel, and every other solution
is more complicated than just issuing a system call.

Note that TBI has PR_GET_TAGGED_ADDR_CTRL, which can be used to detect
the presence of memory tagging support.
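
To make the side-effect-free query concrete, here is a minimal userspace
sketch. It assumes the ARCH_GET_UNTAG_MASK value (0x4001) from the uapi hunk
in this series; query_untag_mask() is an illustrative name, not an existing
API. On a kernel without this arch_prctl() the call fails and the helper
falls back to the all-ones mask, which is exactly the ambiguity discussed
above: a failure or an all-ones mask alone does not say whether LAM could be
enabled.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value from the uapi hunk in this series; older headers do not define it. */
#ifndef ARCH_GET_UNTAG_MASK
#define ARCH_GET_UNTAG_MASK 0x4001
#endif

/*
 * Illustrative helper: ask the kernel for the untag mask.
 * Returns 0 on success. On any failure it returns -1 and stores the
 * no-op mask (~0UL), so callers can still apply the mask blindly.
 */
static int query_untag_mask(unsigned long *mask)
{
	*mask = ~0UL;
#ifdef SYS_arch_prctl
	if (syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, mask) == 0)
		return 0;
#else
	errno = ENOSYS;	/* architecture has no arch_prctl() at all */
#endif
	*mask = ~0UL;
	return -1;
}

/* Usage: either the kernel answered, or the fallback mask is in place. */
static void query_untag_mask_example(void)
{
	unsigned long mask;
	int rc = query_untag_mask(&mask);

	assert(rc == 0 || mask == ~0UL);
}
```

The query does not change any process state, so a parent can run it before
deciding which children should request tagging.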

>
> >
> > > +       case ARCH_ENABLE_TAGGED_ADDR:
> > > +               return prctl_enable_tagged_addr(task->mm, arg2);
> > >         default:
> > >                 ret = -EINVAL;
> > >                 break;
> > > --
> > > 2.35.1
> > >
> >
> >



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5.1 04/13] x86/mm: Handle LAM on context switch
  2022-07-13 15:02   ` [PATCHv5.1 04/13] x86/mm: Handle LAM on context switch Kirill A. Shutemov
@ 2022-07-20  8:57     ` Alexander Potapenko
  2022-07-20 12:38       ` Kirill A. Shutemov
  2022-07-21 13:13     ` Alexander Potapenko
  1 sibling, 1 reply; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-20  8:57 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andi Kleen, Andrey Konovalov, Dave Hansen, Dmitriy Vyukov,
	H.J. Lu, Kostya Serebryany, LKML, Linux Memory Management List,
	Andy Lutomirski, Peter Zijlstra, Rick Edgecombe, Andrey Ryabinin,
	Taras Madan, the arch/x86 maintainers

>         /*
> @@ -491,6 +493,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  {
>         struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
>         u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
> +       unsigned long prev_lam = tlbstate_lam_cr3_mask();
Note: this variable is never used if CONFIG_DEBUG_VM is off.

>  #ifdef CONFIG_DEBUG_VM
> -       if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
> +       if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid, prev_lam))) {
>                 /*
>                  * If we were to BUG here, we'd be very likely to kill
>                  * the system so hard that we don't see the call trace.
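
For readers following the hunk above, a simplified model of why the debug
check needs the LAM bits: once a LAM bit is set in CR3, a value rebuilt from
the page table address and ASID alone no longer matches what the hardware
holds. model_build_cr3() and the constants are illustrative stand-ins, not
the kernel's build_cr3(); the bit positions are the ones defined in patch 02.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_CR3_LAM_U57 (UINT64_C(1) << 61)	/* X86_CR3_LAM_U57_BIT */
#define MODEL_CR3_LAM_U48 (UINT64_C(1) << 62)	/* X86_CR3_LAM_U48_BIT */

/* Simplified: page table physical address | PCID | LAM control bits. */
static inline uint64_t model_build_cr3(uint64_t pgd_pa, uint16_t pcid,
				       uint64_t lam)
{
	return pgd_pa | pcid | lam;
}

static void model_build_cr3_example(void)
{
	uint64_t pgd_pa = UINT64_C(0x123456000);
	/* What the CPU would hold for an mm with LAM_U57 enabled. */
	uint64_t hw_cr3 = model_build_cr3(pgd_pa, 3, MODEL_CR3_LAM_U57);

	/* Rebuilding without the LAM bits triggers a spurious mismatch. */
	assert(hw_cr3 != model_build_cr3(pgd_pa, 3, 0));
	/* Passing the previous LAM mask makes the comparison meaningful. */
	assert(hw_cr3 == model_build_cr3(pgd_pa, 3, MODEL_CR3_LAM_U57));
}
```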

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5.1 04/13] x86/mm: Handle LAM on context switch
  2022-07-20  8:57     ` Alexander Potapenko
@ 2022-07-20 12:38       ` Kirill A. Shutemov
  0 siblings, 0 replies; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-20 12:38 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Andi Kleen, Andrey Konovalov, Dave Hansen, Dmitriy Vyukov,
	H.J. Lu, Kostya Serebryany, LKML, Linux Memory Management List,
	Andy Lutomirski, Peter Zijlstra, Rick Edgecombe, Andrey Ryabinin,
	Taras Madan, the arch/x86 maintainers

On Wed, Jul 20, 2022 at 10:57:01AM +0200, Alexander Potapenko wrote:
> >         /*
> > @@ -491,6 +493,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> >  {
> >         struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
> >         u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
> > +       unsigned long prev_lam = tlbstate_lam_cr3_mask();
> Note: this variable is never used if CONFIG_DEBUG_VM is off.

Good point. I will add this:

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 4c93f87a8928..5e9ed9f55c36 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -558,6 +558,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	if (real_prev == next) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			   next->context.ctx_id);
+		VM_WARN_ON(prev_lam != new_lam);

 		/*
 		 * Even in lazy TLB mode, the CPU should stay set in the
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 06/13] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-07-20  8:19       ` Alexander Potapenko
@ 2022-07-20 12:47         ` Kirill A. Shutemov
  2022-07-20 12:54           ` Alexander Potapenko
  0 siblings, 1 reply; 33+ messages in thread
From: Kirill A. Shutemov @ 2022-07-20 12:47 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 20, 2022 at 10:19:36AM +0200, Alexander Potapenko wrote:
> > > >  long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> > > >  {
> > > >         int ret = 0;
> > > > @@ -829,7 +883,11 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> > > >         case ARCH_MAP_VDSO_64:
> > > >                 return prctl_map_vdso(&vdso_image_64, arg2);
> > > >  #endif
> > > > -
> > > > +       case ARCH_GET_UNTAG_MASK:
> > > > +               return put_user(task->mm->context.untag_mask,
> > > > +                               (unsigned long __user *)arg2);
> > >
> > > Can we have ARCH_GET_UNTAG_MASK return the same error value (ENODEV or
> > > EINVAL) as ARCH_ENABLE_TAGGED_ADDR in the case the host doesn't
> > > support LAM?
> > > After all, the mask does not make much sense in this case.
> >
> > I'm not sure about this.
> >
> > As it is ARCH_GET_UNTAG_MASK returns -1UL mask if LAM is not present or
> > not enabled. Applying this mask will give correct result for both.
> 
> Is anyone going to use this mask if tagging is unsupported?
> Tools like HWASan won't even try to proceed in that case.

I can imagine code that tries to be indifferent to whether a pointer
has tags. It gets the mask from ARCH_GET_UNTAG_MASK and applies it to the
pointer unconditionally.
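
A rough sketch of such tag-agnostic code, for illustration only. untag_ptr()
is a made-up name, and the LAM_U57 mask below is an assumption derived from
the "62:57 bits masked" comment in patch 02 rather than copied from the
kernel sources:

```c
#include <assert.h>
#include <stdint.h>

/* Mask seen when LAM is absent or not enabled: every bit is significant. */
#define UNTAG_MASK_NONE    (~UINT64_C(0))
/* Assumed LAM_U57 mask: clear tag bits 62:57, keep everything else. */
#define UNTAG_MASK_LAM_U57 (~(UINT64_C(0x3f) << 57))

/* Strip the tag unconditionally; with UNTAG_MASK_NONE this is a no-op. */
static inline uint64_t untag_ptr(uint64_t ptr, uint64_t untag_mask)
{
	return ptr & untag_mask;
}

/* Usage: the same call is correct whether or not tagging is active. */
static void untag_ptr_example(void)
{
	uint64_t addr = UINT64_C(0x00007f0012345678);
	uint64_t tagged = addr | (UINT64_C(0x2a) << 57);

	assert(untag_ptr(addr, UNTAG_MASK_NONE) == addr);
	assert(untag_ptr(tagged, UNTAG_MASK_LAM_U57) == addr);
}
```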

> > Why is -ENODEV better here? Looks like just more work for userspace.
> 
> This boils down to the question of detecting LAM support I raised previously.
> It's nice to have a syscall without side effects to check whether LAM
> can be enabled at all (e.g. one can do the check in the parent process
> and conditionally enable LAM in certain, but not all, child processes)
> CPUID won't help here, because the presence of the LAM bit in CPUID
> doesn't guarantee its support in the kernel, and every other solution
> is more complicated than just issuing a system call.
> 
> Note that TBI has PR_GET_TAGGED_ADDR_CTRL, which can be used to detect
> the presence of memory tagging support.

I would rather make enumeration explicit:

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 38164a05c23c..a31e27b95b19 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -22,5 +22,6 @@

 #define ARCH_GET_UNTAG_MASK		0x4001
 #define ARCH_ENABLE_TAGGED_ADDR		0x4002
+#define ARCH_GET_MAX_TAG_BITS		0x4003

 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index cfa2e42a135a..2e4df63b775f 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -911,6 +911,13 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 				(unsigned long __user *)arg2);
 	case ARCH_ENABLE_TAGGED_ADDR:
 		return prctl_enable_tagged_addr(task->mm, arg2);
+	case ARCH_GET_MAX_TAG_BITS:
+		if (!cpu_feature_enabled(X86_FEATURE_LAM))
+			return put_user(0, (unsigned long __user *)arg2);
+		else if (lam_u48_allowed())
+			return put_user(15, (unsigned long __user *)arg2);
+		else
+			return put_user(6, (unsigned long __user *)arg2);
 	default:
 		ret = -EINVAL;
 		break;
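
For reference, the decision made by the hunk above can be modelled in plain C
as below. This is a userspace model, not kernel code: max_tag_bits() mirrors
the three branches of the proposed ARCH_GET_MAX_TAG_BITS, and
tag_bits_available() is a hypothetical helper showing how a tool might use
the result before calling ARCH_ENABLE_TAGGED_ADDR.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the value the proposed ARCH_GET_MAX_TAG_BITS would report. */
static inline unsigned long max_tag_bits(bool cpu_has_lam, bool lam_u48_allowed)
{
	if (!cpu_has_lam)
		return 0;			/* LAM unavailable */
	return lam_u48_allowed ? 15 : 6;	/* LAM_U48 : LAM_U57 */
}

/* Hypothetical check: can a tool that needs `wanted` tag bits run here? */
static inline bool tag_bits_available(unsigned long wanted,
				      unsigned long max_bits)
{
	return wanted != 0 && wanted <= max_bits;
}

static void max_tag_bits_example(void)
{
	/* A tool needing 6 tag bits fits into LAM_U57. */
	assert(tag_bits_available(6, max_tag_bits(true, false)));
	/* Nothing is available when the CPU lacks LAM. */
	assert(!tag_bits_available(6, max_tag_bits(false, false)));
}
```

A result of 0 doubles as the "LAM is not usable" answer, so no separate
error code is needed for enumeration.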
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 06/13] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-07-20 12:47         ` Kirill A. Shutemov
@ 2022-07-20 12:54           ` Alexander Potapenko
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-20 12:54 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 20, 2022 at 2:47 PM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Wed, Jul 20, 2022 at 10:19:36AM +0200, Alexander Potapenko wrote:
> > > > >  long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> > > > >  {
> > > > >         int ret = 0;
> > > > > @@ -829,7 +883,11 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> > > > >         case ARCH_MAP_VDSO_64:
> > > > >                 return prctl_map_vdso(&vdso_image_64, arg2);
> > > > >  #endif
> > > > > -
> > > > > +       case ARCH_GET_UNTAG_MASK:
> > > > > +               return put_user(task->mm->context.untag_mask,
> > > > > +                               (unsigned long __user *)arg2);
> > > >
> > > > Can we have ARCH_GET_UNTAG_MASK return the same error value (ENODEV or
> > > > EINVAL) as ARCH_ENABLE_TAGGED_ADDR in the case the host doesn't
> > > > support LAM?
> > > > After all, the mask does not make much sense in this case.
> > >
> > > I'm not sure about this.
> > >
> > > As it is ARCH_GET_UNTAG_MASK returns -1UL mask if LAM is not present or
> > > not enabled. Applying this mask will give correct result for both.
> >
> > Is anyone going to use this mask if tagging is unsupported?
> > Tools like HWASan won't even try to proceed in that case.
>
> I can imagine the code that tries to be indifferent to whether a pointer
> has tags. It gets mask from ARCH_GET_UNTAG_MASK and applies it to the
> pointer without any conditions.

In that case there would still be just one call to ARCH_GET_UNTAG_MASK
to get the mask that will probably be applied many times.
So there's not a big difference from checking for -ENODEV and setting
that mask manually.
But your proposal with a special arch_prctl indeed looks cleaner.

> > > Why is -ENODEV better here? Looks like just more work for userspace.
> >
> > This boils down to the question of detecting LAM support I raised previously.
> > It's nice to have a syscall without side effects to check whether LAM
> > can be enabled at all (e.g. one can do the check in the parent process
> > and conditionally enable LAM in certain, but not all, child processes)
> > CPUID won't help here, because the presence of the LAM bit in CPUID
> > doesn't guarantee its support in the kernel, and every other solution
> > is more complicated than just issuing a system call.
> >
> > Note that TBI has PR_GET_TAGGED_ADDR_CTRL, which can be used to detect
> > the presence of memory tagging support.
>
> I would rather make enumeration explicit:

Ok, this would also work. Thanks!

> diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
> index 38164a05c23c..a31e27b95b19 100644
> --- a/arch/x86/include/uapi/asm/prctl.h
> +++ b/arch/x86/include/uapi/asm/prctl.h
> @@ -22,5 +22,6 @@
>
>  #define ARCH_GET_UNTAG_MASK            0x4001
>  #define ARCH_ENABLE_TAGGED_ADDR                0x4002
> +#define ARCH_GET_MAX_TAG_BITS          0x4003
>
>  #endif /* _ASM_X86_PRCTL_H */
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index cfa2e42a135a..2e4df63b775f 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -911,6 +911,13 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
>                                 (unsigned long __user *)arg2);
>         case ARCH_ENABLE_TAGGED_ADDR:
>                 return prctl_enable_tagged_addr(task->mm, arg2);
> +       case ARCH_GET_MAX_TAG_BITS:
> +               if (!cpu_feature_enabled(X86_FEATURE_LAM))
> +                       return put_user(0, (unsigned long __user *)arg2);
> +               else if (lam_u48_allowed())
> +                       return put_user(15, (unsigned long __user *)arg2);
> +               else
> +                       return put_user(6, (unsigned long __user *)arg2);
>         default:
>                 ret = -EINVAL;
>                 break;
> --
>  Kirill A. Shutemov



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 00/13] Linear Address Masking enabling
  2022-07-20  0:59   ` Kirill A. Shutemov
@ 2022-07-21 13:09     ` Alexander Potapenko
  2022-07-21 17:07     ` Dave Hansen
  1 sibling, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-21 13:09 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 20, 2022 at 2:59 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Mon, Jul 18, 2022 at 07:39:22PM +0200, Alexander Potapenko wrote:
> > On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > Linear Address Masking[1] (LAM) modifies the checking that is applied to
> > > 64-bit linear addresses, allowing software to use of the untranslated
> > > address bits for metadata.
> > >
> > > The patchset brings support for LAM for userspace addresses.

For what it's worth, there's an LLVM bot running basic HWASan tests on
QEMU with the latest LAM patches here:
https://lab.llvm.org/buildbot/#/builders/169
So far the bot is happy, giving us some sense of LAM_U57 support being sane.
I'll add some tags to individual patches.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 01/13] x86/mm: Fix CR3_ADDR_MASK
  2022-07-12 23:13 ` [PATCHv5 01/13] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
@ 2022-07-21 13:10   ` Alexander Potapenko
  2022-07-29  3:00   ` Hu, Robert
  1 sibling, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-21 13:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> The mask must not include bits above the physical address mask. These bits
> are reserved and can be used for other things. Bits 61 and 62 are used
> for Linear Address Masking.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>

> ---
>  arch/x86/include/asm/processor-flags.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
> index 02c2cbda4a74..a7f3d9100adb 100644
> --- a/arch/x86/include/asm/processor-flags.h
> +++ b/arch/x86/include/asm/processor-flags.h
> @@ -35,7 +35,7 @@
>   */
>  #ifdef CONFIG_X86_64
>  /* Mask off the address space ID and SME encryption bits. */
> -#define CR3_ADDR_MASK  __sme_clr(0x7FFFFFFFFFFFF000ull)
> +#define CR3_ADDR_MASK  __sme_clr(PHYSICAL_PAGE_MASK)
>  #define CR3_PCID_MASK  0xFFFull
>  #define CR3_NOFLUSH    BIT_ULL(63)
>
> --
> 2.35.1
>
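
A small worked example of the problem being fixed, as a sketch under stated
assumptions: MODEL_PHYSICAL_PAGE_MASK assumes 52 physical address bits, 4 KiB
pages and no SME bit to clear, and cr3_to_pgd_pa() is an illustrative helper.
With the old constant, a CR3 value carrying a LAM bit yields a bogus page
table address; with the physical mask it does not.

```c
#include <assert.h>
#include <stdint.h>

#define OLD_CR3_ADDR_MASK        UINT64_C(0x7FFFFFFFFFFFF000)
/* Assumption: 52 physical address bits, 4 KiB pages, SME not in use. */
#define MODEL_PHYSICAL_PAGE_MASK (((UINT64_C(1) << 52) - 1) & ~UINT64_C(0xfff))
#define MODEL_CR3_LAM_U57        (UINT64_C(1) << 61)

/* Extract the page table physical address from a CR3 value. */
static inline uint64_t cr3_to_pgd_pa(uint64_t cr3, uint64_t addr_mask)
{
	return cr3 & addr_mask;
}

static void cr3_addr_mask_example(void)
{
	uint64_t pgd_pa = UINT64_C(0x123456000);
	uint64_t cr3 = pgd_pa | 0x5 /* PCID */ | MODEL_CR3_LAM_U57;

	/* Old mask: bit 61 leaks into the "address". */
	assert(cr3_to_pgd_pa(cr3, OLD_CR3_ADDR_MASK) ==
	       (pgd_pa | MODEL_CR3_LAM_U57));
	/* New mask: only the physical address survives. */
	assert(cr3_to_pgd_pa(cr3, MODEL_PHYSICAL_PAGE_MASK) == pgd_pa);
}
```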


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 02/13] x86: CPUID and CR3/CR4 flags for Linear Address Masking
  2022-07-12 23:13 ` [PATCHv5 02/13] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
@ 2022-07-21 13:10   ` Alexander Potapenko
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-21 13:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Enumerate Linear Address Masking and provide defines for CR3 and CR4
> flags.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>


> ---
>  arch/x86/include/asm/cpufeatures.h          | 1 +
>  arch/x86/include/uapi/asm/processor-flags.h | 6 ++++++
>  2 files changed, 7 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 03acc823838a..6ad5841e087f 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -300,6 +300,7 @@
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX_VNNI           (12*32+ 4) /* AVX VNNI instructions */
>  #define X86_FEATURE_AVX512_BF16                (12*32+ 5) /* AVX512 BFLOAT16 instructions */
> +#define X86_FEATURE_LAM                        (12*32+26) /* Linear Address Masking */
>
>  /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
>  #define X86_FEATURE_CLZERO             (13*32+ 0) /* CLZERO instruction */
> diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
> index c47cc7f2feeb..d898432947ff 100644
> --- a/arch/x86/include/uapi/asm/processor-flags.h
> +++ b/arch/x86/include/uapi/asm/processor-flags.h
> @@ -82,6 +82,10 @@
>  #define X86_CR3_PCID_BITS      12
>  #define X86_CR3_PCID_MASK      (_AC((1UL << X86_CR3_PCID_BITS) - 1, UL))
>
> +#define X86_CR3_LAM_U57_BIT    61 /* Activate LAM for userspace, 62:57 bits masked */
> +#define X86_CR3_LAM_U57                _BITULL(X86_CR3_LAM_U57_BIT)
> +#define X86_CR3_LAM_U48_BIT    62 /* Activate LAM for userspace, 62:48 bits masked */
> +#define X86_CR3_LAM_U48                _BITULL(X86_CR3_LAM_U48_BIT)
>  #define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
>  #define X86_CR3_PCID_NOFLUSH    _BITULL(X86_CR3_PCID_NOFLUSH_BIT)
>
> @@ -132,6 +136,8 @@
>  #define X86_CR4_PKE            _BITUL(X86_CR4_PKE_BIT)
>  #define X86_CR4_CET_BIT                23 /* enable Control-flow Enforcement Technology */
>  #define X86_CR4_CET            _BITUL(X86_CR4_CET_BIT)
> +#define X86_CR4_LAM_SUP_BIT    28 /* LAM for supervisor pointers */
> +#define X86_CR4_LAM_SUP                _BITUL(X86_CR4_LAM_SUP_BIT)
>
>  /*
>   * x86-64 Task Priority Register, CR8
> --
> 2.35.1
>


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 03/13] mm: Pass down mm_struct to untagged_addr()
  2022-07-12 23:13 ` [PATCHv5 03/13] mm: Pass down mm_struct to untagged_addr() Kirill A. Shutemov
@ 2022-07-21 13:12   ` Alexander Potapenko
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-21 13:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Intel Linear Address Masking (LAM) brings per-mm untagging rules. Pass
> down mm_struct to the untagging helper. It will help to apply untagging
> policy correctly.
>
> In most cases, current->mm is the one to use, but there are some
> exceptions, such as get_user_pages_remote().
>
> Move dummy implementation of untagged_addr() from <linux/mm.h> to
> <linux/uaccess.h>. <asm/uaccess.h> can override the implementation.
> Moving the dummy implementation outside <linux/mm.h> helps to avoid header
> hell if you need to dereference mm_struct within the helper.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
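
To illustrate why the mm has to be passed explicitly, here is a userspace
model (a sketch, not the kernel macro). struct mm_model and
model_untagged_addr() are hypothetical names, and the LAM_U57 mask value is
an assumption based on the "62:57 bits masked" comment in patch 02. The same
tagged address untags differently depending on which mm's policy is applied,
which is what matters for remote accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-mm untagging policy. */
struct mm_model {
	uint64_t untag_mask;
};

/* Apply the untagging rules of the given mm, not of the caller. */
static inline uint64_t model_untagged_addr(const struct mm_model *mm,
					   uint64_t addr)
{
	return addr & mm->untag_mask;
}

static void model_untagged_addr_example(void)
{
	/* Assumed masks: LAM_U57 clears bits 62:57; no LAM keeps all bits. */
	struct mm_model lam_mm = { ~(UINT64_C(0x3f) << 57) };
	struct mm_model plain_mm = { ~UINT64_C(0) };
	uint64_t addr = UINT64_C(0x00007f0012345678);
	uint64_t tagged = addr | (UINT64_C(0x15) << 57);

	/* The target mm has LAM enabled: the tag is stripped. */
	assert(model_untagged_addr(&lam_mm, tagged) == addr);
	/* Using an mm without LAM leaves the tag in place. */
	assert(model_untagged_addr(&plain_mm, tagged) == tagged);
}
```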


> ---
>  arch/arm64/include/asm/memory.h                  |  4 ++--
>  arch/arm64/include/asm/signal.h                  |  2 +-
>  arch/arm64/include/asm/uaccess.h                 |  4 ++--
>  arch/arm64/kernel/hw_breakpoint.c                |  2 +-
>  arch/arm64/kernel/traps.c                        |  4 ++--
>  arch/arm64/mm/fault.c                            | 10 +++++-----
>  arch/sparc/include/asm/pgtable_64.h              |  2 +-
>  arch/sparc/include/asm/uaccess_64.h              |  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c          |  2 +-
>  drivers/gpu/drm/radeon/radeon_gem.c              |  2 +-
>  drivers/infiniband/hw/mlx4/mr.c                  |  2 +-
>  drivers/media/common/videobuf2/frame_vector.c    |  2 +-
>  drivers/media/v4l2-core/videobuf-dma-contig.c    |  2 +-
>  drivers/staging/media/atomisp/pci/hmm/hmm_bo.c   |  2 +-
>  drivers/tee/tee_shm.c                            |  2 +-
>  drivers/vfio/vfio_iommu_type1.c                  |  2 +-
>  fs/proc/task_mmu.c                               |  2 +-
>  include/linux/mm.h                               | 11 -----------
>  include/linux/uaccess.h                          | 15 +++++++++++++++
>  lib/strncpy_from_user.c                          |  2 +-
>  lib/strnlen_user.c                               |  2 +-
>  mm/gup.c                                         |  6 +++---
>  mm/madvise.c                                     |  2 +-
>  mm/mempolicy.c                                   |  6 +++---
>  mm/migrate.c                                     |  2 +-
>  mm/mincore.c                                     |  2 +-
>  mm/mlock.c                                       |  4 ++--
>  mm/mmap.c                                        |  2 +-
>  mm/mprotect.c                                    |  2 +-
>  mm/mremap.c                                      |  2 +-
>  mm/msync.c                                       |  2 +-
>  virt/kvm/kvm_main.c                              |  2 +-
>  33 files changed, 59 insertions(+), 53 deletions(-)
>
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 0af70d9abede..88bee513b74c 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -215,8 +215,8 @@ static inline unsigned long kaslr_offset(void)
>  #define __untagged_addr(addr)  \
>         ((__force __typeof__(addr))sign_extend64((__force u64)(addr), 55))
>
> -#define untagged_addr(addr)    ({                                      \
> -       u64 __addr = (__force u64)(addr);                                       \
> +#define untagged_addr(mm, addr)        ({                                      \
> +       u64 __addr = (__force u64)(addr);                               \
>         __addr &= __untagged_addr(__addr);                              \
>         (__force __typeof__(addr))__addr;                               \
>  })
> diff --git a/arch/arm64/include/asm/signal.h b/arch/arm64/include/asm/signal.h
> index ef449f5f4ba8..0899c355c398 100644
> --- a/arch/arm64/include/asm/signal.h
> +++ b/arch/arm64/include/asm/signal.h
> @@ -18,7 +18,7 @@ static inline void __user *arch_untagged_si_addr(void __user *addr,
>         if (sig == SIGTRAP && si_code == TRAP_BRKPT)
>                 return addr;
>
> -       return untagged_addr(addr);
> +       return untagged_addr(current->mm, addr);
>  }
>  #define arch_untagged_si_addr arch_untagged_si_addr
>
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 63f9c828f1a7..bdcc014bd297 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -44,7 +44,7 @@ static inline int access_ok(const void __user *addr, unsigned long size)
>          */
>         if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
>             (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
> -               addr = untagged_addr(addr);
> +               addr = untagged_addr(current->mm, addr);
>
>         return likely(__access_ok(addr, size));
>  }
> @@ -217,7 +217,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
>         "       csel    %0, %1, xzr, eq\n"
>         : "=&r" (safe_ptr)
>         : "r" (ptr), "r" (TASK_SIZE_MAX - 1),
> -         "r" (untagged_addr(ptr))
> +         "r" (untagged_addr(current->mm, ptr))
>         : "cc");
>
>         csdb();
> diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
> index b29a311bb055..d637cee7b771 100644
> --- a/arch/arm64/kernel/hw_breakpoint.c
> +++ b/arch/arm64/kernel/hw_breakpoint.c
> @@ -715,7 +715,7 @@ static u64 get_distance_from_watchpoint(unsigned long addr, u64 val,
>         u64 wp_low, wp_high;
>         u32 lens, lene;
>
> -       addr = untagged_addr(addr);
> +       addr = untagged_addr(current->mm, addr);
>
>         lens = __ffs(ctrl->len);
>         lene = __fls(ctrl->len);
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 9ac7a81b79be..385612d9890b 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -476,7 +476,7 @@ void arm64_notify_segfault(unsigned long addr)
>         int code;
>
>         mmap_read_lock(current->mm);
> -       if (find_vma(current->mm, untagged_addr(addr)) == NULL)
> +       if (find_vma(current->mm, untagged_addr(current->mm, addr)) == NULL)
>                 code = SEGV_MAPERR;
>         else
>                 code = SEGV_ACCERR;
> @@ -540,7 +540,7 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
>         int ret = 0;
>
>         tagged_address = pt_regs_read_reg(regs, rt);
> -       address = untagged_addr(tagged_address);
> +       address = untagged_addr(current->mm, tagged_address);
>
>         switch (crm) {
>         case ESR_ELx_SYS64_ISS_CRM_DC_CVAU:     /* DC CVAU, gets promoted */
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index c5e11768e5c1..9577d7e37f36 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -454,7 +454,7 @@ static void set_thread_esr(unsigned long address, unsigned long esr)
>  static void do_bad_area(unsigned long far, unsigned long esr,
>                         struct pt_regs *regs)
>  {
> -       unsigned long addr = untagged_addr(far);
> +       unsigned long addr = untagged_addr(current->mm, far);
>
>         /*
>          * If we are in kernel mode at this point, we have no context to
> @@ -524,7 +524,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>         vm_fault_t fault;
>         unsigned long vm_flags;
>         unsigned int mm_flags = FAULT_FLAG_DEFAULT;
> -       unsigned long addr = untagged_addr(far);
> +       unsigned long addr = untagged_addr(mm, far);
>
>         if (kprobe_page_fault(regs, esr))
>                 return 0;
> @@ -675,7 +675,7 @@ static int __kprobes do_translation_fault(unsigned long far,
>                                           unsigned long esr,
>                                           struct pt_regs *regs)
>  {
> -       unsigned long addr = untagged_addr(far);
> +       unsigned long addr = untagged_addr(current->mm, far);
>
>         if (is_ttbr0_addr(addr))
>                 return do_page_fault(far, esr, regs);
> @@ -719,7 +719,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
>                  * UNKNOWN for synchronous external aborts. Mask them out now
>                  * so that userspace doesn't see them.
>                  */
> -               siaddr  = untagged_addr(far);
> +               siaddr  = untagged_addr(current->mm, far);
>         }
>         arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>
> @@ -809,7 +809,7 @@ static const struct fault_info fault_info[] = {
>  void do_mem_abort(unsigned long far, unsigned long esr, struct pt_regs *regs)
>  {
>         const struct fault_info *inf = esr_to_fault_info(esr);
> -       unsigned long addr = untagged_addr(far);
> +       unsigned long addr = untagged_addr(current->mm, far);
>
>         if (!inf->fn(far, esr, regs))
>                 return;
> diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
> index 4679e45c8348..1336d7bfaab9 100644
> --- a/arch/sparc/include/asm/pgtable_64.h
> +++ b/arch/sparc/include/asm/pgtable_64.h
> @@ -1071,7 +1071,7 @@ static inline unsigned long __untagged_addr(unsigned long start)
>
>         return start;
>  }
> -#define untagged_addr(addr) \
> +#define untagged_addr(mm, addr) \
>         ((__typeof__(addr))(__untagged_addr((unsigned long)(addr))))
>
>  static inline bool pte_access_permitted(pte_t pte, bool write)
> diff --git a/arch/sparc/include/asm/uaccess_64.h b/arch/sparc/include/asm/uaccess_64.h
> index 94266a5c5b04..b825a5dd0210 100644
> --- a/arch/sparc/include/asm/uaccess_64.h
> +++ b/arch/sparc/include/asm/uaccess_64.h
> @@ -8,8 +8,10 @@
>
>  #include <linux/compiler.h>
>  #include <linux/string.h>
> +#include <linux/mm_types.h>
>  #include <asm/asi.h>
>  #include <asm/spitfire.h>
> +#include <asm/pgtable.h>
>
>  #include <asm/processor.h>
>  #include <asm-generic/access_ok.h>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 6b6d46e29e6e..b37199b16643 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1491,7 +1491,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>                 if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>                         if (!offset || !*offset)
>                                 return -EINVAL;
> -                       user_addr = untagged_addr(*offset);
> +                       user_addr = untagged_addr(current->mm, *offset);
>                 } else if (flags & (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
>                                     KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
>                         bo_type = ttm_bo_type_sg;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 8ef31d687ef3..691dfb3f2c0e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -382,7 +382,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,
>         uint32_t handle;
>         int r;
>
> -       args->addr = untagged_addr(args->addr);
> +       args->addr = untagged_addr(current->mm, args->addr);
>
>         if (offset_in_page(args->addr | args->size))
>                 return -EINVAL;
> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
> index 8c01a7f0e027..2c3980677f64 100644
> --- a/drivers/gpu/drm/radeon/radeon_gem.c
> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
> @@ -371,7 +371,7 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
>         uint32_t handle;
>         int r;
>
> -       args->addr = untagged_addr(args->addr);
> +       args->addr = untagged_addr(current->mm, args->addr);
>
>         if (offset_in_page(args->addr | args->size))
>                 return -EINVAL;
> diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
> index 04a67b481608..b2860feeae3c 100644
> --- a/drivers/infiniband/hw/mlx4/mr.c
> +++ b/drivers/infiniband/hw/mlx4/mr.c
> @@ -379,7 +379,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_device *device, u64 start,
>          * again
>          */
>         if (!ib_access_writable(access_flags)) {
> -               unsigned long untagged_start = untagged_addr(start);
> +               unsigned long untagged_start = untagged_addr(current->mm, start);
>                 struct vm_area_struct *vma;
>
>                 mmap_read_lock(current->mm);
> diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> index 542dde9d2609..7e62f7a2555d 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -47,7 +47,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>         if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
>                 nr_frames = vec->nr_allocated;
>
> -       start = untagged_addr(start);
> +       start = untagged_addr(mm, start);
>
>         ret = pin_user_pages_fast(start, nr_frames,
>                                   FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
> index 52312ce2ba05..a1444f8afa05 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
> @@ -157,8 +157,8 @@ static void videobuf_dma_contig_user_put(struct videobuf_dma_contig_memory *mem)
>  static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
>                                         struct videobuf_buffer *vb)
>  {
> -       unsigned long untagged_baddr = untagged_addr(vb->baddr);
>         struct mm_struct *mm = current->mm;
> +       unsigned long untagged_baddr = untagged_addr(mm, vb->baddr);
>         struct vm_area_struct *vma;
>         unsigned long prev_pfn, this_pfn;
>         unsigned long pages_done, user_address;
> diff --git a/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c b/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
> index 0168f9839c90..863d30a7ad23 100644
> --- a/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
> +++ b/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
> @@ -913,7 +913,7 @@ static int alloc_user_pages(struct hmm_buffer_object *bo,
>          * and map to user space
>          */
>
> -       userptr = untagged_addr(userptr);
> +       userptr = untagged_addr(current->mm, userptr);
>
>         bo->pages = pages;
>
> diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
> index f2b1bcefcadd..386be09cb2cd 100644
> --- a/drivers/tee/tee_shm.c
> +++ b/drivers/tee/tee_shm.c
> @@ -261,7 +261,7 @@ register_shm_helper(struct tee_context *ctx, unsigned long addr,
>         shm->flags = flags;
>         shm->ctx = ctx;
>         shm->id = id;
> -       addr = untagged_addr(addr);
> +       addr = untagged_addr(current->mm, addr);
>         start = rounddown(addr, PAGE_SIZE);
>         shm->offset = addr - start;
>         shm->size = length;
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index c13b9290e357..5ac6c61d7caa 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -561,7 +561,7 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
>                 goto done;
>         }
>
> -       vaddr = untagged_addr(vaddr);
> +       vaddr = untagged_addr(mm, vaddr);
>
>  retry:
>         vma = vma_lookup(mm, vaddr);
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 2d04e3470d4c..c7d262bd6d6b 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1659,7 +1659,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>         /* watch out for wraparound */
>         start_vaddr = end_vaddr;
>         if (svpfn <= (ULONG_MAX >> PAGE_SHIFT))
> -               start_vaddr = untagged_addr(svpfn << PAGE_SHIFT);
> +               start_vaddr = untagged_addr(mm, svpfn << PAGE_SHIFT);
>
>         /* Ensure the address is inside the task */
>         if (start_vaddr > mm->task_size)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bc8f326be0ce..f0cb92ff1391 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -94,17 +94,6 @@ extern int mmap_rnd_compat_bits __read_mostly;
>  #include <asm/page.h>
>  #include <asm/processor.h>
>
> -/*
> - * Architectures that support memory tagging (assigning tags to memory regions,
> - * embedding these tags into addresses that point to these memory regions, and
> - * checking that the memory and the pointer tags match on memory accesses)
> - * redefine this macro to strip tags from pointers.
> - * It's defined as noop for architectures that don't support memory tagging.
> - */
> -#ifndef untagged_addr
> -#define untagged_addr(addr) (addr)
> -#endif
> -
>  #ifndef __pa_symbol
>  #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
>  #endif
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 5a328cf02b75..46fd816179d7 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -10,6 +10,21 @@
>
>  #include <asm/uaccess.h>
>
> +/*
> + * Architectures that support memory tagging (assigning tags to memory regions,
> + * embedding these tags into addresses that point to these memory regions, and
> + * checking that the memory and the pointer tags match on memory accesses)
> + * redefine this macro to strip tags from pointers.
> + *
> + * Passing down mm_struct allows untagging rules to be defined on a
> + * per-process basis.
> + *
> + * It's defined as noop for architectures that don't support memory tagging.
> + */
> +#ifndef untagged_addr
> +#define untagged_addr(mm, addr) (addr)
> +#endif
> +
>  /*
>   * Architectures should provide two primitives (raw_copy_{to,from}_user())
>   * and get rid of their private instances of copy_{to,from}_user() and
> diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
> index 6432b8c3e431..6e1e2aa0c994 100644
> --- a/lib/strncpy_from_user.c
> +++ b/lib/strncpy_from_user.c
> @@ -121,7 +121,7 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
>                 return 0;
>
>         max_addr = TASK_SIZE_MAX;
> -       src_addr = (unsigned long)untagged_addr(src);
> +       src_addr = (unsigned long)untagged_addr(current->mm, src);
>         if (likely(src_addr < max_addr)) {
>                 unsigned long max = max_addr - src_addr;
>                 long retval;
> diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
> index feeb935a2299..abc096a68f05 100644
> --- a/lib/strnlen_user.c
> +++ b/lib/strnlen_user.c
> @@ -97,7 +97,7 @@ long strnlen_user(const char __user *str, long count)
>                 return 0;
>
>         max_addr = TASK_SIZE_MAX;
> -       src_addr = (unsigned long)untagged_addr(str);
> +       src_addr = (unsigned long)untagged_addr(current->mm, str);
>         if (likely(src_addr < max_addr)) {
>                 unsigned long max = max_addr - src_addr;
>                 long retval;
> diff --git a/mm/gup.c b/mm/gup.c
> index 551264407624..dbe825faf842 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1104,7 +1104,7 @@ static long __get_user_pages(struct mm_struct *mm,
>         if (!nr_pages)
>                 return 0;
>
> -       start = untagged_addr(start);
> +       start = untagged_addr(mm, start);
>
>         VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN)));
>
> @@ -1285,7 +1285,7 @@ int fixup_user_fault(struct mm_struct *mm,
>         struct vm_area_struct *vma;
>         vm_fault_t ret;
>
> -       address = untagged_addr(address);
> +       address = untagged_addr(mm, address);
>
>         if (unlocked)
>                 fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> @@ -2865,7 +2865,7 @@ static int internal_get_user_pages_fast(unsigned long start,
>         if (!(gup_flags & FOLL_FAST_ONLY))
>                 might_lock_read(&current->mm->mmap_lock);
>
> -       start = untagged_addr(start) & PAGE_MASK;
> +       start = untagged_addr(current->mm, start) & PAGE_MASK;
>         len = nr_pages << PAGE_SHIFT;
>         if (check_add_overflow(start, len, &end))
>                 return 0;
> diff --git a/mm/madvise.c b/mm/madvise.c
> index d7b4f2602949..e3c668ddb099 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1373,7 +1373,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
>         size_t len;
>         struct blk_plug plug;
>
> -       start = untagged_addr(start);
> +       start = untagged_addr(mm, start);
>
>         if (!madvise_behavior_valid(behavior))
>                 return -EINVAL;
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index d39b01fd52fe..a03b4d2bc26a 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1458,7 +1458,7 @@ static long kernel_mbind(unsigned long start, unsigned long len,
>         int lmode = mode;
>         int err;
>
> -       start = untagged_addr(start);
> +       start = untagged_addr(current->mm, start);
>         err = sanitize_mpol_flags(&lmode, &mode_flags);
>         if (err)
>                 return err;
> @@ -1481,7 +1481,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
>         unsigned long end;
>         int err = -ENOENT;
>
> -       start = untagged_addr(start);
> +       start = untagged_addr(mm, start);
>         if (start & ~PAGE_MASK)
>                 return -EINVAL;
>         /*
> @@ -1684,7 +1684,7 @@ static int kernel_get_mempolicy(int __user *policy,
>         if (nmask != NULL && maxnode < nr_node_ids)
>                 return -EINVAL;
>
> -       addr = untagged_addr(addr);
> +       addr = untagged_addr(current->mm, addr);
>
>         err = do_get_mempolicy(&pval, &nodes, addr, flags);
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index e51588e95f57..af05049b055b 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1714,7 +1714,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>                         goto out_flush;
>                 if (get_user(node, nodes + i))
>                         goto out_flush;
> -               addr = (unsigned long)untagged_addr(p);
> +               addr = (unsigned long)untagged_addr(mm, p);
>
>                 err = -ENODEV;
>                 if (node < 0 || node >= MAX_NUMNODES)
> diff --git a/mm/mincore.c b/mm/mincore.c
> index fa200c14185f..72c55bd9d184 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -236,7 +236,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
>         unsigned long pages;
>         unsigned char *tmp;
>
> -       start = untagged_addr(start);
> +       start = untagged_addr(current->mm, start);
>
>         /* Check the start address: needs to be page-aligned.. */
>         if (start & ~PAGE_MASK)
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 716caf851043..054168d3e648 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -571,7 +571,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
>         unsigned long lock_limit;
>         int error = -ENOMEM;
>
> -       start = untagged_addr(start);
> +       start = untagged_addr(current->mm, start);
>
>         if (!can_do_mlock())
>                 return -EPERM;
> @@ -634,7 +634,7 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
>  {
>         int ret;
>
> -       start = untagged_addr(start);
> +       start = untagged_addr(current->mm, start);
>
>         len = PAGE_ALIGN(len + (offset_in_page(start)));
>         start &= PAGE_MASK;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 61e6135c54ef..1a7baf6b6b8e 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2926,7 +2926,7 @@ EXPORT_SYMBOL(vm_munmap);
>
>  SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
>  {
> -       addr = untagged_addr(addr);
> +       addr = untagged_addr(current->mm, addr);
>         return __vm_munmap(addr, len, true);
>  }
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ba5592655ee3..871e954f6155 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -622,7 +622,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
>                                 (prot & PROT_READ);
>         struct mmu_gather tlb;
>
> -       start = untagged_addr(start);
> +       start = untagged_addr(current->mm, start);
>
>         prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP);
>         if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */
> diff --git a/mm/mremap.c b/mm/mremap.c
> index b522cd0259a0..f76648bc4f67 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -906,7 +906,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>          *
>          * See Documentation/arm64/tagged-address-abi.rst for more information.
>          */
> -       addr = untagged_addr(addr);
> +       addr = untagged_addr(mm, addr);
>
>         if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
>                 return ret;
> diff --git a/mm/msync.c b/mm/msync.c
> index 137d1c104f3e..5fe989bd3c4b 100644
> --- a/mm/msync.c
> +++ b/mm/msync.c
> @@ -37,7 +37,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
>         int unmapped_error = 0;
>         int error = -EINVAL;
>
> -       start = untagged_addr(start);
> +       start = untagged_addr(mm, start);
>
>         if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
>                 goto out;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a49df8988cd6..03f7ad0ebc8a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1876,7 +1876,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>                 return -EINVAL;
>         /* We can read the guest memory with __xxx_user() later on. */
>         if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
> -           (mem->userspace_addr != untagged_addr(mem->userspace_addr)) ||
> +           (mem->userspace_addr != untagged_addr(kvm->mm, mem->userspace_addr)) ||
>              !access_ok((void __user *)(unsigned long)mem->userspace_addr,
>                         mem->memory_size))
>                 return -EINVAL;
> --
> 2.35.1
>
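
For readers following the conversion above, here is a minimal userspace
sketch of what the extra mm argument makes possible. It is not the code from
the series: struct mm_sketch, untagged_addr_sketch() and untag_ulong() are
hypothetical names, the mask layout (tag in bits 62:57) is an assumption, and
a 64-bit target is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-process state kept in mm_struct. */
struct mm_sketch {
	uint64_t untag_mask;	/* all ones when tagging is disabled */
};

/*
 * Two-argument form: the mm selects the rule, and the result keeps the
 * type of addr (pointer or integer). Relies on the GCC/Clang __typeof__
 * extension, as the kernel macros quoted above do.
 */
#define untagged_addr_sketch(mm, addr) \
	((__typeof__(addr))((uintptr_t)(addr) & (uintptr_t)((mm)->untag_mask)))

/* Typed wrapper for callers that already hold an unsigned long. */
static inline unsigned long untag_ulong(const struct mm_sketch *mm,
					unsigned long addr)
{
	assert(mm != NULL);
	return untagged_addr_sketch(mm, addr);
}
```

With a one-argument macro every process has to share a single rule; with the
mm passed in, two address spaces can strip different bits, or none at all.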


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5.1 04/13] x86/mm: Handle LAM on context switch
  2022-07-13 15:02   ` [PATCHv5.1 04/13] x86/mm: Handle LAM on context switch Kirill A. Shutemov
  2022-07-20  8:57     ` Alexander Potapenko
@ 2022-07-21 13:13     ` Alexander Potapenko
  1 sibling, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-21 13:13 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andi Kleen, Andrey Konovalov, Dave Hansen, Dmitriy Vyukov,
	H.J. Lu, Kostya Serebryany, LKML, Linux Memory Management List,
	Andy Lutomirski, Peter Zijlstra, Rick Edgecombe, Andrey Ryabinin,
	Taras Madan, the arch/x86 maintainers

On Wed, Jul 13, 2022 at 5:02 PM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Linear Address Masking mode for userspace pointers is encoded in CR3 bits.
> The mode is selected per-thread. Add new thread features to indicate that
> the thread has Linear Address Masking enabled.
>
> switch_mm_irqs_off() now respects these flags and constructs CR3
> accordingly.
>
> The active LAM mode gets recorded in the tlb_state.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alexander Potapenko <glider@google.com>

> ---
>  v5.1:
>   - Fix build issue with CONFIG_MODULE=y
> ---
>  arch/x86/include/asm/mmu.h         |  3 +++
>  arch/x86/include/asm/mmu_context.h | 24 +++++++++++++++++
>  arch/x86/include/asm/tlbflush.h    | 35 +++++++++++++++++++++++++
>  arch/x86/mm/tlb.c                  | 42 +++++++++++++++++++-----------
>  4 files changed, 89 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> index 5d7494631ea9..002889ca8978 100644
> --- a/arch/x86/include/asm/mmu.h
> +++ b/arch/x86/include/asm/mmu.h
> @@ -40,6 +40,9 @@ typedef struct {
>
>  #ifdef CONFIG_X86_64
>         unsigned short flags;
> +
> +       /* Active LAM mode:  X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
> +       unsigned long lam_cr3_mask;
>  #endif
>
>         struct mutex lock;
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index b8d40ddeab00..69c943b2ae90 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -91,6 +91,29 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
>  }
>  #endif
>
> +#ifdef CONFIG_X86_64
> +static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
> +{
> +       return mm->context.lam_cr3_mask;
> +}
> +
> +static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
> +{
> +       mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
> +}
> +
> +#else
> +
> +static inline unsigned long mm_lam_cr3_mask(struct mm_struct *mm)
> +{
> +       return 0;
> +}
> +
> +static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
> +{
> +}
> +#endif
> +
>  #define enter_lazy_tlb enter_lazy_tlb
>  extern void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
>
> @@ -168,6 +191,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
>  {
>         arch_dup_pkeys(oldmm, mm);
>         paravirt_arch_dup_mmap(oldmm, mm);
> +       dup_lam(oldmm, mm);
>         return ldt_dup_context(oldmm, mm);
>  }
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 4af5579c7ef7..efe83d33327f 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -100,6 +100,16 @@ struct tlb_state {
>          */
>         bool invalidate_other;
>
> +#ifdef CONFIG_X86_64
> +       /*
> +        * Active LAM mode.
> +        *
> +        * X86_CR3_LAM_U57/U48 shifted right by X86_CR3_LAM_U57_BIT or 0 if LAM
> +        * disabled.
> +        */
> +       u8 lam;
> +#endif
> +
>         /*
>          * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate
>          * the corresponding user PCID needs a flush next time we
> @@ -356,6 +366,30 @@ static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)
>  }
>  #define huge_pmd_needs_flush huge_pmd_needs_flush
>
> +#ifdef CONFIG_X86_64
> +static inline unsigned long tlbstate_lam_cr3_mask(void)
> +{
> +       unsigned long lam = this_cpu_read(cpu_tlbstate.lam);
> +
> +       return lam << X86_CR3_LAM_U57_BIT;
> +}
> +
> +static inline void set_tlbstate_cr3_lam_mask(unsigned long mask)
> +{
> +       this_cpu_write(cpu_tlbstate.lam, mask >> X86_CR3_LAM_U57_BIT);
> +}
> +
> +#else
> +
> +static inline unsigned long tlbstate_lam_cr3_mask(void)
> +{
> +       return 0;
> +}
> +
> +static inline void set_tlbstate_cr3_lam_mask(u64 mask)
> +{
> +}
> +#endif
>  #endif /* !MODULE */
>
>  static inline void __native_tlb_flush_global(unsigned long cr4)
> @@ -363,4 +397,5 @@ static inline void __native_tlb_flush_global(unsigned long cr4)
>         native_write_cr4(cr4 ^ X86_CR4_PGE);
>         native_write_cr4(cr4);
>  }
> +
>  #endif /* _ASM_X86_TLBFLUSH_H */
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index d400b6d9d246..4c93f87a8928 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -154,17 +154,18 @@ static inline u16 user_pcid(u16 asid)
>         return ret;
>  }
>
> -static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
> +static inline unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam)
>  {
>         if (static_cpu_has(X86_FEATURE_PCID)) {
> -               return __sme_pa(pgd) | kern_pcid(asid);
> +               return __sme_pa(pgd) | kern_pcid(asid) | lam;
>         } else {
>                 VM_WARN_ON_ONCE(asid != 0);
> -               return __sme_pa(pgd);
> +               return __sme_pa(pgd) | lam;
>         }
>  }
>
> -static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
> +static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid,
> +                                             unsigned long lam)
>  {
>         VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
>         /*
> @@ -173,7 +174,7 @@ static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
>          * boot because all CPUs have the same capabilities:
>          */
>         VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
> -       return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
> +       return __sme_pa(pgd) | kern_pcid(asid) | lam | CR3_NOFLUSH;
>  }
>
>  /*
> @@ -274,15 +275,16 @@ static inline void invalidate_user_asid(u16 asid)
>                   (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
>  }
>
> -static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
> +static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, unsigned long lam,
> +                           bool need_flush)
>  {
>         unsigned long new_mm_cr3;
>
>         if (need_flush) {
>                 invalidate_user_asid(new_asid);
> -               new_mm_cr3 = build_cr3(pgdir, new_asid);
> +               new_mm_cr3 = build_cr3(pgdir, new_asid, lam);
>         } else {
> -               new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);
> +               new_mm_cr3 = build_cr3_noflush(pgdir, new_asid, lam);
>         }
>
>         /*
> @@ -491,6 +493,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  {
>         struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
>         u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
> +       unsigned long prev_lam = tlbstate_lam_cr3_mask();
> +       unsigned long new_lam = mm_lam_cr3_mask(next);
>         bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy);
>         unsigned cpu = smp_processor_id();
>         u64 next_tlb_gen;
> @@ -520,7 +524,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>          * isn't free.
>          */
>  #ifdef CONFIG_DEBUG_VM
> -       if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
> +       if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid, prev_lam))) {
>                 /*
>                  * If we were to BUG here, we'd be very likely to kill
>                  * the system so hard that we don't see the call trace.
> @@ -622,15 +626,16 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>                 barrier();
>         }
>
> +       set_tlbstate_cr3_lam_mask(new_lam);
>         if (need_flush) {
>                 this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
>                 this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
> -               load_new_mm_cr3(next->pgd, new_asid, true);
> +               load_new_mm_cr3(next->pgd, new_asid, new_lam, true);
>
>                 trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>         } else {
>                 /* The new ASID is already up to date. */
> -               load_new_mm_cr3(next->pgd, new_asid, false);
> +               load_new_mm_cr3(next->pgd, new_asid, new_lam, false);
>
>                 trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
>         }
> @@ -691,6 +696,10 @@ void initialize_tlbstate_and_flush(void)
>         /* Assert that CR3 already references the right mm. */
>         WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
>
> +       /* LAM expected to be disabled in CR3 and init_mm */
> +       WARN_ON(cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57));
> +       WARN_ON(mm_lam_cr3_mask(&init_mm));
> +
>         /*
>          * Assert that CR4.PCIDE is set if needed.  (CR4.PCIDE initialization
>          * doesn't work like other CR4 bits because it can only be set from
> @@ -699,8 +708,8 @@ void initialize_tlbstate_and_flush(void)
>         WARN_ON(boot_cpu_has(X86_FEATURE_PCID) &&
>                 !(cr4_read_shadow() & X86_CR4_PCIDE));
>
> -       /* Force ASID 0 and force a TLB flush. */
> -       write_cr3(build_cr3(mm->pgd, 0));
> +       /* Disable LAM, force ASID 0 and force a TLB flush. */
> +       write_cr3(build_cr3(mm->pgd, 0, 0));
>
>         /* Reinitialize tlbstate. */
>         this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_INIT);
> @@ -708,6 +717,7 @@ void initialize_tlbstate_and_flush(void)
>         this_cpu_write(cpu_tlbstate.next_asid, 1);
>         this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
>         this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen);
> +       set_tlbstate_cr3_lam_mask(0);
>
>         for (i = 1; i < TLB_NR_DYN_ASIDS; i++)
>                 this_cpu_write(cpu_tlbstate.ctxs[i].ctx_id, 0);
> @@ -1047,8 +1057,10 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
>   */
>  unsigned long __get_current_cr3_fast(void)
>  {
> -       unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
> -               this_cpu_read(cpu_tlbstate.loaded_mm_asid));
> +       unsigned long cr3 =
> +               build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
> +               this_cpu_read(cpu_tlbstate.loaded_mm_asid),
> +               tlbstate_lam_cr3_mask());
>
>         /* For now, be very restrictive about when this can be called. */
>         VM_WARN_ON(in_nmi() || preemptible());
> --
> 2.35.1
>
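
The tlb_state packing and the CR3 composition in the patch above can be
illustrated with a small userspace sketch. The bit positions (LAM_U57 at CR3
bit 61, LAM_U48 at bit 62) are assumptions taken from the CPUID/CR3 flags
patch of this series, and pack_lam(), unpack_lam() and build_cr3_sketch() are
hypothetical names, not kernel functions. The PCID is passed in directly
rather than derived from an ASID.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CR3 bit positions for the two LAM modes. */
#define LAM_U57_BIT	61
#define LAM_U48_BIT	62
#define CR3_LAM_U57	(UINT64_C(1) << LAM_U57_BIT)
#define CR3_LAM_U48	(UINT64_C(1) << LAM_U48_BIT)

/* Squeeze the CR3 LAM mask into one byte, as tlb_state.lam does. */
static inline uint8_t pack_lam(uint64_t cr3_mask)
{
	/* Only the two LAM bits may be set in the mask. */
	assert((cr3_mask & ~(CR3_LAM_U57 | CR3_LAM_U48)) == 0);
	return (uint8_t)(cr3_mask >> LAM_U57_BIT);
}

/* Recover the CR3 mask from the packed byte. */
static inline uint64_t unpack_lam(uint8_t lam)
{
	return (uint64_t)lam << LAM_U57_BIT;
}

/* OR the page-table base, the PCID and the LAM bits into one CR3 value. */
static inline uint64_t build_cr3_sketch(uint64_t pgd_pa, uint16_t pcid,
					uint64_t lam_mask)
{
	return pgd_pa | pcid | lam_mask;
}
```

Shifting by the lower of the two bit positions is what lets both modes fit
in a u8: LAM_U57 packs to 1 and LAM_U48 packs to 2.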



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 05/13] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-07-12 23:13 ` [PATCHv5 05/13] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
  2022-07-13 15:02   ` [PATCHv5.1 04/13] x86/mm: Handle LAM on context switch Kirill A. Shutemov
@ 2022-07-21 13:14   ` Alexander Potapenko
  1 sibling, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-21 13:14 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> untagged_addr() is a helper used by the core-mm to strip tag bits and
> get the address to the canonical shape. It only handles userspace
> addresses. The untagging mask is stored in mmu_context and will be set
> on enabling LAM for the process.
>
> The tags must not be included in the check of whether it's okay to
> access the userspace address.
>
> Strip tags in access_ok().
>
> get_user() and put_user() don't use access_ok(), but check access
> against TASK_SIZE directly in assembly. Strip tags before calling into
> the assembly helper.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alexander Potapenko <glider@google.com>
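
The ordering described in the commit message (strip the tag first, then do
the range check) can be sketched in userspace as follows. This is an
illustration under assumptions, not the patch itself: access_ok_sketch() is a
hypothetical name, the LAM_U57 mask is assumed to clear bits 62:57 only, and
TASK_SIZE_SKETCH is a made-up limit for a 47-bit user address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed LAM_U57 layout: tag in bits 62:57, bit 63 left alone. */
#define UNTAG_MASK_LAM_U57	(~(UINT64_C(0x3f) << 57))
/* Hypothetical user address limit (47-bit user space minus a guard page). */
#define TASK_SIZE_SKETCH	((UINT64_C(1) << 47) - 4096)

static inline bool access_ok_sketch(uint64_t untag_mask, uint64_t addr,
				    uint64_t size)
{
	/* The mask must never clear bit 63, so kernel addresses stay out of range. */
	assert(untag_mask >> 63);

	/* Strip the tag before the limit check, not after. */
	addr &= untag_mask;

	/* Overflow-safe form of "addr + size <= limit". */
	return size <= TASK_SIZE_SKETCH && addr <= TASK_SIZE_SKETCH - size;
}
```

Without the masking step a tagged but otherwise valid user pointer compares
as far above the limit and is rejected; a kernel address is still rejected
after masking because bit 63 is preserved.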

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCHv5 07/13] x86: Expose untagging mask in /proc/$PID/arch_status
  2022-07-12 23:13 ` [PATCHv5 07/13] x86: Expose untagging mask in /proc/$PID/arch_status Kirill A. Shutemov
@ 2022-07-21 13:47   ` Alexander Potapenko
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Potapenko @ 2022-07-21 13:47 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, Linux Memory Management List, LKML

On Wed, Jul 13, 2022 at 1:13 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Add a line in /proc/$PID/arch_status to report untag_mask. It can be
> used to find out LAM status of the process from the outside. It is
> useful for debuggers.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alexander Potapenko <glider@google.com>
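
Editorial note: a debugger-side reader for the new line might look
roughly like the sketch below. The field name "untag_mask:" and the
hexadecimal value format are assumptions and are not confirmed by the
quoted text. The sketch_* names are hypothetical. A return value of 0
means "not found", on the assumption that a real untag mask is never 0.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed field name; check it against the actual arch_status output. */
#define SKETCH_KEY	"untag_mask:"

/* Parse one line; return the mask, or 0 if the line does not match. */
static uint64_t sketch_parse_untag_mask(const char *line)
{
	size_t klen = strlen(SKETCH_KEY);
	const char *val = line + klen;
	char *end;
	uint64_t mask;

	if (strncmp(line, SKETCH_KEY, klen) != 0)
		return 0;
	/* strtoull() skips leading whitespace and accepts a 0x prefix. */
	mask = strtoull(val, &end, 16);
	return end == val ? 0 : mask;
}

/* Scan a file such as /proc/<pid>/arch_status for the mask line. */
static uint64_t sketch_read_untag_mask(const char *path)
{
	char line[256];
	uint64_t mask = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	while (mask == 0 && fgets(line, sizeof(line), f))
		mask = sketch_parse_untag_mask(line);
	fclose(f);
	return mask;
}
```

A caller would pass a path such as "/proc/1234/arch_status" and treat
a result of 0 as "no mask reported".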


* Re: [PATCHv5 00/13] Linear Address Masking enabling
  2022-07-20  0:59   ` Kirill A. Shutemov
  2022-07-21 13:09     ` Alexander Potapenko
@ 2022-07-21 17:07     ` Dave Hansen
  1 sibling, 0 replies; 33+ messages in thread
From: Dave Hansen @ 2022-07-21 17:07 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, Alexander Potapenko
  Cc: Peter Zijlstra, the arch/x86 maintainers, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Taras Madan, Dmitry Vyukov,
	H . J . Lu, Andi Kleen, Rick Edgecombe,
	Linux Memory Management List, LKML

On 7/19/22 17:59, Kirill A. Shutemov wrote:
> Dave, Andy, any position on this?
> 
> I wrote LAM_U48 support to prove that the interface is flexible enough,
> but I see why it can be a problem if a distro picks it up ahead of
> upstream.

My position is that maintaining distro forks is troublesome.  If you
held a gun to my head today and made me merge *something* I'd leave out
the U48 patch, but reserve the right to add it later.

I'm not sure whether that makes the distros' lives easier or harder.  I'm
not promising anything either way, though.


* RE: [PATCHv5 01/13] x86/mm: Fix CR3_ADDR_MASK
  2022-07-12 23:13 ` [PATCHv5 01/13] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
  2022-07-21 13:10   ` Alexander Potapenko
@ 2022-07-29  3:00   ` Hu, Robert
  1 sibling, 0 replies; 33+ messages in thread
From: Hu, Robert @ 2022-07-29  3:00 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Hansen, Lutomirski, Andy, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Taras Madan, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Edgecombe, Rick P, linux-mm, linux-kernel

> -----Original Message-----
> From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Sent: Wednesday, July 13, 2022 07:13
> To: Dave Hansen <dave.hansen@linux.intel.com>; Lutomirski, Andy
> <luto@kernel.org>; Peter Zijlstra <peterz@infradead.org>
> Cc: x86@kernel.org; Kostya Serebryany <kcc@google.com>; Andrey Ryabinin
> <ryabinin.a.a@gmail.com>; Andrey Konovalov <andreyknvl@gmail.com>;
> Alexander Potapenko <glider@google.com>; Taras Madan
> <tarasmadan@google.com>; Dmitry Vyukov <dvyukov@google.com>; H . J . Lu
> <hjl.tools@gmail.com>; Andi Kleen <ak@linux.intel.com>; Edgecombe, Rick P
> <rick.p.edgecombe@intel.com>; linux-mm@kvack.org; linux-
> kernel@vger.kernel.org; Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Subject: [PATCHv5 01/13] x86/mm: Fix CR3_ADDR_MASK
> 
> The mask must not include bits above physical address mask. These bits are
> reserved and can be used for other things. Bits 61 and 62 are used for Linear
> Address Masking.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  arch/x86/include/asm/processor-flags.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/processor-flags.h
> b/arch/x86/include/asm/processor-flags.h
> index 02c2cbda4a74..a7f3d9100adb 100644
> --- a/arch/x86/include/asm/processor-flags.h
> +++ b/arch/x86/include/asm/processor-flags.h
> @@ -35,7 +35,7 @@
>   */
[Hu, Robert] 
Can the comment above these #defines, which explains the CR3 layout, be
updated to cover the new CR3 bits as well?

>  #ifdef CONFIG_X86_64
>  /* Mask off the address space ID and SME encryption bits. */
> -#define CR3_ADDR_MASK	__sme_clr(0x7FFFFFFFFFFFF000ull)
> +#define CR3_ADDR_MASK	__sme_clr(PHYSICAL_PAGE_MASK)
>  #define CR3_PCID_MASK	0xFFFull
>  #define CR3_NOFLUSH	BIT_ULL(63)
> 
> --
> 2.35.1
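
Editorial note: the effect of the mask change can be checked with a few
lines of plain C. This is a sketch under stated assumptions, not kernel
code: it assumes 4 KiB pages and a 52-bit physical address width, it
ignores the __sme_clr() wrapper, and all SKETCH_*/sketch_* names are
hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumed 4 KiB pages */
#define SKETCH_PHYS_BITS	52	/* assumed physical address width */

/* The old constant from the diff: every bit from 12 up to 62. */
#define SKETCH_OLD_ADDR_MASK	0x7FFFFFFFFFFFF000ULL
/* Stand-in for PHYSICAL_PAGE_MASK: bits 12..51 only. */
#define SKETCH_NEW_ADDR_MASK \
	(((1ULL << SKETCH_PHYS_BITS) - 1) & ~((1ULL << SKETCH_PAGE_SHIFT) - 1))
/* Bits 61 and 62, which the commit message says LAM uses. */
#define SKETCH_CR3_LAM_BITS	((1ULL << 61) | (1ULL << 62))

/* Extract the page-table base address from a CR3 value. */
static inline uint64_t sketch_cr3_pgd_pa(uint64_t cr3, uint64_t addr_mask)
{
	return cr3 & addr_mask;
}
```

For a CR3 value with bit 61 set, the old mask leaves that bit in the
extracted address, while the narrower mask yields only the physical
page frame address.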



end of thread, other threads:[~2022-07-29  3:01 UTC | newest]

Thread overview: 33+ messages
2022-07-12 23:13 [PATCHv5 00/13] Linear Address Masking enabling Kirill A. Shutemov
2022-07-12 23:13 ` [PATCHv5 01/13] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
2022-07-21 13:10   ` Alexander Potapenko
2022-07-29  3:00   ` Hu, Robert
2022-07-12 23:13 ` [PATCHv5 02/13] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
2022-07-21 13:10   ` Alexander Potapenko
2022-07-12 23:13 ` [PATCHv5 03/13] mm: Pass down mm_struct to untagged_addr() Kirill A. Shutemov
2022-07-21 13:12   ` Alexander Potapenko
2022-07-12 23:13 ` [PATCHv5 04/13] x86/mm: Handle LAM on context switch Kirill A. Shutemov
2022-07-12 23:13 ` [PATCHv5 05/13] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
2022-07-13 15:02   ` [PATCHv5.1 04/13] x86/mm: Handle LAM on context switch Kirill A. Shutemov
2022-07-20  8:57     ` Alexander Potapenko
2022-07-20 12:38       ` Kirill A. Shutemov
2022-07-21 13:13     ` Alexander Potapenko
2022-07-21 13:14   ` [PATCHv5 05/13] x86/uaccess: Provide untagged_addr() and remove tags before address check Alexander Potapenko
2022-07-12 23:13 ` [PATCHv5 06/13] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
2022-07-18 17:47   ` Alexander Potapenko
2022-07-20  0:57     ` Kirill A. Shutemov
2022-07-20  8:19       ` Alexander Potapenko
2022-07-20 12:47         ` Kirill A. Shutemov
2022-07-20 12:54           ` Alexander Potapenko
2022-07-12 23:13 ` [PATCHv5 07/13] x86: Expose untagging mask in /proc/$PID/arch_status Kirill A. Shutemov
2022-07-21 13:47   ` Alexander Potapenko
2022-07-12 23:13 ` [PATCHv5 08/13] selftests/x86/lam: Add malloc test cases for linear-address masking Kirill A. Shutemov
2022-07-12 23:13 ` [PATCHv5 09/13] selftests/x86/lam: Add mmap and SYSCALL " Kirill A. Shutemov
2022-07-12 23:13 ` [PATCHv5 10/13] selftests/x86/lam: Add io_uring " Kirill A. Shutemov
2022-07-12 23:13 ` [PATCHv5 11/13] selftests/x86/lam: Add inherit " Kirill A. Shutemov
2022-07-12 23:13 ` [PATCHv5 OPTIONAL 12/13] x86/mm: Extend LAM to support to LAM_U48 Kirill A. Shutemov
2022-07-12 23:13 ` [PATCHv5 OPTIONAL 13/13] selftests/x86/lam: Add tests cases for LAM_U48 Kirill A. Shutemov
2022-07-18 17:39 ` [PATCHv5 00/13] Linear Address Masking enabling Alexander Potapenko
2022-07-20  0:59   ` Kirill A. Shutemov
2022-07-21 13:09     ` Alexander Potapenko
2022-07-21 17:07     ` Dave Hansen
