linux-kernel.vger.kernel.org archive mirror
* [PATCHv3 0/8] Linear Address Masking enabling
@ 2022-06-10 14:35 Kirill A. Shutemov
  2022-06-10 14:35 ` [PATCHv3 1/8] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
                   ` (9 more replies)
  0 siblings, 10 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 14:35 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel, Kirill A. Shutemov

Linear Address Masking[1] (LAM) modifies the checking that is applied to
64-bit linear addresses, allowing software to use the untranslated
address bits for metadata.
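
For illustration, a minimal userspace sketch of what this allows (not
part of the series; it assumes LAM_U57 has already been enabled for the
process, e.g. via the arch_prctl() interface added later in the series --
without that, the tagged dereference faults):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        int value = 42;
        uint64_t tag = 0x2a;    /* arbitrary 6-bit metadata */

        /* With LAM_U57, bits 62:57 of a user pointer are ignored on access. */
        int *tagged = (int *)((uint64_t)&value | (tag << 57));

        printf("%d\n", *tagged);        /* prints 42 once LAM_U57 is active */
        return 0;
}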

The patchset adds LAM support for userspace addresses.

Enabling LAM_U48 is controversial since it competes for bits with
5-level paging. Its enabling is isolated into an optional last patch
that can be applied at the maintainer's discretion.

Please review and consider applying.

v3:
  - Rebased onto v5.19-rc1
  - Per-process enabling;
  - API overhaul (again);
  - Avoid branches and costly computations in the fast path;
  - LAM_U48 is in optional patch.
v2:
  - Rebased onto v5.18-rc1
  - New arch_prctl(2)-based API
  - Expose status of LAM (or other thread features) in
    /proc/$PID/arch_status

[1] ISE, Chapter 14.
https://software.intel.com/content/dam/develop/external/us/en/documents-tps/architecture-instruction-set-extensions-programming-reference.pdf

Kirill A. Shutemov (8):
  x86/mm: Fix CR3_ADDR_MASK
  x86: CPUID and CR3/CR4 flags for Linear Address Masking
  mm: Pass down mm_struct to untagged_addr()
  x86/mm: Handle LAM on context switch
  x86/uaccess: Provide untagged_addr() and remove tags before address check
  x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  x86: Expose untagging mask in /proc/$PID/arch_status
  x86/mm: Extend LAM to support to LAM_U48

 arch/arm64/include/asm/memory.h               |  4 +-
 arch/arm64/include/asm/signal.h               |  2 +-
 arch/arm64/include/asm/uaccess.h              |  4 +-
 arch/arm64/kernel/hw_breakpoint.c             |  2 +-
 arch/arm64/kernel/traps.c                     |  4 +-
 arch/arm64/mm/fault.c                         | 10 +--
 arch/sparc/include/asm/pgtable_64.h           |  2 +-
 arch/sparc/include/asm/uaccess_64.h           |  2 +
 arch/x86/include/asm/cpufeatures.h            |  1 +
 arch/x86/include/asm/elf.h                    |  3 +-
 arch/x86/include/asm/mmu.h                    |  2 +
 arch/x86/include/asm/mmu_context.h            | 58 +++++++++++++++++
 arch/x86/include/asm/processor-flags.h        |  2 +-
 arch/x86/include/asm/tlbflush.h               |  3 +
 arch/x86/include/asm/uaccess.h                | 44 ++++++++++++-
 arch/x86/include/uapi/asm/prctl.h             |  3 +
 arch/x86/include/uapi/asm/processor-flags.h   |  6 ++
 arch/x86/kernel/Makefile                      |  2 +
 arch/x86/kernel/fpu/xstate.c                  | 47 --------------
 arch/x86/kernel/proc.c                        | 50 +++++++++++++++
 arch/x86/kernel/process.c                     |  3 +
 arch/x86/kernel/process_64.c                  | 54 +++++++++++++++-
 arch/x86/kernel/sys_x86_64.c                  |  5 +-
 arch/x86/mm/hugetlbpage.c                     |  6 +-
 arch/x86/mm/mmap.c                            |  9 ++-
 arch/x86/mm/tlb.c                             | 62 ++++++++++++++-----
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |  2 +-
 drivers/gpu/drm/radeon/radeon_gem.c           |  2 +-
 drivers/infiniband/hw/mlx4/mr.c               |  2 +-
 drivers/media/common/videobuf2/frame_vector.c |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 .../staging/media/atomisp/pci/hmm/hmm_bo.c    |  2 +-
 drivers/tee/tee_shm.c                         |  2 +-
 drivers/vfio/vfio_iommu_type1.c               |  2 +-
 fs/proc/task_mmu.c                            |  2 +-
 include/linux/mm.h                            | 11 ----
 include/linux/uaccess.h                       | 11 ++++
 lib/strncpy_from_user.c                       |  2 +-
 lib/strnlen_user.c                            |  2 +-
 mm/gup.c                                      |  6 +-
 mm/madvise.c                                  |  2 +-
 mm/mempolicy.c                                |  6 +-
 mm/migrate.c                                  |  2 +-
 mm/mincore.c                                  |  2 +-
 mm/mlock.c                                    |  4 +-
 mm/mmap.c                                     |  2 +-
 mm/mprotect.c                                 |  2 +-
 mm/mremap.c                                   |  2 +-
 mm/msync.c                                    |  2 +-
 virt/kvm/kvm_main.c                           |  2 +-
 51 files changed, 342 insertions(+), 126 deletions(-)
 create mode 100644 arch/x86/kernel/proc.c

-- 
2.35.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCHv3 1/8] x86/mm: Fix CR3_ADDR_MASK
  2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
@ 2022-06-10 14:35 ` Kirill A. Shutemov
  2022-06-10 23:32   ` Edgecombe, Rick P
  2022-06-10 14:35 ` [PATCHv3 2/8] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 14:35 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel, Kirill A. Shutemov

The mask must not include bits above the physical address mask. These
bits are reserved and can be used for other things. Bits 61 and 62 are
used for Linear Address Masking.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/processor-flags.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 02c2cbda4a74..a7f3d9100adb 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -35,7 +35,7 @@
  */
 #ifdef CONFIG_X86_64
 /* Mask off the address space ID and SME encryption bits. */
-#define CR3_ADDR_MASK	__sme_clr(0x7FFFFFFFFFFFF000ull)
+#define CR3_ADDR_MASK	__sme_clr(PHYSICAL_PAGE_MASK)
 #define CR3_PCID_MASK	0xFFFull
 #define CR3_NOFLUSH	BIT_ULL(63)
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCHv3 2/8] x86: CPUID and CR3/CR4 flags for Linear Address Masking
  2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
  2022-06-10 14:35 ` [PATCHv3 1/8] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
@ 2022-06-10 14:35 ` Kirill A. Shutemov
  2022-06-10 14:35 ` [PATCHv3 3/8] mm: Pass down mm_struct to untagged_addr() Kirill A. Shutemov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 14:35 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel, Kirill A. Shutemov

Enumerate Linear Address Masking and provide defines for CR3 and CR4
flags.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/uapi/asm/processor-flags.h | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 393f2bbb5e3a..9835ab09b590 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -300,6 +300,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LAM			(12*32+26) /* Linear Address Masking */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index c47cc7f2feeb..d898432947ff 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -82,6 +82,10 @@
 #define X86_CR3_PCID_BITS	12
 #define X86_CR3_PCID_MASK	(_AC((1UL << X86_CR3_PCID_BITS) - 1, UL))
 
+#define X86_CR3_LAM_U57_BIT	61 /* Activate LAM for userspace, 62:57 bits masked */
+#define X86_CR3_LAM_U57		_BITULL(X86_CR3_LAM_U57_BIT)
+#define X86_CR3_LAM_U48_BIT	62 /* Activate LAM for userspace, 62:48 bits masked */
+#define X86_CR3_LAM_U48		_BITULL(X86_CR3_LAM_U48_BIT)
 #define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */
 #define X86_CR3_PCID_NOFLUSH    _BITULL(X86_CR3_PCID_NOFLUSH_BIT)
 
@@ -132,6 +136,8 @@
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
 #define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
 #define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
+#define X86_CR4_LAM_SUP_BIT	28 /* LAM for supervisor pointers */
+#define X86_CR4_LAM_SUP		_BITUL(X86_CR4_LAM_SUP_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCHv3 3/8] mm: Pass down mm_struct to untagged_addr()
  2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
  2022-06-10 14:35 ` [PATCHv3 1/8] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
  2022-06-10 14:35 ` [PATCHv3 2/8] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
@ 2022-06-10 14:35 ` Kirill A. Shutemov
  2022-06-10 23:33   ` Edgecombe, Rick P
  2022-06-17 15:27   ` Alexander Potapenko
  2022-06-10 14:35 ` [PATCHv3 4/8] x86/mm: Handle LAM on context switch Kirill A. Shutemov
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 14:35 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel, Kirill A. Shutemov

Intel Linear Address Masking (LAM) brings per-mm untagging rules. Pass
down mm_struct to the untagging helper so that the untagging policy can
be applied correctly.

In most cases, current->mm is the one to use, but there are some
exceptions, such as get_user_pages_remote().

Move the dummy implementation of untagged_addr() from <linux/mm.h> to
<linux/uaccess.h>. <asm/uaccess.h> can override the implementation.
Moving the dummy implementation out of <linux/mm.h> helps to avoid
header hell if the helper needs to dereference mm_struct.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/arm64/include/asm/memory.h                  |  4 ++--
 arch/arm64/include/asm/signal.h                  |  2 +-
 arch/arm64/include/asm/uaccess.h                 |  4 ++--
 arch/arm64/kernel/hw_breakpoint.c                |  2 +-
 arch/arm64/kernel/traps.c                        |  4 ++--
 arch/arm64/mm/fault.c                            | 10 +++++-----
 arch/sparc/include/asm/pgtable_64.h              |  2 +-
 arch/sparc/include/asm/uaccess_64.h              |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c          |  2 +-
 drivers/gpu/drm/radeon/radeon_gem.c              |  2 +-
 drivers/infiniband/hw/mlx4/mr.c                  |  2 +-
 drivers/media/common/videobuf2/frame_vector.c    |  2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c    |  2 +-
 drivers/staging/media/atomisp/pci/hmm/hmm_bo.c   |  2 +-
 drivers/tee/tee_shm.c                            |  2 +-
 drivers/vfio/vfio_iommu_type1.c                  |  2 +-
 fs/proc/task_mmu.c                               |  2 +-
 include/linux/mm.h                               | 11 -----------
 include/linux/uaccess.h                          | 11 +++++++++++
 lib/strncpy_from_user.c                          |  2 +-
 lib/strnlen_user.c                               |  2 +-
 mm/gup.c                                         |  6 +++---
 mm/madvise.c                                     |  2 +-
 mm/mempolicy.c                                   |  6 +++---
 mm/migrate.c                                     |  2 +-
 mm/mincore.c                                     |  2 +-
 mm/mlock.c                                       |  4 ++--
 mm/mmap.c                                        |  2 +-
 mm/mprotect.c                                    |  2 +-
 mm/mremap.c                                      |  2 +-
 mm/msync.c                                       |  2 +-
 virt/kvm/kvm_main.c                              |  2 +-
 33 files changed, 55 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0af70d9abede..88bee513b74c 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -215,8 +215,8 @@ static inline unsigned long kaslr_offset(void)
 #define __untagged_addr(addr)	\
 	((__force __typeof__(addr))sign_extend64((__force u64)(addr), 55))
 
-#define untagged_addr(addr)	({					\
-	u64 __addr = (__force u64)(addr);					\
+#define untagged_addr(mm, addr)	({					\
+	u64 __addr = (__force u64)(addr);				\
 	__addr &= __untagged_addr(__addr);				\
 	(__force __typeof__(addr))__addr;				\
 })
diff --git a/arch/arm64/include/asm/signal.h b/arch/arm64/include/asm/signal.h
index ef449f5f4ba8..0899c355c398 100644
--- a/arch/arm64/include/asm/signal.h
+++ b/arch/arm64/include/asm/signal.h
@@ -18,7 +18,7 @@ static inline void __user *arch_untagged_si_addr(void __user *addr,
 	if (sig == SIGTRAP && si_code == TRAP_BRKPT)
 		return addr;
 
-	return untagged_addr(addr);
+	return untagged_addr(current->mm, addr);
 }
 #define arch_untagged_si_addr arch_untagged_si_addr
 
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 63f9c828f1a7..bdcc014bd297 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -44,7 +44,7 @@ static inline int access_ok(const void __user *addr, unsigned long size)
 	 */
 	if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
 	    (current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
-		addr = untagged_addr(addr);
+		addr = untagged_addr(current->mm, addr);
 
 	return likely(__access_ok(addr, size));
 }
@@ -217,7 +217,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
 	"	csel	%0, %1, xzr, eq\n"
 	: "=&r" (safe_ptr)
 	: "r" (ptr), "r" (TASK_SIZE_MAX - 1),
-	  "r" (untagged_addr(ptr))
+	  "r" (untagged_addr(current->mm, ptr))
 	: "cc");
 
 	csdb();
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index b29a311bb055..d637cee7b771 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -715,7 +715,7 @@ static u64 get_distance_from_watchpoint(unsigned long addr, u64 val,
 	u64 wp_low, wp_high;
 	u32 lens, lene;
 
-	addr = untagged_addr(addr);
+	addr = untagged_addr(current->mm, addr);
 
 	lens = __ffs(ctrl->len);
 	lene = __fls(ctrl->len);
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 9ac7a81b79be..385612d9890b 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -476,7 +476,7 @@ void arm64_notify_segfault(unsigned long addr)
 	int code;
 
 	mmap_read_lock(current->mm);
-	if (find_vma(current->mm, untagged_addr(addr)) == NULL)
+	if (find_vma(current->mm, untagged_addr(current->mm, addr)) == NULL)
 		code = SEGV_MAPERR;
 	else
 		code = SEGV_ACCERR;
@@ -540,7 +540,7 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
 	int ret = 0;
 
 	tagged_address = pt_regs_read_reg(regs, rt);
-	address = untagged_addr(tagged_address);
+	address = untagged_addr(current->mm, tagged_address);
 
 	switch (crm) {
 	case ESR_ELx_SYS64_ISS_CRM_DC_CVAU:	/* DC CVAU, gets promoted */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index c5e11768e5c1..9577d7e37f36 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -454,7 +454,7 @@ static void set_thread_esr(unsigned long address, unsigned long esr)
 static void do_bad_area(unsigned long far, unsigned long esr,
 			struct pt_regs *regs)
 {
-	unsigned long addr = untagged_addr(far);
+	unsigned long addr = untagged_addr(current->mm, far);
 
 	/*
 	 * If we are in kernel mode at this point, we have no context to
@@ -524,7 +524,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	vm_fault_t fault;
 	unsigned long vm_flags;
 	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
-	unsigned long addr = untagged_addr(far);
+	unsigned long addr = untagged_addr(mm, far);
 
 	if (kprobe_page_fault(regs, esr))
 		return 0;
@@ -675,7 +675,7 @@ static int __kprobes do_translation_fault(unsigned long far,
 					  unsigned long esr,
 					  struct pt_regs *regs)
 {
-	unsigned long addr = untagged_addr(far);
+	unsigned long addr = untagged_addr(current->mm, far);
 
 	if (is_ttbr0_addr(addr))
 		return do_page_fault(far, esr, regs);
@@ -719,7 +719,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
 		 * UNKNOWN for synchronous external aborts. Mask them out now
 		 * so that userspace doesn't see them.
 		 */
-		siaddr  = untagged_addr(far);
+		siaddr  = untagged_addr(current->mm, far);
 	}
 	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
 
@@ -809,7 +809,7 @@ static const struct fault_info fault_info[] = {
 void do_mem_abort(unsigned long far, unsigned long esr, struct pt_regs *regs)
 {
 	const struct fault_info *inf = esr_to_fault_info(esr);
-	unsigned long addr = untagged_addr(far);
+	unsigned long addr = untagged_addr(current->mm, far);
 
 	if (!inf->fn(far, esr, regs))
 		return;
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 4679e45c8348..1336d7bfaab9 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -1071,7 +1071,7 @@ static inline unsigned long __untagged_addr(unsigned long start)
 
 	return start;
 }
-#define untagged_addr(addr) \
+#define untagged_addr(mm, addr) \
 	((__typeof__(addr))(__untagged_addr((unsigned long)(addr))))
 
 static inline bool pte_access_permitted(pte_t pte, bool write)
diff --git a/arch/sparc/include/asm/uaccess_64.h b/arch/sparc/include/asm/uaccess_64.h
index 94266a5c5b04..b825a5dd0210 100644
--- a/arch/sparc/include/asm/uaccess_64.h
+++ b/arch/sparc/include/asm/uaccess_64.h
@@ -8,8 +8,10 @@
 
 #include <linux/compiler.h>
 #include <linux/string.h>
+#include <linux/mm_types.h>
 #include <asm/asi.h>
 #include <asm/spitfire.h>
+#include <asm/pgtable.h>
 
 #include <asm/processor.h>
 #include <asm-generic/access_ok.h>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 67abf8dcd30a..6136d7000844 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1491,7 +1491,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		if (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
 			if (!offset || !*offset)
 				return -EINVAL;
-			user_addr = untagged_addr(*offset);
+			user_addr = untagged_addr(current->mm, *offset);
 		} else if (flags & (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
 				    KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
 			bo_type = ttm_bo_type_sg;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 8ef31d687ef3..691dfb3f2c0e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -382,7 +382,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,
 	uint32_t handle;
 	int r;
 
-	args->addr = untagged_addr(args->addr);
+	args->addr = untagged_addr(current->mm, args->addr);
 
 	if (offset_in_page(args->addr | args->size))
 		return -EINVAL;
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index 8c01a7f0e027..2c3980677f64 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -371,7 +371,7 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
 	uint32_t handle;
 	int r;
 
-	args->addr = untagged_addr(args->addr);
+	args->addr = untagged_addr(current->mm, args->addr);
 
 	if (offset_in_page(args->addr | args->size))
 		return -EINVAL;
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 04a67b481608..b2860feeae3c 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -379,7 +379,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_device *device, u64 start,
 	 * again
 	 */
 	if (!ib_access_writable(access_flags)) {
-		unsigned long untagged_start = untagged_addr(start);
+		unsigned long untagged_start = untagged_addr(current->mm, start);
 		struct vm_area_struct *vma;
 
 		mmap_read_lock(current->mm);
diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index 542dde9d2609..7e62f7a2555d 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -47,7 +47,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	if (WARN_ON_ONCE(nr_frames > vec->nr_allocated))
 		nr_frames = vec->nr_allocated;
 
-	start = untagged_addr(start);
+	start = untagged_addr(mm, start);
 
 	ret = pin_user_pages_fast(start, nr_frames,
 				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..a1444f8afa05 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -157,8 +157,8 @@ static void videobuf_dma_contig_user_put(struct videobuf_dma_contig_memory *mem)
 static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 					struct videobuf_buffer *vb)
 {
-	unsigned long untagged_baddr = untagged_addr(vb->baddr);
 	struct mm_struct *mm = current->mm;
+	unsigned long untagged_baddr = untagged_addr(mm, vb->baddr);
 	struct vm_area_struct *vma;
 	unsigned long prev_pfn, this_pfn;
 	unsigned long pages_done, user_address;
diff --git a/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c b/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
index 0168f9839c90..863d30a7ad23 100644
--- a/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
+++ b/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
@@ -913,7 +913,7 @@ static int alloc_user_pages(struct hmm_buffer_object *bo,
 	 * and map to user space
 	 */
 
-	userptr = untagged_addr(userptr);
+	userptr = untagged_addr(current->mm, userptr);
 
 	bo->pages = pages;
 
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index f2b1bcefcadd..386be09cb2cd 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -261,7 +261,7 @@ register_shm_helper(struct tee_context *ctx, unsigned long addr,
 	shm->flags = flags;
 	shm->ctx = ctx;
 	shm->id = id;
-	addr = untagged_addr(addr);
+	addr = untagged_addr(current->mm, addr);
 	start = rounddown(addr, PAGE_SIZE);
 	shm->offset = addr - start;
 	shm->size = length;
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e357..5ac6c61d7caa 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -561,7 +561,7 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 		goto done;
 	}
 
-	vaddr = untagged_addr(vaddr);
+	vaddr = untagged_addr(mm, vaddr);
 
 retry:
 	vma = vma_lookup(mm, vaddr);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2d04e3470d4c..c7d262bd6d6b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1659,7 +1659,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	/* watch out for wraparound */
 	start_vaddr = end_vaddr;
 	if (svpfn <= (ULONG_MAX >> PAGE_SHIFT))
-		start_vaddr = untagged_addr(svpfn << PAGE_SHIFT);
+		start_vaddr = untagged_addr(mm, svpfn << PAGE_SHIFT);
 
 	/* Ensure the address is inside the task */
 	if (start_vaddr > mm->task_size)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..f0cb92ff1391 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -94,17 +94,6 @@ extern int mmap_rnd_compat_bits __read_mostly;
 #include <asm/page.h>
 #include <asm/processor.h>
 
-/*
- * Architectures that support memory tagging (assigning tags to memory regions,
- * embedding these tags into addresses that point to these memory regions, and
- * checking that the memory and the pointer tags match on memory accesses)
- * redefine this macro to strip tags from pointers.
- * It's defined as noop for architectures that don't support memory tagging.
- */
-#ifndef untagged_addr
-#define untagged_addr(addr) (addr)
-#endif
-
 #ifndef __pa_symbol
 #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
 #endif
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5a328cf02b75..05a157bbaaef 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -10,6 +10,17 @@
 
 #include <asm/uaccess.h>
 
+/*
+ * Architectures that support memory tagging (assigning tags to memory regions,
+ * embedding these tags into addresses that point to these memory regions, and
+ * checking that the memory and the pointer tags match on memory accesses)
+ * redefine this macro to strip tags from pointers.
+ * It's defined as noop for architectures that don't support memory tagging.
+ */
+#ifndef untagged_addr
+#define untagged_addr(mm, addr) (addr)
+#endif
+
 /*
  * Architectures should provide two primitives (raw_copy_{to,from}_user())
  * and get rid of their private instances of copy_{to,from}_user() and
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index 6432b8c3e431..6e1e2aa0c994 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -121,7 +121,7 @@ long strncpy_from_user(char *dst, const char __user *src, long count)
 		return 0;
 
 	max_addr = TASK_SIZE_MAX;
-	src_addr = (unsigned long)untagged_addr(src);
+	src_addr = (unsigned long)untagged_addr(current->mm, src);
 	if (likely(src_addr < max_addr)) {
 		unsigned long max = max_addr - src_addr;
 		long retval;
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index feeb935a2299..abc096a68f05 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -97,7 +97,7 @@ long strnlen_user(const char __user *str, long count)
 		return 0;
 
 	max_addr = TASK_SIZE_MAX;
-	src_addr = (unsigned long)untagged_addr(str);
+	src_addr = (unsigned long)untagged_addr(current->mm, str);
 	if (likely(src_addr < max_addr)) {
 		unsigned long max = max_addr - src_addr;
 		long retval;
diff --git a/mm/gup.c b/mm/gup.c
index 551264407624..dbe825faf842 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1104,7 +1104,7 @@ static long __get_user_pages(struct mm_struct *mm,
 	if (!nr_pages)
 		return 0;
 
-	start = untagged_addr(start);
+	start = untagged_addr(mm, start);
 
 	VM_BUG_ON(!!pages != !!(gup_flags & (FOLL_GET | FOLL_PIN)));
 
@@ -1285,7 +1285,7 @@ int fixup_user_fault(struct mm_struct *mm,
 	struct vm_area_struct *vma;
 	vm_fault_t ret;
 
-	address = untagged_addr(address);
+	address = untagged_addr(mm, address);
 
 	if (unlocked)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
@@ -2865,7 +2865,7 @@ static int internal_get_user_pages_fast(unsigned long start,
 	if (!(gup_flags & FOLL_FAST_ONLY))
 		might_lock_read(&current->mm->mmap_lock);
 
-	start = untagged_addr(start) & PAGE_MASK;
+	start = untagged_addr(current->mm, start) & PAGE_MASK;
 	len = nr_pages << PAGE_SHIFT;
 	if (check_add_overflow(start, len, &end))
 		return 0;
diff --git a/mm/madvise.c b/mm/madvise.c
index d7b4f2602949..e3c668ddb099 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1373,7 +1373,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	size_t len;
 	struct blk_plug plug;
 
-	start = untagged_addr(start);
+	start = untagged_addr(mm, start);
 
 	if (!madvise_behavior_valid(behavior))
 		return -EINVAL;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d39b01fd52fe..a03b4d2bc26a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1458,7 +1458,7 @@ static long kernel_mbind(unsigned long start, unsigned long len,
 	int lmode = mode;
 	int err;
 
-	start = untagged_addr(start);
+	start = untagged_addr(current->mm, start);
 	err = sanitize_mpol_flags(&lmode, &mode_flags);
 	if (err)
 		return err;
@@ -1481,7 +1481,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
 	unsigned long end;
 	int err = -ENOENT;
 
-	start = untagged_addr(start);
+	start = untagged_addr(mm, start);
 	if (start & ~PAGE_MASK)
 		return -EINVAL;
 	/*
@@ -1684,7 +1684,7 @@ static int kernel_get_mempolicy(int __user *policy,
 	if (nmask != NULL && maxnode < nr_node_ids)
 		return -EINVAL;
 
-	addr = untagged_addr(addr);
+	addr = untagged_addr(current->mm, addr);
 
 	err = do_get_mempolicy(&pval, &nodes, addr, flags);
 
diff --git a/mm/migrate.c b/mm/migrate.c
index e51588e95f57..af05049b055b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1714,7 +1714,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 			goto out_flush;
 		if (get_user(node, nodes + i))
 			goto out_flush;
-		addr = (unsigned long)untagged_addr(p);
+		addr = (unsigned long)untagged_addr(mm, p);
 
 		err = -ENODEV;
 		if (node < 0 || node >= MAX_NUMNODES)
diff --git a/mm/mincore.c b/mm/mincore.c
index fa200c14185f..72c55bd9d184 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -236,7 +236,7 @@ SYSCALL_DEFINE3(mincore, unsigned long, start, size_t, len,
 	unsigned long pages;
 	unsigned char *tmp;
 
-	start = untagged_addr(start);
+	start = untagged_addr(current->mm, start);
 
 	/* Check the start address: needs to be page-aligned.. */
 	if (start & ~PAGE_MASK)
diff --git a/mm/mlock.c b/mm/mlock.c
index 716caf851043..054168d3e648 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -571,7 +571,7 @@ static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t fla
 	unsigned long lock_limit;
 	int error = -ENOMEM;
 
-	start = untagged_addr(start);
+	start = untagged_addr(current->mm, start);
 
 	if (!can_do_mlock())
 		return -EPERM;
@@ -634,7 +634,7 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 {
 	int ret;
 
-	start = untagged_addr(start);
+	start = untagged_addr(current->mm, start);
 
 	len = PAGE_ALIGN(len + (offset_in_page(start)));
 	start &= PAGE_MASK;
diff --git a/mm/mmap.c b/mm/mmap.c
index 61e6135c54ef..1a7baf6b6b8e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2926,7 +2926,7 @@ EXPORT_SYMBOL(vm_munmap);
 
 SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
 {
-	addr = untagged_addr(addr);
+	addr = untagged_addr(current->mm, addr);
 	return __vm_munmap(addr, len, true);
 }
 
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ba5592655ee3..871e954f6155 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -622,7 +622,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 				(prot & PROT_READ);
 	struct mmu_gather tlb;
 
-	start = untagged_addr(start);
+	start = untagged_addr(current->mm, start);
 
 	prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP);
 	if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */
diff --git a/mm/mremap.c b/mm/mremap.c
index b522cd0259a0..f76648bc4f67 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -906,7 +906,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	 *
 	 * See Documentation/arm64/tagged-address-abi.rst for more information.
 	 */
-	addr = untagged_addr(addr);
+	addr = untagged_addr(mm, addr);
 
 	if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_DONTUNMAP))
 		return ret;
diff --git a/mm/msync.c b/mm/msync.c
index 137d1c104f3e..5fe989bd3c4b 100644
--- a/mm/msync.c
+++ b/mm/msync.c
@@ -37,7 +37,7 @@ SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
 	int unmapped_error = 0;
 	int error = -EINVAL;
 
-	start = untagged_addr(start);
+	start = untagged_addr(mm, start);
 
 	if (flags & ~(MS_ASYNC | MS_INVALIDATE | MS_SYNC))
 		goto out;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 64ec2222a196..93d5b2e3073a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1876,7 +1876,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		return -EINVAL;
 	/* We can read the guest memory with __xxx_user() later on. */
 	if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
-	    (mem->userspace_addr != untagged_addr(mem->userspace_addr)) ||
+	    (mem->userspace_addr != untagged_addr(kvm->mm, mem->userspace_addr)) ||
 	     !access_ok((void __user *)(unsigned long)mem->userspace_addr,
 			mem->memory_size))
 		return -EINVAL;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCHv3 4/8] x86/mm: Handle LAM on context switch
  2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2022-06-10 14:35 ` [PATCHv3 3/8] mm: Pass down mm_struct to untagged_addr() Kirill A. Shutemov
@ 2022-06-10 14:35 ` Kirill A. Shutemov
  2022-06-10 23:55   ` Edgecombe, Rick P
                     ` (3 more replies)
  2022-06-10 14:35 ` [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
                   ` (5 subsequent siblings)
  9 siblings, 4 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 14:35 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel, Kirill A. Shutemov

Linear Address Masking mode for userspace pointers is encoded in CR3
bits. The mode is selected per-process: store the CR3 mask in
mm_context_t to indicate that the mm has Linear Address Masking enabled.

switch_mm_irqs_off() now respects this mask and constructs CR3
accordingly.

The active LAM mode gets recorded in the tlb_state.
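
To make the CR3 composition concrete, a standalone sketch with made-up
values (illustration only; the kernel code below uses __sme_pa(),
kern_pcid() and mm->context.lam_cr3_mask):

#include <stdint.h>
#include <stdio.h>

#define CR3_LAM_U57     (1ULL << 61)    /* X86_CR3_LAM_U57, see patch 2 */
#define CR3_NOFLUSH     (1ULL << 63)

int main(void)
{
        uint64_t pgd_pa = 0x1234000;    /* assumed PGD physical address */
        uint64_t pcid   = 0x006;        /* assumed kernel PCID */

        /* build_cr3(): flushing variant */
        printf("%#llx\n", (unsigned long long)(pgd_pa | pcid | CR3_LAM_U57));
        /* build_cr3_noflush(): keeps the TLB, additionally sets bit 63 */
        printf("%#llx\n",
               (unsigned long long)(pgd_pa | pcid | CR3_LAM_U57 | CR3_NOFLUSH));
        return 0;
}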

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mmu.h         |  1 +
 arch/x86/include/asm/mmu_context.h | 24 ++++++++++++
 arch/x86/include/asm/tlbflush.h    |  3 ++
 arch/x86/mm/tlb.c                  | 62 ++++++++++++++++++++++--------
 4 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 5d7494631ea9..d150e92163b6 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -40,6 +40,7 @@ typedef struct {
 
 #ifdef CONFIG_X86_64
 	unsigned short flags;
+	u64 lam_cr3_mask;
 #endif
 
 	struct mutex lock;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b8d40ddeab00..e6eac047c728 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -91,6 +91,29 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
 }
 #endif
 
+#ifdef CONFIG_X86_64
+static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
+{
+	return mm->context.lam_cr3_mask;
+}
+
+static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
+}
+
+#else
+
+static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
+{
+	return 0;
+}
+
+static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+}
+#endif
+
 #define enter_lazy_tlb enter_lazy_tlb
 extern void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
 
@@ -168,6 +191,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 	arch_dup_pkeys(oldmm, mm);
 	paravirt_arch_dup_mmap(oldmm, mm);
+	dup_lam(oldmm, mm);
 	return ldt_dup_context(oldmm, mm);
 }
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 4af5579c7ef7..5b93dad93ff4 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -86,6 +86,9 @@ struct tlb_state {
 		unsigned long		last_user_mm_spec;
 	};
 
+#ifdef CONFIG_X86_64
+	u64 lam_cr3_mask;
+#endif
 	u16 loaded_mm_asid;
 	u16 next_asid;
 
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d400b6d9d246..458867a8f4bd 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -154,17 +154,17 @@ static inline u16 user_pcid(u16 asid)
 	return ret;
 }
 
-static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
+static inline unsigned long build_cr3(pgd_t *pgd, u16 asid, u64 lam)
 {
 	if (static_cpu_has(X86_FEATURE_PCID)) {
-		return __sme_pa(pgd) | kern_pcid(asid);
+		return __sme_pa(pgd) | kern_pcid(asid) | lam;
 	} else {
 		VM_WARN_ON_ONCE(asid != 0);
-		return __sme_pa(pgd);
+		return __sme_pa(pgd) | lam;
 	}
 }
 
-static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
+static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid, u64 lam)
 {
 	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
 	/*
@@ -173,7 +173,7 @@ static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
 	 * boot because all CPU's the have same capabilities:
 	 */
 	VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
-	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
+	return __sme_pa(pgd) | kern_pcid(asid) | lam | CR3_NOFLUSH;
 }
 
 /*
@@ -274,15 +274,15 @@ static inline void invalidate_user_asid(u16 asid)
 		  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
 }
 
-static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush)
+static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, u64 lam, bool need_flush)
 {
 	unsigned long new_mm_cr3;
 
 	if (need_flush) {
 		invalidate_user_asid(new_asid);
-		new_mm_cr3 = build_cr3(pgdir, new_asid);
+		new_mm_cr3 = build_cr3(pgdir, new_asid, lam);
 	} else {
-		new_mm_cr3 = build_cr3_noflush(pgdir, new_asid);
+		new_mm_cr3 = build_cr3_noflush(pgdir, new_asid, lam);
 	}
 
 	/*
@@ -486,11 +486,36 @@ void cr4_update_pce(void *ignored)
 static inline void cr4_update_pce_mm(struct mm_struct *mm) { }
 #endif
 
+#ifdef CONFIG_X86_64
+static inline u64 tlbstate_lam_cr3_mask(void)
+{
+	return this_cpu_read(cpu_tlbstate.lam_cr3_mask);
+}
+
+static inline void set_tlbstate_lam_cr3_mask(u64 mask)
+{
+	this_cpu_write(cpu_tlbstate.lam_cr3_mask, mask);
+}
+
+#else
+
+static inline u64 tlbstate_lam_cr3_mask(void)
+{
+	return 0;
+}
+
+static inline void set_tlbstate_lam_cr3_mask(u64 mask)
+{
+}
+#endif
+
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
 	struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	u64 prev_lam = tlbstate_lam_cr3_mask();
+	u64 new_lam = mm_cr3_lam_mask(next);
 	bool was_lazy = this_cpu_read(cpu_tlbstate_shared.is_lazy);
 	unsigned cpu = smp_processor_id();
 	u64 next_tlb_gen;
@@ -504,6 +529,9 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * cpu_tlbstate.loaded_mm) matches next.
 	 *
 	 * NB: leave_mm() calls us with prev == NULL and tsk == NULL.
+	 *
+	 * NB: Initial LAM enabling calls us with prev == next. We must update
+	 * CR3 if prev_lam doesn't match the new one.
 	 */
 
 	/* We don't want flush_tlb_func() to run concurrently with us. */
@@ -520,7 +548,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * isn't free.
 	 */
 #ifdef CONFIG_DEBUG_VM
-	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
+	if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid, prev_lam))) {
 		/*
 		 * If we were to BUG here, we'd be very likely to kill
 		 * the system so hard that we don't see the call trace.
@@ -551,7 +579,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * provides that full memory barrier and core serializing
 	 * instruction.
 	 */
-	if (real_prev == next) {
+	if (real_prev == next && prev_lam == new_lam) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			   next->context.ctx_id);
 
@@ -622,15 +650,16 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		barrier();
 	}
 
+	set_tlbstate_lam_cr3_mask(new_lam);
 	if (need_flush) {
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
-		load_new_mm_cr3(next->pgd, new_asid, true);
+		load_new_mm_cr3(next->pgd, new_asid, new_lam, true);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 	} else {
 		/* The new ASID is already up to date. */
-		load_new_mm_cr3(next->pgd, new_asid, false);
+		load_new_mm_cr3(next->pgd, new_asid, new_lam, false);
 
 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
 	}
@@ -687,6 +716,7 @@ void initialize_tlbstate_and_flush(void)
 	struct mm_struct *mm = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u64 tlb_gen = atomic64_read(&init_mm.context.tlb_gen);
 	unsigned long cr3 = __read_cr3();
+	u64 lam = cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
 
 	/* Assert that CR3 already references the right mm. */
 	WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
@@ -700,7 +730,7 @@ void initialize_tlbstate_and_flush(void)
 		!(cr4_read_shadow() & X86_CR4_PCIDE));
 
 	/* Force ASID 0 and force a TLB flush. */
-	write_cr3(build_cr3(mm->pgd, 0));
+	write_cr3(build_cr3(mm->pgd, 0, lam));
 
 	/* Reinitialize tlbstate. */
 	this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_INIT);
@@ -1047,8 +1077,10 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
  */
 unsigned long __get_current_cr3_fast(void)
 {
-	unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
-		this_cpu_read(cpu_tlbstate.loaded_mm_asid));
+	unsigned long cr3 =
+		build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
+		this_cpu_read(cpu_tlbstate.loaded_mm_asid),
+		tlbstate_lam_cr3_mask());
 
 	/* For now, be very restrictive about when this can be called. */
 	VM_WARN_ON(in_nmi() || preemptible());
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2022-06-10 14:35 ` [PATCHv3 4/8] x86/mm: Handle LAM on context switch Kirill A. Shutemov
@ 2022-06-10 14:35 ` Kirill A. Shutemov
  2022-06-13 17:36   ` Edgecombe, Rick P
                     ` (2 more replies)
  2022-06-10 14:35 ` [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
                   ` (4 subsequent siblings)
  9 siblings, 3 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 14:35 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel, Kirill A. Shutemov

untagged_addr() is a helper used by the core mm to strip tag bits and
bring the address back to its canonical shape. It only handles userspace
addresses. The untagging mask is stored in mmu_context and will be set
when LAM is enabled for the process.

The tags must not be included in the check of whether it is okay to
access the userspace address.

Strip tags in access_ok().

get_user() and put_user() don't use access_ok(), but check access
against TASK_SIZE directly in assembly. Strip tags before calling into
the assembly helper.
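
To illustrate the branchless untagging used in this patch, a standalone
userspace sketch (illustration only): the sign of the address is used so
that user pointers get their tag bits cleared while kernel addresses
pass through unchanged:

#include <stdint.h>
#include <stdio.h>

static uint64_t untag(uint64_t addr, uint64_t untag_mask)
{
        int64_t sign = (int64_t)addr >> 63;     /* 0 for user, ~0 for kernel */

        addr ^= sign;           /* kernel address: bitwise invert */
        addr &= untag_mask;     /* clear tag bits */
        addr ^= sign;           /* kernel address: invert back, unchanged */
        return addr;
}

int main(void)
{
        uint64_t mask = ~0x7e00000000000000ULL; /* LAM_U57: ~GENMASK(62, 57) */

        /* user pointer with metadata in bits 62:57 -> tag stripped */
        printf("%#llx\n", (unsigned long long)untag(0x2a00007fffff1000ULL, mask));
        /* kernel address -> left intact */
        printf("%#llx\n", (unsigned long long)untag(0xffffffff81000000ULL, mask));
        return 0;
}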

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mmu.h         |  1 +
 arch/x86/include/asm/mmu_context.h | 11 ++++++++
 arch/x86/include/asm/uaccess.h     | 44 ++++++++++++++++++++++++++++--
 arch/x86/kernel/process.c          |  3 ++
 4 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index d150e92163b6..59c617fc7c6f 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -41,6 +41,7 @@ typedef struct {
 #ifdef CONFIG_X86_64
 	unsigned short flags;
 	u64 lam_cr3_mask;
+	u64 untag_mask;
 #endif
 
 	struct mutex lock;
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index e6eac047c728..05821534aadc 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -100,6 +100,12 @@ static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
 static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
+	mm->context.untag_mask = oldmm->context.untag_mask;
+}
+
+static inline void mm_reset_untag_mask(struct mm_struct *mm)
+{
+	mm->context.untag_mask = -1UL;
 }
 
 #else
@@ -112,6 +118,10 @@ static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
 static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 }
+
+static inline void mm_reset_untag_mask(struct mm_struct *mm)
+{
+}
 #endif
 
 #define enter_lazy_tlb enter_lazy_tlb
@@ -138,6 +148,7 @@ static inline int init_new_context(struct task_struct *tsk,
 		mm->context.execute_only_pkey = -1;
 	}
 #endif
+	mm_reset_untag_mask(mm);
 	init_new_context_ldt(mm);
 	return 0;
 }
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 35f222aa66bf..ca754521a4db 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -6,6 +6,7 @@
  */
 #include <linux/compiler.h>
 #include <linux/kasan-checks.h>
+#include <linux/mm_types.h>
 #include <linux/string.h>
 #include <asm/asm.h>
 #include <asm/page.h>
@@ -20,6 +21,32 @@ static inline bool pagefault_disabled(void);
 # define WARN_ON_IN_IRQ()
 #endif
 
+#ifdef CONFIG_X86_64
+/*
+ * Mask out tag bits from the address.
+ *
+ * Magic with the 'sign' allows to untag userspace pointer without any branches
+ * while leaving kernel addresses intact.
+ */
+#define untagged_addr(mm, addr)	({					\
+	u64 __addr = (__force u64)(addr);				\
+	s64 sign = (s64)__addr >> 63;					\
+	__addr ^= sign;							\
+	__addr &= (mm)->context.untag_mask;				\
+	__addr ^= sign;							\
+	(__force __typeof__(addr))__addr;				\
+})
+
+#define untagged_ptr(mm, ptr)	({					\
+	u64 __ptrval = (__force u64)(ptr);				\
+	__ptrval = untagged_addr(mm, __ptrval);				\
+	(__force __typeof__(*(ptr)) *)__ptrval;				\
+})
+#else
+#define untagged_addr(mm, addr)	(addr)
+#define untagged_ptr(mm, ptr)	(ptr)
+#endif
+
 /**
  * access_ok - Checks if a user space pointer is valid
  * @addr: User space pointer to start of block to check
@@ -40,7 +67,7 @@ static inline bool pagefault_disabled(void);
 #define access_ok(addr, size)					\
 ({									\
 	WARN_ON_IN_IRQ();						\
-	likely(__access_ok(addr, size));				\
+	likely(__access_ok(untagged_addr(current->mm, addr), size));	\
 })
 
 #include <asm-generic/access_ok.h>
@@ -125,7 +152,13 @@ extern int __get_user_bad(void);
  * Return: zero on success, or -EFAULT on error.
  * On error, the variable @x is set to zero.
  */
-#define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
+#define get_user(x,ptr)							\
+({									\
+	__typeof__(*(ptr)) __user *__ptr_clean;				\
+	__ptr_clean = untagged_ptr(current->mm, ptr);			\
+	might_fault();							\
+	do_get_user_call(get_user,x,__ptr_clean);			\
+})
 
 /**
  * __get_user - Get a simple variable from user space, with less checking.
@@ -222,7 +255,12 @@ extern void __put_user_nocheck_8(void);
  *
  * Return: zero on success, or -EFAULT on error.
  */
-#define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); })
+#define put_user(x, ptr) ({						\
+	__typeof__(*(ptr)) __user *__ptr_clean;				\
+	__ptr_clean = untagged_ptr(current->mm, ptr);			\
+	might_fault();							\
+	do_put_user_call(put_user,x,__ptr_clean);			\
+})
 
 /**
  * __put_user - Write a simple value into user space, with less checking.
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9b2772b7e1f3..18b2bfdf7b9b 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -47,6 +47,7 @@
 #include <asm/frame.h>
 #include <asm/unwind.h>
 #include <asm/tdx.h>
+#include <asm/mmu_context.h>
 
 #include "process.h"
 
@@ -367,6 +368,8 @@ void arch_setup_new_exec(void)
 		task_clear_spec_ssb_noexec(current);
 		speculation_ctrl_update(read_thread_flags());
 	}
+
+	mm_reset_untag_mask(current->mm);
 }
 
 #ifdef CONFIG_X86_IOPL_IOPERM
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2022-06-10 14:35 ` [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
@ 2022-06-10 14:35 ` Kirill A. Shutemov
  2022-06-10 15:25   ` Edgecombe, Rick P
                     ` (4 more replies)
  2022-06-10 14:35 ` [PATCHv3 7/8] x86: Expose untagging mask in /proc/$PID/arch_status Kirill A. Shutemov
                   ` (3 subsequent siblings)
  9 siblings, 5 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 14:35 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel, Kirill A. Shutemov

Add a couple of arch_prctl() handlers:

 - ARCH_ENABLE_TAGGED_ADDR enables LAM. The argument is the required
   number of tag bits. It is rounded up to the nearest LAM mode that
   can provide it. For now only LAM_U57 is supported, with 6 tag bits.

 - ARCH_GET_UNTAG_MASK returns the untag mask. It indicates where the
   tag bits are located in the address.
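
A userspace usage sketch (illustration only; the prctl values are taken
from the uapi hunk below and are not yet in libc headers):

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define ARCH_GET_UNTAG_MASK             0x4001
#define ARCH_ENABLE_TAGGED_ADDR         0x4002

int main(void)
{
        unsigned long mask;

        /* Ask for 6 tag bits; rounded up to LAM_U57 by this patch. */
        if (syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, 6)) {
                perror("ARCH_ENABLE_TAGGED_ADDR");
                return 1;
        }

        syscall(SYS_arch_prctl, ARCH_GET_UNTAG_MASK, &mask);
        printf("untag mask: %#lx\n", mask);     /* ~GENMASK(62, 57) */
        return 0;
}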

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/uapi/asm/prctl.h |  3 +++
 arch/x86/kernel/process_64.c      | 32 ++++++++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 500b96e71f18..38164a05c23c 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -20,4 +20,7 @@
 #define ARCH_MAP_VDSO_32		0x2002
 #define ARCH_MAP_VDSO_64		0x2003
 
+#define ARCH_GET_UNTAG_MASK		0x4001
+#define ARCH_ENABLE_TAGGED_ADDR		0x4002
+
 #endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1962008fe743..93c8eba1a66d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -742,6 +742,32 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 }
 #endif
 
+static int prctl_enable_tagged_addr(unsigned long nr_bits)
+{
+	struct mm_struct *mm = current->mm;
+
+	/* Already enabled? */
+	if (mm->context.lam_cr3_mask)
+		return -EBUSY;
+
+	/* LAM has to be enabled before spawning threads */
+	if (get_nr_threads(current) > 1)
+		return -EBUSY;
+
+	if (!nr_bits) {
+		return -EINVAL;
+	} else if (nr_bits <= 6) {
+		mm->context.lam_cr3_mask = X86_CR3_LAM_U57;
+		mm->context.untag_mask =  ~GENMASK(62, 57);
+	} else {
+		return -EINVAL;
+	}
+
+	/* Update CR3 to get LAM active */
+	switch_mm(current->mm, current->mm, current);
+	return 0;
+}
+
 long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 {
 	int ret = 0;
@@ -829,7 +855,11 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 	case ARCH_MAP_VDSO_64:
 		return prctl_map_vdso(&vdso_image_64, arg2);
 #endif
-
+	case ARCH_GET_UNTAG_MASK:
+		return put_user(current->mm->context.untag_mask,
+				(unsigned long __user *)arg2);
+	case ARCH_ENABLE_TAGGED_ADDR:
+		return prctl_enable_tagged_addr(arg2);
 	default:
 		ret = -EINVAL;
 		break;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCHv3 7/8] x86: Expose untagging mask in /proc/$PID/arch_status
  2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2022-06-10 14:35 ` [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
@ 2022-06-10 14:35 ` Kirill A. Shutemov
  2022-06-10 15:24   ` Dave Hansen
  2022-06-10 14:35 ` [PATCHv3 OPTIONAL 8/8] x86/mm: Extend LAM to support to LAM_U48 Kirill A. Shutemov
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 14:35 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel, Kirill A. Shutemov

Add a line to /proc/$PID/arch_status to report the untag_mask. It can
be used to find out the LAM status of a process from the outside, which
is useful for debuggers.
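
For a process that has enabled LAM_U57, the new line is expected to look
like this (illustrative value, i.e. ~GENMASK(62, 57)); a process without
LAM reports 0xffffffffffffffff:

untag_mask:	0x81ffffffffffffff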

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mmu_context.h | 10 ++++++
 arch/x86/kernel/Makefile           |  2 ++
 arch/x86/kernel/fpu/xstate.c       | 47 ----------------------------
 arch/x86/kernel/proc.c             | 50 ++++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+), 47 deletions(-)
 create mode 100644 arch/x86/kernel/proc.c

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 05821534aadc..a6cded0f5e64 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -103,6 +103,11 @@ static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 	mm->context.untag_mask = oldmm->context.untag_mask;
 }
 
+static inline unsigned long mm_untag_mask(struct mm_struct *mm)
+{
+	return mm->context.untag_mask;
+}
+
 static inline void mm_reset_untag_mask(struct mm_struct *mm)
 {
 	mm->context.untag_mask = -1UL;
@@ -119,6 +124,11 @@ static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
 {
 }
 
+static inline unsigned long mm_untag_mask(struct mm_struct *mm)
+{
+	return -1UL;
+}
+
 static inline void mm_reset_untag_mask(struct mm_struct *mm)
 {
 }
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 03364dc40d8d..228e108cbaba 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -145,6 +145,8 @@ obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev.o
 
+obj-$(CONFIG_PROC_FS)			+= proc.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..838a6f0627fd 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -10,8 +10,6 @@
 #include <linux/mman.h>
 #include <linux/nospec.h>
 #include <linux/pkeys.h>
-#include <linux/seq_file.h>
-#include <linux/proc_fs.h>
 #include <linux/vmalloc.h>
 
 #include <asm/fpu/api.h>
@@ -1745,48 +1743,3 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
 		return -EINVAL;
 	}
 }
-
-#ifdef CONFIG_PROC_PID_ARCH_STATUS
-/*
- * Report the amount of time elapsed in millisecond since last AVX512
- * use in the task.
- */
-static void avx512_status(struct seq_file *m, struct task_struct *task)
-{
-	unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
-	long delta;
-
-	if (!timestamp) {
-		/*
-		 * Report -1 if no AVX512 usage
-		 */
-		delta = -1;
-	} else {
-		delta = (long)(jiffies - timestamp);
-		/*
-		 * Cap to LONG_MAX if time difference > LONG_MAX
-		 */
-		if (delta < 0)
-			delta = LONG_MAX;
-		delta = jiffies_to_msecs(delta);
-	}
-
-	seq_put_decimal_ll(m, "AVX512_elapsed_ms:\t", delta);
-	seq_putc(m, '\n');
-}
-
-/*
- * Report architecture specific information
- */
-int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
-			struct pid *pid, struct task_struct *task)
-{
-	/*
-	 * Report AVX512 state if the processor and build option supported.
-	 */
-	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
-		avx512_status(m, task);
-
-	return 0;
-}
-#endif /* CONFIG_PROC_PID_ARCH_STATUS */
diff --git a/arch/x86/kernel/proc.c b/arch/x86/kernel/proc.c
new file mode 100644
index 000000000000..59e681425e09
--- /dev/null
+++ b/arch/x86/kernel/proc.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/seq_file.h>
+#include <linux/proc_fs.h>
+#include <uapi/asm/prctl.h>
+#include <asm/mmu_context.h>
+
+/*
+ * Report the amount of time elapsed in millisecond since last AVX512
+ * use in the task.
+ */
+static void avx512_status(struct seq_file *m, struct task_struct *task)
+{
+	unsigned long timestamp = READ_ONCE(task->thread.fpu.avx512_timestamp);
+	long delta;
+
+	if (!timestamp) {
+		/*
+		 * Report -1 if no AVX512 usage
+		 */
+		delta = -1;
+	} else {
+		delta = (long)(jiffies - timestamp);
+		/*
+		 * Cap to LONG_MAX if time difference > LONG_MAX
+		 */
+		if (delta < 0)
+			delta = LONG_MAX;
+		delta = jiffies_to_msecs(delta);
+	}
+
+	seq_put_decimal_ll(m, "AVX512_elapsed_ms:\t", delta);
+	seq_putc(m, '\n');
+}
+
+/*
+ * Report architecture specific information
+ */
+int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
+			struct pid *pid, struct task_struct *task)
+{
+	/*
+	 * Report AVX512 state if the processor and build option supported.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
+		avx512_status(m, task);
+
+	seq_printf(m, "untag_mask:\t%#lx\n", mm_untag_mask(task->mm));
+
+	return 0;
+}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCHv3 OPTIONAL 8/8] x86/mm: Extend LAM to support to LAM_U48
  2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2022-06-10 14:35 ` [PATCHv3 7/8] x86: Expose untagging mask in /proc/$PID/arch_status Kirill A. Shutemov
@ 2022-06-10 14:35 ` Kirill A. Shutemov
  2022-06-16 10:00   ` Peter Zijlstra
  2022-06-10 20:22 ` [PATCHv3 0/8] Linear Address Masking enabling Kostya Serebryany
  2022-06-16 22:52 ` Edgecombe, Rick P
  9 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 14:35 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel, Kirill A. Shutemov

LAM_U48 allows encoding 15 bits of tags into the address.

LAM_U48 steals the bits above bit 47 for tags and makes it impossible for
userspace to use the full address space on a 5-level paging machine.

Make these features mutually exclusive: whichever gets enabled first
blocks the other one.
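
For example, opting in from userspace would look roughly like the sketch
below (assuming the ARCH_ENABLE_TAGGED_ADDR definition from this series is
available in the uapi headers; the error handling is illustrative):

#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <asm/prctl.h>		/* ARCH_ENABLE_TAGGED_ADDR from this series */

int main(void)
{
	/* Ask for 15 tag bits, which selects LAM_U48. */
	long ret = syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, 15UL);

	if (ret == -1 && errno == EBUSY)
		fprintf(stderr, "LAM_U48 refused: already enabled, "
			"multithreaded, or mappings above bit 47\n");

	return ret ? 1 : 0;
}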

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/elf.h         |  3 ++-
 arch/x86/include/asm/mmu_context.h | 13 +++++++++++++
 arch/x86/kernel/process_64.c       | 22 ++++++++++++++++++++++
 arch/x86/kernel/sys_x86_64.c       |  5 +++--
 arch/x86/mm/hugetlbpage.c          |  6 ++++--
 arch/x86/mm/mmap.c                 |  9 ++++++++-
 6 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index cb0ff1055ab1..4df13497a770 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -317,7 +317,8 @@ static inline int mmap_is_ia32(void)
 extern unsigned long task_size_32bit(void);
 extern unsigned long task_size_64bit(int full_addr_space);
 extern unsigned long get_mmap_base(int is_legacy);
-extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len);
+extern bool mmap_address_hint_valid(struct mm_struct *mm,
+				    unsigned long addr, unsigned long len);
 extern unsigned long get_sigframe_size(void);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index a6cded0f5e64..17d31988edd6 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -263,6 +263,19 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
 
 unsigned long __get_current_cr3_fast(void);
 
+#ifdef CONFIG_X86_5LEVEL
+static inline bool full_va_allowed(struct mm_struct *mm)
+{
+	/* LAM_U48 steals VA bits above 47-bit for tags */
+	return mm->context.lam_cr3_mask != X86_CR3_LAM_U48;
+}
+#else
+static inline bool full_va_allowed(struct mm_struct *mm)
+{
+	return false;
+}
+#endif
+
 #include <asm-generic/mmu_context.h>
 
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 93c8eba1a66d..56822d313b96 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -742,6 +742,16 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 }
 #endif
 
+static bool lam_u48_allowed(void)
+{
+	struct mm_struct *mm = current->mm;
+
+	if (!full_va_allowed(mm))
+		return true;
+
+	return find_vma(mm, DEFAULT_MAP_WINDOW) == NULL;
+}
+
 static int prctl_enable_tagged_addr(unsigned long nr_bits)
 {
 	struct mm_struct *mm = current->mm;
@@ -759,6 +769,18 @@ static int prctl_enable_tagged_addr(unsigned long nr_bits)
 	} else if (nr_bits <= 6) {
 		mm->context.lam_cr3_mask = X86_CR3_LAM_U57;
 		mm->context.untag_mask =  ~GENMASK(62, 57);
+	} else if (nr_bits <= 15) {
+		if (mmap_write_lock_killable(mm))
+			return -EINTR;
+
+		if (!lam_u48_allowed()) {
+			mmap_write_unlock(mm);
+			return -EBUSY;
+		}
+
+		mm->context.lam_cr3_mask = X86_CR3_LAM_U48;
+		mm->context.untag_mask =  ~GENMASK(62, 48);
+		mmap_write_unlock(mm);
 	} else {
 		return -EINVAL;
 	}
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 8cc653ffdccd..5ea6aaed89ba 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -21,6 +21,7 @@
 
 #include <asm/elf.h>
 #include <asm/ia32.h>
+#include <asm/mmu_context.h>
 
 /*
  * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
@@ -182,7 +183,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	/* requesting a specific address */
 	if (addr) {
 		addr &= PAGE_MASK;
-		if (!mmap_address_hint_valid(addr, len))
+		if (!mmap_address_hint_valid(mm, addr, len))
 			goto get_unmapped_area;
 
 		vma = find_vma(mm, addr);
@@ -203,7 +204,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	 * !in_32bit_syscall() check to avoid high addresses for x32
 	 * (and make it no op on native i386).
 	 */
-	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
+	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall() && full_va_allowed(mm))
 		info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
 
 	info.align_mask = 0;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index a0d023cb4292..9fdc8db42365 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -18,6 +18,7 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 #include <asm/elf.h>
+#include <asm/mmu_context.h>
 
 #if 0	/* This is just for testing */
 struct page *
@@ -103,6 +104,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 		unsigned long pgoff, unsigned long flags)
 {
 	struct hstate *h = hstate_file(file);
+	struct mm_struct *mm = current->mm;
 	struct vm_unmapped_area_info info;
 
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
@@ -114,7 +116,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
 	 * in the full address space.
 	 */
-	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
+	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall() && full_va_allowed(mm))
 		info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
 
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
@@ -161,7 +163,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 
 	if (addr) {
 		addr &= huge_page_mask(h);
-		if (!mmap_address_hint_valid(addr, len))
+		if (!mmap_address_hint_valid(mm, addr, len))
 			goto get_unmapped_area;
 
 		vma = find_vma(mm, addr);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index c90c20904a60..f9ca824729de 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -21,6 +21,7 @@
 #include <linux/elf-randomize.h>
 #include <asm/elf.h>
 #include <asm/io.h>
+#include <asm/mmu_context.h>
 
 #include "physaddr.h"
 
@@ -35,6 +36,8 @@ unsigned long task_size_32bit(void)
 
 unsigned long task_size_64bit(int full_addr_space)
 {
+	if (!full_va_allowed(current->mm))
+		return DEFAULT_MAP_WINDOW;
 	return full_addr_space ? TASK_SIZE_MAX : DEFAULT_MAP_WINDOW;
 }
 
@@ -206,11 +209,15 @@ const char *arch_vma_name(struct vm_area_struct *vma)
  * the failure of such a fixed mapping request, so the restriction is not
  * applied.
  */
-bool mmap_address_hint_valid(unsigned long addr, unsigned long len)
+bool mmap_address_hint_valid(struct mm_struct *mm,
+			     unsigned long addr, unsigned long len)
 {
 	if (TASK_SIZE - len < addr)
 		return false;
 
+	if (addr + len > DEFAULT_MAP_WINDOW && !full_va_allowed(mm))
+		return false;
+
 	return (addr > DEFAULT_MAP_WINDOW) == (addr + len > DEFAULT_MAP_WINDOW);
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 7/8] x86: Expose untagging mask in /proc/$PID/arch_status
  2022-06-10 14:35 ` [PATCHv3 7/8] x86: Expose untagging mask in /proc/$PID/arch_status Kirill A. Shutemov
@ 2022-06-10 15:24   ` Dave Hansen
  2022-06-11  1:28     ` Kirill A. Shutemov
  0 siblings, 1 reply; 67+ messages in thread
From: Dave Hansen @ 2022-06-10 15:24 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel

On 6/10/22 07:35, Kirill A. Shutemov wrote:
> +/*
> + * Report architecture specific information
> + */
> +int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
> +			struct pid *pid, struct task_struct *task)
> +{
> +	/*
> +	 * Report AVX512 state if the processor and build option supported.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
> +		avx512_status(m, task);
> +
> +	seq_printf(m, "untag_mask:\t%#lx\n", mm_untag_mask(task->mm));
> +
> +	return 0;
> +}

Arch-specific gunk is great for, well, arch-specific stuff.  AVX-512 and
its, um, "quirks", really won't show up anywhere else.  But x86 isn't
even the first to be doing this address tagging business.

Shouldn't we be talking to the ARM folks about a common way to do this?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 14:35 ` [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
@ 2022-06-10 15:25   ` Edgecombe, Rick P
  2022-06-10 18:04     ` Kirill A. Shutemov
  2022-06-10 16:16   ` Edgecombe, Rick P
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-10 15:25 UTC (permalink / raw)
  To: kirill.shutemov, peterz, Lutomirski, Andy, dave.hansen
  Cc: linux-kernel, hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov,
	x86, ryabinin.a.a, glider

On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> +{
> +       struct mm_struct *mm = current->mm;

do_arch_prctl_64() can be called via ptrace. I think you need to
operate on the mm of 'task', or just block the operation if task !=
current.
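
Something like this, if simply blocking it is acceptable (just a sketch):

	case ARCH_ENABLE_TAGGED_ADDR:
		/* Refuse to change LAM for another task's mm (e.g. via ptrace) */
		if (task != current)
			return -EINVAL;
		return prctl_enable_tagged_addr(arg2);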

> +
> +       /* Already enabled? */
> +       if (mm->context.lam_cr3_mask)
> +               return -EBUSY;
> +
> +       /* LAM has to be enabled before spawning threads */
> +       if (get_nr_threads(current) > 1)
> +               return -EBUSY;
> +
> +       if (!nr_bits) {
> +               return -EINVAL;
> +       } else if (nr_bits <= 6) {
> +               mm->context.lam_cr3_mask = X86_CR3_LAM_U57;
> +               mm->context.untag_mask =  ~GENMASK(62, 57);
> +       } else {
> +               return -EINVAL;
> +       }
> +
> +       /* Update CR3 to get LAM active */
> +       switch_mm(current->mm, current->mm, current);
> +       return 0;
> +}
> +
>  long do_arch_prctl_64(struct task_struct *task, int option, unsigned
> long arg2)
>  {
>         int ret = 0;
> @@ -829,7 +855,11 @@ long do_arch_prctl_64(struct task_struct *task,
> int option, unsigned long arg2)
>         case ARCH_MAP_VDSO_64:
>                 return prctl_map_vdso(&vdso_image_64, arg2);
>  #endif
> -
> +       case ARCH_GET_UNTAG_MASK:
> +               return put_user(current->mm->context.untag_mask,
> +                               (unsigned long __user *)arg2);
> +       case ARCH_ENABLE_TAGGED_ADDR:
> +               return prctl_enable_tagged_addr(arg2);
>         default:
>                 ret = -EINVAL;
>                 break;

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 14:35 ` [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
  2022-06-10 15:25   ` Edgecombe, Rick P
@ 2022-06-10 16:16   ` Edgecombe, Rick P
  2022-06-10 18:06     ` Kirill A. Shutemov
  2022-06-13 14:42   ` Michal Hocko
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-10 16:16 UTC (permalink / raw)
  To: kirill.shutemov, peterz, Lutomirski, Andy, dave.hansen
  Cc: linux-kernel, hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov,
	x86, ryabinin.a.a, glider

On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> +{
> +       struct mm_struct *mm = current->mm;
> +
> +       /* Already enabled? */
> +       if (mm->context.lam_cr3_mask)
> +               return -EBUSY;
> +
> +       /* LAM has to be enabled before spawning threads */
> +       if (get_nr_threads(current) > 1)
> +               return -EBUSY;

Does this work for vfork()? I guess the idea is that locking is not
needed below because there is only one thread with the MM, but with
vfork() another task could operate on the MM, call fork(), etc. I'm not
sure...

> +
> +       if (!nr_bits) {
> +               return -EINVAL;
> +       } else if (nr_bits <= 6) {
> +               mm->context.lam_cr3_mask = X86_CR3_LAM_U57;
> +               mm->context.untag_mask =  ~GENMASK(62, 57);
> +       } else {
> +               return -EINVAL;
> +       }
> +
> +       /* Update CR3 to get LAM active */
> +       switch_mm(current->mm, current->mm, current);
> +       return 0;
> +}

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 15:25   ` Edgecombe, Rick P
@ 2022-06-10 18:04     ` Kirill A. Shutemov
  0 siblings, 0 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 18:04 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: peterz, Lutomirski, Andy, dave.hansen, linux-kernel, hjl.tools,
	linux-mm, kcc, andreyknvl, ak, dvyukov, x86, ryabinin.a.a,
	glider

On Fri, Jun 10, 2022 at 03:25:02PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> > +{
> > +       struct mm_struct *mm = current->mm;
> 
> do_arch_prctl_64() can be called via ptrace. I think you need to
> operate on the mm of 'task', or just block the operation if task !=
> current.

Hm. True. Let's see if the interface in general is good enough.

Ouch. I just noticed that I missed the check for the LAM feature :/

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 16:16   ` Edgecombe, Rick P
@ 2022-06-10 18:06     ` Kirill A. Shutemov
  2022-06-10 18:08       ` Edgecombe, Rick P
  0 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-10 18:06 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: peterz, Lutomirski, Andy, dave.hansen, linux-kernel, hjl.tools,
	linux-mm, kcc, andreyknvl, ak, dvyukov, x86, ryabinin.a.a,
	glider

On Fri, Jun 10, 2022 at 04:16:01PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> > +{
> > +       struct mm_struct *mm = current->mm;
> > +
> > +       /* Already enabled? */
> > +       if (mm->context.lam_cr3_mask)
> > +               return -EBUSY;
> > +
> > +       /* LAM has to be enabled before spawning threads */
> > +       if (get_nr_threads(current) > 1)
> > +               return -EBUSY;
> 
> Does this work for vfork()? I guess the idea is that locking is not
> needed below because there is only one thread with the MM, but with
> vfork() another task could operate on the MM, call fork(), etc. I'm not
> sure...

I'm not sure I follow. vfork() blocks the parent process until the child
exits or calls execve(). I don't see how it is a problem.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 18:06     ` Kirill A. Shutemov
@ 2022-06-10 18:08       ` Edgecombe, Rick P
  2022-06-10 22:18         ` Edgecombe, Rick P
  0 siblings, 1 reply; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-10 18:08 UTC (permalink / raw)
  To: kirill.shutemov
  Cc: linux-kernel, peterz, hjl.tools, linux-mm, dave.hansen,
	andreyknvl, kcc, ak, dvyukov, x86, ryabinin.a.a, Lutomirski,
	Andy, glider

On Fri, 2022-06-10 at 21:06 +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 10, 2022 at 04:16:01PM +0000, Edgecombe, Rick P wrote:
> > On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > > +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> > > +{
> > > +       struct mm_struct *mm = current->mm;
> > > +
> > > +       /* Already enabled? */
> > > +       if (mm->context.lam_cr3_mask)
> > > +               return -EBUSY;
> > > +
> > > +       /* LAM has to be enabled before spawning threads */
> > > +       if (get_nr_threads(current) > 1)
> > > +               return -EBUSY;
> > 
> > Does this work for vfork()? I guess the idea is that locking is not
> > needed below because there is only one thread with the MM, but with
> > vfork() another task could operate on the MM, call fork(), etc. I'm
> > not
> > sure...
> 
> I'm not sure I follow. vfork() blocks parent process until child exit
> or
> execve(). I don't see how it is a problem.

Oh yea, you're right.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 0/8] Linear Address Masking enabling
  2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2022-06-10 14:35 ` [PATCHv3 OPTIONAL 8/8] x86/mm: Extend LAM to support to LAM_U48 Kirill A. Shutemov
@ 2022-06-10 20:22 ` Kostya Serebryany
  2022-06-16 22:52 ` Edgecombe, Rick P
  9 siblings, 0 replies; 67+ messages in thread
From: Kostya Serebryany @ 2022-06-10 20:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe, linux-mm,
	linux-kernel

Thanks for working on this, please make LAM happen.
It enables efficient memory safety testing that is already available on AArch64.

Memory error detectors, such as ASAN and Valgrind (or KASAN for the kernel),
have limited applicability, primarily because of their run-time overheads
(CPU, RAM, and code size). In many cases, the major obstacle to a wider
deployment is the RAM overhead, which is typically 2x-3x. There is another
tool, HWASAN [1], which solves the same problem and has < 10% RAM overhead.
This tool is available only on AArch64 because it relies on the
top-byte-ignore (TBI) feature. Full support for that feature [2] has been
added to the kernel in order to enable HWASAN. Adding support for LAM will
enable HWASAN on x86_64.

HWASAN is already the main memory safety tool for Android [3] - the reduced
RAM overhead allowed us to utilize this testing tool where ASAN’s RAM
overhead was prohibitive. We have also prototyped the x86_64 variant of
HWASAN, and we can observe that it is a major improvement over ASAN. The
kernel support and hardware availability are the only missing parts.

Making HWASAN available on x86_64 will enable developers of server and
client software to scale up their memory safety testing, and thus improve
the quality and security of their products.


[1] https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://www.kernel.org/doc/html/latest/arm64/tagged-address-abi.html
[3] https://source.android.com/devices/tech/debug/hwasan
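
To make it concrete, this is roughly what the tagging looks like from
userspace once LAM_U57 has been enabled for the process via the new
arch_prctl() (a toy sketch, not HWASAN itself; the 6-bit tag in bits 62:57
follows this series, and the program would fault if LAM is not enabled):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TAG_SHIFT	57
#define TAG_MASK	(0x3fULL << TAG_SHIFT)	/* bits 62:57, as with LAM_U57 */

/* Store a 6-bit tag in the bits the hardware will ignore. */
static void *tag_ptr(void *p, uint64_t tag)
{
	return (void *)(((uint64_t)p & ~TAG_MASK) | (tag << TAG_SHIFT));
}

int main(void)
{
	uint64_t *p = malloc(sizeof(*p));
	uint64_t *q = tag_ptr(p, 0x2a);

	/* With LAM_U57 enabled the CPU masks bits 62:57 on dereference,
	 * so the tagged pointer aliases the original allocation. */
	*q = 42;
	printf("%p aliases %p, value %llu\n", (void *)q, (void *)p,
	       (unsigned long long)*p);
	free(p);
	return 0;
}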

--kcc


On Fri, Jun 10, 2022 at 7:35 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Linear Address Masking[1] (LAM) modifies the checking that is applied to
> 64-bit linear addresses, allowing software to use of the untranslated
> address bits for metadata.
>
> The patchset brings support for LAM for userspace addresses.
>
> LAM_U48 enabling is controversial since it competes for bits with
> 5-level paging. Its enabling isolated into an optional last patch that
> can be applied at maintainer's discretion.
>
> Please review and consider applying.
>
> v3:
>   - Rebased onto v5.19-rc1
>   - Per-process enabling;
>   - API overhaul (again);
>   - Avoid branches and costly computations in the fast path;
>   - LAM_U48 is in optional patch.
> v2:
>   - Rebased onto v5.18-rc1
>   - New arch_prctl(2)-based API
>   - Expose status of LAM (or other thread features) in
>     /proc/$PID/arch_status
>
> [1] ISE, Chapter 14.
> https://software.intel.com/content/dam/develop/external/us/en/documents-tps/architecture-instruction-set-extensions-programming-reference.pdf
>
> Kirill A. Shutemov (8):
>   x86/mm: Fix CR3_ADDR_MASK
>   x86: CPUID and CR3/CR4 flags for Linear Address Masking
>   mm: Pass down mm_struct to untagged_addr()
>   x86/mm: Handle LAM on context switch
>   x86/uaccess: Provide untagged_addr() and remove tags before address check
>   x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
>   x86: Expose untagging mask in /proc/$PID/arch_status
>   x86/mm: Extend LAM to support to LAM_U48
>
>
> [...]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 18:08       ` Edgecombe, Rick P
@ 2022-06-10 22:18         ` Edgecombe, Rick P
  2022-06-11  1:12           ` Kirill A. Shutemov
  2022-06-12 21:03           ` Andy Lutomirski
  0 siblings, 2 replies; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-10 22:18 UTC (permalink / raw)
  To: kirill.shutemov
  Cc: linux-kernel, peterz, hjl.tools, linux-mm, dave.hansen,
	andreyknvl, kcc, ak, dvyukov, x86, ryabinin.a.a, Lutomirski,
	Andy, glider

On Fri, 2022-06-10 at 11:08 -0700, Edgecombe, Richard P wrote:
> On Fri, 2022-06-10 at 21:06 +0300, Kirill A. Shutemov wrote:
> > On Fri, Jun 10, 2022 at 04:16:01PM +0000, Edgecombe, Rick P wrote:
> > > On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > > > +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> > > > +{
> > > > +       struct mm_struct *mm = current->mm;
> > > > +
> > > > +       /* Already enabled? */
> > > > +       if (mm->context.lam_cr3_mask)
> > > > +               return -EBUSY;
> > > > +
> > > > +       /* LAM has to be enabled before spawning threads */
> > > > +       if (get_nr_threads(current) > 1)
> > > > +               return -EBUSY;
> > > 
> > > Does this work for vfork()? I guess the idea is that locking is
> > > not
> > > needed below because there is only one thread with the MM, but
> > > with
> > > vfork() another task could operate on the MM, call fork(), etc.
> > > I'm
> > > not
> > > sure...
> > 
> > I'm not sure I follow. vfork() blocks parent process until child
> > exit
> > or
> > execve(). I don't see how it is a problem.
> 
> Oh yea, you're right.

Actually, I guess vfork() only suspends the calling thread. So what if
you had:
1. Parent spawns a bunch of threads
2. vforks()
3. Child enables LAM (it only has one thread, so succeeds)
4. Child exits()
5. Parent has some threads with LAM, and some not

It's some weird userspace that doesn't deserve to have things work for
it, but I wonder if it could open up little races around untagging. As
an example, KVM might have a super narrow race where it checks for tags
in memslots using addr != untagged_addr(addr) before checking
access_ok(addr, ...). See __kvm_set_memory_region(). If mm-
>context.untag_mask got set in the middle, tagged memslots could be
added.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 1/8] x86/mm: Fix CR3_ADDR_MASK
  2022-06-10 14:35 ` [PATCHv3 1/8] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
@ 2022-06-10 23:32   ` Edgecombe, Rick P
  0 siblings, 0 replies; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-10 23:32 UTC (permalink / raw)
  To: kirill.shutemov, peterz, Lutomirski, Andy, dave.hansen
  Cc: linux-kernel, hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov,
	x86, ryabinin.a.a, glider

On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> The mask must not include bits above physical address mask. These
> bits
> are reserved and can be used for other things. Bits 61 and 62 are
> used
> for Linear Address Masking.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/processor-flags.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/processor-flags.h
> b/arch/x86/include/asm/processor-flags.h
> index 02c2cbda4a74..a7f3d9100adb 100644
> --- a/arch/x86/include/asm/processor-flags.h
> +++ b/arch/x86/include/asm/processor-flags.h
> @@ -35,7 +35,7 @@
>   */
>  #ifdef CONFIG_X86_64
>  /* Mask off the address space ID and SME encryption bits. */
> -#define CR3_ADDR_MASK	__sme_clr(0x7FFFFFFFFFFFF000ull)
> +#define CR3_ADDR_MASK	__sme_clr(PHYSICAL_PAGE_MASK)
>  #define CR3_PCID_MASK	0xFFFull
>  #define CR3_NOFLUSH	BIT_ULL(63)
>  

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 3/8] mm: Pass down mm_struct to untagged_addr()
  2022-06-10 14:35 ` [PATCHv3 3/8] mm: Pass down mm_struct to untagged_addr() Kirill A. Shutemov
@ 2022-06-10 23:33   ` Edgecombe, Rick P
  2022-06-17 15:27   ` Alexander Potapenko
  1 sibling, 0 replies; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-10 23:33 UTC (permalink / raw)
  To: kirill.shutemov, peterz, Lutomirski, Andy, dave.hansen
  Cc: linux-kernel, hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov,
	x86, ryabinin.a.a, glider

On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> Intel Linear Address Masking (LAM) brings per-mm untagging rules.
> Pass
> down mm_struct to the untagging helper. It will help to apply
> untagging
> policy correctly.
> 
> In most cases, current->mm is the one to use, but there are some
> exceptions, such as get_user_page_remote().
> 
> Move dummy implementation of untagged_addr() from <linux/mm.h> to
> <linux/uaccess.h>. <asm/uaccess.h> can override the implementation.
> Moving the dummy header outside <linux/mm.h> helps to avoid header
> hell
> if you need to defer mm_struct within the helper.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/arm64/include/asm/memory.h                  |  4 ++--
>  arch/arm64/include/asm/signal.h                  |  2 +-
>  arch/arm64/include/asm/uaccess.h                 |  4 ++--
>  arch/arm64/kernel/hw_breakpoint.c                |  2 +-
>  arch/arm64/kernel/traps.c                        |  4 ++--
>  arch/arm64/mm/fault.c                            | 10 +++++-----
>  arch/sparc/include/asm/pgtable_64.h              |  2 +-
>  arch/sparc/include/asm/uaccess_64.h              |  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c          |  2 +-
>  drivers/gpu/drm/radeon/radeon_gem.c              |  2 +-
>  drivers/infiniband/hw/mlx4/mr.c                  |  2 +-
>  drivers/media/common/videobuf2/frame_vector.c    |  2 +-
>  drivers/media/v4l2-core/videobuf-dma-contig.c    |  2 +-
>  drivers/staging/media/atomisp/pci/hmm/hmm_bo.c   |  2 +-
>  drivers/tee/tee_shm.c                            |  2 +-
>  drivers/vfio/vfio_iommu_type1.c                  |  2 +-
>  fs/proc/task_mmu.c                               |  2 +-
>  include/linux/mm.h                               | 11 -----------
>  include/linux/uaccess.h                          | 11 +++++++++++
>  lib/strncpy_from_user.c                          |  2 +-
>  lib/strnlen_user.c                               |  2 +-
>  mm/gup.c                                         |  6 +++---
>  mm/madvise.c                                     |  2 +-
>  mm/mempolicy.c                                   |  6 +++---
>  mm/migrate.c                                     |  2 +-
>  mm/mincore.c                                     |  2 +-
>  mm/mlock.c                                       |  4 ++--
>  mm/mmap.c                                        |  2 +-
>  mm/mprotect.c                                    |  2 +-
>  mm/mremap.c                                      |  2 +-
>  mm/msync.c                                       |  2 +-
>  virt/kvm/kvm_main.c                              |  2 +-
>  33 files changed, 55 insertions(+), 53 deletions(-)

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 4/8] x86/mm: Handle LAM on context switch
  2022-06-10 14:35 ` [PATCHv3 4/8] x86/mm: Handle LAM on context switch Kirill A. Shutemov
@ 2022-06-10 23:55   ` Edgecombe, Rick P
  2022-06-15 15:54     ` Kirill A. Shutemov
  2022-06-16  9:08   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-10 23:55 UTC (permalink / raw)
  To: kirill.shutemov, peterz, Lutomirski, Andy, dave.hansen
  Cc: linux-kernel, hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov,
	x86, ryabinin.a.a, glider

On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> @@ -687,6 +716,7 @@ void initialize_tlbstate_and_flush(void)
>         struct mm_struct *mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>         u64 tlb_gen = atomic64_read(&init_mm.context.tlb_gen);
>         unsigned long cr3 = __read_cr3();
> +       u64 lam = cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
>  
>         /* Assert that CR3 already references the right mm. */
>         WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
> @@ -700,7 +730,7 @@ void initialize_tlbstate_and_flush(void)
>                 !(cr4_read_shadow() & X86_CR4_PCIDE));
>  
>         /* Force ASID 0 and force a TLB flush. */
> -       write_cr3(build_cr3(mm->pgd, 0));
> +       write_cr3(build_cr3(mm->pgd, 0, lam));
>  

Can you explain why the lam bits that were in CR3 are kept here? It
seems the worry is that some CR3 bits got changed and need to be set to
a known state. Why not take them from the MM?

Also, it warns if the cr3 pfn doesn't match the mm pgd; should it also
warn if the cr3 lam bits don't match the MM's copy?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 22:18         ` Edgecombe, Rick P
@ 2022-06-11  1:12           ` Kirill A. Shutemov
  2022-06-11  2:36             ` Edgecombe, Rick P
  2022-06-12 21:03           ` Andy Lutomirski
  1 sibling, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-11  1:12 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: linux-kernel, peterz, hjl.tools, linux-mm, dave.hansen,
	andreyknvl, kcc, ak, dvyukov, x86, ryabinin.a.a, Lutomirski,
	Andy, glider

On Fri, Jun 10, 2022 at 10:18:23PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-06-10 at 11:08 -0700, Edgecombe, Richard P wrote:
> > On Fri, 2022-06-10 at 21:06 +0300, Kirill A. Shutemov wrote:
> > > On Fri, Jun 10, 2022 at 04:16:01PM +0000, Edgecombe, Rick P wrote:
> > > > On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > > > > +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> > > > > +{
> > > > > +       struct mm_struct *mm = current->mm;
> > > > > +
> > > > > +       /* Already enabled? */
> > > > > +       if (mm->context.lam_cr3_mask)
> > > > > +               return -EBUSY;
> > > > > +
> > > > > +       /* LAM has to be enabled before spawning threads */
> > > > > +       if (get_nr_threads(current) > 1)
> > > > > +               return -EBUSY;
> > > > 
> > > > Does this work for vfork()? I guess the idea is that locking is
> > > > not
> > > > needed below because there is only one thread with the MM, but
> > > > with
> > > > vfork() another task could operate on the MM, call fork(), etc.
> > > > I'm
> > > > not
> > > > sure...
> > > 
> > > I'm not sure I follow. vfork() blocks parent process until child
> > > exit
> > > or
> > > execve(). I don't see how it is a problem.
> > 
> > Oh yea, you're right.
> 
> Actually, I guess vfork() only suspends the calling thread. So what if
> you had:
> 1. Parent spawns a bunch of threads
> 2. vforks()
> 3. Child enables LAM (it only has one thread, so succeeds)
> 4. Child exits()
> 5. Parent has some threads with LAM, and some not

I think it is in "Don't do that" territory. It is very similar to the cases
described in the "Caveats" section of the vfork(2) man-page.

> It's some weird userspace that doesn't deserve to have things work for
> it, but I wonder if it could open up little races around untagging. As
> an example, KVM might have a super narrow race where it checks for tags
> in memslots using addr != untagged_addr(addr) before checking
> access_ok(addr, ...). See __kvm_set_memory_region(). If mm-
> >context.untag_mask got set in the middle, tagged memslots could be
> added.

Ultimately, a process which calls vfork(2) is in control of what happens
to the new process until execve(2) or exit(2). So, yes, it is a very
creative way to shoot yourself in the foot, but I don't think it is worth
preventing.

And I'm not sure what the fix would look like.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 7/8] x86: Expose untagging mask in /proc/$PID/arch_status
  2022-06-10 15:24   ` Dave Hansen
@ 2022-06-11  1:28     ` Kirill A. Shutemov
  2022-06-27 12:00       ` Catalin Marinas
  0 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-11  1:28 UTC (permalink / raw)
  To: Dave Hansen, Catalin Marinas, Will Deacon
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel

On Fri, Jun 10, 2022 at 08:24:38AM -0700, Dave Hansen wrote:
> On 6/10/22 07:35, Kirill A. Shutemov wrote:
> > +/*
> > + * Report architecture specific information
> > + */
> > +int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
> > +			struct pid *pid, struct task_struct *task)
> > +{
> > +	/*
> > +	 * Report AVX512 state if the processor and build option supported.
> > +	 */
> > +	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
> > +		avx512_status(m, task);
> > +
> > +	seq_printf(m, "untag_mask:\t%#lx\n", mm_untag_mask(task->mm));
> > +
> > +	return 0;
> > +}
> 
> Arch-specific gunk is great for, well, arch-specific stuff.  AVX-512 and
> its, um, "quirks", really won't show up anywhere else.  But x86 isn't
> even the first to be doing this address tagging business.
> 
> Shouldn't we be talking to the ARM folks about a common way to do this?

+ Catalin, Will.

I guess we can expose the mask via proc for ARM too, but I'm not sure if
we can unify the interface further without breaking existing TBI users:
TBI is enabled per-thread while LAM is per-process.

Any opinions?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-11  1:12           ` Kirill A. Shutemov
@ 2022-06-11  2:36             ` Edgecombe, Rick P
  0 siblings, 0 replies; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-11  2:36 UTC (permalink / raw)
  To: kirill.shutemov
  Cc: linux-kernel, peterz, hjl.tools, linux-mm, kcc, andreyknvl,
	dave.hansen, ak, dvyukov, x86, ryabinin.a.a, Lutomirski, Andy,
	glider

On Sat, 2022-06-11 at 04:12 +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 10, 2022 at 10:18:23PM +0000, Edgecombe, Rick P wrote:
> > On Fri, 2022-06-10 at 11:08 -0700, Edgecombe, Richard P wrote:
> > > On Fri, 2022-06-10 at 21:06 +0300, Kirill A. Shutemov wrote:
> > > > On Fri, Jun 10, 2022 at 04:16:01PM +0000, Edgecombe, Rick P
> > > > wrote:
> > > > > On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > > > > > +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> > > > > > +{
> > > > > > +       struct mm_struct *mm = current->mm;
> > > > > > +
> > > > > > +       /* Already enabled? */
> > > > > > +       if (mm->context.lam_cr3_mask)
> > > > > > +               return -EBUSY;
> > > > > > +
> > > > > > +       /* LAM has to be enabled before spawning threads */
> > > > > > +       if (get_nr_threads(current) > 1)
> > > > > > +               return -EBUSY;
> > > > > 
> > > > > Does this work for vfork()? I guess the idea is that locking
> > > > > is
> > > > > not
> > > > > needed below because there is only one thread with the MM,
> > > > > but
> > > > > with
> > > > > vfork() another task could operate on the MM, call fork(),
> > > > > etc.
> > > > > I'm
> > > > > not
> > > > > sure...
> > > > 
> > > > I'm not sure I follow. vfork() blocks parent process until
> > > > child
> > > > exit
> > > > or
> > > > execve(). I don't see how it is a problem.
> > > 
> > > Oh yea, you're right.
> > 
> > Actually, I guess vfork() only suspends the calling thread. So what
> > if
> > you had:
> > 1. Parent spawns a bunch of threads
> > 2. vforks()
> > 3. Child enables LAM (it only has one thread, so succeeds)
> > 4. Child exits()
> > 5. Parent has some threads with LAM, and some not
> 
> I think it is in "Don't do that" territory. It is very similar to
> cases
> described in "Caveats" section of the vfork(2) man-page.
> 
> > It's some weird userspace that doesn't deserve to have things work
> > for
> > it, but I wonder if it could open up little races around untagging.
> > As
> > an example, KVM might have a super narrow race where it checks for
> > tags
> > in memslots using addr != untagged_addr(addr) before checking
> > access_ok(addr, ...). See __kvm_set_memory_region(). If mm-
> > > context.untag_mask got set in the middle, tagged memslots could
> > > be
> > 
> > added.
> 
> Ultimately, a process which calls vfork(2) is in control of what
> happens
> to the new process until execve(2) or exit(2). So, yes it is very
> creative
> way to shoot yourself into leg, but I don't think it worth
> preventing.
> 
> And I'm not sure how the fix would look like.

Yea, userspace shooting itself in the foot is fine. You would really
have to go out of your way to do that. But my concern is that it will
expose the kernel. The KVM scenario I outlined is a narrow race, but it
lets guests write to freed pages. So the "not first thread enabling"
seems like a generally fragile thing.

I don't know how to fix it, but I think enabling LAM seems fraught and
should be contained strictly to MMs with one thread.

I'm not sure, but what about using in_vfork()?
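
I.e. something like this (a sketch, assuming in_vfork() from
<linux/sched/mm.h> is the right check here):

	/* LAM has to be enabled before spawning threads */
	if (get_nr_threads(current) > 1 || in_vfork(current))
		return -EBUSY;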

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 22:18         ` Edgecombe, Rick P
  2022-06-11  1:12           ` Kirill A. Shutemov
@ 2022-06-12 21:03           ` Andy Lutomirski
  2022-06-16  9:44             ` Peter Zijlstra
  1 sibling, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2022-06-12 21:03 UTC (permalink / raw)
  To: Rick P Edgecombe, Kirill A. Shutemov
  Cc: Linux Kernel Mailing List, Peter Zijlstra (Intel),
	H.J. Lu, linux-mm, Dave Hansen, andreyknvl, kcc, Andi Kleen,
	dvyukov, the arch/x86 maintainers, ryabinin.a.a, glider

On Fri, Jun 10, 2022, at 3:18 PM, Edgecombe, Rick P wrote:
> On Fri, 2022-06-10 at 11:08 -0700, Edgecombe, Richard P wrote:
>> On Fri, 2022-06-10 at 21:06 +0300, Kirill A. Shutemov wrote:
>> > On Fri, Jun 10, 2022 at 04:16:01PM +0000, Edgecombe, Rick P wrote:
>> > > On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
>> > > > +static int prctl_enable_tagged_addr(unsigned long nr_bits)
>> > > > +{
>> > > > +       struct mm_struct *mm = current->mm;
>> > > > +
>> > > > +       /* Already enabled? */
>> > > > +       if (mm->context.lam_cr3_mask)
>> > > > +               return -EBUSY;
>> > > > +
>> > > > +       /* LAM has to be enabled before spawning threads */
>> > > > +       if (get_nr_threads(current) > 1)
>> > > > +               return -EBUSY;
>> > > 
>> > > Does this work for vfork()? I guess the idea is that locking is
>> > > not
>> > > needed below because there is only one thread with the MM, but
>> > > with
>> > > vfork() another task could operate on the MM, call fork(), etc.
>> > > I'm
>> > > not
>> > > sure...
>> > 
>> > I'm not sure I follow. vfork() blocks parent process until child
>> > exit
>> > or
>> > execve(). I don't see how it is a problem.
>> 
>> Oh yea, you're right.
>
> Actually, I guess vfork() only suspends the calling thread. So what if
> you had:
> 1. Parent spawns a bunch of threads
> 2. vforks()
> 3. Child enables LAM (it only has one thread, so succeeds)
> 4. Child exits()
> 5. Parent has some threads with LAM, and some not
>
> It's some weird userspace that doesn't deserve to have things work for
> it, but I wonder if it could open up little races around untagging. As
> an example, KVM might have a super narrow race where it checks for tags
> in memslots using addr != untagged_addr(addr) before checking
> access_ok(addr, ...). See __kvm_set_memory_region(). If mm-
>>context.untag_mask got set in the middle, tagged memslots could be
> added.

get_nr_threads() is the wrong thing.  Either look at mm->mm_users or find a way to get rid of this restriction entirely.

IMO it would not be insane to have a way to iterate over all tasks using an mm.  But doing this for io_uring, etc might be interesting.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 14:35 ` [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
  2022-06-10 15:25   ` Edgecombe, Rick P
  2022-06-10 16:16   ` Edgecombe, Rick P
@ 2022-06-13 14:42   ` Michal Hocko
  2022-06-16 17:05     ` Kirill A. Shutemov
  2022-06-16  9:39   ` Peter Zijlstra
  2022-06-28 23:42   ` Andy Lutomirski
  4 siblings, 1 reply; 67+ messages in thread
From: Michal Hocko @ 2022-06-13 14:42 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel

On Fri 10-06-22 17:35:25, Kirill A. Shutemov wrote:
[...]
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 1962008fe743..93c8eba1a66d 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -742,6 +742,32 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
>  }
>  #endif
>  
> +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> +{
> +	struct mm_struct *mm = current->mm;
> +
> +	/* Already enabled? */
> +	if (mm->context.lam_cr3_mask)
> +		return -EBUSY;
> +
> +	/* LAM has to be enabled before spawning threads */
> +	if (get_nr_threads(current) > 1)
> +		return -EBUSY;

This will not be sufficient in general. You can have the mm shared with a
process without CLONE_THREAD. So you would also need to check
MMF_MULTIPROCESS. But I do remember that get_nr_threads in general is quite
tricky to use properly. Make sure to CC Oleg Nesterov for more details.
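
I.e. roughly (a sketch):

	if (get_nr_threads(current) > 1 ||
	    test_bit(MMF_MULTIPROCESS, &mm->flags))
		return -EBUSY;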

Also how does this work when the mm is shared with a kernel thread?

> +
> +	if (!nr_bits) {
> +		return -EINVAL;
> +	} else if (nr_bits <= 6) {
> +		mm->context.lam_cr3_mask = X86_CR3_LAM_U57;
> +		mm->context.untag_mask =  ~GENMASK(62, 57);
> +	} else {
> +		return -EINVAL;
> +	}
> +
> +	/* Update CR3 to get LAM active */
> +	switch_mm(current->mm, current->mm, current);
> +	return 0;
> +}

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-10 14:35 ` [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
@ 2022-06-13 17:36   ` Edgecombe, Rick P
  2022-06-15 16:58     ` Kirill A. Shutemov
                       ` (2 more replies)
  2022-06-16 10:02   ` Peter Zijlstra
  2022-06-28 23:40   ` Andy Lutomirski
  2 siblings, 3 replies; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-13 17:36 UTC (permalink / raw)
  To: kirill.shutemov, peterz, Lutomirski, Andy, dave.hansen
  Cc: linux-kernel, hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov,
	x86, ryabinin.a.a, glider

On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> +#ifdef CONFIG_X86_64
> +/*
> + * Mask out tag bits from the address.
> + *
> + * Magic with the 'sign' allows to untag userspace pointer without
> any branches
> + * while leaving kernel addresses intact.

Trying to understand the magic part here. I guess how it works is, when
the high bit is set, it does the opposite of untagging the addresses by
setting the tag bits instead of clearing them. So:
 - For proper canonical kernel addresses (with U57) it leaves them 
   intact since the tag bits were already set.
 - For non-canonical kernel-half addresses, it fixes them up. 
   (0xeffffff000000840->0xfffffff000000840)
 - For U48 and 5 level paging, it corrupts some normal kernel 
   addresses. (0xff90ffffffffffff->0xffffffffffffffff)

I just ported this to userspace and threw some addresses at it to see
what happened, so hopefully I got that right.
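
FWIW, this is roughly the userspace port I used for the addresses above
(a standalone sketch with the masks hard-coded):

#include <stdint.h>
#include <stdio.h>

/* The macro from the patch, with the mask passed in directly. */
static uint64_t untag(uint64_t addr, uint64_t untag_mask)
{
	int64_t sign = (int64_t)addr >> 63;

	addr ^= sign;
	addr &= untag_mask;
	addr ^= sign;
	return addr;
}

int main(void)
{
	uint64_t u57 = ~(0x3fULL << 57);	/* ~GENMASK(62, 57) */
	uint64_t u48 = ~(0x7fffULL << 48);	/* ~GENMASK(62, 48) */

	/* Canonical kernel address, U57: stays intact. */
	printf("%016llx\n", (unsigned long long)untag(0xfffffff000000840ULL, u57));
	/* Non-canonical kernel-half address, U57: gets "fixed up". */
	printf("%016llx\n", (unsigned long long)untag(0xeffffff000000840ULL, u57));
	/* 5-level kernel address, U48: gets corrupted. */
	printf("%016llx\n", (unsigned long long)untag(0xff90ffffffffffffULL, u48));
	return 0;
}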

Is this special kernel address handling only needed because
copy_to_kernel_nofault(), etc call the user helpers?

> + */
> +#define untagged_addr(mm, addr)        ({                              \
> +       u64 __addr = (__force u64)(addr);                               \
> +       s64 sign = (s64)__addr >> 63;                                   \
> +       __addr ^= sign;                                                 \
> +       __addr &= (mm)->context.untag_mask;                             \
> +       __addr ^= sign;                                                 \
> +       (__force __typeof__(addr))__addr;                               \
> +})

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 4/8] x86/mm: Handle LAM on context switch
  2022-06-10 23:55   ` Edgecombe, Rick P
@ 2022-06-15 15:54     ` Kirill A. Shutemov
  0 siblings, 0 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-15 15:54 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: peterz, Lutomirski, Andy, dave.hansen, linux-kernel, hjl.tools,
	linux-mm, kcc, andreyknvl, ak, dvyukov, x86, ryabinin.a.a,
	glider

On Fri, Jun 10, 2022 at 11:55:02PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > @@ -687,6 +716,7 @@ void initialize_tlbstate_and_flush(void)
> >         struct mm_struct *mm = this_cpu_read(cpu_tlbstate.loaded_mm);
> >         u64 tlb_gen = atomic64_read(&init_mm.context.tlb_gen);
> >         unsigned long cr3 = __read_cr3();
> > +       u64 lam = cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
> >  
> >         /* Assert that CR3 already references the right mm. */
> >         WARN_ON((cr3 & CR3_ADDR_MASK) != __pa(mm->pgd));
> > @@ -700,7 +730,7 @@ void initialize_tlbstate_and_flush(void)
> >                 !(cr4_read_shadow() & X86_CR4_PCIDE));
> >  
> >         /* Force ASID 0 and force a TLB flush. */
> > -       write_cr3(build_cr3(mm->pgd, 0));
> > +       write_cr3(build_cr3(mm->pgd, 0, lam));
> >  
> 
> Can you explain why to keep the lam bits that were in CR3 here? It
> seems to be worried some CR3 bits got changed and need to be set to a
> known state. Why not take them from the MM?
> 
> Also, it warns if the cr3 pfn doesn't match the mm pgd, should it warn
> if cr3 lam bits don't match the MM's copy?

You are right, taking the LAM mode from init_mm is more correct. And we need
to update tlbstate with the new LAM mode.

I think both CR3 and init_mm should have LAM disabled here as we are bringing
the CPU up. I'll add WARN_ON().
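
Something along these lines (untested sketch):

	/* LAM is not expected to be enabled while bringing the CPU up */
	WARN_ON(mm->context.lam_cr3_mask);
	WARN_ON(cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57));

	/* Force ASID 0, LAM disabled, and force a TLB flush. */
	write_cr3(build_cr3(mm->pgd, 0, 0));
	this_cpu_write(cpu_tlbstate.lam_cr3_mask, 0);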

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-13 17:36   ` Edgecombe, Rick P
@ 2022-06-15 16:58     ` Kirill A. Shutemov
  2022-06-15 19:06       ` Edgecombe, Rick P
  2022-06-16  9:30     ` Peter Zijlstra
  2022-06-16  9:34     ` Peter Zijlstra
  2 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-15 16:58 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: peterz, Lutomirski, Andy, dave.hansen, linux-kernel, hjl.tools,
	linux-mm, kcc, andreyknvl, ak, dvyukov, x86, ryabinin.a.a,
	glider

On Mon, Jun 13, 2022 at 05:36:43PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > +#ifdef CONFIG_X86_64
> > +/*
> > + * Mask out tag bits from the address.
> > + *
> > + * Magic with the 'sign' allows to untag userspace pointer without
> > any branches
> > + * while leaving kernel addresses intact.
> 
> Trying to understand the magic part here. I guess how it works is, when
> the high bit is set, it does the opposite of untagging the addresses by
> setting the tag bits instead of clearing them. So:
>  - For proper canonical kernel addresses (with U57) it leaves them 
>    intact since the tag bits were already set.
>  - For non-canonical kernel-half addresses, it fixes them up. 
>    (0xeffffff000000840->0xfffffff000000840)
>  - For U48 and 5 level paging, it corrupts some normal kernel 
>    addresses. (0xff90ffffffffffff->0xffffffffffffffff)
> 
> I just ported this to userspace and threw some addresses at it to see
> what happened, so hopefully I got that right.

Ouch. Thanks for noticing this. I should have caught this myself. Yes,
this implementation is broken for LAM_U48 on a 5-level machine.

What about this:

	#define untagged_addr(mm, addr)	({					\
		u64 __addr = (__force u64)(addr);				\
		s64 sign = (s64)__addr >> 63;					\
		__addr &= (mm)->context.untag_mask | sign;			\
		(__force __typeof__(addr))__addr;				\
	})

It effectively makes the mask all-ones for supervisor addresses. And it is
less magic to my eyes.

The generated code also looks sane to me:

    11d0:	48 89 f8             	mov    %rdi,%rax
    11d3:	48 c1 f8 3f          	sar    $0x3f,%rax
    11d7:	48 0b 05 52 2e 00 00 	or     0x2e52(%rip),%rax        # 4030 <untag_mask>
    11de:	48 21 f8             	and    %rdi,%rax

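A quick standalone userspace check of the new formula (mask hard-coded;
only a sketch) also behaves as expected:

#include <stdint.h>
#include <stdio.h>

static uint64_t untag(uint64_t addr, uint64_t untag_mask)
{
	int64_t sign = (int64_t)addr >> 63;

	return addr & (untag_mask | sign);
}

int main(void)
{
	uint64_t u57 = ~(0x3fULL << 57);	/* ~GENMASK(62, 57) */
	uint64_t u48 = ~(0x7fffULL << 48);	/* ~GENMASK(62, 48) */

	/* Tagged user pointer, U57: tag bits 62:57 are cleared. */
	printf("%016llx\n", (unsigned long long)untag(0x7e00000000001000ULL, u57));
	/* Non-canonical kernel-half address, U57: passes through unchanged. */
	printf("%016llx\n", (unsigned long long)untag(0xeffffff000000840ULL, u57));
	/* 5-level kernel address, U48: no longer corrupted. */
	printf("%016llx\n", (unsigned long long)untag(0xff90ffffffffffffULL, u48));
	return 0;
}
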
Any comments?

> Is this special kernel address handling only needed because
> copy_to_kernel_nofault(), etc call the user helpers?

I did not have any particular use-case in mind. But if some kernel
address gets there and bits get cleared, we will have a very hard to debug
bug.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-15 16:58     ` Kirill A. Shutemov
@ 2022-06-15 19:06       ` Edgecombe, Rick P
  0 siblings, 0 replies; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-15 19:06 UTC (permalink / raw)
  To: kirill.shutemov
  Cc: linux-kernel, peterz, hjl.tools, linux-mm, dave.hansen,
	andreyknvl, kcc, ak, dvyukov, x86, ryabinin.a.a, Lutomirski,
	Andy, glider

On Wed, 2022-06-15 at 19:58 +0300, Kirill A. Shutemov wrote:
> On Mon, Jun 13, 2022 at 05:36:43PM +0000, Edgecombe, Rick P wrote:
> > On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > > +#ifdef CONFIG_X86_64
> > > +/*
> > > + * Mask out tag bits from the address.
> > > + *
> > > + * Magic with the 'sign' allows to untag userspace pointer
> > > without
> > > any branches
> > > + * while leaving kernel addresses intact.
> > 
> > Trying to understand the magic part here. I guess how it works is,
> > when
> > the high bit is set, it does the opposite of untagging the
> > addresses by
> > setting the tag bits instead of clearing them. So:
> >   - For proper canonical kernel addresses (with U57) it leaves
> > them 
> >     intact since the tag bits were already set.
> >   - For non-canonical kernel-half addresses, it fixes them up. 
> >     (0xeffffff000000840->0xfffffff000000840)
> >   - For U48 and 5 level paging, it corrupts some normal kernel 
> >     addresses. (0xff90ffffffffffff->0xffffffffffffffff)
> > 
> > I just ported this to userspace and threw some addresses at it to
> > see
> > what happened, so hopefully I got that right.
> 
> Ouch. Thanks for noticing this. I should have catched this myself.
> Yes,
> this implementation is broken for LAM_U48 on 5-level machine.
> 
> What about this:
> 
>         #define untagged_addr(mm, addr) ({                              \
>                 u64 __addr = (__force u64)(addr);                       \
>                 s64 sign = (s64)__addr >> 63;                           \
>                 __addr &= (mm)->context.untag_mask | sign;              \
>                 (__force __typeof__(addr))__addr;                       \
>         })
> 
> It effectively makes the mask all-ones for supervisor addresses. And it is
> less magic to my eyes.

Yea, it seems to leave kernel-half addresses alone now, including
leaving non-canonical addresses as non-canonical, and 5-level addresses.

With the new bit math:
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

> 
> The generated code also look sane to me:
> 
>     11d0:       48 89 f8                mov    %rdi,%rax
>     11d3:       48 c1 f8 3f             sar    $0x3f,%rax
>     11d7:       48 0b 05 52 2e 00 00    or     0x2e52(%rip),%rax        # 4030 <untag_mask>
>     11de:       48 21 f8                and    %rdi,%rax
> 
> Any comments?
> 
> > Is this special kernel address handling only needed because
> > copy_to_kernel_nofault(), etc call the user helpers?
> 
> I did not have any particular use-case in mind. But if some kernel
> address gets there and bits get cleared, we will have a very hard to
> debug bug.

I just was thinking if we could rearrange the code to avoid untagging
kernel addresses, we could skip this, or even VM_WARN_ON() if we see
one. Seems ok either way.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 4/8] x86/mm: Handle LAM on context switch
  2022-06-10 14:35 ` [PATCHv3 4/8] x86/mm: Handle LAM on context switch Kirill A. Shutemov
  2022-06-10 23:55   ` Edgecombe, Rick P
@ 2022-06-16  9:08   ` Peter Zijlstra
  2022-06-16 16:40     ` Kirill A. Shutemov
  2022-06-17 15:35   ` Alexander Potapenko
  2022-06-28 23:33   ` Andy Lutomirski
  3 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2022-06-16  9:08 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe, linux-mm,
	linux-kernel, namit

On Fri, Jun 10, 2022 at 05:35:23PM +0300, Kirill A. Shutemov wrote:

> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 4af5579c7ef7..5b93dad93ff4 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -86,6 +86,9 @@ struct tlb_state {
>  		unsigned long		last_user_mm_spec;
>  	};
>  
> +#ifdef CONFIG_X86_64
> +	u64 lam_cr3_mask;
> +#endif
>  	u16 loaded_mm_asid;
>  	u16 next_asid;
>  

Urgh.. so there's a comment there that states:

/*
 * 6 because 6 should be plenty and struct tlb_state will fit in two cache
 * lines.
 */
#define TLB_NR_DYN_ASIDS        6

And then look at tlb_state:

struct tlb_state {
	struct mm_struct *         loaded_mm;            /*     0     8 */
	union {
		struct mm_struct * last_user_mm;         /*     8     8 */
		long unsigned int  last_user_mm_spec;    /*     8     8 */
	};                                               /*     8     8 */
	u16                        loaded_mm_asid;       /*    16     2 */
	u16                        next_asid;            /*    18     2 */
	bool                       invalidate_other;     /*    20     1 */

	/* XXX 1 byte hole, try to pack */

	short unsigned int         user_pcid_flush_mask; /*    22     2 */
	long unsigned int          cr4;                  /*    24     8 */
	struct tlb_context         ctxs[6];              /*    32    96 */

	/* size: 128, cachelines: 2, members: 8 */
	/* sum members: 127, holes: 1, sum holes: 1 */
};

If you add that u64 as you do, you'll wreck all that.

Either use that one spare byte, or find room elsewhere I suppose.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-13 17:36   ` Edgecombe, Rick P
  2022-06-15 16:58     ` Kirill A. Shutemov
@ 2022-06-16  9:30     ` Peter Zijlstra
  2022-06-16 16:44       ` Kirill A. Shutemov
  2022-06-16  9:34     ` Peter Zijlstra
  2 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2022-06-16  9:30 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: kirill.shutemov, Lutomirski, Andy, dave.hansen, linux-kernel,
	hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov, x86,
	ryabinin.a.a, glider

On Mon, Jun 13, 2022 at 05:36:43PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > +#ifdef CONFIG_X86_64
> > +/*
> > + * Mask out tag bits from the address.
> > + *
> > + * Magic with the 'sign' allows to untag userspace pointer without
> > any branches
> > + * while leaving kernel addresses intact.
> 
> Trying to understand the magic part here. I guess how it works is, when
> the high bit is set, it does the opposite of untagging the addresses by
> setting the tag bits instead of clearing them. So:

The magic is really rather simple to see; there's two observations:

  x ^ y ^ y == x

That is, xor is its own inverse. And secondly, xor with 1 is a bit
toggle.

So if we mask a negative value, we destroy the sign. Therefore, if we
xor with the sign-bit, we have a nop for positive numbers and a toggle
for negatives (effectively making them positive, -1, 2s complement
yada-yada). Then we can mask, without fear of destroying the sign, and
then we xor again to undo whatever we did before, effectively restoring
the sign.

Anyway, the consequence of all this is that LAM_U48 won't work correctly on
5-level kernels, because the mask will still destroy kernel pointers.

As such, this patch only does LAM_U57.
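
To make the failure mode concrete, here is a small userspace sketch (the
LAM_U48 mask value is an assumption) of that xor/mask/xor sequence applied to
a kernel pointer that is only valid with 5-level paging:

#include <stdio.h>
#include <stdint.h>

/* Assumed LAM_U48 untag mask: clear tag bits 62:48 */
#define U48_UNTAG_MASK 0x8000ffffffffffffULL

int main(void)
{
	uint64_t addr = 0xff90ffffffffffffULL;	/* valid only with 5-level paging */
	int64_t sign = (int64_t)addr >> 63;	/* ~0 for kernel addresses */

	addr ^= sign;			/* 0x006f000000000000 */
	addr &= U48_UNTAG_MASK;		/* flipped bits 62:48 are wiped out: 0 */
	addr ^= sign;			/* 0xffffffffffffffff - pointer destroyed */

	printf("%016llx\n", (unsigned long long)addr);

	return 0;
}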



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-13 17:36   ` Edgecombe, Rick P
  2022-06-15 16:58     ` Kirill A. Shutemov
  2022-06-16  9:30     ` Peter Zijlstra
@ 2022-06-16  9:34     ` Peter Zijlstra
  2 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2022-06-16  9:34 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: kirill.shutemov, Lutomirski, Andy, dave.hansen, linux-kernel,
	hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov, x86,
	ryabinin.a.a, glider

On Mon, Jun 13, 2022 at 05:36:43PM +0000, Edgecombe, Rick P wrote:

> Is this special kernel address handling only needed because
> copy_to_kernel_nofault(), etc call the user helpers?

It is to make absolutely sure we don't need to go audit everything: if
code is correct without untag_pointer(), it will still be correct with it
on.

Also avoids future bugs due to being robust in general.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 14:35 ` [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
                     ` (2 preceding siblings ...)
  2022-06-13 14:42   ` Michal Hocko
@ 2022-06-16  9:39   ` Peter Zijlstra
  2022-06-28 23:42   ` Andy Lutomirski
  4 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2022-06-16  9:39 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe, linux-mm,
	linux-kernel

On Fri, Jun 10, 2022 at 05:35:25PM +0300, Kirill A. Shutemov wrote:
> Add a couple of arch_prctl() handles:
> 
>  - ARCH_ENABLE_TAGGED_ADDR enabled LAM. The argument is required number
>    of tag bits. It is rounded up to the nearest LAM mode that can
>    provide it. For now only LAM_U57 is supported, with 6 tag bits.

As stated in the reply to the other patch, this isn't a 'for now' thing,
due to how the untag thing works. Supporting U48 on 5-level kernels is
going to require a more complicated untag.



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-12 21:03           ` Andy Lutomirski
@ 2022-06-16  9:44             ` Peter Zijlstra
  2022-06-16 16:54               ` Kirill A. Shutemov
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2022-06-16  9:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Rick P Edgecombe, Kirill A. Shutemov, Linux Kernel Mailing List,
	H.J. Lu, linux-mm, Dave Hansen, andreyknvl, kcc, Andi Kleen,
	dvyukov, the arch/x86 maintainers, ryabinin.a.a, glider

On Sun, Jun 12, 2022 at 02:03:43PM -0700, Andy Lutomirski wrote:

> >> > > > +       /* LAM has to be enabled before spawning threads */
> >> > > > +       if (get_nr_threads(current) > 1)
> >> > > > +               return -EBUSY;

> >> > > Does this work for vfork()? I guess the idea is that locking is

vfork() isn't the problem, the problem is that Linux allows CLONE_VM
without CLONE_THREAD. Now, mostly nobody does that these days, but it is
possible.

> get_nr_threads() is the wrong thing.  Either look at mm->mm_users or
> find a way to get rid of this restriction entirely.

mm->mm_users should indeed be sufficient here.

> IMO it would not be insane to have a way to iterate over all tasks
> using an mm.  But doing this for io_uring, etc might be interesting.

That has come up so often over the past 15+ years I've no idea how come
we've still not managed to actually do that ;-)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 OPTIONAL 8/8] x86/mm: Extend LAM to support to LAM_U48
  2022-06-10 14:35 ` [PATCHv3 OPTIONAL 8/8] x86/mm: Extend LAM to support to LAM_U48 Kirill A. Shutemov
@ 2022-06-16 10:00   ` Peter Zijlstra
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2022-06-16 10:00 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe, linux-mm,
	linux-kernel

On Fri, Jun 10, 2022 at 05:35:27PM +0300, Kirill A. Shutemov wrote:
> LAM_U48 allows to encode 15 bits of tags into address.
> 
> LAM_U48 steals bits above 47-bit for tags and makes it impossible for
> userspace to use full address space on 5-level paging machine.
> 
> Make these features mutually exclusive: whichever gets enabled first
> blocks the other one.

This patch is broken in that it doesn't fix untag_pointer()

*If* you really want to continue down this road, you'll need something
like:

#define untagged_addr(mm, addr)        ({                              \
       u64 __addr = (__force u64)(addr);                               \
       s64 sign = (s64)__addr >> 63;                                   \
       __addr ^= sign;                                                 \
       __addr &= (mm)->context.untag_mask[sign & 1];                   \
       __addr ^= sign;                                                 \
       (__force __typeof__(addr))__addr;                               \
})

Which uses a different mask for kernel and user pointers.

Anyway, without this U48 patch on, the mask could be a constant, no need
to keep this variable, we can unconditionally unmask U57.
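
A sketch of what that would boil down to (the mask value is assumed, and it
borrows the 'mask | sign' form from the other sub-thread so kernel pointers
stay intact); not an actual patch:

/* Hypothetical U57-only variant: one constant mask, no per-mm state */
#define LAM_U57_UNTAG_MASK	0x81ffffffffffffffULL	/* clear bits 62:57 */

#define untagged_addr(addr)	({					\
	u64 __addr = (__force u64)(addr);				\
	s64 sign = (s64)__addr >> 63;					\
	__addr &= LAM_U57_UNTAG_MASK | sign;				\
	(__force __typeof__(addr))__addr;				\
})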

Let me go reply to that other mail too.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-10 14:35 ` [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
  2022-06-13 17:36   ` Edgecombe, Rick P
@ 2022-06-16 10:02   ` Peter Zijlstra
  2022-06-16 16:48     ` Kirill A. Shutemov
  2022-06-28 23:40   ` Andy Lutomirski
  2 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2022-06-16 10:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe, linux-mm,
	linux-kernel

On Fri, Jun 10, 2022 at 05:35:24PM +0300, Kirill A. Shutemov wrote:
> +#ifdef CONFIG_X86_64
> +/*
> + * Mask out tag bits from the address.
> + *
> + * Magic with the 'sign' allows to untag userspace pointer without any branches
> + * while leaving kernel addresses intact.
> + */
> +#define untagged_addr(mm, addr)	({					\
> +	u64 __addr = (__force u64)(addr);				\
> +	s64 sign = (s64)__addr >> 63;					\
> +	__addr ^= sign;							\
> +	__addr &= (mm)->context.untag_mask;				\
> +	__addr ^= sign;							\
> +	(__force __typeof__(addr))__addr;				\
> +})

Can't we make that mask a constant and *always* unmask U57 irrespective
of LAM being on?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 4/8] x86/mm: Handle LAM on context switch
  2022-06-16  9:08   ` Peter Zijlstra
@ 2022-06-16 16:40     ` Kirill A. Shutemov
  0 siblings, 0 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-16 16:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe, linux-mm,
	linux-kernel, namit

On Thu, Jun 16, 2022 at 11:08:07AM +0200, Peter Zijlstra wrote:
> Either use that one spare byte, or find room elsewhere I suppose.

Okay, I will put it into the byte after invalidate_other and modify
tlbstate_lam_cr3_mask() and set_tlbstate_lam_cr3_mask() to shift it
accordingly.

It looks like this:

struct tlb_state {
	struct mm_struct *         loaded_mm;            /*     0     8 */
	union {
		struct mm_struct * last_user_mm;         /*     8     8 */
		unsigned long      last_user_mm_spec;    /*     8     8 */
	};                                               /*     8     8 */

	u16                        loaded_mm_asid;       /*    16     2 */
	u16                        next_asid;            /*    18     2 */
	bool                       invalidate_other;     /*    20     1 */
	u8                         lam;                  /*    21     1 */
	unsigned short             user_pcid_flush_mask; /*    22     2 */
	unsigned long              cr4;                  /*    24     8 */
	struct tlb_context         ctxs[6];              /*    32    96 */

	/* size: 128, cachelines: 2, members: 9 */
};
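
A rough sketch of what the shifted accessors could look like (the shift amount
and the idea of storing the CR3 mask value shifted down into the u8 are
assumptions, not the actual patch):

/*
 * CR3 LAM bits live at bits 62:61, so a u8 can hold the mask shifted
 * down by 56 bits.
 */
static inline u64 tlbstate_lam_cr3_mask(void)
{
	u64 lam = this_cpu_read(cpu_tlbstate.lam);

	return lam << 56;
}

static inline void set_tlbstate_lam_cr3_mask(u64 lam_mask)
{
	this_cpu_write(cpu_tlbstate.lam, lam_mask >> 56);
}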

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-16  9:30     ` Peter Zijlstra
@ 2022-06-16 16:44       ` Kirill A. Shutemov
  2022-06-17 11:36         ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-16 16:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Edgecombe, Rick P, Lutomirski, Andy, dave.hansen, linux-kernel,
	hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov, x86,
	ryabinin.a.a, glider

On Thu, Jun 16, 2022 at 11:30:49AM +0200, Peter Zijlstra wrote:
> On Mon, Jun 13, 2022 at 05:36:43PM +0000, Edgecombe, Rick P wrote:
> > On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > > +#ifdef CONFIG_X86_64
> > > +/*
> > > + * Mask out tag bits from the address.
> > > + *
> > > + * Magic with the 'sign' allows to untag userspace pointer without
> > > any branches
> > > + * while leaving kernel addresses intact.
> > 
> > Trying to understand the magic part here. I guess how it works is, when
> > the high bit is set, it does the opposite of untagging the addresses by
> > setting the tag bits instead of clearing them. So:
> 
> The magic is really rather simple to see; there's two observations:
> 
>   x ^ y ^ y == x
> 
> That is, xor is its own inverse. And secondly, xor with 1 is a bit
> toggle.
> 
> So if we mask a negative value, we destroy the sign. Therefore, if we
> xor with the sign-bit, we have a nop for positive numbers and a toggle
> for negatives (effectively making them positive, -1, 2s complement
> yada-yada). Then we can mask, without fear of destroying the sign, and
> then we xor again to undo whatever we did before, effectively restoring
> the sign.
> 
> Anyway, the consequence of all this is that LAM_U48 won't work correctly on
> 5-level kernels, because the mask will still destroy kernel pointers.

Any objection against this variant (was posted in the thread):

       	#define untagged_addr(mm, addr) ({                                      \
               	u64 __addr = (__force u64)(addr);                               \
               	s64 sign = (s64)__addr >> 63;                                   \
               	__addr &= (mm)->context.untag_mask | sign;                      \
               	(__force __typeof__(addr))__addr;                               \
       	})

?

I find it easier to follow and it is LAM_U48-safe.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-16 10:02   ` Peter Zijlstra
@ 2022-06-16 16:48     ` Kirill A. Shutemov
  0 siblings, 0 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-16 16:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, Andy Lutomirski, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe, linux-mm,
	linux-kernel

On Thu, Jun 16, 2022 at 12:02:16PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 10, 2022 at 05:35:24PM +0300, Kirill A. Shutemov wrote:
> > +#ifdef CONFIG_X86_64
> > +/*
> > + * Mask out tag bits from the address.
> > + *
> > + * Magic with the 'sign' allows to untag userspace pointer without any branches
> > + * while leaving kernel addresses intact.
> > + */
> > +#define untagged_addr(mm, addr)	({					\
> > +	u64 __addr = (__force u64)(addr);				\
> > +	s64 sign = (s64)__addr >> 63;					\
> > +	__addr ^= sign;							\
> > +	__addr &= (mm)->context.untag_mask;				\
> > +	__addr ^= sign;							\
> > +	(__force __typeof__(addr))__addr;				\
> > +})
> 
> Can't we make that mask a constant and *always* unmask U57 irrespective
> of LAM being on?

We can do this if we give up on LAM_U48.

It would also needlessly relax the canonical check. I'm not sure it is a good
idea.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-16  9:44             ` Peter Zijlstra
@ 2022-06-16 16:54               ` Kirill A. Shutemov
  2022-06-30  2:04                 ` Andy Lutomirski
  0 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-16 16:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Rick P Edgecombe, Linux Kernel Mailing List,
	H.J. Lu, linux-mm, Dave Hansen, andreyknvl, kcc, Andi Kleen,
	dvyukov, the arch/x86 maintainers, ryabinin.a.a, glider

On Thu, Jun 16, 2022 at 11:44:59AM +0200, Peter Zijlstra wrote:
> > get_nr_threads() is the wrong thing.  Either look at mm->mm_users or
> > find a way to get rid of this restriction entirely.
> 
> mm->mm_users should indeed be sufficient here.

Hm. kthread_use_mm() doesn't bump mm_users. Do we care?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-13 14:42   ` Michal Hocko
@ 2022-06-16 17:05     ` Kirill A. Shutemov
  2022-06-19 23:40       ` Kirill A. Shutemov
  0 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-16 17:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel

On Mon, Jun 13, 2022 at 04:42:57PM +0200, Michal Hocko wrote:
> On Fri 10-06-22 17:35:25, Kirill A. Shutemov wrote:
> [...]
> > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> > index 1962008fe743..93c8eba1a66d 100644
> > --- a/arch/x86/kernel/process_64.c
> > +++ b/arch/x86/kernel/process_64.c
> > @@ -742,6 +742,32 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
> >  }
> >  #endif
> >  
> > +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> > +{
> > +	struct mm_struct *mm = current->mm;
> > +
> > +	/* Already enabled? */
> > +	if (mm->context.lam_cr3_mask)
> > +		return -EBUSY;
> > +
> > +	/* LAM has to be enabled before spawning threads */
> > +	if (get_nr_threads(current) > 1)
> > +		return -EBUSY;
> 
> This will not be sufficient in general. You can have mm shared with a
> process without CLONE_THREAD. So you would also need to check also
> MMF_MULTIPROCESS. But I do remember that general get_nr_threads is quite
> tricky to use properly. Make sure to CC Oleg Nesterov for more details.
> 
> Also how does this work when the mm is shared with a kernel thread?

It seems we need to check mm_count to exclude kernel threads that use the
mm. But I expect it to produce a bunch of false positives.

Or we can make all CPUs do

	switch_mm(current->mm, current->mm, current);

and get LAM bits updated regardless of what mm it runs. It would also remove
the limitation that LAM can only be enabled when there are no threads.

But I feel that is a bad idea, but I have no clue why. :P

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 0/8] Linear Address Masking enabling
  2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
                   ` (8 preceding siblings ...)
  2022-06-10 20:22 ` [PATCHv3 0/8] Linear Address Masking enabling Kostya Serebryany
@ 2022-06-16 22:52 ` Edgecombe, Rick P
  2022-06-16 23:43   ` Kirill A. Shutemov
  9 siblings, 1 reply; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-16 22:52 UTC (permalink / raw)
  To: kirill.shutemov, peterz, Lutomirski, Andy, dave.hansen
  Cc: linux-kernel, hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov,
	x86, ryabinin.a.a, glider

On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> Linear Address Masking[1] (LAM) modifies the checking that is applied
> to
> 64-bit linear addresses, allowing software to use of the untranslated
> address bits for metadata.
> 
> The patchset brings support for LAM for userspace addresses.

Arm has this documentation about which memory operations support being
passed tagged pointers, and which do not:
Documentation/arm64/tagged-address-abi.rst

Is the idea that LAM would have something similar, or exactly mirror
the arm ABI? It seems like it is the same right now. Should the docs be
generalized?


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 0/8] Linear Address Masking enabling
  2022-06-16 22:52 ` Edgecombe, Rick P
@ 2022-06-16 23:43   ` Kirill A. Shutemov
  2022-06-16 23:48     ` Edgecombe, Rick P
  0 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-16 23:43 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: peterz, Lutomirski, Andy, dave.hansen, linux-kernel, hjl.tools,
	linux-mm, kcc, andreyknvl, ak, dvyukov, x86, ryabinin.a.a,
	glider

On Thu, Jun 16, 2022 at 10:52:14PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > Linear Address Masking[1] (LAM) modifies the checking that is applied
> > to
> > 64-bit linear addresses, allowing software to use of the untranslated
> > address bits for metadata.
> > 
> > The patchset brings support for LAM for userspace addresses.
> 
> Arm has this documentation about which memory operations support being
> passed tagged pointers, and which do not:
> Documentation/arm64/tagged-address-abi.rst
> 
> Is the idea that LAM would have something similar, or exactly mirror
> the arm ABI? It seems like it is the same right now. Should the docs be
> generalized?

It is somewhat similar, but not exact. The ARM TBI interface implies tag size
and placement. ARM TBI is per-thread and LAM is per-process.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 0/8] Linear Address Masking enabling
  2022-06-16 23:43   ` Kirill A. Shutemov
@ 2022-06-16 23:48     ` Edgecombe, Rick P
  0 siblings, 0 replies; 67+ messages in thread
From: Edgecombe, Rick P @ 2022-06-16 23:48 UTC (permalink / raw)
  To: kirill.shutemov
  Cc: linux-kernel, peterz, hjl.tools, linux-mm, dave.hansen,
	andreyknvl, kcc, ak, dvyukov, x86, ryabinin.a.a, Lutomirski,
	Andy, glider

On Fri, 2022-06-17 at 02:43 +0300, Kirill A. Shutemov wrote:
> On Thu, Jun 16, 2022 at 10:52:14PM +0000, Edgecombe, Rick P wrote:
> > On Fri, 2022-06-10 at 17:35 +0300, Kirill A. Shutemov wrote:
> > > Linear Address Masking[1] (LAM) modifies the checking that is
> > > applied
> > > to
> > > 64-bit linear addresses, allowing software to use of the
> > > untranslated
> > > address bits for metadata.
> > > 
> > > The patchset brings support for LAM for userspace addresses.
> > 
> > Arm has this documentation about which memory operations support
> > being
> > passed tagged pointers, and which do not:
> > Documentation/arm64/tagged-address-abi.rst
> > 
> > Is the idea that LAM would have something similar, or exactly
> > mirror
> > the arm ABI? It seems like it is the same right now. Should the
> > docs be
> > generalized?
> 
> It is somewhat similar, but not exact. ARM TBI interface implies tag
> size
> and placement. ARM TBI is per-thread and LAM is per-process.

Ah right. I was thinking more of the part about which syscalls support
tagged addresses:

https://www.kernel.org/doc/html/latest/arm64/tagged-address-abi.html#id1

Some mention kernel versions where they changed. Just thinking it could
get complex to track which HW features support which syscalls for which
kernel versions.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-16 16:44       ` Kirill A. Shutemov
@ 2022-06-17 11:36         ` Peter Zijlstra
  2022-06-17 14:22           ` H.J. Lu
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2022-06-17 11:36 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Edgecombe, Rick P, Lutomirski, Andy, dave.hansen, linux-kernel,
	hjl.tools, linux-mm, kcc, andreyknvl, ak, dvyukov, x86,
	ryabinin.a.a, glider

On Thu, Jun 16, 2022 at 07:44:40PM +0300, Kirill A. Shutemov wrote:
> Any objection against this variant (was posted in the thread):
> 
>        	#define untagged_addr(mm, addr) ({                                      \
>                	u64 __addr = (__force u64)(addr);                               \
>                	s64 sign = (s64)__addr >> 63;                                   \
>                	__addr &= (mm)->context.untag_mask | sign;                      \
>                	(__force __typeof__(addr))__addr;                               \
>        	})
> 
> ?

Yeah, I suppose that should work fine.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-17 11:36         ` Peter Zijlstra
@ 2022-06-17 14:22           ` H.J. Lu
  2022-06-17 14:28             ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: H.J. Lu @ 2022-06-17 14:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Edgecombe, Rick P, Lutomirski, Andy,
	dave.hansen, linux-kernel, linux-mm, kcc, andreyknvl, ak,
	dvyukov, x86, ryabinin.a.a, glider

On Fri, Jun 17, 2022 at 4:36 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jun 16, 2022 at 07:44:40PM +0300, Kirill A. Shutemov wrote:
> > Any objection against this variant (was posted in the thread):
> >
> >               #define untagged_addr(mm, addr) ({                                      \
> >                       u64 __addr = (__force u64)(addr);                               \
> >                       s64 sign = (s64)__addr >> 63;                                   \
> >                       __addr &= (mm)->context.untag_mask | sign;                      \
> >                       (__force __typeof__(addr))__addr;                               \
> >               })
> >
> > ?
>
> Yeah, I suppose that should work fine.

Won't the sign bit be put at the wrong place?

-- 
H.J.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-17 14:22           ` H.J. Lu
@ 2022-06-17 14:28             ` Peter Zijlstra
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2022-06-17 14:28 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Kirill A. Shutemov, Edgecombe, Rick P, Lutomirski, Andy,
	dave.hansen, linux-kernel, linux-mm, kcc, andreyknvl, ak,
	dvyukov, x86, ryabinin.a.a, glider

On Fri, Jun 17, 2022 at 07:22:57AM -0700, H.J. Lu wrote:
> On Fri, Jun 17, 2022 at 4:36 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Jun 16, 2022 at 07:44:40PM +0300, Kirill A. Shutemov wrote:
> > > Any objection against this variant (was posted in the thread):
> > >
> > >               #define untagged_addr(mm, addr) ({                                      \
> > >                       u64 __addr = (__force u64)(addr);                               \
> > >                       s64 sign = (s64)__addr >> 63;                                   \
> > >                       __addr &= (mm)->context.untag_mask | sign;                      \
> > >                       (__force __typeof__(addr))__addr;                               \
> > >               })
> > >
> > > ?
> >
> > Yeah, I suppose that should work fine.
> 
> Won't the sign bit be put at the wrong place?

sign is either 0 or ~0; by or-ing ~0 into the mask, the masking becomes
a no-op.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 3/8] mm: Pass down mm_struct to untagged_addr()
  2022-06-10 14:35 ` [PATCHv3 3/8] mm: Pass down mm_struct to untagged_addr() Kirill A. Shutemov
  2022-06-10 23:33   ` Edgecombe, Rick P
@ 2022-06-17 15:27   ` Alexander Potapenko
  2022-06-17 22:38     ` Kirill A. Shutemov
  1 sibling, 1 reply; 67+ messages in thread
From: Alexander Potapenko @ 2022-06-17 15:27 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, Linux Memory Management List, LKML

On Fri, Jun 10, 2022 at 4:35 PM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Intel Linear Address Masking (LAM) brings per-mm untagging rules. Pass
> down mm_struct to the untagging helper. It will help to apply untagging
> policy correctly.
>
> In most cases, current->mm is the one to use, but there are some
> exceptions, such as get_user_page_remote().

Wouldn't it be easier to keep using current->mm in untagged_addr(addr)
by default, and introduce a separate macro for the exceptions?

>
> +/*
> + * Architectures that support memory tagging (assigning tags to memory regions,
> + * embedding these tags into addresses that point to these memory regions, and
> + * checking that the memory and the pointer tags match on memory accesses)
> + * redefine this macro to strip tags from pointers.
> + * It's defined as noop for architectures that don't support memory tagging.
> + */
> +#ifndef untagged_addr
> +#define untagged_addr(mm, addr) (addr)
> +#endif
The comment above should probably be extended to explain the effect of `mm`.




-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 4/8] x86/mm: Handle LAM on context switch
  2022-06-10 14:35 ` [PATCHv3 4/8] x86/mm: Handle LAM on context switch Kirill A. Shutemov
  2022-06-10 23:55   ` Edgecombe, Rick P
  2022-06-16  9:08   ` Peter Zijlstra
@ 2022-06-17 15:35   ` Alexander Potapenko
  2022-06-17 22:39     ` Kirill A. Shutemov
  2022-06-28 23:33   ` Andy Lutomirski
  3 siblings, 1 reply; 67+ messages in thread
From: Alexander Potapenko @ 2022-06-17 15:35 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, Linux Memory Management List, LKML

On Fri, Jun 10, 2022 at 4:35 PM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Linear Address Masking mode for userspace pointers encoded in CR3 bits.
> The mode is selected per-thread. Add new thread features indicate that the
> thread has Linear Address Masking enabled.
>
> switch_mm_irqs_off() now respects these flags and constructs CR3
> accordingly.
>
> The active LAM mode gets recorded in the tlb_state.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>


> +#ifdef CONFIG_X86_64
> +static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
> +{
> +       return mm->context.lam_cr3_mask;
> +}

Nit: can we have either "cr3_lam_mask" or "lam_cr3_mask" in both places?



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 3/8] mm: Pass down mm_struct to untagged_addr()
  2022-06-17 15:27   ` Alexander Potapenko
@ 2022-06-17 22:38     ` Kirill A. Shutemov
  0 siblings, 0 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-17 22:38 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, Linux Memory Management List, LKML

On Fri, Jun 17, 2022 at 05:27:46PM +0200, Alexander Potapenko wrote:
> On Fri, Jun 10, 2022 at 4:35 PM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > Intel Linear Address Masking (LAM) brings per-mm untagging rules. Pass
> > down mm_struct to the untagging helper. It will help to apply untagging
> > policy correctly.
> >
> > In most cases, current->mm is the one to use, but there are some
> > exceptions, such as get_user_page_remote().
> 
> Wouldn't it be easier to keep using current->mm in untagged_addr(addr)
> by default, and introduce a separate macro for the exceptions?

I don't think it is a good idea. An explicit mm forces the writer to consider
which mm to use in the particular case.

> > +/*
> > + * Architectures that support memory tagging (assigning tags to memory regions,
> > + * embedding these tags into addresses that point to these memory regions, and
> > + * checking that the memory and the pointer tags match on memory accesses)
> > + * redefine this macro to strip tags from pointers.
> > + * It's defined as noop for architectures that don't support memory tagging.
> > + */
> > +#ifndef untagged_addr
> > +#define untagged_addr(mm, addr) (addr)
> > +#endif
> The comment above should probably be extended to explain the effect of `mm`.

Sure, will update.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 4/8] x86/mm: Handle LAM on context switch
  2022-06-17 15:35   ` Alexander Potapenko
@ 2022-06-17 22:39     ` Kirill A. Shutemov
  0 siblings, 0 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-17 22:39 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	the arch/x86 maintainers, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, Linux Memory Management List, LKML

On Fri, Jun 17, 2022 at 05:35:05PM +0200, Alexander Potapenko wrote:
> On Fri, Jun 10, 2022 at 4:35 PM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > Linear Address Masking mode for userspace pointers encoded in CR3 bits.
> > The mode is selected per-thread. Add new thread features indicate that the
> > thread has Linear Address Masking enabled.
> >
> > switch_mm_irqs_off() now respects these flags and constructs CR3
> > accordingly.
> >
> > The active LAM mode gets recorded in the tlb_state.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> 
> > +#ifdef CONFIG_X86_64
> > +static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
> > +{
> > +       return mm->context.lam_cr3_mask;
> > +}
> 
> Nit: can we have either "cr3_lam_mask" or "lam_cr3_mask" in both places?

With the changes suggested by Peter, the field in the mmu_context will be
called 'lam' as it is not a CR3 mask anymore.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-16 17:05     ` Kirill A. Shutemov
@ 2022-06-19 23:40       ` Kirill A. Shutemov
  0 siblings, 0 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-19 23:40 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Michal Hocko, Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86,
	Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel

On Thu, Jun 16, 2022 at 08:05:10PM +0300, Kirill A. Shutemov wrote:
> On Mon, Jun 13, 2022 at 04:42:57PM +0200, Michal Hocko wrote:
> > On Fri 10-06-22 17:35:25, Kirill A. Shutemov wrote:
> > [...]
> > > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> > > index 1962008fe743..93c8eba1a66d 100644
> > > --- a/arch/x86/kernel/process_64.c
> > > +++ b/arch/x86/kernel/process_64.c
> > > @@ -742,6 +742,32 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
> > >  }
> > >  #endif
> > >  
> > > +static int prctl_enable_tagged_addr(unsigned long nr_bits)
> > > +{
> > > +	struct mm_struct *mm = current->mm;
> > > +
> > > +	/* Already enabled? */
> > > +	if (mm->context.lam_cr3_mask)
> > > +		return -EBUSY;
> > > +
> > > +	/* LAM has to be enabled before spawning threads */
> > > +	if (get_nr_threads(current) > 1)
> > > +		return -EBUSY;
> > 
> > This will not be sufficient in general. You can have mm shared with a
> > process without CLONE_THREAD. So you would also need to check also
> > MMF_MULTIPROCESS. But I do remember that general get_nr_threads is quite
> > tricky to use properly. Make sure to CC Oleg Nesterov for more details.
> > 
> > Also how does this work when the mm is shared with a kernel thread?
> 
> It seems we need to check mm_count to exclude kernel threads that use the
> mm. But I expect it to produce a bunch of false positives.
> 
> Or we can make all CPUs do
> 
> 	switch_mm(current->mm, current->mm, current);
> 
> and get LAM bits updated regardless of what mm it runs. It would also remove
> the limitation that LAM can only be enabled when there are no threads.
> 
> But I feel that is a bad idea, but I have no clue why. :P

Below is what I meant. Maybe it's not that bad. I dunno.

Any opinions?

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 56822d313b96..69e6b11efa62 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -752,6 +752,16 @@ static bool lam_u48_allowed(void)
 	return find_vma(mm, DEFAULT_MAP_WINDOW) == NULL;
 }
 
+static void enable_lam_func(void *mm)
+{
+	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+
+	if (loaded_mm != mm)
+		return;
+
+	switch_mm(loaded_mm, loaded_mm, current);
+}
+
 static int prctl_enable_tagged_addr(unsigned long nr_bits)
 {
 	struct mm_struct *mm = current->mm;
@@ -760,10 +770,6 @@ static int prctl_enable_tagged_addr(unsigned long nr_bits)
 	if (mm->context.lam_cr3_mask)
 		return -EBUSY;
 
-	/* LAM has to be enabled before spawning threads */
-	if (get_nr_threads(current) > 1)
-		return -EBUSY;
-
 	if (!nr_bits) {
 		return -EINVAL;
 	} else if (nr_bits <= 6) {
@@ -785,8 +791,8 @@ static int prctl_enable_tagged_addr(unsigned long nr_bits)
 		return -EINVAL;
 	}
 
-	/* Update CR3 to get LAM active */
-	switch_mm(current->mm, current->mm, current);
+	on_each_cpu_mask(mm_cpumask(mm), enable_lam_func, mm, true);
+
 	return 0;
 }
 
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 7/8] x86: Expose untagging mask in /proc/$PID/arch_status
  2022-06-11  1:28     ` Kirill A. Shutemov
@ 2022-06-27 12:00       ` Catalin Marinas
  0 siblings, 0 replies; 67+ messages in thread
From: Catalin Marinas @ 2022-06-27 12:00 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Will Deacon, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, x86, Kostya Serebryany, Andrey Ryabinin,
	Andrey Konovalov, Alexander Potapenko, Dmitry Vyukov, H . J . Lu,
	Andi Kleen, Rick Edgecombe, linux-mm, linux-kernel

Hi Kirill,

Sorry, this fell through the cracks (thanks to Will for reminding me).

On Sat, Jun 11, 2022 at 04:28:30AM +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 10, 2022 at 08:24:38AM -0700, Dave Hansen wrote:
> > On 6/10/22 07:35, Kirill A. Shutemov wrote:
> > > +/*
> > > + * Report architecture specific information
> > > + */
> > > +int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
> > > +			struct pid *pid, struct task_struct *task)
> > > +{
> > > +	/*
> > > +	 * Report AVX512 state if the processor and build option supported.
> > > +	 */
> > > +	if (cpu_feature_enabled(X86_FEATURE_AVX512F))
> > > +		avx512_status(m, task);
> > > +
> > > +	seq_printf(m, "untag_mask:\t%#lx\n", mm_untag_mask(task->mm));
> > > +
> > > +	return 0;
> > > +}
> > 
> > Arch-specific gunk is great for, well, arch-specific stuff.  AVX-512 and
> > its, um, "quirks", really won't show up anywhere else.  But x86 isn't
> > even the first to be doing this address tagging business.
> > 
> > Shouldn't we be talking to the ARM folks about a common way to do this?
> 
> + Catalin, Will.
> 
> I guess we can expose the mask via proc for ARM too, but I'm not sure if
> we can unify interface further without breaking existing TBI users: TBI is
> enabled per-thread while LAM is per-process.

Hardware TBI is enabled for all user space at boot (it was like this
from the beginning). The TBI syscall interface is per-thread (TIF flag)
but it doesn't change any hardware behaviour. The mask is fixed in
hardware, unchangeable. I'm fine with reporting an untag_mask in a
common way, only that setting it won't be possible on arm64.

If arm64 ever gains support for a modifiable untag_mask, there's a good
chance it would be per mm as well since the controls for TBI are per
page table.

-- 
Catalin

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 4/8] x86/mm: Handle LAM on context switch
  2022-06-10 14:35 ` [PATCHv3 4/8] x86/mm: Handle LAM on context switch Kirill A. Shutemov
                     ` (2 preceding siblings ...)
  2022-06-17 15:35   ` Alexander Potapenko
@ 2022-06-28 23:33   ` Andy Lutomirski
  2022-06-29  0:34     ` Kirill A. Shutemov
  3 siblings, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2022-06-28 23:33 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Hansen, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel

On 6/10/22 07:35, Kirill A. Shutemov wrote:
> Linear Address Masking mode for userspace pointers encoded in CR3 bits.
> The mode is selected per-thread. Add new thread features indicate that the
> thread has Linear Address Masking enabled.
> 
> switch_mm_irqs_off() now respects these flags and constructs CR3
> accordingly.
> 
> The active LAM mode gets recorded in the tlb_state.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>   arch/x86/include/asm/mmu.h         |  1 +
>   arch/x86/include/asm/mmu_context.h | 24 ++++++++++++
>   arch/x86/include/asm/tlbflush.h    |  3 ++
>   arch/x86/mm/tlb.c                  | 62 ++++++++++++++++++++++--------
>   4 files changed, 75 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> index 5d7494631ea9..d150e92163b6 100644
> --- a/arch/x86/include/asm/mmu.h
> +++ b/arch/x86/include/asm/mmu.h
> @@ -40,6 +40,7 @@ typedef struct {
>   
>   #ifdef CONFIG_X86_64
>   	unsigned short flags;
> +	u64 lam_cr3_mask;
>   #endif
>   
>   	struct mutex lock;
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index b8d40ddeab00..e6eac047c728 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -91,6 +91,29 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
>   }
>   #endif
>   
> +#ifdef CONFIG_X86_64
> +static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
> +{
> +	return mm->context.lam_cr3_mask;
> +}
> +
> +static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
> +{
> +	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
> +}
> +
> +#else
> +
> +static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
> +{
> +	return 0;
> +}
> +
> +static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
> +{
> +}
> +#endif

Do we really need the ifdeffery here?  I see no real harm in having the 
field exist on 32-bit -- we don't care much about performance for 32-bit 
kernels.

> -	if (real_prev == next) {
> +	if (real_prev == next && prev_lam == new_lam) {
>   		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
>   			   next->context.ctx_id);

This looks wrong to me.  If we change threads within the same mm but lam 
changes (which is certainly possible by a race if nothing else) then 
this will go down the "we really are changing mms" path, not the "we're 
not changing but we might need to flush something" path.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-10 14:35 ` [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
  2022-06-13 17:36   ` Edgecombe, Rick P
  2022-06-16 10:02   ` Peter Zijlstra
@ 2022-06-28 23:40   ` Andy Lutomirski
  2022-06-29  0:42     ` Kirill A. Shutemov
  2 siblings, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2022-06-28 23:40 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Hansen, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel

On 6/10/22 07:35, Kirill A. Shutemov wrote:
> untagged_addr() is a helper used by the core-mm to strip tag bits and
> get the address to the canonical shape. In only handles userspace
> addresses. The untagging mask is stored in mmu_context and will be set
> on enabling LAM for the process.
> 
> The tags must not be included into check whether it's okay to access the
> userspace address.
> 
> Strip tags in access_ok().

What is the intended behavior for an access that spans a tag boundary?

Also, at the risk of a potentially silly question, why do we need to 
strip the tag before access_ok()?  With LAM, every valid tagged user 
address is also a valid untagged address, right?  (There is no 
particular need to enforce the actual value of TASK_SIZE_MAX on 
*access*, just on mmap.)

IOW, wouldn't it be sufficient, and probably better than what we have 
now, to just check that the entire range has the high bit clear?

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-10 14:35 ` [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
                     ` (3 preceding siblings ...)
  2022-06-16  9:39   ` Peter Zijlstra
@ 2022-06-28 23:42   ` Andy Lutomirski
  2022-06-29  0:53     ` Kirill A. Shutemov
  4 siblings, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2022-06-28 23:42 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Hansen, Peter Zijlstra
  Cc: x86, Kostya Serebryany, Andrey Ryabinin, Andrey Konovalov,
	Alexander Potapenko, Dmitry Vyukov, H . J . Lu, Andi Kleen,
	Rick Edgecombe, linux-mm, linux-kernel

On 6/10/22 07:35, Kirill A. Shutemov wrote:

> +	/* Update CR3 to get LAM active */
> +	switch_mm(current->mm, current->mm, current);

Can you at least justify this oddity?  When changing an LDT, we use a 
dedicated mechanism.  Is there a significant benefit to abusing 
switch_mm for this?

Also, why can't we enable LAM on a multithreaded process?  We can change 
an LDT, and the code isn't even particularly complicated.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 4/8] x86/mm: Handle LAM on context switch
  2022-06-28 23:33   ` Andy Lutomirski
@ 2022-06-29  0:34     ` Kirill A. Shutemov
  2022-06-30  1:51       ` Andy Lutomirski
  0 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-29  0:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Peter Zijlstra, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe, linux-mm,
	linux-kernel

On Tue, Jun 28, 2022 at 04:33:21PM -0700, Andy Lutomirski wrote:
> On 6/10/22 07:35, Kirill A. Shutemov wrote:
> > Linear Address Masking mode for userspace pointers encoded in CR3 bits.
> > The mode is selected per-thread. Add new thread features indicate that the
> > thread has Linear Address Masking enabled.
> > 
> > switch_mm_irqs_off() now respects these flags and constructs CR3
> > accordingly.
> > 
> > The active LAM mode gets recorded in the tlb_state.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >   arch/x86/include/asm/mmu.h         |  1 +
> >   arch/x86/include/asm/mmu_context.h | 24 ++++++++++++
> >   arch/x86/include/asm/tlbflush.h    |  3 ++
> >   arch/x86/mm/tlb.c                  | 62 ++++++++++++++++++++++--------
> >   4 files changed, 75 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
> > index 5d7494631ea9..d150e92163b6 100644
> > --- a/arch/x86/include/asm/mmu.h
> > +++ b/arch/x86/include/asm/mmu.h
> > @@ -40,6 +40,7 @@ typedef struct {
> >   #ifdef CONFIG_X86_64
> >   	unsigned short flags;
> > +	u64 lam_cr3_mask;
> >   #endif
> >   	struct mutex lock;
> > diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> > index b8d40ddeab00..e6eac047c728 100644
> > --- a/arch/x86/include/asm/mmu_context.h
> > +++ b/arch/x86/include/asm/mmu_context.h
> > @@ -91,6 +91,29 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
> >   }
> >   #endif
> > +#ifdef CONFIG_X86_64
> > +static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
> > +{
> > +	return mm->context.lam_cr3_mask;
> > +}
> > +
> > +static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
> > +{
> > +	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
> > +}
> > +
> > +#else
> > +
> > +static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
> > +{
> > +}
> > +#endif
> 
> Do we really need the ifdeffery here?  I see no real harm in having the
> field exist on 32-bit -- we don't care much about performance for 32-bit
> kernels.

The waste doesn't feel right to me. I would rather keep it.

But sure, I can do this if needed.

> > -	if (real_prev == next) {
> > +	if (real_prev == next && prev_lam == new_lam) {
> >   		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
> >   			   next->context.ctx_id);
> 
> This looks wrong to me.  If we change threads within the same mm but lam
> changes (which is certainly possible by a race if nothing else) then this
> will go down the "we really are changing mms" path, not the "we're not
> changing but we might need to flush something" path.

If LAM gets enabled we must write CR3 with the new LAM mode. Without the
change, the real_prev == next case will not do this for the !was_lazy case.

Note that currently enabling LAM is done by setting LAM mode in the mmu
context and doing switch_mm(current->mm, current->mm, current), so it is
a very important case.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-28 23:40   ` Andy Lutomirski
@ 2022-06-29  0:42     ` Kirill A. Shutemov
  2022-06-30  2:38       ` Andy Lutomirski
  0 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-29  0:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Peter Zijlstra, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe, linux-mm,
	linux-kernel

On Tue, Jun 28, 2022 at 04:40:48PM -0700, Andy Lutomirski wrote:
> On 6/10/22 07:35, Kirill A. Shutemov wrote:
> > untagged_addr() is a helper used by the core-mm to strip tag bits and
> > get the address to the canonical shape. In only handles userspace
> > addresses. The untagging mask is stored in mmu_context and will be set
> > on enabling LAM for the process.
> > 
> > The tags must not be included into check whether it's okay to access the
> > userspace address.
> > 
> > Strip tags in access_ok().
> 
> What is the intended behavior for an access that spans a tag boundary?

You mean if 'addr' passed to access_ok() is below the 47- or 56-bit boundary,
but 'addr' + 'size' overflows into the tag bits? It is not a valid access and
the current implementation works correctly.

> Also, at the risk of a potentially silly question, why do we need to strip
> the tag before access_ok()?  With LAM, every valid tagged user address is
> also a valid untagged address, right?  (There is no particular need to
> enforce the actual value of TASK_SIZE_MAX on *access*, just on mmap.)
> 
> IOW, wouldn't it be sufficient, and probably better than what we have now,
> to just check that the entire range has the high bit clear?

No. We do things to addresses on the kernel side beyond dereferencing them.
Comparing addresses has to make sense, find_vma() has to find the relevant
mapping, and so on.
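
As a simplified illustration of the kind of pattern meant here (a made-up
helper, not an actual call site):

/*
 * Strip the tag before any address math or VMA lookup; otherwise the
 * comparisons against vma->vm_start/vm_end inside find_vma() would be
 * done on a bogus (tagged) value.
 */
static struct vm_area_struct *lookup_user_vma(struct mm_struct *mm,
					      unsigned long addr)
{
	addr = untagged_addr(mm, addr);

	return find_vma(mm, addr);
}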

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-28 23:42   ` Andy Lutomirski
@ 2022-06-29  0:53     ` Kirill A. Shutemov
  2022-06-30  2:29       ` Andy Lutomirski
  0 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-06-29  0:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Peter Zijlstra, x86, Kostya Serebryany,
	Andrey Ryabinin, Andrey Konovalov, Alexander Potapenko,
	Dmitry Vyukov, H . J . Lu, Andi Kleen, Rick Edgecombe, linux-mm,
	linux-kernel

On Tue, Jun 28, 2022 at 04:42:40PM -0700, Andy Lutomirski wrote:
> On 6/10/22 07:35, Kirill A. Shutemov wrote:
> 
> > +	/* Update CR3 to get LAM active */
> > +	switch_mm(current->mm, current->mm, current);
> 
> Can you at least justify this oddity?  When changing an LDT, we use a
> dedicated mechanism.  Is there a significant benefit to abusing switch_mm
> for this?

I'm not sure I follow. LAM mode is set in CR3. switch_mm() has to handle
it anyway to context switch. Why do you consider it abuse?

> 
> Also, why can't we enable LAM on a multithreaded process?  We can change an
> LDT, and the code isn't even particularly complicated.

I reworked this in v4[1] and it allows multithreaded processes. Have you
got that version?

Intel had an issue with the mail server, but I assumed it didn't affect my
patchset since I see it in the archive.

[1] https://lore.kernel.org/all/20220622162230.83474-1-kirill.shutemov@linux.intel.com/

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 4/8] x86/mm: Handle LAM on context switch
  2022-06-29  0:34     ` Kirill A. Shutemov
@ 2022-06-30  1:51       ` Andy Lutomirski
  0 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2022-06-30  1:51 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Peter Zijlstra (Intel),
	the arch/x86 maintainers, kcc, ryabinin.a.a, andreyknvl, glider,
	dvyukov, H.J. Lu, Andi Kleen, Rick P Edgecombe, linux-mm,
	Linux Kernel Mailing List



On Tue, Jun 28, 2022, at 5:34 PM, Kirill A. Shutemov wrote:
> On Tue, Jun 28, 2022 at 04:33:21PM -0700, Andy Lutomirski wrote:
>> On 6/10/22 07:35, Kirill A. Shutemov wrote:
>> > Linear Address Masking mode for userspace pointers encoded in CR3 bits.
>> > The mode is selected per-thread. Add new thread features to indicate that the
>> > thread has Linear Address Masking enabled.
>> > 
>> > switch_mm_irqs_off() now respects these flags and constructs CR3
>> > accordingly.
>> > 
>> > The active LAM mode gets recorded in the tlb_state.
>> > 
>> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> > ---
>> >   arch/x86/include/asm/mmu.h         |  1 +
>> >   arch/x86/include/asm/mmu_context.h | 24 ++++++++++++
>> >   arch/x86/include/asm/tlbflush.h    |  3 ++
>> >   arch/x86/mm/tlb.c                  | 62 ++++++++++++++++++++++--------
>> >   4 files changed, 75 insertions(+), 15 deletions(-)
>> > 
>> > diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
>> > index 5d7494631ea9..d150e92163b6 100644
>> > --- a/arch/x86/include/asm/mmu.h
>> > +++ b/arch/x86/include/asm/mmu.h
>> > @@ -40,6 +40,7 @@ typedef struct {
>> >   #ifdef CONFIG_X86_64
>> >   	unsigned short flags;
>> > +	u64 lam_cr3_mask;
>> >   #endif
>> >   	struct mutex lock;
>> > diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
>> > index b8d40ddeab00..e6eac047c728 100644
>> > --- a/arch/x86/include/asm/mmu_context.h
>> > +++ b/arch/x86/include/asm/mmu_context.h
>> > @@ -91,6 +91,29 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
>> >   }
>> >   #endif
>> > +#ifdef CONFIG_X86_64
>> > +static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
>> > +{
>> > +	return mm->context.lam_cr3_mask;
>> > +}
>> > +
>> > +static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
>> > +{
>> > +	mm->context.lam_cr3_mask = oldmm->context.lam_cr3_mask;
>> > +}
>> > +
>> > +#else
>> > +
>> > +static inline u64 mm_cr3_lam_mask(struct mm_struct *mm)
>> > +{
>> > +	return 0;
>> > +}
>> > +
>> > +static inline void dup_lam(struct mm_struct *oldmm, struct mm_struct *mm)
>> > +{
>> > +}
>> > +#endif
>> 
>> Do we really need the ifdeffery here?  I see no real harm in having the
>> field exist on 32-bit -- we don't care much about performance for 32-bit
>> kernels.
>
> The waste doesn't feel right to me. I would rather keep it.
>
> But sure I can do this if needed.

I could go either way here.

>
>> > -	if (real_prev == next) {
>> > +	if (real_prev == next && prev_lam == new_lam) {
>> >   		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
>> >   			   next->context.ctx_id);
>> 
>> This looks wrong to me.  If we change threads within the same mm but lam
>> changes (which is certainly possible by a race if nothing else) then this
>> will go down the "we really are changing mms" path, not the "we're not
>> changing but we might need to flush something" path.
>
> If LAM gets enabled we must write CR3 with the new LAM mode. Without this
> change, the real_prev == next case will not do so in the !was_lazy case.

You could fix that.  Or you could determine that this isn’t actually needed, just like updating the LDT in that path isn’t needed, if you change the way LAM is updated.

>
> Note that enabling LAM is currently done by setting the LAM mode in the mmu
> context and then doing switch_mm(current->mm, current->mm, current), so this
> is a very important case.
>

Well, I did separately ask why this is the case.

> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-16 16:54               ` Kirill A. Shutemov
@ 2022-06-30  2:04                 ` Andy Lutomirski
  0 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2022-06-30  2:04 UTC (permalink / raw)
  To: Kirill A. Shutemov, Peter Zijlstra (Intel)
  Cc: Rick P Edgecombe, Linux Kernel Mailing List, H.J. Lu, linux-mm,
	Dave Hansen, andreyknvl, kcc, Andi Kleen, dvyukov,
	the arch/x86 maintainers, ryabinin.a.a, glider



On Thu, Jun 16, 2022, at 9:54 AM, Kirill A. Shutemov wrote:
> On Thu, Jun 16, 2022 at 11:44:59AM +0200, Peter Zijlstra wrote:
>> > get_nr_threads() is the wrong thing.  Either look at mm->mm_users or
>> > find a way to get rid of this restriction entirely.
>> 
>> mm->mm_users should indeed be sufficient here.
>
> Hm. kthread_use_mm() doesn't bump mm_users. Do we care?

I think the idea is that the kthread in question is expected to hold an mm_users reference. 
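
I.e. the expected pattern is roughly this (sketch only):

	/* kthread side: pin mm_users before borrowing the mm */
	if (!mmget_not_zero(mm))
		return;			/* the mm is already gone */
	kthread_use_mm(mm);
	/* ... access user memory of that mm ... */
	kthread_unuse_mm(mm);
	mmput(mm);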

>
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-29  0:53     ` Kirill A. Shutemov
@ 2022-06-30  2:29       ` Andy Lutomirski
  2022-07-01 15:38         ` Kirill A. Shutemov
  0 siblings, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2022-06-30  2:29 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Peter Zijlstra (Intel),
	the arch/x86 maintainers, kcc, ryabinin.a.a, andreyknvl, glider,
	dvyukov, H.J. Lu, Andi Kleen, Rick P Edgecombe, linux-mm,
	Linux Kernel Mailing List



On Tue, Jun 28, 2022, at 5:53 PM, Kirill A. Shutemov wrote:
> On Tue, Jun 28, 2022 at 04:42:40PM -0700, Andy Lutomirski wrote:
>> On 6/10/22 07:35, Kirill A. Shutemov wrote:
>> 
>> > +	/* Update CR3 to get LAM active */
>> > +	switch_mm(current->mm, current->mm, current);
>> 
>> Can you at least justify this oddity?  When changing an LDT, we use a
>> dedicated mechanism.  Is there a significant benefit to abusing switch_mm
>> for this?
>
> I'm not sure I follow. LAM mode is set in CR3. switch_mm() has to handle
> it anyway to context switch. Why do you consider it abuse?
>
>> 
>> Also, why can't we enable LAM on a multithreaded process?  We can change an
>> LDT, and the code isn't even particularly complicated.
>
> I reworked this in v4[1] and it allows multithreaded processes. Have you
> got that version?
>
> Intel had an issue with the mail server, but I assumed it didn't affect my
> patchset since I see it in the archive.
>

I didn’t notice it. Not quite sure what the issue was. Could just be incompetence on my part.

I think that’s the right idea, except that I think you shouldn’t use switch_mm for this. Just update the LAM bits directly.   Once you read mm_cpumask, you should be guaranteed (see next paragraph) that, for each CPU that isn’t in the set, if it switches to the new mm, it will notice the new LAM.

I say “should be” because I think smp_wmb() is insufficient. You’re ordering a write with a subsequent read, which needs smp_mb().
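
Roughly, the two sides have to pair up like this (sketch, not the actual code;
mm_cr3_lam_mask() is the helper from patch 4):

	/* prctl_enable_tagged_addr() side */
	mm->context.lam_cr3_mask = new_mask;		/* A: publish the new mask */
	smp_mb();					/* order A before B        */
	on_each_cpu_mask(mm_cpumask(mm), ...);		/* B: read mm_cpumask      */

	/* switch_mm_irqs_off() side */
	cpumask_set_cpu(cpu, mm_cpumask(next));		/* C: publish the cpumask bit */
	/* full barrier needed here too (or implied by existing serialization) */
	new_lam = mm_cr3_lam_mask(next);		/* D: read the mask        */

Then either B observes C (so the CPU gets the IPI) or D observes A (so the CPU
picks up the new LAM on its own). smp_wmb() only orders stores, so it does not
give you the store->load ordering between A and B.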

> [1] 
> https://lore.kernel.org/all/20220622162230.83474-1-kirill.shutemov@linux.intel.com/
>
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-29  0:42     ` Kirill A. Shutemov
@ 2022-06-30  2:38       ` Andy Lutomirski
  2022-07-05  0:13         ` Kirill A. Shutemov
  0 siblings, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2022-06-30  2:38 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Peter Zijlstra (Intel),
	the arch/x86 maintainers, kcc, ryabinin.a.a, andreyknvl, glider,
	dvyukov, H.J. Lu, Andi Kleen, Rick P Edgecombe, linux-mm,
	Linux Kernel Mailing List



On Tue, Jun 28, 2022, at 5:42 PM, Kirill A. Shutemov wrote:
> On Tue, Jun 28, 2022 at 04:40:48PM -0700, Andy Lutomirski wrote:
>> On 6/10/22 07:35, Kirill A. Shutemov wrote:
>> > untagged_addr() is a helper used by the core-mm to strip tag bits and
>> > get the address to the canonical shape. It only handles userspace
>> > addresses. The untagging mask is stored in mmu_context and will be set
>> > on enabling LAM for the process.
>> > 
>> > The tags must not be included in the check of whether it's okay to access the
>> > userspace address.
>> > 
>> > Strip tags in access_ok().
>> 
>> What is the intended behavior for an access that spans a tag boundary?
>
> You mean if the 'addr' passed to access_ok() is below the 47- or 56-bit limit
> but 'addr' + 'size' overflows into the tag bits? That is not a valid access and
> the current implementation handles it correctly.
>
>> Also, at the risk of a potentially silly question, why do we need to strip
>> the tag before access_ok()?  With LAM, every valid tagged user address is
>> also a valid untagged address, right?  (There is no particular need to
>> enforce the actual value of TASK_SIZE_MAX on *access*, just on mmap.)
>> 
>> IOW, wouldn't it be sufficient, and probably better than what we have now,
>> to just check that the entire range has the high bit clear?
>
> No. We do things to addresses on the kernel side beyond dereferencing them.
> For instance, comparing addresses has to make sense, find_vma() has to find
> the relevant mapping, and so on.

I think you’re misunderstanding me. Of course things like find_vma() need to untag the address. (But things like munmap, IMO, ought not accept tagged addresses.)

But I’m not talking about that at all. I’m asking why we need to untag an address for access_ok.  In the bad old days, access_ok checked that a range was below a *variable* cutoff. But set_fs() is gone, and I don’t think this is needed any more.

With some off-the-cuff bit hackery, I think it ought to be sufficient to calculate addr+len and fail if the overflow or sign bits get set. If LAM is off, we could calculate addr | len and fail if either of the top two bits is set, but LAM may not let us get away with that.  The general point being that, on x86 (as long as we ignore AMD’s LAM-like feature) an address is a user address if the top bit is clear. Whether that address is canonical or not or will translate or not is a separate issue. (And making this change would require allowing uaccess to #GP, which requires some care.)
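
Roughly this kind of check (untested, 64-bit only, ignoring 32-bit and the AMD
variant; the function name is just for illustration):

	static inline bool __access_ok_sketch(unsigned long addr, unsigned long size)
	{
		unsigned long end;

		/* Reject if addr + size wraps around ... */
		if (check_add_overflow(addr, size, &end))
			return false;

		/* ... or if any part of the range has the high bit set. */
		return !((addr | end) & (1UL << 63));
	}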

What do you think?

>
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-06-30  2:29       ` Andy Lutomirski
@ 2022-07-01 15:38         ` Kirill A. Shutemov
  2022-07-02 23:55           ` Andy Lutomirski
  0 siblings, 1 reply; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-07-01 15:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Peter Zijlstra (Intel),
	the arch/x86 maintainers, kcc, ryabinin.a.a, andreyknvl, glider,
	dvyukov, H.J. Lu, Andi Kleen, Rick P Edgecombe, linux-mm,
	Linux Kernel Mailing List

On Wed, Jun 29, 2022 at 07:29:13PM -0700, Andy Lutomirski wrote:
> 
> 
> On Tue, Jun 28, 2022, at 5:53 PM, Kirill A. Shutemov wrote:
> > On Tue, Jun 28, 2022 at 04:42:40PM -0700, Andy Lutomirski wrote:
> >> On 6/10/22 07:35, Kirill A. Shutemov wrote:
> >> 
> >> > +	/* Update CR3 to get LAM active */
> >> > +	switch_mm(current->mm, current->mm, current);
> >> 
> >> Can you at least justify this oddity?  When changing an LDT, we use a
> >> dedicated mechanism.  Is there a significant benefit to abusing switch_mm
> >> for this?
> >
> > I'm not sure I follow. LAM mode is set in CR3. switch_mm() has to handle
> > it anyway to context switch. Why do you consider it abuse?
> >
> >> 
> >> Also, why can't we enable LAM on a multithreaded process?  We can change an
> >> LDT, and the code isn't even particularly complicated.
> >
> > I reworked this in v4[1] and it allows multithreaded processes. Have you
> > got that version?
> >
> > Intel had an issue with the mail server, but I assumed it didn't affect my
> > patchset since I see it in the archive.
> >
> 
> I didn’t notice it. Not quite sure what the issue was. Could just be
> incompetence on my part.
> 
> I think that’s the right idea, except that I think you shouldn’t use
> switch_mm for this. Just update the LAM bits directly.   Once you read
> mm_cpumask, you should be guaranteed (see next paragraph) that, for each
> CPU that isn’t in the set, if it switches to the new mm, it will notice
> the new LAM.
> 
> I say “should be” because I think smp_wmb() is insufficient. You’re
> ordering a write with a subsequent read, which needs smp_mb().

I think it is better to put smp_mb() to make it explicit.

Does the fixup below look okay?

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 2d70d75e207f..8da54e7b6f98 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -367,4 +367,30 @@ static inline void __native_tlb_flush_global(unsigned long cr4)
 	native_write_cr4(cr4 ^ X86_CR4_PGE);
 	native_write_cr4(cr4);
 }
+
+#ifdef CONFIG_X86_64
+static inline u64 tlbstate_lam_cr3_mask(void)
+{
+	u64 lam = this_cpu_read(cpu_tlbstate.lam);
+
+	return lam << X86_CR3_LAM_U57_BIT;
+}
+
+static inline void set_tlbstate_lam_cr3_mask(u64 mask)
+{
+	this_cpu_write(cpu_tlbstate.lam, mask >> X86_CR3_LAM_U57_BIT);
+}
+
+#else
+
+static inline u64 tlbstate_lam_cr3_mask(void)
+{
+	return 0;
+}
+
+static inline void set_tlbstate_lam_cr3_mask(u64 mask)
+{
+}
+#endif
+
 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 427ebef3f64b..cd2b03fe94c4 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -745,15 +745,16 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 static void enable_lam_func(void *mm)
 {
 	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	unsigned long lam_mask;
 
 	if (loaded_mm != mm)
 		return;
 
-	/* Counterpart of smp_wmb() in prctl_enable_tagged_addr() */
-	smp_rmb();
+	lam_mask = READ_ONCE(loaded_mm->context.lam_cr3_mask);
 
 	/* Update CR3 to get LAM active on the CPU */
-	switch_mm(loaded_mm, loaded_mm, current);
+	write_cr3(__read_cr3() | lam_mask);
+	set_tlbstate_lam_cr3_mask(lam_mask);
 }
 
 static bool lam_u48_allowed(void)
@@ -805,7 +806,7 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
 	}
 
 	/* Make lam_cr3_mask and untag_mask visible on other CPUs */
-	smp_wmb();
+	smp_mb();
 
 	on_each_cpu_mask(mm_cpumask(mm), enable_lam_func, mm, true);
 out:
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index c5c4f76329c2..d9a2acdae90f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -486,31 +486,6 @@ void cr4_update_pce(void *ignored)
 static inline void cr4_update_pce_mm(struct mm_struct *mm) { }
 #endif
 
-#ifdef CONFIG_X86_64
-static inline u64 tlbstate_lam_cr3_mask(void)
-{
-	u64 lam = this_cpu_read(cpu_tlbstate.lam);
-
-	return lam << X86_CR3_LAM_U57_BIT;
-}
-
-static inline void set_tlbstate_lam_cr3_mask(u64 mask)
-{
-	this_cpu_write(cpu_tlbstate.lam, mask >> X86_CR3_LAM_U57_BIT);
-}
-
-#else
-
-static inline u64 tlbstate_lam_cr3_mask(void)
-{
-	return 0;
-}
-
-static inline void set_tlbstate_lam_cr3_mask(u64 mask)
-{
-}
-#endif
-
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
@@ -581,7 +556,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	 * provides that full memory barrier and core serializing
 	 * instruction.
 	 */
-	if (real_prev == next && prev_lam == new_lam) {
+	if (real_prev == next) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			   next->context.ctx_id);
 
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-07-01 15:38         ` Kirill A. Shutemov
@ 2022-07-02 23:55           ` Andy Lutomirski
  2022-07-04 13:43             ` Kirill A. Shutemov
  0 siblings, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2022-07-02 23:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Peter Zijlstra (Intel),
	the arch/x86 maintainers, kcc, ryabinin.a.a, andreyknvl, glider,
	dvyukov, H.J. Lu, Andi Kleen, Rick P Edgecombe, linux-mm,
	Linux Kernel Mailing List



On Fri, Jul 1, 2022, at 8:38 AM, Kirill A. Shutemov wrote:
> On Wed, Jun 29, 2022 at 07:29:13PM -0700, Andy Lutomirski wrote:
>> 
>> 
>> On Tue, Jun 28, 2022, at 5:53 PM, Kirill A. Shutemov wrote:
>> > On Tue, Jun 28, 2022 at 04:42:40PM -0700, Andy Lutomirski wrote:
>> >> On 6/10/22 07:35, Kirill A. Shutemov wrote:
>> >> 
>> >> > +	/* Update CR3 to get LAM active */
>> >> > +	switch_mm(current->mm, current->mm, current);
>> >> 
>> >> Can you at least justify this oddity?  When changing an LDT, we use a
>> >> dedicated mechanism.  Is there a significant benefit to abusing switch_mm
>> >> for this?
>> >
>> > I'm not sure I follow. LAM mode is set in CR3. switch_mm() has to handle
>> > it anyway to context switch. Why do you consider it abuse?
>> >
>> >> 
>> >> Also, why can't we enable LAM on a multithreaded process?  We can change an
>> >> LDT, and the code isn't even particularly complicated.
>> >
>> > I reworked this in v4[1] and it allows multithreaded processes. Have you
>> > got that version?
>> >
>> > Intel had an issue with the mail server, but I assumed it didn't affect my
>> > patchset since I see it in the archive.
>> >
>> 
>> I didn’t notice it. Not quite sure what the issue was. Could just be
>> incompetence on my part.
>> 
>> I think that’s the right idea, except that I think you shouldn’t use
>> switch_mm for this. Just update the LAM bits directly.   Once you read
>> mm_cpumask, you should be guaranteed (see next paragraph) that, for each
>> CPU that isn’t in the set, if it switches to the new mm, it will notice
>> the new LAM.
>> 
>> I say “should be” because I think smp_wmb() is insufficient. You’re
>> ordering a write with a subsequent read, which needs smp_mb().
>
> I think it is better to put smp_mb() to make it explicit.
>
> Does the fixup below look okay?
>
> diff --git a/arch/x86/include/asm/tlbflush.h 
> b/arch/x86/include/asm/tlbflush.h
> index 2d70d75e207f..8da54e7b6f98 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -367,4 +367,30 @@ static inline void 
> __native_tlb_flush_global(unsigned long cr4)
>  	native_write_cr4(cr4 ^ X86_CR4_PGE);
>  	native_write_cr4(cr4);
>  }
> +
> +#ifdef CONFIG_X86_64
> +static inline u64 tlbstate_lam_cr3_mask(void)
> +{
> +	u64 lam = this_cpu_read(cpu_tlbstate.lam);
> +
> +	return lam << X86_CR3_LAM_U57_BIT;
> +}
> +
> +static inline void set_tlbstate_lam_cr3_mask(u64 mask)
> +{
> +	this_cpu_write(cpu_tlbstate.lam, mask >> X86_CR3_LAM_U57_BIT);
> +}
> +
> +#else
> +
> +static inline u64 tlbstate_lam_cr3_mask(void)
> +{
> +	return 0;
> +}
> +
> +static inline void set_tlbstate_lam_cr3_mask(u64 mask)
> +{
> +}
> +#endif
> +
>  #endif /* _ASM_X86_TLBFLUSH_H */
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 427ebef3f64b..cd2b03fe94c4 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -745,15 +745,16 @@ static long prctl_map_vdso(const struct 
> vdso_image *image, unsigned long addr)
>  static void enable_lam_func(void *mm)
>  {
>  	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
> +	unsigned long lam_mask;
> 
>  	if (loaded_mm != mm)
>  		return;
> 
> -	/* Counterpart of smp_wmb() in prctl_enable_tagged_addr() */
> -	smp_rmb();
> +	lam_mask = READ_ONCE(loaded_mm->context.lam_cr3_mask);
> 
>  	/* Update CR3 to get LAM active on the CPU */
> -	switch_mm(loaded_mm, loaded_mm, current);
> +	write_cr3(__read_cr3() | lam_mask);

Perhaps this should also mask off the old LAM mask?

> +	set_tlbstate_lam_cr3_mask(lam_mask);
>  }
> 
>  static bool lam_u48_allowed(void)
> @@ -805,7 +806,7 @@ static int prctl_enable_tagged_addr(struct 
> mm_struct *mm, unsigned long nr_bits)
>  	}
> 
>  	/* Make lam_cr3_mask and untag_mask visible on other CPUs */
> -	smp_wmb();
> +	smp_mb();
> 
>  	on_each_cpu_mask(mm_cpumask(mm), enable_lam_func, mm, true);
>  out:
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index c5c4f76329c2..d9a2acdae90f 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -486,31 +486,6 @@ void cr4_update_pce(void *ignored)
>  static inline void cr4_update_pce_mm(struct mm_struct *mm) { }
>  #endif
> 
> -#ifdef CONFIG_X86_64
> -static inline u64 tlbstate_lam_cr3_mask(void)
> -{
> -	u64 lam = this_cpu_read(cpu_tlbstate.lam);
> -
> -	return lam << X86_CR3_LAM_U57_BIT;
> -}
> -
> -static inline void set_tlbstate_lam_cr3_mask(u64 mask)
> -{
> -	this_cpu_write(cpu_tlbstate.lam, mask >> X86_CR3_LAM_U57_BIT);
> -}
> -
> -#else
> -
> -static inline u64 tlbstate_lam_cr3_mask(void)
> -{
> -	return 0;
> -}
> -
> -static inline void set_tlbstate_lam_cr3_mask(u64 mask)
> -{
> -}
> -#endif
> -
>  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  			struct task_struct *tsk)
>  {
> @@ -581,7 +556,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, 
> struct mm_struct *next,
>  	 * provides that full memory barrier and core serializing
>  	 * instruction.
>  	 */
> -	if (real_prev == next && prev_lam == new_lam) {
> +	if (real_prev == next) {
>  		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
>  			   next->context.ctx_id);
> 
> -- 
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR
  2022-07-02 23:55           ` Andy Lutomirski
@ 2022-07-04 13:43             ` Kirill A. Shutemov
  0 siblings, 0 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-07-04 13:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Peter Zijlstra (Intel),
	the arch/x86 maintainers, kcc, ryabinin.a.a, andreyknvl, glider,
	dvyukov, H.J. Lu, Andi Kleen, Rick P Edgecombe, linux-mm,
	Linux Kernel Mailing List

On Sat, Jul 02, 2022 at 04:55:40PM -0700, Andy Lutomirski wrote:
> > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> > index 427ebef3f64b..cd2b03fe94c4 100644
> > --- a/arch/x86/kernel/process_64.c
> > +++ b/arch/x86/kernel/process_64.c
> > @@ -745,15 +745,16 @@ static long prctl_map_vdso(const struct 
> > vdso_image *image, unsigned long addr)
> >  static void enable_lam_func(void *mm)
> >  {
> >  	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
> > +	unsigned long lam_mask;
> > 
> >  	if (loaded_mm != mm)
> >  		return;
> > 
> > -	/* Counterpart of smp_wmb() in prctl_enable_tagged_addr() */
> > -	smp_rmb();
> > +	lam_mask = READ_ONCE(loaded_mm->context.lam_cr3_mask);
> > 
> >  	/* Update CR3 to get LAM active on the CPU */
> > -	switch_mm(loaded_mm, loaded_mm, current);
> > +	write_cr3(__read_cr3() | lam_mask);
> 
> Perhaps this should also mask off the old LAM mask?

So far LAM enabling is a one-way operation, so it should be fine.
But I think masking off the old bits is a good idea to avoid problems in the
future.
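
Something like this, assuming the CR3 LAM bit definitions from patch 2:

	/* Clear any previously set LAM bits before ORing in the new mask */
	write_cr3((__read_cr3() & ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57)) | lam_mask);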

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check
  2022-06-30  2:38       ` Andy Lutomirski
@ 2022-07-05  0:13         ` Kirill A. Shutemov
  0 siblings, 0 replies; 67+ messages in thread
From: Kirill A. Shutemov @ 2022-07-05  0:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Peter Zijlstra (Intel),
	the arch/x86 maintainers, kcc, ryabinin.a.a, andreyknvl, glider,
	dvyukov, H.J. Lu, Andi Kleen, Rick P Edgecombe, linux-mm,
	Linux Kernel Mailing List

On Wed, Jun 29, 2022 at 07:38:20PM -0700, Andy Lutomirski wrote:
> 
> 
> On Tue, Jun 28, 2022, at 5:42 PM, Kirill A. Shutemov wrote:
> > On Tue, Jun 28, 2022 at 04:40:48PM -0700, Andy Lutomirski wrote:
> >> On 6/10/22 07:35, Kirill A. Shutemov wrote:
> >> > untagged_addr() is a helper used by the core-mm to strip tag bits and
> >> > get the address to the canonical shape. It only handles userspace
> >> > addresses. The untagging mask is stored in mmu_context and will be set
> >> > on enabling LAM for the process.
> >> > 
> >> > The tags must not be included in the check of whether it's okay to access the
> >> > userspace address.
> >> > 
> >> > Strip tags in access_ok().
> >> 
> >> What is the intended behavior for an access that spans a tag boundary?
> >
> > You mean if the 'addr' passed to access_ok() is below the 47- or 56-bit limit
> > but 'addr' + 'size' overflows into the tag bits? That is not a valid access and
> > the current implementation handles it correctly.
> >
> >> Also, at the risk of a potentially silly question, why do we need to strip
> >> the tag before access_ok()?  With LAM, every valid tagged user address is
> >> also a valid untagged address, right?  (There is no particular need to
> >> enforce the actual value of TASK_SIZE_MAX on *access*, just on mmap.)
> >> 
> >> IOW, wouldn't it be sufficient, and probably better than what we have now,
> >> to just check that the entire range has the high bit clear?
> >
> > No. We do things to addresses on the kernel side beyond dereferencing them.
> > For instance, comparing addresses has to make sense, find_vma() has to find
> > the relevant mapping, and so on.
> 
> I think you’re misunderstanding me. Of course things like find_vma()
> need to untag the address. (But things like munmap, IMO, ought not
> accept tagged addresses.)
> 
> But I’m not talking about that at all. I’m asking why we need to untag
> an address for access_ok.  In the bad old days, access_ok checked that a
> range was below a *variable* cutoff. But set_fs() is gone, and I don’t
> think this is needed any more.
> 
> With some off-the-cuff bit hackery, I think it ought to be sufficient to
> calculate addr+len and fail if the overflow or sign bits get set. If LAM
> is off, we could calculate addr | len and fail if either of the top two
> bits is set, but LAM may not let us get away with that.  The general
> point being that, on x86 (as long as we ignore AMD’s LAM-like feature)
> an address is a user address if the top bit is clear. Whether that
> address is canonical or not or will translate or not is a separate
> issue. (And making this change would require allowing uaccess to #GP,
> which requires some care.)
> 
> What do you think?

I think it can work. Below is my first take on this. It is raw and
requires more work.

You talk about access_ok(), but I also checked what has to be done in the
get_user()/put_user() path. Not sure it is a good idea (but it seems to
work).

The basic idea is to OR the start and the end of the range and check that the
MSB is clear.

I reworked the array_index_mask_nospec() trick to make the address all-ones if
it is outside of the user range. That should stop speculation just as well as
making it zero, as we do now.
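
In C terms the new sequence does roughly this (illustrative only):

	bad  = (addr | (addr + size - 1)) >> 63;	/* MSB set anywhere in the range? */
	mask = 0UL - bad;				/* all-ones if bad, zero otherwise */
	addr |= mask;					/* bad range -> all-ones address   */

so even if the branch to bad_get_user is speculated past, the load goes to a
non-canonical all-ones address.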

The patch as it is breaks 32-bit. It has to be handled separately.

Any comments?

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 803241dfc473..53c6f86b036f 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -45,6 +45,12 @@ static inline bool pagefault_disabled(void);
 #define untagged_ptr(mm, ptr)	(ptr)
 #endif
 
+#define __access_ok(ptr, size)						\
+({									\
+	unsigned long addr = (unsigned long)(ptr);			\
+	!((addr | (size) | addr + (size)) >> (BITS_PER_LONG - 1));	\
+})
+
 /**
  * access_ok - Checks if a user space pointer is valid
  * @addr: User space pointer to start of block to check
@@ -62,10 +68,10 @@ static inline bool pagefault_disabled(void);
  * Return: true (nonzero) if the memory block may be valid, false (zero)
  * if it is definitely invalid.
  */
-#define access_ok(addr, size)					\
+#define access_ok(addr, size)						\
 ({									\
 	WARN_ON_IN_IRQ();						\
-	likely(__access_ok(untagged_addr(current->mm, addr), size));	\
+	likely(__access_ok(addr, size));				\
 })
 
 #include <asm-generic/access_ok.h>
@@ -150,13 +156,7 @@ extern int __get_user_bad(void);
  * Return: zero on success, or -EFAULT on error.
  * On error, the variable @x is set to zero.
  */
-#define get_user(x,ptr)							\
-({									\
-	__typeof__(*(ptr)) __user *__ptr_clean;				\
-	__ptr_clean = untagged_ptr(current->mm, ptr);			\
-	might_fault();							\
-	do_get_user_call(get_user,x,__ptr_clean);			\
-})
+#define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
 
 /**
  * __get_user - Get a simple variable from user space, with less checking.
@@ -253,12 +253,7 @@ extern void __put_user_nocheck_8(void);
  *
  * Return: zero on success, or -EFAULT on error.
  */
-#define put_user(x, ptr) ({						\
-	__typeof__(*(ptr)) __user *__ptr_clean;				\
-	__ptr_clean = untagged_ptr(current->mm, ptr);			\
-	might_fault();							\
-	do_put_user_call(put_user,x,__ptr_clean);			\
-})
+#define put_user(x, ptr) ({ might_fault(); do_put_user_call(put_user,x,ptr); })
 
 /**
  * __put_user - Write a simple value into user space, with less checking.
diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index b70d98d79a9d..e526ae6ff844 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -37,22 +37,13 @@
 
 #define ASM_BARRIER_NOSPEC ALTERNATIVE "", "lfence", X86_FEATURE_LFENCE_RDTSC
 
-#ifdef CONFIG_X86_5LEVEL
-#define LOAD_TASK_SIZE_MINUS_N(n) \
-	ALTERNATIVE __stringify(mov $((1 << 47) - 4096 - (n)),%rdx), \
-		    __stringify(mov $((1 << 56) - 4096 - (n)),%rdx), X86_FEATURE_LA57
-#else
-#define LOAD_TASK_SIZE_MINUS_N(n) \
-	mov $(TASK_SIZE_MAX - (n)),%_ASM_DX
-#endif
-
 	.text
 SYM_FUNC_START(__get_user_1)
-	LOAD_TASK_SIZE_MINUS_N(0)
-	cmp %_ASM_DX,%_ASM_AX
-	jae bad_get_user
-	sbb %_ASM_DX, %_ASM_DX		/* array_index_mask_nospec() */
-	and %_ASM_DX, %_ASM_AX
+	mov %_ASM_AX, %_ASM_DX
+	shl $1, %_ASM_DX
+	jb bad_get_user
+	sbb %_ASM_DX, %_ASM_DX
+	or %_ASM_DX, %_ASM_AX
 	ASM_STAC
 1:	movzbl (%_ASM_AX),%edx
 	xor %eax,%eax
@@ -62,11 +53,13 @@ SYM_FUNC_END(__get_user_1)
 EXPORT_SYMBOL(__get_user_1)
 
 SYM_FUNC_START(__get_user_2)
-	LOAD_TASK_SIZE_MINUS_N(1)
-	cmp %_ASM_DX,%_ASM_AX
-	jae bad_get_user
-	sbb %_ASM_DX, %_ASM_DX		/* array_index_mask_nospec() */
-	and %_ASM_DX, %_ASM_AX
+	mov %_ASM_AX, %_ASM_DX
+	add $1, %_ASM_DX
+	or %_ASM_AX, %_ASM_DX
+	shl $1, %_ASM_DX
+	jb bad_get_user
+	sbb %_ASM_DX, %_ASM_DX
+	or %_ASM_DX, %_ASM_AX
 	ASM_STAC
 2:	movzwl (%_ASM_AX),%edx
 	xor %eax,%eax
@@ -76,11 +69,13 @@ SYM_FUNC_END(__get_user_2)
 EXPORT_SYMBOL(__get_user_2)
 
 SYM_FUNC_START(__get_user_4)
-	LOAD_TASK_SIZE_MINUS_N(3)
-	cmp %_ASM_DX,%_ASM_AX
-	jae bad_get_user
-	sbb %_ASM_DX, %_ASM_DX		/* array_index_mask_nospec() */
-	and %_ASM_DX, %_ASM_AX
+	mov %_ASM_AX, %_ASM_DX
+	add $3, %_ASM_DX
+	or %_ASM_AX, %_ASM_DX
+	shl $1, %_ASM_DX
+	jb bad_get_user
+	sbb %_ASM_DX, %_ASM_DX
+	or %_ASM_DX, %_ASM_AX
 	ASM_STAC
 3:	movl (%_ASM_AX),%edx
 	xor %eax,%eax
@@ -91,22 +86,26 @@ EXPORT_SYMBOL(__get_user_4)
 
 SYM_FUNC_START(__get_user_8)
 #ifdef CONFIG_X86_64
-	LOAD_TASK_SIZE_MINUS_N(7)
-	cmp %_ASM_DX,%_ASM_AX
-	jae bad_get_user
-	sbb %_ASM_DX, %_ASM_DX		/* array_index_mask_nospec() */
-	and %_ASM_DX, %_ASM_AX
+	mov %_ASM_AX, %_ASM_DX
+	add $7, %_ASM_DX
+	or %_ASM_AX, %_ASM_DX
+	shl $1, %_ASM_DX
+	jb bad_get_user
+	sbb %_ASM_DX, %_ASM_DX
+	or %_ASM_DX, %_ASM_AX
 	ASM_STAC
 4:	movq (%_ASM_AX),%rdx
 	xor %eax,%eax
 	ASM_CLAC
 	RET
 #else
-	LOAD_TASK_SIZE_MINUS_N(7)
-	cmp %_ASM_DX,%_ASM_AX
-	jae bad_get_user_8
-	sbb %_ASM_DX, %_ASM_DX		/* array_index_mask_nospec() */
-	and %_ASM_DX, %_ASM_AX
+	mov %_ASM_AX, %_ASM_DX
+	add $7, %_ASM_DX
+	or %_ASM_AX, %_ASM_DX
+	shl $1, %_ASM_DX
+	jb bad_get_user
+	sbb %_ASM_DX, %_ASM_DX
+	or %_ASM_DX, %_ASM_AX
 	ASM_STAC
 4:	movl (%_ASM_AX),%edx
 5:	movl 4(%_ASM_AX),%ecx
diff --git a/arch/x86/lib/putuser.S b/arch/x86/lib/putuser.S
index b7dfd60243b7..eb7c4396cb1e 100644
--- a/arch/x86/lib/putuser.S
+++ b/arch/x86/lib/putuser.S
@@ -33,20 +33,11 @@
  * as they get called from within inline assembly.
  */
 
-#ifdef CONFIG_X86_5LEVEL
-#define LOAD_TASK_SIZE_MINUS_N(n) \
-	ALTERNATIVE __stringify(mov $((1 << 47) - 4096 - (n)),%rbx), \
-		    __stringify(mov $((1 << 56) - 4096 - (n)),%rbx), X86_FEATURE_LA57
-#else
-#define LOAD_TASK_SIZE_MINUS_N(n) \
-	mov $(TASK_SIZE_MAX - (n)),%_ASM_BX
-#endif
-
 .text
 SYM_FUNC_START(__put_user_1)
-	LOAD_TASK_SIZE_MINUS_N(0)
-	cmp %_ASM_BX,%_ASM_CX
-	jae .Lbad_put_user
+	mov %_ASM_CX, %_ASM_BX
+	shl $1, %_ASM_BX
+	jb .Lbad_put_user
 SYM_INNER_LABEL(__put_user_nocheck_1, SYM_L_GLOBAL)
 	ENDBR
 	ASM_STAC
@@ -59,9 +50,11 @@ EXPORT_SYMBOL(__put_user_1)
 EXPORT_SYMBOL(__put_user_nocheck_1)
 
 SYM_FUNC_START(__put_user_2)
-	LOAD_TASK_SIZE_MINUS_N(1)
-	cmp %_ASM_BX,%_ASM_CX
-	jae .Lbad_put_user
+	mov %_ASM_CX, %_ASM_BX
+	add $1, %_ASM_BX
+	or %_ASM_CX, %_ASM_BX
+	shl $1, %_ASM_BX
+	jb .Lbad_put_user
 SYM_INNER_LABEL(__put_user_nocheck_2, SYM_L_GLOBAL)
 	ENDBR
 	ASM_STAC
@@ -74,9 +67,11 @@ EXPORT_SYMBOL(__put_user_2)
 EXPORT_SYMBOL(__put_user_nocheck_2)
 
 SYM_FUNC_START(__put_user_4)
-	LOAD_TASK_SIZE_MINUS_N(3)
-	cmp %_ASM_BX,%_ASM_CX
-	jae .Lbad_put_user
+	mov %_ASM_CX, %_ASM_BX
+	add $3, %_ASM_BX
+	or %_ASM_CX, %_ASM_BX
+	shl $1, %_ASM_BX
+	jb .Lbad_put_user
 SYM_INNER_LABEL(__put_user_nocheck_4, SYM_L_GLOBAL)
 	ENDBR
 	ASM_STAC
@@ -89,9 +84,11 @@ EXPORT_SYMBOL(__put_user_4)
 EXPORT_SYMBOL(__put_user_nocheck_4)
 
 SYM_FUNC_START(__put_user_8)
-	LOAD_TASK_SIZE_MINUS_N(7)
-	cmp %_ASM_BX,%_ASM_CX
-	jae .Lbad_put_user
+	mov %_ASM_CX, %_ASM_BX
+	add $7, %_ASM_BX
+	or %_ASM_CX, %_ASM_BX
+	shl $1, %_ASM_BX
+	jb .Lbad_put_user
 SYM_INNER_LABEL(__put_user_nocheck_8, SYM_L_GLOBAL)
 	ENDBR
 	ASM_STAC
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2022-07-05  0:13 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-10 14:35 [PATCHv3 0/8] Linear Address Masking enabling Kirill A. Shutemov
2022-06-10 14:35 ` [PATCHv3 1/8] x86/mm: Fix CR3_ADDR_MASK Kirill A. Shutemov
2022-06-10 23:32   ` Edgecombe, Rick P
2022-06-10 14:35 ` [PATCHv3 2/8] x86: CPUID and CR3/CR4 flags for Linear Address Masking Kirill A. Shutemov
2022-06-10 14:35 ` [PATCHv3 3/8] mm: Pass down mm_struct to untagged_addr() Kirill A. Shutemov
2022-06-10 23:33   ` Edgecombe, Rick P
2022-06-17 15:27   ` Alexander Potapenko
2022-06-17 22:38     ` Kirill A. Shutemov
2022-06-10 14:35 ` [PATCHv3 4/8] x86/mm: Handle LAM on context switch Kirill A. Shutemov
2022-06-10 23:55   ` Edgecombe, Rick P
2022-06-15 15:54     ` Kirill A. Shutemov
2022-06-16  9:08   ` Peter Zijlstra
2022-06-16 16:40     ` Kirill A. Shutemov
2022-06-17 15:35   ` Alexander Potapenko
2022-06-17 22:39     ` Kirill A. Shutemov
2022-06-28 23:33   ` Andy Lutomirski
2022-06-29  0:34     ` Kirill A. Shutemov
2022-06-30  1:51       ` Andy Lutomirski
2022-06-10 14:35 ` [PATCHv3 5/8] x86/uaccess: Provide untagged_addr() and remove tags before address check Kirill A. Shutemov
2022-06-13 17:36   ` Edgecombe, Rick P
2022-06-15 16:58     ` Kirill A. Shutemov
2022-06-15 19:06       ` Edgecombe, Rick P
2022-06-16  9:30     ` Peter Zijlstra
2022-06-16 16:44       ` Kirill A. Shutemov
2022-06-17 11:36         ` Peter Zijlstra
2022-06-17 14:22           ` H.J. Lu
2022-06-17 14:28             ` Peter Zijlstra
2022-06-16  9:34     ` Peter Zijlstra
2022-06-16 10:02   ` Peter Zijlstra
2022-06-16 16:48     ` Kirill A. Shutemov
2022-06-28 23:40   ` Andy Lutomirski
2022-06-29  0:42     ` Kirill A. Shutemov
2022-06-30  2:38       ` Andy Lutomirski
2022-07-05  0:13         ` Kirill A. Shutemov
2022-06-10 14:35 ` [PATCHv3 6/8] x86/mm: Provide ARCH_GET_UNTAG_MASK and ARCH_ENABLE_TAGGED_ADDR Kirill A. Shutemov
2022-06-10 15:25   ` Edgecombe, Rick P
2022-06-10 18:04     ` Kirill A. Shutemov
2022-06-10 16:16   ` Edgecombe, Rick P
2022-06-10 18:06     ` Kirill A. Shutemov
2022-06-10 18:08       ` Edgecombe, Rick P
2022-06-10 22:18         ` Edgecombe, Rick P
2022-06-11  1:12           ` Kirill A. Shutemov
2022-06-11  2:36             ` Edgecombe, Rick P
2022-06-12 21:03           ` Andy Lutomirski
2022-06-16  9:44             ` Peter Zijlstra
2022-06-16 16:54               ` Kirill A. Shutemov
2022-06-30  2:04                 ` Andy Lutomirski
2022-06-13 14:42   ` Michal Hocko
2022-06-16 17:05     ` Kirill A. Shutemov
2022-06-19 23:40       ` Kirill A. Shutemov
2022-06-16  9:39   ` Peter Zijlstra
2022-06-28 23:42   ` Andy Lutomirski
2022-06-29  0:53     ` Kirill A. Shutemov
2022-06-30  2:29       ` Andy Lutomirski
2022-07-01 15:38         ` Kirill A. Shutemov
2022-07-02 23:55           ` Andy Lutomirski
2022-07-04 13:43             ` Kirill A. Shutemov
2022-06-10 14:35 ` [PATCHv3 7/8] x86: Expose untagging mask in /proc/$PID/arch_status Kirill A. Shutemov
2022-06-10 15:24   ` Dave Hansen
2022-06-11  1:28     ` Kirill A. Shutemov
2022-06-27 12:00       ` Catalin Marinas
2022-06-10 14:35 ` [PATCHv3 OPTIONAL 8/8] x86/mm: Extend LAM to support to LAM_U48 Kirill A. Shutemov
2022-06-16 10:00   ` Peter Zijlstra
2022-06-10 20:22 ` [PATCHv3 0/8] Linear Address Masking enabling Kostya Serebryany
2022-06-16 22:52 ` Edgecombe, Rick P
2022-06-16 23:43   ` Kirill A. Shutemov
2022-06-16 23:48     ` Edgecombe, Rick P

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).