* [RFC][PATCH 0/7] x86, mpx: Support larger address space (MAWA) (v2)
@ 2017-02-01 23:24 Dave Hansen
  2017-02-01 23:24 ` [RFC][PATCH 1/7] x86, mpx: introduce per-mm MPX table size tracking Dave Hansen
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Dave Hansen @ 2017-02-01 23:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, kirill.shutemov, Dave Hansen

Changes from v1:
 * Added selftests support for this feature
 * Removed "mawa" nomenclature from all variables and functions
 * Added patch to cram new "mawa" value into existing MPX space
   in mmu_context_t
 * Optimized the switch_mpx_bd() code with a likely().  We will
   need to do a bit more analysis here to see what the cheapest
   way to do this is.

--

Kirill is chugging right along getting his 5-level paging[1] patch set
ready to be merged.  I figured I'd share an early draft of the MPX
support that will go along with it.

Background: there is a lot more detail about what bounds tables are in
the changelog for fe3d197f843.  But, basically MPX bounds tables help
us to store the ranges to which a pointer is allowed to point.  The
tables are walked by hardware and they are indexed by the virtual
address of the pointer being checked.

A larger virtual address space (from 5-level paging) means that we
need larger tables.  5-level paging hardware includes a feature called
MPX Address-Width Adjust (MAWA) that grows the bounds tables so they
can address the new address space.  MAWA is controlled independently
from the paging mode (via an MSR) so that old MPX binaries can run on
new hardware and kernels supporting 5-level paging.
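
To make the sizes concrete, here is a minimal arithmetic sketch.  It
assumes the "2^(28+MAWA) 64-bit entries" formula quoted from the
whitepaper in patch 2 and the shift of 9 used for the large directory
in patch 3.  The helper names are made up for illustration and are
not part of the series.

```c
#include <assert.h>
#include <stdint.h>

#define BD_ENTRY_BYTES	8ULL	/* each directory entry is 64 bits */
#define BD_LEGACY_ORDER	28	/* legacy directory: 2^28 entries */

/* Number of entries in a 64-bit bounds directory for a given MAWA. */
uint64_t bd_nr_entries(unsigned int mawa)
{
	/* Keep the shift count well-defined for a 64-bit value. */
	assert(BD_LEGACY_ORDER + mawa < 64);
	return 1ULL << (BD_LEGACY_ORDER + mawa);
}

/* Size of that bounds directory in bytes. */
uint64_t bd_size_bytes(unsigned int mawa)
{
	return bd_nr_entries(mawa) * BD_ENTRY_BYTES;
}
```

With MAWA=0 this gives the legacy 2G directory; with MAWA=9 it gives
the 1T directory that goes with the larger address space.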

But, since userspace is responsible for allocating the table that is
growing (the directory), we need to ensure that userspace and the
kernel agree about the size of these tables and the kernel can set the
MSR appropriately.

These are not quite ready to get applied anywhere, but I don't expect
the basics to change unless folks have big problems with this.  The
only big remaining piece of work is to update the MPX selftest code.

Dave Hansen (7):
      x86, mpx: introduce per-mm MPX table size tracking
      x86, mpx: update MPX to grok larger bounds tables
      x86, mpx: extend MPX prctl() to pass in size of bounds directory
      x86, mpx: context-switch new MPX address size MSR
      x86, mpx: shrink per-mm MPX data
      x86, mpx, selftests: Use prctl header instead of magic numbers
      x86, mpx: update MPX selftest to test larger bounds dir

 arch/x86/include/asm/mmu.h                  |   9 +-
 arch/x86/include/asm/mpx.h                  |  77 ++++++++--
 arch/x86/include/asm/msr-index.h            |   1 +
 arch/x86/include/asm/processor.h            |   6 +-
 arch/x86/mm/mpx.c                           | 100 +++++++++++--
 arch/x86/mm/tlb.c                           |  46 ++++++
 kernel/sys.c                                |   6 +-
 tools/testing/selftests/x86/mpx-hw.h        |  23 ++-
 tools/testing/selftests/x86/mpx-mini-test.c | 156 +++++++++++++++-----
 9 files changed, 349 insertions(+), 75 deletions(-)

1. https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf


* [RFC][PATCH 1/7] x86, mpx: introduce per-mm MPX table size tracking
  2017-02-01 23:24 [RFC][PATCH 0/7] x86, mpx: Support larger address space (MAWA) (v2) Dave Hansen
@ 2017-02-01 23:24 ` Dave Hansen
  2017-02-01 23:24 ` [RFC][PATCH 2/7] x86, mpx: update MPX to grok larger bounds tables Dave Hansen
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2017-02-01 23:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, kirill.shutemov, Dave Hansen


Larger address spaces mean larger MPX bounds table sizes.  This
tracks which size tables we are using.

"MAWA" is what the hardware documentation calls this feature: MPX
Address-Width Adjust.

---

 b/arch/x86/include/asm/mmu.h |    1 +
 b/arch/x86/include/asm/mpx.h |    6 ++++++
 2 files changed, 7 insertions(+)

diff -puN arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa arch/x86/include/asm/mmu.h
--- a/arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa	2017-02-01 15:12:15.699124140 -0800
+++ b/arch/x86/include/asm/mmu.h	2017-02-01 15:12:15.702124275 -0800
@@ -34,6 +34,7 @@ typedef struct {
 #ifdef CONFIG_X86_INTEL_MPX
 	/* address of the bounds directory */
 	void __user *bd_addr;
+	int mpx_bd_shift;
 #endif
 } mm_context_t;
 
diff -puN arch/x86/include/asm/mpx.h~mawa-020-mmu_context-mawa arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-020-mmu_context-mawa	2017-02-01 15:12:15.700124185 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-02-01 15:12:15.702124275 -0800
@@ -68,6 +68,12 @@ static inline void mpx_mm_init(struct mm
 	 * directory, so point this at an invalid address.
 	 */
 	mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
+	/*
+	 * All processes start out in "legacy" MPX mode with
+	 * the old bounds directory size.  This corresponds to
+	 * what the specs call MAWA=0.
+	 */
+	mm->context.mpx_bd_shift = 0;
 }
 void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long start, unsigned long end);
_


* [RFC][PATCH 2/7] x86, mpx: update MPX to grok larger bounds tables
  2017-02-01 23:24 [RFC][PATCH 0/7] x86, mpx: Support larger address space (MAWA) (v2) Dave Hansen
  2017-02-01 23:24 ` [RFC][PATCH 1/7] x86, mpx: introduce per-mm MPX table size tracking Dave Hansen
@ 2017-02-01 23:24 ` Dave Hansen
  2017-02-12 19:05   ` Thomas Gleixner
  2017-02-01 23:24 ` [RFC][PATCH 3/7] x86, mpx: extend MPX prctl() to pass in size of bounds directory Dave Hansen
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2017-02-01 23:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, kirill.shutemov, Dave Hansen


The MPX code in the kernel needs to walk these tables in order to
populate them on demand as well as unmap them when memory is
freed.  A larger virtual address space means larger MPX bounds
tables.

Update the bounds table walking code to understand how to walk
the larger table size.  We use the new per-mm "mpx_bd_shift"
value to determine which format to use.

The mpx_bd_size_shift() function looks like a useless abstraction
here.  But, the new 'mpx_bd_shift' field will get packed into a
single bit in 'bd_addr' later.  Keep the abstraction in place now
to make the series simpler and make it more obvious when the
complexity comes from the packing rather than the actual data
itself.
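
As a rough illustration of what "walking the larger table" means,
here is a userspace sketch of how a virtual address might map to a
byte offset in the bounds directory.  It is not the kernel's walking
code.  It assumes each directory entry covers 2^48 / 2^28 = 2^20
bytes of address space regardless of directory size, as the
bd_entry_virt_space() comment in this patch describes, and the
function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BD_ENTRY_BYTES		8ULL
#define BD_LEGACY_ORDER		28	/* 2^28 entries when shift == 0 */
#define BD_ENTRY_COVER_BITS	20	/* 2^48 / 2^28 = 2^20 bytes per entry */

/*
 * Byte offset of the directory entry that covers 'vaddr'.
 * 'bd_shift' is 0 for the legacy directory and 9 for the large one.
 */
uint64_t bd_entry_offset(uint64_t vaddr, unsigned int bd_shift)
{
	uint64_t nr_entries, index;

	assert(bd_shift == 0 || bd_shift == 9);
	nr_entries = 1ULL << (BD_LEGACY_ORDER + bd_shift);
	/* Drop the bits covered by one entry, then wrap to the table size. */
	index = (vaddr >> BD_ENTRY_COVER_BITS) & (nr_entries - 1);
	return index * BD_ENTRY_BYTES;
}
```

Only the number of entries changes with the shift.  The per-entry
coverage stays fixed, which is why the larger directory can reach
the larger address space.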

---

 b/arch/x86/include/asm/mpx.h |   27 +++++++++++++++++++++------
 b/arch/x86/mm/mpx.c          |   30 ++++++++++++++++++++++--------
 2 files changed, 43 insertions(+), 14 deletions(-)

diff -puN arch/x86/include/asm/mpx.h~mawa-030-bounds-directory-sizes arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-030-bounds-directory-sizes	2017-02-01 15:12:16.114142809 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-02-01 15:12:16.119143034 -0800
@@ -14,15 +14,30 @@
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
 
 /*
- * The upper 28 bits [47:20] of the virtual address in 64-bit
- * are used to index into bounds directory (BD).
+ * The uppermost bits [56:20] of the virtual address in 64-bit
+ * are used to index into bounds directory (BD).  On processors
+ * with support for smaller virtual address space size, the "56"
+ * is obviously smaller.
  *
- * The directory is 2G (2^31) in size, and with 8-byte entries
- * it has 2^28 entries.
+ * When using 47-bit virtual addresses, the directory is 2G
+ * (2^31) bytes in size, and with 8-byte entries it has 2^28
+ * entries.  With 56-bit virtual addresses, it goes to 1T in size
+ * and has 2^37 entries.
+ *
+ * Needs to be ULL so we can use this in 32-bit kernels without
+ * warnings.
  */
-#define MPX_BD_SIZE_BYTES_64	(1UL<<31)
+#define MPX_BD_BASE_SIZE_BYTES_64	(1ULL<<31)
 #define MPX_BD_ENTRY_BYTES_64	8
-#define MPX_BD_NR_ENTRIES_64	(MPX_BD_SIZE_BYTES_64/MPX_BD_ENTRY_BYTES_64)
+/*
+ * Note: size of tables on 64-bit is not constant, so we have no
+ * fixed definition for MPX_BD_NR_ENTRIES_64.
+ *
+ * The 5-Level Paging Whitepaper says:  "A bound directory
+ * comprises 2^(28+MAWA) 64-bit entries."  Since MAWA=0 in
+ * legacy mode:
+ */
+#define MPX_BD_LEGACY_NR_ENTRIES_64	(1UL<<28)
 
 /*
  * The 32-bit directory is 4MB (2^22) in size, and with 4-byte
diff -puN arch/x86/mm/mpx.c~mawa-030-bounds-directory-sizes arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mawa-030-bounds-directory-sizes	2017-02-01 15:12:16.115142854 -0800
+++ b/arch/x86/mm/mpx.c	2017-02-01 15:12:16.119143034 -0800
@@ -20,12 +20,21 @@
 #define CREATE_TRACE_POINTS
 #include <asm/trace/mpx.h>
 
+static inline int mpx_bd_size_shift(struct mm_struct *mm)
+{
+	return mm->context.mpx_bd_shift;
+}
+
 static inline unsigned long mpx_bd_size_bytes(struct mm_struct *mm)
 {
-	if (is_64bit_mm(mm))
-		return MPX_BD_SIZE_BYTES_64;
-	else
+	if (!is_64bit_mm(mm))
 		return MPX_BD_SIZE_BYTES_32;
+
+	/*
+	 * The bounds directory grows with the address space size.
+	 * The "legacy" shift is 0.
+	 */
+	return MPX_BD_BASE_SIZE_BYTES_64 << mpx_bd_size_shift(mm);
 }
 
 static inline unsigned long mpx_bt_size_bytes(struct mm_struct *mm)
@@ -724,6 +733,7 @@ static inline unsigned long bd_entry_vir
 {
 	unsigned long long virt_space;
 	unsigned long long GB = (1ULL << 30);
+	unsigned long legacy_64bit_vaddr_bits = 48;
 
 	/*
 	 * This covers 32-bit emulation as well as 32-bit kernels
@@ -733,12 +743,16 @@ static inline unsigned long bd_entry_vir
 		return (4ULL * GB) / MPX_BD_NR_ENTRIES_32;
 
 	/*
-	 * 'x86_virt_bits' returns what the hardware is capable
-	 * of, and returns the full >32-bit address space when
-	 * running 32-bit kernels on 64-bit hardware.
+	 * With 5-level paging, the virtual address space size
+	 * gets bigger.  A bounds directory entry still points to
+	 * a single bounds table and the *tables* stay the same
+	 * size.  Thus, the address space that a directory entry
+	 * covers does not change based on the paging mode or the
+	 * size of the bounds directory itself.  Just use the
+	 * legacy size.
 	 */
-	virt_space = (1ULL << boot_cpu_data.x86_virt_bits);
-	return virt_space / MPX_BD_NR_ENTRIES_64;
+	virt_space = (1ULL << legacy_64bit_vaddr_bits);
+	return virt_space / MPX_BD_LEGACY_NR_ENTRIES_64;
 }
 
 /*
_


* [RFC][PATCH 3/7] x86, mpx: extend MPX prctl() to pass in size of bounds directory
  2017-02-01 23:24 [RFC][PATCH 0/7] x86, mpx: Support larger address space (MAWA) (v2) Dave Hansen
  2017-02-01 23:24 ` [RFC][PATCH 1/7] x86, mpx: introduce per-mm MPX table size tracking Dave Hansen
  2017-02-01 23:24 ` [RFC][PATCH 2/7] x86, mpx: update MPX to grok larger bounds tables Dave Hansen
@ 2017-02-01 23:24 ` Dave Hansen
  2017-02-12 19:15   ` Thomas Gleixner
  2017-02-01 23:24 ` [RFC][PATCH 4/7] x86, mpx: context-switch new MPX address size MSR Dave Hansen
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2017-02-01 23:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, kirill.shutemov, Dave Hansen


The MPX bounds tables are indexed by virtual address.  A larger virtual
address space means that we need larger tables.  But, we need to ensure
that userspace and the kernel agree about the size of these tables.

To do this, we require that userspace passes in the size of the tables
if they want a non-legacy size.  They do this with a previously unused
(required to be 0) argument to the PR_MPX_ENABLE_MANAGEMENT prctl().

This way, the kernel can make sure that the size of the tables is
consistent with the size of the address space and can return an error
if there is a mismatch.

There are essentially 3 table sizes that matter:
1. 32-bit table sized for a 32-bit address space
2. 64-bit table sized for a 48-bit address space
3. 64-bit table sized for a 57-bit address space

We cover all three of those cases.
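
From the userspace side, the call might look roughly like the sketch
below.  This only illustrates the interface proposed in this RFC: the
helper names are hypothetical, the sizes are taken from the
definitions in this series, and kernels without this patch (or
without MPX at all) are expected to reject a non-zero second
argument.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

#ifndef PR_MPX_ENABLE_MANAGEMENT
#define PR_MPX_ENABLE_MANAGEMENT 43	/* value from <linux/prctl.h> */
#endif

#define BD_SIZE_32		(1ULL << 22)	/* 32-bit address space */
#define BD_SIZE_64_LEGACY	(1ULL << 31)	/* 48-bit address space */
#define BD_SIZE_64_LARGE	(BD_SIZE_64_LEGACY << 9) /* 57-bit */

/* Pick the directory size to report for one of the three cases. */
unsigned long long mpx_bd_size(int is_64bit, int large)
{
	/* There is no large directory for 32-bit address spaces. */
	assert(is_64bit || !large);
	if (!is_64bit)
		return BD_SIZE_32;
	return large ? BD_SIZE_64_LARGE : BD_SIZE_64_LEGACY;
}

/* Returns 0 on success or a negative errno value on failure. */
int mpx_enable(unsigned long bd_size)
{
	if (prctl(PR_MPX_ENABLE_MANAGEMENT, bd_size, 0, 0, 0) == 0)
		return 0;
	return -errno;
}
```

Passing 0 keeps the legacy behavior, so existing binaries that never
knew about the size argument continue to work unchanged.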

FIXME: we also need to ensure that we check the current state of the
larger address space opt-in.  If we've opted in to larger address spaces
we can not allow a small bounds directory to be used.  Also, if we've
not opted in, we can not allow the larger bounds directory to be used.
This can be fixed once the in-kernel API for opting in/out is settled.

---

 b/arch/x86/include/asm/mpx.h       |    7 ++++
 b/arch/x86/include/asm/processor.h |    6 ++--
 b/arch/x86/mm/mpx.c                |   54 +++++++++++++++++++++++++++++++++++--
 b/kernel/sys.c                     |    6 ++--
 4 files changed, 65 insertions(+), 8 deletions(-)

diff -puN arch/x86/include/asm/mpx.h~mawa-040-prctl-set-mawa arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-040-prctl-set-mawa	2017-02-01 15:12:16.570163322 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-02-01 15:12:16.578163682 -0800
@@ -40,6 +40,13 @@
 #define MPX_BD_LEGACY_NR_ENTRIES_64	(1UL<<28)
 
 /*
+ * When the hardware "MAWA" feature is enabled, we have a larger
+ * bounds directory.  There are only two sizes supported: large
+ * and small, so we only need a single value here.
+ */
+#define MPX_LARGE_BOUNDS_DIR_SHIFT 9
+
+/*
  * The 32-bit directory is 4MB (2^22) in size, and with 4-byte
  * entries it has 2^20 entries.
  */
diff -puN arch/x86/include/asm/processor.h~mawa-040-prctl-set-mawa arch/x86/include/asm/processor.h
--- a/arch/x86/include/asm/processor.h~mawa-040-prctl-set-mawa	2017-02-01 15:12:16.572163412 -0800
+++ b/arch/x86/include/asm/processor.h	2017-02-01 15:12:16.579163727 -0800
@@ -874,14 +874,14 @@ extern int get_tsc_mode(unsigned long ad
 extern int set_tsc_mode(unsigned int val);
 
 /* Register/unregister a process' MPX related resource */
-#define MPX_ENABLE_MANAGEMENT()	mpx_enable_management()
+#define MPX_ENABLE_MANAGEMENT(bd_size)	mpx_enable_management(bd_size)
 #define MPX_DISABLE_MANAGEMENT()	mpx_disable_management()
 
 #ifdef CONFIG_X86_INTEL_MPX
-extern int mpx_enable_management(void);
+extern int mpx_enable_management(unsigned long bd_size);
 extern int mpx_disable_management(void);
 #else
-static inline int mpx_enable_management(void)
+static inline int mpx_enable_management(unsigned long bd_size)
 {
 	return -EINVAL;
 }
diff -puN arch/x86/mm/mpx.c~mawa-040-prctl-set-mawa arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mawa-040-prctl-set-mawa	2017-02-01 15:12:16.573163457 -0800
+++ b/arch/x86/mm/mpx.c	2017-02-01 15:12:16.580163772 -0800
@@ -344,7 +344,54 @@ static __user void *mpx_get_bounds_dir(v
 		(bndcsr->bndcfgu & MPX_BNDCFG_ADDR_MASK);
 }
 
-int mpx_enable_management(void)
+int mpx_set_mm_bd_size(unsigned long bd_size)
+{
+	struct mm_struct *mm = current->mm;
+
+	switch ((unsigned long long)bd_size) {
+	case 0:
+		/* Legacy call to prctl(): */
+		mm->context.mpx_bd_shift = 0;
+		return 0;
+	case MPX_BD_SIZE_BYTES_32:
+		/* 32-bit, legacy-sized bounds directory: */
+		if (is_64bit_mm(mm))
+			return -EINVAL;
+		mm->context.mpx_bd_shift = 0;
+		return 0;
+	case MPX_BD_BASE_SIZE_BYTES_64:
+		/* 64-bit, legacy-sized bounds directory: */
+		if (!is_64bit_mm(mm)
+		// FIXME && ! opted-in to larger address space
+		)
+			return -EINVAL;
+		mm->context.mpx_bd_shift = 0;
+		return 0;
+	case MPX_BD_BASE_SIZE_BYTES_64 << MPX_LARGE_BOUNDS_DIR_SHIFT:
+		/*
+		 * Non-legacy call, with larger directory.
+		 * Note that there is no 32-bit equivalent for
+		 * this case since its address space does not
+		 * change sizes.
+		 */
+		if (!is_64bit_mm(mm))
+			return -EINVAL;
+		/*
+		 * Do not let this be enabled unles we are on
+		 * 5-level hardware *and* have that feature
+		 * enabled. FIXME: need runtime check
+		 */
+		if (!cpu_feature_enabled(X86_FEATURE_LA57)
+		// FIXME && opted into larger address space
+		)
+			return -EINVAL;
+		mm->context.mpx_bd_shift = MPX_LARGE_BOUNDS_DIR_SHIFT;
+		return 0;
+	}
+	return -EINVAL;
+}
+
+int mpx_enable_management(unsigned long bd_size)
 {
 	void __user *bd_base = MPX_INVALID_BOUNDS_DIR;
 	struct mm_struct *mm = current->mm;
@@ -363,10 +410,13 @@ int mpx_enable_management(void)
 	 */
 	bd_base = mpx_get_bounds_dir();
 	down_write(&mm->mmap_sem);
+	ret = mpx_set_mm_bd_size(bd_size);
+	if (ret)
+		goto out;
 	mm->context.bd_addr = bd_base;
 	if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
 		ret = -ENXIO;
-
+out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
diff -puN kernel/sys.c~mawa-040-prctl-set-mawa kernel/sys.c
--- a/kernel/sys.c~mawa-040-prctl-set-mawa	2017-02-01 15:12:16.575163547 -0800
+++ b/kernel/sys.c	2017-02-01 15:12:16.580163772 -0800
@@ -92,7 +92,7 @@
 # define SET_TSC_CTL(a)		(-EINVAL)
 #endif
 #ifndef MPX_ENABLE_MANAGEMENT
-# define MPX_ENABLE_MANAGEMENT()	(-EINVAL)
+# define MPX_ENABLE_MANAGEMENT(bd_size)	(-EINVAL)
 #endif
 #ifndef MPX_DISABLE_MANAGEMENT
 # define MPX_DISABLE_MANAGEMENT()	(-EINVAL)
@@ -2246,9 +2246,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
 		up_write(&me->mm->mmap_sem);
 		break;
 	case PR_MPX_ENABLE_MANAGEMENT:
-		if (arg2 || arg3 || arg4 || arg5)
+		if (arg3 || arg4 || arg5)
 			return -EINVAL;
-		error = MPX_ENABLE_MANAGEMENT();
+		error = MPX_ENABLE_MANAGEMENT(arg2);
 		break;
 	case PR_MPX_DISABLE_MANAGEMENT:
 		if (arg2 || arg3 || arg4 || arg5)
_


* [RFC][PATCH 4/7] x86, mpx: context-switch new MPX address size MSR
  2017-02-01 23:24 [RFC][PATCH 0/7] x86, mpx: Support larger address space (MAWA) (v2) Dave Hansen
                   ` (2 preceding siblings ...)
  2017-02-01 23:24 ` [RFC][PATCH 3/7] x86, mpx: extend MPX prctl() to pass in size of bounds directory Dave Hansen
@ 2017-02-01 23:24 ` Dave Hansen
  2017-02-12 19:37   ` Thomas Gleixner
  2017-02-01 23:24 ` [RFC][PATCH 5/7] x86, mpx: shrink per-mm MPX data Dave Hansen
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2017-02-01 23:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, kirill.shutemov, Dave Hansen


As mentioned in previous patches, larger address spaces mean
larger MPX tables.  But, the entire system is either entirely
using 5-level paging, or not.  We do not mix pagetable formats.

If the size of the MPX tables depended solely on the paging mode,
old binaries would break because the format of the tables changed
underneath them.  So, since CR4 never changes, but we need some
way to change the MPX table format, a new MSR is introduced:
MSR_IA32_MPX_LAX.

If we are in 5-level paging mode *and* the enable bit in this MSR
is set, the CPU will use the new, larger MPX bounds table format.
If 5-level paging is disabled, or the enable bit is clear, then
the legacy-style smaller tables will be used.

But, we might mix legacy and non-legacy binaries on the same
system, so this MSR needs to be context-switched.  Add code to
do this, along with some simple optimizations to skip the MSR
writes if the MSR does not need to be updated.
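
The skip-if-unchanged logic can be modeled in userspace as below.
This is a sketch, not the kernel code: the real MSR write is replaced
by a hypothetical fake_wrmsr() that just records the value and counts
writes, so the optimization is easy to observe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPX_LAX_ENABLE	0x1u	/* the only bit in use, per this patch */

/* Hypothetical stand-in for the MSR: last value and number of writes. */
struct fake_msr {
	uint32_t value;
	unsigned int writes;
};

void fake_wrmsr(struct fake_msr *msr, uint32_t val)
{
	msr->value = val;
	msr->writes++;
}

/* Model of the context-switch decision for the bounds directory size. */
void switch_mpx_bd_model(struct fake_msr *msr, int prev_shift, int next_shift)
{
	assert(msr != NULL);
	/* Same size before and after: skip the costly MSR write. */
	if (prev_shift == next_shift)
		return;
	/* Non-zero shift means the large directory: set the enable bit. */
	fake_wrmsr(msr, next_shift ? MPX_LAX_ENABLE : 0);
}
```

Switching between two legacy processes, or between two large-directory
processes, never touches the MSR.  Only a change of size does.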

---

 b/arch/x86/include/asm/mpx.h       |   11 ++++++++
 b/arch/x86/include/asm/msr-index.h |    1 
 b/arch/x86/mm/mpx.c                |    5 ----
 b/arch/x86/mm/tlb.c                |   46 +++++++++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+), 5 deletions(-)

diff -puN arch/x86/include/asm/mpx.h~mawa-050-context-switch-msr arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-050-context-switch-msr	2017-02-01 15:12:17.087186579 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-02-01 15:12:17.095186939 -0800
@@ -99,6 +99,11 @@ static inline void mpx_mm_init(struct mm
 }
 void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long start, unsigned long end);
+
+static inline int mpx_bd_size_shift(struct mm_struct *mm)
+{
+	return mm->context.mpx_bd_shift;
+}
 #else
 static inline siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
 {
@@ -120,6 +125,12 @@ static inline void mpx_notify_unmap(stru
 				    unsigned long start, unsigned long end)
 {
 }
+/* Should never be called, but need stub to avoid an #ifdef */
+static inline int mpx_bd_size_shift(struct mm_struct *mm)
+{
+	WARN_ON(1);
+	return 0;
+}
 #endif /* CONFIG_X86_INTEL_MPX */
 
 #endif /* _ASM_X86_MPX_H */
diff -puN arch/x86/include/asm/msr-index.h~mawa-050-context-switch-msr arch/x86/include/asm/msr-index.h
--- a/arch/x86/include/asm/msr-index.h~mawa-050-context-switch-msr	2017-02-01 15:12:17.088186624 -0800
+++ b/arch/x86/include/asm/msr-index.h	2017-02-01 15:12:17.096186984 -0800
@@ -410,6 +410,7 @@
 #define MSR_IA32_BNDCFGS		0x00000d90
 
 #define MSR_IA32_XSS			0x00000da0
+#define MSR_IA32_MPX_LAX		0x00001000
 
 #define FEATURE_CONTROL_LOCKED				(1<<0)
 #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
diff -puN arch/x86/mm/mpx.c~mawa-050-context-switch-msr arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mawa-050-context-switch-msr	2017-02-01 15:12:17.090186714 -0800
+++ b/arch/x86/mm/mpx.c	2017-02-01 15:12:17.096186984 -0800
@@ -20,11 +20,6 @@
 #define CREATE_TRACE_POINTS
 #include <asm/trace/mpx.h>
 
-static inline int mpx_bd_size_shift(struct mm_struct *mm)
-{
-	return mm->context.mpx_bd_shift;
-}
-
 static inline unsigned long mpx_bd_size_bytes(struct mm_struct *mm)
 {
 	if (!is_64bit_mm(mm))
diff -puN arch/x86/mm/tlb.c~mawa-050-context-switch-msr arch/x86/mm/tlb.c
--- a/arch/x86/mm/tlb.c~mawa-050-context-switch-msr	2017-02-01 15:12:17.092186804 -0800
+++ b/arch/x86/mm/tlb.c	2017-02-01 15:12:17.097187029 -0800
@@ -9,6 +9,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
+#include <asm/mpx.h>
 #include <asm/cache.h>
 #include <asm/apic.h>
 #include <asm/uv/uv.h>
@@ -71,6 +72,50 @@ void switch_mm(struct mm_struct *prev, s
 	local_irq_restore(flags);
 }
 
+/*
+ * The MPX tables change sizes based on the size of the virtual
+ * (aka. linear) address space.  There is an MSR to tell the CPU
+ * whether we want the legacy-style ones or the larger ones when
+ * we are running with an eXtended virtual address space.
+ */
+static inline void switch_mpx_bd(struct mm_struct *prev, struct mm_struct *next)
+{
+	/*
+	 * Note: there is one and only one bit in use in the MSR
+	 * at this time, so we do not have to be concerned with
+	 * preserving any of the other bits.  Just write 0 or 1.
+	 */
+	u32 IA32_MPX_LAX_ENABLE_MASK = 0x00000001;
+
+	/*
+	 * Avoid the MSR on CPUs without MPX, obviously:
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_MPX))
+		return;
+	/*
+	 * FIXME: do we want a check here for the 5-level paging
+	 * CR4 bit or CPUID bit, or is the mawa check below OK?
+	 * It's not obvious what would be the fastest or if it
+	 * matters.
+	 */
+
+	/*
+	 * Avoid the relatively costly MSR if we are not changing
+	 * MAWA state.  All processes not using MPX will have a
+	 * mpx_bd_size_shift()=0, so we do not need to check
+	 * separately for whether MPX management is enabled.
+	 */
+	if (likely(mpx_bd_size_shift(prev) == mpx_bd_size_shift(next)))
+		return;
+
+	if (mpx_bd_size_shift(next)) {
+		wrmsr(MSR_IA32_MPX_LAX, IA32_MPX_LAX_ENABLE_MASK, 0x0);
+	} else {
+		/* clear the enable bit: */
+		wrmsr(MSR_IA32_MPX_LAX, 0x0, 0x0);
+	}
+}
+
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
@@ -136,6 +181,7 @@ void switch_mm_irqs_off(struct mm_struct
 		/* Load per-mm CR4 state */
 		load_mm_cr4(next);
 
+		switch_mpx_bd(prev, next);
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
 		/*
 		 * Load the LDT, if the LDT is different.
_


* [RFC][PATCH 5/7] x86, mpx: shrink per-mm MPX data
  2017-02-01 23:24 [RFC][PATCH 0/7] x86, mpx: Support larger address space (MAWA) (v2) Dave Hansen
                   ` (3 preceding siblings ...)
  2017-02-01 23:24 ` [RFC][PATCH 4/7] x86, mpx: context-switch new MPX address size MSR Dave Hansen
@ 2017-02-01 23:24 ` Dave Hansen
  2017-02-12 22:44   ` Thomas Gleixner
  2017-02-01 23:24 ` [RFC][PATCH 6/7] x86, mpx, selftests: Use prctl header instead of magic numbers Dave Hansen
  2017-02-01 23:24 ` [RFC][PATCH 7/7] x86, mpx: update MPX selftest to test larger bounds dir Dave Hansen
  6 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2017-02-01 23:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, kirill.shutemov, Dave Hansen


We have three pieces of data that we need to store about MPX's operation:
1. Is kernel management on/off?
2. If it's on, where is the bounds directory located?
3. If it's on, how big is the bounds directory?

We keep all this data in the mm_context_t.  Currently, #1 and #2
are stored in 'bd_addr' and #3 in 'mpx_bd_shift'.  But, the
address in 'bd_addr' must be page-aligned, so we have plenty of
space to share with things like 'mpx_bd_shift' which is a single
bit.

We rename the 'bd_addr' field to 'mpx_directory_info' since it
now has more than just the address, and move the "invalid" value
to a single bit instead of -1 so it does not collide with the
"large" bit.

Note that these new bits _start_ at bit 2.  This is explained in
a comment too, but I started at bit 2 since the hardware register
(BNDCFGU) that stores a bounds directory pointer uses the two low
bits for other purposes.  Starting at bit 2 makes it much more
obvious that these bits mean very different things than the bits
in the register.
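
A standalone sketch of the packing scheme may help.  It assumes 4K
pages, so the low 12 bits of a page-aligned directory address are
free, and it uses bits 2 and 3 as described above.  The names are
hypothetical, chosen to avoid confusion with the kernel's own.

```c
#include <assert.h>

#define DIR_INVALID	(1UL << 2)	/* MPX management is off */
#define DIR_LARGE	(1UL << 3)	/* large (non-legacy) directory */
#define DIR_LARGE_SHIFT	9

/* Combine a page-aligned directory address with the size bit. */
unsigned long dir_info_pack(unsigned long bd_addr, int large)
{
	/* Assumes 4K pages: the low 12 bits must be clear. */
	assert((bd_addr & 0xfffUL) == 0);
	return bd_addr | (large ? DIR_LARGE : 0);
}

/* Management is on unless the value is exactly the invalid marker. */
int dir_info_managed(unsigned long info)
{
	return info != DIR_INVALID;
}

/* Recover the directory address by masking off the size bit. */
unsigned long dir_info_addr(unsigned long info)
{
	return info & ~DIR_LARGE;
}

/* Directory size shift: 0 when unmanaged or legacy, 9 when large. */
int dir_info_shift(unsigned long info)
{
	if (!dir_info_managed(info))
		return 0;
	return (info & DIR_LARGE) ? DIR_LARGE_SHIFT : 0;
}
```

Because the invalid marker is a low bit rather than -1, it can never
be mistaken for an address with the large bit set.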

The rest of the patch is pretty mechanical.  The one exception is the
mpx_enable_management() code.  I wanted to keep it fairly tidy and
straightforward, but the logic behind mpx_set_dir_size() is pretty
messy.  It looks strange to be passing "&mm->context.mpx_directory_info"
around instead of just the mm or the mm->context, but considering the
context:

	/* Mask out the invalid bit: */
	mm->context.mpx_directory_info &= ~MPX_INVALID_BOUNDS_DIR;
	ret = mpx_set_dir_size(bd_size, &mm->context.mpx_directory_info);

I think it makes it a lot more obvious what is going on.

---

 b/arch/x86/include/asm/mmu.h |   10 +++++--
 b/arch/x86/include/asm/mpx.h |   44 +++++++++++++++++++++-------------
 b/arch/x86/mm/mpx.c          |   55 +++++++++++++++++++++++++------------------
 3 files changed, 66 insertions(+), 43 deletions(-)

diff -puN arch/x86/include/asm/mmu.h~mawa-060-onebit arch/x86/include/asm/mmu.h
--- a/arch/x86/include/asm/mmu.h~mawa-060-onebit	2017-02-01 15:12:17.598209567 -0800
+++ b/arch/x86/include/asm/mmu.h	2017-02-01 15:12:17.605209882 -0800
@@ -32,9 +32,13 @@ typedef struct {
 	s16 execute_only_pkey;
 #endif
 #ifdef CONFIG_X86_INTEL_MPX
-	/* address of the bounds directory */
-	void __user *bd_addr;
-	int mpx_bd_shift;
+	/*
+	 * The bounds directory must be page-aligned, so we store
+	 * its address in the high bits and information about its
+	 * size in some low bits.  A bit is also used to indicate
+	 * when the directory is invalid and MPX management is off.
+	 */
+	unsigned long mpx_directory_info;
 #endif
 } mm_context_t;
 
diff -puN arch/x86/include/asm/mpx.h~mawa-060-onebit arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-060-onebit	2017-02-01 15:12:17.600209657 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-02-01 15:12:17.605209882 -0800
@@ -6,10 +6,14 @@
 #include <asm/insn.h>
 
 /*
- * NULL is theoretically a valid place to put the bounds
- * directory, so point this at an invalid address.
+ * These get stored into mm_context_t->mpx_directory_info.
+ * We could theoretically use bits 0 and 1, but those are
+ * used in the BNDCFGU register that also holds the bounds
+ * directory pointer.  To avoid confusion, use different bits.
  */
-#define MPX_INVALID_BOUNDS_DIR	((void __user *)-1)
+#define MPX_INVALID_BOUNDS_DIR	(1UL<<2)
+#define MPX_LARGE_BOUNDS_DIR	(1UL<<3)
+
 #define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
 
@@ -44,7 +48,7 @@
  * bounds directory.  There are only two sizes supported: large
  * and small, so we only need a single value here.
  */
-#define MPX_LARGE_BOUNDS_DIR_SHIFT 9
+#define MPX_LARGE_BOUNDS_DIR_SHIFT	9
 
 /*
  * The 32-bit directory is 4MB (2^22) in size, and with 4-byte
@@ -79,32 +83,38 @@
 #ifdef CONFIG_X86_INTEL_MPX
 siginfo_t *mpx_generate_siginfo(struct pt_regs *regs);
 int mpx_handle_bd_fault(void);
+static inline void __user *mpx_bounds_dir_addr(struct mm_struct *mm)
+{
+	/*
+	 * The only bit that can be set in a valid bounds
+	 * directory is MPX_LARGE_BOUNDS_DIR, so only mask
+	 * it back off.
+	 */
+	return (void __user *)
+		(mm->context.mpx_directory_info & ~MPX_LARGE_BOUNDS_DIR);
+}
 static inline int kernel_managing_mpx_tables(struct mm_struct *mm)
 {
-	return (mm->context.bd_addr != MPX_INVALID_BOUNDS_DIR);
+	return (mm->context.mpx_directory_info != MPX_INVALID_BOUNDS_DIR);
 }
 static inline void mpx_mm_init(struct mm_struct *mm)
 {
 	/*
-	 * NULL is theoretically a valid place to put the bounds
-	 * directory, so point this at an invalid address.
-	 */
-	mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
-	/*
-	 * All processes start out in "legacy" MPX mode with
-	 * the old bounds directory size.  This corresponds to
-	 * what the specs call MAWA=0.
+	 * MPX starts out off (invalid) and with a legacy-size
+	 * bounds directory (cleared MPX_LARGE_BOUNDS_DIR bit).
 	 */
-	mm->context.mpx_bd_shift = 0;
+	mm->context.mpx_directory_info = MPX_INVALID_BOUNDS_DIR;
 }
 void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long start, unsigned long end);
-
 static inline int mpx_bd_size_shift(struct mm_struct *mm)
 {
-	return mm->context.mpx_bd_shift;
+	if (!kernel_managing_mpx_tables(mm))
+		return 0;
+	if (mm->context.mpx_directory_info & MPX_LARGE_BOUNDS_DIR)
+		return MPX_LARGE_BOUNDS_DIR_SHIFT;
+	return 0;
 }
-#else
 static inline siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
 {
 	return NULL;
diff -puN arch/x86/mm/mpx.c~mawa-060-onebit arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mawa-060-onebit	2017-02-01 15:12:17.602209747 -0800
+++ b/arch/x86/mm/mpx.c	2017-02-01 15:12:17.606209927 -0800
@@ -339,29 +339,31 @@ static __user void *mpx_get_bounds_dir(v
 		(bndcsr->bndcfgu & MPX_BNDCFG_ADDR_MASK);
 }
 
-int mpx_set_mm_bd_size(unsigned long bd_size)
+int mpx_set_dir_size(unsigned long bd_size, unsigned long *mpx_directory_info)
 {
 	struct mm_struct *mm = current->mm;
+	int ret = 0;
+	bool large_dir = false;
 
 	switch ((unsigned long long)bd_size) {
 	case 0:
-		/* Legacy call to prctl(): */
-		mm->context.mpx_bd_shift = 0;
-		return 0;
+		/* Legacy call to prctl() */
+		break;
 	case MPX_BD_SIZE_BYTES_32:
 		/* 32-bit, legacy-sized bounds directory: */
-		if (is_64bit_mm(mm))
-			return -EINVAL;
-		mm->context.mpx_bd_shift = 0;
-		return 0;
+		if (is_64bit_mm(mm)) {
+			ret = -EINVAL;
+			break;
+		}
+		ret = 0;
+		break;
 	case MPX_BD_BASE_SIZE_BYTES_64:
 		/* 64-bit, legacy-sized bounds directory: */
 		if (!is_64bit_mm(mm)
 		// FIXME && ! opted-in to larger address space
 		)
-			return -EINVAL;
-		mm->context.mpx_bd_shift = 0;
-		return 0;
+			ret = -EINVAL;
+		break;
 	case MPX_BD_BASE_SIZE_BYTES_64 << MPX_LARGE_BOUNDS_DIR_SHIFT:
 		/*
 		 * Non-legacy call, with larger directory.
@@ -370,7 +372,7 @@ int mpx_set_mm_bd_size(unsigned long bd_
 		 * change sizes.
 		 */
 		if (!is_64bit_mm(mm))
-			return -EINVAL;
+			ret = -EINVAL;
 		/*
 		 * Do not let this be enabled unles we are on
 		 * 5-level hardware *and* have that feature
@@ -379,16 +381,20 @@ int mpx_set_mm_bd_size(unsigned long bd_
 		if (!cpu_feature_enabled(X86_FEATURE_LA57)
 		// FIXME && opted into larger address space
 		)
-			return -EINVAL;
-		mm->context.mpx_bd_shift = MPX_LARGE_BOUNDS_DIR_SHIFT;
-		return 0;
+			ret = -EINVAL;
+		if (ret)
+			break;
+		large_dir = true;
+		break;
 	}
-	return -EINVAL;
+	if (large_dir)
+		(*mpx_directory_info) |= MPX_LARGE_BOUNDS_DIR;
+	return ret;
 }
 
 int mpx_enable_management(unsigned long bd_size)
 {
-	void __user *bd_base = MPX_INVALID_BOUNDS_DIR;
+	void __user *bd_base;
 	struct mm_struct *mm = current->mm;
 	int ret = 0;
 
@@ -404,13 +410,16 @@ int mpx_enable_management(unsigned long
 	 * unmap path; we can just use mm->context.bd_addr instead.
 	 */
 	bd_base = mpx_get_bounds_dir();
+	if (bd_base == MPX_INVALID_BOUNDS_DIR)
+		return -ENXIO;
+
 	down_write(&mm->mmap_sem);
-	ret = mpx_set_mm_bd_size(bd_size);
+	/* Mask out the invalid bit: */
+	mm->context.mpx_directory_info &= ~MPX_INVALID_BOUNDS_DIR;
+	ret = mpx_set_dir_size(bd_size, &mm->context.mpx_directory_info);
 	if (ret)
 		goto out;
-	mm->context.bd_addr = bd_base;
-	if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
-		ret = -ENXIO;
+	mm->context.mpx_directory_info |= bd_base;
 out:
 	up_write(&mm->mmap_sem);
 	return ret;
@@ -424,7 +433,7 @@ int mpx_disable_management(void)
 		return -ENXIO;
 
 	down_write(&mm->mmap_sem);
-	mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
+	mm->context.mpx_directory_info = MPX_INVALID_BOUNDS_DIR;
 	up_write(&mm->mmap_sem);
 	return 0;
 }
@@ -1006,7 +1015,7 @@ static int try_unmap_single_bt(struct mm
 		end = bta_end_vaddr;
 	}
 
-	bde_vaddr = mm->context.bd_addr + mpx_get_bd_entry_offset(mm, start);
+	bde_vaddr = mpx_bounds_dir_addr(mm) + mpx_get_bd_entry_offset(mm, start);
 	ret = get_bt_addr(mm, bde_vaddr, &bt_addr);
 	/*
 	 * No bounds table there, so nothing to unmap.
_


* [RFC][PATCH 6/7] x86, mpx, selftests: Use prctl header instead of magic numbers
  2017-02-01 23:24 [RFC][PATCH 0/7] x86, mpx: Support larger address space (MAWA) (v2) Dave Hansen
                   ` (4 preceding siblings ...)
  2017-02-01 23:24 ` [RFC][PATCH 5/7] x86, mpx: shrink per-mm MPX data Dave Hansen
@ 2017-02-01 23:24 ` Dave Hansen
  2017-02-01 23:24 ` [RFC][PATCH 7/7] x86, mpx: update MPX selftest to test larger bounds dir Dave Hansen
  6 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2017-02-01 23:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, kirill.shutemov, Dave Hansen


I got away with just hard-coding the prctl() numbers in the MPX
selftests.  Include the kernel header so we can just use the
symbolic names.

---

 b/tools/testing/selftests/x86/mpx-mini-test.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff -puN tools/testing/selftests/x86/mpx-mini-test.c~mawa-068-selftests-inc tools/testing/selftests/x86/mpx-mini-test.c
--- a/tools/testing/selftests/x86/mpx-mini-test.c~mawa-068-selftests-inc	2017-02-01 15:12:18.083231385 -0800
+++ b/tools/testing/selftests/x86/mpx-mini-test.c	2017-02-01 15:12:18.087231565 -0800
@@ -40,6 +40,8 @@ int zap_all_every_this_many_mallocs = 10
 #include "mpx-debug.h"
 #include "mpx-mm.h"
 
+#include "../../../../include/uapi/linux/prctl.h"
+
 #ifndef __always_inline
 #define __always_inline inline __attribute__((always_inline)
 #endif
@@ -666,7 +668,7 @@ bool process_specific_init(void)
 	check_clear(dir, size);
 	enable_mpx(dir);
 	check_clear(dir, size);
-	if (prctl(43, 0, 0, 0, 0)) {
+	if (prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0)) {
 		printf("no MPX support\n");
 		abort();
 		return false;
@@ -676,7 +678,7 @@ bool process_specific_init(void)
 
 bool process_specific_finish(void)
 {
-	if (prctl(44)) {
+	if (prctl(PR_MPX_DISABLE_MANAGEMENT)) {
 		printf("no MPX support\n");
 		return false;
 	}
_


* [RFC][PATCH 7/7] x86, mpx: update MPX selftest to test larger bounds dir
  2017-02-01 23:24 [RFC][PATCH 0/7] x86, mpx: Support larger address space (MAWA) (v2) Dave Hansen
                   ` (5 preceding siblings ...)
  2017-02-01 23:24 ` [RFC][PATCH 6/7] x86, mpx, selftests: Use prctl header instead of magic numbers Dave Hansen
@ 2017-02-01 23:24 ` Dave Hansen
  6 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2017-02-01 23:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, kirill.shutemov, Dave Hansen


Since the bounds directory is changing its size, we also need to
update userspace to allocate a larger one.

This adds support to the MPX selftests to detect hardware where
we need a larger bounds directory and attempts to enable MPX
support for the larger directory.

The messiest thing here is that the hardware will not claim to
*have* a larger bounds directory until after we've enabled MPX.
But, that's after we needed to have allocated the bounds
directory.  In other words, we can't use the hardware's bounds
table size enumeration (MAWA) to tell us how large the directory
should be.

---

 b/tools/testing/selftests/x86/mpx-hw.h        |   23 +++
 b/tools/testing/selftests/x86/mpx-mini-test.c |  154 ++++++++++++++++++++------
 2 files changed, 140 insertions(+), 37 deletions(-)

diff -puN tools/testing/selftests/x86/mpx-hw.h~mawa-070-mpx-selftests tools/testing/selftests/x86/mpx-hw.h
--- a/tools/testing/selftests/x86/mpx-hw.h~mawa-070-mpx-selftests	2017-02-01 15:12:18.512250684 -0800
+++ b/tools/testing/selftests/x86/mpx-hw.h	2017-02-01 15:12:18.518250953 -0800
@@ -32,7 +32,8 @@
 #define MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES	32
 #define MPX_BOUNDS_TABLE_SIZE_BYTES		(1ULL << 22) /* 4MB */
 #define MPX_BOUNDS_DIR_ENTRY_SIZE_BYTES		8
-#define MPX_BOUNDS_DIR_SIZE_BYTES		(1ULL << 31) /* 2GB */
+#define MPX_LEGACY_BOUNDS_DIR_SIZE_BYTES	(1ULL << 31) /* 2GB */
+#define MPX_LA57_BOUNDS_DIR_SIZE_BYTES		(1ULL << 40) /* 1TB */
 
 #define MPX_BOUNDS_TABLE_BOTTOM_BIT		3
 #define MPX_BOUNDS_TABLE_TOP_BIT		19
@@ -41,8 +42,23 @@
 
 #endif
 
+/* What size should we allocate for the bounds directory? */
+extern unsigned long long mpx_bounds_dir_alloc_size_bytes(void);
+/*
+ * How large is the hardware currently expecting the bounds
+ * directory to be?
+ *
+ * Note: We have to *tell* the hardware when we want it to use
+ * a larger bounds directory.  Until that point, this will
+ * return the smaller "legacy" value.  But, we *allocate* the
+ * directory before we tell the hardware what size we want
+ * it to be.  So, we need to separate the concepts and have two
+ * different functions.
+ */
+extern unsigned long long mpx_bounds_dir_hw_size_bytes(void);
+
 #define MPX_BOUNDS_DIR_NR_ENTRIES	\
-	(MPX_BOUNDS_DIR_SIZE_BYTES/MPX_BOUNDS_DIR_ENTRY_SIZE_BYTES)
+	(mpx_bounds_dir_hw_size_bytes()/MPX_BOUNDS_DIR_ENTRY_SIZE_BYTES)
 #define MPX_BOUNDS_TABLE_NR_ENTRIES	\
 	(MPX_BOUNDS_TABLE_SIZE_BYTES/MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES)
 
@@ -63,7 +79,8 @@ struct mpx_bt_entry {
 } __attribute__((packed));
 
 struct mpx_bounds_dir {
-	struct mpx_bd_entry entries[MPX_BOUNDS_DIR_NR_ENTRIES];
+	/* This is a variable size array: */
+	struct mpx_bd_entry entries[0];
 } __attribute__((packed));
 
 struct mpx_bounds_table {
diff -puN tools/testing/selftests/x86/mpx-mini-test.c~mawa-070-mpx-selftests tools/testing/selftests/x86/mpx-mini-test.c
--- a/tools/testing/selftests/x86/mpx-mini-test.c~mawa-070-mpx-selftests	2017-02-01 15:12:18.514250773 -0800
+++ b/tools/testing/selftests/x86/mpx-mini-test.c	2017-02-01 15:12:18.518250953 -0800
@@ -462,6 +462,72 @@ static inline void cpuid_count(unsigned
 }
 
 #define XSTATE_CPUID	    0x0000000d
+#define CPUID_MAWA_LEAF	    0x00000007
+#define CPUID_MAWA_SUBLEAF  0x00000000
+#define CPUID_MAWA_BOTTOM_BIT	17
+#define CPUID_MAWA_TOP_BIT	21
+
+/*
+ * On CPUs supporting 5-level paging with a larger virtual address
+ * space, the bounds directory is also larger.  The mechanism to
+ * grow the bounds directory is called "MPX Address-Width Adjust"
+ * (MAWA) and its presence is enumerated via CPUID.
+ */
+static inline int bd_size_shift(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+	unsigned int shift;
+
+	cpuid_count(CPUID_MAWA_LEAF, CPUID_MAWA_SUBLEAF,
+			&eax, &ebx, &ecx, &edx);
+
+	shift = ecx;
+	shift >>= CPUID_MAWA_BOTTOM_BIT;
+	shift &= (1U << (CPUID_MAWA_TOP_BIT - CPUID_MAWA_BOTTOM_BIT + 1)) - 1;
+
+	return shift;
+}
+
+#define CPUID_LA57_LEAF		0x00000007
+#define CPUID_LA57_SUBLEAF	0x00000000
+#define CPUID_LA57_ECX_MASK	(1UL << 16)
+
+/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx) */
+static inline int cpu_supports_lax(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+
+	cpuid_count(CPUID_LA57_LEAF, CPUID_LA57_SUBLEAF,
+			&eax, &ebx, &ecx, &edx);
+
+	return !!(ecx & CPUID_LA57_ECX_MASK);
+}
+
+unsigned long long mpx_bounds_dir_hw_size_bytes(void)
+{
+#ifdef __i386__
+	/* 32-bit has a fixed size directory: */
+	return MPX_BOUNDS_DIR_SIZE_BYTES;
+#else
+	/*
+	 * 64-bit depends on what mode the hardware is in.
+	 * Are we in LA57 mode, and has the kernel set up
+	 * the "MAWA" MSR for us?
+	 */
+	return MPX_LEGACY_BOUNDS_DIR_SIZE_BYTES << bd_size_shift();
+#endif
+}
+
+unsigned long long mpx_bounds_dir_alloc_size_bytes(void)
+{
+#ifdef __i386__
+	return mpx_bounds_dir_hw_size_bytes();
+#else
+	if (cpu_supports_lax())
+		return MPX_LA57_BOUNDS_DIR_SIZE_BYTES;
+	return MPX_LEGACY_BOUNDS_DIR_SIZE_BYTES;
+#endif
+}
 
 /*
  * List of XSAVE features Linux knows about:
@@ -601,7 +667,8 @@ struct mpx_bounds_dir *bounds_dir_ptr;
 
 unsigned long __bd_incore(const char *func, int line)
 {
-	unsigned long ret = nr_incore(bounds_dir_ptr, MPX_BOUNDS_DIR_SIZE_BYTES);
+	unsigned long ret = nr_incore(bounds_dir_ptr,
+				      mpx_bounds_dir_hw_size_bytes());
 	return ret;
 }
 #define bd_incore() __bd_incore(__func__, __LINE__)
@@ -624,43 +691,50 @@ void check_clear_bd(void)
 	check_clear(bounds_dir_ptr, 2UL << 30);
 }
 
-#define USE_MALLOC_FOR_BOUNDS_DIR 1
-bool process_specific_init(void)
+void *alloc_bounds_directory(unsigned long long size)
 {
-	unsigned long size;
-	unsigned long *dir;
+	/*
+	 * This can make debugging easier because the
+	 * address calculations are simpler:
+	 */
+	void *hint_addr = NULL; //0x200000000000;
 	/* Guarantee we have the space to align it, add padding: */
 	unsigned long pad = getpagesize();
+	unsigned long *dir;
+	int flags;
 
-	size = 2UL << 30; /* 2GB */
-	if (sizeof(unsigned long) == 4)
-		size = 4UL << 20; /* 4MB */
-	dprintf1("trying to allocate %ld MB bounds directory\n", (size >> 20));
-
-	if (USE_MALLOC_FOR_BOUNDS_DIR) {
-		unsigned long _dir;
-
-		dir = malloc(size + pad);
-		assert(dir);
-		_dir = (unsigned long)dir;
-		_dir += 0xfffUL;
-		_dir &= ~0xfffUL;
-		dir = (void *)_dir;
-	} else {
-		/*
-		 * This makes debugging easier because the address
-		 * calculations are simpler:
-		 */
-		dir = mmap((void *)0x200000000000, size + pad,
-				PROT_READ|PROT_WRITE,
-				MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
-		if (dir == (void *)-1) {
-			perror("unable to allocate bounds directory");
-			abort();
-		}
-		check_clear(dir, size);
+	/*
+	 * The bounds directory can be very large and cause us
+	 * to exceed overcommit limits.  Use MAP_NORESERVE to
+	 * avoid the overcommit limits.
+	 */
+	flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE;
+	dir = mmap(hint_addr, size + pad , PROT_READ|PROT_WRITE, flags, -1, 0);
+	if (dir == (void *)-1) {
+		perror("unable to allocate bounds directory");
+		abort();
 	}
-	bounds_dir_ptr = (void *)dir;
+	check_clear(dir, size);
+	return dir;
+}
+
+#define USE_MALLOC_FOR_BOUNDS_DIR 0
+bool process_specific_init(void)
+{
+	unsigned long long size;
+	unsigned long *dir;
+	int err;
+
+	size = mpx_bounds_dir_alloc_size_bytes();
+	dprintf1("trying to allocate %lld MB bounds directory\n", (size >> 20));
+
+	dir = alloc_bounds_directory(size);
+	/*
+	 * The directory is a large anonymous allocation, so it
+	 * looks like an ideal place to use transparent large pages.
+	 * But, in practice, it's usually sparsely populated and
+	 * will waste lots of memory.  Turn THP off:
+	 */
 	madvise(bounds_dir_ptr, size, MADV_NOHUGEPAGE);
 	bd_incore();
 	dprintf1("bounds directory: 0x%p -> 0x%p\n", bounds_dir_ptr,
@@ -668,7 +742,19 @@ bool process_specific_init(void)
 	check_clear(dir, size);
 	enable_mpx(dir);
 	check_clear(dir, size);
-	if (prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0)) {
+
+	/* Try to tell newer kernels the size of the directory: */
+	err = prctl(PR_MPX_ENABLE_MANAGEMENT, size, 0, 0, 0);
+	/*
+	 * But also handle older kernels that need argument 2 to be 0.
+	 * If the hardware supports larger bounds directories, we
+	 * allocated a large one in anticipation of needing it. But,
+	 * the kernel does not support it, so will use only a
+	 * small portion (1/512th) of it in these tests.
+	 */
+	if (err)
+		err = prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0);
+	if (err) {
 		printf("no MPX support\n");
 		abort();
 		return false;
_
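
The split between "what the hardware currently walks" and "what userspace must allocate" described in the changelog above can be sketched without executing CPUID at all. This is a minimal sketch, not the selftest code: it takes a raw ECX value from CPUID leaf 7, subleaf 0 as a parameter, and the helper names (extract_bits, bd_hw_size_bytes, bd_alloc_size_bytes) are hypothetical. The bit positions (LA57 at bit 16, MAWA at bits 21:17) and the two sizes are the ones used in the patch. The range is treated as inclusive, so bits 21:17 form a 5-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Positions as used in the patch: CPUID.(EAX=7,ECX=0):ECX */
#define MAWA_BOTTOM_BIT		17
#define MAWA_TOP_BIT		21
#define LA57_ECX_MASK		(1U << 16)

#define LEGACY_BD_SIZE_BYTES	(1ULL << 31) /* 2GB */
#define LA57_BD_SIZE_BYTES	(1ULL << 40) /* 1TB */

/* Hypothetical helper: extract the inclusive bit range [hi:lo]. */
static inline uint32_t extract_bits(uint32_t reg, unsigned int lo,
				    unsigned int hi)
{
	unsigned int width;

	assert(lo <= hi && hi < 32);
	width = hi - lo + 1;
	if (width == 32)
		return reg;
	return (reg >> lo) & ((1U << width) - 1);
}

/*
 * Size the hardware walks right now.  The MAWA field stays 0
 * until the larger directory has been enabled, so this reports
 * the legacy size at allocation time.
 */
static inline unsigned long long bd_hw_size_bytes(uint32_t ecx)
{
	return LEGACY_BD_SIZE_BYTES <<
		extract_bits(ecx, MAWA_BOTTOM_BIT, MAWA_TOP_BIT);
}

/*
 * Size to allocate up front.  This can only look at the LA57
 * feature bit, since MAWA is not yet set when we allocate.
 */
static inline unsigned long long bd_alloc_size_bytes(uint32_t ecx)
{
	if (ecx & LA57_ECX_MASK)
		return LA57_BD_SIZE_BYTES;
	return LEGACY_BD_SIZE_BYTES;
}
```

With LA57 set and MAWA still 0, the two functions disagree (1TB allocated, 2GB walked), which is the situation the prctl() fallback in process_specific_init() has to tolerate.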


* Re: [RFC][PATCH 2/7] x86, mpx: update MPX to grok larger bounds tables
  2017-02-01 23:24 ` [RFC][PATCH 2/7] x86, mpx: update MPX to grok larger bounds tables Dave Hansen
@ 2017-02-12 19:05   ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2017-02-12 19:05 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, x86, kirill.shutemov

On Wed, 1 Feb 2017, Dave Hansen wrote:
>  /*
> - * The upper 28 bits [47:20] of the virtual address in 64-bit
> - * are used to index into bounds directory (BD).
> + * The uppermost bits [56:20] of the virtual address in 64-bit
> + * are used to index into bounds directory (BD).  On processors
> + * with support for smaller virtual address space size, the "56"
> + * is obviously smaller.

 ... space size, the upper limit is adjusted accordingly.

Or something like that.

> +/*
> + * Note: size of tables on 64-bit is not constant, so we have no
> + * fixed definition for MPX_BD_NR_ENTRIES_64.
> + *
> + * The 5-Level Paging Whitepaper says:  "A bound directory
> + * comprises 2^(28+MAWA) 64-bit entries."  Since MAWA=0 in
> + * legacy mode:
> + */
> +#define MPX_BD_LEGACY_NR_ENTRIES_64	(1UL<<28)

(1UL << 28) please

>  
> +static inline int mpx_bd_size_shift(struct mm_struct *mm)
> +{
> +	return mm->context.mpx_bd_shift;
> +}

Do we really need that helper?

>  static inline unsigned long mpx_bd_size_bytes(struct mm_struct *mm)
>  {
> -	if (is_64bit_mm(mm))
> -		return MPX_BD_SIZE_BYTES_64;
> -	else
> +	if (!is_64bit_mm(mm))
>  		return MPX_BD_SIZE_BYTES_32;
> +
> +	/*
> +	 * The bounds directory grows with the address space size.
> +	 * The "legacy" shift is 0.
> +	 */
> +	return MPX_BD_BASE_SIZE_BYTES_64 << mpx_bd_shift_shift(mm);

shift_shift. I wonder how that compiles...

Looks good otherwise.

Thanks,

	tglx


* Re: [RFC][PATCH 3/7] x86, mpx: extend MPX prctl() to pass in size of bounds directory
  2017-02-01 23:24 ` [RFC][PATCH 3/7] x86, mpx: extend MPX prctl() to pass in size of bounds directory Dave Hansen
@ 2017-02-12 19:15   ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2017-02-12 19:15 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, x86, kirill.shutemov

On Wed, 1 Feb 2017, Dave Hansen wrote:
> FIXME: we also need to ensure that we check the current state of the
> larger address space opt-in.  If we've opted in to larger address spaces
> we can not allow a small bounds directory to be used.  Also, if we've
> not opted in, we can not allow the larger bounds directory to be used.
> This can be fixed once the in-kernel API for opting in/out is settled.

Ok.

>  /* Register/unregister a process' MPX related resource */
> -#define MPX_ENABLE_MANAGEMENT()	mpx_enable_management()
> +#define MPX_ENABLE_MANAGEMENT(bd_size)	mpx_enable_management(bd_size)
>  #define MPX_DISABLE_MANAGEMENT()	mpx_disable_management()

Please add another tab before mpx_disable so both are aligned.

> -int mpx_enable_management(void)
> +int mpx_set_mm_bd_size(unsigned long bd_size)

static ?

> +{
> +	struct mm_struct *mm = current->mm;
> +
> +	switch ((unsigned long long)bd_size) {
> +	case 0:
> +		/* Legacy call to prctl(): */
> +		mm->context.mpx_bd_shift = 0;
> +		return 0;
> +	case MPX_BD_SIZE_BYTES_32:
> +		/* 32-bit, legacy-sized bounds directory: */
> +		if (is_64bit_mm(mm))
> +			return -EINVAL;
> +		mm->context.mpx_bd_shift = 0;
> +		return 0;
> +	case MPX_BD_BASE_SIZE_BYTES_64:
> +		/* 64-bit, legacy-sized bounds directory: */
> +		if (!is_64bit_mm(mm)
> +		// FIXME && ! opted-in to larger address space

Hmm. Confused. This is where we enable MPX and decode the requested address
space. How can an opt-in already have happened at this point?

If that's an enable call for an already enabled thing, then we should catch
that at the call site, I think.

> +	case MPX_BD_BASE_SIZE_BYTES_64 << MPX_LARGE_BOUNDS_DIR_SHIFT:
> +		/*
> +		 * Non-legacy call, with larger directory.
> +		 * Note that there is no 32-bit equivalent for
> +		 * this case since its address space does not
> +		 * change sizes.
> +		 */
> +		if (!is_64bit_mm(mm))
> +			return -EINVAL;
> +		/*
> +		 * Do not let this be enabled unles we are on
> +		 * 5-level hardware *and* have that feature
> +		 * enabled. FIXME: need runtime check

Runtime check? Isn't the feature bit enough?

Thanks,

	tglx


* Re: [RFC][PATCH 4/7] x86, mpx: context-switch new MPX address size MSR
  2017-02-01 23:24 ` [RFC][PATCH 4/7] x86, mpx: context-switch new MPX address size MSR Dave Hansen
@ 2017-02-12 19:37   ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2017-02-12 19:37 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, x86, kirill.shutemov

On Wed, 1 Feb 2017, Dave Hansen wrote:
> +/*
> + * The MPX tables change sizes based on the size of the virtual
> + * (aka. linear) address space.  There is an MSR to tell the CPU
> + * whether we want the legacy-style ones or the larger ones when
> + * we are running with an eXtended virtual address space.
> + */
> +static inline void switch_mpx_bd(struct mm_struct *prev, struct mm_struct *next)
> +{
> +	/*
> +	 * Note: there is one and only one bit in use in the MSR
> +	 * at this time, so we do not have to be concerned with
> +	 * preserving any of the other bits.  Just write 0 or 1.
> +	 */
> +	u32 IA32_MPX_LAX_ENABLE_MASK = 0x00000001;
> +
> +	/*
> +	 * Avoid the MSR on CPUs without MPX, obviously:
> +	 */
> +	if (!cpu_feature_enabled(X86_FEATURE_MPX))
> +		return;
> +	/*
> +	 * FIXME: do we want a check here for the 5-level paging
> +	 * CR4 bit or CPUID bit, or is the mawa check below OK?
> +	 * It's not obvious what would be the fastest or if it
> +	 * matters.
> +	 */

Well, you could use a static key which is enabled when both 5-level paging
and MPX are enabled.

> +	/*
> +	 * Avoid the relatively costly MSR if we are not changing
> +	 * MAWA state.  All processes not using MPX will have a
> +	 * mpx_mawa_shift()=0, so we do not need to check
> +	 * separately for whether MPX management is enabled.
> +	 */
> +	if (likely(mpx_bd_size_shift(prev) == mpx_bd_size_shift(next)))
> +		return;

So this unconditionally switches back if the previous task was using the
large tables, even if the next task is not using MPX at all. It's probably a
non-issue.

Thanks,

	tglx
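
The behaviour discussed above can be modelled in userspace to make the cases concrete. This is only a sketch under stated assumptions: struct task_model and struct cpu_model are hypothetical stand-ins for the mm and the MSR, the shift value 9 is assumed, and no real MSR is touched. The comparison of the two shifts mirrors the quoted likely() check.

```c
#include <stdbool.h>

#define LARGE_DIR_SHIFT	9 /* assumed value, for illustration only */

/* Hypothetical stand-ins for the mm and for the MSR state. */
struct task_model {
	bool mpx_managed;
	bool large_dir;
};

struct cpu_model {
	unsigned int msr_value;
	unsigned int msr_writes;
};

/* Tasks not using MPX always report a shift of 0. */
static int bd_size_shift_model(const struct task_model *t)
{
	if (!t->mpx_managed)
		return 0;
	return t->large_dir ? LARGE_DIR_SHIFT : 0;
}

/* Write the (modelled) MSR only when the shift actually changes. */
static void switch_mpx_bd_model(struct cpu_model *cpu,
				const struct task_model *prev,
				const struct task_model *next)
{
	if (bd_size_shift_model(prev) == bd_size_shift_model(next))
		return;
	cpu->msr_value = bd_size_shift_model(next) ? 1 : 0;
	cpu->msr_writes++;
}
```

In this model a switch from a large-directory task to a task that does not use MPX at all still costs one write, which is the case pointed out in the review.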


* Re: [RFC][PATCH 5/7] x86, mpx: shrink per-mm MPX data
  2017-02-01 23:24 ` [RFC][PATCH 5/7] x86, mpx: shrink per-mm MPX data Dave Hansen
@ 2017-02-12 22:44   ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2017-02-12 22:44 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, x86, kirill.shutemov

On Wed, 1 Feb 2017, Dave Hansen wrote:
>  /*
> - * NULL is theoretically a valid place to put the bounds
> - * directory, so point this at an invalid address.
> + * These get stored into mm_context_t->mpx_directory_info.
> + * We could theoretically use bits 0 and 1, but those are
> + * used in the BNDCFGU register that also holds the bounds
> + * directory pointer.  To avoid confusion, use different bits.
>   */
> -#define MPX_INVALID_BOUNDS_DIR	((void __user *)-1)
> +#define MPX_INVALID_BOUNDS_DIR	(1UL<<2)
> +#define MPX_LARGE_BOUNDS_DIR	(1UL<<3)

Please keep them tabular aligned

>  static inline int mpx_bd_size_shift(struct mm_struct *mm)
>  {
> -	return mm->context.mpx_bd_shift;
> +	if (!kernel_managing_mpx_tables(mm))
> +		return 0;
> +	if (mm->context.mpx_directory_info & MPX_LARGE_BOUNDS_DIR)
> +		return MPX_LARGE_BOUNDS_DIR_SHIFT;
> +	return 0;

So now the inline makes sense.

> -int mpx_set_mm_bd_size(unsigned long bd_size)
> +int mpx_set_dir_size(unsigned long bd_size, unsigned long *mpx_directory_info)
>  {
>  	struct mm_struct *mm = current->mm;
> +	int ret = 0;
> +	bool large_dir = false;

>  	struct mm_struct *mm = current->mm;
> +	bool large_dir = false;
> +	int ret = 0;

Please

>  
>  	switch ((unsigned long long)bd_size) {
>  	case 0:
> -		/* Legacy call to prctl(): */
> -		mm->context.mpx_bd_shift = 0;
> -		return 0;
> +		/* Legacy call to prctl() */
> +		break;
>  	case MPX_BD_SIZE_BYTES_32:
>  		/* 32-bit, legacy-sized bounds directory: */
> -		if (is_64bit_mm(mm))
> -			return -EINVAL;
> -		mm->context.mpx_bd_shift = 0;
> -		return 0;
> +		if (is_64bit_mm(mm)) {
> +			ret = -EINVAL;
> +			break;

Why do you want to break in the error case instead of just returning the
error? In case of error it really makes no sense to fiddle with the large
directory bit in the directory_info.

> +		}
> +		ret = 0;

It's already 0

> +		break;
>  	case MPX_BD_BASE_SIZE_BYTES_64:
>  		/* 64-bit, legacy-sized bounds directory: */
>  		if (!is_64bit_mm(mm)
>  		// FIXME && ! opted-in to larger address space
>  		)
> -			return -EINVAL;
> -		mm->context.mpx_bd_shift = 0;
> -		return 0;
> +			ret = -EINVAL;

See above

> +		break;
>  	case MPX_BD_BASE_SIZE_BYTES_64 << MPX_LARGE_BOUNDS_DIR_SHIFT:
>  		/*
>  		 * Non-legacy call, with larger directory.
> @@ -370,7 +372,7 @@ int mpx_set_mm_bd_size(unsigned long bd_
>  		 * change sizes.
>  		 */
>  		if (!is_64bit_mm(mm))
> -			return -EINVAL;
> +			ret = -EINVAL;

Ditto

>  		/*
>  		 * Do not let this be enabled unles we are on
>  		 * 5-level hardware *and* have that feature
> @@ -379,16 +381,20 @@ int mpx_set_mm_bd_size(unsigned long bd_
>  		if (!cpu_feature_enabled(X86_FEATURE_LA57)
>  		// FIXME && opted into larger address space
>  		)
> -			return -EINVAL;
> -		mm->context.mpx_bd_shift = MPX_LARGE_BOUNDS_DIR_SHIFT;
> -		return 0;
> +			ret = -EINVAL;
> +		if (ret)
> +			break;

This is outright silly.

> +		large_dir = true;
> +		break;
>  	}
> -	return -EINVAL;
> +	if (large_dir)
> +		(*mpx_directory_info) |= MPX_LARGE_BOUNDS_DIR;
> +	return ret;
>  }
>  
>  int mpx_enable_management(unsigned long bd_size)
>  {
> -	void __user *bd_base = MPX_INVALID_BOUNDS_DIR;
> +	void __user *bd_base;
>  	struct mm_struct *mm = current->mm;
>  	int ret = 0;
>  
> @@ -404,13 +410,16 @@ int mpx_enable_management(unsigned long
>  	 * unmap path; we can just use mm->context.bd_addr instead.
>  	 */
>  	bd_base = mpx_get_bounds_dir();
> +	if (bd_base == MPX_INVALID_BOUNDS_DIR)
> +		return -ENXIO;
> +
>  	down_write(&mm->mmap_sem);
> -	ret = mpx_set_mm_bd_size(bd_size);
> +	/* Mask out the invalid bit: */
> +	mm->context.mpx_directory_info &= ~MPX_INVALID_BOUNDS_DIR;

The handling of that bit is really confusing

> +	ret = mpx_set_dir_size(bd_size, &mm->context.mpx_directory_info);
>  	if (ret)
>  		goto out;

And what makes the thing invalid again in case of ret != 0?

> -	mm->context.bd_addr = bd_base;
> -	if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
> -		ret = -ENXIO;
> +	mm->context.mpx_directory_info |= bd_base;
>  out:
>  	up_write(&mm->mmap_sem);
>  	return ret;

Thanks,

	tglx
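
One possible shape for the early-return style asked for above, written as a userspace model rather than kernel code. Everything here is an assumption for illustration: struct mm_model, the helper names, the 4MB 32-bit directory size and the shift of 9 are hypothetical, and unknown sizes are rejected with -EINVAL as in the version quoted from patch 3/7. The point of the sketch is that directory_info is only written once validation has succeeded, so the invalid bit never needs to be cleared and restored.

```c
#include <errno.h>
#include <stdbool.h>

#define DIR_INVALID	(1UL << 2)
#define DIR_LARGE	(1UL << 3)

#define BD_SIZE_32	(1ULL << 22) /* 4MB, assumed */
#define BD_SIZE_64	(1ULL << 31) /* 2GB */
#define LARGE_SHIFT	9            /* assumed */

/* Hypothetical stand-in for the relevant mm state. */
struct mm_model {
	bool is_64bit;
	bool la57;
	unsigned long directory_info;
};

/* Validate bd_size; return the flag bits via *flags on success. */
static int dir_flags_for_size(const struct mm_model *mm,
			      unsigned long long bd_size,
			      unsigned long *flags)
{
	*flags = 0;

	switch (bd_size) {
	case 0:
		/* Legacy call to prctl() */
		return 0;
	case BD_SIZE_32:
		return mm->is_64bit ? -EINVAL : 0;
	case BD_SIZE_64:
		return mm->is_64bit ? 0 : -EINVAL;
	case BD_SIZE_64 << LARGE_SHIFT:
		if (!mm->is_64bit || !mm->la57)
			return -EINVAL;
		*flags = DIR_LARGE;
		return 0;
	}
	return -EINVAL;
}

/* bd_base is assumed page aligned, so its low bits are free. */
static int enable_management_model(struct mm_model *mm,
				   unsigned long bd_base,
				   unsigned long long bd_size)
{
	unsigned long flags;
	int ret;

	ret = dir_flags_for_size(mm, bd_size, &flags);
	if (ret)
		return ret; /* directory_info left untouched */

	mm->directory_info = bd_base | flags;
	return 0;
}
```

A failed call leaves directory_info exactly as it was, so a previously invalid directory stays marked invalid without any extra handling in the error path.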


