* [RFC][PATCH 0/4] x86, mpx: Support larger address space (MAWA)
@ 2017-01-26 22:40 ` Dave Hansen
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Hansen @ 2017-01-26 22:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, Dave Hansen

Kirill is chugging right along getting his 5-level paging[1] patch set
ready to be merged.  I figured I'd share an early draft of the MPX
support that will go along with it.

Background: there is a lot more detail about what bounds tables are in
the changelog for fe3d197f843.  But, basically MPX bounds tables help
us to store the ranges to which a pointer is allowed to point.  The
tables are walked by hardware and they are indexed by the virtual
address of the pointer being checked.

A larger virtual address space (from 5-level paging) means that we
need larger tables.  5-level paging hardware includes a feature called
MPX Address-Width Adjust (MAWA) that grows the bounds tables so they
can address the new address space.  MAWA is controlled independently
from the paging mode (via an MSR) so that old MPX binaries can run on
new hardware and kernels supporting 5-level paging.

But, since userspace is responsible for allocating the table that is
growing (the directory), we need to ensure that userspace and the
kernel agree about the size of these tables, and that the kernel can
set the MSR appropriately.

These are not quite ready to get applied anywhere, but I don't expect
the basics to change unless folks have big problems with this.  The
only big remaining piece of work is to update the MPX selftest code.

Dave Hansen (4):
      x86, mpx: introduce per-mm MPX table size tracking
      x86, mpx: update MPX to grok larger bounds tables
      x86, mpx: extend MPX prctl() to pass in size of bounds directory
      x86, mpx: context-switch new MPX address size MSR

 arch/x86/include/asm/mmu.h       |  1 +
 arch/x86/include/asm/mpx.h       | 41 ++++++++++++++---
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/include/asm/processor.h |  6 +--
 arch/x86/mm/mpx.c                | 79 ++++++++++++++++++++++++++++----
 arch/x86/mm/pgtable.c            |  2 +-
 arch/x86/mm/tlb.c                | 42 +++++++++++++++++
 kernel/sys.c                     |  6 +--
 8 files changed, 155 insertions(+), 23 deletions(-)

1. https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf

* [RFC][PATCH 1/4] x86, mpx: introduce per-mm MPX table size tracking
  2017-01-26 22:40 ` Dave Hansen
@ 2017-01-26 22:40   ` Dave Hansen
  -1 siblings, 0 replies; 16+ messages in thread
From: Dave Hansen @ 2017-01-26 22:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, Dave Hansen


Larger address spaces mean larger MPX bounds table sizes.  This
tracks which size tables we are using.

"MAWA" is what the hardware documentation calls this feature:
MPX Address-Width Adjust.  We will carry that nomenclature throughout
this series.

The new field will be optimized and get packed into 'bd_addr' in a later
patch.  But, leave it separate for now to make the series simpler.

---

 b/arch/x86/include/asm/mmu.h |    1 +
 b/arch/x86/include/asm/mpx.h |    9 +++++++++
 2 files changed, 10 insertions(+)

diff -puN arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa arch/x86/include/asm/mmu.h
--- a/arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa	2017-01-26 14:31:32.643673297 -0800
+++ b/arch/x86/include/asm/mmu.h	2017-01-26 14:31:32.647673476 -0800
@@ -34,6 +34,7 @@ typedef struct {
 #ifdef CONFIG_X86_INTEL_MPX
 	/* address of the bounds directory */
 	void __user *bd_addr;
+	int mpx_mawa;
 #endif
 } mm_context_t;
 
diff -puN arch/x86/include/asm/mpx.h~mawa-020-mmu_context-mawa arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-020-mmu_context-mawa	2017-01-26 14:31:32.644673342 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-01-26 14:31:32.648673521 -0800
@@ -68,6 +68,15 @@ static inline void mpx_mm_init(struct mm
 	 * directory, so point this at an invalid address.
 	 */
 	mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
+	/*
+	 * All processes start out in "legacy" MPX mode with
+	 * MAWA=0.
+	 */
+	mm->context.mpx_mawa = 0;
+}
+static inline int mpx_mawa_shift(struct mm_struct *mm)
+{
+	return mm->context.mpx_mawa;
 }
 void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long start, unsigned long end);
_

* [RFC][PATCH 2/4] x86, mpx: update MPX to grok larger bounds tables
  2017-01-26 22:40 ` Dave Hansen
@ 2017-01-26 22:40   ` Dave Hansen
  -1 siblings, 0 replies; 16+ messages in thread
From: Dave Hansen @ 2017-01-26 22:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, Dave Hansen


As mentioned repeatedly, larger address spaces mean larger MPX bounds
tables.  The MPX code in the kernel needs to walk these tables in order
to populate them on demand as well as unmap them when memory is freed.

This updates the bounds table walking code to understand how to walk
the larger table size.  It uses the new per-mm "MAWA" value to determine
which format to use.

---

 b/arch/x86/include/asm/mpx.h |   27 +++++++++++++++++++++------
 b/arch/x86/mm/mpx.c          |   25 +++++++++++++++++--------
 2 files changed, 38 insertions(+), 14 deletions(-)

diff -puN arch/x86/include/asm/mpx.h~mawa-030-bounds-directory-sizes arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-030-bounds-directory-sizes	2017-01-26 14:31:33.098693731 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-01-26 14:31:33.103693956 -0800
@@ -14,15 +14,30 @@
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
 
 /*
- * The upper 28 bits [47:20] of the virtual address in 64-bit
- * are used to index into bounds directory (BD).
+ * The uppermost bits [56:20] of the virtual address in 64-bit
+ * are used to index into bounds directory (BD).  On processors
+ * with support for smaller virtual address space size, the "56"
+ * is obviously smaller.
  *
- * The directory is 2G (2^31) in size, and with 8-byte entries
- * it has 2^28 entries.
+ * When using 47-bit virtual addresses, the directory is 2G
+ * (2^31) bytes in size, and with 8-byte entries it has 2^28
+ * entries.  With 56-bit virtual addresses, it goes to 1T in size
+ * and has 2^37 entries.
+ *
+ * Needs to be ULL so we can use this in 32-bit kernels without
+ * warnings.
  */
-#define MPX_BD_SIZE_BYTES_64	(1UL<<31)
+#define MPX_BD_BASE_SIZE_BYTES_64	(1ULL<<31)
 #define MPX_BD_ENTRY_BYTES_64	8
-#define MPX_BD_NR_ENTRIES_64	(MPX_BD_SIZE_BYTES_64/MPX_BD_ENTRY_BYTES_64)
+/*
+ * Note: size of tables on 64-bit is not constant, so we have no
+ * fixed definition for MPX_BD_NR_ENTRIES_64.
+ *
+ * The 5-Level Paging Whitepaper says:
+ * A bound directory comprises 2^(28+MAWA) 64-bit entries.
+ * MAWA=0 in the legacy mode, so:
+ */
+#define MPX_BD_LEGACY_NR_ENTRIES_64	(1UL<<28)
 
 /*
  * The 32-bit directory is 4MB (2^22) in size, and with 4-byte
diff -puN arch/x86/mm/mpx.c~mawa-030-bounds-directory-sizes arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mawa-030-bounds-directory-sizes	2017-01-26 14:31:33.099693776 -0800
+++ b/arch/x86/mm/mpx.c	2017-01-26 14:31:33.103693956 -0800
@@ -22,10 +22,14 @@
 
 static inline unsigned long mpx_bd_size_bytes(struct mm_struct *mm)
 {
-	if (is_64bit_mm(mm))
-		return MPX_BD_SIZE_BYTES_64;
-	else
+	if (!is_64bit_mm(mm))
 		return MPX_BD_SIZE_BYTES_32;
+
+	/*
+	 * The bounds directory grows with the MAWA value.  The
+	 * "legacy" shift is 0.
+	 */
+	return MPX_BD_BASE_SIZE_BYTES_64 << mpx_mawa_shift(mm);
 }
 
 static inline unsigned long mpx_bt_size_bytes(struct mm_struct *mm)
@@ -724,6 +728,7 @@ static inline unsigned long bd_entry_vir
 {
 	unsigned long long virt_space;
 	unsigned long long GB = (1ULL << 30);
+	unsigned long legacy_64bit_vaddr_bits = 48;
 
 	/*
 	 * This covers 32-bit emulation as well as 32-bit kernels
@@ -733,12 +738,16 @@ static inline unsigned long bd_entry_vir
 		return (4ULL * GB) / MPX_BD_NR_ENTRIES_32;
 
 	/*
-	 * 'x86_virt_bits' returns what the hardware is capable
-	 * of, and returns the full >32-bit address space when
-	 * running 32-bit kernels on 64-bit hardware.
+	 * With 5-level paging, the virtual address space size
+	 * gets bigger.  A bounds directory entry still points to
+	 * a single bounds table and the *tables* stay the same
+	 * size.  Thus, the address space that a directory entry
+	 * covers does not change based on the paging mode (or
+	 * MAWA value).  Just use the legacy calculation despite
+	 * the MAWA mode.
 	 */
-	virt_space = (1ULL << boot_cpu_data.x86_virt_bits);
-	return virt_space / MPX_BD_NR_ENTRIES_64;
+	virt_space = (1ULL << legacy_64bit_vaddr_bits);
+	return virt_space / MPX_BD_LEGACY_NR_ENTRIES_64;
 }
 
 /*
_

* [RFC][PATCH 3/4] x86, mpx: extend MPX prctl() to pass in size of bounds directory
  2017-01-26 22:40 ` Dave Hansen
@ 2017-01-26 22:40   ` Dave Hansen
  -1 siblings, 0 replies; 16+ messages in thread
From: Dave Hansen @ 2017-01-26 22:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, Dave Hansen


The MPX bounds tables are indexed by virtual address.  A larger virtual
address space means that we need larger tables.  But, we need to ensure
that userspace and the kernel agree about the size of these tables.

To do this, we require that userspace pass in the size of the tables
if it wants a non-legacy size.  It does this via a previously unused
(required to be 0) argument to the PR_MPX_ENABLE_MANAGEMENT prctl().

This way, the kernel can make sure that the size of the tables is
consistent with the size of the address space and can return an error
if there is a mismatch.

There are essentially 3 table sizes that matter:
1. 32-bit table sized for a 32-bit address space
2. 64-bit table sized for a 48-bit address space
3. 64-bit table sized for a 57-bit address space

We cover all three of those cases.

FIXME: we also need to ensure that we check the current state of the
larger address space opt-in.  If we've opted in to larger address spaces
we can not allow a small bounds directory to be used.  Also, if we've
not opted in, we can not allow the larger bounds directory to be used.

---

 b/arch/x86/include/asm/mpx.h       |    5 +++
 b/arch/x86/include/asm/processor.h |    6 ++--
 b/arch/x86/mm/mpx.c                |   54 +++++++++++++++++++++++++++++++++++--
 b/arch/x86/mm/pgtable.c            |    2 -
 b/kernel/sys.c                     |    6 ++--
 5 files changed, 64 insertions(+), 9 deletions(-)

diff -puN arch/x86/include/asm/mpx.h~mawa-040-prctl-set-mawa arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mawa-040-prctl-set-mawa	2017-01-26 14:31:33.564714660 -0800
+++ b/arch/x86/include/asm/mpx.h	2017-01-26 14:31:33.574715109 -0800
@@ -40,6 +40,11 @@
 #define MPX_BD_LEGACY_NR_ENTRIES_64	(1UL<<28)
 
 /*
+ * We only support one value for MAWA
+ */
+#define MPX_MAWA_VALUE		9
+
+/*
  * The 32-bit directory is 4MB (2^22) in size, and with 4-byte
  * entries it has 2^20 entries.
  */
diff -puN arch/x86/include/asm/processor.h~mawa-040-prctl-set-mawa arch/x86/include/asm/processor.h
--- a/arch/x86/include/asm/processor.h~mawa-040-prctl-set-mawa	2017-01-26 14:31:33.566714750 -0800
+++ b/arch/x86/include/asm/processor.h	2017-01-26 14:31:33.575715154 -0800
@@ -863,14 +863,14 @@ extern int get_tsc_mode(unsigned long ad
 extern int set_tsc_mode(unsigned int val);
 
 /* Register/unregister a process' MPX related resource */
-#define MPX_ENABLE_MANAGEMENT()	mpx_enable_management()
+#define MPX_ENABLE_MANAGEMENT(bd_size)	mpx_enable_management(bd_size)
 #define MPX_DISABLE_MANAGEMENT()	mpx_disable_management()
 
 #ifdef CONFIG_X86_INTEL_MPX
-extern int mpx_enable_management(void);
+extern int mpx_enable_management(unsigned long bd_size);
 extern int mpx_disable_management(void);
 #else
-static inline int mpx_enable_management(void)
+static inline int mpx_enable_management(unsigned long bd_size)
 {
 	return -EINVAL;
 }
diff -puN arch/x86/mm/mpx.c~mawa-040-prctl-set-mawa arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mawa-040-prctl-set-mawa	2017-01-26 14:31:33.567714795 -0800
+++ b/arch/x86/mm/mpx.c	2017-01-26 14:31:33.575715154 -0800
@@ -339,7 +339,54 @@ static __user void *mpx_get_bounds_dir(v
 		(bndcsr->bndcfgu & MPX_BNDCFG_ADDR_MASK);
 }
 
-int mpx_enable_management(void)
+int mpx_set_mm_bd_size(unsigned long bd_size)
+{
+	struct mm_struct *mm = current->mm;
+
+	switch ((unsigned long long)bd_size) {
+	case 0:
+		/* Legacy call to prctl(): */
+		mm->context.mpx_mawa = 0;
+		return 0;
+	case MPX_BD_SIZE_BYTES_32:
+		/* 32-bit, legacy-sized bounds directory: */
+		if (is_64bit_mm(mm))
+			return -EINVAL;
+		mm->context.mpx_mawa = 0;
+		return 0;
+	case MPX_BD_BASE_SIZE_BYTES_64:
+		/* 64-bit, legacy-sized bounds directory: */
+		if (!is_64bit_mm(mm)
+		// FIXME && ! opted-in to larger address space
+		)
+			return -EINVAL;
+		mm->context.mpx_mawa = 0;
+		return 0;
+	case MPX_BD_BASE_SIZE_BYTES_64 << MPX_MAWA_VALUE:
+		/*
+		 * Non-legacy call, with larger directory.
+		 * Note that there is no 32-bit equivalent for
+		 * this case since its address space does not
+		 * change sizes.
+		 */
+		if (!is_64bit_mm(mm))
+			return -EINVAL;
+		/*
+		 * Do not let this be enabled unless we are on
+		 * 5-level hardware *and* have that feature
+		 * enabled. FIXME: need runtime check
+		 */
+		if (!cpu_feature_enabled(X86_FEATURE_LA57)
+		// FIXME && opted into larger address space
+		)
+			return -EINVAL;
+		mm->context.mpx_mawa = MPX_MAWA_VALUE;
+		return 0;
+	}
+	return -EINVAL;
+}
+
+int mpx_enable_management(unsigned long bd_size)
 {
 	void __user *bd_base = MPX_INVALID_BOUNDS_DIR;
 	struct mm_struct *mm = current->mm;
@@ -358,10 +405,13 @@ int mpx_enable_management(void)
 	 */
 	bd_base = mpx_get_bounds_dir();
 	down_write(&mm->mmap_sem);
+	ret = mpx_set_mm_bd_size(bd_size);
+	if (ret)
+		goto out;
 	mm->context.bd_addr = bd_base;
 	if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
 		ret = -ENXIO;
-
+out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
diff -puN arch/x86/mm/pgtable.c~mawa-040-prctl-set-mawa arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c~mawa-040-prctl-set-mawa	2017-01-26 14:31:33.569714885 -0800
+++ b/arch/x86/mm/pgtable.c	2017-01-26 14:31:33.575715154 -0800
@@ -85,7 +85,7 @@ void ___pud_free_tlb(struct mmu_gather *
 #if CONFIG_PGTABLE_LEVELS > 4
 void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 {
-	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
+	//paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
 	tlb_remove_page(tlb, virt_to_page(p4d));
 }
 #endif	/* CONFIG_PGTABLE_LEVELS > 4 */
diff -puN kernel/sys.c~mawa-040-prctl-set-mawa kernel/sys.c
--- a/kernel/sys.c~mawa-040-prctl-set-mawa	2017-01-26 14:31:33.571714974 -0800
+++ b/kernel/sys.c	2017-01-26 14:31:33.576715199 -0800
@@ -92,7 +92,7 @@
 # define SET_TSC_CTL(a)		(-EINVAL)
 #endif
 #ifndef MPX_ENABLE_MANAGEMENT
-# define MPX_ENABLE_MANAGEMENT()	(-EINVAL)
+# define MPX_ENABLE_MANAGEMENT(bd_size)	(-EINVAL)
 #endif
 #ifndef MPX_DISABLE_MANAGEMENT
 # define MPX_DISABLE_MANAGEMENT()	(-EINVAL)
@@ -2246,9 +2246,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
 		up_write(&me->mm->mmap_sem);
 		break;
 	case PR_MPX_ENABLE_MANAGEMENT:
-		if (arg2 || arg3 || arg4 || arg5)
+		if (arg3 || arg4 || arg5)
 			return -EINVAL;
-		error = MPX_ENABLE_MANAGEMENT();
+		error = MPX_ENABLE_MANAGEMENT(arg2);
 		break;
 	case PR_MPX_DISABLE_MANAGEMENT:
 		if (arg2 || arg3 || arg4 || arg5)
_

* [RFC][PATCH 4/4] x86, mpx: context-switch new MPX address size MSR
  2017-01-26 22:40 ` Dave Hansen
@ 2017-01-26 22:40   ` Dave Hansen
  -1 siblings, 0 replies; 16+ messages in thread
From: Dave Hansen @ 2017-01-26 22:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm, x86, Dave Hansen


As mentioned in previous patches, larger address spaces mean
larger MPX tables.  But, the system either uses 5-level paging
everywhere or not at all; we do not mix pagetable formats.

If the size of the MPX tables depended solely on the paging mode,
old binaries would break because the format of the tables would
change underneath them.  So, since CR4 never changes at runtime,
but we still need some way to change the MPX table format, a new
MSR is introduced: MSR_IA32_MPX_LAX.

If we are in 5-level paging mode *and* the enable bit in this MSR
is set, the CPU will use the new, larger MPX bounds table format.
If 5-level paging is disabled, or the enable bit is clear, then
the legacy-style smaller tables will be used.

But, we might mix legacy and non-legacy binaries on the same
system, so this MSR needs to be context-switched.  Add code to
do this, along with some simple optimizations to skip the MSR
writes if the MSR does not need to be updated.

---

 b/arch/x86/include/asm/msr-index.h |    1 
 b/arch/x86/mm/tlb.c                |   42 +++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff -puN arch/x86/include/asm/msr-index.h~mawa-050-context-switch-msr arch/x86/include/asm/msr-index.h
--- a/arch/x86/include/asm/msr-index.h~mawa-050-context-switch-msr	2017-01-26 14:31:37.747902524 -0800
+++ b/arch/x86/include/asm/msr-index.h	2017-01-26 14:31:37.752902749 -0800
@@ -410,6 +410,7 @@
 #define MSR_IA32_BNDCFGS		0x00000d90
 
 #define MSR_IA32_XSS			0x00000da0
+#define MSR_IA32_MPX_LAX		0x00001000
 
 #define FEATURE_CONTROL_LOCKED				(1<<0)
 #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
diff -puN arch/x86/mm/tlb.c~mawa-050-context-switch-msr arch/x86/mm/tlb.c
--- a/arch/x86/mm/tlb.c~mawa-050-context-switch-msr	2017-01-26 14:31:37.749902614 -0800
+++ b/arch/x86/mm/tlb.c	2017-01-26 14:31:37.753902794 -0800
@@ -71,6 +71,47 @@ void switch_mm(struct mm_struct *prev, s
 	local_irq_restore(flags);
 }
 
+/*
+ * The MPX tables change sizes based on the size of the virtual
+ * (aka. linear) address space.  There is an MSR to tell the CPU
+ * whether we want the legacy-style ones or the larger ones when
+ * we are running with an eXtended virtual address space.
+ */
+static void switch_mawa(struct mm_struct *prev, struct mm_struct *next)
+{
+	/*
+	 * Note: there is one and only one bit in use in the MSR
+	 * at this time, so we do not have to be concerned with
+	 * preserving any of the other bits.  Just write 0 or 1.
+	 */
+	unsigned IA32_MPX_LAX_ENABLE_MASK = 0x00000001;
+
+	if (!cpu_feature_enabled(X86_FEATURE_MPX))
+		return;
+	/*
+	 * FIXME: do we want a check here for the 5-level paging
+	 * CR4 bit or CPUID bit, or is the mawa check below OK?
+	 * It's not obvious what would be the fastest or if it
+	 * matters.
+	 */
+
+	/*
+	 * Avoid the relatively costly MSR if we are not changing
+	 * MAWA state.  All processes not using MPX will have an
+	 * mpx_mawa_shift() of 0, so we do not need to check
+	 * separately for whether MPX management is enabled.
+	 */
+	if (mpx_mawa_shift(prev) == mpx_mawa_shift(next))
+		return;
+
+	if (mpx_mawa_shift(next)) {
+		wrmsr(MSR_IA32_MPX_LAX, IA32_MPX_LAX_ENABLE_MASK, 0x0);
+	} else {
+		/* clear the enable bit: */
+		wrmsr(MSR_IA32_MPX_LAX, 0x0, 0x0);
+	}
+}
+
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
@@ -136,6 +177,7 @@ void switch_mm_irqs_off(struct mm_struct
 		/* Load per-mm CR4 state */
 		load_mm_cr4(next);
 
+		switch_mawa(prev, next);
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
 		/*
 		 * Load the LDT, if the LDT is different.
_

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 0/4] x86, mpx: Support larger address space (MAWA)
  2017-01-26 22:40 ` Dave Hansen
@ 2017-01-27  8:16   ` Ingo Molnar
  -1 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2017-01-27  8:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, linux-mm, x86, Thomas Gleixner, H. Peter Anvin,
	Peter Zijlstra


* Dave Hansen <dave.hansen@linux.intel.com> wrote:

> Kirill is chugging right along getting his 5-level paging[1] patch set
> ready to be merged.  I figured I'd share an early draft of the MPX
> support that will go along with it.
> 
> Background: there is a lot more detail about what bounds tables are in
> the changelog for fe3d197f843.  But, basically MPX bounds tables help
> us to store the ranges to which a pointer is allowed to point.  The
> tables are walked by hardware and they are indexed by the virtual
> address of the pointer being checked.
> 
> A larger virtual address space (from 5-level paging) means that we
> need larger tables.  5-level paging hardware includes a feature called
> MPX Address-Width Adjust (MAWA) that grows the bounds tables so they
> can address the new address space.  MAWA is controlled independently
> from the paging mode (via an MSR) so that old MPX binaries can run on
> new hardware and kernels supporting 5-level paging.
> 
> But, since userspace is responsible for allocating the table that is
> growing (the directory), we need to ensure that userspace and the
> kernel agree about the size of these tables and the kernel can set the
> MSR appropriately.
> 
> These are not quite ready to get applied anywhere, but I don't expect
> the basics to change unless folks have big problems with this.  The
> only big remaining piece of work is to update the MPX selftest code.
> 
> Dave Hansen (4):
>       x86, mpx: introduce per-mm MPX table size tracking
>       x86, mpx: update MPX to grok larger bounds tables
>       x86, mpx: extend MPX prctl() to pass in size of bounds directory
>       x86, mpx: context-switch new MPX address size MSR

On a related note, the MPX testcases seem to have gone from the 
tools/testing/selftests/x86/Makefile (possibly a merge mishap - the original 
commit adds it correctly), so they are not being built.

Plus I noticed that the pkeys testcases are producing a lot of noise:

triton:~/tip/tools/testing/selftests/x86> make
[...]
gcc -m64 -o protection_keys_64 -O2 -g -std=gnu99 -pthread -Wall  protection_keys.c -lrt -ldl
protection_keys.c: In function ‘setup_hugetlbfs’:
protection_keys.c:816:6: warning: unused variable ‘i’ [-Wunused-variable]
  int i;
      ^
protection_keys.c:815:6: warning: unused variable ‘validated_nr_pages’ [-Wunused-variable]
  int validated_nr_pages;
      ^
protection_keys.c: In function ‘test_pkey_syscalls_bad_args’:
protection_keys.c:1136:6: warning: unused variable ‘bad_flag’ [-Wunused-variable]
  int bad_flag = (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE) + 1;
      ^
protection_keys.c: In function ‘test_pkey_alloc_exhaust’:
protection_keys.c:1153:16: warning: unused variable ‘init_val’ [-Wunused-variable]
  unsigned long init_val;
                ^
protection_keys.c:1152:16: warning: unused variable ‘flags’ [-Wunused-variable]
  unsigned long flags;
                ^
In file included from protection_keys.c:45:0:
pkey-helpers.h: In function ‘sigsafe_printf’:
pkey-helpers.h:41:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
   write(1, dprint_in_signal_buffer, len);
   ^
protection_keys.c: In function ‘dumpit’:
protection_keys.c:407:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
   write(1, buf, nr_read);
   ^
protection_keys.c: In function ‘pkey_disable_set’:
protection_keys.c:68:5: warning: ‘orig_pkru’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (!(condition)) {   \
     ^
protection_keys.c:465:6: note: ‘orig_pkru’ was declared here
  u32 orig_pkru;
      ^
[...]

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 1/4] x86, mpx: introduce per-mm MPX table size tracking
  2017-01-26 22:40   ` Dave Hansen
@ 2017-01-27  8:26     ` Ingo Molnar
  -1 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2017-01-27  8:26 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, linux-mm, x86


* Dave Hansen <dave.hansen@linux.intel.com> wrote:

> Larger address spaces mean larger MPX bounds table sizes.  This
> tracks which size tables we are using.
> 
> "MAWA" is what the hardware documentation calls this feature:
> MPX Address-Width Adjust.  We will carry that nomenclature throughout
> this series.
> 
> The new field will be optimized and get packed into 'bd_addr' in a later
> patch.  But, leave it separate for now to make the series simpler.
> 
> ---
> 
>  b/arch/x86/include/asm/mmu.h |    1 +
>  b/arch/x86/include/asm/mpx.h |    9 +++++++++
>  2 files changed, 10 insertions(+)
> 
> diff -puN arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa arch/x86/include/asm/mmu.h
> --- a/arch/x86/include/asm/mmu.h~mawa-020-mmu_context-mawa	2017-01-26 14:31:32.643673297 -0800
> +++ b/arch/x86/include/asm/mmu.h	2017-01-26 14:31:32.647673476 -0800
> @@ -34,6 +34,7 @@ typedef struct {
>  #ifdef CONFIG_X86_INTEL_MPX
>  	/* address of the bounds directory */
>  	void __user *bd_addr;
> +	int mpx_mawa;

-ENOCOMMENT.

Plus 'int' probably looks wrong, unless the hardware really wants signed shift 
values. (Whatever 'mpx_mawa' is.)

Plus, while Intel is free to use sucky acronyms such as MAWA, could we please name 
this and related functionality sensibly: mpx_table_size or mpx_table_shift or 
such? The data structure comment can point out that Intel calls this 'MAWA'.

(Also, the changelog refers to a later change, which never happens in this 
series.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC][PATCH 4/4] x86, mpx: context-switch new MPX address size MSR
  2017-01-26 22:40   ` Dave Hansen
@ 2017-01-27  8:31     ` Ingo Molnar
  -1 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2017-01-27  8:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, linux-mm, x86, Peter Zijlstra, Thomas Gleixner,
	H. Peter Anvin


* Dave Hansen <dave.hansen@linux.intel.com> wrote:

> + * The MPX tables change sizes based on the size of the virtual
> + * (aka. linear) address space.  There is an MSR to tell the CPU
> + * whether we want the legacy-style ones or the larger ones when
> + * we are running with an eXtended virtual address space.
> + */
> +static void switch_mawa(struct mm_struct *prev, struct mm_struct *next)
> +{
> +	/*
> +	 * Note: there is one and only one bit in use in the MSR
> +	 * at this time, so we do not have to be concerned with
> +	 * preserving any of the other bits.  Just write 0 or 1.
> +	 */
> +	unsigned IA32_MPX_LAX_ENABLE_MASK = 0x00000001;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_MPX))
> +		return;
> +	/*
> +	 * FIXME: do we want a check here for the 5-level paging
> +	 * CR4 bit or CPUID bit, or is the mawa check below OK?
> +	 * It's not obvious what would be the fastest or if it
> +	 * matters.
> +	 */
> +
> +	/*
> +	 * Avoid the relatively costly MSR if we are not changing
> +	 * MAWA state.  All processes not using MPX will have an
> +	 * mpx_mawa_shift() of 0, so we do not need to check
> +	 * separately for whether MPX management is enabled.
> +	 */
> +	if (mpx_mawa_shift(prev) == mpx_mawa_shift(next))
> +		return;

Please stop the senseless-looking wrappery - if the field is named sensibly then it 
can be accessed directly through mm_struct.

> +
> +	if (mpx_mawa_shift(next)) {
> +		wrmsr(MSR_IA32_MPX_LAX, IA32_MPX_LAX_ENABLE_MASK, 0x0);
> +	} else {
> +		/* clear the enable bit: */
> +		wrmsr(MSR_IA32_MPX_LAX, 0x0, 0x0);
> +	}
> +}
> +
>  void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  			struct task_struct *tsk)
>  {
> @@ -136,6 +177,7 @@ void switch_mm_irqs_off(struct mm_struct
>  		/* Load per-mm CR4 state */
>  		load_mm_cr4(next);
>  
> +		switch_mawa(prev, next);

This implementation adds about 4-5 unnecessary instructions to the context 
switching hot path of every non-MPX task, even on non-MPX hardware.

Please make sure that this is something like:

	if (unlikely(prev->mpx_msr_val != next->mpx_msr_val))
		switch_mpx(prev, next);

... which reduces the hot path overhead to something like 2 instructions (if we 
are lucky).

This can be put into switch_mpx() and can be inlined - just make sure that on a 
defconfig the generated machine code is sane.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-01-27  8:32 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-26 22:40 [RFC][PATCH 0/4] x86, mpx: Support larger address space (MAWA) Dave Hansen
2017-01-26 22:40 ` Dave Hansen
2017-01-26 22:40 ` [RFC][PATCH 1/4] x86, mpx: introduce per-mm MPX table size tracking Dave Hansen
2017-01-26 22:40   ` Dave Hansen
2017-01-27  8:26   ` Ingo Molnar
2017-01-27  8:26     ` Ingo Molnar
2017-01-26 22:40 ` [RFC][PATCH 2/4] x86, mpx: update MPX to grok larger bounds tables Dave Hansen
2017-01-26 22:40   ` Dave Hansen
2017-01-26 22:40 ` [RFC][PATCH 3/4] x86, mpx: extend MPX prctl() to pass in size of bounds directory Dave Hansen
2017-01-26 22:40   ` Dave Hansen
2017-01-26 22:40 ` [RFC][PATCH 4/4] x86, mpx: context-switch new MPX address size MSR Dave Hansen
2017-01-26 22:40   ` Dave Hansen
2017-01-27  8:31   ` Ingo Molnar
2017-01-27  8:31     ` Ingo Molnar
2017-01-27  8:16 ` [RFC][PATCH 0/4] x86, mpx: Support larger address space (MAWA) Ingo Molnar
2017-01-27  8:16   ` Ingo Molnar
