All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 00/10] Intel MPX support
@ 2014-09-11  8:46 ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patchset adds support for the Memory Protection Extensions
(MPX) feature found in future Intel processors.

MPX can be used in conjunction with compiler changes to check memory
references, for those references whose compile-time normal intentions
are usurped at runtime due to buffer overflow or underflow.

MPX provides this capability at very low performance overhead for
newly compiled code, and provides compatibility mechanisms with legacy
software components. MPX architecture is designed allow a machine to
run both MPX enabled software and legacy software that is MPX unaware.
In such a case, the legacy software does not benefit from MPX, but it
also does not experience any change in functionality or reduction in
performance.

More information about Intel MPX can be found in "Intel(R) Architecture
Instruction Set Extensions Programming Reference".

To get the advantage of MPX, changes are required in the OS kernel,
binutils, compiler, system libraries support.

New GCC option -fmpx is introduced to utilize MPX instructions.
Currently GCC compiler sources with MPX support is available in a
separate branch in common GCC SVN repository. See GCC SVN page
(http://gcc.gnu.org/svn.html) for details.

To have the full protection, we had to add MPX instrumentation to all
the necessary Glibc routines (e.g. memcpy) written on assembler, and
compile Glibc with the MPX enabled GCC compiler. Currently MPX enabled
Glibc source can be found in Glibc git repository.

Enabling an application to use MPX will generally not require source
code updates but there is some runtime code, which is responsible for
configuring and enabling MPX, needed in order to make use of MPX.
For most applications this runtime support will be available by linking
to a library supplied by the compiler or possibly it will come directly
from the OS once OS versions that support MPX are available.

MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
provide handlers for bounds faults (#BR), and manage bounds memory.

The high-level areas modified in the patchset are as follow:
1) struct siginfo is extended to include bound violation information.
2) two prctl() commands are added to do performance optimization.

Currently no hardware with MPX ISA is available but it is always
possible to use SDE (Intel(R) software Development Emulator) instead,
which can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator

This patchset has been tested on real internal hardware platform at Intel.
We have some simple unit tests in user space, which directly call MPX
instructions to produce #BR to let kernel allocate bounds tables and
cause bounds violations. We also compiled several benchmarks with an
MPX-enabled Gcc/Glibc and ICC, an ran them with this patch set.
We found a number of bugs in this code in these tests.

Future TODO items:
1) support 32-bit binaries on 64-bit kernels.

Changes since v1:
  * check to see if #BR occurred in userspace or kernel space.
  * use generic structure and macro as much as possible when
    decode mpx instructions.

Changes since v2:
  * fix some compile warnings.
  * update documentation.

Changes since v3:
  * correct some syntax errors at documentation, and document
    extended struct siginfo.
  * for kill the process when the error code of BNDSTATUS is 3.
  * add some comments.
  * remove new prctl() commands.
  * fix some compile warnings for 32-bit.

Changes since v4:
  * raise SIGBUS if the allocations of the bound tables fail.

Changes since v5:
  * hook unmap() path to cleanup unused bounds tables, and use
    new prctl() command to register bounds directory address to
    struct mm_struct to check whether one process is MPX enabled
    during unmap().
  * in order track precisely MPX memory usage, add MPX specific
    mmap interface and one VM_MPX flag to check whether a VMA
    is MPX bounds table.
  * add macro cpu_has_mpx to do performance optimization.
  * sync struct figinfo for mips with general version to avoid
    build issue.

Changes since v6:
  * because arch_vma_name is removed, this patchset have toset MPX
    specific ->vm_ops to do the same thing.
  * fix warnings for 32 bit arch.
  * add more description into these patches.

Changes since v7:
  * introduce VM_ARCH_2 flag. 
  * remove all of the pr_debug()s.
  * fix prctl numbers in documentation.
  * fix some bugs on bounds tables freeing.

Qiaowei Ren (10):
  x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: add MPX specific mmap interface
  x86, mpx: add macro cpu_has_mpx
  x86, mpx: hook #BR exception handler to allocate bound tables
  x86, mpx: extend siginfo structure to include bound violation
    information
  mips: sync struct siginfo with general version
  x86, mpx: decode MPX instruction to get bound violation information
  x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  x86, mpx: cleanup unused bound tables
  x86, mpx: add documentation on Intel MPX

 Documentation/x86/intel_mpx.txt      |  127 +++++++++++
 arch/mips/include/uapi/asm/siginfo.h |    4 +
 arch/x86/Kconfig                     |    4 +
 arch/x86/include/asm/cpufeature.h    |    6 +
 arch/x86/include/asm/mmu_context.h   |   16 ++
 arch/x86/include/asm/mpx.h           |   91 ++++++++
 arch/x86/include/asm/processor.h     |   18 ++
 arch/x86/kernel/Makefile             |    1 +
 arch/x86/kernel/mpx.c                |  412 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c              |   61 +++++-
 arch/x86/mm/Makefile                 |    2 +
 arch/x86/mm/mpx.c                    |  331 +++++++++++++++++++++++++++
 fs/proc/task_mmu.c                   |    1 +
 include/asm-generic/mmu_context.h    |    6 +
 include/linux/mm.h                   |    6 +
 include/linux/mm_types.h             |    3 +
 include/uapi/asm-generic/siginfo.h   |    9 +-
 include/uapi/linux/prctl.h           |    6 +
 kernel/signal.c                      |    4 +
 kernel/sys.c                         |   12 +
 mm/mmap.c                            |    2 +
 21 files changed, 1120 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/x86/intel_mpx.txt
 create mode 100644 arch/x86/include/asm/mpx.h
 create mode 100644 arch/x86/kernel/mpx.c
 create mode 100644 arch/x86/mm/mpx.c


^ permalink raw reply	[flat|nested] 130+ messages in thread

* [PATCH v8 00/10] Intel MPX support
@ 2014-09-11  8:46 ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patchset adds support for the Memory Protection Extensions
(MPX) feature found in future Intel processors.

MPX can be used in conjunction with compiler changes to check memory
references, for those references whose compile-time normal intentions
are usurped at runtime due to buffer overflow or underflow.

MPX provides this capability at very low performance overhead for
newly compiled code, and provides compatibility mechanisms with legacy
software components. MPX architecture is designed allow a machine to
run both MPX enabled software and legacy software that is MPX unaware.
In such a case, the legacy software does not benefit from MPX, but it
also does not experience any change in functionality or reduction in
performance.

More information about Intel MPX can be found in "Intel(R) Architecture
Instruction Set Extensions Programming Reference".

To get the advantage of MPX, changes are required in the OS kernel,
binutils, compiler, system libraries support.

New GCC option -fmpx is introduced to utilize MPX instructions.
Currently GCC compiler sources with MPX support is available in a
separate branch in common GCC SVN repository. See GCC SVN page
(http://gcc.gnu.org/svn.html) for details.

To have the full protection, we had to add MPX instrumentation to all
the necessary Glibc routines (e.g. memcpy) written on assembler, and
compile Glibc with the MPX enabled GCC compiler. Currently MPX enabled
Glibc source can be found in Glibc git repository.

Enabling an application to use MPX will generally not require source
code updates but there is some runtime code, which is responsible for
configuring and enabling MPX, needed in order to make use of MPX.
For most applications this runtime support will be available by linking
to a library supplied by the compiler or possibly it will come directly
from the OS once OS versions that support MPX are available.

MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
provide handlers for bounds faults (#BR), and manage bounds memory.

The high-level areas modified in the patchset are as follow:
1) struct siginfo is extended to include bound violation information.
2) two prctl() commands are added to do performance optimization.

Currently no hardware with MPX ISA is available but it is always
possible to use SDE (Intel(R) software Development Emulator) instead,
which can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator

This patchset has been tested on real internal hardware platform at Intel.
We have some simple unit tests in user space, which directly call MPX
instructions to produce #BR to let kernel allocate bounds tables and
cause bounds violations. We also compiled several benchmarks with an
MPX-enabled Gcc/Glibc and ICC, an ran them with this patch set.
We found a number of bugs in this code in these tests.

Future TODO items:
1) support 32-bit binaries on 64-bit kernels.

Changes since v1:
  * check to see if #BR occurred in userspace or kernel space.
  * use generic structure and macro as much as possible when
    decode mpx instructions.

Changes since v2:
  * fix some compile warnings.
  * update documentation.

Changes since v3:
  * correct some syntax errors at documentation, and document
    extended struct siginfo.
  * for kill the process when the error code of BNDSTATUS is 3.
  * add some comments.
  * remove new prctl() commands.
  * fix some compile warnings for 32-bit.

Changes since v4:
  * raise SIGBUS if the allocations of the bound tables fail.

Changes since v5:
  * hook unmap() path to cleanup unused bounds tables, and use
    new prctl() command to register bounds directory address to
    struct mm_struct to check whether one process is MPX enabled
    during unmap().
  * in order track precisely MPX memory usage, add MPX specific
    mmap interface and one VM_MPX flag to check whether a VMA
    is MPX bounds table.
  * add macro cpu_has_mpx to do performance optimization.
  * sync struct figinfo for mips with general version to avoid
    build issue.

Changes since v6:
  * because arch_vma_name is removed, this patchset have toset MPX
    specific ->vm_ops to do the same thing.
  * fix warnings for 32 bit arch.
  * add more description into these patches.

Changes since v7:
  * introduce VM_ARCH_2 flag. 
  * remove all of the pr_debug()s.
  * fix prctl numbers in documentation.
  * fix some bugs on bounds tables freeing.

Qiaowei Ren (10):
  x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: add MPX specific mmap interface
  x86, mpx: add macro cpu_has_mpx
  x86, mpx: hook #BR exception handler to allocate bound tables
  x86, mpx: extend siginfo structure to include bound violation
    information
  mips: sync struct siginfo with general version
  x86, mpx: decode MPX instruction to get bound violation information
  x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  x86, mpx: cleanup unused bound tables
  x86, mpx: add documentation on Intel MPX

 Documentation/x86/intel_mpx.txt      |  127 +++++++++++
 arch/mips/include/uapi/asm/siginfo.h |    4 +
 arch/x86/Kconfig                     |    4 +
 arch/x86/include/asm/cpufeature.h    |    6 +
 arch/x86/include/asm/mmu_context.h   |   16 ++
 arch/x86/include/asm/mpx.h           |   91 ++++++++
 arch/x86/include/asm/processor.h     |   18 ++
 arch/x86/kernel/Makefile             |    1 +
 arch/x86/kernel/mpx.c                |  412 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c              |   61 +++++-
 arch/x86/mm/Makefile                 |    2 +
 arch/x86/mm/mpx.c                    |  331 +++++++++++++++++++++++++++
 fs/proc/task_mmu.c                   |    1 +
 include/asm-generic/mmu_context.h    |    6 +
 include/linux/mm.h                   |    6 +
 include/linux/mm_types.h             |    3 +
 include/uapi/asm-generic/siginfo.h   |    9 +-
 include/uapi/linux/prctl.h           |    6 +
 kernel/signal.c                      |    4 +
 kernel/sys.c                         |   12 +
 mm/mmap.c                            |    2 +
 21 files changed, 1120 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/x86/intel_mpx.txt
 create mode 100644 arch/x86/include/asm/mpx.h
 create mode 100644 arch/x86/kernel/mpx.c
 create mode 100644 arch/x86/mm/mpx.c

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* [PATCH v8 01/10] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-11  8:46   ` Qiaowei Ren
  -1 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

MPX-enabled application will possibly create a lot of bounds tables
in process address space to save bounds information. These tables
can take up huge swaths of memory (as much as 80% of the memory on
the system) even if we clean them up aggressively. Being this huge,
we need a way to track their memory use. If we want to track them,
we essentially have two options:

1. walk the multi-GB (in virtual space) bounds directory to locate
   all the VMAs and walk them
2. Find a way to distinguish MPX bounds-table VMAs from normal
   anonymous VMAs and use some existing mechanism to walk them

We expect (1) will be prohibitively expensive. For (2), we only
need a single bit, and we've chosen to use a VM_ flag.  We understand
that they are scarce and are open to other options.

There is one potential hybrid approach: check the bounds directory
entry for any anonymous VMA that could possibly contain a bounds table.
This is less expensive than (1), but still requires reading a pointer
out of userspace for every VMA that we iterate over.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 fs/proc/task_mmu.c |    1 +
 include/linux/mm.h |    6 ++++++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index dfc791c..cc31520 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -549,6 +549,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_GROWSDOWN)]	= "gd",
 		[ilog2(VM_PFNMAP)]	= "pf",
 		[ilog2(VM_DENYWRITE)]	= "dw",
+		[ilog2(VM_MPX)]		= "mp",
 		[ilog2(VM_LOCKED)]	= "lo",
 		[ilog2(VM_IO)]		= "io",
 		[ilog2(VM_SEQ_READ)]	= "sr",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8981cc8..942be8a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -127,6 +127,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
 #define VM_NONLINEAR	0x00800000	/* Is non-linear (remap_file_pages) */
 #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
+#define VM_ARCH_2	0x02000000
 #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
@@ -154,6 +155,11 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1	/* T if mapped copy of data (nommu mmap) */
 #endif
 
+#if defined(CONFIG_X86)
+/* MPX specific bounds table or bounds directory */
+# define VM_MPX		VM_ARCH_2
+#endif
+
 #ifndef VM_GROWSUP
 # define VM_GROWSUP	VM_NONE
 #endif
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 01/10] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific
@ 2014-09-11  8:46   ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

MPX-enabled application will possibly create a lot of bounds tables
in process address space to save bounds information. These tables
can take up huge swaths of memory (as much as 80% of the memory on
the system) even if we clean them up aggressively. Being this huge,
we need a way to track their memory use. If we want to track them,
we essentially have two options:

1. walk the multi-GB (in virtual space) bounds directory to locate
   all the VMAs and walk them
2. Find a way to distinguish MPX bounds-table VMAs from normal
   anonymous VMAs and use some existing mechanism to walk them

We expect (1) will be prohibitively expensive. For (2), we only
need a single bit, and we've chosen to use a VM_ flag.  We understand
that they are scarce and are open to other options.

There is one potential hybrid approach: check the bounds directory
entry for any anonymous VMA that could possibly contain a bounds table.
This is less expensive than (1), but still requires reading a pointer
out of userspace for every VMA that we iterate over.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 fs/proc/task_mmu.c |    1 +
 include/linux/mm.h |    6 ++++++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index dfc791c..cc31520 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -549,6 +549,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_GROWSDOWN)]	= "gd",
 		[ilog2(VM_PFNMAP)]	= "pf",
 		[ilog2(VM_DENYWRITE)]	= "dw",
+		[ilog2(VM_MPX)]		= "mp",
 		[ilog2(VM_LOCKED)]	= "lo",
 		[ilog2(VM_IO)]		= "io",
 		[ilog2(VM_SEQ_READ)]	= "sr",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8981cc8..942be8a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -127,6 +127,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
 #define VM_NONLINEAR	0x00800000	/* Is non-linear (remap_file_pages) */
 #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
+#define VM_ARCH_2	0x02000000
 #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
@@ -154,6 +155,11 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1	/* T if mapped copy of data (nommu mmap) */
 #endif
 
+#if defined(CONFIG_X86)
+/* MPX specific bounds table or bounds directory */
+# define VM_MPX		VM_ARCH_2
+#endif
+
 #ifndef VM_GROWSUP
 # define VM_GROWSUP	VM_NONE
 #endif
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 02/10] x86, mpx: add MPX specific mmap interface
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-11  8:46   ` Qiaowei Ren
  -1 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch adds one MPX specific mmap interface, which only handles
mpx related maps, including bounds table and bounds directory.

In order to track MPX specific memory usage, this interface is added
to stick new vm_flag VM_MPX in the vma_area_struct when create a
bounds table or bounds directory.

These bounds tables can take huge amounts of memory.  In the
worst-case scenario, the tables can be 4x the size of the data
structure being tracked. IOW, a 1-page structure can require 4
bounds-table pages.

My expectation is that folks using MPX are going to be keen on
figuring out how much memory is being dedicated to it. With this
feature, plus some grepping in /proc/$pid/smaps one could take a
pretty good stab at it.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/Kconfig           |    4 ++
 arch/x86/include/asm/mpx.h |   38 +++++++++++++++++++++
 arch/x86/mm/Makefile       |    2 +
 arch/x86/mm/mpx.c          |   79 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/mpx.h
 create mode 100644 arch/x86/mm/mpx.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 778178f..935aa69 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -243,6 +243,10 @@ config HAVE_INTEL_TXT
 	def_bool y
 	depends on INTEL_IOMMU && ACPI
 
+config X86_INTEL_MPX
+	def_bool y
+	depends on CPU_SUP_INTEL
+
 config X86_32_SMP
 	def_bool y
 	depends on X86_32 && SMP
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
new file mode 100644
index 0000000..5725ac4
--- /dev/null
+++ b/arch/x86/include/asm/mpx.h
@@ -0,0 +1,38 @@
+#ifndef _ASM_X86_MPX_H
+#define _ASM_X86_MPX_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_64
+
+/* upper 28 bits [47:20] of the virtual address in 64-bit used to
+ * index into bounds directory (BD).
+ */
+#define MPX_BD_ENTRY_OFFSET	28
+#define MPX_BD_ENTRY_SHIFT	3
+/* bits [19:3] of the virtual address in 64-bit used to index into
+ * bounds table (BT).
+ */
+#define MPX_BT_ENTRY_OFFSET	17
+#define MPX_BT_ENTRY_SHIFT	5
+#define MPX_IGN_BITS		3
+
+#else
+
+#define MPX_BD_ENTRY_OFFSET	20
+#define MPX_BD_ENTRY_SHIFT	2
+#define MPX_BT_ENTRY_OFFSET	10
+#define MPX_BT_ENTRY_SHIFT	4
+#define MPX_IGN_BITS		2
+
+#endif
+
+#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
+#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
+
+#define MPX_BNDSTA_ERROR_CODE	0x3
+
+unsigned long mpx_mmap(unsigned long len);
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 6a19ad9..ecfdc46 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_MEMTEST)		+= memtest.o
+
+obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
new file mode 100644
index 0000000..e1b28e6
--- /dev/null
+++ b/arch/x86/mm/mpx.c
@@ -0,0 +1,79 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+#include <asm/mman.h>
+#include <linux/sched/sysctl.h>
+
+static const char *mpx_mapping_name(struct vm_area_struct *vma)
+{
+	return "[mpx]";
+}
+
+static struct vm_operations_struct mpx_vma_ops = {
+	.name = mpx_mapping_name,
+};
+
+/*
+ * this is really a simplified "vm_mmap". it only handles mpx
+ * related maps, including bounds table and bounds directory.
+ *
+ * here we can stick new vm_flag VM_MPX in the vma_area_struct
+ * when create a bounds table or bounds directory, in order to
+ * track MPX specific memory.
+ */
+unsigned long mpx_mmap(unsigned long len)
+{
+	unsigned long ret;
+	unsigned long addr, pgoff;
+	struct mm_struct *mm = current->mm;
+	vm_flags_t vm_flags;
+	struct vm_area_struct *vma;
+
+	/* Only bounds table and bounds directory can be allocated here */
+	if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES)
+		return -EINVAL;
+
+	down_write(&mm->mmap_sem);
+
+	/* Too many mappings? */
+	if (mm->map_count > sysctl_max_map_count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Obtain the address to map to. we verify (or select) it and ensure
+	 * that it represents a valid section of the address space.
+	 */
+	addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE);
+	if (addr & ~PAGE_MASK) {
+		ret = addr;
+		goto out;
+	}
+
+	vm_flags = VM_READ | VM_WRITE | VM_MPX |
+			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+
+	/* Set pgoff according to addr for anon_vma */
+	pgoff = addr >> PAGE_SHIFT;
+
+	ret = mmap_region(NULL, addr, len, vm_flags, pgoff);
+	if (IS_ERR_VALUE(ret))
+		goto out;
+
+	vma = find_vma(mm, ret);
+	if (!vma) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	vma->vm_ops = &mpx_vma_ops;
+
+	if (vm_flags & VM_LOCKED) {
+		up_write(&mm->mmap_sem);
+		mm_populate(ret, len);
+		return ret;
+	}
+
+out:
+	up_write(&mm->mmap_sem);
+	return ret;
+}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 02/10] x86, mpx: add MPX specific mmap interface
@ 2014-09-11  8:46   ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch adds one MPX specific mmap interface, which only handles
mpx related maps, including bounds table and bounds directory.

In order to track MPX specific memory usage, this interface is added
to stick new vm_flag VM_MPX in the vma_area_struct when create a
bounds table or bounds directory.

These bounds tables can take huge amounts of memory.  In the
worst-case scenario, the tables can be 4x the size of the data
structure being tracked. IOW, a 1-page structure can require 4
bounds-table pages.

My expectation is that folks using MPX are going to be keen on
figuring out how much memory is being dedicated to it. With this
feature, plus some grepping in /proc/$pid/smaps one could take a
pretty good stab at it.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/Kconfig           |    4 ++
 arch/x86/include/asm/mpx.h |   38 +++++++++++++++++++++
 arch/x86/mm/Makefile       |    2 +
 arch/x86/mm/mpx.c          |   79 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/mpx.h
 create mode 100644 arch/x86/mm/mpx.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 778178f..935aa69 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -243,6 +243,10 @@ config HAVE_INTEL_TXT
 	def_bool y
 	depends on INTEL_IOMMU && ACPI
 
+config X86_INTEL_MPX
+	def_bool y
+	depends on CPU_SUP_INTEL
+
 config X86_32_SMP
 	def_bool y
 	depends on X86_32 && SMP
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
new file mode 100644
index 0000000..5725ac4
--- /dev/null
+++ b/arch/x86/include/asm/mpx.h
@@ -0,0 +1,38 @@
+#ifndef _ASM_X86_MPX_H
+#define _ASM_X86_MPX_H
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#ifdef CONFIG_X86_64
+
+/* upper 28 bits [47:20] of the virtual address in 64-bit used to
+ * index into bounds directory (BD).
+ */
+#define MPX_BD_ENTRY_OFFSET	28
+#define MPX_BD_ENTRY_SHIFT	3
+/* bits [19:3] of the virtual address in 64-bit used to index into
+ * bounds table (BT).
+ */
+#define MPX_BT_ENTRY_OFFSET	17
+#define MPX_BT_ENTRY_SHIFT	5
+#define MPX_IGN_BITS		3
+
+#else
+
+#define MPX_BD_ENTRY_OFFSET	20
+#define MPX_BD_ENTRY_SHIFT	2
+#define MPX_BT_ENTRY_OFFSET	10
+#define MPX_BT_ENTRY_SHIFT	4
+#define MPX_IGN_BITS		2
+
+#endif
+
+#define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
+#define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
+
+#define MPX_BNDSTA_ERROR_CODE	0x3
+
+unsigned long mpx_mmap(unsigned long len);
+
+#endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 6a19ad9..ecfdc46 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -30,3 +30,5 @@ obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_MEMTEST)		+= memtest.o
+
+obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
new file mode 100644
index 0000000..e1b28e6
--- /dev/null
+++ b/arch/x86/mm/mpx.c
@@ -0,0 +1,79 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+#include <asm/mman.h>
+#include <linux/sched/sysctl.h>
+
+static const char *mpx_mapping_name(struct vm_area_struct *vma)
+{
+	return "[mpx]";
+}
+
+static struct vm_operations_struct mpx_vma_ops = {
+	.name = mpx_mapping_name,
+};
+
+/*
+ * this is really a simplified "vm_mmap". it only handles mpx
+ * related maps, including bounds table and bounds directory.
+ *
+ * here we can stick new vm_flag VM_MPX in the vma_area_struct
+ * when create a bounds table or bounds directory, in order to
+ * track MPX specific memory.
+ */
+unsigned long mpx_mmap(unsigned long len)
+{
+	unsigned long ret;
+	unsigned long addr, pgoff;
+	struct mm_struct *mm = current->mm;
+	vm_flags_t vm_flags;
+	struct vm_area_struct *vma;
+
+	/* Only bounds table and bounds directory can be allocated here */
+	if (len != MPX_BD_SIZE_BYTES && len != MPX_BT_SIZE_BYTES)
+		return -EINVAL;
+
+	down_write(&mm->mmap_sem);
+
+	/* Too many mappings? */
+	if (mm->map_count > sysctl_max_map_count) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Obtain the address to map to. we verify (or select) it and ensure
+	 * that it represents a valid section of the address space.
+	 */
+	addr = get_unmapped_area(NULL, 0, len, 0, MAP_ANONYMOUS | MAP_PRIVATE);
+	if (addr & ~PAGE_MASK) {
+		ret = addr;
+		goto out;
+	}
+
+	vm_flags = VM_READ | VM_WRITE | VM_MPX |
+			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+
+	/* Set pgoff according to addr for anon_vma */
+	pgoff = addr >> PAGE_SHIFT;
+
+	ret = mmap_region(NULL, addr, len, vm_flags, pgoff);
+	if (IS_ERR_VALUE(ret))
+		goto out;
+
+	vma = find_vma(mm, ret);
+	if (!vma) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	vma->vm_ops = &mpx_vma_ops;
+
+	if (vm_flags & VM_LOCKED) {
+		up_write(&mm->mmap_sem);
+		mm_populate(ret, len);
+		return ret;
+	}
+
+out:
+	up_write(&mm->mmap_sem);
+	return ret;
+}
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 03/10] x86, mpx: add macro cpu_has_mpx
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-11  8:46   ` Qiaowei Ren
  -1 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

In order to do performance optimization, this patch adds macro
cpu_has_mpx which will directly return 0 when MPX is not supported
by kernel.

Community gave a lot of comments on this macro cpu_has_mpx in previous
version. Dave will introduce a patchset about disabled features to fix
it later.

In this code:
        if (cpu_has_mpx)
                do_some_mpx_thing();

The patch series from Dave will introduce a new macro cpu_feature_enabled()
(if merged after this patchset) to replace the cpu_has_mpx.
        if (cpu_feature_enabled(X86_FEATURE_MPX))
                do_some_mpx_thing();

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/include/asm/cpufeature.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index bb9b258..82ec7ed 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -353,6 +353,12 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
 #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
 #define cpu_has_topoext		boot_cpu_has(X86_FEATURE_TOPOEXT)
 
+#ifdef CONFIG_X86_INTEL_MPX
+#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX)
+#else
+#define cpu_has_mpx 0
+#endif /* CONFIG_X86_INTEL_MPX */
+
 #ifdef CONFIG_X86_64
 
 #undef  cpu_has_vme
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 03/10] x86, mpx: add macro cpu_has_mpx
@ 2014-09-11  8:46   ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

In order to do performance optimization, this patch adds macro
cpu_has_mpx which will directly return 0 when MPX is not supported
by kernel.

Community gave a lot of comments on this macro cpu_has_mpx in previous
version. Dave will introduce a patchset about disabled features to fix
it later.

In this code:
        if (cpu_has_mpx)
                do_some_mpx_thing();

The patch series from Dave will introduce a new macro cpu_feature_enabled()
(if merged after this patchset) to replace the cpu_has_mpx.
        if (cpu_feature_enabled(X86_FEATURE_MPX))
                do_some_mpx_thing();

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/include/asm/cpufeature.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index bb9b258..82ec7ed 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -353,6 +353,12 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
 #define cpu_has_eager_fpu	boot_cpu_has(X86_FEATURE_EAGER_FPU)
 #define cpu_has_topoext		boot_cpu_has(X86_FEATURE_TOPOEXT)
 
+#ifdef CONFIG_X86_INTEL_MPX
+#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX)
+#else
+#define cpu_has_mpx 0
+#endif /* CONFIG_X86_INTEL_MPX */
+
 #ifdef CONFIG_X86_64
 
 #undef  cpu_has_vme
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-11  8:46   ` Qiaowei Ren
  -1 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch handles a #BR exception for non-existent tables by
carving the space out of the normal processes address space
(essentially calling mmap() from inside the kernel) and then
pointing the bounds-directory over to it.

The tables need to be accessed and controlled by userspace
because the compiler generates instructions for MPX-enabled
code which frequently store and retrieve entries from the bounds
tables. Any direct kernel involvement (like a syscall) to access
the tables would destroy performance since these are so frequent.

The tables are carved out of userspace because we have no better
spot to put them. For each pointer which is being tracked by MPX,
the bounds tables contain 4 longs worth of data, and the tables
are indexed virtually. If we were to preallocate the tables, we
would theoretically need to allocate 4x the virtual space that
we have available for userspace somewhere else. We don't have
that room in the kernel address space.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/include/asm/mpx.h |   20 +++++++++++++++
 arch/x86/kernel/Makefile   |    1 +
 arch/x86/kernel/mpx.c      |   58 ++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c    |   55 ++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 133 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kernel/mpx.c

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 5725ac4..b7598ac 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -18,6 +18,8 @@
 #define MPX_BT_ENTRY_SHIFT	5
 #define MPX_IGN_BITS		3
 
+#define MPX_BD_ENTRY_TAIL	3
+
 #else
 
 #define MPX_BD_ENTRY_OFFSET	20
@@ -26,13 +28,31 @@
 #define MPX_BT_ENTRY_SHIFT	4
 #define MPX_IGN_BITS		2
 
+#define MPX_BD_ENTRY_TAIL	2
+
 #endif
 
+#define MPX_BNDSTA_TAIL		2
+#define MPX_BNDCFG_TAIL		12
+#define MPX_BNDSTA_ADDR_MASK	(~((1UL<<MPX_BNDSTA_TAIL)-1))
+#define MPX_BNDCFG_ADDR_MASK	(~((1UL<<MPX_BNDCFG_TAIL)-1))
+#define MPX_BT_ADDR_MASK	(~((1UL<<MPX_BD_ENTRY_TAIL)-1))
+
 #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
 
 #define MPX_BNDSTA_ERROR_CODE	0x3
+#define MPX_BD_ENTRY_VALID_FLAG	0x1
 
 unsigned long mpx_mmap(unsigned long len);
 
+#ifdef CONFIG_X86_INTEL_MPX
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+#else
+static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_MPX */
+
 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index ada2e2d..9ece662 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_PREEMPT)	+= preempt.o
 
 obj-y				+= process.o
 obj-y				+= i387.o xsave.o
+obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-y				+= ptrace.o
 obj-$(CONFIG_X86_32)		+= tls.o
 obj-$(CONFIG_IA32_EMULATION)	+= tls.o
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
new file mode 100644
index 0000000..88d660f
--- /dev/null
+++ b/arch/x86/kernel/mpx.c
@@ -0,0 +1,58 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+
+static int allocate_bt(long __user *bd_entry)
+{
+	unsigned long bt_addr, old_val = 0;
+	int ret = 0;
+
+	bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES);
+	if (IS_ERR((void *)bt_addr))
+		return bt_addr;
+	bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG;
+
+	ret = user_atomic_cmpxchg_inatomic(&old_val, bd_entry, 0, bt_addr);
+	if (ret)
+		goto out;
+
+	/*
+	 * there is a existing bounds table pointed at this bounds
+	 * directory entry, and so we need to free the bounds table
+	 * allocated just now.
+	 */
+	if (old_val)
+		goto out;
+
+	return 0;
+
+out:
+	vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+	return ret;
+}
+
+/*
+ * When a BNDSTX instruction attempts to save bounds to a BD entry
+ * with the lack of the valid bit being set, a #BR is generated.
+ * This is an indication that no BT exists for this entry. In this
+ * case the fault handler will allocate a new BT.
+ *
+ * With 32-bit mode, the size of BD is 4MB, and the size of each
+ * bound table is 16KB. With 64-bit mode, the size of BD is 2GB,
+ * and the size of each bound table is 4MB.
+ */
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+	unsigned long status;
+	unsigned long bd_entry, bd_base;
+
+	bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+	status = xsave_buf->bndcsr.status_reg;
+
+	bd_entry = status & MPX_BNDSTA_ADDR_MASK;
+	if ((bd_entry < bd_base) ||
+		(bd_entry >= bd_base + MPX_BD_SIZE_BYTES))
+		return -EINVAL;
+
+	return allocate_bt((long __user *)bd_entry);
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 0d0e922..396a88b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -60,6 +60,7 @@
 #include <asm/fixmap.h>
 #include <asm/mach_traps.h>
 #include <asm/alternative.h>
+#include <asm/mpx.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -228,7 +229,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code)	\
 
 DO_ERROR(X86_TRAP_DE,     SIGFPE,  "divide error",		divide_error)
 DO_ERROR(X86_TRAP_OF,     SIGSEGV, "overflow",			overflow)
-DO_ERROR(X86_TRAP_BR,     SIGSEGV, "bounds",			bounds)
 DO_ERROR(X86_TRAP_UD,     SIGILL,  "invalid opcode",		invalid_op)
 DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,  "coprocessor segment overrun",coprocessor_segment_overrun)
 DO_ERROR(X86_TRAP_TS,     SIGSEGV, "invalid TSS",		invalid_TSS)
@@ -278,6 +278,59 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 }
 #endif
 
+dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
+{
+	enum ctx_state prev_state;
+	unsigned long status;
+	struct xsave_struct *xsave_buf;
+	struct task_struct *tsk = current;
+
+	prev_state = exception_enter();
+	if (notify_die(DIE_TRAP, "bounds", regs, error_code,
+			X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
+		goto exit;
+	conditional_sti(regs);
+
+	if (!user_mode(regs))
+		die("bounds", regs, error_code);
+
+	if (!cpu_has_mpx) {
+		/* The exception is not from Intel MPX */
+		do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+		goto exit;
+	}
+
+	fpu_xsave(&tsk->thread.fpu);
+	xsave_buf = &(tsk->thread.fpu.state->xsave);
+	status = xsave_buf->bndcsr.status_reg;
+
+	/*
+	 * The error code field of the BNDSTATUS register communicates status
+	 * information of a bound range exception #BR or operation involving
+	 * bound directory.
+	 */
+	switch (status & MPX_BNDSTA_ERROR_CODE) {
+	case 2:
+		/*
+		 * Bound directory has invalid entry.
+		 */
+		if (do_mpx_bt_fault(xsave_buf))
+			force_sig(SIGSEGV, tsk);
+		break;
+
+	case 1: /* Bound violation. */
+	case 0: /* No exception caused by Intel MPX operations. */
+		do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+		break;
+
+	default:
+		die("bounds", regs, error_code);
+	}
+
+exit:
+	exception_exit(prev_state);
+}
+
 dotraplinkage void
 do_general_protection(struct pt_regs *regs, long error_code)
 {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
@ 2014-09-11  8:46   ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch handles a #BR exception for non-existent tables by
carving the space out of the normal processes address space
(essentially calling mmap() from inside the kernel) and then
pointing the bounds-directory over to it.

The tables need to be accessed and controlled by userspace
because the compiler generates instructions for MPX-enabled
code which frequently store and retrieve entries from the bounds
tables. Any direct kernel involvement (like a syscall) to access
the tables would destroy performance since these are so frequent.

The tables are carved out of userspace because we have no better
spot to put them. For each pointer which is being tracked by MPX,
the bounds tables contain 4 longs worth of data, and the tables
are indexed virtually. If we were to preallocate the tables, we
would theoretically need to allocate 4x the virtual space that
we have available for userspace somewhere else. We don't have
that room in the kernel address space.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/include/asm/mpx.h |   20 +++++++++++++++
 arch/x86/kernel/Makefile   |    1 +
 arch/x86/kernel/mpx.c      |   58 ++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c    |   55 ++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 133 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kernel/mpx.c

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 5725ac4..b7598ac 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -18,6 +18,8 @@
 #define MPX_BT_ENTRY_SHIFT	5
 #define MPX_IGN_BITS		3
 
+#define MPX_BD_ENTRY_TAIL	3
+
 #else
 
 #define MPX_BD_ENTRY_OFFSET	20
@@ -26,13 +28,31 @@
 #define MPX_BT_ENTRY_SHIFT	4
 #define MPX_IGN_BITS		2
 
+#define MPX_BD_ENTRY_TAIL	2
+
 #endif
 
+#define MPX_BNDSTA_TAIL		2
+#define MPX_BNDCFG_TAIL		12
+#define MPX_BNDSTA_ADDR_MASK	(~((1UL<<MPX_BNDSTA_TAIL)-1))
+#define MPX_BNDCFG_ADDR_MASK	(~((1UL<<MPX_BNDCFG_TAIL)-1))
+#define MPX_BT_ADDR_MASK	(~((1UL<<MPX_BD_ENTRY_TAIL)-1))
+
 #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
 
 #define MPX_BNDSTA_ERROR_CODE	0x3
+#define MPX_BD_ENTRY_VALID_FLAG	0x1
 
 unsigned long mpx_mmap(unsigned long len);
 
+#ifdef CONFIG_X86_INTEL_MPX
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+#else
+static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_MPX */
+
 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index ada2e2d..9ece662 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_PREEMPT)	+= preempt.o
 
 obj-y				+= process.o
 obj-y				+= i387.o xsave.o
+obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-y				+= ptrace.o
 obj-$(CONFIG_X86_32)		+= tls.o
 obj-$(CONFIG_IA32_EMULATION)	+= tls.o
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
new file mode 100644
index 0000000..88d660f
--- /dev/null
+++ b/arch/x86/kernel/mpx.c
@@ -0,0 +1,58 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <asm/mpx.h>
+
+static int allocate_bt(long __user *bd_entry)
+{
+	unsigned long bt_addr, old_val = 0;
+	int ret = 0;
+
+	bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES);
+	if (IS_ERR((void *)bt_addr))
+		return bt_addr;
+	bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG;
+
+	ret = user_atomic_cmpxchg_inatomic(&old_val, bd_entry, 0, bt_addr);
+	if (ret)
+		goto out;
+
+	/*
+	 * there is a existing bounds table pointed at this bounds
+	 * directory entry, and so we need to free the bounds table
+	 * allocated just now.
+	 */
+	if (old_val)
+		goto out;
+
+	return 0;
+
+out:
+	vm_munmap(bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+	return ret;
+}
+
+/*
+ * When a BNDSTX instruction attempts to save bounds to a BD entry
+ * with the lack of the valid bit being set, a #BR is generated.
+ * This is an indication that no BT exists for this entry. In this
+ * case the fault handler will allocate a new BT.
+ *
+ * With 32-bit mode, the size of BD is 4MB, and the size of each
+ * bound table is 16KB. With 64-bit mode, the size of BD is 2GB,
+ * and the size of each bound table is 4MB.
+ */
+int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
+{
+	unsigned long status;
+	unsigned long bd_entry, bd_base;
+
+	bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
+	status = xsave_buf->bndcsr.status_reg;
+
+	bd_entry = status & MPX_BNDSTA_ADDR_MASK;
+	if ((bd_entry < bd_base) ||
+		(bd_entry >= bd_base + MPX_BD_SIZE_BYTES))
+		return -EINVAL;
+
+	return allocate_bt((long __user *)bd_entry);
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 0d0e922..396a88b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -60,6 +60,7 @@
 #include <asm/fixmap.h>
 #include <asm/mach_traps.h>
 #include <asm/alternative.h>
+#include <asm/mpx.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -228,7 +229,6 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code)	\
 
 DO_ERROR(X86_TRAP_DE,     SIGFPE,  "divide error",		divide_error)
 DO_ERROR(X86_TRAP_OF,     SIGSEGV, "overflow",			overflow)
-DO_ERROR(X86_TRAP_BR,     SIGSEGV, "bounds",			bounds)
 DO_ERROR(X86_TRAP_UD,     SIGILL,  "invalid opcode",		invalid_op)
 DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,  "coprocessor segment overrun",coprocessor_segment_overrun)
 DO_ERROR(X86_TRAP_TS,     SIGSEGV, "invalid TSS",		invalid_TSS)
@@ -278,6 +278,59 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 }
 #endif
 
+dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
+{
+	enum ctx_state prev_state;
+	unsigned long status;
+	struct xsave_struct *xsave_buf;
+	struct task_struct *tsk = current;
+
+	prev_state = exception_enter();
+	if (notify_die(DIE_TRAP, "bounds", regs, error_code,
+			X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
+		goto exit;
+	conditional_sti(regs);
+
+	if (!user_mode(regs))
+		die("bounds", regs, error_code);
+
+	if (!cpu_has_mpx) {
+		/* The exception is not from Intel MPX */
+		do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+		goto exit;
+	}
+
+	fpu_xsave(&tsk->thread.fpu);
+	xsave_buf = &(tsk->thread.fpu.state->xsave);
+	status = xsave_buf->bndcsr.status_reg;
+
+	/*
+	 * The error code field of the BNDSTATUS register communicates status
+	 * information of a bound range exception #BR or operation involving
+	 * bound directory.
+	 */
+	switch (status & MPX_BNDSTA_ERROR_CODE) {
+	case 2:
+		/*
+		 * Bound directory has invalid entry.
+		 */
+		if (do_mpx_bt_fault(xsave_buf))
+			force_sig(SIGSEGV, tsk);
+		break;
+
+	case 1: /* Bound violation. */
+	case 0: /* No exception caused by Intel MPX operations. */
+		do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
+		break;
+
+	default:
+		die("bounds", regs, error_code);
+	}
+
+exit:
+	exception_exit(prev_state);
+}
+
 dotraplinkage void
 do_general_protection(struct pt_regs *regs, long error_code)
 {
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 05/10] x86, mpx: extend siginfo structure to include bound violation information
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-11  8:46   ` Qiaowei Ren
  -1 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch adds new fields about bound violation into siginfo
structure. si_lower and si_upper are respectively lower bound
and upper bound when bound violation is caused.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 include/uapi/asm-generic/siginfo.h |    9 ++++++++-
 kernel/signal.c                    |    4 ++++
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb; /* LSB of the reported address */
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;
 
 		/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
 #define si_trapno	_sifields._sigfault._trapno
 #endif
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
+#define si_lower	_sifields._sigfault._addr_bnd._lower
+#define si_upper	_sifields._sigfault._addr_bnd._upper
 #define si_band		_sifields._sigpoll._band
 #define si_fd		_sifields._sigpoll._fd
 #ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
  */
 #define SEGV_MAPERR	(__SI_FAULT|1)	/* address not mapped to object */
 #define SEGV_ACCERR	(__SI_FAULT|2)	/* invalid permissions for mapped object */
-#define NSIGSEGV	2
+#define SEGV_BNDERR	(__SI_FAULT|3)  /* failed address bound checks */
+#define NSIGSEGV	3
 
 /*
  * SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index 8f0876f..2c403a4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from)
 		if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
 			err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
 #endif
+#ifdef SEGV_BNDERR
+		err |= __put_user(from->si_lower, &to->si_lower);
+		err |= __put_user(from->si_upper, &to->si_upper);
+#endif
 		break;
 	case __SI_CHLD:
 		err |= __put_user(from->si_pid, &to->si_pid);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 05/10] x86, mpx: extend siginfo structure to include bound violation information
@ 2014-09-11  8:46   ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch adds new fields about bound violation into siginfo
structure. si_lower and si_upper are respectively lower bound
and upper bound when bound violation is caused.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 include/uapi/asm-generic/siginfo.h |    9 ++++++++-
 kernel/signal.c                    |    4 ++++
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..1e35520 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb; /* LSB of the reported address */
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;
 
 		/* SIGPOLL */
@@ -131,6 +135,8 @@ typedef struct siginfo {
 #define si_trapno	_sifields._sigfault._trapno
 #endif
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
+#define si_lower	_sifields._sigfault._addr_bnd._lower
+#define si_upper	_sifields._sigfault._addr_bnd._upper
 #define si_band		_sifields._sigpoll._band
 #define si_fd		_sifields._sigpoll._fd
 #ifdef __ARCH_SIGSYS
@@ -199,7 +205,8 @@ typedef struct siginfo {
  */
 #define SEGV_MAPERR	(__SI_FAULT|1)	/* address not mapped to object */
 #define SEGV_ACCERR	(__SI_FAULT|2)	/* invalid permissions for mapped object */
-#define NSIGSEGV	2
+#define SEGV_BNDERR	(__SI_FAULT|3)  /* failed address bound checks */
+#define NSIGSEGV	3
 
 /*
  * SIGBUS si_codes
diff --git a/kernel/signal.c b/kernel/signal.c
index 8f0876f..2c403a4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2748,6 +2748,10 @@ int copy_siginfo_to_user(siginfo_t __user *to, const siginfo_t *from)
 		if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
 			err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
 #endif
+#ifdef SEGV_BNDERR
+		err |= __put_user(from->si_lower, &to->si_lower);
+		err |= __put_user(from->si_upper, &to->si_upper);
+#endif
 		break;
 	case __SI_CHLD:
 		err |= __put_user(from->si_pid, &to->si_pid);
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 06/10] mips: sync struct siginfo with general version
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-11  8:46   ` Qiaowei Ren
  -1 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

Due to new fields about bound violation added into struct siginfo,
this patch syncs it with general version to avoid build issue.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/mips/include/uapi/asm/siginfo.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h
index e811744..d08f83f 100644
--- a/arch/mips/include/uapi/asm/siginfo.h
+++ b/arch/mips/include/uapi/asm/siginfo.h
@@ -92,6 +92,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb;
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;
 
 		/* SIGPOLL, SIGXFSZ (To do ...)	 */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 06/10] mips: sync struct siginfo with general version
@ 2014-09-11  8:46   ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

Due to new fields about bound violation added into struct siginfo,
this patch syncs it with general version to avoid build issue.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/mips/include/uapi/asm/siginfo.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h
index e811744..d08f83f 100644
--- a/arch/mips/include/uapi/asm/siginfo.h
+++ b/arch/mips/include/uapi/asm/siginfo.h
@@ -92,6 +92,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb;
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;
 
 		/* SIGPOLL, SIGXFSZ (To do ...)	 */
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-11  8:46   ` Qiaowei Ren
  -1 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch sets bound violation fields of siginfo struct in #BR
exception handler by decoding the user instruction and constructing
the faulting pointer.

This patch does't use the generic decoder, and implements a limited
special-purpose decoder to decode MPX instructions, simply because the
generic decoder is very heavyweight not just in terms of performance
but in terms of interface -- because it has to.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/include/asm/mpx.h |   23 ++++
 arch/x86/kernel/mpx.c      |  299 ++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c    |    6 +
 3 files changed, 328 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index b7598ac..780af63 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -3,6 +3,7 @@
 
 #include <linux/types.h>
 #include <asm/ptrace.h>
+#include <asm/insn.h>
 
 #ifdef CONFIG_X86_64
 
@@ -44,15 +45,37 @@
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
 
+struct mpx_insn {
+	struct insn_field rex_prefix;	/* REX prefix */
+	struct insn_field modrm;
+	struct insn_field sib;
+	struct insn_field displacement;
+
+	unsigned char addr_bytes;	/* effective address size */
+	unsigned char limit;
+	unsigned char x86_64;
+
+	const unsigned char *kaddr;	/* kernel address of insn to analyze */
+	const unsigned char *next_byte;
+};
+
+#define MAX_MPX_INSN_SIZE	15
+
 unsigned long mpx_mmap(unsigned long len);
 
 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf);
 #else
 static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
 {
 	return -EINVAL;
 }
+static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf)
+{
+}
 #endif /* CONFIG_X86_INTEL_MPX */
 
 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 88d660f..7ef6e39 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -2,6 +2,275 @@
 #include <linux/syscalls.h>
 #include <asm/mpx.h>
 
+enum reg_type {
+	REG_TYPE_RM = 0,
+	REG_TYPE_INDEX,
+	REG_TYPE_BASE,
+};
+
+static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
+			     enum reg_type type)
+{
+	int regno = 0;
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+	unsigned char sib = (unsigned char)insn->sib.value;
+
+	static const int regoff[] = {
+		offsetof(struct pt_regs, ax),
+		offsetof(struct pt_regs, cx),
+		offsetof(struct pt_regs, dx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, sp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+		offsetof(struct pt_regs, r8),
+		offsetof(struct pt_regs, r9),
+		offsetof(struct pt_regs, r10),
+		offsetof(struct pt_regs, r11),
+		offsetof(struct pt_regs, r12),
+		offsetof(struct pt_regs, r13),
+		offsetof(struct pt_regs, r14),
+		offsetof(struct pt_regs, r15),
+#endif
+	};
+
+	switch (type) {
+	case REG_TYPE_RM:
+		regno = X86_MODRM_RM(modrm);
+		if (X86_REX_B(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	case REG_TYPE_INDEX:
+		regno = X86_SIB_INDEX(sib);
+		if (X86_REX_X(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	case REG_TYPE_BASE:
+		regno = X86_SIB_BASE(sib);
+		if (X86_REX_B(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	default:
+		break;
+	}
+
+	return regs_get_register(regs, regoff[regno]);
+}
+
+/*
+ * return the address being referenced be instruction
+ * for rm=3 returning the content of the rm reg
+ * for rm!=3 calculates the address using SIB and Disp
+ */
+static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs)
+{
+	unsigned long addr;
+	unsigned long base;
+	unsigned long indx;
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+	unsigned char sib = (unsigned char)insn->sib.value;
+
+	if (X86_MODRM_MOD(modrm) == 3) {
+		addr = get_reg(insn, regs, REG_TYPE_RM);
+	} else {
+		if (insn->sib.nbytes) {
+			base = get_reg(insn, regs, REG_TYPE_BASE);
+			indx = get_reg(insn, regs, REG_TYPE_INDEX);
+			addr = base + indx * (1 << X86_SIB_SCALE(sib));
+		} else {
+			addr = get_reg(insn, regs, REG_TYPE_RM);
+		}
+		addr += insn->displacement.value;
+	}
+
+	return addr;
+}
+
+/* Verify next sizeof(t) bytes can be on the same instruction */
+#define validate_next(t, insn, n)	\
+	((insn)->next_byte + sizeof(t) + n - (insn)->kaddr <= (insn)->limit)
+
+#define __get_next(t, insn)		\
+({					\
+	t r = *(t *)insn->next_byte;	\
+	insn->next_byte += sizeof(t);	\
+	r;				\
+})
+
+#define __peek_next(t, insn)		\
+({					\
+	t r = *(t *)insn->next_byte;	\
+	r;				\
+})
+
+#define get_next(t, insn)		\
+({					\
+	if (unlikely(!validate_next(t, insn, 0)))	\
+		goto err_out;		\
+	__get_next(t, insn);		\
+})
+
+#define peek_next(t, insn)		\
+({					\
+	if (unlikely(!validate_next(t, insn, 0)))	\
+		goto err_out;		\
+	__peek_next(t, insn);		\
+})
+
+static void mpx_insn_get_prefixes(struct mpx_insn *insn)
+{
+	unsigned char b;
+
+	/* Decode legacy prefix and REX prefix */
+	b = peek_next(unsigned char, insn);
+	while (b != 0x0f) {
+		/*
+		 * look for a rex prefix
+		 * a REX prefix cannot be followed by a legacy prefix.
+		 */
+		if (insn->x86_64 && ((b&0xf0) == 0x40)) {
+			insn->rex_prefix.value = b;
+			insn->rex_prefix.nbytes = 1;
+			insn->next_byte++;
+			break;
+		}
+
+		/* check the other legacy prefixes */
+		switch (b) {
+		case 0xf2:
+		case 0xf3:
+		case 0xf0:
+		case 0x64:
+		case 0x65:
+		case 0x2e:
+		case 0x3e:
+		case 0x26:
+		case 0x36:
+		case 0x66:
+		case 0x67:
+			insn->next_byte++;
+			break;
+		default: /* everything else is garbage */
+			goto err_out;
+		}
+		b = peek_next(unsigned char, insn);
+	}
+
+err_out:
+	return;
+}
+
+static void mpx_insn_get_modrm(struct mpx_insn *insn)
+{
+	insn->modrm.value = get_next(unsigned char, insn);
+	insn->modrm.nbytes = 1;
+
+err_out:
+	return;
+}
+
+static void mpx_insn_get_sib(struct mpx_insn *insn)
+{
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+
+	if (X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) {
+		insn->sib.value = get_next(unsigned char, insn);
+		insn->sib.nbytes = 1;
+	}
+
+err_out:
+	return;
+}
+
+static void mpx_insn_get_displacement(struct mpx_insn *insn)
+{
+	unsigned char mod, rm, base;
+
+	/*
+	 * Interpreting the modrm byte:
+	 * mod = 00 - no displacement fields (exceptions below)
+	 * mod = 01 - 1-byte displacement field
+	 * mod = 10 - displacement field is 4 bytes
+	 * mod = 11 - no memory operand
+	 *
+	 * mod != 11, r/m = 100 - SIB byte exists
+	 * mod = 00, SIB base = 101 - displacement field is 4 bytes
+	 * mod = 00, r/m = 101 - rip-relative addressing, displacement
+	 *	field is 4 bytes
+	 */
+	mod = X86_MODRM_MOD(insn->modrm.value);
+	rm = X86_MODRM_RM(insn->modrm.value);
+	base = X86_SIB_BASE(insn->sib.value);
+	if (mod == 3)
+		return;
+	if (mod == 1) {
+		insn->displacement.value = get_next(unsigned char, insn);
+		insn->displacement.nbytes = 1;
+	} else if ((mod == 0 && rm == 5) || mod == 2 ||
+			(mod == 0 && base == 5)) {
+		insn->displacement.value = get_next(int, insn);
+		insn->displacement.nbytes = 4;
+	}
+
+err_out:
+	return;
+}
+
+static void mpx_insn_init(struct mpx_insn *insn, struct pt_regs *regs)
+{
+	unsigned char buf[MAX_MPX_INSN_SIZE];
+	int bytes;
+
+	memset(insn, 0, sizeof(*insn));
+
+	bytes = copy_from_user(buf, (void __user *)regs->ip, MAX_MPX_INSN_SIZE);
+	insn->limit = MAX_MPX_INSN_SIZE - bytes;
+	insn->kaddr = buf;
+	insn->next_byte = buf;
+
+	/*
+	 * In 64-bit Mode, all Intel MPX instructions use 64-bit
+	 * operands for bounds and 64 bit addressing, i.e. REX.W &
+	 * 67H have no effect on data or address size.
+	 *
+	 * In compatibility and legacy modes (including 16-bit code
+	 * segments, real and virtual 8086 modes) all Intel MPX
+	 * instructions use 32-bit operands for bounds and 32 bit
+	 * addressing.
+	 */
+#ifdef CONFIG_X86_64
+	insn->x86_64 = 1;
+	insn->addr_bytes = 8;
+#else
+	insn->x86_64 = 0;
+	insn->addr_bytes = 4;
+#endif
+}
+
+static unsigned long mpx_insn_decode(struct mpx_insn *insn,
+				     struct pt_regs *regs)
+{
+	mpx_insn_init(insn, regs);
+
+	/*
+	 * In this case, we only need decode bndcl/bndcn/bndcu,
+	 * so we can use private diassembly interfaces to get
+	 * prefixes, modrm, sib, displacement, etc..
+	 */
+	mpx_insn_get_prefixes(insn);
+	insn->next_byte += 2; /* ignore opcode */
+	mpx_insn_get_modrm(insn);
+	mpx_insn_get_sib(insn);
+	mpx_insn_get_displacement(insn);
+
+	return get_addr_ref(insn, regs);
+}
+
 static int allocate_bt(long __user *bd_entry)
 {
 	unsigned long bt_addr, old_val = 0;
@@ -56,3 +325,33 @@ int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
 
 	return allocate_bt((long __user *)bd_entry);
 }
+
+/*
+ * If a bounds overflow occurs then a #BR is generated. The fault
+ * handler will decode MPX instructions to get violation address
+ * and set this address into extended struct siginfo.
+ */
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf)
+{
+	struct mpx_insn insn;
+	uint8_t bndregno;
+	unsigned long addr_vio;
+
+	addr_vio = mpx_insn_decode(&insn, regs);
+
+	bndregno = X86_MODRM_REG(insn.modrm.value);
+	if (bndregno > 3)
+		return;
+
+	/* Note: the upper 32 bits are ignored in 32-bit mode. */
+	info->si_lower = (void __user *)(unsigned long)
+		(xsave_buf->bndregs.bndregs[2*bndregno]);
+	info->si_upper = (void __user *)(unsigned long)
+		(~xsave_buf->bndregs.bndregs[2*bndregno+1]);
+	info->si_addr_lsb = 0;
+	info->si_signo = SIGSEGV;
+	info->si_errno = 0;
+	info->si_code = SEGV_BNDERR;
+	info->si_addr = (void __user *)addr_vio;
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 396a88b..93ce924 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -284,6 +284,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 	unsigned long status;
 	struct xsave_struct *xsave_buf;
 	struct task_struct *tsk = current;
+	siginfo_t info;
 
 	prev_state = exception_enter();
 	if (notify_die(DIE_TRAP, "bounds", regs, error_code,
@@ -319,6 +320,11 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 		break;
 
 	case 1: /* Bound violation. */
+		do_mpx_bounds(regs, &info, xsave_buf);
+		do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs,
+				error_code, &info);
+		break;
+
 	case 0: /* No exception caused by Intel MPX operations. */
 		do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
 		break;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-11  8:46   ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch sets bound violation fields of siginfo struct in #BR
exception handler by decoding the user instruction and constructing
the faulting pointer.

This patch does't use the generic decoder, and implements a limited
special-purpose decoder to decode MPX instructions, simply because the
generic decoder is very heavyweight not just in terms of performance
but in terms of interface -- because it has to.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/include/asm/mpx.h |   23 ++++
 arch/x86/kernel/mpx.c      |  299 ++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c    |    6 +
 3 files changed, 328 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index b7598ac..780af63 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -3,6 +3,7 @@
 
 #include <linux/types.h>
 #include <asm/ptrace.h>
+#include <asm/insn.h>
 
 #ifdef CONFIG_X86_64
 
@@ -44,15 +45,37 @@
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
 
+struct mpx_insn {
+	struct insn_field rex_prefix;	/* REX prefix */
+	struct insn_field modrm;
+	struct insn_field sib;
+	struct insn_field displacement;
+
+	unsigned char addr_bytes;	/* effective address size */
+	unsigned char limit;
+	unsigned char x86_64;
+
+	const unsigned char *kaddr;	/* kernel address of insn to analyze */
+	const unsigned char *next_byte;
+};
+
+#define MAX_MPX_INSN_SIZE	15
+
 unsigned long mpx_mmap(unsigned long len);
 
 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf);
 #else
 static inline int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
 {
 	return -EINVAL;
 }
+static inline void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf)
+{
+}
 #endif /* CONFIG_X86_INTEL_MPX */
 
 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 88d660f..7ef6e39 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -2,6 +2,275 @@
 #include <linux/syscalls.h>
 #include <asm/mpx.h>
 
+enum reg_type {
+	REG_TYPE_RM = 0,
+	REG_TYPE_INDEX,
+	REG_TYPE_BASE,
+};
+
+static unsigned long get_reg(struct mpx_insn *insn, struct pt_regs *regs,
+			     enum reg_type type)
+{
+	int regno = 0;
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+	unsigned char sib = (unsigned char)insn->sib.value;
+
+	static const int regoff[] = {
+		offsetof(struct pt_regs, ax),
+		offsetof(struct pt_regs, cx),
+		offsetof(struct pt_regs, dx),
+		offsetof(struct pt_regs, bx),
+		offsetof(struct pt_regs, sp),
+		offsetof(struct pt_regs, bp),
+		offsetof(struct pt_regs, si),
+		offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+		offsetof(struct pt_regs, r8),
+		offsetof(struct pt_regs, r9),
+		offsetof(struct pt_regs, r10),
+		offsetof(struct pt_regs, r11),
+		offsetof(struct pt_regs, r12),
+		offsetof(struct pt_regs, r13),
+		offsetof(struct pt_regs, r14),
+		offsetof(struct pt_regs, r15),
+#endif
+	};
+
+	switch (type) {
+	case REG_TYPE_RM:
+		regno = X86_MODRM_RM(modrm);
+		if (X86_REX_B(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	case REG_TYPE_INDEX:
+		regno = X86_SIB_INDEX(sib);
+		if (X86_REX_X(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	case REG_TYPE_BASE:
+		regno = X86_SIB_BASE(sib);
+		if (X86_REX_B(insn->rex_prefix.value) == 1)
+			regno += 8;
+		break;
+
+	default:
+		break;
+	}
+
+	return regs_get_register(regs, regoff[regno]);
+}
+
+/*
+ * return the address being referenced be instruction
+ * for rm=3 returning the content of the rm reg
+ * for rm!=3 calculates the address using SIB and Disp
+ */
+static unsigned long get_addr_ref(struct mpx_insn *insn, struct pt_regs *regs)
+{
+	unsigned long addr;
+	unsigned long base;
+	unsigned long indx;
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+	unsigned char sib = (unsigned char)insn->sib.value;
+
+	if (X86_MODRM_MOD(modrm) == 3) {
+		addr = get_reg(insn, regs, REG_TYPE_RM);
+	} else {
+		if (insn->sib.nbytes) {
+			base = get_reg(insn, regs, REG_TYPE_BASE);
+			indx = get_reg(insn, regs, REG_TYPE_INDEX);
+			addr = base + indx * (1 << X86_SIB_SCALE(sib));
+		} else {
+			addr = get_reg(insn, regs, REG_TYPE_RM);
+		}
+		addr += insn->displacement.value;
+	}
+
+	return addr;
+}
+
+/* Verify next sizeof(t) bytes can be on the same instruction */
+#define validate_next(t, insn, n)	\
+	((insn)->next_byte + sizeof(t) + n - (insn)->kaddr <= (insn)->limit)
+
+#define __get_next(t, insn)		\
+({					\
+	t r = *(t *)insn->next_byte;	\
+	insn->next_byte += sizeof(t);	\
+	r;				\
+})
+
+#define __peek_next(t, insn)		\
+({					\
+	t r = *(t *)insn->next_byte;	\
+	r;				\
+})
+
+#define get_next(t, insn)		\
+({					\
+	if (unlikely(!validate_next(t, insn, 0)))	\
+		goto err_out;		\
+	__get_next(t, insn);		\
+})
+
+#define peek_next(t, insn)		\
+({					\
+	if (unlikely(!validate_next(t, insn, 0)))	\
+		goto err_out;		\
+	__peek_next(t, insn);		\
+})
+
+static void mpx_insn_get_prefixes(struct mpx_insn *insn)
+{
+	unsigned char b;
+
+	/* Decode legacy prefix and REX prefix */
+	b = peek_next(unsigned char, insn);
+	while (b != 0x0f) {
+		/*
+		 * look for a rex prefix
+		 * a REX prefix cannot be followed by a legacy prefix.
+		 */
+		if (insn->x86_64 && ((b&0xf0) == 0x40)) {
+			insn->rex_prefix.value = b;
+			insn->rex_prefix.nbytes = 1;
+			insn->next_byte++;
+			break;
+		}
+
+		/* check the other legacy prefixes */
+		switch (b) {
+		case 0xf2:
+		case 0xf3:
+		case 0xf0:
+		case 0x64:
+		case 0x65:
+		case 0x2e:
+		case 0x3e:
+		case 0x26:
+		case 0x36:
+		case 0x66:
+		case 0x67:
+			insn->next_byte++;
+			break;
+		default: /* everything else is garbage */
+			goto err_out;
+		}
+		b = peek_next(unsigned char, insn);
+	}
+
+err_out:
+	return;
+}
+
+static void mpx_insn_get_modrm(struct mpx_insn *insn)
+{
+	insn->modrm.value = get_next(unsigned char, insn);
+	insn->modrm.nbytes = 1;
+
+err_out:
+	return;
+}
+
+static void mpx_insn_get_sib(struct mpx_insn *insn)
+{
+	unsigned char modrm = (unsigned char)insn->modrm.value;
+
+	if (X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) {
+		insn->sib.value = get_next(unsigned char, insn);
+		insn->sib.nbytes = 1;
+	}
+
+err_out:
+	return;
+}
+
+static void mpx_insn_get_displacement(struct mpx_insn *insn)
+{
+	unsigned char mod, rm, base;
+
+	/*
+	 * Interpreting the modrm byte:
+	 * mod = 00 - no displacement fields (exceptions below)
+	 * mod = 01 - 1-byte displacement field
+	 * mod = 10 - displacement field is 4 bytes
+	 * mod = 11 - no memory operand
+	 *
+	 * mod != 11, r/m = 100 - SIB byte exists
+	 * mod = 00, SIB base = 101 - displacement field is 4 bytes
+	 * mod = 00, r/m = 101 - rip-relative addressing, displacement
+	 *	field is 4 bytes
+	 */
+	mod = X86_MODRM_MOD(insn->modrm.value);
+	rm = X86_MODRM_RM(insn->modrm.value);
+	base = X86_SIB_BASE(insn->sib.value);
+	if (mod == 3)
+		return;
+	if (mod == 1) {
+		insn->displacement.value = get_next(unsigned char, insn);
+		insn->displacement.nbytes = 1;
+	} else if ((mod == 0 && rm == 5) || mod == 2 ||
+			(mod == 0 && base == 5)) {
+		insn->displacement.value = get_next(int, insn);
+		insn->displacement.nbytes = 4;
+	}
+
+err_out:
+	return;
+}
+
+static void mpx_insn_init(struct mpx_insn *insn, struct pt_regs *regs)
+{
+	unsigned char buf[MAX_MPX_INSN_SIZE];
+	int bytes;
+
+	memset(insn, 0, sizeof(*insn));
+
+	bytes = copy_from_user(buf, (void __user *)regs->ip, MAX_MPX_INSN_SIZE);
+	insn->limit = MAX_MPX_INSN_SIZE - bytes;
+	insn->kaddr = buf;
+	insn->next_byte = buf;
+
+	/*
+	 * In 64-bit Mode, all Intel MPX instructions use 64-bit
+	 * operands for bounds and 64 bit addressing, i.e. REX.W &
+	 * 67H have no effect on data or address size.
+	 *
+	 * In compatibility and legacy modes (including 16-bit code
+	 * segments, real and virtual 8086 modes) all Intel MPX
+	 * instructions use 32-bit operands for bounds and 32 bit
+	 * addressing.
+	 */
+#ifdef CONFIG_X86_64
+	insn->x86_64 = 1;
+	insn->addr_bytes = 8;
+#else
+	insn->x86_64 = 0;
+	insn->addr_bytes = 4;
+#endif
+}
+
+static unsigned long mpx_insn_decode(struct mpx_insn *insn,
+				     struct pt_regs *regs)
+{
+	mpx_insn_init(insn, regs);
+
+	/*
+	 * In this case, we only need decode bndcl/bndcn/bndcu,
+	 * so we can use private diassembly interfaces to get
+	 * prefixes, modrm, sib, displacement, etc..
+	 */
+	mpx_insn_get_prefixes(insn);
+	insn->next_byte += 2; /* ignore opcode */
+	mpx_insn_get_modrm(insn);
+	mpx_insn_get_sib(insn);
+	mpx_insn_get_displacement(insn);
+
+	return get_addr_ref(insn, regs);
+}
+
 static int allocate_bt(long __user *bd_entry)
 {
 	unsigned long bt_addr, old_val = 0;
@@ -56,3 +325,33 @@ int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
 
 	return allocate_bt((long __user *)bd_entry);
 }
+
+/*
+ * If a bounds overflow occurs then a #BR is generated. The fault
+ * handler will decode MPX instructions to get violation address
+ * and set this address into extended struct siginfo.
+ */
+void do_mpx_bounds(struct pt_regs *regs, siginfo_t *info,
+		struct xsave_struct *xsave_buf)
+{
+	struct mpx_insn insn;
+	uint8_t bndregno;
+	unsigned long addr_vio;
+
+	addr_vio = mpx_insn_decode(&insn, regs);
+
+	bndregno = X86_MODRM_REG(insn.modrm.value);
+	if (bndregno > 3)
+		return;
+
+	/* Note: the upper 32 bits are ignored in 32-bit mode. */
+	info->si_lower = (void __user *)(unsigned long)
+		(xsave_buf->bndregs.bndregs[2*bndregno]);
+	info->si_upper = (void __user *)(unsigned long)
+		(~xsave_buf->bndregs.bndregs[2*bndregno+1]);
+	info->si_addr_lsb = 0;
+	info->si_signo = SIGSEGV;
+	info->si_errno = 0;
+	info->si_code = SEGV_BNDERR;
+	info->si_addr = (void __user *)addr_vio;
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 396a88b..93ce924 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -284,6 +284,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 	unsigned long status;
 	struct xsave_struct *xsave_buf;
 	struct task_struct *tsk = current;
+	siginfo_t info;
 
 	prev_state = exception_enter();
 	if (notify_die(DIE_TRAP, "bounds", regs, error_code,
@@ -319,6 +320,11 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 		break;
 
 	case 1: /* Bound violation. */
+		do_mpx_bounds(regs, &info, xsave_buf);
+		do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs,
+				error_code, &info);
+		break;
+
 	case 0: /* No exception caused by Intel MPX operations. */
 		do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
 		break;
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-11  8:46   ` Qiaowei Ren
  -1 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
commands. These commands can be used to register and unregister MPX
related resource on the x86 platform.

The base of the bounds directory is set into mm_struct during
PR_MPX_REGISTER command execution. This member can be used to
check whether one application is mpx enabled.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/include/asm/mpx.h       |    1 +
 arch/x86/include/asm/processor.h |   18 ++++++++++++
 arch/x86/kernel/mpx.c            |   55 ++++++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h         |    3 ++
 include/uapi/linux/prctl.h       |    6 ++++
 kernel/sys.c                     |   12 ++++++++
 6 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 780af63..6cb0853 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -43,6 +43,7 @@
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
 
 #define MPX_BNDSTA_ERROR_CODE	0x3
+#define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
 
 struct mpx_insn {
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index eb71ec7..b801fea 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -953,6 +953,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
 extern int get_tsc_mode(unsigned long adr);
 extern int set_tsc_mode(unsigned int val);
 
+/* Register/unregister a process' MPX related resource */
+#define MPX_REGISTER(tsk)	mpx_register((tsk))
+#define MPX_UNREGISTER(tsk)	mpx_unregister((tsk))
+
+#ifdef CONFIG_X86_INTEL_MPX
+extern int mpx_register(struct task_struct *tsk);
+extern int mpx_unregister(struct task_struct *tsk);
+#else
+static inline int mpx_register(struct task_struct *tsk)
+{
+	return -EINVAL;
+}
+static inline int mpx_unregister(struct task_struct *tsk)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_MPX */
+
 extern u16 amd_get_nb_id(int cpu);
 
 static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 7ef6e39..b86873a 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -1,6 +1,61 @@
 #include <linux/kernel.h>
 #include <linux/syscalls.h>
+#include <linux/prctl.h>
 #include <asm/mpx.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
+
+/*
+ * This should only be called when cpuid has been checked
+ * and we are sure that MPX is available.
+ */
+static __user void *task_get_bounds_dir(struct task_struct *tsk)
+{
+	struct xsave_struct *xsave_buf;
+
+	fpu_xsave(&tsk->thread.fpu);
+	xsave_buf = &(tsk->thread.fpu.state->xsave);
+	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
+		return NULL;
+
+	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
+			MPX_BNDCFG_ADDR_MASK);
+}
+
+int mpx_register(struct task_struct *tsk)
+{
+	struct mm_struct *mm = tsk->mm;
+
+	if (!cpu_has_mpx)
+		return -EINVAL;
+
+	/*
+	 * runtime in the userspace will be responsible for allocation of
+	 * the bounds directory. Then, it will save the base of the bounds
+	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
+	 * XRSTOR instruction.
+	 *
+	 * fpu_xsave() is expected to be very expensive. In order to do
+	 * performance optimization, here we get the base of the bounds
+	 * directory and then save it into mm_struct to be used in future.
+	 */
+	mm->bd_addr = task_get_bounds_dir(tsk);
+	if (!mm->bd_addr)
+		return -EINVAL;
+
+	return 0;
+}
+
+int mpx_unregister(struct task_struct *tsk)
+{
+	struct mm_struct *mm = current->mm;
+
+	if (!cpu_has_mpx)
+		return -EINVAL;
+
+	mm->bd_addr = NULL;
+	return 0;
+}
 
 enum reg_type {
 	REG_TYPE_RM = 0,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6e0b286..760aee3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -454,6 +454,9 @@ struct mm_struct {
 	bool tlb_flush_pending;
 #endif
 	struct uprobes_state uprobes_state;
+#ifdef CONFIG_X86_INTEL_MPX
+	void __user *bd_addr;		/* address of the bounds directory */
+#endif
 };
 
 static inline void mm_init_cpumask(struct mm_struct *mm)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 58afc04..ce86fa9 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -152,4 +152,10 @@
 #define PR_SET_THP_DISABLE	41
 #define PR_GET_THP_DISABLE	42
 
+/*
+ * Register/unregister MPX related resource.
+ */
+#define PR_MPX_REGISTER		43
+#define PR_MPX_UNREGISTER	44
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index ce81291..9a43587 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -91,6 +91,12 @@
 #ifndef SET_TSC_CTL
 # define SET_TSC_CTL(a)		(-EINVAL)
 #endif
+#ifndef MPX_REGISTER
+# define MPX_REGISTER(a)	(-EINVAL)
+#endif
+#ifndef MPX_UNREGISTER
+# define MPX_UNREGISTER(a)	(-EINVAL)
+#endif
 
 /*
  * this is where the system-wide overflow UID and GID are defined, for
@@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			me->mm->def_flags &= ~VM_NOHUGEPAGE;
 		up_write(&me->mm->mmap_sem);
 		break;
+	case PR_MPX_REGISTER:
+		error = MPX_REGISTER(me);
+		break;
+	case PR_MPX_UNREGISTER:
+		error = MPX_UNREGISTER(me);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-11  8:46   ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
commands. These commands can be used to register and unregister MPX
related resource on the x86 platform.

The base of the bounds directory is set into mm_struct during
PR_MPX_REGISTER command execution. This member can be used to
check whether one application is mpx enabled.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/include/asm/mpx.h       |    1 +
 arch/x86/include/asm/processor.h |   18 ++++++++++++
 arch/x86/kernel/mpx.c            |   55 ++++++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h         |    3 ++
 include/uapi/linux/prctl.h       |    6 ++++
 kernel/sys.c                     |   12 ++++++++
 6 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 780af63..6cb0853 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -43,6 +43,7 @@
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
 
 #define MPX_BNDSTA_ERROR_CODE	0x3
+#define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
 
 struct mpx_insn {
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index eb71ec7..b801fea 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -953,6 +953,24 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
 extern int get_tsc_mode(unsigned long adr);
 extern int set_tsc_mode(unsigned int val);
 
+/* Register/unregister a process' MPX related resource */
+#define MPX_REGISTER(tsk)	mpx_register((tsk))
+#define MPX_UNREGISTER(tsk)	mpx_unregister((tsk))
+
+#ifdef CONFIG_X86_INTEL_MPX
+extern int mpx_register(struct task_struct *tsk);
+extern int mpx_unregister(struct task_struct *tsk);
+#else
+static inline int mpx_register(struct task_struct *tsk)
+{
+	return -EINVAL;
+}
+static inline int mpx_unregister(struct task_struct *tsk)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_X86_INTEL_MPX */
+
 extern u16 amd_get_nb_id(int cpu);
 
 static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
diff --git a/arch/x86/kernel/mpx.c b/arch/x86/kernel/mpx.c
index 7ef6e39..b86873a 100644
--- a/arch/x86/kernel/mpx.c
+++ b/arch/x86/kernel/mpx.c
@@ -1,6 +1,61 @@
 #include <linux/kernel.h>
 #include <linux/syscalls.h>
+#include <linux/prctl.h>
 #include <asm/mpx.h>
+#include <asm/i387.h>
+#include <asm/fpu-internal.h>
+
+/*
+ * This should only be called when cpuid has been checked
+ * and we are sure that MPX is available.
+ */
+static __user void *task_get_bounds_dir(struct task_struct *tsk)
+{
+	struct xsave_struct *xsave_buf;
+
+	fpu_xsave(&tsk->thread.fpu);
+	xsave_buf = &(tsk->thread.fpu.state->xsave);
+	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
+		return NULL;
+
+	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
+			MPX_BNDCFG_ADDR_MASK);
+}
+
+int mpx_register(struct task_struct *tsk)
+{
+	struct mm_struct *mm = tsk->mm;
+
+	if (!cpu_has_mpx)
+		return -EINVAL;
+
+	/*
+	 * runtime in the userspace will be responsible for allocation of
+	 * the bounds directory. Then, it will save the base of the bounds
+	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
+	 * XRSTOR instruction.
+	 *
+	 * fpu_xsave() is expected to be very expensive. In order to do
+	 * performance optimization, here we get the base of the bounds
+	 * directory and then save it into mm_struct to be used in future.
+	 */
+	mm->bd_addr = task_get_bounds_dir(tsk);
+	if (!mm->bd_addr)
+		return -EINVAL;
+
+	return 0;
+}
+
+int mpx_unregister(struct task_struct *tsk)
+{
+	struct mm_struct *mm = current->mm;
+
+	if (!cpu_has_mpx)
+		return -EINVAL;
+
+	mm->bd_addr = NULL;
+	return 0;
+}
 
 enum reg_type {
 	REG_TYPE_RM = 0,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6e0b286..760aee3 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -454,6 +454,9 @@ struct mm_struct {
 	bool tlb_flush_pending;
 #endif
 	struct uprobes_state uprobes_state;
+#ifdef CONFIG_X86_INTEL_MPX
+	void __user *bd_addr;		/* address of the bounds directory */
+#endif
 };
 
 static inline void mm_init_cpumask(struct mm_struct *mm)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 58afc04..ce86fa9 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -152,4 +152,10 @@
 #define PR_SET_THP_DISABLE	41
 #define PR_GET_THP_DISABLE	42
 
+/*
+ * Register/unregister MPX related resource.
+ */
+#define PR_MPX_REGISTER		43
+#define PR_MPX_UNREGISTER	44
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index ce81291..9a43587 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -91,6 +91,12 @@
 #ifndef SET_TSC_CTL
 # define SET_TSC_CTL(a)		(-EINVAL)
 #endif
+#ifndef MPX_REGISTER
+# define MPX_REGISTER(a)	(-EINVAL)
+#endif
+#ifndef MPX_UNREGISTER
+# define MPX_UNREGISTER(a)	(-EINVAL)
+#endif
 
 /*
  * this is where the system-wide overflow UID and GID are defined, for
@@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			me->mm->def_flags &= ~VM_NOHUGEPAGE;
 		up_write(&me->mm->mmap_sem);
 		break;
+	case PR_MPX_REGISTER:
+		error = MPX_REGISTER(me);
+		break;
+	case PR_MPX_UNREGISTER:
+		error = MPX_UNREGISTER(me);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-11  8:46   ` Qiaowei Ren
  -1 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

Since the kernel allocated those tables on-demand without userspace
knowledge, it is also responsible for freeing them when the associated
mappings go away.

Here, the solution for this issue is to hook do_munmap() to check
whether one process is MPX enabled. If yes, those bounds tables covered
in the virtual address region which is being unmapped will be freed also.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/include/asm/mmu_context.h |   16 +++
 arch/x86/include/asm/mpx.h         |    9 ++
 arch/x86/mm/mpx.c                  |  252 ++++++++++++++++++++++++++++++++++++
 include/asm-generic/mmu_context.h  |    6 +
 mm/mmap.c                          |    2 +
 5 files changed, 285 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 166af2a..d13e01c 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -10,6 +10,7 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
+#include <asm/mpx.h>
 #ifndef CONFIG_PARAVIRT
 #include <asm-generic/mm_hooks.h>
 
@@ -102,4 +103,19 @@ do {						\
 } while (0)
 #endif
 
+static inline void arch_unmap(struct mm_struct *mm,
+		struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+#ifdef CONFIG_X86_INTEL_MPX
+	/*
+	 * Check whether this vma comes from MPX-enabled application.
+	 * If so, release this vma related bound tables.
+	 */
+	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
+		mpx_unmap(mm, start, end);
+
+#endif
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 6cb0853..e848a74 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -42,6 +42,13 @@
 #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
 
+#define MPX_BD_ENTRY_MASK	((1<<MPX_BD_ENTRY_OFFSET)-1)
+#define MPX_BT_ENTRY_MASK	((1<<MPX_BT_ENTRY_OFFSET)-1)
+#define MPX_GET_BD_ENTRY_OFFSET(addr)	((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
+		MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT)
+#define MPX_GET_BT_ENTRY_OFFSET(addr)	((((addr)>>MPX_IGN_BITS) & \
+		MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
+
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
@@ -63,6 +70,8 @@ struct mpx_insn {
 #define MAX_MPX_INSN_SIZE	15
 
 unsigned long mpx_mmap(unsigned long len);
+void mpx_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end);
 
 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index e1b28e6..feb1f01 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -1,7 +1,16 @@
+/*
+ * mpx.c - Memory Protection eXtensions
+ *
+ * Copyright (c) 2014, Intel Corporation.
+ * Qiaowei Ren <qiaowei.ren@intel.com>
+ * Dave Hansen <dave.hansen@intel.com>
+ */
+
 #include <linux/kernel.h>
 #include <linux/syscalls.h>
 #include <asm/mpx.h>
 #include <asm/mman.h>
+#include <asm/mmu_context.h>
 #include <linux/sched/sysctl.h>
 
 static const char *mpx_mapping_name(struct vm_area_struct *vma)
@@ -77,3 +86,246 @@ out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
+
+/*
+ * Get the base of bounds tables pointed by specific bounds
+ * directory entry.
+ */
+static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr)
+{
+	int valid;
+
+	if (!access_ok(VERIFY_READ, (bd_entry), sizeof(*(bd_entry))))
+		return -EFAULT;
+
+	pagefault_disable();
+	if (get_user(*bt_addr, bd_entry))
+		goto out;
+	pagefault_enable();
+
+	valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
+	*bt_addr &= MPX_BT_ADDR_MASK;
+
+	/*
+	 * If this bounds directory entry is nonzero, and meanwhile
+	 * the valid bit is zero, one SIGSEGV will be produced due to
+	 * this unexpected situation.
+	 */
+	if (!valid && *bt_addr)
+		return -EINVAL;
+	if (!valid)
+		return -ENOENT;
+
+	return 0;
+
+out:
+	pagefault_enable();
+	return -EFAULT;
+}
+
+/*
+ * Free the backing physical pages of bounds table 'bt_addr'.
+ * Assume start...end is within that bounds table.
+ */
+static int __must_check zap_bt_entries(struct mm_struct *mm,
+		unsigned long bt_addr,
+		unsigned long start, unsigned long end)
+{
+	struct vm_area_struct *vma;
+
+	/* Find the vma which overlaps this bounds table */
+	vma = find_vma(mm, bt_addr);
+	/*
+	 * The table entry comes from userspace and could be
+	 * pointing anywhere, so make sure it is at least
+	 * pointing to valid memory.
+	 */
+	if (!vma || !(vma->vm_flags & VM_MPX) ||
+			vma->vm_start > bt_addr ||
+			vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
+		return -EINVAL;
+
+	zap_page_range(vma, start, end - start, NULL);
+	return 0;
+}
+
+static int __must_check unmap_single_bt(struct mm_struct *mm,
+		long __user *bd_entry, unsigned long bt_addr)
+{
+	int ret;
+
+	pagefault_disable();
+	ret = user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry,
+			bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0);
+	pagefault_enable();
+	if (ret)
+		return -EFAULT;
+
+	/*
+	 * to avoid recursion, do_munmap() will check whether it comes
+	 * from one bounds table through VM_MPX flag.
+	 */
+	return do_munmap(mm, bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+}
+
+/*
+ * If the bounds table pointed by bounds directory 'bd_entry' is
+ * not shared, unmap this whole bounds table. Otherwise, only free
+ * those backing physical pages of bounds table entries covered
+ * in this virtual address region start...end.
+ */
+static int __must_check unmap_shared_bt(struct mm_struct *mm,
+		long __user *bd_entry, unsigned long start,
+		unsigned long end, bool prev_shared, bool next_shared)
+{
+	unsigned long bt_addr;
+	int ret;
+
+	ret = get_bt_addr(bd_entry, &bt_addr);
+	if (ret)
+		return ret;
+
+	if (prev_shared && next_shared)
+		ret = zap_bt_entries(mm, bt_addr,
+				bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
+				bt_addr+MPX_GET_BT_ENTRY_OFFSET(end));
+	else if (prev_shared)
+		ret = zap_bt_entries(mm, bt_addr,
+				bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
+				bt_addr+MPX_BT_SIZE_BYTES);
+	else if (next_shared)
+		ret = zap_bt_entries(mm, bt_addr, bt_addr,
+				bt_addr+MPX_GET_BT_ENTRY_OFFSET(end));
+	else
+		ret = unmap_single_bt(mm, bd_entry, bt_addr);
+
+	return ret;
+}
+
+/*
+ * A virtual address region being munmap()ed might share bounds table
+ * with adjacent VMAs. We only need to free the backing physical
+ * memory of these shared bounds tables entries covered in this virtual
+ * address region.
+ *
+ * the VMAs covering the virtual address region start...end have already
+ * been split if necessary and removed from the VMA list.
+ */
+static int __must_check unmap_side_bts(struct mm_struct *mm,
+		unsigned long start, unsigned long end)
+{
+	int ret;
+	long __user *bde_start, *bde_end;
+	struct vm_area_struct *prev, *next;
+	bool prev_shared = false, next_shared = false;
+
+	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
+	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
+
+	/*
+	 * Check whether bde_start and bde_end are shared with adjacent
+	 * VMAs. Because the VMAs covering the virtual address region
+	 * start...end have already been removed from the VMA list, if
+	 * next is not NULL it will satisfy start < end <= next->vm_start.
+	 * And if prev is not NULL, prev->vm_end <= start < end.
+	 */
+	next = find_vma_prev(mm, start, &prev);
+	if (prev && (mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(prev->vm_end-1))
+			== bde_start)
+		prev_shared = true;
+	if (next && (mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(next->vm_start))
+			== bde_end)
+		next_shared = true;
+
+	/*
+	 * This virtual address region being munmap()ed is only
+	 * covered by one bounds table.
+	 *
+	 * In this case, if this table is also shared with adjacent
+	 * VMAs, only part of the backing physical memory of the bounds
+	 * table need be freeed. Otherwise the whole bounds table need
+	 * be unmapped.
+	 */
+	if (bde_start == bde_end) {
+		return unmap_shared_bt(mm, bde_start, start, end,
+				prev_shared, next_shared);
+	}
+
+	/*
+	 * If more than one bounds tables are covered in this virtual
+	 * address region being munmap()ed, we need to separately check
+	 * whether bde_start and bde_end are shared with adjacent VMAs.
+	 */
+	ret = unmap_shared_bt(mm, bde_start, start, end, prev_shared, false);
+	if (ret)
+		return ret;
+
+	ret = unmap_shared_bt(mm, bde_end, start, end, false, next_shared);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int __must_check mpx_try_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end)
+{
+	int ret;
+	long __user *bd_entry, *bde_start, *bde_end;
+	unsigned long bt_addr;
+
+	/*
+	 * unmap bounds tables pointed out by start/end bounds directory
+	 * entries, or only free part of their backing physical memroy
+	 * if they are shared with adjacent VMAs.
+	 */
+	ret = unmap_side_bts(mm, start, end);
+	if (ret == -EFAULT)
+		return ret;
+
+	/*
+	 * unmap those bounds table which are entirely covered in this
+	 * virtual address region.
+	 */
+	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
+	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
+	for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) {
+		ret = get_bt_addr(bd_entry, &bt_addr);
+		/*
+		 * A fault means we have to drop mmap_sem,
+		 * perform the fault, and retry this somehow.
+		 */
+		if (ret == -EFAULT)
+			return ret;
+		/*
+		 * Any other issue (like a bad bounds-directory)
+		 * we can try the next one.
+		 */
+		if (ret)
+			continue;
+
+		ret = unmap_single_bt(mm, bd_entry, bt_addr);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * Free unused bounds tables covered in a virtual address region being
+ * munmap()ed. Assume end > start.
+ *
+ * This function will be called by do_munmap(), and the VMAs covering
+ * the virtual address region start...end have already been split if
+ * necessary and remvoed from the VMA list.
+ */
+void mpx_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end)
+{
+	int ret;
+
+	ret = mpx_try_unmap(mm, start, end);
+	if (ret == -EINVAL)
+		force_sig(SIGSEGV, current);
+}
diff --git a/include/asm-generic/mmu_context.h b/include/asm-generic/mmu_context.h
index a7eec91..ac558ca 100644
--- a/include/asm-generic/mmu_context.h
+++ b/include/asm-generic/mmu_context.h
@@ -42,4 +42,10 @@ static inline void activate_mm(struct mm_struct *prev_mm,
 {
 }
 
+static inline void arch_unmap(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			unsigned long start, unsigned long end)
+{
+}
+
 #endif /* __ASM_GENERIC_MMU_CONTEXT_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index c1f2ea4..abe533f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2583,6 +2583,8 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 	/* Fix up all other VM information */
 	remove_vma_list(mm, vma);
 
+	arch_unmap(mm, vma, start, end);
+
 	return 0;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
@ 2014-09-11  8:46   ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

Since the kernel allocated those tables on-demand without userspace
knowledge, it is also responsible for freeing them when the associated
mappings go away.

Here, the solution for this issue is to hook do_munmap() to check
whether one process is MPX enabled. If yes, those bounds tables covered
in the virtual address region which is being unmapped will be freed also.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 arch/x86/include/asm/mmu_context.h |   16 +++
 arch/x86/include/asm/mpx.h         |    9 ++
 arch/x86/mm/mpx.c                  |  252 ++++++++++++++++++++++++++++++++++++
 include/asm-generic/mmu_context.h  |    6 +
 mm/mmap.c                          |    2 +
 5 files changed, 285 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 166af2a..d13e01c 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -10,6 +10,7 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
+#include <asm/mpx.h>
 #ifndef CONFIG_PARAVIRT
 #include <asm-generic/mm_hooks.h>
 
@@ -102,4 +103,19 @@ do {						\
 } while (0)
 #endif
 
+static inline void arch_unmap(struct mm_struct *mm,
+		struct vm_area_struct *vma,
+		unsigned long start, unsigned long end)
+{
+#ifdef CONFIG_X86_INTEL_MPX
+	/*
+	 * Check whether this vma comes from MPX-enabled application.
+	 * If so, release this vma related bound tables.
+	 */
+	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
+		mpx_unmap(mm, start, end);
+
+#endif
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 6cb0853..e848a74 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -42,6 +42,13 @@
 #define MPX_BD_SIZE_BYTES (1UL<<(MPX_BD_ENTRY_OFFSET+MPX_BD_ENTRY_SHIFT))
 #define MPX_BT_SIZE_BYTES (1UL<<(MPX_BT_ENTRY_OFFSET+MPX_BT_ENTRY_SHIFT))
 
+#define MPX_BD_ENTRY_MASK	((1<<MPX_BD_ENTRY_OFFSET)-1)
+#define MPX_BT_ENTRY_MASK	((1<<MPX_BT_ENTRY_OFFSET)-1)
+#define MPX_GET_BD_ENTRY_OFFSET(addr)	((((addr)>>(MPX_BT_ENTRY_OFFSET+ \
+		MPX_IGN_BITS)) & MPX_BD_ENTRY_MASK) << MPX_BD_ENTRY_SHIFT)
+#define MPX_GET_BT_ENTRY_OFFSET(addr)	((((addr)>>MPX_IGN_BITS) & \
+		MPX_BT_ENTRY_MASK) << MPX_BT_ENTRY_SHIFT)
+
 #define MPX_BNDSTA_ERROR_CODE	0x3
 #define MPX_BNDCFG_ENABLE_FLAG	0x1
 #define MPX_BD_ENTRY_VALID_FLAG	0x1
@@ -63,6 +70,8 @@ struct mpx_insn {
 #define MAX_MPX_INSN_SIZE	15
 
 unsigned long mpx_mmap(unsigned long len);
+void mpx_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end);
 
 #ifdef CONFIG_X86_INTEL_MPX
 int do_mpx_bt_fault(struct xsave_struct *xsave_buf);
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index e1b28e6..feb1f01 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -1,7 +1,16 @@
+/*
+ * mpx.c - Memory Protection eXtensions
+ *
+ * Copyright (c) 2014, Intel Corporation.
+ * Qiaowei Ren <qiaowei.ren@intel.com>
+ * Dave Hansen <dave.hansen@intel.com>
+ */
+
 #include <linux/kernel.h>
 #include <linux/syscalls.h>
 #include <asm/mpx.h>
 #include <asm/mman.h>
+#include <asm/mmu_context.h>
 #include <linux/sched/sysctl.h>
 
 static const char *mpx_mapping_name(struct vm_area_struct *vma)
@@ -77,3 +86,246 @@ out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
+
+/*
+ * Get the base of bounds tables pointed by specific bounds
+ * directory entry.
+ */
+static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr)
+{
+	int valid;
+
+	if (!access_ok(VERIFY_READ, (bd_entry), sizeof(*(bd_entry))))
+		return -EFAULT;
+
+	pagefault_disable();
+	if (get_user(*bt_addr, bd_entry))
+		goto out;
+	pagefault_enable();
+
+	valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
+	*bt_addr &= MPX_BT_ADDR_MASK;
+
+	/*
+	 * If this bounds directory entry is nonzero, and meanwhile
+	 * the valid bit is zero, one SIGSEGV will be produced due to
+	 * this unexpected situation.
+	 */
+	if (!valid && *bt_addr)
+		return -EINVAL;
+	if (!valid)
+		return -ENOENT;
+
+	return 0;
+
+out:
+	pagefault_enable();
+	return -EFAULT;
+}
+
+/*
+ * Free the backing physical pages of bounds table 'bt_addr'.
+ * Assume start...end is within that bounds table.
+ */
+static int __must_check zap_bt_entries(struct mm_struct *mm,
+		unsigned long bt_addr,
+		unsigned long start, unsigned long end)
+{
+	struct vm_area_struct *vma;
+
+	/* Find the vma which overlaps this bounds table */
+	vma = find_vma(mm, bt_addr);
+	/*
+	 * The table entry comes from userspace and could be
+	 * pointing anywhere, so make sure it is at least
+	 * pointing to valid memory.
+	 */
+	if (!vma || !(vma->vm_flags & VM_MPX) ||
+			vma->vm_start > bt_addr ||
+			vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
+		return -EINVAL;
+
+	zap_page_range(vma, start, end - start, NULL);
+	return 0;
+}
+
+static int __must_check unmap_single_bt(struct mm_struct *mm,
+		long __user *bd_entry, unsigned long bt_addr)
+{
+	int ret;
+
+	pagefault_disable();
+	ret = user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry,
+			bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0);
+	pagefault_enable();
+	if (ret)
+		return -EFAULT;
+
+	/*
+	 * to avoid recursion, do_munmap() will check whether it comes
+	 * from one bounds table through VM_MPX flag.
+	 */
+	return do_munmap(mm, bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
+}
+
+/*
+ * If the bounds table pointed by bounds directory 'bd_entry' is
+ * not shared, unmap this whole bounds table. Otherwise, only free
+ * those backing physical pages of bounds table entries covered
+ * in this virtual address region start...end.
+ */
+static int __must_check unmap_shared_bt(struct mm_struct *mm,
+		long __user *bd_entry, unsigned long start,
+		unsigned long end, bool prev_shared, bool next_shared)
+{
+	unsigned long bt_addr;
+	int ret;
+
+	ret = get_bt_addr(bd_entry, &bt_addr);
+	if (ret)
+		return ret;
+
+	if (prev_shared && next_shared)
+		ret = zap_bt_entries(mm, bt_addr,
+				bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
+				bt_addr+MPX_GET_BT_ENTRY_OFFSET(end));
+	else if (prev_shared)
+		ret = zap_bt_entries(mm, bt_addr,
+				bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
+				bt_addr+MPX_BT_SIZE_BYTES);
+	else if (next_shared)
+		ret = zap_bt_entries(mm, bt_addr, bt_addr,
+				bt_addr+MPX_GET_BT_ENTRY_OFFSET(end));
+	else
+		ret = unmap_single_bt(mm, bd_entry, bt_addr);
+
+	return ret;
+}
+
+/*
+ * A virtual address region being munmap()ed might share bounds table
+ * with adjacent VMAs. We only need to free the backing physical
+ * memory of these shared bounds tables entries covered in this virtual
+ * address region.
+ *
+ * the VMAs covering the virtual address region start...end have already
+ * been split if necessary and removed from the VMA list.
+ */
+static int __must_check unmap_side_bts(struct mm_struct *mm,
+		unsigned long start, unsigned long end)
+{
+	int ret;
+	long __user *bde_start, *bde_end;
+	struct vm_area_struct *prev, *next;
+	bool prev_shared = false, next_shared = false;
+
+	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
+	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
+
+	/*
+	 * Check whether bde_start and bde_end are shared with adjacent
+	 * VMAs. Because the VMAs covering the virtual address region
+	 * start...end have already been removed from the VMA list, if
+	 * next is not NULL it will satisfy start < end <= next->vm_start.
+	 * And if prev is not NULL, prev->vm_end <= start < end.
+	 */
+	next = find_vma_prev(mm, start, &prev);
+	if (prev && (mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(prev->vm_end-1))
+			== bde_start)
+		prev_shared = true;
+	if (next && (mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(next->vm_start))
+			== bde_end)
+		next_shared = true;
+
+	/*
+	 * This virtual address region being munmap()ed is only
+	 * covered by one bounds table.
+	 *
+	 * In this case, if this table is also shared with adjacent
+	 * VMAs, only part of the backing physical memory of the bounds
+	 * table need be freeed. Otherwise the whole bounds table need
+	 * be unmapped.
+	 */
+	if (bde_start == bde_end) {
+		return unmap_shared_bt(mm, bde_start, start, end,
+				prev_shared, next_shared);
+	}
+
+	/*
+	 * If more than one bounds tables are covered in this virtual
+	 * address region being munmap()ed, we need to separately check
+	 * whether bde_start and bde_end are shared with adjacent VMAs.
+	 */
+	ret = unmap_shared_bt(mm, bde_start, start, end, prev_shared, false);
+	if (ret)
+		return ret;
+
+	ret = unmap_shared_bt(mm, bde_end, start, end, false, next_shared);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int __must_check mpx_try_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end)
+{
+	int ret;
+	long __user *bd_entry, *bde_start, *bde_end;
+	unsigned long bt_addr;
+
+	/*
+	 * unmap bounds tables pointed out by start/end bounds directory
+	 * entries, or only free part of their backing physical memroy
+	 * if they are shared with adjacent VMAs.
+	 */
+	ret = unmap_side_bts(mm, start, end);
+	if (ret == -EFAULT)
+		return ret;
+
+	/*
+	 * unmap those bounds table which are entirely covered in this
+	 * virtual address region.
+	 */
+	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
+	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
+	for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) {
+		ret = get_bt_addr(bd_entry, &bt_addr);
+		/*
+		 * A fault means we have to drop mmap_sem,
+		 * perform the fault, and retry this somehow.
+		 */
+		if (ret == -EFAULT)
+			return ret;
+		/*
+		 * Any other issue (like a bad bounds-directory)
+		 * we can try the next one.
+		 */
+		if (ret)
+			continue;
+
+		ret = unmap_single_bt(mm, bd_entry, bt_addr);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * Free unused bounds tables covered in a virtual address region being
+ * munmap()ed. Assume end > start.
+ *
+ * This function will be called by do_munmap(), and the VMAs covering
+ * the virtual address region start...end have already been split if
+ * necessary and remvoed from the VMA list.
+ */
+void mpx_unmap(struct mm_struct *mm,
+		unsigned long start, unsigned long end)
+{
+	int ret;
+
+	ret = mpx_try_unmap(mm, start, end);
+	if (ret == -EINVAL)
+		force_sig(SIGSEGV, current);
+}
diff --git a/include/asm-generic/mmu_context.h b/include/asm-generic/mmu_context.h
index a7eec91..ac558ca 100644
--- a/include/asm-generic/mmu_context.h
+++ b/include/asm-generic/mmu_context.h
@@ -42,4 +42,10 @@ static inline void activate_mm(struct mm_struct *prev_mm,
 {
 }
 
+static inline void arch_unmap(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			unsigned long start, unsigned long end)
+{
+}
+
 #endif /* __ASM_GENERIC_MMU_CONTEXT_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index c1f2ea4..abe533f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2583,6 +2583,8 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 	/* Fix up all other VM information */
 	remove_vma_list(mm, vma);
 
+	arch_unmap(mm, vma, start, end);
+
 	return 0;
 }
 
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 10/10] x86, mpx: add documentation on Intel MPX
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-11  8:46   ` Qiaowei Ren
  -1 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch adds the Documentation/x86/intel_mpx.txt file with some
information about Intel MPX.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 Documentation/x86/intel_mpx.txt |  127 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 127 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/x86/intel_mpx.txt

diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
new file mode 100644
index 0000000..ccffeee
--- /dev/null
+++ b/Documentation/x86/intel_mpx.txt
@@ -0,0 +1,127 @@
+1. Intel(R) MPX Overview
+========================
+
+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
+capability introduced into Intel Architecture. Intel MPX provides
+hardware features that can be used in conjunction with compiler
+changes to check memory references, for those references whose
+compile-time normal intentions are usurped at runtime due to
+buffer overflow or underflow.
+
+For more information, please refer to Intel(R) Architecture
+Instruction Set Extensions Programming Reference, Chapter 9:
+Intel(R) Memory Protection Extensions.
+
+Note: Currently no hardware with MPX ISA is available but it is always
+possible to use SDE (Intel(R) Software Development Emulator) instead,
+which can be downloaded from
+http://software.intel.com/en-us/articles/intel-software-development-emulator
+
+
+2. How does MPX kernel code work
+================================
+
+Handling #BR faults caused by MPX
+---------------------------------
+
+When MPX is enabled, there are 2 new situations that can generate
+#BR faults.
+  * bounds violation caused by MPX instructions.
+  * new bounds tables (BT) need to be allocated to save bounds.
+
+We hook #BR handler to handle these two new situations.
+
+Decoding MPX instructions
+-------------------------
+
+If a #BR is generated due to a bounds violation caused by MPX.
+We need to decode MPX instructions to get violation address and
+set this address into extended struct siginfo.
+
+The _sigfault feild of struct siginfo is extended as follow:
+
+87		/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
+88		struct {
+89			void __user *_addr; /* faulting insn/memory ref. */
+90 #ifdef __ARCH_SI_TRAPNO
+91			int _trapno;	/* TRAP # which caused the signal */
+92 #endif
+93			short _addr_lsb; /* LSB of the reported address */
+94			struct {
+95				void __user *_lower;
+96				void __user *_upper;
+97			} _addr_bnd;
+98		} _sigfault;
+
+The '_addr' field refers to violation address, and new '_addr_and'
+field refers to the upper/lower bounds when a #BR is caused.
+
+Glibc will be also updated to support this new siginfo. So user
+can get violation address and bounds when bounds violations occur.
+
+Freeing unused bounds tables
+----------------------------
+
+When a BNDSTX instruction attempts to save bounds to a bounds directory
+entry marked as invalid, a #BR is generated. This is an indication that
+no bounds table exists for this entry. In this case the fault handler
+will allocate a new bounds table on demand.
+
+Since the kernel allocated those tables on-demand without userspace
+knowledge, it is also responsible for freeing them when the associated
+mappings go away.
+
+Here, the solution for this issue is to hook do_munmap() to check
+whether one process is MPX enabled. If yes, those bounds tables covered
+in the virtual address region which is being unmapped will be freed also.
+
+Adding new prctl commands
+-------------------------
+
+Runtime library in userspace is responsible for allocation of bounds
+directory. So kernel have to use XSAVE instruction to get the base
+of bounds directory from BNDCFG register.
+
+But XSAVE is expected to be very expensive. In order to do performance
+optimization, we have to add new prctl command to get the base of
+bounds directory to be used in future.
+
+Two new prctl commands are added to register and unregister MPX related
+resource.
+
+155	#define PR_MPX_REGISTER         43
+156	#define PR_MPX_UNREGISTER       44
+
+The base of the bounds directory is set into mm_struct during
+PR_MPX_REGISTER command execution. This member can be used to
+check whether one application is mpx enabled.
+
+
+3. Tips
+=======
+
+1) Users are not allowed to create bounds tables and point the bounds
+directory at them in the userspace. In fact, it is not also necessary
+for users to create bounds tables in the userspace.
+
+When #BR fault is produced due to invalid entry, bounds table will be
+created in kernel on demand and kernel will not transfer this fault to
+userspace. So usersapce can't receive #BR fault for invalid entry, and
+it is not also necessary for users to create bounds tables by themselves.
+
+Certainly users can allocate bounds tables and forcibly point the bounds
+directory at them through XSAVE instruction, and then set valid bit
+of bounds entry to have this entry valid. But we have no way to track
+the memory usage of these user-created bounds tables. In regard to this,
+this behaviour is outlawed here.
+
+2) We will not support the case that multiple bounds directory entries
+are pointed at the same bounds table.
+
+Users can be allowed to take multiple bounds directory entries and point
+them at the same bounds table. See more information "Intel(R) Architecture
+Instruction Set Extensions Programming Reference" (9.3.4).
+
+If userspace did this, it will be possible for kernel to unmap an in-use
+bounds table since it does not recognize sharing. So this behavior is
+also outlawed here.
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v8 10/10] x86, mpx: add documentation on Intel MPX
@ 2014-09-11  8:46   ` Qiaowei Ren
  0 siblings, 0 replies; 130+ messages in thread
From: Qiaowei Ren @ 2014-09-11  8:46 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen
  Cc: x86, linux-mm, linux-kernel, Qiaowei Ren

This patch adds the Documentation/x86/intel_mpx.txt file with some
information about Intel MPX.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
---
 Documentation/x86/intel_mpx.txt |  127 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 127 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/x86/intel_mpx.txt

diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
new file mode 100644
index 0000000..ccffeee
--- /dev/null
+++ b/Documentation/x86/intel_mpx.txt
@@ -0,0 +1,127 @@
+1. Intel(R) MPX Overview
+========================
+
+Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new
+capability introduced into Intel Architecture. Intel MPX provides
+hardware features that can be used in conjunction with compiler
+changes to check memory references, for those references whose
+compile-time normal intentions are usurped at runtime due to
+buffer overflow or underflow.
+
+For more information, please refer to Intel(R) Architecture
+Instruction Set Extensions Programming Reference, Chapter 9:
+Intel(R) Memory Protection Extensions.
+
+Note: Currently no hardware with MPX ISA is available but it is always
+possible to use SDE (Intel(R) Software Development Emulator) instead,
+which can be downloaded from
+http://software.intel.com/en-us/articles/intel-software-development-emulator
+
+
+2. How does MPX kernel code work
+================================
+
+Handling #BR faults caused by MPX
+---------------------------------
+
+When MPX is enabled, there are 2 new situations that can generate
+#BR faults.
+  * bounds violation caused by MPX instructions.
+  * new bounds tables (BT) need to be allocated to save bounds.
+
+We hook #BR handler to handle these two new situations.
+
+Decoding MPX instructions
+-------------------------
+
+If a #BR is generated due to a bounds violation caused by MPX.
+We need to decode MPX instructions to get violation address and
+set this address into extended struct siginfo.
+
+The _sigfault feild of struct siginfo is extended as follow:
+
+87		/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
+88		struct {
+89			void __user *_addr; /* faulting insn/memory ref. */
+90 #ifdef __ARCH_SI_TRAPNO
+91			int _trapno;	/* TRAP # which caused the signal */
+92 #endif
+93			short _addr_lsb; /* LSB of the reported address */
+94			struct {
+95				void __user *_lower;
+96				void __user *_upper;
+97			} _addr_bnd;
+98		} _sigfault;
+
+The '_addr' field refers to violation address, and new '_addr_and'
+field refers to the upper/lower bounds when a #BR is caused.
+
+Glibc will be also updated to support this new siginfo. So user
+can get violation address and bounds when bounds violations occur.
+
+Freeing unused bounds tables
+----------------------------
+
+When a BNDSTX instruction attempts to save bounds to a bounds directory
+entry marked as invalid, a #BR is generated. This is an indication that
+no bounds table exists for this entry. In this case the fault handler
+will allocate a new bounds table on demand.
+
+Since the kernel allocated those tables on-demand without userspace
+knowledge, it is also responsible for freeing them when the associated
+mappings go away.
+
+Here, the solution for this issue is to hook do_munmap() to check
+whether one process is MPX enabled. If yes, those bounds tables covered
+in the virtual address region which is being unmapped will be freed also.
+
+Adding new prctl commands
+-------------------------
+
+Runtime library in userspace is responsible for allocation of bounds
+directory. So kernel have to use XSAVE instruction to get the base
+of bounds directory from BNDCFG register.
+
+But XSAVE is expected to be very expensive. In order to do performance
+optimization, we have to add new prctl command to get the base of
+bounds directory to be used in future.
+
+Two new prctl commands are added to register and unregister MPX related
+resource.
+
+155	#define PR_MPX_REGISTER         43
+156	#define PR_MPX_UNREGISTER       44
+
+The base of the bounds directory is set into mm_struct during
+PR_MPX_REGISTER command execution. This member can be used to
+check whether one application is mpx enabled.
+
+
+3. Tips
+=======
+
+1) Users are not allowed to create bounds tables and point the bounds
+directory at them in the userspace. In fact, it is not also necessary
+for users to create bounds tables in the userspace.
+
+When #BR fault is produced due to invalid entry, bounds table will be
+created in kernel on demand and kernel will not transfer this fault to
+userspace. So usersapce can't receive #BR fault for invalid entry, and
+it is not also necessary for users to create bounds tables by themselves.
+
+Certainly users can allocate bounds tables and forcibly point the bounds
+directory at them through XSAVE instruction, and then set valid bit
+of bounds entry to have this entry valid. But we have no way to track
+the memory usage of these user-created bounds tables. In regard to this,
+this behaviour is outlawed here.
+
+2) We will not support the case that multiple bounds directory entries
+are pointed at the same bounds table.
+
+Users can be allowed to take multiple bounds directory entries and point
+them at the same bounds table. See more information "Intel(R) Architecture
+Instruction Set Extensions Programming Reference" (9.3.4).
+
+If userspace did this, it will be possible for kernel to unmap an in-use
+bounds table since it does not recognize sharing. So this behavior is
+also outlawed here.
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
  2014-09-11  8:46   ` Qiaowei Ren
@ 2014-09-11 14:59     ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-11 14:59 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> + * This function will be called by do_munmap(), and the VMAs covering
> + * the virtual address region start...end have already been split if
> + * necessary and remvoed from the VMA list.

"remvoed" -> "removed"

> +void mpx_unmap(struct mm_struct *mm,
> +		unsigned long start, unsigned long end)
> +{
> +	int ret;
> +
> +	ret = mpx_try_unmap(mm, start, end);
> +	if (ret == -EINVAL)
> +		force_sig(SIGSEGV, current);
> +}

In the case of a fault during an unmap, this just ignores the situation
and returns silently.  Where is the code to retry the freeing operation
outside of mmap_sem?

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
@ 2014-09-11 14:59     ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-11 14:59 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> + * This function will be called by do_munmap(), and the VMAs covering
> + * the virtual address region start...end have already been split if
> + * necessary and remvoed from the VMA list.

"remvoed" -> "removed"

> +void mpx_unmap(struct mm_struct *mm,
> +		unsigned long start, unsigned long end)
> +{
> +	int ret;
> +
> +	ret = mpx_try_unmap(mm, start, end);
> +	if (ret == -EINVAL)
> +		force_sig(SIGSEGV, current);
> +}

In the case of a fault during an unmap, this just ignores the situation
and returns silently.  Where is the code to retry the freeing operation
outside of mmap_sem?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-11  8:46   ` Qiaowei Ren
@ 2014-09-11 15:03     ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-11 15:03 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> +
> +	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
> +			MPX_BNDCFG_ADDR_MASK);
> +}

I don't think casting a u64 to a ulong, then to a pointer is useful.
Just take the '(unsigned long)' out.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-11 15:03     ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-11 15:03 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> +
> +	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
> +			MPX_BNDCFG_ADDR_MASK);
> +}

I don't think casting a u64 to a ulong, then to a pointer is useful.
Just take the '(unsigned long)' out.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 06/10] mips: sync struct siginfo with general version
  2014-09-11  8:46   ` Qiaowei Ren
@ 2014-09-11 22:13     ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-11 22:13 UTC (permalink / raw)
  To: Qiaowei Ren
  Cc: H. Peter Anvin, Ingo Molnar, Dave Hansen, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, Qiaowei Ren wrote:

> Due to new fields about bound violation added into struct siginfo,
> this patch syncs it with general version to avoid build issue.

You completely fail to explain which build issue is addressed by this
patch. The code you added to kernel/signal.c which accesses _addr_bnd
is guarded by

+#ifdef SEGV_BNDERR

which is not defined my MIPS. Also why is this only affecting MIPS and
not any other architecture which provides its own struct siginfo ?

That patch makes no sense at all, at least not without a proper
explanation.

Thanks,

	tglx

> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> ---
>  arch/mips/include/uapi/asm/siginfo.h |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h
> index e811744..d08f83f 100644
> --- a/arch/mips/include/uapi/asm/siginfo.h
> +++ b/arch/mips/include/uapi/asm/siginfo.h
> @@ -92,6 +92,10 @@ typedef struct siginfo {
>  			int _trapno;	/* TRAP # which caused the signal */
>  #endif
>  			short _addr_lsb;
> +			struct {
> +				void __user *_lower;
> +				void __user *_upper;
> +			} _addr_bnd;
>  		} _sigfault;
>  
>  		/* SIGPOLL, SIGXFSZ (To do ...)	 */
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 06/10] mips: sync struct siginfo with general version
@ 2014-09-11 22:13     ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-11 22:13 UTC (permalink / raw)
  To: Qiaowei Ren
  Cc: H. Peter Anvin, Ingo Molnar, Dave Hansen, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, Qiaowei Ren wrote:

> Due to new fields about bound violation added into struct siginfo,
> this patch syncs it with general version to avoid build issue.

You completely fail to explain which build issue is addressed by this
patch. The code you added to kernel/signal.c which accesses _addr_bnd
is guarded by

+#ifdef SEGV_BNDERR

which is not defined my MIPS. Also why is this only affecting MIPS and
not any other architecture which provides its own struct siginfo ?

That patch makes no sense at all, at least not without a proper
explanation.

Thanks,

	tglx

> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
> ---
>  arch/mips/include/uapi/asm/siginfo.h |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/mips/include/uapi/asm/siginfo.h b/arch/mips/include/uapi/asm/siginfo.h
> index e811744..d08f83f 100644
> --- a/arch/mips/include/uapi/asm/siginfo.h
> +++ b/arch/mips/include/uapi/asm/siginfo.h
> @@ -92,6 +92,10 @@ typedef struct siginfo {
>  			int _trapno;	/* TRAP # which caused the signal */
>  #endif
>  			short _addr_lsb;
> +			struct {
> +				void __user *_lower;
> +				void __user *_upper;
> +			} _addr_bnd;
>  		} _sigfault;
>  
>  		/* SIGPOLL, SIGXFSZ (To do ...)	 */
> -- 
> 1.7.1
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-11  8:46   ` Qiaowei Ren
@ 2014-09-11 22:18     ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-11 22:18 UTC (permalink / raw)
  To: Qiaowei Ren
  Cc: H. Peter Anvin, Ingo Molnar, Dave Hansen, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, Qiaowei Ren wrote:

> This patch sets bound violation fields of siginfo struct in #BR
> exception handler by decoding the user instruction and constructing
> the faulting pointer.
> 
> This patch does't use the generic decoder, and implements a limited
> special-purpose decoder to decode MPX instructions, simply because the
> generic decoder is very heavyweight not just in terms of performance
> but in terms of interface -- because it has to.

And why is that an argument to add another special purpose decoder?

If a bound violation happens it is completely irrelevant whether the
decoder is heavyweight or not.

So unless you come up with a convincing argument why the generic
decoder is the wrong place, this won't happen.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-11 22:18     ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-11 22:18 UTC (permalink / raw)
  To: Qiaowei Ren
  Cc: H. Peter Anvin, Ingo Molnar, Dave Hansen, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, Qiaowei Ren wrote:

> This patch sets bound violation fields of siginfo struct in #BR
> exception handler by decoding the user instruction and constructing
> the faulting pointer.
> 
> This patch does't use the generic decoder, and implements a limited
> special-purpose decoder to decode MPX instructions, simply because the
> generic decoder is very heavyweight not just in terms of performance
> but in terms of interface -- because it has to.

And why is that an argument to add another special purpose decoder?

If a bound violation happens it is completely irrelevant whether the
decoder is heavyweight or not.

So unless you come up with a convincing argument why the generic
decoder is the wrong place, this won't happen.

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-11 22:18     ` Thomas Gleixner
@ 2014-09-11 22:32       ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-11 22:32 UTC (permalink / raw)
  To: Thomas Gleixner, Qiaowei Ren
  Cc: H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/11/2014 03:18 PM, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
>> This patch sets bound violation fields of siginfo struct in #BR
>> exception handler by decoding the user instruction and constructing
>> the faulting pointer.
>>
>> This patch does't use the generic decoder, and implements a limited
>> special-purpose decoder to decode MPX instructions, simply because the
>> generic decoder is very heavyweight not just in terms of performance
>> but in terms of interface -- because it has to.
> 
> And why is that an argument to add another special purpose decoder?

Peter asked for it to be done this way specifically:

	https://lkml.org/lkml/2014/6/19/411


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-11 22:32       ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-11 22:32 UTC (permalink / raw)
  To: Thomas Gleixner, Qiaowei Ren
  Cc: H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/11/2014 03:18 PM, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
>> This patch sets bound violation fields of siginfo struct in #BR
>> exception handler by decoding the user instruction and constructing
>> the faulting pointer.
>>
>> This patch does't use the generic decoder, and implements a limited
>> special-purpose decoder to decode MPX instructions, simply because the
>> generic decoder is very heavyweight not just in terms of performance
>> but in terms of interface -- because it has to.
> 
> And why is that an argument to add another special purpose decoder?

Peter asked for it to be done this way specifically:

	https://lkml.org/lkml/2014/6/19/411

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-11 22:32       ` Dave Hansen
@ 2014-09-11 22:35         ` H. Peter Anvin
  -1 siblings, 0 replies; 130+ messages in thread
From: H. Peter Anvin @ 2014-09-11 22:35 UTC (permalink / raw)
  To: Dave Hansen, Thomas Gleixner, Qiaowei Ren
  Cc: Ingo Molnar, x86, linux-mm, linux-kernel

On 09/11/2014 03:32 PM, Dave Hansen wrote:
> On 09/11/2014 03:18 PM, Thomas Gleixner wrote:
>> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
>>> This patch sets bound violation fields of siginfo struct in #BR
>>> exception handler by decoding the user instruction and constructing
>>> the faulting pointer.
>>>
>>> This patch does't use the generic decoder, and implements a limited
>>> special-purpose decoder to decode MPX instructions, simply because the
>>> generic decoder is very heavyweight not just in terms of performance
>>> but in terms of interface -- because it has to.
>>
>> And why is that an argument to add another special purpose decoder?
> 
> Peter asked for it to be done this way specifically:
> 
> 	https://lkml.org/lkml/2014/6/19/411
> 

Specifically because marshaling the data in and out of the generic
decoder was more complex than a special-purpose decoder.

	-hpa


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-11 22:35         ` H. Peter Anvin
  0 siblings, 0 replies; 130+ messages in thread
From: H. Peter Anvin @ 2014-09-11 22:35 UTC (permalink / raw)
  To: Dave Hansen, Thomas Gleixner, Qiaowei Ren
  Cc: Ingo Molnar, x86, linux-mm, linux-kernel

On 09/11/2014 03:32 PM, Dave Hansen wrote:
> On 09/11/2014 03:18 PM, Thomas Gleixner wrote:
>> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
>>> This patch sets bound violation fields of siginfo struct in #BR
>>> exception handler by decoding the user instruction and constructing
>>> the faulting pointer.
>>>
>>> This patch does't use the generic decoder, and implements a limited
>>> special-purpose decoder to decode MPX instructions, simply because the
>>> generic decoder is very heavyweight not just in terms of performance
>>> but in terms of interface -- because it has to.
>>
>> And why is that an argument to add another special purpose decoder?
> 
> Peter asked for it to be done this way specifically:
> 
> 	https://lkml.org/lkml/2014/6/19/411
> 

Specifically because marshaling the data in and out of the generic
decoder was more complex than a special-purpose decoder.

	-hpa

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-11  8:46   ` Qiaowei Ren
@ 2014-09-11 23:28     ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-11 23:28 UTC (permalink / raw)
  To: Qiaowei Ren
  Cc: H. Peter Anvin, Ingo Molnar, Dave Hansen, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, Qiaowei Ren wrote:

> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
> commands. These commands can be used to register and unregister MPX
> related resource on the x86 platform.

I cant see anything which is registered/unregistered.
 
> The base of the bounds directory is set into mm_struct during
> PR_MPX_REGISTER command execution. This member can be used to
> check whether one application is mpx enabled.

This changelog is completely useless.

What's the actual point of this prctl?

> +/*
> + * This should only be called when cpuid has been checked
> + * and we are sure that MPX is available.

Groan. Why can't you put that cpuid check into that function right
away instead of adding a worthless comment?

It's obviously more important to have a comment about somthing which
is obvious than explaining what the function is actually doing, right?

> + */
> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
> +{
> +	struct xsave_struct *xsave_buf;
> +
> +	fpu_xsave(&tsk->thread.fpu);
> +	xsave_buf = &(tsk->thread.fpu.state->xsave);
> +	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
> +		return NULL;

Now this might be understandable with a proper comment. Right now it's
a magic check for something uncomprehensible.

> +	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
> +			MPX_BNDCFG_ADDR_MASK);
> +}
> +
> +int mpx_register(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm = tsk->mm;
> +
> +	if (!cpu_has_mpx)
> +		return -EINVAL;
> +
> +	/*
> +	 * runtime in the userspace will be responsible for allocation of
> +	 * the bounds directory. Then, it will save the base of the bounds
> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
> +	 * XRSTOR instruction.
> +	 *
> +	 * fpu_xsave() is expected to be very expensive. In order to do
> +	 * performance optimization, here we get the base of the bounds
> +	 * directory and then save it into mm_struct to be used in future.
> +	 */

Ah. Now we get some information what this might do. But that does not
make any sense at all.

So all it does is:

    tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;

or:

    tsk->mm.bd_addr = NULL;

So we use that information to check, whether we need to tear down a
VM_MPX flagged region with mpx_unmap(), right?

> +         /*
> +          * Check whether this vma comes from MPX-enabled application.
> +          * If so, release this vma related bound tables.
> +          */
> +         if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
> +                 mpx_unmap(mm, start, end);

You really must be kidding. The application maps that table and never
calls that prctl so do_unmap() will happily ignore it?

The design to support this feature makes no sense at all to me. We
have a special mmap interface, some magic kernel side mapping
functionality and then on top of it a prctl telling the kernel to
ignore/respect it.

All I have seen so far is the hint to read some intel feature
documentation, but no coherent explanation how this patch set makes
use of that very feature. The last patch in the series does not count
as coherent explanation. It merily documents parts of the
implementation details which are required to make use of it but
completely lacks of a coherent description how all of this is supposed
to work.

Despite the fact that this is V8, I can't suppress the feeling that
this is just cobbled together to make it work somehow and we'll deal
with the fallout later. I wouldn't be surprised if some of the fallout
is going to be security related. I have a pretty good idea how to
exploit it even without understanding the non-malicious intent of the
whole thing.

So: NAK to the whole series for now until someone comes up with a
coherent explanation.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-11 23:28     ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-11 23:28 UTC (permalink / raw)
  To: Qiaowei Ren
  Cc: H. Peter Anvin, Ingo Molnar, Dave Hansen, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, Qiaowei Ren wrote:

> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
> commands. These commands can be used to register and unregister MPX
> related resource on the x86 platform.

I cant see anything which is registered/unregistered.
 
> The base of the bounds directory is set into mm_struct during
> PR_MPX_REGISTER command execution. This member can be used to
> check whether one application is mpx enabled.

This changelog is completely useless.

What's the actual point of this prctl?

> +/*
> + * This should only be called when cpuid has been checked
> + * and we are sure that MPX is available.

Groan. Why can't you put that cpuid check into that function right
away instead of adding a worthless comment?

It's obviously more important to have a comment about somthing which
is obvious than explaining what the function is actually doing, right?

> + */
> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
> +{
> +	struct xsave_struct *xsave_buf;
> +
> +	fpu_xsave(&tsk->thread.fpu);
> +	xsave_buf = &(tsk->thread.fpu.state->xsave);
> +	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
> +		return NULL;

Now this might be understandable with a proper comment. Right now it's
a magic check for something uncomprehensible.

> +	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
> +			MPX_BNDCFG_ADDR_MASK);
> +}
> +
> +int mpx_register(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm = tsk->mm;
> +
> +	if (!cpu_has_mpx)
> +		return -EINVAL;
> +
> +	/*
> +	 * runtime in the userspace will be responsible for allocation of
> +	 * the bounds directory. Then, it will save the base of the bounds
> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
> +	 * XRSTOR instruction.
> +	 *
> +	 * fpu_xsave() is expected to be very expensive. In order to do
> +	 * performance optimization, here we get the base of the bounds
> +	 * directory and then save it into mm_struct to be used in future.
> +	 */

Ah. Now we get some information what this might do. But that does not
make any sense at all.

So all it does is:

    tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;

or:

    tsk->mm.bd_addr = NULL;

So we use that information to check, whether we need to tear down a
VM_MPX flagged region with mpx_unmap(), right?

> +         /*
> +          * Check whether this vma comes from MPX-enabled application.
> +          * If so, release this vma related bound tables.
> +          */
> +         if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
> +                 mpx_unmap(mm, start, end);

You really must be kidding. The application maps that table and never
calls that prctl so do_unmap() will happily ignore it?

The design to support this feature makes no sense at all to me. We
have a special mmap interface, some magic kernel side mapping
functionality and then on top of it a prctl telling the kernel to
ignore/respect it.

All I have seen so far is the hint to read some intel feature
documentation, but no coherent explanation how this patch set makes
use of that very feature. The last patch in the series does not count
as coherent explanation. It merily documents parts of the
implementation details which are required to make use of it but
completely lacks of a coherent description how all of this is supposed
to work.

Despite the fact that this is V8, I can't suppress the feeling that
this is just cobbled together to make it work somehow and we'll deal
with the fallout later. I wouldn't be surprised if some of the fallout
is going to be security related. I have a pretty good idea how to
exploit it even without understanding the non-malicious intent of the
whole thing.

So: NAK to the whole series for now until someone comes up with a
coherent explanation.

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-11 22:35         ` H. Peter Anvin
@ 2014-09-11 23:37           ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-11 23:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, H. Peter Anvin wrote:

> On 09/11/2014 03:32 PM, Dave Hansen wrote:
> > On 09/11/2014 03:18 PM, Thomas Gleixner wrote:
> >> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> >>> This patch sets bound violation fields of siginfo struct in #BR
> >>> exception handler by decoding the user instruction and constructing
> >>> the faulting pointer.
> >>>
> >>> This patch does't use the generic decoder, and implements a limited
> >>> special-purpose decoder to decode MPX instructions, simply because the
> >>> generic decoder is very heavyweight not just in terms of performance
> >>> but in terms of interface -- because it has to.
> >>
> >> And why is that an argument to add another special purpose decoder?
> > 
> > Peter asked for it to be done this way specifically:
> > 
> > 	https://lkml.org/lkml/2014/6/19/411
> > 
> 
> Specifically because marshaling the data in and out of the generic
> decoder was more complex than a special-purpose decoder.

I did not look at that detail and I trust your judgement here, but
that is in no way explained in the changelog.

This whole patchset is a pain to review due to half baken changelogs
and complete lack of a proper design description.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-11 23:37           ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-11 23:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, H. Peter Anvin wrote:

> On 09/11/2014 03:32 PM, Dave Hansen wrote:
> > On 09/11/2014 03:18 PM, Thomas Gleixner wrote:
> >> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> >>> This patch sets bound violation fields of siginfo struct in #BR
> >>> exception handler by decoding the user instruction and constructing
> >>> the faulting pointer.
> >>>
> >>> This patch does't use the generic decoder, and implements a limited
> >>> special-purpose decoder to decode MPX instructions, simply because the
> >>> generic decoder is very heavyweight not just in terms of performance
> >>> but in terms of interface -- because it has to.
> >>
> >> And why is that an argument to add another special purpose decoder?
> > 
> > Peter asked for it to be done this way specifically:
> > 
> > 	https://lkml.org/lkml/2014/6/19/411
> > 
> 
> Specifically because marshaling the data in and out of the generic
> decoder was more complex than a special-purpose decoder.

I did not look at that detail and I trust your judgement here, but
that is in no way explained in the changelog.

This whole patchset is a pain to review due to half baken changelogs
and complete lack of a proper design description.

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-11 23:28     ` Thomas Gleixner
@ 2014-09-12  0:10       ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12  0:10 UTC (permalink / raw)
  To: Thomas Gleixner, Qiaowei Ren
  Cc: H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/11/2014 04:28 PM, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
>> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
>> commands. These commands can be used to register and unregister MPX
>> related resource on the x86 platform.
> 
> I cant see anything which is registered/unregistered.

This registers the location of the bounds directory with the kernel.

>From the app's perspective, it says "I'm using MPX, and here is where I
put the root data structure".

Without this, the kernel would have to do an (expensive) xsave operation
every time it wanted to see if MPX was in use.  This also makes the
user/kernel interaction more explicit.  We would be in a world of hurt
if userspace was allowed to move the bounds directory around.  With this
interface, it's a bit more obvious that userspace can't just move it
around willy-nilly.

>> The base of the bounds directory is set into mm_struct during
>> PR_MPX_REGISTER command execution. This member can be used to
>> check whether one application is mpx enabled.
> 
> This changelog is completely useless.

Yeah, it's pretty bare-bones.  Let me know if the explanation above
makes sense, and we'll get it updated.

>> +/*
>> + * This should only be called when cpuid has been checked
>> + * and we are sure that MPX is available.
> 
> Groan. Why can't you put that cpuid check into that function right
> away instead of adding a worthless comment?

Sounds reasonable to me.  We should just move the cpuid check in to
task_get_bounds_dir().

>> + */
>> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
>> +{
>> +	struct xsave_struct *xsave_buf;
>> +
>> +	fpu_xsave(&tsk->thread.fpu);
>> +	xsave_buf = &(tsk->thread.fpu.state->xsave);
>> +	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
>> +		return NULL;
> 
> Now this might be understandable with a proper comment. Right now it's
> a magic check for something uncomprehensible.

It's a bit ugly to access, but it seems pretty blatantly obvious that
this is a check for "Is the enable flag in a hardware register set?"

Yes, the registers have names only a mother could love.  But that is
what they're really called.

I guess we could add some comments about why we need to do the xsave.

>> +int mpx_register(struct task_struct *tsk)
>> +{
>> +	struct mm_struct *mm = tsk->mm;
>> +
>> +	if (!cpu_has_mpx)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * runtime in the userspace will be responsible for allocation of
>> +	 * the bounds directory. Then, it will save the base of the bounds
>> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
>> +	 * XRSTOR instruction.
>> +	 *
>> +	 * fpu_xsave() is expected to be very expensive. In order to do
>> +	 * performance optimization, here we get the base of the bounds
>> +	 * directory and then save it into mm_struct to be used in future.
>> +	 */
> 
> Ah. Now we get some information what this might do. But that does not
> make any sense at all.
> 
> So all it does is:
> 
>     tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
> 
> or:
> 
>     tsk->mm.bd_addr = NULL;
> 
> So we use that information to check, whether we need to tear down a
> VM_MPX flagged region with mpx_unmap(), right?

Well, we use it to figure out whether we _potentially_ need to tear down
an VM_MPX-flagged area.  There's no guarantee that there will be one.

>> +         /*
>> +          * Check whether this vma comes from MPX-enabled application.
>> +          * If so, release this vma related bound tables.
>> +          */
>> +         if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
>> +                 mpx_unmap(mm, start, end);
> 
> You really must be kidding. The application maps that table and never
> calls that prctl so do_unmap() will happily ignore it?

Yes.  The only other way the kernel can possibly know that it needs to
go tearing things down is with a potentially frequent and expensive xsave.

Either we change mmap to say "this mmap() is for a bounds directory", or
we have some other interface that says "the mmap() for the bounds
directory is at $foo".  We could also record the bounds directory the
first time that we catch userspace using it.  I'd rather have an
explicit interface than an implicit one like that, though I don't feel
that strongly about it.

> The design to support this feature makes no sense at all to me. We
> have a special mmap interface, some magic kernel side mapping
> functionality and then on top of it a prctl telling the kernel to
> ignore/respect it.

That's a good point.  We don't seem to have anything in the
allocate_bt() side of things to tell the kernel to refuse to create
things if the prctl() hasn't been called.  That needs to get added.

> All I have seen so far is the hint to read some intel feature
> documentation, but no coherent explanation how this patch set makes
> use of that very feature. The last patch in the series does not count
> as coherent explanation. It merily documents parts of the
> implementation details which are required to make use of it but
> completely lacks of a coherent description how all of this is supposed
> to work.

It sounds like we need to take the patch00 plus the documentation patch
and try to lay things out more clearly.

> Despite the fact that this is V8, I can't suppress the feeling that
> this is just cobbled together to make it work somehow and we'll deal
> with the fallout later.

It's v8, but it's been very lightly reviewed.  I do appreciate the
review at this point, though.

> I wouldn't be surprised if some of the fallout
> is going to be security related. I have a pretty good idea how to
> exploit it even without understanding the non-malicious intent of the
> whole thing.

If you don't want to share them in public, I'm happy to take this
off-list, but please do share.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12  0:10       ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12  0:10 UTC (permalink / raw)
  To: Thomas Gleixner, Qiaowei Ren
  Cc: H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/11/2014 04:28 PM, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
>> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
>> commands. These commands can be used to register and unregister MPX
>> related resource on the x86 platform.
> 
> I cant see anything which is registered/unregistered.

This registers the location of the bounds directory with the kernel.

>From the app's perspective, it says "I'm using MPX, and here is where I
put the root data structure".

Without this, the kernel would have to do an (expensive) xsave operation
every time it wanted to see if MPX was in use.  This also makes the
user/kernel interaction more explicit.  We would be in a world of hurt
if userspace was allowed to move the bounds directory around.  With this
interface, it's a bit more obvious that userspace can't just move it
around willy-nilly.

>> The base of the bounds directory is set into mm_struct during
>> PR_MPX_REGISTER command execution. This member can be used to
>> check whether one application is mpx enabled.
> 
> This changelog is completely useless.

Yeah, it's pretty bare-bones.  Let me know if the explanation above
makes sense, and we'll get it updated.

>> +/*
>> + * This should only be called when cpuid has been checked
>> + * and we are sure that MPX is available.
> 
> Groan. Why can't you put that cpuid check into that function right
> away instead of adding a worthless comment?

Sounds reasonable to me.  We should just move the cpuid check in to
task_get_bounds_dir().

>> + */
>> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
>> +{
>> +	struct xsave_struct *xsave_buf;
>> +
>> +	fpu_xsave(&tsk->thread.fpu);
>> +	xsave_buf = &(tsk->thread.fpu.state->xsave);
>> +	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
>> +		return NULL;
> 
> Now this might be understandable with a proper comment. Right now it's
> a magic check for something uncomprehensible.

It's a bit ugly to access, but it seems pretty blatantly obvious that
this is a check for "Is the enable flag in a hardware register set?"

Yes, the registers have names only a mother could love.  But that is
what they're really called.

I guess we could add some comments about why we need to do the xsave.

>> +int mpx_register(struct task_struct *tsk)
>> +{
>> +	struct mm_struct *mm = tsk->mm;
>> +
>> +	if (!cpu_has_mpx)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * runtime in the userspace will be responsible for allocation of
>> +	 * the bounds directory. Then, it will save the base of the bounds
>> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
>> +	 * XRSTOR instruction.
>> +	 *
>> +	 * fpu_xsave() is expected to be very expensive. In order to do
>> +	 * performance optimization, here we get the base of the bounds
>> +	 * directory and then save it into mm_struct to be used in future.
>> +	 */
> 
> Ah. Now we get some information what this might do. But that does not
> make any sense at all.
> 
> So all it does is:
> 
>     tsk->mm.bd_addr = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
> 
> or:
> 
>     tsk->mm.bd_addr = NULL;
> 
> So we use that information to check, whether we need to tear down a
> VM_MPX flagged region with mpx_unmap(), right?

Well, we use it to figure out whether we _potentially_ need to tear down
an VM_MPX-flagged area.  There's no guarantee that there will be one.

>> +         /*
>> +          * Check whether this vma comes from MPX-enabled application.
>> +          * If so, release this vma related bound tables.
>> +          */
>> +         if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
>> +                 mpx_unmap(mm, start, end);
> 
> You really must be kidding. The application maps that table and never
> calls that prctl so do_unmap() will happily ignore it?

Yes.  The only other way the kernel can possibly know that it needs to
go tearing things down is with a potentially frequent and expensive xsave.

Either we change mmap to say "this mmap() is for a bounds directory", or
we have some other interface that says "the mmap() for the bounds
directory is at $foo".  We could also record the bounds directory the
first time that we catch userspace using it.  I'd rather have an
explicit interface than an implicit one like that, though I don't feel
that strongly about it.

> The design to support this feature makes no sense at all to me. We
> have a special mmap interface, some magic kernel side mapping
> functionality and then on top of it a prctl telling the kernel to
> ignore/respect it.

That's a good point.  We don't seem to have anything in the
allocate_bt() side of things to tell the kernel to refuse to create
things if the prctl() hasn't been called.  That needs to get added.

> All I have seen so far is the hint to read some intel feature
> documentation, but no coherent explanation how this patch set makes
> use of that very feature. The last patch in the series does not count
> as coherent explanation. It merily documents parts of the
> implementation details which are required to make use of it but
> completely lacks of a coherent description how all of this is supposed
> to work.

It sounds like we need to take the patch00 plus the documentation patch
and try to lay things out more clearly.

> Despite the fact that this is V8, I can't suppress the feeling that
> this is just cobbled together to make it work somehow and we'll deal
> with the fallout later.

It's v8, but it's been very lightly reviewed.  I do appreciate the
review at this point, though.

> I wouldn't be surprised if some of the fallout
> is going to be security related. I have a pretty good idea how to
> exploit it even without understanding the non-malicious intent of the
> whole thing.

If you don't want to share them in public, I'm happy to take this
off-list, but please do share.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
  2014-09-11  8:46 ` Qiaowei Ren
@ 2014-09-12  0:51   ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12  0:51 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
> provide handlers for bounds faults (#BR), and manage bounds memory.

Qiaowei, We probably need to mention here what "bounds memory" is, and
why it has to be managed, and who is responsible for the different pieces.

Who allocates the memory?
Who fills the memory?
When is it freed?

Thomas, do you have any other suggestions for things you'd like to see
clarified?

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
@ 2014-09-12  0:51   ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12  0:51 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
> provide handlers for bounds faults (#BR), and manage bounds memory.

Qiaowei, We probably need to mention here what "bounds memory" is, and
why it has to be managed, and who is responsible for the different pieces.

Who allocates the memory?
Who fills the memory?
When is it freed?

Thomas, do you have any other suggestions for things you'd like to see
clarified?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
  2014-09-11 22:13     ` Thomas Gleixner
@ 2014-09-12  2:54       ` Ren, Qiaowei
  -1 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-12  2:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Hansen, Dave, x86, linux-mm, linux-kernel



On 2014-09-12, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> 
>> Due to new fields about bound violation added into struct siginfo,
>> this patch syncs it with general version to avoid build issue.
> 
> You completely fail to explain which build issue is addressed by this
> patch. The code you added to kernel/signal.c which accesses _addr_bnd
> is guarded by
> 
> +#ifdef SEGV_BNDERR
> 
> which is not defined my MIPS. Also why is this only affecting MIPS and
> not any other architecture which provides its own struct siginfo ?
> 
> That patch makes no sense at all, at least not without a proper explanation.
>

For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will include general siginfo.h, and only replace general stuct siginfo with mips specific struct siginfo. So SEGV_BNDERR will be defined for all archs, and we will get error like "no _lower in struct siginfo" when arch=mips.

In addition, only MIPS arch define its own struct siginfo, so this is only affecting MIPS. 

Thanks,
Qiaowei

> 
>> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
>> ---
>>  arch/mips/include/uapi/asm/siginfo.h |    4 ++++
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>> diff --git a/arch/mips/include/uapi/asm/siginfo.h
>> b/arch/mips/include/uapi/asm/siginfo.h
>> index e811744..d08f83f 100644
>> --- a/arch/mips/include/uapi/asm/siginfo.h
>> +++ b/arch/mips/include/uapi/asm/siginfo.h
>> @@ -92,6 +92,10 @@ typedef struct siginfo {
>>  			int _trapno;	/* TRAP # which caused the signal */
>>  #endif
>>  			short _addr_lsb;
>> +			struct {
>> +				void __user *_lower;
>> +				void __user *_upper;
>> +			} _addr_bnd;
>>  		} _sigfault;
>>  
>>  		/* SIGPOLL, SIGXFSZ (To do ...)	 */
>> --
>> 1.7.1
>> 
>>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
@ 2014-09-12  2:54       ` Ren, Qiaowei
  0 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-12  2:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Hansen, Dave, x86, linux-mm, linux-kernel



On 2014-09-12, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> 
>> Due to new fields about bound violation added into struct siginfo,
>> this patch syncs it with general version to avoid build issue.
> 
> You completely fail to explain which build issue is addressed by this
> patch. The code you added to kernel/signal.c which accesses _addr_bnd
> is guarded by
> 
> +#ifdef SEGV_BNDERR
> 
> which is not defined my MIPS. Also why is this only affecting MIPS and
> not any other architecture which provides its own struct siginfo ?
> 
> That patch makes no sense at all, at least not without a proper explanation.
>

For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will include general siginfo.h, and only replace general stuct siginfo with mips specific struct siginfo. So SEGV_BNDERR will be defined for all archs, and we will get error like "no _lower in struct siginfo" when arch=mips.

In addition, only MIPS arch define its own struct siginfo, so this is only affecting MIPS. 

Thanks,
Qiaowei

> 
>> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
>> ---
>>  arch/mips/include/uapi/asm/siginfo.h |    4 ++++
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>> diff --git a/arch/mips/include/uapi/asm/siginfo.h
>> b/arch/mips/include/uapi/asm/siginfo.h
>> index e811744..d08f83f 100644
>> --- a/arch/mips/include/uapi/asm/siginfo.h
>> +++ b/arch/mips/include/uapi/asm/siginfo.h
>> @@ -92,6 +92,10 @@ typedef struct siginfo {
>>  			int _trapno;	/* TRAP # which caused the signal */
>>  #endif
>>  			short _addr_lsb;
>> +			struct {
>> +				void __user *_lower;
>> +				void __user *_upper;
>> +			} _addr_bnd;
>>  		} _sigfault;
>>  
>>  		/* SIGPOLL, SIGXFSZ (To do ...)	 */
>> --
>> 1.7.1
>> 
>>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
  2014-09-11 14:59     ` Dave Hansen
@ 2014-09-12  3:02       ` Ren, Qiaowei
  -1 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-12  3:02 UTC (permalink / raw)
  To: Hansen, Dave, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel



On 2014-09-11, Hansen, Dave wrote:
> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>> + * This function will be called by do_munmap(), and the VMAs
>> + covering
>> + * the virtual address region start...end have already been split
>> + if
>> + * necessary and remvoed from the VMA list.
> 
> "remvoed" -> "removed"
> 
>> +void mpx_unmap(struct mm_struct *mm,
>> +		unsigned long start, unsigned long end) {
>> +	int ret;
>> +
>> +	ret = mpx_try_unmap(mm, start, end);
>> +	if (ret == -EINVAL)
>> +		force_sig(SIGSEGV, current);
>> +}
> 
> In the case of a fault during an unmap, this just ignores the
> situation and returns silently.  Where is the code to retry the
> freeing operation outside of mmap_sem?

Dave, you mean delayed_work code? According to our discussion, it will be deferred to another mainline post.

Thanks,
Qiaowei


^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
@ 2014-09-12  3:02       ` Ren, Qiaowei
  0 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-12  3:02 UTC (permalink / raw)
  To: Hansen, Dave, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel



On 2014-09-11, Hansen, Dave wrote:
> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>> + * This function will be called by do_munmap(), and the VMAs
>> + covering
>> + * the virtual address region start...end have already been split
>> + if
>> + * necessary and remvoed from the VMA list.
> 
> "remvoed" -> "removed"
> 
>> +void mpx_unmap(struct mm_struct *mm,
>> +		unsigned long start, unsigned long end) {
>> +	int ret;
>> +
>> +	ret = mpx_try_unmap(mm, start, end);
>> +	if (ret == -EINVAL)
>> +		force_sig(SIGSEGV, current);
>> +}
> 
> In the case of a fault during an unmap, this just ignores the
> situation and returns silently.  Where is the code to retry the
> freeing operation outside of mmap_sem?

Dave, you mean delayed_work code? According to our discussion, it will be deferred to another mainline post.

Thanks,
Qiaowei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-11 15:03     ` Dave Hansen
@ 2014-09-12  3:10       ` Ren, Qiaowei
  -1 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-12  3:10 UTC (permalink / raw)
  To: Hansen, Dave, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel



On 2014-09-11, Hansen, Dave wrote:
> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>> +
>> +	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
>> +			MPX_BNDCFG_ADDR_MASK);
>> +}
> 
> I don't think casting a u64 to a ulong, then to a pointer is useful.
> Just take the '(unsigned long)' out.

If so, this will spits out a warning on 32-bit:

arch/x86/kernel/mpx.c: In function 'task_get_bounds_dir':
arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

Thanks,
Qiaowei


^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12  3:10       ` Ren, Qiaowei
  0 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-12  3:10 UTC (permalink / raw)
  To: Hansen, Dave, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel



On 2014-09-11, Hansen, Dave wrote:
> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>> +
>> +	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
>> +			MPX_BNDCFG_ADDR_MASK);
>> +}
> 
> I don't think casting a u64 to a ulong, then to a pointer is useful.
> Just take the '(unsigned long)' out.

If so, this will spits out a warning on 32-bit:

arch/x86/kernel/mpx.c: In function 'task_get_bounds_dir':
arch/x86/kernel/mpx.c:21:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

Thanks,
Qiaowei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-11 23:37           ` Thomas Gleixner
@ 2014-09-12  4:44             ` H. Peter Anvin
  -1 siblings, 0 replies; 130+ messages in thread
From: H. Peter Anvin @ 2014-09-12  4:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/11/2014 04:37 PM, Thomas Gleixner wrote:
>>
>> Specifically because marshaling the data in and out of the generic
>> decoder was more complex than a special-purpose decoder.
>
> I did not look at that detail and I trust your judgement here, but
> that is in no way explained in the changelog.
>
> This whole patchset is a pain to review due to half baken changelogs
> and complete lack of a proper design description.
>

I'm not wedded to that concept, by the way, but using the generic parser 
had a whole bunch of its own problems, including the fact that you're 
getting bytes from user space.

It might be worthwhile to compare the older patchset which did use the 
generic parser to make sure that it actually made sense.

	-hpa





^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-12  4:44             ` H. Peter Anvin
  0 siblings, 0 replies; 130+ messages in thread
From: H. Peter Anvin @ 2014-09-12  4:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/11/2014 04:37 PM, Thomas Gleixner wrote:
>>
>> Specifically because marshaling the data in and out of the generic
>> decoder was more complex than a special-purpose decoder.
>
> I did not look at that detail and I trust your judgement here, but
> that is in no way explained in the changelog.
>
> This whole patchset is a pain to review due to half baken changelogs
> and complete lack of a proper design description.
>

I'm not wedded to that concept, by the way, but using the generic parser 
had a whole bunch of its own problems, including the fact that you're 
getting bytes from user space.

It might be worthwhile to compare the older patchset which did use the 
generic parser to make sure that it actually made sense.

	-hpa




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
  2014-09-12  3:02       ` Ren, Qiaowei
@ 2014-09-12  4:59         ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12  4:59 UTC (permalink / raw)
  To: Ren, Qiaowei, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 08:02 PM, Ren, Qiaowei wrote:
> On 2014-09-11, Hansen, Dave wrote:
>> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>>> + * This function will be called by do_munmap(), and the VMAs
>>> + covering
>>> + * the virtual address region start...end have already been split
>>> + if
>>> + * necessary and remvoed from the VMA list.
>>
>> "remvoed" -> "removed"
>>
>>> +void mpx_unmap(struct mm_struct *mm,
>>> +		unsigned long start, unsigned long end) {
>>> +	int ret;
>>> +
>>> +	ret = mpx_try_unmap(mm, start, end);
>>> +	if (ret == -EINVAL)
>>> +		force_sig(SIGSEGV, current);
>>> +}
>> 
>> In the case of a fault during an unmap, this just ignores the 
>> situation and returns silently.  Where is the code to retry the 
>> freeing operation outside of mmap_sem?
> 
> Dave, you mean delayed_work code? According to our discussion, it
> will be deferred to another mainline post.

OK, fine.  Just please call that out in the description.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
@ 2014-09-12  4:59         ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12  4:59 UTC (permalink / raw)
  To: Ren, Qiaowei, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 08:02 PM, Ren, Qiaowei wrote:
> On 2014-09-11, Hansen, Dave wrote:
>> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>>> + * This function will be called by do_munmap(), and the VMAs
>>> + covering
>>> + * the virtual address region start...end have already been split
>>> + if
>>> + * necessary and remvoed from the VMA list.
>>
>> "remvoed" -> "removed"
>>
>>> +void mpx_unmap(struct mm_struct *mm,
>>> +		unsigned long start, unsigned long end) {
>>> +	int ret;
>>> +
>>> +	ret = mpx_try_unmap(mm, start, end);
>>> +	if (ret == -EINVAL)
>>> +		force_sig(SIGSEGV, current);
>>> +}
>> 
>> In the case of a fault during an unmap, this just ignores the 
>> situation and returns silently.  Where is the code to retry the 
>> freeing operation outside of mmap_sem?
> 
> Dave, you mean delayed_work code? According to our discussion, it
> will be deferred to another mainline post.

OK, fine.  Just please call that out in the description.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12  0:10       ` Dave Hansen
@ 2014-09-12  8:11         ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12  8:11 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, Dave Hansen wrote:
> On 09/11/2014 04:28 PM, Thomas Gleixner wrote:
> > On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> >> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
> >> commands. These commands can be used to register and unregister MPX
> >> related resource on the x86 platform.
> > 
> > I cant see anything which is registered/unregistered.
> 
> This registers the location of the bounds directory with the kernel.
> 
> >From the app's perspective, it says "I'm using MPX, and here is where I
> put the root data structure".
> 
> Without this, the kernel would have to do an (expensive) xsave operation
> every time it wanted to see if MPX was in use.  This also makes the
> user/kernel interaction more explicit.  We would be in a world of hurt
> if userspace was allowed to move the bounds directory around.  With this
> interface, it's a bit more obvious that userspace can't just move it
> around willy-nilly.

And what prevents it to do so? Just the fact that you have a prctl
does not make userspace better.

> >> The base of the bounds directory is set into mm_struct during
> >> PR_MPX_REGISTER command execution. This member can be used to
> >> check whether one application is mpx enabled.
> > 
> > This changelog is completely useless.
> 
> Yeah, it's pretty bare-bones.  Let me know if the explanation above
> makes sense, and we'll get it updated.

Well, it at least explains what its supposed to do. Whether that
itself makes sense is a completely different question.
 
> >> + */
> >> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
> >> +{
> >> +	struct xsave_struct *xsave_buf;
> >> +
> >> +	fpu_xsave(&tsk->thread.fpu);
> >> +	xsave_buf = &(tsk->thread.fpu.state->xsave);
> >> +	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
> >> +		return NULL;
> > 
> > Now this might be understandable with a proper comment. Right now it's
> > a magic check for something uncomprehensible.
> 
> It's a bit ugly to access, but it seems pretty blatantly obvious that
> this is a check for "Is the enable flag in a hardware register set?"
> 
> Yes, the registers have names only a mother could love.  But that is
> what they're really called.
> 
> I guess we could add some comments about why we need to do the xsave.

Exactly.
 
> > So we use that information to check, whether we need to tear down a
> > VM_MPX flagged region with mpx_unmap(), right?
> 
> Well, we use it to figure out whether we _potentially_ need to tear down
> an VM_MPX-flagged area.  There's no guarantee that there will be one.

So what you are saying is, that if user space sets the pointer to NULL
via the unregister prctl, kernel can safely ignore vmas which have the
VM_MPX flag set. I really can't follow that logic.
 
	mmap_mpx();
	prctl(enable mpx);
	do lots of crap which uses mpx;
	prctl(disable mpx);

So after that point the previous use of MPX is irrelevant, just
because we set a pointer to NULL? Does it just look like crap because
I do not get the big picture how all of this is supposed to work?

> Yes.  The only other way the kernel can possibly know that it needs to
> go tearing things down is with a potentially frequent and expensive xsave.
> 
> Either we change mmap to say "this mmap() is for a bounds directory", or
> we have some other interface that says "the mmap() for the bounds
> directory is at $foo".  We could also record the bounds directory the
> first time that we catch userspace using it.  I'd rather have an
> explicit interface than an implicit one like that, though I don't feel
> that strongly about it.

I really have to disagree here. If I follow your logic then we would
have a prctl for using floating point as well instead of catching the
use and handle it from there. Just get it, if you make it simple for
user space to do stupid things, they will happen in all provided ways
and some more.

> > The design to support this feature makes no sense at all to me. We
> > have a special mmap interface, some magic kernel side mapping
> > functionality and then on top of it a prctl telling the kernel to
> > ignore/respect it.
> 
> That's a good point.  We don't seem to have anything in the
> allocate_bt() side of things to tell the kernel to refuse to create
> things if the prctl() hasn't been called.  That needs to get added.

And then you need another bunch of logic in the prctl(disable mpx)
path to cleanup the mess instead of just setting a random pointer to
NULL.

> If you don't want to share them in public, I'm happy to take this
> off-list, but please do share.

I'll let you know once I verified that it might work.

Thanks,

	tglx
 

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12  8:11         ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12  8:11 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, Dave Hansen wrote:
> On 09/11/2014 04:28 PM, Thomas Gleixner wrote:
> > On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> >> This patch adds the PR_MPX_REGISTER and PR_MPX_UNREGISTER prctl()
> >> commands. These commands can be used to register and unregister MPX
> >> related resource on the x86 platform.
> > 
> > I cant see anything which is registered/unregistered.
> 
> This registers the location of the bounds directory with the kernel.
> 
> >From the app's perspective, it says "I'm using MPX, and here is where I
> put the root data structure".
> 
> Without this, the kernel would have to do an (expensive) xsave operation
> every time it wanted to see if MPX was in use.  This also makes the
> user/kernel interaction more explicit.  We would be in a world of hurt
> if userspace was allowed to move the bounds directory around.  With this
> interface, it's a bit more obvious that userspace can't just move it
> around willy-nilly.

And what prevents it to do so? Just the fact that you have a prctl
does not make userspace better.

> >> The base of the bounds directory is set into mm_struct during
> >> PR_MPX_REGISTER command execution. This member can be used to
> >> check whether one application is mpx enabled.
> > 
> > This changelog is completely useless.
> 
> Yeah, it's pretty bare-bones.  Let me know if the explanation above
> makes sense, and we'll get it updated.

Well, it at least explains what its supposed to do. Whether that
itself makes sense is a completely different question.
 
> >> + */
> >> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
> >> +{
> >> +	struct xsave_struct *xsave_buf;
> >> +
> >> +	fpu_xsave(&tsk->thread.fpu);
> >> +	xsave_buf = &(tsk->thread.fpu.state->xsave);
> >> +	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
> >> +		return NULL;
> > 
> > Now this might be understandable with a proper comment. Right now it's
> > a magic check for something uncomprehensible.
> 
> It's a bit ugly to access, but it seems pretty blatantly obvious that
> this is a check for "Is the enable flag in a hardware register set?"
> 
> Yes, the registers have names only a mother could love.  But that is
> what they're really called.
> 
> I guess we could add some comments about why we need to do the xsave.

Exactly.
 
> > So we use that information to check, whether we need to tear down a
> > VM_MPX flagged region with mpx_unmap(), right?
> 
> Well, we use it to figure out whether we _potentially_ need to tear down
> an VM_MPX-flagged area.  There's no guarantee that there will be one.

So what you are saying is, that if user space sets the pointer to NULL
via the unregister prctl, kernel can safely ignore vmas which have the
VM_MPX flag set. I really can't follow that logic.
 
	mmap_mpx();
	prctl(enable mpx);
	do lots of crap which uses mpx;
	prctl(disable mpx);

So after that point the previous use of MPX is irrelevant, just
because we set a pointer to NULL? Does it just look like crap because
I do not get the big picture how all of this is supposed to work?

> Yes.  The only other way the kernel can possibly know that it needs to
> go tearing things down is with a potentially frequent and expensive xsave.
> 
> Either we change mmap to say "this mmap() is for a bounds directory", or
> we have some other interface that says "the mmap() for the bounds
> directory is at $foo".  We could also record the bounds directory the
> first time that we catch userspace using it.  I'd rather have an
> explicit interface than an implicit one like that, though I don't feel
> that strongly about it.

I really have to disagree here. If I follow your logic then we would
have a prctl for using floating point as well instead of catching the
use and handle it from there. Just get it, if you make it simple for
user space to do stupid things, they will happen in all provided ways
and some more.

> > The design to support this feature makes no sense at all to me. We
> > have a special mmap interface, some magic kernel side mapping
> > functionality and then on top of it a prctl telling the kernel to
> > ignore/respect it.
> 
> That's a good point.  We don't seem to have anything in the
> allocate_bt() side of things to tell the kernel to refuse to create
> things if the prctl() hasn't been called.  That needs to get added.

And then you need another bunch of logic in the prctl(disable mpx)
path to cleanup the mess instead of just setting a random pointer to
NULL.

> If you don't want to share them in public, I'm happy to take this
> off-list, but please do share.

I'll let you know once I verified that it might work.

Thanks,

	tglx
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
  2014-09-12  2:54       ` Ren, Qiaowei
@ 2014-09-12  8:17         ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12  8:17 UTC (permalink / raw)
  To: Ren, Qiaowei
  Cc: H. Peter Anvin, Ingo Molnar, Hansen, Dave, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Ren, Qiaowei wrote:
> On 2014-09-12, Thomas Gleixner wrote:
> > On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> > 
> >> Due to new fields about bound violation added into struct siginfo,
> >> this patch syncs it with general version to avoid build issue.
> > 
> > You completely fail to explain which build issue is addressed by this
> > patch. The code you added to kernel/signal.c which accesses _addr_bnd
> > is guarded by
> > 
> > +#ifdef SEGV_BNDERR
> > 
> > which is not defined my MIPS. Also why is this only affecting MIPS and
> > not any other architecture which provides its own struct siginfo ?
> > 
> > That patch makes no sense at all, at least not without a proper explanation.
> >
> For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will
> include general siginfo.h, and only replace general stuct siginfo
> with mips specific struct siginfo. So SEGV_BNDERR will be defined
> for all archs, and we will get error like "no _lower in struct
> siginfo" when arch=mips.

> In addition, only MIPS arch define its own struct siginfo, so this
> is only affecting MIPS.

So IA64 does not count as an architecture and therefor does not need
the same treatment, right?

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
@ 2014-09-12  8:17         ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12  8:17 UTC (permalink / raw)
  To: Ren, Qiaowei
  Cc: H. Peter Anvin, Ingo Molnar, Hansen, Dave, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Ren, Qiaowei wrote:
> On 2014-09-12, Thomas Gleixner wrote:
> > On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> > 
> >> Due to new fields about bound violation added into struct siginfo,
> >> this patch syncs it with general version to avoid build issue.
> > 
> > You completely fail to explain which build issue is addressed by this
> > patch. The code you added to kernel/signal.c which accesses _addr_bnd
> > is guarded by
> > 
> > +#ifdef SEGV_BNDERR
> > 
> > which is not defined my MIPS. Also why is this only affecting MIPS and
> > not any other architecture which provides its own struct siginfo ?
> > 
> > That patch makes no sense at all, at least not without a proper explanation.
> >
> For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will
> include general siginfo.h, and only replace general stuct siginfo
> with mips specific struct siginfo. So SEGV_BNDERR will be defined
> for all archs, and we will get error like "no _lower in struct
> siginfo" when arch=mips.

> In addition, only MIPS arch define its own struct siginfo, so this
> is only affecting MIPS.

So IA64 does not count as an architecture and therefor does not need
the same treatment, right?

Thanks,

	tglx


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12  8:11         ` Thomas Gleixner
@ 2014-09-12  9:24           ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12  9:24 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Dave Hansen wrote:
> > Well, we use it to figure out whether we _potentially_ need to tear down
> > an VM_MPX-flagged area.  There's no guarantee that there will be one.
> 
> So what you are saying is, that if user space sets the pointer to NULL
> via the unregister prctl, kernel can safely ignore vmas which have the
> VM_MPX flag set. I really can't follow that logic.
>  
> 	mmap_mpx();
> 	prctl(enable mpx);
> 	do lots of crap which uses mpx;
> 	prctl(disable mpx);
> 
> So after that point the previous use of MPX is irrelevant, just
> because we set a pointer to NULL? Does it just look like crap because
> I do not get the big picture how all of this is supposed to work?

do_bounds() will happily map new BTs no matter whether the prctl was
invoked or not. So what's the value of the prctl at all?

The mapping is flagged VM_MPX. Why is this not sufficient?

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12  9:24           ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12  9:24 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Dave Hansen wrote:
> > Well, we use it to figure out whether we _potentially_ need to tear down
> > an VM_MPX-flagged area.  There's no guarantee that there will be one.
> 
> So what you are saying is, that if user space sets the pointer to NULL
> via the unregister prctl, kernel can safely ignore vmas which have the
> VM_MPX flag set. I really can't follow that logic.
>  
> 	mmap_mpx();
> 	prctl(enable mpx);
> 	do lots of crap which uses mpx;
> 	prctl(disable mpx);
> 
> So after that point the previous use of MPX is irrelevant, just
> because we set a pointer to NULL? Does it just look like crap because
> I do not get the big picture how all of this is supposed to work?

do_bounds() will happily map new BTs no matter whether the prctl was
invoked or not. So what's the value of the prctl at all?

The mapping is flagged VM_MPX. Why is this not sufficient?

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-12  4:44             ` H. Peter Anvin
@ 2014-09-12 13:10               ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 13:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, H. Peter Anvin wrote:

> On 09/11/2014 04:37 PM, Thomas Gleixner wrote:
> > > 
> > > Specifically because marshaling the data in and out of the generic
> > > decoder was more complex than a special-purpose decoder.
> > 
> > I did not look at that detail and I trust your judgement here, but
> > that is in no way explained in the changelog.
> > 
> > This whole patchset is a pain to review due to half baken changelogs
> > and complete lack of a proper design description.
> > 
> 
> I'm not wedded to that concept, by the way, but using the generic parser had a
> whole bunch of its own problems, including the fact that you're getting bytes
> from user space.

Errm. The instruction decoder does not even know about user space.

      u8 buf[MAX_INSN_SIZE];

      memset(buf, 0, MAX_INSN_SIZE);
      if (copy_from_user(buf, addr, MAX_INSN_SIZE))
      	    return 0;

      insn_init(insn, buf, is_64bit(current));

      /* Process the entire instruction */
      insn_get_length(insn);

      /* Decode the faulting address */
      return mpx_get_addr(insn, regs);

I really can't see why that should not work. insn_get_length()
retrieves exactly the information which is required to call
mpx_get_addr().

Sure it might be a bit slower because the generic decoder does a bit
more than the mpx private sauce, but this happens in the context of a
bounds violation and it really does not matter at all whether SIGSEGV
is delivered 5 microseconds later or not.

The only difference is the insn->limit handling in the MPX
decoder. The existing decoder has a limit check of:

#define MAX_INSN_SIZE       16

and MPX private one makes that

#define MAX_MPX_INSN_SIZE   15

and limits it runtime further to:

    MAX_MPX_INSN_SIZE - bytes_not_copied_from_user_space;

This is beyond silly, really. If we cannot copy 16 bytes from user
space, why bother in dealing with a partial copy at all.

Aside of that the existing decoder handles the 32bit app on a 64bit
kernel already correctly while the extra magic MPX decoder does
not. It just adds some magically optimized and different copy of the
existing decoder for exactly ZERO value.

> It might be worthwhile to compare the older patchset which did use the generic
> parser to make sure that it actually made sense.

I can't find such a thing. The first version I found contains an even
more convoluted private parser. Intelnal mail perhaps?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-12 13:10               ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 13:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, H. Peter Anvin wrote:

> On 09/11/2014 04:37 PM, Thomas Gleixner wrote:
> > > 
> > > Specifically because marshaling the data in and out of the generic
> > > decoder was more complex than a special-purpose decoder.
> > 
> > I did not look at that detail and I trust your judgement here, but
> > that is in no way explained in the changelog.
> > 
> > This whole patchset is a pain to review due to half baken changelogs
> > and complete lack of a proper design description.
> > 
> 
> I'm not wedded to that concept, by the way, but using the generic parser had a
> whole bunch of its own problems, including the fact that you're getting bytes
> from user space.

Errm. The instruction decoder does not even know about user space.

      u8 buf[MAX_INSN_SIZE];

      memset(buf, 0, MAX_INSN_SIZE);
      if (copy_from_user(buf, addr, MAX_INSN_SIZE))
      	    return 0;

      insn_init(insn, buf, is_64bit(current));

      /* Process the entire instruction */
      insn_get_length(insn);

      /* Decode the faulting address */
      return mpx_get_addr(insn, regs);

I really can't see why that should not work. insn_get_length()
retrieves exactly the information which is required to call
mpx_get_addr().

Sure it might be a bit slower because the generic decoder does a bit
more than the mpx private sauce, but this happens in the context of a
bounds violation and it really does not matter at all whether SIGSEGV
is delivered 5 microseconds later or not.

The only difference is the insn->limit handling in the MPX
decoder. The existing decoder has a limit check of:

#define MAX_INSN_SIZE       16

and MPX private one makes that

#define MAX_MPX_INSN_SIZE   15

and limits it runtime further to:

    MAX_MPX_INSN_SIZE - bytes_not_copied_from_user_space;

This is beyond silly, really. If we cannot copy 16 bytes from user
space, why bother in dealing with a partial copy at all.

Aside of that the existing decoder handles the 32bit app on a 64bit
kernel already correctly while the extra magic MPX decoder does
not. It just adds some magically optimized and different copy of the
existing decoder for exactly ZERO value.

> It might be worthwhile to compare the older patchset which did use the generic
> parser to make sure that it actually made sense.

I can't find such a thing. The first version I found contains an even
more convoluted private parser. Intelnal mail perhaps?

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-12 13:10               ` Thomas Gleixner
@ 2014-09-12 13:39                 ` H. Peter Anvin
  -1 siblings, 0 replies; 130+ messages in thread
From: H. Peter Anvin @ 2014-09-12 13:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 06:10 AM, Thomas Gleixner wrote:
>>
>> I'm not wedded to that concept, by the way, but using the generic parser had a
>> whole bunch of its own problems, including the fact that you're getting bytes
>> from user space.
> 
> Errm. The instruction decoder does not even know about user space.
> 
>       u8 buf[MAX_INSN_SIZE];
> 
>       memset(buf, 0, MAX_INSN_SIZE);
>       if (copy_from_user(buf, addr, MAX_INSN_SIZE))
>       	    return 0;
> 
>       insn_init(insn, buf, is_64bit(current));
> 
>       /* Process the entire instruction */
>       insn_get_length(insn);
> 
>       /* Decode the faulting address */
>       return mpx_get_addr(insn, regs);
> 
> I really can't see why that should not work. insn_get_length()
> retrieves exactly the information which is required to call
> mpx_get_addr().
> 
> Sure it might be a bit slower because the generic decoder does a bit
> more than the mpx private sauce, but this happens in the context of a
> bounds violation and it really does not matter at all whether SIGSEGV
> is delivered 5 microseconds later or not.
> 
> The only difference is the insn->limit handling in the MPX
> decoder. The existing decoder has a limit check of:
> 
> #define MAX_INSN_SIZE       16
> 
> and MPX private one makes that
> 
> #define MAX_MPX_INSN_SIZE   15
> 
> and limits it runtime further to:
> 
>     MAX_MPX_INSN_SIZE - bytes_not_copied_from_user_space;
> 
> This is beyond silly, really. If we cannot copy 16 bytes from user
> space, why bother in dealing with a partial copy at all.
> 

The correct limit is 15 bytes, not anything else, so this is a bug in
the existing decoder.  A sequence of bytes longer than 15 bytes will
#UD, regardless of being "otherwise valid".

Keep in mind the instruction may not be aligned, and you could fit an
instruction plus a jump and still overrun a page in 15 bytes.

> Aside of that the existing decoder handles the 32bit app on a 64bit
> kernel already correctly while the extra magic MPX decoder does
> not. It just adds some magically optimized and different copy of the
> existing decoder for exactly ZERO value.
> 
>> It might be worthwhile to compare the older patchset which did use the generic
>> parser to make sure that it actually made sense.
> 
> I can't find such a thing. The first version I found contains an even
> more convoluted private parser. Intelnal mail perhaps?

Yes, I suspect so.

	-hpa


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-12 13:39                 ` H. Peter Anvin
  0 siblings, 0 replies; 130+ messages in thread
From: H. Peter Anvin @ 2014-09-12 13:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 06:10 AM, Thomas Gleixner wrote:
>>
>> I'm not wedded to that concept, by the way, but using the generic parser had a
>> whole bunch of its own problems, including the fact that you're getting bytes
>> from user space.
> 
> Errm. The instruction decoder does not even know about user space.
> 
>       u8 buf[MAX_INSN_SIZE];
> 
>       memset(buf, 0, MAX_INSN_SIZE);
>       if (copy_from_user(buf, addr, MAX_INSN_SIZE))
>       	    return 0;
> 
>       insn_init(insn, buf, is_64bit(current));
> 
>       /* Process the entire instruction */
>       insn_get_length(insn);
> 
>       /* Decode the faulting address */
>       return mpx_get_addr(insn, regs);
> 
> I really can't see why that should not work. insn_get_length()
> retrieves exactly the information which is required to call
> mpx_get_addr().
> 
> Sure it might be a bit slower because the generic decoder does a bit
> more than the mpx private sauce, but this happens in the context of a
> bounds violation and it really does not matter at all whether SIGSEGV
> is delivered 5 microseconds later or not.
> 
> The only difference is the insn->limit handling in the MPX
> decoder. The existing decoder has a limit check of:
> 
> #define MAX_INSN_SIZE       16
> 
> and MPX private one makes that
> 
> #define MAX_MPX_INSN_SIZE   15
> 
> and limits it runtime further to:
> 
>     MAX_MPX_INSN_SIZE - bytes_not_copied_from_user_space;
> 
> This is beyond silly, really. If we cannot copy 16 bytes from user
> space, why bother in dealing with a partial copy at all.
> 

The correct limit is 15 bytes, not anything else, so this is a bug in
the existing decoder.  A sequence of bytes longer than 15 bytes will
#UD, regardless of being "otherwise valid".

Keep in mind the instruction may not be aligned, and you could fit an
instruction plus a jump and still overrun a page in 15 bytes.

> Aside of that the existing decoder handles the 32bit app on a 64bit
> kernel already correctly while the extra magic MPX decoder does
> not. It just adds some magically optimized and different copy of the
> existing decoder for exactly ZERO value.
> 
>> It might be worthwhile to compare the older patchset which did use the generic
>> parser to make sure that it actually made sense.
> 
> I can't find such a thing. The first version I found contains an even
> more convoluted private parser. Intelnal mail perhaps?

Yes, I suspect so.

	-hpa

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12  9:24           ` Thomas Gleixner
@ 2014-09-12 14:36             ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 14:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 02:24 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Thomas Gleixner wrote:
>> On Thu, 11 Sep 2014, Dave Hansen wrote:
>>> Well, we use it to figure out whether we _potentially_ need to tear down
>>> an VM_MPX-flagged area.  There's no guarantee that there will be one.
>>
>> So what you are saying is, that if user space sets the pointer to NULL
>> via the unregister prctl, kernel can safely ignore vmas which have the
>> VM_MPX flag set. I really can't follow that logic.
>>  
>> 	mmap_mpx();
>> 	prctl(enable mpx);
>> 	do lots of crap which uses mpx;
>> 	prctl(disable mpx);
>>
>> So after that point the previous use of MPX is irrelevant, just
>> because we set a pointer to NULL? Does it just look like crap because
>> I do not get the big picture how all of this is supposed to work?
> 
> do_bounds() will happily map new BTs no matter whether the prctl was
> invoked or not. So what's the value of the prctl at all?

The behavior as it stands is wrong.  We should at least have the kernel
refuse to map new BTs if the prctl() hasn't been issued.  We'll fix it up.

> The mapping is flagged VM_MPX. Why is this not sufficient?

The comment is confusing and only speaks to half of what the if() in
question is doing.  We'll get a better comment in there.  But, for the
sake of explaining it fully:

There are two mappings in play:
1. The mapping with the actual data, which userspace is munmap()ing or
   brk()ing away, etc... (never tagged VM_MPX)
2. The mapping for the bounds table *backing* the data (is tagged with
   VM_MPX)

The code ends up looking like this:

vm_munmap()
{
	do_unmap(vma); // #1 above
	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
		// lookup the backing vma (#2 above)
		vm_munmap(vma2)
}

The bd_addr check is intended to say "could the kernel have possibly
created some VM_MPX vmas?"  As you noted above, we will happily go
creating VM_MPX vmas without mm->bd_addr being set.  That's will get fixed.

The VM_MPX _flags_ check on the VMA is there simply to prevent
recursion.  vm_munmap() of the VM_MPX vma is called _under_ vm_munmap()
of the data VMA, and we've got to ensure it doesn't recurse.  *This*
part of the if() in question is not addressed in the comment.  That's
something we can fix up in the next version.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12 14:36             ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 14:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 02:24 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Thomas Gleixner wrote:
>> On Thu, 11 Sep 2014, Dave Hansen wrote:
>>> Well, we use it to figure out whether we _potentially_ need to tear down
>>> an VM_MPX-flagged area.  There's no guarantee that there will be one.
>>
>> So what you are saying is, that if user space sets the pointer to NULL
>> via the unregister prctl, kernel can safely ignore vmas which have the
>> VM_MPX flag set. I really can't follow that logic.
>>  
>> 	mmap_mpx();
>> 	prctl(enable mpx);
>> 	do lots of crap which uses mpx;
>> 	prctl(disable mpx);
>>
>> So after that point the previous use of MPX is irrelevant, just
>> because we set a pointer to NULL? Does it just look like crap because
>> I do not get the big picture how all of this is supposed to work?
> 
> do_bounds() will happily map new BTs no matter whether the prctl was
> invoked or not. So what's the value of the prctl at all?

The behavior as it stands is wrong.  We should at least have the kernel
refuse to map new BTs if the prctl() hasn't been issued.  We'll fix it up.

> The mapping is flagged VM_MPX. Why is this not sufficient?

The comment is confusing and only speaks to half of what the if() in
question is doing.  We'll get a better comment in there.  But, for the
sake of explaining it fully:

There are two mappings in play:
1. The mapping with the actual data, which userspace is munmap()ing or
   brk()ing away, etc... (never tagged VM_MPX)
2. The mapping for the bounds table *backing* the data (is tagged with
   VM_MPX)

The code ends up looking like this:

vm_munmap()
{
	do_unmap(vma); // #1 above
	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
		// lookup the backing vma (#2 above)
		vm_munmap(vma2)
}

The bd_addr check is intended to say "could the kernel have possibly
created some VM_MPX vmas?"  As you noted above, we will happily go
creating VM_MPX vmas without mm->bd_addr being set.  That's will get fixed.

The VM_MPX _flags_ check on the VMA is there simply to prevent
recursion.  vm_munmap() of the VM_MPX vma is called _under_ vm_munmap()
of the data VMA, and we've got to ensure it doesn't recurse.  *This*
part of the if() in question is not addressed in the comment.  That's
something we can fix up in the next version.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12  8:11         ` Thomas Gleixner
@ 2014-09-12 15:22           ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 15:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 01:11 AM, Thomas Gleixner wrote:
> So what you are saying is, that if user space sets the pointer to NULL
> via the unregister prctl, kernel can safely ignore vmas which have the
> VM_MPX flag set. I really can't follow that logic.
>  
> 	mmap_mpx();
> 	prctl(enable mpx);
> 	do lots of crap which uses mpx;
> 	prctl(disable mpx);
> 
> So after that point the previous use of MPX is irrelevant, just
> because we set a pointer to NULL? Does it just look like crap because
> I do not get the big picture how all of this is supposed to work?

The prctl(register) is meant to be a signal from userspace to the kernel
to say, "I would like your help in managing these bounds tables".
prctl(unregister) is the opposite, meaning "I don't want your help any
more".

The kernel won't really ignore VM_MPX vmas, it just won't go looking for
them actively in response to the unmapping of other non-VM_MPX vmas.

>> Yes.  The only other way the kernel can possibly know that it needs to
>> go tearing things down is with a potentially frequent and expensive xsave.
>>
>> Either we change mmap to say "this mmap() is for a bounds directory", or
>> we have some other interface that says "the mmap() for the bounds
>> directory is at $foo".  We could also record the bounds directory the
>> first time that we catch userspace using it.  I'd rather have an
>> explicit interface than an implicit one like that, though I don't feel
>> that strongly about it.
> 
> I really have to disagree here. If I follow your logic then we would
> have a prctl for using floating point as well instead of catching the
> use and handle it from there. Just get it, if you make it simple for
> user space to do stupid things, they will happen in all provided ways
> and some more.

Here's what it boils down to:

If userspace uses a floating point register, it wants it saved.

If userspace uses MPX, it does not necessarily want the kernel to do
bounds table management all the time (or ever in some cases).  Without
the prctl(), the kernel has no way of distinguishing what userspace wants.

>>> The design to support this feature makes no sense at all to me. We
>>> have a special mmap interface, some magic kernel side mapping
>>> functionality and then on top of it a prctl telling the kernel to
>>> ignore/respect it.
>>
>> That's a good point.  We don't seem to have anything in the
>> allocate_bt() side of things to tell the kernel to refuse to create
>> things if the prctl() hasn't been called.  That needs to get added.
> 
> And then you need another bunch of logic in the prctl(disable mpx)
> path to cleanup the mess instead of just setting a random pointer to
> NULL.

The bounds tables potentially represent a *lot* of state.  If userspace
wants to temporarily turn off the kernel's MPX bounds table management,
it does not necessarily want that state destroyed.  On the other hand,
if userspace feels the need to go destroying all the state, it is free
to do so and does not need any help to do so from the kernel.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12 15:22           ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 15:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 01:11 AM, Thomas Gleixner wrote:
> So what you are saying is, that if user space sets the pointer to NULL
> via the unregister prctl, kernel can safely ignore vmas which have the
> VM_MPX flag set. I really can't follow that logic.
>  
> 	mmap_mpx();
> 	prctl(enable mpx);
> 	do lots of crap which uses mpx;
> 	prctl(disable mpx);
> 
> So after that point the previous use of MPX is irrelevant, just
> because we set a pointer to NULL? Does it just look like crap because
> I do not get the big picture how all of this is supposed to work?

The prctl(register) is meant to be a signal from userspace to the kernel
to say, "I would like your help in managing these bounds tables".
prctl(unregister) is the opposite, meaning "I don't want your help any
more".

The kernel won't really ignore VM_MPX vmas, it just won't go looking for
them actively in response to the unmapping of other non-VM_MPX vmas.

>> Yes.  The only other way the kernel can possibly know that it needs to
>> go tearing things down is with a potentially frequent and expensive xsave.
>>
>> Either we change mmap to say "this mmap() is for a bounds directory", or
>> we have some other interface that says "the mmap() for the bounds
>> directory is at $foo".  We could also record the bounds directory the
>> first time that we catch userspace using it.  I'd rather have an
>> explicit interface than an implicit one like that, though I don't feel
>> that strongly about it.
> 
> I really have to disagree here. If I follow your logic then we would
> have a prctl for using floating point as well instead of catching the
> use and handle it from there. Just get it, if you make it simple for
> user space to do stupid things, they will happen in all provided ways
> and some more.

Here's what it boils down to:

If userspace uses a floating point register, it wants it saved.

If userspace uses MPX, it does not necessarily want the kernel to do
bounds table management all the time (or ever in some cases).  Without
the prctl(), the kernel has no way of distinguishing what userspace wants.

>>> The design to support this feature makes no sense at all to me. We
>>> have a special mmap interface, some magic kernel side mapping
>>> functionality and then on top of it a prctl telling the kernel to
>>> ignore/respect it.
>>
>> That's a good point.  We don't seem to have anything in the
>> allocate_bt() side of things to tell the kernel to refuse to create
>> things if the prctl() hasn't been called.  That needs to get added.
> 
> And then you need another bunch of logic in the prctl(disable mpx)
> path to cleanup the mess instead of just setting a random pointer to
> NULL.

The bounds tables potentially represent a *lot* of state.  If userspace
wants to temporarily turn off the kernel's MPX bounds table management,
it does not necessarily want that state destroyed.  On the other hand,
if userspace feels the need to go destroying all the state, it is free
to do so and does not need any help to do so from the kernel.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12 14:36             ` Dave Hansen
@ 2014-09-12 17:34               ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 17:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Dave Hansen wrote:
> On 09/12/2014 02:24 AM, Thomas Gleixner wrote:
> > On Fri, 12 Sep 2014, Thomas Gleixner wrote:
> >> On Thu, 11 Sep 2014, Dave Hansen wrote:
> >>> Well, we use it to figure out whether we _potentially_ need to tear down
> >>> an VM_MPX-flagged area.  There's no guarantee that there will be one.
> >>
> >> So what you are saying is, that if user space sets the pointer to NULL
> >> via the unregister prctl, kernel can safely ignore vmas which have the
> >> VM_MPX flag set. I really can't follow that logic.
> >>  
> >> 	mmap_mpx();
> >> 	prctl(enable mpx);
> >> 	do lots of crap which uses mpx;
> >> 	prctl(disable mpx);
> >>
> >> So after that point the previous use of MPX is irrelevant, just
> >> because we set a pointer to NULL? Does it just look like crap because
> >> I do not get the big picture how all of this is supposed to work?
> > 
> > do_bounds() will happily map new BTs no matter whether the prctl was
> > invoked or not. So what's the value of the prctl at all?
> 
> The behavior as it stands is wrong.  We should at least have the kernel
> refuse to map new BTs if the prctl() hasn't been issued.  We'll fix it up.
> 
> > The mapping is flagged VM_MPX. Why is this not sufficient?
> 
> The comment is confusing and only speaks to half of what the if() in
> question is doing.  We'll get a better comment in there.  But, for the
> sake of explaining it fully:
> 
> There are two mappings in play:
> 1. The mapping with the actual data, which userspace is munmap()ing or
>    brk()ing away, etc... (never tagged VM_MPX)

It's not tagged that way because it is mapped by user space. This is
the directory, right?

> 2. The mapping for the bounds table *backing* the data (is tagged with
>    VM_MPX)

That's the stuff, which gets magically allocated from do_bounds(). And
the reason you do that from the #BR is that user space would have to
allocate a gazillion of bound tables to make sure that every corner
case is covered. With the allocation from #BR you make that behaviour
dynamic and you just provide an empty "no bounds" table to make the
bound checker happy.

> The code ends up looking like this:
> 
> vm_munmap()
> {
> 	do_unmap(vma); // #1 above
> 	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
> 		// lookup the backing vma (#2 above)
> 		vm_munmap(vma2)
> }
> 
> The bd_addr check is intended to say "could the kernel have possibly
> created some VM_MPX vmas?"  As you noted above, we will happily go
> creating VM_MPX vmas without mm->bd_addr being set.  That's will get fixed.
> 
> The VM_MPX _flags_ check on the VMA is there simply to prevent
> recursion.  vm_munmap() of the VM_MPX vma is called _under_ vm_munmap()
> of the data VMA, and we've got to ensure it doesn't recurse.  *This*
> part of the if() in question is not addressed in the comment.  That's
> something we can fix up in the next version.

Ok, slowly I get the puzzle together :)

Now, the question is whether this magic fragile fixup is the right
thing to do in the context of unmap/brk.

So if the directory is unmapped, you want to free the bounds tables
which are referenced from the directory, i.e. those which you
allocated in do_bounds().
 
So you call arch_unmap() at the very end of do_unmap(). This walks the
directory to look at the entries and unmaps the bounds table which is
referenced from the directory and then clears the directory entry.

Now, I have a hard time to see how that is supposed to work.

do_unmap()
 detach_vmas_to_be_unmapped()
 unmap_region()
   free_pgtables()
 arch_unmap()
   mpx_unmap()

So at the point where you try to access the directory to gather the
information about the entries which might be affected, that stuff is
unmapped already and the page tables are gone.

Brilliant idea, really. And if you run into the fault in mpx_unmap()
you plan to delegate the fixup to a work queue. How is that thing
going to find what belonged to the unmapped directory?

Even if the stuff would be accessible at that point, it is a damned
stupid idea to rely on anything userspace is providing to you. I
learned that the hard way in futex.c

The proper solution to this problem is:

    do_bounds()
	bd_addr = get_bd_addr_from_xsave();
	bd_entry = bndstatus & ADDR_MASK:

	bt = mpx_mmap(bd_addr, bd_entry, len);

	set_bt_entry_in_bd(bd_entry, bt);

And in mpx_mmap()

       .....
       vma = find_vma();

       vma->bd_addr = bd_addr;
       vma->bd_entry = bd_entry;

Now on mpx_unmap()

    for_each_vma()
	if (is_affected(vma->bd_addr, vma->bd_entry))
 	   unmap(vma);

That does not require a prctl, no fault handling in the unmap path, it
just works and is robust by design because it does not rely on any
user space crappola. You store the directory context at allocation
time and free it when that context goes away. It's that simple, really.

So you can still think about a prctl in order to enable/disable the
automatic mapping stuff, but that's a completely different story.
   
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12 17:34               ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 17:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Dave Hansen wrote:
> On 09/12/2014 02:24 AM, Thomas Gleixner wrote:
> > On Fri, 12 Sep 2014, Thomas Gleixner wrote:
> >> On Thu, 11 Sep 2014, Dave Hansen wrote:
> >>> Well, we use it to figure out whether we _potentially_ need to tear down
> >>> an VM_MPX-flagged area.  There's no guarantee that there will be one.
> >>
> >> So what you are saying is, that if user space sets the pointer to NULL
> >> via the unregister prctl, kernel can safely ignore vmas which have the
> >> VM_MPX flag set. I really can't follow that logic.
> >>  
> >> 	mmap_mpx();
> >> 	prctl(enable mpx);
> >> 	do lots of crap which uses mpx;
> >> 	prctl(disable mpx);
> >>
> >> So after that point the previous use of MPX is irrelevant, just
> >> because we set a pointer to NULL? Does it just look like crap because
> >> I do not get the big picture how all of this is supposed to work?
> > 
> > do_bounds() will happily map new BTs no matter whether the prctl was
> > invoked or not. So what's the value of the prctl at all?
> 
> The behavior as it stands is wrong.  We should at least have the kernel
> refuse to map new BTs if the prctl() hasn't been issued.  We'll fix it up.
> 
> > The mapping is flagged VM_MPX. Why is this not sufficient?
> 
> The comment is confusing and only speaks to half of what the if() in
> question is doing.  We'll get a better comment in there.  But, for the
> sake of explaining it fully:
> 
> There are two mappings in play:
> 1. The mapping with the actual data, which userspace is munmap()ing or
>    brk()ing away, etc... (never tagged VM_MPX)

It's not tagged that way because it is mapped by user space. This is
the directory, right?

> 2. The mapping for the bounds table *backing* the data (is tagged with
>    VM_MPX)

That's the stuff, which gets magically allocated from do_bounds(). And
the reason you do that from the #BR is that user space would have to
allocate a gazillion of bound tables to make sure that every corner
case is covered. With the allocation from #BR you make that behaviour
dynamic and you just provide an empty "no bounds" table to make the
bound checker happy.

> The code ends up looking like this:
> 
> vm_munmap()
> {
> 	do_unmap(vma); // #1 above
> 	if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
> 		// lookup the backing vma (#2 above)
> 		vm_munmap(vma2)
> }
> 
> The bd_addr check is intended to say "could the kernel have possibly
> created some VM_MPX vmas?"  As you noted above, we will happily go
> creating VM_MPX vmas without mm->bd_addr being set.  That's will get fixed.
> 
> The VM_MPX _flags_ check on the VMA is there simply to prevent
> recursion.  vm_munmap() of the VM_MPX vma is called _under_ vm_munmap()
> of the data VMA, and we've got to ensure it doesn't recurse.  *This*
> part of the if() in question is not addressed in the comment.  That's
> something we can fix up in the next version.

Ok, slowly I get the puzzle together :)

Now, the question is whether this magic fragile fixup is the right
thing to do in the context of unmap/brk.

So if the directory is unmapped, you want to free the bounds tables
which are referenced from the directory, i.e. those which you
allocated in do_bounds().
 
So you call arch_unmap() at the very end of do_unmap(). This walks the
directory to look at the entries and unmaps the bounds table which is
referenced from the directory and then clears the directory entry.

Now, I have a hard time to see how that is supposed to work.

do_unmap()
 detach_vmas_to_be_unmapped()
 unmap_region()
   free_pgtables()
 arch_unmap()
   mpx_unmap()

So at the point where you try to access the directory to gather the
information about the entries which might be affected, that stuff is
unmapped already and the page tables are gone.

Brilliant idea, really. And if you run into the fault in mpx_unmap()
you plan to delegate the fixup to a work queue. How is that thing
going to find what belonged to the unmapped directory?

Even if the stuff would be accessible at that point, it is a damned
stupid idea to rely on anything userspace is providing to you. I
learned that the hard way in futex.c

The proper solution to this problem is:

    do_bounds()
	bd_addr = get_bd_addr_from_xsave();
	bd_entry = bndstatus & ADDR_MASK:

	bt = mpx_mmap(bd_addr, bd_entry, len);

	set_bt_entry_in_bd(bd_entry, bt);

And in mpx_mmap()

       .....
       vma = find_vma();

       vma->bd_addr = bd_addr;
       vma->bd_entry = bd_entry;

Now on mpx_unmap()

    for_each_vma()
	if (is_affected(vma->bd_addr, vma->bd_entry))
 	   unmap(vma);

That does not require a prctl, no fault handling in the unmap path, it
just works and is robust by design because it does not rely on any
user space crappola. You store the directory context at allocation
time and free it when that context goes away. It's that simple, really.

So you can still think about a prctl in order to enable/disable the
automatic mapping stuff, but that's a completely different story.
   
Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12 15:22           ` Dave Hansen
@ 2014-09-12 17:42             ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 17:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Dave Hansen wrote:
> On 09/12/2014 01:11 AM, Thomas Gleixner wrote:
> > So what you are saying is, that if user space sets the pointer to NULL
> > via the unregister prctl, kernel can safely ignore vmas which have the
> > VM_MPX flag set. I really can't follow that logic.
> >  
> > 	mmap_mpx();
> > 	prctl(enable mpx);
> > 	do lots of crap which uses mpx;
> > 	prctl(disable mpx);
> > 
> > So after that point the previous use of MPX is irrelevant, just
> > because we set a pointer to NULL? Does it just look like crap because
> > I do not get the big picture how all of this is supposed to work?
> 
> The prctl(register) is meant to be a signal from userspace to the kernel
> to say, "I would like your help in managing these bounds tables".
> prctl(unregister) is the opposite, meaning "I don't want your help any
> more".

Fine, but that's a totally different story. I can see the usefulness
of this, but then it's a complete misnomer. It should be:

   prctl(EN/DISABLE_MPX_BT_MANAGEMENT)

So this wants to be a boolean value and not some random user space
address collected at some random point and then ignored until you do
the magic cleanup. See the other reply.

> If userspace uses MPX, it does not necessarily want the kernel to do
> bounds table management all the time (or ever in some cases).  Without
> the prctl(), the kernel has no way of distinguishing what userspace wants.

Fine with me, but it needs to be done proper. And proper means: ON/OFF

The kernel has to handle the information for which context it
allocated stuff and then tear it down when the context goes
away. Relying on a user space address sampled at some random prctl
point is just stupid.

> > And then you need another bunch of logic in the prctl(disable mpx)
> > path to cleanup the mess instead of just setting a random pointer to
> > NULL.
> 
> The bounds tables potentially represent a *lot* of state.  If userspace
> wants to temporarily turn off the kernel's MPX bounds table management,
> it does not necessarily want that state destroyed.  On the other hand,
> if userspace feels the need to go destroying all the state, it is free
> to do so and does not need any help to do so from the kernel.

Fine with me, but the above still stands.

Thanks,

	tglx

 

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12 17:42             ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 17:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Dave Hansen wrote:
> On 09/12/2014 01:11 AM, Thomas Gleixner wrote:
> > So what you are saying is, that if user space sets the pointer to NULL
> > via the unregister prctl, kernel can safely ignore vmas which have the
> > VM_MPX flag set. I really can't follow that logic.
> >  
> > 	mmap_mpx();
> > 	prctl(enable mpx);
> > 	do lots of crap which uses mpx;
> > 	prctl(disable mpx);
> > 
> > So after that point the previous use of MPX is irrelevant, just
> > because we set a pointer to NULL? Does it just look like crap because
> > I do not get the big picture how all of this is supposed to work?
> 
> The prctl(register) is meant to be a signal from userspace to the kernel
> to say, "I would like your help in managing these bounds tables".
> prctl(unregister) is the opposite, meaning "I don't want your help any
> more".

Fine, but that's a totally different story. I can see the usefulness
of this, but then it's a complete misnomer. It should be:

   prctl(EN/DISABLE_MPX_BT_MANAGEMENT)

So this wants to be a boolean value and not some random user space
address collected at some random point and then ignored until you do
the magic cleanup. See the other reply.

> If userspace uses MPX, it does not necessarily want the kernel to do
> bounds table management all the time (or ever in some cases).  Without
> the prctl(), the kernel has no way of distinguishing what userspace wants.

Fine with me, but it needs to be done proper. And proper means: ON/OFF

The kernel has to handle the information for which context it
allocated stuff and then tear it down when the context goes
away. Relying on a user space address sampled at some random prctl
point is just stupid.

> > And then you need another bunch of logic in the prctl(disable mpx)
> > path to cleanup the mess instead of just setting a random pointer to
> > NULL.
> 
> The bounds tables potentially represent a *lot* of state.  If userspace
> wants to temporarily turn off the kernel's MPX bounds table management,
> it does not necessarily want that state destroyed.  On the other hand,
> if userspace feels the need to go destroying all the state, it is free
> to do so and does not need any help to do so from the kernel.

Fine with me, but the above still stands.

Thanks,

	tglx

 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-12 13:39                 ` H. Peter Anvin
@ 2014-09-12 17:48                   ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 17:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, H. Peter Anvin wrote:
> The correct limit is 15 bytes, not anything else, so this is a bug in
> the existing decoder.  A sequence of bytes longer than 15 bytes will

Fine. Lets fix it there.

> #UD, regardless of being "otherwise valid".

> Keep in mind the instruction may not be aligned, and you could fit an
> instruction plus a jump and still overrun a page in 15 bytes.

Fair enough. OTOH, I doubt that a text mapping will end exactly at
that jump after the MPX instruction.

So that's simple to fix.

Kill the hardcoded limit in lib/insn.c and let the callsites hand in a
lenght argument. So you can still use it for MPX and avoid 200 lines
of blindly copied and slightly different decoder code.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-12 17:48                   ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 17:48 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, H. Peter Anvin wrote:
> The correct limit is 15 bytes, not anything else, so this is a bug in
> the existing decoder.  A sequence of bytes longer than 15 bytes will

Fine. Lets fix it there.

> #UD, regardless of being "otherwise valid".

> Keep in mind the instruction may not be aligned, and you could fit an
> instruction plus a jump and still overrun a page in 15 bytes.

Fair enough. OTOH, I doubt that a text mapping will end exactly at
that jump after the MPX instruction.

So that's simple to fix.

Kill the hardcoded limit in lib/insn.c and let the callsites hand in a
lenght argument. So you can still use it for MPX and avoid 200 lines
of blindly copied and slightly different decoder code.

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-11 22:35         ` H. Peter Anvin
@ 2014-09-12 17:52           ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 17:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, H. Peter Anvin wrote:

> On 09/11/2014 03:32 PM, Dave Hansen wrote:
> > On 09/11/2014 03:18 PM, Thomas Gleixner wrote:
> >> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> >>> This patch sets bound violation fields of siginfo struct in #BR
> >>> exception handler by decoding the user instruction and constructing
> >>> the faulting pointer.
> >>>
> >>> This patch does't use the generic decoder, and implements a limited
> >>> special-purpose decoder to decode MPX instructions, simply because the
> >>> generic decoder is very heavyweight not just in terms of performance
> >>> but in terms of interface -- because it has to.
> >>
> >> And why is that an argument to add another special purpose decoder?
> > 
> > Peter asked for it to be done this way specifically:
> > 
> > 	https://lkml.org/lkml/2014/6/19/411
> > 
> 
> Specifically because marshaling the data in and out of the generic
> decoder was more complex than a special-purpose decoder.

Well, I did not see the trainwreck which tried to use the generic
decoder, but as I explained in the other mail, there is no reason not
to use it and I can't see any complexity in retrieving the data beyond
calling insn_get_length(insn);

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-12 17:52           ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 17:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, H. Peter Anvin wrote:

> On 09/11/2014 03:32 PM, Dave Hansen wrote:
> > On 09/11/2014 03:18 PM, Thomas Gleixner wrote:
> >> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
> >>> This patch sets bound violation fields of siginfo struct in #BR
> >>> exception handler by decoding the user instruction and constructing
> >>> the faulting pointer.
> >>>
> >>> This patch does't use the generic decoder, and implements a limited
> >>> special-purpose decoder to decode MPX instructions, simply because the
> >>> generic decoder is very heavyweight not just in terms of performance
> >>> but in terms of interface -- because it has to.
> >>
> >> And why is that an argument to add another special purpose decoder?
> > 
> > Peter asked for it to be done this way specifically:
> > 
> > 	https://lkml.org/lkml/2014/6/19/411
> > 
> 
> Specifically because marshaling the data in and out of the generic
> decoder was more complex than a special-purpose decoder.

Well, I did not see the trainwreck which tried to use the generic
decoder, but as I explained in the other mail, there is no reason not
to use it and I can't see any complexity in retrieving the data beyond
calling insn_get_length(insn);

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12 17:34               ` Thomas Gleixner
@ 2014-09-12 18:42                 ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 18:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Dave Hansen wrote:
> The proper solution to this problem is:
> 
>     do_bounds()
> 	bd_addr = get_bd_addr_from_xsave();
> 	bd_entry = bndstatus & ADDR_MASK:

Just for clarification. You CANNOT avoid the xsave here because it's
the only way to access BNDSTATUS according to the manual.

"The BNDCFGU and BNDSTATUS registers are accessible only with
 XSAVE/XRSTOR family of instructions"

So there is no point to cache BNDCFGU as you get it anyway when you
need to retrieve the invalid BD entry.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12 18:42                 ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 18:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Dave Hansen wrote:
> The proper solution to this problem is:
> 
>     do_bounds()
> 	bd_addr = get_bd_addr_from_xsave();
> 	bd_entry = bndstatus & ADDR_MASK:

Just for clarification. You CANNOT avoid the xsave here because it's
the only way to access BNDSTATUS according to the manual.

"The BNDCFGU and BNDSTATUS registers are accessible only with
 XSAVE/XRSTOR family of instructions"

So there is no point to cache BNDCFGU as you get it anyway when you
need to retrieve the invalid BD entry.

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
  2014-09-12 17:52           ` Thomas Gleixner
@ 2014-09-12 19:07             ` H. Peter Anvin
  -1 siblings, 0 replies; 130+ messages in thread
From: H. Peter Anvin @ 2014-09-12 19:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 10:52 AM, Thomas Gleixner wrote:
> 
> Well, I did not see the trainwreck which tried to use the generic
> decoder, but as I explained in the other mail, there is no reason not
> to use it and I can't see any complexity in retrieving the data beyond
> calling insn_get_length(insn);
> 

Looking at how complex the state machine ended up being, it probably was
the wrong direction.  It is safe to copy_from_user() 15 bytes, decode
what we get (which may be less than 15 bytes) and then verify with
insn_get_length() that what we decoded is actually what we copied if the
copy_from_user() length is < 15.

My intent was to explore a state machine limited to the restricted "mib"
encodings that are valid for BNDSTX and BNDLDX only, but in the end it
really doesn't make enough difference that it is worth messing with, I
don't think.

	-hpa


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information
@ 2014-09-12 19:07             ` H. Peter Anvin
  0 siblings, 0 replies; 130+ messages in thread
From: H. Peter Anvin @ 2014-09-12 19:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Qiaowei Ren, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 10:52 AM, Thomas Gleixner wrote:
> 
> Well, I did not see the trainwreck which tried to use the generic
> decoder, but as I explained in the other mail, there is no reason not
> to use it and I can't see any complexity in retrieving the data beyond
> calling insn_get_length(insn);
> 

Looking at how complex the state machine ended up being, it probably was
the wrong direction.  It is safe to copy_from_user() 15 bytes, decode
what we get (which may be less than 15 bytes) and then verify with
insn_get_length() that what we decoded is actually what we copied if the
copy_from_user() length is < 15.

My intent was to explore a state machine limited to the restricted "mib"
encodings that are valid for BNDSTX and BNDLDX only, but in the end it
really doesn't make enough difference that it is worth messing with, I
don't think.

	-hpa

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
  2014-09-12  0:51   ` Dave Hansen
@ 2014-09-12 19:21     ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 19:21 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, Dave Hansen wrote:

> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> > MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
> > provide handlers for bounds faults (#BR), and manage bounds memory.
> 
> Qiaowei, We probably need to mention here what "bounds memory" is, and
> why it has to be managed, and who is responsible for the different pieces.
> 
> Who allocates the memory?
> Who fills the memory?
> When is it freed?
> 
> Thomas, do you have any other suggestions for things you'd like to see
> clarified?

Yes, the most important question is WHY must the kernel handle the
bound table memory allocation in the first place. The "documentation"
patch completely fails to tell that.

> +3. Tips
> +=======
> +
> +1) Users are not allowed to create bounds tables and point the bounds
> +directory at them in the userspace. In fact, it is not also necessary
> +for users to create bounds tables in the userspace.

This misses to explain why. I studied the manual carefully and I have
no idea why you think this is a requirement.

MPX can be handled completely from user space. See below before you
answer.

> +When #BR fault is produced due to invalid entry, bounds table will be
> +created in kernel on demand and kernel will not transfer this fault to
> +userspace. So usersapce can't receive #BR fault for invalid entry, and
> +it is not also necessary for users to create bounds tables by themselves.
> +
> +Certainly users can allocate bounds tables and forcibly point the bounds
> +directory at them through XSAVE instruction, and then set valid bit
> +of bounds entry to have this entry valid. But we have no way to track
> +the memory usage of these user-created bounds tables. In regard to this,
> +this behaviour is outlawed here.

So what's the point of declaring it outlawed? Nothing as far as I can
see simply because you cannot enforce it. This is possible and people
simply will do it.

> +2) We will not support the case that multiple bounds directory entries
> +are pointed at the same bounds table.
> +
> +Users can be allowed to take multiple bounds directory entries and point
> +them at the same bounds table. See more information "Intel(R) Architecture
> +Instruction Set Extensions Programming Reference" (9.3.4).
> +
> +If userspace did this, it will be possible for kernel to unmap an in-use
> +bounds table since it does not recognize sharing. So this behavior is
> +also outlawed here.

Again, this is nothing you can enforce and just saying its outlawed
does not prevent user space from doing it and then sending hard to
decode bug reports where it complains about mappings silently
vanishing under it.

So all you can do here is to write up a rule set how well behaving
user space is supposed to use this facility and the kernel side of it. 

Now back to the original question WHY:

The only kind of "argument" you provided in the whole blurb is "if
user space handles the allocation we have no way to track the memory
usage of these tables".

So if the only value of this whole allocation endavour is that we have
a separate "name" entry in proc/$PID/maps then this definitely does
not justify the mess it creates. You'd be better off with creating a
syscall which allows putting a name tag on a anonymous
mapping. Seriously, that would be handy for other purposes than MPX as
well.

But after staring into the manual and the code trainwreck for a day, I
certainly know WHY you want to handle it in kernel space.

If user space wants to handle it, it needs to preallocate all the
Bound Table mappings simply because it cannot do so from the signal
handler which gets invoked on the #BR 'Invalid BD entry'. mmap is not
on the list of safe async handler functions and even if mmap would
work it still requires locking or nasty tricks to keep track of the
allocation state there.

Preallocation is simply not feasible, because user space does not know
about the requirements of libraries etc. So letting the kernel help
out here is the right approach.

All that information is completely missing in the "doc" and all
over the patch series. 

Thanks,

	tglx






^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
@ 2014-09-12 19:21     ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-12 19:21 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Thu, 11 Sep 2014, Dave Hansen wrote:

> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> > MPX kernel code, namely this patchset, has mainly the 2 responsibilities:
> > provide handlers for bounds faults (#BR), and manage bounds memory.
> 
> Qiaowei, We probably need to mention here what "bounds memory" is, and
> why it has to be managed, and who is responsible for the different pieces.
> 
> Who allocates the memory?
> Who fills the memory?
> When is it freed?
> 
> Thomas, do you have any other suggestions for things you'd like to see
> clarified?

Yes, the most important question is WHY must the kernel handle the
bound table memory allocation in the first place. The "documentation"
patch completely fails to tell that.

> +3. Tips
> +=======
> +
> +1) Users are not allowed to create bounds tables and point the bounds
> +directory at them in the userspace. In fact, it is not also necessary
> +for users to create bounds tables in the userspace.

This misses to explain why. I studied the manual carefully and I have
no idea why you think this is a requirement.

MPX can be handled completely from user space. See below before you
answer.

> +When #BR fault is produced due to invalid entry, bounds table will be
> +created in kernel on demand and kernel will not transfer this fault to
> +userspace. So usersapce can't receive #BR fault for invalid entry, and
> +it is not also necessary for users to create bounds tables by themselves.
> +
> +Certainly users can allocate bounds tables and forcibly point the bounds
> +directory at them through XSAVE instruction, and then set valid bit
> +of bounds entry to have this entry valid. But we have no way to track
> +the memory usage of these user-created bounds tables. In regard to this,
> +this behaviour is outlawed here.

So what's the point of declaring it outlawed? Nothing as far as I can
see simply because you cannot enforce it. This is possible and people
simply will do it.

> +2) We will not support the case that multiple bounds directory entries
> +are pointed at the same bounds table.
> +
> +Users can be allowed to take multiple bounds directory entries and point
> +them at the same bounds table. See more information "Intel(R) Architecture
> +Instruction Set Extensions Programming Reference" (9.3.4).
> +
> +If userspace did this, it will be possible for kernel to unmap an in-use
> +bounds table since it does not recognize sharing. So this behavior is
> +also outlawed here.

Again, this is nothing you can enforce and just saying its outlawed
does not prevent user space from doing it and then sending hard to
decode bug reports where it complains about mappings silently
vanishing under it.

So all you can do here is to write up a rule set how well behaving
user space is supposed to use this facility and the kernel side of it. 

Now back to the original question WHY:

The only kind of "argument" you provided in the whole blurb is "if
user space handles the allocation we have no way to track the memory
usage of these tables".

So if the only value of this whole allocation endavour is that we have
a separate "name" entry in proc/$PID/maps then this definitely does
not justify the mess it creates. You'd be better off with creating a
syscall which allows putting a name tag on a anonymous
mapping. Seriously, that would be handy for other purposes than MPX as
well.

But after staring into the manual and the code trainwreck for a day, I
certainly know WHY you want to handle it in kernel space.

If user space wants to handle it, it needs to preallocate all the
Bound Table mappings simply because it cannot do so from the signal
handler which gets invoked on the #BR 'Invalid BD entry'. mmap is not
on the list of safe async handler functions and even if mmap would
work it still requires locking or nasty tricks to keep track of the
allocation state there.

Preallocation is simply not feasible, because user space does not know
about the requirements of libraries etc. So letting the kernel help
out here is the right approach.

All that information is completely missing in the "doc" and all
over the patch series. 

Thanks,

	tglx





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12 17:34               ` Thomas Gleixner
@ 2014-09-12 20:18                 ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 20:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 10:34 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Dave Hansen wrote:
>> There are two mappings in play:
>> 1. The mapping with the actual data, which userspace is munmap()ing or
>>    brk()ing away, etc... (never tagged VM_MPX)
> 
> It's not tagged that way because it is mapped by user space.

Correct.  It is not tagged because it is mapped by user space.

> This is the directory, right?

No.  The untagged mapping in question here is for normal user data, like
an mmap() or brk(), unrelated to MPX.

The directory is a separate matter.  It is also (currently) untagged
with VM_MPX since it is also allocated by userspace.

>> 2. The mapping for the bounds table *backing* the data (is tagged with
>>    VM_MPX)
> 
> That's the stuff, which gets magically allocated from do_bounds(). And
> the reason you do that from the #BR is that user space would have to
> allocate a gazillion of bound tables to make sure that every corner
> case is covered.

Yes.

> With the allocation from #BR you make that behaviour
> dynamic and you just provide an empty "no bounds" table to make the
> bound checker happy.

Kinda.  We do provide an empty table, but the first access will always
be a write, so it doesn't stay empty for long.

...
> Now, I have a hard time to see how that is supposed to work.
> 
> do_unmap()
>  detach_vmas_to_be_unmapped()
>  unmap_region()
>    free_pgtables()
>  arch_unmap()
>    mpx_unmap()
> 
> So at the point where you try to access the directory to gather the
> information about the entries which might be affected, that stuff is
> unmapped already and the page tables are gone.
> 
> Brilliant idea, really. And if you run into the fault in mpx_unmap()
> you plan to delegate the fixup to a work queue. How is that thing
> going to find what belonged to the unmapped directory?

The bounds directory is not being unmapped here.  I _think_ I covered
that above, but don't be shy if I'm not being clear. ;)

> Even if the stuff would be accessible at that point, it is a damned
> stupid idea to rely on anything userspace is providing to you. I
> learned that the hard way in futex.c
> 
> The proper solution to this problem is:
> 
>     do_bounds()
> 	bd_addr = get_bd_addr_from_xsave();
> 	bd_entry = bndstatus & ADDR_MASK:
> 
> 	bt = mpx_mmap(bd_addr, bd_entry, len);
> 
> 	set_bt_entry_in_bd(bd_entry, bt);
> 
> And in mpx_mmap()
> 
>        .....
>        vma = find_vma();
> 
>        vma->bd_addr = bd_addr;
>        vma->bd_entry = bd_entry;

If the bounds directory moved around, this would make sense.  Otherwise,
it's a waste of space because all vmas in a given mm would have the
exact same bd_addr, and we might as well just store it in mm->bd_something.

Are you suggesting that we support moving the bounds directory around?

Also, the bd_entry can be _calculated_ from vma->vm_start and the
bd_addr.  It seems a bit redundant to store it like this.

Also this would add 16 bytes to the currently 184-byte VMA.  That seems
suboptimal to me.  It would eat over a megabyte of memory on my *laptop*
alone.

> Now on mpx_unmap()
> 
>     for_each_vma()
> 	if (is_affected(vma->bd_addr, vma->bd_entry))
>  	   unmap(vma);
> 
> That does not require a prctl, no fault handling in the unmap path, it
> just works and is robust by design because it does not rely on any
> user space crappola. You store the directory context at allocation
> time and free it when that context goes away. It's that simple, really.

If you are talking about the VM_MPX VMA that was allocated to hold the
bounds table, this won't work.

Once we unmap the bounds table, we would have a bounds directory entry
pointing at empty address space.  That address space could now be
allocated for some other (random) use, and the MPX hardware is now going
to go trying to walk it as if it were a bounds table.  That would be bad.

Any unmapping of a bounds table has to be accompanied by a corresponding
write to the bounds directory entry.  That write to the bounds directory
can fault.




^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12 20:18                 ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 20:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 10:34 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Dave Hansen wrote:
>> There are two mappings in play:
>> 1. The mapping with the actual data, which userspace is munmap()ing or
>>    brk()ing away, etc... (never tagged VM_MPX)
> 
> It's not tagged that way because it is mapped by user space.

Correct.  It is not tagged because it is mapped by user space.

> This is the directory, right?

No.  The untagged mapping in question here is for normal user data, like
an mmap() or brk(), unrelated to MPX.

The directory is a separate matter.  It is also (currently) untagged
with VM_MPX since it is also allocated by userspace.

>> 2. The mapping for the bounds table *backing* the data (is tagged with
>>    VM_MPX)
> 
> That's the stuff, which gets magically allocated from do_bounds(). And
> the reason you do that from the #BR is that user space would have to
> allocate a gazillion of bound tables to make sure that every corner
> case is covered.

Yes.

> With the allocation from #BR you make that behaviour
> dynamic and you just provide an empty "no bounds" table to make the
> bound checker happy.

Kinda.  We do provide an empty table, but the first access will always
be a write, so it doesn't stay empty for long.

...
> Now, I have a hard time to see how that is supposed to work.
> 
> do_unmap()
>  detach_vmas_to_be_unmapped()
>  unmap_region()
>    free_pgtables()
>  arch_unmap()
>    mpx_unmap()
> 
> So at the point where you try to access the directory to gather the
> information about the entries which might be affected, that stuff is
> unmapped already and the page tables are gone.
> 
> Brilliant idea, really. And if you run into the fault in mpx_unmap()
> you plan to delegate the fixup to a work queue. How is that thing
> going to find what belonged to the unmapped directory?

The bounds directory is not being unmapped here.  I _think_ I covered
that above, but don't be shy if I'm not being clear. ;)

> Even if the stuff would be accessible at that point, it is a damned
> stupid idea to rely on anything userspace is providing to you. I
> learned that the hard way in futex.c
> 
> The proper solution to this problem is:
> 
>     do_bounds()
> 	bd_addr = get_bd_addr_from_xsave();
> 	bd_entry = bndstatus & ADDR_MASK:
> 
> 	bt = mpx_mmap(bd_addr, bd_entry, len);
> 
> 	set_bt_entry_in_bd(bd_entry, bt);
> 
> And in mpx_mmap()
> 
>        .....
>        vma = find_vma();
> 
>        vma->bd_addr = bd_addr;
>        vma->bd_entry = bd_entry;

If the bounds directory moved around, this would make sense.  Otherwise,
it's a waste of space because all vmas in a given mm would have the
exact same bd_addr, and we might as well just store it in mm->bd_something.

Are you suggesting that we support moving the bounds directory around?

Also, the bd_entry can be _calculated_ from vma->vm_start and the
bd_addr.  It seems a bit redundant to store it like this.

Also this would add 16 bytes to the currently 184-byte VMA.  That seems
suboptimal to me.  It would eat over a megabyte of memory on my *laptop*
alone.

> Now on mpx_unmap()
> 
>     for_each_vma()
> 	if (is_affected(vma->bd_addr, vma->bd_entry))
>  	   unmap(vma);
> 
> That does not require a prctl, no fault handling in the unmap path, it
> just works and is robust by design because it does not rely on any
> user space crappola. You store the directory context at allocation
> time and free it when that context goes away. It's that simple, really.

If you are talking about the VM_MPX VMA that was allocated to hold the
bounds table, this won't work.

Once we unmap the bounds table, we would have a bounds directory entry
pointing at empty address space.  That address space could now be
allocated for some other (random) use, and the MPX hardware is now going
to go trying to walk it as if it were a bounds table.  That would be bad.

Any unmapping of a bounds table has to be accompanied by a corresponding
write to the bounds directory entry.  That write to the bounds directory
can fault.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12 17:42             ` Thomas Gleixner
@ 2014-09-12 20:33               ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 20:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 10:42 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Dave Hansen wrote:
>> The prctl(register) is meant to be a signal from userspace to the kernel
>> to say, "I would like your help in managing these bounds tables".
>> prctl(unregister) is the opposite, meaning "I don't want your help any
>> more".
> 
> Fine, but that's a totally different story. I can see the usefulness
> of this, but then it's a complete misnomer. It should be:
> 
>    prctl(EN/DISABLE_MPX_BT_MANAGEMENT)

Agreed.  Those are much better names.

> So this wants to be a boolean value and not some random user space
> address collected at some random point and then ignored until you do
> the magic cleanup. See the other reply.

I know at this point you think the kernel can not or should not keep a
copy of the bounds directory location around.  I understand that.  Bear
with me for a moment, and please just assume for a moment that we need it.

It's far from a random userspace address.  When you make a syscall, we
put the arguments in registers.  The register we're putting it in here
just happens to be used by the hardware.

Right now, we do (ignoring the actual xsave/xrstr):

	bndcfgu = bnd_dir_ptr | ENABLE_BIT;
	prctl(ENABLE_MPX_BT_MANAGEMENT); // kernel grabs from xsave buf

We could pass it explicitly in %rdi as a syscall argument and not have
the prctl() code fetch it from the xsave buffer.  I'm just not sure what
this buys us:

	bndcfgu = bnd_dir_ptr | ENABLE_BIT;
	prctl(ENABLE_MPX_BT_MANAGEMENT, bndcfgu);

Also, the "random cleanup" just happens to correspond with memory
deallocation, which is something we want to go fast.  I'd _prefer_ to
keep xsaves out of the unmap path if possible.  It's not a strict
requirement, but it does seem prudent as an xsave eats a dozen or so
cachelines.

It's also not "sampled".  I can't imagine a situation where the register
will change values during the execution of any sane program.  It really
is essentially fixed.  It's probably one of the reasons it is so
expensive to access: there's *no* reason to do it frequently.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12 20:33               ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 20:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 10:42 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Dave Hansen wrote:
>> The prctl(register) is meant to be a signal from userspace to the kernel
>> to say, "I would like your help in managing these bounds tables".
>> prctl(unregister) is the opposite, meaning "I don't want your help any
>> more".
> 
> Fine, but that's a totally different story. I can see the usefulness
> of this, but then it's a complete misnomer. It should be:
> 
>    prctl(EN/DISABLE_MPX_BT_MANAGEMENT)

Agreed.  Those are much better names.

> So this wants to be a boolean value and not some random user space
> address collected at some random point and then ignored until you do
> the magic cleanup. See the other reply.

I know at this point you think the kernel can not or should not keep a
copy of the bounds directory location around.  I understand that.  Bear
with me for a moment, and please just assume for a moment that we need it.

It's far from a random userspace address.  When you make a syscall, we
put the arguments in registers.  The register we're putting it in here
just happens to be used by the hardware.

Right now, we do (ignoring the actual xsave/xrstr):

	bndcfgu = bnd_dir_ptr | ENABLE_BIT;
	prctl(ENABLE_MPX_BT_MANAGEMENT); // kernel grabs from xsave buf

We could pass it explicitly in %rdi as a syscall argument and not have
the prctl() code fetch it from the xsave buffer.  I'm just not sure what
this buys us:

	bndcfgu = bnd_dir_ptr | ENABLE_BIT;
	prctl(ENABLE_MPX_BT_MANAGEMENT, bndcfgu);

Also, the "random cleanup" just happens to correspond with memory
deallocation, which is something we want to go fast.  I'd _prefer_ to
keep xsaves out of the unmap path if possible.  It's not a strict
requirement, but it does seem prudent as an xsave eats a dozen or so
cachelines.

It's also not "sampled".  I can't imagine a situation where the register
will change values during the execution of any sane program.  It really
is essentially fixed.  It's probably one of the reasons it is so
expensive to access: there's *no* reason to do it frequently.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12 18:42                 ` Thomas Gleixner
@ 2014-09-12 20:35                   ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 20:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 11:42 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Thomas Gleixner wrote:
>> On Fri, 12 Sep 2014, Dave Hansen wrote:
>> The proper solution to this problem is:
>>
>>     do_bounds()
>> 	bd_addr = get_bd_addr_from_xsave();
>> 	bd_entry = bndstatus & ADDR_MASK:
> 
> Just for clarification. You CANNOT avoid the xsave here because it's
> the only way to access BNDSTATUS according to the manual.
> 
> "The BNDCFGU and BNDSTATUS registers are accessible only with
>  XSAVE/XRSTOR family of instructions"
> 
> So there is no point to cache BNDCFGU as you get it anyway when you
> need to retrieve the invalid BD entry.

Agreed.  It serves no purpose during a bounds fault.

However, it does keep you from having to do an xsave during the bounds
table free operations, like at unmap() time.  That is actually a much
more critical path than bounds faults.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-12 20:35                   ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 20:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 11:42 AM, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Thomas Gleixner wrote:
>> On Fri, 12 Sep 2014, Dave Hansen wrote:
>> The proper solution to this problem is:
>>
>>     do_bounds()
>> 	bd_addr = get_bd_addr_from_xsave();
>> 	bd_entry = bndstatus & ADDR_MASK:
> 
> Just for clarification. You CANNOT avoid the xsave here because it's
> the only way to access BNDSTATUS according to the manual.
> 
> "The BNDCFGU and BNDSTATUS registers are accessible only with
>  XSAVE/XRSTOR family of instructions"
> 
> So there is no point to cache BNDCFGU as you get it anyway when you
> need to retrieve the invalid BD entry.

Agreed.  It serves no purpose during a bounds fault.

However, it does keep you from having to do an xsave during the bounds
table free operations, like at unmap() time.  That is actually a much
more critical path than bounds faults.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
  2014-09-12 19:21     ` Thomas Gleixner
@ 2014-09-12 21:23       ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 21:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 12:21 PM, Thomas Gleixner wrote:
> Yes, the most important question is WHY must the kernel handle the
> bound table memory allocation in the first place. The "documentation"
> patch completely fails to tell that.

This will become the description of "patch 04/10".  Feel free to wait
until we repost these to read it, but I'm posting it here because it's
going to be a couple of days before we actually get a new set of patches
out.

Any suggestions for how much of this is appropriate for Documentation/
would be much appreciated.  I don't have a good feel for it.

---

Subject: x86: mpx: on-demand kernel allocation of bounds tables
MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs
to spill them somewhere.  It has two special instructions for
this which allow the bounds to be moved between the bounds
registers and some new "bounds tables".

#BR exceptions are a new class of exceptions just for MPX.  They
are similar conceptually to a page fault and will be raised by
the MPX hardware during both bounds violations or when the tables
are not present.  This patch handles those #BR exceptions for
not-present tables by carving the space out of the normal
processes address space (essentially calling mmap() from inside
the kernel) and then pointing the bounds-directory over to it.

The tables *need* to be accessed and controlled by userspace
because the instructions for moving bounds in and out of them are
extremely frequent.  They potentially happen every time a
register points to memory.  Any direct kernel involvement (like a
syscall) to access the tables would obviously destroy
performance.

==== Why not do this in userspace? ====

This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace.  Here are
a few ways this *could* be done.  I don't think any of them are
practical in the real-world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so
   that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*.  An X-GB virtual
   area needs 4*X GB of virtual space, plus 2GB for the bounds
   directory.  If we were to preallocate them for the 128TB of
   user virtual address space, we would need to reserve 512TB+2GB,
   which is larger than the entire virtual address space today.
   This means they can not be reserved ahead of time.  Also, a
   single process's pre-popualated bounds directory consumes 2GB
   of virtual *AND* physical memory.  IOW, it's completely
   infeasible to prepopulate bounds directories.

Q: Can we preallocate bounds table space at the same time memory
   is allocated which might contain pointers that might eventually
   need bounds tables?
A: This would work if we could hook the site of each and every
   memory allocation syscall.  This can be done for small,
   constrained applications.  But, it isn't practical at a larger
   scale since a given app has no way of controlling how all the
   parts of the app migth allocate memory (think libraries).  The
   kernel is really the only place to intercept these calls.

Q: Could a bounds fault be handed to userspace and the tables
   allocated there in a signal handler intead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of safe async
   handler functions and even if mmap() would work it still
   requires locking or nasty tricks to keep track of the
   allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand
in the kernel.



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
@ 2014-09-12 21:23       ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 21:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 12:21 PM, Thomas Gleixner wrote:
> Yes, the most important question is WHY must the kernel handle the
> bound table memory allocation in the first place. The "documentation"
> patch completely fails to tell that.

This will become the description of "patch 04/10".  Feel free to wait
until we repost these to read it, but I'm posting it here because it's
going to be a couple of days before we actually get a new set of patches
out.

Any suggestions for how much of this is appropriate for Documentation/
would be much appreciated.  I don't have a good feel for it.

---

Subject: x86: mpx: on-demand kernel allocation of bounds tables
MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs
to spill them somewhere.  It has two special instructions for
this which allow the bounds to be moved between the bounds
registers and some new "bounds tables".

#BR exceptions are a new class of exceptions just for MPX.  They
are similar conceptually to a page fault and will be raised by
the MPX hardware during both bounds violations or when the tables
are not present.  This patch handles those #BR exceptions for
not-present tables by carving the space out of the normal
processes address space (essentially calling mmap() from inside
the kernel) and then pointing the bounds-directory over to it.

The tables *need* to be accessed and controlled by userspace
because the instructions for moving bounds in and out of them are
extremely frequent.  They potentially happen every time a
register points to memory.  Any direct kernel involvement (like a
syscall) to access the tables would obviously destroy
performance.

==== Why not do this in userspace? ====

This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace.  Here are
a few ways this *could* be done.  I don't think any of them are
practical in the real-world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so
   that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*.  An X-GB virtual
   area needs 4*X GB of virtual space, plus 2GB for the bounds
   directory.  If we were to preallocate them for the 128TB of
   user virtual address space, we would need to reserve 512TB+2GB,
   which is larger than the entire virtual address space today.
   This means they can not be reserved ahead of time.  Also, a
   single process's pre-popualated bounds directory consumes 2GB
   of virtual *AND* physical memory.  IOW, it's completely
   infeasible to prepopulate bounds directories.

Q: Can we preallocate bounds table space at the same time memory
   is allocated which might contain pointers that might eventually
   need bounds tables?
A: This would work if we could hook the site of each and every
   memory allocation syscall.  This can be done for small,
   constrained applications.  But, it isn't practical at a larger
   scale since a given app has no way of controlling how all the
   parts of the app migth allocate memory (think libraries).  The
   kernel is really the only place to intercept these calls.

Q: Could a bounds fault be handed to userspace and the tables
   allocated there in a signal handler intead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of safe async
   handler functions and even if mmap() would work it still
   requires locking or nasty tricks to keep track of the
   allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand
in the kernel.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
  2014-09-12 19:21     ` Thomas Gleixner
@ 2014-09-12 21:31       ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 21:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 12:21 PM, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Dave Hansen wrote:
>> +When #BR fault is produced due to invalid entry, bounds table will be
>> +created in kernel on demand and kernel will not transfer this fault to
>> +userspace. So usersapce can't receive #BR fault for invalid entry, and
>> +it is not also necessary for users to create bounds tables by themselves.
>> +
>> +Certainly users can allocate bounds tables and forcibly point the bounds
>> +directory at them through XSAVE instruction, and then set valid bit
>> +of bounds entry to have this entry valid. But we have no way to track
>> +the memory usage of these user-created bounds tables. In regard to this,
>> +this behaviour is outlawed here.
> 
> So what's the point of declaring it outlawed? Nothing as far as I can
> see simply because you cannot enforce it. This is possible and people
> simply will do it.

All that we want to get across is: if the kernel didn't make the mess,
we're not going to clean it up.

Userspace is free to do whatever the heck it wants.  But, if it wants
the kernel to clean up the bounds tables, it needs to follow the rules
we're laying out here.

I think it boils down to two rules:
1. Don't move the bounds directory without telling the kernel.
2. The kernel will not free any memory which it did not allocate.

>> +2) We will not support the case that multiple bounds directory entries
>> +are pointed at the same bounds table.
>> +
>> +Users can be allowed to take multiple bounds directory entries and point
>> +them at the same bounds table. See more information "Intel(R) Architecture
>> +Instruction Set Extensions Programming Reference" (9.3.4).
>> +
>> +If userspace did this, it will be possible for kernel to unmap an in-use
>> +bounds table since it does not recognize sharing. So this behavior is
>> +also outlawed here.
> 
> Again, this is nothing you can enforce and just saying its outlawed
> does not prevent user space from doing it and then sending hard to
> decode bug reports where it complains about mappings silently
> vanishing under it.
> 
> So all you can do here is to write up a rule set how well behaving
> user space is supposed to use this facility and the kernel side of it. 

"Outlaw" was probably the wrong word.

I completely agree that all we can do is set up a set of rules for what
well-behaved userspace is expected to do.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
@ 2014-09-12 21:31       ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 21:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On 09/12/2014 12:21 PM, Thomas Gleixner wrote:
> On Thu, 11 Sep 2014, Dave Hansen wrote:
>> +When #BR fault is produced due to invalid entry, bounds table will be
>> +created in kernel on demand and kernel will not transfer this fault to
>> +userspace. So usersapce can't receive #BR fault for invalid entry, and
>> +it is not also necessary for users to create bounds tables by themselves.
>> +
>> +Certainly users can allocate bounds tables and forcibly point the bounds
>> +directory at them through XSAVE instruction, and then set valid bit
>> +of bounds entry to have this entry valid. But we have no way to track
>> +the memory usage of these user-created bounds tables. In regard to this,
>> +this behaviour is outlawed here.
> 
> So what's the point of declaring it outlawed? Nothing as far as I can
> see simply because you cannot enforce it. This is possible and people
> simply will do it.

All that we want to get across is: if the kernel didn't make the mess,
we're not going to clean it up.

Userspace is free to do whatever the heck it wants.  But, if it wants
the kernel to clean up the bounds tables, it needs to follow the rules
we're laying out here.

I think it boils down to two rules:
1. Don't move the bounds directory without telling the kernel.
2. The kernel will not free any memory which it did not allocate.

>> +2) We will not support the case that multiple bounds directory entries
>> +are pointed at the same bounds table.
>> +
>> +Users can be allowed to take multiple bounds directory entries and point
>> +them at the same bounds table. See more information "Intel(R) Architecture
>> +Instruction Set Extensions Programming Reference" (9.3.4).
>> +
>> +If userspace did this, it will be possible for kernel to unmap an in-use
>> +bounds table since it does not recognize sharing. So this behavior is
>> +also outlawed here.
> 
> Again, this is nothing you can enforce and just saying its outlawed
> does not prevent user space from doing it and then sending hard to
> decode bug reports where it complains about mappings silently
> vanishing under it.
> 
> So all you can do here is to write up a rule set how well behaving
> user space is supposed to use this facility and the kernel side of it. 

"Outlaw" was probably the wrong word.

I completely agree that all we can do is set up a set of rules for what
well-behaved userspace is expected to do.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
  2014-09-12 19:21     ` Thomas Gleixner
@ 2014-09-12 22:08       ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 22:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

OK, here's some revised text for patch 00/10.  Again, this will
obviously be updated for the next post, but comments before that would
be much appreciated.

-----

This patch set adds support for the Memory Protection eXtensions
(MPX) feature found in future Intel processors.  MPX is used in
conjunction with compiler changes to check memory references, and can be
used to catch buffer overflow or underflow.

For MPX to work, changes are required in the kernel, binutils and
compiler.  No source changes are required for applications, just a
recompile.

There are a lot of moving parts of this to all work right:

===== Example Compiler / Application / Kernel Interaction =====

1. Application developer compiles with -fmpx.  The compiler will add the
   instrumentation as well as some setup code called early after the app
   starts.  New instruction prefixes are noops for old CPUs.
2. That setup code allocates (virtual) space for the "bounds directory",
   points the "bndcfgu" register to the directory and notifies the
   kernel (via the new prctl()) that the app will be using MPX.
3. The kernel detects that the CPU has MPX, allows the new prctl() to
   succeed, and notes the location of the bounds directory.  We note it
   instead of reading it each time because the 'xsave' operation needed
   to access the bounds directory register is an expensive operation.
4. If the application needs to spill bounds out of the 4 registers, it
   issues a bndstx instruction.  Since the bounds directory is empty at
   this point, a bounds fault (#BR) is raised, the kernel allocates a
   bounds table (in the user address space) and makes the relevant
   entry in the bounds directory point to the new table. [1]
5. If the application violates the bounds specified in the bounds
   registers, a separate kind of #BR is raised which will deliver a
   signal with information about the violation in the 'struct siginfo'.
6. Whenever memory is freed, we know that it can no longer contain
   valid pointers, and we attempt to free the associated space in the
   bounds tables.  If an entire table becomes unused, we will attempt
   to free the table and remove the entry in the directory.

To summarize, there are essentially three things interacting here:

GCC with -fmpx:
 * enables annotation of code with MPX instructions and prefixes
 * inserts code early in the application to call in to the "gcc runtime"
GCC MPX Runtime:
 * Checks for hardware MPX support in cpuid leaf
 * allocates virtual space for the bounds directory (malloc()
   essentially)
 * points the hardware BNDCFGU register at the directory
 * calls a new prctl() to notify the kernel to start managing the
   bounds directories
Kernel MPX Code:
 * Checks for hardware MPX support in cpuid leaf
 * Handles #BR exceptions and sends SIGSEGV to the app when it violates
   bounds, like during a buffer overflow.
 * When bounds are spilled in to an unallocated bounds table, the kernel
   notices in the #BR exception, allocates the virtual space, then
   updates the bounds directory to point to the new table.  It keeps
   special track of the memory with a VM_MPX flag.
 * Frees unused bounds tables at the time that the memory they described
   is unmapped. (See "cleanup unused bound tables")

===== Testing =====

This patchset has been tested on real internal hardware platform at
Intel.  We have some simple unit tests in user space, which directly
call MPX instructions to produce #BR to let kernel allocate bounds
tables and cause bounds violations. We also compiled several benchmarks
with an MPX-enabled compiler and ran them with this patch set.  We found
a number of bugs in this code in these tests.

1. For more info on why the kernel does these allocations, see the patch
"on-demand kernel allocation of bounds tables"


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
@ 2014-09-12 22:08       ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 22:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

OK, here's some revised text for patch 00/10.  Again, this will
obviously be updated for the next post, but comments before that would
be much appreciated.

-----

This patch set adds support for the Memory Protection eXtensions
(MPX) feature found in future Intel processors.  MPX is used in
conjunction with compiler changes to check memory references, and can be
used to catch buffer overflow or underflow.

For MPX to work, changes are required in the kernel, binutils and
compiler.  No source changes are required for applications, just a
recompile.

There are a lot of moving parts of this to all work right:

===== Example Compiler / Application / Kernel Interaction =====

1. Application developer compiles with -fmpx.  The compiler will add the
   instrumentation as well as some setup code called early after the app
   starts.  New instruction prefixes are noops for old CPUs.
2. That setup code allocates (virtual) space for the "bounds directory",
   points the "bndcfgu" register to the directory and notifies the
   kernel (via the new prctl()) that the app will be using MPX.
3. The kernel detects that the CPU has MPX, allows the new prctl() to
   succeed, and notes the location of the bounds directory.  We note it
   instead of reading it each time because the 'xsave' operation needed
   to access the bounds directory register is an expensive operation.
4. If the application needs to spill bounds out of the 4 registers, it
   issues a bndstx instruction.  Since the bounds directory is empty at
   this point, a bounds fault (#BR) is raised, the kernel allocates a
   bounds table (in the user address space) and makes the relevant
   entry in the bounds directory point to the new table. [1]
5. If the application violates the bounds specified in the bounds
   registers, a separate kind of #BR is raised which will deliver a
   signal with information about the violation in the 'struct siginfo'.
6. Whenever memory is freed, we know that it can no longer contain
   valid pointers, and we attempt to free the associated space in the
   bounds tables.  If an entire table becomes unused, we will attempt
   to free the table and remove the entry in the directory.

To summarize, there are essentially three things interacting here:

GCC with -fmpx:
 * enables annotation of code with MPX instructions and prefixes
 * inserts code early in the application to call in to the "gcc runtime"
GCC MPX Runtime:
 * Checks for hardware MPX support in cpuid leaf
 * allocates virtual space for the bounds directory (malloc()
   essentially)
 * points the hardware BNDCFGU register at the directory
 * calls a new prctl() to notify the kernel to start managing the
   bounds directories
Kernel MPX Code:
 * Checks for hardware MPX support in cpuid leaf
 * Handles #BR exceptions and sends SIGSEGV to the app when it violates
   bounds, like during a buffer overflow.
 * When bounds are spilled in to an unallocated bounds table, the kernel
   notices in the #BR exception, allocates the virtual space, then
   updates the bounds directory to point to the new table.  It keeps
   special track of the memory with a VM_MPX flag.
 * Frees unused bounds tables at the time that the memory they described
   is unmapped. (See "cleanup unused bound tables")

===== Testing =====

This patchset has been tested on real internal hardware platform at
Intel.  We have some simple unit tests in user space, which directly
call MPX instructions to produce #BR to let kernel allocate bounds
tables and cause bounds violations. We also compiled several benchmarks
with an MPX-enabled compiler and ran them with this patch set.  We found
a number of bugs in this code in these tests.

1. For more info on why the kernel does these allocations, see the patch
"on-demand kernel allocation of bounds tables"

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
  2014-09-11  8:46   ` Qiaowei Ren
@ 2014-09-12 22:58     ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 22:58 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> +static int allocate_bt(long __user *bd_entry)
> +{
> +	unsigned long bt_addr, old_val = 0;
> +	int ret = 0;
> +
> +	bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES);
> +	if (IS_ERR((void *)bt_addr))
> +		return bt_addr;
> +	bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG;

Qiaowei, why do we need the "& MPX_BT_ADDR_MASK" here?


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
@ 2014-09-12 22:58     ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-12 22:58 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> +static int allocate_bt(long __user *bd_entry)
> +{
> +	unsigned long bt_addr, old_val = 0;
> +	int ret = 0;
> +
> +	bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES);
> +	if (IS_ERR((void *)bt_addr))
> +		return bt_addr;
> +	bt_addr = (bt_addr & MPX_BT_ADDR_MASK) | MPX_BD_ENTRY_VALID_FLAG;

Qiaowei, why do we need the "& MPX_BT_ADDR_MASK" here?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
  2014-09-12  8:17         ` Thomas Gleixner
@ 2014-09-13  7:13           ` Ren, Qiaowei
  -1 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-13  7:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Hansen, Dave, x86, linux-mm, linux-kernel



On 2014-09-12, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Ren, Qiaowei wrote:
>> On 2014-09-12, Thomas Gleixner wrote:
>>> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
>>> 
>>>> Due to new fields about bound violation added into struct
>>>> siginfo, this patch syncs it with general version to avoid build issue.
>>> 
>>> You completely fail to explain which build issue is addressed by
>>> this patch. The code you added to kernel/signal.c which accesses
>>> _addr_bnd is guarded by
>>> 
>>> +#ifdef SEGV_BNDERR
>>> 
>>> which is not defined my MIPS. Also why is this only affecting MIPS
>>> and not any other architecture which provides its own struct siginfo ?
>>> 
>>> That patch makes no sense at all, at least not without a proper explanation.
>>> 
>> For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will
>> include general siginfo.h, and only replace general stuct siginfo
>> with mips specific struct siginfo. So SEGV_BNDERR will be defined
>> for all archs, and we will get error like "no _lower in struct
>> siginfo" when arch=mips.
>> 
>> In addition, only MIPS arch define its own struct siginfo, so this
>> is only affecting MIPS.
> 
> So IA64 does not count as an architecture and therefor does not need
> the same treatment, right?
> 
struct siginfo for IA64 should be also synced. I will do this next post.

Thanks,
Qiaowei


^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 06/10] mips: sync struct siginfo with general version
@ 2014-09-13  7:13           ` Ren, Qiaowei
  0 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-13  7:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Hansen, Dave, x86, linux-mm, linux-kernel



On 2014-09-12, Thomas Gleixner wrote:
> On Fri, 12 Sep 2014, Ren, Qiaowei wrote:
>> On 2014-09-12, Thomas Gleixner wrote:
>>> On Thu, 11 Sep 2014, Qiaowei Ren wrote:
>>> 
>>>> Due to new fields about bound violation added into struct
>>>> siginfo, this patch syncs it with general version to avoid build issue.
>>> 
>>> You completely fail to explain which build issue is addressed by
>>> this patch. The code you added to kernel/signal.c which accesses
>>> _addr_bnd is guarded by
>>> 
>>> +#ifdef SEGV_BNDERR
>>> 
>>> which is not defined my MIPS. Also why is this only affecting MIPS
>>> and not any other architecture which provides its own struct siginfo ?
>>> 
>>> That patch makes no sense at all, at least not without a proper explanation.
>>> 
>> For arch=mips, siginfo.h (arch/mips/include/uapi/asm/siginfo.h) will
>> include general siginfo.h, and only replace general stuct siginfo
>> with mips specific struct siginfo. So SEGV_BNDERR will be defined
>> for all archs, and we will get error like "no _lower in struct
>> siginfo" when arch=mips.
>> 
>> In addition, only MIPS arch define its own struct siginfo, so this
>> is only affecting MIPS.
> 
> So IA64 does not count as an architecture and therefor does not need
> the same treatment, right?
> 
struct siginfo for IA64 should be also synced. I will do this next post.

Thanks,
Qiaowei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
  2014-09-12 22:58     ` Dave Hansen
@ 2014-09-13  7:24       ` Ren, Qiaowei
  -1 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-13  7:24 UTC (permalink / raw)
  To: Hansen, Dave, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel



On 2014-09-13, Hansen, Dave wrote:
> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>> +static int allocate_bt(long __user *bd_entry) {
>> +	unsigned long bt_addr, old_val = 0;
>> +	int ret = 0;
>> +
>> +	bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES);
>> +	if (IS_ERR((void *)bt_addr))
>> +		return bt_addr;
>> +	bt_addr = (bt_addr & MPX_BT_ADDR_MASK) |
> MPX_BD_ENTRY_VALID_FLAG;
> 
> Qiaowei, why do we need the "& MPX_BT_ADDR_MASK" here?

It should be not necessary, and can be removed.

Thanks,
Qiaowei

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
@ 2014-09-13  7:24       ` Ren, Qiaowei
  0 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-13  7:24 UTC (permalink / raw)
  To: Hansen, Dave, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel



On 2014-09-13, Hansen, Dave wrote:
> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>> +static int allocate_bt(long __user *bd_entry) {
>> +	unsigned long bt_addr, old_val = 0;
>> +	int ret = 0;
>> +
>> +	bt_addr = mpx_mmap(MPX_BT_SIZE_BYTES);
>> +	if (IS_ERR((void *)bt_addr))
>> +		return bt_addr;
>> +	bt_addr = (bt_addr & MPX_BT_ADDR_MASK) |
> MPX_BD_ENTRY_VALID_FLAG;
> 
> Qiaowei, why do we need the "& MPX_BT_ADDR_MASK" here?

It should be not necessary, and can be removed.

Thanks,
Qiaowei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-12 20:18                 ` Dave Hansen
@ 2014-09-13  9:01                   ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-13  9:01 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Dave Hansen wrote:
> On 09/12/2014 10:34 AM, Thomas Gleixner wrote:
> > On Fri, 12 Sep 2014, Dave Hansen wrote:
> >> There are two mappings in play:
> >> 1. The mapping with the actual data, which userspace is munmap()ing or
> >>    brk()ing away, etc... (never tagged VM_MPX)
> > 
> > It's not tagged that way because it is mapped by user space.
> 
> Correct.  It is not tagged because it is mapped by user space.
> 
> > This is the directory, right?
> 
> No.  The untagged mapping in question here is for normal user data, like
> an mmap() or brk(), unrelated to MPX.

Ok. That makes sense.
 
> The directory is a separate matter.  It is also (currently) untagged
> with VM_MPX since it is also allocated by userspace.

So if that gets unmapped my observation holds. You still try to access
the directory, take the fault, queue work and in the work you dont
know how to handle it either.

So if the unmapped region affects bd_addr then we should just release
the affected BT mappings, i.e. all vmas flagged with VMA_MPX.

> > With the allocation from #BR you make that behaviour
> > dynamic and you just provide an empty "no bounds" table to make the
> > bound checker happy.
> 
> Kinda.  We do provide an empty table, but the first access will always
> be a write, so it doesn't stay empty for long.

So this comes from adding an entry to a not yet mapped table not from
an actual bound check? I still need to digest the details in the
manual.

> The bounds directory is not being unmapped here.  I _think_ I covered
> that above, but don't be shy if I'm not being clear. ;)

Fair enough. My confusion.
 
> If the bounds directory moved around, this would make sense.  Otherwise,
> it's a waste of space because all vmas in a given mm would have the
> exact same bd_addr, and we might as well just store it in mm->bd_something.

Ok. But we really want to do some sanity checking on all of this.
 
> Are you suggesting that we support moving the bounds directory around?

No, but the stupid thing CAN move around and we want to think about it
now instead of figuring out what to do about it later.

So if we go and store bd_addr with the prctl then you can do in the
#BR "Invalid BD entry":

    bd_addr = xsave->xsave_buf->bndcsr.cfg_reg_u;
    
    /*
     * Catch the case that this is not enabled, i.e. mm->bd_addr == 0,
     * and the case that stupid user space moved the directory
     * around.
     */
    if (mm->bd_addr != bd_addr) {
       Yell and whack stupid app over the head;
    }

> Also, the bd_entry can be _calculated_ from vma->vm_start and the
> bd_addr.  It seems a bit redundant to store it like this.

Fair enough.

> If you are talking about the VM_MPX VMA that was allocated to hold the
> bounds table, this won't work.

Sorry yes, that only works for unmapping the bound directory itself.

> Once we unmap the bounds table, we would have a bounds directory entry
> pointing at empty address space.  That address space could now be
> allocated for some other (random) use, and the MPX hardware is now going
> to go trying to walk it as if it were a bounds table.  That would be bad.
> 
> Any unmapping of a bounds table has to be accompanied by a corresponding
> write to the bounds directory entry.  That write to the bounds directory
> can fault.

So if it fails you need to keep the bound table around until you can
handle that somewhere else, i.e. outside of the mmap sem held
region. That's what you are planning to do with the work queue thing.

Now I'm asking myself, whether we are forced to do that from the end
of do_unmap() rather than doing it from the call site outside of the
mmap_sem held region. I can see that adding arch_unmap() to do_unmap()
is a very simple solution, but it comes with the price of dealing with
faults inside of the mmap_sem held region.

It might be worthwhile to think about the following:

   down_write(mmap_sem);
   
   do_stuff()
     do_munmap(mm, start, len)
        ...
        arch_munmap(mm, start, len) {
	  if (!mm->bd_addr)
	     return;
	  bt_work = kmalloc(sizeof(struct bt_work));
	  bt_work->start = start;
	  bt_work->len = len;
	  hlist_add(&bt_work->list, &mm->bt_work_head);
        } 

And then instead of up_write(mmap_sem);

    arch_up_write(mmap_sem);

Which by default is mapped to up_write(mmap_sem);

Now for the MPX case you can do:
{
	HLIST_HEAD(bt_work_head);

	hlist_move_list(&mm->bt_work_head, &bt_work_head);
	up_write(mmap_sem);

	hlist_for_each_entry_safe()
		handle_bt_work();
}
          
So that needs a few more changes vs. the up_write(mmap_sem) at the
callsites of do_munmap(), but we might even make that a generic thing,
i.e. replace up_write(mmap_sem) with release_write(mmap_sem). I can
imagine that we have other use cases for this.

Thoughts?

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-13  9:01                   ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-13  9:01 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Dave Hansen wrote:
> On 09/12/2014 10:34 AM, Thomas Gleixner wrote:
> > On Fri, 12 Sep 2014, Dave Hansen wrote:
> >> There are two mappings in play:
> >> 1. The mapping with the actual data, which userspace is munmap()ing or
> >>    brk()ing away, etc... (never tagged VM_MPX)
> > 
> > It's not tagged that way because it is mapped by user space.
> 
> Correct.  It is not tagged because it is mapped by user space.
> 
> > This is the directory, right?
> 
> No.  The untagged mapping in question here is for normal user data, like
> an mmap() or brk(), unrelated to MPX.

Ok. That makes sense.
 
> The directory is a separate matter.  It is also (currently) untagged
> with VM_MPX since it is also allocated by userspace.

So if that gets unmapped my observation holds. You still try to access
the directory, take the fault, queue work and in the work you dont
know how to handle it either.

So if the unmapped region affects bd_addr then we should just release
the affected BT mappings, i.e. all vmas flagged with VMA_MPX.

> > With the allocation from #BR you make that behaviour
> > dynamic and you just provide an empty "no bounds" table to make the
> > bound checker happy.
> 
> Kinda.  We do provide an empty table, but the first access will always
> be a write, so it doesn't stay empty for long.

So this comes from adding an entry to a not yet mapped table not from
an actual bound check? I still need to digest the details in the
manual.

> The bounds directory is not being unmapped here.  I _think_ I covered
> that above, but don't be shy if I'm not being clear. ;)

Fair enough. My confusion.
 
> If the bounds directory moved around, this would make sense.  Otherwise,
> it's a waste of space because all vmas in a given mm would have the
> exact same bd_addr, and we might as well just store it in mm->bd_something.

Ok. But we really want to do some sanity checking on all of this.
 
> Are you suggesting that we support moving the bounds directory around?

No, but the stupid thing CAN move around and we want to think about it
now instead of figuring out what to do about it later.

So if we go and store bd_addr with the prctl then you can do in the
#BR "Invalid BD entry":

    bd_addr = xsave->xsave_buf->bndcsr.cfg_reg_u;
    
    /*
     * Catch the case that this is not enabled, i.e. mm->bd_addr == 0,
     * and the case that stupid user space moved the directory
     * around.
     */
    if (mm->bd_addr != bd_addr) {
       Yell and whack stupid app over the head;
    }

> Also, the bd_entry can be _calculated_ from vma->vm_start and the
> bd_addr.  It seems a bit redundant to store it like this.

Fair enough.

> If you are talking about the VM_MPX VMA that was allocated to hold the
> bounds table, this won't work.

Sorry yes, that only works for unmapping the bound directory itself.

> Once we unmap the bounds table, we would have a bounds directory entry
> pointing at empty address space.  That address space could now be
> allocated for some other (random) use, and the MPX hardware is now going
> to go trying to walk it as if it were a bounds table.  That would be bad.
> 
> Any unmapping of a bounds table has to be accompanied by a corresponding
> write to the bounds directory entry.  That write to the bounds directory
> can fault.

So if it fails you need to keep the bound table around until you can
handle that somewhere else, i.e. outside of the mmap sem held
region. That's what you are planning to do with the work queue thing.

Now I'm asking myself, whether we are forced to do that from the end
of do_unmap() rather than doing it from the call site outside of the
mmap_sem held region. I can see that adding arch_unmap() to do_unmap()
is a very simple solution, but it comes with the price of dealing with
faults inside of the mmap_sem held region.

It might be worthwhile to think about the following:

   down_write(mmap_sem);
   
   do_stuff()
     do_munmap(mm, start, len)
        ...
        arch_munmap(mm, start, len) {
	  if (!mm->bd_addr)
	     return;
	  bt_work = kmalloc(sizeof(struct bt_work));
	  bt_work->start = start;
	  bt_work->len = len;
	  hlist_add(&bt_work->list, &mm->bt_work_head);
        } 

And then instead of up_write(mmap_sem);

    arch_up_write(mmap_sem);

Which by default is mapped to up_write(mmap_sem);

Now for the MPX case you can do:
{
	HLIST_HEAD(bt_work_head);

	hlist_move_list(&mm->bt_work_head, &bt_work_head);
	up_write(mmap_sem);

	hlist_for_each_entry_safe()
		handle_bt_work();
}
          
So that needs a few more changes vs. the up_write(mmap_sem) at the
callsites of do_munmap(), but we might even make that a generic thing,
i.e. replace up_write(mmap_sem) with release_write(mmap_sem). I can
imagine that we have other use cases for this.

Thoughts?

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
  2014-09-12 21:23       ` Dave Hansen
@ 2014-09-13  9:25         ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-13  9:25 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Dave Hansen wrote:

> On 09/12/2014 12:21 PM, Thomas Gleixner wrote:
> > Yes, the most important question is WHY must the kernel handle the
> > bound table memory allocation in the first place. The "documentation"
> > patch completely fails to tell that.
> 
> This will become the description of "patch 04/10".  Feel free to wait

Thanks for writing this up! That helps a lot.

> until we repost these to read it, but I'm posting it here because it's
> going to be a couple of days before we actually get a new set of patches
> out.
> 
> Any suggestions for how much of this is appropriate for Documentation/
> would be much appreciated.  I don't have a good feel for it.

I think all of it. The kernels problem is definitely not that it
drains in documentation :)
 
> Having ruled out all of the userspace-only approaches for managing
> bounds tables that we could think of, we create them on demand
> in the kernel.

So what the documentation wants on top of this is the rule set which
describes the expected behaviour of sane applications and perhaps the
potential consequences for insane ones. Not that people care about
that much, but at least we can point them to documentation if they
come up with their weird ass "bug" reports :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
@ 2014-09-13  9:25         ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-13  9:25 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Dave Hansen wrote:

> On 09/12/2014 12:21 PM, Thomas Gleixner wrote:
> > Yes, the most important question is WHY must the kernel handle the
> > bound table memory allocation in the first place. The "documentation"
> > patch completely fails to tell that.
> 
> This will become the description of "patch 04/10".  Feel free to wait

Thanks for writing this up! That helps a lot.

> until we repost these to read it, but I'm posting it here because it's
> going to be a couple of days before we actually get a new set of patches
> out.
> 
> Any suggestions for how much of this is appropriate for Documentation/
> would be much appreciated.  I don't have a good feel for it.

I think all of it. The kernels problem is definitely not that it
drains in documentation :)
 
> Having ruled out all of the userspace-only approaches for managing
> bounds tables that we could think of, we create them on demand
> in the kernel.

So what the documentation wants on top of this is the rule set which
describes the expected behaviour of sane applications and perhaps the
potential consequences for insane ones. Not that people care about
that much, but at least we can point them to documentation if they
come up with their weird ass "bug" reports :)

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
  2014-09-12 22:08       ` Dave Hansen
@ 2014-09-13  9:39         ` Thomas Gleixner
  -1 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-13  9:39 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Dave Hansen wrote:

> OK, here's some revised text for patch 00/10.  Again, this will
> obviously be updated for the next post, but comments before that would
> be much appreciated.

That looks good. So much of this wants to end up in documentation as
well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 00/10] Intel MPX support
@ 2014-09-13  9:39         ` Thomas Gleixner
  0 siblings, 0 replies; 130+ messages in thread
From: Thomas Gleixner @ 2014-09-13  9:39 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Qiaowei Ren, H. Peter Anvin, Ingo Molnar, x86, linux-mm, linux-kernel

On Fri, 12 Sep 2014, Dave Hansen wrote:

> OK, here's some revised text for patch 00/10.  Again, this will
> obviously be updated for the next post, but comments before that would
> be much appreciated.

That looks good. So much of this wants to end up in documentation as
well.

Thanks,

	tglx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-11  8:46   ` Qiaowei Ren
@ 2014-09-15  0:00     ` One Thousand Gnomes
  -1 siblings, 0 replies; 130+ messages in thread
From: One Thousand Gnomes @ 2014-09-15  0:00 UTC (permalink / raw)
  To: Qiaowei Ren
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen, x86,
	linux-mm, linux-kernel

> The base of the bounds directory is set into mm_struct during
> PR_MPX_REGISTER command execution. This member can be used to
> check whether one application is mpx enabled.

Not really because by the time you ask the question another thread might
have decided to unregister it.


> +int mpx_register(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm = tsk->mm;
> +
> +	if (!cpu_has_mpx)
> +		return -EINVAL;
> +
> +	/*
> +	 * runtime in the userspace will be responsible for allocation of
> +	 * the bounds directory. Then, it will save the base of the bounds
> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
> +	 * XRSTOR instruction.
> +	 *
> +	 * fpu_xsave() is expected to be very expensive. In order to do
> +	 * performance optimization, here we get the base of the bounds
> +	 * directory and then save it into mm_struct to be used in future.
> +	 */
> +	mm->bd_addr = task_get_bounds_dir(tsk);
> +	if (!mm->bd_addr)
> +		return -EINVAL;

What stops two threads calling this in parallel ?
> +
> +	return 0;
> +}
> +
> +int mpx_unregister(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm = current->mm;
> +
> +	if (!cpu_has_mpx)
> +		return -EINVAL;
> +
> +	mm->bd_addr = NULL;

or indeed calling this in parallel

What are the semantics across execve() ?

Alan

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-15  0:00     ` One Thousand Gnomes
  0 siblings, 0 replies; 130+ messages in thread
From: One Thousand Gnomes @ 2014-09-15  0:00 UTC (permalink / raw)
  To: Qiaowei Ren
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen, x86,
	linux-mm, linux-kernel

> The base of the bounds directory is set into mm_struct during
> PR_MPX_REGISTER command execution. This member can be used to
> check whether one application is mpx enabled.

Not really because by the time you ask the question another thread might
have decided to unregister it.


> +int mpx_register(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm = tsk->mm;
> +
> +	if (!cpu_has_mpx)
> +		return -EINVAL;
> +
> +	/*
> +	 * runtime in the userspace will be responsible for allocation of
> +	 * the bounds directory. Then, it will save the base of the bounds
> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
> +	 * XRSTOR instruction.
> +	 *
> +	 * fpu_xsave() is expected to be very expensive. In order to do
> +	 * performance optimization, here we get the base of the bounds
> +	 * directory and then save it into mm_struct to be used in future.
> +	 */
> +	mm->bd_addr = task_get_bounds_dir(tsk);
> +	if (!mm->bd_addr)
> +		return -EINVAL;

What stops two threads calling this in parallel ?
> +
> +	return 0;
> +}
> +
> +int mpx_unregister(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm = current->mm;
> +
> +	if (!cpu_has_mpx)
> +		return -EINVAL;
> +
> +	mm->bd_addr = NULL;

or indeed calling this in parallel

What are the semantics across execve() ?

Alan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
  2014-09-11  8:46   ` Qiaowei Ren
@ 2014-09-15 20:53     ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-15 20:53 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> +static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr)
> +{
> +	int valid;
> +
> +	if (!access_ok(VERIFY_READ, (bd_entry), sizeof(*(bd_entry))))
> +		return -EFAULT;

Nit: get rid of unnecessary parenthesis.

> +	pagefault_disable();
> +	if (get_user(*bt_addr, bd_entry))
> +		goto out;
> +	pagefault_enable();

Nit #2: Rewrite this.  Do this:

	int ret;
	...
	pagefault_disable();
	ret = get_user(*bt_addr, bd_entry))
	pagefault_enable();
	if (ret)
		return ret;

Then you don't need the out block below.

> +	valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
> +	*bt_addr &= MPX_BT_ADDR_MASK;
> +
> +	/*
> +	 * If this bounds directory entry is nonzero, and meanwhile
> +	 * the valid bit is zero, one SIGSEGV will be produced due to
> +	 * this unexpected situation.
> +	 */
> +	if (!valid && *bt_addr)
> +		return -EINVAL;

/*
 * Not present is OK.  It just means there was no bounds table
 * for this memory, which is completely OK.  Make sure to distinguish
 * this from -EINVAL, which will cause a SEGV.
 */

> +	if (!valid)
> +		return -ENOENT;
> +
> +	return 0;
> +
> +out:
> +	pagefault_enable();
> +	return -EFAULT;
> +}
> +
> +/*
> + * Free the backing physical pages of bounds table 'bt_addr'.
> + * Assume start...end is within that bounds table.
> + */
> +static int __must_check zap_bt_entries(struct mm_struct *mm,
> +		unsigned long bt_addr,
> +		unsigned long start, unsigned long end)
> +{
> +	struct vm_area_struct *vma;
> +
> +	/* Find the vma which overlaps this bounds table */
> +	vma = find_vma(mm, bt_addr);
> +	/*
> +	 * The table entry comes from userspace and could be
> +	 * pointing anywhere, so make sure it is at least
> +	 * pointing to valid memory.
> +	 */
> +	if (!vma || !(vma->vm_flags & VM_MPX) ||
> +			vma->vm_start > bt_addr ||
> +			vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
> +		return -EINVAL;

If someone did *ANYTHING* to split the VMA, this check would fail.  I
think that's a little draconian, considering that somebody could do a
NUMA policy on part of a VM_MPX VMA and cause it to be split.

This check should look across the entire 'bt_addr ->
bt_addr+MPX_BT_SIZE_BYTES' range, find all of the VM_MPX VMAs, and zap
only those.

If we encounter a non-VM_MPX vma, it should be ignored.

> +	zap_page_range(vma, start, end - start, NULL);
> +	return 0;
> +}
> +
> +static int __must_check unmap_single_bt(struct mm_struct *mm,
> +		long __user *bd_entry, unsigned long bt_addr)
> +{
> +	int ret;
> +
> +	pagefault_disable();
> +	ret = user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry,
> +			bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0);
> +	pagefault_enable();
> +	if (ret)
> +		return -EFAULT;
> +
> +	/*
> +	 * to avoid recursion, do_munmap() will check whether it comes
> +	 * from one bounds table through VM_MPX flag.
> +	 */

Add this to the comment: "Note, we are likely being called under
do_munmap() already."

> +	return do_munmap(mm, bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
> +}

Add a comment about where we checked for VM_MPX already.

> +/*
> + * If the bounds table pointed by bounds directory 'bd_entry' is
> + * not shared, unmap this whole bounds table. Otherwise, only free
> + * those backing physical pages of bounds table entries covered
> + * in this virtual address region start...end.
> + */
> +static int __must_check unmap_shared_bt(struct mm_struct *mm,
> +		long __user *bd_entry, unsigned long start,
> +		unsigned long end, bool prev_shared, bool next_shared)
> +{
> +	unsigned long bt_addr;
> +	int ret;
> +
> +	ret = get_bt_addr(bd_entry, &bt_addr);
> +	if (ret)
> +		return ret;
> +
> +	if (prev_shared && next_shared)
> +		ret = zap_bt_entries(mm, bt_addr,
> +				bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
> +				bt_addr+MPX_GET_BT_ENTRY_OFFSET(end));
> +	else if (prev_shared)
> +		ret = zap_bt_entries(mm, bt_addr,
> +				bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
> +				bt_addr+MPX_BT_SIZE_BYTES);
> +	else if (next_shared)
> +		ret = zap_bt_entries(mm, bt_addr, bt_addr,
> +				bt_addr+MPX_GET_BT_ENTRY_OFFSET(end));
> +	else
> +		ret = unmap_single_bt(mm, bd_entry, bt_addr);
> +
> +	return ret;
> +}
> +
> +/*
> + * A virtual address region being munmap()ed might share bounds table
> + * with adjacent VMAs. We only need to free the backing physical
> + * memory of these shared bounds tables entries covered in this virtual
> + * address region.
> + *
> + * the VMAs covering the virtual address region start...end have already
> + * been split if necessary and removed from the VMA list.
> + */
> +static int __must_check unmap_side_bts(struct mm_struct *mm,
> +		unsigned long start, unsigned long end)
> +{

> +	long __user *bde_start, *bde_end;
> +	struct vm_area_struct *prev, *next;
> +	bool prev_shared = false, next_shared = false;
> +
> +	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
> +	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
> +
> +	next = find_vma_prev(mm, start, &prev);

Let's update the comment here to:

/* We already unliked the VMAs from the mm's rbtree so 'start' is
guaranteed to be in a hole.  This gets us the first VMA before the hole
in to 'prev' and the next VMA after the hole in to 'next'. */

> +	if (prev && (mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(prev->vm_end-1))
> +			== bde_start)
> +		prev_shared = true;
> +	if (next && (mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(next->vm_start))
> +			== bde_end)
> +		next_shared = true;
> +	/*
> +	 * This virtual address region being munmap()ed is only
> +	 * covered by one bounds table.
> +	 *
> +	 * In this case, if this table is also shared with adjacent
> +	 * VMAs, only part of the backing physical memory of the bounds
> +	 * table need be freeed. Otherwise the whole bounds table need
> +	 * be unmapped.
> +	 */
> +	if (bde_start == bde_end) {
> +		return unmap_shared_bt(mm, bde_start, start, end,
> +				prev_shared, next_shared);
> +	}
> +
> +	/*
> +	 * If more than one bounds tables are covered in this virtual
> +	 * address region being munmap()ed, we need to separately check
> +	 * whether bde_start and bde_end are shared with adjacent VMAs.
> +	 */
> +	ret = unmap_shared_bt(mm, bde_start, start, end, prev_shared, false);
> +	if (ret)
> +		return ret;
> +
> +	ret = unmap_shared_bt(mm, bde_end, start, end, false, next_shared);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +
> +static int __must_check mpx_try_unmap(struct mm_struct *mm,
> +		unsigned long start, unsigned long end)
> +{
> +	int ret;
> +	long __user *bd_entry, *bde_start, *bde_end;
> +	unsigned long bt_addr;
> +
> +	/*
> +	 * unmap bounds tables pointed out by start/end bounds directory
> +	 * entries, or only free part of their backing physical memroy
> +	 * if they are shared with adjacent VMAs.
> +	 */

New comment suggestion:
/*
 * "Side" bounds tables are those which are being used by the region
 * (start -> end), but that may be shared with adjacent areas.  If they
 * turn out to be completely unshared, they will be freed.  If they are
 * shared, we will free the backing store (like an MADV_DONTNEED) for
 * areas used by this region.
 */

> +	ret = unmap_side_bts(mm, start, end);

I think I'd start calling these "edge" bounds tables.

> +	if (ret == -EFAULT)
> +		return ret;
> +
> +	/*
> +	 * unmap those bounds table which are entirely covered in this
> +	 * virtual address region.
> +	 */

Entirely covered *AND* not at the edges, right?

> +	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
> +	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
> +	for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) {

This needs a big fat comment that it is only freeing the bounds tables
that are
1. fully covered
2. not at the edges of the mapping, even if full aligned

Does this get any nicer if we have unmap_side_bts() *ONLY* go after
bounds tables that are partially owned by the region being unmapped?

It seems like we really should do this:

	for (each bt fully owned)
		unmap_single_bt()
	if (start edge unaligned)
		free start edge
	if (end edge unaligned)
		free end edge

I bet the unmap_side_bts() code gets simpler if we do that, too.

> +		ret = get_bt_addr(bd_entry, &bt_addr);
> +		/*
> +		 * A fault means we have to drop mmap_sem,
> +		 * perform the fault, and retry this somehow.
> +		 */
> +		if (ret == -EFAULT)
> +			return ret;
> +		/*
> +		 * Any other issue (like a bad bounds-directory)
> +		 * we can try the next one.
> +		 */
> +		if (ret)
> +			continue;
> +
> +		ret = unmap_single_bt(mm, bd_entry, bt_addr);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Free unused bounds tables covered in a virtual address region being
> + * munmap()ed. Assume end > start.
> + *
> + * This function will be called by do_munmap(), and the VMAs covering
> + * the virtual address region start...end have already been split if
> + * necessary and remvoed from the VMA list.
> + */
> +void mpx_unmap(struct mm_struct *mm,
> +		unsigned long start, unsigned long end)
> +{
> +	int ret;
> +
> +	ret = mpx_try_unmap(mm, start, end);

We should rename mpx_try_unmap().  Please rename to:

	mpx_unmap_tables_for(mm, start, end);


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
@ 2014-09-15 20:53     ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-15 20:53 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> +static int get_bt_addr(long __user *bd_entry, unsigned long *bt_addr)
> +{
> +	int valid;
> +
> +	if (!access_ok(VERIFY_READ, (bd_entry), sizeof(*(bd_entry))))
> +		return -EFAULT;

Nit: get rid of unnecessary parenthesis.

> +	pagefault_disable();
> +	if (get_user(*bt_addr, bd_entry))
> +		goto out;
> +	pagefault_enable();

Nit #2: Rewrite this.  Do this:

	int ret;
	...
	pagefault_disable();
	ret = get_user(*bt_addr, bd_entry))
	pagefault_enable();
	if (ret)
		return ret;

Then you don't need the out block below.

> +	valid = *bt_addr & MPX_BD_ENTRY_VALID_FLAG;
> +	*bt_addr &= MPX_BT_ADDR_MASK;
> +
> +	/*
> +	 * If this bounds directory entry is nonzero, and meanwhile
> +	 * the valid bit is zero, one SIGSEGV will be produced due to
> +	 * this unexpected situation.
> +	 */
> +	if (!valid && *bt_addr)
> +		return -EINVAL;

/*
 * Not present is OK.  It just means there was no bounds table
 * for this memory, which is completely OK.  Make sure to distinguish
 * this from -EINVAL, which will cause a SEGV.
 */

> +	if (!valid)
> +		return -ENOENT;
> +
> +	return 0;
> +
> +out:
> +	pagefault_enable();
> +	return -EFAULT;
> +}
> +
> +/*
> + * Free the backing physical pages of bounds table 'bt_addr'.
> + * Assume start...end is within that bounds table.
> + */
> +static int __must_check zap_bt_entries(struct mm_struct *mm,
> +		unsigned long bt_addr,
> +		unsigned long start, unsigned long end)
> +{
> +	struct vm_area_struct *vma;
> +
> +	/* Find the vma which overlaps this bounds table */
> +	vma = find_vma(mm, bt_addr);
> +	/*
> +	 * The table entry comes from userspace and could be
> +	 * pointing anywhere, so make sure it is at least
> +	 * pointing to valid memory.
> +	 */
> +	if (!vma || !(vma->vm_flags & VM_MPX) ||
> +			vma->vm_start > bt_addr ||
> +			vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
> +		return -EINVAL;

If someone did *ANYTHING* to split the VMA, this check would fail.  I
think that's a little draconian, considering that somebody could do a
NUMA policy on part of a VM_MPX VMA and cause it to be split.

This check should look across the entire 'bt_addr ->
bt_addr+MPX_BT_SIZE_BYTES' range, find all of the VM_MPX VMAs, and zap
only those.

If we encounter a non-VM_MPX vma, it should be ignored.

> +	zap_page_range(vma, start, end - start, NULL);
> +	return 0;
> +}
> +
> +static int __must_check unmap_single_bt(struct mm_struct *mm,
> +		long __user *bd_entry, unsigned long bt_addr)
> +{
> +	int ret;
> +
> +	pagefault_disable();
> +	ret = user_atomic_cmpxchg_inatomic(&bt_addr, bd_entry,
> +			bt_addr | MPX_BD_ENTRY_VALID_FLAG, 0);
> +	pagefault_enable();
> +	if (ret)
> +		return -EFAULT;
> +
> +	/*
> +	 * to avoid recursion, do_munmap() will check whether it comes
> +	 * from one bounds table through VM_MPX flag.
> +	 */

Add this to the comment: "Note, we are likely being called under
do_munmap() already."

> +	return do_munmap(mm, bt_addr & MPX_BT_ADDR_MASK, MPX_BT_SIZE_BYTES);
> +}

Add a comment about where we checked for VM_MPX already.

> +/*
> + * If the bounds table pointed by bounds directory 'bd_entry' is
> + * not shared, unmap this whole bounds table. Otherwise, only free
> + * those backing physical pages of bounds table entries covered
> + * in this virtual address region start...end.
> + */
> +static int __must_check unmap_shared_bt(struct mm_struct *mm,
> +		long __user *bd_entry, unsigned long start,
> +		unsigned long end, bool prev_shared, bool next_shared)
> +{
> +	unsigned long bt_addr;
> +	int ret;
> +
> +	ret = get_bt_addr(bd_entry, &bt_addr);
> +	if (ret)
> +		return ret;
> +
> +	if (prev_shared && next_shared)
> +		ret = zap_bt_entries(mm, bt_addr,
> +				bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
> +				bt_addr+MPX_GET_BT_ENTRY_OFFSET(end));
> +	else if (prev_shared)
> +		ret = zap_bt_entries(mm, bt_addr,
> +				bt_addr+MPX_GET_BT_ENTRY_OFFSET(start),
> +				bt_addr+MPX_BT_SIZE_BYTES);
> +	else if (next_shared)
> +		ret = zap_bt_entries(mm, bt_addr, bt_addr,
> +				bt_addr+MPX_GET_BT_ENTRY_OFFSET(end));
> +	else
> +		ret = unmap_single_bt(mm, bd_entry, bt_addr);
> +
> +	return ret;
> +}
> +
> +/*
> + * A virtual address region being munmap()ed might share bounds table
> + * with adjacent VMAs. We only need to free the backing physical
> + * memory of these shared bounds tables entries covered in this virtual
> + * address region.
> + *
> + * the VMAs covering the virtual address region start...end have already
> + * been split if necessary and removed from the VMA list.
> + */
> +static int __must_check unmap_side_bts(struct mm_struct *mm,
> +		unsigned long start, unsigned long end)
> +{

> +	long __user *bde_start, *bde_end;
> +	struct vm_area_struct *prev, *next;
> +	bool prev_shared = false, next_shared = false;
> +
> +	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
> +	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
> +
> +	next = find_vma_prev(mm, start, &prev);

Let's update the comment here to:

/* We already unliked the VMAs from the mm's rbtree so 'start' is
guaranteed to be in a hole.  This gets us the first VMA before the hole
in to 'prev' and the next VMA after the hole in to 'next'. */

> +	if (prev && (mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(prev->vm_end-1))
> +			== bde_start)
> +		prev_shared = true;
> +	if (next && (mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(next->vm_start))
> +			== bde_end)
> +		next_shared = true;
> +	/*
> +	 * This virtual address region being munmap()ed is only
> +	 * covered by one bounds table.
> +	 *
> +	 * In this case, if this table is also shared with adjacent
> +	 * VMAs, only part of the backing physical memory of the bounds
> +	 * table need be freeed. Otherwise the whole bounds table need
> +	 * be unmapped.
> +	 */
> +	if (bde_start == bde_end) {
> +		return unmap_shared_bt(mm, bde_start, start, end,
> +				prev_shared, next_shared);
> +	}
> +
> +	/*
> +	 * If more than one bounds tables are covered in this virtual
> +	 * address region being munmap()ed, we need to separately check
> +	 * whether bde_start and bde_end are shared with adjacent VMAs.
> +	 */
> +	ret = unmap_shared_bt(mm, bde_start, start, end, prev_shared, false);
> +	if (ret)
> +		return ret;
> +
> +	ret = unmap_shared_bt(mm, bde_end, start, end, false, next_shared);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +
> +static int __must_check mpx_try_unmap(struct mm_struct *mm,
> +		unsigned long start, unsigned long end)
> +{
> +	int ret;
> +	long __user *bd_entry, *bde_start, *bde_end;
> +	unsigned long bt_addr;
> +
> +	/*
> +	 * unmap bounds tables pointed out by start/end bounds directory
> +	 * entries, or only free part of their backing physical memroy
> +	 * if they are shared with adjacent VMAs.
> +	 */

New comment suggestion:
/*
 * "Side" bounds tables are those which are being used by the region
 * (start -> end), but that may be shared with adjacent areas.  If they
 * turn out to be completely unshared, they will be freed.  If they are
 * shared, we will free the backing store (like an MADV_DONTNEED) for
 * areas used by this region.
 */

> +	ret = unmap_side_bts(mm, start, end);

I think I'd start calling these "edge" bounds tables.

> +	if (ret == -EFAULT)
> +		return ret;
> +
> +	/*
> +	 * unmap those bounds table which are entirely covered in this
> +	 * virtual address region.
> +	 */

Entirely covered *AND* not at the edges, right?

> +	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
> +	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
> +	for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) {

This needs a big fat comment that it is only freeing the bounds tables
that are
1. fully covered
2. not at the edges of the mapping, even if full aligned

Does this get any nicer if we have unmap_side_bts() *ONLY* go after
bounds tables that are partially owned by the region being unmapped?

It seems like we really should do this:

	for (each bt fully owned)
		unmap_single_bt()
	if (start edge unaligned)
		free start edge
	if (end edge unaligned)
		free end edge

I bet the unmap_side_bts() code gets simpler if we do that, too.

> +		ret = get_bt_addr(bd_entry, &bt_addr);
> +		/*
> +		 * A fault means we have to drop mmap_sem,
> +		 * perform the fault, and retry this somehow.
> +		 */
> +		if (ret == -EFAULT)
> +			return ret;
> +		/*
> +		 * Any other issue (like a bad bounds-directory)
> +		 * we can try the next one.
> +		 */
> +		if (ret)
> +			continue;
> +
> +		ret = unmap_single_bt(mm, bd_entry, bt_addr);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Free unused bounds tables covered in a virtual address region being
> + * munmap()ed. Assume end > start.
> + *
> + * This function will be called by do_munmap(), and the VMAs covering
> + * the virtual address region start...end have already been split if
> + * necessary and remvoed from the VMA list.
> + */
> +void mpx_unmap(struct mm_struct *mm,
> +		unsigned long start, unsigned long end)
> +{
> +	int ret;
> +
> +	ret = mpx_try_unmap(mm, start, end);

We should rename mpx_try_unmap().  Please rename to:

	mpx_unmap_tables_for(mm, start, end);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-15  0:00     ` One Thousand Gnomes
@ 2014-09-16  3:20       ` Ren, Qiaowei
  -1 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-16  3:20 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Hansen, Dave, x86,
	linux-mm, linux-kernel



On 2014-09-15, One Thousand Gnomes wrote:
>> The base of the bounds directory is set into mm_struct during
>> PR_MPX_REGISTER command execution. This member can be used to check
>> whether one application is mpx enabled.
> 
> Not really because by the time you ask the question another thread
> might have decided to unregister it.
> 
> 
>> +int mpx_register(struct task_struct *tsk) {
>> +	struct mm_struct *mm = tsk->mm;
>> +
>> +	if (!cpu_has_mpx)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * runtime in the userspace will be responsible for allocation of
>> +	 * the bounds directory. Then, it will save the base of the bounds
>> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
>> +	 * XRSTOR instruction.
>> +	 *
>> +	 * fpu_xsave() is expected to be very expensive. In order to do
>> +	 * performance optimization, here we get the base of the bounds
>> +	 * directory and then save it into mm_struct to be used in future.
>> +	 */
>> +	mm->bd_addr = task_get_bounds_dir(tsk);
>> +	if (!mm->bd_addr)
>> +		return -EINVAL;
> 
> What stops two threads calling this in parallel ?
>> +
>> +	return 0;
>> +}
>> +
>> +int mpx_unregister(struct task_struct *tsk) {
>> +	struct mm_struct *mm = current->mm;
>> +
>> +	if (!cpu_has_mpx)
>> +		return -EINVAL;
>> +
>> +	mm->bd_addr = NULL;
> 
> or indeed calling this in parallel
> 
> What are the semantics across execve() ?
> 
This will not impact on the semantics of execve(). One runtime library for MPX will be provided (or merged into Glibc), and when the application starts, this runtime will be called to initialize MPX runtime environment, including calling prctl() to notify the kernel to start managing the bounds directories. You can see the discussion about exec(): https://lkml.org/lkml/2014/1/26/199 

It would be extremely unusual for an application to have some MPX and some non-MPX threads, since they would share the same address space and the non-MPX threads would mess up the bounds. That is to say, it looks like be unusual for one of these threads to call prctl() to enable or disable MPX. I guess we need to add some rules into documentation.

Thanks,
Qiaowei


^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-16  3:20       ` Ren, Qiaowei
  0 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-16  3:20 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Hansen, Dave, x86,
	linux-mm, linux-kernel



On 2014-09-15, One Thousand Gnomes wrote:
>> The base of the bounds directory is set into mm_struct during
>> PR_MPX_REGISTER command execution. This member can be used to check
>> whether one application is mpx enabled.
> 
> Not really because by the time you ask the question another thread
> might have decided to unregister it.
> 
> 
>> +int mpx_register(struct task_struct *tsk) {
>> +	struct mm_struct *mm = tsk->mm;
>> +
>> +	if (!cpu_has_mpx)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * runtime in the userspace will be responsible for allocation of
>> +	 * the bounds directory. Then, it will save the base of the bounds
>> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
>> +	 * XRSTOR instruction.
>> +	 *
>> +	 * fpu_xsave() is expected to be very expensive. In order to do
>> +	 * performance optimization, here we get the base of the bounds
>> +	 * directory and then save it into mm_struct to be used in future.
>> +	 */
>> +	mm->bd_addr = task_get_bounds_dir(tsk);
>> +	if (!mm->bd_addr)
>> +		return -EINVAL;
> 
> What stops two threads calling this in parallel ?
>> +
>> +	return 0;
>> +}
>> +
>> +int mpx_unregister(struct task_struct *tsk) {
>> +	struct mm_struct *mm = current->mm;
>> +
>> +	if (!cpu_has_mpx)
>> +		return -EINVAL;
>> +
>> +	mm->bd_addr = NULL;
> 
> or indeed calling this in parallel
> 
> What are the semantics across execve() ?
> 
This will not impact on the semantics of execve(). One runtime library for MPX will be provided (or merged into Glibc), and when the application starts, this runtime will be called to initialize MPX runtime environment, including calling prctl() to notify the kernel to start managing the bounds directories. You can see the discussion about exec(): https://lkml.org/lkml/2014/1/26/199 

It would be extremely unusual for an application to have some MPX and some non-MPX threads, since they would share the same address space and the non-MPX threads would mess up the bounds. That is to say, it looks like be unusual for one of these threads to call prctl() to enable or disable MPX. I guess we need to add some rules into documentation.

Thanks,
Qiaowei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-16  3:20       ` Ren, Qiaowei
@ 2014-09-16  4:17         ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-16  4:17 UTC (permalink / raw)
  To: Ren, Qiaowei, One Thousand Gnomes
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-mm,
	linux-kernel

On 09/15/2014 08:20 PM, Ren, Qiaowei wrote:
>> What are the semantics across execve() ?
>> 
> This will not impact on the semantics of execve(). One runtime
> library
> for MPX will be provided (or merged into Glibc), and when the
> application starts, this runtime will be called to initialize MPX
> runtime environment, including calling prctl() to notify the kernel to
> start managing the bounds directories. You can see the discussion
> about exec(): https://lkml.org/lkml/2014/1/26/199

I think he's asking what happens to the kernel value at execve() time.

The short answer is that it is zero'd along with the rest of a new mm.
It probably _shouldn't_ be, though.  It's actually valid to have a bound
directory at 0x0.  We probably need to initialize it to -1 instead, and
that means initializing to -1 at execve() time.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-16  4:17         ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-16  4:17 UTC (permalink / raw)
  To: Ren, Qiaowei, One Thousand Gnomes
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-mm,
	linux-kernel

On 09/15/2014 08:20 PM, Ren, Qiaowei wrote:
>> What are the semantics across execve() ?
>> 
> This will not impact on the semantics of execve(). One runtime
> library
> for MPX will be provided (or merged into Glibc), and when the
> application starts, this runtime will be called to initialize MPX
> runtime environment, including calling prctl() to notify the kernel to
> start managing the bounds directories. You can see the discussion
> about exec(): https://lkml.org/lkml/2014/1/26/199

I think he's asking what happens to the kernel value at execve() time.

The short answer is that it is zero'd along with the rest of a new mm.
It probably _shouldn't_ be, though.  It's actually valid to have a bound
directory at 0x0.  We probably need to initialize it to -1 instead, and
that means initializing to -1 at execve() time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-11  8:46   ` Qiaowei Ren
@ 2014-09-16  7:50     ` Kevin Easton
  -1 siblings, 0 replies; 130+ messages in thread
From: Kevin Easton @ 2014-09-16  7:50 UTC (permalink / raw)
  To: Qiaowei Ren
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen, x86,
	linux-mm, linux-kernel

On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote:

> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
> +{
> +	struct xsave_struct *xsave_buf;
> +
> +	fpu_xsave(&tsk->thread.fpu);
> +	xsave_buf = &(tsk->thread.fpu.state->xsave);
> +	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
> +		return NULL;
> +
> +	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
> +			MPX_BNDCFG_ADDR_MASK);
> +}

This only makes sense if called with 'current', so is there any need
for the function argument?

> +
> +int mpx_register(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm = tsk->mm;
> +
> +	if (!cpu_has_mpx)
> +		return -EINVAL;
> +
> +	/*
> +	 * runtime in the userspace will be responsible for allocation of
> +	 * the bounds directory. Then, it will save the base of the bounds
> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
> +	 * XRSTOR instruction.
> +	 *
> +	 * fpu_xsave() is expected to be very expensive. In order to do
> +	 * performance optimization, here we get the base of the bounds
> +	 * directory and then save it into mm_struct to be used in future.
> +	 */
> +	mm->bd_addr = task_get_bounds_dir(tsk);
> +	if (!mm->bd_addr)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +int mpx_unregister(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm = current->mm;
> +
> +	if (!cpu_has_mpx)
> +		return -EINVAL;
> +
> +	mm->bd_addr = NULL;
> +	return 0;
> +}

If that's changed, then mpx_register() and mpx_unregister() don't need
a task_struct, just an mm_struct.

Probably these functions should be locking mmap_sem.

Would it be prudent to use an error code other than EINVAL for the 
"hardware doesn't support it" case?

> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  			me->mm->def_flags &= ~VM_NOHUGEPAGE;
>  		up_write(&me->mm->mmap_sem);
>  		break;
> +	case PR_MPX_REGISTER:
> +		error = MPX_REGISTER(me);
> +		break;
> +	case PR_MPX_UNREGISTER:
> +		error = MPX_UNREGISTER(me);
> +		break;

If you pass me->mm from prctl, that makes it clear that it's per-process
not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.

This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if it's
not using them, otherwise you'll be sunk if you ever want to use them later.

It seems like it only makes sense for all threads using the mm to have the
same bounds directory set.  If the interface was changed to directly pass
the address, then could the kernel take care of setting it for *all* of
the threads in the process? This seems like something that would be easier
for the kernel to do than userspace.

    - Kevin

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-16  7:50     ` Kevin Easton
  0 siblings, 0 replies; 130+ messages in thread
From: Kevin Easton @ 2014-09-16  7:50 UTC (permalink / raw)
  To: Qiaowei Ren
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Dave Hansen, x86,
	linux-mm, linux-kernel

On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote:

> +static __user void *task_get_bounds_dir(struct task_struct *tsk)
> +{
> +	struct xsave_struct *xsave_buf;
> +
> +	fpu_xsave(&tsk->thread.fpu);
> +	xsave_buf = &(tsk->thread.fpu.state->xsave);
> +	if (!(xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ENABLE_FLAG))
> +		return NULL;
> +
> +	return (void __user *)(unsigned long)(xsave_buf->bndcsr.cfg_reg_u &
> +			MPX_BNDCFG_ADDR_MASK);
> +}

This only makes sense if called with 'current', so is there any need
for the function argument?

> +
> +int mpx_register(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm = tsk->mm;
> +
> +	if (!cpu_has_mpx)
> +		return -EINVAL;
> +
> +	/*
> +	 * runtime in the userspace will be responsible for allocation of
> +	 * the bounds directory. Then, it will save the base of the bounds
> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
> +	 * XRSTOR instruction.
> +	 *
> +	 * fpu_xsave() is expected to be very expensive. In order to do
> +	 * performance optimization, here we get the base of the bounds
> +	 * directory and then save it into mm_struct to be used in future.
> +	 */
> +	mm->bd_addr = task_get_bounds_dir(tsk);
> +	if (!mm->bd_addr)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +int mpx_unregister(struct task_struct *tsk)
> +{
> +	struct mm_struct *mm = current->mm;
> +
> +	if (!cpu_has_mpx)
> +		return -EINVAL;
> +
> +	mm->bd_addr = NULL;
> +	return 0;
> +}

If that's changed, then mpx_register() and mpx_unregister() don't need
a task_struct, just an mm_struct.

Probably these functions should be locking mmap_sem.

Would it be prudent to use an error code other than EINVAL for the 
"hardware doesn't support it" case?

> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  			me->mm->def_flags &= ~VM_NOHUGEPAGE;
>  		up_write(&me->mm->mmap_sem);
>  		break;
> +	case PR_MPX_REGISTER:
> +		error = MPX_REGISTER(me);
> +		break;
> +	case PR_MPX_UNREGISTER:
> +		error = MPX_UNREGISTER(me);
> +		break;

If you pass me->mm from prctl, that makes it clear that it's per-process
not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.

This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if it's
not using them, otherwise you'll be sunk if you ever want to use them later.

It seems like it only makes sense for all threads using the mm to have the
same bounds directory set.  If the interface was changed to directly pass
the address, then could the kernel take care of setting it for *all* of
the threads in the process? This seems like something that would be easier
for the kernel to do than userspace.

    - Kevin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
  2014-09-15 20:53     ` Dave Hansen
@ 2014-09-16  8:06       ` Ren, Qiaowei
  -1 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-16  8:06 UTC (permalink / raw)
  To: Hansen, Dave, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel



On 2014-09-16, Hansen, Dave wrote:
> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>> +/*
>> + * Free the backing physical pages of bounds table 'bt_addr'.
>> + * Assume start...end is within that bounds table.
>> + */
>> +static int __must_check zap_bt_entries(struct mm_struct *mm,
>> +		unsigned long bt_addr,
>> +		unsigned long start, unsigned long end) {
>> +	struct vm_area_struct *vma;
>> +
>> +	/* Find the vma which overlaps this bounds table */
>> +	vma = find_vma(mm, bt_addr);
>> +	/*
>> +	 * The table entry comes from userspace and could be
>> +	 * pointing anywhere, so make sure it is at least
>> +	 * pointing to valid memory.
>> +	 */
>> +	if (!vma || !(vma->vm_flags & VM_MPX) ||
>> +			vma->vm_start > bt_addr ||
>> +			vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
>> +		return -EINVAL;
> 
> If someone did *ANYTHING* to split the VMA, this check would fail.  I
> think that's a little draconian, considering that somebody could do a
> NUMA policy on part of a VM_MPX VMA and cause it to be split.
> 
> This check should look across the entire 'bt_addr ->
> bt_addr+MPX_BT_SIZE_BYTES' range, find all of the VM_MPX VMAs, and zap
> only those.
> 
> If we encounter a non-VM_MPX vma, it should be ignored.
>
Ok.

>> +	if (ret == -EFAULT)
>> +		return ret;
>> +
>> +	/*
>> +	 * unmap those bounds table which are entirely covered in this
>> +	 * virtual address region.
>> +	 */
> 
> Entirely covered *AND* not at the edges, right?
> 
Yes.

>> +	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
>> +	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
>> +	for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) {
> 
> This needs a big fat comment that it is only freeing the bounds tables that are 1.
> fully covered 2. not at the edges of the mapping, even if full aligned
> 
> Does this get any nicer if we have unmap_side_bts() *ONLY* go after
> bounds tables that are partially owned by the region being unmapped?
> 
> It seems like we really should do this:
> 
> 	for (each bt fully owned)
> 		unmap_single_bt()
> 	if (start edge unaligned)
> 		free start edge
> 	if (end edge unaligned)
> 		free end edge
> 
> I bet the unmap_side_bts() code gets simpler if we do that, too.
> 
Maybe. I will try this.

Thanks,
Qiaowei

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 09/10] x86, mpx: cleanup unused bound tables
@ 2014-09-16  8:06       ` Ren, Qiaowei
  0 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-16  8:06 UTC (permalink / raw)
  To: Hansen, Dave, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel



On 2014-09-16, Hansen, Dave wrote:
> On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
>> +/*
>> + * Free the backing physical pages of bounds table 'bt_addr'.
>> + * Assume start...end is within that bounds table.
>> + */
>> +static int __must_check zap_bt_entries(struct mm_struct *mm,
>> +		unsigned long bt_addr,
>> +		unsigned long start, unsigned long end) {
>> +	struct vm_area_struct *vma;
>> +
>> +	/* Find the vma which overlaps this bounds table */
>> +	vma = find_vma(mm, bt_addr);
>> +	/*
>> +	 * The table entry comes from userspace and could be
>> +	 * pointing anywhere, so make sure it is at least
>> +	 * pointing to valid memory.
>> +	 */
>> +	if (!vma || !(vma->vm_flags & VM_MPX) ||
>> +			vma->vm_start > bt_addr ||
>> +			vma->vm_end < bt_addr+MPX_BT_SIZE_BYTES)
>> +		return -EINVAL;
> 
> If someone did *ANYTHING* to split the VMA, this check would fail.  I
> think that's a little draconian, considering that somebody could do a
> NUMA policy on part of a VM_MPX VMA and cause it to be split.
> 
> This check should look across the entire 'bt_addr ->
> bt_addr+MPX_BT_SIZE_BYTES' range, find all of the VM_MPX VMAs, and zap
> only those.
> 
> If we encounter a non-VM_MPX vma, it should be ignored.
>
Ok.

>> +	if (ret == -EFAULT)
>> +		return ret;
>> +
>> +	/*
>> +	 * unmap those bounds table which are entirely covered in this
>> +	 * virtual address region.
>> +	 */
> 
> Entirely covered *AND* not at the edges, right?
> 
Yes.

>> +	bde_start = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(start);
>> +	bde_end = mm->bd_addr + MPX_GET_BD_ENTRY_OFFSET(end-1);
>> +	for (bd_entry = bde_start + 1; bd_entry < bde_end; bd_entry++) {
> 
> This needs a big fat comment that it is only freeing the bounds tables that are 1.
> fully covered 2. not at the edges of the mapping, even if full aligned
> 
> Does this get any nicer if we have unmap_side_bts() *ONLY* go after
> bounds tables that are partially owned by the region being unmapped?
> 
> It seems like we really should do this:
> 
> 	for (each bt fully owned)
> 		unmap_single_bt()
> 	if (start edge unaligned)
> 		free start edge
> 	if (end edge unaligned)
> 		free end edge
> 
> I bet the unmap_side_bts() code gets simpler if we do that, too.
> 
Maybe. I will try this.

Thanks,
Qiaowei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-16  7:50     ` Kevin Easton
@ 2014-09-18  0:40       ` Ren, Qiaowei
  -1 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-18  0:40 UTC (permalink / raw)
  To: Kevin Easton
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Hansen, Dave, x86,
	linux-mm, linux-kernel



On 2014-09-16, Kevin Easton wrote:
> On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote:
>> +
>> +int mpx_register(struct task_struct *tsk) {
>> +	struct mm_struct *mm = tsk->mm;
>> +
>> +	if (!cpu_has_mpx)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * runtime in the userspace will be responsible for allocation of
>> +	 * the bounds directory. Then, it will save the base of the bounds
>> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
>> +	 * XRSTOR instruction.
>> +	 *
>> +	 * fpu_xsave() is expected to be very expensive. In order to do
>> +	 * performance optimization, here we get the base of the bounds
>> +	 * directory and then save it into mm_struct to be used in future.
>> +	 */
>> +	mm->bd_addr = task_get_bounds_dir(tsk);
>> +	if (!mm->bd_addr)
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +}
>> +
>> +int mpx_unregister(struct task_struct *tsk) {
>> +	struct mm_struct *mm = current->mm;
>> +
>> +	if (!cpu_has_mpx)
>> +		return -EINVAL;
>> +
>> +	mm->bd_addr = NULL;
>> +	return 0;
>> +}
> 
> If that's changed, then mpx_register() and mpx_unregister() don't need
> a task_struct, just an mm_struct.
> 
Yes. An mm_struct is enough.

> Probably these functions should be locking mmap_sem.
> 
> Would it be prudent to use an error code other than EINVAL for the
> "hardware doesn't support it" case?
>
Seems like no specific error code for this case.

>> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
> long, arg2, unsigned long, arg3,
>>  			me->mm->def_flags &= ~VM_NOHUGEPAGE;
>>  		up_write(&me->mm->mmap_sem);
>>  		break;
>> +	case PR_MPX_REGISTER:
>> +		error = MPX_REGISTER(me);
>> +		break;
>> +	case PR_MPX_UNREGISTER:
>> +		error = MPX_UNREGISTER(me);
>> +		break;
> 
> If you pass me->mm from prctl, that makes it clear that it's
> per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.
> 
> This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if
> it's not using them, otherwise you'll be sunk if you ever want to use them later.
> 
> It seems like it only makes sense for all threads using the mm to have
> the same bounds directory set.  If the interface was changed to
> directly pass the address, then could the kernel take care of setting
> it for *all* of the threads in the process? This seems like something
> that would be easier for the kernel to do than userspace.
> 
If the interface was changed to this, it will be possible for insane application to pass error bounds directory address to kernel. We still have to call fpu_xsave() to check this.

Thanks,
Qiaowei


^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-18  0:40       ` Ren, Qiaowei
  0 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-18  0:40 UTC (permalink / raw)
  To: Kevin Easton
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Hansen, Dave, x86,
	linux-mm, linux-kernel



On 2014-09-16, Kevin Easton wrote:
> On Thu, Sep 11, 2014 at 04:46:48PM +0800, Qiaowei Ren wrote:
>> +
>> +int mpx_register(struct task_struct *tsk) {
>> +	struct mm_struct *mm = tsk->mm;
>> +
>> +	if (!cpu_has_mpx)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * runtime in the userspace will be responsible for allocation of
>> +	 * the bounds directory. Then, it will save the base of the bounds
>> +	 * directory into XSAVE/XRSTOR Save Area and enable MPX through
>> +	 * XRSTOR instruction.
>> +	 *
>> +	 * fpu_xsave() is expected to be very expensive. In order to do
>> +	 * performance optimization, here we get the base of the bounds
>> +	 * directory and then save it into mm_struct to be used in future.
>> +	 */
>> +	mm->bd_addr = task_get_bounds_dir(tsk);
>> +	if (!mm->bd_addr)
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +}
>> +
>> +int mpx_unregister(struct task_struct *tsk) {
>> +	struct mm_struct *mm = current->mm;
>> +
>> +	if (!cpu_has_mpx)
>> +		return -EINVAL;
>> +
>> +	mm->bd_addr = NULL;
>> +	return 0;
>> +}
> 
> If that's changed, then mpx_register() and mpx_unregister() don't need
> a task_struct, just an mm_struct.
> 
Yes. An mm_struct is enough.

> Probably these functions should be locking mmap_sem.
> 
> Would it be prudent to use an error code other than EINVAL for the
> "hardware doesn't support it" case?
>
Seems like no specific error code for this case.

>> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
> long, arg2, unsigned long, arg3,
>>  			me->mm->def_flags &= ~VM_NOHUGEPAGE;
>>  		up_write(&me->mm->mmap_sem);
>>  		break;
>> +	case PR_MPX_REGISTER:
>> +		error = MPX_REGISTER(me);
>> +		break;
>> +	case PR_MPX_UNREGISTER:
>> +		error = MPX_UNREGISTER(me);
>> +		break;
> 
> If you pass me->mm from prctl, that makes it clear that it's
> per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.
> 
> This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if
> it's not using them, otherwise you'll be sunk if you ever want to use them later.
> 
> It seems like it only makes sense for all threads using the mm to have
> the same bounds directory set.  If the interface was changed to
> directly pass the address, then could the kernel take care of setting
> it for *all* of the threads in the process? This seems like something
> that would be easier for the kernel to do than userspace.
> 
If the interface was changed to this, it will be possible for insane application to pass error bounds directory address to kernel. We still have to call fpu_xsave() to check this.

Thanks,
Qiaowei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-18  3:23         ` Kevin Easton
@ 2014-09-18  2:37           ` Ren, Qiaowei
  -1 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-18  2:37 UTC (permalink / raw)
  To: Kevin Easton
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Hansen, Dave, x86,
	linux-mm, linux-kernel



On 2014-09-18, Kevin Easton wrote:
> On Thu, Sep 18, 2014 at 12:40:29AM +0000, Ren, Qiaowei wrote:
>>> Would it be prudent to use an error code other than EINVAL for the
>>> "hardware doesn't support it" case?
>>> 
>> Seems like no specific error code for this case.
> 
> ENXIO would probably be OK.  It's not too important as long as it's
> documented.
> 
Yes. Looks like that ENXIO will be better.

Thanks,
Qiaowei

^ permalink raw reply	[flat|nested] 130+ messages in thread

* RE: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-18  2:37           ` Ren, Qiaowei
  0 siblings, 0 replies; 130+ messages in thread
From: Ren, Qiaowei @ 2014-09-18  2:37 UTC (permalink / raw)
  To: Kevin Easton
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Hansen, Dave, x86,
	linux-mm, linux-kernel



On 2014-09-18, Kevin Easton wrote:
> On Thu, Sep 18, 2014 at 12:40:29AM +0000, Ren, Qiaowei wrote:
>>> Would it be prudent to use an error code other than EINVAL for the
>>> "hardware doesn't support it" case?
>>> 
>> Seems like no specific error code for this case.
> 
> ENXIO would probably be OK.  It's not too important as long as it's
> documented.
> 
Yes. Looks like that ENXIO will be better.

Thanks,
Qiaowei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-18  0:40       ` Ren, Qiaowei
@ 2014-09-18  3:23         ` Kevin Easton
  -1 siblings, 0 replies; 130+ messages in thread
From: Kevin Easton @ 2014-09-18  3:23 UTC (permalink / raw)
  To: Ren, Qiaowei
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Hansen, Dave, x86,
	linux-mm, linux-kernel

On Thu, Sep 18, 2014 at 12:40:29AM +0000, Ren, Qiaowei wrote:
> > Would it be prudent to use an error code other than EINVAL for the
> > "hardware doesn't support it" case?
> >
> Seems like no specific error code for this case.

ENXIO would probably be OK.  It's not too important as long as it's
documented.

> 
> >> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
> > long, arg2, unsigned long, arg3,
> >>  			me->mm->def_flags &= ~VM_NOHUGEPAGE;
> >>  		up_write(&me->mm->mmap_sem);
> >>  		break;
> >> +	case PR_MPX_REGISTER:
> >> +		error = MPX_REGISTER(me);
> >> +		break;
> >> +	case PR_MPX_UNREGISTER:
> >> +		error = MPX_UNREGISTER(me);
> >> +		break;
> > 
> > If you pass me->mm from prctl, that makes it clear that it's
> > per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.
> > 
> > This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if
> > it's not using them, otherwise you'll be sunk if you ever want to use them later.
> > 
> > It seems like it only makes sense for all threads using the mm to have
> > the same bounds directory set.  If the interface was changed to
> > directly pass the address, then could the kernel take care of setting
> > it for *all* of the threads in the process? This seems like something
> > that would be easier for the kernel to do than userspace.
> > 
> If the interface was changed to this, it will be possible for insane 
> application to pass error bounds directory address to kernel. We still 
> have to call fpu_xsave() to check this.

I was actually thinking that the kernel would take care of the xsave / 
xrstor (for current), updating tsk->thread.fpu.state (for non-running
threads) and sending an IPI for threads running on other CPUs.

Of course userspace can always then manually change the bounds directory
address itself, but then it's quite clear that they're doing something
unsupported.  Just an idea, anyway.

    - Kevin

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-18  3:23         ` Kevin Easton
  0 siblings, 0 replies; 130+ messages in thread
From: Kevin Easton @ 2014-09-18  3:23 UTC (permalink / raw)
  To: Ren, Qiaowei
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, Hansen, Dave, x86,
	linux-mm, linux-kernel

On Thu, Sep 18, 2014 at 12:40:29AM +0000, Ren, Qiaowei wrote:
> > Would it be prudent to use an error code other than EINVAL for the
> > "hardware doesn't support it" case?
> >
> Seems like no specific error code for this case.

ENXIO would probably be OK.  It's not too important as long as it's
documented.

> 
> >> @@ -2011,6 +2017,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned
> > long, arg2, unsigned long, arg3,
> >>  			me->mm->def_flags &= ~VM_NOHUGEPAGE;
> >>  		up_write(&me->mm->mmap_sem);
> >>  		break;
> >> +	case PR_MPX_REGISTER:
> >> +		error = MPX_REGISTER(me);
> >> +		break;
> >> +	case PR_MPX_UNREGISTER:
> >> +		error = MPX_UNREGISTER(me);
> >> +		break;
> > 
> > If you pass me->mm from prctl, that makes it clear that it's
> > per-process not per-thread, just like PR_SET_DUMPABLE / PR_GET_DUMPABLE.
> > 
> > This code should also enforce nulls in arg2 / arg3 / arg4,/ arg5 if
> > it's not using them, otherwise you'll be sunk if you ever want to use them later.
> > 
> > It seems like it only makes sense for all threads using the mm to have
> > the same bounds directory set.  If the interface was changed to
> > directly pass the address, then could the kernel take care of setting
> > it for *all* of the threads in the process? This seems like something
> > that would be easier for the kernel to do than userspace.
> > 
> If the interface was changed to this, it will be possible for insane 
> application to pass error bounds directory address to kernel. We still 
> have to call fpu_xsave() to check this.

I was actually thinking that the kernel would take care of the xsave / 
xrstor (for current), updating tsk->thread.fpu.state (for non-running
threads) and sending an IPI for threads running on other CPUs.

Of course userspace can always then manually change the bounds directory
address itself, but then it's quite clear that they're doing something
unsupported.  Just an idea, anyway.

    - Kevin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-18  3:23         ` Kevin Easton
@ 2014-09-18  4:43           ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-18  4:43 UTC (permalink / raw)
  To: Kevin Easton, Ren, Qiaowei
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-mm,
	linux-kernel

On 09/17/2014 08:23 PM, Kevin Easton wrote:
> I was actually thinking that the kernel would take care of the xsave / 
> xrstor (for current), updating tsk->thread.fpu.state (for non-running
> threads) and sending an IPI for threads running on other CPUs.
> 
> Of course userspace can always then manually change the bounds directory
> address itself, but then it's quite clear that they're doing something
> unsupported.  Just an idea, anyway.

What's the benefit of that?

As it stands now, MPX is likely to be enabled well before any threads
are created, and the MPX enabling state will be inherited by the new
thread at clone() time.  The current mechanism allows a thread to
individually enable or disable MPX independently of the other threads.

I think it makes it both more complicated and less flexible.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-18  4:43           ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-18  4:43 UTC (permalink / raw)
  To: Kevin Easton, Ren, Qiaowei
  Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-mm,
	linux-kernel

On 09/17/2014 08:23 PM, Kevin Easton wrote:
> I was actually thinking that the kernel would take care of the xsave / 
> xrstor (for current), updating tsk->thread.fpu.state (for non-running
> threads) and sending an IPI for threads running on other CPUs.
> 
> Of course userspace can always then manually change the bounds directory
> address itself, but then it's quite clear that they're doing something
> unsupported.  Just an idea, anyway.

What's the benefit of that?

As it stands now, MPX is likely to be enabled well before any threads
are created, and the MPX enabling state will be inherited by the new
thread at clone() time.  The current mechanism allows a thread to
individually enable or disable MPX independently of the other threads.

I think it makes it both more complicated and less flexible.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-18  7:17             ` Kevin Easton
@ 2014-09-18  6:20               ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-18  6:20 UTC (permalink / raw)
  To: Kevin Easton
  Cc: Ren, Qiaowei, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86,
	linux-mm, linux-kernel

On 09/18/2014 12:17 AM, Kevin Easton wrote:
> I was assuming that if an application did want to enable MPX after threads
> had already been created, it would generally want to enable it
> simultaneously across all threads.  This would be a lot easier for the
> kernel than for the application.

The current gcc setup mechanism would set up MPX even before main().  So
I think it's pretty unlikely that help is needed to coordinate threads.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-18  6:20               ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-18  6:20 UTC (permalink / raw)
  To: Kevin Easton
  Cc: Ren, Qiaowei, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86,
	linux-mm, linux-kernel

On 09/18/2014 12:17 AM, Kevin Easton wrote:
> I was assuming that if an application did want to enable MPX after threads
> had already been created, it would generally want to enable it
> simultaneously across all threads.  This would be a lot easier for the
> kernel than for the application.

The current gcc setup mechanism would set up MPX even before main().  So
I think it's pretty unlikely that help is needed to coordinate threads.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
  2014-09-18  4:43           ` Dave Hansen
@ 2014-09-18  7:17             ` Kevin Easton
  -1 siblings, 0 replies; 130+ messages in thread
From: Kevin Easton @ 2014-09-18  7:17 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ren, Qiaowei, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86,
	linux-mm, linux-kernel

On Wed, Sep 17, 2014 at 09:43:09PM -0700, Dave Hansen wrote:
> On 09/17/2014 08:23 PM, Kevin Easton wrote:
> > I was actually thinking that the kernel would take care of the xsave / 
> > xrstor (for current), updating tsk->thread.fpu.state (for non-running
> > threads) and sending an IPI for threads running on other CPUs.
> > 
> > Of course userspace can always then manually change the bounds directory
> > address itself, but then it's quite clear that they're doing something
> > unsupported.  Just an idea, anyway.
> 
> What's the benefit of that?
> 
> As it stands now, MPX is likely to be enabled well before any threads
> are created, and the MPX enabling state will be inherited by the new
> thread at clone() time.  The current mechanism allows a thread to
> individually enable or disable MPX independently of the other threads.
> 
> I think it makes it both more complicated and less flexible.

I was assuming that if an application did want to enable MPX after threads
had already been created, it would generally want to enable it
simultaneously across all threads.  This would be a lot easier for the
kernel than for the application.

    - Kevin

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER
@ 2014-09-18  7:17             ` Kevin Easton
  0 siblings, 0 replies; 130+ messages in thread
From: Kevin Easton @ 2014-09-18  7:17 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ren, Qiaowei, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86,
	linux-mm, linux-kernel

On Wed, Sep 17, 2014 at 09:43:09PM -0700, Dave Hansen wrote:
> On 09/17/2014 08:23 PM, Kevin Easton wrote:
> > I was actually thinking that the kernel would take care of the xsave / 
> > xrstor (for current), updating tsk->thread.fpu.state (for non-running
> > threads) and sending an IPI for threads running on other CPUs.
> > 
> > Of course userspace can always then manually change the bounds directory
> > address itself, but then it's quite clear that they're doing something
> > unsupported.  Just an idea, anyway.
> 
> What's the benefit of that?
> 
> As it stands now, MPX is likely to be enabled well before any threads
> are created, and the MPX enabling state will be inherited by the new
> thread at clone() time.  The current mechanism allows a thread to
> individually enable or disable MPX independently of the other threads.
> 
> I think it makes it both more complicated and less flexible.

I was assuming that if an application did want to enable MPX after threads
had already been created, it would generally want to enable it
simultaneously across all threads.  This would be a lot easier for the
kernel than for the application.

    - Kevin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
  2014-09-11  8:46   ` Qiaowei Ren
@ 2014-09-24 14:40     ` Dave Hansen
  -1 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-24 14:40 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> +/*
> + * When a BNDSTX instruction attempts to save bounds to a BD entry
> + * with the lack of the valid bit being set, a #BR is generated.
> + * This is an indication that no BT exists for this entry. In this
> + * case the fault handler will allocate a new BT.
> + *
> + * With 32-bit mode, the size of BD is 4MB, and the size of each
> + * bound table is 16KB. With 64-bit mode, the size of BD is 2GB,
> + * and the size of each bound table is 4MB.
> + */
> +int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
> +{
> +	unsigned long status;
> +	unsigned long bd_entry, bd_base;
> +
> +	bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
> +	status = xsave_buf->bndcsr.status_reg;
> +
> +	bd_entry = status & MPX_BNDSTA_ADDR_MASK;
> +	if ((bd_entry < bd_base) ||
> +		(bd_entry >= bd_base + MPX_BD_SIZE_BYTES))
> +		return -EINVAL;
> +
> +	return allocate_bt((long __user *)bd_entry);
> +}

This needs a comment about how we got the address of the bd_entry.
Essentially just note that the hardware tells us where the missing/bad
entry is.

Would there be any value in ensuring that a VMA is present at bd_entry?



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables
@ 2014-09-24 14:40     ` Dave Hansen
  0 siblings, 0 replies; 130+ messages in thread
From: Dave Hansen @ 2014-09-24 14:40 UTC (permalink / raw)
  To: Qiaowei Ren, H. Peter Anvin, Thomas Gleixner, Ingo Molnar
  Cc: x86, linux-mm, linux-kernel

On 09/11/2014 01:46 AM, Qiaowei Ren wrote:
> +/*
> + * When a BNDSTX instruction attempts to save bounds to a BD entry
> + * with the lack of the valid bit being set, a #BR is generated.
> + * This is an indication that no BT exists for this entry. In this
> + * case the fault handler will allocate a new BT.
> + *
> + * With 32-bit mode, the size of BD is 4MB, and the size of each
> + * bound table is 16KB. With 64-bit mode, the size of BD is 2GB,
> + * and the size of each bound table is 4MB.
> + */
> +int do_mpx_bt_fault(struct xsave_struct *xsave_buf)
> +{
> +	unsigned long status;
> +	unsigned long bd_entry, bd_base;
> +
> +	bd_base = xsave_buf->bndcsr.cfg_reg_u & MPX_BNDCFG_ADDR_MASK;
> +	status = xsave_buf->bndcsr.status_reg;
> +
> +	bd_entry = status & MPX_BNDSTA_ADDR_MASK;
> +	if ((bd_entry < bd_base) ||
> +		(bd_entry >= bd_base + MPX_BD_SIZE_BYTES))
> +		return -EINVAL;
> +
> +	return allocate_bt((long __user *)bd_entry);
> +}

This needs a comment about how we got the address of the bd_entry.
Essentially just note that the hardware tells us where the missing/bad
entry is.

Would there be any value in ensuring that a VMA is present at bd_entry?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 130+ messages in thread

end of thread, other threads:[~2014-09-24 14:41 UTC | newest]

Thread overview: 130+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-11  8:46 [PATCH v8 00/10] Intel MPX support Qiaowei Ren
2014-09-11  8:46 ` Qiaowei Ren
2014-09-11  8:46 ` [PATCH v8 01/10] x86, mpx: introduce VM_MPX to indicate that a VMA is MPX specific Qiaowei Ren
2014-09-11  8:46   ` Qiaowei Ren
2014-09-11  8:46 ` [PATCH v8 02/10] x86, mpx: add MPX specific mmap interface Qiaowei Ren
2014-09-11  8:46   ` Qiaowei Ren
2014-09-11  8:46 ` [PATCH v8 03/10] x86, mpx: add macro cpu_has_mpx Qiaowei Ren
2014-09-11  8:46   ` Qiaowei Ren
2014-09-11  8:46 ` [PATCH v8 04/10] x86, mpx: hook #BR exception handler to allocate bound tables Qiaowei Ren
2014-09-11  8:46   ` Qiaowei Ren
2014-09-12 22:58   ` Dave Hansen
2014-09-12 22:58     ` Dave Hansen
2014-09-13  7:24     ` Ren, Qiaowei
2014-09-13  7:24       ` Ren, Qiaowei
2014-09-24 14:40   ` Dave Hansen
2014-09-24 14:40     ` Dave Hansen
2014-09-11  8:46 ` [PATCH v8 05/10] x86, mpx: extend siginfo structure to include bound violation information Qiaowei Ren
2014-09-11  8:46   ` Qiaowei Ren
2014-09-11  8:46 ` [PATCH v8 06/10] mips: sync struct siginfo with general version Qiaowei Ren
2014-09-11  8:46   ` Qiaowei Ren
2014-09-11 22:13   ` Thomas Gleixner
2014-09-11 22:13     ` Thomas Gleixner
2014-09-12  2:54     ` Ren, Qiaowei
2014-09-12  2:54       ` Ren, Qiaowei
2014-09-12  8:17       ` Thomas Gleixner
2014-09-12  8:17         ` Thomas Gleixner
2014-09-13  7:13         ` Ren, Qiaowei
2014-09-13  7:13           ` Ren, Qiaowei
2014-09-11  8:46 ` [PATCH v8 07/10] x86, mpx: decode MPX instruction to get bound violation information Qiaowei Ren
2014-09-11  8:46   ` Qiaowei Ren
2014-09-11 22:18   ` Thomas Gleixner
2014-09-11 22:18     ` Thomas Gleixner
2014-09-11 22:32     ` Dave Hansen
2014-09-11 22:32       ` Dave Hansen
2014-09-11 22:35       ` H. Peter Anvin
2014-09-11 22:35         ` H. Peter Anvin
2014-09-11 23:37         ` Thomas Gleixner
2014-09-11 23:37           ` Thomas Gleixner
2014-09-12  4:44           ` H. Peter Anvin
2014-09-12  4:44             ` H. Peter Anvin
2014-09-12 13:10             ` Thomas Gleixner
2014-09-12 13:10               ` Thomas Gleixner
2014-09-12 13:39               ` H. Peter Anvin
2014-09-12 13:39                 ` H. Peter Anvin
2014-09-12 17:48                 ` Thomas Gleixner
2014-09-12 17:48                   ` Thomas Gleixner
2014-09-12 17:52         ` Thomas Gleixner
2014-09-12 17:52           ` Thomas Gleixner
2014-09-12 19:07           ` H. Peter Anvin
2014-09-12 19:07             ` H. Peter Anvin
2014-09-11  8:46 ` [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER, PR_MPX_UNREGISTER Qiaowei Ren
2014-09-11  8:46   ` Qiaowei Ren
2014-09-11 15:03   ` Dave Hansen
2014-09-11 15:03     ` Dave Hansen
2014-09-12  3:10     ` Ren, Qiaowei
2014-09-12  3:10       ` Ren, Qiaowei
2014-09-11 23:28   ` Thomas Gleixner
2014-09-11 23:28     ` Thomas Gleixner
2014-09-12  0:10     ` Dave Hansen
2014-09-12  0:10       ` Dave Hansen
2014-09-12  8:11       ` Thomas Gleixner
2014-09-12  8:11         ` Thomas Gleixner
2014-09-12  9:24         ` Thomas Gleixner
2014-09-12  9:24           ` Thomas Gleixner
2014-09-12 14:36           ` Dave Hansen
2014-09-12 14:36             ` Dave Hansen
2014-09-12 17:34             ` Thomas Gleixner
2014-09-12 17:34               ` Thomas Gleixner
2014-09-12 18:42               ` Thomas Gleixner
2014-09-12 18:42                 ` Thomas Gleixner
2014-09-12 20:35                 ` Dave Hansen
2014-09-12 20:35                   ` Dave Hansen
2014-09-12 20:18               ` Dave Hansen
2014-09-12 20:18                 ` Dave Hansen
2014-09-13  9:01                 ` Thomas Gleixner
2014-09-13  9:01                   ` Thomas Gleixner
2014-09-12 15:22         ` Dave Hansen
2014-09-12 15:22           ` Dave Hansen
2014-09-12 17:42           ` Thomas Gleixner
2014-09-12 17:42             ` Thomas Gleixner
2014-09-12 20:33             ` Dave Hansen
2014-09-12 20:33               ` Dave Hansen
2014-09-15  0:00   ` One Thousand Gnomes
2014-09-15  0:00     ` One Thousand Gnomes
2014-09-16  3:20     ` Ren, Qiaowei
2014-09-16  3:20       ` Ren, Qiaowei
2014-09-16  4:17       ` Dave Hansen
2014-09-16  4:17         ` Dave Hansen
2014-09-16  7:50   ` Kevin Easton
2014-09-16  7:50     ` Kevin Easton
2014-09-18  0:40     ` Ren, Qiaowei
2014-09-18  0:40       ` Ren, Qiaowei
2014-09-18  3:23       ` Kevin Easton
2014-09-18  3:23         ` Kevin Easton
2014-09-18  2:37         ` Ren, Qiaowei
2014-09-18  2:37           ` Ren, Qiaowei
2014-09-18  4:43         ` Dave Hansen
2014-09-18  4:43           ` Dave Hansen
2014-09-18  7:17           ` Kevin Easton
2014-09-18  7:17             ` Kevin Easton
2014-09-18  6:20             ` Dave Hansen
2014-09-18  6:20               ` Dave Hansen
2014-09-11  8:46 ` [PATCH v8 09/10] x86, mpx: cleanup unused bound tables Qiaowei Ren
2014-09-11  8:46   ` Qiaowei Ren
2014-09-11 14:59   ` Dave Hansen
2014-09-11 14:59     ` Dave Hansen
2014-09-12  3:02     ` Ren, Qiaowei
2014-09-12  3:02       ` Ren, Qiaowei
2014-09-12  4:59       ` Dave Hansen
2014-09-12  4:59         ` Dave Hansen
2014-09-15 20:53   ` Dave Hansen
2014-09-15 20:53     ` Dave Hansen
2014-09-16  8:06     ` Ren, Qiaowei
2014-09-16  8:06       ` Ren, Qiaowei
2014-09-11  8:46 ` [PATCH v8 10/10] x86, mpx: add documentation on Intel MPX Qiaowei Ren
2014-09-11  8:46   ` Qiaowei Ren
2014-09-12  0:51 ` [PATCH v8 00/10] Intel MPX support Dave Hansen
2014-09-12  0:51   ` Dave Hansen
2014-09-12 19:21   ` Thomas Gleixner
2014-09-12 19:21     ` Thomas Gleixner
2014-09-12 21:23     ` Dave Hansen
2014-09-12 21:23       ` Dave Hansen
2014-09-13  9:25       ` Thomas Gleixner
2014-09-13  9:25         ` Thomas Gleixner
2014-09-12 21:31     ` Dave Hansen
2014-09-12 21:31       ` Dave Hansen
2014-09-12 22:08     ` Dave Hansen
2014-09-12 22:08       ` Dave Hansen
2014-09-13  9:39       ` Thomas Gleixner
2014-09-13  9:39         ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.