linux-mm.kvack.org archive mirror
* [PATCH 0/2] mm,fork,security: introduce MADV_WIPEONFORK
@ 2017-08-04 19:07 riel
  2017-08-04 19:07 ` [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag riel
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: riel @ 2017-08-04 19:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, fweimer, colm, akpm, rppt, keescook, luto, wad, mingo

[resend because half the recipients got dropped due to IPv6 firewall issues]

Introduce MADV_WIPEONFORK semantics, which result in a VMA being
empty in the child process after fork. This differs from MADV_DONTFORK
in one important way.

If a child process accesses memory that was MADV_WIPEONFORK, it
will get zeroes. The address ranges are still valid, they are just empty.

If a child process accesses memory that was MADV_DONTFORK, it will
get a segmentation fault, since those address ranges are no longer
valid in the child after fork.
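A minimal userspace sketch of the difference described above, assuming a kernel that implements MADV_WIPEONFORK. The fallback value 18 is the asm-generic one from patch 2/2 (parisc uses 71). probe_child() is a hypothetical helper: it maps one private anonymous page, fills it with 'x', applies the given advice, forks, and returns the child's wait status.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* asm-generic value from patch 2/2; parisc uses 71 */
#endif

/*
 * Map one private anonymous page, fill it with 'x', apply `advice`,
 * then fork.  The child exits with the first byte of the page as its
 * exit status.  Returns the child's wait status, or -1 on any error
 * (including madvise() rejecting the advice).
 */
static int probe_child(int advice)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int status = -1;

	if (p == MAP_FAILED)
		return -1;
	memset(p, 'x', page);

	if (madvise(p, page, advice) == 0) {
		pid_t pid = fork();

		if (pid == 0)
			_exit(p[0]);	/* SIGSEGV here if the range is gone */
		if (pid < 0 || waitpid(pid, &status, 0) != pid)
			status = -1;
	}
	munmap(p, page);
	return status;
}
```

With MADV_NORMAL the child should exit with status 'x' (120), with MADV_WIPEONFORK it should exit with 0 because it reads a fresh zero page, and with MADV_DONTFORK it should be killed by SIGSEGV because the range no longer exists in the child.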

Since MADV_DONTFORK also seems to be used to allow very large
programs to fork in systems with strict memory overcommit restrictions,
changing the semantics of MADV_DONTFORK might break existing programs.

The use case is libraries that store or cache information, and
want to know that they need to regenerate it in the child process
after fork.

Examples of this would be:
- systemd/pulseaudio API checks (fail after fork)
  (replacing a getpid check, which is too slow without a PID cache)
- PKCS#11 API reinitialization check (mandated by specification)
- glibc's upcoming PRNG (reseed after fork)
- OpenSSL PRNG (reseed after fork)
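The PRNG case above might look roughly like this: the generator state lives on its own MADV_WIPEONFORK page, so a forked child finds seeded == 0 and reseeds before producing output, without any pthread_atfork hook. This is an illustration only: splitmix64 stands in for a real generator, getpid() stands in for a real entropy source such as getrandom(2), and prng_setup()/prng_next() are hypothetical names.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* asm-generic value from this series */
#endif

/* Generator state, kept on its own MADV_WIPEONFORK page. */
struct prng_state {
	uint64_t seeded;	/* reads as 0 in a forked child: forces a reseed */
	uint64_t counter;	/* splitmix64 state */
};

static struct prng_state *prng;

/* Returns 0 on success, -1 if the page could not be set up. */
static int prng_setup(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	if (madvise(p, page, MADV_WIPEONFORK) != 0) {
		munmap(p, page);
		return -1;
	}
	prng = p;	/* fresh anonymous memory is zero: not seeded yet */
	return 0;
}

static uint64_t prng_next(void)
{
	uint64_t z;

	if (!prng->seeded) {
		/* Stand-in seed for illustration; use real entropy in practice. */
		prng->counter = (uint64_t)getpid();
		prng->seeded = 1;
	}
	/* One splitmix64 step. */
	z = (prng->counter += 0x9E3779B97F4A7C15ULL);
	z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
	z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
	return z ^ (z >> 31);
}
```

If prng_setup() fails (for example on a kernel that rejects the advice), the caller still needs some other fork check, such as comparing getpid().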

The security benefits of a forking server having a re-initialized
PRNG in every child process are pretty obvious. However, due to
libraries having all kinds of internal state, and programs getting
compiled with many different versions of each library, it is
unreasonable to expect calling programs to re-initialize everything
manually after fork.

A further complication is the proliferation of clone flags,
programs bypassing glibc's functions to call clone directly,
and programs calling unshare, causing the glibc pthread_atfork
hook to not get called.

It would be better to have the kernel take care of this automatically.

This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag
  2017-08-04 19:07 [PATCH 0/2] mm,fork,security: introduce MADV_WIPEONFORK riel
@ 2017-08-04 19:07 ` riel
  2017-08-04 19:25   ` Dave Hansen
  2017-08-04 19:07 ` [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK riel
  2017-08-04 23:44 ` [PATCH 0/2] mm,fork,security: " Kirill A. Shutemov
  2 siblings, 1 reply; 13+ messages in thread
From: riel @ 2017-08-04 19:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, fweimer, colm, akpm, rppt, keescook, luto, wad, mingo

From: Rik van Riel <riel@redhat.com>

MPX only seems to be available on 64-bit CPUs, starting with Skylake
and Goldmont. Move VM_MPX into the 64-bit-only portion of vma->vm_flags,
in order to free up a VMA flag.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/x86/Kconfig   | 4 +++-
 include/linux/mm.h | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 781521b7cf9e..6dff14fadc6f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1756,7 +1756,9 @@ config X86_SMAP
 config X86_INTEL_MPX
 	prompt "Intel MPX (Memory Protection Extensions)"
 	def_bool n
-	depends on CPU_SUP_INTEL
+	# Note: only available in 64-bit mode due to VMA flags shortage
+	depends on CPU_SUP_INTEL && X86_64
+	select ARCH_USES_HIGH_VMA_FLAGS
 	---help---
 	  MPX provides hardware features that can be used in
 	  conjunction with compiler-instrumented code to check
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..7550eeb06ccf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -208,10 +208,12 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_1	33	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
+#define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #if defined(CONFIG_X86)
@@ -235,9 +237,11 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1	/* T if mapped copy of data (nommu mmap) */
 #endif
 
-#if defined(CONFIG_X86)
+#if defined(CONFIG_X86_INTEL_MPX)
 /* MPX specific bounds table or bounds directory */
-# define VM_MPX		VM_ARCH_2
+# define VM_MPX		VM_HIGH_ARCH_4
+#else
+# define VM_MPX		VM_NONE
 #endif
 
 #ifndef VM_GROWSUP
-- 
2.9.4
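A small userspace sketch of the bit-number versus mask distinction in the mm.h hunk above, and of why bit 36 only exists when unsigned long (the type of vma->vm_flags) is 64 bits wide, which is what ties X86_INTEL_MPX to X86_64 here. BIT() is a local stand-in for the kernel macro, the two helpers are hypothetical, and the values mirror VM_HIGH_ARCH_BIT_4 and VM_HIGH_ARCH_4 from the patch.

```c
#include <limits.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(nr)			(1UL << (nr))

/* Values mirrored from the mm.h hunk above. */
#define VM_HIGH_ARCH_BIT_4	36			/* a bit number */
#define VM_HIGH_ARCH_4		BIT(VM_HIGH_ARCH_BIT_4)	/* the matching mask */

/* vm_flags is an unsigned long: bit 36 exists only if it is 64 bits wide. */
static int high_vma_bits_available(void)
{
	return sizeof(unsigned long) * CHAR_BIT > VM_HIGH_ARCH_BIT_4;
}

/* Flags are tested with a mask, never with a bit number. */
static int vm_flags_test(unsigned long vm_flags, unsigned long mask)
{
	return (vm_flags & mask) != 0;
}
```

The bit number 36 (0x24) used directly as a mask would test bits 2 and 5 rather than bit 36.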


* [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
  2017-08-04 19:07 [PATCH 0/2] mm,fork,security: introduce MADV_WIPEONFORK riel
  2017-08-04 19:07 ` [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag riel
@ 2017-08-04 19:07 ` riel
  2017-08-04 23:09   ` Mike Kravetz
  2017-08-14 15:45   ` kbuild test robot
  2017-08-04 23:44 ` [PATCH 0/2] mm,fork,security: " Kirill A. Shutemov
  2 siblings, 2 replies; 13+ messages in thread
From: riel @ 2017-08-04 19:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, fweimer, colm, akpm, rppt, keescook, luto, wad, mingo

From: Rik van Riel <riel@redhat.com>

Introduce MADV_WIPEONFORK semantics, which result in a VMA being
empty in the child process after fork. This differs from MADV_DONTFORK
in one important way.

If a child process accesses memory that was MADV_WIPEONFORK, it
will get zeroes. The address ranges are still valid, they are just empty.

If a child process accesses memory that was MADV_DONTFORK, it will
get a segmentation fault, since those address ranges are no longer
valid in the child after fork.

Since MADV_DONTFORK also seems to be used to allow very large
programs to fork in systems with strict memory overcommit restrictions,
changing the semantics of MADV_DONTFORK might break existing programs.

The use case is libraries that store or cache information, and
want to know that they need to regenerate it in the child process
after fork.

Examples of this would be:
- systemd/pulseaudio API checks (fail after fork)
  (replacing a getpid check, which is too slow without a PID cache)
- PKCS#11 API reinitialization check (mandated by specification)
- glibc's upcoming PRNG (reseed after fork)
- OpenSSL PRNG (reseed after fork)

The security benefits of a forking server having a re-initialized
PRNG in every child process are pretty obvious. However, due to
libraries having all kinds of internal state, and programs getting
compiled with many different versions of each library, it is
unreasonable to expect calling programs to re-initialize everything
manually after fork.

A further complication is the proliferation of clone flags,
programs bypassing glibc's functions to call clone directly,
and programs calling unshare, causing the glibc pthread_atfork
hook to not get called.

It would be better to have the kernel take care of this automatically.

This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2

Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: Colm MacCárthaigh <colm@allcosts.net>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/alpha/include/uapi/asm/mman.h     |  3 +++
 arch/mips/include/uapi/asm/mman.h      |  3 +++
 arch/parisc/include/uapi/asm/mman.h    |  3 +++
 arch/xtensa/include/uapi/asm/mman.h    |  3 +++
 fs/proc/task_mmu.c                     |  1 +
 include/linux/mm.h                     |  2 +-
 include/uapi/asm-generic/mman-common.h |  3 +++
 kernel/fork.c                          |  8 ++++++--
 mm/madvise.c                           |  8 ++++++++
 mm/memory.c                            | 10 ++++++++++
 10 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 02760f6e6ca4..2a708a792882 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -64,6 +64,9 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
 
+#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 655e2fb5395b..d59c57d60d7d 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -91,6 +91,9 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
 
+#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 5979745815a5..e205e0179642 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -60,6 +60,9 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	70		/* Clear the MADV_NODUMP flag */
 
+#define MADV_WIPEONFORK 71		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 72		/* Undo MADV_WIPEONFORK */
+
 /* compatibility flags */
 #define MAP_FILE	0
 #define MAP_VARIABLE	0
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 24365b30aae9..ed23e0a1b30d 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -103,6 +103,9 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
 
+#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index b836fd61ed87..2591e70216ff 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -651,6 +651,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_NORESERVE)]	= "nr",
 		[ilog2(VM_HUGETLB)]	= "ht",
 		[ilog2(VM_ARCH_1)]	= "ar",
+		[ilog2(VM_WIPEONFORK)]	= "wf",
 		[ilog2(VM_DONTDUMP)]	= "dd",
 #ifdef CONFIG_MEM_SOFT_DIRTY
 		[ilog2(VM_SOFTDIRTY)]	= "sd",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7550eeb06ccf..58788c1b9e9d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -189,7 +189,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
 #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
 #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
-#define VM_ARCH_2	0x02000000
+#define VM_WIPEONFORK	0x02000000	/* Wipe VMA contents in child. */
 #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 8c27db0c5c08..49e2b1d78093 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -58,6 +58,9 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	17		/* Clear the MADV_DONTDUMP flag */
 
+#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 17921b0390b4..2dd0d0cae3bb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -628,7 +628,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 
 	prev = NULL;
 	for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
-		struct file *file;
+		struct file *file = NULL;
 
 		if (mpnt->vm_flags & VM_DONTCOPY) {
 			vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
@@ -658,7 +658,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 			goto fail_nomem_anon_vma_fork;
 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
 		tmp->vm_next = tmp->vm_prev = NULL;
-		file = tmp->vm_file;
+
+		/* With VM_WIPEONFORK, the child gets an empty VMA. */
+		if (!(tmp->vm_flags & VM_WIPEONFORK))
+			file = tmp->vm_file;
+
 		if (file) {
 			struct inode *inode = file_inode(file);
 			struct address_space *mapping = file->f_mapping;
diff --git a/mm/madvise.c b/mm/madvise.c
index 9976852f1e1c..9e644c0ed4dc 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -80,6 +80,12 @@ static long madvise_behavior(struct vm_area_struct *vma,
 		}
 		new_flags &= ~VM_DONTCOPY;
 		break;
+	case MADV_WIPEONFORK:
+		new_flags |= VM_WIPEONFORK;
+		break;
+	case MADV_KEEPONFORK:
+		new_flags &= ~VM_WIPEONFORK;
+		break;
 	case MADV_DONTDUMP:
 		new_flags |= VM_DONTDUMP;
 		break;
@@ -689,6 +695,8 @@ madvise_behavior_valid(int behavior)
 #endif
 	case MADV_DONTDUMP:
 	case MADV_DODUMP:
+	case MADV_WIPEONFORK:
+	case MADV_KEEPONFORK:
 #ifdef CONFIG_MEMORY_FAILURE
 	case MADV_SOFT_OFFLINE:
 	case MADV_HWPOISON:
diff --git a/mm/memory.c b/mm/memory.c
index 0e517be91a89..f9b0ad7feb57 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1134,6 +1134,16 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			!vma->anon_vma)
 		return 0;
 
+	/*
+	 * With VM_WIPEONFORK, the child inherits the VMA from the
+	 * parent, but not its contents.
+	 *
+	 * A child accessing VM_WIPEONFORK memory will see all zeroes;
+	 * a child accessing VM_DONTCOPY memory receives a segfault.
+	 */
+	if (vma->vm_flags & VM_WIPEONFORK)
+		return 0;
+
 	if (is_vm_hugetlb_page(vma))
 		return copy_hugetlb_page_range(dst_mm, src_mm, vma);
 
-- 
2.9.4
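With this patch, MADV_WIPEONFORK and MADV_KEEPONFORK set and clear VM_WIPEONFORK, and show_smap_vma_flags() prints it as "wf" in the VmFlags line of /proc/PID/smaps. A rough userspace check for that toggle, assuming the usual smaps layout of one "start-end perms ..." header line per VMA followed by its fields and a VmFlags line. vma_has_smaps_flag() is a hypothetical helper that returns -1 when it cannot tell.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* asm-generic values from this patch */
#endif
#ifndef MADV_KEEPONFORK
#define MADV_KEEPONFORK 19
#endif

/*
 * Find the VMA containing addr in /proc/self/smaps and report whether
 * its VmFlags line lists `flag` (e.g. "wf").  Returns 1 or 0, or -1 if
 * smaps or the VmFlags line for that VMA is unavailable.
 */
static int vma_has_smaps_flag(const void *addr, const char *flag)
{
	unsigned long a = (unsigned long)addr, start, end;
	char line[4096];
	int in_target = 0, result = -1;
	FILE *f = fopen("/proc/self/smaps", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		/* A VMA header line starts with "start-end" in hex. */
		if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
			in_target = (a >= start && a < end);
			continue;
		}
		if (in_target && strncmp(line, "VmFlags:", 8) == 0) {
			char *tok;

			result = 0;
			for (tok = strtok(line + 8, " \t\n"); tok;
			     tok = strtok(NULL, " \t\n"))
				if (strcmp(tok, flag) == 0)
					result = 1;
			break;
		}
	}
	fclose(f);
	return result;
}
```

On a kernel with this patch, the flag should appear after madvise(MADV_WIPEONFORK) and disappear after madvise(MADV_KEEPONFORK).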


* Re: [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag
  2017-08-04 19:07 ` [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag riel
@ 2017-08-04 19:25   ` Dave Hansen
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Hansen @ 2017-08-04 19:25 UTC (permalink / raw)
  To: riel, linux-kernel
  Cc: linux-mm, fweimer, colm, akpm, rppt, keescook, luto, wad, mingo

On 08/04/2017 12:07 PM, riel@redhat.com wrote:
> MPX only seems to be available on 64 bit CPUs, starting with Skylake
> and Goldmont. Move VM_MPX into the 64 bit only portion of vma->vm_flags,
> in order to free up a VMA flag.

Makes me a little sad.  But, seems worth it.

Acked-by: Dave Hansen <dave.hansen@intel.com>


* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
  2017-08-04 19:07 ` [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK riel
@ 2017-08-04 23:09   ` Mike Kravetz
  2017-08-05 14:05     ` Rik van Riel
  2017-08-14 15:45   ` kbuild test robot
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2017-08-04 23:09 UTC (permalink / raw)
  To: riel, linux-kernel
  Cc: linux-mm, fweimer, colm, akpm, rppt, keescook, luto, wad, mingo

On 08/04/2017 12:07 PM, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
> 
> Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> empty in the child process after fork. This differs from MADV_DONTFORK
> in one important way.
> 
> If a child process accesses memory that was MADV_WIPEONFORK, it
> will get zeroes. The address ranges are still valid, they are just empty.
> 
> [...]
> 

This didn't seem 'quite right' to me for shared mappings and/or file
backed mappings.  I wasn't exactly sure what it 'should' do in such
cases.  So, I tried it with a mapping created as follows:

addr = mmap(ADDR, page_size,
                        PROT_READ | PROT_WRITE,
                        MAP_ANONYMOUS|MAP_SHARED, -1, 0);

When setting MADV_WIPEONFORK on the vma/mapping, I got the following
at task exit time:

[  694.558290] ------------[ cut here ]------------
[  694.558978] kernel BUG at mm/filemap.c:212!
[  694.559476] invalid opcode: 0000 [#1] SMP
[  694.560023] Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter xt_conntrack ebtable_broute bridge stp llc ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_mangle ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_mangle 9p iptable_security ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq ppdev snd_seq_device joydev crct10dif_pclmul crc32_pclmul crc32c_intel snd_pcm ghash_clmulni_intel 9pnet_virtio virtio_balloon snd_timer 9pnet parport_pc snd parport i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc virtio_net virtio_blk virtio_console 8139too qxl drm_kms_helper ttm drm serio_raw 8139cp
[  694.571554]  mii virtio_pci ata_generic virtio_ring virtio pata_acpi
[  694.572608] CPU: 3 PID: 1200 Comm: test_wipe2 Not tainted 4.13.0-rc3+ #8
[  694.573778] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[  694.574917] task: ffff880137178040 task.stack: ffffc900019d4000
[  694.575650] RIP: 0010:__delete_from_page_cache+0x344/0x410
[  694.576409] RSP: 0018:ffffc900019d7a88 EFLAGS: 00010082
[  694.577238] RAX: 0000000000000021 RBX: ffffea00047d0e00 RCX: 0000000000000006
[  694.578537] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88023fd0db90
[  694.579774] RBP: ffffc900019d7ad8 R08: 00000000000882b6 R09: 000000000000028a
[  694.580754] R10: ffffc900019d7da8 R11: ffffffff8211184d R12: ffffea00047d0e00
[  694.582040] R13: 0000000000000000 R14: 0000000000000202 R15: ffff8801384439e8
[  694.583236] FS:  0000000000000000(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
[  694.584607] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  694.585409] CR2: 00007ff77a8da618 CR3: 0000000001e09000 CR4: 00000000001406e0
[  694.586547] Call Trace:
[  694.586996]  delete_from_page_cache+0x54/0x110
[  694.587481]  truncate_inode_page+0xab/0x120
[  694.588110]  shmem_undo_range+0x498/0xa50
[  694.588813]  ? save_stack_trace+0x1b/0x20
[  694.589529]  ? set_track+0x70/0x140
[  694.590150]  ? init_object+0x69/0xa0
[  694.590722]  ? __inode_wait_for_writeback+0x73/0xe0
[  694.591525]  shmem_truncate_range+0x16/0x40
[  694.592268]  shmem_evict_inode+0xb1/0x190
[  694.592735]  evict+0xbb/0x1c0
[  694.593147]  iput+0x1c0/0x210
[  694.593497]  dentry_unlink_inode+0xb4/0x150
[  694.593982]  __dentry_kill+0xc1/0x150
[  694.594400]  dput+0x1c8/0x1e0
[  694.594745]  __fput+0x172/0x1e0
[  694.595103]  ____fput+0xe/0x10
[  694.595463]  task_work_run+0x80/0xa0
[  694.595886]  do_exit+0x2d6/0xb50
[  694.596323]  ? __do_page_fault+0x288/0x4a0
[  694.596818]  do_group_exit+0x47/0xb0
[  694.597249]  SyS_exit_group+0x14/0x20
[  694.597682]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  694.598198] RIP: 0033:0x7ff77a5e78c8
[  694.598612] RSP: 002b:00007ffc5aece318 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  694.599804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff77a5e78c8
[  694.600609] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[  694.601424] RBP: 00007ff77a8da618 R08: 00000000000000e7 R09: ffffffffffffff98
[  694.602224] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000001
[  694.603151] R13: 00007ff77a8dbc60 R14: 0000000000000000 R15: 0000000000000000
[  694.603984] Code: 60 f3 c5 81 e8 2e 7e 03 00 0f 0b 48 c7 c6 60 f3 c5 81 4c 89 e7 e8 1d 7e 03 00 0f 0b 48 c7 c6 00 f4 c5 81 4c 89 e7 e8 0c 7e 03 00 <0f> 0b 48 c7 c6 38 f3 c5 81 4c 89 e7 e8 fb 7d 03 00 0f 0b 48 c7 
[  694.606500] RIP: __delete_from_page_cache+0x344/0x410 RSP: ffffc900019d7a88
[  694.607426] ---[ end trace 55e6b04ae95d8ce3 ]---

BTW, this was on 4.13.0-rc3 + your patches.  Simple test program is below.

-- 
Mike Kravetz


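#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* not yet in the installed headers */
#endif
#define ADDR (void *)(0x0UL)	/* let the kernel pick the address */

int main(void)
{
	unsigned long page_size;
	int ret;
	void *addr;
	char foo;
	pid_t pid;

	page_size = sysconf(_SC_PAGE_SIZE);

	/* Shared anonymous mapping: the case that triggered the BUG above */
	addr = mmap(ADDR, page_size,
			PROT_READ | PROT_WRITE,
			MAP_ANONYMOUS|MAP_SHARED, -1, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	printf("Parent writing 'a' to page\n");
	*((char *)addr) = 'a';

	ret = madvise(addr, page_size, MADV_WIPEONFORK);
	if (ret) {
		perror("madvise");
		exit(1);
	}

	fflush(stdout);		/* don't duplicate buffered output in the child */
	pid = fork();
	if (pid < 0) {
		perror("fork");
		exit(1);
	} else if (pid > 0) {
		/* In parent: wait for the child instead of sleeping */
		waitpid(pid, NULL, 0);
	} else {
		/* In child */
		foo = *((char *)addr);
		printf("child read '%c' from page\n", foo);
	}

	return ret;
}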


* Re: [PATCH 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-04 19:07 [PATCH 0/2] mm,fork,security: introduce MADV_WIPEONFORK riel
  2017-08-04 19:07 ` [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag riel
  2017-08-04 19:07 ` [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK riel
@ 2017-08-04 23:44 ` Kirill A. Shutemov
  2017-08-05 15:21   ` Rik van Riel
  2 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2017-08-04 23:44 UTC (permalink / raw)
  To: riel
  Cc: linux-kernel, linux-mm, fweimer, colm, akpm, rppt, keescook,
	luto, wad, mingo

On Fri, Aug 04, 2017 at 03:07:28PM -0400, riel@redhat.com wrote:
> [resend because half the recipients got dropped due to IPv6 firewall issues]
> 
> Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> empty in the child process after fork. This differs from MADV_DONTFORK
> in one important way.
> 
> If a child process accesses memory that was MADV_WIPEONFORK, it
> will get zeroes. The address ranges are still valid, they are just empty.

I feel like we are repeating the mistake we made with MADV_DONTNEED.

MADV_WIPEONFORK would require a specific action from the kernel; ignoring
the /advise/ would likely lead to application misbehaviour.

Is it something we really want to see from madvise()?

-- 
 Kirill A. Shutemov


* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
  2017-08-04 23:09   ` Mike Kravetz
@ 2017-08-05 14:05     ` Rik van Riel
  0 siblings, 0 replies; 13+ messages in thread
From: Rik van Riel @ 2017-08-05 14:05 UTC (permalink / raw)
  To: Mike Kravetz, linux-kernel
  Cc: linux-mm, fweimer, colm, akpm, rppt, keescook, luto, wad, mingo

On Fri, 2017-08-04 at 16:09 -0700, Mike Kravetz wrote:
> On 08/04/2017 12:07 PM, riel@redhat.com wrote:
> > From: Rik van Riel <riel@redhat.com>
> > 
> > Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> > empty in the child process after fork. This differs from MADV_DONTFORK
> > in one important way.
> > 
> > If a child process accesses memory that was MADV_WIPEONFORK, it
> > will get zeroes. The address ranges are still valid, they are just empty.
> > 
> This didn't seem 'quite right' to me for shared mappings and/or file
> backed mappings.  I wasn't exactly sure what it 'should' do in such
> cases.  So, I tried it with a mapping created as follows:
> 
> addr = mmap(ADDR, page_size,
>                         PROT_READ | PROT_WRITE,
>                         MAP_ANONYMOUS|MAP_SHARED, -1, 0);

Your test program is pretty much the same I used, except I
used MAP_PRIVATE instead of MAP_SHARED.

Let me see how the code paths differ for both cases...


> When setting MADV_WIPEONFORK on the vma/mapping, I got the following
> at task exit time:
> 
> [...]
> 
> BTW, this was on 4.13.0-rc3 + your patches.  Simple test program is
> below.
> 


* Re: [PATCH 0/2] mm,fork,security: introduce MADV_WIPEONFORK
  2017-08-04 23:44 ` [PATCH 0/2] mm,fork,security: " Kirill A. Shutemov
@ 2017-08-05 15:21   ` Rik van Riel
  0 siblings, 0 replies; 13+ messages in thread
From: Rik van Riel @ 2017-08-05 15:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, linux-mm, fweimer, colm, akpm, rppt, keescook,
	luto, wad, mingo

On Sat, 2017-08-05 at 02:44 +0300, Kirill A. Shutemov wrote:
> On Fri, Aug 04, 2017 at 03:07:28PM -0400, riel@redhat.com wrote:
> > [resend because half the recipients got dropped due to IPv6
> > firewall issues]
> > 
> > Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> > empty in the child process after fork. This differs from
> > MADV_DONTFORK in one important way.
> > 
> > If a child process accesses memory that was MADV_WIPEONFORK, it
> > will get zeroes. The address ranges are still valid, they are just
> > empty.
> 
> I feel like we are repeating the mistake we made with MADV_DONTNEED.
> 
> MADV_WIPEONFORK would require a specific action from the kernel; ignoring
> the /advice/ would likely lead to application misbehaviour.
> 
> Is it something we really want to see from madvise()?

We already have various mandatory madvise behaviors in Linux,
including MADV_REMOVE, MADV_DONTFORK, and MADV_DONTDUMP.
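A minimal userspace sketch of the behavior being debated, assuming a Linux kernel and C library that provide madvise(). The fallback value 18 for MADV_WIPEONFORK is an assumption taken from the generic uapi header, for libc headers that predate the flag. The helper child_reads_after_fork() and the NO_ADVICE sentinel are hypothetical names used only for this illustration: the helper fills a private anonymous page, applies the given advice, forks, and reports what the child reads.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18  /* assumed: value from Linux asm-generic/mman-common.h */
#endif

#define NO_ADVICE (-1)      /* hypothetical sentinel: skip madvise() entirely */

/*
 * Hypothetical helper: fill one private anonymous page with 0xAB, apply
 * 'advice' (unless NO_ADVICE), fork, and report what the child reads.
 * Returns the byte seen by the child (0..255), -1 if setup failed (for
 * example EINVAL from a kernel that does not know the advice), or -2 if
 * the child was killed by a signal, as expected for MADV_DONTFORK.
 */
int child_reads_after_fork(int advice)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int result = -1;

    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xAB, len);  /* stands in for parent-only state, e.g. a PRNG */

    if (advice == NO_ADVICE || madvise(p, len, advice) == 0) {
        pid_t pid = fork();
        int status;

        if (pid == 0)
            _exit(p[0]);   /* the exit status carries the byte the child saw */
        if (pid > 0 && waitpid(pid, &status, 0) == pid) {
            if (WIFEXITED(status))
                result = WEXITSTATUS(status);
            else if (WIFSIGNALED(status))
                result = -2;
        }
    }
    munmap(p, len);
    return result;
}
```

On a kernel that implements the new advice, passing NO_ADVICE, MADV_WIPEONFORK and MADV_DONTFORK would be expected to return 0xAB, 0 and -2 respectively, matching the three cases described in the cover letter. The zero result depends on the kernel acting on the advice rather than treating it as a hint.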

* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
  2017-08-04 19:07 ` [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK riel
  2017-08-04 23:09   ` Mike Kravetz
@ 2017-08-14 15:45   ` kbuild test robot
  1 sibling, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2017-08-14 15:45 UTC (permalink / raw)
  To: riel
  Cc: kbuild-all, linux-kernel, linux-mm, fweimer, colm, akpm, rppt,
	keescook, luto, wad, mingo

[-- Attachment #1: Type: text/plain, Size: 2089 bytes --]

Hi Rik,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/riel-redhat-com/mm-fork-security-introduce-MADV_WIPEONFORK/20170806-035914
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=m68k 

All errors (new ones prefixed by >>):

   In file included from mm/debug.c:12:0:
>> include/trace/events/mmflags.h:131:31: error: 'VM_ARCH_2' undeclared here (not in a function)
    #define __VM_ARCH_SPECIFIC_2 {VM_ARCH_2, "arch_2" }
                                  ^
   include/trace/events/mmflags.h:165:2: note: in expansion of macro '__VM_ARCH_SPECIFIC_2'
     __VM_ARCH_SPECIFIC_2    ,  \
     ^
   mm/debug.c:39:2: note: in expansion of macro '__def_vmaflag_names'
     __def_vmaflag_names,
     ^

vim +/VM_ARCH_2 +131 include/trace/events/mmflags.h

bcf669179 Kirill A. Shutemov 2016-03-17  127  
bcf669179 Kirill A. Shutemov 2016-03-17  128  #if defined(CONFIG_X86)
bcf669179 Kirill A. Shutemov 2016-03-17  129  #define __VM_ARCH_SPECIFIC_2 {VM_MPX,		"mpx"		}
bcf669179 Kirill A. Shutemov 2016-03-17  130  #else
bcf669179 Kirill A. Shutemov 2016-03-17 @131  #define __VM_ARCH_SPECIFIC_2 {VM_ARCH_2,	"arch_2"	}
420adbe9f Vlastimil Babka    2016-03-15  132  #endif
420adbe9f Vlastimil Babka    2016-03-15  133  

:::::: The code at line 131 was first introduced by commit
:::::: bcf6691797f425b301f629bb783b7ff2d0bcfa5a mm, tracing: refresh __def_vmaflag_names

:::::: TO: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 12050 bytes --]
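The failure comes from the VMA flag-name table: mm/debug.c expands __def_vmaflag_names, which includes __VM_ARCH_SPECIFIC_2, and on every non-x86 configuration that macro names VM_ARCH_2 (mmflags.h line 131). The error therefore suggests that patch 2/2 no longer defines VM_ARCH_2, presumably because it reuses that bit; x86 is unaffected because it takes the VM_MPX branch instead. One way such a table can tolerate a flag that may disappear is to emit the entry only under #ifdef; removing the arch_2 entry outright would be the simpler alternative. The standalone sketch below shows the guarded pattern in userspace C; VM_ARCH_SPECIFIC_2_ENTRY, struct vmaflag_name, vmaflag_names and vmaflag_name() are hypothetical names, not the kernel's definitions.

```c
#include <stddef.h>

/* Stand-ins for two generic VMA flags (values match the kernel's). */
#define VM_READ  0x00000001UL
#define VM_WRITE 0x00000002UL
/* VM_ARCH_2 is deliberately not defined, as in the failing m68k build. */

struct vmaflag_name {
    unsigned long mask;
    const char *name;
};

/* Hypothetical guard: contribute the arch-specific entry only when the
 * flag it names is still defined, so the table never references an
 * undeclared identifier. */
#ifdef VM_ARCH_2
#define VM_ARCH_SPECIFIC_2_ENTRY { VM_ARCH_2, "arch_2" },
#else
#define VM_ARCH_SPECIFIC_2_ENTRY
#endif

static const struct vmaflag_name vmaflag_names[] = {
    { VM_READ,  "read"  },
    { VM_WRITE, "write" },
    VM_ARCH_SPECIFIC_2_ENTRY
};

#define NR_VMAFLAG_NAMES (sizeof(vmaflag_names) / sizeof(vmaflag_names[0]))

/* Look up the printable name for a single flag mask; NULL if unknown. */
static const char *vmaflag_name(unsigned long mask)
{
    for (size_t i = 0; i < NR_VMAFLAG_NAMES; i++)
        if (vmaflag_names[i].mask == mask)
            return vmaflag_names[i].name;
    return NULL;
}
```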

* [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag
  2017-08-11 21:28 [PATCH v4 " riel
@ 2017-08-11 21:28 ` riel
  0 siblings, 0 replies; 13+ messages in thread
From: riel @ 2017-08-11 21:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: mhocko, mike.kravetz, linux-mm, fweimer, colm, akpm, keescook,
	luto, wad, mingo, kirill, dave.hansen, linux-api, torvalds,
	willy

From: Rik van Riel <riel@redhat.com>

MPX only seems to be available on 64-bit CPUs, starting with Skylake
and Goldmont. Move VM_MPX into the 64-bit-only portion of vma->vm_flags
in order to free up a VMA flag.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
---
 arch/x86/Kconfig   | 4 +++-
 include/linux/mm.h | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 781521b7cf9e..6dff14fadc6f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1756,7 +1756,9 @@ config X86_SMAP
 config X86_INTEL_MPX
 	prompt "Intel MPX (Memory Protection Extensions)"
 	def_bool n
-	depends on CPU_SUP_INTEL
+	# Note: only available in 64-bit mode due to VMA flags shortage
+	depends on CPU_SUP_INTEL && X86_64
+	select ARCH_USES_HIGH_VMA_FLAGS
 	---help---
 	  MPX provides hardware features that can be used in
 	  conjunction with compiler-instrumented code to check
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..7550eeb06ccf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -208,10 +208,12 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_1	33	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
+#define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #if defined(CONFIG_X86)
@@ -235,9 +237,11 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1	/* T if mapped copy of data (nommu mmap) */
 #endif
 
-#if defined(CONFIG_X86)
+#if defined(CONFIG_X86_INTEL_MPX)
 /* MPX specific bounds table or bounds directory */
-# define VM_MPX		VM_ARCH_2
+# define VM_MPX		VM_HIGH_ARCH_BIT_4
+#else
+# define VM_MPX		VM_NONE
 #endif
 
 #ifndef VM_GROWSUP
-- 
2.9.4
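As posted, the new VM_MPX definition expands to VM_HIGH_ARCH_BIT_4, which is the bit index 36, rather than VM_HIGH_ARCH_4, the mask BIT(VM_HIGH_ARCH_BIT_4) added in the first mm.h hunk. The standalone sketch below re-declares only the macros it needs (it is not the kernel header, and it uses 1ULL where the kernel's BIT() uses unsigned long) to show why the distinction matters: 36 is 0x24, so setting or testing a flag with that value would touch VM_EXEC (0x04) and VM_MAYWRITE (0x20), whereas the mask occupies bit 36 alone. The helper collides_with_low_flags() is hypothetical.

```c
#include <stdint.h>

/* Userspace stand-ins mirroring the definitions in the diff above. */
#define BIT(nr)            (1ULL << (nr))
#define VM_EXEC            0x00000004ULL
#define VM_MAYWRITE        0x00000020ULL

#define VM_HIGH_ARCH_BIT_4 36                       /* a bit index */
#define VM_HIGH_ARCH_4     BIT(VM_HIGH_ARCH_BIT_4)  /* the mask for that bit */

/* Nonzero if 'flag' shares a bit with the low flags checked here. */
static int collides_with_low_flags(uint64_t flag)
{
    return (flag & (VM_EXEC | VM_MAYWRITE)) != 0;
}
```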

* [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag
  2017-08-11 19:19 [PATCH v3 0/2] mm,fork,security: introduce MADV_WIPEONFORK riel
@ 2017-08-11 19:19 ` riel
  0 siblings, 0 replies; 13+ messages in thread
From: riel @ 2017-08-11 19:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: mhocko, mike.kravetz, linux-mm, fweimer, colm, akpm, keescook,
	luto, wad, mingo, kirill, dave.hansen, linux-api, torvalds,
	willy

From: Rik van Riel <riel@redhat.com>

MPX only seems to be available on 64-bit CPUs, starting with Skylake
and Goldmont. Move VM_MPX into the 64-bit-only portion of vma->vm_flags
in order to free up a VMA flag.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
---
 arch/x86/Kconfig   | 4 +++-
 include/linux/mm.h | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 781521b7cf9e..6dff14fadc6f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1756,7 +1756,9 @@ config X86_SMAP
 config X86_INTEL_MPX
 	prompt "Intel MPX (Memory Protection Extensions)"
 	def_bool n
-	depends on CPU_SUP_INTEL
+	# Note: only available in 64-bit mode due to VMA flags shortage
+	depends on CPU_SUP_INTEL && X86_64
+	select ARCH_USES_HIGH_VMA_FLAGS
 	---help---
 	  MPX provides hardware features that can be used in
 	  conjunction with compiler-instrumented code to check
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..7550eeb06ccf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -208,10 +208,12 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_1	33	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
+#define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #if defined(CONFIG_X86)
@@ -235,9 +237,11 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1	/* T if mapped copy of data (nommu mmap) */
 #endif
 
-#if defined(CONFIG_X86)
+#if defined(CONFIG_X86_INTEL_MPX)
 /* MPX specific bounds table or bounds directory */
-# define VM_MPX		VM_ARCH_2
+# define VM_MPX		VM_HIGH_ARCH_BIT_4
+#else
+# define VM_MPX		VM_NONE
 #endif
 
 #ifndef VM_GROWSUP
-- 
2.9.4

* [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag
  2017-08-06 14:04 [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK riel
@ 2017-08-06 14:04 ` riel
  0 siblings, 0 replies; 13+ messages in thread
From: riel @ 2017-08-06 14:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: mike.kravetz, linux-mm, fweimer, colm, akpm, keescook, luto, wad,
	mingo, kirill, dave.hansen

From: Rik van Riel <riel@redhat.com>

MPX only seems to be available on 64-bit CPUs, starting with Skylake
and Goldmont. Move VM_MPX into the 64-bit-only portion of vma->vm_flags
in order to free up a VMA flag.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/x86/Kconfig   | 4 +++-
 include/linux/mm.h | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 781521b7cf9e..6dff14fadc6f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1756,7 +1756,9 @@ config X86_SMAP
 config X86_INTEL_MPX
 	prompt "Intel MPX (Memory Protection Extensions)"
 	def_bool n
-	depends on CPU_SUP_INTEL
+	# Note: only available in 64-bit mode due to VMA flags shortage
+	depends on CPU_SUP_INTEL && X86_64
+	select ARCH_USES_HIGH_VMA_FLAGS
 	---help---
 	  MPX provides hardware features that can be used in
 	  conjunction with compiler-instrumented code to check
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..7550eeb06ccf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -208,10 +208,12 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_1	33	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
+#define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #if defined(CONFIG_X86)
@@ -235,9 +237,11 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1	/* T if mapped copy of data (nommu mmap) */
 #endif
 
-#if defined(CONFIG_X86)
+#if defined(CONFIG_X86_INTEL_MPX)
 /* MPX specific bounds table or bounds directory */
-# define VM_MPX		VM_ARCH_2
+# define VM_MPX		VM_HIGH_ARCH_BIT_4
+#else
+# define VM_MPX		VM_NONE
 #endif
 
 #ifndef VM_GROWSUP
-- 
2.9.4

* [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag
  2017-08-04 19:01 [PATCH 0/2] mm,fork: MADV_WIPEONFORK - an empty VMA in the child riel
@ 2017-08-04 19:01 ` riel
  0 siblings, 0 replies; 13+ messages in thread
From: riel @ 2017-08-04 19:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, fweimer, colm, akpm, rppt, keescook, luto, wad, mingo

From: Rik van Riel <riel@redhat.com>

MPX only seems to be available on 64-bit CPUs, starting with Skylake
and Goldmont. Move VM_MPX into the 64-bit-only portion of vma->vm_flags
in order to free up a VMA flag.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/x86/Kconfig   | 4 +++-
 include/linux/mm.h | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 781521b7cf9e..6dff14fadc6f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1756,7 +1756,9 @@ config X86_SMAP
 config X86_INTEL_MPX
 	prompt "Intel MPX (Memory Protection Extensions)"
 	def_bool n
-	depends on CPU_SUP_INTEL
+	# Note: only available in 64-bit mode due to VMA flags shortage
+	depends on CPU_SUP_INTEL && X86_64
+	select ARCH_USES_HIGH_VMA_FLAGS
 	---help---
 	  MPX provides hardware features that can be used in
 	  conjunction with compiler-instrumented code to check
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..7550eeb06ccf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -208,10 +208,12 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_1	33	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
+#define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #if defined(CONFIG_X86)
@@ -235,9 +237,11 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1	/* T if mapped copy of data (nommu mmap) */
 #endif
 
-#if defined(CONFIG_X86)
+#if defined(CONFIG_X86_INTEL_MPX)
 /* MPX specific bounds table or bounds directory */
-# define VM_MPX		VM_ARCH_2
+# define VM_MPX		VM_HIGH_ARCH_BIT_4
+#else
+# define VM_MPX		VM_NONE
 #endif
 
 #ifndef VM_GROWSUP
-- 
2.9.4

end of thread, other threads:[~2017-08-14 15:45 UTC | newest]

Thread overview: 13+ messages
2017-08-04 19:07 [PATCH 0/2] mm,fork,security: introduce MADV_WIPEONFORK riel
2017-08-04 19:07 ` [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag riel
2017-08-04 19:25   ` Dave Hansen
2017-08-04 19:07 ` [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK riel
2017-08-04 23:09   ` Mike Kravetz
2017-08-05 14:05     ` Rik van Riel
2017-08-14 15:45   ` kbuild test robot
2017-08-04 23:44 ` [PATCH 0/2] mm,fork,security: " Kirill A. Shutemov
2017-08-05 15:21   ` Rik van Riel
  -- strict thread matches above, loose matches on Subject: below --
2017-08-11 21:28 [PATCH v4 " riel
2017-08-11 21:28 ` [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag riel
2017-08-11 19:19 [PATCH v3 0/2] mm,fork,security: introduce MADV_WIPEONFORK riel
2017-08-11 19:19 ` [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag riel
2017-08-06 14:04 [PATCH v2 0/2] mm,fork,security: introduce MADV_WIPEONFORK riel
2017-08-06 14:04 ` [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag riel
2017-08-04 19:01 [PATCH 0/2] mm,fork: MADV_WIPEONFORK - an empty VMA in the child riel
2017-08-04 19:01 ` [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag riel
