* [PATCH v4 0/2] mm,fork,security: introduce MADV_WIPEONFORK
@ 2017-08-11 21:28 ` riel
  0 siblings, 0 replies; 25+ messages in thread
From: riel @ 2017-08-11 21:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: mhocko, mike.kravetz, linux-mm, fweimer, colm, akpm, keescook,
	luto, wad, mingo, kirill, dave.hansen, linux-api, torvalds,
	willy

v4: don't clone anon vma chains, move all fork logic to dup_mmap
v3: simplify implementation, limit to anonymous, private mappings
v2: fix kbuild warnings

Introduce MADV_WIPEONFORK semantics, which result in a VMA being
empty in the child process after fork. This differs from MADV_DONTFORK
in one important way.

If a child process accesses memory that was MADV_WIPEONFORK, it
will get zeroes. The address ranges are still valid, they are just empty.

If a child process accesses memory that was MADV_DONTFORK, it will
get a segmentation fault, since those address ranges are no longer
valid in the child after fork.
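
The MADV_WIPEONFORK half of that contrast can be sketched from userspace
as follows. This is only an illustration, not part of the series: the
function name wipeonfork_demo() is made up here, and the fallback
definition of MADV_WIPEONFORK assumes the asm-generic value (18) proposed
in patch 2/2. The sketch returns -1 rather than failing when the running
kernel rejects the advice.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* assumption: asm-generic value from this series */
#endif

/*
 * Hypothetical helper: returns 0 if the child read zeroes while the
 * parent kept its data, 1 if that did not hold, and -1 if setup failed
 * or the kernel does not accept MADV_WIPEONFORK.
 */
int wipeonfork_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int status, ret = -1;
	pid_t pid;

	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xaa, len);
	if (madvise(p, len, MADV_WIPEONFORK) != 0)
		goto out;

	pid = fork();
	if (pid < 0)
		goto out;
	if (pid == 0) {
		/* Child: the range is still mapped, but must read as zeroes. */
		size_t i;

		for (i = 0; i < len; i++)
			if (p[i] != 0)
				_exit(1);
		_exit(0);
	}

	if (waitpid(pid, &status, 0) != pid)
		goto out;
	/* Parent: its own copy of the data must be untouched. */
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0 && p[0] == 0xaa)
		ret = 0;
	else
		ret = 1;
out:
	munmap(p, len);
	return ret;
}
```

On a kernel carrying this series, calling wipeonfork_demo() from main()
is expected to return 0.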

Since MADV_DONTFORK also seems to be used to allow very large
programs to fork in systems with strict memory overcommit restrictions,
changing the semantics of MADV_DONTFORK might break existing programs.

The use case is libraries that store or cache information, and
want to know that they need to regenerate it in the child process
after fork.

Examples of this would be:
- systemd/pulseaudio API checks (fail after fork)
  (replacing a getpid check, which is too slow without a PID cache)
- PKCS#11 API reinitialization check (mandated by specification)
- glibc's upcoming PRNG (reseed after fork)
- OpenSSL PRNG (reseed after fork)
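
A rough sketch of how a library might use this for the PRNG cases above:
keep a "seeded" marker in a MADV_WIPEONFORK page and treat a zero marker
as "we are in a new child, reseed". All names here (seeded_marker,
prng_state_init, prng_needs_reseed, reseed_demo) are hypothetical, the
fallback MADV_WIPEONFORK value assumes the asm-generic number from patch
2/2, and a real library would gather fresh entropy where the comment
says so.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* assumption: asm-generic value from this series */
#endif

/* Hypothetical library state: first word of a wipe-on-fork page. */
static volatile unsigned int *seeded_marker;

/* Returns 0 on success, -1 if the page could not be set up. */
int prng_state_init(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	if (madvise(p, len, MADV_WIPEONFORK) != 0) {
		munmap(p, len);
		return -1;	/* caller would fall back to another scheme */
	}
	seeded_marker = p;
	*seeded_marker = 1;
	return 0;
}

/* Returns 1 if the marker was wiped (reseed needed), then re-arms it. */
int prng_needs_reseed(void)
{
	if (*seeded_marker != 0)
		return 0;
	/* A real library would reseed from fresh entropy here. */
	*seeded_marker = 1;
	return 1;
}

/* Returns 0 if only the child saw a reseed request, 1 if not, -1 on setup failure. */
int reseed_demo(void)
{
	int status;
	pid_t pid;

	if (prng_state_init() != 0)
		return -1;
	if (prng_needs_reseed())
		return 1;	/* parent was just seeded */

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* First check in the child asks for a reseed, the second does not. */
		int ok = prng_needs_reseed() == 1 && prng_needs_reseed() == 0;

		_exit(ok ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return 1;
	return prng_needs_reseed() ? 1 : 0;	/* parent is unaffected */
}
```

The check is a single memory read, which is the point of replacing the
getpid comparison mentioned in the list above.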

The security benefits of a forking server having a re-initialized
PRNG in every child process are pretty obvious. However, due to
libraries having all kinds of internal state, and programs getting
compiled with many different versions of each library, it is
unreasonable to expect calling programs to re-initialize everything
manually after fork.

A further complication is the proliferation of clone flags,
programs bypassing glibc's functions to call clone directly,
and programs calling unshare, causing the glibc pthread_atfork
hook to not get called.

It would be better to have the kernel take care of this automatically.

This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2

* [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag
  2017-08-11 21:28 ` riel
@ 2017-08-11 21:28   ` riel
  -1 siblings, 0 replies; 25+ messages in thread
From: riel @ 2017-08-11 21:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: mhocko, mike.kravetz, linux-mm, fweimer, colm, akpm, keescook,
	luto, wad, mingo, kirill, dave.hansen, linux-api, torvalds,
	willy

From: Rik van Riel <riel@redhat.com>

MPX only seems to be available on 64 bit CPUs, starting with Skylake
and Goldmont. Move VM_MPX into the 64 bit only portion of vma->vm_flags,
in order to free up a VMA flag.
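
The reason the high bits are 64 bit only: vma->vm_flags is an unsigned
long, so on a 32 bit kernel a mask for bit 36 cannot be represented. A
small standalone sketch of that arithmetic (the helper names
flag_mask64 and flag_mask32 are made up for illustration and are not
kernel functions):

```c
#include <stdint.h>

/* Mask for a flag bit, as it would look in a 64-bit vm_flags word. */
uint64_t flag_mask64(unsigned int bit)
{
	return (uint64_t)1 << bit;
}

/* What is left of that mask when vm_flags is only 32 bits wide. */
uint32_t flag_mask32(unsigned int bit)
{
	return (uint32_t)flag_mask64(bit);
}
```

Bit 25 (the old VM_ARCH_2, 0x02000000) survives the truncation, while
bit 36 truncates to 0, which is why a flag placed there needs X86_64.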

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
---
 arch/x86/Kconfig   | 4 +++-
 include/linux/mm.h | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 781521b7cf9e..6dff14fadc6f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1756,7 +1756,9 @@ config X86_SMAP
 config X86_INTEL_MPX
 	prompt "Intel MPX (Memory Protection Extensions)"
 	def_bool n
-	depends on CPU_SUP_INTEL
+	# Note: only available in 64-bit mode due to VMA flags shortage
+	depends on CPU_SUP_INTEL && X86_64
+	select ARCH_USES_HIGH_VMA_FLAGS
 	---help---
 	  MPX provides hardware features that can be used in
 	  conjunction with compiler-instrumented code to check
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..7550eeb06ccf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -208,10 +208,12 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_1	33	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
+#define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #if defined(CONFIG_X86)
@@ -235,9 +237,11 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_MAPPED_COPY	VM_ARCH_1	/* T if mapped copy of data (nommu mmap) */
 #endif
 
-#if defined(CONFIG_X86)
+#if defined(CONFIG_X86_INTEL_MPX)
 /* MPX specific bounds table or bounds directory */
-# define VM_MPX		VM_ARCH_2
+# define VM_MPX		VM_HIGH_ARCH_4
+#else
+# define VM_MPX		VM_NONE
 #endif
 
 #ifndef VM_GROWSUP
-- 
2.9.4

* [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
  2017-08-11 21:28 ` riel
@ 2017-08-11 21:28   ` riel
  -1 siblings, 0 replies; 25+ messages in thread
From: riel @ 2017-08-11 21:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: mhocko, mike.kravetz, linux-mm, fweimer, colm, akpm, keescook,
	luto, wad, mingo, kirill, dave.hansen, linux-api, torvalds,
	willy

From: Rik van Riel <riel@redhat.com>

Introduce MADV_WIPEONFORK semantics, which result in a VMA being
empty in the child process after fork. This differs from MADV_DONTFORK
in one important way.

If a child process accesses memory that was MADV_WIPEONFORK, it
will get zeroes. The address ranges are still valid, they are just empty.

If a child process accesses memory that was MADV_DONTFORK, it will
get a segmentation fault, since those address ranges are no longer
valid in the child after fork.
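
For comparison, the existing MADV_DONTFORK behaviour can be observed
with a sketch like the one below. The function name
dontfork_child_signal() is hypothetical; it reports which signal, if
any, terminated the child when it touched the range.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the signal that killed the child when it
 * read from a MADV_DONTFORK range, 0 if the child exited normally, and
 * -1 if setup failed.
 */
int dontfork_child_signal(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int status, ret = -1;
	pid_t pid;

	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xaa, len);
	if (madvise(p, len, MADV_DONTFORK) != 0)
		goto out;

	pid = fork();
	if (pid < 0)
		goto out;
	if (pid == 0) {
		/* Child: the range was not copied, so this read faults. */
		(void)*(volatile unsigned char *)p;
		_exit(0);
	}

	if (waitpid(pid, &status, 0) != pid)
		goto out;
	ret = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
out:
	munmap(p, len);
	return ret;
}
```

The expected result is SIGSEGV, in contrast to MADV_WIPEONFORK, where
the same read would succeed and return zero.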

Since MADV_DONTFORK also seems to be used to allow very large
programs to fork in systems with strict memory overcommit restrictions,
changing the semantics of MADV_DONTFORK might break existing programs.

MADV_WIPEONFORK only works on private, anonymous VMAs.
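
A quick way to see that restriction from userspace, going by the
-EINVAL check in the madvise.c hunk of this patch: request
MADV_WIPEONFORK on a shared anonymous mapping and look at errno. The
function name is hypothetical and the fallback constant assumes the
asm-generic value (18) from this patch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* assumption: asm-generic value from this patch */
#endif

/*
 * Hypothetical helper: returns the errno from madvise(MADV_WIPEONFORK)
 * on a MAP_SHARED mapping, 0 if the call unexpectedly succeeded, and
 * -1 if the mapping could not be created.
 */
int wipeonfork_shared_errno(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	int err = 0;

	if (p == MAP_FAILED)
		return -1;
	if (madvise(p, len, MADV_WIPEONFORK) != 0)
		err = errno;
	munmap(p, len);
	return err;
}
```

A kernel without this patch also returns EINVAL here, since it does not
know the advice value at all, so callers cannot use this probe alone to
detect support.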

The use case is libraries that store or cache information, and
want to know that they need to regenerate it in the child process
after fork.

Examples of this would be:
- systemd/pulseaudio API checks (fail after fork)
  (replacing a getpid check, which is too slow without a PID cache)
- PKCS#11 API reinitialization check (mandated by specification)
- glibc's upcoming PRNG (reseed after fork)
- OpenSSL PRNG (reseed after fork)

The security benefits of a forking server having a re-initialized
PRNG in every child process are pretty obvious. However, due to
libraries having all kinds of internal state, and programs getting
compiled with many different versions of each library, it is
unreasonable to expect calling programs to re-initialize everything
manually after fork.

A further complication is the proliferation of clone flags,
programs bypassing glibc's functions to call clone directly,
and programs calling unshare, causing the glibc pthread_atfork
hook to not get called.

It would be better to have the kernel take care of this automatically.

This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2
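
The diff in this patch also adds MADV_KEEPONFORK, which clears
VM_WIPEONFORK again. A minimal sketch of that round trip, assuming the
asm-generic values (18 and 19) from this patch; keeponfork_demo() is a
made-up name, and it returns -1 when the kernel rejects either advice.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* assumption: asm-generic value from this patch */
#endif
#ifndef MADV_KEEPONFORK
#define MADV_KEEPONFORK 19	/* assumption: asm-generic value from this patch */
#endif

/*
 * Hypothetical helper: returns 0 if the child still sees the parent's
 * data after WIPEONFORK was undone with KEEPONFORK, 1 if it does not,
 * and -1 if setup failed or the kernel rejects the advice.
 */
int keeponfork_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int status, ret = -1;
	pid_t pid;

	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xaa, len);
	if (madvise(p, len, MADV_WIPEONFORK) != 0)
		goto out;
	if (madvise(p, len, MADV_KEEPONFORK) != 0)
		goto out;

	pid = fork();
	if (pid < 0)
		goto out;
	if (pid == 0)
		/* Child: ordinary fork semantics apply again. */
		_exit(p[0] == 0xaa && p[len - 1] == 0xaa ? 0 : 1);

	if (waitpid(pid, &status, 0) != pid)
		goto out;
	ret = (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
out:
	munmap(p, len);
	return ret;
}
```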

Reported-by: Florian Weimer <fweimer@redhat.com>
Reported-by: Colm MacCártaigh <colm@allcosts.net>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/alpha/include/uapi/asm/mman.h     |  3 +++
 arch/mips/include/uapi/asm/mman.h      |  3 +++
 arch/parisc/include/uapi/asm/mman.h    |  3 +++
 arch/xtensa/include/uapi/asm/mman.h    |  3 +++
 fs/proc/task_mmu.c                     |  1 +
 include/linux/mm.h                     |  2 +-
 include/trace/events/mmflags.h         |  8 +-------
 include/uapi/asm-generic/mman-common.h |  3 +++
 kernel/fork.c                          | 10 ++++++++--
 mm/madvise.c                           | 13 +++++++++++++
 10 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 02760f6e6ca4..2a708a792882 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -64,6 +64,9 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
 
+#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 655e2fb5395b..d59c57d60d7d 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -91,6 +91,9 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
 
+#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 5979745815a5..e205e0179642 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -60,6 +60,9 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	70		/* Clear the MADV_NODUMP flag */
 
+#define MADV_WIPEONFORK 71		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 72		/* Undo MADV_WIPEONFORK */
+
 /* compatibility flags */
 #define MAP_FILE	0
 #define MAP_VARIABLE	0
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 24365b30aae9..ed23e0a1b30d 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -103,6 +103,9 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
 
+#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index b836fd61ed87..2591e70216ff 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -651,6 +651,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_NORESERVE)]	= "nr",
 		[ilog2(VM_HUGETLB)]	= "ht",
 		[ilog2(VM_ARCH_1)]	= "ar",
+		[ilog2(VM_WIPEONFORK)]	= "wf",
 		[ilog2(VM_DONTDUMP)]	= "dd",
 #ifdef CONFIG_MEM_SOFT_DIRTY
 		[ilog2(VM_SOFTDIRTY)]	= "sd",
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7550eeb06ccf..58788c1b9e9d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -189,7 +189,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
 #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
 #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
-#define VM_ARCH_2	0x02000000
+#define VM_WIPEONFORK	0x02000000	/* Wipe VMA contents in child. */
 #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 8e50d01c645f..4c2e4737d7bc 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -125,12 +125,6 @@ IF_HAVE_PG_IDLE(PG_idle,		"idle"		)
 #define __VM_ARCH_SPECIFIC_1 {VM_ARCH_1,	"arch_1"	}
 #endif
 
-#if defined(CONFIG_X86)
-#define __VM_ARCH_SPECIFIC_2 {VM_MPX,		"mpx"		}
-#else
-#define __VM_ARCH_SPECIFIC_2 {VM_ARCH_2,	"arch_2"	}
-#endif
-
 #ifdef CONFIG_MEM_SOFT_DIRTY
 #define IF_HAVE_VM_SOFTDIRTY(flag,name) {flag, name },
 #else
@@ -162,7 +156,7 @@ IF_HAVE_PG_IDLE(PG_idle,		"idle"		)
 	{VM_NORESERVE,			"noreserve"	},		\
 	{VM_HUGETLB,			"hugetlb"	},		\
 	__VM_ARCH_SPECIFIC_1				,		\
-	__VM_ARCH_SPECIFIC_2				,		\
+	{VM_WIPEONFORK,			"wipeonfork"	},		\
 	{VM_DONTDUMP,			"dontdump"	},		\
 IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY,	"softdirty"	)		\
 	{VM_MIXEDMAP,			"mixedmap"	},		\
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 8c27db0c5c08..49e2b1d78093 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -58,6 +58,9 @@
 					   overrides the coredump filter bits */
 #define MADV_DODUMP	17		/* Clear the MADV_DONTDUMP flag */
 
+#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
+#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 17921b0390b4..06376ae4877d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -654,7 +654,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		retval = dup_userfaultfd(tmp, &uf);
 		if (retval)
 			goto fail_nomem_anon_vma_fork;
-		if (anon_vma_fork(tmp, mpnt))
+		if (tmp->vm_flags & VM_WIPEONFORK) {
+			/* VM_WIPEONFORK gets a clean slate in the child. */
+			tmp->anon_vma = NULL;
+			if (anon_vma_prepare(tmp))
+				goto fail_nomem_anon_vma_fork;
+		} else if (anon_vma_fork(tmp, mpnt))
 			goto fail_nomem_anon_vma_fork;
 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
 		tmp->vm_next = tmp->vm_prev = NULL;
@@ -698,7 +703,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		rb_parent = &tmp->vm_rb;
 
 		mm->map_count++;
-		retval = copy_page_range(mm, oldmm, mpnt);
+		if (!(tmp->vm_flags & VM_WIPEONFORK))
+			retval = copy_page_range(mm, oldmm, mpnt);
 
 		if (tmp->vm_ops && tmp->vm_ops->open)
 			tmp->vm_ops->open(tmp);
diff --git a/mm/madvise.c b/mm/madvise.c
index 9976852f1e1c..9b82cfa88ccf 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
 		}
 		new_flags &= ~VM_DONTCOPY;
 		break;
+	case MADV_WIPEONFORK:
+		/* MADV_WIPEONFORK is only supported on anonymous memory. */
+		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags |= VM_WIPEONFORK;
+		break;
+	case MADV_KEEPONFORK:
+		new_flags &= ~VM_WIPEONFORK;
+		break;
 	case MADV_DONTDUMP:
 		new_flags |= VM_DONTDUMP;
 		break;
@@ -689,6 +700,8 @@ madvise_behavior_valid(int behavior)
 #endif
 	case MADV_DONTDUMP:
 	case MADV_DODUMP:
+	case MADV_WIPEONFORK:
+	case MADV_KEEPONFORK:
 #ifdef CONFIG_MEMORY_FAILURE
 	case MADV_SOFT_OFFLINE:
 	case MADV_HWPOISON:
-- 
2.9.4

* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
  2017-08-11 21:28   ` riel
@ 2017-08-15 22:51     ` Andrew Morton
  -1 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2017-08-15 22:51 UTC (permalink / raw)
  To: riel
  Cc: linux-kernel, mhocko, mike.kravetz, linux-mm, fweimer, colm,
	keescook, luto, wad, mingo, kirill, dave.hansen, linux-api,
	torvalds, willy

On Fri, 11 Aug 2017 17:28:29 -0400 riel@redhat.com wrote:

> From: Rik van Riel <riel@redhat.com>
> 
> Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> empty in the child process after fork. This differs from MADV_DONTFORK
> in one important way.
> 
> If a child process accesses memory that was MADV_WIPEONFORK, it
> will get zeroes. The address ranges are still valid, they are just empty.
> 
> If a child process accesses memory that was MADV_DONTFORK, it will
> get a segmentation fault, since those address ranges are no longer
> valid in the child after fork.
> 
> Since MADV_DONTFORK also seems to be used to allow very large
> programs to fork in systems with strict memory overcommit restrictions,
> changing the semantics of MADV_DONTFORK might break existing programs.
> 
> MADV_WIPEONFORK only works on private, anonymous VMAs.
> 
> The use case is libraries that store or cache information, and
> want to know that they need to regenerate it in the child process
> after fork.
> 
> Examples of this would be:
> - systemd/pulseaudio API checks (fail after fork)
>   (replacing a getpid check, which is too slow without a PID cache)
> - PKCS#11 API reinitialization check (mandated by specification)
> - glibc's upcoming PRNG (reseed after fork)
> - OpenSSL PRNG (reseed after fork)
> 
> The security benefits of a forking server having a re-initialized
> PRNG in every child process are pretty obvious. However, due to
> libraries having all kinds of internal state, and programs getting
> compiled with many different versions of each library, it is
> unreasonable to expect calling programs to re-initialize everything
> manually after fork.
> 
> A further complication is the proliferation of clone flags,
> programs bypassing glibc's functions to call clone directly,
> and programs calling unshare, causing the glibc pthread_atfork
> hook to not get called.
> 
> It would be better to have the kernel take care of this automatically.

I'll add "The patch also adds MADV_KEEPONFORK, to undo the effects of a
prior MADV_WIPEONFORK." here.
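
As a rough userspace sketch of the use case quoted above (a library
noticing that it is running in a forked child), and of MADV_KEEPONFORK
undoing a prior MADV_WIPEONFORK: the helper names below are hypothetical,
and the fallback #defines assume the asm-generic values from this patch
(parisc uses different numbers). On a kernel without the patch the demo
reports -1 rather than a result.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* asm-generic value from the patch */
#endif
#ifndef MADV_KEEPONFORK
#define MADV_KEEPONFORK 19	/* asm-generic value from the patch */
#endif

/*
 * Hypothetical helper: fork once and report the byte the child reads
 * from guard[0]. Returns that byte, or -1 on error.
 */
static int byte_seen_by_child(const volatile unsigned char *guard)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(guard[0]);	/* exit status carries the byte back */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/*
 * Hypothetical demo. Returns 0 if the child saw a wiped guard under
 * MADV_WIPEONFORK and the parent's value after MADV_KEEPONFORK, 1 if it
 * saw anything else, and -1 if the calls are not supported here.
 */
static int wipeonfork_guard_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *guard;
	int wiped, kept;

	if (page <= 0)
		return -1;
	guard = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (guard == MAP_FAILED)
		return -1;

	guard[0] = 1;	/* nonzero: "state is initialized in this process" */

	if (madvise(guard, (size_t)page, MADV_WIPEONFORK) != 0) {
		munmap(guard, (size_t)page);
		return -1;
	}
	wiped = byte_seen_by_child(guard);

	if (madvise(guard, (size_t)page, MADV_KEEPONFORK) != 0) {
		munmap(guard, (size_t)page);
		return -1;
	}
	kept = byte_seen_by_child(guard);

	assert(guard[0] == 1);	/* the parent's copy is never wiped */
	munmap(guard, (size_t)page);

	if (wiped < 0 || kept < 0)
		return -1;
	return (wiped == 0 && kept == 1) ? 0 : 1;
}
```

A library would do the equivalent of the guard[0] test on its own: a zero
guard means "fresh child, regenerate cached state", without relying on
pthread_atfork hooks being called.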

I guess it isn't worth mentioning that these things can cause VMA
merges and splits. 

> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
>  		}
>  		new_flags &= ~VM_DONTCOPY;
>  		break;
> +	case MADV_WIPEONFORK:
> +		/* MADV_WIPEONFORK is only supported on anonymous memory. */
> +		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> +			error = -EINVAL;
> +			goto out;
> +		}
> +		new_flags |= VM_WIPEONFORK;
> +		break;
> +	case MADV_KEEPONFORK:
> +		new_flags &= ~VM_WIPEONFORK;
> +		break;
>  	case MADV_DONTDUMP:
>  		new_flags |= VM_DONTDUMP;
>  		break;

It seems odd to permit MADV_KEEPONFORK against other-than-anon vmas?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
  2017-08-15 22:51     ` Andrew Morton
  (?)
@ 2017-08-16  2:18       ` Rik van Riel
  -1 siblings, 0 replies; 25+ messages in thread
From: Rik van Riel @ 2017-08-16  2:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, mhocko, mike.kravetz, linux-mm, fweimer, colm,
	keescook, luto, wad, mingo, kirill, dave.hansen, linux-api,
	torvalds, willy

On Tue, 2017-08-15 at 15:51 -0700, Andrew Morton wrote:
> On Fri, 11 Aug 2017 17:28:29 -0400 riel@redhat.com wrote:
> 
> > A further complication is the proliferation of clone flags,
> > programs bypassing glibc's functions to call clone directly,
> > and programs calling unshare, causing the glibc pthread_atfork
> > hook to not get called.
> > 
> > It would be better to have the kernel take care of this automatically.
> 
> I'll add "The patch also adds MADV_KEEPONFORK, to undo the effects of a
> prior MADV_WIPEONFORK." here.
> 
> I guess it isn't worth mentioning that these things can cause VMA
> merges and splits. 

That's the same as every other Linux specific madvise operation.

> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
> >  		}
> >  		new_flags &= ~VM_DONTCOPY;
> >  		break;
> > +	case MADV_WIPEONFORK:
> > +		/* MADV_WIPEONFORK is only supported on anonymous memory. */
> > +		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> > +			error = -EINVAL;
> > +			goto out;
> > +		}
> > +		new_flags |= VM_WIPEONFORK;
> > +		break;
> > +	case MADV_KEEPONFORK:
> > +		new_flags &= ~VM_WIPEONFORK;
> > +		break;
> >  	case MADV_DONTDUMP:
> >  		new_flags |= VM_DONTDUMP;
> >  		break;
> 
> It seems odd to permit MADV_KEEPONFORK against other-than-anon vmas?

Given that the only way to set VM_WIPEONFORK is through
MADV_WIPEONFORK, calling MADV_KEEPONFORK on an
other-than-anon vma would be equivalent to a noop.

If new_flags == vma->vm_flags, madvise_behavior() will
immediately exit.
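
The argument can be checked against a small standalone model of the flag
handling. This is only a sketch: the toy_* and TOY_* names are hypothetical,
the struct merely stands in for struct vm_area_struct, and the real
madvise_behavior() does considerably more (VMA merging and splitting, for
one). The flag values are the ones from include/linux/mm.h and the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same bit values as VM_SHARED and the VM_WIPEONFORK added by the patch. */
#define TOY_VM_SHARED		0x00000008UL
#define TOY_VM_WIPEONFORK	0x02000000UL

/* Same numbers as the asm-generic MADV_* values in the patch. */
#define TOY_MADV_WIPEONFORK	18
#define TOY_MADV_KEEPONFORK	19

/* Hypothetical stand-in for the two VMA properties the check looks at. */
struct toy_vma {
	unsigned long vm_flags;
	bool file_backed;	/* models vma->vm_file != NULL */
};

/*
 * Model of the two new cases. Returns 0 or -EINVAL; *changed reports
 * whether vm_flags was actually modified.
 */
static int toy_madvise_behavior(struct toy_vma *vma, int behavior,
				bool *changed)
{
	unsigned long new_flags;

	assert(vma && changed);
	new_flags = vma->vm_flags;
	*changed = false;

	switch (behavior) {
	case TOY_MADV_WIPEONFORK:
		/* Only private, anonymous memory may be marked. */
		if (vma->file_backed || (vma->vm_flags & TOY_VM_SHARED))
			return -EINVAL;
		new_flags |= TOY_VM_WIPEONFORK;
		break;
	case TOY_MADV_KEEPONFORK:
		new_flags &= ~TOY_VM_WIPEONFORK;
		break;
	default:
		return -EINVAL;
	}

	/* Models the early exit when the flags do not change. */
	if (new_flags == vma->vm_flags)
		return 0;

	vma->vm_flags = new_flags;
	*changed = true;
	return 0;
}
```

In this model a file-backed or shared VMA can never acquire the flag, so
TOY_MADV_KEEPONFORK on such a VMA always takes the early-exit path and
returns 0 with *changed left false.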

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
  2017-08-16  2:18       ` Rik van Riel
@ 2017-08-17 22:50         ` Andrew Morton
  -1 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2017-08-17 22:50 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, mhocko, mike.kravetz, linux-mm, fweimer, colm,
	keescook, luto, wad, mingo, kirill, dave.hansen, linux-api,
	torvalds, willy

On Tue, 15 Aug 2017 22:18:19 -0400 Rik van Riel <riel@redhat.com> wrote:

> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
> > >  		}
> > >  		new_flags &= ~VM_DONTCOPY;
> > >  		break;
> > > +	case MADV_WIPEONFORK:
> > > +		/* MADV_WIPEONFORK is only supported on anonymous memory. */
> > > +		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> > > +			error = -EINVAL;
> > > +			goto out;
> > > +		}
> > > +		new_flags |= VM_WIPEONFORK;
> > > +		break;
> > > +	case MADV_KEEPONFORK:
> > > +		new_flags &= ~VM_WIPEONFORK;
> > > +		break;
> > >  	case MADV_DONTDUMP:
> > >  		new_flags |= VM_DONTDUMP;
> > >  		break;
> > 
> > It seems odd to permit MADV_KEEPONFORK against other-than-anon vmas?
> 
> Given that the only way to set VM_WIPEONFORK is through
> MADV_WIPEONFORK, calling MADV_KEEPONFORK on an
> other-than-anon vma would be equivalent to a noop.
> 
> If new_flags == vma->vm_flags, madvise_behavior() will
> immediately exit.

Yes, but calling MADV_WIPEONFORK against an other-than-anon vma is
presumably a userspace bug.  A bug which will probably result in
userspace having WIPEONFORK memory which it didn't want.  The kernel
can trivially tell userspace that it has this bug so why not do so?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
  2017-08-17 22:50         ` Andrew Morton
  (?)
@ 2017-08-18 16:28           ` Rik van Riel
  -1 siblings, 0 replies; 25+ messages in thread
From: Rik van Riel @ 2017-08-18 16:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, mhocko, mike.kravetz, linux-mm, fweimer, colm,
	keescook, luto, wad, mingo, kirill, dave.hansen, linux-api,
	torvalds, willy

On Thu, 2017-08-17 at 15:50 -0700, Andrew Morton wrote:
> On Tue, 15 Aug 2017 22:18:19 -0400 Rik van Riel <riel@redhat.com> wrote:
> 
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
> > > >  		}
> > > >  		new_flags &= ~VM_DONTCOPY;
> > > >  		break;
> > > > +	case MADV_WIPEONFORK:
> > > > +		/* MADV_WIPEONFORK is only supported on anonymous memory. */
> > > > +		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> > > > +			error = -EINVAL;
> > > > +			goto out;
> > > > +		}
> > > > +		new_flags |= VM_WIPEONFORK;
> > > > +		break;
> > > > +	case MADV_KEEPONFORK:
> > > > +		new_flags &= ~VM_WIPEONFORK;
> > > > +		break;
> > > >  	case MADV_DONTDUMP:
> > > >  		new_flags |= VM_DONTDUMP;
> > > >  		break;
> > > 
> > > It seems odd to permit MADV_KEEPONFORK against other-than-anon vmas?
> > 
> > Given that the only way to set VM_WIPEONFORK is through
> > MADV_WIPEONFORK, calling MADV_KEEPONFORK on an
> > other-than-anon vma would be equivalent to a noop.
> > 
> > If new_flags == vma->vm_flags, madvise_behavior() will
> > immediately exit.
> 
> Yes, but calling MADV_WIPEONFORK against an other-than-anon vma is
> presumably a userspace bug.  A bug which will probably result in
> userspace having WIPEONFORK memory which it didn't want.  The kernel
> can trivially tell userspace that it has this bug so why not do so?

Uh, what?

Calling MADV_WIPEONFORK on an other-than-anon vma results
in NOT getting VM_WIPEONFORK semantics on that VMA.

The code you are commenting on is the bit that CLEARS
the VM_WIPEONFORK code, not the bit where it gets set.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
  2017-08-11 21:28   ` riel
@ 2017-08-18 17:25     ` Mike Kravetz
  -1 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2017-08-18 17:25 UTC (permalink / raw)
  To: riel, linux-kernel
  Cc: mhocko, linux-mm, fweimer, colm, akpm, keescook, luto, wad,
	mingo, kirill, dave.hansen, linux-api, torvalds, willy

On 08/11/2017 02:28 PM, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
> 
> Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> empty in the child process after fork. This differs from MADV_DONTFORK
> in one important way.
> 
> If a child process accesses memory that was MADV_WIPEONFORK, it
> will get zeroes. The address ranges are still valid, they are just empty.
> 
> If a child process accesses memory that was MADV_DONTFORK, it will
> get a segmentation fault, since those address ranges are no longer
> valid in the child after fork.
> 
> Since MADV_DONTFORK also seems to be used to allow very large
> programs to fork in systems with strict memory overcommit restrictions,
> changing the semantics of MADV_DONTFORK might break existing programs.
> 
> MADV_WIPEONFORK only works on private, anonymous VMAs.
> 
> The use case is libraries that store or cache information, and
> want to know that they need to regenerate it in the child process
> after fork.
> 
> Examples of this would be:
> - systemd/pulseaudio API checks (fail after fork)
>   (replacing a getpid check, which is too slow without a PID cache)
> - PKCS#11 API reinitialization check (mandated by specification)
> - glibc's upcoming PRNG (reseed after fork)
> - OpenSSL PRNG (reseed after fork)
> 
> The security benefits of a forking server having a re-initialized
> PRNG in every child process are pretty obvious. However, due to
> libraries having all kinds of internal state, and programs getting
> compiled with many different versions of each library, it is
> unreasonable to expect calling programs to re-initialize everything
> manually after fork.
> 
> A further complication is the proliferation of clone flags,
> programs bypassing glibc's functions to call clone directly,
> and programs calling unshare, causing the glibc pthread_atfork
> hook to not get called.
> 
> It would be better to have the kernel take care of this automatically.
> 
> This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
> 
>     https://man.openbsd.org/minherit.2
> 
> Reported-by: Florian Weimer <fweimer@redhat.com>
> Reported-by: Colm MacCártaigh <colm@allcosts.net>
> Signed-off-by: Rik van Riel <riel@redhat.com>

My primary concern with the first suggested patch was trying to define
semantics if MADV_WIPEONFORK was applied to a shared or file backed
mapping.  This is no longer allowed.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
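
For what it is worth, the restriction should be observable from userspace
with something like the sketch below. The helper name is hypothetical and
the fallback #define assumes the asm-generic value from the patch. Going by
the madvise_behavior() hunk, a shared anonymous mapping should be refused
with EINVAL, since it has VM_SHARED set; a kernel without the patch would
also refuse it, because it does not know the advice value at all.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* asm-generic value from the patch */
#endif

/*
 * Hypothetical helper: map one anonymous page with the given sharing
 * flag (MAP_PRIVATE or MAP_SHARED) and try MADV_WIPEONFORK on it.
 * Returns 0 if madvise() accepted the request, the errno value if it
 * was refused, and -1 if the mapping could not be set up.
 */
static int wipeonfork_errno(int sharing_flag)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int err = 0;
	int rc;

	if (page <= 0)
		return -1;
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 sharing_flag | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	if (madvise(p, (size_t)page, MADV_WIPEONFORK) != 0)
		err = errno;	/* capture before any other call clobbers it */

	rc = munmap(p, (size_t)page);
	assert(rc == 0);
	(void)rc;
	return err;
}
```

Calling wipeonfork_errno(MAP_SHARED) is expected to return a nonzero errno,
while wipeonfork_errno(MAP_PRIVATE) should return 0 on a kernel that
carries the patch.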

> ---
>  arch/alpha/include/uapi/asm/mman.h     |  3 +++
>  arch/mips/include/uapi/asm/mman.h      |  3 +++
>  arch/parisc/include/uapi/asm/mman.h    |  3 +++
>  arch/xtensa/include/uapi/asm/mman.h    |  3 +++
>  fs/proc/task_mmu.c                     |  1 +
>  include/linux/mm.h                     |  2 +-
>  include/trace/events/mmflags.h         |  8 +-------
>  include/uapi/asm-generic/mman-common.h |  3 +++
>  kernel/fork.c                          | 10 ++++++++--
>  mm/madvise.c                           | 13 +++++++++++++
>  10 files changed, 39 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> index 02760f6e6ca4..2a708a792882 100644
> --- a/arch/alpha/include/uapi/asm/mman.h
> +++ b/arch/alpha/include/uapi/asm/mman.h
> @@ -64,6 +64,9 @@
>  					   overrides the coredump filter bits */
>  #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
>  
> +#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
> +#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
> +
>  /* compatibility flags */
>  #define MAP_FILE	0
>  
> diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> index 655e2fb5395b..d59c57d60d7d 100644
> --- a/arch/mips/include/uapi/asm/mman.h
> +++ b/arch/mips/include/uapi/asm/mman.h
> @@ -91,6 +91,9 @@
>  					   overrides the coredump filter bits */
>  #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
>  
> +#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
> +#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
> +
>  /* compatibility flags */
>  #define MAP_FILE	0
>  
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index 5979745815a5..e205e0179642 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -60,6 +60,9 @@
>  					   overrides the coredump filter bits */
>  #define MADV_DODUMP	70		/* Clear the MADV_NODUMP flag */
>  
> +#define MADV_WIPEONFORK 71		/* Zero memory on fork, child only */
> +#define MADV_KEEPONFORK 72		/* Undo MADV_WIPEONFORK */
> +
>  /* compatibility flags */
>  #define MAP_FILE	0
>  #define MAP_VARIABLE	0
> diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> index 24365b30aae9..ed23e0a1b30d 100644
> --- a/arch/xtensa/include/uapi/asm/mman.h
> +++ b/arch/xtensa/include/uapi/asm/mman.h
> @@ -103,6 +103,9 @@
>  					   overrides the coredump filter bits */
>  #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
>  
> +#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
> +#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
> +
>  /* compatibility flags */
>  #define MAP_FILE	0
>  
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index b836fd61ed87..2591e70216ff 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -651,6 +651,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>  		[ilog2(VM_NORESERVE)]	= "nr",
>  		[ilog2(VM_HUGETLB)]	= "ht",
>  		[ilog2(VM_ARCH_1)]	= "ar",
> +		[ilog2(VM_WIPEONFORK)]	= "wf",
>  		[ilog2(VM_DONTDUMP)]	= "dd",
>  #ifdef CONFIG_MEM_SOFT_DIRTY
>  		[ilog2(VM_SOFTDIRTY)]	= "sd",
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7550eeb06ccf..58788c1b9e9d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -189,7 +189,7 @@ extern unsigned int kobjsize(const void *objp);
>  #define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
>  #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
>  #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
> -#define VM_ARCH_2	0x02000000
> +#define VM_WIPEONFORK	0x02000000	/* Wipe VMA contents in child. */
>  #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
>  
>  #ifdef CONFIG_MEM_SOFT_DIRTY
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index 8e50d01c645f..4c2e4737d7bc 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -125,12 +125,6 @@ IF_HAVE_PG_IDLE(PG_idle,		"idle"		)
>  #define __VM_ARCH_SPECIFIC_1 {VM_ARCH_1,	"arch_1"	}
>  #endif
>  
> -#if defined(CONFIG_X86)
> -#define __VM_ARCH_SPECIFIC_2 {VM_MPX,		"mpx"		}
> -#else
> -#define __VM_ARCH_SPECIFIC_2 {VM_ARCH_2,	"arch_2"	}
> -#endif
> -
>  #ifdef CONFIG_MEM_SOFT_DIRTY
>  #define IF_HAVE_VM_SOFTDIRTY(flag,name) {flag, name },
>  #else
> @@ -162,7 +156,7 @@ IF_HAVE_PG_IDLE(PG_idle,		"idle"		)
>  	{VM_NORESERVE,			"noreserve"	},		\
>  	{VM_HUGETLB,			"hugetlb"	},		\
>  	__VM_ARCH_SPECIFIC_1				,		\
> -	__VM_ARCH_SPECIFIC_2				,		\
> +	{VM_WIPEONFORK,			"wipeonfork"	},		\
>  	{VM_DONTDUMP,			"dontdump"	},		\
>  IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY,	"softdirty"	)		\
>  	{VM_MIXEDMAP,			"mixedmap"	},		\
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index 8c27db0c5c08..49e2b1d78093 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -58,6 +58,9 @@
>  					   overrides the coredump filter bits */
>  #define MADV_DODUMP	17		/* Clear the MADV_DONTDUMP flag */
>  
> +#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
> +#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
> +
>  /* compatibility flags */
>  #define MAP_FILE	0
>  
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 17921b0390b4..06376ae4877d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -654,7 +654,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		retval = dup_userfaultfd(tmp, &uf);
>  		if (retval)
>  			goto fail_nomem_anon_vma_fork;
> -		if (anon_vma_fork(tmp, mpnt))
> +		if (tmp->vm_flags & VM_WIPEONFORK) {
> +			/* VM_WIPEONFORK gets a clean slate in the child. */
> +			tmp->anon_vma = NULL;
> +			if (anon_vma_prepare(tmp))
> +				goto fail_nomem_anon_vma_fork;
> +		} else if (anon_vma_fork(tmp, mpnt))
>  			goto fail_nomem_anon_vma_fork;
>  		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>  		tmp->vm_next = tmp->vm_prev = NULL;
> @@ -698,7 +703,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		rb_parent = &tmp->vm_rb;
>  
>  		mm->map_count++;
> -		retval = copy_page_range(mm, oldmm, mpnt);
> +		if (!(tmp->vm_flags & VM_WIPEONFORK))
> +			retval = copy_page_range(mm, oldmm, mpnt);
>  
>  		if (tmp->vm_ops && tmp->vm_ops->open)
>  			tmp->vm_ops->open(tmp);
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 9976852f1e1c..9b82cfa88ccf 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
>  		}
>  		new_flags &= ~VM_DONTCOPY;
>  		break;
> +	case MADV_WIPEONFORK:
> +		/* MADV_WIPEONFORK is only supported on anonymous memory. */
> +		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> +			error = -EINVAL;
> +			goto out;
> +		}
> +		new_flags |= VM_WIPEONFORK;
> +		break;
> +	case MADV_KEEPONFORK:
> +		new_flags &= ~VM_WIPEONFORK;
> +		break;
>  	case MADV_DONTDUMP:
>  		new_flags |= VM_DONTDUMP;
>  		break;
> @@ -689,6 +700,8 @@ madvise_behavior_valid(int behavior)
>  #endif
>  	case MADV_DONTDUMP:
>  	case MADV_DODUMP:
> +	case MADV_WIPEONFORK:
> +	case MADV_KEEPONFORK:
>  #ifdef CONFIG_MEMORY_FAILURE
>  	case MADV_SOFT_OFFLINE:
>  	case MADV_HWPOISON:
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
@ 2017-08-18 17:25     ` Mike Kravetz
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2017-08-18 17:25 UTC (permalink / raw)
  To: riel, linux-kernel
  Cc: mhocko, linux-mm, fweimer, colm, akpm, keescook, luto, wad,
	mingo, kirill, dave.hansen, linux-api, torvalds, willy

On 08/11/2017 02:28 PM, riel@redhat.com wrote:
> From: Rik van Riel <riel@redhat.com>
> 
> Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> empty in the child process after fork. This differs from MADV_DONTFORK
> in one important way.
> 
> If a child process accesses memory that was MADV_WIPEONFORK, it
> will get zeroes. The address ranges are still valid, they are just empty.
> 
> If a child process accesses memory that was MADV_DONTFORK, it will
> get a segmentation fault, since those address ranges are no longer
> valid in the child after fork.
> 
> Since MADV_DONTFORK also seems to be used to allow very large
> programs to fork in systems with strict memory overcommit restrictions,
> changing the semantics of MADV_DONTFORK might break existing programs.
> 
> MADV_WIPEONFORK only works on private, anonymous VMAs.
> 
> The use case is libraries that store or cache information, and
> want to know that they need to regenerate it in the child process
> after fork.
> 
> Examples of this would be:
> - systemd/pulseaudio API checks (fail after fork)
>   (replacing a getpid check, which is too slow without a PID cache)
> - PKCS#11 API reinitialization check (mandated by specification)
> - glibc's upcoming PRNG (reseed after fork)
> - OpenSSL PRNG (reseed after fork)
> 
> The security benefits of a forking server having a re-initialized
> PRNG in every child process are pretty obvious. However, due to
> libraries having all kinds of internal state, and programs getting
> compiled with many different versions of each library, it is
> unreasonable to expect calling programs to re-initialize everything
> manually after fork.
> 
> A further complication is the proliferation of clone flags,
> programs bypassing glibc's functions to call clone directly,
> and programs calling unshare, causing the glibc pthread_atfork
> hook to not get called.
> 
> It would be better to have the kernel take care of this automatically.
> 
> This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
> 
>     https://man.openbsd.org/minherit.2
> 
> Reported-by: Florian Weimer <fweimer@redhat.com>
> Reported-by: Colm MacCártaigh <colm@allcosts.net>
> Signed-off-by: Rik van Riel <riel@redhat.com>

My primary concern with the first suggested patch was trying to define
semantics if MADV_WIPEONFORK was applied to a shared or file backed
mapping.  This is no longer allowed.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

> ---
>  arch/alpha/include/uapi/asm/mman.h     |  3 +++
>  arch/mips/include/uapi/asm/mman.h      |  3 +++
>  arch/parisc/include/uapi/asm/mman.h    |  3 +++
>  arch/xtensa/include/uapi/asm/mman.h    |  3 +++
>  fs/proc/task_mmu.c                     |  1 +
>  include/linux/mm.h                     |  2 +-
>  include/trace/events/mmflags.h         |  8 +-------
>  include/uapi/asm-generic/mman-common.h |  3 +++
>  kernel/fork.c                          | 10 ++++++++--
>  mm/madvise.c                           | 13 +++++++++++++
>  10 files changed, 39 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> index 02760f6e6ca4..2a708a792882 100644
> --- a/arch/alpha/include/uapi/asm/mman.h
> +++ b/arch/alpha/include/uapi/asm/mman.h
> @@ -64,6 +64,9 @@
>  					   overrides the coredump filter bits */
>  #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
>  
> +#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
> +#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
> +
>  /* compatibility flags */
>  #define MAP_FILE	0
>  
> diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> index 655e2fb5395b..d59c57d60d7d 100644
> --- a/arch/mips/include/uapi/asm/mman.h
> +++ b/arch/mips/include/uapi/asm/mman.h
> @@ -91,6 +91,9 @@
>  					   overrides the coredump filter bits */
>  #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
>  
> +#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
> +#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
> +
>  /* compatibility flags */
>  #define MAP_FILE	0
>  
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index 5979745815a5..e205e0179642 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -60,6 +60,9 @@
>  					   overrides the coredump filter bits */
>  #define MADV_DODUMP	70		/* Clear the MADV_NODUMP flag */
>  
> +#define MADV_WIPEONFORK 71		/* Zero memory on fork, child only */
> +#define MADV_KEEPONFORK 72		/* Undo MADV_WIPEONFORK */
> +
>  /* compatibility flags */
>  #define MAP_FILE	0
>  #define MAP_VARIABLE	0
> diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> index 24365b30aae9..ed23e0a1b30d 100644
> --- a/arch/xtensa/include/uapi/asm/mman.h
> +++ b/arch/xtensa/include/uapi/asm/mman.h
> @@ -103,6 +103,9 @@
>  					   overrides the coredump filter bits */
>  #define MADV_DODUMP	17		/* Clear the MADV_NODUMP flag */
>  
> +#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
> +#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
> +
>  /* compatibility flags */
>  #define MAP_FILE	0
>  
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index b836fd61ed87..2591e70216ff 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -651,6 +651,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>  		[ilog2(VM_NORESERVE)]	= "nr",
>  		[ilog2(VM_HUGETLB)]	= "ht",
>  		[ilog2(VM_ARCH_1)]	= "ar",
> +		[ilog2(VM_WIPEONFORK)]	= "wf",
>  		[ilog2(VM_DONTDUMP)]	= "dd",
>  #ifdef CONFIG_MEM_SOFT_DIRTY
>  		[ilog2(VM_SOFTDIRTY)]	= "sd",
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7550eeb06ccf..58788c1b9e9d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -189,7 +189,7 @@ extern unsigned int kobjsize(const void *objp);
>  #define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
>  #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
>  #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
> -#define VM_ARCH_2	0x02000000
> +#define VM_WIPEONFORK	0x02000000	/* Wipe VMA contents in child. */
>  #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
>  
>  #ifdef CONFIG_MEM_SOFT_DIRTY
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index 8e50d01c645f..4c2e4737d7bc 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -125,12 +125,6 @@ IF_HAVE_PG_IDLE(PG_idle,		"idle"		)
>  #define __VM_ARCH_SPECIFIC_1 {VM_ARCH_1,	"arch_1"	}
>  #endif
>  
> -#if defined(CONFIG_X86)
> -#define __VM_ARCH_SPECIFIC_2 {VM_MPX,		"mpx"		}
> -#else
> -#define __VM_ARCH_SPECIFIC_2 {VM_ARCH_2,	"arch_2"	}
> -#endif
> -
>  #ifdef CONFIG_MEM_SOFT_DIRTY
>  #define IF_HAVE_VM_SOFTDIRTY(flag,name) {flag, name },
>  #else
> @@ -162,7 +156,7 @@ IF_HAVE_PG_IDLE(PG_idle,		"idle"		)
>  	{VM_NORESERVE,			"noreserve"	},		\
>  	{VM_HUGETLB,			"hugetlb"	},		\
>  	__VM_ARCH_SPECIFIC_1				,		\
> -	__VM_ARCH_SPECIFIC_2				,		\
> +	{VM_WIPEONFORK,			"wipeonfork"	},		\
>  	{VM_DONTDUMP,			"dontdump"	},		\
>  IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY,	"softdirty"	)		\
>  	{VM_MIXEDMAP,			"mixedmap"	},		\
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index 8c27db0c5c08..49e2b1d78093 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -58,6 +58,9 @@
>  					   overrides the coredump filter bits */
>  #define MADV_DODUMP	17		/* Clear the MADV_DONTDUMP flag */
>  
> +#define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
> +#define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
> +
>  /* compatibility flags */
>  #define MAP_FILE	0
>  
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 17921b0390b4..06376ae4877d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -654,7 +654,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		retval = dup_userfaultfd(tmp, &uf);
>  		if (retval)
>  			goto fail_nomem_anon_vma_fork;
> -		if (anon_vma_fork(tmp, mpnt))
> +		if (tmp->vm_flags & VM_WIPEONFORK) {
> +			/* VM_WIPEONFORK gets a clean slate in the child. */
> +			tmp->anon_vma = NULL;
> +			if (anon_vma_prepare(tmp))
> +				goto fail_nomem_anon_vma_fork;
> +		} else if (anon_vma_fork(tmp, mpnt))
>  			goto fail_nomem_anon_vma_fork;
>  		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>  		tmp->vm_next = tmp->vm_prev = NULL;
> @@ -698,7 +703,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		rb_parent = &tmp->vm_rb;
>  
>  		mm->map_count++;
> -		retval = copy_page_range(mm, oldmm, mpnt);
> +		if (!(tmp->vm_flags & VM_WIPEONFORK))
> +			retval = copy_page_range(mm, oldmm, mpnt);
>  
>  		if (tmp->vm_ops && tmp->vm_ops->open)
>  			tmp->vm_ops->open(tmp);
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 9976852f1e1c..9b82cfa88ccf 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
>  		}
>  		new_flags &= ~VM_DONTCOPY;
>  		break;
> +	case MADV_WIPEONFORK:
> +		/* MADV_WIPEONFORK is only supported on anonymous memory. */
> +		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> +			error = -EINVAL;
> +			goto out;
> +		}
> +		new_flags |= VM_WIPEONFORK;
> +		break;
> +	case MADV_KEEPONFORK:
> +		new_flags &= ~VM_WIPEONFORK;
> +		break;
>  	case MADV_DONTDUMP:
>  		new_flags |= VM_DONTDUMP;
>  		break;
> @@ -689,6 +700,8 @@ madvise_behavior_valid(int behavior)
>  #endif
>  	case MADV_DONTDUMP:
>  	case MADV_DODUMP:
> +	case MADV_WIPEONFORK:
> +	case MADV_KEEPONFORK:
>  #ifdef CONFIG_MEMORY_FAILURE
>  	case MADV_SOFT_OFFLINE:
>  	case MADV_HWPOISON:
> 
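
The semantics described in the quoted changelog can be sketched from
userspace roughly as follows. This is a minimal illustration and not part
of the patch: the helper name wipeonfork_child_sees() is hypothetical, and
the fallback value 18 for MADV_WIPEONFORK is the asm-generic number from
the diff above (parisc uses 71 there). The helper returns -1 when the
running kernel does not accept the new advice value.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18	/* asm-generic value taken from the patch */
#endif

/*
 * Hypothetical helper: mark one private anonymous page MADV_WIPEONFORK,
 * store a marker in it, fork, and return the byte the child reads back.
 * Returns -1 if any step fails, e.g. on a kernel without the new advice.
 */
int wipeonfork_child_sees(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	pid_t pid;
	int status;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	if (madvise(p, (size_t)page, MADV_WIPEONFORK) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	p[0] = 0x5a;	/* parent-only state, e.g. "PRNG is seeded" */

	pid = fork();
	if (pid < 0) {
		munmap(p, (size_t)page);
		return -1;
	}
	if (pid == 0)
		_exit(*(volatile unsigned char *)p);	/* child reports what it reads */

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
		munmap(p, (size_t)page);
		return -1;
	}

	assert(p[0] == 0x5a);	/* the parent's copy is not wiped */
	munmap(p, (size_t)page);
	return WEXITSTATUS(status);
}
```

A library could use the same idea for fork detection: keep a nonzero
"initialized" byte in such a page and treat a zero as "regenerate state",
without relying on getpid() or pthread_atfork().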

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
@ 2017-08-18 18:15             ` Andrew Morton
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2017-08-18 18:15 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, mhocko, mike.kravetz, linux-mm, fweimer, colm,
	keescook, luto, wad, mingo, kirill, dave.hansen, linux-api,
	torvalds, willy

On Fri, 18 Aug 2017 12:28:29 -0400 Rik van Riel <riel@redhat.com> wrote:

> On Thu, 2017-08-17 at 15:50 -0700, Andrew Morton wrote:
> > On Tue, 15 Aug 2017 22:18:19 -0400 Rik van Riel <riel@redhat.com>
> > wrote:
> > 
> > > > > --- a/mm/madvise.c
> > > > > +++ b/mm/madvise.c
> > > > > @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
> > > > >  		}
> > > > >  		new_flags &= ~VM_DONTCOPY;
> > > > >  		break;
> > > > > +	case MADV_WIPEONFORK:
> > > > > +		/* MADV_WIPEONFORK is only supported on anonymous memory. */
> > > > > +		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> > > > > +			error = -EINVAL;
> > > > > +			goto out;
> > > > > +		}
> > > > > +		new_flags |= VM_WIPEONFORK;
> > > > > +		break;
> > > > > +	case MADV_KEEPONFORK:
> > > > > +		new_flags &= ~VM_WIPEONFORK;
> > > > > +		break;
> > > > >  	case MADV_DONTDUMP:
> > > > >  		new_flags |= VM_DONTDUMP;
> > > > >  		break;
> > > > 
> > > > It seems odd to permit MADV_KEEPONFORK against other-than-anon
> > > > vmas?
> > > 
> > > Given that the only way to set VM_WIPEONFORK is through
> > > MADV_WIPEONFORK, calling MADV_KEEPONFORK on an
> > > other-than-anon vma would be equivalent to a noop.
> > > 
> > > If new_flags == vma->vm_flags, madvise_behavior() will
> > > immediately exit.
> > 
> > Yes, but calling MADV_WIPEONFORK against an other-than-anon vma is
> > > presumably a userspace bug.  A bug which will probably result in
> > > userspace having WIPEONFORK memory which it didn't want.  The kernel
> > can trivially tell userspace that it has this bug so why not do so?
> 
> Uh, what?
> 

Braino.  I meant MADV_KEEPONFORK.  Calling MADV_KEEPONFORK against an
other-than-anon vma is a presumptive userspace bug and the kernel
should report that.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK
@ 2017-08-19  0:02               ` Rik van Riel
  0 siblings, 0 replies; 25+ messages in thread
From: Rik van Riel @ 2017-08-19  0:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, mhocko, mike.kravetz, linux-mm, fweimer, colm,
	keescook, luto, wad, mingo, kirill, dave.hansen, linux-api,
	torvalds, willy

On Fri, 2017-08-18 at 11:15 -0700, Andrew Morton wrote:
> On Fri, 18 Aug 2017 12:28:29 -0400 Rik van Riel <riel@redhat.com>
> wrote:
> 
> > On Thu, 2017-08-17 at 15:50 -0700, Andrew Morton wrote:
> > > On Tue, 15 Aug 2017 22:18:19 -0400 Rik van Riel <riel@redhat.com>
> > > wrote:
> > > 
> > > > > > --- a/mm/madvise.c
> > > > > > +++ b/mm/madvise.c
> > > > > > @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
> > > > > >  		}
> > > > > >  		new_flags &= ~VM_DONTCOPY;
> > > > > >  		break;
> > > > > > +	case MADV_WIPEONFORK:
> > > > > > +		/* MADV_WIPEONFORK is only supported on anonymous memory. */
> > > > > > +		if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> > > > > > +			error = -EINVAL;
> > > > > > +			goto out;
> > > > > > +		}
> > > > > > +		new_flags |= VM_WIPEONFORK;
> > > > > > +		break;
> > > > > > +	case MADV_KEEPONFORK:
> > > > > > +		new_flags &= ~VM_WIPEONFORK;
> > > > > > +		break;
> > > > > >  	case MADV_DONTDUMP:
> > > > > >  		new_flags |= VM_DONTDUMP;
> > > > > >  		break;
> > > > > > 
> > > > > > It seems odd to permit MADV_KEEPONFORK against other-than-anon
> > > > > > vmas?
> > > > 
> > > > Given that the only way to set VM_WIPEONFORK is through
> > > > MADV_WIPEONFORK, calling MADV_KEEPONFORK on an
> > > > other-than-anon vma would be equivalent to a noop.
> > > > 
> > > > If new_flags == vma->vm_flags, madvise_behavior() will
> > > > immediately exit.
> > > 
> > > > Yes, but calling MADV_WIPEONFORK against an other-than-anon vma is
> > > > presumably a userspace bug.  A bug which will probably result in
> > > > userspace having WIPEONFORK memory which it didn't want.  The kernel
> > > > can trivially tell userspace that it has this bug so why not do so?
> > 
> > Uh, what?
> > 
> 
> Braino.  I meant MADV_KEEPONFORK.  Calling MADV_KEEPONFORK against an
> other-than-anon vma is a presumptive userspace bug and the kernel
> should report that.

All MADV_KEEPONFORK does is clear the flag set by
MADV_WIPEONFORK. Since there is no way to set the
WIPEONFORK flag on other-than-anon VMAs, that means
MADV_KEEPONFORK is always a noop for those VMAs.

You remind me that I should send in a man page
addition, though, when this code gets sent to
Linus.
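
The two positions in this exchange can be modelled in a few lines of
plain C. This is only a userspace sketch of the flag logic in the quoted
madvise_behavior() hunk, not kernel code: struct model_vma, the MODEL_*
names and the strict_keep switch are hypothetical, and the strict variant
is one possible reading of the suggestion that the kernel report the
other-than-anon case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel definitions used in the hunk. */
#define MODEL_VM_SHARED		0x00000008UL
#define MODEL_VM_WIPEONFORK	0x02000000UL	/* value from the mm.h hunk */
#define MODEL_MADV_WIPEONFORK	18
#define MODEL_MADV_KEEPONFORK	19

struct model_vma {
	bool file_backed;	/* stands in for vma->vm_file != NULL */
	unsigned long vm_flags;
};

/*
 * Model of the two new cases. With strict_keep == false this follows the
 * patch as posted; with strict_keep == true, MADV_KEEPONFORK on an
 * other-than-anon mapping is rejected as well.
 */
int model_madvise(struct model_vma *vma, int behavior, bool strict_keep)
{
	unsigned long new_flags = vma->vm_flags;
	bool anon_private = !vma->file_backed &&
			    !(vma->vm_flags & MODEL_VM_SHARED);

	switch (behavior) {
	case MODEL_MADV_WIPEONFORK:
		if (!anon_private)
			return -EINVAL;
		new_flags |= MODEL_VM_WIPEONFORK;
		break;
	case MODEL_MADV_KEEPONFORK:
		if (strict_keep && !anon_private)
			return -EINVAL;
		new_flags &= ~MODEL_VM_WIPEONFORK;
		break;
	default:
		return -EINVAL;
	}

	vma->vm_flags = new_flags;
	return 0;
}

/* Small walk-through of both variants on a file-backed mapping. */
void model_demo(void)
{
	struct model_vma file = { .file_backed = true, .vm_flags = 0 };

	/* The flag can never be set on an other-than-anon mapping... */
	assert(model_madvise(&file, MODEL_MADV_WIPEONFORK, false) == -EINVAL);
	/* ...so as posted, MADV_KEEPONFORK succeeds and changes nothing. */
	assert(model_madvise(&file, MODEL_MADV_KEEPONFORK, false) == 0);
	assert(file.vm_flags == 0);
	/* The stricter variant reports the call instead. */
	assert(model_madvise(&file, MODEL_MADV_KEEPONFORK, true) == -EINVAL);
}
```

In the as-posted variant new_flags equals vma->vm_flags for such a
mapping, which is the no-op case described above.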

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-08-19  0:02 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-11 21:28 [PATCH v4 0/2] mm,fork,security: introduce MADV_WIPEONFORK riel
2017-08-11 21:28 ` riel
2017-08-11 21:28 ` riel-H+wXaHxf7aLQT0dZR+AlfA
2017-08-11 21:28 ` [PATCH 1/2] x86,mpx: make mpx depend on x86-64 to free up VMA flag riel
2017-08-11 21:28   ` riel
2017-08-11 21:28 ` [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK riel
2017-08-11 21:28   ` riel
2017-08-15 22:51   ` Andrew Morton
2017-08-15 22:51     ` Andrew Morton
2017-08-16  2:18     ` Rik van Riel
2017-08-16  2:18       ` Rik van Riel
2017-08-16  2:18       ` Rik van Riel
2017-08-17 22:50       ` Andrew Morton
2017-08-17 22:50         ` Andrew Morton
2017-08-18 16:28         ` Rik van Riel
2017-08-18 16:28           ` Rik van Riel
2017-08-18 16:28           ` Rik van Riel
2017-08-18 18:15           ` Andrew Morton
2017-08-18 18:15             ` Andrew Morton
2017-08-18 18:15             ` Andrew Morton
2017-08-19  0:02             ` Rik van Riel
2017-08-19  0:02               ` Rik van Riel
2017-08-19  0:02               ` Rik van Riel
2017-08-18 17:25   ` Mike Kravetz
2017-08-18 17:25     ` Mike Kravetz