* [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint
@ 2011-08-15 20:57 ` Will Drewry
  0 siblings, 0 replies; 19+ messages in thread
From: Will Drewry @ 2011-08-15 20:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: mcgrathr, Will Drewry, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Al Viro, Eric Paris, Andrea Arcangeli,
	Mel Gorman, Rik van Riel, Nitin Gupta, Hugh Dickins, Shaohua Li,
	linux-mm

This patch proposes a sysctl knob that allows a privileged user to
disable ~VM_MAYEXEC tainting when mapping in a vma from a MNT_NOEXEC
mountpoint.  It does not alter the normal behavior resulting from
attempting to directly mmap(PROT_EXEC) a vma (-EPERM) nor the behavior
of any other subsystems checking MNT_NOEXEC.

It is motivated by a common /dev/shm, /tmp use case.  There are few
facilities for creating a shared memory segment that can be remapped in
the same process address space with different permissions.  Often, a
file in /tmp provides this functionality.  However, on distributions
that are more restrictive/paranoid, world-writeable directories are
often mounted "noexec".  The only workaround to support software that
needs this behavior is to either not use that software or remount /tmp
exec.  (E.g., https://bugs.gentoo.org/350336?id=350336)  Given that
the only recourse is using SysV IPC, the application programmer loses
many of the useful ABI features that they get using a mmap'd file (and
as such are often hesitant to explore that more painful path).

With this patch, it would be possible to change the sysctl variable
such that mprotect(PROT_EXEC) would succeed.  In cases like the example
above, an additional userspace mmap-wrapper would be needed, but in
other cases, like how code.google.com/p/nativeclient mmap()s then
mprotect()s, the behavior would be unaffected.

The tradeoff is a loss of defense in depth, but it seems reasonable when
the alternative is to disable the defense entirely.

Signed-off-by: Will Drewry <wad@chromium.org>
---
 kernel/sysctl.c |   12 ++++++++++++
 mm/Kconfig      |   17 +++++++++++++++++
 mm/mmap.c       |    4 +++-
 3 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 11d65b5..aa8bcc0 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -89,6 +89,9 @@
 /* External variables not in a header file. */
 extern int sysctl_overcommit_memory;
 extern int sysctl_overcommit_ratio;
+#ifdef CONFIG_MMU
+extern int sysctl_mmap_noexec_taint;
+#endif
 extern int max_threads;
 extern int core_uses_pid;
 extern int suid_dumpable;
@@ -1293,6 +1296,15 @@ static struct ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= mmap_min_addr_handler,
 	},
+	{
+		.procname	= "mmap_noexec_taint",
+		.data		= &sysctl_mmap_noexec_taint,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 #endif
 #ifdef CONFIG_NUMA
 	{
diff --git a/mm/Kconfig b/mm/Kconfig
index f2f1ca1..539dc12 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -256,6 +256,23 @@ config DEFAULT_MMAP_MIN_ADDR
 	  This value can be changed after boot using the
 	  /proc/sys/vm/mmap_min_addr tunable.
 
+config MMAP_NOEXEC_TAINT
+	int "Turns on tainting of mmap()d files from noexec mountpoints"
+	depends on MMU
+	default 1
+	help
+	  By default, the ability to change the protections of a virtual
+	  memory area to allow execution depends on whether the vma has the
+	  VM_MAYEXEC flag.  When mapping regions from files, VM_MAYEXEC
+	  will be unset if the containing mountpoint is mounted MNT_NOEXEC.
+	  By setting the value to 0, any mmap()d region may be later
+	  mprotect()d with PROT_EXEC.
+
+	  If unsure, keep the value set to 1.
+
+	  This value can be changed after boot using the
+	  /proc/sys/vm/mmap_noexec_taint tunable.
+
 config ARCH_SUPPORTS_MEMORY_FAILURE
 	bool
 
diff --git a/mm/mmap.c b/mm/mmap.c
index a65efd4..7aceddd 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -87,6 +87,7 @@ EXPORT_SYMBOL(vm_get_page_prot);
 int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;  /* heuristic overcommit */
 int sysctl_overcommit_ratio __read_mostly = 50;	/* default is 50% */
 int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
+int sysctl_mmap_noexec_taint __read_mostly = CONFIG_MMAP_NOEXEC_TAINT;
 /*
  * Make sure vm_committed_as in one cacheline and not cacheline shared with
  * other variables. It can be updated by several CPUs frequently.
@@ -1039,7 +1040,8 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 			if (file->f_path.mnt->mnt_flags & MNT_NOEXEC) {
 				if (vm_flags & VM_EXEC)
 					return -EPERM;
-				vm_flags &= ~VM_MAYEXEC;
+				if (sysctl_mmap_noexec_taint)
+					vm_flags &= ~VM_MAYEXEC;
 			}
 
 			if (!file->f_op || !file->f_op->mmap)
-- 
1.7.0.4
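
The branch changed in do_mmap_pgoff() can be modelled in userspace to make
the three outcomes explicit.  This is only a sketch: the SK_* constants and
model_noexec_check() are hypothetical names invented for illustration, and
the flag values are not the kernel's.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for VM_EXEC and VM_MAYEXEC (not the kernel values). */
#define SK_VM_EXEC	0x1UL
#define SK_VM_MAYEXEC	0x2UL

/*
 * Userspace model of the patched MNT_NOEXEC branch.
 *
 * mnt_noexec:   non-zero if the file lives on a noexec mount
 * noexec_taint: models the proposed vm.mmap_noexec_taint sysctl
 *
 * Returns 0 and stores the adjusted flags in *out, or -EPERM when an
 * executable mapping is requested directly from a noexec mount.
 */
static int model_noexec_check(unsigned long vm_flags, int mnt_noexec,
			      int noexec_taint, unsigned long *out)
{
	if (mnt_noexec) {
		/* mmap(PROT_EXEC) is refused regardless of the sysctl. */
		if (vm_flags & SK_VM_EXEC)
			return -EPERM;
		/* Only the taint (clearing VM_MAYEXEC) becomes optional. */
		if (noexec_taint)
			vm_flags &= ~SK_VM_MAYEXEC;
	}
	*out = vm_flags;
	return 0;
}
```

With noexec_taint set to 1 the model matches the current behavior; with 0
the mapping keeps SK_VM_MAYEXEC, which is what would let a later
mprotect(PROT_EXEC) succeed.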


* Re: [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint
  2011-08-15 20:57 ` Will Drewry
@ 2011-08-16  9:33   ` Mel Gorman
  -1 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2011-08-16  9:33 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Al Viro, Eric Paris, Andrea Arcangeli,
	Rik van Riel, Nitin Gupta, Hugh Dickins, Shaohua Li, linux-mm

On Mon, Aug 15, 2011 at 03:57:35PM -0500, Will Drewry wrote:
> This patch proposes a sysctl knob that allows a privileged user to
> disable ~VM_MAYEXEC tainting when mapping in a vma from a MNT_NOEXEC
> mountpoint.  It does not alter the normal behavior resulting from
> attempting to directly mmap(PROT_EXEC) a vma (-EPERM) nor the behavior
> of any other subsystems checking MNT_NOEXEC.
> 
> It is motivated by a common /dev/shm, /tmp use case.  There are few
> facilities for creating a shared memory segment that can be remapped in
> the same process address space with different permissions.  Often, a
> file in /tmp provides this functionality.  However, on distributions
> that are more restrictive/paranoid, world-writeable directories are
> often mounted "noexec".  The only workaround to support software that
> needs this behavior is to either not use that software or remount /tmp
> exec.  (E.g., https://bugs.gentoo.org/350336?id=350336) Given that
> the only recourse is using SysV IPC, the application programmer loses
> many of the useful ABI features that they get using a mmap'd file (and
> as such are often hesitant to explore that more painful path).
> 

Is using shm_open()+mmap instead of open()+mmap() to open a file on
/dev/shm really that difficult?

int shm_open(const char *name, int oflag, mode_t mode);
int open(const char *pathname, int flags, mode_t mode);

> With this patch, it would be possible to change the sysctl variable
> such that mprotect(PROT_EXEC) would succeed.

An ordinary user is not going to know that a segfault from an
application can be fixed with this sysctl. This looks like something
that should be fixed in the library so that it can work on kernels
that do not have the sysctl.

-- 
Mel Gorman
SUSE Labs
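
One way a library could react on kernels without the sysctl is to detect the
noexec mount up front and pick a different backing directory.  The following
is a minimal sketch, assuming Linux and statvfs(); mount_is_noexec() is a
hypothetical helper name, not something from the patch or any library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc only exposes ST_NOEXEC with this defined */
#endif
#include <assert.h>
#include <sys/statvfs.h>

/* Fallback for builds where the macro is hidden; 8 is the Linux value. */
#ifndef ST_NOEXEC
#define ST_NOEXEC 8
#endif

/*
 * Returns 1 if the filesystem containing `path` is mounted noexec,
 * 0 if it is not, and -1 if statvfs() fails (errno is left set).
 */
static int mount_is_noexec(const char *path)
{
	struct statvfs sv;

	if (statvfs(path, &sv) != 0)
		return -1;
	return (sv.f_flag & ST_NOEXEC) ? 1 : 0;
}
```

A caller could check mount_is_noexec("/dev/shm") and mount_is_noexec("/tmp")
before choosing where to create a file that it later wants to map executable.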

* Re: [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint
  2011-08-16  9:33   ` Mel Gorman
@ 2011-08-16 17:07     ` Roland McGrath
  -1 siblings, 0 replies; 19+ messages in thread
From: Roland McGrath @ 2011-08-16 17:07 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Will Drewry, linux-kernel, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Al Viro, Eric Paris, Andrea Arcangeli,
	Rik van Riel, Nitin Gupta, Hugh Dickins, Shaohua Li, linux-mm

On Tue, Aug 16, 2011 at 2:33 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> Is using shm_open()+mmap instead of open()+mmap() to open a file on
> /dev/shm really that difficult?
>
> int shm_open(const char *name, int oflag, mode_t mode);
> int open(const char *pathname, int flags, mode_t mode);

I cannot figure out the rationale behind this question at all.
Both of these library functions result in the same system call.

> An ordinary user is not going to know that a segfault from an
> application can be fixed with this sysctl. This looks like something
> that should be fixed in the library so that it can work on kernels
> that do not have the sysctl.

I think the expectation is that the administrator or system builder
who decides to set the (non-default) noexec mount option will also
set the sysctl at the same time.


Thanks,
Roland
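
To illustrate why both functions end up in the same system call, here is a
simplified sketch of a shm_open()-style wrapper.  It assumes the tmpfs is
mounted at /dev/shm and skips the mount-point discovery a real C library
may do; shm_open_sketch() is a hypothetical name, not the libc
implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Prefix the object name with the shared-memory mount point and call
 * open().  The resulting file sits on whatever is mounted at /dev/shm,
 * so that mount's flags (including noexec) apply to later mmap() calls.
 */
static int shm_open_sketch(const char *name, int oflag, mode_t mode)
{
	char path[256];
	int n;

	/* Skip leading slashes; the rest must be a single path component. */
	while (*name == '/')
		name++;
	if (*name == '\0' || strchr(name, '/') != NULL) {
		errno = EINVAL;
		return -1;
	}

	n = snprintf(path, sizeof(path), "/dev/shm/%s", name);
	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	return open(path, oflag | O_NOFOLLOW | O_CLOEXEC, mode);
}
```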

* Re: [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint
  2011-08-16 17:07     ` Roland McGrath
@ 2011-08-16 19:40       ` Mel Gorman
  -1 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2011-08-16 19:40 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Will Drewry, linux-kernel, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Al Viro, Eric Paris, Andrea Arcangeli,
	Rik van Riel, Nitin Gupta, Hugh Dickins, Shaohua Li, linux-mm

On Tue, Aug 16, 2011 at 10:07:46AM -0700, Roland McGrath wrote:
> On Tue, Aug 16, 2011 at 2:33 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> > Is using shm_open()+mmap instead of open()+mmap() to open a file on
> > /dev/shm really that difficult?
> >
> > int shm_open(const char *name, int oflag, mode_t mode);
> > int open(const char *pathname, int flags, mode_t mode);
> 
> I cannot figure out the rationale behind this question at all.
> Both of these library functions result in the same system call.
> 

They might result in the same system call but one of them creates
the file under /dev/shm which should not have the same permissions
problem. The library really appears to want to create a shared
executable object, using shm_open does not appear that unreasonable
to me.

> > An ordinary user is not going to know that a segfault from an
> > application can be fixed with this sysctl. This looks like something
> > that should be fixed in the library so that it can work on kernels
> > that do not have the sysctl.
> 
> I think the expectation is that the administrator or system builder
> who decides to set the (non-default) noexec mount option will also
> set the sysctl at the same time.
> 

Which then needs to be copied in each distro wanting to do the same
thing and is not backwards compatible, whereas using shm_open is.

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint
  2011-08-16 19:40       ` Mel Gorman
@ 2011-08-16 19:46         ` Roland McGrath
  -1 siblings, 0 replies; 19+ messages in thread
From: Roland McGrath @ 2011-08-16 19:46 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Will Drewry, linux-kernel, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Al Viro, Eric Paris, Andrea Arcangeli,
	Rik van Riel, Nitin Gupta, Hugh Dickins, Shaohua Li, linux-mm

On Tue, Aug 16, 2011 at 12:40 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> They might result in the same system call but one of them creates
> the file under /dev/shm which should not have the same permissions
> problem. The library really appears to want to create a shared
> executable object, using shm_open does not appear that unreasonable
> to me.

People do use shm_open.  Some systems mount /dev/shm with noexec.
That's why we're here in the first place.

> Which then needs to be copied in each distro wanting to do the same
> thing and is not backwards compatible, whereas using shm_open is.

Each distro wanting to set noexec on its /dev/shm mounts has to set the
sysctl (or its default in their kernel builds), yes.  Otherwise they are
not compatible with the expectation of using PROT_EXEC on files opened with
shm_open.


Thanks,
Roland

* Re: [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint
  2011-08-16 19:40       ` Mel Gorman
@ 2011-08-16 19:50         ` Will Drewry
  -1 siblings, 0 replies; 19+ messages in thread
From: Will Drewry @ 2011-08-16 19:50 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Roland McGrath, linux-kernel, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Al Viro, Eric Paris, Andrea Arcangeli,
	Rik van Riel, Nitin Gupta, Hugh Dickins, Shaohua Li, linux-mm

On Tue, Aug 16, 2011 at 2:40 PM, Mel Gorman <mel@csn.ul.ie> wrote:
> On Tue, Aug 16, 2011 at 10:07:46AM -0700, Roland McGrath wrote:
>> On Tue, Aug 16, 2011 at 2:33 AM, Mel Gorman <mel@csn.ul.ie> wrote:
>> > Is using shm_open()+mmap instead of open()+mmap() to open a file on
>> > /dev/shm really that difficult?
>> >
>> > int shm_open(const char *name, int oflag, mode_t mode);
>> > int open(const char *pathname, int flags, mode_t mode);
>>
>> I cannot figure out the rationale behind this question at all.
>> Both of these library functions result in the same system call.
>>
>
> They might result in the same system call but one of them creates
> the file under /dev/shm which should not have the same permissions
> problem. The library really appears to want to create a shared
> executable object, using shm_open does not appear that unreasonable
> to me.

If /dev/shm is mounted noexec, the resulting file will have VM_MAYEXEC
stripped.  I don't believe it is capable of doing anything special
that will cause the mmap code path to find a different containing
mountpoint.  If it could, then that would certainly be preferable, but
it would also make this VM_MAYEXEC calculation less effective in the
default case.

thanks!

>> > An ordinary user is not going to know that a segfault from an
>> > application can be fixed with this sysctl. This looks like something
>> > that should be fixed in the library so that it can work on kernels
>> > that do not have the sysctl.
>>
>> I think the expectation is that the administrator or system builder
>> who decides to set the (non-default) noexec mount option will also
>> set the sysctl at the same time.
>>
>
> Which then needs to be copied in each distro wanting to do the same
> thing and is not backwards compatible, whereas using shm_open is.
>
> --
> Mel Gorman
> SUSE Labs
>
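
The mmap()-then-mprotect() sequence under discussion can be probed from
userspace with a small helper.  This is a sketch under stated assumptions:
try_mprotect_exec() and the probe file name are hypothetical, and the
directory to test is supplied by the caller.  On a noexec mount the
mprotect() call is expected to fail with EACCES on current kernels, because
the mapping lacks VM_MAYEXEC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a temporary file in `dir`, map one page of it read/write, then
 * ask for PROT_READ|PROT_EXEC with mprotect().
 *
 * Returns 0 if mprotect() succeeded, the positive errno value if it
 * failed, or -1 if the setup (file creation or mmap) failed.
 */
static int try_mprotect_exec(const char *dir)
{
	char path[4096];
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int fd, n, ret;

	if (page <= 0)
		return -1;
	n = snprintf(path, sizeof(path), "%s/exec-probe-XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the open descriptor keeps the file alive */

	if (ftruncate(fd, page) != 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return -1;

	/* This is the step the proposed sysctl would allow on noexec mounts. */
	ret = mprotect(p, (size_t)page, PROT_READ | PROT_EXEC) ? errno : 0;
	munmap(p, (size_t)page);
	return ret;
}
```

Calling try_mprotect_exec("/dev/shm") and try_mprotect_exec("/tmp") on a
given system shows which of the two, if either, is affected.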

* Re: [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint
  2011-08-16 19:50         ` Will Drewry
@ 2011-08-16 19:50           ` Will Drewry
  -1 siblings, 0 replies; 19+ messages in thread
From: Will Drewry @ 2011-08-16 19:50 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Roland McGrath, linux-kernel, Ingo Molnar, Andrew Morton,
	Peter Zijlstra, Al Viro, Eric Paris, Andrea Arcangeli,
	Rik van Riel, Nitin Gupta, Hugh Dickins, Shaohua Li, linux-mm

On Tue, Aug 16, 2011 at 2:50 PM, Will Drewry <wad@chromium.org> wrote:
> On Tue, Aug 16, 2011 at 2:40 PM, Mel Gorman <mel@csn.ul.ie> wrote:
>> On Tue, Aug 16, 2011 at 10:07:46AM -0700, Roland McGrath wrote:
>>> On Tue, Aug 16, 2011 at 2:33 AM, Mel Gorman <mel@csn.ul.ie> wrote:
>>> > Is using shm_open()+mmap instead of open()+mmap() to open a file on
>>> > /dev/shm really that difficult?
>>> >
>>> > int shm_open(const char *name, int oflag, mode_t mode);
>>> > int open(const char *pathname, int flags, mode_t mode);
>>>
>>> I cannot figure out the rationale behind this question at all.
>>> Both of these library functions result in the same system call.
>>>
>>
>> They might result in the same system call but one of them creates
>> the file under /dev/shm which should not have the same permissions
>> problem. The library really appears to want to create a shared
>> executable object, using shm_open does not appear that unreasonable
>> to me.
>
> If /dev/shm is mounted noexec, the resulting file will have VM_MAYEXEC

Err, I mean the VMA created by mmap()ing the file from shm_open().

> stripped.  I don't believe it is capable of doing anything special
> that will cause the mmap code path to find a different containing
> mountpoint.  If it could, then that would certainly be preferable, but
> it would also make this VM_MAYEXEC calculation less effective in the
> default case.
>
> thanks!
>
>>> > An ordinary user is not going to know that a segfault from an
>>> > application can be fixed with this sysctl. This looks like something
>>> > that should be fixed in the library so that it can work on kernels
>>> > that do not have the sysctl.
>>>
>>> I think the expectation is that the administrator or system builder
>>> who decides to set the (non-default) noexec mount option will also
>>> set the sysctl at the same time.
>>>
>>
>> Which then needs to be copied in each distro wanting to do the same
>> thing and is not backwards compatible, whereas using shm_open is.
>>
>> --
>> Mel Gorman
>> SUSE Labs
>>
>
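
The behaviour discussed above can be probed from userspace. The following is
a minimal sketch, not code from the patch or from any library in this thread:
probe_mprotect_exec() and enum exec_probe are hypothetical names. It creates a
temporary file in the given directory (for example /dev/shm or /tmp), maps it
read-write, and then asks mprotect() for PROT_EXEC. On a noexec mount the VMA
is expected to have VM_MAYEXEC stripped, so mprotect() should fail with
EACCES; note that other mechanisms (an LSM, for instance) may also return
EACCES, so the result is a hint rather than proof.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
enum exec_probe { EXEC_PROBE_OK, EXEC_PROBE_DENIED, EXEC_PROBE_ERROR };

/*
 * Hypothetical helper: reports whether a shared file mapping backed by a
 * file in `dir` can be switched to PROT_EXEC with mprotect().
 */
enum exec_probe probe_mprotect_exec(const char *dir)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/execprobeXXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return EXEC_PROBE_ERROR;

    int fd = mkstemp(path);
    if (fd < 0)
        return EXEC_PROBE_ERROR;
    unlink(path);                       /* the open fd keeps the file alive */

    enum exec_probe result = EXEC_PROBE_ERROR;
    long page = sysconf(_SC_PAGESIZE);

    if (page > 0 && ftruncate(fd, page) == 0) {
        /* Map without PROT_EXEC first; mmap(PROT_EXEC) itself would get
         * EPERM on a noexec mount. */
        void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            if (mprotect(p, (size_t)page, PROT_READ | PROT_EXEC) == 0)
                result = EXEC_PROBE_OK;
            else if (errno == EACCES)
                result = EXEC_PROBE_DENIED;   /* likely VM_MAYEXEC missing */
            munmap(p, (size_t)page);
        }
    }
    close(fd);
    return result;
}
```

A library could call probe_mprotect_exec("/dev/shm") once at start-up and fall
back to another directory or mechanism when it returns EXEC_PROBE_DENIED,
which would work on kernels with or without the proposed sysctl.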

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint
  2011-08-15 20:57 ` Will Drewry
@ 2011-08-16 21:54   ` Andrew Morton
  -1 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2011-08-16 21:54 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-kernel, mcgrathr, Ingo Molnar, Peter Zijlstra, Al Viro,
	Eric Paris, Andrea Arcangeli, Mel Gorman, Rik van Riel,
	Nitin Gupta, Hugh Dickins, Shaohua Li, linux-mm

On Mon, 15 Aug 2011 15:57:35 -0500
Will Drewry <wad@chromium.org> wrote:

> This patch proposes a sysctl knob that allows a privileged user to
> disable ~VM_MAYEXEC tainting when mapping in a vma from a MNT_NOEXEC
> mountpoint.  It does not alter the normal behavior resulting from
> attempting to directly mmap(PROT_EXEC) a vma (-EPERM) nor the behavior
> of any other subsystems checking MNT_NOEXEC.
> 
> It is motivated by a common /dev/shm, /tmp usecase. There are few
> facilities for creating a shared memory segment that can be remapped in
> the same process address space with different permissions.  Often, a
> file in /tmp provides this functionality.  However, on distributions
> that are more restrictive/paranoid, world-writeable directories are
> often mounted "noexec".  The only workaround to support software that
> needs this behavior is to either not use that software or remount /tmp
> exec.

Remounting /tmp would appear to have the same effect as altering this
sysctl, so why not just remount /tmp?

>  (E.g., https://bugs.gentoo.org/350336?id=350336)  Given that
> the only recourse is using SysV IPC, the application programmer loses
> many of the useful ABI features that they get using a mmap'd file (and
> as such are often hesitant to explore that more painful path).
> 
> With this patch, it would be possible to change the sysctl variable
> such that mprotect(PROT_EXEC) would succeed.  In cases like the example
> above, an additional userspace mmap-wrapper would be needed, but in
> other cases, like how code.google.com/p/nativeclient mmap()s then
> mprotect()s, the behavior would be unaffected.
> 
> The tradeoff is a loss of defense in depth, but it seems reasonable when
> the alternative is to disable the defense entirely.
> 
> ...
>
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -89,6 +89,9 @@
>  /* External variables not in a header file. */
>  extern int sysctl_overcommit_memory;
>  extern int sysctl_overcommit_ratio;
> +#ifdef CONFIG_MMU

The ifdef isn't needed in the header and we generally omit it to avoid
clutter.

afaict this feature could be made available on NOMMU systems?

> +extern int sysctl_mmap_noexec_taint;

The term "taint" has a specific meaning in the kernel (see
add_taint()).  It's regrettable that this patch attaches a second
meaning to that term.  Can we think of a better word to use?

A better word would communicate the sense of the sysctl operation.  If
a "taint" flag is set to true, I don't know whether that means that
noexec is enabled or disabled.  Something like
sysctl_mmap_noexec_override or sysctl_mmap_noexec_disable, perhaps.

This patch forgot to document the new feature and its sysctl. 
Documentation/sysctl/vm.txt might be the right place.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint
  2011-08-16 21:54   ` Andrew Morton
@ 2011-08-16 22:35     ` Will Drewry
  -1 siblings, 0 replies; 19+ messages in thread
From: Will Drewry @ 2011-08-16 22:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, mcgrathr, Ingo Molnar, Peter Zijlstra, Al Viro,
	Eric Paris, Andrea Arcangeli, Mel Gorman, Rik van Riel,
	Nitin Gupta, Hugh Dickins, Shaohua Li, linux-mm

On Tue, Aug 16, 2011 at 4:54 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 15 Aug 2011 15:57:35 -0500
> Will Drewry <wad@chromium.org> wrote:
>
>> This patch proposes a sysctl knob that allows a privileged user to
>> disable ~VM_MAYEXEC tainting when mapping in a vma from a MNT_NOEXEC
>> mountpoint.  It does not alter the normal behavior resulting from
>> attempting to directly mmap(PROT_EXEC) a vma (-EPERM) nor the behavior
>> of any other subsystems checking MNT_NOEXEC.
>>
>> It is motivated by a common /dev/shm, /tmp usecase. There are few
>> facilities for creating a shared memory segment that can be remapped in
>> the same process address space with different permissions.  Often, a
>> file in /tmp provides this functionality.  However, on distributions
>> that are more restrictive/paranoid, world-writeable directories are
>> often mounted "noexec".  The only workaround to support software that
>> needs this behavior is to either not use that software or remount /tmp
>> exec.
>
> Remounting /tmp would appear to have the same effect as altering this
> sysctl, so why not just remount /tmp?

The main difference is that you still achieve the primary goals of
noexec without the secondary:
1. exec still fails
2. mmap(PROT_EXEC) still fails

This means that with a common gnu-ish userspace, it's not possible to
execute an arbitrary binary in /tmp or use it as a preload or dlopen()
source.  It's like half-noexec.
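
The "half-noexec" behaviour can be modelled in userspace. This is only a
sketch under stated assumptions: the patch body is elided in this thread, so
the variable mmap_noexec_taint, its sense (1 = keep stripping VM_MAYEXEC), and
the helper names model_mmap_flags() and model_mprotect_exec() are all
hypothetical. The flag values are copied from the kernel headers so that the
example builds outside the kernel.

```c
#include <errno.h>

/* Values as in the kernel headers, copied here for a userspace build. */
#define VM_EXEC     0x00000004UL
#define VM_MAYEXEC  0x00000040UL
#define MNT_NOEXEC  0x04

/* Hypothetical stand-in for the proposed sysctl.
 * Assumed sense: 1 = strip VM_MAYEXEC on noexec mounts (current behaviour),
 *                0 = leave VM_MAYEXEC alone. */
int mmap_noexec_taint = 1;

/* Models the mmap-time check for a file on a mount with `mnt_flags`.
 * Returns 0 and may update *vm_flags, or -EPERM for a direct PROT_EXEC map. */
int model_mmap_flags(int mnt_flags, unsigned long *vm_flags)
{
    if (mnt_flags & MNT_NOEXEC) {
        if (*vm_flags & VM_EXEC)
            return -EPERM;              /* unchanged by the sysctl */
        if (mmap_noexec_taint)
            *vm_flags &= ~VM_MAYEXEC;   /* the part the sysctl would gate */
    }
    return 0;
}

/* Models mprotect(PROT_EXEC): it needs VM_MAYEXEC on the VMA. */
int model_mprotect_exec(unsigned long vm_flags)
{
    return (vm_flags & VM_MAYEXEC) ? 0 : -EACCES;
}
```

With mmap_noexec_taint left at 1, a later model_mprotect_exec() fails with
-EACCES; with it set to 0, the mprotect step succeeds while a direct
mmap(PROT_EXEC) on the noexec mount still returns -EPERM.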

>>  (E.g., https://bugs.gentoo.org/350336?id=350336)  Given that
>> the only recourse is using SysV IPC, the application programmer loses
>> many of the useful ABI features that they get using a mmap'd file (and
>> as such are often hesitant to explore that more painful path).
>>
>> With this patch, it would be possible to change the sysctl variable
>> such that mprotect(PROT_EXEC) would succeed.  In cases like the example
>> above, an additional userspace mmap-wrapper would be needed, but in
>> other cases, like how code.google.com/p/nativeclient mmap()s then
>> mprotect()s, the behavior would be unaffected.
>>
>> The tradeoff is a loss of defense in depth, but it seems reasonable when
>> the alternative is to disable the defense entirely.
>>
>> ...
>>
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -89,6 +89,9 @@
>>  /* External variables not in a header file. */
>>  extern int sysctl_overcommit_memory;
>>  extern int sysctl_overcommit_ratio;
>> +#ifdef CONFIG_MMU
>
> The ifdef isn't needed in the header and we generally omit it to avoid
> clutter.

Thanks - I'll remove it!

> afaict this feature could be made available on NOMMU systems?

When I poked around I didn't see VM_MAYEXEC being used in NOMMU
systems, but I may have just been misreading!  I'll take another look.

>> +extern int sysctl_mmap_noexec_taint;
>
> The term "taint" has a specific meaning in the kernel (see
> add_taint()).  It's regrettable that this patch attaches a second
> meaning to that term.  Can we think of a better word to use?
>
> A better word would communicate the sense of the sysctl operation.  If
> a "taint" flag is set to true, I don't know whether that means that
> noexec is enabled or disabled.  Something like
> sysctl_mmap_noexec_override or sysctl_mmap_noexec_disable, perhaps.

Thanks for the good points and suggestions.  Maybe something like
  sysctl_mprotect_ignores_noexec
would reflect this more closely, though still not quite as accurately
as your examples.
(hrm, maybe sysctl_mmap_noexec_propagates)

> This patch forgot to document the new feature and its sysctl.
> Documentation/sysctl/vm.txt might be the right place.

I will add that along with the changes from your other comments.

Thanks!
will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint
  2011-08-16 17:07     ` Roland McGrath
  (?)
  (?)
@ 2011-08-17 23:22     ` Valdis.Kletnieks
  -1 siblings, 0 replies; 19+ messages in thread
From: Valdis.Kletnieks @ 2011-08-17 23:22 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Mel Gorman, Will Drewry, linux-kernel, Ingo Molnar,
	Andrew Morton, Peter Zijlstra, Al Viro, Eric Paris,
	Andrea Arcangeli, Rik van Riel, Nitin Gupta, Hugh Dickins,
	Shaohua Li, linux-mm

On Tue, 16 Aug 2011 10:07:46 PDT, Roland McGrath said:

> I think the expectation is that the administrator or system builder
> who decides to set the (non-default) noexec mount option will also
> set the sysctl at the same time.

On the other hand, a design that requires 2 separate actions to be taken in
order to make it work, and which fails unsafe if the second step isn't taken,
is a bad design. If we're talking "expectations", let's not forget that the
mount option is called "noexec", not "only-really-noexec-if-you-set-a-magic-sysctl". 

I'll also point out that we didn't add a sysctl in 2.6.0 to say whether or not
to still allow the old "/lib/ld-linux.so your-binary-here" hack to execute binaries
off a partition mounted noexec - we simply said "this will no longer be permitted".

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2011-08-17 23:23 UTC | newest]

Thread overview: 19+ messages
2011-08-15 20:57 [PATCH] mmap: add sysctl for controlling ~VM_MAYEXEC taint Will Drewry
2011-08-15 20:57 ` Will Drewry
2011-08-16  9:33 ` Mel Gorman
2011-08-16  9:33   ` Mel Gorman
2011-08-16 17:07   ` Roland McGrath
2011-08-16 17:07     ` Roland McGrath
2011-08-16 19:40     ` Mel Gorman
2011-08-16 19:40       ` Mel Gorman
2011-08-16 19:46       ` Roland McGrath
2011-08-16 19:46         ` Roland McGrath
2011-08-16 19:50       ` Will Drewry
2011-08-16 19:50         ` Will Drewry
2011-08-16 19:50         ` Will Drewry
2011-08-16 19:50           ` Will Drewry
2011-08-17 23:22     ` Valdis.Kletnieks
2011-08-16 21:54 ` Andrew Morton
2011-08-16 21:54   ` Andrew Morton
2011-08-16 22:35   ` Will Drewry
2011-08-16 22:35     ` Will Drewry
