linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] mm, sysctl: Add sysctl for controlling VM_MAYEXEC taint
@ 2016-08-26 16:30 robert.foss
  2016-08-26 17:17 ` kbuild test robot
  2016-08-26 21:32 ` Kirill A. Shutemov
  0 siblings, 2 replies; 4+ messages in thread
From: robert.foss @ 2016-08-26 16:30 UTC (permalink / raw)
  To: akpm, kirill.shutemov, vbabka, mhocko, mingo, dave.hansen,
	hannes, dan.j.williams, iamjoonsoo.kim, acme, keescook, mgorman,
	atomlin, hughd, dyoung, viro, dcashman, w, idryomov, yang.shi,
	wad, vkuznets, vdavydov, vitalywool, oleg, gang.chen.5i5j,
	koct9i, aarcange, aryabinin, kuleshovmail, minchan, mguzik,
	linux-mm, linux-kernel, krasin, Roland McGrath,
	Mandeep Singh Baines, Ben Zhang, Filipe Brandenburger
  Cc: Robert Foss

From: Will Drewry <wad@chromium.org>

This patch proposes a sysctl knob that allows a privileged user to
disable ~VM_MAYEXEC tainting when mapping in a vma from a MNT_NOEXEC
mountpoint.  It does not alter the normal behavior resulting from
attempting to directly mmap(PROT_EXEC) a vma (-EPERM) nor the behavior
of any other subsystems checking MNT_NOEXEC.

It is motivated by a common /dev/shm, /tmp usecase. There are few
facilities for creating a shared memory segment that can be remapped in
the same process address space with different permissions.  Often, a
file in /tmp provides this functionality.  However, on distributions
that are more restrictive/paranoid, world-writeable directories are
often mounted "noexec".  The only workaround to support software that
needs this behavior is to either not use that software or remount /tmp
exec.  (E.g., https://bugs.gentoo.org/350336?id=350336)  Given that
the only recourse is using SysV IPC, the application programmer loses
many of the useful ABI features that they get using a mmap'd file.

With this patch, it would be possible to change the sysctl variable
such that mprotect(PROT_EXEC) would succeed.  In cases like the example
above, an additional userspace mmap-wrapper would be needed, but in
other cases, like how code.google.com/p/nativeclient mmap()s then
mprotect()s, the behavior would be unaffected.

The tradeoff is a loss of defense in depth, but it seems reasonable when
the alternative is frequently to disable the defense entirely.

(There are many other ways to approach this problem, but this seemed to
 be the most practical and feel the least like a hack or a major change.)

Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Tested-by: Robert Foss <robert.foss@collabora.com>
---
 include/linux/mm.h |  2 ++
 kernel/sysctl.c    |  9 +++++++++
 mm/Kconfig         | 17 +++++++++++++++++
 mm/mmap.c          |  3 ++-
 mm/util.c          |  1 +
 5 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 08ed53e..e2090c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -108,6 +108,8 @@ extern int mmap_rnd_compat_bits __read_mostly;
 
 extern int sysctl_max_map_count;
 
+extern int sysctl_mmap_noexec_taint;
+
 extern unsigned long sysctl_user_reserve_kbytes;
 extern unsigned long sysctl_admin_reserve_kbytes;
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b43d0b2..ab1d714 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1564,6 +1564,15 @@ static struct ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= mmap_min_addr_handler,
 	},
+	{
+		.procname	= "mmap_noexec_taint",
+		.data		= &sysctl_mmap_noexec_taint,
+		.maxlen		= sizeof(sysctl_mmap_noexec_taint),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 #endif
 #ifdef CONFIG_NUMA
 	{
diff --git a/mm/Kconfig b/mm/Kconfig
index 78a23c5..08d9bc8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -353,6 +353,23 @@ config DEFAULT_MMAP_MIN_ADDR
 	  This value can be changed after boot using the
 	  /proc/sys/vm/mmap_min_addr tunable.
 
+config MMAP_NOEXEC_TAINT
+	int "Turns on tainting of mmap()d files from noexec mountpoints"
+	depends on MMU
+	default 1
+	help
+	  By default, the ability to change the protections of a virtual
+	  memory area to allow execution depend on if the vma has the
+	  VM_MAYEXEC flag.  When mapping regions from files, VM_MAYEXEC
+	  will be unset if the containing mountpoint is mounted MNT_NOEXEC.
+	  By setting the value to 0, any mmap()d region may be later
+	  mprotect()d with PROT_EXEC.
+
+	  If unsure, keep the value set to 1.
+
+	  This value can be changed after boot using the
+	  /proc/sys/vm/mmap_noexec_taint tunable.
+
 config ARCH_SUPPORTS_MEMORY_FAILURE
 	bool
 
diff --git a/mm/mmap.c b/mm/mmap.c
index ca9d91b..b8be093 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1246,7 +1246,8 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			if (path_noexec(&file->f_path)) {
 				if (vm_flags & VM_EXEC)
 					return -EPERM;
-				vm_flags &= ~VM_MAYEXEC;
+				if (sysctl_mmap_noexec_taint)
+					vm_flags &= ~VM_MAYEXEC;
 			}
 
 			if (!file->f_op->mmap)
diff --git a/mm/util.c b/mm/util.c
index 662cddf..701f0a3 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -430,6 +430,7 @@ int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
 int sysctl_overcommit_ratio __read_mostly = 50;
 unsigned long sysctl_overcommit_kbytes __read_mostly;
 int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
+int sysctl_mmap_noexec_taint __read_mostly = CONFIG_MMAP_NOEXEC_TAINT;
 unsigned long sysctl_user_reserve_kbytes __read_mostly = 1UL << 17; /* 128MB */
 unsigned long sysctl_admin_reserve_kbytes __read_mostly = 1UL << 13; /* 8MB */
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v1] mm, sysctl: Add sysctl for controlling VM_MAYEXEC taint
  2016-08-26 16:30 [PATCH v1] mm, sysctl: Add sysctl for controlling VM_MAYEXEC taint robert.foss
@ 2016-08-26 17:17 ` kbuild test robot
  2016-08-26 21:32 ` Kirill A. Shutemov
  1 sibling, 0 replies; 4+ messages in thread
From: kbuild test robot @ 2016-08-26 17:17 UTC (permalink / raw)
  To: robert.foss
  Cc: kbuild-all, akpm, kirill.shutemov, vbabka, mhocko, mingo,
	dave.hansen, hannes, dan.j.williams, iamjoonsoo.kim, acme,
	keescook, mgorman, atomlin, hughd, dyoung, viro, dcashman, w,
	idryomov, yang.shi, wad, vkuznets, vdavydov, vitalywool, oleg,
	gang.chen.5i5j, koct9i, aarcange, aryabinin, kuleshovmail,
	minchan, mguzik, linux-mm, linux-kernel, krasin, Roland McGrath,
	Mandeep Singh Baines, Ben Zhang, Filipe Brandenburger,
	Robert Foss

[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]

Hi Will,

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.8-rc3 next-20160825]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/robert-foss-collabora-com/mm-sysctl-Add-sysctl-for-controlling-VM_MAYEXEC-taint/20160827-003416
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: sh-rsk7269_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All errors (new ones prefixed by >>):

>> mm/util.c:433:46: error: 'CONFIG_MMAP_NOEXEC_TAINT' undeclared here (not in a function)
    int sysctl_mmap_noexec_taint __read_mostly = CONFIG_MMAP_NOEXEC_TAINT;
                                                 ^

vim +/CONFIG_MMAP_NOEXEC_TAINT +433 mm/util.c

   427	EXPORT_SYMBOL_GPL(__page_mapcount);
   428	
   429	int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
   430	int sysctl_overcommit_ratio __read_mostly = 50;
   431	unsigned long sysctl_overcommit_kbytes __read_mostly;
   432	int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
 > 433	int sysctl_mmap_noexec_taint __read_mostly = CONFIG_MMAP_NOEXEC_TAINT;
   434	unsigned long sysctl_user_reserve_kbytes __read_mostly = 1UL << 17; /* 128MB */
   435	unsigned long sysctl_admin_reserve_kbytes __read_mostly = 1UL << 13; /* 8MB */
   436	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 10558 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v1] mm, sysctl: Add sysctl for controlling VM_MAYEXEC taint
  2016-08-26 16:30 [PATCH v1] mm, sysctl: Add sysctl for controlling VM_MAYEXEC taint robert.foss
  2016-08-26 17:17 ` kbuild test robot
@ 2016-08-26 21:32 ` Kirill A. Shutemov
       [not found]   ` <CAAFS_9HiuMt=Xy=YXmvw0+kqcXw=8qXTx2-2bXaqPc_rjtRZgw@mail.gmail.com>
  1 sibling, 1 reply; 4+ messages in thread
From: Kirill A. Shutemov @ 2016-08-26 21:32 UTC (permalink / raw)
  To: robert.foss
  Cc: akpm, kirill.shutemov, vbabka, mhocko, mingo, dave.hansen,
	hannes, dan.j.williams, iamjoonsoo.kim, acme, keescook, mgorman,
	atomlin, hughd, dyoung, viro, dcashman, w, idryomov, yang.shi,
	wad, vkuznets, vdavydov, vitalywool, oleg, gang.chen.5i5j,
	koct9i, aarcange, aryabinin, kuleshovmail, minchan, mguzik,
	linux-mm, linux-kernel, krasin, Roland McGrath,
	Mandeep Singh Baines, Ben Zhang, Filipe Brandenburger

On Fri, Aug 26, 2016 at 12:30:04PM -0400, robert.foss@collabora.com wrote:
> From: Will Drewry <wad@chromium.org>
> 
> This patch proposes a sysctl knob that allows a privileged user to
> disable ~VM_MAYEXEC tainting when mapping in a vma from a MNT_NOEXEC
> mountpoint.  It does not alter the normal behavior resulting from
> attempting to directly mmap(PROT_EXEC) a vma (-EPERM) nor the behavior
> of any other subsystems checking MNT_NOEXEC.

Wouldn't it be equal to remounting all filesystems without noexec from
attacker POV? It's hardly a fence to make additional mprotect(PROT_EXEC)
call, before starting executing code from such filesystems.

If administrator of the system wants this, he can just mount filesystem
without noexec, no new kernel code required. And it's more fine-grained
than this.

So, no, I don't think we should add knob like this. Unless I miss
something.

NAK.

> It is motivated by a common /dev/shm, /tmp usecase. There are few
> facilities for creating a shared memory segment that can be remapped in
> the same process address space with different permissions.

What about using memfd_create(2) for such cases? You'll get a file
descriptor from in-kernel tmpfs (shm_mnt) which is not exposed to
userspace for remount as noexec.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v1] mm, sysctl: Add sysctl for controlling VM_MAYEXEC taint
       [not found]   ` <CAAFS_9HiuMt=Xy=YXmvw0+kqcXw=8qXTx2-2bXaqPc_rjtRZgw@mail.gmail.com>
@ 2016-08-29 15:31     ` Robert Foss
  0 siblings, 0 replies; 4+ messages in thread
From: Robert Foss @ 2016-08-29 15:31 UTC (permalink / raw)
  To: Will Drewry, Kirill A. Shutemov
  Cc: Andrew Morton, kirill.shutemov, vbabka, mhocko, mingo,
	dave.hansen, hannes, dan.j.williams, iamjoonsoo.kim, acme,
	Kees Cook, mgorman, atomlin, Hugh Dickins, dyoung, Al Viro,
	Daniel Cashman, w, idryomov, yang.shi, vkuznets, vdavydov,
	vitalywool, oleg, gang.chen.5i5j, koct9i, aarcange, aryabinin,
	kuleshovmail, minchan, mguzik, linux-mm, linux-kernel,
	Ivan Krasin, Roland McGrath, Mandeep Singh Baines, Ben Zhang,
	Filipe Brandenburger



On 2016-08-29 11:25 AM, Will Drewry wrote:
>
>
> On Fri, Aug 26, 2016 at 4:32 PM, Kirill A. Shutemov
> <kirill@shutemov.name <mailto:kirill@shutemov.name>> wrote:
>
>     On Fri, Aug 26, 2016 at 12:30:04PM -0400, robert.foss@collabora.com
>     <mailto:robert.foss@collabora.com> wrote:
>     > From: Will Drewry <wad@chromium.org <mailto:wad@chromium.org>>
>     >
>     > This patch proposes a sysctl knob that allows a privileged user to
>     > disable ~VM_MAYEXEC tainting when mapping in a vma from a MNT_NOEXEC
>     > mountpoint.  It does not alter the normal behavior resulting from
>     > attempting to directly mmap(PROT_EXEC) a vma (-EPERM) nor the behavior
>     > of any other subsystems checking MNT_NOEXEC.
>
>     Wouldn't it be equal to remounting all filesystems without noexec from
>     attacker POV? It's hardly a fence to make additional mprotect(PROT_EXEC)
>     call, before starting executing code from such filesystems.
>
>     If administrator of the system wants this, he can just mount filesystem
>     without noexec, no new kernel code required. And it's more fine-grained
>     than this.
>
>     So, no, I don't think we should add knob like this. Unless I miss
>     something.
>
>
> I don't believe this patch is necessary anymore (though, thank you
> Robert for testing and re-sending!).
>
> The primary offenders wrt to needing to mmap/mprotect a file in /dev/shm
> was the older nvidia
> driver (binary only iirc) and the Chrome Native Client code.
>
> The reason why half-exec is an "ok" (half) mitigation is because it
> blocks simple gadgets and other paths for using loadable libraries or
> binaries (via glibc) as it disallows mmap(PROT_EXEC) even though it
> allows mprotect(PROT_EXEC).  This stops ld in its tracks since it does
> the obvious thing and uses mmap(PROT_EXEC).
>
> I think time has marched on and this patch is now something I can toss
> in the dustbin of history. Both Chrome's Native Client and an older
> nvidia driver relied on creating-then-unlinking a file in tmpfs, but
> there is now a better facility!
>
>
>     NAK.
>
>
> Agreed - this is old and software that predicated it should be gone.. I
> hope. :)

Splendid, patch dropped!
Thanks Will and Kirill!


Rob.

>
>
>
>     > It is motivated by a common /dev/shm, /tmp usecase. There are few
>     > facilities for creating a shared memory segment that can be remapped in
>     > the same process address space with different permissions.
>
>     What about using memfd_create(2) for such cases? You'll get a file
>     descriptor from in-kernel tmpfs (shm_mnt) which is not exposed to
>     userspace for remount as noexec.
>
>
> This is a relatively old patch ( https://lwn.net/Articles/455256/
> <https://lwn.net/Articles/455256/> ) which predated memfd_create().
>  memfd_create() is the right solution to this problem!
>
>
> Thanks again!
> will

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-08-29 15:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-26 16:30 [PATCH v1] mm, sysctl: Add sysctl for controlling VM_MAYEXEC taint robert.foss
2016-08-26 17:17 ` kbuild test robot
2016-08-26 21:32 ` Kirill A. Shutemov
     [not found]   ` <CAAFS_9HiuMt=Xy=YXmvw0+kqcXw=8qXTx2-2bXaqPc_rjtRZgw@mail.gmail.com>
2016-08-29 15:31     ` Robert Foss

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).