netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
@ 2014-06-26 12:55 Piotr Wilczek
  2014-06-26 12:55 ` [PATCH net-next V2 1/2] lib:proc_info:add library to get process information Piotr Wilczek
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Piotr Wilczek @ 2014-06-26 12:55 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Kyungmin Park, Juho Son, Bartlomiej Zolnierkiewicz,
	Jan Kaluza, Piotr Wilczek

Server-like processes in many cases need credentials and other
metadata of the peer, to decide if the calling process is allowed to
request a specific action, or the server just wants to log away this
type of information for auditing tasks.

The current practice to retrieve such process metadata is to look that
information up in procfs with the $PID received over SCM_CREDENTIALS.
This is sufficient for long-running tasks, but introduces a race which
cannot be worked around for short-living processes; the calling
process and all the information in /proc/$PID/ is gone before the
receiver of the socket message can look it up.

Changes introduced in this patchset can also increase performance
of such server-like processes, because current way of opening and
parsing /proc/$PID/* files is much more expensive than receiving these
metadata using SCM.

As an example, this patch set improves systemd-journald performance
by about 20%. Generally, performance improvement depends on how heavily
procfs is read the calling process.
http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467

This patch set is split in two patches:
- the first adds library to retrive process information without
dependency on procfs.
- the second introduces a new SCM type called SCM_PROCINFO to optionally
allow the direct attaching of process status to SCM.

This patchset is reworked version of:
https://lkml.org/lkml/2014/1/13/42

Changes for v2:
 - new patch with new proc info library
 - removed extern declaration
 - removed duplication code for parsing process information

Piotr Wilczek (2):
  lib:proc_info:add library to get process information
  Send process status in SCM_PROCINFO

 arch/alpha/include/uapi/asm/socket.h   |   2 +
 arch/avr32/include/uapi/asm/socket.h   |   2 +
 arch/cris/include/uapi/asm/socket.h    |   2 +
 arch/frv/include/uapi/asm/socket.h     |   2 +
 arch/ia64/include/uapi/asm/socket.h    |   2 +
 arch/m32r/include/uapi/asm/socket.h    |   2 +
 arch/mips/include/uapi/asm/socket.h    |   2 +
 arch/mn10300/include/uapi/asm/socket.h |   2 +
 arch/parisc/include/uapi/asm/socket.h  |   2 +
 arch/powerpc/include/uapi/asm/socket.h |   2 +
 arch/s390/include/uapi/asm/socket.h    |   2 +
 arch/sparc/include/uapi/asm/socket.h   |   2 +
 fs/proc/Kconfig                        |   1 +
 fs/proc/array.c                        | 476 +----------------------
 fs/proc/task_mmu.c                     |  66 ----
 fs/proc/task_nommu.c                   | 114 ------
 include/linux/net.h                    |   1 +
 include/linux/proc_info.h              |  35 ++
 include/linux/socket.h                 |   2 +
 include/net/af_unix.h                  |   9 +
 include/net/scm.h                      |  33 ++
 include/uapi/asm-generic/socket.h      |   2 +
 lib/Kconfig                            |   5 +
 lib/Makefile                           |   2 +
 lib/proc_info.c                        | 686 +++++++++++++++++++++++++++++++++
 net/Kconfig                            |   1 +
 net/core/scm.c                         | 166 ++++++++
 net/core/sock.c                        |  11 +
 net/unix/af_unix.c                     |  70 ++++
 29 files changed, 1057 insertions(+), 647 deletions(-)
 create mode 100644 include/linux/proc_info.h
 create mode 100644 lib/proc_info.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 1/2] lib:proc_info:add library to get process information
  2014-06-26 12:55 [PATCH net-next V2 0/2] send process status in SCM_PROCINFO Piotr Wilczek
@ 2014-06-26 12:55 ` Piotr Wilczek
  2014-06-26 12:55 ` [PATCH net-next V2 2/2] Send process status in SCM_PROCINFO Piotr Wilczek
  2014-07-01 23:31 ` [PATCH net-next V2 0/2] send " David Miller
  2 siblings, 0 replies; 18+ messages in thread
From: Piotr Wilczek @ 2014-06-26 12:55 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Kyungmin Park, Juho Son, Bartlomiej Zolnierkiewicz,
	Jan Kaluza, Piotr Wilczek

This patch adds library to get process information without dependency
on CONFIG_PROC_FS. The new library code is moved from
fs/proc/array.c and fs/proc/task_(no)mmu.c. The functions are modified that
all process information can be retrived with a single process lock improving
performance.

Signed-off-by: Piotr Wilczek <p.wilczek@samsung.com>
---
Changes for v2:
 - new patch with new proc info library

 fs/proc/Kconfig           |   1 +
 fs/proc/array.c           | 476 +-------------------------------
 fs/proc/task_mmu.c        |  66 -----
 fs/proc/task_nommu.c      | 114 --------
 include/linux/proc_info.h |  35 +++
 lib/Kconfig               |   5 +
 lib/Makefile              |   2 +
 lib/proc_info.c           | 686 ++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 738 insertions(+), 647 deletions(-)
 create mode 100644 include/linux/proc_info.h
 create mode 100644 lib/proc_info.c

diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 2183fcf..2e27c48 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -1,5 +1,6 @@
 config PROC_FS
 	bool "/proc file system support" if EXPERT
+	select PROC_INFO
 	default y
 	help
 	  This is a virtual file system providing information about the status
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 64db2bc..8be396d 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -72,481 +72,39 @@
 #include <linux/signal.h>
 #include <linux/highmem.h>
 #include <linux/file.h>
-#include <linux/fdtable.h>
 #include <linux/times.h>
 #include <linux/cpuset.h>
 #include <linux/rcupdate.h>
 #include <linux/delayacct.h>
-#include <linux/seq_file.h>
-#include <linux/pid_namespace.h>
 #include <linux/ptrace.h>
 #include <linux/tracehook.h>
-#include <linux/user_namespace.h>
+#include <linux/proc_info.h>
 
 #include <asm/pgtable.h>
 #include <asm/processor.h>
 #include "internal.h"
 
-static inline void task_name(struct seq_file *m, struct task_struct *p)
-{
-	int i;
-	char *buf, *end;
-	char *name;
-	char tcomm[sizeof(p->comm)];
-
-	get_task_comm(tcomm, p);
-
-	seq_puts(m, "Name:\t");
-	end = m->buf + m->size;
-	buf = m->buf + m->count;
-	name = tcomm;
-	i = sizeof(tcomm);
-	while (i && (buf < end)) {
-		unsigned char c = *name;
-		name++;
-		i--;
-		*buf = c;
-		if (!c)
-			break;
-		if (c == '\\') {
-			buf++;
-			if (buf < end)
-				*buf++ = c;
-			continue;
-		}
-		if (c == '\n') {
-			*buf++ = '\\';
-			if (buf < end)
-				*buf++ = 'n';
-			continue;
-		}
-		buf++;
-	}
-	m->count = buf - m->buf;
-	seq_putc(m, '\n');
-}
-
-/*
- * The task state array is a strange "bitmap" of
- * reasons to sleep. Thus "running" is zero, and
- * you can test for combinations of others with
- * simple bit tests.
- */
-static const char * const task_state_array[] = {
-	"R (running)",		/*   0 */
-	"S (sleeping)",		/*   1 */
-	"D (disk sleep)",	/*   2 */
-	"T (stopped)",		/*   4 */
-	"t (tracing stop)",	/*   8 */
-	"X (dead)",		/*  16 */
-	"Z (zombie)",		/*  32 */
-};
-
-static inline const char *get_task_state(struct task_struct *tsk)
-{
-	unsigned int state = (tsk->state | tsk->exit_state) & TASK_REPORT;
-
-	BUILD_BUG_ON(1 + ilog2(TASK_REPORT) != ARRAY_SIZE(task_state_array)-1);
-
-	return task_state_array[fls(state)];
-}
-
-static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
-				struct pid *pid, struct task_struct *p)
-{
-	struct user_namespace *user_ns = seq_user_ns(m);
-	struct group_info *group_info;
-	int g;
-	struct fdtable *fdt = NULL;
-	const struct cred *cred;
-	pid_t ppid, tpid;
-
-	rcu_read_lock();
-	ppid = pid_alive(p) ?
-		task_tgid_nr_ns(rcu_dereference(p->real_parent), ns) : 0;
-	tpid = 0;
-	if (pid_alive(p)) {
-		struct task_struct *tracer = ptrace_parent(p);
-		if (tracer)
-			tpid = task_pid_nr_ns(tracer, ns);
-	}
-	cred = get_task_cred(p);
-	seq_printf(m,
-		"State:\t%s\n"
-		"Tgid:\t%d\n"
-		"Ngid:\t%d\n"
-		"Pid:\t%d\n"
-		"PPid:\t%d\n"
-		"TracerPid:\t%d\n"
-		"Uid:\t%d\t%d\t%d\t%d\n"
-		"Gid:\t%d\t%d\t%d\t%d\n",
-		get_task_state(p),
-		task_tgid_nr_ns(p, ns),
-		task_numa_group_id(p),
-		pid_nr_ns(pid, ns),
-		ppid, tpid,
-		from_kuid_munged(user_ns, cred->uid),
-		from_kuid_munged(user_ns, cred->euid),
-		from_kuid_munged(user_ns, cred->suid),
-		from_kuid_munged(user_ns, cred->fsuid),
-		from_kgid_munged(user_ns, cred->gid),
-		from_kgid_munged(user_ns, cred->egid),
-		from_kgid_munged(user_ns, cred->sgid),
-		from_kgid_munged(user_ns, cred->fsgid));
-
-	task_lock(p);
-	if (p->files)
-		fdt = files_fdtable(p->files);
-	seq_printf(m,
-		"FDSize:\t%d\n"
-		"Groups:\t",
-		fdt ? fdt->max_fds : 0);
-	rcu_read_unlock();
-
-	group_info = cred->group_info;
-	task_unlock(p);
-
-	for (g = 0; g < group_info->ngroups; g++)
-		seq_printf(m, "%d ",
-			   from_kgid_munged(user_ns, GROUP_AT(group_info, g)));
-	put_cred(cred);
-
-	seq_putc(m, '\n');
-}
-
-void render_sigset_t(struct seq_file *m, const char *header,
-				sigset_t *set)
-{
-	int i;
-
-	seq_puts(m, header);
-
-	i = _NSIG;
-	do {
-		int x = 0;
-
-		i -= 4;
-		if (sigismember(set, i+1)) x |= 1;
-		if (sigismember(set, i+2)) x |= 2;
-		if (sigismember(set, i+3)) x |= 4;
-		if (sigismember(set, i+4)) x |= 8;
-		seq_printf(m, "%x", x);
-	} while (i >= 4);
-
-	seq_putc(m, '\n');
-}
-
-static void collect_sigign_sigcatch(struct task_struct *p, sigset_t *ign,
-				    sigset_t *catch)
-{
-	struct k_sigaction *k;
-	int i;
-
-	k = p->sighand->action;
-	for (i = 1; i <= _NSIG; ++i, ++k) {
-		if (k->sa.sa_handler == SIG_IGN)
-			sigaddset(ign, i);
-		else if (k->sa.sa_handler != SIG_DFL)
-			sigaddset(catch, i);
-	}
-}
-
-static inline void task_sig(struct seq_file *m, struct task_struct *p)
-{
-	unsigned long flags;
-	sigset_t pending, shpending, blocked, ignored, caught;
-	int num_threads = 0;
-	unsigned long qsize = 0;
-	unsigned long qlim = 0;
-
-	sigemptyset(&pending);
-	sigemptyset(&shpending);
-	sigemptyset(&blocked);
-	sigemptyset(&ignored);
-	sigemptyset(&caught);
-
-	if (lock_task_sighand(p, &flags)) {
-		pending = p->pending.signal;
-		shpending = p->signal->shared_pending.signal;
-		blocked = p->blocked;
-		collect_sigign_sigcatch(p, &ignored, &caught);
-		num_threads = get_nr_threads(p);
-		rcu_read_lock();  /* FIXME: is this correct? */
-		qsize = atomic_read(&__task_cred(p)->user->sigpending);
-		rcu_read_unlock();
-		qlim = task_rlimit(p, RLIMIT_SIGPENDING);
-		unlock_task_sighand(p, &flags);
-	}
-
-	seq_printf(m, "Threads:\t%d\n", num_threads);
-	seq_printf(m, "SigQ:\t%lu/%lu\n", qsize, qlim);
-
-	/* render them all */
-	render_sigset_t(m, "SigPnd:\t", &pending);
-	render_sigset_t(m, "ShdPnd:\t", &shpending);
-	render_sigset_t(m, "SigBlk:\t", &blocked);
-	render_sigset_t(m, "SigIgn:\t", &ignored);
-	render_sigset_t(m, "SigCgt:\t", &caught);
-}
-
-static void render_cap_t(struct seq_file *m, const char *header,
-			kernel_cap_t *a)
-{
-	unsigned __capi;
-
-	seq_puts(m, header);
-	CAP_FOR_EACH_U32(__capi) {
-		seq_printf(m, "%08x",
-			   a->cap[(_KERNEL_CAPABILITY_U32S-1) - __capi]);
-	}
-	seq_putc(m, '\n');
-}
-
-/* Remove non-existent capabilities */
-#define NORM_CAPS(v) (v.cap[CAP_TO_INDEX(CAP_LAST_CAP)] &= \
-				CAP_TO_MASK(CAP_LAST_CAP + 1) - 1)
-
-static inline void task_cap(struct seq_file *m, struct task_struct *p)
-{
-	const struct cred *cred;
-	kernel_cap_t cap_inheritable, cap_permitted, cap_effective, cap_bset;
-
-	rcu_read_lock();
-	cred = __task_cred(p);
-	cap_inheritable	= cred->cap_inheritable;
-	cap_permitted	= cred->cap_permitted;
-	cap_effective	= cred->cap_effective;
-	cap_bset	= cred->cap_bset;
-	rcu_read_unlock();
-
-	NORM_CAPS(cap_inheritable);
-	NORM_CAPS(cap_permitted);
-	NORM_CAPS(cap_effective);
-	NORM_CAPS(cap_bset);
-
-	render_cap_t(m, "CapInh:\t", &cap_inheritable);
-	render_cap_t(m, "CapPrm:\t", &cap_permitted);
-	render_cap_t(m, "CapEff:\t", &cap_effective);
-	render_cap_t(m, "CapBnd:\t", &cap_bset);
-}
-
-static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
-{
-#ifdef CONFIG_SECCOMP
-	seq_printf(m, "Seccomp:\t%d\n", p->seccomp.mode);
-#endif
-}
-
-static inline void task_context_switch_counts(struct seq_file *m,
-						struct task_struct *p)
-{
-	seq_printf(m,	"voluntary_ctxt_switches:\t%lu\n"
-			"nonvoluntary_ctxt_switches:\t%lu\n",
-			p->nvcsw,
-			p->nivcsw);
-}
-
-static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
-{
-	seq_puts(m, "Cpus_allowed:\t");
-	seq_cpumask(m, &task->cpus_allowed);
-	seq_putc(m, '\n');
-	seq_puts(m, "Cpus_allowed_list:\t");
-	seq_cpumask_list(m, &task->cpus_allowed);
-	seq_putc(m, '\n');
-}
-
 int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task)
 {
 	struct mm_struct *mm = get_task_mm(task);
 
-	task_name(m, task);
-	task_state(m, ns, pid, task);
-
-	if (mm) {
-		task_mem(m, mm);
+	proc_pid_status_mm(m, ns, pid, task, mm);
+	if (mm)
 		mmput(mm);
-	}
-	task_sig(m, task);
-	task_cap(m, task);
-	task_seccomp(m, task);
-	task_cpus_allowed(m, task);
-	cpuset_task_status_allowed(m, task);
-	task_context_switch_counts(m, task);
+
 	return 0;
 }
 
 static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task, int whole)
 {
-	unsigned long vsize, eip, esp, wchan = ~0UL;
-	int priority, nice;
-	int tty_pgrp = -1, tty_nr = 0;
-	sigset_t sigign, sigcatch;
-	char state;
-	pid_t ppid = 0, pgid = -1, sid = -1;
-	int num_threads = 0;
-	int permitted;
-	struct mm_struct *mm;
-	unsigned long long start_time;
-	unsigned long cmin_flt = 0, cmaj_flt = 0;
-	unsigned long  min_flt = 0,  maj_flt = 0;
-	cputime_t cutime, cstime, utime, stime;
-	cputime_t cgtime, gtime;
-	unsigned long rsslim = 0;
-	char tcomm[sizeof(task->comm)];
-	unsigned long flags;
-
-	state = *get_task_state(task);
-	vsize = eip = esp = 0;
-	permitted = ptrace_may_access(task, PTRACE_MODE_READ | PTRACE_MODE_NOAUDIT);
-	mm = get_task_mm(task);
-	if (mm) {
-		vsize = task_vsize(mm);
-		if (permitted) {
-			eip = KSTK_EIP(task);
-			esp = KSTK_ESP(task);
-		}
-	}
-
-	get_task_comm(tcomm, task);
-
-	sigemptyset(&sigign);
-	sigemptyset(&sigcatch);
-	cutime = cstime = utime = stime = 0;
-	cgtime = gtime = 0;
-
-	if (lock_task_sighand(task, &flags)) {
-		struct signal_struct *sig = task->signal;
-
-		if (sig->tty) {
-			struct pid *pgrp = tty_get_pgrp(sig->tty);
-			tty_pgrp = pid_nr_ns(pgrp, ns);
-			put_pid(pgrp);
-			tty_nr = new_encode_dev(tty_devnum(sig->tty));
-		}
-
-		num_threads = get_nr_threads(task);
-		collect_sigign_sigcatch(task, &sigign, &sigcatch);
-
-		cmin_flt = sig->cmin_flt;
-		cmaj_flt = sig->cmaj_flt;
-		cutime = sig->cutime;
-		cstime = sig->cstime;
-		cgtime = sig->cgtime;
-		rsslim = ACCESS_ONCE(sig->rlim[RLIMIT_RSS].rlim_cur);
-
-		/* add up live thread stats at the group level */
-		if (whole) {
-			struct task_struct *t = task;
-			do {
-				min_flt += t->min_flt;
-				maj_flt += t->maj_flt;
-				gtime += task_gtime(t);
-			} while_each_thread(task, t);
-
-			min_flt += sig->min_flt;
-			maj_flt += sig->maj_flt;
-			thread_group_cputime_adjusted(task, &utime, &stime);
-			gtime += sig->gtime;
-		}
-
-		sid = task_session_nr_ns(task, ns);
-		ppid = task_tgid_nr_ns(task->real_parent, ns);
-		pgid = task_pgrp_nr_ns(task, ns);
-
-		unlock_task_sighand(task, &flags);
-	}
-
-	if (permitted && (!whole || num_threads < 2))
-		wchan = get_wchan(task);
-	if (!whole) {
-		min_flt = task->min_flt;
-		maj_flt = task->maj_flt;
-		task_cputime_adjusted(task, &utime, &stime);
-		gtime = task_gtime(task);
-	}
+	struct mm_struct *mm = get_task_mm(task);
 
-	/* scale priority and nice values from timeslices to -20..20 */
-	/* to make it look like a "normal" Unix priority/nice value  */
-	priority = task_prio(task);
-	nice = task_nice(task);
-
-	/* Temporary variable needed for gcc-2.96 */
-	/* convert timespec -> nsec*/
-	start_time =
-		(unsigned long long)task->real_start_time.tv_sec * NSEC_PER_SEC
-				+ task->real_start_time.tv_nsec;
-	/* convert nsec -> ticks */
-	start_time = nsec_to_clock_t(start_time);
-
-	seq_printf(m, "%d (%s) %c", pid_nr_ns(pid, ns), tcomm, state);
-	seq_put_decimal_ll(m, ' ', ppid);
-	seq_put_decimal_ll(m, ' ', pgid);
-	seq_put_decimal_ll(m, ' ', sid);
-	seq_put_decimal_ll(m, ' ', tty_nr);
-	seq_put_decimal_ll(m, ' ', tty_pgrp);
-	seq_put_decimal_ull(m, ' ', task->flags);
-	seq_put_decimal_ull(m, ' ', min_flt);
-	seq_put_decimal_ull(m, ' ', cmin_flt);
-	seq_put_decimal_ull(m, ' ', maj_flt);
-	seq_put_decimal_ull(m, ' ', cmaj_flt);
-	seq_put_decimal_ull(m, ' ', cputime_to_clock_t(utime));
-	seq_put_decimal_ull(m, ' ', cputime_to_clock_t(stime));
-	seq_put_decimal_ll(m, ' ', cputime_to_clock_t(cutime));
-	seq_put_decimal_ll(m, ' ', cputime_to_clock_t(cstime));
-	seq_put_decimal_ll(m, ' ', priority);
-	seq_put_decimal_ll(m, ' ', nice);
-	seq_put_decimal_ll(m, ' ', num_threads);
-	seq_put_decimal_ull(m, ' ', 0);
-	seq_put_decimal_ull(m, ' ', start_time);
-	seq_put_decimal_ull(m, ' ', vsize);
-	seq_put_decimal_ull(m, ' ', mm ? get_mm_rss(mm) : 0);
-	seq_put_decimal_ull(m, ' ', rsslim);
-	seq_put_decimal_ull(m, ' ', mm ? (permitted ? mm->start_code : 1) : 0);
-	seq_put_decimal_ull(m, ' ', mm ? (permitted ? mm->end_code : 1) : 0);
-	seq_put_decimal_ull(m, ' ', (permitted && mm) ? mm->start_stack : 0);
-	seq_put_decimal_ull(m, ' ', esp);
-	seq_put_decimal_ull(m, ' ', eip);
-	/* The signal information here is obsolete.
-	 * It must be decimal for Linux 2.0 compatibility.
-	 * Use /proc/#/status for real-time signals.
-	 */
-	seq_put_decimal_ull(m, ' ', task->pending.signal.sig[0] & 0x7fffffffUL);
-	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
-	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
-	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
-	seq_put_decimal_ull(m, ' ', wchan);
-	seq_put_decimal_ull(m, ' ', 0);
-	seq_put_decimal_ull(m, ' ', 0);
-	seq_put_decimal_ll(m, ' ', task->exit_signal);
-	seq_put_decimal_ll(m, ' ', task_cpu(task));
-	seq_put_decimal_ull(m, ' ', task->rt_priority);
-	seq_put_decimal_ull(m, ' ', task->policy);
-	seq_put_decimal_ull(m, ' ', delayacct_blkio_ticks(task));
-	seq_put_decimal_ull(m, ' ', cputime_to_clock_t(gtime));
-	seq_put_decimal_ll(m, ' ', cputime_to_clock_t(cgtime));
-
-	if (mm && permitted) {
-		seq_put_decimal_ull(m, ' ', mm->start_data);
-		seq_put_decimal_ull(m, ' ', mm->end_data);
-		seq_put_decimal_ull(m, ' ', mm->start_brk);
-		seq_put_decimal_ull(m, ' ', mm->arg_start);
-		seq_put_decimal_ull(m, ' ', mm->arg_end);
-		seq_put_decimal_ull(m, ' ', mm->env_start);
-		seq_put_decimal_ull(m, ' ', mm->env_end);
-	} else
-		seq_printf(m, " 0 0 0 0 0 0 0");
-
-	if (permitted)
-		seq_put_decimal_ll(m, ' ', task->exit_code);
-	else
-		seq_put_decimal_ll(m, ' ', 0);
-
-	seq_putc(m, '\n');
+	task_stat_mm(m, ns, pid, task, whole, mm);
 	if (mm)
 		mmput(mm);
+
 	return 0;
 }
 
@@ -565,27 +123,11 @@ int proc_tgid_stat(struct seq_file *m, struct pid_namespace *ns,
 int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task)
 {
-	unsigned long size = 0, resident = 0, shared = 0, text = 0, data = 0;
 	struct mm_struct *mm = get_task_mm(task);
 
-	if (mm) {
-		size = task_statm(mm, &shared, &text, &data, &resident);
+	proc_pid_statm_mm(m, ns, pid, task, mm);
+	if (mm)
 		mmput(mm);
-	}
-	/*
-	 * For quick read, open code by putting numbers directly
-	 * expected format is
-	 * seq_printf(m, "%lu %lu %lu %lu 0 %lu 0\n",
-	 *               size, resident, shared, text, data);
-	 */
-	seq_put_decimal_ull(m, 0, size);
-	seq_put_decimal_ull(m, ' ', resident);
-	seq_put_decimal_ull(m, ' ', shared);
-	seq_put_decimal_ull(m, ' ', text);
-	seq_put_decimal_ull(m, ' ', 0);
-	seq_put_decimal_ull(m, ' ', data);
-	seq_put_decimal_ull(m, ' ', 0);
-	seq_putc(m, '\n');
 
 	return 0;
 }
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfa63ee..8e88002 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -19,72 +19,6 @@
 #include <asm/tlbflush.h>
 #include "internal.h"
 
-void task_mem(struct seq_file *m, struct mm_struct *mm)
-{
-	unsigned long data, text, lib, swap;
-	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;
-
-	/*
-	 * Note: to minimize their overhead, mm maintains hiwater_vm and
-	 * hiwater_rss only when about to *lower* total_vm or rss.  Any
-	 * collector of these hiwater stats must therefore get total_vm
-	 * and rss too, which will usually be the higher.  Barriers? not
-	 * worth the effort, such snapshots can always be inconsistent.
-	 */
-	hiwater_vm = total_vm = mm->total_vm;
-	if (hiwater_vm < mm->hiwater_vm)
-		hiwater_vm = mm->hiwater_vm;
-	hiwater_rss = total_rss = get_mm_rss(mm);
-	if (hiwater_rss < mm->hiwater_rss)
-		hiwater_rss = mm->hiwater_rss;
-
-	data = mm->total_vm - mm->shared_vm - mm->stack_vm;
-	text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10;
-	lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
-	swap = get_mm_counter(mm, MM_SWAPENTS);
-	seq_printf(m,
-		"VmPeak:\t%8lu kB\n"
-		"VmSize:\t%8lu kB\n"
-		"VmLck:\t%8lu kB\n"
-		"VmPin:\t%8lu kB\n"
-		"VmHWM:\t%8lu kB\n"
-		"VmRSS:\t%8lu kB\n"
-		"VmData:\t%8lu kB\n"
-		"VmStk:\t%8lu kB\n"
-		"VmExe:\t%8lu kB\n"
-		"VmLib:\t%8lu kB\n"
-		"VmPTE:\t%8lu kB\n"
-		"VmSwap:\t%8lu kB\n",
-		hiwater_vm << (PAGE_SHIFT-10),
-		total_vm << (PAGE_SHIFT-10),
-		mm->locked_vm << (PAGE_SHIFT-10),
-		mm->pinned_vm << (PAGE_SHIFT-10),
-		hiwater_rss << (PAGE_SHIFT-10),
-		total_rss << (PAGE_SHIFT-10),
-		data << (PAGE_SHIFT-10),
-		mm->stack_vm << (PAGE_SHIFT-10), text, lib,
-		(PTRS_PER_PTE * sizeof(pte_t) *
-		 atomic_long_read(&mm->nr_ptes)) >> 10,
-		swap << (PAGE_SHIFT-10));
-}
-
-unsigned long task_vsize(struct mm_struct *mm)
-{
-	return PAGE_SIZE * mm->total_vm;
-}
-
-unsigned long task_statm(struct mm_struct *mm,
-			 unsigned long *shared, unsigned long *text,
-			 unsigned long *data, unsigned long *resident)
-{
-	*shared = get_mm_counter(mm, MM_FILEPAGES);
-	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
-								>> PAGE_SHIFT;
-	*data = mm->total_vm - mm->shared_vm;
-	*resident = *shared + get_mm_counter(mm, MM_ANONPAGES);
-	return mm->total_vm;
-}
-
 #ifdef CONFIG_NUMA
 /*
  * These functions are for numa_maps but called in generic **maps seq_file
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 678455d..ab50d9f 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -10,120 +10,6 @@
 #include "internal.h"
 
 /*
- * Logic: we've got two memory sums for each process, "shared", and
- * "non-shared". Shared memory may get counted more than once, for
- * each process that owns it. Non-shared memory is counted
- * accurately.
- */
-void task_mem(struct seq_file *m, struct mm_struct *mm)
-{
-	struct vm_area_struct *vma;
-	struct vm_region *region;
-	struct rb_node *p;
-	unsigned long bytes = 0, sbytes = 0, slack = 0, size;
-        
-	down_read(&mm->mmap_sem);
-	for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
-		vma = rb_entry(p, struct vm_area_struct, vm_rb);
-
-		bytes += kobjsize(vma);
-
-		region = vma->vm_region;
-		if (region) {
-			size = kobjsize(region);
-			size += region->vm_end - region->vm_start;
-		} else {
-			size = vma->vm_end - vma->vm_start;
-		}
-
-		if (atomic_read(&mm->mm_count) > 1 ||
-		    vma->vm_flags & VM_MAYSHARE) {
-			sbytes += size;
-		} else {
-			bytes += size;
-			if (region)
-				slack = region->vm_end - vma->vm_end;
-		}
-	}
-
-	if (atomic_read(&mm->mm_count) > 1)
-		sbytes += kobjsize(mm);
-	else
-		bytes += kobjsize(mm);
-	
-	if (current->fs && current->fs->users > 1)
-		sbytes += kobjsize(current->fs);
-	else
-		bytes += kobjsize(current->fs);
-
-	if (current->files && atomic_read(&current->files->count) > 1)
-		sbytes += kobjsize(current->files);
-	else
-		bytes += kobjsize(current->files);
-
-	if (current->sighand && atomic_read(&current->sighand->count) > 1)
-		sbytes += kobjsize(current->sighand);
-	else
-		bytes += kobjsize(current->sighand);
-
-	bytes += kobjsize(current); /* includes kernel stack */
-
-	seq_printf(m,
-		"Mem:\t%8lu bytes\n"
-		"Slack:\t%8lu bytes\n"
-		"Shared:\t%8lu bytes\n",
-		bytes, slack, sbytes);
-
-	up_read(&mm->mmap_sem);
-}
-
-unsigned long task_vsize(struct mm_struct *mm)
-{
-	struct vm_area_struct *vma;
-	struct rb_node *p;
-	unsigned long vsize = 0;
-
-	down_read(&mm->mmap_sem);
-	for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
-		vma = rb_entry(p, struct vm_area_struct, vm_rb);
-		vsize += vma->vm_end - vma->vm_start;
-	}
-	up_read(&mm->mmap_sem);
-	return vsize;
-}
-
-unsigned long task_statm(struct mm_struct *mm,
-			 unsigned long *shared, unsigned long *text,
-			 unsigned long *data, unsigned long *resident)
-{
-	struct vm_area_struct *vma;
-	struct vm_region *region;
-	struct rb_node *p;
-	unsigned long size = kobjsize(mm);
-
-	down_read(&mm->mmap_sem);
-	for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
-		vma = rb_entry(p, struct vm_area_struct, vm_rb);
-		size += kobjsize(vma);
-		region = vma->vm_region;
-		if (region) {
-			size += kobjsize(region);
-			size += region->vm_end - region->vm_start;
-		}
-	}
-
-	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
-		>> PAGE_SHIFT;
-	*data = (PAGE_ALIGN(mm->start_stack) - (mm->start_data & PAGE_MASK))
-		>> PAGE_SHIFT;
-	up_read(&mm->mmap_sem);
-	size >>= PAGE_SHIFT;
-	size += *text + *data;
-	*resident = size;
-	return size;
-}
-
-/*
  * display a single VMA to a sequenced file
  */
 static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma,
diff --git a/include/linux/proc_info.h b/include/linux/proc_info.h
new file mode 100644
index 0000000..b083499
--- /dev/null
+++ b/include/linux/proc_info.h
@@ -0,0 +1,35 @@
+#ifndef _LINUX_PROC_INFO_H
+#define _LINUX_PROC_INFO_H
+
+#include <linux/types.h>
+#include <linux/fdtable.h>
+#include <linux/seq_file.h>
+#include <linux/user_namespace.h>
+#include <linux/ptrace.h>
+
+void task_mem(struct seq_file *m, struct mm_struct *mm);
+unsigned long task_vsize(struct mm_struct *mm);
+unsigned long task_statm(struct mm_struct *mm,
+			 unsigned long *shared, unsigned long *text,
+			 unsigned long *data, unsigned long *resident);
+void task_name(struct seq_file *m, struct task_struct *p);
+const char *get_task_state(struct task_struct *tsk);
+void task_state(struct seq_file *m, struct pid_namespace *ns,
+		struct pid *pid, struct task_struct *p);
+void task_sig(struct seq_file *m, struct task_struct *p);
+void task_cap(struct seq_file *m, struct task_struct *p);
+void task_seccomp(struct seq_file *m, struct task_struct *p);
+void task_context_switch_counts(struct seq_file *m, struct task_struct *p);
+void task_cpus_allowed(struct seq_file *m, struct task_struct *task);
+void render_sigset_t(struct seq_file *m, const char *header, sigset_t *set);
+int proc_pid_status_mm(struct seq_file *m, struct pid_namespace *ns,
+		       struct pid *pid, struct task_struct *task,
+		       struct mm_struct *mm);
+int task_stat_mm(struct seq_file *m, struct pid_namespace *ns,
+		 struct pid *pid, struct task_struct *task, int whole,
+		 struct mm_struct *mm);
+int proc_pid_statm_mm(struct seq_file *m, struct pid_namespace *ns,
+		      struct pid *pid, struct task_struct *task,
+		      struct mm_struct *mm);
+
+#endif /* _LINUX_PROC_INFO_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index 334f772..037acda 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -474,4 +474,9 @@ config UCS2_STRING
 
 source "lib/fonts/Kconfig"
 
+config PROC_INFO
+	bool
+	help
+	  This option provides process information
+
 endmenu
diff --git a/lib/Makefile b/lib/Makefile
index ba967a1..f2a492c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -40,6 +40,8 @@ CFLAGS_kobject.o += -DDEBUG
 CFLAGS_kobject_uevent.o += -DDEBUG
 endif
 
+obj-$(CONFIG_PROC_INFO) += proc_info.o
+
 obj-$(CONFIG_GENERIC_IOMAP) += iomap.o
 obj-$(CONFIG_GENERIC_PCI_IOMAP) += pci_iomap.o
 obj-$(CONFIG_HAS_IOMEM) += iomap_copy.o devres.o
diff --git a/lib/proc_info.c b/lib/proc_info.c
new file mode 100644
index 0000000..f5f86a0
--- /dev/null
+++ b/lib/proc_info.c
@@ -0,0 +1,686 @@
+/*
+ * lib/proc_info.c
+ *
+ * The code is copied from fs/proc/array.c
+ *
+ * Helper functions used for process information.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/cpuset.h>
+#include <linux/tty.h>
+#include <linux/taskstats.h>
+#include <linux/delayacct.h>
+#include <linux/proc_info.h>
+
+#ifdef CONFIG_MMU
+void task_mem(struct seq_file *m, struct mm_struct *mm)
+{
+	unsigned long data, text, lib, swap;
+	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;
+
+	/*
+	 * Note: to minimize their overhead, mm maintains hiwater_vm and
+	 * hiwater_rss only when about to *lower* total_vm or rss.  Any
+	 * collector of these hiwater stats must therefore get total_vm
+	 * and rss too, which will usually be the higher.  Barriers? not
+	 * worth the effort, such snapshots can always be inconsistent.
+	 */
+	hiwater_vm = total_vm = mm->total_vm;
+	if (hiwater_vm < mm->hiwater_vm)
+		hiwater_vm = mm->hiwater_vm;
+	hiwater_rss = total_rss = get_mm_rss(mm);
+	if (hiwater_rss < mm->hiwater_rss)
+		hiwater_rss = mm->hiwater_rss;
+
+	data = mm->total_vm - mm->shared_vm - mm->stack_vm;
+	text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10;
+	lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
+	swap = get_mm_counter(mm, MM_SWAPENTS);
+	seq_printf(m,
+		"VmPeak:\t%8lu kB\n"
+		"VmSize:\t%8lu kB\n"
+		"VmLck:\t%8lu kB\n"
+		"VmPin:\t%8lu kB\n"
+		"VmHWM:\t%8lu kB\n"
+		"VmRSS:\t%8lu kB\n"
+		"VmData:\t%8lu kB\n"
+		"VmStk:\t%8lu kB\n"
+		"VmExe:\t%8lu kB\n"
+		"VmLib:\t%8lu kB\n"
+		"VmPTE:\t%8lu kB\n"
+		"VmSwap:\t%8lu kB\n",
+		hiwater_vm << (PAGE_SHIFT-10),
+		total_vm << (PAGE_SHIFT-10),
+		mm->locked_vm << (PAGE_SHIFT-10),
+		mm->pinned_vm << (PAGE_SHIFT-10),
+		hiwater_rss << (PAGE_SHIFT-10),
+		total_rss << (PAGE_SHIFT-10),
+		data << (PAGE_SHIFT-10),
+		mm->stack_vm << (PAGE_SHIFT-10), text, lib,
+		(PTRS_PER_PTE * sizeof(pte_t) *
+		 atomic_long_read(&mm->nr_ptes)) >> 10,
+		swap << (PAGE_SHIFT-10));
+}
+
+unsigned long task_vsize(struct mm_struct *mm)
+{
+	return PAGE_SIZE * mm->total_vm;
+}
+
+unsigned long task_statm(struct mm_struct *mm,
+			 unsigned long *shared, unsigned long *text,
+			 unsigned long *data, unsigned long *resident)
+{
+	*shared = get_mm_counter(mm, MM_FILEPAGES);
+	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
+								>> PAGE_SHIFT;
+	*data = mm->total_vm - mm->shared_vm;
+	*resident = *shared + get_mm_counter(mm, MM_ANONPAGES);
+	return mm->total_vm;
+}
+#else
+/*
+ * Logic: we've got two memory sums for each process, "shared", and
+ * "non-shared". Shared memory may get counted more than once, for
+ * each process that owns it. Non-shared memory is counted
+ * accurately.
+ */
+void task_mem(struct seq_file *m, struct mm_struct *mm)
+{
+	struct vm_area_struct *vma;
+	struct vm_region *region;
+	struct rb_node *p;
+	unsigned long bytes = 0, sbytes = 0, slack = 0, size;
+
+	down_read(&mm->mmap_sem);
+	for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
+		vma = rb_entry(p, struct vm_area_struct, vm_rb);
+
+		bytes += kobjsize(vma);
+
+		region = vma->vm_region;
+		if (region) {
+			size = kobjsize(region);
+			size += region->vm_end - region->vm_start;
+		} else {
+			size = vma->vm_end - vma->vm_start;
+		}
+
+		if (atomic_read(&mm->mm_count) > 1 ||
+		    vma->vm_flags & VM_MAYSHARE) {
+			sbytes += size;
+		} else {
+			bytes += size;
+			if (region)
+				slack = region->vm_end - vma->vm_end;
+		}
+	}
+
+	if (atomic_read(&mm->mm_count) > 1)
+		sbytes += kobjsize(mm);
+	else
+		bytes += kobjsize(mm);
+
+	if (current->fs && current->fs->users > 1)
+		sbytes += kobjsize(current->fs);
+	else
+		bytes += kobjsize(current->fs);
+
+	if (current->files && atomic_read(&current->files->count) > 1)
+		sbytes += kobjsize(current->files);
+	else
+		bytes += kobjsize(current->files);
+
+	if (current->sighand && atomic_read(&current->sighand->count) > 1)
+		sbytes += kobjsize(current->sighand);
+	else
+		bytes += kobjsize(current->sighand);
+
+	bytes += kobjsize(current); /* includes kernel stack */
+
+	seq_printf(m,
+		"Mem:\t%8lu bytes\n"
+		"Slack:\t%8lu bytes\n"
+		"Shared:\t%8lu bytes\n",
+		bytes, slack, sbytes);
+
+	up_read(&mm->mmap_sem);
+}
+
+unsigned long task_vsize(struct mm_struct *mm)
+{
+	struct vm_area_struct *vma;
+	struct rb_node *p;
+	unsigned long vsize = 0;
+
+	down_read(&mm->mmap_sem);
+	for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
+		vma = rb_entry(p, struct vm_area_struct, vm_rb);
+		vsize += vma->vm_end - vma->vm_start;
+	}
+	up_read(&mm->mmap_sem);
+	return vsize;
+}
+
+unsigned long task_statm(struct mm_struct *mm,
+			 unsigned long *shared, unsigned long *text,
+			 unsigned long *data, unsigned long *resident)
+{
+	struct vm_area_struct *vma;
+	struct vm_region *region;
+	struct rb_node *p;
+	unsigned long size = kobjsize(mm);
+
+	down_read(&mm->mmap_sem);
+	for (p = rb_first(&mm->mm_rb); p; p = rb_next(p)) {
+		vma = rb_entry(p, struct vm_area_struct, vm_rb);
+		size += kobjsize(vma);
+		region = vma->vm_region;
+		if (region) {
+			size += kobjsize(region);
+			size += region->vm_end - region->vm_start;
+		}
+	}
+
+	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
+		>> PAGE_SHIFT;
+	*data = (PAGE_ALIGN(mm->start_stack) - (mm->start_data & PAGE_MASK))
+		>> PAGE_SHIFT;
+	up_read(&mm->mmap_sem);
+	size >>= PAGE_SHIFT;
+	size += *text + *data;
+	*resident = size;
+	return size;
+}
+#endif
+
+void task_name(struct seq_file *m, struct task_struct *p)
+{
+	int i;
+	char *buf, *end;
+	char *name;
+	char tcomm[sizeof(p->comm)];
+
+	get_task_comm(tcomm, p);
+
+	seq_puts(m, "Name:\t");
+	end = m->buf + m->size;
+	buf = m->buf + m->count;
+	name = tcomm;
+	i = sizeof(tcomm);
+	while (i && (buf < end)) {
+		unsigned char c = *name;
+
+		name++;
+		i--;
+		*buf = c;
+		if (!c)
+			break;
+		if (c == '\\') {
+			buf++;
+			if (buf < end)
+				*buf++ = c;
+			continue;
+		}
+		if (c == '\n') {
+			*buf++ = '\\';
+			if (buf < end)
+				*buf++ = 'n';
+			continue;
+		}
+		buf++;
+	}
+	m->count = buf - m->buf;
+	seq_putc(m, '\n');
+}
+
+/*
+ * The task state array is a strange "bitmap" of
+ * reasons to sleep. Thus "running" is zero, and
+ * you can test for combinations of others with
+ * simple bit tests.
+ */
+static const char * const task_state_array[] = {
+	"R (running)",		/*   0 */
+	"S (sleeping)",		/*   1 */
+	"D (disk sleep)",	/*   2 */
+	"T (stopped)",		/*   4 */
+	"t (tracing stop)",	/*   8 */
+	"X (dead)",		/*  16 */
+	"Z (zombie)",		/*  32 */
+};
+
+const char *get_task_state(struct task_struct *tsk)
+{
+	unsigned int state = (tsk->state | tsk->exit_state) & TASK_REPORT;
+
+	BUILD_BUG_ON(1 + ilog2(TASK_REPORT) != ARRAY_SIZE(task_state_array)-1);
+
+	return task_state_array[fls(state)];
+}
+
+void task_state(struct seq_file *m, struct pid_namespace *ns,
+		struct pid *pid, struct task_struct *p)
+{
+	struct user_namespace *user_ns = seq_user_ns(m);
+	struct group_info *group_info;
+	int g;
+	struct fdtable *fdt = NULL;
+	const struct cred *cred;
+	pid_t ppid, tpid;
+
+	rcu_read_lock();
+	ppid = pid_alive(p) ?
+		task_tgid_nr_ns(rcu_dereference(p->real_parent), ns) : 0;
+	tpid = 0;
+	if (pid_alive(p)) {
+		struct task_struct *tracer = ptrace_parent(p);
+
+		if (tracer)
+			tpid = task_pid_nr_ns(tracer, ns);
+	}
+	cred = get_task_cred(p);
+	seq_printf(m,
+		"State:\t%s\n"
+		"Tgid:\t%d\n"
+		"Ngid:\t%d\n"
+		"Pid:\t%d\n"
+		"PPid:\t%d\n"
+		"TracerPid:\t%d\n"
+		"Uid:\t%d\t%d\t%d\t%d\n"
+		"Gid:\t%d\t%d\t%d\t%d\n",
+		get_task_state(p),
+		task_tgid_nr_ns(p, ns),
+		task_numa_group_id(p),
+		pid_nr_ns(pid, ns),
+		ppid, tpid,
+		from_kuid_munged(user_ns, cred->uid),
+		from_kuid_munged(user_ns, cred->euid),
+		from_kuid_munged(user_ns, cred->suid),
+		from_kuid_munged(user_ns, cred->fsuid),
+		from_kgid_munged(user_ns, cred->gid),
+		from_kgid_munged(user_ns, cred->egid),
+		from_kgid_munged(user_ns, cred->sgid),
+		from_kgid_munged(user_ns, cred->fsgid));
+
+	task_lock(p);
+	if (p->files)
+		fdt = files_fdtable(p->files);
+	seq_printf(m,
+		"FDSize:\t%d\n"
+		"Groups:\t",
+		fdt ? fdt->max_fds : 0);
+	rcu_read_unlock();
+
+	group_info = cred->group_info;
+	task_unlock(p);
+
+	for (g = 0; g < group_info->ngroups; g++)
+		seq_printf(m, "%d ",
+			   from_kgid_munged(user_ns, GROUP_AT(group_info, g)));
+	put_cred(cred);
+
+	seq_putc(m, '\n');
+}
+
+void render_sigset_t(struct seq_file *m, const char *header, sigset_t *set)
+{
+	int i;
+
+	seq_puts(m, header);
+
+	i = _NSIG;
+	do {
+		int x = 0;
+
+		i -= 4;
+		if (sigismember(set, i+1)) x |= 1;
+		if (sigismember(set, i+2)) x |= 2;
+		if (sigismember(set, i+3)) x |= 4;
+		if (sigismember(set, i+4)) x |= 8;
+		seq_printf(m, "%x", x);
+	} while (i >= 4);
+
+	seq_putc(m, '\n');
+}
+
+static void collect_sigign_sigcatch(struct task_struct *p, sigset_t *ign,
+				    sigset_t *catch)
+{
+	struct k_sigaction *k;
+	int i;
+
+	k = p->sighand->action;
+	for (i = 1; i <= _NSIG; ++i, ++k) {
+		if (k->sa.sa_handler == SIG_IGN)
+			sigaddset(ign, i);
+		else if (k->sa.sa_handler != SIG_DFL)
+			sigaddset(catch, i);
+	}
+}
+
+void task_sig(struct seq_file *m, struct task_struct *p)
+{
+	unsigned long flags;
+	sigset_t pending, shpending, blocked, ignored, caught;
+	int num_threads = 0;
+	unsigned long qsize = 0;
+	unsigned long qlim = 0;
+
+	sigemptyset(&pending);
+	sigemptyset(&shpending);
+	sigemptyset(&blocked);
+	sigemptyset(&ignored);
+	sigemptyset(&caught);
+
+	if (lock_task_sighand(p, &flags)) {
+		pending = p->pending.signal;
+		shpending = p->signal->shared_pending.signal;
+		blocked = p->blocked;
+		collect_sigign_sigcatch(p, &ignored, &caught);
+		num_threads = get_nr_threads(p);
+		rcu_read_lock();  /* FIXME: is this correct? */
+		qsize = atomic_read(&__task_cred(p)->user->sigpending);
+		rcu_read_unlock();
+		qlim = task_rlimit(p, RLIMIT_SIGPENDING);
+		unlock_task_sighand(p, &flags);
+	}
+
+	seq_printf(m, "Threads:\t%d\n", num_threads);
+	seq_printf(m, "SigQ:\t%lu/%lu\n", qsize, qlim);
+
+	/* render them all */
+	render_sigset_t(m, "SigPnd:\t", &pending);
+	render_sigset_t(m, "ShdPnd:\t", &shpending);
+	render_sigset_t(m, "SigBlk:\t", &blocked);
+	render_sigset_t(m, "SigIgn:\t", &ignored);
+	render_sigset_t(m, "SigCgt:\t", &caught);
+}
+
+static void render_cap_t(struct seq_file *m, const char *header,
+			 kernel_cap_t *a)
+{
+	unsigned __capi;
+
+	seq_puts(m, header);
+	CAP_FOR_EACH_U32(__capi) {
+		seq_printf(m, "%08x",
+			   a->cap[(_KERNEL_CAPABILITY_U32S-1) - __capi]);
+	}
+	seq_putc(m, '\n');
+}
+
+/* Remove non-existent capabilities */
+#define NORM_CAPS(v) (v.cap[CAP_TO_INDEX(CAP_LAST_CAP)] &= \
+				CAP_TO_MASK(CAP_LAST_CAP + 1) - 1)
+
+void task_cap(struct seq_file *m, struct task_struct *p)
+{
+	const struct cred *cred;
+	kernel_cap_t cap_inheritable, cap_permitted, cap_effective, cap_bset;
+
+	rcu_read_lock();
+	cred = __task_cred(p);
+	cap_inheritable	= cred->cap_inheritable;
+	cap_permitted	= cred->cap_permitted;
+	cap_effective	= cred->cap_effective;
+	cap_bset	= cred->cap_bset;
+	rcu_read_unlock();
+
+	NORM_CAPS(cap_inheritable);
+	NORM_CAPS(cap_permitted);
+	NORM_CAPS(cap_effective);
+	NORM_CAPS(cap_bset);
+
+	render_cap_t(m, "CapInh:\t", &cap_inheritable);
+	render_cap_t(m, "CapPrm:\t", &cap_permitted);
+	render_cap_t(m, "CapEff:\t", &cap_effective);
+	render_cap_t(m, "CapBnd:\t", &cap_bset);
+}
+
+void task_seccomp(struct seq_file *m, struct task_struct *p)
+{
+#ifdef CONFIG_SECCOMP
+	seq_printf(m, "Seccomp:\t%d\n", p->seccomp.mode);
+#endif
+}
+
+void task_context_switch_counts(struct seq_file *m, struct task_struct *p)
+{
+	seq_printf(m,	"voluntary_ctxt_switches:\t%lu\n"
+			"nonvoluntary_ctxt_switches:\t%lu\n",
+			p->nvcsw,
+			p->nivcsw);
+}
+
+void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
+{
+	seq_puts(m, "Cpus_allowed:\t");
+	seq_cpumask(m, &task->cpus_allowed);
+	seq_putc(m, '\n');
+	seq_puts(m, "Cpus_allowed_list:\t");
+	seq_cpumask_list(m, &task->cpus_allowed);
+	seq_putc(m, '\n');
+}
+
+int proc_pid_status_mm(struct seq_file *m, struct pid_namespace *ns,
+		       struct pid *pid, struct task_struct *task,
+		       struct mm_struct *mm)
+{
+	task_name(m, task);
+	task_state(m, ns, pid, task);
+
+	if (mm)
+		task_mem(m, mm);
+
+	task_sig(m, task);
+	task_cap(m, task);
+	task_seccomp(m, task);
+	task_cpus_allowed(m, task);
+	cpuset_task_status_allowed(m, task);
+	task_context_switch_counts(m, task);
+	return 0;
+}
+
+int task_stat_mm(struct seq_file *m, struct pid_namespace *ns,
+		 struct pid *pid, struct task_struct *task, int whole,
+		 struct mm_struct *mm)
+{
+	unsigned long vsize, eip, esp, wchan = ~0UL;
+	int priority, nice;
+	int tty_pgrp = -1, tty_nr = 0;
+	sigset_t sigign, sigcatch;
+	char state;
+	pid_t ppid = 0, pgid = -1, sid = -1;
+	int num_threads = 0;
+	int permitted;
+	unsigned long long start_time;
+	unsigned long cmin_flt = 0, cmaj_flt = 0;
+	unsigned long  min_flt = 0,  maj_flt = 0;
+	cputime_t cutime, cstime, utime, stime;
+	cputime_t cgtime, gtime;
+	unsigned long rsslim = 0;
+	char tcomm[sizeof(task->comm)];
+	unsigned long flags;
+
+	state = *get_task_state(task);
+	vsize = eip = esp = 0;
+	permitted = ptrace_may_access(task, PTRACE_MODE_READ | PTRACE_MODE_NOAUDIT);
+	if (mm) {
+		vsize = task_vsize(mm);
+		if (permitted) {
+			eip = KSTK_EIP(task);
+			esp = KSTK_ESP(task);
+		}
+	}
+
+	get_task_comm(tcomm, task);
+
+	sigemptyset(&sigign);
+	sigemptyset(&sigcatch);
+	cutime = cstime = utime = stime = 0;
+	cgtime = gtime = 0;
+
+	if (lock_task_sighand(task, &flags)) {
+		struct signal_struct *sig = task->signal;
+
+		if (sig->tty) {
+			struct pid *pgrp = tty_get_pgrp(sig->tty);
+
+			tty_pgrp = pid_nr_ns(pgrp, ns);
+			put_pid(pgrp);
+			tty_nr = new_encode_dev(tty_devnum(sig->tty));
+		}
+
+		num_threads = get_nr_threads(task);
+		collect_sigign_sigcatch(task, &sigign, &sigcatch);
+
+		cmin_flt = sig->cmin_flt;
+		cmaj_flt = sig->cmaj_flt;
+		cutime = sig->cutime;
+		cstime = sig->cstime;
+		cgtime = sig->cgtime;
+		rsslim = ACCESS_ONCE(sig->rlim[RLIMIT_RSS].rlim_cur);
+
+		/* add up live thread stats at the group level */
+		if (whole) {
+			struct task_struct *t = task;
+
+			do {
+				min_flt += t->min_flt;
+				maj_flt += t->maj_flt;
+				gtime += task_gtime(t);
+			} while_each_thread(task, t);
+
+			min_flt += sig->min_flt;
+			maj_flt += sig->maj_flt;
+			thread_group_cputime_adjusted(task, &utime, &stime);
+			gtime += sig->gtime;
+		}
+
+		sid = task_session_nr_ns(task, ns);
+		ppid = task_tgid_nr_ns(task->real_parent, ns);
+		pgid = task_pgrp_nr_ns(task, ns);
+
+		unlock_task_sighand(task, &flags);
+	}
+
+	if (permitted && (!whole || num_threads < 2))
+		wchan = get_wchan(task);
+	if (!whole) {
+		min_flt = task->min_flt;
+		maj_flt = task->maj_flt;
+		task_cputime_adjusted(task, &utime, &stime);
+		gtime = task_gtime(task);
+	}
+
+	/* scale priority and nice values from timeslices to -20..20 */
+	/* to make it look like a "normal" Unix priority/nice value  */
+	priority = task_prio(task);
+	nice = task_nice(task);
+
+	/* Temporary variable needed for gcc-2.96 */
+	/* convert timespec -> nsec*/
+	start_time =
+		(unsigned long long)task->real_start_time.tv_sec * NSEC_PER_SEC
+				+ task->real_start_time.tv_nsec;
+	/* convert nsec -> ticks */
+	start_time = nsec_to_clock_t(start_time);
+
+	seq_printf(m, "%d (%s) %c", pid_nr_ns(pid, ns), tcomm, state);
+	seq_put_decimal_ll(m, ' ', ppid);
+	seq_put_decimal_ll(m, ' ', pgid);
+	seq_put_decimal_ll(m, ' ', sid);
+	seq_put_decimal_ll(m, ' ', tty_nr);
+	seq_put_decimal_ll(m, ' ', tty_pgrp);
+	seq_put_decimal_ull(m, ' ', task->flags);
+	seq_put_decimal_ull(m, ' ', min_flt);
+	seq_put_decimal_ull(m, ' ', cmin_flt);
+	seq_put_decimal_ull(m, ' ', maj_flt);
+	seq_put_decimal_ull(m, ' ', cmaj_flt);
+	seq_put_decimal_ull(m, ' ', cputime_to_clock_t(utime));
+	seq_put_decimal_ull(m, ' ', cputime_to_clock_t(stime));
+	seq_put_decimal_ll(m, ' ', cputime_to_clock_t(cutime));
+	seq_put_decimal_ll(m, ' ', cputime_to_clock_t(cstime));
+	seq_put_decimal_ll(m, ' ', priority);
+	seq_put_decimal_ll(m, ' ', nice);
+	seq_put_decimal_ll(m, ' ', num_threads);
+	seq_put_decimal_ull(m, ' ', 0);
+	seq_put_decimal_ull(m, ' ', start_time);
+	seq_put_decimal_ull(m, ' ', vsize);
+	seq_put_decimal_ull(m, ' ', mm ? get_mm_rss(mm) : 0);
+	seq_put_decimal_ull(m, ' ', rsslim);
+	seq_put_decimal_ull(m, ' ', mm ? (permitted ? mm->start_code : 1) : 0);
+	seq_put_decimal_ull(m, ' ', mm ? (permitted ? mm->end_code : 1) : 0);
+	seq_put_decimal_ull(m, ' ', (permitted && mm) ? mm->start_stack : 0);
+	seq_put_decimal_ull(m, ' ', esp);
+	seq_put_decimal_ull(m, ' ', eip);
+	/* The signal information here is obsolete.
+	 * It must be decimal for Linux 2.0 compatibility.
+	 * Use /proc/#/status for real-time signals.
+	 */
+	seq_put_decimal_ull(m, ' ', task->pending.signal.sig[0] & 0x7fffffffUL);
+	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
+	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
+	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
+	seq_put_decimal_ull(m, ' ', wchan);
+	seq_put_decimal_ull(m, ' ', 0);
+	seq_put_decimal_ull(m, ' ', 0);
+	seq_put_decimal_ll(m, ' ', task->exit_signal);
+	seq_put_decimal_ll(m, ' ', task_cpu(task));
+	seq_put_decimal_ull(m, ' ', task->rt_priority);
+	seq_put_decimal_ull(m, ' ', task->policy);
+	seq_put_decimal_ull(m, ' ', delayacct_blkio_ticks(task));
+	seq_put_decimal_ull(m, ' ', cputime_to_clock_t(gtime));
+	seq_put_decimal_ll(m, ' ', cputime_to_clock_t(cgtime));
+
+	if (mm && permitted) {
+		seq_put_decimal_ull(m, ' ', mm->start_data);
+		seq_put_decimal_ull(m, ' ', mm->end_data);
+		seq_put_decimal_ull(m, ' ', mm->start_brk);
+		seq_put_decimal_ull(m, ' ', mm->arg_start);
+		seq_put_decimal_ull(m, ' ', mm->arg_end);
+		seq_put_decimal_ull(m, ' ', mm->env_start);
+		seq_put_decimal_ull(m, ' ', mm->env_end);
+	} else {
+		seq_printf(m, " 0 0 0 0 0 0 0");
+	}
+
+	if (permitted)
+		seq_put_decimal_ll(m, ' ', task->exit_code);
+	else
+		seq_put_decimal_ll(m, ' ', 0);
+
+	seq_putc(m, '\n');
+	return 0;
+}
+
+int proc_pid_statm_mm(struct seq_file *m, struct pid_namespace *ns,
+		      struct pid *pid, struct task_struct *task,
+		      struct mm_struct *mm)
+{
+	unsigned long size = 0, resident = 0, shared = 0, text = 0, data = 0;
+
+	if (mm)
+		size = task_statm(mm, &shared, &text, &data, &resident);
+
+	/*
+	 * For quick read, open code by putting numbers directly
+	 * expected format is
+	 * seq_printf(m, "%lu %lu %lu %lu 0 %lu 0\n",
+	 *               size, resident, shared, text, data);
+	 */
+	seq_put_decimal_ull(m, 0, size);
+	seq_put_decimal_ull(m, ' ', resident);
+	seq_put_decimal_ull(m, ' ', shared);
+	seq_put_decimal_ull(m, ' ', text);
+	seq_put_decimal_ull(m, ' ', 0);
+	seq_put_decimal_ull(m, ' ', data);
+	seq_put_decimal_ull(m, ' ', 0);
+	seq_putc(m, '\n');
+
+	return 0;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V2 2/2] Send process status in SCM_PROCINFO
  2014-06-26 12:55 [PATCH net-next V2 0/2] send process status in SCM_PROCINFO Piotr Wilczek
  2014-06-26 12:55 ` [PATCH net-next V2 1/2] lib:proc_info:add library to get process information Piotr Wilczek
@ 2014-06-26 12:55 ` Piotr Wilczek
  2014-07-01 23:31 ` [PATCH net-next V2 0/2] send " David Miller
  2 siblings, 0 replies; 18+ messages in thread
From: Piotr Wilczek @ 2014-06-26 12:55 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Kyungmin Park, Juho Son, Bartlomiej Zolnierkiewicz,
	Jan Kaluza, Piotr Wilczek

This introduces a new SCM type called SCM_PROCINFO to allow the direct
attaching of process status to SCM, which is significantly more
efficient and will reliably avoid the race with the round-trip over
procfs. This is optional and can be turned on by setting SO_PASSPROC.

To achieve that, new struct called unix_skb_parms_scm had to be created,
because otherwise unix_skb_parms would be too big.

This patch is reworked version of:
https://lkml.org/lkml/2014/1/13/40

Signed-off-by: Jan Kaluza <jkaluza@redhat.com>
Signed-off-by: Piotr Wilczek <p.wilczek@samsung.com>
---
Changes for v2:
 - removed extern declaration
 - removed duplication code for parsing process information

 arch/alpha/include/uapi/asm/socket.h   |   2 +
 arch/avr32/include/uapi/asm/socket.h   |   2 +
 arch/cris/include/uapi/asm/socket.h    |   2 +
 arch/frv/include/uapi/asm/socket.h     |   2 +
 arch/ia64/include/uapi/asm/socket.h    |   2 +
 arch/m32r/include/uapi/asm/socket.h    |   2 +
 arch/mips/include/uapi/asm/socket.h    |   2 +
 arch/mn10300/include/uapi/asm/socket.h |   2 +
 arch/parisc/include/uapi/asm/socket.h  |   2 +
 arch/powerpc/include/uapi/asm/socket.h |   2 +
 arch/s390/include/uapi/asm/socket.h    |   2 +
 arch/sparc/include/uapi/asm/socket.h   |   2 +
 include/linux/net.h                    |   1 +
 include/linux/socket.h                 |   2 +
 include/net/af_unix.h                  |   9 ++
 include/net/scm.h                      |  33 +++++++
 include/uapi/asm-generic/socket.h      |   2 +
 net/Kconfig                            |   1 +
 net/core/scm.c                         | 166 +++++++++++++++++++++++++++++++++
 net/core/sock.c                        |  11 +++
 net/unix/af_unix.c                     |  70 ++++++++++++++
 21 files changed, 319 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 3de1394..29608f2 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -87,4 +87,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 6e6cd15..a9bbc63 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -80,4 +80,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/uapi/asm/socket.h b/arch/cris/include/uapi/asm/socket.h
index ed94e5e..df0c641 100644
--- a/arch/cris/include/uapi/asm/socket.h
+++ b/arch/cris/include/uapi/asm/socket.h
@@ -82,6 +82,8 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif /* _ASM_SOCKET_H */
 
 
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index ca2c6e6..3cd9cb1 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -80,5 +80,7 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index a1b49ba..656ba43 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 6c9a24b..758ce55 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -80,4 +80,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index a14baa2..cc7b8f5 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -98,4 +98,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 6aa3ce1..b49368f2 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -80,4 +80,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index fe35cea..b3facfa 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -79,4 +79,6 @@
 
 #define SO_BPF_EXTENSIONS	0x4029
 
+#define SO_PASSPROC		0x402a
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index a9c3e2e..b582047 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -87,4 +87,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index e031332..ffd6fa5 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -86,4 +86,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 54d9608..875353f 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -76,6 +76,8 @@
 
 #define SO_BPF_EXTENSIONS	0x0032
 
+#define SO_PASSPROC		0x0033
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/include/linux/net.h b/include/linux/net.h
index 17d8339..38ad416 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -39,6 +39,7 @@ struct net;
 #define SOCK_PASSCRED		3
 #define SOCK_PASSSEC		4
 #define SOCK_EXTERNALLY_ALLOCATED 5
+#define SOCK_PASSPROC		6
 
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8e98297..a5ebf79 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -130,6 +130,8 @@ static inline struct cmsghdr * cmsg_nxthdr (struct msghdr *__msg, struct cmsghdr
 #define	SCM_RIGHTS	0x01		/* rw: access rights (array of int) */
 #define SCM_CREDENTIALS 0x02		/* rw: struct ucred		*/
 #define SCM_SECURITY	0x03		/* rw: security label		*/
+#define SCM_PROCINFO   0x05		/* rw: comm + cmdline (NULL terminated
+						array of char *) */
 
 struct ucred {
 	__u32	pid;
diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index a175ba4..05c7678 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -27,6 +27,13 @@ struct unix_address {
 	struct sockaddr_un name[0];
 };
 
+struct unix_skb_parms_scm {
+	kuid_t loginuid;
+	unsigned int sessionid;
+	char *procinfo;
+	int procinfo_len;
+};
+
 struct unix_skb_parms {
 	struct pid		*pid;		/* Skb credentials	*/
 	kuid_t			uid;
@@ -36,10 +43,12 @@ struct unix_skb_parms {
 	u32			secid;		/* Security ID		*/
 #endif
 	u32			consumed;
+	struct unix_skb_parms_scm *scm;
 };
 
 #define UNIXCB(skb) 	(*(struct unix_skb_parms *)&((skb)->cb))
 #define UNIXSID(skb)	(&UNIXCB((skb)).secid)
+#define UNIXSCM(skb)	(*(UNIXCB((skb)).scm))
 
 #define unix_state_lock(s)	spin_lock(&unix_sk(s)->lock)
 #define unix_state_unlock(s)	spin_unlock(&unix_sk(s)->lock)
diff --git a/include/net/scm.h b/include/net/scm.h
index 262532d..981aa1f 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -24,6 +24,11 @@ struct scm_fp_list {
 	struct file		*fp[SCM_MAX_FD];
 };
 
+struct scm_procinfo {
+	char *procinfo;
+	int len;
+};
+
 struct scm_cookie {
 	struct pid		*pid;		/* Skb credentials */
 	struct scm_fp_list	*fp;		/* Passed files		*/
@@ -31,6 +36,7 @@ struct scm_cookie {
 #ifdef CONFIG_SECURITY_NETWORK
 	u32			secid;		/* Passed security ID 	*/
 #endif
+	struct scm_procinfo	procinfo;	/* Skb procinfo */
 };
 
 void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
@@ -38,6 +44,7 @@ void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm);
 int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm);
 void __scm_destroy(struct scm_cookie *scm);
 struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl);
+int scm_get_current_procinfo(char **procinfo);
 
 #ifdef CONFIG_SECURITY_NETWORK
 static __inline__ void unix_get_peersec_dgram(struct socket *sock, struct scm_cookie *scm)
@@ -49,6 +56,13 @@ static __inline__ void unix_get_peersec_dgram(struct socket *sock, struct scm_co
 { }
 #endif /* CONFIG_SECURITY_NETWORK */
 
+static inline void scm_set_procinfo(struct scm_cookie *scm,
+				    char *procinfo, int len)
+{
+	scm->procinfo.procinfo = procinfo;
+	scm->procinfo.len = len;
+}
+
 static __inline__ void scm_set_cred(struct scm_cookie *scm,
 				    struct pid *pid, kuid_t uid, kgid_t gid)
 {
@@ -64,9 +78,17 @@ static __inline__ void scm_destroy_cred(struct scm_cookie *scm)
 	scm->pid  = NULL;
 }
 
+static __inline__ void scm_destroy_procinfo(struct scm_cookie *scm)
+{
+	kfree(scm->procinfo.procinfo);
+	scm->procinfo.procinfo = NULL;
+	scm->procinfo.len = 0;
+}
+
 static __inline__ void scm_destroy(struct scm_cookie *scm)
 {
 	scm_destroy_cred(scm);
+	scm_destroy_procinfo(scm);
 	if (scm->fp)
 		__scm_destroy(scm);
 }
@@ -74,11 +96,18 @@ static __inline__ void scm_destroy(struct scm_cookie *scm)
 static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
 			       struct scm_cookie *scm, bool forcecreds)
 {
+	char *procinfo;
+	int len;
 	memset(scm, 0, sizeof(*scm));
 	scm->creds.uid = INVALID_UID;
 	scm->creds.gid = INVALID_GID;
 	if (forcecreds)
 		scm_set_cred(scm, task_tgid(current), current_uid(), current_gid());
+	if (test_bit(SOCK_PASSPROC, &sock->flags)) {
+		len = scm_get_current_procinfo(&procinfo);
+		if (len > 0)
+			scm_set_procinfo(scm, procinfo, len);
+	}
 	unix_get_peersec_dgram(sock, scm);
 	if (msg->msg_controllen <= 0)
 		return 0;
@@ -125,8 +154,12 @@ static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
 		};
 		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(ucreds), &ucreds);
 	}
+	if (test_bit(SOCK_PASSPROC, &sock->flags))
+		put_cmsg(msg, SOL_SOCKET, SCM_PROCINFO, scm->procinfo.len,
+			 scm->procinfo.procinfo);
 
 	scm_destroy_cred(scm);
+	scm_destroy_procinfo(scm);
 
 	scm_passec(sock, msg, scm);
 
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index ea0796b..8ce4e94 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -82,4 +82,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PASSPROC		49
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/Kconfig b/net/Kconfig
index d92afe4..f438269 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -6,6 +6,7 @@ menuconfig NET
 	bool "Networking support"
 	select NLATTR
 	select GENERIC_NET_UTILS
+	select PROC_INFO
 	---help---
 	  Unless you really know what you are doing, you should say Y here.
 	  The reason is that some programs need kernel networking support even
diff --git a/net/core/scm.c b/net/core/scm.c
index b442e7e..ce427b4 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -28,6 +28,10 @@
 #include <linux/pid.h>
 #include <linux/nsproxy.h>
 #include <linux/slab.h>
+#include <linux/ptrace.h>
+#include <linux/fdtable.h>
+#include <linux/cpuset.h>
+#include <linux/proc_info.h>
 
 #include <asm/uaccess.h>
 
@@ -38,6 +42,9 @@
 #include <net/scm.h>
 #include <net/cls_cgroup.h>
 
+#ifdef CONFIG_USER_NS
+extern struct user_namespace init_user_ns;
+#endif
 
 /*
  *	Only allow a user to send credentials, that they could set with
@@ -339,3 +346,162 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
 	return new_fpl;
 }
 EXPORT_SYMBOL(scm_fp_dup);
+
+static int get_proc_cmdline(struct mm_struct *mm, char *buffer)
+{
+	int res, i;
+	unsigned int len;
+
+	len = mm->arg_end - mm->arg_start;
+
+	if (len > PAGE_SIZE)
+		len = PAGE_SIZE;
+
+	res = access_process_vm(current, mm->arg_start, buffer, len, 0);
+
+	/* If the nul at the end of args has been overwritten, then
+	 * assume application is using setproctitle(3).
+	 */
+	if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) {
+		len = strnlen(buffer, res);
+		if (len < res) {
+			res = len;
+		} else {
+			len = mm->env_end - mm->env_start;
+			if (len > PAGE_SIZE - res)
+				len = PAGE_SIZE - res;
+			res += access_process_vm(current, mm->env_start,
+						 buffer+res, len, 0);
+			res = strnlen(buffer, res);
+		}
+	}
+
+	for (i = 0; i < res - 1; i++)
+		if (buffer[i] == '\0')
+			buffer[i] = ' ';
+
+	return res;
+}
+
+static char *get_proc_exe(struct mm_struct *mm)
+{
+	struct file *exe_file;
+	struct path exe_path;
+	char tmp[80], *exepathname;
+
+	exe_file = get_mm_exe_file(mm);
+	if (!exe_file)
+		return NULL;
+
+	exe_path = exe_file->f_path;
+	path_get(&exe_file->f_path);
+	fput(exe_file);
+
+	exepathname = d_path(&exe_path, tmp, sizeof(tmp));
+	path_put(&exe_path);
+
+	return exepathname;
+}
+
+static void get_task_status(struct mm_struct *mm, struct task_struct *task,
+			    char *buf, int size)
+{
+	struct seq_file m;
+	struct pid_namespace *ns = task_active_pid_ns(task);
+	struct pid *pid = get_pid(task_tgid(task));
+
+	m.buf = buf;
+	m.size = size;
+#ifdef CONFIG_USER_NS
+	m.user_ns = &init_user_ns;
+#endif
+
+	proc_pid_status_mm(&m, ns, pid, task, mm);
+}
+
+int scm_get_current_procinfo(char **procinfo)
+{
+	int res = 0;
+	unsigned int len;
+	char *pos;
+	char *buf_cmdline = NULL;
+	char *buf_status = NULL;
+	struct mm_struct *mm;
+	int comm_len = strlen(current->comm);
+	static char *str_comm = "COMM=";
+	static char *str_cmdline = "CMDLINE=";
+	static char *str_exe = "EXE=";
+	static char *str_status = "STATUS=";
+	char *exepathname;
+
+	*procinfo = NULL;
+
+	buf_cmdline = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!buf_cmdline)
+		return -ENOMEM;
+
+	buf_status = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!buf_status) {
+		res = -ENOMEM;
+		goto out;
+	}
+	memset(buf_status, 0, PAGE_SIZE);
+
+	mm = get_task_mm(current);
+	if (!mm)
+		goto out;
+	if (!mm->arg_end)
+		goto out_mm;    /* Shh! No looking before we're done */
+
+	res = get_proc_cmdline(mm, buf_cmdline);
+
+	exepathname = get_proc_exe(mm);
+	if (exepathname == NULL)
+		goto out_mm;
+
+	get_task_status(mm, current, buf_status, PAGE_SIZE);
+
+	/* strlen(comm) + \0 + len of cmdline */
+	len = strlen(str_comm) + comm_len + 1;
+	if (res)
+		len += strlen(str_cmdline) + res + 1;
+	if (!IS_ERR(exepathname))
+		len += strlen(str_exe) + strlen(exepathname) + 1;
+	if (strlen(buf_status) > 0)
+		len += strlen(str_status) + strlen(buf_status);
+
+	*procinfo = kmalloc(len, GFP_KERNEL);
+	if (!*procinfo) {
+		res = -ENOMEM;
+		goto out_mm;
+	}
+
+	pos = *procinfo;
+	pos = strcpy(*procinfo, str_comm) + strlen(str_comm);
+	pos = memcpy(pos, current->comm, comm_len + 1) + comm_len + 1;
+
+	if (res > 0) {
+		pos = strcpy(pos, str_cmdline) + strlen(str_cmdline);
+		pos = memcpy(pos, buf_cmdline, res) + res;
+	}
+
+	if (!IS_ERR(exepathname)) {
+		pos = strcpy(pos, str_exe) + strlen(str_exe);
+		pos = strcpy(pos, exepathname) + strlen(exepathname);
+		pos = strcpy(pos + 1, "") /*+ 1*/;
+	}
+
+	if (strlen(buf_status) > 0) {
+		pos = strcpy(pos, str_status) + strlen(str_status);
+		pos = strcpy(pos, buf_status) + strlen(buf_status);
+	}
+
+	res = len;
+
+out_mm:
+	mmput(mm);
+out:
+	kfree(buf_cmdline);
+	kfree(buf_status);
+	return res;
+}
diff --git a/net/core/sock.c b/net/core/sock.c
index 026e01f..1099e19 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -828,6 +828,13 @@ set_rcvbuf:
 			clear_bit(SOCK_PASSCRED, &sock->flags);
 		break;
 
+	case SO_PASSPROC:
+		if (valbool)
+			set_bit(SOCK_PASSPROC, &sock->flags);
+		else
+			clear_bit(SOCK_PASSPROC, &sock->flags);
+		break;
+
 	case SO_TIMESTAMP:
 	case SO_TIMESTAMPNS:
 		if (valbool)  {
@@ -1142,6 +1149,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = !!test_bit(SOCK_PASSCRED, &sock->flags);
 		break;
 
+	case SO_PASSPROC:
+		v.val = !!test_bit(SOCK_PASSPROC, &sock->flags);
+		break;
+
 	case SO_PEERCRED:
 	{
 		struct ucred peercred;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e968843..bc6c367 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1360,9 +1360,15 @@ static void unix_destruct_scm(struct sk_buff *skb)
 	struct scm_cookie scm;
 	memset(&scm, 0, sizeof(scm));
 	scm.pid  = UNIXCB(skb).pid;
+	if (UNIXCB(skb).scm) {
+		scm.procinfo.procinfo = UNIXSCM(skb).procinfo;
+		scm.procinfo.len = UNIXSCM(skb).procinfo_len;
+	}
 	if (UNIXCB(skb).fp)
 		unix_detach_fds(&scm, skb);
 
+	kfree(UNIXCB(skb).scm);
+
 	/* Alas, it calls VFS */
 	/* So fscking what? fput() had been SMP-safe since the last Summer */
 	scm_destroy(&scm);
@@ -1409,6 +1415,13 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
 {
 	int err = 0;
 
+	if (!UNIXCB(skb).scm) {
+		UNIXCB(skb).scm = kmalloc(sizeof(struct unix_skb_parms_scm),
+					  GFP_KERNEL);
+		if (!UNIXCB(skb).scm)
+			return -ENOMEM;
+	}
+
 	UNIXCB(skb).pid  = get_pid(scm->pid);
 	UNIXCB(skb).uid = scm->creds.uid;
 	UNIXCB(skb).gid = scm->creds.gid;
@@ -1416,6 +1429,15 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
 	if (scm->fp && send_fds)
 		err = unix_attach_fds(scm, skb);
 
+	UNIXSCM(skb).procinfo = NULL;
+	if (scm->procinfo.procinfo) {
+		UNIXSCM(skb).procinfo_len = scm->procinfo.len;
+		UNIXSCM(skb).procinfo = kmemdup(scm->procinfo.procinfo,
+						scm->procinfo.len, GFP_KERNEL);
+		if (!UNIXSCM(skb).procinfo)
+			return -ENOMEM;
+	}
+
 	skb->destructor = unix_destruct_scm;
 	return err;
 }
@@ -1438,6 +1460,22 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock,
 	}
 }
 
+
+/* Include procinfo if source or destination socket
+ * asserted SOCK_PASSPROC.
+ */
+static void maybe_add_procinfo(struct sk_buff *skb, const struct socket *sock,
+			       const struct sock *other, int len,
+			       char *procinfo)
+{
+	if (test_bit(SOCK_PASSPROC, &sock->flags) ||
+	    !other->sk_socket ||
+	    test_bit(SOCK_PASSPROC, &other->sk_socket->flags)) {
+		UNIXSCM(skb).procinfo_len = len;
+		UNIXSCM(skb).procinfo = procinfo;
+	}
+}
+
 /*
  *	Send AF_UNIX data.
  */
@@ -1459,6 +1497,8 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	struct scm_cookie tmp_scm;
 	int max_level;
 	int data_len = 0;
+	char *procinfo = NULL;
+	int procinfo_len = 0;
 
 	if (NULL == siocb->scm)
 		siocb->scm = &tmp_scm;
@@ -1540,6 +1580,12 @@ restart:
 		goto out_free;
 	}
 
+	if (test_bit(SOCK_PASSPROC, &sock->flags) ||
+	    !other->sk_socket ||
+	    test_bit(SOCK_PASSPROC, &other->sk_socket->flags)) {
+		procinfo_len = scm_get_current_procinfo(&procinfo);
+	}
+
 	unix_state_lock(other);
 	err = -EPERM;
 	if (!unix_may_send(sk, other))
@@ -1600,6 +1646,7 @@ restart:
 	if (sock_flag(other, SOCK_RCVTSTAMP))
 		__net_timestamp(skb);
 	maybe_add_creds(skb, sock, other);
+	maybe_add_procinfo(skb, sock, other, procinfo_len, procinfo);
 	skb_queue_tail(&other->sk_receive_queue, skb);
 	if (max_level > unix_sk(other)->recursion_level)
 		unix_sk(other)->recursion_level = max_level;
@@ -1638,6 +1685,8 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	bool fds_sent = false;
 	int max_level;
 	int data_len;
+	char *procinfo = NULL;
+	int procinfo_len = 0;
 
 	if (NULL == siocb->scm)
 		siocb->scm = &tmp_scm;
@@ -1701,6 +1750,12 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 			goto out_err;
 		}
 
+		if (test_bit(SOCK_PASSPROC, &sock->flags) ||
+		    !other->sk_socket ||
+		    test_bit(SOCK_PASSPROC, &other->sk_socket->flags)) {
+			procinfo_len = scm_get_current_procinfo(&procinfo);
+		}
+
 		unix_state_lock(other);
 
 		if (sock_flag(other, SOCK_DEAD) ||
@@ -1708,6 +1763,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 			goto pipe_err_free;
 
 		maybe_add_creds(skb, sock, other);
+		maybe_add_procinfo(skb, sock, other, procinfo_len, procinfo);
 		skb_queue_tail(&other->sk_receive_queue, skb);
 		if (max_level > unix_sk(other)->recursion_level)
 			unix_sk(other)->recursion_level = max_level;
@@ -1837,6 +1893,12 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 		memset(&tmp_scm, 0, sizeof(tmp_scm));
 	}
 	scm_set_cred(siocb->scm, UNIXCB(skb).pid, UNIXCB(skb).uid, UNIXCB(skb).gid);
+	if (UNIXCB(skb).scm && UNIXSCM(skb).procinfo)
+		scm_set_procinfo(siocb->scm,
+				 kmemdup(UNIXSCM(skb).procinfo,
+					 UNIXSCM(skb).procinfo_len,
+					 GFP_KERNEL),
+				 UNIXSCM(skb).procinfo_len);
 	unix_set_secdata(siocb->scm, skb);
 
 	if (!(flags & MSG_PEEK)) {
@@ -2023,6 +2085,14 @@ again:
 			check_creds = 1;
 		}
 
+		if (UNIXCB(skb).scm && UNIXSCM(skb).procinfo) {
+			scm_set_procinfo(siocb->scm,
+					 kmemdup(UNIXSCM(skb).procinfo,
+						 UNIXSCM(skb).procinfo_len,
+						 GFP_KERNEL),
+					 UNIXSCM(skb).procinfo_len);
+		}
+
 		/* Copy address just once */
 		if (sunaddr) {
 			unix_copy_addr(msg, skb->sk);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-06-26 12:55 [PATCH net-next V2 0/2] send process status in SCM_PROCINFO Piotr Wilczek
  2014-06-26 12:55 ` [PATCH net-next V2 1/2] lib:proc_info:add library to get process information Piotr Wilczek
  2014-06-26 12:55 ` [PATCH net-next V2 2/2] Send process status in SCM_PROCINFO Piotr Wilczek
@ 2014-07-01 23:31 ` David Miller
  2014-07-01 23:52   ` Andy Lutomirski
  2 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2014-07-01 23:31 UTC (permalink / raw)
  To: p.wilczek; +Cc: netdev, kyungmin.park, juho80.son, b.zolnierkie, jkaluza, luto

From: Piotr Wilczek <p.wilczek@samsung.com>
Date: Thu, 26 Jun 2014 14:55:52 +0200

> Server-like processes in many cases need credentials and other
> metadata of the peer, to decide if the calling process is allowed to
> request a specific action, or the server just wants to log away this
> type of information for auditing tasks.
> 
> The current practice to retrieve such process metadata is to look that
> information up in procfs with the $PID received over SCM_CREDENTIALS.
> This is sufficient for long-running tasks, but introduces a race which
> cannot be worked around for short-living processes; the calling
> process and all the information in /proc/$PID/ is gone before the
> receiver of the socket message can look it up.
> 
> Changes introduced in this patchset can also increase performance
> of such server-like processes, because current way of opening and
> parsing /proc/$PID/* files is much more expensive than receiving these
> metadata using SCM.
> 
> As an example, this patch set improves systemd-journald performance
> by about 20%. Generally, performance improvement depends on how heavily
> procfs is read the calling process.
> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
> 
> This patch set is split in two patches:
> - the first adds library to retrive process information without
> dependency on procfs.
> - the second introduces a new SCM type called SCM_PROCINFO to optionally
> allow the direct attaching of process status to SCM.

I really would like someone smarter than me to review the security
implications et al. of these changes before I apply them.

Andy?  Maybe you have an opinion?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-01 23:31 ` [PATCH net-next V2 0/2] send " David Miller
@ 2014-07-01 23:52   ` Andy Lutomirski
  2014-07-03  6:00     ` Piotr Wilczek
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Andy Lutomirski @ 2014-07-01 23:52 UTC (permalink / raw)
  To: David Miller
  Cc: p.wilczek, Network Development, Kyungmin Park, juho80.son,
	b.zolnierkie, jkaluza

On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net> wrote:
> From: Piotr Wilczek <p.wilczek@samsung.com>
> Date: Thu, 26 Jun 2014 14:55:52 +0200
>
>> Server-like processes in many cases need credentials and other
>> metadata of the peer, to decide if the calling process is allowed to
>> request a specific action, or the server just wants to log away this
>> type of information for auditing tasks.
>>
>> The current practice to retrieve such process metadata is to look that
>> information up in procfs with the $PID received over SCM_CREDENTIALS.
>> This is sufficient for long-running tasks, but introduces a race which
>> cannot be worked around for short-living processes; the calling
>> process and all the information in /proc/$PID/ is gone before the
>> receiver of the socket message can look it up.
>>
>> Changes introduced in this patchset can also increase performance
>> of such server-like processes, because current way of opening and
>> parsing /proc/$PID/* files is much more expensive than receiving these
>> metadata using SCM.
>>
>> As an example, this patch set improves systemd-journald performance
>> by about 20%. Generally, performance improvement depends on how heavily
>> procfs is read the calling process.
>> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
>>
>> This patch set is split in two patches:
>> - the first adds library to retrive process information without
>> dependency on procfs.
>> - the second introduces a new SCM type called SCM_PROCINFO to optionally
>> allow the direct attaching of process status to SCM.
>
> I really would like someone smarter than me to review the security
> implications et al. of these changes before I apply them.
>
> Andy?  Maybe you have an opinion?
>

The degree to which I dislike this depends on what "procinfo" is.

>From the patch, it looks like procinfo includes cmdline.  In which
case, NAK in almost the strongest possible terms.  (If it includes
userspace addresses, then I'll upgrade that to the strongest possible
terms.)  This is a giant information leak.  Imagine what happens when
a privileged program gets one of these things as stdout.  Keep in mind
that procfs has an option to hide pids that belong to other users.

Even if procinfo were made relatively innocuous, this is just an
extension of a bad interface.  If you all really want a way to
efficiently pass kernel-verified information through a unix socket,
then add something reasonable.  This means:

1. It needs to be opt-in, *per send*.  After all of the
vulnerabilities that have happened due to write(2) capturing
unexpected credentials of the caller, there's just no excuse for
adding new examples of this.  Yes, this will prevent you from getting
a magic speed-up with legacy clients, but I've spent long enough
battling this sh*t from a security and correctness POV to care.  Fix
the clients.

2. It should be easily extensible, because people keep wanting to add
new things here.

Here's a proposal for how it could work:

Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
flags, value) where type is the type of verified information and value
is what the sender wants to send.

Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
returns -EPERM.
SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
Any other flags: -EINVAL.

This could be in CMSG format or in nlattr format.  Or it could be one
cmsg per value (which might be easier).

Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
all of the sent values that have recognized types and pass
verification.

For stream sockets, presumably the sender would send an
scm_verified_info using setsockopt and the receiver would grab it
using getsockopt.

Types could include:

SCMVI_UID
SCMVI_GID
SCMVI_PID (racy, but still better than SCM_CREDENTIALS)
SCMVI_LOGONUID (but only if CONFIG_AUDITSYSCALL, and someone needs to
figure out wtf its semantics are and *document* it before I'd ack such
a thing)
SCMVI_CGROUP (sigh)
SCMVI_AUDIT_SESSIONID (because sessionid is an awful name)
etc.

I *might* get around to implementing this, but don't expect anything
soon -- my current kernel hacking bandwidth is well-occupied by
seccomp.  But I'd be happy to review.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-01 23:52   ` Andy Lutomirski
@ 2014-07-03  6:00     ` Piotr Wilczek
  2014-07-03 16:14       ` Andy Lutomirski
  2014-07-21 12:59     ` Piotr Wilczek
  2014-07-22 12:24     ` Piotr Wilczek
  2 siblings, 1 reply; 18+ messages in thread
From: Piotr Wilczek @ 2014-07-03  6:00 UTC (permalink / raw)
  To: 'Andy Lutomirski', 'David Miller'
  Cc: 'Network Development', 'Kyungmin Park',
	juho80.son, Bartlomiej Zolnierkiewicz, jkaluza



> -----Original Message-----
> From: Andy Lutomirski [mailto:luto@amacapital.net]
> Sent: Wednesday, July 02, 2014 1:52 AM
> To: David Miller
> Cc: Piotr Wilczek; Network Development; Kyungmin Park;
> juho80.son@samsung.com; Bartlomiej Zolnierkiewicz; jkaluza@redhat.com
> Subject: Re: [PATCH net-next V2 0/2] send process status in
> SCM_PROCINFO
> 
> On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net>
> wrote:
> > From: Piotr Wilczek <p.wilczek@samsung.com>
> > Date: Thu, 26 Jun 2014 14:55:52 +0200
> >
> >> Server-like processes in many cases need credentials and other
> >> metadata of the peer, to decide if the calling process is allowed to
> >> request a specific action, or the server just wants to log away this
> >> type of information for auditing tasks.
> >>
> >> The current practice to retrieve such process metadata is to look
> >> that information up in procfs with the $PID received over
> SCM_CREDENTIALS.
> >> This is sufficient for long-running tasks, but introduces a race
> >> which cannot be worked around for short-living processes; the
> calling
> >> process and all the information in /proc/$PID/ is gone before the
> >> receiver of the socket message can look it up.
> >>
> >> Changes introduced in this patchset can also increase performance of
> >> such server-like processes, because current way of opening and
> >> parsing /proc/$PID/* files is much more expensive than receiving
> >> these metadata using SCM.
> >>
> >> As an example, this patch set improves systemd-journald performance
> >> by about 20%. Generally, performance improvement depends on how
> >> heavily procfs is read the calling process.
> >> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
> >>
> >> This patch set is split in two patches:
> >> - the first adds library to retrive process information without
> >> dependency on procfs.
> >> - the second introduces a new SCM type called SCM_PROCINFO to
> >> optionally allow the direct attaching of process status to SCM.
> >
> > I really would like someone smarter than me to review the security
> > implications et al. of these changes before I apply them.
> >
> > Andy?  Maybe you have an opinion?
> >
> 
> The degree to which I dislike this depends on what "procinfo" is.
> 
> From the patch, it looks like procinfo includes cmdline.  In which
> case, NAK in almost the strongest possible terms.  (If it includes
> userspace addresses, then I'll upgrade that to the strongest possible
> terms.)  This is a giant information leak.  Imagine what happens when a
> privileged program gets one of these things as stdout.  Keep in mind
> that procfs has an option to hide pids that belong to other users.
> 
> Even if procinfo were made relatively innocuous, this is just an
> extension of a bad interface.  If you all really want a way to
> efficiently pass kernel-verified information through a unix socket,
> then add something reasonable.  This means:
> 
> 1. It needs to be opt-in, *per send*.  After all of the vulnerabilities
> that have happened due to write(2) capturing unexpected credentials of
> the caller, there's just no excuse for adding new examples of this.
> Yes, this will prevent you from getting a magic speed-up with legacy
> clients, but I've spent long enough battling this sh*t from a security
> and correctness POV to care.  Fix the clients.
> 
> 2. It should be easily extensible, because people keep wanting to add
> new things here.
> 
> Here's a proposal for how it could work:
> 
> Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
> flags, value) where type is the type of verified information and value
> is what the sender wants to send.
> 
> Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
> returns -EPERM.
> SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
> Any other flags: -EINVAL.
> 
> This could be in CMSG format or in nlattr format.  Or it could be one
> cmsg per value (which might be easier).
> 
> Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
> all of the sent values that have recognized types and pass
> verification.
> 
> For stream sockets, presumably the sender would send an
> scm_verified_info using setsockopt and the receiver would grab it using
> getsockopt.
> 
> Types could include:
> 
> SCMVI_UID
> SCMVI_GID
> SCMVI_PID (racy, but still better than SCM_CREDENTIALS) SCMVI_LOGONUID
> (but only if CONFIG_AUDITSYSCALL, and someone needs to figure out wtf
> its semantics are and *document* it before I'd ack such a thing)
> SCMVI_CGROUP (sigh) SCMVI_AUDIT_SESSIONID (because sessionid is an
> awful name) etc.
> 
> I *might* get around to implementing this, but don't expect anything
> soon -- my current kernel hacking bandwidth is well-occupied by
> seccomp.  But I'd be happy to review.
> 
> --Andy

In this patch the kernel sends information about the sender. In your proposal, the sender sends the information and the kernel maybe completes the missing data. Could you please explain why your solution is secure? Is the source or the range of information sent the security problem?

Thanks for comments.
Piotr Wilczek

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-03  6:00     ` Piotr Wilczek
@ 2014-07-03 16:14       ` Andy Lutomirski
  2014-07-04 13:26         ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2014-07-03 16:14 UTC (permalink / raw)
  To: Piotr Wilczek
  Cc: David Miller, Network Development, Kyungmin Park, juho80.son,
	Bartlomiej Zolnierkiewicz, jkaluza

On Wed, Jul 2, 2014 at 11:00 PM, Piotr Wilczek <p.wilczek@samsung.com> wrote:
>
>
>> -----Original Message-----
>> From: Andy Lutomirski [mailto:luto@amacapital.net]
>> Sent: Wednesday, July 02, 2014 1:52 AM
>> To: David Miller
>> Cc: Piotr Wilczek; Network Development; Kyungmin Park;
>> juho80.son@samsung.com; Bartlomiej Zolnierkiewicz; jkaluza@redhat.com
>> Subject: Re: [PATCH net-next V2 0/2] send process status in
>> SCM_PROCINFO
>>
>> On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net>
>> wrote:
>> > From: Piotr Wilczek <p.wilczek@samsung.com>
>> > Date: Thu, 26 Jun 2014 14:55:52 +0200
>> >
>> >> Server-like processes in many cases need credentials and other
>> >> metadata of the peer, to decide if the calling process is allowed to
>> >> request a specific action, or the server just wants to log away this
>> >> type of information for auditing tasks.
>> >>
>> >> The current practice to retrieve such process metadata is to look
>> >> that information up in procfs with the $PID received over
>> SCM_CREDENTIALS.
>> >> This is sufficient for long-running tasks, but introduces a race
>> >> which cannot be worked around for short-living processes; the
>> calling
>> >> process and all the information in /proc/$PID/ is gone before the
>> >> receiver of the socket message can look it up.
>> >>
>> >> Changes introduced in this patchset can also increase performance of
>> >> such server-like processes, because current way of opening and
>> >> parsing /proc/$PID/* files is much more expensive than receiving
>> >> these metadata using SCM.
>> >>
>> >> As an example, this patch set improves systemd-journald performance
>> >> by about 20%. Generally, performance improvement depends on how
>> >> heavily procfs is read the calling process.
>> >> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
>> >>
>> >> This patch set is split in two patches:
>> >> - the first adds library to retrive process information without
>> >> dependency on procfs.
>> >> - the second introduces a new SCM type called SCM_PROCINFO to
>> >> optionally allow the direct attaching of process status to SCM.
>> >
>> > I really would like someone smarter than me to review the security
>> > implications et al. of these changes before I apply them.
>> >
>> > Andy?  Maybe you have an opinion?
>> >
>>
>> The degree to which I dislike this depends on what "procinfo" is.
>>
>> From the patch, it looks like procinfo includes cmdline.  In which
>> case, NAK in almost the strongest possible terms.  (If it includes
>> userspace addresses, then I'll upgrade that to the strongest possible
>> terms.)  This is a giant information leak.  Imagine what happens when a
>> privileged program gets one of these things as stdout.  Keep in mind
>> that procfs has an option to hide pids that belong to other users.
>>
>> Even if procinfo were made relatively innocuous, this is just an
>> extension of a bad interface.  If you all really want a way to
>> efficiently pass kernel-verified information through a unix socket,
>> then add something reasonable.  This means:
>>
>> 1. It needs to be opt-in, *per send*.  After all of the vulnerabilities
>> that have happened due to write(2) capturing unexpected credentials of
>> the caller, there's just no excuse for adding new examples of this.
>> Yes, this will prevent you from getting a magic speed-up with legacy
>> clients, but I've spent long enough battling this sh*t from a security
>> and correctness POV to care.  Fix the clients.
>>
>> 2. It should be easily extensible, because people keep wanting to add
>> new things here.
>>
>> Here's a proposal for how it could work:
>>
>> Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
>> flags, value) where type is the type of verified information and value
>> is what the sender wants to send.
>>
>> Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
>> returns -EPERM.
>> SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
>> Any other flags: -EINVAL.
>>
>> This could be in CMSG format or in nlattr format.  Or it could be one
>> cmsg per value (which might be easier).
>>
>> Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
>> all of the sent values that have recognized types and pass
>> verification.
>>
>> For stream sockets, presumably the sender would send an
>> scm_verified_info using setsockopt and the receiver would grab it using
>> getsockopt.
>>
>> Types could include:
>>
>> SCMVI_UID
>> SCMVI_GID
>> SCMVI_PID (racy, but still better than SCM_CREDENTIALS) SCMVI_LOGONUID
>> (but only if CONFIG_AUDITSYSCALL, and someone needs to figure out wtf
>> its semantics are and *document* it before I'd ack such a thing)
>> SCMVI_CGROUP (sigh) SCMVI_AUDIT_SESSIONID (because sessionid is an
>> awful name) etc.
>>
>> I *might* get around to implementing this, but don't expect anything
>> soon -- my current kernel hacking bandwidth is well-occupied by
>> seccomp.  But I'd be happy to review.
>>
>> --Andy
>
> In this patch the kernel sends information about the sender. In your proposal, the sender sends the information and the kernel maybe completes the missing data. Could you please explain why your solution is secure? Is the source or the range of information sent the security problem?
>

The problem is neither; it's the sending of information that's
unexpected by the sender that's a problem.  Here are two concrete
issues:

1. Information leaks.  Currently, sending on a UNIX socket reveals
your uid, gid, and pid.  I would argue that that is already a mistake,
but it's well-enshrined in the API.  But imagine if sending on a UNIX
socket suddenly started revealing, say, a few bytes of sender memory
-- this would have huge security implications.  This code doesn't do
that, but it's not clear to me that revealing cmdline or random bits
of process status is much better.

2. Inadvertent assertions of authority.  In Unix (and most, but not
all, other OSes), for better or for worse, many APIs implicitly use
the caller's credentials.  For the most part, these are
well-understood, and the caller intends to use the credential.  For
example, open(2) is expected to use euid/egid (or fsuid/fsgid) to
check access.  bind(2) checks capabilities to determine whether
binding low-numbered ports is okay.  write(2) *does not check* the
caller's authority.  (Except for SCM_CREDENTIALS, which I would argue
is a mistake.)  Because everyone knows that write(2) does not use the
caller's authority in a potentially dangerous way, no one takes any
precautions against it.  For example, basically every setuid program
on anyone's system will happily write untrusted data to stdout or
stderr.

The problem here is that there have been multiple vulnerabilities that
result from the kernel assuming that a program that calls write(2)
actually *intends* to use its authority.  For example, at one point,
writes to /proc/PID/uid_map checked capabilities at write(2) time.
That was a root hole.  There are probably real root holes due to the
same effect on UNIX datagram sockets.

If senders have to explicitly ask to use credentials, then this type
of attack can't happen.

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-03 16:14       ` Andy Lutomirski
@ 2014-07-04 13:26         ` Bartlomiej Zolnierkiewicz
  2014-07-04 15:15           ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2014-07-04 13:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Piotr Wilczek, David Miller, Network Development, Kyungmin Park,
	juho80.son, jkaluza


Hi,

On Thursday, July 03, 2014 09:14:56 AM Andy Lutomirski wrote:
> On Wed, Jul 2, 2014 at 11:00 PM, Piotr Wilczek <p.wilczek@samsung.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Andy Lutomirski [mailto:luto@amacapital.net]
> >> Sent: Wednesday, July 02, 2014 1:52 AM
> >> To: David Miller
> >> Cc: Piotr Wilczek; Network Development; Kyungmin Park;
> >> juho80.son@samsung.com; Bartlomiej Zolnierkiewicz; jkaluza@redhat.com
> >> Subject: Re: [PATCH net-next V2 0/2] send process status in
> >> SCM_PROCINFO
> >>
> >> On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net>
> >> wrote:
> >> > From: Piotr Wilczek <p.wilczek@samsung.com>
> >> > Date: Thu, 26 Jun 2014 14:55:52 +0200
> >> >
> >> >> Server-like processes in many cases need credentials and other
> >> >> metadata of the peer, to decide if the calling process is allowed to
> >> >> request a specific action, or the server just wants to log away this
> >> >> type of information for auditing tasks.
> >> >>
> >> >> The current practice to retrieve such process metadata is to look
> >> >> that information up in procfs with the $PID received over
> >> SCM_CREDENTIALS.
> >> >> This is sufficient for long-running tasks, but introduces a race
> >> >> which cannot be worked around for short-living processes; the
> >> calling
> >> >> process and all the information in /proc/$PID/ is gone before the
> >> >> receiver of the socket message can look it up.
> >> >>
> >> >> Changes introduced in this patchset can also increase performance of
> >> >> such server-like processes, because current way of opening and
> >> >> parsing /proc/$PID/* files is much more expensive than receiving
> >> >> these metadata using SCM.
> >> >>
> >> >> As an example, this patch set improves systemd-journald performance
> >> >> by about 20%. Generally, performance improvement depends on how
> >> >> heavily procfs is read the calling process.
> >> >> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
> >> >>
> >> >> This patch set is split in two patches:
> >> >> - the first adds library to retrive process information without
> >> >> dependency on procfs.
> >> >> - the second introduces a new SCM type called SCM_PROCINFO to
> >> >> optionally allow the direct attaching of process status to SCM.
> >> >
> >> > I really would like someone smarter than me to review the security
> >> > implications et al. of these changes before I apply them.
> >> >
> >> > Andy?  Maybe you have an opinion?
> >> >
> >>
> >> The degree to which I dislike this depends on what "procinfo" is.
> >>
> >> From the patch, it looks like procinfo includes cmdline.  In which
> >> case, NAK in almost the strongest possible terms.  (If it includes
> >> userspace addresses, then I'll upgrade that to the strongest possible
> >> terms.)  This is a giant information leak.  Imagine what happens when a
> >> privileged program gets one of these things as stdout.  Keep in mind
> >> that procfs has an option to hide pids that belong to other users.
> >>
> >> Even if procinfo were made relatively innocuous, this is just an
> >> extension of a bad interface.  If you all really want a way to
> >> efficiently pass kernel-verified information through a unix socket,
> >> then add something reasonable.  This means:
> >>
> >> 1. It needs to be opt-in, *per send*.  After all of the vulnerabilities
> >> that have happened due to write(2) capturing unexpected credentials of
> >> the caller, there's just no excuse for adding new examples of this.
> >> Yes, this will prevent you from getting a magic speed-up with legacy
> >> clients, but I've spent long enough battling this sh*t from a security
> >> and correctness POV to care.  Fix the clients.
> >>
> >> 2. It should be easily extensible, because people keep wanting to add
> >> new things here.
> >>
> >> Here's a proposal for how it could work:
> >>
> >> Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
> >> flags, value) where type is the type of verified information and value
> >> is what the sender wants to send.
> >>
> >> Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
> >> returns -EPERM.
> >> SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
> >> Any other flags: -EINVAL.
> >>
> >> This could be in CMSG format or in nlattr format.  Or it could be one
> >> cmsg per value (which might be easier).
> >>
> >> Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
> >> all of the sent values that have recognized types and pass
> >> verification.
> >>
> >> For stream sockets, presumably the sender would send an
> >> scm_verified_info using setsockopt and the receiver would grab it using
> >> getsockopt.
> >>
> >> Types could include:
> >>
> >> SCMVI_UID
> >> SCMVI_GID
> >> SCMVI_PID (racy, but still better than SCM_CREDENTIALS) SCMVI_LOGONUID
> >> (but only if CONFIG_AUDITSYSCALL, and someone needs to figure out wtf
> >> its semantics are and *document* it before I'd ack such a thing)
> >> SCMVI_CGROUP (sigh) SCMVI_AUDIT_SESSIONID (because sessionid is an
> >> awful name) etc.
> >>
> >> I *might* get around to implementing this, but don't expect anything
> >> soon -- my current kernel hacking bandwidth is well-occupied by
> >> seccomp.  But I'd be happy to review.
> >>
> >> --Andy
> >
> > In this patch the kernel sends information about the sender. In your proposal, the sender sends the information and the kernel maybe completes the missing data. Could you please explain why your solution is secure? Is the source or the range of information sent the security problem?
> >
> 
> The problem is neither; it's the sending of information that's
> unexpected by the sender that's a problem.  Here are two concrete
> issues:
> 
> 1. Information leaks.  Currently, sending on a UNIX socket reveals
> your uid, gid, and pid.  I would argue that that is already a mistake,
> but it's well-enshrined in the API.  But imagine if sending on a UNIX
> socket suddenly started revealing, say, a few bytes of sender memory
> -- this would have huge security implications.  This code doesn't do
> that, but it's not clear to me that revealing cmdline or random bits
> of process status is much better.

The current patch doesn't reveal any new information about process that
is currently not revealed by procfs already by default and if necessary
we can add a procfs (or sysfs) entry for disabling SCM_PROCINFO feature
completely.

> 2. Inadvertent assertions of authority.  In Unix (and most, but not
> all, other OSes), for better or for worse, many APIs implicitly use
> the caller's credentials.  For the most part, these are
> well-understood, and the caller intends to use the credential.  For
> example, open(2) is expected to use euid/egid (or fsuid/fsgid) to
> check access.  bind(2) checks capabilities to determine whether
> binding low-numbered ports is okay.  write(2) *does not check* the
> caller's authority.  (Except for SCM_CREDENTIALS, which I would argue
> is a mistake.)  Because everyone knows that write(2) does not use the
> caller's authority in a potentially dangerous way, no one takes any
> precautions against it.  For example, basically every setuid program
> on anyone's system will happily write untrusted data to stdout or
> stderr.
> 
> The problem here is that there have been multiple vulnerabilities that
> result from the kernel assuming that a program that calls write(2)
> actually *intends* to use its authority.  For example, at one point,
> writes to /proc/PID/uid_map checked capabilities at write(2) time.
> That was a root hole.  There are probably real root holes due to the
> same effect on UNIX datagram sockets.

>From what I understand about the current patch it doesn't change
anything from the application POV and it doesn't cause write(2) to
send any additional information (all SCM_PROCINFO data on socket
read comes from kernel).

> If senders have to explicitly ask to use credentials, then this type
> of attack can't happen.

Could you also please be more specific and tell what kind of attacks
exactly is made possible by the current patch?

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-04 13:26         ` Bartlomiej Zolnierkiewicz
@ 2014-07-04 15:15           ` Andy Lutomirski
  2014-07-04 16:55             ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2014-07-04 15:15 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Piotr Wilczek, David Miller, Network Development, Kyungmin Park,
	juho80.son, jkaluza

On Fri, Jul 4, 2014 at 6:26 AM, Bartlomiej Zolnierkiewicz
<b.zolnierkie@samsung.com> wrote:
>
> Hi,
>
> On Thursday, July 03, 2014 09:14:56 AM Andy Lutomirski wrote:
>> On Wed, Jul 2, 2014 at 11:00 PM, Piotr Wilczek <p.wilczek@samsung.com> wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Andy Lutomirski [mailto:luto@amacapital.net]
>> >> Sent: Wednesday, July 02, 2014 1:52 AM
>> >> To: David Miller
>> >> Cc: Piotr Wilczek; Network Development; Kyungmin Park;
>> >> juho80.son@samsung.com; Bartlomiej Zolnierkiewicz; jkaluza@redhat.com
>> >> Subject: Re: [PATCH net-next V2 0/2] send process status in
>> >> SCM_PROCINFO
>> >>
>> >> On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net>
>> >> wrote:
>> >> > From: Piotr Wilczek <p.wilczek@samsung.com>
>> >> > Date: Thu, 26 Jun 2014 14:55:52 +0200
>> >> >
>> >> >> Server-like processes in many cases need credentials and other
>> >> >> metadata of the peer, to decide if the calling process is allowed to
>> >> >> request a specific action, or the server just wants to log away this
>> >> >> type of information for auditing tasks.
>> >> >>
>> >> >> The current practice to retrieve such process metadata is to look
>> >> >> that information up in procfs with the $PID received over
>> >> SCM_CREDENTIALS.
>> >> >> This is sufficient for long-running tasks, but introduces a race
>> >> >> which cannot be worked around for short-living processes; the
>> >> calling
>> >> >> process and all the information in /proc/$PID/ is gone before the
>> >> >> receiver of the socket message can look it up.
>> >> >>
>> >> >> Changes introduced in this patchset can also increase performance of
>> >> >> such server-like processes, because current way of opening and
>> >> >> parsing /proc/$PID/* files is much more expensive than receiving
>> >> >> these metadata using SCM.
>> >> >>
>> >> >> As an example, this patch set improves systemd-journald performance
>> >> >> by about 20%. Generally, performance improvement depends on how
>> >> >> heavily procfs is read the calling process.
>> >> >> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
>> >> >>
>> >> >> This patch set is split in two patches:
>> >> >> - the first adds library to retrive process information without
>> >> >> dependency on procfs.
>> >> >> - the second introduces a new SCM type called SCM_PROCINFO to
>> >> >> optionally allow the direct attaching of process status to SCM.
>> >> >
>> >> > I really would like someone smarter than me to review the security
>> >> > implications et al. of these changes before I apply them.
>> >> >
>> >> > Andy?  Maybe you have an opinion?
>> >> >
>> >>
>> >> The degree to which I dislike this depends on what "procinfo" is.
>> >>
>> >> From the patch, it looks like procinfo includes cmdline.  In which
>> >> case, NAK in almost the strongest possible terms.  (If it includes
>> >> userspace addresses, then I'll upgrade that to the strongest possible
>> >> terms.)  This is a giant information leak.  Imagine what happens when a
>> >> privileged program gets one of these things as stdout.  Keep in mind
>> >> that procfs has an option to hide pids that belong to other users.
>> >>
>> >> Even if procinfo were made relatively innocuous, this is just an
>> >> extension of a bad interface.  If you all really want a way to
>> >> efficiently pass kernel-verified information through a unix socket,
>> >> then add something reasonable.  This means:
>> >>
>> >> 1. It needs to be opt-in, *per send*.  After all of the vulnerabilities
>> >> that have happened due to write(2) capturing unexpected credentials of
>> >> the caller, there's just no excuse for adding new examples of this.
>> >> Yes, this will prevent you from getting a magic speed-up with legacy
>> >> clients, but I've spent long enough battling this sh*t from a security
>> >> and correctness POV to care.  Fix the clients.
>> >>
>> >> 2. It should be easily extensible, because people keep wanting to add
>> >> new things here.
>> >>
>> >> Here's a proposal for how it could work:
>> >>
>> >> Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
>> >> flags, value) where type is the type of verified information and value
>> >> is what the sender wants to send.
>> >>
>> >> Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
>> >> returns -EPERM.
>> >> SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
>> >> Any other flags: -EINVAL.
>> >>
>> >> This could be in CMSG format or in nlattr format.  Or it could be one
>> >> cmsg per value (which might be easier).
>> >>
>> >> Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
>> >> all of the sent values that have recognized types and pass
>> >> verification.
>> >>
>> >> For stream sockets, presumably the sender would send an
>> >> scm_verified_info using setsockopt and the receiver would grab it using
>> >> getsockopt.
>> >>
>> >> Types could include:
>> >>
>> >> SCMVI_UID
>> >> SCMVI_GID
>> >> SCMVI_PID (racy, but still better than SCM_CREDENTIALS) SCMVI_LOGONUID
>> >> (but only if CONFIG_AUDITSYSCALL, and someone needs to figure out wtf
>> >> its semantics are and *document* it before I'd ack such a thing)
>> >> SCMVI_CGROUP (sigh) SCMVI_AUDIT_SESSIONID (because sessionid is an
>> >> awful name) etc.
>> >>
>> >> I *might* get around to implementing this, but don't expect anything
>> >> soon -- my current kernel hacking bandwidth is well-occupied by
>> >> seccomp.  But I'd be happy to review.
>> >>
>> >> --Andy
>> >
>> > In this patch the kernel sends information about the sender. In your proposal, the sender sends the information and the kernel maybe completes the missing data. Could you please explain why your solution is secure? Is the source or the range of information sent the security problem?
>> >
>>
>> The problem is neither; it's the sending of information that's
>> unexpected by the sender that's a problem.  Here are two concrete
>> issues:
>>
>> 1. Information leaks.  Currently, sending on a UNIX socket reveals
>> your uid, gid, and pid.  I would argue that that is already a mistake,
>> but it's well-enshrined in the API.  But imagine if sending on a UNIX
>> socket suddenly started revealing, say, a few bytes of sender memory
>> -- this would have huge security implications.  This code doesn't do
>> that, but it's not clear to me that revealing cmdline or random bits
>> of process status is much better.
>
> The current patch doesn't reveal any new information about process that
> is currently not revealed by procfs already by default and if necessary
> we can add a procfs (or sysfs) entry for disabling SCM_PROCINFO feature
> completely.

Ugh.  We should cautious by default, not the other way around.  Also,
if you add a sysctl like that, it'll be worthless because I'm sure
it'll break systemd in a year.

>
>> 2. Inadvertent assertions of authority.  In Unix (and most, but not
>> all, other OSes), for better or for worse, many APIs implicitly use
>> the caller's credentials.  For the most part, these are
>> well-understood, and the caller intends to use the credential.  For
>> example, open(2) is expected to use euid/egid (or fsuid/fsgid) to
>> check access.  bind(2) checks capabilities to determine whether
>> binding low-numbered ports is okay.  write(2) *does not check* the
>> caller's authority.  (Except for SCM_CREDENTIALS, which I would argue
>> is a mistake.)  Because everyone knows that write(2) does not use the
>> caller's authority in a potentially dangerous way, no one takes any
>> precautions against it.  For example, basically every setuid program
>> on anyone's system will happily write untrusted data to stdout or
>> stderr.
>>
>> The problem here is that there have been multiple vulnerabilities that
>> result from the kernel assuming that a program that calls write(2)
>> actually *intends* to use its authority.  For example, at one point,
>> writes to /proc/PID/uid_map checked capabilities at write(2) time.
>> That was a root hole.  There are probably real root holes due to the
>> same effect on UNIX datagram sockets.
>
> From what I understand about the current patch it doesn't change
> anything from the application POV and it doesn't cause write(2) to
> send any additional information (all SCM_PROCINFO data on socket
> read comes from kernel).

This one isn't about revealing information.  It's about the recipient
trusting that information.

>
>> If senders have to explicitly ask to use credentials, then this type
>> of attack can't happen.
>
> Could you also please be more specific and tell what kind of attacks
> exactly is made possible by the current patch?

CVE-2013-1959 and CVE-2014-0181 were both examples of this type of attack.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-04 15:15           ` Andy Lutomirski
@ 2014-07-04 16:55             ` Bartlomiej Zolnierkiewicz
  2014-07-04 17:07               ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2014-07-04 16:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Piotr Wilczek, David Miller, Network Development, Kyungmin Park,
	juho80.son, jkaluza

On Friday, July 04, 2014 08:15:24 AM Andy Lutomirski wrote:
> On Fri, Jul 4, 2014 at 6:26 AM, Bartlomiej Zolnierkiewicz
> <b.zolnierkie@samsung.com> wrote:
> >
> > Hi,
> >
> > On Thursday, July 03, 2014 09:14:56 AM Andy Lutomirski wrote:
> >> On Wed, Jul 2, 2014 at 11:00 PM, Piotr Wilczek <p.wilczek@samsung.com> wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Andy Lutomirski [mailto:luto@amacapital.net]
> >> >> Sent: Wednesday, July 02, 2014 1:52 AM
> >> >> To: David Miller
> >> >> Cc: Piotr Wilczek; Network Development; Kyungmin Park;
> >> >> juho80.son@samsung.com; Bartlomiej Zolnierkiewicz; jkaluza@redhat.com
> >> >> Subject: Re: [PATCH net-next V2 0/2] send process status in
> >> >> SCM_PROCINFO
> >> >>
> >> >> On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net>
> >> >> wrote:
> >> >> > From: Piotr Wilczek <p.wilczek@samsung.com>
> >> >> > Date: Thu, 26 Jun 2014 14:55:52 +0200
> >> >> >
> >> >> >> Server-like processes in many cases need credentials and other
> >> >> >> metadata of the peer, to decide if the calling process is allowed to
> >> >> >> request a specific action, or the server just wants to log away this
> >> >> >> type of information for auditing tasks.
> >> >> >>
> >> >> >> The current practice to retrieve such process metadata is to look
> >> >> >> that information up in procfs with the $PID received over
> >> >> SCM_CREDENTIALS.
> >> >> >> This is sufficient for long-running tasks, but introduces a race
> >> >> >> which cannot be worked around for short-living processes; the
> >> >> calling
> >> >> >> process and all the information in /proc/$PID/ is gone before the
> >> >> >> receiver of the socket message can look it up.
> >> >> >>
> >> >> >> Changes introduced in this patchset can also increase performance of
> >> >> >> such server-like processes, because current way of opening and
> >> >> >> parsing /proc/$PID/* files is much more expensive than receiving
> >> >> >> these metadata using SCM.
> >> >> >>
> >> >> >> As an example, this patch set improves systemd-journald performance
> >> >> >> by about 20%. Generally, performance improvement depends on how
> >> >> >> heavily procfs is read the calling process.
> >> >> >> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
> >> >> >>
> >> >> >> This patch set is split in two patches:
> >> >> >> - the first adds library to retrive process information without
> >> >> >> dependency on procfs.
> >> >> >> - the second introduces a new SCM type called SCM_PROCINFO to
> >> >> >> optionally allow the direct attaching of process status to SCM.
> >> >> >
> >> >> > I really would like someone smarter than me to review the security
> >> >> > implications et al. of these changes before I apply them.
> >> >> >
> >> >> > Andy?  Maybe you have an opinion?
> >> >> >
> >> >>
> >> >> The degree to which I dislike this depends on what "procinfo" is.
> >> >>
> >> >> From the patch, it looks like procinfo includes cmdline.  In which
> >> >> case, NAK in almost the strongest possible terms.  (If it includes
> >> >> userspace addresses, then I'll upgrade that to the strongest possible
> >> >> terms.)  This is a giant information leak.  Imagine what happens when a
> >> >> privileged program gets one of these things as stdout.  Keep in mind
> >> >> that procfs has an option to hide pids that belong to other users.
> >> >>
> >> >> Even if procinfo were made relatively innocuous, this is just an
> >> >> extension of a bad interface.  If you all really want a way to
> >> >> efficiently pass kernel-verified information through a unix socket,
> >> >> then add something reasonable.  This means:
> >> >>
> >> >> 1. It needs to be opt-in, *per send*.  After all of the vulnerabilities
> >> >> that have happened due to write(2) capturing unexpected credentials of
> >> >> the caller, there's just no excuse for adding new examples of this.
> >> >> Yes, this will prevent you from getting a magic speed-up with legacy
> >> >> clients, but I've spent long enough battling this sh*t from a security
> >> >> and correctness POV to care.  Fix the clients.
> >> >>
> >> >> 2. It should be easily extensible, because people keep wanting to add
> >> >> new things here.
> >> >>
> >> >> Here's a proposal for how it could work:
> >> >>
> >> >> Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
> >> >> flags, value) where type is the type of verified information and value
> >> >> is what the sender wants to send.
> >> >>
> >> >> Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
> >> >> returns -EPERM.
> >> >> SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
> >> >> Any other flags: -EINVAL.
> >> >>
> >> >> This could be in CMSG format or in nlattr format.  Or it could be one
> >> >> cmsg per value (which might be easier).
> >> >>
> >> >> Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
> >> >> all of the sent values that have recognized types and pass
> >> >> verification.
> >> >>
> >> >> For stream sockets, presumably the sender would send an
> >> >> scm_verified_info using setsockopt and the receiver would grab it using
> >> >> getsockopt.
> >> >>
> >> >> Types could include:
> >> >>
> >> >> SCMVI_UID
> >> >> SCMVI_GID
> >> >> SCMVI_PID (racy, but still better than SCM_CREDENTIALS) SCMVI_LOGONUID
> >> >> (but only if CONFIG_AUDITSYSCALL, and someone needs to figure out wtf
> >> >> its semantics are and *document* it before I'd ack such a thing)
> >> >> SCMVI_CGROUP (sigh) SCMVI_AUDIT_SESSIONID (because sessionid is an
> >> >> awful name) etc.
> >> >>
> >> >> I *might* get around to implementing this, but don't expect anything
> >> >> soon -- my current kernel hacking bandwidth is well-occupied by
> >> >> seccomp.  But I'd be happy to review.
> >> >>
> >> >> --Andy
> >> >
> >> > In this patch the kernel sends information about the sender. In your proposal, the sender sends the information and the kernel maybe completes the missing data. Could you please explain why your solution is secure? Is the source or the range of information sent the security problem?
> >> >
> >>
> >> The problem is neither; it's the sending of information that's
> >> unexpected by the sender that's a problem.  Here are two concrete
> >> issues:
> >>
> >> 1. Information leaks.  Currently, sending on a UNIX socket reveals
> >> your uid, gid, and pid.  I would argue that that is already a mistake,
> >> but it's well-enshrined in the API.  But imagine if sending on a UNIX
> >> socket suddenly started revealing, say, a few bytes of sender memory
> >> -- this would have huge security implications.  This code doesn't do
> >> that, but it's not clear to me that revealing cmdline or random bits
> >> of process status is much better.
> >
> > The current patch doesn't reveal any new information about process that
> > is currently not revealed by procfs already by default and if necessary
> > we can add a procfs (or sysfs) entry for disabling SCM_PROCINFO feature
> > completely.
> 
> Ugh.  We should cautious by default, not the other way around.  Also,
> if you add a sysctl like that, it'll be worthless because I'm sure
> it'll break systemd in a year.

How is the disabling of the new option different from disabling current
procfs support?  Isn't the option to disable it useless because it can
break systemd?

Also SCM_PROCINFO feature is completely optional for systemd.  When it is
not available systemd fails back to the old (slower) way of obtaining
the needed info using procfs.  Unless systemd wants to drop support for
older kernels there should be no problem with supporting procfs way.

> >> 2. Inadvertent assertions of authority.  In Unix (and most, but not
> >> all, other OSes), for better or for worse, many APIs implicitly use
> >> the caller's credentials.  For the most part, these are
> >> well-understood, and the caller intends to use the credential.  For
> >> example, open(2) is expected to use euid/egid (or fsuid/fsgid) to
> >> check access.  bind(2) checks capabilities to determine whether
> >> binding low-numbered ports is okay.  write(2) *does not check* the
> >> caller's authority.  (Except for SCM_CREDENTIALS, which I would argue
> >> is a mistake.)  Because everyone knows that write(2) does not use the
> >> caller's authority in a potentially dangerous way, no one takes any
> >> precautions against it.  For example, basically every setuid program
> >> on anyone's system will happily write untrusted data to stdout or
> >> stderr.
> >>
> >> The problem here is that there have been multiple vulnerabilities that
> >> result from the kernel assuming that a program that calls write(2)
> >> actually *intends* to use its authority.  For example, at one point,
> >> writes to /proc/PID/uid_map checked capabilities at write(2) time.
> >> That was a root hole.  There are probably real root holes due to the
> >> same effect on UNIX datagram sockets.
> >
> > From what I understand about the current patch it doesn't change
> > anything from the application POV and it doesn't cause write(2) to
> > send any additional information (all SCM_PROCINFO data on socket
> > read comes from kernel).
> 
> This one isn't about revealing information.  It's about the recipient
> trusting that information.

Then why this should be a problem?  All information obtained through
SCM_PROCINFO comes from the kernel not the application itself.

> >> If senders have to explicitly ask to use credentials, then this type
> >> of attack can't happen.
> >
> > Could you also please be more specific and tell what kind of attacks
> > exactly is made possible by the current patch?
> 
> CVE-2013-1959 and CVE-2014-0181 were both examples of this type of attack.

I was asking about what specific problems do you seen with the actual
SCM_PROCINFO patches.  Given CVEs are for problems which are related to
unauthorized _sending_ of information to the kernel.  The current patches
doesn't allow sending of any additional information to the kernel.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-04 16:55             ` Bartlomiej Zolnierkiewicz
@ 2014-07-04 17:07               ` Andy Lutomirski
  2014-07-04 17:58                 ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2014-07-04 17:07 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Piotr Wilczek, David Miller, Network Development, Kyungmin Park,
	juho80.son, jkaluza

On Fri, Jul 4, 2014 at 9:55 AM, Bartlomiej Zolnierkiewicz
<b.zolnierkie@samsung.com> wrote:
> On Friday, July 04, 2014 08:15:24 AM Andy Lutomirski wrote:
>> On Fri, Jul 4, 2014 at 6:26 AM, Bartlomiej Zolnierkiewicz
>> <b.zolnierkie@samsung.com> wrote:
>> >
>> > Hi,
>> >
>> > On Thursday, July 03, 2014 09:14:56 AM Andy Lutomirski wrote:
>> >> On Wed, Jul 2, 2014 at 11:00 PM, Piotr Wilczek <p.wilczek@samsung.com> wrote:
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Andy Lutomirski [mailto:luto@amacapital.net]
>> >> >> Sent: Wednesday, July 02, 2014 1:52 AM
>> >> >> To: David Miller
>> >> >> Cc: Piotr Wilczek; Network Development; Kyungmin Park;
>> >> >> juho80.son@samsung.com; Bartlomiej Zolnierkiewicz; jkaluza@redhat.com
>> >> >> Subject: Re: [PATCH net-next V2 0/2] send process status in
>> >> >> SCM_PROCINFO
>> >> >>
>> >> >> On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net>
>> >> >> wrote:
>> >> >> > From: Piotr Wilczek <p.wilczek@samsung.com>
>> >> >> > Date: Thu, 26 Jun 2014 14:55:52 +0200
>> >> >> >
>> >> >> >> Server-like processes in many cases need credentials and other
>> >> >> >> metadata of the peer, to decide if the calling process is allowed to
>> >> >> >> request a specific action, or the server just wants to log away this
>> >> >> >> type of information for auditing tasks.
>> >> >> >>
>> >> >> >> The current practice to retrieve such process metadata is to look
>> >> >> >> that information up in procfs with the $PID received over
>> >> >> SCM_CREDENTIALS.
>> >> >> >> This is sufficient for long-running tasks, but introduces a race
>> >> >> >> which cannot be worked around for short-living processes; the
>> >> >> calling
>> >> >> >> process and all the information in /proc/$PID/ is gone before the
>> >> >> >> receiver of the socket message can look it up.
>> >> >> >>
>> >> >> >> Changes introduced in this patchset can also increase performance of
>> >> >> >> such server-like processes, because current way of opening and
>> >> >> >> parsing /proc/$PID/* files is much more expensive than receiving
>> >> >> >> these metadata using SCM.
>> >> >> >>
>> >> >> >> As an example, this patch set improves systemd-journald performance
>> >> >> >> by about 20%. Generally, performance improvement depends on how
>> >> >> >> heavily procfs is read the calling process.
>> >> >> >> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
>> >> >> >>
>> >> >> >> This patch set is split in two patches:
>> >> >> >> - the first adds library to retrive process information without
>> >> >> >> dependency on procfs.
>> >> >> >> - the second introduces a new SCM type called SCM_PROCINFO to
>> >> >> >> optionally allow the direct attaching of process status to SCM.
>> >> >> >
>> >> >> > I really would like someone smarter than me to review the security
>> >> >> > implications et al. of these changes before I apply them.
>> >> >> >
>> >> >> > Andy?  Maybe you have an opinion?
>> >> >> >
>> >> >>
>> >> >> The degree to which I dislike this depends on what "procinfo" is.
>> >> >>
>> >> >> From the patch, it looks like procinfo includes cmdline.  In which
>> >> >> case, NAK in almost the strongest possible terms.  (If it includes
>> >> >> userspace addresses, then I'll upgrade that to the strongest possible
>> >> >> terms.)  This is a giant information leak.  Imagine what happens when a
>> >> >> privileged program gets one of these things as stdout.  Keep in mind
>> >> >> that procfs has an option to hide pids that belong to other users.
>> >> >>
>> >> >> Even if procinfo were made relatively innocuous, this is just an
>> >> >> extension of a bad interface.  If you all really want a way to
>> >> >> efficiently pass kernel-verified information through a unix socket,
>> >> >> then add something reasonable.  This means:
>> >> >>
>> >> >> 1. It needs to be opt-in, *per send*.  After all of the vulnerabilities
>> >> >> that have happened due to write(2) capturing unexpected credentials of
>> >> >> the caller, there's just no excuse for adding new examples of this.
>> >> >> Yes, this will prevent you from getting a magic speed-up with legacy
>> >> >> clients, but I've spent long enough battling this sh*t from a security
>> >> >> and correctness POV to care.  Fix the clients.
>> >> >>
>> >> >> 2. It should be easily extensible, because people keep wanting to add
>> >> >> new things here.
>> >> >>
>> >> >> Here's a proposal for how it could work:
>> >> >>
>> >> >> Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
>> >> >> flags, value) where type is the type of verified information and value
>> >> >> is what the sender wants to send.
>> >> >>
>> >> >> Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
>> >> >> returns -EPERM.
>> >> >> SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
>> >> >> Any other flags: -EINVAL.
>> >> >>
>> >> >> This could be in CMSG format or in nlattr format.  Or it could be one
>> >> >> cmsg per value (which might be easier).
>> >> >>
>> >> >> Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
>> >> >> all of the sent values that have recognized types and pass
>> >> >> verification.
>> >> >>
>> >> >> For stream sockets, presumably the sender would send an
>> >> >> scm_verified_info using setsockopt and the receiver would grab it using
>> >> >> getsockopt.
>> >> >>
>> >> >> Types could include:
>> >> >>
>> >> >> SCMVI_UID
>> >> >> SCMVI_GID
>> >> >> SCMVI_PID (racy, but still better than SCM_CREDENTIALS) SCMVI_LOGONUID
>> >> >> (but only if CONFIG_AUDITSYSCALL, and someone needs to figure out wtf
>> >> >> its semantics are and *document* it before I'd ack such a thing)
>> >> >> SCMVI_CGROUP (sigh) SCMVI_AUDIT_SESSIONID (because sessionid is an
>> >> >> awful name) etc.
>> >> >>
>> >> >> I *might* get around to implementing this, but don't expect anything
>> >> >> soon -- my current kernel hacking bandwidth is well-occupied by
>> >> >> seccomp.  But I'd be happy to review.
>> >> >>
>> >> >> --Andy
>> >> >
>> >> > In this patch the kernel sends information about the sender. In your proposal, the sender sends the information and the kernel maybe completes the missing data. Could you please explain why your solution is secure? Is the source or the range of information sent the security problem?
>> >> >
>> >>
>> >> The problem is neither; it's the sending of information that's
>> >> unexpected by the sender that's a problem.  Here are two concrete
>> >> issues:
>> >>
>> >> 1. Information leaks.  Currently, sending on a UNIX socket reveals
>> >> your uid, gid, and pid.  I would argue that that is already a mistake,
>> >> but it's well-enshrined in the API.  But imagine if sending on a UNIX
>> >> socket suddenly started revealing, say, a few bytes of sender memory
>> >> -- this would have huge security implications.  This code doesn't do
>> >> that, but it's not clear to me that revealing cmdline or random bits
>> >> of process status is much better.
>> >
>> > The current patch doesn't reveal any new information about process that
>> > is currently not revealed by procfs already by default and if necessary
>> > we can add a procfs (or sysfs) entry for disabling SCM_PROCINFO feature
>> > completely.
>>
>> Ugh.  We should cautious by default, not the other way around.  Also,
>> if you add a sysctl like that, it'll be worthless because I'm sure
>> it'll break systemd in a year.
>
> How is the disabling of the new option different from disabling current
> procfs support?  Isn't the option to disable it useless because it can
> break systemd?
>
> Also SCM_PROCINFO feature is completely optional for systemd.  When it is
> not available systemd fails back to the old (slower) way of obtaining
> the needed info using procfs.  Unless systemd wants to drop support for
> older kernels there should be no problem with supporting procfs way.
>
>> >> 2. Inadvertent assertions of authority.  In Unix (and most, but not
>> >> all, other OSes), for better or for worse, many APIs implicitly use
>> >> the caller's credentials.  For the most part, these are
>> >> well-understood, and the caller intends to use the credential.  For
>> >> example, open(2) is expected to use euid/egid (or fsuid/fsgid) to
>> >> check access.  bind(2) checks capabilities to determine whether
>> >> binding low-numbered ports is okay.  write(2) *does not check* the
>> >> caller's authority.  (Except for SCM_CREDENTIALS, which I would argue
>> >> is a mistake.)  Because everyone knows that write(2) does not use the
>> >> caller's authority in a potentially dangerous way, no one takes any
>> >> precautions against it.  For example, basically every setuid program
>> >> on anyone's system will happily write untrusted data to stdout or
>> >> stderr.
>> >>
>> >> The problem here is that there have been multiple vulnerabilities that
>> >> result from the kernel assuming that a program that calls write(2)
>> >> actually *intends* to use its authority.  For example, at one point,
>> >> writes to /proc/PID/uid_map checked capabilities at write(2) time.
>> >> That was a root hole.  There are probably real root holes due to the
>> >> same effect on UNIX datagram sockets.
>> >
>> > From what I understand about the current patch it doesn't change
>> > anything from the application POV and it doesn't cause write(2) to
>> > send any additional information (all SCM_PROCINFO data on socket
>> > read comes from kernel).
>>
>> This one isn't about revealing information.  It's about the recipient
>> trusting that information.
>
> Then why this should be a problem?  All information obtained through
> SCM_PROCINFO comes from the kernel not the application itself.

So what?  The information is correct in the sense of correctly
identifying who called write(2).  The problem is that any code (user
or kernel) that thinks it cares who called write(2) is *wrong*.  Full
stop.  That code cares about who intended the message to be send,
which may or not be the same entity that called write(2).

>
>> >> If senders have to explicitly ask to use credentials, then this type
>> >> of attack can't happen.
>> >
>> > Could you also please be more specific and tell what kind of attacks
>> > exactly is made possible by the current patch?
>>
>> CVE-2013-1959 and CVE-2014-0181 were both examples of this type of attack.
>
> I was asking about what specific problems do you seen with the actual
> SCM_PROCINFO patches.  Given CVEs are for problems which are related to
> unauthorized _sending_ of information to the kernel.  The current patches
> doesn't allow sending of any additional information to the kernel.

Userspace will be just as vulnerable.  Some day I'll go exploit the
same thing to break udev or dbus or something like that.

I have a 10% written SCM_IDENTITY thing.  Maybe I'll try to finish it.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-04 17:07               ` Andy Lutomirski
@ 2014-07-04 17:58                 ` Bartlomiej Zolnierkiewicz
  2014-07-04 19:53                   ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2014-07-04 17:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Piotr Wilczek, David Miller, Network Development, Kyungmin Park,
	juho80.son, jkaluza

On Friday, July 04, 2014 10:07:24 AM Andy Lutomirski wrote:
> On Fri, Jul 4, 2014 at 9:55 AM, Bartlomiej Zolnierkiewicz
> <b.zolnierkie@samsung.com> wrote:
> > On Friday, July 04, 2014 08:15:24 AM Andy Lutomirski wrote:
> >> On Fri, Jul 4, 2014 at 6:26 AM, Bartlomiej Zolnierkiewicz
> >> <b.zolnierkie@samsung.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > On Thursday, July 03, 2014 09:14:56 AM Andy Lutomirski wrote:
> >> >> On Wed, Jul 2, 2014 at 11:00 PM, Piotr Wilczek <p.wilczek@samsung.com> wrote:
> >> >> >
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: Andy Lutomirski [mailto:luto@amacapital.net]
> >> >> >> Sent: Wednesday, July 02, 2014 1:52 AM
> >> >> >> To: David Miller
> >> >> >> Cc: Piotr Wilczek; Network Development; Kyungmin Park;
> >> >> >> juho80.son@samsung.com; Bartlomiej Zolnierkiewicz; jkaluza@redhat.com
> >> >> >> Subject: Re: [PATCH net-next V2 0/2] send process status in
> >> >> >> SCM_PROCINFO
> >> >> >>
> >> >> >> On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net>
> >> >> >> wrote:
> >> >> >> > From: Piotr Wilczek <p.wilczek@samsung.com>
> >> >> >> > Date: Thu, 26 Jun 2014 14:55:52 +0200
> >> >> >> >
> >> >> >> >> Server-like processes in many cases need credentials and other
> >> >> >> >> metadata of the peer, to decide if the calling process is allowed to
> >> >> >> >> request a specific action, or the server just wants to log away this
> >> >> >> >> type of information for auditing tasks.
> >> >> >> >>
> >> >> >> >> The current practice to retrieve such process metadata is to look
> >> >> >> >> that information up in procfs with the $PID received over
> >> >> >> SCM_CREDENTIALS.
> >> >> >> >> This is sufficient for long-running tasks, but introduces a race
> >> >> >> >> which cannot be worked around for short-living processes; the
> >> >> >> calling
> >> >> >> >> process and all the information in /proc/$PID/ is gone before the
> >> >> >> >> receiver of the socket message can look it up.
> >> >> >> >>
> >> >> >> >> Changes introduced in this patchset can also increase performance of
> >> >> >> >> such server-like processes, because current way of opening and
> >> >> >> >> parsing /proc/$PID/* files is much more expensive than receiving
> >> >> >> >> these metadata using SCM.
> >> >> >> >>
> >> >> >> >> As an example, this patch set improves systemd-journald performance
> >> >> >> >> by about 20%. Generally, performance improvement depends on how
> >> >> >> >> heavily procfs is read the calling process.
> >> >> >> >> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
> >> >> >> >>
> >> >> >> >> This patch set is split in two patches:
> >> >> >> >> - the first adds library to retrive process information without
> >> >> >> >> dependency on procfs.
> >> >> >> >> - the second introduces a new SCM type called SCM_PROCINFO to
> >> >> >> >> optionally allow the direct attaching of process status to SCM.
> >> >> >> >
> >> >> >> > I really would like someone smarter than me to review the security
> >> >> >> > implications et al. of these changes before I apply them.
> >> >> >> >
> >> >> >> > Andy?  Maybe you have an opinion?
> >> >> >> >
> >> >> >>
> >> >> >> The degree to which I dislike this depends on what "procinfo" is.
> >> >> >>
> >> >> >> From the patch, it looks like procinfo includes cmdline.  In which
> >> >> >> case, NAK in almost the strongest possible terms.  (If it includes
> >> >> >> userspace addresses, then I'll upgrade that to the strongest possible
> >> >> >> terms.)  This is a giant information leak.  Imagine what happens when a
> >> >> >> privileged program gets one of these things as stdout.  Keep in mind
> >> >> >> that procfs has an option to hide pids that belong to other users.
> >> >> >>
> >> >> >> Even if procinfo were made relatively innocuous, this is just an
> >> >> >> extension of a bad interface.  If you all really want a way to
> >> >> >> efficiently pass kernel-verified information through a unix socket,
> >> >> >> then add something reasonable.  This means:
> >> >> >>
> >> >> >> 1. It needs to be opt-in, *per send*.  After all of the vulnerabilities
> >> >> >> that have happened due to write(2) capturing unexpected credentials of
> >> >> >> the caller, there's just no excuse for adding new examples of this.
> >> >> >> Yes, this will prevent you from getting a magic speed-up with legacy
> >> >> >> clients, but I've spent long enough battling this sh*t from a security
> >> >> >> and correctness POV to care.  Fix the clients.
> >> >> >>
> >> >> >> 2. It should be easily extensible, because people keep wanting to add
> >> >> >> new things here.
> >> >> >>
> >> >> >> Here's a proposal for how it could work:
> >> >> >>
> >> >> >> Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
> >> >> >> flags, value) where type is the type of verified information and value
> >> >> >> is what the sender wants to send.
> >> >> >>
> >> >> >> Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
> >> >> >> returns -EPERM.
> >> >> >> SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
> >> >> >> Any other flags: -EINVAL.
> >> >> >>
> >> >> >> This could be in CMSG format or in nlattr format.  Or it could be one
> >> >> >> cmsg per value (which might be easier).
> >> >> >>
> >> >> >> Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
> >> >> >> all of the sent values that have recognized types and pass
> >> >> >> verification.
> >> >> >>
> >> >> >> For stream sockets, presumably the sender would send an
> >> >> >> scm_verified_info using setsockopt and the receiver would grab it using
> >> >> >> getsockopt.
> >> >> >>
> >> >> >> Types could include:
> >> >> >>
> >> >> >> SCMVI_UID
> >> >> >> SCMVI_GID
> >> >> >> SCMVI_PID (racy, but still better than SCM_CREDENTIALS) SCMVI_LOGONUID
> >> >> >> (but only if CONFIG_AUDITSYSCALL, and someone needs to figure out wtf
> >> >> >> its semantics are and *document* it before I'd ack such a thing)
> >> >> >> SCMVI_CGROUP (sigh) SCMVI_AUDIT_SESSIONID (because sessionid is an
> >> >> >> awful name) etc.
> >> >> >>
> >> >> >> I *might* get around to implementing this, but don't expect anything
> >> >> >> soon -- my current kernel hacking bandwidth is well-occupied by
> >> >> >> seccomp.  But I'd be happy to review.
> >> >> >>
> >> >> >> --Andy
> >> >> >
> >> >> > In this patch the kernel sends information about the sender. In your proposal, the sender sends the information and the kernel maybe completes the missing data. Could you please explain why your solution is secure? Is the source or the range of information sent the security problem?
> >> >> >
> >> >>
> >> >> The problem is neither; it's the sending of information that's
> >> >> unexpected by the sender that's a problem.  Here are two concrete
> >> >> issues:
> >> >>
> >> >> 1. Information leaks.  Currently, sending on a UNIX socket reveals
> >> >> your uid, gid, and pid.  I would argue that that is already a mistake,
> >> >> but it's well-enshrined in the API.  But imagine if sending on a UNIX
> >> >> socket suddenly started revealing, say, a few bytes of sender memory
> >> >> -- this would have huge security implications.  This code doesn't do
> >> >> that, but it's not clear to me that revealing cmdline or random bits
> >> >> of process status is much better.
> >> >
> >> > The current patch doesn't reveal any new information about process that
> >> > is currently not revealed by procfs already by default and if necessary
> >> > we can add a procfs (or sysfs) entry for disabling SCM_PROCINFO feature
> >> > completely.
> >>
> >> Ugh.  We should cautious by default, not the other way around.  Also,
> >> if you add a sysctl like that, it'll be worthless because I'm sure
> >> it'll break systemd in a year.
> >
> > How is the disabling of the new option different from disabling current
> > procfs support?  Isn't the option to disable it useless because it can
> > break systemd?
> >
> > Also SCM_PROCINFO feature is completely optional for systemd.  When it is
> > not available systemd fails back to the old (slower) way of obtaining
> > the needed info using procfs.  Unless systemd wants to drop support for
> > older kernels there should be no problem with supporting procfs way.
> >
> >> >> 2. Inadvertent assertions of authority.  In Unix (and most, but not
> >> >> all, other OSes), for better or for worse, many APIs implicitly use
> >> >> the caller's credentials.  For the most part, these are
> >> >> well-understood, and the caller intends to use the credential.  For
> >> >> example, open(2) is expected to use euid/egid (or fsuid/fsgid) to
> >> >> check access.  bind(2) checks capabilities to determine whether
> >> >> binding low-numbered ports is okay.  write(2) *does not check* the
> >> >> caller's authority.  (Except for SCM_CREDENTIALS, which I would argue
> >> >> is a mistake.)  Because everyone knows that write(2) does not use the
> >> >> caller's authority in a potentially dangerous way, no one takes any
> >> >> precautions against it.  For example, basically every setuid program
> >> >> on anyone's system will happily write untrusted data to stdout or
> >> >> stderr.
> >> >>
> >> >> The problem here is that there have been multiple vulnerabilities that
> >> >> result from the kernel assuming that a program that calls write(2)
> >> >> actually *intends* to use its authority.  For example, at one point,
> >> >> writes to /proc/PID/uid_map checked capabilities at write(2) time.
> >> >> That was a root hole.  There are probably real root holes due to the
> >> >> same effect on UNIX datagram sockets.
> >> >
> >> > From what I understand about the current patch it doesn't change
> >> > anything from the application POV and it doesn't cause write(2) to
> >> > send any additional information (all SCM_PROCINFO data on socket
> >> > read comes from kernel).
> >>
> >> This one isn't about revealing information.  It's about the recipient
> >> trusting that information.
> >
> > Then why this should be a problem?  All information obtained through
> > SCM_PROCINFO comes from the kernel not the application itself.
> 
> So what?  The information is correct in the sense of correctly
> identifying who called write(2).  The problem is that any code (user
> or kernel) that thinks it cares who called write(2) is *wrong*.  Full
> stop.  That code cares about who intended the message to be send,
> which may or not be the same entity that called write(2).

Do you mean that the process doing write(2) can be different from the one
intending it?  How is that a problem in our case?  We only care about info
about entity that actually called write(2).  We completely don't care about
the intent in the kernel and any code doing any authorization based on
the obtained information in user-space would be seriously wrong because
the information is stale once it leaves kernel and has only historic value.
Am I missing something?

> >> >> If senders have to explicitly ask to use credentials, then this type
> >> >> of attack can't happen.
> >> >
> >> > Could you also please be more specific and tell what kind of attacks
> >> > exactly is made possible by the current patch?
> >>
> >> CVE-2013-1959 and CVE-2014-0181 were both examples of this type of attack.
> >
> > I was asking about what specific problems do you seen with the actual
> > SCM_PROCINFO patches.  Given CVEs are for problems which are related to
> > unauthorized _sending_ of information to the kernel.  The current patches
> > doesn't allow sending of any additional information to the kernel.
> 
> Userspace will be just as vulnerable.  Some day I'll go exploit the
> same thing to break udev or dbus or something like that.
> 
> I have a 10% written SCM_IDENTITY thing.  Maybe I'll try to finish it.

That is good to hear. :)

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-04 17:58                 ` Bartlomiej Zolnierkiewicz
@ 2014-07-04 19:53                   ` Andy Lutomirski
  2014-07-14 16:43                     ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2014-07-04 19:53 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Piotr Wilczek, David Miller, Network Development, Kyungmin Park,
	juho80.son, jkaluza

On Fri, Jul 4, 2014 at 10:58 AM, Bartlomiej Zolnierkiewicz
<b.zolnierkie@samsung.com> wrote:
> On Friday, July 04, 2014 10:07:24 AM Andy Lutomirski wrote:
>> > Then why this should be a problem?  All information obtained through
>> > SCM_PROCINFO comes from the kernel not the application itself.
>>
>> So what?  The information is correct in the sense of correctly
>> identifying who called write(2).  The problem is that any code (user
>> or kernel) that thinks it cares who called write(2) is *wrong*.  Full
>> stop.  That code cares about who intended the message to be send,
>> which may or not be the same entity that called write(2).
>
> Do you mean that the process doing write(2) can be different from the one
> intending it?  How is that a problem in our case?  We only care about info
> about entity that actually called write(2).  We completely don't care about
> the intent in the kernel and any code doing any authorization based on
> the obtained information in user-space would be seriously wrong because
> the information is stale once it leaves kernel and has only historic value.
> Am I missing something?

What exactly is your case?  If it's purely for debugging, fine.  But
if it's a log and anyone ever wants to think of it as a reliable audit
log, this isn't so good.  For example, if I see a log message from
_UID=0 saying something important, I am likely to believe that
something privileged actually generated that text.  This *cannot* be
guaranteed by SCM_CREDENTIALS on a datagram socket.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-04 19:53                   ` Andy Lutomirski
@ 2014-07-14 16:43                     ` Bartlomiej Zolnierkiewicz
  2014-07-14 17:05                       ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2014-07-14 16:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Piotr Wilczek, David Miller, Network Development, Kyungmin Park,
	juho80.son, jkaluza


Hi,

On Friday, July 04, 2014 12:53:19 PM Andy Lutomirski wrote:
> On Fri, Jul 4, 2014 at 10:58 AM, Bartlomiej Zolnierkiewicz
> <b.zolnierkie@samsung.com> wrote:
> > On Friday, July 04, 2014 10:07:24 AM Andy Lutomirski wrote:
> >> > Then why this should be a problem?  All information obtained through
> >> > SCM_PROCINFO comes from the kernel not the application itself.
> >>
> >> So what?  The information is correct in the sense of correctly
> >> identifying who called write(2).  The problem is that any code (user
> >> or kernel) that thinks it cares who called write(2) is *wrong*.  Full
> >> stop.  That code cares about who intended the message to be send,
> >> which may or not be the same entity that called write(2).
> >
> > Do you mean that the process doing write(2) can be different from the one
> > intending it?  How is that a problem in our case?  We only care about info
> > about entity that actually called write(2).  We completely don't care about
> > the intent in the kernel and any code doing any authorization based on
> > the obtained information in user-space would be seriously wrong because
> > the information is stale once it leaves kernel and has only historic value.
> > Am I missing something?
> 
> What exactly is your case?  If it's purely for debugging, fine.  But
> if it's a log and anyone ever wants to think of it as a reliable audit
> log, this isn't so good.  For example, if I see a log message from
> _UID=0 saying something important, I am likely to believe that
> something privileged actually generated that text.  This *cannot* be
> guaranteed by SCM_CREDENTIALS on a datagram socket.

It would be the best to give some _solid_ examples of how SCM_PROCINFO
interface can allow additional security issues that are currently not
a problem with procfs approach.  If the setuid application having _UID=0
allows to pass arbitrary messages to stdout it is buggy and needs fixing
already.

I see that with your proposed scheme you would need to explicitly send
the extra information from the application side which would be indeed
more secure from point of view of buggy setuid applications but it will
punish every normal application by having to update its code and getting
lower performance (than with SCM_PROCINFO approach).

Given all this I think that it would be probably much more efficient to
just audit and fix buggy setuid applications instead of converting all
normal applications to the proposed interface (it is probably orders of
magnitude more normal applications using systemd journald than setuid
ones).

Also, Piotr will explain more our (or rather systemd's) use case.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-14 16:43                     ` Bartlomiej Zolnierkiewicz
@ 2014-07-14 17:05                       ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2014-07-14 17:05 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Piotr Wilczek, David Miller, Network Development, Kyungmin Park,
	juho80.son, jkaluza

On Mon, Jul 14, 2014 at 9:43 AM, Bartlomiej Zolnierkiewicz
<b.zolnierkie@samsung.com> wrote:
>
> Hi,
>
> On Friday, July 04, 2014 12:53:19 PM Andy Lutomirski wrote:
>> On Fri, Jul 4, 2014 at 10:58 AM, Bartlomiej Zolnierkiewicz
>> <b.zolnierkie@samsung.com> wrote:
>> > On Friday, July 04, 2014 10:07:24 AM Andy Lutomirski wrote:
>> >> > Then why this should be a problem?  All information obtained through
>> >> > SCM_PROCINFO comes from the kernel not the application itself.
>> >>
>> >> So what?  The information is correct in the sense of correctly
>> >> identifying who called write(2).  The problem is that any code (user
>> >> or kernel) that thinks it cares who called write(2) is *wrong*.  Full
>> >> stop.  That code cares about who intended the message to be send,
>> >> which may or not be the same entity that called write(2).
>> >
>> > Do you mean that the process doing write(2) can be different from the one
>> > intending it?  How is that a problem in our case?  We only care about info
>> > about entity that actually called write(2).  We completely don't care about
>> > the intent in the kernel and any code doing any authorization based on
>> > the obtained information in user-space would be seriously wrong because
>> > the information is stale once it leaves kernel and has only historic value.
>> > Am I missing something?
>>
>> What exactly is your case?  If it's purely for debugging, fine.  But
>> if it's a log and anyone ever wants to think of it as a reliable audit
>> log, this isn't so good.  For example, if I see a log message from
>> _UID=0 saying something important, I am likely to believe that
>> something privileged actually generated that text.  This *cannot* be
>> guaranteed by SCM_CREDENTIALS on a datagram socket.
>
> It would be the best to give some _solid_ examples of how SCM_PROCINFO
> interface can allow additional security issues that are currently not
> a problem with procfs approach.

> If the setuid application having _UID=0
> allows to pass arbitrary messages to stdout it is buggy and needs fixing
> already.

This is, for better or for worse, not true.  I think that essentially
every setuid program on my system allows some degree of control over
what goes to stdout.  Many of them allow nearly complete control.

>
> I see that with your proposed scheme you would need to explicitly send
> the extra information from the application side which would be indeed
> more secure from point of view of buggy setuid applications but it will
> punish every normal application by having to update its code and getting
> lower performance (than with SCM_PROCINFO approach).

What about the existing programs that *don't* want to send this information?

>
> Given all this I think that it would be probably much more efficient to
> just audit and fix buggy setuid applications instead of converting all
> normal applications to the proposed interface (it is probably orders of
> magnitude more normal applications using systemd journald than setuid
> ones).

Having gone through this a couple times with existing security issues,
I guarantee that Linus will disagree with this sentiment.  These
setuid programs are the majority, and I would argue that they aren't
buggy.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-01 23:52   ` Andy Lutomirski
  2014-07-03  6:00     ` Piotr Wilczek
@ 2014-07-21 12:59     ` Piotr Wilczek
  2014-07-22 12:24     ` Piotr Wilczek
  2 siblings, 0 replies; 18+ messages in thread
From: Piotr Wilczek @ 2014-07-21 12:59 UTC (permalink / raw)
  To: netdev
  Cc: Network Development, Kyungmin Park, juho80.son, b.zolnierkie, jkaluza

Hi,

On 07/02/2014 01:52 AM, Andy Lutomirski wrote:
> On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net> wrote:
>> From: Piotr Wilczek <p.wilczek@samsung.com>
>> Date: Thu, 26 Jun 2014 14:55:52 +0200
>>
>>> Server-like processes in many cases need credentials and other
>>> metadata of the peer, to decide if the calling process is allowed to
>>> request a specific action, or the server just wants to log away this
>>> type of information for auditing tasks.
>>>
>>> The current practice to retrieve such process metadata is to look that
>>> information up in procfs with the $PID received over SCM_CREDENTIALS.
>>> This is sufficient for long-running tasks, but introduces a race which
>>> cannot be worked around for short-living processes; the calling
>>> process and all the information in /proc/$PID/ is gone before the
>>> receiver of the socket message can look it up.
>>>
>>> Changes introduced in this patchset can also increase performance
>>> of such server-like processes, because current way of opening and
>>> parsing /proc/$PID/* files is much more expensive than receiving these
>>> metadata using SCM.
>>>
>>> As an example, this patch set improves systemd-journald performance
>>> by about 20%. Generally, performance improvement depends on how heavily
>>> procfs is read the calling process.
>>> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
>>>
>>> This patch set is split in two patches:
>>> - the first adds library to retrive process information without
>>> dependency on procfs.
>>> - the second introduces a new SCM type called SCM_PROCINFO to optionally
>>> allow the direct attaching of process status to SCM.
>>
>> I really would like someone smarter than me to review the security
>> implications et al. of these changes before I apply them.
>>
>> Andy?  Maybe you have an opinion?
>>
>
> The degree to which I dislike this depends on what "procinfo" is.
>
>  From the patch, it looks like procinfo includes cmdline.  In which
> case, NAK in almost the strongest possible terms.  (If it includes
> userspace addresses, then I'll upgrade that to the strongest possible
> terms.)  This is a giant information leak.  Imagine what happens when
> a privileged program gets one of these things as stdout.  Keep in mind
> that procfs has an option to hide pids that belong to other users.
>
> Even if procinfo were made relatively innocuous, this is just an
> extension of a bad interface.  If you all really want a way to
> efficiently pass kernel-verified information through a unix socket,
> then add something reasonable.  This means:
>
> 1. It needs to be opt-in, *per send*.  After all of the
> vulnerabilities that have happened due to write(2) capturing
> unexpected credentials of the caller, there's just no excuse for
> adding new examples of this.  Yes, this will prevent you from getting
> a magic speed-up with legacy clients, but I've spent long enough
> battling this sh*t from a security and correctness POV to care.  Fix
> the clients.
>
> 2. It should be easily extensible, because people keep wanting to add
> new things here.
>
> Here's a proposal for how it could work:
>
> Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
> flags, value) where type is the type of verified information and value
> is what the sender wants to send.
>
> Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
> returns -EPERM.
> SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
> Any other flags: -EINVAL.
>
> This could be in CMSG format or in nlattr format.  Or it could be one
> cmsg per value (which might be easier).
>
> Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
> all of the sent values that have recognized types and pass
> verification.
>
> For stream sockets, presumably the sender would send an
> scm_verified_info using setsockopt and the receiver would grab it
> using getsockopt.
>
> Types could include:
>
> SCMVI_UID
> SCMVI_GID
> SCMVI_PID (racy, but still better than SCM_CREDENTIALS)
> SCMVI_LOGONUID (but only if CONFIG_AUDITSYSCALL, and someone needs to
> figure out wtf its semantics are and *document* it before I'd ack such
> a thing)
> SCMVI_CGROUP (sigh)
> SCMVI_AUDIT_SESSIONID (because sessionid is an awful name)
> etc.
>
> I *might* get around to implementing this, but don't expect anything
> soon -- my current kernel hacking bandwidth is well-occupied by
> seccomp.  But I'd be happy to review.
>
> --Andy
>

Thank you for comments. We do understand security issues of that patch. 
For other reasons, the patch is important for us. To speed things up, 
would you consider cooperation in implementing your SCM_IDENTITY?

Piotr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-01 23:52   ` Andy Lutomirski
  2014-07-03  6:00     ` Piotr Wilczek
  2014-07-21 12:59     ` Piotr Wilczek
@ 2014-07-22 12:24     ` Piotr Wilczek
  2014-07-23 20:49       ` Andy Lutomirski
  2 siblings, 1 reply; 18+ messages in thread
From: Piotr Wilczek @ 2014-07-22 12:24 UTC (permalink / raw)
  To: Andy Lutomirski, David Miller
  Cc: Network Development, Kyungmin Park, juho80.son, b.zolnierkie, jkaluza

Hi Andy,

On 07/02/2014 01:52 AM, Andy Lutomirski wrote:
> On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net> wrote:
>> From: Piotr Wilczek <p.wilczek@samsung.com>
>> Date: Thu, 26 Jun 2014 14:55:52 +0200
>>
>>> Server-like processes in many cases need credentials and other
>>> metadata of the peer, to decide if the calling process is allowed to
>>> request a specific action, or the server just wants to log away this
>>> type of information for auditing tasks.
>>>
>>> The current practice to retrieve such process metadata is to look that
>>> information up in procfs with the $PID received over SCM_CREDENTIALS.
>>> This is sufficient for long-running tasks, but introduces a race which
>>> cannot be worked around for short-living processes; the calling
>>> process and all the information in /proc/$PID/ is gone before the
>>> receiver of the socket message can look it up.
>>>
>>> Changes introduced in this patchset can also increase performance
>>> of such server-like processes, because current way of opening and
>>> parsing /proc/$PID/* files is much more expensive than receiving these
>>> metadata using SCM.
>>>
>>> As an example, this patch set improves systemd-journald performance
>>> by about 20%. Generally, performance improvement depends on how heavily
>>> procfs is read the calling process.
>>> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
>>>
>>> This patch set is split in two patches:
>>> - the first adds library to retrive process information without
>>> dependency on procfs.
>>> - the second introduces a new SCM type called SCM_PROCINFO to optionally
>>> allow the direct attaching of process status to SCM.
>>
>> I really would like someone smarter than me to review the security
>> implications et al. of these changes before I apply them.
>>
>> Andy?  Maybe you have an opinion?
>>
>
> The degree to which I dislike this depends on what "procinfo" is.
>
>  From the patch, it looks like procinfo includes cmdline.  In which
> case, NAK in almost the strongest possible terms.  (If it includes
> userspace addresses, then I'll upgrade that to the strongest possible
> terms.)  This is a giant information leak.  Imagine what happens when
> a privileged program gets one of these things as stdout.  Keep in mind
> that procfs has an option to hide pids that belong to other users.
>
> Even if procinfo were made relatively innocuous, this is just an
> extension of a bad interface.  If you all really want a way to
> efficiently pass kernel-verified information through a unix socket,
> then add something reasonable.  This means:
>
> 1. It needs to be opt-in, *per send*.  After all of the
> vulnerabilities that have happened due to write(2) capturing
> unexpected credentials of the caller, there's just no excuse for
> adding new examples of this.  Yes, this will prevent you from getting
> a magic speed-up with legacy clients, but I've spent long enough
> battling this sh*t from a security and correctness POV to care.  Fix
> the clients.
>
> 2. It should be easily extensible, because people keep wanting to add
> new things here.
>
> Here's a proposal for how it could work:
>
> Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
> flags, value) where type is the type of verified information and value
> is what the sender wants to send.
>
> Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
> returns -EPERM.
> SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
> Any other flags: -EINVAL.
>
> This could be in CMSG format or in nlattr format.  Or it could be one
> cmsg per value (which might be easier).
>
> Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
> all of the sent values that have recognized types and pass
> verification.
>
> For stream sockets, presumably the sender would send an
> scm_verified_info using setsockopt and the receiver would grab it
> using getsockopt.
>
> Types could include:
>
> SCMVI_UID
> SCMVI_GID
> SCMVI_PID (racy, but still better than SCM_CREDENTIALS)
> SCMVI_LOGONUID (but only if CONFIG_AUDITSYSCALL, and someone needs to
> figure out wtf its semantics are and *document* it before I'd ack such
> a thing)
> SCMVI_CGROUP (sigh)
> SCMVI_AUDIT_SESSIONID (because sessionid is an awful name)
> etc.
>
> I *might* get around to implementing this, but don't expect anything
> soon -- my current kernel hacking bandwidth is well-occupied by
> seccomp.  But I'd be happy to review.
>
> --Andy
>

Thank you for comments. We do understand security issues of that patch. 
For other reasons, the patch is important for us. To speed things up, 
would you consider cooperation in implementing your SCM_IDENTITY?

Piotr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V2 0/2] send process status in SCM_PROCINFO
  2014-07-22 12:24     ` Piotr Wilczek
@ 2014-07-23 20:49       ` Andy Lutomirski
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Lutomirski @ 2014-07-23 20:49 UTC (permalink / raw)
  To: Piotr Wilczek, Serge Hallyn
  Cc: David Miller, Network Development, Kyungmin Park, juho80.son,
	Bartlomiej Zolnierkiewicz, jkaluza

On Tue, Jul 22, 2014 at 5:24 AM, Piotr Wilczek <p.wilczek@samsung.com> wrote:
> Hi Andy,
>
>
> On 07/02/2014 01:52 AM, Andy Lutomirski wrote:
>>
>> On Tue, Jul 1, 2014 at 4:31 PM, David Miller <davem@davemloft.net> wrote:
>>>
>>> From: Piotr Wilczek <p.wilczek@samsung.com>
>>> Date: Thu, 26 Jun 2014 14:55:52 +0200
>>>
>>>> Server-like processes in many cases need credentials and other
>>>> metadata of the peer, to decide if the calling process is allowed to
>>>> request a specific action, or the server just wants to log away this
>>>> type of information for auditing tasks.
>>>>
>>>> The current practice to retrieve such process metadata is to look that
>>>> information up in procfs with the $PID received over SCM_CREDENTIALS.
>>>> This is sufficient for long-running tasks, but introduces a race which
>>>> cannot be worked around for short-living processes; the calling
>>>> process and all the information in /proc/$PID/ is gone before the
>>>> receiver of the socket message can look it up.
>>>>
>>>> Changes introduced in this patchset can also increase performance
>>>> of such server-like processes, because current way of opening and
>>>> parsing /proc/$PID/* files is much more expensive than receiving these
>>>> metadata using SCM.
>>>>
>>>> As an example, this patch set improves systemd-journald performance
>>>> by about 20%. Generally, performance improvement depends on how heavily
>>>> procfs is read the calling process.
>>>> http://comments.gmane.org/gmane.comp.sysutils.systemd.devel/19467
>>>>
>>>> This patch set is split in two patches:
>>>> - the first adds library to retrive process information without
>>>> dependency on procfs.
>>>> - the second introduces a new SCM type called SCM_PROCINFO to optionally
>>>> allow the direct attaching of process status to SCM.
>>>
>>>
>>> I really would like someone smarter than me to review the security
>>> implications et al. of these changes before I apply them.
>>>
>>> Andy?  Maybe you have an opinion?
>>>
>>
>> The degree to which I dislike this depends on what "procinfo" is.
>>
>>  From the patch, it looks like procinfo includes cmdline.  In which
>> case, NAK in almost the strongest possible terms.  (If it includes
>> userspace addresses, then I'll upgrade that to the strongest possible
>> terms.)  This is a giant information leak.  Imagine what happens when
>> a privileged program gets one of these things as stdout.  Keep in mind
>> that procfs has an option to hide pids that belong to other users.
>>
>> Even if procinfo were made relatively innocuous, this is just an
>> extension of a bad interface.  If you all really want a way to
>> efficiently pass kernel-verified information through a unix socket,
>> then add something reasonable.  This means:
>>
>> 1. It needs to be opt-in, *per send*.  After all of the
>> vulnerabilities that have happened due to write(2) capturing
>> unexpected credentials of the caller, there's just no excuse for
>> adding new examples of this.  Yes, this will prevent you from getting
>> a magic speed-up with legacy clients, but I've spent long enough
>> battling this sh*t from a security and correctness POV to care.  Fix
>> the clients.
>>
>> 2. It should be easily extensible, because people keep wanting to add
>> new things here.
>>
>> Here's a proposal for how it could work:
>>
>> Senders can send SCM_VERIFIED_INFO.  The payload is a list of (type,
>> flags, value) where type is the type of verified information and value
>> is what the sender wants to send.
>>
>> Flags can include SCMVI_FLAG_MANDATORY: failure to send this attribute
>> returns -EPERM.
>> SCMVI_FLAG_AUTO: value must be blank; the kernel will fill it in.
>> Any other flags: -EINVAL.
>>
>> This could be in CMSG format or in nlattr format.  Or it could be one
>> cmsg per value (which might be easier).
>>
>> Receivers can set SO_PASS_VERIFIED_INFO.  If so, then they'll receive
>> all of the sent values that have recognized types and pass
>> verification.
>>
>> For stream sockets, presumably the sender would send an
>> scm_verified_info using setsockopt and the receiver would grab it
>> using getsockopt.
>>
>> Types could include:
>>
>> SCMVI_UID
>> SCMVI_GID
>> SCMVI_PID (racy, but still better than SCM_CREDENTIALS)
>> SCMVI_LOGONUID (but only if CONFIG_AUDITSYSCALL, and someone needs to
>> figure out wtf its semantics are and *document* it before I'd ack such
>> a thing)
>> SCMVI_CGROUP (sigh)
>> SCMVI_AUDIT_SESSIONID (because sessionid is an awful name)
>> etc.
>>
>> I *might* get around to implementing this, but don't expect anything
>> soon -- my current kernel hacking bandwidth is well-occupied by
>> seccomp.  But I'd be happy to review.
>>
>> --Andy
>>
>
> Thank you for comments. We do understand security issues of that patch. For
> other reasons, the patch is important for us. To speed things up, would you
> consider cooperation in implementing your SCM_IDENTITY?

Sure.

There's some code here:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=scm_identity

It's very much a work-in-progress.  Want to take a look and see if
it's a good place to start from?

--Andy

>
> Piotr
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-07-23 20:49 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-26 12:55 [PATCH net-next V2 0/2] send process status in SCM_PROCINFO Piotr Wilczek
2014-06-26 12:55 ` [PATCH net-next V2 1/2] lib:proc_info:add library to get process information Piotr Wilczek
2014-06-26 12:55 ` [PATCH net-next V2 2/2] Send process status in SCM_PROCINFO Piotr Wilczek
2014-07-01 23:31 ` [PATCH net-next V2 0/2] send " David Miller
2014-07-01 23:52   ` Andy Lutomirski
2014-07-03  6:00     ` Piotr Wilczek
2014-07-03 16:14       ` Andy Lutomirski
2014-07-04 13:26         ` Bartlomiej Zolnierkiewicz
2014-07-04 15:15           ` Andy Lutomirski
2014-07-04 16:55             ` Bartlomiej Zolnierkiewicz
2014-07-04 17:07               ` Andy Lutomirski
2014-07-04 17:58                 ` Bartlomiej Zolnierkiewicz
2014-07-04 19:53                   ` Andy Lutomirski
2014-07-14 16:43                     ` Bartlomiej Zolnierkiewicz
2014-07-14 17:05                       ` Andy Lutomirski
2014-07-21 12:59     ` Piotr Wilczek
2014-07-22 12:24     ` Piotr Wilczek
2014-07-23 20:49       ` Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).