linux-kernel.vger.kernel.org archive mirror
* [patch 0/4] [patch 0/4] A pile in c/r sake v2
@ 2012-02-03 15:19 Cyrill Gorcunov
  2012-02-03 15:19 ` [patch 1/4] fs, proc: Introduce /proc/<pid>/task/<tid>/children entry v9 Cyrill Gorcunov
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2012-02-03 15:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Eric W. Biederman, Pavel Emelyanov,
	KOSAKI Motohiro, Ingo Molnar, H. Peter Anvin

Hi, I hope this time I've addressed more or less all concerns,
but please tell me if something is still not good enough.

 - sys_kcmp now depends on CONFIG_CHECKPOINT_RESTORE
   (Ingo, I suppose arch_initcall for kcmp_cookies_init
    should be better than late_initcall, but I'm not
    sure)

 - the extension of /proc/pid/stat is now done against
   linux-next/master

Please review, thanks,

	Cyrill


* [patch 1/4] fs, proc: Introduce /proc/<pid>/task/<tid>/children entry v9
  2012-02-03 15:19 [patch 0/4] [patch 0/4] A pile in c/r sake v2 Cyrill Gorcunov
@ 2012-02-03 15:19 ` Cyrill Gorcunov
  2012-02-03 15:19 ` [patch 2/4] syscalls, x86: Add __NR_kcmp syscall v7 Cyrill Gorcunov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2012-02-03 15:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Eric W. Biederman, Pavel Emelyanov,
	KOSAKI Motohiro, Ingo Molnar, H. Peter Anvin, Cyrill Gorcunov,
	Pavel Emelyanov, Serge Hallyn, KAMEZAWA Hiroyuki

[-- Attachment #1: fs-proc-tid-children-13 --]
[-- Type: text/plain, Size: 7087 bytes --]

When we checkpoint a task we need to know the list of children the
task has, but there is no easy and fast way to generate the reverse
parent->children chain from an arbitrary <pid> (while the parent pid is
provided in the "PPid" field of /proc/<pid>/status).

So instead of walking over all pids in the system (creating one big process
tree in memory, just to figure out which children a task has) we add an
explicit /proc/<pid>/task/<tid>/children entry, because the kernel already has
this kind of information but it is not yet exported.

Only first level children are listed, not the whole process tree.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/filesystems/proc.txt |   18 +++++
 fs/proc/array.c                    |  123 +++++++++++++++++++++++++++++++++++++
 fs/proc/base.c                     |    3 
 fs/proc/internal.h                 |    1 
 4 files changed, 145 insertions(+)

Index: linux-2.6.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.git.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.git/Documentation/filesystems/proc.txt
@@ -40,6 +40,7 @@ Table of Contents
   3.4	/proc/<pid>/coredump_filter - Core dump filtering settings
   3.5	/proc/<pid>/mountinfo - Information about mounts
   3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
+  3.7	/proc/<pid>/task/<tid>/children - Information about task children
 
   4	Configuring procfs
   4.1	Mount options
@@ -1549,6 +1550,23 @@ then the kernel's TASK_COMM_LEN (current
 comm value.
 
 
+3.7	/proc/<pid>/task/<tid>/children - Information about task children
+-------------------------------------------------------------------------
+This file provides a fast way to retrieve the pids of the first level
+children of the task identified by the <pid>/<tid> pair. The format is a
+space separated stream of pids.
+
+Note the "first level" here -- if a child has children of its own, they
+are not listed here; one needs to read /proc/<children-pid>/task/<tid>/children
+to obtain the descendants.
+
+Since this interface is intended to be fast and cheap, it doesn't
+guarantee precise results and some children might be
+skipped, especially if they've exited right after we printed their
+pids, so one needs to either stop or freeze the processes being inspected
+if precise results are needed.
+
+
 ------------------------------------------------------------------------------
 Configuring procfs
 ------------------------------------------------------------------------------
Index: linux-2.6.git/fs/proc/array.c
===================================================================
--- linux-2.6.git.orig/fs/proc/array.c
+++ linux-2.6.git/fs/proc/array.c
@@ -556,3 +556,126 @@ int proc_pid_statm(struct seq_file *m, s
 
 	return 0;
 }
+
+#ifdef CONFIG_CHECKPOINT_RESTORE
+static struct pid *
+get_children_pid(struct inode *inode, struct pid *pid_prev, loff_t pos)
+{
+	struct task_struct *start, *task;
+	struct pid *pid = NULL;
+
+	read_lock(&tasklist_lock);
+
+	start = pid_task(proc_pid(inode), PIDTYPE_PID);
+	if (!start)
+		goto out;
+
+	/*
+	 * Let's try to continue searching first, this gives
+	 * us a significant speedup on children-rich processes.
+	 */
+	if (pid_prev) {
+		task = pid_task(pid_prev, PIDTYPE_PID);
+		if (task && task->real_parent == start &&
+		    !(list_empty(&task->sibling))) {
+			if (list_is_last(&task->sibling, &start->children))
+				goto out;
+			task = list_first_entry(&task->sibling,
+						struct task_struct, sibling);
+			pid = get_pid(task_pid(task));
+			goto out;
+		}
+	}
+
+	/*
+	 * Slow search case.
+	 *
+	 * We might miss some children here if children
+	 * have exited while we were not holding the lock,
+	 * but it was never promised to be accurate that
+	 * much.
+	 *
+	 * "Just suppose that the parent sleeps, but N children
+	 *  exit after we printed their tids. Now the slow paths
+	 *  skips N extra children, we miss N tasks." (c)
+	 *
+	 * So one needs to stop or freeze the leader and all
+	 * its children to get a precise result.
+	 */
+	list_for_each_entry(task, &start->children, sibling) {
+		if (pos-- == 0) {
+			pid = get_pid(task_pid(task));
+			break;
+		}
+	}
+
+out:
+	read_unlock(&tasklist_lock);
+	return pid;
+}
+
+static int children_seq_show(struct seq_file *seq, void *v)
+{
+	struct inode *inode = seq->private;
+	pid_t pid;
+
+	pid = pid_nr_ns(v, inode->i_sb->s_fs_info);
+	return seq_printf(seq, "%d ", pid);
+}
+
+static void *children_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	return get_children_pid(seq->private, NULL, *pos);
+}
+
+static void *children_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct pid *pid;
+
+	pid = get_children_pid(seq->private, v, *pos + 1);
+	put_pid(v);
+
+	++*pos;
+	return pid;
+}
+
+static void children_seq_stop(struct seq_file *seq, void *v)
+{
+	put_pid(v);
+}
+
+static const struct seq_operations children_seq_ops = {
+	.start	= children_seq_start,
+	.next	= children_seq_next,
+	.stop	= children_seq_stop,
+	.show	= children_seq_show,
+};
+
+static int children_seq_open(struct inode *inode, struct file *file)
+{
+	struct seq_file *m;
+	int ret;
+
+	ret = seq_open(file, &children_seq_ops);
+	if (ret)
+		return ret;
+
+	m = file->private_data;
+	m->private = inode;
+
+	return ret;
+}
+
+static int children_seq_release(struct inode *inode, struct file *file)
+{
+	seq_release(inode, file);
+	return 0;
+}
+
+const struct file_operations proc_tid_children_operations = {
+	.open    = children_seq_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = children_seq_release,
+};
+#endif /* CONFIG_CHECKPOINT_RESTORE */
Index: linux-2.6.git/fs/proc/base.c
===================================================================
--- linux-2.6.git.orig/fs/proc/base.c
+++ linux-2.6.git/fs/proc/base.c
@@ -3350,6 +3350,9 @@ static const struct pid_entry tid_base_s
 	ONE("stat",      S_IRUGO, proc_tid_stat),
 	ONE("statm",     S_IRUGO, proc_pid_statm),
 	REG("maps",      S_IRUGO, proc_maps_operations),
+#ifdef CONFIG_CHECKPOINT_RESTORE
+	REG("children",  S_IRUGO, proc_tid_children_operations),
+#endif
 #ifdef CONFIG_NUMA
 	REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
 #endif
Index: linux-2.6.git/fs/proc/internal.h
===================================================================
--- linux-2.6.git.orig/fs/proc/internal.h
+++ linux-2.6.git/fs/proc/internal.h
@@ -53,6 +53,7 @@ extern int proc_pid_statm(struct seq_fil
 				struct pid *pid, struct task_struct *task);
 extern loff_t mem_lseek(struct file *file, loff_t offset, int orig);
 
+extern const struct file_operations proc_tid_children_operations;
 extern const struct file_operations proc_maps_operations;
 extern const struct file_operations proc_numa_maps_operations;
 extern const struct file_operations proc_smaps_operations;



* [patch 2/4] syscalls, x86: Add __NR_kcmp syscall v7
  2012-02-03 15:19 [patch 0/4] [patch 0/4] A pile in c/r sake v2 Cyrill Gorcunov
  2012-02-03 15:19 ` [patch 1/4] fs, proc: Introduce /proc/<pid>/task/<tid>/children entry v9 Cyrill Gorcunov
@ 2012-02-03 15:19 ` Cyrill Gorcunov
  2012-02-03 15:19 ` [patch 3/4] c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat Cyrill Gorcunov
  2012-02-03 15:19 ` [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries Cyrill Gorcunov
  3 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2012-02-03 15:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Eric W. Biederman, Pavel Emelyanov,
	KOSAKI Motohiro, Ingo Molnar, H. Peter Anvin, Cyrill Gorcunov,
	Pavel Emelyanov, Andrey Vagin, KOSAKI Motohiro, Thomas Gleixner,
	Glauber Costa, Andi Kleen, Tejun Heo, Matt Helsley, Pekka Enberg,
	Eric Dumazet, Vasiliy Kulikov, Alexey Dobriyan, Valdis.Kletnieks,
	Michal Marek

[-- Attachment #1: sys-kcmp-14 --]
[-- Type: text/plain, Size: 12018 bytes --]

While doing checkpoint-restore in user space one needs to determine
whether various kernel objects (like mm_struct-s or file_struct-s) are shared
between tasks and restore this state.

The 2nd step can be solved by using appropriate CLONE_ flags and the unshare
syscall, while there is currently no way to solve the 1st one.

One of the ways of checking whether two tasks share e.g. an mm_struct is to
provide some mm_struct ID of a task in its proc file, but showing such
info is considered to be not that good for security reasons.

Thus after some debate we came to the conclusion that a so-called
'comparison' syscall might be the best candidate. So here it is --
__NR_kcmp.

It takes up to 5 arguments: the pids of the two tasks (whose
characteristics should be compared), the comparison type and
(when comparing files) two file descriptors.

Lookups for pids are done in the caller's PID namespace only.

At the moment only x86 is supported and tested.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: "Eric W. Biederman" <ebiederm@xmission.com>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Andrey Vagin <avagin@openvz.org>
CC: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Glauber Costa <glommer@parallels.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Tejun Heo <tj@kernel.org>
CC: Matt Helsley <matthltc@us.ibm.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Valdis.Kletnieks@vt.edu
CC: Michal Marek <mmarek@suse.cz>
---
 arch/x86/syscalls/syscall_32.tbl         |    1 
 arch/x86/syscalls/syscall_64.tbl         |    1 
 include/linux/kcmp.h                     |   17 +++
 include/linux/syscalls.h                 |    2 
 kernel/Makefile                          |    3 
 kernel/kcmp.c                            |  158 +++++++++++++++++++++++++++++++
 kernel/sys_ni.c                          |    3 
 tools/testing/selftests/kcmp/Makefile    |   36 +++++++
 tools/testing/selftests/kcmp/kcmp_test.c |   84 ++++++++++++++++
 tools/testing/selftests/run_tests        |    2 
 10 files changed, 306 insertions(+), 1 deletion(-)

Index: linux-2.6.git/arch/x86/syscalls/syscall_32.tbl
===================================================================
--- linux-2.6.git.orig/arch/x86/syscalls/syscall_32.tbl
+++ linux-2.6.git/arch/x86/syscalls/syscall_32.tbl
@@ -355,3 +355,4 @@
 346	i386	setns			sys_setns
 347	i386	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
 348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+349	i386	kcmp			sys_kcmp
Index: linux-2.6.git/arch/x86/syscalls/syscall_64.tbl
===================================================================
--- linux-2.6.git.orig/arch/x86/syscalls/syscall_64.tbl
+++ linux-2.6.git/arch/x86/syscalls/syscall_64.tbl
@@ -318,3 +318,4 @@
 309	64	getcpu			sys_getcpu
 310	64	process_vm_readv	sys_process_vm_readv
 311	64	process_vm_writev	sys_process_vm_writev
+312	64	kcmp			sys_kcmp
Index: linux-2.6.git/include/linux/kcmp.h
===================================================================
--- /dev/null
+++ linux-2.6.git/include/linux/kcmp.h
@@ -0,0 +1,17 @@
+#ifndef _LINUX_KCMP_H
+#define _LINUX_KCMP_H
+
+/* Comparison type */
+enum kcmp_type {
+	KCMP_FILE,
+	KCMP_VM,
+	KCMP_FILES,
+	KCMP_FS,
+	KCMP_SIGHAND,
+	KCMP_IO,
+	KCMP_SYSVSEM,
+
+	KCMP_TYPES,
+};
+
+#endif /* _LINUX_KCMP_H */
Index: linux-2.6.git/include/linux/syscalls.h
===================================================================
--- linux-2.6.git.orig/include/linux/syscalls.h
+++ linux-2.6.git/include/linux/syscalls.h
@@ -857,4 +857,6 @@ asmlinkage long sys_process_vm_writev(pi
 				      unsigned long riovcnt,
 				      unsigned long flags);
 
+asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
+			 unsigned long idx1, unsigned long idx2);
 #endif
Index: linux-2.6.git/kernel/Makefile
===================================================================
--- linux-2.6.git.orig/kernel/Makefile
+++ linux-2.6.git/kernel/Makefile
@@ -25,6 +25,9 @@ endif
 obj-y += sched/
 obj-y += power/
 
+ifeq ($(CONFIG_CHECKPOINT_RESTORE),y)
+obj-$(CONFIG_X86) += kcmp.o
+endif
 obj-$(CONFIG_FREEZER) += freezer.o
 obj-$(CONFIG_PROFILING) += profile.o
 obj-$(CONFIG_SYSCTL_SYSCALL_CHECK) += sysctl_check.o
Index: linux-2.6.git/kernel/kcmp.c
===================================================================
--- /dev/null
+++ linux-2.6.git/kernel/kcmp.c
@@ -0,0 +1,158 @@
+#include <linux/kernel.h>
+#include <linux/syscalls.h>
+#include <linux/fdtable.h>
+#include <linux/string.h>
+#include <linux/random.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/cache.h>
+#include <linux/bug.h>
+#include <linux/err.h>
+#include <linux/kcmp.h>
+
+#include <asm/unistd.h>
+
+/*
+ * We don't expose real in-memory order of objects for security
+ * reasons, still the comparison results should be suitable for
+ * sorting. Thus, we obfuscate kernel pointer values and compare
+ * the products instead.
+ */
+static unsigned long cookies[KCMP_TYPES][2] __read_mostly;
+
+static long kptr_obfuscate(long v, int type)
+{
+	return (v ^ cookies[type][0]) * cookies[type][1];
+}
+
+/*
+ * 0 - equal, i.e. v1 = v2
+ * 1 - less than, i.e. v1 < v2
+ * 2 - greater than, i.e. v1 > v2
+ * 3 - not equal but ordering unavailable (reserved for future)
+ */
+static int kcmp_ptr(void *v1, void *v2, enum kcmp_type type)
+{
+	long ret;
+
+	ret = kptr_obfuscate((long)v1, type) - kptr_obfuscate((long)v2, type);
+
+	return (ret < 0) | ((ret > 0) << 1);
+}
+
+/* The caller must have pinned the task */
+static struct file *
+get_file_raw_ptr(struct task_struct *task, unsigned int idx)
+{
+	struct fdtable *fdt;
+	struct file *file;
+
+	spin_lock(&task->files->file_lock);
+	fdt = files_fdtable(task->files);
+	if (idx < fdt->max_fds)
+		file = fdt->fd[idx];
+	else
+		file = NULL;
+	spin_unlock(&task->files->file_lock);
+
+	return file;
+}
+
+SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
+		unsigned long, idx1, unsigned long, idx2)
+{
+	struct task_struct *task1, *task2;
+	int ret;
+
+	rcu_read_lock();
+
+	/*
+	 * Tasks are looked up in caller's PID namespace only.
+	 */
+	task1 = find_task_by_vpid(pid1);
+	task2 = find_task_by_vpid(pid2);
+	if (!task1 || !task2)
+		goto err_no_task;
+
+	get_task_struct(task1);
+	get_task_struct(task2);
+
+	rcu_read_unlock();
+
+	/*
+	 * One should have enough rights to inspect task details.
+	 */
+	if (!ptrace_may_access(task1, PTRACE_MODE_READ) ||
+	    !ptrace_may_access(task2, PTRACE_MODE_READ)) {
+		ret = -EACCES;
+		goto err;
+	}
+
+	switch (type) {
+	case KCMP_FILE: {
+		struct file *filp1, *filp2;
+
+		filp1 = get_file_raw_ptr(task1, idx1);
+		filp2 = get_file_raw_ptr(task2, idx2);
+
+		if (filp1 && filp2)
+			ret = kcmp_ptr(filp1, filp2, KCMP_FILE);
+		else
+			ret = -EBADF;
+		break;
+	}
+	case KCMP_VM:
+		ret = kcmp_ptr(task1->mm, task2->mm, KCMP_VM);
+		break;
+	case KCMP_FILES:
+		ret = kcmp_ptr(task1->files, task2->files, KCMP_FILES);
+		break;
+	case KCMP_FS:
+		ret = kcmp_ptr(task1->fs, task2->fs, KCMP_FS);
+		break;
+	case KCMP_SIGHAND:
+		ret = kcmp_ptr(task1->sighand, task2->sighand, KCMP_SIGHAND);
+		break;
+	case KCMP_IO:
+		ret = kcmp_ptr(task1->io_context, task2->io_context, KCMP_IO);
+		break;
+	case KCMP_SYSVSEM:
+#ifdef CONFIG_SYSVIPC
+		ret = kcmp_ptr(task1->sysvsem.undo_list,
+			       task2->sysvsem.undo_list,
+			       KCMP_SYSVSEM);
+#else
+		ret = -EOPNOTSUPP;
+#endif
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+err:
+	put_task_struct(task1);
+	put_task_struct(task2);
+
+	return ret;
+
+err_no_task:
+	rcu_read_unlock();
+	return -ESRCH;
+}
+
+static __init int kcmp_cookies_init(void)
+{
+	const int size = sizeof(cookies[0][0]);
+	int i, j;
+
+	for (i = 0; i < KCMP_TYPES; i++) {
+		for (j = 0; j < 2; j++)
+			get_random_bytes(&cookies[i][j], size);
+
+		cookies[i][1] |= (~(~0UL >> 1) | 1);
+	}
+
+	return 0;
+}
+arch_initcall(kcmp_cookies_init);
Index: linux-2.6.git/kernel/sys_ni.c
===================================================================
--- linux-2.6.git.orig/kernel/sys_ni.c
+++ linux-2.6.git/kernel/sys_ni.c
@@ -203,3 +203,6 @@ cond_syscall(sys_fanotify_mark);
 cond_syscall(sys_name_to_handle_at);
 cond_syscall(sys_open_by_handle_at);
 cond_syscall(compat_sys_open_by_handle_at);
+
+/* compare kernel pointers */
+cond_syscall(sys_kcmp);
Index: linux-2.6.git/tools/testing/selftests/kcmp/Makefile
===================================================================
--- /dev/null
+++ linux-2.6.git/tools/testing/selftests/kcmp/Makefile
@@ -0,0 +1,36 @@
+ifeq ($(strip $(V)),)
+	E = @echo
+	Q = @
+else
+	E = @\#
+	Q =
+endif
+export E Q
+
+uname_M := $(shell uname -m 2>/dev/null || echo not)
+ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/)
+ifeq ($(ARCH),i386)
+	ARCH := X86
+	CFLAGS := -DCONFIG_X86_32 -D__i386__
+endif
+ifeq ($(ARCH),x86_64)
+	ARCH := X86
+	CFLAGS := -DCONFIG_X86_64 -D__x86_64__
+endif
+
+CFLAGS += -I../../../../arch/x86/include/generated/
+CFLAGS += -I../../../../include/
+CFLAGS += -I../../../../usr/include/
+
+all:
+ifeq ($(ARCH),X86)
+	$(E) "  CC run_test"
+	$(Q) gcc $(CFLAGS) kcmp_test.c -o run_test
+else
+	$(E) "Not an x86 target, can't build kcmp selftest"
+endif
+
+clean:
+	$(E) "  CLEAN"
+	$(Q) rm -fr ./run_test
+	$(Q) rm -fr ./kcmp-test-file
Index: linux-2.6.git/tools/testing/selftests/kcmp/kcmp_test.c
===================================================================
--- /dev/null
+++ linux-2.6.git/tools/testing/selftests/kcmp/kcmp_test.c
@@ -0,0 +1,84 @@
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <limits.h>
+#include <unistd.h>
+#include <errno.h>
+#include <string.h>
+#include <fcntl.h>
+
+#include <linux/unistd.h>
+#include <linux/kcmp.h>
+
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+
+static int sys_kcmp(int pid1, int pid2, int type, int fd1, int fd2)
+{
+	return syscall(__NR_kcmp, pid1, pid2, type, fd1, fd2);
+}
+
+int main(int argc, char **argv)
+{
+	const char kpath[] = "kcmp-test-file";
+	int pid1, pid2;
+	int fd1, fd2;
+	int status;
+
+	fd1 = open(kpath, O_RDWR | O_CREAT | O_TRUNC, 0644);
+	pid1 = getpid();
+
+	if (fd1 < 0) {
+		perror("Can't create file");
+		exit(1);
+	}
+
+	pid2 = fork();
+	if (pid2 < 0) {
+		perror("fork failed");
+		exit(1);
+	}
+
+	if (!pid2) {
+		int pid2 = getpid();
+		int ret;
+
+		fd2 = open(kpath, O_RDWR, 0644);
+		if (fd2 < 0) {
+			perror("Can't open file");
+			exit(1);
+		}
+
+		/* An example of output and arguments */
+		printf("pid1: %6d pid2: %6d FD: %2d FILES: %2d VM: %2d FS: %2d "
+		       "SIGHAND: %2d IO: %2d SYSVSEM: %2d INV: %2d\n",
+		       pid1, pid2,
+		       sys_kcmp(pid1, pid2, KCMP_FILE,		fd1, fd2),
+		       sys_kcmp(pid1, pid2, KCMP_FILES,		0, 0),
+		       sys_kcmp(pid1, pid2, KCMP_VM,		0, 0),
+		       sys_kcmp(pid1, pid2, KCMP_FS,		0, 0),
+		       sys_kcmp(pid1, pid2, KCMP_SIGHAND,	0, 0),
+		       sys_kcmp(pid1, pid2, KCMP_IO,		0, 0),
+		       sys_kcmp(pid1, pid2, KCMP_SYSVSEM,	0, 0),
+
+			/* This one should fail */
+		       sys_kcmp(pid1, pid2, KCMP_TYPES + 1,	0, 0));
+
+		/* Comparing fd1 with itself must report equality (0) */
+		ret = sys_kcmp(pid1, pid2, KCMP_FILE, fd1, fd1);
+		if (ret) {
+			printf("FAIL: 0 expected but %d returned\n", ret);
+			ret = -1;
+		} else
+			printf("PASS: 0 returned as expected\n");
+		exit(ret);
+	}
+
+	waitpid(pid2, &status, 0);
+
+	return 0;
+}
Index: linux-2.6.git/tools/testing/selftests/run_tests
===================================================================
--- linux-2.6.git.orig/tools/testing/selftests/run_tests
+++ linux-2.6.git/tools/testing/selftests/run_tests
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-TARGETS=breakpoints
+TARGETS="breakpoints kcmp"
 
 for TARGET in $TARGETS
 do



* [patch 3/4] c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
  2012-02-03 15:19 [patch 0/4] [patch 0/4] A pile in c/r sake v2 Cyrill Gorcunov
  2012-02-03 15:19 ` [patch 1/4] fs, proc: Introduce /proc/<pid>/task/<tid>/children entry v9 Cyrill Gorcunov
  2012-02-03 15:19 ` [patch 2/4] syscalls, x86: Add __NR_kcmp syscall v7 Cyrill Gorcunov
@ 2012-02-03 15:19 ` Cyrill Gorcunov
  2012-02-03 16:37   ` Kees Cook
  2012-02-03 15:19 ` [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries Cyrill Gorcunov
  3 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2012-02-03 15:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Eric W. Biederman, Pavel Emelyanov,
	KOSAKI Motohiro, Ingo Molnar, H. Peter Anvin, Cyrill Gorcunov,
	Kees Cook, Pavel Emelyanov, Serge Hallyn, KAMEZAWA Hiroyuki,
	Alexey Dobriyan, Tejun Heo, Andrew Vagin, Vasiliy Kulikov

[-- Attachment #1: fs-proc-extend-state-2-linux-next --]
[-- Type: text/plain, Size: 3079 bytes --]

We would like to have the ability to restore the command line
arguments and program environment pointers, but first we
need to obtain them somehow. Thus we put these values into
/proc/$pid/stat. The exit_code is needed to restore zombie
tasks.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Documentation/filesystems/proc.txt |    5 +++++
 fs/proc/array.c                    |   20 +++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

Index: linux-2.6.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.git.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.git/Documentation/filesystems/proc.txt
@@ -311,6 +311,11 @@ Table 1-4: Contents of the stat files (a
   start_data    address above which program data+bss is placed
   end_data      address below which program data+bss is placed
   start_brk     address above which program heap can be expanded with brk()
+  arg_start     address above which program command line is placed
+  arg_end       address below which program command line is placed
+  env_start     address above which program environment is placed
+  env_end       address below which program environment is placed
+  exit_code     the thread's exit_code in the form reported by the waitpid system call
 ..............................................................................
 
 The /proc/PID/maps file containing the currently mapped memory regions and
Index: linux-2.6.git/fs/proc/array.c
===================================================================
--- linux-2.6.git.orig/fs/proc/array.c
+++ linux-2.6.git/fs/proc/array.c
@@ -508,9 +508,23 @@ static int do_task_stat(struct seq_file
 	seq_put_decimal_ull(m, ' ', delayacct_blkio_ticks(task));
 	seq_put_decimal_ull(m, ' ', cputime_to_clock_t(gtime));
 	seq_put_decimal_ll(m, ' ', cputime_to_clock_t(cgtime));
-	seq_put_decimal_ull(m, ' ', (mm && permitted) ? mm->start_data : 0);
-	seq_put_decimal_ull(m, ' ', (mm && permitted) ? mm->end_data : 0);
-	seq_put_decimal_ull(m, ' ', (mm && permitted) ? mm->start_brk : 0);
+
+	if (mm && permitted) {
+		seq_put_decimal_ull(m, ' ', mm->start_data);
+		seq_put_decimal_ull(m, ' ', mm->end_data);
+		seq_put_decimal_ull(m, ' ', mm->start_brk);
+		seq_put_decimal_ull(m, ' ', mm->arg_start);
+		seq_put_decimal_ull(m, ' ', mm->arg_end);
+		seq_put_decimal_ull(m, ' ', mm->env_start);
+		seq_put_decimal_ull(m, ' ', mm->env_end);
+	} else
+		seq_printf(m, " 0 0 0 0 0 0 0");
+
+	if (permitted)
+		seq_put_decimal_ll(m, ' ', task->exit_code);
+	else
+		seq_put_decimal_ll(m, ' ', 0);
+
 	seq_putc(m, '\n');
 	if (mm)
 		mmput(mm);



* [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries
  2012-02-03 15:19 [patch 0/4] [patch 0/4] A pile in c/r sake v2 Cyrill Gorcunov
                   ` (2 preceding siblings ...)
  2012-02-03 15:19 ` [patch 3/4] c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat Cyrill Gorcunov
@ 2012-02-03 15:19 ` Cyrill Gorcunov
  2012-02-03 16:56   ` Kees Cook
  3 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2012-02-03 15:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Eric W. Biederman, Pavel Emelyanov,
	KOSAKI Motohiro, Ingo Molnar, H. Peter Anvin, Cyrill Gorcunov,
	Kees Cook, Tejun Heo, Andrew Vagin, Serge Hallyn,
	Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki,
	Michael Kerrisk

[-- Attachment #1: prctl-restore-mm-members-5 --]
[-- Type: text/plain, Size: 6285 bytes --]

During checkpoint we dump the whole process memory to a file, and
the dump includes the process stack memory. Besides the stack data
itself, the stack carries additional parameters such as command
line arguments, environment data and the auxiliary vector.

So during restore, once we've restored the stack data itself,
we need to set up mm_struct::arg_start/end and env_start/end,
so that the restored process is able to find the command line
arguments and environment data it had at checkpoint time.
The same applies to the auxiliary vector.

For this reason additional PR_SET_MM_(ARG_START | ARG_END |
ENV_START | ENV_END | AUXV) codes are introduced.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/prctl.h |    5 ++
 kernel/sys.c          |  105 +++++++++++++++++++++++++++++++++++---------------
 2 files changed, 80 insertions(+), 30 deletions(-)

Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -113,5 +113,10 @@
 # define PR_SET_MM_START_STACK		5
 # define PR_SET_MM_START_BRK		6
 # define PR_SET_MM_BRK			7
+# define PR_SET_MM_ARG_START		8
+# define PR_SET_MM_ARG_END		9
+# define PR_SET_MM_ENV_START		10
+# define PR_SET_MM_ENV_END		11
+# define PR_SET_MM_AUXV			12
 
 #endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1693,17 +1693,23 @@ SYSCALL_DEFINE1(umask, int, mask)
 }
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
+static bool vma_flags_mismatch(struct vm_area_struct *vma,
+			       unsigned long required,
+			       unsigned long banned)
+{
+	return (vma->vm_flags & required) != required ||
+		(vma->vm_flags & banned);
+}
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
 	unsigned long rlim = rlimit(RLIMIT_DATA);
-	unsigned long vm_req_flags;
-	unsigned long vm_bad_flags;
+	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	int error = 0;
-	struct mm_struct *mm = current->mm;
 
-	if (arg4 | arg5)
+	if (arg5 || (arg4 && opt != PR_SET_MM_AUXV))
 		return -EINVAL;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -1715,7 +1721,9 @@ static int prctl_set_mm(int opt, unsigne
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, addr);
 
-	if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
+	if (opt != PR_SET_MM_START_BRK &&
+	    opt != PR_SET_MM_BRK &&
+	    opt != PR_SET_MM_AUXV) {
 		/* It must be existing VMA */
 		if (!vma || vma->vm_start > addr)
 			goto out;
@@ -1725,11 +1733,8 @@ static int prctl_set_mm(int opt, unsigne
 	switch (opt) {
 	case PR_SET_MM_START_CODE:
 	case PR_SET_MM_END_CODE:
-		vm_req_flags = VM_READ | VM_EXEC;
-		vm_bad_flags = VM_WRITE | VM_MAYSHARE;
-
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
-		    (vma->vm_flags & vm_bad_flags))
+		if (vma_flags_mismatch(vma, VM_READ | VM_EXEC,
+				       VM_WRITE | VM_MAYSHARE))
 			goto out;
 
 		if (opt == PR_SET_MM_START_CODE)
@@ -1740,11 +1745,8 @@ static int prctl_set_mm(int opt, unsigne
 
 	case PR_SET_MM_START_DATA:
 	case PR_SET_MM_END_DATA:
-		vm_req_flags = VM_READ | VM_WRITE;
-		vm_bad_flags = VM_EXEC | VM_MAYSHARE;
-
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
-		    (vma->vm_flags & vm_bad_flags))
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE,
+				       VM_EXEC | VM_MAYSHARE))
 			goto out;
 
 		if (opt == PR_SET_MM_START_DATA)
@@ -1753,19 +1755,6 @@ static int prctl_set_mm(int opt, unsigne
 			mm->end_data = addr;
 		break;
 
-	case PR_SET_MM_START_STACK:
-
-#ifdef CONFIG_STACK_GROWSUP
-		vm_req_flags = VM_READ | VM_WRITE | VM_GROWSUP;
-#else
-		vm_req_flags = VM_READ | VM_WRITE | VM_GROWSDOWN;
-#endif
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags)
-			goto out;
-
-		mm->start_stack = addr;
-		break;
-
 	case PR_SET_MM_START_BRK:
 		if (addr <= mm->end_data)
 			goto out;
@@ -1790,16 +1779,72 @@ static int prctl_set_mm(int opt, unsigne
 		mm->brk = addr;
 		break;
 
+	/*
+	 * If command line arguments and environment
+	 * are placed somewhere else on stack, we can
+	 * set them up here: ARG_START/END to set up
+	 * command line arguments and ENV_START/END
+	 * for environment.
+	 */
+	case PR_SET_MM_START_STACK:
+	case PR_SET_MM_ARG_START:
+	case PR_SET_MM_ARG_END:
+	case PR_SET_MM_ENV_START:
+	case PR_SET_MM_ENV_END:
+#ifdef CONFIG_STACK_GROWSUP
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE | VM_GROWSUP, 0))
+#else
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE | VM_GROWSDOWN, 0))
+#endif
+			goto out;
+		if (opt == PR_SET_MM_START_STACK)
+			mm->start_stack = addr;
+		else if (opt == PR_SET_MM_ARG_START)
+			mm->arg_start = addr;
+		else if (opt == PR_SET_MM_ARG_END)
+			mm->arg_end = addr;
+		else if (opt == PR_SET_MM_ENV_START)
+			mm->env_start = addr;
+		else if (opt == PR_SET_MM_ENV_END)
+			mm->env_end = addr;
+		break;
+
+	/*
+	 * This doesn't move auxiliary vector itself
+	 * since it's pinned to mm_struct, but allow
+	 * to fill vector with new values. It's up
+	 * to a caller to provide sane values here
+	 * otherwise user space tools which use this
+	 * vector might be unhappy.
+	 */
+	case PR_SET_MM_AUXV: {
+		unsigned long user_auxv[AT_VECTOR_SIZE];
+
+		if (arg4 > sizeof(mm->saved_auxv))
+			goto out;
+		up_read(&mm->mmap_sem);
+
+		if (copy_from_user(user_auxv, (const void __user *)addr, arg4))
+			return -EFAULT;
+
+		/* Make sure the last entry is always AT_NULL */
+		user_auxv[AT_VECTOR_SIZE - 2] = 0;
+		user_auxv[AT_VECTOR_SIZE - 1] = 0;
+
+		task_lock(current);
+		memcpy(mm->saved_auxv, user_auxv, arg4);
+		task_unlock(current);
+
+		return 0;
+	}
 	default:
 		error = -EINVAL;
 		goto out;
 	}
 
 	error = 0;
-
 out:
 	up_read(&mm->mmap_sem);
-
 	return error;
 }
 #else /* CONFIG_CHECKPOINT_RESTORE */
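
The PR_SET_MM_AUXV branch in the patch above can be modelled in user space.
The following is only a sketch under stated assumptions, not kernel code:
AUXV_WORDS, struct fake_mm and set_auxv() are made-up names, a plain
memcpy() stands in for copy_from_user(), locking is left out, and the
error value for an oversized request is assumed to be -EINVAL. It follows
the same steps as the patch: reject oversized input, stage the data in a
temporary array, force the final pair to AT_NULL, then copy the requested
number of bytes into the saved vector.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for AT_VECTOR_SIZE; the real value is arch-dependent. */
#define AUXV_WORDS 44

/* Hypothetical stand-in for the part of mm_struct that holds the vector. */
struct fake_mm {
	unsigned long saved_auxv[AUXV_WORDS];
};

/* Model of the PR_SET_MM_AUXV case; memcpy() replaces copy_from_user(). */
static int set_auxv(struct fake_mm *mm, const void *src, size_t len)
{
	unsigned long staged[AUXV_WORDS];

	/* Refuse anything larger than the staging array. */
	if (len > sizeof(staged))
		return -EINVAL;

	memcpy(staged, src, len);

	/* The last (type, value) pair of the staging array is always AT_NULL (0). */
	staged[AUXV_WORDS - 2] = 0;
	staged[AUXV_WORDS - 1] = 0;

	/* Only len bytes reach the saved vector, as in the patch. */
	memcpy(mm->saved_auxv, staged, len);
	return 0;
}
```

Because only len bytes are copied back, the forced terminator reaches
saved_auxv only when the caller supplies the full vector; a shorter
request leaves the tail of the saved vector as it was.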



* Re: [patch 3/4] c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
  2012-02-03 15:19 ` [patch 3/4] c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat Cyrill Gorcunov
@ 2012-02-03 16:37   ` Kees Cook
  0 siblings, 0 replies; 12+ messages in thread
From: Kees Cook @ 2012-02-03 16:37 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, Andrew Morton, Eric W. Biederman, Pavel Emelyanov,
	KOSAKI Motohiro, Ingo Molnar, H. Peter Anvin, Pavel Emelyanov,
	Serge Hallyn, KAMEZAWA Hiroyuki, Alexey Dobriyan, Tejun Heo,
	Andrew Vagin, Vasiliy Kulikov

On Fri, Feb 3, 2012 at 7:19 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> We would like to have an ability to restore command line
> arguments and program environment pointers but first we
> need to obtain them somehow. Thus we put these values into
> /proc/$pid/stat. The exit_code is needed to restore zombie
> tasks.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>

Acked-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook
ChromeOS Security


* Re: [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries
  2012-02-03 15:19 ` [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries Cyrill Gorcunov
@ 2012-02-03 16:56   ` Kees Cook
  2012-02-03 17:10     ` Cyrill Gorcunov
  0 siblings, 1 reply; 12+ messages in thread
From: Kees Cook @ 2012-02-03 16:56 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, Andrew Morton, Eric W. Biederman, Pavel Emelyanov,
	KOSAKI Motohiro, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	Andrew Vagin, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov,
	KAMEZAWA Hiroyuki, Michael Kerrisk

On Fri, Feb 3, 2012 at 7:19 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> +       case PR_SET_MM_AUXV: {
> +               unsigned long user_auxv[AT_VECTOR_SIZE];
> +
> +               if (arg4 > sizeof(mm->saved_auxv))
> +                       goto out;

While these are both AT_VECTOR_SIZE, I think it might be better to use
sizeof(mm->saved_auxv) instead of AT_VECTOR_SIZE, just so that they
can never get out of sync and there's a single reference for the size.

-Kees

-- 
Kees Cook
ChromeOS Security


* Re: [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries
  2012-02-03 16:56   ` Kees Cook
@ 2012-02-03 17:10     ` Cyrill Gorcunov
  2012-02-03 17:26       ` Kees Cook
  0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2012-02-03 17:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Andrew Morton, Eric W. Biederman, Pavel Emelyanov,
	KOSAKI Motohiro, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	Andrew Vagin, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov,
	KAMEZAWA Hiroyuki, Michael Kerrisk

On Fri, Feb 03, 2012 at 08:56:20AM -0800, Kees Cook wrote:
> On Fri, Feb 3, 2012 at 7:19 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> > +       case PR_SET_MM_AUXV: {
> > +               unsigned long user_auxv[AT_VECTOR_SIZE];
> > +
> > +               if (arg4 > sizeof(mm->saved_auxv))
> > +                       goto out;
> 
> While these are both AT_VECTOR_SIZE, I think it might be better to use
> sizeof(mm->saved_auxv) instead of AT_VECTOR_SIZE, just so that they
> can never get out of sync and there's a single reference for the size.
> 

I suppose you meant ARRAY_SIZE instead, since plain sizeof gives you
the total size in bytes, but I think I have a better idea: let's put
a BUILD_BUG_ON here, like below.

	Cyrill
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries v2

During checkpoint we dump the whole process memory to a file, and
the dump includes the process stack memory. But besides the stack
data itself, the stack carries additional parameters such as command
line arguments, environment data and the auxiliary vector.

So during restore, once we've restored the stack data itself, we
need to set up mm_struct::arg_start/end and env_start/end so that
the restored process is able to find the command line arguments
and environment data it had at checkpoint time. The same applies
to the auxiliary vector.

For this reason additional PR_SET_MM_(ARG_START | ARG_END |
ENV_START | ENV_END | AUXV) codes are introduced.

v2: Add BUILD_BUG_ON guard

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/prctl.h |    5 ++
 kernel/sys.c          |  107 +++++++++++++++++++++++++++++++++++---------------
 2 files changed, 82 insertions(+), 30 deletions(-)

Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -113,5 +113,10 @@
 # define PR_SET_MM_START_STACK		5
 # define PR_SET_MM_START_BRK		6
 # define PR_SET_MM_BRK			7
+# define PR_SET_MM_ARG_START		8
+# define PR_SET_MM_ARG_END		9
+# define PR_SET_MM_ENV_START		10
+# define PR_SET_MM_ENV_END		11
+# define PR_SET_MM_AUXV			12
 
 #endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1693,17 +1693,23 @@ SYSCALL_DEFINE1(umask, int, mask)
 }
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
+static bool vma_flags_mismatch(struct vm_area_struct *vma,
+			       unsigned long required,
+			       unsigned long banned)
+{
+	return (vma->vm_flags & required) != required ||
+		(vma->vm_flags & banned);
+}
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
 	unsigned long rlim = rlimit(RLIMIT_DATA);
-	unsigned long vm_req_flags;
-	unsigned long vm_bad_flags;
+	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	int error = 0;
-	struct mm_struct *mm = current->mm;
 
-	if (arg4 | arg5)
+	if (arg5 || (arg4 && opt != PR_SET_MM_AUXV))
 		return -EINVAL;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -1715,7 +1721,9 @@ static int prctl_set_mm(int opt, unsigne
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, addr);
 
-	if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
+	if (opt != PR_SET_MM_START_BRK &&
+	    opt != PR_SET_MM_BRK &&
+	    opt != PR_SET_MM_AUXV) {
 		/* It must be existing VMA */
 		if (!vma || vma->vm_start > addr)
 			goto out;
@@ -1725,11 +1733,8 @@ static int prctl_set_mm(int opt, unsigne
 	switch (opt) {
 	case PR_SET_MM_START_CODE:
 	case PR_SET_MM_END_CODE:
-		vm_req_flags = VM_READ | VM_EXEC;
-		vm_bad_flags = VM_WRITE | VM_MAYSHARE;
-
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
-		    (vma->vm_flags & vm_bad_flags))
+		if (vma_flags_mismatch(vma, VM_READ | VM_EXEC,
+				       VM_WRITE | VM_MAYSHARE))
 			goto out;
 
 		if (opt == PR_SET_MM_START_CODE)
@@ -1740,11 +1745,8 @@ static int prctl_set_mm(int opt, unsigne
 
 	case PR_SET_MM_START_DATA:
 	case PR_SET_MM_END_DATA:
-		vm_req_flags = VM_READ | VM_WRITE;
-		vm_bad_flags = VM_EXEC | VM_MAYSHARE;
-
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
-		    (vma->vm_flags & vm_bad_flags))
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE,
+				       VM_EXEC | VM_MAYSHARE))
 			goto out;
 
 		if (opt == PR_SET_MM_START_DATA)
@@ -1753,19 +1755,6 @@ static int prctl_set_mm(int opt, unsigne
 			mm->end_data = addr;
 		break;
 
-	case PR_SET_MM_START_STACK:
-
-#ifdef CONFIG_STACK_GROWSUP
-		vm_req_flags = VM_READ | VM_WRITE | VM_GROWSUP;
-#else
-		vm_req_flags = VM_READ | VM_WRITE | VM_GROWSDOWN;
-#endif
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags)
-			goto out;
-
-		mm->start_stack = addr;
-		break;
-
 	case PR_SET_MM_START_BRK:
 		if (addr <= mm->end_data)
 			goto out;
@@ -1790,16 +1779,74 @@ static int prctl_set_mm(int opt, unsigne
 		mm->brk = addr;
 		break;
 
+	/*
+	 * If command line arguments and environment
+	 * are placed somewhere else on stack, we can
+	 * set them up here, ARG_START/END to setup
+	 * command line arguments and ENV_START/END
+	 * for environment.
+	 */
+	case PR_SET_MM_START_STACK:
+	case PR_SET_MM_ARG_START:
+	case PR_SET_MM_ARG_END:
+	case PR_SET_MM_ENV_START:
+	case PR_SET_MM_ENV_END:
+#ifdef CONFIG_STACK_GROWSUP
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE | VM_GROWSUP, 0))
+#else
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE | VM_GROWSDOWN, 0))
+#endif
+			goto out;
+		if (opt == PR_SET_MM_START_STACK)
+			mm->start_stack = addr;
+		else if (opt == PR_SET_MM_ARG_START)
+			mm->arg_start = addr;
+		else if (opt == PR_SET_MM_ARG_END)
+			mm->arg_end = addr;
+		else if (opt == PR_SET_MM_ENV_START)
+			mm->env_start = addr;
+		else if (opt == PR_SET_MM_ENV_END)
+			mm->env_end = addr;
+		break;
+
+	/*
+	 * This doesn't move auxiliary vector itself
+	 * since it's pinned to mm_struct, but allow
+	 * to fill vector with new values. It's up
+	 * to a caller to provide sane values here
+	 * otherwise user space tools which use this
+	 * vector might be unhappy.
+	 */
+	case PR_SET_MM_AUXV: {
+		unsigned long user_auxv[AT_VECTOR_SIZE];
+
+		if (arg4 > sizeof(user_auxv))
+			goto out;
+		up_read(&mm->mmap_sem);
+
+		if (copy_from_user(user_auxv, (const void __user *)addr, arg4))
+			return -EFAULT;
+
+		/* Make sure the last entry is always AT_NULL */
+		user_auxv[AT_VECTOR_SIZE - 2] = 0;
+		user_auxv[AT_VECTOR_SIZE - 1] = 0;
+
+		BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv));
+
+		task_lock(current);
+		memcpy(mm->saved_auxv, user_auxv, arg4);
+		task_unlock(current);
+
+		return 0;
+	}
 	default:
 		error = -EINVAL;
 		goto out;
 	}
 
 	error = 0;
-
 out:
 	up_read(&mm->mmap_sem);
-
 	return error;
 }
 #else /* CONFIG_CHECKPOINT_RESTORE */
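
The vma_flags_mismatch() helper introduced in this patch folds the two
open-coded tests (all required bits present, no banned bit present) into
one predicate. A minimal user-space sketch of the same logic follows; the
DEMO_VM_* bit values and the flags_mismatch() name are illustrative
assumptions, and the function takes the flag word directly instead of a
struct vm_area_struct pointer.

```c
#include <stdbool.h>

/* Illustrative flag bits; the kernel's VM_* constants have their own values. */
#define DEMO_VM_READ     0x1UL
#define DEMO_VM_WRITE    0x2UL
#define DEMO_VM_EXEC     0x4UL
#define DEMO_VM_MAYSHARE 0x8UL

/* Same predicate as vma_flags_mismatch(), taking the flag word directly. */
static bool flags_mismatch(unsigned long vm_flags,
			   unsigned long required,
			   unsigned long banned)
{
	/* Mismatch if any required bit is missing or any banned bit is set. */
	return (vm_flags & required) != required ||
	       (vm_flags & banned);
}
```

With these demo values, a read+exec mapping passes the code-segment check
(required READ|EXEC, banned WRITE|MAYSHARE), while a mapping that is also
writable does not. Passing 0 as the banned mask, as the stack cases do,
reduces the test to "all required bits are set".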


* Re: [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries
  2012-02-03 17:10     ` Cyrill Gorcunov
@ 2012-02-03 17:26       ` Kees Cook
  0 siblings, 0 replies; 12+ messages in thread
From: Kees Cook @ 2012-02-03 17:26 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, Andrew Morton, Eric W. Biederman, Pavel Emelyanov,
	KOSAKI Motohiro, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	Andrew Vagin, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov,
	KAMEZAWA Hiroyuki, Michael Kerrisk

On Fri, Feb 3, 2012 at 9:10 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> On Fri, Feb 03, 2012 at 08:56:20AM -0800, Kees Cook wrote:
>> On Fri, Feb 3, 2012 at 7:19 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
>> > +       case PR_SET_MM_AUXV: {
>> > +               unsigned long user_auxv[AT_VECTOR_SIZE];
>> > +
>> > +               if (arg4 > sizeof(mm->saved_auxv))
>> > +                       goto out;
>>
>> While these are both AT_VECTOR_SIZE, I think it might be better to use
>> sizeof(mm->saved_auxv) instead of AT_VECTOR_SIZE, just so that they
>> can never get out of sync and there's a single reference for the size.
>>
>
> I suppose you meant ARRAY_SIZE instead, since plain sizeof gives you
> the total size in bytes, but I think I have a better idea: let's put
> a BUILD_BUG_ON here, like below.

Ah, cool. Works for me. :)

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

-- 
Kees Cook
ChromeOS Security
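
The distinction discussed in this exchange (sizeof yields a byte count,
ARRAY_SIZE an element count, and a compile-time assertion ties two array
sizes together) can be illustrated outside the kernel. This is a hedged
sketch only: DEMO_VECTOR_SIZE, DEMO_ARRAY_SIZE, struct demo_mm and both
functions are hypothetical names, and C11 _Static_assert plays the role
that BUILD_BUG_ON has in the patch.

```c
#include <stddef.h>

/* Hypothetical element count standing in for AT_VECTOR_SIZE. */
#define DEMO_VECTOR_SIZE 44

/* User-space equivalent of the kernel's ARRAY_SIZE(): elements, not bytes. */
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the mm_struct member in question. */
struct demo_mm {
	unsigned long saved_auxv[DEMO_VECTOR_SIZE];
};

/* Byte size of a temporary buffer declared like user_auxv in the patch. */
static size_t staging_bytes(void)
{
	unsigned long user_auxv[DEMO_VECTOR_SIZE];

	/* Compile-time guard: the build fails if the two sizes ever diverge. */
	_Static_assert(sizeof(user_auxv) ==
		       sizeof(((struct demo_mm *)0)->saved_auxv),
		       "staging buffer and saved_auxv must have the same size");

	return sizeof(user_auxv);
}

/* Element count of the same buffer. */
static size_t staging_elems(void)
{
	unsigned long user_auxv[DEMO_VECTOR_SIZE];

	return DEMO_ARRAY_SIZE(user_auxv);
}
```

Since arg4 is a length in bytes, comparing it against sizeof() is the
matching unit; the element count would be the right bound only for an
index.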


* Re: [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries
  2012-01-23 15:55   ` Cyrill Gorcunov
@ 2012-01-23 20:02     ` Cyrill Gorcunov
  0 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2012-01-23 20:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Pavel Emelyanov, Serge Hallyn, KAMEZAWA Hiroyuki,
	Kees Cook, Tejun Heo, Andrew Vagin, Eric W. Biederman,
	Alexey Dobriyan, Michael Kerrisk, Vasiliy Kulikov

On Mon, Jan 23, 2012 at 07:55:43PM +0400, Cyrill Gorcunov wrote:
> On Mon, Jan 23, 2012 at 06:20:40PM +0400, Cyrill Gorcunov wrote:
> > After restore we would like the 'ps' command to show the command
> > line and environment exactly as they were at checkpoint time.
> > 
> > These additional PR_SET_MM_ codes allow us to do so. Note that
> > these members of mm_struct are mostly used for output in
> > procfs, except the auxv vector, which is mostly used by ld.so.
> > 
> 
> This one has the typo fixed.
> ---

After some more testing I found there is a problem, so I'll send
an updated version later. Please don't review ;)

	Cyrill


* Re: [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries
  2012-01-23 14:20 ` [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries Cyrill Gorcunov
@ 2012-01-23 15:55   ` Cyrill Gorcunov
  2012-01-23 20:02     ` Cyrill Gorcunov
  0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2012-01-23 15:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Pavel Emelyanov, Serge Hallyn, KAMEZAWA Hiroyuki,
	Kees Cook, Tejun Heo, Andrew Vagin, Eric W. Biederman,
	Alexey Dobriyan, Michael Kerrisk, Vasiliy Kulikov

On Mon, Jan 23, 2012 at 06:20:40PM +0400, Cyrill Gorcunov wrote:
> After restore we would like the 'ps' command to show the command
> line and environment exactly as they were at checkpoint time.
> 
> These additional PR_SET_MM_ codes allow us to do so. Note that
> these members of mm_struct are mostly used for output in
> procfs, except the auxv vector, which is mostly used by ld.so.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Andrew Vagin <avagin@openvz.org>
> Cc: Serge Hallyn <serge.hallyn@canonical.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Vasiliy Kulikov <segoon@openwall.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---

This one has the typo fixed.

	Cyrill
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries

After restore we would like the 'ps' command to show the command
line and environment exactly as they were at checkpoint time.

These additional PR_SET_MM_ codes allow us to do so. Note that
these members of mm_struct are mostly used for output in
procfs, except the auxv vector, which is mostly used by ld.so.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/prctl.h |    5 +++
 kernel/sys.c          |   73 +++++++++++++++++++++++++++++++-------------------
 2 files changed, 51 insertions(+), 27 deletions(-)

Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -113,5 +113,10 @@
 # define PR_SET_MM_START_STACK		5
 # define PR_SET_MM_START_BRK		6
 # define PR_SET_MM_BRK			7
+# define PR_SET_MM_ARG_START		8
+# define PR_SET_MM_ARG_END		9
+# define PR_SET_MM_ENV_START		10
+# define PR_SET_MM_ENV_END		11
+# define PR_SET_MM_AUXV			12
 
 #endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1693,17 +1693,25 @@ SYSCALL_DEFINE1(umask, int, mask)
 }
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
+static bool vma_flags_mismatch(struct vm_area_struct *vma,
+			       unsigned long required,
+			       unsigned long banned)
+{
+	return (vma->vm_flags & required) != required ||
+		(vma->vm_flags & banned);
+}
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
 	unsigned long rlim = rlimit(RLIMIT_DATA);
-	unsigned long vm_req_flags;
-	unsigned long vm_bad_flags;
 	struct vm_area_struct *vma;
 	int error = 0;
 	struct mm_struct *mm = current->mm;
 
-	if (arg4 | arg5)
+	if (arg4 && opt != PR_SET_MM_AUXV)
+		return -EINVAL;
+	else if (arg4 | arg5)
 		return -EINVAL;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -1715,7 +1723,9 @@ static int prctl_set_mm(int opt, unsigne
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, addr);
 
-	if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
+	if (opt != PR_SET_MM_START_BRK &&
+	    opt != PR_SET_MM_BRK &&
+	    opt != PR_SET_MM_AUXV) {
 		/* It must be existing VMA */
 		if (!vma || vma->vm_start > addr)
 			goto out;
@@ -1725,11 +1735,8 @@ static int prctl_set_mm(int opt, unsigne
 	switch (opt) {
 	case PR_SET_MM_START_CODE:
 	case PR_SET_MM_END_CODE:
-		vm_req_flags = VM_READ | VM_EXEC;
-		vm_bad_flags = VM_WRITE | VM_MAYSHARE;
-
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
-		    (vma->vm_flags & vm_bad_flags))
+		if (vma_flags_mismatch(vma, VM_READ | VM_EXEC,
+				       VM_WRITE | VM_MAYSHARE))
 			goto out;
 
 		if (opt == PR_SET_MM_START_CODE)
@@ -1740,11 +1747,8 @@ static int prctl_set_mm(int opt, unsigne
 
 	case PR_SET_MM_START_DATA:
 	case PR_SET_MM_END_DATA:
-		vm_req_flags = VM_READ | VM_WRITE;
-		vm_bad_flags = VM_EXEC | VM_MAYSHARE;
-
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
-		    (vma->vm_flags & vm_bad_flags))
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE,
+				       VM_EXEC | VM_MAYSHARE))
 			goto out;
 
 		if (opt == PR_SET_MM_START_DATA)
@@ -1753,19 +1757,6 @@ static int prctl_set_mm(int opt, unsigne
 			mm->end_data = addr;
 		break;
 
-	case PR_SET_MM_START_STACK:
-
-#ifdef CONFIG_STACK_GROWSUP
-		vm_req_flags = VM_READ | VM_WRITE | VM_GROWSUP;
-#else
-		vm_req_flags = VM_READ | VM_WRITE | VM_GROWSDOWN;
-#endif
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags)
-			goto out;
-
-		mm->start_stack = addr;
-		break;
-
 	case PR_SET_MM_START_BRK:
 		if (addr <= mm->end_data)
 			goto out;
@@ -1790,6 +1781,34 @@ static int prctl_set_mm(int opt, unsigne
 		mm->brk = addr;
 		break;
 
+	case PR_SET_MM_START_STACK:
+	case PR_SET_MM_ARG_START:
+	case PR_SET_MM_ARG_END:
+	case PR_SET_MM_ENV_START:
+	case PR_SET_MM_ENV_END:
+#ifdef CONFIG_STACK_GROWSUP
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE | VM_GROWSUP, 0))
+#else
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE | VM_GROWSDOWN, 0))
+#endif
+			goto out;
+		if (opt == PR_SET_MM_START_STACK)
+			mm->start_stack = addr;
+		else if (opt == PR_SET_MM_ARG_START)
+			mm->arg_start = addr;
+		else if (opt == PR_SET_MM_ARG_END)
+			mm->arg_end = addr;
+		else if (opt == PR_SET_MM_ENV_START)
+			mm->env_start = addr;
+		else if (opt == PR_SET_MM_ENV_END)
+			mm->env_end = addr;
+		break;
+
+	case PR_SET_MM_AUXV:
+		if (arg4 > sizeof(mm->saved_auxv))
+			goto out;
+		error = copy_from_user(mm->saved_auxv, (const void __user *)addr, arg4);
+		break;
 	default:
 		error = -EINVAL;
 		goto out;


* [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries
  2012-01-23 14:20 [patch 0/4] A few patches in a sake of c/r functionality Cyrill Gorcunov
@ 2012-01-23 14:20 ` Cyrill Gorcunov
  2012-01-23 15:55   ` Cyrill Gorcunov
  0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2012-01-23 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Pavel Emelyanov, Serge Hallyn, KAMEZAWA Hiroyuki,
	Kees Cook, Tejun Heo, Andrew Vagin, Eric W. Biederman,
	Alexey Dobriyan, Cyrill Gorcunov, Michael Kerrisk,
	Vasiliy Kulikov

[-- Attachment #1: prctl-restore-mm-members-2 --]
[-- Type: text/plain, Size: 5062 bytes --]

After restore we would like the 'ps' command to show the command
line and environment exactly as they were at checkpoint time.

These additional PR_SET_MM_ codes allow us to do so. Note that
these members of mm_struct are mostly used for output in
procfs, except the auxv vector, which is mostly used by ld.so.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/prctl.h |    5 +++
 kernel/sys.c          |   73 +++++++++++++++++++++++++++++++-------------------
 2 files changed, 51 insertions(+), 27 deletions(-)

Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -113,5 +113,10 @@
 # define PR_SET_MM_START_STACK		5
 # define PR_SET_MM_START_BRK		6
 # define PR_SET_MM_BRK			7
+# define PR_SET_MM_ARG_START		8
+# define PR_SET_MM_ARG_END		9
+# define PR_SET_MM_ENV_START		10
+# define PR_SET_MM_ENV_END		11
+# define PR_SET_MM_AUXV			12
 
 #endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1693,17 +1693,25 @@ SYSCALL_DEFINE1(umask, int, mask)
 }
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
+static bool vma_flags_mismatch(struct vm_area_struct *vma,
+			       unsigned long required,
+			       unsigned long banned)
+{
+	return (vma->vm_flags & required) != required ||
+		(vma->vm_flags & banned);
+}
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
 	unsigned long rlim = rlimit(RLIMIT_DATA);
-	unsigned long vm_req_flags;
-	unsigned long vm_bad_flags;
 	struct vm_area_struct *vma;
 	int error = 0;
 	struct mm_struct *mm = current->mm;
 
-	if (arg4 | arg5)
+	if (arg4 && opt != PR_SET_MM_AUXV)
+		return -EINVAL;
+	else if (arg4 | arg5)
 		return -EINVAL;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -1715,7 +1723,9 @@ static int prctl_set_mm(int opt, unsigne
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, addr);
 
-	if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
+	if (opt != PR_SET_MM_START_BRK &&
+	    opt != PR_SET_MM_BRK &&
+	    opt != PR_SET_MM_AUXV) {
 		/* It must be existing VMA */
 		if (!vma || vma->vm_start > addr)
 			goto out;
@@ -1725,11 +1735,8 @@ static int prctl_set_mm(int opt, unsigne
 	switch (opt) {
 	case PR_SET_MM_START_CODE:
 	case PR_SET_MM_END_CODE:
-		vm_req_flags = VM_READ | VM_EXEC;
-		vm_bad_flags = VM_WRITE | VM_MAYSHARE;
-
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
-		    (vma->vm_flags & vm_bad_flags))
+		if (vma_flags_mismatch(vma, VM_READ | VM_EXEC,
+				       VM_WRITE | VM_MAYSHARE))
 			goto out;
 
 		if (opt == PR_SET_MM_START_CODE)
@@ -1740,11 +1747,8 @@ static int prctl_set_mm(int opt, unsigne
 
 	case PR_SET_MM_START_DATA:
 	case PR_SET_MM_END_DATA:
-		vm_req_flags = VM_READ | VM_WRITE;
-		vm_bad_flags = VM_EXEC | VM_MAYSHARE;
-
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
-		    (vma->vm_flags & vm_bad_flags))
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE,
+				       VM_EXEC | VM_MAYSHARE))
 			goto out;
 
 		if (opt == PR_SET_MM_START_DATA)
@@ -1753,19 +1757,6 @@ static int prctl_set_mm(int opt, unsigne
 			mm->end_data = addr;
 		break;
 
-	case PR_SET_MM_START_STACK:
-
-#ifdef CONFIG_STACK_GROWSUP
-		vm_req_flags = VM_READ | VM_WRITE | VM_GROWSUP;
-#else
-		vm_req_flags = VM_READ | VM_WRITE | VM_GROWSDOWN;
-#endif
-		if ((vma->vm_flags & vm_req_flags) != vm_req_flags)
-			goto out;
-
-		mm->start_stack = addr;
-		break;
-
 	case PR_SET_MM_START_BRK:
 		if (addr <= mm->end_data)
 			goto out;
@@ -1790,6 +1781,34 @@ static int prctl_set_mm(int opt, unsigne
 		mm->brk = addr;
 		break;
 
+	case PR_SET_MM_START_STACK:
+	case PR_SET_MM_ARG_START:
+	case PR_SET_MM_ARG_END:
+	case PR_SET_MM_ENV_START:
+	case PR_SET_MM_ENV_END:
+#ifdef CONFIG_STACK_GROWSUP
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE | VM_GROWSUP, 0))
+#else
+		if (vma_flags_mismatch(vma, VM_READ | VM_WRITE | VM_GROWSDOWN, 0))
+#endif
+			goto out;
+		if (opt == PR_SET_MM_ARG_START)
+			mm->start_stack = addr;
+		else if (opt == PR_SET_MM_ARG_START)
+			mm->arg_start = addr;
+		else if (opt == PR_SET_MM_ARG_END)
+			mm->arg_end = addr;
+		else if (opt == PR_SET_MM_ENV_START)
+			mm->env_start = addr;
+		else if (opt == PR_SET_MM_ENV_END)
+			mm->env_end = addr;
+		break;
+
+	case PR_SET_MM_AUXV:
+		if (arg4 > sizeof(mm->saved_auxv))
+			goto out;
+		error = copy_from_user(mm->saved_auxv, (const void __user *)addr, arg4);
+		break;
 	default:
 		error = -EINVAL;
 		goto out;
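
The argument check at the top of prctl_set_mm() in this posting can be
compared with the form used in the 2012-02-03 posting of the same patch.
The sketch below models both as plain functions; check_args_v1() and
check_args_alt() are hypothetical names, and DEMO_PR_SET_MM_AUXV reuses
the value 12 from the prctl.h hunk above.

```c
#include <errno.h>

/* Option value from the prctl.h hunk in this patch. */
#define DEMO_PR_SET_MM_AUXV 12

/* Argument check as written in this posting. */
static int check_args_v1(int opt, unsigned long arg4, unsigned long arg5)
{
	if (arg4 && opt != DEMO_PR_SET_MM_AUXV)
		return -EINVAL;
	else if (arg4 | arg5)
		return -EINVAL;	/* also reached for AUXV with a length */
	return 0;
}

/* Later form: arg5 is never allowed, arg4 only for PR_SET_MM_AUXV. */
static int check_args_alt(int opt, unsigned long arg4, unsigned long arg5)
{
	if (arg5 || (arg4 && opt != DEMO_PR_SET_MM_AUXV))
		return -EINVAL;
	return 0;
}
```

With the first form, a PR_SET_MM_AUXV request carrying a non-zero length
in arg4 falls through to the second test and is rejected; the later form
accepts it while still rejecting a non-zero arg4 for every other code.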



Thread overview: 12+ messages
2012-02-03 15:19 [patch 0/4] [patch 0/4] A pile in c/r sake v2 Cyrill Gorcunov
2012-02-03 15:19 ` [patch 1/4] fs, proc: Introduce /proc/<pid>/task/<tid>/children entry v9 Cyrill Gorcunov
2012-02-03 15:19 ` [patch 2/4] syscalls, x86: Add __NR_kcmp syscall v7 Cyrill Gorcunov
2012-02-03 15:19 ` [patch 3/4] c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat Cyrill Gorcunov
2012-02-03 16:37   ` Kees Cook
2012-02-03 15:19 ` [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries Cyrill Gorcunov
2012-02-03 16:56   ` Kees Cook
2012-02-03 17:10     ` Cyrill Gorcunov
2012-02-03 17:26       ` Kees Cook
  -- strict thread matches above, loose matches on Subject: below --
2012-01-23 14:20 [patch 0/4] A few patches in a sake of c/r functionality Cyrill Gorcunov
2012-01-23 14:20 ` [patch 4/4] c/r: prctl: Extend PR_SET_MM to set up more mm_struct entries Cyrill Gorcunov
2012-01-23 15:55   ` Cyrill Gorcunov
2012-01-23 20:02     ` Cyrill Gorcunov
