* [patch 0/4] prctl: set-mm -- Rework interface, v3
@ 2014-08-04 17:22 Cyrill Gorcunov
  2014-08-04 17:22 ` [patch 1/4] mm: Introduce check_data_rlimit helper, v2 Cyrill Gorcunov
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2014-08-04 17:22 UTC
  To: linux-kernel
  Cc: gorcunov, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

Hi! Here is (I hope) the final version of the series (I had a typo
in the check_data_rlimit helper; thanks serge.hallyn@ for spotting it).

Please take a look once time permits (and drop the previous two versions
from your mbox). Thanks!

	Cyrill


* [patch 1/4] mm: Introduce check_data_rlimit helper, v2
  2014-08-04 17:22 [patch 0/4] prctl: set-mm -- Rework interface, v3 Cyrill Gorcunov
@ 2014-08-04 17:22 ` Cyrill Gorcunov
  2014-08-04 20:25   ` Serge E. Hallyn
  2014-08-04 17:22 ` [patch 2/4] mm: Use may_adjust_brk helper Cyrill Gorcunov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2014-08-04 17:22 UTC
  To: linux-kernel
  Cc: gorcunov, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

[-- Attachment #1: prctl-add-may_adjust_brk-2 --]
[-- Type: text/plain, Size: 1757 bytes --]

To eliminate code duplication, let's introduce a check_data_rlimit
helper, which we will use in the brk() and prctl() syscalls.
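
For illustration only, the arithmetic of the helper can be exercised in
user space. The sketch below is a standalone copy of the check_data_rlimit()
logic from this patch under a hypothetical name (model_check_data_rlimit),
with MODEL_RLIM_INFINITY assumed to match the generic kernel value ~0UL;
it is not kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to mirror the generic kernel definition RLIM_INFINITY (~0UL). */
#define MODEL_RLIM_INFINITY (~0UL)

/*
 * Hypothetical user-space copy of the check_data_rlimit() logic:
 * the heap span (new - start) plus the data segment span
 * (end_data - start_data) must fit into rlim unless rlim is infinite.
 */
static int model_check_data_rlimit(unsigned long rlim,
                                   unsigned long new,
                                   unsigned long start,
                                   unsigned long end_data,
                                   unsigned long start_data)
{
    if (rlim < MODEL_RLIM_INFINITY) {
        if (((new - start) + (end_data - start_data)) > rlim)
            return -ENOSPC;
    }
    return 0;
}
```

With a 64 KiB limit, a 16 KiB data segment plus a 32 KiB heap passes,
while growing the heap to 52 KiB makes the helper return -ENOSPC.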

v2 (serge.hallyn@):
 - need to check against RLIM_INFINITY rather than RLIMIT_DATA

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Julien Tinnes <jln@google.com>
---
 include/linux/mm.h |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

Index: linux-2.6.git/include/linux/mm.h
===================================================================
--- linux-2.6.git.orig/include/linux/mm.h
+++ linux-2.6.git/include/linux/mm.h
@@ -18,6 +18,7 @@
 #include <linux/pfn.h>
 #include <linux/bit_spinlock.h>
 #include <linux/shrinker.h>
+#include <linux/resource.h>
 
 struct mempolicy;
 struct anon_vma;
@@ -1780,6 +1781,20 @@ extern struct vm_area_struct *copy_vma(s
 	bool *need_rmap_locks);
 extern void exit_mmap(struct mm_struct *);
 
+static inline int check_data_rlimit(unsigned long rlim,
+				    unsigned long new,
+				    unsigned long start,
+				    unsigned long end_data,
+				    unsigned long start_data)
+{
+	if (rlim < RLIM_INFINITY) {
+		if (((new - start) + (end_data - start_data)) > rlim)
+			return -ENOSPC;
+	}
+
+	return 0;
+}
+
 extern int mm_take_all_locks(struct mm_struct *mm);
 extern void mm_drop_all_locks(struct mm_struct *mm);
 



* [patch 2/4] mm: Use may_adjust_brk helper
  2014-08-04 17:22 [patch 0/4] prctl: set-mm -- Rework interface, v3 Cyrill Gorcunov
  2014-08-04 17:22 ` [patch 1/4] mm: Introduce check_data_rlimit helper, v2 Cyrill Gorcunov
@ 2014-08-04 17:22 ` Cyrill Gorcunov
  2014-08-04 20:25   ` Serge E. Hallyn
  2014-08-04 17:22 ` [patch 3/4] prctl: PR_SET_MM -- Factor out mmap_sem when update mm::exe_file Cyrill Gorcunov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2014-08-04 17:22 UTC
  To: linux-kernel
  Cc: gorcunov, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

[-- Attachment #1: prctl-use-may_adjust_brk-2 --]
[-- Type: text/plain, Size: 2674 bytes --]

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Julien Tinnes <jln@google.com>
---
 kernel/sys.c |   11 ++++-------
 mm/mmap.c    |    7 +++----
 2 files changed, 7 insertions(+), 11 deletions(-)

Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1693,7 +1693,6 @@ exit:
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
-	unsigned long rlim = rlimit(RLIMIT_DATA);
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	int error;
@@ -1733,9 +1732,8 @@ static int prctl_set_mm(int opt, unsigne
 		if (addr <= mm->end_data)
 			goto out;
 
-		if (rlim < RLIM_INFINITY &&
-		    (mm->brk - addr) +
-		    (mm->end_data - mm->start_data) > rlim)
+		if (check_data_rlimit(rlimit(RLIMIT_DATA), mm->brk, addr,
+				      mm->end_data, mm->start_data))
 			goto out;
 
 		mm->start_brk = addr;
@@ -1745,9 +1743,8 @@ static int prctl_set_mm(int opt, unsigne
 		if (addr <= mm->end_data)
 			goto out;
 
-		if (rlim < RLIM_INFINITY &&
-		    (addr - mm->start_brk) +
-		    (mm->end_data - mm->start_data) > rlim)
+		if (check_data_rlimit(rlimit(RLIMIT_DATA), addr, mm->start_brk,
+				      mm->end_data, mm->start_data))
 			goto out;
 
 		mm->brk = addr;
Index: linux-2.6.git/mm/mmap.c
===================================================================
--- linux-2.6.git.orig/mm/mmap.c
+++ linux-2.6.git/mm/mmap.c
@@ -263,7 +263,7 @@ static unsigned long do_brk(unsigned lon
 
 SYSCALL_DEFINE1(brk, unsigned long, brk)
 {
-	unsigned long rlim, retval;
+	unsigned long retval;
 	unsigned long newbrk, oldbrk;
 	struct mm_struct *mm = current->mm;
 	unsigned long min_brk;
@@ -293,9 +293,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	 * segment grow beyond its set limit the in case where the limit is
 	 * not page aligned -Ram Gupta
 	 */
-	rlim = rlimit(RLIMIT_DATA);
-	if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
-			(mm->end_data - mm->start_data) > rlim)
+	if (check_data_rlimit(rlimit(RLIMIT_DATA), brk, mm->start_brk,
+			      mm->end_data, mm->start_data))
 		goto out;
 
 	newbrk = PAGE_ALIGN(brk);



* [patch 3/4] prctl: PR_SET_MM -- Factor out mmap_sem when update mm::exe_file
  2014-08-04 17:22 [patch 0/4] prctl: set-mm -- Rework interface, v3 Cyrill Gorcunov
  2014-08-04 17:22 ` [patch 1/4] mm: Introduce check_data_rlimit helper, v2 Cyrill Gorcunov
  2014-08-04 17:22 ` [patch 2/4] mm: Use may_adjust_brk helper Cyrill Gorcunov
@ 2014-08-04 17:22 ` Cyrill Gorcunov
  2014-08-04 20:22   ` Serge E. Hallyn
  2014-08-04 17:22 ` [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3 Cyrill Gorcunov
  2014-08-15 19:11 ` [patch 0/4] prctl: set-mm -- Rework interface, v3 Cyrill Gorcunov
  4 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2014-08-04 17:22 UTC
  To: linux-kernel
  Cc: gorcunov, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

[-- Attachment #1: prctl-rework-prctl_set_mm_exe_file-locked --]
[-- Type: text/plain, Size: 2599 bytes --]

Instead of taking mm->mmap_sem inside prctl_set_mm_exe_file, move
the locking out to the caller and rename the helper to
prctl_set_mm_exe_file_locked. This allows the function to be reused
in the next patch.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Julien Tinnes <jln@google.com>
---
 kernel/sys.c |   21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1628,12 +1628,14 @@ SYSCALL_DEFINE1(umask, int, mask)
 	return mask;
 }
 
-static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
+static int prctl_set_mm_exe_file_locked(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
 	struct inode *inode;
 	int err;
 
+	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
+
 	exe = fdget(fd);
 	if (!exe.file)
 		return -EBADF;
@@ -1654,8 +1656,6 @@ static int prctl_set_mm_exe_file(struct
 	if (err)
 		goto exit;
 
-	down_write(&mm->mmap_sem);
-
 	/*
 	 * Forbid mm->exe_file change if old file still mapped.
 	 */
@@ -1667,7 +1667,7 @@ static int prctl_set_mm_exe_file(struct
 			if (vma->vm_file &&
 			    path_equal(&vma->vm_file->f_path,
 				       &mm->exe_file->f_path))
-				goto exit_unlock;
+				goto exit;
 	}
 
 	/*
@@ -1678,13 +1678,10 @@ static int prctl_set_mm_exe_file(struct
 	 */
 	err = -EPERM;
 	if (test_and_set_bit(MMF_EXE_FILE_CHANGED, &mm->flags))
-		goto exit_unlock;
+		goto exit;
 
 	err = 0;
 	set_mm_exe_file(mm, exe.file);	/* this grabs a reference to exe.file */
-exit_unlock:
-	up_write(&mm->mmap_sem);
-
 exit:
 	fdput(exe);
 	return err;
@@ -1703,8 +1700,12 @@ static int prctl_set_mm(int opt, unsigne
 	if (!capable(CAP_SYS_RESOURCE))
 		return -EPERM;
 
-	if (opt == PR_SET_MM_EXE_FILE)
-		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
+	if (opt == PR_SET_MM_EXE_FILE) {
+		down_write(&mm->mmap_sem);
+		error = prctl_set_mm_exe_file_locked(mm, (unsigned int)addr);
+		up_write(&mm->mmap_sem);
+		return error;
+	}
 
 	if (addr >= TASK_SIZE || addr < mmap_min_addr)
 		return -EINVAL;



* [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3
  2014-08-04 17:22 [patch 0/4] prctl: set-mm -- Rework interface, v3 Cyrill Gorcunov
                   ` (2 preceding siblings ...)
  2014-08-04 17:22 ` [patch 3/4] prctl: PR_SET_MM -- Factor out mmap_sem when update mm::exe_file Cyrill Gorcunov
@ 2014-08-04 17:22 ` Cyrill Gorcunov
  2014-08-04 21:01   ` Serge E. Hallyn
                     ` (2 more replies)
  2014-08-15 19:11 ` [patch 0/4] prctl: set-mm -- Rework interface, v3 Cyrill Gorcunov
  4 siblings, 3 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2014-08-04 17:22 UTC
  To: linux-kernel
  Cc: gorcunov, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

[-- Attachment #1: prctl-rework-new-mm-map-7 --]
[-- Type: text/plain, Size: 14491 bytes --]

During development of c/r (checkpoint/restore) we noticed that once we
need to support user namespaces, we face a problem with capabilities in
the prctl(PR_SET_MM, ...) call: as soon as a new user namespace is
created, capable(CAP_SYS_RESOURCE) no longer passes.

One approach is to eliminate the CAP_SYS_RESOURCE check and instead pass
all new values in one bundle, which allows the kernel to run more
thorough sanity tests on the values and, at the same time, allows us
to support checkpoint/restore of user namespaces.

Thus a new command, PR_SET_MM_MAP, is introduced. It takes a pointer to
a prctl_mm_map structure, which carries all the members to be updated.

	prctl(PR_SET_MM, PR_SET_MM_MAP, struct prctl_mm_map *, size)

	struct prctl_mm_map {
		__u64	start_code;
		__u64	end_code;
		__u64	start_data;
		__u64	end_data;
		__u64	start_brk;
		__u64	brk;
		__u64	start_stack;
		__u64	arg_start;
		__u64	arg_end;
		__u64	env_start;
		__u64	env_end;
		__u64	*auxv;
		__u32	auxv_size;
		__u32	exe_fd;
	};

All members except @exe_fd correspond to members of struct mm_struct.
To figure out which values these members may take, here is what each
of them means:

 - start_code, end_code: represent bounds of executable code area
 - start_data, end_data: represent bounds of data area
 - start_brk, brk: used to calculate bounds for brk() syscall
 - start_stack: used when accounting space needed for command
   line arguments, environment and shmat() syscall
 - arg_start, arg_end, env_start, env_end: represent memory area
   supplied for command line arguments and environment variables
 - auxv, auxv_size: carry the auxiliary vector (ELF format specifics)
 - exe_fd: file descriptor number for executable link (/proc/self/exe)

Thus we apply the following requirements to the values:

1) Every member except @auxv, @auxv_size and @exe_fd is an address
   in user space, so it must lie inside the [mmap_min_addr, mmap_max_addr)
   interval, where mmap_max_addr is TASK_SIZE.

2) While @[start|end]_code and @[start|end]_data may point to nonexistent
   VMAs (say, a program maps its own new .text and .data segments during
   execution), the rest of the members must belong to VMAs that exist.

3) Addresses must be ordered: @start_code and @start_data must be less
   than @end_code and @end_data respectively, while @start_brk, @arg_start
   and @env_start must not be greater than @brk, @arg_end and @env_end.

4) As in the regular ELF loading procedure, we require @start_brk and
   @brk to be greater than @end_data.

5) If the RLIMIT_DATA rlimit is not RLIM_INFINITY, the new values must
   not exceed the existing limit. The same applies to RLIMIT_STACK.

6) The auxiliary vector size must not exceed the size of the existing one
   (mm->saved_auxv, which holds AT_VECTOR_SIZE entries; AT_VECTOR_SIZE
   depends on the architecture).

7) The file descriptor passed in @exe_fd must point to an executable
   file (since we reuse the existing prctl_set_mm_exe_file_locked
   helper, it ensures that the file we are going to use as the exe link
   has all the required permissions granted).
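
The requirements that do not depend on live kernel state can be sketched
in user space. This is a hypothetical model (struct mm_map_model and
validate_mm_map_model are illustration names, not kernel symbols)
covering rules 1, 3, 4 and the size part of rule 6; min_addr, max_addr
and saved_auxv_bytes are parameters standing in for mmap_min_addr,
TASK_SIZE and sizeof(mm->saved_auxv).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the address-like members of struct prctl_mm_map. */
struct mm_map_model {
    uint64_t start_code, end_code;
    uint64_t start_data, end_data;
    uint64_t start_brk, brk;
    uint64_t start_stack;
    uint64_t arg_start, arg_end;
    uint64_t env_start, env_end;
    uint32_t auxv_size;
};

/*
 * Checks rules 1, 3, 4 and the auxv size part of rule 6 only.
 * Rules 2, 5 and 7 need VMAs, rlimits and files, so they are not modelled.
 */
static int validate_mm_map_model(const struct mm_map_model *m,
                                 uint64_t min_addr, uint64_t max_addr,
                                 uint32_t saved_auxv_bytes)
{
    const uint64_t addrs[] = {
        m->start_code, m->end_code, m->start_data, m->end_data,
        m->start_brk, m->brk, m->start_stack,
        m->arg_start, m->arg_end, m->env_start, m->env_end,
    };

    /* Rule 1: every address lies in [min_addr, max_addr). */
    for (size_t i = 0; i < sizeof(addrs) / sizeof(addrs[0]); i++)
        if (addrs[i] < min_addr || addrs[i] >= max_addr)
            return -EINVAL;

    /* Rule 3: code/data bounds strictly ordered, the others may be equal. */
    if (m->start_code >= m->end_code || m->start_data >= m->end_data ||
        m->start_brk > m->brk || m->arg_start > m->arg_end ||
        m->env_start > m->env_end)
        return -EINVAL;

    /* Rule 4: the heap starts after the data segment. */
    if (m->start_brk <= m->end_data || m->brk <= m->end_data)
        return -EINVAL;

    /* Rule 6 (size part): the new auxv must fit into the saved vector. */
    if (m->auxv_size > saved_auxv_bytes)
        return -EINVAL;

    return 0;
}
```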

Now, here is where these members are used inside the kernel:

 - @start_code and @end_code are used in /proc/$pid/[stat|statm] output;

 - @start_data and @end_data are used in /proc/$pid/[stat|statm] output;
   they are also taken into account when checking whether there is
   enough space for the brk() syscall result if RLIMIT_DATA is set;

 - @start_brk is shown in /proc/$pid/stat output and accounted in the
   brk() syscall if RLIMIT_DATA is set; this member is also tested to
   find a symbolic name for mmap events in the perf system (we check
   whether an event is generated for the "heap" area); one more
   application is SELinux: we test whether a process has the
   PROCESS__EXECHEAP permission when it tries to make the heap area
   executable with the mprotect() syscall;

 - @brk is the current value for the brk() syscall, which lies inside
   the heap area; it's shown in /proc/$pid/stat. When the brk() syscall
   successfully provides a new memory area to user space, mm::brk is
   updated to carry the new value upon brk() completion;

   Both @start_brk and @brk are actively used in /proc/$pid/maps
   and /proc/$pid/smaps output to find the symbolic name "heap" for
   the VMA being scanned;

 - @start_stack is printed out in /proc/$pid/stat and used to find
   the symbolic name "stack" for the task and its threads in
   /proc/$pid/maps and /proc/$pid/smaps output; as with @start_brk,
   the perf system uses it for event naming. The kernel also treats
   this member as the start address of where to map vDSO pages and
   uses it to check whether there is enough space for the shmat()
   syscall;

 - @arg_start, @arg_end, @env_start and @env_end are printed out
   in /proc/$pid/stat. Another way to access the data these members
   represent is to read /proc/$pid/environ or /proc/$pid/cmdline.
   The kernel reads these areas via the access_process_vm helper,
   so a user must have enough rights for this action;

 - @auxv and @auxv_size may be read from /proc/$pid/auxv. Strictly
   speaking, the kernel doesn't care much about exactly which data
   sits there, because it is solely for userspace;

 - @exe_fd is referred to from /proc/$pid/exe and when generating a
   coredump. We use the prctl_set_mm_exe_file_locked helper to update
   this member, so exe-file link modification remains a one-shot
   action.
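
Since most of these members end up in /proc/$pid/stat, a small reader
helps when checking what a restored task reports. A minimal sketch,
assuming the field positions documented in proc(5) (startcode 26,
endcode 27, startstack 28, start_data through env_end 45 to 51);
parse_stat_mm and read_stat_mm are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The mm_struct members that /proc/$pid/stat exposes (per proc(5)). */
struct stat_mm_fields {
    unsigned long start_code, end_code, start_stack;
    unsigned long start_data, end_data, start_brk;
    unsigned long arg_start, arg_end, env_start, env_end;
};

/*
 * Parse one /proc/$pid/stat line. comm (field 2) may itself contain
 * spaces and ')', so parsing starts after the last ')'.
 * Returns 0 on success, -1 on a short or garbled line.
 */
static int parse_stat_mm(const char *line, struct stat_mm_fields *out)
{
    unsigned long field[53] = {0};
    const char *p = strrchr(line, ')');
    int n = 3; /* first token after the closing ')' is field 3 (state) */

    if (!p)
        return -1;
    for (p++; n <= 52; n++) {
        char *end;

        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            break;
        if (n == 3) { /* state is a single letter, not a number */
            p++;
            continue;
        }
        field[n] = strtoul(p, &end, 10);
        if (end == p)
            return -1;
        p = end;
    }
    if (n < 52) /* need at least fields 3..51 */
        return -1;

    out->start_code = field[26];
    out->end_code = field[27];
    out->start_stack = field[28];
    out->start_data = field[45];
    out->end_data = field[46];
    out->start_brk = field[47];
    out->arg_start = field[48];
    out->arg_end = field[49];
    out->env_start = field[50];
    out->env_end = field[51];
    return 0;
}

/* Read and parse a stat file such as "/proc/self/stat". */
static int read_stat_mm(const char *path, struct stat_mm_fields *out)
{
    char buf[2048];
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = fgets(buf, sizeof(buf), f) != NULL;
    fclose(f);
    return ok ? parse_stat_mm(buf, out) : -1;
}
```

Note that the kernel may print zeros for these fields when the reader
lacks ptrace-level access to the target task.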

Note that updating the exe-file link no longer requires the
CAP_SYS_RESOURCE capability; after all, there is not much profit in
preventing a process from setting up its own file link (there are a
number of ways to execute one's own code, such as ptrace or LD_PRELOAD,
so the only reliable way to find out exactly which code is executed is
to inspect the running program's memory). Still, we require the caller
to be at least the root user of its user namespace (both its uid and
gid must map to 0 in the current user namespace).

I believe the old interface should be deprecated and ripped out
in a couple of kernel releases if no one objects.

To test whether the new interface is implemented in the kernel, one
can pass the PR_SET_MM_MAP_SIZE opcode: the kernel then stores the size
of the currently supported struct prctl_mm_map in the unsigned int
pointed to by the third prctl() argument.
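
A minimal user-space probe for the opcode could look as follows. It
assumes the PR_SET_MM_MAP_SIZE value (15) proposed here and falls back
to defining PR_SET_MM as 35 (its value in <linux/prctl.h>) when the libc
headers lack these names; probe_mm_map_size is a hypothetical helper.

```c
#include <assert.h>
#include <sys/prctl.h>

#ifndef PR_SET_MM
#define PR_SET_MM 35
#endif
#ifndef PR_SET_MM_MAP_SIZE
#define PR_SET_MM_MAP_SIZE 15 /* value proposed by this patch */
#endif

/*
 * Ask the kernel which struct prctl_mm_map size it supports.
 * Returns the size in bytes, or -1 if the call failed (old kernel,
 * CONFIG_CHECKPOINT_RESTORE disabled, or a sandbox rejecting it).
 */
static long probe_mm_map_size(void)
{
    unsigned int size = 0;

    /* arg4/arg5 passed as unsigned long zeros, as the kernel expects. */
    if (prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, (unsigned long)&size,
              0UL, 0UL) != 0)
        return -1;
    return (long)size;
}
```

Because prctl_set_mm_map() rejects any data_size other than
sizeof(struct prctl_mm_map), a caller would compare the probed value
with the structure size it was built against before issuing
PR_SET_MM_MAP.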

v2:
 - compact macros (by keescook@)
 - wrap new code with CONFIG_ (by akpm@)

v3 (by jln@):
 - use __prctl_check_order for brk and start_brk
 - use check_data_rlimit helper (named may_adjust_brk in earlier versions)
 - make sure that only root can update @exe_fd link

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Julien Tinnes <jln@google.com>
---
 include/uapi/linux/prctl.h |   25 +++++
 kernel/sys.c               |  211 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 235 insertions(+), 1 deletion(-)

Index: linux-2.6.git/include/uapi/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/uapi/linux/prctl.h
+++ linux-2.6.git/include/uapi/linux/prctl.h
@@ -119,6 +119,31 @@
 # define PR_SET_MM_ENV_END		11
 # define PR_SET_MM_AUXV			12
 # define PR_SET_MM_EXE_FILE		13
+# define PR_SET_MM_MAP			14
+# define PR_SET_MM_MAP_SIZE		15
+
+/*
+ * This structure provides a new memory descriptor
+ * map, which mostly modifies /proc/pid/stat[m]
+ * output for a task. This is mostly done for the
+ * sake of checkpoint/restore functionality.
+ */
+struct prctl_mm_map {
+	__u64	start_code;		/* code section bounds */
+	__u64	end_code;
+	__u64	start_data;		/* data section bounds */
+	__u64	end_data;
+	__u64	start_brk;		/* heap for brk() syscall */
+	__u64	brk;
+	__u64	start_stack;		/* stack starts at */
+	__u64	arg_start;		/* command line arguments bounds */
+	__u64	arg_end;
+	__u64	env_start;		/* environment variables bounds */
+	__u64	env_end;
+	__u64	*auxv;			/* auxiliary vector */
+	__u32	auxv_size;		/* vector size */
+	__u32	exe_fd;			/* /proc/$pid/exe link file */
+};
 
 /*
  * Set specific pid that is allowed to ptrace the current task.
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1687,6 +1687,208 @@ exit:
 	return err;
 }
 
+#ifdef CONFIG_CHECKPOINT_RESTORE
+/*
+ * WARNING: we don't require any capability here so be very careful
+ * in what is allowed for modification from userspace.
+ */
+static int validate_prctl_map_locked(struct prctl_mm_map *prctl_map)
+{
+	unsigned long mmap_max_addr = TASK_SIZE;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *stack_vma;
+	int error = 0;
+
+	/*
+	 * Make sure the members are not somewhere outside
+	 * of allowed address space.
+	 */
+#define __prctl_check_addr_space(__member)					\
+	({									\
+		int __rc;							\
+		if ((unsigned long)prctl_map->__member < mmap_max_addr &&	\
+		    (unsigned long)prctl_map->__member >= mmap_min_addr)	\
+			__rc = 0;						\
+		else								\
+			__rc = -EINVAL;						\
+		__rc;								\
+	})
+	error |= __prctl_check_addr_space(start_code);
+	error |= __prctl_check_addr_space(end_code);
+	error |= __prctl_check_addr_space(start_data);
+	error |= __prctl_check_addr_space(end_data);
+	error |= __prctl_check_addr_space(start_stack);
+	error |= __prctl_check_addr_space(start_brk);
+	error |= __prctl_check_addr_space(brk);
+	error |= __prctl_check_addr_space(arg_start);
+	error |= __prctl_check_addr_space(arg_end);
+	error |= __prctl_check_addr_space(env_start);
+	error |= __prctl_check_addr_space(env_end);
+	if (error)
+		goto out;
+#undef __prctl_check_addr_space
+
+	/*
+	 * Stack, brk, command line arguments and environment must exist.
+	 */
+	stack_vma = find_vma(mm, (unsigned long)prctl_map->start_stack);
+	if (!stack_vma) {
+		error = -EINVAL;
+		goto out;
+	}
+#define __prctl_check_vma(__member)						\
+	find_vma(mm, (unsigned long)prctl_map->__member) ? 0 : -EINVAL
+	error |= __prctl_check_vma(start_brk);
+	error |= __prctl_check_vma(brk);
+	error |= __prctl_check_vma(arg_start);
+	error |= __prctl_check_vma(arg_end);
+	error |= __prctl_check_vma(env_start);
+	error |= __prctl_check_vma(env_end);
+	if (error)
+		goto out;
+#undef __prctl_check_vma
+
+	/*
+	 * Make sure the pairs are ordered.
+	 */
+#define __prctl_check_order(__m1, __op, __m2)					\
+	((unsigned long)prctl_map->__m1 __op					\
+	 (unsigned long)prctl_map->__m2) ? 0 : -EINVAL
+	error |= __prctl_check_order(start_code, <, end_code);
+	error |= __prctl_check_order(start_data, <, end_data);
+	error |= __prctl_check_order(start_brk, <=, brk);
+	error |= __prctl_check_order(arg_start, <=, arg_end);
+	error |= __prctl_check_order(env_start, <=, env_end);
+	if (error)
+		goto out;
+#undef __prctl_check_order
+
+	error = -EINVAL;
+
+	/*
+	 * @brk should be after @end_data in traditional maps.
+	 */
+	if (prctl_map->start_brk <= prctl_map->end_data ||
+	    prctl_map->brk <= prctl_map->end_data)
+		goto out;
+
+	/*
+	 * Nor should we allow limits to be overridden if they are set.
+	 */
+	if (check_data_rlimit(rlimit(RLIMIT_DATA), prctl_map->brk,
+			      prctl_map->start_brk, prctl_map->end_data,
+			      prctl_map->start_data))
+		goto out;
+
+#ifdef CONFIG_STACK_GROWSUP
+	if (check_data_rlimit(rlimit(RLIMIT_STACK),
+			      stack_vma->vm_end,
+			      prctl_map->start_stack, 0, 0))
+#else
+	if (check_data_rlimit(rlimit(RLIMIT_STACK),
+			      prctl_map->start_stack,
+			      stack_vma->vm_start, 0, 0))
+#endif
+		goto out;
+
+	/*
+	 * Someone is trying to cheat the auxv vector.
+	 */
+	if (prctl_map->auxv_size) {
+		if (!prctl_map->auxv ||
+		    prctl_map->auxv_size > sizeof(mm->saved_auxv))
+			goto out;
+	}
+
+	/*
+	 * Finally, make sure the caller has the rights to
+	 * change /proc/pid/exe link: only local root should
+	 * be allowed to.
+	 */
+	if (prctl_map->exe_fd != (u32)-1) {
+		struct user_namespace *ns = current_user_ns();
+		const struct cred *cred = current_cred();
+
+		if (!uid_eq(cred->uid, make_kuid(ns, 0)) ||
+		    !gid_eq(cred->gid, make_kgid(ns, 0)))
+			goto out;
+	}
+
+	error = 0;
+out:
+	return error;
+}
+
+static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size)
+{
+	struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, };
+	unsigned long user_auxv[AT_VECTOR_SIZE];
+	struct mm_struct *mm = current->mm;
+	int error = -EINVAL;
+
+	BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv));
+
+	if (opt == PR_SET_MM_MAP_SIZE)
+		return put_user((unsigned int)sizeof(prctl_map),
+				(unsigned int __user *)addr);
+
+	if (data_size != sizeof(prctl_map))
+		return -EINVAL;
+
+	if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
+		return -EFAULT;
+
+	down_read(&mm->mmap_sem);
+
+	if (validate_prctl_map_locked(&prctl_map))
+		goto out;
+
+	if (prctl_map.auxv_size) {
+		up_read(&mm->mmap_sem);
+		memset(user_auxv, 0, sizeof(user_auxv));
+		error = copy_from_user(user_auxv,
+				       (const void __user *)prctl_map.auxv,
+				       prctl_map.auxv_size) ? -EFAULT : 0;
+		down_read(&mm->mmap_sem);
+		if (error)
+			goto out;
+	}
+
+	if (prctl_map.exe_fd != (u32)-1) {
+		error = prctl_set_mm_exe_file_locked(mm, prctl_map.exe_fd);
+		if (error)
+			goto out;
+	}
+
+	if (prctl_map.auxv_size) {
+		/* Last entry must be AT_NULL as specification requires */
+		user_auxv[AT_VECTOR_SIZE - 2] = AT_NULL;
+		user_auxv[AT_VECTOR_SIZE - 1] = AT_NULL;
+
+		task_lock(current);
+		memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
+		task_unlock(current);
+	}
+
+	mm->start_code	= prctl_map.start_code;
+	mm->end_code	= prctl_map.end_code;
+	mm->start_data	= prctl_map.start_data;
+	mm->end_data	= prctl_map.end_data;
+	mm->start_brk	= prctl_map.start_brk;
+	mm->brk		= prctl_map.brk;
+	mm->start_stack	= prctl_map.start_stack;
+	mm->arg_start	= prctl_map.arg_start;
+	mm->arg_end	= prctl_map.arg_end;
+	mm->env_start	= prctl_map.env_start;
+	mm->env_end	= prctl_map.env_end;
+
+	error = 0;
+out:
+	up_read(&mm->mmap_sem);
+	return error;
+}
+#endif /* CONFIG_CHECKPOINT_RESTORE */
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
@@ -1694,9 +1896,16 @@ static int prctl_set_mm(int opt, unsigne
 	struct vm_area_struct *vma;
 	int error;
 
-	if (arg5 || (arg4 && opt != PR_SET_MM_AUXV))
+	if (arg5 || (arg4 && (opt != PR_SET_MM_AUXV &&
+			      opt != PR_SET_MM_MAP &&
+			      opt != PR_SET_MM_MAP_SIZE)))
 		return -EINVAL;
 
+#ifdef CONFIG_CHECKPOINT_RESTORE
+	if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE)
+		return prctl_set_mm_map(opt, (const void __user *)addr, arg4);
+#endif
+
 	if (!capable(CAP_SYS_RESOURCE))
 		return -EPERM;
 



* Re: [patch 3/4] prctl: PR_SET_MM -- Factor out mmap_sem when update mm::exe_file
  2014-08-04 17:22 ` [patch 3/4] prctl: PR_SET_MM -- Factor out mmap_sem when update mm::exe_file Cyrill Gorcunov
@ 2014-08-04 20:22   ` Serge E. Hallyn
  0 siblings, 0 replies; 18+ messages in thread
From: Serge E. Hallyn @ 2014-08-04 20:22 UTC
  To: Cyrill Gorcunov
  Cc: linux-kernel, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

Quoting Cyrill Gorcunov (gorcunov@openvz.org):
> Instead of taking mm->mmap_sem inside prctl_set_mm_exe_file move
> it out of and rename the helper to prctl_set_mm_exe_file_locked.
> This will allow to reuse this function in a next patch.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrew Vagin <avagin@openvz.org>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Serge Hallyn <serge.hallyn@canonical.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Vasiliy Kulikov <segoon@openwall.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Julien Tinnes <jln@google.com>
> ---
>  kernel/sys.c |   21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> Index: linux-2.6.git/kernel/sys.c
> ===================================================================
> --- linux-2.6.git.orig/kernel/sys.c
> +++ linux-2.6.git/kernel/sys.c
> @@ -1628,12 +1628,14 @@ SYSCALL_DEFINE1(umask, int, mask)
>  	return mask;
>  }
>  
> -static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
> +static int prctl_set_mm_exe_file_locked(struct mm_struct *mm, unsigned int fd)
>  {
>  	struct fd exe;
>  	struct inode *inode;
>  	int err;
>  
> +	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
> +
>  	exe = fdget(fd);
>  	if (!exe.file)
>  		return -EBADF;
> @@ -1654,8 +1656,6 @@ static int prctl_set_mm_exe_file(struct
>  	if (err)
>  		goto exit;
>  
> -	down_write(&mm->mmap_sem);
> -
>  	/*
>  	 * Forbid mm->exe_file change if old file still mapped.
>  	 */
> @@ -1667,7 +1667,7 @@ static int prctl_set_mm_exe_file(struct
>  			if (vma->vm_file &&
>  			    path_equal(&vma->vm_file->f_path,
>  				       &mm->exe_file->f_path))
> -				goto exit_unlock;
> +				goto exit;
>  	}
>  
>  	/*
> @@ -1678,13 +1678,10 @@ static int prctl_set_mm_exe_file(struct
>  	 */
>  	err = -EPERM;
>  	if (test_and_set_bit(MMF_EXE_FILE_CHANGED, &mm->flags))
> -		goto exit_unlock;
> +		goto exit;
>  
>  	err = 0;
>  	set_mm_exe_file(mm, exe.file);	/* this grabs a reference to exe.file */
> -exit_unlock:
> -	up_write(&mm->mmap_sem);
> -
>  exit:
>  	fdput(exe);
>  	return err;
> @@ -1703,8 +1700,12 @@ static int prctl_set_mm(int opt, unsigne
>  	if (!capable(CAP_SYS_RESOURCE))
>  		return -EPERM;
>  
> -	if (opt == PR_SET_MM_EXE_FILE)
> -		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
> +	if (opt == PR_SET_MM_EXE_FILE) {
> +		down_write(&mm->mmap_sem);
> +		error = prctl_set_mm_exe_file_locked(mm, (unsigned int)addr);
> +		up_write(&mm->mmap_sem);
> +		return error;
> +	}
>  
>  	if (addr >= TASK_SIZE || addr < mmap_min_addr)
>  		return -EINVAL;
> 


* Re: [patch 1/4] mm: Introduce check_data_rlimit helper, v2
  2014-08-04 17:22 ` [patch 1/4] mm: Introduce check_data_rlimit helper, v2 Cyrill Gorcunov
@ 2014-08-04 20:25   ` Serge E. Hallyn
  0 siblings, 0 replies; 18+ messages in thread
From: Serge E. Hallyn @ 2014-08-04 20:25 UTC
  To: Cyrill Gorcunov
  Cc: linux-kernel, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

Quoting Cyrill Gorcunov (gorcunov@openvz.org):
> To eliminate code duplication lets introduce check_data_rlimit
> helper which we will use in brk() and prctl() syscalls.
> 
> v2 (serge.hallyn@):
>  - need to check against RLIM_INFINITY rather than RLIMIT_DATA
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrew Vagin <avagin@openvz.org>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Serge Hallyn <serge.hallyn@canonical.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Vasiliy Kulikov <segoon@openwall.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Julien Tinnes <jln@google.com>
> ---
>  include/linux/mm.h |   15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> Index: linux-2.6.git/include/linux/mm.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/mm.h
> +++ linux-2.6.git/include/linux/mm.h
> @@ -18,6 +18,7 @@
>  #include <linux/pfn.h>
>  #include <linux/bit_spinlock.h>
>  #include <linux/shrinker.h>
> +#include <linux/resource.h>
>  
>  struct mempolicy;
>  struct anon_vma;
> @@ -1780,6 +1781,20 @@ extern struct vm_area_struct *copy_vma(s
>  	bool *need_rmap_locks);
>  extern void exit_mmap(struct mm_struct *);
>  
> +static inline int check_data_rlimit(unsigned long rlim,
> +				    unsigned long new,
> +				    unsigned long start,
> +				    unsigned long end_data,
> +				    unsigned long start_data)
> +{
> +	if (rlim < RLIM_INFINITY) {
> +		if (((new - start) + (end_data - start_data)) > rlim)
> +			return -ENOSPC;
> +	}
> +
> +	return 0;
> +}
> +
>  extern int mm_take_all_locks(struct mm_struct *mm);
>  extern void mm_drop_all_locks(struct mm_struct *mm);
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 2/4] mm: Use may_adjust_brk helper
  2014-08-04 17:22 ` [patch 2/4] mm: Use may_adjust_brk helper Cyrill Gorcunov
@ 2014-08-04 20:25   ` Serge E. Hallyn
  0 siblings, 0 replies; 18+ messages in thread
From: Serge E. Hallyn @ 2014-08-04 20:25 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

Quoting Cyrill Gorcunov (gorcunov@openvz.org):
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrew Vagin <avagin@openvz.org>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Serge Hallyn <serge.hallyn@canonical.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Vasiliy Kulikov <segoon@openwall.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Julien Tinnes <jln@google.com>
> ---
>  kernel/sys.c |   11 ++++-------
>  mm/mmap.c    |    7 +++----
>  2 files changed, 7 insertions(+), 11 deletions(-)
> 
> Index: linux-2.6.git/kernel/sys.c
> ===================================================================
> --- linux-2.6.git.orig/kernel/sys.c
> +++ linux-2.6.git/kernel/sys.c
> @@ -1693,7 +1693,6 @@ exit:
>  static int prctl_set_mm(int opt, unsigned long addr,
>  			unsigned long arg4, unsigned long arg5)
>  {
> -	unsigned long rlim = rlimit(RLIMIT_DATA);
>  	struct mm_struct *mm = current->mm;
>  	struct vm_area_struct *vma;
>  	int error;
> @@ -1733,9 +1732,8 @@ static int prctl_set_mm(int opt, unsigne
>  		if (addr <= mm->end_data)
>  			goto out;
>  
> -		if (rlim < RLIM_INFINITY &&
> -		    (mm->brk - addr) +
> -		    (mm->end_data - mm->start_data) > rlim)
> +		if (check_data_rlimit(rlimit(RLIMIT_DATA), mm->brk, addr,
> +				      mm->end_data, mm->start_data))
>  			goto out;
>  
>  		mm->start_brk = addr;
> @@ -1745,9 +1743,8 @@ static int prctl_set_mm(int opt, unsigne
>  		if (addr <= mm->end_data)
>  			goto out;
>  
> -		if (rlim < RLIM_INFINITY &&
> -		    (addr - mm->start_brk) +
> -		    (mm->end_data - mm->start_data) > rlim)
> +		if (check_data_rlimit(rlimit(RLIMIT_DATA), addr, mm->start_brk,
> +				      mm->end_data, mm->start_data))
>  			goto out;
>  
>  		mm->brk = addr;
> Index: linux-2.6.git/mm/mmap.c
> ===================================================================
> --- linux-2.6.git.orig/mm/mmap.c
> +++ linux-2.6.git/mm/mmap.c
> @@ -263,7 +263,7 @@ static unsigned long do_brk(unsigned lon
>  
>  SYSCALL_DEFINE1(brk, unsigned long, brk)
>  {
> -	unsigned long rlim, retval;
> +	unsigned long retval;
>  	unsigned long newbrk, oldbrk;
>  	struct mm_struct *mm = current->mm;
>  	unsigned long min_brk;
> @@ -293,9 +293,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
>  	 * segment grow beyond its set limit the in case where the limit is
>  	 * not page aligned -Ram Gupta
>  	 */
> -	rlim = rlimit(RLIMIT_DATA);
> -	if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
> -			(mm->end_data - mm->start_data) > rlim)
> +	if (check_data_rlimit(rlimit(RLIMIT_DATA), brk, mm->start_brk,
> +			      mm->end_data, mm->start_data))
>  		goto out;
>  
>  	newbrk = PAGE_ALIGN(brk);
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3
  2014-08-04 17:22 ` [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3 Cyrill Gorcunov
@ 2014-08-04 21:01   ` Serge E. Hallyn
  2014-08-05  8:08   ` Andrew Vagin
  2014-08-21 22:51   ` Andrew Morton
  2 siblings, 0 replies; 18+ messages in thread
From: Serge E. Hallyn @ 2014-08-04 21:01 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

Quoting Cyrill Gorcunov (gorcunov@openvz.org):
> During development of c/r we've noticed that if we need to
> support user namespaces we face a problem with capabilities in
> the prctl(PR_SET_MM, ...) call: once a new user namespace
> is created, capable(CAP_SYS_RESOURCE) no longer passes.
> 
> One approach is to eliminate the CAP_SYS_RESOURCE check but pass all
> new values in one bundle, which allows the kernel to test the
> values for sanity more thoroughly and at the same time allows us to
> support checkpoint/restore of user namespaces.
> 
> Thus a new command, PR_SET_MM_MAP, is introduced. It takes a pointer to
> a prctl_mm_map structure which carries all the members to be updated.
> 
> 	prctl(PR_SET_MM, PR_SET_MM_MAP, struct prctl_mm_map *, size)
> 
> 	struct prctl_mm_map {
> 		__u64	start_code;
> 		__u64	end_code;
> 		__u64	start_data;
> 		__u64	end_data;
> 		__u64	start_brk;
> 		__u64	brk;
> 		__u64	start_stack;
> 		__u64	arg_start;
> 		__u64	arg_end;
> 		__u64	env_start;
> 		__u64	env_end;
> 		__u64	*auxv;
> 		__u32	auxv_size;
> 		__u32	exe_fd;
> 	};
> 
> All members except @exe_fd correspond to members of struct mm_struct.
> To show which values these members may take, here are the
> meanings of the members:
> 
>  - start_code, end_code: represent bounds of executable code area
>  - start_data, end_data: represent bounds of data area
>  - start_brk, brk: used to calculate bounds for brk() syscall
>  - start_stack: used when accounting space needed for command
>    line arguments, environment and shmat() syscall
>  - arg_start, arg_end, env_start, env_end: represent memory area
>    supplied for command line arguments and environment variables
>  - auxv, auxv_size: carry the auxiliary vector (ELF format specific)
>  - exe_fd: file descriptor number for executable link (/proc/self/exe)
> 
> Thus we apply the following requirements to the values:
> 
> 1) Any member except @auxv, @auxv_size and @exe_fd is an address
>    in user space, so it must lie inside the [mmap_min_addr, mmap_max_addr)
>    interval.
> 
> 2) While @[start|end]_code and @[start|end]_data may point to nonexistent
>    VMAs (say, a program maps its own new .text and .data segments during
>    execution), the rest of the members must belong to existing VMAs.
> 
> 3) Addresses must be ordered, i.e. a @start_ member must be less than
>    the corresponding @end_ member (for start_brk/brk, arg_start/arg_end
>    and env_start/env_end the two values may also be equal).
> 
> 4) As in the regular ELF loading procedure, we require that @start_brk
>    and @brk be greater than @end_data.
> 
> 5) If the RLIMIT_DATA rlimit is not infinite, the new values must not
>    exceed the existing limit. The same applies to RLIMIT_STACK.
> 
> 6) The auxiliary vector size must not exceed the existing one (which is
>    predefined as AT_VECTOR_SIZE and depends on the architecture).
> 
> 7) The file descriptor passed in @exe_fd must point to an executable
>    file (because we use the existing prctl_set_mm_exe_file_locked
>    helper, it ensures that the file we are going to use as the exe link
>    has all required permissions granted).
> 
> Now, about where these members are used inside the kernel:
> 
>  - @start_code and @end_code are used in /proc/$pid/[stat|statm] output;
> 
>  - @start_data and @end_data are used in /proc/$pid/[stat|statm] output;
>    they are also used to check whether there is enough space for the
>    brk() syscall result if RLIMIT_DATA is set;
> 
>  - @start_brk is shown in /proc/$pid/stat output and accounted in the brk()
>    syscall if RLIMIT_DATA is set; this member is also tested to find
>    a symbolic name for mmap events in the perf system (to decide
>    whether an event was generated for the "heap" area); one more application
>    is SELinux -- we test whether a process has the PROCESS__EXECHEAP permission
>    when it tries to make the heap area executable with the mprotect() syscall;
> 
>  - @brk is the current value for the brk() syscall, which lies inside the
>    heap area; it's shown in /proc/$pid/stat. When the brk() syscall
>    successfully provides a new memory area to user space, mm::brk is
>    updated to carry the new value upon completion;
> 
>    Both @start_brk and @brk are actively used in /proc/$pid/maps
>    and /proc/$pid/smaps output to find the symbolic name "heap" for
>    the VMA being scanned;
> 
>  - @start_stack is printed out in /proc/$pid/stat and used to
>    find the symbolic name "stack" for the task and its threads in
>    /proc/$pid/maps and /proc/$pid/smaps output; as with @start_brk,
>    the perf system uses it for event naming.
>    The kernel also treats this member as the start address of where
>    to map vDSO pages and to check whether there is enough space
>    for the shmat() syscall;
> 
>  - @arg_start, @arg_end, @env_start and @env_end are printed out
>    in /proc/$pid/stat. Another way to access the data these members
>    represent is to read /proc/$pid/environ or /proc/$pid/cmdline.
>    The kernel checks any attempt to read these areas with the
>    access_process_vm helper, so a user must have sufficient rights
>    for this action;
> 
>  - @auxv and @auxv_size may be read from /proc/$pid/auxv. Strictly
>    speaking, the kernel doesn't care much about exactly which data is
>    sitting there because it is solely for userspace;
> 
>  - @exe_fd is referenced from /proc/$pid/exe and when generating
>    a coredump. We use the prctl_set_mm_exe_file_locked helper to update
>    this member, so exe-file link modification remains a one-shot
>    action.
> 
> Note that updating the exe-file link no longer requires the sys-resource
> capability; after all, there is not much profit in preventing a process
> from setting up its own file link (there are a number of ways to execute
> one's own code -- ptrace, LD_PRELOAD -- so the only reliable way to find
> exactly which code is executed is to inspect the running program's
> memory). Still, we require the caller to be at least the user-namespace
> root user.
> 
> I believe the old interface should be deprecated and ripped out
> in a couple of kernel releases if no one objects.
> 
> To test whether the new interface is implemented in the kernel, one
> can pass the PR_SET_MM_MAP_SIZE opcode; the kernel then writes the size
> of the currently supported struct prctl_mm_map to the unsigned int
> that the third prctl() argument points to.
> 
> v2:
>  - compact macros (by keescook@)
>  - wrap new code with CONFIG_ (by akpm@)
> 
> v3 (by jln@):
>  - use __prctl_check_order for brk and start_brk
>  - use may_adjust_brk helper
>  - make sure that only root can update @exe_fd link
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrew Vagin <avagin@openvz.org>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Serge Hallyn <serge.hallyn@canonical.com>

Going mainly by your description, as this is not my forté.
All looks kosher to me.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Vasiliy Kulikov <segoon@openwall.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Julien Tinnes <jln@google.com>
> ---
>  include/uapi/linux/prctl.h |   25 +++++
>  kernel/sys.c               |  211 ++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 235 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.git/include/uapi/linux/prctl.h
> ===================================================================
> --- linux-2.6.git.orig/include/uapi/linux/prctl.h
> +++ linux-2.6.git/include/uapi/linux/prctl.h
> @@ -119,6 +119,31 @@
>  # define PR_SET_MM_ENV_END		11
>  # define PR_SET_MM_AUXV			12
>  # define PR_SET_MM_EXE_FILE		13
> +# define PR_SET_MM_MAP			14
> +# define PR_SET_MM_MAP_SIZE		15
> +
> +/*
> + * This structure provides new memory descriptor
> + * map which mostly modifies /proc/pid/stat[m]
> + * output for a task. This mostly done in a
> + * sake of checkpoint/restore functionality.
> + */
> +struct prctl_mm_map {
> +	__u64	start_code;		/* code section bounds */
> +	__u64	end_code;
> +	__u64	start_data;		/* data section bounds */
> +	__u64	end_data;
> +	__u64	start_brk;		/* heap for brk() syscall */
> +	__u64	brk;
> +	__u64	start_stack;		/* stack starts at */
> +	__u64	arg_start;		/* command line arguments bounds */
> +	__u64	arg_end;
> +	__u64	env_start;		/* environment variables bounds */
> +	__u64	env_end;
> +	__u64	*auxv;			/* auxiliary vector */
> +	__u32	auxv_size;		/* vector size */
> +	__u32	exe_fd;			/* /proc/$pid/exe link file */
> +};
>  
>  /*
>   * Set specific pid that is allowed to ptrace the current task.
> Index: linux-2.6.git/kernel/sys.c
> ===================================================================
> --- linux-2.6.git.orig/kernel/sys.c
> +++ linux-2.6.git/kernel/sys.c
> @@ -1687,6 +1687,208 @@ exit:
>  	return err;
>  }
>  
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +/*
> + * WARNING: we don't require any capability here so be very careful
> + * in what is allowed for modification from userspace.
> + */
> +static int validate_prctl_map_locked(struct prctl_mm_map *prctl_map)
> +{
> +	unsigned long mmap_max_addr = TASK_SIZE;
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *stack_vma;
> +	int error = 0;
> +
> +	/*
> +	 * Make sure the members are not somewhere outside
> +	 * of allowed address space.
> +	 */
> +#define __prctl_check_addr_space(__member)					\
> +	({									\
> +		int __rc;							\
> +		if ((unsigned long)prctl_map->__member < mmap_max_addr &&	\
> +		    (unsigned long)prctl_map->__member >= mmap_min_addr)	\
> +			__rc = 0;						\
> +		else								\
> +			__rc = -EINVAL;						\
> +		__rc;								\
> +	})
> +	error |= __prctl_check_addr_space(start_code);
> +	error |= __prctl_check_addr_space(end_code);
> +	error |= __prctl_check_addr_space(start_data);
> +	error |= __prctl_check_addr_space(end_data);
> +	error |= __prctl_check_addr_space(start_stack);
> +	error |= __prctl_check_addr_space(start_brk);
> +	error |= __prctl_check_addr_space(brk);
> +	error |= __prctl_check_addr_space(arg_start);
> +	error |= __prctl_check_addr_space(arg_end);
> +	error |= __prctl_check_addr_space(env_start);
> +	error |= __prctl_check_addr_space(env_end);
> +	if (error)
> +		goto out;
> +#undef __prctl_check_addr_space
> +
> +	/*
> +	 * Stack, brk, command line arguments and environment must exist.
> +	 */
> +	stack_vma = find_vma(mm, (unsigned long)prctl_map->start_stack);
> +	if (!stack_vma) {
> +		error = -EINVAL;
> +		goto out;
> +	}
> +#define __prctl_check_vma(__member)						\
> +	find_vma(mm, (unsigned long)prctl_map->__member) ? 0 : -EINVAL
> +	error |= __prctl_check_vma(start_brk);
> +	error |= __prctl_check_vma(brk);
> +	error |= __prctl_check_vma(arg_start);
> +	error |= __prctl_check_vma(arg_end);
> +	error |= __prctl_check_vma(env_start);
> +	error |= __prctl_check_vma(env_end);
> +	if (error)
> +		goto out;
> +#undef __prctl_check_vma
> +
> +	/*
> +	 * Make sure the pairs are ordered.
> +	 */
> +#define __prctl_check_order(__m1, __op, __m2)					\
> +	((unsigned long)prctl_map->__m1 __op					\
> +	 (unsigned long)prctl_map->__m2) ? 0 : -EINVAL
> +	error |= __prctl_check_order(start_code, <, end_code);
> +	error |= __prctl_check_order(start_data, <, end_data);
> +	error |= __prctl_check_order(start_brk, <=, brk);
> +	error |= __prctl_check_order(arg_start, <=, arg_end);
> +	error |= __prctl_check_order(env_start, <=, env_end);
> +	if (error)
> +		goto out;
> +#undef __prctl_check_order
> +
> +	error = -EINVAL;
> +
> +	/*
> +	 * @brk should be after @end_data in traditional maps.
> +	 */
> +	if (prctl_map->start_brk <= prctl_map->end_data ||
> +	    prctl_map->brk <= prctl_map->end_data)
> +		goto out;
> +
> +	/*
> +	 * Neither we should allow to override limits if they set.
> +	 */
> +	if (check_data_rlimit(rlimit(RLIMIT_DATA), prctl_map->brk,
> +			      prctl_map->start_brk, prctl_map->end_data,
> +			      prctl_map->start_data))
> +			goto out;
> +
> +#ifdef CONFIG_STACK_GROWSUP
> +	if (check_data_rlimit(rlimit(RLIMIT_STACK),
> +			      stack_vma->vm_end,
> +			      prctl_map->start_stack, 0, 0))
> +#else
> +	if (check_data_rlimit(rlimit(RLIMIT_STACK),
> +			      prctl_map->start_stack,
> +			      stack_vma->vm_start, 0, 0))
> +#endif
> +		goto out;
> +
> +	/*
> +	 * Someone is trying to cheat the auxv vector.
> +	 */
> +	if (prctl_map->auxv_size) {
> +		if (!prctl_map->auxv ||
> +		    prctl_map->auxv_size > sizeof(mm->saved_auxv))
> +			goto out;
> +	}
> +
> +	/*
> +	 * Finally, make sure the caller has the rights to
> +	 * change /proc/pid/exe link: only local root should
> +	 * be allowed to.
> +	 */
> +	if (prctl_map->exe_fd != (u32)-1) {
> +		struct user_namespace *ns = current_user_ns();
> +		const struct cred *cred = current_cred();
> +
> +		if (!uid_eq(cred->uid, make_kuid(ns, 0)) ||
> +		    !gid_eq(cred->gid, make_kgid(ns, 0)))
> +			goto out;
> +	}
> +
> +	error = 0;
> +out:
> +	return error;
> +}
> +
> +static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size)
> +{
> +	struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, };
> +	unsigned long user_auxv[AT_VECTOR_SIZE];
> +	struct mm_struct *mm = current->mm;
> +	int error = -EINVAL;
> +
> +	BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv));
> +
> +	if (opt == PR_SET_MM_MAP_SIZE)
> +		return put_user((unsigned int)sizeof(prctl_map),
> +				(unsigned int __user *)addr);
> +
> +	if (data_size != sizeof(prctl_map))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
> +		return -EFAULT;
> +
> +	down_read(&mm->mmap_sem);
> +
> +	if (validate_prctl_map_locked(&prctl_map))
> +		goto out;
> +
> +	if (prctl_map.auxv_size) {
> +		up_read(&mm->mmap_sem);
> +		memset(user_auxv, 0, sizeof(user_auxv));
> +		error = copy_from_user(user_auxv,
> +				       (const void __user *)prctl_map.auxv,
> +				       prctl_map.auxv_size);
> +		down_read(&mm->mmap_sem);
> +		if (error)
> +			goto out;
> +	}
> +
> +	if (prctl_map.exe_fd != (u32)-1) {
> +		error = prctl_set_mm_exe_file_locked(mm, prctl_map.exe_fd);
> +		if (error)
> +			goto out;
> +	}
> +
> +	if (prctl_map.auxv_size) {
> +		/* Last entry must be AT_NULL as specification requires */
> +		user_auxv[AT_VECTOR_SIZE - 2] = AT_NULL;
> +		user_auxv[AT_VECTOR_SIZE - 1] = AT_NULL;
> +
> +		task_lock(current);
> +		memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
> +		task_unlock(current);
> +	}
> +
> +	mm->start_code	= prctl_map.start_code;
> +	mm->end_code	= prctl_map.end_code;
> +	mm->start_data	= prctl_map.start_data;
> +	mm->end_data	= prctl_map.end_data;
> +	mm->start_brk	= prctl_map.start_brk;
> +	mm->brk		= prctl_map.brk;
> +	mm->start_stack	= prctl_map.start_stack;
> +	mm->arg_start	= prctl_map.arg_start;
> +	mm->arg_end	= prctl_map.arg_end;
> +	mm->env_start	= prctl_map.env_start;
> +	mm->env_end	= prctl_map.env_end;
> +
> +	error = 0;
> +out:
> +	up_read(&mm->mmap_sem);
> +	return error;
> +}
> +#endif /* CONFIG_CHECKPOINT_RESTORE */
> +
>  static int prctl_set_mm(int opt, unsigned long addr,
>  			unsigned long arg4, unsigned long arg5)
>  {
> @@ -1694,9 +1896,16 @@ static int prctl_set_mm(int opt, unsigne
>  	struct vm_area_struct *vma;
>  	int error;
>  
> -	if (arg5 || (arg4 && opt != PR_SET_MM_AUXV))
> +	if (arg5 || (arg4 && (opt != PR_SET_MM_AUXV &&
> +			      opt != PR_SET_MM_MAP &&
> +			      opt != PR_SET_MM_MAP_SIZE)))
>  		return -EINVAL;
>  
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +	if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE)
> +		return prctl_set_mm_map(opt, (const void __user *)addr, arg4);
> +#endif
> +
>  	if (!capable(CAP_SYS_RESOURCE))
>  		return -EPERM;
>  
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3
  2014-08-04 17:22 ` [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3 Cyrill Gorcunov
  2014-08-04 21:01   ` Serge E. Hallyn
@ 2014-08-05  8:08   ` Andrew Vagin
  2014-08-05  8:12     ` Cyrill Gorcunov
  2014-08-21 22:51   ` Andrew Morton
  2 siblings, 1 reply; 18+ messages in thread
From: Andrew Vagin @ 2014-08-05  8:08 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

On Mon, Aug 04, 2014 at 09:22:59PM +0400, Cyrill Gorcunov wrote:
[...]
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrew Vagin <avagin@openvz.org>

Acked-by: Andrew Vagin <avagin@openvz.org>

I have tested this patch with criu. Everything works as expected.

Thanks.

> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Serge Hallyn <serge.hallyn@canonical.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: Vasiliy Kulikov <segoon@openwall.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Julien Tinnes <jln@google.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3
  2014-08-05  8:08   ` Andrew Vagin
@ 2014-08-05  8:12     ` Cyrill Gorcunov
  0 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2014-08-05  8:12 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: linux-kernel, keescook, tj, akpm, avagin, ebiederm, hpa,
	serge.hallyn, xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

On Tue, Aug 05, 2014 at 12:08:53PM +0400, Andrew Vagin wrote:
> > 
> > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Andrew Vagin <avagin@openvz.org>
> 
> Acked-by: Andrew Vagin <avagin@openvz.org>
> 
> I have tested this patch with criu. Everything works as expected.

Thanks Andrew, Serge!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 0/4] prctl: set-mm -- Rework interface, v3
  2014-08-04 17:22 [patch 0/4] prctl: set-mm -- Rework interface, v3 Cyrill Gorcunov
                   ` (3 preceding siblings ...)
  2014-08-04 17:22 ` [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3 Cyrill Gorcunov
@ 2014-08-15 19:11 ` Cyrill Gorcunov
  4 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2014-08-15 19:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: keescook, tj, akpm, avagin, ebiederm, hpa, serge.hallyn, xemul,
	segoon, kamezawa.hiroyu, mtk.manpages, jln

On Mon, Aug 04, 2014 at 09:22:55PM +0400, Cyrill Gorcunov wrote:
> Hi! Here is (I hope) a final version of the series (I had a typo
> in the check_data_rlimit helper, thanks serge.hallyn@ for spotting it).
> 
> Please take a look when time permits (and drop the previous two versions
> from your mbox). Thanks!

Ping, any more comments?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3
  2014-08-04 17:22 ` [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3 Cyrill Gorcunov
  2014-08-04 21:01   ` Serge E. Hallyn
  2014-08-05  8:08   ` Andrew Vagin
@ 2014-08-21 22:51   ` Andrew Morton
  2014-08-22  6:32     ` Cyrill Gorcunov
  2 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2014-08-21 22:51 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, keescook, tj, avagin, ebiederm, hpa, serge.hallyn,
	xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

On Mon, 04 Aug 2014 21:22:59 +0400 Cyrill Gorcunov <gorcunov@openvz.org> wrote:

> During development of c/r we've noticed that if we need to
> support user namespaces we face a problem with capabilities in
> the prctl(PR_SET_MM, ...) call: once a new user namespace
> is created, capable(CAP_SYS_RESOURCE) no longer passes.
> 
> One approach is to eliminate the CAP_SYS_RESOURCE check but pass all
> new values in one bundle, which allows the kernel to test the
> values for sanity more thoroughly and at the same time allows us to
> support checkpoint/restore of user namespaces.
> 
> Thus a new command, PR_SET_MM_MAP, is introduced. It takes a pointer to
> a prctl_mm_map structure which carries all the members to be updated.
> 
> 	prctl(PR_SET_MM, PR_SET_MM_MAP, struct prctl_mm_map *, size)
> 
> 	struct prctl_mm_map {
> 		__u64	start_code;
> 		__u64	end_code;
> 		__u64	start_data;
> 		__u64	end_data;
> 		__u64	start_brk;
> 		__u64	brk;
> 		__u64	start_stack;
> 		__u64	arg_start;
> 		__u64	arg_end;
> 		__u64	env_start;
> 		__u64	env_end;
> 		__u64	*auxv;
> 		__u32	auxv_size;
> 		__u32	exe_fd;
> 	};
> 
> All members except @exe_fd correspond to members of struct mm_struct.
> To show which values these members may take, here are the
> meanings of the members:
> 
>  - start_code, end_code: represent bounds of executable code area
>  - start_data, end_data: represent bounds of data area
>  - start_brk, brk: used to calculate bounds for brk() syscall
>  - start_stack: used when accounting space needed for command
>    line arguments, environment and shmat() syscall
>  - arg_start, arg_end, env_start, env_end: represent memory area
>    supplied for command line arguments and environment variables
>  - auxv, auxv_size: carry the auxiliary vector, ELF format specifics
>  - exe_fd: file descriptor number for executable link (/proc/self/exe)
> 
> Thus we apply the following requirements to the values:
> 
> 1) Any member except @auxv, @auxv_size, @exe_fd is an address
>    in user space, so it must lie inside the [mmap_min_addr, mmap_max_addr)
>    interval.
> 
> 2) While @[start|end]_code and @[start|end]_data may point to nonexistent
>    VMAs (say a program maps its own new .text and .data segments during
>    execution), the rest of the members must belong to existing VMAs.
> 
> 3) Addresses must be ordered, i.e. a @start_ member must not be greater
>    than or equal to the corresponding @end_ member.
> 
> 4) As in the regular ELF loading procedure, we require that @start_brk and
>    @brk be greater than @end_data.
> 
> 5) If the RLIMIT_DATA rlimit is set to non-infinity, the new values must not
>    exceed the existing limit. The same applies to RLIMIT_STACK.
> 
> 6) The auxiliary vector size must not exceed the existing one (which is
>    predefined as AT_VECTOR_SIZE and depends on the architecture).
> 
> 7) The file descriptor passed in @exe_fd must point to an executable
>    file (because we use the existing prctl_set_mm_exe_file_locked helper,
>    it ensures that the file we are going to use as the exe link has all
>    the required permissions granted).
> 
> Now about where these members are used inside the kernel code:
> 
>  - @start_code and @end_code are used in /proc/$pid/[stat|statm] output;
> 
>  - @start_data and @end_data are used in /proc/$pid/[stat|statm] output;
>    they are also taken into account when checking whether there is enough
>    space for the brk() syscall result if RLIMIT_DATA is set;
> 
>  - @start_brk is shown in /proc/$pid/stat output and accounted in the brk()
>    syscall if RLIMIT_DATA is set; this member is also tested to
>    find a symbolic name of an mmap event for the perf system (we choose
>    whether the event is generated for the "heap" area); one more application
>    is selinux: we test whether a process has PROCESS__EXECHEAP permission
>    when it tries to make the heap area executable with the mprotect() syscall;
> 
>  - @brk is the current value for the brk() syscall, which lies inside the heap
>    area; it's shown in /proc/$pid/stat. When the brk() syscall successfully
>    provides a new memory area to user space, mm::brk is updated upon
>    completion to carry the new value;
> 
>    Both @start_brk and @brk are actively used in /proc/$pid/maps
>    and /proc/$pid/smaps output to find the symbolic name "heap" for
>    the VMA being scanned;
> 
>  - @start_stack is printed out in /proc/$pid/stat and used to
>    find the symbolic name "stack" for tasks and threads in
>    /proc/$pid/maps and /proc/$pid/smaps output, and, just
>    as with @start_brk, the perf system uses it for event naming.
>    The kernel also treats this member as the start address of where
>    to map vDSO pages and uses it to check if there is enough space
>    for the shmat() syscall;
> 
>  - @arg_start, @arg_end, @env_start and @env_end are printed out
>    in /proc/$pid/stat. The other way to access the data these members
>    represent is to read /proc/$pid/environ or /proc/$pid/cmdline.
>    For any attempt to read these areas the kernel goes through the
>    access_process_vm helper, so a user must have enough rights for this action;
> 
>  - @auxv and @auxv_size may be read from /proc/$pid/auxv. Strictly
>    speaking, the kernel doesn't care much about which exact data is
>    sitting there because it is solely for userspace;
> 
>  - @exe_fd is referred to from /proc/$pid/exe and when generating a
>    coredump. We use the prctl_set_mm_exe_file_locked helper to update
>    this member, so exe-file link modification remains a one-shot
>    action.
> 
> Still note that updating the exe-file link now doesn't require the
> sys-resource capability anymore; after all, there is not much profit in
> preventing the setup of one's own file link (there are a number of ways to
> execute one's own code, such as ptrace or ld-preload, so the only reliable
> way to find which exact code is executed is to inspect the running
> program's memory). Still, we require the caller to be at least the
> user-namespace root user.
> 
> I believe the old interface should be deprecated and ripped out
> in a couple of kernel releases if no one objects.
> 
> To test whether the new interface is implemented in the kernel, one
> can pass the PR_SET_MM_MAP_SIZE opcode and the kernel returns
> the size of the currently supported struct prctl_mm_map.
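Rules 1, 3, 4 and 5 above can be sketched as a plain C check over a userspace copy of the address fields. This is not the kernel's validate_prctl_map_locked(): `struct mm_map_sketch` and `check_map_sketch()` are hypothetical names, `UINT64_MAX` stands in for RLIM_INFINITY, the -ENOSPC code for the rlimit case is an assumption, and the `start_brk <= brk` test is an extra assumption that keeps the heap-size subtraction from wrapping. Rules 2, 6 and 7 (and the RLIMIT_STACK half of rule 5) need VMA lookups, AT_VECTOR_SIZE or a real file descriptor, so they are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the address members of struct prctl_mm_map. */
struct mm_map_sketch {
	uint64_t start_code, end_code;
	uint64_t start_data, end_data;
	uint64_t start_brk, brk;
	uint64_t start_stack;
	uint64_t arg_start, arg_end;
	uint64_t env_start, env_end;
};

/* Rule 1: an address must lie in [min_addr, max_addr). */
static int in_range(uint64_t addr, uint64_t min_addr, uint64_t max_addr)
{
	return addr >= min_addr && addr < max_addr;
}

/*
 * Sketch of rules 1, 3, 4 and 5 as worded in the changelog.
 * data_rlim == UINT64_MAX means "no RLIMIT_DATA" (assumption).
 */
static int check_map_sketch(const struct mm_map_sketch *m,
			    uint64_t min_addr, uint64_t max_addr,
			    uint64_t data_rlim)
{
	const uint64_t addrs[] = {
		m->start_code, m->end_code, m->start_data, m->end_data,
		m->start_brk, m->brk, m->start_stack,
		m->arg_start, m->arg_end, m->env_start, m->env_end,
	};
	size_t i;

	for (i = 0; i < sizeof(addrs) / sizeof(addrs[0]); i++)
		if (!in_range(addrs[i], min_addr, max_addr))
			return -EINVAL;

	/* Rule 3: each start must be strictly below its end. */
	if (m->start_code >= m->end_code || m->start_data >= m->end_data ||
	    m->arg_start >= m->arg_end || m->env_start >= m->env_end)
		return -EINVAL;

	/* Assumption: an empty heap (start_brk == brk) is allowed. */
	if (m->start_brk > m->brk)
		return -EINVAL;

	/* Rule 4: the heap must start above the data segment. */
	if (m->start_brk <= m->end_data || m->brk <= m->end_data)
		return -EINVAL;

	/* Rule 5: heap plus data must fit under a finite RLIMIT_DATA. */
	if (data_rlim != UINT64_MAX &&
	    (m->brk - m->start_brk) + (m->end_data - m->start_data) > data_rlim)
		return -ENOSPC;

	return 0;
}
```

Running every check before returning, rather than stopping at the first helper that knows about VMAs, keeps this part free of any locking; the kernel version additionally has to hold mmap_sem for the VMA lookups of rule 2.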

Please convince me that we're not adding any security holes.


> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +/*
> + * WARNING: we don't require any capability here so be very careful
> + * in what is allowed for modification from userspace.
> + */
> +static int validate_prctl_map_locked(struct prctl_mm_map *prctl_map)
> +{
> +	unsigned long mmap_max_addr = TASK_SIZE;
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *stack_vma;
> +	int error = 0;
> +
> +	/*
> +	 * Make sure the members are not somewhere outside
> +	 * of allowed address space.
> +	 */
> +#define __prctl_check_addr_space(__member)					\
> +	({									\
> +		int __rc;							\
> +		if ((unsigned long)prctl_map->__member < mmap_max_addr &&	\
> +		    (unsigned long)prctl_map->__member >= mmap_min_addr)	\
> +			__rc = 0;						\
> +		else								\
> +			__rc = -EINVAL;						\
> +		__rc;								\
> +	})
> +	error |= __prctl_check_addr_space(start_code);
> +	error |= __prctl_check_addr_space(end_code);
> +	error |= __prctl_check_addr_space(start_data);
> +	error |= __prctl_check_addr_space(end_data);
> +	error |= __prctl_check_addr_space(start_stack);
> +	error |= __prctl_check_addr_space(start_brk);
> +	error |= __prctl_check_addr_space(brk);
> +	error |= __prctl_check_addr_space(arg_start);
> +	error |= __prctl_check_addr_space(arg_end);
> +	error |= __prctl_check_addr_space(env_start);
> +	error |= __prctl_check_addr_space(env_end);

Boy this is verbose.  I had a little fiddle and came up with

--- a/kernel/sys.c~a
+++ a/kernel/sys.c
@@ -1713,19 +1713,32 @@ static int validate_prctl_map_locked(str
 			__rc = -EINVAL;					\
 		__rc;							\
 	})
-	error |= __prctl_check_addr_space(start_code);
-	error |= __prctl_check_addr_space(end_code);
-	error |= __prctl_check_addr_space(start_data);
-	error |= __prctl_check_addr_space(end_data);
-	error |= __prctl_check_addr_space(start_stack);
-	error |= __prctl_check_addr_space(start_brk);
-	error |= __prctl_check_addr_space(brk);
-	error |= __prctl_check_addr_space(arg_start);
-	error |= __prctl_check_addr_space(arg_end);
-	error |= __prctl_check_addr_space(env_start);
-	error |= __prctl_check_addr_space(env_end);
-	if (error)
-		goto out;
+	{
+		static const unsigned short offsets[] = {
+			offsetof(struct prctl_mm_map, start_code),
+			offsetof(struct prctl_mm_map, end_code),
+			offsetof(struct prctl_mm_map, start_data),
+			offsetof(struct prctl_mm_map, end_data),
+			offsetof(struct prctl_mm_map, start_stack),
+			offsetof(struct prctl_mm_map, start_brk),
+			offsetof(struct prctl_mm_map, brk),
+			offsetof(struct prctl_mm_map, arg_start),
+			offsetof(struct prctl_mm_map, arg_end),
+			offsetof(struct prctl_mm_map, env_start),
+			offsetof(struct prctl_mm_map, env_end),
+		};
+		int i;
+
+		for (i = 0; i < ARRAY_SIZE(offsets); i++) {
+			u64 val = *(u64 *)((char *)prctl_map + offsets[i]);
+
+			if (val < mmap_min_addr || val >= mmap_max_addr) {
+				error = -EINVAL;
+				goto out;
+			}
+		}
+	}

and it saved 400 bytes of text.

But it's a bit hacky.  Can anyone think of anything smarter?
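For comparison, here is a sketch of the same table-driven range check as standalone C. `offsetof()` yields byte offsets, so each value is read through a `char *` at that offset; indexing a `u64 *` with the raw offset would scale it by eight and read the wrong member. The struct, array and function names are hypothetical, and memcpy() is used instead of a pointer cast to sidestep alignment and aliasing questions in userspace.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical mirror of the address members of struct prctl_mm_map. */
struct mm_map_sketch {
	uint64_t start_code, end_code, start_data, end_data;
	uint64_t start_brk, brk, start_stack;
	uint64_t arg_start, arg_end, env_start, env_end;
};

/* Byte offsets of the members that must hold valid user addresses. */
static const unsigned short addr_offsets[] = {
	offsetof(struct mm_map_sketch, start_code),
	offsetof(struct mm_map_sketch, end_code),
	offsetof(struct mm_map_sketch, start_data),
	offsetof(struct mm_map_sketch, end_data),
	offsetof(struct mm_map_sketch, start_brk),
	offsetof(struct mm_map_sketch, brk),
	offsetof(struct mm_map_sketch, start_stack),
	offsetof(struct mm_map_sketch, arg_start),
	offsetof(struct mm_map_sketch, arg_end),
	offsetof(struct mm_map_sketch, env_start),
	offsetof(struct mm_map_sketch, env_end),
};

/* Return 0 if every listed member lies in [min_addr, max_addr), else -EINVAL. */
static int check_addr_space(const struct mm_map_sketch *map,
			    uint64_t min_addr, uint64_t max_addr)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(addr_offsets); i++) {
		uint64_t val;

		/* Offsets are in bytes: step through a char pointer, then load. */
		memcpy(&val, (const char *)map + addr_offsets[i], sizeof(val));
		if (val < min_addr || val >= max_addr)
			return -EINVAL;
	}
	return 0;
}
```

The loop body is emitted once regardless of how many members are listed, which is where the text savings over eleven macro expansions come from.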




* Re: [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3
  2014-08-21 22:51   ` Andrew Morton
@ 2014-08-22  6:32     ` Cyrill Gorcunov
  2014-08-22  6:49       ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2014-08-22  6:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, keescook, tj, avagin, ebiederm, hpa, serge.hallyn,
	xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

On Thu, Aug 21, 2014 at 03:51:15PM -0700, Andrew Morton wrote:
...
> > 
> > Still note that updating the exe-file link now doesn't require the
> > sys-resource capability anymore; after all, there is not much profit in
> > preventing the setup of one's own file link (there are a number of ways to
> > execute one's own code, such as ptrace or ld-preload, so the only reliable
> > way to find which exact code is executed is to inspect the running
> > program's memory). Still, we require the caller to be at least the
> > user-namespace root user.
> > 
> > I believe the old interface should be deprecated and ripped out
> > in a couple of kernel releases if no one objects.
> > 
> > To test whether the new interface is implemented in the kernel, one
> > can pass the PR_SET_MM_MAP_SIZE opcode and the kernel returns
> > the size of the currently supported struct prctl_mm_map.
> 
> Please convince me that we're not adding any security holes.

I've commented on all the fields and their purpose and triple-checked them all,
so I don't see any security problems, but for the same reason I've CC'ed a number
of people just to be on the safe side. Again, if we want this feature to be
more tightly controlled, we can add a sysctl variable which would
enable/disable this interface globally.

> > +	error |= __prctl_check_addr_space(start_code);
> > +	error |= __prctl_check_addr_space(end_code);
> > +	error |= __prctl_check_addr_space(start_data);
> > +	error |= __prctl_check_addr_space(end_data);
> > +	error |= __prctl_check_addr_space(start_stack);
> > +	error |= __prctl_check_addr_space(start_brk);
> > +	error |= __prctl_check_addr_space(brk);
> > +	error |= __prctl_check_addr_space(arg_start);
> > +	error |= __prctl_check_addr_space(arg_end);
> > +	error |= __prctl_check_addr_space(env_start);
> > +	error |= __prctl_check_addr_space(env_end);
> 
> Boy this is verbose.  I had a little fiddle and came up with
...
> 
> +
> +		for (i = 0; i < ARRAY_SIZE(offsets); i++) {
> +			u64 val = *(u64 *)((char *)prctl_map + offsets[i]);
> +
> +			if (val < mmap_min_addr || val >= mmap_max_addr) {
> +				error = -EINVAL;
> +				goto out;
> +			}
> +		}
> +	}
> 
> and it saved 400 bytes of text.
> 
> But it's a bit hacky.  Can anyone think of anything smarter?

Looks good to me and not that hacky actually. Should I send an update
on top for the -mm tree?


* Re: [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3
  2014-08-22  6:32     ` Cyrill Gorcunov
@ 2014-08-22  6:49       ` Andrew Morton
  2014-08-22 20:38         ` Cyrill Gorcunov
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2014-08-22  6:49 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, keescook, tj, avagin, ebiederm, hpa, serge.hallyn,
	xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

On Fri, 22 Aug 2014 10:32:42 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> > > +	error |= __prctl_check_addr_space(start_code);
> > > +	error |= __prctl_check_addr_space(end_code);
> > > +	error |= __prctl_check_addr_space(start_data);
> > > +	error |= __prctl_check_addr_space(end_data);
> > > +	error |= __prctl_check_addr_space(start_stack);
> > > +	error |= __prctl_check_addr_space(start_brk);
> > > +	error |= __prctl_check_addr_space(brk);
> > > +	error |= __prctl_check_addr_space(arg_start);
> > > +	error |= __prctl_check_addr_space(arg_end);
> > > +	error |= __prctl_check_addr_space(env_start);
> > > +	error |= __prctl_check_addr_space(env_end);
> > 
> > Boy this is verbose.  I had a little fiddle and came up with
> ...
> > 
> > +
> > +		for (i = 0; i < ARRAY_SIZE(offsets); i++) {
> > +			u64 val = *(u64 *)((char *)prctl_map + offsets[i]);
> > +
> > +			if (val < mmap_min_addr || val >= mmap_max_addr) {
> > +				error = -EINVAL;
> > +				goto out;
> > +			}
> > +		}
> > +	}
> > 
> > and it saved 400 bytes of text.
> > 
> > But it's a bit hacky.  Can anyone think of anything smarter?
> 
> Looks good to me and not that hacky actually.

Hacky :( I guess it's pretty safe because this is a userspace-visible
structure so we'll never be changing it.

Or will we?  What happens if we later decide that some additional field
needs to be added?  Do we version the interface?  Add a new prctl()
mode?  Let's cook up a plan for that and at least add to changelog?

> Should I send an update on top for the -mm tree?

Spose so.  Let's see what the code savings are when the other two sites
are similarly changed?

To save a bit more space, offsets[] could be an array of uchar, I guess.
A BUILD_BUG_ON(sizeof(struct prctl_mm_map) >= 256) would keep that sane.
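A sketch of the uchar variant in userspace C: C11 `_Static_assert` plays the role of BUILD_BUG_ON() and fails the build once the structure reaches 256 bytes, the point at which offsets of trailing members could no longer be stored in an `unsigned char`. `struct mm_map_sketch` is a hypothetical mirror of struct prctl_mm_map as posted, not the uapi header itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct prctl_mm_map as posted. */
struct mm_map_sketch {
	uint64_t start_code, end_code, start_data, end_data;
	uint64_t start_brk, brk, start_stack;
	uint64_t arg_start, arg_end, env_start, env_end;
	uint64_t *auxv;
	uint32_t auxv_size;
	uint32_t exe_fd;
};

/*
 * Counterpart of BUILD_BUG_ON(sizeof(...) >= 256): while the structure
 * stays under 256 bytes, every member offset fits in an unsigned char.
 */
_Static_assert(sizeof(struct mm_map_sketch) < 256,
	       "member offsets no longer fit in unsigned char");

/* One byte per entry instead of two for unsigned short. */
static const unsigned char addr_offsets[] = {
	offsetof(struct mm_map_sketch, start_code),
	offsetof(struct mm_map_sketch, end_code),
	offsetof(struct mm_map_sketch, start_data),
	offsetof(struct mm_map_sketch, end_data),
	offsetof(struct mm_map_sketch, start_brk),
	offsetof(struct mm_map_sketch, brk),
	offsetof(struct mm_map_sketch, start_stack),
	offsetof(struct mm_map_sketch, arg_start),
	offsetof(struct mm_map_sketch, arg_end),
	offsetof(struct mm_map_sketch, env_start),
	offsetof(struct mm_map_sketch, env_end),
};
```

Because the assertion is evaluated at compile time, a later change that grows the structure past the limit is caught at build time rather than silently truncating an offset.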



* Re: [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3
  2014-08-22  6:49       ` Andrew Morton
@ 2014-08-22 20:38         ` Cyrill Gorcunov
  2014-08-22 20:46           ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2014-08-22 20:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, keescook, tj, avagin, ebiederm, hpa, serge.hallyn,
	xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

On Thu, Aug 21, 2014 at 11:49:12PM -0700, Andrew Morton wrote:
> > > 
> > > But it's a bit hacky.  Can anyone think of anything smarter?
> > 
> > Looks good to me and not that hacky actually.
> 
> Hacky :( I guess it's pretty safe because this is a userspace-visible
> structure so we'll never be changing it.

Well, I saw something similar in netfilter code a long time ago :)

> 
> Or will we?  What happens if we later decide that some additional field
> needs to be added?  Do we version the interface?  Add a new prctl()
> mode?  Let's cook up a plan for that and at least add to changelog?

I don't expect it to change anytime soon, but we still have an option:
if we decide to extend or shrink it, we can always use the sizeof/offsetof
helpers to check which exact version userspace asks us to use.
As far as I understand, mm_struct is not a structure which
changes that frequently, right?

> > Should I send an update on top for the -mm tree?
> 
> Spose so.  Let's see what the code savings are when the other two sites
> are similarly changed?
> 
> To save a bit more space, offsets[] could be an array of uchar, I guess.
> A BUILD_BUG_ON(sizeof(struct prctl_mm_map) >= 256) would keep that sane.

Sure, thanks!


* Re: [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3
  2014-08-22 20:38         ` Cyrill Gorcunov
@ 2014-08-22 20:46           ` Andrew Morton
  2014-08-22 21:13             ` Cyrill Gorcunov
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2014-08-22 20:46 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, keescook, tj, avagin, ebiederm, hpa, serge.hallyn,
	xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

On Sat, 23 Aug 2014 00:38:09 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> > 
> > Or will we?  What happens if we later decide that some additional field
> > needs to be added?  Do we version the interface?  Add a new prctl()
> > mode?  Let's cook up a plan for that and at least add to changelog?
> 
> I don't expect it to change anytime soon, but we still have an option:
> if we decide to extend or shrink it, we can always use the sizeof/offsetof
> helpers to check which exact version userspace asks us to use.

How does that work?  We just have a blob of bytes coming in from
userspace.

> As far as I understand, mm_struct is not a structure which
> changes that frequently, right?

We might find existing things which criu wants to access.  And criu
lives forever, yes?  The mm_struct is likely to change over that time
period ;)


* Re: [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3
  2014-08-22 20:46           ` Andrew Morton
@ 2014-08-22 21:13             ` Cyrill Gorcunov
  0 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2014-08-22 21:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, keescook, tj, avagin, ebiederm, hpa, serge.hallyn,
	xemul, segoon, kamezawa.hiroyu, mtk.manpages, jln

On Fri, Aug 22, 2014 at 01:46:28PM -0700, Andrew Morton wrote:
> On Sat, 23 Aug 2014 00:38:09 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> 
> > > 
> > > Or will we?  What happens if we later decide that some additional field
> > > needs to be added?  Do we version the interface?  Add a new prctl()
> > > mode?  Let's cook up a plan for that and at least add to changelog?
> > 
> > I don't expect it to change anytime soon, but we still have an option:
> > if we decide to extend or shrink it, we can always use the sizeof/offsetof
> > helpers to check which exact version userspace asks us to use.
> 
> How does that work?  We just have a blob of bytes coming in from
> userspace.

Not just a blob. We have it as a structure where all fields have a
constant size. Say we have

struct prctl_mm_map {
	__u64 start_code;
	__u64 end_code;
	__u64 some_new_field;
};

in the kernel, so its size will be 24 bytes, but userspace
uses the old definition without the @some_new_field member (16 bytes).
So when we get a request with 16 bytes from userspace, we can
tell that userspace has passed the old definition. It's not as
explicit as having some @version field in struct
prctl_mm_map, but it looks fine to me. Still, I can add @version
to the structure if you prefer.
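The size-based scheme described above can be sketched in C as follows, assuming members are only ever appended at the end and that a member missing from an older request defaults to zero (both are assumptions of this sketch, not part of the posted patch). `struct map_v1`, `struct map_v2` and `parse_map_request()` are hypothetical names; the offsetof() assertion is the kind of check that keeps the old layout a prefix of the new one.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old layout, as known to older userspace (16 bytes). */
struct map_v1 {
	uint64_t start_code;
	uint64_t end_code;
};

/* Hypothetical current layout with one appended member (24 bytes). */
struct map_v2 {
	uint64_t start_code;
	uint64_t end_code;
	uint64_t some_new_field;
};

/* The scheme relies on new members only ever being appended. */
_Static_assert(offsetof(struct map_v2, some_new_field) == sizeof(struct map_v1),
	       "v1 must be a prefix of v2");

/*
 * Decode a request by its size: a v1-sized buffer fills the leading
 * members and leaves the new one zeroed, a v2-sized buffer is copied
 * whole, and any other size is rejected.
 */
static int parse_map_request(const void *buf, size_t size, struct map_v2 *out)
{
	if (size != sizeof(struct map_v1) && size != sizeof(struct map_v2))
		return -EINVAL;

	memset(out, 0, sizeof(*out));
	memcpy(out, buf, size);
	return 0;
}
```

An explicit @version member would make the same decision without relying on the size, at the cost of one more field userspace must fill in correctly.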

> > As far as I understand, mm_struct is not a structure which
> > changes that frequently, right?
> 
> We might find existing things which criu wants to access.  And criu
> lives forever, yes?  The mm_struct is likely to change over that time
> period ;)

Hopefully criu will live long enough that I will have a chance to update
prctl_mm_map accordingly :) Still, the good thing is that once the mm_struct
members used in sys.c get changed, the kernel fails to build in sys.c, so the
change will be noticed immediately and we will update sys.c as well.


end of thread, other threads:[~2014-08-22 21:13 UTC | newest]

Thread overview: 18+ messages
2014-08-04 17:22 [patch 0/4] prctl: set-mm -- Rework interface, v3 Cyrill Gorcunov
2014-08-04 17:22 ` [patch 1/4] mm: Introduce check_data_rlimit helper, v2 Cyrill Gorcunov
2014-08-04 20:25   ` Serge E. Hallyn
2014-08-04 17:22 ` [patch 2/4] mm: Use may_adjust_brk helper Cyrill Gorcunov
2014-08-04 20:25   ` Serge E. Hallyn
2014-08-04 17:22 ` [patch 3/4] prctl: PR_SET_MM -- Factor out mmap_sem when update mm::exe_file Cyrill Gorcunov
2014-08-04 20:22   ` Serge E. Hallyn
2014-08-04 17:22 ` [patch 4/4] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v3 Cyrill Gorcunov
2014-08-04 21:01   ` Serge E. Hallyn
2014-08-05  8:08   ` Andrew Vagin
2014-08-05  8:12     ` Cyrill Gorcunov
2014-08-21 22:51   ` Andrew Morton
2014-08-22  6:32     ` Cyrill Gorcunov
2014-08-22  6:49       ` Andrew Morton
2014-08-22 20:38         ` Cyrill Gorcunov
2014-08-22 20:46           ` Andrew Morton
2014-08-22 21:13             ` Cyrill Gorcunov
2014-08-15 19:11 ` [patch 0/4] prctl: set-mm -- Rework interface, v3 Cyrill Gorcunov
