* [RFC 0/2] prctl: set-mm -- Rework interface @ 2014-07-03 14:33 Cyrill Gorcunov 2014-07-03 14:33 ` [RFC 1/2] prctl: PR_SET_MM -- Factor out mmap_sem when update mm::exe_file Cyrill Gorcunov 2014-07-03 14:33 ` [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation Cyrill Gorcunov 0 siblings, 2 replies; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-03 14:33 UTC (permalink / raw) To: linux-kernel; +Cc: gorcunov Hi! While been implementing c/r for user-namespaces we found that having CAP_SYS_RESOURCE requirement in prctl set-mm opcode is somewhat inconvenient and doesn't allow us to proceed restore of user-ns since we loose CAP_SYS_RESOURCE upon new user-ns creation. So it looks better to provide new interface and deprecate old one leaving cap-sys-resource behind. I would highly appreciate any comments on validate_prctl_map_locked() helper in second patch which tests for new struct mm_struct values being sane. I hope I didn't miss something obvious but as fas as I see even using some "bad" values can't affect the kernel since members are used mostly for statistics purpose. Please take a look once time permit, any comments are highly appreciated. Cyrill ^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC 1/2] prctl: PR_SET_MM -- Factor out mmap_sem when update mm::exe_file 2014-07-03 14:33 [RFC 0/2] prctl: set-mm -- Rework interface Cyrill Gorcunov @ 2014-07-03 14:33 ` Cyrill Gorcunov 2014-07-03 14:33 ` [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation Cyrill Gorcunov 1 sibling, 0 replies; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-03 14:33 UTC (permalink / raw) To: linux-kernel Cc: gorcunov, Kees Cook, Tejun Heo, Andrew Morton, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk [-- Attachment #1: prctl-rework-prctl_set_mm_exe_file-locked --] [-- Type: text/plain, Size: 2529 bytes --] Instead of taking mm->mmap_sem inside prctl_set_mm_exe_file move it out of and rename the helper to prctl_set_mm_exe_file_locked. This will allow to reuse this function in a next patch. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> --- kernel/sys.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) Index: linux-2.6.git/kernel/sys.c =================================================================== --- linux-2.6.git.orig/kernel/sys.c +++ linux-2.6.git/kernel/sys.c @@ -1628,12 +1628,14 @@ SYSCALL_DEFINE1(umask, int, mask) return mask; } -static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd) +static int prctl_set_mm_exe_file_locked(struct mm_struct *mm, unsigned int fd) { struct fd exe; struct inode *inode; int err; + VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem)); + exe = fdget(fd); if (!exe.file) return -EBADF; @@ -1654,8 +1656,6 @@ static int prctl_set_mm_exe_file(struct if (err) goto exit; - down_write(&mm->mmap_sem); - /* * Forbid mm->exe_file change if old file still mapped. */ @@ -1667,7 +1667,7 @@ static int prctl_set_mm_exe_file(struct if (vma->vm_file && path_equal(&vma->vm_file->f_path, &mm->exe_file->f_path)) - goto exit_unlock; + goto exit; } /* @@ -1678,13 +1678,10 @@ static int prctl_set_mm_exe_file(struct */ err = -EPERM; if (test_and_set_bit(MMF_EXE_FILE_CHANGED, &mm->flags)) - goto exit_unlock; + goto exit; err = 0; set_mm_exe_file(mm, exe.file); /* this grabs a reference to exe.file */ -exit_unlock: - up_write(&mm->mmap_sem); - exit: fdput(exe); return err; @@ -1704,8 +1701,12 @@ static int prctl_set_mm(int opt, unsigne if (!capable(CAP_SYS_RESOURCE)) return -EPERM; - if (opt == PR_SET_MM_EXE_FILE) - return prctl_set_mm_exe_file(mm, (unsigned int)addr); + if (opt == PR_SET_MM_EXE_FILE) { + down_write(&mm->mmap_sem); + error = prctl_set_mm_exe_file_locked(mm, (unsigned int)addr); + up_write(&mm->mmap_sem); + return error; + } if (addr >= TASK_SIZE || addr < mmap_min_addr) return -EINVAL; ^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-03 14:33 [RFC 0/2] prctl: set-mm -- Rework interface Cyrill Gorcunov 2014-07-03 14:33 ` [RFC 1/2] prctl: PR_SET_MM -- Factor out mmap_sem when update mm::exe_file Cyrill Gorcunov @ 2014-07-03 14:33 ` Cyrill Gorcunov 2014-07-03 20:34 ` Cyrill Gorcunov ` (2 more replies) 1 sibling, 3 replies; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-03 14:33 UTC (permalink / raw) To: linux-kernel Cc: gorcunov, Kees Cook, Tejun Heo, Andrew Morton, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk [-- Attachment #1: prctl-rework-new-mm-map-2 --] [-- Type: text/plain, Size: 9155 bytes --] During development of c/r we've noticed that in case if we need to support user namespaces we face a problem with capabilities in prctl(PR_SET_MM, ...) call. Current PR_SET_MM code forbids to modify fields if no CAP_SYS_RESOURCE granted, but rather relies on one who use this interface is passing more-less sane values (though the values must pass the basic validation procedure). It seems a better approach is to eliminate CAP_SYS_RESOURCE check but provide all new values in one bundle, which would allow the kernel to make more intensive test for sanity of values and same time allow us to support checkpoint/restore of user namespaces. Thus a new command (PR_SET_MM_MAP) introduced. It takes a pointer of prctl_mm_map structure which carries all members to be updated. Most intensive work is done in validate_prctl_map_locked helper, because we need to make sure the values are valid. Thus we do - check the values are laying inside available user address space - stack, brk, command line arguments and evnironment variables must point to already existing VMA - values must be ordered (start < end) - if RLIMITs are defined don't allow to exceed it with new values Since it uses prctl_set_mm_exe_file_locked helper, updating the exe-file link is one-shot action for security reason. I believe the old interface should be deprecated and ripped off in a couple of kernel releases if noone against. To test if new interface is implemented in the kernel one can pass PR_SET_MM_MAP_SIZE opcode and the kernel returns the size of currently supported struct prctl_mm_map. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> --- include/uapi/linux/prctl.h | 19 ++++ kernel/sys.c | 188 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 206 insertions(+), 1 deletion(-) Index: linux-2.6.git/include/uapi/linux/prctl.h =================================================================== --- linux-2.6.git.orig/include/uapi/linux/prctl.h +++ linux-2.6.git/include/uapi/linux/prctl.h @@ -119,6 +119,25 @@ # define PR_SET_MM_ENV_END 11 # define PR_SET_MM_AUXV 12 # define PR_SET_MM_EXE_FILE 13 +# define PR_SET_MM_MAP 14 +# define PR_SET_MM_MAP_SIZE 15 + +struct prctl_mm_map { + unsigned long start_code; + unsigned long end_code; + unsigned long start_data; + unsigned long end_data; + unsigned long start_brk; + unsigned long brk; + unsigned long start_stack; + unsigned long arg_start; + unsigned long arg_end; + unsigned long env_start; + unsigned long env_end; + unsigned long *auxv; + unsigned int auxv_size; + int exe_fd; +}; /* * Set specific pid that is allowed to ptrace the current task. Index: linux-2.6.git/kernel/sys.c =================================================================== --- linux-2.6.git.orig/kernel/sys.c +++ linux-2.6.git/kernel/sys.c @@ -1687,6 +1687,187 @@ exit: return err; } +/* + * WARNING: we don't require any capability here so be very careful + * in what is allowed for modification from userspace. + */ +static int validate_prctl_map_locked(struct prctl_mm_map *prctl_map) +{ + unsigned long mmap_max_addr = TASK_SIZE; + struct mm_struct *mm = current->mm; + struct vm_area_struct *stack_vma; + unsigned long rlim; + int error = 0; + + /* + * Make sure the members are not somewhere outside + * of allowed address space. + */ +#define __prctl_check_addr_space(__map, __member) \ + ({ \ + int __rc; \ + if ((unsigned long)__map->__member < mmap_max_addr && \ + (unsigned long)__map->__member >= mmap_min_addr) \ + __rc = 0; \ + else \ + __rc = -EINVAL; \ + __rc; \ + }) + + error |= __prctl_check_addr_space(prctl_map, start_code); + error |= __prctl_check_addr_space(prctl_map, end_code); + error |= __prctl_check_addr_space(prctl_map, start_data); + error |= __prctl_check_addr_space(prctl_map, end_data); + error |= __prctl_check_addr_space(prctl_map, start_stack); + error |= __prctl_check_addr_space(prctl_map, start_brk); + error |= __prctl_check_addr_space(prctl_map, brk); + error |= __prctl_check_addr_space(prctl_map, arg_start); + error |= __prctl_check_addr_space(prctl_map, arg_end); + error |= __prctl_check_addr_space(prctl_map, env_start); + error |= __prctl_check_addr_space(prctl_map, env_end); + if (error) + goto out; +#undef __prctl_check_addr_space + + /* + * Stack, brk, command line arguments and environment must exist. + */ + stack_vma = find_vma(mm, prctl_map->start_stack); + if (!stack_vma) { + error = -EINVAL; + goto out; + } +#define __prctl_check_vma(mm, addr) find_vma(mm, addr) ? 0 : -EINVAL + error |= __prctl_check_vma(mm, prctl_map->start_brk); + error |= __prctl_check_vma(mm, prctl_map->brk); + error |= __prctl_check_vma(mm, prctl_map->arg_start); + error |= __prctl_check_vma(mm, prctl_map->arg_end); + error |= __prctl_check_vma(mm, prctl_map->env_start); + error |= __prctl_check_vma(mm, prctl_map->env_end); + if (error) + goto out; +#undef __prctl_check_vma + + /* + * Make sure the pairs are ordered. + */ +#define __prctl_check_order(__map, __m1, __m2) \ + __map->__m2 <= __map->__m1 + if (__prctl_check_order(prctl_map, start_code, end_code) || + __prctl_check_order(prctl_map, start_data, end_data) || + __prctl_check_order(prctl_map, arg_start, arg_end) || + __prctl_check_order(prctl_map, env_start, env_end)) + goto out; +#undef __prctl_check_order + + error = -EINVAL; + + /* + * @brk should be after @end_data in traditional maps. + */ + if (prctl_map->start_brk <= prctl_map->end_data || + prctl_map->brk <= prctl_map->end_data) + goto out; + + /* + * Neither we should allow to override limits if they set. + */ + rlim = rlimit(RLIMIT_DATA); + if (rlim < RLIM_INFINITY) { + if ((prctl_map->brk - prctl_map->start_brk) + + (prctl_map->end_data - prctl_map->start_data) > rlim) + goto out; + } + + rlim = rlimit(RLIMIT_STACK); + if (rlim < RLIM_INFINITY) { +#ifdef CONFIG_STACK_GROWSUP + unsigned long left = stack_vma->vm_end - prctl_map->start_stack; +#else + unsigned long left = prctl_map->start_stack - stack_vma->vm_start; +#endif + if (left > rlim) + goto out; + } + + /* + * Someone is trying to cheat the auxv vector. + */ + if (prctl_map->auxv && prctl_map->auxv_size > sizeof(mm->saved_auxv)) + goto out; + error = 0; +out: + return error; +} + +static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size) +{ + struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, }; + unsigned long user_auxv[AT_VECTOR_SIZE]; + struct mm_struct *mm = current->mm; + int error = -EINVAL; + + BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv)); + + if (opt == PR_SET_MM_MAP_SIZE) + return put_user((unsigned int)sizeof(prctl_map), + (unsigned int __user *)addr); + + if (data_size != sizeof(prctl_map)) + return -EINVAL; + + if (copy_from_user(&prctl_map, addr, sizeof(prctl_map))) + return -EFAULT; + + down_read(&mm->mmap_sem); + + if (validate_prctl_map_locked(&prctl_map)) + goto out; + + if (prctl_map.auxv && prctl_map.auxv_size) { + up_read(&mm->mmap_sem); + memset(user_auxv, 0, sizeof(user_auxv)); + error = copy_from_user(user_auxv, + (const void __user *)prctl_map.auxv, + prctl_map.auxv_size); + down_read(&mm->mmap_sem); + if (error) + goto out; + } + + if (prctl_map.exe_fd != (u32)-1) { + error = prctl_set_mm_exe_file_locked(mm, prctl_map.exe_fd); + if (error) + goto out; + } + + if (prctl_map.auxv && prctl_map.auxv_size) { + user_auxv[AT_VECTOR_SIZE - 2] = 0; + user_auxv[AT_VECTOR_SIZE - 1] = 0; + + task_lock(current); + memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv)); + task_unlock(current); + } + + mm->start_code = prctl_map.start_code; + mm->end_code = prctl_map.end_code; + mm->start_data = prctl_map.start_data; + mm->end_data = prctl_map.end_data; + mm->start_brk = prctl_map.start_brk; + mm->brk = prctl_map.brk; + mm->start_stack = prctl_map.start_stack; + mm->arg_start = prctl_map.arg_start; + mm->arg_end = prctl_map.arg_end; + mm->env_start = prctl_map.env_start; + mm->env_end = prctl_map.env_end; + + error = 0; +out: + up_read(&mm->mmap_sem); + return error; +} + static int prctl_set_mm(int opt, unsigned long addr, unsigned long arg4, unsigned long arg5) { @@ -1695,9 +1876,14 @@ static int prctl_set_mm(int opt, unsigne struct vm_area_struct *vma; int error; - if (arg5 || (arg4 && opt != PR_SET_MM_AUXV)) + if (arg5 || (arg4 && (opt != PR_SET_MM_AUXV && + opt != PR_SET_MM_MAP && + opt != PR_SET_MM_MAP_SIZE))) return -EINVAL; + if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE) + return prctl_set_mm_map(opt, (const void __user *)addr, arg4); + if (!capable(CAP_SYS_RESOURCE)) return -EPERM; ^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-03 14:33 ` [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation Cyrill Gorcunov @ 2014-07-03 20:34 ` Cyrill Gorcunov 2014-07-04 7:52 ` Andrew Vagin 2014-07-08 19:08 ` Cyrill Gorcunov 2 siblings, 0 replies; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-03 20:34 UTC (permalink / raw) To: linux-kernel Cc: Kees Cook, Tejun Heo, Andrew Morton, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk On Thu, Jul 03, 2014 at 06:33:20PM +0400, Cyrill Gorcunov wrote: > During development of c/r we've noticed that in case if we need to > support user namespaces we face a problem with capabilities in > prctl(PR_SET_MM, ...) call. Sigh, I updated changelog after I started writting the 0/0 message, here is an updated variant (only patch description is changed code remains the same). Sorry for inconvenience. --- From: Cyrill Gorcunov <gorcunov@openvz.org> Subject: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation During development of c/r we've noticed that in case if we need to support user namespaces we face a problem with capabilities in prctl(PR_SET_MM, ...) call. Current PR_SET_MM code forbids to modify fields if no CAP_SYS_RESOURCE granted, but rather relies on one who use this interface is passing more-less sane values (though the values must pass the basic validation procedure). It seems a better approach is to eliminate CAP_SYS_RESOURCE check but provide all new values in one bundle, which would allow the kernel to make more intensive test for sanity of values and same time allow us to support checkpoint/restore of user namespaces. Thus a new command (PR_SET_MM_MAP) introduced. It takes a pointer of prctl_mm_map structure which carries all members to be updated. Most intensive work is done in validate_prctl_map_locked helper, because we need to make sure the values are valid. Thus we do - check the values are laying inside available user address space - stack, brk, command line arguments and evnironment variables must point to already existing VMA - values must be ordered (start < end) - if RLIMITs are defined don't allow to exceed it with new values Since it uses prctl_set_mm_exe_file_locked helper, updating the exe-file link remains one-shot action. Still note that updating exe-file link now doesn't require sys-resource capability anymore, after all there is no much profit in preventing setup own file link (there are a number of ways to execute own code -- ptrace, ld-preload, so that the only reliable way to find which exactly code is executed is to inspect running program memory). I believe the old interface should be deprecated and ripped off in a couple of kernel releases if noone against. To test if new interface is implemented in the kernel one can pass PR_SET_MM_MAP_SIZE opcode and the kernel returns the size of currently supported struct prctl_mm_map. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> --- include/uapi/linux/prctl.h | 19 ++++ kernel/sys.c | 188 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 206 insertions(+), 1 deletion(-) Index: linux-2.6.git/include/uapi/linux/prctl.h =================================================================== --- linux-2.6.git.orig/include/uapi/linux/prctl.h +++ linux-2.6.git/include/uapi/linux/prctl.h @@ -119,6 +119,25 @@ # define PR_SET_MM_ENV_END 11 # define PR_SET_MM_AUXV 12 # define PR_SET_MM_EXE_FILE 13 +# define PR_SET_MM_MAP 14 +# define PR_SET_MM_MAP_SIZE 15 + +struct prctl_mm_map { + unsigned long start_code; + unsigned long end_code; + unsigned long start_data; + unsigned long end_data; + unsigned long start_brk; + unsigned long brk; + unsigned long start_stack; + unsigned long arg_start; + unsigned long arg_end; + unsigned long env_start; + unsigned long env_end; + unsigned long *auxv; + unsigned int auxv_size; + int exe_fd; +}; /* * Set specific pid that is allowed to ptrace the current task. Index: linux-2.6.git/kernel/sys.c =================================================================== --- linux-2.6.git.orig/kernel/sys.c +++ linux-2.6.git/kernel/sys.c @@ -1687,6 +1687,187 @@ exit: return err; } +/* + * WARNING: we don't require any capability here so be very careful + * in what is allowed for modification from userspace. + */ +static int validate_prctl_map_locked(struct prctl_mm_map *prctl_map) +{ + unsigned long mmap_max_addr = TASK_SIZE; + struct mm_struct *mm = current->mm; + struct vm_area_struct *stack_vma; + unsigned long rlim; + int error = 0; + + /* + * Make sure the members are not somewhere outside + * of allowed address space. + */ +#define __prctl_check_addr_space(__map, __member) \ + ({ \ + int __rc; \ + if ((unsigned long)__map->__member < mmap_max_addr && \ + (unsigned long)__map->__member >= mmap_min_addr) \ + __rc = 0; \ + else \ + __rc = -EINVAL; \ + __rc; \ + }) + + error |= __prctl_check_addr_space(prctl_map, start_code); + error |= __prctl_check_addr_space(prctl_map, end_code); + error |= __prctl_check_addr_space(prctl_map, start_data); + error |= __prctl_check_addr_space(prctl_map, end_data); + error |= __prctl_check_addr_space(prctl_map, start_stack); + error |= __prctl_check_addr_space(prctl_map, start_brk); + error |= __prctl_check_addr_space(prctl_map, brk); + error |= __prctl_check_addr_space(prctl_map, arg_start); + error |= __prctl_check_addr_space(prctl_map, arg_end); + error |= __prctl_check_addr_space(prctl_map, env_start); + error |= __prctl_check_addr_space(prctl_map, env_end); + if (error) + goto out; +#undef __prctl_check_addr_space + + /* + * Stack, brk, command line arguments and environment must exist. + */ + stack_vma = find_vma(mm, prctl_map->start_stack); + if (!stack_vma) { + error = -EINVAL; + goto out; + } +#define __prctl_check_vma(mm, addr) find_vma(mm, addr) ? 0 : -EINVAL + error |= __prctl_check_vma(mm, prctl_map->start_brk); + error |= __prctl_check_vma(mm, prctl_map->brk); + error |= __prctl_check_vma(mm, prctl_map->arg_start); + error |= __prctl_check_vma(mm, prctl_map->arg_end); + error |= __prctl_check_vma(mm, prctl_map->env_start); + error |= __prctl_check_vma(mm, prctl_map->env_end); + if (error) + goto out; +#undef __prctl_check_vma + + /* + * Make sure the pairs are ordered. + */ +#define __prctl_check_order(__map, __m1, __m2) \ + __map->__m2 <= __map->__m1 + if (__prctl_check_order(prctl_map, start_code, end_code) || + __prctl_check_order(prctl_map, start_data, end_data) || + __prctl_check_order(prctl_map, arg_start, arg_end) || + __prctl_check_order(prctl_map, env_start, env_end)) + goto out; +#undef __prctl_check_order + + error = -EINVAL; + + /* + * @brk should be after @end_data in traditional maps. + */ + if (prctl_map->start_brk <= prctl_map->end_data || + prctl_map->brk <= prctl_map->end_data) + goto out; + + /* + * Neither we should allow to override limits if they set. + */ + rlim = rlimit(RLIMIT_DATA); + if (rlim < RLIM_INFINITY) { + if ((prctl_map->brk - prctl_map->start_brk) + + (prctl_map->end_data - prctl_map->start_data) > rlim) + goto out; + } + + rlim = rlimit(RLIMIT_STACK); + if (rlim < RLIM_INFINITY) { +#ifdef CONFIG_STACK_GROWSUP + unsigned long left = stack_vma->vm_end - prctl_map->start_stack; +#else + unsigned long left = prctl_map->start_stack - stack_vma->vm_start; +#endif + if (left > rlim) + goto out; + } + + /* + * Someone is trying to cheat the auxv vector. + */ + if (prctl_map->auxv && prctl_map->auxv_size > sizeof(mm->saved_auxv)) + goto out; + error = 0; +out: + return error; +} + +static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size) +{ + struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, }; + unsigned long user_auxv[AT_VECTOR_SIZE]; + struct mm_struct *mm = current->mm; + int error = -EINVAL; + + BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv)); + + if (opt == PR_SET_MM_MAP_SIZE) + return put_user((unsigned int)sizeof(prctl_map), + (unsigned int __user *)addr); + + if (data_size != sizeof(prctl_map)) + return -EINVAL; + + if (copy_from_user(&prctl_map, addr, sizeof(prctl_map))) + return -EFAULT; + + down_read(&mm->mmap_sem); + + if (validate_prctl_map_locked(&prctl_map)) + goto out; + + if (prctl_map.auxv && prctl_map.auxv_size) { + up_read(&mm->mmap_sem); + memset(user_auxv, 0, sizeof(user_auxv)); + error = copy_from_user(user_auxv, + (const void __user *)prctl_map.auxv, + prctl_map.auxv_size); + down_read(&mm->mmap_sem); + if (error) + goto out; + } + + if (prctl_map.exe_fd != (u32)-1) { + error = prctl_set_mm_exe_file_locked(mm, prctl_map.exe_fd); + if (error) + goto out; + } + + if (prctl_map.auxv && prctl_map.auxv_size) { + user_auxv[AT_VECTOR_SIZE - 2] = 0; + user_auxv[AT_VECTOR_SIZE - 1] = 0; + + task_lock(current); + memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv)); + task_unlock(current); + } + + mm->start_code = prctl_map.start_code; + mm->end_code = prctl_map.end_code; + mm->start_data = prctl_map.start_data; + mm->end_data = prctl_map.end_data; + mm->start_brk = prctl_map.start_brk; + mm->brk = prctl_map.brk; + mm->start_stack = prctl_map.start_stack; + mm->arg_start = prctl_map.arg_start; + mm->arg_end = prctl_map.arg_end; + mm->env_start = prctl_map.env_start; + mm->env_end = prctl_map.env_end; + + error = 0; +out: + up_read(&mm->mmap_sem); + return error; +} + static int prctl_set_mm(int opt, unsigned long addr, unsigned long arg4, unsigned long arg5) { @@ -1695,9 +1876,14 @@ static int prctl_set_mm(int opt, unsigne struct vm_area_struct *vma; int error; - if (arg5 || (arg4 && opt != PR_SET_MM_AUXV)) + if (arg5 || (arg4 && (opt != PR_SET_MM_AUXV && + opt != PR_SET_MM_MAP && + opt != PR_SET_MM_MAP_SIZE))) return -EINVAL; + if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE) + return prctl_set_mm_map(opt, (const void __user *)addr, arg4); + if (!capable(CAP_SYS_RESOURCE)) return -EPERM; ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-03 14:33 ` [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation Cyrill Gorcunov 2014-07-03 20:34 ` Cyrill Gorcunov @ 2014-07-04 7:52 ` Andrew Vagin 2014-07-04 8:11 ` Cyrill Gorcunov 2014-07-08 19:08 ` Cyrill Gorcunov 2 siblings, 1 reply; 19+ messages in thread From: Andrew Vagin @ 2014-07-04 7:52 UTC (permalink / raw) To: Cyrill Gorcunov Cc: linux-kernel, Kees Cook, Tejun Heo, Andrew Morton, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk On Thu, Jul 03, 2014 at 06:33:20PM +0400, Cyrill Gorcunov wrote: > During development of c/r we've noticed that in case if we need to > support user namespaces we face a problem with capabilities in > prctl(PR_SET_MM, ...) call. > > Current PR_SET_MM code forbids to modify fields if no CAP_SYS_RESOURCE > granted, but rather relies on one who use this interface is passing > more-less sane values (though the values must pass the basic validation > procedure). > > It seems a better approach is to eliminate CAP_SYS_RESOURCE check but > provide all new values in one bundle, which would allow the kernel to make > more intensive test for sanity of values and same time allow us to > support checkpoint/restore of user namespaces. > > Thus a new command (PR_SET_MM_MAP) introduced. It takes a pointer of > prctl_mm_map structure which carries all members to be updated. > > Most intensive work is done in validate_prctl_map_locked helper, > because we need to make sure the values are valid. Thus we do > > - check the values are laying inside available user address space > - stack, brk, command line arguments and evnironment variables > must point to already existing VMA > - values must be ordered (start < end) > - if RLIMITs are defined don't allow to exceed it with new values > > Since it uses prctl_set_mm_exe_file_locked helper, updating the > exe-file link is one-shot action for security reason. > > I believe the old interface should be deprecated and ripped off > in a couple of kernel releases if noone against. > > To test if new interface is implemented in the kernel one > can pass PR_SET_MM_MAP_SIZE opcode and the kernel returns > the size of currently supported struct prctl_mm_map. I like the idea of this patch. See a few comments inline > > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> > Cc: Kees Cook <keescook@chromium.org> > Cc: Tejun Heo <tj@kernel.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Andrew Vagin <avagin@openvz.org> > Cc: Eric W. Biederman <ebiederm@xmission.com> > Cc: Serge Hallyn <serge.hallyn@canonical.com> > Cc: Pavel Emelyanov <xemul@parallels.com> > Cc: Vasiliy Kulikov <segoon@openwall.com> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Cc: Michael Kerrisk <mtk.manpages@gmail.com> > --- > include/uapi/linux/prctl.h | 19 ++++ > kernel/sys.c | 188 ++++++++++++++++++++++++++++++++++++++++++++- > 2 files changed, 206 insertions(+), 1 deletion(-) > > Index: linux-2.6.git/include/uapi/linux/prctl.h > =================================================================== > --- linux-2.6.git.orig/include/uapi/linux/prctl.h > +++ linux-2.6.git/include/uapi/linux/prctl.h > @@ -119,6 +119,25 @@ > # define PR_SET_MM_ENV_END 11 > # define PR_SET_MM_AUXV 12 > # define PR_SET_MM_EXE_FILE 13 > +# define PR_SET_MM_MAP 14 > +# define PR_SET_MM_MAP_SIZE 15 > + > +struct prctl_mm_map { > + unsigned long start_code; "unsigned long" has different sizes on x86_64 and x86, so a compat is required for x32 processes on x64 kernel. > + unsigned long end_code; > + unsigned long start_data; > + unsigned long end_data; > + unsigned long start_brk; > + unsigned long brk; ... > + > + error |= __prctl_check_addr_space(prctl_map, start_code); > + error |= __prctl_check_addr_space(prctl_map, end_code); > + error |= __prctl_check_addr_space(prctl_map, start_data); > + error |= __prctl_check_addr_space(prctl_map, end_data); > + error |= __prctl_check_addr_space(prctl_map, start_stack); > + error |= __prctl_check_addr_space(prctl_map, start_brk); > + error |= __prctl_check_addr_space(prctl_map, brk); > + error |= __prctl_check_addr_space(prctl_map, arg_start); > + error |= __prctl_check_addr_space(prctl_map, arg_end); > + error |= __prctl_check_addr_space(prctl_map, env_start); > + error |= __prctl_check_addr_space(prctl_map, env_end); > + if (error) > + goto out; > +#undef __prctl_check_addr_space > + > + /* > + * Stack, brk, command line arguments and environment must exist. > + */ > + stack_vma = find_vma(mm, prctl_map->start_stack); Why do we not use __prctl_check_vma here? > + if (!stack_vma) { > + error = -EINVAL; > + goto out; > + } > +#define __prctl_check_vma(mm, addr) find_vma(mm, addr) ? 0 : -EINVAL > + error |= __prctl_check_vma(mm, prctl_map->start_brk); > + error |= __prctl_check_vma(mm, prctl_map->brk); > + error |= __prctl_check_vma(mm, prctl_map->arg_start); > + error |= __prctl_check_vma(mm, prctl_map->arg_end); > + error |= __prctl_check_vma(mm, prctl_map->env_start); > + error |= __prctl_check_vma(mm, prctl_map->env_end); > + if (error) > + goto out; > +#undef __prctl_check_vma > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-04 7:52 ` Andrew Vagin @ 2014-07-04 8:11 ` Cyrill Gorcunov 0 siblings, 0 replies; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-04 8:11 UTC (permalink / raw) To: Andrew Vagin Cc: linux-kernel, Kees Cook, Tejun Heo, Andrew Morton, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk On Fri, Jul 04, 2014 at 11:52:56AM +0400, Andrew Vagin wrote: > > + > > +struct prctl_mm_map { > > + unsigned long start_code; > > "unsigned long" has different sizes on x86_64 and x86, so a compat > is required for x32 processes on x64 kernel. Yes, good point. I think we can use u64/32 types instead to make a code shared. I'll update once I collect all comments about aproach. > > + > > + /* > > + * Stack, brk, command line arguments and environment must exist. > > + */ > > + stack_vma = find_vma(mm, prctl_map->start_stack); > > Why do we not use __prctl_check_vma here? Well __prctl_check_vma does return error or zero while I need this vma reference for stack rlim check which is done below in code. > > > + if (!stack_vma) { > > + error = -EINVAL; > > + goto out; > > + } > > +#define __prctl_check_vma(mm, addr) find_vma(mm, addr) ? 0 : -EINVAL > > + error |= __prctl_check_vma(mm, prctl_map->start_brk); > > + error |= __prctl_check_vma(mm, prctl_map->brk); Cyrill ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-03 14:33 ` [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation Cyrill Gorcunov 2014-07-03 20:34 ` Cyrill Gorcunov 2014-07-04 7:52 ` Andrew Vagin @ 2014-07-08 19:08 ` Cyrill Gorcunov 2014-07-08 21:38 ` Andrew Morton 2 siblings, 1 reply; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-08 19:08 UTC (permalink / raw) To: linux-kernel Cc: Kees Cook, Tejun Heo, Andrew Morton, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk On Thu, Jul 03, 2014 at 06:33:20PM +0400, Cyrill Gorcunov wrote: > During development of c/r we've noticed that in case if we need to > support user namespaces we face a problem with capabilities in > prctl(PR_SET_MM, ...) call. > > Current PR_SET_MM code forbids to modify fields if no CAP_SYS_RESOURCE > granted, but rather relies on one who use this interface is passing > more-less sane values (though the values must pass the basic validation > procedure). > > It seems a better approach is to eliminate CAP_SYS_RESOURCE check but > provide all new values in one bundle, which would allow the kernel to make > more intensive test for sanity of values and same time allow us to > support checkpoint/restore of user namespaces. > > Thus a new command (PR_SET_MM_MAP) introduced. It takes a pointer of > prctl_mm_map structure which carries all members to be updated. > > Most intensive work is done in validate_prctl_map_locked helper, > because we need to make sure the values are valid. Thus we do > > - check the values are laying inside available user address space > - stack, brk, command line arguments and evnironment variables > must point to already existing VMA > - values must be ordered (start < end) > - if RLIMITs are defined don't allow to exceed it with new values > > Since it uses prctl_set_mm_exe_file_locked helper, updating the > exe-file link is one-shot action for security reason. > > I believe the old interface should be deprecated and ripped off > in a couple of kernel releases if noone against. > > To test if new interface is implemented in the kernel one > can pass PR_SET_MM_MAP_SIZE opcode and the kernel returns > the size of currently supported struct prctl_mm_map. > > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> > Cc: Kees Cook <keescook@chromium.org> > Cc: Tejun Heo <tj@kernel.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Andrew Vagin <avagin@openvz.org> > Cc: Eric W. Biederman <ebiederm@xmission.com> > Cc: Serge Hallyn <serge.hallyn@canonical.com> > Cc: Pavel Emelyanov <xemul@parallels.com> > Cc: Vasiliy Kulikov <segoon@openwall.com> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Cc: Michael Kerrisk <mtk.manpages@gmail.com> Ping. Guys, any commens please? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-08 19:08 ` Cyrill Gorcunov @ 2014-07-08 21:38 ` Andrew Morton 2014-07-08 22:13 ` Cyrill Gorcunov 0 siblings, 1 reply; 19+ messages in thread From: Andrew Morton @ 2014-07-08 21:38 UTC (permalink / raw) To: Cyrill Gorcunov Cc: linux-kernel, Kees Cook, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk On Tue, 8 Jul 2014 23:08:49 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote: > Ping. Guys, any commens please? Well, allowing a process to modify pretty deep internals like this is always scary from a security point of view, and now we're removing CAP_SYS_RESOURCE protections. Yikes. Convince me we aren't handing out root here. The changelog doesn't make it clear (to me) why this is actually being done. criu runs unprivileged? What's the requirement here? struct prctl_mm_map could do with a nice comment explaining its role in the world. I'm not seeing a coherent description of the proposed userspace interface. We'll eventually want to update the prctl manpage for this, so how about laying out all the needed details now, at patch review time so we can see what is proposed. Why isn't the newly-added code under #ifdef CONFIG_CHECKPOINT_RESTORE? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-08 21:38 ` Andrew Morton @ 2014-07-08 22:13 ` Cyrill Gorcunov 2014-07-09 14:13 ` Cyrill Gorcunov 0 siblings, 1 reply; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-08 22:13 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, Kees Cook, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk On Tue, Jul 08, 2014 at 02:38:30PM -0700, Andrew Morton wrote: > On Tue, 8 Jul 2014 23:08:49 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote: > > > Ping. Guys, any commens please? > > Well, allowing a process to modify pretty deep internals like this is > always scary from a security point of view, and now we're removing > CAP_SYS_RESOURCE protections. Yikes. Convince me we aren't handing > out root here. > > The changelog doesn't make it clear (to me) why this is actually being > done. criu runs unprivileged? What's the requirement here? It comes from user-ns, we are almost ready to support c/r for user namespaces and faced a problem -- when we create new user-namespace we loose CAP_SYS_RESOURCE bit and criu fails to proceed. Andrew Vagin was implementing user-ns in criu and as far as I remember there were a talk with Eric which (again if my memory doesn't betray me) end up in -- guys, simply do proper check for values you read from user space instead of relying on CAP_SYS_RESOURCE. Erik, am I right? Probably I should walk over _every_ member we're modifying explaning where exactly is used in kernel. Still the good news about all this members we modify -- they are used for statistics mostly except brk/stack related members but they are checked very carefully to not exceed the limits (if the limits are set). > struct prctl_mm_map could do with a nice comment explaining its role in > the world. ok, i'll update > I'm not seeing a coherent description of the proposed userspace > interface. We'll eventually want to update the prctl manpage for this, > so how about laying out all the needed details now, at patch review > time so we can see what is proposed. Sure, I'll write more descriptive comment since original "It takes a pointer of prctl_mm_map structure which carries all members to be updated" is too short. > > Why isn't the newly-added code under #ifdef CONFIG_CHECKPOINT_RESTORE? Initially all prctl set-mm opcodes were CONFIG_CHECKPOINT_RESTORE guarded but then assumed that such modification might be needed not only for criu but other tools as well and this #ifdef were dropped off. Now new PR_SET_MM_MAP is a part of old interface so I'm not sure if it should be CONFIGed. That said it is not a problem to wrap it but looks unreasonable in this particular case. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-08 22:13 ` Cyrill Gorcunov @ 2014-07-09 14:13 ` Cyrill Gorcunov 2014-07-09 14:53 ` Kees Cook 0 siblings, 1 reply; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-09 14:13 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, Kees Cook, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk On Wed, Jul 09, 2014 at 02:13:36AM +0400, Cyrill Gorcunov wrote: > > Still the good news about all this members we modify -- they are used > for statistics mostly except brk/stack related members but they > are checked very carefully to not exceed the limits (if the > limits are set). > > > struct prctl_mm_map could do with a nice comment explaining its role in > > the world. > > ok, i'll update > > > I'm not seeing a coherent description of the proposed userspace > > interface. We'll eventually want to update the prctl manpage for this, > > so how about laying out all the needed details now, at patch review > > time so we can see what is proposed. > > Sure, I'll write more descriptive comment since original "It takes > a pointer of prctl_mm_map structure which carries all members to be > updated" is too short. Here is a way more descriptove changelog I hope. Please poke me if more details needed, or something should be improved/changed and etc. --- From: Cyrill Gorcunov <gorcunov@openvz.org> Subject: prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation During development of c/r we've noticed that in case if we need to support user namespaces we face a problem with capabilities in prctl(PR_SET_MM, ...) call, in particular once new user namespace is created capable(CAP_SYS_RESOURCE) no longer passes. A approach is to eliminate CAP_SYS_RESOURCE check but pass all new values in one bundle, which would allow the kernel to make more intensive test for sanity of values and same time allow us to support checkpoint/restore of user namespaces. Thus a new command PR_SET_MM_MAP introduced. It takes a pointer of prctl_mm_map structure which carries all the members to be updated. prctl(PR_SET_MM, PR_SET_MM_MAP, struct prctl_mm_map *, size) struct prctl_mm_map { __u64 start_code; __u64 end_code; __u64 start_data; __u64 end_data; __u64 start_brk; __u64 brk; __u64 start_stack; __u64 arg_start; __u64 arg_end; __u64 env_start; __u64 env_end; __u64 *auxv; __u32 auxv_size; __u32 exe_fd; }; All members except @exe_fd correspond ones of struct mm_struct. To figure out which available values these members may take here are meanings of the members. - start_code, end_code: represent bounds of executable code area - start_data, end_data: represent bounds of data area - start_brk, brk: used to calculate bounds for brk() syscall - start_stack: used when accounting space needed for command line arguments, environment and shmat() syscall - arg_start, arg_end, env_start, env_end: represent memory area supplied for command line arguments and environment variables - auxv, auxv_size: carries auxiliary vector, Elf format specifics - exe_fd: file descriptor number for executable link (/proc/self/exe) Thus we apply the following requirements to the values 1) Any member except @auxv, @auxv_size, @exe_fd is rather an address in user space thus it must be laying inside [mmap_min_addr, mmap_max_addr) interval. 2) While @[start|end]_code and @[start|end]_data may point to an nonexisting VMAs (say a program maps own new .text and .data segments during execution) the rest of members should belong to VMA which must exist. 3) Addresses must be ordered, ie @start_ member must not be greater or equal to appropriate @end_ member. 4) As in regular Elf loading procedure we require that @start_brk and @brk be greater than @end_data. 5) If RLIMIT_DATA rlimit is set to non-infinity new values should not exceed existing limit. Same applies to RLIMIT_STACK. 6) Auxiliary vector size must not exceed existing one (which is predefined as AT_VECTOR_SIZE and depends on architecture). 7) File descriptor passed in @exe_file should be pointing to executable file (because we use existing prctl_set_mm_exe_file_locked helper it ensures that the file we are going to use as exe link has all required permission granted). Now about where these members are involved inside kernel code: - @start_code and @end_code are used in /proc/$pid/[stat|statm] output; - @start_data and @end_data are used in /proc/$pid/[stat|statm] output, also they are considered if there enough space for brk() syscall result if RLIMIT_DATA is set; - @start_brk shown in /proc/$pid/stat output and accounted in brk() syscall if RLIMIT_DATA is set; also this member is tested to find a symbolic name of mmap event for perf system (we choose if event is generated for "heap" area); one more aplication is selinux -- we test if a process has PROCESS__EXECHEAP permission if trying to make heap area being executable with mprotect() syscall; - @brk is a current value for brk() syscall which lays inside heap area, it's shown in /proc/$pid/stat. When syscall brk() succesfully provides new memory area to a user space upon brk() completion the mm::brk is updated to carry new value; Both @start_brk and @brk are actively used in /proc/$pid/maps and /proc/$pid/smaps output to find a symbolic name "heap" for VMA being scanned; - @start_stack is printed out in /proc/$pid/stat and used to find a symbolic name "stack" for task and threads in /proc/$pid/maps and /proc/$pid/smaps output, and as the same as with @start_brk -- perf system uses it for event naming. Also kernel treat this member as a start address of where to map vDSO pages and to check if there is enough space for shmat() syscall; - @arg_start, @arg_end, @env_start and @env_end are printed out in /proc/$pid/stat. Another access to the data these members represent is to read /proc/$pid/environ or /proc/$pid/cmdline. Any attempt to read these areas kernel tests with access_process_vm helper so a user must have enough rights for this action; - @auxv and @auxv_size may be read from /proc/$pid/auxv. Strictly speaking kernel doesn't care much about which exactly data is sitting there because it is solely for userspace; - @exe_fd is referred from /proc/$pid/exe and when generating coredump. We uses prctl_set_mm_exe_file_locked helper to update this member, so exe-file link modification remains one-shot action. Still note that updating exe-file link now doesn't require sys-resource capability anymore, after all there is no much profit in preventing setup own file link (there are a number of ways to execute own code -- ptrace, ld-preload, so that the only reliable way to find which exactly code is executed is to inspect running program memory). I believe the old interface should be deprecated and ripped off in a couple of kernel releases if no one against. To test if new interface is implemented in the kernel one can pass PR_SET_MM_MAP_SIZE opcode and the kernel returns the size of currently supported struct prctl_mm_map. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> --- include/uapi/linux/prctl.h | 25 +++++ kernel/sys.c | 192 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 216 insertions(+), 1 deletion(-) Index: linux-2.6.git/include/uapi/linux/prctl.h =================================================================== --- linux-2.6.git.orig/include/uapi/linux/prctl.h +++ linux-2.6.git/include/uapi/linux/prctl.h @@ -119,6 +119,31 @@ # define PR_SET_MM_ENV_END 11 # define PR_SET_MM_AUXV 12 # define PR_SET_MM_EXE_FILE 13 +# define PR_SET_MM_MAP 14 +# define PR_SET_MM_MAP_SIZE 15 + +/* + * This structure provides new memory descriptor + * map which mostly modifies /proc/pid/stat[m] + * output for a task. This mostly done in a + * sake of checkpoint/restore functionality. + */ +struct prctl_mm_map { + __u64 start_code; /* code section bounds */ + __u64 end_code; + __u64 start_data; /* data section bounds */ + __u64 end_data; + __u64 start_brk; /* heap for brk() syscall */ + __u64 brk; + __u64 start_stack; /* stack starts at */ + __u64 arg_start; /* command line arguments bounds */ + __u64 arg_end; + __u64 env_start; /* environment variables bounds */ + __u64 env_end; + __u64 *auxv; /* auxiliary vector */ + __u32 auxv_size; /* vector size */ + __u32 exe_fd; /* /proc/$pid/exe link file */ +}; /* * Set specific pid that is allowed to ptrace the current task. Index: linux-2.6.git/kernel/sys.c =================================================================== --- linux-2.6.git.orig/kernel/sys.c +++ linux-2.6.git/kernel/sys.c @@ -1687,6 +1687,189 @@ exit: return err; } +#ifdef CONFIG_CHECKPOINT_RESTORE +/* + * WARNING: we don't require any capability here so be very careful + * in what is allowed for modification from userspace. + */ +static int validate_prctl_map_locked(struct prctl_mm_map *prctl_map) +{ + unsigned long mmap_max_addr = TASK_SIZE; + struct mm_struct *mm = current->mm; + struct vm_area_struct *stack_vma; + unsigned long rlim; + int error = 0; + + /* + * Make sure the members are not somewhere outside + * of allowed address space. + */ +#define __prctl_check_addr_space(__map, __member) \ + ({ \ + int __rc; \ + if ((unsigned long)__map->__member < mmap_max_addr && \ + (unsigned long)__map->__member >= mmap_min_addr) \ + __rc = 0; \ + else \ + __rc = -EINVAL; \ + __rc; \ + }) + + error |= __prctl_check_addr_space(prctl_map, start_code); + error |= __prctl_check_addr_space(prctl_map, end_code); + error |= __prctl_check_addr_space(prctl_map, start_data); + error |= __prctl_check_addr_space(prctl_map, end_data); + error |= __prctl_check_addr_space(prctl_map, start_stack); + error |= __prctl_check_addr_space(prctl_map, start_brk); + error |= __prctl_check_addr_space(prctl_map, brk); + error |= __prctl_check_addr_space(prctl_map, arg_start); + error |= __prctl_check_addr_space(prctl_map, arg_end); + error |= __prctl_check_addr_space(prctl_map, env_start); + error |= __prctl_check_addr_space(prctl_map, env_end); + if (error) + goto out; +#undef __prctl_check_addr_space + + /* + * Stack, brk, command line arguments and environment must exist. + */ + stack_vma = find_vma(mm, (unsigned long)prctl_map->start_stack); + if (!stack_vma) { + error = -EINVAL; + goto out; + } +#define __prctl_check_vma(mm, addr) find_vma(mm, (unsigned long)addr) ? 0 : -EINVAL + error |= __prctl_check_vma(mm, prctl_map->start_brk); + error |= __prctl_check_vma(mm, prctl_map->brk); + error |= __prctl_check_vma(mm, prctl_map->arg_start); + error |= __prctl_check_vma(mm, prctl_map->arg_end); + error |= __prctl_check_vma(mm, prctl_map->env_start); + error |= __prctl_check_vma(mm, prctl_map->env_end); + if (error) + goto out; +#undef __prctl_check_vma + + /* + * Make sure the pairs are ordered. + */ +#define __prctl_check_order(__map, __m1, __m2) \ + (unsigned long)__map->__m2 <= (unsigned long)__map->__m1 + if (__prctl_check_order(prctl_map, start_code, end_code) || + __prctl_check_order(prctl_map, start_data, end_data) || + __prctl_check_order(prctl_map, arg_start, arg_end) || + __prctl_check_order(prctl_map, env_start, env_end)) + goto out; +#undef __prctl_check_order + + error = -EINVAL; + + /* + * @brk should be after @end_data in traditional maps. + */ + if (prctl_map->start_brk <= prctl_map->end_data || + prctl_map->brk <= prctl_map->end_data) + goto out; + + /* + * Neither we should allow to override limits if they set. + */ + rlim = rlimit(RLIMIT_DATA); + if (rlim < RLIM_INFINITY) { + if ((prctl_map->brk - prctl_map->start_brk) + + (prctl_map->end_data - prctl_map->start_data) > rlim) + goto out; + } + + rlim = rlimit(RLIMIT_STACK); + if (rlim < RLIM_INFINITY) { +#ifdef CONFIG_STACK_GROWSUP + unsigned long left = stack_vma->vm_end - prctl_map->start_stack; +#else + unsigned long left = prctl_map->start_stack - stack_vma->vm_start; +#endif + if (left > rlim) + goto out; + } + + /* + * Someone is trying to cheat the auxv vector. + */ + if (prctl_map->auxv && prctl_map->auxv_size > sizeof(mm->saved_auxv)) + goto out; + error = 0; +out: + return error; +} + +static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size) +{ + struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, }; + unsigned long user_auxv[AT_VECTOR_SIZE]; + struct mm_struct *mm = current->mm; + int error = -EINVAL; + + BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv)); + + if (opt == PR_SET_MM_MAP_SIZE) + return put_user((unsigned int)sizeof(prctl_map), + (unsigned int __user *)addr); + + if (data_size != sizeof(prctl_map)) + return -EINVAL; + + if (copy_from_user(&prctl_map, addr, sizeof(prctl_map))) + return -EFAULT; + + down_read(&mm->mmap_sem); + + if (validate_prctl_map_locked(&prctl_map)) + goto out; + + if (prctl_map.auxv && prctl_map.auxv_size) { + up_read(&mm->mmap_sem); + memset(user_auxv, 0, sizeof(user_auxv)); + error = copy_from_user(user_auxv, + (const void __user *)prctl_map.auxv, + prctl_map.auxv_size); + down_read(&mm->mmap_sem); + if (error) + goto out; + } + + if (prctl_map.exe_fd != (u32)-1) { + error = prctl_set_mm_exe_file_locked(mm, prctl_map.exe_fd); + if (error) + goto out; + } + + if (prctl_map.auxv && prctl_map.auxv_size) { + user_auxv[AT_VECTOR_SIZE - 2] = 0; + user_auxv[AT_VECTOR_SIZE - 1] = 0; + + task_lock(current); + memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv)); + task_unlock(current); + } + + mm->start_code = prctl_map.start_code; + mm->end_code = prctl_map.end_code; + mm->start_data = prctl_map.start_data; + mm->end_data = prctl_map.end_data; + mm->start_brk = prctl_map.start_brk; + mm->brk = prctl_map.brk; + mm->start_stack = prctl_map.start_stack; + mm->arg_start = prctl_map.arg_start; + mm->arg_end = prctl_map.arg_end; + mm->env_start = prctl_map.env_start; + mm->env_end = prctl_map.env_end; + + error = 0; +out: + up_read(&mm->mmap_sem); + return error; +} +#endif /* CONFIG_CHECKPOINT_RESTORE */ + static int prctl_set_mm(int opt, unsigned long addr, unsigned long arg4, unsigned long arg5) { @@ -1695,9 +1878,16 @@ static int prctl_set_mm(int opt, unsigne struct vm_area_struct *vma; int error; - if (arg5 || (arg4 && opt != PR_SET_MM_AUXV)) + if (arg5 || (arg4 && (opt != PR_SET_MM_AUXV && + opt != PR_SET_MM_MAP && + opt != PR_SET_MM_MAP_SIZE))) return -EINVAL; +#ifdef CONFIG_CHECKPOINT_RESTORE + if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE) + return prctl_set_mm_map(opt, (const void __user *)addr, arg4); +#endif + if (!capable(CAP_SYS_RESOURCE)) return -EPERM; ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-09 14:13 ` Cyrill Gorcunov @ 2014-07-09 14:53 ` Kees Cook 2014-07-09 15:06 ` Cyrill Gorcunov 0 siblings, 1 reply; 19+ messages in thread From: Kees Cook @ 2014-07-09 14:53 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Andrew Morton, LKML, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk On Wed, Jul 9, 2014 at 7:13 AM, Cyrill Gorcunov <gorcunov@gmail.com> wrote: > On Wed, Jul 09, 2014 at 02:13:36AM +0400, Cyrill Gorcunov wrote: >> >> Still the good news about all this members we modify -- they are used >> for statistics mostly except brk/stack related members but they >> are checked very carefully to not exceed the limits (if the >> limits are set). >> >> > struct prctl_mm_map could do with a nice comment explaining its role in >> > the world. >> >> ok, i'll update >> >> > I'm not seeing a coherent description of the proposed userspace >> > interface. We'll eventually want to update the prctl manpage for this, >> > so how about laying out all the needed details now, at patch review >> > time so we can see what is proposed. >> >> Sure, I'll write more descriptive comment since original "It takes >> a pointer of prctl_mm_map structure which carries all members to be >> updated" is too short. > > Here is a way more descriptove changelog I hope. Please poke me if > more details needed, or something should be improved/changed and > etc. > --- > From: Cyrill Gorcunov <gorcunov@openvz.org> > Subject: prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation > > During development of c/r we've noticed that in case if we need to > support user namespaces we face a problem with capabilities in > prctl(PR_SET_MM, ...) call, in particular once new user namespace > is created capable(CAP_SYS_RESOURCE) no longer passes. > > A approach is to eliminate CAP_SYS_RESOURCE check but pass all > new values in one bundle, which would allow the kernel to make > more intensive test for sanity of values and same time allow us to > support checkpoint/restore of user namespaces. > > Thus a new command PR_SET_MM_MAP introduced. It takes a pointer of > prctl_mm_map structure which carries all the members to be updated. > > prctl(PR_SET_MM, PR_SET_MM_MAP, struct prctl_mm_map *, size) > > struct prctl_mm_map { > __u64 start_code; > __u64 end_code; > __u64 start_data; > __u64 end_data; > __u64 start_brk; > __u64 brk; > __u64 start_stack; > __u64 arg_start; > __u64 arg_end; > __u64 env_start; > __u64 env_end; > __u64 *auxv; > __u32 auxv_size; > __u32 exe_fd; > }; > > All members except @exe_fd correspond ones of struct mm_struct. > To figure out which available values these members may take here > are meanings of the members. > > - start_code, end_code: represent bounds of executable code area > - start_data, end_data: represent bounds of data area > - start_brk, brk: used to calculate bounds for brk() syscall > - start_stack: used when accounting space needed for command > line arguments, environment and shmat() syscall > - arg_start, arg_end, env_start, env_end: represent memory area > supplied for command line arguments and environment variables > - auxv, auxv_size: carries auxiliary vector, Elf format specifics > - exe_fd: file descriptor number for executable link (/proc/self/exe) > > Thus we apply the following requirements to the values > > 1) Any member except @auxv, @auxv_size, @exe_fd is rather an address > in user space thus it must be laying inside [mmap_min_addr, mmap_max_addr) > interval. > > 2) While @[start|end]_code and @[start|end]_data may point to an nonexisting > VMAs (say a program maps own new .text and .data segments during execution) > the rest of members should belong to VMA which must exist. > > 3) Addresses must be ordered, ie @start_ member must not be greater or > equal to appropriate @end_ member. > > 4) As in regular Elf loading procedure we require that @start_brk and > @brk be greater than @end_data. > > 5) If RLIMIT_DATA rlimit is set to non-infinity new values should not > exceed existing limit. Same applies to RLIMIT_STACK. > > 6) Auxiliary vector size must not exceed existing one (which is > predefined as AT_VECTOR_SIZE and depends on architecture). > > 7) File descriptor passed in @exe_file should be pointing > to executable file (because we use existing prctl_set_mm_exe_file_locked > helper it ensures that the file we are going to use as exe link has all > required permission granted). > > Now about where these members are involved inside kernel code: > > - @start_code and @end_code are used in /proc/$pid/[stat|statm] output; > > - @start_data and @end_data are used in /proc/$pid/[stat|statm] output, > also they are considered if there enough space for brk() syscall > result if RLIMIT_DATA is set; > > - @start_brk shown in /proc/$pid/stat output and accounted in brk() > syscall if RLIMIT_DATA is set; also this member is tested to > find a symbolic name of mmap event for perf system (we choose > if event is generated for "heap" area); one more aplication is > selinux -- we test if a process has PROCESS__EXECHEAP permission > if trying to make heap area being executable with mprotect() syscall; > > - @brk is a current value for brk() syscall which lays inside heap > area, it's shown in /proc/$pid/stat. When syscall brk() succesfully > provides new memory area to a user space upon brk() completion the > mm::brk is updated to carry new value; > > Both @start_brk and @brk are actively used in /proc/$pid/maps > and /proc/$pid/smaps output to find a symbolic name "heap" for > VMA being scanned; > > - @start_stack is printed out in /proc/$pid/stat and used to > find a symbolic name "stack" for task and threads in > /proc/$pid/maps and /proc/$pid/smaps output, and as the same > as with @start_brk -- perf system uses it for event naming. > Also kernel treat this member as a start address of where > to map vDSO pages and to check if there is enough space > for shmat() syscall; > > - @arg_start, @arg_end, @env_start and @env_end are printed out > in /proc/$pid/stat. Another access to the data these members > represent is to read /proc/$pid/environ or /proc/$pid/cmdline. > Any attempt to read these areas kernel tests with access_process_vm > helper so a user must have enough rights for this action; > > - @auxv and @auxv_size may be read from /proc/$pid/auxv. Strictly > speaking kernel doesn't care much about which exactly data is > sitting there because it is solely for userspace; > > - @exe_fd is referred from /proc/$pid/exe and when generating > coredump. We uses prctl_set_mm_exe_file_locked helper to update > this member, so exe-file link modification remains one-shot > action. > > Still note that updating exe-file link now doesn't require sys-resource > capability anymore, after all there is no much profit in preventing setup > own file link (there are a number of ways to execute own code -- ptrace, > ld-preload, so that the only reliable way to find which exactly code > is executed is to inspect running program memory). > > I believe the old interface should be deprecated and ripped off > in a couple of kernel releases if no one against. > > To test if new interface is implemented in the kernel one > can pass PR_SET_MM_MAP_SIZE opcode and the kernel returns > the size of currently supported struct prctl_mm_map. > > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> > Cc: Kees Cook <keescook@chromium.org> > Cc: Tejun Heo <tj@kernel.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Andrew Vagin <avagin@openvz.org> > Cc: Eric W. Biederman <ebiederm@xmission.com> > Cc: H. Peter Anvin <hpa@zytor.com> > Cc: Serge Hallyn <serge.hallyn@canonical.com> > Cc: Pavel Emelyanov <xemul@parallels.com> > Cc: Vasiliy Kulikov <segoon@openwall.com> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Cc: Michael Kerrisk <mtk.manpages@gmail.com> > --- > include/uapi/linux/prctl.h | 25 +++++ > kernel/sys.c | 192 ++++++++++++++++++++++++++++++++++++++++++++- > 2 files changed, 216 insertions(+), 1 deletion(-) > > Index: linux-2.6.git/include/uapi/linux/prctl.h > =================================================================== > --- linux-2.6.git.orig/include/uapi/linux/prctl.h > +++ linux-2.6.git/include/uapi/linux/prctl.h > @@ -119,6 +119,31 @@ > # define PR_SET_MM_ENV_END 11 > # define PR_SET_MM_AUXV 12 > # define PR_SET_MM_EXE_FILE 13 > +# define PR_SET_MM_MAP 14 > +# define PR_SET_MM_MAP_SIZE 15 > + > +/* > + * This structure provides new memory descriptor > + * map which mostly modifies /proc/pid/stat[m] > + * output for a task. This mostly done in a > + * sake of checkpoint/restore functionality. > + */ > +struct prctl_mm_map { > + __u64 start_code; /* code section bounds */ > + __u64 end_code; > + __u64 start_data; /* data section bounds */ > + __u64 end_data; > + __u64 start_brk; /* heap for brk() syscall */ > + __u64 brk; > + __u64 start_stack; /* stack starts at */ > + __u64 arg_start; /* command line arguments bounds */ > + __u64 arg_end; > + __u64 env_start; /* environment variables bounds */ > + __u64 env_end; > + __u64 *auxv; /* auxiliary vector */ > + __u32 auxv_size; /* vector size */ > + __u32 exe_fd; /* /proc/$pid/exe link file */ > +}; > > /* > * Set specific pid that is allowed to ptrace the current task. > Index: linux-2.6.git/kernel/sys.c > =================================================================== > --- linux-2.6.git.orig/kernel/sys.c > +++ linux-2.6.git/kernel/sys.c > @@ -1687,6 +1687,189 @@ exit: > return err; > } > > +#ifdef CONFIG_CHECKPOINT_RESTORE > +/* > + * WARNING: we don't require any capability here so be very careful > + * in what is allowed for modification from userspace. > + */ > +static int validate_prctl_map_locked(struct prctl_mm_map *prctl_map) > +{ > + unsigned long mmap_max_addr = TASK_SIZE; > + struct mm_struct *mm = current->mm; > + struct vm_area_struct *stack_vma; > + unsigned long rlim; > + int error = 0; > + > + /* > + * Make sure the members are not somewhere outside > + * of allowed address space. > + */ > +#define __prctl_check_addr_space(__map, __member) \ > + ({ \ > + int __rc; \ > + if ((unsigned long)__map->__member < mmap_max_addr && \ > + (unsigned long)__map->__member >= mmap_min_addr) \ > + __rc = 0; \ > + else \ > + __rc = -EINVAL; \ > + __rc; \ > + }) > + > + error |= __prctl_check_addr_space(prctl_map, start_code); > + error |= __prctl_check_addr_space(prctl_map, end_code); > + error |= __prctl_check_addr_space(prctl_map, start_data); > + error |= __prctl_check_addr_space(prctl_map, end_data); > + error |= __prctl_check_addr_space(prctl_map, start_stack); > + error |= __prctl_check_addr_space(prctl_map, start_brk); > + error |= __prctl_check_addr_space(prctl_map, brk); > + error |= __prctl_check_addr_space(prctl_map, arg_start); > + error |= __prctl_check_addr_space(prctl_map, arg_end); > + error |= __prctl_check_addr_space(prctl_map, env_start); > + error |= __prctl_check_addr_space(prctl_map, env_end); > + if (error) > + goto out; > +#undef __prctl_check_addr_space > + > + /* > + * Stack, brk, command line arguments and environment must exist. > + */ > + stack_vma = find_vma(mm, (unsigned long)prctl_map->start_stack); > + if (!stack_vma) { > + error = -EINVAL; > + goto out; > + } > +#define __prctl_check_vma(mm, addr) find_vma(mm, (unsigned long)addr) ? 0 : -EINVAL > + error |= __prctl_check_vma(mm, prctl_map->start_brk); > + error |= __prctl_check_vma(mm, prctl_map->brk); > + error |= __prctl_check_vma(mm, prctl_map->arg_start); > + error |= __prctl_check_vma(mm, prctl_map->arg_end); > + error |= __prctl_check_vma(mm, prctl_map->env_start); > + error |= __prctl_check_vma(mm, prctl_map->env_end); > + if (error) > + goto out; > +#undef __prctl_check_vma > + > + /* > + * Make sure the pairs are ordered. > + */ > +#define __prctl_check_order(__map, __m1, __m2) \ > + (unsigned long)__map->__m2 <= (unsigned long)__map->__m1 > + if (__prctl_check_order(prctl_map, start_code, end_code) || > + __prctl_check_order(prctl_map, start_data, end_data) || > + __prctl_check_order(prctl_map, arg_start, arg_end) || > + __prctl_check_order(prctl_map, env_start, env_end)) > + goto out; > +#undef __prctl_check_order This approach seems like a good solution given the security concerns with the earlier approach. I'm still pondering the implications, but as a minor style note, these macros are locally defined, but also all take at least a single identical argument in every usage. I would think it might be easier to read if they just used what they needed to directly. #define __prctl_check_addr_space(__member) \ ((unsigned long)prctl_map->__member < mmap_max_addr && \ (unsigned long)prctl_map->__member >= mmap_min_addr) ? 0 : -EINVAL #define __prctl_check_vma(__member) \ find_vma(mm, (unsigned long)prctl_map->__member) ? 0 : -EINVAL Also, why change the symantics of the final macro? Seems like that one can use the same "error |=" style: #define __prctl_check_order(__m1, __m2) \ prctl_map->__m1 < prctl_map->__m2 ? 0 : -EINVAL -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-09 14:53 ` Kees Cook @ 2014-07-09 15:06 ` Cyrill Gorcunov 2014-07-11 17:36 ` Cyrill Gorcunov 0 siblings, 1 reply; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-09 15:06 UTC (permalink / raw) To: Kees Cook Cc: Andrew Morton, LKML, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk On Wed, Jul 09, 2014 at 07:53:10AM -0700, Kees Cook wrote: ... > > + > > + /* > > + * Make sure the pairs are ordered. > > + */ > > +#define __prctl_check_order(__map, __m1, __m2) \ > > + (unsigned long)__map->__m2 <= (unsigned long)__map->__m1 > > + if (__prctl_check_order(prctl_map, start_code, end_code) || > > + __prctl_check_order(prctl_map, start_data, end_data) || > > + __prctl_check_order(prctl_map, arg_start, arg_end) || > > + __prctl_check_order(prctl_map, env_start, env_end)) > > + goto out; > > +#undef __prctl_check_order > > This approach seems like a good solution given the security concerns > with the earlier approach. I'm still pondering the implications, but > as a minor style note, these macros are locally defined, but also all > take at least a single identical argument in every usage. I would > think it might be easier to read if they just used what they needed to > directly. > > #define __prctl_check_addr_space(__member) \ > ((unsigned long)prctl_map->__member < mmap_max_addr && \ > (unsigned long)prctl_map->__member >= mmap_min_addr) ? 0 : -EINVAL > > #define __prctl_check_vma(__member) \ > find_vma(mm, (unsigned long)prctl_map->__member) ? 0 : -EINVAL > > Also, why change the symantics of the final macro? Seems like that one > can use the same "error |=" style: > > #define __prctl_check_order(__m1, __m2) \ > prctl_map->__m1 < prctl_map->__m2 ? 0 : -EINVAL Thanks a lot for comments, Kees! I tend to agre, leaving off the @prctl_map variable out of macros should make code also shorter, I'll update that's not the problem. Could you please re-check if I'm not missing something in security aspects when time permits. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-09 15:06 ` Cyrill Gorcunov @ 2014-07-11 17:36 ` Cyrill Gorcunov 2014-07-22 20:07 ` Kees Cook 0 siblings, 1 reply; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-11 17:36 UTC (permalink / raw) To: Kees Cook Cc: Andrew Morton, LKML, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk On Wed, Jul 09, 2014 at 07:06:04PM +0400, Cyrill Gorcunov wrote: > > Thanks a lot for comments, Kees! I tend to agre, leaving off the @prctl_map > variable out of macros should make code also shorter, I'll update that's > not the problem. Could you please re-check if I'm not missing something > in security aspects when time permits. I suppse this one should look better. --- From: Cyrill Gorcunov <gorcunov@openvz.org> Subject: prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v2 During development of c/r we've noticed that in case if we need to support user namespaces we face a problem with capabilities in prctl(PR_SET_MM, ...) call, in particular once new user namespace is created capable(CAP_SYS_RESOURCE) no longer passes. A approach is to eliminate CAP_SYS_RESOURCE check but pass all new values in one bundle, which would allow the kernel to make more intensive test for sanity of values and same time allow us to support checkpoint/restore of user namespaces. Thus a new command PR_SET_MM_MAP introduced. It takes a pointer of prctl_mm_map structure which carries all the members to be updated. prctl(PR_SET_MM, PR_SET_MM_MAP, struct prctl_mm_map *, size) struct prctl_mm_map { __u64 start_code; __u64 end_code; __u64 start_data; __u64 end_data; __u64 start_brk; __u64 brk; __u64 start_stack; __u64 arg_start; __u64 arg_end; __u64 env_start; __u64 env_end; __u64 *auxv; __u32 auxv_size; __u32 exe_fd; }; All members except @exe_fd correspond ones of struct mm_struct. To figure out which available values these members may take here are meanings of the members. - start_code, end_code: represent bounds of executable code area - start_data, end_data: represent bounds of data area - start_brk, brk: used to calculate bounds for brk() syscall - start_stack: used when accounting space needed for command line arguments, environment and shmat() syscall - arg_start, arg_end, env_start, env_end: represent memory area supplied for command line arguments and environment variables - auxv, auxv_size: carries auxiliary vector, Elf format specifics - exe_fd: file descriptor number for executable link (/proc/self/exe) Thus we apply the following requirements to the values 1) Any member except @auxv, @auxv_size, @exe_fd is rather an address in user space thus it must be laying inside [mmap_min_addr, mmap_max_addr) interval. 2) While @[start|end]_code and @[start|end]_data may point to an nonexisting VMAs (say a program maps own new .text and .data segments during execution) the rest of members should belong to VMA which must exist. 3) Addresses must be ordered, ie @start_ member must not be greater or equal to appropriate @end_ member. 4) As in regular Elf loading procedure we require that @start_brk and @brk be greater than @end_data. 5) If RLIMIT_DATA rlimit is set to non-infinity new values should not exceed existing limit. Same applies to RLIMIT_STACK. 6) Auxiliary vector size must not exceed existing one (which is predefined as AT_VECTOR_SIZE and depends on architecture). 7) File descriptor passed in @exe_file should be pointing to executable file (because we use existing prctl_set_mm_exe_file_locked helper it ensures that the file we are going to use as exe link has all required permission granted). Now about where these members are involved inside kernel code: - @start_code and @end_code are used in /proc/$pid/[stat|statm] output; - @start_data and @end_data are used in /proc/$pid/[stat|statm] output, also they are considered if there enough space for brk() syscall result if RLIMIT_DATA is set; - @start_brk shown in /proc/$pid/stat output and accounted in brk() syscall if RLIMIT_DATA is set; also this member is tested to find a symbolic name of mmap event for perf system (we choose if event is generated for "heap" area); one more aplication is selinux -- we test if a process has PROCESS__EXECHEAP permission if trying to make heap area being executable with mprotect() syscall; - @brk is a current value for brk() syscall which lays inside heap area, it's shown in /proc/$pid/stat. When syscall brk() succesfully provides new memory area to a user space upon brk() completion the mm::brk is updated to carry new value; Both @start_brk and @brk are actively used in /proc/$pid/maps and /proc/$pid/smaps output to find a symbolic name "heap" for VMA being scanned; - @start_stack is printed out in /proc/$pid/stat and used to find a symbolic name "stack" for task and threads in /proc/$pid/maps and /proc/$pid/smaps output, and as the same as with @start_brk -- perf system uses it for event naming. Also kernel treat this member as a start address of where to map vDSO pages and to check if there is enough space for shmat() syscall; - @arg_start, @arg_end, @env_start and @env_end are printed out in /proc/$pid/stat. Another access to the data these members represent is to read /proc/$pid/environ or /proc/$pid/cmdline. Any attempt to read these areas kernel tests with access_process_vm helper so a user must have enough rights for this action; - @auxv and @auxv_size may be read from /proc/$pid/auxv. Strictly speaking kernel doesn't care much about which exactly data is sitting there because it is solely for userspace; - @exe_fd is referred from /proc/$pid/exe and when generating coredump. We uses prctl_set_mm_exe_file_locked helper to update this member, so exe-file link modification remains one-shot action. Still note that updating exe-file link now doesn't require sys-resource capability anymore, after all there is no much profit in preventing setup own file link (there are a number of ways to execute own code -- ptrace, ld-preload, so that the only reliable way to find which exactly code is executed is to inspect running program memory). I believe the old interface should be deprecated and ripped off in a couple of kernel releases if no one against. To test if new interface is implemented in the kernel one can pass PR_SET_MM_MAP_SIZE opcode and the kernel returns the size of currently supported struct prctl_mm_map. v2: - compact macros (by keescook@) - wrap new code with CONFIG_ (by akpm@) Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> --- include/uapi/linux/prctl.h | 25 +++++ kernel/sys.c | 194 ++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 218 insertions(+), 1 deletion(-) Index: linux-2.6.git/include/uapi/linux/prctl.h =================================================================== --- linux-2.6.git.orig/include/uapi/linux/prctl.h +++ linux-2.6.git/include/uapi/linux/prctl.h @@ -119,6 +119,31 @@ # define PR_SET_MM_ENV_END 11 # define PR_SET_MM_AUXV 12 # define PR_SET_MM_EXE_FILE 13 +# define PR_SET_MM_MAP 14 +# define PR_SET_MM_MAP_SIZE 15 + +/* + * This structure provides new memory descriptor + * map which mostly modifies /proc/pid/stat[m] + * output for a task. This mostly done in a + * sake of checkpoint/restore functionality. + */ +struct prctl_mm_map { + __u64 start_code; /* code section bounds */ + __u64 end_code; + __u64 start_data; /* data section bounds */ + __u64 end_data; + __u64 start_brk; /* heap for brk() syscall */ + __u64 brk; + __u64 start_stack; /* stack starts at */ + __u64 arg_start; /* command line arguments bounds */ + __u64 arg_end; + __u64 env_start; /* environment variables bounds */ + __u64 env_end; + __u64 *auxv; /* auxiliary vector */ + __u32 auxv_size; /* vector size */ + __u32 exe_fd; /* /proc/$pid/exe link file */ +}; /* * Set specific pid that is allowed to ptrace the current task. Index: linux-2.6.git/kernel/sys.c =================================================================== --- linux-2.6.git.orig/kernel/sys.c +++ linux-2.6.git/kernel/sys.c @@ -1687,6 +1687,191 @@ exit: return err; } +#ifdef CONFIG_CHECKPOINT_RESTORE +/* + * WARNING: we don't require any capability here so be very careful + * in what is allowed for modification from userspace. + */ +static int validate_prctl_map_locked(struct prctl_mm_map *prctl_map) +{ + unsigned long mmap_max_addr = TASK_SIZE; + struct mm_struct *mm = current->mm; + struct vm_area_struct *stack_vma; + unsigned long rlim; + int error = 0; + + /* + * Make sure the members are not somewhere outside + * of allowed address space. + */ +#define __prctl_check_addr_space(__member) \ + ({ \ + int __rc; \ + if ((unsigned long)prctl_map->__member < mmap_max_addr && \ + (unsigned long)prctl_map->__member >= mmap_min_addr) \ + __rc = 0; \ + else \ + __rc = -EINVAL; \ + __rc; \ + }) + error |= __prctl_check_addr_space(start_code); + error |= __prctl_check_addr_space(end_code); + error |= __prctl_check_addr_space(start_data); + error |= __prctl_check_addr_space(end_data); + error |= __prctl_check_addr_space(start_stack); + error |= __prctl_check_addr_space(start_brk); + error |= __prctl_check_addr_space(brk); + error |= __prctl_check_addr_space(arg_start); + error |= __prctl_check_addr_space(arg_end); + error |= __prctl_check_addr_space(env_start); + error |= __prctl_check_addr_space(env_end); + if (error) + goto out; +#undef __prctl_check_addr_space + + /* + * Stack, brk, command line arguments and environment must exist. + */ + stack_vma = find_vma(mm, (unsigned long)prctl_map->start_stack); + if (!stack_vma) { + error = -EINVAL; + goto out; + } +#define __prctl_check_vma(__member) \ + find_vma(mm, (unsigned long)prctl_map->__member) ? 0 : -EINVAL + error |= __prctl_check_vma(start_brk); + error |= __prctl_check_vma(brk); + error |= __prctl_check_vma(arg_start); + error |= __prctl_check_vma(arg_end); + error |= __prctl_check_vma(env_start); + error |= __prctl_check_vma(env_end); + if (error) + goto out; +#undef __prctl_check_vma + + /* + * Make sure the pairs are ordered. + */ +#define __prctl_check_order(__m1, __m2) \ + ((unsigned long)prctl_map->__m2 > \ + (unsigned long)prctl_map->__m1) ? 0 : -EINVAL + error |= __prctl_check_order(start_code, end_code); + error |= __prctl_check_order(start_data, end_data); + error |= __prctl_check_order(arg_start, arg_end); + error |= __prctl_check_order(env_start, env_end); + if (error) + goto out; +#undef __prctl_check_order + + error = -EINVAL; + + /* + * @brk should be after @end_data in traditional maps. + */ + if (prctl_map->start_brk <= prctl_map->end_data || + prctl_map->brk <= prctl_map->end_data) + goto out; + + /* + * Neither we should allow to override limits if they set. + */ + rlim = rlimit(RLIMIT_DATA); + if (rlim < RLIM_INFINITY) { + if ((prctl_map->brk - prctl_map->start_brk) + + (prctl_map->end_data - prctl_map->start_data) > rlim) + goto out; + } + + rlim = rlimit(RLIMIT_STACK); + if (rlim < RLIM_INFINITY) { +#ifdef CONFIG_STACK_GROWSUP + unsigned long left = stack_vma->vm_end - prctl_map->start_stack; +#else + unsigned long left = prctl_map->start_stack - stack_vma->vm_start; +#endif + if (left > rlim) + goto out; + } + + /* + * Someone is trying to cheat the auxv vector. + */ + if (prctl_map->auxv && prctl_map->auxv_size > sizeof(mm->saved_auxv)) + goto out; + error = 0; +out: + return error; +} + +static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size) +{ + struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, }; + unsigned long user_auxv[AT_VECTOR_SIZE]; + struct mm_struct *mm = current->mm; + int error = -EINVAL; + + BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv)); + + if (opt == PR_SET_MM_MAP_SIZE) + return put_user((unsigned int)sizeof(prctl_map), + (unsigned int __user *)addr); + + if (data_size != sizeof(prctl_map)) + return -EINVAL; + + if (copy_from_user(&prctl_map, addr, sizeof(prctl_map))) + return -EFAULT; + + down_read(&mm->mmap_sem); + + if (validate_prctl_map_locked(&prctl_map)) + goto out; + + if (prctl_map.auxv && prctl_map.auxv_size) { + up_read(&mm->mmap_sem); + memset(user_auxv, 0, sizeof(user_auxv)); + error = copy_from_user(user_auxv, + (const void __user *)prctl_map.auxv, + prctl_map.auxv_size); + down_read(&mm->mmap_sem); + if (error) + goto out; + } + + if (prctl_map.exe_fd != (u32)-1) { + error = prctl_set_mm_exe_file_locked(mm, prctl_map.exe_fd); + if (error) + goto out; + } + + if (prctl_map.auxv && prctl_map.auxv_size) { + user_auxv[AT_VECTOR_SIZE - 2] = 0; + user_auxv[AT_VECTOR_SIZE - 1] = 0; + + task_lock(current); + memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv)); + task_unlock(current); + } + + mm->start_code = prctl_map.start_code; + mm->end_code = prctl_map.end_code; + mm->start_data = prctl_map.start_data; + mm->end_data = prctl_map.end_data; + mm->start_brk = prctl_map.start_brk; + mm->brk = prctl_map.brk; + mm->start_stack = prctl_map.start_stack; + mm->arg_start = prctl_map.arg_start; + mm->arg_end = prctl_map.arg_end; + mm->env_start = prctl_map.env_start; + mm->env_end = prctl_map.env_end; + + error = 0; +out: + up_read(&mm->mmap_sem); + return error; +} +#endif /* CONFIG_CHECKPOINT_RESTORE */ + static int prctl_set_mm(int opt, unsigned long addr, unsigned long arg4, unsigned long arg5) { @@ -1695,9 +1880,16 @@ static int prctl_set_mm(int opt, unsigne struct vm_area_struct *vma; int error; - if (arg5 || (arg4 && opt != PR_SET_MM_AUXV)) + if (arg5 || (arg4 && (opt != PR_SET_MM_AUXV && + opt != PR_SET_MM_MAP && + opt != PR_SET_MM_MAP_SIZE))) return -EINVAL; +#ifdef CONFIG_CHECKPOINT_RESTORE + if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE) + return prctl_set_mm_map(opt, (const void __user *)addr, arg4); +#endif + if (!capable(CAP_SYS_RESOURCE)) return -EPERM; ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-11 17:36 ` Cyrill Gorcunov @ 2014-07-22 20:07 ` Kees Cook 2014-07-22 20:36 ` Cyrill Gorcunov 2014-07-24 13:48 ` Andrew Vagin 0 siblings, 2 replies; 19+ messages in thread From: Kees Cook @ 2014-07-22 20:07 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Andrew Morton, LKML, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk, Julien Tinnes On Fri, Jul 11, 2014 at 10:36 AM, Cyrill Gorcunov <gorcunov@gmail.com> wrote: > On Wed, Jul 09, 2014 at 07:06:04PM +0400, Cyrill Gorcunov wrote: >> >> Thanks a lot for comments, Kees! I tend to agre, leaving off the @prctl_map >> variable out of macros should make code also shorter, I'll update that's >> not the problem. Could you please re-check if I'm not missing something >> in security aspects when time permits. I asked Julien (now in CC) into look at this with me, and he had several comments that I've paraphrased/expanded on below... > > I suppse this one should look better. > --- > From: Cyrill Gorcunov <gorcunov@openvz.org> > Subject: prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation, v2 > > During development of c/r we've noticed that in case if we need to > support user namespaces we face a problem with capabilities in > prctl(PR_SET_MM, ...) call, in particular once new user namespace > is created capable(CAP_SYS_RESOURCE) no longer passes. > > A approach is to eliminate CAP_SYS_RESOURCE check but pass all > new values in one bundle, which would allow the kernel to make > more intensive test for sanity of values and same time allow us to > support checkpoint/restore of user namespaces. > > Thus a new command PR_SET_MM_MAP introduced. It takes a pointer of > prctl_mm_map structure which carries all the members to be updated. > > prctl(PR_SET_MM, PR_SET_MM_MAP, struct prctl_mm_map *, size) > > struct prctl_mm_map { > __u64 start_code; > __u64 end_code; > __u64 start_data; > __u64 end_data; > __u64 start_brk; > __u64 brk; > __u64 start_stack; > __u64 arg_start; > __u64 arg_end; > __u64 env_start; > __u64 env_end; > __u64 *auxv; > __u32 auxv_size; > __u32 exe_fd; > }; > > All members except @exe_fd correspond ones of struct mm_struct. > To figure out which available values these members may take here > are meanings of the members. > > - start_code, end_code: represent bounds of executable code area > - start_data, end_data: represent bounds of data area > - start_brk, brk: used to calculate bounds for brk() syscall > - start_stack: used when accounting space needed for command > line arguments, environment and shmat() syscall > - arg_start, arg_end, env_start, env_end: represent memory area > supplied for command line arguments and environment variables > - auxv, auxv_size: carries auxiliary vector, Elf format specifics > - exe_fd: file descriptor number for executable link (/proc/self/exe) > > Thus we apply the following requirements to the values > > 1) Any member except @auxv, @auxv_size, @exe_fd is rather an address > in user space thus it must be laying inside [mmap_min_addr, mmap_max_addr) > interval. > > 2) While @[start|end]_code and @[start|end]_data may point to an nonexisting > VMAs (say a program maps own new .text and .data segments during execution) > the rest of members should belong to VMA which must exist. > > 3) Addresses must be ordered, ie @start_ member must not be greater or > equal to appropriate @end_ member. > > 4) As in regular Elf loading procedure we require that @start_brk and > @brk be greater than @end_data. > > 5) If RLIMIT_DATA rlimit is set to non-infinity new values should not > exceed existing limit. Same applies to RLIMIT_STACK. > > 6) Auxiliary vector size must not exceed existing one (which is > predefined as AT_VECTOR_SIZE and depends on architecture). > > 7) File descriptor passed in @exe_file should be pointing > to executable file (because we use existing prctl_set_mm_exe_file_locked > helper it ensures that the file we are going to use as exe link has all > required permission granted). > > Now about where these members are involved inside kernel code: > > - @start_code and @end_code are used in /proc/$pid/[stat|statm] output; > > - @start_data and @end_data are used in /proc/$pid/[stat|statm] output, > also they are considered if there enough space for brk() syscall > result if RLIMIT_DATA is set; > > - @start_brk shown in /proc/$pid/stat output and accounted in brk() > syscall if RLIMIT_DATA is set; also this member is tested to > find a symbolic name of mmap event for perf system (we choose > if event is generated for "heap" area); one more aplication is > selinux -- we test if a process has PROCESS__EXECHEAP permission > if trying to make heap area being executable with mprotect() syscall; > > - @brk is a current value for brk() syscall which lays inside heap > area, it's shown in /proc/$pid/stat. When syscall brk() succesfully > provides new memory area to a user space upon brk() completion the > mm::brk is updated to carry new value; > > Both @start_brk and @brk are actively used in /proc/$pid/maps > and /proc/$pid/smaps output to find a symbolic name "heap" for > VMA being scanned; > > - @start_stack is printed out in /proc/$pid/stat and used to > find a symbolic name "stack" for task and threads in > /proc/$pid/maps and /proc/$pid/smaps output, and as the same > as with @start_brk -- perf system uses it for event naming. > Also kernel treat this member as a start address of where > to map vDSO pages and to check if there is enough space > for shmat() syscall; > > - @arg_start, @arg_end, @env_start and @env_end are printed out > in /proc/$pid/stat. Another access to the data these members > represent is to read /proc/$pid/environ or /proc/$pid/cmdline. > Any attempt to read these areas kernel tests with access_process_vm > helper so a user must have enough rights for this action; > > - @auxv and @auxv_size may be read from /proc/$pid/auxv. Strictly > speaking kernel doesn't care much about which exactly data is > sitting there because it is solely for userspace; > > - @exe_fd is referred from /proc/$pid/exe and when generating > coredump. We uses prctl_set_mm_exe_file_locked helper to update > this member, so exe-file link modification remains one-shot > action. Controlling exe_fd without privileges may turn out to be dangerous. At least things like tomoyo examine it for making policy decisions (see tomoyo_manager()). > > Still note that updating exe-file link now doesn't require sys-resource > capability anymore, after all there is no much profit in preventing setup > own file link (there are a number of ways to execute own code -- ptrace, > ld-preload, so that the only reliable way to find which exactly code > is executed is to inspect running program memory). > > I believe the old interface should be deprecated and ripped off > in a couple of kernel releases if no one against. > > To test if new interface is implemented in the kernel one > can pass PR_SET_MM_MAP_SIZE opcode and the kernel returns > the size of currently supported struct prctl_mm_map. > > v2: > - compact macros (by keescook@) > - wrap new code with CONFIG_ (by akpm@) > > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> > Cc: Kees Cook <keescook@chromium.org> > Cc: Tejun Heo <tj@kernel.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Andrew Vagin <avagin@openvz.org> > Cc: Eric W. Biederman <ebiederm@xmission.com> > Cc: H. Peter Anvin <hpa@zytor.com> > Cc: Serge Hallyn <serge.hallyn@canonical.com> > Cc: Pavel Emelyanov <xemul@parallels.com> > Cc: Vasiliy Kulikov <segoon@openwall.com> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > Cc: Michael Kerrisk <mtk.manpages@gmail.com> > --- > include/uapi/linux/prctl.h | 25 +++++ > kernel/sys.c | 194 ++++++++++++++++++++++++++++++++++++++++++++- > 2 files changed, 218 insertions(+), 1 deletion(-) > > Index: linux-2.6.git/include/uapi/linux/prctl.h > =================================================================== > --- linux-2.6.git.orig/include/uapi/linux/prctl.h > +++ linux-2.6.git/include/uapi/linux/prctl.h > @@ -119,6 +119,31 @@ > # define PR_SET_MM_ENV_END 11 > # define PR_SET_MM_AUXV 12 > # define PR_SET_MM_EXE_FILE 13 > +# define PR_SET_MM_MAP 14 > +# define PR_SET_MM_MAP_SIZE 15 > + > +/* > + * This structure provides new memory descriptor > + * map which mostly modifies /proc/pid/stat[m] > + * output for a task. This mostly done in a > + * sake of checkpoint/restore functionality. > + */ > +struct prctl_mm_map { > + __u64 start_code; /* code section bounds */ > + __u64 end_code; > + __u64 start_data; /* data section bounds */ > + __u64 end_data; > + __u64 start_brk; /* heap for brk() syscall */ > + __u64 brk; > + __u64 start_stack; /* stack starts at */ > + __u64 arg_start; /* command line arguments bounds */ > + __u64 arg_end; > + __u64 env_start; /* environment variables bounds */ > + __u64 env_end; > + __u64 *auxv; /* auxiliary vector */ > + __u32 auxv_size; /* vector size */ > + __u32 exe_fd; /* /proc/$pid/exe link file */ > +}; > > /* > * Set specific pid that is allowed to ptrace the current task. > Index: linux-2.6.git/kernel/sys.c > =================================================================== > --- linux-2.6.git.orig/kernel/sys.c > +++ linux-2.6.git/kernel/sys.c > @@ -1687,6 +1687,191 @@ exit: > return err; > } > > +#ifdef CONFIG_CHECKPOINT_RESTORE > +/* > + * WARNING: we don't require any capability here so be very careful > + * in what is allowed for modification from userspace. > + */ > +static int validate_prctl_map_locked(struct prctl_mm_map *prctl_map) > +{ > + unsigned long mmap_max_addr = TASK_SIZE; > + struct mm_struct *mm = current->mm; > + struct vm_area_struct *stack_vma; > + unsigned long rlim; > + int error = 0; > + > + /* > + * Make sure the members are not somewhere outside > + * of allowed address space. > + */ > +#define __prctl_check_addr_space(__member) \ > + ({ \ > + int __rc; \ > + if ((unsigned long)prctl_map->__member < mmap_max_addr && \ > + (unsigned long)prctl_map->__member >= mmap_min_addr) \ > + __rc = 0; \ > + else \ > + __rc = -EINVAL; \ > + __rc; \ > + }) > + error |= __prctl_check_addr_space(start_code); > + error |= __prctl_check_addr_space(end_code); > + error |= __prctl_check_addr_space(start_data); > + error |= __prctl_check_addr_space(end_data); > + error |= __prctl_check_addr_space(start_stack); > + error |= __prctl_check_addr_space(start_brk); > + error |= __prctl_check_addr_space(brk); > + error |= __prctl_check_addr_space(arg_start); > + error |= __prctl_check_addr_space(arg_end); > + error |= __prctl_check_addr_space(env_start); > + error |= __prctl_check_addr_space(env_end); > + if (error) > + goto out; > +#undef __prctl_check_addr_space > + > + /* > + * Stack, brk, command line arguments and environment must exist. > + */ > + stack_vma = find_vma(mm, (unsigned long)prctl_map->start_stack); > + if (!stack_vma) { > + error = -EINVAL; > + goto out; > + } > +#define __prctl_check_vma(__member) \ > + find_vma(mm, (unsigned long)prctl_map->__member) ? 0 : -EINVAL > + error |= __prctl_check_vma(start_brk); > + error |= __prctl_check_vma(brk); > + error |= __prctl_check_vma(arg_start); > + error |= __prctl_check_vma(arg_end); > + error |= __prctl_check_vma(env_start); > + error |= __prctl_check_vma(env_end); > + if (error) > + goto out; > +#undef __prctl_check_vma > + > + /* > + * Make sure the pairs are ordered. > + */ > +#define __prctl_check_order(__m1, __m2) \ > + ((unsigned long)prctl_map->__m2 > \ > + (unsigned long)prctl_map->__m1) ? 0 : -EINVAL > + error |= __prctl_check_order(start_code, end_code); > + error |= __prctl_check_order(start_data, end_data); > + error |= __prctl_check_order(arg_start, arg_end); > + error |= __prctl_check_order(env_start, env_end); > + if (error) > + goto out; > +#undef __prctl_check_order > + > + error = -EINVAL; > + > + /* > + * @brk should be after @end_data in traditional maps. > + */ > + if (prctl_map->start_brk <= prctl_map->end_data || > + prctl_map->brk <= prctl_map->end_data) > + goto out; > + > + /* > + * Neither we should allow to override limits if they set. > + */ > + rlim = rlimit(RLIMIT_DATA); > + if (rlim < RLIM_INFINITY) { > + if ((prctl_map->brk - prctl_map->start_brk) + > + (prctl_map->end_data - prctl_map->start_data) > rlim) > + goto out; > + } I think this has an integer overflow in it. This could be avoided by checking brk vs start_brk with an additional __prctl_check_order call. This is done for start_data and end_data already. > + > + rlim = rlimit(RLIMIT_STACK); > + if (rlim < RLIM_INFINITY) { > +#ifdef CONFIG_STACK_GROWSUP > + unsigned long left = stack_vma->vm_end - prctl_map->start_stack; > +#else > + unsigned long left = prctl_map->start_stack - stack_vma->vm_start; > +#endif > + if (left > rlim) > + goto out; > + } There should be a __prctl_check_order for stack_start vs stack_vma->vm_end (and another in the stack growsdown case). > + > + /* > + * Someone is trying to cheat the auxv vector. > + */ > + if (prctl_map->auxv && prctl_map->auxv_size > sizeof(mm->saved_auxv)) > + goto out; > + error = 0; > +out: > + return error; > +} > + > +static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data_size) > +{ > + struct prctl_mm_map prctl_map = { .exe_fd = (u32)-1, }; > + unsigned long user_auxv[AT_VECTOR_SIZE]; > + struct mm_struct *mm = current->mm; > + int error = -EINVAL; > + > + BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv)); > + > + if (opt == PR_SET_MM_MAP_SIZE) > + return put_user((unsigned int)sizeof(prctl_map), > + (unsigned int __user *)addr); > + > + if (data_size != sizeof(prctl_map)) > + return -EINVAL; > + > + if (copy_from_user(&prctl_map, addr, sizeof(prctl_map))) > + return -EFAULT; > + > + down_read(&mm->mmap_sem); > + > + if (validate_prctl_map_locked(&prctl_map)) > + goto out; > + > + if (prctl_map.auxv && prctl_map.auxv_size) { > + up_read(&mm->mmap_sem); > + memset(user_auxv, 0, sizeof(user_auxv)); > + error = copy_from_user(user_auxv, > + (const void __user *)prctl_map.auxv, > + prctl_map.auxv_size); > + down_read(&mm->mmap_sem); > + if (error) > + goto out; > + } "prctl_map.auxv" should be removed from this if condition (i.e. make sure any auxv_size does, in fact, attempt to write to the .auxv location). > + > + if (prctl_map.exe_fd != (u32)-1) { > + error = prctl_set_mm_exe_file_locked(mm, prctl_map.exe_fd); > + if (error) > + goto out; > + } > + > + if (prctl_map.auxv && prctl_map.auxv_size) { > + user_auxv[AT_VECTOR_SIZE - 2] = 0; > + user_auxv[AT_VECTOR_SIZE - 1] = 0; > + > + task_lock(current); > + memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv)); > + task_unlock(current); > + } This auxv if should probably be consolidated with the first one. And it may be worthwhile to mention this is making sure AT_NULL is at the end. > + > + mm->start_code = prctl_map.start_code; > + mm->end_code = prctl_map.end_code; > + mm->start_data = prctl_map.start_data; > + mm->end_data = prctl_map.end_data; > + mm->start_brk = prctl_map.start_brk; > + mm->brk = prctl_map.brk; > + mm->start_stack = prctl_map.start_stack; > + mm->arg_start = prctl_map.arg_start; > + mm->arg_end = prctl_map.arg_end; > + mm->env_start = prctl_map.env_start; > + mm->env_end = prctl_map.env_end; > + > + error = 0; > +out: > + up_read(&mm->mmap_sem); > + return error; > +} > +#endif /* CONFIG_CHECKPOINT_RESTORE */ > + > static int prctl_set_mm(int opt, unsigned long addr, > unsigned long arg4, unsigned long arg5) > { > @@ -1695,9 +1880,16 @@ static int prctl_set_mm(int opt, unsigne > struct vm_area_struct *vma; > int error; > > - if (arg5 || (arg4 && opt != PR_SET_MM_AUXV)) > + if (arg5 || (arg4 && (opt != PR_SET_MM_AUXV && > + opt != PR_SET_MM_MAP && > + opt != PR_SET_MM_MAP_SIZE))) > return -EINVAL; > > +#ifdef CONFIG_CHECKPOINT_RESTORE > + if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE) > + return prctl_set_mm_map(opt, (const void __user *)addr, arg4); > +#endif > + > if (!capable(CAP_SYS_RESOURCE)) > return -EPERM; > To avoid future errors, the rlimit checks should probably go into some common place, so that the same functions are called during rlimit checks when "classic" modification of fields such as ->brk happen (for instance in sys_brk). Thanks! -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-22 20:07 ` Kees Cook @ 2014-07-22 20:36 ` Cyrill Gorcunov 2014-07-24 13:48 ` Andrew Vagin 1 sibling, 0 replies; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-22 20:36 UTC (permalink / raw) To: Kees Cook Cc: Andrew Morton, LKML, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk, Julien Tinnes On Tue, Jul 22, 2014 at 01:07:51PM -0700, Kees Cook wrote: > On Fri, Jul 11, 2014 at 10:36 AM, Cyrill Gorcunov <gorcunov@gmail.com> wrote: > > On Wed, Jul 09, 2014 at 07:06:04PM +0400, Cyrill Gorcunov wrote: > >> > >> Thanks a lot for comments, Kees! I tend to agre, leaving off the @prctl_map > >> variable out of macros should make code also shorter, I'll update that's > >> not the problem. Could you please re-check if I'm not missing something > >> in security aspects when time permits. > > I asked Julien (now in CC) into look at this with me, and he had > several comments that I've paraphrased/expanded on below... Thanks a huge, Kees! > > - @exe_fd is referred from /proc/$pid/exe and when generating > > coredump. We uses prctl_set_mm_exe_file_locked helper to update > > this member, so exe-file link modification remains one-shot > > action. > > Controlling exe_fd without privileges may turn out to be dangerous. At > least things like tomoyo examine it for making policy decisions (see > tomoyo_manager()). OK, so if we worry about this that much -- what if I bring some sysctl variable which would be able to turn this non-root functionality on and off? In criu we would turn it off when start restoring (with root privilegues of course) and the once restore is complete we turn it off back? Sounds reasonable? (I still personally think this @exe_fd is just a hint and as I mentioned if we have ptrace/preload rights there damn a lot of ways to inject own code into any program so that a user won't even notice ;) > > + /* > > + * Neither we should allow to override limits if they set. > > + */ > > + rlim = rlimit(RLIMIT_DATA); > > + if (rlim < RLIM_INFINITY) { > > + if ((prctl_map->brk - prctl_map->start_brk) + > > + (prctl_map->end_data - prctl_map->start_data) > rlim) > > + goto out; > > + } > > I think this has an integer overflow in it. This could be avoided by > checking brk vs start_brk with an additional __prctl_check_order call. > This is done for start_data and end_data already. Thanks, will update. > > + rlim = rlimit(RLIMIT_STACK); > > + if (rlim < RLIM_INFINITY) { > > +#ifdef CONFIG_STACK_GROWSUP > > + unsigned long left = stack_vma->vm_end - prctl_map->start_stack; > > +#else > > + unsigned long left = prctl_map->start_stack - stack_vma->vm_start; > > +#endif > > + if (left > rlim) > > + goto out; > > + } > > There should be a __prctl_check_order for stack_start vs > stack_vma->vm_end (and another in the stack growsdown case). Sure, thanks! > > + if (prctl_map.auxv && prctl_map.auxv_size) { > > + up_read(&mm->mmap_sem); > > + memset(user_auxv, 0, sizeof(user_auxv)); > > + error = copy_from_user(user_auxv, > > + (const void __user *)prctl_map.auxv, > > + prctl_map.auxv_size); > > + down_read(&mm->mmap_sem); > > + if (error) > > + goto out; > > + } > > "prctl_map.auxv" should be removed from this if condition (i.e. make > sure any auxv_size does, in fact, attempt to write to the .auxv > location). Hmm, why? Only having two variables valid we can be sure the copy_from_user is proper to call. You propose to make it as if (prctl_map.auxv_size) { ... copy_from_user(user_auxv, ? Or I misunderstand you? > > + > > + if (prctl_map.exe_fd != (u32)-1) { > > + error = prctl_set_mm_exe_file_locked(mm, prctl_map.exe_fd); > > + if (error) > > + goto out; > > + } > > + > > + if (prctl_map.auxv && prctl_map.auxv_size) { > > + user_auxv[AT_VECTOR_SIZE - 2] = 0; > > + user_auxv[AT_VECTOR_SIZE - 1] = 0; > > + > > + task_lock(current); > > + memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv)); > > + task_unlock(current); > > + } > > This auxv if should probably be consolidated with the first one. And > it may be worthwhile to mention this is making sure AT_NULL is at the > end. As to AT_NULL -- sure, I'll update (0 is the same as AT_NULL iirc, so I'm sorry to not making it explicit). As to consolidation -- no. Look, the whole idea is to modify real current->mm if and only if everything else won't fail so I splitted it as 1) validate_prctl_map_locked to make sure all members we're going to use are valid 2) copy auxv vector -- if we fail here, we can exit safely leaving current->mm completely untouched 3) setup new exe_fd, again if we fail here current->mm remains untouched 4) finally we can modify current->mm because no error can happen here > > + > > + mm->start_code = prctl_map.start_code; > > + mm->end_code = prctl_map.end_code; > > + mm->start_data = prctl_map.start_data; > > + mm->end_data = prctl_map.end_data; > > + mm->start_brk = prctl_map.start_brk; > > + mm->brk = prctl_map.brk; > > + mm->start_stack = prctl_map.start_stack; > > + mm->arg_start = prctl_map.arg_start; > > + mm->arg_end = prctl_map.arg_end; > > + mm->env_start = prctl_map.env_start; > > + mm->env_end = prctl_map.env_end; > > + > > + error = 0; > > +out: > > + up_read(&mm->mmap_sem); > > + return error; > > +} > > +#endif /* CONFIG_CHECKPOINT_RESTORE */ > > To avoid future errors, the rlimit checks should probably go into some > common place, so that the same functions are called during rlimit > checks when "classic" modification of fields such as ->brk happen (for > instance in sys_brk). OK, I'll take a look, this will require one more patch but I hope we're fine with that. Thanks a lot for comments! ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-22 20:07 ` Kees Cook 2014-07-22 20:36 ` Cyrill Gorcunov @ 2014-07-24 13:48 ` Andrew Vagin 2014-07-24 16:42 ` Cyrill Gorcunov 2014-07-24 18:44 ` Kees Cook 1 sibling, 2 replies; 19+ messages in thread From: Andrew Vagin @ 2014-07-24 13:48 UTC (permalink / raw) To: Kees Cook Cc: Cyrill Gorcunov, Andrew Morton, LKML, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk, Julien Tinnes On Tue, Jul 22, 2014 at 01:07:51PM -0700, Kees Cook wrote: > > - @exe_fd is referred from /proc/$pid/exe and when generating > > coredump. We uses prctl_set_mm_exe_file_locked helper to update > > this member, so exe-file link modification remains one-shot > > action. > > Controlling exe_fd without privileges may turn out to be dangerous. At > least things like tomoyo examine it for making policy decisions (see > tomoyo_manager()). > We don't want to reduce security. How can we get a process with a target exe link, which executes our code? We can execute the target file and attach to it with ptrace. ptrace allows to inject and execute any code. So if we are sure that we are able to do a previous scenario, we can safely change exe-link, can't we? prctl already has a check of permissions to execute the target file. If we execute a file. What can prevent us to attach to the process with ptrace? The file can have a suid bit, so after executing it we may lose ability to attach to it. To check that we can check that uid and gid is zero in a current userns (local root). What else do we need to check? Thanks, Andrey ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-24 13:48 ` Andrew Vagin @ 2014-07-24 16:42 ` Cyrill Gorcunov 2014-07-24 18:44 ` Kees Cook 1 sibling, 0 replies; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-24 16:42 UTC (permalink / raw) To: Andrew Vagin Cc: Kees Cook, Andrew Morton, LKML, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk, Julien Tinnes On Thu, Jul 24, 2014 at 05:48:28PM +0400, Andrew Vagin wrote: > On Tue, Jul 22, 2014 at 01:07:51PM -0700, Kees Cook wrote: > > > - @exe_fd is referred from /proc/$pid/exe and when generating > > > coredump. We uses prctl_set_mm_exe_file_locked helper to update > > > this member, so exe-file link modification remains one-shot > > > action. > > > > Controlling exe_fd without privileges may turn out to be dangerous. At > > least things like tomoyo examine it for making policy decisions (see > > tomoyo_manager()). > > > > We don't want to reduce security. How can we get a process with a > target exe link, which executes our code? > > We can execute the target file and attach to it with ptrace. ptrace > allows to inject and execute any code. > > So if we are sure that we are able to do a previous scenario, we can > safely change exe-link, can't we? > > prctl already has a check of permissions to execute the target file. > If we execute a file. What can prevent us to attach to the process with ptrace? > > The file can have a suid bit, so after executing it we may lose ability > to attach to it. To check that we can check that uid and gid is zero > in a current userns (local root). > > What else do we need to check? Good question. I suppose plain check for local root should be enough. Guys, I'm about to send a new series for review. Please take a look once time permit. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-24 13:48 ` Andrew Vagin 2014-07-24 16:42 ` Cyrill Gorcunov @ 2014-07-24 18:44 ` Kees Cook 2014-07-24 18:50 ` Cyrill Gorcunov 1 sibling, 1 reply; 19+ messages in thread From: Kees Cook @ 2014-07-24 18:44 UTC (permalink / raw) To: Andrew Vagin Cc: Cyrill Gorcunov, Andrew Morton, LKML, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk, Julien Tinnes On Thu, Jul 24, 2014 at 6:48 AM, Andrew Vagin <avagin@parallels.com> wrote: > On Tue, Jul 22, 2014 at 01:07:51PM -0700, Kees Cook wrote: >> > - @exe_fd is referred from /proc/$pid/exe and when generating >> > coredump. We uses prctl_set_mm_exe_file_locked helper to update >> > this member, so exe-file link modification remains one-shot >> > action. >> >> Controlling exe_fd without privileges may turn out to be dangerous. At >> least things like tomoyo examine it for making policy decisions (see >> tomoyo_manager()). >> > > We don't want to reduce security. How can we get a process with a > target exe link, which executes our code? > > We can execute the target file and attach to it with ptrace. ptrace > allows to inject and execute any code. > > So if we are sure that we are able to do a previous scenario, we can > safely change exe-link, can't we? > > prctl already has a check of permissions to execute the target file. > If we execute a file. What can prevent us to attach to the process with ptrace? > > The file can have a suid bit, so after executing it we may lose ability > to attach to it. To check that we can check that uid and gid is zero > in a current userns (local root). > > What else do we need to check? Yeah, I think all the checks are sufficient, but I (and Julien) are still trying to think about side-effects. It would be nice if these checks (like the rlimit checks) were merged into some common helper. That way if something changes in the exec path, it won't go missed in the c/r path. -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation 2014-07-24 18:44 ` Kees Cook @ 2014-07-24 18:50 ` Cyrill Gorcunov 0 siblings, 0 replies; 19+ messages in thread From: Cyrill Gorcunov @ 2014-07-24 18:50 UTC (permalink / raw) To: Kees Cook Cc: Andrew Vagin, Andrew Morton, LKML, Tejun Heo, Andrew Vagin, Eric W. Biederman, Serge Hallyn, Pavel Emelyanov, Vasiliy Kulikov, KAMEZAWA Hiroyuki, Michael Kerrisk, Julien Tinnes On Thu, Jul 24, 2014 at 11:44:50AM -0700, Kees Cook wrote: ... > > > > The file can have a suid bit, so after executing it we may lose ability > > to attach to it. To check that we can check that uid and gid is zero > > in a current userns (local root). > > > > What else do we need to check? > > Yeah, I think all the checks are sufficient, but I (and Julien) are > still trying to think about side-effects. > > It would be nice if these checks (like the rlimit checks) were merged > into some common helper. That way if something changes in the exec > path, it won't go missed in the c/r path. For rlimit I've done a separate helper in new rfc series, please take a look. ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2014-07-24 18:50 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-07-03 14:33 [RFC 0/2] prctl: set-mm -- Rework interface Cyrill Gorcunov 2014-07-03 14:33 ` [RFC 1/2] prctl: PR_SET_MM -- Factor out mmap_sem when update mm::exe_file Cyrill Gorcunov 2014-07-03 14:33 ` [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation Cyrill Gorcunov 2014-07-03 20:34 ` Cyrill Gorcunov 2014-07-04 7:52 ` Andrew Vagin 2014-07-04 8:11 ` Cyrill Gorcunov 2014-07-08 19:08 ` Cyrill Gorcunov 2014-07-08 21:38 ` Andrew Morton 2014-07-08 22:13 ` Cyrill Gorcunov 2014-07-09 14:13 ` Cyrill Gorcunov 2014-07-09 14:53 ` Kees Cook 2014-07-09 15:06 ` Cyrill Gorcunov 2014-07-11 17:36 ` Cyrill Gorcunov 2014-07-22 20:07 ` Kees Cook 2014-07-22 20:36 ` Cyrill Gorcunov 2014-07-24 13:48 ` Andrew Vagin 2014-07-24 16:42 ` Cyrill Gorcunov 2014-07-24 18:44 ` Kees Cook 2014-07-24 18:50 ` Cyrill Gorcunov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).