* [PATCH 1/2] fs/exec: switch timens when a task gets a new mm @ 2022-09-21 0:31 Andrei Vagin 2022-09-21 0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Andrei Vagin @ 2022-09-21 0:31 UTC (permalink / raw) To: Kees Cook Cc: linux-kernel, Andrei Vagin, Alexey Izbyshev, Christian Brauner, Dmitry Safonov, Eric W. Biederman, Florian Weimer From: Andrei Vagin <avagin@gmail.com> Changing a time namespace requires remapping a vvar page, so we don't want to allow doing that if any other tasks can use the same mm. Currently, we install a time namespace when a task is created with a new vm. exec() is another case when a task gets a new mm and so it can switch a time namespace safely, but it isn't handled now. One more issue of the current interface is that clone() with CLONE_VM isn't allowed if the current task has unshared a time namespace (timens_for_children doesn't match the current timens). Both these issues make some inconvenience for users. For example, Alexey and Florian reported that posix_spawn() uses vfork+exec and this pattern doesn't work with time namespaces due to the both described issues. LXC needed to workaround the exec() issue by calling setns. In the commit 133e2d3e81de5 ("fs/exec: allow to unshare a time namespace on vfork+exec"), we tried to fix these issues with minimal impact on UAPI. But it adds extra complexity and some undesirable side effects. Eric suggested fixing the issues properly because here are all the reasons to suppose that there are no users that depend on the old behavior. Cc: Alexey Izbyshev <izbyshev@ispras.ru> Cc: Christian Brauner <brauner@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Kees Cook <keescook@chromium.org> Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Origin-author: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> --- fs/exec.c | 5 +++++ include/linux/nsproxy.h | 1 + kernel/fork.c | 9 --------- kernel/nsproxy.c | 23 +++++++++++++++++++++-- 4 files changed, 27 insertions(+), 11 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index d046dbb9cbd0..71284188b96d 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -65,6 +65,7 @@ #include <linux/io_uring.h> #include <linux/syscall_user_dispatch.h> #include <linux/coredump.h> +#include <linux/time_namespace.h> #include <linux/uaccess.h> #include <asm/mmu_context.h> @@ -1296,6 +1297,10 @@ int begin_new_exec(struct linux_binprm * bprm) bprm->mm = NULL; + retval = exec_task_namespaces(); + if (retval) + goto out_unlock; + #ifdef CONFIG_POSIX_TIMERS spin_lock_irq(&me->sighand->siglock); posix_cpu_timers_exit(me); diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h index cdb171efc7cb..fee881cded01 100644 --- a/include/linux/nsproxy.h +++ b/include/linux/nsproxy.h @@ -94,6 +94,7 @@ static inline struct cred *nsset_cred(struct nsset *set) int copy_namespaces(unsigned long flags, struct task_struct *tsk); void exit_task_namespaces(struct task_struct *tsk); void switch_task_namespaces(struct task_struct *tsk, struct nsproxy *new); +int exec_task_namespaces(void); void free_nsproxy(struct nsproxy *ns); int unshare_nsproxy_namespaces(unsigned long, struct nsproxy **, struct cred *, struct fs_struct *); diff --git a/kernel/fork.c b/kernel/fork.c index 2b6bd511c6ed..4eb803f75225 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2044,15 +2044,6 @@ static __latent_entropy struct task_struct *copy_process( return ERR_PTR(-EINVAL); } - /* - * If the new process will be in a different time namespace - * do not allow it to share VM or a thread group with the forking task. - */ - if (clone_flags & (CLONE_THREAD | CLONE_VM)) { - if (nsp->time_ns != nsp->time_ns_for_children) - return ERR_PTR(-EINVAL); - } - if (clone_flags & CLONE_PIDFD) { /* * - CLONE_DETACHED is blocked so that we can potentially diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c index eec72ca962e2..a487ff24129b 100644 --- a/kernel/nsproxy.c +++ b/kernel/nsproxy.c @@ -157,7 +157,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk) if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWCGROUP | CLONE_NEWTIME)))) { - if (likely(old_ns->time_ns_for_children == old_ns->time_ns)) { + if ((flags & CLONE_VM) || + likely(old_ns->time_ns_for_children == old_ns->time_ns)) { get_nsproxy(old_ns); return 0; } @@ -179,7 +180,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk) if (IS_ERR(new_ns)) return PTR_ERR(new_ns); - timens_on_fork(new_ns, tsk); + if ((flags & CLONE_VM) == 0) + timens_on_fork(new_ns, tsk); tsk->nsproxy = new_ns; return 0; @@ -254,6 +256,23 @@ void exit_task_namespaces(struct task_struct *p) switch_task_namespaces(p, NULL); } +int exec_task_namespaces(void) +{ + struct task_struct *tsk = current; + struct nsproxy *new; + + if (tsk->nsproxy->time_ns_for_children == tsk->nsproxy->time_ns) + return 0; + + new = create_new_namespaces(0, tsk, current_user_ns(), tsk->fs); + if (IS_ERR(new)) + return PTR_ERR(new); + + timens_on_fork(new, tsk); + switch_task_namespaces(tsk, new); + return 0; +} + static int check_setns_flags(unsigned long flags) { if (!flags || (flags & ~(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | -- 2.37.3.968.ga6b4b080e4-goog ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] selftests/timens: add a test for vfork+exit 2022-09-21 0:31 [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Andrei Vagin @ 2022-09-21 0:31 ` Andrei Vagin 2022-10-09 16:10 ` Alexey Izbyshev 2022-10-13 17:31 ` [PATCH 2/2 v2] " Andrei Vagin 2022-09-22 3:29 ` [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Kees Cook 2022-10-18 7:21 ` Kees Cook 2 siblings, 2 replies; 10+ messages in thread From: Andrei Vagin @ 2022-09-21 0:31 UTC (permalink / raw) To: Kees Cook Cc: linux-kernel, Andrei Vagin, Alexey Izbyshev, Christian Brauner, Dmitry Safonov, Eric W. Biederman, Florian Weimer From: Andrei Vagin <avagin@gmail.com> * check that a child process is in parent's time namespace after vfork. * check that a child process is in the target namespace after exec. Output on success: 1..5 ok 1 parent before vfork ok 2 child before exec ok 3 child after exec ok 4 wait for child ok 5 parent after vfork # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Andrei Vagin <avagin@gmail.com> --- tools/testing/selftests/timens/.gitignore | 1 + tools/testing/selftests/timens/Makefile | 2 +- tools/testing/selftests/timens/vfork_exec.c | 132 ++++++++++++++++++++ 3 files changed, 134 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/timens/vfork_exec.c diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore index fe1eb8271b35..cae8dca0fbff 100644 --- a/tools/testing/selftests/timens/.gitignore +++ b/tools/testing/selftests/timens/.gitignore @@ -8,3 +8,4 @@ procfs timens timer timerfd +vfork_exec diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile index 3a5936cc10ab..f0d51d4d2c87 100644 --- a/tools/testing/selftests/timens/Makefile +++ b/tools/testing/selftests/timens/Makefile @@ -1,4 +1,4 @@ -TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex +TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex vfork_exec TEST_GEN_PROGS_EXTENDED := gettime_perf CFLAGS := -Wall -Werror -pthread diff --git a/tools/testing/selftests/timens/vfork_exec.c b/tools/testing/selftests/timens/vfork_exec.c new file mode 100644 index 000000000000..9fd8a64d25a9 --- /dev/null +++ b/tools/testing/selftests/timens/vfork_exec.c @@ -0,0 +1,132 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <errno.h> +#include <fcntl.h> +#include <sched.h> +#include <stdio.h> +#include <stdbool.h> +#include <sys/stat.h> +#include <sys/syscall.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <time.h> +#include <unistd.h> +#include <string.h> +#include <pthread.h> + +#include "log.h" +#include "timens.h" + +#define OFFSET (36000) + +static void *tcheck(void *arg) +{ + struct timespec *now = arg, tst; + int i; + + for (i = 0; i < 2; i++) { + _gettime(CLOCK_MONOTONIC, &tst, i); + if (abs(tst.tv_sec - now->tv_sec) > 5) { + pr_fail("thread: unexpected value: %ld (%ld)\n", + tst.tv_sec, now->tv_sec); + return (void *)1UL; + } + } + return NULL; +} + +static int check_in_thread(struct timespec *now) +{ + pthread_t th; + void *retval; + + if (pthread_create(&th, NULL, tcheck, now)) + return pr_perror("thread"); + if (pthread_join(th, &retval)) + return pr_perror("pthread_join"); + return !(retval == NULL); +} + +static int check(char *tst_name, struct timespec *now) +{ + struct timespec tst; + int i; + + for (i = 0; i < 2; i++) { + _gettime(CLOCK_MONOTONIC, &tst, i); + if (abs(tst.tv_sec - now->tv_sec) > 5) + return pr_fail("%s: unexpected value: %ld (%ld)\n", + tst.tv_sec, now->tv_sec); + } + if (check_in_thread(now)) + return 1; + ksft_test_result_pass("%s\n", tst_name); + return 0; +} + +int main(int argc, char *argv[]) +{ + struct timespec now; + int status; + pid_t pid; + + if (argc > 1) { + char *endptr; + + ksft_cnt.ksft_pass = 2; + now.tv_sec = strtoul(argv[1], &endptr, 0); + if (*endptr != 0) + return pr_perror("strtoul"); + + return check("child after exec", &now); + } + + nscheck(); + + ksft_set_plan(5); + + clock_gettime(CLOCK_MONOTONIC, &now); + + if (unshare_timens()) + return 1; + + if (_settime(CLOCK_MONOTONIC, OFFSET)) + return 1; + + if (check("parent before vfork", &now)) + return 1; + + pid = vfork(); + if (pid < 0) + return pr_perror("fork"); + + if (pid == 0) { + char now_str[64]; + char *cargv[] = {"exec", now_str, NULL}; + char *cenv[] = {NULL}; + + // Check that we are still in the source timens. + if (check("child before exec", &now)) + return 1; + + /* Check for proper vvar offsets after execve. */ + snprintf(now_str, sizeof(now_str), "%ld", now.tv_sec + OFFSET); + execve("/proc/self/exe", cargv, cenv); + return pr_perror("execve"); + } + + if (waitpid(pid, &status, 0) != pid) + return pr_perror("waitpid"); + + if (status) + ksft_exit_fail(); + ksft_inc_pass_cnt(); + ksft_test_result_pass("wait for child\n"); + + // Check that we are still in the source timens. + if (check("parent after vfork", &now)) + return 1; + + ksft_exit_pass(); + return 0; +} -- 2.37.3.968.ga6b4b080e4-goog ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] selftests/timens: add a test for vfork+exit 2022-09-21 0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin @ 2022-10-09 16:10 ` Alexey Izbyshev 2022-10-13 17:46 ` Andrei Vagin 2022-10-13 17:31 ` [PATCH 2/2 v2] " Andrei Vagin 1 sibling, 1 reply; 10+ messages in thread From: Alexey Izbyshev @ 2022-10-09 16:10 UTC (permalink / raw) To: Andrei Vagin Cc: Kees Cook, linux-kernel, Andrei Vagin, Christian Brauner, Dmitry Safonov, Eric W. Biederman, Florian Weimer On 2022-09-21 03:31, Andrei Vagin wrote: > From: Andrei Vagin <avagin@gmail.com> > > * check that a child process is in parent's time namespace after vfork. > * check that a child process is in the target namespace after exec. > > Output on success: > 1..5 > ok 1 parent before vfork > ok 2 child before exec > ok 3 child after exec > ok 4 wait for child > ok 5 parent after vfork > # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0 > > Signed-off-by: Andrei Vagin <avagin@gmail.com> > --- > tools/testing/selftests/timens/.gitignore | 1 + > tools/testing/selftests/timens/Makefile | 2 +- > tools/testing/selftests/timens/vfork_exec.c | 132 ++++++++++++++++++++ > 3 files changed, 134 insertions(+), 1 deletion(-) > create mode 100644 tools/testing/selftests/timens/vfork_exec.c > > diff --git a/tools/testing/selftests/timens/.gitignore > b/tools/testing/selftests/timens/.gitignore > index fe1eb8271b35..cae8dca0fbff 100644 > --- a/tools/testing/selftests/timens/.gitignore > +++ b/tools/testing/selftests/timens/.gitignore > @@ -8,3 +8,4 @@ procfs > timens > timer > timerfd > +vfork_exec > diff --git a/tools/testing/selftests/timens/Makefile > b/tools/testing/selftests/timens/Makefile > index 3a5936cc10ab..f0d51d4d2c87 100644 > --- a/tools/testing/selftests/timens/Makefile > +++ b/tools/testing/selftests/timens/Makefile > @@ -1,4 +1,4 @@ > -TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec > futex > +TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec > futex vfork_exec > TEST_GEN_PROGS_EXTENDED := gettime_perf > > CFLAGS := -Wall -Werror -pthread > diff --git a/tools/testing/selftests/timens/vfork_exec.c > b/tools/testing/selftests/timens/vfork_exec.c > new file mode 100644 > index 000000000000..9fd8a64d25a9 > --- /dev/null > +++ b/tools/testing/selftests/timens/vfork_exec.c > @@ -0,0 +1,132 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#define _GNU_SOURCE > +#include <errno.h> > +#include <fcntl.h> > +#include <sched.h> > +#include <stdio.h> > +#include <stdbool.h> > +#include <sys/stat.h> > +#include <sys/syscall.h> > +#include <sys/types.h> > +#include <sys/wait.h> > +#include <time.h> > +#include <unistd.h> > +#include <string.h> > +#include <pthread.h> > + > +#include "log.h" > +#include "timens.h" > + > +#define OFFSET (36000) > + > +static void *tcheck(void *arg) > +{ > + struct timespec *now = arg, tst; > + int i; > + > + for (i = 0; i < 2; i++) { > + _gettime(CLOCK_MONOTONIC, &tst, i); > + if (abs(tst.tv_sec - now->tv_sec) > 5) { > + pr_fail("thread: unexpected value: %ld (%ld)\n", > + tst.tv_sec, now->tv_sec); > + return (void *)1UL; > + } > + } > + return NULL; > +} > + > +static int check_in_thread(struct timespec *now) > +{ > + pthread_t th; > + void *retval; > + > + if (pthread_create(&th, NULL, tcheck, now)) > + return pr_perror("thread"); > + if (pthread_join(th, &retval)) > + return pr_perror("pthread_join"); > + return !(retval == NULL); > +} > + > +static int check(char *tst_name, struct timespec *now) > +{ > + struct timespec tst; > + int i; > + > + for (i = 0; i < 2; i++) { > + _gettime(CLOCK_MONOTONIC, &tst, i); > + if (abs(tst.tv_sec - now->tv_sec) > 5) > + return pr_fail("%s: unexpected value: %ld (%ld)\n", > + tst.tv_sec, now->tv_sec); There is a missing argument for "%s" in pr_fail(). I'm actually surprised that there is no __attribute__((format)) on ksft_test_result_fail() that would allow GCC to catch this. > + } > + if (check_in_thread(now)) > + return 1; > + ksft_test_result_pass("%s\n", tst_name); > + return 0; > +} > + > +int main(int argc, char *argv[]) > +{ > + struct timespec now; > + int status; > + pid_t pid; > + > + if (argc > 1) { > + char *endptr; > + > + ksft_cnt.ksft_pass = 2; > + now.tv_sec = strtoul(argv[1], &endptr, 0); > + if (*endptr != 0) > + return pr_perror("strtoul"); > + > + return check("child after exec", &now); > + } > + > + nscheck(); > + > + ksft_set_plan(5); > + > + clock_gettime(CLOCK_MONOTONIC, &now); > + > + if (unshare_timens()) > + return 1; > + > + if (_settime(CLOCK_MONOTONIC, OFFSET)) > + return 1; > + > + if (check("parent before vfork", &now)) > + return 1; > + > + pid = vfork(); > + if (pid < 0) > + return pr_perror("fork"); > + > + if (pid == 0) { > + char now_str[64]; > + char *cargv[] = {"exec", now_str, NULL}; > + char *cenv[] = {NULL}; > + > + // Check that we are still in the source timens. > + if (check("child before exec", &now)) > + return 1; I know this is just a test, but... Creating threads in a vfork()-child is quite dangerous (like most other things that touch the libc state, which is shared with the parent process). Here it works probably only because pthread_create() followed by pthread_join() restores everything into more-or-less the original state before returning control to the parent, but this is something that libcs don't guarantee and that can break at any moment. Also, returning from a vfork()-child is explicitly forbidden by the vfork() contract because the parent would then return to an invalid stack frame that could be arbitrarily clobbered by code executed in the child after main() returned. Moreover, if I'm not mistaken, on x86 with Intel CET-enabled glibc (assuming the support for CET is ever merged into the kernel) such return would cause the parent to always trap because the shadow stack will become inconsistent with the normal stack. Instead, _exit() should be used here... > + > + /* Check for proper vvar offsets after execve. */ > + snprintf(now_str, sizeof(now_str), "%ld", now.tv_sec + OFFSET); > + execve("/proc/self/exe", cargv, cenv); > + return pr_perror("execve"); ...and here. > + } > + > + if (waitpid(pid, &status, 0) != pid) > + return pr_perror("waitpid"); > + > + if (status) > + ksft_exit_fail(); > + ksft_inc_pass_cnt(); > + ksft_test_result_pass("wait for child\n"); > + > + // Check that we are still in the source timens. > + if (check("parent after vfork", &now)) > + return 1; > + > + ksft_exit_pass(); > + return 0; > +} Otherwise, both patches look good to me, thanks! Sorry for being late, Alexey ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] selftests/timens: add a test for vfork+exit 2022-10-09 16:10 ` Alexey Izbyshev @ 2022-10-13 17:46 ` Andrei Vagin 2022-10-13 22:15 ` Alexey Izbyshev 0 siblings, 1 reply; 10+ messages in thread From: Andrei Vagin @ 2022-10-13 17:46 UTC (permalink / raw) To: Alexey Izbyshev Cc: Kees Cook, linux-kernel, Andrei Vagin, Christian Brauner, Dmitry Safonov, Eric W. Biederman, Florian Weimer On Sun, Oct 9, 2022 at 9:10 AM Alexey Izbyshev <izbyshev@ispras.ru> wrote: > > On 2022-09-21 03:31, Andrei Vagin wrote: > > From: Andrei Vagin <avagin@gmail.com> <snip> > > + if (pid == 0) { > > + char now_str[64]; > > + char *cargv[] = {"exec", now_str, NULL}; > > + char *cenv[] = {NULL}; > > + > > + // Check that we are still in the source timens. > > + if (check("child before exec", &now)) > > + return 1; > > I know this is just a test, but... > > Creating threads in a vfork()-child is quite dangerous (like most other > things that touch the libc state, which is shared with the parent > process). Here it works probably only because pthread_create() followed > by pthread_join() restores everything into more-or-less the original > state before returning control to the parent, but this is something that > libcs don't guarantee and that can break at any moment. > > Also, returning from a vfork()-child is explicitly forbidden by the > vfork() contract because the parent would then return to an invalid > stack frame that could be arbitrarily clobbered by code executed in the > child after main() returned. Moreover, if I'm not mistaken, on x86 with > Intel CET-enabled glibc (assuming the support for CET is ever merged > into the kernel) such return would cause the parent to always trap > because the shadow stack will become inconsistent with the normal stack. > Instead, _exit() should be used here... > Hi Alexey, You are right, it isn't a good idea to create threads from the vfork-ed process. Now, vfork isn't a special case in the kernel code, so I think we can just remove the check() call from here. I have sent an updated version of this patch, pls take a look at it. Thanks, Andrei ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] selftests/timens: add a test for vfork+exit 2022-10-13 17:46 ` Andrei Vagin @ 2022-10-13 22:15 ` Alexey Izbyshev 0 siblings, 0 replies; 10+ messages in thread From: Alexey Izbyshev @ 2022-10-13 22:15 UTC (permalink / raw) To: Andrei Vagin Cc: Kees Cook, linux-kernel, Andrei Vagin, Christian Brauner, Dmitry Safonov, Eric W. Biederman, Florian Weimer On 2022-10-13 20:46, Andrei Vagin wrote: > On Sun, Oct 9, 2022 at 9:10 AM Alexey Izbyshev <izbyshev@ispras.ru> > wrote: >> >> On 2022-09-21 03:31, Andrei Vagin wrote: >> > From: Andrei Vagin <avagin@gmail.com> > > <snip> > >> > + if (pid == 0) { >> > + char now_str[64]; >> > + char *cargv[] = {"exec", now_str, NULL}; >> > + char *cenv[] = {NULL}; >> > + >> > + // Check that we are still in the source timens. >> > + if (check("child before exec", &now)) >> > + return 1; >> >> I know this is just a test, but... >> >> Creating threads in a vfork()-child is quite dangerous (like most >> other >> things that touch the libc state, which is shared with the parent >> process). Here it works probably only because pthread_create() >> followed >> by pthread_join() restores everything into more-or-less the original >> state before returning control to the parent, but this is something >> that >> libcs don't guarantee and that can break at any moment. >> >> Also, returning from a vfork()-child is explicitly forbidden by the >> vfork() contract because the parent would then return to an invalid >> stack frame that could be arbitrarily clobbered by code executed in >> the >> child after main() returned. Moreover, if I'm not mistaken, on x86 >> with >> Intel CET-enabled glibc (assuming the support for CET is ever merged >> into the kernel) such return would cause the parent to always trap >> because the shadow stack will become inconsistent with the normal >> stack. >> Instead, _exit() should be used here... >> > > Hi Alexey, > > You are right, it isn't a good idea to create threads from the vfork-ed > process. Now, vfork isn't a special case in the kernel code, so I think > we can just remove the check() call from here. I have sent an updated > version of this patch, pls take a look at it. > Hi, Andrei, While I think you could just skip check_in_thread() in the vfork()-child instead of removing check() completely (the rest of the code in check() is unlikely to mess up the libc state), I agree that the test is still able to catch problems unconditionally affecting all CLONE_VM tasks thanks to check_in_thread() in the parent, so I don't see much point in holding it up further. Your v2 patch looks good enough to me, thanks! Alexey ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2 v2] selftests/timens: add a test for vfork+exit 2022-09-21 0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin 2022-10-09 16:10 ` Alexey Izbyshev @ 2022-10-13 17:31 ` Andrei Vagin 1 sibling, 0 replies; 10+ messages in thread From: Andrei Vagin @ 2022-10-13 17:31 UTC (permalink / raw) To: Kees Cook Cc: linux-kernel, Alexey Izbyshev, Christian Brauner, Dmitry Safonov, Eric W. Biederman, Florian Weimer, Andrei Vagin From: Andrei Vagin <avagin@gmail.com> * check that a child process is in parent's time namespace after vfork. * check that a child process is in the target namespace after exec. Output on success: 1..4 ok 1 parent before vfork ok 2 child after exec ok 3 wait for child ok 4 parent after vfork # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Andrei Vagin <avagin@gmail.com> --- v2: - don't create threads from a vfork-ed process. - use _exit to exit from a vfork-ed process. tools/testing/selftests/timens/.gitignore | 1 + tools/testing/selftests/timens/Makefile | 2 +- tools/testing/selftests/timens/vfork_exec.c | 139 ++++++++++++++++++++ 3 files changed, 141 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/timens/vfork_exec.c diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore index fe1eb8271b35..cae8dca0fbff 100644 --- a/tools/testing/selftests/timens/.gitignore +++ b/tools/testing/selftests/timens/.gitignore @@ -8,3 +8,4 @@ procfs timens timer timerfd +vfork_exec diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile index 3a5936cc10ab..f0d51d4d2c87 100644 --- a/tools/testing/selftests/timens/Makefile +++ b/tools/testing/selftests/timens/Makefile @@ -1,4 +1,4 @@ -TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex +TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex vfork_exec TEST_GEN_PROGS_EXTENDED := gettime_perf CFLAGS := -Wall -Werror -pthread diff --git a/tools/testing/selftests/timens/vfork_exec.c b/tools/testing/selftests/timens/vfork_exec.c new file mode 100644 index 000000000000..fe3d0e15aa7e --- /dev/null +++ b/tools/testing/selftests/timens/vfork_exec.c @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <errno.h> +#include <fcntl.h> +#include <sched.h> +#include <stdio.h> +#include <stdbool.h> +#include <sys/stat.h> +#include <sys/syscall.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <time.h> +#include <unistd.h> +#include <string.h> +#include <pthread.h> + +#include "log.h" +#include "timens.h" + +#define OFFSET (36000) + +struct thread_args { + char *tst_name; + struct timespec *now; +}; + +static void *tcheck(void *_args) +{ + struct thread_args *args = _args; + struct timespec *now = args->now, tst; + int i; + + for (i = 0; i < 2; i++) { + _gettime(CLOCK_MONOTONIC, &tst, i); + if (abs(tst.tv_sec - now->tv_sec) > 5) { + pr_fail("%s: in-thread: unexpected value: %ld (%ld)\n", + args->tst_name, tst.tv_sec, now->tv_sec); + return (void *)1UL; + } + } + return NULL; +} + +static int check_in_thread(char *tst_name, struct timespec *now) +{ + struct thread_args args = { + .tst_name = tst_name, + .now = now, + }; + pthread_t th; + void *retval; + + if (pthread_create(&th, NULL, tcheck, &args)) + return pr_perror("thread"); + if (pthread_join(th, &retval)) + return pr_perror("pthread_join"); + return !(retval == NULL); +} + +static int check(char *tst_name, struct timespec *now) +{ + struct timespec tst; + int i; + + for (i = 0; i < 2; i++) { + _gettime(CLOCK_MONOTONIC, &tst, i); + if (abs(tst.tv_sec - now->tv_sec) > 5) + return pr_fail("%s: unexpected value: %ld (%ld)\n", + tst_name, tst.tv_sec, now->tv_sec); + } + if (check_in_thread(tst_name, now)) + return 1; + ksft_test_result_pass("%s\n", tst_name); + return 0; +} + +int main(int argc, char *argv[]) +{ + struct timespec now; + int status; + pid_t pid; + + if (argc > 1) { + char *endptr; + + ksft_cnt.ksft_pass = 1; + now.tv_sec = strtoul(argv[1], &endptr, 0); + if (*endptr != 0) + return pr_perror("strtoul"); + + return check("child after exec", &now); + } + + nscheck(); + + ksft_set_plan(4); + + clock_gettime(CLOCK_MONOTONIC, &now); + + if (unshare_timens()) + return 1; + + if (_settime(CLOCK_MONOTONIC, OFFSET)) + return 1; + + if (check("parent before vfork", &now)) + return 1; + + pid = vfork(); + if (pid < 0) + return pr_perror("fork"); + + if (pid == 0) { + char now_str[64]; + char *cargv[] = {"exec", now_str, NULL}; + char *cenv[] = {NULL}; + + /* Check for proper vvar offsets after execve. */ + snprintf(now_str, sizeof(now_str), "%ld", now.tv_sec + OFFSET); + execve("/proc/self/exe", cargv, cenv); + pr_perror("execve"); + _exit(1); + } + + if (waitpid(pid, &status, 0) != pid) + return pr_perror("waitpid"); + + if (status) + ksft_exit_fail(); + ksft_inc_pass_cnt(); + ksft_test_result_pass("wait for child\n"); + + /* Check that we are still in the source timens. */ + if (check("parent after vfork", &now)) + return 1; + + ksft_exit_pass(); + return 0; +} -- 2.38.0.413.g74048e4d9e-goog ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] fs/exec: switch timens when a task gets a new mm 2022-09-21 0:31 [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Andrei Vagin 2022-09-21 0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin @ 2022-09-22 3:29 ` Kees Cook 2022-09-23 1:47 ` Andrei Vagin 2022-10-18 7:21 ` Kees Cook 2 siblings, 1 reply; 10+ messages in thread From: Kees Cook @ 2022-09-22 3:29 UTC (permalink / raw) To: Andrei Vagin Cc: linux-kernel, Andrei Vagin, Alexey Izbyshev, Christian Brauner, Dmitry Safonov, Eric W. Biederman, Florian Weimer On Tue, Sep 20, 2022 at 05:31:19PM -0700, Andrei Vagin wrote: > From: Andrei Vagin <avagin@gmail.com> > > Changing a time namespace requires remapping a vvar page, so we don't want > to allow doing that if any other tasks can use the same mm. > > Currently, we install a time namespace when a task is created with a new > vm. exec() is another case when a task gets a new mm and so it can switch > a time namespace safely, but it isn't handled now. > > One more issue of the current interface is that clone() with CLONE_VM isn't > allowed if the current task has unshared a time namespace > (timens_for_children doesn't match the current timens). > > Both these issues make some inconvenience for users. For example, Alexey > and Florian reported that posix_spawn() uses vfork+exec and this pattern > doesn't work with time namespaces due to the both described issues. > LXC needed to workaround the exec() issue by calling setns. > > In the commit 133e2d3e81de5 ("fs/exec: allow to unshare a time namespace on > vfork+exec"), we tried to fix these issues with minimal impact on UAPI. But > it adds extra complexity and some undesirable side effects. Eric suggested > fixing the issues properly because here are all the reasons to suppose that > there are no users that depend on the old behavior. > > Cc: Alexey Izbyshev <izbyshev@ispras.ru> > Cc: Christian Brauner <brauner@kernel.org> > Cc: Dmitry Safonov <0x7f454c46@gmail.com> > Cc: "Eric W. Biederman" <ebiederm@xmission.com> > Cc: Florian Weimer <fweimer@redhat.com> > Cc: Kees Cook <keescook@chromium.org> > Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> > Origin-author: "Eric W. Biederman" <ebiederm@xmission.com> > Signed-off-by: Andrei Vagin <avagin@gmail.com> This looks good -- my intention is for this to go into -next after the v6.1 merge window closes. Does that match everyone's expectations? Thanks! -Kees -- Kees Cook ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] fs/exec: switch timens when a task gets a new mm 2022-09-22 3:29 ` [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Kees Cook @ 2022-09-23 1:47 ` Andrei Vagin 2022-09-24 5:02 ` Kees Cook 0 siblings, 1 reply; 10+ messages in thread From: Andrei Vagin @ 2022-09-23 1:47 UTC (permalink / raw) To: Kees Cook Cc: Andrei Vagin, linux-kernel, Alexey Izbyshev, Christian Brauner, Dmitry Safonov, Eric W. Biederman, Florian Weimer On Wed, Sep 21, 2022 at 8:29 PM Kees Cook <keescook@chromium.org> wrote: ... > > This looks good -- my intention is for this to go into -next after the > v6.1 merge window closes. Does that match everyone's expectations? > This works for me. If everything goes well, it will be merged to v6.2. Do I understand this right? Thanks, Andrei ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] fs/exec: switch timens when a task gets a new mm 2022-09-23 1:47 ` Andrei Vagin @ 2022-09-24 5:02 ` Kees Cook 0 siblings, 0 replies; 10+ messages in thread From: Kees Cook @ 2022-09-24 5:02 UTC (permalink / raw) To: Andrei Vagin Cc: Andrei Vagin, linux-kernel, Alexey Izbyshev, Christian Brauner, Dmitry Safonov, Eric W. Biederman, Florian Weimer On Thu, Sep 22, 2022 at 06:47:46PM -0700, Andrei Vagin wrote: > On Wed, Sep 21, 2022 at 8:29 PM Kees Cook <keescook@chromium.org> wrote: > ... > > > > This looks good -- my intention is for this to go into -next after the > > v6.1 merge window closes. Does that match everyone's expectations? > > > > This works for me. > > If everything goes well, it will be merged to v6.2. Do I understand this right? Right, that's what I'm expecting. :) -- Kees Cook ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] fs/exec: switch timens when a task gets a new mm 2022-09-21 0:31 [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Andrei Vagin 2022-09-21 0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin 2022-09-22 3:29 ` [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Kees Cook @ 2022-10-18 7:21 ` Kees Cook 2 siblings, 0 replies; 10+ messages in thread From: Kees Cook @ 2022-10-18 7:21 UTC (permalink / raw) To: avagin Cc: Kees Cook, fweimer, linux-kernel, ebiederm, izbyshev, avagin, 0x7f454c46, brauner On Tue, 20 Sep 2022 17:31:19 -0700, Andrei Vagin wrote: > From: Andrei Vagin <avagin@gmail.com> > > Changing a time namespace requires remapping a vvar page, so we don't want > to allow doing that if any other tasks can use the same mm. > > Currently, we install a time namespace when a task is created with a new > vm. exec() is another case when a task gets a new mm and so it can switch > a time namespace safely, but it isn't handled now. > > [...] Applied to for-next/execve, thanks! [1/2] fs/exec: switch timens when a task gets a new mm https://git.kernel.org/kees/c/f8e06046dd69 [2/2] selftests/timens: add a test for vfork+exit https://git.kernel.org/kees/c/de8517320d01 -- Kees Cook ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-10-18 7:21 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-09-21 0:31 [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Andrei Vagin 2022-09-21 0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin 2022-10-09 16:10 ` Alexey Izbyshev 2022-10-13 17:46 ` Andrei Vagin 2022-10-13 22:15 ` Alexey Izbyshev 2022-10-13 17:31 ` [PATCH 2/2 v2] " Andrei Vagin 2022-09-22 3:29 ` [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Kees Cook 2022-09-23 1:47 ` Andrei Vagin 2022-09-24 5:02 ` Kees Cook 2022-10-18 7:21 ` Kees Cook
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.