All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] fs/exec: switch timens when a task gets a new mm
@ 2022-09-21  0:31 Andrei Vagin
  2022-09-21  0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrei Vagin @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Andrei Vagin, Alexey Izbyshev, Christian Brauner,
	Dmitry Safonov, Eric W. Biederman, Florian Weimer

From: Andrei Vagin <avagin@gmail.com>

Changing a time namespace requires remapping a vvar page, so we don't want
to allow doing that if any other tasks can use the same mm.

Currently, we install a time namespace when a task is created with a new
vm. exec() is another case when a task gets a new mm and so it can switch
a time namespace safely, but it isn't handled now.

One more issue of the current interface is that clone() with CLONE_VM isn't
allowed if the current task has unshared a time namespace
(timens_for_children doesn't match the current timens).

Both these issues make some inconvenience for users. For example, Alexey
and Florian reported that posix_spawn() uses vfork+exec and this pattern
doesn't work with time namespaces due to the both described issues.
LXC needed to workaround the exec() issue by calling setns.

In the commit 133e2d3e81de5 ("fs/exec: allow to unshare a time namespace on
vfork+exec"), we tried to fix these issues with minimal impact on UAPI. But
it adds extra complexity and some undesirable side effects. Eric suggested
fixing the issues properly because here are all the reasons to suppose that
there are no users that depend on the old behavior.

Cc: Alexey Izbyshev <izbyshev@ispras.ru>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Origin-author: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 fs/exec.c               |  5 +++++
 include/linux/nsproxy.h |  1 +
 kernel/fork.c           |  9 ---------
 kernel/nsproxy.c        | 23 +++++++++++++++++++++--
 4 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index d046dbb9cbd0..71284188b96d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -65,6 +65,7 @@
 #include <linux/io_uring.h>
 #include <linux/syscall_user_dispatch.h>
 #include <linux/coredump.h>
+#include <linux/time_namespace.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1296,6 +1297,10 @@ int begin_new_exec(struct linux_binprm * bprm)
 
 	bprm->mm = NULL;
 
+	retval = exec_task_namespaces();
+	if (retval)
+		goto out_unlock;
+
 #ifdef CONFIG_POSIX_TIMERS
 	spin_lock_irq(&me->sighand->siglock);
 	posix_cpu_timers_exit(me);
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index cdb171efc7cb..fee881cded01 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -94,6 +94,7 @@ static inline struct cred *nsset_cred(struct nsset *set)
 int copy_namespaces(unsigned long flags, struct task_struct *tsk);
 void exit_task_namespaces(struct task_struct *tsk);
 void switch_task_namespaces(struct task_struct *tsk, struct nsproxy *new);
+int exec_task_namespaces(void);
 void free_nsproxy(struct nsproxy *ns);
 int unshare_nsproxy_namespaces(unsigned long, struct nsproxy **,
 	struct cred *, struct fs_struct *);
diff --git a/kernel/fork.c b/kernel/fork.c
index 2b6bd511c6ed..4eb803f75225 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2044,15 +2044,6 @@ static __latent_entropy struct task_struct *copy_process(
 			return ERR_PTR(-EINVAL);
 	}
 
-	/*
-	 * If the new process will be in a different time namespace
-	 * do not allow it to share VM or a thread group with the forking task.
-	 */
-	if (clone_flags & (CLONE_THREAD | CLONE_VM)) {
-		if (nsp->time_ns != nsp->time_ns_for_children)
-			return ERR_PTR(-EINVAL);
-	}
-
 	if (clone_flags & CLONE_PIDFD) {
 		/*
 		 * - CLONE_DETACHED is blocked so that we can potentially
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index eec72ca962e2..a487ff24129b 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -157,7 +157,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
 			      CLONE_NEWPID | CLONE_NEWNET |
 			      CLONE_NEWCGROUP | CLONE_NEWTIME)))) {
-		if (likely(old_ns->time_ns_for_children == old_ns->time_ns)) {
+		if ((flags & CLONE_VM) ||
+		    likely(old_ns->time_ns_for_children == old_ns->time_ns)) {
 			get_nsproxy(old_ns);
 			return 0;
 		}
@@ -179,7 +180,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	if (IS_ERR(new_ns))
 		return  PTR_ERR(new_ns);
 
-	timens_on_fork(new_ns, tsk);
+	if ((flags & CLONE_VM) == 0)
+		timens_on_fork(new_ns, tsk);
 
 	tsk->nsproxy = new_ns;
 	return 0;
@@ -254,6 +256,23 @@ void exit_task_namespaces(struct task_struct *p)
 	switch_task_namespaces(p, NULL);
 }
 
+int exec_task_namespaces(void)
+{
+	struct task_struct *tsk = current;
+	struct nsproxy *new;
+
+	if (tsk->nsproxy->time_ns_for_children == tsk->nsproxy->time_ns)
+		return 0;
+
+	new = create_new_namespaces(0, tsk, current_user_ns(), tsk->fs);
+	if (IS_ERR(new))
+		return PTR_ERR(new);
+
+	timens_on_fork(new, tsk);
+	switch_task_namespaces(tsk, new);
+	return 0;
+}
+
 static int check_setns_flags(unsigned long flags)
 {
 	if (!flags || (flags & ~(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-- 
2.37.3.968.ga6b4b080e4-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] selftests/timens: add a test for vfork+exit
  2022-09-21  0:31 [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Andrei Vagin
@ 2022-09-21  0:31 ` Andrei Vagin
  2022-10-09 16:10   ` Alexey Izbyshev
  2022-10-13 17:31   ` [PATCH 2/2 v2] " Andrei Vagin
  2022-09-22  3:29 ` [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Kees Cook
  2022-10-18  7:21 ` Kees Cook
  2 siblings, 2 replies; 10+ messages in thread
From: Andrei Vagin @ 2022-09-21  0:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Andrei Vagin, Alexey Izbyshev, Christian Brauner,
	Dmitry Safonov, Eric W. Biederman, Florian Weimer

From: Andrei Vagin <avagin@gmail.com>

* check that a child process is in parent's time namespace after vfork.
* check that a child process is in the target namespace after exec.

Output on success:
 1..5
 ok 1 parent before vfork
 ok 2 child before exec
 ok 3 child after exec
 ok 4 wait for child
 ok 5 parent after vfork
 # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 tools/testing/selftests/timens/.gitignore   |   1 +
 tools/testing/selftests/timens/Makefile     |   2 +-
 tools/testing/selftests/timens/vfork_exec.c | 132 ++++++++++++++++++++
 3 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/timens/vfork_exec.c

diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore
index fe1eb8271b35..cae8dca0fbff 100644
--- a/tools/testing/selftests/timens/.gitignore
+++ b/tools/testing/selftests/timens/.gitignore
@@ -8,3 +8,4 @@ procfs
 timens
 timer
 timerfd
+vfork_exec
diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
index 3a5936cc10ab..f0d51d4d2c87 100644
--- a/tools/testing/selftests/timens/Makefile
+++ b/tools/testing/selftests/timens/Makefile
@@ -1,4 +1,4 @@
-TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex
+TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex vfork_exec
 TEST_GEN_PROGS_EXTENDED := gettime_perf
 
 CFLAGS := -Wall -Werror -pthread
diff --git a/tools/testing/selftests/timens/vfork_exec.c b/tools/testing/selftests/timens/vfork_exec.c
new file mode 100644
index 000000000000..9fd8a64d25a9
--- /dev/null
+++ b/tools/testing/selftests/timens/vfork_exec.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <time.h>
+#include <unistd.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "log.h"
+#include "timens.h"
+
+#define OFFSET (36000)
+
+static void *tcheck(void *arg)
+{
+	struct timespec *now = arg, tst;
+	int i;
+
+	for (i = 0; i < 2; i++) {
+		_gettime(CLOCK_MONOTONIC, &tst, i);
+		if (abs(tst.tv_sec - now->tv_sec) > 5) {
+			pr_fail("thread: unexpected value: %ld (%ld)\n",
+				tst.tv_sec, now->tv_sec);
+			return (void *)1UL;
+		}
+	}
+	return NULL;
+}
+
+static int check_in_thread(struct timespec *now)
+{
+	pthread_t th;
+	void *retval;
+
+	if (pthread_create(&th, NULL, tcheck, now))
+		return pr_perror("thread");
+	if (pthread_join(th, &retval))
+		return pr_perror("pthread_join");
+	return !(retval == NULL);
+}
+
+static int check(char *tst_name, struct timespec *now)
+{
+	struct timespec tst;
+	int i;
+
+	for (i = 0; i < 2; i++) {
+		_gettime(CLOCK_MONOTONIC, &tst, i);
+		if (abs(tst.tv_sec - now->tv_sec) > 5)
+			return pr_fail("%s: unexpected value: %ld (%ld)\n",
+					tst.tv_sec, now->tv_sec);
+	}
+	if (check_in_thread(now))
+		return 1;
+	ksft_test_result_pass("%s\n", tst_name);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	struct timespec now;
+	int status;
+	pid_t pid;
+
+	if (argc > 1) {
+		char *endptr;
+
+		ksft_cnt.ksft_pass = 2;
+		now.tv_sec = strtoul(argv[1], &endptr, 0);
+		if (*endptr != 0)
+			return pr_perror("strtoul");
+
+		return check("child after exec", &now);
+	}
+
+	nscheck();
+
+	ksft_set_plan(5);
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+
+	if (unshare_timens())
+		return 1;
+
+	if (_settime(CLOCK_MONOTONIC, OFFSET))
+		return 1;
+
+	if (check("parent before vfork", &now))
+		return 1;
+
+	pid = vfork();
+	if (pid < 0)
+		return pr_perror("fork");
+
+	if (pid == 0) {
+		char now_str[64];
+		char *cargv[] = {"exec", now_str, NULL};
+		char *cenv[] = {NULL};
+
+		// Check that we are still in the source timens.
+		if (check("child before exec", &now))
+			return 1;
+
+		/* Check for proper vvar offsets after execve. */
+		snprintf(now_str, sizeof(now_str), "%ld", now.tv_sec + OFFSET);
+		execve("/proc/self/exe", cargv, cenv);
+		return pr_perror("execve");
+	}
+
+	if (waitpid(pid, &status, 0) != pid)
+		return pr_perror("waitpid");
+
+	if (status)
+		ksft_exit_fail();
+	ksft_inc_pass_cnt();
+	ksft_test_result_pass("wait for child\n");
+
+	// Check that we are still in the source timens.
+	if (check("parent after vfork", &now))
+		return 1;
+
+	ksft_exit_pass();
+	return 0;
+}
-- 
2.37.3.968.ga6b4b080e4-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] fs/exec: switch timens when a task gets a new mm
  2022-09-21  0:31 [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Andrei Vagin
  2022-09-21  0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin
@ 2022-09-22  3:29 ` Kees Cook
  2022-09-23  1:47   ` Andrei Vagin
  2022-10-18  7:21 ` Kees Cook
  2 siblings, 1 reply; 10+ messages in thread
From: Kees Cook @ 2022-09-22  3:29 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: linux-kernel, Andrei Vagin, Alexey Izbyshev, Christian Brauner,
	Dmitry Safonov, Eric W. Biederman, Florian Weimer

On Tue, Sep 20, 2022 at 05:31:19PM -0700, Andrei Vagin wrote:
> From: Andrei Vagin <avagin@gmail.com>
> 
> Changing a time namespace requires remapping a vvar page, so we don't want
> to allow doing that if any other tasks can use the same mm.
> 
> Currently, we install a time namespace when a task is created with a new
> vm. exec() is another case when a task gets a new mm and so it can switch
> a time namespace safely, but it isn't handled now.
> 
> One more issue of the current interface is that clone() with CLONE_VM isn't
> allowed if the current task has unshared a time namespace
> (timens_for_children doesn't match the current timens).
> 
> Both these issues make some inconvenience for users. For example, Alexey
> and Florian reported that posix_spawn() uses vfork+exec and this pattern
> doesn't work with time namespaces due to the both described issues.
> LXC needed to workaround the exec() issue by calling setns.
> 
> In the commit 133e2d3e81de5 ("fs/exec: allow to unshare a time namespace on
> vfork+exec"), we tried to fix these issues with minimal impact on UAPI. But
> it adds extra complexity and some undesirable side effects. Eric suggested
> fixing the issues properly because here are all the reasons to suppose that
> there are no users that depend on the old behavior.
> 
> Cc: Alexey Izbyshev <izbyshev@ispras.ru>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Dmitry Safonov <0x7f454c46@gmail.com>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Kees Cook <keescook@chromium.org>
> Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Origin-author: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Andrei Vagin <avagin@gmail.com>

This looks good -- my intention is for this to go into -next after the
v6.1 merge window closes. Does that match everyone's expectations?

Thanks!

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] fs/exec: switch timens when a task gets a new mm
  2022-09-22  3:29 ` [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Kees Cook
@ 2022-09-23  1:47   ` Andrei Vagin
  2022-09-24  5:02     ` Kees Cook
  0 siblings, 1 reply; 10+ messages in thread
From: Andrei Vagin @ 2022-09-23  1:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrei Vagin, linux-kernel, Alexey Izbyshev, Christian Brauner,
	Dmitry Safonov, Eric W. Biederman, Florian Weimer

On Wed, Sep 21, 2022 at 8:29 PM Kees Cook <keescook@chromium.org> wrote:
...
>
> This looks good -- my intention is for this to go into -next after the
> v6.1 merge window closes. Does that match everyone's expectations?
>

This works for me.

If everything goes well, it will be merged to v6.2. Do I understand this right?

Thanks,
Andrei

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] fs/exec: switch timens when a task gets a new mm
  2022-09-23  1:47   ` Andrei Vagin
@ 2022-09-24  5:02     ` Kees Cook
  0 siblings, 0 replies; 10+ messages in thread
From: Kees Cook @ 2022-09-24  5:02 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Andrei Vagin, linux-kernel, Alexey Izbyshev, Christian Brauner,
	Dmitry Safonov, Eric W. Biederman, Florian Weimer

On Thu, Sep 22, 2022 at 06:47:46PM -0700, Andrei Vagin wrote:
> On Wed, Sep 21, 2022 at 8:29 PM Kees Cook <keescook@chromium.org> wrote:
> ...
> >
> > This looks good -- my intention is for this to go into -next after the
> > v6.1 merge window closes. Does that match everyone's expectations?
> >
> 
> This works for me.
> 
> If everything goes well, it will be merged to v6.2. Do I understand this right?

Right, that's what I'm expecting. :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] selftests/timens: add a test for vfork+exit
  2022-09-21  0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin
@ 2022-10-09 16:10   ` Alexey Izbyshev
  2022-10-13 17:46     ` Andrei Vagin
  2022-10-13 17:31   ` [PATCH 2/2 v2] " Andrei Vagin
  1 sibling, 1 reply; 10+ messages in thread
From: Alexey Izbyshev @ 2022-10-09 16:10 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Kees Cook, linux-kernel, Andrei Vagin, Christian Brauner,
	Dmitry Safonov, Eric W. Biederman, Florian Weimer

On 2022-09-21 03:31, Andrei Vagin wrote:
> From: Andrei Vagin <avagin@gmail.com>
> 
> * check that a child process is in parent's time namespace after vfork.
> * check that a child process is in the target namespace after exec.
> 
> Output on success:
>  1..5
>  ok 1 parent before vfork
>  ok 2 child before exec
>  ok 3 child after exec
>  ok 4 wait for child
>  ok 5 parent after vfork
>  # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
> 
> Signed-off-by: Andrei Vagin <avagin@gmail.com>
> ---
>  tools/testing/selftests/timens/.gitignore   |   1 +
>  tools/testing/selftests/timens/Makefile     |   2 +-
>  tools/testing/selftests/timens/vfork_exec.c | 132 ++++++++++++++++++++
>  3 files changed, 134 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/timens/vfork_exec.c
> 
> diff --git a/tools/testing/selftests/timens/.gitignore
> b/tools/testing/selftests/timens/.gitignore
> index fe1eb8271b35..cae8dca0fbff 100644
> --- a/tools/testing/selftests/timens/.gitignore
> +++ b/tools/testing/selftests/timens/.gitignore
> @@ -8,3 +8,4 @@ procfs
>  timens
>  timer
>  timerfd
> +vfork_exec
> diff --git a/tools/testing/selftests/timens/Makefile
> b/tools/testing/selftests/timens/Makefile
> index 3a5936cc10ab..f0d51d4d2c87 100644
> --- a/tools/testing/selftests/timens/Makefile
> +++ b/tools/testing/selftests/timens/Makefile
> @@ -1,4 +1,4 @@
> -TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec 
> futex
> +TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec
> futex vfork_exec
>  TEST_GEN_PROGS_EXTENDED := gettime_perf
> 
>  CFLAGS := -Wall -Werror -pthread
> diff --git a/tools/testing/selftests/timens/vfork_exec.c
> b/tools/testing/selftests/timens/vfork_exec.c
> new file mode 100644
> index 000000000000..9fd8a64d25a9
> --- /dev/null
> +++ b/tools/testing/selftests/timens/vfork_exec.c
> @@ -0,0 +1,132 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <sched.h>
> +#include <stdio.h>
> +#include <stdbool.h>
> +#include <sys/stat.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <time.h>
> +#include <unistd.h>
> +#include <string.h>
> +#include <pthread.h>
> +
> +#include "log.h"
> +#include "timens.h"
> +
> +#define OFFSET (36000)
> +
> +static void *tcheck(void *arg)
> +{
> +	struct timespec *now = arg, tst;
> +	int i;
> +
> +	for (i = 0; i < 2; i++) {
> +		_gettime(CLOCK_MONOTONIC, &tst, i);
> +		if (abs(tst.tv_sec - now->tv_sec) > 5) {
> +			pr_fail("thread: unexpected value: %ld (%ld)\n",
> +				tst.tv_sec, now->tv_sec);
> +			return (void *)1UL;
> +		}
> +	}
> +	return NULL;
> +}
> +
> +static int check_in_thread(struct timespec *now)
> +{
> +	pthread_t th;
> +	void *retval;
> +
> +	if (pthread_create(&th, NULL, tcheck, now))
> +		return pr_perror("thread");
> +	if (pthread_join(th, &retval))
> +		return pr_perror("pthread_join");
> +	return !(retval == NULL);
> +}
> +
> +static int check(char *tst_name, struct timespec *now)
> +{
> +	struct timespec tst;
> +	int i;
> +
> +	for (i = 0; i < 2; i++) {
> +		_gettime(CLOCK_MONOTONIC, &tst, i);
> +		if (abs(tst.tv_sec - now->tv_sec) > 5)
> +			return pr_fail("%s: unexpected value: %ld (%ld)\n",
> +					tst.tv_sec, now->tv_sec);

There is a missing argument for "%s" in pr_fail(). I'm actually 
surprised that there is no __attribute__((format)) on 
ksft_test_result_fail() that would allow GCC to catch this.

> +	}
> +	if (check_in_thread(now))
> +		return 1;
> +	ksft_test_result_pass("%s\n", tst_name);
> +	return 0;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	struct timespec now;
> +	int status;
> +	pid_t pid;
> +
> +	if (argc > 1) {
> +		char *endptr;
> +
> +		ksft_cnt.ksft_pass = 2;
> +		now.tv_sec = strtoul(argv[1], &endptr, 0);
> +		if (*endptr != 0)
> +			return pr_perror("strtoul");
> +
> +		return check("child after exec", &now);
> +	}
> +
> +	nscheck();
> +
> +	ksft_set_plan(5);
> +
> +	clock_gettime(CLOCK_MONOTONIC, &now);
> +
> +	if (unshare_timens())
> +		return 1;
> +
> +	if (_settime(CLOCK_MONOTONIC, OFFSET))
> +		return 1;
> +
> +	if (check("parent before vfork", &now))
> +		return 1;
> +
> +	pid = vfork();
> +	if (pid < 0)
> +		return pr_perror("fork");
> +
> +	if (pid == 0) {
> +		char now_str[64];
> +		char *cargv[] = {"exec", now_str, NULL};
> +		char *cenv[] = {NULL};
> +
> +		// Check that we are still in the source timens.
> +		if (check("child before exec", &now))
> +			return 1;

I know this is just a test, but...

Creating threads in a vfork()-child is quite dangerous (like most other 
things that touch the libc state, which is shared with the parent 
process). Here it works probably only because pthread_create() followed 
by pthread_join() restores everything into more-or-less the original 
state before returning control to the parent, but this is something that 
libcs don't guarantee and that can break at any moment.

Also, returning from a vfork()-child is explicitly forbidden by the 
vfork() contract because the parent would then return to an invalid 
stack frame that could be arbitrarily clobbered by code executed in the 
child after main() returned. Moreover, if I'm not mistaken, on x86 with 
Intel CET-enabled glibc (assuming the support for CET is ever merged 
into the kernel) such return would cause the parent to always trap 
because the shadow stack will become inconsistent with the normal stack. 
Instead, _exit() should be used here...

> +
> +		/* Check for proper vvar offsets after execve. */
> +		snprintf(now_str, sizeof(now_str), "%ld", now.tv_sec + OFFSET);
> +		execve("/proc/self/exe", cargv, cenv);
> +		return pr_perror("execve");

...and here.

> +	}
> +
> +	if (waitpid(pid, &status, 0) != pid)
> +		return pr_perror("waitpid");
> +
> +	if (status)
> +		ksft_exit_fail();
> +	ksft_inc_pass_cnt();
> +	ksft_test_result_pass("wait for child\n");
> +
> +	// Check that we are still in the source timens.
> +	if (check("parent after vfork", &now))
> +		return 1;
> +
> +	ksft_exit_pass();
> +	return 0;
> +}

Otherwise, both patches look good to me, thanks!

Sorry for being late,
Alexey

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2 v2] selftests/timens: add a test for vfork+exit
  2022-09-21  0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin
  2022-10-09 16:10   ` Alexey Izbyshev
@ 2022-10-13 17:31   ` Andrei Vagin
  1 sibling, 0 replies; 10+ messages in thread
From: Andrei Vagin @ 2022-10-13 17:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Alexey Izbyshev, Christian Brauner, Dmitry Safonov,
	Eric W. Biederman, Florian Weimer, Andrei Vagin

From: Andrei Vagin <avagin@gmail.com>

* check that a child process is in parent's time namespace after vfork.
* check that a child process is in the target namespace after exec.

Output on success:
 1..4
 ok 1 parent before vfork
 ok 2 child after exec
 ok 3 wait for child
 ok 4 parent after vfork
 # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Andrei Vagin <avagin@gmail.com>
---

v2: - don't create threads from a vfork-ed process.
    - use _exit to exit from a vfork-ed process.

 tools/testing/selftests/timens/.gitignore   |   1 +
 tools/testing/selftests/timens/Makefile     |   2 +-
 tools/testing/selftests/timens/vfork_exec.c | 139 ++++++++++++++++++++
 3 files changed, 141 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/timens/vfork_exec.c

diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore
index fe1eb8271b35..cae8dca0fbff 100644
--- a/tools/testing/selftests/timens/.gitignore
+++ b/tools/testing/selftests/timens/.gitignore
@@ -8,3 +8,4 @@ procfs
 timens
 timer
 timerfd
+vfork_exec
diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
index 3a5936cc10ab..f0d51d4d2c87 100644
--- a/tools/testing/selftests/timens/Makefile
+++ b/tools/testing/selftests/timens/Makefile
@@ -1,4 +1,4 @@
-TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex
+TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs exec futex vfork_exec
 TEST_GEN_PROGS_EXTENDED := gettime_perf
 
 CFLAGS := -Wall -Werror -pthread
diff --git a/tools/testing/selftests/timens/vfork_exec.c b/tools/testing/selftests/timens/vfork_exec.c
new file mode 100644
index 000000000000..fe3d0e15aa7e
--- /dev/null
+++ b/tools/testing/selftests/timens/vfork_exec.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <time.h>
+#include <unistd.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "log.h"
+#include "timens.h"
+
+#define OFFSET (36000)
+
+struct thread_args {
+	char *tst_name;
+	struct timespec *now;
+};
+
+static void *tcheck(void *_args)
+{
+	struct thread_args *args = _args;
+	struct timespec *now = args->now, tst;
+	int i;
+
+	for (i = 0; i < 2; i++) {
+		_gettime(CLOCK_MONOTONIC, &tst, i);
+		if (abs(tst.tv_sec - now->tv_sec) > 5) {
+			pr_fail("%s: in-thread: unexpected value: %ld (%ld)\n",
+				args->tst_name, tst.tv_sec, now->tv_sec);
+			return (void *)1UL;
+		}
+	}
+	return NULL;
+}
+
+static int check_in_thread(char *tst_name, struct timespec *now)
+{
+	struct thread_args args = {
+		.tst_name = tst_name,
+		.now = now,
+	};
+	pthread_t th;
+	void *retval;
+
+	if (pthread_create(&th, NULL, tcheck, &args))
+		return pr_perror("thread");
+	if (pthread_join(th, &retval))
+		return pr_perror("pthread_join");
+	return !(retval == NULL);
+}
+
+static int check(char *tst_name, struct timespec *now)
+{
+	struct timespec tst;
+	int i;
+
+	for (i = 0; i < 2; i++) {
+		_gettime(CLOCK_MONOTONIC, &tst, i);
+		if (abs(tst.tv_sec - now->tv_sec) > 5)
+			return pr_fail("%s: unexpected value: %ld (%ld)\n",
+					tst_name, tst.tv_sec, now->tv_sec);
+	}
+	if (check_in_thread(tst_name, now))
+		return 1;
+	ksft_test_result_pass("%s\n", tst_name);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	struct timespec now;
+	int status;
+	pid_t pid;
+
+	if (argc > 1) {
+		char *endptr;
+
+		ksft_cnt.ksft_pass = 1;
+		now.tv_sec = strtoul(argv[1], &endptr, 0);
+		if (*endptr != 0)
+			return pr_perror("strtoul");
+
+		return check("child after exec", &now);
+	}
+
+	nscheck();
+
+	ksft_set_plan(4);
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+
+	if (unshare_timens())
+		return 1;
+
+	if (_settime(CLOCK_MONOTONIC, OFFSET))
+		return 1;
+
+	if (check("parent before vfork", &now))
+		return 1;
+
+	pid = vfork();
+	if (pid < 0)
+		return pr_perror("fork");
+
+	if (pid == 0) {
+		char now_str[64];
+		char *cargv[] = {"exec", now_str, NULL};
+		char *cenv[] = {NULL};
+
+		/* Check for proper vvar offsets after execve. */
+		snprintf(now_str, sizeof(now_str), "%ld", now.tv_sec + OFFSET);
+		execve("/proc/self/exe", cargv, cenv);
+		pr_perror("execve");
+		_exit(1);
+	}
+
+	if (waitpid(pid, &status, 0) != pid)
+		return pr_perror("waitpid");
+
+	if (status)
+		ksft_exit_fail();
+	ksft_inc_pass_cnt();
+	ksft_test_result_pass("wait for child\n");
+
+	/* Check that we are still in the source timens. */
+	if (check("parent after vfork", &now))
+		return 1;
+
+	ksft_exit_pass();
+	return 0;
+}
-- 
2.38.0.413.g74048e4d9e-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] selftests/timens: add a test for vfork+exit
  2022-10-09 16:10   ` Alexey Izbyshev
@ 2022-10-13 17:46     ` Andrei Vagin
  2022-10-13 22:15       ` Alexey Izbyshev
  0 siblings, 1 reply; 10+ messages in thread
From: Andrei Vagin @ 2022-10-13 17:46 UTC (permalink / raw)
  To: Alexey Izbyshev
  Cc: Kees Cook, linux-kernel, Andrei Vagin, Christian Brauner,
	Dmitry Safonov, Eric W. Biederman, Florian Weimer

On Sun, Oct 9, 2022 at 9:10 AM Alexey Izbyshev <izbyshev@ispras.ru> wrote:
>
> On 2022-09-21 03:31, Andrei Vagin wrote:
> > From: Andrei Vagin <avagin@gmail.com>

<snip>

> > +     if (pid == 0) {
> > +             char now_str[64];
> > +             char *cargv[] = {"exec", now_str, NULL};
> > +             char *cenv[] = {NULL};
> > +
> > +             // Check that we are still in the source timens.
> > +             if (check("child before exec", &now))
> > +                     return 1;
>
> I know this is just a test, but...
>
> Creating threads in a vfork()-child is quite dangerous (like most other
> things that touch the libc state, which is shared with the parent
> process). Here it works probably only because pthread_create() followed
> by pthread_join() restores everything into more-or-less the original
> state before returning control to the parent, but this is something that
> libcs don't guarantee and that can break at any moment.
>
> Also, returning from a vfork()-child is explicitly forbidden by the
> vfork() contract because the parent would then return to an invalid
> stack frame that could be arbitrarily clobbered by code executed in the
> child after main() returned. Moreover, if I'm not mistaken, on x86 with
> Intel CET-enabled glibc (assuming the support for CET is ever merged
> into the kernel) such return would cause the parent to always trap
> because the shadow stack will become inconsistent with the normal stack.
> Instead, _exit() should be used here...
>

Hi Alexey,

You are right, it isn't a good idea to create threads from the vfork-ed
process. Now, vfork isn't a special case in the kernel code, so I think
we can just remove the check() call from here. I have sent an updated
version of this patch, pls take a look at it.

Thanks,
Andrei

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] selftests/timens: add a test for vfork+exit
  2022-10-13 17:46     ` Andrei Vagin
@ 2022-10-13 22:15       ` Alexey Izbyshev
  0 siblings, 0 replies; 10+ messages in thread
From: Alexey Izbyshev @ 2022-10-13 22:15 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Kees Cook, linux-kernel, Andrei Vagin, Christian Brauner,
	Dmitry Safonov, Eric W. Biederman, Florian Weimer

On 2022-10-13 20:46, Andrei Vagin wrote:
> On Sun, Oct 9, 2022 at 9:10 AM Alexey Izbyshev <izbyshev@ispras.ru> 
> wrote:
>> 
>> On 2022-09-21 03:31, Andrei Vagin wrote:
>> > From: Andrei Vagin <avagin@gmail.com>
> 
> <snip>
> 
>> > +     if (pid == 0) {
>> > +             char now_str[64];
>> > +             char *cargv[] = {"exec", now_str, NULL};
>> > +             char *cenv[] = {NULL};
>> > +
>> > +             // Check that we are still in the source timens.
>> > +             if (check("child before exec", &now))
>> > +                     return 1;
>> 
>> I know this is just a test, but...
>> 
>> Creating threads in a vfork()-child is quite dangerous (like most 
>> other
>> things that touch the libc state, which is shared with the parent
>> process). Here it works probably only because pthread_create() 
>> followed
>> by pthread_join() restores everything into more-or-less the original
>> state before returning control to the parent, but this is something 
>> that
>> libcs don't guarantee and that can break at any moment.
>> 
>> Also, returning from a vfork()-child is explicitly forbidden by the
>> vfork() contract because the parent would then return to an invalid
>> stack frame that could be arbitrarily clobbered by code executed in 
>> the
>> child after main() returned. Moreover, if I'm not mistaken, on x86 
>> with
>> Intel CET-enabled glibc (assuming the support for CET is ever merged
>> into the kernel) such return would cause the parent to always trap
>> because the shadow stack will become inconsistent with the normal 
>> stack.
>> Instead, _exit() should be used here...
>> 
> 
> Hi Alexey,
> 
> You are right, it isn't a good idea to create threads from the vfork-ed
> process. Now, vfork isn't a special case in the kernel code, so I think
> we can just remove the check() call from here. I have sent an updated
> version of this patch, pls take a look at it.
> 
Hi, Andrei,

While I think you could just skip check_in_thread() in the vfork()-child 
instead of removing check() completely (the rest of the code in check() 
is unlikely to mess up the libc state), I agree that the test is still 
able to catch problems unconditionally affecting all CLONE_VM tasks 
thanks to check_in_thread() in the parent, so I don't see much point in 
holding it up further. Your v2 patch looks good enough to me, thanks!

Alexey

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] fs/exec: switch timens when a task gets a new mm
  2022-09-21  0:31 [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Andrei Vagin
  2022-09-21  0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin
  2022-09-22  3:29 ` [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Kees Cook
@ 2022-10-18  7:21 ` Kees Cook
  2 siblings, 0 replies; 10+ messages in thread
From: Kees Cook @ 2022-10-18  7:21 UTC (permalink / raw)
  To: avagin
  Cc: Kees Cook, fweimer, linux-kernel, ebiederm, izbyshev, avagin,
	0x7f454c46, brauner

On Tue, 20 Sep 2022 17:31:19 -0700, Andrei Vagin wrote:
> From: Andrei Vagin <avagin@gmail.com>
> 
> Changing a time namespace requires remapping a vvar page, so we don't want
> to allow doing that if any other tasks can use the same mm.
> 
> Currently, we install a time namespace when a task is created with a new
> vm. exec() is another case when a task gets a new mm and so it can switch
> a time namespace safely, but it isn't handled now.
> 
> [...]

Applied to for-next/execve, thanks!

[1/2] fs/exec: switch timens when a task gets a new mm
      https://git.kernel.org/kees/c/f8e06046dd69
[2/2] selftests/timens: add a test for vfork+exit
      https://git.kernel.org/kees/c/de8517320d01

-- 
Kees Cook


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-10-18  7:21 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-21  0:31 [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Andrei Vagin
2022-09-21  0:31 ` [PATCH 2/2] selftests/timens: add a test for vfork+exit Andrei Vagin
2022-10-09 16:10   ` Alexey Izbyshev
2022-10-13 17:46     ` Andrei Vagin
2022-10-13 22:15       ` Alexey Izbyshev
2022-10-13 17:31   ` [PATCH 2/2 v2] " Andrei Vagin
2022-09-22  3:29 ` [PATCH 1/2] fs/exec: switch timens when a task gets a new mm Kees Cook
2022-09-23  1:47   ` Andrei Vagin
2022-09-24  5:02     ` Kees Cook
2022-10-18  7:21 ` Kees Cook

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.