linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] nsproxy: support CLONE_NEWTIME with setns()
@ 2020-06-19 15:35 Christian Brauner
  2020-06-19 15:35 ` [PATCH 1/3] timens: make vdso_join_timens() always succeed Christian Brauner
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Christian Brauner @ 2020-06-19 15:35 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, x86
  Cc: Will Deacon, Vincenzo Frascino, Thomas Gleixner, Serge Hallyn,
	Michael Kerrisk, Andy Lutomirski, Catalin Marinas, Mark Rutland,
	Dmitry Safonov, Andrei Vagin, Christian Brauner

Hey,

Until now, setns() lacked time namespace support. This was partly
because it simply hadn't been implemented, but also because
vdso_join_timens() could fail, which made switching to multiple
namespaces atomically problematic. This series first changes
vdso_join_timens() so that it can no longer fail, then introduces
timens_commit(), and finally adds CLONE_NEWTIME support for setns().
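For illustration, here is a minimal userspace sketch of what this enables, assuming a kernel with this series applied and with the pidfd-based setns() interface from the previous cycle. join_time_and_uts() is a hypothetical helper name, and the __NR_pidfd_open fallback value is an assumption for older headers. It attaches the caller to the time and UTS namespaces of a target process in a single setns() call, so either both switches happen or neither does:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080
#endif
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434 /* assumption: generic syscall number */
#endif

/*
 * Hypothetical helper: move the calling process into the time and UTS
 * namespaces of @pid with one setns() call on a pidfd. Returns 0 or
 * -errno. For CLONE_NEWTIME the caller must be single-threaded (else
 * EUSERS) and needs CAP_SYS_ADMIN (else EPERM).
 */
static int join_time_and_uts(pid_t pid)
{
	int pidfd, ret;

	pidfd = (int)syscall(__NR_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -errno;

	/* Both namespaces are validated first, then committed together. */
	ret = setns(pidfd, CLONE_NEWTIME | CLONE_NEWUTS) ? -errno : 0;
	close(pidfd);
	return ret;
}
```

A caller would typically invoke this right after fork() and before starting any threads, since timens_install() rejects multi-threaded callers.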

Please note that arm64 is currently in the process of adding
vdso_join_timens() support (cf. [1]), so it might make sense to split
the vdso_join_timens() change out and route it to mainline as a fix so
that both my series and the arm64 support can be rebased on top of it.
I've Cc'd the relevant people and I'm also replying to the arm64 thread
now.

[1]: https://lore.kernel.org/lkml/20200602180259.76361-1-avagin@gmail.com/

Thanks!
Christian

Christian Brauner (3):
  timens: make vdso_join_timens() always succeed
  timens: add timens_commit() helper
  nsproxy: support CLONE_NEWTIME with setns()

 arch/x86/entry/vdso/vma.c      |  6 ++----
 include/linux/time_namespace.h | 13 +++++++++----
 kernel/nsproxy.c               | 21 +++++++++++++++++++--
 kernel/time/namespace.c        | 22 ++++++++--------------
 4 files changed, 38 insertions(+), 24 deletions(-)


base-commit: b3a9e3b9622ae10064826dccb4f7a52bd88c7407
-- 
2.27.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] timens: make vdso_join_timens() always succeed
  2020-06-19 15:35 [PATCH 0/3] nsproxy: support CLONE_NEWTIME with setns() Christian Brauner
@ 2020-06-19 15:35 ` Christian Brauner
  2020-06-19 15:35 ` [PATCH 2/3] timens: add timens_commit() helper Christian Brauner
  2020-06-19 15:35 ` [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns() Christian Brauner
  2 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2020-06-19 15:35 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, x86
  Cc: Will Deacon, Vincenzo Frascino, Thomas Gleixner, Serge Hallyn,
	Michael Kerrisk, Andy Lutomirski, Catalin Marinas, Mark Rutland,
	Dmitry Safonov, Andrei Vagin, Christian Brauner

As discussed on-list (cf. [1]), in order to make setns() support time
namespaces properly we need to tweak vdso_join_timens() to always succeed.
So switch vdso_join_timens() from mmap_write_lock_killable() to
mmap_write_lock().

Last cycle, setns() was changed to support attaching to multiple
namespaces atomically. This requires all namespaces to have a point of no
return after which they can't fail anymore. Specifically,
<namespace-type>_install() is allowed to perform permission checks and
install the namespace into the new struct nsset it has been given, but it
is not allowed to make visible changes to the affected task. Once
<namespace-type>_install() returns, any additional setup that the given
namespace type requires should ideally be done in a function that can't
fail, or whose failure is not fatal. For time namespaces, the relevant
functions that fall into this category are timens_set_vvar_page() and
vdso_join_timens(). Currently, the latter can fail even though it doesn't
need to. With this change we can go on to implement a timens_commit()
helper in a follow-up patch to be used by setns().
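The install/commit split described above can be modelled in plain userspace C. This is only an analogy: struct task_model, the *_install() functions, commit_all() and setns_model() are hypothetical stand-ins for the task, the per-namespace install hooks and struct nsset, not kernel code. Every fallible check runs while only a staging copy is modified; the final commit cannot fail, which is the property vdso_join_timens() needs:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the namespaces a task is attached to. */
struct task_model {
	int time_ns;
	int uts_ns;
};

/*
 * Phase 1 (fallible): permission check plus recording the change in a
 * staging copy, the way <namespace-type>_install() fills struct nsset.
 * Nothing visible to the task changes here.
 */
static int time_install(struct task_model *staged, int target, bool permitted)
{
	if (!permitted)
		return -EPERM;
	staged->time_ns = target;
	return 0;
}

static int uts_install(struct task_model *staged, int target, bool permitted)
{
	if (!permitted)
		return -EPERM;
	staged->uts_ns = target;
	return 0;
}

/*
 * Phase 2 (infallible): the point of no return. In the kernel this is
 * where work like timens_commit() belongs, so it must not be able to fail.
 */
static void commit_all(struct task_model *task, const struct task_model *staged)
{
	*task = *staged;
}

/* Either every namespace switch happens or none does. */
static int setns_model(struct task_model *task,
		       int time_target, bool time_ok,
		       int uts_target, bool uts_ok)
{
	struct task_model staged = *task;
	int err;

	err = time_install(&staged, time_target, time_ok);
	if (err)
		return err;		/* task untouched */

	err = uts_install(&staged, uts_target, uts_ok);
	if (err)
		return err;		/* task still untouched */

	commit_all(task, &staged);
	return 0;
}
```

If commit_all() could fail halfway, the task would be left with the time namespace switched but not the UTS one, which is exactly the partial state the atomic setns() must avoid.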

[1]: https://lore.kernel.org/lkml/20200611110221.pgd3r5qkjrjmfqa2@wittgenstein
Cc: Will Deacon <will@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
 arch/x86/entry/vdso/vma.c      |  6 ++----
 include/linux/time_namespace.h |  7 +++----
 kernel/time/namespace.c        | 10 ++--------
 3 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index ea7c1f0b79df..be3f542e419c 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -139,13 +139,12 @@ static struct page *find_timens_vvar_page(struct vm_area_struct *vma)
  * corresponding layout.
  * See also the comment near timens_setup_vdso_data() for details.
  */
-int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
+void vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 {
 	struct mm_struct *mm = task->mm;
 	struct vm_area_struct *vma;
 
-	if (mmap_write_lock_killable(mm))
-		return -EINTR;
+	mmap_write_lock(mm);
 
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		unsigned long size = vma->vm_end - vma->vm_start;
@@ -155,7 +154,6 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 	}
 
 	mmap_write_unlock(mm);
-	return 0;
 }
 #else
 static inline struct page *find_timens_vvar_page(struct vm_area_struct *vma)
diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index 824d54e057eb..4d1768c6f836 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -31,8 +31,8 @@ struct time_namespace {
 extern struct time_namespace init_time_ns;
 
 #ifdef CONFIG_TIME_NS
-extern int vdso_join_timens(struct task_struct *task,
-			    struct time_namespace *ns);
+extern void vdso_join_timens(struct task_struct *task,
+			     struct time_namespace *ns);
 
 static inline struct time_namespace *get_time_ns(struct time_namespace *ns)
 {
@@ -90,10 +90,9 @@ static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim)
 }
 
 #else
-static inline int vdso_join_timens(struct task_struct *task,
+static inline void vdso_join_timens(struct task_struct *task,
 				   struct time_namespace *ns)
 {
-	return 0;
 }
 
 static inline struct time_namespace *get_time_ns(struct time_namespace *ns)
diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c
index 5d9fc22d836a..e5af6fe87af8 100644
--- a/kernel/time/namespace.c
+++ b/kernel/time/namespace.c
@@ -284,7 +284,6 @@ static int timens_install(struct nsset *nsset, struct ns_common *new)
 {
 	struct nsproxy *nsproxy = nsset->nsproxy;
 	struct time_namespace *ns = to_time_ns(new);
-	int err;
 
 	if (!current_is_single_threaded())
 		return -EUSERS;
@@ -295,9 +294,7 @@ static int timens_install(struct nsset *nsset, struct ns_common *new)
 
 	timens_set_vvar_page(current, ns);
 
-	err = vdso_join_timens(current, ns);
-	if (err)
-		return err;
+	vdso_join_timens(current, ns);
 
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
@@ -313,7 +310,6 @@ int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk)
 {
 	struct ns_common *nsc = &nsproxy->time_ns_for_children->ns;
 	struct time_namespace *ns = to_time_ns(nsc);
-	int err;
 
 	/* create_new_namespaces() already incremented the ref counter */
 	if (nsproxy->time_ns == nsproxy->time_ns_for_children)
@@ -321,9 +317,7 @@ int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk)
 
 	timens_set_vvar_page(tsk, ns);
 
-	err = vdso_join_timens(tsk, ns);
-	if (err)
-		return err;
+	vdso_join_timens(tsk, ns);
 
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
-- 
2.27.0



* [PATCH 2/3] timens: add timens_commit() helper
  2020-06-19 15:35 [PATCH 0/3] nsproxy: support CLONE_NEWTIME with setns() Christian Brauner
  2020-06-19 15:35 ` [PATCH 1/3] timens: make vdso_join_timens() always succeed Christian Brauner
@ 2020-06-19 15:35 ` Christian Brauner
  2020-06-19 15:35 ` [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns() Christian Brauner
  2 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2020-06-19 15:35 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, x86
  Cc: Will Deacon, Vincenzo Frascino, Thomas Gleixner, Serge Hallyn,
	Michael Kerrisk, Andy Lutomirski, Catalin Marinas, Mark Rutland,
	Dmitry Safonov, Andrei Vagin, Christian Brauner

Wrap the calls to timens_set_vvar_page() and vdso_join_timens() in
timens_on_fork() and timens_install() in a new timens_commit() helper.
We'll use this helper in a follow-up patch in nsproxy too.

Cc: Will Deacon <will@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
 kernel/time/namespace.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c
index e5af6fe87af8..aa7b90aac2a7 100644
--- a/kernel/time/namespace.c
+++ b/kernel/time/namespace.c
@@ -280,6 +280,12 @@ static void timens_put(struct ns_common *ns)
 	put_time_ns(to_time_ns(ns));
 }
 
+static void timens_commit(struct task_struct *tsk, struct time_namespace *ns)
+{
+	timens_set_vvar_page(tsk, ns);
+	vdso_join_timens(tsk, ns);
+}
+
 static int timens_install(struct nsset *nsset, struct ns_common *new)
 {
 	struct nsproxy *nsproxy = nsset->nsproxy;
@@ -292,9 +298,8 @@ static int timens_install(struct nsset *nsset, struct ns_common *new)
 	    !ns_capable(nsset->cred->user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
-	timens_set_vvar_page(current, ns);
 
-	vdso_join_timens(current, ns);
+	timens_commit(current, ns);
 
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
@@ -315,14 +320,12 @@ int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk)
 	if (nsproxy->time_ns == nsproxy->time_ns_for_children)
 		return 0;
 
-	timens_set_vvar_page(tsk, ns);
-
-	vdso_join_timens(tsk, ns);
-
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
 	nsproxy->time_ns = ns;
 
+	timens_commit(tsk, ns);
+
 	return 0;
 }
 
-- 
2.27.0



* [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns()
  2020-06-19 15:35 [PATCH 0/3] nsproxy: support CLONE_NEWTIME with setns() Christian Brauner
  2020-06-19 15:35 ` [PATCH 1/3] timens: make vdso_join_timens() always succeed Christian Brauner
  2020-06-19 15:35 ` [PATCH 2/3] timens: add timens_commit() helper Christian Brauner
@ 2020-06-19 15:35 ` Christian Brauner
  2020-06-23 11:55   ` Christian Brauner
  2020-06-25  9:06   ` Andrei Vagin
  2 siblings, 2 replies; 8+ messages in thread
From: Christian Brauner @ 2020-06-19 15:35 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, x86
  Cc: Will Deacon, Vincenzo Frascino, Thomas Gleixner, Serge Hallyn,
	Michael Kerrisk, Andy Lutomirski, Catalin Marinas, Mark Rutland,
	Dmitry Safonov, Andrei Vagin, Christian Brauner

Until now, setns() lacked time namespace support. This was partly because
it simply hadn't been implemented, but also because vdso_join_timens()
could fail, which made switching to multiple namespaces atomically
problematic. This is now fixed, so support CLONE_NEWTIME with setns().

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
 include/linux/time_namespace.h |  6 ++++++
 kernel/nsproxy.c               | 21 +++++++++++++++++++--
 kernel/time/namespace.c        |  5 +----
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index 4d1768c6f836..d308a3812f79 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -33,6 +33,7 @@ extern struct time_namespace init_time_ns;
 #ifdef CONFIG_TIME_NS
 extern void vdso_join_timens(struct task_struct *task,
 			     struct time_namespace *ns);
+extern void timens_commit(struct task_struct *tsk, struct time_namespace *ns);
 
 static inline struct time_namespace *get_time_ns(struct time_namespace *ns)
 {
@@ -95,6 +96,11 @@ static inline void vdso_join_timens(struct task_struct *task,
 {
 }
 
+static inline void timens_commit(struct task_struct *tsk,
+				 struct time_namespace *ns)
+{
+}
+
 static inline struct time_namespace *get_time_ns(struct time_namespace *ns)
 {
 	return NULL;
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index b03df67621d0..f12231c41b69 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -262,8 +262,8 @@ void exit_task_namespaces(struct task_struct *p)
 static int check_setns_flags(unsigned long flags)
 {
 	if (!flags || (flags & ~(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-				 CLONE_NEWNET | CLONE_NEWUSER | CLONE_NEWPID |
-				 CLONE_NEWCGROUP)))
+				 CLONE_NEWNET | CLONE_NEWTIME | CLONE_NEWUSER |
+				 CLONE_NEWPID | CLONE_NEWCGROUP)))
 		return -EINVAL;
 
 #ifndef CONFIG_USER_NS
@@ -290,6 +290,10 @@ static int check_setns_flags(unsigned long flags)
 	if (flags & CLONE_NEWNET)
 		return -EINVAL;
 #endif
+#ifndef CONFIG_TIME_NS
+	if (flags & CLONE_NEWTIME)
+		return -EINVAL;
+#endif
 
 	return 0;
 }
@@ -464,6 +468,14 @@ static int validate_nsset(struct nsset *nsset, struct pid *pid)
 	}
 #endif
 
+#ifdef CONFIG_TIME_NS
+	if (flags & CLONE_NEWTIME) {
+		ret = validate_ns(nsset, &nsp->time_ns->ns);
+		if (ret)
+			goto out;
+	}
+#endif
+
 out:
 	if (pid_ns)
 		put_pid_ns(pid_ns);
@@ -507,6 +519,11 @@ static void commit_nsset(struct nsset *nsset)
 		exit_sem(me);
 #endif
 
+#ifdef CONFIG_TIME_NS
+	if (flags & CLONE_NEWTIME)
+		timens_commit(me, nsset->nsproxy->time_ns);
+#endif
+
 	/* transfer ownership */
 	switch_task_namespaces(me, nsset->nsproxy);
 	nsset->nsproxy = NULL;
diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c
index aa7b90aac2a7..afc65e6be33e 100644
--- a/kernel/time/namespace.c
+++ b/kernel/time/namespace.c
@@ -280,7 +280,7 @@ static void timens_put(struct ns_common *ns)
 	put_time_ns(to_time_ns(ns));
 }
 
-static void timens_commit(struct task_struct *tsk, struct time_namespace *ns)
+void timens_commit(struct task_struct *tsk, struct time_namespace *ns)
 {
 	timens_set_vvar_page(tsk, ns);
 	vdso_join_timens(tsk, ns);
@@ -298,9 +298,6 @@ static int timens_install(struct nsset *nsset, struct ns_common *new)
 	    !ns_capable(nsset->cred->user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
-
-	timens_commit(current, ns);
-
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
 	nsproxy->time_ns = ns;
-- 
2.27.0



* Re: [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns()
  2020-06-19 15:35 ` [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns() Christian Brauner
@ 2020-06-23 11:55   ` Christian Brauner
  2020-06-25  8:42     ` Andrei Vagin
  2020-06-25  9:06   ` Andrei Vagin
  1 sibling, 1 reply; 8+ messages in thread
From: Christian Brauner @ 2020-06-23 11:55 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel, x86, Dmitry Safonov, Andrei Vagin
  Cc: Will Deacon, Vincenzo Frascino, Thomas Gleixner, Serge Hallyn,
	Michael Kerrisk, Andy Lutomirski, Catalin Marinas, Mark Rutland,
	adrian

On Fri, Jun 19, 2020 at 05:35:59PM +0200, Christian Brauner wrote:
> Until now, setns() lacked time namespace support. This was partly because
> it simply hadn't been implemented, but also because vdso_join_timens()
> could fail, which made switching to multiple namespaces atomically
> problematic. This is now fixed, so support CLONE_NEWTIME with setns().
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Serge Hallyn <serge@hallyn.com>
> Cc: Dmitry Safonov <dima@arista.com>
> Cc: Andrei Vagin <avagin@gmail.com>
> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> ---

Andrei,
Dmitry,

A little off-topic since it's not related to the patch here, but I've been
going through the current time namespace semantics and I just want to
confirm something with you:

Afaict, unshare(CLONE_NEWTIME) currently works similarly to
unshare(CLONE_NEWPID) in that it only changes {pid,time}_for_children
but does _not_ change the {pid,time} namespace of the caller itself.
For pid namespaces that makes a lot of sense, but I'm not completely
clear why you're doing this for time namespaces, especially since the
setns() behavior for CLONE_NEWPID and CLONE_NEWTIME is very different.
Like unshare(CLONE_NEWPID), setns(CLONE_NEWPID) doesn't change the
pid namespace of the caller itself; it only changes it for its
children by setting up pid_for_children. _But_ setns(CLONE_NEWTIME)
changes both the caller's and the children's time namespace, i.e.
unshare(CLONE_NEWTIME) behaves differently from setns(CLONE_NEWTIME). Why?

This also means that the unshare(CLONE_NEWTIME) +
setns(CLONE_NEWTIME) sequence can be used to change the caller's time
namespace. Is this intended?
Here's some code you can use to verify this (please excuse the awful
code I'm using to illustrate this):

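#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080
#endif

/* readlink() does not NUL-terminate its buffer, so do it here. */
static void read_ns(const char *path, char *buf, size_t size)
{
	ssize_t n = readlink(path, buf, size - 1);

	buf[n < 0 ? 0 : n] = '\0';
}

int main(void)
{
	char buf1[4096], buf2[4096];

	if (unshare(CLONE_NEWTIME))
		exit(1);

	int fd = open("/proc/self/ns/time", O_RDONLY);
	if (fd < 0)
		exit(2);

	read_ns("/proc/self/ns/time", buf1, sizeof(buf1));
	read_ns("/proc/self/ns/time_for_children", buf2, sizeof(buf2));
	printf("unshare(CLONE_NEWTIME):		time(%s) ~= time_for_children(%s)\n", buf1, buf2);

	if (setns(fd, CLONE_NEWTIME))
		exit(3);

	read_ns("/proc/self/ns/time", buf1, sizeof(buf1));
	read_ns("/proc/self/ns/time_for_children", buf2, sizeof(buf2));
	printf("setns(self, CLONE_NEWTIME):	time(%s) == time_for_children(%s)\n", buf1, buf2);

	exit(EXIT_SUCCESS);
}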

which gives:

root@f2-vm:/# ./test
unshare(CLONE_NEWTIME):		time(time:[4026531834]) ~= time_for_children(time:[4026532366])
setns(self, CLONE_NEWTIME):	time(time:[4026531834]) == time_for_children(time:[4026531834])

Why is unshare(CLONE_NEWTIME) blocked from changing the caller's time
namespace when setns(CLONE_NEWTIME) is allowed to do this?
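As an aside, comparing readlink() strings works for a quick test, but namespaces(7) documents that two namespace files refer to the same namespace exactly when their device and inode numbers match. A minimal sketch of that check follows; same_ns() is a hypothetical helper, not part of any existing API:

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Return 1 if @a and @b refer to the same object (for nsfs links such as
 * /proc/self/ns/time, the same namespace), 0 if not, or -errno if either
 * stat() fails. Identity is the (st_dev, st_ino) pair.
 */
static int same_ns(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
		return -errno;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Matching the output above, same_ns("/proc/self/ns/time", "/proc/self/ns/time_for_children") would be expected to return 0 right after unshare(CLONE_NEWTIME) and 1 after the setns() call.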

Christian


* Re: [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns()
  2020-06-23 11:55   ` Christian Brauner
@ 2020-06-25  8:42     ` Andrei Vagin
  0 siblings, 0 replies; 8+ messages in thread
From: Andrei Vagin @ 2020-06-25  8:42 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-arm-kernel, linux-kernel, x86, Dmitry Safonov, Will Deacon,
	Vincenzo Frascino, Thomas Gleixner, Serge Hallyn,
	Michael Kerrisk, Andy Lutomirski, Catalin Marinas, Mark Rutland,
	adrian

On Tue, Jun 23, 2020 at 01:55:21PM +0200, Christian Brauner wrote:
> On Fri, Jun 19, 2020 at 05:35:59PM +0200, Christian Brauner wrote:
> > Until now, setns() lacked time namespace support. This was partly because
> > it simply hadn't been implemented, but also because vdso_join_timens()
> > could fail, which made switching to multiple namespaces atomically
> > problematic. This is now fixed, so support CLONE_NEWTIME with setns().
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> > Cc: Serge Hallyn <serge@hallyn.com>
> > Cc: Dmitry Safonov <dima@arista.com>
> > Cc: Andrei Vagin <avagin@gmail.com>
> > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> > ---
> 
> Andrei,
> Dmitry,
> 
> A little off-topic since it's not related to the patch here, but I've been
> going through the current time namespace semantics and I just want to
> confirm something with you:
> 
> Afaict, unshare(CLONE_NEWTIME) currently works similarly to
> unshare(CLONE_NEWPID) in that it only changes {pid,time}_for_children
> but does _not_ change the {pid,time} namespace of the caller itself.
> For pid namespaces that makes a lot of sense, but I'm not completely
> clear why you're doing this for time namespaces, especially since the
> setns() behavior for CLONE_NEWPID and CLONE_NEWTIME is very different.
> Like unshare(CLONE_NEWPID), setns(CLONE_NEWPID) doesn't change the
> pid namespace of the caller itself; it only changes it for its
> children by setting up pid_for_children. _But_ setns(CLONE_NEWTIME)
> changes both the caller's and the children's time namespace, i.e.
> unshare(CLONE_NEWTIME) behaves differently from setns(CLONE_NEWTIME). Why?

This scheme allows clock offsets to be set for a namespace before any
processes appear in it. Offsets cannot be changed once any task has
joined a time namespace. We need this to avoid corner cases with timers,
and so that tasks don't need to be aware of offset changes.

> 
> This also means that the unshare(CLONE_NEWTIME) +
> setns(CLONE_NEWTIME) sequence can be used to change the caller's time
> namespace. Is this intended?
> Here's some code you can use to verify this (please excuse the awful
> code I'm using to illustrate this):
> 
> int main(int argc, char *argv[])
> {
> 	char buf1[4096], buf2[4096];
> 
> 	if (unshare(0x00000080))
> 		exit(1);
> 
> 	int fd = open("/proc/self/ns/time", O_RDONLY);
> 	if (fd < 0)
> 		exit(2);
> 
> 	readlink("/proc/self/ns/time", buf1, sizeof(buf1));
> 	readlink("/proc/self/ns/time_for_children", buf2, sizeof(buf2));
> 	printf("unshare(CLONE_NEWTIME):		time(%s) ~= time_for_children(%s)\n", buf1, buf2);
> 
> 	if (setns(fd, 0x00000080))
> 		exit(3);

And in this example, you use the right sequence of steps: unshare, set
offsets, setns. With clone3, we will be able to do this in one call.
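The unshare, set offsets, setns sequence described here might look like this in C. This is a sketch under assumptions: enter_timens_with_offset() is a hypothetical name, the offset line format follows the time_namespaces(7) man page, and the caller is assumed to be single-threaded, to hold CAP_SYS_ADMIN (plus CAP_SYS_TIME for writing the offsets), and to run on a kernel with this series applied:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080
#endif

/*
 * Hypothetical helper: create a time namespace, shift its CLOCK_MONOTONIC
 * by @mono_secs while it is still empty, then move the caller into it.
 * Returns 0 or -errno. If a step after unshare() fails, the caller's
 * time_for_children stays pointed at the new namespace.
 */
static int enter_timens_with_offset(long long mono_secs)
{
	char line[64];
	ssize_t n;
	int fd, len, ret = 0;

	/* 1. unshare: only time_for_children changes, the caller stays put. */
	if (unshare(CLONE_NEWTIME) < 0)
		return -errno;

	/* 2. set offsets: only allowed while no task has joined the namespace. */
	fd = open("/proc/self/timens_offsets", O_WRONLY | O_CLOEXEC);
	if (fd < 0)
		return -errno;
	len = snprintf(line, sizeof(line), "monotonic %lld 0\n", mono_secs);
	n = write(fd, line, (size_t)len);
	if (n < 0)
		ret = -errno;
	else if (n != len)
		ret = -EIO;
	close(fd);
	if (ret)
		return ret;

	/* 3. setns: join the namespace prepared for our children. */
	fd = open("/proc/self/ns/time_for_children", O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -errno;
	if (setns(fd, CLONE_NEWTIME) < 0)
		ret = -errno;
	close(fd);
	return ret;
}
```

Once the caller has joined, further writes to timens_offsets for that namespace are expected to be rejected, which is the restriction on changing offsets described above.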

> 
> 	readlink("/proc/self/ns/time", buf1, sizeof(buf1));
> 	readlink("/proc/self/ns/time_for_children", buf2, sizeof(buf2));
> 	printf("setns(self, CLONE_NEWTIME):	time(%s) == time_for_children(%s)\n", buf1, buf2);
> 
> 	exit(EXIT_SUCCESS);
> }
> 
> which gives:
> 
> root@f2-vm:/# ./test
> unshare(CLONE_NEWTIME):		time(time:[4026531834]) ~= time_for_children(time:[4026532366])
> setns(self, CLONE_NEWTIME):	time(time:[4026531834]) == time_for_children(time:[4026531834])
> 
> Why is unshare(CLONE_NEWTIME) blocked from changing the caller's time
> namespace when setns(CLONE_NEWTIME) is allowed to do this?
> 
> Christian


* Re: [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns()
  2020-06-19 15:35 ` [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns() Christian Brauner
  2020-06-23 11:55   ` Christian Brauner
@ 2020-06-25  9:06   ` Andrei Vagin
  2020-06-25 12:48     ` Christian Brauner
  1 sibling, 1 reply; 8+ messages in thread
From: Andrei Vagin @ 2020-06-25  9:06 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-arm-kernel, linux-kernel, x86, Will Deacon,
	Vincenzo Frascino, Thomas Gleixner, Serge Hallyn,
	Michael Kerrisk, Andy Lutomirski, Catalin Marinas, Mark Rutland,
	Dmitry Safonov

On Fri, Jun 19, 2020 at 05:35:59PM +0200, Christian Brauner wrote:
> Until now, setns() lacked time namespace support. This was partly because
> it simply hadn't been implemented, but also because vdso_join_timens()
> could fail, which made switching to multiple namespaces atomically
> problematic. This is now fixed, so support CLONE_NEWTIME with setns().
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Serge Hallyn <serge@hallyn.com>
> Cc: Dmitry Safonov <dima@arista.com>
> Cc: Andrei Vagin <avagin@gmail.com>
> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

Hi Christian,

I have reviewed this series and it looks good to me.

We decided not to change the return type of vdso_join_timens() to avoid
conflicts with the arm64 timens patchset. With this change, you can add
my Reviewed-by to all patches in this series.

Reviewed-by: Andrei Vagin <avagin@gmail.com>

Thanks,
Andrei


* Re: [PATCH 3/3] nsproxy: support CLONE_NEWTIME with setns()
  2020-06-25  9:06   ` Andrei Vagin
@ 2020-06-25 12:48     ` Christian Brauner
  0 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2020-06-25 12:48 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: linux-arm-kernel, linux-kernel, x86, Will Deacon,
	Vincenzo Frascino, Thomas Gleixner, Serge Hallyn,
	Michael Kerrisk, Andy Lutomirski, Catalin Marinas, Mark Rutland,
	Dmitry Safonov

On Thu, Jun 25, 2020 at 02:06:18AM -0700, Andrei Vagin wrote:
> On Fri, Jun 19, 2020 at 05:35:59PM +0200, Christian Brauner wrote:
> > Until now, setns() lacked time namespace support. This was partly because
> > it simply hadn't been implemented, but also because vdso_join_timens()
> > could fail, which made switching to multiple namespaces atomically
> > problematic. This is now fixed, so support CLONE_NEWTIME with setns().
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> > Cc: Serge Hallyn <serge@hallyn.com>
> > Cc: Dmitry Safonov <dima@arista.com>
> > Cc: Andrei Vagin <avagin@gmail.com>
> > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> 
> Hi Christian,
> 
> I have reviewed this series and it looks good to me.
> 
> We decided not to change the return type of vdso_join_timens() to avoid
> conflicts with the arm64 timens patchset. With this change, you can add
> my Reviewed-by to all patches in this series.
> 
> Reviewed-by: Andrei Vagin <avagin@gmail.com>

Thanks! As discussed in the thread for the arm64 changes, we'll defer
the return type changes!

Christian
