All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] unshare: allow persisting namespaces
@ 2015-04-09 11:22 Karel Zak
  2015-04-09 11:22 ` [PATCH 2/2] unshare: allow persisting mount namespaces Karel Zak
  0 siblings, 1 reply; 10+ messages in thread
From: Karel Zak @ 2015-04-09 11:22 UTC (permalink / raw)
  To: util-linux; +Cc: Eric W. Biederman, Karel Zak

For nsenter(1) we already support namespace specification by file
(e.g. bind mount to namespace /proc/[pid]/ns/[type] file). For
example:

  # nsenter --uts=/some/path

This patch extends unshare(1) to setup the bind mount for specified
namespace, for example

  # touch /some/path
  # unshare --uts=/some/path hostname FOO
  # nsenter --uts=/some/path hostname
  FOO

Note that the problem is mount namespace, because create bind mount
to ns/mount file within unshared namespace does not make sense.

Based on patch from Lubomir Rintel <lkundrak@v3.sk>.

Signed-off-by: Karel Zak <kzak@redhat.com>
---
 sys-utils/unshare.1 | 70 +++++++++++++++++++++++++++++-----------
 sys-utils/unshare.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 132 insertions(+), 31 deletions(-)

diff --git a/sys-utils/unshare.1 b/sys-utils/unshare.1
index 6fc71f4..14755e7 100644
--- a/sys-utils/unshare.1
+++ b/sys-utils/unshare.1
@@ -8,8 +8,17 @@ unshare \- run program with some namespaces unshared from parent
 .RI [ arguments ]
 .SH DESCRIPTION
 Unshares the indicated namespaces from the parent process and then executes
-the specified \fIprogram\fR.  The namespaces to be unshared are indicated via
-options.  Unshareable namespaces are:
+the specified \fIprogram\fR.
+.PP
+The namespaces can optionally be persisted by bind mounting /proc/[pid]/ns/[type] files
+to a filesystem path and entered with
+.BR nsenter (1)
+even after \fIprogram\fR terminates.
+Once a persistent namespace is no longer needed it can be unpersisted with
+.BR umount (8).
+See EXAMPLES section for more details.
+.PP
+The namespaces to be unshared are indicated via options.  Unshareable namespaces are:
 .TP
 .BR "mount namespace"
 Mounting and unmounting filesystems will not affect the rest of the system
@@ -47,24 +56,29 @@ The process will have a distinct set of UIDs, GIDs and capabilities.
 See \fBclone\fR(2) for the exact semantics of the flags.
 .SH OPTIONS
 .TP
-.BR \-i , " \-\-ipc"
-Unshare the IPC namespace.
+.BR \-i , " \-\-ipc"[=\fIfile\fP]
+Unshare the IPC namespace. If \fIfile\fP is specified then persistent namespace is created
+by bind mount.
 .TP
-.BR \-m , " \-\-mount"
-Unshare the mount namespace.
+.BR \-m , " \-\-mount"[=\fIfile\fP]
+Unshare the mount namespace. If \fIfile\fP is specified then persistent namespace is created
+by bind mount.
 .TP
-.BR \-n , " \-\-net"
-Unshare the network namespace.
+.BR \-n , " \-\-net"[=\fIfile\fP]
+Unshare the network namespace. If \fIfile\fP is specified then persistent namespace is created
+by bind mount.
 .TP
-.BR \-p , " \-\-pid"
-Unshare the pid namespace.
-See also the \fB--fork\fP and \fB--mount-proc\fP options.
+.BR \-p , " \-\-pid"[=\fIfile\fP]
+Unshare the pid namespace. If \fIfile\fP is specified then persistent namespace is created
+by bind mount. See also the \fB--fork\fP and \fB--mount-proc\fP options.
 .TP
-.BR \-u , " \-\-uts"
-Unshare the UTS namespace.
+.BR \-u , " \-\-uts"[=\fIfile\fP]
+Unshare the UTS namespace. If \fIfile\fP is specified then persistent namespace is created
+by bind mount.
 .TP
-.BR \-U , " \-\-user"
-Unshare the user namespace.
+.BR \-U , " \-\-user"[=\fIfile\fP]
+Unshare the user namespace. If \fIfile\fP is specified then persistent namespace is created
+by bind mount.
 .TP
 .BR \-f , " \-\-fork"
 Fork the specified \fIprogram\fR as a child process of \fBunshare\fR rather than
@@ -125,14 +139,32 @@ procfs instance.
 root
 .br
 Establish a user namespace as an unprivileged user with a root user within it.
+.TP
+.TQ
+.B # touch /root/uts-ns
+.TQ
+.B # unshare --uts=/root/uts-ns hostanme FOO
+.TQ
+.B # nsenter --uts=/root/uts-ns hostname
+.TQ
+FOO
+.TQ
+.B # umount /root/uts-ns
+.br
+Establish a persistent UTS namespace, modify hostname. The namespace maybe later entered
+by nsenter. The namespace is destroyed by umount the bind reference.
 .SH SEE ALSO
 .BR unshare (2),
 .BR clone (2),
 .BR mount (8)
-.SH BUGS
-None known so far.
-.SH AUTHOR
-Mikhail Gusarov <dottedmag@dottedmag.net>
+.SH AUTHORS
+.UR dottedmag@dottedmag.net
+Mikhail Gusarov
+.UE
+.br
+.UR kzak@redhat.com
+Karel Zak
+.UE
 .SH AVAILABILITY
 The unshare command is part of the util-linux package and is available from
 ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
index 18a7c7b..65d3e61 100644
--- a/sys-utils/unshare.c
+++ b/sys-utils/unshare.c
@@ -42,6 +42,24 @@
 /* 'private' is kernel default */
 #define UNSHARE_PROPAGATION_DEFAULT	(MS_REC | MS_PRIVATE)
 
+/* /proc namespace files and mountpoints for binds */
+static struct namespace_file {
+	int		type;		/* CLONE_NEW* */
+	const char	*name;		/* ns/<type> */
+	const char	*target;	/* user specified target for bind mount */
+} namespace_files[] = {
+	{ .type = CLONE_NEWUSER, .name = "ns/user" },
+	{ .type = CLONE_NEWIPC,  .name = "ns/ipc"  },
+	{ .type = CLONE_NEWUTS,  .name = "ns/uts"  },
+	{ .type = CLONE_NEWNET,  .name = "ns/net"  },
+	{ .type = CLONE_NEWPID,  .name = "ns/pid"  },
+	{ .type = CLONE_NEWNS,   .name = "ns/mnt"  },
+	{ .name = NULL }
+};
+
+static int npersists;	/* number of persistent namespaces */
+
+
 enum {
 	SETGROUPS_NONE = -1,
 	SETGROUPS_DENY = 0,
@@ -133,6 +151,40 @@ static void set_propagation(unsigned long flags)
 		err(EXIT_FAILURE, _("cannot change root filesystem propagation"));
 }
 
+
+static int set_ns_target(int type, const char *path)
+{
+	struct namespace_file *ns;
+
+	for (ns = namespace_files; ns->name; ns++) {
+		if (ns->type != type)
+			continue;
+		ns->target = path;
+		npersists++;
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+static int bind_ns_files(pid_t pid)
+{
+	struct namespace_file *ns;
+	char src[PATH_MAX];
+
+	for (ns = namespace_files; ns->name; ns++) {
+		if (!ns->target)
+			continue;
+
+		snprintf(src, sizeof(src), "/proc/%u/%s", (unsigned) pid, ns->name);
+
+		if (mount(src, ns->target, NULL, MS_BIND, NULL) != 0)
+			err(EXIT_FAILURE, _("mount %s on %s failed"), src, ns->target);
+	}
+
+	return 0;
+}
+
 static void usage(int status)
 {
 	FILE *out = status == EXIT_SUCCESS ? stdout : stderr;
@@ -145,12 +197,12 @@ static void usage(int status)
 	fputs(_("Run a program with some namespaces unshared from the parent.\n"), out);
 
 	fputs(USAGE_OPTIONS, out);
-	fputs(_(" -m, --mount               unshare mounts namespace\n"), out);
-	fputs(_(" -u, --uts                 unshare UTS namespace (hostname etc)\n"), out);
-	fputs(_(" -i, --ipc                 unshare System V IPC namespace\n"), out);
-	fputs(_(" -n, --net                 unshare network namespace\n"), out);
-	fputs(_(" -p, --pid                 unshare pid namespace\n"), out);
-	fputs(_(" -U, --user                unshare user namespace\n"), out);
+	fputs(_(" -m, --mount[=<file>]      unshare mounts namespace\n"), out);
+	fputs(_(" -u, --uts[=<file>]        unshare UTS namespace (hostname etc)\n"), out);
+	fputs(_(" -i, --ipc[=<file>]        unshare System V IPC namespace\n"), out);
+	fputs(_(" -n, --net[=<file>]        unshare network namespace\n"), out);
+	fputs(_(" -p, --pid[=<file>]        unshare pid namespace\n"), out);
+	fputs(_(" -U, --user[=<file>]       unshare user namespace\n"), out);
 	fputs(_(" -f, --fork                fork before launching <program>\n"), out);
 	fputs(_("     --mount-proc[=<dir>]  mount proc filesystem first (implies --mount)\n"), out);
 	fputs(_(" -r, --map-root-user       map current user to root (implies --user)\n"), out);
@@ -176,12 +228,14 @@ int main(int argc, char *argv[])
 	static const struct option longopts[] = {
 		{ "help", no_argument, 0, 'h' },
 		{ "version", no_argument, 0, 'V'},
-		{ "mount", no_argument, 0, 'm' },
-		{ "uts", no_argument, 0, 'u' },
-		{ "ipc", no_argument, 0, 'i' },
-		{ "net", no_argument, 0, 'n' },
-		{ "pid", no_argument, 0, 'p' },
-		{ "user", no_argument, 0, 'U' },
+
+		{ "mount", optional_argument, 0, 'm' },
+		{ "uts",   optional_argument, 0, 'u' },
+		{ "ipc",   optional_argument, 0, 'i' },
+		{ "net",   optional_argument, 0, 'n' },
+		{ "pid",   optional_argument, 0, 'p' },
+		{ "user",  optional_argument, 0, 'U' },
+
 		{ "fork", no_argument, 0, 'f' },
 		{ "mount-proc", optional_argument, 0, OPT_MOUNTPROC },
 		{ "map-root-user", no_argument, 0, 'r' },
@@ -215,21 +269,33 @@ int main(int argc, char *argv[])
 			return EXIT_SUCCESS;
 		case 'm':
 			unshare_flags |= CLONE_NEWNS;
+			if (optarg)
+				set_ns_target(CLONE_NEWNS, optarg);
 			break;
 		case 'u':
 			unshare_flags |= CLONE_NEWUTS;
+			if (optarg)
+				set_ns_target(CLONE_NEWUTS, optarg);
 			break;
 		case 'i':
 			unshare_flags |= CLONE_NEWIPC;
+			if (optarg)
+				set_ns_target(CLONE_NEWIPC, optarg);
 			break;
 		case 'n':
 			unshare_flags |= CLONE_NEWNET;
+			if (optarg)
+				set_ns_target(CLONE_NEWNET, optarg);
 			break;
 		case 'p':
 			unshare_flags |= CLONE_NEWPID;
+			if (optarg)
+				set_ns_target(CLONE_NEWPID, optarg);
 			break;
 		case 'U':
 			unshare_flags |= CLONE_NEWUSER;
+			if (optarg)
+				set_ns_target(CLONE_NEWUSER, optarg);
 			break;
 		case OPT_MOUNTPROC:
 			unshare_flags |= CLONE_NEWNS;
@@ -273,6 +339,9 @@ int main(int argc, char *argv[])
 		}
 	}
 
+	if (npersists)
+		bind_ns_files(getpid());
+
 	if (maproot) {
 		if (setgrpcmd == SETGROUPS_ALLOW)
 			errx(EXIT_FAILURE, _("options --setgroups=allow and "
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] unshare: allow persisting mount namespaces
  2015-04-09 11:22 [PATCH 1/2] unshare: allow persisting namespaces Karel Zak
@ 2015-04-09 11:22 ` Karel Zak
  2015-04-09 17:07   ` Eric W. Biederman
  0 siblings, 1 reply; 10+ messages in thread
From: Karel Zak @ 2015-04-09 11:22 UTC (permalink / raw)
  To: util-linux; +Cc: Eric W. Biederman, Karel Zak

We can create a reference (bind mount) to the new namespace after
unshare(2), but it does not make sense to do it within unshared
namespace. (And if I read kernel fs/namespace.c: do_loopback()
correctly than copy mount bind mounts of /proc/<pid>/ns/mnt between
namespaces is unsupported.)

This patch bypass this problem by fork() where parent continue as
usually (call unshare(2), setup another things, etc.), but child
waits for /proc/[ppid]/ns/mnt inode number change (the ino is
changed after parent's unshare(2)) and then it bind mounts the new
namespaces and exit.

Signed-off-by: Karel Zak <kzak@redhat.com>
---
 sys-utils/unshare.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 65 insertions(+), 4 deletions(-)

diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
index 65d3e61..4ed9ad3 100644
--- a/sys-utils/unshare.c
+++ b/sys-utils/unshare.c
@@ -27,6 +27,10 @@
 #include <sys/wait.h>
 #include <sys/mount.h>
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 /* we only need some defines missing in sys/mount.h, no libmount linkage */
 #include <libmount.h>
 
@@ -185,6 +189,43 @@ static int bind_ns_files(pid_t pid)
 	return 0;
 }
 
+static ino_t get_mnt_ino(pid_t pid)
+{
+	struct stat st;
+	char path[PATH_MAX];
+
+	snprintf(path, sizeof(path), "/proc/%u/ns/mnt", (unsigned) pid);
+
+	if (stat(path, &st) != 0)
+		err(EXIT_FAILURE, _("cannot stat %s"), path);
+	return st.st_ino;
+}
+
+static void bind_ns_files_from_child(pid_t *child)
+{
+	pid_t ppid = getpid();
+	ino_t ino = get_mnt_ino(ppid);
+
+	*child = fork();
+
+	switch(*child) {
+	case -1:
+		err(EXIT_FAILURE, _("fork failed"));
+	case 0:	/* child */
+		do {
+			/* wait until parent unshare() */
+			ino_t new_ino = get_mnt_ino(ppid);
+			if (ino != new_ino)
+				break;
+		} while (1);
+		bind_ns_files(ppid);
+		exit(EXIT_SUCCESS);
+		break;
+	default: /* parent */
+		break;
+	}
+}
+
 static void usage(int status)
 {
 	FILE *out = status == EXIT_SUCCESS ? stdout : stderr;
@@ -248,6 +289,8 @@ int main(int argc, char *argv[])
 	int unshare_flags = 0;
 	int c, forkit = 0, maproot = 0;
 	const char *procmnt = NULL;
+	pid_t pid = 0;
+	int status;
 	unsigned long propagation = UNSHARE_PROPAGATION_DEFAULT;
 	uid_t real_euid = geteuid();
 	gid_t real_egid = getegid();;
@@ -316,12 +359,32 @@ int main(int argc, char *argv[])
 		}
 	}
 
+	if (npersists && (unshare_flags & CLONE_NEWNS))
+		bind_ns_files_from_child(&pid);
+
 	if (-1 == unshare(unshare_flags))
 		err(EXIT_FAILURE, _("unshare failed"));
 
+	if (npersists) {
+		if (pid && (unshare_flags & CLONE_NEWNS)) {
+			/* wait for bind_ns_files_from_child() */
+			do {
+				int rc = waitpid(pid, &status, 0);
+				if (rc < 0 && errno == EINTR)
+					continue;
+				else if (rc < 0)
+					err(EXIT_FAILURE, _("waitpid failed"));
+			} while (0);
+
+			if (WIFEXITED(status) && WEXITSTATUS(status) != EXIT_SUCCESS)
+				return WEXITSTATUS(status);
+		} else
+			/* simple way, just bind */
+			bind_ns_files(getpid());
+	}
+
 	if (forkit) {
-		int status;
-		pid_t pid = fork();
+		pid = fork();
 
 		switch(pid) {
 		case -1:
@@ -339,8 +402,6 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	if (npersists)
-		bind_ns_files(getpid());
 
 	if (maproot) {
 		if (setgrpcmd == SETGROUPS_ALLOW)
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] unshare: allow persisting mount namespaces
  2015-04-09 11:22 ` [PATCH 2/2] unshare: allow persisting mount namespaces Karel Zak
@ 2015-04-09 17:07   ` Eric W. Biederman
  2015-04-10  8:17     ` Karel Zak
  0 siblings, 1 reply; 10+ messages in thread
From: Eric W. Biederman @ 2015-04-09 17:07 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

Karel Zak <kzak@redhat.com> writes:

> We can create a reference (bind mount) to the new namespace after
> unshare(2), but it does not make sense to do it within unshared
> namespace. (And if I read kernel fs/namespace.c: do_loopback()
> correctly than copy mount bind mounts of /proc/<pid>/ns/mnt between
> namespaces is unsupported.)

The support is limited.  mount namespaces are only allowed to be bound
into older mount namespaces.

But yes it is much more interesting to have the namespaces visible in
the parent than in the newly created mount namespace.

Your do { } while(0); loop below concerns me.  I think continue and
break are equivalent in that construct.

> This patch bypass this problem by fork() where parent continue as
> usually (call unshare(2), setup another things, etc.), but child
> waits for /proc/[ppid]/ns/mnt inode number change (the ino is
> changed after parent's unshare(2)) and then it bind mounts the new
> namespaces and exit.
>
> Signed-off-by: Karel Zak <kzak@redhat.com>
> ---
>  sys-utils/unshare.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 65 insertions(+), 4 deletions(-)
>
> diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
> index 65d3e61..4ed9ad3 100644
> --- a/sys-utils/unshare.c
> +++ b/sys-utils/unshare.c
> @@ -27,6 +27,10 @@
>  #include <sys/wait.h>
>  #include <sys/mount.h>
>  
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +
>  /* we only need some defines missing in sys/mount.h, no libmount linkage */
>  #include <libmount.h>
>  
> @@ -185,6 +189,43 @@ static int bind_ns_files(pid_t pid)
>  	return 0;
>  }
>  
> +static ino_t get_mnt_ino(pid_t pid)
> +{
> +	struct stat st;
> +	char path[PATH_MAX];
> +
> +	snprintf(path, sizeof(path), "/proc/%u/ns/mnt", (unsigned) pid);
> +
> +	if (stat(path, &st) != 0)
> +		err(EXIT_FAILURE, _("cannot stat %s"), path);
> +	return st.st_ino;
> +}
> +
> +static void bind_ns_files_from_child(pid_t *child)
> +{
> +	pid_t ppid = getpid();
> +	ino_t ino = get_mnt_ino(ppid);
> +
> +	*child = fork();
> +
> +	switch(*child) {
> +	case -1:
> +		err(EXIT_FAILURE, _("fork failed"));
> +	case 0:	/* child */
> +		do {
> +			/* wait until parent unshare() */
> +			ino_t new_ino = get_mnt_ino(ppid);
> +			if (ino != new_ino)
> +				break;
> +		} while (1);
> +		bind_ns_files(ppid);
> +		exit(EXIT_SUCCESS);
> +		break;
> +	default: /* parent */
> +		break;
> +	}
> +}
> +
>  static void usage(int status)
>  {
>  	FILE *out = status == EXIT_SUCCESS ? stdout : stderr;
> @@ -248,6 +289,8 @@ int main(int argc, char *argv[])
>  	int unshare_flags = 0;
>  	int c, forkit = 0, maproot = 0;
>  	const char *procmnt = NULL;
> +	pid_t pid = 0;
> +	int status;
>  	unsigned long propagation = UNSHARE_PROPAGATION_DEFAULT;
>  	uid_t real_euid = geteuid();
>  	gid_t real_egid = getegid();;
> @@ -316,12 +359,32 @@ int main(int argc, char *argv[])
>  		}
>  	}
>  
> +	if (npersists && (unshare_flags & CLONE_NEWNS))
> +		bind_ns_files_from_child(&pid);
> +
>  	if (-1 == unshare(unshare_flags))
>  		err(EXIT_FAILURE, _("unshare failed"));
>  
> +	if (npersists) {
> +		if (pid && (unshare_flags & CLONE_NEWNS)) {
> +			/* wait for bind_ns_files_from_child() */
> +			do {
> +				int rc = waitpid(pid, &status, 0);
> +				if (rc < 0 && errno == EINTR)
> +					continue;
> +				else if (rc < 0)
> +					err(EXIT_FAILURE, _("waitpid failed"));
> +			} while (0);
> +
> +			if (WIFEXITED(status) && WEXITSTATUS(status) != EXIT_SUCCESS)
> +				return WEXITSTATUS(status);
> +		} else
> +			/* simple way, just bind */
> +			bind_ns_files(getpid());
> +	}
> +
>  	if (forkit) {
> -		int status;
> -		pid_t pid = fork();
> +		pid = fork();
>  
>  		switch(pid) {
>  		case -1:
> @@ -339,8 +402,6 @@ int main(int argc, char *argv[])
>  		}
>  	}
>  
> -	if (npersists)
> -		bind_ns_files(getpid());
>  
>  	if (maproot) {
>  		if (setgrpcmd == SETGROUPS_ALLOW)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] unshare: allow persisting mount namespaces
  2015-04-09 17:07   ` Eric W. Biederman
@ 2015-04-10  8:17     ` Karel Zak
  2016-01-30  3:52       ` Yuriy M. Kaminskiy
  0 siblings, 1 reply; 10+ messages in thread
From: Karel Zak @ 2015-04-10  8:17 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: util-linux

On Thu, Apr 09, 2015 at 12:07:09PM -0500, Eric W. Biederman wrote:
> Your do { } while(0); loop below concerns me.  I think continue and
> break are equivalent in that construct.

Ah.. sorry, stupid copy & past from another code; below is updated
version. Thanks for review.

    Karel


>From d7a3cfb3bcd315620aca136a5811d4bbc5cbe77d Mon Sep 17 00:00:00 2001
From: Karel Zak <kzak@redhat.com>
Date: Thu, 9 Apr 2015 11:48:07 +0200
Subject: [PATCH] unshare: allow persisting mount namespaces

We can create a reference (bind mount) to the new namespace after
unshare(2), but it does not make sense to do it within unshared
namespace. (And if I read kernel fs/namespace.c: do_loopback()
correctly than copy mount bind mounts of /proc/<pid>/ns/mnt between
namespaces is unsupported.)

This patch bypass this problem by fork() where parent continue as
usually (call unshare(2), setup another things, etc.), but child
waits for /proc/[ppid]/ns/mnt inode number change (the ino is
changed after parent's unshare(2)) and then it bind mounts the new
namespaces and exit.

Signed-off-by: Karel Zak <kzak@redhat.com>
---
 sys-utils/unshare.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 68 insertions(+), 4 deletions(-)

diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
index 65d3e61..460d149 100644
--- a/sys-utils/unshare.c
+++ b/sys-utils/unshare.c
@@ -27,6 +27,10 @@
 #include <sys/wait.h>
 #include <sys/mount.h>
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 /* we only need some defines missing in sys/mount.h, no libmount linkage */
 #include <libmount.h>
 
@@ -185,6 +189,43 @@ static int bind_ns_files(pid_t pid)
 	return 0;
 }
 
+static ino_t get_mnt_ino(pid_t pid)
+{
+	struct stat st;
+	char path[PATH_MAX];
+
+	snprintf(path, sizeof(path), "/proc/%u/ns/mnt", (unsigned) pid);
+
+	if (stat(path, &st) != 0)
+		err(EXIT_FAILURE, _("cannot stat %s"), path);
+	return st.st_ino;
+}
+
+static void bind_ns_files_from_child(pid_t *child)
+{
+	pid_t ppid = getpid();
+	ino_t ino = get_mnt_ino(ppid);
+
+	*child = fork();
+
+	switch(*child) {
+	case -1:
+		err(EXIT_FAILURE, _("fork failed"));
+	case 0:	/* child */
+		do {
+			/* wait until parent unshare() */
+			ino_t new_ino = get_mnt_ino(ppid);
+			if (ino != new_ino)
+				break;
+		} while (1);
+		bind_ns_files(ppid);
+		exit(EXIT_SUCCESS);
+		break;
+	default: /* parent */
+		break;
+	}
+}
+
 static void usage(int status)
 {
 	FILE *out = status == EXIT_SUCCESS ? stdout : stderr;
@@ -248,6 +289,8 @@ int main(int argc, char *argv[])
 	int unshare_flags = 0;
 	int c, forkit = 0, maproot = 0;
 	const char *procmnt = NULL;
+	pid_t pid = 0;
+	int status;
 	unsigned long propagation = UNSHARE_PROPAGATION_DEFAULT;
 	uid_t real_euid = geteuid();
 	gid_t real_egid = getegid();;
@@ -316,12 +359,35 @@ int main(int argc, char *argv[])
 		}
 	}
 
+	if (npersists && (unshare_flags & CLONE_NEWNS))
+		bind_ns_files_from_child(&pid);
+
 	if (-1 == unshare(unshare_flags))
 		err(EXIT_FAILURE, _("unshare failed"));
 
+	if (npersists) {
+		if (pid && (unshare_flags & CLONE_NEWNS)) {
+			/* wait for bind_ns_files_from_child() */
+			int rc;
+
+			do {
+				rc = waitpid(pid, &status, 0);
+				if (rc < 0) {
+					if (errno == EINTR)
+						continue;
+					err(EXIT_FAILURE, _("waitpid failed"));
+				}
+				if (WIFEXITED(status) &&
+				    WEXITSTATUS(status) != EXIT_SUCCESS)
+					return WEXITSTATUS(status);
+			} while (rc < 0);
+		} else
+			/* simple way, just bind */
+			bind_ns_files(getpid());
+	}
+
 	if (forkit) {
-		int status;
-		pid_t pid = fork();
+		pid = fork();
 
 		switch(pid) {
 		case -1:
@@ -339,8 +405,6 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	if (npersists)
-		bind_ns_files(getpid());
 
 	if (maproot) {
 		if (setgrpcmd == SETGROUPS_ALLOW)
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] unshare: allow persisting mount namespaces
  2015-04-10  8:17     ` Karel Zak
@ 2016-01-30  3:52       ` Yuriy M. Kaminskiy
  2016-01-30 13:31         ` Yuriy M. Kaminskiy
  0 siblings, 1 reply; 10+ messages in thread
From: Yuriy M. Kaminskiy @ 2016-01-30  3:52 UTC (permalink / raw)
  To: util-linux

Karel Zak <kzak@redhat.com> writes:

(this was commited as c84f2590dfdb8570beeb731e0105f8fe597443d1)

> +static void bind_ns_files_from_child(pid_t *child)
> +{
> +	pid_t ppid = getpid();
> +	ino_t ino = get_mnt_ino(ppid);
> +
> +	*child = fork();
> +
> +	switch(*child) {
> +	case -1:
> +		err(EXIT_FAILURE, _("fork failed"));
> +	case 0:	/* child */
> +		do {
> +			/* wait until parent unshare() */
> +			ino_t new_ino = get_mnt_ino(ppid);
> +			if (ino != new_ino)
> +				break;
> +		} while (1);

Racing? Suppose, parent died (e.g. unshare(2) failed), (parent) process
was reaped, new (unrelated) process was created with same pid, as a
result this function will bind namespaces from wrong process. (That
said, pretty unlikely, and no idea how to properly fix it).

> +		bind_ns_files(ppid);
> +		exit(EXIT_SUCCESS);
> +		break;
> +	default: /* parent */
> +		break;
> +	}
> +}
> +
>  static void usage(int status)
>  {
>  	FILE *out = status == EXIT_SUCCESS ? stdout : stderr;
> @@ -248,6 +289,8 @@ int main(int argc, char *argv[])
>  	int unshare_flags = 0;
>  	int c, forkit = 0, maproot = 0;
>  	const char *procmnt = NULL;
> +	pid_t pid = 0;
> +	int status;
>  	unsigned long propagation = UNSHARE_PROPAGATION_DEFAULT;
>  	uid_t real_euid = geteuid();
>  	gid_t real_egid = getegid();;
> @@ -316,12 +359,35 @@ int main(int argc, char *argv[])
>  		}
>  	}
>  
> +	if (npersists && (unshare_flags & CLONE_NEWNS))
> +		bind_ns_files_from_child(&pid);
> +
>  	if (-1 == unshare(unshare_flags))
>  		err(EXIT_FAILURE, _("unshare failed"));
>  
> +	if (npersists) {
> +		if (pid && (unshare_flags & CLONE_NEWNS)) {
> +			/* wait for bind_ns_files_from_child() */
> +			int rc;
> +
> +			do {
> +				rc = waitpid(pid, &status, 0);
> +				if (rc < 0) {
> +					if (errno == EINTR)
> +						continue;
> +					err(EXIT_FAILURE, _("waitpid failed"));
> +				}
> +				if (WIFEXITED(status) &&
> +				    WEXITSTATUS(status) != EXIT_SUCCESS)
> +					return WEXITSTATUS(status);
> +			} while (rc < 0);
> +		} else
> +			/* simple way, just bind */
> +			bind_ns_files(getpid());
> +	}
> +
>  	if (forkit) {
> -		int status;
> -		pid_t pid = fork();
> +		pid = fork();
>  
>  		switch(pid) {
>  		case -1:
> @@ -339,8 +405,6 @@ int main(int argc, char *argv[])
>  		}
>  	}
>  
> -	if (npersists)
> -		bind_ns_files(getpid());
>  
>  	if (maproot) {
>  		if (setgrpcmd == SETGROUPS_ALLOW)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] unshare: allow persisting mount namespaces
  2016-01-30  3:52       ` Yuriy M. Kaminskiy
@ 2016-01-30 13:31         ` Yuriy M. Kaminskiy
  2016-02-01 10:41           ` Karel Zak
  2016-02-17 13:07           ` Karel Zak
  0 siblings, 2 replies; 10+ messages in thread
From: Yuriy M. Kaminskiy @ 2016-01-30 13:31 UTC (permalink / raw)
  To: util-linux

[-- Attachment #1: Type: text/plain, Size: 1461 bytes --]

yumkam@gmail.com (Yuriy M. Kaminskiy)
writes:

> Karel Zak <kzak@redhat.com> writes:
>
> (this was commited as c84f2590dfdb8570beeb731e0105f8fe597443d1)
>
>> +static void bind_ns_files_from_child(pid_t *child)
>> +{
>> +	pid_t ppid = getpid();
>> +	ino_t ino = get_mnt_ino(ppid);
>> +
>> +	*child = fork();
>> +
>> +	switch(*child) {
>> +	case -1:
>> +		err(EXIT_FAILURE, _("fork failed"));
>> +	case 0:	/* child */
>> +		do {
>> +			/* wait until parent unshare() */
>> +			ino_t new_ino = get_mnt_ino(ppid);
>> +			if (ino != new_ino)
>> +				break;
>> +		} while (1);
>
> Racing? Suppose, parent died (e.g. unshare(2) failed), (parent) process
> was reaped, new (unrelated) process was created with same pid, as a
> result this function will bind namespaces from wrong process.

... besides, this is a busyloop, if "imposer process" is in same mnt
namespace as original process, it will occupy CPU forever (and this *is*
easy to fix; see attached; note that with attached patch race probability is
reduced, but not totally avoided, there are still window between
read(pipe) and mount() calls [but at least it won't busyloop, and won't
trigger racing on unshare(2) error]).

P.S. that said, for me, on debian's 3.16.7-ckt* kernel, `touch foo && unshare
--mount=foo` still fails with EINVAL (with either unpatched git master
or with this patch), so consider this as "compile-tested only".

> (That said, pretty unlikely, and no idea how to properly fix it).
>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-unshare-fix-busyloop-and-reduce-racing-probability.patch --]
[-- Type: text/x-diff, Size: 2898 bytes --]

>From 64d1e9d905455b8177d65ad58ca2b403bbe8222c Mon Sep 17 00:00:00 2001
From: "Yuriy M. Kaminskiy" <yumkam@gmail.com>
Date: Sat, 30 Jan 2016 16:18:39 +0300
Subject: [PATCH] unshare: fix busyloop and reduce racing probability

Replace busy-loop with waiting on pipe from parent.

Note: reduces racing probability, but still there are window where
it is possible (if parent unshare process will be [externally] killed
between successful read(fds[0]) and mount() calls).

Signed-off-by: Yuriy M. Kaminskiy <yumkam@gmail.com>
---
 sys-utils/unshare.c | 49 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 40 insertions(+), 9 deletions(-)

diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
index 7482a8b..bf4f6d2 100644
--- a/sys-utils/unshare.c
+++ b/sys-utils/unshare.c
@@ -199,27 +199,50 @@ static ino_t get_mnt_ino(pid_t pid)
 	return st.st_ino;
 }
 
-static void bind_ns_files_from_child(pid_t *child)
+static void bind_ns_files_from_child(pid_t *child, int fds[2])
 {
 	pid_t ppid = getpid();
 	ino_t ino = get_mnt_ino(ppid);
 
+	if (pipe(fds) < 0)
+		err(EXIT_FAILURE, _("pipe failed"));
+
 	*child = fork();
 
 	switch(*child) {
 	case -1:
 		err(EXIT_FAILURE, _("fork failed"));
 	case 0:	/* child */
-		do {
-			/* wait until parent unshare() */
-			ino_t new_ino = get_mnt_ino(ppid);
-			if (ino != new_ino)
-				break;
-		} while (1);
+		close(fds[1]);
+		fds[1] = -1;
+		/* wait for parent */
+		for (;;) {
+			char ch = -1;
+
+			switch(read(fds[0], &ch, 1)) {
+			case -1:
+				if (errno == EINTR)
+					continue;
+				/* fallthrough */
+			default:
+				err(EXIT_FAILURE, _("pipe read error"));
+			case 0: /* parent died? */
+				exit(EXIT_FAILURE);
+			case 1:
+				if (ch != 0)
+					exit(EXIT_FAILURE);
+				/* break loop*/;
+			}
+			break;
+		}
+		if (get_mnt_ino(ppid) == ino)
+			exit(EXIT_FAILURE);
 		bind_ns_files(ppid);
 		exit(EXIT_SUCCESS);
 		break;
 	default: /* parent */
+		close(fds[0]);
+		fds[0] = -1;
 		break;
 	}
 }
@@ -288,6 +311,7 @@ int main(int argc, char *argv[])
 	int c, forkit = 0, maproot = 0;
 	const char *procmnt = NULL;
 	pid_t pid = 0;
+	int fds[2];
 	int status;
 	unsigned long propagation = UNSHARE_PROPAGATION_DEFAULT;
 	uid_t real_euid = geteuid();
@@ -358,16 +382,23 @@ int main(int argc, char *argv[])
 	}
 
 	if (npersists && (unshare_flags & CLONE_NEWNS))
-		bind_ns_files_from_child(&pid);
+		bind_ns_files_from_child(&pid, fds);
 
 	if (-1 == unshare(unshare_flags))
 		err(EXIT_FAILURE, _("unshare failed"));
 
 	if (npersists) {
 		if (pid && (unshare_flags & CLONE_NEWNS)) {
-			/* wait for bind_ns_files_from_child() */
 			int rc;
+			char ch = 0;
+
+			/* signal child we are ready */
+			while (write(fds[1], &ch, 1) == -1 && errno == EINTR)
+				;
+			close(fds[1]);
+			fds[1] = -1;
 
+			/* wait for bind_ns_files_from_child() */
 			do {
 				rc = waitpid(pid, &status, 0);
 				if (rc < 0) {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] unshare: allow persisting mount namespaces
  2016-01-30 13:31         ` Yuriy M. Kaminskiy
@ 2016-02-01 10:41           ` Karel Zak
  2016-02-01 14:31             ` Yuriy M. Kaminskiy
  2016-02-17 13:07           ` Karel Zak
  1 sibling, 1 reply; 10+ messages in thread
From: Karel Zak @ 2016-02-01 10:41 UTC (permalink / raw)
  To: Yuriy M. Kaminskiy; +Cc: util-linux

On Sat, Jan 30, 2016 at 04:31:06PM +0300, Yuriy M. Kaminskiy wrote:
> >> +	case 0:	/* child */
> >> +		do {
> >> +			/* wait until parent unshare() */
> >> +			ino_t new_ino = get_mnt_ino(ppid);
> >> +			if (ino != new_ino)
> >> +				break;
> >> +		} while (1);
> >
> > Racing? Suppose, parent died (e.g. unshare(2) failed), (parent) process
> > was reaped, new (unrelated) process was created with same pid, as a
> > result this function will bind namespaces from wrong process.
> 
> ... besides, this is a busyloop, if "imposer process" is in same mnt
> namespace as original process, it will occupy CPU forever (and this *is*

Would be possible to use any lightweight solution rather than
pipe()+read/write(), for example use sigtimedwait() in child and
kill() in parent?

The ideal solution would be to have /proc/self/ns/<name> files poll()-able.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] unshare: allow persisting mount namespaces
  2016-02-01 10:41           ` Karel Zak
@ 2016-02-01 14:31             ` Yuriy M. Kaminskiy
  2016-02-02 10:14               ` Karel Zak
  0 siblings, 1 reply; 10+ messages in thread
From: Yuriy M. Kaminskiy @ 2016-02-01 14:31 UTC (permalink / raw)
  To: util-linux

Karel Zak <kzak@redhat.com> writes:

> On Sat, Jan 30, 2016 at 04:31:06PM +0300, Yuriy M. Kaminskiy wrote:
>> >> +	case 0:	/* child */
>> >> +		do {
>> >> +			/* wait until parent unshare() */
>> >> +			ino_t new_ino = get_mnt_ino(ppid);
>> >> +			if (ino != new_ino)
>> >> +				break;
>> >> +		} while (1);
>> >
>> > Racing? Suppose, parent died (e.g. unshare(2) failed), (parent) process
>> > was reaped, new (unrelated) process was created with same pid, as a
>> > result this function will bind namespaces from wrong process.
>> 
>> ... besides, this is a busyloop, if "imposer process" is in same mnt
>> namespace as original process, it will occupy CPU forever (and this *is*
>
> Would be possible to use any lightweight solution rather than
> pipe()+read/write(), for example use sigtimedwait() in child and
> kill() in parent?

With my patch, if parent dies, child is automatically awoken (got EOF and
cleanly exit). With signals, it is not.
(Besides, it is not easy to set up without introducing more racing.)

> The ideal solution would be to have /proc/self/ns/<name> files poll()-able.

I'm not sure how it can fit with the way ns/<name> exposed by kernel
currently (symlink to ns inode). (And more racing opportunities too).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] unshare: allow persisting mount namespaces
  2016-02-01 14:31             ` Yuriy M. Kaminskiy
@ 2016-02-02 10:14               ` Karel Zak
  0 siblings, 0 replies; 10+ messages in thread
From: Karel Zak @ 2016-02-02 10:14 UTC (permalink / raw)
  To: Yuriy M. Kaminskiy; +Cc: util-linux

On Mon, Feb 01, 2016 at 05:31:15PM +0300, Yuriy M. Kaminskiy wrote:
> Karel Zak <kzak@redhat.com> writes:
> 
> > On Sat, Jan 30, 2016 at 04:31:06PM +0300, Yuriy M. Kaminskiy wrote:
> >> >> +	case 0:	/* child */
> >> >> +		do {
> >> >> +			/* wait until parent unshare() */
> >> >> +			ino_t new_ino = get_mnt_ino(ppid);
> >> >> +			if (ino != new_ino)
> >> >> +				break;
> >> >> +		} while (1);
> >> >
> >> > Racing? Suppose, parent died (e.g. unshare(2) failed), (parent) process
> >> > was reaped, new (unrelated) process was created with same pid, as a
> >> > result this function will bind namespaces from wrong process.
> >> 
> >> ... besides, this is a busyloop, if "imposer process" is in same mnt
> >> namespace as original process, it will occupy CPU forever (and this *is*
> >
> > Would be possible to use any lightweight solution rather than
> > pipe()+read/write(), for example use sigtimedwait() in child and
> > kill() in parent?
> 
> With my patch, if parent dies, child is automatically awoken (got EOF and
> cleanly exit). With signals, it is not.
> (Besides, it is not easy to set up without introducing more racing.)

OK, fair enough.

  Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] unshare: allow persisting mount namespaces
  2016-01-30 13:31         ` Yuriy M. Kaminskiy
  2016-02-01 10:41           ` Karel Zak
@ 2016-02-17 13:07           ` Karel Zak
  1 sibling, 0 replies; 10+ messages in thread
From: Karel Zak @ 2016-02-17 13:07 UTC (permalink / raw)
  To: Yuriy M. Kaminskiy; +Cc: util-linux

On Sat, Jan 30, 2016 at 04:31:06PM +0300, Yuriy M. Kaminskiy wrote:
> From 64d1e9d905455b8177d65ad58ca2b403bbe8222c Mon Sep 17 00:00:00 2001
> From: "Yuriy M. Kaminskiy" <yumkam@gmail.com>
> Date: Sat, 30 Jan 2016 16:18:39 +0300
> Subject: [PATCH] unshare: fix busyloop and reduce racing probability
> 
> Replace busy-loop with waiting on pipe from parent.
> 
> Note: reduces racing probability, but still there are window where
> it is possible (if parent unshare process will be [externally] killed
> between successful read(fds[0]) and mount() calls).
> 
> Signed-off-by: Yuriy M. Kaminskiy <yumkam@gmail.com>
> ---
>  sys-utils/unshare.c | 49 ++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 40 insertions(+), 9 deletions(-)

 Thanks, applied with small changes. 
 
 I have used stuff from include/all-io.h (loops around read/write)
 to make code more readable.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-02-17 13:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-09 11:22 [PATCH 1/2] unshare: allow persisting namespaces Karel Zak
2015-04-09 11:22 ` [PATCH 2/2] unshare: allow persisting mount namespaces Karel Zak
2015-04-09 17:07   ` Eric W. Biederman
2015-04-10  8:17     ` Karel Zak
2016-01-30  3:52       ` Yuriy M. Kaminskiy
2016-01-30 13:31         ` Yuriy M. Kaminskiy
2016-02-01 10:41           ` Karel Zak
2016-02-01 14:31             ` Yuriy M. Kaminskiy
2016-02-02 10:14               ` Karel Zak
2016-02-17 13:07           ` Karel Zak

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.