util-linux.vger.kernel.org archive mirror
* [PATCH] unshare: allow setting up filesystems in the mount namespace
@ 2019-08-15 10:54 Patrick Steinhardt
  2019-08-20 12:51 ` Karel Zak
  0 siblings, 1 reply; 4+ messages in thread
From: Patrick Steinhardt @ 2019-08-15 10:54 UTC
  To: util-linux; +Cc: Patrick Steinhardt

In order to execute commands with the least-possible privileges, it may
be desirable to provide them with a trimmed down filesystem view.
unshare naturally provides the ability to create mount namespaces, but
it doesn't yet offer much in preparing these. For now, a combination of
unshare and nsenter is required to prepare culled filesystem views,
which is kind of unwieldy.

To remedy that, this implements a new option "--mount-fs". As
parameters, one may specify a source filesystem, the destination where
this filesystem shall be mounted, the type of filesystem as well as a
set of options. unshare will then mount it using libmount right before
performing `chroot`, `chdir` and the subsequent `execve`, which allows
for preparing the `chroot` environment without using nsenter at all.
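
As a rough illustration of the argument format described above, the following
sketch splits one "source:target:fstype[:options]" string into its fields. It
is not the code from this patch: the names struct mount_spec and
parse_mount_spec() are hypothetical, and strtok_r() is used here in place of
the strtok() calls in the patch so that the helper keeps no hidden state.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one parsed --mount-fs argument. */
struct mount_spec {
	char *source;
	char *target;
	char *fstype;
	char *options;	/* NULL when the optional fourth field is absent */
};

/*
 * Split "source:target:fstype[:options]" in place.
 * Returns 0 on success, -1 if a mandatory field is missing.
 * The spec buffer is modified and must outlive the result.
 */
int parse_mount_spec(char *spec, struct mount_spec *out)
{
	char *save = NULL;

	assert(spec != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	if ((out->source = strtok_r(spec, ":", &save)) == NULL)
		return -1;
	if ((out->target = strtok_r(NULL, ":", &save)) == NULL)
		return -1;
	if ((out->fstype = strtok_r(NULL, ":", &save)) == NULL)
		return -1;
	out->options = strtok_r(NULL, ":", &save);
	return 0;
}
```

Like strtok(), strtok_r() treats a run of delimiters as a single separator, so
empty fields are silently skipped and paths that contain a colon cannot be
expressed in this format.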

The above is useful in several different cases, for example when one
wants to execute the process in a read-only environment or execute it
with a reduced view of the filesystem.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 sys-utils/Makemodule.am |  2 +-
 sys-utils/unshare.1     | 22 ++++++++++++++++++++
 sys-utils/unshare.c     | 46 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/sys-utils/Makemodule.am b/sys-utils/Makemodule.am
index 1b2277321..639824f53 100644
--- a/sys-utils/Makemodule.am
+++ b/sys-utils/Makemodule.am
@@ -424,7 +424,7 @@ if BUILD_UNSHARE
 usrbin_exec_PROGRAMS += unshare
 dist_man_MANS += sys-utils/unshare.1
 unshare_SOURCES = sys-utils/unshare.c
-unshare_LDADD = $(LDADD) libcommon.la
+unshare_LDADD = $(LDADD) libcommon.la libmount.la
 unshare_CFLAGS = $(AM_CFLAGS) -I$(ul_libmount_incdir)
 
 if HAVE_STATIC_UNSHARE
diff --git a/sys-utils/unshare.1 b/sys-utils/unshare.1
index d2ba6c3a5..ccc830923 100644
--- a/sys-utils/unshare.1
+++ b/sys-utils/unshare.1
@@ -152,6 +152,15 @@ implies creating a new mount namespace since the /proc mount would otherwise
 mess up existing programs on the system.  The new proc filesystem is explicitly
 mounted as private (with MS_PRIVATE|MS_REC).
 .TP
+.BR \-\-mount\-fs = \fIsource\fP : \fItarget\fP : \fIfstype\fP [ :\fIoptions\fP ]
+Just before running the program, mount the filesystem \fIsource\fP of type
+\fIfstype\fP at \fItarget\fP, which will only be visible in the child's
+mount namespace. \fIoptions\fP may be used to specify a comma separated list of
+mount options for the filesystem. This option is processed before \fB--root\fR
+and is thus useful to prepare the chroot environment of the child. Can be passed
+multiple times, in which case mounts will be processed in the given order. This
+option implies \fB--mount\fR.
+.TP
 .BR \-r , " \-\-map\-root\-user"
 Run the program only after the current effective user and group IDs have been mapped to
 the superuser UID and GID in the newly created user namespace.  This makes it possible to
@@ -250,6 +259,19 @@ Establish a persistent mount namespace referenced by the bind mount
 /root/namespaces/mnt.  This example shows a portable solution, because it
 makes sure that the bind mount is created on a shared filesystem.
 .TP
+.B # unshare
+.B    --mount-fs=none:/tmp:tmpfs
+.B    --mount-fs=/bin:/tmp/bin:none:bind,ro,X-mount.mkdir
+.B    --mount-fs=/lib:/tmp/lib:none:bind,ro,X-mount.mkdir
+.B    --mount-fs=/usr/lib:/tmp/usr/lib:none:bind,ro,X-mount.mkdir
+.B    --root=/tmp /bin/ls /
+.TQ
+bin lib usr
+.br
+Establish a tmpfs at /tmp, bind-mount parts of the root filesystem into it
+and execute the process in a chroot. This example shows how processes can be
+spawned with a reduced view of the filesystem.
+.TP
 .B # unshare -pf --kill-child -- bash -c "(sleep 999 &) && sleep 1000" &
 .TQ
 .B # pid=$!
diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
index 21910a4ee..c72f25bdc 100644
--- a/sys-utils/unshare.c
+++ b/sys-utils/unshare.c
@@ -242,6 +242,38 @@ static void bind_ns_files_from_child(pid_t *child, int fds[2])
 	}
 }
 
+static int mnt_ns_filesystems(char **mounts, size_t nmounts)
+{
+	struct libmnt_context *cxt = mnt_new_context();
+	size_t i;
+
+	for (i = 0; i < nmounts; i++) {
+		char *src, *dst, *fstype, *opts;
+
+		if ((src = strtok(mounts[i], ":")) == NULL)
+			errx(EXIT_FAILURE, _("mount-fs missing src"));
+		if ((dst = strtok(NULL, ":")) == NULL)
+			errx(EXIT_FAILURE, _("mount-fs missing dst"));
+		if ((fstype = strtok(NULL, ":")) == NULL)
+			errx(EXIT_FAILURE, _("mount-fs missing fstype"));
+		opts = strtok(NULL, ":");
+
+		mnt_context_set_optsmode(cxt, MNT_OMODE_NOTAB);
+		mnt_context_set_source(cxt, src);
+		mnt_context_set_target(cxt, dst);
+		mnt_context_set_fstype(cxt, fstype);
+		mnt_context_set_options(cxt, opts);
+
+		if (mnt_context_mount(cxt) != 0)
+			err(EXIT_FAILURE, _("mount failed"));
+
+		mnt_reset_context(cxt);
+	}
+
+	mnt_free_context(cxt);
+	return 0;
+}
+
 static void __attribute__((__noreturn__)) usage(void)
 {
 	FILE *out = stdout;
@@ -268,6 +300,8 @@ static void __attribute__((__noreturn__)) usage(void)
 	fputs(_(" --kill-child[=<signame>]  when dying, kill the forked child (implies --fork)\n"
 		"                             defaults to SIGKILL\n"), out);
 	fputs(_(" --mount-proc[=<dir>]      mount proc filesystem first (implies --mount)\n"), out);
+	fputs(_(" --mount-fs <source>:<target>:<fstype>[:<opts>]\n"
+	        "                           mount filesystem (implies --mount)\n"), out);
 	fputs(_(" --propagation slave|shared|private|unchanged\n"
 	        "                           modify mount propagation in mount namespace\n"), out);
 	fputs(_(" --setgroups allow|deny    control the setgroups syscall in user namespaces\n"), out);
@@ -288,6 +322,7 @@ int main(int argc, char *argv[])
 {
 	enum {
 		OPT_MOUNTPROC = CHAR_MAX + 1,
+		OPT_MOUNTFS,
 		OPT_PROPAGATION,
 		OPT_SETGROUPS,
 		OPT_KILLCHILD,
@@ -307,6 +342,7 @@ int main(int argc, char *argv[])
 		{ "fork",          no_argument,       NULL, 'f'             },
 		{ "kill-child",    optional_argument, NULL, OPT_KILLCHILD   },
 		{ "mount-proc",    optional_argument, NULL, OPT_MOUNTPROC   },
+		{ "mount-fs",      required_argument, NULL, OPT_MOUNTFS     },
 		{ "map-root-user", no_argument,       NULL, 'r'             },
 		{ "propagation",   required_argument, NULL, OPT_PROPAGATION },
 		{ "setgroups",     required_argument, NULL, OPT_SETGROUPS   },
@@ -324,6 +360,8 @@ int main(int argc, char *argv[])
 	const char *procmnt = NULL;
 	const char *newroot = NULL;
 	const char *newdir = NULL;
+	char **mounts = NULL;
+	size_t nmounts = 0;
 	pid_t pid = 0;
 	int fds[2];
 	int status;
@@ -381,6 +419,11 @@ int main(int argc, char *argv[])
 			unshare_flags |= CLONE_NEWNS;
 			procmnt = optarg ? optarg : "/proc";
 			break;
+		case OPT_MOUNTFS:
+			mounts = xrealloc(mounts, ++nmounts * sizeof(char *));
+			mounts[nmounts - 1] = optarg;
+			unshare_flags |= CLONE_NEWNS;
+			break;
 		case 'r':
 			unshare_flags |= CLONE_NEWUSER;
 			maproot = 1;
@@ -499,6 +542,9 @@ int main(int argc, char *argv[])
 	if ((unshare_flags & CLONE_NEWNS) && propagation)
 		set_propagation(propagation);
 
+	if (mnt_ns_filesystems(mounts, nmounts) < 0)
+		errx(EXIT_FAILURE, _("mounting namespace filesystems failed"));
+
 	if (newroot) {
 		if (chroot(newroot) != 0)
 			err(EXIT_FAILURE,
-- 
2.22.1



* Re: [PATCH] unshare: allow setting up filesystems in the mount namespace
  2019-08-15 10:54 [PATCH] unshare: allow setting up filesystems in the mount namespace Patrick Steinhardt
@ 2019-08-20 12:51 ` Karel Zak
  2019-08-20 13:09   ` Patrick Steinhardt
  0 siblings, 1 reply; 4+ messages in thread
From: Karel Zak @ 2019-08-20 12:51 UTC
  To: Patrick Steinhardt; +Cc: util-linux, Eric W. Biederman

On Thu, Aug 15, 2019 at 12:54:45PM +0200, Patrick Steinhardt wrote:
> In order to execute commands with the least-possible privileges, it may
> be desirable to provide them with a trimmed down filesystem view.
> unshare naturally provides the ability to create mount namespaces, but
> it doesn't yet offer much in preparing these. For now, a combination of
> unshare and nsenter is required to prepare culled filesystem views,
> which is kind of unwieldy.
> 
> To remedy that, this implements a new option "--mount-fs". As
> parameters, one may specify a source filesystem, the destination where
> this filesystem shall be mounted, the type of filesystem as well as a
> set of options. unshare will then mount it using libmount right before
> performing `chroot`, `chdir` and the subsequent `execve`, which allows
> for preparing the `chroot` environment without using nsenter at all.
>
> The above is useful in several different cases, for example when one
> wants to execute the process in a read-only environment or execute it
> with a reduced view of the filesystem.

I understand your point of view, but this is how unshare(1) will
slowly grow from a simple one-purpose tool into a complex
container/namespace setup tool ;-) I do not have a strong opinion about
it. Maybe your --mount-fs is still basic enough that we can merge it into
unshare(1).

Sounds like we need a discussion about it to gather more opinions :-)
(CC to Eric).

Note that the latest mount(8) has a --namespace option, so you can mount
filesystems in another namespace even if that namespace does not
contain the mount command and the necessary libs.

And note that for systemd-based distros there is systemd-nspawn, which
provides many features (including IPC, hostname, TZ, private users,
...).

> +.B # unshare
> +.B    --mount-fs=none:/tmp:tmpfs
> +.B    --mount-fs=/bin:/tmp/bin:none:bind,ro,X-mount.mkdir
> +.B    --mount-fs=/lib:/tmp/lib:none:bind,ro,X-mount.mkdir
> +.B    --mount-fs=/usr/lib:/tmp/usr/lib:none:bind,ro,X-mount.mkdir
> +.B    --root=/tmp /bin/ls /

libmount also allows mounting all filesystems according to a mount
table stored in a file, so I can imagine an --fstab option ;-)

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: [PATCH] unshare: allow setting up filesystems in the mount namespace
  2019-08-20 12:51 ` Karel Zak
@ 2019-08-20 13:09   ` Patrick Steinhardt
  2019-08-20 15:41     ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Patrick Steinhardt @ 2019-08-20 13:09 UTC
  To: Karel Zak; +Cc: util-linux, Eric W. Biederman


On Tue, Aug 20, 2019 at 02:51:32PM +0200, Karel Zak wrote:
> On Thu, Aug 15, 2019 at 12:54:45PM +0200, Patrick Steinhardt wrote:
> > In order to execute commands with the least-possible privileges, it may
> > be desirable to provide them with a trimmed down filesystem view.
> > unshare naturally provides the ability to create mount namespaces, but
> > it doesn't yet offer much in preparing these. For now, a combination of
> > unshare and nsenter is required to prepare culled filesystem views,
> > which is kind of unwieldy.
> > 
> > To remedy that, this implements a new option "--mount-fs". As
> > parameters, one may specify a source filesystem, the destination where
> > this filesystem shall be mounted, the type of filesystem as well as a
> > set of options. unshare will then mount it using libmount right before
> > performing `chroot`, `chdir` and the subsequent `execve`, which allows
> > for preparing the `chroot` environment without using nsenter at all.
> >
> > The above is useful in several different cases, for example when one
> > wants to execute the process in a read-only environment or execute it
> > with a reduced view of the filesystem.
> 
> I understand your point of view, but this is how unshare(1) will
> slowly grow from a simple one-purpose tool into a complex
> container/namespace setup tool ;-) I do not have a strong opinion about
> it. Maybe your --mount-fs is still basic enough that we can merge it into
> unshare(1).
> 
> Sounds like we need a discussion about it to gather more opinions :-)
> (CC to Eric).

Sounds fair to me. The main motivation I have is that I want to
use unshare(1) as part of runit(8) to spawn supervised processes
in their own namespaces. And using multiple steps to set up
namespaces and spawn the executable makes things a lot more error
prone.

> Note that the latest mount(8) has a --namespace option, so you can mount
> filesystems in another namespace even if that namespace does not
> contain the mount command and the necessary libs.

That would require me to set up persistent namespaces first,
though, while unshare(1) allows me to use transient ones that
disappear as soon as the executable exits.

> And note that for systemd-based distros there is systemd-nspawn, which
> provides many features (including IPC, hostname, TZ, private users,
> ...).

Yeah, I know of that one, but as I'm using runit(8) as PID1
systemd-nspawn(1) is not a viable route, at least as far as I
know. I'm definitely inspired by that tool, though, and would
love to have something similar that is completely agnostic of
what init system is running.

> > +.B # unshare
> > +.B    --mount-fs=none:/tmp:tmpfs
> > +.B    --mount-fs=/bin:/tmp/bin:none:bind,ro,X-mount.mkdir
> > +.B    --mount-fs=/lib:/tmp/lib:none:bind,ro,X-mount.mkdir
> > +.B    --mount-fs=/usr/lib:/tmp/usr/lib:none:bind,ro,X-mount.mkdir
> > +.B    --root=/tmp /bin/ls /
> 
> libmount also allows mounting all filesystems according to a mount
> table stored in a file, so I can imagine an --fstab option ;-)

I thought about exposing parsing of fstab-style lines from
libmount. But I'd definitely be happy to implement an "--fstab"
option instead; that would work perfectly fine for my own use case
and would probably simplify the code quite a bit.
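
To make the "--fstab" idea a little more concrete, here is a minimal sketch
that walks an fstab-style file and hands each entry to a callback. It is only
an illustration under stated assumptions: it uses getmntent(3) from the C
library rather than libmount's own table parsing, which is what a real option
in unshare(1) would presumably use, and the names for_each_fstab_entry() and
print_entry() are hypothetical. It parses entries only and mounts nothing.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>

/*
 * Walk an fstab-style file and report each entry through the callback.
 * Returns the number of entries seen, or -1 if the file cannot be opened.
 */
int for_each_fstab_entry(const char *path,
			 void (*cb)(const struct mntent *ent, void *data),
			 void *data)
{
	FILE *f;
	struct mntent *ent;
	int n = 0;

	assert(path != NULL);

	f = setmntent(path, "r");
	if (f == NULL)
		return -1;

	/* getmntent() skips blank lines and lines starting with '#'. */
	while ((ent = getmntent(f)) != NULL) {
		if (cb != NULL)
			cb(ent, data);
		n++;
	}
	endmntent(f);
	return n;
}

/* Example callback: print what a real implementation would mount. */
void print_entry(const struct mntent *ent, void *data)
{
	(void)data;
	printf("%s on %s type %s (%s)\n",
	       ent->mnt_fsname, ent->mnt_dir, ent->mnt_type, ent->mnt_opts);
}
```

Calling for_each_fstab_entry(path, print_entry, NULL) lists the entries in file
order, which matches the "processed in the given order" behaviour described
for --mount-fs.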

Regards
Patrick



* Re: [PATCH] unshare: allow setting up filesystems in the mount namespace
  2019-08-20 13:09   ` Patrick Steinhardt
@ 2019-08-20 15:41     ` Eric W. Biederman
  0 siblings, 0 replies; 4+ messages in thread
From: Eric W. Biederman @ 2019-08-20 15:41 UTC
  To: Patrick Steinhardt; +Cc: Karel Zak, util-linux

Patrick Steinhardt <ps@pks.im> writes:

> On Tue, Aug 20, 2019 at 02:51:32PM +0200, Karel Zak wrote:
>> On Thu, Aug 15, 2019 at 12:54:45PM +0200, Patrick Steinhardt wrote:
>> > In order to execute commands with the least-possible privileges, it may
>> > be desirable to provide them with a trimmed down filesystem view.
>> > unshare naturally provides the ability to create mount namespaces, but
>> > it doesn't yet offer much in preparing these. For now, a combination of
>> > unshare and nsenter is required to prepare culled filesystem views,
>> > which is kind of unwieldy.
>> > 
>> > To remedy that, this implements a new option "--mount-fs". As
>> > parameters, one may specify a source filesystem, the destination where
>> > this filesystem shall be mounted, the type of filesystem as well as a
>> > set of options. unshare will then mount it using libmount right before
>> > performing `chroot`, `chdir` and the subsequent `execve`, which allows
>> > for preparing the `chroot` environment without using nsenter at all.
>> >
>> > The above is useful in several different cases, for example when one
>> > wants to execute the process in a read-only environment or execute it
>> > with a reduced view of the filesystem.
>> 
>> I understand your point of view, but this is how unshare(1) will
>> slowly grow from a simple one-purpose tool into a complex
>> container/namespace setup tool ;-) I do not have a strong opinion about
>> it. Maybe your --mount-fs is still basic enough that we can merge it into
>> unshare(1).
>> 
>> Sounds like we need a discussion about it to gather more opinions :-)
>> (CC to Eric).
>
> Sounds fair to me. The main motivation I have is that I want to
> use unshare(1) as part of runit(8) to spawn supervised processes
> in their own namespaces. And using multiple steps to set up
> namespaces and spawn the executable makes things a lot more error
> prone.

My vision of unshare is a simple command line debugging tool.  It lets
you get at the raw functionality.  It might be useful in scripts but it
doesn't provide a nice environment.  The secondary purpose I see for
unshare is as a small example that shows how easy it is to use all
of the functionality.

At least for me unshare is what I turn to when I want to do all of the
steps manually, and keeping it simple and focused is a major benefit to
that cause.

>> Note that the latest mount(8) has a --namespace option, so you can mount
>> filesystems in another namespace even if that namespace does not
>> contain the mount command and the necessary libs.
>
> That would require me to set up persistent namespaces first,
> though, while unshare(1) allows me to use transient ones that
> disappear as soon as the executable exits.
>
>> And note that for systemd-based distros there is systemd-nspawn, which
>> provides many features (including IPC, hostname, TZ, private users,
>> ...).
>
> Yeah, I know of that one, but as I'm using runit(8) as PID1
> systemd-nspawn(1) is not a viable route, at least as far as I
> know. I'm definitely inspired by that tool, though, and would
> love to have something similar that is completely agnostic of
> what init system is running.
>
>> > +.B # unshare
>> > +.B    --mount-fs=none:/tmp:tmpfs
>> > +.B    --mount-fs=/bin:/tmp/bin:none:bind,ro,X-mount.mkdir
>> > +.B    --mount-fs=/lib:/tmp/lib:none:bind,ro,X-mount.mkdir
>> > +.B    --mount-fs=/usr/lib:/tmp/usr/lib:none:bind,ro,X-mount.mkdir
>> > +.B    --root=/tmp /bin/ls /
>> 
>> libmount also allows mounting all filesystems according to a mount
>> table stored in a file, so I can imagine an --fstab option ;-)
>
> I thought about exposing parsing of fstab-style lines from
> libmount. But I'd definitely be happy to implement an "--fstab"
> option instead; that would work perfectly fine for my own use case
> and would probably simplify the code quite a bit.

The tricky part of all of this appears to be permission management.  As
soon as you change your uids and/or exec you are in trouble, as that
will cause you to lose CAP_SYS_ADMIN (unless you are running a service
as root).
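
The capability loss mentioned above can be observed from inside a process. The
sketch below is an illustration only, with hypothetical helper names
cap_is_set() and read_effective_caps(): it reads the effective capability mask
from the CapEff line of /proc/self/status and tests the CAP_SYS_ADMIN bit. The
bit number 21 is the value of CAP_SYS_ADMIN in <linux/capability.h>; it is
redefined locally so that the example needs no kernel headers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_CAP_SYS_ADMIN 21	/* CAP_SYS_ADMIN in <linux/capability.h> */

/* Return 1 if capability number cap is set in the bitmask, else 0. */
int cap_is_set(uint64_t mask, int cap)
{
	assert(cap >= 0 && cap < 64);
	return (int)((mask >> cap) & 1);
}

/*
 * Read the effective capability mask of the calling process from
 * /proc/self/status. Returns 0 on success, -1 on failure.
 */
int read_effective_caps(uint64_t *mask)
{
	FILE *f = fopen("/proc/self/status", "r");
	char line[256];
	int rc = -1;

	if (f == NULL)
		return -1;
	while (fgets(line, sizeof(line), f) != NULL) {
		/* The line looks like "CapEff:\t000001ffffffffff". */
		if (sscanf(line, "CapEff: %" SCNx64, mask) == 1) {
			rc = 0;
			break;
		}
	}
	fclose(f);
	return rc;
}
```

Calling read_effective_caps() before and after a switch from UID 0 to an
unprivileged UID, and checking the result with
cap_is_set(mask, SKETCH_CAP_SYS_ADMIN), should show the bit disappearing,
which is why any mounts have to be completed before privileges are dropped.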

My sense is that it would be easiest to write a little tool that does
what you need to run services.  Possibly as a PAM plugin.  I know
that is originally how the unshare system call was expected to be used,
and unshare fits in well with that model.  The advantage of a PAM plugin
is that runit and sshd could potentially be convinced to set up the
environment for you when you start them.

In fact I think there might already be a PAM plugin for a private /tmp.

Now maybe util-linux is the place for that tool to live.  But I don't
think the unshare command itself is where we want to put the
functionality.

Eric

