* Bug: for mount namespaces inside a chroot, unshare works but nsenter doesn't
@ 2017-10-27 18:07 Ximin Luo
  2017-11-03 13:33 ` Karel Zak
  0 siblings, 1 reply; 6+ messages in thread
From: Ximin Luo @ 2017-10-27 18:07 UTC (permalink / raw)
  To: util-linux

(Please keep me on CC, I'm not subscribed)

When unsharing persistent mount namespaces, unshare+nsenter does not seem to
work properly when run from inside a chroot session. However, unshare by itself
works.

As a workaround for the unshare+nsenter case, one can run `nsenter --mount=<ns>
chroot <real/path/to/chroot> command args`. The `--root` option to `nsenter`
sounds like it should work, but it does not - see below for details.

Is this a bug? I'm trying to write code to work regardless of whether it's run
inside a chroot, so it would be nice not to have to pass arguments to
`nsenter(1)` that are specific to chroots, like `chroot <real/path/to/chroot>`.
It's also a bit counterintuitive to have to re-enter the chroot.

Also, these extra steps are not needed with `unshare(1)`, which works fine by
itself. It's solely re-entering the namespace that seems to be problematic.

I'm using util-linux 2.30.2-0.1 on Debian. I don't believe it's a problem
specific to Debian, because everything works when using `unshare(1)` by itself,
as stated.

(I haven't tried running this inside a chroot-inside-a-chroot.)

Details:

# Below is all run inside a "schroot" session, which is a Debian tool for making chroot use more convenient.
# I used the instructions here (https://wiki.debian.org/sbuild#Create_the_chroot) to create one.

## Preparation for the tests

# Enter the chroot
$ sudo schroot -c unstable-amd64-sbuild
# Set up a private-bind file to hold a handle to our new namespace, as documented in the man page of unshare(1)
(unstable-amd64-sbuild)root@localhost:/tmp# touch ns-mnt; mount --bind --make-private ns-mnt ns-mnt
# Set up our test script
(unstable-amd64-sbuild)root@localhost:/tmp# script='mount; ls /; ls -l /proc/$$/ns/mnt; mount -B /dev/null /etc/hosts; echo hosts:; cat /etc/hosts'
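For context on the handle prepared above: roughly speaking, unshare(1) keeps a mount namespace alive by bind-mounting that namespace's nsfs file (/proc/<pid>/ns/mnt) onto the prepared file, using a helper process that stays in the original namespace. The C sketch below shows only that bind-mount step. `pin_mount_ns` is a hypothetical name, not util-linux code, and it assumes CAP_SYS_ADMIN and a caller that is not itself inside the namespace being pinned.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/types.h>

/* Hypothetical helper (not util-linux code): pin the mount namespace of
 * process `pid` by bind-mounting its nsfs file onto `target`, which must
 * already exist (e.g. created with touch).  Assumes CAP_SYS_ADMIN and that
 * the caller is not a member of the namespace being pinned.
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_mount_ns(pid_t pid, const char *target)
{
    char src[64];

    assert(target != NULL);
    snprintf(src, sizeof(src), "/proc/%d/ns/mnt", (int)pid);
    return mount(src, target, NULL, MS_BIND, NULL);
}
```

Unmounting the file again (umount(2) on the same path) drops this reference to the namespace.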

## Case 1: unshare(1) with no special options or commands, everything works as expected

(unstable-amd64-sbuild)root@localhost:/tmp# unshare --mount=ns-mnt sh -ec "$script"
unstable-amd64-sbuild on / type overlay (rw,relatime,lowerdir=/var/lib/schroot/union/underlay/<<SESSIONID>>,...)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
[.. etc. other mappings in my chroot ..]
unstable-amd64-sbuild on /tmp/ns-mnt type overlay (rw,relatime,lowerdir=/var/lib/schroot/union/underlay/<<SESSIONID>>,...)
bin  boot  build  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run	sbin  srv  sys	tmp  usr  var
lrwxrwxrwx 1 root root 0 Oct 27 17:35 /proc/31691/ns/mnt -> 'mnt:[4026532398]'
hosts:
[.. empty hosts (inside the namespace) ..]
# we are now back outside the namespace
# if we cat /etc/hosts (both inside and outside the chroot), we see the original

## now we try to re-enter the namespace.

## Case 2: nsenter(1) with no extra options or commands, doesn't work:

(unstable-amd64-sbuild)root@localhost:/tmp# nsenter --mount=ns-mnt sh -ec "$script"
[.. mappings for my host system, outside the chroot ..]
bin  boot  dev	etc  home  initrd.img  initrd.img.old  lib  lib32  lib64  libx32  lost+found  media  mnt  opt  proc  root  run	sbin  selinux  srv  sys  tmp  usr  var	vmlinuz  vmlinuz.old
[.. aka the / on my host filesystem outside the chroot ..]
lrwxrwxrwx 1 root root 0 Oct 27 19:36 /proc/32434/ns/mnt -> 'mnt:[4026532398]'
[.. correct namespace ..]
hosts:
[.. empty hosts (inside the namespace) ..]
# if we cat /etc/hosts outside the namespace, it's non-empty inside the chroot but EMPTY outside the chroot.
# whoops, because we ran mount -B on the original non-chrooted / filesystem. findmnt says:
└─/etc/hosts                          udev[/null]                        devtmpfs    rw,nosuid,relatime,size=8181852k,nr_inodes=2045463,mode=755
# we unmount it before proceeding

## Case 3: nsenter(1) with --root, partially works but not really:

(unstable-amd64-sbuild)root@localhost:/tmp# nsenter --root=/ --mount=ns-mnt sh -ec "$script"
[.. i.e. mount(1) gives empty output ..]
bin  boot  build  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run	sbin  srv  sys	tmp  usr  var
[.. at least the root is inside the chroot ..]
lrwxrwxrwx 1 root root 0 Oct 27 17:37 /proc/878/ns/mnt -> 'mnt:[4026532398]'
[.. correct namespace ..]
mount: /etc/hosts: wrong fs type, bad option, bad superblock on /dev/null, missing codepage or helper program, or other error.
[.. mount operations fail, but the namespace is correct ..]
[.. if you analyse this case a bit more, you find that /proc/$$/{mounts,mountinfo,mountstats} are all empty ..]
# exit code 32
# outside the namespace, /etc/hosts is still non-empty, both inside and outside the chroot

## Case 4: nsenter(1) with explicit chroot(1) call, everything works as expected, again:

(unstable-amd64-sbuild)root@localhost:/tmp# nsenter --mount=ns-mnt chroot /run/schroot/mount/<<SESSIONID>> sh -ec 'mount && ls /'
unstable-amd64-sbuild on / type overlay (rw,relatime,lowerdir=/var/lib/schroot/union/underlay/<<SESSIONID>>,...)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
[.. etc. other mappings in my chroot ..]
unstable-amd64-sbuild on /tmp/ns-mnt type overlay (rw,relatime,lowerdir=/var/lib/schroot/union/underlay/<<SESSIONID>>,...)
[.. great, we got our mounts back! ..]
bin  boot  build  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run	sbin  srv  sys	tmp  usr  var
lrwxrwxrwx 1 root root 0 Oct 27 17:39 /proc/2025/ns/mnt -> 'mnt:[4026532398]'
[.. correct namespace ..]
hosts:
[.. empty hosts, as desired ..]
# outside the namespace, /etc/hosts is still non-empty, both inside and outside the chroot

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


* Re: Bug: for mount namespaces inside a chroot, unshare works but nsenter doesn't
  2017-10-27 18:07 Bug: for mount namespaces inside a chroot, unshare works but nsenter doesn't Ximin Luo
@ 2017-11-03 13:33 ` Karel Zak
  2017-11-09 22:54   ` Eric W. Biederman
  0 siblings, 1 reply; 6+ messages in thread
From: Karel Zak @ 2017-11-03 13:33 UTC (permalink / raw)
  To: Ximin Luo; +Cc: util-linux, Eric W. Biederman

On Fri, Oct 27, 2017 at 06:07:00PM +0000, Ximin Luo wrote:
> When unsharing persistent mount namespaces, unshare+nsenter does not seem to
> work properly when run from inside a chroot session. However, unshare by itself
> works.

It's not related to persistent namespaces, but to the way nsenter
uses chroot().

> As a workaround for the unshare+nsenter case, one can run `nsenter --mount=<ns>
> chroot <real/path/to/chroot> command args`. The `--root` option to `nsenter`
> sounds like it should work, but it does not - see below for details.
> 
> Is this a bug? 

It seems like a problem in nsenter's logic.

The command nsenter opens the root-dir and cwd file descriptors *before*
the setns() syscall, and then *after* the syscall it calls chroot(). The
final process is in the namespace, but not in the root directory.

        open("/mnt/test/chroot/namespaces/mnt", O_RDONLY) = 3
        open("/mnt/test/chroot", O_RDONLY)      = 4
        open("/mnt/test/chroot", O_RDONLY)      = 5
        setns(3, CLONE_NEWNS)                   = 0
        close(3)                                = 0
        fchdir(4)                               = 0
        chroot(".")                             = 0
        close(4)                                = 0
        fchdir(5)                               = 0
        close(5)                                = 0
        execve("/bin/bash", ["-bash"], 0x7ffd2b5244d0 /* 31 vars */) = 0

The patch below fixes the issue. It just moves root-dir and cwd open
calls *after* the setns():

        open("/mnt/test/chroot/namespaces/mnt", O_RDONLY) = 3
        setns(3, CLONE_NEWNS)                   = 0
        close(3)                                = 0
        open("/mnt/test/chroot", O_RDONLY)      = 3
        open("/mnt/test/chroot", O_RDONLY)      = 4
        fchdir(4)                               = 0
        chroot(".")                             = 0
        close(4)                                = 0
        fchdir(3)                               = 0
        close(3)                                = 0
        execve("/bin/bash", ["-bash"], 0x7fff1ff8eb60 /* 31 vars */) = 0
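A minimal C sketch of the patched ordering: it resolves the --root path only after setns(), which is also what Ximin's Case 4 workaround achieves by running chroot(1) inside the namespace. `enter_mntns_then_chroot` is a hypothetical name, not nsenter's code; it skips --wd handling (it simply ends with chdir("/")) and assumes CAP_SYS_ADMIN and CAP_SYS_CHROOT in a single-threaded process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: join the mount namespace referenced by ns_path,
 * then resolve root_path *inside* that namespace and chroot into it.
 * Returns 0 on success, -1 on failure (after printing an error). */
static int enter_mntns_then_chroot(const char *ns_path, const char *root_path)
{
    assert(ns_path != NULL && root_path != NULL);

    int ns_fd = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (ns_fd < 0) {
        perror("open namespace");
        return -1;
    }
    if (setns(ns_fd, CLONE_NEWNS) < 0) {
        perror("setns");
        close(ns_fd);
        return -1;
    }
    close(ns_fd);

    /* Opened only now, so the lookup starts at the new namespace's root
     * instead of pinning a directory from the caller's namespace. */
    int root_fd = open(root_path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (root_fd < 0) {
        perror("open root");
        return -1;
    }
    if (fchdir(root_fd) < 0) {
        perror("fchdir");
        close(root_fd);
        return -1;
    }
    close(root_fd);
    if (chroot(".") < 0) {
        perror("chroot");
        return -1;
    }
    return chdir("/");
}
```

With Karel's example layout, the call would be `enter_mntns_then_chroot("/mnt/test/chroot/namespaces/mnt", "/mnt/test/chroot")`, followed by execve() of the shell.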

Unfortunately, I'm not sure this is the right approach in all cases.

Eric?


Examples:

*** I have simple chroot directory:

        # ls -la /mnt/test/chroot
        total 20
        drwxr-xr-x    5 root root 4096 Nov  3 13:10 .
        drwxr-xr-x.   8 root root 4096 Nov  2 15:36 ..
        lrwxrwxrwx    1 root root    8 Nov  2 15:40 bin -> /usr/bin
        lrwxrwxrwx    1 root root    8 Nov  2 15:40 lib -> /usr/lib
        lrwxrwxrwx    1 root root   10 Nov  2 15:40 lib64 -> /usr/lib64
        drwxr-xr-x    4 root root 4096 Nov  3 13:22 namespaces
        dr-xr-xr-x  330 root root    0 Sep 26 22:17 proc
        lrwxrwxrwx    1 root root    9 Nov  2 15:40 sbin -> /usr/sbin
        drwxr-xr-x.  14 root root 4096 Aug 16 10:50 usr

where /usr is bind-mounted and /proc is mounted:

        # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION --submounts /mnt/test/chroot
        TARGET                  SOURCE                      FSTYPE PROPAGATION
        /mnt/test/chroot        /dev/sda4[/mnt/test/chroot] ext4   private
        ├─/mnt/test/chroot/usr  /dev/sda4[/usr]             ext4   shared
        └─/mnt/test/chroot/proc proc                        proc   private

let's enter the chroot and create a persistent mount namespace within it:

        # chroot /mnt/test/chroot
        # unshare --mount=namespaces/mnt

our mount table:

        # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION
        TARGET  SOURCE                      FSTYPE PROPAGATION
        /       /dev/sda4[/mnt/test/chroot] ext4   private
        ├─/usr  /dev/sda4[/usr]             ext4   private
        └─/proc proc                        proc   private

and our mount namespace:

        # ls -la /proc/self/ns | grep mnt
        lrwxrwxrwx 1 0 0 0 Nov  3 12:56 mnt -> mnt:[4026532457]

our pid:

        # echo $$
        14411

IMHO it's a good idea to keep the shell alive in the chroot and use another
session to play with nsenter.

*** nsenter examples:

a) let's try it by PID, all works as expected:

        # nsenter --target 14411 --mount --root --wd

        # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION
        TARGET  SOURCE                      FSTYPE PROPAGATION
        /       /dev/sda4[/mnt/test/chroot] ext4   private
        ├─/usr  /dev/sda4[/usr]             ext4   private
        └─/proc proc                        proc   private

        # ls -la /proc/self/ns | grep mnt
        lrwxrwxrwx 1 0 0 0 Nov  3 13:02 mnt -> mnt:[4026532457]

   Important note: in this case nsenter uses /proc/<target>/root for
   chroot(), but the goal is to use a persistent namespace, where no
   <target> is available.

b) let's try chroot() by path:

        # nsenter --target 14411 --mount --root=/mnt/test/chroot --wd=/mnt/test/chroot

        # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION

   failed: the mount table is empty
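Both this case and Ximin's Case 3 end with an empty mount table. Presumably that is because /proc/self/mountinfo only lists mounts reachable from the process's root directory, and a root directory opened in the old namespace reaches none of the new namespace's mounts. A small C sketch to detect the symptom; `count_mountinfo_entries` is a hypothetical helper, not part of util-linux:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count the entries (one per line) in a
 * mountinfo-style file.  Returns -1 if the file cannot be opened. */
static long count_mountinfo_entries(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char *line = NULL;   /* getline() grows this buffer as needed */
    size_t cap = 0;
    long n = 0;
    while (getline(&line, &cap, f) != -1)
        n++;

    free(line);
    fclose(f);
    return n;
}
```

Called on /proc/self/mountinfo right after entering the namespace, a result of 0 suggests the process root lies outside the joined namespace.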

c) let's try chroot by /proc paths:

        # nsenter --target 14411 --mount --root=/mnt/test/chroot/proc/14411/root --wd=/mnt/test/chroot/proc/14411/cwd

        # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION
        TARGET  SOURCE                      FSTYPE PROPAGATION
        /       /dev/sda4[/mnt/test/chroot] ext4   private
        ├─/usr  /dev/sda4[/usr]             ext4   private
        └─/proc proc                        proc   private

        # ls -la /proc/self/ns | grep mnt
        lrwxrwxrwx 1 0 0 0 Nov  3 13:09 mnt -> mnt:[4026532457]

   it works!


Note that it makes no difference here whether the namespace is selected
with --target or with --mount=<persistent namespace file>.

The nsenter with the patch:


        # ./nsenter --mount=/mnt/test/chroot/namespaces/mnt  --root=/mnt/test/chroot --wd=/mnt/test/chroot

        # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION
        TARGET  SOURCE                      FSTYPE PROPAGATION
        /       /dev/sda4[/mnt/test/chroot] ext4   private
        ├─/usr  /dev/sda4[/usr]             ext4   private
        └─/proc proc                        proc   private

        # ls -la /proc/self/ns | grep mnt
        lrwxrwxrwx 1 0 0 0 Nov  3 13:11 mnt -> mnt:[4026532457]

all works as expected. The patch is below.

    Karel


diff --git a/sys-utils/nsenter.c b/sys-utils/nsenter.c
index 9c452c1d1..464f9f98c 100644
--- a/sys-utils/nsenter.c
+++ b/sys-utils/nsenter.c
@@ -238,6 +238,7 @@ int main(int argc, char *argv[])
 	int do_fork = -1; /* unknown yet */
 	uid_t uid = 0;
 	gid_t gid = 0;
+	const char *rd_path = NULL, *wd_path = NULL;
 #ifdef HAVE_LIBSELINUX
 	bool selinux = 0;
 #endif
@@ -318,13 +319,13 @@ int main(int argc, char *argv[])
 			break;
 		case 'r':
 			if (optarg)
-				open_target_fd(&root_fd, "root", optarg);
+				rd_path = optarg;
 			else
 				do_rd = true;
 			break;
 		case 'w':
 			if (optarg)
-				open_target_fd(&wd_fd, "cwd", optarg);
+				wd_path = optarg;
 			else
 				do_wd = true;
 			break;
@@ -433,6 +434,11 @@ int main(int argc, char *argv[])
 		}
 	}
 
+	if (wd_path)
+		open_target_fd(&wd_fd, "cwd", wd_path);
+	if (rd_path)
+		open_target_fd(&root_fd, "root", rd_path);
+
 	/* Remember the current working directory if I'm not changing it */
 	if (root_fd >= 0 && wd_fd < 0) {
 		wd_fd = open(".", O_RDONLY);

    



-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: Bug: for mount namespaces inside a chroot, unshare works but nsenter doesn't
  2017-11-03 13:33 ` Karel Zak
@ 2017-11-09 22:54   ` Eric W. Biederman
  2017-11-10 13:14     ` Karel Zak
  0 siblings, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2017-11-09 22:54 UTC (permalink / raw)
  To: Karel Zak; +Cc: Ximin Luo, util-linux

Karel Zak <kzak@redhat.com> writes:

> On Fri, Oct 27, 2017 at 06:07:00PM +0000, Ximin Luo wrote:
>> When unsharing persistent mount namespaces, unshare+nsenter does not seem to
>> work properly when run from inside a chroot session. However, unshare by itself
>> works.
>
> It's not related to persistent namespace, but to the way how nsenter
> uses chroot().

At a practical level it is related to persistent namespaces as this
problem will come up nowhere else.

In the non-persistent case you can do:
nsenter --mount=/proc/<pid>/ns/mnt --root=/proc/<pid>/root

Which works because the root directory is in the mount namespace.
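For comparison, a C sketch of how the two handles are obtained in the non-persistent case: both come from /proc/<pid>, and /proc/<pid>/root refers to a directory that already lives in that process's mount namespace, so opening it before setns() does no harm. `open_pid_mntns_and_root` is a hypothetical name, and the caller is assumed to be allowed to open the target's /proc entries (typically root).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open the mount-namespace handle and the root
 * directory of process `pid` via procfs.  On failure both outputs are
 * left at -1 and the function returns -1; on success it returns 0. */
static int open_pid_mntns_and_root(pid_t pid, int *ns_fd, int *root_fd)
{
    char path[64];

    assert(ns_fd != NULL && root_fd != NULL);
    *ns_fd = *root_fd = -1;

    snprintf(path, sizeof(path), "/proc/%d/ns/mnt", (int)pid);
    *ns_fd = open(path, O_RDONLY | O_CLOEXEC);
    if (*ns_fd < 0)
        return -1;

    /* Magic link to the target's root, which is part of its namespace. */
    snprintf(path, sizeof(path), "/proc/%d/root", (int)pid);
    *root_fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (*root_fd < 0) {
        close(*ns_fd);
        *ns_fd = -1;
        return -1;
    }
    return 0;
}
```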

>> As a workaround for the unshare+nsenter case, one can run `nsenter --mount=<ns>
>> chroot <real/path/to/chroot> command args`. The `--root` option to `nsenter`
>> sounds like it should work, but it does not - see below for details.
>> 
>> Is this a bug? 
>
> It seems like nsenter logic problem.
>
> The command nsenter opens root-dir and cwd file descriptors *before*
> setns() syscall, and than *after* the syscall it calls chroot(). The
> final process is in the namespace, but no in the root directory.

Which is necessary for opening the file descriptors to have a well-defined
meaning.

>         open("/mnt/test/chroot/namespaces/mnt", O_RDONLY) = 3
>         open("/mnt/test/chroot", O_RDONLY)      = 4
>         open("/mnt/test/chroot", O_RDONLY)      = 5
>         setns(3, CLONE_NEWNS)                   = 0
>         close(3)                                = 0
>         fchdir(4)                               = 0
>         chroot(".")                             = 0
>         close(4)                                = 0
>         fchdir(5)                               = 0
>         close(5)                                = 0
>         execve("/bin/bash", ["-bash"], 0x7ffd2b5244d0 /* 31 vars */) = 0

> The patch below fixes the issue. It just moves root-dir and cwd open
> calls *after* the setns():
>
>         open("/mnt/test/chroot/namespaces/mnt", O_RDONLY) = 3
>         setns(3, CLONE_NEWNS)                   = 0
>         close(3)                                = 0
>         open("/mnt/test/chroot", O_RDONLY)      = 3
>         open("/mnt/test/chroot", O_RDONLY)      = 4
>         fchdir(4)                               = 0
>         chroot(".")                             = 0
>         close(4)                                = 0
>         fchdir(3)                               = 0
>         close(3)                                = 0
>         execve("/bin/bash", ["-bash"], 0x7fff1ff8eb60 /* 31 vars */) = 0
>
> Unfortunately, I'm not sure if this is the right way in all cases.

I believe this will break every case except the one mentioned.

My personal recommendation is not to use chroot with persistent mount
namespaces.  That just seems to keep unnecessary mounts around.  Those
extra mounts will almost certainly be a problem later when you discover
you want to unmount one of those mounted filesystems you don't care
about but are chrooting over.

I think it would be quite reasonable to have an additional option to
open things in the new mount namespace, just before exec.  I just don't
see how useful it would be.

A second possibility is to issue a warning if the root directory is not a
member of the target mount namespace.  That might even allow doing the right
thing automatically.  It looks like the mnt_id is available from
/proc/<pid>/fdinfo/<fd#>.  So it looks like it is possible with the
existing kernel interfaces (at least in theory).

Ugh.  It looks like you committed your change below to sys-utils by
accident.

Eric


>
>
> diff --git a/sys-utils/nsenter.c b/sys-utils/nsenter.c
> index 9c452c1d1..464f9f98c 100644
> --- a/sys-utils/nsenter.c
> +++ b/sys-utils/nsenter.c
> @@ -238,6 +238,7 @@ int main(int argc, char *argv[])
>  	int do_fork = -1; /* unknown yet */
>  	uid_t uid = 0;
>  	gid_t gid = 0;
> +	const char *rd_path = NULL, *wd_path = NULL;
>  #ifdef HAVE_LIBSELINUX
>  	bool selinux = 0;
>  #endif
> @@ -318,13 +319,13 @@ int main(int argc, char *argv[])
>  			break;
>  		case 'r':
>  			if (optarg)
> -				open_target_fd(&root_fd, "root", optarg);
> +				rd_path = optarg;
>  			else
>  				do_rd = true;
>  			break;
>  		case 'w':
>  			if (optarg)
> -				open_target_fd(&wd_fd, "cwd", optarg);
> +				wd_path = optarg;
>  			else
>  				do_wd = true;
>  			break;
> @@ -433,6 +434,11 @@ int main(int argc, char *argv[])
>  		}
>  	}
>  
> +	if (wd_path)
> +		open_target_fd(&wd_fd, "cwd", wd_path);
> +	if (rd_path)
> +		open_target_fd(&root_fd, "root", rd_path);
> +
>  	/* Remember the current working directory if I'm not changing it */
>  	if (root_fd >= 0 && wd_fd < 0) {
>  		wd_fd = open(".", O_RDONLY);
>> # outside the namespace, /etc/hosts is still non-empty, both inside and outside the chroot
>> 
>> -- 
>> GPG: ed25519/56034877E1F87C35
>> GPG: rsa4096/1318EFAC5FBBDBCE
>> https://github.com/infinity0/pubkeys.git
>> 

* Re: Bug: for mount namespaces inside a chroot, unshare works but nsenter doesn't
  2017-11-09 22:54   ` Eric W. Biederman
@ 2017-11-10 13:14     ` Karel Zak
  2017-11-10 14:22       ` Ximin Luo
  0 siblings, 1 reply; 6+ messages in thread
From: Karel Zak @ 2017-11-10 13:14 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Ximin Luo, util-linux

On Thu, Nov 09, 2017 at 04:54:06PM -0600, Eric W. Biederman wrote:
> Karel Zak <kzak@redhat.com> writes:
> 
> > Unfortunately, I'm not sure if this is the right way in all cases.
> 
> I believe this will break all except the case mentioned.

I expected something like this...

> My personal recommendation is not to use chroot with persistent mount
> namespaces.  That just seems to keep unnecessary mounts around.  Those
> extra mounts will almost certainly be a problem later when you discover
> you want to unmount one of those mounted filesystems you don't care
> about but are chrooting over.
> 
> I think it would be quite reasonable to have an additional option to
> open things in the new mount namespace, just before exec.  I just don't
> see how useful it would be.

It would be a solution for this use case, but it would increase
complexity and I'm not sure this use case is important enough.

Especially if all you need is to use the chroot command before nsenter.
I don't think nsenter has to be an all-in-one command. It's a very basic
tool.

> A second possibility is to issue a warning if root is not a member
> of the target mount namespace.  That might even allow doing the right
> thing automatically.  It looks like the mnt_id is available from
> /proc/<pid>/fdinfo/<fd#>.  So it looks like it is possible with the
> existing kernel interfaces (at least in theory).

I'll think about it.
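Eric's suggestion above could look roughly like the following C sketch. It is an illustration under assumptions, not code from nsenter: the helper names read_text, fdinfo_mnt_id, mountinfo_has_mnt_id and fd_mount_visible are hypothetical. The idea is to open the new root before calling setns(), then afterwards read that descriptor's "mnt_id:" line from /proc/self/fdinfo/<fd> and search for the same ID in the first column of /proc/self/mountinfo, which by then describes the target namespace. A missing ID would mean the root is not a member of the target mount namespace, which is where a warning would be issued.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read a whole text file (including /proc files,
 * which report a size of 0) into a NUL-terminated heap buffer. */
char *read_text(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return NULL;
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    while (buf) {
        len += fread(buf + len, 1, cap - len - 1, f);
        if (ferror(f)) {
            free(buf);
            buf = NULL;
            break;
        }
        if (feof(f))
            break;
        /* Buffer filled up before EOF: grow it and keep reading. */
        char *bigger = realloc(buf, cap * 2);
        if (!bigger) {
            free(buf);
            buf = NULL;
            break;
        }
        buf = bigger;
        cap *= 2;
    }
    fclose(f);
    if (buf)
        buf[len] = '\0';
    return buf;
}

/* Value of the "mnt_id:" line in /proc/<pid>/fdinfo/<fd> text, or -1. */
long fdinfo_mnt_id(const char *fdinfo)
{
    assert(fdinfo != NULL);
    for (const char *line = fdinfo; line && *line; ) {
        if (strncmp(line, "mnt_id:", 7) == 0)
            return strtol(line + 7, NULL, 10);
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* 1 if some line of /proc/<pid>/mountinfo text has mnt_id as its first
 * field (the mount ID), otherwise 0. */
int mountinfo_has_mnt_id(const char *mountinfo, long mnt_id)
{
    assert(mountinfo != NULL);
    for (const char *line = mountinfo; line && *line; ) {
        char *end;
        long id = strtol(line, &end, 10);
        if (end != line && id == mnt_id)
            return 1;
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return 0;
}

/* 1 if the mount behind fd is listed in mountinfo_path, 0 if not,
 * -1 if a file could not be read or fdinfo has no mnt_id line. */
int fd_mount_visible(int fd, const char *mountinfo_path)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fdinfo/%d", fd);
    char *fdinfo = read_text(path);
    char *mountinfo = read_text(mountinfo_path);
    int ret = -1;
    if (fdinfo && mountinfo) {
        long id = fdinfo_mnt_id(fdinfo);
        if (id >= 0)
            ret = mountinfo_has_mnt_id(mountinfo, id);
    }
    free(fdinfo);
    free(mountinfo);
    return ret;
}
```

Called as fd_mount_visible(rootfd, "/proc/self/mountinfo") after setns(), we would expect 0 in the situation of Case 3 (root opened in the parent namespace) and 1 when the root comes from a process already inside the target namespace, such as /proc/<pid>/root. A result of -1 only means the check could not be made, e.g. on kernels that do not print mnt_id in fdinfo.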

> Ugh.  It looks like you committed your change below to sys-utils by
> accident.

OMG...<censored>... fixed. Thanks!

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

* Re: Bug: for mount namespaces inside a chroot, unshare works but nsenter doesn't
  2017-11-10 13:14     ` Karel Zak
@ 2017-11-10 14:22       ` Ximin Luo
  2017-11-24 13:09         ` Ximin Luo
  0 siblings, 1 reply; 6+ messages in thread
From: Ximin Luo @ 2017-11-10 14:22 UTC (permalink / raw)
  To: Karel Zak, Eric W. Biederman; +Cc: util-linux

Karel Zak:
> [..]
> 
>> My personal recommendation is not to use chroot with persistent mount
>> namespaces.  That just seems to keep unnecessary mounts around.  Those
>> extra mounts will almost certainly be a problem later when you discover
>> you want to unmount one of those mounted filesystems you don't care
>> about but are chrooting over.
>>
>> I think it would be quite reasonable to have an additional option to
>> open things in the new mount namespace, just before exec.  I just don't
>> see how useful it would be.
> 
> It would be a solution for this use case, but it would increase
> complexity and I'm not sure this use case is important enough.
> 
> Especially if all you need is to use the chroot command before nsenter.
> I don't think nsenter has to be an all-in-one command. It's a very basic
> tool.
> 

My nsenter code may be run inside or outside a chroot; I have no control over that in the general case: users decide whether they want to run it inside a chroot or not.

The issue with using the chroot(1) command is that you must give it the path to the chroot *from outside the chroot*. I don't know of a clean way to figure this out from my code, which starts life running inside the chroot and just wants to unshare part of the tree that it sees there.
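The Case 4 workaround from the original report can be written as a short C sketch, which makes the problem concrete. setns() on a mount namespace resets the caller's root and working directory to that namespace's root (the host "/" seen in Case 2), so the chroot() path has to be spelled from that root, e.g. /run/schroot/mount/<<SESSIONID>>. The function name enter_ns_and_chroot is hypothetical. Per setns(2), the caller needs CAP_SYS_ADMIN and CAP_SYS_CHROOT and must not share its filesystem attributes (CLONE_FS) with another thread or process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper mirroring
 *   nsenter --mount=<ns_path> chroot <root_path> argv...
 * root_path must be spelled from the namespace's own root, e.g.
 * "/run/schroot/mount/<<SESSIONID>>" in Case 4. Returns -1 on failure;
 * on success execvp() replaces the process and nothing is returned. */
int enter_ns_and_chroot(const char *ns_path, const char *root_path,
                        char *const argv[])
{
    assert(ns_path && root_path && argv && argv[0]);

    int nsfd = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (nsfd < 0)
        return -1;

    /* Joining the mount namespace resets root and cwd to the
     * namespace's root: the host "/" seen in Case 2. */
    int rc = setns(nsfd, CLONE_NEWNS);
    close(nsfd);
    if (rc < 0)
        return -1;

    /* Re-enter the chroot by its path as seen from that root. */
    if (chroot(root_path) < 0 || chdir("/") < 0)
        return -1;

    execvp(argv[0], argv);
    return -1; /* only reached if exec failed */
}
```

Calling it with "ns-mnt", "/run/schroot/mount/<<SESSIONID>>" and {"sh", "-ec", "mount && ls /", NULL} would correspond to the Case 4 command line; the second argument is exactly the outside path that code started inside the chroot has no clean way to learn.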

An option to open root/wd in the new ns sounds like it would allow me (and others) to write code that is chroot-independent. I'd very much appreciate that.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

* Re: Bug: for mount namespaces inside a chroot, unshare works but nsenter doesn't
  2017-11-10 14:22       ` Ximin Luo
@ 2017-11-24 13:09         ` Ximin Luo
  0 siblings, 0 replies; 6+ messages in thread
From: Ximin Luo @ 2017-11-24 13:09 UTC (permalink / raw)
  To: Karel Zak, Eric W. Biederman; +Cc: util-linux

Ximin Luo:
> Karel Zak:
>> [..]
>>
>>> My personal recommendation is not to use chroot with persistent mount
>>> namespaces.  That just seems to keep unnecessary mounts around.  Those
>>> extra mounts will almost certainly be a problem later when you discover
>>> you want to unmount one of those mounted filesystems you don't care
>>> about but are chrooting over.
>>>
>>> I think it would be quite reasonable to have an additional option to
>>> open things in the new mount namespace, just before exec.  I just don't
>>> see how useful it would be.
>>
>> It would be a solution for this use case, but it would increase
>> complexity and I'm not sure this use case is important enough.
>>
>> Especially if all you need is to use the chroot command before nsenter.
>> I don't think nsenter has to be an all-in-one command. It's a very basic
>> tool.
>>
> 
> My nsenter code may be run inside or outside a chroot; I have no control over that in the general case: users decide whether they want to run it inside a chroot or not.
> 
> The issue with using the chroot(1) command is that you must give it the path to the chroot *from outside the chroot*. I don't know of a clean way to figure this out from my code, which starts life running inside the chroot and just wants to unshare part of the tree that it sees there.
> 
> An option to open root/wd in the new ns sounds like it would allow me (and others) to write code that is chroot-independent. I'd very much appreciate that.
> 

I tried Karel's patch from earlier, and it does not seem to work as I need it to: with the patch, it is still necessary to pass the path to the chroot as seen from the parent mount namespace's "real" root.

I guess the problem stems from the fact that the unshare process's root is a child-namespace-specific copy of the root from the parent namespace. When the process exits, the handle to this root is lost, and nsenter does not have enough information to pick it up again.

And opening the parent namespace's root (inside the chroot), if I understood correctly, does not give a file descriptor that is valid inside the child namespace, which is why this bug exists.
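For comparison, here is a C sketch of re-rooting through a directory descriptor opened before setns(). As far as we can tell this is roughly how `nsenter --root` behaves, but that is an assumption not checked here against sys-utils/nsenter.c, and the function name enter_ns_with_old_root is hypothetical. Because the descriptor was opened in the parent namespace, it refers to a parent-namespace mount, so after fchdir() and chroot(".") the process's root lies outside the entered namespace's mount tree. That is consistent with Case 3: the namespace link is correct, but the mount tables are empty and mount(8) fails.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: re-root through a descriptor opened before
 * setns(), roughly like `nsenter --root=/ --mount=<ns_path>`.
 * Returns -1 on failure; on success execvp() does not return. */
int enter_ns_with_old_root(const char *ns_path, char *const argv[])
{
    assert(ns_path && argv && argv[0]);

    /* Opened in the caller's namespace (inside the chroot), so it
     * refers to a mount of the PARENT namespace. */
    int rootfd = open("/", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (rootfd < 0)
        return -1;

    int nsfd = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (nsfd < 0) {
        close(rootfd);
        return -1;
    }
    int rc = setns(nsfd, CLONE_NEWNS);
    close(nsfd);

    /* Make the old descriptor the new root. That directory sits on a
     * mount that is not part of the entered namespace's mount tree,
     * so no mounts are reachable from it (cf. Case 3). */
    if (rc < 0 || fchdir(rootfd) < 0 || chroot(".") < 0 || chdir("/") < 0) {
        close(rootfd);
        return -1;
    }
    close(rootfd);

    execvp(argv[0], argv);
    return -1; /* only reached if exec failed */
}
```

The only difference from the Case 4 sequence is where the new root comes from: a path resolved after setns() names a directory inside the entered namespace, while a descriptor opened before it keeps pointing into the parent namespace.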

Perhaps the solution, then, is to offer a way for unshare(1) to save a handle to the root inside the child namespace, so that nsenter(1) can later be pointed at this root? For example:

# all commands run inside the chroot
$ unshare --mount=ns-mnt --root=./root true
#                        ^^^^^^^^^^^^^
# this would effectively save (chroot)/proc/self/root to ./root so it's available after the process exits

$ nsenter --mount=ns-mnt --root=./root true
#                        ^^^^^^^^^^^^^
# nsenter can then chroot to this as normal

(The wd is less important to me, but I suppose something similar could be done for it too.)

I am not sure if the first step is possible in the current kernel, however. But it seems to me that it *ought* to be possible to make chroots and persistent mount namespaces play nicely together.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

end of thread

Thread overview: 6+ messages
2017-10-27 18:07 Bug: for mount namespaces inside a chroot, unshare works but nsenter doesn't Ximin Luo
2017-11-03 13:33 ` Karel Zak
2017-11-09 22:54   ` Eric W. Biederman
2017-11-10 13:14     ` Karel Zak
2017-11-10 14:22       ` Ximin Luo
2017-11-24 13:09         ` Ximin Luo
