* correct usage of unshare+nsenter for persistent namespaces?
@ 2017-03-10 17:51 Assaf Gordon
  2017-03-27 11:41 ` Karel Zak
  0 siblings, 1 reply; 2+ messages in thread
From: Assaf Gordon @ 2017-03-10 17:51 UTC (permalink / raw)
  To: util-linux

Hello Karel and all,

I'd like to ask your advice regarding proper usage of unshare+nsenter
to create persistent containers. I understand unshare(1) is rather
low-level, but I would still like to understand how to use
it.

Apologies in advance for the long email, but I hope it will
result in better documentation (or at least better understanding for
me).

There are many bits and pieces of information
around (man pages and blogs and stack-overflow, etc.),
but I haven't been able to find an authoritative example
of using it to create a contained re-entrant persistent environment.
(If I missed it, please do point me to it).





Step 1: preparations
--------------------

All my testing was done on stock Debian 8.7,
with kernel 3.16.39-1+deb8u1,
and util-linux 2.29.2 compiled from source.
All commands run as 'root'.

Extrapolating from unshare's man page about creating
a persistent environment:

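    basedir=/var/namespaces/ns1
    mkdir -p "$basedir"
    # Bind-mount the directory onto itself and make it private, so the
    # namespace files created below live on a private mount point.
    mount --bind "$basedir" "$basedir"
    mount --make-private "$basedir"
    for i in uts mnt pid net ipc user ;
    do
     touch "$basedir/$i"
    done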

Are these correct?
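
A minimal sketch of the same preparation steps wrapped in a function with a dry-run switch, so the commands can be reviewed before running them as root. The names `run`, `prepare_ns_dir` and `DRY_RUN` are hypothetical and not part of util-linux; the commands themselves are the ones listed above.

```shell
#!/bin/sh
# Sketch only: run, prepare_ns_dir and DRY_RUN are hypothetical names.

# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a private bind mount at $1 and one empty file per namespace type.
prepare_ns_dir() {
    dir=$1
    run mkdir -p "$dir"
    run mount --bind "$dir" "$dir"
    run mount --make-private "$dir"
    for ns in uts mnt pid net ipc user; do
        run touch "$dir/$ns"
    done
}
```

With DRY_RUN=1 set, calling `prepare_ns_dir /var/namespaces/ns1` only prints the commands; without it, the function runs them and therefore needs root.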



Step 2: creating shared namespace
---------------------------------

(for now, I'm ignoring user-namespace, as it brings
its own complications.)

Starting a new environment using the following:

    unshare --uts=$basedir/uts \
            --mount=$basedir/mnt \
            --ipc=$basedir/ipc \
            --pid=$basedir/pid \
            --net=$basedir/net \
            --mount-proc \
            --fork \
            sh -c 'hostname foobar ; exec /bin/bash -il'

And indeed I get a prompt inside the container:

    root@foobar# ps ax
    PID TTY      STAT   TIME COMMAND
     1 pts/2    S      0:00 /bin/bash -il
     8 pts/2    R+     0:00 ps ax


    root@foobar# ifconfig -a
    lo        Link encap:Local Loopback
              LOOPBACK  MTU:65536  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)



On the outside host, I see the mounts and the namespaces:

    # findmnt -o TARGET
    [...]
    └─/var/namespaces/ns1
     ├─/var/namespaces/ns1/ipc
     ├─/var/namespaces/ns1/uts
     ├─/var/namespaces/ns1/net
     ├─/var/namespaces/ns1/pid
     └─/var/namespaces/ns1/mnt

    # lsns
    NS        TYPE  NPROCS   PID USER     COMMAND
    [...]
    4026532329 mnt        2 19221 root     unshare --uts=..
    4026532330 uts        2 19221 root     unshare --uts=..
    4026532331 ipc        2 19221 root     unshare --uts=..
    4026532332 pid        1 19223 root     /bin/bash -il
    4026532334 net        2 19221 root     unshare --uts=..


Step 3: Re-entering
-------------------

Trying to enter based on PID works:

    # nsenter -t 19223 -m -u -i -n -p \
          sh -c 'hostname ; echo ; ps ax ; echo ; ifconfig -a'
    foobar

      PID TTY      STAT   TIME COMMAND
        1 pts/2    S+     0:00 /bin/bash -il
       15 pts/1    S+     0:00 sh -c hostname ; ps ax
       17 pts/1    R+     0:00 ps ax

    lo        Link encap:Local Loopback
          LOOPBACK  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)


However, trying to enter via the persistent mounts does not
re-enter the pid/net namespace:

    # nsenter --uts=$basedir/uts \
              --mount=$basedir/mnt \
              --ipc=$basedir/ipc \
              --pid=$basedir/pid \
              --net=$basedir/net \
              sh -c 'hostname ; echo ; ps ax ; echo ; ifconfig -a'
    foobar

    Error, do this: mount -t proc proc /proc

    Warning: cannot open /proc/net/dev (No such file or directory).
    Limited output.

Listing /proc inside the container shows it only lists PID 1
(the running '/bin/bash' from the original 'unshare' invocation).

Based on a naive reading of the unshare(1) man page (with the example of
persistent UTS at the bottom), I assumed the above two examples, by
PID and by persistent mount points, should be equivalent.

Is this a kernel limitation ?
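
One way to narrow this down, sketched under the assumption of a Linux host with GNU or busybox stat: two namespace handles refer to the same namespace exactly when they resolve to the same device and inode, so a persistent file can be compared against the /proc/PID/ns entry of a process known to be inside the container. The helper name `same_ns` is hypothetical.

```shell
#!/bin/sh
# Sketch only: same_ns is a hypothetical helper name.
# Returns 0 if both paths resolve to the same device:inode pair,
# 1 if they differ, 2 if either path cannot be examined.
same_ns() {
    a=$(stat -L -c '%d:%i' "$1") || return 2
    b=$(stat -L -c '%d:%i' "$2") || return 2
    [ "$a" = "$b" ]
}

# Example (as root, using the PID shown by lsns above):
#   same_ns "$basedir/pid" /proc/19223/ns/pid && echo same || echo different
```

If the persistent pid file and the /proc entry of the container's bash report different namespaces, the file was bound to something other than the new PID namespace; this check only shows that, not why.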



Step 4: PID namespace is never persistent?
------------------------------------------

IIUC, this is a kernel limitation:
If the program which is PID1 inside the container
terminates, there is no way to re-enter the PID namespace
(http://man7.org/linux/man-pages/man7/pid_namespaces.7.html).

Is that correct?

If so, perhaps it would be helpful to add a caveat in the
unshare/nsenter man pages, saying the PID namespace will
not persist if the process terminates?

And if this is the case, would the following
work to create a re-entrant persistent namespace:

    unshare --uts=$basedir/uts \
            --mount=$basedir/mnt \
            --ipc=$basedir/ipc \
            --pid=$basedir/pid \
            --net=$basedir/net \
            --mount-proc \
            --fork \
            sleep inf

Obviously sleep(1) is not a good PID1, but is it a conceptually correct
way to ensure the PID namespace is persistent?

There are already some examples of minimal 'init' for containers:
  https://github.com/Yelp/dumb-init
  https://github.com/krallin/tini
  and most minimal: https://gist.github.com/rofl0r/6168719 

I wonder if you will be willing to consider a patch to add
something like 'unshare --do-nothing-init' which
will simply create a process that does nothing except handling signals
and never terminates, to facilitate truly persistent namespaces with 
unshare(1) ? (if so I'm happy to try and write it).
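
To make the idea concrete, here is a rough shell sketch of such a do-nothing init: it installs a SIGTERM handler and otherwise just waits forever. The function name `do_nothing_init` is hypothetical, this is not an existing unshare option, and it makes no attempt at the child reaping that dumb-init or tini provide.

```shell
#!/bin/sh
# Sketch only: do_nothing_init is a hypothetical stand-in for a minimal PID 1.

do_nothing_init() {
    sleeper=
    # On SIGTERM, stop the helper sleep and exit cleanly.
    trap 'kill "$sleeper" 2>/dev/null; exit 0' TERM
    while :; do
        # Sleep in the background and wait on it: the shell runs a pending
        # trap as soon as wait is interrupted, rather than only after a
        # foreground sleep has finished.
        sleep 3600 &
        sleeper=$!
        wait "$sleeper"
    done
}
```

Saved as a script that ends by calling do_nothing_init, it could take the place of 'sleep inf' as the command given to unshare --fork.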


Thank you for reading so far.
regards,
 - assaf

P.S.
I have more questions about proper usage of user-namespace and 
switch_root/pivot_root, but I'll save them for later :)


P.P.S.

The download URL in the 2.29.2 announcement was http://ftp.kernel.org/
and it seems broken:
  $ host ftp.kernel.org
  Host ftp.kernel.org not found: 3(NXDOMAIN)
The working URL seems like 'www.kernel.org' (www. instead of ftp.):
  https://www.kernel.org/pub/linux/utils/util-linux/



* Re: correct usage of unshare+nsenter for persistent namespaces?
  2017-03-10 17:51 correct usage of unshare+nsenter for persistent namespaces? Assaf Gordon
@ 2017-03-27 11:41 ` Karel Zak
  0 siblings, 0 replies; 2+ messages in thread
From: Karel Zak @ 2017-03-27 11:41 UTC (permalink / raw)
  To: Assaf Gordon; +Cc: util-linux

On Fri, Mar 10, 2017 at 05:51:57PM +0000, Assaf Gordon wrote:
> IIUC, this is a kernel limitation:
> If the program which is PID1 inside the container
> terminates, there is no way to re-enter the PID namespace
> (http://man7.org/linux/man-pages/man7/pid_namespaces.7.html).
> 
> Is that correct?

Yes, a PID namespace is strictly tied to the init process inside
that namespace.

> If so, perhaps it would be helpful to add a caveat in the
> unshare/nsenter man pages, saying the PID namespace will
> not persist if the process terminates?

Added.

> There are already some examples of minimal 'init' for containers:
>  https://github.com/Yelp/dumb-init
>  https://github.com/krallin/tini
>  and most minimal: https://gist.github.com/rofl0r/6168719
> 
> I wonder if you will be willing to consider a patch to add
> something like 'unshare --do-nothing-init' which
> will simply create a process that does nothing except handling signals
> and never terminates, to facilitate truly persistent namespaces with
> unshare(1) ? (if so I'm happy to try and write it).

Hmm, when I think about it I'm not able to see any argument against
this feature :-) So go ahead.

The important thing is to keep it simple and stupid and to avoid
arbitrary additional features. I think for serious containers people
will use other solutions (systemd etc.).
 
> The working URL seems like 'www.kernel.org' (www. instead of ftp.):
>  https://www.kernel.org/pub/linux/utils/util-linux/

Ah, thanks! (Seems my template is a little bit obsolete. Fixed.)

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com
