pivot_root(".", ".") and the fchdir() dance

* pivot_root(".", ".") and the fchdir() dance
@ 2019-08-01 13:38 Michael Kerrisk (man-pages)
  2019-08-05 10:36 ` Aleksa Sarai
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Kerrisk (man-pages) @ 2019-08-01 13:38 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andy Lutomirski, Containers, Stéphane Graber,
	Christian Brauner, Al Viro, lkml, linux-man, Jordan Ogas

Hi Serge, Andy, et al,

I've been looking at doing some updates for the rather inaccurate
pivot_root(2) manual page, and I noticed this 2014 commit in LXC

[[commit 2d489f9e87fa0cccd8a1762680a43eeff2fe1b6e
Author: Serge Hallyn <serge.hallyn@ubuntu.com>
Date:   Sat Sep 20 03:15:44 2014 +0000

    pivot_root: switch to a new mechanism (v2)

    This idea came from Andy Lutomirski.  Instead of using a
    temporary directory for the pivot_root put-old, use "." both
    for new-root and old-root.  Then fchdir into the old root
    temporarily in order to unmount the old-root, and finally
    chdir back into our '/'.
]]

I'd like to add some documentation about the pivot_root(".", ".")
idea, but I have a doubt/question. In the lxc_pivot_root() code we
have these steps

        oldroot = open("/", O_DIRECTORY | O_RDONLY | O_CLOEXEC);
        newroot = open(rootfs, O_DIRECTORY | O_RDONLY | O_CLOEXEC);

        fchdir(newroot);
        pivot_root(".", ".");

        fchdir(oldroot);      // ****

        mount("", ".", "", MS_SLAVE | MS_REC, NULL);
        umount2(".", MNT_DETACH);

        fchdir(newroot);      // ****

My question: are the two fchdir() calls marked "****" really
necessary? I suspect not. My reasoning:
1. By this point, both the CWD and root dir of the calling process are
in newroot (and so do not keep newroot busy, and thus don't prevent
the unmount).
2. After the pivot_root() operation, there are two mount points
stacked at "/": oldroot and newroot, with oldroot a child mount
stacked on top of newroot (I did some experiments to verify that this
is so, by examination of /proc/self/mountinfo).
3. The umount(".") operation unmounts the topmost mount from the pair
of mounts stacked at "/".

At least, in some separate tests that I've done, things seem to work
as I describe above without the use of the marked fchdir() calls. (My
tests omit the mount(MS_SLAVE) piece, since in my tests I do a
more-or-less equivalent step at an earlier point.

Am I missing something?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 25+ messages in thread