* pivot_root depreciated?
@ 2014-01-29 15:04 Phillip Susi
  2014-01-29 15:29 ` Dave Reisner
  2014-01-30  8:16 ` Karel Zak
  0 siblings, 2 replies; 12+ messages in thread
From: Phillip Susi @ 2014-01-29 15:04 UTC (permalink / raw)
  To: util-linux

I could have sworn that the pivot_root syscall was considered a
fragile dirty hack that needed to die in a fire, and everyone switched
to run-init years ago.  Am I misremembering things or does anyone know
what I'm talking about?


* Re: pivot_root depreciated?
  2014-01-29 15:04 pivot_root depreciated? Phillip Susi
@ 2014-01-29 15:29 ` Dave Reisner
  2014-01-30  8:16 ` Karel Zak
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Reisner @ 2014-01-29 15:29 UTC (permalink / raw)
  To: Phillip Susi; +Cc: util-linux

On Wed, Jan 29, 2014 at 10:04:40AM -0500, Phillip Susi wrote:
> I could have sworn that the pivot_root syscall was considered a
> fragile dirty hack that needed to die in a fire, and everyone switched
> to run-init years ago.  Am I misremembering things or does anyone know
> what I'm talking about?

pivot_root is still useful. It's used by systemd on shutdown to allow
the system to return to an initramfs where the root filesystem can be
unmounted and disassembled (e.g. close crypto mappings). Both dracut and
mkinitcpio provide support for this.

I'm not familiar with run-init but, glancing at the code, it looks like
it does the same job as util-linux's switch_root.

d

* Re: pivot_root depreciated?
  2014-01-29 15:04 pivot_root depreciated? Phillip Susi
  2014-01-29 15:29 ` Dave Reisner
@ 2014-01-30  8:16 ` Karel Zak
  2014-01-30 14:15   ` Phillip Susi
  1 sibling, 1 reply; 12+ messages in thread
From: Karel Zak @ 2014-01-30  8:16 UTC (permalink / raw)
  To: Phillip Susi; +Cc: util-linux

On Wed, Jan 29, 2014 at 10:04:40AM -0500, Phillip Susi wrote:
> I could have sworn that the pivot_root syscall was considered a
> fragile dirty hack that needed to die in a fire, and everyone switched

 everyone switched to our switch_root :-)

> to run-init years ago.  Am I misremembering things or does anyone know
> what I'm talking about?

 Well, I still see that people use pivot_root.

 Anyway, I don't see any info about pivot_root syscall deprecation in
 Linux kernel source tree.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

* Re: pivot_root depreciated?
  2014-01-30  8:16 ` Karel Zak
@ 2014-01-30 14:15   ` Phillip Susi
  2014-01-30 14:50     ` Thomas Bächler
  2014-01-30 14:54     ` Dave Reisner
  0 siblings, 2 replies; 12+ messages in thread
From: Phillip Susi @ 2014-01-30 14:15 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On 1/30/2014 3:16 AM, Karel Zak wrote:
> everyone switched to our switch_root :-)

Except apparently for the systemd folks, which for some odd reason
like the idea of keeping around the initrd for the life of the system
so init can "return" to it ( yuck! ).

> Anyway, I don't see any info about pivot_root syscall deprecation
> in Linux kernel source tree.

Me neither, but then why switch_root?  I thought the whole reason it
came about was because Linus et al considered pivot_root() to have
been a terrible idea.



* Re: pivot_root depreciated?
  2014-01-30 14:15   ` Phillip Susi
@ 2014-01-30 14:50     ` Thomas Bächler
  2014-01-30 15:20       ` Phillip Susi
  2014-01-31  9:17       ` Karel Zak
  2014-01-30 14:54     ` Dave Reisner
  1 sibling, 2 replies; 12+ messages in thread
From: Thomas Bächler @ 2014-01-30 14:50 UTC (permalink / raw)
  To: Phillip Susi, Karel Zak; +Cc: util-linux

On 30.01.2014 15:15, Phillip Susi wrote:
> On 1/30/2014 3:16 AM, Karel Zak wrote:
>> everyone switched to our switch_root :-)
> 
> Except apparently for the systemd folks, which for some odd reason
> like the idea of keeping around the initrd for the life of the system
> so init can "return" to it ( yuck! ).

That is false. In order to keep the initramfs around, you have to copy
it to a tmpfs, since you can't pivot rootfs. In order to free the memory
occupied by the original initramfs, you still need to run switch_root or
an equivalent.

In particular, if systemd is used in the initramfs, it will do an
equivalent to util-linux's switch_root, but it will use an internal
reimplementation to be able to pass some internal state from the
initrd's systemd to the system's systemd (which would be lost if systemd
would exec an external binary other than systemd).

At least mkinitcpio now generates a fresh "initrd" to return to during
system shutdown instead of keeping around the actual initrd.

>> Anyway, I don't see any info about pivot_root syscall deprecation
>> in Linux kernel source tree.
> 
> Me neither, but then why switch_root?  I thought the whole reason it
> came about was because Linus et al considered pivot_root() to have
> been a terrible idea.

No idea what Linus said or didn't say about it, but here are the facts:

Initrd is a bad idea. It's a ramdisk of fixed size with an actual file
system on it. Instead, we now use initramfs, which is simply an archive
which is extracted directly into rootfs. And as I said above, you can't
pivot rootfs.

So, the old method was:
1) Mount "initrd" on /.
2) Do stuff, mount root on /realroot.
3) Pivot /realroot to / and / to /initrd
4) Exec /sbin/init
5) Unmount /initrd.

The new method is:
1) Extract "initramfs" into rootfs.
2) Do stuff, mount root on /realroot.
3) Delete all contents of / using unlink/rmdir
4) chdir("/realroot"); mount --move /realroot ., chroot(".");
5) Exec /sbin/init.

Steps 3)-5) are what switch_root does. Somewhere in those steps, you also
move all API file system mounts into /realroot, plus maybe some details
that I forgot.

This doesn't mean that pivot_root should be deprecated, it just means
that you cannot use it for switching from initramfs to the system.

Detailed explanations:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/ramfs-rootfs-initramfs.txt

http://git.busybox.net/busybox/tree/util-linux/switch_root.c#n132



* Re: pivot_root depreciated?
  2014-01-30 14:15   ` Phillip Susi
  2014-01-30 14:50     ` Thomas Bächler
@ 2014-01-30 14:54     ` Dave Reisner
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Reisner @ 2014-01-30 14:54 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Karel Zak, util-linux

On Thu, Jan 30, 2014 at 09:15:11AM -0500, Phillip Susi wrote:
> On 1/30/2014 3:16 AM, Karel Zak wrote:
> > everyone switched to our switch_root :-)
> 
> Except apparently for the systemd folks, which for some odd reason
> like the idea of keeping around the initrd for the life of the system
> so init can "return" to it ( yuck! ).
> 

To be clear, systemd uses the syscall, not the util-linux utility. I'm
not sure why you think this is a poor idea when it, in fact, solves real
problems. If your root filesystem resides on a stacked block device
(mdadm, lvm, dmraid, dm-crypt), this is the *only* way to cleanly umount
the filesystem for disassembly. Remounting the filesystem read-only
might not be enough.

Results of not doing this vary. If your root is dm-crypt, you open up
more possibilities of cold boot attacks. If you use mdadm for a fakeraid
array, your fakeraid controller might insist on rebuilding the array on
the next reboot which could take hours.

> > Anyway, I don't see any info about pivot_root syscall deprecation
> > in Linux kernel source tree.
> 
> Me neither, but then why switch_root?  I thought the whole reason it
> came about was because Linus et al considered pivot_root() to have
> been a terrible idea.

My understanding is that pivot_root is a relic from the days of
/dev/initrd. Since 2.6 and the introduction of initramfs, it's no longer
needed for this purpose.

d

* Re: pivot_root depreciated?
  2014-01-30 14:50     ` Thomas Bächler
@ 2014-01-30 15:20       ` Phillip Susi
  2014-01-30 16:13         ` Thomas Bächler
  2014-02-03 10:31         ` Michal Soltys
  2014-01-31  9:17       ` Karel Zak
  1 sibling, 2 replies; 12+ messages in thread
From: Phillip Susi @ 2014-01-30 15:20 UTC (permalink / raw)
  To: Thomas Bächler, Karel Zak; +Cc: util-linux

On 1/30/2014 9:50 AM, Thomas Bächler wrote:
> At least mkinitcpio now generates a fresh "initrd" to return to
> during system shutdown instead of keeping around the actual
> initrd.

I see, so at shutdown the initramfs is re-loaded into a tmpfs that is
then pivot_root()ed to?  And at boot time, pivot_root is not used?

> Initrd is a bad idea. It's a ramdisk of fixed size with an actual
> file system on it. Instead, we now use initramfs, which is simply
> an archive which is extracted directly into rootfs. And as I said
> above, you can't pivot rootfs.

Ok, that sounds like it is what I was trying to remember.  They said
hell no, rootfs is rootfs, and there will be no pivoting it.  I guess
I assumed that meant pivot_root() was dead, not thinking you could
still use it after chrooting to a non rootfs.

> 4) chdir("/realroot"); mount --move /realroot ., chroot(".");

Wait, how do you move a mount into itself?  I thought this was just
the chdir and chroot.



* Re: pivot_root depreciated?
  2014-01-30 15:20       ` Phillip Susi
@ 2014-01-30 16:13         ` Thomas Bächler
  2014-02-03 10:31         ` Michal Soltys
  1 sibling, 0 replies; 12+ messages in thread
From: Thomas Bächler @ 2014-01-30 16:13 UTC (permalink / raw)
  To: Phillip Susi, Thomas Bächler, Karel Zak; +Cc: util-linux

On 30.01.2014 16:20, Phillip Susi wrote:
> On 1/30/2014 9:50 AM, Thomas Bächler wrote:
>> At least mkinitcpio now generates a fresh "initrd" to return to
>> during system shutdown instead of keeping around the actual
>> initrd.
> 
> I see, so at shutdown the initramfs is re-loaded into a tmpfs that is
> then pivot_root()ed to?

Yes. This is the only choice, since you cannot access rootfs anymore, as
that would require unmounting /.

And in our case, it's not "the initramfs" that is put into tmpfs - we
merely install systemd-shutdown as /shutdown and all needed shared
libraries. We use the initramfs generator for convenience, since it
knows how to resolve library dependencies.

> And at boot time, pivot_root is not used?

Correct.

>> 4) chdir("/realroot"); mount --move /realroot ., chroot(".");
> 
> Wait, how do you move a mount into itself?  I thought this was just
> the chdir and chroot.

Hm, there's something wrong here. It should be:

chdir("/realroot");
mount(".", "/", NULL, MS_MOVE, NULL);
chroot(".");

The comment included in the busybox source file which I linked to
explains it all properly.
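
Put together as one function, the corrected sequence might look like the sketch below. Assumptions: switch_root_sketch() and its step-number return value are invented for this example; the chdir("/") after chroot() is not among the three lines above but is commonly added so that the working directory ends up inside the new root; emptying rootfs and moving the API file systems are left out. It needs CAP_SYS_ADMIN and only makes sense when run from an initramfs.

```c
#define _DEFAULT_SOURCE
#include <sys/mount.h>
#include <unistd.h>

/* Switch from rootfs to the filesystem mounted at `newroot` and exec `init`.
 * Returns only on failure: the number of the step that failed (1..5),
 * with errno set by the failing call. */
int switch_root_sketch(const char *newroot, const char *init)
{
    char *const argv[] = { (char *)init, NULL };

    if (chdir(newroot) != 0)                        /* 1: enter the new root */
        return 1;
    if (mount(".", "/", NULL, MS_MOVE, NULL) != 0)  /* 2: move it on top of / */
        return 2;
    if (chroot(".") != 0)                           /* 3: make it our root */
        return 3;
    if (chdir("/") != 0)                            /* 4: cwd inside the new root */
        return 4;

    execv(init, argv);                              /* 5: hand over to the real init */
    return 5;
}
```

A caller in an initramfs /init would use something like switch_root_sketch("/realroot", "/sbin/init") and treat any return as fatal.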



* Re: pivot_root depreciated?
  2014-01-30 14:50     ` Thomas Bächler
  2014-01-30 15:20       ` Phillip Susi
@ 2014-01-31  9:17       ` Karel Zak
  2014-01-31  9:21         ` Thomas Bächler
  1 sibling, 1 reply; 12+ messages in thread
From: Karel Zak @ 2014-01-31  9:17 UTC (permalink / raw)
  To: Thomas Bächler; +Cc: Phillip Susi, util-linux

On Thu, Jan 30, 2014 at 03:50:52PM +0100, Thomas Bächler wrote:
> In particular, if systemd is used in the initramfs, it will do an
> equivalent to util-linux's switch_root, but it will use an internal
> reimplementation to be able to pass some internal state from the
> initrd's systemd to the system's systemd (which would be lost if systemd
> would exec an external binary other than systemd).

Just for the record, switch_root was originally implemented by the
dracut folks and then moved to util-linux. (I guess that dracut
without systemd still uses the original switch_root.)

    Karel


-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

* Re: pivot_root depreciated?
  2014-01-31  9:17       ` Karel Zak
@ 2014-01-31  9:21         ` Thomas Bächler
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Bächler @ 2014-01-31  9:21 UTC (permalink / raw)
  To: Karel Zak, Thomas Bächler; +Cc: Phillip Susi, util-linux

On 31.01.2014 10:17, Karel Zak wrote:
>(I guess that dracut
> without systemd still uses the original switch_root.)

So does mkinitcpio when used without systemd (which is still the
default). We used to use busybox switch_root, but at some point switched
to util-linux's version.




* Re: pivot_root depreciated?
  2014-01-30 15:20       ` Phillip Susi
  2014-01-30 16:13         ` Thomas Bächler
@ 2014-02-03 10:31         ` Michal Soltys
  2014-02-03 11:10           ` Kevin Wilson
  1 sibling, 1 reply; 12+ messages in thread
From: Michal Soltys @ 2014-02-03 10:31 UTC (permalink / raw)
  To: Phillip Susi; +Cc: util-linux, Thomas Bächler, Karel Zak

On 2014-01-30 16:20, Phillip Susi wrote:
> On 1/30/2014 9:50 AM, Thomas Bächler wrote:
>> At least mkinitcpio now generates a fresh "initrd" to return to
>> during system shutdown instead of keeping around the actual
>> initrd.
>
> I see, so at shutdown the initramfs is re-loaded into a tmpfs that is
> then pivot_root()ed to?  And at boot time, pivot_root is not used?
>

Initially, when the initramfs prepares /run and so on (and switch_root
moves it to the real root), a workable "minisystem" is kept under
/run/initramfs (or similar).

On shutdown, /run/initramfs is bind-mounted to itself, which makes it
pivotable, with the old root available at, say, /oldroot. Now your aim is
to cleanly unmount the old root. In the classic non-systemd case, you
would need:

- telinit u (graceful re-exec of init)
- mdmon --takeover (if you have non-native raid handled by md, to 
gracefully reexec the daemon)
- etc.

This allows all open files on the real root to be closed (things are
usually tried in a loop, with attempts to stop stuff like lvm, md, ...)
and the old root to finally be unmounted.
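
A rough sketch of the pivot part of that shutdown sequence follows. Assumptions: new_root would be something like /run/initramfs and already contains an empty directory for the old root (passed as put_old, e.g. "oldroot"); the function names are made up for illustration. pivot_root is invoked through syscall(2), since the C library may not declare a wrapper for it. The mount propagation requirements listed in pivot_root(2) are not handled here.

```c
#define _DEFAULT_SOURCE
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw system call. */
static int sys_pivot_root(const char *new_root, const char *put_old)
{
    return (int)syscall(SYS_pivot_root, new_root, put_old);
}

/* Make `new_root` the root filesystem and park the old root at
 * `put_old`, a directory name relative to `new_root`.
 * Returns 0 on success, otherwise the number of the failed step. */
int pivot_to_shutdown_root(const char *new_root, const char *put_old)
{
    /* 1: bind-mount the directory onto itself so that it is a mount point */
    if (mount(new_root, new_root, NULL, MS_BIND, NULL) != 0)
        return 1;
    /* 2: work relative to the new root */
    if (chdir(new_root) != 0)
        return 2;
    /* 3: swap roots; the old root becomes visible under put_old */
    if (sys_pivot_root(".", put_old) != 0)
        return 3;
    /* 4: make sure the working directory is the new / */
    if (chdir("/") != 0)
        return 4;
    return 0;
}
```

After a successful call, the remaining work is what the text above describes: get every process off the old root, stop lvm/md and friends, then unmount /oldroot.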

Of course there is also the question of why one should bother doing all
this, when a simple ro remount before halt has been working fine for
decades (and still works) ....

Dracut (and others) /could/ be doing an even more clever thing (hmm,
maybe I should review certain old patches of mine): during boot, before
starting everything, it could prepare a mini-root on tmpfs (some copying,
some symlinks and/or PATH stuff), then move that tmpfs with the living
mini-root to the real root. The small advantage of that is that you can
at any moment chroot to such a miniroot and do some maintenance on, e.g.,
a boot-critical storage daemon that lacks the ability to re-execute
itself gracefully.

Another interesting use for the pivot_root() call, when paired with mount
namespaces, is the ability to provide stronger chroot jails.
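
As an illustration of the jail idea, here is a minimal sketch, assuming the caller has CAP_SYS_ADMIN and that new_root is a prepared directory tree. enter_jail() is a made-up name. The pivot_root(".", ".") followed by a lazy unmount is the variant described in the pivot_root(2) man page, which avoids needing a put_old directory. Unlike a plain chroot(), the old root is no longer mounted anywhere in the new namespace afterwards.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Confine the calling process to `new_root` using a private mount
 * namespace. Returns 0 on success, otherwise the number of the failed step. */
int enter_jail(const char *new_root)
{
    /* 1: get our own copy of the mount table */
    if (unshare(CLONE_NEWNS) != 0)
        return 1;
    /* 2: stop mount events from propagating back to the host */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
        return 2;
    /* 3: new_root must be a mount point, so bind-mount it onto itself */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) != 0)
        return 3;
    /* 4: enter it */
    if (chdir(new_root) != 0)
        return 4;
    /* 5: swap roots; the old root ends up stacked on the same spot */
    if (syscall(SYS_pivot_root, ".", ".") != 0)
        return 5;
    /* 6: lazily detach the old root from this namespace */
    if (umount2(".", MNT_DETACH) != 0)
        return 6;
    /* 7: land in the new / */
    if (chdir("/") != 0)
        return 7;
    return 0;
}
```

A real container runtime does considerably more (other namespaces, dropping capabilities, and so on); this only shows where pivot_root fits.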

So the call itself is still pretty useful. It hasn't been used during
boot for years, but other possibilities have emerged for it =)


* Re: pivot_root depreciated?
  2014-02-03 10:31         ` Michal Soltys
@ 2014-02-03 11:10           ` Kevin Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Kevin Wilson @ 2014-02-03 11:10 UTC (permalink / raw)
  To: Michal Soltys; +Cc: Phillip Susi, util-linux, Thomas Bächler, Karel Zak

Hi all,
The pivot_root() system call is also used by the LXC project (LinuX
Containers). This project has recently been gaining a lot of popularity,
especially due to the Docker container engine;
see:
https://github.com/lxc/lxc/blob/master/src/lxc/conf.c#L1067

http://linuxcontainers.org/

Regards,
Kevin
