linux-kernel.vger.kernel.org archive mirror
* [PATCH][CFT] per-process namespaces for Linux
@ 2001-02-25  4:16 Alexander Viro
  2001-02-26 16:26 ` Peter J. Braam
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Viro @ 2001-02-25  4:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

	He's back. And this time he's got a chainsaw.

	Yes, folks. We got per-process namespaces. Working. With proper
behaviour on exit(), yodda, yodda. Enjoy. Help with testing would be more
than welcome.

Current patch is on ftp.math.psu.edu/pub/viro/namespaces-S2.gz
It's against 2.4.2.

Contents:
	* proper refcounting of struct super_block
	* GC for vfsmounts (finally)
	* fix for races between get_super() and umount()
	* SMP-safe lock_super()
	* general cleanup of fs/super.c
	* "lazy" option for umount() (detach from the mountpoint now, do the
rest once it ceases to be busy - use MNT_DETACH in the 'flags' argument
to get that behaviour).
	* Plan 9 per-process namespaces (sans unions so far)
	* large cleanup of boot process (ramdisk handling, etc.)
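
For anyone who wants to poke at the lazy umount from C, here is a minimal
sketch. It assumes a libc that exposes umount2() and MNT_DETACH through
<sys/mount.h>; the helper name lazy_umount() is made up for illustration
and is not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: detach `path` from the tree right away and let the
 * kernel finish the unmount once the filesystem is no longer busy.
 * Returns 0 on success, -1 with errno set on failure. */
int lazy_umount(const char *path)
{
    assert(path != NULL);
    if (umount2(path, MNT_DETACH) == -1) {
        int saved = errno;          /* fprintf() may clobber errno */
        fprintf(stderr, "umount2(%s, MNT_DETACH): %s\n", path, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

Calling lazy_umount("/mnt/busy") as root should return immediately even
while processes still hold files open under that mountpoint.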

Variant without namespaces (they were the last part) is in the same
directory, called s_lock-S2.gz.

rfork.c (in the same place) will copy a namespace and start shell in it.
Use for testing... It's an equivalent of rfork(RFNAMEG) on Plan 9.
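
What such a test program boils down to can be sketched as follows. This is
not rfork.c itself, just an approximation under stated assumptions: it uses
the unshare system call found in later kernels instead of clone(), it
marks the copied tree private in case the system uses shared mount
propagation, and the names private_namespace() and namespace_shell() are
hypothetical.

```c
#include <assert.h>
#include <linux/sched.h>    /* CLONE_NEWNS */
#include <stdio.h>
#include <sys/mount.h>
#include <sys/syscall.h>    /* SYS_unshare */
#include <unistd.h>

/* Hypothetical helper: give the calling process its own copy of the mount
 * namespace. Needs CAP_SYS_ADMIN. Returns 0 on success, -1 with errno set. */
int private_namespace(void)
{
    /* Raw syscall, so the sketch does not depend on feature-test macros. */
    if (syscall(SYS_unshare, CLONE_NEWNS) == -1)
        return -1;
    /* Assumption: stop later mounts from propagating back to the parent. */
    if (mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return -1;
    return 0;
}

/* Copy the namespace, then replace this process with `shell` inside it.
 * Only returns (with -1) if something failed. */
int namespace_shell(const char *shell)
{
    assert(shell != NULL);
    if (private_namespace() == -1) {
        perror("private_namespace");
        return -1;
    }
    execl(shell, shell, (char *)NULL);
    perror("execl");            /* only reached if exec failed */
    return -1;
}
```

Mounts and umounts done from a shell started via namespace_shell("/bin/sh")
should then stay invisible to the rest of the system.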

One detail - patch requires ramfs built into the kernel (boot process cleanup
part needs that).

It works here (ran for about 12 hours with no problems). It's _NOT_ for
inclusion into 2.4. Some pieces might go (get_super() races have to be
fixed, after all), but most of this stuff is 2.5 fodder. However, it
seems to be working. No doubt there are bugs and it's far from being
a final version. I would call it _very_ early beta. Please, help with
testing.

Comments on the code/design/amount of dope it took to write the thing (zero,
actually) are welcome. I _will_ document it, but it's still not in the
final form. Pretty close to it, hopefully, but...

I'm more than willing to answer questions on the design of the thing - just
ask. So far that's the best I can do - all documentation is a pile of notes
+ CVS log.

							Cheers,
								Al
PS: hopefully - back for good.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25  4:16 [PATCH][CFT] per-process namespaces for Linux Alexander Viro
@ 2001-02-26 16:26 ` Peter J. Braam
  2001-02-26 20:23   ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Peter J. Braam @ 2001-02-26 16:26 UTC (permalink / raw)
  To: Alexander Viro, linux-fsdevel, linux-kernel; +Cc: Ronald G. Minnich

Hi Al,

Very neat!

Ron Minnich and I built something similar: we built private namespaces for
login sessions.  Ours have slightly different semantics I think.

To do so we changed mount+chroot into "imount" (i = invisible).  This landed
a process in a file system that had no root in the Unix directory tree.
(see the "Private name spaces, PNS" project on SourceForge).

We added another goodie, which was called "memdev".  It provided a new block
device from a private, i.e. copy on write, memory mapped block device.  See
"memdev" on SourceForge.

We used it as follows:

 - when you login, you get imounted into an environment where you have full
privileges (except mknod).  The "/" of your environment is not a directory
in the Unix tree.
 - in this environment the system file systems are available to you on a
copy on write private basis.
 - any files you change get out over a network file system to a server.  We
used InterMezzo backed by a ramfs cache.

When the user logs out, everything is gone, except possibly footprints in
swap.

- Peter J. Braam -

Mountain View Data, Inc.



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-26 16:26 ` Peter J. Braam
@ 2001-02-26 20:23   ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2001-02-26 20:23 UTC (permalink / raw)
  To: Peter J. Braam
  Cc: Alexander Viro, linux-fsdevel, linux-kernel, Ronald G. Minnich

On Mon, Feb 26, 2001 at 08:26:23AM -0800, Peter J. Braam wrote:
>  - when you login, you get imounted into an environment where you have full
> privileges (except mknod).  The "/" of your environment is not a directory
> in the Unix tree.
>  - in this environment the system file systems are available to you on a
> copy on write private basis.
>  - any files you change get out over a network file system to a server.  We
> used InterMezzo backed by a ramfs cache.
> 
> When the user logs out, everything is gone, except possibly footprints in
> swap.

These changes can be used separately, can't they?
I'd really like to use them with Al's more generic namespaces concept.
One thing that worries me is that his patch wants special privileges for
creating a new namespace and I wonder if we really want that...

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.


* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-28 19:18                               ` Alexander Viro
@ 2001-02-28 20:17                                 ` Ion Badulescu
  0 siblings, 0 replies; 29+ messages in thread
From: Ion Badulescu @ 2001-02-28 20:17 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, linux-fsdevel

On Wed, 28 Feb 2001, Alexander Viro wrote:

> > And disadvantages: you can't have broken symlinks.
> > 
> > This actually turns out to be quite a bit of a problem when one tries
> > to use bind mounts with autofs. For one thing, it's perfectly legal
> > to have /autofs/foo as a symlink to /autofs/bar/foo, where /autofs/bar
> > is not yet mounted -- but a bind mount can't handle that...
> 
> First of all, you still have symlinks. 

Oh yeah, of course. :-)

> What's more, the right solution is to use local objects at the
> mountpoints. And forget about having a small tree full of links to
> real mountpoints. Think of autofs-with-one-node.

That's what Sun's autofs and am-utils call 'direct mounts', which are not 
yet supported by our autofs (unless I missed something recently). Direct 
mounts are good for some things, but not for everything. In particular, 
they are useless for cascading auto-triggered mounts (think 
/usr/local/src, /usr/local, and /usr, all automounted).

[and, btw, Linux _still_ doesn't properly support am-utils' direct mounts, 
although all that's needed is to remove LOOKUP_FOLLOW from path_init in 
sys_umount...]

As for bind mounts, I'll probably revisit them after I'm done with the 
Solaris autofs support in am-utils -- which will probably be a while. If I 
can get the thing to chain-trigger all the necessary mounts, we might be 
able to do something useful with it..

Thanks,
Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-28 19:06                             ` Ion Badulescu
@ 2001-02-28 19:18                               ` Alexander Viro
  2001-02-28 20:17                                 ` Ion Badulescu
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Viro @ 2001-02-28 19:18 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: linux-kernel, linux-fsdevel



On Wed, 28 Feb 2001, Ion Badulescu wrote:

> On Wed, 28 Feb 2001 13:07:29 -0500 (EST), Alexander Viro <viro@math.psu.edu> wrote:
> 
> > On Wed, 28 Feb 2001, David L. Parsley wrote:
> 
> >> Yeah, mount --bind is cool, I've been using it on one of my projects
> >> today.  But - maybe I'm just not thinking creatively enough - what are
> >> the advantages of mount --bind versus just symlinking?
> > 
> > 1) Correctly working ".." (obviously relevant only for directories)
> > 2) Try to create symlinks on read-only NFS mount. For bonus points, try
> > to do that on one client without disturbing everybody else.
> > 3) Try to make it different for different users, for that matter.
> 
> And disadvantages: you can't have broken symlinks.
> 
> This actually turns out to be quite a bit of a problem when one tries
> to use bind mounts with autofs. For one thing, it's perfectly legal
> to have /autofs/foo as a symlink to /autofs/bar/foo, where /autofs/bar
> is not yet mounted -- but a bind mount can't handle that...

First of all, you still have symlinks. What's more, the right solution
is to use local objects at the mountpoints. And forget about having a
small tree full of links to real mountpoints. Think of autofs-with-one-node.



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-28 18:07                           ` Alexander Viro
@ 2001-02-28 19:06                             ` Ion Badulescu
  2001-02-28 19:18                               ` Alexander Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Ion Badulescu @ 2001-02-28 19:06 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, linux-fsdevel

On Wed, 28 Feb 2001 13:07:29 -0500 (EST), Alexander Viro <viro@math.psu.edu> wrote:

> On Wed, 28 Feb 2001, David L. Parsley wrote:

>> Yeah, mount --bind is cool, I've been using it on one of my projects
>> today.  But - maybe I'm just not thinking creatively enough - what are
>> the advantages of mount --bind versus just symlinking?
> 
> 1) Correctly working ".." (obviously relevant only for directories)
> 2) Try to create symlinks on read-only NFS mount. For bonus points, try
> to do that on one client without disturbing everybody else.
> 3) Try to make it different for different users, for that matter.

And disadvantages: you can't have broken symlinks.

This actually turns out to be quite a bit of a problem when one tries
to use bind mounts with autofs. For one thing, it's perfectly legal
to have /autofs/foo as a symlink to /autofs/bar/foo, where /autofs/bar
is not yet mounted -- but a bind mount can't handle that...

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.


* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-28  7:14                       ` Alexander Viro
@ 2001-02-28 18:13                         ` David L. Parsley
  2001-02-28 18:07                           ` Alexander Viro
  0 siblings, 1 reply; 29+ messages in thread
From: David L. Parsley @ 2001-02-28 18:13 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, linux-fsdevel

Alexander Viro wrote:
> > Evil idea of the day: non-directory (even non-existent) mount points and
> > non-directory mounts. So then "mount --bind /etc/foo /dev/bar" works.
> 
> Try it. It _does_ work.

Yeah, mount --bind is cool, I've been using it on one of my projects
today.  But - maybe I'm just not thinking creatively enough - what are
the advantages of mount --bind versus just symlinking?

Also, I tried mount --bind fileone filetwo, and it fails if filetwo
doesn't exist. ('mount point filetwo doesn't exist').  Is that supposed
to work?  (using mount from latest redhat beta)

BTW, pivot_root is nifty, too. ;-)

regards,
	David

-- 
David L. Parsley
Network Administrator
Roanoke College


* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-28 18:13                         ` David L. Parsley
@ 2001-02-28 18:07                           ` Alexander Viro
  2001-02-28 19:06                             ` Ion Badulescu
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Viro @ 2001-02-28 18:07 UTC (permalink / raw)
  To: David L. Parsley; +Cc: linux-kernel, linux-fsdevel



On Wed, 28 Feb 2001, David L. Parsley wrote:

> Alexander Viro wrote:
> > > > Evil idea of the day: non-directory (even non-existent) mount points and
> > > non-directory mounts. So then "mount --bind /etc/foo /dev/bar" works.
> > 
> > Try it. It _does_ work.
> 
> Yeah, mount --bind is cool, I've been using it on one of my projects
> today.  But - maybe I'm just not thinking creatively enough - what are
> the advantages of mount --bind versus just symlinking?

1) Correctly working ".." (obviously relevant only for directories)
2) Try to create symlinks on read-only NFS mount. For bonus points, try
to do that on one client without disturbing everybody else.
3) Try to make it different for different users, for that matter.

> Also, I tried mount --bind fileone filetwo, and it fails if filetwo
> doesn't exist. ('mount point filetwo doesn't exist').  Is that supposed
> to work?  (using mount from latest redhat beta)

Nope. It does exactly what it should - changing that is too large a
can of worms and I simply don't want to touch it.

> BTW, pivot_root is nifty, too. ;-)

Thank Werner for that ;-)



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-28  7:03                     ` Albert D. Cahalan
  2001-02-28  7:14                       ` Alexander Viro
@ 2001-02-28  7:51                       ` Alexander Viro
  1 sibling, 0 replies; 29+ messages in thread
From: Alexander Viro @ 2001-02-28  7:51 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: linux-fsdevel, linux-kernel



On Wed, 28 Feb 2001, Albert D. Cahalan wrote:

> Alexander Viro writes:
> 
> > 	* CLONE_NEWNS is made root-only (CAP_SYS_ADMIN, actually)
> 
> Would an unprivileged version that killed setuid be OK to have?
> 
> Evil idea of the day: non-directory (even non-existent) mount points and
> non-directory mounts. So then "mount --bind /etc/foo /dev/bar" works.

BTW, out of curiosity: what's that evil about non-directory mounts?
You obviously shouldn't mix directories with non-directories in that
context (userland will not take that lightly, same as with rename(),
etc.), but binding a non-directory over non-directory... Why not?
Me, I'm playing with
% mount -t devloop /tmp/image /dev/loop0 -o offset=4096
Yes, in that order. /dev/loop0 is the mountpoint here. ioctls? We don't
need no stinkin' ioctls. Now, _that_ I would call evil... Pretty simple,
actually - filesystem with ->read_super() making ->s_root not a directory
but a block device. And setting it up (lo_set_fd() with small modifications).
Still alpha, requires namespace patch (or at least s_lock one), but seems
to be working. Simpler than loop.c in official tree, BTW - no ioctls, no
handling pending requests since we unset device only upon umount, when
we have nobody keeping it open. losetup? What losetup? Shell script, if
somebody would bother to write it (going through losetup options and turning
them into mount ones).



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-28  7:03                     ` Albert D. Cahalan
@ 2001-02-28  7:14                       ` Alexander Viro
  2001-02-28 18:13                         ` David L. Parsley
  2001-02-28  7:51                       ` Alexander Viro
  1 sibling, 1 reply; 29+ messages in thread
From: Alexander Viro @ 2001-02-28  7:14 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: linux-fsdevel, linux-kernel



On Wed, 28 Feb 2001, Albert D. Cahalan wrote:

> Alexander Viro writes:
> 
> > 	* CLONE_NEWNS is made root-only (CAP_SYS_ADMIN, actually)
> 
> Would an unprivileged version that killed setuid be OK to have?

Not until we get decent resource accounting here.

> Evil idea of the day: non-directory (even non-existent) mount points and
> non-directory mounts. So then "mount --bind /etc/foo /dev/bar" works.

Try it. It _does_ work.



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-26 16:43                   ` Alexander Viro
  2001-02-27 20:08                     ` Alexander Viro
@ 2001-02-28  7:03                     ` Albert D. Cahalan
  2001-02-28  7:14                       ` Alexander Viro
  2001-02-28  7:51                       ` Alexander Viro
  1 sibling, 2 replies; 29+ messages in thread
From: Albert D. Cahalan @ 2001-02-28  7:03 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel

Alexander Viro writes:

> 	* CLONE_NEWNS is made root-only (CAP_SYS_ADMIN, actually)

Would an unprivileged version that killed setuid be OK to have?

Evil idea of the day: non-directory (even non-existent) mount points and
non-directory mounts. So then "mount --bind /etc/foo /dev/bar" works.


* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-26 16:43                   ` Alexander Viro
@ 2001-02-27 20:08                     ` Alexander Viro
  2001-02-28  7:03                     ` Albert D. Cahalan
  1 sibling, 0 replies; 29+ messages in thread
From: Alexander Viro @ 2001-02-27 20:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel



 	New version uploaded on ftp.math.psu.edu/pub/viro/namespaces-d-S2.gz

Changes:
	* fixed an idiotic bug in get_filesystem_info() that didn't
(unfortunately) show up on UP.
	* nosuid/nodev/noexec work in any combinations (had been b0rken in
previous version).
	* fixed multiple-mount (had been b0rken; --bind worked, but attempt
to mount the device you've already had mounted did bad things).
	* sanity checks for mount --move were missing. Fixed.
	* Assorted cleanups.

Folks, please help with testing.
 							Cheers,
								Al



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25 16:04 ` Alexander Viro
  2001-02-25 19:01   ` Sandy Harris
@ 2001-02-27  9:50   ` David Woodhouse
  1 sibling, 0 replies; 29+ messages in thread
From: David Woodhouse @ 2001-02-27  9:50 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Manfred Spraul, linux-kernel


viro@math.psu.edu said:
> > Have you thought about supporting .tar.gz into ramfs? Creating custom
> > boot images would be simpler.

> *uh*. It's definitely easier to do than it used to be, but I'm
> seriously sceptical about adding more cruft into the thing.

The really neat part of untarring into a ramfs-root is that it allows you 
to remove a whole pile of other unnecessary cruft - ll_rw_blk and all the 
other crap for dealing with block devices and buffer_heads. 

CONFIG_BLK_DEV=n

--
dwmw2




* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-26 12:51                 ` Alexander Viro
@ 2001-02-26 16:43                   ` Alexander Viro
  2001-02-27 20:08                     ` Alexander Viro
  2001-02-28  7:03                     ` Albert D. Cahalan
  0 siblings, 2 replies; 29+ messages in thread
From: Alexander Viro @ 2001-02-26 16:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

	New version uploaded on ftp.math.psu.edu/pub/viro/namespaces-a-S2.gz
Changes:
	* nosuid, nodev and noexec are per-mountpoint now.
	* new flag for mount() - MS_MOVE (move a subtree, probable syntax
for mount(8) - mount --move old new; old must be a mountpoint)
	* Fixes for "lazy" umount.
	* CLONE_NEWNS is made root-only (CAP_SYS_ADMIN, actually)

Folks, please help with testing. Again, It Works Here(tm).
							Cheers,
								Al



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-26 11:54               ` Marco d'Itri
@ 2001-02-26 12:51                 ` Alexander Viro
  2001-02-26 16:43                   ` Alexander Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Viro @ 2001-02-26 12:51 UTC (permalink / raw)
  To: Marco d'Itri; +Cc: linux-kernel



On Mon, 26 Feb 2001, Marco d'Itri wrote:

> On Feb 26, Alexander Viro <viro@math.psu.edu> wrote:
> 
>  >There is no way to implement them without credentials' cache. Which needs
>  >to be done for many other reasons, but that's a separate patch and
>  >separate story. If it's done - no serious penalty involved. However,
>  >I doubt that we want a union on / itself. /dev - sure, /bin and /lib -
>  >maybe, but /... What for?
> What I'd really like to do is remount / somewhere with mount --bind,
> mount over it another skeleton file system which hides setuid programs
> and some directories and then run a chrooted sshd in the new root.
> If I'm not missing something, this would make creation of secure chroot
> environments very easy.

I'm making NOSUID per-mountpoint. So
	pid = clone(CLONE_NEWNS,0);
	if (!pid) {
		...
		remount everything with nosuid
		exec sshd
	}
should be OK
As for hiding the directories - also easy, mount --bind an empty 
immutable directory over each of them.

NODEV is also easy to make per-mountpoint, but readonly may be trickier;
we need permission() to take vfsmount+dentry instead of inode for that.
Doable, but will touch quite a few places.
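
The clone()/remount/exec outline above could be fleshed out roughly as
below. This is only a sketch under several assumptions: it uses fork()
plus the unshare system call rather than clone(), it remounts only the
mount at "/" (a complete version would have to walk every mount in the
namespace), it assumes MS_REMOUNT | MS_BIND is the way to change flags on a
single mountpoint, and run_nosuid() is a made-up name.

```c
#include <assert.h>
#include <linux/sched.h>    /* CLONE_NEWNS */
#include <sys/mount.h>
#include <sys/syscall.h>    /* SYS_unshare */
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] in a child whose private namespace has
 * "/" remounted nosuid. Returns the child's exit status (127 if namespace
 * setup or exec failed), or -1 if fork/wait failed or the child died
 * from a signal. */
int run_nosuid(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (syscall(SYS_unshare, CLONE_NEWNS) == -1)
            _exit(127);
        /* Keep our changes from propagating back to the parent's tree. */
        if (mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
            _exit(127);
        /* Per-mountpoint flag change: this mount, this namespace only. */
        if (mount("none", "/", NULL,
                  MS_REMOUNT | MS_BIND | MS_NOSUID, NULL) == -1)
            _exit(127);
        execv(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent's view of the tree is untouched either way, which is the point
of doing the remount after the namespace has been copied.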



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-26  0:26             ` Alexander Viro
@ 2001-02-26 11:54               ` Marco d'Itri
  2001-02-26 12:51                 ` Alexander Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Marco d'Itri @ 2001-02-26 11:54 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel

On Feb 26, Alexander Viro <viro@math.psu.edu> wrote:

 >There is no way to implement them without credentials' cache. Which needs
 >to be done for many other reasons, but that's a separate patch and
 >separate story. If it's done - no serious penalty involved. However,
 >I doubt that we want a union on / itself. /dev - sure, /bin and /lib -
 >maybe, but /... What for?
What I'd really like to do is remount / somewhere with mount --bind,
mount over it another skeleton file system which hides setuid programs
and some directories and then run a chrooted sshd in the new root.
If I'm not missing something, this would make creation of secure chroot
environments very easy.

 >Tomorrow I'll try to catch Erik and talk with him about that. I'm not sure
 >that I know anyone in Debian Install System Team (oh, boy... somebody sure
Just write to debian-boot@lists.debian.org.

-- 
ciao,
Marco



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-26  1:14 Andries.Brouwer
@ 2001-02-26  1:39 ` Alexander Viro
  0 siblings, 0 replies; 29+ messages in thread
From: Alexander Viro @ 2001-02-26  1:39 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: Werner.Almesberger, linux-kernel



On Mon, 26 Feb 2001 Andries.Brouwer@cwi.nl wrote:

> > BTW, we probably want to add mount --move <old> <new> - atomically moving
> > a subtree from one place to another. Code is there, we just need to
> > decide on API. Andries?
> 
> Since we already have "mount --bind olddir newdir" this is not
> an unreasonable extension of the mount(8) syntax.
> And since the kernel is no longer so interested in coeds as
> some former mount author, we have lots of free bits.

/me scratches head and tries to figure out what "coed" means...
<looking into webster>
C|N>K
<adding l-k to the "don't drink coffee while reading that" list>

> There are even old bits.
> 
> #define MS_MOVE	0x2000

Works for me...
							Cheers,
								Al



* Re: [PATCH][CFT] per-process namespaces for Linux
@ 2001-02-26  1:14 Andries.Brouwer
  2001-02-26  1:39 ` Alexander Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Andries.Brouwer @ 2001-02-26  1:14 UTC (permalink / raw)
  To: Werner.Almesberger, viro; +Cc: linux-kernel

> BTW, we probably want to add mount --move <old> <new> - atomically moving
> a subtree from one place to another. Code is there, we just need to
> decide on API. Andries?

Since we already have "mount --bind olddir newdir" this is not
an unreasonable extension of the mount(8) syntax.
And since the kernel is no longer so interested in coeds as
some former mount author, we have lots of free bits.
There are even old bits.

#define MS_MOVE	0x2000

Andries


* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25 23:51           ` Werner Almesberger
@ 2001-02-26  0:26             ` Alexander Viro
  2001-02-26 11:54               ` Marco d'Itri
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Viro @ 2001-02-26  0:26 UTC (permalink / raw)
  To: Werner Almesberger; +Cc: Andries Brouwer, linux-kernel



On Mon, 26 Feb 2001, Werner Almesberger wrote:

> Alexander Viro wrote:
> > No. Just an overmount.
> 
> Ah, too bad. Union mounts would have been really elegant (allowing the
> operation to be repeated without residues, and also allowing umounting
> of the covered FS as a sanity check). But I guess there's no way to
> implement them without performance penalty ...

There is no way to implement them without credentials' cache. Which needs
to be done for many other reasons, but that's a separate patch and
separate story. If it's done - no serious penalty involved. However,
I doubt that we want a union on / itself. /dev - sure, /bin and /lib -
maybe, but /... What for?
 
> > Is it worth emptying?
> 
> Probably not ... the only interesting case would be if you could completely
> umount it.

What's the point in unmounting it? Let the root of the mount tree be fixed -
it actually simplifies the things big way. Not that we had any performance
penalty for having the thing in place - after this forced chroot we never
touch it in lookups. BTW, pivot_root() is simpler that way.

BTW, we probably want to add mount --move <old> <new> - atomically moving
a subtree from one place to another. Code is there, we just need to
decide on API. Andries?

> So with some luck, distributors will switch to pivot_root sometime soon,
> when deploying 2.4. So if we drop all the old junk in 2.5, the amount of
> letter bombs should be small ;-)

Tomorrow I'll try to catch Erik and talk with him about that. I'm not sure
that I know anyone in Debian Install System Team (oh, boy... somebody sure
loved capital letters). And I've absolutely no idea who is doing that stuff
in other distributions...
							Cheers,
								Al



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25 22:39         ` Alexander Viro
@ 2001-02-25 23:51           ` Werner Almesberger
  2001-02-26  0:26             ` Alexander Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Werner Almesberger @ 2001-02-25 23:51 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel

Alexander Viro wrote:
> No. Just an overmount.

Ah, too bad. Union mounts would have been really elegant (allowing the
operation to be repeated without residues, and also allowing umounting
of the covered FS as a sanity check). But I guess there's no way to
implement them without performance penalty ...

> Is it worth emptying?

Probably not ... the only interesting case would be if you could completely
umount it.

> BTW, Werner - could you take a look at the
> prepare_namespace()/handle_initrd()?

Okay, I'll have a look.

> That's our late boot process taken into one place. I'm really not happy
> about the following:

Agreed on all three counts. Also, change_root might just die by evolution,
just like most of NFS-root-from-initrd (using change_root) died.

What we need is a migration plan. Right now, it seems that most people
still use change_root. Hopefully they read the little message I left them
in linux/Documentation/initrd.txt:

  Current kernels still support it, but you should _not_ rely on its
  continued availability.

So with some luck, distributors will switch to pivot_root sometime soon,
when deploying 2.4. So if we drop all the old junk in 2.5, the amount of
letter bombs should be small ;-)
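
For reference, the core of a pivot_root-based switch might look like the
sketch below. It assumes the raw system call is reachable through
syscall(SYS_pivot_root, ...) and that put_old lies at or underneath
new_root; switch_root() is a hypothetical helper name, and a real
/linuxrc would go on to exec init and umount the old root.

```c
#include <assert.h>
#include <sys/syscall.h>    /* SYS_pivot_root */
#include <unistd.h>

/* Hypothetical helper: make `new_root` the root filesystem and move the
 * old root to `put_old`. Returns 0 on success, -1 with errno set. */
int switch_root(const char *new_root, const char *put_old)
{
    assert(new_root != NULL && put_old != NULL);
    if (syscall(SYS_pivot_root, new_root, put_old) == -1)
        return -1;
    /* Make sure the working directory is inside the new root. */
    return chdir("/");
}
```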

> Again, current patch reproduces the behaviour of the main tree.

Since you've already done all the work ... ;-) It's good if we can make
one change at a time.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, ICA, EPFL, CH           Werner.Almesberger@epfl.ch /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/


* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25 21:57       ` Werner Almesberger
@ 2001-02-25 22:39         ` Alexander Viro
  2001-02-25 23:51           ` Werner Almesberger
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Viro @ 2001-02-25 22:39 UTC (permalink / raw)
  To: Werner Almesberger; +Cc: linux-kernel



On Sun, 25 Feb 2001, Werner Almesberger wrote:

> Alexander Viro wrote:
> > No kludges actually needed. "Simplified boot sequence" _is_ simplified -
> > we overmount the "final" root over ramfs. Initially empty. So you have
> > the normal environment when you load ramdisk, etc.
> 
> So is this the Holy Grail, err, union mount we've discussed about one year
> ago ? I.e.

No. Just an overmount. Final root ends up mounted atop of absolute root -
see comments in fs/super.c:mount_root() and in init/do_mounts.c

> stat foo	# output A
> mount /dev/whatever /
> stat foo	# output B
> 
> with A != B ?

We end with forced chroot to covering one. Due to details of path_walk()
it's unbreakable even for root (well, barring the direct access to
kernel data structures via /dev/kmem ;-)

So yes, you'll see the covering fs. Check do_chroot() in init/do_mounts.c
 
> If yes, is there also a way to destroy/empty ramfs after this ?

At the end of boot process we are left with (at most) 8 dentries, 8 inodes and
no data pages on ramfs. Is it worth emptying? I can do that (reduce to
1 dentry/1 inode), just add sys_rmdir() and sys_unlink() calls in
the end of init/do_mounts.c:prepare_namespace(), but I don't really
see the point of it.

Fs _is_ covered - you don't get its objects after mounting the final root.

BTW, Werner - could you take a look at the prepare_namespace()/handle_initrd()?
That's our late boot process taken into one place. I'm really not happy
about the following:
	a) initrd with /linuxrc exec'ing init leaves init with PID > 1.
Is it a good idea? I've reproduced the behaviour we have in the main tree,
but I have a bad feeling about it. For one thing, init is killable that
way. Not good...
	b) can we _please_ kill the real_root_dev sysctl?
	c) you had plans for mandating non-exiting /linuxrc. What's the status
of these plans? I'd be glad if we could pull that one off... More than
half of handle_initrd() implements the behaviour for the case when /linuxrc
does exit and I would be only happy to remove that cruft. AFAICS both
RH and Debian have /linuxrc that _does_ exit, though...

Again, current patch reproduces the behaviour of the main tree. Every
boot setup that used to work should stay working - that was the design
goal. I want to, erm, concentrate the existing logic in one place
and make it readable before even thinking of changing behaviour.

I've tested it with all combinations that end up with root on local
fs (initrd or not, ramdisks from floppies, devfs mounted or not and
their combinations, with different variants of /linuxrc in cases that
did initrd).  I didn't do exhaustive testing for NFS-root. If someone
can find a setup that works with official tree and doesn't work with 
the patched one - yell. I consider that as a bug.

BTW, people with rootfs=... patches may find that in this variant
their patches would become _much_ simpler - they can actually call sys_mount()
to mount the final root.

Don't get me wrong - I would be glad to see both rootfs=... and tar patches
done atop of that namespace/s_lock (and archive them, keep up-to-date, put
on FTP, etc).  Just don't expect me to _merge_ them until their counterparts
get merged into Linus' tree (or this patch ends up there, but that won't
happen until 2.5).

IOW, I consider the boot-process part of the patch as cleanup of
existing code. If it makes some experiments easier - great, but
in _that_ respect namespace patch is in permanent feature freeze.
Unless behaviour is accepted by Linus - it won't get merged.

							Cheers,
								Al



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25 19:13     ` Alexander Viro
@ 2001-02-25 21:57       ` Werner Almesberger
  2001-02-25 22:39         ` Alexander Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Werner Almesberger @ 2001-02-25 21:57 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel

Alexander Viro wrote:
> No kludges actually needed. "Simplified boot sequence" _is_ simplified -
> we overmount the "final" root over ramfs. Initially empty. So you have
> the normal environment when you load ramdisk, etc.

So is this the Holy Grail, err, union mount we discussed about a year
ago? I.e.

stat foo	# output A
mount /dev/whatever /
stat foo	# output B

with A != B ?

If yes, is there also a way to destroy/empty ramfs after this ?
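
The "A != B" test can be spelled out as a small C helper, assuming that
"different output" means the two stat results name different objects,
i.e. differ in device or inode number. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Two stat results refer to the same object iff device and inode match. */
static bool same_object(const struct stat *a, const struct stat *b)
{
    assert(a && b);
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}
```

Calling stat() on foo before and after the mount and passing both results
to same_object() would return false if the new root now covers the old one.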

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, ICA, EPFL, CH           Werner.Almesberger@epfl.ch /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/


* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25 19:01   ` Sandy Harris
  2001-02-25 19:13     ` Alexander Viro
@ 2001-02-25 19:48     ` Arjan van de Ven
  1 sibling, 0 replies; 29+ messages in thread
From: Arjan van de Ven @ 2001-02-25 19:48 UTC (permalink / raw)
  To: Sandy Harris; +Cc: linux-kernel

In article <3A99569F.98C64B29@storm.ca> you wrote:

> A better approach might be to find or invent a generic compressed file system.
> Given that, you just build a compressed root, copy an image of it into ramdisk
> and let the compressed FS driver handle it from there. I suspect such a driver
> might be useful elsewhere as well. Does one exist?

cramfs is compressed but read-only, jffs has the potential to do compressed
writes as well....

Greetings,
    Arjan van de Ven


* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25 19:01   ` Sandy Harris
@ 2001-02-25 19:13     ` Alexander Viro
  2001-02-25 21:57       ` Werner Almesberger
  2001-02-25 19:48     ` Arjan van de Ven
  1 sibling, 1 reply; 29+ messages in thread
From: Alexander Viro @ 2001-02-25 19:13 UTC (permalink / raw)
  To: Sandy Harris; +Cc: linux-kernel



On Sun, 25 Feb 2001, Sandy Harris wrote:

> One is just mount a ramdisk and extract a tarball into its root. Yes, this has
> some problems -- how do you load tar when you haven't set up your root? -- but
> I suspect they can be solved. At worst, this would involve some strictly limited
> kluge to do that.

No kludges actually needed. "Simplified boot sequence" _is_ simplified -
we overmount the "final" root over ramfs. Initially empty. So you have
the normal environment when you load ramdisk, etc.

IOW, with the namespaces patch you can have root (empty, writable)
as soon as you've registered ramfs driver. I.e. _very_ early - before
device initialization, for one thing. Actual mounting of the "final"
root happens very late, along with all initrd games, etc. That stuff
(in do_mounts.c) could be executed as a userland process, actually -
see the comments in init/do_mounts.c and actual code there.
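
A toy model of the overmounting idea, to show why the empty ramfs can sit
underneath the final root: a mountpoint is treated as a stack and lookups
only ever see the top entry. This is an illustration under that
assumption, not the vfsmount code from the patch; all names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STACK 8

/* Filesystems stacked on one mountpoint; the last one pushed is visible. */
struct mount_stack {
    const char *fs[MAX_STACK];
    size_t depth;
};

/* Mount `fsname` on top of whatever is already there. */
static int overmount(struct mount_stack *m, const char *fsname)
{
    assert(m && fsname);
    if (m->depth == MAX_STACK)
        return -1;              /* stack full */
    m->fs[m->depth++] = fsname;
    return 0;
}

/* The filesystem a lookup would see, or NULL if nothing is mounted. */
static const char *visible_fs(const struct mount_stack *m)
{
    assert(m);
    return m->depth ? m->fs[m->depth - 1] : NULL;
}

/* Remove the topmost mount, uncovering the one below it. */
static int umount_top(struct mount_stack *m)
{
    assert(m);
    if (m->depth == 0)
        return -1;              /* nothing mounted */
    m->depth--;
    return 0;
}
```

In this model the boot sequence is overmount(&root, "ramfs") very early,
then overmount(&root, "ext2") (or whatever the final root is) very late;
the ramfs is still there, just covered.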
							Cheers,
								Al



* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25 16:04 ` Alexander Viro
@ 2001-02-25 19:01   ` Sandy Harris
  2001-02-25 19:13     ` Alexander Viro
  2001-02-25 19:48     ` Arjan van de Ven
  2001-02-27  9:50   ` David Woodhouse
  1 sibling, 2 replies; 29+ messages in thread
From: Sandy Harris @ 2001-02-25 19:01 UTC (permalink / raw)
  To: linux-kernel

Alexander Viro wrote:

> > Have you thought about supporting .tar.gz into ramfs? Creating custom
> > boot images would be simpler.
> 
> *uh*. It's definitely easier to do than it used to be, but I'm seriously
> sceptical about adding more cruft into the thing. ...
> 
> (I presume that you mean "unpacking tar.gz into initrd/floppy-loaded ramdisk"
> and not "adding into ramfs a loader of tarballs" - the latter is out of
> question, as far as I'm concerned;

Yes, indeed.

> such code belongs to do_mounts.c if it belongs anywhere at all)
> 
> IOW, look into init/do_mounts.c - that's the right place to do that
> stuff.

Methinks there are at least two possibilities that could do everything we
might need here without unnecessary complications.

One is to just mount a ramdisk and extract a tarball into its root. Yes, this has
some problems -- how do you load tar when you haven't set up your root? -- but
I suspect they can be solved. At worst, this would involve some strictly limited
kluge to do that.

A better approach might be to find or invent a generic compressed file system.
Given that, you just build a compressed root, copy an image of it into ramdisk
and let the compressed FS driver handle it from there. I suspect such a driver
might be useful elsewhere as well. Does one exist?


* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25 10:44 Manfred Spraul
@ 2001-02-25 16:04 ` Alexander Viro
  2001-02-25 19:01   ` Sandy Harris
  2001-02-27  9:50   ` David Woodhouse
  0 siblings, 2 replies; 29+ messages in thread
From: Alexander Viro @ 2001-02-25 16:04 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-kernel



On Sun, 25 Feb 2001, Manfred Spraul wrote:

> 
> >  * large cleanup of boot process (ramdisk handling, etc.)
> 
> Have you thought about supporting .tar.gz into ramfs? Creating custom
> boot images would be simpler.

*uh*. It's definitely easier to do than it used to be, but I'm seriously
sceptical about adding more cruft into the thing. Let's sort it out
and then see what can be added to the sequences. At least now it's in
one place and doesn't have to pull the tricks it used to need for dealing
with IO...

(I presume that you mean "unpacking tar.gz into initrd/floppy-loaded ramdisk"
and not "adding into ramfs a loader of tarballs" - the latter is out of
the question, as far as I'm concerned; such code belongs to do_mounts.c if
it belongs anywhere at all)

IOW, look into init/do_mounts.c - that's the right place to do that
stuff.
							Cheers,
								Al



* [PATCH][CFT] per-process namespaces for Linux
@ 2001-02-25 10:44 Manfred Spraul
  2001-02-25 16:04 ` Alexander Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Manfred Spraul @ 2001-02-25 10:44 UTC (permalink / raw)
  To: viro, linux-kernel


>  * large cleanup of boot process (ramdisk handling, etc.)

Have you thought about supporting .tar.gz into ramfs? Creating custom
boot images would be simpler.

--
	Manfred


* Re: [PATCH][CFT] per-process namespaces for Linux
  2001-02-25  5:28 Rick Hohensee
@ 2001-02-25  5:40 ` Alexander Viro
  0 siblings, 0 replies; 29+ messages in thread
From: Alexander Viro @ 2001-02-25  5:40 UTC (permalink / raw)
  To: Rick Hohensee; +Cc: linux-kernel



On Sun, 25 Feb 2001, Rick Hohensee wrote:

[I wrote]

> >ask. So far that's the best I can do - all documentation is a pile of
> >notes
> >+ CVS log.

[snip]

> That sounds like an especially fascinating pile of notes. Perhaps you
> could pile it next to the patch on the ftp site?

You know, CDA is dead and gone, but I really doubt that putting this
pile as-is in any vicinity of this account would be a good idea.
Besides, half of them will need a translation - I doubt that 80Kb of
grep output intermixed with comments in English and Russian, some of
them printable, would be useful. Fascinating - maybe, but... IOW, turning
that into documentation will take some effort.
							Cheers,
								Al



* Re: [PATCH][CFT] per-process namespaces for Linux
@ 2001-02-25  5:28 Rick Hohensee
  2001-02-25  5:40 ` Alexander Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Rick Hohensee @ 2001-02-25  5:28 UTC (permalink / raw)
  To: linux-kernel

>I'm more than willing to answer questions on the design of the thing -
>just
>ask. So far that's the best I can do - all documentation is a pile of
>notes
>+ CVS log.
>
>                                                        Cheers,
>                                                                Al
>PS: hopefully - back for good.

That sounds like an especially fascinating pile of notes. Perhaps you
could pile it next to the patch on the ftp site?

Rick Hohensee

:; cLIeNUX /dev/tty5  01:08:45   /
:;ls -d */
Linux/        dev/          help/         owner/        temp/
boot/         device/       incoming/     source/
command/      floppy/       log/          subroutines/
configure/    guest/        mounts/       suite/
:; cLIeNUX /dev/tty5  01:08:55   /
:;



End of thread; last message 2001-02-28 20:17 UTC.

Thread overview: 29+ messages
2001-02-25  4:16 [PATCH][CFT] per-process namespaces for Linux Alexander Viro
2001-02-26 16:26 ` Peter J. Braam
2001-02-26 20:23   ` Christoph Hellwig
2001-02-25  5:28 Rick Hohensee
2001-02-25  5:40 ` Alexander Viro
2001-02-25 10:44 Manfred Spraul
2001-02-25 16:04 ` Alexander Viro
2001-02-25 19:01   ` Sandy Harris
2001-02-25 19:13     ` Alexander Viro
2001-02-25 21:57       ` Werner Almesberger
2001-02-25 22:39         ` Alexander Viro
2001-02-25 23:51           ` Werner Almesberger
2001-02-26  0:26             ` Alexander Viro
2001-02-26 11:54               ` Marco d'Itri
2001-02-26 12:51                 ` Alexander Viro
2001-02-26 16:43                   ` Alexander Viro
2001-02-27 20:08                     ` Alexander Viro
2001-02-28  7:03                     ` Albert D. Cahalan
2001-02-28  7:14                       ` Alexander Viro
2001-02-28 18:13                         ` David L. Parsley
2001-02-28 18:07                           ` Alexander Viro
2001-02-28 19:06                             ` Ion Badulescu
2001-02-28 19:18                               ` Alexander Viro
2001-02-28 20:17                                 ` Ion Badulescu
2001-02-28  7:51                       ` Alexander Viro
2001-02-25 19:48     ` Arjan van de Ven
2001-02-27  9:50   ` David Woodhouse
2001-02-26  1:14 Andries.Brouwer
2001-02-26  1:39 ` Alexander Viro
