* read only bind mount ignores ready only
@ 2013-12-11 14:37 Phillip Susi
  2013-12-11 16:49 ` Karel Zak
  2013-12-12 12:05 ` Karel Zak
  0 siblings, 2 replies; 11+ messages in thread
From: Phillip Susi @ 2013-12-11 14:37 UTC (permalink / raw)
  To: util-linux

Forwarding report from
https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/712892

It seems that the kernel has a bug where it silently ignores the
MS_RDONLY flag when creating a bind mount.  mount issues a warning
that the mount point appears to be read-write even though you
requested read only.  The reporter suggests a patch to automatically
attempt to remount with MS_RDONLY before issuing this warning to work
around the kernel bug.  What do you think?


* Re: read only bind mount ignores ready only
  2013-12-11 14:37 read only bind mount ignores ready only Phillip Susi
@ 2013-12-11 16:49 ` Karel Zak
  2013-12-12 12:05 ` Karel Zak
  1 sibling, 0 replies; 11+ messages in thread
From: Karel Zak @ 2013-12-11 16:49 UTC (permalink / raw)
  To: Phillip Susi; +Cc: util-linux

On Wed, Dec 11, 2013 at 09:37:57AM -0500, Phillip Susi wrote:
> Forwarding report from
> https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/712892
> 
> It seems that the kernel has a bug where it silently ignores the
> MS_RDONLY flag when creating a bind mount.  mount issues a warning

Yes, this is a known issue.

> that the mount point appears to be read-write even though you

I think that the libmount-based mount does not warn about it (a mistake?).

> requested read only.  The reporter suggests a patch to automatically
> attempt to remount with MS_RDONLY before issuing this warning to work
> around the kernel bug.  What do you think?

Well, it means that the kernel deficiency will never be fixed ;-)

It would be relatively simple to fix, because libmount already
supports "additional mounts" to implement things like

   mount --make-private /dev/sda1 /mnt

(the kernel does not allow propagation flags in a regular mount
 operation). I'll try it tomorrow.

The problem is that all these userspace hacks are not atomic...

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: read only bind mount ignores ready only
  2013-12-11 14:37 read only bind mount ignores ready only Phillip Susi
  2013-12-11 16:49 ` Karel Zak
@ 2013-12-12 12:05 ` Karel Zak
  2013-12-12 14:59   ` Phillip Susi
  2013-12-12 19:42   ` Miklos Szeredi
  1 sibling, 2 replies; 11+ messages in thread
From: Karel Zak @ 2013-12-12 12:05 UTC (permalink / raw)
  To: Phillip Susi; +Cc: util-linux, Miklos Szeredi


 [CC: kernel guys]

On Wed, Dec 11, 2013 at 09:37:57AM -0500, Phillip Susi wrote:
> It seems that the kernel has a bug where it silently ignores the
> MS_RDONLY flag when creating a bind mount.  mount issues a warning
> that the mount point appears to be read-write even though you
> requested read only.  The reporter suggests a patch to automatically
> attempt to remount with MS_RDONLY before issuing this warning to work
> around the kernel bug.  What do you think?

I have it implemented, so

 mount --bind --read-only /mnt /mnt

is interpreted as two requests (two mount(2) calls):

 mount --bind /mnt /mnt
 mount -o remount,bind,ro /mnt

It works as expected, but it does not work with MS_REC (recursive)
because the kernel currently does not support

  MS_REMOUNT|MS_BIND|MS_REC|...

This means that

  mount --rbind --read-only /mnt /mnt

creates only a top-level read-only mount point; the rest is unchanged.
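
The two-call sequence described above can be sketched as follows. This is
only an illustration, not the libmount code: plan_bind_mount() is a
hypothetical helper that plans the mount(2) calls without making them, and
the flag values are the ones from <linux/mount.h>.

```python
# Flag values as defined in <linux/mount.h>.
MS_RDONLY = 0x1
MS_REMOUNT = 0x20
MS_BIND = 0x1000
MS_REC = 0x4000


def plan_bind_mount(source, target, read_only=False, recursive=False):
    """Return one (source, target, flags) tuple per planned mount(2) call.

    Hypothetical helper: it only plans the calls and never mounts anything.
    """
    bind_flags = MS_BIND | (MS_REC if recursive else 0)
    calls = [(source, target, bind_flags)]
    if read_only:
        # MS_RDONLY is ignored on the initial bind, so a second call
        # remounts the new mount point read-only. MS_REC is left out
        # because the kernel does not support it for this operation,
        # which is why only the top-level mount becomes read-only.
        calls.append((None, target, MS_REMOUNT | MS_BIND | MS_RDONLY))
    return calls
```

Calling plan_bind_mount("/mnt", "/mnt", read_only=True) yields the bind
call followed by the remount call; with recursive=True the second call
still carries no MS_REC, which is the limitation discussed here.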


Miklos, would it be possible to fix the kernel to accept MS_REC for
the MS_REMOUNT|MS_BIND|MS_RDONLY operation? Please.

It seems that all we need is to call the code in mnt_make_readonly() for
all next_mnt() items.
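
Until the kernel walks the tree itself, the same effect could be
approximated from userspace by remounting every mount below the target,
one at a time. The sketch below is an assumption about how that might
look, not something mount(8) does: mount_points() and
readonly_remount_plan() are hypothetical names, and the code only builds
the list of (target, flags) pairs from mountinfo-formatted text.

```python
import re

# Flag values as defined in <linux/mount.h>.
MS_RDONLY, MS_REMOUNT, MS_BIND = 0x1, 0x20, 0x1000


def mount_points(mountinfo_text):
    """Yield the mount point (fifth field) of every mountinfo line."""
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            # mountinfo escapes whitespace as octal, e.g. \040 for a space.
            yield re.sub(r"\\([0-7]{3})",
                         lambda m: chr(int(m.group(1), 8)), fields[4])


def readonly_remount_plan(mountinfo_text, top):
    """List (target, flags) for `top` and every mount point below it."""
    top = top.rstrip("/") or "/"
    prefix = "/" if top == "/" else top + "/"
    flags = MS_REMOUNT | MS_BIND | MS_RDONLY
    wanted = {mp for mp in mount_points(mountinfo_text)
              if mp == top or mp.startswith(prefix)}
    # Sorted so that a parent always precedes its children.
    return [(mp, flags) for mp in sorted(wanted)]
```

Feeding it the contents of /proc/self/mountinfo and "/mnt" gives one
remount request per submount. Like the other userspace workarounds, this
is not atomic: a mount added under /mnt between reading the table and
issuing the calls is missed.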


(Well, it would also be nice to teach the kernel to support
 MS_BIND|MS_RDONLY, but that's probably a more invasive change.)

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: read only bind mount ignores ready only
  2013-12-12 12:05 ` Karel Zak
@ 2013-12-12 14:59   ` Phillip Susi
  2013-12-12 16:02     ` Karel Zak
  2013-12-12 19:42   ` Miklos Szeredi
  1 sibling, 1 reply; 11+ messages in thread
From: Phillip Susi @ 2013-12-12 14:59 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux, Miklos Szeredi

On 12/12/2013 7:05 AM, Karel Zak wrote:
> I have it implemented, so
> 
> mount --bind --read-only /mnt /mnt
> 
> is interpreted as two requests (two mount(2) calls)
> 
> mount --bind /mnt /mnt
> mount -o remount,bind,ro /mnt

And mount -o bind,ro is the same, right?  So you can set up a ro bind
mount in fstab?



* Re: read only bind mount ignores ready only
  2013-12-12 14:59   ` Phillip Susi
@ 2013-12-12 16:02     ` Karel Zak
  0 siblings, 0 replies; 11+ messages in thread
From: Karel Zak @ 2013-12-12 16:02 UTC (permalink / raw)
  To: Phillip Susi; +Cc: util-linux, Miklos Szeredi

On Thu, Dec 12, 2013 at 09:59:10AM -0500, Phillip Susi wrote:
> On 12/12/2013 7:05 AM, Karel Zak wrote:
> > I have it implemented, so
> > 
> > mount --bind --read-only /mnt /mnt
> > 
> > is interpreted as two requests (two mount(2) calls)
> > 
> > mount --bind /mnt /mnt
> > mount -o remount,bind,ro /mnt
> 
> And mount -o bind,ro is the same right?  So you can set up a ro bind
> mount in fstab?

 Yes, but it is not in the git tree yet (I'd like to wait for Miklos's
 reply).

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: read only bind mount ignores ready only
  2013-12-12 12:05 ` Karel Zak
  2013-12-12 14:59   ` Phillip Susi
@ 2013-12-12 19:42   ` Miklos Szeredi
  2013-12-12 21:53       ` Al Viro
  2013-12-13  8:18       ` Karel Zak
  1 sibling, 2 replies; 11+ messages in thread
From: Miklos Szeredi @ 2013-12-12 19:42 UTC (permalink / raw)
  To: Karel Zak; +Cc: Phillip Susi, util-linux, Linux-Fsdevel

On Thu, Dec 12, 2013 at 1:05 PM, Karel Zak <kzak@redhat.com> wrote:
>
>  [CC: kernel guys]
>
> On Wed, Dec 11, 2013 at 09:37:57AM -0500, Phillip Susi wrote:
>> It seems that the kernel has a bug where it silently ignores the
>> MS_RDONLY flag when creating a bind mount.  mount issues a warning
>> that the mount point appears to be read-write even though you
>> requested read only.  The reporter suggests a patch to automatically
>> attempt to remount with MS_RDONLY before issuing this warning to work
>> around the kernel bug.  What do you think?
>
> I have it implemented, so
>
>  mount --bind --read-only /mnt /mnt
>
> is interpreted as two requests (two mount(2) calls)
>
>  mount --bind /mnt /mnt
>  mount -o remount,bind,ro /mnt
>
> it works as expected, but it does not work with MS_REC (recursive)
> because kernel currently does not support
>
>   MS_REMOUNT|MS_BIND|MS_REC|...
>
> it means that
>
>   mount --rbind --read-only /mnt /mnt
>
> creates only top-level read-only mountpoint, the rest is unchanged.
>
>
> Miklos would be possible to fix kernel to accept MS_REC for
> MS_REMOUNT|MS_BIND|MS_RDONLY operation? Please.

I really hate the current mount(2) API. It's a gigantic hack, and it's
nearing the end of its life anyway due to flags running out.

So instead of adding more hacks, I think it would be better to think
about adding a couple of syscalls that have clearly defined semantics.

Thanks,
Miklos


* Re: read only bind mount ignores ready only
@ 2013-12-12 21:53       ` Al Viro
  0 siblings, 0 replies; 11+ messages in thread
From: Al Viro @ 2013-12-12 21:53 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Karel Zak, Phillip Susi, util-linux, Linux-Fsdevel

On Thu, Dec 12, 2013 at 08:42:54PM +0100, Miklos Szeredi wrote:

> I really hate the current mount(2) API. It's a gigantic hack, and it's
> nearing the end of its life anyway due to flags running out.

You and me and just about anyone who'd ever looked at that mess ;-/

> So instead of adding more hacks, I think it would be better to think
> about adding a couple of syscalls that have clearly defined semantics.

It's not just flags, unfortunately.  Another problem stems from the
fact that the normal case used to be "mount the filesystem from this
block device on this directory", with additional flag added in v5
to indicate whether we want it rw or ro (v1 to v4 had everything rw).

On any modern Unix, Linux included, that does not fit the reality.
First of all, the main property of filesystem is not a block device -
it's filesystem type.  I.e. the real type of mount(2) (the normal
case, after you shed all the cruft with remount, bind, etc.) is
int (mountpoint, fs type, arguments specific for that fs type).
What's more, type-specific arguments really are almost entirely up
to fs driver.  "The block device of given filesystem" is not a well-defined
thing - it makes no sense for any network filesystem, for something like
procfs, for something that lives in userland, or uses more than one block
device, or lives on mtd device, etc.

Furthermore, even for types that do live on a single block device we need
more than just that device.  Even back in 1974 (v5), they had to add
a flag for rw vs. ro mounts.  For a while it looked like it would be
possible to keep it as bitmap (and the things were getting even more
muddled by mixing the flags fs itself doesn't care about into the
same thing - e.g. nosuid/nodev/noexec went there as well).  Alas, the
things got even nastier with NFS and its ilk - there had been too much
extra data to hope to pack it into a bitmap (timeout, etc.).

One approach had been
	type, mountpoint, flags, type-dependent pointer to struct
with flags still being a mix of "fs itself doesn't give a damn" ones
with ones that are very much for fs use (sync vs. async, for starters).
Pointer to device name had been hidden inside that struct in cases when
fs types needed one.  The really messy part of that approach is a binary
structure passed along, complete with alignment differences, size of
pointer headache, marshalling for case of userland filesystems, etc.
Moreover, mount(8) had to know the layouts of all these structures -
after all, it has to build one from the text you've got in fstab.  In
practice that meant separate binaries for different fs types - mount_nfs,
mount_xfs, etc., called by mount(8).  That's more or less what *BSD had
done.  Much later FreeBSD tried to go for array of pairs, passed as
an iovec (see nmount(2)).  At least nobody has been deranged enough to
pass XML...

	Linux started with a v7-like (even pre-v5-like; there was no ro/rw flag)
variant, proceeded to type x device name x opaque other data and shortly after
(in 0.97) to type x device name x flags x opaque other data, with the opaque
data being sometimes a string of options, sometimes a binary structure.  That
led to all kinds of interesting headaches for 32bit vs. 64bit userland later
on; these days it has mostly converged to device name x flags x opaque option
string - there are some exceptions, the worst offender being ncpfs.
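
To make the "flags x opaque option string" split concrete, here is a
rough sketch of how an fstab-style option list might be divided into the
generic flag bitmap and the string handed to the fs driver. It is an
illustration only: split_options() and GENERIC_FLAGS are hypothetical
names, the table covers just a handful of flags, and the real mount(8)
option handling is considerably more involved.

```python
# A small subset of the generic flags; values from <linux/mount.h>.
GENERIC_FLAGS = {
    "ro": 0x1,         # MS_RDONLY
    "nosuid": 0x2,     # MS_NOSUID
    "nodev": 0x4,      # MS_NODEV
    "noexec": 0x8,     # MS_NOEXEC
    "sync": 0x10,      # MS_SYNCHRONOUS
    "noatime": 0x400,  # MS_NOATIME
}


def split_options(optstr):
    """Split a comma-separated option string into (flags, data)."""
    flags, data = 0, []
    for opt in filter(None, optstr.split(",")):
        if opt in GENERIC_FLAGS:
            flags |= GENERIC_FLAGS[opt]
        elif opt == "rw":
            flags &= ~GENERIC_FLAGS["ro"]
        else:
            # Anything unrecognized stays in the opaque string and is
            # left for the filesystem driver to interpret.
            data.append(opt)
    return flags, ",".join(data)
```

Everything the table does not know ends up in the second element
untouched, which is the "opaque" part: only the fs driver can say
whether it makes sense.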

	Note that device name is *also* opaque - it's interpreted by fs
type.  The parts of kernel outside of specific fs have no idea what to
do with that thing; quite a few filesystems simply ignore it (common
userland conventions include "none" or fs type name itself), some treat it
as a pathname of block device, some interpret it as a mix of server name and
path on server, etc.  As far as the rest of the kernel (starting with VFS)
is concerned, device name is a part of opaque triple passed along to fs driver.

	Another ugly thing is that e.g. ncpfs needs a non-trivial dialog
with the server, and it's implemented thus:
	* mount(2) is given enough information to connect to the server and
mount something.  The server is not willing to give any fs contents yet,
though, so all we see is an empty directory.
	* mount(8) opens that directory and uses ioctl(2) to talk to the server.
	* eventually that dialog convinces the server that we are to be
allowed to mount the sucker.  At that point the contents suddenly appear
in the previously empty directory.
There is no way for somebody looking at that empty directory to tell if
it's a genuinely empty fs imported from the server or just a
half-authenticated one (you can see that ncpfs is mounted there, but
that's it).

	Frankly, I wonder if we are trying to pack too much into one
syscall - not just in terms of overloading it (that much is obvious),
but in terms of trying to cram a sequence of syscalls into one.  If
we end up introducing new API(s) for mount(), it's probably worth
considering something like this:
	* open a connection to fs type driver, get a descriptor
	* use normal IO syscalls (usually just write(2)) on that
descriptor to tell fs type driver what we want.  If any kind of
authentication is needed, that's the time for doing it
	* attach the thing identified by that descriptor to mountpoint
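
The three steps above can be modelled with a toy, in-memory sketch. It
makes no system calls and does not correspond to any existing kernel
interface; FsContext, its methods, and mount_in_steps() are hypothetical
names chosen only to show the shape of a descriptor-based flow.

```python
class FsContext:
    """Toy stand-in for a descriptor obtained from a filesystem driver."""

    def __init__(self, fstype):
        self.fstype = fstype
        self.params = {}
        self.attached_at = None

    def write(self, text):
        """Accept 'key=value' or bare 'flag' lines, one per line."""
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            key, sep, value = line.partition("=")
            self.params[key] = value if sep else True

    def attach(self, mountpoint):
        """Bind the configured filesystem to a mount point, once."""
        if self.attached_at is not None:
            raise RuntimeError("already attached")
        self.attached_at = mountpoint
        return self


def mount_in_steps(fstype, config, mountpoint):
    ctx = FsContext(fstype)         # 1. open a connection to the fs driver
    ctx.write(config)               # 2. say what we want; auth would go here
    return ctx.attach(mountpoint)   # 3. attach the result to the mount point
```

The point of the split is that nothing becomes visible in the namespace
until the last step, so a half-configured or half-authenticated
filesystem never shows up as an empty directory.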

I have an old writeup somewhere (several variants of it, actually) on possible
replacement APIs; I'll try to dig it out and post it.


* Re: read only bind mount ignores ready only
@ 2013-12-13  8:18       ` Karel Zak
  0 siblings, 0 replies; 11+ messages in thread
From: Karel Zak @ 2013-12-13  8:18 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Phillip Susi, util-linux, Linux-Fsdevel

On Thu, Dec 12, 2013 at 08:42:54PM +0100, Miklos Szeredi wrote:
> On Thu, Dec 12, 2013 at 1:05 PM, Karel Zak <kzak@redhat.com> wrote:
> > Miklos would be possible to fix kernel to accept MS_REC for
> > MS_REMOUNT|MS_BIND|MS_RDONLY operation? Please.
> 
> I really hate the current mount(2) API. It's a gigantic hack, and it's

 We all hate it, but we have to use it every day..

> nearing the end of its life anyway due to flags running out.

 Well, the current problem with MS_REC is just one small inconsistency
 in the current MS_REMOUNT|MS_BIND semantics. It would be really nice
 to fix it now.

> So instead of adding more hacks, I think it would be better to think
> about adding a couple of syscalls that have clearly defined semantics.

 Yes, but it's a (very) long-term goal...

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: read only bind mount ignores ready only
  2013-12-12 21:53       ` Al Viro
  (?)
@ 2013-12-13 10:45       ` Karel Zak
  -1 siblings, 0 replies; 11+ messages in thread
From: Karel Zak @ 2013-12-13 10:45 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, Phillip Susi, util-linux, Linux-Fsdevel

On Thu, Dec 12, 2013 at 09:53:25PM +0000, Al Viro wrote:
> 	Frankly, I wonder if we are trying to pack too much into one
> syscall - not just in terms of overloading it (that much is obvious),
> but in terms of trying to cram a sequence of syscalls into one.  If
> we end up introducing new API(s) for mount(), it's probably worth
> considering something like this:
> 	* open a connection to fs type driver, get a descriptor
> 	* use normal IO syscalls (usually just write(2)) on that
> descriptor to tell fs type driver what do we want.  If any kind of
> authentication is needed, that's the time for doing it
> 	* attach the thing identified by that descriptor to mountpoint

Yes, exactly. This has been my wish for years.

I don't think we need more *independent* syscalls to replace mount(2)
(for example, a special syscall to change propagation flags, or so).

I strongly believe that APIs for complex tasks have to be based on
handles (file descriptors). Such APIs are extensible.

It would also be nice to provide some information about the mount
operation to userspace through the file descriptor, that is, to support
read(2) and at least return the mount ID.  The current situation, where
we have only errno in userspace, is insufficient. If you want to know
more, you have to parse /proc/self/mountinfo, but which
entry is the right entry for the last mount(2) call?
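
A small sketch of that lookup shows where the ambiguity comes from.
find_mounts() is a hypothetical helper, not a libmount function; it
scans mountinfo-formatted text for a mount point and, to stay short,
does not decode the octal escapes used for unusual path characters.

```python
def find_mounts(mountinfo_text, mountpoint):
    """Return (mount_id, parent_id, fstype) for every entry at mountpoint."""
    matches = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # A lone "-" separates the optional fields from the fs type.
        if len(fields) < 5 or "-" not in fields:
            continue
        if fields[4] != mountpoint:
            continue
        fstype = fields[fields.index("-") + 1]
        matches.append((int(fields[0]), int(fields[1]), fstype))
    return matches
```

When several mounts are stacked on the same directory the function
returns more than one candidate, and nothing in errno says which of them
belongs to the call that just returned; another process may have mounted
there in the meantime.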

> I have an old writeup somewhere (several variants of it, actually) on possible
> replacement APIs; I'll try to dig it out and post it.

 Please, share it :-)

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

