util-linux.vger.kernel.org archive mirror
* Using the upcoming fsinfo()
@ 2019-05-13  5:33 Ian Kent
  2019-05-13  9:08 ` Karel Zak
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Kent @ 2019-05-13  5:33 UTC
  To: util-linux

Hi all,

Some of you may know that David Howells is working on getting
a new system call fsinfo() merged into the Linux kernel.

This system call will provide access to information about mounted
mounts without having to read and parse file based mount tables
such as /proc/self/mountinfo, etc.

Essentially all mounts have an id and one can get the id of a
mount by its path and then use that to obtain a large range
of information about it.

The information can include a list of mounts within the mount
which can be used to traverse a tree of mounts or the id used
to look up information on an individual mount without the need
to traverse a file based mount table.
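
Since fsinfo() is not merged yet, its calling convention is not shown
here. As a rough sketch of the id-based traversal described above, the
following C models it with data that is available today: the mount ID
and parent ID in the first two fields of each /proc/self/mountinfo
line. struct mnt_node, parse_ids() and count_submounts() are
illustrative names, not libmount or kernel API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: the first two mountinfo fields of one mount. */
struct mnt_node {
    int id;        /* mount ID */
    int parent_id; /* ID of the mount this one is attached to */
};

/* Parse "ID PARENT ..." from one mountinfo-style line; 0 on success. */
static int parse_ids(const char *line, struct mnt_node *out)
{
    return sscanf(line, "%d %d", &out->id, &out->parent_id) == 2 ? 0 : -1;
}

/*
 * Count every mount below `id` (children, grandchildren, ...) by
 * following parent links. The root of a mount tree may list itself
 * as its own parent, so self-references are skipped.
 */
static size_t count_submounts(const struct mnt_node *tab, size_t n, int id)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].parent_id == id && tab[i].id != id)
            total += 1 + count_submounts(tab, n, tab[i].id);
    }
    return total;
}
```

Feeding each line of the table through parse_ids() and then calling
count_submounts() with a mount's ID gives the size of the subtree
below it. The repeated linear scan is fine for illustration only.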

I'd like to update libmount to use the fsinfo() system call
because I believe using file based methods to get mount
information introduces significant overhead that can be
avoided. 

Because the fsinfo() system call provides a very different way
to get information about mounts, and having looked at the current
code, I'm wondering what will be the best way to go about it.

Any suggestions about the way this could best be done, given
that the existing methods must still work, will be very much
appreciated.

Ian



* Re: Using the upcoming fsinfo()
  2019-05-13  5:33 Using the upcoming fsinfo() Ian Kent
@ 2019-05-13  9:08 ` Karel Zak
  2019-05-13 16:04   ` Bruce Dubbs
  2019-05-14  0:23   ` Ian Kent
  0 siblings, 2 replies; 15+ messages in thread
From: Karel Zak @ 2019-05-13  9:08 UTC
  To: Ian Kent; +Cc: util-linux

On Mon, May 13, 2019 at 01:33:22PM +0800, Ian Kent wrote:
> Some of you may know that David Howells is working on getting
> a new system call fsinfo() merged into the Linux kernel.
> 
> This system call will provide access to information about mounted
> mounts without having to read and parse file based mount tables
> such as /proc/self/mountinfo, etc.
> 
> Essentially all mounts have an id and one can get the id of a
> mount by it's path and then use that to obtain a large range
> of information about it.
> 
> The information can include a list of mounts within the mount
> which can be used to traverse a tree of mounts or the id used
> to lookup information on an individual mount without the need
> to traverse a file based mount table.
> 
> I'd like to update libmount to use the fsinfo() system call
> because I believe using file based methods to get mount
> information introduces significant overhead that can be
> avoided. 
> 
> Because the fsinfo() system call provides a very different way
> to get information about mounts, and having looked at the current
> code, I'm wondering what will be the best way to go about it.
> 
> Any suggestions about the way this could best be done, given
> that the existing methods must still work, will be very much
> appreciated.

It would be nice to start with some low-level things to read info
about a target (mountpoint) into libmnt_fs, something like:

    int mnt_fsinfo_fill_fs(const char *tgt, struct libmnt_fs *fs)

and create a complete mount table by fsinfo():

    int mnt_fsinfo_fill_table(struct libmnt_table *tab)

... probably add fsinfo.c to code to keep it all together.

After that we can use these functions in our code.

A good example of ugly overhead with the current mountinfo is the
context_umount.c code, see lookup_umount_fs() and
mnt_context_find_umount_fs(). In this code we have a mountpoint and we
need more information about it (due to redirection to umount.<type>
helpers, userspace mount options, etc.). It sounds ideal to use
mnt_fsinfo_fill_fs() there if possible.

The most visible change will be to use mnt_fsinfo_fill_table() within
mnt_table_parse_file() if the file name is "/proc/self/mountinfo".
This will be a huge improvement as we use this function in systemd on
each mount table change...
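
A minimal sketch of that dispatch, assuming the fsinfo() path reports a
missing syscall as -ENOSYS (the usual convention for an unimplemented
syscall). fill_table(), the table_backend type and the three stub_*
functions are hypothetical stand-ins; the real mnt_table_parse_file()
and the proposed mnt_fsinfo_fill_table() take different arguments.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend: stores a mount count, returns 0 or -errno. */
typedef int (*table_backend)(const char *filename, int *nmounts);

/*
 * Use the fsinfo-style backend only for the kernel mount table and
 * fall back to file parsing when the kernel lacks the syscall.
 */
static int fill_table(const char *filename, table_backend fsinfo_backend,
                      table_backend file_backend, int *nmounts)
{
    if (fsinfo_backend && strcmp(filename, "/proc/self/mountinfo") == 0) {
        int rc = fsinfo_backend(filename, nmounts);

        if (rc != -ENOSYS)
            return rc;  /* success, or a real error worth reporting */
        /* -ENOSYS: no such syscall on this kernel, use the file */
    }
    return file_backend(filename, nmounts);
}

/* Stand-in for a kernel without the new syscall. */
static int stub_fsinfo_missing(const char *filename, int *nmounts)
{
    (void)filename;
    (void)nmounts;
    return -ENOSYS;
}

/* Stand-in for a working fsinfo-based reader (pretends to find 3 mounts). */
static int stub_fsinfo_ok(const char *filename, int *nmounts)
{
    (void)filename;
    *nmounts = 3;
    return 0;
}

/* Stand-in for the existing text parser (pretends to find 7 mounts). */
static int stub_file_parser(const char *filename, int *nmounts)
{
    (void)filename;
    *nmounts = 7;
    return 0;
}
```

With stub_fsinfo_missing the call ends up in stub_file_parser, and any
file other than /proc/self/mountinfo (fstab, for instance) always goes
to the parser, so existing callers would see no change in behaviour.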

The question is how easy it will be to replace mountinfo with fsinfo().

Note that we also have #util-linux on freenode IRC.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: Using the upcoming fsinfo()
  2019-05-13  9:08 ` Karel Zak
@ 2019-05-13 16:04   ` Bruce Dubbs
  2019-05-14  0:04     ` Ian Kent
  2019-05-15 11:27     ` Karel Zak
  2019-05-14  0:23   ` Ian Kent
  1 sibling, 2 replies; 15+ messages in thread
From: Bruce Dubbs @ 2019-05-13 16:04 UTC
  To: Karel Zak, Ian Kent; +Cc: util-linux

On 5/13/19 4:08 AM, Karel Zak wrote:
> On Mon, May 13, 2019 at 01:33:22PM +0800, Ian Kent wrote:
>> Some of you may know that David Howells is working on getting
>> a new system call fsinfo() merged into the Linux kernel.
>>
>> This system call will provide access to information about mounted
>> mounts without having to read and parse file based mount tables
>> such as /proc/self/mountinfo, etc.
>>
>> Essentially all mounts have an id and one can get the id of a
>> mount by it's path and then use that to obtain a large range
>> of information about it.
>>
>> The information can include a list of mounts within the mount
>> which can be used to traverse a tree of mounts or the id used
>> to lookup information on an individual mount without the need
>> to traverse a file based mount table.
>>
>> I'd like to update libmount to use the fsinfo() system call
>> because I believe using file based methods to get mount
>> information introduces significant overhead that can be
>> avoided.
>>
>> Because the fsinfo() system call provides a very different way
>> to get information about mounts, and having looked at the current
>> code, I'm wondering what will be the best way to go about it.
>>
>> Any suggestions about the way this could best be done, given
>> that the existing methods must still work, will be very much
>> appreciated.
> 
> It would be nice to start with some low-level things to read info
> about a target (mountpoint) into libmnt_fs, something like:
> 
>      int mnt_fsinfo_fill_fs(const char *tgt, struct libmnt_fs *fs)
> 
> and fill create a complete mount table by fsinfo():
> 
>      int mnt_fsinfo_fill_table(struct libmnt_table *tab)
> 
> ... probably add fsinfo.c to code to keep it all together.
> 
> So, after then we can use these functions in our code.
> 
> The nice place where is ugly overhead with the current mountinfo is
> context_umount.c code, see lookup_umount_fs() and
> mnt_context_find_umount_fs(). In this code we have mountpoint and we
> need more information about it (due to redirection to umount.<type>
> helpers, userspace mount options, etc.). It sounds like ideal to use
> mnt_fsinfo_fill_fs() if possible.
> 
> The most visible change will be to use mnt_fsinfo_fill_table() with in
> mnt_table_parse_file() if the file name is "/proc/self/mountinfo".
> This will be huge improvement as we use this function in systemd on
> each mount table change...
> 
> The question is how easily will be to replace mountinfo with fsinfo().

I may be stating the obvious, but this proposal does not appear to 
simplify anything because it is kernel version dependent.  From what I 
understand, the new and old methods will both need to be supported for 
quite some time.

I'm not suggesting that the changes not be made, but I suggest going slow.

   -- Bruce


* Re: Using the upcoming fsinfo()
  2019-05-13 16:04   ` Bruce Dubbs
@ 2019-05-14  0:04     ` Ian Kent
  2019-05-15 11:27     ` Karel Zak
  1 sibling, 0 replies; 15+ messages in thread
From: Ian Kent @ 2019-05-14  0:04 UTC
  To: Bruce Dubbs, Karel Zak; +Cc: util-linux

On Mon, 2019-05-13 at 11:04 -0500, Bruce Dubbs wrote:
> On 5/13/19 4:08 AM, Karel Zak wrote:
> > On Mon, May 13, 2019 at 01:33:22PM +0800, Ian Kent wrote:
> > > Some of you may know that David Howells is working on getting
> > > a new system call fsinfo() merged into the Linux kernel.
> > > 
> > > This system call will provide access to information about mounted
> > > mounts without having to read and parse file based mount tables
> > > such as /proc/self/mountinfo, etc.
> > > 
> > > Essentially all mounts have an id and one can get the id of a
> > > mount by it's path and then use that to obtain a large range
> > > of information about it.
> > > 
> > > The information can include a list of mounts within the mount
> > > which can be used to traverse a tree of mounts or the id used
> > > to lookup information on an individual mount without the need
> > > to traverse a file based mount table.
> > > 
> > > I'd like to update libmount to use the fsinfo() system call
> > > because I believe using file based methods to get mount
> > > information introduces significant overhead that can be
> > > avoided.
> > > 
> > > Because the fsinfo() system call provides a very different way
> > > to get information about mounts, and having looked at the current
> > > code, I'm wondering what will be the best way to go about it.
> > > 
> > > Any suggestions about the way this could best be done, given
> > > that the existing methods must still work, will be very much
> > > appreciated.
> > 
> > It would be nice to start with some low-level things to read info
> > about a target (mountpoint) into libmnt_fs, something like:
> > 
> >      int mnt_fsinfo_fill_fs(const char *tgt, struct libmnt_fs *fs)
> > 
> > and fill create a complete mount table by fsinfo():
> > 
> >      int mnt_fsinfo_fill_table(struct libmnt_table *tab)
> > 
> > ... probably add fsinfo.c to code to keep it all together.
> > 
> > So, after then we can use these functions in our code.
> > 
> > The nice place where is ugly overhead with the current mountinfo is
> > context_umount.c code, see lookup_umount_fs() and
> > mnt_context_find_umount_fs(). In this code we have mountpoint and we
> > need more information about it (due to redirection to umount.<type>
> > helpers, userspace mount options, etc.). It sounds like ideal to use
> > mnt_fsinfo_fill_fs() if possible.
> > 
> > The most visible change will be to use mnt_fsinfo_fill_table() with in
> > mnt_table_parse_file() if the file name is "/proc/self/mountinfo".
> > This will be huge improvement as we use this function in systemd on
> > each mount table change...
> > 
> > The question is how easily will be to replace mountinfo with fsinfo().
> 
> I may be stating the obvious, but this proposal does not appear to 
> simplify anything because it is kernel version dependent.  From what I 
> understand, the new and old methods will both need to be supported for 
> quite some time.

Yes, it won't really simplify the code base overall because of the
need to support kernel versions that may not have the system call.

But what I didn't talk about is there's a real problem handling
large mount tables with the current method of reading the proc file
system mount tables and these tables can get very large at times.

And this is also about processes being flooded with notifications
due to heavy mount/umount activity and then re-reading the entire
mount table (or at least half on average) on every one because
there's no other way to locate the mount they are looking for.

I think the situation with util-linux isn't so bad in this respect
but I still believe keeping the in-memory mount table up to date
should see improvement. And libmount is used by quite a number of
problematic applications so improving it will translate to
improvement in those applications too.

Ultimately I'll need to look at other applications (perhaps persuade
them to use libmount).

There's also the large number of notifications itself but I'm
still not sure how to improve that. There will be a notifications
implementation to accompany the recent mount-API/fsinfo changes as
well so hopefully we'll be able to improve the situation with the
implementation of that.

> 
> I'm not suggesting that the changes not be made, but I suggest going slow.

The changes will be fairly difficult because the util-linux mount
handling is quite complex.

And the fact that the fsinfo() patch series hasn't been merged yet
means this isn't going to be done quickly (at least not "rushed"
anyway).

But it does need to be done ahead of the merge so we can work out what's
missing in the fsinfo() implementation and try to have things added/fixed
prior to the upstream merge.

Ian



* Re: Using the upcoming fsinfo()
  2019-05-13  9:08 ` Karel Zak
  2019-05-13 16:04   ` Bruce Dubbs
@ 2019-05-14  0:23   ` Ian Kent
  2019-05-15 11:45     ` Karel Zak
  1 sibling, 1 reply; 15+ messages in thread
From: Ian Kent @ 2019-05-14  0:23 UTC
  To: Karel Zak; +Cc: util-linux

On Mon, 2019-05-13 at 11:08 +0200, Karel Zak wrote:

Hi Karel,

Thanks for giving me some suggestions on where to focus
my efforts.

> On Mon, May 13, 2019 at 01:33:22PM +0800, Ian Kent wrote:
> > Some of you may know that David Howells is working on getting
> > a new system call fsinfo() merged into the Linux kernel.
> > 
> > This system call will provide access to information about mounted
> > mounts without having to read and parse file based mount tables
> > such as /proc/self/mountinfo, etc.
> > 
> > Essentially all mounts have an id and one can get the id of a
> > mount by it's path and then use that to obtain a large range
> > of information about it.
> > 
> > The information can include a list of mounts within the mount
> > which can be used to traverse a tree of mounts or the id used
> > to lookup information on an individual mount without the need
> > to traverse a file based mount table.
> > 
> > I'd like to update libmount to use the fsinfo() system call
> > because I believe using file based methods to get mount
> > information introduces significant overhead that can be
> > avoided. 
> > 
> > Because the fsinfo() system call provides a very different way
> > to get information about mounts, and having looked at the current
> > code, I'm wondering what will be the best way to go about it.
> > 
> > Any suggestions about the way this could best be done, given
> > that the existing methods must still work, will be very much
> > appreciated.
> 
> It would be nice to start with some low-level things to read info
> about a target (mountpoint) into libmnt_fs, something like:
> 
>     int mnt_fsinfo_fill_fs(const char *tgt, struct libmnt_fs *fs)
> 
> and fill create a complete mount table by fsinfo():
> 
>     int mnt_fsinfo_fill_table(struct libmnt_table *tab)
> 
> ... probably add fsinfo.c to code to keep it all together.
> 
> So, after then we can use these functions in our code. 

Ok, thanks for this,

> 
> The nice place where is ugly overhead with the current mountinfo is
> context_umount.c code, see lookup_umount_fs() and
> mnt_context_find_umount_fs(). In this code we have mountpoint and we
> need more information about it (due to redirection to umount.<type>
> helpers, userspace mount options, etc.). It sounds like ideal to use
> mnt_fsinfo_fill_fs() if possible.

That sounds like an ideal opportunity for improvement by using
fsinfo(). I'll look there too.

> 
> The most visible change will be to use mnt_fsinfo_fill_table() with in
> mnt_table_parse_file() if the file name is "/proc/self/mountinfo".
> This will be huge improvement as we use this function in systemd on
> each mount table change...
> 
> The question is how easily will be to replace mountinfo with fsinfo().

I've been looking at libmount but I'm not sure I was focusing on
libmnt_fs so I'm not sure yet.

A large part of doing this early is to find out what's missing
and see if it's possible to update fsinfo().

For example, the devname in mountinfo can be different from
the devname returned by fsinfo(). David has said it's not
straightforward to change but at least he's aware of it and
thinking about it.

Thanks
Ian



* Re: Using the upcoming fsinfo()
  2019-05-13 16:04   ` Bruce Dubbs
  2019-05-14  0:04     ` Ian Kent
@ 2019-05-15 11:27     ` Karel Zak
  1 sibling, 0 replies; 15+ messages in thread
From: Karel Zak @ 2019-05-15 11:27 UTC
  To: Bruce Dubbs; +Cc: Ian Kent, util-linux

On Mon, May 13, 2019 at 11:04:50AM -0500, Bruce Dubbs wrote:
> On 5/13/19 4:08 AM, Karel Zak wrote:
> > The nice place where is ugly overhead with the current mountinfo is
> > context_umount.c code, see lookup_umount_fs() and
> > mnt_context_find_umount_fs(). In this code we have mountpoint and we
> > need more information about it (due to redirection to umount.<type>
> > helpers, userspace mount options, etc.). It sounds like ideal to use
> > mnt_fsinfo_fill_fs() if possible.
> > 
> > The most visible change will be to use mnt_fsinfo_fill_table() with in
> > mnt_table_parse_file() if the file name is "/proc/self/mountinfo".
> > This will be huge improvement as we use this function in systemd on
> > each mount table change...
> > 
> > The question is how easily will be to replace mountinfo with fsinfo().
> 
> I may be stating the obvious, but this proposal does not appear to simplify
> anything because it is kernel version dependent.  From what I understand,
> the new and old methods will both need to be supported for quite some time.

Yes, we need to support both versions. 

The new version of the API will significantly improve performance in
situations where you need more information about a mountpoint (for
example fstype, device name, mount options, etc.) -- a nice example is
umount or remount.

Now we parse all of /proc/self/mountinfo to get one line from the file.
This is a problem on systems with a huge number of mountpoints and on
systems where the kernel mount table is modified very often and userspace
needs to be synchronized with the table (e.g. systemd dependencies,
etc).
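
To make the cost concrete, here is a sketch of that scan in plain C,
without libmount. find_mountpoint() is a made-up helper; it assumes
lines shorter than 4096 bytes and relies on the kernel escaping spaces
in paths (as \040) so that fields can be split on whitespace.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a mountinfo-formatted stream for `target` (field 5, the mount
 * point) and copy the last matching line into `out`. Returns the
 * number of lines that had to be read, or -1 if nothing matched.
 */
static int find_mountpoint(FILE *f, const char *target,
                           char *out, size_t outsz)
{
    char line[4096], mnt[4096];
    int nread = 0, found = 0;

    while (fgets(line, sizeof(line), f)) {
        nread++;
        /* fields: id, parent id, maj:min, root, mount point, ... */
        if (sscanf(line, "%*d %*d %*s %*s %4095s", mnt) != 1)
            continue;
        if (strcmp(mnt, target) == 0) {
            snprintf(out, outsz, "%s", line);
            found = 1; /* keep scanning: a later entry may be
                          mounted over this one */
        }
    }
    return found ? nread : -1;
}
```

Called with a stream from fopen("/proc/self/mountinfo", "r"), the
return value shows how many lines were parsed just to answer a
question about one mountpoint.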

All this is about a new syscall fsinfo(). The new mount API (mount(2)
replacement) is another story :-)

> I'm not suggesting that the changes not be made, but I suggest going slow.

For end users all the changes should be invisible. The same libmount
binary should be usable everywhere, independently of the new syscalls.

It's possible that we will extend the library API to make it easy for
applications to get info about a mountpoint without mountinfo file
parsing, but it should be also possible to do it with mountinfo as
fallback if there is no fsinfo().

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: Using the upcoming fsinfo()
  2019-05-14  0:23   ` Ian Kent
@ 2019-05-15 11:45     ` Karel Zak
  2019-05-16  0:13       ` Ian Kent
  0 siblings, 1 reply; 15+ messages in thread
From: Karel Zak @ 2019-05-15 11:45 UTC
  To: Ian Kent; +Cc: util-linux

On Tue, May 14, 2019 at 08:23:02AM +0800, Ian Kent wrote:
> On Mon, 2019-05-13 at 11:08 +0200, Karel Zak wrote:
> > The nice place where is ugly overhead with the current mountinfo is
> > context_umount.c code, see lookup_umount_fs() and
> > mnt_context_find_umount_fs(). In this code we have mountpoint and we
> > need more information about it (due to redirection to umount.<type>
> > helpers, userspace mount options, etc.). It sounds like ideal to use
> > mnt_fsinfo_fill_fs() if possible.
> 
> That sounds like an ideal opportunity for improvement by using
> fsinfo(). I'll look there too.

Yes.

> > The most visible change will be to use mnt_fsinfo_fill_table() with in
> > mnt_table_parse_file() if the file name is "/proc/self/mountinfo".
> > This will be huge improvement as we use this function in systemd on
> > each mount table change...
> > 
> > The question is how easily will be to replace mountinfo with fsinfo().

Now that I think about it, I'm not sure that creating a complete mount
table with fsinfo() is the right way. Maybe many fsinfo() calls will be
slower than generating the mountinfo file in the kernel and read()ing it
in userspace. Not sure.

> I've been looking at libmount but I'm not sure I was focusing on
> libmnt_fs so I'm not sure yet.
>
> A large part of doing this early is to find out what's missing
> and see if it's possible to update fsinfo().

Yes, it would be really nice if we can get all info from fsinfo(). It
opens new possibilities for us to make things like umount, remount,
and systemd more effective.

> For example, the devanme in mountinfo which can be different to
> the devname returned by fsinfo(), David has said it's not straight
> forward to change but at least he's aware of it and thinking about
> it.

Do you mean the "source" field (9th column in mountinfo)? The device is
defined by maj:min (3rd column) in the file (well, whatever the devno
means for things like btrfs ;-).

The "source" should be the unmodified string as specified in userspace
for the mount(2) syscall, otherwise things like "mount -a" cannot compare
the kernel mount table with fstab.
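
As a sketch of how these fields can be pulled out of one mountinfo
line: by the field list in proc(5), the filesystem type and the mount
source come right after a lone "-" separator, and the number of
optional fields before that separator varies, so the code below
locates the separator instead of counting columns. struct mi_fields
and parse_mountinfo_line() are illustrative names, not libmount API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields discussed above. */
struct mi_fields {
    unsigned maj, min;  /* field 3, "major:minor" */
    char fstype[64];    /* first field after the " - " separator */
    char source[256];   /* second field after the separator */
};

/* Parse one mountinfo line; returns 0 on success, -1 on bad input. */
static int parse_mountinfo_line(const char *line, struct mi_fields *out)
{
    const char *sep;

    if (sscanf(line, "%*d %*d %u:%u", &out->maj, &out->min) != 2)
        return -1;

    /* Optional fields such as "shared:1" vary in number, so search
     * for the separator rather than relying on a column index. */
    sep = strstr(line, " - ");
    if (!sep)
        return -1;

    if (sscanf(sep + 3, "%63s %255s", out->fstype, out->source) != 2)
        return -1;
    return 0;
}
```

The source string parsed this way is what a tool would compare against
the first column of fstab.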

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: Using the upcoming fsinfo()
  2019-05-15 11:45     ` Karel Zak
@ 2019-05-16  0:13       ` Ian Kent
  2019-05-21 19:21         ` L A Walsh
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Kent @ 2019-05-16  0:13 UTC
  To: Karel Zak; +Cc: util-linux

On Wed, 2019-05-15 at 13:45 +0200, Karel Zak wrote:
> On Tue, May 14, 2019 at 08:23:02AM +0800, Ian Kent wrote:
> > On Mon, 2019-05-13 at 11:08 +0200, Karel Zak wrote:
> > > The nice place where is ugly overhead with the current mountinfo is
> > > context_umount.c code, see lookup_umount_fs() and
> > > mnt_context_find_umount_fs(). In this code we have mountpoint and we
> > > need more information about it (due to redirection to umount.<type>
> > > helpers, userspace mount options, etc.). It sounds like ideal to use
> > > mnt_fsinfo_fill_fs() if possible.
> > 
> > That sounds like an ideal opportunity for improvement by using
> > fsinfo(). I'll look there too.
> 
> Yes.
> 
> > > The most visible change will be to use mnt_fsinfo_fill_table() with in
> > > mnt_table_parse_file() if the file name is "/proc/self/mountinfo".
> > > This will be huge improvement as we use this function in systemd on
> > > each mount table change...
> > > 
> > > The question is how easily will be to replace mountinfo with fsinfo().
> 
> Now when I think about it I'm not sure if create complete mount table
> by fsinfo() is the right way. Maybe many fsinfo() calls will be more
> slow than generate mountinfo file in kernel and read() in userspace.
> Not sure.

I'm not sure about the comparison in overhead of this either.

But it's something that needs to be done to get familiar with how
to use fsinfo() and to work out what else needs to be done.

As you know this has already shown that getting file system specific
options isn't working yet for most file systems and I'll need to
implement the missing ->fsinfo() super block operation for (at least
some, probably many) file systems just to continue the work.

There is a slightly less obvious difference using fsinfo() over
the proc file system to get the whole mount table.

When you open a proc file system mount table the kernel takes locks
that will prevent (at least) mount, umount and remount actions until
the proc file is closed.

With fsinfo() the locks need to be taken but at a much finer
granularity so mount actions can continue in parallel.

There's a price for that locking improvement though: if you're trying
to get a whole consistent mount table you need to check it hasn't
changed while you read it and, if it has, you need to start over.
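
The read-and-recheck pattern could look roughly like this. How fsinfo()
would expose a "table changed" indication is not settled in this
thread, so the change counter here is purely an assumption, hidden
behind the hypothetical read_gen_fn callback; fake_source and its
helpers only simulate a table that changes during the first few reads.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callbacks: fetch a change counter, read the table. */
typedef unsigned long (*read_gen_fn)(void *ctx);
typedef int (*read_table_fn)(void *ctx);

/*
 * Read the table, then verify the counter did not move meanwhile.
 * Returns the number of attempts used, a negative error from
 * read_table, or -EAGAIN if the table never held still.
 */
static int read_consistent(void *ctx, read_gen_fn gen,
                           read_table_fn read_table, int max_tries)
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        unsigned long before = gen(ctx);
        int rc = read_table(ctx);

        if (rc < 0)
            return rc;
        if (gen(ctx) == before)
            return attempt;  /* snapshot is consistent */
        /* table changed while reading: start over */
    }
    return -EAGAIN;
}

/* Simulated source: the counter moves during the first few reads. */
struct fake_source {
    unsigned long gen;
    int changes_left;
};

static unsigned long fake_gen(void *ctx)
{
    return ((struct fake_source *)ctx)->gen;
}

static int fake_read(void *ctx)
{
    struct fake_source *s = ctx;

    if (s->changes_left > 0) {
        s->changes_left--;
        s->gen++;  /* a mount or umount happened mid-read */
    }
    return 0;
}
```

Bounding the retries matters on a busy system, where the table may
change faster than it can be read and the caller needs a way out.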

So getting the whole table with fsinfo() will definitely need to be
evaluated against using the proc file system.

But there's quite a lot of processing that happens when the kernel
issues proc mount records, the path calculations are quite expensive
for example, so the difference isn't clear cut.

> 
> > I've been looking at libmount but I'm not sure I was focusing on
> > libmnt_fs so I'm not sure yet.
> > 
> > A large part of doing this early is to find out what's missing
> > and see if it's possible to update fsinfo().
> 
> Yes, it would be really nice if we can get all info from fsinfo(). It
> opens a new possibilities for us to make things like umount, remount,
> and systemd more effective.

I think we will be able to but probably not for a while, there's
quite a bit still to do for fsinfo() by the look of it.

Excessive resource usage of systemd is one of the main motivations
for me doing this so improving that is at the top of the priority
list for me.

> 
> > For example, the devanme in mountinfo which can be different to
> > the devname returned by fsinfo(), David has said it's not straight
> > forward to change but at least he's aware of it and thinking about
> > it.
> 
> Do you mean "source" field (9th column in mountinfo)? The device is
> defined by maj:min (3rd column) in the file (well, whatever the devno
> means for things like btrfs;-).

I do.

> 
> The "source" should be unmodified string as specified in userspace for
> mount(2) syscall, otherwise things like "mount -a" can not compare the
> kernel mount table with fstab.

This string isn't always a string value that comes from the mount
kernel structure; the proc file system needs to call upon the
file system to get it in some cases.

For example when you see an NFS <server>:<path> in the proc file system
output.

To get this string the proc file system checks if the file system provides
a super block operation ->show_devname() and calls it to get the name
otherwise it copies the string from the mount structure.
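
A userspace model of that decision, for illustration only. The struct
and function names are invented and much simpler than the kernel's,
and the "none" fallback for a missing device name is an assumption
about the kernel's behaviour rather than something stated here.

```c
#include <stdio.h>

/* Simplified stand-in for the super block operations table. */
struct model_sb_ops {
    /* Optional hook; NULL when the file system has nothing special. */
    int (*show_devname)(char *buf, size_t len, const void *fs_private);
};

/* Simplified stand-in for a mount plus its super block. */
struct model_mount {
    const char *devname;            /* string stored at mount time */
    const struct model_sb_ops *ops; /* may be NULL in this model */
    const void *fs_private;         /* file system specific data */
};

/* Produce the "source" string the way the text describes it. */
static int model_source(const struct model_mount *m, char *buf, size_t len)
{
    if (m->ops && m->ops->show_devname)
        return m->ops->show_devname(buf, len, m->fs_private);

    /* No hook: copy the stored string ("none" is an assumed default). */
    snprintf(buf, len, "%s", m->devname ? m->devname : "none");
    return 0;
}

/* Example hook that builds an NFS-like <server>:<path> name. */
struct nfs_like {
    const char *server;
    const char *path;
};

static int nfs_like_show_devname(char *buf, size_t len, const void *priv)
{
    const struct nfs_like *n = priv;

    snprintf(buf, len, "%s:%s", n->server, n->path);
    return 0;
}
```

The point of the model is that the source string can be computed by
the file system at read time, which is why a plain copy of the stored
name cannot reproduce it in every case.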

As David says, to deal with this it isn't as simple as adding an fsinfo()
request because there are cases where it can have multiple values.

A similar thing is done for field 4 where, if the file system defines
a super block operation ->show_path() it will be called to get the
path, otherwise it's calculated using mount's root. Interestingly NFS
appears to always return "/" for this from its ->show_path() function.

And, as I mentioned above, there's the needed ->fsinfo() super operation
to cover the use of the existing ->show_options() operation (provided
by pretty much all file systems) to get the file system specific options.

So there's quite a bit of detail to be worked out for fsinfo() to be
able to correctly provide all mount information.

But, hey, that was the point of doing this now.

Ian



* Re: Using the upcoming fsinfo()
  2019-05-16  0:13       ` Ian Kent
@ 2019-05-21 19:21         ` L A Walsh
  2019-05-22  2:59           ` Ian Kent
  0 siblings, 1 reply; 15+ messages in thread
From: L A Walsh @ 2019-05-21 19:21 UTC
  To: Ian Kent; +Cc: Karel Zak, util-linux

On 2019/05/15 17:13, Ian Kent wrote:
> And, as I mentioned above, there's the needed ->fsinfo() super operation
> to cover the use of the existing ->show_options() operation (provided
> by pretty much all file systems) to get the file system specific options.
>
> So there's quite a bit of detail to be worked out for fsinfo() to be
> able to correctly provide all mount information.
>
> But, hey, that was the point of doing this now.
>   
----
	Maybe this is already planned behind the scenes, but I wanted to
throw out my own suggestion -- and that is to start with the new 
system call usage in its own cmdline tool that can be used just to call
or exercise the new call -- effectively allowing calling the new kernel call
from any shell based program -- allowing for a passthrough type operation.

	This serves to work out that the call always returns what you
expect it to, to build familiarity with the new call and how it works,
and to develop a first interface to construct and parse calls-to and
output-from the call.

	From there -- those first options could be moved to only 
be used with '--raw' or '--direct' switch with a new switch associated
with, perhaps another util that may eventually be replaced  with this
code that uses the new utility.

	All of that could be done along with a continuing build and
release of the older tools until such time as the new call-using
tool replaces all of the old tool to whatever standard is wanted.

	That way, it could allow not disturbing old code
while code is developed for using the new interface, allowing for
a seamless switch sometime later with the old progs being left around
for a release with some 'old' prefix and eventually not built by default
and moved to the project's "attic" later on.

	This can allow for an extended period of feedback & development
until all users are comfy w/the new tool (which might, in some cases,
have an option to generate the same output as the old tool, but using
the new call, for older scripts that might be less easy to update).

Anyway, just my general caution in code rewrites replacing old libs & utils.
And again, please forgive my saying something that may be self-evident,
standard procedure, or already planned, but just not detailed on list.


-linda




* Re: Using the upcoming fsinfo()
  2019-05-21 19:21         ` L A Walsh
@ 2019-05-22  2:59           ` Ian Kent
  2019-05-22  3:12             ` Ian Kent
  2019-05-22  4:28             ` L A Walsh
  0 siblings, 2 replies; 15+ messages in thread
From: Ian Kent @ 2019-05-22  2:59 UTC
  To: L A Walsh; +Cc: Karel Zak, util-linux

On Tue, 2019-05-21 at 12:21 -0700, L A Walsh wrote:
> On 2019/05/15 17:13, Ian Kent wrote:
> > And, as I mentioned above, there's the needed ->fsinfo() super operation
> > to cover the use of the existing ->show_options() operation (provided
> > by pretty much all file systems) to get the file system specific options.
> > 
> > So there's quite a bit of detail to be worked out for fsinfo() to be
> > able to correctly provide all mount information.
> > 
> > But, hey, that was the point of doing this now.
> >   
> 
> ----
> 	Maybe this is already planned behind the scenes, but I wanted to
> throw out my own suggestion -- and that is to start with the new 
> system call usage in its own cmdline tool that can be used just to call
> or exercise the new call -- effectively allowing calling the new kernel call
> from any shell based program -- allowing for a passthrough type operation.

I hadn't planned on producing a utility but I do have code that I've
been using to learn how to use the call.

I could turn that into a utility for use from scripts at some point.

> 
> 	This serves to workout that the call always returns what you 
> expect it to, familiarity with the new call and how it works as well as
> developing a first interface to construct and parse calls-to and 
> output-from the call.

Avoiding having to parse string output (from the proc file system
mount tables) is one of the key reasons to use a system call for
this.

So this isn't the point of doing it.

The work for this (and some other new system calls) is being done
in the kernel so the issue isn't to work out what the system call
returns as much as it is to ensure the system call provides what's
needed, implement things that aren't yet done and work out ways of
providing things that are needed but can't yet be provided.

> 
> 	From there -- those first options could be moved to only 
> be used with '--raw' or '--direct' switch with a new switch associated
> with, perhaps another util that may eventually be replaced  with this
> code that uses the new utility.
> 
> 	All of that could be done along with a continuing build and
> release of the older tools until such time as the new call-using
> tool replaces all of the old tool to whatever standard is wanted.

I haven't looked at the tools at all.

It may be worth looking at them, but forking and exec'ing a program
and then parsing its text output isn't usually the way these
utilities should work.

> 
> 	That way, it could allow not disturbing old code
> while code is developed for using the new interface, allowing for
> a seamless switch sometime later with the old progs being left around
> for a release with some 'old' prefix and eventually not built by default
> and moved to the project's "attic" later on.
> 
> 	This can allow for an extended period of feedback & development
> until all users are comfy w/the new tool (which might, in some cases,
> have an option to generate the same output as the old tool (but using
> the new call) for older scripts that might be less easy to update.
> 
> Anyway, just my general caution in code rewrites replacing old libs & utils.
> And again, please forgive my saying something that may be self-evident,
> standard procedure, or already planned, but just not detailed on list.

The focus is on eliminating the need to read the proc file system
mount tables including getting the mount information for any single
mount.

When these tables are large and there's a fair bit of mount/umount
activity this can be a significant problem.

Getting this information usually means reading, on average, half of
the whole mount table every time, and it's not possible to get
information on a single mount without doing this.
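
As a rough illustration of that cost, here is a minimal sketch of a
single-mount lookup done the file-based way. It assumes the documented
/proc/self/mountinfo field layout; the helper names and the sample
table are made up for the example, and fsinfo() itself is not shown
because its interface is still being worked out.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical four-entry table in /proc/self/mountinfo format.
SAMPLE = """\
22 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
23 22 0:21 / /proc rw,nosuid,nodev,noexec,relatime shared:12 - proc proc rw
24 22 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
25 22 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,size=8192k
"""


def parse_mountinfo_line(line: str) -> Dict[str, str]:
    """Split one mountinfo line into the fields used below."""
    # Fields before the " - " separator are per-mount, those after it
    # describe the file system type, source and superblock options.
    left, _, right = line.rstrip("\n").partition(" - ")
    head = left.split()
    tail = right.split(" ", 2)
    return {
        "mount_id": head[0],
        "parent_id": head[1],
        "mount_point": head[4],
        "mount_options": head[5],
        "fs_type": tail[0],
        "source": tail[1],
        "super_options": tail[2] if len(tail) > 2 else "",
    }


def find_mount(
    lines: Iterable[str], mount_point: str
) -> Tuple[Optional[Dict[str, str]], int]:
    """Linear scan for a mount point; also report how many lines were parsed.

    Simplification: returns the first match. With stacked mounts a real
    consumer would have to read the whole table and keep the last match.
    """
    scanned = 0
    for line in lines:
        scanned += 1
        entry = parse_mountinfo_line(line)
        if entry["mount_point"] == mount_point:
            return entry, scanned
    return None, scanned


if __name__ == "__main__":
    entry, scanned = find_mount(SAMPLE.splitlines(), "/sys")
    print(entry, "found after parsing", scanned, "lines")
```

Every lookup parses text from the top of the table, so the work grows
with the number of mounts even when only one entry is wanted.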

Ian



* Re: Using the upcoming fsinfo()
  2019-05-22  2:59           ` Ian Kent
@ 2019-05-22  3:12             ` Ian Kent
  2019-05-22  4:28             ` L A Walsh
  1 sibling, 0 replies; 15+ messages in thread
From: Ian Kent @ 2019-05-22  3:12 UTC (permalink / raw)
  To: L A Walsh; +Cc: Karel Zak, util-linux

On Wed, 2019-05-22 at 10:59 +0800, Ian Kent wrote:
> 
> > 	This serves to workout that the call always returns what you 
> > expect it to, familiarity with the new call and how it works as well as
> > developing a first interface to construct and parse calls-to and 
> > output-from the call.
> 
> Avoiding having to parse string output (from the proc file system
> mount tables) is one of the key reasons to use a system call for
> this.
> 
> So this isn't the point of doing it.
> 
> The work for this (and some other new system calls) is being done
> in the kernel so the issue isn't to work out what the system call
> returns as much as it is to ensure the system call provides what's
> needed, implement things that aren't yet done and work out ways of
> providing things that are needed but can't yet be provided.

Just to give an idea of the amount of work that still needs to be
done: there are around 70 file systems included in the Linux kernel
and, so far, the code needed to provide the file system specific
mount options via fsinfo() has been done for a little over 10 of
them (about 8 of these in the last few days), and most of those are
the simpler ones.

But, having said that, providing the file system specific mount
options appears to be one of only a couple of things that are missing.

Ian



* Re: Using the upcoming fsinfo()
  2019-05-22  2:59           ` Ian Kent
  2019-05-22  3:12             ` Ian Kent
@ 2019-05-22  4:28             ` L A Walsh
  2019-05-22 13:14               ` Ian Kent
  1 sibling, 1 reply; 15+ messages in thread
From: L A Walsh @ 2019-05-22  4:28 UTC (permalink / raw)
  To: Ian Kent, Karel Zak, util-linux

On 2019/05/21 19:59, Ian Kent wrote:
> I hadn't planned on producing a utility but I do have code that I've
> been using to learn how to use the call.
>
> I could turn that into a utility for use from scripts at some point.
>   
---
Not required, but I thought it might allow for more types of
tests/usages.  If it is really of limited or no benefit, I'm not
gonna lose sleep.
> Avoiding having to parse string output (from the proc file system
> mount tables) is one of the key reasons to use a system call for
> this.
>
> So this isn't the point of doing it.
>   
I get that... this wasn't intended as an 'endpoint', just a way for
those not implementing and using the calls to get a feel for the call.
It may not serve a useful purpose in this case, but some system calls
have direct user-utils that are very useful.  The lack of a system util
to manipulate the pty calls forced me to write a few-line 'C' prog just
to make 1 call to approve something.  Eventually switched to a more
robust interface in perl.
> The work for this (and some other new system calls) is being done
> in the kernel so the issue isn't to work out what the system call
> returns as much as it is to ensure the system call provides what's
> needed, implement things that aren't yet done and work out ways of
> providing things that are needed but can't yet be provided.
>   
----
  No basic testing that the kernel call is producing exactly what you are
expecting is needed, I take it.
>   
>> 	From there -- those first options could be moved to only 
>> be used with '--raw' or '--direct' switch with a new switch associated
>> with, perhaps another util that may eventually be replaced  with this
>> code that uses the new utility.
>>
>> 	All of that could be done along with a continuing build and
>> release of the older tools until such time as the new call-using
>> tool replaces all of the old tool to whatever standard is wanted.
>>     
>
> I haven't looked at the tools at all.
>
> It may be worth looking at them but fork and exec a program then
> parse text output isn't usually the way these utilities should
> work.
>   
----
  That wasn't what I meant -- just that if you implement functionality in
a test prog, eventually you would be able to library-ize the call for other
purposes.  I got the impression th
> The focus is on eliminating the need to read the proc file system
> mount tables including getting the mount information for any single
> mount.
>
> When these tables are large and there's a fair bit of mount/umount
> activity this can be a significant problem.
>
> Getting this information usually means reading on average half of
> the whole mount table every time and it's not possible to get info.
> on a single mount without doing this.
>   
----
    That sounds like a deficiency in the way mount tables are displayed.

Just like you can look at all net-io with a device name in column 0,
there's another directory where each device is a filename entry and by
looking at that you can just look at the stats of that 1 file.

Block devices have the same type of all-or-single readouts as well.

So why not mounts?

I.e. why not subdirs for 'by-mountpoint', or by-device, or
whole-dev-vs.partition, or by UUID....like some things are listed
in /dev.  That would allow you to narrow in on the mount you want for
doing whatever.

The advantage of putting it in proc is that everyone easily benefits
from a portable, easy-to-read interface, whereas binary interfaces are
what make windows windows, with text interfaces on linux allowing for
easy prototyping and creative usages.

Just this one part -- of wanting a kernel call just to narrow scope --
seems like a perfect reason to add different ways of addressing mounts
by different keywords.



> Ian
>
>   



* Re: Using the upcoming fsinfo()
  2019-05-22  4:28             ` L A Walsh
@ 2019-05-22 13:14               ` Ian Kent
  2019-05-22 13:55                 ` Karel Zak
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Kent @ 2019-05-22 13:14 UTC (permalink / raw)
  To: L A Walsh, Karel Zak, util-linux

On Tue, 2019-05-21 at 21:28 -0700, L A Walsh wrote:
> On 2019/05/21 19:59, Ian Kent wrote:
> > I hadn't planned on producing a utility but I do have code that I've
> > been using to learn how to use the call.
> > 
> > I could turn that into a utility for use from scripts at some point.
> >   
> 
> ---
>      not required, but thought it might allow for more types of
> tests/usages.
> If it is really of limited or no benefit, I'm not gonna lose sleep.
> > Avoiding having to parse string output (from the proc file system
> > mount tables) is one of the key reasons to use a system call for
> > this.
> > 
> > So this isn't the point of doing it.
> >   
> 
> I get that....this wasn't intended as an 'endpoint' just a way for those
> not
> implementing and using the calls to get a feel for the call.  It may
> not serve
> a useful purpose in this case, but some system calls have direct
> user-utils that
> are very useful.  The lack of a system util to manipulate the pty calls
> forced
> me to write a few-line 'C' prog just to make 1 call to approve
> something.  Eventually switched to a more robust interface in perl.

We will see, I will end up with something that's more or less example
usage anyway.

There will be a fairly complex example in the kernel source tree too,
along with other examples, in the samples/vfs directory.

> > The work for this (and some other new system calls) is being done
> > in the kernel so the issue isn't to work out what the system call
> > returns as much as it is to ensure the system call provides what's
> > needed, implement things that aren't yet done and work out ways of
> > providing things that are needed but can't yet be provided.
> >   
> 
> ----
>   No basic testing that the kernel call is producing exactly what you are
> expecting is needed, I take it.

Right, that's why I have written some code.

> >   
> > > 	From there -- those first options could be moved to only 
> > > be used with '--raw' or '--direct' switch with a new switch associated
> > > with, perhaps another util that may eventually be replaced  with this
> > > code that uses the new utility.
> > > 
> > > 	All of that could be done along with a continuing build and
> > > release of the older tools until such time as the new call-using
> > > tool replaces all of the old tool to whatever standard is wanted.
> > >     
> > 
> > I haven't looked at the tools at all.
> > 
> > It may be worth looking at them but fork and exec a program then
> > parse text output isn't usually the way these utilities should
> > work.
> >   
> 
> ----
>   That wasn't what I meant -- just that if you implement functionality in
> a test prog, eventually you would be able to library-ize the call for other
> purposes.  I got the impression th

Yes, the investigative code I write will make its way into
whatever is done.

> > The focus is on eliminating the need to read the proc file system
> > mount tables including getting the mount information for any single
> > mount.
> > 
> > When these tables are large and there's a fair bit of mount/umount
> > activity this can be a significant problem.
> > 
> > Getting this information usually means reading on average half of
> > the whole mount table every time and it's not possible to get info.
> > on a single mount without doing this.
> >   
> 
> ----
>     That sounds like a deficiency in the way mount tables are displayed.

Displayed is probably not the right word; generated is closer to
what happens in the kernel.

> 
> Just like you can look at all net-io with a device name in column 0,
> there's another directory where each device is a filename entry and by
> looking at that
> you can just look at the stats of that 1 file.
> 
> Block devices have the same type of all-or-single readouts as well.
> 
> So why not mounts?

That's worth some thought but I don't think it will work in
this case.

People will take a copy of the information provided in proc
and then use it to look up a mount. So the list of mounts in
the kernel still has to be read to generate that copy before
you can find the piece of information that identifies a mount.

And you would still need to traverse the list of mounts to
generate any given view of this information on every access
too so there's not much to be gained since that's what causes
the problem with heavy mount table usage in the first place.

It's not like a fairly static device that will stay around
for a reasonable amount of time, and where the code to
maintain the proc or sysfs entries is local to a particular
driver or file system so the code is localized to a particular
sub-system and therefore reasonably maintainable.

In this case the list of mounts is present in the core VFS
and the VFS needs to cater for all the places where mounts
can be made and accessed.

And there can be significant and frequent changes to mount
information which is another reason it needs to be generated
on access.

Keep in mind the goal of the mount structures is not to make
information about them available but to make the operations
that need to be done on them doable in a sensible amount of
time.

> 
> I.e. why not subdirs for 'by-mountpoint', or by-device, or
> whole-dev-vs.partition, or by UUID....like some things are listed
> in /dev.  That would allow you to narrow in on the mount you want for
> doing whatever.

TBH, I can't see that amount of code being added to the VFS
for this.

Simple annoyances, such as some mounts not having a UUID or not
having partition devices associated with them, will also cause
inconsistent views of the mounts.

It's unlikely anyone would be willing to do it if only because
it would make an already complex body of code much, much harder
to maintain.

A system call is a simpler way to make this available while also
being a fairly concentrated body of code which is much easier to
maintain.

Don't forget that any given process can have a different view of
the list of mounts based on mount namespace which is one thing
that makes the VFS mount code quite complex.

Ian



* Re: Using the upcoming fsinfo()
  2019-05-22 13:14               ` Ian Kent
@ 2019-05-22 13:55                 ` Karel Zak
  2019-05-23  1:27                   ` Ian Kent
  0 siblings, 1 reply; 15+ messages in thread
From: Karel Zak @ 2019-05-22 13:55 UTC (permalink / raw)
  To: Ian Kent; +Cc: L A Walsh, util-linux

On Wed, May 22, 2019 at 09:14:37PM +0800, Ian Kent wrote:
> On Tue, 2019-05-21 at 21:28 -0700, L A Walsh wrote:
> > On 2019/05/21 19:59, Ian Kent wrote:
> > > I hadn't planned on producing a utility but I do have code that I've
> > > been using to learn how to use the call.
> > > 
> > > I could turn that into a utility for use from scripts at some point.
> > >   
> > 
> > ---
> >      not required, but thought it might allow for more types of
> > tests/usages.
> > If it is really of limited or no benefit, I'm not gonna lose sleep.
> > > Avoiding having to parse string output (from the proc file system
> > > mount tables) is one of the key reasons to use a system call for
> > > this.
> > > 
> > > So this isn't the point of doing it.
> > >   
> > 
> > I get that....this wasn't intended as an 'endpoint' just a way for those
> > not
> > implementing and using the calls to get a feel for the call.  It may
> > not serve
> > a useful purpose in this case, but some system calls have direct
> > user-utils that
> > are very useful.  The lack of a system util to manipulate the pty calls
> > forced
> > me to write a few-line 'C' prog just to make 1 call to approve
> > something.  Eventually switched to a more robust interface in perl.
> 
> We will see, I will end up with something that's more or less example
> usage anyway.

I'd like to write something like "mountsh" one day. The idea is to
have a very low-level tool that provides a command line interface
to all fragments of the new mount API at the same granularity as
the kernel provides (mount(8) is too high-level in this case).

Anyway, the primary goal is to use the new syscalls in the standard
places (e.g. libmount) where they improve performance.

> > I.e. why not subdirs for 'by-mountpoint', or by-device, or
> > whole-dev-vs.partition, or by UUID....like some things are listed
> > in /dev.  That would allow you to narrow in on the mount you want for
> > doing whatever.
> 
> TBH, I can't see that amount of code being added to the VFS
> for this.
> 
> Simple annoyances like some mounts won't have a UUID, or won't
> have partition devices associated with them will also cause
> inconsistent views of the mounts.

or several filesystems mounted on the same mountpoint, a mountpoint
that has been deleted, etc...

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: Using the upcoming fsinfo()
  2019-05-22 13:55                 ` Karel Zak
@ 2019-05-23  1:27                   ` Ian Kent
  0 siblings, 0 replies; 15+ messages in thread
From: Ian Kent @ 2019-05-23  1:27 UTC (permalink / raw)
  To: Karel Zak; +Cc: L A Walsh, util-linux

On Wed, 2019-05-22 at 15:55 +0200, Karel Zak wrote:
> On Wed, May 22, 2019 at 09:14:37PM +0800, Ian Kent wrote:
> > On Tue, 2019-05-21 at 21:28 -0700, L A Walsh wrote:
> > > On 2019/05/21 19:59, Ian Kent wrote:
> > > > I hadn't planned on producing a utility but I do have code that I've
> > > > been using to learn how to use the call.
> > > > 
> > > > I could turn that into a utility for use from scripts at some point.
> > > >   
> > > 
> > > ---
> > >      not required, but thought it might allow for more types of
> > > tests/usages.
> > > If it is really of limited or no benefit, I'm not gonna lose sleep.
> > > > Avoiding having to parse string output (from the proc file system
> > > > mount tables) is one of the key reasons to use a system call for
> > > > this.
> > > > 
> > > > So this isn't the point of doing it.
> > > >   
> > > 
> > > I get that....this wasn't intended as an 'endpoint' just a way for those
> > > not
> > > implementing and using the calls to get a feel for the call.  It may
> > > not serve
> > > a useful purpose in this case, but some system calls have direct
> > > user-utils that
> > > are very useful.  The lack of a system util to manipulate the pty calls
> > > forced
> > > me to write a few-line 'C' prog just to make 1 call to approve
> > > something.  Eventually switched to a more robust interface in perl.
> > 
> > We will see, I will end up with something that's more or less example
> > usage anyway.
> 
> I'd like to write something like "mountsh" one day. The idea is to
> have very low-level tool that is able to provide command line
> interface to the all fragments of the new mount API in the same
> granularity as provided by kernel (mount(8) is too high-level in this
> case).

There's a fairly simple example of using several of the mount-api
calls in samples/vfs/test-fsmount.c.

There's also the in-kernel mount-api documentation at
Documentation/filesystems/mount_api.txt, although that's more
oriented to usage within the kernel.

I was wondering if kernel file systems that have not been converted
to use the new api (but use the legacy mount-api kernel code) will
work properly with the new mount-api? I think they would have to
for the mount-api to be viable but I'm not sure.

LOL, I remember, all those years ago, when you set out to write
libmount and I wanted to convert autofs to use it.

Sadly I got swamped with other work and ended up more concerned
about eliminating proc mount table usage wherever possible in
autofs, but with the fsinfo() and mount-api changes I should be
able to change autofs to use libmount.

After all these years I'll finally be able to get meaningful
error codes that I simply can't get from mount(8) or mount.nfs(8).

The autofs kernel module has been capable of passing these back to
user space for years now and there shouldn't be too many autofs
user space changes needed.

But there's a lot of work to be done on libmount and we absolutely
must keep libmount stable all the way so it's a big challenge.

> 
> Anyway, the primary goal is to use the new syscalls on standard
> places (e.g. libmount) where it improves performance.
> 
> > > I.e. why not subdirs for 'by-mountpoint', or by-device, or
> > > whole-dev-vs.partition, or by UUID....like some things are listed
> > > in /dev.  That would allow you to narrow in on the mount you want for
> > > doing whatever.
> > 
> > TBH, I can't see that amount of code being added to the VFS
> > for this.
> > 
> > Simple annoyances like some mounts won't have a UUID, or won't
> > have partition devices associated with them will also cause
> > inconsistent views of the mounts.
> 
> or more filesystems mounted on the same mountpoint, mountpoint is
> deleted, etc...
> 
>     Karel
> 



Thread overview: 15+ messages
2019-05-13  5:33 Using the upcoming fsinfo() Ian Kent
2019-05-13  9:08 ` Karel Zak
2019-05-13 16:04   ` Bruce Dubbs
2019-05-14  0:04     ` Ian Kent
2019-05-15 11:27     ` Karel Zak
2019-05-14  0:23   ` Ian Kent
2019-05-15 11:45     ` Karel Zak
2019-05-16  0:13       ` Ian Kent
2019-05-21 19:21         ` L A Walsh
2019-05-22  2:59           ` Ian Kent
2019-05-22  3:12             ` Ian Kent
2019-05-22  4:28             ` L A Walsh
2019-05-22 13:14               ` Ian Kent
2019-05-22 13:55                 ` Karel Zak
2019-05-23  1:27                   ` Ian Kent
