From: Ian Kent <raven@themaw.net>
To: L A Walsh <lkml@tlinx.org>, Karel Zak <kzak@redhat.com>,
	util-linux@vger.kernel.org
Subject: Re: Using the upcoming fsinfo()
Date: Wed, 22 May 2019 21:14:37 +0800
Message-ID: <562668febcf41e61ab2477235ee7ce4610bc1e8d.camel@themaw.net>
In-Reply-To: <5CE4CFEF.5020100@tlinx.org>

On Tue, 2019-05-21 at 21:28 -0700, L A Walsh wrote:
> On 2019/05/21 19:59, Ian Kent wrote:
> > I hadn't planned on producing a utility but I do have code that I've
> > been using to learn how to use the call.
> > 
> > I could turn that into a utility for use from scripts at some point.
> >   
> 
> ---
>      not required, but thought it might allow for more types of
> tests/usages.  If it is really of limited or no benefit, I'm not
> gonna lose sleep.
> > Avoiding having to parse string output (from the proc file system
> > mount tables) is one of the key reasons to use a system call for
> > this.
> > 
> > So this isn't the point of doing it.
> >   
> 
> I get that... this wasn't intended as an 'endpoint', just a way for
> those not implementing and using the calls to get a feel for the
> call.  It may not serve a useful purpose in this case, but some
> system calls have direct user-utils that are very useful.  The lack
> of a system util to manipulate the pty calls forced me to write a
> few-line 'C' prog just to make one call to approve something.
> Eventually I switched to a more robust interface in perl.

We will see; I will end up with something that's more or less example
usage anyway.

There will be a fairly complex example in the kernel source tree too,
along with other examples, in the samples/vfs directory.
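
For a rough idea of what a call looks like, here is a minimal sketch
using syscall(2).  It is based on the proposed, not yet merged,
interface, so the syscall number (a placeholder here), the parameter
struct layout and the attribute value are all assumptions taken from
the patch set and may well change:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <linux/types.h>

#ifndef __NR_fsinfo
#define __NR_fsinfo 436		/* placeholder: the number has changed
				 * across revisions of the patch set */
#endif

#define FSINFO_ATTR_STATFS 0	/* assumption: attribute ID from the patches */

/* Parameter block as proposed in the patch series. */
struct fsinfo_params {
	__u32 at_flags;		/* AT_* pathname resolution flags */
	__u32 request;		/* which attribute to retrieve */
	__u32 Nth;		/* instance of the attribute */
	__u32 Mth;		/* sub-instance of the Nth instance */
	__u64 __reserved[3];
};

int main(int argc, char *argv[])
{
	struct fsinfo_params params;
	char buf[4096];
	long ret;

	memset(&params, 0, sizeof(params));
	params.request = FSINFO_ATTR_STATFS;

	/* Query the mount containing the given path directly, without
	 * reading or parsing the whole proc mount table. */
	ret = syscall(__NR_fsinfo, AT_FDCWD, argc > 1 ? argv[1] : "/",
		      &params, buf, sizeof(buf));
	if (ret == -1) {
		perror("fsinfo");
		return EXIT_FAILURE;
	}
	printf("fsinfo returned %ld bytes\n", ret);
	return EXIT_SUCCESS;
}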

> > The work for this (and some other new system calls) is being done
> > in the kernel so the issue isn't to work out what the system call
> > returns as much as it is to ensure the system call provides what's
> > needed, implement things that aren't yet done and work out ways of
> > providing things that are needed but can't yet be provided.
> >   
> 
> ----
>   No basic testing that the kernel call is producing exactly what you are
> expecting is needed, I take it.

Right, that's why I have written some code.

> >   
> > > From there -- those first options could be moved to only be used
> > > with a '--raw' or '--direct' switch, with a new switch associated
> > > with, perhaps, another util that may eventually be replaced with
> > > this code that uses the new utility.
> > >
> > > All of that could be done along with a continuing build and
> > > release of the older tools until such time as the new call-using
> > > tool replaces all of the old tools to whatever standard is wanted.
> > >     
> > 
> > I haven't looked at the tools at all.
> > 
> > It may be worth looking at them, but fork-and-exec of a program
> > followed by parsing its text output isn't usually the way these
> > utilities should work.
> >   
> 
> ----
>   That wasn't what I meant -- just that if you implement functionality
> in a test prog, eventually you would be able to library-ize the call
> for other purposes.  I got the impression th

Yes, the investigative code I write will make its way into
whatever is done.

> > The focus is on eliminating the need to read the proc file system
> > mount tables, including when getting the mount information for any
> > single mount.
> > 
> > When these tables are large and there's a fair bit of mount/umount
> > activity this can be a significant problem.
> > 
> > Getting this information usually means reading, on average, half
> > of the whole mount table every time, and it's not possible to get
> > information on a single mount without doing this.
> >   
> 
> ----
>     That sounds like a deficiency in the way mount tables are displayed.

"Displayed" is probably not the right word; "generated" is closer to
what happens in the kernel.
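
To make the cost concrete, this is roughly what every lookup does
today with the standard getmntent(3) interface (the "/home" target
below is just illustrative): the kernel regenerates the whole mount
table for the read, and userspace scans it entry by entry.

#include <stdio.h>
#include <string.h>
#include <mntent.h>

int main(void)
{
	const char *target = "/home";	/* illustrative mount point */
	struct mntent *ent;
	FILE *f;

	/* Every open+read of this file makes the kernel generate the
	 * full mount table for this mount namespace. */
	f = setmntent("/proc/self/mounts", "r");
	if (!f) {
		perror("setmntent");
		return 1;
	}

	/* On average we scan half the entries to find one mount. */
	while ((ent = getmntent(f)) != NULL) {
		if (strcmp(ent->mnt_dir, target) == 0) {
			printf("%s on %s type %s\n",
			       ent->mnt_fsname, ent->mnt_dir,
			       ent->mnt_type);
			break;
		}
	}
	endmntent(f);
	return 0;
}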

> 
> Just like you can look at all net-io with a device name in column 0,
> there's another directory where each device is a filename entry, and
> by looking at that you can just look at the stats of that 1 file.
> 
> Block devices have the same type of all-or-single readouts as well.
> 
> So why not mounts?

That's worth some thought but I don't think it will work in
this case.

People will take a copy of the information provided in proc and
then use it to look up a mount.  So you still need to read the
kernel's list of mounts to generate that information in the first
place, just to find the piece of it that identifies a mount.

And you would still need to traverse the list of mounts to generate
any given view of this information on every access, so there's not
much to be gained: that traversal is what causes the problem with
heavy mount table usage in the first place.

It's not like a fairly static device that stays around for a
reasonable amount of time, where the code to maintain the proc or
sysfs entries lives in a particular driver or file system, so that
the code is localized to one sub-system and therefore reasonably
maintainable.

In this case the list of mounts is present in the core VFS
and the VFS needs to cater for all the places where mounts
can be made and accessed.

And there can be significant and frequent changes to mount
information which is another reason it needs to be generated
on access.

Keep in mind that the goal of the mount structures is not to make
information about them available, but to make the operations that
need to be done on them doable in a sensible amount of time.

> 
> I.e. why not subdirs for 'by-mountpoint', or by-device, or
> whole-dev-vs.partition, or by UUID... like some things are listed
> in /dev.  That would allow you to narrow in on the mount you want
> for doing whatever.

TBH, I can't see that amount of code being added to the VFS
for this.

Simple annoyances, like mounts that won't have a UUID or won't have
partition devices associated with them, will also cause inconsistent
views of the mounts.

It's unlikely anyone would be willing to do it if only because
it would make an already complex body of code much, much harder
to maintain.

A system call is a simpler way to make this available while also
being a fairly concentrated body of code which is much easier to
maintain.
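
For comparison, util-linux's libmount already wraps the single-mount
lookup in a convenient API, but underneath it still parses the whole
proc mount table on every use, which is exactly the overhead the
system call avoids.  A minimal sketch (the "/home" target is again
illustrative; build with -lmount):

#include <stdio.h>
#include <libmount/libmount.h>

int main(void)
{
	struct libmnt_table *tb;
	struct libmnt_fs *fs;

	tb = mnt_new_table();
	if (!tb)
		return 1;

	/* Convenient, but this still reads and parses the whole
	 * mount table from proc. */
	if (mnt_table_parse_mtab(tb, NULL) != 0) {
		mnt_unref_table(tb);
		return 1;
	}

	fs = mnt_table_find_target(tb, "/home", MNT_ITER_FORWARD);
	if (fs)
		printf("%s on %s type %s\n",
		       mnt_fs_get_source(fs), mnt_fs_get_target(fs),
		       mnt_fs_get_fstype(fs));

	mnt_unref_table(tb);
	return 0;
}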

Don't forget that any given process can have a different view of
the list of mounts based on its mount namespace, which is one thing
that makes the VFS mount code quite complex.

Ian


Thread overview: 15+ messages
2019-05-13  5:33 Using the upcoming fsinfo() Ian Kent
2019-05-13  9:08 ` Karel Zak
2019-05-13 16:04   ` Bruce Dubbs
2019-05-14  0:04     ` Ian Kent
2019-05-15 11:27     ` Karel Zak
2019-05-14  0:23   ` Ian Kent
2019-05-15 11:45     ` Karel Zak
2019-05-16  0:13       ` Ian Kent
2019-05-21 19:21         ` L A Walsh
2019-05-22  2:59           ` Ian Kent
2019-05-22  3:12             ` Ian Kent
2019-05-22  4:28             ` L A Walsh
2019-05-22 13:14               ` Ian Kent [this message]
2019-05-22 13:55                 ` Karel Zak
2019-05-23  1:27                   ` Ian Kent
