From: Amir Goldstein <amir73il@gmail.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Dave Chinner <david@fromorbit.com>,
"Theodore Ts'o" <tytso@mit.edu>, Karel Zak <kzak@redhat.com>,
Greg KH <gregkh@linuxfoundation.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Linux API <linux-api@vger.kernel.org>,
linux-man <linux-man@vger.kernel.org>,
LSM <linux-security-module@vger.kernel.org>,
Ian Kent <raven@themaw.net>, David Howells <dhowells@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <christian@brauner.io>,
James Bottomley <James.Bottomley@hansenpartnership.com>
Subject: Re: [RFC PATCH] getting misc stats/attributes via xattr API
Date: Mon, 9 May 2022 17:20:50 +0300 [thread overview]
Message-ID: <CAOQ4uxgCSJ2rpJkPy1FkP__7zhaVXO5dnZQXSzvk=fReaZH7Aw@mail.gmail.com> (raw)
In-Reply-To: <20220509124815.vb7d2xj5idhb2wq6@wittgenstein>
On Mon, May 9, 2022 at 3:48 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, May 03, 2022 at 02:23:23PM +0200, Miklos Szeredi wrote:
> > This is a simplification of the getvalues(2) prototype and moving it to the
> > getxattr(2) interface, as suggested by Dave.
> >
> > The patch itself just adds the possibility to retrieve a single line of
> > /proc/$$/mountinfo (which was the basic requirement from which the fsinfo
> > patchset grew out of).
> >
> > But this should be able to serve Amir's per-sb iostats, as well as a host of
> > other cases where some statistic needs to be retrieved from some object. Note:
> > a filesystem object often represents other kinds of objects (such as processes
> > in /proc) so this is not limited to fs attributes.
> >
> > This also opens up the interface to setting attributes via setxattr(2).
> >
> > After some pondering I made the namespace so:
> >
> > : - root
> > bar - an attribute
> > foo: - a folder (can contain attributes and/or folders)
> >
> > The contents of a folder is represented by a null separated list of names.
> >
> > Examples:
> >
> > $ getfattr -etext -n ":" .
> > # file: .
> > :="mnt:\000mntns:"
> >
> > $ getfattr -etext -n ":mnt:" .
> > # file: .
> > :mnt:="info"
> >
> > $ getfattr -etext -n ":mnt:info" .
> > # file: .
> > :mnt:info="21 1 254:0 / / rw,relatime - ext4 /dev/root rw\012"
>
> Hey Miklos,
>
> One comment about this. We really need to have this interface support
> giving us mount options like "relatime" back in numeric form (I assume
> this will be possible.). It is royally annoying having to maintain a
> mapping table in userspace just to do:
>
> relatime -> MS_RELATIME/MOUNT_ATTR_RELATIME
> ro -> MS_RDONLY/MOUNT_ATTR_RDONLY
>
> A library shouldn't be required to use this interface. Conservative
> low-level software that keeps its shared library dependencies minimal
> will need to be able to use that interface without having to go to an
> external library that transforms text-based output to binary form (Which
> I'm very sure will need to happen if we go with a text-based
> interface.).
>
No need for a library.
We can export:
:mnt:attr:flags (in hex format)
> >
> > $ getfattr -etext -n ":mntns:" .
> > # file: .
> > :mntns:="21:\00022:\00024:\00025:\00023:\00026:\00027:\00028:\00029:\00030:\00031:"
> >
> > $ getfattr -etext -n ":mntns:28:" .
> > # file: .
> > :mntns:28:="info"
> >
> > Comments?
>
> I'm not a fan of text-based APIs and I'm particularly not a fan of the
> xattr APIs. But at this point I'm ready to compromise on a lot as long
> as it gets us values out of the kernel in some way. :)
>
> I had to use xattrs extensively in various low-level userspace projects
> and they continue to be a source of races and memory bugs.
>
> A few initial questions:
>
> * The xattr APIs often require the caller to do sm like (copying some go
> code quickly as I have that lying around):
>
> for _, x := range split {
> xattr := string(x)
> // Call Getxattr() twice: First, to determine the size of the
> // buffer we need to allocate to store the extended attributes,
> // second, to actually store the extended attributes in the
> // buffer. Also, check if the size of the extended attribute
> // hasn't increased between the two calls.
> pre, err = unix.Getxattr(path, xattr, nil)
> if err != nil || pre < 0 {
> return nil, err
> }
>
> dest = make([]byte, pre)
> post := 0
> if pre > 0 {
> post, err = unix.Getxattr(path, xattr, dest)
> if err != nil || post < 0 {
> return nil, err
> }
> }
>
> if post > pre {
> return nil, fmt.Errorf("Extended attribute '%s' size increased from %d to %d during retrieval", xattr, pre, post)
> }
>
> xattrs[xattr] = string(dest)
> }
>
> This pattern of requesting the size first by passing empty arguments,
> then allocating the buffer and then passing down that buffer to
> retrieve that value is really annoying to use and error prone (I do
> of course understand why it exists.).
>
> For real xattrs it's not that bad because we can assume that these
> values don't change often and so the race window between
> getxattr(GET_SIZE) and getxattr(GET_VALUES) often doesn't matter. But
> fwiw, the post > pre check doesn't exist for no reason; we do indeed
> hit that race.
It is not really a race, you can do {} while (errno != ERANGE) and there
will be no race.
>
> In addition, it is costly having to call getxattr() twice. Again, for
> retrieving xattrs it often doesn't matter because it's not a super
> common operation but for mount and other info it might matter.
>
samba and many other projects that care about efficiency solved this
a long time ago with an opportunistic buffer - never start with NULL buffer
most values will fit in a 1K buffer.
> Will we have to use the same pattern for mnt and other info as well?
> If so, I worry that the race is way more likely than it is for real
> xattrs.
>
> * Would it be possible to support binary output with this interface?
> I really think users would love to have an interfact where they can
> get a struct with binary info back. I'm not advocating to make the
> whole interface binary but I wouldn't mind having the option to
> support it.
> Especially for some information at least. I'd really love to have a
> way go get a struct mount_info or whatever back that gives me all the
> details about a mount encompassed in a single struct.
>
I suggested that up thread and Greg has explicitly and loudly
NACKed it - so you will have to take it up with him
> Callers like systemd will have to parse text and will end up
> converting everything from text into binary anyway; especially for
> mount information. So giving them an option for this out of the box
> would be quite good.
>
> Interfaces like statx aim to be as fast as possible because we exptect
> them to be called quite often. Retrieving mount info is quite costly
> and is done quite often as well. Maybe not for all software but for a
> lot of low-level software. Especially when starting services in
> systemd a lot of mount parsing happens similar when starting
> containers in runtimes.
>
This API is not for *everything*. Obviously it does not replace statx and some
info (like the cifs OFFLINE flag) should be added to statx.
Whether or not mount info needs to get a special treatment like statx
is not proven.
Miklos claims this is a notification issue-
With David Howells' mount notification API, systemd can be pointed at the new
mount that was added/removed/changed and then systemd will rarely need to
parse thousands of mounts info.
> * If we decide to go forward with this interface - and I think I
> mentioned this in the lsfmm session - could we please at least add a
> new system call? It really feels wrong to retrieve mount and other
> information through the xattr interfaces. They aren't really xattrs.
>
> Imho, xattrs are a bit like a wonky version of streams already (One of
> the reasons I find them quite unpleasant.). Making mount and other
> information retrievable directly through the getxattr() interface will
> turn them into a full-on streams implementation imho. I'd prefer not
> to do that (Which is another reason I'd prefer at least a separate
> system call.).
If you are thinking about a read() like interface for xattr or any alternative
data stream then Linus has NACKed it times before.
However, we could add getxattr_multi() as Dave Chinner suggested for
enumerating multiple keys+ (optional) values.
In contrast to listxattr(), getxattr_multi() could allow a "short read"
at least w.r.t the size of the vector.
Thanks,
Amir.
next prev parent reply other threads:[~2022-05-09 14:21 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-03 12:23 [RFC PATCH] getting misc stats/attributes via xattr API Miklos Szeredi
2022-05-03 14:39 ` Amir Goldstein
2022-05-03 14:53 ` Greg KH
2022-05-03 15:04 ` Miklos Szeredi
2022-05-03 15:14 ` Amir Goldstein
2022-05-03 16:54 ` Greg KH
2022-05-03 22:43 ` Dave Chinner
2022-05-04 7:18 ` Miklos Szeredi
2022-05-04 14:22 ` Amir Goldstein
2022-05-05 12:30 ` Karel Zak
2022-05-05 13:59 ` Miklos Szeredi
2022-05-05 23:38 ` tytso
2022-05-06 0:06 ` Amir Goldstein
2022-05-07 0:32 ` Dave Chinner
2022-05-09 12:48 ` Christian Brauner
2022-05-09 14:20 ` Amir Goldstein [this message]
2022-05-09 15:08 ` Christian Brauner
2022-05-09 17:07 ` Amir Goldstein
2022-05-09 21:42 ` Vivek Goyal
2022-05-10 3:34 ` Ian Kent
2022-05-10 0:55 ` Dave Chinner
2022-05-10 12:40 ` Christian Brauner
2022-05-11 0:42 ` Dave Chinner
2022-05-11 9:16 ` Christian Brauner
2022-05-10 12:45 ` Florian Weimer
2022-05-10 23:04 ` Dave Chinner
2022-05-10 3:49 ` Miklos Szeredi
2022-05-10 4:27 ` Ian Kent
2022-05-10 8:06 ` Miklos Szeredi
2022-05-10 8:07 ` Miklos Szeredi
2022-05-10 11:53 ` Christian Brauner
2022-05-10 13:15 ` Miklos Szeredi
2022-05-10 13:18 ` Miklos Szeredi
2022-05-10 14:19 ` Christian Brauner
2022-05-10 14:41 ` Miklos Szeredi
2022-05-10 15:30 ` Christian Brauner
2022-05-10 15:47 ` Miklos Szeredi
2022-05-10 15:53 ` Christian Brauner
2022-05-10 12:35 ` Karel Zak
2022-05-10 23:25 ` Dave Chinner
2022-05-11 8:58 ` Karel Zak
2022-11-14 9:00 ` Abel Wu
2022-11-14 12:35 ` Miklos Szeredi
2022-11-15 3:39 ` Abel Wu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAOQ4uxgCSJ2rpJkPy1FkP__7zhaVXO5dnZQXSzvk=fReaZH7Aw@mail.gmail.com' \
--to=amir73il@gmail.com \
--cc=James.Bottomley@hansenpartnership.com \
--cc=brauner@kernel.org \
--cc=christian@brauner.io \
--cc=david@fromorbit.com \
--cc=dhowells@redhat.com \
--cc=gregkh@linuxfoundation.org \
--cc=kzak@redhat.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-man@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=raven@themaw.net \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).