linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	"Theodore Ts'o" <tytso@mit.edu>, Karel Zak <kzak@redhat.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	linux-man <linux-man@vger.kernel.org>,
	LSM <linux-security-module@vger.kernel.org>,
	Ian Kent <raven@themaw.net>, David Howells <dhowells@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <christian@brauner.io>,
	James Bottomley <James.Bottomley@hansenpartnership.com>
Subject: Re: [RFC PATCH] getting misc stats/attributes via xattr API
Date: Mon, 9 May 2022 17:20:50 +0300	[thread overview]
Message-ID: <CAOQ4uxgCSJ2rpJkPy1FkP__7zhaVXO5dnZQXSzvk=fReaZH7Aw@mail.gmail.com> (raw)
In-Reply-To: <20220509124815.vb7d2xj5idhb2wq6@wittgenstein>

On Mon, May 9, 2022 at 3:48 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, May 03, 2022 at 02:23:23PM +0200, Miklos Szeredi wrote:
> > This is a simplification of the getvalues(2) prototype and moving it to the
> > getxattr(2) interface, as suggested by Dave.
> >
> > The patch itself just adds the possibility to retrieve a single line of
> > /proc/$$/mountinfo (which was the basic requirement from which the fsinfo
> > patchset grew out of).
> >
> > But this should be able to serve Amir's per-sb iostats, as well as a host of
> > other cases where some statistic needs to be retrieved from some object.  Note:
> > a filesystem object often represents other kinds of objects (such as processes
> > in /proc) so this is not limited to fs attributes.
> >
> > This also opens up the interface to setting attributes via setxattr(2).
> >
> > After some pondering I made the namespace so:
> >
> > : - root
> > bar - an attribute
> > foo: - a folder (can contain attributes and/or folders)
> >
> > The contents of a folder is represented by a null separated list of names.
> >
> > Examples:
> >
> > $ getfattr -etext -n ":" .
> > # file: .
> > :="mnt:\000mntns:"
> >
> > $ getfattr -etext -n ":mnt:" .
> > # file: .
> > :mnt:="info"
> >
> > $ getfattr -etext -n ":mnt:info" .
> > # file: .
> > :mnt:info="21 1 254:0 / / rw,relatime - ext4 /dev/root rw\012"
>
> Hey Miklos,
>
> One comment about this. We really need to have this interface support
> giving us mount options like "relatime" back in numeric form (I assume
> this will be possible.). It is royally annoying having to maintain a
> mapping table in userspace just to do:
>
> relatime -> MS_RELATIME/MOUNT_ATTR_RELATIME
> ro       -> MS_RDONLY/MOUNT_ATTR_RDONLY
>
> A library shouldn't be required to use this interface. Conservative
> low-level software that keeps its shared library dependencies minimal
> will need to be able to use that interface without having to go to an
> external library that transforms text-based output to binary form (Which
> I'm very sure will need to happen if we go with a text-based
> interface.).
>

No need for a library.
We can export:

:mnt:attr:flags (in hex format)

> >
> > $ getfattr -etext -n ":mntns:" .
> > # file: .
> > :mntns:="21:\00022:\00024:\00025:\00023:\00026:\00027:\00028:\00029:\00030:\00031:"
> >
> > $ getfattr -etext -n ":mntns:28:" .
> > # file: .
> > :mntns:28:="info"
> >
> > Comments?
>
> I'm not a fan of text-based APIs and I'm particularly not a fan of the
> xattr APIs. But at this point I'm ready to compromise on a lot as long
> as it gets us values out of the kernel in some way. :)
>
> I had to use xattrs extensively in various low-level userspace projects
> and they continue to be a source of races and memory bugs.
>
> A few initial questions:
>
> * The xattr APIs often require the caller to do sm like (copying some go
>   code quickly as I have that lying around):
>
>         for _, x := range split {
>                 xattr := string(x)
>                 // Call Getxattr() twice: First, to determine the size of the
>                 // buffer we need to allocate to store the extended attributes,
>                 // second, to actually store the extended attributes in the
>                 // buffer. Also, check if the size of the extended attribute
>                 // hasn't increased between the two calls.
>                 pre, err = unix.Getxattr(path, xattr, nil)
>                 if err != nil || pre < 0 {
>                         return nil, err
>                 }
>
>                 dest = make([]byte, pre)
>                 post := 0
>                 if pre > 0 {
>                         post, err = unix.Getxattr(path, xattr, dest)
>                         if err != nil || post < 0 {
>                                 return nil, err
>                         }
>                 }
>
>                 if post > pre {
>                         return nil, fmt.Errorf("Extended attribute '%s' size increased from %d to %d during retrieval", xattr, pre, post)
>                 }
>
>                 xattrs[xattr] = string(dest)
>         }
>
>   This pattern of requesting the size first by passing empty arguments,
>   then allocating the buffer and then passing down that buffer to
>   retrieve that value is really annoying to use and error prone (I do
>   of course understand why it exists.).
>
>   For real xattrs it's not that bad because we can assume that these
>   values don't change often and so the race window between
>   getxattr(GET_SIZE) and getxattr(GET_VALUES) often doesn't matter. But
>   fwiw, the post > pre check doesn't exist for no reason; we do indeed
>   hit that race.

It is not really a race, you can do {} while (errno != ERANGE) and there
will be no race.

>
>   In addition, it is costly having to call getxattr() twice. Again, for
>   retrieving xattrs it often doesn't matter because it's not a super
>   common operation but for mount and other info it might matter.
>

samba and many other projects that care about efficiency solved this
a long time ago with an opportunistic buffer - never start with NULL buffer
most values will fit in a 1K buffer.

>   Will we have to use the same pattern for mnt and other info as well?
>   If so, I worry that the race is way more likely than it is for real
>   xattrs.
>
> * Would it be possible to support binary output with this interface?
>   I really think users would love to have an interfact where they can
>   get a struct with binary info back. I'm not advocating to make the
>   whole interface binary but I wouldn't mind having the option to
>   support it.
>   Especially for some information at least. I'd really love to have a
>   way go get a struct mount_info or whatever back that gives me all the
>   details about a mount encompassed in a single struct.
>

I suggested that up thread and Greg has explicitly and loudly
NACKed it - so you will have to take it up with him

>   Callers like systemd will have to parse text and will end up
>   converting everything from text into binary anyway; especially for
>   mount information. So giving them an option for this out of the box
>   would be quite good.
>
>   Interfaces like statx aim to be as fast as possible because we exptect
>   them to be called quite often. Retrieving mount info is quite costly
>   and is done quite often as well. Maybe not for all software but for a
>   lot of low-level software. Especially when starting services in
>   systemd a lot of mount parsing happens similar when starting
>   containers in runtimes.
>

This API is not for *everything*. Obviously it does not replace statx and some
info (like the cifs OFFLINE flag) should be added to statx.
Whether or not mount info needs to get a special treatment like statx
is not proven.
Miklos claims this is a notification issue-
With David Howells' mount notification API, systemd can be pointed at the new
mount that was added/removed/changed and then systemd will rarely need to
parse thousands of mounts info.

> * If we decide to go forward with this interface - and I think I
>   mentioned this in the lsfmm session - could we please at least add a
>   new system call? It really feels wrong to retrieve mount and other
>   information through the xattr interfaces. They aren't really xattrs.
>
>   Imho, xattrs are a bit like a wonky version of streams already (One of
>   the reasons I find them quite unpleasant.). Making mount and other
>   information retrievable directly through the getxattr() interface will
>   turn them into a full-on streams implementation imho. I'd prefer not
>   to do that (Which is another reason I'd prefer at least a separate
>   system call.).

If you are thinking about a read() like interface for xattr or any alternative
data stream then Linus has NACKed it times before.

However, we could add getxattr_multi() as Dave Chinner suggested for
enumerating multiple keys+ (optional) values.
In contrast to listxattr(), getxattr_multi() could allow a "short read"
at least w.r.t the size of the vector.

Thanks,
Amir.

  reply	other threads:[~2022-05-09 14:21 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-03 12:23 [RFC PATCH] getting misc stats/attributes via xattr API Miklos Szeredi
2022-05-03 14:39 ` Amir Goldstein
2022-05-03 14:53   ` Greg KH
2022-05-03 15:04     ` Miklos Szeredi
2022-05-03 15:14       ` Amir Goldstein
2022-05-03 16:54       ` Greg KH
2022-05-03 22:43 ` Dave Chinner
2022-05-04  7:18   ` Miklos Szeredi
2022-05-04 14:22     ` Amir Goldstein
2022-05-05 12:30 ` Karel Zak
2022-05-05 13:59   ` Miklos Szeredi
2022-05-05 23:38 ` tytso
2022-05-06  0:06   ` Amir Goldstein
2022-05-07  0:32   ` Dave Chinner
2022-05-09 12:48 ` Christian Brauner
2022-05-09 14:20   ` Amir Goldstein [this message]
2022-05-09 15:08     ` Christian Brauner
2022-05-09 17:07       ` Amir Goldstein
2022-05-09 21:42       ` Vivek Goyal
2022-05-10  3:34         ` Ian Kent
2022-05-10  0:55   ` Dave Chinner
2022-05-10 12:40     ` Christian Brauner
2022-05-11  0:42       ` Dave Chinner
2022-05-11  9:16         ` Christian Brauner
2022-05-10 12:45     ` Florian Weimer
2022-05-10 23:04       ` Dave Chinner
2022-05-10  3:49   ` Miklos Szeredi
2022-05-10  4:27     ` Ian Kent
2022-05-10  8:06       ` Miklos Szeredi
2022-05-10  8:07         ` Miklos Szeredi
2022-05-10 11:53     ` Christian Brauner
2022-05-10 13:15       ` Miklos Szeredi
2022-05-10 13:18         ` Miklos Szeredi
2022-05-10 14:19         ` Christian Brauner
2022-05-10 14:41           ` Miklos Szeredi
2022-05-10 15:30             ` Christian Brauner
2022-05-10 15:47               ` Miklos Szeredi
2022-05-10 15:53                 ` Christian Brauner
2022-05-10 12:35   ` Karel Zak
2022-05-10 23:25     ` Dave Chinner
2022-05-11  8:58       ` Karel Zak
2022-11-14  9:00 ` Abel Wu
2022-11-14 12:35   ` Miklos Szeredi
2022-11-15  3:39     ` Abel Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOQ4uxgCSJ2rpJkPy1FkP__7zhaVXO5dnZQXSzvk=fReaZH7Aw@mail.gmail.com' \
    --to=amir73il@gmail.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=brauner@kernel.org \
    --cc=christian@brauner.io \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kzak@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=raven@themaw.net \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).