linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Brauner <brauner@kernel.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-fsdevel@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	Theodore Ts'o <tytso@mit.edu>, Karel Zak <kzak@redhat.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	linux-man <linux-man@vger.kernel.org>,
	LSM <linux-security-module@vger.kernel.org>,
	Ian Kent <raven@themaw.net>, David Howells <dhowells@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <christian@brauner.io>,
	Amir Goldstein <amir73il@gmail.com>,
	James Bottomley <James.Bottomley@hansenpartnership.com>
Subject: Re: [RFC PATCH] getting misc stats/attributes via xattr API
Date: Mon, 9 May 2022 14:48:15 +0200	[thread overview]
Message-ID: <20220509124815.vb7d2xj5idhb2wq6@wittgenstein> (raw)
In-Reply-To: <YnEeuw6fd1A8usjj@miu.piliscsaba.redhat.com>

On Tue, May 03, 2022 at 02:23:23PM +0200, Miklos Szeredi wrote:
> This is a simplification of the getvalues(2) prototype and moving it to the
> getxattr(2) interface, as suggested by Dave.
> 
> The patch itself just adds the possibility to retrieve a single line of
> /proc/$$/mountinfo (which was the basic requirement from which the fsinfo
> patchset grew out of).
> 
> But this should be able to serve Amir's per-sb iostats, as well as a host of
> other cases where some statistic needs to be retrieved from some object.  Note:
> a filesystem object often represents other kinds of objects (such as processes
> in /proc) so this is not limited to fs attributes.
> 
> This also opens up the interface to setting attributes via setxattr(2).
> 
> After some pondering I made the namespace so:
> 
> : - root
> bar - an attribute
> foo: - a folder (can contain attributes and/or folders)
> 
> The contents of a folder is represented by a null separated list of names.
> 
> Examples:
> 
> $ getfattr -etext -n ":" .
> # file: .
> :="mnt:\000mntns:"
> 
> $ getfattr -etext -n ":mnt:" .
> # file: .
> :mnt:="info"
> 
> $ getfattr -etext -n ":mnt:info" .
> # file: .
> :mnt:info="21 1 254:0 / / rw,relatime - ext4 /dev/root rw\012"

Hey Miklos,

One comment about this. We really need to have this interface support
giving us mount options like "relatime" back in numeric form (I assume
this will be possible.). It is royally annoying having to maintain a
mapping table in userspace just to do:

relatime -> MS_RELATIME/MOUNT_ATTR_RELATIME
ro	 -> MS_RDONLY/MOUNT_ATTR_RDONLY

A library shouldn't be required to use this interface. Conservative
low-level software that keeps its shared library dependencies minimal
will need to be able to use that interface without having to go to an
external library that transforms text-based output to binary form (Which
I'm very sure will need to happen if we go with a text-based
interface.).

> 
> $ getfattr -etext -n ":mntns:" .
> # file: .
> :mntns:="21:\00022:\00024:\00025:\00023:\00026:\00027:\00028:\00029:\00030:\00031:"
> 
> $ getfattr -etext -n ":mntns:28:" .
> # file: .
> :mntns:28:="info"
> 
> Comments?

I'm not a fan of text-based APIs and I'm particularly not a fan of the
xattr APIs. But at this point I'm ready to compromise on a lot as long
as it gets us values out of the kernel in some way. :)

I had to use xattrs extensively in various low-level userspace projects
and they continue to be a source of races and memory bugs.

A few initial questions:

* The xattr APIs often require the caller to do sm like (copying some go
  code quickly as I have that lying around):

	for _, x := range split {
		xattr := string(x)
		// Call Getxattr() twice: First, to determine the size of the
		// buffer we need to allocate to store the extended attributes,
		// second, to actually store the extended attributes in the
		// buffer. Also, check if the size of the extended attribute
		// hasn't increased between the two calls.
		pre, err = unix.Getxattr(path, xattr, nil)
		if err != nil || pre < 0 {
			return nil, err
		}

		dest = make([]byte, pre)
		post := 0
		if pre > 0 {
			post, err = unix.Getxattr(path, xattr, dest)
			if err != nil || post < 0 {
				return nil, err
			}
		}

		if post > pre {
			return nil, fmt.Errorf("Extended attribute '%s' size increased from %d to %d during retrieval", xattr, pre, post)
		}

		xattrs[xattr] = string(dest)
	}

  This pattern of requesting the size first by passing empty arguments,
  then allocating the buffer and then passing down that buffer to
  retrieve that value is really annoying to use and error prone (I do
  of course understand why it exists.).

  For real xattrs it's not that bad because we can assume that these
  values don't change often and so the race window between
  getxattr(GET_SIZE) and getxattr(GET_VALUES) often doesn't matter. But
  fwiw, the post > pre check doesn't exist for no reason; we do indeed
  hit that race.
  
  In addition, it is costly having to call getxattr() twice. Again, for
  retrieving xattrs it often doesn't matter because it's not a super
  common operation but for mount and other info it might matter.

  Will we have to use the same pattern for mnt and other info as well?
  If so, I worry that the race is way more likely than it is for real
  xattrs.

* Would it be possible to support binary output with this interface?
  I really think users would love to have an interfact where they can
  get a struct with binary info back. I'm not advocating to make the
  whole interface binary but I wouldn't mind having the option to
  support it.
  Especially for some information at least. I'd really love to have a
  way go get a struct mount_info or whatever back that gives me all the
  details about a mount encompassed in a single struct.

  Callers like systemd will have to parse text and will end up
  converting everything from text into binary anyway; especially for
  mount information. So giving them an option for this out of the box
  would be quite good.

  Interfaces like statx aim to be as fast as possible because we exptect
  them to be called quite often. Retrieving mount info is quite costly
  and is done quite often as well. Maybe not for all software but for a
  lot of low-level software. Especially when starting services in
  systemd a lot of mount parsing happens similar when starting
  containers in runtimes.

* If we decide to go forward with this interface - and I think I
  mentioned this in the lsfmm session - could we please at least add a
  new system call? It really feels wrong to retrieve mount and other
  information through the xattr interfaces. They aren't really xattrs.

  Imho, xattrs are a bit like a wonky version of streams already (One of
  the reasons I find them quite unpleasant.). Making mount and other
  information retrievable directly through the getxattr() interface will
  turn them into a full-on streams implementation imho. I'd prefer not
  to do that (Which is another reason I'd prefer at least a separate
  system call.).

  parent reply	other threads:[~2022-05-09 12:48 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-03 12:23 [RFC PATCH] getting misc stats/attributes via xattr API Miklos Szeredi
2022-05-03 14:39 ` Amir Goldstein
2022-05-03 14:53   ` Greg KH
2022-05-03 15:04     ` Miklos Szeredi
2022-05-03 15:14       ` Amir Goldstein
2022-05-03 16:54       ` Greg KH
2022-05-03 22:43 ` Dave Chinner
2022-05-04  7:18   ` Miklos Szeredi
2022-05-04 14:22     ` Amir Goldstein
2022-05-05 12:30 ` Karel Zak
2022-05-05 13:59   ` Miklos Szeredi
2022-05-05 23:38 ` tytso
2022-05-06  0:06   ` Amir Goldstein
2022-05-07  0:32   ` Dave Chinner
2022-05-09 12:48 ` Christian Brauner [this message]
2022-05-09 14:20   ` Amir Goldstein
2022-05-09 15:08     ` Christian Brauner
2022-05-09 17:07       ` Amir Goldstein
2022-05-09 21:42       ` Vivek Goyal
2022-05-10  3:34         ` Ian Kent
2022-05-10  0:55   ` Dave Chinner
2022-05-10 12:40     ` Christian Brauner
2022-05-11  0:42       ` Dave Chinner
2022-05-11  9:16         ` Christian Brauner
2022-05-10 12:45     ` Florian Weimer
2022-05-10 23:04       ` Dave Chinner
2022-05-10  3:49   ` Miklos Szeredi
2022-05-10  4:27     ` Ian Kent
2022-05-10  8:06       ` Miklos Szeredi
2022-05-10  8:07         ` Miklos Szeredi
2022-05-10 11:53     ` Christian Brauner
2022-05-10 13:15       ` Miklos Szeredi
2022-05-10 13:18         ` Miklos Szeredi
2022-05-10 14:19         ` Christian Brauner
2022-05-10 14:41           ` Miklos Szeredi
2022-05-10 15:30             ` Christian Brauner
2022-05-10 15:47               ` Miklos Szeredi
2022-05-10 15:53                 ` Christian Brauner
2022-05-10 12:35   ` Karel Zak
2022-05-10 23:25     ` Dave Chinner
2022-05-11  8:58       ` Karel Zak
2022-11-14  9:00 ` Abel Wu
2022-11-14 12:35   ` Miklos Szeredi
2022-11-15  3:39     ` Abel Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220509124815.vb7d2xj5idhb2wq6@wittgenstein \
    --to=brauner@kernel.org \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=amir73il@gmail.com \
    --cc=christian@brauner.io \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kzak@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=raven@themaw.net \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).