All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <brauner@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [GIT PULL] simple xattr updates for v6.2
Date: Tue, 13 Dec 2022 11:46:44 +0100	[thread overview]
Message-ID: <20221213104643.238650-1-brauner@kernel.org> (raw)

Hey Linus,

(I thought I had sent this one yesterday but I only did my usual --dry-run
 routine. So this one is one day late.)

/* Summary */
This ports the simple xattr infrastucture to rely on a simple rbtree protected
by a read-write lock instead of a linked list protected by a spinlock.

A while ago we received reports about scaling issues for filesystems using the
simple xattr infrastructure that also support setting a larger number of
xattrs. Specifically, cgroups and tmpfs.

Both cgroupfs and tmpfs can be mounted by unprivileged users in unprivileged
containers and root in an unprivileged container can set an unrestricted number
of security.* xattrs and privileged users can also set unlimited trusted.*
xattrs. A few more words on further that below. Other xattrs such as user.* are
restricted for kernfs-based instances to a fairly limited number.

As there are apparently users that have a fairly large number of xattrs we
should scale a bit better. Using a simple linked list protected by a spinlock
used for set, get, and list operations doesn't scale well if users use a lot of
xattrs even if it's not a crazy number.

Let's switch to a simple rbtree protected by a rwlock. It scales way better and
gets rid of the perf issues some people reported. We originally had fancier
solutions even using an rcu+seqlock protected rbtree but we had concerns about
being to clever and also that deletion from an rbtree with rcu+seqlock isn't
entirely safe.

The rbtree plus rwlock is perfectly fine. By far the most common operation is
getting an xattr. While setting an xattr is not and should be comparatively
rare. And listxattr() often only happens when copying xattrs between files or
together with the contents to a new file.

Holding a lock across listxattr() is unproblematic because it doesn't list the
values of xattrs. It can only be used to list the names of all xattrs set on a
file. And the number of xattr names that can be listed with listxattr() is
limited to XATTR_LIST_MAX aka 65536 bytes. If a larger buffer is passed then
vfs_listxattr() caps it to XATTR_LIST_MAX and if more xattr names are found it
will return -E2BIG. In short, the maximum amount of memory that can be
retrieved via listxattr() is limited and thus listxattr() bounded.

Of course, the API is broken as documented on xattr(7) already. While I have no
idea how the xattr api ended up in this state we should probably try to come up
with something here at some point. An iterator pattern similar to readdir() as
an alternative to listxattr() or something else.

Right now it is extremly strange that users can set millions of xattrs but then
can't use listxattr() to know which xattrs are actually set. And it's really
trivial to do:
for i in {1..1000000}; do setfattr -n security.$i -v $i ./file1; done
And around 5000 xattrs it's impossible to use listxattr() to figure out which
xattrs are actually set. So I have suggested that we try to limit the number of
xattrs for simple xattrs at least. But that's a future patch and I don't
consider it very urgent.

A bonus of this port to rbtree+rwlock is that we shrink the memory consumption
for users of the simple xattr infrastructure.

This also adds kernel documentation to all the functions.

/* Testing */
clang: Ubuntu clang version 15.0.2-1
gcc: gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0

All patches are based on v6.1-rc1 and have been sitting in linux-next. No build
failures or warnings were observed. All old and new tests in fstests,
selftests, and LTP pass without regressions.

/* Conflicts */
At the time of creating this PR no merge conflicts were reported from
linux-next and no merge conflicts showed up doing a test-merge with current
mainline.

The following changes since commit 9abf2313adc1ca1b6180c508c25f22f9395cc780:

  Linux 6.1-rc1 (2022-10-16 15:36:24 -0700)

are available in the Git repository at:

  ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping.git tags/fs.xattr.simple.rework.rbtree.rwlock.v6.2

for you to fetch changes up to 3b4c7bc01727e3a465759236eeac03d0dd686da3:

  xattr: use rbtree for simple_xattrs (2022-11-12 10:49:26 +0100)

Please consider pulling these changes from the signed fs.xattr.simple.rework.rbtree.rwlock.v6.2 tag.

Thanks!
Christian

----------------------------------------------------------------
fs.xattr.simple.rework.rbtree.rwlock.v6.2

----------------------------------------------------------------
Christian Brauner (1):
      xattr: use rbtree for simple_xattrs

 fs/xattr.c            | 317 +++++++++++++++++++++++++++++++++++++++-----------
 include/linux/xattr.h |  38 ++----
 mm/shmem.c            |   2 +-
 3 files changed, 260 insertions(+), 97 deletions(-)

             reply	other threads:[~2022-12-13 10:48 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-13 10:46 Christian Brauner [this message]
2022-12-13 18:13 ` [GIT PULL] simple xattr updates for v6.2 pr-tracker-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221213104643.238650-1-brauner@kernel.org \
    --to=brauner@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.