From: Amir Goldstein <amir73il@gmail.com>
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Jan Kara <jack@suse.cz>,
	Matthew Bobrowski <mbobrowski@mbobrowski.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH v2 0/2] unprivileged fanotify listener
Date: Wed, 17 Mar 2021 21:14:06 +0200
Message-ID: <CAOQ4uxjCjapuAHbYuP8Q_k0XD59UmURbmkGC1qcPkPAgQbQ8DA@mail.gmail.com>
In-Reply-To: <20210317174532.cllfsiagoudoz42m@wittgenstein>

On Wed, Mar 17, 2021 at 7:45 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> On Wed, Mar 17, 2021 at 02:19:57PM +0200, Amir Goldstein wrote:
> > On Wed, Mar 17, 2021 at 1:42 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Wed 17-03-21 13:01:35, Amir Goldstein wrote:
> > > > On Tue, Mar 16, 2021 at 5:55 PM Jan Kara <jack@suse.cz> wrote:
> > > > >
> > > > > On Thu 04-03-21 13:29:19, Amir Goldstein wrote:
> > > > > > Jan,
> > > > > >
> > > > > > These patches try to implement a minimal set and least controversial
> > > > > > functionality that we can allow for unprivileged users as a starting
> > > > > > point.
> > > > > >
> > > > > > The patches were tested on top of v5.12-rc1 and the fanotify_merge
> > > > > > patches using the unprivileged listener LTP tests written by Matthew
> > > > > > > and other LTP tests I wrote to test the sysfs tunable limits [1].
> > > > >
> > > > > Thanks. I've added both patches to my tree.
> > > >
> > > > Great!
> > > > I'll go post the LTP tests and work on the man page updates.
> > > >
> > > > BTW, I noticed that you pushed the aggregating for_next branch,
> > > > but not the fsnotify topic branch.
> > > >
> > > > Is this intentional?
> > >
> > > Not really, pushed now. Thanks for the reminder.
> > >
> > > > I am asking because I am usually basing my development branches
> > > > off of your fsnotify branch, but I can base them on the unpushed branch.
> > > >
> > > > Heads up. I am playing with extra privileges we may be able to
> > > > allow an ns_capable user.
> > > > For example, watching a FS_USERNS_MOUNT filesystem that the user
> > > > itself has mounted inside userns.
> > > >
> > > > Another feature I am investigating is how to utilize the new idmapped
> > > > mounts to get a subtree watch functionality. This requires attaching a
> > > > userns to the group on fanotify_init().
> > > >
> > > > <hand waving>
> > > > If the group's userns is the same as or below the idmapped mount's userns,
> > > > then all the objects accessed via that idmapped mount are accessible
> > > > to the group's userns admin. We can use that fact to filter events very
> > > > early based on their mnt_userns and the group's userns, which should be
> > > > cheaper than any subtree permission checks.
> > > > <\hand waving>
> > >
> > > Yeah, I agree this should work. Just it seems to me the userbase for this
> > > functionality will be (at least currently) rather limited. While full
> >
> > That may change when systemd home dirs feature starts to use
> > idmapped mounts.
> > Being able to watch the user's entire home directory is a big win
> > already.
>
> Hey Amir,
> Hey Jan,
>
> I think so too.
>
> >
> > > subtree watches would be IMO interesting to many more users.
> >
> > Agreed.
>
> We have a use-case for subtree watches: One feature for containers we
> have is that users can e.g. tell us that they want the container manager
> to hotplug an arbitrary unix or block device into the container whenever
> the relevant device shows up on the system. For example they could
> instruct the container manager to plug in some new driver device when it
> shows up in /dev. That works nicely because of uevents. But users quite
> often also instruct us to plug in a path once it shows up in some
> directory in the filesystem hierarchy and unplug it once it is removed.
> Right now we're maintaining an inotify-based hand-rolled recursive watch to
> make this work so we detect the add and remove events. I would be wildly
> excited if we could get rid of some of that complexity by using subtree
> watches. The container manager on the host will be unaffected by this
> feature since it will usually have root privileges and manage
> unprivileged containers.
> The unprivileged (userns use-case specifically here) subtree watches
> will be necessary and really good to have to make this work for
> container workloads and nested containers, i.e. where the container
> manager itself runs in a container and starts new containers. Since the
> subtree feature would be interesting for systemd itself and since our
> container manager (ChromeOS etc.) runs systemd inside unprivileged
> containers on a large scale it would be good if subtree watches could
> work in userns too.
>

I don't understand the subtree watch use case.
You will have to walk me through it.

What exactly is the container manager trying to detect?
That a subdir of a specific name/path was created/deleted?
It doesn't sound like a recursive watch is needed for that.
What am I missing?

As for nested container managers (and systemd), my thinking is
that if all the mounts that the manager is watching for serving its
containers are idmapped to that manager's userns (is that a viable
option?), then it shouldn't be a problem to set up userns-filtered
watches in order to be notified of all the events that happen via those
idmapped mounts, and filtering by "subtree" is not needed.
I am clearly far from understanding the big picture.
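
For illustration only (this is not code from the series): a minimal
userspace sketch of the "same or below" userns check that the
hand-waving quoted above relies on. The names toy_userns, toy_in_userns
and toy_event_visible are hypothetical, and the struct models nothing
but the parent link between namespaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a user namespace: only the parent link matters here. */
struct toy_userns {
    const struct toy_userns *parent;    /* NULL for the initial namespace */
};

/* True if 'ns' is 'ancestor' itself or one of its descendants. */
bool toy_in_userns(const struct toy_userns *ancestor,
                   const struct toy_userns *ns)
{
    for (; ns; ns = ns->parent)
        if (ns == ancestor)
            return true;
    return false;
}

/*
 * Hypothetical early filter: an event seen through a mount is delivered
 * only if the group's userns is the same as or below the mount's userns.
 */
bool toy_event_visible(const struct toy_userns *mnt_userns,
                       const struct toy_userns *group_userns)
{
    return toy_in_userns(mnt_userns, group_userns);
}
```

The check is a single walk up the parent chain, so its cost depends only
on namespace nesting depth and not on the depth of the path that
generated the event.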

> >
> > I was looking into that as well, using the example of nfsd_acceptable()
> > to implement the subtree permission check.
> >
> > The problem here is that even if unprivileged users cannot compromise
> > security, they can still cause significant CPU overhead either queueing
> > events or filtering events and that is something I haven't been able to
> > figure out a way to escape from.
> >
> > BUT, if you allow a userns admin to set up subtree watches (a.k.a. filtered
> > filesystem marks) on a userns filesystem/idmapped mount, now users
>
> I think that sounds reasonable.
> If the mount really is idmapped, it might be interesting to consider
> checking for privilege in the mnt_userns in addition to the regular
> permission checks that fanotify performs. My (equally handwavy) thinking
> is that this might allow for a nice feature where the creator of the
> mount (e.g. systemd) can block the creation of subtree watches by
> attaching a mnt_userns to the mnt that the user has no privilege in.
> (Just a thought.).
>

Currently (upstream), only init_userns CAP_SYS_ADMIN can set up
fanotify watches.
In linux-next, an unprivileged user can already set up inode watches
(i.e. like inotify).
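
A rough sketch of what such an inode watch could look like from
userspace. It assumes that an unprivileged group has to be created with
FAN_REPORT_FID and may only add inode marks (no FAN_MARK_MOUNT or
FAN_MARK_FILESYSTEM); check fanotify_init(2) for your kernel before
relying on that. The helper names is_inode_mark and watch_dir_inode are
made up for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Hypothetical helper: an inode mark carries no mount/filesystem scope flag. */
bool is_inode_mark(unsigned int mark_flags)
{
    return (mark_flags & (FAN_MARK_MOUNT | FAN_MARK_FILESYSTEM)) == 0;
}

/*
 * Try to watch one directory for create/delete events.
 * Returns the fanotify fd on success, or -1 with errno set.
 */
int watch_dir_inode(const char *path)
{
    /* Assumption: FAN_REPORT_FID is required for an unprivileged group. */
    int fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_FID | FAN_CLOEXEC,
                           O_RDONLY);
    if (fd < 0)
        return -1;

    /* Plain FAN_MARK_ADD marks the inode behind 'path', like inotify. */
    if (fanotify_mark(fd, FAN_MARK_ADD,
                      FAN_CREATE | FAN_DELETE | FAN_ONDIR,
                      AT_FDCWD, path) < 0) {
        int saved = errno;

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

On a kernel without the unprivileged listener support, the
fanotify_init() call is expected to fail with EPERM for a user without
CAP_SYS_ADMIN.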

So I am not sure what you are referring to by "block the creation of
subtree watches".

If systemd were to idmap my home dir to a mnt_userns where my user
has CAP_SYS_ADMIN, then allowing my user to set up a watch for
all events on that mount should not be too hard.
If you think that is useful and you want to play with this feature I can
provide a WIP branch soon.

Thanks,
Amir.
