linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [RFC][PATCH] fanotify: introduce filesystem view mark
Date: Mon, 17 May 2021 15:07:50 +0200
Message-ID: <20210517130750.GA25760@quack2.suse.cz>
In-Reply-To: <CAOQ4uxgQ-gS5YBEPy2UEcwbO9Y0ie2vVGQn6Wts3Z8x3LZPfog@mail.gmail.com>

On Mon 17-05-21 15:45:29, Amir Goldstein wrote:
> On Mon, May 17, 2021 at 12:09 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Sat 15-05-21 17:28:27, Amir Goldstein wrote:
> > > On Fri, May 14, 2021 at 4:56 PM Christian Brauner
> > > <christian.brauner@ubuntu.com> wrote:
> > > > > for changes with idmap-filtered mark, then it won't see notification for
> > > > > those changes because A presumably runs in a different namespace than B, am
> > > > > I imagining this right? So mark which filters events based on namespace of
> > > > > the originating process won't be usable for such usecase AFAICT.
> > > >
> > > > Idmap filtered marks won't cover that use-case as envisioned now. Though
> > > > I'm not sure they really need to as the semantics are related to mount
> > > > marks.
> > >
> > > We really need to refer to those as filesystem marks. They are definitely
> > > NOT mount marks. We are trying to design a better API that will not share
> > > as many flaws with mount marks...
> >
> > I agree. I was pondering about this usecase exactly because the problem with
> > changes done through mount A and visible through mount B which didn't get
> > a notification was a source of complaints about fanotify in the past and the
> > reason why you came up with filesystem marks.
> >
> > > > A mount mark would allow you to receive events based on the
> > > > originating mount. If two mounts A and B are separate but expose the
> > > > same files you wouldn't see events caused by B if you're watching A.
> > > > Similarly you would only see events from mounts that have been delegated
> > > > to you through the idmapped userns. I find this acceptable especially if
> > > > clearly documented.
> > > >
> > >
> > > The way I see it, we should delegate all the decisions over to userspace,
> > > but I agree that the current "simple" proposal may not provide a good
> > > enough answer to the case of a subtree that is shared with the host.
> > >
> > > IMO, it should be a container manager decision whether changes done by
> > > the host are:
> > > a) Not visible to containerized application
> > > b) Watched in host via recursive inode watches
> > > c) Watched in host by filesystem mark filtered in userspace
> > > d) Watched in host by a "noop" idmapped mount in host, through
> > >      which all relevant apps in host access the shared folder
> > >
> > > We can later provide the option of "subtree filtered filesystem mark"
> > > which can be choice (e). It will incur performance overhead on the system
> > > that is higher than option (d) but lower than option (c).
> >
> > But won't b) and c) require the container manager to inject events into the
> > event stream observed by the containerized fanotify user? Because in both
> > these cases the manager needs to consume generated events and decide what
> > to do with them.
> >
> 
> With (b) manager does not need to inject events.
> The manager intercepts fanotify_init() and returns an actual fanotify group fd
> in the requesting process fd table.
> 
> Later, when manager intercepts fanotify_mark() with idmapped mark
> request, manager can take care of setting up the recursive inode watches,
> but the requesting process will get the events, because it has a clone of
> the fanotify group fd.

Well, but for recursive inode watches to function, you also have to process
the stream of events to detect created dirs etc. Also you may have to
remove (e.g. directory) events the original user didn't ask for...
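
To make the two chores above concrete, here is a minimal sketch of the
per-event decisions such a manager would face. The helper names
needs_new_watch() and forward_mask() are hypothetical, and the sketch
assumes the manager sees each event mask before the user does; only the
FAN_* constants come from the real fanotify UAPI.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/fanotify.h>

/* Fallbacks for older headers; values match the kernel UAPI. */
#ifndef FAN_MOVED_TO
#define FAN_MOVED_TO 0x00000080
#endif
#ifndef FAN_CREATE
#define FAN_CREATE 0x00000100
#endif

/*
 * Hypothetical helper: a directory that was created in, or moved into,
 * a watched directory needs an inode mark of its own, otherwise the
 * recursive watch has a hole.
 */
static bool needs_new_watch(uint64_t event_mask)
{
	return (event_mask & FAN_ONDIR) &&
	       (event_mask & (FAN_CREATE | FAN_MOVED_TO));
}

/*
 * Hypothetical helper: which bits of an event to pass on to the user.
 * Returns 0 when the event should be dropped, e.g. a directory event
 * the manager needed for bookkeeping but the user never asked for.
 */
static uint64_t forward_mask(uint64_t event_mask, uint64_t user_mask)
{
	uint64_t wanted = event_mask & user_mask & ~(uint64_t)FAN_ONDIR;

	if (wanted == 0)
		return 0;
	/* Directory events are only reported if the user set FAN_ONDIR. */
	if ((event_mask & FAN_ONDIR) && !(user_mask & FAN_ONDIR))
		return 0;
	return wanted | (event_mask & FAN_ONDIR);
}
```

When needs_new_watch() is true the manager would add the new inode mark
with fanotify_mark() before looking at further events. The window between
the directory being created and the mark being added is not addressed by
this sketch.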

> With (c), I guess the intercepted fanotify_init() can return an open pipe
> and proxy the stream of events read from the actual fanotify fd filtering
> out the events.

Yes, that's what I thought about. But it isn't 100% transparent (e.g.
fdinfo will be different).
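
For illustration, the filtering step of such a proxy could look roughly
like the sketch below. It assumes the proxy has already resolved each
event to an absolute path (e.g. by readlink() on /proc/self/fd/<event fd>)
and only has to decide whether that path lies under the shared subtree.
path_in_subtree() is a hypothetical name, not an existing API.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if 'path' is 'root' itself or lies below it.
 * Both arguments are assumed to be absolute, normalized paths.
 */
static bool path_in_subtree(const char *root, const char *path)
{
	size_t n = strlen(root);

	/* Ignore trailing slashes so "/srv/" and "/srv" behave the same. */
	while (n > 1 && root[n - 1] == '/')
		n--;

	if (strncmp(root, path, n) != 0)
		return false;

	/* The root directory contains every absolute path. */
	if (n == 1 && root[0] == '/')
		return path[0] == '/';

	/* Require a component boundary: "/a/b" must not match "/a/bc". */
	return path[n] == '\0' || path[n] == '/';
}
```

The proxy would write an event to the pipe only when this returns true.
A plain prefix test like this says nothing about bind mounts or renames
that happen while the event is in flight.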

> I hope we can provide some form of kernel subtree filtering so
> userspace will not need to resort to this sort of practice.

I hope as well :)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
