linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fs: Do not check if there is a fsnotify watcher on pseudo inodes
Date: Mon, 15 Jun 2020 14:07:46 +0300	[thread overview]
Message-ID: <CAOQ4uxibBL7W0DR4RhC5ZP7Y68ob5Xj6VncWjepgOj=RB8jrGQ@mail.gmail.com> (raw)
In-Reply-To: <20200615081202.GE9449@quack2.suse.cz>

On Mon, Jun 15, 2020 at 11:12 AM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 12-06-20 23:34:16, Amir Goldstein wrote:
> > > > > So maybe it would be better to list all users of alloc_file_pseudo()
> > > > > and say that they all should be opted out of fsnotify, without mentioning
> > > > > "internal mount"?
> > > > >
> > > >
> > > > The users are DMA buffers, CXL, aio, anon inodes, hugetlbfs, anonymous
> > > > pipes, shmem and sockets although not all of them necessary end up using
> > > > a VFS operation that triggers fsnotify.  Either way, I don't think it
> > > > makes sense (or even possible) to watch any of those with fanotify so
> > > > setting the flag seems reasonable.
> > > >
> > >
> > > I also think this seems reasonable, but the more accurate reason IMO
> > > is found in the comment for d_alloc_pseudo():
> > > "allocate a dentry (for lookup-less filesystems)..."
> > >
> > > > I updated the changelog and maybe this is clearer.
> > >
> > > I still find the use of "internal mount" terminology too vague.
> > > "lookup-less filesystems" would have been more accurate,
> >
> > Only it is not really accurate for shmfs anf hugetlbfs, which are
> > not lookup-less, they just hand out un-lookable inodes.
>
> OK, but I still think we are safe setting FMODE_NONOTIFY in
> alloc_file_pseudo() and that covers all the cases we care about. Or did I
> misunderstand something in the discussion? I can see e.g.
> __shmem_file_setup() uses alloc_file_pseudo() but again that seems to be
> used only for inodes without a path and the comment before d_alloc_pseudo()
> pretty clearly states this should be the case.
>
> So is the dispute here really only about how to call files using
> d_alloc_pseudo()?
>

Yes, semantics, no technical dispute on the patch.

> > > because as you correctly point out, the user API to set a watch
> > > requires that the marked object is looked up in the filesystem.
> > >
> > > There are also some kernel internal users that set watches
> > > like audit and nfsd, but I think they are also only interested in
> > > inodes that have a path at the time that the mark is setup.
> > >
> >
> > FWIW I verified that watches can be set on anonymous pipes
> > via /proc/XX/fd, so if we are going to apply this patch, I think it
> > should be accompanied with a complimentary patch that forbids
> > setting up a mark on these sort of inodes. If someone out there
> > is doing this, at least they would get a loud message that something
> > has changed instead of silently dropping fsnotify events.
> >
> > So now the question is how do we identify/classify "these sort of
> > inodes"? If they are no common well defining characteristics, we
> > may need to blacklist pipes sockets and anon inodes explicitly
> > with S_NONOTIFY.
>
> We already do have FS_DISALLOW_NOTIFY_PERM in file_system_type->fs_flags so
> adding FS_DISALLOW_NOTIFY would be natural if there is a need for this.

Yes, it is possible, but for the specified use case, it is probably easier
to classify by inode type (and maybe IS_ROOT()) than by filesystem type.
Also, in the case of shmem, the same file_system_type is used for user
mountable tmpfs and the kernel internal shm_mnt instance - only the
latter is used for handing out anonymous shmem files.

>
> I don't think using fsnotify on pipe inodes is sane in any way. You'd
> possibly only get the MODIFY or ACCESS events and even those would not be
> quite reliable because with pipes stuff like splicing etc. is much more
> common and that currently completely bypasses fsnotify subsystem. So
> overall I'm fine with completely ignoring fsnotify on such inodes.
>

Agreed for MODIFY ACCESS. Not so sure about other events.
I see that nfsd filecache backend only marks regular files, so that's fine.
I *think* audit only marks directories and exe files, but completely unsure.

Maybe there is no need to optimize out special inodes from all events
and only exclude them from MODIFY/ACCESS, which are the only
events where performance may be a concern?
Or maybe you did not mean to skip events on special inodes in general?

I am not sure how important OPEN events are on special inodes, but
it is scary to stop sending OPEN_PERM events.

Do you agree that we should also actively disallow setting a mark
on special disconnected inodes? instead of silently dropping events
that current kernel does deliver?

We could disallow setting a mark on a disconnected inode
(one that user is trying to configure by using a /proc/$pid/fd/X path).
We can enforce this restriction for all backends in the common helper
fsnotify_add_mark_locked().

Thanks,
Amir.

  reply	other threads:[~2020-06-15 11:08 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-12  9:26 [PATCH] fs: Do not check if there is a fsnotify watcher on pseudo inodes Mel Gorman
2020-06-12  9:52 ` Amir Goldstein
2020-06-12 13:18   ` Mel Gorman
2020-06-12 14:52     ` Amir Goldstein
2020-06-12 20:34       ` Amir Goldstein
2020-06-15  8:12         ` Jan Kara
2020-06-15 11:07           ` Amir Goldstein [this message]
2020-06-15 12:02         ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOQ4uxibBL7W0DR4RhC5ZP7Y68ob5Xj6VncWjepgOj=RB8jrGQ@mail.gmail.com' \
    --to=amir73il@gmail.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).