All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Jan Kara <jack@suse.cz>, Miklos Szeredi <miklos@szeredi.hu>,
	Eric Paris <eparis@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: fsnotify_mark_srcu wtf?
Date: Wed, 9 Nov 2016 12:10:05 +0100	[thread overview]
Message-ID: <20161109111005.GA32353@quack2.suse.cz> (raw)
In-Reply-To: <CAOQ4uxhC_ZWB4YhubF15L2xLB0Eapec2SijxdUVYhVSDNHex7w@mail.gmail.com>

On Sun 06-11-16 08:45:54, Amir Goldstein wrote:
> On Sat, Nov 5, 2016 at 11:34 PM, Jan Kara <jack@suse.cz> wrote:
> > On Wed 02-11-16 23:09:26, Miklos Szeredi wrote:
> >> We've got a report where a fanotify daemon that implements permission checks
> >> screws up and doesn't send a reply.  This then causes widespread hangs due to
> >> fsnotify_mark_srcu read side lock being held and thus causing synchronize_srcu()
> >> called from e.g. inotify_release()-> fsnotify_destroy_group()->
> >> fsnotify_mark_destroy_list() to block.
> >
> > Yes. But if a program implementing permission checks does not reply, your
> > system is likely hosed anyway. We can only try to somewhat limit the
> > damage...
> >
> 
> That was my initial thought as well, but at least with the sample code
> Miklos sent
> the only thing that gets hosed is the one process watching that one file.
> You could think of a use case of fanotify being used to watch over files
> in a specific user directory, where the damage on the entire system
> should/could be limited. No?

Yes, the damage could at least theoretically be limited only to those files
/ dirs watched by the buggy process.

> >> Below program demonstrates the issue.  It should output a single line:
> >>
> >> close(inotify_fd): success
> >>
> >> Instead it outputs nothing, which means that close(inotify_fd) got blocked by
> >> the waiting permission event.
> >>
> >> Wouldn't making the srcu per-group fix this?  Would that be too expensive?
> >
> > Per-group would be IMHO too expensive. You can have lots of groups and I'm
> > not sure srcu would scale to that. Furthermore the SRCU protects the list
> > of groups that need to get notification so it would not even be easily
> > possible. Also Amir's solution is buggy - I'll comment on that as a reply
> > to his patch. I'll try to find something to improve the situation but so
> > far I have no good idea...
> >
> 
> Yes, very much buggy indeed :/
> Anyway, the reason I drafted it quickly was to highlight the fact that the
> marks only need to live to the point of decision whether or not the event
> should be sent to the group and afterwards, its sufficient to grab the
> group reference, without having impact on the entire system.

Yes, fanotify code as such does not need the marks anymore. But the core
fsnotify code does...

> Yet another possible ugly (but less buggy) solution would be
> to iterate all marks under SRCU read protection.
> If any group is about to block (either by suggested return value
> EAGAIN or another
> by using a new op should_handle_event_deferred), defer event handling to post
> marks iteration, by keeping a few group references on stack.

And this does not work as well... Fanotify must notify groups by their
priority so you cannot arbitrarily reorder ordering in which groups get
notified. I'm currently pondering on using mark refcount to pin it when
processing permission event but there are still some details to check.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2016-11-09 11:18 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-02 22:09 fsnotify_mark_srcu wtf? Miklos Szeredi
2016-11-03 10:25 ` Amir Goldstein
2016-11-05 21:43   ` Jan Kara
2016-11-05 21:34 ` Jan Kara
2016-11-06  6:45   ` Amir Goldstein
2016-11-09 11:10     ` Jan Kara [this message]
2016-11-09 18:26       ` Amir Goldstein
2016-11-10 19:46         ` Jan Kara
2016-11-10 20:02           ` Amir Goldstein
2016-11-10 20:44           ` Miklos Szeredi
2016-11-10 22:41             ` Amir Goldstein
2016-11-13 18:43           ` Amir Goldstein
2016-11-14 11:59             ` Amir Goldstein
2016-12-02  8:26           ` Miklos Szeredi
2016-12-02 10:48             ` Jan Kara
2016-12-02 11:02               ` Miklos Szeredi
2016-12-06 17:07                 ` Jan Kara
2016-12-02 11:41               ` Amir Goldstein
2016-12-02 11:57                 ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161109111005.GA32353@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=amir73il@gmail.com \
    --cc=eparis@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.