From: Amir Goldstein <amir73il@gmail.com>
To: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Jan Kara <jack@suse.cz>, Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2] fsnotify: fix softlockups iterating over d_subdirs
Date: Wed, 19 Oct 2022 08:33:34 +0300
Message-ID: <CAOQ4uxhwFGddgJP5xPYDysoa4GFPYu6Bj7rgHVXTEuZk+QKYQQ@mail.gmail.com>
In-Reply-To: <87edv44rll.fsf@oracle.com>

On Wed, Oct 19, 2022 at 2:52 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Amir Goldstein <amir73il@gmail.com> writes:
> > On Tue, Oct 18, 2022 at 7:12 AM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >>
> >> Hi Jan, Amir, Al,
> >>
> >> Here's my first shot at implementing what we discussed. I tested it using the
> >> negative dentry creation tool I mentioned in my previous message, with a similar
> >> workflow. Rather than having a bunch of threads accessing the directory to
> >> create that "thundering herd" of CPUs in __fsnotify_update_child_dentry_flags, I
> >> just started a lot of inotifywait tasks:
> >>
> >> 1. Create 100 million negative dentries in a dir
> >> 2. Use trace-cmd to watch __fsnotify_update_child_dentry_flags:
> >>    trace-cmd start -p function_graph -l __fsnotify_update_child_dentry_flags
> >>    sudo cat /sys/kernel/debug/tracing/trace_pipe
> >> 3. Run a lot of inotifywait tasks: for i in {1..10}; do inotifywait $dir & done
> >>
> >> With step #3, I see only one execution of __fsnotify_update_child_dentry_flags.
> >> Once that completes, all the inotifywait tasks say "Watches established".
> >> Similarly, once an access occurs in the directory, a single
> >> __fsnotify_update_child_dentry_flags execution occurs, and all the tasks exit.
> >> In short: it works great!
> >>
> >> However, while testing this, I've observed a "dentry still in use" warning
> >> during unmount of rpc_pipefs on the "nfs" dentry during shutdown. NFS is of course in
> >> use, and I assume that fsnotify must have been used to trigger this. The error
> >> is not there on mainline without my patch so it's definitely caused by this
> >> code. I'll continue debugging it but I wanted to share my first take on this so
> >> you could take a look.
> >>
> >> [ 1595.197339] BUG: Dentry 000000005f5e7197{i=67,n=nfs}  still in use (2) [unmount of rpc_pipefs rpc_pipefs]
> >>
> >
> > Hmm, the assumption we made about partial stability of d_subdirs
> > under dir inode lock looks incorrect for rpc_pipefs.
> > None of the functions that update the rpc_pipefs dcache take the parent
> > inode lock.
>
> That may be, but I'm confused how that would trigger this issue. If I'm
> understanding correctly, this warning indicates a reference counting
> bug.

Yes.
By the time generic_shutdown_super() runs, there should be no more
references to dentries.
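
To make the invariant concrete, here is a minimal userspace sketch of a
get/put counter with a check at teardown. It is only a model under stated
assumptions: struct toy_dentry and the toy_* helpers are hypothetical names
for illustration, not the dcache API, and the message format merely imitates
the warning quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a dentry; only the reference count is modelled. */
struct toy_dentry {
	const char *name;
	int refcount;
};

/* Take a reference (models the role of dget(), not its implementation). */
static void toy_get(struct toy_dentry *d)
{
	d->refcount++;
}

/* Drop a reference (models the role of dput(), not its implementation). */
static void toy_put(struct toy_dentry *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* At "shutdown", report every entry that still holds references. */
static int toy_shutdown_check(const struct toy_dentry *v, int n)
{
	int leaked = 0;

	for (int i = 0; i < n; i++) {
		if (v[i].refcount != 0) {
			fprintf(stderr, "toy: %s still in use (%d)\n",
				v[i].name, v[i].refcount);
			leaked++;
		}
	}
	return leaked;
}

/* One balanced get/put pair and one get with no matching put. */
static int toy_demo(void)
{
	struct toy_dentry d[2] = { { "paired", 0 }, { "leaked", 0 } };

	toy_get(&d[0]);
	toy_put(&d[0]);
	toy_get(&d[1]);		/* the put is deliberately missing */

	return toy_shutdown_check(d, 2);
}
```

In this model only the entry with the unpaired get is reported, which is
why the warning points at a get without a put somewhere, rather than at the
list walk itself.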

>
> If __fsnotify_update_child_dentry_flags() had gone to sleep and the list
> were edited, then it seems like there could be only two possibilities
> that could cause bugs:
>
> 1. The dentry we slept holding a reference to was removed from the list,
> and maybe moved to a different one, or just removed. If that were the
> case, we're quite unlucky, because we'll start looping indefinitely as
> we'll never get back to the beginning of the list, or worse.
>
> 2. A dentry adjacent to the one we held a reference to was removed. In
> that case, our dentry's d_child pointers should get rearranged, and when
> we wake, we should see those updates and continue.
>
> In neither of those cases do I understand where we could have done a
> dget() unpaired with a dput(), which is what seemingly would trigger
> this issue.
>

I got the same impression.
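
The two cases can be modelled in userspace with a plain circular doubly
linked list. This is a sketch under assumptions, not the kernel code:
struct node and the helpers below are hypothetical, removal is modelled as
"unlink and point the node at itself", and no locking or reference counting
is represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a circular doubly linked list (not kernel code). */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *n)
{
	n->prev = n->next = n;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n from its list and leave it pointing at itself. */
static void list_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/*
 * Resume a walk after cursor. Returns true if head is reached within
 * max_steps hops, false if the walk would not terminate in that bound.
 */
static bool walk_reaches_head(const struct node *cursor,
			      const struct node *head, size_t max_steps)
{
	assert(cursor && head);

	const struct node *p = cursor->next;

	for (size_t i = 0; i < max_steps; i++) {
		if (p == head)
			return true;
		p = p->next;
	}
	return false;
}

/* Case 2: a neighbour of the pinned cursor is removed while we "sleep". */
static bool case_neighbor_removed(void)
{
	struct node head, a, b, c;

	list_init(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);

	list_del_init(&c);	/* b is the cursor; c was b->next */
	return walk_reaches_head(&b, &head, 16);
}

/* Case 1: the pinned cursor itself is removed while we "sleep". */
static bool case_cursor_removed(void)
{
	struct node head, a, b, c;

	list_init(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);

	list_del_init(&b);	/* b now points only at itself */
	return walk_reaches_head(&b, &head, 16);
}
```

In the model, case 2 terminates because the cursor's own pointers are
updated by the removal, while case 1 never gets back to the head. Neither
case changes a reference count, which matches the observation that neither
explains an unpaired dget().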

> I'm probably wrong, but without understanding the mechanism behind the
> error, I'm not sure how to approach it.
>
> > The assumption looks incorrect for other pseudo fs as well.
> >
> > The other side of the coin is that we do not really need to worry
> > about walking a huge list of pseudo fs children.
> >
> > The question is how to classify those pseudo fs and whether there
> > are other cases like this that we missed.
> >
> > Perhaps having simple_dentry_operations is a good enough
> > clue, but perhaps it is not enough. I am not sure.
> >
> > It covers all the cases of pseudo fs that I know about, so you
> > can certainly use this clue to avoid going to sleep in the
> > update loop as a first approximation.
>
> I would worry that it would become an exercise of whack-a-mole.
> Allow/deny-listing certain filesystems for certain behavior seems scary.
>

Totally agree.

> > I can try to figure this out, but I would prefer that Al chime in to
> > provide reliable answers to those questions.
>
> I have a core dump from the warning (with panic_on_warn=1) and will see
> if I can trace or otherwise identify the exact mechanism myself.
>

Most likely the refcount was already leaked earlier, but
worth trying.

>
> Thanks for your detailed review of both the patches. I didn't get much
> time today to update the patches and test them. Your feedback looks very
> helpful though, and I hope to send out an updated revision tomorrow.
>
> In the absolute worst case (and I don't want to concede defeat just
> yet), keeping patch 1 without patch 2 (sleepable iteration) would still
> be a major win, since it resolves the thundering herd problem, which is
> what compounds the problem of the long lists.
>

Makes sense.
Patch 1 logic is solid.

I hope my suggestions won't complicate things too much for you;
if they do, I am sure Jan will find a way to simplify ;)

Thanks,
Amir.
