linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Miklos Szeredi <mszeredi@suse.cz>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]
Date: Fri, 30 May 2014 07:49:22 +0100	[thread overview]
Message-ID: <20140530064922.GQ18016@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFxGYDX6PTcMjYO3bsXTj+M0yF76Xtf_BCPmU+DrZ9hfzQ@mail.gmail.com>

On Thu, May 29, 2014 at 10:00:49PM -0700, Linus Torvalds wrote:

> Yeah, I think that would be good. Except I think you should create a
> "dentry_is_dead()" helper function that then has that "if you hold the
> dentry or parent lock, this is safe" comment, because for lockref in
> general you do need to have the lock in the lockref itself. The fact
> that dentries have more locking is very much dentry-specific.

With how many callers?  There literally are only two places where we look
at that bit; one of them very tempting to convert to ->d_lockref.count != 0,
since reactions to positive and negative are very similar.  We also have
that bogus BUG_ON() you've mentioned (that one simply should die) and only
one place where we check it for being negative - autofs4_lookup_active().
And that one is better dealt with by taking removal from their active
list from ->d_release() to ->d_prune() (if not turning their ->d_release()
into ->d_prune() wholesale) and making ->d_prune() called for hashed and
unhashed alike (the only instance *already* checks for d_unhashed() and does
nothing in that case; no need to check that in fs/dcache.c).  With that done,
the check will be gone - all it does is filtering out the ones that are
already on the way out, but still hadn't reached ->d_release()).

IOW, it's not a widely used functionality and it's really not something
that should be ever needed outside of fs/dcache.c.  And in fs/dcache.c
we have one call site, so I'm not sure if even mentioning __lockref_not_dead()
would make much sense - (int)child->d_lockref.count < 0 might be better,
along with a comment about ->d_parent->d_lock serializing it against
lockref_mark_dead() in __dentry_kill() just as well as ->d_lock would...

Note that the only reason why autofs is playing those games is that they
keep references to dentries that do not contribute to refcount, rip them
out when dentry is killed and do that in the wrong method, which opens the
window when ->d_lock is already dropped and ->d_release() is inevitable
but yet to be called.  Solution: rip those references out before dropping
->d_lock, which is what ->d_prune() gives us.  To be fair, that code
predates ->d_prune() by several years (Jul 2008 and Oct 2011, resp.)

And "vfs: call d_op->d_prune() before unhashing dentry" has added very
odd checks for !d_unhashed(), despite ceph ->d_prune() being an explicit
no-op in other cases...

While we are at it, what the devil is d_prune_aliases() doing?  OK, we
grab ->d_lock and see that refcount is 0.  Then we call ->d_prune(),
bump refcount to 1, forcibly unhash the sucker, drop ->d_lock and ->i_lock
and call dput().  Which seems to be far too convoluted...  AFAICS, what we
want to do is
                spin_lock(&dentry->d_lock);
                if (!dentry->d_lockref.count) {
			parent = lock_parent(dentry);
			if (likely(!dentry->d_lockref.count)) {
				__dentry_kill(dentry);
				goto restart;
			}
			if (parent)
				spin_unlock(&parent->d_lock);
		}
		spin_unlock(&dentry->d_lock);
(which means that pulling ->i_lock trylock into __dentry_kill() is
probably not a good plan, more's the pity...)  And there goes this second
call site of ->d_prune() - after that it would be called exactly in one place,
right after __dentry_kill() has done lockref_mark_dead().  The whole reason
for calling it there was that forcible unhash used to force dput() to kill
the sucker has a side effect of messing ceph ->d_prune()...

  reply	other threads:[~2014-05-30  6:49 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-26  9:37 fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667] Mika Westerberg
2014-05-26 13:57 ` Al Viro
2014-05-26 14:29   ` Mika Westerberg
2014-05-26 15:27     ` Al Viro
2014-05-26 16:42       ` Al Viro
2014-05-26 18:17       ` Linus Torvalds
2014-05-26 18:26         ` Al Viro
2014-05-26 20:24           ` Linus Torvalds
2014-05-27  1:40             ` Al Viro
2014-05-27  3:14               ` Al Viro
2014-05-27  4:00                 ` Al Viro
2014-05-27  7:04                   ` Mika Westerberg
2014-05-28  3:19                     ` Al Viro
2014-05-28  7:37                       ` Mika Westerberg
2014-05-28 11:57                         ` Al Viro
2014-05-28 13:11                           ` Mika Westerberg
2014-05-28 14:19                             ` Al Viro
2014-05-28 18:39                               ` Al Viro
2014-05-28 19:43                                 ` Linus Torvalds
2014-05-28 20:02                                   ` Linus Torvalds
2014-05-28 20:25                                     ` Al Viro
2014-05-29 10:42                                     ` Mika Westerberg
2014-05-28 20:14                                   ` Al Viro
2014-05-28 21:11                                     ` Linus Torvalds
2014-05-28 21:28                                       ` Al Viro
2014-05-29  3:11                                 ` Al Viro
2014-05-29  3:52                                   ` Al Viro
2014-05-29  5:34                                     ` Al Viro
2014-05-29 10:51                                       ` Mika Westerberg
2014-05-29 11:04                                         ` Mika Westerberg
2014-05-29 13:30                                           ` Al Viro
2014-05-29 14:56                                             ` Mika Westerberg
2014-05-29 15:10                                             ` Linus Torvalds
2014-05-29 15:44                                               ` Al Viro
2014-05-29 16:23                                                 ` Al Viro
2014-05-29 16:29                                                   ` Linus Torvalds
2014-05-29 16:53                                                     ` Al Viro
2014-05-29 18:52                                                       ` Al Viro
2014-05-29 19:14                                                         ` Linus Torvalds
2014-05-30  4:50                                                           ` Al Viro
2014-05-30  5:00                                                             ` Linus Torvalds
2014-05-30  6:49                                                               ` Al Viro [this message]
2014-05-30  8:12                                                         ` Mika Westerberg
2014-05-30 15:21                                                           ` Al Viro
2014-05-30 15:31                                                             ` Linus Torvalds
2014-05-30 16:48                                                               ` [git pull] " Al Viro
2014-05-30 17:14                                                                 ` Al Viro
2014-05-31 14:18                                                                   ` Josh Boyer
2014-05-31 14:48                                                                     ` Linus Torvalds
2014-05-31 14:58                                                                       ` Josh Boyer
2014-05-31 16:12                                                                       ` Josh Boyer
2014-05-30 17:15                                                                 ` Sedat Dilek
2014-05-29  4:21                                   ` Linus Torvalds
2014-05-29  5:16                                     ` Al Viro
2014-05-29  5:26                                       ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140530064922.GQ18016@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mika.westerberg@linux.intel.com \
    --cc=mszeredi@suse.cz \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).