All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Vyukov <dvyukov@google.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Ian Kent" <raven@themaw.net>, "Mickaël Salaün" <mic@digikod.net>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	syzkaller <syzkaller@googlegroups.com>,
	"Kostya Serebryany" <kcc@google.com>,
	"Alexander Potapenko" <glider@google.com>,
	"Sasha Levin" <sasha.levin@oracle.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"David Howells" <dhowells@redhat.com>
Subject: Re: fs: NULL deref in atime_needs_update
Date: Mon, 29 Feb 2016 16:54:54 +0100	[thread overview]
Message-ID: <CACT4Y+bZ0sUi1U1xxA257MaL4oPR4GDUwMhBaWX_JavQ5bMYMQ@mail.gmail.com> (raw)
In-Reply-To: <20160229130924.GV17997@ZenIV.linux.org.uk>

On Mon, Feb 29, 2016 at 2:09 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sun, Feb 28, 2016 at 08:01:01PM +0000, Al Viro wrote:
>> On Sun, Feb 28, 2016 at 05:01:34PM +0000, Al Viro wrote:
>>
>> > Erm...  What's to order ->d_inode and ->d_flags fetches there?  David?
>> > Looks like the barrier in d_is_negative() is on the wrong side of fetch.
>> > Confused...
>>
>> OK, as per David's suggestion, let's flip them around, bringing the
>> barrier in d_is_negative() between them.  Dmitry, could you try this on
>> top of mainline?  Again, it's until the first warning.
>
> Hmm...  Reordering is definitely wrong, but what I really wonder is if
> dentry_rcuwalk_invalidate() is right outside of __d_drop().  IOW, is
> it right in __d_instantiate() and dentry_unlink_inode()?  The code
> dealing with ->d_flags in RCU mode is more interested in coherency between
> ->d_flags and ->d_inode and it looks like we'd need *two* increments -
> even-to-odd before updating both and odd-to-even after both are in sync.
> The more I look at the situation with d_is_...() wrt barriers and ->d_seq,
> the less I understand it; outside of RCU mode we don't really need the
> barriers for that stuff and in RCU mode ->d_flags handling had been
> a serious headache all along...
>
> I'm tempted to do as below; the amount of smp_wmb() remains the same and
> so's the amount of stores (splitting that += 2 in two doesn't affect that -
> we dirty the same cacheline before and after anyway).  OTOH, that would
> mean that ->d_seq match guarantees ->d_flags and ->d_inode being in sync.
> And I suspect that we could drop _read_ barriers in d_is_...() after that;
> in non-RCU mode we don't give a damn anyway and in RCU one ->d_seq check
> would provide one; it doesn't really matter on x86, but smp_rmb() may be
> costly.  Splitting ->d_seq increments *does* matter on x86 wrt correctness;
> in-between state becomes guaranteed ->d_seq mismatch and that just might
> be it.  Dmitry, could you add this on top of the previous patch?


Regardless of whether reordering is wrong or not, do we see how it can
fix the WARNINGs/oopses? Because it does seem to. I've tried to revert
just this part:

-               *inode = d_backing_inode(dentry);
                negative = d_is_negative(dentry);
+               *inode = d_backing_inode(dentry);

And got:

[  976.609688] WARNING: CPU: 0 PID: 12126 at fs/namei.c:1587
lookup_fast+0x3fa/0x450()
[  976.626768] WARNING: CPU: 0 PID: 12126 at fs/namei.c:3123
path_openat+0x12bc/0x1520()

in 15 minutes.

In particular, applying this on top the previous patch will be
inconclusive, because I already don't see the warnings.



> David, Linus, do you see any problems with that?  To me it looks saner
> that way and as cheap as the current code, but I might be missing something
> here...
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 92d5140..2c08cce 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -279,7 +279,6 @@ static inline void __d_set_inode_and_type(struct dentry *dentry,
>         unsigned flags;
>
>         dentry->d_inode = inode;
> -       smp_wmb();
>         flags = READ_ONCE(dentry->d_flags);
>         flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
>         flags |= type_flags;
> @@ -300,7 +299,6 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
>
>         flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
>         WRITE_ONCE(dentry->d_flags, flags);
> -       smp_wmb();
>         dentry->d_inode = NULL;
>  }
>
> @@ -370,9 +368,11 @@ static void dentry_unlink_inode(struct dentry * dentry)
>         __releases(dentry->d_inode->i_lock)
>  {
>         struct inode *inode = dentry->d_inode;
> +
> +       raw_write_seqcount_begin(&dentry->d_seq);
>         __d_clear_type_and_inode(dentry);
>         hlist_del_init(&dentry->d_u.d_alias);
> -       dentry_rcuwalk_invalidate(dentry);
> +       raw_write_seqcount_end(&dentry->d_seq);
>         spin_unlock(&dentry->d_lock);
>         spin_unlock(&inode->i_lock);
>         if (!inode->i_nlink)
> @@ -1758,8 +1758,9 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
>         spin_lock(&dentry->d_lock);
>         if (inode)
>                 hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
> +       raw_write_seqcount_begin(&dentry->d_seq);
>         __d_set_inode_and_type(dentry, inode, add_flags);
> -       dentry_rcuwalk_invalidate(dentry);
> +       raw_write_seqcount_end(&dentry->d_seq);
>         spin_unlock(&dentry->d_lock);
>         fsnotify_d_instantiate(dentry, inode);
>  }

  reply	other threads:[~2016-02-29 15:55 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-05 21:11 fs: NULL deref in atime_needs_update Dmitry Vyukov
2016-02-16 23:40 ` Mickaël Salaün
2016-02-19 19:32   ` Dmitry Vyukov
2016-02-20  3:21     ` Al Viro
2016-02-20  3:54       ` Al Viro
2016-02-20  3:54         ` Al Viro
2016-02-20 13:25         ` Mickaël Salaün
2016-02-20 17:10           ` Al Viro
2016-02-20 17:10             ` Al Viro
2016-02-20 20:26             ` Mickaël Salaün
2016-02-20 20:50               ` Al Viro
2016-02-20 20:50                 ` Al Viro
2016-02-22 11:20             ` Dmitry Vyukov
2016-02-22 17:23               ` Al Viro
2016-02-23 15:34                 ` Dmitry Vyukov
2016-02-23 18:17                   ` Al Viro
2016-02-20 10:36       ` Dmitry Vyukov
2016-02-24  3:12   ` Ian Kent
2016-02-24  4:46     ` Al Viro
2016-02-24  4:46       ` Al Viro
2016-02-24 10:03       ` Dmitry Vyukov
2016-02-24 10:15         ` Dmitry Vyukov
2016-02-24 13:35           ` Dmitry Vyukov
2016-02-24 15:15             ` Al Viro
2016-02-25  8:29               ` Dmitry Vyukov
2016-02-25 16:39                 ` Al Viro
2016-02-26 21:21                   ` Al Viro
2016-02-26 21:25                     ` Dmitry Vyukov
2016-02-26 22:07                       ` Al Viro
2016-02-26 22:07                         ` Al Viro
2016-02-27 22:27                         ` Al Viro
2016-02-27 22:27                           ` Al Viro
2016-02-28 15:43                           ` Dmitry Vyukov
2016-02-28 16:04                             ` Dmitry Vyukov
2016-02-28 17:01                               ` Al Viro
2016-02-28 20:01                                 ` Al Viro
2016-02-29  9:38                                   ` Dmitry Vyukov
2016-02-29 12:34                                     ` Dmitry Vyukov
2016-02-29 16:11                                       ` Al Viro
2016-02-29 13:09                                   ` Al Viro
2016-02-29 15:54                                     ` Dmitry Vyukov [this message]
2016-02-29 16:19                                       ` Al Viro
2016-02-29 18:19                                         ` Dmitry Vyukov
2016-03-01  8:59                                           ` Dmitry Vyukov
2016-02-29 16:45                                     ` Linus Torvalds
2016-02-29 16:50                                       ` Al Viro
2016-02-29 17:20                                         ` Al Viro
2016-02-29 17:24                                         ` Linus Torvalds
2016-02-29 13:43                                   ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACT4Y+bZ0sUi1U1xxA257MaL4oPR4GDUwMhBaWX_JavQ5bMYMQ@mail.gmail.com \
    --to=dvyukov@google.com \
    --cc=dhowells@redhat.com \
    --cc=glider@google.com \
    --cc=kcc@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mic@digikod.net \
    --cc=raven@themaw.net \
    --cc=sasha.levin@oracle.com \
    --cc=syzkaller@googlegroups.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.