linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: syzbot <syzbot+7a8ba368b47fdefca61e@syzkaller.appspotmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>
Subject: Re: KASAN: use-after-free Read in path_lookupat
Date: Mon, 25 Mar 2019 14:45:00 -0700	[thread overview]
Message-ID: <CAHk-=wj+-e=X9cvvNwc5+QuvBOyYT7OtZTBzy-Wg8zGrUwxSbw@mail.gmail.com> (raw)
In-Reply-To: <20190325211405.GP2217@ZenIV.linux.org.uk>

On Mon, Mar 25, 2019 at 2:14 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Maybe, but we really need to come up with sane documentation on the
> entire drop_inode/evict_inode/destroy_inode/rcu_destroy_inode
> group ;-/

Yeah.

I actually think the "destroy_inode/rcu_destroy_inode" part is the
simplest one to understand: it's just tearing down the inode, and the
RCU part is (obviously, as shown by this thread) somewhat subtle, but
at the same time not really all that hard to explain: "inodes that can
be looked up through RCU lookup need to be destroyed with RCU delay".

I think drop_inode->evict_inode is arguably the much more subtle
stuff. That's the "we're not destroying things, we're making decisions
about the state" kind of thing.

And we actually don't have any documentation at all for those two.
Well, there's the "porting" thing, but..

> And I want to understand the writeback-related issues
> in ocfs2 and f2fs - the current kludges in those smell fishy.

I'm assuming you're talking about exactly that drop_inode() kind of subtlety.

At least for ocfs2, the whole "rcu_destroy_inode()" patch I sent out
would simplify things a lot. *All* that the ocfs2_destroy_inode()
function does is to schedule the inode freeing for RCU delay.

Assuming my patch is fine (as mentioned, it was entirely untested),
that patch would actually simplify that a lot. Get rid of
ocfs2_destroy_inode() entirely, and just make
ocfs2_rcu_destroy_inode() do that kmem_cache_free(). Mucho clean-up
(we have that ocfs2_rcu_destroy_inode() currently as
ocfs2_i_callback(), but with the ugly rcu-head interface).

> >       if (unlikely(inode_init_always(sb, inode))) {
> > -             if (inode->i_sb->s_op->destroy_inode)
> > +             if (inode->i_sb->s_op->destroy_inode) {
> >                       inode->i_sb->s_op->destroy_inode(inode);
> > -             else
> > +                     if (!inode->i_sb->s_op->rcu_destroy_inode)
> > +                             return NULL;
> > +             }
> > +             if (!inode->i_sb->s_op->rcu_destroy_inode ||
> > +                 !inode->i_sb->s_op->rcu_destroy_inode(inode))
> >                       kmem_cache_free(inode_cachep, inode);
>
> ITYM  i_callback(inode); here, possibly suitably renamed.

Yeah, I guess we could just call that i_callback() directly. I wrote
it that way because it was how the code was organized (it didn't call
i_callback() before either), but yes, it woudl avoid some duplicate
code to do it that way.

And yes, at that point we'd probably want to rename it too.

On the whole, looking at a few filesystems, I really think that
'rcu_destroy_inode()" would simplify several of them.  That ocfs2 case
I already mentioned, but looking around the exact same is trye in
v9fs, vxfs, dlmfs, hostfs, bfs, coda, fatfs, isofs, minix, nfs,
openprom, proc, qnx4, qnx6, sysv, befs, adfs, affs, efs, ext2, f2fs,
gfs2, hfs, hfsplus, hpfs, jffs2, nilfs, reiserfs, romfs, squashfs)

And in other filesystems that actually *do* have a real
destroy_inode() that does other things too, they still want the rcu
delayed part as well (eg btrfs, ceph, fuse, hugetlbfs, afs, ecryptfs,
ext4, jfs, organgefs, ovl, ubifs).

In fact, every single filesystem I looked at (and I looked at most)
would be simplified by that patch.

And for some it would make it easier to fix bugs in the process (ie
bpf) because they don't currently have that rcu delay and they need
it.

So looking at all the existing users of destroy_inode(), I really do
think we should have that rcu_destroy_inode() callback. Because that

 3 files changed, 28 insertions(+), 6 deletions(-)

will allow quite a lot of lines to be removed from low-level
filesystems with all the call_rcu() and callback container_of(head,
struct inode, i_rcu) noise just going away.

But yes, it would be good to have more documentation fo the drop/evict
phase. As mentioned, I think the destroy phase is actually the easy
one, with the RCU part being the most complex one that this would
actually simplify.

                   Linus

  reply	other threads:[~2019-03-25 21:45 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-28 17:40 KASAN: use-after-free Read in path_lookupat syzbot
2019-03-25  0:44 ` syzbot
2019-03-25  1:25   ` Linus Torvalds
2019-03-25  1:23 ` Linus Torvalds
2019-03-25  4:57   ` Al Viro
2019-03-25  9:15     ` Daniel Borkmann
2019-03-25 11:11       ` Al Viro
2019-03-25 11:17         ` Al Viro
2019-03-25 11:21           ` Daniel Borkmann
2019-03-25 18:36     ` Linus Torvalds
2019-03-25 19:18       ` Linus Torvalds
2019-03-25 21:14         ` Al Viro
2019-03-25 21:45           ` Linus Torvalds [this message]
2019-03-25 22:04             ` Daniel Borkmann
2019-03-25 22:13               ` Linus Torvalds
2019-03-25 22:41                 ` Daniel Borkmann
2019-03-25 22:49               ` Al Viro
2019-03-25 23:37             ` Al Viro
2019-03-25 23:44               ` Alexei Starovoitov
2019-03-26  0:21                 ` Al Viro
2019-03-26  1:38               ` ceph: fix use-after-free on symlink traversal Al Viro
2019-03-26  1:39                 ` jffs2: " Al Viro
2019-03-26  1:40                 ` ubifs: " Al Viro
2019-03-26  1:43                 ` debugfs: " Al Viro
2019-03-26 10:41                 ` ceph: " Jeff Layton
2019-03-26 11:38                 ` Ilya Dryomov
2019-03-26  1:45               ` KASAN: use-after-free Read in path_lookupat Al Viro
2019-04-10 18:11                 ` Al Viro
2019-04-10 19:44                   ` Linus Torvalds
2019-03-25 19:43       ` Al Viro
2019-03-25 22:48         ` Dave Chinner
2019-03-25 23:02           ` Al Viro
     [not found]             ` <CAGe7X7mb=gK7zhSwmT_6mmmkcbjhZAOb=wj31BdUcHkNUPsm2Q@mail.gmail.com>
2019-03-26  4:15               ` Al Viro
2019-03-27 16:58                 ` Jan Kara
2019-03-27 18:59                   ` Al Viro
2019-03-28  9:00                     ` Jan Kara
2019-03-27 17:22             ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wj+-e=X9cvvNwc5+QuvBOyYT7OtZTBzy-Wg8zGrUwxSbw@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=syzbot+7a8ba368b47fdefca61e@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).