From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7C0ABC43381 for ; Mon, 25 Mar 2019 21:45:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3C4AC20693 for ; Mon, 25 Mar 2019 21:45:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1553550322; bh=kB9igm/FCAW80YnJMskcdFGh4XZxwEvYpPqFQaP+aBo=; h=References:In-Reply-To:From:Date:Subject:To:Cc:List-ID:From; b=R5da/xC9HwegltTfzby9BY5auiHqXIOlN8QXNbJwFzu2ZF5nS9B1LGOe+W8NsAdgZ gslWi5/5iKenNHJA/x075fr8z0HNGoArpsR4a4GXPpV1LCIzivXIRFOhnZopr1KBpN CHQMAUPkm5g/9t4SZmiM8SNAxfkew9140JCmrP30= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729127AbfCYVpV (ORCPT ); Mon, 25 Mar 2019 17:45:21 -0400 Received: from mail-lf1-f66.google.com ([209.85.167.66]:39410 "EHLO mail-lf1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728912AbfCYVpV (ORCPT ); Mon, 25 Mar 2019 17:45:21 -0400 Received: by mail-lf1-f66.google.com with SMTP id m13so7127490lfb.6 for ; Mon, 25 Mar 2019 14:45:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=RyrGMeVONhvfWoKfN08XEzZDL45w36KigWKOTw1mc3I=; b=Uz0aIzxx/3EgWDjsNP/gnfIPI2LVbLni/Kmq7pLXtN7Ip3sVQMgxHR3LDF4zfNwEbJ eOAKhAIb+QUbOmu4vwn04d+2RgFG7hT04bLNwtW22w9kkJKfQ46cFKG2SwvQnIOV1Xyf /homDzyLih6emk2h3yM48/WKn/oFK/YFsuKek= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=RyrGMeVONhvfWoKfN08XEzZDL45w36KigWKOTw1mc3I=; b=cONNOqXxySBlYoPbSmXmXt5znTrMkyStqxU1loiJ6VzNElJ0QeV4T5zJZ2TJcT9LpW XzXH1cRBmeMF/RsFTsSc9Spntm93B+12LwvWnOUmX1kefnTBJMfG2wyYXJV1Oq4wUtH6 rvrJfs6HMVC8kKdFAWCKcdpuWCSUg55oYq+ti0ZZyOY4xzL0xRUaKTYHE1VSXP5yt61K SVfamZmDZ63vKEHkin7t6tghX2BOd14Q61PRTb4RwUtaWStegfVTvGnqC7LM475ibmvL BwFBludEEKehuCMrs3YP9kqGPSRnUBeYDzmtOBjOc9KR/8DWwoy1kRWwYn4bYbJ+gTtY c8FA== X-Gm-Message-State: APjAAAUmd/63YlTJbPc+JuDX9aXLrYVUfYDDMoEWthGFtFkohWDAgfnD nXCl4pzHz/jkDuYKyoSOt2IsSbZ/paA= X-Google-Smtp-Source: APXvYqzXj5vvQ263ldPwPHOgdBQ7Vg17DLY3cuMY5rkT0aEzqZaW2IdlSuBYYqFVEpqmG8tgR47PWQ== X-Received: by 2002:a19:48c8:: with SMTP id v191mr13501866lfa.53.1553550317753; Mon, 25 Mar 2019 14:45:17 -0700 (PDT) Received: from mail-lj1-f180.google.com (mail-lj1-f180.google.com. [209.85.208.180]) by smtp.gmail.com with ESMTPSA id l4sm3244248lfp.29.2019.03.25.14.45.16 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 25 Mar 2019 14:45:16 -0700 (PDT) Received: by mail-lj1-f180.google.com with SMTP id r24so8194190ljg.3 for ; Mon, 25 Mar 2019 14:45:16 -0700 (PDT) X-Received: by 2002:a2e:960b:: with SMTP id v11mr4509575ljh.135.1553550316377; Mon, 25 Mar 2019 14:45:16 -0700 (PDT) MIME-Version: 1.0 References: <0000000000006946d2057bbd0eef@google.com> <20190325045744.GK2217@ZenIV.linux.org.uk> <20190325211405.GP2217@ZenIV.linux.org.uk> In-Reply-To: <20190325211405.GP2217@ZenIV.linux.org.uk> From: Linus Torvalds Date: Mon, 25 Mar 2019 14:45:00 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: KASAN: use-after-free Read in path_lookupat To: Al Viro Cc: syzbot , Alexei Starovoitov , Daniel Borkmann , linux-fsdevel , Linux List Kernel Mailing , syzkaller-bugs Content-Type: text/plain; charset="UTF-8" Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Mon, Mar 25, 2019 at 2:14 PM Al Viro wrote: > > Maybe, but we really need to come up with sane documentation on the > entire drop_inode/evict_inode/destroy_inode/rcu_destroy_inode > group ;-/ Yeah. I actually think the "destroy_inode/rcu_destroy_inode" part is the simplest one to understand: it's just tearing down the inode, and the RCU part is (obviously, as shown by this thread) somewhat subtle, but at the same time not really all that hard to explain: "inodes that can be looked up through RCU lookup need to be destroyed with RCU delay". I think drop_inode->evict_inode is arguably the much more subtle stuff. That's the "we're not destroying things, we're making decisions about the state" kind of thing. And we actually don't have any documentation at all for those two. Well, there's the "porting" thing, but.. > And I want to understand the writeback-related issues > in ocfs2 and f2fs - the current kludges in those smell fishy. I'm assuming you're talking about exactly that drop_inode() kind of subtlety. At least for ocfs2, the whole "rcu_destroy_inode()" patch I sent out would simplify things a lot. *All* that the ocfs2_destroy_inode() function does is to schedule the inode freeing for RCU delay. Assuming my patch is fine (as mentioned, it was entirely untested), that patch would actually simplify that a lot. Get rid of ocfs2_destroy_inode() entirely, and just make ocfs2_rcu_destroy_inode() do that kmem_cache_free(). Mucho clean-up (we have that ocfs2_rcu_destroy_inode() currently as ocfs2_i_callback(), but with the ugly rcu-head interface). > > if (unlikely(inode_init_always(sb, inode))) { > > - if (inode->i_sb->s_op->destroy_inode) > > + if (inode->i_sb->s_op->destroy_inode) { > > inode->i_sb->s_op->destroy_inode(inode); > > - else > > + if (!inode->i_sb->s_op->rcu_destroy_inode) > > + return NULL; > > + } > > + if (!inode->i_sb->s_op->rcu_destroy_inode || > > + !inode->i_sb->s_op->rcu_destroy_inode(inode)) > > kmem_cache_free(inode_cachep, inode); > > ITYM i_callback(inode); here, possibly suitably renamed. Yeah, I guess we could just call that i_callback() directly. I wrote it that way because it was how the code was organized (it didn't call i_callback() before either), but yes, it woudl avoid some duplicate code to do it that way. And yes, at that point we'd probably want to rename it too. On the whole, looking at a few filesystems, I really think that 'rcu_destroy_inode()" would simplify several of them. That ocfs2 case I already mentioned, but looking around the exact same is trye in v9fs, vxfs, dlmfs, hostfs, bfs, coda, fatfs, isofs, minix, nfs, openprom, proc, qnx4, qnx6, sysv, befs, adfs, affs, efs, ext2, f2fs, gfs2, hfs, hfsplus, hpfs, jffs2, nilfs, reiserfs, romfs, squashfs) And in other filesystems that actually *do* have a real destroy_inode() that does other things too, they still want the rcu delayed part as well (eg btrfs, ceph, fuse, hugetlbfs, afs, ecryptfs, ext4, jfs, organgefs, ovl, ubifs). In fact, every single filesystem I looked at (and I looked at most) would be simplified by that patch. And for some it would make it easier to fix bugs in the process (ie bpf) because they don't currently have that rcu delay and they need it. So looking at all the existing users of destroy_inode(), I really do think we should have that rcu_destroy_inode() callback. Because that 3 files changed, 28 insertions(+), 6 deletions(-) will allow quite a lot of lines to be removed from low-level filesystems with all the call_rcu() and callback container_of(head, struct inode, i_rcu) noise just going away. But yes, it would be good to have more documentation fo the drop/evict phase. As mentioned, I think the destroy phase is actually the easy one, with the RCU part being the most complex one that this would actually simplify. Linus