Date: Tue, 24 Sep 2019 03:52:15 +0100
From: Al Viro
To: Linus Torvalds
Cc: "zhengbin (A)", Jan Kara, Andrew Morton, linux-fsdevel, "zhangyi (F)", renxudong1@huawei.com, Hou Tao, linux-btrfs@vger.kernel.org, "Yan, Zheng", linux-cifs@vger.kernel.org, Steve French
Subject: Re: [PATCH] Re: Possible FS race condition between iterate_dir and d_alloc_parallel
Message-ID: <20190924025215.GA9941@ZenIV.linux.org.uk>
References: <20190914170146.GT1131@ZenIV.linux.org.uk> <20190914200412.GU1131@ZenIV.linux.org.uk> <20190915005046.GV1131@ZenIV.linux.org.uk> <20190915160236.GW1131@ZenIV.linux.org.uk> <20190921140731.GQ1131@ZenIV.linux.org.uk>
In-Reply-To: <20190921140731.GQ1131@ZenIV.linux.org.uk>
User-Agent: Mutt/1.12.1 (2019-06-15)
Sender: linux-cifs-owner@vger.kernel.org
X-Mailing-List: linux-cifs@vger.kernel.org

[btrfs and cifs folks Cc'd]

On Sat, Sep 21, 2019 at 03:07:31PM +0100, Al Viro wrote:

> No "take cursors out of the list" parts yet.

Argh... Things turned interesting. The tricky part is where we
handle switching cursors away from something that gets moved.

What I hoped for was "just do it in simple_rename()". Which is almost
OK; there are 3 problematic cases. One is shmem - there we have a
special ->rename(), which handles things like RENAME_EXCHANGE et al.
Fair enough - some of that might've been moved into simple_rename(),
but some (whiteouts) won't be that easy. Still, we can make kicking
the cursors out a helper called by simple_rename() and by that.
The exchange case is going to cause a bit of headache (the pathological
case is when the entries being exchanged are next to each other in the
same directory), but it's not that bad.

Two other cases, though, might be serious trouble. Those are btrfs
new_simple_dir() and this in cifs_root_iget():

	if (rc && tcon->pipe) {
		cifs_dbg(FYI, "ipc connection - fake read inode\n");
		spin_lock(&inode->i_lock);
		inode->i_mode |= S_IFDIR;
		set_nlink(inode, 2);
		inode->i_op = &cifs_ipc_inode_ops;
		inode->i_fop = &simple_dir_operations;
		inode->i_uid = cifs_sb->mnt_uid;
		inode->i_gid = cifs_sb->mnt_gid;
		spin_unlock(&inode->i_lock);
	}

The trouble is, it looks like d_splice_alias() from a lookup elsewhere
might find an alias of some subdirectory in those. And in that case
we'll end up with a child of those (dcache_readdir-using) directories
being ripped out and moved elsewhere. With no calls of ->rename() in
sight, of course, *AND* with only a shared lock on the parent.

The last part is really nasty. And not just for hanging cursors off
the dentries they point to - it's a problem for dcache_readdir() itself
even in the mainline and with all the lockless crap reverted.

We pass next->d_name.name to dir_emit() (i.e. potentially to
copy_to_user()). And we have no guarantee that it's not a long
(== separately allocated) name that will be freed while copy_to_user()
is in progress. Sure, it'll get an RCU delay before freeing, but that
doesn't help us at all, since copy_to_user() can sleep.

I'm not familiar with those areas in btrfs or cifs; could somebody
explain what's going on there, and can we indeed end up finding aliases
to those suckers?
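
For readers who want the long-name hazard spelled out, here is a userspace sketch of one way to keep a name alive across a slow copy: pin it with a reference-counted snapshot before the copy starts. Everything in it is hypothetical (toy_dentry, ext_name, name_snap, INLINE_LEN and the helper names are invented for illustration); it is single-threaded and has none of the locking or RCU that fs/dcache.c needs. The kernel has a helper in a similar spirit, take_dentry_name_snapshot(), but nothing in this message says that is what dcache_readdir() should use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_LEN 32	/* hypothetical; stands in for the inline-name limit */

/* Separately allocated long name, shared via a plain (non-atomic) refcount. */
struct ext_name {
	int refs;
	char text[];
};

/* Toy stand-in for a dentry's name storage; must start zero-initialized. */
struct toy_dentry {
	const char *name;		/* points at inline_buf or ext->text */
	struct ext_name *ext;		/* NULL when the name is stored inline */
	char inline_buf[INLINE_LEN];
};

/* A stable copy of (or reference to) a name, safe to use after a rename. */
struct name_snap {
	const char *name;
	struct ext_name *ext;
	char inline_buf[INLINE_LEN];
};

/* Drop one reference; free the long name when the last one goes away. */
static void ext_put(struct ext_name *e)
{
	if (e) {
		assert(e->refs > 0);
		if (--e->refs == 0)
			free(e);
	}
}

/* "Rename": install a new name and drop the dentry's reference to the old one. */
static int toy_set_name(struct toy_dentry *d, const char *s)
{
	size_t len = strlen(s);
	struct ext_name *old = d->ext;

	if (len < INLINE_LEN) {
		memcpy(d->inline_buf, s, len + 1);
		d->name = d->inline_buf;
		d->ext = NULL;
	} else {
		struct ext_name *e = malloc(sizeof(*e) + len + 1);

		if (!e)
			return -1;
		e->refs = 1;
		memcpy(e->text, s, len + 1);
		d->name = e->text;
		d->ext = e;
	}
	ext_put(old);
	return 0;
}

/* Pin the current name: copy short names, take a reference on long ones. */
static void snap_take(struct name_snap *s, const struct toy_dentry *d)
{
	if (d->ext) {
		d->ext->refs++;
		s->ext = d->ext;
		s->name = d->ext->text;
	} else {
		memcpy(s->inline_buf, d->inline_buf, INLINE_LEN);
		s->ext = NULL;
		s->name = s->inline_buf;
	}
}

/* Release the snapshot; the long name is freed here if nobody else holds it. */
static void snap_release(struct name_snap *s)
{
	ext_put(s->ext);
	s->ext = NULL;
	s->name = NULL;
}
```

The intended use is snap_take(), then the slow copy from s.name, then snap_release(). A toy_set_name() that lands in between drops only the dentry's own reference, so the bytes being copied stay valid until the snapshot is released.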