From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from zeniv.linux.org.uk ([195.92.253.2]:35674 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750798AbeEKCSp (ORCPT );
	Thu, 10 May 2018 22:18:45 -0400
Date: Fri, 11 May 2018 03:18:43 +0100
From: Al Viro 
To: Dave Chinner 
Cc: linux-fsdevel@vger.kernel.org, Linus Torvalds 
Subject: Re: [RFC][PATCH] do d_instantiate/unlock_new_inode combinations safely
Message-ID: <20180511021843.GY30522@ZenIV.linux.org.uk>
References: <20180510182058.GP30522@ZenIV.linux.org.uk>
 <20180510225607.GU23861@dastard>
 <20180511003901.GW30522@ZenIV.linux.org.uk>
 <20180511013208.GV23861@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180511013208.GV23861@dastard>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: 

On Fri, May 11, 2018 at 11:32:08AM +1000, Dave Chinner wrote:
> i.e. we already have code in xfs_setup_inode() that sets the xfs
> inode ILOCK rwsem dir/non-dir lockdep class before the new inode is
> unlocked - we could just do the i_rwsem lockdep setup there, too.

... which would suffice - the

	if (S_ISDIR(inode->i_mode)) {
		struct file_system_type *type = inode->i_sb->s_type;

		/* Set new key only if filesystem hasn't already changed it */
		if (lockdep_match_class(&inode->i_rwsem, &type->i_mutex_key)) {

check in lockdep_annotate_inode_mutex_key() would make sure that
->i_rwsem will be left alone by unlock_new_inode().

> Then, if we were to factor unlock_new_inode() as Andreas suggested,
> we could call __unlock_new_inode() from xfs_finish_inode_setup().

No need - if you set the class in xfs_setup_inode(), you are fine.

Said that, hash insertion is also potentially delicate - another
ext2/nfsd race from the same pile back in 2008 had been

	* ext2_new_inode() chooses inumber
	* open-by-fhandle guesses the inumber and hits ext2_iget(), which
	  inserts a locked in-core inode into icache and proceeds to block
	  reading it from disk.
	* ext2_new_inode() inserts *its* in-core inode into icache (with
	  the same inumber) and sets the things up, both in-core and on
	  disk
	* open-by-fhandle is back and sees a good live on-disk inode.  It
	  finishes setting the in-core one up and we'd got *TWO* in-core
	  inodes with the same inumber, both hashed, both with dentries,
	  both used by syscalls to do IO.  Good times all around - fs
	  corruption is fun.

That was fixed by using insert_inode_locked() in ext2_new_inode(), and
doing that before the on-disk inode would start looking good.  If it
came during ext2_iget(), it would've found an in-core inode with that
inumber (locked, doomed to be rejected), waited for it to come unlocked,
seen it unhashed (since ext2_iget() said it was no good) and inserted
its own in-core inode into hash (after having rechecked that nobody had
an in-core inode with the same inumber in there, that is).

I'm not familiar enough with XFS icache replacement to tell if anything
of that sort is a problem there; might be a non-issue for any number of
reasons.