From: Mike Marshall <hubcap@omnibond.com>
To: Al Viro <viro@zeniv.linux.org.uk>,
Mike Marshall <hubcap@omnibond.com>,
Martin Brandenburg <martin@omnibond.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC][PATCH] do d_instantiate/unlock_new_inode combinations safely
Date: Thu, 10 May 2018 16:44:50 -0400
Message-ID: <CAOg9mSSRAVi-QWQEBzdBkwgLzHrXDTkQvuLe6bWbe5Bs-nic_w@mail.gmail.com>
In-Reply-To: <20180510182058.GP30522@ZenIV.linux.org.uk>

I applied your patch to Linux v4.17-rc3, ran xfstests, and saw
no OrangeFS regressions. You can add
Tested-by: Mike Marshall <hubcap@omnibond.com>
if you dare <g> ...
-Mike
On Thu, May 10, 2018 at 2:20 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> [in the spirit of "don't put 'em in without posting for review";
> this is present in vfs.git#for-linus, if you prefer to look in git.
>
> Background: a bunch of nfsd race fixes from back in 2008 had
> problems with lockdep enabled; in 2012 that got "fixed", unfortunately
> reopening a narrow race window. The patch below does *NOT* fix
> all filesystems, but it does fix most of the exported local ones
> and it is easy to backport, so it makes for a sane starting point.
>
> If anyone has objections, this is your chance to yell.
> ]
>
> For anything NFS-exported we do _not_ want to unlock new inode
> before it has grown an alias; original set of fixes got the
> ordering right, but missed the nasty complication in case of
> lockdep being enabled - unlock_new_inode() does
> lockdep_annotate_inode_mutex_key(inode)
> which can only be done before anyone gets a chance to touch
> ->i_mutex. Unfortunately, flipping the order and doing
> unlock_new_inode() before d_instantiate() opens a window when
> mkdir can race with open-by-fhandle on a guessed fhandle, leading
> to multiple aliases for a directory inode and all the breakage
> that follows from that.
>
> Correct solution: a new primitive (d_instantiate_new())
> combining these two in the right order - lockdep annotate, then
> d_instantiate(), then the rest of unlock_new_inode(). All
> combinations of d_instantiate() with unlock_new_inode() should
> be converted to that.
>
> Cc: stable@kernel.org # 2.6.29 and later
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e064c49c9a9a..9e97cbb4f006 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6575,8 +6575,7 @@ static int btrfs_mknod(struct inode *dir, struct dentry *dentry,
> goto out_unlock_inode;
> } else {
> btrfs_update_inode(trans, root, inode);
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> }
>
> out_unlock:
> @@ -6652,8 +6651,7 @@ static int btrfs_create(struct inode *dir, struct dentry *dentry,
> goto out_unlock_inode;
>
> BTRFS_I(inode)->io_tree.ops = &btrfs_extent_io_ops;
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
>
> out_unlock:
> btrfs_end_transaction(trans);
> @@ -6798,12 +6796,7 @@ static int btrfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
> if (err)
> goto out_fail_inode;
>
> - d_instantiate(dentry, inode);
> - /*
> - * mkdir is special. We're unlocking after we call d_instantiate
> - * to avoid a race with nfsd calling d_instantiate.
> - */
> - unlock_new_inode(inode);
> + d_instantiate_new(dentry, inode);
> drop_on_err = 0;
>
> out_fail:
> @@ -10246,8 +10239,7 @@ static int btrfs_symlink(struct inode *dir, struct dentry *dentry,
> goto out_unlock_inode;
> }
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
>
> out_unlock:
> btrfs_end_transaction(trans);
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 86d2de63461e..6da095fef440 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1899,6 +1899,22 @@ void d_instantiate(struct dentry *entry, struct inode * inode)
> }
> EXPORT_SYMBOL(d_instantiate);
>
> +void d_instantiate_new(struct dentry *entry, struct inode *inode)
> +{
> + BUG_ON(!hlist_unhashed(&entry->d_u.d_alias));
> + BUG_ON(!inode);
> + lockdep_annotate_inode_mutex_key(inode);
> + security_d_instantiate(entry, inode);
> + spin_lock(&inode->i_lock);
> + __d_instantiate(entry, inode);
> + WARN_ON(!(inode->i_state & I_NEW));
> + inode->i_state &= ~I_NEW;
> + smp_mb();
> + wake_up_bit(&inode->i_state, __I_NEW);
> + spin_unlock(&inode->i_lock);
> +}
> +EXPORT_SYMBOL(d_instantiate_new);
> +
> /**
> * d_instantiate_no_diralias - instantiate a non-aliased dentry
> * @entry: dentry to complete
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 847904aa63a9..7bba8f2693b2 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -283,8 +283,7 @@ ecryptfs_create(struct inode *directory_inode, struct dentry *ecryptfs_dentry,
> iget_failed(ecryptfs_inode);
> goto out;
> }
> - unlock_new_inode(ecryptfs_inode);
> - d_instantiate(ecryptfs_dentry, ecryptfs_inode);
> + d_instantiate_new(ecryptfs_dentry, ecryptfs_inode);
> out:
> return rc;
> }
> diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c
> index 55f7caadb093..152453a91877 100644
> --- a/fs/ext2/namei.c
> +++ b/fs/ext2/namei.c
> @@ -41,8 +41,7 @@ static inline int ext2_add_nondir(struct dentry *dentry, struct inode *inode)
> {
> int err = ext2_add_link(dentry, inode);
> if (!err) {
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> return 0;
> }
> inode_dec_link_count(inode);
> @@ -255,8 +254,7 @@ static int ext2_mkdir(struct inode * dir, struct dentry * dentry, umode_t mode)
> if (err)
> goto out_fail;
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> out:
> return err;
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index b1f21e3a0763..4a09063ce1d2 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2411,8 +2411,7 @@ static int ext4_add_nondir(handle_t *handle,
> int err = ext4_add_entry(handle, dentry, inode);
> if (!err) {
> ext4_mark_inode_dirty(handle, inode);
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> return 0;
> }
> drop_nlink(inode);
> @@ -2651,8 +2650,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
> err = ext4_mark_inode_dirty(handle, dir);
> if (err)
> goto out_clear_inode;
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> if (IS_DIRSYNC(dir))
> ext4_handle_sync(handle);
>
> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
> index d5098efe577c..75e37fd720b2 100644
> --- a/fs/f2fs/namei.c
> +++ b/fs/f2fs/namei.c
> @@ -294,8 +294,7 @@ static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
>
> alloc_nid_done(sbi, ino);
>
> - d_instantiate(dentry, inode);
> - unlock_new_inode(inode);
> + d_instantiate_new(dentry, inode);
>
> if (IS_DIRSYNC(dir))
> f2fs_sync_fs(sbi->sb, 1);
> @@ -597,8 +596,7 @@ static int f2fs_symlink(struct inode *dir, struct dentry *dentry,
> err = page_symlink(inode, disk_link.name, disk_link.len);
>
> err_out:
> - d_instantiate(dentry, inode);
> - unlock_new_inode(inode);
> + d_instantiate_new(dentry, inode);
>
> /*
> * Let's flush symlink data in order to avoid broken symlink as much as
> @@ -661,8 +659,7 @@ static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
>
> alloc_nid_done(sbi, inode->i_ino);
>
> - d_instantiate(dentry, inode);
> - unlock_new_inode(inode);
> + d_instantiate_new(dentry, inode);
>
> if (IS_DIRSYNC(dir))
> f2fs_sync_fs(sbi->sb, 1);
> @@ -713,8 +710,7 @@ static int f2fs_mknod(struct inode *dir, struct dentry *dentry,
>
> alloc_nid_done(sbi, inode->i_ino);
>
> - d_instantiate(dentry, inode);
> - unlock_new_inode(inode);
> + d_instantiate_new(dentry, inode);
>
> if (IS_DIRSYNC(dir))
> f2fs_sync_fs(sbi->sb, 1);
> diff --git a/fs/jffs2/dir.c b/fs/jffs2/dir.c
> index 0a754f38462e..e5a6deb38e1e 100644
> --- a/fs/jffs2/dir.c
> +++ b/fs/jffs2/dir.c
> @@ -209,8 +209,7 @@ static int jffs2_create(struct inode *dir_i, struct dentry *dentry,
> __func__, inode->i_ino, inode->i_mode, inode->i_nlink,
> f->inocache->pino_nlink, inode->i_mapping->nrpages);
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> return 0;
>
> fail:
> @@ -430,8 +429,7 @@ static int jffs2_symlink (struct inode *dir_i, struct dentry *dentry, const char
> mutex_unlock(&dir_f->sem);
> jffs2_complete_reservation(c);
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> return 0;
>
> fail:
> @@ -575,8 +573,7 @@ static int jffs2_mkdir (struct inode *dir_i, struct dentry *dentry, umode_t mode
> mutex_unlock(&dir_f->sem);
> jffs2_complete_reservation(c);
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> return 0;
>
> fail:
> @@ -747,8 +744,7 @@ static int jffs2_mknod (struct inode *dir_i, struct dentry *dentry, umode_t mode
> mutex_unlock(&dir_f->sem);
> jffs2_complete_reservation(c);
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> return 0;
>
> fail:
> diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
> index b41596d71858..56c3fcbfe80e 100644
> --- a/fs/jfs/namei.c
> +++ b/fs/jfs/namei.c
> @@ -178,8 +178,7 @@ static int jfs_create(struct inode *dip, struct dentry *dentry, umode_t mode,
> unlock_new_inode(ip);
> iput(ip);
> } else {
> - unlock_new_inode(ip);
> - d_instantiate(dentry, ip);
> + d_instantiate_new(dentry, ip);
> }
>
> out2:
> @@ -313,8 +312,7 @@ static int jfs_mkdir(struct inode *dip, struct dentry *dentry, umode_t mode)
> unlock_new_inode(ip);
> iput(ip);
> } else {
> - unlock_new_inode(ip);
> - d_instantiate(dentry, ip);
> + d_instantiate_new(dentry, ip);
> }
>
> out2:
> @@ -1059,8 +1057,7 @@ static int jfs_symlink(struct inode *dip, struct dentry *dentry,
> unlock_new_inode(ip);
> iput(ip);
> } else {
> - unlock_new_inode(ip);
> - d_instantiate(dentry, ip);
> + d_instantiate_new(dentry, ip);
> }
>
> out2:
> @@ -1447,8 +1444,7 @@ static int jfs_mknod(struct inode *dir, struct dentry *dentry,
> unlock_new_inode(ip);
> iput(ip);
> } else {
> - unlock_new_inode(ip);
> - d_instantiate(dentry, ip);
> + d_instantiate_new(dentry, ip);
> }
>
> out1:
> diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
> index 1a2894aa0194..dd52d3f82e8d 100644
> --- a/fs/nilfs2/namei.c
> +++ b/fs/nilfs2/namei.c
> @@ -46,8 +46,7 @@ static inline int nilfs_add_nondir(struct dentry *dentry, struct inode *inode)
> int err = nilfs_add_link(dentry, inode);
>
> if (!err) {
> - d_instantiate(dentry, inode);
> - unlock_new_inode(inode);
> + d_instantiate_new(dentry, inode);
> return 0;
> }
> inode_dec_link_count(inode);
> @@ -243,8 +242,7 @@ static int nilfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
> goto out_fail;
>
> nilfs_mark_inode_dirty(inode);
> - d_instantiate(dentry, inode);
> - unlock_new_inode(inode);
> + d_instantiate_new(dentry, inode);
> out:
> if (!err)
> err = nilfs_transaction_commit(dir->i_sb);
> diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
> index 6e3134e6d98a..1b5707c44c3f 100644
> --- a/fs/orangefs/namei.c
> +++ b/fs/orangefs/namei.c
> @@ -75,8 +75,7 @@ static int orangefs_create(struct inode *dir,
> get_khandle_from_ino(inode),
> dentry);
>
> - d_instantiate(dentry, inode);
> - unlock_new_inode(inode);
> + d_instantiate_new(dentry, inode);
> orangefs_set_timeout(dentry);
> ORANGEFS_I(inode)->getattr_time = jiffies - 1;
> ORANGEFS_I(inode)->getattr_mask = STATX_BASIC_STATS;
> @@ -332,8 +331,7 @@ static int orangefs_symlink(struct inode *dir,
> "Assigned symlink inode new number of %pU\n",
> get_khandle_from_ino(inode));
>
> - d_instantiate(dentry, inode);
> - unlock_new_inode(inode);
> + d_instantiate_new(dentry, inode);
> orangefs_set_timeout(dentry);
> ORANGEFS_I(inode)->getattr_time = jiffies - 1;
> ORANGEFS_I(inode)->getattr_mask = STATX_BASIC_STATS;
> @@ -402,8 +400,7 @@ static int orangefs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
> "Assigned dir inode new number of %pU\n",
> get_khandle_from_ino(inode));
>
> - d_instantiate(dentry, inode);
> - unlock_new_inode(inode);
> + d_instantiate_new(dentry, inode);
> orangefs_set_timeout(dentry);
> ORANGEFS_I(inode)->getattr_time = jiffies - 1;
> ORANGEFS_I(inode)->getattr_mask = STATX_BASIC_STATS;
> diff --git a/fs/reiserfs/namei.c b/fs/reiserfs/namei.c
> index bd39a998843d..5089dac02660 100644
> --- a/fs/reiserfs/namei.c
> +++ b/fs/reiserfs/namei.c
> @@ -687,8 +687,7 @@ static int reiserfs_create(struct inode *dir, struct dentry *dentry, umode_t mod
> reiserfs_update_inode_transaction(inode);
> reiserfs_update_inode_transaction(dir);
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> retval = journal_end(&th);
>
> out_failed:
> @@ -771,8 +770,7 @@ static int reiserfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode
> goto out_failed;
> }
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> retval = journal_end(&th);
>
> out_failed:
> @@ -871,8 +869,7 @@ static int reiserfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
> /* the above add_entry did not update dir's stat data */
> reiserfs_update_sd(&th, dir);
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> retval = journal_end(&th);
> out_failed:
> reiserfs_write_unlock(dir->i_sb);
> @@ -1187,8 +1184,7 @@ static int reiserfs_symlink(struct inode *parent_dir,
> goto out_failed;
> }
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> retval = journal_end(&th);
> out_failed:
> reiserfs_write_unlock(parent_dir->i_sb);
> diff --git a/fs/udf/namei.c b/fs/udf/namei.c
> index 0458dd47e105..c586026508db 100644
> --- a/fs/udf/namei.c
> +++ b/fs/udf/namei.c
> @@ -622,8 +622,7 @@ static int udf_add_nondir(struct dentry *dentry, struct inode *inode)
> if (fibh.sbh != fibh.ebh)
> brelse(fibh.ebh);
> brelse(fibh.sbh);
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
>
> return 0;
> }
> @@ -733,8 +732,7 @@ static int udf_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
> inc_nlink(dir);
> dir->i_ctime = dir->i_mtime = current_time(dir);
> mark_inode_dirty(dir);
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> if (fibh.sbh != fibh.ebh)
> brelse(fibh.ebh);
> brelse(fibh.sbh);
> diff --git a/fs/ufs/namei.c b/fs/ufs/namei.c
> index 32545cd00ceb..d5f43ba76c59 100644
> --- a/fs/ufs/namei.c
> +++ b/fs/ufs/namei.c
> @@ -39,8 +39,7 @@ static inline int ufs_add_nondir(struct dentry *dentry, struct inode *inode)
> {
> int err = ufs_add_link(dentry, inode);
> if (!err) {
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> return 0;
> }
> inode_dec_link_count(inode);
> @@ -193,8 +192,7 @@ static int ufs_mkdir(struct inode * dir, struct dentry * dentry, umode_t mode)
> if (err)
> goto out_fail;
>
> - unlock_new_inode(inode);
> - d_instantiate(dentry, inode);
> + d_instantiate_new(dentry, inode);
> return 0;
>
> out_fail:
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 94acbde17bb1..66c6e17e61e5 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -224,6 +224,7 @@ extern seqlock_t rename_lock;
> * These are the low-level FS interfaces to the dcache..
> */
> extern void d_instantiate(struct dentry *, struct inode *);
> +extern void d_instantiate_new(struct dentry *, struct inode *);
> extern struct dentry * d_instantiate_unique(struct dentry *, struct inode *);
> extern struct dentry * d_instantiate_anon(struct dentry *, struct inode *);
> extern int d_instantiate_no_diralias(struct dentry *, struct inode *);
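
For readers following the ordering argument in the quoted commit message,
here is a minimal user-space sketch of why the combined primitive closes
the window. It is a toy model and not kernel code: every toy_* name, the
struct layouts, and the single-threaded "racer" are assumptions made for
illustration. The lookup by file handle is modelled as "blocked while
I_NEW is set, otherwise adds its own alias only if the inode has none".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; none of these types are the kernel's. */
#define TOY_I_NEW 0x1u

struct toy_inode {
	unsigned state;      /* TOY_I_NEW while the inode is being set up */
	bool annotated;      /* stands in for the lockdep annotation */
	bool mutex_touched;  /* set once anyone has used the inode's mutex */
	int aliases;         /* number of dentries attached to this inode */
};

struct toy_dentry {
	struct toy_inode *inode;
};

/* The annotation is only valid before anyone has touched the mutex. */
static bool toy_annotate(struct toy_inode *inode)
{
	if (inode->mutex_touched)
		return false;
	inode->annotated = true;
	return true;
}

/* Attach a dentry to the inode, roughly what d_instantiate() achieves. */
static void toy_attach(struct toy_dentry *dentry, struct toy_inode *inode)
{
	dentry->inode = inode;
	inode->aliases++;
}

/*
 * Models open-by-fhandle: it cannot proceed while the inode is I_NEW.
 * Once it can, it touches the mutex and creates its own alias only if
 * the inode does not have one yet.
 */
static bool toy_open_by_handle(struct toy_inode *inode, struct toy_dentry *anon)
{
	if (inode->state & TOY_I_NEW)
		return false;  /* the real code would sleep here */
	inode->mutex_touched = true;
	if (inode->aliases == 0)
		toy_attach(anon, inode);
	return true;
}

/* Combined primitive: annotate, attach the alias, then clear I_NEW. */
static bool toy_d_instantiate_new(struct toy_dentry *dentry,
				  struct toy_inode *inode)
{
	assert(dentry->inode == NULL);
	assert(inode != NULL);
	if (!toy_annotate(inode))
		return false;
	toy_attach(dentry, inode);
	inode->state &= ~TOY_I_NEW;
	return true;
}

/* Unlock first, instantiate second: the racer fits in between. */
int toy_racy_order_aliases(void)
{
	struct toy_inode dir = { .state = TOY_I_NEW };
	struct toy_dentry named = { 0 }, anon = { 0 };

	toy_annotate(&dir);
	dir.state &= ~TOY_I_NEW;          /* unlock_new_inode() first */
	toy_open_by_handle(&dir, &anon);  /* racer slips into the window */
	toy_attach(&named, &dir);         /* d_instantiate() arrives late */
	return dir.aliases;
}

/* Combined primitive: the racer only runs once the alias exists. */
int toy_safe_order_aliases(void)
{
	struct toy_inode dir = { .state = TOY_I_NEW };
	struct toy_dentry named = { 0 }, anon = { 0 };

	toy_open_by_handle(&dir, &anon);  /* blocked: still I_NEW */
	toy_d_instantiate_new(&named, &dir);
	toy_open_by_handle(&dir, &anon);  /* finds the existing alias */
	return dir.aliases;
}
```

In this model toy_racy_order_aliases() ends with two aliases on the
directory inode, while toy_safe_order_aliases() ends with one. The real
race depends on scheduling between CPUs, which this sequential sketch
deliberately does not attempt to reproduce.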