From: NeilBrown
To: Linus Torvalds
Cc: Al Viro, linux-fsdevel, Linux Kernel Mailing List
Date: Fri, 10 Nov 2017 09:14:58 +1100
Subject: Re: [PATCH 3/3] VFS: close race between getcwd() and d_move()
Message-ID: <8760ajf6al.fsf@notabene.neil.brown.name>

On Thu, Nov 09 2017, Linus Torvalds wrote:

> On Wed, Nov 8, 2017 at 7:22 PM, NeilBrown wrote:
>> d_move() will call __d_drop() and then __d_rehash()
>> on the dentry being moved. This creates a small window
>> when the dentry appears to be unhashed. Many tests
>> of d_unhashed() are made under ->d_lock and so are safe
>> from racing with this window, but some aren't.
>> In particular, getcwd() calls d_unlinked() (which calls
>> d_unhashed()) without d_lock protection, so it can race.
>
> Hmm.
>
> I see what you're doing, but I don't necessarily agree.
>
> I would actually almost prefer that we simply change __d_move() itself.
>
> The problem is that __d_move() really wants to move the hashed things
> atomically, but instead of doing that it does a "unhash and then
> rehash".
>
> How nasty would it be to just expand the calls to __d_drop/__d_rehash
> into __d_move itself, and take both hash list locks at the same time
> (with the usual ordering and checking if it's the same list, of
> course).
>
>                  Linus

Something like this?
I can probably do better than "b1" and "b2".
I assume target must always be hashed?

Do you like it enough for me to make it into a real patch?
I'd probably move hlist_bl_lock_two() into list_bl.h.
I'm not generally keen on open-coding subtle code, but maybe it is
justified here.

Thanks,
NeilBrown

diff --git a/fs/dcache.c b/fs/dcache.c
index f90141387f01..1a329fedf23c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -472,6 +472,9 @@ static void dentry_lru_add(struct dentry *dentry)
  */
 void __d_drop(struct dentry *dentry)
 {
+	/* WARNING: any changes here should be reflected in __d_move()
+	 * which open-codes some of this functionality.
+	 */
 	if (!d_unhashed(dentry)) {
 		struct hlist_bl_head *b;
 		/*
@@ -2380,6 +2383,9 @@ EXPORT_SYMBOL(d_delete);
 
 static void __d_rehash(struct dentry *entry)
 {
+	/* WARNING: any changes here should be reflected in __d_move()
+	 * which open-codes some of this functionality.
+	 */
 	struct hlist_bl_head *b = d_hash(entry->d_name.hash);
 	BUG_ON(!d_unhashed(entry));
 	hlist_bl_lock(b);
@@ -2796,11 +2802,23 @@ static void dentry_unlock_for_move(struct dentry *dentry, struct dentry *target)
  * rename_lock, the i_mutex of the source and target directories,
  * and the sb->s_vfs_rename_mutex if they differ. See lock_rename().
  */
+static void hlist_bl_lock_two(struct hlist_bl_head *b1, struct hlist_bl_head *b2)
+{
+	if (b1 && b1 < b2)
+		hlist_bl_lock(b1);
+	if (b2)
+		hlist_bl_lock(b2);
+	if (b1 > b2)
+		hlist_bl_lock(b1);
+}
+
 static void __d_move(struct dentry *dentry, struct dentry *target,
 		     bool exchange)
 {
 	struct inode *dir = NULL;
 	unsigned n;
+	struct hlist_bl_head *b1, *b2;
+
 	if (!dentry->d_inode)
 		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
 
@@ -2817,10 +2835,24 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 	write_seqcount_begin(&dentry->d_seq);
 	write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED);
 
-	/* unhash both */
-	/* __d_drop does write_seqcount_barrier, but they're OK to nest. */
-	__d_drop(dentry);
-	__d_drop(target);
+	/* We want to unhash both, change names, then rehash one or both.
+	 * If we use __d_drop() and __d_rehash() there will be a window
+	 * when dentry appears to be d_unhashed() which can race with lockless
+	 * checking. So instead we open-code the important parts of __d_drop()
+	 * and __d_rehash().
+	 * @target must already be hashed, @dentry must be if @exchange.
+	 */
+	BUG_ON(d_unhashed(dentry) && exchange);
+	BUG_ON(d_unhashed(target));
+
+	b1 = d_unhashed(dentry) ? NULL : d_hash(dentry->d_name.hash);
+	b2 = d_hash(target->d_name.hash);
+	hlist_bl_lock_two(b1, b2);
+	if (b1)
+		__hlist_bl_del(&dentry->d_hash);
+	__hlist_bl_del(&target->d_hash);
+	write_seqcount_invalidate(&dentry->d_seq);
+	write_seqcount_invalidate(&target->d_seq);
 
 	/* Switch the names.. */
 	if (exchange)
@@ -2829,9 +2861,14 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 		copy_name(dentry, target);
 
 	/* rehash in new place(s) */
-	__d_rehash(dentry);
+	hlist_bl_add_head_rcu(&dentry->d_hash, b2);
 	if (exchange)
-		__d_rehash(target);
+		hlist_bl_add_head_rcu(&target->d_hash, b1);
+	else
+		target->d_hash.pprev = NULL;
+	if (b1 && b1 != b2)
+		hlist_bl_unlock(b1);
+	hlist_bl_unlock(b2);
 
 	/* ... and switch them in the tree */
 	if (IS_ROOT(dentry)) {
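hlist_bl_lock_two() is the "usual ordering" Linus mentions: bucket locks are always taken lowest address first, so two CPUs that need the same pair of buckets cannot deadlock by each holding one and waiting for the other, and a bucket shared by source and target is locked only once. The following is a minimal userspace sketch of that rule in C11, assuming a simple atomic spinlock per bucket in place of the kernel's hlist_bl bit spinlock. struct bucket, bucket_lock(), bucket_is_locked(), lock_two() and unlock_two() are hypothetical names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a hash-table bucket with its own lock.
 * Assumption: a plain atomic_int spinlock (0 = free, 1 = held) replaces
 * the kernel's bit spinlock in the low bit of the hlist_bl_head pointer.
 */
struct bucket {
	atomic_int locked;
};

static void bucket_lock(struct bucket *b)
{
	int expected = 0;

	/* Spin until we manage to swap 0 -> 1. */
	while (!atomic_compare_exchange_weak(&b->locked, &expected, 1))
		expected = 0;
}

static void bucket_unlock(struct bucket *b)
{
	atomic_store(&b->locked, 0);
}

/* Hypothetical helper, only for inspecting state in a demo or test. */
static int bucket_is_locked(struct bucket *b)
{
	return atomic_load(&b->locked);
}

/*
 * Lock b1 and b2 in address order (lower first) to avoid ABBA deadlock.
 * b1 may be NULL (source not hashed); b2 must not be, matching the
 * patch's BUG_ON(d_unhashed(target)). If b1 == b2 the lock is taken
 * once, since a non-recursive spinlock cannot be re-acquired by its
 * holder. The buckets must come from one array (as in a hash table) for
 * the < and > comparisons to be well-defined C.
 */
static void lock_two(struct bucket *b1, struct bucket *b2)
{
	assert(b2 != NULL);
	if (b1 && b1 < b2)
		bucket_lock(b1);
	bucket_lock(b2);
	if (b1 && b1 > b2)
		bucket_lock(b1);
}

/* Release both, never unlocking a shared bucket twice. */
static void unlock_two(struct bucket *b1, struct bucket *b2)
{
	if (b1 && b1 != b2)
		bucket_unlock(b1);
	bucket_unlock(b2);
}
```

One difference from the patch: the sketch also checks b1 for NULL in the final comparison, because ISO C does not define relational comparison of a null pointer with an object pointer. In the kernel a NULL b1 never compares greater than a real bucket address in practice, so the behaviour is the same. unlock_two() mirrors the unlock sequence at the end of the patched __d_move().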