From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f67.google.com ([209.85.160.67]:40048 "EHLO mail-pl0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752936AbeE1COe (ORCPT ); Sun, 27 May 2018 22:14:34 -0400 Received: by mail-pl0-f67.google.com with SMTP id t12-v6so6290906plo.7 for ; Sun, 27 May 2018 19:14:34 -0700 (PDT) From: Andreas Dilger Message-Id: <64F20BDA-00AB-429E-959E-1463AE6DCC4F@dilger.ca> Content-Type: multipart/signed; boundary="Apple-Mail=_CFC90FCD-A175-47A9-A175-52A4824DADCF"; protocol="application/pgp-signature"; micalg=pgp-sha256 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [RFC][PATCH] rmdir(),rename(): do shrink_dcache_parent() only on success Date: Sun, 27 May 2018 20:14:30 -0600 In-Reply-To: <20180527222029.GR30522@ZenIV.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org, Linus Torvalds To: Al Viro References: <20180527222029.GR30522@ZenIV.linux.org.uk> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: --Apple-Mail=_CFC90FCD-A175-47A9-A175-52A4824DADCF Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii On May 27, 2018, at 4:20 PM, Al Viro wrote: >=20 > Once upon a time ->rmdir() instances used to check if victim inode > had more than one (in-core) reference and failed with -EBUSY if it > had. The reason was race avoidance - emptiness check is worthless > if somebody could just go and create new objects in the victim > directory afterwards. >=20 > With introduction of dcache the checks had been replaced with > checking the refcount of dentry. However, since a cached negative > lookup leaves a negative child dentry, such check had lead to false > positives - with empty foo/ doing stat foo/bar before rmdir foo > ended up with -EBUSY unless the negative dentry of foo/bar happened > to be evicted by the time of rmdir(2). That had been fixed by > doing shrink_dcache_parent() just before the refcount check. >=20 > At the same time, ext2_rmdir() has grown a private solution that > eliminated those -EBUSY - it did something (setting ->i_size to 0) > which made any subsequent ext2_add_entry() fail. >=20 > Unfortunately, even with shrink_dcache_parent() the check had been > racy - after all, the victim itself could be found by dcache lookup > just after we'd checked its refcount. That got fixed by a new > helper (dentry_unhash()) that did shrink_dcache_parent() and unhashed > the sucker if its refcount ended up equal to 1. That got called = before > ->rmdir(), turning the checks in ->rmdir() instances into "if not > unhashed fail with -EBUSY". Which reduced the boilerplate nicely, but > had an unpleasant side effect - now shrink_dcache_parent() had been > done before the emptiness checks, leading to easily triggerable calls > of shrink_dcache_parent() on arbitrary large subtrees, quite possibly > nested into each other. >=20 > Several years later the ext2-private trick had been generalized - > (in-core) inodes of dead directories are flagged and calls of > lookup, readdir and all directory-modifying methods were prevented > in so marked directories. Remaining boilerplate in ->rmdir() = instances > became redundant and some instances got rid of it. >=20 > In 2011 the call of dentry_unhash() got shifted into ->rmdir() = instances > and then killed off in all of them. That has lead to another problem, > though - in case of successful rmdir we *want* any (negative) child > dentries dropped and the victim itself made negative. There's no = point > keeping cached negative lookups in foo when we can get the negative > lookup of foo itself cached. So shrink_dcache_parent() call had been > restored; unfortunately, it went into the place where dentry_unhash() > used to be, i.e. before the ->rmdir() call. Note that we don't unhash > anymore, so any "is it busy" checks would be racy; fortunately, all of > them are gone. >=20 > We should've done that call right *after* successful ->rmdir(). That > reduces contention caused by tree-walking in shrink_dcache_parent() > and, especially, contention caused by evictions in two nested subtrees > going on in parallel. The same goes for directory-overwriting = rename() - > the story there had been parallel to that of rmdir(). >=20 > Signed-off-by: Al Viro > --- > diff --git a/fs/namei.c b/fs/namei.c > index 186bd2464fd5..269b64a1121a 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -3847,11 +3847,11 @@ int vfs_rmdir(struct inode *dir, struct dentry = *dentry) > if (error) > goto out; >=20 > - shrink_dcache_parent(dentry); > error =3D dir->i_op->rmdir(dir, dentry); > if (error) > goto out; >=20 > + shrink_dcache_parent(dentry); > dentry->d_inode->i_flags |=3D S_DEAD; > dont_mount(dentry); > detach_mounts(dentry); > @@ -4434,8 +4434,6 @@ int vfs_rename(struct inode *old_dir, struct = dentry *old_dentry, > old_dir->i_nlink >=3D max_links) > goto out; > } > - if (is_dir && !(flags & RENAME_EXCHANGE) && target) > - shrink_dcache_parent(new_dentry); > if (!is_dir) { > error =3D try_break_deleg(source, delegated_inode); > if (error) > @@ -4452,8 +4450,10 @@ int vfs_rename(struct inode *old_dir, struct = dentry *old_dentry, > goto out; >=20 > if (!(flags & RENAME_EXCHANGE) && target) { > - if (is_dir) > + if (is_dir) { > + shrink_dcache_parent(new_dentry); > target->i_flags |=3D S_DEAD; Would it be better to set S_DEAD on the removed directory before shrink_dcache_parent() is called (here and in vfs_rmdir()), or is there no way for a new dentry to be added to the parent after the shrink is done? Cheers, Andreas --Apple-Mail=_CFC90FCD-A175-47A9-A175-52A4824DADCF Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP -----BEGIN PGP SIGNATURE----- Comment: GPGTools - http://gpgtools.org iQIzBAEBCAAdFiEEDb73u6ZejP5ZMprvcqXauRfMH+AFAlsLZgYACgkQcqXauRfM H+Bdkg//e9kPCsw5v37s8aO34HNIkxZITtsi0XoX9F9eM6qGuDYUOdA4DrhK5EjM eeHaUvCpWEf2ceUlApSVjHwkFNtHY5vwx7m3GbTCJo96qaUE7Hn+2xmnHLt47pKa /2XeOkNrXcfl04GuyMuQ678ygHUBKHMEt1QZOapMea/gU+Hbt7hHfeqgi+HrCjn2 Vn8+hRxDVSGILLMPis5CkhJs6oa8YNBTNkvLMNKas8cAEULUSSGSRISg9jirhfVQ vNTbGegqMH0GCUzj8bofpmlbJBGa11A2POzjm5CUXjH/owf69H4hJdGao/BIRpJM qGZLPEfnaK6DONoYE07flFsxTxSXnk1fybVNtNZoN593SJe4fqY+BtB0qEehpRZd mMXFEXJWQMgn5u/FJMcC5jH01qgjaqNoGdsJHkCFuk6L+I6funsN57Ryh5mvCDcC 8iw9doxCt8njMhDBp/CiNHnPSxBXXM9NaaiIUcECz1w6W2piqcIYz+SyOqYNbSnx mBvox212dL60G0y2c0FSoZ3ny/UhLX8tpwiuvFwJbFMpkyW6r54QmqZqXBPG8cxC K4asX9UP/3Ou9RU4Eblaeqpr1Y9+Ye/KStaKTI0vVleeWeW/heWPV0gQvMNdiTKe d+2ZAYJcHWs4HnB7q5iViZWW0yeyIa6EhXm/a46xI66bqfsAE6U= =cIgG -----END PGP SIGNATURE----- --Apple-Mail=_CFC90FCD-A175-47A9-A175-52A4824DADCF--