From: NeilBrown
To: Al Viro, Linus Torvalds
Cc: linux-fsdevel, Linux Kernel Mailing List
Date: Fri, 10 Nov 2017 10:09:09 +1100
Subject: Re: [PATCH 2/3] Improve fairness when locking the per-superblock s_anon list
In-Reply-To: <20171109205029.GD21978@ZenIV.linux.org.uk>
References: <151019756744.30101.3832608128627682973.stgit@noble> <151019772760.30101.8513274540570798315.stgit@noble> <20171109205029.GD21978@ZenIV.linux.org.uk>
Message-ID: <87zi7vdp7u.fsf@notabene.neil.brown.name>

On Thu, Nov 09 2017, Al Viro wrote:

> On Thu, Nov 09, 2017 at 11:52:48AM -0800, Linus Torvalds wrote:
>> Honestly, looking at the code, the whole s_anon thing seems entirely
>> broken. There doesn't even seem to be much reason for it. In pretty
>> much all cases, we could just hash the damn dentry,
>>
>> The only reason for actually having s_anon seems to be that we want
>> some per-superblock list of these unconnected dentries for
>> shrink_dcache_for_umount().
>>
>> Everything else would actually be *much* happier with just having the
>> dentry on the regular hash table.
>> It would entirely get rid of this
>> stupid performance problem, and it would actually simplify all the
>> code elsewhere, because it would remove special cases like this
>>
>>         if (unlikely(IS_ROOT(dentry)))
>>                 b = &dentry->d_sb->s_anon;
>>         else
>>                 b = d_hash(dentry->d_name.hash);
>>
>> and just turn them into
>>
>>         b = d_hash(dentry->d_name.hash);
>>
>> so I really wonder if we could just get rid of s_anon entirely.
>>
>> Yes, getting rid of s_anon might involve crazy things like "let's just
>> walk all the dentries at umount time", but honestly, that sounds
>> preferable. Especially if we can just then do something like
>>
>>  - set a special flag in the superblock if we ever use __d_obtain_alias()
>
> Automatically set for a lot of NFS mounts (whenever you mount more than one
> tree from the same server, IIRC)...
>
>>  - only scan all the dentries on umount if that flag is set.
>>
>> Hmm?
>
> That looks like a bloody painful approach, IMO. I'm not saying I like
> Neil's patch, but I doubt that "let's just walk the entire dcache on
> umount" is a good idea.
>
> I wonder if separating the d_obtain_alias() and d_obtain_root() would be
> a good idea; the former outnumber the latter by many orders of magnitude.
> The tricky part is that we could have a disconnected directory from
> d_obtain_alias() with children already connected to it (and thus normally
> hashed by d_splice_alias()) and fail to connect the whole thing to parent.
>
> That leaves us with an orphaned tree that might stick around past the
> time when we drop all references to dentries in it. And we want to
> get those hunted down and shot on umount. Could we
>	* make s_anon hold d_obtain_root ones + orphans from such
>	  failed reconnects
>	* make final dput() treat hashed IS_ROOT as "don't retain it"
>	* have d_obtain_alias() put into normal hash, leaving the
>	  "move to s_anon" part to reconnect failures.
>	* keep umount side of things unchanged.
>
> I agree that temporary insertions into ->s_anon are bogus; hell, I'm not
> even sure we want to put it on _any_ list initially - we want it to look
> like it's hashed, so we could set ->next to NULL and have ->pprev point
> to itself. Then normal case for d_obtain_alias() would not bother
> the hash chains at all at allocation time, then have it put into the
> right hash chain on reconnect. And on reconnect failure the caller
> would've moved it to orphan list (i.e. ->s_anon).

I looked back at the original bug report, and it was d_obtain_alias()
that was particularly suffering. As this holds two spinlocks while
waiting for the bl_lock, a delay can affect other code that doesn't
touch s_anon. So improving d_obtain_alias() would be a good goal (as
long as we don't just move the problem, of course).

It isn't only "reconnect failure" that leaves things on s_anon. We
normally don't bother trying to connect non-directories. So if an NFS
server is getting lots of read/write requests without opens or other
pathname lookups, it could easily have lots of disconnected files being
repeatedly accessed. Keeping the dentries on s_anon means we don't need
to keep allocating new ones for every request.

So I'm not keen on dropping an IS_ROOT() dentry at final dput(), but it
might make sense to add the dentry to the per-fs list of IS_ROOT
dentries at that time.

One possible approach would be to use d_child rather than d_hash to link
together dentries that don't have a parent. We could assign a random
number to d_name.hash so it could appear to be hashed without imposing
on any one hash chain. We would still need a spinlock in the superblock
to manage the s_anon list that links the d_child entries together...

I might try to see how the code looks.
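As a rough illustration of the d_child idea above, here is a userspace
sketch. It is not kernel code and not a patch: every sketch_* name and
the list helpers are made up for this example, rand() merely stands in
for "a random number", and the superblock spinlock is only marked in
comments.

```c
/* Userspace sketch only: hypothetical names, not the kernel dcache API. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, in the spirit of list_head. */
struct list_node { struct list_node *prev, *next; };

void list_init(struct list_node *n) { n->prev = n->next = n; }

void list_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct sketch_sb {
	struct list_node anon;		/* parentless dentries, linked by ->child */
};

struct sketch_dentry {
	struct sketch_dentry *parent;	/* points to itself while parentless */
	struct list_node child;		/* on parent->subdirs or on sb->anon */
	struct list_node subdirs;
	unsigned int name_hash;
	struct sketch_sb *sb;
};

void sketch_sb_init(struct sketch_sb *sb) { list_init(&sb->anon); }

/* Models the IS_ROOT() test: a dentry that is its own parent. */
int sketch_is_root(const struct sketch_dentry *d) { return d->parent == d; }

/* Put a new disconnected dentry on the per-sb list via ->child. */
void sketch_add_anon(struct sketch_sb *sb, struct sketch_dentry *d)
{
	d->sb = sb;
	d->parent = d;
	list_init(&d->subdirs);
	/* Assumption: any spread-out value will do for the fake hash. */
	d->name_hash = (unsigned int)rand();
	/* the per-superblock spinlock would be taken here */
	list_add(&d->child, &sb->anon);
	/* ... and released here */
}

/* Reconnect: leave the per-sb list and join the parent's children. */
void sketch_reconnect(struct sketch_dentry *d, struct sketch_dentry *parent)
{
	assert(sketch_is_root(d));
	/* the per-superblock spinlock would be taken here */
	list_del_init(&d->child);
	/* ... and released here */
	d->parent = parent;
	list_add(&d->child, &parent->subdirs);
}

/* Umount-style walk: visit everything still parentless. */
size_t sketch_count_anon(const struct sketch_sb *sb)
{
	size_t count = 0;
	const struct list_node *n;

	for (n = sb->anon.next; n != &sb->anon; n = n->next)
		count++;
	return count;
}
```

In this sketch, adding three dentries with sketch_add_anon() and then
reconnecting one of them leaves two entries for the umount-style walk,
and no hash chain is touched at any point. Whether that still holds up
against the real locking rules is exactly what would need checking.
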
Thanks,
NeilBrown