From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751517AbdBIDxk (ORCPT ); Wed, 8 Feb 2017 22:53:40 -0500 Received: from zeniv.linux.org.uk ([195.92.253.2]:34282 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751346AbdBIDxh (ORCPT ); Wed, 8 Feb 2017 22:53:37 -0500 Date: Thu, 9 Feb 2017 03:53:07 +0000 From: Al Viro To: Andrew Morton Cc: Konstantin Khlebnikov , "Eric W. Biederman" , linux-kernel@vger.kernel.org Subject: Re: [PATCH] proc/sysctl: drop unregistered stale dentries as soon as possible Message-ID: <20170209035307.GK13195@ZenIV.linux.org.uk> References: <148655090380.421415.14305642138058304882.stgit@buzz> <20170208134804.5662cddf3a269eb8acb0aa8a@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170208134804.5662cddf3a269eb8acb0aa8a@linux-foundation.org> User-Agent: Mutt/1.7.1 (2016-10-04) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 08, 2017 at 01:48:04PM -0800, Andrew Morton wrote: > > This patch detects stale dentry in proc_sys_compare and pretends that > > it has matching name - revalidation will kill it and lookup restarts. > > As a result each stale dentry will be seen only once and will not > > contaminate hash endlessly. > > > > What are "stale" dentries? Unused dentries? If so, why doesn't the > creation of a new dentry immediately invalidate the old dentry with a > matching path? What do other filesystems do to prevent this issue? The whole point is that it's *NOT* a matching path. Currently ->d_compare() for /proc/sys tries to make sure that sysctl getting unregistered means that no extra references will be added to dentries of the stuff we are trying to kick out. If it's getting unregistered, ->d_compare() won't be seeing them and from that point on dentry refcount can only go down - no new lookups will increase it. This kludge tries to have _any_ lookup in the same hash chain pick the first dentry of such stuff, no matter what name/parent it has. Then it relies upon ->d_revalidate() refusing to accept that sucker, so that it gets unhashed and we (hopefully) repeat the lookup. This is complete garbage. Lookups won't be repeated indefinitely - if there are several such dentries in the hash chain we search, syscall will end up failing with ESTALE on thus buggered ->d_compare(), even though none of those dentries are anywhere near the path we are trying to resolve. No other filesystem attempts that kind of insanity, and for a good reason. The problem it tries to address is that sysctl unregistration doesn't unhash the now-stale dentries. Before the unregistration we kept them even with refcount 0, until memory pressure evicts the suckers. After unregistration we make sure that refcount reaching 0 will cause the instant eviction. The problem is with the case when they had refcount 0 to start with. Then the eviction rule does not get triggered - it would have happened when dropping the last reference, but we don't have any. The kludge proposed in that patch is nowhere near being a sane way to deal with that. Having ->d_compare() notice such dentries and quietly kick them out would be borderline saner, but * it's a potentially blocking operation and ->d_compare() is called in non-blocking contexts, including deep under rcu_read_lock(). * it's done when walking a hash chain; having that chain modified by ->d_compare() itself would require some modifications of callers and those are very hot codepaths. I agree that the problem is real, but this is no way to deal with it. What we want is something along the lines of d_prune_aliases() done for all inodes corresponding to given sysctl. Done just before erase_header() in start_unregistering(). That would require maintaining the list of inodes in question (e.g. anchored in ctl_table_header) and a bit of care in traversing it (use of igrab(), etc.) In the current form - NAK. Sorry.