From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from zeniv.linux.org.uk ([195.92.253.2]:51756 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752269AbcG3BHk (ORCPT ); Fri, 29 Jul 2016 21:07:40 -0400 Date: Sat, 30 Jul 2016 02:07:38 +0100 From: Al Viro To: Linus Torvalds Cc: linux-fsdevel@vger.kernel.org Subject: [RFC] parent in ->d_compare() arguments Message-ID: <20160730010738.GY2356@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: linux-fsdevel-owner@vger.kernel.org List-ID: We are passing to ->d_compare() instances parent, dentry itself, and a consistent snapshot of its ->d_name.name and ->d_name.len. In all but one instance (ncpfs one) the only thing we need the parent for is finding the superblock. Which is available as dentry->d_sb. ncpfs one is weird, but it actually wants parent's ->d_inode, so it has to be careful about the RCU case anyway. Do you have any objections to trimming the arguments list? I want to kill the 'parent' argument there and let ncpfs carefully walk dentry->d_parent->d_inode. Another thing in the same area: __d_lookup_rcu() does hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) { unsigned seq; seq = raw_seqcount_begin(&dentry->d_seq); if (dentry->d_parent != parent) continue; and raw_seqcount_begin() contains smp_rmb(). Seeing that we hit ->d_parent mismatch often enough and that we are fine with false negatives anyway, let's turn that into if (dentry->d_parent != parent) continue; seq = raw_seqcount_begin(&dentry->d_seq); if (unlikely(dentry->d_parent != parent)) continue; and cut down on the number of smp_rmb() per __d_lookup_rcu(). We do need the second check (to make sure that ->d_seq guarantees a consistent state of everything), but it's going to trigger _very_ rarely - basically, only if we step on dentry with the right parent just as it's being hit with cross-directory rename. That's a very slow case, since we are quite likely to search the tail of the wrong hash chain and eventually bugger off with NULL, switching to non-RCU codepath. So in the cases of parent mismatch we end up doing one fetch instead of two fetches + skip smp_rmb(), while in case of parent match we do extra fetch from hot cacheline + branch not taken. AFAICS, it's going to be a win even on architectures with trivial smp_rmb(); on something with costly smp_rmb() the win could be considerable.