linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] parent in ->d_compare() arguments
@ 2016-07-30  1:07 Al Viro
  2016-07-30 20:44 ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2016-07-30  1:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

	We are passing to ->d_compare() instances parent, dentry itself,
and a consistent snapshot of its ->d_name.name and ->d_name.len.  In all
but one instance (ncpfs one) the only thing we need the parent for is
finding the superblock.  Which is available as dentry->d_sb.  ncpfs one
is weird, but it actually wants parent's ->d_inode, so it has to be
careful about the RCU case anyway.

	Do you have any objections to trimming the arguments list?  I want
to kill the 'parent' argument there and let ncpfs carefully walk
dentry->d_parent->d_inode.

	Another thing in the same area: __d_lookup_rcu() does
        hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
                unsigned seq;
                seq = raw_seqcount_begin(&dentry->d_seq);
                if (dentry->d_parent != parent)
                        continue;
and raw_seqcount_begin() contains smp_rmb().  Seeing that we hit ->d_parent
mismatch often enough and that we are fine with false negatives anyway,
let's turn that into
                if (dentry->d_parent != parent)
                        continue;
                seq = raw_seqcount_begin(&dentry->d_seq);
                if (unlikely(dentry->d_parent != parent))
                        continue;
and cut down on the number of smp_rmb() per __d_lookup_rcu().  We do need the
second check (to make sure that ->d_seq guarantees a consistent state of
everything), but it's going to trigger _very_ rarely - basically, only if
we step on dentry with the right parent just as it's being hit with
cross-directory rename.  That's a very slow case, since we are quite likely
to search the tail of the wrong hash chain and eventually bugger off with
NULL, switching to non-RCU codepath.  So in the cases of parent mismatch
we end up doing one fetch instead of two fetches + skip smp_rmb(), while in
case of parent match we do extra fetch from hot cacheline + branch not
taken.  AFAICS, it's going to be a win even on architectures with trivial
smp_rmb(); on something with costly smp_rmb() the win could be considerable.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] parent in ->d_compare() arguments
  2016-07-30  1:07 [RFC] parent in ->d_compare() arguments Al Viro
@ 2016-07-30 20:44 ` Linus Torvalds
  2016-07-30 23:30   ` Al Viro
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2016-07-30 20:44 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

On Fri, Jul 29, 2016 at 6:07 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>         We are passing to ->d_compare() instances parent, dentry itself,
> and a consistent snapshot of its ->d_name.name and ->d_name.len.  In all
> but one instance (ncpfs one) the only thing we need the parent for is
> finding the superblock.  Which is available as dentry->d_sb.  ncpfs one
> is weird, but it actually wants parent's ->d_inode, so it has to be
> careful about the RCU case anyway.
>
>         Do you have any objections to trimming the arguments list?  I want
> to kill the 'parent' argument there and let ncpfs carefully walk
> dentry->d_parent->d_inode.

No objection. That's all the slowpath, and simplifying it (at the
expense of possibly having to make ncpfs slightly more expensive)
sounds fine.


> Another thing in the same area: __d_lookup_rcu() does
> hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
>                 unsigned seq;
>                 seq = raw_seqcount_begin(&dentry->d_seq);
>                 if (dentry->d_parent != parent)
>                         continue;
> and raw_seqcount_begin() contains smp_rmb().  Seeing that we hit ->d_parent
> mismatch often enough and that we are fine with false negatives anyway,
> let's turn that into
>                 if (dentry->d_parent != parent)
>                         continue;
>                 seq = raw_seqcount_begin(&dentry->d_seq);
>                 if (unlikely(dentry->d_parent != parent))
>                         continue;
> and cut down on the number of smp_rmb() per __d_lookup_rcu().

No. Let's not. "smp_rmb()" is completely free on x86 (ok, so it's a
instruction scheduling barrer - close enough), so trying to optimize
away rmb's and replacing them with double compares sounds entirely
misdesigned.

Yes, yes, there are other architectures where rmb is much more
expensive. But quite frankly, in most cases those architectures have
broken synchronization to begin with ("synchronization is unusual, so
let's not optimize it"). They'll fix it eventually.

Instead, what we should look at, is to make raw_seqcount_begin() use a
smp_load_acquire() on architectures where that is cheaper than the
rmb.

But again, I don't see the point of double-testing "parent" when a
load-acquire or load+rmb _should_ be cheap (and absolutely is on x86).

                Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] parent in ->d_compare() arguments
  2016-07-30 20:44 ` Linus Torvalds
@ 2016-07-30 23:30   ` Al Viro
  2016-07-30 23:36     ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2016-07-30 23:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

On Sat, Jul 30, 2016 at 01:44:48PM -0700, Linus Torvalds wrote:

> No. Let's not. "smp_rmb()" is completely free on x86 (ok, so it's a
> instruction scheduling barrer - close enough), so trying to optimize
> away rmb's and replacing them with double compares sounds entirely
> misdesigned.
> 
> Yes, yes, there are other architectures where rmb is much more
> expensive. But quite frankly, in most cases those architectures have
> broken synchronization to begin with ("synchronization is unusual, so
> let's not optimize it"). They'll fix it eventually.
> 
> Instead, what we should look at, is to make raw_seqcount_begin() use a
> smp_load_acquire() on architectures where that is cheaper than the
> rmb.
> 
> But again, I don't see the point of double-testing "parent" when a
> load-acquire or load+rmb _should_ be cheap (and absolutely is on x86).

Umm...  Even on x86, a lot of hash chain elements will have ->d_parent
mismatch.  Suppose rmb was a no-op; current code does
	fetch ->d_seq
	fetch ->d_parent
	compare with register
	branch taken to the end of body
while this would avoid the first fetch.  On the entries with the same
->d_parent we'd do
	fetch ->d_parent
	compare with register
	branch not taken
	fetch ->d_seq
	fetch ->d_parent
	compare with register
	branch (expectedly) not taken
which is the same as the mainline in terms of actual memory accesses and extra
3 insns.  I suspect that the win on the entries with ->d_parent mismatch can
outweight that, but that needs profiling to verify.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] parent in ->d_compare() arguments
  2016-07-30 23:30   ` Al Viro
@ 2016-07-30 23:36     ` Linus Torvalds
  2016-07-30 23:52       ` Al Viro
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2016-07-30 23:36 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

On Sat, Jul 30, 2016 at 4:30 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm...  Even on x86, a lot of hash chain elements will have ->d_parent
> mismatch.  Suppose rmb was a no-op; current code does
>         fetch ->d_seq
>         fetch ->d_parent
>         compare with register
>         branch taken to the end of body
> while this would avoid the first fetch.

So? Aren't they in the same cacheline?

We've tried very hard to pack all those initial elements next to each other.

The first-order approximation is that number of cacheline accesses
matter. And then the second order is to make code small and avoid
extra instructions.

As far as I can tell, your change doesn't actually help the cacheline
accesses, and it makes the code bigger and have extra instructions. So
it doesn't appear to improve anything, and it does make things worse.

But numbers talk, bullshit walks. If you have numbers to show
something different, that trumps my looking at code.

             Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] parent in ->d_compare() arguments
  2016-07-30 23:36     ` Linus Torvalds
@ 2016-07-30 23:52       ` Al Viro
  0 siblings, 0 replies; 5+ messages in thread
From: Al Viro @ 2016-07-30 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

On Sat, Jul 30, 2016 at 04:36:17PM -0700, Linus Torvalds wrote:

> So? Aren't they in the same cacheline?

Yes (unless lockdep map is stuck in between, in which case we are slow as
hell anyway).

> We've tried very hard to pack all those initial elements next to each other.
> 
> The first-order approximation is that number of cacheline accesses
> matter. And then the second order is to make code small and avoid
> extra instructions.
> 
> As far as I can tell, your change doesn't actually help the cacheline
> accesses, and it makes the code bigger and have extra instructions. So
> it doesn't appear to improve anything, and it does make things worse.
> 
> But numbers talk, bullshit walks. If you have numbers to show
> something different, that trumps my looking at code.

I'll profile and post the results; not today, though - it's not urgent anyway,
and right now I wouldn't trust my ability to do anything other than crawl to
the bed and hopefully get some sleep (RDU -> BOS -> RDU, on top of 4 hours of
sleep tonight and bloody two hours of delay in plane on the way back due to
bad weather; picking the younger kid from summer STEM camp had been...
interesting, for the lack of adequate printable words)

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-07-31  1:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-30  1:07 [RFC] parent in ->d_compare() arguments Al Viro
2016-07-30 20:44 ` Linus Torvalds
2016-07-30 23:30   ` Al Viro
2016-07-30 23:36     ` Linus Torvalds
2016-07-30 23:52       ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).