* Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
@ 2016-06-17 20:50 J. R. Okajima
  2016-06-17 22:16 ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: J. R. Okajima @ 2016-06-17 20:50 UTC (permalink / raw)
  To: viro, linux-fsdevel


I am afraid there may be another violation of the "no lookups on the same
name in parallel" rule, but I am not sure.

Roughly, d_alloc_parallel() behaves like this.

struct dentry *d_alloc_parallel()
{
	new = d_alloc(parent, name);

	rcu_read_lock();
	hlist_bl_lock(b);
	rcu_read_unlock();
	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
		if (!matched_dentry_found)
			continue;
		dget(dentry);
		hlist_bl_unlock(b);
		return dentry;
	}
	hlist_bl_add_head_rcu(&new->d_u.d_in_lookup_hash, b);
	hlist_bl_unlock(b);
	return new;
}

When two processes try opening the same existing file and enter
d_alloc_parallel() at the same time, only one process wins and succeeds
in hlist_bl_add_head_rcu(). The other process should find the
dentry in d_u.d_in_lookup_hash and return 'dentry' (instead of
'new'). Am I right?

My question is: when will 'new' be added into d_u.d_in_lookup_hash?
It should be between these two lines, I guess.
	rcu_read_unlock();
	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
But can that reliably happen?
If 'new' is not added here because someone else is in an rcu_read_lock
region, or for some other reason, will both processes add same-named
but different dentries?

Is it better to change the lock/unlock-order like this?

	rcu_read_unlock();
	rcu_barrier();
	hlist_bl_lock(b);
	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {


J. R. Okajima

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-17 20:50 Q. hlist_bl_add_head_rcu() in d_alloc_parallel() J. R. Okajima
@ 2016-06-17 22:16 ` Al Viro
  2016-06-17 22:56   ` Al Viro
  2016-06-19  5:24   ` J. R. Okajima
  0 siblings, 2 replies; 16+ messages in thread
From: Al Viro @ 2016-06-17 22:16 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: linux-fsdevel

On Sat, Jun 18, 2016 at 05:50:30AM +0900, J. R. Okajima wrote:
> 	hlist_bl_lock(b);
> 	rcu_read_unlock();
> 	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
> 		if (!matched_dentry_found)
> 			continue;
> 		dget(dentry);
> 		hlist_bl_unlock(b);
> 		return dentry;
> 	}
> 	hlist_bl_add_head_rcu(&new->d_u.d_in_lookup_hash, b);
> 	hlist_bl_unlock(b);
> 	return new;
> }

> When two processes try opening the same existing file and enter
> d_alloc_parallel() at the same time, only one process wins and succeeds
> in hlist_bl_add_head_rcu(). The other process should find the
> dentry in d_u.d_in_lookup_hash and return 'dentry' (instead of
> 'new'). Am I right?
> 
> My question is when will 'new' be added into d_u.d_in_lookup_hash?

You've quoted it yourself:
> 	hlist_bl_add_head_rcu(&new->d_u.d_in_lookup_hash, b);

> It should be between these two lines, I guess.
> 	rcu_read_unlock();
> 	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
> But can that reliably happen?
> If 'new' is not added here because someone else is in an rcu_read_lock
> region, or for some other reason, will both processes add same-named
> but different dentries?

Huh?  If you have two processes reaching that insertion into the in-lookup
hash, whoever gets the hlist_bl_lock() first wins; the loser will find
the instance added by the winner and bugger off with it.  RCU is completely
unrelated to that.  It's about the search in *primary* hash.
 
> Is it better to change the lock/unlock-order like this?
> 
> 	rcu_read_unlock();
> 	rcu_barrier();
> 	hlist_bl_lock(b);
> 	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {

No.  We need to verify that nothing had been transferred from in-lookup to
primary hash after we'd found no match in the primary.  That's what the
->i_dir_seq check is about, and it has to be done after we'd locked the
in-lookup hash chain.  We could drop rcu_read_lock before locking the
chain, but this way we make sure we won't get preempted between fetching
the ->i_dir_seq and checking if it had been changed.  The last thing we
want is rcu_barrier() - WTF for?  To get an extra chance of something
being moved from in-lookup to primary, forcing us to repeat the primary
hash lookup?

We are *NOT* modifying the primary hash in d_alloc_parallel().  At all.
With respect to the primary hash it's a pure reader.  And in-lookup hash
is only accessed under hlist_bl_lock() on its chains - no RCU accesses to
that one.


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-17 22:16 ` Al Viro
@ 2016-06-17 22:56   ` Al Viro
  2016-06-19  5:24   ` J. R. Okajima
  1 sibling, 0 replies; 16+ messages in thread
From: Al Viro @ 2016-06-17 22:56 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: linux-fsdevel

On Fri, Jun 17, 2016 at 11:16:14PM +0100, Al Viro wrote:

> We are *NOT* modifying the primary hash in d_alloc_parallel().  At all.
> With respect to the primary hash it's a pure reader.  And in-lookup hash
> is only accessed under hlist_bl_lock() on its chains - no RCU accesses to
> that one.

Note that if d_alloc_parallel() finds a matching dentry in the in-lookup hash,
it does *NOT* return it until that dentry has left the in-lookup hash and been
moved to the primary one.

That's what d_wait_lookup() call is about; we wait for whoever had added it
into in-lookup hash to be done with it.  Then we check if it's been transferred
into the primary hash (with the same name and parent).  If it has been,
we treat it as if we'd found it in the primary hash in the first place -
it's not an in-lookup one anymore.  If it hasn't, we repeat the whole thing
from scratch, starting with primary hash lookup.

The *only* case when we return an in-lookup dentry is when we had allocated
it, found no matches either in primary or in-lookup hashes and inserted it
into in-lookup hash.  What happens is an optimized variant of this:

	new = new dentry instance
retry:
	fetch the value of ->i_dir_seq
	look for match in primary
	if found - drop what we'd allocated and return what we'd found
	lock the chain in in-lookup hash
	check if ->i_dir_seq has changed
	if it has changed
		unlock the chain
		goto retry
	look for match in in-lookup hash
	if found
		unlock the chain
		wait for the match to cease being in-lookup
		drop the match
		goto retry	[see below]
	insert new into in-lookup hash
	unlock the chain
	return new

The difference between that and actual d_alloc_parallel() is a simple
optimisation in the last goto retry - the one after waiting for match
to leave in-lookup hash.  Pseudocode above would rescan the primary hash;
very often it would end up finding the same dentry we'd just been waiting
for.  So if our match is hashed and still has the same name and parent we
can return it without bothering with primary hash lookup.  It is common
enough to be worth doing, but it's just an optimisation - behaviour is
the same as if we'd followed d_wait_lookup() with unconditional
        spin_unlock(&dentry->d_lock);
        dput(dentry);
        goto retry;

The only difference is that we don't bother with drop/search in primary/find
and grab the same match in a very common case.


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-17 22:16 ` Al Viro
  2016-06-17 22:56   ` Al Viro
@ 2016-06-19  5:24   ` J. R. Okajima
  2016-06-19 16:55     ` Al Viro
  1 sibling, 1 reply; 16+ messages in thread
From: J. R. Okajima @ 2016-06-19  5:24 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel


Al Viro:
> Huh?  If you have two processes reaching that insertion into the in-lookup
> hash, whoever gets the hlist_bl_lock() first wins; the loser will find
> the instance added by the winner and bugger off with it.  RCU is completely
> unrelated to that.  It's about the search in *primary* hash.

Ok, forget about rcu_barrier.

----------------------------------------

> 	look for match in in-lookup hash
> 	if found
> 		unlock the chain
> 		wait for the match to cease being in-lookup
> 		drop the match
> 		goto retry	[see below]
> 	insert new into in-lookup hash

The actual matching tests corresponding to the above pseudo-code (the
"if found" part) are these (from v4.7-rc3).

	dentry->d_name.hash != hash
	dentry->d_parent != parent
	d_unhashed(dentry)
	name (length and string)

I am afraid this d_unhashed() test is racy.
Here is the scenario I am imagining.

- two processes try opening the same file
- both enter the hlist_bl_lock-protected loop in d_alloc_parallel()

- the winner puts the new dentry into the in-lookup hash
  + here d_unhashed(dentry) would still return true.
- then the winner process calls ->atomic_open or ->lookup; finally
  d_add() and rehash are called and the dentry is moved to the
  primary hash.
  + here d_unhashed(dentry) would return false.

As soon as the winner calls hlist_bl_unlock(), the loser starts
the d_in_lookup_hash loop and finds the dentry which the winner added.

- the loser (let us call it processB) does the tests
	dentry->d_name.hash != hash
	dentry->d_parent != parent
	d_unhashed(dentry)
- if processA has already called d_add() and rehashed, then this
  d_unhashed() test returns false, and processB throws away its
  own 'new' dentry and returns the found one.
- if processA has NOT called d_add() and rehashed yet (due to
  scheduling), then this d_unhashed() test returns true, and processB
  simply skips the found dentry.
  In this case, processB adds its own 'new' dentry into the in-lookup
  hash and returns it.

Finally, the race between these two:
- d_add() and rehash via ->atomic_open or ->lookup
- the d_unhashed() test in d_alloc_parallel()
leads to duplicated dentries (dentries with the same name under the
same parent).

Do you think it can happen?


J. R. Okajima


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-19  5:24   ` J. R. Okajima
@ 2016-06-19 16:55     ` Al Viro
  2016-06-20  4:34       ` J. R. Okajima
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2016-06-19 16:55 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: linux-fsdevel

On Sun, Jun 19, 2016 at 02:24:44PM +0900, J. R. Okajima wrote:
> - two processes try opening the same file
> - both enter the hlist_bl_lock-protected loop in d_alloc_parallel()
> 
> - the winner puts the new dentry into the in-lookup hash
>   + here d_unhashed(dentry) would still return true.
> - then the winner process calls ->atomic_open or ->lookup; finally
>   d_add() and rehash are called and the dentry is moved to the
>   primary hash.
>   + here d_unhashed(dentry) would return false.


> As soon as the winner calls hlist_bl_unlock(), the loser starts
> the d_in_lookup_hash loop and finds the dentry which the winner added.
> 
> - the loser (let us call it processB) does the tests
> 	dentry->d_name.hash != hash
> 	dentry->d_parent != parent
> 	d_unhashed(dentry)
> - if processA has already called d_add() and rehashed, then this
>   d_unhashed() test returns false, and processB throws away its
>   own 'new' dentry and returns the found one.
> - if processA has NOT called d_add() and rehashed yet (due to
>   scheduling), then this d_unhashed() test returns true, and processB
>   simply skips the found dentry.
>   In this case, processB adds its own 'new' dentry into the in-lookup
>   hash and returns it.

How would processB get past d_wait_lookup()?  It would have to have
observed !d_in_lookup() while holding ->d_lock on that sucker; it does
*not* drop ->d_lock through the tests you've mentioned.  And both the
removals of in-lookup flag and insertion into primary hash are done under
->d_lock without dropping it in between.

That was the point of taking security_d_instantiate() prior to attaching
to inode, etc. - that way we can make those actions (removal from in-lookup
hash, possibly attaching to inode, inserting into the primary hash) atomic
wrt d_wait_lookup().


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-19 16:55     ` Al Viro
@ 2016-06-20  4:34       ` J. R. Okajima
  2016-06-20  5:35         ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: J. R. Okajima @ 2016-06-20  4:34 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel


Al Viro:
> How would processB get past d_wait_lookup()?  It would have to have

Because of the first d_unhashed() test in the loop, processB doesn't
reach d_wait_lookup().

Here is a more detailed scenario I am considering.
- two processes try opening the same file
- both enter lookup_open() and the hlist_bl_lock-protected loop in
  d_alloc_parallel()

processA (the winner of hlist_bl_lock)
d_alloc_parallel()
+ new = d_alloc()
+ rcu_read_lock()
+ dentry = __d_lookup_rcu()
  NULL
+ hlist_bl_lock()		--- (BL0)
+ test parent's i_dir_seq	--- (DS1)
+ rcu_read_unlock()
+ hlist_bl_for_each_entry(d_u.d_in_lookup_hash)
  not found
+ new->d_flags |= DCACHE_PAR_LOOKUP
+ new->d_wait = wq
+ hlist_bl_add_head_rcu(new->d_u.d_in_lookup_hash)
+ hlist_bl_unlock()		--- (BL1)
+ return new
--> a new dentry is added into in-lookup hash.

And then
processA
->atomic_open or ->lookup
+ __d_add()
  + spin_lock(d_lock)
  + update parent's i_dir_seq	--- (DS2)
  + __d_lookup_done()
    + hlist_bl_lock()		--- (BL2)
    + dentry->d_flags &= ~DCACHE_PAR_LOOKUP
    + __hlist_bl_del(d_u.d_in_lookup_hash)
    + hlist_bl_unlock()		--- (BL3)
  + _d_rehash()
    + hlist_bl_lock()		--- (BL4)
    + hlist_bl_add_head_rcu(d_hash)
    + hlist_bl_unlock()
  + spin_unlock(d_lock)
--> the new dentry is moved from in-lookup to primary hash.


Between (BL1) and (BL2) in processA's flow, processB may acquire the
hlist_bl_lock() it was blocked on at (BL0), and that may happen even
before processA reaches (DS2).
In this case, processB will return an unexpectedly duplicated dentry,
because the DS1 test passes and the loop body is skipped by the
d_unhashed() test before d_wait_lookup() is ever reached.
d_unhashed() only turns false after (BL4).

processB
d_alloc_parallel()
+ hlist_bl_lock()
+ test parent's i_dir_seq
  passes if processA hasn't reached (DS2) yet
+ rcu_read_unlock()
+ hlist_bl_for_each_entry(d_u.d_in_lookup_hash)
  the dentry is found but skipped, because of
	if (d_unhashed(dentry))
		continue;
  before d_wait_lookup() is reached
+ new->d_flags |= DCACHE_PAR_LOOKUP
+ new->d_wait = wq
+ hlist_bl_add_head_rcu(new->d_u.d_in_lookup_hash)
+ hlist_bl_unlock()
+ return new
--> another dentry with the same name is added into the in-lookup hash.


On the other hand, if processB acquires hlist_bl_lock() between
(BL3) and (BL4) in processA's flow, processB will detect that the
parent's i_dir_seq has been modified and 'goto retry'. That is fine.

Finally, the race condition I am afraid of is:
- processB acquires BL0 between processA's BL1 and BL2,
  and
- processB tests DS1 before processA's DS2.


J. R. Okajima


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-20  4:34       ` J. R. Okajima
@ 2016-06-20  5:35         ` Al Viro
  2016-06-20 14:51           ` Al Viro
  2016-06-23  1:19           ` Q. hlist_bl_add_head_rcu() in d_alloc_parallel() J. R. Okajima
  0 siblings, 2 replies; 16+ messages in thread
From: Al Viro @ 2016-06-20  5:35 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: linux-fsdevel

On Mon, Jun 20, 2016 at 01:34:14PM +0900, J. R. Okajima wrote:
> 
> Al Viro:
> > How would processB get past d_wait_lookup()?  It would have to have
> 
> Because of the first d_unhashed() test in the loop, processB doesn't
> reach d_wait_lookup().

Huh?  What first d_unhashed()...  <stares>

That check is definitely bogus and I'm completely at a loss as to WTF it is
doing there.  Thanks for catching that; this kind of idiotic braino can
escape notice when rereading the code again and again, unfortunately ;-/

Fixed, will push to Linus tonight or tomorrow.


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-20  5:35         ` Al Viro
@ 2016-06-20 14:51           ` Al Viro
  2016-06-20 17:14             ` [git pull] vfs fixes Al Viro
  2016-06-23  1:19           ` Q. hlist_bl_add_head_rcu() in d_alloc_parallel() J. R. Okajima
  1 sibling, 1 reply; 16+ messages in thread
From: Al Viro @ 2016-06-20 14:51 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: linux-fsdevel, Linus Torvalds

On Mon, Jun 20, 2016 at 06:35:30AM +0100, Al Viro wrote:
> On Mon, Jun 20, 2016 at 01:34:14PM +0900, J. R. Okajima wrote:
> > 
> > Al Viro:
> > > How would processB get past d_wait_lookup()?  It would have to have
> > 
> > Because of the first d_unhashed() test in the loop, processB doesn't
> > reach d_wait_lookup().
> 
> Huh?  What first d_unhashed()...  <stares>
> 
> That check is definitely bogus and I'm completely at a loss as to WTF it is
> doing there.  Thanks for catching that; this kind of idiotic braino can
> escape notice when rereading the code again and again, unfortunately ;-/
> 
> Fixed, will push to Linus tonight or tomorrow.

FWIW, I understand how it got there; it was garbage from a cut'n'paste from
the lockless primary hash lookups (the cut'n'paste was for the sake of the
"compare the name" logics).  It was absolutely wrong - a dentry is never added
to the primary hash until it has been removed from the in-lookup one.  And we
are walking the in-lookup hash chain with its bitlock held, so there's no
chance of that.

In effect that junk prevented d_alloc_parallel() from *ever* spotting
in-lookup matches.  What's more, removing it has instantly uncovered
another bug in the match-handling code - dget() done under the chain
bitlock, which nests inside ->d_lock.  Trivially fixed, of course (we
just hold rcu_read_lock() through the in-lookup hash search and instead
of dget() while holding the chain bitlock do lockref_get_not_dead()
after dropping the bitlock), but...  *ouch*

It's going through the local tests right now; seems to be OK so far; I'll
send a pull request once it's through those.  But this demonstrates why RTFS
(and by somebody other than the author of TFS being R) is really, _really_
important.  I have read through that loop many times and kept missing that
turdlet ;-/

					Al, wearing a brown paperbag ;-/


* [git pull] vfs fixes
  2016-06-20 14:51           ` Al Viro
@ 2016-06-20 17:14             ` Al Viro
  0 siblings, 0 replies; 16+ messages in thread
From: Al Viro @ 2016-06-20 17:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

A couple more of d_walk()/d_subdirs reordering fixes (stable fodder; ought to
solve that crap for good) and a fix for a brown paperbag bug in
d_alloc_parallel() (this cycle).

The following changes since commit 1607f09c226d1378439c411baaaa020042750338:

  coredump: fix dumping through pipes (2016-06-07 22:07:09 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

for you to fetch changes up to e7d6ef9790bc281f5c29d0132b68031248523fe8:

  fix idiotic braino in d_alloc_parallel() (2016-06-20 10:07:42 -0400)

----------------------------------------------------------------
Al Viro (3):
      much milder d_walk() race
      autofs races
      fix idiotic braino in d_alloc_parallel()

 fs/autofs4/autofs_i.h  |  8 ++++--
 fs/autofs4/expire.c    | 27 ++++++------------
 fs/autofs4/root.c      |  2 +-
 fs/dcache.c            | 75 ++++++++++++++++++++++++++++++++++++++++++--------
 fs/internal.h          |  1 +
 fs/libfs.c             |  4 +--
 include/linux/dcache.h |  1 +
 7 files changed, 82 insertions(+), 36 deletions(-)


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-20  5:35         ` Al Viro
  2016-06-20 14:51           ` Al Viro
@ 2016-06-23  1:19           ` J. R. Okajima
  2016-06-23  2:58             ` Al Viro
  1 sibling, 1 reply; 16+ messages in thread
From: J. R. Okajima @ 2016-06-23  1:19 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel


Al Viro:
> That check is definitely bogus and I'm completely at a loss as to WTF it is
> doing there.  Thanks for catching that; this kind of idiotic braino can
> escape notice when rereading the code again and again, unfortunately ;-/
>
> Fixed, will push to Linus tonight or tomorrow.

Thank you too for fixing it.
I've confirmed the patch was merged, and it passes my local tests
too. Happy.
I have another, relatively less important suggestion. Since the two
groups of tests in the loop are very similar, how about extracting
them into a single static function?
Do you think it would cause a performance regression?


J. R. Okajima


diff --git a/fs/dcache.c b/fs/dcache.c
index 76afffd..3a9ebbc 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2462,14 +2462,42 @@ static void d_wait_lookup(struct dentry *dentry)
 	}
 }
 
+/* tests for d_alloc_parallel() */
+static bool dap_test(struct dentry *dentry, const struct qstr *name,
+		     struct dentry *parent, bool do_unhashed_test)
+{
+	bool ret;
+
+	ret = false;
+	if (dentry->d_name.hash != name->hash)
+		goto out;
+	if (dentry->d_parent != parent)
+		goto out;
+	if (do_unhashed_test && d_unhashed(dentry))
+		goto out;
+	if (parent->d_flags & DCACHE_OP_COMPARE) {
+		int tlen = dentry->d_name.len;
+		const char *tname = dentry->d_name.name;
+		if (parent->d_op->d_compare(parent, dentry, tlen, tname, name))
+			goto out;
+	} else {
+		unsigned int len = name->len;
+		if (dentry->d_name.len != len)
+			goto out;
+		if (dentry_cmp(dentry, name->name, len))
+			goto out;
+	}
+	ret = true;
+
+out:
+	return ret;
+}
+
 struct dentry *d_alloc_parallel(struct dentry *parent,
 				const struct qstr *name,
 				wait_queue_head_t *wq)
 {
-	unsigned int len = name->len;
-	unsigned int hash = name->hash;
-	const unsigned char *str = name->name;
-	struct hlist_bl_head *b = in_lookup_hash(parent, hash);
+	struct hlist_bl_head *b = in_lookup_hash(parent, name->hash);
 	struct hlist_bl_node *node;
 	struct dentry *new = d_alloc(parent, name);
 	struct dentry *dentry;
@@ -2515,21 +2543,8 @@ retry:
 	 * we encounter.
 	 */
 	hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
-		if (dentry->d_name.hash != hash)
-			continue;
-		if (dentry->d_parent != parent)
+		if (!dap_test(dentry, name, parent, /*do_unhashed_test*/false))
 			continue;
-		if (parent->d_flags & DCACHE_OP_COMPARE) {
-			int tlen = dentry->d_name.len;
-			const char *tname = dentry->d_name.name;
-			if (parent->d_op->d_compare(parent, dentry, tlen, tname, name))
-				continue;
-		} else {
-			if (dentry->d_name.len != len)
-				continue;
-			if (dentry_cmp(dentry, str, len))
-				continue;
-		}
 		hlist_bl_unlock(b);
 		/* now we can try to grab a reference */
 		if (!lockref_get_not_dead(&dentry->d_lockref)) {
@@ -2550,23 +2565,9 @@ retry:
 		 * d_lookup() would've found anyway.  If it is, just return it;
 		 * otherwise we really have to repeat the whole thing.
 		 */
-		if (unlikely(dentry->d_name.hash != hash))
-			goto mismatch;
-		if (unlikely(dentry->d_parent != parent))
+		if (unlikely(!dap_test(dentry, name, parent,
+				       /*do_unhashed_test*/true)))
 			goto mismatch;
-		if (unlikely(d_unhashed(dentry)))
-			goto mismatch;
-		if (parent->d_flags & DCACHE_OP_COMPARE) {
-			int tlen = dentry->d_name.len;
-			const char *tname = dentry->d_name.name;
-			if (parent->d_op->d_compare(parent, dentry, tlen, tname, name))
-				goto mismatch;
-		} else {
-			if (unlikely(dentry->d_name.len != len))
-				goto mismatch;
-			if (unlikely(dentry_cmp(dentry, str, len)))
-				goto mismatch;
-		}
 		/* OK, it *is* a hashed match; return it */
 		spin_unlock(&dentry->d_lock);
 		dput(new);


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-23  1:19           ` Q. hlist_bl_add_head_rcu() in d_alloc_parallel() J. R. Okajima
@ 2016-06-23  2:58             ` Al Viro
  2016-06-24  5:57               ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2016-06-23  2:58 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: linux-fsdevel, Linus Torvalds, George Spelvin

[Linus and George Cc'd since it's close to the area affected by hash
rework and friends]

On Thu, Jun 23, 2016 at 10:19:16AM +0900, J. R. Okajima wrote:
> 
> Al Viro:
> > That check is definitely bogus and I'm completely at a loss as to WTF it is
> > doing there.  Thanks for catching that; this kind of idiotic braino can
> > escape notice when rereading the code again and again, unfortunately ;-/
> >
> > Fixed, will push to Linus tonight or tomorrow.
> 
> Thank you too for fixing it.
> I've confirmed the patch was merged, and it passes my local tests
> too. Happy.
> I have another, relatively less important suggestion. Since the two
> groups of tests in the loop are very similar, how about extracting
> them into a single static function?
> Do you think it would cause a performance regression?

We do have too many duplicates, especially if you count the related but not
identical ones as well.

1) __d_lookup():
	check hash
	grab d_lock to stabilize the rest
	check parent
	check if it's hashed
	check name/length
2) d_alloc_parallel(), search in in-lookup hash:
	hash/parent/name stable for everything in in-lookup hash, need no locks
	check hash
	check parent
	check name/length
3) d_alloc_parallel(), check after waiting:
	d_lock already held
	check hash
	check parent
	check if it's hashed
	check name/length
4) d_exact_alias():
	check hash
	check parent
	check name/length (simplified since at the moment it's only used for
filesystems that do not have ->d_compare()).

FWIW, now that I look at d_exact_alias()... the comment in there needs
an update; the code is correct, but only because we don't call
that thing for directories.  Which means that d_splice_alias()-induced
moves do not affect us, leaving only d_move(), which is always called
with parent locked exclusive.

Hell knows...  Order of checks can be delicate; out of those cases, (1) and (2)
are on the hottest paths.  We certainly can do this:

static always_inline bool d_same_name(const struct dentry *dentry,
				      const struct dentry *parent,
				      const struct qstr *name)
{
	if (parent->d_flags & DCACHE_OP_COMPARE) {
		int tlen = dentry->d_name.len;
		const char *tname = dentry->d_name.name;
		if (parent->d_op->d_compare(parent, dentry, tlen, tname, name))
			return false;
	} else {
		if (dentry->d_name.len != name->len)
			return false;
		if (dentry_cmp(dentry, name->name, name->len))
			return false;
			return false;
	}
	return true;
}

then switch all four to this.  That would reduce the amount of boilerplate;
I would hesitate to replace always_inline with inline, since we don't want
(1) and (2) to become uninlined, but maybe even that doesn't matter all that
much - most of the rejects will happen without looking at the names.  It really
needs profiling...


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-23  2:58             ` Al Viro
@ 2016-06-24  5:57               ` Linus Torvalds
  2016-06-25 22:54                 ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2016-06-24  5:57 UTC (permalink / raw)
  To: Al Viro; +Cc: J. R. Okajima, linux-fsdevel, George Spelvin

On Wed, Jun 22, 2016 at 7:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Hell knows...  Order of checks can be delicate; out of those cases, (1) and (2)
> are on the hottest paths.

Actually, as long as we leave the actual RCU paths alone, none of the
lookup paths are all that hot in any profile I've seen recently.

So I'm certainly ok with using that d_same_name() helper for the four
cases you mention (__d_lookup, d_alloc_parallel*2, d_exact_alias).

In fact, if you add a "unlikely()" for the DCACHE_OP_COMPARE test, you
might even improve on code generation.

The non-RCU case basically never shows up in profiles (it used to,
with symlinks, but you fixed that case), and if it ever does I suspect
that the fix will be to make sure we don't fall out of rcu.

So don't worry too much about __d_lookup() being a hot case, I don't
think it is.

(Of course, if you have a load that shows me wrong, let's look at it
by all means. Maybe the loads I have used have been bad)

               Linus


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-24  5:57               ` Linus Torvalds
@ 2016-06-25 22:54                 ` Al Viro
  2016-06-26  1:25                   ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2016-06-25 22:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: J. R. Okajima, linux-fsdevel, George Spelvin

On Thu, Jun 23, 2016 at 10:57:09PM -0700, Linus Torvalds wrote:

> The non-RCU case basically never shows up in profiles (it used to,
> with symlinks, but you fixed that case), and if it ever does I suspect
> that the fix will be to make sure we don't fall out of rcu.
> 
> So don't worry too much about __d_lookup() being a hot case, I don't
> think it is.
> 
> (Of course, if you have a load that shows me wrong, let's look at it
> by all means. Maybe the loads I have used have been bad)

BTW, speaking of that area - is there any reason why dentry_cmp()
isn't simply
	return dentry_string_cmp(lockless_dereference(dentry->d_name.name),
				 ct, tcount);
other than "it predates lockless_dereference()"?

	The only difference is s/ACCESS_ONCE/READ_ONCE/, AFAICS -
lockless_dereference() uses the latter these days.  And that shouldn't
be a problem; as a matter of fact, *all* remaining ACCESS_ONCE in
fs/dcache.c look like they should become READ_ONCE.  Objections?

Al, digging through the barrier-related issues with dentries and peeling the
paint off the walls in the process...


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-25 22:54                 ` Al Viro
@ 2016-06-26  1:25                   ` Linus Torvalds
  2016-06-29  8:17                     ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2016-06-26  1:25 UTC (permalink / raw)
  To: Al Viro; +Cc: J. R. Okajima, linux-fsdevel, George Spelvin

On Sat, Jun 25, 2016 at 3:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> BTW, speaking of that area - is there any reason why dentry_cmp()
> isn't simply
>         return dentry_string_cmp(lockless_dereference(dentry->d_name.name),
>                                  ct, tcount);
> other than "it predates lockless_dereference()"?

No, the only important thing is that it's a read-once, and has the
smp_read_barrier_depends().

So lockless_dereference() looks like the right thing to do; it is just
that - as you suspected - the code predates that concept.

               Linus


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-26  1:25                   ` Linus Torvalds
@ 2016-06-29  8:17                     ` Al Viro
  2016-06-29  9:22                       ` Hekuang
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2016-06-29  8:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, He Kuang

On Sat, Jun 25, 2016 at 06:25:58PM -0700, Linus Torvalds wrote:
> On Sat, Jun 25, 2016 at 3:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > BTW, speaking of that area - is there any reason why dentry_cmp()
> > isn't simply
> >         return dentry_string_cmp(lockless_dereference(dentry->d_name.name),
> >                                  ct, tcount);
> > other than "it predates lockless_dereference()"?
> 
> No, the only important thing is that it's a read-once, and has the
> smp_read_barrier_depends().
> 
> So lockless_dereference() looks like the right thing to do; it is just
> that - as you suspected - the code predates that concept.

... and as a matter of fact, a patch to that effect (only cosmetically
different from what I'd cooked) had been posted by He Kuang back in March.
At least I'd spotted that before pushing mine out...

The patch posted back then has been applied, with a slightly edited commit
message.  Apologies for missing it...


* Re: Q. hlist_bl_add_head_rcu() in d_alloc_parallel()
  2016-06-29  8:17                     ` Al Viro
@ 2016-06-29  9:22                       ` Hekuang
  0 siblings, 0 replies; 16+ messages in thread
From: Hekuang @ 2016-06-29  9:22 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds; +Cc: linux-fsdevel


  On 2016/6/29 16:17, Al Viro wrote:
> On Sat, Jun 25, 2016 at 06:25:58PM -0700, Linus Torvalds wrote:
>> On Sat, Jun 25, 2016 at 3:54 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>> BTW, speaking of that area - is there any reason why dentry_cmp()
>>> isn't simply
>>>          return dentry_string_cmp(lockless_dereference(dentry->d_name.name),
>>>                                   ct, tcount);
>>> other than "it predates lockless_dereference()"?
>> No, the only important thing is that it's a read-once, and has the
>> smp_read_barrier_depends().
>>
>> So lockless_derefence() looks like the right thing to do, it is just
>> that - as you suspected - the code predates that concept.
> ... and as a matter of fact, a patch to that effect (only cosmetically
> different from what I'd cooked) had been posted by He Kuang back in March.
> At least I'd spotted that before pushing mine out...
>
> The patch posted back then has been applied, with a slightly edited commit
> message.  Apologies for missing it...
It's okay, and I really appreciate you remembering; I would have forgotten
that patch if you hadn't mentioned it. :-)

Thank you.




