All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] fs/dcache: avoid trylock loops
@ 2018-02-16 15:09 John Ogness
  2018-02-16 15:09 ` [PATCH 1/4] fs/dcache: Remove stale comment from dentry_kill() John Ogness
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: John Ogness @ 2018-02-16 15:09 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Al Viro, Linus Torvalds, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior, linux-kernel

This patchset removes all trylock loops from within the dcache code.

Both d_delete() and dentry_kill() hold dentry->d_lock while needing
to acquire dentry->d_inode->i_lock. dentry_kill() also needs to
acquire dentry->d_parent->d_lock. This cannot be done directly with
spin_lock() operations because it is the reverse of the regular lock
order. To avoid ABBA deadlocks it is done using trylock loops.

Trylock loops are problematic in two scenarios:

  1) PREEMPT_RT converts spinlocks to 'sleeping' spinlocks, which are
     preemptible. As a consequence the i_lock holder can be preempted
     by a higher priority task. If that task executes the trylock
     loop it will do so forever and live lock.

  2) In virtual machines trylock loops are problematic as well. The
     VCPU on which the i_lock holder runs can be scheduled out and a
     task on a different VCPU can loop for a whole time slice. In the
     worst case this can lead to starvation. Commits 47be61845c77
     ("fs/dcache.c: avoid soft-lockup in dput()") and 046b961b45f9
     ("shrink_dentry_list(): take parent's d_lock earlier") are
     addressing exactly those symptoms.

The trylock loops can be avoided with functionality similar to
lock_parent(); temporarily dropping dentry->d_lock while holding
the rcu_read_lock so that the desired locks can be acquired in the
correct order. After the locks are acquired it is necessary to verify
that the relevant dentry members did not change while dentry->d_lock
was dropped. If they changed, the whole operation must be repeated.

John Ogness (4):
  fs/dcache: Remove stale comment from dentry_kill()
  fs/dcache: Move dentry_kill() below lock_parent()
  fs/dcache: Avoid the try_lock loop in d_delete()
  fs/dcache: Avoid the try_lock loops in dentry_kill()

 fs/dcache.c | 179 ++++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 139 insertions(+), 40 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] fs/dcache: Remove stale comment from dentry_kill()
  2018-02-16 15:09 [PATCH 0/4] fs/dcache: avoid trylock loops John Ogness
@ 2018-02-16 15:09 ` John Ogness
  2018-02-16 15:09 ` [PATCH 2/4] fs/dcache: Move dentry_kill() below lock_parent() John Ogness
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: John Ogness @ 2018-02-16 15:09 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Al Viro, Linus Torvalds, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior, linux-kernel

Commit 0d98439ea3c6 ("vfs: use lockred "dead" flag to mark unrecoverably
dead dentries") removed the `ref' parameter in dentry_kill() but its
documentation remained. Remove it.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 fs/dcache.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 7c38f39958bc..3c7e0148b453 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -592,7 +592,6 @@ static void __dentry_kill(struct dentry *dentry)
 /*
  * Finish off a dentry we've decided to kill.
  * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
  * Returns dentry requiring refcount drop, or NULL if we're done.
  */
 static struct dentry *dentry_kill(struct dentry *dentry)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/4] fs/dcache: Move dentry_kill() below lock_parent()
  2018-02-16 15:09 [PATCH 0/4] fs/dcache: avoid trylock loops John Ogness
  2018-02-16 15:09 ` [PATCH 1/4] fs/dcache: Remove stale comment from dentry_kill() John Ogness
@ 2018-02-16 15:09 ` John Ogness
  2018-02-16 15:09 ` [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete() John Ogness
  2018-02-16 15:09 ` [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill() John Ogness
  3 siblings, 0 replies; 22+ messages in thread
From: John Ogness @ 2018-02-16 15:09 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Al Viro, Linus Torvalds, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior, linux-kernel

A subsequent patch will modify dentry_kill() to call lock_parent().
Move the dentry_kill() implementation "as is" below lock_parent()
first. This will help simplify the review of the subsequent patch
with dentry_kill() changes.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 fs/dcache.c | 62 ++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 3c7e0148b453..9fed398687c9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -589,37 +589,6 @@ static void __dentry_kill(struct dentry *dentry)
 		dentry_free(dentry);
 }
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *dentry_kill(struct dentry *dentry)
-	__releases(dentry->d_lock)
-{
-	struct inode *inode = dentry->d_inode;
-	struct dentry *parent = NULL;
-
-	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
-		goto failed;
-
-	if (!IS_ROOT(dentry)) {
-		parent = dentry->d_parent;
-		if (unlikely(!spin_trylock(&parent->d_lock))) {
-			if (inode)
-				spin_unlock(&inode->i_lock);
-			goto failed;
-		}
-	}
-
-	__dentry_kill(dentry);
-	return parent;
-
-failed:
-	spin_unlock(&dentry->d_lock);
-	return dentry; /* try again with same dentry */
-}
-
 static inline struct dentry *lock_parent(struct dentry *dentry)
 {
 	struct dentry *parent = dentry->d_parent;
@@ -655,6 +624,37 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
 }
 
 /*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *dentry_kill(struct dentry *dentry)
+	__releases(dentry->d_lock)
+{
+	struct inode *inode = dentry->d_inode;
+	struct dentry *parent = NULL;
+
+	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+		goto failed;
+
+	if (!IS_ROOT(dentry)) {
+		parent = dentry->d_parent;
+		if (unlikely(!spin_trylock(&parent->d_lock))) {
+			if (inode)
+				spin_unlock(&inode->i_lock);
+			goto failed;
+		}
+	}
+
+	__dentry_kill(dentry);
+	return parent;
+
+failed:
+	spin_unlock(&dentry->d_lock);
+	return dentry; /* try again with same dentry */
+}
+
+/*
  * Try to do a lockless dput(), and return whether that was successful.
  *
  * If unsuccessful, we return false, having already taken the dentry lock.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete()
  2018-02-16 15:09 [PATCH 0/4] fs/dcache: avoid trylock loops John Ogness
  2018-02-16 15:09 ` [PATCH 1/4] fs/dcache: Remove stale comment from dentry_kill() John Ogness
  2018-02-16 15:09 ` [PATCH 2/4] fs/dcache: Move dentry_kill() below lock_parent() John Ogness
@ 2018-02-16 15:09 ` John Ogness
  2018-02-16 17:10   ` Peter Zijlstra
                     ` (2 more replies)
  2018-02-16 15:09 ` [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill() John Ogness
  3 siblings, 3 replies; 22+ messages in thread
From: John Ogness @ 2018-02-16 15:09 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Al Viro, Linus Torvalds, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior, linux-kernel

d_delete() holds dentry->d_lock and needs to acquire
dentry->d_inode->i_lock. This cannot be done with a spin_lock()
operation because it's the reverse of the regular lock order. To avoid
the ABBA deadlock it is done with a trylock loop.

Trylock loops are problematic in two scenarios:

  1) PREEMPT_RT converts spinlocks to 'sleeping' spinlocks, which are
     preemptible. As a consequence the i_lock holder can be preempted
     by a higher priority task. If that task executes the trylock loop
     it will do so forever and live lock.

  2) In virtual machines trylock loops are problematic as well. The
     VCPU on which the i_lock holder runs can be scheduled out and a
     task on a different VCPU can loop for a whole time slice. In the
     worst case this can lead to starvation. Commits 47be61845c77
     ("fs/dcache.c: avoid soft-lockup in dput()") and 046b961b45f9
     ("shrink_dentry_list(): take parent's d_lock earlier") are
     addressing exactly those symptoms.

The trylock loop can be avoided with functionality similar to
lock_parent(). The fast path tries the trylock first, which is likely
to succeed. In the contended case it attempts locking in the correct
order. This requires to drop dentry->d_lock first, which allows
another task to free d_inode. This can be prevented by the following
mechanism:

   inode = dentry->d_inode;
   rcu_read_lock();         <- Protects d_inode from being freed,
                               i.e. dentry->d_inode is a valid pointer
                               even after dentry->d_lock is dropped
   unlock(dentry->d_lock);
   lock(inode->i_lock);
   lock(dentry->d_lock);
   rcu_read_unlock();

After the locks are acquired it's necessary to verify whether
dentry->d_inode is still pointing to inode as it might have been
changed after dropping dentry->d_lock. If it matches d_delete() can
proceed, if not the whole operation has to be repeated.

Implement this in a new function dentry_lock_inode() which will be
used in a subsequent patch as well.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 fs/dcache.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 61 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 9fed398687c9..2cd252f88c5d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -623,6 +623,48 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
 	return parent;
 }
 
+/**
+ * dentry_lock_inode - Lock dentry->d_inode->i_lock
+ * @dentry:	The dentry to operate on
+ *
+ * Tries to acquire @dentry->d_inode->i_lock with a trylock first. If that
+ * fails it retries in correct lock order, which requires dropping
+ * @dentry->d_lock under RCU protection and then reacquiring it after
+ * locking @dentry->d_inode->i_lock.
+ *
+ * After both locks are acquired it must be verified that @dentry->d_inode
+ * did not change while @dentry->d_lock was dropped. If it's unchanged
+ * return true, otherwise drop @dentry->d_inode->i_lock and return false.
+ *
+ * Note, that even if @dentry->d_inode is unchanged, all other relevant struct
+ * members of @dentry must be reevaluated by the caller.
+ */
+static bool dentry_lock_inode(struct dentry *dentry)
+{
+	struct inode *inode = dentry->d_inode;
+
+	lockdep_assert_held(&dentry->d_lock);
+
+	if (unlikely(!spin_trylock(&inode->i_lock))) {
+		rcu_read_lock();
+		spin_unlock(&dentry->d_lock);
+		spin_lock(&inode->i_lock);
+		spin_lock(&dentry->d_lock);
+		rcu_read_unlock();
+
+		/*
+		 * @dentry->d_inode might have changed after dropping
+		 * @dentry->d_lock. If so, release @inode->i_lock and
+		 * signal the caller to restart the operation.
+		 */
+		if (unlikely(inode != dentry->d_inode)) {
+			spin_unlock(&inode->i_lock);
+			return false;
+		}
+	}
+	return true;
+}
+
 /*
  * Finish off a dentry we've decided to kill.
  * dentry->d_lock must be held, returns with it unlocked.
@@ -2378,22 +2420,36 @@ void d_delete(struct dentry * dentry)
 	/*
 	 * Are we the only user?
 	 */
-again:
 	spin_lock(&dentry->d_lock);
+again:
 	inode = dentry->d_inode;
 	isdir = S_ISDIR(inode->i_mode);
 	if (dentry->d_lockref.count == 1) {
-		if (!spin_trylock(&inode->i_lock)) {
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
+		/*
+		 * Lock the inode. Might drop dentry->d_lock temporarily
+		 * which allows inode to change. Start over if that happens.
+		 */
+		if (!dentry_lock_inode(dentry))
 			goto again;
+
+		/*
+		 * Recheck refcount as it might have been incremented while
+		 * d_lock was dropped.
+		 */
+		if (dentry->d_lockref.count != 1) {
+			spin_unlock(&inode->i_lock);
+			goto drop;
 		}
+		/*
+		 * isdir is not reloaded because it is not possible that it
+		 * changes on the same inode.
+		 */
 		dentry->d_flags &= ~DCACHE_CANT_MOUNT;
 		dentry_unlink_inode(dentry);
 		fsnotify_nameremove(dentry, isdir);
 		return;
 	}
-
+drop:
 	if (!d_unhashed(dentry))
 		__d_drop(dentry);
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-16 15:09 [PATCH 0/4] fs/dcache: avoid trylock loops John Ogness
                   ` (2 preceding siblings ...)
  2018-02-16 15:09 ` [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete() John Ogness
@ 2018-02-16 15:09 ` John Ogness
  2018-02-16 18:03   ` Linus Torvalds
  3 siblings, 1 reply; 22+ messages in thread
From: John Ogness @ 2018-02-16 15:09 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Al Viro, Linus Torvalds, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior, linux-kernel

dentry_kill() holds dentry->d_lock and needs to acquire both
dentry->d_inode->i_lock and dentry->d_parent->d_lock. This cannot be
done with spin_lock() operations because it's the reverse of the
regular lock order. To avoid ABBA deadlocks it is done with two
trylock loops.

Trylock loops are problematic in two scenarios:

  1) PREEMPT_RT converts spinlocks to 'sleeping' spinlocks, which are
     preemptible. As a consequence the i_lock holder can be preempted
     by a higher priority task. If that task executes the trylock loop
     it will do so forever and live lock.

  2) In virtual machines trylock loops are problematic as well. The
     VCPU on which the i_lock holder runs can be scheduled out and a
     task on a different VCPU can loop for a whole time slice. In the
     worst case this can lead to starvation. Commits 47be61845c77
     ("fs/dcache.c: avoid soft-lockup in dput()") and 046b961b45f9
     ("shrink_dentry_list(): take parent's d_lock earlier") are
     addressing exactly those symptoms.

Avoid the trylock loops by using dentry_lock_inode() and lock_parent()
which take the locks in the appropriate order. As both functions drop
dentry->lock briefly, this requires rechecking of the dentry content
as it might have changed after dropping the lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 fs/dcache.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 16 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 2cd252f88c5d..b557679c14c9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -673,27 +673,73 @@ static bool dentry_lock_inode(struct dentry *dentry)
 static struct dentry *dentry_kill(struct dentry *dentry)
 	__releases(dentry->d_lock)
 {
-	struct inode *inode = dentry->d_inode;
-	struct dentry *parent = NULL;
+	struct dentry *parent;
+	struct inode *inode;
+
+again:
+	parent = NULL;
+	inode = dentry->d_inode;
+	if (inode) {
+		/*
+		 * Lock the inode. Might drop dentry->d_lock temporarily
+		 * which allows inode to change. Start over if that happens.
+		 */
+		if (!dentry_lock_inode(dentry))
+			goto again;
 
-	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
-		goto failed;
+		/*
+		 * Recheck refcount as it might have been incremented while
+		 * d_lock was dropped.
+		 */
+		if (unlikely(dentry->d_lockref.count != 1))
+			goto drop_ref;
+	}
 
-	if (!IS_ROOT(dentry)) {
-		parent = dentry->d_parent;
-		if (unlikely(!spin_trylock(&parent->d_lock))) {
-			if (inode)
-				spin_unlock(&inode->i_lock);
-			goto failed;
-		}
+	parent = lock_parent(dentry);
+
+	/*
+	 * Check refcount because it might have changed
+	 * while d_lock was dropped.
+	 */
+	if (unlikely(dentry->d_lockref.count != 1))
+		goto drop_ref;
+
+	/*
+	 * The inode may have changed in the window where the
+	 * dentry was unlocked. If that happens it is necessary
+	 * to start over because the wrong inode lock is held.
+	 */
+	if (unlikely(inode != dentry->d_inode)) {
+		if (parent)
+			spin_unlock(&parent->d_lock);
+		if (inode)
+			spin_unlock(&inode->i_lock);
+		goto again;
 	}
 
 	__dentry_kill(dentry);
 	return parent;
 
-failed:
+drop_ref:
+	/*
+	 * If refcount is > 1 it was incremented while dentry->d_lock was
+	 * dropped. Just decrement the refcount, unlock and tell the caller
+	 * to stop the directory walk.
+	 *
+	 * For paranoia reasons check whether the refcount is < 1. If so,
+	 * report the detection and avoid the decrement which would just
+	 * cause a problem in some other place. The warning might be
+	 * helpful to decode the root cause of the refcounting bug.
+	 */
+	if (!WARN_ON(dentry->d_lockref.count < 1))
+		dentry->d_lockref.count--;
+
 	spin_unlock(&dentry->d_lock);
-	return dentry; /* try again with same dentry */
+	if (parent)
+		spin_unlock(&parent->d_lock);
+	if (inode)
+		spin_unlock(&inode->i_lock);
+	return NULL;
 }
 
 /*
@@ -865,10 +911,8 @@ void dput(struct dentry *dentry)
 
 kill_it:
 	dentry = dentry_kill(dentry);
-	if (dentry) {
-		cond_resched();
+	if (dentry)
 		goto repeat;
-	}
 }
 EXPORT_SYMBOL(dput);
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete()
  2018-02-16 15:09 ` [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete() John Ogness
@ 2018-02-16 17:10   ` Peter Zijlstra
  2018-02-16 17:30   ` Peter Zijlstra
  2018-02-22  5:18   ` Al Viro
  2 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2018-02-16 17:10 UTC (permalink / raw)
  To: John Ogness
  Cc: linux-fsdevel, Al Viro, Linus Torvalds, Christoph Hellwig,
	Thomas Gleixner, Sebastian Andrzej Siewior, linux-kernel

On Fri, Feb 16, 2018 at 04:09:32PM +0100, John Ogness wrote:
> +static bool dentry_lock_inode(struct dentry *dentry)
> +{
> +	struct inode *inode = dentry->d_inode;
> +
> +	lockdep_assert_held(&dentry->d_lock);
> +
> +	if (unlikely(!spin_trylock(&inode->i_lock))) {

	if (likely(spin_trylock(&inode->i_lock)))
		return true;

and then unindent by 1 stop the below code:

> +		rcu_read_lock();
> +		spin_unlock(&dentry->d_lock);
> +		spin_lock(&inode->i_lock);
> +		spin_lock(&dentry->d_lock);
> +		rcu_read_unlock();
> +
> +		/*
> +		 * @dentry->d_inode might have changed after dropping
> +		 * @dentry->d_lock. If so, release @inode->i_lock and
> +		 * signal the caller to restart the operation.
> +		 */
> +		if (unlikely(inode != dentry->d_inode)) {
> +			spin_unlock(&inode->i_lock);
> +			return false;
> +		}
> +	}
> +	return true;
> +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete()
  2018-02-16 15:09 ` [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete() John Ogness
  2018-02-16 17:10   ` Peter Zijlstra
@ 2018-02-16 17:30   ` Peter Zijlstra
  2018-02-22  5:18   ` Al Viro
  2 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2018-02-16 17:30 UTC (permalink / raw)
  To: John Ogness
  Cc: linux-fsdevel, Al Viro, Linus Torvalds, Christoph Hellwig,
	Thomas Gleixner, Sebastian Andrzej Siewior, linux-kernel,
	Paul McKenney

On Fri, Feb 16, 2018 at 04:09:32PM +0100, John Ogness wrote:
> 
>    inode = dentry->d_inode;
>    rcu_read_lock();         <- Protects d_inode from being freed,
>                                i.e. dentry->d_inode is a valid pointer
>                                even after dentry->d_lock is dropped
>    unlock(dentry->d_lock);
>    lock(inode->i_lock);
>    lock(dentry->d_lock);
>    rcu_read_unlock();

So that is entirely tricky, typically we have to have a lookup _after_
rcu_read_lock().

Here, we rely on not being able to call dentry_free() while we hold
d_lock, which ensure dentry must be valid in the freshly started
rcu-section.

And I suppose that same ensures dentry->d_ionde stays alive. But this
needs a comment at least.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-16 15:09 ` [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill() John Ogness
@ 2018-02-16 18:03   ` Linus Torvalds
  2018-02-16 22:32     ` John Ogness
  2018-02-22  5:40     ` Al Viro
  0 siblings, 2 replies; 22+ messages in thread
From: Linus Torvalds @ 2018-02-16 18:03 UTC (permalink / raw)
  To: John Ogness
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On Fri, Feb 16, 2018 at 7:09 AM, John Ogness <john.ogness@linutronix.de> wrote:
> dentry_kill() holds dentry->d_lock and needs to acquire both
> dentry->d_inode->i_lock and dentry->d_parent->d_lock. This cannot be
> done with spin_lock() operations because it's the reverse of the
> regular lock order. To avoid ABBA deadlocks it is done with two
> trylock loops.
>
> Trylock loops are problematic in two scenarios:

I don't mind this patch series per se (although I would really like Al
to ack it), but this particular patch I hate.

Why?

> Avoid the trylock loops by using dentry_lock_inode() and lock_parent()
> which take the locks in the appropriate order. As both functions drop
> dentry->lock briefly, this requires rechecking of the dentry content
> as it might have changed after dropping the lock.

I think the trylock should be done first, and then you don't need that
recheck for the common case.

I realize that the recheck itself isn't expensive, but it's mostly
about the code flow and the comment:

> +                * Recheck refcount as it might have been incremented while
> +                * d_lock was dropped.

the thing is, 99.9% of the time the d_lock wasn't dropped, so that
"while d_lock was dropped" comment is misleading.

Re-organizing it to do the trylock fastpath explicitly here and not
bothering with the re-check etc crid for the common case is the rioght
thing to do.

And the old code was actually organized exactly that way, with a

        if (inode && unlikely(!spin_trylock(&inode->i_lock)))
                goto failed;

at the top.

But instead of having that unlikely "failed" case do the complex
thing, you made the *normal* case do the complex thing.

So NAK on this.

It should be fairly trivial to fix, and make the "failed" thing do it right.

             Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-16 18:03   ` Linus Torvalds
@ 2018-02-16 22:32     ` John Ogness
  2018-02-16 22:42       ` Linus Torvalds
  2018-02-22  5:40     ` Al Viro
  1 sibling, 1 reply; 22+ messages in thread
From: John Ogness @ 2018-02-16 22:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On 2018-02-16, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Feb 16, 2018 at 7:09 AM, John Ogness <john.ogness@linutronix.de> wrote:
>> dentry_kill() holds dentry->d_lock and needs to acquire both
>> dentry->d_inode->i_lock and dentry->d_parent->d_lock. This cannot be
>> done with spin_lock() operations because it's the reverse of the
>> regular lock order. To avoid ABBA deadlocks it is done with two
>> trylock loops.
>>
>> Trylock loops are problematic in two scenarios:
>
> I don't mind this patch series per se (although I would really like Al
> to ack it), but this particular patch I hate.
>
> Why?
>
>> Avoid the trylock loops by using dentry_lock_inode() and lock_parent()
>> which take the locks in the appropriate order. As both functions drop
>> dentry->lock briefly, this requires rechecking of the dentry content
>> as it might have changed after dropping the lock.
>
> I think the trylock should be done first, and then you don't need that
> recheck for the common case.
>
> I realize that the recheck itself isn't expensive, but it's mostly
> about the code flow and the comment:
>
>> +                * Recheck refcount as it might have been incremented while
>> +                * d_lock was dropped.
>
> the thing is, 99.9% of the time the d_lock wasn't dropped, so that
> "while d_lock was dropped" comment is misleading.
>
> Re-organizing it to do the trylock fastpath explicitly here and not
> bothering with the re-check etc crid for the common case is the rioght
> thing to do.
>
> And the old code was actually organized exactly that way, with a
>
>         if (inode && unlikely(!spin_trylock(&inode->i_lock)))
>                 goto failed;
>
> at the top.
>
> But instead of having that unlikely "failed" case do the complex
> thing, you made the *normal* case do the complex thing.
>
> So NAK on this.

lock_parent() already has the problem you are referring to. Callers are
required to recheck the dentry contents and check the returned parent
because they do not know if the trylock succeeded. See
d_prune_aliases(), for example.

Would you like my v2 to fixup lock_parent() semantics to address your
concerns there as well?

John Ogness

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-16 22:32     ` John Ogness
@ 2018-02-16 22:42       ` Linus Torvalds
  2018-02-16 23:05         ` John Ogness
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2018-02-16 22:42 UTC (permalink / raw)
  To: John Ogness
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On Fri, Feb 16, 2018 at 2:32 PM, John Ogness <john.ogness@linutronix.de> wrote:
>
> lock_parent() already has the problem you are referring to. Callers are
> required to recheck the dentry contents and check the returned parent
> because they do not know if the trylock succeeded. See
> d_prune_aliases(), for example.

What are you talking about?

lock_parent() does the nice "spin_trylock succeeded" special case.

Yes, it will then do the "unlock dentry, do the parent first, then
re-check" too, and callers may need to worry about it.

But that's not what I'm complaining about in your patch. You remove
the simple case, and make dentry_kill() do the "recheck in case I
dropped" every single time.

It's the "turn a simple case into a complex case" that I absolutely detest.

The fact that there are _other_ complex cases doesn't make it any
better. The whole "but Bobby does it too" thing is not a defense.
Would you jump off a bridge just because your friend did it?

               Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-16 22:42       ` Linus Torvalds
@ 2018-02-16 23:05         ` John Ogness
  2018-02-16 23:31           ` Linus Torvalds
  0 siblings, 1 reply; 22+ messages in thread
From: John Ogness @ 2018-02-16 23:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On 2018-02-16, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> lock_parent() already has the problem you are referring to. Callers
>> are required to recheck the dentry contents and check the returned
>> parent because they do not know if the trylock succeeded. See
>> d_prune_aliases(), for example.
>
> What are you talking about?
>
> lock_parent() does the nice "spin_trylock succeeded" special case.
>
> Yes, it will then do the "unlock dentry, do the parent first, then
> re-check" too, and callers may need to worry about it.
>
> But that's not what I'm complaining about in your patch. You remove
> the simple case, and make dentry_kill() do the "recheck in case I
> dropped" every single time.

dentry_lock_inode() uses the same semantics as lock_parent(). The caller
does not know if the trylock succeeded. Any caller using lock_parent()
must "recheck in case I dropped", just as with dentry_lock_inode(). This
is what you have pointed out.

> The fact that there are _other_ complex cases doesn't make it any
> better. The whole "but Bobby does it too" thing is not a defense.
> Would you jump off a bridge just because your friend did it?

dentry_kill() calls both dentry_lock_inode() and lock_parent() in the
common case. So by changing the semantics of lock_parent(), I am
removing two "recheck in case I dropped" in the common case rather than
just the one you pointed out.

John Ogness

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-16 23:05         ` John Ogness
@ 2018-02-16 23:31           ` Linus Torvalds
  2018-02-16 23:49             ` John Ogness
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2018-02-16 23:31 UTC (permalink / raw)
  To: John Ogness
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On Fri, Feb 16, 2018 at 3:05 PM, John Ogness <john.ogness@linutronix.de> wrote:
>
> dentry_kill() calls both dentry_lock_inode() and lock_parent() in the
> common case. So by changing the semantics of lock_parent(), I am
> removing two "recheck in case I dropped" in the common case rather than
> just the one you pointed out.

Ok, that would be lovely, but doesn't that end up being a nasty patch?
You can't just move the trylock into the caller, since then you need
to move all the other stuff too?

Or were you planning on splitting lock_parent() into two, for the
"fast case vs compex case"?

Or maybe I'm entirely missing something and we're miscommunicating.

I'm actually not so much worried about the cost of re-checking (the
real cost tends to be the locked cycle itself) as I am about the code
looking understandable. Your d_delete() patch didn't make me  go "that
looks more complicated", probably partl ybecause of the nice helper
function.

So it may be that my dislike of the "re-check after possibly dropping
the lock" is not really about the re-checking, but about just how it
made that function look much more complicated.

             Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-16 23:31           ` Linus Torvalds
@ 2018-02-16 23:49             ` John Ogness
  2018-02-17  0:06               ` Linus Torvalds
  0 siblings, 1 reply; 22+ messages in thread
From: John Ogness @ 2018-02-16 23:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On 2018-02-17, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> dentry_kill() calls both dentry_lock_inode() and lock_parent() in the
>> common case. So by changing the semantics of lock_parent(), I am
>> removing two "recheck in case I dropped" in the common case rather
>> than just the one you pointed out.
>
> Ok, that would be lovely, but doesn't that end up being a nasty patch?

After reading your initial feedback my idea was to change both
lock_parent() and dentry_lock_inode() to not only communicate _if_ the
lock was successful, but also if d_lock was dropped in the process. (For
example, with a tristate rather than boolean return value.) Then callers
would know if they needed to recheck the dentry contents.

> So it may be that my dislike of the "re-check after possibly dropping
> the lock" is not really about the re-checking, but about just how it
> made that function look much more complicated.

I understand what you are saying and I appreciate the comments. I will
code up some variations for myself and try to pick the one that is the
least complicated for my v2.

John Ogness

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-16 23:49             ` John Ogness
@ 2018-02-17  0:06               ` Linus Torvalds
  2018-02-19 23:34                 ` John Ogness
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2018-02-17  0:06 UTC (permalink / raw)
  To: John Ogness
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On Fri, Feb 16, 2018 at 3:49 PM, John Ogness <john.ogness@linutronix.de> wrote:
>
> After reading your initial feedback my idea was to change both
> lock_parent() and dentry_lock_inode() to not only communicate _if_ the
> lock was successful, but also if d_lock was dropped in the process. (For
> example, with a tristate rather than boolean return value.) Then callers
> would know if they needed to recheck the dentry contents.

So I think that would work well for your dentry_lock_inode() helper,
and might indeed solve my reaction to your dentry_kill() patch.

I suspect it doesn't work for lock_parent(), because that has to also
return the parent itself. So you'd have to add another way to say
"didn't need to drop dentry lock". I suspect it gets ugly real fast.

But yes, making dentry_lock_inode() return 0/1/2 for "fail/fast/slow"
(or whatever) sounds doable. And then your dentry_kill() patch can use
a "switch ()" to handle the cases, and the whole "need to revalidate"
might become pretty clear and clean.

I'd suggest you ignore lock_parent() for now. Unless you come up with
something clever.

        Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-17  0:06               ` Linus Torvalds
@ 2018-02-19 23:34                 ` John Ogness
  2018-02-20  0:42                   ` Linus Torvalds
                                     ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: John Ogness @ 2018-02-19 23:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On 2018-02-17, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> After reading your initial feedback my idea was to change both
>> lock_parent() and dentry_lock_inode() to not only communicate _if_
>> the lock was successful, but also if d_lock was dropped in the
>> process. (For example, with a tristate rather than boolean return
>> value.) Then callers would know if they needed to recheck the dentry
>> contents.
>
> So I think that would work well for your dentry_lock_inode() helper,
> and might indeed solve my reaction to your dentry_kill() patch.

I have looked at different implementations and am not clearly convinced
the way to suggest for my v2 patchset. I am also considering avoiding
the fast case and just changing the misleading comments.

Here are 3 different code snippets from a possible v2 dentry_kill()
implementation:

---->8----
Implementation 1: Stay with the original patch implementation but
fix the comments so that they are not misleading. Supports branch
prediction but code assumes trylock failed. This is the simplest
approach.

                /*
                 * Lock the inode. Might drop dentry->d_lock temporarily
                 * which allows inode to change. Start over if that happens.
                 */
                if (unlikely(!dentry_lock_inode(dentry)))
                        goto again;

                /*
                 * Check refcount because it might have incremented
                 * if d_lock was temporarily dropped.
                 */
                if (unlikely(dentry->d_lockref.count != 1))
                        goto drop_ref;

---->8----
Implementation 2: Using switch on a dentry_lock_inode() that returns a
tristate value. Does not support branch prediction. This approach is
probably easiest to understand.

 		/*
 		 * Lock the inode. Might drop dentry->d_lock temporarily
 		 * which allows inode to change. Start over if that happens.
 		 */
                switch (dentry_lock_inode(dentry)) {
                case LOCK_FAST:
                        break;
                case LOCK_SLOW:
                        /*
                         * Recheck refcount as it might have been
                         * incremented while d_lock was dropped.
                         */
                        if (unlikely(dentry->d_lockref.count != 1))
                                goto drop_ref;
                        break;
                case LOCK_FAILED:
                        goto again;
                }

---->8----
Implementation 3: The same as implementation 2 but using if's to
support branch prediction. This approach is probably the most
complicated to understand but will be the fastest.

 		/*
 		 * Lock the inode. Might drop dentry->d_lock temporarily
 		 * which allows inode to change. Start over if that happens.
 		 */
                int ret = dentry_lock_inode(dentry);
                if (unlikely(ret != LOCK_FAST)) {
                        if (ret == LOCK_FAILED)
                                goto again;
                        /*
                         * Recheck refcount as it might have been
                         * incremented while d_lock was dropped.
                         */
                        if (dentry->d_lockref.count != 1)
                                goto drop_ref;
                }

---->8----

> I suspect it doesn't work for lock_parent(), because that has to also
> return the parent itself. So you'd have to add another way to say
> "didn't need to drop dentry lock". I suspect it gets ugly real fast.

If lock_parent() returns a non-NULL, it is returning
dentry->d_parent. So the return value is really just a boolean and the
locked parent is the parent of the dentry. The function is a little bit
tricky because it could return NULL (lock failed) even if the dentry has
a non-NULL d_parent. So any caller using a tristate return variation of
lock_parent() must rely on the return value instead of a non-NULL
dentry->d_parent.

Taking all that into account, the changes are fairly straight forward.

---->8----
Change lock_parent() to return tristate lock status instead of a pointer
to the locked dentry->d_parent.

diff --git a/fs/dcache.c b/fs/dcache.c
index b557679c14c9..53074243ce48 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -589,15 +589,15 @@ static void __dentry_kill(struct dentry *dentry)
 		dentry_free(dentry);
 }
 
-static inline struct dentry *lock_parent(struct dentry *dentry)
+static inline int lock_parent(struct dentry *dentry)
 {
 	struct dentry *parent = dentry->d_parent;
 	if (IS_ROOT(dentry))
-		return NULL;
+		return LOCK_FAILED;
 	if (unlikely(dentry->d_lockref.count < 0))
-		return NULL;
+		return LOCK_FAILED;
 	if (likely(spin_trylock(&parent->d_lock)))
-		return parent;
+		return LOCK_FAST;
 	rcu_read_lock();
 	spin_unlock(&dentry->d_lock);
 again:
@@ -616,11 +616,11 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
 		goto again;
 	}
 	rcu_read_unlock();
-	if (parent != dentry)
-		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-	else
-		parent = NULL;
-	return parent;
+	if (parent == dentry)
+		return LOCK_FAILED;
+
+	spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+	return LOCK_SLOW;
 }
 
 /**
@@ -1035,19 +1035,21 @@ EXPORT_SYMBOL(d_find_alias);
  */
 void d_prune_aliases(struct inode *inode)
 {
+	struct dentry *parent;
 	struct dentry *dentry;
 restart:
 	spin_lock(&inode->i_lock);
 	hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) {
 		spin_lock(&dentry->d_lock);
 		if (!dentry->d_lockref.count) {
-			struct dentry *parent = lock_parent(dentry);
+			int ret = lock_parent(dentry);
+			parent = dentry->d_parent;
 			if (likely(!dentry->d_lockref.count)) {
 				__dentry_kill(dentry);
 				dput(parent);
 				goto restart;
 			}
-			if (parent)
+			if (ret != LOCK_FAILED)
 				spin_unlock(&parent->d_lock);
 		}
 		spin_unlock(&dentry->d_lock);
@@ -1062,9 +1064,11 @@ static void shrink_dentry_list(struct list_head *list)
 
 	while (!list_empty(list)) {
 		struct inode *inode;
+		int ret;
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 		spin_lock(&dentry->d_lock);
-		parent = lock_parent(dentry);
+		ret = lock_parent(dentry);
+		parent = dentry->d_parent;
 
 		/*
 		 * The dispose list is isolated and dentries are not accounted
@@ -1079,7 +1083,7 @@ static void shrink_dentry_list(struct list_head *list)
 		 */
 		if (dentry->d_lockref.count > 0) {
 			spin_unlock(&dentry->d_lock);
-			if (parent)
+			if (ret != LOCK_FAILED)
 				spin_unlock(&parent->d_lock);
 			continue;
 		}
@@ -1088,7 +1092,7 @@ static void shrink_dentry_list(struct list_head *list)
 		if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
 			bool can_free = dentry->d_flags & DCACHE_MAY_FREE;
 			spin_unlock(&dentry->d_lock);
-			if (parent)
+			if (ret != LOCK_FAILED)
 				spin_unlock(&parent->d_lock);
 			if (can_free)
 				dentry_free(dentry);
@@ -1099,7 +1103,7 @@ static void shrink_dentry_list(struct list_head *list)
 		if (inode && unlikely(!spin_trylock(&inode->i_lock))) {
 			d_shrink_add(dentry, list);
 			spin_unlock(&dentry->d_lock);
-			if (parent)
+			if (ret != LOCK_FAILED)
 				spin_unlock(&parent->d_lock);
 			continue;
 		}

---->8----

The above patch adjusts all callers of lock_parent() except for 1 that
will be replaced will my new dentry_kill() in v2 anyway.

Although the callers of lock_parent() until now do not take advantage of
the LOCK_FAST return value, my new dentry_lock_inode() _would_. Which
would mean that the common case for dentry_kill() would not do any
unnecessary checks.

static struct dentry *dentry_kill(struct dentry *dentry)
        __releases(dentry->d_lock)
{
        int ret_inode, ret_parent;
        struct dentry *parent;
        struct inode *inode;
again:
        parent = NULL;
        inode = dentry->d_inode;
        if (inode) {
                ret_inode = dentry_lock_inode(dentry);
                if (unlikely(ret_inode != LOCK_FAST)) {
                        ...
                }
        }

        ret_parent = lock_parent(dentry);
        if (likely(ret_parent == LOCK_FAST)) {
                parent = dentry->d_parent;
        } else {
                ...
        }

        __dentry_kill(dentry);
        return parent;
...
}

> I'm actually not so much worried about the cost of re-checking (the
> real cost tends to be the locked cycle itself) as I am about the code
> looking understandable. Your d_delete() patch didn't make me go "that
> looks more complicated", probably partly because of the nice helper
> function.

IMHO the original v1 patchset is the simplest codewise. And if the
comments were updated to not mislead, it is the way I would want to go.

Adding all the code to verbosely model a fast path just seems to me to
add too much code to an already complex part of the VFS.

John Ogness

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-19 23:34                 ` John Ogness
@ 2018-02-20  0:42                   ` Linus Torvalds
  2018-02-20  8:39                   ` Peter Zijlstra
  2018-02-22  5:29                   ` Al Viro
  2 siblings, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2018-02-20  0:42 UTC (permalink / raw)
  To: John Ogness
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On Mon, Feb 19, 2018 at 3:34 PM, John Ogness <john.ogness@linutronix.de> wrote:
>
> IMHO the original v1 patchset is the simplest codewise. And if the
> comments were updated to not mislead, it is the way I would want to go.

Ok, looking at your alternatives, I am inclined to agree.

           Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-19 23:34                 ` John Ogness
  2018-02-20  0:42                   ` Linus Torvalds
@ 2018-02-20  8:39                   ` Peter Zijlstra
  2018-02-20  8:43                     ` Peter Zijlstra
  2018-02-22  5:29                   ` Al Viro
  2 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2018-02-20  8:39 UTC (permalink / raw)
  To: John Ogness
  Cc: Linus Torvalds, linux-fsdevel, Al Viro, Christoph Hellwig,
	Thomas Gleixner, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List, hjl.tools

On Tue, Feb 20, 2018 at 12:34:57AM +0100, John Ogness wrote:
> Implementation 2: Using switch on a dentry_lock_inode() that returns a
> tristate value. Does not support branch prediction. This approach is
> probably easiest to understand.
> 
>  		/*
>  		 * Lock the inode. Might drop dentry->d_lock temporarily
>  		 * which allows inode to change. Start over if that happens.
>  		 */
>                 switch (dentry_lock_inode(dentry)) {
>                 case LOCK_FAST:

Bah, I just checked, you cannot use GCC label attributes on statements
:/ Otherwise you could've done:

		case LOCK_FAST: __attribute__((hot));

>                         break;
>                 case LOCK_SLOW:
>                         /*
>                          * Recheck refcount as it might have been
>                          * incremented while d_lock was dropped.
>                          */
>                         if (unlikely(dentry->d_lockref.count != 1))
>                                 goto drop_ref;
>                         break;
>                 case LOCK_FAILED:
>                         goto again;
>                 }
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-20  8:39                   ` Peter Zijlstra
@ 2018-02-20  8:43                     ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2018-02-20  8:43 UTC (permalink / raw)
  To: John Ogness
  Cc: Linus Torvalds, linux-fsdevel, Al Viro, Christoph Hellwig,
	Thomas Gleixner, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List, hjl.tools

On Tue, Feb 20, 2018 at 09:39:37AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 20, 2018 at 12:34:57AM +0100, John Ogness wrote:
> > Implementation 2: Using switch on a dentry_lock_inode() that returns a
> > tristate value. Does not support branch prediction. This approach is
> > probably easiest to understand.
> > 
> >  		/*
> >  		 * Lock the inode. Might drop dentry->d_lock temporarily
> >  		 * which allows inode to change. Start over if that happens.
> >  		 */
> >                 switch (dentry_lock_inode(dentry)) {
> >                 case LOCK_FAST:
> 
> Bah, I just checked, you cannot use GCC label attributes on statements
> :/ Otherwise you could've done:
> 
> 		case LOCK_FAST: __attribute__((hot));

Oooh shiny, you can actually write:

		switch(__builtin_expect(dentry_lock_inode(dentry), LOCK_FAST)) {

and have that work, see:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59521

> >                         break;
> >                 case LOCK_SLOW:
> >                         /*
> >                          * Recheck refcount as it might have been
> >                          * incremented while d_lock was dropped.
> >                          */
> >                         if (unlikely(dentry->d_lockref.count != 1))
> >                                 goto drop_ref;
> >                         break;
> >                 case LOCK_FAILED:
> >                         goto again;
> >                 }
> > 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete()
  2018-02-16 15:09 ` [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete() John Ogness
  2018-02-16 17:10   ` Peter Zijlstra
  2018-02-16 17:30   ` Peter Zijlstra
@ 2018-02-22  5:18   ` Al Viro
  2018-02-22  8:35     ` John Ogness
  2 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2018-02-22  5:18 UTC (permalink / raw)
  To: John Ogness
  Cc: linux-fsdevel, Linus Torvalds, Christoph Hellwig,
	Thomas Gleixner, Peter Zijlstra, Sebastian Andrzej Siewior,
	linux-kernel

On Fri, Feb 16, 2018 at 04:09:32PM +0100, John Ogness wrote:
> @@ -2378,22 +2420,36 @@ void d_delete(struct dentry * dentry)
>  	/*
>  	 * Are we the only user?
>  	 */
> -again:
>  	spin_lock(&dentry->d_lock);
> +again:
>  	inode = dentry->d_inode;
>  	isdir = S_ISDIR(inode->i_mode);
>  	if (dentry->d_lockref.count == 1) {
> -		if (!spin_trylock(&inode->i_lock)) {
> -			spin_unlock(&dentry->d_lock);
> -			cpu_relax();
> +		/*
> +		 * Lock the inode. Might drop dentry->d_lock temporarily
> +		 * which allows inode to change. Start over if that happens.
> +		 */
> +		if (!dentry_lock_inode(dentry))
>  			goto again;

IDGI.  First of all, why do we need to fetch ->d_inode (and calculate isdir)
before that dentry_lock_inode() of yours?  That's at least partially understandable
in the current version, where we need inode in d_delete() scope, but here it looks
bloody odd.

And if you move those fetches past the call of dentry_lock_inode(), you suddenly
get the life much simpler:

	grab d_lock
	if d_count is greater than 1, drop it and bugger off
	while !dentry_lock_inode(dentry)
		;
	fetch inode
	recheck d_count, in the unlikely case when it's greater than 1,
			drop and bugger off
	clear CANT_MOUNT
	calculate isdir
	unlink_inode
	fsnotify shite

I mean, do we really want to keep rechecking d_count on each loop iteration?
What does it buy us?  Sure, we want to recheck in the end for correctness
sake, but...

It might make sense to move the loop inside dentry_lock_inode(), IMO.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-19 23:34                 ` John Ogness
  2018-02-20  0:42                   ` Linus Torvalds
  2018-02-20  8:39                   ` Peter Zijlstra
@ 2018-02-22  5:29                   ` Al Viro
  2 siblings, 0 replies; 22+ messages in thread
From: Al Viro @ 2018-02-22  5:29 UTC (permalink / raw)
  To: John Ogness
  Cc: Linus Torvalds, linux-fsdevel, Christoph Hellwig,
	Thomas Gleixner, Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On Tue, Feb 20, 2018 at 12:34:57AM +0100, John Ogness wrote:


> Implementation 3: The same as implementation 2 but using if's to
> support branch prediction. This approach is probably the most
> complicated to understand but will be the fastest.
> 
>  		/*
>  		 * Lock the inode. Might drop dentry->d_lock temporarily
>  		 * which allows inode to change. Start over if that happens.
>  		 */
>                 int ret = dentry_lock_inode(dentry);
>                 if (unlikely(ret != LOCK_FAST)) {
>                         if (ret == LOCK_FAILED)
>                                 goto again;
>                         /*
>                          * Recheck refcount as it might have been
>                          * incremented while d_lock was dropped.
>                          */
>                         if (dentry->d_lockref.count != 1)
>                                 goto drop_ref;
>                 }

Implementation 4: screw the tristate, move the loop inside dentry_lock_inode().

> If lock_parent() returns a non-NULL, it is returning
> dentry->d_parent. So the return value is really just a boolean and the
> locked parent is the parent of the dentry. The function is a little bit
> tricky because it could return NULL (lock failed) even if the dentry has
> a non-NULL d_parent. So any caller using a tristate return variation of
> lock_parent() must rely on the return value instead of a non-NULL
> dentry->d_parent.

dentry always has non-NULL ->d_parent; it might point to dentry itself, but
it's never NULL.

>  		if (!dentry->d_lockref.count) {
> -			struct dentry *parent = lock_parent(dentry);
> +			int ret = lock_parent(dentry);
> +			parent = dentry->d_parent;
>  			if (likely(!dentry->d_lockref.count)) {
>  				__dentry_kill(dentry);
>  				dput(parent);

Broken.  In IS_ROOT case you'll hit an extra dput() on dentry itself.
dput(NULL) is no-op; this, OTOH, isn't.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()
  2018-02-16 18:03   ` Linus Torvalds
  2018-02-16 22:32     ` John Ogness
@ 2018-02-22  5:40     ` Al Viro
  1 sibling, 0 replies; 22+ messages in thread
From: Al Viro @ 2018-02-22  5:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: John Ogness, linux-fsdevel, Christoph Hellwig, Thomas Gleixner,
	Peter Zijlstra, Sebastian Andrzej Siewior,
	Linux Kernel Mailing List

On Fri, Feb 16, 2018 at 10:03:15AM -0800, Linus Torvalds wrote:
> On Fri, Feb 16, 2018 at 7:09 AM, John Ogness <john.ogness@linutronix.de> wrote:
> > dentry_kill() holds dentry->d_lock and needs to acquire both
> > dentry->d_inode->i_lock and dentry->d_parent->d_lock. This cannot be
> > done with spin_lock() operations because it's the reverse of the
> > regular lock order. To avoid ABBA deadlocks it is done with two
> > trylock loops.
> >
> > Trylock loops are problematic in two scenarios:
> 
> I don't mind this patch series per se (although I would really like Al
> to ack it), but this particular patch I hate.

I'm not happy about the previous patch either.  Why do we need the users
of that thing to deal with retries?  And I'm not even sure we want to
bother with retries on inode change inside dentry_kill() itself - just
unlock, return dentry and let the caller handle that.  Callers *must*
handle "need to drop another one" anyway, so...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete()
  2018-02-22  5:18   ` Al Viro
@ 2018-02-22  8:35     ` John Ogness
  0 siblings, 0 replies; 22+ messages in thread
From: John Ogness @ 2018-02-22  8:35 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Christoph Hellwig,
	Thomas Gleixner, Peter Zijlstra, Sebastian Andrzej Siewior,
	linux-kernel

On 2018-02-22, Al Viro <viro@ZenIV.linux.org.uk> wrote:
>> @@ -2378,22 +2420,36 @@ void d_delete(struct dentry * dentry)
>>  	/*
>>  	 * Are we the only user?
>>  	 */
>> -again:
>>  	spin_lock(&dentry->d_lock);
>> +again:
>>  	inode = dentry->d_inode;
>>  	isdir = S_ISDIR(inode->i_mode);
>>  	if (dentry->d_lockref.count == 1) {
>> -		if (!spin_trylock(&inode->i_lock)) {
>> -			spin_unlock(&dentry->d_lock);
>> -			cpu_relax();
>> +		/*
>> +		 * Lock the inode. Might drop dentry->d_lock temporarily
>> +		 * which allows inode to change. Start over if that happens.
>> +		 */
>> +		if (!dentry_lock_inode(dentry))
>>  			goto again;
>
> IDGI.  First of all, why do we need to fetch ->d_inode (and calculate
> isdir) before that dentry_lock_inode() of yours? That's at least
> partially understandable in the current version, where we need inode
> in d_delete() scope, but here it looks bloody odd.

I tried to change the function as little as possible. You are right that
it now looks odd. I seem to have missed the forest for the trees.

> And if you move those fetches past the call of dentry_lock_inode(),
> you suddenly get the life much simpler:
>
> 	grab d_lock
> 	if d_count is greater than 1, drop it and bugger off
> 	while !dentry_lock_inode(dentry)
> 		;
> 	fetch inode
> 	recheck d_count, in the unlikely case when it's greater than 1,
> 			drop and bugger off
> 	clear CANT_MOUNT
> 	calculate isdir
> 	unlink_inode
> 	fsnotify shite
>
> I mean, do we really want to keep rechecking d_count on each loop
> iteration?  What does it buy us?  Sure, we want to recheck in the end
> for correctness sake, but...

I have been unable to produce a test case where dentry_lock_inode() can
fail. AFAICT it is not possible from userspace. Perhaps some filesystem
could trigger it. But if it would fail, getting the refcount to increase
in the dropped d_lock window is quite easy to reproduce. And in that
case we wouldn't need to keep trying to aquire the inode lock and could
just drop.
        
> It might make sense to move the loop inside dentry_lock_inode(), IMO.

Agreed. I will change dentry_lock_inode() so that it will only fail if
the refcount changes. If there are inode changes, it will loop
internally. That will change your suggestion to:

 	grab d_lock
 	if d_count is greater than 1
 	 	drop it and bugger off
 	if !dentry_lock_inode(dentry)
	 	drop it and bugger off
 	fetch inode
 	clear CANT_MOUNT
 	calculate isdir
 	unlink_inode
 	fsnotify shite

John Ogness

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-02-22  8:35 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-16 15:09 [PATCH 0/4] fs/dcache: avoid trylock loops John Ogness
2018-02-16 15:09 ` [PATCH 1/4] fs/dcache: Remove stale comment from dentry_kill() John Ogness
2018-02-16 15:09 ` [PATCH 2/4] fs/dcache: Move dentry_kill() below lock_parent() John Ogness
2018-02-16 15:09 ` [PATCH 3/4] fs/dcache: Avoid the try_lock loop in d_delete() John Ogness
2018-02-16 17:10   ` Peter Zijlstra
2018-02-16 17:30   ` Peter Zijlstra
2018-02-22  5:18   ` Al Viro
2018-02-22  8:35     ` John Ogness
2018-02-16 15:09 ` [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill() John Ogness
2018-02-16 18:03   ` Linus Torvalds
2018-02-16 22:32     ` John Ogness
2018-02-16 22:42       ` Linus Torvalds
2018-02-16 23:05         ` John Ogness
2018-02-16 23:31           ` Linus Torvalds
2018-02-16 23:49             ` John Ogness
2018-02-17  0:06               ` Linus Torvalds
2018-02-19 23:34                 ` John Ogness
2018-02-20  0:42                   ` Linus Torvalds
2018-02-20  8:39                   ` Peter Zijlstra
2018-02-20  8:43                     ` Peter Zijlstra
2018-02-22  5:29                   ` Al Viro
2018-02-22  5:40     ` Al Viro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.