linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] shrink_dcache_parent() deadlock
@ 2012-01-09 10:58 Miklos Szeredi
  2012-01-09 16:43 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Miklos Szeredi @ 2012-01-09 10:58 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, linux-kernel, mgorman, gregkh, hch, akpm, torvalds

Two (or more) concurrent calls of shrink_dcache_parent() on the same
dentry with the right timing may cause shrink_dcache_parent() to loop
forever.

Here's what appears to happen:

1 - CPU0: select_parent(P) finds C and puts it on sb->s_dentry_lru, returns 1

2 - CPU0: __shrink_dcache_sb() moves C from sb->s_dentry_lru to tmp

3 - CPU1: select_parent(P) locks P->d_lock

4 - CPU0: shrink_dentry_list() locks C->d_lock
    dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock

5 - CPU1: select_parent(P) locks C->d_lock,
          moves C from tmp to sb->s_dentry_lru, returns 1

6 - CPU0: shrink_dentry_list() finds tmp empty, returns

7 - Goto 2 with CPU0 and CPU1 switched


So basically select_parent() steals the dentry from shrink_dentry_list()
and thinks it found a new one, causing shrink_dentry_list() to think
it's making progress and loop over and over.

This was actually triggered by a combination of the bonding driver
removing a directory in sysfs and a udev rule causing udev to do stat
multiple times on the just removed directory.

One fix would be to check all callers and prevent concurrent calls to
shrink_dcache_parent().  But I think a better solution is to stop the
stealing behavior.

This patch adds a new dentry flag that is set when the dentry is removed
from the lru and put on the list being processed by
shrink_dentry_list().  The flag is cleared in dentry_lru_del() which is
called if the dentry gets a new reference just before it's pruned.

Thoughts?

diff --git a/fs/dcache.c b/fs/dcache.c
index 89509b5..09805eb 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -242,6 +242,7 @@ static void dentry_lru_add(struct dentry *dentry)
 static void __dentry_lru_del(struct dentry *dentry)
 {
 	list_del_init(&dentry->d_lru);
+	dentry->d_flags &= ~DCACHE_SHRINK_LIST;
 	dentry->d_sb->s_nr_dentry_unused--;
 	dentry_stat.nr_unused--;
 }
@@ -807,6 +808,7 @@ relock:
 			spin_unlock(&dentry->d_lock);
 		} else {
 			list_move_tail(&dentry->d_lru, &tmp);
+			dentry->d_flags |= DCACHE_SHRINK_LIST;
 			spin_unlock(&dentry->d_lock);
 			if (!--count)
 				break;
@@ -1117,11 +1119,11 @@ resume:
 		 * move only zero ref count dentries to the end 
 		 * of the unused list for prune_dcache
 		 */
-		if (!dentry->d_count) {
+		if (dentry->d_count) {
+			dentry_lru_del(dentry);
+		} else if(!(dentry->d_flags & DCACHE_SHRINK_LIST)) {
 			dentry_lru_move_tail(dentry);
 			found++;
-		} else {
-			dentry_lru_del(dentry);
 		}
 
 		/*
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index ed9f74f..4eb8c80 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -203,6 +203,7 @@ struct dentry_operations {
 
 #define DCACHE_CANT_MOUNT	0x0100
 #define DCACHE_GENOCIDE		0x0200
+#define DCACHE_SHRINK_LIST	0x0400
 
 #define DCACHE_NFSFS_RENAMED	0x1000
      /* this dentry has been "silly renamed" and has to be deleted on the last

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 10:58 [RFC PATCH] shrink_dcache_parent() deadlock Miklos Szeredi
@ 2012-01-09 16:43 ` Linus Torvalds
  2012-01-09 17:05   ` Miklos Szeredi
  2012-01-09 17:16 ` Christoph Hellwig
  2012-01-09 17:27 ` Al Viro
  2 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2012-01-09 16:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: viro, linux-fsdevel, linux-kernel, mgorman, gregkh, hch, akpm

On Mon, Jan 9, 2012 at 2:58 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> This patch adds a new dentry flag that is set when the dentry is removed
> from the lru and put on the list being processed by
> shrink_dentry_list().  The flag is cleared in dentry_lru_del() which is
> called if the dentry gets a new reference just before it's pruned.
>
> Thoughts?

Looks reasonable. Were you actually able to reproduce the hang, or how
did you notice?

Al, comments?

                           Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 16:43 ` Linus Torvalds
@ 2012-01-09 17:05   ` Miklos Szeredi
  2012-01-09 17:16     ` Greg KH
  0 siblings, 1 reply; 24+ messages in thread
From: Miklos Szeredi @ 2012-01-09 17:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: viro, linux-fsdevel, linux-kernel, mgorman, gregkh, hch, akpm

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Jan 9, 2012 at 2:58 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> This patch adds a new dentry flag that is set when the dentry is removed
>> from the lru and put on the list being processed by
>> shrink_dentry_list().  The flag is cleared in dentry_lru_del() which is
>> called if the dentry gets a new reference just before it's pruned.
>>
>> Thoughts?
>
> Looks reasonable. Were you actually able to reproduce the hang, or how
> did you notice?

We got a bug report from a partner about hang during reboot.  It's
actually quite easily reproducible with:

while true; do
        echo -bond0 > /sys/class/net/bonding_masters
        echo +bond0 > /sys/class/net/bonding_masters
        echo -bond1 > /sys/class/net/bonding_masters
        echo +bond1 > /sys/class/net/bonding_masters
done

That reliably triggers the soft lookup detector for several machines
that we tested.  It's rather timing sensitive though, because turning on
DEBUG_SPINLOCK makes it go away.

I had to add printks to see what's going on, because it wasn't obvious
from the stack traces and crash dumps.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 10:58 [RFC PATCH] shrink_dcache_parent() deadlock Miklos Szeredi
  2012-01-09 16:43 ` Linus Torvalds
@ 2012-01-09 17:16 ` Christoph Hellwig
  2012-01-09 17:30   ` Al Viro
  2012-01-09 17:27 ` Al Viro
  2 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2012-01-09 17:16 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: viro, linux-fsdevel, linux-kernel, mgorman, gregkh, akpm,
	torvalds, david

Dave's patch from August in which removes the whole select_parent lru abuse
should fix this to, and actually clean up that area.  It will probably
need a rebase as it was near the end of a larger series.  We probably
want the patch directly following it in the series, too.

---
From:	Dave Chinner <david@fromorbit.com>
Subject: [PATCH 11/13] dcache: use a dispose list in select_parent

From: Dave Chinner <dchinner@redhat.com>

select_parent currently abuses the dentry cache LRU to provide
cleanup features for child dentries that need to be freed. It moves
them to the tail of the LRU, then tells shrink_dcache_parent() to
calls __shrink_dcache_sb to unconditionally move them to a dispose
list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to
relock the dentries to move them off the LRU onto the dispose list,
but otherwise does not touch the dentries that select_parent() moved
to the tail of the LRU. It then passses the dispose list to
shrink_dentry_list() which tries to free the dentries.

IOWs, the use of __shrink_dcache_sb() is superfluous - we can build
exactly the same list of dentries for disposal directly in
select_parent() and call shrink_dentry_list() instead of calling
__shrink_dcache_sb() to do that. This means that we avoid long holds
on the lru lock walking the LRU moving dentries to the dispose list
We also avoid the need to relock each dentry just to move it off the
LRU, reducing the numebr of times we lock each dentry to dispose of
them in shrink_dcache_parent() from 3 to 2 times.

Further, we remove one of the two callers of __shrink_dcache_sb().
This also means that __shrink_dcache_sb can be moved into back into
prune_dcache_sb() and we no longer have to handle referenced
dentries conditionally, simplifying the code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/dcache.c |   65 ++++++++++++++++++++---------------------------------------
 1 files changed, 22 insertions(+), 43 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index d19e453..b931415 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -264,15 +264,15 @@ static void dentry_lru_del(struct dentry *dentry)
 	}
 }
 
-static void dentry_lru_move_tail(struct dentry *dentry)
+static void dentry_lru_move_list(struct dentry *dentry, struct list_head *list)
 {
 	spin_lock(&dentry->d_sb->s_dentry_lru_lock);
 	if (list_empty(&dentry->d_lru)) {
-		list_add_tail(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
+		list_add_tail(&dentry->d_lru, list);
 		dentry->d_sb->s_nr_dentry_unused++;
 		this_cpu_inc(nr_dentry_unused);
 	} else {
-		list_move_tail(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
+		list_move_tail(&dentry->d_lru, list);
 	}
 	spin_unlock(&dentry->d_sb->s_dentry_lru_lock);
 }
@@ -752,14 +752,18 @@ static void shrink_dentry_list(struct list_head *list)
 }
 
 /**
- * __shrink_dcache_sb - shrink the dentry LRU on a given superblock
- * @sb:		superblock to shrink dentry LRU.
- * @count:	number of entries to prune
- * @flags:	flags to control the dentry processing
+ * prune_dcache_sb - shrink the dcache
+ * @sb: superblock
+ * @nr_to_scan: number of entries to try to free
+ *
+ * Attempt to shrink the superblock dcache LRU by @nr_to_scan entries. This is
+ * done when we need more memory an called from the superblock shrinker
+ * function.
  *
- * If flags contains DCACHE_REFERENCED reference dentries will not be pruned.
+ * This function may fail to free any resources if all the dentries are in
+ * use.
  */
-static long __shrink_dcache_sb(struct super_block *sb, long count, int flags)
+long prune_dcache_sb(struct super_block *sb, long nr_to_scan)
 {
 	struct dentry *dentry;
 	LIST_HEAD(referenced);
@@ -779,13 +783,7 @@ relock:
 			goto relock;
 		}
 
-		/*
-		 * If we are honouring the DCACHE_REFERENCED flag and the
-		 * dentry has this flag set, don't free it.  Clear the flag
-		 * and put it back on the LRU.
-		 */
-		if (flags & DCACHE_REFERENCED &&
-				dentry->d_flags & DCACHE_REFERENCED) {
+		if (dentry->d_flags & DCACHE_REFERENCED) {
 			dentry->d_flags &= ~DCACHE_REFERENCED;
 			list_move(&dentry->d_lru, &referenced);
 			spin_unlock(&dentry->d_lock);
@@ -793,7 +791,7 @@ relock:
 			list_move_tail(&dentry->d_lru, &tmp);
 			spin_unlock(&dentry->d_lock);
 			freed++;
-			if (!--count)
+			if (!--nr_to_scan)
 				break;
 		}
 		cond_resched_lock(&sb->s_dentry_lru_lock);
@@ -807,23 +805,6 @@ relock:
 }
 
 /**
- * prune_dcache_sb - shrink the dcache
- * @sb: superblock
- * @nr_to_scan: number of entries to try to free
- *
- * Attempt to shrink the superblock dcache LRU by @nr_to_scan entries. This is
- * done when we need more memory an called from the superblock shrinker
- * function.
- *
- * This function may fail to free any resources if all the dentries are in
- * use.
- */
-long prune_dcache_sb(struct super_block *sb, long nr_to_scan)
-{
-	return __shrink_dcache_sb(sb, nr_to_scan, DCACHE_REFERENCED);
-}
-
-/**
  * shrink_dcache_sb - shrink dcache for a superblock
  * @sb: superblock
  *
@@ -1073,7 +1054,7 @@ EXPORT_SYMBOL(have_submounts);
  * drop the lock and return early due to latency
  * constraints.
  */
-static long select_parent(struct dentry * parent)
+static long select_parent(struct dentry *parent, struct list_head *dispose)
 {
 	struct dentry *this_parent;
 	struct list_head *next;
@@ -1095,12 +1076,11 @@ resume:
 
 		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 
-		/* 
-		 * move only zero ref count dentries to the end 
-		 * of the unused list for prune_dcache
+		/*
+		 * move only zero ref count dentries to the dispose list.
 		 */
 		if (!dentry->d_count) {
-			dentry_lru_move_tail(dentry);
+			dentry_lru_move_list(dentry, dispose);
 			found++;
 		} else {
 			dentry_lru_del(dentry);
@@ -1162,14 +1142,13 @@ rename_retry:
  *
  * Prune the dcache to remove unused children of the parent dentry.
  */
- 
 void shrink_dcache_parent(struct dentry * parent)
 {
-	struct super_block *sb = parent->d_sb;
+	LIST_HEAD(dispose);
 	long found;
 
-	while ((found = select_parent(parent)) != 0)
-		__shrink_dcache_sb(sb, found, 0);
+	while ((found = select_parent(parent, &dispose)) != 0)
+		shrink_dentry_list(&dispose);
 }
 EXPORT_SYMBOL(shrink_dcache_parent);
 
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 17:05   ` Miklos Szeredi
@ 2012-01-09 17:16     ` Greg KH
  0 siblings, 0 replies; 24+ messages in thread
From: Greg KH @ 2012-01-09 17:16 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, viro, linux-fsdevel, linux-kernel, mgorman, hch, akpm

On Mon, Jan 09, 2012 at 06:05:32PM +0100, Miklos Szeredi wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > On Mon, Jan 9, 2012 at 2:58 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >> This patch adds a new dentry flag that is set when the dentry is removed
> >> from the lru and put on the list being processed by
> >> shrink_dentry_list().  The flag is cleared in dentry_lru_del() which is
> >> called if the dentry gets a new reference just before it's pruned.
> >>
> >> Thoughts?
> >
> > Looks reasonable. Were you actually able to reproduce the hang, or how
> > did you notice?
> 
> We got a bug report from a partner about hang during reboot.  It's
> actually quite easily reproducible with:
> 
> while true; do
>         echo -bond0 > /sys/class/net/bonding_masters
>         echo +bond0 > /sys/class/net/bonding_masters
>         echo -bond1 > /sys/class/net/bonding_masters
>         echo +bond1 > /sys/class/net/bonding_masters
> done
> 
> That reliably triggers the soft lookup detector for several machines
> that we tested.  It's rather timing sensitive though, because turning on
> DEBUG_SPINLOCK makes it go away.
> 
> I had to add printks to see what's going on, because it wasn't obvious
> from the stack traces and crash dumps.

You also usually need to add a udev rule that looks at the pci id of the
device as well, without checking for what type it is (which is what a
"correct" udev rule should be doing, which is why we only started seeing
this on some systems, with "incorrect" rules.)

Having a file in /lib/udev/rules.d/ with only this one rule:
ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true"

Seems to do the trick.  udev calls stat() on the vendor sysfs file,
while it is going away, and then this dentry race shows up.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 10:58 [RFC PATCH] shrink_dcache_parent() deadlock Miklos Szeredi
  2012-01-09 16:43 ` Linus Torvalds
  2012-01-09 17:16 ` Christoph Hellwig
@ 2012-01-09 17:27 ` Al Viro
  2 siblings, 0 replies; 24+ messages in thread
From: Al Viro @ 2012-01-09 17:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, mgorman, gregkh, hch, akpm, torvalds

On Mon, Jan 09, 2012 at 11:58:54AM +0100, Miklos Szeredi wrote:

> This patch adds a new dentry flag that is set when the dentry is removed
> from the lru and put on the list being processed by
> shrink_dentry_list().  The flag is cleared in dentry_lru_del() which is
> called if the dentry gets a new reference just before it's pruned.

Umm...  In principle, I like that variant, provided that you send it with
sane commit message.  Explicit description of the conditions when new
flag is set will probably turn out to be very useful several months down
the road, so...  Unless I'm misreading you, it should be something along
the lines of "flag is set if and only if dentry->d_lru is currently a part of
tmp cyclic list of some ongoing __shrink_dcache_sb() instance", preferably
without torturing the langauge so much...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 17:16 ` Christoph Hellwig
@ 2012-01-09 17:30   ` Al Viro
  2012-01-09 18:30     ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Al Viro @ 2012-01-09 17:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, mgorman, gregkh,
	akpm, torvalds, david

On Mon, Jan 09, 2012 at 12:16:39PM -0500, Christoph Hellwig wrote:
> Dave's patch from August in which removes the whole select_parent lru abuse
> should fix this to, and actually clean up that area.  It will probably
> need a rebase as it was near the end of a larger series.  We probably
> want the patch directly following it in the series, too.

Resend would be nice; I can try to dig the series out and rebase it, but...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 17:30   ` Al Viro
@ 2012-01-09 18:30     ` Linus Torvalds
  2012-01-09 18:46       ` Linus Torvalds
  2012-01-09 20:59       ` Dave Chinner
  0 siblings, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2012-01-09 18:30 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Miklos Szeredi, linux-fsdevel, linux-kernel,
	mgorman, gregkh, akpm, david

[-- Attachment #1: Type: text/plain, Size: 733 bytes --]

On Mon, Jan 9, 2012 at 9:30 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Resend would be nice; I can try to dig the series out and rebase it, but...

Here's a TOTALLY UNTESTED rebase of just that single patch from Dave.

I did not check if the rest of the series did something that this
patch needs, so the patch may be entirely broken, but it does look
sane on its own (ie I do not see anything obviously broken in it). So
it still looks like a nice cleanup, but maybe I'm missing something
subtle, and maybe the rest of Dave's series really is required.

It compiles. That's all I'm going to say about my extensive testing.
Which is also why I haven't added my sign-off to it.

Comments?

                                Linus

[-- Attachment #2: dave.diff --]
[-- Type: text/x-patch, Size: 5633 bytes --]

From: Dave Chinner <david@fromorbit.com>
Subject: [PATCH 11/13] dcache: use a dispose list in select_parent
Date: 	Tue, 23 Aug 2011 18:56:24 +1000

select_parent currently abuses the dentry cache LRU to provide
cleanup features for child dentries that need to be freed. It moves
them to the tail of the LRU, then tells shrink_dcache_parent() to
calls __shrink_dcache_sb to unconditionally move them to a dispose
list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to
relock the dentries to move them off the LRU onto the dispose list,
but otherwise does not touch the dentries that select_parent() moved
to the tail of the LRU. It then passses the dispose list to
shrink_dentry_list() which tries to free the dentries.

IOWs, the use of __shrink_dcache_sb() is superfluous - we can build
exactly the same list of dentries for disposal directly in
select_parent() and call shrink_dentry_list() instead of calling
__shrink_dcache_sb() to do that. This means that we avoid long holds
on the lru lock walking the LRU moving dentries to the dispose list
We also avoid the need to relock each dentry just to move it off the
LRU, reducing the numebr of times we lock each dentry to dispose of
them in shrink_dcache_parent() from 3 to 2 times.

Further, we remove one of the two callers of __shrink_dcache_sb().
This also means that __shrink_dcache_sb can be moved into back into
prune_dcache_sb() and we no longer have to handle referenced
dentries conditionally, simplifying the code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/dcache.c |   63 +++++++++++++++++++---------------------------------------
 1 files changed, 21 insertions(+), 42 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 9791b1e7eee4..b209d73f9a98 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -276,15 +276,15 @@ static void dentry_lru_prune(struct dentry *dentry)
 	}
 }
 
-static void dentry_lru_move_tail(struct dentry *dentry)
+static void dentry_lru_move_list(struct dentry *dentry, struct list_head *list)
 {
 	spin_lock(&dcache_lru_lock);
 	if (list_empty(&dentry->d_lru)) {
-		list_add_tail(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
+		list_add_tail(&dentry->d_lru, list);
 		dentry->d_sb->s_nr_dentry_unused++;
 		dentry_stat.nr_unused++;
 	} else {
-		list_move_tail(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
+		list_move_tail(&dentry->d_lru, list);
 	}
 	spin_unlock(&dcache_lru_lock);
 }
@@ -770,14 +770,18 @@ static void shrink_dentry_list(struct list_head *list)
 }
 
 /**
- * __shrink_dcache_sb - shrink the dentry LRU on a given superblock
- * @sb:		superblock to shrink dentry LRU.
- * @count:	number of entries to prune
- * @flags:	flags to control the dentry processing
+ * prune_dcache_sb - shrink the dcache
+ * @sb: superblock
+ * @count: number of entries to try to free
+ *
+ * Attempt to shrink the superblock dcache LRU by @count entries. This is
+ * done when we need more memory an called from the superblock shrinker
+ * function.
  *
- * If flags contains DCACHE_REFERENCED reference dentries will not be pruned.
+ * This function may fail to free any resources if all the dentries are in
+ * use.
  */
-static void __shrink_dcache_sb(struct super_block *sb, int count, int flags)
+void prune_dcache_sb(struct super_block *sb, int count)
 {
 	struct dentry *dentry;
 	LIST_HEAD(referenced);
@@ -796,13 +800,7 @@ relock:
 			goto relock;
 		}
 
-		/*
-		 * If we are honouring the DCACHE_REFERENCED flag and the
-		 * dentry has this flag set, don't free it.  Clear the flag
-		 * and put it back on the LRU.
-		 */
-		if (flags & DCACHE_REFERENCED &&
-				dentry->d_flags & DCACHE_REFERENCED) {
+		if (dentry->d_flags & DCACHE_REFERENCED) {
 			dentry->d_flags &= ~DCACHE_REFERENCED;
 			list_move(&dentry->d_lru, &referenced);
 			spin_unlock(&dentry->d_lock);
@@ -822,23 +820,6 @@ relock:
 }
 
 /**
- * prune_dcache_sb - shrink the dcache
- * @sb: superblock
- * @nr_to_scan: number of entries to try to free
- *
- * Attempt to shrink the superblock dcache LRU by @nr_to_scan entries. This is
- * done when we need more memory an called from the superblock shrinker
- * function.
- *
- * This function may fail to free any resources if all the dentries are in
- * use.
- */
-void prune_dcache_sb(struct super_block *sb, int nr_to_scan)
-{
-	__shrink_dcache_sb(sb, nr_to_scan, DCACHE_REFERENCED);
-}
-
-/**
  * shrink_dcache_sb - shrink dcache for a superblock
  * @sb: superblock
  *
@@ -1092,7 +1073,7 @@ EXPORT_SYMBOL(have_submounts);
  * drop the lock and return early due to latency
  * constraints.
  */
-static int select_parent(struct dentry * parent)
+static int select_parent(struct dentry *parent, struct list_head *dispose)
 {
 	struct dentry *this_parent;
 	struct list_head *next;
@@ -1114,12 +1095,11 @@ resume:
 
 		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 
-		/* 
-		 * move only zero ref count dentries to the end 
-		 * of the unused list for prune_dcache
+		/*
+		 * move only zero ref count dentries to the dispose list.
 		 */
 		if (!dentry->d_count) {
-			dentry_lru_move_tail(dentry);
+			dentry_lru_move_list(dentry, dispose);
 			found++;
 		} else {
 			dentry_lru_del(dentry);
@@ -1181,14 +1161,13 @@ rename_retry:
  *
  * Prune the dcache to remove unused children of the parent dentry.
  */
- 
 void shrink_dcache_parent(struct dentry * parent)
 {
-	struct super_block *sb = parent->d_sb;
+	LIST_HEAD(dispose);
 	int found;
 
-	while ((found = select_parent(parent)) != 0)
-		__shrink_dcache_sb(sb, found, 0);
+	while ((found = select_parent(parent, &dispose)) != 0)
+		shrink_dentry_list(&dispose);
 }
 EXPORT_SYMBOL(shrink_dcache_parent);
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 18:30     ` Linus Torvalds
@ 2012-01-09 18:46       ` Linus Torvalds
  2012-01-09 19:04         ` Christoph Hellwig
  2012-01-09 20:59       ` Dave Chinner
  1 sibling, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2012-01-09 18:46 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Miklos Szeredi, linux-fsdevel, linux-kernel,
	mgorman, gregkh, akpm, david

On Mon, Jan 9, 2012 at 10:30 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Here's a TOTALLY UNTESTED rebase of just that single patch from Dave.

Well, it boots, and the dentry cache shrinks under memory pressure. So
it's not _totally_ untested now.

But I don't really see why it would fix Miklos' case. I think
select_parent() may still end up touching the d_lru list of something
that is on the 'dispose' list. So the patch looks like a nice cleanup,
but it seems to be independent of the issue Miklos found.

What am I missing now?

                      Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 18:46       ` Linus Torvalds
@ 2012-01-09 19:04         ` Christoph Hellwig
  2012-01-09 19:18           ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2012-01-09 19:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Christoph Hellwig, Miklos Szeredi, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm, david

On Mon, Jan 09, 2012 at 10:46:22AM -0800, Linus Torvalds wrote:
> On Mon, Jan 9, 2012 at 10:30 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Here's a TOTALLY UNTESTED rebase of just that single patch from Dave.
> 
> Well, it boots, and the dentry cache shrinks under memory pressure. So
> it's not _totally_ untested now.
> 
> But I don't really see why it would fix Miklos' case. I think
> select_parent() may still end up touching the d_lru list of something
> that is on the 'dispose' list. So the patch looks like a nice cleanup,
> but it seems to be independent of the issue Miklos found.
> 
> What am I missing now?

After Dave's patch select_parent isolates dentries that are going to
be dropped directly to an on-stack list instead of abusing the LRU.

With that scheme the trylock and retry loop in __shrink_dcache_sb
goes away completely for this caller.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 19:04         ` Christoph Hellwig
@ 2012-01-09 19:18           ` Linus Torvalds
  0 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2012-01-09 19:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Miklos Szeredi, linux-fsdevel, linux-kernel, mgorman,
	gregkh, akpm, david

On Mon, Jan 9, 2012 at 11:04 AM, Christoph Hellwig <hch@infradead.org> wrote:
>
> After Dave's patch select_parent isolates dentries that are going to
> be dropped directly to an on-stack list instead of abusing the LRU.
>
> With that scheme the trylock and retry loop in __shrink_dcache_sb
> goes away completely for this caller.

Yes, but it still exists for the prune_dcache_sb() case.

So prune_dcache_sb() puts random dentries on its own private lists,
and drops the lru_lock (and the dentry lock).

In the meantime, what protects us from select_parent() finding those
*same* dentries, and doing

    dentry_lru_move_list(dentry, dispose);

which - despite the name - moves the dentry not from the lru list, but
from the prune_dcache_sb local list to the select_parent() local
list..

Hmm. I guess the endless "move back-and-forth" thing is gone, but the
random "move from one private list to another" makes me worry about
the confusion.

But I guess we don't care - the same thing is going to happen to the
dentry regardless of which of the local lists it is on.

And in any case, I think Dave's patch looks like a nice cleanup. Does
it work for people in that forward-ported-alone version I sent out?

And Miklos, does it fix your test-case? Because if so, I think Dave's
patch is nicer, and avoids adding a new state bit by just cleaning
things up in general.

                        Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 18:30     ` Linus Torvalds
  2012-01-09 18:46       ` Linus Torvalds
@ 2012-01-09 20:59       ` Dave Chinner
  2012-01-09 21:21         ` Linus Torvalds
  2012-01-09 21:26         ` Al Viro
  1 sibling, 2 replies; 24+ messages in thread
From: Dave Chinner @ 2012-01-09 20:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Christoph Hellwig, Miklos Szeredi, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Mon, Jan 09, 2012 at 10:30:59AM -0800, Linus Torvalds wrote:
> On Mon, Jan 9, 2012 at 9:30 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Resend would be nice; I can try to dig the series out and rebase it, but...
> 
> Here's a TOTALLY UNTESTED rebase of just that single patch from Dave.
> 
> I did not check if the rest of the series did something that this
> patch needs, so the patch may be entirely broken, but it does look
> sane on its own (ie I do not see anything obviously broken in it). So
> it still looks like a nice cleanup, but maybe I'm missing something
> subtle, and maybe the rest of Dave's series really is required.

Nothing else is required - that patch is a OK by itself as a bug
fix/cleanup.

> It compiles. That's all I'm going to say about my extensive testing.
> Which is also why I haven't added my sign-off to it.

I did quite a bit of load and stress testing on the series back when
I first posted it, so the dcache changes don't have any problems I
know of.

> Comments?

Looks OK to me.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 20:59       ` Dave Chinner
@ 2012-01-09 21:21         ` Linus Torvalds
  2012-01-10  1:34           ` Al Viro
  2012-01-09 21:26         ` Al Viro
  1 sibling, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2012-01-09 21:21 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Al Viro, Christoph Hellwig, Miklos Szeredi, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Mon, Jan 9, 2012 at 12:59 PM, Dave Chinner <david@fromorbit.com> wrote:
>
>> It compiles. That's all I'm going to say about my extensive testing.
>> Which is also why I haven't added my sign-off to it.
>
> I did quite a bit of load and stress testing on the series back when
> I first posted it, so the dcache changes don't have any problems I
> know of.

.. and from the kernel compiles and looking at the code some more
today, I'm getting more and more convinced that the patch is
absolutely the right thing to do.

So I'll happily add my signed-off on that code if Al wants to take
that edited patch into his tree. Or just commit it directly.

I'd like to hear that it fixes Miklos' case from somebody that
recreated it - that bonding_masters script doesn't work for me because
I compile or use (nor do I compile or use the soft-lockup detector),
so it would be good if somebody who has the particular test-case
working (or "not working" ;) for them to verify that yes, it fixes it.

But I think I agree with the explanation for why it would be fixed,
and I certainly agree with the code being cleaner.

                      Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 20:59       ` Dave Chinner
  2012-01-09 21:21         ` Linus Torvalds
@ 2012-01-09 21:26         ` Al Viro
  1 sibling, 0 replies; 24+ messages in thread
From: Al Viro @ 2012-01-09 21:26 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Christoph Hellwig, Miklos Szeredi, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Tue, Jan 10, 2012 at 07:59:07AM +1100, Dave Chinner wrote:
> > Comments?
> 
> Looks OK to me.

OK, grabbed.  And there's *more* fixes for obvious shite - by now I'm
really sick and tired of what people are doing with failure exits;
this morning catch just from looking through d_alloc_root() callers:
	isofs - inode leak
	ext4 - dentry leak + completely bogus handling of ext4_mb_init()
failure (stuff that hadn't been allocated gets freed, stuff that was
allocated isn't)
	ceph - d_alloc_root() can fail.  NULL pointer derefs galore...
)

Frankly, d_alloc_root() had been a bad API; it should've been doing
iput() on allocation failure.  I've added a trivial helper in the
local tree (d_make_root(inode) - same as d_alloc_root(inode) and do
iput(inode) if result turns out to be NULL).  Looks like *all* callers
of d_alloc_root() either turn out to be buggy or trivially convert to
d_make_root().  With a lot of boilerplate crap removed...

Hell knows...  Originally I thought about leaving both side-by-side, but
it really starts looking as if there's no reason to keep d_alloc_root()
at all...  I still have a couple of callers to check, though.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-09 21:21         ` Linus Torvalds
@ 2012-01-10  1:34           ` Al Viro
  2012-01-10  2:02             ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Al Viro @ 2012-01-10  1:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Christoph Hellwig, Miklos Szeredi, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Mon, Jan 09, 2012 at 01:21:43PM -0800, Linus Torvalds wrote:
> So I'll happily add my signed-off on that code if Al wants to take
> that edited patch into his tree. Or just commit it directly.

It's there.  The usual place, i.e.
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Al Viro (4):
      isofs: inode leak on mount failure
      ext4: fix failure exits
      ceph: d_alloc_root() may fail
      vfs: new helper - d_make_root()

Dave Chinner (1):
      dcache: use a dispose list in select_parent

Diffstat:
 fs/ceph/super.c        |   15 ++++++--
 fs/dcache.c            |   80 +++++++++++++++++++++++-------------------------
 fs/ext4/super.c        |   13 +++++---
 fs/isofs/inode.c       |    7 +++-
 include/linux/dcache.h |    1 +
 5 files changed, 63 insertions(+), 53 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-10  1:34           ` Al Viro
@ 2012-01-10  2:02             ` Linus Torvalds
  2012-01-10 10:05               ` Miklos Szeredi
  0 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2012-01-10  2:02 UTC (permalink / raw)
  To: Al Viro
  Cc: Dave Chinner, Christoph Hellwig, Miklos Szeredi, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Mon, Jan 9, 2012 at 5:34 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Dave Chinner (1):
>      dcache: use a dispose list in select_parent

Hmm. Should this also have been marked for stable? I'm assuming that
the possible deadlock (strictly speaking it's a livelock, I guess?)
goes back several releases.

Anyway, it's in my tree now, I suspect it would be good to pick at
least as far back as it trivially applies.

Greg - commit b48f03b319ba78f3abf9a7044d1f436d8d90f4f9.

                           Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-10  2:02             ` Linus Torvalds
@ 2012-01-10 10:05               ` Miklos Szeredi
  2012-01-10 16:00                 ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Miklos Szeredi @ 2012-01-10 10:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Tue, Jan 10, 2012 at 3:02 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Jan 9, 2012 at 5:34 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> Dave Chinner (1):
>>      dcache: use a dispose list in select_parent
>
> Hmm. Should this also have been marked for stable? I'm assuming that
> the possible deadlock (strictly speaking it's a livelock, I guess?)

I tested Dave's patch and the bug can still be easily reproduced.

And that's to be expected, as the intermediate "being on the lru"
state that Dave's patch eliminates doesn't play a fundamental part in
the mechanism of the livelock.  It does eliminate one trylock, but
that's not the one critical to this bug (removing it is a very good
idea anyway).

The critical trylock here is the one in dentry_kill() which tries to
lock the parent.  That is basically guaranteed to fail if there are
more then one instances of shrink_dcache_parent() running on the same
dentry, because select_parent() holds that parent lock for a
relatively long time.

With Dave's patch the livelock becomes like this (basically the same
as without the patch):

1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1

2 - CPU1: select_parent(P) locks P->d_lock

3 - CPU0: shrink_dentry_list() locks C->d_lock
   dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock

4 - CPU1: select_parent(P) locks C->d_lock,
         moves C from dispose list being processed on CPU0 to new
dispose list, returns 1

5 - CPU0: shrink_dentry_list() finds dispose list empty, returns

6 - Goto 2 with CPU0 and CPU1 switched

Thanks,
MIklos

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-10 10:05               ` Miklos Szeredi
@ 2012-01-10 16:00                 ` Linus Torvalds
  2012-01-10 16:15                   ` Al Viro
  2012-01-10 16:22                   ` Miklos Szeredi
  0 siblings, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2012-01-10 16:00 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

[-- Attachment #1: Type: text/plain, Size: 910 bytes --]

On Tue, Jan 10, 2012 at 2:05 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> I tested Dave's patch and the bug can still be easily reproduced.
>
> And that's to be expected, as the intermediate "being on the lru"
> state that Dave's patch eliminates doesn't play a fundamental part in
> the mechanism of the livelock.  It does eliminate one trylock, but
> that's not the one critical to this bug (removing it is a very good
> idea anyway).
>
> The critical trylock here is the one in dentry_kill() which tries to
> lock the parent.

Ok. Here's your patch munged for current -git. You've got most of a
changelog, can you send this out with the right subject and a
sign-off, and re-test with the current git just to make sure.

Al, do you want to handle this or should I take it directly? I'm
assuming nobody has any objections to Miklos' patch?

                                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1512 bytes --]

 fs/dcache.c            |    8 +++++---
 include/linux/dcache.h |    1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 3c6d3113a255..6477af24ff0d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -243,6 +243,7 @@ static void dentry_lru_add(struct dentry *dentry)
 static void __dentry_lru_del(struct dentry *dentry)
 {
 	list_del_init(&dentry->d_lru);
+	dentry->d_flags &= ~DCACHE_SHRINK_LIST;
 	dentry->d_sb->s_nr_dentry_unused--;
 	dentry_stat.nr_unused--;
 }
@@ -806,6 +807,7 @@ relock:
 			spin_unlock(&dentry->d_lock);
 		} else {
 			list_move_tail(&dentry->d_lru, &tmp);
+			dentry->d_flags |= DCACHE_SHRINK_LIST;
 			spin_unlock(&dentry->d_lock);
 			if (!--count)
 				break;
@@ -1098,11 +1100,11 @@ resume:
 		/*
 		 * move only zero ref count dentries to the dispose list.
 		 */
-		if (!dentry->d_count) {
+		if (dentry->d_count) {
+			dentry_lru_del(dentry);
+		} else if(!(dentry->d_flags & DCACHE_SHRINK_LIST)) {
 			dentry_lru_move_list(dentry, dispose);
 			found++;
-		} else {
-			dentry_lru_del(dentry);
 		}
 
 		/*
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index a47bda5f76db..31f73220e7d7 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -203,6 +203,7 @@ struct dentry_operations {
 
 #define DCACHE_CANT_MOUNT	0x0100
 #define DCACHE_GENOCIDE		0x0200
+#define DCACHE_SHRINK_LIST	0x0400
 
 #define DCACHE_NFSFS_RENAMED	0x1000
      /* this dentry has been "silly renamed" and has to be deleted on the last

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-10 16:00                 ` Linus Torvalds
@ 2012-01-10 16:15                   ` Al Viro
  2012-01-10 16:22                   ` Miklos Szeredi
  1 sibling, 0 replies; 24+ messages in thread
From: Al Viro @ 2012-01-10 16:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Miklos Szeredi, Dave Chinner, Christoph Hellwig, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Tue, Jan 10, 2012 at 08:00:30AM -0800, Linus Torvalds wrote:
> On Tue, Jan 10, 2012 at 2:05 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > I tested Dave's patch and the bug can still be easily reproduced.
> >
> > And that's to be expected, as the intermediate "being on the lru"
> > state that Dave's patch eliminates doesn't play a fundamental part in
> > the mechanism of the livelock. ?It does eliminate one trylock, but
> > that's not the one critical to this bug (removing it is a very good
> > idea anyway).
> >
> > The critical trylock here is the one in dentry_kill() which tries to
> > lock the parent.
> 
> Ok. Here's your patch munged for current -git. You've got most of a
> changelog, can you send this out with the right subject and a
> sign-off, and re-test with the current git just to make sure.
> 
> Al, do you want to handle this or should I take it directly?

I'll pick it once Miklos posts it.

> I'm assuming nobody has any objections to Miklos' patch?

I'm fine with it.

ObGrrr: I'm down to two remaining d_alloc_root() callers and while
neither is buggy per se, looking around a bit has turned up breakage
in both cases ;-/  (coda lookup treating OOM in new_inode() as
"not found" rather than -ENOMEM and hfs+ mount not bothering to check
if allocation *or* disk IO might fail, with very ugly results)...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-10 16:00                 ` Linus Torvalds
  2012-01-10 16:15                   ` Al Viro
@ 2012-01-10 16:22                   ` Miklos Szeredi
  2012-01-10 16:33                     ` Linus Torvalds
                                       ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Miklos Szeredi @ 2012-01-10 16:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Tue, Jan 10, 2012 at 5:00 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Jan 10, 2012 at 2:05 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> I tested Dave's patch and the bug can still be easily reproduced.
>>
>> And that's to be expected, as the intermediate "being on the lru"
>> state that Dave's patch eliminates doesn't play a fundamental part in
>> the mechanism of the livelock.  It does eliminate one trylock, but
>> that's not the one critical to this bug (removing it is a very good
>> idea anyway).
>>
>> The critical trylock here is the one in dentry_kill() which tries to
>> lock the parent.
>
> Ok. Here's your patch munged for current -git. You've got most of a
> changelog, can you send this out with the right subject and a
> sign-off, and re-test with the current git just to make sure.

See the one with the subject "vfs: fix shrink_dcache_parent()
livelock" I sent out a bit earlier.

You didn't quite get it right: the flag now needs to be set in
select_parent() not prune_dcache_sb().

I think prune_dcache_sb() doesn't need this logic (although it
wouldn't hurt either) because that one is called from the slab
shrinkers and those are protected from being run multiple times in
parallel, I hope.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-10 16:22                   ` Miklos Szeredi
@ 2012-01-10 16:33                     ` Linus Torvalds
  2012-01-10 16:50                       ` Miklos Szeredi
  2012-01-10 18:04                     ` Al Viro
  2012-01-10 21:52                     ` Dave Chinner
  2 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2012-01-10 16:33 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Tue, Jan 10, 2012 at 8:22 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> You didn't quite get it right: the flag now needs to be set in
> select_parent() not prune_dcache_sb().

Ahhah.

> I think prune_dcache_sb() doesn't need this logic (although it
> wouldn't hurt either) because that one is called from the slab
> shrinkers and those are protected from being run multiple times in
> parallel, I hope.

Hmm. Even if they are never run in parallel, I think it would be much
nicer to do it in both, just so that there would be a conceptual
consistency of "when we remove the dentry from the LRU list and put it
on our pruning list, we set the bit". That cacheline will be dirty
anyway (due to the list move and getting the dentry lock), so setting
a bit is not expensive - but having odd inconsistent ad-hoc rules is
nasty.

Al?

                        Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-10 16:33                     ` Linus Torvalds
@ 2012-01-10 16:50                       ` Miklos Szeredi
  0 siblings, 0 replies; 24+ messages in thread
From: Miklos Szeredi @ 2012-01-10 16:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Dave Chinner, Christoph Hellwig, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Tue, Jan 10, 2012 at 5:33 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Hmm. Even if they are never run in parallel, I think it would be much
> nicer to do it in both, just so that there would be a conceptual
> consistency of "when we remove the dentry from the LRU list and put it
> on our pruning list, we set the bit". That cacheline will be dirty
> anyway (due to the list move and getting the dentry lock), so setting
> a bit is not expensive - but having odd inconsistent ad-hoc rules is
> nasty.

Makes sense.

I'm testing the modified patch right now and will post shortly.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-10 16:22                   ` Miklos Szeredi
  2012-01-10 16:33                     ` Linus Torvalds
@ 2012-01-10 18:04                     ` Al Viro
  2012-01-10 21:52                     ` Dave Chinner
  2 siblings, 0 replies; 24+ messages in thread
From: Al Viro @ 2012-01-10 18:04 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, Dave Chinner, Christoph Hellwig, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Tue, Jan 10, 2012 at 05:22:22PM +0100, Miklos Szeredi wrote:

> I think prune_dcache_sb() doesn't need this logic (although it
> wouldn't hurt either) because that one is called from the slab
> shrinkers and those are protected from being run multiple times in
> parallel, I hope.

	Eh...  We already have too much convoluted logics in there, so
let's keep it as non-subtle as possible.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH] shrink_dcache_parent() deadlock
  2012-01-10 16:22                   ` Miklos Szeredi
  2012-01-10 16:33                     ` Linus Torvalds
  2012-01-10 18:04                     ` Al Viro
@ 2012-01-10 21:52                     ` Dave Chinner
  2 siblings, 0 replies; 24+ messages in thread
From: Dave Chinner @ 2012-01-10 21:52 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, Al Viro, Christoph Hellwig, linux-fsdevel,
	linux-kernel, mgorman, gregkh, akpm

On Tue, Jan 10, 2012 at 05:22:22PM +0100, Miklos Szeredi wrote:
> On Tue, Jan 10, 2012 at 5:00 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Tue, Jan 10, 2012 at 2:05 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >> I tested Dave's patch and the bug can still be easily reproduced.
> >>
> >> And that's to be expected, as the intermediate "being on the lru"
> >> state that Dave's patch eliminates doesn't play a fundamental part in
> >> the mechanism of the livelock.  It does eliminate one trylock, but
> >> that's not the one critical to this bug (removing it is a very good
> >> idea anyway).
> >>
> >> The critical trylock here is the one in dentry_kill() which tries to
> >> lock the parent.
> >
> > Ok. Here's your patch munged for current -git. You've got most of a
> > changelog, can you send this out with the right subject and a
> > sign-off, and re-test with the current git just to make sure.
> 
> See the one with the subject "vfs: fix shrink_dcache_parent()
> livelock" I sent out a bit earlier.
> 
> You didn't quite get it right: the flag now needs to be set in
> select_parent() not prune_dcache_sb().
> 
> I think prune_dcache_sb() doesn't need this logic (although it
> wouldn't hurt either) because that one is called from the slab
> shrinkers and those are protected from being run multiple times in
> parallel, I hope.

Shrinkers can be called in parallel by memory reclaim on different
CPUs. The only thing serialising them is the LRU locks.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2012-01-10 21:52 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-09 10:58 [RFC PATCH] shrink_dcache_parent() deadlock Miklos Szeredi
2012-01-09 16:43 ` Linus Torvalds
2012-01-09 17:05   ` Miklos Szeredi
2012-01-09 17:16     ` Greg KH
2012-01-09 17:16 ` Christoph Hellwig
2012-01-09 17:30   ` Al Viro
2012-01-09 18:30     ` Linus Torvalds
2012-01-09 18:46       ` Linus Torvalds
2012-01-09 19:04         ` Christoph Hellwig
2012-01-09 19:18           ` Linus Torvalds
2012-01-09 20:59       ` Dave Chinner
2012-01-09 21:21         ` Linus Torvalds
2012-01-10  1:34           ` Al Viro
2012-01-10  2:02             ` Linus Torvalds
2012-01-10 10:05               ` Miklos Szeredi
2012-01-10 16:00                 ` Linus Torvalds
2012-01-10 16:15                   ` Al Viro
2012-01-10 16:22                   ` Miklos Szeredi
2012-01-10 16:33                     ` Linus Torvalds
2012-01-10 16:50                       ` Miklos Szeredi
2012-01-10 18:04                     ` Al Viro
2012-01-10 21:52                     ` Dave Chinner
2012-01-09 21:26         ` Al Viro
2012-01-09 17:27 ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).