linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix dcache race during umount
@ 2006-06-22 12:22 David Howells
  2006-06-22 14:27 ` David Howells
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: David Howells @ 2006-06-22 12:22 UTC (permalink / raw)
  To: neilb, balbir, akpm, aviro
  Cc: jblunck, dev, olh, dhowells, linux-kernel, linux-fsdevel


Hi Neil,

I'd like to propose an alternative to your patch to fix the dcache race
between unmounting a filesystem and the memory shrinker.

In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
instead of shrink_dcache_anon(), and the latter function is discarded
completely since it's no longer used.

This allows for the possibility of a superblock with multiple dentry trees in
it, such as exist in the patches for NFS superblock sharing that I worked out.

Otherwise, the patches are much the same, with the critical bit being
prune_dcache() ignoring any dentry for which the superblock is being unmounted
if called from shrink_dcache_memory().

I feel that prune_dcache() should probably at some point be merged into its
two callers, since shrink_dcache_parent() and select_parent() can probably
then do a better job of eating a dentry subtree from the leaves inwards, but I
haven't attempted that with this patch.

David
---
The race is that the shrink_dcache_memory() shrinker could get called while a
filesystem is being unmounted, and could try to prune a dentry belonging to
that filesystem.

If it does, then it will call in to iput() on the inode while the dentry is no
longer able to be found by the umounting process.  If iput() takes a while,
generic_shutdown_super() could get all the way though shrink_dcache_parent()
and shrink_dcache_sb() and invalidate_inodes() without ever waiting on this
particular inode.

Eventually the superblock gets freed anyway and if the iput() tried to touch it
(which some filesystems certainly do), it will lose.  The promised
"Self-destruct in 5 seconds" doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry() on
someone else's dentry.  As a down_read_trylock() is used,
shrink_dcache_memory() will no longer try to prune the dentry of a filesystem
that is being unmounted, and unmount will not be able to start until any such
active prune_one_dentry() completes.

This requires that prune_dcache *knows* which filesystem (if any) it is doing
the prune on behalf of so that it can be careful of other filesystems.
shrink_dcache_memory() isn't called it on behalf of any filesystem, and so is
careful of everything.

generic_shutdown_super() now calls shrink_dcache_sb() rather than
shrink_dcache_anon() so that superblocks with multiple dentry trees can be
dealt with.  shrink_dcache_anon() is discarded since nothing then uses it.

If prune_dcache() finds a dentry that it cannot free, it leaves it where it is
(at the tail of the list) and exits, on the assumption that some other thread
will be removing that dentry soon.  To try to make sure that some work gets
done, a limited number of dnetries which are untouchable are skipped over while
choosing the dentry to work on.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck <jblunck@suse.de>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Olaf Hering <olh@suse.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-Off-By: David Howells <dhowells@redhat.com>
---
warthog>diffstat -p1 fix-dcache-race-during-umount-2.patch 
 fs/dcache.c            |  100 +++++++++++++++++++++++++++----------------------
 fs/super.c             |    2 
 include/linux/dcache.h |    1 
 3 files changed, 58 insertions(+), 45 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 940d188..c88e426 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -382,6 +382,8 @@ static inline void prune_one_dentry(stru
 /**
  * prune_dcache - shrink the dcache
  * @count: number of entries to try and free
+ * @sb: if given, ignore dentries for other superblocks
+ *         which are being unmounted.
  *
  * Shrink the dcache. This is done when we need
  * more memory, or simply when we need to unmount
@@ -392,16 +394,29 @@ static inline void prune_one_dentry(stru
  * all the dentries are in use.
  */
  
-static void prune_dcache(int count)
+static void prune_dcache(int count, struct super_block *sb)
 {
 	spin_lock(&dcache_lock);
 	for (; count ; count--) {
 		struct dentry *dentry;
 		struct list_head *tmp;
+		struct rw_semaphore *s_umount;
 
 		cond_resched_lock(&dcache_lock);
 
 		tmp = dentry_unused.prev;
+		if (unlikely(sb)) {
+			/* Try to find a dentry for this sb, but don't try
+			 * too hard, if they aren't near the tail they will
+			 * be moved down again soon
+			 */
+			int skip = count;
+			while (skip && tmp != &dentry_unused &&
+			    list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
+				skip--;
+				tmp = tmp->prev;
+			}
+		}
 		if (tmp == &dentry_unused)
 			break;
 		list_del_init(tmp);
@@ -427,7 +442,45 @@ static void prune_dcache(int count)
  			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		prune_one_dentry(dentry);
+		/*
+		 * If the dentry is not DCACHED_REFERENCED, it is time
+		 * to remove it from the dcache, provided the super block is
+		 * NULL (which means we are trying to reclaim memory)
+		 * or this dentry belongs to the same super block that
+		 * we want to shrink.
+		 */
+		/*
+		 * If this dentry is for "my" filesystem, then I can prune it
+		 * without taking the s_umount lock (I already hold it).
+		 */
+		if (sb && dentry->d_sb == sb) {
+			prune_one_dentry(dentry);
+			continue;
+		}
+		/*
+		 * ...otherwise we need to be sure this filesystem isn't being
+		 * unmounted, otherwise we could race with
+		 * generic_shutdown_super(), and end up holding a reference to
+		 * an inode while the filesystem is unmounted.
+		 * So we try to get s_umount, and make sure s_root isn't NULL.
+		 * (Take a local copy of s_umount to avoid a use-after-free of
+		 * `dentry').
+		 */
+		s_umount = &dentry->d_sb->s_umount;
+		if (down_read_trylock(s_umount)) {
+			if (dentry->d_sb->s_root != NULL) {
+				prune_one_dentry(dentry);
+				up_read(s_umount);
+				continue;
+			}
+			up_read(s_umount);
+		}
+		spin_unlock(&dentry->d_lock);
+		/* Cannot remove the first dentry, and it isn't appropriate
+		 * to move it to the head of the list, so give up, and try
+		 * later
+		 */
+		break;
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -630,46 +683,7 @@ void shrink_dcache_parent(struct dentry 
 	int found;
 
 	while ((found = select_parent(parent)) != 0)
-		prune_dcache(found);
-}
-
-/**
- * shrink_dcache_anon - further prune the cache
- * @head: head of d_hash list of dentries to prune
- *
- * Prune the dentries that are anonymous
- *
- * parsing d_hash list does not hlist_for_each_entry_rcu() as it
- * done under dcache_lock.
- *
- */
-void shrink_dcache_anon(struct hlist_head *head)
-{
-	struct hlist_node *lp;
-	int found;
-	do {
-		found = 0;
-		spin_lock(&dcache_lock);
-		hlist_for_each(lp, head) {
-			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
-			if (!list_empty(&this->d_lru)) {
-				dentry_stat.nr_unused--;
-				list_del_init(&this->d_lru);
-			}
-
-			/* 
-			 * move only zero ref count dentries to the end 
-			 * of the unused list for prune_dcache
-			 */
-			if (!atomic_read(&this->d_count)) {
-				list_add_tail(&this->d_lru, &dentry_unused);
-				dentry_stat.nr_unused++;
-				found++;
-			}
-		}
-		spin_unlock(&dcache_lock);
-		prune_dcache(found);
-	} while(found);
+		prune_dcache(found, parent->d_sb);
 }
 
 /*
@@ -689,7 +703,7 @@ static int shrink_dcache_memory(int nr, 
 	if (nr) {
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
-		prune_dcache(nr);
+		prune_dcache(nr, NULL);
 	}
 	return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
 }
diff --git a/fs/super.c b/fs/super.c
index 15f2afd..beb0c11 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -231,7 +231,7 @@ void generic_shutdown_super(struct super
 	if (root) {
 		sb->s_root = NULL;
 		shrink_dcache_parent(root);
-		shrink_dcache_anon(&sb->s_anon);
+		shrink_dcache_sb(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 836325e..0dd1610 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -217,7 +217,6 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct hlist_head *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 12:22 [PATCH] Fix dcache race during umount David Howells
@ 2006-06-22 14:27 ` David Howells
  2006-06-22 16:08 ` Jan Blunck
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: David Howells @ 2006-06-22 14:27 UTC (permalink / raw)
  To: David Howells
  Cc: neilb, balbir, akpm, aviro, jblunck, dev, olh, linux-kernel,
	linux-fsdevel

David Howells <dhowells@redhat.com> wrote:

> I'd like to propose an alternative to your patch to fix the dcache race
> between unmounting a filesystem and the memory shrinker.
> 
> In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
> instead of shrink_dcache_anon(), and the latter function is discarded
> completely since it's no longer used.

On the other hand, I can make my patch just alter the effect of yours.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 12:22 [PATCH] Fix dcache race during umount David Howells
  2006-06-22 14:27 ` David Howells
@ 2006-06-22 16:08 ` Jan Blunck
  2006-06-22 16:44   ` Jan Blunck
  2006-06-23 13:28 ` Jan Blunck
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Jan Blunck @ 2006-06-22 16:08 UTC (permalink / raw)
  To: David Howells
  Cc: neilb, balbir, akpm, aviro, dev, olh, linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1105 bytes --]

On Thu, Jun 22, David Howells wrote:

> I'd like to propose an alternative to your patch to fix the dcache race
> between unmounting a filesystem and the memory shrinker.
> 
> In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
> instead of shrink_dcache_anon(), and the latter function is discarded
> completely since it's no longer used.

I had a similar patch. But after calling shrink_dcache_sb() in
generic_shutdown_super() the call to shrink_dcache_parent() is not necessary
anymore. And you should also fix d_genocide() that it is putting unused
dentries to the LRU list.

> I feel that prune_dcache() should probably at some point be merged into its
> two callers, since shrink_dcache_parent() and select_parent() can probably
> then do a better job of eating a dentry subtree from the leaves inwards, but I
> haven't attempted that with this patch.

Hmm, yes. The problem is that we only have a valid reference to the root
dentry of the subtree that we shrink. So we have to get a reference for the
parent of each dentry that we prune before calling prune_one_dentry().

Jan

[-- Attachment #2: combined.diff --]
[-- Type: text/x-patch, Size: 5535 bytes --]

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -384,9 +384,8 @@ static inline void prune_one_dentry(stru
  * @count: number of entries to try and free
  *
  * Shrink the dcache. This is done when we need
- * more memory, or simply when we need to unmount
- * something (at which point we need to unuse
- * all dentries).
+ * more memory. When we need to unmount something
+ * we call shrink_dcache_sb().
  *
  * This function may fail to free any resources if
  * all the dentries are in use.
@@ -419,15 +418,26 @@ static void prune_dcache(int count)
 			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		/* If the dentry was recently referenced, don't free it. */
-		if (dentry->d_flags & DCACHE_REFERENCED) {
-			dentry->d_flags &= ~DCACHE_REFERENCED;
-			list_add(&dentry->d_lru, &dentry_unused);
-			dentry_stat.nr_unused++;
-			spin_unlock(&dentry->d_lock);
-			continue;
+		/* If the dentry was recently referenced, or dentry's
+		 * filesystem is going to be unmounted, don't free it. */
+		if (!(dentry->d_flags & DCACHE_REFERENCED) &&
+		    down_read_trylock(&dentry->d_sb->s_umount)) {
+			struct super_block *sb = dentry->d_sb;
+
+			if (dentry->d_sb->s_root) {
+				prune_one_dentry(dentry);
+				up_read(&sb->s_umount);
+				continue;
+			}
+			up_read(&sb->s_umount);
 		}
-		prune_one_dentry(dentry);
+		/* Append it at the beginning of the list, because either it
+		 * was recently reference or the dentry's filesystem is
+		 * unmounted so shrink_dcache_sb() can find it faster. */
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		list_add(&dentry->d_lru, &dentry_unused);
+		dentry_stat.nr_unused++;
+		spin_unlock(&dentry->d_lock);
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -464,12 +474,10 @@ void shrink_dcache_sb(struct super_block
 	 * superblock to the most recent end of the unused list.
 	 */
 	spin_lock(&dcache_lock);
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
+	list_for_each_entry_safe(dentry, pos, &dentry_unused, d_lru) {
 		if (dentry->d_sb != sb)
 			continue;
-		list_del(tmp);
-		list_add(tmp, &dentry_unused);
+		list_move(&dentry->d_lru, &dentry_unused);
 	}
 
 	/*
@@ -633,45 +641,6 @@ void shrink_dcache_parent(struct dentry 
 		prune_dcache(found);
 }
 
-/**
- * shrink_dcache_anon - further prune the cache
- * @head: head of d_hash list of dentries to prune
- *
- * Prune the dentries that are anonymous
- *
- * parsing d_hash list does not hlist_for_each_entry_rcu() as it
- * done under dcache_lock.
- *
- */
-void shrink_dcache_anon(struct hlist_head *head)
-{
-	struct hlist_node *lp;
-	int found;
-	do {
-		found = 0;
-		spin_lock(&dcache_lock);
-		hlist_for_each(lp, head) {
-			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
-			if (!list_empty(&this->d_lru)) {
-				dentry_stat.nr_unused--;
-				list_del_init(&this->d_lru);
-			}
-
-			/*
-			 * move only zero ref count dentries to the end
-			 * of the unused list for prune_dcache
-			 */
-			if (!atomic_read(&this->d_count)) {
-				list_add_tail(&this->d_lru, &dentry_unused);
-				dentry_stat.nr_unused++;
-				found++;
-			}
-		}
-		spin_unlock(&dcache_lock);
-		prune_dcache(found);
-	} while(found);
-}
-
 /*
  * Scan `nr' dentries and return the number which remain.
  *
@@ -1604,19 +1573,38 @@ repeat:
 resume:
 	while (next != &this_parent->d_subdirs) {
 		struct list_head *tmp = next;
-		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
+		struct dentry *dentry = list_entry(tmp, struct dentry,
+						   d_u.d_child);
 		next = tmp->next;
+
 		if (d_unhashed(dentry)||!dentry->d_inode)
 			continue;
+
+		if (!list_empty(&dentry->d_lru)) {
+			dentry_stat.nr_unused--;
+			list_del_init(&dentry->d_lru);
+		}
+		/*
+		 * We can lower the reference count here:
+		 * - if the refcount is zero afterwards, the dentry hasn't got
+		 *   any children
+		 * - if the recount isn't zero afterwards, we visit the
+		 *   chrildren next
+		 * - because we always hold the dcache lock, nobody else can
+		 *   kill the unused dentries yet
+		 */
+		if (atomic_dec_and_test(&dentry->d_count)) {
+			list_add_tail(&dentry->d_lru, &dentry_unused);
+			dentry_stat.nr_unused++;
+		}
+
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
 			goto repeat;
 		}
-		atomic_dec(&dentry->d_count);
 	}
 	if (this_parent != root) {
 		next = this_parent->d_u.d_child.next;
-		atomic_dec(&this_parent->d_count);
 		this_parent = this_parent->d_parent;
 		goto resume;
 	}
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -230,8 +230,7 @@ void generic_shutdown_super(struct super
 
 	if (root) {
 		sb->s_root = NULL;
-		shrink_dcache_parent(root);
-		shrink_dcache_anon(&sb->s_anon);
+		shrink_dcache_sb(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -217,7 +217,6 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct hlist_head *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 16:08 ` Jan Blunck
@ 2006-06-22 16:44   ` Jan Blunck
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Blunck @ 2006-06-22 16:44 UTC (permalink / raw)
  To: David Howells
  Cc: neilb, balbir, akpm, aviro, dev, olh, linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

On Thu, Jun 22, Jan Blunck wrote:

> I had a similar patch. But after calling shrink_dcache_sb() in
> generic_shutdown_super() the call to shrink_dcache_parent() is not necessary
> anymore. And you should also fix d_genocide() that it is putting unused
> dentries to the LRU list.

And I even have a patch that does compile ... so please ignore the previous
one.

Jan

[-- Attachment #2: combined.diff --]
[-- Type: text/x-patch, Size: 6207 bytes --]

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -384,9 +384,8 @@ static inline void prune_one_dentry(stru
  * @count: number of entries to try and free
  *
  * Shrink the dcache. This is done when we need
- * more memory, or simply when we need to unmount
- * something (at which point we need to unuse
- * all dentries).
+ * more memory. When we need to unmount something
+ * we call shrink_dcache_sb().
  *
  * This function may fail to free any resources if
  * all the dentries are in use.
@@ -419,15 +418,26 @@ static void prune_dcache(int count)
  			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		/* If the dentry was recently referenced, don't free it. */
-		if (dentry->d_flags & DCACHE_REFERENCED) {
-			dentry->d_flags &= ~DCACHE_REFERENCED;
- 			list_add(&dentry->d_lru, &dentry_unused);
- 			dentry_stat.nr_unused++;
- 			spin_unlock(&dentry->d_lock);
-			continue;
+		/* If the dentry was recently referenced, or dentry's
+		 * filesystem is going to be unmounted, don't free it. */
+		if (!(dentry->d_flags & DCACHE_REFERENCED) &&
+		    down_read_trylock(&dentry->d_sb->s_umount)) {
+			struct super_block *sb = dentry->d_sb;
+
+			if (dentry->d_sb->s_root) {
+				prune_one_dentry(dentry);
+				up_read(&sb->s_umount);
+				continue;
+			}
+			up_read(&sb->s_umount);
 		}
-		prune_one_dentry(dentry);
+		/* Append it at the beginning of the list, because either it
+		 * was recently reference or the dentry's filesystem is
+		 * unmounted so shrink_dcache_sb() can find it faster. */
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		list_add(&dentry->d_lru, &dentry_unused);
+		dentry_stat.nr_unused++;
+		spin_unlock(&dentry->d_lock);
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -456,32 +466,28 @@ static void prune_dcache(int count)
 
 void shrink_dcache_sb(struct super_block * sb)
 {
-	struct list_head *tmp, *next;
-	struct dentry *dentry;
+	struct dentry *dentry, *pos;
 
 	/*
 	 * Pass one ... move the dentries for the specified
 	 * superblock to the most recent end of the unused list.
 	 */
 	spin_lock(&dcache_lock);
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
+	list_for_each_entry_safe(dentry, pos, &dentry_unused, d_lru) {
 		if (dentry->d_sb != sb)
 			continue;
-		list_del(tmp);
-		list_add(tmp, &dentry_unused);
+		list_move(&dentry->d_lru, &dentry_unused);
 	}
 
 	/*
 	 * Pass two ... free the dentries for this superblock.
 	 */
 repeat:
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
+	list_for_each_entry_safe(dentry, pos, &dentry_unused, d_lru) {
 		if (dentry->d_sb != sb)
 			continue;
 		dentry_stat.nr_unused--;
-		list_del_init(tmp);
+		list_del_init(&dentry->d_lru);
 		spin_lock(&dentry->d_lock);
 		if (atomic_read(&dentry->d_count)) {
 			spin_unlock(&dentry->d_lock);
@@ -633,45 +639,6 @@ void shrink_dcache_parent(struct dentry 
 		prune_dcache(found);
 }
 
-/**
- * shrink_dcache_anon - further prune the cache
- * @head: head of d_hash list of dentries to prune
- *
- * Prune the dentries that are anonymous
- *
- * parsing d_hash list does not hlist_for_each_entry_rcu() as it
- * done under dcache_lock.
- *
- */
-void shrink_dcache_anon(struct hlist_head *head)
-{
-	struct hlist_node *lp;
-	int found;
-	do {
-		found = 0;
-		spin_lock(&dcache_lock);
-		hlist_for_each(lp, head) {
-			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
-			if (!list_empty(&this->d_lru)) {
-				dentry_stat.nr_unused--;
-				list_del_init(&this->d_lru);
-			}
-
-			/* 
-			 * move only zero ref count dentries to the end 
-			 * of the unused list for prune_dcache
-			 */
-			if (!atomic_read(&this->d_count)) {
-				list_add_tail(&this->d_lru, &dentry_unused);
-				dentry_stat.nr_unused++;
-				found++;
-			}
-		}
-		spin_unlock(&dcache_lock);
-		prune_dcache(found);
-	} while(found);
-}
-
 /*
  * Scan `nr' dentries and return the number which remain.
  *
@@ -1604,19 +1571,38 @@ repeat:
 resume:
 	while (next != &this_parent->d_subdirs) {
 		struct list_head *tmp = next;
-		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
+		struct dentry *dentry = list_entry(tmp, struct dentry,
+						   d_u.d_child);
 		next = tmp->next;
+
 		if (d_unhashed(dentry)||!dentry->d_inode)
 			continue;
+
+		if (!list_empty(&dentry->d_lru)) {
+			dentry_stat.nr_unused--;
+			list_del_init(&dentry->d_lru);
+		}
+		/*
+		 * We can lower the reference count here:
+		 * - if the refcount is zero afterwards, the dentry hasn't got
+		 *   any children
+		 * - if the recount isn't zero afterwards, we visit the
+		 *   chrildren next
+		 * - because we always hold the dcache lock, nobody else can
+		 *   kill the unused dentries yet
+		 */
+		if (atomic_dec_and_test(&dentry->d_count)) {
+			list_add_tail(&dentry->d_lru, &dentry_unused);
+			dentry_stat.nr_unused++;
+		}
+
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
 			goto repeat;
 		}
-		atomic_dec(&dentry->d_count);
 	}
 	if (this_parent != root) {
 		next = this_parent->d_u.d_child.next;
-		atomic_dec(&this_parent->d_count);
 		this_parent = this_parent->d_parent;
 		goto resume;
 	}
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -230,8 +230,7 @@ void generic_shutdown_super(struct super
 
 	if (root) {
 		sb->s_root = NULL;
-		shrink_dcache_parent(root);
-		shrink_dcache_anon(&sb->s_anon);
+		shrink_dcache_sb(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -217,7 +217,6 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct hlist_head *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 12:22 [PATCH] Fix dcache race during umount David Howells
  2006-06-22 14:27 ` David Howells
  2006-06-22 16:08 ` Jan Blunck
@ 2006-06-23 13:28 ` Jan Blunck
  2006-06-24  5:23 ` Neil Brown
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Jan Blunck @ 2006-06-23 13:28 UTC (permalink / raw)
  To: David Howells
  Cc: neilb, balbir, akpm, aviro, dev, olh, linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 545 bytes --]

On Thu, Jun 22, David Howells wrote:

> I'd like to propose an alternative to your patch to fix the dcache race
> between unmounting a filesystem and the memory shrinker.
> 
> In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
> instead of shrink_dcache_anon(), and the latter function is discarded
> completely since it's no longer used.

I've updated my patches to Linus git repository from today. Additionally, I've
changed shrink_dcache_parent() to not use prune_dcache() anymore.

Comments are welcome.

Regards,
	Jan

[-- Attachment #2: combined.diff --]
[-- Type: text/x-patch, Size: 10947 bytes --]

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -387,101 +387,96 @@ static void prune_one_dentry(struct dent
  *         which are being unmounted.
  *
  * Shrink the dcache. This is done when we need
- * more memory, or simply when we need to unmount
- * something (at which point we need to unuse
- * all dentries).
+ * more memory. When we need to unmount something
+ * we call shrink_dcache_sb().
  *
  * This function may fail to free any resources if
  * all the dentries are in use.
  */
- 
-static void prune_dcache(int count, struct super_block *sb)
+static void prune_dcache(int count)
 {
 	spin_lock(&dcache_lock);
 	for (; count ; count--) {
 		struct dentry *dentry;
 		struct list_head *tmp;
-		struct rw_semaphore *s_umount;
 
 		cond_resched_lock(&dcache_lock);
 
 		tmp = dentry_unused.prev;
-		if (unlikely(sb)) {
-			/* Try to find a dentry for this sb, but don't try
-			 * too hard, if they aren't near the tail they will
-			 * be moved down again soon
-			 */
-			int skip = count;
-			while (skip && tmp != &dentry_unused &&
-			    list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
-				skip--;
-				tmp = tmp->prev;
-			}
-		}
 		if (tmp == &dentry_unused)
 			break;
-		list_del_init(tmp);
-		prefetch(dentry_unused.prev);
- 		dentry_stat.nr_unused--;
+		/*
+		 * We want to prefetch the predecessor of our predecessor since
+		 * our predecessor is already accessed in list_del_init().
+		 */
+		prefetch(tmp->prev->prev);
 		dentry = list_entry(tmp, struct dentry, d_lru);
+		dentry_stat.nr_unused--;
+		list_del_init(&dentry->d_lru);
 
- 		spin_lock(&dentry->d_lock);
+		spin_lock(&dentry->d_lock);
 		/*
 		 * We found an inuse dentry which was not removed from
-		 * dentry_unused because of laziness during lookup.  Do not free
+		 * dentry_unused because of laziness during lookup. Do not free
 		 * it - just keep it off the dentry_unused list.
 		 */
- 		if (atomic_read(&dentry->d_count)) {
- 			spin_unlock(&dentry->d_lock);
-			continue;
-		}
-		/* If the dentry was recently referenced, don't free it. */
-		if (dentry->d_flags & DCACHE_REFERENCED) {
-			dentry->d_flags &= ~DCACHE_REFERENCED;
- 			list_add(&dentry->d_lru, &dentry_unused);
- 			dentry_stat.nr_unused++;
- 			spin_unlock(&dentry->d_lock);
-			continue;
-		}
-		/*
-		 * If the dentry is not DCACHED_REFERENCED, it is time
-		 * to remove it from the dcache, provided the super block is
-		 * NULL (which means we are trying to reclaim memory)
-		 * or this dentry belongs to the same super block that
-		 * we want to shrink.
-		 */
-		/*
-		 * If this dentry is for "my" filesystem, then I can prune it
-		 * without taking the s_umount lock (I already hold it).
-		 */
-		if (sb && dentry->d_sb == sb) {
-			prune_one_dentry(dentry);
+		if (atomic_read(&dentry->d_count)) {
+			spin_unlock(&dentry->d_lock);
 			continue;
 		}
 		/*
-		 * ...otherwise we need to be sure this filesystem isn't being
-		 * unmounted, otherwise we could race with
-		 * generic_shutdown_super(), and end up holding a reference to
-		 * an inode while the filesystem is unmounted.
-		 * So we try to get s_umount, and make sure s_root isn't NULL.
-		 * (Take a local copy of s_umount to avoid a use-after-free of
-		 * `dentry').
+		 * If the dentry was recently referenced, or the dentry's
+		 * filesystem is going to be unmounted, don't free it.
 		 */
-		s_umount = &dentry->d_sb->s_umount;
-		if (down_read_trylock(s_umount)) {
-			if (dentry->d_sb->s_root != NULL) {
+		if (!(dentry->d_flags & DCACHE_REFERENCED) &&
+		    likely(down_read_trylock(&dentry->d_sb->s_umount))) {
+			struct super_block *sb = dentry->d_sb;
+
+			if (dentry->d_sb->s_root) {
 				prune_one_dentry(dentry);
-				up_read(s_umount);
+				up_read(&sb->s_umount);
 				continue;
 			}
-			up_read(s_umount);
+			up_read(&sb->s_umount);
 		}
-		spin_unlock(&dentry->d_lock);
-		/* Cannot remove the first dentry, and it isn't appropriate
-		 * to move it to the head of the list, so give up, and try
-		 * later
+		/*
+		 * Either the dentry was recently referenced or its filesystem
+		 * is going to be unmounted. Add the dentry to the beginning of
+		 * the LRU list so shrink_dcache_sb() can find it faster.
 		 */
-		break;
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		list_add(&dentry->d_lru, &dentry_unused);
+		dentry_stat.nr_unused++;
+		spin_unlock(&dentry->d_lock);
+	}
+	spin_unlock(&dcache_lock);
+}
+
+/*
+ * __shrink_dcache_sb - Shrink the dentries for a superblock
+ * @sb: superblock
+ * @list: list of dentries
+ *
+ * Prune a list of dentries of a superblock.
+ */
+static void __shrink_dcache_sb(struct super_block *sb, struct list_head *list)
+{
+	struct dentry *dentry, *pos;
+
+	spin_lock(&dcache_lock);
+repeat:
+	list_for_each_entry_safe(dentry, pos, list, d_lru) {
+		BUG_ON(dentry->d_sb != sb);
+		dentry_stat.nr_unused--;
+		list_del_init(&dentry->d_lru);
+		spin_lock(&dentry->d_lock);
+		if (atomic_read(&dentry->d_count)) {
+			spin_unlock(&dentry->d_lock);
+			continue;
+		}
+		prune_one_dentry(dentry);
+		cond_resched_lock(&dcache_lock);
+		goto repeat;
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -505,47 +500,31 @@ static void prune_dcache(int count, stru
  *
  * Shrink the dcache for the specified super block. This
  * is used to free the dcache before unmounting a file
- * system
+ * system.
+ *
+ * This must be called with sb->s_umount locked.
  */
-
 void shrink_dcache_sb(struct super_block * sb)
 {
-	struct list_head *tmp, *next;
-	struct dentry *dentry;
+	struct dentry *dentry, *pos;
+	LIST_HEAD(list);
 
 	/*
 	 * Pass one ... move the dentries for the specified
-	 * superblock to the most recent end of the unused list.
+	 * superblock to the temporarily list.
 	 */
 	spin_lock(&dcache_lock);
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
+	list_for_each_entry_safe(dentry, pos, &dentry_unused, d_lru) {
 		if (dentry->d_sb != sb)
 			continue;
-		list_del(tmp);
-		list_add(tmp, &dentry_unused);
+		list_move(&dentry->d_lru, &list);
 	}
+	spin_unlock(&dcache_lock);
 
 	/*
 	 * Pass two ... free the dentries for this superblock.
 	 */
-repeat:
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
-		if (dentry->d_sb != sb)
-			continue;
-		dentry_stat.nr_unused--;
-		list_del_init(tmp);
-		spin_lock(&dentry->d_lock);
-		if (atomic_read(&dentry->d_count)) {
-			spin_unlock(&dentry->d_lock);
-			continue;
-		}
-		prune_one_dentry(dentry);
-		cond_resched_lock(&dcache_lock);
-		goto repeat;
-	}
-	spin_unlock(&dcache_lock);
+	__shrink_dcache_sb(sb, &list);
 }
 
 /*
@@ -614,7 +593,7 @@ positive:
  * drop the lock and return early due to latency
  * constraints.
  */
-static int select_parent(struct dentry * parent)
+static int select_parent(struct dentry * parent, struct list_head *list)
 {
 	struct dentry *this_parent = parent;
 	struct list_head *next;
@@ -638,7 +617,7 @@ resume:
 		 * of the unused list for prune_dcache
 		 */
 		if (!atomic_read(&dentry->d_count)) {
-			list_add(&dentry->d_lru, dentry_unused.prev);
+			list_add(&dentry->d_lru, list);
 			dentry_stat.nr_unused++;
 			found++;
 		}
@@ -681,50 +660,11 @@ out:
  
 void shrink_dcache_parent(struct dentry * parent)
 {
+	LIST_HEAD(list);
 	int found;
 
-	while ((found = select_parent(parent)) != 0)
-		prune_dcache(found, parent->d_sb);
-}
-
-/**
- * shrink_dcache_anon - further prune the cache
- * @head: head of d_hash list of dentries to prune
- *
- * Prune the dentries that are anonymous
- *
- * parsing d_hash list does not hlist_for_each_entry_rcu() as it
- * done under dcache_lock.
- *
- */
-void shrink_dcache_anon(struct super_block *sb)
-{
-	struct hlist_node *lp;
-	struct hlist_head *head = &sb->s_anon;
-	int found;
-	do {
-		found = 0;
-		spin_lock(&dcache_lock);
-		hlist_for_each(lp, head) {
-			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
-			if (!list_empty(&this->d_lru)) {
-				dentry_stat.nr_unused--;
-				list_del_init(&this->d_lru);
-			}
-
-			/* 
-			 * move only zero ref count dentries to the end 
-			 * of the unused list for prune_dcache
-			 */
-			if (!atomic_read(&this->d_count)) {
-				list_add_tail(&this->d_lru, &dentry_unused);
-				dentry_stat.nr_unused++;
-				found++;
-			}
-		}
-		spin_unlock(&dcache_lock);
-		prune_dcache(found, sb);
-	} while(found);
+	while ((found = select_parent(parent, &list)) != 0)
+		__shrink_dcache_sb(parent->d_sb, &list);
 }
 
 /*
@@ -744,7 +684,7 @@ static int shrink_dcache_memory(int nr, 
 	if (nr) {
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
-		prune_dcache(nr, NULL);
+		prune_dcache(nr);
 	}
 	return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
 }
@@ -1659,19 +1599,38 @@ repeat:
 resume:
 	while (next != &this_parent->d_subdirs) {
 		struct list_head *tmp = next;
-		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
+		struct dentry *dentry = list_entry(tmp, struct dentry,
+						   d_u.d_child);
 		next = tmp->next;
+
 		if (d_unhashed(dentry)||!dentry->d_inode)
 			continue;
+
+		if (!list_empty(&dentry->d_lru)) {
+			dentry_stat.nr_unused--;
+			list_del_init(&dentry->d_lru);
+		}
+		/*
+		 * We can lower the reference count here:
+		 * - if the refcount is zero afterwards, the dentry hasn't got
+		 *   any children
+		 * - if the recount isn't zero afterwards, we visit the
+		 *   chrildren next
+		 * - because we always hold the dcache lock, nobody else can
+		 *   kill the unused dentries yet
+		 */
+		if (atomic_dec_and_test(&dentry->d_count)) {
+			list_add_tail(&dentry->d_lru, &dentry_unused);
+			dentry_stat.nr_unused++;
+		}
+
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
 			goto repeat;
 		}
-		atomic_dec(&dentry->d_count);
 	}
 	if (this_parent != root) {
 		next = this_parent->d_u.d_child.next;
-		atomic_dec(&this_parent->d_count);
 		this_parent = this_parent->d_parent;
 		goto resume;
 	}
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -230,8 +230,7 @@ void generic_shutdown_super(struct super
 
 	if (root) {
 		sb->s_root = NULL;
-		shrink_dcache_parent(root);
-		shrink_dcache_anon(sb);
+		shrink_dcache_sb(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -217,7 +217,6 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct super_block *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 12:22 [PATCH] Fix dcache race during umount David Howells
                   ` (2 preceding siblings ...)
  2006-06-23 13:28 ` Jan Blunck
@ 2006-06-24  5:23 ` Neil Brown
  2006-06-24  9:16 ` David Howells
  2006-06-24 13:33 ` [PATCH] Destroy the dentries contributed by a superblock on unmounting David Howells
  5 siblings, 0 replies; 23+ messages in thread
From: Neil Brown @ 2006-06-24  5:23 UTC (permalink / raw)
  To: David Howells
  Cc: balbir, akpm, aviro, jblunck, dev, olh, linux-kernel, linux-fsdevel

On Thursday June 22, dhowells@redhat.com wrote:
> 
> Hi Neil,
> 
> I'd like to propose an alternative to your patch to fix the dcache race
> between unmounting a filesystem and the memory shrinker.
> 
> In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
> instead of shrink_dcache_anon(), and the latter function is discarded
> completely since it's no longer used.

Is that a good idea?
I was under the impression that on large machines, the shear size of
the dentry_unused list makes scanning all of it under the dcache_lock
an unpleasant thing to do.

Do you not have easy access to the roots of all trees in your
super-block-sharing situation so that shrink_dcache_parent can be
called on them all?

I would have thought that we want to get rid of shrink_dcache_sb rather
than create more users of it ??

> 
> I feel that prune_dcache() should probably at some point be merged into its
> two callers, since shrink_dcache_parent() and select_parent() can probably
> then do a better job of eating a dentry subtree from the leaves inwards, but I
> haven't attempted that with this patch.
> 

I think this is true, prune_dcache is serving two masters and is
overly complex as a result.

NeilBrown

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 12:22 [PATCH] Fix dcache race during umount David Howells
                   ` (3 preceding siblings ...)
  2006-06-24  5:23 ` Neil Brown
@ 2006-06-24  9:16 ` David Howells
  2006-06-24 13:33 ` [PATCH] Destroy the dentries contributed by a superblock on unmounting David Howells
  5 siblings, 0 replies; 23+ messages in thread
From: David Howells @ 2006-06-24  9:16 UTC (permalink / raw)
  To: Neil Brown
  Cc: David Howells, balbir, akpm, aviro, jblunck, dev, olh,
	linux-kernel, linux-fsdevel

Neil Brown <neilb@suse.de> wrote:

> > In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
> > instead of shrink_dcache_anon(), and the latter function is discarded
> > completely since it's no longer used.
> 
> Is that a good idea?

It depends on how often you expect unmounts to be happening, I suppose.

> Do you not have easy access to the roots of all trees in your
> super-block-sharing situation so that shrink_dcache_parent can be
> called on them all?

Well, all the roots are on the anon list, it's just that shrink_dcache_anon()
can't get rid of any root that's got children.

For unmounting specifically, we can do better as we can consume the dentry
trees directly.  That's not too difficult when we can unconditionally destroy
them from the leaves inwards.  That way we could probably avoid calling
shrink_dcache_parent() also - stick the tree at s_root on to the anon list
during unmount and have a single algorithm to wipe away the whole lot from
there.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] Destroy the dentries contributed by a superblock on unmounting
  2006-06-22 12:22 [PATCH] Fix dcache race during umount David Howells
                   ` (4 preceding siblings ...)
  2006-06-24  9:16 ` David Howells
@ 2006-06-24 13:33 ` David Howells
  2006-06-25  6:48   ` Neil Brown
  2006-06-25 16:02   ` David Howells
  5 siblings, 2 replies; 23+ messages in thread
From: David Howells @ 2006-06-24 13:33 UTC (permalink / raw)
  To: Neil Brown
  Cc: David Howells, balbir, akpm, aviro, jblunck, dev, olh,
	linux-kernel, linux-fsdevel


Neil Brown <neilb@suse.de> wrote:

> Do you not have easy access to the roots of all trees in your
> super-block-sharing situation so that shrink_dcache_parent can be
> called on them all?

How about this?

David
---
From: David Howells <dhowells@redhat.com>

The attached patch destroys all the dentries attached to a superblock in one go
by:

 (1) Destroying the tree rooted at s_root.

 (2) Destroying every entry in the anon list, one at a time.

 (3) Each entry in the anon list has its subtree consumed from the leaves
     inwards.

This reduces the amount of work generic_shutdown_super() does, and avoids
iterating through the dentry_unused list.

The locking could be reduced if it can be guaranteed that the filesystem being
unmounted won't rearrange its dentry tree whilst we're trying to unmount it.
Unfortunately, I'm not sure this holds true for network filesystems,
particularly those that have callbacks, leases, oplocks or whatever to tell
them things have changed on the server.

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 fs/dcache.c            |  111 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/super.c             |    8 +--
 include/linux/dcache.h |    1 
 3 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 454b6af..cb630f8 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -549,6 +549,117 @@ repeat:
 }
 
 /*
+ * destroy one subtree of dentries for unmount
+ * - consumes a reference on the specified root dentry
+ * - called with the dcache_lock held, which it drops
+ *   - defend against the filesystem attempting to change the dentry tree
+ *     whilst we're trying to unmount it (it may be a network fs)
+ */
+static void shrink_dcache_for_umount_one(struct dentry *dentry)
+{
+	struct dentry *parent, *p;
+
+	/* detach this root from the system */
+	if (!list_empty(&dentry->d_lru)) {
+		dentry_stat.nr_unused--;
+		list_del_init(&dentry->d_lru);
+	}
+	__d_drop(dentry);
+
+	/* ignore anonymous dentries with parents as we'll get to those in due
+	 * course now they've been delisted */
+	if (!IS_ROOT(dentry)) {
+		spin_unlock(&dcache_lock);
+		return;
+	}
+
+	/* find the first leaf in the current subtree */
+descend_to_leaf:
+	if (!list_empty(&dentry->d_subdirs)) {
+		/* it's got children, pin it whilst we dispose of them */
+		atomic_inc(&dentry->d_count);
+
+		/* detach all children of this dentry from the system */
+		list_for_each_entry(p, &dentry->d_subdirs, d_u.d_child) {
+			if (!list_empty(&p->d_lru)) {
+				dentry_stat.nr_unused--;
+				list_del_init(&p->d_lru);
+			}
+
+			__d_drop(p);
+		}
+
+		dentry = list_entry(dentry->d_subdirs.next,
+				    struct dentry, d_u.d_child);
+		goto descend_to_leaf;
+	}
+
+	/* consume this leaf */
+consume_leaf:
+	BUG_ON(atomic_read(&dentry->d_count) != 0);
+
+	spin_lock(&dentry->d_lock);
+
+	parent = dentry->d_parent;
+	if (parent == dentry)
+		parent = NULL;
+	else
+		atomic_dec(&parent->d_count);
+
+	list_del(&dentry->d_u.d_child);
+	dentry_stat.nr_dentry--;	/* For d_free, below */
+	dentry_iput(dentry);
+	d_free(dentry);
+
+	/* done if fallen off top of tree */
+	if (!parent)
+		return;
+
+	/* ascend to parent and move to next leaf */
+	dentry = parent;
+
+	spin_lock(&dcache_lock);
+
+	if (list_empty(&dentry->d_subdirs)) {
+		atomic_dec(&dentry->d_count);
+		goto consume_leaf;
+	}
+
+	dentry = list_entry(dentry->d_subdirs.next,
+			    struct dentry, d_u.d_child);
+	goto descend_to_leaf;
+}
+
+/*
+ * destroy the dentries attached to a superblock on unmounting
+ * - shouldn't need to get dentry->d_lock as no-one else should be able to
+ *   reach this superblock and its dentries.
+ */
+void shrink_dcache_for_umount(struct super_block *sb)
+{
+	struct dentry *dentry;
+
+	if (down_read_trylock(&sb->s_umount))
+		BUG();
+
+	dentry = sb->s_root;
+	sb->s_root = NULL;
+	spin_lock(&dcache_lock);
+	atomic_dec(&dentry->d_count);
+	shrink_dcache_for_umount_one(dentry);
+
+	while (!hlist_empty(&sb->s_anon)) {
+		spin_lock(&dcache_lock);
+		if (!hlist_empty(&sb->s_anon)) {
+			dentry = hlist_entry(sb->s_anon.first, struct dentry, d_hash);
+			shrink_dcache_for_umount_one(dentry);
+		} else {
+			spin_unlock(&dcache_lock);
+		}
+	}
+}
+
+/*
  * Search for at least 1 mount point in the dentry's subdirs.
  * We descend to the next level whenever the d_subdirs
  * list is non-empty and continue searching.
diff --git a/fs/super.c b/fs/super.c
index ed653a6..50abd8e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -225,14 +225,10 @@ static int grab_super(struct super_block
  */
 void generic_shutdown_super(struct super_block *sb)
 {
-	struct dentry *root = sb->s_root;
 	struct super_operations *sop = sb->s_op;
 
-	if (root) {
-		sb->s_root = NULL;
-		shrink_dcache_parent(root);
-		shrink_dcache_sb(sb);
-		dput(root);
+	if (sb->s_root) {
+		shrink_dcache_for_umount(sb);
 		fsync_super(sb);
 		lock_super(sb);
 		sb->s_flags &= ~MS_ACTIVE;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 28c9891..0f660ae 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -218,6 +218,7 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
+extern void shrink_dcache_for_umount(struct super_block *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Destroy the dentries contributed by a superblock on unmounting
  2006-06-24 13:33 ` [PATCH] Destroy the dentries contributed by a superblock on unmounting David Howells
@ 2006-06-25  6:48   ` Neil Brown
  2006-06-25 16:02   ` David Howells
  1 sibling, 0 replies; 23+ messages in thread
From: Neil Brown @ 2006-06-25  6:48 UTC (permalink / raw)
  To: David Howells
  Cc: balbir, akpm, aviro, jblunck, dev, olh, linux-kernel, linux-fsdevel

On Saturday June 24, dhowells@redhat.com wrote:
> 
> Neil Brown <neilb@suse.de> wrote:
> 
> > Do you not have easy access to the roots of all trees in your
> > super-block-sharing situation so that shrink_dcache_parent can be
> > called on them all?
> 
> How about this?

I think this is definitely going in the right direction.
A few comments.

 - The test for IS_ROOT surprises me.  I thought anonymous dentries
   were all IS_ROOT.   Maybe this changes with your shared-superblock
   changes? 

 - You have two nested loops implemented with gotos.  Dijykstra would
   be shocked!  The logic looks like it should be:

     for(;;) {
        while (!list_empty(&dentry->d_subdirs)) {
		stuff-1;
		dentry = list_entry(dentry->d_subdirs.next,....)
	}
	while (list_empty(dentry->d_subdirs)) {
		parent = dentry->d_parent (or NULL)
		d_free(dentry)
		if (!parent) return;
		dentry = parent;
	}
     }

    Which I think makes it a lot more readable (obviously I have left
    out lots of the details in the above.

  - The section that I have called 'stuff-1' above seems excessive.
    Everytime you visit a dentry with children, you remove them from
    the unused list (if present) and d_drop them from the hash.  After
    the first time, these should all be no-ops.  If there some
    particular reason for this that I'm not seeing (which case I'd
    like a comment) or can you just unuse/drop the dentry just before
    freeing it

  - Think the reference counting deserves a comment.  I think that
    this routine holds a reference count on 'dentry' and every parent
    up to the IS_ROOT.  Is that right?

In summary, it looks right, though I feel I would want to go through
and double check the refcounting again, but I would rather do that
with it restructured into while loops.
Is that fair?

NeilBrown

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Destroy the dentries contributed by a superblock on unmounting
  2006-06-24 13:33 ` [PATCH] Destroy the dentries contributed by a superblock on unmounting David Howells
  2006-06-25  6:48   ` Neil Brown
@ 2006-06-25 16:02   ` David Howells
  2006-06-25 16:30     ` Nick Piggin
                       ` (3 more replies)
  1 sibling, 4 replies; 23+ messages in thread
From: David Howells @ 2006-06-25 16:02 UTC (permalink / raw)
  To: Neil Brown
  Cc: David Howells, balbir, akpm, aviro, jblunck, dev, olh,
	linux-kernel, linux-fsdevel

Neil Brown <neilb@suse.de> wrote:

>  - The test for IS_ROOT surprises me.  I thought anonymous dentries
>    were all IS_ROOT.   Maybe this changes with your shared-superblock
>    changes? 

I know they start out as root dentries, but are they required to stay so?

My NFS changes don't cause this to happen.

>  - You have two nested loops implemented with gotos.  Dijykstra would
>    be shocked!

It'll be good for him.

> The logic looks like it should be:

Two points:

 (1) The second while loop really wants to be a do-while loop.  We have to go
     around it at least once, since the first loop brings it to that
     condition.

 (2) We need to drop a reference on a dentry node for which we've drained all
     the children, but we need to do it after the test to see whether it has
     any children.  This means you'd need to put the atomic_dec() in the
     condition of the do-while statement.

     To have a while-loop instead, you'd have to get a reference on every node
     you considered, even if you're just going to immediately drop that
     reference again.

>     Which I think makes it a lot more readable (obviously I have left
>     out lots of the details in the above.

But not necessarily right.  Basically, the goto to consume_leaf when we've
drained the parent is a sort of shortcut.

>   - The section that I have called 'stuff-1' above seems excessive.
>     Everytime you visit a dentry with children, you remove them from
>     the unused list (if present) and d_drop them from the hash.  After
>     the first time, these should all be no-ops.

"After the first time"?  I don't see what you're getting at.  It is executed
once per directory.

>     If there some particular reason for this that I'm not seeing (which case
>     I'd like a comment) or can you just unuse/drop the dentry just before
>     freeing it

Well, I'd like to avoid holding the dcache_lock as much as possible, and,
theoretically, around that while loop is the only place in which dcache_lock
needs to be held.  As long as the filesystem isn't trying to mangle the dentry
tree whilst we're trying to unmount it, nothing else will be mucking around
with them (now that shrink_dcache_memory() has been fixed).

>   - Think the reference counting deserves a comment.

There is a comment:

+		/* it's got children, pin it whilst we dispose of them */

>     I think that this routine holds a reference count on 'dentry' and every
>     parent up to the IS_ROOT.  Is that right?

No.  We only get a reference on a dentry that has children.  Any dentry that
doesn't have children we just clobber.

I want to defend against a netfs attempting to evict dentries during unmount
in response to network events.

> In summary, it looks right, though I feel I would want to go through
> and double check the refcounting again, but I would rather do that
> with it restructured into while loops.
> Is that fair?

No... you should be able to handle gotos:-)

I've restructured it a little (see attached).

I've made the descend_to_leaf loop into an infinite for-loop.

I've moved the __d_drop and LRU-removal to just before we clobber an entry,
since for the moment the lock is being held anyway - that shortens the descent
loop.

The consume-leaf loop is tricky to turn into a while-loop, a for-loop or a
do-loop since as I said above: I need to call atomic_dec() on the second+
times round the loop, but after the condition has been evaluated to true.

David
---
[PATCH] Destroy the dentries contributed by a superblock on unmounting

From: David Howells <dhowells@redhat.com>

The attached patch destroys all the dentries attached to a superblock in one go
by:

 (1) Destroying the tree rooted at s_root.

 (2) Destroying every entry in the anon list, one at a time.

 (3) Each entry in the anon list has its subtree consumed from the leaves
     inwards.

This reduces the amount of work generic_shutdown_super() does, and avoids
iterating through the dentry_unused list.

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 fs/dcache.c            |  109 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/super.c             |    8 +---
 include/linux/dcache.h |    1 
 3 files changed, 112 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 454b6af..31f2d8a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -549,6 +549,115 @@ repeat:
 }
 
 /*
+ * destroy a subtree of dentries for unmount
+ * - consumes a reference on the specified root dentry
+ * - called with the dcache_lock held, which it drops
+ *   - defend against the filesystem attempting to change the dentry tree
+ *     whilst we're trying to unmount it (it may be a network fs)
+ */
+static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
+{
+	struct dentry *parent;
+
+	/* detach this root from the system */
+	if (!list_empty(&dentry->d_lru)) {
+		dentry_stat.nr_unused--;
+		list_del_init(&dentry->d_lru);
+	}
+	__d_drop(dentry);
+
+	/* ignore anonymous dentries with parents as we'll get to those in due
+	 * course now they've been delisted */
+	if (!IS_ROOT(dentry)) {
+		spin_unlock(&dcache_lock);
+		return;
+	}
+
+	for (;;) {
+		/* descend to the first leaf in the current subtree */
+		while (!list_empty(&dentry->d_subdirs)) {
+			/* this dentry has children - pin it whilst we dispose
+			 * of them */
+			atomic_inc(&dentry->d_count);
+			dentry = list_entry(dentry->d_subdirs.next,
+					    struct dentry, d_u.d_child);
+		}
+
+	consume_empty_leaf:
+		/* consume this leaf */
+		BUG_ON(atomic_read(&dentry->d_count) != 0);
+
+		spin_lock(&dentry->d_lock);
+
+		if (!list_empty(&dentry->d_lru)) {
+			dentry_stat.nr_unused--;
+			list_del_init(&dentry->d_lru);
+		}
+		__d_drop(dentry);
+
+		parent = dentry->d_parent;
+		if (parent == dentry)
+			parent = NULL;
+		else
+			atomic_dec(&parent->d_count);
+
+		list_del(&dentry->d_u.d_child);
+		dentry_stat.nr_dentry--;	/* For d_free, below */
+		dentry_iput(dentry);
+		d_free(dentry);
+
+		/* finished if fallen off top of tree */
+		if (!parent)
+			return;
+
+		/* ascend to parent and move to next leaf */
+		dentry = parent;
+
+		spin_lock(&dcache_lock);
+
+		if (list_empty(&dentry->d_subdirs)) {
+			/* the parent is now empty and can be itself
+			 * consumed */
+			atomic_dec(&dentry->d_count);
+			goto consume_empty_leaf;
+		}
+
+		dentry = list_entry(dentry->d_subdirs.next,
+				    struct dentry, d_u.d_child);
+	}
+}
+
+/*
+ * destroy the dentries attached to a superblock on unmounting
+ * - shouldn't need to get dentry->d_lock as no-one else should be able to
+ *   reach this superblock and its dentries.
+ */
+void shrink_dcache_for_umount(struct super_block *sb)
+{
+	struct dentry *dentry;
+
+	if (down_read_trylock(&sb->s_umount))
+		BUG();
+
+	dentry = sb->s_root;
+	sb->s_root = NULL;
+	spin_lock(&dcache_lock);
+	atomic_dec(&dentry->d_count);
+	shrink_dcache_for_umount_subtree(dentry);
+
+	while (!hlist_empty(&sb->s_anon)) {
+		spin_lock(&dcache_lock);
+		if (!hlist_empty(&sb->s_anon)) {
+			dentry = hlist_entry(sb->s_anon.first,
+					     struct dentry, d_hash);
+			shrink_dcache_for_umount_subtree(dentry);
+		} else {
+			spin_unlock(&dcache_lock);
+		}
+	}
+}
+
+/*
  * Search for at least 1 mount point in the dentry's subdirs.
  * We descend to the next level whenever the d_subdirs
  * list is non-empty and continue searching.
diff --git a/fs/super.c b/fs/super.c
index 8a669f6..dc0f3c1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -225,14 +225,10 @@ static int grab_super(struct super_block
  */
 void generic_shutdown_super(struct super_block *sb)
 {
-	struct dentry *root = sb->s_root;
 	struct super_operations *sop = sb->s_op;
 
-	if (root) {
-		sb->s_root = NULL;
-		shrink_dcache_parent(root);
-		shrink_dcache_sb(sb);
-		dput(root);
+	if (sb->s_root) {
+		shrink_dcache_for_umount(sb);
 		fsync_super(sb);
 		lock_super(sb);
 		sb->s_flags &= ~MS_ACTIVE;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 28c9891..0f660ae 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -218,6 +218,7 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
+extern void shrink_dcache_for_umount(struct super_block *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Destroy the dentries contributed by a superblock on unmounting
  2006-06-25 16:02   ` David Howells
@ 2006-06-25 16:30     ` Nick Piggin
  2006-06-26  6:05     ` Neil Brown
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Nick Piggin @ 2006-06-25 16:30 UTC (permalink / raw)
  To: David Howells
  Cc: Neil Brown, balbir, akpm, aviro, jblunck, dev, olh, linux-kernel,
	linux-fsdevel

David Howells wrote:

> +	for (;;) {
> +		/* descend to the first leaf in the current subtree */
> +		while (!list_empty(&dentry->d_subdirs)) {
> +			/* this dentry has children - pin it whilst we dispose
> +			 * of them */
> +			atomic_inc(&dentry->d_count);
> +			dentry = list_entry(dentry->d_subdirs.next,
> +					    struct dentry, d_u.d_child);
> +		}
> +
> +	consume_empty_leaf:

Just to nitpick, the indented label goes against the style in this
part of the kernel.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Destroy the dentries contributed by a superblock on unmounting
  2006-06-25 16:02   ` David Howells
  2006-06-25 16:30     ` Nick Piggin
@ 2006-06-26  6:05     ` Neil Brown
  2006-06-26 11:21     ` David Howells
  2006-06-27 10:17     ` David Howells
  3 siblings, 0 replies; 23+ messages in thread
From: Neil Brown @ 2006-06-26  6:05 UTC (permalink / raw)
  To: David Howells
  Cc: balbir, akpm, aviro, jblunck, dev, olh, linux-kernel, linux-fsdevel

On Sunday June 25, dhowells@redhat.com wrote:
> Neil Brown <neilb@suse.de> wrote:
> 
> >  - The test for IS_ROOT surprises me.  I thought anonymous dentries
> >    were all IS_ROOT.   Maybe this changes with your shared-superblock
> >    changes? 
> 
> I know they start out as root dentries, but are they required to stay so?

The 'anon' entries are linked together with d_hash, so if any anon
entry were given a parent, it would also be given a name and the
d_hash would be used for something else.  So I think "yes- they are
required to stay so".  It might not hurt to have a
WARN_ON(!IS_ROOT()) though.

> > The logic looks like it should be:
> 
> Two points:
> 
>  (1) The second while loop really wants to be a do-while loop.  We have to go
>      around it at least once, since the first loop brings it to that
>      condition.

It certainly can be a do-while.  As the preceding loop leaves the
condition true, a do-while or a while() will have the same result,
though the while() will do one extra (unneeded) test.  So the choice
should be based on what makes the code clearest, and that it often a
subjective question.

> 
>  (2) We need to drop a reference on a dentry node for which we've drained all
>      the children, but we need to do it after the test to see whether it has
>      any children.  This means you'd need to put the atomic_dec() in the
>      condition of the do-while statement.
> 
>      To have a while-loop instead, you'd have to get a reference on every node
>      you considered, even if you're just going to immediately drop that
>      reference again.

Yes, I had glossed over that bit of reference counting.  I don't see a
big problem with having an atomic_inc before the loop and an
atomic_dec at the top, but I can see that others might not like it.

> 
> >     Which I think makes it a lot more readable (obviously I have left
> >     out lots of the details in the above.
> 
> But not necessarily right.  Basically, the goto to consume_leaf when we've
> drained the parent is a sort of shortcut.
> 
> >   - The section that I have called 'stuff-1' above seems excessive.
> >     Everytime you visit a dentry with children, you remove them from
> >     the unused list (if present) and d_drop them from the hash.  After
> >     the first time, these should all be no-ops.
> 
> "After the first time"?  I don't see what you're getting at.  It is executed
> once per directory.

Uhmmm. Yes, I see.  You don't did that loop when stepping down to a
directory, not when stepping back up to it.  I was confused, sorry.

> 
> >     If there some particular reason for this that I'm not seeing (which case
> >     I'd like a comment) or can you just unuse/drop the dentry just before
> >     freeing it
> 
> Well, I'd like to avoid holding the dcache_lock as much as possible, and,
> theoretically, around that while loop is the only place in which dcache_lock
> needs to be held.  As long as the filesystem isn't trying to mangle the dentry
> tree whilst we're trying to unmount it, nothing else will be mucking around
> with them (now that shrink_dcache_memory() has been fixed).

That's a good reason.  I hadn't thought of that.  Thanks.

> 
> No... you should be able to handle gotos:-)
> 
> I've restructured it a little (see attached).
> 
> I've made the descend_to_leaf loop into an infinite for-loop.
...
> The consume-leaf loop is tricky to turn into a while-loop, a for-loop or a
> do-loop since as I said above: I need to call atomic_dec() on the second+
> times round the loop, but after the condition has been evaluated to true.

If I asked really nicely, could you make consume_empty_leaf an
infinite loop too?

> +	    for (;;) {
> +		/* consume this leaf */
> +		BUG_ON(atomic_read(&dentry->d_count) != 0);
> +
> +		spin_lock(&dentry->d_lock);
......
> +
> +		if (!list_empty(&dentry->d_subdirs))
> +			break;
> +		/* the parent is now empty and can be itself
> +		 * consumed */
> +		atomic_dec(&dentry->d_count);
> +	    }
> +
> +	    dentry = list_entry(dentry->d_subdirs.next,
> +				    struct dentry, d_u.d_child);

At least I then get the visual clues that indenting gives. (pretty please).

But I'm happy to give it my Ack even without that.  It makes much more
sense (to me) to do it that way than to move lots of dentries onto the
unused list, then prune those, then try again .....

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Destroy the dentries contributed by a superblock on unmounting
  2006-06-25 16:02   ` David Howells
  2006-06-25 16:30     ` Nick Piggin
  2006-06-26  6:05     ` Neil Brown
@ 2006-06-26 11:21     ` David Howells
  2006-06-27  0:53       ` Neil Brown
  2006-06-27 10:17     ` David Howells
  3 siblings, 1 reply; 23+ messages in thread
From: David Howells @ 2006-06-26 11:21 UTC (permalink / raw)
  To: Neil Brown
  Cc: David Howells, balbir, akpm, aviro, jblunck, dev, olh,
	linux-kernel, linux-fsdevel

Neil Brown <neilb@suse.de> wrote:

> > I know they start out as root dentries, but are they required to stay so?
> 
> The 'anon' entries are linked together with d_hash

That's irrelevant.

> so if any anon entry were given a parent, it would also be given a name

That's an assumption, though it's probably valid.  I asked Al and he confirmed
that it is.

> and the d_hash would be used for something else.  So I think "yes- they are
> required to stay so".  It might not hurt to have a WARN_ON(!IS_ROOT())
> though.

That would have to be a BUG_ON().  The function does not unlink the initial
dentry from its parent if its parent is not a root.

> though the while() will do one extra (unneeded) test

Exactly.

> So the choice should be based on what makes the code clearest

You should avoid the extra test.

> Yes, I had glossed over that bit of reference counting.  I don't see a
> big problem with having an atomic_inc before the loop and an
> atomic_dec at the top, but I can see that others might not like it.

Imagine you have a directory with a few thousand file children.  With my code
as it is, you'll do one extra inc/dec for the directory.  With your
suggestion, you'll do a few thousand extra inc/decs.  Now these are (a) memory
ops, and (b) atomic ops, but it probably won't matter since you have to write
to the dentries anyway, but the fewer the better.

We don't *need* to do the extra inc/decs, so let's not.

Actually, I discussed this with Al, and he said that I can assume that the
filesystem is not permitted to rearrange its dentries once it calls
generic_shutdown_super().  This means I can reduce the locking and dispense
with the extra refcounts entirely, though I do have to fold in a cut-down copy
of dentry_iput() (no locks to drop, no inotify watches).

> > As long as the filesystem isn't trying to mangle the dentry tree whilst
> > we're trying to unmount it, nothing else will be mucking around with them
> > (now that shrink_dcache_memory() has been fixed).
> 
> That's a good reason.  I hadn't thought of that.  Thanks.

It seems I can make that assumption.  I've added comments to the header for
generic_shutdown_super() to that effect and put more header comments on
shrink_dcache_for_umount() to say why we can avoid the locking.

> If I asked really nicely, could you make consume_empty_leaf an infinite loop
> too?

It can now be a do-while loop since I've got rid of the atomic_dec().

So here's a new version.

David
---
[PATCH] Destroy the dentries contributed by a superblock on unmounting

From: David Howells <dhowells@redhat.com>

The attached patch destroys all the dentries attached to a superblock in one go
by:

 (1) Destroying the tree rooted at s_root.

 (2) Destroying every entry in the anon list, one at a time.

 (3) Each entry in the anon list has its subtree consumed from the leaves
     inwards.

This reduces the amount of work generic_shutdown_super() does, and avoids
iterating through the dentry_unused list.

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 fs/dcache.c            |  120 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/super.c             |   12 ++---
 include/linux/dcache.h |    1 
 3 files changed, 127 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 454b6af..89c7974 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -549,6 +549,126 @@ repeat:
 }
 
 /*
+ * destroy a subtree of dentries for unmount
+ * - consumes a reference on the specified root dentry
+ * - called with the dcache_lock held, which it drops
+ *   - defend against the filesystem attempting to change the dentry tree
+ *     whilst we're trying to unmount it (it may be a network fs)
+ */
+static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
+{
+	struct dentry *parent;
+
+	BUG_ON(!IS_ROOT(dentry));
+
+	/* detach this root from the system */
+	spin_lock(&dcache_lock);
+	if (!list_empty(&dentry->d_lru)) {
+		dentry_stat.nr_unused--;
+		list_del_init(&dentry->d_lru);
+	}
+	__d_drop(dentry);
+	spin_unlock(&dcache_lock);
+
+	for (;;) {
+		/* descend to the first leaf in the current subtree */
+		while (!list_empty(&dentry->d_subdirs)) {
+			struct dentry *loop;
+			
+			/* this is a branch with children - detach all of them
+			 * from the system in one go */
+			spin_lock(&dcache_lock);
+			list_for_each_entry(loop, &dentry->d_subdirs,
+					    d_u.d_child) {
+				if (!list_empty(&loop->d_lru)) {
+					dentry_stat.nr_unused--;
+					list_del_init(&loop->d_lru);
+				}
+
+				__d_drop(loop);
+				cond_resched_lock(&dcache_lock);
+			}
+			spin_unlock(&dcache_lock);
+
+			/* move to the first child */
+			dentry = list_entry(dentry->d_subdirs.next,
+					    struct dentry, d_u.d_child);
+		}
+
+		/* consume the dentries from this leaf up through its parents
+		 * until we find one with children or run out altogether */
+		do {
+			struct inode *inode;
+
+			BUG_ON(atomic_read(&dentry->d_count) != 0);
+
+			parent = dentry->d_parent;
+			if (parent == dentry)
+				parent = NULL;
+			else
+				atomic_dec(&parent->d_count);
+
+			list_del(&dentry->d_u.d_child);
+			dentry_stat.nr_dentry--;	/* For d_free, below */
+
+			inode = dentry->d_inode;
+			if (inode) {
+				BUG_ON(!list_empty(&inode->inotify_watches));
+				dentry->d_inode = NULL;
+				list_del_init(&dentry->d_alias);
+				if (dentry->d_op && dentry->d_op->d_iput)
+					dentry->d_op->d_iput(dentry, inode);
+				else
+					iput(inode);
+			}
+
+			d_free(dentry);
+
+			/* finished when we fall off the top of the tree,
+			 * otherwise we ascend to the parent and move to the
+			 * next sibling if there is one */
+			if (!parent)
+				return;
+
+			dentry = parent;
+
+		} while (list_empty(&dentry->d_subdirs));
+
+		dentry = list_entry(dentry->d_subdirs.next,
+				    struct dentry, d_u.d_child);
+	}
+}
+
+/*
+ * destroy the dentries attached to a superblock on unmounting
+ * - we don't need to use dentry->d_lock, and only need dcache_lock when
+ *   removing the dentry from the system lists and hashes because:
+ *   - the superblock is detached from all mountings and open files, so the
+ *     dentry trees will not be rearranged by the VFS
+ *   - s_umount is write-locked, so the memory pressure shrinker will ignore
+ *     any dentries belonging to this superblock that it comes across
+ *   - the filesystem itself is no longer permitted to rearrange the dentries
+ *     in this superblock
+ */
+void shrink_dcache_for_umount(struct super_block *sb)
+{
+	struct dentry *dentry;
+
+	if (down_read_trylock(&sb->s_umount))
+		BUG();
+
+	dentry = sb->s_root;
+	sb->s_root = NULL;
+	atomic_dec(&dentry->d_count);
+	shrink_dcache_for_umount_subtree(dentry);
+
+	while (!hlist_empty(&sb->s_anon)) {
+		dentry = hlist_entry(sb->s_anon.first, struct dentry, d_hash);
+		shrink_dcache_for_umount_subtree(dentry);
+	}
+}
+
+/*
  * Search for at least 1 mount point in the dentry's subdirs.
  * We descend to the next level whenever the d_subdirs
  * list is non-empty and continue searching.
diff --git a/fs/super.c b/fs/super.c
index 8a669f6..bd655b1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -222,17 +222,17 @@ static int grab_super(struct super_block
  *	that need destruction out of superblock, call generic_shutdown_super()
  *	and release aforementioned objects.  Note: dentries and inodes _are_
  *	taken care of and do not need specific handling.
+ *
+ *	Upon calling this function, the filesystem may no longer alter or
+ *	rearrange the set of dentries belonging to this super_block, nor may it
+ *	change the attachments of dentries to inodes.
  */
 void generic_shutdown_super(struct super_block *sb)
 {
-	struct dentry *root = sb->s_root;
 	struct super_operations *sop = sb->s_op;
 
-	if (root) {
-		sb->s_root = NULL;
-		shrink_dcache_parent(root);
-		shrink_dcache_sb(sb);
-		dput(root);
+	if (sb->s_root) {
+		shrink_dcache_for_umount(sb);
 		fsync_super(sb);
 		lock_super(sb);
 		sb->s_flags &= ~MS_ACTIVE;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 28c9891..0f660ae 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -218,6 +218,7 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
+extern void shrink_dcache_for_umount(struct super_block *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Destroy the dentries contributed by a superblock on unmounting
  2006-06-26 11:21     ` David Howells
@ 2006-06-27  0:53       ` Neil Brown
  0 siblings, 0 replies; 23+ messages in thread
From: Neil Brown @ 2006-06-27  0:53 UTC (permalink / raw)
  To: David Howells
  Cc: balbir, akpm, aviro, jblunck, dev, olh, linux-kernel, linux-fsdevel

On Monday June 26, dhowells@redhat.com wrote:
> 
> So here's a new version.
> 

Acked-by: NeilBrown <neilb@suse.de>

very nice, thanks.

NeilBrown


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] Destroy the dentries contributed by a superblock on unmounting
  2006-06-25 16:02   ` David Howells
                       ` (2 preceding siblings ...)
  2006-06-26 11:21     ` David Howells
@ 2006-06-27 10:17     ` David Howells
  2006-06-27 22:55       ` Andrew Morton
  2006-06-27 23:18       ` David Howells
  3 siblings, 2 replies; 23+ messages in thread
From: David Howells @ 2006-06-27 10:17 UTC (permalink / raw)
  To: torvalds, akpm, aviro, neilb, jblunck
  Cc: linux-kernel, linux-fsdevel, dhowells


The attached patch destroys all the dentries attached to a superblock in one go
by:

 (1) Destroying the tree rooted at s_root.

 (2) Destroying every entry in the anon list, one at a time.

 (3) Each entry in the anon list has its subtree consumed from the leaves
     inwards.

This reduces the amount of work generic_shutdown_super() does, and avoids
iterating through the dentry_unused list.

Note that locking is almost entirely absent in the shrink_dcache_for_umount*()
functions added by this patch.  This is because:

 (1) at the point the filesystem calls generic_shutdown_super(), it is not
     permitted to further touch the superblock's set of dentries, and nor may
     it remove aliases from inodes;

 (2) the dcache memory shrinker now skips dentries that are being unmounted;
     and

 (3) the superblock no longer has any external references through which the VFS
     can reach it.

Given these points, the only locking we need to do is when we remove dentries
from the unused list and the name hashes, which we do a directory's worth at a
time.

generic_shutdown_super() has had addition header comments added to make point
(1) explicit.

We also don't need to guard against reference counts going to zero unexpectedly
and removing bits of the tree we're working on as nothing else can call dput().

A cut down version of dentry_iput() has been folded into
shrink_dcache_for_umount_subtree() function.  Apart from not needing to unlock
things, it also doesn't need to check for inotify watches.

Signed-Off-By: David Howells <dhowells@redhat.com>
Acked-by: NeilBrown <neilb@suse.de>
---

 fs/dcache.c            |  118 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/super.c             |   12 ++---
 include/linux/dcache.h |    1 
 3 files changed, 125 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 77ce516..bec6742 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -549,6 +549,124 @@ repeat:
 }
 
 /*
+ * destroy a single subtree of dentries for unmount
+ * - see the comments on shrink_dcache_for_umount() for a description of the
+ *   locking
+ */
+static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
+{
+	struct dentry *parent;
+
+	BUG_ON(!IS_ROOT(dentry));
+
+	/* detach this root from the system */
+	spin_lock(&dcache_lock);
+	if (!list_empty(&dentry->d_lru)) {
+		dentry_stat.nr_unused--;
+		list_del_init(&dentry->d_lru);
+	}
+	__d_drop(dentry);
+	spin_unlock(&dcache_lock);
+
+	for (;;) {
+		/* descend to the first leaf in the current subtree */
+		while (!list_empty(&dentry->d_subdirs)) {
+			struct dentry *loop;
+
+			/* this is a branch with children - detach all of them
+			 * from the system in one go */
+			spin_lock(&dcache_lock);
+			list_for_each_entry(loop, &dentry->d_subdirs,
+					    d_u.d_child) {
+				if (!list_empty(&loop->d_lru)) {
+					dentry_stat.nr_unused--;
+					list_del_init(&loop->d_lru);
+				}
+
+				__d_drop(loop);
+				cond_resched_lock(&dcache_lock);
+			}
+			spin_unlock(&dcache_lock);
+
+			/* move to the first child */
+			dentry = list_entry(dentry->d_subdirs.next,
+					    struct dentry, d_u.d_child);
+		}
+
+		/* consume the dentries from this leaf up through its parents
+		 * until we find one with children or run out altogether */
+		do {
+			struct inode *inode;
+
+			BUG_ON(atomic_read(&dentry->d_count) != 0);
+
+			parent = dentry->d_parent;
+			if (parent == dentry)
+				parent = NULL;
+			else
+				atomic_dec(&parent->d_count);
+
+			list_del(&dentry->d_u.d_child);
+			dentry_stat.nr_dentry--;	/* For d_free, below */
+
+			inode = dentry->d_inode;
+			if (inode) {
+				BUG_ON(!list_empty(&inode->inotify_watches));
+				dentry->d_inode = NULL;
+				list_del_init(&dentry->d_alias);
+				if (dentry->d_op && dentry->d_op->d_iput)
+					dentry->d_op->d_iput(dentry, inode);
+				else
+					iput(inode);
+			}
+
+			d_free(dentry);
+
+			/* finished when we fall off the top of the tree,
+			 * otherwise we ascend to the parent and move to the
+			 * next sibling if there is one */
+			if (!parent)
+				return;
+
+			dentry = parent;
+
+		} while (list_empty(&dentry->d_subdirs));
+
+		dentry = list_entry(dentry->d_subdirs.next,
+				    struct dentry, d_u.d_child);
+	}
+}
+
+/*
+ * destroy the dentries attached to a superblock on unmounting
+ * - we don't need to use dentry->d_lock, and only need dcache_lock when
+ *   removing the dentry from the system lists and hashes because:
+ *   - the superblock is detached from all mountings and open files, so the
+ *     dentry trees will not be rearranged by the VFS
+ *   - s_umount is write-locked, so the memory pressure shrinker will ignore
+ *     any dentries belonging to this superblock that it comes across
+ *   - the filesystem itself is no longer permitted to rearrange the dentries
+ *     in this superblock
+ */
+void shrink_dcache_for_umount(struct super_block *sb)
+{
+	struct dentry *dentry;
+
+	if (down_read_trylock(&sb->s_umount))
+		BUG();
+
+	dentry = sb->s_root;
+	sb->s_root = NULL;
+	atomic_dec(&dentry->d_count);
+	shrink_dcache_for_umount_subtree(dentry);
+
+	while (!hlist_empty(&sb->s_anon)) {
+		dentry = hlist_entry(sb->s_anon.first, struct dentry, d_hash);
+		shrink_dcache_for_umount_subtree(dentry);
+	}
+}
+
+/*
  * Search for at least 1 mount point in the dentry's subdirs.
  * We descend to the next level whenever the d_subdirs
  * list is non-empty and continue searching.
diff --git a/fs/super.c b/fs/super.c
index 8a669f6..bd655b1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -222,17 +222,17 @@ static int grab_super(struct super_block
  *	that need destruction out of superblock, call generic_shutdown_super()
  *	and release aforementioned objects.  Note: dentries and inodes _are_
  *	taken care of and do not need specific handling.
+ *
+ *	Upon calling this function, the filesystem may no longer alter or
+ *	rearrange the set of dentries belonging to this super_block, nor may it
+ *	change the attachments of dentries to inodes.
  */
 void generic_shutdown_super(struct super_block *sb)
 {
-	struct dentry *root = sb->s_root;
 	struct super_operations *sop = sb->s_op;
 
-	if (root) {
-		sb->s_root = NULL;
-		shrink_dcache_parent(root);
-		shrink_dcache_sb(sb);
-		dput(root);
+	if (sb->s_root) {
+		shrink_dcache_for_umount(sb);
 		fsync_super(sb);
 		lock_super(sb);
 		sb->s_flags &= ~MS_ACTIVE;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 28c9891..0f660ae 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -218,6 +218,7 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
+extern void shrink_dcache_for_umount(struct super_block *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Destroy the dentries contributed by a superblock on unmounting
  2006-06-27 10:17     ` David Howells
@ 2006-06-27 22:55       ` Andrew Morton
  2006-06-27 23:18       ` David Howells
  1 sibling, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2006-06-27 22:55 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, aviro, neilb, jblunck, linux-kernel, linux-fsdevel, dhowells

David Howells <dhowells@redhat.com> wrote:
>
> 
> The attached patch destroys all the dentries attached to a superblock in one go
>  /*
> + * destroy a single subtree of dentries for unmount
> + * - see the comments on shrink_dcache_for_umount() for a description of the
> + *   locking
> + */
> +static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
> +{
> +	struct dentry *parent;
> +
> +	BUG_ON(!IS_ROOT(dentry));
> +
> +	/* detach this root from the system */
> +	spin_lock(&dcache_lock);
> +	if (!list_empty(&dentry->d_lru)) {
> +		dentry_stat.nr_unused--;
> +		list_del_init(&dentry->d_lru);
> +	}
> +	__d_drop(dentry);
> +	spin_unlock(&dcache_lock);
> +
> +	for (;;) {
> +		/* descend to the first leaf in the current subtree */
> +		while (!list_empty(&dentry->d_subdirs)) {
> +			struct dentry *loop;
> +
> +			/* this is a branch with children - detach all of them
> +			 * from the system in one go */
> +			spin_lock(&dcache_lock);
> +			list_for_each_entry(loop, &dentry->d_subdirs,
> +					    d_u.d_child) {
> +				if (!list_empty(&loop->d_lru)) {
> +					dentry_stat.nr_unused--;
> +					list_del_init(&loop->d_lru);
> +				}
> +
> +				__d_drop(loop);
> +				cond_resched_lock(&dcache_lock);
> +			}
> +			spin_unlock(&dcache_lock);

Is the cond_resched_lock() here safe?  Once we've dropped that lock, the
list cursor `loop' is invalidated?

If all lookup paths to all entries on this list are removed at this time
then OK - but these dentries are still on the LRU..

(An answer-via-comment-patch would suit ;))


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Destroy the dentries contributed by a superblock on unmounting
  2006-06-27 10:17     ` David Howells
  2006-06-27 22:55       ` Andrew Morton
@ 2006-06-27 23:18       ` David Howells
  1 sibling, 0 replies; 23+ messages in thread
From: David Howells @ 2006-06-27 23:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, torvalds, aviro, neilb, jblunck, linux-kernel,
	linux-fsdevel

Andrew Morton <akpm@osdl.org> wrote:

> Is the cond_resched_lock() here safe?  Once we've dropped that lock, the
> list cursor `loop' is invalidated?

Yes.  Nothing else is permitted to modify the d_subdirs list.  Not the VFS,
not the filesystem, not the memory pressure shrinker.

> If all lookup paths to all entries on this list are removed at this time
> then OK - but these dentries are still on the LRU.

Ah... but the LRU shrinker may not touch dentries belonging to a filesystem
that's got s_umount writelocked (eg: one that's in the process of being
unmounted).  See prune_dcache():

		/*
		 * ...otherwise we need to be sure this filesystem isn't being
		 * unmounted, otherwise we could race with
		 * generic_shutdown_super(), and end up holding a reference to
		 * an inode while the filesystem is unmounted.
		 * So we try to get s_umount, and make sure s_root isn't NULL.
		 * (Take a local copy of s_umount to avoid a use-after-free of
		 * `dentry').
		 */
		s_umount = &dentry->d_sb->s_umount;
		if (down_read_trylock(s_umount)) {
			if (dentry->d_sb->s_root != NULL) {
				prune_one_dentry(dentry);
				up_read(s_umount);
				continue;
			}
			up_read(s_umount);
		}

		spin_unlock(&dentry->d_lock);

> (An answer-via-comment-patch would suit ;))

It is commented already:

/*
 * destroy the dentries attached to a superblock on unmounting
 * - we don't need to use dentry->d_lock, and only need dcache_lock when
 *   removing the dentry from the system lists and hashes because:
 *   - the superblock is detached from all mountings and open files, so the
 *     dentry trees will not be rearranged by the VFS
 *   - s_umount is write-locked, so the memory pressure shrinker will ignore
 *     any dentries belonging to this superblock that it comes across
 *   - the filesystem itself is no longer permitted to rearrange the dentries
 *     in this superblock
 */
void shrink_dcache_for_umount(struct super_block *sb)
{


Note the point about the memory pressure shrinker.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-04-04  1:05 ` [PATCH] Fix dcache race during umount NeilBrown
@ 2006-04-04  5:11   ` Balbir Singh
  0 siblings, 0 replies; 23+ messages in thread
From: Balbir Singh @ 2006-04-04  5:11 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, linux-kernel, Jan Blunck, Kirill Korotaev, Olaf Hering

> Trying again after some useful comments from Balbir.
> Note: the change to prune_dcache looks quite different now.
>
> NeilBrown
>
<snip>
>                 tmp = dentry_unused.prev;
> +               if (unlikely(sb)) {
> +                       /* Try to find a dentry for this sb, but don't try
> +                        * too hard, if they aren't near the tail they will
> +                        * be moved down again soon
> +                        */
> +                       int skip = count;
> +                       while (skip &&
> +                              tmp != &dentry_unused &&
> +                              list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
> +                               skip--;
> +                               tmp = tmp->prev;
> +                       }
> +               }

This code looks very similar to the first pass in dcache_shrink_sb().
I would prefer if we could re-use that code here, but that would
require creating a helper function by splitting the function.

Looks like a good solution to me.

Regards,
Balbir

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-04-04  0:59     ` Neil Brown
@ 2006-04-04  5:02       ` Balbir Singh
  0 siblings, 0 replies; 23+ messages in thread
From: Balbir Singh @ 2006-04-04  5:02 UTC (permalink / raw)
  To: Neil Brown; +Cc: Andrew Morton, linux-kernel, Jan Blunck, Kirill Korotaev, olh

> >  * If the dentry is not DCACHED_REFERENCED, it is time to move it to LRU list,
> >  * provided the super block is NULL (which means we are trying to reclaim memory
> >  * or this dentry belongs to the same super block that we want to shrink.
> >  */
>
> Ok, thanks.  However it isn't time to "move it to the LRU list" but
> rather time to "move it from the LRU list, out of the cache all
> together, and through it away".

Oops, yes

>
> >
> > One side-effect of this check I see is
> >
> > Earlier, all prune_dcache() calls would prune the dentry cache. This
> > condition will cause dentries belonging only those super blocks being
> > shrink'ed to be freed up. shrink_dcache_memory() will have to do the
> > additional work of freeing dentries (especially for file systems like
> > sysfs, procfs, etc). But the good thing is it should make the per
> > super block operations faster (like unmount). IMO, this is the correct
> > behaviour, but I am not sure of the side-effects.
> >
>
> Hmm... yes, but there is a worse side-effect I think. If
> shrink_dcache_memory finds a dentry that it cannot free, it will move
> it to the head of the LRU, so unmount will not be able to find it so
> easily, and will end up moving it back down to the tail.  I don't
> think this can livelock, but it is unpleasant.
>
> Rather than move these entries to the head of the list, I'd like to
> leave them at the tail, and try to skip over entries that we might not
> be able to free.

Yes and you could use the first pass of dcache_shrink_sb() to do that for you.

Thanks,
Balbir

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] Fix dcache race during umount
       [not found] <20060404110351.27364.patches@notabene>
@ 2006-04-04  1:05 ` NeilBrown
  2006-04-04  5:11   ` Balbir Singh
  0 siblings, 1 reply; 23+ messages in thread
From: NeilBrown @ 2006-04-04  1:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Jan Blunck, Kirill Korotaev, Olaf Hering, Balbir Singh

Trying again after some useful comments from Balbir.
Note: the change to prune_dcache looks quite different now.

NeilBrown

### Comments for Changeset

The race is that the shrink_dcache_memory shrinker could get called
while a filesystem is being unmounted, and could try to prune
a dentry belonging to that filesystem.

If it does, then it will call in to iput on the inode while the
dentry is no longer able to be found by the umounting process.  If
iput takes a while, generic_shutdown_super could get all the way
though shrink_dcache_parent and shrink_dcache_anon and
invalidate_inodes without ever waiting on this particular inode.

Eventually the superblock gets freed anyway and if the iput tried to
touch it (which some filesystems certainly do), it will lose.  The
promised "Self-destruct in 5 seconds" doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry
on someone else's dentry.  As a down_read_trylock is used,
shrink_dcache_memory will no longer try to prune the dentry of a
filesystem that is being unmounted, and unmount will not be able to
start until any such active prune_one_dentry completes.

This requires that prune_dcache *knows* which filesystem (if any) it
is doing the prune on behalf of so that it can be careful of other
filesystems.  shrink_dcache_memory isn't called it on behalf of any
filesystem, and so is careful of everything.

shrink_dcache_anon is now passed a super_block rather than the s_anon
list out of the superblock, so it can get the s_anon list itself, and
can pass the superblock down to prune_dcache.

If prune_dcache finds a dentry that it cannot free, it leaves it where
it is (at the tail of the list) and exits, on the assumption that some
other thread will be removing that dentry soon.  To try to make sure
that some work gets done, a limited number of dnetries which are
untouchable are skipped over while choosing the dentry to work on.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck <jblunck@suse.de>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Olaf Hering <olh@suse.de>
Cc: Balbir Singh <balbir@in.ibm.com>

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/dcache.c            |   62 ++++++++++++++++++++++++++++++++++++++++++-----
 ./fs/super.c             |    2 -
 ./include/linux/dcache.h |    2 -
 3 files changed, 58 insertions(+), 8 deletions(-)

diff ./fs/dcache.c~current~ ./fs/dcache.c
--- ./fs/dcache.c~current~	2006-04-04 08:31:25.000000000 +1000
+++ ./fs/dcache.c	2006-04-04 10:58:14.000000000 +1000
@@ -382,6 +382,8 @@ static inline void prune_one_dentry(stru
 /**
  * prune_dcache - shrink the dcache
  * @count: number of entries to try and free
+ * @sb: if given, ignore dentries for other superblocks
+ *         which are being unmounted.
  *
  * Shrink the dcache. This is done when we need
  * more memory, or simply when we need to unmount
@@ -392,7 +394,7 @@ static inline void prune_one_dentry(stru
  * all the dentries are in use.
  */
  
-static void prune_dcache(int count)
+static void prune_dcache(int count, struct super_block *sb)
 {
 	spin_lock(&dcache_lock);
 	for (; count ; count--) {
@@ -402,6 +404,19 @@ static void prune_dcache(int count)
 		cond_resched_lock(&dcache_lock);
 
 		tmp = dentry_unused.prev;
+		if (unlikely(sb)) {
+			/* Try to find a dentry for this sb, but don't try
+			 * too hard, if they aren't near the tail they will
+			 * be moved down again soon
+			 */
+			int skip = count;
+			while (skip &&
+			       tmp != &dentry_unused &&
+			       list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
+				skip--;
+				tmp = tmp->prev;
+			}
+		}
 		if (tmp == &dentry_unused)
 			break;
 		list_del_init(tmp);
@@ -427,7 +442,41 @@ static void prune_dcache(int count)
  			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		prune_one_dentry(dentry);
+		/*
+		 * If the dentry is not DCACHED_REFERENCED, it is time
+		 * to remove it from the dcache, provided the super block is
+		 * NULL (which means we are trying to reclaim memory)
+		 * or this dentry belongs to the same super block that
+		 * we want to shrink.
+		 */
+		/*
+		 * If this dentry is for "my" filesystem, then I can
+		 * prune it without taking the s_umount lock (I already hold it).
+		 */
+		if (sb && dentry->d_sb == sb) {
+			prune_one_dentry(dentry);
+			continue;
+		}
+		/* ...otherwise we need to be sure this filesystem isn't being
+		 * unmounted, otherwise we could race with generic_shutdown_super,
+		 * and end up holding a reference to an inode while the
+		 * filesystem is unmounted.
+		 * So we try to get s_umount, and make sure s_root isn't NULL
+		 */
+		if (down_read_trylock(&dentry->d_sb->s_umount)) {
+			if (dentry->d_sb->s_root != NULL) {
+				prune_one_dentry(dentry);
+				up_read(&dentry->d_sb->s_umount);
+				continue;
+			}
+			up_read(&dentry->d_sb->s_umount);
+		}
+		spin_unlock(&dentry->d_lock);
+		/* Cannot remove the first dentry, and it isn't appropriate
+		 * to move it to the head of the list, so give up, and try
+		 * later
+		 */
+		break;
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -630,7 +679,7 @@ void shrink_dcache_parent(struct dentry 
 	int found;
 
 	while ((found = select_parent(parent)) != 0)
-		prune_dcache(found);
+		prune_dcache(found, parent->d_sb);
 }
 
 /**
@@ -643,9 +692,10 @@ void shrink_dcache_parent(struct dentry 
  * done under dcache_lock.
  *
  */
-void shrink_dcache_anon(struct hlist_head *head)
+void shrink_dcache_anon(struct super_block *sb)
 {
 	struct hlist_node *lp;
+	struct hlist_head *head = &sb->s_anon;
 	int found;
 	do {
 		found = 0;
@@ -668,7 +718,7 @@ void shrink_dcache_anon(struct hlist_hea
 			}
 		}
 		spin_unlock(&dcache_lock);
-		prune_dcache(found);
+		prune_dcache(found, sb);
 	} while(found);
 }
 
@@ -689,7 +739,7 @@ static int shrink_dcache_memory(int nr, 
 	if (nr) {
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
-		prune_dcache(nr);
+		prune_dcache(nr, NULL);
 	}
 	return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
 }

diff ./fs/super.c~current~ ./fs/super.c
--- ./fs/super.c~current~	2006-04-04 08:31:25.000000000 +1000
+++ ./fs/super.c	2006-04-03 12:21:55.000000000 +1000
@@ -231,7 +231,7 @@ void generic_shutdown_super(struct super
 	if (root) {
 		sb->s_root = NULL;
 		shrink_dcache_parent(root);
-		shrink_dcache_anon(&sb->s_anon);
+		shrink_dcache_anon(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);

diff ./include/linux/dcache.h~current~ ./include/linux/dcache.h
--- ./include/linux/dcache.h~current~	2006-04-04 08:31:25.000000000 +1000
+++ ./include/linux/dcache.h	2006-04-03 12:21:55.000000000 +1000
@@ -217,7 +217,7 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct hlist_head *);
+extern void shrink_dcache_anon(struct super_block *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-04-03 18:12   ` Balbir Singh
@ 2006-04-04  0:59     ` Neil Brown
  2006-04-04  5:02       ` Balbir Singh
  0 siblings, 1 reply; 23+ messages in thread
From: Neil Brown @ 2006-04-04  0:59 UTC (permalink / raw)
  To: balbir; +Cc: Andrew Morton, linux-kernel, Jan Blunck, Kirill Korotaev, olh

On Monday April 3, balbir@in.ibm.com wrote:
> Hi, Neil,
> 
> >
> > Cc: Jan Blunck <jblunck@suse.de>
> > Cc: Kirill Korotaev <dev@openvz.org>
> > Cc: olh@suse.de
> > Cc: bsingharora@gmail.com
> 
> Could you please make this balbir@in.ibm.com

Sure.

> 
> >
> > Signed-off-by: Neil Brown <neilb@suse.de>
> >
> <snip>
> -               /* If the dentry was recently referenced, don't free it. */
> -               if (dentry->d_flags & DCACHE_REFERENCED) {
> -                       dentry->d_flags &= ~DCACHE_REFERENCED;
> -                       list_add(&dentry->d_lru, &dentry_unused);
> -                       dentry_stat.nr_unused++;
> -                       spin_unlock(&dentry->d_lock);
> -                       continue;
> +               if (!(dentry->d_flags & DCACHE_REFERENCED) &&
> +                   (!sb || dentry->d_sb == sb)) {
> 
> Comments for the condition please. Something like
> 
> /*
>  * If the dentry is not DCACHED_REFERENCED, it is time to move it to LRU list,
>  * provided the super block is NULL (which means we are trying to reclaim memory
>  * or this dentry belongs to the same super block that we want to shrink.
>  */

Ok, thanks.  However it isn't time to "move it to the LRU list" but
rather time to "move it from the LRU list, out of the cache all
together, and through it away".

> 
> One side-effect of this check I see is
> 
> Earlier, all prune_dcache() calls would prune the dentry cache. This
> condition will cause dentries belonging only those super blocks being
> shrink'ed to be freed up. shrink_dcache_memory() will have to do the
> additional work of freeing dentries (especially for file systems like
> sysfs, procfs, etc). But the good thing is it should make the per
> super block operations faster (like unmount). IMO, this is the correct
> behaviour, but I am not sure of the side-effects.
> 

Hmm... yes, but there is a worse side-effect I think. If
shrink_dcache_memory finds a dentry that it cannot free, it will move
it to the head of the LRU, so unmount will not be able to find it so
easily, and will end up moving it back down to the tail.  I don't
think this can livelock, but it is unpleasant.

Rather than move these entries to the head of the list, I'd like to
leave them at the tail, and try to skip over entries that we might not
be able to free.



> +                       if (sb) {
> +                               prune_one_dentry(dentry);
> +                               continue;
> +                       }
> > +                       /* Need to avoid race with generic_shutdown_super */
> > +                       if (down_read_trylock(&dentry->d_sb->s_umount) &&
> > +                           dentry->d_sb->s_root != NULL) {
> 
> There is a probable bug here. What if down_read_trylock() succeeds and
> dentry->d_sb->s_root == NULL? We still need to do an up_read before we
> move on.

Oops, yes.  Thanks for that!

> The comment would be better put as
> 
> /*
>  * If we are able to acquire the umount semaphore, then the super
> block cannot be unmounted
>  * while we are pruning this dentry. This helps avoid a race condition
> that is caused due to
>  * intermediate reference counts held by the children of the dentry in
> prune_one_dentry().
>  * This leads to select_dcache_parent() ignoring those dentries,
> leaving behind non-dput
>  * dentries. The unmount happens before prune_one_dentry() can dput
> the dentries.
>  */

Well, I've beefed up the comment, but I would rather focus on the
prune_one_dentry holding an inode rather than holding a dentry.  I
think that problem is clearer and it comes to the same thing.

Patch to follow.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-04-03  3:40 ` NeilBrown
@ 2006-04-03 18:12   ` Balbir Singh
  2006-04-04  0:59     ` Neil Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Balbir Singh @ 2006-04-03 18:12 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, linux-kernel, Jan Blunck, Kirill Korotaev, olh, balbir

Hi, Neil,

>
> Cc: Jan Blunck <jblunck@suse.de>
> Cc: Kirill Korotaev <dev@openvz.org>
> Cc: olh@suse.de
> Cc: bsingharora@gmail.com

Could you please make this balbir@in.ibm.com

>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
<snip>
-               /* If the dentry was recently referenced, don't free it. */
-               if (dentry->d_flags & DCACHE_REFERENCED) {
-                       dentry->d_flags &= ~DCACHE_REFERENCED;
-                       list_add(&dentry->d_lru, &dentry_unused);
-                       dentry_stat.nr_unused++;
-                       spin_unlock(&dentry->d_lock);
-                       continue;
+               if (!(dentry->d_flags & DCACHE_REFERENCED) &&
+                   (!sb || dentry->d_sb == sb)) {

Comments for the condition please. Something like

/*
 * If the dentry is not DCACHED_REFERENCED, it is time to move it to LRU list,
 * provided the super block is NULL (which means we are trying to reclaim memory
 * or this dentry belongs to the same super block that we want to shrink.
 */

One side-effect of this check I see is

Earlier, all prune_dcache() calls would prune the dentry cache. This
condition will cause dentries belonging only those super blocks being
shrink'ed to be freed up. shrink_dcache_memory() will have to do the
additional work of freeing dentries (especially for file systems like
sysfs, procfs, etc). But the good thing is it should make the per
super block operations faster (like unmount). IMO, this is the correct
behaviour, but I am not sure of the side-effects.

+                       if (sb) {
+                               prune_one_dentry(dentry);
+                               continue;
+                       }
> +                       /* Need to avoid race with generic_shutdown_super */
> +                       if (down_read_trylock(&dentry->d_sb->s_umount) &&
> +                           dentry->d_sb->s_root != NULL) {

There is a probable bug here. What if down_read_trylock() succeeds and
dentry->d_sb->s_root == NULL? We still need to do an up_read before we
move on.
The comment would be better put as

/*
 * If we are able to acquire the umount semaphore, then the super
block cannot be unmounted
 * while we are pruning this dentry. This helps avoid a race condition
that is caused due to
 * intermediate reference counts held by the children of the dentry in
prune_one_dentry().
 * This leads to select_dcache_parent() ignoring those dentries,
leaving behind non-dput
 * dentries. The unmount happens before prune_one_dentry() can dput
the dentries.
 */

> +                               prune_one_dentry(dentry);
> +                               up_read(&dentry->d_sb->s_umount);
> +                               continue;
> +                       }
>                 }
<snip>

Looks good to go, except for the comments (especially the up_read() bug)

Balbir

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] Fix dcache race during umount
       [not found] <20060403133804.27986.patches@notabene>
@ 2006-04-03  3:40 ` NeilBrown
  2006-04-03 18:12   ` Balbir Singh
  0 siblings, 1 reply; 23+ messages in thread
From: NeilBrown @ 2006-04-03  3:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Jan Blunck, Kirill Korotaev, olh, bsingharora

I think we finally have consensus on this patch.  However please speak
up if we don't :-)

Patch is against 2.6.16-mm2

NeilBrown


### Comments for Changeset

The race is that the shrink_dcache_memory shrinker could get called
while a filesystem is being unmounted, and could try to prune
a dentry belonging to that filesystem.

If it does, then it will call in to iput on the inode while the
dentry is no longer able to be found by the umounting process.  If
iput takes a while, generic_shutdown_super could get all the way
though shrink_dcache_parent and shrink_dcache_anon and
invalidate_inodes without ever waiting on this particular inode.

Eventually the superblock gets freed anyway and if the iput tried to
touch it (which some filesystems certainly do), it will lose.  The
promised "Self-destruct in 5 seconds" doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry
on someone else's dentry.  As a down_read_trylock is used,
shrink_dcache_memory will no longer try to prune the dentry of a
filesystem that is being unmounted, and unmount will not be able to
start until any such active prune_one_dentry completes.

This requires that prune_dcache *knows* which filesystem (if any) it
is doing the prune on behalf of so that it can be careful of other
filesystems.  shrink_dcache_memory isn't called it on behalf of any
filesystem, and so is careful of everything.

shrink_dcache_anon is now passed a super_block rather than the s_anon
list out of the superblock, so it can get the s_anon list itself, and
can pass the superblock down to prune_dcache.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck <jblunck@suse.de>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: olh@suse.de
Cc: bsingharora@gmail.com

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/dcache.c            |   41 ++++++++++++++++++++++++++++-------------
 ./fs/super.c             |    2 +-
 ./include/linux/dcache.h |    2 +-
 3 files changed, 30 insertions(+), 15 deletions(-)

diff ./fs/dcache.c~current~ ./fs/dcache.c
--- ./fs/dcache.c~current~	2006-04-03 12:21:49.000000000 +1000
+++ ./fs/dcache.c	2006-04-03 12:21:55.000000000 +1000
@@ -382,6 +382,8 @@ static inline void prune_one_dentry(stru
 /**
  * prune_dcache - shrink the dcache
  * @count: number of entries to try and free
+ * @sb: if given, ignore dentries for other superblocks
+ *         which are being unmounted.
  *
  * Shrink the dcache. This is done when we need
  * more memory, or simply when we need to unmount
@@ -392,7 +394,7 @@ static inline void prune_one_dentry(stru
  * all the dentries are in use.
  */
  
-static void prune_dcache(int count)
+static void prune_dcache(int count, struct super_block *sb)
 {
 	spin_lock(&dcache_lock);
 	for (; count ; count--) {
@@ -419,15 +421,27 @@ static void prune_dcache(int count)
  			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		/* If the dentry was recently referenced, don't free it. */
-		if (dentry->d_flags & DCACHE_REFERENCED) {
-			dentry->d_flags &= ~DCACHE_REFERENCED;
- 			list_add(&dentry->d_lru, &dentry_unused);
- 			dentry_stat.nr_unused++;
- 			spin_unlock(&dentry->d_lock);
-			continue;
+		if (!(dentry->d_flags & DCACHE_REFERENCED) &&
+		    (!sb || dentry->d_sb == sb)) {
+			if (sb) {
+				prune_one_dentry(dentry);
+				continue;
+			}
+			/* Need to avoid race with generic_shutdown_super */
+			if (down_read_trylock(&dentry->d_sb->s_umount) &&
+			    dentry->d_sb->s_root != NULL) {
+				prune_one_dentry(dentry);
+				up_read(&dentry->d_sb->s_umount);
+				continue;
+			}
 		}
-		prune_one_dentry(dentry);
+		/* The dentry was recently referenced, or the filesystem
+		 * is being unmounted, so don't free it. */
+
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		list_add(&dentry->d_lru, &dentry_unused);
+		dentry_stat.nr_unused++;
+		spin_unlock(&dentry->d_lock);
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -630,7 +644,7 @@ void shrink_dcache_parent(struct dentry 
 	int found;
 
 	while ((found = select_parent(parent)) != 0)
-		prune_dcache(found);
+		prune_dcache(found, parent->d_sb);
 }
 
 /**
@@ -643,9 +657,10 @@ void shrink_dcache_parent(struct dentry 
  * done under dcache_lock.
  *
  */
-void shrink_dcache_anon(struct hlist_head *head)
+void shrink_dcache_anon(struct super_block *sb)
 {
 	struct hlist_node *lp;
+	struct hlist_head *head = &sb->s_anon;
 	int found;
 	do {
 		found = 0;
@@ -668,7 +683,7 @@ void shrink_dcache_anon(struct hlist_hea
 			}
 		}
 		spin_unlock(&dcache_lock);
-		prune_dcache(found);
+		prune_dcache(found, sb);
 	} while(found);
 }
 
@@ -689,7 +704,7 @@ static int shrink_dcache_memory(int nr, 
 	if (nr) {
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
-		prune_dcache(nr);
+		prune_dcache(nr, NULL);
 	}
 	return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
 }

diff ./fs/super.c~current~ ./fs/super.c
--- ./fs/super.c~current~	2006-04-03 12:21:49.000000000 +1000
+++ ./fs/super.c	2006-04-03 12:21:55.000000000 +1000
@@ -231,7 +231,7 @@ void generic_shutdown_super(struct super
 	if (root) {
 		sb->s_root = NULL;
 		shrink_dcache_parent(root);
-		shrink_dcache_anon(&sb->s_anon);
+		shrink_dcache_anon(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);

diff ./include/linux/dcache.h~current~ ./include/linux/dcache.h
--- ./include/linux/dcache.h~current~	2006-04-03 12:21:49.000000000 +1000
+++ ./include/linux/dcache.h	2006-04-03 12:21:55.000000000 +1000
@@ -217,7 +217,7 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct hlist_head *);
+extern void shrink_dcache_anon(struct super_block *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2006-06-27 23:18 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-06-22 12:22 [PATCH] Fix dcache race during umount David Howells
2006-06-22 14:27 ` David Howells
2006-06-22 16:08 ` Jan Blunck
2006-06-22 16:44   ` Jan Blunck
2006-06-23 13:28 ` Jan Blunck
2006-06-24  5:23 ` Neil Brown
2006-06-24  9:16 ` David Howells
2006-06-24 13:33 ` [PATCH] Destroy the dentries contributed by a superblock on unmounting David Howells
2006-06-25  6:48   ` Neil Brown
2006-06-25 16:02   ` David Howells
2006-06-25 16:30     ` Nick Piggin
2006-06-26  6:05     ` Neil Brown
2006-06-26 11:21     ` David Howells
2006-06-27  0:53       ` Neil Brown
2006-06-27 10:17     ` David Howells
2006-06-27 22:55       ` Andrew Morton
2006-06-27 23:18       ` David Howells
     [not found] <20060404110351.27364.patches@notabene>
2006-04-04  1:05 ` [PATCH] Fix dcache race during umount NeilBrown
2006-04-04  5:11   ` Balbir Singh
     [not found] <20060403133804.27986.patches@notabene>
2006-04-03  3:40 ` NeilBrown
2006-04-03 18:12   ` Balbir Singh
2006-04-04  0:59     ` Neil Brown
2006-04-04  5:02       ` Balbir Singh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).