linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 00/28] [rfc] dcache scaling part 1
@ 2010-11-16 14:09 Nick Piggin
  2010-11-16 14:09 ` [patch 01/28] fs: d_validate fixes Nick Piggin
                   ` (30 more replies)
  0 siblings, 31 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

There are 3 main parts to dcache scaling. This one primarily adds new locks
to take over dcache_lock, and some pre/post prep and streamlining patches.

The second implements fine grained locking, and is rather trivial after
part 1.

The third implements rcu-walk. rcu-walk depends on the first part, because it
relies on using d_lock to protect the state of the dentry (when converting from
rcu-walk to refcounted walk). Without the fine grained locing, we'd need to use
dcache_lock for that, which would be a step backwards to put into path walking
again.

Comments?



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 01/28] fs: d_validate fixes
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-17 10:44   ` Andi Kleen
  2010-11-18 20:51   ` David Miller
  2010-11-16 14:09 ` [patch 02/28] kernel: kmem_ptr_validate considered harmful Nick Piggin
                   ` (29 subsequent siblings)
  30 siblings, 2 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-d_validate-fix.patch --]
[-- Type: text/plain, Size: 2394 bytes --]

d_validate has been broken for a long time.

kmem_ptr_validate does not guarantee that a pointer can be dereferenced
if it can go away at any time. Even rcu_read_lock doesn't help, because
the pointer might be queued in RCU callbacks but not executed yet.

So the parent cannot be checked, nor the name hashed. The dentry pointer
can not be touched until it can be verified under lock. Hashing simply
cannot be used.

Instead, verify the parent/child relationship by traversing parent's
d_child list. It's slow, but only ncpfs and the destaged smbfs care
about it, at this point.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |   25 +++++++------------------
 1 file changed, 7 insertions(+), 18 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:11:48.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:52.000000000 +1100
@@ -1483,41 +1483,30 @@ struct dentry *d_hash_and_lookup(struct
 }
 
 /**
- * d_validate - verify dentry provided from insecure source
+ * d_validate - verify dentry provided from insecure source (deprecated)
  * @dentry: The dentry alleged to be valid child of @dparent
  * @dparent: The parent dentry (known to be valid)
  *
  * An insecure source has sent us a dentry, here we verify it and dget() it.
  * This is used by ncpfs in its readdir implementation.
  * Zero is returned in the dentry is invalid.
+ *
+ * This function is slow for big directories, and deprecated, do not use it.
  */
- 
 int d_validate(struct dentry *dentry, struct dentry *dparent)
 {
-	struct hlist_head *base;
-	struct hlist_node *lhp;
-
-	/* Check whether the ptr might be valid at all.. */
-	if (!kmem_ptr_validate(dentry_cache, dentry))
-		goto out;
-
-	if (dentry->d_parent != dparent)
-		goto out;
+	struct dentry *child;
 
 	spin_lock(&dcache_lock);
-	base = d_hash(dparent, dentry->d_name.hash);
-	hlist_for_each(lhp,base) { 
-		/* hlist_for_each_entry_rcu() not required for d_hash list
-		 * as it is parsed under dcache_lock
-		 */
-		if (dentry == hlist_entry(lhp, struct dentry, d_hash)) {
+	list_for_each_entry(child, &dparent->d_subdirs, d_u.d_child) {
+		if (dentry == child) {
 			__dget_locked(dentry);
 			spin_unlock(&dcache_lock);
 			return 1;
 		}
 	}
 	spin_unlock(&dcache_lock);
-out:
+
 	return 0;
 }
 EXPORT_SYMBOL(d_validate);



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 02/28] kernel: kmem_ptr_validate considered harmful
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
  2010-11-16 14:09 ` [patch 01/28] fs: d_validate fixes Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 03/28] fs: dcache documentation cleanup Nick Piggin
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: dcache-d_validate-broken.patch --]
[-- Type: text/plain, Size: 5807 bytes --]

This is a nasty ugly and error prone API. It's sole user, dcache, could not
get it right so there is roughly zero chance that anything else will.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 include/linux/slab.h |    2 --
 mm/slab.c            |   32 +-------------------------------
 mm/slob.c            |    5 -----
 mm/slub.c            |   40 ----------------------------------------
 mm/util.c            |   21 ---------------------
 5 files changed, 1 insertion(+), 99 deletions(-)

Index: linux-2.6/include/linux/slab.h
===================================================================
--- linux-2.6.orig/include/linux/slab.h	2010-11-17 00:10:41.000000000 +1100
+++ linux-2.6/include/linux/slab.h	2010-11-17 00:11:49.000000000 +1100
@@ -106,8 +106,6 @@ int kmem_cache_shrink(struct kmem_cache
 void kmem_cache_free(struct kmem_cache *, void *);
 unsigned int kmem_cache_size(struct kmem_cache *);
 const char *kmem_cache_name(struct kmem_cache *);
-int kern_ptr_validate(const void *ptr, unsigned long size);
-int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr);
 
 /*
  * Please use this macro to create slab caches. Simply specify the
Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2010-11-17 00:10:41.000000000 +1100
+++ linux-2.6/mm/slab.c	2010-11-17 00:11:49.000000000 +1100
@@ -2781,7 +2781,7 @@ static void slab_put_obj(struct kmem_cac
 /*
  * Map pages beginning at addr to the given cache and slab. This is required
  * for the slab allocator to be able to lookup the cache and slab of a
- * virtual address for kfree, ksize, kmem_ptr_validate, and slab debugging.
+ * virtual address for kfree, ksize, and slab debugging.
  */
 static void slab_map_pages(struct kmem_cache *cache, struct slab *slab,
 			   void *addr)
@@ -3660,36 +3660,6 @@ void *kmem_cache_alloc_notrace(struct km
 EXPORT_SYMBOL(kmem_cache_alloc_notrace);
 #endif
 
-/**
- * kmem_ptr_validate - check if an untrusted pointer might be a slab entry.
- * @cachep: the cache we're checking against
- * @ptr: pointer to validate
- *
- * This verifies that the untrusted pointer looks sane;
- * it is _not_ a guarantee that the pointer is actually
- * part of the slab cache in question, but it at least
- * validates that the pointer can be dereferenced and
- * looks half-way sane.
- *
- * Currently only used for dentry validation.
- */
-int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr)
-{
-	unsigned long size = cachep->buffer_size;
-	struct page *page;
-
-	if (unlikely(!kern_ptr_validate(ptr, size)))
-		goto out;
-	page = virt_to_page(ptr);
-	if (unlikely(!PageSlab(page)))
-		goto out;
-	if (unlikely(page_get_cache(page) != cachep))
-		goto out;
-	return 1;
-out:
-	return 0;
-}
-
 #ifdef CONFIG_NUMA
 void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2010-11-17 00:10:41.000000000 +1100
+++ linux-2.6/mm/slub.c	2010-11-17 00:24:15.000000000 +1100
@@ -1917,17 +1917,6 @@ void kmem_cache_free(struct kmem_cache *
 }
 EXPORT_SYMBOL(kmem_cache_free);
 
-/* Figure out on which slab page the object resides */
-static struct page *get_object_page(const void *x)
-{
-	struct page *page = virt_to_head_page(x);
-
-	if (!PageSlab(page))
-		return NULL;
-
-	return page;
-}
-
 /*
  * Object placement in a slab is made very easy because we always start at
  * offset 0. If we tune the size of the object to the alignment then we can
@@ -2386,35 +2375,6 @@ static int kmem_cache_open(struct kmem_c
 }
 
 /*
- * Check if a given pointer is valid
- */
-int kmem_ptr_validate(struct kmem_cache *s, const void *object)
-{
-	struct page *page;
-
-	if (!kern_ptr_validate(object, s->size))
-		return 0;
-
-	page = get_object_page(object);
-
-	if (!page || s != page->slab)
-		/* No slab or wrong slab */
-		return 0;
-
-	if (!check_valid_pointer(s, page, object))
-		return 0;
-
-	/*
-	 * We could also check if the object is on the slabs freelist.
-	 * But this would be too expensive and it seems that the main
-	 * purpose of kmem_ptr_valid() is to check if the object belongs
-	 * to a certain slab.
-	 */
-	return 1;
-}
-EXPORT_SYMBOL(kmem_ptr_validate);
-
-/*
  * Determine the size of a slab object
  */
 unsigned int kmem_cache_size(struct kmem_cache *s)
Index: linux-2.6/mm/util.c
===================================================================
--- linux-2.6.orig/mm/util.c	2010-11-17 00:10:41.000000000 +1100
+++ linux-2.6/mm/util.c	2010-11-17 00:11:49.000000000 +1100
@@ -186,27 +186,6 @@ void kzfree(const void *p)
 }
 EXPORT_SYMBOL(kzfree);
 
-int kern_ptr_validate(const void *ptr, unsigned long size)
-{
-	unsigned long addr = (unsigned long)ptr;
-	unsigned long min_addr = PAGE_OFFSET;
-	unsigned long align_mask = sizeof(void *) - 1;
-
-	if (unlikely(addr < min_addr))
-		goto out;
-	if (unlikely(addr > (unsigned long)high_memory - size))
-		goto out;
-	if (unlikely(addr & align_mask))
-		goto out;
-	if (unlikely(!kern_addr_valid(addr)))
-		goto out;
-	if (unlikely(!kern_addr_valid(addr + size - 1)))
-		goto out;
-	return 1;
-out:
-	return 0;
-}
-
 /*
  * strndup_user - duplicate an existing string from user space
  * @s: The string to duplicate
Index: linux-2.6/mm/slob.c
===================================================================
--- linux-2.6.orig/mm/slob.c	2010-11-17 00:10:41.000000000 +1100
+++ linux-2.6/mm/slob.c	2010-11-17 00:11:49.000000000 +1100
@@ -678,11 +678,6 @@ int kmem_cache_shrink(struct kmem_cache
 }
 EXPORT_SYMBOL(kmem_cache_shrink);
 
-int kmem_ptr_validate(struct kmem_cache *a, const void *b)
-{
-	return 0;
-}
-
 static unsigned int slob_ready __read_mostly;
 
 int slab_is_available(void)



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 03/28] fs: dcache documentation cleanup
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
  2010-11-16 14:09 ` [patch 01/28] fs: d_validate fixes Nick Piggin
  2010-11-16 14:09 ` [patch 02/28] kernel: kmem_ptr_validate considered harmful Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 04/28] fs: change d_delete semantics Nick Piggin
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache-lock-doc-remove.patch --]
[-- Type: text/plain, Size: 1509 bytes --]

Remove redundant (and incorrect, since dcache RCU lookup) dentry locking
documentation and point to the canonical document.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 include/linux/dcache.h |   18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:50:55.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:51.000000000 +1100
@@ -141,22 +141,16 @@ struct dentry_operations {
 	char *(*d_dname)(struct dentry *, char *, int);
 };
 
-/* the dentry parameter passed to d_hash and d_compare is the parent
+/*
+ * Locking rules for dentry_operations callbacks are to be found in
+ * Documentation/filesystems/Locking. Keep it updated!
+ *
+ * the dentry parameter passed to d_hash and d_compare is the parent
  * directory of the entries to be compared. It is used in case these
  * functions need any directory specific information for determining
  * equivalency classes.  Using the dentry itself might not work, as it
  * might be a negative dentry which has no information associated with
- * it */
-
-/*
-locking rules:
-		big lock	dcache_lock	d_lock   may block
-d_revalidate:	no		no		no       yes
-d_hash		no		no		no       yes
-d_compare:	no		yes		yes      no
-d_delete:	no		yes		no       no
-d_release:	no		no		no       yes
-d_iput:		no		no		no       yes
+ * it.
  */
 
 /* d_flags entries */



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 04/28] fs: change d_delete semantics
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (2 preceding siblings ...)
  2010-11-16 14:09 ` [patch 03/28] fs: dcache documentation cleanup Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-17  0:16   ` Tim Pepper
  2010-11-16 14:09 ` [patch 05/28] cifs: dont overwrite dentry name in d_revalidate Nick Piggin
                   ` (26 subsequent siblings)
  30 siblings, 1 reply; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-d_delete-change.patch --]
[-- Type: text/plain, Size: 16184 bytes --]

Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 Documentation/filesystems/porting |    9 +++++++++
 Documentation/filesystems/vfs.txt |   27 +++++++++++++--------------
 arch/ia64/kernel/perfmon.c        |    2 +-
 fs/9p/vfs_dentry.c                |    4 ++--
 fs/afs/dir.c                      |    4 ++--
 fs/btrfs/inode.c                  |    2 +-
 fs/coda/dir.c                     |    4 ++--
 fs/configfs/dir.c                 |    2 +-
 fs/dcache.c                       |    2 --
 fs/gfs2/dentry.c                  |    2 +-
 fs/hostfs/hostfs_kern.c           |    2 +-
 fs/libfs.c                        |    2 +-
 fs/ncpfs/dir.c                    |    4 ++--
 fs/nfs/dir.c                      |    2 +-
 fs/proc/base.c                    |    2 +-
 fs/proc/generic.c                 |    2 +-
 fs/proc/proc_sysctl.c             |    2 +-
 fs/sysfs/dir.c                    |    2 +-
 include/linux/dcache.h            |    6 +++---
 net/sunrpc/rpc_pipe.c             |    2 +-
 20 files changed, 45 insertions(+), 39 deletions(-)

Index: linux-2.6/Documentation/filesystems/porting
===================================================================
--- linux-2.6.orig/Documentation/filesystems/porting	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/Documentation/filesystems/porting	2010-11-17 01:05:50.000000000 +1100
@@ -318,3 +318,12 @@ if it's zero is not *and* *never* *had*
 may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
 free the on-disk inode, you may end up doing that while ->write_inode() is writing
 to it.
+
+---
+[mandatory]
+
+	.d_delete() now only advises the dcache as to whether or not to cache
+unreferenced dentries, and is now only called when the dentry refcount goes to
+0. Even on 0 refcount transition, it must be able to tolerate being called 0,
+1, or more times (eg. constant, idempotent).
+
Index: linux-2.6/Documentation/filesystems/vfs.txt
===================================================================
--- linux-2.6.orig/Documentation/filesystems/vfs.txt	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/Documentation/filesystems/vfs.txt	2010-11-17 01:05:50.000000000 +1100
@@ -841,9 +841,9 @@ the VFS uses a default. As of kernel 2.6
 
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, struct nameidata *);
-	int (*d_hash) (struct dentry *, struct qstr *);
-	int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
-	int (*d_delete)(struct dentry *);
+	int (*d_hash)(struct dentry *, struct qstr *);
+	int (*d_compare)(struct dentry *, struct qstr *, struct qstr *);
+	int (*d_delete)(const struct dentry *);
 	void (*d_release)(struct dentry *);
 	void (*d_iput)(struct dentry *, struct inode *);
 	char *(*d_dname)(struct dentry *, char *, int);
@@ -858,9 +858,11 @@ struct dentry_operations {
 
   d_compare: called when a dentry should be compared with another
 
-  d_delete: called when the last reference to a dentry is
-	deleted. This means no-one is using the dentry, however it is
-	still valid and in the dcache
+  d_delete: called when the last reference to a dentry is dropped and the
+	dcache is deciding whether or not to cache it. Return 1 to delete
+	immediately, or 0 to cache the dentry. Default is NULL which means to
+	always cache a reachable dentry. d_delete must be constant and
+	idempotent.
 
   d_release: called when a dentry is really deallocated
 
@@ -904,14 +906,11 @@ There are a number of functions defined
 	the usage count)
 
   dput: close a handle for a dentry (decrements the usage count). If
-	the usage count drops to 0, the "d_delete" method is called
-	and the dentry is placed on the unused list if the dentry is
-	still in its parents hash list. Putting the dentry on the
-	unused list just means that if the system needs some RAM, it
-	goes through the unused list of dentries and deallocates them.
-	If the dentry has already been unhashed and the usage count
-	drops to 0, in this case the dentry is deallocated after the
-	"d_delete" method is called
+	the usage count drops to 0, and the dentry is still in its
+	parent's hash, the "d_delete" method is called to check whether
+	it should be cached. If it should not be cached, or if the dentry
+	is not hashed, it is deleted. Otherwise cached dentries are put
+	into an LRU list to be reclaimed on memory shortage.
 
   d_drop: this unhashes a dentry from its parents hash list. A
 	subsequent call to dput() will deallocate the dentry if its
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:50.000000000 +1100
@@ -446,8 +446,6 @@ static void prune_one_dentry(struct dent
 		if (!atomic_dec_and_lock(&dentry->d_count, &dentry->d_lock))
 			return;
 
-		if (dentry->d_op && dentry->d_op->d_delete)
-			dentry->d_op->d_delete(dentry);
 		dentry_lru_del(dentry);
 		__d_drop(dentry);
 		dentry = d_kill(dentry);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:04.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:50.000000000 +1100
@@ -133,9 +133,9 @@ enum dentry_d_lock_class
 
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, struct nameidata *);
-	int (*d_hash) (struct dentry *, struct qstr *);
-	int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
-	int (*d_delete)(struct dentry *);
+	int (*d_hash)(struct dentry *, struct qstr *);
+	int (*d_compare)(struct dentry *, struct qstr *, struct qstr *);
+	int (*d_delete)(const struct dentry *);
 	void (*d_release)(struct dentry *);
 	void (*d_iput)(struct dentry *, struct inode *);
 	char *(*d_dname)(struct dentry *, char *, int);
Index: linux-2.6/arch/ia64/kernel/perfmon.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/perfmon.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/arch/ia64/kernel/perfmon.c	2010-11-17 00:52:37.000000000 +1100
@@ -2185,7 +2185,7 @@ static const struct file_operations pfm_
 };
 
 static int
-pfmfs_delete_dentry(struct dentry *dentry)
+pfmfs_delete_dentry(const struct dentry *dentry)
 {
 	return 1;
 }
Index: linux-2.6/fs/9p/vfs_dentry.c
===================================================================
--- linux-2.6.orig/fs/9p/vfs_dentry.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/9p/vfs_dentry.c	2010-11-17 00:52:37.000000000 +1100
@@ -51,7 +51,7 @@
  *
  */
 
-static int v9fs_dentry_delete(struct dentry *dentry)
+static int v9fs_dentry_delete(const struct dentry *dentry)
 {
 	P9_DPRINTK(P9_DEBUG_VFS, " dentry: %s (%p)\n", dentry->d_name.name,
 									dentry);
@@ -68,7 +68,7 @@ static int v9fs_dentry_delete(struct den
  *
  */
 
-static int v9fs_cached_dentry_delete(struct dentry *dentry)
+static int v9fs_cached_dentry_delete(const struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
 	P9_DPRINTK(P9_DEBUG_VFS, " dentry: %s (%p)\n", dentry->d_name.name,
Index: linux-2.6/fs/afs/dir.c
===================================================================
--- linux-2.6.orig/fs/afs/dir.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/afs/dir.c	2010-11-17 00:52:37.000000000 +1100
@@ -23,7 +23,7 @@ static struct dentry *afs_lookup(struct
 static int afs_dir_open(struct inode *inode, struct file *file);
 static int afs_readdir(struct file *file, void *dirent, filldir_t filldir);
 static int afs_d_revalidate(struct dentry *dentry, struct nameidata *nd);
-static int afs_d_delete(struct dentry *dentry);
+static int afs_d_delete(const struct dentry *dentry);
 static void afs_d_release(struct dentry *dentry);
 static int afs_lookup_filldir(void *_cookie, const char *name, int nlen,
 				  loff_t fpos, u64 ino, unsigned dtype);
@@ -730,7 +730,7 @@ static int afs_d_revalidate(struct dentr
  * - called from dput() when d_count is going to 0.
  * - return 1 to request dentry be unhashed, 0 otherwise
  */
-static int afs_d_delete(struct dentry *dentry)
+static int afs_d_delete(const struct dentry *dentry)
 {
 	_enter("%s", dentry->d_name.name);
 
Index: linux-2.6/fs/btrfs/inode.c
===================================================================
--- linux-2.6.orig/fs/btrfs/inode.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/btrfs/inode.c	2010-11-17 00:52:37.000000000 +1100
@@ -4127,7 +4127,7 @@ struct inode *btrfs_lookup_dentry(struct
 	return inode;
 }
 
-static int btrfs_dentry_delete(struct dentry *dentry)
+static int btrfs_dentry_delete(const struct dentry *dentry)
 {
 	struct btrfs_root *root;
 
Index: linux-2.6/fs/coda/dir.c
===================================================================
--- linux-2.6.orig/fs/coda/dir.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/coda/dir.c	2010-11-17 01:05:46.000000000 +1100
@@ -47,7 +47,7 @@ static int coda_readdir(struct file *fil
 
 /* dentry ops */
 static int coda_dentry_revalidate(struct dentry *de, struct nameidata *nd);
-static int coda_dentry_delete(struct dentry *);
+static int coda_dentry_delete(const struct dentry *);
 
 /* support routines */
 static int coda_venus_readdir(struct file *coda_file, void *buf,
@@ -577,7 +577,7 @@ static int coda_dentry_revalidate(struct
  * This is the callback from dput() when d_count is going to 0.
  * We use this to unhash dentries with bad inodes.
  */
-static int coda_dentry_delete(struct dentry * dentry)
+static int coda_dentry_delete(const struct dentry * dentry)
 {
 	int flags;
 
Index: linux-2.6/fs/configfs/dir.c
===================================================================
--- linux-2.6.orig/fs/configfs/dir.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/configfs/dir.c	2010-11-17 01:05:46.000000000 +1100
@@ -67,7 +67,7 @@ static void configfs_d_iput(struct dentr
  * We _must_ delete our dentries on last dput, as the chain-to-parent
  * behavior is required to clear the parents of default_groups.
  */
-static int configfs_d_delete(struct dentry *dentry)
+static int configfs_d_delete(const struct dentry *dentry)
 {
 	return 1;
 }
Index: linux-2.6/fs/gfs2/dentry.c
===================================================================
--- linux-2.6.orig/fs/gfs2/dentry.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/gfs2/dentry.c	2010-11-17 01:05:49.000000000 +1100
@@ -106,7 +106,7 @@ static int gfs2_dhash(struct dentry *den
 	return 0;
 }
 
-static int gfs2_dentry_delete(struct dentry *dentry)
+static int gfs2_dentry_delete(const struct dentry *dentry)
 {
 	struct gfs2_inode *ginode;
 
Index: linux-2.6/fs/hostfs/hostfs_kern.c
===================================================================
--- linux-2.6.orig/fs/hostfs/hostfs_kern.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/hostfs/hostfs_kern.c	2010-11-17 01:05:48.000000000 +1100
@@ -32,7 +32,7 @@ static inline struct hostfs_inode_info *
 
 #define FILE_HOSTFS_I(file) HOSTFS_I((file)->f_path.dentry->d_inode)
 
-static int hostfs_d_delete(struct dentry *dentry)
+static int hostfs_d_delete(const struct dentry *dentry)
 {
 	return 1;
 }
Index: linux-2.6/fs/libfs.c
===================================================================
--- linux-2.6.orig/fs/libfs.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/libfs.c	2010-11-17 01:05:45.000000000 +1100
@@ -37,7 +37,7 @@ int simple_statfs(struct dentry *dentry,
  * Retaining negative dentries for an in-memory filesystem just wastes
  * memory and lookup time: arrange for them to be deleted immediately.
  */
-static int simple_delete_dentry(struct dentry *dentry)
+static int simple_delete_dentry(const struct dentry *dentry)
 {
 	return 1;
 }
Index: linux-2.6/fs/ncpfs/dir.c
===================================================================
--- linux-2.6.orig/fs/ncpfs/dir.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/ncpfs/dir.c	2010-11-17 01:05:50.000000000 +1100
@@ -77,7 +77,7 @@ const struct inode_operations ncp_dir_in
 static int ncp_lookup_validate(struct dentry *, struct nameidata *);
 static int ncp_hash_dentry(struct dentry *, struct qstr *);
 static int ncp_compare_dentry (struct dentry *, struct qstr *, struct qstr *);
-static int ncp_delete_dentry(struct dentry *);
+static int ncp_delete_dentry(const struct dentry *);
 
 static const struct dentry_operations ncp_dentry_operations =
 {
@@ -163,7 +163,7 @@ ncp_compare_dentry(struct dentry *dentry
  * Closing files can be safely postponed until iput() - it's done there anyway.
  */
 static int
-ncp_delete_dentry(struct dentry * dentry)
+ncp_delete_dentry(const struct dentry * dentry)
 {
 	struct inode *inode = dentry->d_inode;
 
Index: linux-2.6/fs/nfs/dir.c
===================================================================
--- linux-2.6.orig/fs/nfs/dir.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/nfs/dir.c	2010-11-17 01:05:47.000000000 +1100
@@ -1091,7 +1091,7 @@ static int nfs_lookup_revalidate(struct
 /*
  * This is called from dput() when d_count is going to 0.
  */
-static int nfs_dentry_delete(struct dentry *dentry)
+static int nfs_dentry_delete(const struct dentry *dentry)
 {
 	dfprintk(VFS, "NFS: dentry_delete(%s/%s, %x)\n",
 		dentry->d_parent->d_name.name, dentry->d_name.name,
Index: linux-2.6/fs/proc/base.c
===================================================================
--- linux-2.6.orig/fs/proc/base.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/proc/base.c	2010-11-17 00:52:37.000000000 +1100
@@ -1744,7 +1744,7 @@ static int pid_revalidate(struct dentry
 	return 0;
 }
 
-static int pid_delete_dentry(struct dentry * dentry)
+static int pid_delete_dentry(const struct dentry * dentry)
 {
 	/* Is the task we represent dead?
 	 * If so, then don't put the dentry on the lru list,
Index: linux-2.6/fs/proc/generic.c
===================================================================
--- linux-2.6.orig/fs/proc/generic.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/proc/generic.c	2010-11-17 00:52:37.000000000 +1100
@@ -400,7 +400,7 @@ static const struct inode_operations pro
  * smarter: we could keep a "volatile" flag in the 
  * inode to indicate which ones to keep.
  */
-static int proc_delete_dentry(struct dentry * dentry)
+static int proc_delete_dentry(const struct dentry * dentry)
 {
 	return 1;
 }
Index: linux-2.6/fs/proc/proc_sysctl.c
===================================================================
--- linux-2.6.orig/fs/proc/proc_sysctl.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/proc/proc_sysctl.c	2010-11-17 01:05:50.000000000 +1100
@@ -392,7 +392,7 @@ static int proc_sys_revalidate(struct de
 	return !PROC_I(dentry->d_inode)->sysctl->unregistering;
 }
 
-static int proc_sys_delete(struct dentry *dentry)
+static int proc_sys_delete(const struct dentry *dentry)
 {
 	return !!PROC_I(dentry->d_inode)->sysctl->unregistering;
 }
Index: linux-2.6/fs/sysfs/dir.c
===================================================================
--- linux-2.6.orig/fs/sysfs/dir.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/sysfs/dir.c	2010-11-17 00:52:37.000000000 +1100
@@ -231,7 +231,7 @@ void release_sysfs_dirent(struct sysfs_d
 		goto repeat;
 }
 
-static int sysfs_dentry_delete(struct dentry *dentry)
+static int sysfs_dentry_delete(const struct dentry *dentry)
 {
 	struct sysfs_dirent *sd = dentry->d_fsdata;
 	return !!(sd->s_flags & SYSFS_FLAG_REMOVED);
Index: linux-2.6/net/sunrpc/rpc_pipe.c
===================================================================
--- linux-2.6.orig/net/sunrpc/rpc_pipe.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/net/sunrpc/rpc_pipe.c	2010-11-17 00:52:37.000000000 +1100
@@ -430,7 +430,7 @@ void rpc_put_mount(void)
 }
 EXPORT_SYMBOL_GPL(rpc_put_mount);
 
-static int rpc_delete_dentry(struct dentry *dentry)
+static int rpc_delete_dentry(const struct dentry *dentry)
 {
 	return 1;
 }



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 05/28] cifs: dont overwrite dentry name in d_revalidate
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (3 preceding siblings ...)
  2010-11-16 14:09 ` [patch 04/28] fs: change d_delete semantics Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 06/28] jfs: " Nick Piggin
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: cifs-d_compare-constify.patch --]
[-- Type: text/plain, Size: 2280 bytes --]

Use vfat's method for dealing with negative dentries to preserve case,
rather than overwrite dentry name in d_revalidate, which is a bit ugly
and also gets in the way of doing lock-free path walking.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/cifs/dir.c |   43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

Index: linux-2.6/fs/cifs/dir.c
===================================================================
--- linux-2.6.orig/fs/cifs/dir.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/cifs/dir.c	2010-11-17 01:05:50.000000000 +1100
@@ -656,22 +656,34 @@ cifs_lookup(struct inode *parent_dir_ino
 static int
 cifs_d_revalidate(struct dentry *direntry, struct nameidata *nd)
 {
-	int isValid = 1;
-
 	if (direntry->d_inode) {
 		if (cifs_revalidate_dentry(direntry))
 			return 0;
-	} else {
-		cFYI(1, "neg dentry 0x%p name = %s",
-			 direntry, direntry->d_name.name);
-		if (time_after(jiffies, direntry->d_time + HZ) ||
-			!lookupCacheEnabled) {
-			d_drop(direntry);
-			isValid = 0;
-		}
+		else
+			return 1;
+	}
+
+	/*
+	 * This may be nfsd (or something), anyway, we can't see the
+	 * intent of this. So, since this can be for creation, drop it.
+	 */
+	if (!nd)
+		return 0;
+
+	/*
+	 * Drop the negative dentry, in order to make sure to use the
+	 * case sensitive name which is specified by user if this is
+	 * for creation.
+	 */
+	if (!(nd->flags & (LOOKUP_CONTINUE | LOOKUP_PARENT))) {
+		if (nd->flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
+			return 0;
 	}
 
-	return isValid;
+	if (time_after(jiffies, direntry->d_time + HZ) || !lookupCacheEnabled)
+		return 0;
+
+	return 1;
 }
 
 /* static int cifs_d_delete(struct dentry *direntry)
@@ -709,15 +721,8 @@ static int cifs_ci_compare(struct dentry
 	struct nls_table *codepage = CIFS_SB(dentry->d_inode->i_sb)->local_nls;
 
 	if ((a->len == b->len) &&
-	    (nls_strnicmp(codepage, a->name, b->name, a->len) == 0)) {
-		/*
-		 * To preserve case, don't let an existing negative dentry's
-		 * case take precedence.  If a is not a negative dentry, this
-		 * should have no side effects
-		 */
-		memcpy((void *)a->name, b->name, a->len);
+	    (nls_strnicmp(codepage, a->name, b->name, a->len) == 0))
 		return 0;
-	}
 	return 1;
 }
 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 06/28] jfs: dont overwrite dentry name in d_revalidate
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (4 preceding siblings ...)
  2010-11-16 14:09 ` [patch 05/28] cifs: dont overwrite dentry name in d_revalidate Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 07/28] fs: change d_compare for rcu-walk Nick Piggin
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: jfs-d_compare-constify.patch --]
[-- Type: text/plain, Size: 2412 bytes --]

Use vfat's method for dealing with negative dentries to preserve case,
rather than overwrite dentry name in d_revalidate, which is a bit ugly
and also gets in the way of doing lock-free path walking.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/jfs/namei.c |   43 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 35 insertions(+), 8 deletions(-)

Index: linux-2.6/fs/jfs/namei.c
===================================================================
--- linux-2.6.orig/fs/jfs/namei.c	2010-11-17 00:50:54.000000000 +1100
+++ linux-2.6/fs/jfs/namei.c	2010-11-17 01:05:50.000000000 +1100
@@ -18,6 +18,7 @@
  */
 
 #include <linux/fs.h>
+#include <linux/namei.h>
 #include <linux/ctype.h>
 #include <linux/quotaops.h>
 #include <linux/exportfs.h>
@@ -1597,21 +1598,47 @@ static int jfs_ci_compare(struct dentry
 			goto out;
 	}
 	result = 0;
+out:
+	return result;
+}
 
+static int jfs_ci_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
 	/*
-	 * We want creates to preserve case.  A negative dentry, a, that
-	 * has a different case than b may cause a new entry to be created
-	 * with the wrong case.  Since we can't tell if a comes from a negative
-	 * dentry, we blindly replace it with b.  This should be harmless if
-	 * a is not a negative dentry.
+	 * This is not negative dentry. Always valid.
+	 *
+	 * Note, rename() to existing directory entry will have ->d_inode,
+	 * and will use existing name which isn't specified name by user.
+	 *
+	 * We may be able to drop this positive dentry here. But dropping
+	 * positive dentry isn't good idea. So it's unsupported like
+	 * rename("filename", "FILENAME") for now.
 	 */
-	memcpy((unsigned char *)a->name, b->name, a->len);
-out:
-	return result;
+	if (dentry->d_inode)
+		return 1;
+
+	/*
+	 * This may be nfsd (or something), anyway, we can't see the
+	 * intent of this. So, since this can be for creation, drop it.
+	 */
+	if (!nd)
+		return 0;
+
+	/*
+	 * Drop the negative dentry, in order to make sure to use the
+	 * case sensitive name which is specified by user if this is
+	 * for creation.
+	 */
+	if (!(nd->flags & (LOOKUP_CONTINUE | LOOKUP_PARENT))) {
+		if (nd->flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET))
+			return 0;
+	}
+	return 1;
 }
 
 const struct dentry_operations jfs_ci_dentry_operations =
 {
 	.d_hash = jfs_ci_hash,
 	.d_compare = jfs_ci_compare,
+	.d_revalidate = jfs_ci_revalidate,
 };



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 07/28] fs: change d_compare for rcu-walk
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (5 preceding siblings ...)
  2010-11-16 14:09 ` [patch 06/28] jfs: " Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-17  0:44   ` Tim Pepper
  2010-11-16 14:09 ` [patch 08/28] fs: change d_hash " Nick Piggin
                   ` (23 subsequent siblings)
  30 siblings, 1 reply; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-d_compare-change.patch --]
[-- Type: text/plain, Size: 34271 bytes --]

Change d_compare so it may be called from lock-free RCU lookups. This
does put significant restrictions on what may be done from the callback,
however there don't seem to have been any problems with in-tree fses.
If some strange use case pops up that _really_ cannot cope with the
rcu-walk rules, we can just add new rcu-unaware callbacks, which would
cause name lookup to drop out of rcu-walk mode.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 Documentation/filesystems/Locking |    4 +
 Documentation/filesystems/porting |    7 +++
 Documentation/filesystems/vfs.txt |   25 +++++++++-
 fs/adfs/dir.c                     |    8 ++-
 fs/affs/namei.c                   |   44 +++++++++++--------
 fs/cifs/dir.c                     |   11 ++--
 fs/dcache.c                       |    4 +
 fs/fat/namei_msdos.c              |   15 +++---
 fs/fat/namei_vfat.c               |   39 +++++++++++-----
 fs/hfs/hfs_fs.h                   |    4 +
 fs/hfs/string.c                   |   14 +++---
 fs/hfsplus/hfsplus_fs.h           |    4 +
 fs/hfsplus/unicode.c              |   14 +++---
 fs/hpfs/dentry.c                  |   21 +++++----
 fs/isofs/inode.c                  |   88 ++++++++++++++++++--------------------
 fs/isofs/namei.c                  |    3 -
 fs/jfs/namei.c                    |   10 ++--
 fs/ncpfs/dir.c                    |   29 ++++++++----
 fs/ncpfs/ncplib_kernel.h          |    8 +--
 fs/proc/proc_sysctl.c             |   12 ++---
 include/linux/dcache.h            |   12 ++---
 include/linux/ncp_fs.h            |    4 -
 22 files changed, 228 insertions(+), 152 deletions(-)

Index: linux-2.6/fs/adfs/dir.c
===================================================================
--- linux-2.6.orig/fs/adfs/dir.c	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/adfs/dir.c	2010-11-17 01:05:49.000000000 +1100
@@ -237,17 +237,19 @@ adfs_hash(struct dentry *parent, struct
  * requirements of the underlying filesystem.
  */
 static int
-adfs_compare(struct dentry *parent, struct qstr *entry, struct qstr *name)
+adfs_compare(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
 	int i;
 
-	if (entry->len != name->len)
+	if (len != name->len)
 		return 1;
 
 	for (i = 0; i < name->len; i++) {
 		char a, b;
 
-		a = entry->name[i];
+		a = str[i];
 		b = name->name[i];
 
 		if (a >= 'A' && a <= 'Z')
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:49.000000000 +1100
@@ -1433,7 +1433,9 @@ struct dentry * __d_lookup(struct dentry
 		 */
 		qstr = &dentry->d_name;
 		if (parent->d_op && parent->d_op->d_compare) {
-			if (parent->d_op->d_compare(parent, qstr, name))
+			if (parent->d_op->d_compare(parent,
+						dentry, dentry->d_inode,
+						qstr->len, qstr->name, name))
 				goto next;
 		} else {
 			if (qstr->len != len)
Index: linux-2.6/fs/fat/namei_msdos.c
===================================================================
--- linux-2.6.orig/fs/fat/namei_msdos.c	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/fat/namei_msdos.c	2010-11-17 01:05:49.000000000 +1100
@@ -164,16 +164,19 @@ static int msdos_hash(struct dentry *den
  * Compare two msdos names. If either of the names are invalid,
  * we fall back to doing the standard name comparison.
  */
-static int msdos_cmp(struct dentry *dentry, struct qstr *a, struct qstr *b)
+static int msdos_cmp(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str,
+		const struct qstr *name)
 {
-	struct fat_mount_options *options = &MSDOS_SB(dentry->d_sb)->options;
+	struct fat_mount_options *options = &MSDOS_SB(parent->d_sb)->options;
 	unsigned char a_msdos_name[MSDOS_NAME], b_msdos_name[MSDOS_NAME];
 	int error;
 
-	error = msdos_format_name(a->name, a->len, a_msdos_name, options);
+	error = msdos_format_name(name->name, name->len, a_msdos_name, options);
 	if (error)
 		goto old_compare;
-	error = msdos_format_name(b->name, b->len, b_msdos_name, options);
+	error = msdos_format_name(str, len, b_msdos_name, options);
 	if (error)
 		goto old_compare;
 	error = memcmp(a_msdos_name, b_msdos_name, MSDOS_NAME);
@@ -182,8 +185,8 @@ static int msdos_cmp(struct dentry *dent
 
 old_compare:
 	error = 1;
-	if (a->len == b->len)
-		error = memcmp(a->name, b->name, a->len);
+	if (name->len == len)
+		error = memcmp(name->name, str, len);
 	goto out;
 }
 
Index: linux-2.6/fs/fat/namei_vfat.c
===================================================================
--- linux-2.6.orig/fs/fat/namei_vfat.c	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/fat/namei_vfat.c	2010-11-17 01:05:49.000000000 +1100
@@ -85,15 +85,18 @@ static int vfat_revalidate_ci(struct den
 }
 
 /* returns the length of a struct qstr, ignoring trailing dots */
-static unsigned int vfat_striptail_len(struct qstr *qstr)
+static unsigned int __vfat_striptail_len(unsigned int len, const char *name)
 {
-	unsigned int len = qstr->len;
-
-	while (len && qstr->name[len - 1] == '.')
+	while (len && name[len - 1] == '.')
 		len--;
 	return len;
 }
 
+static unsigned int vfat_striptail_len(const struct qstr *qstr)
+{
+	return __vfat_striptail_len(qstr->len, qstr->name);
+}
+
 /*
  * Compute the hash for the vfat name corresponding to the dentry.
  * Note: if the name is invalid, we leave the hash code unchanged so
@@ -133,16 +136,18 @@ static int vfat_hashi(struct dentry *den
 /*
  * Case insensitive compare of two vfat names.
  */
-static int vfat_cmpi(struct dentry *dentry, struct qstr *a, struct qstr *b)
+static int vfat_cmpi(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	struct nls_table *t = MSDOS_SB(dentry->d_inode->i_sb)->nls_io;
+	struct nls_table *t = MSDOS_SB(inode->i_sb)->nls_io;
 	unsigned int alen, blen;
 
 	/* A filename cannot end in '.' or we treat it like it has none */
-	alen = vfat_striptail_len(a);
-	blen = vfat_striptail_len(b);
+	alen = vfat_striptail_len(name);
+	blen = __vfat_striptail_len(len, str);
 	if (alen == blen) {
-		if (nls_strnicmp(t, a->name, b->name, alen) == 0)
+		if (nls_strnicmp(t, name->name, str, alen) == 0)
 			return 0;
 	}
 	return 1;
@@ -151,15 +156,17 @@ static int vfat_cmpi(struct dentry *dent
 /*
  * Case sensitive compare of two vfat names.
  */
-static int vfat_cmp(struct dentry *dentry, struct qstr *a, struct qstr *b)
+static int vfat_cmp(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
 	unsigned int alen, blen;
 
 	/* A filename cannot end in '.' or we treat it like it has none */
-	alen = vfat_striptail_len(a);
-	blen = vfat_striptail_len(b);
+	alen = vfat_striptail_len(name);
+	blen = __vfat_striptail_len(len, str);
 	if (alen == blen) {
-		if (strncmp(a->name, b->name, alen) == 0)
+		if (strncmp(name->name, str, alen) == 0)
 			return 0;
 	}
 	return 1;
@@ -780,6 +787,12 @@ static int vfat_create(struct inode *dir
 	struct timespec ts;
 	int err;
 
+	/*
+	 * To preserve case, don't let an existing negative dentry's case
+	 * take precedence.
+	 */
+	memcpy((void *)dentry->d_name.name, nd->last.name, dentry->d_name.len);
+
 	lock_super(sb);
 
 	ts = CURRENT_TIME_SEC;
Index: linux-2.6/fs/isofs/inode.c
===================================================================
--- linux-2.6.orig/fs/isofs/inode.c	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/isofs/inode.c	2010-11-17 01:05:49.000000000 +1100
@@ -28,14 +28,22 @@
 
 static int isofs_hashi(struct dentry *parent, struct qstr *qstr);
 static int isofs_hash(struct dentry *parent, struct qstr *qstr);
-static int isofs_dentry_cmpi(struct dentry *dentry, struct qstr *a, struct qstr *b);
-static int isofs_dentry_cmp(struct dentry *dentry, struct qstr *a, struct qstr *b);
+static int isofs_dentry_cmpi(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name);
+static int isofs_dentry_cmp(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name);
 
 #ifdef CONFIG_JOLIET
 static int isofs_hashi_ms(struct dentry *parent, struct qstr *qstr);
 static int isofs_hash_ms(struct dentry *parent, struct qstr *qstr);
-static int isofs_dentry_cmpi_ms(struct dentry *dentry, struct qstr *a, struct qstr *b);
-static int isofs_dentry_cmp_ms(struct dentry *dentry, struct qstr *a, struct qstr *b);
+static int isofs_dentry_cmpi_ms(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name);
+static int isofs_dentry_cmp_ms(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name);
 #endif
 
 static void isofs_put_super(struct super_block *sb)
@@ -206,49 +214,31 @@ isofs_hashi_common(struct dentry *dentry
 }
 
 /*
- * Case insensitive compare of two isofs names.
+ * Compare of two isofs names.
  */
-static int isofs_dentry_cmpi_common(struct dentry *dentry, struct qstr *a,
-				struct qstr *b, int ms)
+static int isofs_dentry_cmp_common(
+		unsigned int len, const char *str,
+		const struct qstr *name, int ms, int ci)
 {
 	int alen, blen;
 
 	/* A filename cannot end in '.' or we treat it like it has none */
-	alen = a->len;
-	blen = b->len;
+	alen = name->len;
+	blen = len;
 	if (ms) {
-		while (alen && a->name[alen-1] == '.')
+		while (alen && name->name[alen-1] == '.')
 			alen--;
-		while (blen && b->name[blen-1] == '.')
+		while (blen && str[blen-1] == '.')
 			blen--;
 	}
 	if (alen == blen) {
-		if (strnicmp(a->name, b->name, alen) == 0)
-			return 0;
-	}
-	return 1;
-}
-
-/*
- * Case sensitive compare of two isofs names.
- */
-static int isofs_dentry_cmp_common(struct dentry *dentry, struct qstr *a,
-					struct qstr *b, int ms)
-{
-	int alen, blen;
-
-	/* A filename cannot end in '.' or we treat it like it has none */
-	alen = a->len;
-	blen = b->len;
-	if (ms) {
-		while (alen && a->name[alen-1] == '.')
-			alen--;
-		while (blen && b->name[blen-1] == '.')
-			blen--;
-	}
-	if (alen == blen) {
-		if (strncmp(a->name, b->name, alen) == 0)
-			return 0;
+		if (ci) {
+			if (strnicmp(name->name, str, alen) == 0)
+				return 0;
+		} else {
+			if (strncmp(name->name, str, alen) == 0)
+				return 0;
+		}
 	}
 	return 1;
 }
@@ -266,15 +256,19 @@ isofs_hashi(struct dentry *dentry, struc
 }
 
 static int
-isofs_dentry_cmp(struct dentry *dentry,struct qstr *a,struct qstr *b)
+isofs_dentry_cmp(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	return isofs_dentry_cmp_common(dentry, a, b, 0);
+	return isofs_dentry_cmp_common(len, str, name, 0, 0);
 }
 
 static int
-isofs_dentry_cmpi(struct dentry *dentry,struct qstr *a,struct qstr *b)
+isofs_dentry_cmpi(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	return isofs_dentry_cmpi_common(dentry, a, b, 0);
+	return isofs_dentry_cmp_common(len, str, name, 0, 1);
 }
 
 #ifdef CONFIG_JOLIET
@@ -291,15 +285,19 @@ isofs_hashi_ms(struct dentry *dentry, st
 }
 
 static int
-isofs_dentry_cmp_ms(struct dentry *dentry,struct qstr *a,struct qstr *b)
+isofs_dentry_cmp_ms(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	return isofs_dentry_cmp_common(dentry, a, b, 1);
+	return isofs_dentry_cmp_common(len, str, name, 1, 0);
 }
 
 static int
-isofs_dentry_cmpi_ms(struct dentry *dentry,struct qstr *a,struct qstr *b)
+isofs_dentry_cmpi_ms(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	return isofs_dentry_cmpi_common(dentry, a, b, 1);
+	return isofs_dentry_cmp_common(len, str, name, 1, 1);
 }
 #endif
 
Index: linux-2.6/fs/proc/proc_sysctl.c
===================================================================
--- linux-2.6.orig/fs/proc/proc_sysctl.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/proc/proc_sysctl.c	2010-11-17 00:52:37.000000000 +1100
@@ -397,15 +397,15 @@ static int proc_sys_delete(const struct
 	return !!PROC_I(dentry->d_inode)->sysctl->unregistering;
 }
 
-static int proc_sys_compare(struct dentry *dir, struct qstr *qstr,
-			    struct qstr *name)
+static int proc_sys_compare(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	struct dentry *dentry = container_of(qstr, struct dentry, d_name);
-	if (qstr->len != name->len)
+	if (name->len != len)
 		return 1;
-	if (memcmp(qstr->name, name->name, name->len))
+	if (memcmp(name->name, str, len))
 		return 1;
-	return !sysctl_is_seen(PROC_I(dentry->d_inode)->sysctl);
+	return !sysctl_is_seen(PROC_I(inode)->sysctl);
 }
 
 static const struct dentry_operations proc_sys_dentry_operations = {
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:48.000000000 +1100
@@ -134,7 +134,9 @@ enum dentry_d_lock_class
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, struct nameidata *);
 	int (*d_hash)(struct dentry *, struct qstr *);
-	int (*d_compare)(struct dentry *, struct qstr *, struct qstr *);
+	int (*d_compare)(const struct dentry *,
+			const struct dentry *, const struct inode *,
+			unsigned int, const char *, const struct qstr *);
 	int (*d_delete)(const struct dentry *);
 	void (*d_release)(struct dentry *);
 	void (*d_iput)(struct dentry *, struct inode *);
@@ -145,12 +147,8 @@ struct dentry_operations {
  * Locking rules for dentry_operations callbacks are to be found in
  * Documentation/filesystems/Locking. Keep it updated!
  *
- * the dentry parameter passed to d_hash and d_compare is the parent
- * directory of the entries to be compared. It is used in case these
- * functions need any directory specific information for determining
- * equivalency classes.  Using the dentry itself might not work, as it
- * might be a negative dentry which has no information associated with
- * it.
+ * FUrther descriptions are found in Documentation/filesystems/vfs.txt.
+ * Keep it updated too!
  */
 
 /* d_flags entries */
Index: linux-2.6/fs/affs/namei.c
===================================================================
--- linux-2.6.orig/fs/affs/namei.c	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/affs/namei.c	2010-11-17 01:05:49.000000000 +1100
@@ -14,10 +14,14 @@ typedef int (*toupper_t)(int);
 
 static int	 affs_toupper(int ch);
 static int	 affs_hash_dentry(struct dentry *, struct qstr *);
-static int       affs_compare_dentry(struct dentry *, struct qstr *, struct qstr *);
+static int       affs_compare_dentry(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name);
 static int	 affs_intl_toupper(int ch);
 static int	 affs_intl_hash_dentry(struct dentry *, struct qstr *);
-static int       affs_intl_compare_dentry(struct dentry *, struct qstr *, struct qstr *);
+static int       affs_intl_compare_dentry(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name);
 
 const struct dentry_operations affs_dentry_operations = {
 	.d_hash		= affs_hash_dentry,
@@ -88,29 +92,29 @@ affs_intl_hash_dentry(struct dentry *den
 	return __affs_hash_dentry(dentry, qstr, affs_intl_toupper);
 }
 
-static inline int
-__affs_compare_dentry(struct dentry *dentry, struct qstr *a, struct qstr *b, toupper_t toupper)
+static inline int __affs_compare_dentry(unsigned int len,
+		const char *str, const struct qstr *name, toupper_t toupper)
 {
-	const u8 *aname = a->name;
-	const u8 *bname = b->name;
-	int len;
+	const u8 *aname = str;
+	const u8 *bname = name->name;
 
-	/* 'a' is the qstr of an already existing dentry, so the name
-	 * must be valid. 'b' must be validated first.
+	/*
+	 * 'str' is the name of an already existing dentry, so the name
+	 * must be valid. 'name' must be validated first.
 	 */
 
-	if (affs_check_name(b->name,b->len))
+	if (affs_check_name(name->name, name->len))
 		return 1;
 
-	/* If the names are longer than the allowed 30 chars,
+	/*
+	 * If the names are longer than the allowed 30 chars,
 	 * the excess is ignored, so their length may differ.
 	 */
-	len = a->len;
 	if (len >= 30) {
-		if (b->len < 30)
+		if (name->len < 30)
 			return 1;
 		len = 30;
-	} else if (len != b->len)
+	} else if (len != name->len)
 		return 1;
 
 	for (; len > 0; len--)
@@ -121,14 +125,18 @@ __affs_compare_dentry(struct dentry *den
 }
 
 static int
-affs_compare_dentry(struct dentry *dentry, struct qstr *a, struct qstr *b)
+affs_compare_dentry(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	return __affs_compare_dentry(dentry, a, b, affs_toupper);
+	return __affs_compare_dentry(len, str, name, affs_toupper);
 }
 static int
-affs_intl_compare_dentry(struct dentry *dentry, struct qstr *a, struct qstr *b)
+affs_intl_compare_dentry(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	return __affs_compare_dentry(dentry, a, b, affs_intl_toupper);
+	return __affs_compare_dentry(len, str, name, affs_intl_toupper);
 }
 
 /*
Index: linux-2.6/fs/hfs/hfs_fs.h
===================================================================
--- linux-2.6.orig/fs/hfs/hfs_fs.h	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/hfs/hfs_fs.h	2010-11-17 01:05:49.000000000 +1100
@@ -216,7 +216,9 @@ extern const struct dentry_operations hf
 extern int hfs_hash_dentry(struct dentry *, struct qstr *);
 extern int hfs_strcmp(const unsigned char *, unsigned int,
 		      const unsigned char *, unsigned int);
-extern int hfs_compare_dentry(struct dentry *, struct qstr *, struct qstr *);
+extern int hfs_compare_dentry(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name);
 
 /* trans.c */
 extern void hfs_asc2mac(struct super_block *, struct hfs_name *, struct qstr *);
Index: linux-2.6/fs/hfs/string.c
===================================================================
--- linux-2.6.orig/fs/hfs/string.c	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/hfs/string.c	2010-11-17 01:05:49.000000000 +1100
@@ -92,21 +92,21 @@ int hfs_strcmp(const unsigned char *s1,
  * Test for equality of two strings in the HFS filename character ordering.
  * return 1 on failure and 0 on success
  */
-int hfs_compare_dentry(struct dentry *dentry, struct qstr *s1, struct qstr *s2)
+int hfs_compare_dentry(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
 	const unsigned char *n1, *n2;
-	int len;
 
-	len = s1->len;
 	if (len >= HFS_NAMELEN) {
-		if (s2->len < HFS_NAMELEN)
+		if (name->len < HFS_NAMELEN)
 			return 1;
 		len = HFS_NAMELEN;
-	} else if (len != s2->len)
+	} else if (len != name->len)
 		return 1;
 
-	n1 = s1->name;
-	n2 = s2->name;
+	n1 = str;
+	n2 = name->name;
 	while (len--) {
 		if (caseorder[*n1++] != caseorder[*n2++])
 			return 1;
Index: linux-2.6/fs/hfsplus/hfsplus_fs.h
===================================================================
--- linux-2.6.orig/fs/hfsplus/hfsplus_fs.h	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/hfsplus/hfsplus_fs.h	2010-11-17 01:05:49.000000000 +1100
@@ -380,7 +380,9 @@ int hfsplus_strcmp(const struct hfsplus_
 int hfsplus_uni2asc(struct super_block *, const struct hfsplus_unistr *, char *, int *);
 int hfsplus_asc2uni(struct super_block *, struct hfsplus_unistr *, const char *, int);
 int hfsplus_hash_dentry(struct dentry *dentry, struct qstr *str);
-int hfsplus_compare_dentry(struct dentry *dentry, struct qstr *s1, struct qstr *s2);
+int hfsplus_compare_dentry(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name);
 
 /* wrapper.c */
 int hfsplus_read_wrapper(struct super_block *);
Index: linux-2.6/fs/hfsplus/unicode.c
===================================================================
--- linux-2.6.orig/fs/hfsplus/unicode.c	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/hfsplus/unicode.c	2010-11-17 01:05:49.000000000 +1100
@@ -363,9 +363,11 @@ int hfsplus_hash_dentry(struct dentry *d
  * Composed unicode characters are decomposed and case-folding is performed
  * if the appropriate bits are (un)set on the superblock.
  */
-int hfsplus_compare_dentry(struct dentry *dentry, struct qstr *s1, struct qstr *s2)
+int hfsplus_compare_dentry(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	struct super_block *sb = dentry->d_sb;
+	struct super_block *sb = parent->d_sb;
 	int casefold, decompose, size;
 	int dsize1, dsize2, len1, len2;
 	const u16 *dstr1, *dstr2;
@@ -375,10 +377,10 @@ int hfsplus_compare_dentry(struct dentry
 
 	casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
 	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
-	astr1 = s1->name;
-	len1 = s1->len;
-	astr2 = s2->name;
-	len2 = s2->len;
+	astr1 = str;
+	len1 = len;
+	astr2 = name->name;
+	len2 = name->len;
 	dsize1 = dsize2 = 0;
 	dstr1 = dstr2 = NULL;
 
Index: linux-2.6/fs/hpfs/dentry.c
===================================================================
--- linux-2.6.orig/fs/hpfs/dentry.c	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/hpfs/dentry.c	2010-11-17 01:05:49.000000000 +1100
@@ -34,19 +34,24 @@ static int hpfs_hash_dentry(struct dentr
 	return 0;
 }
 
-static int hpfs_compare_dentry(struct dentry *dentry, struct qstr *a, struct qstr *b)
+static int hpfs_compare_dentry(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	unsigned al=a->len;
-	unsigned bl=b->len;
-	hpfs_adjust_length(a->name, &al);
+	unsigned al = len;
+	unsigned bl = name->len;
+
+	hpfs_adjust_length(str, &al);
 	/*hpfs_adjust_length(b->name, &bl);*/
-	/* 'a' is the qstr of an already existing dentry, so the name
-	 * must be valid. 'b' must be validated first.
+
+	/*
+	 * 'str' is the nane of an already existing dentry, so the name
+	 * must be valid. 'name' must be validated first.
 	 */
 
-	if (hpfs_chk_name(b->name, &bl))
+	if (hpfs_chk_name(name->name, &bl))
 		return 1;
-	if (hpfs_compare_names(dentry->d_sb, a->name, al, b->name, bl, 0))
+	if (hpfs_compare_names(parent->d_sb, str, al, name->name, bl, 0))
 		return 1;
 	return 0;
 }
Index: linux-2.6/fs/jfs/namei.c
===================================================================
--- linux-2.6.orig/fs/jfs/namei.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/jfs/namei.c	2010-11-17 01:05:49.000000000 +1100
@@ -1587,14 +1587,16 @@ static int jfs_ci_hash(struct dentry *di
 	return 0;
 }
 
-static int jfs_ci_compare(struct dentry *dir, struct qstr *a, struct qstr *b)
+static int jfs_ci_compare(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
 	int i, result = 1;
 
-	if (a->len != b->len)
+	if (len != name->len)
 		goto out;
-	for (i=0; i < a->len; i++) {
-		if (tolower(a->name[i]) != tolower(b->name[i]))
+	for (i=0; i < len; i++) {
+		if (tolower(str[i]) != tolower(name->name[i]))
 			goto out;
 	}
 	result = 0;
Index: linux-2.6/fs/ncpfs/dir.c
===================================================================
--- linux-2.6.orig/fs/ncpfs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ncpfs/dir.c	2010-11-17 01:05:49.000000000 +1100
@@ -76,7 +76,9 @@ const struct inode_operations ncp_dir_in
  */
 static int ncp_lookup_validate(struct dentry *, struct nameidata *);
 static int ncp_hash_dentry(struct dentry *, struct qstr *);
-static int ncp_compare_dentry (struct dentry *, struct qstr *, struct qstr *);
+static int ncp_compare_dentry(const struct dentry *,
+		const struct dentry *, const struct inode *,
+		unsigned int, const char *, const struct qstr *);
 static int ncp_delete_dentry(const struct dentry *);
 
 static const struct dentry_operations ncp_dentry_operations =
@@ -114,10 +116,10 @@ static inline int ncp_preserve_entry_cas
 
 #define ncp_preserve_case(i)	(ncp_namespace(i) != NW_NS_DOS)
 
-static inline int ncp_case_sensitive(struct dentry *dentry)
+static inline int ncp_case_sensitive(const struct inode *inode)
 {
 #ifdef CONFIG_NCPFS_NFS_NS
-	return ncp_namespace(dentry->d_inode) == NW_NS_NFS;
+	return ncp_namespace(inode) == NW_NS_NFS;
 #else
 	return 0;
 #endif /* CONFIG_NCPFS_NFS_NS */
@@ -130,12 +132,15 @@ static inline int ncp_case_sensitive(str
 static int 
 ncp_hash_dentry(struct dentry *dentry, struct qstr *this)
 {
-	if (!ncp_case_sensitive(dentry)) {
+	struct inode *inode = dentry->d_inode;
+
+	if (!ncp_case_sensitive(inode)) {
+		struct super_block *sb = dentry->d_sb;
 		struct nls_table *t;
 		unsigned long hash;
 		int i;
 
-		t = NCP_IO_TABLE(dentry);
+		t = NCP_IO_TABLE(sb);
 		hash = init_name_hash();
 		for (i=0; i<this->len ; i++)
 			hash = partial_name_hash(ncp_tolower(t, this->name[i]),
@@ -146,15 +151,19 @@ ncp_hash_dentry(struct dentry *dentry, s
 }
 
 static int
-ncp_compare_dentry(struct dentry *dentry, struct qstr *a, struct qstr *b)
+ncp_compare_dentry(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	if (a->len != b->len)
+	struct super_block *sb = dentry->d_sb;
+
+	if (len != name->len)
 		return 1;
 
-	if (ncp_case_sensitive(dentry))
-		return strncmp(a->name, b->name, a->len);
+	if (ncp_case_sensitive(inode))
+		return strncmp(str, name->name, len);
 
-	return ncp_strnicmp(NCP_IO_TABLE(dentry), a->name, b->name, a->len);
+	return ncp_strnicmp(NCP_IO_TABLE(sb), str, name->name, len);
 }
 
 /*
Index: linux-2.6/fs/ncpfs/ncplib_kernel.h
===================================================================
--- linux-2.6.orig/fs/ncpfs/ncplib_kernel.h	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/ncpfs/ncplib_kernel.h	2010-11-17 01:05:45.000000000 +1100
@@ -135,7 +135,7 @@ int ncp__vol2io(struct ncp_server *, uns
 				const unsigned char *, unsigned int, int);
 
 #define NCP_ESC			':'
-#define NCP_IO_TABLE(dentry)	(NCP_SERVER((dentry)->d_inode)->nls_io)
+#define NCP_IO_TABLE(sb)	(NCP_SBP(sb)->nls_io)
 #define ncp_tolower(t, c)	nls_tolower(t, c)
 #define ncp_toupper(t, c)	nls_toupper(t, c)
 #define ncp_strnicmp(t, s1, s2, len) \
@@ -150,15 +150,15 @@ int ncp__io2vol(unsigned char *, unsigne
 int ncp__vol2io(unsigned char *, unsigned int *,
 				const unsigned char *, unsigned int, int);
 
-#define NCP_IO_TABLE(dentry)	NULL
+#define NCP_IO_TABLE(sb)	NULL
 #define ncp_tolower(t, c)	tolower(c)
 #define ncp_toupper(t, c)	toupper(c)
 #define ncp_io2vol(S,m,i,n,k,U)	ncp__io2vol(m,i,n,k,U)
 #define ncp_vol2io(S,m,i,n,k,U)	ncp__vol2io(m,i,n,k,U)
 
 
-static inline int ncp_strnicmp(struct nls_table *t, const unsigned char *s1,
-		const unsigned char *s2, int len)
+static inline int ncp_strnicmp(const struct nls_table *t,
+		const unsigned char *s1, const unsigned char *s2, int len)
 {
 	while (len--) {
 		if (tolower(*s1++) != tolower(*s2++))
Index: linux-2.6/Documentation/filesystems/vfs.txt
===================================================================
--- linux-2.6.orig/Documentation/filesystems/vfs.txt	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/Documentation/filesystems/vfs.txt	2010-11-17 01:05:49.000000000 +1100
@@ -842,7 +842,9 @@ the VFS uses a default. As of kernel 2.6
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, struct nameidata *);
 	int (*d_hash)(struct dentry *, struct qstr *);
-	int (*d_compare)(struct dentry *, struct qstr *, struct qstr *);
+	int (*d_compare)(const struct dentry *,
+			const struct dentry *, const struct inode *,
+			unsigned int, const char *, const struct qstr *);
 	int (*d_delete)(const struct dentry *);
 	void (*d_release)(struct dentry *);
 	void (*d_iput)(struct dentry *, struct inode *);
@@ -854,9 +856,26 @@ struct dentry_operations {
 	dcache. Most filesystems leave this as NULL, because all their
 	dentries in the dcache are valid
 
-  d_hash: called when the VFS adds a dentry to the hash table
+  d_hash: called when the VFS adds a dentry to the hash table. The first
+	dentry passed to d_hash is the parent directory that the name is
+ 	to be hashed into.
 
-  d_compare: called when a dentry should be compared with another
+  d_compare: called to compare a dentry name with a given name. The first
+	dentry is the parent of the dentry to be compared, the second is
+	the dentry itself. inode, len, and name string are properties of
+	the dentry to be compared. qstr is the name to compare it with.
+
+	Must be constant and idempotent, and should not take locks if
+	possible, and should not or store into the dentry or inodes.
+	Should not dereference pointers outside the dentry or inodes without
+	lots of care (eg.  d_parent, d_inode shouldn't be used).
+
+	However, our vfsmount is pinned, and RCU held, so the dentries and
+	inodes won't disappear, neither will our sb or filesystem module.
+	->i_sb and ->d_sb may be used.
+
+	It is a tricky calling convention because it needs to be called under
+	"rcu-walk", ie. without any locks or references on things.
 
   d_delete: called when the last reference to a dentry is dropped and the
 	dcache is deciding whether or not to cache it. Return 1 to delete
Index: linux-2.6/fs/isofs/namei.c
===================================================================
--- linux-2.6.orig/fs/isofs/namei.c	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/isofs/namei.c	2010-11-17 00:52:37.000000000 +1100
@@ -37,7 +37,8 @@ isofs_cmp(struct dentry *dentry, const c
 
 	qstr.name = compare;
 	qstr.len = dlen;
-	return dentry->d_op->d_compare(dentry, &dentry->d_name, &qstr);
+	return dentry->d_op->d_compare(NULL, NULL, NULL,
+			dentry->d_name.len, dentry->d_name.name, &qstr);
 }
 
 /*
Index: linux-2.6/include/linux/ncp_fs.h
===================================================================
--- linux-2.6.orig/include/linux/ncp_fs.h	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/include/linux/ncp_fs.h	2010-11-17 00:52:37.000000000 +1100
@@ -184,13 +184,13 @@ struct ncp_entry_info {
 	__u8			file_handle[6];
 };
 
-static inline struct ncp_server *NCP_SBP(struct super_block *sb)
+static inline struct ncp_server *NCP_SBP(const struct super_block *sb)
 {
 	return sb->s_fs_info;
 }
 
 #define NCP_SERVER(inode)	NCP_SBP((inode)->i_sb)
-static inline struct ncp_inode_info *NCP_FINFO(struct inode *inode)
+static inline struct ncp_inode_info *NCP_FINFO(const struct inode *inode)
 {
 	return container_of(inode, struct ncp_inode_info, vfs_inode);
 }
Index: linux-2.6/Documentation/filesystems/Locking
===================================================================
--- linux-2.6.orig/Documentation/filesystems/Locking	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/Documentation/filesystems/Locking	2010-11-17 01:05:48.000000000 +1100
@@ -11,7 +11,9 @@ be able to use diff(1).
 prototypes:
 	int (*d_revalidate)(struct dentry *, int);
 	int (*d_hash) (struct dentry *, struct qstr *);
-	int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
+	int (*d_compare)(const struct dentry *,
+			const struct dentry *, const struct inode *,
+			unsigned int, const char *, const struct qstr *);
 	int (*d_delete)(struct dentry *);
 	void (*d_release)(struct dentry *);
 	void (*d_iput)(struct dentry *, struct inode *);
Index: linux-2.6/Documentation/filesystems/porting
===================================================================
--- linux-2.6.orig/Documentation/filesystems/porting	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/Documentation/filesystems/porting	2010-11-17 01:05:48.000000000 +1100
@@ -327,3 +327,10 @@ unreferenced dentries, and is now only c
 0. Even on 0 refcount transition, it must be able to tolerate being called 0,
 1, or more times (eg. constant, idempotent).
 
+---
+[mandatory]
+
+	.d_compare() calling convention and locking rules are significantly
+changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
+look at examples of other filesystems) for guidance.
+
Index: linux-2.6/fs/cifs/dir.c
===================================================================
--- linux-2.6.orig/fs/cifs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/cifs/dir.c	2010-11-17 01:05:49.000000000 +1100
@@ -715,13 +715,14 @@ static int cifs_ci_hash(struct dentry *d
 	return 0;
 }
 
-static int cifs_ci_compare(struct dentry *dentry, struct qstr *a,
-			   struct qstr *b)
+static int cifs_ci_compare(const struct dentry *parent,
+		const struct dentry *dentry, const struct inode *inode,
+		unsigned int len, const char *str, const struct qstr *name)
 {
-	struct nls_table *codepage = CIFS_SB(dentry->d_inode->i_sb)->local_nls;
+	struct nls_table *codepage = CIFS_SB(inode->i_sb)->local_nls;
 
-	if ((a->len == b->len) &&
-	    (nls_strnicmp(codepage, a->name, b->name, a->len) == 0))
+	if ((name->len == len) &&
+	    (nls_strnicmp(codepage, name->name, str, len) == 0))
 		return 0;
 	return 1;
 }



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 08/28] fs: change d_hash for rcu-walk
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (6 preceding siblings ...)
  2010-11-16 14:09 ` [patch 07/28] fs: change d_compare for rcu-walk Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-17  0:50   ` Tim Pepper
  2010-11-16 14:09 ` [patch 09/28] hostfs: simplify locking Nick Piggin
                   ` (22 subsequent siblings)
  30 siblings, 1 reply; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-d_hash-change.patch --]
[-- Type: text/plain, Size: 23054 bytes --]

Change d_hash so it may be called from lock-free RCU lookups. See similar
patch for d_compare for details.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 Documentation/filesystems/Locking |    5 +++--
 Documentation/filesystems/porting |    7 +++++++
 Documentation/filesystems/vfs.txt |    8 ++++++--
 fs/adfs/dir.c                     |    3 ++-
 fs/affs/namei.c                   |   20 ++++++++++++--------
 fs/cifs/dir.c                     |    5 +++--
 fs/cifs/readdir.c                 |    2 +-
 fs/dcache.c                       |    2 +-
 fs/ecryptfs/inode.c               |    4 ++--
 fs/fat/namei_msdos.c              |    3 ++-
 fs/fat/namei_vfat.c               |    8 +++++---
 fs/gfs2/dentry.c                  |    3 ++-
 fs/hfs/hfs_fs.h                   |    3 ++-
 fs/hfs/string.c                   |    3 ++-
 fs/hfsplus/hfsplus_fs.h           |    3 ++-
 fs/hfsplus/unicode.c              |    3 ++-
 fs/hpfs/dentry.c                  |    3 ++-
 fs/isofs/inode.c                  |   28 ++++++++++++++++++----------
 fs/jfs/namei.c                    |    3 ++-
 fs/namei.c                        |    5 +++--
 fs/ncpfs/dir.c                    |   10 +++++-----
 fs/sysv/namei.c                   |    3 ++-
 include/linux/dcache.h            |    3 ++-
 23 files changed, 88 insertions(+), 49 deletions(-)

Index: linux-2.6/fs/adfs/dir.c
===================================================================
--- linux-2.6.orig/fs/adfs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/adfs/dir.c	2010-11-17 00:52:37.000000000 +1100
@@ -201,7 +201,8 @@ const struct file_operations adfs_dir_op
 };
 
 static int
-adfs_hash(struct dentry *parent, struct qstr *qstr)
+adfs_hash(const struct dentry *parent, const struct inode *inode,
+		struct qstr *qstr)
 {
 	const unsigned int name_len = ADFS_SB(parent->d_sb)->s_namelen;
 	const unsigned char *name;
Index: linux-2.6/fs/affs/namei.c
===================================================================
--- linux-2.6.orig/fs/affs/namei.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/affs/namei.c	2010-11-17 00:52:37.000000000 +1100
@@ -13,12 +13,14 @@
 typedef int (*toupper_t)(int);
 
 static int	 affs_toupper(int ch);
-static int	 affs_hash_dentry(struct dentry *, struct qstr *);
+static int	 affs_hash_dentry(const struct dentry *,
+		const struct inode *, struct qstr *);
 static int       affs_compare_dentry(const struct dentry *parent,
 		const struct dentry *dentry, const struct inode *inode,
 		unsigned int len, const char *str, const struct qstr *name);
 static int	 affs_intl_toupper(int ch);
-static int	 affs_intl_hash_dentry(struct dentry *, struct qstr *);
+static int	 affs_intl_hash_dentry(const struct dentry *,
+		const struct inode *, struct qstr *);
 static int       affs_intl_compare_dentry(const struct dentry *parent,
 		const struct dentry *dentry, const struct inode *inode,
 		unsigned int len, const char *str, const struct qstr *name);
@@ -62,13 +64,13 @@ affs_get_toupper(struct super_block *sb)
  * Note: the dentry argument is the parent dentry.
  */
 static inline int
-__affs_hash_dentry(struct dentry *dentry, struct qstr *qstr, toupper_t toupper)
+__affs_hash_dentry(struct qstr *qstr, toupper_t toupper)
 {
 	const u8 *name = qstr->name;
 	unsigned long hash;
 	int i;
 
-	i = affs_check_name(qstr->name,qstr->len);
+	i = affs_check_name(qstr->name, qstr->len);
 	if (i)
 		return i;
 
@@ -82,14 +84,16 @@ __affs_hash_dentry(struct dentry *dentry
 }
 
 static int
-affs_hash_dentry(struct dentry *dentry, struct qstr *qstr)
+affs_hash_dentry(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *qstr)
 {
-	return __affs_hash_dentry(dentry, qstr, affs_toupper);
+	return __affs_hash_dentry(qstr, affs_toupper);
 }
 static int
-affs_intl_hash_dentry(struct dentry *dentry, struct qstr *qstr)
+affs_intl_hash_dentry(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *qstr)
 {
-	return __affs_hash_dentry(dentry, qstr, affs_intl_toupper);
+	return __affs_hash_dentry(qstr, affs_intl_toupper);
 }
 
 static inline int __affs_compare_dentry(unsigned int len,
Index: linux-2.6/fs/cifs/dir.c
===================================================================
--- linux-2.6.orig/fs/cifs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/cifs/dir.c	2010-11-17 00:52:37.000000000 +1100
@@ -700,9 +700,10 @@ const struct dentry_operations cifs_dent
 /* d_delete:       cifs_d_delete,      */ /* not needed except for debugging */
 };
 
-static int cifs_ci_hash(struct dentry *dentry, struct qstr *q)
+static int cifs_ci_hash(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *q)
 {
-	struct nls_table *codepage = CIFS_SB(dentry->d_inode->i_sb)->local_nls;
+	struct nls_table *codepage = CIFS_SB(dentry->d_sb)->local_nls;
 	unsigned long hash;
 	int i;
 
Index: linux-2.6/fs/fat/namei_msdos.c
===================================================================
--- linux-2.6.orig/fs/fat/namei_msdos.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/fat/namei_msdos.c	2010-11-17 00:52:37.000000000 +1100
@@ -148,7 +148,8 @@ static int msdos_find(struct inode *dir,
  * that the existing dentry can be used. The msdos fs routines will
  * return ENOENT or EINVAL as appropriate.
  */
-static int msdos_hash(struct dentry *dentry, struct qstr *qstr)
+static int msdos_hash(const struct dentry *dentry, const struct inode *inode,
+	       struct qstr *qstr)
 {
 	struct fat_mount_options *options = &MSDOS_SB(dentry->d_sb)->options;
 	unsigned char msdos_name[MSDOS_NAME];
Index: linux-2.6/fs/fat/namei_vfat.c
===================================================================
--- linux-2.6.orig/fs/fat/namei_vfat.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/fat/namei_vfat.c	2010-11-17 00:52:37.000000000 +1100
@@ -103,7 +103,8 @@ static unsigned int vfat_striptail_len(c
  * that the existing dentry can be used. The vfat fs routines will
  * return ENOENT or EINVAL as appropriate.
  */
-static int vfat_hash(struct dentry *dentry, struct qstr *qstr)
+static int vfat_hash(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *qstr)
 {
 	qstr->hash = full_name_hash(qstr->name, vfat_striptail_len(qstr));
 	return 0;
@@ -115,9 +116,10 @@ static int vfat_hash(struct dentry *dent
  * that the existing dentry can be used. The vfat fs routines will
  * return ENOENT or EINVAL as appropriate.
  */
-static int vfat_hashi(struct dentry *dentry, struct qstr *qstr)
+static int vfat_hashi(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *qstr)
 {
-	struct nls_table *t = MSDOS_SB(dentry->d_inode->i_sb)->nls_io;
+	struct nls_table *t = MSDOS_SB(dentry->d_sb)->nls_io;
 	const unsigned char *name;
 	unsigned int len;
 	unsigned long hash;
Index: linux-2.6/fs/gfs2/dentry.c
===================================================================
--- linux-2.6.orig/fs/gfs2/dentry.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/gfs2/dentry.c	2010-11-17 00:52:37.000000000 +1100
@@ -100,7 +100,8 @@ static int gfs2_drevalidate(struct dentr
 	return 0;
 }
 
-static int gfs2_dhash(struct dentry *dentry, struct qstr *str)
+static int gfs2_dhash(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *str)
 {
 	str->hash = gfs2_disk_hash(str->name, str->len);
 	return 0;
Index: linux-2.6/fs/hfs/hfs_fs.h
===================================================================
--- linux-2.6.orig/fs/hfs/hfs_fs.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/hfs/hfs_fs.h	2010-11-17 00:52:37.000000000 +1100
@@ -213,7 +213,8 @@ extern int hfs_part_find(struct super_bl
 /* string.c */
 extern const struct dentry_operations hfs_dentry_operations;
 
-extern int hfs_hash_dentry(struct dentry *, struct qstr *);
+extern int hfs_hash_dentry(const struct dentry *, const struct inode *,
+		struct qstr *);
 extern int hfs_strcmp(const unsigned char *, unsigned int,
 		      const unsigned char *, unsigned int);
 extern int hfs_compare_dentry(const struct dentry *parent,
Index: linux-2.6/fs/hfs/string.c
===================================================================
--- linux-2.6.orig/fs/hfs/string.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/hfs/string.c	2010-11-17 00:52:37.000000000 +1100
@@ -51,7 +51,8 @@ static unsigned char caseorder[256] = {
 /*
  * Hash a string to an integer in a case-independent way
  */
-int hfs_hash_dentry(struct dentry *dentry, struct qstr *this)
+int hfs_hash_dentry(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *this)
 {
 	const unsigned char *name = this->name;
 	unsigned int hash, len = this->len;
Index: linux-2.6/fs/hfsplus/hfsplus_fs.h
===================================================================
--- linux-2.6.orig/fs/hfsplus/hfsplus_fs.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/hfsplus/hfsplus_fs.h	2010-11-17 00:52:37.000000000 +1100
@@ -379,7 +379,8 @@ int hfsplus_strcasecmp(const struct hfsp
 int hfsplus_strcmp(const struct hfsplus_unistr *, const struct hfsplus_unistr *);
 int hfsplus_uni2asc(struct super_block *, const struct hfsplus_unistr *, char *, int *);
 int hfsplus_asc2uni(struct super_block *, struct hfsplus_unistr *, const char *, int);
-int hfsplus_hash_dentry(struct dentry *dentry, struct qstr *str);
+int hfsplus_hash_dentry(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *str);
 int hfsplus_compare_dentry(const struct dentry *parent,
 		const struct dentry *dentry, const struct inode *inode,
 		unsigned int len, const char *str, const struct qstr *name);
Index: linux-2.6/fs/hfsplus/unicode.c
===================================================================
--- linux-2.6.orig/fs/hfsplus/unicode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/hfsplus/unicode.c	2010-11-17 00:52:37.000000000 +1100
@@ -320,7 +320,8 @@ int hfsplus_asc2uni(struct super_block *
  * Composed unicode characters are decomposed and case-folding is performed
  * if the appropriate bits are (un)set on the superblock.
  */
-int hfsplus_hash_dentry(struct dentry *dentry, struct qstr *str)
+int hfsplus_hash_dentry(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *str)
 {
 	struct super_block *sb = dentry->d_sb;
 	const char *astr;
Index: linux-2.6/fs/hpfs/dentry.c
===================================================================
--- linux-2.6.orig/fs/hpfs/dentry.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/hpfs/dentry.c	2010-11-17 00:52:37.000000000 +1100
@@ -12,7 +12,8 @@
  * Note: the dentry argument is the parent dentry.
  */
 
-static int hpfs_hash_dentry(struct dentry *dentry, struct qstr *qstr)
+static int hpfs_hash_dentry(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *qstr)
 {
 	unsigned long	 hash;
 	int		 i;
Index: linux-2.6/fs/isofs/inode.c
===================================================================
--- linux-2.6.orig/fs/isofs/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/isofs/inode.c	2010-11-17 00:52:37.000000000 +1100
@@ -26,8 +26,10 @@
 
 #define BEQUIET
 
-static int isofs_hashi(struct dentry *parent, struct qstr *qstr);
-static int isofs_hash(struct dentry *parent, struct qstr *qstr);
+static int isofs_hashi(const struct dentry *parent, const struct inode *inode,
+		struct qstr *qstr);
+static int isofs_hash(const struct dentry *parent, const struct inode *inode,
+		struct qstr *qstr);
 static int isofs_dentry_cmpi(const struct dentry *parent,
 		const struct dentry *dentry, const struct inode *inode,
 		unsigned int len, const char *str, const struct qstr *name);
@@ -36,8 +38,10 @@ static int isofs_dentry_cmp(const struct
 		unsigned int len, const char *str, const struct qstr *name);
 
 #ifdef CONFIG_JOLIET
-static int isofs_hashi_ms(struct dentry *parent, struct qstr *qstr);
-static int isofs_hash_ms(struct dentry *parent, struct qstr *qstr);
+static int isofs_hashi_ms(const struct dentry *parent, const struct inode *inode,
+		struct qstr *qstr);
+static int isofs_hash_ms(const struct dentry *parent, const struct inode *inode,
+		struct qstr *qstr);
 static int isofs_dentry_cmpi_ms(const struct dentry *parent,
 		const struct dentry *dentry, const struct inode *inode,
 		unsigned int len, const char *str, const struct qstr *name);
@@ -168,7 +172,7 @@ struct iso9660_options{
  * Compute the hash for the isofs name corresponding to the dentry.
  */
 static int
-isofs_hash_common(struct dentry *dentry, struct qstr *qstr, int ms)
+isofs_hash_common(const struct dentry *dentry, struct qstr *qstr, int ms)
 {
 	const char *name;
 	int len;
@@ -189,7 +193,7 @@ isofs_hash_common(struct dentry *dentry,
  * Compute the hash for the isofs name corresponding to the dentry.
  */
 static int
-isofs_hashi_common(struct dentry *dentry, struct qstr *qstr, int ms)
+isofs_hashi_common(const struct dentry *dentry, struct qstr *qstr, int ms)
 {
 	const char *name;
 	int len;
@@ -244,13 +248,15 @@ static int isofs_dentry_cmp_common(
 }
 
 static int
-isofs_hash(struct dentry *dentry, struct qstr *qstr)
+isofs_hash(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *qstr)
 {
 	return isofs_hash_common(dentry, qstr, 0);
 }
 
 static int
-isofs_hashi(struct dentry *dentry, struct qstr *qstr)
+isofs_hashi(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *qstr)
 {
 	return isofs_hashi_common(dentry, qstr, 0);
 }
@@ -273,13 +279,15 @@ isofs_dentry_cmpi(const struct dentry *p
 
 #ifdef CONFIG_JOLIET
 static int
-isofs_hash_ms(struct dentry *dentry, struct qstr *qstr)
+isofs_hash_ms(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *qstr)
 {
 	return isofs_hash_common(dentry, qstr, 1);
 }
 
 static int
-isofs_hashi_ms(struct dentry *dentry, struct qstr *qstr)
+isofs_hashi_ms(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *qstr)
 {
 	return isofs_hashi_common(dentry, qstr, 1);
 }
Index: linux-2.6/fs/jfs/namei.c
===================================================================
--- linux-2.6.orig/fs/jfs/namei.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/jfs/namei.c	2010-11-17 00:52:37.000000000 +1100
@@ -1574,7 +1574,8 @@ const struct file_operations jfs_dir_ope
 	.llseek		= generic_file_llseek,
 };
 
-static int jfs_ci_hash(struct dentry *dir, struct qstr *this)
+static int jfs_ci_hash(const struct dentry *dir, const struct inode *inode,
+		struct qstr *this)
 {
 	unsigned long hash;
 	int i;
Index: linux-2.6/fs/ncpfs/dir.c
===================================================================
--- linux-2.6.orig/fs/ncpfs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ncpfs/dir.c	2010-11-17 01:05:45.000000000 +1100
@@ -75,7 +75,8 @@ const struct inode_operations ncp_dir_in
  * Dentry operations routines
  */
 static int ncp_lookup_validate(struct dentry *, struct nameidata *);
-static int ncp_hash_dentry(struct dentry *, struct qstr *);
+static int ncp_hash_dentry(const struct dentry *, const struct inode *,
+		struct qstr *);
 static int ncp_compare_dentry(const struct dentry *,
 		const struct dentry *, const struct inode *,
 		unsigned int, const char *, const struct qstr *);
@@ -130,10 +131,9 @@ static inline int ncp_case_sensitive(con
  * is case-sensitive.
  */
 static int 
-ncp_hash_dentry(struct dentry *dentry, struct qstr *this)
+ncp_hash_dentry(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *this)
 {
-	struct inode *inode = dentry->d_inode;
-
 	if (!ncp_case_sensitive(inode)) {
 		struct super_block *sb = dentry->d_sb;
 		struct nls_table *t;
@@ -602,7 +602,7 @@ ncp_fill_cache(struct file *filp, void *
 	qname.hash = full_name_hash(qname.name, qname.len);
 
 	if (dentry->d_op && dentry->d_op->d_hash)
-		if (dentry->d_op->d_hash(dentry, &qname) != 0)
+		if (dentry->d_op->d_hash(dentry, dentry->d_inode, &qname) != 0)
 			goto end_advance;
 
 	newdent = d_lookup(dentry, &qname);
Index: linux-2.6/fs/sysv/namei.c
===================================================================
--- linux-2.6.orig/fs/sysv/namei.c	2010-11-17 00:50:53.000000000 +1100
+++ linux-2.6/fs/sysv/namei.c	2010-11-17 00:52:37.000000000 +1100
@@ -27,7 +27,8 @@ static int add_nondir(struct dentry *den
 	return err;
 }
 
-static int sysv_hash(struct dentry *dentry, struct qstr *qstr)
+static int sysv_hash(const struct dentry *dentry, const struct inode *inode,
+		struct qstr *qstr)
 {
 	/* Truncate the name in place, avoids having to define a compare
 	   function. */
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:48.000000000 +1100
@@ -133,7 +133,8 @@ enum dentry_d_lock_class
 
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, struct nameidata *);
-	int (*d_hash)(struct dentry *, struct qstr *);
+	int (*d_hash)(const struct dentry *, const struct inode *,
+			struct qstr *);
 	int (*d_compare)(const struct dentry *,
 			const struct dentry *, const struct inode *,
 			unsigned int, const char *, const struct qstr *);
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2010-11-17 00:50:52.000000000 +1100
+++ linux-2.6/fs/namei.c	2010-11-17 01:05:46.000000000 +1100
@@ -731,7 +731,8 @@ static int do_lookup(struct nameidata *n
 	 * to use its own hash..
 	 */
 	if (nd->path.dentry->d_op && nd->path.dentry->d_op->d_hash) {
-		int err = nd->path.dentry->d_op->d_hash(nd->path.dentry, name);
+		int err = nd->path.dentry->d_op->d_hash(nd->path.dentry,
+				nd->path.dentry->d_inode, name);
 		if (err < 0)
 			return err;
 	}
@@ -1134,7 +1135,7 @@ static struct dentry *__lookup_hash(stru
 	 * to use its own hash..
 	 */
 	if (base->d_op && base->d_op->d_hash) {
-		err = base->d_op->d_hash(base, name);
+		err = base->d_op->d_hash(base, inode, name);
 		dentry = ERR_PTR(err);
 		if (err < 0)
 			goto out;
Index: linux-2.6/fs/cifs/readdir.c
===================================================================
--- linux-2.6.orig/fs/cifs/readdir.c	2010-11-17 00:50:52.000000000 +1100
+++ linux-2.6/fs/cifs/readdir.c	2010-11-17 00:52:37.000000000 +1100
@@ -79,7 +79,7 @@ cifs_readdir_lookup(struct dentry *paren
 	cFYI(1, "For %s", name->name);
 
 	if (parent->d_op && parent->d_op->d_hash)
-		parent->d_op->d_hash(parent, name);
+		parent->d_op->d_hash(parent, parent->d_inode, name);
 	else
 		name->hash = full_name_hash(name->name, name->len);
 
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:48.000000000 +1100
@@ -1474,7 +1474,7 @@ struct dentry *d_hash_and_lookup(struct
 	 */
 	name->hash = full_name_hash(name->name, name->len);
 	if (dir->d_op && dir->d_op->d_hash) {
-		if (dir->d_op->d_hash(dir, name) < 0)
+		if (dir->d_op->d_hash(dir, dir->d_inode, name) < 0)
 			goto out;
 	}
 	dentry = d_lookup(dir, name);
Index: linux-2.6/fs/ecryptfs/inode.c
===================================================================
--- linux-2.6.orig/fs/ecryptfs/inode.c	2010-11-17 00:50:52.000000000 +1100
+++ linux-2.6/fs/ecryptfs/inode.c	2010-11-17 01:05:47.000000000 +1100
@@ -454,7 +454,7 @@ static struct dentry *ecryptfs_lookup(st
 	lower_name.hash = ecryptfs_dentry->d_name.hash;
 	if (lower_dir_dentry->d_op && lower_dir_dentry->d_op->d_hash) {
 		rc = lower_dir_dentry->d_op->d_hash(lower_dir_dentry,
-						    &lower_name);
+				lower_dir_dentry->d_inode, &lower_name);
 		if (rc < 0)
 			goto out_d_drop;
 	}
@@ -489,7 +489,7 @@ static struct dentry *ecryptfs_lookup(st
 	lower_name.hash = full_name_hash(lower_name.name, lower_name.len);
 	if (lower_dir_dentry->d_op && lower_dir_dentry->d_op->d_hash) {
 		rc = lower_dir_dentry->d_op->d_hash(lower_dir_dentry,
-						    &lower_name);
+				lower_dir_dentry->d_inode, &lower_name);
 		if (rc < 0)
 			goto out_d_drop;
 	}
Index: linux-2.6/Documentation/filesystems/Locking
===================================================================
--- linux-2.6.orig/Documentation/filesystems/Locking	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/Documentation/filesystems/Locking	2010-11-17 01:05:42.000000000 +1100
@@ -10,7 +10,8 @@ be able to use diff(1).
 --------------------------- dentry_operations --------------------------
 prototypes:
 	int (*d_revalidate)(struct dentry *, int);
-	int (*d_hash) (struct dentry *, struct qstr *);
+	int (*d_hash)(const struct dentry *, const struct inode *,
+			struct qstr *);
 	int (*d_compare)(const struct dentry *,
 			const struct dentry *, const struct inode *,
 			unsigned int, const char *, const struct qstr *);
@@ -23,7 +24,7 @@ be able to use diff(1).
 	none have BKL
 		dcache_lock	rename_lock	->d_lock	may block
 d_revalidate:	no		no		no		yes
-d_hash		no		no		no		yes
+d_hash		no		no		no		no
 d_compare:	no		yes		no		no 
 d_delete:	yes		no		yes		no
 d_release:	no		no		no		yes
Index: linux-2.6/Documentation/filesystems/vfs.txt
===================================================================
--- linux-2.6.orig/Documentation/filesystems/vfs.txt	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/Documentation/filesystems/vfs.txt	2010-11-17 00:52:37.000000000 +1100
@@ -841,7 +841,8 @@ the VFS uses a default. As of kernel 2.6
 
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, struct nameidata *);
-	int (*d_hash)(struct dentry *, struct qstr *);
+	int (*d_hash)(const struct dentry *, const struct inode *,
+			struct qstr *);
 	int (*d_compare)(const struct dentry *,
 			const struct dentry *, const struct inode *,
 			unsigned int, const char *, const struct qstr *);
@@ -858,7 +859,10 @@ struct dentry_operations {
 
   d_hash: called when the VFS adds a dentry to the hash table. The first
 	dentry passed to d_hash is the parent directory that the name is
- 	to be hashed into.
+ 	to be hashed into. The inode is the dentry's inode.
+
+	Same locking and synchronisation rules as d_compare regarding
+	what is safe to dereference etc.
 
   d_compare: called to compare a dentry name with a given name. The first
 	dentry is the parent of the dentry to be compared, the second is
Index: linux-2.6/Documentation/filesystems/porting
===================================================================
--- linux-2.6.orig/Documentation/filesystems/porting	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/Documentation/filesystems/porting	2010-11-17 01:05:42.000000000 +1100
@@ -334,3 +334,10 @@ unreferenced dentries, and is now only c
 changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
 look at examples of other filesystems) for guidance.
 
+---
+[mandatory]
+
+	.d_hash() calling convention and locking rules are significantly
+changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
+look at examples of other filesystems) for guidance.
+



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 09/28] hostfs: simplify locking
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (7 preceding siblings ...)
  2010-11-16 14:09 ` [patch 08/28] fs: change d_hash " Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 10/28] fs: dcache scale hash Nick Piggin
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: hostfs-change.patch --]
[-- Type: text/plain, Size: 3653 bytes --]

Remove dcache_lock locking from hostfs filesystem, and move it into dcache
helpers. All that should really be required is a coherent path name, protection
from concurrent modification is not provided outside path name generation
because dcache_lock is dropped before the path is used.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c             |   15 +++++++++++++--
 fs/hostfs/hostfs_kern.c |   24 ++++++++++--------------
 include/linux/dcache.h  |    2 +-
 3 files changed, 24 insertions(+), 17 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:47.000000000 +1100
@@ -2140,7 +2140,7 @@ char *dynamic_dname(struct dentry *dentr
 /*
  * Write full pathname from the root of the filesystem into the buffer.
  */
-char *__dentry_path(struct dentry *dentry, char *buf, int buflen)
+static char *__dentry_path(struct dentry *dentry, char *buf, int buflen)
 {
 	char *end = buf + buflen;
 	char *retval;
@@ -2167,7 +2167,18 @@ char *__dentry_path(struct dentry *dentr
 Elong:
 	return ERR_PTR(-ENAMETOOLONG);
 }
-EXPORT_SYMBOL(__dentry_path);
+
+char *dentry_path_raw(struct dentry *dentry, char *buf, int buflen)
+{
+	char *retval;
+
+	spin_lock(&dcache_lock);
+	retval = __dentry_path(dentry, buf, buflen);
+	spin_unlock(&dcache_lock);
+
+	return retval;
+}
+EXPORT_SYMBOL(dentry_path_raw);
 
 char *dentry_path(struct dentry *dentry, char *buf, int buflen)
 {
Index: linux-2.6/fs/hostfs/hostfs_kern.c
===================================================================
--- linux-2.6.orig/fs/hostfs/hostfs_kern.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/hostfs/hostfs_kern.c	2010-11-17 00:52:37.000000000 +1100
@@ -92,12 +92,10 @@ __uml_setup("hostfs=", hostfs_args,
 
 static char *__dentry_name(struct dentry *dentry, char *name)
 {
-	char *p = __dentry_path(dentry, name, PATH_MAX);
+	char *p = dentry_path_raw(dentry, name, PATH_MAX);
 	char *root;
 	size_t len;
 
-	spin_unlock(&dcache_lock);
-
 	root = dentry->d_sb->s_fs_info;
 	len = strlen(root);
 	if (IS_ERR(p)) {
@@ -123,25 +121,23 @@ static char *dentry_name(struct dentry *
 	if (!name)
 		return NULL;
 
-	spin_lock(&dcache_lock);
 	return __dentry_name(dentry, name); /* will unlock */
 }
 
 static char *inode_name(struct inode *ino)
 {
 	struct dentry *dentry;
-	char *name = __getname();
-	if (!name)
-		return NULL;
+	char *name;
 
-	spin_lock(&dcache_lock);
-	if (list_empty(&ino->i_dentry)) {
-		spin_unlock(&dcache_lock);
-		__putname(name);
+	dentry = d_find_alias(ino);
+	if (!dentry)
 		return NULL;
-	}
-	dentry = list_first_entry(&ino->i_dentry, struct dentry, d_alias);
-	return __dentry_name(dentry, name); /* will unlock */
+
+	name = dentry_name(dentry);
+
+	dput(dentry);
+
+	return name;
 }
 
 static char *follow_link(char *link)
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:47.000000000 +1100
@@ -309,7 +309,7 @@ extern char *dynamic_dname(struct dentry
 extern char *__d_path(const struct path *path, struct path *root, char *, int);
 extern char *d_path(const struct path *, char *, int);
 extern char *d_path_with_unreachable(const struct path *, char *, int);
-extern char *__dentry_path(struct dentry *, char *, int);
+extern char *dentry_path_raw(struct dentry *, char *, int);
 extern char *dentry_path(struct dentry *, char *, int);
 
 /* Allocation counts.. */



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 10/28] fs: dcache scale hash
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (8 preceding siblings ...)
  2010-11-16 14:09 ` [patch 09/28] hostfs: simplify locking Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 11/28] fs: dcache scale lru Nick Piggin
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache-scale-d_hash.patch --]
[-- Type: text/plain, Size: 3819 bytes --]

Add a new lock, dcache_hash_lock, to protect the dcache hash table from
concurrent modification. d_hash is also protected by d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c            |   38 +++++++++++++++++++++++++++-----------
 include/linux/dcache.h |    3 +++
 2 files changed, 30 insertions(+), 11 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:47.000000000 +1100
@@ -35,12 +35,27 @@
 #include <linux/hardirq.h>
 #include "internal.h"
 
+/*
+ * Usage:
+ * dcache_hash_lock protects dcache hash table
+ *
+ * Ordering:
+ * dcache_lock
+ *   dentry->d_lock
+ *     dcache_hash_lock
+ *
+ * if (dentry1 < dentry2)
+ *   dentry1->d_lock
+ *     dentry2->d_lock
+ */
 int sysctl_vfs_cache_pressure __read_mostly = 100;
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
 
- __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
 
+EXPORT_SYMBOL(dcache_hash_lock);
 EXPORT_SYMBOL(dcache_lock);
 
 static struct kmem_cache *dentry_cache __read_mostly;
@@ -1195,7 +1210,9 @@ struct dentry *d_obtain_alias(struct ino
 	tmp->d_flags |= DCACHE_DISCONNECTED;
 	tmp->d_flags &= ~DCACHE_UNHASHED;
 	list_add(&tmp->d_alias, &inode->i_dentry);
+	spin_lock(&dcache_hash_lock);
 	hlist_add_head(&tmp->d_hash, &inode->i_sb->s_anon);
+	spin_unlock(&dcache_hash_lock);
 	spin_unlock(&tmp->d_lock);
 
 	spin_unlock(&dcache_lock);
@@ -1581,7 +1598,9 @@ void d_rehash(struct dentry * entry)
 {
 	spin_lock(&dcache_lock);
 	spin_lock(&entry->d_lock);
+	spin_lock(&dcache_hash_lock);
 	_d_rehash(entry);
+	spin_unlock(&dcache_hash_lock);
 	spin_unlock(&entry->d_lock);
 	spin_unlock(&dcache_lock);
 }
@@ -1661,8 +1680,6 @@ static void switch_names(struct dentry *
  */
 static void d_move_locked(struct dentry * dentry, struct dentry * target)
 {
-	struct hlist_head *list;
-
 	if (!dentry->d_inode)
 		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
 
@@ -1679,14 +1696,11 @@ static void d_move_locked(struct dentry
 	}
 
 	/* Move the dentry to the target hash queue, if on different bucket */
-	if (d_unhashed(dentry))
-		goto already_unhashed;
-
-	hlist_del_rcu(&dentry->d_hash);
-
-already_unhashed:
-	list = d_hash(target->d_parent, target->d_name.hash);
-	__d_rehash(dentry, list);
+	spin_lock(&dcache_hash_lock);
+	if (!d_unhashed(dentry))
+		hlist_del_rcu(&dentry->d_hash);
+	__d_rehash(dentry, d_hash(target->d_parent, target->d_name.hash));
+	spin_unlock(&dcache_hash_lock);
 
 	/* Unhash the target: dput() will then get rid of it */
 	__d_drop(target);
@@ -1883,7 +1897,9 @@ struct dentry *d_materialise_unique(stru
 found_lock:
 	spin_lock(&actual->d_lock);
 found:
+	spin_lock(&dcache_hash_lock);
 	_d_rehash(actual);
+	spin_unlock(&dcache_hash_lock);
 	spin_unlock(&actual->d_lock);
 	spin_unlock(&dcache_lock);
 out_nolock:
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:46.000000000 +1100
@@ -181,6 +181,7 @@ struct dentry_operations {
 
 #define DCACHE_CANT_MOUNT	0x0100
 
+extern spinlock_t dcache_hash_lock;
 extern spinlock_t dcache_lock;
 extern seqlock_t rename_lock;
 
@@ -204,7 +205,9 @@ static inline void __d_drop(struct dentr
 {
 	if (!(dentry->d_flags & DCACHE_UNHASHED)) {
 		dentry->d_flags |= DCACHE_UNHASHED;
+		spin_lock(&dcache_hash_lock);
 		hlist_del_rcu(&dentry->d_hash);
+		spin_unlock(&dcache_hash_lock);
 	}
 }
 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 11/28] fs: dcache scale lru
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (9 preceding siblings ...)
  2010-11-16 14:09 ` [patch 10/28] fs: dcache scale hash Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 12/28] fs: dcache scale dentry refcount Nick Piggin
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache-scale-d_lru.patch --]
[-- Type: text/plain, Size: 8782 bytes --]

Add a new lock, dcache_lru_lock, to protect the dcache LRU list from concurrent
modification. d_lru is also protected by d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |  112 +++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 84 insertions(+), 28 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:46.000000000 +1100
@@ -37,11 +37,19 @@
 
 /*
  * Usage:
- * dcache_hash_lock protects dcache hash table
+ * dcache_hash_lock protects:
+ *   - the dcache hash table
+ * dcache_lru_lock protects:
+ *   - the dcache lru lists and counters
+ * d_lock protects:
+ *   - d_flags
+ *   - d_name
+ *   - d_lru
  *
  * Ordering:
  * dcache_lock
  *   dentry->d_lock
+ *     dcache_lru_lock
  *     dcache_hash_lock
  *
  * if (dentry1 < dentry2)
@@ -52,6 +60,7 @@ int sysctl_vfs_cache_pressure __read_mos
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
 
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
 
@@ -148,28 +157,38 @@ static void dentry_iput(struct dentry *
 }
 
 /*
- * dentry_lru_(add|del|move_tail) must be called with dcache_lock held.
+ * dentry_lru_(add|del|move_tail) must be called with d_lock held.
  */
 static void dentry_lru_add(struct dentry *dentry)
 {
 	if (list_empty(&dentry->d_lru)) {
+		spin_lock(&dcache_lru_lock);
 		list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
 		dentry->d_sb->s_nr_dentry_unused++;
 		percpu_counter_inc(&nr_dentry_unused);
+		spin_unlock(&dcache_lru_lock);
 	}
 }
 
+static void __dentry_lru_del(struct dentry *dentry)
+{
+	list_del_init(&dentry->d_lru);
+	dentry->d_sb->s_nr_dentry_unused--;
+	percpu_counter_dec(&nr_dentry_unused);
+}
+
 static void dentry_lru_del(struct dentry *dentry)
 {
 	if (!list_empty(&dentry->d_lru)) {
-		list_del_init(&dentry->d_lru);
-		dentry->d_sb->s_nr_dentry_unused--;
-		percpu_counter_dec(&nr_dentry_unused);
+		spin_lock(&dcache_lru_lock);
+		__dentry_lru_del(dentry);
+		spin_unlock(&dcache_lru_lock);
 	}
 }
 
 static void dentry_lru_move_tail(struct dentry *dentry)
 {
+	spin_lock(&dcache_lru_lock);
 	if (list_empty(&dentry->d_lru)) {
 		list_add_tail(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
 		dentry->d_sb->s_nr_dentry_unused++;
@@ -177,6 +196,7 @@ static void dentry_lru_move_tail(struct
 	} else {
 		list_move_tail(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
 	}
+	spin_unlock(&dcache_lru_lock);
 }
 
 /**
@@ -186,6 +206,8 @@ static void dentry_lru_move_tail(struct
  * The dentry must already be unhashed and removed from the LRU.
  *
  * If this is the root of the dentry tree, return NULL.
+ *
+ * dcache_lock and d_lock must be held by caller, are dropped by d_kill.
  */
 static struct dentry *d_kill(struct dentry *dentry)
 	__releases(dentry->d_lock)
@@ -341,10 +363,19 @@ int d_invalidate(struct dentry * dentry)
 EXPORT_SYMBOL(d_invalidate);
 
 /* This should be called _only_ with dcache_lock held */
+static inline struct dentry * __dget_locked_dlock(struct dentry *dentry)
+{
+	atomic_inc(&dentry->d_count);
+	dentry_lru_del(dentry);
+	return dentry;
+}
+
 static inline struct dentry * __dget_locked(struct dentry *dentry)
 {
 	atomic_inc(&dentry->d_count);
+	spin_lock(&dentry->d_lock);
 	dentry_lru_del(dentry);
+	spin_unlock(&dentry->d_lock);
 	return dentry;
 }
 
@@ -423,7 +454,7 @@ void d_prune_aliases(struct inode *inode
 	list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
 		spin_lock(&dentry->d_lock);
 		if (!atomic_read(&dentry->d_count)) {
-			__dget_locked(dentry);
+			__dget_locked_dlock(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
@@ -447,7 +478,6 @@ EXPORT_SYMBOL(d_prune_aliases);
 static void prune_one_dentry(struct dentry * dentry)
 	__releases(dentry->d_lock)
 	__releases(dcache_lock)
-	__acquires(dcache_lock)
 {
 	__d_drop(dentry);
 	dentry = d_kill(dentry);
@@ -456,15 +486,16 @@ static void prune_one_dentry(struct dent
 	 * Prune ancestors.  Locking is simpler than in dput(),
 	 * because dcache_lock needs to be taken anyway.
 	 */
-	spin_lock(&dcache_lock);
 	while (dentry) {
-		if (!atomic_dec_and_lock(&dentry->d_count, &dentry->d_lock))
+		spin_lock(&dcache_lock);
+		if (!atomic_dec_and_lock(&dentry->d_count, &dentry->d_lock)) {
+			spin_unlock(&dcache_lock);
 			return;
+		}
 
 		dentry_lru_del(dentry);
 		__d_drop(dentry);
 		dentry = d_kill(dentry);
-		spin_lock(&dcache_lock);
 	}
 }
 
@@ -474,21 +505,31 @@ static void shrink_dentry_list(struct li
 
 	while (!list_empty(list)) {
 		dentry = list_entry(list->prev, struct dentry, d_lru);
-		dentry_lru_del(dentry);
+
+		if (!spin_trylock(&dentry->d_lock)) {
+			spin_unlock(&dcache_lru_lock);
+			cpu_relax();
+			spin_lock(&dcache_lru_lock);
+			continue;
+		}
+
+		__dentry_lru_del(dentry);
 
 		/*
 		 * We found an inuse dentry which was not removed from
 		 * the LRU because of laziness during lookup.  Do not free
 		 * it - just keep it off the LRU list.
 		 */
-		spin_lock(&dentry->d_lock);
 		if (atomic_read(&dentry->d_count)) {
 			spin_unlock(&dentry->d_lock);
 			continue;
 		}
+		spin_unlock(&dcache_lru_lock);
+
 		prune_one_dentry(dentry);
-		/* dentry->d_lock was dropped in prune_one_dentry() */
-		cond_resched_lock(&dcache_lock);
+		/* dcache_lock and dentry->d_lock dropped */
+		spin_lock(&dcache_lock);
+		spin_lock(&dcache_lru_lock);
 	}
 }
 
@@ -509,32 +550,36 @@ static void __shrink_dcache_sb(struct su
 	int cnt = *count;
 
 	spin_lock(&dcache_lock);
+relock:
+	spin_lock(&dcache_lru_lock);
 	while (!list_empty(&sb->s_dentry_lru)) {
 		dentry = list_entry(sb->s_dentry_lru.prev,
 				struct dentry, d_lru);
 		BUG_ON(dentry->d_sb != sb);
 
+		if (!spin_trylock(&dentry->d_lock)) {
+			spin_unlock(&dcache_lru_lock);
+			cpu_relax();
+			goto relock;
+		}
+
 		/*
 		 * If we are honouring the DCACHE_REFERENCED flag and the
 		 * dentry has this flag set, don't free it.  Clear the flag
 		 * and put it back on the LRU.
 		 */
-		if (flags & DCACHE_REFERENCED) {
-			spin_lock(&dentry->d_lock);
-			if (dentry->d_flags & DCACHE_REFERENCED) {
-				dentry->d_flags &= ~DCACHE_REFERENCED;
-				list_move(&dentry->d_lru, &referenced);
-				spin_unlock(&dentry->d_lock);
-				cond_resched_lock(&dcache_lock);
-				continue;
-			}
+		if (flags & DCACHE_REFERENCED &&
+				dentry->d_flags & DCACHE_REFERENCED) {
+			dentry->d_flags &= ~DCACHE_REFERENCED;
+			list_move(&dentry->d_lru, &referenced);
 			spin_unlock(&dentry->d_lock);
+		} else {
+			list_move_tail(&dentry->d_lru, &tmp);
+			spin_unlock(&dentry->d_lock);
+			if (!--cnt)
+				break;
 		}
-
-		list_move_tail(&dentry->d_lru, &tmp);
-		if (!--cnt)
-			break;
-		cond_resched_lock(&dcache_lock);
+		/* XXX: re-add cond_resched_lock when dcache_lock goes away */
 	}
 
 	*count = cnt;
@@ -542,6 +587,7 @@ static void __shrink_dcache_sb(struct su
 
 	if (!list_empty(&referenced))
 		list_splice(&referenced, &sb->s_dentry_lru);
+	spin_unlock(&dcache_lru_lock);
 	spin_unlock(&dcache_lock);
 
 }
@@ -637,10 +683,12 @@ void shrink_dcache_sb(struct super_block
 	LIST_HEAD(tmp);
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_lru_lock);
 	while (!list_empty(&sb->s_dentry_lru)) {
 		list_splice_init(&sb->s_dentry_lru, &tmp);
 		shrink_dentry_list(&tmp);
 	}
+	spin_unlock(&dcache_lru_lock);
 	spin_unlock(&dcache_lock);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
@@ -659,7 +707,9 @@ static void shrink_dcache_for_umount_sub
 
 	/* detach this root from the system */
 	spin_lock(&dcache_lock);
+	spin_lock(&dentry->d_lock);
 	dentry_lru_del(dentry);
+	spin_unlock(&dentry->d_lock);
 	__d_drop(dentry);
 	spin_unlock(&dcache_lock);
 
@@ -673,7 +723,9 @@ static void shrink_dcache_for_umount_sub
 			spin_lock(&dcache_lock);
 			list_for_each_entry(loop, &dentry->d_subdirs,
 					    d_u.d_child) {
+				spin_lock(&loop->d_lock);
 				dentry_lru_del(loop);
+				spin_unlock(&loop->d_lock);
 				__d_drop(loop);
 				cond_resched_lock(&dcache_lock);
 			}
@@ -850,6 +902,8 @@ static int select_parent(struct dentry *
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
 
+		spin_lock(&dentry->d_lock);
+
 		/* 
 		 * move only zero ref count dentries to the end 
 		 * of the unused list for prune_dcache
@@ -861,6 +915,8 @@ static int select_parent(struct dentry *
 			dentry_lru_del(dentry);
 		}
 
+		spin_unlock(&dentry->d_lock);
+
 		/*
 		 * We can return to the caller if we have found some (this
 		 * ensures forward progress). We'll be coming back to find



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 12/28] fs: dcache scale dentry refcount
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (10 preceding siblings ...)
  2010-11-16 14:09 ` [patch 11/28] fs: dcache scale lru Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 13/28] fs: dcache scale d_unhashed Nick Piggin
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache-scale-d_count.patch --]
[-- Type: text/plain, Size: 23320 bytes --]

Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 arch/powerpc/platforms/cell/spufs/inode.c |    2 
 drivers/infiniband/hw/ipath/ipath_fs.c    |    2 
 drivers/infiniband/hw/qib/qib_fs.c        |    2 
 fs/autofs4/expire.c                       |    8 +-
 fs/autofs4/root.c                         |    6 -
 fs/ceph/dir.c                             |    4 -
 fs/ceph/inode.c                           |    4 -
 fs/ceph/mds_client.c                      |    2 
 fs/coda/dir.c                             |    2 
 fs/configfs/dir.c                         |    3 
 fs/configfs/inode.c                       |    2 
 fs/dcache.c                               |  106 +++++++++++++++++++++++-------
 fs/ecryptfs/inode.c                       |    2 
 fs/locks.c                                |    2 
 fs/namei.c                                |    2 
 fs/nfs/dir.c                              |    6 -
 fs/nfs/unlink.c                           |    2 
 fs/nfsd/vfs.c                             |    5 -
 fs/nilfs2/super.c                         |    2 
 include/linux/dcache.h                    |   29 ++++----
 kernel/cgroup.c                           |    2 
 21 files changed, 126 insertions(+), 69 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:45.000000000 +1100
@@ -45,6 +45,7 @@
  *   - d_flags
  *   - d_name
  *   - d_lru
+ *   - d_count
  *
  * Ordering:
  * dcache_lock
@@ -119,6 +120,7 @@ static void __d_free(struct rcu_head *he
  */
 static void d_free(struct dentry *dentry)
 {
+	BUG_ON(dentry->d_count);
 	percpu_counter_dec(&nr_dentry);
 	if (dentry->d_op && dentry->d_op->d_release)
 		dentry->d_op->d_release(dentry);
@@ -216,8 +218,11 @@ static struct dentry *d_kill(struct dent
 	struct dentry *parent;
 
 	list_del(&dentry->d_u.d_child);
-	/*drops the locks, at that point nobody can reach this dentry */
 	dentry_iput(dentry);
+	/*
+	 * dentry_iput drops the locks, at which point nobody (except
+	 * transient RCU lookups) can reach this dentry.
+	 */
 	if (IS_ROOT(dentry))
 		parent = NULL;
 	else
@@ -261,13 +266,23 @@ void dput(struct dentry *dentry)
 		return;
 
 repeat:
-	if (atomic_read(&dentry->d_count) == 1)
+	if (dentry->d_count == 1)
 		might_sleep();
-	if (!atomic_dec_and_lock(&dentry->d_count, &dcache_lock))
-		return;
-
 	spin_lock(&dentry->d_lock);
-	if (atomic_read(&dentry->d_count)) {
+	if (dentry->d_count == 1) {
+		if (!spin_trylock(&dcache_lock)) {
+			/*
+			 * Something of a livelock possibility we could avoid
+			 * by taking dcache_lock and trying again, but we
+			 * want to reduce dcache_lock anyway so this will
+			 * get improved.
+			 */
+			spin_unlock(&dentry->d_lock);
+			goto repeat;
+		}
+	}
+	dentry->d_count--;
+	if (dentry->d_count) {
 		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_lock);
 		return;
@@ -347,7 +362,7 @@ int d_invalidate(struct dentry * dentry)
 	 * working directory or similar).
 	 */
 	spin_lock(&dentry->d_lock);
-	if (atomic_read(&dentry->d_count) > 1) {
+	if (dentry->d_count > 1) {
 		if (dentry->d_inode && S_ISDIR(dentry->d_inode->i_mode)) {
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
@@ -362,29 +377,61 @@ int d_invalidate(struct dentry * dentry)
 }
 EXPORT_SYMBOL(d_invalidate);
 
-/* This should be called _only_ with dcache_lock held */
+/* This must be called with dcache_lock and d_lock held */
 static inline struct dentry * __dget_locked_dlock(struct dentry *dentry)
 {
-	atomic_inc(&dentry->d_count);
+	dentry->d_count++;
 	dentry_lru_del(dentry);
 	return dentry;
 }
 
+/* This should be called _only_ with dcache_lock held */
 static inline struct dentry * __dget_locked(struct dentry *dentry)
 {
-	atomic_inc(&dentry->d_count);
 	spin_lock(&dentry->d_lock);
-	dentry_lru_del(dentry);
+	__dget_locked_dlock(dentry);
 	spin_unlock(&dentry->d_lock);
 	return dentry;
 }
 
+struct dentry * dget_locked_dlock(struct dentry *dentry)
+{
+	return __dget_locked_dlock(dentry);
+}
+
 struct dentry * dget_locked(struct dentry *dentry)
 {
 	return __dget_locked(dentry);
 }
 EXPORT_SYMBOL(dget_locked);
 
+struct dentry *dget_parent(struct dentry *dentry)
+{
+	struct dentry *ret;
+
+repeat:
+	spin_lock(&dentry->d_lock);
+	ret = dentry->d_parent;
+	if (!ret)
+		goto out;
+	if (dentry == ret) {
+		ret->d_count++;
+		goto out;
+	}
+	if (!spin_trylock(&ret->d_lock)) {
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
+		goto repeat;
+	}
+	BUG_ON(!ret->d_count);
+	ret->d_count++;
+	spin_unlock(&ret->d_lock);
+out:
+	spin_unlock(&dentry->d_lock);
+	return ret;
+}
+EXPORT_SYMBOL(dget_parent);
+
 /**
  * d_find_alias - grab a hashed alias of inode
  * @inode: inode in question
@@ -453,7 +500,7 @@ void d_prune_aliases(struct inode *inode
 	spin_lock(&dcache_lock);
 	list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
 		spin_lock(&dentry->d_lock);
-		if (!atomic_read(&dentry->d_count)) {
+		if (!dentry->d_count) {
 			__dget_locked_dlock(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
@@ -488,7 +535,10 @@ static void prune_one_dentry(struct dent
 	 */
 	while (dentry) {
 		spin_lock(&dcache_lock);
-		if (!atomic_dec_and_lock(&dentry->d_count, &dentry->d_lock)) {
+		spin_lock(&dentry->d_lock);
+		dentry->d_count--;
+		if (dentry->d_count) {
+			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
 			return;
 		}
@@ -520,7 +570,7 @@ static void shrink_dentry_list(struct li
 		 * the LRU because of laziness during lookup.  Do not free
 		 * it - just keep it off the LRU list.
 		 */
-		if (atomic_read(&dentry->d_count)) {
+		if (dentry->d_count) {
 			spin_unlock(&dentry->d_lock);
 			continue;
 		}
@@ -741,7 +791,7 @@ static void shrink_dcache_for_umount_sub
 		do {
 			struct inode *inode;
 
-			if (atomic_read(&dentry->d_count) != 0) {
+			if (dentry->d_count != 0) {
 				printk(KERN_ERR
 				       "BUG: Dentry %p{i=%lx,n=%s}"
 				       " still in use (%d)"
@@ -750,7 +800,7 @@ static void shrink_dcache_for_umount_sub
 				       dentry->d_inode ?
 				       dentry->d_inode->i_ino : 0UL,
 				       dentry->d_name.name,
-				       atomic_read(&dentry->d_count),
+				       dentry->d_count,
 				       dentry->d_sb->s_type->name,
 				       dentry->d_sb->s_id);
 				BUG();
@@ -760,7 +810,9 @@ static void shrink_dcache_for_umount_sub
 				parent = NULL;
 			else {
 				parent = dentry->d_parent;
-				atomic_dec(&parent->d_count);
+				spin_lock(&parent->d_lock);
+				parent->d_count--;
+				spin_unlock(&parent->d_lock);
 			}
 
 			list_del(&dentry->d_u.d_child);
@@ -811,7 +863,9 @@ void shrink_dcache_for_umount(struct sup
 
 	dentry = sb->s_root;
 	sb->s_root = NULL;
-	atomic_dec(&dentry->d_count);
+	spin_lock(&dentry->d_lock);
+	dentry->d_count--;
+	spin_unlock(&dentry->d_lock);
 	shrink_dcache_for_umount_subtree(dentry);
 
 	while (!hlist_empty(&sb->s_anon)) {
@@ -908,7 +962,7 @@ static int select_parent(struct dentry *
 		 * move only zero ref count dentries to the end 
 		 * of the unused list for prune_dcache
 		 */
-		if (!atomic_read(&dentry->d_count)) {
+		if (!dentry->d_count) {
 			dentry_lru_move_tail(dentry);
 			found++;
 		} else {
@@ -1029,7 +1083,7 @@ struct dentry *d_alloc(struct dentry * p
 	memcpy(dname, name->name, name->len);
 	dname[name->len] = 0;
 
-	atomic_set(&dentry->d_count, 1);
+	dentry->d_count = 1;
 	dentry->d_flags = DCACHE_UNHASHED;
 	spin_lock_init(&dentry->d_lock);
 	dentry->d_inode = NULL;
@@ -1517,7 +1571,7 @@ struct dentry * __d_lookup(struct dentry
 				goto next;
 		}
 
-		atomic_inc(&dentry->d_count);
+		dentry->d_count++;
 		found = dentry;
 		spin_unlock(&dentry->d_lock);
 		break;
@@ -1614,7 +1668,7 @@ void d_delete(struct dentry * dentry)
 	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	isdir = S_ISDIR(dentry->d_inode->i_mode);
-	if (atomic_read(&dentry->d_count) == 1) {
+	if (dentry->d_count == 1) {
 		dentry->d_flags &= ~DCACHE_CANT_MOUNT;
 		dentry_iput(dentry);
 		fsnotify_nameremove(dentry, isdir);
@@ -2428,11 +2482,15 @@ void d_genocide(struct dentry *root)
 			this_parent = dentry;
 			goto repeat;
 		}
-		atomic_dec(&dentry->d_count);
+		spin_lock(&dentry->d_lock);
+		dentry->d_count--;
+		spin_unlock(&dentry->d_lock);
 	}
 	if (this_parent != root) {
 		next = this_parent->d_u.d_child.next;
-		atomic_dec(&this_parent->d_count);
+		spin_lock(&this_parent->d_lock);
+		this_parent->d_count--;
+		spin_unlock(&this_parent->d_lock);
 		this_parent = this_parent->d_parent;
 		goto resume;
 	}
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:44.000000000 +1100
@@ -87,7 +87,7 @@ full_name_hash(const unsigned char *name
 #endif
 
 struct dentry {
-	atomic_t d_count;
+	unsigned int d_count;		/* protected by d_lock */
 	unsigned int d_flags;		/* protected by d_lock */
 	spinlock_t d_lock;		/* per dentry lock */
 	int d_mounted;
@@ -329,17 +329,28 @@ extern char *dentry_path(struct dentry *
  *	needs and they take necessary precautions) you should hold dcache_lock
  *	and call dget_locked() instead of dget().
  */
- 
+static inline struct dentry *dget_dlock(struct dentry *dentry)
+{
+	if (dentry) {
+		BUG_ON(!dentry->d_count);
+		dentry->d_count++;
+	}
+	return dentry;
+}
 static inline struct dentry *dget(struct dentry *dentry)
 {
 	if (dentry) {
-		BUG_ON(!atomic_read(&dentry->d_count));
-		atomic_inc(&dentry->d_count);
+		spin_lock(&dentry->d_lock);
+		dget_dlock(dentry);
+		spin_unlock(&dentry->d_lock);
 	}
 	return dentry;
 }
 
 extern struct dentry * dget_locked(struct dentry *);
+extern struct dentry * dget_locked_dlock(struct dentry *);
+
+extern struct dentry *dget_parent(struct dentry *dentry);
 
 /**
  *	d_unhashed -	is dentry hashed
@@ -370,16 +381,6 @@ static inline void dont_mount(struct den
 	spin_unlock(&dentry->d_lock);
 }
 
-static inline struct dentry *dget_parent(struct dentry *dentry)
-{
-	struct dentry *ret;
-
-	spin_lock(&dentry->d_lock);
-	ret = dget(dentry->d_parent);
-	spin_unlock(&dentry->d_lock);
-	return ret;
-}
-
 extern void dput(struct dentry *);
 
 static inline int d_mountpoint(struct dentry *dentry)
Index: linux-2.6/fs/configfs/dir.c
===================================================================
--- linux-2.6.orig/fs/configfs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/configfs/dir.c	2010-11-17 00:52:37.000000000 +1100
@@ -399,8 +399,7 @@ static void remove_dir(struct dentry * d
 	if (d->d_inode)
 		simple_rmdir(parent->d_inode,d);
 
-	pr_debug(" o %s removing done (%d)\n",d->d_name.name,
-		 atomic_read(&d->d_count));
+	pr_debug(" o %s removing done (%d)\n",d->d_name.name, d->d_count);
 
 	dput(parent);
 }
Index: linux-2.6/fs/locks.c
===================================================================
--- linux-2.6.orig/fs/locks.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/fs/locks.c	2010-11-17 00:52:37.000000000 +1100
@@ -1390,7 +1390,7 @@ int generic_setlease(struct file *filp,
 		if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
 			goto out;
 		if ((arg == F_WRLCK)
-		    && ((atomic_read(&dentry->d_count) > 1)
+		    && ((dentry->d_count > 1)
 			|| (atomic_read(&inode->i_count) > 1)))
 			goto out;
 	}
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/namei.c	2010-11-17 01:05:42.000000000 +1100
@@ -2130,7 +2130,7 @@ void dentry_unhash(struct dentry *dentry
 	shrink_dcache_parent(dentry);
 	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
-	if (atomic_read(&dentry->d_count) == 2)
+	if (dentry->d_count == 2)
 		__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
Index: linux-2.6/fs/autofs4/expire.c
===================================================================
--- linux-2.6.orig/fs/autofs4/expire.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/fs/autofs4/expire.c	2010-11-17 01:05:45.000000000 +1100
@@ -198,7 +198,7 @@ static int autofs4_tree_busy(struct vfsm
 			else
 				ino_count++;
 
-			if (atomic_read(&p->d_count) > ino_count) {
+			if (p->d_count > ino_count) {
 				top_ino->last_used = jiffies;
 				dput(p);
 				return 1;
@@ -347,7 +347,7 @@ struct dentry *autofs4_expire_indirect(s
 
 			/* Path walk currently on this dentry? */
 			ino_count = atomic_read(&ino->count) + 2;
-			if (atomic_read(&dentry->d_count) > ino_count)
+			if (dentry->d_count > ino_count)
 				goto next;
 
 			/* Can we umount this guy */
@@ -369,7 +369,7 @@ struct dentry *autofs4_expire_indirect(s
 		if (!exp_leaves) {
 			/* Path walk currently on this dentry? */
 			ino_count = atomic_read(&ino->count) + 1;
-			if (atomic_read(&dentry->d_count) > ino_count)
+			if (dentry->d_count > ino_count)
 				goto next;
 
 			if (!autofs4_tree_busy(mnt, dentry, timeout, do_now)) {
@@ -383,7 +383,7 @@ struct dentry *autofs4_expire_indirect(s
 		} else {
 			/* Path walk currently on this dentry? */
 			ino_count = atomic_read(&ino->count) + 1;
-			if (atomic_read(&dentry->d_count) > ino_count)
+			if (dentry->d_count > ino_count)
 				goto next;
 
 			expired = autofs4_check_leaves(mnt, dentry, timeout, do_now);
Index: linux-2.6/fs/coda/dir.c
===================================================================
--- linux-2.6.orig/fs/coda/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/coda/dir.c	2010-11-17 00:52:37.000000000 +1100
@@ -559,7 +559,7 @@ static int coda_dentry_revalidate(struct
 	if (cii->c_flags & C_FLUSH) 
 		coda_flag_inode_children(inode, C_FLUSH);
 
-	if (atomic_read(&de->d_count) > 1)
+	if (de->d_count > 1)
 		/* pretend it's valid, but don't change the flags */
 		goto out;
 
Index: linux-2.6/fs/ecryptfs/inode.c
===================================================================
--- linux-2.6.orig/fs/ecryptfs/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ecryptfs/inode.c	2010-11-17 00:52:37.000000000 +1100
@@ -260,7 +260,7 @@ int ecryptfs_lookup_and_interpose_lower(
 				   ecryptfs_dentry->d_parent));
 	lower_inode = lower_dentry->d_inode;
 	fsstack_copy_attr_atime(ecryptfs_dir_inode, lower_dir_dentry->d_inode);
-	BUG_ON(!atomic_read(&lower_dentry->d_count));
+	BUG_ON(!lower_dentry->d_count);
 	ecryptfs_set_dentry_private(ecryptfs_dentry,
 				    kmem_cache_alloc(ecryptfs_dentry_info_cache,
 						     GFP_KERNEL));
Index: linux-2.6/fs/nfs/dir.c
===================================================================
--- linux-2.6.orig/fs/nfs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/nfs/dir.c	2010-11-17 01:05:42.000000000 +1100
@@ -1694,7 +1694,7 @@ static int nfs_unlink(struct inode *dir,
 
 	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
-	if (atomic_read(&dentry->d_count) > 1) {
+	if (dentry->d_count > 1) {
 		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_lock);
 		/* Start asynchronous writeout of the inode */
@@ -1842,7 +1842,7 @@ static int nfs_rename(struct inode *old_
 	dfprintk(VFS, "NFS: rename(%s/%s -> %s/%s, ct=%d)\n",
 		 old_dentry->d_parent->d_name.name, old_dentry->d_name.name,
 		 new_dentry->d_parent->d_name.name, new_dentry->d_name.name,
-		 atomic_read(&new_dentry->d_count));
+		 new_dentry->d_count);
 
 	/*
 	 * For non-directories, check whether the target is busy and if so,
@@ -1860,7 +1860,7 @@ static int nfs_rename(struct inode *old_
 			rehash = new_dentry;
 		}
 
-		if (atomic_read(&new_dentry->d_count) > 2) {
+		if (new_dentry->d_count > 2) {
 			int err;
 
 			/* copy the target dentry's name */
Index: linux-2.6/fs/nfsd/vfs.c
===================================================================
--- linux-2.6.orig/fs/nfsd/vfs.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/fs/nfsd/vfs.c	2010-11-17 00:52:37.000000000 +1100
@@ -1756,8 +1756,7 @@ nfsd_rename(struct svc_rqst *rqstp, stru
 		goto out_dput_new;
 
 	if (svc_msnfs(ffhp) &&
-		((atomic_read(&odentry->d_count) > 1)
-		 || (atomic_read(&ndentry->d_count) > 1))) {
+		((odentry->d_count > 1) || (ndentry->d_count > 1))) {
 			host_err = -EPERM;
 			goto out_dput_new;
 	}
@@ -1843,7 +1842,7 @@ nfsd_unlink(struct svc_rqst *rqstp, stru
 	if (type != S_IFDIR) { /* It's UNLINK */
 #ifdef MSNFS
 		if ((fhp->fh_export->ex_flags & NFSEXP_MSNFS) &&
-			(atomic_read(&rdentry->d_count) > 1)) {
+			(rdentry->d_count > 1)) {
 			host_err = -EPERM;
 		} else
 #endif
Index: linux-2.6/kernel/cgroup.c
===================================================================
--- linux-2.6.orig/kernel/cgroup.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/kernel/cgroup.c	2010-11-17 01:05:45.000000000 +1100
@@ -3638,9 +3638,7 @@ static int cgroup_rmdir(struct inode *un
 	list_del(&cgrp->sibling);
 	cgroup_unlock_hierarchy(cgrp->root);
 
-	spin_lock(&cgrp->dentry->d_lock);
 	d = dget(cgrp->dentry);
-	spin_unlock(&d->d_lock);
 
 	cgroup_d_remove_dir(d);
 	dput(d);
Index: linux-2.6/arch/powerpc/platforms/cell/spufs/inode.c
===================================================================
--- linux-2.6.orig/arch/powerpc/platforms/cell/spufs/inode.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/arch/powerpc/platforms/cell/spufs/inode.c	2010-11-17 01:05:45.000000000 +1100
@@ -162,7 +162,7 @@ static void spufs_prune_dir(struct dentr
 		spin_lock(&dcache_lock);
 		spin_lock(&dentry->d_lock);
 		if (!(d_unhashed(dentry)) && dentry->d_inode) {
-			dget_locked(dentry);
+			dget_locked_dlock(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			simple_unlink(dir->d_inode, dentry);
Index: linux-2.6/drivers/infiniband/hw/ipath/ipath_fs.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/ipath/ipath_fs.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/drivers/infiniband/hw/ipath/ipath_fs.c	2010-11-17 01:05:42.000000000 +1100
@@ -280,7 +280,7 @@ static int remove_file(struct dentry *pa
 	spin_lock(&dcache_lock);
 	spin_lock(&tmp->d_lock);
 	if (!(d_unhashed(tmp) && tmp->d_inode)) {
-		dget_locked(tmp);
+		dget_locked_dlock(tmp);
 		__d_drop(tmp);
 		spin_unlock(&tmp->d_lock);
 		spin_unlock(&dcache_lock);
Index: linux-2.6/fs/configfs/inode.c
===================================================================
--- linux-2.6.orig/fs/configfs/inode.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/fs/configfs/inode.c	2010-11-17 01:05:42.000000000 +1100
@@ -253,7 +253,7 @@ void configfs_drop_dentry(struct configf
 		spin_lock(&dcache_lock);
 		spin_lock(&dentry->d_lock);
 		if (!(d_unhashed(dentry) && dentry->d_inode)) {
-			dget_locked(dentry);
+			dget_locked_dlock(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
Index: linux-2.6/fs/ceph/dir.c
===================================================================
--- linux-2.6.orig/fs/ceph/dir.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/fs/ceph/dir.c	2010-11-17 01:05:45.000000000 +1100
@@ -149,7 +149,9 @@ static int __dcache_readdir(struct file
 		di = ceph_dentry(dentry);
 	}
 
-	atomic_inc(&dentry->d_count);
+	spin_lock(&dentry->d_lock);
+	dentry->d_count++;
+	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
 	dout(" %llu (%llu) dentry %p %.*s %p\n", di->offset, filp->f_pos,
Index: linux-2.6/fs/ceph/inode.c
===================================================================
--- linux-2.6.orig/fs/ceph/inode.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/fs/ceph/inode.c	2010-11-17 01:05:45.000000000 +1100
@@ -866,8 +866,8 @@ static struct dentry *splice_dentry(stru
 	} else if (realdn) {
 		dout("dn %p (%d) spliced with %p (%d) "
 		     "inode %p ino %llx.%llx\n",
-		     dn, atomic_read(&dn->d_count),
-		     realdn, atomic_read(&realdn->d_count),
+		     dn, dn->d_count,
+		     realdn, realdn->d_count,
 		     realdn->d_inode, ceph_vinop(realdn->d_inode));
 		dput(dn);
 		dn = realdn;
Index: linux-2.6/fs/ceph/mds_client.c
===================================================================
--- linux-2.6.orig/fs/ceph/mds_client.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/fs/ceph/mds_client.c	2010-11-17 00:52:37.000000000 +1100
@@ -1452,7 +1452,7 @@ char *ceph_mdsc_build_path(struct dentry
 	*base = ceph_ino(temp->d_inode);
 	*plen = len;
 	dout("build_path on %p %d built %llx '%.*s'\n",
-	     dentry, atomic_read(&dentry->d_count), *base, len, path);
+	     dentry, dentry->d_count, *base, len, path);
 	return path;
 }
 
Index: linux-2.6/drivers/infiniband/hw/qib/qib_fs.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/qib/qib_fs.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/drivers/infiniband/hw/qib/qib_fs.c	2010-11-17 01:05:42.000000000 +1100
@@ -456,7 +456,7 @@ static int remove_file(struct dentry *pa
 	spin_lock(&dcache_lock);
 	spin_lock(&tmp->d_lock);
 	if (!(d_unhashed(tmp) && tmp->d_inode)) {
-		dget_locked(tmp);
+		dget_locked_dlock(tmp);
 		__d_drop(tmp);
 		spin_unlock(&tmp->d_lock);
 		spin_unlock(&dcache_lock);
Index: linux-2.6/fs/autofs4/root.c
===================================================================
--- linux-2.6.orig/fs/autofs4/root.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/fs/autofs4/root.c	2010-11-17 01:05:45.000000000 +1100
@@ -436,7 +436,7 @@ static struct dentry *autofs4_lookup_act
 		spin_lock(&active->d_lock);
 
 		/* Already gone? */
-		if (atomic_read(&active->d_count) == 0)
+		if (active->d_count == 0)
 			goto next;
 
 		qstr = &active->d_name;
@@ -452,7 +452,7 @@ static struct dentry *autofs4_lookup_act
 			goto next;
 
 		if (d_unhashed(active)) {
-			dget(active);
+			dget_dlock(active);
 			spin_unlock(&active->d_lock);
 			spin_unlock(&sbi->lookup_lock);
 			spin_unlock(&dcache_lock);
@@ -507,7 +507,7 @@ static struct dentry *autofs4_lookup_exp
 			goto next;
 
 		if (d_unhashed(expiring)) {
-			dget(expiring);
+			dget_dlock(expiring);
 			spin_unlock(&expiring->d_lock);
 			spin_unlock(&sbi->lookup_lock);
 			spin_unlock(&dcache_lock);
Index: linux-2.6/fs/nfs/unlink.c
===================================================================
--- linux-2.6.orig/fs/nfs/unlink.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/fs/nfs/unlink.c	2010-11-17 00:52:37.000000000 +1100
@@ -496,7 +496,7 @@ nfs_sillyrename(struct inode *dir, struc
 
 	dfprintk(VFS, "NFS: silly-rename(%s/%s, ct=%d)\n",
 		dentry->d_parent->d_name.name, dentry->d_name.name,
-		atomic_read(&dentry->d_count));
+		dentry->d_count);
 	nfs_inc_stats(dir, NFSIOS_SILLYRENAME);
 
 	/*
Index: linux-2.6/fs/nilfs2/super.c
===================================================================
--- linux-2.6.orig/fs/nilfs2/super.c	2010-11-17 00:50:51.000000000 +1100
+++ linux-2.6/fs/nilfs2/super.c	2010-11-17 00:52:37.000000000 +1100
@@ -838,7 +838,7 @@ static int nilfs_attach_snapshot(struct
 
 static int nilfs_tree_was_touched(struct dentry *root_dentry)
 {
-	return atomic_read(&root_dentry->d_count) > 1;
+	return root_dentry->d_count > 1;
 }
 
 /**



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 13/28] fs: dcache scale d_unhashed
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (11 preceding siblings ...)
  2010-11-16 14:09 ` [patch 12/28] fs: dcache scale dentry refcount Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-19 19:41   ` Tim Pepper
  2010-11-16 14:09 ` [patch 14/28] fs: dcache scale subdirs Nick Piggin
                   ` (17 subsequent siblings)
  30 siblings, 1 reply; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache-scale-d_unhashed.patch --]
[-- Type: text/plain, Size: 14812 bytes --]

Protect d_unhashed(dentry) condition with d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 arch/powerpc/platforms/cell/spufs/inode.c |    3 +
 drivers/usb/core/inode.c                  |    3 +
 fs/autofs4/autofs_i.h                     |   13 ----
 fs/autofs4/expire.c                       |   21 +++++--
 fs/ceph/dir.c                             |    5 +
 fs/configfs/configfs_internal.h           |    2 
 fs/dcache.c                               |   83 ++++++++++++++++++++----------
 fs/libfs.c                                |   29 +++++++---
 fs/ocfs2/dcache.c                         |    5 +
 security/tomoyo/realpath.c                |    1 
 10 files changed, 109 insertions(+), 56 deletions(-)

Index: linux-2.6/fs/libfs.c
===================================================================
--- linux-2.6.orig/fs/libfs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/libfs.c	2010-11-17 01:05:45.000000000 +1100
@@ -16,6 +16,11 @@
 
 #include <asm/uaccess.h>
 
+static inline int simple_positive(struct dentry *dentry)
+{
+	return dentry->d_inode && !d_unhashed(dentry);
+}
+
 int simple_getattr(struct vfsmount *mnt, struct dentry *dentry,
 		   struct kstat *stat)
 {
@@ -100,8 +105,10 @@ loff_t dcache_dir_lseek(struct file *fil
 			while (n && p != &file->f_path.dentry->d_subdirs) {
 				struct dentry *next;
 				next = list_entry(p, struct dentry, d_u.d_child);
-				if (!d_unhashed(next) && next->d_inode)
+				spin_lock(&next->d_lock);
+				if (simple_positive(next))
 					n--;
+				spin_unlock(&next->d_lock);
 				p = p->next;
 			}
 			list_add_tail(&cursor->d_u.d_child, p);
@@ -155,9 +162,13 @@ int dcache_readdir(struct file * filp, v
 			for (p=q->next; p != &dentry->d_subdirs; p=p->next) {
 				struct dentry *next;
 				next = list_entry(p, struct dentry, d_u.d_child);
-				if (d_unhashed(next) || !next->d_inode)
+				spin_lock_nested(&next->d_lock, DENTRY_D_LOCK_NESTED);
+				if (!simple_positive(next)) {
+					spin_unlock(&next->d_lock);
 					continue;
+				}
 
+				spin_unlock(&next->d_lock);
 				spin_unlock(&dcache_lock);
 				if (filldir(dirent, next->d_name.name, 
 					    next->d_name.len, filp->f_pos, 
@@ -259,20 +270,20 @@ int simple_link(struct dentry *old_dentr
 	return 0;
 }
 
-static inline int simple_positive(struct dentry *dentry)
-{
-	return dentry->d_inode && !d_unhashed(dentry);
-}
-
 int simple_empty(struct dentry *dentry)
 {
 	struct dentry *child;
 	int ret = 0;
 
 	spin_lock(&dcache_lock);
-	list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child)
-		if (simple_positive(child))
+	list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
+		spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
+		if (simple_positive(child)) {
+			spin_unlock(&child->d_lock);
 			goto out;
+		}
+		spin_unlock(&child->d_lock);
+	}
 	ret = 1;
 out:
 	spin_unlock(&dcache_lock);
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:45.000000000 +1100
@@ -46,6 +46,7 @@
  *   - d_name
  *   - d_lru
  *   - d_count
+ *   - d_unhashed()
  *
  * Ordering:
  * dcache_lock
@@ -53,6 +54,13 @@
  *     dcache_lru_lock
  *     dcache_hash_lock
  *
+ * If there is an ancestor relationship:
+ * dentry->d_parent->...->d_parent->d_lock
+ *   ...
+ *     dentry->d_parent->d_lock
+ *       dentry->d_lock
+ *
+ * If no ancestor relationship:
  * if (dentry1 < dentry2)
  *   dentry1->d_lock
  *     dentry2->d_lock
@@ -337,7 +345,9 @@ int d_invalidate(struct dentry * dentry)
 	 * If it's already been dropped, return OK.
 	 */
 	spin_lock(&dcache_lock);
+	spin_lock(&dentry->d_lock);
 	if (d_unhashed(dentry)) {
+		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_lock);
 		return 0;
 	}
@@ -346,9 +356,11 @@ int d_invalidate(struct dentry * dentry)
 	 * to get rid of unused child entries.
 	 */
 	if (!list_empty(&dentry->d_subdirs)) {
+		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_lock);
 		shrink_dcache_parent(dentry);
 		spin_lock(&dcache_lock);
+		spin_lock(&dentry->d_lock);
 	}
 
 	/*
@@ -361,7 +373,6 @@ int d_invalidate(struct dentry * dentry)
 	 * we might still populate it if it was a
 	 * working directory or similar).
 	 */
-	spin_lock(&dentry->d_lock);
 	if (dentry->d_count > 1) {
 		if (dentry->d_inode && S_ISDIR(dentry->d_inode->i_mode)) {
 			spin_unlock(&dentry->d_lock);
@@ -448,35 +459,49 @@ EXPORT_SYMBOL(dget_parent);
  * any other hashed alias over that one unless @want_discon is set,
  * in which case only return an IS_ROOT, DCACHE_DISCONNECTED alias.
  */
-
-static struct dentry * __d_find_alias(struct inode *inode, int want_discon)
+static struct dentry *___d_find_alias(struct inode *inode, int want_discon)
 {
-	struct list_head *head, *next, *tmp;
-	struct dentry *alias, *discon_alias=NULL;
+	struct dentry *alias, *discon_alias;
 
-	head = &inode->i_dentry;
-	next = inode->i_dentry.next;
-	while (next != head) {
-		tmp = next;
-		next = tmp->next;
-		prefetch(next);
-		alias = list_entry(tmp, struct dentry, d_alias);
+again:
+	discon_alias = NULL;
+	list_for_each_entry(alias, &inode->i_dentry, d_alias) {
+		spin_lock(&alias->d_lock);
  		if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
 			if (IS_ROOT(alias) &&
 			    (alias->d_flags & DCACHE_DISCONNECTED))
 				discon_alias = alias;
-			else if (!want_discon) {
-				__dget_locked(alias);
+			else if (!want_discon)
 				return alias;
-			}
 		}
+		spin_unlock(&alias->d_lock);
 	}
-	if (discon_alias)
-		__dget_locked(discon_alias);
-	return discon_alias;
+	if (discon_alias) {
+		alias = discon_alias;
+		spin_lock(&alias->d_lock);
+ 		if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
+			if (IS_ROOT(alias) &&
+			    (alias->d_flags & DCACHE_DISCONNECTED))
+				return alias;
+		}
+		spin_unlock(&alias->d_lock);
+		goto again;
+	}
+	return NULL;
 }
 
-struct dentry * d_find_alias(struct inode *inode)
+static struct dentry *__d_find_alias(struct inode *inode, int want_discon)
+{
+	struct dentry *alias;
+	alias = ___d_find_alias(inode, want_discon);
+	if (alias) {
+		__dget_locked_dlock(alias);
+		spin_unlock(&alias->d_lock);
+	}
+	return alias;
+}
+
+struct dentry *d_find_alias(struct inode *inode)
 {
 	struct dentry *de = NULL;
 
@@ -759,8 +784,8 @@ static void shrink_dcache_for_umount_sub
 	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	dentry_lru_del(dentry);
-	spin_unlock(&dentry->d_lock);
 	__d_drop(dentry);
+	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
 	for (;;) {
@@ -775,8 +800,8 @@ static void shrink_dcache_for_umount_sub
 					    d_u.d_child) {
 				spin_lock(&loop->d_lock);
 				dentry_lru_del(loop);
-				spin_unlock(&loop->d_lock);
 				__d_drop(loop);
+				spin_unlock(&loop->d_lock);
 				cond_resched_lock(&dcache_lock);
 			}
 			spin_unlock(&dcache_lock);
@@ -1797,7 +1822,10 @@ static void d_move_locked(struct dentry
 	/*
 	 * XXXX: do we really need to take target->d_lock?
 	 */
-	if (target < dentry) {
+	if (d_ancestor(dentry, target)) {
+		spin_lock(&dentry->d_lock);
+		spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NESTED);
+	} else if (d_ancestor(target, dentry) || target < dentry) {
 		spin_lock(&target->d_lock);
 		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 	} else {
@@ -2476,15 +2504,18 @@ void d_genocide(struct dentry *root)
 		struct list_head *tmp = next;
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
-		if (d_unhashed(dentry)||!dentry->d_inode)
+		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+		if (d_unhashed(dentry) || !dentry->d_inode) {
+			spin_unlock(&dentry->d_lock);
 			continue;
+		}
 		if (!list_empty(&dentry->d_subdirs)) {
+			spin_unlock(&dentry->d_lock);
 			this_parent = dentry;
 			goto repeat;
 		}
-		spin_lock(&dentry->d_lock);
-		dentry->d_count--;
-		spin_unlock(&dentry->d_lock);
+ 		dentry->d_count--;
+ 		spin_unlock(&dentry->d_lock);
 	}
 	if (this_parent != root) {
 		next = this_parent->d_u.d_child.next;
Index: linux-2.6/arch/powerpc/platforms/cell/spufs/inode.c
===================================================================
--- linux-2.6.orig/arch/powerpc/platforms/cell/spufs/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/arch/powerpc/platforms/cell/spufs/inode.c	2010-11-17 01:05:42.000000000 +1100
@@ -166,6 +166,9 @@ static void spufs_prune_dir(struct dentr
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			simple_unlink(dir->d_inode, dentry);
+			/* XXX: what is dcache_lock protecting here? Other
+			 * filesystems (IB, configfs) release dcache_lock
+			 * before unlink */
 			spin_unlock(&dcache_lock);
 			dput(dentry);
 		} else {
Index: linux-2.6/fs/configfs/configfs_internal.h
===================================================================
--- linux-2.6.orig/fs/configfs/configfs_internal.h	2010-11-17 00:50:50.000000000 +1100
+++ linux-2.6/fs/configfs/configfs_internal.h	2010-11-17 01:05:42.000000000 +1100
@@ -121,6 +121,7 @@ static inline struct config_item *config
 	struct config_item * item = NULL;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dentry->d_lock);
 	if (!d_unhashed(dentry)) {
 		struct configfs_dirent * sd = dentry->d_fsdata;
 		if (sd->s_type & CONFIGFS_ITEM_LINK) {
@@ -129,6 +130,7 @@ static inline struct config_item *config
 		} else
 			item = config_item_get(sd->s_element);
 	}
+	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
 	return item;
Index: linux-2.6/fs/ocfs2/dcache.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/dcache.c	2010-11-17 00:50:50.000000000 +1100
+++ linux-2.6/fs/ocfs2/dcache.c	2010-11-17 01:05:44.000000000 +1100
@@ -174,13 +174,16 @@ struct dentry *ocfs2_find_local_alias(st
 	list_for_each(p, &inode->i_dentry) {
 		dentry = list_entry(p, struct dentry, d_alias);
 
+		spin_lock(&dentry->d_lock);
 		if (ocfs2_match_dentry(dentry, parent_blkno, skip_unhashed)) {
 			mlog(0, "dentry found: %.*s\n",
 			     dentry->d_name.len, dentry->d_name.name);
 
-			dget_locked(dentry);
+			dget_locked_dlock(dentry);
+			spin_unlock(&dentry->d_lock);
 			break;
 		}
+		spin_unlock(&dentry->d_lock);
 
 		dentry = NULL;
 	}
Index: linux-2.6/security/tomoyo/realpath.c
===================================================================
--- linux-2.6.orig/security/tomoyo/realpath.c	2010-11-17 00:50:50.000000000 +1100
+++ linux-2.6/security/tomoyo/realpath.c	2010-11-17 00:52:37.000000000 +1100
@@ -14,6 +14,7 @@
 #include <linux/slab.h>
 #include <net/sock.h>
 #include "common.h"
+#include "../../fs/internal.h"
 
 /**
  * tomoyo_encode: Convert binary string to ascii string.
Index: linux-2.6/drivers/usb/core/inode.c
===================================================================
--- linux-2.6.orig/drivers/usb/core/inode.c	2010-11-17 00:50:50.000000000 +1100
+++ linux-2.6/drivers/usb/core/inode.c	2010-11-17 01:05:44.000000000 +1100
@@ -348,10 +348,13 @@ static int usbfs_empty (struct dentry *d
 
 	list_for_each(list, &dentry->d_subdirs) {
 		struct dentry *de = list_entry(list, struct dentry, d_u.d_child);
+		spin_lock(&de->d_lock);
 		if (usbfs_positive(de)) {
+			spin_unlock(&de->d_lock);
 			spin_unlock(&dcache_lock);
 			return 0;
 		}
+		spin_unlock(&de->d_lock);
 	}
 
 	spin_unlock(&dcache_lock);
Index: linux-2.6/fs/ceph/dir.c
===================================================================
--- linux-2.6.orig/fs/ceph/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ceph/dir.c	2010-11-17 01:05:45.000000000 +1100
@@ -135,6 +135,7 @@ static int __dcache_readdir(struct file
 			fi->at_end = 1;
 			goto out_unlock;
 		}
+		spin_lock(&dentry->d_lock);
 		if (!d_unhashed(dentry) && dentry->d_inode &&
 		    ceph_snap(dentry->d_inode) != CEPH_SNAPDIR &&
 		    ceph_ino(dentry->d_inode) != CEPH_INO_CEPH &&
@@ -144,13 +145,13 @@ static int __dcache_readdir(struct file
 		     dentry->d_name.len, dentry->d_name.name, di->offset,
 		     filp->f_pos, d_unhashed(dentry) ? " unhashed" : "",
 		     !dentry->d_inode ? " null" : "");
+		spin_unlock(&dentry->d_lock);
 		p = p->prev;
 		dentry = list_entry(p, struct dentry, d_u.d_child);
 		di = ceph_dentry(dentry);
 	}
 
-	spin_lock(&dentry->d_lock);
-	dentry->d_count++;
+	dget_dlock(dentry);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
Index: linux-2.6/fs/autofs4/autofs_i.h
===================================================================
--- linux-2.6.orig/fs/autofs4/autofs_i.h	2010-11-17 00:50:50.000000000 +1100
+++ linux-2.6/fs/autofs4/autofs_i.h	2010-11-17 01:05:45.000000000 +1100
@@ -254,19 +254,6 @@ static inline int simple_positive(struct
 	return dentry->d_inode && !d_unhashed(dentry);
 }
 
-static inline int __simple_empty(struct dentry *dentry)
-{
-	struct dentry *child;
-	int ret = 0;
-
-	list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child)
-		if (simple_positive(child))
-			goto out;
-	ret = 1;
-out:
-	return ret;
-}
-
 static inline void autofs4_add_expiring(struct dentry *dentry)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
Index: linux-2.6/fs/autofs4/expire.c
===================================================================
--- linux-2.6.orig/fs/autofs4/expire.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/autofs4/expire.c	2010-11-17 01:05:45.000000000 +1100
@@ -160,14 +160,18 @@ static int autofs4_tree_busy(struct vfsm
 
 	spin_lock(&dcache_lock);
 	for (p = top; p; p = next_dentry(p, top)) {
+		spin_lock(&p->d_lock);
 		/* Negative dentry - give up */
-		if (!simple_positive(p))
+		if (!simple_positive(p)) {
+			spin_unlock(&p->d_lock);
 			continue;
+		}
 
 		DPRINTK("dentry %p %.*s",
 			p, (int) p->d_name.len, p->d_name.name);
 
-		p = dget(p);
+		p = dget_dlock(p);
+		spin_unlock(&p->d_lock);
 		spin_unlock(&dcache_lock);
 
 		/*
@@ -228,14 +232,18 @@ static struct dentry *autofs4_check_leav
 
 	spin_lock(&dcache_lock);
 	for (p = parent; p; p = next_dentry(p, parent)) {
+		spin_lock(&p->d_lock);
 		/* Negative dentry - give up */
-		if (!simple_positive(p))
+		if (!simple_positive(p)) {
+			spin_unlock(&p->d_lock);
 			continue;
+		}
 
 		DPRINTK("dentry %p %.*s",
 			p, (int) p->d_name.len, p->d_name.name);
 
-		p = dget(p);
+		p = dget_dlock(p);
+		spin_unlock(&p->d_lock);
 		spin_unlock(&dcache_lock);
 
 		if (d_mountpoint(p)) {
@@ -324,12 +332,15 @@ struct dentry *autofs4_expire_indirect(s
 		struct dentry *dentry = list_entry(next, struct dentry, d_u.d_child);
 
 		/* Negative dentry - give up */
+		spin_lock(&dentry->d_lock);
 		if (!simple_positive(dentry)) {
 			next = next->next;
+			spin_unlock(&dentry->d_lock);
 			continue;
 		}
 
-		dentry = dget(dentry);
+		dentry = dget_dlock(dentry);
+		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_lock);
 
 		spin_lock(&sbi->fs_lock);



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 14/28] fs: dcache scale subdirs
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (12 preceding siblings ...)
  2010-11-16 14:09 ` [patch 13/28] fs: dcache scale d_unhashed Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-19 19:41   ` Tim Pepper
  2010-11-16 14:09 ` [patch 15/28] fs: scale inode alias list Nick Piggin
                   ` (16 subsequent siblings)
  30 siblings, 1 reply; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache-scale-d_subdirs.patch --]
[-- Type: text/plain, Size: 38929 bytes --]

Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
using dcache_lock for these anyway (eg. using i_mutex).

Note: if we change the locking rule in future so that ->d_child protection is
provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
But it would be an exception to an otherwise regular locking scheme, so we'd
have to see some good results. Probably not worthwhile.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 drivers/staging/smbfs/cache.c |    4 
 drivers/usb/core/inode.c      |    8 +
 fs/autofs4/autofs_i.h         |   11 ++
 fs/autofs4/expire.c           |  129 +++++++++++++--------------
 fs/autofs4/root.c             |   18 +++
 fs/ceph/dir.c                 |    6 +
 fs/ceph/inode.c               |    8 +
 fs/coda/cache.c               |    2 
 fs/dcache.c                   |  195 +++++++++++++++++++++++++++++++-----------
 fs/libfs.c                    |   24 +++--
 fs/ncpfs/dir.c                |    3 
 fs/ncpfs/ncplib_kernel.h      |    4 
 fs/notify/fsnotify.c          |    4 
 include/linux/dcache.h        |    1 
 kernel/cgroup.c               |   19 +++-
 security/selinux/selinuxfs.c  |   12 ++
 16 files changed, 314 insertions(+), 134 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:44.000000000 +1100
@@ -47,6 +47,8 @@
  *   - d_lru
  *   - d_count
  *   - d_unhashed()
+ *   - d_parent and d_subdirs
+ *   - childrens' d_child and d_parent
  *
  * Ordering:
  * dcache_lock
@@ -217,24 +219,22 @@ static void dentry_lru_move_tail(struct
  *
  * If this is the root of the dentry tree, return NULL.
  *
- * dcache_lock and d_lock must be held by caller, are dropped by d_kill.
+ * dcache_lock and d_lock and d_parent->d_lock must be held by caller, and
+ * are dropped by d_kill.
  */
-static struct dentry *d_kill(struct dentry *dentry)
+static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent)
 	__releases(dentry->d_lock)
+	__releases(parent->d_lock)
 	__releases(dcache_lock)
 {
-	struct dentry *parent;
-
 	list_del(&dentry->d_u.d_child);
+	if (parent)
+		spin_unlock(&parent->d_lock);
 	dentry_iput(dentry);
 	/*
 	 * dentry_iput drops the locks, at which point nobody (except
 	 * transient RCU lookups) can reach this dentry.
 	 */
-	if (IS_ROOT(dentry))
-		parent = NULL;
-	else
-		parent = dentry->d_parent;
 	d_free(dentry);
 	return parent;
 }
@@ -270,6 +270,7 @@ static struct dentry *d_kill(struct dent
 
 void dput(struct dentry *dentry)
 {
+	struct dentry *parent;
 	if (!dentry)
 		return;
 
@@ -277,6 +278,10 @@ void dput(struct dentry *dentry)
 	if (dentry->d_count == 1)
 		might_sleep();
 	spin_lock(&dentry->d_lock);
+	if (IS_ROOT(dentry))
+		parent = NULL;
+	else
+		parent = dentry->d_parent;
 	if (dentry->d_count == 1) {
 		if (!spin_trylock(&dcache_lock)) {
 			/*
@@ -288,10 +293,17 @@ void dput(struct dentry *dentry)
 			spin_unlock(&dentry->d_lock);
 			goto repeat;
 		}
+		if (parent && !spin_trylock(&parent->d_lock)) {
+			spin_unlock(&dentry->d_lock);
+			spin_unlock(&dcache_lock);
+			goto repeat;
+		}
 	}
 	dentry->d_count--;
 	if (dentry->d_count) {
 		spin_unlock(&dentry->d_lock);
+		if (parent)
+			spin_unlock(&parent->d_lock);
 		spin_unlock(&dcache_lock);
 		return;
 	}
@@ -313,6 +325,8 @@ void dput(struct dentry *dentry)
 	dentry_lru_add(dentry);
 
  	spin_unlock(&dentry->d_lock);
+	if (parent)
+		spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 	return;
 
@@ -321,7 +335,7 @@ void dput(struct dentry *dentry)
 kill_it:
 	/* if dentry was on the d_lru list delete it from there */
 	dentry_lru_del(dentry);
-	dentry = d_kill(dentry);
+	dentry = d_kill(dentry, parent);
 	if (dentry)
 		goto repeat;
 }
@@ -547,12 +561,13 @@ EXPORT_SYMBOL(d_prune_aliases);
  * quadratic behavior of shrink_dcache_parent(), but is also expected
  * to be beneficial in reducing dentry cache fragmentation.
  */
-static void prune_one_dentry(struct dentry * dentry)
+static void prune_one_dentry(struct dentry *dentry, struct dentry *parent)
 	__releases(dentry->d_lock)
+	__releases(parent->d_lock)
 	__releases(dcache_lock)
 {
 	__d_drop(dentry);
-	dentry = d_kill(dentry);
+	dentry = d_kill(dentry, parent);
 
 	/*
 	 * Prune ancestors.  Locking is simpler than in dput(),
@@ -560,9 +575,20 @@ static void prune_one_dentry(struct dent
 	 */
 	while (dentry) {
 		spin_lock(&dcache_lock);
+again:
 		spin_lock(&dentry->d_lock);
+		if (IS_ROOT(dentry))
+			parent = NULL;
+		else
+			parent = dentry->d_parent;
+		if (parent && !spin_trylock(&parent->d_lock)) {
+			spin_unlock(&dentry->d_lock);
+			goto again;
+		}
 		dentry->d_count--;
 		if (dentry->d_count) {
+			if (parent)
+				spin_unlock(&parent->d_lock);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
 			return;
@@ -570,7 +596,7 @@ static void prune_one_dentry(struct dent
 
 		dentry_lru_del(dentry);
 		__d_drop(dentry);
-		dentry = d_kill(dentry);
+		dentry = d_kill(dentry, parent);
 	}
 }
 
@@ -579,29 +605,40 @@ static void shrink_dentry_list(struct li
 	struct dentry *dentry;
 
 	while (!list_empty(list)) {
+		struct dentry *parent;
+
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 
 		if (!spin_trylock(&dentry->d_lock)) {
+relock:
 			spin_unlock(&dcache_lru_lock);
 			cpu_relax();
 			spin_lock(&dcache_lru_lock);
 			continue;
 		}
 
-		__dentry_lru_del(dentry);
-
 		/*
 		 * We found an inuse dentry which was not removed from
 		 * the LRU because of laziness during lookup.  Do not free
 		 * it - just keep it off the LRU list.
 		 */
 		if (dentry->d_count) {
+			__dentry_lru_del(dentry);
 			spin_unlock(&dentry->d_lock);
 			continue;
 		}
+		if (IS_ROOT(dentry))
+			parent = NULL;
+		else
+			parent = dentry->d_parent;
+		if (parent && !spin_trylock(&parent->d_lock)) {
+			spin_unlock(&dentry->d_lock);
+			goto relock;
+		}
+		__dentry_lru_del(dentry);
 		spin_unlock(&dcache_lru_lock);
 
-		prune_one_dentry(dentry);
+		prune_one_dentry(dentry, parent);
 		/* dcache_lock and dentry->d_lock dropped */
 		spin_lock(&dcache_lock);
 		spin_lock(&dcache_lru_lock);
@@ -796,14 +833,16 @@ static void shrink_dcache_for_umount_sub
 			/* this is a branch with children - detach all of them
 			 * from the system in one go */
 			spin_lock(&dcache_lock);
+			spin_lock(&dentry->d_lock);
 			list_for_each_entry(loop, &dentry->d_subdirs,
 					    d_u.d_child) {
-				spin_lock(&loop->d_lock);
+				spin_lock_nested(&loop->d_lock,
+						DENTRY_D_LOCK_NESTED);
 				dentry_lru_del(loop);
 				__d_drop(loop);
 				spin_unlock(&loop->d_lock);
-				cond_resched_lock(&dcache_lock);
 			}
+			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
 
 			/* move to the first child */
@@ -831,16 +870,17 @@ static void shrink_dcache_for_umount_sub
 				BUG();
 			}
 
-			if (IS_ROOT(dentry))
+			if (IS_ROOT(dentry)) {
 				parent = NULL;
-			else {
+				list_del(&dentry->d_u.d_child);
+			} else {
 				parent = dentry->d_parent;
 				spin_lock(&parent->d_lock);
 				parent->d_count--;
+				list_del(&dentry->d_u.d_child);
 				spin_unlock(&parent->d_lock);
 			}
 
-			list_del(&dentry->d_u.d_child);
 			detached++;
 
 			inode = dentry->d_inode;
@@ -921,6 +961,7 @@ int have_submounts(struct dentry *parent
 	spin_lock(&dcache_lock);
 	if (d_mountpoint(parent))
 		goto positive;
+	spin_lock(&this_parent->d_lock);
 repeat:
 	next = this_parent->d_subdirs.next;
 resume:
@@ -928,22 +969,34 @@ int have_submounts(struct dentry *parent
 		struct list_head *tmp = next;
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
+
+		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 		/* Have we found a mount point ? */
-		if (d_mountpoint(dentry))
+		if (d_mountpoint(dentry)) {
+			spin_unlock(&dentry->d_lock);
+			spin_unlock(&this_parent->d_lock);
 			goto positive;
+		}
 		if (!list_empty(&dentry->d_subdirs)) {
+			spin_unlock(&this_parent->d_lock);
+			spin_release(&dentry->d_lock.dep_map, 1, _RET_IP_);
 			this_parent = dentry;
+			spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
 			goto repeat;
 		}
+		spin_unlock(&dentry->d_lock);
 	}
 	/*
 	 * All done at this level ... ascend and resume the search.
 	 */
 	if (this_parent != parent) {
 		next = this_parent->d_u.d_child.next;
+		spin_unlock(&this_parent->d_lock);
 		this_parent = this_parent->d_parent;
+		spin_lock(&this_parent->d_lock);
 		goto resume;
 	}
+	spin_unlock(&this_parent->d_lock);
 	spin_unlock(&dcache_lock);
 	return 0; /* No mount points found in tree */
 positive:
@@ -973,6 +1026,7 @@ static int select_parent(struct dentry *
 	int found = 0;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&this_parent->d_lock);
 repeat:
 	next = this_parent->d_subdirs.next;
 resume:
@@ -980,8 +1034,9 @@ static int select_parent(struct dentry *
 		struct list_head *tmp = next;
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
+		BUG_ON(this_parent == dentry);
 
-		spin_lock(&dentry->d_lock);
+		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 
 		/* 
 		 * move only zero ref count dentries to the end 
@@ -994,33 +1049,44 @@ static int select_parent(struct dentry *
 			dentry_lru_del(dentry);
 		}
 
-		spin_unlock(&dentry->d_lock);
-
 		/*
 		 * We can return to the caller if we have found some (this
 		 * ensures forward progress). We'll be coming back to find
 		 * the rest.
 		 */
-		if (found && need_resched())
+		if (found && need_resched()) {
+			spin_unlock(&dentry->d_lock);
 			goto out;
+		}
 
 		/*
 		 * Descend a level if the d_subdirs list is non-empty.
 		 */
 		if (!list_empty(&dentry->d_subdirs)) {
+			spin_unlock(&this_parent->d_lock);
+			spin_release(&dentry->d_lock.dep_map, 1, _RET_IP_);
 			this_parent = dentry;
+			spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
 			goto repeat;
 		}
+
+		spin_unlock(&dentry->d_lock);
 	}
 	/*
 	 * All done at this level ... ascend and resume the search.
 	 */
 	if (this_parent != parent) {
+		struct dentry *tmp;
 		next = this_parent->d_u.d_child.next;
-		this_parent = this_parent->d_parent;
+		tmp = this_parent->d_parent;
+		spin_unlock(&this_parent->d_lock);
+		BUG_ON(tmp == this_parent);
+		this_parent = tmp;
+		spin_lock(&this_parent->d_lock);
 		goto resume;
 	}
 out:
+	spin_unlock(&this_parent->d_lock);
 	spin_unlock(&dcache_lock);
 	return found;
 }
@@ -1121,18 +1187,19 @@ struct dentry *d_alloc(struct dentry * p
 	INIT_LIST_HEAD(&dentry->d_lru);
 	INIT_LIST_HEAD(&dentry->d_subdirs);
 	INIT_LIST_HEAD(&dentry->d_alias);
+	INIT_LIST_HEAD(&dentry->d_u.d_child);
 
 	if (parent) {
-		dentry->d_parent = dget(parent);
+		spin_lock(&dcache_lock);
+		spin_lock(&parent->d_lock);
+		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+		dentry->d_parent = dget_dlock(parent);
 		dentry->d_sb = parent->d_sb;
-	} else {
-		INIT_LIST_HEAD(&dentry->d_u.d_child);
-	}
-
-	spin_lock(&dcache_lock);
-	if (parent)
 		list_add(&dentry->d_u.d_child, &parent->d_subdirs);
-	spin_unlock(&dcache_lock);
+		spin_unlock(&dentry->d_lock);
+		spin_unlock(&parent->d_lock);
+		spin_unlock(&dcache_lock);
+	}
 
 	percpu_counter_inc(&nr_dentry);
 
@@ -1650,13 +1717,18 @@ int d_validate(struct dentry *dentry, st
 	struct dentry *child;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dparent->d_lock);
 	list_for_each_entry(child, &dparent->d_subdirs, d_u.d_child) {
 		if (dentry == child) {
-			__dget_locked(dentry);
+			spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+			__dget_locked_dlock(dentry);
+			spin_unlock(&dentry->d_lock);
+			spin_unlock(&dparent->d_lock);
 			spin_unlock(&dcache_lock);
 			return 1;
 		}
 	}
+	spin_unlock(&dparent->d_lock);
 	spin_unlock(&dcache_lock);
 
 	return 0;
@@ -1822,15 +1894,26 @@ static void d_move_locked(struct dentry
 	/*
 	 * XXXX: do we really need to take target->d_lock?
 	 */
-	if (d_ancestor(dentry, target)) {
-		spin_lock(&dentry->d_lock);
-		spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NESTED);
-	} else if (d_ancestor(target, dentry) || target < dentry) {
-		spin_lock(&target->d_lock);
-		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-	} else {
-		spin_lock(&dentry->d_lock);
-		spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NESTED);
+	BUG_ON(d_ancestor(dentry, target));
+	BUG_ON(d_ancestor(target, dentry));
+
+	if (IS_ROOT(dentry) || dentry->d_parent == target->d_parent)
+		spin_lock(&target->d_parent->d_lock);
+	else {
+		if (d_ancestor(dentry->d_parent, target->d_parent)) {
+			spin_lock(&dentry->d_parent->d_lock);
+			spin_lock_nested(&target->d_parent->d_lock, DENTRY_D_LOCK_NESTED);
+		} else {
+			spin_lock(&target->d_parent->d_lock);
+			spin_lock_nested(&dentry->d_parent->d_lock, DENTRY_D_LOCK_NESTED);
+		}
+	}
+	if (target < dentry) {
+		spin_lock_nested(&target->d_lock, 2);
+		spin_lock_nested(&dentry->d_lock, 3);
+ 	} else {
+		spin_lock_nested(&dentry->d_lock, 2);
+		spin_lock_nested(&target->d_lock, 3);
 	}
 
 	/* Move the dentry to the target hash queue, if on different bucket */
@@ -1863,6 +1946,10 @@ static void d_move_locked(struct dentry
 	}
 
 	list_add(&dentry->d_u.d_child, &dentry->d_parent->d_subdirs);
+	if (target->d_parent != dentry->d_parent)
+		spin_unlock(&dentry->d_parent->d_lock);
+	if (target->d_parent != target)
+		spin_unlock(&target->d_parent->d_lock);
 	spin_unlock(&target->d_lock);
 	fsnotify_d_move(dentry);
 	spin_unlock(&dentry->d_lock);
@@ -1963,6 +2050,13 @@ static void __d_materialise_dentry(struc
 	dparent = dentry->d_parent;
 	aparent = anon->d_parent;
 
+	/* XXX: hack */
+	/* returns with anon->d_lock held! */
+	spin_lock(&aparent->d_lock);
+	spin_lock(&dparent->d_lock);
+	spin_lock(&dentry->d_lock);
+	spin_lock(&anon->d_lock);
+
 	dentry->d_parent = (aparent == anon) ? dentry : aparent;
 	list_del(&dentry->d_u.d_child);
 	if (!IS_ROOT(dentry))
@@ -1977,6 +2071,10 @@ static void __d_materialise_dentry(struc
 	else
 		INIT_LIST_HEAD(&anon->d_u.d_child);
 
+	spin_unlock(&dentry->d_lock);
+	spin_unlock(&dparent->d_lock);
+	spin_unlock(&aparent->d_lock);
+
 	anon->d_flags &= ~DCACHE_DISCONNECTED;
 }
 
@@ -2012,7 +2110,6 @@ struct dentry *d_materialise_unique(stru
 			/* Is this an anonymous mountpoint that we could splice
 			 * into our tree? */
 			if (IS_ROOT(alias)) {
-				spin_lock(&alias->d_lock);
 				__d_materialise_dentry(dentry, alias);
 				__d_drop(alias);
 				goto found;
@@ -2497,6 +2594,7 @@ void d_genocide(struct dentry *root)
 	struct list_head *next;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&this_parent->d_lock);
 repeat:
 	next = this_parent->d_subdirs.next;
 resume:
@@ -2510,8 +2608,10 @@ void d_genocide(struct dentry *root)
 			continue;
 		}
 		if (!list_empty(&dentry->d_subdirs)) {
-			spin_unlock(&dentry->d_lock);
+			spin_unlock(&this_parent->d_lock);
+			spin_release(&dentry->d_lock.dep_map, 1, _RET_IP_);
 			this_parent = dentry;
+			spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
 			goto repeat;
 		}
  		dentry->d_count--;
@@ -2519,12 +2619,13 @@ void d_genocide(struct dentry *root)
 	}
 	if (this_parent != root) {
 		next = this_parent->d_u.d_child.next;
-		spin_lock(&this_parent->d_lock);
 		this_parent->d_count--;
 		spin_unlock(&this_parent->d_lock);
 		this_parent = this_parent->d_parent;
+		spin_lock(&this_parent->d_lock);
 		goto resume;
 	}
+	spin_unlock(&this_parent->d_lock);
 	spin_unlock(&dcache_lock);
 }
 
Index: linux-2.6/fs/libfs.c
===================================================================
--- linux-2.6.orig/fs/libfs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/libfs.c	2010-11-17 01:05:42.000000000 +1100
@@ -81,7 +81,8 @@ int dcache_dir_close(struct inode *inode
 
 loff_t dcache_dir_lseek(struct file *file, loff_t offset, int origin)
 {
-	mutex_lock(&file->f_path.dentry->d_inode->i_mutex);
+	struct dentry *dentry = file->f_path.dentry;
+	mutex_lock(&dentry->d_inode->i_mutex);
 	switch (origin) {
 		case 1:
 			offset += file->f_pos;
@@ -89,7 +90,7 @@ loff_t dcache_dir_lseek(struct file *fil
 			if (offset >= 0)
 				break;
 		default:
-			mutex_unlock(&file->f_path.dentry->d_inode->i_mutex);
+			mutex_unlock(&dentry->d_inode->i_mutex);
 			return -EINVAL;
 	}
 	if (offset != file->f_pos) {
@@ -100,22 +101,25 @@ loff_t dcache_dir_lseek(struct file *fil
 			loff_t n = file->f_pos - 2;
 
 			spin_lock(&dcache_lock);
+			spin_lock(&dentry->d_lock);
+			/* d_lock not required for cursor */
 			list_del(&cursor->d_u.d_child);
-			p = file->f_path.dentry->d_subdirs.next;
-			while (n && p != &file->f_path.dentry->d_subdirs) {
+			p = dentry->d_subdirs.next;
+			while (n && p != &dentry->d_subdirs) {
 				struct dentry *next;
 				next = list_entry(p, struct dentry, d_u.d_child);
-				spin_lock(&next->d_lock);
+				spin_lock_nested(&next->d_lock, DENTRY_D_LOCK_NESTED);
 				if (simple_positive(next))
 					n--;
 				spin_unlock(&next->d_lock);
 				p = p->next;
 			}
 			list_add_tail(&cursor->d_u.d_child, p);
+			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
 		}
 	}
-	mutex_unlock(&file->f_path.dentry->d_inode->i_mutex);
+	mutex_unlock(&dentry->d_inode->i_mutex);
 	return offset;
 }
 
@@ -156,6 +160,7 @@ int dcache_readdir(struct file * filp, v
 			/* fallthrough */
 		default:
 			spin_lock(&dcache_lock);
+			spin_lock(&dentry->d_lock);
 			if (filp->f_pos == 2)
 				list_move(q, &dentry->d_subdirs);
 
@@ -169,6 +174,7 @@ int dcache_readdir(struct file * filp, v
 				}
 
 				spin_unlock(&next->d_lock);
+				spin_unlock(&dentry->d_lock);
 				spin_unlock(&dcache_lock);
 				if (filldir(dirent, next->d_name.name, 
 					    next->d_name.len, filp->f_pos, 
@@ -176,11 +182,15 @@ int dcache_readdir(struct file * filp, v
 					    dt_type(next->d_inode)) < 0)
 					return 0;
 				spin_lock(&dcache_lock);
+				spin_lock(&dentry->d_lock);
+				spin_lock_nested(&next->d_lock, DENTRY_D_LOCK_NESTED);
 				/* next is still alive */
 				list_move(q, p);
+				spin_unlock(&next->d_lock);
 				p = q;
 				filp->f_pos++;
 			}
+			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
 	}
 	return 0;
@@ -276,6 +286,7 @@ int simple_empty(struct dentry *dentry)
 	int ret = 0;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dentry->d_lock);
 	list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
 		spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
 		if (simple_positive(child)) {
@@ -286,6 +297,7 @@ int simple_empty(struct dentry *dentry)
 	}
 	ret = 1;
 out:
+	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 	return ret;
 }
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:44.000000000 +1100
@@ -337,6 +337,7 @@ static inline struct dentry *dget_dlock(
 	}
 	return dentry;
 }
+
 static inline struct dentry *dget(struct dentry *dentry)
 {
 	if (dentry) {
Index: linux-2.6/drivers/usb/core/inode.c
===================================================================
--- linux-2.6.orig/drivers/usb/core/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/drivers/usb/core/inode.c	2010-11-17 01:05:42.000000000 +1100
@@ -345,18 +345,20 @@ static int usbfs_empty (struct dentry *d
 	struct list_head *list;
 
 	spin_lock(&dcache_lock);
-
+	spin_lock(&dentry->d_lock);
 	list_for_each(list, &dentry->d_subdirs) {
 		struct dentry *de = list_entry(list, struct dentry, d_u.d_child);
-		spin_lock(&de->d_lock);
+
+		spin_lock_nested(&de->d_lock, DENTRY_D_LOCK_NESTED);
 		if (usbfs_positive(de)) {
 			spin_unlock(&de->d_lock);
+			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
 			return 0;
 		}
 		spin_unlock(&de->d_lock);
 	}
-
+	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 	return 1;
 }
Index: linux-2.6/fs/autofs4/expire.c
===================================================================
--- linux-2.6.orig/fs/autofs4/expire.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/autofs4/expire.c	2010-11-17 01:05:42.000000000 +1100
@@ -91,24 +91,64 @@ static int autofs4_mount_busy(struct vfs
 }
 
 /*
- * Calculate next entry in top down tree traversal.
- * From next_mnt in namespace.c - elegant.
+ * Calculate and dget next entry in top down tree traversal.
  */
-static struct dentry *next_dentry(struct dentry *p, struct dentry *root)
+static struct dentry *get_next_positive_dentry(struct dentry *prev,
+						struct dentry *root)
 {
-	struct list_head *next = p->d_subdirs.next;
+	struct list_head *next;
+	struct dentry *p, *ret;
+
+	if (prev == NULL)
+		return dget(prev);
 
+	spin_lock(&dcache_lock);
+relock:
+	p = prev;
+	spin_lock(&p->d_lock);
+again:
+	next = p->d_subdirs.next;
 	if (next == &p->d_subdirs) {
 		while (1) {
-			if (p == root)
+			struct dentry *parent;
+
+			if (p == root) {
+				spin_unlock(&p->d_lock);
+				spin_unlock(&dcache_lock);
+				dput(prev);
 				return NULL;
+			}
+
+			parent = p->d_parent;
+			if (!spin_trylock(&parent->d_lock)) {
+				spin_unlock(&p->d_lock);
+				cpu_relax();
+				goto relock;
+			}
+			spin_unlock(&p->d_lock);
 			next = p->d_u.d_child.next;
-			if (next != &p->d_parent->d_subdirs)
+			p = parent;
+			if (next != &parent->d_subdirs)
 				break;
-			p = p->d_parent;
 		}
 	}
-	return list_entry(next, struct dentry, d_u.d_child);
+	ret = list_entry(next, struct dentry, d_u.d_child);
+
+	spin_lock_nested(&ret->d_lock, DENTRY_D_LOCK_NESTED);
+	/* Negative dentry - try next */
+	if (!simple_positive(ret)) {
+		spin_unlock(&ret->d_lock);
+		p = ret;
+		goto again;
+	}
+	dget_dlock(ret);
+	spin_unlock(&ret->d_lock);
+	spin_unlock(&p->d_lock);
+	spin_unlock(&dcache_lock);
+
+	dput(prev);
+
+	return ret;
 }
 
 /*
@@ -158,22 +198,11 @@ static int autofs4_tree_busy(struct vfsm
 	if (!simple_positive(top))
 		return 1;
 
-	spin_lock(&dcache_lock);
-	for (p = top; p; p = next_dentry(p, top)) {
-		spin_lock(&p->d_lock);
-		/* Negative dentry - give up */
-		if (!simple_positive(p)) {
-			spin_unlock(&p->d_lock);
-			continue;
-		}
-
+	p = NULL;
+	while ((p = get_next_positive_dentry(p, top))) {
 		DPRINTK("dentry %p %.*s",
 			p, (int) p->d_name.len, p->d_name.name);
 
-		p = dget_dlock(p);
-		spin_unlock(&p->d_lock);
-		spin_unlock(&dcache_lock);
-
 		/*
 		 * Is someone visiting anywhere in the subtree ?
 		 * If there's no mount we need to check the usage
@@ -208,10 +237,7 @@ static int autofs4_tree_busy(struct vfsm
 				return 1;
 			}
 		}
-		dput(p);
-		spin_lock(&dcache_lock);
 	}
-	spin_unlock(&dcache_lock);
 
 	/* Timeout of a tree mount is ultimately determined by its top dentry */
 	if (!autofs4_can_expire(top, timeout, do_now))
@@ -230,36 +256,21 @@ static struct dentry *autofs4_check_leav
 	DPRINTK("parent %p %.*s",
 		parent, (int)parent->d_name.len, parent->d_name.name);
 
-	spin_lock(&dcache_lock);
-	for (p = parent; p; p = next_dentry(p, parent)) {
-		spin_lock(&p->d_lock);
-		/* Negative dentry - give up */
-		if (!simple_positive(p)) {
-			spin_unlock(&p->d_lock);
-			continue;
-		}
-
+	p = NULL;
+	while ((p = get_next_positive_dentry(p, parent))) {
 		DPRINTK("dentry %p %.*s",
 			p, (int) p->d_name.len, p->d_name.name);
 
-		p = dget_dlock(p);
-		spin_unlock(&p->d_lock);
-		spin_unlock(&dcache_lock);
-
 		if (d_mountpoint(p)) {
 			/* Can we umount this guy */
 			if (autofs4_mount_busy(mnt, p))
-				goto cont;
+				continue;
 
 			/* Can we expire this guy */
 			if (autofs4_can_expire(p, timeout, do_now))
 				return p;
 		}
-cont:
-		dput(p);
-		spin_lock(&dcache_lock);
 	}
-	spin_unlock(&dcache_lock);
 	return NULL;
 }
 
@@ -310,8 +321,8 @@ struct dentry *autofs4_expire_indirect(s
 {
 	unsigned long timeout;
 	struct dentry *root = sb->s_root;
+	struct dentry *dentry;
 	struct dentry *expired = NULL;
-	struct list_head *next;
 	int do_now = how & AUTOFS_EXP_IMMEDIATE;
 	int exp_leaves = how & AUTOFS_EXP_LEAVES;
 	struct autofs_info *ino;
@@ -323,26 +334,8 @@ struct dentry *autofs4_expire_indirect(s
 	now = jiffies;
 	timeout = sbi->exp_timeout;
 
-	spin_lock(&dcache_lock);
-	next = root->d_subdirs.next;
-
-	/* On exit from the loop expire is set to a dgot dentry
-	 * to expire or it's NULL */
-	while ( next != &root->d_subdirs ) {
-		struct dentry *dentry = list_entry(next, struct dentry, d_u.d_child);
-
-		/* Negative dentry - give up */
-		spin_lock(&dentry->d_lock);
-		if (!simple_positive(dentry)) {
-			next = next->next;
-			spin_unlock(&dentry->d_lock);
-			continue;
-		}
-
-		dentry = dget_dlock(dentry);
-		spin_unlock(&dentry->d_lock);
-		spin_unlock(&dcache_lock);
-
+	dentry = NULL;
+	while ((dentry = get_next_positive_dentry(dentry, root))) {
 		spin_lock(&sbi->fs_lock);
 		ino = autofs4_dentry_ino(dentry);
 
@@ -405,11 +398,7 @@ struct dentry *autofs4_expire_indirect(s
 		}
 next:
 		spin_unlock(&sbi->fs_lock);
-		dput(dentry);
-		spin_lock(&dcache_lock);
-		next = next->next;
 	}
-	spin_unlock(&dcache_lock);
 	return NULL;
 
 found:
@@ -420,7 +409,11 @@ struct dentry *autofs4_expire_indirect(s
 	init_completion(&ino->expire_complete);
 	spin_unlock(&sbi->fs_lock);
 	spin_lock(&dcache_lock);
-	list_move(&expired->d_parent->d_subdirs, &expired->d_u.d_child);
+	spin_lock(&expired->d_parent->d_lock);
+	spin_lock_nested(&expired->d_lock, DENTRY_D_LOCK_NESTED);
+  	list_move(&expired->d_parent->d_subdirs, &expired->d_u.d_child);
+	spin_unlock(&expired->d_lock);
+	spin_unlock(&expired->d_parent->d_lock);
 	spin_unlock(&dcache_lock);
 	return expired;
 }
Index: linux-2.6/fs/autofs4/root.c
===================================================================
--- linux-2.6.orig/fs/autofs4/root.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/autofs4/root.c	2010-11-17 01:05:42.000000000 +1100
@@ -143,10 +143,13 @@ static int autofs4_dir_open(struct inode
 	 * it.
 	 */
 	spin_lock(&dcache_lock);
+	spin_lock(&dentry->d_lock);
 	if (!d_mountpoint(dentry) && list_empty(&dentry->d_subdirs)) {
+		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_lock);
 		return -ENOENT;
 	}
+	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
 out:
@@ -253,7 +256,9 @@ static void *autofs4_follow_link(struct
 	lookup_type = autofs4_need_mount(nd->flags);
 	spin_lock(&sbi->fs_lock);
 	spin_lock(&dcache_lock);
+	spin_lock(&dentry->d_lock);
 	if (!(lookup_type || ino->flags & AUTOFS_INF_PENDING)) {
+		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_lock);
 		spin_unlock(&sbi->fs_lock);
 		goto follow;
@@ -266,6 +271,7 @@ static void *autofs4_follow_link(struct
 	 */
 	if (ino->flags & AUTOFS_INF_PENDING ||
 	    (!d_mountpoint(dentry) && list_empty(&dentry->d_subdirs))) {
+		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_lock);
 		spin_unlock(&sbi->fs_lock);
 
@@ -275,6 +281,7 @@ static void *autofs4_follow_link(struct
 
 		goto follow;
 	}
+	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 	spin_unlock(&sbi->fs_lock);
 follow:
@@ -347,10 +354,12 @@ static int autofs4_revalidate(struct den
 
 	/* Check for a non-mountpoint directory with no contents */
 	spin_lock(&dcache_lock);
+	spin_lock(&dentry->d_lock);
 	if (S_ISDIR(dentry->d_inode->i_mode) &&
 	    !d_mountpoint(dentry) && list_empty(&dentry->d_subdirs)) {
 		DPRINTK("dentry=%p %.*s, emptydir",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
+		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_lock);
 
 		/* The daemon never causes a mount to trigger */
@@ -367,6 +376,7 @@ static int autofs4_revalidate(struct den
 
 		return status;
 	}
+	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 
 	return 1;
@@ -776,12 +786,16 @@ static int autofs4_dir_rmdir(struct inod
 		return -EACCES;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&sbi->lookup_lock);
+	spin_lock(&dentry->d_lock);
 	if (!list_empty(&dentry->d_subdirs)) {
+		spin_unlock(&dentry->d_lock);
+		spin_unlock(&sbi->lookup_lock);
 		spin_unlock(&dcache_lock);
 		return -ENOTEMPTY;
 	}
-	autofs4_add_expiring(dentry);
-	spin_lock(&dentry->d_lock);
+	__autofs4_add_expiring(dentry);
+	spin_unlock(&sbi->lookup_lock);
 	__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
Index: linux-2.6/fs/coda/cache.c
===================================================================
--- linux-2.6.orig/fs/coda/cache.c	2010-11-17 00:50:50.000000000 +1100
+++ linux-2.6/fs/coda/cache.c	2010-11-17 01:05:42.000000000 +1100
@@ -94,6 +94,7 @@ static void coda_flag_children(struct de
 	struct dentry *de;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&parent->d_lock);
 	list_for_each(child, &parent->d_subdirs)
 	{
 		de = list_entry(child, struct dentry, d_u.d_child);
@@ -102,6 +103,7 @@ static void coda_flag_children(struct de
 			continue;
 		coda_flag_inode(de->d_inode, flag);
 	}
+	spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 	return; 
 }
Index: linux-2.6/fs/ncpfs/dir.c
===================================================================
--- linux-2.6.orig/fs/ncpfs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ncpfs/dir.c	2010-11-17 01:05:42.000000000 +1100
@@ -395,6 +395,7 @@ ncp_dget_fpos(struct dentry *dentry, str
 
 	/* If a pointer is invalid, we search the dentry. */
 	spin_lock(&dcache_lock);
+	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
 		dent = list_entry(next, struct dentry, d_u.d_child);
@@ -403,11 +404,13 @@ ncp_dget_fpos(struct dentry *dentry, str
 				dget_locked(dent);
 			else
 				dent = NULL;
+			spin_unlock(&parent->d_lock);
 			spin_unlock(&dcache_lock);
 			goto out;
 		}
 		next = next->next;
 	}
+	spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 	return NULL;
 
Index: linux-2.6/fs/ncpfs/ncplib_kernel.h
===================================================================
--- linux-2.6.orig/fs/ncpfs/ncplib_kernel.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ncpfs/ncplib_kernel.h	2010-11-17 01:05:42.000000000 +1100
@@ -194,6 +194,7 @@ ncp_renew_dentries(struct dentry *parent
 	struct dentry *dentry;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
 		dentry = list_entry(next, struct dentry, d_u.d_child);
@@ -205,6 +206,7 @@ ncp_renew_dentries(struct dentry *parent
 
 		next = next->next;
 	}
+	spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 }
 
@@ -216,6 +218,7 @@ ncp_invalidate_dircache_entries(struct d
 	struct dentry *dentry;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
 		dentry = list_entry(next, struct dentry, d_u.d_child);
@@ -223,6 +226,7 @@ ncp_invalidate_dircache_entries(struct d
 		ncp_age_dentry(server, dentry);
 		next = next->next;
 	}
+	spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 }
 
Index: linux-2.6/kernel/cgroup.c
===================================================================
--- linux-2.6.orig/kernel/cgroup.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/kernel/cgroup.c	2010-11-17 01:05:42.000000000 +1100
@@ -875,23 +875,31 @@ static void cgroup_clear_directory(struc
 
 	BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
 	spin_lock(&dcache_lock);
+	spin_lock(&dentry->d_lock);
 	node = dentry->d_subdirs.next;
 	while (node != &dentry->d_subdirs) {
 		struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
+
+		spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
 		list_del_init(node);
 		if (d->d_inode) {
 			/* This should never be called on a cgroup
 			 * directory with child cgroups */
 			BUG_ON(d->d_inode->i_mode & S_IFDIR);
-			d = dget_locked(d);
+			dget_locked_dlock(d);
+			spin_unlock(&d->d_lock);
+			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
 			d_delete(d);
 			simple_unlink(dentry->d_inode, d);
 			dput(d);
 			spin_lock(&dcache_lock);
-		}
+			spin_lock(&dentry->d_lock);
+		} else
+			spin_unlock(&d->d_lock);
 		node = dentry->d_subdirs.next;
 	}
+	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_lock);
 }
 
@@ -900,10 +908,17 @@ static void cgroup_clear_directory(struc
  */
 static void cgroup_d_remove_dir(struct dentry *dentry)
 {
+	struct dentry *parent;
+
 	cgroup_clear_directory(dentry);
 
 	spin_lock(&dcache_lock);
+	parent = dentry->d_parent;
+	spin_lock(&parent->d_lock);
+	spin_lock(&dentry->d_lock);
 	list_del_init(&dentry->d_u.d_child);
+	spin_unlock(&dentry->d_lock);
+	spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 	remove_dir(dentry);
 }
Index: linux-2.6/security/selinux/selinuxfs.c
===================================================================
--- linux-2.6.orig/security/selinux/selinuxfs.c	2010-11-17 00:50:50.000000000 +1100
+++ linux-2.6/security/selinux/selinuxfs.c	2010-11-17 01:05:42.000000000 +1100
@@ -1146,22 +1146,30 @@ static void sel_remove_entries(struct de
 	struct list_head *node;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&de->d_lock);
 	node = de->d_subdirs.next;
 	while (node != &de->d_subdirs) {
 		struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
+
+		spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
 		list_del_init(node);
 
 		if (d->d_inode) {
-			d = dget_locked(d);
+			dget_locked_dlock(d);
+			spin_unlock(&de->d_lock);
+			spin_unlock(&d->d_lock);
 			spin_unlock(&dcache_lock);
 			d_delete(d);
 			simple_unlink(de->d_inode, d);
 			dput(d);
 			spin_lock(&dcache_lock);
-		}
+			spin_lock(&de->d_lock);
+		} else
+			spin_unlock(&d->d_lock);
 		node = de->d_subdirs.next;
 	}
 
+	spin_unlock(&de->d_lock);
 	spin_unlock(&dcache_lock);
 }
 
Index: linux-2.6/fs/notify/fsnotify.c
===================================================================
--- linux-2.6.orig/fs/notify/fsnotify.c	2010-11-17 00:50:50.000000000 +1100
+++ linux-2.6/fs/notify/fsnotify.c	2010-11-17 01:05:44.000000000 +1100
@@ -68,17 +68,19 @@ void __fsnotify_update_child_dentry_flag
 		/* run all of the children of the original inode and fix their
 		 * d_flags to indicate parental interest (their parent is the
 		 * original inode) */
+		spin_lock(&alias->d_lock);
 		list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
 			if (!child->d_inode)
 				continue;
 
-			spin_lock(&child->d_lock);
+			spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
 			if (watched)
 				child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
 			else
 				child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
 			spin_unlock(&child->d_lock);
 		}
+		spin_unlock(&alias->d_lock);
 	}
 	spin_unlock(&dcache_lock);
 }
Index: linux-2.6/fs/ceph/dir.c
===================================================================
--- linux-2.6.orig/fs/ceph/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ceph/dir.c	2010-11-17 01:05:42.000000000 +1100
@@ -112,6 +112,7 @@ static int __dcache_readdir(struct file
 	     last);
 
 	spin_lock(&dcache_lock);
+	spin_lock(&parent->d_lock);
 
 	/* start at beginning? */
 	if (filp->f_pos == 2 || (last &&
@@ -135,7 +136,7 @@ static int __dcache_readdir(struct file
 			fi->at_end = 1;
 			goto out_unlock;
 		}
-		spin_lock(&dentry->d_lock);
+		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 		if (!d_unhashed(dentry) && dentry->d_inode &&
 		    ceph_snap(dentry->d_inode) != CEPH_SNAPDIR &&
 		    ceph_ino(dentry->d_inode) != CEPH_INO_CEPH &&
@@ -153,6 +154,7 @@ static int __dcache_readdir(struct file
 
 	dget_dlock(dentry);
 	spin_unlock(&dentry->d_lock);
+	spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 
 	dout(" %llu (%llu) dentry %p %.*s %p\n", di->offset, filp->f_pos,
@@ -187,10 +189,12 @@ static int __dcache_readdir(struct file
 	}
 
 	spin_lock(&dcache_lock);
+	spin_lock(&parent->d_lock);
 	p = p->prev;	/* advance to next dentry */
 	goto more;
 
 out_unlock:
+	spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 out:
 	if (last)
Index: linux-2.6/fs/ceph/inode.c
===================================================================
--- linux-2.6.orig/fs/ceph/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ceph/inode.c	2010-11-17 01:05:42.000000000 +1100
@@ -829,11 +829,13 @@ static void ceph_set_dentry_offset(struc
 	spin_unlock(&inode->i_lock);
 
 	spin_lock(&dcache_lock);
-	spin_lock(&dn->d_lock);
+	spin_lock(&dir->d_lock);
+	spin_lock_nested(&dn->d_lock, DENTRY_D_LOCK_NESTED);
 	list_move(&dn->d_u.d_child, &dir->d_subdirs);
 	dout("set_dentry_offset %p %lld (%p %p)\n", dn, di->offset,
 	     dn->d_u.d_child.prev, dn->d_u.d_child.next);
 	spin_unlock(&dn->d_lock);
+	spin_unlock(&dir->d_lock);
 	spin_unlock(&dcache_lock);
 }
 
@@ -1218,9 +1220,11 @@ int ceph_readdir_prepopulate(struct ceph
 		} else {
 			/* reorder parent's d_subdirs */
 			spin_lock(&dcache_lock);
-			spin_lock(&dn->d_lock);
+			spin_lock(&parent->d_lock);
+			spin_lock_nested(&dn->d_lock, DENTRY_D_LOCK_NESTED);
 			list_move(&dn->d_u.d_child, &parent->d_subdirs);
 			spin_unlock(&dn->d_lock);
+			spin_unlock(&parent->d_lock);
 			spin_unlock(&dcache_lock);
 		}
 
Index: linux-2.6/fs/autofs4/autofs_i.h
===================================================================
--- linux-2.6.orig/fs/autofs4/autofs_i.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/autofs4/autofs_i.h	2010-11-17 01:05:42.000000000 +1100
@@ -254,6 +254,17 @@ static inline int simple_positive(struct
 	return dentry->d_inode && !d_unhashed(dentry);
 }
 
+static inline void __autofs4_add_expiring(struct dentry *dentry)
+{
+	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
+	struct autofs_info *ino = autofs4_dentry_ino(dentry);
+	if (ino) {
+		if (list_empty(&ino->expiring))
+			list_add(&ino->expiring, &sbi->expiring_list);
+	}
+	return;
+}
+
 static inline void autofs4_add_expiring(struct dentry *dentry)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
Index: linux-2.6/drivers/staging/smbfs/cache.c
===================================================================
--- linux-2.6.orig/drivers/staging/smbfs/cache.c	2010-11-17 00:50:50.000000000 +1100
+++ linux-2.6/drivers/staging/smbfs/cache.c	2010-11-17 01:05:42.000000000 +1100
@@ -63,6 +63,7 @@ smb_invalidate_dircache_entries(struct d
 	struct dentry *dentry;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
 		dentry = list_entry(next, struct dentry, d_u.d_child);
@@ -70,6 +71,7 @@ smb_invalidate_dircache_entries(struct d
 		smb_age_dentry(server, dentry);
 		next = next->next;
 	}
+	spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 }
 
@@ -97,6 +99,7 @@ smb_dget_fpos(struct dentry *dentry, str
 
 	/* If a pointer is invalid, we search the dentry. */
 	spin_lock(&dcache_lock);
+	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
 		dent = list_entry(next, struct dentry, d_u.d_child);
@@ -111,6 +114,7 @@ smb_dget_fpos(struct dentry *dentry, str
 	}
 	dent = NULL;
 out_unlock:
+	spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_lock);
 	return dent;
 }



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 15/28] fs: scale inode alias list
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (13 preceding siblings ...)
  2010-11-16 14:09 ` [patch 14/28] fs: dcache scale subdirs Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-19 19:41   ` Tim Pepper
  2010-11-16 14:09 ` [patch 16/28] fs: Use rename lock and RCU for multi-step operations Nick Piggin
                   ` (15 subsequent siblings)
  30 siblings, 1 reply; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache-scale-i_dentry.patch --]
[-- Type: text/plain, Size: 16461 bytes --]

Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
from concurrent modification. d_alias is also protected by d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/9p/vfs_inode.c      |    2 +
 fs/affs/amigaffs.c     |    2 +
 fs/cifs/inode.c        |    3 ++
 fs/dcache.c            |   66 +++++++++++++++++++++++++++++++++++++++++++------
 fs/exportfs/expfs.c    |    4 ++
 fs/nfs/getroot.c       |    4 ++
 fs/notify/fsnotify.c   |    2 +
 fs/ocfs2/dcache.c      |    3 +-
 include/linux/dcache.h |    1 
 9 files changed, 78 insertions(+), 9 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:43.000000000 +1100
@@ -37,6 +37,8 @@
 
 /*
  * Usage:
+ * dcache_inode_lock protects:
+ *   - i_dentry, d_alias, d_inode
  * dcache_hash_lock protects:
  *   - the dcache hash table
  * dcache_lru_lock protects:
@@ -49,12 +51,14 @@
  *   - d_unhashed()
  *   - d_parent and d_subdirs
  *   - childrens' d_child and d_parent
+ *   - d_alias, d_inode
  *
  * Ordering:
  * dcache_lock
- *   dentry->d_lock
- *     dcache_lru_lock
- *     dcache_hash_lock
+ *   dcache_inode_lock
+ *     dentry->d_lock
+ *       dcache_lru_lock
+ *       dcache_hash_lock
  *
  * If there is an ancestor relationship:
  * dentry->d_parent->...->d_parent->d_lock
@@ -70,11 +74,13 @@
 int sysctl_vfs_cache_pressure __read_mostly = 100;
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
 
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_inode_lock);
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
 
+EXPORT_SYMBOL(dcache_inode_lock);
 EXPORT_SYMBOL(dcache_hash_lock);
 EXPORT_SYMBOL(dcache_lock);
 
@@ -148,6 +154,7 @@ static void d_free(struct dentry *dentry
  */
 static void dentry_iput(struct dentry * dentry)
 	__releases(dentry->d_lock)
+	__releases(dcache_inode_lock)
 	__releases(dcache_lock)
 {
 	struct inode *inode = dentry->d_inode;
@@ -155,6 +162,7 @@ static void dentry_iput(struct dentry *
 		dentry->d_inode = NULL;
 		list_del_init(&dentry->d_alias);
 		spin_unlock(&dentry->d_lock);
+		spin_unlock(&dcache_inode_lock);
 		spin_unlock(&dcache_lock);
 		if (!inode->i_nlink)
 			fsnotify_inoderemove(inode);
@@ -164,6 +172,7 @@ static void dentry_iput(struct dentry *
 			iput(inode);
 	} else {
 		spin_unlock(&dentry->d_lock);
+		spin_unlock(&dcache_inode_lock);
 		spin_unlock(&dcache_lock);
 	}
 }
@@ -225,6 +234,7 @@ static void dentry_lru_move_tail(struct
 static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent)
 	__releases(dentry->d_lock)
 	__releases(parent->d_lock)
+	__releases(dcache_inode_lock)
 	__releases(dcache_lock)
 {
 	list_del(&dentry->d_u.d_child);
@@ -290,13 +300,18 @@ void dput(struct dentry *dentry)
 			 * want to reduce dcache_lock anyway so this will
 			 * get improved.
 			 */
+drop1:
 			spin_unlock(&dentry->d_lock);
 			goto repeat;
 		}
-		if (parent && !spin_trylock(&parent->d_lock)) {
-			spin_unlock(&dentry->d_lock);
+		if (!spin_trylock(&dcache_inode_lock)) {
+drop2:
 			spin_unlock(&dcache_lock);
-			goto repeat;
+			goto drop1;
+		}
+		if (parent && !spin_trylock(&parent->d_lock)) {
+			spin_unlock(&dcache_inode_lock);
+			goto drop2;
 		}
 	}
 	dentry->d_count--;
@@ -327,6 +342,7 @@ void dput(struct dentry *dentry)
  	spin_unlock(&dentry->d_lock);
 	if (parent)
 		spin_unlock(&parent->d_lock);
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 	return;
 
@@ -521,7 +537,9 @@ struct dentry *d_find_alias(struct inode
 
 	if (!list_empty(&inode->i_dentry)) {
 		spin_lock(&dcache_lock);
+		spin_lock(&dcache_inode_lock);
 		de = __d_find_alias(inode, 0);
+		spin_unlock(&dcache_inode_lock);
 		spin_unlock(&dcache_lock);
 	}
 	return de;
@@ -537,18 +555,21 @@ void d_prune_aliases(struct inode *inode
 	struct dentry *dentry;
 restart:
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
 		spin_lock(&dentry->d_lock);
 		if (!dentry->d_count) {
 			__dget_locked_dlock(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
+			spin_unlock(&dcache_inode_lock);
 			spin_unlock(&dcache_lock);
 			dput(dentry);
 			goto restart;
 		}
 		spin_unlock(&dentry->d_lock);
 	}
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 }
 EXPORT_SYMBOL(d_prune_aliases);
@@ -564,6 +585,7 @@ EXPORT_SYMBOL(d_prune_aliases);
 static void prune_one_dentry(struct dentry *dentry, struct dentry *parent)
 	__releases(dentry->d_lock)
 	__releases(parent->d_lock)
+	__releases(dcache_inode_lock)
 	__releases(dcache_lock)
 {
 	__d_drop(dentry);
@@ -575,6 +597,7 @@ static void prune_one_dentry(struct dent
 	 */
 	while (dentry) {
 		spin_lock(&dcache_lock);
+		spin_lock(&dcache_inode_lock);
 again:
 		spin_lock(&dentry->d_lock);
 		if (IS_ROOT(dentry))
@@ -590,6 +613,7 @@ static void prune_one_dentry(struct dent
 			if (parent)
 				spin_unlock(&parent->d_lock);
 			spin_unlock(&dentry->d_lock);
+			spin_unlock(&dcache_inode_lock);
 			spin_unlock(&dcache_lock);
 			return;
 		}
@@ -639,8 +663,9 @@ static void shrink_dentry_list(struct li
 		spin_unlock(&dcache_lru_lock);
 
 		prune_one_dentry(dentry, parent);
-		/* dcache_lock and dentry->d_lock dropped */
+		/* dcache_lock, dcache_inode_lock and dentry->d_lock dropped */
 		spin_lock(&dcache_lock);
+		spin_lock(&dcache_inode_lock);
 		spin_lock(&dcache_lru_lock);
 	}
 }
@@ -662,6 +687,7 @@ static void __shrink_dcache_sb(struct su
 	int cnt = *count;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 relock:
 	spin_lock(&dcache_lru_lock);
 	while (!list_empty(&sb->s_dentry_lru)) {
@@ -700,8 +726,8 @@ static void __shrink_dcache_sb(struct su
 	if (!list_empty(&referenced))
 		list_splice(&referenced, &sb->s_dentry_lru);
 	spin_unlock(&dcache_lru_lock);
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
-
 }
 
 /**
@@ -795,12 +821,14 @@ void shrink_dcache_sb(struct super_block
 	LIST_HEAD(tmp);
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	spin_lock(&dcache_lru_lock);
 	while (!list_empty(&sb->s_dentry_lru)) {
 		list_splice_init(&sb->s_dentry_lru, &tmp);
 		shrink_dentry_list(&tmp);
 	}
 	spin_unlock(&dcache_lru_lock);
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
@@ -1221,9 +1249,11 @@ EXPORT_SYMBOL(d_alloc_name);
 /* the caller must hold dcache_lock */
 static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 {
+	spin_lock(&dentry->d_lock);
 	if (inode)
 		list_add(&dentry->d_alias, &inode->i_dentry);
 	dentry->d_inode = inode;
+	spin_unlock(&dentry->d_lock);
 	fsnotify_d_instantiate(dentry, inode);
 }
 
@@ -1246,7 +1276,9 @@ void d_instantiate(struct dentry *entry,
 {
 	BUG_ON(!list_empty(&entry->d_alias));
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	__d_instantiate(entry, inode);
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 	security_d_instantiate(entry, inode);
 }
@@ -1307,7 +1339,9 @@ struct dentry *d_instantiate_unique(stru
 	BUG_ON(!list_empty(&entry->d_alias));
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	result = __d_instantiate_unique(entry, inode);
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 
 	if (!result) {
@@ -1398,8 +1432,10 @@ struct dentry *d_obtain_alias(struct ino
 	tmp->d_parent = tmp; /* make sure dput doesn't croak */
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	res = __d_find_alias(inode, 0);
 	if (res) {
+		spin_unlock(&dcache_inode_lock);
 		spin_unlock(&dcache_lock);
 		dput(tmp);
 		goto out_iput;
@@ -1416,6 +1452,7 @@ struct dentry *d_obtain_alias(struct ino
 	hlist_add_head(&tmp->d_hash, &inode->i_sb->s_anon);
 	spin_unlock(&dcache_hash_lock);
 	spin_unlock(&tmp->d_lock);
+	spin_unlock(&dcache_inode_lock);
 
 	spin_unlock(&dcache_lock);
 	return tmp;
@@ -1448,9 +1485,11 @@ struct dentry *d_splice_alias(struct ino
 
 	if (inode && S_ISDIR(inode->i_mode)) {
 		spin_lock(&dcache_lock);
+		spin_lock(&dcache_inode_lock);
 		new = __d_find_alias(inode, 1);
 		if (new) {
 			BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
+			spin_unlock(&dcache_inode_lock);
 			spin_unlock(&dcache_lock);
 			security_d_instantiate(new, inode);
 			d_move(new, dentry);
@@ -1458,6 +1497,7 @@ struct dentry *d_splice_alias(struct ino
 		} else {
 			/* already taking dcache_lock, so d_add() by hand */
 			__d_instantiate(dentry, inode);
+			spin_unlock(&dcache_inode_lock);
 			spin_unlock(&dcache_lock);
 			security_d_instantiate(dentry, inode);
 			d_rehash(dentry);
@@ -1532,8 +1572,10 @@ struct dentry *d_add_ci(struct dentry *d
 	 * already has a dentry.
 	 */
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	if (!S_ISDIR(inode->i_mode) || list_empty(&inode->i_dentry)) {
 		__d_instantiate(found, inode);
+		spin_unlock(&dcache_inode_lock);
 		spin_unlock(&dcache_lock);
 		security_d_instantiate(found, inode);
 		return found;
@@ -1545,6 +1587,7 @@ struct dentry *d_add_ci(struct dentry *d
 	 */
 	new = list_entry(inode->i_dentry.next, struct dentry, d_alias);
 	dget_locked(new);
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 	security_d_instantiate(found, inode);
 	d_move(new, found);
@@ -1763,6 +1806,7 @@ void d_delete(struct dentry * dentry)
 	 * Are we the only user?
 	 */
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	spin_lock(&dentry->d_lock);
 	isdir = S_ISDIR(dentry->d_inode->i_mode);
 	if (dentry->d_count == 1) {
@@ -1776,6 +1820,7 @@ void d_delete(struct dentry * dentry)
 		__d_drop(dentry);
 
 	spin_unlock(&dentry->d_lock);
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 
 	fsnotify_nameremove(dentry, isdir);
@@ -2003,6 +2048,7 @@ struct dentry *d_ancestor(struct dentry
  */
 static struct dentry *__d_unalias(struct dentry *dentry, struct dentry *alias)
 	__releases(dcache_lock)
+	__releases(dcache_inode_lock)
 {
 	struct mutex *m1 = NULL, *m2 = NULL;
 	struct dentry *ret;
@@ -2028,6 +2074,7 @@ static struct dentry *__d_unalias(struct
 	d_move_locked(alias, dentry);
 	ret = alias;
 out_err:
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 	if (m2)
 		mutex_unlock(m2);
@@ -2093,6 +2140,7 @@ struct dentry *d_materialise_unique(stru
 	BUG_ON(!d_unhashed(dentry));
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 
 	if (!inode) {
 		actual = dentry;
@@ -2136,6 +2184,7 @@ struct dentry *d_materialise_unique(stru
 	_d_rehash(actual);
 	spin_unlock(&dcache_hash_lock);
 	spin_unlock(&actual->d_lock);
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 out_nolock:
 	if (actual == dentry) {
@@ -2147,6 +2196,7 @@ struct dentry *d_materialise_unique(stru
 	return actual;
 
 shouldnt_be_hashed:
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 	BUG();
 }
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:43.000000000 +1100
@@ -181,6 +181,7 @@ struct dentry_operations {
 
 #define DCACHE_CANT_MOUNT	0x0100
 
+extern spinlock_t dcache_inode_lock;
 extern spinlock_t dcache_hash_lock;
 extern spinlock_t dcache_lock;
 extern seqlock_t rename_lock;
Index: linux-2.6/fs/exportfs/expfs.c
===================================================================
--- linux-2.6.orig/fs/exportfs/expfs.c	2010-11-17 00:50:49.000000000 +1100
+++ linux-2.6/fs/exportfs/expfs.c	2010-11-17 01:05:42.000000000 +1100
@@ -48,8 +48,10 @@ find_acceptable_alias(struct dentry *res
 		return result;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	list_for_each_entry(dentry, &result->d_inode->i_dentry, d_alias) {
 		dget_locked(dentry);
+		spin_unlock(&dcache_inode_lock);
 		spin_unlock(&dcache_lock);
 		if (toput)
 			dput(toput);
@@ -58,8 +60,10 @@ find_acceptable_alias(struct dentry *res
 			return dentry;
 		}
 		spin_lock(&dcache_lock);
+		spin_lock(&dcache_inode_lock);
 		toput = dentry;
 	}
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 
 	if (toput)
Index: linux-2.6/fs/affs/amigaffs.c
===================================================================
--- linux-2.6.orig/fs/affs/amigaffs.c	2010-11-17 00:50:49.000000000 +1100
+++ linux-2.6/fs/affs/amigaffs.c	2010-11-17 01:05:42.000000000 +1100
@@ -129,6 +129,7 @@ affs_fix_dcache(struct dentry *dentry, u
 	struct list_head *head, *next;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	head = &inode->i_dentry;
 	next = head->next;
 	while (next != head) {
@@ -139,6 +140,7 @@ affs_fix_dcache(struct dentry *dentry, u
 		}
 		next = next->next;
 	}
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 }
 
Index: linux-2.6/fs/ocfs2/dcache.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ocfs2/dcache.c	2010-11-17 01:05:42.000000000 +1100
@@ -170,7 +170,7 @@ struct dentry *ocfs2_find_local_alias(st
 	struct dentry *dentry = NULL;
 
 	spin_lock(&dcache_lock);
-
+	spin_lock(&dcache_inode_lock);
 	list_for_each(p, &inode->i_dentry) {
 		dentry = list_entry(p, struct dentry, d_alias);
 
@@ -188,6 +188,7 @@ struct dentry *ocfs2_find_local_alias(st
 		dentry = NULL;
 	}
 
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 
 	return dentry;
Index: linux-2.6/fs/nfs/getroot.c
===================================================================
--- linux-2.6.orig/fs/nfs/getroot.c	2010-11-17 00:50:49.000000000 +1100
+++ linux-2.6/fs/nfs/getroot.c	2010-11-17 01:05:42.000000000 +1100
@@ -64,7 +64,11 @@ static int nfs_superblock_set_dummy_root
 		 * Oops, since the test for IS_ROOT() will fail.
 		 */
 		spin_lock(&dcache_lock);
+		spin_lock(&dcache_inode_lock);
+		spin_lock(&sb->s_root->d_lock);
 		list_del_init(&sb->s_root->d_alias);
+		spin_unlock(&sb->s_root->d_lock);
+		spin_unlock(&dcache_inode_lock);
 		spin_unlock(&dcache_lock);
 	}
 	return 0;
Index: linux-2.6/fs/notify/fsnotify.c
===================================================================
--- linux-2.6.orig/fs/notify/fsnotify.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/notify/fsnotify.c	2010-11-17 01:05:42.000000000 +1100
@@ -60,6 +60,7 @@ void __fsnotify_update_child_dentry_flag
 	watched = fsnotify_inode_watches_children(inode);
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	/* run all of the dentries associated with this inode.  Since this is a
 	 * directory, there damn well better only be one item on this list */
 	list_for_each_entry(alias, &inode->i_dentry, d_alias) {
@@ -82,6 +83,7 @@ void __fsnotify_update_child_dentry_flag
 		}
 		spin_unlock(&alias->d_lock);
 	}
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 }
 
Index: linux-2.6/fs/9p/vfs_inode.c
===================================================================
--- linux-2.6.orig/fs/9p/vfs_inode.c	2010-11-17 00:50:49.000000000 +1100
+++ linux-2.6/fs/9p/vfs_inode.c	2010-11-17 01:05:42.000000000 +1100
@@ -271,9 +271,11 @@ static struct dentry *v9fs_dentry_from_d
 	struct dentry *dentry;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	/* Directory should have only one entry. */
 	BUG_ON(S_ISDIR(inode->i_mode) && !list_is_singular(&inode->i_dentry));
 	dentry = list_entry(inode->i_dentry.next, struct dentry, d_alias);
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 	return dentry;
 }
Index: linux-2.6/fs/cifs/inode.c
===================================================================
--- linux-2.6.orig/fs/cifs/inode.c	2010-11-17 00:50:49.000000000 +1100
+++ linux-2.6/fs/cifs/inode.c	2010-11-17 01:05:42.000000000 +1100
@@ -805,12 +805,15 @@ inode_has_hashed_dentries(struct inode *
 	struct dentry *dentry;
 
 	spin_lock(&dcache_lock);
+	spin_lock(&dcache_inode_lock);
 	list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
 		if (!d_unhashed(dentry) || IS_ROOT(dentry)) {
+			spin_unlock(&dcache_inode_lock);
 			spin_unlock(&dcache_lock);
 			return true;
 		}
 	}
+	spin_unlock(&dcache_inode_lock);
 	spin_unlock(&dcache_lock);
 	return false;
 }



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 16/28] fs: Use rename lock and RCU for multi-step operations
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (14 preceding siblings ...)
  2010-11-16 14:09 ` [patch 15/28] fs: scale inode alias list Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-19 19:42   ` Tim Pepper
  2010-11-16 14:09 ` [patch 17/28] fs: increase d_name lock coverage Nick Piggin
                   ` (14 subsequent siblings)
  30 siblings, 1 reply; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache_lock-multi-step.patch --]
[-- Type: text/plain, Size: 15237 bytes --]

The remaining usages for dcache_lock is to allow atomic, multi-step read-side
operations over the directory tree by excluding modifications to the tree.
Also, to walk in the leaf->root direction in the tree where we don't have
a natural d_lock ordering.

This could be accomplished by taking every d_lock, but this would mean a
huge number of locks and actually gets very tricky.

Solve this instead by using the rename seqlock for multi-step read-side
operations, retry in case of a rename so we don't walk up the wrong parent.
Concurrent dentry insertions are not serialised against.  Concurrent deletes
are tricky when walking up the directory: our parent might have been deleted
when dropping locks so also need to check and retry for that.

We can also use the rename lock in cases where livelock is a worry (and it
is introduced in subsequent patch).

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 drivers/staging/pohmelfs/path_entry.c |   15 +++
 fs/autofs4/waitq.c                    |   16 +++-
 fs/dcache.c                           |  136 ++++++++++++++++++++++++++++------
 fs/nfs/namespace.c                    |   14 +++
 include/linux/dcache.h                |    1 
 5 files changed, 152 insertions(+), 30 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:43.000000000 +1100
@@ -80,6 +80,7 @@ static __cacheline_aligned_in_smp DEFINE
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
 
+EXPORT_SYMBOL(rename_lock);
 EXPORT_SYMBOL(dcache_inode_lock);
 EXPORT_SYMBOL(dcache_hash_lock);
 EXPORT_SYMBOL(dcache_lock);
@@ -237,6 +238,7 @@ static struct dentry *d_kill(struct dent
 	__releases(dcache_inode_lock)
 	__releases(dcache_lock)
 {
+	dentry->d_parent = NULL;
 	list_del(&dentry->d_u.d_child);
 	if (parent)
 		spin_unlock(&parent->d_lock);
@@ -980,11 +982,15 @@ void shrink_dcache_for_umount(struct sup
  * Return true if the parent or its subdirectories contain
  * a mount point
  */
- 
 int have_submounts(struct dentry *parent)
 {
-	struct dentry *this_parent = parent;
+	struct dentry *this_parent;
 	struct list_head *next;
+	unsigned seq;
+
+rename_retry:
+	this_parent = parent;
+	seq = read_seqbegin(&rename_lock);
 
 	spin_lock(&dcache_lock);
 	if (d_mountpoint(parent))
@@ -1018,17 +1024,37 @@ int have_submounts(struct dentry *parent
 	 * All done at this level ... ascend and resume the search.
 	 */
 	if (this_parent != parent) {
-		next = this_parent->d_u.d_child.next;
+		struct dentry *tmp;
+		struct dentry *child;
+
+		tmp = this_parent->d_parent;
+		rcu_read_lock();
 		spin_unlock(&this_parent->d_lock);
-		this_parent = this_parent->d_parent;
+		child = this_parent;
+		this_parent = tmp;
 		spin_lock(&this_parent->d_lock);
+		/* might go back up the wrong parent if we have had a rename
+		 * or deletion */
+		if (this_parent != child->d_parent ||
+				read_seqretry(&rename_lock, seq)) {
+			spin_unlock(&this_parent->d_lock);
+			spin_unlock(&dcache_lock);
+			rcu_read_unlock();
+			goto rename_retry;
+		}
+		rcu_read_unlock();
+		next = child->d_u.d_child.next;
 		goto resume;
 	}
 	spin_unlock(&this_parent->d_lock);
 	spin_unlock(&dcache_lock);
+	if (read_seqretry(&rename_lock, seq))
+		goto rename_retry;
 	return 0; /* No mount points found in tree */
 positive:
 	spin_unlock(&dcache_lock);
+	if (read_seqretry(&rename_lock, seq))
+		goto rename_retry;
 	return 1;
 }
 EXPORT_SYMBOL(have_submounts);
@@ -1049,10 +1075,15 @@ EXPORT_SYMBOL(have_submounts);
  */
 static int select_parent(struct dentry * parent)
 {
-	struct dentry *this_parent = parent;
+	struct dentry *this_parent;
 	struct list_head *next;
+	unsigned seq;
 	int found = 0;
 
+rename_retry:
+	this_parent = parent;
+	seq = read_seqbegin(&rename_lock);
+
 	spin_lock(&dcache_lock);
 	spin_lock(&this_parent->d_lock);
 repeat:
@@ -1062,7 +1093,6 @@ static int select_parent(struct dentry *
 		struct list_head *tmp = next;
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
-		BUG_ON(this_parent == dentry);
 
 		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 
@@ -1105,17 +1135,32 @@ static int select_parent(struct dentry *
 	 */
 	if (this_parent != parent) {
 		struct dentry *tmp;
-		next = this_parent->d_u.d_child.next;
+		struct dentry *child;
+
 		tmp = this_parent->d_parent;
+		rcu_read_lock();
 		spin_unlock(&this_parent->d_lock);
-		BUG_ON(tmp == this_parent);
+		child = this_parent;
 		this_parent = tmp;
 		spin_lock(&this_parent->d_lock);
+		/* might go back up the wrong parent if we have had a rename
+		 * or deletion */
+		if (this_parent != child->d_parent ||
+				read_seqretry(&rename_lock, seq)) {
+			spin_unlock(&this_parent->d_lock);
+			spin_unlock(&dcache_lock);
+			rcu_read_unlock();
+			goto rename_retry;
+		}
+		rcu_read_unlock();
+		next = child->d_u.d_child.next;
 		goto resume;
 	}
 out:
 	spin_unlock(&this_parent->d_lock);
 	spin_unlock(&dcache_lock);
+	if (read_seqretry(&rename_lock, seq))
+		goto rename_retry;
 	return found;
 }
 
@@ -1615,7 +1660,7 @@ EXPORT_SYMBOL(d_add_ci);
 struct dentry * d_lookup(struct dentry * parent, struct qstr * name)
 {
 	struct dentry * dentry = NULL;
-	unsigned long seq;
+	unsigned seq;
 
         do {
                 seq = read_seqbegin(&rename_lock);
@@ -2225,7 +2270,7 @@ static int prepend_name(char **buffer, i
  * @buffer: pointer to the end of the buffer
  * @buflen: pointer to buffer length
  *
- * Caller holds the dcache_lock.
+ * Caller holds the rename_lock.
  *
  * If path is not reachable from the supplied root, then the value of
  * root is changed (without modifying refcounts).
@@ -2310,7 +2355,9 @@ char *__d_path(const struct path *path,
 
 	prepend(&res, &buflen, "\0", 1);
 	spin_lock(&dcache_lock);
+	write_seqlock(&rename_lock);
 	error = prepend_path(path, root, &res, &buflen);
+	write_sequnlock(&rename_lock);
 	spin_unlock(&dcache_lock);
 
 	if (error)
@@ -2374,10 +2421,12 @@ char *d_path(const struct path *path, ch
 
 	get_fs_root(current->fs, &root);
 	spin_lock(&dcache_lock);
+	write_seqlock(&rename_lock);
 	tmp = root;
 	error = path_with_deleted(path, &tmp, &res, &buflen);
 	if (error)
 		res = ERR_PTR(error);
+	write_sequnlock(&rename_lock);
 	spin_unlock(&dcache_lock);
 	path_put(&root);
 	return res;
@@ -2405,10 +2454,12 @@ char *d_path_with_unreachable(const stru
 
 	get_fs_root(current->fs, &root);
 	spin_lock(&dcache_lock);
+	write_seqlock(&rename_lock);
 	tmp = root;
 	error = path_with_deleted(path, &tmp, &res, &buflen);
 	if (!error && !path_equal(&tmp, &root))
 		error = prepend_unreachable(&res, &buflen);
+	write_sequnlock(&rename_lock);
 	spin_unlock(&dcache_lock);
 	path_put(&root);
 	if (error)
@@ -2474,7 +2525,9 @@ char *dentry_path_raw(struct dentry *den
 	char *retval;
 
 	spin_lock(&dcache_lock);
+	write_seqlock(&rename_lock);
 	retval = __dentry_path(dentry, buf, buflen);
+	write_sequnlock(&rename_lock);
 	spin_unlock(&dcache_lock);
 
 	return retval;
@@ -2487,6 +2540,7 @@ char *dentry_path(struct dentry *dentry,
 	char *retval;
 
 	spin_lock(&dcache_lock);
+	write_seqlock(&rename_lock);
 	if (d_unlinked(dentry)) {
 		p = buf + buflen;
 		if (prepend(&p, &buflen, "//deleted", 10) != 0)
@@ -2494,6 +2548,7 @@ char *dentry_path(struct dentry *dentry,
 		buflen++;
 	}
 	retval = __dentry_path(dentry, buf, buflen);
+	write_sequnlock(&rename_lock);
 	spin_unlock(&dcache_lock);
 	if (!IS_ERR(retval) && p)
 		*p = '/';	/* restore '/' overriden with '\0' */
@@ -2534,6 +2589,7 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
 
 	error = -ENOENT;
 	spin_lock(&dcache_lock);
+	write_seqlock(&rename_lock);
 	if (!d_unlinked(pwd.dentry)) {
 		unsigned long len;
 		struct path tmp = root;
@@ -2542,6 +2598,7 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
 
 		prepend(&cwd, &buflen, "\0", 1);
 		error = prepend_path(&pwd, &tmp, &cwd, &buflen);
+		write_sequnlock(&rename_lock);
 		spin_unlock(&dcache_lock);
 
 		if (error)
@@ -2561,8 +2618,10 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
 			if (copy_to_user(buf, cwd, len))
 				error = -EFAULT;
 		}
-	} else
+	} else {
+		write_sequnlock(&rename_lock);
 		spin_unlock(&dcache_lock);
+	}
 
 out:
 	path_put(&pwd);
@@ -2590,25 +2649,25 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
 int is_subdir(struct dentry *new_dentry, struct dentry *old_dentry)
 {
 	int result;
-	unsigned long seq;
+	unsigned seq;
 
 	if (new_dentry == old_dentry)
 		return 1;
 
-	/*
-	 * Need rcu_readlock to protect against the d_parent trashing
-	 * due to d_move
-	 */
-	rcu_read_lock();
 	do {
 		/* for restarting inner loop in case of seq retry */
 		seq = read_seqbegin(&rename_lock);
+		/*
+		 * Need rcu_readlock to protect against the d_parent trashing
+		 * due to d_move
+		 */
+		rcu_read_lock();
 		if (d_ancestor(old_dentry, new_dentry))
 			result = 1;
 		else
 			result = 0;
+		rcu_read_unlock();
 	} while (read_seqretry(&rename_lock, seq));
-	rcu_read_unlock();
 
 	return result;
 }
@@ -2640,9 +2699,13 @@ EXPORT_SYMBOL(path_is_under);
 
 void d_genocide(struct dentry *root)
 {
-	struct dentry *this_parent = root;
+	struct dentry *this_parent;
 	struct list_head *next;
+	unsigned seq;
 
+rename_retry:
+	this_parent = root;
+	seq = read_seqbegin(&rename_lock);
 	spin_lock(&dcache_lock);
 	spin_lock(&this_parent->d_lock);
 repeat:
@@ -2652,6 +2715,7 @@ void d_genocide(struct dentry *root)
 		struct list_head *tmp = next;
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
+
 		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 		if (d_unhashed(dentry) || !dentry->d_inode) {
 			spin_unlock(&dentry->d_lock);
@@ -2664,19 +2728,43 @@ void d_genocide(struct dentry *root)
 			spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
 			goto repeat;
 		}
- 		dentry->d_count--;
+		if (!(dentry->d_flags & DCACHE_GENOCIDE)) {
+			dentry->d_flags |= DCACHE_GENOCIDE;
+			dentry->d_count--;
+		}
  		spin_unlock(&dentry->d_lock);
 	}
 	if (this_parent != root) {
-		next = this_parent->d_u.d_child.next;
-		this_parent->d_count--;
+		struct dentry *tmp;
+		struct dentry *child;
+
+		tmp = this_parent->d_parent;
+		if (!(this_parent->d_flags & DCACHE_GENOCIDE)) {
+			this_parent->d_flags |= DCACHE_GENOCIDE;
+			this_parent->d_count--;
+		}
+		rcu_read_lock();
 		spin_unlock(&this_parent->d_lock);
-		this_parent = this_parent->d_parent;
-		spin_lock(&this_parent->d_lock);
+		child = this_parent;
+		this_parent = tmp;
+ 		spin_lock(&this_parent->d_lock);
+		/* might go back up the wrong parent if we have had a rename
+		 * or deletion */
+		if (this_parent != child->d_parent ||
+				read_seqretry(&rename_lock, seq)) {
+			spin_unlock(&this_parent->d_lock);
+			spin_unlock(&dcache_lock);
+			rcu_read_unlock();
+			goto rename_retry;
+		}
+		rcu_read_unlock();
+		next = child->d_u.d_child.next;
 		goto resume;
 	}
 	spin_unlock(&this_parent->d_lock);
 	spin_unlock(&dcache_lock);
+	if (read_seqretry(&rename_lock, seq))
+		goto rename_retry;
 }
 
 /**
Index: linux-2.6/drivers/staging/pohmelfs/path_entry.c
===================================================================
--- linux-2.6.orig/drivers/staging/pohmelfs/path_entry.c	2010-11-17 00:50:49.000000000 +1100
+++ linux-2.6/drivers/staging/pohmelfs/path_entry.c	2010-11-17 01:05:42.000000000 +1100
@@ -83,10 +83,11 @@ int pohmelfs_construct_path_string(struc
 int pohmelfs_path_length(struct pohmelfs_inode *pi)
 {
 	struct dentry *d, *root, *first;
-	int len = 1; /* Root slash */
+	int len;
+	unsigned seq;
 
-	first = d = d_find_alias(&pi->vfs_inode);
-	if (!d) {
+	first = d_find_alias(&pi->vfs_inode);
+	if (!first) {
 		dprintk("%s: ino: %llu, mode: %o.\n", __func__, pi->ino, pi->vfs_inode.i_mode);
 		return -ENOENT;
 	}
@@ -95,6 +96,11 @@ int pohmelfs_path_length(struct pohmelfs
 	root = dget(current->fs->root.dentry);
 	spin_unlock(&current->fs->lock);
 
+rename_retry:
+	len = 1; /* Root slash */
+	d = first;
+	seq = read_seqbegin(&rename_lock);
+	rcu_read_lock();
 	spin_lock(&dcache_lock);
 
 	if (!IS_ROOT(d) && d_unhashed(d))
@@ -105,6 +111,9 @@ int pohmelfs_path_length(struct pohmelfs
 		d = d->d_parent;
 	}
 	spin_unlock(&dcache_lock);
+	rcu_read_unlock();
+	if (read_seqretry(&rename_lock, seq))
+		goto rename_retry;
 
 	dput(root);
 	dput(first);
Index: linux-2.6/fs/autofs4/waitq.c
===================================================================
--- linux-2.6.orig/fs/autofs4/waitq.c	2010-11-17 00:50:49.000000000 +1100
+++ linux-2.6/fs/autofs4/waitq.c	2010-11-17 01:05:42.000000000 +1100
@@ -186,16 +186,25 @@ static int autofs4_getpath(struct autofs
 {
 	struct dentry *root = sbi->sb->s_root;
 	struct dentry *tmp;
-	char *buf = *name;
+	char *buf;
 	char *p;
-	int len = 0;
+	int len;
+	unsigned seq;
 
+rename_retry:
+	buf = *name;
+	len = 0;
+	seq = read_seqbegin(&rename_lock);
+	rcu_read_lock();
 	spin_lock(&dcache_lock);
 	for (tmp = dentry ; tmp != root ; tmp = tmp->d_parent)
 		len += tmp->d_name.len + 1;
 
 	if (!len || --len > NAME_MAX) {
 		spin_unlock(&dcache_lock);
+		rcu_read_unlock();
+		if (read_seqretry(&rename_lock, seq))
+			goto rename_retry;
 		return 0;
 	}
 
@@ -209,6 +218,9 @@ static int autofs4_getpath(struct autofs
 		strncpy(p, tmp->d_name.name, tmp->d_name.len);
 	}
 	spin_unlock(&dcache_lock);
+	rcu_read_unlock();
+	if (read_seqretry(&rename_lock, seq))
+		goto rename_retry;
 
 	return len;
 }
Index: linux-2.6/fs/nfs/namespace.c
===================================================================
--- linux-2.6.orig/fs/nfs/namespace.c	2010-11-17 00:50:49.000000000 +1100
+++ linux-2.6/fs/nfs/namespace.c	2010-11-17 01:05:42.000000000 +1100
@@ -49,11 +49,17 @@ char *nfs_path(const char *base,
 	       const struct dentry *dentry,
 	       char *buffer, ssize_t buflen)
 {
-	char *end = buffer+buflen;
+	char *end;
 	int namelen;
+	unsigned seq;
 
+rename_retry:
+	end = buffer+buflen;
 	*--end = '\0';
 	buflen--;
+
+	seq = read_seqbegin(&rename_lock);
+	rcu_read_lock();
 	spin_lock(&dcache_lock);
 	while (!IS_ROOT(dentry) && dentry != droot) {
 		namelen = dentry->d_name.len;
@@ -66,6 +72,9 @@ char *nfs_path(const char *base,
 		dentry = dentry->d_parent;
 	}
 	spin_unlock(&dcache_lock);
+	rcu_read_unlock();
+	if (read_seqretry(&rename_lock, seq))
+		goto rename_retry;
 	if (*end != '/') {
 		if (--buflen < 0)
 			goto Elong;
@@ -83,6 +92,9 @@ char *nfs_path(const char *base,
 	return end;
 Elong_unlock:
 	spin_unlock(&dcache_lock);
+	rcu_read_unlock();
+	if (read_seqretry(&rename_lock, seq))
+		goto rename_retry;
 Elong:
 	return ERR_PTR(-ENAMETOOLONG);
 }
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:42.000000000 +1100
@@ -180,6 +180,7 @@ struct dentry_operations {
 #define DCACHE_FSNOTIFY_PARENT_WATCHED	0x0080 /* Parent inode is watched by some fsnotify listener */
 
 #define DCACHE_CANT_MOUNT	0x0100
+#define DCACHE_GENOCIDE		0x0200
 
 extern spinlock_t dcache_inode_lock;
 extern spinlock_t dcache_hash_lock;



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 17/28] fs: increase d_name lock coverage
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (15 preceding siblings ...)
  2010-11-16 14:09 ` [patch 16/28] fs: Use rename lock and RCU for multi-step operations Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 18/28] fs: dcache remove dcache_lock Nick Piggin
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache-lock-d_name.patch --]
[-- Type: text/plain, Size: 1903 bytes --]

Cover d_name with d_lock in more cases, where there may be concurrent
modification to it.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |   23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:42.000000000 +1100
@@ -1361,16 +1361,20 @@ static struct dentry *__d_instantiate_un
 	list_for_each_entry(alias, &inode->i_dentry, d_alias) {
 		struct qstr *qstr = &alias->d_name;
 
+		spin_lock(&alias->d_lock);
 		if (qstr->hash != hash)
-			continue;
+			goto next;
 		if (alias->d_parent != entry->d_parent)
-			continue;
+			goto next;
 		if (qstr->len != len)
-			continue;
+			goto next;
 		if (memcmp(qstr->name, name, len))
-			continue;
-		dget_locked(alias);
+			goto next;
+		dget_locked_dlock(alias);
+		spin_unlock(&alias->d_lock);
 		return alias;
+next:
+		spin_unlock(&alias->d_lock);
 	}
 
 	__d_instantiate(entry, inode);
@@ -2298,7 +2302,9 @@ static int prepend_path(const struct pat
 		}
 		parent = dentry->d_parent;
 		prefetch(parent);
+		spin_lock(&dentry->d_lock);
 		error = prepend_name(buffer, buflen, &dentry->d_name);
+		spin_unlock(&dentry->d_lock);
 		if (!error)
 			error = prepend(buffer, buflen, "/", 1);
 		if (error)
@@ -2506,10 +2512,13 @@ static char *__dentry_path(struct dentry
 
 	while (!IS_ROOT(dentry)) {
 		struct dentry *parent = dentry->d_parent;
+		int error;
 
 		prefetch(parent);
-		if ((prepend_name(&end, &buflen, &dentry->d_name) != 0) ||
-		    (prepend(&end, &buflen, "/", 1) != 0))
+		spin_lock(&dentry->d_lock);
+		error = prepend_name(&end, &buflen, &dentry->d_name);
+		spin_unlock(&dentry->d_lock);
+		if (error != 0 || prepend(&end, &buflen, "/", 1) != 0)
 			goto Elong;
 
 		retval = end;



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 18/28] fs: dcache remove dcache_lock
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (16 preceding siblings ...)
  2010-11-16 14:09 ` [patch 17/28] fs: increase d_name lock coverage Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 19/28] fs: dcache avoid starvation in dcache multi-step operations Nick Piggin
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache_lock-remove.patch --]
[-- Type: text/plain, Size: 71136 bytes --]

dcache_lock no longer protects anything. remove it.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 Documentation/filesystems/Locking            |   16 +-
 Documentation/filesystems/dentry-locking.txt |   40 +++---
 Documentation/filesystems/porting            |    8 +
 arch/powerpc/platforms/cell/spufs/inode.c    |    5 
 drivers/infiniband/hw/ipath/ipath_fs.c       |    6 -
 drivers/infiniband/hw/qib/qib_fs.c           |    3 
 drivers/staging/pohmelfs/path_entry.c        |    2 
 drivers/staging/smbfs/cache.c                |    4 
 drivers/usb/core/inode.c                     |    3 
 fs/9p/vfs_inode.c                            |    2 
 fs/affs/amigaffs.c                           |    2 
 fs/autofs4/autofs_i.h                        |    3 
 fs/autofs4/expire.c                          |   10 -
 fs/autofs4/root.c                            |   44 +++----
 fs/autofs4/waitq.c                           |    7 -
 fs/ceph/dir.c                                |    6 -
 fs/ceph/inode.c                              |    4 
 fs/cifs/inode.c                              |    3 
 fs/coda/cache.c                              |    2 
 fs/configfs/configfs_internal.h              |    2 
 fs/configfs/inode.c                          |    6 -
 fs/dcache.c                                  |  162 ++++-----------------------
 fs/exportfs/expfs.c                          |    4 
 fs/libfs.c                                   |    8 -
 fs/namei.c                                   |    9 -
 fs/ncpfs/dir.c                               |   18 +--
 fs/ncpfs/ncplib_kernel.h                     |    4 
 fs/nfs/dir.c                                 |    3 
 fs/nfs/getroot.c                             |    2 
 fs/nfs/namespace.c                           |    3 
 fs/notify/fsnotify.c                         |    2 
 fs/ocfs2/dcache.c                            |    2 
 include/linux/dcache.h                       |    7 -
 include/linux/fs.h                           |    6 -
 include/linux/fsnotify.h                     |    2 
 include/linux/fsnotify_backend.h             |   11 +
 include/linux/namei.h                        |    1 
 kernel/cgroup.c                              |    6 -
 mm/filemap.c                                 |    3 
 security/selinux/selinuxfs.c                 |    4 
 40 files changed, 120 insertions(+), 315 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:40.000000000 +1100
@@ -54,11 +54,10 @@
  *   - d_alias, d_inode
  *
  * Ordering:
- * dcache_lock
- *   dcache_inode_lock
- *     dentry->d_lock
- *       dcache_lru_lock
- *       dcache_hash_lock
+ * dcache_inode_lock
+ *   dentry->d_lock
+ *     dcache_lru_lock
+ *     dcache_hash_lock
  *
  * If there is an ancestor relationship:
  * dentry->d_parent->...->d_parent->d_lock
@@ -77,13 +76,11 @@ EXPORT_SYMBOL_GPL(sysctl_vfs_cache_press
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_inode_lock);
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
-__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
 
 EXPORT_SYMBOL(rename_lock);
 EXPORT_SYMBOL(dcache_inode_lock);
 EXPORT_SYMBOL(dcache_hash_lock);
-EXPORT_SYMBOL(dcache_lock);
 
 static struct kmem_cache *dentry_cache __read_mostly;
 
@@ -133,7 +130,7 @@ static void __d_free(struct rcu_head *he
 }
 
 /*
- * no dcache_lock, please.
+ * no locks, please.
  */
 static void d_free(struct dentry *dentry)
 {
@@ -156,7 +153,6 @@ static void d_free(struct dentry *dentry
 static void dentry_iput(struct dentry * dentry)
 	__releases(dentry->d_lock)
 	__releases(dcache_inode_lock)
-	__releases(dcache_lock)
 {
 	struct inode *inode = dentry->d_inode;
 	if (inode) {
@@ -164,7 +160,6 @@ static void dentry_iput(struct dentry *
 		list_del_init(&dentry->d_alias);
 		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_inode_lock);
-		spin_unlock(&dcache_lock);
 		if (!inode->i_nlink)
 			fsnotify_inoderemove(inode);
 		if (dentry->d_op && dentry->d_op->d_iput)
@@ -174,7 +169,6 @@ static void dentry_iput(struct dentry *
 	} else {
 		spin_unlock(&dentry->d_lock);
 		spin_unlock(&dcache_inode_lock);
-		spin_unlock(&dcache_lock);
 	}
 }
 
@@ -229,14 +223,13 @@ static void dentry_lru_move_tail(struct
  *
  * If this is the root of the dentry tree, return NULL.
  *
- * dcache_lock and d_lock and d_parent->d_lock must be held by caller, and
- * are dropped by d_kill.
+ * dentry->d_lock and parent->d_lock must be held by caller, and are dropped by
+ * d_kill.
  */
 static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent)
 	__releases(dentry->d_lock)
 	__releases(parent->d_lock)
 	__releases(dcache_inode_lock)
-	__releases(dcache_lock)
 {
 	dentry->d_parent = NULL;
 	list_del(&dentry->d_u.d_child);
@@ -295,21 +288,10 @@ void dput(struct dentry *dentry)
 	else
 		parent = dentry->d_parent;
 	if (dentry->d_count == 1) {
-		if (!spin_trylock(&dcache_lock)) {
-			/*
-			 * Something of a livelock possibility we could avoid
-			 * by taking dcache_lock and trying again, but we
-			 * want to reduce dcache_lock anyway so this will
-			 * get improved.
-			 */
-drop1:
-			spin_unlock(&dentry->d_lock);
-			goto repeat;
-		}
 		if (!spin_trylock(&dcache_inode_lock)) {
 drop2:
-			spin_unlock(&dcache_lock);
-			goto drop1;
+			spin_unlock(&dentry->d_lock);
+			goto repeat;
 		}
 		if (parent && !spin_trylock(&parent->d_lock)) {
 			spin_unlock(&dcache_inode_lock);
@@ -321,7 +303,6 @@ void dput(struct dentry *dentry)
 		spin_unlock(&dentry->d_lock);
 		if (parent)
 			spin_unlock(&parent->d_lock);
-		spin_unlock(&dcache_lock);
 		return;
 	}
 
@@ -345,7 +326,6 @@ void dput(struct dentry *dentry)
 	if (parent)
 		spin_unlock(&parent->d_lock);
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 	return;
 
 unhash_it:
@@ -376,11 +356,9 @@ int d_invalidate(struct dentry * dentry)
 	/*
 	 * If it's already been dropped, return OK.
 	 */
-	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	if (d_unhashed(dentry)) {
 		spin_unlock(&dentry->d_lock);
-		spin_unlock(&dcache_lock);
 		return 0;
 	}
 	/*
@@ -389,9 +367,7 @@ int d_invalidate(struct dentry * dentry)
 	 */
 	if (!list_empty(&dentry->d_subdirs)) {
 		spin_unlock(&dentry->d_lock);
-		spin_unlock(&dcache_lock);
 		shrink_dcache_parent(dentry);
-		spin_lock(&dcache_lock);
 		spin_lock(&dentry->d_lock);
 	}
 
@@ -408,19 +384,17 @@ int d_invalidate(struct dentry * dentry)
 	if (dentry->d_count > 1) {
 		if (dentry->d_inode && S_ISDIR(dentry->d_inode->i_mode)) {
 			spin_unlock(&dentry->d_lock);
-			spin_unlock(&dcache_lock);
 			return -EBUSY;
 		}
 	}
 
 	__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
 	return 0;
 }
 EXPORT_SYMBOL(d_invalidate);
 
-/* This must be called with dcache_lock and d_lock held */
+/* This must be called with d_lock held */
 static inline struct dentry * __dget_locked_dlock(struct dentry *dentry)
 {
 	dentry->d_count++;
@@ -428,7 +402,7 @@ static inline struct dentry * __dget_loc
 	return dentry;
 }
 
-/* This should be called _only_ with dcache_lock held */
+/* This must be called with d_lock held */
 static inline struct dentry * __dget_locked(struct dentry *dentry)
 {
 	spin_lock(&dentry->d_lock);
@@ -538,11 +512,9 @@ struct dentry *d_find_alias(struct inode
 	struct dentry *de = NULL;
 
 	if (!list_empty(&inode->i_dentry)) {
-		spin_lock(&dcache_lock);
 		spin_lock(&dcache_inode_lock);
 		de = __d_find_alias(inode, 0);
 		spin_unlock(&dcache_inode_lock);
-		spin_unlock(&dcache_lock);
 	}
 	return de;
 }
@@ -556,7 +528,6 @@ void d_prune_aliases(struct inode *inode
 {
 	struct dentry *dentry;
 restart:
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
 		spin_lock(&dentry->d_lock);
@@ -565,14 +536,12 @@ void d_prune_aliases(struct inode *inode
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_inode_lock);
-			spin_unlock(&dcache_lock);
 			dput(dentry);
 			goto restart;
 		}
 		spin_unlock(&dentry->d_lock);
 	}
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 }
 EXPORT_SYMBOL(d_prune_aliases);
 
@@ -588,17 +557,14 @@ static void prune_one_dentry(struct dent
 	__releases(dentry->d_lock)
 	__releases(parent->d_lock)
 	__releases(dcache_inode_lock)
-	__releases(dcache_lock)
 {
 	__d_drop(dentry);
 	dentry = d_kill(dentry, parent);
 
 	/*
-	 * Prune ancestors.  Locking is simpler than in dput(),
-	 * because dcache_lock needs to be taken anyway.
+	 * Prune ancestors.
 	 */
 	while (dentry) {
-		spin_lock(&dcache_lock);
 		spin_lock(&dcache_inode_lock);
 again:
 		spin_lock(&dentry->d_lock);
@@ -616,7 +582,6 @@ static void prune_one_dentry(struct dent
 				spin_unlock(&parent->d_lock);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_inode_lock);
-			spin_unlock(&dcache_lock);
 			return;
 		}
 
@@ -665,8 +630,7 @@ static void shrink_dentry_list(struct li
 		spin_unlock(&dcache_lru_lock);
 
 		prune_one_dentry(dentry, parent);
-		/* dcache_lock, dcache_inode_lock and dentry->d_lock dropped */
-		spin_lock(&dcache_lock);
+		/* dcache_inode_lock and dentry->d_lock dropped */
 		spin_lock(&dcache_inode_lock);
 		spin_lock(&dcache_lru_lock);
 	}
@@ -688,7 +652,6 @@ static void __shrink_dcache_sb(struct su
 	LIST_HEAD(tmp);
 	int cnt = *count;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 relock:
 	spin_lock(&dcache_lru_lock);
@@ -719,7 +682,6 @@ static void __shrink_dcache_sb(struct su
 			if (!--cnt)
 				break;
 		}
-		/* XXX: re-add cond_resched_lock when dcache_lock goes away */
 	}
 
 	*count = cnt;
@@ -729,7 +691,6 @@ static void __shrink_dcache_sb(struct su
 		list_splice(&referenced, &sb->s_dentry_lru);
 	spin_unlock(&dcache_lru_lock);
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 }
 
 /**
@@ -751,7 +712,6 @@ static void prune_dcache(int count)
 
 	if (unused == 0 || count == 0)
 		return;
-	spin_lock(&dcache_lock);
 	if (count >= unused)
 		prune_ratio = 1;
 	else
@@ -788,11 +748,9 @@ static void prune_dcache(int count)
 		if (down_read_trylock(&sb->s_umount)) {
 			if ((sb->s_root != NULL) &&
 			    (!list_empty(&sb->s_dentry_lru))) {
-				spin_unlock(&dcache_lock);
 				__shrink_dcache_sb(sb, &w_count,
 						DCACHE_REFERENCED);
 				pruned -= w_count;
-				spin_lock(&dcache_lock);
 			}
 			up_read(&sb->s_umount);
 		}
@@ -808,7 +766,6 @@ static void prune_dcache(int count)
 	if (p)
 		__put_super(p);
 	spin_unlock(&sb_lock);
-	spin_unlock(&dcache_lock);
 }
 
 /**
@@ -822,7 +779,6 @@ void shrink_dcache_sb(struct super_block
 {
 	LIST_HEAD(tmp);
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	spin_lock(&dcache_lru_lock);
 	while (!list_empty(&sb->s_dentry_lru)) {
@@ -831,7 +787,6 @@ void shrink_dcache_sb(struct super_block
 	}
 	spin_unlock(&dcache_lru_lock);
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
@@ -848,12 +803,10 @@ static void shrink_dcache_for_umount_sub
 	BUG_ON(!IS_ROOT(dentry));
 
 	/* detach this root from the system */
-	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	dentry_lru_del(dentry);
 	__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
 
 	for (;;) {
 		/* descend to the first leaf in the current subtree */
@@ -862,7 +815,6 @@ static void shrink_dcache_for_umount_sub
 
 			/* this is a branch with children - detach all of them
 			 * from the system in one go */
-			spin_lock(&dcache_lock);
 			spin_lock(&dentry->d_lock);
 			list_for_each_entry(loop, &dentry->d_subdirs,
 					    d_u.d_child) {
@@ -873,7 +825,6 @@ static void shrink_dcache_for_umount_sub
 				spin_unlock(&loop->d_lock);
 			}
 			spin_unlock(&dentry->d_lock);
-			spin_unlock(&dcache_lock);
 
 			/* move to the first child */
 			dentry = list_entry(dentry->d_subdirs.next,
@@ -940,8 +891,7 @@ static void shrink_dcache_for_umount_sub
 
 /*
  * destroy the dentries attached to a superblock on unmounting
- * - we don't need to use dentry->d_lock, and only need dcache_lock when
- *   removing the dentry from the system lists and hashes because:
+ * - we don't need to use dentry->d_lock because:
  *   - the superblock is detached from all mountings and open files, so the
  *     dentry trees will not be rearranged by the VFS
  *   - s_umount is write-locked, so the memory pressure shrinker will ignore
@@ -992,7 +942,6 @@ int have_submounts(struct dentry *parent
 	this_parent = parent;
 	seq = read_seqbegin(&rename_lock);
 
-	spin_lock(&dcache_lock);
 	if (d_mountpoint(parent))
 		goto positive;
 	spin_lock(&this_parent->d_lock);
@@ -1038,7 +987,6 @@ int have_submounts(struct dentry *parent
 		if (this_parent != child->d_parent ||
 				read_seqretry(&rename_lock, seq)) {
 			spin_unlock(&this_parent->d_lock);
-			spin_unlock(&dcache_lock);
 			rcu_read_unlock();
 			goto rename_retry;
 		}
@@ -1047,12 +995,10 @@ int have_submounts(struct dentry *parent
 		goto resume;
 	}
 	spin_unlock(&this_parent->d_lock);
-	spin_unlock(&dcache_lock);
 	if (read_seqretry(&rename_lock, seq))
 		goto rename_retry;
 	return 0; /* No mount points found in tree */
 positive:
-	spin_unlock(&dcache_lock);
 	if (read_seqretry(&rename_lock, seq))
 		goto rename_retry;
 	return 1;
@@ -1084,7 +1030,6 @@ static int select_parent(struct dentry *
 	this_parent = parent;
 	seq = read_seqbegin(&rename_lock);
 
-	spin_lock(&dcache_lock);
 	spin_lock(&this_parent->d_lock);
 repeat:
 	next = this_parent->d_subdirs.next;
@@ -1148,7 +1093,6 @@ static int select_parent(struct dentry *
 		if (this_parent != child->d_parent ||
 				read_seqretry(&rename_lock, seq)) {
 			spin_unlock(&this_parent->d_lock);
-			spin_unlock(&dcache_lock);
 			rcu_read_unlock();
 			goto rename_retry;
 		}
@@ -1158,7 +1102,6 @@ static int select_parent(struct dentry *
 	}
 out:
 	spin_unlock(&this_parent->d_lock);
-	spin_unlock(&dcache_lock);
 	if (read_seqretry(&rename_lock, seq))
 		goto rename_retry;
 	return found;
@@ -1263,7 +1206,6 @@ struct dentry *d_alloc(struct dentry * p
 	INIT_LIST_HEAD(&dentry->d_u.d_child);
 
 	if (parent) {
-		spin_lock(&dcache_lock);
 		spin_lock(&parent->d_lock);
 		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 		dentry->d_parent = dget_dlock(parent);
@@ -1271,7 +1213,6 @@ struct dentry *d_alloc(struct dentry * p
 		list_add(&dentry->d_u.d_child, &parent->d_subdirs);
 		spin_unlock(&dentry->d_lock);
 		spin_unlock(&parent->d_lock);
-		spin_unlock(&dcache_lock);
 	}
 
 	percpu_counter_inc(&nr_dentry);
@@ -1291,7 +1232,6 @@ struct dentry *d_alloc_name(struct dentr
 }
 EXPORT_SYMBOL(d_alloc_name);
 
-/* the caller must hold dcache_lock */
 static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 {
 	spin_lock(&dentry->d_lock);
@@ -1320,11 +1260,9 @@ static void __d_instantiate(struct dentr
 void d_instantiate(struct dentry *entry, struct inode * inode)
 {
 	BUG_ON(!list_empty(&entry->d_alias));
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	__d_instantiate(entry, inode);
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 	security_d_instantiate(entry, inode);
 }
 EXPORT_SYMBOL(d_instantiate);
@@ -1387,11 +1325,9 @@ struct dentry *d_instantiate_unique(stru
 
 	BUG_ON(!list_empty(&entry->d_alias));
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	result = __d_instantiate_unique(entry, inode);
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 
 	if (!result) {
 		security_d_instantiate(entry, inode);
@@ -1480,12 +1416,11 @@ struct dentry *d_obtain_alias(struct ino
 	}
 	tmp->d_parent = tmp; /* make sure dput doesn't croak */
 
-	spin_lock(&dcache_lock);
+
 	spin_lock(&dcache_inode_lock);
 	res = __d_find_alias(inode, 0);
 	if (res) {
 		spin_unlock(&dcache_inode_lock);
-		spin_unlock(&dcache_lock);
 		dput(tmp);
 		goto out_iput;
 	}
@@ -1503,7 +1438,6 @@ struct dentry *d_obtain_alias(struct ino
 	spin_unlock(&tmp->d_lock);
 	spin_unlock(&dcache_inode_lock);
 
-	spin_unlock(&dcache_lock);
 	return tmp;
 
  out_iput:
@@ -1533,21 +1467,18 @@ struct dentry *d_splice_alias(struct ino
 	struct dentry *new = NULL;
 
 	if (inode && S_ISDIR(inode->i_mode)) {
-		spin_lock(&dcache_lock);
 		spin_lock(&dcache_inode_lock);
 		new = __d_find_alias(inode, 1);
 		if (new) {
 			BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
 			spin_unlock(&dcache_inode_lock);
-			spin_unlock(&dcache_lock);
 			security_d_instantiate(new, inode);
 			d_move(new, dentry);
 			iput(inode);
 		} else {
-			/* already taking dcache_lock, so d_add() by hand */
+			/* already taking dcache_inode_lock, so d_add() by hand */
 			__d_instantiate(dentry, inode);
 			spin_unlock(&dcache_inode_lock);
-			spin_unlock(&dcache_lock);
 			security_d_instantiate(dentry, inode);
 			d_rehash(dentry);
 		}
@@ -1620,12 +1551,10 @@ struct dentry *d_add_ci(struct dentry *d
 	 * Negative dentry: instantiate it unless the inode is a directory and
 	 * already has a dentry.
 	 */
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	if (!S_ISDIR(inode->i_mode) || list_empty(&inode->i_dentry)) {
 		__d_instantiate(found, inode);
 		spin_unlock(&dcache_inode_lock);
-		spin_unlock(&dcache_lock);
 		security_d_instantiate(found, inode);
 		return found;
 	}
@@ -1637,7 +1566,6 @@ struct dentry *d_add_ci(struct dentry *d
 	new = list_entry(inode->i_dentry.next, struct dentry, d_alias);
 	dget_locked(new);
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 	security_d_instantiate(found, inode);
 	d_move(new, found);
 	iput(inode);
@@ -1808,7 +1736,6 @@ int d_validate(struct dentry *dentry, st
 {
 	struct dentry *child;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dparent->d_lock);
 	list_for_each_entry(child, &dparent->d_subdirs, d_u.d_child) {
 		if (dentry == child) {
@@ -1816,12 +1743,10 @@ int d_validate(struct dentry *dentry, st
 			__dget_locked_dlock(dentry);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dparent->d_lock);
-			spin_unlock(&dcache_lock);
 			return 1;
 		}
 	}
 	spin_unlock(&dparent->d_lock);
-	spin_unlock(&dcache_lock);
 
 	return 0;
 }
@@ -1854,7 +1779,6 @@ void d_delete(struct dentry * dentry)
 	/*
 	 * Are we the only user?
 	 */
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	spin_lock(&dentry->d_lock);
 	isdir = S_ISDIR(dentry->d_inode->i_mode);
@@ -1870,7 +1794,6 @@ void d_delete(struct dentry * dentry)
 
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 
 	fsnotify_nameremove(dentry, isdir);
 }
@@ -1897,13 +1820,11 @@ static void _d_rehash(struct dentry * en
  
 void d_rehash(struct dentry * entry)
 {
-	spin_lock(&dcache_lock);
 	spin_lock(&entry->d_lock);
 	spin_lock(&dcache_hash_lock);
 	_d_rehash(entry);
 	spin_unlock(&dcache_hash_lock);
 	spin_unlock(&entry->d_lock);
-	spin_unlock(&dcache_lock);
 }
 EXPORT_SYMBOL(d_rehash);
 
@@ -1972,14 +1893,14 @@ static void switch_names(struct dentry *
  */
  
 /*
- * d_move_locked - move a dentry
+ * d_move - move a dentry
  * @dentry: entry to move
  * @target: new dentry
  *
  * Update the dcache to reflect the move of a file name. Negative
  * dcache entries should not be moved in this way.
  */
-static void d_move_locked(struct dentry * dentry, struct dentry * target)
+void d_move(struct dentry * dentry, struct dentry * target)
 {
 	if (!dentry->d_inode)
 		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
@@ -2049,22 +1970,6 @@ static void d_move_locked(struct dentry
 	spin_unlock(&dentry->d_lock);
 	write_sequnlock(&rename_lock);
 }
-
-/**
- * d_move - move a dentry
- * @dentry: entry to move
- * @target: new dentry
- *
- * Update the dcache to reflect the move of a file name. Negative
- * dcache entries should not be moved in this way.
- */
-
-void d_move(struct dentry * dentry, struct dentry * target)
-{
-	spin_lock(&dcache_lock);
-	d_move_locked(dentry, target);
-	spin_unlock(&dcache_lock);
-}
 EXPORT_SYMBOL(d_move);
 
 /**
@@ -2090,13 +1995,12 @@ struct dentry *d_ancestor(struct dentry
  * This helper attempts to cope with remotely renamed directories
  *
  * It assumes that the caller is already holding
- * dentry->d_parent->d_inode->i_mutex and the dcache_lock
+ * dentry->d_parent->d_inode->i_mutex and the dcache_inode_lock
  *
  * Note: If ever the locking in lock_rename() changes, then please
  * remember to update this too...
  */
 static struct dentry *__d_unalias(struct dentry *dentry, struct dentry *alias)
-	__releases(dcache_lock)
 	__releases(dcache_inode_lock)
 {
 	struct mutex *m1 = NULL, *m2 = NULL;
@@ -2120,11 +2024,10 @@ static struct dentry *__d_unalias(struct
 		goto out_err;
 	m2 = &alias->d_parent->d_inode->i_mutex;
 out_unalias:
-	d_move_locked(alias, dentry);
+	d_move(alias, dentry);
 	ret = alias;
 out_err:
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 	if (m2)
 		mutex_unlock(m2);
 	if (m1)
@@ -2188,7 +2091,6 @@ struct dentry *d_materialise_unique(stru
 
 	BUG_ON(!d_unhashed(dentry));
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 
 	if (!inode) {
@@ -2203,6 +2105,11 @@ struct dentry *d_materialise_unique(stru
 		/* Does an aliased dentry already exist? */
 		alias = __d_find_alias(inode, 0);
 		if (alias) {
+			/*
+			 * XXX: after dcache_lock removal, we no longer
+			 * guarantee a !d_unhashed alias here. Is that going to
+			 * be a problem?
+			 */
 			actual = alias;
 			/* Is this an anonymous mountpoint that we could splice
 			 * into our tree? */
@@ -2234,7 +2141,6 @@ struct dentry *d_materialise_unique(stru
 	spin_unlock(&dcache_hash_lock);
 	spin_unlock(&actual->d_lock);
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 out_nolock:
 	if (actual == dentry) {
 		security_d_instantiate(dentry, inode);
@@ -2246,7 +2152,6 @@ struct dentry *d_materialise_unique(stru
 
 shouldnt_be_hashed:
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 	BUG();
 }
 EXPORT_SYMBOL_GPL(d_materialise_unique);
@@ -2360,11 +2265,9 @@ char *__d_path(const struct path *path,
 	int error;
 
 	prepend(&res, &buflen, "\0", 1);
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	error = prepend_path(path, root, &res, &buflen);
 	write_sequnlock(&rename_lock);
-	spin_unlock(&dcache_lock);
 
 	if (error)
 		return ERR_PTR(error);
@@ -2426,14 +2329,12 @@ char *d_path(const struct path *path, ch
 		return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
 
 	get_fs_root(current->fs, &root);
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	tmp = root;
 	error = path_with_deleted(path, &tmp, &res, &buflen);
 	if (error)
 		res = ERR_PTR(error);
 	write_sequnlock(&rename_lock);
-	spin_unlock(&dcache_lock);
 	path_put(&root);
 	return res;
 }
@@ -2459,14 +2360,12 @@ char *d_path_with_unreachable(const stru
 		return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
 
 	get_fs_root(current->fs, &root);
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	tmp = root;
 	error = path_with_deleted(path, &tmp, &res, &buflen);
 	if (!error && !path_equal(&tmp, &root))
 		error = prepend_unreachable(&res, &buflen);
 	write_sequnlock(&rename_lock);
-	spin_unlock(&dcache_lock);
 	path_put(&root);
 	if (error)
 		res =  ERR_PTR(error);
@@ -2533,11 +2432,9 @@ char *dentry_path_raw(struct dentry *den
 {
 	char *retval;
 
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	retval = __dentry_path(dentry, buf, buflen);
 	write_sequnlock(&rename_lock);
-	spin_unlock(&dcache_lock);
 
 	return retval;
 }
@@ -2548,7 +2445,6 @@ char *dentry_path(struct dentry *dentry,
 	char *p = NULL;
 	char *retval;
 
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	if (d_unlinked(dentry)) {
 		p = buf + buflen;
@@ -2558,12 +2454,10 @@ char *dentry_path(struct dentry *dentry,
 	}
 	retval = __dentry_path(dentry, buf, buflen);
 	write_sequnlock(&rename_lock);
-	spin_unlock(&dcache_lock);
 	if (!IS_ERR(retval) && p)
 		*p = '/';	/* restore '/' overriden with '\0' */
 	return retval;
 Elong:
-	spin_unlock(&dcache_lock);
 	return ERR_PTR(-ENAMETOOLONG);
 }
 
@@ -2597,7 +2491,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
 	get_fs_root_and_pwd(current->fs, &root, &pwd);
 
 	error = -ENOENT;
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	if (!d_unlinked(pwd.dentry)) {
 		unsigned long len;
@@ -2608,7 +2501,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
 		prepend(&cwd, &buflen, "\0", 1);
 		error = prepend_path(&pwd, &tmp, &cwd, &buflen);
 		write_sequnlock(&rename_lock);
-		spin_unlock(&dcache_lock);
 
 		if (error)
 			goto out;
@@ -2629,7 +2521,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
 		}
 	} else {
 		write_sequnlock(&rename_lock);
-		spin_unlock(&dcache_lock);
 	}
 
 out:
@@ -2715,7 +2606,6 @@ void d_genocide(struct dentry *root)
 rename_retry:
 	this_parent = root;
 	seq = read_seqbegin(&rename_lock);
-	spin_lock(&dcache_lock);
 	spin_lock(&this_parent->d_lock);
 repeat:
 	next = this_parent->d_subdirs.next;
@@ -2762,7 +2652,6 @@ void d_genocide(struct dentry *root)
 		if (this_parent != child->d_parent ||
 				read_seqretry(&rename_lock, seq)) {
 			spin_unlock(&this_parent->d_lock);
-			spin_unlock(&dcache_lock);
 			rcu_read_unlock();
 			goto rename_retry;
 		}
@@ -2771,7 +2660,6 @@ void d_genocide(struct dentry *root)
 		goto resume;
 	}
 	spin_unlock(&this_parent->d_lock);
-	spin_unlock(&dcache_lock);
 	if (read_seqretry(&rename_lock, seq))
 		goto rename_retry;
 }
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/namei.c	2010-11-17 00:52:37.000000000 +1100
@@ -612,8 +612,8 @@ int follow_up(struct path *path)
 	return 1;
 }
 
-/* no need for dcache_lock, as serialization is taken care in
- * namespace.c
+/*
+ * serialization is taken care of in namespace.c
  */
 static int __follow_mount(struct path *path)
 {
@@ -645,9 +645,6 @@ static void follow_mount(struct path *pa
 	}
 }
 
-/* no need for dcache_lock, as serialization is taken care in
- * namespace.c
- */
 int follow_down(struct path *path)
 {
 	struct vfsmount *mounted;
@@ -2128,12 +2125,10 @@ void dentry_unhash(struct dentry *dentry
 {
 	dget(dentry);
 	shrink_dcache_parent(dentry);
-	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	if (dentry->d_count == 2)
 		__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
 }
 
 int vfs_rmdir(struct inode *dir, struct dentry *dentry)
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:39.000000000 +1100
@@ -184,7 +184,6 @@ struct dentry_operations {
 
 extern spinlock_t dcache_inode_lock;
 extern spinlock_t dcache_hash_lock;
-extern spinlock_t dcache_lock;
 extern seqlock_t rename_lock;
 
 /**
@@ -215,11 +214,9 @@ static inline void __d_drop(struct dentr
 
 static inline void d_drop(struct dentry *dentry)
 {
-	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
  	__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
 }
 
 static inline int dname_external(struct dentry *dentry)
@@ -328,8 +325,8 @@ extern char *dentry_path(struct dentry *
  *	destroyed when it has references. dget() should never be
  *	called for dentries with zero reference counter. For these cases
  *	(preferably none, functions in dcache.c are sufficient for normal
- *	needs and they take necessary precautions) you should hold dcache_lock
- *	and call dget_locked() instead of dget().
+ *	needs and they take necessary precautions) you should hold d_lock
+ *	and call dget_dlock() instead of dget().
  */
 static inline struct dentry *dget_dlock(struct dentry *dentry)
 {
Index: linux-2.6/fs/exportfs/expfs.c
===================================================================
--- linux-2.6.orig/fs/exportfs/expfs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/exportfs/expfs.c	2010-11-17 01:05:39.000000000 +1100
@@ -47,24 +47,20 @@ find_acceptable_alias(struct dentry *res
 	if (acceptable(context, result))
 		return result;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	list_for_each_entry(dentry, &result->d_inode->i_dentry, d_alias) {
 		dget_locked(dentry);
 		spin_unlock(&dcache_inode_lock);
-		spin_unlock(&dcache_lock);
 		if (toput)
 			dput(toput);
 		if (dentry != result && acceptable(context, dentry)) {
 			dput(result);
 			return dentry;
 		}
-		spin_lock(&dcache_lock);
 		spin_lock(&dcache_inode_lock);
 		toput = dentry;
 	}
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 
 	if (toput)
 		dput(toput);
Index: linux-2.6/Documentation/filesystems/Locking
===================================================================
--- linux-2.6.orig/Documentation/filesystems/Locking	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/Documentation/filesystems/Locking	2010-11-17 00:52:37.000000000 +1100
@@ -22,14 +22,14 @@ be able to use diff(1).
 
 locking rules:
 	none have BKL
-		dcache_lock	rename_lock	->d_lock	may block
-d_revalidate:	no		no		no		yes
-d_hash		no		no		no		no
-d_compare:	no		yes		no		no 
-d_delete:	yes		no		yes		no
-d_release:	no		no		no		yes
-d_iput:		no		no		no		yes
-d_dname:	no		no		no		no
+		rename_lock	->d_lock	may block
+d_revalidate:	no		no		yes
+d_hash		no		no		no
+d_compare:	yes		no		no
+d_delete:	no		yes		no
+d_release:	no		no		yes
+d_iput:		no		no		yes
+d_dname:	no		no		no
 
 --------------------------- inode_operations --------------------------- 
 prototypes:
Index: linux-2.6/arch/powerpc/platforms/cell/spufs/inode.c
===================================================================
--- linux-2.6.orig/arch/powerpc/platforms/cell/spufs/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/arch/powerpc/platforms/cell/spufs/inode.c	2010-11-17 01:05:39.000000000 +1100
@@ -159,21 +159,18 @@ static void spufs_prune_dir(struct dentr
 
 	mutex_lock(&dir->d_inode->i_mutex);
 	list_for_each_entry_safe(dentry, tmp, &dir->d_subdirs, d_u.d_child) {
-		spin_lock(&dcache_lock);
 		spin_lock(&dentry->d_lock);
 		if (!(d_unhashed(dentry)) && dentry->d_inode) {
 			dget_locked_dlock(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			simple_unlink(dir->d_inode, dentry);
-			/* XXX: what is dcache_lock protecting here? Other
+			/* XXX: what was dcache_lock protecting here? Other
 			 * filesystems (IB, configfs) release dcache_lock
 			 * before unlink */
-			spin_unlock(&dcache_lock);
 			dput(dentry);
 		} else {
 			spin_unlock(&dentry->d_lock);
-			spin_unlock(&dcache_lock);
 		}
 	}
 	shrink_dcache_parent(dir);
Index: linux-2.6/drivers/infiniband/hw/ipath/ipath_fs.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/ipath/ipath_fs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/drivers/infiniband/hw/ipath/ipath_fs.c	2010-11-17 01:05:39.000000000 +1100
@@ -277,18 +277,14 @@ static int remove_file(struct dentry *pa
 		goto bail;
 	}
 
-	spin_lock(&dcache_lock);
 	spin_lock(&tmp->d_lock);
 	if (!(d_unhashed(tmp) && tmp->d_inode)) {
 		dget_locked_dlock(tmp);
 		__d_drop(tmp);
 		spin_unlock(&tmp->d_lock);
-		spin_unlock(&dcache_lock);
 		simple_unlink(parent->d_inode, tmp);
-	} else {
+	} else
 		spin_unlock(&tmp->d_lock);
-		spin_unlock(&dcache_lock);
-	}
 
 	ret = 0;
 bail:
Index: linux-2.6/drivers/usb/core/inode.c
===================================================================
--- linux-2.6.orig/drivers/usb/core/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/drivers/usb/core/inode.c	2010-11-17 00:52:37.000000000 +1100
@@ -344,7 +344,6 @@ static int usbfs_empty (struct dentry *d
 {
 	struct list_head *list;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	list_for_each(list, &dentry->d_subdirs) {
 		struct dentry *de = list_entry(list, struct dentry, d_u.d_child);
@@ -353,13 +352,11 @@ static int usbfs_empty (struct dentry *d
 		if (usbfs_positive(de)) {
 			spin_unlock(&de->d_lock);
 			spin_unlock(&dentry->d_lock);
-			spin_unlock(&dcache_lock);
 			return 0;
 		}
 		spin_unlock(&de->d_lock);
 	}
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
 	return 1;
 }
 
Index: linux-2.6/fs/affs/amigaffs.c
===================================================================
--- linux-2.6.orig/fs/affs/amigaffs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/affs/amigaffs.c	2010-11-17 00:52:37.000000000 +1100
@@ -128,7 +128,6 @@ affs_fix_dcache(struct dentry *dentry, u
 	void *data = dentry->d_fsdata;
 	struct list_head *head, *next;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	head = &inode->i_dentry;
 	next = head->next;
@@ -141,7 +140,6 @@ affs_fix_dcache(struct dentry *dentry, u
 		next = next->next;
 	}
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 }
 
 
Index: linux-2.6/fs/coda/cache.c
===================================================================
--- linux-2.6.orig/fs/coda/cache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/coda/cache.c	2010-11-17 00:52:37.000000000 +1100
@@ -93,7 +93,6 @@ static void coda_flag_children(struct de
 	struct list_head *child;
 	struct dentry *de;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&parent->d_lock);
 	list_for_each(child, &parent->d_subdirs)
 	{
@@ -104,7 +103,6 @@ static void coda_flag_children(struct de
 		coda_flag_inode(de->d_inode, flag);
 	}
 	spin_unlock(&parent->d_lock);
-	spin_unlock(&dcache_lock);
 	return; 
 }
 
Index: linux-2.6/fs/configfs/configfs_internal.h
===================================================================
--- linux-2.6.orig/fs/configfs/configfs_internal.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/configfs/configfs_internal.h	2010-11-17 00:52:37.000000000 +1100
@@ -120,7 +120,6 @@ static inline struct config_item *config
 {
 	struct config_item * item = NULL;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	if (!d_unhashed(dentry)) {
 		struct configfs_dirent * sd = dentry->d_fsdata;
@@ -131,7 +130,6 @@ static inline struct config_item *config
 			item = config_item_get(sd->s_element);
 	}
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
 
 	return item;
 }
Index: linux-2.6/fs/configfs/inode.c
===================================================================
--- linux-2.6.orig/fs/configfs/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/configfs/inode.c	2010-11-17 01:05:39.000000000 +1100
@@ -250,18 +250,14 @@ void configfs_drop_dentry(struct configf
 	struct dentry * dentry = sd->s_dentry;
 
 	if (dentry) {
-		spin_lock(&dcache_lock);
 		spin_lock(&dentry->d_lock);
 		if (!(d_unhashed(dentry) && dentry->d_inode)) {
 			dget_locked_dlock(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
-			spin_unlock(&dcache_lock);
 			simple_unlink(parent->d_inode, dentry);
-		} else {
+		} else
 			spin_unlock(&dentry->d_lock);
-			spin_unlock(&dcache_lock);
-		}
 	}
 }
 
Index: linux-2.6/fs/ncpfs/dir.c
===================================================================
--- linux-2.6.orig/fs/ncpfs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ncpfs/dir.c	2010-11-17 01:05:39.000000000 +1100
@@ -394,7 +394,6 @@ ncp_dget_fpos(struct dentry *dentry, str
 	}
 
 	/* If a pointer is invalid, we search the dentry. */
-	spin_lock(&dcache_lock);
 	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
@@ -405,13 +404,11 @@ ncp_dget_fpos(struct dentry *dentry, str
 			else
 				dent = NULL;
 			spin_unlock(&parent->d_lock);
-			spin_unlock(&dcache_lock);
 			goto out;
 		}
 		next = next->next;
 	}
 	spin_unlock(&parent->d_lock);
-	spin_unlock(&dcache_lock);
 	return NULL;
 
 out:
@@ -635,21 +632,18 @@ ncp_fill_cache(struct file *filp, void *
 			struct inode *inode = newdent->d_inode;
 
 			/*
-			 * Inside ncpfs all uses of d_name are either for debugging,
-			 * or on functions which acquire inode mutex (mknod, creat,
-			 * lookup).  So grab i_mutex here, to be sure.  d_path
-			 * uses dcache_lock when generating path, so we should too.
-			 * And finally d_compare is protected by dentry's d_lock, so
-			 * here we go.
+			 * Inside ncpfs all uses of d_name are either for
+			 * debugging, or on functions which acquire inode mutex
+			 * (mknod, creat, lookup).  So grab i_mutex here, to be
+			 * sure.  And finally d_compare is protected by
+			 * dentry's d_lock, so here we go.
 			 */
 			if (inode)
 				mutex_lock(&inode->i_mutex);
-			spin_lock(&dcache_lock);
 			spin_lock(&newdent->d_lock);
 			memcpy((char *) newdent->d_name.name, qname.name,
-								newdent->d_name.len);
+							newdent->d_name.len);
 			spin_unlock(&newdent->d_lock);
-			spin_unlock(&dcache_lock);
 			if (inode)
 				mutex_unlock(&inode->i_mutex);
 		}
Index: linux-2.6/fs/ncpfs/ncplib_kernel.h
===================================================================
--- linux-2.6.orig/fs/ncpfs/ncplib_kernel.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ncpfs/ncplib_kernel.h	2010-11-17 00:52:37.000000000 +1100
@@ -193,7 +193,6 @@ ncp_renew_dentries(struct dentry *parent
 	struct list_head *next;
 	struct dentry *dentry;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
@@ -207,7 +206,6 @@ ncp_renew_dentries(struct dentry *parent
 		next = next->next;
 	}
 	spin_unlock(&parent->d_lock);
-	spin_unlock(&dcache_lock);
 }
 
 static inline void
@@ -217,7 +215,6 @@ ncp_invalidate_dircache_entries(struct d
 	struct list_head *next;
 	struct dentry *dentry;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
@@ -227,7 +224,6 @@ ncp_invalidate_dircache_entries(struct d
 		next = next->next;
 	}
 	spin_unlock(&parent->d_lock);
-	spin_unlock(&dcache_lock);
 }
 
 struct ncp_cache_head {
Index: linux-2.6/fs/nfs/dir.c
===================================================================
--- linux-2.6.orig/fs/nfs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/nfs/dir.c	2010-11-17 00:52:37.000000000 +1100
@@ -1692,11 +1692,9 @@ static int nfs_unlink(struct inode *dir,
 	dfprintk(VFS, "NFS: unlink(%s/%ld, %s)\n", dir->i_sb->s_id,
 		dir->i_ino, dentry->d_name.name);
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	if (dentry->d_count > 1) {
 		spin_unlock(&dentry->d_lock);
-		spin_unlock(&dcache_lock);
 		/* Start asynchronous writeout of the inode */
 		write_inode_now(dentry->d_inode, 0);
 		error = nfs_sillyrename(dir, dentry);
@@ -1707,7 +1705,6 @@ static int nfs_unlink(struct inode *dir,
 		need_rehash = 1;
 	}
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
 	error = nfs_safe_remove(dentry);
 	if (!error || error == -ENOENT) {
 		nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
Index: linux-2.6/fs/ocfs2/dcache.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ocfs2/dcache.c	2010-11-17 01:05:39.000000000 +1100
@@ -169,7 +169,6 @@ struct dentry *ocfs2_find_local_alias(st
 	struct list_head *p;
 	struct dentry *dentry = NULL;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	list_for_each(p, &inode->i_dentry) {
 		dentry = list_entry(p, struct dentry, d_alias);
@@ -189,7 +188,6 @@ struct dentry *ocfs2_find_local_alias(st
 	}
 
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 
 	return dentry;
 }
Index: linux-2.6/kernel/cgroup.c
===================================================================
--- linux-2.6.orig/kernel/cgroup.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/kernel/cgroup.c	2010-11-17 01:05:39.000000000 +1100
@@ -874,7 +874,6 @@ static void cgroup_clear_directory(struc
 	struct list_head *node;
 
 	BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
-	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	node = dentry->d_subdirs.next;
 	while (node != &dentry->d_subdirs) {
@@ -889,18 +888,15 @@ static void cgroup_clear_directory(struc
 			dget_locked_dlock(d);
 			spin_unlock(&d->d_lock);
 			spin_unlock(&dentry->d_lock);
-			spin_unlock(&dcache_lock);
 			d_delete(d);
 			simple_unlink(dentry->d_inode, d);
 			dput(d);
-			spin_lock(&dcache_lock);
 			spin_lock(&dentry->d_lock);
 		} else
 			spin_unlock(&d->d_lock);
 		node = dentry->d_subdirs.next;
 	}
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
 }
 
 /*
@@ -912,14 +908,12 @@ static void cgroup_d_remove_dir(struct d
 
 	cgroup_clear_directory(dentry);
 
-	spin_lock(&dcache_lock);
 	parent = dentry->d_parent;
 	spin_lock(&parent->d_lock);
 	spin_lock(&dentry->d_lock);
 	list_del_init(&dentry->d_u.d_child);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&parent->d_lock);
-	spin_unlock(&dcache_lock);
 	remove_dir(dentry);
 }
 
Index: linux-2.6/security/selinux/selinuxfs.c
===================================================================
--- linux-2.6.orig/security/selinux/selinuxfs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/security/selinux/selinuxfs.c	2010-11-17 01:05:39.000000000 +1100
@@ -1145,7 +1145,6 @@ static void sel_remove_entries(struct de
 {
 	struct list_head *node;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&de->d_lock);
 	node = de->d_subdirs.next;
 	while (node != &de->d_subdirs) {
@@ -1158,11 +1157,9 @@ static void sel_remove_entries(struct de
 			dget_locked_dlock(d);
 			spin_unlock(&de->d_lock);
 			spin_unlock(&d->d_lock);
-			spin_unlock(&dcache_lock);
 			d_delete(d);
 			simple_unlink(de->d_inode, d);
 			dput(d);
-			spin_lock(&dcache_lock);
 			spin_lock(&de->d_lock);
 		} else
 			spin_unlock(&d->d_lock);
@@ -1170,7 +1167,6 @@ static void sel_remove_entries(struct de
 	}
 
 	spin_unlock(&de->d_lock);
-	spin_unlock(&dcache_lock);
 }
 
 #define BOOL_DIR_NAME "booleans"
Index: linux-2.6/fs/nfs/getroot.c
===================================================================
--- linux-2.6.orig/fs/nfs/getroot.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/nfs/getroot.c	2010-11-17 00:52:37.000000000 +1100
@@ -63,13 +63,11 @@ static int nfs_superblock_set_dummy_root
 		 * This again causes shrink_dcache_for_umount_subtree() to
 		 * Oops, since the test for IS_ROOT() will fail.
 		 */
-		spin_lock(&dcache_lock);
 		spin_lock(&dcache_inode_lock);
 		spin_lock(&sb->s_root->d_lock);
 		list_del_init(&sb->s_root->d_alias);
 		spin_unlock(&sb->s_root->d_lock);
 		spin_unlock(&dcache_inode_lock);
-		spin_unlock(&dcache_lock);
 	}
 	return 0;
 }
Index: linux-2.6/drivers/staging/pohmelfs/path_entry.c
===================================================================
--- linux-2.6.orig/drivers/staging/pohmelfs/path_entry.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/drivers/staging/pohmelfs/path_entry.c	2010-11-17 00:52:37.000000000 +1100
@@ -101,7 +101,6 @@ int pohmelfs_path_length(struct pohmelfs
 	d = first;
 	seq = read_seqbegin(&rename_lock);
 	rcu_read_lock();
-	spin_lock(&dcache_lock);
 
 	if (!IS_ROOT(d) && d_unhashed(d))
 		len += UNHASHED_OBSCURE_STRING_SIZE; /* Obscure " (deleted)" string */
@@ -110,7 +109,6 @@ int pohmelfs_path_length(struct pohmelfs
 		len += d->d_name.len + 1; /* Plus slash */
 		d = d->d_parent;
 	}
-	spin_unlock(&dcache_lock);
 	rcu_read_unlock();
 	if (read_seqretry(&rename_lock, seq))
 		goto rename_retry;
Index: linux-2.6/fs/nfs/namespace.c
===================================================================
--- linux-2.6.orig/fs/nfs/namespace.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/nfs/namespace.c	2010-11-17 00:52:37.000000000 +1100
@@ -60,7 +60,6 @@ char *nfs_path(const char *base,
 
 	seq = read_seqbegin(&rename_lock);
 	rcu_read_lock();
-	spin_lock(&dcache_lock);
 	while (!IS_ROOT(dentry) && dentry != droot) {
 		namelen = dentry->d_name.len;
 		buflen -= namelen + 1;
@@ -71,7 +70,6 @@ char *nfs_path(const char *base,
 		*--end = '/';
 		dentry = dentry->d_parent;
 	}
-	spin_unlock(&dcache_lock);
 	rcu_read_unlock();
 	if (read_seqretry(&rename_lock, seq))
 		goto rename_retry;
@@ -91,7 +89,6 @@ char *nfs_path(const char *base,
 	memcpy(end, base, namelen);
 	return end;
 Elong_unlock:
-	spin_unlock(&dcache_lock);
 	rcu_read_unlock();
 	if (read_seqretry(&rename_lock, seq))
 		goto rename_retry;
Index: linux-2.6/include/linux/fsnotify_backend.h
===================================================================
--- linux-2.6.orig/include/linux/fsnotify_backend.h	2010-11-17 00:50:48.000000000 +1100
+++ linux-2.6/include/linux/fsnotify_backend.h	2010-11-17 00:52:37.000000000 +1100
@@ -329,9 +329,15 @@ static inline void __fsnotify_update_dca
 {
 	struct dentry *parent;
 
-	assert_spin_locked(&dcache_lock);
 	assert_spin_locked(&dentry->d_lock);
 
+	/*
+	 * Serialisation of setting PARENT_WATCHED on the dentries is provided
+	 * by d_lock. If inotify_inode_watched changes after we have taken
+	 * d_lock, the following __fsnotify_update_child_dentry_flags call will
+	 * find our entry, so it will spin until we complete here, and update
+	 * us with the new state.
+	 */
 	parent = dentry->d_parent;
 	if (parent->d_inode && fsnotify_inode_watches_children(parent->d_inode))
 		dentry->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
@@ -341,15 +347,12 @@ static inline void __fsnotify_update_dca
 
 /*
  * fsnotify_d_instantiate - instantiate a dentry for inode
- * Called with dcache_lock held.
  */
 static inline void __fsnotify_d_instantiate(struct dentry *dentry, struct inode *inode)
 {
 	if (!inode)
 		return;
 
-	assert_spin_locked(&dcache_lock);
-
 	spin_lock(&dentry->d_lock);
 	__fsnotify_update_dcache_flags(dentry);
 	spin_unlock(&dentry->d_lock);
Index: linux-2.6/fs/notify/fsnotify.c
===================================================================
--- linux-2.6.orig/fs/notify/fsnotify.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/notify/fsnotify.c	2010-11-17 00:52:37.000000000 +1100
@@ -59,7 +59,6 @@ void __fsnotify_update_child_dentry_flag
 	/* determine if the children should tell inode about their events */
 	watched = fsnotify_inode_watches_children(inode);
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	/* run all of the dentries associated with this inode.  Since this is a
 	 * directory, there damn well better only be one item on this list */
@@ -84,7 +83,6 @@ void __fsnotify_update_child_dentry_flag
 		spin_unlock(&alias->d_lock);
 	}
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 }
 
 /* Notify this dentry's parent about a child's events. */
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2010-11-17 00:50:47.000000000 +1100
+++ linux-2.6/include/linux/fs.h	2010-11-17 00:52:37.000000000 +1100
@@ -1377,7 +1377,7 @@ struct super_block {
 #else
 	struct list_head	s_files;
 #endif
-	/* s_dentry_lru and s_nr_dentry_unused are protected by dcache_lock */
+	/* s_dentry_lru, s_nr_dentry_unused protected by dcache.c lru locks */
 	struct list_head	s_dentry_lru;	/* unused dentry lru */
 	int			s_nr_dentry_unused;	/* # of dentry on lru */
 
@@ -2446,6 +2446,10 @@ static inline ino_t parent_ino(struct de
 {
 	ino_t res;
 
+	/*
+	 * Don't strictly need d_lock here? If the parent ino could change
+	 * then surely we'd have a deeper race in the caller?
+	 */
 	spin_lock(&dentry->d_lock);
 	res = dentry->d_parent->d_inode->i_ino;
 	spin_unlock(&dentry->d_lock);
Index: linux-2.6/fs/autofs4/autofs_i.h
===================================================================
--- linux-2.6.orig/fs/autofs4/autofs_i.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/autofs4/autofs_i.h	2010-11-17 00:52:37.000000000 +1100
@@ -16,6 +16,7 @@
 #include <linux/auto_fs4.h>
 #include <linux/auto_dev-ioctl.h>
 #include <linux/mutex.h>
+#include <linux/spinlock.h>
 #include <linux/list.h>
 
 /* This is the range of ioctl() numbers we claim as ours */
@@ -60,6 +61,8 @@ do {							\
 		current->pid, __func__, ##args);	\
 } while (0)
 
+extern spinlock_t autofs4_lock;
+
 /* Unified info structure.  This is pointed to by both the dentry and
    inode structures.  Each file in the filesystem has an instance of this
    structure.  It holds a reference to the dentry, so dentries are never
Index: linux-2.6/fs/autofs4/expire.c
===================================================================
--- linux-2.6.orig/fs/autofs4/expire.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/autofs4/expire.c	2010-11-17 00:52:37.000000000 +1100
@@ -102,7 +102,7 @@ static struct dentry *get_next_positive_
 	if (prev == NULL)
 		return dget(prev);
 
-	spin_lock(&dcache_lock);
+	spin_lock(&autofs4_lock);
 relock:
 	p = prev;
 	spin_lock(&p->d_lock);
@@ -114,7 +114,7 @@ static struct dentry *get_next_positive_
 
 			if (p == root) {
 				spin_unlock(&p->d_lock);
-				spin_unlock(&dcache_lock);
+				spin_unlock(&autofs4_lock);
 				dput(prev);
 				return NULL;
 			}
@@ -144,7 +144,7 @@ static struct dentry *get_next_positive_
 	dget_dlock(ret);
 	spin_unlock(&ret->d_lock);
 	spin_unlock(&p->d_lock);
-	spin_unlock(&dcache_lock);
+	spin_unlock(&autofs4_lock);
 
 	dput(prev);
 
@@ -408,13 +408,13 @@ struct dentry *autofs4_expire_indirect(s
 	ino->flags |= AUTOFS_INF_EXPIRING;
 	init_completion(&ino->expire_complete);
 	spin_unlock(&sbi->fs_lock);
-	spin_lock(&dcache_lock);
+	spin_lock(&autofs4_lock);
 	spin_lock(&expired->d_parent->d_lock);
 	spin_lock_nested(&expired->d_lock, DENTRY_D_LOCK_NESTED);
   	list_move(&expired->d_parent->d_subdirs, &expired->d_u.d_child);
 	spin_unlock(&expired->d_lock);
 	spin_unlock(&expired->d_parent->d_lock);
-	spin_unlock(&dcache_lock);
+	spin_unlock(&autofs4_lock);
 	return expired;
 }
 
Index: linux-2.6/fs/autofs4/root.c
===================================================================
--- linux-2.6.orig/fs/autofs4/root.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/autofs4/root.c	2010-11-17 00:52:37.000000000 +1100
@@ -23,6 +23,8 @@
 
 #include "autofs_i.h"
 
+DEFINE_SPINLOCK(autofs4_lock);
+
 static int autofs4_dir_symlink(struct inode *,struct dentry *,const char *);
 static int autofs4_dir_unlink(struct inode *,struct dentry *);
 static int autofs4_dir_rmdir(struct inode *,struct dentry *);
@@ -142,15 +144,15 @@ static int autofs4_dir_open(struct inode
 	 * autofs file system so just let the libfs routines handle
 	 * it.
 	 */
-	spin_lock(&dcache_lock);
+	spin_lock(&autofs4_lock);
 	spin_lock(&dentry->d_lock);
 	if (!d_mountpoint(dentry) && list_empty(&dentry->d_subdirs)) {
 		spin_unlock(&dentry->d_lock);
-		spin_unlock(&dcache_lock);
+		spin_unlock(&autofs4_lock);
 		return -ENOENT;
 	}
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
+	spin_unlock(&autofs4_lock);
 
 out:
 	return dcache_dir_open(inode, file);
@@ -255,11 +257,11 @@ static void *autofs4_follow_link(struct
 	/* We trigger a mount for almost all flags */
 	lookup_type = autofs4_need_mount(nd->flags);
 	spin_lock(&sbi->fs_lock);
-	spin_lock(&dcache_lock);
+	spin_lock(&autofs4_lock);
 	spin_lock(&dentry->d_lock);
 	if (!(lookup_type || ino->flags & AUTOFS_INF_PENDING)) {
 		spin_unlock(&dentry->d_lock);
-		spin_unlock(&dcache_lock);
+		spin_unlock(&autofs4_lock);
 		spin_unlock(&sbi->fs_lock);
 		goto follow;
 	}
@@ -272,7 +274,7 @@ static void *autofs4_follow_link(struct
 	if (ino->flags & AUTOFS_INF_PENDING ||
 	    (!d_mountpoint(dentry) && list_empty(&dentry->d_subdirs))) {
 		spin_unlock(&dentry->d_lock);
-		spin_unlock(&dcache_lock);
+		spin_unlock(&autofs4_lock);
 		spin_unlock(&sbi->fs_lock);
 
 		status = try_to_fill_dentry(dentry, nd->flags);
@@ -282,7 +284,7 @@ static void *autofs4_follow_link(struct
 		goto follow;
 	}
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
+	spin_unlock(&autofs4_lock);
 	spin_unlock(&sbi->fs_lock);
 follow:
 	/*
@@ -353,14 +355,14 @@ static int autofs4_revalidate(struct den
 		return 0;
 
 	/* Check for a non-mountpoint directory with no contents */
-	spin_lock(&dcache_lock);
+	spin_lock(&autofs4_lock);
 	spin_lock(&dentry->d_lock);
 	if (S_ISDIR(dentry->d_inode->i_mode) &&
 	    !d_mountpoint(dentry) && list_empty(&dentry->d_subdirs)) {
 		DPRINTK("dentry=%p %.*s, emptydir",
 			 dentry, dentry->d_name.len, dentry->d_name.name);
 		spin_unlock(&dentry->d_lock);
-		spin_unlock(&dcache_lock);
+		spin_unlock(&autofs4_lock);
 
 		/* The daemon never causes a mount to trigger */
 		if (oz_mode)
@@ -377,7 +379,7 @@ static int autofs4_revalidate(struct den
 		return status;
 	}
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
+	spin_unlock(&autofs4_lock);
 
 	return 1;
 }
@@ -432,7 +434,7 @@ static struct dentry *autofs4_lookup_act
 	const unsigned char *str = name->name;
 	struct list_head *p, *head;
 
-	spin_lock(&dcache_lock);
+	spin_lock(&autofs4_lock);
 	spin_lock(&sbi->lookup_lock);
 	head = &sbi->active_list;
 	list_for_each(p, head) {
@@ -465,14 +467,14 @@ static struct dentry *autofs4_lookup_act
 			dget_dlock(active);
 			spin_unlock(&active->d_lock);
 			spin_unlock(&sbi->lookup_lock);
-			spin_unlock(&dcache_lock);
+			spin_unlock(&autofs4_lock);
 			return active;
 		}
 next:
 		spin_unlock(&active->d_lock);
 	}
 	spin_unlock(&sbi->lookup_lock);
-	spin_unlock(&dcache_lock);
+	spin_unlock(&autofs4_lock);
 
 	return NULL;
 }
@@ -487,7 +489,7 @@ static struct dentry *autofs4_lookup_exp
 	const unsigned char *str = name->name;
 	struct list_head *p, *head;
 
-	spin_lock(&dcache_lock);
+	spin_lock(&autofs4_lock);
 	spin_lock(&sbi->lookup_lock);
 	head = &sbi->expiring_list;
 	list_for_each(p, head) {
@@ -520,14 +522,14 @@ static struct dentry *autofs4_lookup_exp
 			dget_dlock(expiring);
 			spin_unlock(&expiring->d_lock);
 			spin_unlock(&sbi->lookup_lock);
-			spin_unlock(&dcache_lock);
+			spin_unlock(&autofs4_lock);
 			return expiring;
 		}
 next:
 		spin_unlock(&expiring->d_lock);
 	}
 	spin_unlock(&sbi->lookup_lock);
-	spin_unlock(&dcache_lock);
+	spin_unlock(&autofs4_lock);
 
 	return NULL;
 }
@@ -763,12 +765,12 @@ static int autofs4_dir_unlink(struct ino
 
 	dir->i_mtime = CURRENT_TIME;
 
-	spin_lock(&dcache_lock);
+	spin_lock(&autofs4_lock);
 	autofs4_add_expiring(dentry);
 	spin_lock(&dentry->d_lock);
 	__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
+	spin_unlock(&autofs4_lock);
 
 	return 0;
 }
@@ -785,20 +787,20 @@ static int autofs4_dir_rmdir(struct inod
 	if (!autofs4_oz_mode(sbi))
 		return -EACCES;
 
-	spin_lock(&dcache_lock);
+	spin_lock(&autofs4_lock);
 	spin_lock(&sbi->lookup_lock);
 	spin_lock(&dentry->d_lock);
 	if (!list_empty(&dentry->d_subdirs)) {
 		spin_unlock(&dentry->d_lock);
 		spin_unlock(&sbi->lookup_lock);
-		spin_unlock(&dcache_lock);
+		spin_unlock(&autofs4_lock);
 		return -ENOTEMPTY;
 	}
 	__autofs4_add_expiring(dentry);
 	spin_unlock(&sbi->lookup_lock);
 	__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
+	spin_unlock(&autofs4_lock);
 
 	if (atomic_dec_and_test(&ino->count)) {
 		p_ino = autofs4_dentry_ino(dentry->d_parent);
Index: linux-2.6/fs/autofs4/waitq.c
===================================================================
--- linux-2.6.orig/fs/autofs4/waitq.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/autofs4/waitq.c	2010-11-17 00:52:37.000000000 +1100
@@ -194,14 +194,15 @@ static int autofs4_getpath(struct autofs
 rename_retry:
 	buf = *name;
 	len = 0;
+
 	seq = read_seqbegin(&rename_lock);
 	rcu_read_lock();
-	spin_lock(&dcache_lock);
+	spin_lock(&autofs4_lock);
 	for (tmp = dentry ; tmp != root ; tmp = tmp->d_parent)
 		len += tmp->d_name.len + 1;
 
 	if (!len || --len > NAME_MAX) {
-		spin_unlock(&dcache_lock);
+		spin_unlock(&autofs4_lock);
 		rcu_read_unlock();
 		if (read_seqretry(&rename_lock, seq))
 			goto rename_retry;
@@ -217,7 +218,7 @@ static int autofs4_getpath(struct autofs
 		p -= tmp->d_name.len;
 		strncpy(p, tmp->d_name.name, tmp->d_name.len);
 	}
-	spin_unlock(&dcache_lock);
+	spin_unlock(&autofs4_lock);
 	rcu_read_unlock();
 	if (read_seqretry(&rename_lock, seq))
 		goto rename_retry;
Index: linux-2.6/drivers/infiniband/hw/qib/qib_fs.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/qib/qib_fs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/drivers/infiniband/hw/qib/qib_fs.c	2010-11-17 01:05:39.000000000 +1100
@@ -453,17 +453,14 @@ static int remove_file(struct dentry *pa
 		goto bail;
 	}
 
-	spin_lock(&dcache_lock);
 	spin_lock(&tmp->d_lock);
 	if (!(d_unhashed(tmp) && tmp->d_inode)) {
 		dget_locked_dlock(tmp);
 		__d_drop(tmp);
 		spin_unlock(&tmp->d_lock);
-		spin_unlock(&dcache_lock);
 		simple_unlink(parent->d_inode, tmp);
 	} else {
 		spin_unlock(&tmp->d_lock);
-		spin_unlock(&dcache_lock);
 	}
 
 	ret = 0;
Index: linux-2.6/fs/ceph/dir.c
===================================================================
--- linux-2.6.orig/fs/ceph/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ceph/dir.c	2010-11-17 00:52:37.000000000 +1100
@@ -111,7 +111,6 @@ static int __dcache_readdir(struct file
 	dout("__dcache_readdir %p at %llu (last %p)\n", dir, filp->f_pos,
 	     last);
 
-	spin_lock(&dcache_lock);
 	spin_lock(&parent->d_lock);
 
 	/* start at beginning? */
@@ -155,7 +154,6 @@ static int __dcache_readdir(struct file
 	dget_dlock(dentry);
 	spin_unlock(&dentry->d_lock);
 	spin_unlock(&parent->d_lock);
-	spin_unlock(&dcache_lock);
 
 	dout(" %llu (%llu) dentry %p %.*s %p\n", di->offset, filp->f_pos,
 	     dentry, dentry->d_name.len, dentry->d_name.name, dentry->d_inode);
@@ -181,21 +179,19 @@ static int __dcache_readdir(struct file
 
 	filp->f_pos++;
 
-	/* make sure a dentry wasn't dropped while we didn't have dcache_lock */
+	/* make sure a dentry wasn't dropped while we didn't have parent lock */
 	if (!ceph_i_test(dir, CEPH_I_COMPLETE)) {
 		dout(" lost I_COMPLETE on %p; falling back to mds\n", dir);
 		err = -EAGAIN;
 		goto out;
 	}
 
-	spin_lock(&dcache_lock);
 	spin_lock(&parent->d_lock);
 	p = p->prev;	/* advance to next dentry */
 	goto more;
 
 out_unlock:
 	spin_unlock(&parent->d_lock);
-	spin_unlock(&dcache_lock);
 out:
 	if (last)
 		dput(last);
Index: linux-2.6/fs/ceph/inode.c
===================================================================
--- linux-2.6.orig/fs/ceph/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ceph/inode.c	2010-11-17 00:52:37.000000000 +1100
@@ -828,7 +828,6 @@ static void ceph_set_dentry_offset(struc
 	di->offset = ceph_inode(inode)->i_max_offset++;
 	spin_unlock(&inode->i_lock);
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dir->d_lock);
 	spin_lock_nested(&dn->d_lock, DENTRY_D_LOCK_NESTED);
 	list_move(&dn->d_u.d_child, &dir->d_subdirs);
@@ -836,7 +835,6 @@ static void ceph_set_dentry_offset(struc
 	     dn->d_u.d_child.prev, dn->d_u.d_child.next);
 	spin_unlock(&dn->d_lock);
 	spin_unlock(&dir->d_lock);
-	spin_unlock(&dcache_lock);
 }
 
 /*
@@ -1219,13 +1217,11 @@ int ceph_readdir_prepopulate(struct ceph
 			goto retry_lookup;
 		} else {
 			/* reorder parent's d_subdirs */
-			spin_lock(&dcache_lock);
 			spin_lock(&parent->d_lock);
 			spin_lock_nested(&dn->d_lock, DENTRY_D_LOCK_NESTED);
 			list_move(&dn->d_u.d_child, &parent->d_subdirs);
 			spin_unlock(&dn->d_lock);
 			spin_unlock(&parent->d_lock);
-			spin_unlock(&dcache_lock);
 		}
 
 		di = dn->d_fsdata;
Index: linux-2.6/Documentation/filesystems/dentry-locking.txt
===================================================================
--- linux-2.6.orig/Documentation/filesystems/dentry-locking.txt	2010-11-17 00:50:48.000000000 +1100
+++ linux-2.6/Documentation/filesystems/dentry-locking.txt	2010-11-17 00:52:37.000000000 +1100
@@ -31,6 +31,7 @@ significant change is the way d_lookup t
 doesn't acquire the dcache_lock for this and rely on RCU to ensure
 that the dentry has not been *freed*.
 
+dcache_lock no longer exists, dentry locking is explained in fs/dcache.c
 
 Dcache locking details
 ======================
@@ -50,14 +51,12 @@ Safe lock-free look-up of dcache hash ta
 
 Dcache is a complex data structure with the hash table entries also
 linked together in other lists. In 2.4 kernel, dcache_lock protected
-all the lists. We applied RCU only on hash chain walking. The rest of
-the lists are still protected by dcache_lock.  Some of the important
-changes are :
+all the lists. RCU dentry hash walking works like this:
 
 1. The deletion from hash chain is done using hlist_del_rcu() macro
    which doesn't initialize next pointer of the deleted dentry and
    this allows us to walk safely lock-free while a deletion is
-   happening.
+   happening. This is a standard hlist_rcu iteration.
 
 2. Insertion of a dentry into the hash table is done using
    hlist_add_head_rcu() which take care of ordering the writes - the
@@ -66,19 +65,18 @@ the lists are still protected by dcache_
    which has since been replaced by hlist_for_each_entry_rcu(), while
    walking the hash chain. The only requirement is that all
    initialization to the dentry must be done before
-   hlist_add_head_rcu() since we don't have dcache_lock protection
-   while traversing the hash chain. This isn't different from the
-   existing code.
-
-3. The dentry looked up without holding dcache_lock by cannot be
-   returned for walking if it is unhashed. It then may have a NULL
-   d_inode or other bogosity since RCU doesn't protect the other
-   fields in the dentry. We therefore use a flag DCACHE_UNHASHED to
-   indicate unhashed dentries and use this in conjunction with a
-   per-dentry lock (d_lock). Once looked up without the dcache_lock,
-   we acquire the per-dentry lock (d_lock) and check if the dentry is
-   unhashed. If so, the look-up is failed. If not, the reference count
-   of the dentry is increased and the dentry is returned.
+   hlist_add_head_rcu() since we don't have lock protection
+   while traversing the hash chain.
+
+3. The dentry looked up without holding locks cannot be returned for
+   walking if it is unhashed. It then may have a NULL d_inode or other
+   bogosity since RCU doesn't protect the other fields in the dentry. We
+   therefore use a flag DCACHE_UNHASHED to indicate unhashed dentries
+   and use this in conjunction with a per-dentry lock (d_lock). Once
+   looked up without locks, we acquire the per-dentry lock (d_lock) and
+   check if the dentry is unhashed. If so, the look-up is failed. If not,
+   the reference count of the dentry is increased and the dentry is
+   returned.
 
 4. Once a dentry is looked up, it must be ensured during the path walk
    for that component it doesn't go away. In pre-2.5.10 code, this was
@@ -86,10 +84,10 @@ the lists are still protected by dcache_
    In some sense, dcache_rcu path walking looks like the pre-2.5.10
    version.
 
-5. All dentry hash chain updates must take the dcache_lock as well as
-   the per-dentry lock in that order. dput() does this to ensure that
-   a dentry that has just been looked up in another CPU doesn't get
-   deleted before dget() can be done on it.
+5. All dentry hash chain updates must take the per-dentry lock (see
+   fs/dcache.c). This excludes dput() to ensure that a dentry that has
+   been looked up concurrently does not get deleted before dget() can
+   take a ref.
 
 6. There are several ways to do reference counting of RCU protected
    objects. One such example is in ipv4 route cache where deferred
Index: linux-2.6/Documentation/filesystems/porting
===================================================================
--- linux-2.6.orig/Documentation/filesystems/porting	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/Documentation/filesystems/porting	2010-11-17 00:52:37.000000000 +1100
@@ -216,7 +216,6 @@ had ->revalidate()) add calls in ->follo
 ->d_parent changes are not protected by BKL anymore.  Read access is safe
 if at least one of the following is true:
 	* filesystem has no cross-directory rename()
-	* dcache_lock is held
 	* we know that parent had been locked (e.g. we are looking at
 ->d_parent of ->lookup() argument).
 	* we are called from ->rename().
@@ -341,3 +340,10 @@ look at examples of other filesystems) f
 changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
 look at examples of other filesystems) for guidance.
 
+---
+[mandatory]
+	dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c
+for details of what locks to replace dcache_lock with in order to protect
+particular things. Most of the time, a filesystem only needs ->d_lock, which
+protects *all* the dcache state of a given dentry.
+
Index: linux-2.6/include/linux/namei.h
===================================================================
--- linux-2.6.orig/include/linux/namei.h	2010-11-17 00:50:47.000000000 +1100
+++ linux-2.6/include/linux/namei.h	2010-11-17 00:52:37.000000000 +1100
@@ -41,7 +41,6 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LA
  *  - require a directory
  *  - ending slashes ok even for nonexistent files
  *  - internal "there are more path components" flag
- *  - locked when lookup done with dcache_lock held
  *  - dentry cache is untrusted; force a real lookup
  */
 #define LOOKUP_FOLLOW		 1
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2010-11-17 00:50:48.000000000 +1100
+++ linux-2.6/mm/filemap.c	2010-11-17 00:52:37.000000000 +1100
@@ -102,9 +102,6 @@
  *    ->inode_lock		(zap_pte_range->set_page_dirty)
  *    ->private_lock		(zap_pte_range->__set_page_dirty_buffers)
  *
- *  ->task->proc_lock
- *    ->dcache_lock		(proc_pid_lookup)
- *
  *  (code doesn't rely on that order, so you could switch it around)
  *  ->tasklist_lock             (memory_failure, collect_procs_ao)
  *    ->i_mmap_lock
Index: linux-2.6/fs/9p/vfs_inode.c
===================================================================
--- linux-2.6.orig/fs/9p/vfs_inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/9p/vfs_inode.c	2010-11-17 00:52:37.000000000 +1100
@@ -270,13 +270,11 @@ static struct dentry *v9fs_dentry_from_d
 {
 	struct dentry *dentry;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	/* Directory should have only one entry. */
 	BUG_ON(S_ISDIR(inode->i_mode) && !list_is_singular(&inode->i_dentry));
 	dentry = list_entry(inode->i_dentry.next, struct dentry, d_alias);
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 	return dentry;
 }
 
Index: linux-2.6/fs/cifs/inode.c
===================================================================
--- linux-2.6.orig/fs/cifs/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/cifs/inode.c	2010-11-17 00:52:37.000000000 +1100
@@ -804,17 +804,14 @@ inode_has_hashed_dentries(struct inode *
 {
 	struct dentry *dentry;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dcache_inode_lock);
 	list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
 		if (!d_unhashed(dentry) || IS_ROOT(dentry)) {
 			spin_unlock(&dcache_inode_lock);
-			spin_unlock(&dcache_lock);
 			return true;
 		}
 	}
 	spin_unlock(&dcache_inode_lock);
-	spin_unlock(&dcache_lock);
 	return false;
 }
 
Index: linux-2.6/drivers/staging/smbfs/cache.c
===================================================================
--- linux-2.6.orig/drivers/staging/smbfs/cache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/drivers/staging/smbfs/cache.c	2010-11-17 01:05:39.000000000 +1100
@@ -62,7 +62,6 @@ smb_invalidate_dircache_entries(struct d
 	struct list_head *next;
 	struct dentry *dentry;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
@@ -72,7 +71,6 @@ smb_invalidate_dircache_entries(struct d
 		next = next->next;
 	}
 	spin_unlock(&parent->d_lock);
-	spin_unlock(&dcache_lock);
 }
 
 /*
@@ -98,7 +96,6 @@ smb_dget_fpos(struct dentry *dentry, str
 	}
 
 	/* If a pointer is invalid, we search the dentry. */
-	spin_lock(&dcache_lock);
 	spin_lock(&parent->d_lock);
 	next = parent->d_subdirs.next;
 	while (next != &parent->d_subdirs) {
@@ -115,7 +112,6 @@ smb_dget_fpos(struct dentry *dentry, str
 	dent = NULL;
 out_unlock:
 	spin_unlock(&parent->d_lock);
-	spin_unlock(&dcache_lock);
 	return dent;
 }
 
Index: linux-2.6/fs/libfs.c
===================================================================
--- linux-2.6.orig/fs/libfs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/libfs.c	2010-11-17 00:52:37.000000000 +1100
@@ -100,7 +100,6 @@ loff_t dcache_dir_lseek(struct file *fil
 			struct dentry *cursor = file->private_data;
 			loff_t n = file->f_pos - 2;
 
-			spin_lock(&dcache_lock);
 			spin_lock(&dentry->d_lock);
 			/* d_lock not required for cursor */
 			list_del(&cursor->d_u.d_child);
@@ -116,7 +115,6 @@ loff_t dcache_dir_lseek(struct file *fil
 			}
 			list_add_tail(&cursor->d_u.d_child, p);
 			spin_unlock(&dentry->d_lock);
-			spin_unlock(&dcache_lock);
 		}
 	}
 	mutex_unlock(&dentry->d_inode->i_mutex);
@@ -159,7 +157,6 @@ int dcache_readdir(struct file * filp, v
 			i++;
 			/* fallthrough */
 		default:
-			spin_lock(&dcache_lock);
 			spin_lock(&dentry->d_lock);
 			if (filp->f_pos == 2)
 				list_move(q, &dentry->d_subdirs);
@@ -175,13 +172,11 @@ int dcache_readdir(struct file * filp, v
 
 				spin_unlock(&next->d_lock);
 				spin_unlock(&dentry->d_lock);
-				spin_unlock(&dcache_lock);
 				if (filldir(dirent, next->d_name.name, 
 					    next->d_name.len, filp->f_pos, 
 					    next->d_inode->i_ino, 
 					    dt_type(next->d_inode)) < 0)
 					return 0;
-				spin_lock(&dcache_lock);
 				spin_lock(&dentry->d_lock);
 				spin_lock_nested(&next->d_lock, DENTRY_D_LOCK_NESTED);
 				/* next is still alive */
@@ -191,7 +186,6 @@ int dcache_readdir(struct file * filp, v
 				filp->f_pos++;
 			}
 			spin_unlock(&dentry->d_lock);
-			spin_unlock(&dcache_lock);
 	}
 	return 0;
 }
@@ -285,7 +279,6 @@ int simple_empty(struct dentry *dentry)
 	struct dentry *child;
 	int ret = 0;
 
-	spin_lock(&dcache_lock);
 	spin_lock(&dentry->d_lock);
 	list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
 		spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
@@ -298,7 +291,6 @@ int simple_empty(struct dentry *dentry)
 	ret = 1;
 out:
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_lock);
 	return ret;
 }
 
Index: linux-2.6/include/linux/fsnotify.h
===================================================================
--- linux-2.6.orig/include/linux/fsnotify.h	2010-11-17 00:50:48.000000000 +1100
+++ linux-2.6/include/linux/fsnotify.h	2010-11-17 00:52:37.000000000 +1100
@@ -17,7 +17,6 @@
 
 /*
  * fsnotify_d_instantiate - instantiate a dentry for inode
- * Called with dcache_lock held.
  */
 static inline void fsnotify_d_instantiate(struct dentry *dentry,
 					  struct inode *inode)
@@ -62,7 +61,6 @@ static inline int fsnotify_perm(struct f
 
 /*
  * fsnotify_d_move - dentry has been moved
- * Called with dcache_lock and dentry->d_lock held.
  */
 static inline void fsnotify_d_move(struct dentry *dentry)
 {



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 19/28] fs: dcache avoid starvation in dcache multi-step operations
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (17 preceding siblings ...)
  2010-11-16 14:09 ` [patch 18/28] fs: dcache remove dcache_lock Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 20/28] fs: dcache reduce dput locking Nick Piggin
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: dcache-multi-step-nostarve.patch --]
[-- Type: text/plain, Size: 4109 bytes --]

Long lived dcache "multi-step" operations which retry on rename seq can
be starved with a lot of rename activity. If they fail after the 1st pass,
take the rename_lock for writing to avoid further starvation.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |   56 ++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 42 insertions(+), 14 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:40.000000000 +1100
@@ -937,10 +937,11 @@ int have_submounts(struct dentry *parent
 	struct dentry *this_parent;
 	struct list_head *next;
 	unsigned seq;
+	int locked = 0;
 
-rename_retry:
-	this_parent = parent;
 	seq = read_seqbegin(&rename_lock);
+again:
+	this_parent = parent;
 
 	if (d_mountpoint(parent))
 		goto positive;
@@ -985,7 +986,7 @@ int have_submounts(struct dentry *parent
 		/* might go back up the wrong parent if we have had a rename
 		 * or deletion */
 		if (this_parent != child->d_parent ||
-				read_seqretry(&rename_lock, seq)) {
+			 (!locked && read_seqretry(&rename_lock, seq))) {
 			spin_unlock(&this_parent->d_lock);
 			rcu_read_unlock();
 			goto rename_retry;
@@ -995,13 +996,22 @@ int have_submounts(struct dentry *parent
 		goto resume;
 	}
 	spin_unlock(&this_parent->d_lock);
-	if (read_seqretry(&rename_lock, seq))
+	if (!locked && read_seqretry(&rename_lock, seq))
 		goto rename_retry;
+	if (locked)
+		write_sequnlock(&rename_lock);
 	return 0; /* No mount points found in tree */
 positive:
-	if (read_seqretry(&rename_lock, seq))
+	if (!locked && read_seqretry(&rename_lock, seq))
 		goto rename_retry;
+	if (locked)
+		write_sequnlock(&rename_lock);
 	return 1;
+
+rename_retry:
+	locked = 1;
+	write_seqlock(&rename_lock);
+	goto again;
 }
 EXPORT_SYMBOL(have_submounts);
 
@@ -1025,11 +1035,11 @@ static int select_parent(struct dentry *
 	struct list_head *next;
 	unsigned seq;
 	int found = 0;
+	int locked = 0;
 
-rename_retry:
-	this_parent = parent;
 	seq = read_seqbegin(&rename_lock);
-
+again:
+	this_parent = parent;
 	spin_lock(&this_parent->d_lock);
 repeat:
 	next = this_parent->d_subdirs.next;
@@ -1091,7 +1101,7 @@ static int select_parent(struct dentry *
 		/* might go back up the wrong parent if we have had a rename
 		 * or deletion */
 		if (this_parent != child->d_parent ||
-				read_seqretry(&rename_lock, seq)) {
+			(!locked && read_seqretry(&rename_lock, seq))) {
 			spin_unlock(&this_parent->d_lock);
 			rcu_read_unlock();
 			goto rename_retry;
@@ -1102,9 +1112,18 @@ static int select_parent(struct dentry *
 	}
 out:
 	spin_unlock(&this_parent->d_lock);
-	if (read_seqretry(&rename_lock, seq))
+	if (!locked && read_seqretry(&rename_lock, seq))
 		goto rename_retry;
+	if (locked)
+		write_sequnlock(&rename_lock);
 	return found;
+
+rename_retry:
+	if (found)
+		return found;
+	locked = 1;
+	write_seqlock(&rename_lock);
+	goto again;
 }
 
 /**
@@ -2602,10 +2621,11 @@ void d_genocide(struct dentry *root)
 	struct dentry *this_parent;
 	struct list_head *next;
 	unsigned seq;
+	int locked = 0;
 
-rename_retry:
-	this_parent = root;
 	seq = read_seqbegin(&rename_lock);
+again:
+	this_parent = root;
 	spin_lock(&this_parent->d_lock);
 repeat:
 	next = this_parent->d_subdirs.next;
@@ -2650,7 +2670,7 @@ void d_genocide(struct dentry *root)
 		/* might go back up the wrong parent if we have had a rename
 		 * or deletion */
 		if (this_parent != child->d_parent ||
-				read_seqretry(&rename_lock, seq)) {
+			 (!locked && read_seqretry(&rename_lock, seq))) {
 			spin_unlock(&this_parent->d_lock);
 			rcu_read_unlock();
 			goto rename_retry;
@@ -2660,8 +2680,16 @@ void d_genocide(struct dentry *root)
 		goto resume;
 	}
 	spin_unlock(&this_parent->d_lock);
-	if (read_seqretry(&rename_lock, seq))
+	if (!locked && read_seqretry(&rename_lock, seq))
 		goto rename_retry;
+	if (locked)
+		write_sequnlock(&rename_lock);
+	return;
+
+rename_retry:
+	locked = 1;
+	write_seqlock(&rename_lock);
+	goto again;
 }
 
 /**



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 20/28] fs: dcache reduce dput locking
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (18 preceding siblings ...)
  2010-11-16 14:09 ` [patch 19/28] fs: dcache avoid starvation in dcache multi-step operations Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 21/28] fs: dcache reduce locking in d_alloc Nick Piggin
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: dcache-dput-less-dcache_lock.patch --]
[-- Type: text/plain, Size: 2402 bytes --]

It is possible to run dput without taking data structure locks up-front. In
many cases where we don't kill the dentry anyway, these locks are not required.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |   56 +++++++++++++++++++++++++-------------------------------
 1 file changed, 25 insertions(+), 31 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:40.000000000 +1100
@@ -283,35 +283,16 @@ void dput(struct dentry *dentry)
 	if (dentry->d_count == 1)
 		might_sleep();
 	spin_lock(&dentry->d_lock);
-	if (IS_ROOT(dentry))
-		parent = NULL;
-	else
-		parent = dentry->d_parent;
-	if (dentry->d_count == 1) {
-		if (!spin_trylock(&dcache_inode_lock)) {
-drop2:
-			spin_unlock(&dentry->d_lock);
-			goto repeat;
-		}
-		if (parent && !spin_trylock(&parent->d_lock)) {
-			spin_unlock(&dcache_inode_lock);
-			goto drop2;
-		}
-	}
-	dentry->d_count--;
-	if (dentry->d_count) {
+	BUG_ON(!dentry->d_count);
+	if (dentry->d_count > 1) {
+		dentry->d_count--;
 		spin_unlock(&dentry->d_lock);
-		if (parent)
-			spin_unlock(&parent->d_lock);
-		return;
-	}
+ 		return;
+ 	}
 
-	/*
-	 * AV: ->d_delete() is _NOT_ allowed to block now.
-	 */
 	if (dentry->d_op && dentry->d_op->d_delete) {
 		if (dentry->d_op->d_delete(dentry))
-			goto unhash_it;
+			goto kill_it;
 	}
 
 	/* Unreachable? Get rid of it */
@@ -322,17 +303,30 @@ void dput(struct dentry *dentry)
 	dentry->d_flags |= DCACHE_REFERENCED;
 	dentry_lru_add(dentry);
 
- 	spin_unlock(&dentry->d_lock);
-	if (parent)
-		spin_unlock(&parent->d_lock);
-	spin_unlock(&dcache_inode_lock);
+	dentry->d_count--;
+	spin_unlock(&dentry->d_lock);
 	return;
 
-unhash_it:
-	__d_drop(dentry);
 kill_it:
+	if (!spin_trylock(&dcache_inode_lock)) {
+relock:
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
+		goto repeat;
+	}
+	if (IS_ROOT(dentry))
+		parent = NULL;
+	else
+		parent = dentry->d_parent;
+	if (parent && !spin_trylock(&parent->d_lock)) {
+		spin_unlock(&dcache_inode_lock);
+		goto relock;
+	}
+	dentry->d_count--;
 	/* if dentry was on the d_lru list delete it from there */
 	dentry_lru_del(dentry);
+	/* if it was on the hash (d_delete case), then remove it */
+	__d_drop(dentry);
 	dentry = d_kill(dentry, parent);
 	if (dentry)
 		goto repeat;



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 21/28] fs: dcache reduce locking in d_alloc
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (19 preceding siblings ...)
  2010-11-16 14:09 ` [patch 20/28] fs: dcache reduce dput locking Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 22/28] fs: dcache reduce dcache_inode_lock Nick Piggin
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache-d_alloc-less-d_lock.patch --]
[-- Type: text/plain, Size: 825 bytes --]

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:39.000000000 +1100
@@ -1220,11 +1220,13 @@ struct dentry *d_alloc(struct dentry * p
 
 	if (parent) {
 		spin_lock(&parent->d_lock);
-		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+		/*
+		 * don't need child lock because it is not subject
+		 * to concurrency here
+		 */
 		dentry->d_parent = dget_dlock(parent);
 		dentry->d_sb = parent->d_sb;
 		list_add(&dentry->d_u.d_child, &parent->d_subdirs);
-		spin_unlock(&dentry->d_lock);
 		spin_unlock(&parent->d_lock);
 	}
 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 22/28] fs: dcache reduce dcache_inode_lock
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (20 preceding siblings ...)
  2010-11-16 14:09 ` [patch 21/28] fs: dcache reduce locking in d_alloc Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 23/28] fs: dcache rationalise dget variants Nick Piggin
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dcache-d_delete-less-lock.patch --]
[-- Type: text/plain, Size: 2067 bytes --]

dcache_inode_lock can be avoided in d_delete() and d_materialise_unique()
in cases where it is not required.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:39.000000000 +1100
@@ -1794,10 +1794,15 @@ void d_delete(struct dentry * dentry)
 	/*
 	 * Are we the only user?
 	 */
-	spin_lock(&dcache_inode_lock);
+again:
 	spin_lock(&dentry->d_lock);
 	isdir = S_ISDIR(dentry->d_inode->i_mode);
 	if (dentry->d_count == 1) {
+		if (!spin_trylock(&dcache_inode_lock)) {
+			spin_unlock(&dentry->d_lock);
+			cpu_relax();
+			goto again;
+		}
 		dentry->d_flags &= ~DCACHE_CANT_MOUNT;
 		dentry_iput(dentry);
 		fsnotify_nameremove(dentry, isdir);
@@ -1808,7 +1813,6 @@ void d_delete(struct dentry * dentry)
 		__d_drop(dentry);
 
 	spin_unlock(&dentry->d_lock);
-	spin_unlock(&dcache_inode_lock);
 
 	fsnotify_nameremove(dentry, isdir);
 }
@@ -2106,14 +2110,15 @@ struct dentry *d_materialise_unique(stru
 
 	BUG_ON(!d_unhashed(dentry));
 
-	spin_lock(&dcache_inode_lock);
-
 	if (!inode) {
 		actual = dentry;
 		__d_instantiate(dentry, NULL);
-		goto found_lock;
+		d_rehash(actual);
+		goto out_nolock;
 	}
 
+	spin_lock(&dcache_inode_lock);
+
 	if (S_ISDIR(inode->i_mode)) {
 		struct dentry *alias;
 
@@ -2145,10 +2150,9 @@ struct dentry *d_materialise_unique(stru
 	actual = __d_instantiate_unique(dentry, inode);
 	if (!actual)
 		actual = dentry;
-	else if (unlikely(!d_unhashed(actual)))
-		goto shouldnt_be_hashed;
+	else
+		BUG_ON(!d_unhashed(actual));
 
-found_lock:
 	spin_lock(&actual->d_lock);
 found:
 	spin_lock(&dcache_hash_lock);
@@ -2164,10 +2168,6 @@ struct dentry *d_materialise_unique(stru
 
 	iput(inode);
 	return actual;
-
-shouldnt_be_hashed:
-	spin_unlock(&dcache_inode_lock);
-	BUG();
 }
 EXPORT_SYMBOL_GPL(d_materialise_unique);
 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 23/28] fs: dcache rationalise dget variants
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (21 preceding siblings ...)
  2010-11-16 14:09 ` [patch 22/28] fs: dcache reduce dcache_inode_lock Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 24/28] fs: dcache reduce d_parent locking Nick Piggin
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: dcache-rationalize-dget.patch --]
[-- Type: text/plain, Size: 11150 bytes --]

dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, how that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 arch/powerpc/platforms/cell/spufs/inode.c |    2 -
 drivers/infiniband/hw/ipath/ipath_fs.c    |    2 -
 drivers/infiniband/hw/qib/qib_fs.c        |    2 -
 drivers/staging/smbfs/cache.c             |    2 -
 fs/configfs/inode.c                       |    2 -
 fs/dcache.c                               |   34 ++++++++----------------------
 fs/exportfs/expfs.c                       |    2 -
 fs/ncpfs/dir.c                            |    2 -
 fs/ocfs2/dcache.c                         |    2 -
 include/linux/dcache.h                    |   15 ++-----------
 kernel/cgroup.c                           |    2 -
 security/selinux/selinuxfs.c              |    2 -
 12 files changed, 23 insertions(+), 46 deletions(-)

Index: linux-2.6/arch/powerpc/platforms/cell/spufs/inode.c
===================================================================
--- linux-2.6.orig/arch/powerpc/platforms/cell/spufs/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/arch/powerpc/platforms/cell/spufs/inode.c	2010-11-17 00:52:38.000000000 +1100
@@ -161,7 +161,7 @@ static void spufs_prune_dir(struct dentr
 	list_for_each_entry_safe(dentry, tmp, &dir->d_subdirs, d_u.d_child) {
 		spin_lock(&dentry->d_lock);
 		if (!(d_unhashed(dentry)) && dentry->d_inode) {
-			dget_locked_dlock(dentry);
+			dget_dlock(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			simple_unlink(dir->d_inode, dentry);
Index: linux-2.6/drivers/infiniband/hw/ipath/ipath_fs.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/ipath/ipath_fs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/drivers/infiniband/hw/ipath/ipath_fs.c	2010-11-17 00:52:38.000000000 +1100
@@ -279,7 +279,7 @@ static int remove_file(struct dentry *pa
 
 	spin_lock(&tmp->d_lock);
 	if (!(d_unhashed(tmp) && tmp->d_inode)) {
-		dget_locked_dlock(tmp);
+		dget_dlock(tmp);
 		__d_drop(tmp);
 		spin_unlock(&tmp->d_lock);
 		simple_unlink(parent->d_inode, tmp);
Index: linux-2.6/fs/configfs/inode.c
===================================================================
--- linux-2.6.orig/fs/configfs/inode.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/configfs/inode.c	2010-11-17 00:52:38.000000000 +1100
@@ -252,7 +252,7 @@ void configfs_drop_dentry(struct configf
 	if (dentry) {
 		spin_lock(&dentry->d_lock);
 		if (!(d_unhashed(dentry) && dentry->d_inode)) {
-			dget_locked_dlock(dentry);
+			dget_dlock(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			simple_unlink(parent->d_inode, dentry);
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:38.000000000 +1100
@@ -389,32 +389,17 @@ int d_invalidate(struct dentry * dentry)
 EXPORT_SYMBOL(d_invalidate);
 
 /* This must be called with d_lock held */
-static inline struct dentry * __dget_locked_dlock(struct dentry *dentry)
+static inline void __dget_dlock(struct dentry *dentry)
 {
 	dentry->d_count++;
-	dentry_lru_del(dentry);
-	return dentry;
 }
 
-/* This must be called with d_lock held */
-static inline struct dentry * __dget_locked(struct dentry *dentry)
+static inline void __dget(struct dentry *dentry)
 {
 	spin_lock(&dentry->d_lock);
-	__dget_locked_dlock(dentry);
+	__dget_dlock(dentry);
 	spin_unlock(&dentry->d_lock);
-	return dentry;
-}
-
-struct dentry * dget_locked_dlock(struct dentry *dentry)
-{
-	return __dget_locked_dlock(dentry);
-}
-
-struct dentry * dget_locked(struct dentry *dentry)
-{
-	return __dget_locked(dentry);
 }
-EXPORT_SYMBOL(dget_locked);
 
 struct dentry *dget_parent(struct dentry *dentry)
 {
@@ -495,7 +480,7 @@ static struct dentry *__d_find_alias(str
 	struct dentry *alias;
 	alias = ___d_find_alias(inode, want_discon);
 	if (alias) {
-		__dget_locked_dlock(alias);
+		__dget_dlock(alias);
 		spin_unlock(&alias->d_lock);
 	}
 	return alias;
@@ -526,7 +511,7 @@ void d_prune_aliases(struct inode *inode
 	list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
 		spin_lock(&dentry->d_lock);
 		if (!dentry->d_count) {
-			__dget_locked_dlock(dentry);
+			__dget_dlock(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_inode_lock);
@@ -1224,7 +1209,8 @@ struct dentry *d_alloc(struct dentry * p
 		 * don't need child lock because it is not subject
 		 * to concurrency here
 		 */
-		dentry->d_parent = dget_dlock(parent);
+		__dget_dlock(parent);
+		dentry->d_parent = parent;
 		dentry->d_sb = parent->d_sb;
 		list_add(&dentry->d_u.d_child, &parent->d_subdirs);
 		spin_unlock(&parent->d_lock);
@@ -1323,7 +1309,7 @@ static struct dentry *__d_instantiate_un
 			goto next;
 		if (memcmp(qstr->name, name, len))
 			goto next;
-		dget_locked_dlock(alias);
+		__dget_dlock(alias);
 		spin_unlock(&alias->d_lock);
 		return alias;
 next:
@@ -1579,7 +1565,7 @@ struct dentry *d_add_ci(struct dentry *d
 	 * reference to it, move it in place and use it.
 	 */
 	new = list_entry(inode->i_dentry.next, struct dentry, d_alias);
-	dget_locked(new);
+	__dget(new);
 	spin_unlock(&dcache_inode_lock);
 	security_d_instantiate(found, inode);
 	d_move(new, found);
@@ -1755,7 +1741,7 @@ int d_validate(struct dentry *dentry, st
 	list_for_each_entry(child, &dparent->d_subdirs, d_u.d_child) {
 		if (dentry == child) {
 			spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-			__dget_locked_dlock(dentry);
+			__dget_dlock(dentry);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dparent->d_lock);
 			return 1;
Index: linux-2.6/fs/exportfs/expfs.c
===================================================================
--- linux-2.6.orig/fs/exportfs/expfs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/exportfs/expfs.c	2010-11-17 00:52:38.000000000 +1100
@@ -49,7 +49,7 @@ find_acceptable_alias(struct dentry *res
 
 	spin_lock(&dcache_inode_lock);
 	list_for_each_entry(dentry, &result->d_inode->i_dentry, d_alias) {
-		dget_locked(dentry);
+		dget(dentry);
 		spin_unlock(&dcache_inode_lock);
 		if (toput)
 			dput(toput);
Index: linux-2.6/fs/ncpfs/dir.c
===================================================================
--- linux-2.6.orig/fs/ncpfs/dir.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ncpfs/dir.c	2010-11-17 00:52:38.000000000 +1100
@@ -400,7 +400,7 @@ ncp_dget_fpos(struct dentry *dentry, str
 		dent = list_entry(next, struct dentry, d_u.d_child);
 		if ((unsigned long)dent->d_fsdata == fpos) {
 			if (dent->d_inode)
-				dget_locked(dent);
+				dget(dent);
 			else
 				dent = NULL;
 			spin_unlock(&parent->d_lock);
Index: linux-2.6/fs/ocfs2/dcache.c
===================================================================
--- linux-2.6.orig/fs/ocfs2/dcache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/fs/ocfs2/dcache.c	2010-11-17 00:52:38.000000000 +1100
@@ -178,7 +178,7 @@ struct dentry *ocfs2_find_local_alias(st
 			mlog(0, "dentry found: %.*s\n",
 			     dentry->d_name.len, dentry->d_name.name);
 
-			dget_locked_dlock(dentry);
+			dget_dlock(dentry);
 			spin_unlock(&dentry->d_lock);
 			break;
 		}
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/include/linux/dcache.h	2010-11-17 00:52:38.000000000 +1100
@@ -317,23 +317,17 @@ extern char *dentry_path(struct dentry *
 /* Allocation counts.. */
 
 /**
- *	dget, dget_locked	-	get a reference to a dentry
+ *	dget, dget_dlock -	get a reference to a dentry
  *	@dentry: dentry to get a reference to
  *
  *	Given a dentry or %NULL pointer increment the reference count
  *	if appropriate and return the dentry. A dentry will not be 
- *	destroyed when it has references. dget() should never be
- *	called for dentries with zero reference counter. For these cases
- *	(preferably none, functions in dcache.c are sufficient for normal
- *	needs and they take necessary precautions) you should hold d_lock
- *	and call dget_dlock() instead of dget().
+ *	destroyed when it has references.
  */
 static inline struct dentry *dget_dlock(struct dentry *dentry)
 {
-	if (dentry) {
-		BUG_ON(!dentry->d_count);
+	if (dentry)
 		dentry->d_count++;
-	}
 	return dentry;
 }
 
@@ -347,9 +341,6 @@ static inline struct dentry *dget(struct
 	return dentry;
 }
 
-extern struct dentry * dget_locked(struct dentry *);
-extern struct dentry * dget_locked_dlock(struct dentry *);
-
 extern struct dentry *dget_parent(struct dentry *dentry);
 
 /**
Index: linux-2.6/kernel/cgroup.c
===================================================================
--- linux-2.6.orig/kernel/cgroup.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/kernel/cgroup.c	2010-11-17 00:52:38.000000000 +1100
@@ -885,7 +885,7 @@ static void cgroup_clear_directory(struc
 			/* This should never be called on a cgroup
 			 * directory with child cgroups */
 			BUG_ON(d->d_inode->i_mode & S_IFDIR);
-			dget_locked_dlock(d);
+			dget_dlock(d);
 			spin_unlock(&d->d_lock);
 			spin_unlock(&dentry->d_lock);
 			d_delete(d);
Index: linux-2.6/security/selinux/selinuxfs.c
===================================================================
--- linux-2.6.orig/security/selinux/selinuxfs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/security/selinux/selinuxfs.c	2010-11-17 00:52:38.000000000 +1100
@@ -1154,7 +1154,7 @@ static void sel_remove_entries(struct de
 		list_del_init(node);
 
 		if (d->d_inode) {
-			dget_locked_dlock(d);
+			dget_dlock(d);
 			spin_unlock(&de->d_lock);
 			spin_unlock(&d->d_lock);
 			d_delete(d);
Index: linux-2.6/drivers/infiniband/hw/qib/qib_fs.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/qib/qib_fs.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/drivers/infiniband/hw/qib/qib_fs.c	2010-11-17 00:52:38.000000000 +1100
@@ -455,7 +455,7 @@ static int remove_file(struct dentry *pa
 
 	spin_lock(&tmp->d_lock);
 	if (!(d_unhashed(tmp) && tmp->d_inode)) {
-		dget_locked_dlock(tmp);
+		dget_dlock(tmp);
 		__d_drop(tmp);
 		spin_unlock(&tmp->d_lock);
 		simple_unlink(parent->d_inode, tmp);
Index: linux-2.6/drivers/staging/smbfs/cache.c
===================================================================
--- linux-2.6.orig/drivers/staging/smbfs/cache.c	2010-11-17 00:52:37.000000000 +1100
+++ linux-2.6/drivers/staging/smbfs/cache.c	2010-11-17 00:52:38.000000000 +1100
@@ -102,7 +102,7 @@ smb_dget_fpos(struct dentry *dentry, str
 		dent = list_entry(next, struct dentry, d_u.d_child);
 		if ((unsigned long)dent->d_fsdata == fpos) {
 			if (dent->d_inode)
-				dget_locked(dent);
+				dget(dent);
 			else
 				dent = NULL;
 			goto out_unlock;



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 24/28] fs: dcache reduce d_parent locking
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (22 preceding siblings ...)
  2010-11-16 14:09 ` [patch 23/28] fs: dcache rationalise dget variants Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 25/28] fs: dcache reduce prune_one_dentry locking Nick Piggin
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: fs-dget_parent-opt.patch --]
[-- Type: text/plain, Size: 1237 bytes --]

Use RCU to simplify locking in dget_parent.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |   25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:38.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:38.000000000 +1100
@@ -406,24 +406,27 @@ struct dentry *dget_parent(struct dentry
 	struct dentry *ret;
 
 repeat:
-	spin_lock(&dentry->d_lock);
+	/*
+	 * Don't need rcu_dereference because we re-check it was correct under
+	 * the lock.
+	 */
+	rcu_read_lock();
 	ret = dentry->d_parent;
-	if (!ret)
-		goto out;
-	if (dentry == ret) {
-		ret->d_count++;
-		goto out;
-	}
-	if (!spin_trylock(&ret->d_lock)) {
-		spin_unlock(&dentry->d_lock);
-		cpu_relax();
+	if (!ret) {
+		rcu_read_unlock();
+ 		goto out;
+ 	}
+	spin_lock(&ret->d_lock);
+	if (unlikely(ret != dentry->d_parent)) {
+		spin_unlock(&ret->d_lock);
+		rcu_read_unlock();
 		goto repeat;
 	}
+	rcu_read_unlock();
 	BUG_ON(!ret->d_count);
 	ret->d_count++;
 	spin_unlock(&ret->d_lock);
 out:
-	spin_unlock(&dentry->d_lock);
 	return ret;
 }
 EXPORT_SYMBOL(dget_parent);



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 25/28] fs: dcache reduce prune_one_dentry locking
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (23 preceding siblings ...)
  2010-11-16 14:09 ` [patch 24/28] fs: dcache reduce d_parent locking Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 26/28] fs: reduce dcache_inode_lock width in lru scanning Nick Piggin
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: dcache-prune_one_dentry-less-lock.patch --]
[-- Type: text/plain, Size: 1562 bytes --]

prune_one_dentry can avoid quite a bit of locking in the common case where
ancestors have an elevated refcount. Alternatively, we could have gone the
other way and made fewer trylocks in the case where d_count goes to zero, but
is probably less common.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |   27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:38.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:38.000000000 +1100
@@ -547,26 +547,29 @@ static void prune_one_dentry(struct dent
 	 * Prune ancestors.
 	 */
 	while (dentry) {
-		spin_lock(&dcache_inode_lock);
-again:
+relock:
 		spin_lock(&dentry->d_lock);
+		if (dentry->d_count > 1) {
+			dentry->d_count--;
+			spin_unlock(&dentry->d_lock);
+	 		return;
+	 	}
+		if (!spin_trylock(&dcache_inode_lock)) {
+relock2:
+			spin_unlock(&dentry->d_lock);
+			cpu_relax();
+			goto relock;
+		}
+
 		if (IS_ROOT(dentry))
 			parent = NULL;
 		else
 			parent = dentry->d_parent;
 		if (parent && !spin_trylock(&parent->d_lock)) {
-			spin_unlock(&dentry->d_lock);
-			goto again;
-		}
-		dentry->d_count--;
-		if (dentry->d_count) {
-			if (parent)
-				spin_unlock(&parent->d_lock);
-			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_inode_lock);
-			return;
+			goto relock2;
 		}
-
+		dentry->d_count--;
 		dentry_lru_del(dentry);
 		__d_drop(dentry);
 		dentry = d_kill(dentry, parent);



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 26/28] fs: reduce dcache_inode_lock width in lru scanning
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (24 preceding siblings ...)
  2010-11-16 14:09 ` [patch 25/28] fs: dcache reduce prune_one_dentry locking Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 27/28] fs: use RCU in shrink_dentry_list to reduce lock nesting Nick Piggin
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: dcache-lru-scanning-less-lock.patch --]
[-- Type: text/plain, Size: 2148 bytes --]

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:38.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:37.000000000 +1100
@@ -586,7 +586,7 @@ static void shrink_dentry_list(struct li
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 
 		if (!spin_trylock(&dentry->d_lock)) {
-relock:
+relock1:
 			spin_unlock(&dcache_lru_lock);
 			cpu_relax();
 			spin_lock(&dcache_lru_lock);
@@ -603,20 +603,24 @@ static void shrink_dentry_list(struct li
 			spin_unlock(&dentry->d_lock);
 			continue;
 		}
+		if (!spin_trylock(&dcache_inode_lock)) {
+relock2:
+			spin_unlock(&dentry->d_lock);
+			goto relock1;
+		}
 		if (IS_ROOT(dentry))
 			parent = NULL;
 		else
 			parent = dentry->d_parent;
 		if (parent && !spin_trylock(&parent->d_lock)) {
-			spin_unlock(&dentry->d_lock);
-			goto relock;
+			spin_unlock(&dcache_inode_lock);
+			goto relock2;
 		}
 		__dentry_lru_del(dentry);
 		spin_unlock(&dcache_lru_lock);
 
 		prune_one_dentry(dentry, parent);
 		/* dcache_inode_lock and dentry->d_lock dropped */
-		spin_lock(&dcache_inode_lock);
 		spin_lock(&dcache_lru_lock);
 	}
 }
@@ -637,7 +641,6 @@ static void __shrink_dcache_sb(struct su
 	LIST_HEAD(tmp);
 	int cnt = *count;
 
-	spin_lock(&dcache_inode_lock);
 relock:
 	spin_lock(&dcache_lru_lock);
 	while (!list_empty(&sb->s_dentry_lru)) {
@@ -675,7 +678,6 @@ static void __shrink_dcache_sb(struct su
 	if (!list_empty(&referenced))
 		list_splice(&referenced, &sb->s_dentry_lru);
 	spin_unlock(&dcache_lru_lock);
-	spin_unlock(&dcache_inode_lock);
 }
 
 /**
@@ -764,14 +766,12 @@ void shrink_dcache_sb(struct super_block
 {
 	LIST_HEAD(tmp);
 
-	spin_lock(&dcache_inode_lock);
 	spin_lock(&dcache_lru_lock);
 	while (!list_empty(&sb->s_dentry_lru)) {
 		list_splice_init(&sb->s_dentry_lru, &tmp);
 		shrink_dentry_list(&tmp);
 	}
 	spin_unlock(&dcache_lru_lock);
-	spin_unlock(&dcache_inode_lock);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 27/28] fs: use RCU in shrink_dentry_list to reduce lock nesting
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (25 preceding siblings ...)
  2010-11-16 14:09 ` [patch 26/28] fs: reduce dcache_inode_lock width in lru scanning Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-16 14:09 ` [patch 28/28] fs: consolidate dentry kill sequence Nick Piggin
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: dcache-lru-scanning-rcu.patch --]
[-- Type: text/plain, Size: 2552 bytes --]

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |   38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:38.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 01:05:37.000000000 +1100
@@ -580,16 +580,16 @@ static void shrink_dentry_list(struct li
 {
 	struct dentry *dentry;
 
+	rcu_read_lock();
 	while (!list_empty(list)) {
 		struct dentry *parent;
 
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 
-		if (!spin_trylock(&dentry->d_lock)) {
-relock1:
-			spin_unlock(&dcache_lru_lock);
-			cpu_relax();
-			spin_lock(&dcache_lru_lock);
+		/* Don't need RCU dereference because we recheck under lock */
+		spin_lock(&dentry->d_lock);
+		if (dentry != list_entry(list->prev, struct dentry, d_lru)) {
+			spin_unlock(&dentry->d_lock);
 			continue;
 		}
 
@@ -599,14 +599,16 @@ static void shrink_dentry_list(struct li
 		 * it - just keep it off the LRU list.
 		 */
 		if (dentry->d_count) {
-			__dentry_lru_del(dentry);
+			dentry_lru_del(dentry);
 			spin_unlock(&dentry->d_lock);
 			continue;
 		}
+
 		if (!spin_trylock(&dcache_inode_lock)) {
-relock2:
+relock:
 			spin_unlock(&dentry->d_lock);
-			goto relock1;
+			cpu_relax();
+			continue;
 		}
 		if (IS_ROOT(dentry))
 			parent = NULL;
@@ -614,15 +616,15 @@ static void shrink_dentry_list(struct li
 			parent = dentry->d_parent;
 		if (parent && !spin_trylock(&parent->d_lock)) {
 			spin_unlock(&dcache_inode_lock);
-			goto relock2;
+			goto relock;
 		}
-		__dentry_lru_del(dentry);
-		spin_unlock(&dcache_lru_lock);
+		dentry_lru_del(dentry);
 
+		rcu_read_unlock();
 		prune_one_dentry(dentry, parent);
-		/* dcache_inode_lock and dentry->d_lock dropped */
-		spin_lock(&dcache_lru_lock);
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 }
 
 /**
@@ -671,13 +673,13 @@ static void __shrink_dcache_sb(struct su
 				break;
 		}
 	}
-
-	*count = cnt;
-	shrink_dentry_list(&tmp);
-
 	if (!list_empty(&referenced))
 		list_splice(&referenced, &sb->s_dentry_lru);
 	spin_unlock(&dcache_lru_lock);
+
+	shrink_dentry_list(&tmp);
+
+	*count = cnt;
 }
 
 /**
@@ -769,7 +771,9 @@ void shrink_dcache_sb(struct super_block
 	spin_lock(&dcache_lru_lock);
 	while (!list_empty(&sb->s_dentry_lru)) {
 		list_splice_init(&sb->s_dentry_lru, &tmp);
+		spin_unlock(&dcache_lru_lock);
 		shrink_dentry_list(&tmp);
+		spin_lock(&dcache_lru_lock);
 	}
 	spin_unlock(&dcache_lru_lock);
 }



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [patch 28/28] fs: consolidate dentry kill sequence
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (26 preceding siblings ...)
  2010-11-16 14:09 ` [patch 27/28] fs: use RCU in shrink_dentry_list to reduce lock nesting Nick Piggin
@ 2010-11-16 14:09 ` Nick Piggin
  2010-11-17  2:12 ` [patch 00/28] [rfc] dcache scaling part 1 Dave Chinner
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-16 14:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel

[-- Attachment #1: dcache-consolidate-dentry-kill.patch --]
[-- Type: text/plain, Size: 5723 bytes --]

The tricky locking for disposing of a dentry is duplicated 3 times in the
dcache (dput, pruning a dentry from the LRU, and pruning its ancestors).
Consolidate them all into a single function dentry_kill.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

---
 fs/dcache.c |  137 +++++++++++++++++++++++++++---------------------------------
 1 file changed, 62 insertions(+), 75 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2010-11-17 00:52:38.000000000 +1100
+++ linux-2.6/fs/dcache.c	2010-11-17 00:52:38.000000000 +1100
@@ -244,6 +244,40 @@ static struct dentry *d_kill(struct dent
 	return parent;
 }
 
+/*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static inline struct dentry *dentry_kill(struct dentry *dentry, int ref)
+	__releases(dentry->d_lock)
+{
+	struct dentry *parent;
+
+	if (!spin_trylock(&dcache_inode_lock)) {
+relock:
+		spin_unlock(&dentry->d_lock);
+		cpu_relax();
+		return dentry; /* try again with same dentry */
+	}
+	if (IS_ROOT(dentry))
+		parent = NULL;
+	else
+		parent = dentry->d_parent;
+	if (parent && !spin_trylock(&parent->d_lock)) {
+		spin_unlock(&dcache_inode_lock);
+		goto relock;
+	}
+	if (ref)
+		dentry->d_count--;
+	/* if dentry was on the d_lru list delete it from there */
+	dentry_lru_del(dentry);
+	/* if it was on the hash then remove it */
+	__d_drop(dentry);
+	return d_kill(dentry, parent);
+}
+
 /* 
  * This is dput
  *
@@ -269,13 +303,9 @@ static struct dentry *d_kill(struct dent
  * call the dentry unlink method as well as removing it from the queues and
  * releasing its resources. If the parent dentries were scheduled for release
  * they too may now get deleted.
- *
- * no dcache lock, please.
  */
-
 void dput(struct dentry *dentry)
 {
-	struct dentry *parent;
 	if (!dentry)
 		return;
 
@@ -308,26 +338,7 @@ void dput(struct dentry *dentry)
 	return;
 
 kill_it:
-	if (!spin_trylock(&dcache_inode_lock)) {
-relock:
-		spin_unlock(&dentry->d_lock);
-		cpu_relax();
-		goto repeat;
-	}
-	if (IS_ROOT(dentry))
-		parent = NULL;
-	else
-		parent = dentry->d_parent;
-	if (parent && !spin_trylock(&parent->d_lock)) {
-		spin_unlock(&dcache_inode_lock);
-		goto relock;
-	}
-	dentry->d_count--;
-	/* if dentry was on the d_lru list delete it from there */
-	dentry_lru_del(dentry);
-	/* if it was on the hash (d_delete case), then remove it */
-	__d_drop(dentry);
-	dentry = d_kill(dentry, parent);
+	dentry = dentry_kill(dentry, 1);
 	if (dentry)
 		goto repeat;
 }
@@ -528,51 +539,43 @@ void d_prune_aliases(struct inode *inode
 EXPORT_SYMBOL(d_prune_aliases);
 
 /*
- * Throw away a dentry - free the inode, dput the parent.  This requires that
- * the LRU list has already been removed.
+ * Try to throw away a dentry - free the inode, dput the parent.
+ * Requires dentry->d_lock is held, and dentry->d_count == 0.
+ * Releases dentry->d_lock.
  *
- * Try to prune ancestors as well.  This is necessary to prevent
- * quadratic behavior of shrink_dcache_parent(), but is also expected
- * to be beneficial in reducing dentry cache fragmentation.
+ * This may fail if locks cannot be acquired no problem, just try again.
  */
-static void prune_one_dentry(struct dentry *dentry, struct dentry *parent)
+static void try_prune_one_dentry(struct dentry *dentry)
 	__releases(dentry->d_lock)
-	__releases(parent->d_lock)
-	__releases(dcache_inode_lock)
 {
-	__d_drop(dentry);
-	dentry = d_kill(dentry, parent);
+	struct dentry *parent;
 
+	parent = dentry_kill(dentry, 0);
 	/*
-	 * Prune ancestors.
+	 * If dentry_kill returns NULL, we have nothing more to do.
+	 * if it returns the same dentry, trylocks failed. In either
+	 * case, just loop again.
+	 *
+	 * Otherwise, we need to prune ancestors too. This is necessary
+	 * to prevent quadratic behavior of shrink_dcache_parent(), but
+	 * is also expected to be beneficial in reducing dentry cache
+	 * fragmentation.
 	 */
+	if (!parent)
+		return;
+	if (parent == dentry)
+		return;
+
+	/* Prune ancestors. */
+	dentry = parent;
 	while (dentry) {
-relock:
 		spin_lock(&dentry->d_lock);
 		if (dentry->d_count > 1) {
 			dentry->d_count--;
 			spin_unlock(&dentry->d_lock);
-	 		return;
-	 	}
-		if (!spin_trylock(&dcache_inode_lock)) {
-relock2:
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-			goto relock;
+			return;
 		}
-
-		if (IS_ROOT(dentry))
-			parent = NULL;
-		else
-			parent = dentry->d_parent;
-		if (parent && !spin_trylock(&parent->d_lock)) {
-			spin_unlock(&dcache_inode_lock);
-			goto relock2;
-		}
-		dentry->d_count--;
-		dentry_lru_del(dentry);
-		__d_drop(dentry);
-		dentry = d_kill(dentry, parent);
+		dentry = dentry_kill(dentry, 1);
 	}
 }
 
@@ -582,8 +585,6 @@ static void shrink_dentry_list(struct li
 
 	rcu_read_lock();
 	while (!list_empty(list)) {
-		struct dentry *parent;
-
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 
 		/* Don't need RCU dereference because we recheck under lock */
@@ -604,24 +605,10 @@ static void shrink_dentry_list(struct li
 			continue;
 		}
 
-		if (!spin_trylock(&dcache_inode_lock)) {
-relock:
-			spin_unlock(&dentry->d_lock);
-			cpu_relax();
-			continue;
-		}
-		if (IS_ROOT(dentry))
-			parent = NULL;
-		else
-			parent = dentry->d_parent;
-		if (parent && !spin_trylock(&parent->d_lock)) {
-			spin_unlock(&dcache_inode_lock);
-			goto relock;
-		}
-		dentry_lru_del(dentry);
-
 		rcu_read_unlock();
-		prune_one_dentry(dentry, parent);
+
+		try_prune_one_dentry(dentry);
+
 		rcu_read_lock();
 	}
 	rcu_read_unlock();



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 04/28] fs: change d_delete semantics
  2010-11-16 14:09 ` [patch 04/28] fs: change d_delete semantics Nick Piggin
@ 2010-11-17  0:16   ` Tim Pepper
  0 siblings, 0 replies; 46+ messages in thread
From: Tim Pepper @ 2010-11-17  0:16 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel

On Wed 17 Nov at 01:09:04 +1100 npiggin@kernel.dk said:

>  Documentation/filesystems/vfs.txt |   27 +++++++++++++--------------
>  include/linux/dcache.h            |    6 +++---

...minor nit:

There are white space changes in here around d_hash() and d_compare().

> 
> Index: linux-2.6/Documentation/filesystems/vfs.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/filesystems/vfs.txt	2010-11-17 00:50:54.000000000 +1100
> +++ linux-2.6/Documentation/filesystems/vfs.txt	2010-11-17 01:05:50.000000000 +1100
> @@ -841,9 +841,9 @@ the VFS uses a default. As of kernel 2.6
> 
>  struct dentry_operations {
>  	int (*d_revalidate)(struct dentry *, struct nameidata *);
> -	int (*d_hash) (struct dentry *, struct qstr *);
> -	int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
> -	int (*d_delete)(struct dentry *);
> +	int (*d_hash)(struct dentry *, struct qstr *);
> +	int (*d_compare)(struct dentry *, struct qstr *, struct qstr *);
> +	int (*d_delete)(const struct dentry *);
>  	void (*d_release)(struct dentry *);
>  	void (*d_iput)(struct dentry *, struct inode *);
>  	char *(*d_dname)(struct dentry *, char *, int);

> Index: linux-2.6/include/linux/dcache.h
> ===================================================================
> --- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:04.000000000 +1100
> +++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:50.000000000 +1100
> @@ -133,9 +133,9 @@ enum dentry_d_lock_class
> 
>  struct dentry_operations {
>  	int (*d_revalidate)(struct dentry *, struct nameidata *);
> -	int (*d_hash) (struct dentry *, struct qstr *);
> -	int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
> -	int (*d_delete)(struct dentry *);
> +	int (*d_hash)(struct dentry *, struct qstr *);
> +	int (*d_compare)(struct dentry *, struct qstr *, struct qstr *);
> +	int (*d_delete)(const struct dentry *);
>  	void (*d_release)(struct dentry *);
>  	void (*d_iput)(struct dentry *, struct inode *);
>  	char *(*d_dname)(struct dentry *, char *, int);

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 07/28] fs: change d_compare for rcu-walk
  2010-11-16 14:09 ` [patch 07/28] fs: change d_compare for rcu-walk Nick Piggin
@ 2010-11-17  0:44   ` Tim Pepper
  0 siblings, 0 replies; 46+ messages in thread
From: Tim Pepper @ 2010-11-17  0:44 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel

On Wed 17 Nov at 01:09:07 +1100 npiggin@kernel.dk said:

> Index: linux-2.6/include/linux/dcache.h
> ===================================================================
> --- linux-2.6.orig/include/linux/dcache.h	2010-11-17 00:52:37.000000000 +1100
> +++ linux-2.6/include/linux/dcache.h	2010-11-17 01:05:48.000000000 +1100
> @@ -134,7 +134,9 @@ enum dentry_d_lock_class
>  struct dentry_operations {
>  	int (*d_revalidate)(struct dentry *, struct nameidata *);
>  	int (*d_hash)(struct dentry *, struct qstr *);
> -	int (*d_compare)(struct dentry *, struct qstr *, struct qstr *);
> +	int (*d_compare)(const struct dentry *,
> +			const struct dentry *, const struct inode *,
> +			unsigned int, const char *, const struct qstr *);
>  	int (*d_delete)(const struct dentry *);
>  	void (*d_release)(struct dentry *);
>  	void (*d_iput)(struct dentry *, struct inode *);

The white space change I'd previously noted may as well happen here
if you're wanting it...just keeps the revision history cleaner for
future eyes.

> Index: linux-2.6/Documentation/filesystems/vfs.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/filesystems/vfs.txt	2010-11-17 00:52:37.000000000 +1100
> +++ linux-2.6/Documentation/filesystems/vfs.txt	2010-11-17 01:05:49.000000000 +1100
> @@ -842,7 +842,9 @@ the VFS uses a default. As of kernel 2.6
>  struct dentry_operations {
>  	int (*d_revalidate)(struct dentry *, struct nameidata *);
>  	int (*d_hash)(struct dentry *, struct qstr *);
> -	int (*d_compare)(struct dentry *, struct qstr *, struct qstr *);
> +	int (*d_compare)(const struct dentry *,
> +			const struct dentry *, const struct inode *,
> +			unsigned int, const char *, const struct qstr *);
>  	int (*d_delete)(const struct dentry *);
>  	void (*d_release)(struct dentry *);
>  	void (*d_iput)(struct dentry *, struct inode *);

Ditto.

> @@ -854,9 +856,26 @@ struct dentry_operations {
>  	dcache. Most filesystems leave this as NULL, because all their
>  	dentries in the dcache are valid
> 
> -  d_hash: called when the VFS adds a dentry to the hash table
> +  d_hash: called when the VFS adds a dentry to the hash table. The first
> +	dentry passed to d_hash is the parent directory that the name is
> + 	to be hashed into.

Similarly it'd be cleaner if this was in the next patch which is changing
d_hash().  Same for the d_hash() prototype white space changes.

> 
> -  d_compare: called when a dentry should be compared with another
> +  d_compare: called to compare a dentry name with a given name. The first
> +	dentry is the parent of the dentry to be compared, the second is
> +	the dentry itself. inode, len, and name string are properties of
> +	the dentry to be compared. qstr is the name to compare it with.
> +
> +	Must be constant and idempotent, and should not take locks if
> +	possible, and should not or store into the dentry or inodes.
                           ^^^^^^^^^^^^^^^^
Verb missing?  Or extra conjunction?

> +	Should not dereference pointers outside the dentry or inodes without
> +	lots of care (eg.  d_parent, d_inode shouldn't be used).
> +
> +	However, our vfsmount is pinned, and RCU held, so the dentries and
> +	inodes won't disappear, neither will our sb or filesystem module.
> +	->i_sb and ->d_sb may be used.
> +
> +	It is a tricky calling convention because it needs to be called under
> +	"rcu-walk", ie. without any locks or references on things.
> 
>    d_delete: called when the last reference to a dentry is dropped and the
>  	dcache is deciding whether or not to cache it. Return 1 to delete

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 08/28] fs: change d_hash for rcu-walk
  2010-11-16 14:09 ` [patch 08/28] fs: change d_hash " Nick Piggin
@ 2010-11-17  0:50   ` Tim Pepper
  0 siblings, 0 replies; 46+ messages in thread
From: Tim Pepper @ 2010-11-17  0:50 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel

On Wed 17 Nov at 01:09:08 +1100 npiggin@kernel.dk said:
> Change d_hash so it may be called from lock-free RCU lookups. See similar
> patch for d_compare for details.
> 
> For in-tree filesystems, this is just a mechanical change.
> 
> Signed-off-by: Nick Piggin <npiggin@kernel.dk>
> 
> ---
>  Documentation/filesystems/Locking |    5 +++--
>  Documentation/filesystems/porting |    7 +++++++
>  Documentation/filesystems/vfs.txt |    8 ++++++--
>  fs/dcache.c                       |    2 +-
>  include/linux/dcache.h            |    3 ++-

And maybe I'm lazy or something, but it seems like it'd make the reviewing
easier if the above core hunks were formatted in the email before/above
the pages and pages of per fs changes.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 00/28] [rfc] dcache scaling part 1
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (27 preceding siblings ...)
  2010-11-16 14:09 ` [patch 28/28] fs: consolidate dentry kill sequence Nick Piggin
@ 2010-11-17  2:12 ` Dave Chinner
  2010-11-17 10:56 ` Andi Kleen
  2010-11-19 19:43 ` Tim Pepper
  30 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2010-11-17  2:12 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel

On Wed, Nov 17, 2010 at 01:09:00AM +1100, Nick Piggin wrote:
> There are 3 main parts to dcache scaling. This one primarily adds new locks
> to take over dcache_lock, and some pre/post prep and streamlining patches.
> 
> The second implements fine grained locking, and is rather trivial after
> part 1.
> 
> The third implements rcu-walk. rcu-walk depends on the first part, because it
> relies on using d_lock to protect the state of the dentry (when converting from
> rcu-walk to refcounted walk). Without the fine grained locing, we'd need to use
> dcache_lock for that, which would be a step backwards to put into path walking
> again.
> 
> Comments?

Do you have a git tree somewhere with this series in it?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 01/28] fs: d_validate fixes
  2010-11-16 14:09 ` [patch 01/28] fs: d_validate fixes Nick Piggin
@ 2010-11-17 10:44   ` Andi Kleen
  2010-11-18 20:51   ` David Miller
  1 sibling, 0 replies; 46+ messages in thread
From: Andi Kleen @ 2010-11-17 10:44 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel, linux-nfs

Nick Piggin <npiggin@kernel.dk> writes:

> d_validate has been broken for a long time.
>
> kmem_ptr_validate does not guarantee that a pointer can be dereferenced
> if it can go away at any time. Even rcu_read_lock doesn't help, because
> the pointer might be queued in RCU callbacks but not executed yet.

I wonder if that is a problem for NFS ... (which I believe is the only
user). Could these races be used to break the NFS server?

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 00/28] [rfc] dcache scaling part 1
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (28 preceding siblings ...)
  2010-11-17  2:12 ` [patch 00/28] [rfc] dcache scaling part 1 Dave Chinner
@ 2010-11-17 10:56 ` Andi Kleen
  2010-11-17 11:19   ` Nick Piggin
  2010-11-19 19:43 ` Tim Pepper
  30 siblings, 1 reply; 46+ messages in thread
From: Andi Kleen @ 2010-11-17 10:56 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel

Nick Piggin <npiggin@kernel.dk> writes:

> There are 3 main parts to dcache scaling. This one primarily adds new locks
> to take over dcache_lock, and some pre/post prep and streamlining patches.
>
> The second implements fine grained locking, and is rather trivial after
> part 1.
>
> The third implements rcu-walk. rcu-walk depends on the first part, because it
> relies on using d_lock to protect the state of the dentry (when converting from
> rcu-walk to refcounted walk). Without the fine grained locing, we'd need to use
> dcache_lock for that, which would be a step backwards to put into path walking
> again.
>
> Comments?

I read 15, 10, 8, 5, 4, 3, 1 so far (weird order, it showed that way in
my reader :-) There was nothing surprising in any of those and they all
seem to do what the description advertises.

I was scared a bit by the upto 4 level dcache lock nestings, but I
assume those will get better again when everything is done.
At least from a quick look they seem to be all in the right order
(I assume you attempted some runtime coverage with lockdep too, right?)

For some of the hash lists it may become attractive to consider
the newly posted lockless list, but it wasn't fully clear
if that was easy to do (the lock protected a bit more than
just the list node)

For the level file system tree sweep changes it would be nice
if there were semantic patches available. That would make
it easier to verify the changes have been consistently
done, by rerunning the patcher.

You can add a Reviewed-by: Andi Kleen <ak@linux.intel.com>
to the patches listed above.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 00/28] [rfc] dcache scaling part 1
  2010-11-17 10:56 ` Andi Kleen
@ 2010-11-17 11:19   ` Nick Piggin
  2010-11-17 12:01     ` Andi Kleen
  0 siblings, 1 reply; 46+ messages in thread
From: Nick Piggin @ 2010-11-17 11:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Nick Piggin, linux-fsdevel, linux-kernel

On Wed, Nov 17, 2010 at 11:56:02AM +0100, Andi Kleen wrote:
> Nick Piggin <npiggin@kernel.dk> writes:
> 
> > There are 3 main parts to dcache scaling. This one primarily adds new locks
> > to take over dcache_lock, and some pre/post prep and streamlining patches.
> >
> > The second implements fine grained locking, and is rather trivial after
> > part 1.
> >
> > The third implements rcu-walk. rcu-walk depends on the first part, because it
> > relies on using d_lock to protect the state of the dentry (when converting from
> > rcu-walk to refcounted walk). Without the fine grained locing, we'd need to use
> > dcache_lock for that, which would be a step backwards to put into path walking
> > again.
> >
> > Comments?
> 
> I read 15, 10, 8, 5, 4, 3, 1 so far (weird order, it showed that way in
> my reader :-) There was nothing surprising in any of those and they all
> seem to do what the description advertises.

Thanks for looking.

 
> I was scared a bit by the upto 4 level dcache lock nestings, but I
> assume those will get better again when everything is done.
> At least from a quick look they seem to be all in the right order
> (I assume you attempted some runtime coverage with lockdep too, right?)

Yes of course, it's been finding bugs in other code, actually :P

Lock nesting is primarily an issue in killing a dentry, when it needs to
be taken out of several data structures at the same time. I haven't
tackled un-nesting these yet, because things become far more complex
when you start allowing new types of concurrency like that.

I found that with fine grained locking, lock contention was low enough
that it really didn't matter to take a few locks at once (obviously it's
far better than the existing dcache_lock situation).

If it does become a problem, or we want to unwind some of the locks for
any other reason, we'd probably have a DENTRY_TEARDOWN flag in the
dentry, and dput / reclaim can set it without holding other locks, and
then drop the d_lock and individually remove it from various lists.

Of course that requires audits in every list walker, and list_empty() etc
check everywhere, so it's not appropriate for initial patches.
 

> For some of the hash lists it may become attractive to consider
> the newly posted lockless list, but it wasn't fully clear
> if that was easy to do (the lock protected a bit more than
> just the list node)

Well the first part of the series doesn't fine grain the locks, for
the most part. After this is done, it's really pretty trivial to change
locking schemes for data structures.
 

> For the level file system tree sweep changes it would be nice
> if there were semantic patches available. That would make
> it easier to verify the changes have been consistently
> done, by rerunning the patcher.

I haven't looked at doing semantic patches. I suspect it might be
a bit too hard to write.

Really, *most* filesystems do not play silly games with dcache_lock,
and those that don't are pretty easy. Ones that do need more thought
than semantic patch could probably apply.

 
> You can add a Reviewed-by: Andi Kleen <ak@linux.intel.com>
> to the patches listed above.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 00/28] [rfc] dcache scaling part 1
  2010-11-17 11:19   ` Nick Piggin
@ 2010-11-17 12:01     ` Andi Kleen
  0 siblings, 0 replies; 46+ messages in thread
From: Andi Kleen @ 2010-11-17 12:01 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andi Kleen, linux-fsdevel, linux-kernel

> I haven't looked at doing semantic patches. I suspect it might be
> a bit too hard to write.
> 
> Really, *most* filesystems do not play silly games with dcache_lock,
> and those that don't are pretty easy. Ones that do need more thought
> than semantic patch could probably apply.

The advantage of using the semantic patch is that it catches
error paths and unusal constructions which are bad for grep better. 
It does not replace hand review and some hand modification of course.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 01/28] fs: d_validate fixes
  2010-11-16 14:09 ` [patch 01/28] fs: d_validate fixes Nick Piggin
  2010-11-17 10:44   ` Andi Kleen
@ 2010-11-18 20:51   ` David Miller
  2010-11-18 20:59     ` David Miller
  2010-11-19  5:01     ` Nick Piggin
  1 sibling, 2 replies; 46+ messages in thread
From: David Miller @ 2010-11-18 20:51 UTC (permalink / raw)
  To: npiggin; +Cc: linux-fsdevel, linux-kernel

From: Nick Piggin <npiggin@kernel.dk>
Date: Wed, 17 Nov 2010 01:09:01 +1100

> d_validate has been broken for a long time.
> 
> kmem_ptr_validate does not guarantee that a pointer can be dereferenced
> if it can go away at any time. Even rcu_read_lock doesn't help, because
> the pointer might be queued in RCU callbacks but not executed yet.
> 
> So the parent cannot be checked, nor the name hashed. The dentry pointer
> can not be touched until it can be verified under lock. Hashing simply
> cannot be used.
> 
> Instead, verify the parent/child relationship by traversing parent's
> d_child list. It's slow, but only ncpfs and the destaged smbfs care
> about it, at this point.
> 
> Signed-off-by: Nick Piggin <npiggin@kernel.dk>

This won't apply because is conflicts with Christoph Hellwig's
RCU conversion of d_validate().

Which is a change that went in more than a month ago.

Thus I'd really appreciate if you mentioned what tree your patches are
against in your "0/N" posting, always.

Thanks.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 01/28] fs: d_validate fixes
  2010-11-18 20:51   ` David Miller
@ 2010-11-18 20:59     ` David Miller
  2010-11-19  5:05       ` Nick Piggin
  2010-11-19  5:01     ` Nick Piggin
  1 sibling, 1 reply; 46+ messages in thread
From: David Miller @ 2010-11-18 20:59 UTC (permalink / raw)
  To: npiggin; +Cc: linux-fsdevel, linux-kernel

From: David Miller <davem@davemloft.net>
Date: Thu, 18 Nov 2010 12:51:23 -0800 (PST)

> From: Nick Piggin <npiggin@kernel.dk>
> Date: Wed, 17 Nov 2010 01:09:01 +1100
> 
>> d_validate has been broken for a long time.
>> 
>> kmem_ptr_validate does not guarantee that a pointer can be dereferenced
>> if it can go away at any time. Even rcu_read_lock doesn't help, because
>> the pointer might be queued in RCU callbacks but not executed yet.
>> 
>> So the parent cannot be checked, nor the name hashed. The dentry pointer
>> can not be touched until it can be verified under lock. Hashing simply
>> cannot be used.
>> 
>> Instead, verify the parent/child relationship by traversing parent's
>> d_child list. It's slow, but only ncpfs and the destaged smbfs care
>> about it, at this point.
>> 
>> Signed-off-by: Nick Piggin <npiggin@kernel.dk>
> 
> This won't apply because is conflicts with Christoph Hellwig's
> RCU conversion of d_validate().
> 
> Which is a change that went in more than a month ago.

In fact the conflicts of your patch set are even more pervasive, since
all dcache hash traversals are essentially RCU protected instead of
dcache_lock protected right now.

This makes it very difficult to test or analyze your patches for those
of us on mainline or similar.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 01/28] fs: d_validate fixes
  2010-11-18 20:51   ` David Miller
  2010-11-18 20:59     ` David Miller
@ 2010-11-19  5:01     ` Nick Piggin
  1 sibling, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-19  5:01 UTC (permalink / raw)
  To: David Miller; +Cc: npiggin, linux-fsdevel, linux-kernel

On Thu, Nov 18, 2010 at 12:51:23PM -0800, David Miller wrote:
> From: Nick Piggin <npiggin@kernel.dk>
> Date: Wed, 17 Nov 2010 01:09:01 +1100
> 
> > d_validate has been broken for a long time.
> > 
> > kmem_ptr_validate does not guarantee that a pointer can be dereferenced
> > if it can go away at any time. Even rcu_read_lock doesn't help, because
> > the pointer might be queued in RCU callbacks but not executed yet.
> > 
> > So the parent cannot be checked, nor the name hashed. The dentry pointer
> > can not be touched until it can be verified under lock. Hashing simply
> > cannot be used.
> > 
> > Instead, verify the parent/child relationship by traversing parent's
> > d_child list. It's slow, but only ncpfs and the destaged smbfs care
> > about it, at this point.
> > 
> > Signed-off-by: Nick Piggin <npiggin@kernel.dk>
> 
> This won't apply because is conflicts with Christoph Hellwig's
> RCU conversion of d_validate().
> 
> Which is a change that went in more than a month ago.

Sorry yeah I had a local revert that I didn't send out, otherwise
it should be a vanilla upstream kernel.

 
> Thus I'd really appreciate if you mentioned what tree your patches are
> against in your "0/N" posting, always.

Try to remember. I'll put some time into polishing up a git tree
and submitting the more interesting patches as well, this weekend.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 01/28] fs: d_validate fixes
  2010-11-18 20:59     ` David Miller
@ 2010-11-19  5:05       ` Nick Piggin
  0 siblings, 0 replies; 46+ messages in thread
From: Nick Piggin @ 2010-11-19  5:05 UTC (permalink / raw)
  To: David Miller; +Cc: npiggin, linux-fsdevel, linux-kernel

On Thu, Nov 18, 2010 at 12:59:13PM -0800, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Thu, 18 Nov 2010 12:51:23 -0800 (PST)
> 
> > From: Nick Piggin <npiggin@kernel.dk>
> > Date: Wed, 17 Nov 2010 01:09:01 +1100
> > 
> >> d_validate has been broken for a long time.
> >> 
> >> kmem_ptr_validate does not guarantee that a pointer can be dereferenced
> >> if it can go away at any time. Even rcu_read_lock doesn't help, because
> >> the pointer might be queued in RCU callbacks but not executed yet.
> >> 
> >> So the parent cannot be checked, nor the name hashed. The dentry pointer
> >> can not be touched until it can be verified under lock. Hashing simply
> >> cannot be used.
> >> 
> >> Instead, verify the parent/child relationship by traversing parent's
> >> d_child list. It's slow, but only ncpfs and the destaged smbfs care
> >> about it, at this point.
> >> 
> >> Signed-off-by: Nick Piggin <npiggin@kernel.dk>
> > 
> > This won't apply because is conflicts with Christoph Hellwig's
> > RCU conversion of d_validate().
> > 
> > Which is a change that went in more than a month ago.
> 
> In fact the conflicts of your patch set are even more pervasive, since
> all dcache hash traversals are essentially RCU protected instead of
> dcache_lock protected right now.

Not sure what you mean there. The patches are against upstream+revert of
the last d_validate patch.

dcache_lock splitup of this series is to split the lock out of all the
other paths, and importantly allow d_lock to protect the complete
dcache state of the dentry.

Next 2 steps (that depend on this series but not on each other) are
fine grained locking of the split locks, and rcu-walk. rcu-walk is what
I called store-free path walking, because we extend RCU not only to the
hash lookup but the entire path walk.

I'll get all that out when I get a bit of time to work on it again.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 13/28] fs: dcache scale d_unhashed
  2010-11-16 14:09 ` [patch 13/28] fs: dcache scale d_unhashed Nick Piggin
@ 2010-11-19 19:41   ` Tim Pepper
  0 siblings, 0 replies; 46+ messages in thread
From: Tim Pepper @ 2010-11-19 19:41 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel

On Tue, Nov 16, 2010 at 6:09 AM, Nick Piggin <npiggin@kernel.dk> wrote:

> @@ -1797,7 +1822,10 @@ static void d_move_locked(struct dentry
>        /*
>         * XXXX: do we really need to take target->d_lock?
>         */
> -       if (target < dentry) {
> +       if (d_ancestor(dentry, target)) {
> +               spin_lock(&dentry->d_lock);
> +               spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NESTED);
> +       } else if (d_ancestor(target, dentry) || target < dentry) {
>                spin_lock(&target->d_lock);
>                spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
>        } else {

This is the first hunk out of the series where I feel like reading the
new code makes me say "ugh".

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 14/28] fs: dcache scale subdirs
  2010-11-16 14:09 ` [patch 14/28] fs: dcache scale subdirs Nick Piggin
@ 2010-11-19 19:41   ` Tim Pepper
  0 siblings, 0 replies; 46+ messages in thread
From: Tim Pepper @ 2010-11-19 19:41 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel

On Tue, Nov 16, 2010 at 6:09 AM, Nick Piggin <npiggin@kernel.dk> wrote:

> Index: linux-2.6/fs/dcache.c
> ===================================================================
> --- linux-2.6.orig/fs/dcache.c  2010-11-17 00:52:37.000000000 +1100
> +++ linux-2.6/fs/dcache.c       2010-11-17 01:05:44.000000000 +1100

> @@ -217,24 +219,22 @@ static void dentry_lru_move_tail(struct
>  *
>  * If this is the root of the dentry tree, return NULL.
>  *
> - * dcache_lock and d_lock must be held by caller, are dropped by d_kill.
> + * dcache_lock and d_lock and d_parent->d_lock must be held by caller, and
> + * are dropped by d_kill.
>  */
> -static struct dentry *d_kill(struct dentry *dentry)
> +static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent)
>        __releases(dentry->d_lock)
> +       __releases(parent->d_lock)
>        __releases(dcache_lock)
>  {
> -       struct dentry *parent;
> -
>        list_del(&dentry->d_u.d_child);
> +       if (parent)
> +               spin_unlock(&parent->d_lock);
>        dentry_iput(dentry);
>        /*
>         * dentry_iput drops the locks, at which point nobody (except
>         * transient RCU lookups) can reach this dentry.
>         */
> -       if (IS_ROOT(dentry))
> -               parent = NULL;
> -       else
> -               parent = dentry->d_parent;
>        d_free(dentry);
>        return parent;
>  }
> @@ -270,6 +270,7 @@ static struct dentry *d_kill(struct dent
>
>  void dput(struct dentry *dentry)
>  {
> +       struct dentry *parent;
>        if (!dentry)
>                return;
>
> @@ -277,6 +278,10 @@ void dput(struct dentry *dentry)
>        if (dentry->d_count == 1)
>                might_sleep();
>        spin_lock(&dentry->d_lock);
> +       if (IS_ROOT(dentry))
> +               parent = NULL;
> +       else
> +               parent = dentry->d_parent;
>        if (dentry->d_count == 1) {
>                if (!spin_trylock(&dcache_lock)) {
>                        /*

Removed one and added three of the IS_ROOT() checks I think...Maybe
add a macro for setting the local parent pointer?  Ah never mind.  In
patch 28 they collapse back into a single instance.

In this patch the nested locking starts to feel a little more natural
to me.  The last number of patches in the series end up re-simplifying
a lot of intermediate complexity.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 15/28] fs: scale inode alias list
  2010-11-16 14:09 ` [patch 15/28] fs: scale inode alias list Nick Piggin
@ 2010-11-19 19:41   ` Tim Pepper
  0 siblings, 0 replies; 46+ messages in thread
From: Tim Pepper @ 2010-11-19 19:41 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel

On Tue, Nov 16, 2010 at 6:09 AM, Nick Piggin <npiggin@kernel.dk> wrote:

> @@ -290,13 +300,18 @@ void dput(struct dentry *dentry)
>                         * want to reduce dcache_lock anyway so this will
>                         * get improved.
>                         */
> +drop1:
>                        spin_unlock(&dentry->d_lock);
>                        goto repeat;
>                }
> -               if (parent && !spin_trylock(&parent->d_lock)) {
> -                       spin_unlock(&dentry->d_lock);
> +               if (!spin_trylock(&dcache_inode_lock)) {
> +drop2:
>                        spin_unlock(&dcache_lock);
> -                       goto repeat;
> +                       goto drop1;
> +               }
> +               if (parent && !spin_trylock(&parent->d_lock)) {
> +                       spin_unlock(&dcache_inode_lock);
> +                       goto drop2;
>                }
>        }
>        dentry->d_count--;

Another ugh moment.  But it gets better in patches 18 and 20.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 16/28] fs: Use rename lock and RCU for multi-step operations
  2010-11-16 14:09 ` [patch 16/28] fs: Use rename lock and RCU for multi-step operations Nick Piggin
@ 2010-11-19 19:42   ` Tim Pepper
  0 siblings, 0 replies; 46+ messages in thread
From: Tim Pepper @ 2010-11-19 19:42 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel

On Tue, Nov 16, 2010 at 6:09 AM, Nick Piggin <npiggin@kernel.dk> wrote:

> +               /* might go back up the wrong parent if we have had a rename
> +                * or deletion */
> +               if (this_parent != child->d_parent ||
> +                               read_seqretry(&rename_lock, seq)) {
> +                       spin_unlock(&this_parent->d_lock);
> +                       spin_unlock(&dcache_lock);
> +                       rcu_read_unlock();
> +                       goto rename_retry;
> +               }
> +               rcu_read_unlock();
> +               next = child->d_u.d_child.next;

Again there are something like three insertions of this check.  Right
now it arguably makes sense to have everything explicitly clear inline
at every instance of the code for review, but eventually it might be
more readable if these became something like:

if (rename_happened(...))
        goto rename_retry;

(accounting for the change in patch 19 of course)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [patch 00/28] [rfc] dcache scaling part 1
  2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
                   ` (29 preceding siblings ...)
  2010-11-17 10:56 ` Andi Kleen
@ 2010-11-19 19:43 ` Tim Pepper
  30 siblings, 0 replies; 46+ messages in thread
From: Tim Pepper @ 2010-11-19 19:43 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, linux-kernel

On Tue, Nov 16, 2010 at 6:09 AM, Nick Piggin <npiggin@kernel.dk> wrote:
> There are 3 main parts to dcache scaling. This one primarily adds new locks
> to take over dcache_lock, and some pre/post prep and streamlining patches.
>
> The second implements fine grained locking, and is rather trivial after
> part 1.
>
> The third implements rcu-walk. rcu-walk depends on the first part, because it
> relies on using d_lock to protect the state of the dentry (when converting from
> rcu-walk to refcounted walk). Without the fine grained locing, we'd need to use
> dcache_lock for that, which would be a step backwards to put into path walking
> again.

Reviewed-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2010-11-19 19:43 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-16 14:09 [patch 00/28] [rfc] dcache scaling part 1 Nick Piggin
2010-11-16 14:09 ` [patch 01/28] fs: d_validate fixes Nick Piggin
2010-11-17 10:44   ` Andi Kleen
2010-11-18 20:51   ` David Miller
2010-11-18 20:59     ` David Miller
2010-11-19  5:05       ` Nick Piggin
2010-11-19  5:01     ` Nick Piggin
2010-11-16 14:09 ` [patch 02/28] kernel: kmem_ptr_validate considered harmful Nick Piggin
2010-11-16 14:09 ` [patch 03/28] fs: dcache documentation cleanup Nick Piggin
2010-11-16 14:09 ` [patch 04/28] fs: change d_delete semantics Nick Piggin
2010-11-17  0:16   ` Tim Pepper
2010-11-16 14:09 ` [patch 05/28] cifs: dont overwrite dentry name in d_revalidate Nick Piggin
2010-11-16 14:09 ` [patch 06/28] jfs: " Nick Piggin
2010-11-16 14:09 ` [patch 07/28] fs: change d_compare for rcu-walk Nick Piggin
2010-11-17  0:44   ` Tim Pepper
2010-11-16 14:09 ` [patch 08/28] fs: change d_hash " Nick Piggin
2010-11-17  0:50   ` Tim Pepper
2010-11-16 14:09 ` [patch 09/28] hostfs: simplify locking Nick Piggin
2010-11-16 14:09 ` [patch 10/28] fs: dcache scale hash Nick Piggin
2010-11-16 14:09 ` [patch 11/28] fs: dcache scale lru Nick Piggin
2010-11-16 14:09 ` [patch 12/28] fs: dcache scale dentry refcount Nick Piggin
2010-11-16 14:09 ` [patch 13/28] fs: dcache scale d_unhashed Nick Piggin
2010-11-19 19:41   ` Tim Pepper
2010-11-16 14:09 ` [patch 14/28] fs: dcache scale subdirs Nick Piggin
2010-11-19 19:41   ` Tim Pepper
2010-11-16 14:09 ` [patch 15/28] fs: scale inode alias list Nick Piggin
2010-11-19 19:41   ` Tim Pepper
2010-11-16 14:09 ` [patch 16/28] fs: Use rename lock and RCU for multi-step operations Nick Piggin
2010-11-19 19:42   ` Tim Pepper
2010-11-16 14:09 ` [patch 17/28] fs: increase d_name lock coverage Nick Piggin
2010-11-16 14:09 ` [patch 18/28] fs: dcache remove dcache_lock Nick Piggin
2010-11-16 14:09 ` [patch 19/28] fs: dcache avoid starvation in dcache multi-step operations Nick Piggin
2010-11-16 14:09 ` [patch 20/28] fs: dcache reduce dput locking Nick Piggin
2010-11-16 14:09 ` [patch 21/28] fs: dcache reduce locking in d_alloc Nick Piggin
2010-11-16 14:09 ` [patch 22/28] fs: dcache reduce dcache_inode_lock Nick Piggin
2010-11-16 14:09 ` [patch 23/28] fs: dcache rationalise dget variants Nick Piggin
2010-11-16 14:09 ` [patch 24/28] fs: dcache reduce d_parent locking Nick Piggin
2010-11-16 14:09 ` [patch 25/28] fs: dcache reduce prune_one_dentry locking Nick Piggin
2010-11-16 14:09 ` [patch 26/28] fs: reduce dcache_inode_lock width in lru scanning Nick Piggin
2010-11-16 14:09 ` [patch 27/28] fs: use RCU in shrink_dentry_list to reduce lock nesting Nick Piggin
2010-11-16 14:09 ` [patch 28/28] fs: consolidate dentry kill sequence Nick Piggin
2010-11-17  2:12 ` [patch 00/28] [rfc] dcache scaling part 1 Dave Chinner
2010-11-17 10:56 ` Andi Kleen
2010-11-17 11:19   ` Nick Piggin
2010-11-17 12:01     ` Andi Kleen
2010-11-19 19:43 ` Tim Pepper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).