linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v5 00/20] nfsd: open file caching
@ 2015-10-05 11:02 Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 01/20] list_lru: add list_lru_rotate Jeff Layton
                   ` (13 more replies)
  0 siblings, 14 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

v5:
- switch to using flush_delayed_fput instead of __fput_sync
- hash on inode->i_ino instead of inode pointer
- add /proc/fs/nfsd/file_cache_stats file to track stats on the hash
- eliminate extra fh_verify in nfsd_file_acquire

v4:
- squash some of the patches down into one patch to reduce churn
- close cached open files after unlink instead of before
- don't just close files after nfsd does an unlink, must do it
  after any vfs-layer unlink. Use fsnotify to handle that.
- use a SRCU notifier chain for setlease
- add patch to allow non-kthreads to do a fput_sync

v3:
- open files are now hashed on inode pointer instead of fh
- eliminate the recurring workqueue job in favor of shrinker/LRU and
  notifier from lease setting code
- have nfsv4 use the cache as well
- removal of raparms cache

v2:
- changelog cleanups and clarifications
- allow COMMIT to use cached open files
- tracepoints for nfsd_file cache
- proactively close open files prior to REMOVE, or a RENAME over a
  positive dentry

This is the fifth iteration of the open file cache patches for nfsd.
The main changes from the v4 set are the conversion of the code to
use flush_delayed_fput instead of __fput_sync, and some changes to
improve performance.

The kbuild test robot noted a drop in performance with this set,
which turned out to be lousy hash distribution due to hashing on
inode pointer value. Hashing on inode->i_ino gives a much better
distribution.
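
As an illustration only: the thread does not say which hash function or
table size was in use, so the names and numbers below are assumptions.
The sketch shows one way pointer keys can cluster. Objects allocated from
the same pool tend to sit at addresses that differ by a fixed, aligned
stride, so their low bits are identical, while inode numbers are often
close to sequential.

```c
#include <stdint.h>

#define SKETCH_BUCKETS 256u	/* hypothetical table size */

/* Naive bucket choice that keeps only the low bits of the key. */
static unsigned int sketch_bucket(uint64_t key)
{
	return (unsigned int)(key % SKETCH_BUCKETS);
}

/*
 * Count how many distinct buckets are hit by n keys of the form
 * base + i * stride.
 */
static unsigned int sketch_buckets_used(uint64_t base, uint64_t stride,
					unsigned int n)
{
	unsigned char seen[SKETCH_BUCKETS] = { 0 };
	unsigned int used = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		unsigned int b = sketch_bucket(base + (uint64_t)i * stride);

		if (!seen[b]) {
			seen[b] = 1;
			used++;
		}
	}
	return used;
}
```

With a pointer-like base and a 1024-byte stride, every key lands in a
single bucket, whereas 256 consecutive inode-like numbers fill all 256
buckets. Real kernel hash helpers mix the bits, so the effect there would
be less extreme than in this toy.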

For those seeing this for the first time, the main impetus here is to help
speed up NFSv3 I/O. nfsd will do an open+read/write+close for every READ
or WRITE RPC. This patchset allows us to cache those open files more or
less indefinitely, and close them out in response to certain vfs-layer
activity (unlinks and setlease attempts primarily).

The first few patches in the series make (small) changes to several
subsystems to enable the caching infrastructure. The tenth patch adds
the cache itself, and then the remaining patches hook the nfsd code
up to the cache. The final patch rips out the raparms cache since it's
no longer needed with these changes.

Again, the most controversial part of the set is probably the changes
to allow normal user processes to use the delayed_fput infrastructure.
Al, if you could weigh in on those, then that would be helpful. We
really do need a way to allow a thread to flush the final fput work
without returning to userland.

Jeff Layton (20):
  list_lru: add list_lru_rotate
  fs: have flush_delayed_fput flush the workqueue job
  fs: add a kerneldoc header to fput
  fs: add fput_queue
  fs: export flush_delayed_fput
  fsnotify: export several symbols
  locks: create a new notifier chain for lease attempts
  nfsd: move include of state.h from trace.c to trace.h
  sunrpc: add a new cache_detail operation for when a cache is flushed
  nfsd: add a new struct file caching facility to nfsd
  nfsd: keep some rudimentary stats on nfsd_file cache
  nfsd: allow filecache open to skip fh_verify check
  nfsd: hook up nfsd_write to the new nfsd_file cache
  nfsd: hook up nfsd_read to the nfsd_file cache
  nfsd: hook nfsd_commit up to the nfsd_file cache
  nfsd: convert nfs4_file->fi_fds array to use nfsd_files
  nfsd: have nfsd_test_lock use the nfsd_file cache
  nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
  nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
  nfsd: rip out the raparms cache

 fs/file_table.c              |  76 +++++-
 fs/locks.c                   |  37 +++
 fs/nfsd/Kconfig              |   2 +
 fs/nfsd/Makefile             |   3 +-
 fs/nfsd/export.c             |  14 +
 fs/nfsd/filecache.c          | 613 +++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/filecache.h          |  38 +++
 fs/nfsd/nfs3proc.c           |   2 +-
 fs/nfsd/nfs4layouts.c        |  12 +-
 fs/nfsd/nfs4proc.c           |  32 +--
 fs/nfsd/nfs4state.c          | 174 ++++++------
 fs/nfsd/nfs4xdr.c            |  16 +-
 fs/nfsd/nfsctl.c             |  10 +
 fs/nfsd/nfsproc.c            |   2 +-
 fs/nfsd/nfssvc.c             |  16 +-
 fs/nfsd/state.h              |  10 +-
 fs/nfsd/trace.c              |   2 -
 fs/nfsd/trace.h              | 129 +++++++++
 fs/nfsd/vfs.c                | 269 +++++--------------
 fs/nfsd/vfs.h                |  11 +-
 fs/nfsd/xdr4.h               |  15 +-
 fs/notify/group.c            |   2 +
 fs/notify/mark.c             |   3 +
 include/linux/file.h         |   1 +
 include/linux/fs.h           |   1 +
 include/linux/list_lru.h     |  13 +
 include/linux/sunrpc/cache.h |   1 +
 mm/list_lru.c                |  15 ++
 net/sunrpc/cache.c           |   3 +
 29 files changed, 1149 insertions(+), 373 deletions(-)
 create mode 100644 fs/nfsd/filecache.c
 create mode 100644 fs/nfsd/filecache.h

-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v5 01/20] list_lru: add list_lru_rotate
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
       [not found]   ` <1444042962-6947-2-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
  2015-10-05 11:02 ` [PATCH v5 02/20] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

Add a function that can move an entry to the MRU end of the list.
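
A minimal userspace sketch of the rotate operation, assuming a circular
doubly-linked list in the style of the kernel's list_head. The toy_*
names are hypothetical, and the per-node locking that the real function
takes is left out.

```c
struct toy_list {
	struct toy_list *prev, *next;
};

/* An initialised node points at itself, which also means "not on a list". */
static void toy_list_init(struct toy_list *h)
{
	h->prev = h->next = h;
}

static int toy_list_empty(const struct toy_list *h)
{
	return h->next == h;
}

static void toy_list_del(struct toy_list *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

static void toy_list_add_tail(struct toy_list *e, struct toy_list *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Move item to the tail (MRU end) of lru; do nothing if it is not listed. */
static void toy_lru_rotate(struct toy_list *lru, struct toy_list *item)
{
	if (toy_list_empty(item))
		return;
	toy_list_del(item);
	toy_list_add_tail(item, lru);
}
```

The head's next pointer is the LRU end and its prev pointer is the MRU
end, so rotating an entry makes it the last candidate for reclaim.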

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 include/linux/list_lru.h | 13 +++++++++++++
 mm/list_lru.c            | 15 +++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 2a6b9947aaa3..4534b1b34d2d 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -96,6 +96,19 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item);
 bool list_lru_del(struct list_lru *lru, struct list_head *item);
 
 /**
+ * list_lru_rotate: rotate an element to the end of an lru list
+ * @lru: the lru pointer
+ * @item: the item to be rotated
+ *
+ * This function moves an entry to the end of an LRU list. Should be used when
+ * an entry that is on the LRU is used, and should be moved to the MRU end of
+ * the list. If the item is not on a list, then this function has no effect.
+ * The comments about an element already pertaining to a list are also valid
+ * for list_lru_rotate.
+ */
+void list_lru_rotate(struct list_lru *lru, struct list_head *item);
+
+/**
  * list_lru_count_one: return the number of objects currently held by @lru
  * @lru: the lru pointer.
  * @nid: the node id to count from.
diff --git a/mm/list_lru.c b/mm/list_lru.c
index e1da19fac1b3..66718c2a9a7b 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -130,6 +130,21 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item)
 }
 EXPORT_SYMBOL_GPL(list_lru_del);
 
+void list_lru_rotate(struct list_lru *lru, struct list_head *item)
+{
+	int nid = page_to_nid(virt_to_page(item));
+	struct list_lru_node *nlru = &lru->node[nid];
+	struct list_lru_one *l;
+
+	spin_lock(&nlru->lock);
+	if (!list_empty(item)) {
+		l = list_lru_from_kmem(nlru, item);
+		list_move_tail(item, &l->list);
+	}
+	spin_unlock(&nlru->lock);
+}
+EXPORT_SYMBOL_GPL(list_lru_rotate);
+
 void list_lru_isolate(struct list_lru_one *list, struct list_head *item)
 {
 	list_del_init(item);
-- 
2.4.3



* [PATCH v5 02/20] fs: have flush_delayed_fput flush the workqueue job
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 01/20] list_lru: add list_lru_rotate Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 03/20] fs: add a kerneldoc header to fput Jeff Layton
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

I think there's a potential race in flush_delayed_fput. A kthread does
an fput() and that file gets added to the list and the delayed work is
scheduled. More than 1 jiffy passes, and the workqueue thread picks up
the work and starts running it. Then the kthread calls
flush_delayed_fput. It sees that the list is empty and returns
immediately, even though the __fput for its file may not have run yet.

Close this by making flush_delayed_fput use flush_delayed_work instead,
which should immediately schedule the work to run if it's not already,
and block until the workqueue job completes.
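
The race can be modelled with a single-threaded toy, offered here only as
a sketch. Every name is hypothetical and the counters stand in for the
real llist and workqueue. The key assumption is that the worker detaches
the whole list before it closes anything.

```c
/* Toy model of the delayed-fput machinery. */
struct toy_fput_state {
	int queued;	/* files sitting on the delayed list */
	int in_flight;	/* detached by a worker, not yet closed */
	int closed;	/* files whose final cleanup has run */
};

/* A kthread's final fput(): the file goes onto the delayed list. */
static void toy_queue(struct toy_fput_state *s)
{
	s->queued++;
}

/* The worker starts by taking everything off the list. */
static void toy_worker_begin(struct toy_fput_state *s)
{
	s->in_flight += s->queued;
	s->queued = 0;
}

/* The worker finishes: the detached files are now really closed. */
static void toy_worker_finish(struct toy_fput_state *s)
{
	s->closed += s->in_flight;
	s->in_flight = 0;
}

/* Old flush: handle whatever is on the list right now, nothing more. */
static void toy_flush_list_only(struct toy_fput_state *s)
{
	s->closed += s->queued;
	s->queued = 0;
}

/* New flush: run any pending work and wait for work already running. */
static void toy_flush_wait_for_work(struct toy_fput_state *s)
{
	toy_worker_begin(s);
	toy_worker_finish(s);
}
```

If the worker has begun but not finished, the list-only flush returns
with the file still in flight. The variant that waits for the work item
only returns once it is closed.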

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05ebf95..52cc6803c07a 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -244,6 +244,8 @@ static void ____fput(struct callback_head *work)
 	__fput(container_of(work, struct file, f_u.fu_rcuhead));
 }
 
+static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
+
 /*
  * If kernel thread really needs to have the final fput() it has done
  * to complete, call this.  The only user right now is the boot - we
@@ -256,11 +258,9 @@ static void ____fput(struct callback_head *work)
  */
 void flush_delayed_fput(void)
 {
-	delayed_fput(NULL);
+	flush_delayed_work(&delayed_fput_work);
 }
 
-static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
-
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
-- 
2.4.3



* [PATCH v5 03/20] fs: add a kerneldoc header to fput
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 01/20] list_lru: add list_lru_rotate Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 02/20] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 04/20] fs: add fput_queue Jeff Layton
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

...and move its EXPORT_SYMBOL just below the function.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 52cc6803c07a..8cfeaee6323f 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -261,6 +261,25 @@ void flush_delayed_fput(void)
 	flush_delayed_work(&delayed_fput_work);
 }
 
+/**
+ * fput - put a struct file reference
+ * @file: file of which to put the reference
+ *
+ * This function decrements the reference count for the struct file reference,
+ * and queues it up for destruction if the count goes to zero. In the case of
+ * most tasks we queue it to the task_work infrastructure, which will be run
+ * just before the task returns back to userspace. kthreads however never
+ * return to userspace, so for those we add them to a global list and schedule
+ * a delayed workqueue job to do the final cleanup work.
+ *
+ * Why not just do it synchronously? __fput can involve taking locks of all
+ * sorts, and doing it synchronously means that the callers must take extra care
+ * not to deadlock. That can be very difficult to ensure, so by deferring it
+ * until just before return to userland or to the workqueue, we sidestep that
+ * nastiness. Also, __fput can be quite stack intensive, so doing a final fput
+ * has the possibility of blowing up if we don't take steps to ensure that we
+ * have enough stack space to make it work.
+ */
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
@@ -281,6 +300,7 @@ void fput(struct file *file)
 			schedule_delayed_work(&delayed_fput_work, 1);
 	}
 }
+EXPORT_SYMBOL(fput);
 
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
@@ -299,7 +319,6 @@ void __fput_sync(struct file *file)
 	}
 }
 
-EXPORT_SYMBOL(fput);
 
 void put_filp(struct file *file)
 {
-- 
2.4.3



* [PATCH v5 04/20] fs: add fput_queue
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
                   ` (2 preceding siblings ...)
  2015-10-05 11:02 ` [PATCH v5 03/20] fs: add a kerneldoc header to fput Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 07/20] locks: create a new notifier chain for lease attempts Jeff Layton
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

When nfsd caches a file, we want to be able to close it down in advance
of setlease attempts. Setting a lease is generally done at the behest of
userland, so we need a mechanism to ensure that a userland task can
completely close a file without having to return back to userspace.

To do this, we borrow the delayed_fput infrastructure that kthreads use.
fput_queue will queue to the delayed_fput list if the last reference was
put. The caller can then call flush_delayed_fput to ensure that the files
are completely closed before proceeding.
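
The intended calling pattern might look like the userspace sketch below.
It assumes C11 atomics in place of the kernel's atomic_long_t, and the
toy_* names are hypothetical. Only the "last reference dropped" test and
the "should I flush?" return value are modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_file {
	atomic_long count;	/* reference count */
	bool queued;		/* set once final cleanup has been queued */
};

/* Drop one reference; queue the cleanup if it was the last one. */
static bool toy_fput_queue(struct toy_file *f)
{
	/* atomic_fetch_sub() returns the old value: 1 means we were last. */
	if (atomic_fetch_sub(&f->count, 1) == 1) {
		f->queued = true;
		return true;
	}
	return false;
}

/*
 * Put a batch of files. A true return means at least one final put was
 * queued, so the caller would follow up with a flush.
 */
static bool toy_put_all(struct toy_file **files, int n)
{
	bool need_flush = false;
	int i;

	for (i = 0; i < n; i++)
		if (toy_fput_queue(files[i]))
			need_flush = true;
	return need_flush;
}
```

Batching the puts and flushing once keeps the cost of the flush to a
single wait, however many files were closed.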

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c      | 27 +++++++++++++++++++++++++++
 include/linux/file.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/fs/file_table.c b/fs/file_table.c
index 8cfeaee6323f..95361d2b8a08 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -302,6 +302,33 @@ void fput(struct file *file)
 }
 EXPORT_SYMBOL(fput);
 
+/**
+ * fput_queue - do an fput without using task_work
+ * @file: file of which to put the reference
+ *
+ * When fput is called in the context of a userland process, it'll queue the
+ * actual work (__fput()) to be done just before returning to userland. In some
+ * cases however, we need to ensure that the __fput runs before that point.
+ * There is no safe way to flush work that has been queued via task_work_add
+ * however, so to do this we borrow the delayed_fput infrastructure that
+ * kthreads use. The userland process can use fput_queue() on one or more
+ * struct files and then call flush_delayed_fput() to ensure that they are
+ * completely closed before proceeding.
+ *
+ * Returns true if the final fput was done, false otherwise. The caller can
+ * use this to determine whether to flush_delayed_fput afterward.
+ */
+bool fput_queue(struct file *file)
+{
+	if (atomic_long_dec_and_test(&file->f_count)) {
+		if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
+			schedule_delayed_work(&delayed_fput_work, 1);
+		return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL(fput_queue);
+
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
  * in some umount() (and thus can't use flush_delayed_fput() without
diff --git a/include/linux/file.h b/include/linux/file.h
index f87d30882a24..f9308c9a0746 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -12,6 +12,7 @@
 struct file;
 
 extern void fput(struct file *);
+extern bool fput_queue(struct file *);
 
 struct file_operations;
 struct vfsmount;
-- 
2.4.3



* [PATCH v5 05/20] fs: export flush_delayed_fput
       [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
@ 2015-10-05 11:02   ` Jeff Layton
  2015-10-05 11:02   ` [PATCH v5 06/20] fsnotify: export several symbols Jeff Layton
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

...and clean up the comments over it a bit. The nfsd code will need to
be able to call back into this.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 95361d2b8a08..899c19687cfa 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -246,20 +246,24 @@ static void ____fput(struct callback_head *work)
 
 static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
-/*
- * If kernel thread really needs to have the final fput() it has done
- * to complete, call this.  The only user right now is the boot - we
- * *do* need to make sure our writes to binaries on initramfs has
- * not left us with opened struct file waiting for __fput() - execve()
- * won't work without that.  Please, don't add more callers without
- * very good reasons; in particular, never call that with locks
- * held and never call that from a thread that might need to do
- * some work on any kind of umount.
+/**
+ * flush_delayed_fput - ensure that all delayed_fput work is complete
+ *
+ * If a kernel thread or task that has used fput_queue really needs to have the
+ * final fput() it has done to complete, call this. One of the main users is
+ * the boot - we *do* need to make sure our writes to binaries on initramfs have
+ * not left us with opened struct file waiting for __fput() - execve() won't
+ * work without that.
+ *
+ * Please, don't add more callers without very good reasons; in particular,
+ * never call that with locks held and never from a thread that might need to
+ * do some work on any kind of umount.
  */
 void flush_delayed_fput(void)
 {
 	flush_delayed_work(&delayed_fput_work);
 }
+EXPORT_SYMBOL(flush_delayed_fput);
 
 /**
  * fput - put a struct file reference
-- 
2.4.3


* [PATCH v5 06/20] fsnotify: export several symbols
       [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
  2015-10-05 11:02   ` [PATCH v5 05/20] fs: export flush_delayed_fput Jeff Layton
@ 2015-10-05 11:02   ` Jeff Layton
  2015-10-05 11:02   ` [PATCH v5 08/20] nfsd: move include of state.h from trace.c to trace.h Jeff Layton
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

With nfsd's new open-file caching infrastructure, we need a way to know
when unlinks occur so it can close files that it may be holding open.
fsnotify fits the bill nicely, but the symbols aren't currently exported
to modules. Export some of its symbols so nfsd can use this
infrastructure.

Cc: Eric Paris <eparis-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/notify/group.c | 2 ++
 fs/notify/mark.c  | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/fs/notify/group.c b/fs/notify/group.c
index d16b62cb2854..295d08800126 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -81,6 +81,7 @@ void fsnotify_put_group(struct fsnotify_group *group)
 	if (atomic_dec_and_test(&group->refcnt))
 		fsnotify_final_destroy_group(group);
 }
+EXPORT_SYMBOL_GPL(fsnotify_put_group);
 
 /*
  * Create a new fsnotify_group and hold a reference for the group returned.
@@ -109,6 +110,7 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
 
 	return group;
 }
+EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
 
 int fsnotify_fasync(int fd, struct file *file, int on)
 {
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index fc0df4442f7b..1dda1734ac84 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -208,6 +208,7 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 	mutex_unlock(&group->mark_mutex);
 	fsnotify_free_mark(mark);
 }
+EXPORT_SYMBOL_GPL(fsnotify_destroy_mark);
 
 void fsnotify_destroy_marks(struct hlist_head *head, spinlock_t *lock)
 {
@@ -402,6 +403,7 @@ int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
 	mutex_unlock(&group->mark_mutex);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(fsnotify_add_mark);
 
 /*
  * Given a list of marks, find the mark associated with given group. If found
@@ -492,6 +494,7 @@ void fsnotify_init_mark(struct fsnotify_mark *mark,
 	atomic_set(&mark->refcnt, 1);
 	mark->free_mark = free_mark;
 }
+EXPORT_SYMBOL_GPL(fsnotify_init_mark);
 
 static int fsnotify_mark_destroy(void *ignored)
 {
-- 
2.4.3


* [PATCH v5 07/20] locks: create a new notifier chain for lease attempts
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
                   ` (3 preceding siblings ...)
  2015-10-05 11:02 ` [PATCH v5 04/20] fs: add fput_queue Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 09/20] sunrpc: add a new cache_detail operation for when a cache is flushed Jeff Layton
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

With the new file caching infrastructure in nfsd, we can end up holding
files open for an indefinite period of time, even when they are still
idle. This may prevent the kernel from handing out leases on the file,
which is something we don't want to block.

Fix this by running an SRCU notifier call chain on any lease attempt.
nfsd can then purge the cache for that inode before returning.

Since SRCU is only conditionally compiled in, we must only define the
new chain if it's enabled, and users of the chain must ensure that
SRCU is enabled.
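
How a subscriber would see lease attempts can be sketched in userspace
with a plain linked list of callbacks. This is an assumption-laden toy:
the toy_* names are hypothetical and none of the SRCU synchronisation is
modelled. What it shows is that every attempt except an unlock walks the
chain.

```c
#include <fcntl.h>	/* F_RDLCK, F_WRLCK, F_UNLCK */
#include <stddef.h>

struct toy_notifier {
	int (*call)(struct toy_notifier *nb, long arg, void *data);
	struct toy_notifier *next;
};

static struct toy_notifier *toy_lease_chain;

/* Add a subscriber at the head of the chain. */
static void toy_chain_register(struct toy_notifier *nb)
{
	nb->next = toy_lease_chain;
	toy_lease_chain = nb;
}

/* Called on every setlease attempt; unlocks are not reported. */
static void toy_setlease_notifier(long arg, void *lease)
{
	struct toy_notifier *nb;

	if (arg == F_UNLCK)
		return;
	for (nb = toy_lease_chain; nb; nb = nb->next)
		nb->call(nb, arg, lease);
}

static int toy_purge_calls;

/* Stand-in for nfsd closing its cached files for the inode. */
static int toy_purge_cb(struct toy_notifier *nb, long arg, void *data)
{
	(void)nb;
	(void)arg;
	(void)data;
	toy_purge_calls++;
	return 0;
}
```

Because the chain runs before the lease is actually set, a subscriber
gets a chance to drop its open files first.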

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/locks.c         | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  1 +
 2 files changed, 38 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index 2a54c800a223..a2d5794d713a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -166,6 +166,7 @@ int lease_break_time = 45;
 DEFINE_STATIC_LGLOCK(file_lock_lglock);
 static DEFINE_PER_CPU(struct hlist_head, file_lock_list);
 
+
 /*
  * The blocked_hash is used to find POSIX lock loops for deadlock detection.
  * It is protected by blocked_lock_lock.
@@ -1780,6 +1781,40 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp,
 }
 EXPORT_SYMBOL(generic_setlease);
 
+#if IS_ENABLED(CONFIG_SRCU)
+/*
+ * Kernel subsystems can register to be notified on any attempt to set
+ * a new lease with the lease_notifier_chain. This is used by (e.g.) nfsd
+ * to close files that it may have cached when there is an attempt to set a
+ * conflicting lease.
+ */
+struct srcu_notifier_head lease_notifier_chain;
+EXPORT_SYMBOL_GPL(lease_notifier_chain);
+
+static inline void
+lease_notifier_chain_init(void)
+{
+	srcu_init_notifier_head(&lease_notifier_chain);
+}
+
+static inline void
+setlease_notifier(long arg, struct file_lock *lease)
+{
+	if (arg != F_UNLCK)
+		srcu_notifier_call_chain(&lease_notifier_chain, arg, lease);
+}
+#else /* !IS_ENABLED(CONFIG_SRCU) */
+static inline void
+lease_notifier_chain_init(void)
+{
+}
+
+static inline void
+setlease_notifier(long arg, struct file_lock *lease)
+{
+}
+#endif /* IS_ENABLED(CONFIG_SRCU) */
+
 /**
  * vfs_setlease        -       sets a lease on an open file
  * @filp:	file pointer
@@ -1800,6 +1835,7 @@ EXPORT_SYMBOL(generic_setlease);
 int
 vfs_setlease(struct file *filp, long arg, struct file_lock **lease, void **priv)
 {
+	setlease_notifier(arg, *lease);
 	if (filp->f_op->setlease)
 		return filp->f_op->setlease(filp, arg, lease, priv);
 	else
@@ -2696,6 +2732,7 @@ static int __init filelock_init(void)
 	for_each_possible_cpu(i)
 		INIT_HLIST_HEAD(per_cpu_ptr(&file_lock_list, i));
 
+	lease_notifier_chain_init();
 	return 0;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 72d8a844c692..c0df75909dd4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1042,6 +1042,7 @@ extern int fcntl_setlease(unsigned int fd, struct file *filp, long arg);
 extern int fcntl_getlease(struct file *filp);
 
 /* fs/locks.c */
+extern struct srcu_notifier_head	lease_notifier_chain;
 void locks_free_lock_context(struct file_lock_context *ctx);
 void locks_free_lock(struct file_lock *fl);
 extern void locks_init_lock(struct file_lock *);
-- 
2.4.3



* [PATCH v5 08/20] nfsd: move include of state.h from trace.c to trace.h
       [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
  2015-10-05 11:02   ` [PATCH v5 05/20] fs: export flush_delayed_fput Jeff Layton
  2015-10-05 11:02   ` [PATCH v5 06/20] fsnotify: export several symbols Jeff Layton
@ 2015-10-05 11:02   ` Jeff Layton
  2015-10-05 11:02   ` [PATCH v5 12/20] nfsd: allow filecache open to skip fh_verify check Jeff Layton
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

Any file that includes trace.h also needs to include state.h, even if
it isn't using any state tracepoints. Ensure that trace.h includes any
headers it needs instead of relying on the *.c files to have the right
ones.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/nfsd/trace.c | 2 --
 fs/nfsd/trace.h | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/trace.c b/fs/nfsd/trace.c
index 82f89070594c..90967466a1e5 100644
--- a/fs/nfsd/trace.c
+++ b/fs/nfsd/trace.c
@@ -1,5 +1,3 @@
 
-#include "state.h"
-
 #define CREATE_TRACE_POINTS
 #include "trace.h"
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index c668520c344b..0befe762762b 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -9,6 +9,8 @@
 
 #include <linux/tracepoint.h>
 
+#include "state.h"
+
 DECLARE_EVENT_CLASS(nfsd_stateid_class,
 	TP_PROTO(stateid_t *stp),
 	TP_ARGS(stp),
-- 
2.4.3


* [PATCH v5 09/20] sunrpc: add a new cache_detail operation for when a cache is flushed
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
                   ` (4 preceding siblings ...)
  2015-10-05 11:02 ` [PATCH v5 07/20] locks: create a new notifier chain for lease attempts Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 10/20] nfsd: add a new struct file caching facility to nfsd Jeff Layton
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

When the exports table is changed, exportfs will usually write a new
time to the "flush" file in the nfsd.export cache procfile. This tells
the kernel to flush any entries that are older than that value.

This gives us a mechanism to tell whether an unexport might have
occurred. Add a new ->flush cache_detail operation that is called after
flushing the cache whenever someone writes to a "flush" file.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 include/linux/sunrpc/cache.h | 1 +
 net/sunrpc/cache.c           | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 03d3b4c92d9f..d1c10a978bb2 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -98,6 +98,7 @@ struct cache_detail {
 					      int has_died);
 
 	struct cache_head *	(*alloc)(void);
+	void			(*flush)(void);
 	int			(*match)(struct cache_head *orig, struct cache_head *new);
 	void			(*init)(struct cache_head *orig, struct cache_head *new);
 	void			(*update)(struct cache_head *orig, struct cache_head *new);
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 4a2340a54401..60da9aa2bdc5 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1451,6 +1451,9 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
 	cd->nextcheck = seconds_since_boot();
 	cache_flush();
 
+	if (cd->flush)
+		cd->flush();
+
 	*ppos += count;
 	return count;
 }
-- 
2.4.3



* [PATCH v5 10/20] nfsd: add a new struct file caching facility to nfsd
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
                   ` (5 preceding siblings ...)
  2015-10-05 11:02 ` [PATCH v5 09/20] sunrpc: add a new cache_detail operation for when a cache is flushed Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 11/20] nfsd: keep some rudimentary stats on nfsd_file cache Jeff Layton
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

Currently, NFSv2/3 reads and writes have to open a file, do the read or
write and then close it again for each RPC. This is highly inefficient,
especially when the underlying filesystem has a relatively slow open
routine.

This patch adds a new open file cache to knfsd. Rather than doing an
open for each RPC, the read/write handlers can call into this cache to
see if there is one already there for the correct filehandle and
NFS_MAY_READ/WRITE flags.

If there isn't an entry, then we create a new one and attempt to
perform the open. If there is, then we wait until construction of the
entry has finished and return it if it is fully instantiated when the
wait ends. If it's not, then we attempt to take over construction.
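
The hit and miss paths can be sketched in userspace as a small
open-addressed table keyed on inode number and access flags. This is a
rough model under stated assumptions: the toy_* names, the table size and
the probing scheme are hypothetical, and the wait for a half-constructed
entry is not modelled at all.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 64u	/* hypothetical cache size */

struct toy_nfsd_file {
	int in_use;
	uint64_t ino;		/* which file */
	unsigned int may;	/* access flags the open was done with */
};

static struct toy_nfsd_file toy_cache[TOY_SLOTS];
static int toy_open_calls;	/* how many times we had to "open" */

/* Return the cached entry for (ino, may), creating it on a miss. */
static struct toy_nfsd_file *toy_file_acquire(uint64_t ino, unsigned int may)
{
	unsigned int start = (unsigned int)(ino % TOY_SLOTS);
	unsigned int i;

	for (i = 0; i < TOY_SLOTS; i++) {
		struct toy_nfsd_file *nf = &toy_cache[(start + i) % TOY_SLOTS];

		if (nf->in_use && nf->ino == ino && nf->may == may)
			return nf;	/* hit: no open needed */
		if (!nf->in_use) {
			/* miss: this is where the real open would happen */
			toy_open_calls++;
			nf->in_use = 1;
			nf->ino = ino;
			nf->may = may;
			return nf;
		}
	}
	return NULL;	/* table full; the toy has no eviction */
}
```

A second request for the same inode and flags returns the same entry
without another open, while different flags on the same inode get an
entry of their own.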

Since the main goal is to speed up NFSv2/3 I/O, we don't want to
close these files on last put of these objects. We need to keep them
around for a little while since we never know when the next READ/WRITE
will come in.

Note that this patch just adds the infrastructure to allow the caching
of open files. Later patches will actually make nfsd use it.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/Kconfig     |   2 +
 fs/nfsd/Makefile    |   3 +-
 fs/nfsd/export.c    |  14 ++
 fs/nfsd/filecache.c | 573 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/filecache.h |  37 ++++
 fs/nfsd/nfssvc.c    |  10 +-
 fs/nfsd/trace.h     | 127 ++++++++++++
 fs/nfsd/vfs.c       |   3 +-
 fs/nfsd/vfs.h       |   1 +
 9 files changed, 767 insertions(+), 3 deletions(-)
 create mode 100644 fs/nfsd/filecache.c
 create mode 100644 fs/nfsd/filecache.h

diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index a0b77fc1bd39..95e0a91d41ef 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -6,6 +6,8 @@ config NFSD
 	select SUNRPC
 	select EXPORTFS
 	select NFS_ACL_SUPPORT if NFSD_V2_ACL
+	select SRCU
+	select FSNOTIFY
 	depends on MULTIUSER
 	help
 	  Choose Y here if you want to allow other computers to access
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 9a6028e120c6..8908bb467727 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -10,7 +10,8 @@ obj-$(CONFIG_NFSD)	+= nfsd.o
 nfsd-y			+= trace.o
 
 nfsd-y 			+= nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \
-			   export.o auth.o lockd.o nfscache.o nfsxdr.o stats.o
+			   export.o auth.o lockd.o nfscache.o nfsxdr.o \
+			   stats.o filecache.o
 nfsd-$(CONFIG_NFSD_FAULT_INJECTION) += fault_inject.o
 nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
 nfsd-$(CONFIG_NFSD_V3)	+= nfs3proc.o nfs3xdr.o
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index b4d84b579f20..4b504edff121 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -21,6 +21,7 @@
 #include "nfsfh.h"
 #include "netns.h"
 #include "pnfs.h"
+#include "filecache.h"
 
 #define NFSDDBG_FACILITY	NFSDDBG_EXPORT
 
@@ -231,6 +232,18 @@ static struct cache_head *expkey_alloc(void)
 		return NULL;
 }
 
+static void
+expkey_flush(void)
+{
+	/*
+	 * Take the nfsd_mutex here to ensure that the file cache is not
+	 * destroyed while we're in the middle of flushing.
+	 */
+	mutex_lock(&nfsd_mutex);
+	nfsd_file_cache_purge();
+	mutex_unlock(&nfsd_mutex);
+}
+
 static struct cache_detail svc_expkey_cache_template = {
 	.owner		= THIS_MODULE,
 	.hash_size	= EXPKEY_HASHMAX,
@@ -243,6 +256,7 @@ static struct cache_detail svc_expkey_cache_template = {
 	.init		= expkey_init,
 	.update       	= expkey_update,
 	.alloc		= expkey_alloc,
+	.flush		= expkey_flush,
 };
 
 static int
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
new file mode 100644
index 000000000000..fcbc3bca3bdc
--- /dev/null
+++ b/fs/nfsd/filecache.c
@@ -0,0 +1,573 @@
+/*
+ * Open file cache.
+ *
+ * (c) 2015 - Jeff Layton <jeff.layton@primarydata.com>
+ */
+
+#include <linux/hash.h>
+#include <linux/slab.h>
+#include <linux/hash.h>
+#include <linux/file.h>
+#include <linux/sched.h>
+#include <linux/list_lru.h>
+#include <linux/fsnotify_backend.h>
+
+#include "vfs.h"
+#include "nfsd.h"
+#include "nfsfh.h"
+#include "filecache.h"
+#include "trace.h"
+
+#define NFSDDBG_FACILITY	NFSDDBG_FH
+
+/* FIXME: dynamically size this for the machine somehow? */
+#define NFSD_FILE_HASH_BITS                   12
+#define NFSD_FILE_HASH_SIZE                  (1 << NFSD_FILE_HASH_BITS)
+
+/* We only care about NFSD_MAY_READ/WRITE for this cache */
+#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
+
+struct nfsd_fcache_bucket {
+	struct hlist_head	nfb_head;
+	spinlock_t		nfb_lock;
+};
+
+static struct kmem_cache		*nfsd_file_slab;
+static struct nfsd_fcache_bucket	*nfsd_file_hashtbl;
+static struct list_lru			nfsd_file_lru;
+static struct fsnotify_group		*nfsd_file_fsnotify_group;
+
+/*
+ * The fsnotify_mark is embedded inside the nfsd_file and we don't want to
+ * explicitly free it. It'll be freed when the nfsd_file is and we always
+ * remove the mark from the inode before freeing it. So, this is a no-op.
+ */
+static void
+nfsd_file_free_mark(struct fsnotify_mark *mark)
+{
+}
+
+static void
+nfsd_file_slab_free(struct rcu_head *rcu)
+{
+	struct nfsd_file *nf = container_of(rcu, struct nfsd_file, nf_rcu);
+
+	kmem_cache_free(nfsd_file_slab, nf);
+}
+
+static struct nfsd_file *
+nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval)
+{
+	struct nfsd_file *nf;
+
+	nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL);
+	if (nf) {
+		INIT_HLIST_NODE(&nf->nf_node);
+		INIT_LIST_HEAD(&nf->nf_lru);
+		nf->nf_inode = inode;
+		nf->nf_hashval = hashval;
+		atomic_set(&nf->nf_ref, 1);
+		nf->nf_may = NFSD_FILE_MAY_MASK & may;
+		if (may & NFSD_MAY_NOT_BREAK_LEASE) {
+			if (may & NFSD_MAY_WRITE)
+				__set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+			if (may & NFSD_MAY_READ)
+				__set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+		}
+		fsnotify_init_mark(&nf->nf_mark, nfsd_file_free_mark);
+		nf->nf_mark.mask = FS_ATTRIB|FS_DELETE_SELF;
+		trace_nfsd_file_alloc(nf);
+	}
+	return nf;
+}
+
+static void
+nfsd_file_put_final(struct nfsd_file *nf)
+{
+	trace_nfsd_file_put_final(nf);
+	fsnotify_destroy_mark(&nf->nf_mark, nfsd_file_fsnotify_group);
+	if (nf->nf_file)
+		fput(nf->nf_file);
+	call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
+}
+
+static bool
+nfsd_file_put_final_delayed(struct nfsd_file *nf)
+{
+	bool flush = false;
+
+	trace_nfsd_file_put_final(nf);
+	fsnotify_destroy_mark(&nf->nf_mark, nfsd_file_fsnotify_group);
+	if (nf->nf_file)
+		flush = fput_queue(nf->nf_file);
+	call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
+	return flush;
+}
+
+static bool
+nfsd_file_unhash(struct nfsd_file *nf)
+{
+	lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+
+	trace_nfsd_file_unhash(nf);
+	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+		clear_bit(NFSD_FILE_HASHED, &nf->nf_flags);
+		hlist_del_rcu(&nf->nf_node);
+		list_lru_del(&nfsd_file_lru, &nf->nf_lru);
+		return true;
+	}
+	return false;
+}
+
+static void
+nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose)
+{
+	lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+
+	trace_nfsd_file_unhash_and_release_locked(nf);
+	if (!nfsd_file_unhash(nf))
+		return;
+	if (!atomic_dec_and_test(&nf->nf_ref))
+		return;
+
+	list_add(&nf->nf_lru, dispose);
+}
+
+static void
+nfsd_file_unhash_and_release(struct nfsd_file *nf)
+{
+	bool destroy = false;
+
+	spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+	if (nfsd_file_unhash(nf))
+		destroy = atomic_dec_and_test(&nf->nf_ref);
+	spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+	if (destroy)
+		nfsd_file_put_final(nf);
+}
+
+void
+nfsd_file_put(struct nfsd_file *nf)
+{
+	trace_nfsd_file_put(nf);
+	list_lru_rotate(&nfsd_file_lru, &nf->nf_lru);
+	if (!atomic_dec_and_test(&nf->nf_ref))
+		return;
+
+	WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags));
+	nfsd_file_put_final(nf);
+}
+
+struct nfsd_file *
+nfsd_file_get(struct nfsd_file *nf)
+{
+	if (likely(atomic_inc_not_zero(&nf->nf_ref)))
+		return nf;
+	return NULL;
+}
+
+static void
+nfsd_file_dispose_list(struct list_head *dispose)
+{
+	struct nfsd_file *nf;
+
+	while(!list_empty(dispose)) {
+		nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
+		list_del(&nf->nf_lru);
+		nfsd_file_put_final(nf);
+	}
+}
+
+static void
+nfsd_file_dispose_list_sync(struct list_head *dispose)
+{
+	bool flush = false;
+	struct nfsd_file *nf;
+
+	while(!list_empty(dispose)) {
+		nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
+		list_del(&nf->nf_lru);
+		if (nfsd_file_put_final_delayed(nf))
+			flush = true;
+	}
+	if (flush)
+		flush_delayed_fput();
+}
+
+static enum lru_status
+nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
+		 spinlock_t *lock, void *arg)
+	__releases(lock)
+	__acquires(lock)
+{
+	struct nfsd_file *nf = list_entry(item, struct nfsd_file, nf_lru);
+	bool unhashed;
+
+	if (atomic_read(&nf->nf_ref) > 1)
+		return LRU_SKIP;
+
+	spin_unlock(lock);
+	spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+	unhashed = nfsd_file_unhash(nf);
+	spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+	if (unhashed)
+		nfsd_file_put(nf);
+	spin_lock(lock);
+	return unhashed ? LRU_REMOVED_RETRY : LRU_RETRY;
+}
+
+static unsigned long
+nfsd_file_lru_count(struct shrinker *s, struct shrink_control *sc)
+{
+	return list_lru_count(&nfsd_file_lru);
+}
+
+static unsigned long
+nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
+{
+	return list_lru_shrink_walk(&nfsd_file_lru, sc, nfsd_file_lru_cb, NULL);
+}
+
+static struct shrinker	nfsd_file_shrinker = {
+	.scan_objects = nfsd_file_lru_scan,
+	.count_objects = nfsd_file_lru_count,
+	.seeks = 1,
+};
+
+static int
+nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg,
+			    void *data)
+{
+	struct file_lock *fl = data;
+
+	/* Only close files for F_SETLEASE leases */
+	if (fl->fl_flags & FL_LEASE)
+		nfsd_file_close_inode_sync(file_inode(fl->fl_file));
+	return 0;
+}
+
+static struct notifier_block nfsd_file_lease_notifier = {
+	.notifier_call = nfsd_file_lease_notifier_call,
+};
+
+static int
+nfsd_file_fsnotify_handle_event(struct fsnotify_group *group,
+				struct inode *inode,
+				struct fsnotify_mark *inode_mark,
+				struct fsnotify_mark *vfsmount_mark,
+				u32 mask, void *data, int data_type,
+				const unsigned char *file_name, u32 cookie)
+{
+	struct nfsd_file	*nf;
+
+	trace_nfsd_file_fsnotify_handle_event(inode, mask);
+
+	/* Should be no marks on non-regular files */
+	if (!S_ISREG(inode->i_mode)) {
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+
+	/* ...and we don't do anything with vfsmount marks */
+	BUG_ON(vfsmount_mark);
+
+	/* don't close files if this was not the last link */
+	if (mask & FS_ATTRIB) {
+		if (inode->i_nlink)
+			return 0;
+	}
+
+	/* FIXME: get container of mark, unhash and release it */
+	nf = container_of(inode_mark, struct nfsd_file, nf_mark);
+	nfsd_file_unhash_and_release(nf);
+	return 0;
+}
+
+
+static const struct fsnotify_ops nfsd_file_fsnotify_ops = {
+	.handle_event = nfsd_file_fsnotify_handle_event,
+};
+
+int
+nfsd_file_cache_init(void)
+{
+	int		ret = -ENOMEM;
+	unsigned int	i;
+
+	if (nfsd_file_hashtbl)
+		return 0;
+
+	nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
+				sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
+	if (!nfsd_file_hashtbl) {
+		pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n");
+		goto out_err;
+	}
+
+	nfsd_file_slab = kmem_cache_create("nfsd_file",
+				sizeof(struct nfsd_file), 0, 0, NULL);
+	if (!nfsd_file_slab) {
+		pr_err("nfsd: unable to create nfsd_file_slab\n");
+		goto out_err;
+	}
+
+	ret = list_lru_init(&nfsd_file_lru);
+	if (ret) {
+		pr_err("nfsd: failed to init nfsd_file_lru: %d\n", ret);
+		goto out_err;
+	}
+
+	ret = register_shrinker(&nfsd_file_shrinker);
+	if (ret) {
+		pr_err("nfsd: failed to register nfsd_file_shrinker: %d\n", ret);
+		goto out_lru;
+	}
+
+	ret = srcu_notifier_chain_register(&lease_notifier_chain,
+				&nfsd_file_lease_notifier);
+	if (ret) {
+		pr_err("nfsd: unable to register lease notifier: %d\n", ret);
+		goto out_shrinker;
+	}
+
+	nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops);
+	if (IS_ERR(nfsd_file_fsnotify_group)) {
+		pr_err("nfsd: unable to create fsnotify group: %ld\n",
+			PTR_ERR(nfsd_file_fsnotify_group));
+		nfsd_file_fsnotify_group = NULL;
+		goto out_notifier;
+	}
+
+	for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+		INIT_HLIST_HEAD(&nfsd_file_hashtbl[i].nfb_head);
+		spin_lock_init(&nfsd_file_hashtbl[i].nfb_lock);
+	}
+out:
+	return ret;
+out_notifier:
+	srcu_notifier_chain_unregister(&lease_notifier_chain,
+				&nfsd_file_lease_notifier);
+out_shrinker:
+	unregister_shrinker(&nfsd_file_shrinker);
+out_lru:
+	list_lru_destroy(&nfsd_file_lru);
+out_err:
+	kmem_cache_destroy(nfsd_file_slab);
+	nfsd_file_slab = NULL;
+	kfree(nfsd_file_hashtbl);
+	nfsd_file_hashtbl = NULL;
+	goto out;
+}
+
+void
+nfsd_file_cache_purge(void)
+{
+	unsigned int		i;
+	struct nfsd_file	*nf;
+	LIST_HEAD(dispose);
+
+	if (!nfsd_file_hashtbl)
+		return;
+
+	for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+		spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
+		while(!hlist_empty(&nfsd_file_hashtbl[i].nfb_head)) {
+			nf = hlist_entry(nfsd_file_hashtbl[i].nfb_head.first,
+					 struct nfsd_file, nf_node);
+			nfsd_file_unhash_and_release_locked(nf, &dispose);
+		}
+		spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
+		nfsd_file_dispose_list(&dispose);
+	}
+}
+
+void
+nfsd_file_cache_shutdown(void)
+{
+	LIST_HEAD(dispose);
+
+	srcu_notifier_chain_unregister(&lease_notifier_chain,
+				&nfsd_file_lease_notifier);
+	unregister_shrinker(&nfsd_file_shrinker);
+	nfsd_file_cache_purge();
+	fsnotify_put_group(nfsd_file_fsnotify_group);
+	nfsd_file_fsnotify_group = NULL;
+	list_lru_destroy(&nfsd_file_lru);
+	rcu_barrier();
+	kmem_cache_destroy(nfsd_file_slab);
+	nfsd_file_slab = NULL;
+	kfree(nfsd_file_hashtbl);
+	nfsd_file_hashtbl = NULL;
+}
+
+/*
+ * Search nfsd_file_hashtbl[] for a file. Entries are hashed on inode->i_ino
+ * and matched on the NFSD_MAY_READ/WRITE flags. If the file is open for r/w,
+ * then it's usable for either.
+ */
+static struct nfsd_file *
+nfsd_file_find_locked(struct inode *inode, unsigned int may_flags,
+			unsigned int hashval)
+{
+	struct nfsd_file *nf;
+	unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
+
+	hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
+				 nf_node) {
+		if ((need & nf->nf_may) != need)
+			continue;
+		if (nf->nf_inode == inode)
+			return nfsd_file_get(nf);
+	}
+	return NULL;
+}
+
+/**
+ * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file
+ * @inode: inode of the file to attempt to remove
+ *
+ * Walk the whole hash bucket, looking for any files that correspond to "inode".
+ * If any do, then unhash them and put the hashtable reference to them.
+ */
+void
+nfsd_file_close_inode_sync(struct inode *inode)
+{
+	struct nfsd_file	*nf;
+	struct hlist_node	*tmp;
+	unsigned int		hashval = (unsigned int)hash_long(inode->i_ino,
+						NFSD_FILE_HASH_BITS);
+	LIST_HEAD(dispose);
+
+	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+	hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
+		if (inode == nf->nf_inode)
+			nfsd_file_unhash_and_release_locked(nf, &dispose);
+	}
+	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+	trace_nfsd_file_close_inode_sync(inode, hashval, !list_empty(&dispose));
+	nfsd_file_dispose_list_sync(&dispose);
+}
+
+__be32
+nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		  unsigned int may_flags, struct nfsd_file **pnf)
+{
+	__be32	status = nfs_ok;
+	struct nfsd_file *nf, *new = NULL;
+	struct inode *inode;
+	unsigned int hashval;
+
+	/* FIXME: skip this if fh_dentry is already set? */
+	status = fh_verify(rqstp, fhp, S_IFREG,
+				may_flags|NFSD_MAY_OWNER_OVERRIDE);
+	if (status != nfs_ok)
+		return status;
+
+	inode = d_inode(fhp->fh_dentry);
+	hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS);
+retry:
+	rcu_read_lock();
+	nf = nfsd_file_find_locked(inode, may_flags, hashval);
+	rcu_read_unlock();
+	if (nf)
+		goto wait_for_construction;
+
+	if (!new) {
+		new = nfsd_file_alloc(inode, may_flags, hashval);
+		if (!new) {
+			trace_nfsd_file_acquire(hashval, inode, may_flags, NULL,
+						nfserr_jukebox);
+			return nfserr_jukebox;
+		}
+	}
+
+	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+	nf = nfsd_file_find_locked(inode, may_flags, hashval);
+	if (likely(nf == NULL)) {
+		/* Take reference for the hashtable */
+		atomic_inc(&new->nf_ref);
+		__set_bit(NFSD_FILE_HASHED, &new->nf_flags);
+		__set_bit(NFSD_FILE_PENDING, &new->nf_flags);
+		list_lru_add(&nfsd_file_lru, &new->nf_lru);
+		hlist_add_head_rcu(&new->nf_node,
+				&nfsd_file_hashtbl[hashval].nfb_head);
+		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+
+		/* This should never fail since we set allow_dups to true */
+		WARN_ON_ONCE(fsnotify_add_mark(&new->nf_mark,
+			nfsd_file_fsnotify_group, inode, NULL, true));
+		nf = new;
+		new = NULL;
+		goto open_file;
+	}
+	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+
+wait_for_construction:
+	wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
+
+	/* Did construction of this file fail? */
+	if (!nf->nf_file) {
+		/*
+		 * We can only take over construction for this nfsd_file if the
+		 * MAY flags are equal. Otherwise, we put the reference and try
+		 * again.
+		 */
+		if ((may_flags & NFSD_FILE_MAY_MASK) != nf->nf_may) {
+			nfsd_file_put(nf);
+			goto retry;
+		}
+
+		/* try to take over construction for this file */
+		if (test_and_set_bit(NFSD_FILE_PENDING, &nf->nf_flags))
+			goto wait_for_construction;
+
+		/* sync up the BREAK_* flags with our may_flags */
+		if (may_flags & NFSD_MAY_NOT_BREAK_LEASE) {
+			if (may_flags & NFSD_MAY_WRITE)
+				set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+			if (may_flags & NFSD_MAY_READ)
+				set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+		} else {
+			clear_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+			clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+		}
+
+		goto open_file;
+	}
+
+	if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
+		bool write = (may_flags & NFSD_MAY_WRITE);
+
+		if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) ||
+		    (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) {
+			status = nfserrno(nfsd_open_break_lease(
+					file_inode(nf->nf_file), may_flags));
+			if (status == nfs_ok) {
+				clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+				if (write)
+					clear_bit(NFSD_FILE_BREAK_WRITE,
+						  &nf->nf_flags);
+			}
+		}
+	}
+out:
+	if (status == nfs_ok) {
+		*pnf = nf;
+	} else {
+		nfsd_file_put(nf);
+		nf = NULL;
+	}
+
+	if (new)
+		nfsd_file_put(new);
+
+	trace_nfsd_file_acquire(hashval, inode, may_flags, nf, status);
+	return status;
+open_file:
+	/* FIXME: should we abort opening if the link count goes to 0? */
+	status = nfsd_open(rqstp, fhp, S_IFREG, may_flags, &nf->nf_file);
+	clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
+	smp_mb__after_atomic();
+	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
+	goto out;
+}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
new file mode 100644
index 000000000000..5c871c3114f2
--- /dev/null
+++ b/fs/nfsd/filecache.h
@@ -0,0 +1,37 @@
+#ifndef _FS_NFSD_FILECACHE_H
+#define _FS_NFSD_FILECACHE_H
+
+#include <linux/fsnotify_backend.h>
+
+/*
+ * A representation of a file that has been opened by knfsd. These are hashed
+ * in the hashtable by inode number (i_ino). Note that this object doesn't
+ * hold a reference to the inode by itself, so the nf_inode pointer should
+ * never be dereferenced, only be used for comparison.
+ */
+struct nfsd_file {
+	struct hlist_node	nf_node;
+	struct list_head	nf_lru;
+	struct rcu_head		nf_rcu;
+	struct file		*nf_file;
+#define NFSD_FILE_HASHED	(0)
+#define NFSD_FILE_PENDING	(1)
+#define NFSD_FILE_BREAK_READ	(2)
+#define NFSD_FILE_BREAK_WRITE	(3)
+	unsigned long		nf_flags;
+	struct inode		*nf_inode;
+	unsigned int		nf_hashval;
+	atomic_t		nf_ref;
+	unsigned char		nf_may;
+	struct fsnotify_mark	nf_mark;
+};
+
+int nfsd_file_cache_init(void);
+void nfsd_file_cache_purge(void);
+void nfsd_file_cache_shutdown(void);
+void nfsd_file_put(struct nfsd_file *nf);
+struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
+void nfsd_file_close_inode_sync(struct inode *inode);
+__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		  unsigned int may_flags, struct nfsd_file **nfp);
+#endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index ad4e2377dd63..d816bb3faa6e 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -22,6 +22,7 @@
 #include "cache.h"
 #include "vfs.h"
 #include "netns.h"
+#include "filecache.h"
 
 #define NFSDDBG_FACILITY	NFSDDBG_SVC
 
@@ -224,11 +225,17 @@ static int nfsd_startup_generic(int nrservs)
 	if (ret)
 		goto dec_users;
 
-	ret = nfs4_state_start();
+	ret = nfsd_file_cache_init();
 	if (ret)
 		goto out_racache;
+
+	ret = nfs4_state_start();
+	if (ret)
+		goto out_file_cache;
 	return 0;
 
+out_file_cache:
+	nfsd_file_cache_shutdown();
 out_racache:
 	nfsd_racache_shutdown();
 dec_users:
@@ -242,6 +249,7 @@ static void nfsd_shutdown_generic(void)
 		return;
 
 	nfs4_state_shutdown();
+	nfsd_file_cache_shutdown();
 	nfsd_racache_shutdown();
 }
 
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 0befe762762b..49f0d1f9c949 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -10,6 +10,8 @@
 #include <linux/tracepoint.h>
 
 #include "state.h"
+#include "filecache.h"
+#include "vfs.h"
 
 DECLARE_EVENT_CLASS(nfsd_stateid_class,
 	TP_PROTO(stateid_t *stp),
@@ -48,6 +50,131 @@ DEFINE_STATEID_EVENT(layout_recall_done);
 DEFINE_STATEID_EVENT(layout_recall_fail);
 DEFINE_STATEID_EVENT(layout_recall_release);
 
+#define show_nf_flags(val)						\
+	__print_flags(val, "|",						\
+		{ 1 << NFSD_FILE_HASHED,	"HASHED" },		\
+		{ 1 << NFSD_FILE_PENDING,	"PENDING" },		\
+		{ 1 << NFSD_FILE_BREAK_READ,	"BREAK_READ" },		\
+		{ 1 << NFSD_FILE_BREAK_WRITE,	"BREAK_WRITE" })
+
+/* FIXME: This should probably be fleshed out in the future. */
+#define show_nf_may(val)						\
+	__print_flags(val, "|",						\
+		{ NFSD_MAY_READ,		"READ" },		\
+		{ NFSD_MAY_WRITE,		"WRITE" },		\
+		{ NFSD_MAY_NOT_BREAK_LEASE,	"NOT_BREAK_LEASE" })
+
+DECLARE_EVENT_CLASS(nfsd_file_class,
+	TP_PROTO(struct nfsd_file *nf),
+	TP_ARGS(nf),
+	TP_STRUCT__entry(
+		__field(unsigned int, nf_hashval)
+		__field(void *, nf_inode)
+		__field(int, nf_ref)
+		__field(unsigned long, nf_flags)
+		__field(unsigned char, nf_may)
+		__field(struct file *, nf_file)
+	),
+	TP_fast_assign(
+		__entry->nf_hashval = nf->nf_hashval;
+		__entry->nf_inode = nf->nf_inode;
+		__entry->nf_ref = atomic_read(&nf->nf_ref);
+		__entry->nf_flags = nf->nf_flags;
+		__entry->nf_may = nf->nf_may;
+		__entry->nf_file = nf->nf_file;
+	),
+	TP_printk("hash=0x%x inode=0x%p ref=%d flags=%s may=%s file=%p",
+		__entry->nf_hashval,
+		__entry->nf_inode,
+		__entry->nf_ref,
+		show_nf_flags(__entry->nf_flags),
+		show_nf_may(__entry->nf_may),
+		__entry->nf_file)
+)
+
+#define DEFINE_NFSD_FILE_EVENT(name) \
+DEFINE_EVENT(nfsd_file_class, name, \
+	TP_PROTO(struct nfsd_file *nf), \
+	TP_ARGS(nf))
+
+DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked);
+
+TRACE_EVENT(nfsd_file_acquire,
+	TP_PROTO(unsigned int hash, struct inode *inode,
+		 unsigned int may_flags, struct nfsd_file *nf,
+		 __be32 status),
+
+	TP_ARGS(hash, inode, may_flags, nf, status),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, hash)
+		__field(void *, inode)
+		__field(unsigned int, may_flags)
+		__field(int, nf_ref)
+		__field(unsigned long, nf_flags)
+		__field(unsigned char, nf_may)
+		__field(struct file *, nf_file)
+		__field(__be32, status)
+	),
+
+	TP_fast_assign(
+		__entry->hash = hash;
+		__entry->inode = inode;
+		__entry->may_flags = may_flags;
+		__entry->nf_ref = nf ? atomic_read(&nf->nf_ref) : 0;
+		__entry->nf_flags = nf ? nf->nf_flags : 0;
+		__entry->nf_may = nf ? nf->nf_may : 0;
+		__entry->nf_file = nf ? nf->nf_file : NULL;
+		__entry->status = status;
+	),
+
+	TP_printk("hash=0x%x inode=0x%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=0x%p status=%u",
+			__entry->hash, __entry->inode,
+			show_nf_may(__entry->may_flags), __entry->nf_ref,
+			show_nf_flags(__entry->nf_flags),
+			show_nf_may(__entry->nf_may), __entry->nf_file,
+			be32_to_cpu(__entry->status))
+);
+
+TRACE_EVENT(nfsd_file_close_inode_sync,
+	TP_PROTO(struct inode *inode, unsigned int hash, int found),
+	TP_ARGS(inode, hash, found),
+	TP_STRUCT__entry(
+		__field(struct inode *, inode)
+		__field(unsigned int, hash)
+		__field(int, found)
+	),
+	TP_fast_assign(
+		__entry->inode = inode;
+		__entry->hash = hash;
+		__entry->found = found;
+	),
+	TP_printk("hash=0x%x inode=0x%p found=%d", __entry->hash,
+			__entry->inode, __entry->found)
+);
+
+TRACE_EVENT(nfsd_file_fsnotify_handle_event,
+	TP_PROTO(struct inode *inode, u32 mask),
+	TP_ARGS(inode, mask),
+	TP_STRUCT__entry(
+		__field(struct inode *, inode)
+		__field(unsigned int, nlink)
+		__field(umode_t, mode)
+		__field(u32, mask)
+	),
+	TP_fast_assign(
+		__entry->inode = inode;
+		__entry->nlink = inode->i_nlink;
+		__entry->mode = inode->i_mode;
+		__entry->mask = mask;
+	),
+	TP_printk("inode=0x%p nlink=%u mode=0%ho mask=0x%x", __entry->inode,
+			__entry->nlink, __entry->mode, __entry->mask)
+);
 #endif /* _NFSD_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 4212aaacbb55..a144849cec10 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -623,7 +623,8 @@ nfsd_access(struct svc_rqst *rqstp, struct svc_fh *fhp, u32 *access, u32 *suppor
 }
 #endif /* CONFIG_NFSD_V3 */
 
-static int nfsd_open_break_lease(struct inode *inode, int access)
+int
+nfsd_open_break_lease(struct inode *inode, int access)
 {
 	unsigned int mode;
 
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fcfc48cbe136..a877be59d5dd 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -69,6 +69,7 @@ __be32		do_nfsd_create(struct svc_rqst *, struct svc_fh *,
 __be32		nfsd_commit(struct svc_rqst *, struct svc_fh *,
 				loff_t, unsigned long);
 #endif /* CONFIG_NFSD_V3 */
+int		nfsd_open_break_lease(struct inode *, int);
 __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
 				int, struct file **);
 struct raparms;
-- 
2.4.3



* [PATCH v5 11/20] nfsd: keep some rudimentary stats on nfsd_file cache
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
                   ` (6 preceding siblings ...)
  2015-10-05 11:02 ` [PATCH v5 10/20] nfsd: add a new struct file caching facility to nfsd Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
       [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

Keep a per-chain entry count and maximum chain length, both protected by
the per-chain spinlock. When the stats file is read, we walk the array
of buckets and fetch the count from each.
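
A minimal userspace sketch of that walk, assuming each bucket carries
only a current count; struct sketch_bucket, struct sketch_stats and
sketch_collect() are hypothetical names, and no locking is modelled:

```c
#include <stddef.h>

/* Hypothetical per-chain bookkeeping: entries currently on the chain. */
struct sketch_bucket {
	unsigned int	count;
};

struct sketch_stats {
	unsigned int	total;		/* sum of all chain counts */
	unsigned int	longest;	/* largest single chain count */
};

/* Walk every bucket once, accumulating the total and the longest chain. */
struct sketch_stats
sketch_collect(const struct sketch_bucket *tbl, size_t nbuckets)
{
	struct sketch_stats st = { 0, 0 };
	size_t i;

	for (i = 0; i < nbuckets; i++) {
		st.total += tbl[i].count;
		if (tbl[i].count > st.longest)
			st.longest = tbl[i].count;
	}
	return st;
}
```

The two results correspond to the "total entries" and "longest chain"
lines that the stats file prints.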

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/filecache.c | 40 ++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/filecache.h |  1 +
 fs/nfsd/nfsctl.c    | 10 ++++++++++
 3 files changed, 51 insertions(+)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index fcbc3bca3bdc..932b58a5774f 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -11,6 +11,7 @@
 #include <linux/sched.h>
 #include <linux/list_lru.h>
 #include <linux/fsnotify_backend.h>
+#include <linux/seq_file.h>
 
 #include "vfs.h"
 #include "nfsd.h"
@@ -30,6 +31,8 @@
 struct nfsd_fcache_bucket {
 	struct hlist_head	nfb_head;
 	spinlock_t		nfb_lock;
+	unsigned int		nfb_count;
+	unsigned int		nfb_maxcount;
 };
 
 static struct kmem_cache		*nfsd_file_slab;
@@ -111,6 +114,7 @@ nfsd_file_unhash(struct nfsd_file *nf)
 
 	trace_nfsd_file_unhash(nf);
 	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+		--nfsd_file_hashtbl[nf->nf_hashval].nfb_count;
 		clear_bit(NFSD_FILE_HASHED, &nf->nf_flags);
 		hlist_del_rcu(&nf->nf_node);
 		list_lru_del(&nfsd_file_lru, &nf->nf_lru);
@@ -491,6 +495,9 @@ retry:
 		list_lru_add(&nfsd_file_lru, &new->nf_lru);
 		hlist_add_head_rcu(&new->nf_node,
 				&nfsd_file_hashtbl[hashval].nfb_head);
+		++nfsd_file_hashtbl[hashval].nfb_count;
+		nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
+				nfsd_file_hashtbl[hashval].nfb_count);
 		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
 
 		/* This should never fail since we set allow_dups to true */
@@ -571,3 +578,36 @@ open_file:
 	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
 	goto out;
 }
+
+/*
+ * Note that fields may be added, removed or reordered in the future. Programs
+ * scraping this file for info should test the labels to ensure they're
+ * getting the correct field.
+ */
+static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
+{
+	unsigned int i, count = 0, longest = 0;
+
+	/*
+	 * No need for spinlocks here since we're not terribly interested in
+	 * accuracy. We do take the nfsd_mutex simply to ensure that we
+	 * don't end up racing with server shutdown
+	 */
+	mutex_lock(&nfsd_mutex);
+	if (nfsd_file_hashtbl) {
+		for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+			count += nfsd_file_hashtbl[i].nfb_count;
+			longest = max(longest, nfsd_file_hashtbl[i].nfb_count);
+		}
+	}
+	mutex_unlock(&nfsd_mutex);
+
+	seq_printf(m, "total entries: %u\n", count);
+	seq_printf(m, "longest chain: %u\n", longest);
+	return 0;
+}
+
+int nfsd_file_cache_stats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, nfsd_file_cache_stats_show, NULL);
+}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 5c871c3114f2..a1467dbf1d29 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -34,4 +34,5 @@ struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
 __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  unsigned int may_flags, struct nfsd_file **nfp);
+int	nfsd_file_cache_stats_open(struct inode *, struct file *);
 #endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 9690cb4dd588..eff44a277f70 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -22,6 +22,7 @@
 #include "state.h"
 #include "netns.h"
 #include "pnfs.h"
+#include "filecache.h"
 
 /*
  *	We have a single directory with several nodes in it.
@@ -36,6 +37,7 @@ enum {
 	NFSD_Threads,
 	NFSD_Pool_Threads,
 	NFSD_Pool_Stats,
+	NFSD_File_Cache_Stats,
 	NFSD_Reply_Cache_Stats,
 	NFSD_Versions,
 	NFSD_Ports,
@@ -220,6 +222,13 @@ static const struct file_operations pool_stats_operations = {
 	.owner		= THIS_MODULE,
 };
 
+static struct file_operations file_cache_stats_operations = {
+	.open		= nfsd_file_cache_stats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static struct file_operations reply_cache_stats_operations = {
 	.open		= nfsd_reply_cache_stats_open,
 	.read		= seq_read,
@@ -1138,6 +1147,7 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Stats] = {"pool_stats", &pool_stats_operations, S_IRUGO},
+		[NFSD_File_Cache_Stats] = {"file_cache_stats", &file_cache_stats_operations, S_IRUGO},
 		[NFSD_Reply_Cache_Stats] = {"reply_cache_stats", &reply_cache_stats_operations, S_IRUGO},
 		[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
-- 
2.4.3



* [PATCH v5 12/20] nfsd: allow filecache open to skip fh_verify check
       [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-10-05 11:02   ` [PATCH v5 08/20] nfsd: move include of state.h from trace.c to trace.h Jeff Layton
@ 2015-10-05 11:02   ` Jeff Layton
  2015-10-05 11:02   ` [PATCH v5 13/20] nfsd: hook up nfsd_write to the new nfsd_file cache Jeff Layton
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

Currently, we call fh_verify twice on the filehandle: once when we call
into nfsd_file_acquire, and then again from nfsd_open. The second call
is completely superfluous, and fh_verify can do some things that
require a fair bit of work (checking permissions, for instance).

Create a new nfsd_open_verified function that will do an nfsd_open on a
filehandle that has already been verified. Call that from the filecache
code.
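
One plausible shape for such a split, sketched in userspace C: the
common open path moves into an inner helper, the regular entry point
verifies and then calls it, and the "verified" variant calls it
directly. All sketch_* names are hypothetical, and the verification
step is reduced to a counter so the saving is visible:

```c
#include <stdbool.h>

/* Hypothetical handle; verify_calls counts how often it was verified. */
struct sketch_handle {
	bool	valid;
	bool	opened;
	int	verify_calls;
};

/* Stand-in for the (potentially expensive) verification step. */
static int sketch_verify(struct sketch_handle *h)
{
	h->verify_calls++;
	return h->valid ? 0 : -1;
}

/* Inner helper: assumes the caller has already verified the handle. */
static int sketch_open_inner(struct sketch_handle *h)
{
	h->opened = true;
	return 0;
}

/* Regular entry point: verify, then open. */
int sketch_open(struct sketch_handle *h)
{
	int err = sketch_verify(h);

	return err ? err : sketch_open_inner(h);
}

/* Entry point for callers that have verified the handle themselves. */
int sketch_open_verified(struct sketch_handle *h)
{
	return sketch_open_inner(h);
}

/* Cache-style caller: one verification up front, none during the open. */
int sketch_cache_acquire(struct sketch_handle *h)
{
	int err = sketch_verify(h);

	return err ? err : sketch_open_verified(h);
}
```

With this arrangement the cache path verifies the handle once per
acquire rather than twice.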

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/filecache.c |  2 +-
 fs/nfsd/vfs.c       | 63 +++++++++++++++++++++++++++++++++++------------------
 fs/nfsd/vfs.h       |  2 ++
 3 files changed, 45 insertions(+), 22 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 932b58a5774f..9944152df415 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -572,7 +572,7 @@ out:
 	return status;
 open_file:
 	/* FIXME: should we abort opening if the link count goes to 0? */
-	status = nfsd_open(rqstp, fhp, S_IFREG, may_flags, &nf->nf_file);
+	status = nfsd_open_verified(rqstp, fhp, S_IFREG, may_flags, &nf->nf_file);
 	clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
 	smp_mb__after_atomic();
 	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index a144849cec10..cf4a2018d57a 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -640,9 +640,9 @@ nfsd_open_break_lease(struct inode *inode, int access)
  * and additional flags.
  * N.B. After this call fhp needs an fh_put
  */
-__be32
-nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
-			int may_flags, struct file **filp)
+static __be32
+__nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+	    int may_flags, struct file **filp)
 {
 	struct path	path;
 	struct inode	*inode;
@@ -651,24 +651,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 	__be32		err;
 	int		host_err = 0;
 
-	validate_process_creds();
-
-	/*
-	 * If we get here, then the client has already done an "open",
-	 * and (hopefully) checked permission - so allow OWNER_OVERRIDE
-	 * in case a chmod has now revoked permission.
-	 *
-	 * Arguably we should also allow the owner override for
-	 * directories, but we never have and it doesn't seem to have
-	 * caused anyone a problem.  If we were to change this, note
-	 * also that our filldir callbacks would need a variant of
-	 * lookup_one_len that doesn't check permissions.
-	 */
-	if (type == S_IFREG)
-		may_flags |= NFSD_MAY_OWNER_OVERRIDE;
-	err = fh_verify(rqstp, fhp, type, may_flags);
-	if (err)
-		goto out;
+	BUG_ON(!fhp->fh_dentry);
 
 	path.mnt = fhp->fh_export->ex_path.mnt;
 	path.dentry = fhp->fh_dentry;
@@ -723,6 +706,44 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 out_nfserr:
 	err = nfserrno(host_err);
 out:
+	return err;
+}
+
+__be32
+nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+			int may_flags, struct file **filp)
+{
+	__be32 err;
+
+	validate_process_creds();
+	/*
+	 * If we get here, then the client has already done an "open",
+	 * and (hopefully) checked permission - so allow OWNER_OVERRIDE
+	 * in case a chmod has now revoked permission.
+	 *
+	 * Arguably we should also allow the owner override for
+	 * directories, but we never have and it doesn't seem to have
+	 * caused anyone a problem.  If we were to change this, note
+	 * also that our filldir callbacks would need a variant of
+	 * lookup_one_len that doesn't check permissions.
+	 */
+	if (type == S_IFREG)
+		may_flags |= NFSD_MAY_OWNER_OVERRIDE;
+	err = fh_verify(rqstp, fhp, type, may_flags);
+	if (!err)
+		err = __nfsd_open(rqstp, fhp, type, may_flags, filp);
+	validate_process_creds();
+	return err;
+}
+
+__be32
+nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+		   int may_flags, struct file **filp)
+{
+	__be32 err;
+
+	validate_process_creds();
+	err = __nfsd_open(rqstp, fhp, type, may_flags, filp);
 	validate_process_creds();
 	return err;
 }
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index a877be59d5dd..b3beb896b08d 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -72,6 +72,8 @@ __be32		nfsd_commit(struct svc_rqst *, struct svc_fh *,
 int		nfsd_open_break_lease(struct inode *, int);
 __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
 				int, struct file **);
+__be32		nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t,
+				int, struct file **);
 struct raparms;
 __be32		nfsd_splice_read(struct svc_rqst *,
 				struct file *, loff_t, unsigned long *);
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 13/20] nfsd: hook up nfsd_write to the new nfsd_file cache
       [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-10-05 11:02   ` [PATCH v5 12/20] nfsd: allow filecache open to skip fh_verify check Jeff Layton
@ 2015-10-05 11:02   ` Jeff Layton
  2015-10-05 11:02   ` [PATCH v5 14/20] nfsd: hook up nfsd_read to the " Jeff Layton
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC (permalink / raw)
  To: bfields-uC3wQj2KruNg9hUCZPvPmw
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Al Viro

Note that all callers currently pass in NULL for "file" anyway, so
there was already some dead code in here. Eliminate that parameter
and have nfsd_write use the file cache instead of dealing directly
with a filp.
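
For readers new to the pattern, here is a minimal user-space sketch of
the acquire/use/put sequence that nfsd_write switches to. The one-slot
cache and every fake_* name are assumptions made up for illustration;
the real nfsd_file cache is hashed, LRU-managed and far more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry; the real struct nfsd_file has more fields. */
struct fake_nf {
	int  refcount;  /* references held by callers plus the cache itself */
	int  opens;     /* how many times the backing file was opened */
	long written;   /* bytes "written" through this entry */
};

/* One-slot cache, just enough to show the calling pattern. */
static struct fake_nf slot;

/* Look up the entry, opening it on first use, and take a reference. */
int fake_file_acquire(struct fake_nf **out)
{
	if (slot.refcount == 0) {
		slot.opens++;
		slot.refcount = 1;	/* reference owned by the cache */
	}
	slot.refcount++;		/* reference handed to the caller */
	*out = &slot;
	return 0;
}

/* Drop the caller's reference; the cache keeps its own. */
void fake_file_put(struct fake_nf *nf)
{
	assert(nf->refcount > 1);	/* caller must hold a reference */
	nf->refcount--;
}

/* Mirrors the new shape of nfsd_write: acquire, use, put. */
int fake_write(long len)
{
	struct fake_nf *nf;
	int err = fake_file_acquire(&nf);

	if (err == 0) {
		nf->written += len;
		fake_file_put(nf);
	}
	return err;
}
```

Two back-to-back fake_write calls open the backing file only once,
because the cache's own reference keeps the entry alive between calls.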

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfs3proc.c |  2 +-
 fs/nfsd/nfsproc.c  |  2 +-
 fs/nfsd/vfs.c      | 33 +++++++++++----------------------
 fs/nfsd/vfs.h      |  2 +-
 4 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 7b755b7f785c..4e46ac511479 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -192,7 +192,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp, struct nfsd3_writeargs *argp,
 
 	fh_copy(&resp->fh, &argp->fh);
 	resp->committed = argp->stable;
-	nfserr = nfsd_write(rqstp, &resp->fh, NULL,
+	nfserr = nfsd_write(rqstp, &resp->fh,
 				   argp->offset,
 				   rqstp->rq_vec, argp->vlen,
 				   &cnt,
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 4cd78ef4c95c..9893095cbee1 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -213,7 +213,7 @@ nfsd_proc_write(struct svc_rqst *rqstp, struct nfsd_writeargs *argp,
 		SVCFH_fmt(&argp->fh),
 		argp->len, argp->offset);
 
-	nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), NULL,
+	nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh),
 				   argp->offset,
 				   rqstp->rq_vec, argp->vlen,
 			           &cnt,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index cf4a2018d57a..cb1ab146246b 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -42,6 +42,7 @@
 
 #include "nfsd.h"
 #include "vfs.h"
+#include "filecache.h"
 
 #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
 
@@ -1030,30 +1031,18 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
  * N.B. After this call fhp needs an fh_put
  */
 __be32
-nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
-		loff_t offset, struct kvec *vec, int vlen, unsigned long *cnt,
-		int *stablep)
+nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
+	   struct kvec *vec, int vlen, unsigned long *cnt, int *stablep)
 {
-	__be32			err = 0;
-
-	if (file) {
-		err = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
-				NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE);
-		if (err)
-			goto out;
-		err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt,
-				stablep);
-	} else {
-		err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
-		if (err)
-			goto out;
-
-		if (cnt)
-			err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen,
-					     cnt, stablep);
-		fput(file);
+	__be32			err;
+	struct nfsd_file	*nf;
+
+	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
+	if (err == nfs_ok) {
+		err = nfsd_vfs_write(rqstp, fhp, nf->nf_file, offset, vec,
+					vlen, cnt, stablep);
+		nfsd_file_put(nf);
 	}
-out:
 	return err;
 }
 
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index b3beb896b08d..80692e06302d 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -81,7 +81,7 @@ __be32		nfsd_readv(struct file *, loff_t, struct kvec *, int,
 				unsigned long *);
 __be32 		nfsd_read(struct svc_rqst *, struct svc_fh *,
 				loff_t, struct kvec *, int, unsigned long *);
-__be32 		nfsd_write(struct svc_rqst *, struct svc_fh *,struct file *,
+__be32 		nfsd_write(struct svc_rqst *, struct svc_fh *,
 				loff_t, struct kvec *,int, unsigned long *, int *);
 __be32		nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				struct file *file, loff_t offset,
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 14/20] nfsd: hook up nfsd_read to the nfsd_file cache
       [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-10-05 11:02   ` [PATCH v5 13/20] nfsd: hook up nfsd_write to the new nfsd_file cache Jeff Layton
@ 2015-10-05 11:02   ` Jeff Layton
  2015-10-05 11:02   ` [PATCH v5 18/20] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Jeff Layton
  2015-10-08 16:42   ` [PATCH v5 00/20] nfsd: open file caching J. Bruce Fields
  7 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC (permalink / raw)
  To: bfields-uC3wQj2KruNg9hUCZPvPmw
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Al Viro

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/vfs.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index cb1ab146246b..050db266ef80 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1008,20 +1008,15 @@ out_nfserr:
 __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
 {
-	struct file *file;
-	struct raparms	*ra;
-	__be32 err;
-
-	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
-	if (err)
-		return err;
-
-	ra = nfsd_init_raparms(file);
-	err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
-	if (ra)
-		nfsd_put_raparams(file, ra);
-	fput(file);
+	__be32			err;
+	struct nfsd_file	*nf;
 
+	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
+	if (err == nfs_ok) {
+		err = nfsd_vfs_read(rqstp, nf->nf_file, offset, vec, vlen,
+					count);
+		nfsd_file_put(nf);
+	}
 	return err;
 }
 
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 15/20] nfsd: hook nfsd_commit up to the nfsd_file cache
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
                   ` (8 preceding siblings ...)
       [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
@ 2015-10-05 11:02 ` Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 16/20] nfsd: convert nfs4_file->fi_fds array to use nfsd_files Jeff Layton
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

Use cached filps if possible instead of opening a new one every time.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/vfs.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 050db266ef80..571f1000e670 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1055,9 +1055,9 @@ __be32
 nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
                loff_t offset, unsigned long count)
 {
-	struct file	*file;
-	loff_t		end = LLONG_MAX;
-	__be32		err = nfserr_inval;
+	struct nfsd_file	*nf;
+	loff_t			end = LLONG_MAX;
+	__be32			err = nfserr_inval;
 
 	if (offset < 0)
 		goto out;
@@ -1067,12 +1067,12 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			goto out;
 	}
 
-	err = nfsd_open(rqstp, fhp, S_IFREG,
-			NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &file);
+	err = nfsd_file_acquire(rqstp, fhp,
+			NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf);
 	if (err)
 		goto out;
 	if (EX_ISSYNC(fhp->fh_export)) {
-		int err2 = vfs_fsync_range(file, offset, end, 0);
+		int err2 = vfs_fsync_range(nf->nf_file, offset, end, 0);
 
 		if (err2 != -EINVAL)
 			err = nfserrno(err2);
@@ -1080,7 +1080,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			err = nfserr_notsupp;
 	}
 
-	fput(file);
+	nfsd_file_put(nf);
 out:
 	return err;
 }
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 16/20] nfsd: convert nfs4_file->fi_fds array to use nfsd_files
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
                   ` (9 preceding siblings ...)
  2015-10-05 11:02 ` [PATCH v5 15/20] nfsd: hook nfsd_commit up to the nfsd_file cache Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 17/20] nfsd: have nfsd_test_lock use the nfsd_file cache Jeff Layton
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfs4state.c | 23 ++++++++++++-----------
 fs/nfsd/state.h     |  2 +-
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2b73e2885a82..b72fa6816860 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -49,6 +49,7 @@
 
 #include "netns.h"
 #include "pnfs.h"
+#include "filecache.h"
 
 #define NFSDDBG_FACILITY                NFSDDBG_PROC
 
@@ -292,7 +293,7 @@ static struct file *
 __nfs4_get_fd(struct nfs4_file *f, int oflag)
 {
 	if (f->fi_fds[oflag])
-		return get_file(f->fi_fds[oflag]);
+		return get_file(f->fi_fds[oflag]->nf_file);
 	return NULL;
 }
 
@@ -449,17 +450,17 @@ static void __nfs4_file_put_access(struct nfs4_file *fp, int oflag)
 	might_lock(&fp->fi_lock);
 
 	if (atomic_dec_and_lock(&fp->fi_access[oflag], &fp->fi_lock)) {
-		struct file *f1 = NULL;
-		struct file *f2 = NULL;
+		struct nfsd_file *f1 = NULL;
+		struct nfsd_file *f2 = NULL;
 
 		swap(f1, fp->fi_fds[oflag]);
 		if (atomic_read(&fp->fi_access[1 - oflag]) == 0)
 			swap(f2, fp->fi_fds[O_RDWR]);
 		spin_unlock(&fp->fi_lock);
 		if (f1)
-			fput(f1);
+			nfsd_file_put(f1);
 		if (f2)
-			fput(f2);
+			nfsd_file_put(f2);
 	}
 }
 
@@ -3841,7 +3842,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 		struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp,
 		struct nfsd4_open *open)
 {
-	struct file *filp = NULL;
+	struct nfsd_file *nf = NULL;
 	__be32 status;
 	int oflag = nfs4_access_to_omode(open->op_share_access);
 	int access = nfs4_access_to_access(open->op_share_access);
@@ -3877,18 +3878,18 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 
 	if (!fp->fi_fds[oflag]) {
 		spin_unlock(&fp->fi_lock);
-		status = nfsd_open(rqstp, cur_fh, S_IFREG, access, &filp);
+		status = nfsd_file_acquire(rqstp, cur_fh, access, &nf);
 		if (status)
 			goto out_put_access;
 		spin_lock(&fp->fi_lock);
 		if (!fp->fi_fds[oflag]) {
-			fp->fi_fds[oflag] = filp;
-			filp = NULL;
+			fp->fi_fds[oflag] = nf;
+			nf = NULL;
 		}
 	}
 	spin_unlock(&fp->fi_lock);
-	if (filp)
-		fput(filp);
+	if (nf)
+		nfsd_file_put(nf);
 
 	status = nfsd4_truncate(rqstp, cur_fh, open);
 	if (status)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 77fdf4de91ba..473faa436e07 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -491,7 +491,7 @@ struct nfs4_file {
 	};
 	struct list_head	fi_clnt_odstate;
 	/* One each for O_RDONLY, O_WRONLY, O_RDWR: */
-	struct file *		fi_fds[3];
+	struct nfsd_file	*fi_fds[3];
 	/*
 	 * Each open or lock stateid contributes 0-4 to the counts
 	 * below depending on which bits are set in st_access_bitmap:
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 17/20] nfsd: have nfsd_test_lock use the nfsd_file cache
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
                   ` (10 preceding siblings ...)
  2015-10-05 11:02 ` [PATCH v5 16/20] nfsd: convert nfs4_file->fi_fds array to use nfsd_files Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 19/20] nfsd: hook up nfs4_preprocess_stateid_op to " Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 20/20] nfsd: rip out the raparms cache Jeff Layton
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfs4state.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b72fa6816860..4348af408ccb 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5596,11 +5596,11 @@ out:
  */
 static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock)
 {
-	struct file *file;
-	__be32 err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
+	struct nfsd_file *nf;
+	__be32 err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
 	if (!err) {
-		err = nfserrno(vfs_test_lock(file, lock));
-		fput(file);
+		err = nfserrno(vfs_test_lock(nf->nf_file, lock));
+		nfsd_file_put(nf);
 	}
 	return err;
 }
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 18/20] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
       [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-10-05 11:02   ` [PATCH v5 14/20] nfsd: hook up nfsd_read to the " Jeff Layton
@ 2015-10-05 11:02   ` Jeff Layton
  2015-10-08 16:42   ` [PATCH v5 00/20] nfsd: open file caching J. Bruce Fields
  7 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC (permalink / raw)
  To: bfields-uC3wQj2KruNg9hUCZPvPmw
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Al Viro

Have them keep an nfsd_file reference instead of a struct file.
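
One consequence, visible in nfs4_check_file below, is that a caller who
still wants a plain struct file has to take its own file reference
before dropping the nfsd_file reference. A small user-space sketch of
that ordering follows; the fake_* types and helpers are hypothetical and
only model the reference counts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical models of struct file and struct nfsd_file. */
struct fake_file {
	int f_count;			/* references on the open file */
};

struct fake_nf {
	int nf_ref;			/* references on the cache entry */
	struct fake_file *nf_file;	/* the open file it wraps */
};

/* Analogue of get_file(): bump the file's own reference count. */
static struct fake_file *fake_get_file(struct fake_file *f)
{
	f->f_count++;
	return f;
}

/* Analogue of nfsd_file_put(): drop one cache-entry reference. */
static void fake_nf_put(struct fake_nf *nf)
{
	assert(nf->nf_ref > 0);
	nf->nf_ref--;
}

/*
 * Hand the caller its own file reference first, then drop the
 * cache-entry reference that was used to reach the file.
 */
struct fake_file *fake_check_file(struct fake_nf *nf)
{
	struct fake_file *filp = fake_get_file(nf->nf_file);

	fake_nf_put(nf);
	return filp;
}
```

Taking the file reference before the put means the returned pointer
stays valid even if the cache entry is torn down immediately afterwards.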

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfs4layouts.c |  12 ++---
 fs/nfsd/nfs4state.c   | 131 ++++++++++++++++++++++++++------------------------
 fs/nfsd/state.h       |   6 +--
 3 files changed, 76 insertions(+), 73 deletions(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 9ffef06b30d5..1ee1881cf8d0 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -144,8 +144,8 @@ nfsd4_free_layout_stateid(struct nfs4_stid *stid)
 	list_del_init(&ls->ls_perfile);
 	spin_unlock(&fp->fi_lock);
 
-	vfs_setlease(ls->ls_file, F_UNLCK, NULL, (void **)&ls);
-	fput(ls->ls_file);
+	vfs_setlease(ls->ls_file->nf_file, F_UNLCK, NULL, (void **)&ls);
+	nfsd_file_put(ls->ls_file);
 
 	if (ls->ls_recalled)
 		atomic_dec(&ls->ls_stid.sc_file->fi_lo_recalls);
@@ -169,7 +169,7 @@ nfsd4_layout_setlease(struct nfs4_layout_stateid *ls)
 	fl->fl_end = OFFSET_MAX;
 	fl->fl_owner = ls;
 	fl->fl_pid = current->tgid;
-	fl->fl_file = ls->ls_file;
+	fl->fl_file = ls->ls_file->nf_file;
 
 	status = vfs_setlease(fl->fl_file, fl->fl_type, &fl, NULL);
 	if (status) {
@@ -207,13 +207,13 @@ nfsd4_alloc_layout_stateid(struct nfsd4_compound_state *cstate,
 			NFSPROC4_CLNT_CB_LAYOUT);
 
 	if (parent->sc_type == NFS4_DELEG_STID)
-		ls->ls_file = get_file(fp->fi_deleg_file);
+		ls->ls_file = nfsd_file_get(fp->fi_deleg_file);
 	else
 		ls->ls_file = find_any_file(fp);
 	BUG_ON(!ls->ls_file);
 
 	if (nfsd4_layout_setlease(ls)) {
-		fput(ls->ls_file);
+		nfsd_file_put(ls->ls_file);
 		put_nfs4_file(fp);
 		kmem_cache_free(nfs4_layout_stateid_cache, ls);
 		return NULL;
@@ -598,7 +598,7 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)
 
 	argv[0] = "/sbin/nfsd-recall-failed";
 	argv[1] = addr_str;
-	argv[2] = ls->ls_file->f_path.mnt->mnt_sb->s_id;
+	argv[2] = ls->ls_file->nf_file->f_path.mnt->mnt_sb->s_id;
 	argv[3] = NULL;
 
 	error = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4348af408ccb..4d58a6db6e41 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -289,18 +289,18 @@ put_nfs4_file(struct nfs4_file *fi)
 	}
 }
 
-static struct file *
+static struct nfsd_file *
 __nfs4_get_fd(struct nfs4_file *f, int oflag)
 {
 	if (f->fi_fds[oflag])
-		return get_file(f->fi_fds[oflag]->nf_file);
+		return nfsd_file_get(f->fi_fds[oflag]);
 	return NULL;
 }
 
-static struct file *
+static struct nfsd_file *
 find_writeable_file_locked(struct nfs4_file *f)
 {
-	struct file *ret;
+	struct nfsd_file *ret;
 
 	lockdep_assert_held(&f->fi_lock);
 
@@ -310,10 +310,10 @@ find_writeable_file_locked(struct nfs4_file *f)
 	return ret;
 }
 
-static struct file *
+static struct nfsd_file *
 find_writeable_file(struct nfs4_file *f)
 {
-	struct file *ret;
+	struct nfsd_file *ret;
 
 	spin_lock(&f->fi_lock);
 	ret = find_writeable_file_locked(f);
@@ -322,9 +322,10 @@ find_writeable_file(struct nfs4_file *f)
 	return ret;
 }
 
-static struct file *find_readable_file_locked(struct nfs4_file *f)
+static struct nfsd_file *
+find_readable_file_locked(struct nfs4_file *f)
 {
-	struct file *ret;
+	struct nfsd_file *ret;
 
 	lockdep_assert_held(&f->fi_lock);
 
@@ -334,10 +335,10 @@ static struct file *find_readable_file_locked(struct nfs4_file *f)
 	return ret;
 }
 
-static struct file *
+static struct nfsd_file *
 find_readable_file(struct nfs4_file *f)
 {
-	struct file *ret;
+	struct nfsd_file *ret;
 
 	spin_lock(&f->fi_lock);
 	ret = find_readable_file_locked(f);
@@ -346,10 +347,10 @@ find_readable_file(struct nfs4_file *f)
 	return ret;
 }
 
-struct file *
+struct nfsd_file *
 find_any_file(struct nfs4_file *f)
 {
-	struct file *ret;
+	struct nfsd_file *ret;
 
 	spin_lock(&f->fi_lock);
 	ret = __nfs4_get_fd(f, O_RDWR);
@@ -761,16 +762,16 @@ nfs4_inc_and_copy_stateid(stateid_t *dst, struct nfs4_stid *stid)
 
 static void nfs4_put_deleg_lease(struct nfs4_file *fp)
 {
-	struct file *filp = NULL;
+	struct nfsd_file *nf = NULL;
 
 	spin_lock(&fp->fi_lock);
 	if (fp->fi_deleg_file && --fp->fi_delegees == 0)
-		swap(filp, fp->fi_deleg_file);
+		swap(nf, fp->fi_deleg_file);
 	spin_unlock(&fp->fi_lock);
 
-	if (filp) {
-		vfs_setlease(filp, F_UNLCK, NULL, (void **)&fp);
-		fput(filp);
+	if (nf) {
+		vfs_setlease(nf->nf_file, F_UNLCK, NULL, (void **)&fp);
+		nfsd_file_put(nf);
 	}
 }
 
@@ -1062,11 +1063,14 @@ static void nfs4_free_lock_stateid(struct nfs4_stid *stid)
 {
 	struct nfs4_ol_stateid *stp = openlockstateid(stid);
 	struct nfs4_lockowner *lo = lockowner(stp->st_stateowner);
-	struct file *file;
+	struct nfsd_file *nf;
 
-	file = find_any_file(stp->st_stid.sc_file);
-	if (file)
-		filp_close(file, (fl_owner_t)lo);
+	nf = find_any_file(stp->st_stid.sc_file);
+	if (nf) {
+		get_file(nf->nf_file);
+		filp_close(nf->nf_file, (fl_owner_t)lo);
+		nfsd_file_put(nf);
+	}
 	nfs4_free_ol_stateid(stid);
 }
 
@@ -3964,21 +3968,21 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
 {
 	struct nfs4_file *fp = dp->dl_stid.sc_file;
 	struct file_lock *fl;
-	struct file *filp;
+	struct nfsd_file *nf;
 	int status = 0;
 
 	fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
 	if (!fl)
 		return -ENOMEM;
-	filp = find_readable_file(fp);
-	if (!filp) {
+	nf = find_readable_file(fp);
+	if (!nf) {
 		/* We should always have a readable file here */
 		WARN_ON_ONCE(1);
 		locks_free_lock(fl);
 		return -EBADF;
 	}
-	fl->fl_file = filp;
-	status = vfs_setlease(filp, fl->fl_type, &fl, NULL);
+	fl->fl_file = nf->nf_file;
+	status = vfs_setlease(nf->nf_file, fl->fl_type, &fl, NULL);
 	if (fl)
 		locks_free_lock(fl);
 	if (status)
@@ -3996,7 +4000,7 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
 		hash_delegation_locked(dp, fp);
 		goto out_unlock;
 	}
-	fp->fi_deleg_file = filp;
+	fp->fi_deleg_file = nf;
 	fp->fi_delegees = 1;
 	hash_delegation_locked(dp, fp);
 	spin_unlock(&fp->fi_lock);
@@ -4006,7 +4010,7 @@ out_unlock:
 	spin_unlock(&fp->fi_lock);
 	spin_unlock(&state_lock);
 out_fput:
-	fput(filp);
+	nfsd_file_put(nf);
 	return status;
 }
 
@@ -4618,7 +4622,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 	return nfs_ok;
 }
 
-static struct file *
+static struct nfsd_file *
 nfs4_find_file(struct nfs4_stid *s, int flags)
 {
 	if (!s)
@@ -4628,7 +4632,7 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
 	case NFS4_DELEG_STID:
 		if (WARN_ON_ONCE(!s->sc_file->fi_deleg_file))
 			return NULL;
-		return get_file(s->sc_file->fi_deleg_file);
+		return nfsd_file_get(s->sc_file->fi_deleg_file);
 	case NFS4_OPEN_STID:
 	case NFS4_LOCK_STID:
 		if (flags & RD_STATE)
@@ -4657,21 +4661,17 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
 		struct file **filpp, bool *tmp_file, int flags)
 {
 	int acc = (flags & RD_STATE) ? NFSD_MAY_READ : NFSD_MAY_WRITE;
-	struct file *file;
+	struct nfsd_file *nf;
 	__be32 status;
 
-	file = nfs4_find_file(s, flags);
-	if (file) {
+	nf = nfs4_find_file(s, flags);
+	if (nf) {
 		status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
 				acc | NFSD_MAY_OWNER_OVERRIDE);
-		if (status) {
-			fput(file);
-			return status;
-		}
-
-		*filpp = file;
+		if (status)
+			goto out;
 	} else {
-		status = nfsd_open(rqstp, fhp, S_IFREG, acc, filpp);
+		status = nfsd_file_acquire(rqstp, fhp, acc, &nf);
 		if (status)
 			return status;
 
@@ -4679,7 +4679,10 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
 			*tmp_file = true;
 	}
 
-	return 0;
+	*filpp = get_file(nf->nf_file);
+out:
+	nfsd_file_put(nf);
+	return status;
 }
 
 /*
@@ -5413,7 +5416,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	struct nfs4_ol_stateid *lock_stp = NULL;
 	struct nfs4_ol_stateid *open_stp = NULL;
 	struct nfs4_file *fp;
-	struct file *filp = NULL;
+	struct nfsd_file *nf = NULL;
 	struct file_lock *file_lock = NULL;
 	struct file_lock *conflock = NULL;
 	__be32 status = 0;
@@ -5498,8 +5501,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		case NFS4_READ_LT:
 		case NFS4_READW_LT:
 			spin_lock(&fp->fi_lock);
-			filp = find_readable_file_locked(fp);
-			if (filp)
+			nf = find_readable_file_locked(fp);
+			if (nf)
 				get_lock_access(lock_stp, NFS4_SHARE_ACCESS_READ);
 			spin_unlock(&fp->fi_lock);
 			file_lock->fl_type = F_RDLCK;
@@ -5507,8 +5510,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		case NFS4_WRITE_LT:
 		case NFS4_WRITEW_LT:
 			spin_lock(&fp->fi_lock);
-			filp = find_writeable_file_locked(fp);
-			if (filp)
+			nf = find_writeable_file_locked(fp);
+			if (nf)
 				get_lock_access(lock_stp, NFS4_SHARE_ACCESS_WRITE);
 			spin_unlock(&fp->fi_lock);
 			file_lock->fl_type = F_WRLCK;
@@ -5517,14 +5520,14 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 			status = nfserr_inval;
 		goto out;
 	}
-	if (!filp) {
+	if (!nf) {
 		status = nfserr_openmode;
 		goto out;
 	}
 
 	file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(&lock_sop->lo_owner));
 	file_lock->fl_pid = current->tgid;
-	file_lock->fl_file = filp;
+	file_lock->fl_file = nf->nf_file;
 	file_lock->fl_flags = FL_POSIX;
 	file_lock->fl_lmops = &nfsd_posix_mng_ops;
 	file_lock->fl_start = lock->lk_offset;
@@ -5538,7 +5541,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		goto out;
 	}
 
-	err = vfs_lock_file(filp, F_SETLK, file_lock, conflock);
+	err = vfs_lock_file(nf->nf_file, F_SETLK, file_lock, conflock);
 	switch (-err) {
 	case 0: /* success! */
 		nfs4_inc_and_copy_stateid(&lock->lk_resp_stateid, &lock_stp->st_stid);
@@ -5558,8 +5561,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		break;
 	}
 out:
-	if (filp)
-		fput(filp);
+	if (nf)
+		nfsd_file_put(nf);
 	if (lock_stp) {
 		/* Bump seqid manually if the 4.0 replay owner is openowner */
 		if (cstate->replay_owner &&
@@ -5686,7 +5689,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	    struct nfsd4_locku *locku)
 {
 	struct nfs4_ol_stateid *stp;
-	struct file *filp = NULL;
+	struct nfsd_file *nf;
 	struct file_lock *file_lock = NULL;
 	__be32 status;
 	int err;
@@ -5704,8 +5707,8 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 					&stp, nn);
 	if (status)
 		goto out;
-	filp = find_any_file(stp->st_stid.sc_file);
-	if (!filp) {
+	nf = find_any_file(stp->st_stid.sc_file);
+	if (!nf) {
 		status = nfserr_lock_range;
 		goto put_stateid;
 	}
@@ -5713,13 +5716,13 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (!file_lock) {
 		dprintk("NFSD: %s: unable to allocate lock!\n", __func__);
 		status = nfserr_jukebox;
-		goto fput;
+		goto put_file;
 	}
 
 	file_lock->fl_type = F_UNLCK;
 	file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(stp->st_stateowner));
 	file_lock->fl_pid = current->tgid;
-	file_lock->fl_file = filp;
+	file_lock->fl_file = nf->nf_file;
 	file_lock->fl_flags = FL_POSIX;
 	file_lock->fl_lmops = &nfsd_posix_mng_ops;
 	file_lock->fl_start = locku->lu_offset;
@@ -5728,14 +5731,14 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 						locku->lu_length);
 	nfs4_transform_lock_offset(file_lock);
 
-	err = vfs_lock_file(filp, F_SETLK, file_lock, NULL);
+	err = vfs_lock_file(nf->nf_file, F_SETLK, file_lock, NULL);
 	if (err) {
 		dprintk("NFSD: nfs4_locku: vfs_lock_file failed!\n");
 		goto out_nfserr;
 	}
 	nfs4_inc_and_copy_stateid(&locku->lu_stateid, &stp->st_stid);
-fput:
-	fput(filp);
+put_file:
+	nfsd_file_put(nf);
 put_stateid:
 	up_write(&stp->st_rwsem);
 	nfs4_put_stid(&stp->st_stid);
@@ -5747,7 +5750,7 @@ out:
 
 out_nfserr:
 	status = nfserrno(err);
-	goto fput;
+	goto put_file;
 }
 
 /*
@@ -5760,17 +5763,17 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
 {
 	struct file_lock *fl;
 	int status = false;
-	struct file *filp = find_any_file(fp);
+	struct nfsd_file *nf = find_any_file(fp);
 	struct inode *inode;
 	struct file_lock_context *flctx;
 
-	if (!filp) {
+	if (!nf) {
 		/* Any valid lock stateid should have some sort of access */
 		WARN_ON_ONCE(1);
 		return status;
 	}
 
-	inode = file_inode(filp);
+	inode = file_inode(nf->nf_file);
 	flctx = inode->i_flctx;
 
 	if (flctx && !list_empty_careful(&flctx->flc_posix)) {
@@ -5783,7 +5786,7 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
 		}
 		spin_unlock(&flctx->flc_lock);
 	}
-	fput(filp);
+	nfsd_file_put(nf);
 	return status;
 }
 
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 473faa436e07..ee23de10663c 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -501,7 +501,7 @@ struct nfs4_file {
 	 */
 	atomic_t		fi_access[2];
 	u32			fi_share_deny;
-	struct file		*fi_deleg_file;
+	struct nfsd_file	*fi_deleg_file;
 	int			fi_delegees;
 	struct knfsd_fh		fi_fhandle;
 	bool			fi_had_conflict;
@@ -550,7 +550,7 @@ struct nfs4_layout_stateid {
 	spinlock_t			ls_lock;
 	struct list_head		ls_layouts;
 	u32				ls_layout_type;
-	struct file			*ls_file;
+	struct nfsd_file		*ls_file;
 	struct nfsd4_callback		ls_recall;
 	stateid_t			ls_recall_sid;
 	bool				ls_recalled;
@@ -615,7 +615,7 @@ static inline void get_nfs4_file(struct nfs4_file *fi)
 {
 	atomic_inc(&fi->fi_ref);
 }
-struct file *find_any_file(struct nfs4_file *f);
+struct nfsd_file *find_any_file(struct nfs4_file *f);
 
 /* grace period management */
 void nfsd4_end_grace(struct nfsd_net *nn);
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 19/20] nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
                   ` (11 preceding siblings ...)
  2015-10-05 11:02 ` [PATCH v5 17/20] nfsd: have nfsd_test_lock use the nfsd_file cache Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
  2015-10-05 11:02 ` [PATCH v5 20/20] nfsd: rip out the raparms cache Jeff Layton
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

Have nfs4_preprocess_stateid_op pass back a nfsd_file instead of a filp.
Since we now presume that the struct file will be persistent in most
cases, we can stop fiddling with the raparms in the read code. This
also means that we don't really care about the rd_tmp_file field
anymore.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfs4proc.c  | 32 ++++++++++++++++----------------
 fs/nfsd/nfs4state.c | 24 ++++++++++--------------
 fs/nfsd/nfs4xdr.c   | 16 +++++-----------
 fs/nfsd/state.h     |  2 +-
 fs/nfsd/xdr4.h      | 15 +++++++--------
 5 files changed, 39 insertions(+), 50 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a9f096c7e99f..7e21763a35f2 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -758,7 +758,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 {
 	__be32 status;
 
-	read->rd_filp = NULL;
+	read->rd_nf = NULL;
 	if (read->rd_offset >= OFFSET_MAX)
 		return nfserr_inval;
 
@@ -775,7 +775,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	/* check stateid */
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &read->rd_stateid,
-			RD_STATE, &read->rd_filp, &read->rd_tmp_file);
+			RD_STATE, &read->rd_nf);
 	if (status) {
 		dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
 		goto out;
@@ -921,7 +921,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
 		status = nfs4_preprocess_stateid_op(rqstp, cstate,
-			&setattr->sa_stateid, WR_STATE, NULL, NULL);
+			&setattr->sa_stateid, WR_STATE, NULL);
 		if (status) {
 			dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
 			return status;
@@ -977,7 +977,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	    struct nfsd4_write *write)
 {
 	stateid_t *stateid = &write->wr_stateid;
-	struct file *filp = NULL;
+	struct nfsd_file *nf = NULL;
 	__be32 status = nfs_ok;
 	unsigned long cnt;
 	int nvecs;
@@ -986,7 +986,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		return nfserr_inval;
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, stateid, WR_STATE,
-			&filp, NULL);
+			&nf);
 	if (status) {
 		dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
 		return status;
@@ -999,10 +999,10 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	nvecs = fill_in_write_vector(rqstp->rq_vec, write);
 	WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec));
 
-	status = nfsd_vfs_write(rqstp, &cstate->current_fh, filp,
+	status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf->nf_file,
 				write->wr_offset, rqstp->rq_vec, nvecs, &cnt,
 				&write->wr_how_written);
-	fput(filp);
+	nfsd_file_put(nf);
 
 	write->wr_bytes_written = cnt;
 
@@ -1014,21 +1014,21 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		struct nfsd4_fallocate *fallocate, int flags)
 {
 	__be32 status = nfserr_notsupp;
-	struct file *file;
+	struct nfsd_file *nf;
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate,
 					    &fallocate->falloc_stateid,
-					    WR_STATE, &file, NULL);
+					    WR_STATE, &nf);
 	if (status != nfs_ok) {
 		dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n");
 		return status;
 	}
 
-	status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, file,
+	status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, nf->nf_file,
 				     fallocate->falloc_offset,
 				     fallocate->falloc_length,
 				     flags);
-	fput(file);
+	nfsd_file_put(nf);
 	return status;
 }
 
@@ -1053,11 +1053,11 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 {
 	int whence;
 	__be32 status;
-	struct file *file;
+	struct nfsd_file *nf;
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate,
 					    &seek->seek_stateid,
-					    RD_STATE, &file, NULL);
+					    RD_STATE, &nf);
 	if (status) {
 		dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n");
 		return status;
@@ -1079,14 +1079,14 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	 * Note:  This call does change file->f_pos, but nothing in NFSD
 	 *        should ever file->f_pos.
 	 */
-	seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence);
+	seek->seek_pos = vfs_llseek(nf->nf_file, seek->seek_offset, whence);
 	if (seek->seek_pos < 0)
 		status = nfserrno(seek->seek_pos);
-	else if (seek->seek_pos >= i_size_read(file_inode(file)))
+	else if (seek->seek_pos >= i_size_read(file_inode(nf->nf_file)))
 		seek->seek_eof = true;
 
 out:
-	fput(file);
+	nfsd_file_put(nf);
 	return status;
 }
 
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4d58a6db6e41..477fe34567d0 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4658,7 +4658,7 @@ nfs4_check_olstateid(struct svc_fh *fhp, struct nfs4_ol_stateid *ols, int flags)
 
 static __be32
 nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
-		struct file **filpp, bool *tmp_file, int flags)
+		struct nfsd_file **nfp, int flags)
 {
 	int acc = (flags & RD_STATE) ? NFSD_MAY_READ : NFSD_MAY_WRITE;
 	struct nfsd_file *nf;
@@ -4668,20 +4668,18 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
 	if (nf) {
 		status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
 				acc | NFSD_MAY_OWNER_OVERRIDE);
-		if (status)
+		if (status) {
+			nfsd_file_put(nf);
 			goto out;
+		}
 	} else {
 		status = nfsd_file_acquire(rqstp, fhp, acc, &nf);
 		if (status)
 			return status;
-
-		if (tmp_file)
-			*tmp_file = true;
 	}
 
-	*filpp = get_file(nf->nf_file);
+	*nfp = nf;
 out:
-	nfsd_file_put(nf);
 	return status;
 }
 
@@ -4691,7 +4689,7 @@ out:
 __be32
 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-		int flags, struct file **filpp, bool *tmp_file)
+		int flags, struct nfsd_file **nfp)
 {
 	struct svc_fh *fhp = &cstate->current_fh;
 	struct inode *ino = d_inode(fhp->fh_dentry);
@@ -4700,10 +4698,8 @@ nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 	struct nfs4_stid *s = NULL;
 	__be32 status;
 
-	if (filpp)
-		*filpp = NULL;
-	if (tmp_file)
-		*tmp_file = false;
+	if (nfp)
+		*nfp = NULL;
 
 	if (grace_disallows_io(net, ino))
 		return nfserr_grace;
@@ -4740,8 +4736,8 @@ nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 	status = nfs4_check_fh(fhp, s);
 
 done:
-	if (!status && filpp)
-		status = nfs4_check_file(rqstp, fhp, s, filpp, tmp_file, flags);
+	if (status == nfs_ok && nfp)
+		status = nfs4_check_file(rqstp, fhp, s, nfp, flags);
 out:
 	if (s)
 		nfs4_put_stid(s);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 325521ce389a..92e5e8f884d0 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -49,6 +49,7 @@
 #include "cache.h"
 #include "netns.h"
 #include "pnfs.h"
+#include "filecache.h"
 
 #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
 #include <linux/security.h>
@@ -3460,14 +3461,14 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
 {
 	unsigned long maxcount;
 	struct xdr_stream *xdr = &resp->xdr;
-	struct file *file = read->rd_filp;
+	struct file *file;
 	int starting_len = xdr->buf->len;
-	struct raparms *ra = NULL;
 	__be32 *p;
 
 	if (nfserr)
 		goto out;
 
+	file = read->rd_nf->nf_file;
 	p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
 	if (!p) {
 		WARN_ON_ONCE(test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags));
@@ -3487,24 +3488,17 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
 			 (xdr->buf->buflen - xdr->buf->len));
 	maxcount = min_t(unsigned long, maxcount, read->rd_length);
 
-	if (read->rd_tmp_file)
-		ra = nfsd_init_raparms(file);
-
 	if (file->f_op->splice_read &&
 	    test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
 		nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
 	else
 		nfserr = nfsd4_encode_readv(resp, read, file, maxcount);
 
-	if (ra)
-		nfsd_put_raparams(file, ra);
-
 	if (nfserr)
 		xdr_truncate_encode(xdr, starting_len);
-
 out:
-	if (file)
-		fput(file);
+	if (read->rd_nf)
+		nfsd_file_put(read->rd_nf);
 	return nfserr;
 }
 
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index ee23de10663c..732fbb3b5ef1 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -579,7 +579,7 @@ struct nfsd_net;
 
 extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-		int flags, struct file **filp, bool *tmp_file);
+		int flags, struct nfsd_file **filp);
 __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		     stateid_t *stateid, unsigned char typemask,
 		     struct nfs4_stid **s, struct nfsd_net *nn);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index ce7362c88b48..c167d7a5b0e6 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -268,15 +268,14 @@ struct nfsd4_open_downgrade {
 
 
 struct nfsd4_read {
-	stateid_t	rd_stateid;         /* request */
-	u64		rd_offset;          /* request */
-	u32		rd_length;          /* request */
-	int		rd_vlen;
-	struct file     *rd_filp;
-	bool		rd_tmp_file;
+	stateid_t		rd_stateid;         /* request */
+	u64			rd_offset;          /* request */
+	u32			rd_length;          /* request */
+	int			rd_vlen;
+	struct nfsd_file	*rd_nf;
 	
-	struct svc_rqst *rd_rqstp;          /* response */
-	struct svc_fh * rd_fhp;             /* response */
+	struct svc_rqst		*rd_rqstp;          /* response */
+	struct svc_fh		*rd_fhp;             /* response */
 };
 
 struct nfsd4_readdir {
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 20/20] nfsd: rip out the raparms cache
  2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
                   ` (12 preceding siblings ...)
  2015-10-05 11:02 ` [PATCH v5 19/20] nfsd: hook up nfs4_preprocess_stateid_op to " Jeff Layton
@ 2015-10-05 11:02 ` Jeff Layton
  13 siblings, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2015-10-05 11:02 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs, linux-fsdevel, Al Viro

Nothing uses it anymore.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfssvc.c |  14 +-----
 fs/nfsd/vfs.c    | 147 -------------------------------------------------------
 fs/nfsd/vfs.h    |   6 ---
 3 files changed, 1 insertion(+), 166 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d816bb3faa6e..d1034d119afb 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -216,18 +216,9 @@ static int nfsd_startup_generic(int nrservs)
 	if (nfsd_users++)
 		return 0;
 
-	/*
-	 * Readahead param cache - will no-op if it already exists.
-	 * (Note therefore results will be suboptimal if number of
-	 * threads is modified after nfsd start.)
-	 */
-	ret = nfsd_racache_init(2*nrservs);
-	if (ret)
-		goto dec_users;
-
 	ret = nfsd_file_cache_init();
 	if (ret)
-		goto out_racache;
+		goto dec_users;
 
 	ret = nfs4_state_start();
 	if (ret)
@@ -236,8 +227,6 @@ static int nfsd_startup_generic(int nrservs)
 
 out_file_cache:
 	nfsd_file_cache_shutdown();
-out_racache:
-	nfsd_racache_shutdown();
 dec_users:
 	nfsd_users--;
 	return ret;
@@ -250,7 +239,6 @@ static void nfsd_shutdown_generic(void)
 
 	nfs4_state_shutdown();
 	nfsd_file_cache_shutdown();
-	nfsd_racache_shutdown();
 }
 
 static bool nfsd_needs_lockd(void)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 571f1000e670..aafb9b00b767 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -46,34 +46,6 @@
 
 #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
 
-
-/*
- * This is a cache of readahead params that help us choose the proper
- * readahead strategy. Initially, we set all readahead parameters to 0
- * and let the VFS handle things.
- * If you increase the number of cached files very much, you'll need to
- * add a hash table here.
- */
-struct raparms {
-	struct raparms		*p_next;
-	unsigned int		p_count;
-	ino_t			p_ino;
-	dev_t			p_dev;
-	int			p_set;
-	struct file_ra_state	p_ra;
-	unsigned int		p_hindex;
-};
-
-struct raparm_hbucket {
-	struct raparms		*pb_head;
-	spinlock_t		pb_lock;
-} ____cacheline_aligned_in_smp;
-
-#define RAPARM_HASH_BITS	4
-#define RAPARM_HASH_SIZE	(1<<RAPARM_HASH_BITS)
-#define RAPARM_HASH_MASK	(RAPARM_HASH_SIZE-1)
-static struct raparm_hbucket	raparm_hash[RAPARM_HASH_SIZE];
-
 /* 
  * Called from nfsd_lookup and encode_dirent. Check if we have crossed 
  * a mount point.
@@ -749,65 +721,6 @@ nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 	return err;
 }
 
-struct raparms *
-nfsd_init_raparms(struct file *file)
-{
-	struct inode *inode = file_inode(file);
-	dev_t dev = inode->i_sb->s_dev;
-	ino_t ino = inode->i_ino;
-	struct raparms	*ra, **rap, **frap = NULL;
-	int depth = 0;
-	unsigned int hash;
-	struct raparm_hbucket *rab;
-
-	hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK;
-	rab = &raparm_hash[hash];
-
-	spin_lock(&rab->pb_lock);
-	for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) {
-		if (ra->p_ino == ino && ra->p_dev == dev)
-			goto found;
-		depth++;
-		if (ra->p_count == 0)
-			frap = rap;
-	}
-	depth = nfsdstats.ra_size;
-	if (!frap) {	
-		spin_unlock(&rab->pb_lock);
-		return NULL;
-	}
-	rap = frap;
-	ra = *frap;
-	ra->p_dev = dev;
-	ra->p_ino = ino;
-	ra->p_set = 0;
-	ra->p_hindex = hash;
-found:
-	if (rap != &rab->pb_head) {
-		*rap = ra->p_next;
-		ra->p_next   = rab->pb_head;
-		rab->pb_head = ra;
-	}
-	ra->p_count++;
-	nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
-	spin_unlock(&rab->pb_lock);
-
-	if (ra->p_set)
-		file->f_ra = ra->p_ra;
-	return ra;
-}
-
-void nfsd_put_raparams(struct file *file, struct raparms *ra)
-{
-	struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
-
-	spin_lock(&rab->pb_lock);
-	ra->p_ra = file->f_ra;
-	ra->p_set = 1;
-	ra->p_count--;
-	spin_unlock(&rab->pb_lock);
-}
-
 /*
  * Grab and keep cached pages associated with a file in the svc_rqst
  * so that they can be passed to the network sendmsg/sendpage routines
@@ -2017,63 +1930,3 @@ nfsd_permission(struct svc_rqst *rqstp, struct svc_export *exp,
 
 	return err? nfserrno(err) : 0;
 }
-
-void
-nfsd_racache_shutdown(void)
-{
-	struct raparms *raparm, *last_raparm;
-	unsigned int i;
-
-	dprintk("nfsd: freeing readahead buffers.\n");
-
-	for (i = 0; i < RAPARM_HASH_SIZE; i++) {
-		raparm = raparm_hash[i].pb_head;
-		while(raparm) {
-			last_raparm = raparm;
-			raparm = raparm->p_next;
-			kfree(last_raparm);
-		}
-		raparm_hash[i].pb_head = NULL;
-	}
-}
-/*
- * Initialize readahead param cache
- */
-int
-nfsd_racache_init(int cache_size)
-{
-	int	i;
-	int	j = 0;
-	int	nperbucket;
-	struct raparms **raparm = NULL;
-
-
-	if (raparm_hash[0].pb_head)
-		return 0;
-	nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE);
-	nperbucket = max(2, nperbucket);
-	cache_size = nperbucket * RAPARM_HASH_SIZE;
-
-	dprintk("nfsd: allocating %d readahead buffers.\n", cache_size);
-
-	for (i = 0; i < RAPARM_HASH_SIZE; i++) {
-		spin_lock_init(&raparm_hash[i].pb_lock);
-
-		raparm = &raparm_hash[i].pb_head;
-		for (j = 0; j < nperbucket; j++) {
-			*raparm = kzalloc(sizeof(struct raparms), GFP_KERNEL);
-			if (!*raparm)
-				goto out_nomem;
-			raparm = &(*raparm)->p_next;
-		}
-		*raparm = NULL;
-	}
-
-	nfsdstats.ra_size = cache_size;
-	return 0;
-
-out_nomem:
-	dprintk("nfsd: kmalloc failed, freeing readahead buffers\n");
-	nfsd_racache_shutdown();
-	return -ENOMEM;
-}
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 80692e06302d..1efccde4bf9a 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -39,8 +39,6 @@
 typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);
 
 /* nfsd/vfs.c */
-int		nfsd_racache_init(int);
-void		nfsd_racache_shutdown(void);
 int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
 		                struct svc_export **expp);
 __be32		nfsd_lookup(struct svc_rqst *, struct svc_fh *,
@@ -74,7 +72,6 @@ __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
 				int, struct file **);
 __be32		nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t,
 				int, struct file **);
-struct raparms;
 __be32		nfsd_splice_read(struct svc_rqst *,
 				struct file *, loff_t, unsigned long *);
 __be32		nfsd_readv(struct file *, loff_t, struct kvec *, int,
@@ -107,9 +104,6 @@ __be32		nfsd_statfs(struct svc_rqst *, struct svc_fh *,
 __be32		nfsd_permission(struct svc_rqst *, struct svc_export *,
 				struct dentry *, int);
 
-struct raparms *nfsd_init_raparms(struct file *file);
-void		nfsd_put_raparams(struct file *file, struct raparms *ra);
-
 static inline int fh_want_write(struct svc_fh *fh)
 {
 	int ret = mnt_want_write(fh->fh_export->ex_path.mnt);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 01/20] list_lru: add list_lru_rotate
       [not found]   ` <1444042962-6947-2-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
@ 2015-10-05 21:47     ` Dave Chinner
  2015-10-06 11:43       ` Jeff Layton
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2015-10-05 21:47 UTC (permalink / raw)
  To: Jeff Layton
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Al Viro

On Mon, Oct 05, 2015 at 07:02:23AM -0400, Jeff Layton wrote:
> Add a function that can move an entry to the MRU end of the list.
> 
> Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
> Reviewed-by: Vladimir Davydov <vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> Signed-off-by: Jeff Layton <jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>

Having read through patch 10 (nfsd: add a new struct file caching
facility to nfsd) that uses this function, I think it is unnecessary
as its usage is incorrect from the perspective of the list_lru
shrinker management.

What you are attempting to do is rotate the object to the tail of
the LRU when the last reference is dropped, so that it gets a full
trip through the LRU before being reclaimed by the shrinker. And to
ensure this "works", the scan from the shrinker checks for reference
counts and skips the item being isolated (i.e. returns LRU_SKIP) and
so leaves it in its place in the LRU.

i.e. you're attempting to manage LRU-ness of the list yourself when,
in fact, the list_lru infrastructure does this and doesn't have the
subtle bugs your version has. By trying to manage it yourself, the
list_lru lists are no longer sorted into memory pressure driven
LRU order.

e.g. your manual rotation technique means if there are nr_to_walk
referenced items at the head of the list, the shrinker will skip
them all and do nothing, even though there are reclaimable objects
further down the list. i.e. it can't do any reclaim because it
doesn't sort the list into LRU order any more.

This comes from using LRU_SKIP improperly. LRU_SKIP is there for
objects that we can't lock in the isolate callback due to lock
inversion issues (e.g. see dentry_lru_isolate()), and so we need to
look at it again on the next scan pass. Hence it gets left in place.

However, if we can lock the item and peer at its reference counts
safely and we decide that we cannot reclaim it because it is
referenced, the isolate callback should be returning LRU_ROTATE
to move the referenced item to the tail of the list. (Again, see
dentry_lru_isolate() for an example.) This means that
the next nr_to_walk scan of the list will not rescan that item and
skip it again (unless the list is very short), but will instead scan
items that it hasn't yet reached.

This avoids the "shrinker does nothing due to skipped items at the
head of the list" problem, and makes the LRU function as an actual
LRU. i.e.  referenced items all cluster towards the tail of the LRU
under memory pressure and the head of the LRU contains the
reclaimable objects.

So I think the correct solution is to use LRU_ROTATE correctly
rather than try to manage the LRU list order externally like this.
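
A rough userspace sketch of the difference, for illustration only. None
of this is kernel code: the sim_* names, the array-backed list and the
two isolate callbacks are made up here, and only the three return values
are meant to mirror LRU_REMOVED, LRU_ROTATE and LRU_SKIP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's enum lru_status values. */
enum sim_lru_status { SIM_LRU_REMOVED, SIM_LRU_ROTATE, SIM_LRU_SKIP };

struct sim_item {
	int  id;
	bool referenced;	/* still in use, so it cannot be reclaimed */
};

#define SIM_MAX 16

struct sim_lru {
	struct sim_item items[SIM_MAX];	/* index 0 is the head of the LRU */
	size_t nr;
};

typedef enum sim_lru_status (*sim_isolate_fn)(struct sim_item *);

/* The pattern being criticised: leave referenced items where they are. */
static enum sim_lru_status isolate_skip(struct sim_item *it)
{
	return it->referenced ? SIM_LRU_SKIP : SIM_LRU_REMOVED;
}

/* The suggested pattern: send referenced items to the tail. */
static enum sim_lru_status isolate_rotate(struct sim_item *it)
{
	return it->referenced ? SIM_LRU_ROTATE : SIM_LRU_REMOVED;
}

/* Put nr_busy referenced items at the head, then nr_idle reclaimable ones. */
static void sim_lru_fill(struct sim_lru *lru, size_t nr_busy, size_t nr_idle)
{
	size_t i;

	assert(nr_busy + nr_idle <= SIM_MAX);
	lru->nr = nr_busy + nr_idle;
	for (i = 0; i < lru->nr; i++) {
		lru->items[i].id = (int)i;
		lru->items[i].referenced = i < nr_busy;
	}
}

/* Visit at most nr_to_walk items from the head; return how many were freed. */
static size_t sim_lru_walk(struct sim_lru *lru, sim_isolate_fn isolate,
			   size_t nr_to_walk)
{
	size_t pos = 0, freed = 0;

	for (; nr_to_walk > 0 && pos < lru->nr; nr_to_walk--) {
		struct sim_item it = lru->items[pos];
		size_t rest = lru->nr - pos - 1;

		switch (isolate(&lru->items[pos])) {
		case SIM_LRU_REMOVED:
			memmove(&lru->items[pos], &lru->items[pos + 1],
				rest * sizeof(it));
			lru->nr--;
			freed++;
			break;
		case SIM_LRU_ROTATE:
			/* Close the gap, then re-add the item at the tail. */
			memmove(&lru->items[pos], &lru->items[pos + 1],
				rest * sizeof(it));
			lru->items[lru->nr - 1] = it;
			break;
		case SIM_LRU_SKIP:
			pos++;	/* item keeps its place in the list */
			break;
		}
	}
	return freed;
}
```

With four referenced items ahead of four reclaimable ones and
nr_to_walk = 4, the skip variant frees nothing no matter how many times
it is called, while the rotate variant moves the referenced items out of
the way on the first pass and reaches the reclaimable ones on the second.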

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 01/20] list_lru: add list_lru_rotate
  2015-10-05 21:47     ` Dave Chinner
@ 2015-10-06 11:43       ` Jeff Layton
       [not found]         ` <20151006074341.0e2f796e-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Layton @ 2015-10-06 11:43 UTC (permalink / raw)
  To: Dave Chinner
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Al Viro

On Tue, 6 Oct 2015 08:47:17 +1100
Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org> wrote:

> On Mon, Oct 05, 2015 at 07:02:23AM -0400, Jeff Layton wrote:
> > Add a function that can move an entry to the MRU end of the list.
> > 
> > Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> > Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
> > Reviewed-by: Vladimir Davydov <vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> > Signed-off-by: Jeff Layton <jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
> 
> Having read through patch 10 (nfsd: add a new struct file caching
> facility to nfsd) that uses this function, I think it is unnecessary
> as it's usage is incorrect from the perspective of the list_lru
> shrinker management.
> 
> What you are attempting to do is rotate the object to the tail of
> the LRU when the last reference is dropped, so that it gets a full
> trip through the LRU before being reclaimed by the shrinker. And to
> ensure this "works", the scan from the shrinker checks for reference
> counts and skip the item being isolated (i.e. return LRU_SKIP) and
> so leave it in it's place in the LRU.
> 
> i.e. you're attempting to manage LRU-ness of the list yourself when,
> in fact, the list_lru infrastructure does this and doesn't have the
> subtle bugs your version has. By trying to manage it yourself, the
> list_lru lists are no longer sorted into memory pressure driven
> LRU order.
> 
> e.g. your manual rotation technique means if there are nr_to_walk
> referenced items at the head of the list, the shrinker will skip
> them all and do nothing, even though there are reclaimable objects
> further down the list. i.e. it can't do any reclaim because it
> doesn't sort the list into LRU order any more.
> 
> This comes from using LRU_SKIP improperly. LRU_SKIP is there for
> objects that we can't lock in the isolate callback due to lock
> inversion issues (e.g. see dentry_lru_isolate()), and so we need to
> look at it again on the next scan pass. hence it gets left in place.
> 
> However, if we can lock the item and peer at it's reference counts
> safely and we decide that we cannot reclaim it because it is
> referenced, the isolate callback should be returning LRU_ROTATE
> to move the referenced item to the tail of the list. (Again, see
> dentry_lru_isolate() for an example.) The means that
> the next nr_to_walk scan of the list will not rescan that item and
> skip it again (unless the list is very short), but will instead scan
> items that it hasn't yet reached.
> 
> This avoids the "shrinker does nothing due to skipped items at the
> head of the list" problem, and makes the LRU function as an actual
> LRU. i.e.  referenced items all cluster towards the tail of the LRU
> under memory pressure and the head of the LRU contains the
> reclaimable objects.
> 
> So I think the correct solution is to use LRU_ROTATE correctly
> rather than try to manage the LRU list order externally like this.
> 

Thanks for looking, Dave. Ok, fair enough.

I grafted the LRU list stuff on after I did the original set, and I
think the way I designed the refcounting doesn't really work very well
with it. It has been a while since I added that in, but I do remember
struggling a bit with lock inversion problems trying to do it the more
standard way. It's solvable with a nfsd_file spinlock, but I wanted
to avoid that -- still maybe it's the best way.

What I don't quite get conceptually is how the list_lru stuff really
works...

Looking at the dcache's usage, a dentry is only added to the list (via
dentry_lru_add) from dput, and only removed from it when you're
shrinking the dcache or from __dentry_kill. It will rotate entries to
the end of the list via
LRU_ROTATE from the shrinker callback if DCACHE_REFERENCED was set, but
I don't see how you end up with stuff at the end of the list otherwise.

So, the dcache's LRU list doesn't really seem to keep the entries in LRU
order at all. It just prunes a number of entries that haven't been used
since the last time the shrinker callback was called, and the rest end
up staying on the list in whatever order they were originally added.

So...

dentry1			dentry2
allocated
dput
			allocated
			dput

found
dput again
(maybe many more times)

Now, the shrinker runs once and skips both because DCACHE_REFERENCED is
set. It then runs again later and prunes dentry1 before dentry2 even
though dentry1 has been used many times since dentry2 was last used.

Am I missing something in how this works?
-- 
Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 01/20] list_lru: add list_lru_rotate
       [not found]         ` <20151006074341.0e2f796e-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
@ 2015-10-07  1:09           ` Dave Chinner
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Chinner @ 2015-10-07  1:09 UTC (permalink / raw)
  To: Jeff Layton
  Cc: bfields-uC3wQj2KruNg9hUCZPvPmw, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Al Viro

On Tue, Oct 06, 2015 at 07:43:41AM -0400, Jeff Layton wrote:
> On Tue, 6 Oct 2015 08:47:17 +1100
> Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org> wrote:
> 
> > On Mon, Oct 05, 2015 at 07:02:23AM -0400, Jeff Layton wrote:
> > > Add a function that can move an entry to the MRU end of the list.
> > > 
> > > Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> > > Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
> > > Reviewed-by: Vladimir Davydov <vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> > > Signed-off-by: Jeff Layton <jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
> > 
> > Having read through patch 10 (nfsd: add a new struct file caching
> > facility to nfsd) that uses this function, I think it is unnecessary
> > as it's usage is incorrect from the perspective of the list_lru
> > shrinker management.
> > 
> > What you are attempting to do is rotate the object to the tail of
> > the LRU when the last reference is dropped, so that it gets a full
> > trip through the LRU before being reclaimed by the shrinker. And to
> > ensure this "works", the scan from the shrinker checks for reference
> > counts and skip the item being isolated (i.e. return LRU_SKIP) and
> > so leave it in it's place in the LRU.
> > 
> > i.e. you're attempting to manage LRU-ness of the list yourself when,
> > in fact, the list_lru infrastructure does this and doesn't have the
> > subtle bugs your version has. By trying to manage it yourself, the
> > list_lru lists are no longer sorted into memory pressure driven
> > LRU order.
> > 
> > e.g. your manual rotation technique means if there are nr_to_walk
> > referenced items at the head of the list, the shrinker will skip
> > them all and do nothing, even though there are reclaimable objects
> > further down the list. i.e. it can't do any reclaim because it
> > doesn't sort the list into LRU order any more.
> > 
> > This comes from using LRU_SKIP improperly. LRU_SKIP is there for
> > objects that we can't lock in the isolate callback due to lock
> > inversion issues (e.g. see dentry_lru_isolate()), and so we need to
> > look at it again on the next scan pass. hence it gets left in place.
> > 
> > However, if we can lock the item and peer at it's reference counts
> > safely and we decide that we cannot reclaim it because it is
> > referenced, the isolate callback should be returning LRU_ROTATE
> > to move the referenced item to the tail of the list. (Again, see
> > dentry_lru_isolate() for an example.) The means that
> > the next nr_to_walk scan of the list will not rescan that item and
> > skip it again (unless the list is very short), but will instead scan
> > items that it hasn't yet reached.
> > 
> > This avoids the "shrinker does nothing due to skipped items at the
> > head of the list" problem, and makes the LRU function as an actual
> > LRU. i.e.  referenced items all cluster towards the tail of the LRU
> > under memory pressure and the head of the LRU contains the
> > reclaimable objects.
> > 
> > So I think the correct solution is to use LRU_ROTATE correctly
> > rather than try to manage the LRU list order externally like this.
> > 
> 
> Thanks for looking, Dave. Ok, fair enough.
> 
> I grafted the LRU list stuff on after I did the original set, and I
> think the way I designed the refcounting doesn't really work very well
> with it. It has been a while since I added that in, but I do remember
> struggling a bit with lock inversion problems trying to do it the more
> standard way. It's solvable with a nfsd_file spinlock, but I wanted
> to avoid that -- still maybe it's the best way.
> 
> What I don't quite get conceptually is how the list_lru stuff really
> works...
> 
> Looking at the dcache's usage, dentry_lru_add is only called from dput
> and only removed from the list when you're shrinking the dcache or from
> __dentry_kill. It will rotate entries to the end of the list via
> LRU_ROTATE from the shrinker callback if DCACHE_REFERENCED was set, but
> I don't see how you end up with stuff at the end of the list otherwise.

The LRU lists are managed lazily to keep overhead down. You add them
to the list the first time the object becomes unreferenced, and then
don't remove it until the object is reclaimed.

This means that when you do repeated "lookup, grab first reference,
drop last reference" operations on an object, there is no LRU list
management overhead. You don't touch the list, you don't touch the
locks, etc. All you touch is the referenced flag in the object and
when memory pressure occurs the object will then be rotated.
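
As a minimal userspace model of that lazy scheme (a sketch only: the
lazy_* names and the list_ops counter are invented here, and the real
dcache code has locking and state that this leaves out):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object with just the state the lazy scheme needs. */
struct lazy_obj {
	int  refcount;
	bool referenced;	/* "used since the shrinker last looked" */
	bool on_lru;		/* already linked into the LRU list */
};

/* Counts how often the (imaginary) LRU list and its lock were touched. */
struct lazy_lru_stats {
	unsigned long list_ops;
};

static void lazy_get(struct lazy_obj *o)
{
	o->refcount++;
}

static void lazy_put(struct lazy_obj *o, struct lazy_lru_stats *st)
{
	assert(o->refcount > 0);
	if (--o->refcount > 0)
		return;

	/* Last reference dropped: only a flag write in the common case. */
	o->referenced = true;
	if (!o->on_lru) {
		/* First time unreferenced: the one and only list insertion. */
		o->on_lru = true;
		st->list_ops++;
	}
}

/* Repeated "grab first reference, drop last reference" cycles. */
static void lazy_churn(struct lazy_obj *o, struct lazy_lru_stats *st,
		       int cycles)
{
	int i;

	for (i = 0; i < cycles; i++) {
		lazy_get(o);
		lazy_put(o, st);
	}
}
```

A thousand get/put cycles on one object cost a single list operation in
this model; everything after the first put only sets the flag.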

> So, the dcache's LRU list doesn't really seem to keep the entries in LRU
> order at all. It just prunes a number of entries that haven't been used
> since the last time the shrinker callback was called, and the rest end
> up staying on the list in whatever order they were originally added.
> So...
> 
> dentry1			dentry2
> allocated
> dput
> 			allocated
> 			dput
> 
> found
> dput again
> (maybe many more times)
> 
> Now, the shrinker runs once and skips both because DCACHE_REFERENCED is
> set. It then runs again later and prunes dentry1 before dentry2 even
> though dentry1 has been used many more times than dentry2 has.
> 
> Am I missing something in how this works?

Yes - the frame of reference. When you look at individual cases like
this, it's only "roughly LRU". However, when you scale it up
this small "inaccuracy" turns into noise. Put a thousand entries on
the LRU, and these two dentries don't get reclaimed until 998 others
are reclaimed. Whether d1 or d2 gets reclaimed first really doesn't
matter.

Also, the list_lru code needs to scale to tens of millions
of objects in the LRU and turning over hundreds of thousands of
objects every second, so little inaccuracies really don't matter at
this level - performance and scalability are much more important.

Further, the list_lru is not a true global LRU list at all. It's a
segmented LRU, with separate LRUs for each node or memcg in the
machine. So the LRU really isn't a global LRU at all, it's a bunch
of isolated LRUs designed to allow the mm/ subsystem to do NUMA and
memcg aware object reclaim...
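
To illustrate the segmentation, here is a small user-space sketch that
keeps one FIFO per node and reclaims from a single node at a time. It
assumes a fixed, hypothetical node count and ignores memcgs entirely;
none of the names are list_lru APIs.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 2	/* hypothetical node count, for illustration only */

struct item {
	int node;		/* node the object's memory lives on */
	struct item *next;
};

struct seg_lru {
	struct item *head[NR_NODES];	/* oldest item on each node */
	struct item **tailp[NR_NODES];	/* where the next item gets linked */
	size_t nr[NR_NODES];		/* items currently on each node's list */
};

static void seg_lru_init(struct seg_lru *l)
{
	for (int n = 0; n < NR_NODES; n++) {
		l->head[n] = NULL;
		l->tailp[n] = &l->head[n];
		l->nr[n] = 0;
	}
}

/* An item only ever joins the list that belongs to its own node. */
static void seg_lru_add(struct seg_lru *l, struct item *it)
{
	assert(it->node >= 0 && it->node < NR_NODES);
	it->next = NULL;
	*l->tailp[it->node] = it;
	l->tailp[it->node] = &it->next;
	l->nr[it->node]++;
}

/* Pop the oldest item of one node; the other nodes' lists are untouched. */
static struct item *seg_lru_reclaim_one(struct seg_lru *l, int node)
{
	struct item *it = l->head[node];

	if (!it)
		return NULL;
	l->head[node] = it->next;
	if (!l->head[node])
		l->tailp[node] = &l->head[node];
	l->nr[node]--;
	return it;
}
```

If items a (node 0), b (node 1) and c (node 0) are added in that order,
reclaim on node 1 returns b even though a is older overall. There is no
global ordering to consult, only per-node order.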

Combine this all and it becomes obvious why the shrinker is
responsible for maintaining LRU order. That comes from the object
having a "referenced flag" in it to tell the shrinker that the object
has been referenced again since the shrinker last saw it. The
shrinker can then remove the referenced flag and rotate
the object to the tail of the list.

If sustained memory pressure occurs, then the object will eventually
make its way back to the head of the LRU, at which time the
shrinker will check the referenced flag again. If it's not set, it
gets reclaimed, if it is set, it gets rotated again.
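
The rotate-or-reclaim pass can be sketched in user space as follows.
This is a simplified model with hypothetical names (shrink_scan,
struct item); it is not the list_lru walk code, and it records ids
instead of freeing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct item {
	int id;
	bool referenced;
	struct item *prev, *next;
};

struct lru {
	struct item head;	/* sentinel: head.next is the head of the LRU */
};

static void lru_init(struct lru *l)
{
	l->head.prev = l->head.next = &l->head;
}

static void lru_del(struct item *it)
{
	it->prev->next = it->next;
	it->next->prev = it->prev;
	it->prev = it->next = NULL;
}

static void lru_add_tail(struct lru *l, struct item *it)
{
	it->prev = l->head.prev;
	it->next = &l->head;
	l->head.prev->next = it;
	l->head.prev = it;
}

/*
 * Look at up to nr_to_scan items, starting at the head. A referenced
 * item loses its flag and is rotated to the tail; an unreferenced item
 * is unlinked and its id stored in reclaimed[]. Returns the number of
 * items reclaimed.
 */
static size_t shrink_scan(struct lru *l, size_t nr_to_scan, int *reclaimed)
{
	size_t freed = 0;

	assert(reclaimed != NULL);
	while (nr_to_scan-- > 0 && l->head.next != &l->head) {
		struct item *it = l->head.next;

		lru_del(it);
		if (it->referenced) {
			it->referenced = false;
			lru_add_tail(l, it);
		} else {
			reclaimed[freed++] = it->id;
		}
	}
	return freed;
}
```

Run against the dentry1/dentry2 case from the quoted mail (both added
with the flag set), the first pass over two items reclaims nothing and
leaves the order unchanged, and the next pass reclaims dentry1. If
dentry2 is referenced again before the shrinker reaches it, it is
rotated once more instead of being reclaimed.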

IOWs, the LRU frame of reference is *memory pressure* - the amount
of object rotation is determined by the amount of memory pressure.
It doesn't matter how many times the code accesses the object,
it's whether it is accessed frequently enough during periods of
memory pressure that it constantly gets rotated to the tail of the
LRU. This basically means the objects that are kept under sustained
heavy memory pressure are the objects that are being constantly
referenced. Anything that is not regularly referenced will filter to
the head of the LRU and hence get reclaimed.

Some subsystems are a bit more complex with their reference "flags".
e.g. the XFS buffer cache keeps a "reclaim count" rather than a
reference flag that determines the number of times an object will be
rotated without an active reference before being reclaimed. This is
done because not all buffers are equal, e.g. btree roots are much more
important than interior tree nodes which are more important than
leaf nodes, and you can't express this with a single "reference
flag". Hence in terms of reclaim count, root > node > leaf and so we
hold on to metadata that is more likely to be referenced under
sustained memory pressure...
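
A sketch of the reclaim-count idea, in user-space C. The counts
(3/2/1) and all names are hypothetical and chosen only to show the
root > node > leaf ordering; they are not taken from XFS.

```c
#include <assert.h>

/* Hypothetical per-type budgets; only the ordering matters here. */
enum {
	LEAF_RECLAIM_COUNT = 1,
	NODE_RECLAIM_COUNT = 2,
	ROOT_RECLAIM_COUNT = 3,
};

enum scan_result { SCAN_ROTATE, SCAN_RECLAIM };

struct buf {
	int reclaim_count;	/* rotations still allowed without a new reference */
};

/* Called when the buffer is used: restore its full budget. */
static void buf_touch(struct buf *b, int count)
{
	b->reclaim_count = count;
}

/* Called by the shrinker when the buffer reaches the head of the LRU. */
static enum scan_result buf_scan(struct buf *b)
{
	assert(b->reclaim_count >= 0);
	if (b->reclaim_count > 0) {
		b->reclaim_count--;
		return SCAN_ROTATE;
	}
	return SCAN_RECLAIM;
}

/* Number of shrinker passes the buffer survives with no further use. */
static int passes_survived(struct buf *b)
{
	int n = 0;

	while (buf_scan(b) == SCAN_ROTATE)
		n++;
	return n;
}
```

With these values an untouched root buffer survives three passes, an
interior node two and a leaf one, so under sustained pressure the
leaves go first.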

So, if you were expecting a "perfect LRU" list mechanism, the
list_lru abstraction isn't it. When looked at from a macro level it
gives solid, scalable LRU cache reclaim with NUMA and memcg
awareness. When looked at from a micro level, it will display all
sorts of quirks that are a result of the design decisions to enable
performance, scalability and reclaim features at the macro level...

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/20] nfsd: open file caching
       [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
                     ` (6 preceding siblings ...)
  2015-10-05 11:02   ` [PATCH v5 18/20] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Jeff Layton
@ 2015-10-08 16:42   ` J. Bruce Fields
  2015-10-08 16:55     ` Jeff Layton
  7 siblings, 1 reply; 29+ messages in thread
From: J. Bruce Fields @ 2015-10-08 16:42 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Al Viro

I get this on the client running some lease tests:

[   38.552120] BUG: unable to handle kernel NULL pointer dereference at (null)
[   38.552723] IP: [<ffffffff811fcb3f>] vfs_setlease+0x1f/0x70
[   38.553111] PGD 56c2d067 PUD 51145067 PMD 0 
[   38.553534] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC 
[   38.554128] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
[   38.555102] CPU: 0 PID: 4890 Comm: lease_tests Not tainted 4.3.0-rc3-14186-g7619b8e #322
[   38.555593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[   38.556005] task: ffff880075bd8080 ti: ffff880055560000 task.ti: ffff880055560000
[   38.556005] RIP: 0010:[<ffffffff811fcb3f>]  [<ffffffff811fcb3f>] vfs_setlease+0x1f/0x70
[   38.556005] RSP: 0018:ffff880055563e98  EFLAGS: 00010246
[   38.556005] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff880055563ec8
[   38.556005] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880051133e40
[   38.556005] RBP: ffff880055563eb8 R08: 0000000000000000 R09: 00007ffc941da360
[   38.556005] R10: 0000000000000008 R11: 0000000000000212 R12: ffff880051133e40
[   38.556005] R13: 0000000000000000 R14: ffff880051133e40 R15: ffff880051133e40
[   38.556005] FS:  00007fbbe6864700(0000) GS:ffff88007f800000(0000) knlGS:0000000000000000
[   38.556005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   38.556005] CR2: 0000000000000000 CR3: 00000000590b0000 CR4: 00000000000406f0
[   38.556005] Stack:
[   38.556005]  ffff880056dd1f88 0000000000000002 0000000000000400 0000000000000002
[   38.556005]  ffff880055563ef8 ffffffff811fd4c1 ffff880051133e40 ffffffff8157b913
[   38.556005]  0000000000000000 0000000000000000 0000000000000400 0000000000000002
[   38.556005] Call Trace:
[   38.556005]  [<ffffffff811fd4c1>] fcntl_setlease+0xa1/0xd0
[   38.556005]  [<ffffffff8157b913>] ? security_file_fcntl+0x43/0x60
[   38.556005]  [<ffffffff811bc74f>] SyS_fcntl+0x31f/0x630
[   38.556005]  [<ffffffff81a77117>] entry_SYSCALL_64_fastpath+0x12/0x6f
[   38.556005] Code: ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 49 89 d5 49 89 fc 48 89 f3 48 83 ec 08 48 83 fe 02 <48> 8b 12 74 14 48 c7 c7 40 cb 27 83 48 89 4d e0 e8 9c d8 e9 ff 
[   38.556005] RIP  [<ffffffff811fcb3f>] vfs_setlease+0x1f/0x70
[   38.556005]  RSP <ffff880055563e98>
[   38.556005] CR2: 0000000000000000
[   38.573673] ---[ end trace 2e6e1d4b9df8a11e ]---

--b.

On Mon, Oct 05, 2015 at 07:02:22AM -0400, Jeff Layton wrote:
> v5:
> - switch to using flush_delayed_fput instead of __fput_sync
> - hash on inode->i_ino instead of inode pointer
> - add /proc/fs/nfsd/file_cache_stats file to track stats on the hash
> - eliminate extra fh_verify in nfsd_file_acquire
> 
> v4:
> - squash some of the patches down into one patch to reduce churn
> - close cached open files after unlink instead of before
> - don't just close files after nfsd does an unlink, must do it
>   after any vfs-layer unlink. Use fsnotify to handle that.
> - use a SRCU notifier chain for setlease
> - add patch to allow non-kthreads to do a fput_sync
> 
> v3:
> - open files are now hashed on inode pointer instead of fh
> - eliminate the recurring workqueue job in favor of shrinker/LRU and
>   notifier from lease setting code
> - have nfsv4 use the cache as well
> - removal of raparms cache
> 
> v2:
> - changelog cleanups and clarifications
> - allow COMMIT to use cached open files
> - tracepoints for nfsd_file cache
> - proactively close open files prior to REMOVE, or a RENAME over a
>   positive dentry
> 
> This is the fifth iteration of the open file cache patches for nfsd.
> The main changes from the v4 set are the conversion of the code to
> use flush_delayed_fput instead of __fput_sync, and some changes to
> improve performance.
> 
> The kbuild test robot noted a drop in performance with this set,
> which turned out to be lousy hash distribution due to hashing on
> inode pointer value. Hashing on inode->i_ino gives a much better
> distribution.
> 
> For those seeing this for the first time, main impetus here is to help
> speed up NFSv3 I/O. nfsd will do an open+read/write+close for every READ
> or WRITE RPC. This patchset allows us to cache those open files more or
> less indefinitely, and close them out in response to certain vfs-layer
> activity (unlinks and setlease attempts primarily).
> 
> The first few patches in the series make (small) changes to several
> subsystems to enable the caching infrastructure. The tenth patch adds
> the cache itself, and then the remaining patches hook the nfsd code
> up to the cache. The final patch rips out the raparms cache since it's
> no longer needed with these changes.
> 
> Again, the most controversial part of the set is probably the changes
> to allow normal user processes to use the delayed_fput infrastructure.
> Al, if you could weigh in on those, then that would be helpful. We
> really do need a way to allow a thread to flush the final fput work
> without returning to userland.
> 
> Jeff Layton (20):
>   list_lru: add list_lru_rotate
>   fs: have flush_delayed_fput flush the workqueue job
>   fs: add a kerneldoc header to fput
>   fs: add fput_queue
>   fs: export flush_delayed_fput
>   fsnotify: export several symbols
>   locks: create a new notifier chain for lease attempts
>   nfsd: move include of state.h from trace.c to trace.h
>   sunrpc: add a new cache_detail operation for when a cache is flushed
>   nfsd: add a new struct file caching facility to nfsd
>   nfsd: keep some rudimentary stats on nfsd_file cache
>   nfsd: allow filecache open to skip fh_verify check
>   nfsd: hook up nfsd_write to the new nfsd_file cache
>   nfsd: hook up nfsd_read to the nfsd_file cache
>   nfsd: hook nfsd_commit up to the nfsd_file cache
>   nfsd: convert nfs4_file->fi_fds array to use nfsd_files
>   nfsd: have nfsd_test_lock use the nfsd_file cache
>   nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
>   nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
>   nfsd: rip out the raparms cache
> 
>  fs/file_table.c              |  76 +++++-
>  fs/locks.c                   |  37 +++
>  fs/nfsd/Kconfig              |   2 +
>  fs/nfsd/Makefile             |   3 +-
>  fs/nfsd/export.c             |  14 +
>  fs/nfsd/filecache.c          | 613 +++++++++++++++++++++++++++++++++++++++++++
>  fs/nfsd/filecache.h          |  38 +++
>  fs/nfsd/nfs3proc.c           |   2 +-
>  fs/nfsd/nfs4layouts.c        |  12 +-
>  fs/nfsd/nfs4proc.c           |  32 +--
>  fs/nfsd/nfs4state.c          | 174 ++++++------
>  fs/nfsd/nfs4xdr.c            |  16 +-
>  fs/nfsd/nfsctl.c             |  10 +
>  fs/nfsd/nfsproc.c            |   2 +-
>  fs/nfsd/nfssvc.c             |  16 +-
>  fs/nfsd/state.h              |  10 +-
>  fs/nfsd/trace.c              |   2 -
>  fs/nfsd/trace.h              | 129 +++++++++
>  fs/nfsd/vfs.c                | 269 +++++--------------
>  fs/nfsd/vfs.h                |  11 +-
>  fs/nfsd/xdr4.h               |  15 +-
>  fs/notify/group.c            |   2 +
>  fs/notify/mark.c             |   3 +
>  include/linux/file.h         |   1 +
>  include/linux/fs.h           |   1 +
>  include/linux/list_lru.h     |  13 +
>  include/linux/sunrpc/cache.h |   1 +
>  mm/list_lru.c                |  15 ++
>  net/sunrpc/cache.c           |   3 +
>  29 files changed, 1149 insertions(+), 373 deletions(-)
>  create mode 100644 fs/nfsd/filecache.c
>  create mode 100644 fs/nfsd/filecache.h
> 
> -- 
> 2.4.3

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/20] nfsd: open file caching
  2015-10-08 16:42   ` [PATCH v5 00/20] nfsd: open file caching J. Bruce Fields
@ 2015-10-08 16:55     ` Jeff Layton
       [not found]       ` <20151008125529.3f30308e-08S845evdOaAjSkqwZiSMmfYqLom42DlXqFh9Ls21Oc@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Layton @ 2015-10-08 16:55 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, linux-fsdevel, Al Viro

On Thu, 8 Oct 2015 12:42:25 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> I get this on the client running some lease tests:
> 
> [   38.552120] BUG: unable to handle kernel NULL pointer dereference at (null)
> [   38.552723] IP: [<ffffffff811fcb3f>] vfs_setlease+0x1f/0x70
> [   38.553111] PGD 56c2d067 PUD 51145067 PMD 0 
> [   38.553534] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC 
> [   38.554128] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
> [   38.555102] CPU: 0 PID: 4890 Comm: lease_tests Not tainted 4.3.0-rc3-14186-g7619b8e #322
> [   38.555593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> [   38.556005] task: ffff880075bd8080 ti: ffff880055560000 task.ti: ffff880055560000
> [   38.556005] RIP: 0010:[<ffffffff811fcb3f>]  [<ffffffff811fcb3f>] vfs_setlease+0x1f/0x70
> [   38.556005] RSP: 0018:ffff880055563e98  EFLAGS: 00010246
> [   38.556005] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff880055563ec8
> [   38.556005] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880051133e40
> [   38.556005] RBP: ffff880055563eb8 R08: 0000000000000000 R09: 00007ffc941da360
> [   38.556005] R10: 0000000000000008 R11: 0000000000000212 R12: ffff880051133e40
> [   38.556005] R13: 0000000000000000 R14: ffff880051133e40 R15: ffff880051133e40
> [   38.556005] FS:  00007fbbe6864700(0000) GS:ffff88007f800000(0000) knlGS:0000000000000000
> [   38.556005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   38.556005] CR2: 0000000000000000 CR3: 00000000590b0000 CR4: 00000000000406f0
> [   38.556005] Stack:
> [   38.556005]  ffff880056dd1f88 0000000000000002 0000000000000400 0000000000000002
> [   38.556005]  ffff880055563ef8 ffffffff811fd4c1 ffff880051133e40 ffffffff8157b913
> [   38.556005]  0000000000000000 0000000000000000 0000000000000400 0000000000000002
> [   38.556005] Call Trace:
> [   38.556005]  [<ffffffff811fd4c1>] fcntl_setlease+0xa1/0xd0
> [   38.556005]  [<ffffffff8157b913>] ? security_file_fcntl+0x43/0x60
> [   38.556005]  [<ffffffff811bc74f>] SyS_fcntl+0x31f/0x630
> [   38.556005]  [<ffffffff81a77117>] entry_SYSCALL_64_fastpath+0x12/0x6f
> [   38.556005] Code: ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 49 89 d5 49 89 fc 48 89 f3 48 83 ec 08 48 83 fe 02 <48> 8b 12 74 14 48 c7 c7 40 cb 27 83 48 89 4d e0 e8 9c d8 e9 ff 
> [   38.556005] RIP  [<ffffffff811fcb3f>] vfs_setlease+0x1f/0x70
> [   38.556005]  RSP <ffff880055563e98>
> [   38.556005] CR2: 0000000000000000
> [   38.573673] ---[ end trace 2e6e1d4b9df8a11e ]---
> 
> --b.

My bad...it needs this patch. I'll roll this into the set before the
next posting.


From 9f04033dcf00f7b252f03c8782795b6a1f847991 Mon Sep 17 00:00:00 2001
From: Jeff Layton <jeff.layton@primarydata.com>
Date: Thu, 8 Oct 2015 12:53:54 -0400
Subject: [PATCH] locks: "lease" pointer can be NULL

...in which case we just want to skip the notifier.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/locks.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/locks.c b/fs/locks.c
index a2d5794d713a..4fccd3035842 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1835,7 +1835,8 @@ setlease_notifier(long arg, struct file_lock *lease)
 int
 vfs_setlease(struct file *filp, long arg, struct file_lock **lease, void **priv)
 {
-	setlease_notifier(arg, *lease);
+	if (lease)
+		setlease_notifier(arg, *lease);
 	if (filp->f_op->setlease)
 		return filp->f_op->setlease(filp, arg, lease, priv);
 	else
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 00/20] nfsd: open file caching
       [not found]       ` <20151008125529.3f30308e-08S845evdOaAjSkqwZiSMmfYqLom42DlXqFh9Ls21Oc@public.gmane.org>
@ 2015-10-08 18:04         ` J. Bruce Fields
       [not found]           ` <20151008180400.GB496-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: J. Bruce Fields @ 2015-10-08 18:04 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Al Viro

On Thu, Oct 08, 2015 at 12:55:29PM -0400, Jeff Layton wrote:
> My bad...it needs this patch. I'll roll this into the set before the
> next posting.

Oh, good, thanks.

Also, just seen on the server side--not sure what was going on at the
time.

There were a ton of these:

Oct 08 12:35:07 f21-1.fieldses.org kernel: ------------[ cut here ]------------
Oct 08 12:35:07 f21-1.fieldses.org kernel: WARNING: CPU: 1 PID: 584 at lib/list_debug.c:59 __list_del_entry+0x9e/0xc0()
Oct 08 12:35:07 f21-1.fieldses.org kernel: list_del corruption.  prev->next should be ffff88004cb23f80, but was b6a7e8df8948e4eb
Oct 08 12:35:07 f21-1.fieldses.org kernel: Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
Oct 08 12:35:07 f21-1.fieldses.org kernel: CPU: 1 PID: 584 Comm: fsnotify_mark Not tainted 4.3.0-rc3-14186-g7619b8e #322
Oct 08 12:35:07 f21-1.fieldses.org kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
Oct 08 12:35:07 f21-1.fieldses.org kernel:  ffffffff81f62683 ffff880071af3d50 ffffffff8160540c ffff880071af3d98
Oct 08 12:35:07 f21-1.fieldses.org kernel:  ffff880071af3d88 ffffffff81077692 ffff88004cb23f80 ffffffff8109c160
Oct 08 12:35:07 f21-1.fieldses.org kernel:  ffff880071af3e08 ffff880071af3e30 ffff88004cb23f70 ffff880071af3de8
Oct 08 12:35:07 f21-1.fieldses.org kernel: Call Trace:
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8160540c>] dump_stack+0x4e/0x82
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81077692>] warn_slowpath_common+0x82/0xc0
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8109c160>] ?  sort_range+0x20/0x30
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8107771c>] warn_slowpath_fmt+0x4c/0x50
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8162219e>] __list_del_entry+0x9e/0xc0
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff811ef485>] fsnotify_mark_destroy+0x95/0x140
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff810baa10>] ?  wait_woken+0x90/0x90
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff811ef3f0>] ?  fsnotify_put_mark+0x30/0x30
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81098d6f>] kthread+0xef/0x110
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81a767dc>] ?  _raw_spin_unlock_irq+0x2c/0x50
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ?  kthread_create_on_node+0x200/0x200
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81a7748f>] ret_from_fork+0x3f/0x70
Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ?  kthread_create_on_node+0x200/0x200
Oct 08 12:35:07 f21-1.fieldses.org kernel: ---[ end trace 687abd8552e06b32 ]---

Then:

Oct 08 12:41:54 f21-1.fieldses.org kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
Oct 08 12:41:54 f21-1.fieldses.org kernel: IP: [<ffffffff811fcb3f>] vfs_setlease+0x1f/0x70
Oct 08 12:41:54 f21-1.fieldses.org kernel: PGD 0 
Oct 08 12:41:54 f21-1.fieldses.org kernel: Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC 
Oct 08 12:41:54 f21-1.fieldses.org kernel: Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
Oct 08 12:41:54 f21-1.fieldses.org kernel: CPU: 1 PID: 4360 Comm: nfsd Tainted: G    B D W       4.3.0-rc3-14186-g7619b8e #322
Oct 08 12:41:54 f21-1.fieldses.org kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
Oct 08 12:41:54 f21-1.fieldses.org kernel: task: ffff880051ed8040 ti: ffff880051edc000 task.ti: ffff880051edc000
Oct 08 12:41:54 f21-1.fieldses.org kernel: RIP: 0010:[<ffffffff811fcb3f>]  [<ffffffff811fcb3f>] vfs_setlease+0x1f/0x70
Oct 08 12:41:54 f21-1.fieldses.org kernel: RSP: 0018:ffff880051edfc98  EFLAGS: 00010246
Oct 08 12:41:54 f21-1.fieldses.org kernel: RAX: 0000000080000000 RBX: 0000000000000002 RCX: ffff880051edfcc8
Oct 08 12:41:54 f21-1.fieldses.org kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88004b7d8e40
Oct 08 12:41:54 f21-1.fieldses.org kernel: RBP: ffff880051edfcb8 R08: 0000000000000001 R09: 0000000000000000
Oct 08 12:41:54 f21-1.fieldses.org kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88004b7d8e40
Oct 08 12:41:54 f21-1.fieldses.org kernel: R13: 0000000000000000 R14: ffff88007c64cf80 R15: ffff880069fa5240
Oct 08 12:41:54 f21-1.fieldses.org kernel: FS:  0000000000000000(0000) GS:ffff88007f900000(0000) knlGS:0000000000000000
Oct 08 12:41:54 f21-1.fieldses.org kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 08 12:41:54 f21-1.fieldses.org kernel: CR2: 0000000000000000 CR3: 000000000220b000 CR4: 00000000000406e0
Oct 08 12:41:54 f21-1.fieldses.org kernel: Stack:
Oct 08 12:41:54 f21-1.fieldses.org kernel:  ffff88006dbf9ea0 ffff880051c59f18 ffff88003ed0ceb0 0000000000000001
Oct 08 12:41:54 f21-1.fieldses.org kernel:  ffff880051edfcd8 ffffffffa00d6816 ffff88006dbf9e98 0000000000000000
Oct 08 12:41:54 f21-1.fieldses.org kernel:  ffff880051edfd20 ffffffffa00e0811 ffffffffa00e05e5 ffff88003ed0ceb0
Oct 08 12:41:54 f21-1.fieldses.org kernel: Call Trace:
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00d6816>] nfs4_put_deleg_lease+0x76/0x90 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00e0811>] nfsd4_delegreturn+0x231/0x240 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00e05e5>] ? nfsd4_delegreturn+0x5/0x240 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00ca63a>] nfsd4_proc_compound+0x38a/0x660 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00b4608>] nfsd_dispatch+0xb8/0x200 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00151ff>] svc_process_common+0x40f/0x620 [sunrpc]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa0015557>] svc_process+0x147/0x320 [sunrpc]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00b3b71>] nfsd+0x181/0x280 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00b39f5>] ? nfsd+0x5/0x280 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00b39f0>] ? nfsd_destroy+0x190/0x190 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81098d6f>] kthread+0xef/0x110
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81a767dc>] ? _raw_spin_unlock_irq+0x2c/0x50
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ? kthread_create_on_node+0x200/0x200
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81a7748f>] ret_from_fork+0x3f/0x70
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ? kthread_create_on_node+0x200/0x200
Oct 08 12:41:54 f21-1.fieldses.org kernel: Code: ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 49 89 d5 49 89 fc 48 89 f3 48 83 ec 08 48 83 fe 02 <48> 
Oct 08 12:41:54 f21-1.fieldses.org kernel: RIP  [<ffffffff811fcb3f>] vfs_setlease+0x1f/0x70
Oct 08 12:41:54 f21-1.fieldses.org kernel:  RSP <ffff880051edfc98>
Oct 08 12:41:54 f21-1.fieldses.org kernel: CR2: 0000000000000000
Oct 08 12:41:54 f21-1.fieldses.org kernel: ---[ end trace 687abd8552e07823 ]---
Oct 08 12:41:54 f21-1.fieldses.org kernel: BUG: sleeping function called from invalid context at include/linux/sched.h:2768
Oct 08 12:41:54 f21-1.fieldses.org kernel: in_atomic(): 0, irqs_disabled(): 1, pid: 4360, name: nfsd
Oct 08 12:41:54 f21-1.fieldses.org kernel: INFO: lockdep is turned off.
Oct 08 12:41:54 f21-1.fieldses.org kernel: irq event stamp: 1107102
Oct 08 12:41:54 f21-1.fieldses.org kernel: hardirqs last  enabled at (1107101): [<ffffffff81a76797>] _raw_spin_unlock_irqrestore+0x57/0x70
Oct 08 12:41:54 f21-1.fieldses.org kernel: hardirqs last disabled at (1107102): [<ffffffff81a76607>] _raw_spin_lock_irq+0x17/0x50
Oct 08 12:41:54 f21-1.fieldses.org kernel: softirqs last  enabled at (1107096): [<ffffffff81927075>] release_sock+0x165/0x1b0
Oct 08 12:41:54 f21-1.fieldses.org kernel: softirqs last disabled at (1107094): [<ffffffff81926f44>] release_sock+0x34/0x1b0
Oct 08 12:41:54 f21-1.fieldses.org kernel: CPU: 1 PID: 4360 Comm: nfsd Tainted: G    B D W       4.3.0-rc3-14186-g7619b8e #322
Oct 08 12:41:54 f21-1.fieldses.org kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
Oct 08 12:41:54 f21-1.fieldses.org kernel:  ffff880051ed8040 ffff880051edf9b0 ffffffff8160540c 0000000000000000
Oct 08 12:41:54 f21-1.fieldses.org kernel:  ffff880051edf9d8 ffffffff810a02de ffffffff81f0907b 0000000000000ad0
Oct 08 12:41:54 f21-1.fieldses.org kernel:  0000000000000000 ffff880051edfa00 ffffffff810a0409 ffff880051ed8040
Oct 08 12:41:54 f21-1.fieldses.org kernel: Call Trace:
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff8160540c>] dump_stack+0x4e/0x82
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff810a02de>] ___might_sleep+0x15e/0x240
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff810a0409>] __might_sleep+0x49/0x80
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81087fc4>] exit_signals+0x24/0x120
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff810796a2>] do_exit+0xb2/0xbc0
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff810d2345>] ? kmsg_dump+0x135/0x180
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff810d2232>] ? kmsg_dump+0x22/0x180
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff8100795d>] oops_end+0x6d/0x90
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff810443ea>] no_context+0x13a/0x360
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81583edb>] ? selinux_cred_prepare+0x1b/0x30
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff8104471e>] __bad_area_nosemaphore+0x10e/0x220
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81583edb>] ? selinux_cred_prepare+0x1b/0x30
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81044843>] bad_area_nosemaphore+0x13/0x20
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81044a9c>] __do_page_fault+0x8c/0x490
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81000fc0>] ? trace_hardirqs_off_thunk+0x17/0x19
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81044eac>] do_page_fault+0xc/0x10
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81a78f52>] page_fault+0x22/0x30
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff811fcb3f>] ? vfs_setlease+0x1f/0x70
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00d6816>] nfs4_put_deleg_lease+0x76/0x90 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00e0811>] nfsd4_delegreturn+0x231/0x240 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00e05e5>] ? nfsd4_delegreturn+0x5/0x240 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00ca63a>] nfsd4_proc_compound+0x38a/0x660 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00b4608>] nfsd_dispatch+0xb8/0x200 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00151ff>] svc_process_common+0x40f/0x620 [sunrpc]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa0015557>] svc_process+0x147/0x320 [sunrpc]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00b3b71>] nfsd+0x181/0x280 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00b39f5>] ? nfsd+0x5/0x280 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffffa00b39f0>] ? nfsd_destroy+0x190/0x190 [nfsd]
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81098d6f>] kthread+0xef/0x110
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81a767dc>] ? _raw_spin_unlock_irq+0x2c/0x50
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ? kthread_create_on_node+0x200/0x200
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81a7748f>] ret_from_fork+0x3f/0x70
Oct 08 12:41:54 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ? kthread_create_on_node+0x200/0x200
Oct 08 12:41:54 f21-1.fieldses.org kernel: nfsd (4360) used greatest stack depth: 12280 bytes left
Oct 08 12:42:09 f21-1.fieldses.org kernel: general protection fault: 0000 [#3] PREEMPT SMP DEBUG_PAGEALLOC 
Oct 08 12:42:09 f21-1.fieldses.org kernel: Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
Oct 08 12:42:09 f21-1.fieldses.org kernel: CPU: 1 PID: 4361 Comm: nfsd Tainted: G    B D W       4.3.0-rc3-14186-g7619b8e #322
Oct 08 12:42:09 f21-1.fieldses.org kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
Oct 08 12:42:09 f21-1.fieldses.org kernel: task: ffff880051f20080 ti: ffff880051f24000 task.ti: ffff880051f24000
Oct 08 12:42:09 f21-1.fieldses.org kernel: RIP: 0010:[<ffffffffa00c21ce>]  [<ffffffffa00c21ce>] nfsd_file_acquire+0x1ce/0x820 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel: RSP: 0018:ffff880051f27ad0  EFLAGS: 00010202
Oct 08 12:42:09 f21-1.fieldses.org kernel: RAX: 5a5a5a5a5a5a5a5a RBX: ffff88003e9aaf18 RCX: 0000000000000001
Oct 08 12:42:09 f21-1.fieldses.org kernel: RDX: 0000000000000008 RSI: 00000000000003db RDI: ffffffffa00f53c4
Oct 08 12:42:09 f21-1.fieldses.org kernel: RBP: ffff880051f27b40 R08: 0000000000000000 R09: 0000000000000000
Oct 08 12:42:09 f21-1.fieldses.org kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
Oct 08 12:42:09 f21-1.fieldses.org kernel: R13: 0000000000000000 R14: ffff88003e9aaf50 R15: 0000000000000002
Oct 08 12:42:09 f21-1.fieldses.org kernel: FS:  0000000000000000(0000) GS:ffff88007f900000(0000) knlGS:0000000000000000
Oct 08 12:42:09 f21-1.fieldses.org kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 08 12:42:09 f21-1.fieldses.org kernel: CR2: 00007f8f5903c000 CR3: 000000005d801000 CR4: 00000000000406e0
Oct 08 12:42:09 f21-1.fieldses.org kernel: Stack:
Oct 08 12:42:09 f21-1.fieldses.org kernel:  ffffffffa00c20c9 0000000200000000 0200000000000000 ffff880051f27b78
Oct 08 12:42:09 f21-1.fieldses.org kernel:  ffff880056436e00 ffff880051ed6000 0000000000039720 00000cc400000000
Oct 08 12:42:09 f21-1.fieldses.org kernel:  ffff880040e03c90 ffff88006dbf9e98 0000000000000000 ffff88006dbf9ea0
Oct 08 12:42:09 f21-1.fieldses.org kernel: Call Trace:
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00c20c9>] ? nfsd_file_acquire+0xc9/0x820 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00d7dc1>] nfs4_get_vfs_file+0x2e1/0x3e0 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00ddc9b>] ? nfsd4_process_open2+0x2bb/0x1400 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00ddcb4>] nfsd4_process_open2+0x2d4/0x1400 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00dd9e5>] ? nfsd4_process_open2+0x5/0x1400 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00b888d>] ? fh_verify+0x15d/0x570 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00b986f>] ? nfsd_lookup+0x7f/0x120 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00ca172>] nfsd4_open+0x7e2/0x920 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00c9995>] ? nfsd4_open+0x5/0x920 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00ca63a>] nfsd4_proc_compound+0x38a/0x660 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00b4608>] nfsd_dispatch+0xb8/0x200 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00151ff>] svc_process_common+0x40f/0x620 [sunrpc]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa0015557>] svc_process+0x147/0x320 [sunrpc]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00b3b71>] nfsd+0x181/0x280 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00b39f5>] ? nfsd+0x5/0x280 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffffa00b39f0>] ? nfsd_destroy+0x190/0x190 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffff81098d6f>] kthread+0xef/0x110
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffff81a767dc>] ? _raw_spin_unlock_irq+0x2c/0x50
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ? kthread_create_on_node+0x200/0x200
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffff81a7748f>] ret_from_fork+0x3f/0x70
Oct 08 12:42:09 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ? kthread_create_on_node+0x200/0x200
Oct 08 12:42:09 f21-1.fieldses.org kernel: Code: ff ff 41 f7 c4 00 02 00 00 75 53 44 89 e2 d1 ea 89 d1 48 8b 53 38 83 e1 01 83 e2 04 75 0d 48 8b 53 38 83 e2 08 74 37 84 c9 74 33 <48> 
Oct 08 12:42:09 f21-1.fieldses.org kernel: RIP  [<ffffffffa00c21ce>] nfsd_file_acquire+0x1ce/0x820 [nfsd]
Oct 08 12:42:09 f21-1.fieldses.org kernel:  RSP <ffff880051f27ad0>
Oct 08 12:42:09 f21-1.fieldses.org kernel: ---[ end trace 687abd8552e07824 ]---

--b.


* Re: [PATCH v5 00/20] nfsd: open file caching
       [not found]           ` <20151008180400.GB496-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2015-10-10 11:19             ` Jeff Layton
  2015-10-10 13:48               ` J. Bruce Fields
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Layton @ 2015-10-10 11:19 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Al Viro

On Thu, 8 Oct 2015 14:04:00 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Thu, Oct 08, 2015 at 12:55:29PM -0400, Jeff Layton wrote:
> > My bad...it needs this patch. I'll roll this into the set before the
> > next posting.
> 
> Oh, good, thanks.
> 
> Also, just seen on the server side--not sure what was going on at the
> time.
> 
> There were a ton of these:
> 
> Oct 08 12:35:07 f21-1.fieldses.org kernel: ------------[ cut here ]------------
> Oct 08 12:35:07 f21-1.fieldses.org kernel: WARNING: CPU: 1 PID: 584 at lib/list_debug.c:59 __list_del_entry+0x9e/0xc0()
> Oct 08 12:35:07 f21-1.fieldses.org kernel: list_del corruption.  prev->next should be ffff88004cb23f80, but was b6a7e8df8948e4eb
> Oct 08 12:35:07 f21-1.fieldses.org kernel: Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
> Oct 08 12:35:07 f21-1.fieldses.org kernel: CPU: 1 PID: 584 Comm: fsnotify_mark Not tainted 4.3.0-rc3-14186-g7619b8e #322
> Oct 08 12:35:07 f21-1.fieldses.org kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  ffffffff81f62683 ffff880071af3d50 ffffffff8160540c ffff880071af3d98
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  ffff880071af3d88 ffffffff81077692 ffff88004cb23f80 ffffffff8109c160
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  ffff880071af3e08 ffff880071af3e30 ffff88004cb23f70 ffff880071af3de8
> Oct 08 12:35:07 f21-1.fieldses.org kernel: Call Trace:
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8160540c>] dump_stack+0x4e/0x82
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81077692>] warn_slowpath_common+0x82/0xc0
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8109c160>] ?  sort_range+0x20/0x30
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8107771c>] warn_slowpath_fmt+0x4c/0x50
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8162219e>] __list_del_entry+0x9e/0xc0
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff811ef485>] fsnotify_mark_destroy+0x95/0x140
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff810baa10>] ?  wait_woken+0x90/0x90
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff811ef3f0>] ?  fsnotify_put_mark+0x30/0x30
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81098d6f>] kthread+0xef/0x110
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81a767dc>] ?  _raw_spin_unlock_irq+0x2c/0x50
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ?  kthread_create_on_node+0x200/0x200
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81a7748f>] ret_from_fork+0x3f/0x70
> Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ?  kthread_create_on_node+0x200/0x200
> Oct 08 12:35:07 f21-1.fieldses.org kernel: ---[ end trace 687abd8552e06b32 ]---
> 

Thanks for the bug report! I think I understand the problem now:

The problem is in the way this patchset embeds an fsnotify_mark inside
the nfsd_file. The way fsnotify_destroy_mark works more or less requires
that the mark be freed separately, since it wants to traverse these
objects under an srcu read lock. The rest of the stack traces are
probably collateral damage from that memory corruption.

I think I'll have to change the code to allocate the fsnotify_mark objects
separately. It may also be better to have just one mark per inode and
have each nfsd_file take a reference to the mark. I'll need to stare at
the code a bit longer to see what makes the most sense.
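
As a rough illustration of the "one mark per inode, each nfsd_file
takes a reference" idea, here is a minimal user-space sketch. It is
not the patchset's code: every name in it (mark_sketch, file_sketch,
mark_get, mark_put and so on) is hypothetical, the refcount is a plain
int rather than an atomic, and there is no locking or SRCU. It only
shows the lifetime rule: the mark is allocated on its own, so freeing
a file object never frees memory that something else may still be
walking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a separately allocated, refcounted mark. */
struct mark_sketch {
	unsigned long ino;	/* inode number the mark belongs to */
	int refcount;		/* one reference per holder (not atomic here) */
};

/* Hypothetical stand-in for nfsd_file: holds a pointer, not an embedded mark. */
struct file_sketch {
	struct mark_sketch *mark;
};

/* Counts freed marks so the behaviour can be checked from outside. */
static int marks_freed;

/* Allocate a mark; the caller owns the initial reference. */
static struct mark_sketch *mark_alloc(unsigned long ino)
{
	struct mark_sketch *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->ino = ino;
	m->refcount = 1;
	return m;
}

/* Take an extra reference on a live mark. */
static struct mark_sketch *mark_get(struct mark_sketch *m)
{
	assert(m->refcount > 0);
	m->refcount++;
	return m;
}

/* Drop a reference; the mark is freed only when the last one goes away. */
static void mark_put(struct mark_sketch *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0) {
		marks_freed++;
		free(m);
	}
}

/* Create a file object that shares the given mark. */
static struct file_sketch *file_alloc(struct mark_sketch *m)
{
	struct file_sketch *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->mark = mark_get(m);
	return f;
}

/* Free the file object first, then drop its reference to the mark. */
static void file_free(struct file_sketch *f)
{
	struct mark_sketch *m = f->mark;

	free(f);
	mark_put(m);
}
```

With two file objects sharing one mark, freeing the first leaves the
mark in place; it only goes away once the last holder has dropped its
reference.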

-- 
Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>


* Re: [PATCH v5 00/20] nfsd: open file caching
  2015-10-10 11:19             ` Jeff Layton
@ 2015-10-10 13:48               ` J. Bruce Fields
  0 siblings, 0 replies; 29+ messages in thread
From: J. Bruce Fields @ 2015-10-10 13:48 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs, linux-fsdevel, Al Viro

On Sat, Oct 10, 2015 at 07:19:23AM -0400, Jeff Layton wrote:
> On Thu, 8 Oct 2015 14:04:00 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Thu, Oct 08, 2015 at 12:55:29PM -0400, Jeff Layton wrote:
> > > My bad...it needs this patch. I'll roll this into the set before the
> > > next posting.
> > 
> > Oh, good, thanks.
> > 
> > Also, just seen on the server side--not sure what was going on at the
> > time.
> > 
> > There were a ton of these:
> > 
> > Oct 08 12:35:07 f21-1.fieldses.org kernel: ------------[ cut here ]------------
> > Oct 08 12:35:07 f21-1.fieldses.org kernel: WARNING: CPU: 1 PID: 584 at lib/list_debug.c:59 __list_del_entry+0x9e/0xc0()
> > Oct 08 12:35:07 f21-1.fieldses.org kernel: list_del corruption.  prev->next should be ffff88004cb23f80, but was b6a7e8df8948e4eb
> > Oct 08 12:35:07 f21-1.fieldses.org kernel: Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc
> > Oct 08 12:35:07 f21-1.fieldses.org kernel: CPU: 1 PID: 584 Comm: fsnotify_mark Not tainted 4.3.0-rc3-14186-g7619b8e #322
> > Oct 08 12:35:07 f21-1.fieldses.org kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  ffffffff81f62683 ffff880071af3d50 ffffffff8160540c ffff880071af3d98
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  ffff880071af3d88 ffffffff81077692 ffff88004cb23f80 ffffffff8109c160
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  ffff880071af3e08 ffff880071af3e30 ffff88004cb23f70 ffff880071af3de8
> > Oct 08 12:35:07 f21-1.fieldses.org kernel: Call Trace:
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8160540c>] dump_stack+0x4e/0x82
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81077692>] warn_slowpath_common+0x82/0xc0
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8109c160>] ?  sort_range+0x20/0x30
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8107771c>] warn_slowpath_fmt+0x4c/0x50
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff8162219e>] __list_del_entry+0x9e/0xc0
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff811ef485>] fsnotify_mark_destroy+0x95/0x140
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff810baa10>] ?  wait_woken+0x90/0x90
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff811ef3f0>] ?  fsnotify_put_mark+0x30/0x30
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81098d6f>] kthread+0xef/0x110
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81a767dc>] ?  _raw_spin_unlock_irq+0x2c/0x50
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ?  kthread_create_on_node+0x200/0x200
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81a7748f>] ret_from_fork+0x3f/0x70
> > Oct 08 12:35:07 f21-1.fieldses.org kernel:  [<ffffffff81098c80>] ?  kthread_create_on_node+0x200/0x200
> > Oct 08 12:35:07 f21-1.fieldses.org kernel: ---[ end trace 687abd8552e06b32 ]---
> > 
> 
> Thanks for the bug report! I think I understand the problem now:
> 
> The problem is in the way this patchset embeds an fsnotify_mark inside
> the nfsd_file. The way fsnotify_destroy_mark works more or less requires
> that the mark be freed separately, since it wants to traverse these
> objects under an srcu read lock. The rest of the stack traces are
> probably collateral damage from that memory corruption.
> 
> I think I'll have to change the code to allocate the fsnotify_mark objects
> separately. It may also be better to have just one mark per inode and
> have each nfsd_file take a reference to the mark. I'll need to stare at
> the code a bit longer to see what makes the most sense.

OK, thanks!

--b.


Thread overview: 29+ messages
2015-10-05 11:02 [PATCH v5 00/20] nfsd: open file caching Jeff Layton
2015-10-05 11:02 ` [PATCH v5 01/20] list_lru: add list_lru_rotate Jeff Layton
     [not found]   ` <1444042962-6947-2-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
2015-10-05 21:47     ` Dave Chinner
2015-10-06 11:43       ` Jeff Layton
     [not found]         ` <20151006074341.0e2f796e-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2015-10-07  1:09           ` Dave Chinner
2015-10-05 11:02 ` [PATCH v5 02/20] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
2015-10-05 11:02 ` [PATCH v5 03/20] fs: add a kerneldoc header to fput Jeff Layton
2015-10-05 11:02 ` [PATCH v5 04/20] fs: add fput_queue Jeff Layton
2015-10-05 11:02 ` [PATCH v5 07/20] locks: create a new notifier chain for lease attempts Jeff Layton
2015-10-05 11:02 ` [PATCH v5 09/20] sunrpc: add a new cache_detail operation for when a cache is flushed Jeff Layton
2015-10-05 11:02 ` [PATCH v5 10/20] nfsd: add a new struct file caching facility to nfsd Jeff Layton
2015-10-05 11:02 ` [PATCH v5 11/20] nfsd: keep some rudimentary stats on nfsd_file cache Jeff Layton
     [not found] ` <1444042962-6947-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
2015-10-05 11:02   ` [PATCH v5 05/20] fs: export flush_delayed_fput Jeff Layton
2015-10-05 11:02   ` [PATCH v5 06/20] fsnotify: export several symbols Jeff Layton
2015-10-05 11:02   ` [PATCH v5 08/20] nfsd: move include of state.h from trace.c to trace.h Jeff Layton
2015-10-05 11:02   ` [PATCH v5 12/20] nfsd: allow filecache open to skip fh_verify check Jeff Layton
2015-10-05 11:02   ` [PATCH v5 13/20] nfsd: hook up nfsd_write to the new nfsd_file cache Jeff Layton
2015-10-05 11:02   ` [PATCH v5 14/20] nfsd: hook up nfsd_read to the " Jeff Layton
2015-10-05 11:02   ` [PATCH v5 18/20] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Jeff Layton
2015-10-08 16:42   ` [PATCH v5 00/20] nfsd: open file caching J. Bruce Fields
2015-10-08 16:55     ` Jeff Layton
     [not found]       ` <20151008125529.3f30308e-08S845evdOaAjSkqwZiSMmfYqLom42DlXqFh9Ls21Oc@public.gmane.org>
2015-10-08 18:04         ` J. Bruce Fields
     [not found]           ` <20151008180400.GB496-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-10-10 11:19             ` Jeff Layton
2015-10-10 13:48               ` J. Bruce Fields
2015-10-05 11:02 ` [PATCH v5 15/20] nfsd: hook nfsd_commit up to the nfsd_file cache Jeff Layton
2015-10-05 11:02 ` [PATCH v5 16/20] nfsd: convert nfs4_file->fi_fds array to use nfsd_files Jeff Layton
2015-10-05 11:02 ` [PATCH v5 17/20] nfsd: have nfsd_test_lock use the nfsd_file cache Jeff Layton
2015-10-05 11:02 ` [PATCH v5 19/20] nfsd: hook up nfs4_preprocess_stateid_op to " Jeff Layton
2015-10-05 11:02 ` [PATCH v5 20/20] nfsd: rip out the raparms cache Jeff Layton
