* [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd
@ 2015-11-17 11:52 Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 01/38] nfsd: add new io class tracepoint Jeff Layton
                   ` (37 more replies)
  0 siblings, 38 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

This patchset adds export operations to nfs, so that it can be
reexported via knfsd. You're probably thinking to yourself: "Why on
earth would I want to do such a thing?". I'm glad you asked...

The primary use case here is to allow clients that do not support newer
NFS versions (and in particular, those that don't support pnfs) to
access servers that do not support older NFS versions. Our main interest
is in allowing NFSv4.2 (particularly with pnfs) to be reexported via
NFSv3.

The traditional way of allowing legacy client access is to simply allow
the MDS to support older NFS versions, but handling in-band I/O can be a
fair bit of work for the MDS. By reexporting we can offload that work
onto a different host and allow the MDS to focus on layout handling.

I can also envision this being useful with the pnfs block protocol to
allow access by clients that don't have access to the SAN on which the
block devices live. An admin could designate a separate host on the SAN
as a "portal" and have the external clients mount that host.

The main part of the set is focused on adding an open file cache to
nfsd. NFSv4 has a pretty slow open codepath, so allowing the server to
cache the open files takes the performance from abysmal to acceptable.

I've posted that portion of the patchset before so it may look familiar
to some folks. Implementing that involves some changes to a few other
vfs-layer subsystems:

fsnotify's SRCU cleanup needs to be converted to use call_srcu, and we
have to add a function to call srcu_barrier on it. This is so we can be
sure that all the fsnotify marks are gone before we tear down their
slabcache. It also needs some symbols exported so that nfsd can use it.

The fput machinery needs a function that allows non-kthreads to queue
the final __fput to the list that kthreads ordinarily use. This is
mainly to allow us to completely close files in advance of a setlease
attempt.

After that swath of patches, there is a pile of NFS client patches that
add the export operations that are necessary to allow it to be
reexported. Most of these are under the aegis of a new
CONFIG_NFS_REEXPORT option (that defaults to 'n'). There are a number of
caveats to reexporting that I've tried to document as well in a new
Documentation/ file.

I know it's ambitious for such a large set, but I'd like to
see this merged in v4.5 if possible. If not, then it would be helpful
to be able to make some progress toward that by getting the fput and
fsnotify changes merged for that release.

Jeff Layton (32):
  nfsd: add new io class tracepoint
  fs: have flush_delayed_fput flush the workqueue job
  fs: add a kerneldoc header to fput
  fs: rename "delayed_fput" infrastructure to "fput_global"
  fs: add fput_global
  fsnotify: fix a sparse warning
  fsnotify: export several symbols
  fsnotify: destroy marks with call_srcu instead of dedicated thread
  fsnotify: add a srcu barrier for fsnotify
  locks: create a new notifier chain for lease attempts
  sunrpc: add a new cache_detail operation for when a cache is flushed
  nfsd: add a new struct file caching facility to nfsd
  nfsd: keep some rudimentary stats on nfsd_file cache
  nfsd: allow filecache open to skip fh_verify check
  nfsd: hook up nfsd_write to the new nfsd_file cache
  nfsd: hook up nfsd_read to the nfsd_file cache
  nfsd: hook nfsd_commit up to the nfsd_file cache
  nfsd: convert nfs4_file->fi_fds array to use nfsd_files
  nfsd: have nfsd_test_lock use the nfsd_file cache
  nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
  nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
  nfsd: rip out the raparms cache
  nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
  nfsd: allow lockd to be forcibly disabled
  nfsd: add errno mapping for EREMOTEIO
  nfsd: return EREMOTE if we find an S_AUTOMOUNT inode
  nfsd: allow filesystems to opt out of subtree checking
  nfsd: close cached files prior to a REMOVE or RENAME that would
    replace target
  nfsd: retry once in nfsd_open on an -EOPENSTALE return
  nfs4: add NFSv4 LOOKUPP handlers
  nfs: add a get_parent export operation for NFS
  nfs: add a Kconfig option for NFS reexporting and documentation

Peng Tao (6):
  nfsd: close cached file when underlying file systems says no such file
  nfs: replace d_add with d_splice_alias in atomic_open
  nfs: add encode_fh export op
  nfs: add fh_to_dentry export op
  nfs: nfs_fh_to_dentry() make use of inode cache
  nfs: set export ops

 Documentation/filesystems/nfs/Exporting    |  52 ++
 Documentation/filesystems/nfs/reexport.txt |  95 ++++
 fs/file_table.c                            |  94 +++-
 fs/locks.c                                 |  37 ++
 fs/nfs/Kconfig                             |  11 +
 fs/nfs/Makefile                            |   1 +
 fs/nfs/dir.c                               |   2 +-
 fs/nfs/export.c                            | 169 +++++++
 fs/nfs/inode.c                             |  22 +
 fs/nfs/internal.h                          |   2 +
 fs/nfs/nfs4proc.c                          |  49 ++
 fs/nfs/nfs4trace.h                         |  29 ++
 fs/nfs/nfs4xdr.c                           |  73 +++
 fs/nfs/super.c                             |   4 +
 fs/nfsd/Kconfig                            |   2 +
 fs/nfsd/Makefile                           |   3 +-
 fs/nfsd/export.c                           |  20 +
 fs/nfsd/filecache.c                        | 748 +++++++++++++++++++++++++++++
 fs/nfsd/filecache.h                        |  45 ++
 fs/nfsd/nfs3proc.c                         |   2 +-
 fs/nfsd/nfs3xdr.c                          |   7 +-
 fs/nfsd/nfs4layouts.c                      |  12 +-
 fs/nfsd/nfs4proc.c                         |  32 +-
 fs/nfsd/nfs4state.c                        | 174 +++----
 fs/nfsd/nfs4xdr.c                          |  16 +-
 fs/nfsd/nfsctl.c                           |  10 +
 fs/nfsd/nfsfh.c                            |  14 +
 fs/nfsd/nfsfh.h                            |  28 +-
 fs/nfsd/nfsproc.c                          |   4 +-
 fs/nfsd/nfssvc.c                           |  27 +-
 fs/nfsd/state.h                            |  10 +-
 fs/nfsd/trace.h                            | 181 +++++++
 fs/nfsd/vfs.c                              | 423 ++++++++--------
 fs/nfsd/vfs.h                              |  11 +-
 fs/nfsd/xdr4.h                             |  15 +-
 fs/notify/fdinfo.c                         |   2 +-
 fs/notify/group.c                          |   2 +
 fs/notify/inode_mark.c                     |   1 +
 fs/notify/mark.c                           |  77 +--
 include/linux/exportfs.h                   |  12 +
 include/linux/file.h                       |   3 +-
 include/linux/fs.h                         |   1 +
 include/linux/fsnotify_backend.h           |  12 +-
 include/linux/nfs4.h                       |   1 +
 include/linux/nfs_fs.h                     |   1 +
 include/linux/nfs_xdr.h                    |  17 +-
 include/linux/sunrpc/cache.h               |   1 +
 init/main.c                                |   2 +-
 net/sunrpc/cache.c                         |   3 +
 49 files changed, 2105 insertions(+), 454 deletions(-)
 create mode 100644 Documentation/filesystems/nfs/reexport.txt
 create mode 100644 fs/nfs/export.c
 create mode 100644 fs/nfsd/filecache.c
 create mode 100644 fs/nfsd/filecache.h

-- 
2.4.3



* [PATCH v1 01/38] nfsd: add new io class tracepoint
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 02/38] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
                   ` (36 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Add some new tracepoints in the nfsd read/write codepaths. The idea
is that this will give us the ability to measure how long each phase of
a read or write operation takes.
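
The tracepoints print a CRC32 of the filehandle rather than the raw bytes
(see knfsd_fh_hash in the diff below). As a rough userspace illustration of
what that hash computes, here is a minimal sketch. The names crc32_le_sketch
and fh_hash_sketch are hypothetical stand-ins, not kernel functions, and the
bitwise loop is only assumed to match what the kernel's crc32_le produces for
the same seed:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise little-endian CRC32 using the reflected polynomial 0xEDB88320.
 * Hypothetical userspace stand-in for the kernel's crc32_le().
 */
static uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

/* Same shape as knfsd_fh_hash(): seed with all ones, invert the result. */
static uint32_t fh_hash_sketch(const unsigned char *fh_base, size_t fh_size)
{
	return ~crc32_le_sketch(0xFFFFFFFFu, fh_base, fh_size);
}
```

Seeding with all ones and inverting the result gives the conventional CRC-32
value, which is presumably why the output lines up with what wireshark shows.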

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfsfh.h | 23 +++++++++++++++++++++++
 fs/nfsd/trace.h | 41 +++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/vfs.c   | 15 +++++++++++++++
 3 files changed, 79 insertions(+)

diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 2087bae17582..0770bcb543c8 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -7,6 +7,7 @@
 #ifndef _LINUX_NFSD_NFSFH_H
 #define _LINUX_NFSD_NFSFH_H
 
+#include <linux/crc32.h>
 #include <linux/sunrpc/svc.h>
 #include <uapi/linux/nfsd/nfsfh.h>
 
@@ -205,6 +206,28 @@ static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2)
 	return true;
 }
 
+#ifdef CONFIG_CRC32
+/**
+ * knfsd_fh_hash - calculate the crc32 hash for the filehandle
+ * @fh: pointer to filehandle
+ *
+ * returns a crc32 hash for the filehandle that is compatible with
+ * the one displayed by "wireshark".
+ */
+
+static inline u32
+knfsd_fh_hash(struct knfsd_fh *fh)
+{
+	return ~crc32_le(0xFFFFFFFF, (unsigned char *)&fh->fh_base, fh->fh_size);
+}
+#else
+static inline u32
+knfsd_fh_hash(struct knfsd_fh *fh)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_NFSD_V3
 /*
  * The wcc data stored in current_fh should be cleared
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 0befe762762b..3287041905da 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -8,6 +8,47 @@
 #define _NFSD_TRACE_H
 
 #include <linux/tracepoint.h>
+#include "nfsfh.h"
+
+DECLARE_EVENT_CLASS(nfsd_io_class,
+	TP_PROTO(struct svc_rqst *rqstp,
+		 struct svc_fh	*fhp,
+		 loff_t		offset,
+		 int		len),
+	TP_ARGS(rqstp, fhp, offset, len),
+	TP_STRUCT__entry(
+		__field(__be32, xid)
+		__field_struct(struct knfsd_fh, fh)
+		__field(loff_t, offset)
+		__field(int, len)
+	),
+	TP_fast_assign(
+		__entry->xid = rqstp->rq_xid;
+		fh_copy_shallow(&__entry->fh, &fhp->fh_handle);
+		__entry->offset = offset;
+		__entry->len = len;
+	),
+	TP_printk("xid=0x%x fh=0x%x offset=%lld len=%d",
+		  __be32_to_cpu(__entry->xid), knfsd_fh_hash(&__entry->fh),
+		  __entry->offset, __entry->len)
+)
+
+#define DEFINE_NFSD_IO_EVENT(name)		\
+DEFINE_EVENT(nfsd_io_class, name,		\
+	TP_PROTO(struct svc_rqst *rqstp,	\
+		 struct svc_fh	*fhp,		\
+		 loff_t		offset,		\
+		 int		len),		\
+	TP_ARGS(rqstp, fhp, offset, len))
+
+DEFINE_NFSD_IO_EVENT(read_start);
+DEFINE_NFSD_IO_EVENT(read_opened);
+DEFINE_NFSD_IO_EVENT(read_io_done);
+DEFINE_NFSD_IO_EVENT(read_done);
+DEFINE_NFSD_IO_EVENT(write_start);
+DEFINE_NFSD_IO_EVENT(write_opened);
+DEFINE_NFSD_IO_EVENT(write_io_done);
+DEFINE_NFSD_IO_EVENT(write_done);
 
 #include "state.h"
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 994d66fbb446..3257c59dc860 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -42,6 +42,7 @@
 
 #include "nfsd.h"
 #include "vfs.h"
+#include "trace.h"
 
 #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
 
@@ -983,16 +984,23 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct raparms	*ra;
 	__be32 err;
 
+	trace_read_start(rqstp, fhp, offset, vlen);
 	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
 	if (err)
 		return err;
 
 	ra = nfsd_init_raparms(file);
+
+	trace_read_opened(rqstp, fhp, offset, vlen);
 	err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
+	trace_read_io_done(rqstp, fhp, offset, vlen);
+
 	if (ra)
 		nfsd_put_raparams(file, ra);
 	fput(file);
 
+	trace_read_done(rqstp, fhp, offset, vlen);
+
 	return err;
 }
 
@@ -1008,24 +1016,31 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 {
 	__be32			err = 0;
 
+	trace_write_start(rqstp, fhp, offset, vlen);
+
 	if (file) {
 		err = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
 				NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE);
 		if (err)
 			goto out;
+		trace_write_opened(rqstp, fhp, offset, vlen);
 		err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt,
 				stablep);
+		trace_write_io_done(rqstp, fhp, offset, vlen);
 	} else {
 		err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
 		if (err)
 			goto out;
 
+		trace_write_opened(rqstp, fhp, offset, vlen);
 		if (cnt)
 			err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen,
 					     cnt, stablep);
+		trace_write_io_done(rqstp, fhp, offset, vlen);
 		fput(file);
 	}
 out:
+	trace_write_done(rqstp, fhp, offset, vlen);
 	return err;
 }
 
-- 
2.4.3



* [PATCH v1 02/38] fs: have flush_delayed_fput flush the workqueue job
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 01/38] nfsd: add new io class tracepoint Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 03/38] fs: add a kerneldoc header to fput Jeff Layton
                   ` (35 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

I think there's a potential race in flush_delayed_fput. A kthread does
an fput() and that file gets added to the list and the delayed work is
scheduled. More than 1 jiffy passes, and the workqueue thread picks up
the work and starts running it. Then the kthread calls
flush_delayed_fput. It sees that the list is empty and returns
immediately, even though the __fput for its file may not have run yet.

Close this by making flush_delayed_fput use flush_delayed_work instead,
which should immediately schedule the work to run if it's not already,
and block until the workqueue job completes.
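
The race described above can be modelled in a few lines of single-threaded
userspace C. This is only a sketch: struct fput_model and all of the function
names are hypothetical, and the "worker" is stepped by hand instead of running
on a real workqueue. It shows why checking the list is not enough once the
worker has already dequeued the file:

```c
/* Hypothetical model of the deferred-fput state. */
struct fput_model {
	int queued;	/* files still sitting on the deferred list */
	int in_flight;	/* files the worker dequeued but has not closed yet */
	int closed;	/* files whose final __fput has run */
};

/* Worker, step 1: grab the whole list at once (like llist_del_all). */
static void worker_take_all(struct fput_model *m)
{
	m->in_flight += m->queued;
	m->queued = 0;
}

/* Worker, step 2: run the final close on everything it grabbed. */
static void worker_finish(struct fput_model *m)
{
	m->closed += m->in_flight;
	m->in_flight = 0;
}

/* Old behaviour: only handles what is on the list right now. */
static void flush_old(struct fput_model *m)
{
	m->closed += m->queued;
	m->queued = 0;
}

/* New behaviour: run any pending work and wait for the worker to finish. */
static void flush_new(struct fput_model *m)
{
	worker_take_all(m);
	worker_finish(m);
}
```

If worker_take_all() has run before the flush, flush_old() returns with the
file still in flight, while flush_new() only returns once it is closed.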

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05ebf95..52cc6803c07a 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -244,6 +244,8 @@ static void ____fput(struct callback_head *work)
 	__fput(container_of(work, struct file, f_u.fu_rcuhead));
 }
 
+static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
+
 /*
  * If kernel thread really needs to have the final fput() it has done
  * to complete, call this.  The only user right now is the boot - we
@@ -256,11 +258,9 @@ static void ____fput(struct callback_head *work)
  */
 void flush_delayed_fput(void)
 {
-	delayed_fput(NULL);
+	flush_delayed_work(&delayed_fput_work);
 }
 
-static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
-
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
-- 
2.4.3



* [PATCH v1 03/38] fs: add a kerneldoc header to fput
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 01/38] nfsd: add new io class tracepoint Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 02/38] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 04/38] fs: rename "delayed_fput" infrastructure to "fput_global" Jeff Layton
                   ` (34 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

...and move its EXPORT_SYMBOL just below the function.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 52cc6803c07a..8cfeaee6323f 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -261,6 +261,25 @@ void flush_delayed_fput(void)
 	flush_delayed_work(&delayed_fput_work);
 }
 
+/**
+ * fput - put a struct file reference
+ * @file: file of which to put the reference
+ *
+ * This function decrements the reference count for the struct file reference,
+ * and queues it up for destruction if the count goes to zero. In the case of
+ * most tasks we queue it to the task_work infrastructure, which will be run
+ * just before the task returns back to userspace. kthreads however never
+ * return to userspace, so for those we add them to a global list and schedule
+ * a delayed workqueue job to do the final cleanup work.
+ *
+ * Why not just do it synchronously? __fput can involve taking locks of all
+ * sorts, and doing it synchronously means that the callers must take extra care
+ * not to deadlock. That can be very difficult to ensure, so by deferring it
+ * until just before return to userland or to the workqueue, we sidestep that
+ * nastiness. Also, __fput can be quite stack intensive, so doing a final fput
+ * has the possibility of blowing up if we don't take steps to ensure that we
+ * have enough stack space to make it work.
+ */
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
@@ -281,6 +300,7 @@ void fput(struct file *file)
 			schedule_delayed_work(&delayed_fput_work, 1);
 	}
 }
+EXPORT_SYMBOL(fput);
 
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
@@ -299,7 +319,6 @@ void __fput_sync(struct file *file)
 	}
 }
 
-EXPORT_SYMBOL(fput);
 
 void put_filp(struct file *file)
 {
-- 
2.4.3



* [PATCH v1 04/38] fs: rename "delayed_fput" infrastructure to "fput_global"
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (2 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 03/38] fs: add a kerneldoc header to fput Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 05/38] fs: add fput_global Jeff Layton
                   ` (33 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

delayed_fput work is only delayed until the workqueue job runs. It's
even possible that it will run immediately after you call it if the
workqueue job happens to be executing at the time. What it does
guarantee is that the final __fput will run in a different task than
the one that queued it. It's this property that we want to use in the
NFSD code to close files in advance of a setlease attempt by userland.

Change the name of the "delayed_fput" infrastructure to "fput_global"
as that better describes what this code does.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c      | 38 ++++++++++++++++++++------------------
 include/linux/file.h |  2 +-
 init/main.c          |  2 +-
 3 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 8cfeaee6323f..0bf8ddc680ab 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -227,10 +227,10 @@ static void __fput(struct file *file)
 	mntput(mnt);
 }
 
-static LLIST_HEAD(delayed_fput_list);
-static void delayed_fput(struct work_struct *unused)
+static LLIST_HEAD(global_fput_list);
+static void global_fput(struct work_struct *unused)
 {
-	struct llist_node *node = llist_del_all(&delayed_fput_list);
+	struct llist_node *node = llist_del_all(&global_fput_list);
 	struct llist_node *next;
 
 	for (; node; node = next) {
@@ -244,21 +244,23 @@ static void ____fput(struct callback_head *work)
 	__fput(container_of(work, struct file, f_u.fu_rcuhead));
 }
 
-static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
+static DECLARE_DELAYED_WORK(global_fput_work, global_fput);
 
-/*
- * If kernel thread really needs to have the final fput() it has done
- * to complete, call this.  The only user right now is the boot - we
- * *do* need to make sure our writes to binaries on initramfs has
- * not left us with opened struct file waiting for __fput() - execve()
- * won't work without that.  Please, don't add more callers without
- * very good reasons; in particular, never call that with locks
- * held and never call that from a thread that might need to do
- * some work on any kind of umount.
+/**
+ * fput_global_flush - ensure that all global_fput work is complete
+ *
+ * If a kernel thread really needs to have the final fput() it has done to
+ * complete, call this. One of the main users is the boot - we *do* need to
+ * make sure our writes to binaries on initramfs has not left us with opened
+ * struct file waiting for __fput() - execve() won't work without that.
+ *
+ * Please, don't add more callers without very good reasons; in particular,
+ * never call that with locks held and never from a thread that might need to
+ * do some work on any kind of umount.
  */
-void flush_delayed_fput(void)
+void fput_global_flush(void)
 {
-	flush_delayed_work(&delayed_fput_work);
+	flush_delayed_work(&global_fput_work);
 }
 
 /**
@@ -296,15 +298,15 @@ void fput(struct file *file)
 			 */
 		}
 
-		if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
-			schedule_delayed_work(&delayed_fput_work, 1);
+		if (llist_add(&file->f_u.fu_llist, &global_fput_list))
+			schedule_delayed_work(&global_fput_work, 1);
 	}
 }
 EXPORT_SYMBOL(fput);
 
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
- * in some umount() (and thus can't use flush_delayed_fput() without
+ * in some umount() (and thus can't use fput_global_flush() without
  * risking deadlocks), need to wait for completion of __fput() and know
  * for this specific struct file it won't involve anything that would
  * need them.  Use only if you really need it - at the very least,
diff --git a/include/linux/file.h b/include/linux/file.h
index f87d30882a24..73bb7cee57e9 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -70,7 +70,7 @@ extern void put_unused_fd(unsigned int fd);
 
 extern void fd_install(unsigned int fd, struct file *file);
 
-extern void flush_delayed_fput(void);
+extern void fput_global_flush(void);
 extern void __fput_sync(struct file *);
 
 #endif /* __LINUX_FILE_H */
diff --git a/init/main.c b/init/main.c
index 9e64d7097f1a..6cb24a16ddd6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -941,7 +941,7 @@ static int __ref kernel_init(void *unused)
 	system_state = SYSTEM_RUNNING;
 	numa_default_policy();
 
-	flush_delayed_fput();
+	fput_global_flush();
 
 	if (ramdisk_execute_command) {
 		ret = run_init_process(ramdisk_execute_command);
-- 
2.4.3



* [PATCH v1 05/38] fs: add fput_global
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (3 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 04/38] fs: rename "delayed_fput" infrastructure to "fput_global" Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 06/38] fsnotify: fix a sparse warning Jeff Layton
                   ` (32 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

When nfsd caches a file, we want to be able to close it down in advance
of setlease attempts. Setting a lease is generally done at the behest of
userland, so we need a mechanism to ensure that a userland task can
completely close a file without having to return back to userspace.

To do this, we borrow the global_fput infrastructure that
kthreads use. fput_global will queue to the global_fput list if
the last reference was put. The caller can then use fput_global_flush
to force the final __fput to run.
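
The intended calling pattern (put, check the return value, flush only if the
last reference was dropped) can be sketched in userspace C. Everything here is
a hypothetical model: struct file_model, fput_global_model and
fput_global_flush_model are illustration names, the refcount is a plain long
instead of an atomic, and the flush runs inline instead of on a workqueue:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file. */
struct file_model {
	long count;			/* reference count */
	bool closed;			/* set once the final "__fput" has run */
	struct file_model *next;	/* link on the global list */
};

static struct file_model *global_list;

/* Drop a reference; queue the file and return true if it was the last one. */
static bool fput_global_model(struct file_model *f)
{
	if (--f->count == 0) {
		f->next = global_list;
		global_list = f;
		return true;
	}
	return false;
}

/* Run the final close for every queued file; return how many were closed. */
static int fput_global_flush_model(void)
{
	int n = 0;

	while (global_list) {
		struct file_model *f = global_list;

		global_list = f->next;
		f->closed = true;
		n++;
	}
	return n;
}
```

A caller holding one of two references sees false from the first put and true
from the second, and the file only counts as closed after the flush has run.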

While we're at it, export fput_global_flush as nfsd will need to use
that as well.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c      | 33 +++++++++++++++++++++++++++++++++
 include/linux/file.h |  1 +
 2 files changed, 34 insertions(+)

diff --git a/fs/file_table.c b/fs/file_table.c
index 0bf8ddc680ab..d214c45765e6 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -262,6 +262,7 @@ void fput_global_flush(void)
 {
 	flush_delayed_work(&global_fput_work);
 }
+EXPORT_SYMBOL(fput_global_flush);
 
 /**
  * fput - put a struct file reference
@@ -304,6 +305,38 @@ void fput(struct file *file)
 }
 EXPORT_SYMBOL(fput);
 
+/**
+ * fput_global - do an fput without using task_work
+ * @file: file of which to put the reference
+ *
+ * When fput is called in the context of a userland process, it'll queue the
+ * actual work (__fput()) to be done just before returning to userland. In some
+ * cases however, we need to ensure that the __fput runs before that point.
+ *
+ * There is no safe way to flush work that has been queued via task_work_add
+ * however, so to do this we borrow the global_fput infrastructure that
+ * kthreads use. The userland process can use fput_global() on one or more
+ * struct files and then call fput_global_flush() to ensure that they are
+ * completely closed before proceeding.
+ *
+ * The main user is nfsd, which uses this to ensure that all cached but
+ * otherwise unused files are closed to allow a userland request for a lease
+ * to proceed.
+ *
+ * Returns true if the final fput was done, false otherwise. The caller can
+ * use this to determine whether to fput_global_flush afterward.
+ */
+bool fput_global(struct file *file)
+{
+	if (atomic_long_dec_and_test(&file->f_count)) {
+		if (llist_add(&file->f_u.fu_llist, &global_fput_list))
+			schedule_delayed_work(&global_fput_work, 1);
+		return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL(fput_global);
+
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
  * in some umount() (and thus can't use fput_global_flush() without
diff --git a/include/linux/file.h b/include/linux/file.h
index 73bb7cee57e9..7803aed00271 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -12,6 +12,7 @@
 struct file;
 
 extern void fput(struct file *);
+extern bool fput_global(struct file *);
 
 struct file_operations;
 struct vfsmount;
-- 
2.4.3



* [PATCH v1 06/38] fsnotify: fix a sparse warning
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (4 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 05/38] fs: add fput_global Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 07/38] fsnotify: export several symbols Jeff Layton
                   ` (31 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/notify/fdinfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index fd98e5100cab..0c76916e42f0 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -48,7 +48,7 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
 	f.handle.handle_bytes = sizeof(f.pad);
 	size = f.handle.handle_bytes >> 2;
 
-	ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, 0);
+	ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, NULL);
 	if ((ret == FILEID_INVALID) || (ret < 0)) {
 		WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
 		return;
-- 
2.4.3



* [PATCH v1 07/38] fsnotify: export several symbols
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (5 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 06/38] fsnotify: fix a sparse warning Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 08/38] fsnotify: destroy marks with call_srcu instead of dedicated thread Jeff Layton
                   ` (30 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

With nfsd's new open-file caching infrastructure, we need a way to know
when unlinks occur so it can close files that it may be holding open.
fsnotify fits the bill nicely, but the symbols aren't currently exported
to modules. Export some of its symbols so nfsd can use this
infrastructure.

Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/notify/group.c      | 2 ++
 fs/notify/inode_mark.c | 1 +
 fs/notify/mark.c       | 4 ++++
 3 files changed, 7 insertions(+)

diff --git a/fs/notify/group.c b/fs/notify/group.c
index d16b62cb2854..295d08800126 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -81,6 +81,7 @@ void fsnotify_put_group(struct fsnotify_group *group)
 	if (atomic_dec_and_test(&group->refcnt))
 		fsnotify_final_destroy_group(group);
 }
+EXPORT_SYMBOL_GPL(fsnotify_put_group);
 
 /*
  * Create a new fsnotify_group and hold a reference for the group returned.
@@ -109,6 +110,7 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
 
 	return group;
 }
+EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
 
 int fsnotify_fasync(int fd, struct file *file, int on)
 {
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index e785fd954c30..76e608b8894c 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -87,6 +87,7 @@ struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group,
 
 	return mark;
 }
+EXPORT_SYMBOL_GPL(fsnotify_find_inode_mark);
 
 /*
  * If we are setting a mark mask on an inode mark we should pin the inode
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index fc0df4442f7b..c2bd670d4704 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -109,6 +109,7 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
 		mark->free_mark(mark);
 	}
 }
+EXPORT_SYMBOL_GPL(fsnotify_put_mark);
 
 /* Calculate mask of events for a list of marks */
 u32 fsnotify_recalc_mask(struct hlist_head *head)
@@ -208,6 +209,7 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 	mutex_unlock(&group->mark_mutex);
 	fsnotify_free_mark(mark);
 }
+EXPORT_SYMBOL_GPL(fsnotify_destroy_mark);
 
 void fsnotify_destroy_marks(struct hlist_head *head, spinlock_t *lock)
 {
@@ -402,6 +404,7 @@ int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
 	mutex_unlock(&group->mark_mutex);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(fsnotify_add_mark);
 
 /*
  * Given a list of marks, find the mark associated with given group. If found
@@ -492,6 +495,7 @@ void fsnotify_init_mark(struct fsnotify_mark *mark,
 	atomic_set(&mark->refcnt, 1);
 	mark->free_mark = free_mark;
 }
+EXPORT_SYMBOL_GPL(fsnotify_init_mark);
 
 static int fsnotify_mark_destroy(void *ignored)
 {
-- 
2.4.3



* [PATCH v1 08/38] fsnotify: destroy marks with call_srcu instead of dedicated thread
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (6 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 07/38] fsnotify: export several symbols Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 09/38] fsnotify: add a srcu barrier for fsnotify Jeff Layton
                   ` (29 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

At the time that this code was originally written, call_srcu didn't
exist, so this thread was required to ensure that we waited for that
SRCU grace period to settle before finally freeing the object.

It does exist now, however, and we can use call_srcu to handle this much
more efficiently. That also allows us to use srcu_barrier to ensure that
all of the callbacks have run before proceeding. In order to conserve
space, we union the rcu_head with the g_list.

This will be necessary for nfsd, which will allocate marks from a
dedicated slab cache. We have to be able to ensure that all of the
objects are destroyed before destroying the cache. That's fairly
difficult to ensure with a dedicated thread doing the destruction.
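
As an illustration only, the pattern can be sketched in userspace C. The
names cb_head, defer_call() and grace_period_end() below are hypothetical
stand-ins for struct rcu_head, call_srcu() and the end of an SRCU grace
period; none of them are kernel APIs. The sketch shows the callback head
sharing storage with the list linkage, and container_of() recovering the
enclosing object inside the callback:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Hypothetical stand-in for struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mark {
	int refcnt;
	union {
		struct list_node g_list; /* while the mark sits on a group list */
		struct cb_head g_rcu;    /* once the mark is queued for freeing */
	};
};

static struct cb_head *pending; /* callbacks waiting for the "grace period" */
static int marks_freed;

static struct mark *mark_alloc(void)
{
	struct mark *m = calloc(1, sizeof(*m));

	if (m)
		m->refcnt = 1;
	return m;
}

static void mark_put(struct mark *m)
{
	assert(m->refcnt > 0);
	if (--m->refcnt == 0) {
		free(m);
		marks_freed++;
	}
}

/* Callback: recover the mark from its embedded head, then drop the ref. */
static void mark_free_cb(struct cb_head *head)
{
	mark_put(container_of(head, struct mark, g_rcu));
}

/* Analogue of call_srcu(): queue the callback and return immediately. */
static void defer_call(struct cb_head *head, void (*func)(struct cb_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Pretend a grace period has elapsed: run every queued callback. */
static int grace_period_end(void)
{
	int ran = 0;

	while (pending) {
		struct cb_head *head = pending;

		pending = head->next; /* read before func() frees the object */
		head->func(head);
		ran++;
	}
	return ran;
}
```

A caller would allocate marks with mark_alloc(), hand &mark->g_rcu to
defer_call(), and only see the objects freed once grace_period_end() has
run. No dedicated thread or private destroy list is involved.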

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Jan Kara <jack@suse.com>
---
 fs/notify/mark.c                 | 66 +++++++++-------------------------------
 include/linux/fsnotify_backend.h | 10 +++---
 2 files changed, 20 insertions(+), 56 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index c2bd670d4704..00e7072d3c95 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -92,9 +92,6 @@
 #include "fsnotify.h"
 
 struct srcu_struct fsnotify_mark_srcu;
-static DEFINE_SPINLOCK(destroy_lock);
-static LIST_HEAD(destroy_list);
-static DECLARE_WAIT_QUEUE_HEAD(destroy_waitq);
 
 void fsnotify_get_mark(struct fsnotify_mark *mark)
 {
@@ -169,10 +166,19 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark)
 	atomic_dec(&group->num_marks);
 }
 
+static void
+fsnotify_mark_free_rcu(struct rcu_head *rcu)
+{
+	struct fsnotify_mark	*mark;
+
+	mark = container_of(rcu, struct fsnotify_mark, g_rcu);
+	fsnotify_put_mark(mark);
+}
+
 /*
- * Free fsnotify mark. The freeing is actually happening from a kthread which
- * first waits for srcu period end. Caller must have a reference to the mark
- * or be protected by fsnotify_mark_srcu.
+ * Free fsnotify mark. The freeing is actually happening from a call_srcu
+ * callback. Caller must have a reference to the mark or be protected by
+ * fsnotify_mark_srcu.
  */
 void fsnotify_free_mark(struct fsnotify_mark *mark)
 {
@@ -187,10 +193,7 @@ void fsnotify_free_mark(struct fsnotify_mark *mark)
 	mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
 	spin_unlock(&mark->lock);
 
-	spin_lock(&destroy_lock);
-	list_add(&mark->g_list, &destroy_list);
-	spin_unlock(&destroy_lock);
-	wake_up(&destroy_waitq);
+	call_srcu(&fsnotify_mark_srcu, &mark->g_rcu, fsnotify_mark_free_rcu);
 
 	/*
 	 * Some groups like to know that marks are being freed.  This is a
@@ -387,11 +390,7 @@ err:
 
 	spin_unlock(&mark->lock);
 
-	spin_lock(&destroy_lock);
-	list_add(&mark->g_list, &destroy_list);
-	spin_unlock(&destroy_lock);
-	wake_up(&destroy_waitq);
-
+	call_srcu(&fsnotify_mark_srcu, &mark->g_rcu, fsnotify_mark_free_rcu);
 	return ret;
 }
 
@@ -496,40 +495,3 @@ void fsnotify_init_mark(struct fsnotify_mark *mark,
 	mark->free_mark = free_mark;
 }
 EXPORT_SYMBOL_GPL(fsnotify_init_mark);
-
-static int fsnotify_mark_destroy(void *ignored)
-{
-	struct fsnotify_mark *mark, *next;
-	struct list_head private_destroy_list;
-
-	for (;;) {
-		spin_lock(&destroy_lock);
-		/* exchange the list head */
-		list_replace_init(&destroy_list, &private_destroy_list);
-		spin_unlock(&destroy_lock);
-
-		synchronize_srcu(&fsnotify_mark_srcu);
-
-		list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
-			list_del_init(&mark->g_list);
-			fsnotify_put_mark(mark);
-		}
-
-		wait_event_interruptible(destroy_waitq, !list_empty(&destroy_list));
-	}
-
-	return 0;
-}
-
-static int __init fsnotify_mark_init(void)
-{
-	struct task_struct *thread;
-
-	thread = kthread_run(fsnotify_mark_destroy, NULL,
-			     "fsnotify_mark");
-	if (IS_ERR(thread))
-		panic("unable to start fsnotify mark destruction thread.");
-
-	return 0;
-}
-device_initcall(fsnotify_mark_init);
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 533c4408529a..1c582747b410 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -217,10 +217,12 @@ struct fsnotify_mark {
 	/* Group this mark is for. Set on mark creation, stable until last ref
 	 * is dropped */
 	struct fsnotify_group *group;
-	/* List of marks by group->i_fsnotify_marks. Also reused for queueing
-	 * mark into destroy_list when it's waiting for the end of SRCU period
-	 * before it can be freed. [group->mark_mutex] */
-	struct list_head g_list;
+	union {
+		/* List of marks by group->i_fsnotify_marks. [group->mark_mutex] */
+		struct list_head g_list;
+		/* rcu_head for call_srcu-based destructor */
+		struct rcu_head g_rcu;
+	};
 	/* Protects inode / mnt pointers, flags, masks */
 	spinlock_t lock;
 	/* List of marks for inode / vfsmount [obj_lock] */
-- 
2.4.3



* [PATCH v1 09/38] fsnotify: add a srcu barrier for fsnotify
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (7 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 08/38] fsnotify: destroy marks with call_srcu instead of dedicated thread Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 10/38] locks: create a new notifier chain for lease attempts Jeff Layton
                   ` (28 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

For nfsd, we'll need to be able to ensure that all fsnotify_mark
callbacks have run before we can tear down the nfsd_file_mark_cache.
That's simple now that we're using call_srcu directly. Just use
srcu_barrier to ensure that all the callbacks have completed.
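
The ordering requirement can be sketched in userspace C as follows. This
is a simplified model, not kernel code: struct pool, pool_free_deferred(),
deferred_barrier() and pool_destroy() are hypothetical names, and the
barrier here runs the queued work itself, whereas srcu_barrier() waits for
callbacks that were already queued to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical model of a slab cache: just counts live objects. */
struct pool {
	int live;
	bool destroyed;
};

static struct pool *pending[MAX_PENDING]; /* deferred frees not yet run */
static size_t npending;

static void pool_alloc(struct pool *p)
{
	p->live++;
}

/* Analogue of freeing via call_srcu(): the object stays live for now. */
static bool pool_free_deferred(struct pool *p)
{
	if (npending == MAX_PENDING)
		return false;
	pending[npending++] = p;
	return true;
}

/* Analogue of fsnotify_srcu_barrier(): all queued frees are done on return. */
static void deferred_barrier(void)
{
	for (size_t i = 0; i < npending; i++) {
		assert(pending[i]->live > 0);
		pending[i]->live--;
	}
	npending = 0;
}

/* Analogue of kmem_cache_destroy(): refuses while objects are still live. */
static bool pool_destroy(struct pool *p)
{
	if (p->live != 0)
		return false;
	p->destroyed = true;
	return true;
}
```

Calling pool_destroy() before deferred_barrier() fails in this model
because the deferred frees have not run yet. That is the situation the
barrier is meant to rule out before the mark cache is torn down.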

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/notify/mark.c                 | 7 +++++++
 include/linux/fsnotify_backend.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 00e7072d3c95..6b6b99447348 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -204,6 +204,13 @@ void fsnotify_free_mark(struct fsnotify_mark *mark)
 		group->ops->freeing_mark(mark, group);
 }
 
+void
+fsnotify_srcu_barrier(void)
+{
+	srcu_barrier(&fsnotify_mark_srcu);
+}
+EXPORT_SYMBOL_GPL(fsnotify_srcu_barrier);
+
 void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 			   struct fsnotify_group *group)
 {
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 1c582747b410..b211eda20842 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -360,6 +360,8 @@ extern void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 extern void fsnotify_detach_mark(struct fsnotify_mark *mark);
 /* free mark */
 extern void fsnotify_free_mark(struct fsnotify_mark *mark);
+/* wait for all in-flight fsnotify srcu callbacks to run */
+extern void fsnotify_srcu_barrier(void);
 /* run all the marks in a group, and clear all of the vfsmount marks */
 extern void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group);
 /* run all the marks in a group, and clear all of the inode marks */
-- 
2.4.3



* [PATCH v1 10/38] locks: create a new notifier chain for lease attempts
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (8 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 09/38] fsnotify: add a srcu barrier for fsnotify Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 11/38] sunrpc: add a new cache_detail operation for when a cache is flushed Jeff Layton
                   ` (27 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

With the new file caching infrastructure in nfsd, we can end up holding
files open for an indefinite period of time, even when they are idle.
This may prevent the kernel from handing out leases on the file, which
is something we don't want to block.

Fix this by running a SRCU notifier call chain whenever any lease
attempt is made. nfsd can then purge the nfsd_file cache of any entries
associated with that inode, and use the fput_global infrastructure to
make sure the final __fput is done.

Since SRCU is only conditionally compiled in, we must only define the
new chain if it's enabled, and users of the chain must ensure that
SRCU is enabled.
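
A rough userspace model of the idea, for illustration only: struct
notifier, chain_register(), chain_call() and demo_setlease() are
hypothetical names, and the DEMO_* constants merely stand in for
F_RDLCK/F_WRLCK/F_UNLCK. The real chain is an srcu_notifier_head and is
walked under SRCU protection, which this sketch does not attempt to model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the lease type argument. */
#define DEMO_RDLCK 0
#define DEMO_WRLCK 1
#define DEMO_UNLCK 2

struct notifier {
	int (*call)(struct notifier *nb, long arg, void *data);
	struct notifier *next;
};

static struct notifier *lease_chain;

static void chain_register(struct notifier *nb)
{
	assert(nb->call != NULL);
	nb->next = lease_chain;
	lease_chain = nb;
}

/* Invoke every registered callback; returns how many were called. */
static int chain_call(long arg, void *data)
{
	int n = 0;

	for (struct notifier *nb = lease_chain; nb; nb = nb->next) {
		nb->call(nb, arg, data);
		n++;
	}
	return n;
}

/* Notify subscribers before the lease is attempted; skip unlock requests. */
static int demo_setlease(long arg, void *lease)
{
	if (lease && arg != DEMO_UNLCK)
		chain_call(arg, lease);
	return 0; /* the actual lease handling is out of scope here */
}

/* Example subscriber: counts how often it would purge its file cache. */
static int cache_purges;

static int cache_purge_cb(struct notifier *nb, long arg, void *data)
{
	(void)nb;
	(void)arg;
	(void)data;
	cache_purges++;
	return 0;
}

static struct notifier cache_nb = { .call = cache_purge_cb };
```

After chain_register(&cache_nb), a demo_setlease(DEMO_WRLCK, ...) call
reaches the subscriber, while a DEMO_UNLCK request or a NULL lease does
not, mirroring the checks in setlease_notifier() and vfs_setlease().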

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/locks.c         | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  1 +
 2 files changed, 38 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index 86c94674ab22..e463d585f77d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1786,6 +1786,40 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp,
 }
 EXPORT_SYMBOL(generic_setlease);
 
+#if IS_ENABLED(CONFIG_SRCU)
+/*
+ * Kernel subsystems can register to be notified on any attempt to set
+ * a new lease with the lease_notifier_chain. This is used by (e.g.) nfsd
+ * to close files that it may have cached when there is an attempt to set a
+ * conflicting lease.
+ */
+struct srcu_notifier_head lease_notifier_chain;
+EXPORT_SYMBOL_GPL(lease_notifier_chain);
+
+static inline void
+lease_notifier_chain_init(void)
+{
+	srcu_init_notifier_head(&lease_notifier_chain);
+}
+
+static inline void
+setlease_notifier(long arg, struct file_lock *lease)
+{
+	if (arg != F_UNLCK)
+		srcu_notifier_call_chain(&lease_notifier_chain, arg, lease);
+}
+#else /* !IS_ENABLED(CONFIG_SRCU) */
+static inline void
+lease_notifier_chain_init(void)
+{
+}
+
+static inline void
+setlease_notifier(long arg, struct file_lock *lease)
+{
+}
+#endif /* IS_ENABLED(CONFIG_SRCU) */
+
 /**
  * vfs_setlease        -       sets a lease on an open file
  * @filp:	file pointer
@@ -1806,6 +1840,8 @@ EXPORT_SYMBOL(generic_setlease);
 int
 vfs_setlease(struct file *filp, long arg, struct file_lock **lease, void **priv)
 {
+	if (lease)
+		setlease_notifier(arg, *lease);
 	if (filp->f_op->setlease)
 		return filp->f_op->setlease(filp, arg, lease, priv);
 	else
@@ -2726,6 +2762,7 @@ static int __init filelock_init(void)
 	for_each_possible_cpu(i)
 		INIT_HLIST_HEAD(per_cpu_ptr(&file_lock_list, i));
 
+	lease_notifier_chain_init();
 	return 0;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cbf08d5c246e..0492ff432f01 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1042,6 +1042,7 @@ extern int fcntl_setlease(unsigned int fd, struct file *filp, long arg);
 extern int fcntl_getlease(struct file *filp);
 
 /* fs/locks.c */
+extern struct srcu_notifier_head	lease_notifier_chain;
 void locks_free_lock_context(struct file_lock_context *ctx);
 void locks_free_lock(struct file_lock *fl);
 extern void locks_init_lock(struct file_lock *);
-- 
2.4.3



* [PATCH v1 11/38] sunrpc: add a new cache_detail operation for when a cache is flushed
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (9 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 10/38] locks: create a new notifier chain for lease attempts Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 12/38] nfsd: add a new struct file caching facility to nfsd Jeff Layton
                   ` (26 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

When the exports table is changed, exportfs will usually write a new
time to the "flush" file in the nfsd.export cache procfile. This tells
the kernel to flush any entries that are older than that value.

This gives us a mechanism to tell whether an unexport might have
occurred. Add a new ->flush cache_detail operation that is called after
flushing the cache whenever someone writes to a "flush" file.
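
The shape of the hook, as a minimal userspace sketch: struct demo_cache
and demo_write_flush() are hypothetical and only model the optional
callback. The real write_flush() also parses the user buffer and calls
cache_flush() before invoking the hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of struct cache_detail. */
struct demo_cache {
	long flush_time;
	void (*flush)(void); /* optional; NULL for caches that don't care */
};

static int flush_calls;

static void demo_flush(void)
{
	flush_calls++;
}

/* Record the new flush time, then call the hook if one was provided. */
static void demo_write_flush(struct demo_cache *cd, long then)
{
	assert(cd != NULL);
	cd->flush_time = then;
	if (cd->flush)
		cd->flush();
}
```

Because the pointer is checked before the call, existing caches that
leave ->flush unset keep their current behaviour.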

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 include/linux/sunrpc/cache.h | 1 +
 net/sunrpc/cache.c           | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index ed03c9f7f908..e67b35d75417 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -100,6 +100,7 @@ struct cache_detail {
 					      int has_died);
 
 	struct cache_head *	(*alloc)(void);
+	void			(*flush)(void);
 	int			(*match)(struct cache_head *orig, struct cache_head *new);
 	void			(*init)(struct cache_head *orig, struct cache_head *new);
 	void			(*update)(struct cache_head *orig, struct cache_head *new);
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 5e4f815c2b34..19ee8feec8f3 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1478,6 +1478,9 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
 	cd->flush_time = then;
 	cache_flush();
 
+	if (cd->flush)
+		cd->flush();
+
 	*ppos += count;
 	return count;
 }
-- 
2.4.3



* [PATCH v1 12/38] nfsd: add a new struct file caching facility to nfsd
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (10 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 11/38] sunrpc: add a new cache_detail operation for when a cache is flushed Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 13/38] nfsd: keep some rudimentary stats on nfsd_file cache Jeff Layton
                   ` (25 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Currently, NFSv2/3 reads and writes have to open a file, do the read or
write and then close it again for each RPC. This is highly inefficient,
especially when the underlying filesystem has a relatively slow open
routine.

This patch adds a new open file cache to knfsd. Rather than doing an
open for each RPC, the read/write handlers can call into this cache to
see if there is one already there for the correct filehandle and
NFSD_MAY_READ/WRITE flags.

If there isn't an entry, then we create a new one and attempt to
perform the open. If there is, then we wait until construction of the
entry has finished and return it if it was fully instantiated. If it
was not, then we attempt to take over construction.
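
The lookup half of this can be sketched in userspace C. This is only a
model under stated assumptions: struct entry, struct bucket and
find_or_create() are hypothetical names, a C11 atomic_flag spinlock stands
in for the per-bucket spinlock, and the first lookup is done under the
lock because there is no RCU here. Waiting for construction and taking it
over are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_HASH_BITS 4
#define DEMO_HASH_SIZE (1u << DEMO_HASH_BITS)

struct entry {
	unsigned long key;
	int ref;            /* protected by the bucket lock in this sketch */
	struct entry *next;
};

struct bucket {
	atomic_flag lock;
	struct entry *head;
};

static struct bucket table[DEMO_HASH_SIZE];

static void table_init(void)
{
	for (size_t i = 0; i < DEMO_HASH_SIZE; i++) {
		atomic_flag_clear(&table[i].lock);
		table[i].head = NULL;
	}
}

static void bucket_lock(struct bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		; /* spin */
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Caller holds the bucket lock. Takes a reference on a match. */
static struct entry *find_locked(struct bucket *b, unsigned long key)
{
	for (struct entry *e = b->head; e; e = e->next) {
		if (e->key == key) {
			e->ref++;
			return e;
		}
	}
	return NULL;
}

static struct entry *find_or_create(unsigned long key)
{
	struct bucket *b = &table[key & (DEMO_HASH_SIZE - 1)];
	struct entry *e, *new;

	bucket_lock(b);
	e = find_locked(b, key);
	bucket_unlock(b);
	if (e)
		return e;

	/* Allocate without holding the lock, as the allocation may block. */
	new = calloc(1, sizeof(*new));
	if (!new)
		return NULL;
	new->key = key;
	new->ref = 2; /* one for the table, one for the caller */

	/* Someone may have inserted the same key meanwhile: check again. */
	bucket_lock(b);
	e = find_locked(b, key);
	if (!e) {
		new->next = b->head;
		b->head = new;
		e = new;
		new = NULL;
	}
	bucket_unlock(b);

	free(new); /* no-op unless we lost the race */
	assert(e != NULL);
	return e;
}
```

The second lookup under the lock is what keeps two racing callers from
inserting duplicate entries for the same key. The loser frees its
allocation and uses the winner's entry.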

Since the main goal is to speed up NFSv2/3 I/O, we don't want to
close these files on last put of these objects. We need to keep them
around since we never know when the next READ/WRITE will come in. So,
we keep them open more or less indefinitely until there is a reason
to close them:

- The constructor adds a fsnotify_mark to the inode so that nfsd can be
  informed when the link count drops to zero. This allows us to close
  the file when it's deleted.
- A notifier chain is added to the lease infrastructure that allows nfsd
  to be informed when there is an attempt to set a lease on the inode,
  and we clean the cache of any open files for that inode when this
  occurs.
- A shrinker is registered to allow nfsd to close files under memory
  pressure.
- The open file cache is cleaned out whenever the exports cache is
  flushed.
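
For the shrinker case, the eviction decision made in nfsd_file_lru_cb()
can be modelled in userspace with C11 atomics. This is a simplified
sketch: struct demo_file and the demo_* helpers are hypothetical, and the
real callback additionally takes the bucket lock, unhashes the entry and
returns the list_lru status codes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_lru_status { DEMO_LRU_ROTATE, DEMO_LRU_REMOVED };

struct demo_file {
	atomic_int ref;         /* the hash table itself holds one reference */
	atomic_bool referenced; /* set on every put, cleared by the scan */
};

static void demo_file_init(struct demo_file *f)
{
	atomic_init(&f->ref, 1);
	atomic_init(&f->referenced, false);
}

static void demo_file_get(struct demo_file *f)
{
	atomic_fetch_add(&f->ref, 1);
}

/* Put path: set the flag first, then drop the reference. */
static void demo_file_put(struct demo_file *f)
{
	assert(atomic_load(&f->ref) > 0);
	atomic_store(&f->referenced, true);
	atomic_fetch_sub(&f->ref, 1);
}

/* Scan path: check the counter first, then test and clear the flag. */
static enum demo_lru_status demo_lru_scan_one(struct demo_file *f)
{
	if (atomic_load(&f->ref) > 1 ||
	    atomic_exchange(&f->referenced, false))
		return DEMO_LRU_ROTATE;
	return DEMO_LRU_REMOVED;
}
```

An entry that is in use, or that was put since the previous scan, is
rotated rather than evicted. Only an entry that stays idle across two
consecutive scans is removed.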

Note that this patch just adds the infrastructure to allow the caching
of open files. Later patches will actually make nfsd use it.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/Kconfig     |   2 +
 fs/nfsd/Makefile    |   3 +-
 fs/nfsd/export.c    |  14 ++
 fs/nfsd/filecache.c | 668 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/filecache.h |  43 ++++
 fs/nfsd/nfssvc.c    |  10 +-
 fs/nfsd/trace.h     | 138 +++++++++++
 fs/nfsd/vfs.c       |   3 +-
 fs/nfsd/vfs.h       |   1 +
 9 files changed, 879 insertions(+), 3 deletions(-)
 create mode 100644 fs/nfsd/filecache.c
 create mode 100644 fs/nfsd/filecache.h

diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index a0b77fc1bd39..95e0a91d41ef 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -6,6 +6,8 @@ config NFSD
 	select SUNRPC
 	select EXPORTFS
 	select NFS_ACL_SUPPORT if NFSD_V2_ACL
+	select SRCU
+	select FSNOTIFY
 	depends on MULTIUSER
 	help
 	  Choose Y here if you want to allow other computers to access
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 9a6028e120c6..8908bb467727 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -10,7 +10,8 @@ obj-$(CONFIG_NFSD)	+= nfsd.o
 nfsd-y			+= trace.o
 
 nfsd-y 			+= nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \
-			   export.o auth.o lockd.o nfscache.o nfsxdr.o stats.o
+			   export.o auth.o lockd.o nfscache.o nfsxdr.o \
+			   stats.o filecache.o
 nfsd-$(CONFIG_NFSD_FAULT_INJECTION) += fault_inject.o
 nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
 nfsd-$(CONFIG_NFSD_V3)	+= nfs3proc.o nfs3xdr.o
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index b4d84b579f20..4b504edff121 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -21,6 +21,7 @@
 #include "nfsfh.h"
 #include "netns.h"
 #include "pnfs.h"
+#include "filecache.h"
 
 #define NFSDDBG_FACILITY	NFSDDBG_EXPORT
 
@@ -231,6 +232,18 @@ static struct cache_head *expkey_alloc(void)
 		return NULL;
 }
 
+static void
+expkey_flush(void)
+{
+	/*
+	 * Take the nfsd_mutex here to ensure that the file cache is not
+	 * destroyed while we're in the middle of flushing.
+	 */
+	mutex_lock(&nfsd_mutex);
+	nfsd_file_cache_purge();
+	mutex_unlock(&nfsd_mutex);
+}
+
 static struct cache_detail svc_expkey_cache_template = {
 	.owner		= THIS_MODULE,
 	.hash_size	= EXPKEY_HASHMAX,
@@ -243,6 +256,7 @@ static struct cache_detail svc_expkey_cache_template = {
 	.init		= expkey_init,
 	.update       	= expkey_update,
 	.alloc		= expkey_alloc,
+	.flush		= expkey_flush,
 };
 
 static int
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
new file mode 100644
index 000000000000..f6c4480c5d78
--- /dev/null
+++ b/fs/nfsd/filecache.c
@@ -0,0 +1,668 @@
+/*
+ * Open file cache.
+ *
+ * (c) 2015 - Jeff Layton <jeff.layton@primarydata.com>
+ */
+
+#include <linux/hash.h>
+#include <linux/slab.h>
+#include <linux/hash.h>
+#include <linux/file.h>
+#include <linux/sched.h>
+#include <linux/list_lru.h>
+#include <linux/fsnotify_backend.h>
+
+#include "vfs.h"
+#include "nfsd.h"
+#include "nfsfh.h"
+#include "filecache.h"
+#include "trace.h"
+
+#define NFSDDBG_FACILITY	NFSDDBG_FH
+
+/* FIXME: dynamically size this for the machine somehow? */
+#define NFSD_FILE_HASH_BITS                   12
+#define NFSD_FILE_HASH_SIZE                  (1 << NFSD_FILE_HASH_BITS)
+
+/* We only care about NFSD_MAY_READ/WRITE for this cache */
+#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
+
+struct nfsd_fcache_bucket {
+	struct hlist_head	nfb_head;
+	spinlock_t		nfb_lock;
+};
+
+static struct kmem_cache		*nfsd_file_slab;
+static struct kmem_cache		*nfsd_file_mark_slab;
+static struct nfsd_fcache_bucket	*nfsd_file_hashtbl;
+static struct list_lru			nfsd_file_lru;
+static struct fsnotify_group		*nfsd_file_fsnotify_group;
+
+static void
+nfsd_file_slab_free(struct rcu_head *rcu)
+{
+	struct nfsd_file *nf = container_of(rcu, struct nfsd_file, nf_rcu);
+
+	kmem_cache_free(nfsd_file_slab, nf);
+}
+
+static void
+nfsd_file_mark_free(struct fsnotify_mark *mark)
+{
+	struct nfsd_file_mark *nfm = container_of(mark, struct nfsd_file_mark,
+						  nfm_mark);
+
+	kmem_cache_free(nfsd_file_mark_slab, nfm);
+}
+
+static struct nfsd_file_mark *
+nfsd_file_mark_get(struct nfsd_file_mark *nfm)
+{
+	if (!atomic_inc_not_zero(&nfm->nfm_ref))
+		return NULL;
+	return nfm;
+}
+
+static void
+nfsd_file_mark_put(struct nfsd_file_mark *nfm)
+{
+	if (atomic_dec_and_test(&nfm->nfm_ref))
+		fsnotify_destroy_mark(&nfm->nfm_mark, nfsd_file_fsnotify_group);
+}
+
+static struct nfsd_file_mark *
+nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode)
+{
+	int			err;
+	struct fsnotify_mark	*mark;
+	struct nfsd_file_mark	*nfm = NULL, *new = NULL;
+
+	do {
+		mark = fsnotify_find_inode_mark(nfsd_file_fsnotify_group,
+						inode);
+		if (mark) {
+			nfm = nfsd_file_mark_get(container_of(mark,
+						 struct nfsd_file_mark,
+						 nfm_mark));
+			fsnotify_put_mark(mark);
+			if (likely(nfm))
+				break;
+		}
+
+		/* allocate a new nfm */
+		if (!new) {
+			new = kmem_cache_alloc(nfsd_file_mark_slab, GFP_KERNEL);
+			if (!new)
+				return NULL;
+			fsnotify_init_mark(&new->nfm_mark, nfsd_file_mark_free);
+			new->nfm_mark.mask = FS_ATTRIB|FS_DELETE_SELF;
+			atomic_set(&new->nfm_ref, 1);
+		}
+
+		err = fsnotify_add_mark(&new->nfm_mark,
+				nfsd_file_fsnotify_group, nf->nf_inode,
+				NULL, false);
+		if (likely(!err)) {
+			nfm = new;
+			new = NULL;
+		}
+	} while (unlikely(err == -EEXIST));
+
+	if (new)
+		kmem_cache_free(nfsd_file_mark_slab, new);
+	return nfm;
+}
+
+static struct nfsd_file *
+nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval)
+{
+	struct nfsd_file *nf;
+
+	nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL);
+	if (nf) {
+		INIT_HLIST_NODE(&nf->nf_node);
+		INIT_LIST_HEAD(&nf->nf_lru);
+		nf->nf_file = NULL;
+		nf->nf_flags = 0;
+		nf->nf_inode = inode;
+		nf->nf_hashval = hashval;
+		atomic_set(&nf->nf_ref, 1);
+		nf->nf_may = may & NFSD_FILE_MAY_MASK;
+		if (may & NFSD_MAY_NOT_BREAK_LEASE) {
+			if (may & NFSD_MAY_WRITE)
+				__set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+			if (may & NFSD_MAY_READ)
+				__set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+		}
+		nf->nf_mark = NULL;
+		trace_nfsd_file_alloc(nf);
+	}
+	return nf;
+}
+
+static void
+nfsd_file_put_final(struct nfsd_file *nf)
+{
+	trace_nfsd_file_put_final(nf);
+	if (nf->nf_mark)
+		nfsd_file_mark_put(nf->nf_mark);
+	if (nf->nf_file)
+		fput(nf->nf_file);
+	call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
+}
+
+static bool
+nfsd_file_put_final_delayed(struct nfsd_file *nf)
+{
+	bool flush = false;
+
+	trace_nfsd_file_put_final(nf);
+	if (nf->nf_mark)
+		nfsd_file_mark_put(nf->nf_mark);
+	if (nf->nf_file)
+		flush = fput_global(nf->nf_file);
+	call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
+	return flush;
+}
+
+static bool
+nfsd_file_unhash(struct nfsd_file *nf)
+{
+	lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+
+	trace_nfsd_file_unhash(nf);
+	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+		clear_bit(NFSD_FILE_HASHED, &nf->nf_flags);
+		hlist_del_rcu(&nf->nf_node);
+		list_lru_del(&nfsd_file_lru, &nf->nf_lru);
+		return true;
+	}
+	return false;
+}
+
+static void
+nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose)
+{
+	lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+
+	trace_nfsd_file_unhash_and_release_locked(nf);
+	if (!nfsd_file_unhash(nf))
+		return;
+	if (!atomic_dec_and_test(&nf->nf_ref))
+		return;
+
+	list_add(&nf->nf_lru, dispose);
+}
+
+void
+nfsd_file_put(struct nfsd_file *nf)
+{
+	trace_nfsd_file_put(nf);
+	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+	smp_mb__after_atomic();
+	if (atomic_dec_and_test(&nf->nf_ref)) {
+		WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags));
+		nfsd_file_put_final(nf);
+	}
+}
+
+struct nfsd_file *
+nfsd_file_get(struct nfsd_file *nf)
+{
+	if (likely(atomic_inc_not_zero(&nf->nf_ref)))
+		return nf;
+	return NULL;
+}
+
+static void
+nfsd_file_dispose_list(struct list_head *dispose)
+{
+	struct nfsd_file *nf;
+
+	while(!list_empty(dispose)) {
+		nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
+		list_del(&nf->nf_lru);
+		nfsd_file_put_final(nf);
+	}
+}
+
+static void
+nfsd_file_dispose_list_sync(struct list_head *dispose)
+{
+	bool flush = false;
+	struct nfsd_file *nf;
+
+	while(!list_empty(dispose)) {
+		nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
+		list_del(&nf->nf_lru);
+		if (nfsd_file_put_final_delayed(nf))
+			flush = true;
+	}
+	if (flush)
+		fput_global_flush();
+}
+
+static enum lru_status
+nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
+		 spinlock_t *lock, void *arg)
+	__releases(lock)
+	__acquires(lock)
+{
+	struct nfsd_file *nf = list_entry(item, struct nfsd_file, nf_lru);
+	bool unhashed;
+
+	/*
+	 * Do a lockless refcount check. The hashtable holds one reference, so
+	 * we look to see if anything else has a reference, or if any have
+	 * been put since the shrinker last ran. Those don't get unhashed and
+	 * released.
+	 *
+	 * Note that in the put path, we set the flag and then decrement the
+	 * counter. Here we check the counter and then test and clear the flag.
+	 * That order is deliberate to ensure that we can do this locklessly.
+	 */
+	if (atomic_read(&nf->nf_ref) > 1 ||
+	    test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags))
+		return LRU_ROTATE;
+
+	spin_unlock(lock);
+	spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+	unhashed = nfsd_file_unhash(nf);
+	spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+	if (unhashed)
+		nfsd_file_put(nf);
+	spin_lock(lock);
+	return unhashed ? LRU_REMOVED_RETRY : LRU_RETRY;
+}
+
+static unsigned long
+nfsd_file_lru_count(struct shrinker *s, struct shrink_control *sc)
+{
+	return list_lru_count(&nfsd_file_lru);
+}
+
+static unsigned long
+nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
+{
+	return list_lru_shrink_walk(&nfsd_file_lru, sc, nfsd_file_lru_cb, NULL);
+}
+
+static struct shrinker	nfsd_file_shrinker = {
+	.scan_objects = nfsd_file_lru_scan,
+	.count_objects = nfsd_file_lru_count,
+	.seeks = 1,
+};
+
+static void
+__nfsd_file_close_inode(struct inode *inode, unsigned int hashval,
+			struct list_head *dispose)
+{
+	struct nfsd_file	*nf;
+	struct hlist_node	*tmp;
+
+	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+	hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
+		if (inode == nf->nf_inode)
+			nfsd_file_unhash_and_release_locked(nf, dispose);
+	}
+	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+}
+
+/**
+ * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file
+ * @inode: inode of the file to attempt to remove
+ *
+ * Walk the whole hash bucket, looking for any files that correspond to "inode".
+ * If any do, then unhash them and put the hashtable reference to them and
+ * destroy any that had their last reference put. Also ensure that the final
+ * __fput has been done for each file that was closed.
+ */
+void
+nfsd_file_close_inode_sync(struct inode *inode)
+{
+	unsigned int		hashval = (unsigned int)hash_long(inode->i_ino,
+						NFSD_FILE_HASH_BITS);
+	LIST_HEAD(dispose);
+
+	__nfsd_file_close_inode(inode, hashval, &dispose);
+	trace_nfsd_file_close_inode_sync(inode, hashval, !list_empty(&dispose));
+	nfsd_file_dispose_list_sync(&dispose);
+}
+
+/**
+ * nfsd_file_close_inode - attempt to forcibly close a nfsd_file
+ * @inode: inode of the file to attempt to remove
+ *
+ * Walk the whole hash bucket, looking for any files that correspond to "inode".
+ * If any do, then unhash them and put the hashtable reference to them and
+ * destroy any that had their last reference put.
+ */
+static void
+nfsd_file_close_inode(struct inode *inode)
+{
+	unsigned int		hashval = (unsigned int)hash_long(inode->i_ino,
+						NFSD_FILE_HASH_BITS);
+	LIST_HEAD(dispose);
+
+	__nfsd_file_close_inode(inode, hashval, &dispose);
+	trace_nfsd_file_close_inode(inode, hashval, !list_empty(&dispose));
+	nfsd_file_dispose_list(&dispose);
+}
+
+static int
+nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg,
+			    void *data)
+{
+	struct file_lock *fl = data;
+
+	/* Only close files for F_SETLEASE leases */
+	if (fl->fl_flags & FL_LEASE)
+		nfsd_file_close_inode_sync(file_inode(fl->fl_file));
+	return 0;
+}
+
+static struct notifier_block nfsd_file_lease_notifier = {
+	.notifier_call = nfsd_file_lease_notifier_call,
+};
+
+static int
+nfsd_file_fsnotify_handle_event(struct fsnotify_group *group,
+				struct inode *inode,
+				struct fsnotify_mark *inode_mark,
+				struct fsnotify_mark *vfsmount_mark,
+				u32 mask, void *data, int data_type,
+				const unsigned char *file_name, u32 cookie)
+{
+	trace_nfsd_file_fsnotify_handle_event(inode, mask);
+
+	/* Should be no marks on non-regular files */
+	if (!S_ISREG(inode->i_mode)) {
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+
+	/* ...and we don't do anything with vfsmount marks */
+	BUG_ON(vfsmount_mark);
+
+	/* don't close files if this was not the last link */
+	if (mask & FS_ATTRIB) {
+		if (inode->i_nlink)
+			return 0;
+	}
+
+	nfsd_file_close_inode(inode);
+	return 0;
+}
+
+
+static const struct fsnotify_ops nfsd_file_fsnotify_ops = {
+	.handle_event = nfsd_file_fsnotify_handle_event,
+};
+
+int
+nfsd_file_cache_init(void)
+{
+	int		ret = -ENOMEM;
+	unsigned int	i;
+
+	if (nfsd_file_hashtbl)
+		return 0;
+
+	nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
+				sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
+	if (!nfsd_file_hashtbl) {
+		pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n");
+		goto out_err;
+	}
+
+	nfsd_file_slab = kmem_cache_create("nfsd_file",
+				sizeof(struct nfsd_file), 0, 0, NULL);
+	if (!nfsd_file_slab) {
+		pr_err("nfsd: unable to create nfsd_file_slab\n");
+		goto out_err;
+	}
+
+	nfsd_file_mark_slab = kmem_cache_create("nfsd_file_mark",
+					sizeof(struct nfsd_file_mark), 0, 0, NULL);
+	if (!nfsd_file_mark_slab) {
+		pr_err("nfsd: unable to create nfsd_file_mark_slab\n");
+		goto out_err;
+	}
+
+
+	ret = list_lru_init(&nfsd_file_lru);
+	if (ret) {
+		pr_err("nfsd: failed to init nfsd_file_lru: %d\n", ret);
+		goto out_err;
+	}
+
+	ret = register_shrinker(&nfsd_file_shrinker);
+	if (ret) {
+		pr_err("nfsd: failed to register nfsd_file_shrinker: %d\n", ret);
+		goto out_lru;
+	}
+
+	ret = srcu_notifier_chain_register(&lease_notifier_chain,
+				&nfsd_file_lease_notifier);
+	if (ret) {
+		pr_err("nfsd: unable to register lease notifier: %d\n", ret);
+		goto out_shrinker;
+	}
+
+	nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops);
+	if (IS_ERR(nfsd_file_fsnotify_group)) {
+		ret = PTR_ERR(nfsd_file_fsnotify_group);
+		pr_err("nfsd: unable to create fsnotify group: %d\n", ret);
+		nfsd_file_fsnotify_group = NULL;
+		goto out_notifier;
+	}
+
+	for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+		INIT_HLIST_HEAD(&nfsd_file_hashtbl[i].nfb_head);
+		spin_lock_init(&nfsd_file_hashtbl[i].nfb_lock);
+	}
+out:
+	return ret;
+out_notifier:
+	srcu_notifier_chain_unregister(&lease_notifier_chain,
+				&nfsd_file_lease_notifier);
+out_shrinker:
+	unregister_shrinker(&nfsd_file_shrinker);
+out_lru:
+	list_lru_destroy(&nfsd_file_lru);
+out_err:
+	kmem_cache_destroy(nfsd_file_slab);
+	nfsd_file_slab = NULL;
+	kmem_cache_destroy(nfsd_file_mark_slab);
+	nfsd_file_mark_slab = NULL;
+	kfree(nfsd_file_hashtbl);
+	nfsd_file_hashtbl = NULL;
+	goto out;
+}
+
+void
+nfsd_file_cache_purge(void)
+{
+	unsigned int		i;
+	struct nfsd_file	*nf;
+	LIST_HEAD(dispose);
+
+	if (!nfsd_file_hashtbl)
+		return;
+
+	for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+		spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
+		while(!hlist_empty(&nfsd_file_hashtbl[i].nfb_head)) {
+			nf = hlist_entry(nfsd_file_hashtbl[i].nfb_head.first,
+					 struct nfsd_file, nf_node);
+			nfsd_file_unhash_and_release_locked(nf, &dispose);
+		}
+		spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
+		nfsd_file_dispose_list(&dispose);
+	}
+}
+
+void
+nfsd_file_cache_shutdown(void)
+{
+	LIST_HEAD(dispose);
+
+	srcu_notifier_chain_unregister(&lease_notifier_chain,
+				&nfsd_file_lease_notifier);
+	unregister_shrinker(&nfsd_file_shrinker);
+	nfsd_file_cache_purge();
+	list_lru_destroy(&nfsd_file_lru);
+	rcu_barrier();
+	fsnotify_put_group(nfsd_file_fsnotify_group);
+	nfsd_file_fsnotify_group = NULL;
+	kmem_cache_destroy(nfsd_file_slab);
+	nfsd_file_slab = NULL;
+	fsnotify_srcu_barrier();
+	kmem_cache_destroy(nfsd_file_mark_slab);
+	nfsd_file_mark_slab = NULL;
+	kfree(nfsd_file_hashtbl);
+	nfsd_file_hashtbl = NULL;
+}
+
+static struct nfsd_file *
+nfsd_file_find_locked(struct inode *inode, unsigned int may_flags,
+			unsigned int hashval)
+{
+	struct nfsd_file *nf;
+	unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
+
+	hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
+				 nf_node) {
+		if ((need & nf->nf_may) != need)
+			continue;
+		if (nf->nf_inode == inode)
+			return nfsd_file_get(nf);
+	}
+	return NULL;
+}
+
+__be32
+nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		  unsigned int may_flags, struct nfsd_file **pnf)
+{
+	__be32	status;
+	struct nfsd_file *nf, *new = NULL;
+	struct inode *inode;
+	unsigned int hashval;
+
+	/* FIXME: skip this if fh_dentry is already set? */
+	status = fh_verify(rqstp, fhp, S_IFREG,
+				may_flags|NFSD_MAY_OWNER_OVERRIDE);
+	if (status != nfs_ok)
+		return status;
+
+	inode = d_inode(fhp->fh_dentry);
+	hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS);
+retry:
+	rcu_read_lock();
+	nf = nfsd_file_find_locked(inode, may_flags, hashval);
+	rcu_read_unlock();
+	if (nf)
+		goto wait_for_construction;
+
+	if (!new) {
+		new = nfsd_file_alloc(inode, may_flags, hashval);
+		if (!new) {
+			trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags,
+						NULL, nfserr_jukebox);
+			return nfserr_jukebox;
+		}
+	}
+
+	spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+	nf = nfsd_file_find_locked(inode, may_flags, hashval);
+	if (likely(nf == NULL)) {
+		/* Take reference for the hashtable */
+		atomic_inc(&new->nf_ref);
+		__set_bit(NFSD_FILE_HASHED, &new->nf_flags);
+		__set_bit(NFSD_FILE_PENDING, &new->nf_flags);
+		list_lru_add(&nfsd_file_lru, &new->nf_lru);
+		hlist_add_head_rcu(&new->nf_node,
+				&nfsd_file_hashtbl[hashval].nfb_head);
+		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+		nf = new;
+		new = NULL;
+		goto open_file;
+	}
+	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+
+wait_for_construction:
+	wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
+
+	/* Did construction of this file fail? */
+	if (!nf->nf_file) {
+		/*
+		 * We can only take over construction for this nfsd_file if the
+		 * MAY flags are equal. Otherwise, we put the reference and try
+		 * again.
+		 */
+		if ((may_flags & NFSD_FILE_MAY_MASK) != nf->nf_may) {
+			nfsd_file_put(nf);
+			goto retry;
+		}
+
+		/* try to take over construction for this file */
+		if (test_and_set_bit(NFSD_FILE_PENDING, &nf->nf_flags))
+			goto wait_for_construction;
+
+		/* sync up the BREAK_* flags with our may_flags */
+		if (may_flags & NFSD_MAY_NOT_BREAK_LEASE) {
+			if (may_flags & NFSD_MAY_WRITE)
+				set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+			if (may_flags & NFSD_MAY_READ)
+				set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+		} else {
+			clear_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+			clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+		}
+
+		goto open_file;
+	}
+
+	if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
+		bool write = (may_flags & NFSD_MAY_WRITE);
+
+		if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) ||
+		    (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) {
+			status = nfserrno(nfsd_open_break_lease(
+					file_inode(nf->nf_file), may_flags));
+			if (status == nfs_ok) {
+				clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+				if (write)
+					clear_bit(NFSD_FILE_BREAK_WRITE,
+						  &nf->nf_flags);
+			}
+		}
+	}
+out:
+	if (status == nfs_ok) {
+		*pnf = nf;
+	} else {
+		nfsd_file_put(nf);
+		nf = NULL;
+	}
+
+	if (new)
+		nfsd_file_put(new);
+
+	trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags, nf, status);
+	return status;
+open_file:
+	if (!nf->nf_mark) {
+		nf->nf_mark = nfsd_file_mark_find_or_create(nf, inode);
+		if (!nf->nf_mark)
+			status = nfserr_jukebox;
+	}
+	/* FIXME: should we abort opening if the link count goes to 0? */
+	if (status == nfs_ok)
+		status = nfsd_open(rqstp, fhp, S_IFREG, may_flags, &nf->nf_file);
+	clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
+	smp_mb__after_atomic();
+	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
+	goto out;
+}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
new file mode 100644
index 000000000000..a045c7fe3655
--- /dev/null
+++ b/fs/nfsd/filecache.h
@@ -0,0 +1,43 @@
+#ifndef _FS_NFSD_FILECACHE_H
+#define _FS_NFSD_FILECACHE_H
+
+#include <linux/fsnotify_backend.h>
+
+struct nfsd_file_mark {
+	struct fsnotify_mark	nfm_mark;
+	atomic_t		nfm_ref;
+};
+
+/*
+ * A representation of a file that has been opened by knfsd. These are hashed
+ * by i_ino and compared by inode pointer value. Note that this object doesn't
+ * hold a reference to the inode by itself, so the nf_inode pointer should
+ * never be dereferenced, only used for comparison.
+ */
+struct nfsd_file {
+	struct hlist_node	nf_node;
+	struct list_head	nf_lru;
+	struct rcu_head		nf_rcu;
+	struct file		*nf_file;
+#define NFSD_FILE_HASHED	(0)
+#define NFSD_FILE_PENDING	(1)
+#define NFSD_FILE_BREAK_READ	(2)
+#define NFSD_FILE_BREAK_WRITE	(3)
+#define NFSD_FILE_REFERENCED	(4)
+	unsigned long		nf_flags;
+	struct inode		*nf_inode;
+	unsigned int		nf_hashval;
+	atomic_t		nf_ref;
+	unsigned char		nf_may;
+	struct nfsd_file_mark	*nf_mark;
+};
+
+int nfsd_file_cache_init(void);
+void nfsd_file_cache_purge(void);
+void nfsd_file_cache_shutdown(void);
+void nfsd_file_put(struct nfsd_file *nf);
+struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
+void nfsd_file_close_inode_sync(struct inode *inode);
+__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+		  unsigned int may_flags, struct nfsd_file **nfp);
+#endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index ad4e2377dd63..d816bb3faa6e 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -22,6 +22,7 @@
 #include "cache.h"
 #include "vfs.h"
 #include "netns.h"
+#include "filecache.h"
 
 #define NFSDDBG_FACILITY	NFSDDBG_SVC
 
@@ -224,11 +225,17 @@ static int nfsd_startup_generic(int nrservs)
 	if (ret)
 		goto dec_users;
 
-	ret = nfs4_state_start();
+	ret = nfsd_file_cache_init();
 	if (ret)
 		goto out_racache;
+
+	ret = nfs4_state_start();
+	if (ret)
+		goto out_file_cache;
 	return 0;
 
+out_file_cache:
+	nfsd_file_cache_shutdown();
 out_racache:
 	nfsd_racache_shutdown();
 dec_users:
@@ -242,6 +249,7 @@ static void nfsd_shutdown_generic(void)
 		return;
 
 	nfs4_state_shutdown();
+	nfsd_file_cache_shutdown();
 	nfsd_racache_shutdown();
 }
 
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 3287041905da..9174d126ff6e 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -51,6 +51,8 @@ DEFINE_NFSD_IO_EVENT(write_io_done);
 DEFINE_NFSD_IO_EVENT(write_done);
 
 #include "state.h"
+#include "filecache.h"
+#include "vfs.h"
 
 DECLARE_EVENT_CLASS(nfsd_stateid_class,
 	TP_PROTO(stateid_t *stp),
@@ -89,6 +91,142 @@ DEFINE_STATEID_EVENT(layout_recall_done);
 DEFINE_STATEID_EVENT(layout_recall_fail);
 DEFINE_STATEID_EVENT(layout_recall_release);
 
+#define show_nf_flags(val)						\
+	__print_flags(val, "|",						\
+		{ 1 << NFSD_FILE_HASHED,	"HASHED" },		\
+		{ 1 << NFSD_FILE_PENDING,	"PENDING" },		\
+		{ 1 << NFSD_FILE_BREAK_READ,	"BREAK_READ" },		\
+		{ 1 << NFSD_FILE_BREAK_WRITE,	"BREAK_WRITE" },	\
+		{ 1 << NFSD_FILE_REFERENCED,	"REFERENCED"})
+
+/* FIXME: This should probably be fleshed out in the future. */
+#define show_nf_may(val)						\
+	__print_flags(val, "|",						\
+		{ NFSD_MAY_READ,		"READ" },		\
+		{ NFSD_MAY_WRITE,		"WRITE" },		\
+		{ NFSD_MAY_NOT_BREAK_LEASE,	"NOT_BREAK_LEASE" })
+
+DECLARE_EVENT_CLASS(nfsd_file_class,
+	TP_PROTO(struct nfsd_file *nf),
+	TP_ARGS(nf),
+	TP_STRUCT__entry(
+		__field(unsigned int, nf_hashval)
+		__field(void *, nf_inode)
+		__field(int, nf_ref)
+		__field(unsigned long, nf_flags)
+		__field(unsigned char, nf_may)
+		__field(struct file *, nf_file)
+	),
+	TP_fast_assign(
+		__entry->nf_hashval = nf->nf_hashval;
+		__entry->nf_inode = nf->nf_inode;
+		__entry->nf_ref = atomic_read(&nf->nf_ref);
+		__entry->nf_flags = nf->nf_flags;
+		__entry->nf_may = nf->nf_may;
+		__entry->nf_file = nf->nf_file;
+	),
+	TP_printk("hash=0x%x inode=0x%p ref=%d flags=%s may=%s file=%p",
+		__entry->nf_hashval,
+		__entry->nf_inode,
+		__entry->nf_ref,
+		show_nf_flags(__entry->nf_flags),
+		show_nf_may(__entry->nf_may),
+		__entry->nf_file)
+)
+
+#define DEFINE_NFSD_FILE_EVENT(name) \
+DEFINE_EVENT(nfsd_file_class, name, \
+	TP_PROTO(struct nfsd_file *nf), \
+	TP_ARGS(nf))
+
+DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked);
+
+TRACE_EVENT(nfsd_file_acquire,
+	TP_PROTO(struct svc_rqst *rqstp, unsigned int hash,
+		 struct inode *inode, unsigned int may_flags,
+		 struct nfsd_file *nf, __be32 status),
+
+	TP_ARGS(rqstp, hash, inode, may_flags, nf, status),
+
+	TP_STRUCT__entry(
+		__field(__be32, xid)
+		__field(unsigned int, hash)
+		__field(void *, inode)
+		__field(unsigned int, may_flags)
+		__field(int, nf_ref)
+		__field(unsigned long, nf_flags)
+		__field(unsigned char, nf_may)
+		__field(struct file *, nf_file)
+		__field(__be32, status)
+	),
+
+	TP_fast_assign(
+		__entry->xid = rqstp->rq_xid;
+		__entry->hash = hash;
+		__entry->inode = inode;
+		__entry->may_flags = may_flags;
+		__entry->nf_ref = nf ? atomic_read(&nf->nf_ref) : 0;
+		__entry->nf_flags = nf ? nf->nf_flags : 0;
+		__entry->nf_may = nf ? nf->nf_may : 0;
+		__entry->nf_file = nf ? nf->nf_file : NULL;
+		__entry->status = status;
+	),
+
+	TP_printk("xid=0x%x hash=0x%x inode=0x%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=0x%p status=%u",
+			be32_to_cpu(__entry->xid), __entry->hash, __entry->inode,
+			show_nf_may(__entry->may_flags), __entry->nf_ref,
+			show_nf_flags(__entry->nf_flags),
+			show_nf_may(__entry->nf_may), __entry->nf_file,
+			be32_to_cpu(__entry->status))
+);
+
+DECLARE_EVENT_CLASS(nfsd_file_search_class,
+	TP_PROTO(struct inode *inode, unsigned int hash, int found),
+	TP_ARGS(inode, hash, found),
+	TP_STRUCT__entry(
+		__field(struct inode *, inode)
+		__field(unsigned int, hash)
+		__field(int, found)
+	),
+	TP_fast_assign(
+		__entry->inode = inode;
+		__entry->hash = hash;
+		__entry->found = found;
+	),
+	TP_printk("hash=0x%x inode=0x%p found=%d", __entry->hash,
+			__entry->inode, __entry->found)
+);
+
+#define DEFINE_NFSD_FILE_SEARCH_EVENT(name)				\
+DEFINE_EVENT(nfsd_file_search_class, name,				\
+	TP_PROTO(struct inode *inode, unsigned int hash, int found),	\
+	TP_ARGS(inode, hash, found))
+
+DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync);
+DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode);
+
+TRACE_EVENT(nfsd_file_fsnotify_handle_event,
+	TP_PROTO(struct inode *inode, u32 mask),
+	TP_ARGS(inode, mask),
+	TP_STRUCT__entry(
+		__field(struct inode *, inode)
+		__field(unsigned int, nlink)
+		__field(umode_t, mode)
+		__field(u32, mask)
+	),
+	TP_fast_assign(
+		__entry->inode = inode;
+		__entry->nlink = inode->i_nlink;
+		__entry->mode = inode->i_mode;
+		__entry->mask = mask;
+	),
+	TP_printk("inode=0x%p nlink=%u mode=0%ho mask=0x%x", __entry->inode,
+			__entry->nlink, __entry->mode, __entry->mask)
+);
 #endif /* _NFSD_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 3257c59dc860..bd8b2433a2cb 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -618,7 +618,8 @@ nfsd_access(struct svc_rqst *rqstp, struct svc_fh *fhp, u32 *access, u32 *suppor
 }
 #endif /* CONFIG_NFSD_V3 */
 
-static int nfsd_open_break_lease(struct inode *inode, int access)
+int
+nfsd_open_break_lease(struct inode *inode, int access)
 {
 	unsigned int mode;
 
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fcfc48cbe136..a877be59d5dd 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -69,6 +69,7 @@ __be32		do_nfsd_create(struct svc_rqst *, struct svc_fh *,
 __be32		nfsd_commit(struct svc_rqst *, struct svc_fh *,
 				loff_t, unsigned long);
 #endif /* CONFIG_NFSD_V3 */
+int		nfsd_open_break_lease(struct inode *, int);
 __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
 				int, struct file **);
 struct raparms;
-- 
2.4.3



* [PATCH v1 13/38] nfsd: keep some rudimentary stats on nfsd_file cache
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (11 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 12/38] nfsd: add a new struct file caching facility to nfsd Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 14/38] nfsd: allow filecache open to skip fh_verify check Jeff Layton
                   ` (24 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Keep a per-chain count and max length, protected by the per-chain
spinlock, along with a per-cpu count of cache hits. When the
file_cache_stats file is read, we walk the array of buckets and fetch
the count from each.
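
As a rough userspace illustration of the accounting described here (not
the kernel code itself), the sketch below keeps a count and a high-water
mark per bucket and sums them on demand. The names HASH_SIZE,
bucket_account_add, bucket_account_del and collect_stats are
hypothetical, and the locking is left out entirely; in the patch the
counters are only touched with the per-chain lock held.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIZE 8	/* hypothetical; stands in for NFSD_FILE_HASH_SIZE */

/* Toy stand-in for struct nfsd_fcache_bucket, without the list or lock. */
struct bucket {
	unsigned int count;	/* entries currently on this chain */
	unsigned int maxcount;	/* longest this chain has ever been */
};

struct cache_stats {
	unsigned int total;	/* sum of all chain lengths */
	unsigned int longest;	/* longest chain at the time of the walk */
};

/* Account for an entry being hashed (caller would hold the chain lock). */
static void bucket_account_add(struct bucket *b)
{
	if (++b->count > b->maxcount)
		b->maxcount = b->count;
}

/* Account for an entry being unhashed (caller would hold the chain lock). */
static void bucket_account_del(struct bucket *b)
{
	assert(b->count > 0);
	--b->count;
}

/* Walk every bucket and fetch its count, as done when the file is read. */
static struct cache_stats collect_stats(const struct bucket *tbl, size_t n)
{
	struct cache_stats st = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		st.total += tbl[i].count;
		if (tbl[i].count > st.longest)
			st.longest = tbl[i].count;
	}
	return st;
}
```

Note that the walk reports the longest chain at read time, which can be
smaller than the recorded high-water mark for a bucket.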

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/filecache.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/nfsd/filecache.h |  1 +
 fs/nfsd/nfsctl.c    | 10 ++++++++++
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index f6c4480c5d78..f6adccc6f740 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -11,6 +11,7 @@
 #include <linux/sched.h>
 #include <linux/list_lru.h>
 #include <linux/fsnotify_backend.h>
+#include <linux/seq_file.h>
 
 #include "vfs.h"
 #include "nfsd.h"
@@ -30,8 +31,12 @@
 struct nfsd_fcache_bucket {
 	struct hlist_head	nfb_head;
 	spinlock_t		nfb_lock;
+	unsigned int		nfb_count;
+	unsigned int		nfb_maxcount;
 };
 
+static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
+
 static struct kmem_cache		*nfsd_file_slab;
 static struct kmem_cache		*nfsd_file_mark_slab;
 static struct nfsd_fcache_bucket	*nfsd_file_hashtbl;
@@ -172,6 +177,7 @@ nfsd_file_unhash(struct nfsd_file *nf)
 
 	trace_nfsd_file_unhash(nf);
 	if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+		--nfsd_file_hashtbl[nf->nf_hashval].nfb_count;
 		clear_bit(NFSD_FILE_HASHED, &nf->nf_flags);
 		hlist_del_rcu(&nf->nf_node);
 		list_lru_del(&nfsd_file_lru, &nf->nf_lru);
@@ -562,8 +568,10 @@ retry:
 	rcu_read_lock();
 	nf = nfsd_file_find_locked(inode, may_flags, hashval);
 	rcu_read_unlock();
-	if (nf)
+	if (nf) {
+		this_cpu_inc(nfsd_file_cache_hits);
 		goto wait_for_construction;
+	}
 
 	if (!new) {
 		new = nfsd_file_alloc(inode, may_flags, hashval);
@@ -584,11 +592,15 @@ retry:
 		list_lru_add(&nfsd_file_lru, &new->nf_lru);
 		hlist_add_head_rcu(&new->nf_node,
 				&nfsd_file_hashtbl[hashval].nfb_head);
+		++nfsd_file_hashtbl[hashval].nfb_count;
+		nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
+				nfsd_file_hashtbl[hashval].nfb_count);
 		spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
 		nf = new;
 		new = NULL;
 		goto open_file;
 	}
+	this_cpu_inc(nfsd_file_cache_hits);
 	spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
 
 wait_for_construction:
@@ -666,3 +678,41 @@ open_file:
 	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
 	goto out;
 }
+
+/*
+ * Note that fields may be added, removed or reordered in the future. Programs
+ * scraping this file for info should test the labels to ensure they're
+ * getting the correct field.
+ */
+static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
+{
+	unsigned int i, count = 0, longest = 0;
+	unsigned long hits = 0;
+
+	/*
+	 * No need for spinlocks here since we're not terribly interested in
+	 * accuracy. We do take the nfsd_mutex simply to ensure that we
+	 * don't end up racing with server shutdown
+	 */
+	mutex_lock(&nfsd_mutex);
+	if (nfsd_file_hashtbl) {
+		for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+			count += nfsd_file_hashtbl[i].nfb_count;
+			longest = max(longest, nfsd_file_hashtbl[i].nfb_count);
+		}
+	}
+	mutex_unlock(&nfsd_mutex);
+
+	for_each_possible_cpu(i)
+		hits += per_cpu(nfsd_file_cache_hits, i);
+
+	seq_printf(m, "total entries: %u\n", count);
+	seq_printf(m, "longest chain: %u\n", longest);
+	seq_printf(m, "cache hits:    %lu\n", hits);
+	return 0;
+}
+
+int nfsd_file_cache_stats_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, nfsd_file_cache_stats_show, NULL);
+}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index a045c7fe3655..756aea5431da 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -40,4 +40,5 @@ struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
 __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  unsigned int may_flags, struct nfsd_file **nfp);
+int	nfsd_file_cache_stats_open(struct inode *, struct file *);
 #endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 9690cb4dd588..eff44a277f70 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -22,6 +22,7 @@
 #include "state.h"
 #include "netns.h"
 #include "pnfs.h"
+#include "filecache.h"
 
 /*
  *	We have a single directory with several nodes in it.
@@ -36,6 +37,7 @@ enum {
 	NFSD_Threads,
 	NFSD_Pool_Threads,
 	NFSD_Pool_Stats,
+	NFSD_File_Cache_Stats,
 	NFSD_Reply_Cache_Stats,
 	NFSD_Versions,
 	NFSD_Ports,
@@ -220,6 +222,13 @@ static const struct file_operations pool_stats_operations = {
 	.owner		= THIS_MODULE,
 };
 
+static struct file_operations file_cache_stats_operations = {
+	.open		= nfsd_file_cache_stats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static struct file_operations reply_cache_stats_operations = {
 	.open		= nfsd_reply_cache_stats_open,
 	.read		= seq_read,
@@ -1138,6 +1147,7 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
 		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Stats] = {"pool_stats", &pool_stats_operations, S_IRUGO},
+		[NFSD_File_Cache_Stats] = {"file_cache_stats", &file_cache_stats_operations, S_IRUGO},
 		[NFSD_Reply_Cache_Stats] = {"reply_cache_stats", &reply_cache_stats_operations, S_IRUGO},
 		[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
-- 
2.4.3



* [PATCH v1 14/38] nfsd: allow filecache open to skip fh_verify check
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (12 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 13/38] nfsd: keep some rudimentary stats on nfsd_file cache Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 15/38] nfsd: hook up nfsd_write to the new nfsd_file cache Jeff Layton
                   ` (23 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Currently, we call fh_verify twice on the filehandle: once when we call
into nfsd_file_acquire, and then again from nfsd_open. The second call
is completely superfluous, and fh_verify can do some things that
require a fair bit of work (checking permissions, for instance).

Create a new nfsd_open_verified function that will do an nfsd_open on a
filehandle that has already been verified. Call that from the filecache
code.
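
The shape of the split can be sketched in userspace C as below. This is
only an illustration under assumed names: struct handle, verify,
do_open, open_checked and open_verified are hypothetical stand-ins for
the filehandle, fh_verify, __nfsd_open, nfsd_open and
nfsd_open_verified respectively, and the counter exists purely to show
that the expensive check runs once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handle; "verified" plays the role of fh_dentry being set. */
struct handle {
	bool verified;
	int verify_calls;	/* how many times the expensive check ran */
};

/* Stand-in for fh_verify(): expensive, and marks the handle verified. */
static int verify(struct handle *h)
{
	h->verify_calls++;
	h->verified = true;
	return 0;
}

/* Common tail, like __nfsd_open(): needs an already-verified handle. */
static int do_open(struct handle *h)
{
	assert(h->verified);	/* mirrors BUG_ON(!fhp->fh_dentry) */
	return 0;
}

/* Like nfsd_open(): verify first, then open. */
static int open_checked(struct handle *h)
{
	int err = verify(h);

	return err ? err : do_open(h);
}

/* Like nfsd_open_verified(): the caller has already verified. */
static int open_verified(struct handle *h)
{
	return do_open(h);
}
```

A caller that has already run verify() uses open_verified() and pays
for the check only once; everyone else keeps calling open_checked().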

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/filecache.c |  3 ++-
 fs/nfsd/vfs.c       | 63 +++++++++++++++++++++++++++++++++++------------------
 fs/nfsd/vfs.h       |  2 ++
 3 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index f6adccc6f740..79daf2677176 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -672,7 +672,8 @@ open_file:
 	}
 	/* FIXME: should we abort opening if the link count goes to 0? */
 	if (status == nfs_ok)
-		status = nfsd_open(rqstp, fhp, S_IFREG, may_flags, &nf->nf_file);
+		status = nfsd_open_verified(rqstp, fhp, S_IFREG, may_flags,
+						&nf->nf_file);
 	clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
 	smp_mb__after_atomic();
 	wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index bd8b2433a2cb..67cce7554bb3 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -635,9 +635,9 @@ nfsd_open_break_lease(struct inode *inode, int access)
  * and additional flags.
  * N.B. After this call fhp needs an fh_put
  */
-__be32
-nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
-			int may_flags, struct file **filp)
+static __be32
+__nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+	    int may_flags, struct file **filp)
 {
 	struct path	path;
 	struct inode	*inode;
@@ -646,24 +646,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 	__be32		err;
 	int		host_err = 0;
 
-	validate_process_creds();
-
-	/*
-	 * If we get here, then the client has already done an "open",
-	 * and (hopefully) checked permission - so allow OWNER_OVERRIDE
-	 * in case a chmod has now revoked permission.
-	 *
-	 * Arguably we should also allow the owner override for
-	 * directories, but we never have and it doesn't seem to have
-	 * caused anyone a problem.  If we were to change this, note
-	 * also that our filldir callbacks would need a variant of
-	 * lookup_one_len that doesn't check permissions.
-	 */
-	if (type == S_IFREG)
-		may_flags |= NFSD_MAY_OWNER_OVERRIDE;
-	err = fh_verify(rqstp, fhp, type, may_flags);
-	if (err)
-		goto out;
+	BUG_ON(!fhp->fh_dentry);
 
 	path.mnt = fhp->fh_export->ex_path.mnt;
 	path.dentry = fhp->fh_dentry;
@@ -718,6 +701,44 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 out_nfserr:
 	err = nfserrno(host_err);
 out:
+	return err;
+}
+
+__be32
+nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+			int may_flags, struct file **filp)
+{
+	__be32 err;
+
+	validate_process_creds();
+	/*
+	 * If we get here, then the client has already done an "open",
+	 * and (hopefully) checked permission - so allow OWNER_OVERRIDE
+	 * in case a chmod has now revoked permission.
+	 *
+	 * Arguably we should also allow the owner override for
+	 * directories, but we never have and it doesn't seem to have
+	 * caused anyone a problem.  If we were to change this, note
+	 * also that our filldir callbacks would need a variant of
+	 * lookup_one_len that doesn't check permissions.
+	 */
+	if (type == S_IFREG)
+		may_flags |= NFSD_MAY_OWNER_OVERRIDE;
+	err = fh_verify(rqstp, fhp, type, may_flags);
+	if (!err)
+		err = __nfsd_open(rqstp, fhp, type, may_flags, filp);
+	validate_process_creds();
+	return err;
+}
+
+__be32
+nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+		   int may_flags, struct file **filp)
+{
+	__be32 err;
+
+	validate_process_creds();
+	err = __nfsd_open(rqstp, fhp, type, may_flags, filp);
 	validate_process_creds();
 	return err;
 }
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index a877be59d5dd..b3beb896b08d 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -72,6 +72,8 @@ __be32		nfsd_commit(struct svc_rqst *, struct svc_fh *,
 int		nfsd_open_break_lease(struct inode *, int);
 __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
 				int, struct file **);
+__be32		nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t,
+				int, struct file **);
 struct raparms;
 __be32		nfsd_splice_read(struct svc_rqst *,
 				struct file *, loff_t, unsigned long *);
-- 
2.4.3



* [PATCH v1 15/38] nfsd: hook up nfsd_write to the new nfsd_file cache
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (13 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 14/38] nfsd: allow filecache open to skip fh_verify check Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 16/38] nfsd: hook up nfsd_read to the " Jeff Layton
                   ` (22 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

All callers currently pass in NULL for "file", so the non-NULL branch
was already dead code. Eliminate that parameter and have nfsd_write use
the file cache instead of dealing directly with a filp.
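
The resulting control flow is an acquire/use/put pattern, roughly as in
the userspace sketch below. Everything here is a simplified assumption
for illustration: struct cached_file, cache_acquire, cache_put and
cached_write are hypothetical names, and the real nfsd_file_acquire
does a hash lookup and may open the file, none of which is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached-file object; only the refcount matters here. */
struct cached_file {
	int refcount;
	int bytes_written;	/* stands in for I/O on the open file */
};

/* Stand-in for nfsd_file_acquire(): 0 and a referenced entry on success. */
static int cache_acquire(struct cached_file *slot, struct cached_file **out)
{
	if (!slot)
		return -1;
	slot->refcount++;
	*out = slot;
	return 0;
}

/* Stand-in for nfsd_file_put(): drop the reference taken by acquire. */
static void cache_put(struct cached_file *cf)
{
	assert(cf->refcount > 0);
	cf->refcount--;
}

/* Shape of the new write path: acquire, do the I/O, put. */
static int cached_write(struct cached_file *slot, int len)
{
	struct cached_file *cf;
	int err = cache_acquire(slot, &cf);

	if (err == 0) {
		cf->bytes_written += len;
		cache_put(cf);
	}
	return err;
}
```

The reference is dropped on the same path that took it, so a failed
acquire needs no cleanup and the cache's own reference is untouched.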

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfs3proc.c |  2 +-
 fs/nfsd/nfsproc.c  |  2 +-
 fs/nfsd/vfs.c      | 34 ++++++++++------------------------
 fs/nfsd/vfs.h      |  2 +-
 4 files changed, 13 insertions(+), 27 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 7b755b7f785c..4e46ac511479 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -192,7 +192,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp, struct nfsd3_writeargs *argp,
 
 	fh_copy(&resp->fh, &argp->fh);
 	resp->committed = argp->stable;
-	nfserr = nfsd_write(rqstp, &resp->fh, NULL,
+	nfserr = nfsd_write(rqstp, &resp->fh,
 				   argp->offset,
 				   rqstp->rq_vec, argp->vlen,
 				   &cnt,
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 4cd78ef4c95c..9893095cbee1 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -213,7 +213,7 @@ nfsd_proc_write(struct svc_rqst *rqstp, struct nfsd_writeargs *argp,
 		SVCFH_fmt(&argp->fh),
 		argp->len, argp->offset);
 
-	nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), NULL,
+	nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh),
 				   argp->offset,
 				   rqstp->rq_vec, argp->vlen,
 			           &cnt,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 67cce7554bb3..0f31af897d4c 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -42,6 +42,7 @@
 
 #include "nfsd.h"
 #include "vfs.h"
+#include "filecache.h"
 #include "trace.h"
 
 #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
@@ -1032,36 +1033,21 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
  * N.B. After this call fhp needs an fh_put
  */
 __be32
-nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
-		loff_t offset, struct kvec *vec, int vlen, unsigned long *cnt,
-		int *stablep)
+nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
+	   struct kvec *vec, int vlen, unsigned long *cnt, int *stablep)
 {
-	__be32			err = 0;
+	__be32			err;
+	struct nfsd_file	*nf;
 
 	trace_write_start(rqstp, fhp, offset, vlen);
-
-	if (file) {
-		err = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
-				NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE);
-		if (err)
-			goto out;
+	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
+	if (err == nfs_ok) {
 		trace_write_opened(rqstp, fhp, offset, vlen);
-		err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt,
-				stablep);
+		err = nfsd_vfs_write(rqstp, fhp, nf->nf_file, offset, vec,
+					vlen, cnt, stablep);
 		trace_write_io_done(rqstp, fhp, offset, vlen);
-	} else {
-		err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
-		if (err)
-			goto out;
-
-		trace_write_opened(rqstp, fhp, offset, vlen);
-		if (cnt)
-			err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen,
-					     cnt, stablep);
-		trace_write_io_done(rqstp, fhp, offset, vlen);
-		fput(file);
+		nfsd_file_put(nf);
 	}
-out:
 	trace_write_done(rqstp, fhp, offset, vlen);
 	return err;
 }
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index b3beb896b08d..80692e06302d 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -81,7 +81,7 @@ __be32		nfsd_readv(struct file *, loff_t, struct kvec *, int,
 				unsigned long *);
 __be32 		nfsd_read(struct svc_rqst *, struct svc_fh *,
 				loff_t, struct kvec *, int, unsigned long *);
-__be32 		nfsd_write(struct svc_rqst *, struct svc_fh *,struct file *,
+__be32 		nfsd_write(struct svc_rqst *, struct svc_fh *,
 				loff_t, struct kvec *,int, unsigned long *, int *);
 __be32		nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				struct file *file, loff_t offset,
-- 
2.4.3



* [PATCH v1 16/38] nfsd: hook up nfsd_read to the nfsd_file cache
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (14 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 15/38] nfsd: hook up nfsd_write to the new nfsd_file cache Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 17/38] nfsd: hook nfsd_commit up " Jeff Layton
                   ` (21 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/vfs.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 0f31af897d4c..0873c1355bb1 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1003,27 +1003,19 @@ out_nfserr:
 __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
 {
-	struct file *file;
-	struct raparms	*ra;
-	__be32 err;
+	__be32			err;
+	struct nfsd_file	*nf;
 
 	trace_read_start(rqstp, fhp, offset, vlen);
-	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
-	if (err)
-		return err;
-
-	ra = nfsd_init_raparms(file);
-
-	trace_read_opened(rqstp, fhp, offset, vlen);
-	err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
-	trace_read_io_done(rqstp, fhp, offset, vlen);
-
-	if (ra)
-		nfsd_put_raparams(file, ra);
-	fput(file);
-
+	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
+	if (err == nfs_ok) {
+		trace_read_opened(rqstp, fhp, offset, vlen);
+		err = nfsd_vfs_read(rqstp, nf->nf_file, offset, vec, vlen,
+					count);
+		trace_read_io_done(rqstp, fhp, offset, vlen);
+		nfsd_file_put(nf);
+	}
 	trace_read_done(rqstp, fhp, offset, vlen);
-
 	return err;
 }
 
-- 
2.4.3



* [PATCH v1 17/38] nfsd: hook nfsd_commit up to the nfsd_file cache
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (15 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 16/38] nfsd: hook up nfsd_read to the " Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 18/38] nfsd: convert nfs4_file->fi_fds array to use nfsd_files Jeff Layton
                   ` (20 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Use cached filps if possible instead of opening a new one every time.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/vfs.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 0873c1355bb1..62e6194d58f8 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1058,9 +1058,9 @@ __be32
 nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
                loff_t offset, unsigned long count)
 {
-	struct file	*file;
-	loff_t		end = LLONG_MAX;
-	__be32		err = nfserr_inval;
+	struct nfsd_file	*nf;
+	loff_t			end = LLONG_MAX;
+	__be32			err = nfserr_inval;
 
 	if (offset < 0)
 		goto out;
@@ -1070,12 +1070,12 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			goto out;
 	}
 
-	err = nfsd_open(rqstp, fhp, S_IFREG,
-			NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &file);
+	err = nfsd_file_acquire(rqstp, fhp,
+			NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf);
 	if (err)
 		goto out;
 	if (EX_ISSYNC(fhp->fh_export)) {
-		int err2 = vfs_fsync_range(file, offset, end, 0);
+		int err2 = vfs_fsync_range(nf->nf_file, offset, end, 0);
 
 		if (err2 != -EINVAL)
 			err = nfserrno(err2);
@@ -1083,7 +1083,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			err = nfserr_notsupp;
 	}
 
-	fput(file);
+	nfsd_file_put(nf);
 out:
 	return err;
 }
-- 
2.4.3



* [PATCH v1 18/38] nfsd: convert nfs4_file->fi_fds array to use nfsd_files
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (16 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 17/38] nfsd: hook nfsd_commit up " Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 19/38] nfsd: have nfsd_test_lock use the nfsd_file cache Jeff Layton
                   ` (19 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfs4state.c | 23 ++++++++++++-----------
 fs/nfsd/state.h     |  2 +-
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6b800b5b8fed..e7adb6b60005 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -49,6 +49,7 @@
 
 #include "netns.h"
 #include "pnfs.h"
+#include "filecache.h"
 
 #define NFSDDBG_FACILITY                NFSDDBG_PROC
 
@@ -292,7 +293,7 @@ static struct file *
 __nfs4_get_fd(struct nfs4_file *f, int oflag)
 {
 	if (f->fi_fds[oflag])
-		return get_file(f->fi_fds[oflag]);
+		return get_file(f->fi_fds[oflag]->nf_file);
 	return NULL;
 }
 
@@ -449,17 +450,17 @@ static void __nfs4_file_put_access(struct nfs4_file *fp, int oflag)
 	might_lock(&fp->fi_lock);
 
 	if (atomic_dec_and_lock(&fp->fi_access[oflag], &fp->fi_lock)) {
-		struct file *f1 = NULL;
-		struct file *f2 = NULL;
+		struct nfsd_file *f1 = NULL;
+		struct nfsd_file *f2 = NULL;
 
 		swap(f1, fp->fi_fds[oflag]);
 		if (atomic_read(&fp->fi_access[1 - oflag]) == 0)
 			swap(f2, fp->fi_fds[O_RDWR]);
 		spin_unlock(&fp->fi_lock);
 		if (f1)
-			fput(f1);
+			nfsd_file_put(f1);
 		if (f2)
-			fput(f2);
+			nfsd_file_put(f2);
 	}
 }
 
@@ -3915,7 +3916,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 		struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp,
 		struct nfsd4_open *open)
 {
-	struct file *filp = NULL;
+	struct nfsd_file *nf = NULL;
 	__be32 status;
 	int oflag = nfs4_access_to_omode(open->op_share_access);
 	int access = nfs4_access_to_access(open->op_share_access);
@@ -3951,18 +3952,18 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
 
 	if (!fp->fi_fds[oflag]) {
 		spin_unlock(&fp->fi_lock);
-		status = nfsd_open(rqstp, cur_fh, S_IFREG, access, &filp);
+		status = nfsd_file_acquire(rqstp, cur_fh, access, &nf);
 		if (status)
 			goto out_put_access;
 		spin_lock(&fp->fi_lock);
 		if (!fp->fi_fds[oflag]) {
-			fp->fi_fds[oflag] = filp;
-			filp = NULL;
+			fp->fi_fds[oflag] = nf;
+			nf = NULL;
 		}
 	}
 	spin_unlock(&fp->fi_lock);
-	if (filp)
-		fput(filp);
+	if (nf)
+		nfsd_file_put(nf);
 
 	status = nfsd4_truncate(rqstp, cur_fh, open);
 	if (status)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 77fdf4de91ba..473faa436e07 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -491,7 +491,7 @@ struct nfs4_file {
 	};
 	struct list_head	fi_clnt_odstate;
 	/* One each for O_RDONLY, O_WRONLY, O_RDWR: */
-	struct file *		fi_fds[3];
+	struct nfsd_file	*fi_fds[3];
 	/*
 	 * Each open or lock stateid contributes 0-4 to the counts
 	 * below depending on which bits are set in st_access_bitmap:
-- 
2.4.3



* [PATCH v1 19/38] nfsd: have nfsd_test_lock use the nfsd_file cache
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (17 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 18/38] nfsd: convert nfs4_file->fi_fds array to use nfsd_files Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 20/38] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Jeff Layton
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfs4state.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index e7adb6b60005..5007095d0228 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5707,11 +5707,11 @@ out:
  */
 static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock)
 {
-	struct file *file;
-	__be32 err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
+	struct nfsd_file *nf;
+	__be32 err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
 	if (!err) {
-		err = nfserrno(vfs_test_lock(file, lock));
-		fput(file);
+		err = nfserrno(vfs_test_lock(nf->nf_file, lock));
+		nfsd_file_put(nf);
 	}
 	return err;
 }
-- 
2.4.3



* [PATCH v1 20/38] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (18 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 19/38] nfsd: have nfsd_test_lock use the nfsd_file cache Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 21/38] nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache Jeff Layton
                   ` (17 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Have them keep an nfsd_file reference instead of a struct file.
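
The conversion follows a common refcounting shape: a long-lived
structure keeps its own reference, lookups hand out an extra one, and
teardown detaches the pointer before dropping it. Below is a minimal
userspace sketch of that shape. The names are hypothetical (demo_ref,
demo_owner and friends are not kernel APIs), and the locking that the
real code does under fi_lock is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model only; these names are hypothetical. */
struct demo_ref {
	int refcount;
};

static int demo_live;	/* objects currently allocated */

struct demo_ref *demo_ref_alloc(void)
{
	struct demo_ref *r = malloc(sizeof(*r));

	if (r) {
		r->refcount = 1;	/* caller owns the first reference */
		demo_live++;
	}
	return r;
}

struct demo_ref *demo_ref_get(struct demo_ref *r)
{
	r->refcount++;
	return r;
}

void demo_ref_put(struct demo_ref *r)
{
	assert(r->refcount > 0);
	if (--r->refcount == 0) {
		demo_live--;
		free(r);
	}
}

/* Long-lived owner, standing in for a structure like nfs4_file. */
struct demo_owner {
	struct demo_ref *held;	/* the owner's own reference, or NULL */
};

/* Hand out a new reference so the caller's use can outlive the owner's. */
struct demo_ref *demo_owner_find(struct demo_owner *o)
{
	return o->held ? demo_ref_get(o->held) : NULL;
}

/* Detach first, then drop: the same order as the swap-then-put in the diff. */
void demo_owner_release(struct demo_owner *o)
{
	struct demo_ref *r = o->held;

	o->held = NULL;
	if (r)
		demo_ref_put(r);
}
```

In this model, a reference obtained from demo_owner_find() stays valid
after demo_owner_release(), and the object is freed only when the last
holder calls demo_ref_put().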

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfs4layouts.c |  12 ++---
 fs/nfsd/nfs4state.c   | 131 ++++++++++++++++++++++++++------------------------
 fs/nfsd/state.h       |   6 +--
 3 files changed, 76 insertions(+), 73 deletions(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 9ffef06b30d5..1ee1881cf8d0 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -144,8 +144,8 @@ nfsd4_free_layout_stateid(struct nfs4_stid *stid)
 	list_del_init(&ls->ls_perfile);
 	spin_unlock(&fp->fi_lock);
 
-	vfs_setlease(ls->ls_file, F_UNLCK, NULL, (void **)&ls);
-	fput(ls->ls_file);
+	vfs_setlease(ls->ls_file->nf_file, F_UNLCK, NULL, (void **)&ls);
+	nfsd_file_put(ls->ls_file);
 
 	if (ls->ls_recalled)
 		atomic_dec(&ls->ls_stid.sc_file->fi_lo_recalls);
@@ -169,7 +169,7 @@ nfsd4_layout_setlease(struct nfs4_layout_stateid *ls)
 	fl->fl_end = OFFSET_MAX;
 	fl->fl_owner = ls;
 	fl->fl_pid = current->tgid;
-	fl->fl_file = ls->ls_file;
+	fl->fl_file = ls->ls_file->nf_file;
 
 	status = vfs_setlease(fl->fl_file, fl->fl_type, &fl, NULL);
 	if (status) {
@@ -207,13 +207,13 @@ nfsd4_alloc_layout_stateid(struct nfsd4_compound_state *cstate,
 			NFSPROC4_CLNT_CB_LAYOUT);
 
 	if (parent->sc_type == NFS4_DELEG_STID)
-		ls->ls_file = get_file(fp->fi_deleg_file);
+		ls->ls_file = nfsd_file_get(fp->fi_deleg_file);
 	else
 		ls->ls_file = find_any_file(fp);
 	BUG_ON(!ls->ls_file);
 
 	if (nfsd4_layout_setlease(ls)) {
-		fput(ls->ls_file);
+		nfsd_file_put(ls->ls_file);
 		put_nfs4_file(fp);
 		kmem_cache_free(nfs4_layout_stateid_cache, ls);
 		return NULL;
@@ -598,7 +598,7 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)
 
 	argv[0] = "/sbin/nfsd-recall-failed";
 	argv[1] = addr_str;
-	argv[2] = ls->ls_file->f_path.mnt->mnt_sb->s_id;
+	argv[2] = ls->ls_file->nf_file->f_path.mnt->mnt_sb->s_id;
 	argv[3] = NULL;
 
 	error = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 5007095d0228..420bc56923fa 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -289,18 +289,18 @@ put_nfs4_file(struct nfs4_file *fi)
 	}
 }
 
-static struct file *
+static struct nfsd_file *
 __nfs4_get_fd(struct nfs4_file *f, int oflag)
 {
 	if (f->fi_fds[oflag])
-		return get_file(f->fi_fds[oflag]->nf_file);
+		return nfsd_file_get(f->fi_fds[oflag]);
 	return NULL;
 }
 
-static struct file *
+static struct nfsd_file *
 find_writeable_file_locked(struct nfs4_file *f)
 {
-	struct file *ret;
+	struct nfsd_file *ret;
 
 	lockdep_assert_held(&f->fi_lock);
 
@@ -310,10 +310,10 @@ find_writeable_file_locked(struct nfs4_file *f)
 	return ret;
 }
 
-static struct file *
+static struct nfsd_file *
 find_writeable_file(struct nfs4_file *f)
 {
-	struct file *ret;
+	struct nfsd_file *ret;
 
 	spin_lock(&f->fi_lock);
 	ret = find_writeable_file_locked(f);
@@ -322,9 +322,10 @@ find_writeable_file(struct nfs4_file *f)
 	return ret;
 }
 
-static struct file *find_readable_file_locked(struct nfs4_file *f)
+static struct nfsd_file *
+find_readable_file_locked(struct nfs4_file *f)
 {
-	struct file *ret;
+	struct nfsd_file *ret;
 
 	lockdep_assert_held(&f->fi_lock);
 
@@ -334,10 +335,10 @@ static struct file *find_readable_file_locked(struct nfs4_file *f)
 	return ret;
 }
 
-static struct file *
+static struct nfsd_file *
 find_readable_file(struct nfs4_file *f)
 {
-	struct file *ret;
+	struct nfsd_file *ret;
 
 	spin_lock(&f->fi_lock);
 	ret = find_readable_file_locked(f);
@@ -346,10 +347,10 @@ find_readable_file(struct nfs4_file *f)
 	return ret;
 }
 
-struct file *
+struct nfsd_file *
 find_any_file(struct nfs4_file *f)
 {
-	struct file *ret;
+	struct nfsd_file *ret;
 
 	spin_lock(&f->fi_lock);
 	ret = __nfs4_get_fd(f, O_RDWR);
@@ -761,16 +762,16 @@ nfs4_inc_and_copy_stateid(stateid_t *dst, struct nfs4_stid *stid)
 
 static void nfs4_put_deleg_lease(struct nfs4_file *fp)
 {
-	struct file *filp = NULL;
+	struct nfsd_file *nf = NULL;
 
 	spin_lock(&fp->fi_lock);
 	if (fp->fi_deleg_file && --fp->fi_delegees == 0)
-		swap(filp, fp->fi_deleg_file);
+		swap(nf, fp->fi_deleg_file);
 	spin_unlock(&fp->fi_lock);
 
-	if (filp) {
-		vfs_setlease(filp, F_UNLCK, NULL, (void **)&fp);
-		fput(filp);
+	if (nf) {
+		vfs_setlease(nf->nf_file, F_UNLCK, NULL, (void **)&fp);
+		nfsd_file_put(nf);
 	}
 }
 
@@ -1114,11 +1115,14 @@ static void nfs4_free_lock_stateid(struct nfs4_stid *stid)
 {
 	struct nfs4_ol_stateid *stp = openlockstateid(stid);
 	struct nfs4_lockowner *lo = lockowner(stp->st_stateowner);
-	struct file *file;
+	struct nfsd_file *nf;
 
-	file = find_any_file(stp->st_stid.sc_file);
-	if (file)
-		filp_close(file, (fl_owner_t)lo);
+	nf = find_any_file(stp->st_stid.sc_file);
+	if (nf) {
+		get_file(nf->nf_file);
+		filp_close(nf->nf_file, (fl_owner_t)lo);
+		nfsd_file_put(nf);
+	}
 	nfs4_free_ol_stateid(stid);
 }
 
@@ -4050,21 +4054,21 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
 {
 	struct nfs4_file *fp = dp->dl_stid.sc_file;
 	struct file_lock *fl;
-	struct file *filp;
+	struct nfsd_file *nf;
 	int status = 0;
 
 	fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
 	if (!fl)
 		return -ENOMEM;
-	filp = find_readable_file(fp);
-	if (!filp) {
+	nf = find_readable_file(fp);
+	if (!nf) {
 		/* We should always have a readable file here */
 		WARN_ON_ONCE(1);
 		locks_free_lock(fl);
 		return -EBADF;
 	}
-	fl->fl_file = filp;
-	status = vfs_setlease(filp, fl->fl_type, &fl, NULL);
+	fl->fl_file = nf->nf_file;
+	status = vfs_setlease(nf->nf_file, fl->fl_type, &fl, NULL);
 	if (fl)
 		locks_free_lock(fl);
 	if (status)
@@ -4080,7 +4084,7 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
 		status = hash_delegation_locked(dp, fp);
 		goto out_unlock;
 	}
-	fp->fi_deleg_file = filp;
+	fp->fi_deleg_file = nf;
 	fp->fi_delegees = 0;
 	status = hash_delegation_locked(dp, fp);
 	spin_unlock(&fp->fi_lock);
@@ -4095,7 +4099,7 @@ out_unlock:
 	spin_unlock(&fp->fi_lock);
 	spin_unlock(&state_lock);
 out_fput:
-	fput(filp);
+	nfsd_file_put(nf);
 	return status;
 }
 
@@ -4729,7 +4733,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 	return nfs_ok;
 }
 
-static struct file *
+static struct nfsd_file *
 nfs4_find_file(struct nfs4_stid *s, int flags)
 {
 	if (!s)
@@ -4739,7 +4743,7 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
 	case NFS4_DELEG_STID:
 		if (WARN_ON_ONCE(!s->sc_file->fi_deleg_file))
 			return NULL;
-		return get_file(s->sc_file->fi_deleg_file);
+		return nfsd_file_get(s->sc_file->fi_deleg_file);
 	case NFS4_OPEN_STID:
 	case NFS4_LOCK_STID:
 		if (flags & RD_STATE)
@@ -4768,21 +4772,17 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
 		struct file **filpp, bool *tmp_file, int flags)
 {
 	int acc = (flags & RD_STATE) ? NFSD_MAY_READ : NFSD_MAY_WRITE;
-	struct file *file;
+	struct nfsd_file *nf;
 	__be32 status;
 
-	file = nfs4_find_file(s, flags);
-	if (file) {
+	nf = nfs4_find_file(s, flags);
+	if (nf) {
 		status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
 				acc | NFSD_MAY_OWNER_OVERRIDE);
-		if (status) {
-			fput(file);
-			return status;
-		}
-
-		*filpp = file;
+		if (status)
+			goto out;
 	} else {
-		status = nfsd_open(rqstp, fhp, S_IFREG, acc, filpp);
+		status = nfsd_file_acquire(rqstp, fhp, acc, &nf);
 		if (status)
 			return status;
 
@@ -4790,7 +4790,10 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
 			*tmp_file = true;
 	}
 
-	return 0;
+	*filpp = get_file(nf->nf_file);
+out:
+	nfsd_file_put(nf);
+	return status;
 }
 
 /*
@@ -5524,7 +5527,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	struct nfs4_ol_stateid *lock_stp = NULL;
 	struct nfs4_ol_stateid *open_stp = NULL;
 	struct nfs4_file *fp;
-	struct file *filp = NULL;
+	struct nfsd_file *nf = NULL;
 	struct file_lock *file_lock = NULL;
 	struct file_lock *conflock = NULL;
 	__be32 status = 0;
@@ -5609,8 +5612,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		case NFS4_READ_LT:
 		case NFS4_READW_LT:
 			spin_lock(&fp->fi_lock);
-			filp = find_readable_file_locked(fp);
-			if (filp)
+			nf = find_readable_file_locked(fp);
+			if (nf)
 				get_lock_access(lock_stp, NFS4_SHARE_ACCESS_READ);
 			spin_unlock(&fp->fi_lock);
 			file_lock->fl_type = F_RDLCK;
@@ -5618,8 +5621,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		case NFS4_WRITE_LT:
 		case NFS4_WRITEW_LT:
 			spin_lock(&fp->fi_lock);
-			filp = find_writeable_file_locked(fp);
-			if (filp)
+			nf = find_writeable_file_locked(fp);
+			if (nf)
 				get_lock_access(lock_stp, NFS4_SHARE_ACCESS_WRITE);
 			spin_unlock(&fp->fi_lock);
 			file_lock->fl_type = F_WRLCK;
@@ -5628,14 +5631,14 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 			status = nfserr_inval;
 		goto out;
 	}
-	if (!filp) {
+	if (!nf) {
 		status = nfserr_openmode;
 		goto out;
 	}
 
 	file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(&lock_sop->lo_owner));
 	file_lock->fl_pid = current->tgid;
-	file_lock->fl_file = filp;
+	file_lock->fl_file = nf->nf_file;
 	file_lock->fl_flags = FL_POSIX;
 	file_lock->fl_lmops = &nfsd_posix_mng_ops;
 	file_lock->fl_start = lock->lk_offset;
@@ -5649,7 +5652,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		goto out;
 	}
 
-	err = vfs_lock_file(filp, F_SETLK, file_lock, conflock);
+	err = vfs_lock_file(nf->nf_file, F_SETLK, file_lock, conflock);
 	switch (-err) {
 	case 0: /* success! */
 		nfs4_inc_and_copy_stateid(&lock->lk_resp_stateid, &lock_stp->st_stid);
@@ -5669,8 +5672,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		break;
 	}
 out:
-	if (filp)
-		fput(filp);
+	if (nf)
+		nfsd_file_put(nf);
 	if (lock_stp) {
 		/* Bump seqid manually if the 4.0 replay owner is openowner */
 		if (cstate->replay_owner &&
@@ -5797,7 +5800,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	    struct nfsd4_locku *locku)
 {
 	struct nfs4_ol_stateid *stp;
-	struct file *filp = NULL;
+	struct nfsd_file *nf;
 	struct file_lock *file_lock = NULL;
 	__be32 status;
 	int err;
@@ -5815,8 +5818,8 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 					&stp, nn);
 	if (status)
 		goto out;
-	filp = find_any_file(stp->st_stid.sc_file);
-	if (!filp) {
+	nf = find_any_file(stp->st_stid.sc_file);
+	if (!nf) {
 		status = nfserr_lock_range;
 		goto put_stateid;
 	}
@@ -5824,13 +5827,13 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (!file_lock) {
 		dprintk("NFSD: %s: unable to allocate lock!\n", __func__);
 		status = nfserr_jukebox;
-		goto fput;
+		goto put_file;
 	}
 
 	file_lock->fl_type = F_UNLCK;
 	file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(stp->st_stateowner));
 	file_lock->fl_pid = current->tgid;
-	file_lock->fl_file = filp;
+	file_lock->fl_file = nf->nf_file;
 	file_lock->fl_flags = FL_POSIX;
 	file_lock->fl_lmops = &nfsd_posix_mng_ops;
 	file_lock->fl_start = locku->lu_offset;
@@ -5839,14 +5842,14 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 						locku->lu_length);
 	nfs4_transform_lock_offset(file_lock);
 
-	err = vfs_lock_file(filp, F_SETLK, file_lock, NULL);
+	err = vfs_lock_file(nf->nf_file, F_SETLK, file_lock, NULL);
 	if (err) {
 		dprintk("NFSD: nfs4_locku: vfs_lock_file failed!\n");
 		goto out_nfserr;
 	}
 	nfs4_inc_and_copy_stateid(&locku->lu_stateid, &stp->st_stid);
-fput:
-	fput(filp);
+put_file:
+	nfsd_file_put(nf);
 put_stateid:
 	up_write(&stp->st_rwsem);
 	nfs4_put_stid(&stp->st_stid);
@@ -5858,7 +5861,7 @@ out:
 
 out_nfserr:
 	status = nfserrno(err);
-	goto fput;
+	goto put_file;
 }
 
 /*
@@ -5871,17 +5874,17 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
 {
 	struct file_lock *fl;
 	int status = false;
-	struct file *filp = find_any_file(fp);
+	struct nfsd_file *nf = find_any_file(fp);
 	struct inode *inode;
 	struct file_lock_context *flctx;
 
-	if (!filp) {
+	if (!nf) {
 		/* Any valid lock stateid should have some sort of access */
 		WARN_ON_ONCE(1);
 		return status;
 	}
 
-	inode = file_inode(filp);
+	inode = file_inode(nf->nf_file);
 	flctx = inode->i_flctx;
 
 	if (flctx && !list_empty_careful(&flctx->flc_posix)) {
@@ -5894,7 +5897,7 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
 		}
 		spin_unlock(&flctx->flc_lock);
 	}
-	fput(filp);
+	nfsd_file_put(nf);
 	return status;
 }
 
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 473faa436e07..ee23de10663c 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -501,7 +501,7 @@ struct nfs4_file {
 	 */
 	atomic_t		fi_access[2];
 	u32			fi_share_deny;
-	struct file		*fi_deleg_file;
+	struct nfsd_file	*fi_deleg_file;
 	int			fi_delegees;
 	struct knfsd_fh		fi_fhandle;
 	bool			fi_had_conflict;
@@ -550,7 +550,7 @@ struct nfs4_layout_stateid {
 	spinlock_t			ls_lock;
 	struct list_head		ls_layouts;
 	u32				ls_layout_type;
-	struct file			*ls_file;
+	struct nfsd_file		*ls_file;
 	struct nfsd4_callback		ls_recall;
 	stateid_t			ls_recall_sid;
 	bool				ls_recalled;
@@ -615,7 +615,7 @@ static inline void get_nfs4_file(struct nfs4_file *fi)
 {
 	atomic_inc(&fi->fi_ref);
 }
-struct file *find_any_file(struct nfs4_file *f);
+struct nfsd_file *find_any_file(struct nfs4_file *f);
 
 /* grace period management */
 void nfsd4_end_grace(struct nfsd_net *nn);
-- 
2.4.3



* [PATCH v1 21/38] nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (19 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 20/38] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 22/38] nfsd: rip out the raparms cache Jeff Layton
                   ` (16 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Have nfs4_preprocess_stateid_op pass back an nfsd_file instead of a filp.
Since we now presume that the struct file will be persistent in most
cases, we can stop fiddling with the raparms in the read code. This
also means that we don't really care about the rd_tmp_file field
anymore.
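
To illustrate the resulting calling convention, the sketch below models
a helper that passes back a referenced object through an out parameter,
clears it up front, and treats a NULL out parameter as "check only" (as
the nfsd4_setattr caller does in this patch). It is a userspace model
with hypothetical names; demo_preprocess and demo_nf are not the kernel
functions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; these names are hypothetical. */
struct demo_nf {
	int refcount;
};

static struct demo_nf demo_obj = { .refcount = 1 };	/* the cache's reference */

/*
 * Validate a request and, if the caller asked for it, pass back a
 * referenced object. Returns 0 on success, nonzero on failure.
 * *out is always NULL on failure, so cleanup code can test it safely.
 */
int demo_preprocess(int valid, struct demo_nf **out)
{
	if (out)
		*out = NULL;
	if (!valid)
		return 1;
	if (out) {
		demo_obj.refcount++;
		*out = &demo_obj;
	}
	return 0;
}

void demo_nf_put(struct demo_nf *nf)
{
	assert(nf->refcount > 1);
	nf->refcount--;
}

/* Caller shape: a single exit path drops the reference if one was taken. */
int demo_read(int valid)
{
	struct demo_nf *nf = NULL;
	int status = demo_preprocess(valid, &nf);

	/* ... on success, I/O against nf would happen here ... */
	if (nf)
		demo_nf_put(nf);
	return status;
}
```

Because the out parameter is cleared before any check can fail, the
caller can test it unconditionally during cleanup, which is the shape
nfsd4_encode_read relies on with read->rd_nf.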

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfs4proc.c  | 32 ++++++++++++++++----------------
 fs/nfsd/nfs4state.c | 24 ++++++++++--------------
 fs/nfsd/nfs4xdr.c   | 16 +++++-----------
 fs/nfsd/state.h     |  2 +-
 fs/nfsd/xdr4.h      | 15 +++++++--------
 5 files changed, 39 insertions(+), 50 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a9f096c7e99f..7e21763a35f2 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -758,7 +758,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 {
 	__be32 status;
 
-	read->rd_filp = NULL;
+	read->rd_nf = NULL;
 	if (read->rd_offset >= OFFSET_MAX)
 		return nfserr_inval;
 
@@ -775,7 +775,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	/* check stateid */
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, &read->rd_stateid,
-			RD_STATE, &read->rd_filp, &read->rd_tmp_file);
+			RD_STATE, &read->rd_nf);
 	if (status) {
 		dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
 		goto out;
@@ -921,7 +921,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
 		status = nfs4_preprocess_stateid_op(rqstp, cstate,
-			&setattr->sa_stateid, WR_STATE, NULL, NULL);
+			&setattr->sa_stateid, WR_STATE, NULL);
 		if (status) {
 			dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
 			return status;
@@ -977,7 +977,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	    struct nfsd4_write *write)
 {
 	stateid_t *stateid = &write->wr_stateid;
-	struct file *filp = NULL;
+	struct nfsd_file *nf = NULL;
 	__be32 status = nfs_ok;
 	unsigned long cnt;
 	int nvecs;
@@ -986,7 +986,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		return nfserr_inval;
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate, stateid, WR_STATE,
-			&filp, NULL);
+			&nf);
 	if (status) {
 		dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
 		return status;
@@ -999,10 +999,10 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	nvecs = fill_in_write_vector(rqstp->rq_vec, write);
 	WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec));
 
-	status = nfsd_vfs_write(rqstp, &cstate->current_fh, filp,
+	status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf->nf_file,
 				write->wr_offset, rqstp->rq_vec, nvecs, &cnt,
 				&write->wr_how_written);
-	fput(filp);
+	nfsd_file_put(nf);
 
 	write->wr_bytes_written = cnt;
 
@@ -1014,21 +1014,21 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		struct nfsd4_fallocate *fallocate, int flags)
 {
 	__be32 status = nfserr_notsupp;
-	struct file *file;
+	struct nfsd_file *nf;
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate,
 					    &fallocate->falloc_stateid,
-					    WR_STATE, &file, NULL);
+					    WR_STATE, &nf);
 	if (status != nfs_ok) {
 		dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n");
 		return status;
 	}
 
-	status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, file,
+	status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, nf->nf_file,
 				     fallocate->falloc_offset,
 				     fallocate->falloc_length,
 				     flags);
-	fput(file);
+	nfsd_file_put(nf);
 	return status;
 }
 
@@ -1053,11 +1053,11 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 {
 	int whence;
 	__be32 status;
-	struct file *file;
+	struct nfsd_file *nf;
 
 	status = nfs4_preprocess_stateid_op(rqstp, cstate,
 					    &seek->seek_stateid,
-					    RD_STATE, &file, NULL);
+					    RD_STATE, &nf);
 	if (status) {
 		dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n");
 		return status;
@@ -1079,14 +1079,14 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	 * Note:  This call does change file->f_pos, but nothing in NFSD
 	 *        should ever file->f_pos.
 	 */
-	seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence);
+	seek->seek_pos = vfs_llseek(nf->nf_file, seek->seek_offset, whence);
 	if (seek->seek_pos < 0)
 		status = nfserrno(seek->seek_pos);
-	else if (seek->seek_pos >= i_size_read(file_inode(file)))
+	else if (seek->seek_pos >= i_size_read(file_inode(nf->nf_file)))
 		seek->seek_eof = true;
 
 out:
-	fput(file);
+	nfsd_file_put(nf);
 	return status;
 }
 
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 420bc56923fa..a64138620975 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4769,7 +4769,7 @@ nfs4_check_olstateid(struct svc_fh *fhp, struct nfs4_ol_stateid *ols, int flags)
 
 static __be32
 nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
-		struct file **filpp, bool *tmp_file, int flags)
+		struct nfsd_file **nfp, int flags)
 {
 	int acc = (flags & RD_STATE) ? NFSD_MAY_READ : NFSD_MAY_WRITE;
 	struct nfsd_file *nf;
@@ -4779,20 +4779,18 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
 	if (nf) {
 		status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
 				acc | NFSD_MAY_OWNER_OVERRIDE);
-		if (status)
+		if (status) {
+			nfsd_file_put(nf);
 			goto out;
+		}
 	} else {
 		status = nfsd_file_acquire(rqstp, fhp, acc, &nf);
 		if (status)
 			return status;
-
-		if (tmp_file)
-			*tmp_file = true;
 	}
 
-	*filpp = get_file(nf->nf_file);
+	*nfp = nf;
 out:
-	nfsd_file_put(nf);
 	return status;
 }
 
@@ -4802,7 +4800,7 @@ out:
 __be32
 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-		int flags, struct file **filpp, bool *tmp_file)
+		int flags, struct nfsd_file **nfp)
 {
 	struct svc_fh *fhp = &cstate->current_fh;
 	struct inode *ino = d_inode(fhp->fh_dentry);
@@ -4811,10 +4809,8 @@ nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 	struct nfs4_stid *s = NULL;
 	__be32 status;
 
-	if (filpp)
-		*filpp = NULL;
-	if (tmp_file)
-		*tmp_file = false;
+	if (nfp)
+		*nfp = NULL;
 
 	if (grace_disallows_io(net, ino))
 		return nfserr_grace;
@@ -4851,8 +4847,8 @@ nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 	status = nfs4_check_fh(fhp, s);
 
 done:
-	if (!status && filpp)
-		status = nfs4_check_file(rqstp, fhp, s, filpp, tmp_file, flags);
+	if (status == nfs_ok && nfp)
+		status = nfs4_check_file(rqstp, fhp, s, nfp, flags);
 out:
 	if (s)
 		nfs4_put_stid(s);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 51c9e9ca39a4..fc9f805b2b58 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -49,6 +49,7 @@
 #include "cache.h"
 #include "netns.h"
 #include "pnfs.h"
+#include "filecache.h"
 
 #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
 #include <linux/security.h>
@@ -3460,14 +3461,14 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
 {
 	unsigned long maxcount;
 	struct xdr_stream *xdr = &resp->xdr;
-	struct file *file = read->rd_filp;
+	struct file *file;
 	int starting_len = xdr->buf->len;
-	struct raparms *ra = NULL;
 	__be32 *p;
 
 	if (nfserr)
 		goto out;
 
+	file = read->rd_nf->nf_file;
 	p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
 	if (!p) {
 		WARN_ON_ONCE(test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags));
@@ -3487,24 +3488,17 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
 			 (xdr->buf->buflen - xdr->buf->len));
 	maxcount = min_t(unsigned long, maxcount, read->rd_length);
 
-	if (read->rd_tmp_file)
-		ra = nfsd_init_raparms(file);
-
 	if (file->f_op->splice_read &&
 	    test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
 		nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
 	else
 		nfserr = nfsd4_encode_readv(resp, read, file, maxcount);
 
-	if (ra)
-		nfsd_put_raparams(file, ra);
-
 	if (nfserr)
 		xdr_truncate_encode(xdr, starting_len);
-
 out:
-	if (file)
-		fput(file);
+	if (read->rd_nf)
+		nfsd_file_put(read->rd_nf);
 	return nfserr;
 }
 
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index ee23de10663c..732fbb3b5ef1 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -579,7 +579,7 @@ struct nfsd_net;
 
 extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
 		struct nfsd4_compound_state *cstate, stateid_t *stateid,
-		int flags, struct file **filp, bool *tmp_file);
+		int flags, struct nfsd_file **nfp);
 __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
 		     stateid_t *stateid, unsigned char typemask,
 		     struct nfs4_stid **s, struct nfsd_net *nn);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index ce7362c88b48..c167d7a5b0e6 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -268,15 +268,14 @@ struct nfsd4_open_downgrade {
 
 
 struct nfsd4_read {
-	stateid_t	rd_stateid;         /* request */
-	u64		rd_offset;          /* request */
-	u32		rd_length;          /* request */
-	int		rd_vlen;
-	struct file     *rd_filp;
-	bool		rd_tmp_file;
+	stateid_t		rd_stateid;         /* request */
+	u64			rd_offset;          /* request */
+	u32			rd_length;          /* request */
+	int			rd_vlen;
+	struct nfsd_file	*rd_nf;
 	
-	struct svc_rqst *rd_rqstp;          /* response */
-	struct svc_fh * rd_fhp;             /* response */
+	struct svc_rqst		*rd_rqstp;          /* response */
+	struct svc_fh		*rd_fhp;            /* response */
 };
 
 struct nfsd4_readdir {
-- 
2.4.3



* [PATCH v1 22/38] nfsd: rip out the raparms cache
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (20 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 21/38] nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 23/38] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations Jeff Layton
                   ` (15 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Nothing uses it anymore.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfssvc.c |  14 +-----
 fs/nfsd/vfs.c    | 147 -------------------------------------------------------
 fs/nfsd/vfs.h    |   6 ---
 3 files changed, 1 insertion(+), 166 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d816bb3faa6e..d1034d119afb 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -216,18 +216,9 @@ static int nfsd_startup_generic(int nrservs)
 	if (nfsd_users++)
 		return 0;
 
-	/*
-	 * Readahead param cache - will no-op if it already exists.
-	 * (Note therefore results will be suboptimal if number of
-	 * threads is modified after nfsd start.)
-	 */
-	ret = nfsd_racache_init(2*nrservs);
-	if (ret)
-		goto dec_users;
-
 	ret = nfsd_file_cache_init();
 	if (ret)
-		goto out_racache;
+		goto dec_users;
 
 	ret = nfs4_state_start();
 	if (ret)
@@ -236,8 +227,6 @@ static int nfsd_startup_generic(int nrservs)
 
 out_file_cache:
 	nfsd_file_cache_shutdown();
-out_racache:
-	nfsd_racache_shutdown();
 dec_users:
 	nfsd_users--;
 	return ret;
@@ -250,7 +239,6 @@ static void nfsd_shutdown_generic(void)
 
 	nfs4_state_shutdown();
 	nfsd_file_cache_shutdown();
-	nfsd_racache_shutdown();
 }
 
 static bool nfsd_needs_lockd(void)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 62e6194d58f8..0b5072632ae5 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -47,34 +47,6 @@
 
 #define NFSDDBG_FACILITY		NFSDDBG_FILEOP
 
-
-/*
- * This is a cache of readahead params that help us choose the proper
- * readahead strategy. Initially, we set all readahead parameters to 0
- * and let the VFS handle things.
- * If you increase the number of cached files very much, you'll need to
- * add a hash table here.
- */
-struct raparms {
-	struct raparms		*p_next;
-	unsigned int		p_count;
-	ino_t			p_ino;
-	dev_t			p_dev;
-	int			p_set;
-	struct file_ra_state	p_ra;
-	unsigned int		p_hindex;
-};
-
-struct raparm_hbucket {
-	struct raparms		*pb_head;
-	spinlock_t		pb_lock;
-} ____cacheline_aligned_in_smp;
-
-#define RAPARM_HASH_BITS	4
-#define RAPARM_HASH_SIZE	(1<<RAPARM_HASH_BITS)
-#define RAPARM_HASH_MASK	(RAPARM_HASH_SIZE-1)
-static struct raparm_hbucket	raparm_hash[RAPARM_HASH_SIZE];
-
 /* 
  * Called from nfsd_lookup and encode_dirent. Check if we have crossed 
  * a mount point.
@@ -744,65 +716,6 @@ nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 	return err;
 }
 
-struct raparms *
-nfsd_init_raparms(struct file *file)
-{
-	struct inode *inode = file_inode(file);
-	dev_t dev = inode->i_sb->s_dev;
-	ino_t ino = inode->i_ino;
-	struct raparms	*ra, **rap, **frap = NULL;
-	int depth = 0;
-	unsigned int hash;
-	struct raparm_hbucket *rab;
-
-	hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK;
-	rab = &raparm_hash[hash];
-
-	spin_lock(&rab->pb_lock);
-	for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) {
-		if (ra->p_ino == ino && ra->p_dev == dev)
-			goto found;
-		depth++;
-		if (ra->p_count == 0)
-			frap = rap;
-	}
-	depth = nfsdstats.ra_size;
-	if (!frap) {	
-		spin_unlock(&rab->pb_lock);
-		return NULL;
-	}
-	rap = frap;
-	ra = *frap;
-	ra->p_dev = dev;
-	ra->p_ino = ino;
-	ra->p_set = 0;
-	ra->p_hindex = hash;
-found:
-	if (rap != &rab->pb_head) {
-		*rap = ra->p_next;
-		ra->p_next   = rab->pb_head;
-		rab->pb_head = ra;
-	}
-	ra->p_count++;
-	nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
-	spin_unlock(&rab->pb_lock);
-
-	if (ra->p_set)
-		file->f_ra = ra->p_ra;
-	return ra;
-}
-
-void nfsd_put_raparams(struct file *file, struct raparms *ra)
-{
-	struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
-
-	spin_lock(&rab->pb_lock);
-	ra->p_ra = file->f_ra;
-	ra->p_set = 1;
-	ra->p_count--;
-	spin_unlock(&rab->pb_lock);
-}
-
 /*
  * Grab and keep cached pages associated with a file in the svc_rqst
  * so that they can be passed to the network sendmsg/sendpage routines
@@ -2031,63 +1944,3 @@ nfsd_permission(struct svc_rqst *rqstp, struct svc_export *exp,
 
 	return err? nfserrno(err) : 0;
 }
-
-void
-nfsd_racache_shutdown(void)
-{
-	struct raparms *raparm, *last_raparm;
-	unsigned int i;
-
-	dprintk("nfsd: freeing readahead buffers.\n");
-
-	for (i = 0; i < RAPARM_HASH_SIZE; i++) {
-		raparm = raparm_hash[i].pb_head;
-		while(raparm) {
-			last_raparm = raparm;
-			raparm = raparm->p_next;
-			kfree(last_raparm);
-		}
-		raparm_hash[i].pb_head = NULL;
-	}
-}
-/*
- * Initialize readahead param cache
- */
-int
-nfsd_racache_init(int cache_size)
-{
-	int	i;
-	int	j = 0;
-	int	nperbucket;
-	struct raparms **raparm = NULL;
-
-
-	if (raparm_hash[0].pb_head)
-		return 0;
-	nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE);
-	nperbucket = max(2, nperbucket);
-	cache_size = nperbucket * RAPARM_HASH_SIZE;
-
-	dprintk("nfsd: allocating %d readahead buffers.\n", cache_size);
-
-	for (i = 0; i < RAPARM_HASH_SIZE; i++) {
-		spin_lock_init(&raparm_hash[i].pb_lock);
-
-		raparm = &raparm_hash[i].pb_head;
-		for (j = 0; j < nperbucket; j++) {
-			*raparm = kzalloc(sizeof(struct raparms), GFP_KERNEL);
-			if (!*raparm)
-				goto out_nomem;
-			raparm = &(*raparm)->p_next;
-		}
-		*raparm = NULL;
-	}
-
-	nfsdstats.ra_size = cache_size;
-	return 0;
-
-out_nomem:
-	dprintk("nfsd: kmalloc failed, freeing readahead buffers\n");
-	nfsd_racache_shutdown();
-	return -ENOMEM;
-}
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 80692e06302d..1efccde4bf9a 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -39,8 +39,6 @@
 typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);
 
 /* nfsd/vfs.c */
-int		nfsd_racache_init(int);
-void		nfsd_racache_shutdown(void);
 int		nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
 		                struct svc_export **expp);
 __be32		nfsd_lookup(struct svc_rqst *, struct svc_fh *,
@@ -74,7 +72,6 @@ __be32		nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
 				int, struct file **);
 __be32		nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t,
 				int, struct file **);
-struct raparms;
 __be32		nfsd_splice_read(struct svc_rqst *,
 				struct file *, loff_t, unsigned long *);
 __be32		nfsd_readv(struct file *, loff_t, struct kvec *, int,
@@ -107,9 +104,6 @@ __be32		nfsd_statfs(struct svc_rqst *, struct svc_fh *,
 __be32		nfsd_permission(struct svc_rqst *, struct svc_export *,
 				struct dentry *, int);
 
-struct raparms *nfsd_init_raparms(struct file *file);
-void		nfsd_put_raparams(struct file *file, struct raparms *ra);
-
 static inline int fh_want_write(struct svc_fh *fh)
 {
 	int ret = mnt_want_write(fh->fh_export->ex_path.mnt);
-- 
2.4.3



* [PATCH v1 23/38] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (21 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 22/38] nfsd: rip out the raparms cache Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 24/38] nfsd: allow lockd to be forcibly disabled Jeff Layton
                   ` (14 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

With NFSv3 nfsd will always attempt to send along WCC data to the
client. This generally involves saving off the in-core inode information
prior to doing the operation on the given filehandle, and then issuing a
vfs_getattr to it after the op.

Some filesystems (particularly clustered or networked ones) have an
expensive ->getattr inode operation. Atomicity is also often difficult
or impossible to guarantee on such filesystems. For those, we're best
off not trying to provide WCC information to the client at all, and to
simply allow it to poll for that information as needed with a GETATTR
RPC.

This patch adds a new flags field to struct export_operations, and
defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
that nfsd should not attempt to provide WCC info in NFSv3 replies. It
also adds a blurb about the new flags field and flag to the exporting
documentation.

The server will now also skip collecting this information for NFSv2,
since that info is never used there anyway.

Note that this patch does not add this flag to any filesystem
export_operations structures. This was originally developed to allow
reexporting nfs via nfsd. That code is not (and may never be) suitable
for merging into mainline.

Other filesystems may want to consider enabling this flag too. It's hard
to tell however which ones have export operations to enable export via
knfsd and which ones mostly rely on them for open-by-filehandle support,
so I'm leaving that up to the individual maintainers to decide. I am
cc'ing the relevant lists for those filesystems that I think may want to
consider adding this though.

Cc: HPDD-discuss@lists.01.org
Cc: ceph-devel@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: fuse-devel@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
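The version check in nfsd_set_fh_dentry() relies on a switch fallthrough,
which is easy to misread. As a rough userspace sketch of the same decision
(fh_no_wcc_for() is a hypothetical helper invented for illustration, not
something this patch adds; only the EXPORT_OP_NOWCC value comes from the
patch):

```c
#include <stdbool.h>

/* Same value as the flag defined in this patch. */
#define EXPORT_OP_NOWCC 0x1UL

/*
 * Hypothetical helper, for illustration only: returns true when WCC
 * collection should be skipped for a request of the given NFS version
 * against a filesystem with the given export_operations flags.
 */
bool fh_no_wcc_for(unsigned int rq_vers, unsigned long export_flags)
{
	switch (rq_vers) {
	case 3:
		/* NFSv3: skip only when the filesystem opts out. */
		return (export_flags & EXPORT_OP_NOWCC) != 0;
	case 2:
		/* NFSv2: the collected info is never used, so always skip. */
		return true;
	default:
		/* Other versions are left at the default (false). */
		return false;
	}
}
```

In the patch itself the result is stored in fhp->fh_no_wcc, and
fill_pre_wcc(), fill_post_wcc() and encode_post_op_attr() consult that field.
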
 Documentation/filesystems/nfs/Exporting | 27 +++++++++++++++++++++++++++
 fs/nfsd/nfs3xdr.c                       |  5 ++++-
 fs/nfsd/nfsfh.c                         | 14 ++++++++++++++
 fs/nfsd/nfsfh.h                         |  5 ++++-
 include/linux/exportfs.h                |  2 ++
 5 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting
index 520a4becb75c..fa636cde3907 100644
--- a/Documentation/filesystems/nfs/Exporting
+++ b/Documentation/filesystems/nfs/Exporting
@@ -138,6 +138,11 @@ struct which has the following members:
     to find potential names, and matches inode numbers to find the correct
     match.
 
+  flags
+    Some filesystems may need to be handled differently than others. The
+    export_operations struct also includes a flags field that allows the
+    filesystem to communicate such information to nfsd. See the Export
+    Operations Flags section below for more explanation.
 
 A filehandle fragment consists of an array of 1 or more 4byte words,
 together with a one byte "type".
@@ -147,3 +152,25 @@ generated by encode_fh, in which case it will have been padded with
 nuls.  Rather, the encode_fh routine should choose a "type" which
 indicates the decode_fh how much of the filehandle is valid, and how
 it should be interpreted.
+
+Export Operations Flags
+-----------------------
+In addition to the operation vector pointers, struct export_operations also
+contains a "flags" field that allows the filesystem to communicate to nfsd
+that it may want to do things differently when dealing with it. The
+following flags are defined:
+
+  EXPORT_OP_NOWCC
+    RFC 1813 recommends that servers always send weak cache consistency
+    (WCC) data to the client after each operation. The server should
+    atomically collect attributes about the inode, do an operation on it,
+    and then collect the attributes afterward. This allows the client to
+    skip issuing GETATTRs in some situations but means that the server
+    is calling vfs_getattr for almost all RPCs. On some filesystems
+    (particularly those that are clustered or networked) this is expensive
+    and atomicity is difficult to guarantee. This flag indicates to nfsd
+    that it should skip providing WCC attributes to the client in NFSv3
+    replies when doing operations on this filesystem. Consider enabling
+    this on filesystems that have an expensive ->getattr inode operation,
+    or when atomicity between pre and post operation attribute collection
+    is impossible to guarantee.
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 00575d776d91..6420e28a4d2f 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -203,7 +203,7 @@ static __be32 *
 encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp)
 {
 	struct dentry *dentry = fhp->fh_dentry;
-	if (dentry && d_really_is_positive(dentry)) {
+	if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
 	        __be32 err;
 		struct kstat stat;
 
@@ -256,6 +256,9 @@ void fill_post_wcc(struct svc_fh *fhp)
 {
 	__be32 err;
 
+	if (fhp->fh_no_wcc)
+		return;
+
 	if (fhp->fh_post_saved)
 		printk("nfsd: inode locked twice during operation.\n");
 
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index c1681ce894c5..f56f938a2e3f 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -267,6 +267,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 
 	fhp->fh_dentry = dentry;
 	fhp->fh_export = exp;
+
+	switch (rqstp->rq_vers) {
+	case 3:
+		if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC))
+			break;
+		/* Fallthrough */
+	case 2:
+		fhp->fh_no_wcc = true;
+	}
+
 	return 0;
 out:
 	exp_put(exp);
@@ -535,6 +545,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 	 */
 	 set_version_and_fsid_type(fhp, exp, ref_fh);
 
+	/* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
+	fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
+
 	if (ref_fh == fhp)
 		fh_put(ref_fh);
 
@@ -638,6 +651,7 @@ fh_put(struct svc_fh *fhp)
 		exp_put(exp);
 		fhp->fh_export = NULL;
 	}
+	fhp->fh_no_wcc = false;
 	return;
 }
 
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 0770bcb543c8..1cca265ad3e9 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -33,6 +33,7 @@ typedef struct svc_fh {
 
 	bool			fh_locked;	/* inode locked by us */
 	bool			fh_want_write;	/* remount protection taken */
+	bool			fh_no_wcc;	/* no wcc data needed */
 
 #ifdef CONFIG_NFSD_V3
 	bool			fh_post_saved;	/* post-op attrs saved */
@@ -52,7 +53,6 @@ typedef struct svc_fh {
 	struct kstat		fh_post_attr;	/* full attrs after operation */
 	u64			fh_post_change; /* nfsv4 change; see above */
 #endif /* CONFIG_NFSD_V3 */
-
 } svc_fh;
 
 enum nfsd_fsid {
@@ -248,6 +248,9 @@ fill_pre_wcc(struct svc_fh *fhp)
 {
 	struct inode    *inode;
 
+	if (fhp->fh_no_wcc)
+		return;
+
 	inode = d_inode(fhp->fh_dentry);
 	if (!fhp->fh_pre_saved) {
 		fhp->fh_pre_mtime = inode->i_mtime;
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index fa05e04c5531..600c3fccc999 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -214,6 +214,8 @@ struct export_operations {
 			  bool write, u32 *device_generation);
 	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
 			     int nr_iomaps, struct iattr *iattr);
+#define	EXPORT_OP_NOWCC		(0x1)	/* Don't collect wcc data for NFSv3 replies */
+	unsigned long	flags;
 };
 
 extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
-- 
2.4.3



* [PATCH v1 24/38] nfsd: allow lockd to be forcibly disabled
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (22 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 23/38] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 25/38] nfsd: add errno mapping for EREMOTEIO Jeff Layton
                   ` (13 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

In some cases, we may want to use a userland NLM server which will
require that we turn off lockd.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfsd/nfssvc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d1034d119afb..e266a41bed9b 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -241,8 +241,19 @@ static void nfsd_shutdown_generic(void)
 	nfsd_file_cache_shutdown();
 }
 
+/*
+ * Allow admin to disable lockd. This would typically be used to allow (e.g.)
+ * a userspace NLM server of some sort to be used.
+ */
+static bool nfsd_disable_lockd = false;
+module_param(nfsd_disable_lockd, bool, 0644);
+MODULE_PARM_DESC(nfsd_disable_lockd, "Allow lockd to be manually disabled.");
+
 static bool nfsd_needs_lockd(void)
 {
+	if (nfsd_disable_lockd)
+		return false;
+
 #if defined(CONFIG_NFSD_V3)
 	return (nfsd_versions[2] != NULL) || (nfsd_versions[3] != NULL);
 #else
-- 
2.4.3



* [PATCH v1 25/38] nfsd: add errno mapping for EREMOTEIO
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (23 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 24/38] nfsd: allow lockd to be forcibly disabled Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 26/38] nfsd: return EREMOTE if we find an S_AUTOMOUNT inode Jeff Layton
                   ` (12 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Map EREMOTEIO to NFSERR_IO.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
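For readers unfamiliar with nfserrno(), the table it walks pairs an NFS
status with a negative errno. The following is only a userspace sketch of
that lookup: the sk_ names are invented for illustration, the status values
are the RFC 1813 numbers, plain ints stand in for __be32, and returning -1
for an unmapped errno is a choice of this sketch rather than a description
of what nfsd does:

```c
#include <errno.h>
#include <stddef.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; defined here only for portability */
#endif

/* NFSv3 status codes as numbered in RFC 1813. */
enum { SK_NFS_OK = 0, SK_NFSERR_NOENT = 2, SK_NFSERR_IO = 5 };

struct sk_errmap {
	int nfserr;	/* NFS status (host order in this sketch) */
	int syserr;	/* negative errno value */
};

static const struct sk_errmap sk_errtbl[] = {
	{ SK_NFS_OK, 0 },
	{ SK_NFSERR_NOENT, -ENOENT },
	{ SK_NFSERR_IO, -EIO },
	{ SK_NFSERR_IO, -EREMOTEIO },	/* the kind of entry this patch adds */
};

/* Hypothetical stand-in for nfserrno(): linear scan of the table. */
int sk_errno_to_nfserr(int syserr)
{
	size_t i;

	for (i = 0; i < sizeof(sk_errtbl) / sizeof(sk_errtbl[0]); i++) {
		if (sk_errtbl[i].syserr == syserr)
			return sk_errtbl[i].nfserr;
	}
	return -1;	/* sketch only: report "no mapping found" */
}
```
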
 fs/nfsd/nfsproc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 9893095cbee1..5a82d20a50b4 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -794,6 +794,7 @@ nfserrno (int errno)
 		{ nfserr_toosmall, -ETOOSMALL },
 		{ nfserr_serverfault, -ESERVERFAULT },
 		{ nfserr_serverfault, -ENFILE },
+		{ nfserr_io, -EREMOTEIO },
 	};
 	int	i;
 
-- 
2.4.3



* [PATCH v1 26/38] nfsd: return EREMOTE if we find an S_AUTOMOUNT inode
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (24 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 25/38] nfsd: add errno mapping for EREMOTEIO Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 27/38] nfsd: allow filesystems to opt out of subtree checking Jeff Layton
                   ` (11 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Our interest with the NFS reexporting code is primarily to reexport NFSv4
filesystems, but there is no way to get the fsid that is associated with
an inode. So, we must require that any reexported NFSv4 filesystem have
an explicit fsid= value assigned to it. That value must also be
persistent across reboots or we'll see a lot of ESTALE errors.

Because of this requirement, we can't really deal with transparent
automounting (and implicit exporting) of NFSv4 filesystems after a
mountpoint traversal.

Don't allow knfsd to traverse S_AUTOMOUNT inodes. If we find one in a
LOOKUP, then we simply return NFS3ERR_REMOTE. In the case of READDIRPLUS,
we opt not to send any attributes for the inode, so the follow-on stat()
call (if any) can figure out that it's remote.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
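The resulting LOOKUP status selection can be modelled in userspace roughly
as follows. The sk_ types and sk_lookup_status() are hypothetical stand-ins
for the dentry/inode checks (d_really_is_negative() and IS_AUTOMOUNT()), and
the status values are the RFC 1813 numbers:

```c
#include <stdbool.h>
#include <stddef.h>

/* NFSv3 status codes as numbered in RFC 1813. */
enum { SK_NFS_OK = 0, SK_NFSERR_NOENT = 2, SK_NFSERR_REMOTE = 71 };

/* Hypothetical, simplified stand-ins for struct inode and struct dentry. */
struct sk_inode {
	bool automount;		/* models the S_AUTOMOUNT inode flag */
};

struct sk_dentry {
	const struct sk_inode *inode;	/* NULL models a negative dentry */
};

/* Status to return once the filehandle has been composed successfully. */
int sk_lookup_status(const struct sk_dentry *dentry)
{
	if (dentry->inode == NULL)
		return SK_NFSERR_NOENT;		/* nothing there */
	if (dentry->inode->automount)
		return SK_NFSERR_REMOTE;	/* refuse to traverse */
	return SK_NFS_OK;
}
```
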
 fs/nfsd/nfs3xdr.c | 2 ++
 fs/nfsd/vfs.c     | 8 ++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 6420e28a4d2f..8c8c89bc752b 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -833,6 +833,8 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
 		goto out;
 	if (d_really_is_negative(dchild))
 		goto out;
+	if (IS_AUTOMOUNT(dchild->d_inode))
+		goto out;
 	if (dchild->d_inode->i_ino != ino)
 		goto out;
 	rv = fh_compose(fhp, exp, dchild, &cd->fh);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 0b5072632ae5..d3fac79c4eaa 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -244,8 +244,12 @@ nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name,
 	 * dentry may be negative, it may need to be updated.
 	 */
 	err = fh_compose(resfh, exp, dentry, fhp);
-	if (!err && d_really_is_negative(dentry))
-		err = nfserr_noent;
+	if (!err) {
+		if (d_really_is_negative(dentry))
+			err = nfserr_noent;
+		else if (IS_AUTOMOUNT(dentry->d_inode))
+			err = nfserr_remote;
+	}
 out:
 	dput(dentry);
 	exp_put(exp);
-- 
2.4.3



* [PATCH v1 27/38] nfsd: allow filesystems to opt out of subtree checking
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (25 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 26/38] nfsd: return EREMOTE if we find an S_AUTOMOUNT inode Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 22:53   ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 28/38] nfsd: close cached files prior to a REMOVE or RENAME that would replace target Jeff Layton
                   ` (10 subsequent siblings)
  37 siblings, 1 reply; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Allowing NFS to be reexported causes some problems when it comes to
subtree checking. In principle, we could allow it, but it would mean
encoding parent info in the filehandles, and there may not be enough
space for that in an NFSv3 filehandle.

To enforce this at export upcall time, we add a new export_ops flag
that declares the filesystem ineligible for subtree checking.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
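The intent described above is that an export request is refused only when
it asks for subtree checking on a filesystem that has opted out. A minimal
userspace sketch of that rule follows; sk_check_subtree() is a hypothetical
helper, and the SK_NFSEXP_NOSUBTREECHECK value is used purely for
illustration:

```c
#include <errno.h>
#include <stdbool.h>

/* Same value as the flag defined in this patch. */
#define SK_EXPORT_OP_NOSUBTREECHK	0x2UL
/* Illustrative value for the export option that turns subtree checks off. */
#define SK_NFSEXP_NOSUBTREECHECK	0x0400

/*
 * Hypothetical helper: returns 0 if the export is acceptable, or -EINVAL
 * if subtree checking was requested on a filesystem that cannot do it.
 */
int sk_check_subtree(unsigned long export_op_flags, int nfsexp_flags)
{
	bool wants_subtree_check = !(nfsexp_flags & SK_NFSEXP_NOSUBTREECHECK);

	if ((export_op_flags & SK_EXPORT_OP_NOSUBTREECHK) && wants_subtree_check)
		return -EINVAL;
	return 0;
}
```

Note that the export option is a "no subtree check" bit, so the request for
subtree checking is the case where that bit is clear.
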
 Documentation/filesystems/nfs/Exporting | 14 +++++++++++++-
 fs/nfsd/export.c                        |  6 ++++++
 include/linux/exportfs.h                |  1 +
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting
index fa636cde3907..a89b5be22703 100644
--- a/Documentation/filesystems/nfs/Exporting
+++ b/Documentation/filesystems/nfs/Exporting
@@ -160,7 +160,7 @@ contains a "flags" field that allows the filesystem to communicate to nfsd
 that it may want to do things differently when dealing with it. The
 following flags are defined:
 
-  EXPORT_OP_NOWCC
+  EXPORT_OP_NOWCC - disable NFSv3 WCC attributes on this filesystem
     RFC 1813 recommends that servers always send weak cache consistency
     (WCC) data to the client after each operation. The server should
     atomically collect attributes about the inode, do an operation on it,
@@ -174,3 +174,15 @@ following flags are defined:
     this on filesystems that have an expensive ->getattr inode operation,
     or when atomicity between pre and post operation attribute collection
     is impossible to guarantee.
+
+  EXPORT_OP_NOSUBTREECHK - disallow subtree checking on this fs
+    Many NFS operations deal with filehandles, which the server must then
+    vet to ensure that they live inside of an exported tree. When the
+    export consists of an entire filesystem, this is trivial. nfsd can just
+    ensure that the filehandle lives on the filesystem. When only part of a
+    filesystem is exported however, then nfsd must walk the ancestors of the
+    inode to ensure that it's within an exported subtree. This is an
+    expensive operation and not all filesystems can support it properly.
+    This flag exempts the filesystem from subtree checking and causes
+    exportfs to get back an error if it tries to enable subtree checking
+    on it.
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 4b504edff121..295d22e8fdad 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -392,6 +392,12 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid)
 		return -EINVAL;
 	}
 
+	if (inode->i_sb->s_export_op->flags & EXPORT_OP_NOSUBTREECHK &&
+	    !(*flags & NFSEXP_NOSUBTREECHECK)) {
+		dprintk("%s: %s does not support subtree checking!\n",
+			__func__, inode->i_sb->s_type->name);
+		return -EINVAL;
+	}
 	return 0;
 
 }
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 600c3fccc999..5f9b5345f717 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -215,6 +215,7 @@ struct export_operations {
 	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
 			     int nr_iomaps, struct iattr *iattr);
 #define	EXPORT_OP_NOWCC		(0x1)	/* Don't collect wcc data for NFSv3 replies */
+#define	EXPORT_OP_NOSUBTREECHK	(0x2)	/* Subtree checking is not supported! */
 	unsigned long	flags;
 };
 
-- 
2.4.3



* [PATCH v1 28/38] nfsd: close cached files prior to a REMOVE or RENAME that would replace target
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (26 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 27/38] nfsd: allow filesystems to opt out of subtree checking Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 29/38] nfsd: retry once in nfsd_open on an -EOPENSTALE return Jeff Layton
                   ` (9 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

It's not uncommon for some workloads to do a bunch of I/O to a file and
delete it just afterward. If knfsd has a cached open file however, then
the file may still be open when the dentry is unlinked. If the
underlying filesystem is nfs, then that could trigger it to do a
sillyrename.

On a REMOVE or RENAME, scan the nfsd_file cache for open files that
correspond to the inode, and proactively unhash and put their
references. This should prevent any delete-on-last-close activity that
would occur solely because of knfsd's open file cache.

This must be done synchronously, though, so we use the variants that call
flush_delayed_fput. There are deadlock possibilities if you call
flush_delayed_fput while holding locks, however. In the case of
nfsd_rename, we don't even do the lookups of the dentries to be renamed
until we've locked for rename.

Once we've figured out what the target dentry is for a rename, check to
see whether there are cached open files associated with it. If there
are, then unwind all of the locking, close them all, and then reattempt
the rename.

None of this is really necessary for "typical" filesystems though. It's
mostly of use for NFS, so declare a new export op flag and use that to
determine whether to close the files beforehand.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
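The rename path described above follows a "detect under lock, unwind, clean
up, retry" pattern. The following userspace sketch shows only that control
flow. Everything prefixed sk_ is hypothetical: a counter stands in for the
rename locks, a flag stands in for vfs_rename(), and the value used for the
close-before-unlink flag is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only. */
#define SK_CLOSE_BEFORE_UNLINK 0x4UL

struct sk_target {
	int cached_opens;	/* cached open files for the rename target */
	bool renamed;		/* set once the "rename" has happened */
};

static int sk_locks_held;	/* models lock_rename()/unlock_rename() */

static void sk_lock(void)   { sk_locks_held++; }
static void sk_unlock(void) { sk_locks_held--; }

/* Stand-in for the synchronous close: must run with no locks held. */
static void sk_close_cached(struct sk_target *t)
{
	assert(sk_locks_held == 0);
	t->cached_opens = 0;
}

/* Returns the number of passes needed to complete the rename. */
int sk_rename(struct sk_target *t, unsigned long export_op_flags)
{
	bool close_cached = false;
	int passes = 0;

retry:
	passes++;
	sk_lock();
	if ((export_op_flags & SK_CLOSE_BEFORE_UNLINK) && t->cached_opens > 0)
		close_cached = true;	/* cannot flush here: locks are held */
	else
		t->renamed = true;	/* stands in for vfs_rename() */
	sk_unlock();

	if (close_cached) {
		/* Locks are dropped, so closing is safe; then start over. */
		close_cached = false;
		sk_close_cached(t);
		goto retry;
	}
	return passes;
}
```

With the flag set and cached opens present, the sketch takes two passes: the
first only detects the cached files, the second performs the rename.
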
 Documentation/filesystems/nfs/Exporting | 13 +++++++
 fs/nfsd/filecache.c                     | 29 +++++++++++++++
 fs/nfsd/filecache.h                     |  1 +
 fs/nfsd/trace.h                         |  2 ++
 fs/nfsd/vfs.c                           | 64 ++++++++++++++++++++++++++++-----
 include/linux/exportfs.h                |  5 +--
 6 files changed, 103 insertions(+), 11 deletions(-)

diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting
index a89b5be22703..eb3eec811e67 100644
--- a/Documentation/filesystems/nfs/Exporting
+++ b/Documentation/filesystems/nfs/Exporting
@@ -186,3 +186,16 @@ following flags are defined:
     This flag exempts the filesystem from subtree checking and causes
     exportfs to get back an error if it tries to enable subtree checking
     on it.
+
+  EXPORT_OP_CLOSE_BEFORE_UNLINK - always close cached files before unlinking
+    On some exportable filesystems (such as NFS) unlinking a file that
+    is still open can cause a fair bit of extra work. For instance,
+    the NFS client will do a "sillyrename" to ensure that the file
+    sticks around while it's still open. When reexporting, that open
+    file is held by nfsd so we usually end up doing a sillyrename, and
+    then immediately deleting the sillyrenamed file just afterward when
+    the link count actually goes to zero. Sometimes this delete can race
+    with other operations (for instance an rmdir of the parent directory).
+    This flag causes nfsd to close any open files for this inode _before_
+    calling into the vfs to do an unlink or a rename that would replace
+    an existing file.
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 79daf2677176..e7756664b8d8 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -547,6 +547,35 @@ nfsd_file_find_locked(struct inode *inode, unsigned int may_flags,
 	return NULL;
 }
 
+/**
+ * nfsd_file_is_cached - are there any cached open files for this inode?
+ * @inode: inode of the file to check
+ *
+ * Scan the hashtable for open files that match this inode. Returns true if
+ * there are any, and false if not.
+ */
+bool
+nfsd_file_is_cached(struct inode *inode)
+{
+	bool			ret = false;
+	struct nfsd_file	*nf;
+	unsigned int		hashval;
+
+	hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS);
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
+				 nf_node) {
+		if (inode == nf->nf_inode) {
+			ret = true;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	trace_nfsd_file_is_cached(inode, hashval, (int)ret);
+	return ret;
+}
+
 __be32
 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  unsigned int may_flags, struct nfsd_file **pnf)
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 756aea5431da..aea5d347c27b 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -38,6 +38,7 @@ void nfsd_file_cache_shutdown(void);
 void nfsd_file_put(struct nfsd_file *nf);
 struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
+bool nfsd_file_is_cached(struct inode *inode);
 __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  unsigned int may_flags, struct nfsd_file **nfp);
 int	nfsd_file_cache_stats_open(struct inode *, struct file *);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 9174d126ff6e..f991619f4f67 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -208,6 +208,7 @@ DEFINE_EVENT(nfsd_file_search_class, name,				\
 
 DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync);
 DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode);
+DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_is_cached);
 
 TRACE_EVENT(nfsd_file_fsnotify_handle_event,
 	TP_PROTO(struct inode *inode, u32 mask),
@@ -227,6 +228,7 @@ TRACE_EVENT(nfsd_file_fsnotify_handle_event,
 	TP_printk("inode=0x%p nlink=%u mode=0%ho mask=0x%x", __entry->inode,
 			__entry->nlink, __entry->mode, __entry->mask)
 );
+
 #endif /* _NFSD_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index d3fac79c4eaa..9bf194be2b8e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1524,6 +1524,26 @@ out_nfserr:
 	goto out_unlock;
 }
 
+static void
+nfsd_close_cached_files(struct dentry *dentry)
+{
+	struct inode *inode = d_inode(dentry);
+
+	if (inode && S_ISREG(inode->i_mode))
+		nfsd_file_close_inode_sync(inode);
+}
+
+static bool
+nfsd_has_cached_files(struct dentry *dentry)
+{
+	bool		ret = false;
+	struct inode *inode = d_inode(dentry);
+
+	if (inode && S_ISREG(inode->i_mode))
+		ret = nfsd_file_is_cached(inode);
+	return ret;
+}
+
 /*
  * Rename a file
  * N.B. After this call _both_ ffhp and tfhp need an fh_put
@@ -1536,6 +1556,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	struct inode	*fdir, *tdir;
 	__be32		err;
 	int		host_err;
+	bool		close_cached = false;
 
 	err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
 	if (err)
@@ -1554,6 +1575,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))
 		goto out;
 
+retry:
 	host_err = fh_want_write(ffhp);
 	if (host_err) {
 		err = nfserrno(host_err);
@@ -1593,11 +1615,17 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
 		goto out_dput_new;
 
-	host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
-	if (!host_err) {
-		host_err = commit_metadata(tfhp);
-		if (!host_err)
-			host_err = commit_metadata(ffhp);
+	if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
+	    nfsd_has_cached_files(ndentry)) {
+		close_cached = true;
+		goto out_dput_old;
+	} else {
+		host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
+		if (!host_err) {
+			host_err = commit_metadata(tfhp);
+			if (!host_err)
+				host_err = commit_metadata(ffhp);
+		}
 	}
  out_dput_new:
 	dput(ndentry);
@@ -1610,12 +1638,26 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	 * as that would do the wrong thing if the two directories
 	 * were the same, so again we do it by hand.
 	 */
-	fill_post_wcc(ffhp);
-	fill_post_wcc(tfhp);
+	if (!close_cached) {
+		fill_post_wcc(ffhp);
+		fill_post_wcc(tfhp);
+	}
 	unlock_rename(tdentry, fdentry);
 	ffhp->fh_locked = tfhp->fh_locked = false;
 	fh_drop_write(ffhp);
 
+	/*
+	 * If the target dentry has cached open files, then we need to try to
+	 * close them prior to doing the rename. Flushing delayed fput
+	 * shouldn't be done with locks held however, so we delay it until this
+	 * point and then reattempt the whole shebang.
+	 */
+	if (close_cached) {
+		close_cached = false;
+		nfsd_close_cached_files(ndentry);
+		dput(ndentry);
+		goto retry;
+	}
 out:
 	return err;
 }
@@ -1662,10 +1704,14 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	if (!type)
 		type = d_inode(rdentry)->i_mode & S_IFMT;
 
-	if (type != S_IFDIR)
+	if (type != S_IFDIR) {
+		if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK)
+			nfsd_close_cached_files(rdentry);
 		host_err = vfs_unlink(dirp, rdentry, NULL);
-	else
+	} else {
 		host_err = vfs_rmdir(dirp, rdentry);
+	}
+
 	if (!host_err)
 		host_err = commit_metadata(fhp);
 	dput(rdentry);
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 5f9b5345f717..e8ba130f0aa5 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -214,8 +214,9 @@ struct export_operations {
 			  bool write, u32 *device_generation);
 	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
 			     int nr_iomaps, struct iattr *iattr);
-#define	EXPORT_OP_NOWCC		(0x1)	/* Don't collect wcc data for NFSv3 replies */
-#define	EXPORT_OP_NOSUBTREECHK	(0x2)	/* Subtree checking is not supported! */
+#define	EXPORT_OP_NOWCC			(0x1) /* don't collect v3 wcc data */
+#define	EXPORT_OP_NOSUBTREECHK		(0x2) /* no subtree checking */
+#define	EXPORT_OP_CLOSE_BEFORE_UNLINK	(0x4) /* close files before unlink */
 	unsigned long	flags;
 };
 
-- 
2.4.3



* [PATCH v1 29/38] nfsd: retry once in nfsd_open on an -EOPENSTALE return
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (27 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 28/38] nfsd: close cached files prior to a REMOVE or RENAME that would replace target Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 30/38] nfsd: close cached file when underlying file systems says no such file Jeff Layton
                   ` (8 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

If we get back -EOPENSTALE from an NFSv4 open, then we either got some
unhandled error or the inode we got back was not the same as the one
associated with the dentry.

We really have no recourse in that situation other than to retry the
open once and, if that also fails, to return nfserr_stale to the client.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
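A minimal userspace sketch of the retry-once logic described above. It is
not the nfsd code: struct fake_fs, fake_open() and open_retry_once() are
hypothetical stand-ins for dentry_open() and __nfsd_open(), and EOPENSTALE
is defined locally on the assumption that it keeps its kernel-internal
value of 518.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno, not exported to userspace; defined here for the sketch. */
#ifndef EOPENSTALE
#define EOPENSTALE 518
#endif

/* Hypothetical stand-in for the underlying filesystem. */
struct fake_fs {
	int stale_left;	/* how many more opens will report a stale dentry */
	int attempts;	/* total open attempts seen so far */
};

/* Hypothetical stand-in for dentry_open(): 0 on success, negative errno on failure. */
static int fake_open(struct fake_fs *fs)
{
	fs->attempts++;
	if (fs->stale_left > 0) {
		fs->stale_left--;
		return -EOPENSTALE;
	}
	return 0;
}

/* Retry exactly once on -EOPENSTALE; a second stale result becomes -ESTALE. */
static int open_retry_once(struct fake_fs *fs)
{
	bool retried = false;
	int err;

	assert(fs != NULL);
retry:
	err = fake_open(fs);
	if (err == -EOPENSTALE) {
		if (!retried) {
			retried = true;
			/* the patch drops and re-verifies the filehandle at this point */
			goto retry;
		}
		return -ESTALE;
	}
	return err;
}
```

With stale_left = 1 the second attempt succeeds; with stale_left = 2 the
caller sees -ESTALE after two attempts and no third open is issued.
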
 fs/nfsd/nfsproc.c |  1 +
 fs/nfsd/vfs.c     | 12 ++++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 5a82d20a50b4..697fc0b9ba09 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -795,6 +795,7 @@ nfserrno (int errno)
 		{ nfserr_serverfault, -ESERVERFAULT },
 		{ nfserr_serverfault, -ENFILE },
 		{ nfserr_io, -EREMOTEIO },
+		{ nfserr_stale, -EOPENSTALE },
 	};
 	int	i;
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 9bf194be2b8e..5293cba6b8f8 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -622,9 +622,9 @@ __nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 	int		flags = O_RDONLY|O_LARGEFILE;
 	__be32		err;
 	int		host_err = 0;
+	bool		retried = false;
 
-	BUG_ON(!fhp->fh_dentry);
-
+retry:
 	path.mnt = fhp->fh_export->ex_path.mnt;
 	path.dentry = fhp->fh_dentry;
 	inode = d_inode(path.dentry);
@@ -659,6 +659,14 @@ __nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 
 	file = dentry_open(&path, flags, current_cred());
 	if (IS_ERR(file)) {
+		if (file == ERR_PTR(-EOPENSTALE) && !retried) {
+			retried = true;
+			fh_put(fhp);
+			err = fh_verify(rqstp, fhp, type, may_flags);
+			if (err)
+				goto out;
+			goto retry;
+		}
 		host_err = PTR_ERR(file);
 		goto out_nfserr;
 	}
-- 
2.4.3



* [PATCH v1 30/38] nfsd: close cached file when underlying file systems says no such file
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (28 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 29/38] nfsd: retry once in nfsd_open on an -EOPENSTALE return Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 31/38] nfs: replace d_add with d_splice_alias in atomic_open Jeff Layton
                   ` (7 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

From: Peng Tao <tao.peng@primarydata.com>

Upon setattr, fallocate, read, write, or commit, if the underlying file
system returns EBADF or ENOENT, we should close the cached file.

This can happen on a data portal when remote files are removed by other
clients.

Also call .d_weak_revalidate() upon such failures and, if we find that the
inode is still valid, retry the operation once. In theory we should never
see the retry, but it makes the code more robust in case the underlying
NFSv4 state recovery fails badly.

[jlayton: turn retry ints into bools]

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
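A minimal userspace sketch of the decision described above: which errors
cause the cached file to be dropped, and when a single retry is worthwhile.
struct cached_file and handle_vfs_error() are hypothetical; the still_valid
field merely models the result of a .d_weak_revalidate() call, and
EOPENSTALE is defined locally on the assumption that it keeps its
kernel-internal value of 518.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno, not exported to userspace; defined here for the sketch. */
#ifndef EOPENSTALE
#define EOPENSTALE 518
#endif

/* Hypothetical model of one entry in the open file cache. */
struct cached_file {
	bool open;		/* is a cached open file currently held? */
	bool still_valid;	/* what revalidating the dentry would report */
};

/*
 * Returns true if the caller should retry the operation once.
 * "File is gone" style errors always close the cached file; a retry is
 * only suggested when revalidation says the inode is still valid.
 */
static bool handle_vfs_error(struct cached_file *cf, int err)
{
	assert(cf != NULL);

	switch (err) {
	case -EBADF:
	case -ENOENT:
	case -EOPENSTALE:
		cf->open = false;
		return cf->still_valid;
	default:
		return false;
	}
}
```

Errors outside the listed set leave the cached file alone, so an ordinary
-EIO or a successful return never costs a reopen.
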
 fs/nfsd/vfs.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 83 insertions(+), 22 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 5293cba6b8f8..ca6896bb25fb 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -333,6 +333,51 @@ out_nfserrno:
 	return nfserrno(host_err);
 }
 
+static void
+nfsd_close_cached_files(struct dentry *dentry)
+{
+	struct inode *inode = d_inode(dentry);
+
+	if (inode && S_ISREG(inode->i_mode))
+		nfsd_file_close_inode_sync(inode);
+}
+
+static bool
+nfsd_has_cached_files(struct dentry *dentry)
+{
+	bool		ret = false;
+	struct inode *inode = d_inode(dentry);
+
+	if (inode && S_ISREG(inode->i_mode))
+		ret = nfsd_file_is_cached(inode);
+	return ret;
+}
+
+static bool
+nfsd_cached_files_handle_vfs_error(struct dentry *dentry, int err)
+{
+	struct inode *inode = d_inode(dentry);
+
+	switch (err) {
+	case -EBADF:
+	case -ENOENT:
+	case -EOPENSTALE:
+		if (inode && S_ISREG(inode->i_mode))
+			nfsd_file_close_inode_sync(inode);
+		if (dentry->d_flags & DCACHE_OP_WEAK_REVALIDATE &&
+		    dentry->d_op->d_weak_revalidate(dentry, LOOKUP_REVAL) > 0) {
+			printk(KERN_NOTICE
+				"%s: file %s still alive!\n", __func__,
+				dentry->d_name.name);
+			return true;
+		}
+	default:
+	      break;
+	}
+
+	return false;
+}
+
 /*
  * Set various file attributes.  After this call fhp needs an fh_put.
  */
@@ -348,12 +393,14 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
 	int		host_err;
 	bool		get_write_count;
 	int		size_change = 0;
+	bool		retry = false;
 
 	if (iap->ia_valid & (ATTR_ATIME | ATTR_MTIME | ATTR_SIZE))
 		accmode |= NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE;
 	if (iap->ia_valid & ATTR_SIZE)
 		ftype = S_IFREG;
 
+try_again:
 	/* Callers that do fh_verify should do the fh_want_write: */
 	get_write_count = !fhp->fh_dentry;
 
@@ -410,6 +457,10 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
 	fh_lock(fhp);
 	host_err = notify_change(dentry, iap, NULL);
 	fh_unlock(fhp);
+	if (!retry && nfsd_cached_files_handle_vfs_error(dentry, host_err)) {
+		retry = true;
+		goto try_again;
+	}
 	err = nfserrno(host_err);
 
 out_put_write_access:
@@ -466,6 +517,7 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	mutex_lock(&d_inode(dentry)->i_mutex);
 	host_error = security_inode_setsecctx(dentry, label->data, label->len);
 	mutex_unlock(&d_inode(dentry)->i_mutex);
+	nfsd_cached_files_handle_vfs_error(dentry, host_error);
 	return nfserrno(host_error);
 }
 #else
@@ -489,6 +541,7 @@ __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (!error)
 		error = commit_metadata(fhp);
 
+	nfsd_cached_files_handle_vfs_error(fhp->fh_dentry, error);
 	return nfserrno(error);
 }
 #endif /* defined(CONFIG_NFSD_V4) */
@@ -659,7 +712,8 @@ retry:
 
 	file = dentry_open(&path, flags, current_cred());
 	if (IS_ERR(file)) {
-		if (file == ERR_PTR(-EOPENSTALE) && !retried) {
+		if (nfsd_cached_files_handle_vfs_error(path.dentry, PTR_ERR(file))
+		    && !retried) {
 			retried = true;
 			fh_put(fhp);
 			err = fh_verify(rqstp, fhp, type, may_flags);
@@ -776,7 +830,7 @@ nfsd_finish_read(struct file *file, unsigned long *count, int host_err)
 		*count = host_err;
 		fsnotify_access(file);
 		return 0;
-	} else 
+	} else
 		return nfserrno(host_err);
 }
 
@@ -930,8 +984,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 {
 	__be32			err;
 	struct nfsd_file	*nf;
+	bool			retry = false;
 
 	trace_read_start(rqstp, fhp, offset, vlen);
+try_again:
 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
 	if (err == nfs_ok) {
 		trace_read_opened(rqstp, fhp, offset, vlen);
@@ -939,6 +995,11 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 					count);
 		trace_read_io_done(rqstp, fhp, offset, vlen);
 		nfsd_file_put(nf);
+		if (!retry &&
+		    nfsd_cached_files_handle_vfs_error(fhp->fh_dentry, err)) {
+			retry = true;
+			goto try_again;
+		}
 	}
 	trace_read_done(rqstp, fhp, offset, vlen);
 	return err;
@@ -955,8 +1016,10 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
 {
 	__be32			err;
 	struct nfsd_file	*nf;
+	bool			retry = false;
 
 	trace_write_start(rqstp, fhp, offset, vlen);
+try_again:
 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
 	if (err == nfs_ok) {
 		trace_write_opened(rqstp, fhp, offset, vlen);
@@ -964,6 +1027,10 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
 					vlen, cnt, stablep);
 		trace_write_io_done(rqstp, fhp, offset, vlen);
 		nfsd_file_put(nf);
+		if (!retry && nfsd_cached_files_handle_vfs_error(fhp->fh_dentry, err)) {
+			retry = true;
+			goto try_again;
+		}
 	}
 	trace_write_done(rqstp, fhp, offset, vlen);
 	return err;
@@ -986,6 +1053,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct nfsd_file	*nf;
 	loff_t			end = LLONG_MAX;
 	__be32			err = nfserr_inval;
+	bool			retry = false;
 
 	if (offset < 0)
 		goto out;
@@ -995,6 +1063,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			goto out;
 	}
 
+try_again:
 	err = nfsd_file_acquire(rqstp, fhp,
 			NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf);
 	if (err)
@@ -1002,6 +1071,11 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (EX_ISSYNC(fhp->fh_export)) {
 		int err2 = vfs_fsync_range(nf->nf_file, offset, end, 0);
 
+		if (!retry && nfsd_cached_files_handle_vfs_error(fhp->fh_dentry, err2)) {
+			nfsd_file_put(nf);
+			retry = true;
+			goto try_again;
+		}
 		if (err2 != -EINVAL)
 			err = nfserrno(err2);
 		else
@@ -1471,7 +1545,9 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 	struct inode	*dirp;
 	__be32		err;
 	int		host_err;
+	bool		retry = false;
 
+try_again:
 	err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_CREATE);
 	if (err)
 		goto out;
@@ -1524,6 +1600,11 @@ out_dput:
 out_unlock:
 	fh_unlock(ffhp);
 	fh_drop_write(tfhp);
+	if (!retry &&
+	    nfsd_cached_files_handle_vfs_error(tfhp->fh_dentry, host_err)) {
+		retry = true;
+		goto try_again;
+	}
 out:
 	return err;
 
@@ -1532,26 +1613,6 @@ out_nfserr:
 	goto out_unlock;
 }
 
-static void
-nfsd_close_cached_files(struct dentry *dentry)
-{
-	struct inode *inode = d_inode(dentry);
-
-	if (inode && S_ISREG(inode->i_mode))
-		nfsd_file_close_inode_sync(inode);
-}
-
-static bool
-nfsd_has_cached_files(struct dentry *dentry)
-{
-	bool		ret = false;
-	struct inode *inode = d_inode(dentry);
-
-	if (inode && S_ISREG(inode->i_mode))
-		ret = nfsd_file_is_cached(inode);
-	return ret;
-}
-
 /*
  * Rename a file
  * N.B. After this call _both_ ffhp and tfhp need an fh_put
-- 
2.4.3



* [PATCH v1 31/38] nfs: replace d_add with d_splice_alias in atomic_open
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (29 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 30/38] nfsd: close cached file when underlying file systems says no such file Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-19 20:06   ` J. Bruce Fields
  2015-11-17 11:52 ` [PATCH v1 32/38] nfs: add encode_fh export op Jeff Layton
                   ` (6 subsequent siblings)
  37 siblings, 1 reply; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

From: Peng Tao <tao.peng@primarydata.com>

It's a trivial change, but it follows the knfsd export documentation,
which asks for d_splice_alias during lookup.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
---
 fs/nfs/dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ce5a21861074..a4df1137878e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1534,7 +1534,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		switch (err) {
 		case -ENOENT:
 			d_drop(dentry);
-			d_add(dentry, NULL);
+			d_splice_alias(NULL, dentry);
 			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 			break;
 		case -EISDIR:
-- 
2.4.3



* [PATCH v1 32/38] nfs: add encode_fh export op
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (30 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 31/38] nfs: replace d_add with d_splice_alias in atomic_open Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 33/38] nfs: add fh_to_dentry " Jeff Layton
                   ` (5 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

From: Peng Tao <tao.peng@primarydata.com>

A very first step toward re-exporting NFS via knfsd. For now, it just
copies the underlying server's filehandle. Later patches will add
the piece that embeds it in a new FH instead.

[jlayton: add export ops flags field]

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
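A minimal userspace sketch of the encode_fh buffer-size contract this patch
relies on: lengths are counted in 4-byte words, and when the caller's buffer
is too small the required length is reported back through *max_len. struct
demo_fh, words_needed() and demo_encode_fh() are hypothetical and simplified;
only the FILEID_INVALID value is taken from include/linux/exportfs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILEID_INVALID 0xff	/* same value as in include/linux/exportfs.h */

/* Hypothetical, simplified file handle: a length followed by opaque bytes. */
struct demo_fh {
	uint16_t size;
	uint8_t  data[128];
};

/* Number of 32-bit words needed to hold the size field plus the payload. */
static int words_needed(const struct demo_fh *fh)
{
	return (int)((sizeof(fh->size) + fh->size + 3) / 4);
}

/*
 * Copy the handle into buf if it fits. On entry *max_len is the buffer
 * capacity in words; on return it is the number of words required.
 */
static int demo_encode_fh(const struct demo_fh *fh, uint32_t *buf, int *max_len)
{
	int need;

	assert(fh != NULL && buf != NULL && max_len != NULL);

	need = words_needed(fh);
	if (*max_len < need) {
		*max_len = need;	/* tell the caller how much room is needed */
		return FILEID_INVALID;
	}

	memcpy(buf, fh, sizeof(fh->size) + fh->size);
	*max_len = need;
	return need;	/* like this patch, the length doubles as the type */
}
```

A caller that gets FILEID_INVALID back can compare the updated *max_len
against its own limit to see how far short the buffer fell.
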
 fs/nfs/Makefile |  1 +
 fs/nfs/export.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)
 create mode 100644 fs/nfs/export.c

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 8664417955a2..f1443f14be79 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -11,6 +11,7 @@ nfs-y 			:= client.o dir.o file.o getroot.o inode.o super.o \
 nfs-$(CONFIG_ROOT_NFS)	+= nfsroot.o
 nfs-$(CONFIG_SYSCTL)	+= sysctl.o
 nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o
+nfs-$(CONFIG_NFS_REEXPORT) += export.o
 
 obj-$(CONFIG_NFS_V2) += nfsv2.o
 nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfs/export.c b/fs/nfs/export.c
new file mode 100644
index 000000000000..d446f77b26b3
--- /dev/null
+++ b/fs/nfs/export.c
@@ -0,0 +1,46 @@
+/*
+ * Module for pnfs flexfile layout driver.
+ *
+ * Copyright (c) 2015, Primary Data, Inc. All rights reserved.
+ *
+ * Tao Peng <bergwolf@primarydata.com>
+ */
+
+#include <linux/exportfs.h>
+#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+
+#include "nfstrace.h"
+
+#define NFSDBG_FACILITY		NFSDBG_VFS
+
+/*
+ * Let's break subtree checking for now... otherwise we'll have to embed parent fh
+ * but there might not be enough space.
+ */
+static int
+nfs_encode_fh(struct inode *inode, __u32 *p, int *max_len, struct inode *parent)
+{
+	struct nfs_fh *server_fh = NFS_FH(inode);
+	struct nfs_fh *clnt_fh = (struct nfs_fh *)p;
+	int disconnected_fh_len = server_fh->size / 4 + 1;
+
+	dprintk("%s: max fh len %d inode %p parent %p",
+		__func__, *max_len, inode, parent);
+
+	if (*max_len < disconnected_fh_len) {
+		*max_len = disconnected_fh_len;
+		return FILEID_INVALID;
+	}
+
+	nfs_copy_fh(clnt_fh, server_fh);
+	*max_len = disconnected_fh_len;
+
+	dprintk("%s: result fh size %d\n", __func__, *max_len);
+	return *max_len;
+}
+
+const struct export_operations nfs_export_ops = {
+	.encode_fh = nfs_encode_fh,
+	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|EXPORT_OP_CLOSE_BEFORE_UNLINK,
+};
-- 
2.4.3



* [PATCH v1 33/38] nfs: add fh_to_dentry export op
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (31 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 32/38] nfs: add encode_fh export op Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 34/38] nfs: nfs_fh_to_dentry() make use of inode cache Jeff Layton
                   ` (4 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

From: Peng Tao <tao.peng@primarydata.com>

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
---
 fs/nfs/export.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index d446f77b26b3..374dcc2590a0 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -6,10 +6,12 @@
  * Tao Peng <bergwolf@primarydata.com>
  */
 
+#include <linux/dcache.h>
 #include <linux/exportfs.h>
 #include <linux/nfs.h>
 #include <linux/nfs_fs.h>
 
+#include "internal.h"
 #include "nfstrace.h"
 
 #define NFSDBG_FACILITY		NFSDBG_VFS
@@ -40,7 +42,54 @@ nfs_encode_fh(struct inode *inode, __u32 *p, int *max_len, struct inode *parent)
 	return *max_len;
 }
 
+static struct dentry *
+nfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
+		 int fh_len, int fh_type)
+{
+	struct nfs4_label *label = NULL;
+	struct nfs_fattr *fattr = NULL;
+	struct nfs_fh *server_fh = (struct nfs_fh *)fid->raw;
+	const struct nfs_rpc_ops *rpc_ops;
+	struct dentry *dentry = NULL;
+	struct inode *inode;
+	int len = server_fh->size / 4 + 1;
+	int ret;
+
+	/* NULL translates to ESTALE */
+	if (fh_len < len || fh_type != len)
+		return NULL;
+
+	fattr = nfs_alloc_fattr();
+	if (fattr == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	label = nfs4_label_alloc(NFS_SB(sb), GFP_KERNEL);
+	if (IS_ERR(label)) {
+		ret = PTR_ERR(label);
+		goto out;
+	}
+
+	rpc_ops = NFS_SB(sb)->nfs_client->rpc_ops;
+	ret = rpc_ops->getattr(NFS_SB(sb), server_fh, fattr, label);
+	if (ret) {
+		dprintk("%s: getattr failed %d\n", __func__, ret);
+		goto out;
+	}
+
+	inode = nfs_fhget(sb, server_fh, fattr, label);
+	dentry = d_obtain_alias(inode);
+
+out:
+	nfs4_label_free(label);
+	nfs_free_fattr(fattr);
+
+	return ret ? ERR_PTR(ret) : dentry;
+}
+
 const struct export_operations nfs_export_ops = {
 	.encode_fh = nfs_encode_fh,
+	.fh_to_dentry = nfs_fh_to_dentry,
 	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|EXPORT_OP_CLOSE_BEFORE_UNLINK,
 };
-- 
2.4.3



* [PATCH v1 34/38] nfs: nfs_fh_to_dentry() make use of inode cache
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (32 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 33/38] nfs: add fh_to_dentry " Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 35/38] nfs4: add NFSv4 LOOKUPP handlers Jeff Layton
                   ` (3 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

From: Peng Tao <tao.peng@primarydata.com>

Embed the NFS fileid and i_mode in the file handle returned to nfsd, so
that nfs_fh_to_dentry() can use them to query the inode cache and thus
avoid sending a GETATTR.

[jlayton: mask off permission bits in i_mode
	  replace hand-rolled routine with DIV_ROUND_UP
	  add FILEID_NFS_REEXPORT ]

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
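A minimal userspace sketch of the handle prefix described above: the 64-bit
fileid is split across two 32-bit words, followed by the type bits of the
mode. The offsets mirror the enum added by this patch; pack_id(),
unpack_fileid() and the DEMO_* constants are hypothetical, with the mode
masks assumed to match the usual Linux S_IFMT and S_IFREG values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets within the handle, mirroring the enum in this patch. */
enum {
	FILEID_HIGH_OFF = 0,	/* inode fileid high */
	FILEID_LOW_OFF,		/* inode fileid low */
	FILE_I_MODE_OFF,	/* type portion of the mode */
	EMBED_FH_OFF		/* embedded server fh starts here */
};

/* Assumed to match the Linux S_IFMT and S_IFREG values. */
#define DEMO_S_IFMT	0170000u
#define DEMO_S_IFREG	0100000u

/* Store the fileid and the file type bits in the first three words. */
static void pack_id(uint32_t *p, uint64_t fileid, uint32_t mode)
{
	assert(p != NULL);
	p[FILEID_HIGH_OFF] = (uint32_t)(fileid >> 32);
	p[FILEID_LOW_OFF] = (uint32_t)fileid;
	p[FILE_I_MODE_OFF] = mode & DEMO_S_IFMT;	/* permission bits masked off */
}

/* Reassemble the 64-bit fileid from its two halves. */
static uint64_t unpack_fileid(const uint32_t *p)
{
	assert(p != NULL);
	return ((uint64_t)p[FILEID_HIGH_OFF] << 32) | p[FILEID_LOW_OFF];
}
```

Only the type bits are stored, so a later chmod on the file does not change
the prefix and an inode cache lookup keyed on it keeps working.
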
 fs/nfs/export.c          | 51 ++++++++++++++++++++++++++++++++++++++----------
 fs/nfs/inode.c           | 22 +++++++++++++++++++++
 include/linux/exportfs.h |  8 ++++++++
 include/linux/nfs_fs.h   |  1 +
 4 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 374dcc2590a0..a40a9fc31d13 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -16,6 +16,18 @@
 
 #define NFSDBG_FACILITY		NFSDBG_VFS
 
+enum {
+	FILEID_HIGH_OFF = 0,	/* inode fileid high */
+	FILEID_LOW_OFF,		/* inode fileid low */
+	FILE_I_MODE_OFF,	/* type portion of inode->i_mode */
+	EMBED_FH_OFF		/* embedded server fh */
+};
+
+static struct nfs_fh *nfs_exp_embedfh(__u32 *p)
+{
+	return (struct nfs_fh *)(p + EMBED_FH_OFF);
+}
+
 /*
  * Let's break subtree checking for now... otherwise we'll have to embed parent fh
  * but there might not be enough space.
@@ -24,22 +36,28 @@ static int
 nfs_encode_fh(struct inode *inode, __u32 *p, int *max_len, struct inode *parent)
 {
 	struct nfs_fh *server_fh = NFS_FH(inode);
-	struct nfs_fh *clnt_fh = (struct nfs_fh *)p;
-	int disconnected_fh_len = server_fh->size / 4 + 1;
+	struct nfs_fh *clnt_fh = nfs_exp_embedfh(p);
+	int len = EMBED_FH_OFF + DIV_ROUND_UP(server_fh->size, 4) + 1;
 
 	dprintk("%s: max fh len %d inode %p parent %p",
 		__func__, *max_len, inode, parent);
 
-	if (*max_len < disconnected_fh_len) {
-		*max_len = disconnected_fh_len;
+	if (*max_len < len) {
+		dprintk("%s: fh len %d too small, required %d\n",
+			__func__, *max_len, len);
+		*max_len = len;
 		return FILEID_INVALID;
 	}
 
+	p[FILEID_HIGH_OFF] = NFS_FILEID(inode) >> 32;
+	p[FILEID_LOW_OFF] = NFS_FILEID(inode);
+	p[FILE_I_MODE_OFF] = inode->i_mode & S_IFMT;
 	nfs_copy_fh(clnt_fh, server_fh);
-	*max_len = disconnected_fh_len;
+	*max_len = len;
 
-	dprintk("%s: result fh size %d\n", __func__, *max_len);
-	return *max_len;
+	dprintk("%s: result fh fileid %llu mode %u size %d\n",
+		__func__, NFS_FILEID(inode), inode->i_mode, *max_len);
+	return FILEID_NFS_REEXPORT;
 }
 
 static struct dentry *
@@ -48,15 +66,16 @@ nfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
 {
 	struct nfs4_label *label = NULL;
 	struct nfs_fattr *fattr = NULL;
-	struct nfs_fh *server_fh = (struct nfs_fh *)fid->raw;
+	struct nfs_fh *server_fh = nfs_exp_embedfh(fid->raw);
 	const struct nfs_rpc_ops *rpc_ops;
 	struct dentry *dentry = NULL;
 	struct inode *inode;
-	int len = server_fh->size / 4 + 1;
+	int len = EMBED_FH_OFF + DIV_ROUND_UP(server_fh->size, 4) + 1;
+	u32 *p = fid->raw;
 	int ret;
 
 	/* NULL translates to ESTALE */
-	if (fh_len < len || fh_type != len)
+	if (fh_len < len || fh_type != FILEID_NFS_REEXPORT)
 		return NULL;
 
 	fattr = nfs_alloc_fattr();
@@ -65,6 +84,16 @@ nfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
 		goto out;
 	}
 
+	fattr->fileid = ((u64)p[FILEID_HIGH_OFF] << 32) + p[FILEID_LOW_OFF];
+	fattr->mode = p[FILE_I_MODE_OFF];
+	fattr->valid |= NFS_ATTR_FATTR_FILEID | NFS_ATTR_FATTR_MODE;
+
+	dprintk("%s: fileid %llu mode %d\n", __func__, fattr->fileid, fattr->mode);
+
+	inode = nfs_ilookup(sb, fattr, server_fh);
+	if (inode)
+		goto out_found;
+
 	label = nfs4_label_alloc(NFS_SB(sb), GFP_KERNEL);
 	if (IS_ERR(label)) {
 		ret = PTR_ERR(label);
@@ -79,6 +108,8 @@ nfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
 	}
 
 	inode = nfs_fhget(sb, server_fh, fattr, label);
+
+out_found:
 	dentry = d_obtain_alias(inode);
 
 out:
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 326d9e10d833..bf6dd7a975b6 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -342,6 +342,28 @@ void nfs_setsecurity(struct inode *inode, struct nfs_fattr *fattr,
 #endif
 EXPORT_SYMBOL_GPL(nfs_setsecurity);
 
+/* Search for inode identified by fh, fileid and i_mode in inode cache. */
+struct inode *
+nfs_ilookup(struct super_block *sb, struct nfs_fattr *fattr, struct nfs_fh *fh)
+{
+	struct nfs_find_desc desc = {
+		.fh	= fh,
+		.fattr	= fattr,
+	};
+	struct inode *inode;
+	unsigned long hash;
+
+	if (!(fattr->valid & NFS_ATTR_FATTR_FILEID) ||
+	    !(fattr->valid & NFS_ATTR_FATTR_MODE))
+		return NULL;
+
+	hash = nfs_fattr_to_ino_t(fattr);
+	inode = ilookup5(sb, hash, nfs_find_actor, &desc);
+
+	dprintk("%s: returning %p\n", __func__, inode);
+	return inode;
+}
+
 /*
  * This is our front-end to iget that looks up inodes by file handle
  * instead of inode number.
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index e8ba130f0aa5..783cedf2f636 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -97,6 +97,14 @@ enum fid_type {
 	FILEID_FAT_WITH_PARENT = 0x72,
 
 	/*
+	 * High-order 4 bytes of 64-bit fileid
+	 * Low-order 4 bytes of 64-bit fileid
+	 * inode->i_mode & S_IFMT (4 bytes)
+	 * Embedded copy of fh from underlying server
+	 */
+	FILEID_NFS_REEXPORT = 0x81,
+
+	/*
 	 * Filesystems must not use 0xff file ID.
 	 */
 	FILEID_INVALID = 0xff,
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index c0e961474a52..958092e8b7ca 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -344,6 +344,7 @@ extern void nfs_zap_caches(struct inode *);
 extern void nfs_invalidate_atime(struct inode *);
 extern struct inode *nfs_fhget(struct super_block *, struct nfs_fh *,
 				struct nfs_fattr *, struct nfs4_label *);
+struct inode *nfs_ilookup(struct super_block *sb, struct nfs_fattr *, struct nfs_fh *);
 extern int nfs_refresh_inode(struct inode *, struct nfs_fattr *);
 extern int nfs_post_op_update_inode(struct inode *inode, struct nfs_fattr *fattr);
 extern int nfs_post_op_update_inode_force_wcc(struct inode *inode, struct nfs_fattr *fattr);
-- 
2.4.3



* [PATCH v1 35/38] nfs4: add NFSv4 LOOKUPP handlers
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (33 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 34/38] nfs: nfs_fh_to_dentry() make use of inode cache Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 36/38] nfs: add a get_parent export operation for NFS Jeff Layton
                   ` (2 subsequent siblings)
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

This will be needed in order to implement the get_parent export op
for nfsd.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfs/nfs4proc.c       | 49 +++++++++++++++++++++++++++++++++
 fs/nfs/nfs4trace.h      | 29 ++++++++++++++++++++
 fs/nfs/nfs4xdr.c        | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/nfs4.h    |  1 +
 include/linux/nfs_xdr.h | 17 +++++++++++-
 5 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 765a03559363..1f75374112b0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3573,6 +3573,54 @@ nfs4_proc_lookup_mountpoint(struct inode *dir, struct qstr *name,
 	return (client == NFS_CLIENT(dir)) ? rpc_clone_client(client) : client;
 }
 
+static int _nfs4_proc_lookupp(struct inode *inode,
+		struct nfs_fh *fhandle, struct nfs_fattr *fattr,
+		struct nfs4_label *label)
+{
+	struct rpc_clnt *clnt = NFS_CLIENT(inode);
+	struct nfs_server *server = NFS_SERVER(inode);
+	int		       status;
+	struct nfs4_lookupp_arg args = {
+		.bitmask = server->attr_bitmask,
+		.fh = NFS_FH(inode),
+	};
+	struct nfs4_lookupp_res res = {
+		.server = server,
+		.fattr = fattr,
+		.label = label,
+		.fh = fhandle,
+	};
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_LOOKUPP],
+		.rpc_argp = &args,
+		.rpc_resp = &res,
+	};
+
+	args.bitmask = nfs4_bitmask(server, label);
+
+	nfs_fattr_init(fattr);
+
+	dprintk("NFS call  lookupp ino=0x%lx\n", inode->i_ino);
+	status = nfs4_call_sync(clnt, server, &msg, &args.seq_args,
+				&res.seq_res, 0);
+	dprintk("NFS reply lookupp: %d\n", status);
+	return status;
+}
+
+static int nfs4_proc_lookupp(struct inode *inode, struct nfs_fh *fhandle,
+			     struct nfs_fattr *fattr, struct nfs4_label *label)
+{
+	struct nfs4_exception exception = { };
+	int err;
+	do {
+		err = _nfs4_proc_lookupp(inode, fhandle, fattr, label);
+		trace_nfs4_lookupp(inode, err);
+		err = nfs4_handle_exception(NFS_SERVER(inode), err,
+				&exception);
+	} while (exception.retry);
+	return err;
+}
+
 static int _nfs4_proc_access(struct inode *inode, struct nfs_access_entry *entry)
 {
 	struct nfs_server *server = NFS_SERVER(inode);
@@ -8791,6 +8839,7 @@ const struct nfs_rpc_ops nfs_v4_clientops = {
 	.getattr	= nfs4_proc_getattr,
 	.setattr	= nfs4_proc_setattr,
 	.lookup		= nfs4_proc_lookup,
+	.lookupp	= nfs4_proc_lookupp,
 	.access		= nfs4_proc_access,
 	.readlink	= nfs4_proc_readlink,
 	.create		= nfs4_proc_create,
diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
index 671cf68fe56b..3ddd06298c4d 100644
--- a/fs/nfs/nfs4trace.h
+++ b/fs/nfs/nfs4trace.h
@@ -737,6 +737,35 @@ DEFINE_NFS4_LOOKUP_EVENT(nfs4_remove);
 DEFINE_NFS4_LOOKUP_EVENT(nfs4_get_fs_locations);
 DEFINE_NFS4_LOOKUP_EVENT(nfs4_secinfo);
 
+TRACE_EVENT(nfs4_lookupp,
+		TP_PROTO(
+			const struct inode *inode,
+			int error
+		),
+
+		TP_ARGS(inode, error),
+
+		TP_STRUCT__entry(
+			__field(dev_t, dev)
+			__field(u64, ino)
+			__field(int, error)
+		),
+
+		TP_fast_assign(
+			__entry->dev = inode->i_sb->s_dev;
+			__entry->ino = NFS_FILEID(inode);
+			__entry->error = error;
+		),
+
+		TP_printk(
+			"error=%d (%s) inode=%02x:%02x:%llu",
+			__entry->error,
+			show_nfsv4_errors(__entry->error),
+			MAJOR(__entry->dev), MINOR(__entry->dev),
+			(unsigned long long)__entry->ino
+		)
+);
+
 TRACE_EVENT(nfs4_rename,
 		TP_PROTO(
 			const struct inode *olddir,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index dfed4f5c8fcc..38485cc31e28 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -158,6 +158,8 @@ static int nfs4_stat_to_errno(int);
 				(op_decode_hdr_maxsz)
 #define encode_lookup_maxsz	(op_encode_hdr_maxsz + nfs4_name_maxsz)
 #define decode_lookup_maxsz	(op_decode_hdr_maxsz)
+#define encode_lookupp_maxsz	(op_encode_hdr_maxsz)
+#define decode_lookupp_maxsz	(op_decode_hdr_maxsz)
 #define encode_share_access_maxsz \
 				(2)
 #define encode_createmode_maxsz	(1 + encode_attrs_maxsz + encode_verifier_maxsz)
@@ -611,6 +613,18 @@ static int nfs4_stat_to_errno(int);
 				decode_lookup_maxsz + \
 				decode_getattr_maxsz + \
 				decode_getfh_maxsz)
+#define NFS4_enc_lookupp_sz	(compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz + \
+				encode_putfh_maxsz + \
+				encode_lookupp_maxsz + \
+				encode_getattr_maxsz + \
+				encode_getfh_maxsz)
+#define NFS4_dec_lookupp_sz	(compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz + \
+				decode_lookupp_maxsz + \
+				decode_getattr_maxsz + \
+				decode_getfh_maxsz)
 #define NFS4_enc_lookup_root_sz (compound_encode_hdr_maxsz + \
 				encode_sequence_maxsz + \
 				encode_putrootfh_maxsz + \
@@ -1368,6 +1382,11 @@ static void encode_lookup(struct xdr_stream *xdr, const struct qstr *name, struc
 	encode_string(xdr, name->len, name->name);
 }
 
+static void encode_lookupp(struct xdr_stream *xdr, struct compound_hdr *hdr)
+{
+	encode_op_hdr(xdr, OP_LOOKUPP, decode_lookupp_maxsz, hdr);
+}
+
 static void encode_share_access(struct xdr_stream *xdr, u32 share_access)
 {
 	__be32 *p;
@@ -2111,6 +2130,25 @@ static void nfs4_xdr_enc_lookup(struct rpc_rqst *req, struct xdr_stream *xdr,
 }
 
 /*
+ * Encode LOOKUPP request
+ */
+static void nfs4_xdr_enc_lookupp(struct rpc_rqst *req, struct xdr_stream *xdr,
+				const struct nfs4_lookupp_arg *args)
+{
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->seq_args, &hdr);
+	encode_putfh(xdr, args->fh, &hdr);
+	encode_lookupp(xdr, &hdr);
+	encode_getfh(xdr, &hdr);
+	encode_getfattr(xdr, args->bitmask, &hdr);
+	encode_nops(&hdr);
+}
+
+/*
  * Encode LOOKUP_ROOT request
  */
 static void nfs4_xdr_enc_lookup_root(struct rpc_rqst *req,
@@ -4979,6 +5017,11 @@ static int decode_lookup(struct xdr_stream *xdr)
 	return decode_op_hdr(xdr, OP_LOOKUP);
 }
 
+static int decode_lookupp(struct xdr_stream *xdr)
+{
+	return decode_op_hdr(xdr, OP_LOOKUPP);
+}
+
 /* This is too sick! */
 static int decode_space_limit(struct xdr_stream *xdr,
 		unsigned long *pagemod_limit)
@@ -6146,6 +6189,35 @@ out:
 }
 
 /*
+ * Decode LOOKUPP response
+ */
+static int nfs4_xdr_dec_lookupp(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
+				struct nfs4_lookupp_res *res)
+{
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_lookupp(xdr);
+	if (status)
+		goto out;
+	status = decode_getfh(xdr, res->fh);
+	if (status)
+		goto out;
+	status = decode_getfattr_label(xdr, res->fattr, res->label, res->server);
+out:
+	return status;
+}
+
+/*
  * Decode LOOKUP_ROOT response
  */
 static int nfs4_xdr_dec_lookup_root(struct rpc_rqst *rqstp,
@@ -7471,6 +7543,7 @@ struct rpc_procinfo	nfs4_procedures[] = {
 	PROC(ACCESS,		enc_access,		dec_access),
 	PROC(GETATTR,		enc_getattr,		dec_getattr),
 	PROC(LOOKUP,		enc_lookup,		dec_lookup),
+	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
 	PROC(LOOKUP_ROOT,	enc_lookup_root,	dec_lookup_root),
 	PROC(REMOVE,		enc_remove,		dec_remove),
 	PROC(RENAME,		enc_rename,		dec_rename),
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index e7e78537aea2..d8ef0334672e 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -461,6 +461,7 @@ enum {
 	NFSPROC4_CLNT_ACCESS,
 	NFSPROC4_CLNT_GETATTR,
 	NFSPROC4_CLNT_LOOKUP,
+	NFSPROC4_CLNT_LOOKUPP,
 	NFSPROC4_CLNT_LOOKUP_ROOT,
 	NFSPROC4_CLNT_REMOVE,
 	NFSPROC4_CLNT_RENAME,
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 570d630f98ae..a648595e922b 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -983,7 +983,6 @@ struct nfs4_link_res {
 	struct nfs_fattr *		dir_attr;
 };
 
-
 struct nfs4_lookup_arg {
 	struct nfs4_sequence_args	seq_args;
 	const struct nfs_fh *		dir_fh;
@@ -999,6 +998,20 @@ struct nfs4_lookup_res {
 	struct nfs4_label		*label;
 };
 
+struct nfs4_lookupp_arg {
+	struct nfs4_sequence_args	seq_args;
+	const struct nfs_fh		*fh;
+	const u32			*bitmask;
+};
+
+struct nfs4_lookupp_res {
+	struct nfs4_sequence_res	seq_res;
+	const struct nfs_server		*server;
+	struct nfs_fattr		*fattr;
+	struct nfs_fh			*fh;
+	struct nfs4_label		*label;
+};
+
 struct nfs4_lookup_root_arg {
 	struct nfs4_sequence_args	seq_args;
 	const u32 *			bitmask;
@@ -1516,6 +1529,8 @@ struct nfs_rpc_ops {
 	int	(*lookup)  (struct inode *, struct qstr *,
 			    struct nfs_fh *, struct nfs_fattr *,
 			    struct nfs4_label *);
+	int	(*lookupp) (struct inode *, struct nfs_fh *,
+			    struct nfs_fattr *, struct nfs4_label *);
 	int	(*access)  (struct inode *, struct nfs_access_entry *);
 	int	(*readlink)(struct inode *, struct page *, unsigned int,
 			    unsigned int);
-- 
2.4.3



* [PATCH v1 36/38] nfs: add a get_parent export operation for NFS
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (34 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 35/38] nfs4: add NFSv4 LOOKUPP handlers Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:52 ` [PATCH v1 37/38] nfs: set export ops Jeff Layton
  2015-11-17 11:53 ` [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation Jeff Layton
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

In order to connect disconnected dentries into the tree, we must be
able to determine their parent. Add a get_parent routine for NFS
which does a LOOKUPP operation (in the case of NFSv4), finds the
parent inode and then obtains the dentry for it.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/nfs/export.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index a40a9fc31d13..e1c8165652c5 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -119,8 +119,51 @@ out:
 	return ret ? ERR_PTR(ret) : dentry;
 }
 
+static struct dentry *
+nfs_get_parent(struct dentry *dentry)
+{
+	int ret;
+	struct inode *inode = d_inode(dentry), *pinode;
+	struct super_block *sb = inode->i_sb;
+	struct nfs_server *server = NFS_SB(sb);
+	struct nfs_fattr *fattr = NULL;
+	struct nfs4_label *label = NULL;
+	struct dentry *parent;
+	struct nfs_rpc_ops const *ops = server->nfs_client->rpc_ops;
+	struct nfs_fh fh;
+
+	if (!ops->lookupp)
+		return ERR_PTR(-EACCES);
+
+	fattr = nfs_alloc_fattr();
+	if (fattr == NULL) {
+		parent = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	label = nfs4_label_alloc(server, GFP_KERNEL);
+	if (IS_ERR(label)) {
+		parent = ERR_CAST(label);
+		goto out;
+	}
+
+	ret = ops->lookupp(inode, &fh, fattr, label);
+	if (ret) {
+		parent = ERR_PTR(ret);
+		goto out;
+	}
+
+	pinode = nfs_fhget(sb, &fh, fattr, label);
+	parent = d_obtain_alias(pinode);
+out:
+	nfs4_label_free(IS_ERR(label) ? NULL : label);
+	nfs_free_fattr(fattr);
+	return parent;
+}
+
 const struct export_operations nfs_export_ops = {
 	.encode_fh = nfs_encode_fh,
 	.fh_to_dentry = nfs_fh_to_dentry,
+	.get_parent = nfs_get_parent,
 	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|EXPORT_OP_CLOSE_BEFORE_UNLINK,
 };
-- 
2.4.3



* [PATCH v1 37/38] nfs: set export ops
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (35 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 36/38] nfs: add a get_parent export operation for NFS Jeff Layton
@ 2015-11-17 11:52 ` Jeff Layton
  2015-11-17 11:53 ` [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation Jeff Layton
  37 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:52 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

From: Peng Tao <tao.peng@primarydata.com>

All the pieces are in place. We can now set the export ops in the
superblock.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
---
 fs/nfs/internal.h | 2 ++
 fs/nfs/super.c    | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 56cfde26fb9c..fb83d83c959e 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -10,6 +10,8 @@
 
 #define NFS_MS_MASK (MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_SYNCHRONOUS)
 
+extern const struct export_operations nfs_export_ops;
+
 struct nfs_string;
 
 /* Maximum number of readahead requests
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index f1268280244e..ee3cc13c92cb 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2321,6 +2321,9 @@ void nfs_fill_super(struct super_block *sb, struct nfs_mount_info *mount_info)
 		 */
 		sb->s_flags |= MS_POSIXACL;
 		sb->s_time_gran = 1;
+#if IS_ENABLED(CONFIG_NFS_REEXPORT)
+		sb->s_export_op = &nfs_export_ops;
+#endif
 	}
 
  	nfs_initialise_sb(sb);
@@ -2341,6 +2344,7 @@ void nfs_clone_super(struct super_block *sb, struct nfs_mount_info *mount_info)
 	sb->s_xattr = old_sb->s_xattr;
 	sb->s_op = old_sb->s_op;
 	sb->s_time_gran = 1;
+	sb->s_export_op = old_sb->s_export_op;
 
 	if (server->nfs_client->rpc_ops->version != 2) {
 		/* The VFS shouldn't apply the umask to mode bits. We will do
-- 
2.4.3



* [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation
  2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
                   ` (36 preceding siblings ...)
  2015-11-17 11:52 ` [PATCH v1 37/38] nfs: set export ops Jeff Layton
@ 2015-11-17 11:53 ` Jeff Layton
  2015-11-18 20:22   ` J. Bruce Fields
  37 siblings, 1 reply; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 11:53 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 Documentation/filesystems/nfs/reexport.txt | 95 ++++++++++++++++++++++++++++++
 fs/nfs/Kconfig                             | 11 ++++
 2 files changed, 106 insertions(+)
 create mode 100644 Documentation/filesystems/nfs/reexport.txt

diff --git a/Documentation/filesystems/nfs/reexport.txt b/Documentation/filesystems/nfs/reexport.txt
new file mode 100644
index 000000000000..4ecfd3832338
--- /dev/null
+++ b/Documentation/filesystems/nfs/reexport.txt
@@ -0,0 +1,95 @@
+Re-exporting nfs via nfsd:
+--------------------------
+It is possible to reexport an nfs filesystem via nfsd, but there are
+some limitations to this scheme.
+
+The primary use case for this is allowing clients that do not support
+newer versions of NFS to access servers that do not export older
+versions of NFS. In particular, it's a way to distribute pnfs support to
+non-pnfs enabled clients (albeit at the cost of an extra hop).
+
+There are a number of caveats to doing this -- be sure to read the
+entire document below and make sure that you know what you're doing!
+
+Quick Start:
+------------
+1) Ensure that the kernel is built with CONFIG_NFS_REEXPORT
+
+2) Mount the _entire_ directory tree that you wish to reexport on the
+server. nfsd is unable to cross server filesystem boundaries
+automatically, so the entire tree to be reexported must be mounted
+prior to exporting.
+
+3) Add exports for the reexported filesystem to /etc/exports, assigning
+fsid= values to each. NFS doesn't have a persistent UUID or device
+number that is guaranteed to be unique across multiple servers, so
+fsid= values must always be explicitly assigned.
+
+4) Avoid stateful operations from the clients. File locking is
+particularly problematic, but reexporting NFSv4 via NFSv4 is likely to
+have similar problems with open and delegation stateids as well.
+
+The gory details of reexporting:
+--------------------------------
+Below is a detailed list of the _known_ problems with reexporting NFS
+via nfsd. Be aware of these facts when using this feature:
+
+Filehandle size:
+----------------
+The maximum filehandle size is governed by the NFS version. Version 2
+used fixed 32 byte filehandles. Version 3 moved to variable length
+filehandles that can be up to 64 bytes in size. NFSv4 increased that
+maximum to 128 bytes.
+
+When reexporting an NFS filesystem, the underlying filehandle from the
+server must be embedded inside the filehandles presented to clients.
+Thus if the underlying server presents filehandles that are too big, the
+reexporting server can fail to encode them. This can lead to
+NFSERR_OPNOTSUPP errors being returned to clients.
+
+This is not a trivial thing to programmatically determine ahead of time
+(and it can vary even within the same server), so some foreknowledge of
+how the underlying server constructs filehandles, and of their maximum
+size, is a must.
+
+No subtree checking:
+--------------------
+Subtree checking requires that information about the parent be encoded
+in non-directory filehandles. Since filehandle space is already at a
+premium, subtree checking is disallowed on reexported nfs filesystems.
+
+No crossing of mountpoints:
+---------------------------
+Crossing from one exported filesystem to another typically involves the
+nfs client doing a behind-the-scenes mount of the "child" filesystem. nfsd
+lacks the machinery to do this. It could (in principle) be added, but
+there's really no point as there is no way to ensure that the fsid
+(filesystem identifier) value that got assigned was persistent.
+
+Lack of a persistent fsid= value:
+---------------------------------
+NFS filesystems don't have a persistent value that we can stuff into
+the fsid. We could repackage the one that the server provides, but that
+could lead to collisions if the reexporting server has mounts to
+different underlying servers. Thus, reexporting NFS requires assigning
+a fsid= value in the export options. This value must be persistent
+across reboots of the reexporting server as well or the clients will
+see filehandles change (the dreaded "Stale NFS filehandle" error).
+
+Statefulness and locking:
+-------------------------
+Holding any sort of state across a reexported NFS mount is problematic.
+It's always possible that the reexporting server could reboot, in which
+case it will lose track of the state held on the underlying server.
+
+When it comes back up, the clients will then try to reclaim that state
+from the reexporter, but the reexporter can't provide the necessary
+guarantees to ensure that conflicting state wasn't set and released
+during the time it was down. This may mean silent data corruption.
+Any sort of stateful operations against the reexporting fileserver are
+best avoided.
+
+Because of this, it's best to use a configuration that does not involve
+the clients holding any state on the reexporter. For example, reexporting
+a NFSv4 filesystem to legacy clients via NFSv3 (sans file locking) should
+basically work.
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index f31fd0dd92c6..92ad6bcc81cc 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -200,3 +200,14 @@ config NFS_DEBUG
 	depends on NFS_FS && SUNRPC_DEBUG
 	select CRC32
 	default y
+
+config NFS_REEXPORT
+	bool "Allow reexporting of NFS filesystems via knfsd"
+	depends on NFSD
+	default n
+	help
+	  This option allows NFS filesystems to be re-exported via knfsd.
+	  This is generally only useful in some very limited situations.
+	  One such is to allow legacy client access to servers that do not
+	  support older NFS versions. Use with caution and be sure to read
+	  Documentation/filesystems/nfs/reexport.txt first!
-- 
2.4.3



* Re: [PATCH v1 27/38] nfsd: allow filesystems to opt out of subtree checking
  2015-11-17 11:52 ` [PATCH v1 27/38] nfsd: allow filesystems to opt out of subtree checking Jeff Layton
@ 2015-11-17 22:53   ` Jeff Layton
  0 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-17 22:53 UTC (permalink / raw)
  To: bfields, trond.myklebust
  Cc: linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel, steved

On Tue, 17 Nov 2015 06:52:49 -0500
Jeff Layton <jlayton@poochiereds.net> wrote:

> When we start allowing NFS to be reexported, then we have some problems
> when it comes to subtree checking. In principle, we could allow it, but
> it would mean encoding parent info in the filehandles and there may not
> be enough space for that in a NFSv3 filehandle.
> 
> To enforce this at export upcall time, we add a new export_ops flag
> that declares the filesystem ineligible for subtree checking.
> 
> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
> ---
>  Documentation/filesystems/nfs/Exporting | 14 +++++++++++++-
>  fs/nfsd/export.c                        |  6 ++++++
>  include/linux/exportfs.h                |  1 +
>  3 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting
> index fa636cde3907..a89b5be22703 100644
> --- a/Documentation/filesystems/nfs/Exporting
> +++ b/Documentation/filesystems/nfs/Exporting
> @@ -160,7 +160,7 @@ contains a "flags" field that allows the filesystem to communicate to nfsd
>  that it may want to do things differently when dealing with it. The
>  following flags are defined:
>  
> -  EXPORT_OP_NOWCC
> +  EXPORT_OP_NOWCC - disable NFSv3 WCC attributes on this filesystem
>      RFC 1813 recommends that servers always send weak cache consistency
>      (WCC) data to the client after each operation. The server should
>      atomically collect attributes about the inode, do an operation on it,
> @@ -174,3 +174,15 @@ following flags are defined:
>      this on filesystems that have an expensive ->getattr inode operation,
>      or when atomicity between pre and post operation attribute collection
>      is impossible to guarantee.
> +
> +  EXPORT_OP_NOSUBTREECHK - disallow subtree checking on this fs
> +    Many NFS operations deal with filehandles, which the server must then
> +    vet to ensure that they live inside of an exported tree. When the
> +    export consists of an entire filesystem, this is trivial. nfsd can just
> +    ensure that the filehandle lives on the filesystem. When only part of a
> +    filesystem is exported however, then nfsd must walk the ancestors of the
> +    inode to ensure that it's within an exported subtree. This is an
> +    expensive operation and not all filesystems can support it properly.
> +    This flag exempts the filesystem from subtree checking and causes
> +    exportfs to get back an error if it tries to enable subtree checking
> +    on it.
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index 4b504edff121..295d22e8fdad 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -392,6 +392,12 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid)
>  		return -EINVAL;
>  	}
>  
> +	if (inode->i_sb->s_export_op->flags & EXPORT_OP_NOSUBTREECHK &&
> +	    (*flags & NFSEXP_NOSUBTREECHECK)) {

I had the sense reversed here, so this is not working properly. It
should be checking !(*flags & NFSEXP_NOSUBTREECHECK).
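
As a minimal sketch of the corrected sense (not the actual follow-up
patch), the check can be modelled in userspace C as below.
EXPORT_OP_NOSUBTREECHK uses the value from this patch; the value given
for NFSEXP_NOSUBTREECHECK and the helper name check_subtree_flags() are
assumptions made for illustration.

```c
#include <errno.h>

/* Value taken from the patch under review. */
#define EXPORT_OP_NOSUBTREECHK	0x2
/* Assumed value, for illustration only. */
#define NFSEXP_NOSUBTREECHECK	0x0400

/*
 * Hypothetical helper: reject an export when the filesystem cannot do
 * subtree checking but the export options did NOT ask to disable it.
 */
static int check_subtree_flags(unsigned long fs_flags, int export_flags)
{
	if ((fs_flags & EXPORT_OP_NOSUBTREECHK) &&
	    !(export_flags & NFSEXP_NOSUBTREECHECK))
		return -EINVAL;
	return 0;
}
```

With this sense, an export that leaves subtree checking enabled on such
a filesystem gets -EINVAL, while one that sets the no-subtree-check
flag is accepted.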

> +		dprintk("%s: %s does not support subtree checking!\n",
> +			__func__, inode->i_sb->s_type->name);
> +		return -EINVAL;
> +	}
>  	return 0;
>  
>  }
> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> index 600c3fccc999..5f9b5345f717 100644
> --- a/include/linux/exportfs.h
> +++ b/include/linux/exportfs.h
> @@ -215,6 +215,7 @@ struct export_operations {
>  	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
>  			     int nr_iomaps, struct iattr *iattr);
>  #define	EXPORT_OP_NOWCC		(0x1)	/* Don't collect wcc data for NFSv3 replies */
> +#define	EXPORT_OP_NOSUBTREECHK	(0x2)	/* Subtree checking is not supported! */
>  	unsigned long	flags;
>  };
>  

...but:

I may have to drop this patch, at least for now...

Why? exportfs' test_export function does not pass in
NFSEXP_NOSUBTREECHECK, but mountd (of course) does. So, we can't actually
do reliable checking of export options that involve flags at exportfs
time right now.

That doesn't look too hard to fix in exportfs (just a matter of passing
in the actual flags instead of just the FSID flag), but I wonder if we
might end up running afoul of older kernels if we do that?

-- 
Jeff Layton <jeff.layton@primarydata.com>


* Re: [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation
  2015-11-17 11:53 ` [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation Jeff Layton
@ 2015-11-18 20:22   ` J. Bruce Fields
  2015-11-18 21:15     ` Jeff Layton
  0 siblings, 1 reply; 52+ messages in thread
From: J. Bruce Fields @ 2015-11-18 20:22 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust, linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> +Filehandle size:
> +----------------
> +The maximum filehandle size is governed by the NFS version. Version 2
> +used fixed 32 byte filehandles. Version 3 moved to variable length
> +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> +maximum to 128 bytes.
> +
> +When reexporting an NFS filesystem, the underlying filehandle from the
> +server must be embedded inside the filehandles presented to clients.
> +Thus if the underlying server presents filehandles that are too big, the
> +reexporting server can fail to encode them. This can lead to
> +NFSERR_OPNOTSUPP errors being returned to clients.
> +
> +This is not a trivial thing to programmatically determine ahead of time
> +(and it can vary even within the same server), so some foreknowledge of
> +how the underlying server constructs filehandles, and their maximum
> +size is a must.

This is the trickiest one, since it depends on an undocumented
implementation detail of the server.

Do we even know if this works for all the exportable Linux filesystems?

If proxying NFSv4.x servers is actually useful, could we add a per-fs
maximum-filesystem-size attribute to the protocol?

--b.


* Re: [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation
  2015-11-18 20:22   ` J. Bruce Fields
@ 2015-11-18 21:15     ` Jeff Layton
  2015-11-18 22:30       ` Frank Filz
  2015-11-20  0:04       ` J. Bruce Fields
  0 siblings, 2 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-18 21:15 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: trond.myklebust, linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

On Wed, 18 Nov 2015 15:22:20 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > +Filehandle size:
> > +----------------
> > +The maximum filehandle size is governed by the NFS version. Version 2
> > +used fixed 32 byte filehandles. Version 3 moved to variable length
> > +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> > +maximum to 128 bytes.
> > +
> > +When reexporting an NFS filesystem, the underlying filehandle from the
> > +server must be embedded inside the filehandles presented to clients.
> > +Thus if the underlying server presents filehandles that are too big, the
> > +reexporting server can fail to encode them. This can lead to
> > +NFSERR_OPNOTSUPP errors being returned to clients.
> > +
> > +This is not a trivial thing to programmatically determine ahead of time
> > +(and it can vary even within the same server), so some foreknowledge of
> > +how the underlying server constructs filehandles, and their maximum
> > +size is a must.
> 
> This is the trickiest one, since it depends on an undocumented
> implementation detail of the server.
> 

Yes, indeed...

> Do we even know if this works for all the exportable Linux filesystems?
> 
> If proxying NFSv4.x servers is actually useful, could we add a per-fs
> maximum-filesystem-size attribute to the protocol?
> 

Erm, I think you mean maximum-filehandle-size, but I get your point...

It's tough to do more than a quick survey, but looking at new-style fh:

The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
can get that down to 8 bytes if you specify the fsid directly. The fsid
choice is weird, because it sort of depends on the filehandle sent by
the client (which is used as a template), so I guess we really do need
to assume worst-case.

Once that's done, the encode_fh routines add the fileid part. btrfs has
a pretty large maximum one: 40 bytes. That brings the max size up to 68
bytes, which is already too large for NFSv3, before we ever get to
the part where we embed that inside another fh. We require another 12
bytes on top of the "underlying" filehandle for reexporting.

So, no this may very well not work for all exportable Linux
filesystems, but it sort of depends on the situation (and to some
degree, what gets sent by the clients). That's what makes this so hard
to figure out programmatically.
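
The arithmetic above can be sketched in C. This is only a rough model
under stated assumptions: the 28, 40 and 12 byte figures are the ones
quoted in this mail, the helper names are hypothetical, and any bytes
the reexporting server adds for its own fsid are not counted.

```c
#include <stdbool.h>
#include <stddef.h>

#define NFS3_FHSIZE		64	/* NFSv3 maximum filehandle size */
#define NFS4_FHSIZE		128	/* NFSv4 maximum filehandle size */
#define REEXPORT_OVERHEAD	12	/* extra bytes quoted above for reexporting */

/* Size of the filehandle issued by the underlying server. */
static size_t underlying_fh_size(size_t fsid_len, size_t fileid_len)
{
	return fsid_len + fileid_len;
}

/* Does the underlying handle plus the reexport overhead fit in max_fh bytes? */
static bool reexport_fh_fits(size_t underlying_len, size_t max_fh)
{
	return underlying_len + REEXPORT_OVERHEAD <= max_fh;
}
```

For the worst case above, underlying_fh_size(28, 40) is 68, so
reexport_fh_fits(68, NFS3_FHSIZE) is false while
reexport_fh_fits(68, NFS4_FHSIZE) is true. Under these assumptions, at
most 52 bytes of underlying handle fit within an NFSv3 filehandle.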

As far as extending the protocol...that's not a bad idea, though that's
obviously a longer-term solution. I don't think we can reasonably rely
on that anyway. Maybe though...

-- 
Jeff Layton <jlayton@poochiereds.net>


* RE: [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation
  2015-11-18 21:15     ` Jeff Layton
@ 2015-11-18 22:30       ` Frank Filz
  2015-11-19 14:01         ` Jeff Layton
  2015-11-20  0:04       ` J. Bruce Fields
  1 sibling, 1 reply; 52+ messages in thread
From: Frank Filz @ 2015-11-18 22:30 UTC (permalink / raw)
  To: 'Jeff Layton', 'J. Bruce Fields'
  Cc: trond.myklebust, linux-nfs, 'Eric Paris',
	'Alexander Viro',
	linux-fsdevel

Jeff Layton said:
> On Wed, 18 Nov 2015 15:22:20 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > > +Filehandle size:
> > > +----------------
> > > +The maximum filehandle size is governed by the NFS version. Version
> > > +2 used fixed 32 byte filehandles. Version 3 moved to variable
> > > +length filehandles that can be up to 64 bytes in size. NFSv4
> > > +increased that maximum to 128 bytes.
> > > +
> > > +When reexporting an NFS filesystem, the underlying filehandle from
> > > +the server must be embedded inside the filehandles presented to clients.
> > > +Thus if the underlying server presents filehandles that are too
> > > +big, the reexporting server can fail to encode them. This can lead
> > > +to NFSERR_OPNOTSUPP errors being returned to clients.
> > > +
> > > +This is not a trivial thing to programmatically determine ahead of
> > > +time (and it can vary even within the same server), so some
> > > +foreknowledge of how the underlying server constructs filehandles,
> > > +and their maximum size is a must.
> >
> > This is the trickiest one, since it depends on an undocumented
> > implementation detail of the server.
> >
> 
> Yes, indeed...
> 
> > Do we even know if this works for all the exportable Linux filesystems?
> >
> > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > maximum-filesystem-size attribute to the protocol?
> >
> 
> Erm, I think you mean maximum-filehandle-size, but I get your point...
> 
> It's tough to do more than a quick survey, but looking at new-style fh:
> 
> The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> can get that down to 8 bytes if you specify the fsid directly. The fsid
> choice is weird, because it sort of depends on the filehandle sent by
> the client (which is used as a template), so I guess we really do need
> to assume worst-case.
> 
> Once that's done, the encode_fh routines add the fileid part. btrfs has
> a pretty large maximum one: 40 bytes. That brings the max size up to 68
> bytes, which is already too large for NFSv3, before we ever get to
> the part where we embed that inside another fh. We require another 12
> bytes on top of the "underlying" filehandle for reexporting.
> 
> So, no this may very well not work for all exportable Linux
> filesystems, but it sort of depends on the situation (and to some
> degree, what gets sent by the clients). That's what makes this so hard
> to figure out programmatically.
> 
> As far as extending the protocol...that's not a bad idea, though that's
> obviously a longer-term solution. I don't think we can reasonably rely
> on that anyway. Maybe though...

I've been thinking about this kind of thing with Ganesha's proxy server, and
conveniently, you have also provided a good use case for proxy...

One option I was going to give Ganesha is the ability to indicate in the
export configuration that the upstream server is Ganesha, and to expect the
export configuration to be mirrored (easy for a config tool to do across the
set of servers, primary and proxy) so that Ganesha could just pass handles
through. Something similar might be possible for knfsd. With a bit more
work, we could be prepared to deal with other servers (like Ganesha
providing for knfsd or vice versa) by breaking apart the upstream handle
into an "export" component, which can be static, and a "filesystem
specific" portion that needs to be passed through. So Ganesha could break
out knfsd's fsid encoding and map that to an exportid, and just pass
through the payload handle (the portion that comes from the exportfs
interface).
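
The handle-splitting idea can be sketched roughly as below. This is a
hypothetical illustration, not Ganesha or knfsd code: the function name
build_proxy_handle(), the 4-byte export id prefix, and the assumption
that the caller already knows the length of the upstream "export"
component are all invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical proxy handle layout: a 4-byte export id chosen by the
 * proxy, followed by the upstream payload copied through unchanged.
 */
#define PROXY_EXPORTID_LEN 4

/*
 * Drop the first export_len bytes of the upstream handle (its "export"
 * component) and prepend exportid instead. Returns the length of the
 * proxy handle, or -1 if the input is too short or out is too small.
 */
static long build_proxy_handle(const uint8_t *upstream, size_t upstream_len,
			       size_t export_len, uint32_t exportid,
			       uint8_t *out, size_t out_cap)
{
	size_t payload_len;

	if (export_len > upstream_len)
		return -1;
	payload_len = upstream_len - export_len;
	if (PROXY_EXPORTID_LEN + payload_len > out_cap)
		return -1;

	/* Store the export id in big-endian (network) byte order. */
	out[0] = (uint8_t)(exportid >> 24);
	out[1] = (uint8_t)(exportid >> 16);
	out[2] = (uint8_t)(exportid >> 8);
	out[3] = (uint8_t)exportid;

	/* Pass the filesystem-specific payload through untouched. */
	memcpy(out + PROXY_EXPORTID_LEN, upstream + export_len, payload_len);
	return (long)(PROXY_EXPORTID_LEN + payload_len);
}
```

With the sizes discussed earlier in the thread, a 68-byte upstream
handle whose first 28 bytes are the export component would shrink to a
44-byte proxy handle, which is the attraction of replacing the export
part rather than wrapping the whole handle.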

Frank



* Re: [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation
  2015-11-18 22:30       ` Frank Filz
@ 2015-11-19 14:01         ` Jeff Layton
  0 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2015-11-19 14:01 UTC (permalink / raw)
  To: Frank Filz
  Cc: 'J. Bruce Fields',
	trond.myklebust, linux-nfs, 'Eric Paris',
	'Alexander Viro',
	linux-fsdevel

On Wed, 18 Nov 2015 14:30:41 -0800
"Frank Filz" <ffilzlnx@mindspring.com> wrote:

> Jeff Layton said:
> > On Wed, 18 Nov 2015 15:22:20 -0500
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > > > +Filehandle size:
> > > > +----------------
> > > > +The maximum filehandle size is governed by the NFS version. Version
> > > > +2 used fixed 32 byte filehandles. Version 3 moved to variable
> > > > +length filehandles that can be up to 64 bytes in size. NFSv4
> > > > +increased that maximum to 128 bytes.
> > > > +
> > > > +When reexporting an NFS filesystem, the underlying filehandle from
> > > > +the server must be embedded inside the filehandles presented to clients.
> > > > +Thus if the underlying server presents filehandles that are too
> > > > +big, the reexporting server can fail to encode them. This can lead
> > > > +to NFSERR_OPNOTSUPP errors being returned to clients.
> > > > +
> > > > +This is not a trivial thing to programmatically determine ahead of
> > > > +time (and it can vary even within the same server), so some
> > > > +foreknowledge of how the underlying server constructs filehandles,
> > > > +and their maximum size is a must.
> > >
> > > This is the trickiest one, since it depends on an undocumented
> > > implementation detail of the server.
> > >
> > 
> > Yes, indeed...
> > 
> > > Do we even know if this works for all the exportable Linux filesystems?
> > >
> > > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > > maximum-filesystem-size attribute to the protocol?
> > >
> > 
> > Erm, I think you mean maximum-filehandle-size, but I get your point...
> > 
> > It's tough to do more than a quick survey, but looking at new-style fh:
> > 
> > The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> > can get that down to 8 bytes if you specify the fsid directly. The fsid
> > choice is weird, because it sort of depends on the filehandle sent by
> > the client (which is used as a template), so I guess we really do need
> > to assume worst-case.
> > 
> > Once that's done, the encode_fh routines add the fileid part. btrfs has
> > a pretty large maximum one: 40 bytes. That brings the max size up to 68
> > bytes, which is already too large for NFSv3, before we ever get to
> > the part where we embed that inside another fh. We require another 12
> > bytes on top of the "underlying" filehandle for reexporting.
> > 
> > So, no this may very well not work for all exportable Linux
> > filesystems, but it sort of depends on the situation (and to some
> > degree, what gets sent by the clients). That's what makes this so hard
> > to figure out programmatically.
> > 
> > As far as extending the protocol...that's not a bad idea, though that's
> > obviously a longer-term solution. I don't think we can reasonably rely
> > on that anyway. Maybe though...
> 
> I've been thinking about this kind of thing with Ganesha's proxy server, and
> conveniently, you have also provided a good use case for proxy...
> 
> One option I was going to give Ganesha is the ability to indicate in the
> export configuration that the upstream server is Ganesha, and to expect the
> export configuration to be mirrored (easy for a config tool to do across
> the set of servers, primary and proxy) so that Ganesha could just pass
> handles through. Something similar might be possible for knfsd. With a bit
> more work, we could be prepared to deal with other servers (like Ganesha
> providing for knfsd or vice versa) by breaking apart the upstream handle
> into an "export" component, which can be static, and a "filesystem
> specific" portion that needs to be passed through. So Ganesha could break
> out knfsd's fsid encoding and map that to an exportid, and just pass
> through the payload handle (the portion that comes from the exportfs
> interface).
> 
> Frank
> 

It would be very tough to just pass the filehandles through here. After
all, we're hooking nfsd up to the nfs client code. You could (in
principle) have a mix of regular filesystems and reexported nfs mounts.
What if there are filehandle collisions between the one you passed
through and one of your local exported filesystems?

Breaking up the filehandle is also pretty much impossible to do in a
general way. The problem of course is that filehandles are really
opaque blobs to the client. You'd have to know ahead of time what part
of it refers to the fsid and what part is the fileid part. knfsd can
compose all sorts of filehandles, and it tries hard to mirror the type
of fh that the client is using.

Beyond that, what do you do when you get one of these "reconstituted"
filehandles back from the client where you've stripped off the fsid
info? At some point you have to reconstruct the original filehandle so
you can call back to the underlying server. Where do you store the fsid
info? Note that it has to be persistent across reboots too or you'll
see stale nfs filehandles on the clients.

-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH v1 31/38] nfs: replace d_add with d_splice_alias in atomic_open
  2015-11-17 11:52 ` [PATCH v1 31/38] nfs: replace d_add with d_splice_alias in atomic_open Jeff Layton
@ 2015-11-19 20:06   ` J. Bruce Fields
  2015-11-19 20:52     ` Trond Myklebust
  2015-11-19 20:59     ` Jeff Layton
  0 siblings, 2 replies; 52+ messages in thread
From: J. Bruce Fields @ 2015-11-19 20:06 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust, linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

On Tue, Nov 17, 2015 at 06:52:53AM -0500, Jeff Layton wrote:
> From: Peng Tao <tao.peng@primarydata.com>
> 
> It's a trivial change but follows knfsd export document that asks
> for d_splice_alias during lookup.

This is a bug even before you start exporting, isn't it?

OK, I see, in the atomic_open case we're probably only dealing with a
positive dentry for a regular file at this point, in which case
d_splice_alias is really just d_add....

I'm not sure this patch is really necessary, then.

--b.


> 
> Signed-off-by: Peng Tao <tao.peng@primarydata.com>
> ---
>  fs/nfs/dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index ce5a21861074..a4df1137878e 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1534,7 +1534,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>  		switch (err) {
>  		case -ENOENT:
>  			d_drop(dentry);
> -			d_add(dentry, NULL);
> +			d_splice_alias(NULL, dentry);
>  			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
>  			break;
>  		case -EISDIR:
> -- 
> 2.4.3


* Re: [PATCH v1 31/38] nfs: replace d_add with d_splice_alias in atomic_open
  2015-11-19 20:06   ` J. Bruce Fields
@ 2015-11-19 20:52     ` Trond Myklebust
  2015-11-19 20:59     ` Jeff Layton
  1 sibling, 0 replies; 52+ messages in thread
From: Trond Myklebust @ 2015-11-19 20:52 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Linux NFS Mailing List, Eric Paris, Alexander Viro,
	Linux FS-devel Mailing List

On Thu, Nov 19, 2015 at 3:06 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Tue, Nov 17, 2015 at 06:52:53AM -0500, Jeff Layton wrote:
> > From: Peng Tao <tao.peng@primarydata.com>
> >
> > It's a trivial change but follows knfsd export document that asks
> > for d_splice_alias during lookup.
>
> This is a bug even before you start exporting, isn't it?
>
> OK, I see, in the atomic_open case we're probably only dealing with a
> positive dentry for a regular file at this point, in which case
> d_splice_alias is really just d_add....
>
> I'm not sure this patch is really necessary, then.
>

d_splice_alias() also instantly reduces to d_add() whenever the
'inode' argument is NULL.

Cheers
  Trond


* Re: [PATCH v1 31/38] nfs: replace d_add with d_splice_alias in atomic_open
  2015-11-19 20:06   ` J. Bruce Fields
  2015-11-19 20:52     ` Trond Myklebust
@ 2015-11-19 20:59     ` Jeff Layton
  2015-11-19 22:32       ` J. Bruce Fields
  1 sibling, 1 reply; 52+ messages in thread
From: Jeff Layton @ 2015-11-19 20:59 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: trond.myklebust, linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

On Thu, 19 Nov 2015 15:06:51 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Tue, Nov 17, 2015 at 06:52:53AM -0500, Jeff Layton wrote:
> > From: Peng Tao <tao.peng@primarydata.com>
> > 
> > It's a trivial change but follows knfsd export document that asks
> > for d_splice_alias during lookup.
> 
> This is a bug even before you start exporting, isn't it?
> 
> OK, I see, in the atomic_open case we're probably only dealing with a
> positive dentry for a regular file at this point, in which case
> d_splice_alias is really just d_add....
> 
> I'm not sure this patch is really necessary, then.
> 
> --b.
> 
> 
> > 
> > Signed-off-by: Peng Tao <tao.peng@primarydata.com>
> > ---
> >  fs/nfs/dir.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index ce5a21861074..a4df1137878e 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -1534,7 +1534,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> >  		switch (err) {
> >  		case -ENOENT:
> >  			d_drop(dentry);
> > -			d_add(dentry, NULL);
> > +			d_splice_alias(NULL, dentry);
> >  			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
> >  			break;
> >  		case -EISDIR:
> > -- 
> > 2.4.3

Ahh right -- good point. d_splice_alias does this:

        if (!inode) {
                __d_instantiate(dentry, NULL);
                goto out;
        }
[...]
out:
        security_d_instantiate(dentry, inode);
        d_rehash(dentry);
        return NULL;

...which is exactly what d_add does. So yeah, that change isn't
strictly necessary. Still, it might be more future-proof to use
d_splice_alias there if the semantics ever change...
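
To spell out the equivalence, here is a toy userspace model (not kernel
code). The toy_* types and helpers are made up for illustration and only
track the two bits of state that matter here: which inode the dentry
points at, and whether it is hashed. Locking, the security hook and the
directory-alias handling are deliberately left out.

```c
#include <stddef.h>

/* Toy stand-ins for struct inode and struct dentry (hypothetical). */
struct toy_inode {
	int ino;
};

struct toy_dentry {
	struct toy_inode *inode;	/* NULL means a negative dentry */
	int hashed;			/* nonzero once visible to lookups */
};

/* Models d_add: attach the inode (possibly NULL), then hash the dentry. */
static void toy_d_add(struct toy_dentry *dentry, struct toy_inode *inode)
{
	dentry->inode = inode;
	dentry->hashed = 1;
}

/*
 * Models d_splice_alias. With a NULL inode it does exactly what
 * toy_d_add does. The non-NULL path in the real function also looks
 * for an existing directory alias; that part is not modeled here.
 */
static struct toy_dentry *toy_d_splice_alias(struct toy_inode *inode,
					     struct toy_dentry *dentry)
{
	if (!inode) {
		dentry->inode = NULL;
		dentry->hashed = 1;
		return NULL;
	}
	dentry->inode = inode;
	dentry->hashed = 1;
	return NULL;
}
```

In this model, calling toy_d_add(&d, NULL) and toy_d_splice_alias(NULL, &d)
leave the dentry in the same state, which is the -ENOENT case in the
patch.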

-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH v1 31/38] nfs: replace d_add with d_splice_alias in atomic_open
  2015-11-19 20:59     ` Jeff Layton
@ 2015-11-19 22:32       ` J. Bruce Fields
  0 siblings, 0 replies; 52+ messages in thread
From: J. Bruce Fields @ 2015-11-19 22:32 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust, linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

On Thu, Nov 19, 2015 at 03:59:31PM -0500, Jeff Layton wrote:
> Ahh right -- good point. d_splice_alias does this:
> 
>         if (!inode) {
>                 __d_instantiate(dentry, NULL);
>                 goto out;
>         }
> [...]
> out:
>         security_d_instantiate(dentry, inode);
>         d_rehash(dentry);
>         return NULL;
> 
> ...which is exactly what d_add does. So yeah, that change isn't
> strictly necessary. Still, it might be more future-proof to use
> d_splice_alias there if the semantics ever change...

Could be, I don't know.

May as well apply it (or not) independently of this series, though.

--b.


* Re: [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation
  2015-11-18 21:15     ` Jeff Layton
  2015-11-18 22:30       ` Frank Filz
@ 2015-11-20  0:04       ` J. Bruce Fields
  2015-11-20  0:28         ` Jeff Layton
  1 sibling, 1 reply; 52+ messages in thread
From: J. Bruce Fields @ 2015-11-20  0:04 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust, linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

On Wed, Nov 18, 2015 at 04:15:21PM -0500, Jeff Layton wrote:
> On Wed, 18 Nov 2015 15:22:20 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > > +Filehandle size:
> > > +----------------
> > > +The maximum filehandle size is governed by the NFS version. Version 2
> > > +used fixed 32 byte filehandles. Version 3 moved to variable length
> > > +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> > > +maximum to 128 bytes.
> > > +
> > > +When reexporting an NFS filesystem, the underlying filehandle from the
> > > +server must be embedded inside the filehandles presented to clients.
> > > +Thus if the underlying server presents filehandles that are too big, the
> > > +reexporting server can fail to encode them. This can lead to
> > > +NFSERR_OPNOTSUPP errors being returned to clients.
> > > +
> > > > +This is not a trivial thing to programmatically determine ahead of time
> > > +(and it can vary even within the same server), so some foreknowledge of
> > > +how the underlying server constructs filehandles, and their maximum
> > > +size is a must.
> > 
> > This is the trickiest one, since it depends on an undocumented
> > implementation detail of the server.
> > 
> 
> Yes, indeed...
> 
> > Do we even know if this works for all the exportable Linux filesystems?
> > 
> > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > maximum-filesystem-size attribute to the protocol?
> > 
> 
> Erm, I think you mean maximum-filehandle-size, but I get your point...

Whoops, thanks.

> It's tough to do more than a quick survey, but looking at new-style fh:
> 
> The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> can get that down to 8 bytes if you specify the fsid directly. The fsid
> choice is weird, because it sort of depends on the filehandle sent by
> the client (which is used as a template), so I guess we really do need
> to assume worst-case.

The client can only ever use filehandles it's been given, so if the
backend server's always been configured to use a certain kind (e.g. if
the exports have fsid= set), then we're OK, we're not responsible for
clients that guess random filehandles.

> Once that's done, the encode_fh routines add the fileid part. btrfs has
> a pretty large maximum one: 40 bytes. That brings the max size up to 68
> bytes, which is already too large for NFSv3, before we ever get to
> the part where we embed that inside another fh. We require another 12
> bytes on top of the "underlying" filehandle for reexporting.

So it's not necessarily that bad for nfsd, though of course it makes it
more complicated to configure the backend server.  Well, and knfsd has
v3 support so this is all a bit academic I guess.

So I'm having trouble weighing the benefits of this patch set against
the risks.

It's not even necessarily true that filehandles on a given filesystem
need be constant length.  In theory a server could decide to start
giving out bigger filehandles some day (as long as it continued to
respect the old ones), and the proxy would break.  In practice maybe
nobody does that.

--b.

> So, no this may very well not work for all exportable Linux
> filesystems, but it sort of depends on the situation (and to some
> degree, what gets sent by the clients). That's what makes this so hard
> to figure out programmatically.
> 
> As far as extending the protocol...that's not a bad idea, though that's
> obviously a longer-term solution. I don't think we can reasonably rely
> on that anyway. Maybe though...
> 
> -- 
> Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation
  2015-11-20  0:04       ` J. Bruce Fields
@ 2015-11-20  0:28         ` Jeff Layton
  2016-01-14 22:21           ` J. Bruce Fields
  0 siblings, 1 reply; 52+ messages in thread
From: Jeff Layton @ 2015-11-20  0:28 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: trond.myklebust, linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

On Thu, 19 Nov 2015 19:04:15 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Wed, Nov 18, 2015 at 04:15:21PM -0500, Jeff Layton wrote:
> > On Wed, 18 Nov 2015 15:22:20 -0500
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > 
> > > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > > > +Filehandle size:
> > > > +----------------
> > > > +The maximum filehandle size is governed by the NFS version. Version 2
> > > > +used fixed 32 byte filehandles. Version 3 moved to variable length
> > > > +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> > > > +maximum to 128 bytes.
> > > > +
> > > > +When reexporting an NFS filesystem, the underlying filehandle from the
> > > > +server must be embedded inside the filehandles presented to clients.
> > > > +Thus if the underlying server presents filehandles that are too big, the
> > > > +reexporting server can fail to encode them. This can lead to
> > > > +NFSERR_OPNOTSUPP errors being returned to clients.
> > > > +
> > > > > +This is not a trivial thing to programmatically determine ahead of time
> > > > +(and it can vary even within the same server), so some foreknowledge of
> > > > +how the underlying server constructs filehandles, and their maximum
> > > > +size is a must.
> > > 
> > > This is the trickiest one, since it depends on an undocumented
> > > implementation detail of the server.
> > > 
> > 
> > Yes, indeed...
> > 
> > > Do we even know if this works for all the exportable Linux filesystems?
> > > 
> > > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > > maximum-filesystem-size attribute to the protocol?
> > > 
> > 
> > Erm, I think you mean maximum-filehandle-size, but I get your point...
> 
> Whoops, thanks.
> 
> > It's tough to do more than a quick survey, but looking at new-style fh:
> > 
> > The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> > can get that down to 8 bytes if you specify the fsid directly. The fsid
> > choice is weird, because it sort of depends on the filehandle sent by
> > the client (which is used as a template), so I guess we really do need
> > to assume worst-case.
> 
> The client can only ever use filehandles it's been given, so if the
> backend server's always been configured to use a certain kind (e.g. if
> the exports have fsid= set), then we're OK, we're not responsible for
> clients that guess random filehandles.
> 
> > Once that's done, the encode_fh routines add the fileid part. btrfs has
> > a pretty large maximum one: 40 bytes. That brings the max size up to 68
> > bytes, which is already too large for NFSv3, before we ever get to
> > the part where we embed that inside another fh. We require another 12
> > bytes on top of the "underlying" filehandle for reexporting.
> 
> So it's not necessarily that bad for nfsd, though of course it makes it
> more complicated to configure the backend server.  Well, and knfsd has
> v3 support so this is all a bit academic I guess.
> 

You just have to make sure you vet the filehandle size on the stuff
you're reexporting. In our use-case, we know that the backend server's
filehandles are well under 42 bytes, so we're well under the max size.
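
As a rough illustration of that vetting, the arithmetic can be written
down as below. This is a userspace sketch, not code from the patchset.
The per-version limits and the 12 byte overhead are the figures quoted
in this thread; the helper names are hypothetical, and the sketch
assumes those 12 bytes are the whole cost of wrapping the underlying
filehandle.

```c
/* Per-version filehandle limits, as given in the quoted documentation. */
enum { NFS2_FHSIZE = 32, NFS3_FHSIZE = 64, NFS4_FHSIZE = 128 };

/*
 * Bytes added on top of the underlying filehandle when reexporting.
 * Assumption: the 12 byte figure from this thread is the entire overhead.
 */
enum { REEXPORT_OVERHEAD = 12 };

/* Hypothetical helper: largest underlying filehandle that still fits. */
static int max_underlying_len(int proto_max)
{
	return proto_max - REEXPORT_OVERHEAD;
}

/* Hypothetical helper: nonzero if the wrapped filehandle fits the limit. */
static int reexport_fits(int underlying_len, int proto_max)
{
	return underlying_len <= max_underlying_len(proto_max);
}
```

With these numbers, reexport_fits(28 + 40, NFS3_FHSIZE) is false for the
worst-case knfsd-on-btrfs filehandle discussed earlier, while the same
68 byte filehandle passes against NFS4_FHSIZE.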

One thing we could consider is promoting the dprintk in nfs_encode_fh
when this occurs to a pr_err or something. That would at least make
it very obvious when that occurs...

> So I'm having trouble weighing the benefits of this patch set against
> the risks.
> 
> It's not even necessarily true that filehandles on a given filesystem
> need be constant length.  In theory a server could decide to start
> giving out bigger filehandles some day (as long as it continued to
> respect the old ones), and the proxy would break.  In practice maybe
> nobody does that.
> 

Hard to say. There are a lot of oddball servers out there. There
certainly are risks involved in reexporting, particularly if you don't
heed the caveats. It's for good reason this Kconfig option defaults to
"n". ;)

OTOH, the kernel shouldn't crash or anything if that occurs. If your
filehandles are too large to be embedded, then you just end up getting
back FILEID_INVALID on the encode_fh. That sucks if it occurs, but
it shouldn't happen if you're careful about what gets reexported.
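
For anyone who hasn't looked at encode_fh, the failure mode is roughly
the following. This is a simplified userspace sketch, not the actual
nfs_encode_fh: the SKETCH_* constants, the function name and the
one-byte length prefix are all invented for illustration, and sizes are
counted in bytes to keep it short.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_FILEID_INVALID	0xff	/* stand-in for the kernel's FILEID_INVALID */
#define SKETCH_FILEID_REEXPORT	0x01	/* hypothetical fileid type */

/*
 * Embed an underlying filehandle into buf, which has room for *max_len
 * bytes. On return *max_len holds the number of bytes needed, so a
 * caller can tell how far over the limit it was. Nothing is written to
 * buf when the filehandle does not fit.
 */
static int sketch_encode_fh(unsigned char *buf, size_t *max_len,
			    const unsigned char *under, size_t under_len)
{
	size_t need = 1 + under_len;	/* length prefix + payload */

	if (under_len > 255 || need > *max_len) {
		*max_len = need;
		return SKETCH_FILEID_INVALID;
	}

	buf[0] = (unsigned char)under_len;
	memcpy(buf + 1, under, under_len);
	*max_len = need;
	return SKETCH_FILEID_REEXPORT;
}
```

The point is only that an oversized filehandle turns into an error
return rather than an overrun of the caller's buffer.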


> 
> > So, no this may very well not work for all exportable Linux
> > filesystems, but it sort of depends on the situation (and to some
> > degree, what gets sent by the clients). That's what makes this so hard
> > to figure out programmatically.
> > 
> > As far as extending the protocol...that's not a bad idea, though that's
> > obviously a longer-term solution. I don't think we can reasonably rely
> > on that anyway. Maybe though...
> > 
> > -- 
> > Jeff Layton <jlayton@poochiereds.net>


-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation
  2015-11-20  0:28         ` Jeff Layton
@ 2016-01-14 22:21           ` J. Bruce Fields
  2016-01-15 16:00             ` Jeff Layton
  0 siblings, 1 reply; 52+ messages in thread
From: J. Bruce Fields @ 2016-01-14 22:21 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust, linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

On Thu, Nov 19, 2015 at 07:28:49PM -0500, Jeff Layton wrote:
> On Thu, 19 Nov 2015 19:04:15 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Wed, Nov 18, 2015 at 04:15:21PM -0500, Jeff Layton wrote:
> > > On Wed, 18 Nov 2015 15:22:20 -0500
> > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > > 
> > > > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:
> > > > > +Filehandle size:
> > > > > +----------------
> > > > > +The maximum filehandle size is governed by the NFS version. Version 2
> > > > > +used fixed 32 byte filehandles. Version 3 moved to variable length
> > > > > +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> > > > > +maximum to 128 bytes.
> > > > > +
> > > > > +When reexporting an NFS filesystem, the underlying filehandle from the
> > > > > +server must be embedded inside the filehandles presented to clients.
> > > > > +Thus if the underlying server presents filehandles that are too big, the
> > > > > +reexporting server can fail to encode them. This can lead to
> > > > > +NFSERR_OPNOTSUPP errors being returned to clients.
> > > > > +
> > > > > > +This is not a trivial thing to programmatically determine ahead of time
> > > > > +(and it can vary even within the same server), so some foreknowledge of
> > > > > +how the underlying server constructs filehandles, and their maximum
> > > > > +size is a must.
> > > > 
> > > > This is the trickiest one, since it depends on an undocumented
> > > > implementation detail of the server.
> > > > 
> > > 
> > > Yes, indeed...
> > > 
> > > > Do we even know if this works for all the exportable Linux filesystems?
> > > > 
> > > > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > > > maximum-filesystem-size attribute to the protocol?
> > > > 
> > > 
> > > Erm, I think you mean maximum-filehandle-size, but I get your point...
> > 
> > Whoops, thanks.
> > 
> > > It's tough to do more than a quick survey, but looking at new-style fh:
> > > 
> > > The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> > > can get that down to 8 bytes if you specify the fsid directly. The fsid
> > > choice is weird, because it sort of depends on the filehandle sent by
> > > the client (which is used as a template), so I guess we really do need
> > > to assume worst-case.
> > 
> > The client can only ever use filehandles it's been given, so if the
> > backend server's always been configured to use a certain kind (e.g. if
> > the exports have fsid= set), then we're OK, we're not responsible for
> > clients that guess random filehandles.
> > 
> > > Once that's done, the encode_fh routines add the fileid part. btrfs has
> > > a pretty large maximum one: 40 bytes. That brings the max size up to 68
> > > bytes, which is already too large for NFSv3, before we ever get to
> > > the part where we embed that inside another fh. We require another 12
> > > bytes on top of the "underlying" filehandle for reexporting.
> > 
> > So it's not necessarily that bad for nfsd, though of course it makes it
> > more complicated to configure the backend server.  Well, and knfsd has
> > v3 support so this is all a bit academic I guess.
> > 
> 
> You just have to make sure you vet the filehandle size on the stuff
> you're reexporting. In our use-case, we know that the backend server's
> filehandles are well under 42 bytes, so we're well under the max size.
> 
> One thing we could consider is promoting the dprintk in nfs_encode_fh
> when this occurs to a pr_err or something. That would at least make
> it very obvious when that occurs...
> 
> > So I'm having trouble weighing the benefits of this patch set against
> > the risks.
> > 
> > It's not even necessarily true that filehandles on a given filesystem
> > need be constant length.  In theory a server could decide to start
> > giving out bigger filehandles some day (as long as it continued to
> > respect the old ones), and the proxy would break.  In practice maybe
> > nobody does that.
> > 
> 
> Hard to say. There are a lot of oddball servers out there. There
> certainly are risks involved in reexporting, particularly if you don't
> heed the caveats. It's for good reason this Kconfig option defaults to
> "n". ;)
> 
> OTOH, the kernel shouldn't crash or anything if that occurs. If your
> filehandles are too large to be embedded, then you just end up getting
> back FILEID_INVALID on the encode_fh. That sucks if it occurs, but
> it shouldn't happen if you're careful about what gets reexported.

OK, sorry for the long silence on this.

Basically I'm having trouble making the case to myself here:

	- On the one hand, having you guys carry all this stuff is
	  annoying, I'd rather our code bases were closer.
	- On the other hand, I can't see taking something that's in
	  practice basically only useful for one proprietary server,
	  which is the way it looks to me right now.
	- Also, "NFS proxying" *sounds* much more general than it really
	  is, and I fear a lot of people are going to fall into that
	  trap no matter how we warn them.

Gah.

Anyway, for now I should take the one tracepoint patch at least (and
shouldn't some of the fs patches go in regardless?) but I'm punting on
the rest.

--b.


* Re: [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation
  2016-01-14 22:21           ` J. Bruce Fields
@ 2016-01-15 16:00             ` Jeff Layton
  0 siblings, 0 replies; 52+ messages in thread
From: Jeff Layton @ 2016-01-15 16:00 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: trond.myklebust, linux-nfs, Eric Paris, Alexander Viro, linux-fsdevel

On Thu, 14 Jan 2016 17:21:27 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Thu, Nov 19, 2015 at 07:28:49PM -0500, Jeff Layton wrote:
> > On Thu, 19 Nov 2015 19:04:15 -0500
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> >   
> > > On Wed, Nov 18, 2015 at 04:15:21PM -0500, Jeff Layton wrote:  
> > > > On Wed, 18 Nov 2015 15:22:20 -0500
> > > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > > >   
> > > > > On Tue, Nov 17, 2015 at 06:53:00AM -0500, Jeff Layton wrote:  
> > > > > > +Filehandle size:
> > > > > > +----------------
> > > > > > +The maximum filehandle size is governed by the NFS version. Version 2
> > > > > > +used fixed 32 byte filehandles. Version 3 moved to variable length
> > > > > > +filehandles that can be up to 64 bytes in size. NFSv4 increased that
> > > > > > +maximum to 128 bytes.
> > > > > > +
> > > > > > +When reexporting an NFS filesystem, the underlying filehandle from the
> > > > > > +server must be embedded inside the filehandles presented to clients.
> > > > > > +Thus if the underlying server presents filehandles that are too big, the
> > > > > > +reexporting server can fail to encode them. This can lead to
> > > > > > +NFSERR_OPNOTSUPP errors being returned to clients.
> > > > > > +
> > > > > > > +This is not a trivial thing to programmatically determine ahead of time
> > > > > > +(and it can vary even within the same server), so some foreknowledge of
> > > > > > +how the underlying server constructs filehandles, and their maximum
> > > > > > +size is a must.  
> > > > > 
> > > > > This is the trickiest one, since it depends on an undocumented
> > > > > implementation detail of the server.
> > > > >   
> > > > 
> > > > Yes, indeed...
> > > >   
> > > > > Do we even know if this works for all the exportable Linux filesystems?
> > > > > 
> > > > > If proxying NFSv4.x servers is actually useful, could we add a per-fs
> > > > > maximum-filesystem-size attribute to the protocol?
> > > > >   
> > > > 
> > > > Erm, I think you mean maximum-filehandle-size, but I get your point...  
> > > 
> > > Whoops, thanks.
> > >   
> > > > It's tough to do more than a quick survey, but looking at new-style fh:
> > > > 
> > > > The max fsid len seems to be 28 bytes (FSID_UUID16_INUM), though you
> > > > can get that down to 8 bytes if you specify the fsid directly. The fsid
> > > > choice is weird, because it sort of depends on the filehandle sent by
> > > > the client (which is used as a template), so I guess we really do need
> > > > to assume worst-case.  
> > > 
> > > The client can only ever use filehandles it's been given, so if the
> > > backend server's always been configured to use a certain kind (e.g. if
> > > the exports have fsid= set), then we're OK, we're not responsible for
> > > clients that guess random filehandles.
> > >   
> > > > Once that's done, the encode_fh routines add the fileid part. btrfs has
> > > > a pretty large maximum one: 40 bytes. That brings the max size up to 68
> > > > bytes, which is already too large for NFSv3, before we ever get to
> > > > the part where we embed that inside another fh. We require another 12
> > > > bytes on top of the "underlying" filehandle for reexporting.  
> > > 
> > > So it's not necessarily that bad for nfsd, though of course it makes it
> > > more complicated to configure the backend server.  Well, and knfsd has
> > > v3 support so this is all a bit academic I guess.
> > >   
> > 
> > You just have to make sure you vet the filehandle size on the stuff
> > you're reexporting. In our use-case, we know that the backend server's
> > filehandles are well under 42 bytes, so we're well under the max size.
> > 
> > One thing we could consider is promoting the dprintk in nfs_encode_fh
> > when this occurs to a pr_err or something. That would at least make
> > it very obvious when that occurs...
> >   
> > > So I'm having trouble weighing the benefits of this patch set against
> > > the risks.
> > > 
> > > It's not even necessarily true that filehandles on a given filesystem
> > > need be constant length.  In theory a server could decide to start
> > > giving out bigger filehandles some day (as long as it continued to
> > > respect the old ones), and the proxy would break.  In practice maybe
> > > nobody does that.
> > >   
> > 
> > Hard to say. There are a lot of oddball servers out there. There
> > certainly are risks involved in reexporting, particularly if you don't
> > heed the caveats. It's for good reason this Kconfig option defaults to
> > "n". ;)
> > 
> > OTOH, the kernel shouldn't crash or anything if that occurs. If your
> > filehandles are too large to be embedded, then you just end up getting
> > back FILEID_INVALID on the encode_fh. That sucks if it occurs, but
> > it shouldn't happen if you're careful about what gets reexported.  
> 
> OK, sorry for the long silence on this.
> 
> Basically I'm having trouble making the case to myself here:
> 
> 	- On the one hand, having you guys carry all this stuff is
> 	  annoying, I'd rather our code bases were closer.
> 	- On the other hand, I can't see taking something that's in
> 	  practice basically only useful for one proprietary server,
> 	  which is the way it looks to me right now.
> 	- Also, "NFS proxying" *sounds* much more general than it really
> 	  is, and I fear a lot of people are going to fall into that
> 	  trap no matter how we warn them.
> 
> Gah.
> 
> Anyway, for now I should take the one tracepoint patch at least (and
> shouldn't some of the fs patches go in regardless?) but I'm punting on
> the rest.
> 
> --b.

Understood.

I've not had the cycles to spend on this lately anyway, as I've been
putting out fires elsewhere. Perhaps once I am able to do that and
spend some time on the performance of this, we may find that the open
file cache is more generally useful, and we can revisit it then. We'll
see...

FWIW, there is also one significant bugfix to that series that I've not
had the time to post. The error handling when fsnotify_add_mark returns
an error is not right, and it can end up with a double free of the
mark.

As far as what should go in soon...yeah, this tracepoint patch might be
nice:

    nfsd: add new io class tracepoint

For the vfs, these two might be good, but I'd like Al to offer an
opinion on the first one. I'm pretty sure we don't call
flush_delayed_fput until after the workqueue threads have been started,
but the only caller now is in the boot code, AFAICT and I'm not 100%
sure on that point:

    fs: have flush_delayed_fput flush the workqueue job
    fs: add a kerneldoc header to fput

This patch has already been picked up by Andrew, AFAICT:

    fsnotify: destroy marks with call_srcu instead of dedicated thread

...and the rest are pretty much specific to the reexporting
functionality.

-- 
Jeff Layton <jlayton@poochiereds.net>


end of thread, other threads:[~2016-01-15 16:00 UTC | newest]

Thread overview: 52+ messages
2015-11-17 11:52 [PATCH v1 00/38] Allow NFS filesystems to be reexported via knfsd Jeff Layton
2015-11-17 11:52 ` [PATCH v1 01/38] nfsd: add new io class tracepoint Jeff Layton
2015-11-17 11:52 ` [PATCH v1 02/38] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
2015-11-17 11:52 ` [PATCH v1 03/38] fs: add a kerneldoc header to fput Jeff Layton
2015-11-17 11:52 ` [PATCH v1 04/38] fs: rename "delayed_fput" infrastructure to "fput_global" Jeff Layton
2015-11-17 11:52 ` [PATCH v1 05/38] fs: add fput_global Jeff Layton
2015-11-17 11:52 ` [PATCH v1 06/38] fsnotify: fix a sparse warning Jeff Layton
2015-11-17 11:52 ` [PATCH v1 07/38] fsnotify: export several symbols Jeff Layton
2015-11-17 11:52 ` [PATCH v1 08/38] fsnotify: destroy marks with call_srcu instead of dedicated thread Jeff Layton
2015-11-17 11:52 ` [PATCH v1 09/38] fsnotify: add a srcu barrier for fsnotify Jeff Layton
2015-11-17 11:52 ` [PATCH v1 10/38] locks: create a new notifier chain for lease attempts Jeff Layton
2015-11-17 11:52 ` [PATCH v1 11/38] sunrpc: add a new cache_detail operation for when a cache is flushed Jeff Layton
2015-11-17 11:52 ` [PATCH v1 12/38] nfsd: add a new struct file caching facility to nfsd Jeff Layton
2015-11-17 11:52 ` [PATCH v1 13/38] nfsd: keep some rudimentary stats on nfsd_file cache Jeff Layton
2015-11-17 11:52 ` [PATCH v1 14/38] nfsd: allow filecache open to skip fh_verify check Jeff Layton
2015-11-17 11:52 ` [PATCH v1 15/38] nfsd: hook up nfsd_write to the new nfsd_file cache Jeff Layton
2015-11-17 11:52 ` [PATCH v1 16/38] nfsd: hook up nfsd_read to the " Jeff Layton
2015-11-17 11:52 ` [PATCH v1 17/38] nfsd: hook nfsd_commit up " Jeff Layton
2015-11-17 11:52 ` [PATCH v1 18/38] nfsd: convert nfs4_file->fi_fds array to use nfsd_files Jeff Layton
2015-11-17 11:52 ` [PATCH v1 19/38] nfsd: have nfsd_test_lock use the nfsd_file cache Jeff Layton
2015-11-17 11:52 ` [PATCH v1 20/38] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file Jeff Layton
2015-11-17 11:52 ` [PATCH v1 21/38] nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache Jeff Layton
2015-11-17 11:52 ` [PATCH v1 22/38] nfsd: rip out the raparms cache Jeff Layton
2015-11-17 11:52 ` [PATCH v1 23/38] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations Jeff Layton
2015-11-17 11:52 ` [PATCH v1 24/38] nfsd: allow lockd to be forcibly disabled Jeff Layton
2015-11-17 11:52 ` [PATCH v1 25/38] nfsd: add errno mapping for EREMOTEIO Jeff Layton
2015-11-17 11:52 ` [PATCH v1 26/38] nfsd: return EREMOTE if we find an S_AUTOMOUNT inode Jeff Layton
2015-11-17 11:52 ` [PATCH v1 27/38] nfsd: allow filesystems to opt out of subtree checking Jeff Layton
2015-11-17 22:53   ` Jeff Layton
2015-11-17 11:52 ` [PATCH v1 28/38] nfsd: close cached files prior to a REMOVE or RENAME that would replace target Jeff Layton
2015-11-17 11:52 ` [PATCH v1 29/38] nfsd: retry once in nfsd_open on an -EOPENSTALE return Jeff Layton
2015-11-17 11:52 ` [PATCH v1 30/38] nfsd: close cached file when underlying file systems says no such file Jeff Layton
2015-11-17 11:52 ` [PATCH v1 31/38] nfs: replace d_add with d_splice_alias in atomic_open Jeff Layton
2015-11-19 20:06   ` J. Bruce Fields
2015-11-19 20:52     ` Trond Myklebust
2015-11-19 20:59     ` Jeff Layton
2015-11-19 22:32       ` J. Bruce Fields
2015-11-17 11:52 ` [PATCH v1 32/38] nfs: add encode_fh export op Jeff Layton
2015-11-17 11:52 ` [PATCH v1 33/38] nfs: add fh_to_dentry " Jeff Layton
2015-11-17 11:52 ` [PATCH v1 34/38] nfs: nfs_fh_to_dentry() make use of inode cache Jeff Layton
2015-11-17 11:52 ` [PATCH v1 35/38] nfs4: add NFSv4 LOOKUPP handlers Jeff Layton
2015-11-17 11:52 ` [PATCH v1 36/38] nfs: add a get_parent export operation for NFS Jeff Layton
2015-11-17 11:52 ` [PATCH v1 37/38] nfs: set export ops Jeff Layton
2015-11-17 11:53 ` [PATCH v1 38/38] nfs: add a Kconfig option for NFS reexporting and documentation Jeff Layton
2015-11-18 20:22   ` J. Bruce Fields
2015-11-18 21:15     ` Jeff Layton
2015-11-18 22:30       ` Frank Filz
2015-11-19 14:01         ` Jeff Layton
2015-11-20  0:04       ` J. Bruce Fields
2015-11-20  0:28         ` Jeff Layton
2016-01-14 22:21           ` J. Bruce Fields
2016-01-15 16:00             ` Jeff Layton
